Multiplex community detection in complex networks using an evolutionary approach




Journal Pre-proof

Fatemeh Karimi, Shahriar Lotfi, Habib Izadkhah

PII: S0957-4174(20)30010-5
DOI: https://doi.org/10.1016/j.eswa.2020.113184
Reference: ESWA 113184

To appear in: Expert Systems With Applications

Received date: 18 September 2019
Revised date: 15 December 2019
Accepted date: 3 January 2020

Please cite this article as: Fatemeh Karimi, Shahriar Lotfi, Habib Izadkhah, Multiplex community detection in complex networks using an evolutionary approach, Expert Systems With Applications (2020), doi: https://doi.org/10.1016/j.eswa.2020.113184

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 Published by Elsevier Ltd.

Highlights 

- The MOEA/D-TS is applied to detect shared communities in multiplex networks.
- Improved MOEA/D-TS uses the Clustering Coefficient for the generation of the initial population.
- No prior knowledge of the number and size of communities is required.
- Experiments on real networks show the superiority of the improved MOEA/D-TS.


Fatemeh Karimi, Shahriar Lotfi, Habib Izadkhah
Department of Computer Science, Faculty of Mathematical Sciences, University of Tabriz, Tabriz, Iran

Abstract
Multiplex networks are a general representation of complex systems in which the same entities interact in distinct ways on multiple layers. Community detection in multiplex networks is the problem of finding a structure shared by all layers, one that combines the information of the entire network. Most existing methods for community detection in single-layer networks cannot be applied directly to detect shared communities in multiplex networks. In this paper, we employ a multi-objective evolutionary approach, the Multi-Objective Evolutionary Algorithm based on Decomposition with Tabu Search (MOEA/D-TS), to detect shared communities in multiplex networks. We also improve the generation of the initial population of MOEA/D-TS using a social network analysis measure, the Clustering Coefficient (CC). This hybrid algorithm combines the parallel computing capacity of the Multi-Objective Evolutionary Algorithm based on Decomposition (MOEA/D) with the neighborhood search ability of Tabu Search (TS) to discover Pareto optimal solutions. Extensive experiments on a variety of single-layer and multiplex real-world data sets show that the proposed method outperforms state-of-the-art algorithms.

Keywords: Community detection; Multiplex networks; Evolutionary algorithms; Social network analysis

1. Introduction
Fundamental features and intricate structures of many real-world systems can be modeled as complex networks. Communication, biological, and collaboration systems, as well as disease, social, and energy networks, all form graphs in which the objects are nodes and the interactions among them are edges (Girvan & Newman, 2002; Said, Abbasi, Maqbool, Daud, & Aljohani, 2018; Watts & Strogatz, 1998).
Many existing works have treated real systems as single-layer networks with homogeneous links (Said, Abbasi, Maqbool, Daud, & Aljohani, 2018; Srinivas & Rajendran, 2019; Tang & Liu, 2010; Xu, Xu, & Zhang, 2015). In these traditional networks, a single kind of link represents all interactions and relationships among objects. However, many real-world systems are heterogeneous networks with different types of communication at multiple levels (Boccaletti et al., 2014; Interdonato, Tagarelli, Ienco, Sallaberry, & Poncelet, 2017; Mucha, Richardson, Macon, Porter, & Onnela, 2010). For instance, in a company viewed as a social system, employees may communicate by telephone, email, Facebook, or other social media, and each of these relationship types forms a separate layer of a multilayer, or multiplex, network. The same vertices appear in every layer of a multiplex network, and the links of each layer represent a different kind of relationship between the agents. Among the various network analysis problems, community detection is a challenging optimization problem: finding the optimal community structure of a network, which groups together nodes with similar properties (Guerrero, Montoya, Baños, Alcayde, & Gil, 2017).

Corresponding author. E-mail addresses: [email protected] (F. Karimi), [email protected] (S. Lotfi), [email protected] (H. Izadkhah).

The existence of a structure characterized by densely connected groups of nodes has been established for many real networks (Das, Behera, & Panigrahi, 2016). These communities, or modules, are sub-graphs of the network whose nodes are more densely connected to each other than to the rest of the network. Communities represent functional units in which nodes are more likely to share properties, and they can reveal latent information about complex networks. For instance, a community in a social network may correspond to a group of people in frequent contact, such as close friends, family, or co-workers, or to people who share similar interests or backgrounds (Said et al., 2018). Moreover, individuals today are often active on several social networks, so linking the profiles of the same user across multiple platforms has become important (Interdonato et al., 2017).

Current community detection research focuses mainly on single-layer networks. A community structure in such a network is a partition of a single node set (Girvan & Newman, 2002), whereas in a multiplex network there is no uniform definition, since a different community structure may be detected for each layer (Ma, Gong, Yan, Liu, & Wang, 2018). Hence, detecting communities layer by layer cannot determine an exact community set for a multilayer network. Existing work on multiplex community detection follows two main branches: the layer aggregation and ensemble clustering approaches, and the community-definition-based optimization approach (Ma et al., 2018).
Layer aggregation first computes a weighted single-layer network by aggregating all the connections of the multiplex network, and then applies an available community detection algorithm to discover communities. Ensemble clustering first detects the communities of each layer, and then uses a clustering technique to determine the shared communities (Berlingerio, Pinelli, & Calabrese, 2013). In the community-definition-based optimization approach, the concept of a shared community is defined first, and then an algorithm is employed to detect communities that satisfy this definition. Mucha et al. (2010) proposed a generalized version of modularity for multiplex networks. Amelio & Pizzuti (2014) proposed a multi-objective optimization algorithm for community detection in which the quality of communities is evaluated using modularity (Newman & Girvan, 2004) and Normalized Mutual Information (NMI). Note that the layer aggregation and ensemble clustering approaches do not consider a global definition of collective communities in multiplex networks, neglect the interactions between the different types of connections, and ignore the global analysis of multilayer networks (Ma et al., 2018). This weakness limits the corresponding algorithms on high-dimensional multiplex networks. On the other hand, many community-definition-based optimization approaches (Amelio & Pizzuti, 2014; De Domenico, Lancichinetti, Arenas, & Rosvall, 2015) neglect the comprehensive topological connections of multiplex networks and are prone to getting trapped in locally optimal community divisions. Multiplex community detection is a discrete NP-hard optimization problem (Ma, Gong, Yan, Liu, & Wang, 2018). In the last decade, Evolutionary Algorithms (EAs) have played a significant role in optimization (Interdonato et al., 2017).
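NMI recurs throughout this paper as a measure of agreement between two community structures. As an illustration only, the sketch below computes the common information-theoretic form NMI(A, B) = 2I(A; B) / (H(A) + H(B)) for two partitions of the same nodes; the function name `nmi` and the label-list input are our assumptions, and the NMI variant used by Amelio & Pizzuti (2014) may be normalized differently.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information 2*I(A;B) / (H(A) + H(B)) between two
    partitions, each given as one community label per node (same node order)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same nodes")
    n = len(labels_a)
    count_a = Counter(labels_a)              # community sizes in A
    count_b = Counter(labels_b)              # community sizes in B
    joint = Counter(zip(labels_a, labels_b))  # confusion matrix N_ij

    # Mutual information I(A;B) computed from the confusion matrix.
    mi = sum(
        n_ij / n * log(n_ij * n / (count_a[a] * count_b[b]))
        for (a, b), n_ij in joint.items()
    )
    # Entropies of the two partitions.
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        # Both partitions put every node in a single community: identical.
        return 1.0
    return 2 * mi / (h_a + h_b)
```

Relabeling communities does not change the score: two partitions that group the nodes identically score 1 (up to floating-point rounding), and statistically independent partitions score 0.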
EAs are population-based stochastic methods that perform well on diverse multi-objective optimization problems (MOPs). Their flexibility makes them successful on optimization problems of different sizes, particularly complex ones (Zhang & Li, 2007). However, despite the extensive use of EAs for community detection in single-layer networks, their application to multiplex community detection has not been well investigated. One of the best-known methods is MOEA/D, proposed by Zhang & Li (2007), which recent literature regards as an effective method for a range of multi-objective optimization problems (Ho-Huu, Hartjes, Visser, & Curran, 2018; Zhang, Liu, Tsang, & Virginas, 2009). This algorithm decomposes the main multi-objective problem into a number of single-objective optimization subproblems and solves them simultaneously. It can generate uniformly distributed non-dominated solutions on the Pareto optimal front with lower computational complexity than other evolutionary algorithms such as NSGA-II. Therefore, MOEA/D is a strong candidate for solving community detection problems in complex networks.

In this paper, we employ our previous hybrid method (Lotfi & Karimi, 2017), which is based on MOEA/D and Tabu Search (TS), to solve the multiplex community detection problem. This method combines the exploration power of a population-based global search (MOEA/D) with the exploitation ability of an individual-based local search (TS). Many existing evolutionary approaches suffer from the random generation of the Initial Population (IP): starting from an IP with low fitness values, the search takes a long time to reach a near-global optimum, and the chance of getting stuck in a local optimum increases. To avoid this limitation of random initialization and to generate high-quality solutions, we propose an improved version of MOEA/D-TS that uses a well-known measure, the Clustering Coefficient (CC) (Watts & Strogatz, 1998), to generate the IP in the first step of the algorithm. Although a few ideas similar to CC have been employed for community detection (Said et al., 2018), existing studies have not explored this concept for multiplex networks or in combination with a hybrid algorithm. The main contributions of this paper are summarized as follows:
1. We propose an improved version of MOEA/D-TS to tackle the problem of community detection in complex networks. This method uses the parallel computation capacity of MOEA/D along with the neighborhood search capability of the Improved Diversificator Tabu Search (IDTS) to simultaneously optimize the subproblems extracted from the main multi-objective problem in single-layer and multiplex networks.
2. The improved MOEA/D-TS uses the Clustering Coefficient (CC) to avoid the limitations of random IP generation. This strategy leads to high-quality results and accelerates the convergence of MOEA/D-TS. Another advantage of using CC is the recognition of local bridges, which connect different communities of a network.
3. The proposed method solves the multiplex community detection problem in two main steps. First, MOEA/D-TS determines the community structure of the first layer, using the modularity density measure as the fitness function to assess the quality of the population. The total number of clusters need not be fixed in this step, since MOEA/D-TS selectively explores the search space and returns a set of solutions for network partitioning via optimization of the objective function. In the second step, the problem is formulated as the simultaneous maximization of the modularity Q (for each remaining layer Li) and NMI (for each pair of consecutive layers Li and Li−1).
4. Extensive experiments on eight single-layer and nine multiplex networks of various sizes from the social, transport, financial, and genetic domains show that the improved MOEA/D-TS discovers good community structures in single-layer networks and obtains accurate shared community structures in multiplex networks. The results demonstrate the superiority of our method in terms of the redundancy and accuracy of the detected shared communities compared with other state-of-the-art community detection approaches.
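The rule that turns CC values into an initial population is not given in this section, so the sketch below shows only the ingredient it relies on: the local clustering coefficient of Watts & Strogatz (1998), C_i = 2e_i / (k_i(k_i − 1)), where k_i is the degree of node i and e_i is the number of links among its neighbors. The adjacency-dictionary input, the function name, and the convention C_i = 0 for k_i < 2 are our assumptions.

```python
from itertools import combinations


def local_clustering(adj):
    """Local clustering coefficient C_i = 2*e_i / (k_i*(k_i-1)) for every node
    of an undirected graph given as {node: set_of_neighbours}."""
    cc = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            cc[node] = 0.0  # undefined for degree < 2; report 0 by convention
            continue
        # e_i: number of links that actually exist between pairs of neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        cc[node] = 2.0 * links / (k * (k - 1))
    return cc
```

For a triangle 0-1-2 with a pendant node 3 attached to node 2, nodes 0 and 1 get C = 1, while node 2 gets C = 1/3 because its neighbors are only partly connected; node 2 is exactly the endpoint of the edge that leaves the dense group, which matches the intuition behind using CC to spot local bridges.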

The rest of the paper is organized as follows. Section 2 presents the basic concepts and a comprehensive review of community detection methods. Section 3 describes the steps of the proposed method in detail, and Section 4 presents the experimental results. Finally, Section 5 concludes the paper and proposes some ideas for future work.

2. Background and related works
Recently, community detection in complex networks has gained significant importance (Benson, Gleich, & Leskovec, 2016; Srinivas & Rajendran, 2019). Detecting the community structure is a reliable way to reveal hidden features of complex networks that are not easy to discover from their original form (Mucha et al., 2010). Accordingly, community detection in different kinds of networks remains an active research area despite the many existing approaches (Said et al., 2018). Community detection has been studied for applications in many areas, including complex system analysis, link prediction, and recommender systems (Brandes et al., 2007; Girvan & Newman, 2002). In the following, we first introduce the basic terminology of the paper and then briefly describe the MOEA/D-TS approach. A review of related work on community detection is presented in the subsequent subsections.

2.1. Terminology

Here we introduce the terms used throughout the paper:
- Multiplex networks: a specific case of multilayer networks representing different types of communication between the same vertices on multiple layers. In this article, we use the terms multilayer and multiplex networks interchangeably.
- Community: communities, clusters, or modules are groups of vertices that are more likely to be connected to each other than to vertices outside the group.
- Shared community: each layer of a multiplex network has its own set of communities, while a shared community structure describes the community structure across all layers of the multiplex network.
- Multiplex community detection: the process of finding a shared community structure that takes into account the links among vertices in all layers of the network.

2.2. The MOEA/D-TS

A multi-objective optimization problem in its general form is modeled as:

$$\min F(x) = (f_1(x), f_2(x), \ldots, f_m(x))^T \quad \text{subject to } x \in \Omega \tag{1}$$

where the integer-valued vector x is called the decision vector and Ω is the feasible region in the decision space. For $x_A, x_B \in \Omega$, $x_A$ dominates $x_B$ ($x_A \prec x_B$) if and only if $f_i(x_A) \le f_i(x_B)$ for every $i \in \{1, 2, \ldots, m\}$ and $f_j(x_A) < f_j(x_B)$ for at least one index $j \in \{1, 2, \ldots, m\}$. A point $x^* \in \Omega$ is Pareto optimal if no $x \in \Omega$ dominates $x^*$. The set of all such non-dominated solutions, mapped into objective space, is called the Pareto Front (PF). The general strategy of MOEA/D (Zhang & Li, 2007) is to decompose the MOP into N scalar optimization subproblems and solve them simultaneously. Hence, the approximation of the PF is decomposed into several single-objective optimization subproblems, and MOEA/D uses information from the neighboring subproblems (i.e., scalar aggregation functions) to optimize each one. Several approaches are available for constructing aggregation functions, the most popular being the Tchebycheff and the weighted sum approaches (Zhang & Li, 2007); we use the Tchebycheff approach in our method. Let $\lambda = (\lambda_1, \ldots, \lambda_m)^T$ be a weight vector and $z^* = (z_1^*, \ldots, z_m^*)^T$ the ideal vector, where $z_i^* = \min\{f_i(x) \mid x \in \Omega\}$ for $i = 1, \ldots, m$. Using the Tchebycheff approach, each scalar subproblem is defined as:

$$\min\ g^{te}(x \mid \lambda, z^*) = \max_{1 \le i \le m} \lambda_i \, |f_i(x) - z_i^*| \quad \text{subject to } x \in \Omega \tag{2}$$

where Ω is the feasible region in the decision space. For each Pareto optimal solution $x^*$ of (1), there exists a weight vector λ such that $x^*$ is an optimal solution of (2). In MOEA/D, all the subproblems are optimized simultaneously in a single run. The neighborhood of the weight vector $\lambda^i$ is the set of its several closest weight vectors in $\{\lambda^1, \ldots, \lambda^N\}$. The algorithm receives the MOP as input and produces an External Population (EP) containing the non-dominated solutions found during the search. The other inputs are the crossover probability, Pc; the mutation probability, Pm; the number of subproblems, N; a uniform spread of N weight vectors $\lambda^1, \ldots, \lambda^N$; and the number of weight vectors in the neighborhood of each weight vector, T. Algorithm 1 shows the overall pseudocode of MOEA/D.

Algorithm 1 The MOEA/D general framework
Step 1 Initialization
- Set EP = ∅ and gen = 0.
- Generate an initial population $P_0 = \{x^1, \ldots, x^N\}$ and initialize $z = (z_1, \ldots, z_m)^T$ by setting $z_i$ to the best value of $f_i$ found in the initial population. Set $FV^i = F(x^i)$.
- Compute the Euclidean distance between every pair of weight vectors and find the T closest weight vectors to each weight vector. For each i = 1, …, N, set $B(i) = \{i_1, \ldots, i_T\}$, where $\lambda^{i_1}, \ldots, \lambda^{i_T}$ are the T closest weight vectors to $\lambda^i$.
Step 2 Updating: For i = 1, …, N do
1. Reproduction: randomly select two indexes k and l from B(i), and generate a new solution y from $x^k$ and $x^l$ using appropriate genetic operators.
2. Mutation: apply a mutation operator to y to produce y′.
3. Updating of z: for each j = 1, …, m, if $f_j(y') < z_j$, then set $z_j = f_j(y')$.
4. Updating of neighbors: for each index j ∈ B(i), if $g^{te}(y' \mid \lambda^j, z) \le g^{te}(x^j \mid \lambda^j, z)$, then set $x^j = y'$ and $FV^j = F(y')$.
5. Updating of EP:
   - Remove from EP all vectors dominated by F(y′).
   - If no vector in EP dominates F(y′), add F(y′) to EP.
6. Replacement: use a binary tournament replacement strategy.
Step 3 Stopping criteria: if gen ≥ genmax, stop and output EP. Otherwise, set gen = gen + 1 and go to Step 2.
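The problem-independent parts of Algorithm 1 can be sketched in Python for a bi-objective minimization problem: evenly spread weight vectors, the neighborhoods B(i) built from Euclidean distances, the Tchebycheff function of Eq. (2), Pareto dominance, and the EP update of Step 2.5. This is a minimal sketch under our own assumptions, not the authors' implementation: the weight generator and all function names are hypothetical, and the genetic and mutation operators are omitted because they depend on the solution encoding.

```python
import math


def uniform_weights(n):
    """Evenly spread weight vectors (lambda_1, lambda_2) for two objectives, n >= 2."""
    return [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]


def neighbourhoods(weights, t):
    """B(i): indexes of the T weight vectors closest to weights[i] (includes i)."""
    return [
        sorted(range(len(weights)), key=lambda j: math.dist(w, weights[j]))[:t]
        for w in weights
    ]


def tchebycheff(fx, lam, z):
    """g^te(x | lambda, z) = max_j lambda_j * |f_j(x) - z_j| (Eq. 2)."""
    return max(l * abs(f - zj) for f, l, zj in zip(fx, lam, z))


def dominates(fa, fb):
    """True if objective vector fa Pareto-dominates fb under minimization."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))


def update_ep(ep, fy):
    """Step 2.5: drop vectors dominated by F(y'); add F(y') if nothing dominates it."""
    kept = [fv for fv in ep if not dominates(fy, fv)]
    if not any(dominates(fv, fy) for fv in kept):
        kept.append(fy)
    return kept
```

Because the objectives used in this paper (modularity and NMI) are maximized, they would enter these minimization routines negated, for example F(x) = (−Q, −NMI).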

In the initialization step, B(i) = {i1, …, iT} holds the indexes of the T closest weight vectors to $\lambda^i$, where proximity is measured by the Euclidean distance. The closest vector to $\lambda^i$ is therefore $\lambda^i$ itself, so i ∈ B(i). The j-th subproblem is a neighbor of the i-th subproblem if j ∈ B(i). In Step 2, during the i-th pass of the loop, the algorithm considers the T neighboring subproblems of the i-th subproblem. Since $x^k$ and $x^l$ in Step 2.1 are the current best solutions to neighbors of the i-th subproblem, their offspring y is likely to be a good solution to the i-th subproblem. In Step 2.2, a problem-specific mutation is applied to the offspring y, yielding a feasible solution y′, ideally with a lower objective value for the neighbors of the i-th subproblem. For each neighbor j of the i-th subproblem, $x^j$ is replaced with y′ in Step 2.4 if y′ is better than $x^j$. Because finding the exact reference point z* is often very time-consuming, z is used in its place. The EP initialized in Step 1 is updated with the newly generated solution y′ in Step 2.5.

The second phase of MOEA/D-TS is based on Tabu Search (TS) (Glover, 1995), whose general pseudocode is shown in Algorithm 2 (Mei, Tang, & Yao, 2011). The global search performed by MOEA/D in the first phase yields a PF of non-dominated solutions, which serve as starting points for the Improved Diversificator Tabu Search (IDTS) (Lotfi & Karimi, 2017). In the first step of the IDTS local search, the two consecutive points on the Pareto front that are farthest apart (SL1 and SL2) are selected. Then the best solution Cm in the hatched dominant zone is determined around the middle point of SL1 and SL2 (their middle cost vector). In the second step, IDTS continues between SL1 and Cm (finding a new point Cm1) and between Cm and SL2 (finding a new point Cm2) to explore the best solutions in the specified dominant regions (Lotfi & Karimi, 2017). Fig. 1 illustrates the IDTS local search in a bi-objective problem space. Compared with the simple DTS (Mei et al., 2011), this method reduces the unexplored areas of the problem space and distributes the resulting solutions uniformly on the PF (Lotfi & Karimi, 2017).

Algorithm 2 The TS general framework
Step 1: Choose an initial solution i in the search space S. Set i* = i and k = 0.
Step 2: Set k = k + 1 and generate a subset of solutions in N(i, k), the neighborhood of the current solution i at iteration k, such that:
- moves in the tabu list are prohibited;
- the aspiration criterion a(i, m) is applied.
Step 3: Find the best solution i′ in N(i, k) and set i = i′.
Step 4: If f(i) ≤ f(i*), set i* = i.
Step 5: Update the tabu list T′ and the aspiration criterion.
Step 6: Stop if a termination condition is reached; otherwise, return to Step 2.
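Algorithm 2 leaves the neighborhood, the tabu attributes, and the termination condition problem-specific. The following is a generic sketch of plain TS (not the IDTS variant), assuming hashable move labels, a fixed-length tabu list whose `tenure` is a hypothetical parameter, an aspiration criterion that admits a tabu move only if it beats the best solution so far, and an iteration budget as the termination condition.

```python
from collections import deque


def tabu_search(initial, neighbours, f, tenure=5, max_iter=100):
    """Minimize f starting from `initial`, following the steps of Algorithm 2.

    neighbours(s) returns (move, candidate) pairs; a move stays tabu for
    `tenure` iterations unless the candidate beats the best solution so far
    (aspiration criterion).
    """
    current = best = initial                 # i and i*
    tabu = deque(maxlen=tenure)              # tabu list T' of recent moves
    for _ in range(max_iter):
        admissible = [
            (move, cand) for move, cand in neighbours(current)
            if move not in tabu or f(cand) < f(best)   # aspiration criterion
        ]
        if not admissible:
            break                            # every neighbour is tabu
        # Step 3: move to the best admissible neighbour, even if it is worse.
        move, current = min(admissible, key=lambda mc: f(mc[1]))
        tabu.append(move)
        if f(current) <= f(best):            # Step 4
            best = current
    return best
```

For example, minimizing f(x) = (x − 7)² over the integers, with moves x ± 1 and the new value used as the tabu attribute, returns 7 from a start at 0; after reaching 7, the tabu list stops the search from stepping straight back, so it keeps exploring while remembering the best point.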

To update the old population with the solutions discovered by IDTS, the algorithm uses a binary tournament strategy. The updated population is then used as the initial set of solutions in the next generation of MOEA/D.

[Figure: bi-objective space with axes f1 and f2, showing the front points SL1 and SL2, the middle point Cm, and the intermediate points Cm1 and Cm2.]
Fig. 1. The search space for the IDTS (Lotfi & Karimi, 2017)
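The target points of one IDTS round can be computed directly from a front. The sketch below assumes a bi-objective front sorted by f1 and Euclidean distance in objective space as the gap measure (both are our assumptions). It only produces the target cost vectors Cm, Cm1, and Cm2; in IDTS proper, a tabu search is run toward each target, and Cm1 and Cm2 are taken relative to the solution actually found for Cm, which is not modeled here. The function names are hypothetical.

```python
import math


def widest_gap(front):
    """Consecutive pair (SL1, SL2) on a 2-D front with the largest distance."""
    pts = sorted(front)  # sort by f1 so that neighbours on the front are adjacent
    return max(zip(pts, pts[1:]), key=lambda pair: math.dist(*pair))


def midpoint(a, b):
    """Middle cost vector of two objective vectors."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def idts_targets(front):
    """Target cost vectors Cm, Cm1, Cm2 for one IDTS round."""
    sl1, sl2 = widest_gap(front)
    cm = midpoint(sl1, sl2)
    return cm, midpoint(sl1, cm), midpoint(cm, sl2)
```

For the front {(0, 10), (1, 6), (9, 0)}, the widest gap lies between (1, 6) and (9, 0), so the targets are Cm = (5, 3), Cm1 = (3, 4.5), and Cm2 = (7, 1.5); this is the sparsely covered region that IDTS is meant to fill.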

In the final step of MOEA/D-TS, a single solution must be selected from the PF. A standard approach is to pick the solution most similar to the so-called ideal point, which represents a solution that simultaneously optimizes all the objectives under consideration by taking the best value of each objective. In our method, the gray relational coefficient (GRC) (Wang & Rangaiah, 2017) is used to measure the similarity between each candidate solution and the ideal point. This method requires no weights or other input from the user. The reader is referred to Wang & Rangaiah (2017) for details of its formulation.

2.3. Community detection in single-layer networks

Among the various methods proposed for community detection in complex networks (Wei Liu, Jiang, Pellegrini, & Wang, 2016; Zou, Chen, Li, Lu, & Lin, 2017), Modularity Optimization Techniques (MOT) are the most commonly used (Said et al., 2018; Xu et al., 2015). The modularity measure quantifies the quality of each solution: better community structures obtain higher modularity values, with Q lying in [−1/2, 1) (Newman & Girvan, 2004). The modularity Q is defined as:

$$Q = \sum_{s=1}^{c} \left[ \frac{l_s}{P} - \left( \frac{d_s}{2P} \right)^2 \right] \tag{3}$$

where c is the total number of partitions (communities), $l_s$ is the number of links within partition s, P is the total number of links in the network, and $d_s$ is the total degree of all the nodes in partition s. Because the number of possible partitions is huge even for small graphs, exact optimization of Q is not feasible in reasonable computational time (Ju, Zhang, Ding, Zeng, & Zhang, 2016); indeed, modularity optimization has been proved to be NP-complete. However, several algorithms and strategies can find a near-optimal modularity in reasonable time (Ju et al., 2016). The greedy agglomerative hierarchical clustering approach of Girvan & Newman (2002) sequentially merges vertices into larger communities so as to increase modularity. The method introduced by Zhou, Lü, Yang, Wang, & Kong (2015) applies a coalition-formation game-theoretic approach to identify overlapping and hierarchical communities. Xin, Xie, & Yang (2016) proposed a Random Walk Sampling (RWS) method for overlapping community detection that costs less than traditional methods; it uses random walks to find the closest friends of each node. Srinivas & Rajendran (2019) proposed an integer linear programming model to identify the most influential nodes and detect community structure in real-life networks.
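Eq. (3) can be evaluated in one pass over the edge list. A minimal Python sketch for an undirected, unweighted graph, assuming the partition is given as a {node: community} dictionary (our choice of representation; the function name is hypothetical):

```python
from collections import defaultdict


def modularity(edges, community):
    """Q = sum_s [ l_s / P - (d_s / (2P))^2 ] (Eq. 3) for an undirected,
    unweighted graph given as an edge list and a {node: community} mapping."""
    p = len(edges)                 # P: total number of links
    inner = defaultdict(int)       # l_s: links inside community s
    degree = defaultdict(int)      # d_s: summed degree of the nodes in s
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inner[community[u]] += 1
    return sum(inner[s] / p - (degree[s] / (2 * p)) ** 2 for s in degree)
```

For two triangles joined by a single edge (P = 7), the two-triangle partition gives Q = 2(3/7 − (7/14)²) = 5/14 ≈ 0.357, whereas putting all six nodes in one community gives Q = 1 − 1² = 0.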

Heuristic and meta-heuristic methods, such as swarm intelligence, simulated annealing, and evolutionary algorithms (Holland & others, 1992), are also efficient approaches in this area. Tasgin, Herdagdelen, & Bingol (2007) introduced a GA-based method that randomly generates the IP and uses modularity as the fitness function. Gong, Fu, Jiao, & Du (2011) proposed a community detection algorithm for small-scale networks that combines a genetic algorithm (GA) with hill climbing. Xu, Xu, & Zhang (2015) developed an algorithm based on backbone degree and expansion for community detection in real social networks. Gong et al. (2012) proposed MOEA/D-Net for the community detection problem; in their method, two conflicting objectives, the negative ratio association and the ratio cut, are optimized simultaneously by MOEA/D. Shang, Bai, Jiao, & Jin (2013) addressed the same problem with a method that combines GA and simulated annealing. Wu & Pan (2015) proposed a Memetic Algorithm (MA) for multi-objective community detection. Zou et al. (2017) introduced the Multi-Objective Discrete Backtracking Search Algorithm (MODBSA/D), which generates the IP using a label-propagation-based operator and a greedy selection approach to improve convergence.

2.4. Community detection in multiplex networks

Current community detection research focuses mainly on single-layer networks (Tang & Liu, 2010; Xin et al., 2016). However, many real-world situations are more appropriately modeled as multilayer networks (Kim & Lee, 2015). Multiplex community detection is a complex problem because the community structure of a multiplex network is not as explicit as that of a single-layer network (Ma et al., 2018). A single-layer network is generally modeled as a graph G = {V, E}, where V and E are the sets of nodes and links, respectively. To formulate a multiplex network, we follow the model described in Kivelä et al. (2014). Let $L = \{L_1, L_2, \ldots, L_l\}$ be a set of layers and V a set of entities. For each entity in V and each layer in L, we need to indicate whether the entity is present in that layer. $V_L \subseteq V \times L$ denotes the set of entity-layer combinations in which an entity is present in the corresponding layer, and $E_L \subseteq V_L \times V_L$ is the set of undirected links between such entity-layer pairs. We denote by $G_L = (V_L, E_L, V, L)$ the multilayer network graph with node set V. For every layer $L_i \in L$, $V_i = \{v \in V \mid (v, L_i) \in V_L\}$ is the set of nodes in the graph of $L_i$, and $E_i \subseteq V_i \times V_i$ is the set of edges in $L_i$.
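The entity-layer formalism above maps onto a small data structure. The sketch below is illustrative only: the class and method names are hypothetical, only intra-layer links are stored (the general model also allows links between different layers), and entity identifiers are assumed to be sortable.

```python
from dataclasses import dataclass, field


@dataclass
class MultilayerGraph:
    """G_L = (V_L, E_L, V, L) with V_L ⊆ V × L and undirected E_L ⊆ V_L × V_L."""
    nodes: set                                     # V: entities
    layers: list                                   # L: layer labels
    node_layers: set = field(default_factory=set)  # V_L: (entity, layer) pairs
    edges: set = field(default_factory=set)        # E_L: frozensets of two pairs

    def add_node(self, v, layer):
        """Mark entity v as present in `layer`."""
        self.node_layers.add((v, layer))

    def add_edge(self, u, v, layer):
        """Add an intra-layer link between entities u and v on `layer`."""
        a, b = (u, layer), (v, layer)
        self.node_layers.update((a, b))
        self.edges.add(frozenset((a, b)))

    def layer_graph(self, layer):
        """Return (V_i, E_i) for one layer L_i."""
        v_i = {v for v, l in self.node_layers if l == layer}
        e_i = {tuple(sorted(v for v, _ in e)) for e in self.edges
               if all(l == layer for _, l in e)}
        return v_i, e_i

    def is_multiplex(self):
        """Multiplex case: every entity of V is present in every layer."""
        return all((v, l) in self.node_layers for v in self.nodes for l in self.layers)
```

A network becomes multiplex in the sense defined next only once every entity appears in every layer, which `is_multiplex` checks directly.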

In the particular case in which all layers of a multilayer network contain the same set of nodes, the network is called a multiplex network (Ma et al., 2018). Fig. 2 gives a schematic illustration of a multiplex network with three layers and eight nodes; all layers share the same nodes but represent different types of communication. To represent the features of the layers of a multiplex network, Tang, Wang, & Liu (2009) employ structural feature extraction and cross-dimension integration; to find a community structure over the multilayer graph, their method applies k-means clustering to the computed embedding. De Domenico, Lancichinetti, Arenas, & Rosvall (2015) defined a new community structure in which modular flows are detected using a proposed network flow compression method. Another line of research focuses on query-dependent community detection methods that find local communities (Interdonato et al., 2017). Hmimida & Kanawati (2015) defined local communities and proposed a seed-centric algorithm, Mux-Licod, for community detection in multiplex networks. Zakrzewska & Bader (2015) proposed an approach that uses a seed set expansion procedure to update communities incrementally as the network changes. Yixuan Li, He, Bindel, & Hopcroft (2015) introduced the Lemon algorithm for local community discovery using truncated random walks and approximate invariant subspaces. Kanawati (2015) combined multiple local community functions to find node-centric communities and evaluated the impact of these methods. Despite the acceptable results of these studies, all of them neglect the global analysis of multilayer networks and the comprehensive topological connections of multiplex networks. Hence, high-quality community detection in high-dimensional multiplex networks becomes more complicated and challenging (Wenfeng Liu et al., 2017).

Fig. 2. An example of a multiplex network with three layers

Generally, existing approaches for community detection in multiplex networks fall into two main classes (Hmimida & Kanawati, 2015): 1) methods that use existing single-layer community detection algorithms, and 2) methods that extend existing algorithms to multiplex networks. The following subsections review current research in these two classes.

Methods using existing single-layer community detection algorithms
In these methods, the principal strategy is to transform the multiplex community detection problem into a similar problem on a single-layer network. There are two ways to achieve this: layer aggregation and ensemble clustering.
- Layer aggregation: a single-layer version of the given multilayer network is constructed, and community detection is then performed on the aggregated network using an existing algorithm such as InfoMap (Rosvall & Bergstrom, 2008). The aggregated clustering can be computed in different ways. The simple method proposed by Seifi (2012) prunes all links whose weight is below a given threshold λ ∈ [0, 1] in the obtained consensus graph; the connected components of the result form the aggregated partition. The main challenge of this approach is choosing an appropriate value for the threshold λ (Hmimida & Kanawati, 2015).

- Ensemble clustering: The second approach first determines the community structure of each layer of the multiplex network separately and then combines all obtained partitions using ensemble clustering approaches (Berlingerio, Pinelli, & Calabrese, 2013). Among the various ensemble clustering approaches, a popular one combines all obtained partitions in the form of a consensus graph (Fern & Brodley, 2004). The consensus graph Gcons, which is not necessarily connected, has the same node set as the original graph, and an edge links two nodes i, j ∈ V if they are placed in the same cluster in at least one layer. The weight of this link is the number of times nodes i and j are placed in the same cluster. In the work of Lancichinetti & Fortunato (2012), the well-known BGLL algorithm was employed to detect the communities of each layer. Then, after generating a consensus community matrix, the community structure of the whole network was obtained using BGLL.
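A minimal sketch of the consensus graph described above, combined with the threshold pruning of (Seifi, 2012), might look as follows. It assumes each layer's partition is given as a list of cluster labels indexed by node, and it divides each link weight by the number of layers so that a threshold λ ∈ [0, 1] is meaningful; this normalization and the function names are our own assumptions, not the cited authors' code.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def consensus_graph(partitions: Sequence[Sequence[int]]) -> Dict[Tuple[int, int], float]:
    """Weight of link (i, j), i < j: fraction of layers in which i and j share a cluster."""
    n_layers = len(partitions)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for labels in partitions:
        clusters: Dict[int, List[int]] = defaultdict(list)
        for node, c in enumerate(labels):
            clusters[c].append(node)
        for members in clusters.values():
            for i, j in combinations(members, 2):  # members are in ascending order
                counts[(i, j)] += 1
    return {edge: c / n_layers for edge, c in counts.items()}


def threshold_partition(n: int, weights: Dict[Tuple[int, int], float], lam: float) -> List[int]:
    """Prune links with weight < lam; each remaining connected component is a community."""
    adj: Dict[int, List[int]] = defaultdict(list)
    for (i, j), w in weights.items():
        if w >= lam:
            adj[i].append(j)
            adj[j].append(i)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]  # iterative depth-first search over the pruned graph
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels
```

For three layers partitioned as [0, 0, 1, 1], [0, 0, 1, 1], and [0, 1, 1, 1], a threshold of 0.5 keeps only the links (0, 1) and (2, 3) and yields the aggregated partition [0, 0, 1, 1]; lowering λ to 0.3 merges everything into one community, which illustrates why choosing λ is the main difficulty.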



Methods extending existing algorithms for multiplex networks

As mentioned earlier, layer aggregation and ensemble clustering lose some existing topological connections and add spurious ones (Lancichinetti & Fortunato, 2012; Tang & Liu, 2010), which may result in a structure that does not reflect the desired shared community structure of the original multiplex network. In contrast, community-definition-based optimization approaches first evaluate the quality of collective communities using an index and then use a heuristic or meta-heuristic approach to optimize this index and determine the communities (Ma et al., 2018). Owing to the influential role of modularity optimization in single-layer clustering, researchers have proposed a generalized version of modularity for multiplex networks (Hmimida & Kanawati, 2015). The multiplex modularity (QM) was proposed by Mucha et al. (2010) to evaluate the quality of communities in multiplex networks; in their method, the heuristic clustering algorithm GenLouvain was also used for multiplex community detection. The multiplex modularity is computed as follows (Ma, Gong, Liu, Cai, & Jiao, 2014):

$$Q_M = \frac{1}{2u}\sum_{ijrs}\left[\left(A_{ij}^{[r]} - \beta_r\,\frac{k_i^{[r]}\,k_j^{[r]}}{2m^{[r]}}\right)\delta(r,s) + \delta(i,j)\right]\delta(x_i, x_j) \qquad (4)$$

where the normalization factor u represents the total number of links, $A_{ij}^{[r]}$ is the adjacency matrix of layer r, $k_i^{[r]}$ is the degree of node vi in layer r, $m^{[r]}$ is the number of links in layer r, and xi is the cluster number of node vi. The binary delta function is δ(i, j) = 1 if i = j and δ(i, j) = 0 otherwise, and the resolution parameter β_r is set to 1 for most multiplex community detection problems (Mucha et al., 2010). Amelio & Pizzuti (2014) modeled community detection in multiplex networks as a multi-objective optimization problem and proposed MultiMOGA to optimize two objective functions (i.e., Q and NMI). A multilayer random walk was the basis of the community detection method proposed by Kuncheva & Montana (2015). In their strategy, a random walk is implemented for each layer, and the per-layer transition probabilities are determined using a dissimilarity measure between nodes. The main weakness of many community-definition-based optimization approaches is that they neglect the comprehensive topological connections of multiplex networks and may become trapped in locally optimal community divisions. In recent years, EAs have been used to solve community detection problems in single-layer networks. Extensive studies have demonstrated the effectiveness of MOEA/D for multi-objective optimization problems (MOPs) (Zhang & Li, 2007; Zhang et al., 2009) and, in particular, for community detection (Gog, Dumitrescu, & Hirsbrunner, 2007; Gong, Ma, Zhang, & Jiao, 2012). In this paper, we employ a hybrid evolutionary algorithm, an extension of our previous work MOEA/D-TS (Lotfi & Karimi, 2017), for community detection in multiplex networks. The extended version of MOEA/D-TS differs from the previous version in the way the initial population (IP) is generated. The following section gives more details about the proposed method.
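Equation (4) can be evaluated directly for a candidate partition. The minimal sketch below assumes that the same cluster labels x are used in every layer and that β_r takes one value for all layers. As our own reading, following Mucha et al. (2010), it treats the bare δ(i, j) term as an interlayer coupling of uniform weight ω applied only when r ≠ s, and takes 2u as the total intralayer degree plus these couplings. The function name and the `omega` parameter are hypothetical.

```python
from typing import Sequence


def multiplex_modularity(layers: Sequence[Sequence[Sequence[float]]],
                         x: Sequence[int],
                         omega: float = 0.0,
                         beta: float = 1.0) -> float:
    """Multiplex modularity Q_M for a partition x shared by all layers.

    layers[r][i][j] is A_ij^[r] (symmetric); beta is used as beta_r for every
    layer; omega is an assumed uniform interlayer coupling for r != s.
    """
    n, l = len(x), len(layers)
    intra = 0.0   # sum of (A_ij^[r] - beta * k_i k_j / 2m^[r]) over same-cluster pairs
    two_u = 0.0   # 2u: total intralayer degree plus interlayer couplings
    for a in layers:
        k = [sum(row) for row in a]   # degrees k_i^[r]
        two_m = sum(k)                # 2 m^[r]
        two_u += two_m
        if two_m == 0:
            continue                  # an empty layer contributes nothing
        for i in range(n):
            for j in range(n):
                if x[i] == x[j]:      # delta(x_i, x_j)
                    intra += a[i][j] - beta * k[i] * k[j] / two_m
    # With a shared partition, every node keeps its cluster in all l(l-1) ordered layer pairs.
    inter = n * l * (l - 1) * omega
    two_u += inter
    return (intra + inter) / two_u if two_u > 0 else 0.0
```

With a single layer and omega = 0, this reduces to ordinary single-layer modularity. Because 2u does not depend on the partition, a positive omega only shifts and rescales the score and does not change which shared partition ranks highest.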

3. Proposed method

The MOEA/D-TS is a hybrid evolutionary algorithm that combines the exploration power of a population-based global search (MOEA/D) with the exploitation ability of the individual-based local search IDTS. The global search of MOEA/D tries to explore all promising regions of the solution space using network-specific knowledge, and the improved diversificator tabu search (IDTS) accelerates the convergence of the algorithm toward optimal solutions. MOEA/D has lower computational complexity than other EAs (such as MOGLS and NSGA-II) and can produce high-quality solutions with simple decomposition methods (Zhang & Li, 2007). IDTS, in turn, is a local-search-based metaheuristic that refines an individual using problem-specific information to find accurate solutions in an acceptable time. Many existing works use hybrid algorithms for community detection in single-layer networks (Gong, Fu, Jiao, & Du, 2011; Ma, Gong, Liu, Cai, & Jiao, 2014; Wu & Pan, 2015), but these are not applicable to multiplex networks. In this work, we employ MOEA/D-TS, which combines global and local search, as a qualified candidate for solving community detection problems in multiplex networks. The principal motivation for employing MOEA/D-TS in multiplex community detection is the low computational complexity of MOEA/D and its considerable capability to generate non-dominated solutions in MOPs, together with the comprehensive local search of IDTS, which reduces unexplored areas of the problem space and speeds up convergence. EAs are effective methods for solving nonlinear optimization problems, but their convergence may require a large number of iterations. This situation is worse in problems whose solution space is hard to explore (Said et al., 2018). Standard EAs usually begin with a random IP, which may drive the algorithm toward a local optimum and require a long process to reach a near-global optimum.
An approach that generates a high-quality initial population can reach a near-global optimum in fewer iterations. To avoid the limitations of random IP initialization and to generate better solutions, in this paper we propose an improved version of our previous hybrid method, MOEA/D-TS, which uses the well-known Clustering Coefficient (CC) measure to produce the IP in the first step of community detection. Generating the IP using CC creates a population in which communities are formed by different sets of densely connected nodes (Said et al., 2018). Another advantage of using CC is the recognition of local bridges, which connect different communities of a network. Researchers have used different EAs for community detection, but the proposed method is novel in terms of IP generation, which improves the efficiency and accuracy of the algorithm.

Given a multiplex network GL with l layers, L = {L1, L2, ..., Ll}, our problem is to detect the shared community structure of the network by optimizing an objective function. In this paper, we adopt the idea presented in (Amelio & Pizzuti, 2014) and formulate multiplex community detection as a MOP whose objectives are the modularity Q of each layer Li and the NMI between the community structures obtained for each pair of consecutive layers Li and Li−1. We then employ the modified MOEA/D-TS to optimize this objective function iteratively. In simple terms, we first determine the community structure of the first layer and then continue the community detection process for the other layers by maximizing the modularity Q of each layer and the NMI between the current and previous layers. Finally, the community structure obtained for the last layer Ll is taken as the shared community structure of the whole multiplex network. Fig. 3 illustrates the main framework of the proposed method for the multiplex community detection problem.

[Fig. 3 flowchart: start with i = 1; do community detection on L1 using MOEA/D-TS with Max F(L1) = Maximize (Modularity Density); set i = i + 1 and do community detection on Li using MOEA/D-TS based on Li-1 with Max F(Li) = Maximize (Q(Li), NMI(Li, Li-1)); if Li was the final layer, set the best obtained community structure of Li as the optimum community structure of the multiplex network L and stop; otherwise repeat.]

Fig. 3. The flowchart of the proposed iterative process for community detection in multiplex networks
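The iterative framework of Fig. 3 reduces to a simple loop over the ordered layers. In the sketch below, `detect_first` and `detect_next` are hypothetical callables standing in for the MOEA/D-TS runs: the first optimizes the first-layer objective, and the second optimizes Q(Li) and NMI(Li, Li−1) given the previous partition. The layer order is assumed to be fixed beforehand.

```python
from typing import Callable, List, Sequence, TypeVar

Layer = TypeVar("Layer")
Partition = List[int]


def shared_community_structure(
    layers: Sequence[Layer],
    detect_first: Callable[[Layer], Partition],
    detect_next: Callable[[Layer, Partition], Partition],
) -> Partition:
    """Run community detection layer by layer; the last partition is the shared one."""
    if not layers:
        raise ValueError("a multiplex network needs at least one layer")
    partition = detect_first(layers[0])            # Max F(L1)
    for layer in layers[1:]:
        partition = detect_next(layer, partition)  # Max F(Li) = (Q(Li), NMI(Li, Li-1))
    return partition
```

The design keeps only the previous layer's partition in memory, which matches the pairwise NMI(Li, Li−1) objective; any dependence on earlier layers is carried forward through that partition.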

3.1. Multiplex community detection using an improved MOEA/D-TS

In this research, we have improved our previous method, MOEA/D-TS, with CC capabilities for solving the multiplex community detection problem. MOEA/D is known in recent literature as an effective method for different multi-objective optimization problems (Zhang & Li, 2007; Zhang, Liu, Tsang, & Virginas, 2009). The algorithm is based on the idea that, under certain conditions, a Pareto-optimal solution of a MOP can be the optimal solution of a single-objective (scalar) optimization problem (Gog et al., 2007). It therefore decomposes the main multi-objective problem into several single-objective optimization subproblems and solves them simultaneously. For the MOP, IDTS also tries to reduce unexplored areas of the problem space and to distribute the resulting solutions uniformly on the Pareto front (PF). In the following, we explain the different steps of MOEA/D-TS in more detail.
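The decomposition step can be illustrated with a short sketch. The text does not state which scalarizing function MOEA/D-TS uses, so the example assumes the Tchebycheff approach, one of the decomposition methods described by Zhang & Li (2007), for two objectives. The function names and the neighborhood size `t` are hypothetical.

```python
from typing import List, Sequence, Tuple


def weight_vectors(n: int) -> List[Tuple[float, float]]:
    """Evenly spread weight vectors for a bi-objective problem (n >= 2)."""
    return [(i / (n - 1), 1.0 - i / (n - 1)) for i in range(n)]


def neighborhoods(weights: Sequence[Tuple[float, float]], t: int) -> List[List[int]]:
    """For each subproblem, the indices of the t closest weight vectors (Euclidean)."""
    result = []
    for w in weights:
        order = sorted(
            range(len(weights)),
            key=lambda j: (w[0] - weights[j][0]) ** 2 + (w[1] - weights[j][1]) ** 2,
        )
        result.append(order[:t])
    return result


def tchebycheff(f: Sequence[float], lam: Sequence[float], z: Sequence[float]) -> float:
    """Tchebycheff value max_k lam_k * |f_k - z_k| of objective vector f.

    Each subproblem minimizes this value; z is the reference point holding the
    best value found so far for each objective.
    """
    return max(l * abs(fk - zk) for l, fk, zk in zip(lam, f, z))
```

For a maximized pair such as (Q, NMI), z would hold the largest value of each objective seen so far, and a smaller Tchebycheff value means a solution lies closer to z along that subproblem's weight direction. Neighboring subproblems share offspring, which is what lets MOEA/D solve them simultaneously.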

Encoding

In an evolutionary algorithm, a finite set of chromosomes, P = {P1, P2,…, Pn}, forms the IP. A locus-based representation was used in CC-GA (Marra, Emrouznejad, Ho, & Edwards, 2015), but it has shortcomings such as producing nodes without allocated labels. In the proposed method, a string-based representation with two main advantages is employed. First, this representation assigns an integer label to each node directly, so a separate label-assignment process is not needed. Second, reproduction operators on string-based representations can be implemented more efficiently (Wenfeng Liu et al., 2017). In the encoding step of MOEA/D-TS for a network with N vertices, a chromosome is an integer string S = {S1, S2,…, SN}, in which Si is the cluster number of node i, taking a value in the range [1, N]. A further advantage of this representation is that the decoding step automatically determines the total number of communities without any prior information about it (Gong et al., 2012). Fig. 4(a) illustrates a network divided into three clusters, and Fig. 4(b) shows one of its possible string-based encodings.
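Decoding such a string is straightforward. The sketch below, with a hypothetical function name, groups nodes (numbered from 1, as in the text) by their label Si, so the number of communities is simply the number of distinct labels and never has to be supplied in advance.

```python
from typing import Dict, List, Sequence


def decode(chromosome: Sequence[int]) -> List[List[int]]:
    """Group nodes (numbered from 1) by their cluster label S_i."""
    groups: Dict[int, List[int]] = {}
    for node, label in enumerate(chromosome, start=1):
        groups.setdefault(label, []).append(node)
    return list(groups.values())  # communities in order of first appearance
```

For example, `decode([2, 2, 1, 1, 3])` returns three communities, `[[1, 2], [3, 4], [5]]`.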

Clustering coefficient for the generation of an initial population

The generation of the IP is a principal phase in EAs: it plays a significant role in the overall computational time of the algorithm and directly affects the quality of the results obtained in subsequent steps. Many EA-based community detection algorithms rely on random generation of the initial population. This usually increases the probability of connecting distant nodes in a community detection problem and may lead to a population with low fitness values. To generate more accurate solutions without this limitation, and inspired by the idea used in (Said et al., 2018), we utilize CC in the initial step of the modified MOEA/D-TS. This measure considers two random neighbors of a node and calculates the probability that they are connected. The clustering coefficient of a node vi with degree Ki is computed as:

$$C_{v_i} = \frac{2\,CE_i}{K_i\,(K_i - 1)} \qquad (5)$$

where CEi is the total number of edges connecting the neighbors of node vi (excluding vi itself), and $C_{v_i}$ always lies in the range [0, 1]. A node vi with no connections among its neighbors has $C_{v_i} = 0$, and if all neighbors of vi are connected to each other, $C_{v_i} = 1$. The clustering coefficient of a vertex vi (of degree at least 2) is the probability that two randomly chosen neighbors of vi are connected, and it is computed by dividing the number of triangles containing vi by the number of possible edges between its neighbors (Latapy, 2008). Triangle finding can be solved in $O(n^{\omega})$ time, where ω < 2.376 is the fast matrix product exponent (Latapy, 2008), and this can therefore be taken as the time complexity of computing CC. Note that CC is computed only once, at the initial step of MOEA/D-TS, so it adds a one-time cost and does not affect the per-generation execution time of the algorithm.
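Equation (5) translates directly into code. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets and, as a convention not stated in the text, returns 0 for nodes of degree below 2, for which Eq. (5) is undefined.

```python
from itertools import combinations
from typing import Dict, Set


def clustering_coefficient(adj: Dict[int, Set[int]], v: int) -> float:
    """C_v = 2 * CE_v / (K_v * (K_v - 1)), with CE_v the edges among v's neighbors."""
    nbrs = adj[v] - {v}            # ignore a self-loop, if present
    k = len(nbrs)
    if k < 2:
        return 0.0                 # assumed convention for degree 0 or 1
    ce = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * ce / (k * (k - 1))
```

In a triangle 0-1-2 with an extra node 3 attached to node 0, node 0 has three neighbors with one edge among them, so its coefficient is 1/3, while nodes 1 and 2 each have a coefficient of 1.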

Fig. 4. An example of string-based encoding: (a) the clusters in a network; (b) a possible string-based encoding.


According to the CC measure, which quantifies how strongly a node's neighbors are connected to each other, nodes that are well connected to each other are more likely to be placed in the same community. This is the main reason for using CC instead of choosing a random neighbor: each node is connected to the neighbor that is most likely to belong to the same community. Hence, connecting nodes to their best neighbor, the one with the highest CC, generates a better community structure (Wenfeng Liu et al., 2017). In the very first step of generating the IP, one neighbor is chosen for each node; if two nodes vi and vj are neighbors, they can be placed in the same community. If the degree of the selected node vi is more than one, the neighbor vj with the highest CC is chosen, and if two or more neighbors of vi have the same CC, one of them is selected randomly. Whenever node vi has only one neighbor vj (i.e., its degree is 1), the node is connected to that neighbor. If node vi has no neighbors (degree zero), it is placed in a random community. The CC-IP process has the following steps:

1. Randomly select a node vi.
2. Choose the neighbor vj with the highest CC:
   - If more than one neighbor has the highest CC, randomly select one of them.
   - If the degree of vi is 1, select its only neighbor.
   - If node vi is isolated, put it in a separate community.

Another advantage of using CC is the recognition of local bridges, which connect different communities of a network. A local bridge is an edge between two nodes that have no common neighbors. These bridges are natural cut points that separate the connected parts of the network, so exploiting local bridges is an efficient way to detect communities.
The reason is that the neighbors of a node attached to a bridge are not directly connected to each other, so such nodes have lower CC values than the other nodes. Thus, during CC-based IP generation, MOEA/D-TS does not connect these nodes, and as a result, the boundaries of the communities are determined. For example, in Fig. 4(a), the 12 nodes of the network are clustered into three communities, and local bridges connect nodes 1, 3, 4, 9, and 12. Since these nodes have the minimum CC value (0.3), the proposed method does not connect them and splits the network along the local bridges (Said et al., 2018).
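The CC-IP steps can be sketched as follows. The text does not say how the chosen node-neighbor links become a string-based chromosome, so the sketch assumes they are merged with a union-find structure and the resulting groups are relabelled 1..k; isolated nodes keep a community of their own, following the step list above. A small helper implementing Eq. (5) is included so the block runs on its own. All names are hypothetical.

```python
import random
from itertools import combinations
from typing import Dict, List, Set


def clustering_coefficient(adj: Dict[int, Set[int]], v: int) -> float:
    nbrs = adj[v] - {v}
    k = len(nbrs)
    if k < 2:
        return 0.0
    ce = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * ce / (k * (k - 1))


def cc_initial_individual(adj: Dict[int, Set[int]], rng: random.Random) -> List[int]:
    """One CC-IP chromosome: link each node to its highest-CC neighbor."""
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    parent = list(range(len(nodes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    cc = {v: clustering_coefficient(adj, v) for v in nodes}
    order = nodes[:]
    rng.shuffle(order)                      # step 1: visit nodes in random order
    for v in order:
        nbrs = sorted(adj[v] - {v})
        if not nbrs:
            continue                        # isolated node stays in its own community
        best = max(cc[u] for u in nbrs)     # step 2: highest-CC neighbor (degree 1 included)
        u = rng.choice([w for w in nbrs if cc[w] == best])  # random tie-break
        parent[find(index[v])] = find(index[u])
    labels: Dict[int, int] = {}
    chromosome = []
    for v in nodes:                         # string-based encoding with labels 1..k
        root = find(index[v])
        labels.setdefault(root, len(labels) + 1)
        chromosome.append(labels[root])
    return chromosome
```

On two triangles joined by a single bridge, the bridge endpoints have a CC of 1/3 while every other node has a CC of 1, so no node ever picks a neighbor across the bridge and the two triangles come out as separate communities regardless of the random seed.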

Fitness function

Following the method presented in (Amelio & Pizzuti, 2014), finding a shared community structure in a multiplex network can be viewed as the same problem in a dynamic network whose connections change over time, where each time step of the dynamic network corresponds to a layer of the multiplex network. Amelio and Pizzuti therefore extended the evolutionary clustering approach for dynamic networks to multiplex networks, so that an optimal shared community structure for a multiplex network GL can be obtained by iteratively optimizing an objective function.

Inspired by this method, we define the fitness function of our MOP for the multiplex community detection problem. Accordingly, an optimal shared community structure for a multiplex network GL with l layers, L = {L1, L2, ..., Ll}, can be obtained by iteratively optimizing both the facet quality and the sharing cost (Amelio & Pizzuti, 2014). The facet quality ensures that the community detection result of the i-th layer maximizes the quality function, while the sharing cost ensures that the clustering of the current facet agrees as much as possible with the clustering obtained for the previous i−1 layers. Finally, the community structure obtained for the last layer Ll is taken as the shared community structure of the multiplex network. The facet quality FQ, which controls the quality of community structures, is evaluated using the modularity Q (introduced in section 2). For the sharing cost SC, which evaluates the similarity between the community detection results of the current and the previous layer, we use the Normalized Mutual Information (NMI). NMI estimates the similarity between the actual partitioning of a network and the detected one. Hence, the second objective for two network partitions A = {A1, ..., A_{lA}} and B = {B1, ..., B_{lB}} is computed as:

$$NMI(A,B) = \frac{-2\sum_{i=1}^{l_A}\sum_{j=1}^{l_B} F_{ij}\,\log\left(\frac{F_{ij}\,N}{F_{i\cdot}\,F_{\cdot j}}\right)}{\sum_{i=1}^{l_A} F_{i\cdot}\,\log\left(\frac{F_{i\cdot}}{N}\right) + \sum_{j=1}^{l_B} F_{\cdot j}\,\log\left(\frac{F_{\cdot j}}{N}\right)} \qquad (6)$$

where F is the confusion matrix whose entry Fij is the number of nodes shared by community Ai of A and community Bj of B, Fi· (F·j) is the sum of the elements in row i (column j) of F, N is the number of nodes, and lA (lB) is the number of communities in partition A (B). NMI values always lie in [0, 1]: NMI = 1 indicates that the communities obtained in the two layers are identical, and NMI = 0 indicates two completely distinct community structures. In the multi-objective formulation, we take the modularity Q and NMI as our two objective functions. The problem is therefore formulated as maximizing the modularity Q of each layer Li and the NMI of each pair of layers Li and Li−1:

$$\text{Max } F(L_i) = \text{Maximize}\left(Q(L_i),\ NMI(L_i, L_{i-1})\right) \qquad (7)$$
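Equation (6) can be computed directly from two label lists. In the minimal sketch below, the confusion matrix F is built by counting co-occurring label pairs, empty cells are skipped because they contribute nothing, and, as an assumption not stated in the text, NMI is taken as 1 when both partitions consist of a single community (the denominator is then zero). The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def nmi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Normalized mutual information of two partitions given as label lists."""
    if len(a) != len(b) or not a:
        raise ValueError("partitions must label the same non-empty node set")
    n = len(a)
    f = Counter(zip(a, b))   # F_ij: nodes in community i of A and community j of B
    rows = Counter(a)        # F_i.
    cols = Counter(b)        # F_.j
    num = -2.0 * sum(fij * math.log(fij * n / (rows[i] * cols[j]))
                     for (i, j), fij in f.items())
    den = (sum(fi * math.log(fi / n) for fi in rows.values())
           + sum(fj * math.log(fj / n) for fj in cols.values()))
    if den == 0.0:
        return 1.0           # both partitions are a single community
    return num / den
```

Label values themselves do not matter, only the grouping: `nmi([0, 0, 1, 1], [5, 5, 7, 7])` is 1, whereas `nmi([0, 0, 1, 1], [0, 1, 0, 1])` is 0 because the second partition carries no information about the first.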

In this method, the order in which the network layers are processed may affect the final performance of MOEA/D-TS, so changing the sequence of layers could produce different results (Amelio & Pizzuti, 2014; Wenfeng Liu et al., 2017). Finding the optimal ordering of the layers of a multiplex network is a complex issue that needs a comprehensive investigation, which is beyond the scope of this paper. Hence, we rely on the ordering used in (Amelio & Pizzuti, 2014), which is based on the clustering coefficient measure and gave the best experimental results in that work. This approach computes the CC of a layer as the average clustering coefficient of its nodes. A graph with a high CC contains nodes that tend to be more connected and therefore have a high potential to form communities. To determine the best community structure of the first layer, we employ the well-known modularity density (D) (Gong et al., 2012). Modularity density is a quality index for community detection based on the density of sub-graphs, in which a higher value of D indicates a more accurate partition. Consider an undirected graph G = (V, E) with |V| = n vertices, |E| = e edges, and adjacency matrix A. For two subsets V1 and V2 of V, we define

$$L(V_1, V_2) = \sum_{i \in V_1,\, j \in V_2} A_{ij}, \qquad L(V_1, \bar{V}_2) = \sum_{i \in V_1,\, j \in \bar{V}_2} A_{ij},$$

where $\bar{V}_2 = V - V_2$. Given a partition S = (V1, V2, ..., Vm) of the graph, where Vi is the vertex set of sub-graph Gi for i = 1, 2, ..., m, the modularity density is defined as:

$$D = \sum_{i=1}^{m} \frac{L(V_i, V_i) - L(V_i, \bar{V}_i)}{|V_i|} \qquad (8)$$

where the first and second terms of D correspond to the ratio association and the ratio cut, respectively. These terms reflect two fundamental aspects of a good partition: maximizing the ratio association favors small communities with dense internal connections, while minimizing the ratio cut favors large communities with few connections to the others. Modularity density therefore provides a trade-off between these two objectives. MOEA/D-TS considers the following fitness function for community detection in the first layer of a multiplex network:

$$\text{Min } F(L_1) = \text{Minimize}\left(NRA,\ RC\right) \qquad (9)$$

in which the first objective, the Negative Ratio Association (NRA), is computed as

$$NRA = -\sum_{i=1}^{m} \frac{L(V_i, V_i)}{|V_i|}$$

and the second objective, the Ratio Cut (RC), is given by

$$RC = \sum_{i=1}^{m} \frac{L(V_i, \bar{V}_i)}{|V_i|}.$$

Since D = −NRA − RC, minimizing both objectives is consistent with maximizing the modularity density of the first layer.
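The two first-layer objectives and the modularity density can be computed together from an adjacency matrix and a label list. The sketch below follows Eq. (8) and the NRA and RC definitions, with L(Vi, Vi) counting each internal edge twice because A is symmetric; the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def ratio_objectives(adj: Sequence[Sequence[int]], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (NRA, RC) for a partition given as one label per node."""
    n = len(labels)
    communities: Dict[int, List[int]] = {}
    for v, c in enumerate(labels):
        communities.setdefault(c, []).append(v)
    nra = 0.0
    rc = 0.0
    for members in communities.values():
        inside = set(members)
        l_in = sum(adj[i][j] for i in members for j in members)              # L(Vi, Vi)
        l_out = sum(adj[i][j] for i in members for j in range(n) if j not in inside)  # L(Vi, V - Vi)
        nra -= l_in / len(members)
        rc += l_out / len(members)
    return nra, rc


def modularity_density(adj: Sequence[Sequence[int]], labels: Sequence[int]) -> float:
    """D = sum_i (L(Vi, Vi) - L(Vi, V - Vi)) / |Vi| = -NRA - RC."""
    nra, rc = ratio_objectives(adj, labels)
    return -nra - rc
```

For two triangles joined by one bridge and split along the bridge, each community has L(Vi, Vi) = 6 and L(Vi, V − Vi) = 1 with |Vi| = 3, giving NRA = −4, RC = 2/3, and D = 10/3.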

There are several motivations for choosing an objective function based on NRA and RC. First, these two functions balance each other's tendency to increase or decrease the number of communities (Ma et al., 2018). Second, according to Fortunato & Barthelemy (2007), the resolution limit of modularity arises mainly because modularity lacks information about community size and because partition selection is highly sensitive to the total number of links in the network. Neither criterion suffers from this restriction, since both depend only on the density of sub-graphs. The ratio association reflects the total density of links within communities, whereas the ratio cut reflects the total density of links between different communities; hence, the two objectives capture different aspects of an accurate partition. Fig. 5 shows the process of multiplex community detection using MOEA/D-TS in a three-layer network.

[Fig. 5 panels: a three-layer multiplex network (layers L1, L2, L3) with eight nodes. Step 1: community detection of the first layer L1 with MOEA/D-TS based on modularity density. Step 2: community detection of the other layers, where L2 and L3 are each processed based on modularity while maximizing NMI with the result of the preceding layer. The community structure of the last layer is taken as the final community structure of the multiplex network.]

Fig. 5. The process of multiplex community detection using improved MOEA/D-TS in a three-layer network



Recombination

In the process of evolution, new offspring are generated by combining parent chromosomes with the crossover operator. Traditional crossover operators such as uniform crossover lack global heuristic information, which may slow the convergence of the algorithm. One efficient recombination method in EAs is the two-way crossover operator (Ma et al., 2014). Given two parent chromosomes P1 and P2, the two-way crossover randomly selects a node vi and finds all nodes that share its cluster number in P1; the same nodes in P2 then take this cluster identifier. Performing the same operation with the roles of P1 and P2 exchanged yields the second offspring, so the operator produces two descendants, O1 and O2. This crossover operator can generate offspring that inherit major communities from their parents, a result that is difficult to achieve with uniform and two-point crossover. Fig. 6 depicts the steps of the two-way crossover operation.

Fig. 6. An example of the two-way crossover operation
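Under the reading above, in which O1 starts as a copy of P2 that receives vi's community from P1 and O2 starts as a copy of P1 that receives vi's community from P2, the two-way crossover can be sketched as follows. The optional `node` argument is a hypothetical convenience for reproducible examples.

```python
import random
from typing import List, Optional, Sequence, Tuple


def two_way_crossover(p1: Sequence[int], p2: Sequence[int],
                      rng: random.Random,
                      node: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Return offspring (O1, O2) of string-encoded parents P1 and P2."""
    if len(p1) != len(p2):
        raise ValueError("parents must encode the same nodes")
    if node is None:
        node = rng.randrange(len(p1))   # the randomly selected node v_i
    o1, o2 = list(p2), list(p1)         # each offspring starts from the other parent
    label1 = p1[node]
    for k, c in enumerate(p1):          # v_i's whole community in P1 is copied into O1
        if c == label1:
            o1[k] = label1
    label2 = p2[node]
    for k, c in enumerate(p2):          # v_i's whole community in P2 is copied into O2
        if c == label2:
            o2[k] = label2
    return o1, o2
```

With P1 = [1, 1, 1, 2, 2, 2], P2 = [1, 1, 2, 2, 3, 3], and vi = node index 2, O1 becomes [1, 1, 1, 2, 3, 3] and O2 becomes [1, 1, 2, 2, 2, 2]: each child keeps one complete community from the other parent, which is the property uniform crossover lacks.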



Mutation

Using the clustering coefficient to generate the initial population produces a large number of small communities in the initial steps of MOEA/D-TS (Said et al., 2018). Traditional mutation methods may need many iterations to merge these communities and reach a community structure with a higher fitness value. Therefore, we employ the mutation method proposed in (Gong et al., 2012), which is designed to merge small communities into larger ones while reducing the number of iterations required to reach a higher fitness value. In this mutation method, we first select a random community Cx and a node vi ∈ Cx. Then, a random neighbor vj ∈ Cy of vi is chosen, where Cx ≠ Cy. In this way, a randomly selected node from a random community Cx is connected to one of its neighbors vj from another community Cy. The current population is updated if this process improves the fitness function of the current generation. The mutation operator has the following steps:

- Select a community Cx randomly.
- Select a random node vi ∈ Cx:
  - If vi is isolated, no mutation is performed.
  - If vi and all of its neighbors are members of Cx, choose another random node.
- Select a random neighbor vj ∈ Cy of vi from another community Cy ≠ Cx, and put both of them into a larger community.
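The text leaves open exactly how vi and vj are put into a larger community. The sketch below assumes Cy is merged into Cx (every node labelled Cy is relabelled Cx), caps the "choose another random node" retries with a hypothetical `max_tries` parameter so the operator cannot loop forever when Cx has no outside neighbors, and leaves the fitness check that decides whether the population is updated to the caller.

```python
import random
from typing import Dict, List, Sequence, Set


def merge_mutation(labels: Sequence[int], adj: Dict[int, Set[int]],
                   rng: random.Random, max_tries: int = 100) -> List[int]:
    """Merge a random community with a neighboring one (assumed reading of the step list)."""
    child = list(labels)
    communities: Dict[int, List[int]] = {}
    for v, c in enumerate(child):
        communities.setdefault(c, []).append(v)
    cx = rng.choice(sorted(communities))          # select a community Cx randomly
    for _ in range(max_tries):
        vi = rng.choice(communities[cx])          # select a random node vi in Cx
        nbrs = sorted(adj[vi])
        if not nbrs:
            return child                          # isolated node: no mutation
        outside = [u for u in nbrs if child[u] != cx]
        if outside:                               # vi has a neighbor vj in some Cy != Cx
            cy = child[rng.choice(outside)]
            return [cx if c == cy else c for c in child]
        # vi and all its neighbors are in Cx: try another random node
    return child
```

Because the operator only ever joins communities that share at least one edge, repeated application shrinks the many small CC-IP communities without merging unrelated parts of the network.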

Fig. 7 illustrates a flowchart of the execution process of improved MOEA/D-TS for the multiplex community detection problem.


Fig. 7. The flowchart of improved MOEA/D-TS for community detection in multiplex networks

4. Experimental results

In this section, extensive experiments are provided to evaluate the capabilities and effectiveness of our method for community detection. The proposed method was compared with other contemporary methods on various benchmarks and real-world networks. The performance of MOEA/D-TS is evaluated on different single-layer and multiplex networks. In the following subsections, we first present the results of community detection using the proposed method on several single-layer networks and then extend these experiments to multiplex networks.

4.1. Datasets

In this part, we consider various well-known single-layer and multiplex benchmarks from different categories, including biological, social, and collaboration networks. We first test the proposed algorithm on nine single-layer networks: the Zachary Karate Club (Zachary, 1977), the American College Football network (Girvan & Newman, 2002), the bottlenose Dolphins network (Lusseau et al., 2003), the Polbooks network (Newman, 2006), the Jazz network (Gleiser & Danon, 2003), the E.Coli network (Shen-Orr, Milo, Mangan, & Alon, 2002), the Protein network (Salwinski et al., 2004), the NetScience network (Newman, 2006), and the Facebook online social network (Leskovec & Mcauley, 2012). Among these datasets, Zachary Karate Club, Dolphins, Football, and Polbooks have a known ground-truth community structure, while the community structure of the remaining networks is unknown (Said et al., 2018). The type and size of each network are shown in Table 1.

Table 1. The characteristics of single-layer networks (Said et al., 2018)

| Network | Type | Nodes | Edges | Avg. CC | Avg. degree | Avg. path length |
|---|---|---|---|---|---|---|
| Karate | Social | 34 | 78 | 0.58 | 4.58 | 2.40 |
| Dolphins | Social | 62 | 159 | 0.30 | 5.12 | 3.35 |
| Football | Social | 115 | 613 | 0.40 | 10.66 | 2.50 |
| Polbooks | Social | 105 | 441 | 0.48 | 8.40 | 3.07 |
| Jazz | Collaboration | 198 | 5484 | 0.63 | 27.69 | 2.23 |
| E.coli | Biological | 418 | 519 | 0.20 | 2.48 | 4.82 |
| Protein | Biological | 3724 | 8748 | 0.21 | 4.69 | 5.25 |
| NetScience | Collaboration | 1589 | 2742 | 0.87 | 3.45 | 5.82 |
| Facebook | Online-Social | 2888 | 2981 | 0.80 | 2.06 | 3.86 |

The multiplex network datasets selected for the performance analysis of the proposed method come from various application fields. These eight networks are the London Transport network¹ (Said et al., 2018), the CKM Physicians Innovation multiplex network, the medium-scale Plasmodium GPI multiplex network (Said et al., 2018), the Kapferer Tailor Shop network (Wenfeng Liu et al., 2017), the Cs-Aarhus network (Wenfeng Liu et al., 2017), the Celegans GPI network, the large-scale Arabidopsis GPI network, and the MUS GPI multiplex network. Table 2 shows further properties of all considered datasets, in which m denotes the number of links.

Table 2. Detailed characteristics of the multiplex network datasets

| Multiplex network | Node type | Edge type | Nodes | Layers | m |
|---|---|---|---|---|---|
| London Transport | Train Station | Routing | 369 | 3 | 441 |
| CKM Physicians Innovation | Physicians | Social relation | 246 | 3 | 1551 |
| Plasmodium GPI | Plasmodium | Genetic connection | 1203 | 3 | 2521 |
| Kapferer Tailor Shop | Tailor Shop | Social relation | 39 | 4 | 1018 |
| Cs-Aarhus | Employees | Social relation | 61 | 5 | 620 |
| Celegans GPI | Elegan | Genetic connection | 3879 | 6 | 8181 |
| Arabidopsis GPI | Elegan | Genetic connection | 6980 | 7 | 18,654 |
| MUS GPI | Elegan | Genetic connection | 7747 | 7 | 19,842 |

¹ All these real multiplex networks can be downloaded from https://comunelab.fbk.eu/data.php

4.2. Evaluation

In this part, we evaluate the performance of the improved MOEA/D-TS under different conditions. To this end, we performed various assessments, including convergence, stability, and scalability tests. In addition, to verify the integrity and similarity of the results obtained by the proposed method under similar conditions, we used the Wilcoxon Signed Rank Test (García, Molina, Lozano, & Herrera, 2009).

The convergence of MOEA/D-TS

To analyze the effect of CC-IP on the convergence behavior of MOEA/D-TS, we studied the modularity improvement over 100 generations of MOEA/D-TS using Random-IP and CC-IP for the single-layer networks Zachary Karate Club, Football, Dolphins, Polbooks, and Jazz, as shown in Fig. 8. In this figure, the x and y axes show the generation number and the average modularity over 20 runs, respectively. For these networks, using the Clustering Coefficient in IP generation yields high-quality populations and faster modularity improvement over the iterations. Compared with the simple MOEA/D-TS, which uses Random-IP, the proposed method with CC-IP generates a population with higher modularity values in the initial stages, which contributes significantly to the convergence of the algorithm. The comparative diagrams show that MOEA/D-TS with Random-IP starts from low modularity values, needs more iterations to approach the global optimum, and in many cases gets stuck in local optima. In contrast, the high-quality CC-IP of the improved MOEA/D-TS provides a rich initial population, which the evolutionary operators improve until it reaches near-global optima in fewer iterations. These outcomes indicate the benefit of using the clustering coefficient in IP generation. It is worth noting that this trend continues in subsequent generations (beyond 100 generations), and Random-IP does not eventually catch up with CC-IP.

[Fig. 8 plots: average modularity (y-axis) versus number of generations (x-axis, up to 100) for MOEA/D-TS with Random-IP and CC-IP, one panel per network]

Fig. 8. The convergence diagram of average modularity values obtained by MOEA/D-TS with random-IP and CC-IP in the first 100 generations of MOEA/D-TS for (a) Zachary Karate Club, (b) Football, (c) Dolphins, (d) Polbooks and (e) Jazz networks. In this figure, the x and y axes show the iteration number and the average of modularity values over 20 runs, respectively.


For brevity, Fig. 9 shows the standard deviation of the results obtained in the first 50 iterations of MOEA/D-TS. With Random-IP, the standard deviation of the results is high in the initial iterations for all tested benchmarks. Using CC-IP reduces these values considerably, in most cases to zero. Note that during subsequent iterations all results converge, and the standard deviation tends to zero even for MOEA/D-TS with Random-IP.

Stability analysis

The stability analysis of evolutionary algorithms is an essential part of assessing their accuracy. An algorithm is stable when the results it obtains do not depend on specific conditions and are not obtained by chance. Accordingly, Fig. 10 illustrates the best NMI and modularity values of the four real-world networks over 20 runs. The box plots show the minimum, first quartile, median, third quartile, and maximum of NMI and modularity Q. The central rectangle of each box spans the first to the third quartile, and the red line marks the median. The whiskers below and above the box mark the minimum and maximum, and the symbol + denotes outliers. As shown in Fig. 10, the variability of NMI and Q values is relatively small for all tested networks, which indicates that our algorithm is stable across independent runs.


Fig. 9. Comparison between MOEA/D-TS with Random-IP and with CC-IP based on the standard deviation of the obtained results (modularity) in the first 50 generations of the algorithm for (a) Zachary Karate Club, (b) Football, (c) Dolphins, (d) Polbooks and (e) Jazz networks. In this figure, the x and y axes show the generation number and the average modularity over 20 runs, respectively, and the error bars show the standard deviation of the results over the 20 runs.
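Figs. 8 and 9 summarise 20 runs per generation by their mean and standard deviation. A minimal sketch of that aggregation is shown below, assuming each run is recorded as a list of per-generation modularity values (the `runs` layout and the function name are hypothetical). The text does not state whether the sample (n - 1) or population (n) standard deviation was used; the sketch assumes the sample version.

```python
import statistics


def per_generation_stats(runs):
    """Mean and sample standard deviation across runs at every generation.

    runs[r][g] is the modularity recorded by run r at generation g.
    """
    n_generations = len(runs[0])
    if any(len(run) != n_generations for run in runs):
        raise ValueError("all runs must record the same number of generations")
    means, stds = [], []
    for g in range(n_generations):
        values = [run[g] for run in runs]
        means.append(statistics.fmean(values))
        # stdev needs at least two runs; a single run has no spread.
        stds.append(statistics.stdev(values) if len(values) > 1 else 0.0)
    return means, stds
```

Plotting `means` against the generation index, with error bars of height `stds`, gives a chart of the kind shown in Fig. 9.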


Fig. 10. The box plot of (a) NMI and (b) Modularity values over 20 runs on the four datasets. The box plots show the minimum, first quartile, median, third quartile, and maximum of NMI and Modularity Q. The central rectangle of each box spans the first to the third quartile, and the red lines show the median. The whiskers below and above the box mark the minimum and maximum, and the symbol + denotes outliers.
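The caption describes the whiskers as the minimum and maximum. In the common Tukey convention (for example, the default `whis=1.5` of matplotlib's `boxplot`), however, whiskers stop at the most extreme observations within 1.5 × IQR of the box, and points beyond them are drawn individually as outliers. Whiskers therefore coincide with the minimum and maximum only when there are no outliers. The sketch below computes these quantities under that assumed convention. Quartile definitions vary between packages, and the "inclusive" method used here is an assumption, as is the function name.

```python
import statistics


def box_plot_summary(values, whisker=1.5):
    """Quartiles, Tukey whiskers and outliers of a sample (at least 2 values)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "whisker_low": inside[0],  # most extreme points within the fences
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```

For the sample 1, 2, ..., 9, 100, the value 100 lies beyond the upper fence. The upper whisker therefore ends at 9 rather than at the maximum.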



Wilcoxon signed-rank test

Wilcoxon (García et al., 2009) is a nonparametric test that evaluates the difference between two treatments or conditions on related samples. It does not assume a Normal distribution and works well with small samples. When interpreting the test, a p-value higher than the significance level α indicates weak evidence against the null hypothesis: we fail to reject the null hypothesis and cannot accept the alternative hypothesis. In this study, the Wilcoxon signed-rank test is performed with the SPSS software at a 95% confidence level (a 5% significance level, α = 0.05). Table 3 shows the results of analyzing the proposed method with the Wilcoxon test. Using the modularity Q for single-layer networks and the multiplex modularity Qm for multiplex networks, the test performs a pairwise comparison between two sets of obtained results. For this purpose, we collected 20 samples of Q for the Zachary Karate Club, Dolphins, Football, Polbooks and Jazz networks, and 30 samples of Qm for London Transport, CKM Physicians Innovation, Plasmodium GPI, Celegans GPI, Arabidopsis GPI, and MUS GPI, and divided each collection into two separate sets. The null hypothesis states that there is no significant difference between the two sets of results. As shown in Table 3, all the obtained p-values are higher than the significance level (α = 0.05), so we fail to reject the null hypothesis. In other words, the two sets of results obtained by MOEA/D-TS on each dataset do not differ significantly at the 95% confidence level, and the test provides no statistically significant evidence that the two sets differ on these benchmarks.

Table 3. The results of the Wilcoxon test with α = 0.05

Single-layer network   p-value    Multiplex network            p-value
Karate                 0.85       London Transport             0.90
Dolphins               0.55       CKM Physicians Innovation    0.88
Football               0.34       Plasmodium GPI               0.70
Polbooks               0.63       Celegans GPI                 0.15
Jazz                   0.46       Arabidopsis GPI              0.84
                                  MUS GPI                      0.80
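For the pairwise comparisons behind Tables 3 and 5, a minimal pure-Python sketch of the two-sided Wilcoxon signed-rank test is given below. It enumerates the exact null distribution, which is practical for the 10 to 15 pairs that result from splitting 20 or 30 runs in half. The sketch assumes that the i-th value of one half is paired with the i-th value of the other, because the text does not say how pairs were formed. Statistical packages often report a normal-approximation p-value instead, so their numbers can differ slightly from exact ones. The function names are hypothetical.

```python
from itertools import product


def average_abs_ranks(diffs):
    """1-based ranks of |d|; tied magnitudes share the mean of their positions."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W+, p-value). Zero differences are discarded. All 2**n sign
    patterns are enumerated, so keep n to roughly 20 pairs or fewer.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired samples of equal length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    ranks = average_abs_ranks(diffs)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2  # expected W+ under the null hypothesis
    observed = abs(w_plus - centre)
    # Under H0 every rank is equally likely to carry a + or a - sign.
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        if abs(w - centre) >= observed - 1e-9:
            extreme += 1
    return w_plus, extreme / 2 ** n
```

With α = 0.05, the null hypothesis of no difference is rejected only when the returned p-value is below 0.05. This is why every entry of Table 3 leads to "fail to reject".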

Scalability analysis

To evaluate the scalability of the proposed algorithm, we investigated its behavior on LFR benchmark networks of different sizes, proposed by Lancichinetti and Fortunato (Lancichinetti & Fortunato, 2009). LFR networks have heterogeneous distributions of node degrees and community sizes, similar to real-world networks: in the LFR benchmark, node degrees and community sizes follow power-law distributions with exponents α and β, respectively (Lancichinetti & Fortunato, 2009). The mixing parameter μ ∈ [0, 1] is the expected fraction of a node's links that connect it to other communities, so it controls the ratio of each vertex's external degree; smaller values correspond to more clearly defined community structures. In our experiments, node degrees and community sizes follow power laws with exponents -2 and -1, respectively; the maximum and average degrees are 50 and 20, respectively; community sizes range over C = [20, 100]; and μ = 0.5. As shown in Fig. 11, the run time of the method grows nearly linearly with network size, which suggests that it can be scaled to larger networks.

Fig. 11. The mean values of the run-time over 20 independent runs of the MOEA/D-TS (in seconds) on the LFR networks with various sizes on a log-linear scale. The x and y axes show the number of nodes and the run time (s), respectively.
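One way to make "near-linear" quantitative is to fit the slope b of log(run time) against log(network size): b close to 1 indicates roughly linear growth, and b close to 2 indicates quadratic growth. The measured times behind Fig. 11 are not listed in the text, so the sketch below (function name hypothetical) is checked only against synthetic timings.

```python
import math


def scaling_exponent(sizes, times):
    """Least-squares slope b in log(time) = a + b * log(size)."""
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Applied to mean run times per LFR size, the returned slope gives a single number that can support or qualify the near-linear claim.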

4.3. Comparison

In this section, we compare the performance and capabilities of the proposed method with those of different community detection algorithms on several types of real datasets of various sizes.

Analysis of single-layer networks

Here, we evaluate the capability of the proposed method for community detection in single-layer networks. For a comprehensive performance analysis, we compare the results of our method, MOEA/D-TS, with several recent community detection algorithms (Gong et al., 2012; Ju et al., 2016; Ma et al., 2018). The first set of comparison algorithms, drawn mainly from evolutionary-based methods, includes the Fast greedy algorithm (Newman, 2004), TGA (Gog, Dumitrescu, & Hirsbrunner, 2007), LGA (S. Li, Chen, Du, & Feldman, 2010), MENSGA (Yun Li, Liu, & Lao, 2013), CC-GA (Said et al., 2018), MOEA/D, MOEA/D-Net (Gong et al., 2012), MOPSO (Coello & Lechuga, 2002) and MOEA/DM (Ju et al., 2016). Following Said et al. (Said et al., 2018), the second set includes the popular Label Propagation Algorithm (LPA) (Raghavan, Albert, & Kumara, 2007); InfoMap (Rosvall & Bergstrom, 2008); the Edge Label Propagation Algorithm (ELPA) (Wei Liu et al., 2016); an overlapping link community detection algorithm (LinkCom) (Ahn, Bagrow, & Lehmann, 2010); a dynamic label propagation method (COPRA) (Gregory, 2010); a Clustering Coefficient based Genetic Algorithm (CC-GA); and an algorithm based on the topological properties of the network (NeTa) (Wei Liu, Pellegrini, & Wang, 2014). In this analysis, the fitness function is based on modularity density maximization (explained in Section 3), and the two evaluation metrics are the modularity (Q) and the NMI. After comparing the results of experiments with different algorithm parameters, and following the parameter settings of the comparison methods (Gong et al., 2012; Said et al., 2018), we initialized the parameters of the proposed method as follows. For MOEA/D, the number of subproblems, i.e., the IP size, is N = 100; the neighborhood size is T = 10 (i.e., 10% of the population size); the crossover probability is Pc = 0.6; the mutation probability is Pm = 0.2; and the maximum number of generations is genmax = 100. For Tabu Search, the tabu list size is T' = 50, the Tabu Life is 50, and the search terminates after 300 iterations without any improvement. As mentioned before, a high modularity value indicates that an algorithm is able to uncover a good community structure of a network. Table 4 shows the maximum (Qmax), average (Qavg), and standard deviation (Qstd) of the modularity obtained by the proposed method over 20 independent runs, compared with the other evolutionary-based algorithms. The results indicate that MOEA/D-TS finds a more precise community structure and outperforms the other methods on the Zachary Karate Club, Dolphins, Polbooks, and Jazz networks. On the Football network, MOEA/D-TS outperforms all other evolutionary-based methods except MOEA/DM.
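The modularity-density fitness mentioned above can be sketched with the widely used definition D = Σ_c [L(V_c, V_c) - L(V_c, V̄_c)] / |V_c|, where L(V_1, V_2) counts the links between node sets V_1 and V_2. D splits into a ratio-association term RA, which favours small, dense communities, and a ratio-cut term RC, which favours large, sparsely connected ones. Multi-objective methods such as MOEA/D-Net (Gong et al., 2012) optimise the pair (-RA, RC) instead of D. The exact formulation used by MOEA/D-TS is the one given in Section 3 and may differ from this sketch, whose function names are hypothetical.

```python
from collections import defaultdict


def ratio_association_and_cut(edges, labels):
    """Return (RA, RC) for an undirected simple graph.

    edges: list of (u, v) pairs; labels[v]: community of node v.
    L(V_c, V_c) counts ordered node pairs, so each internal edge adds 2.
    """
    size = defaultdict(int)
    for c in labels:
        size[c] += 1
    internal = defaultdict(int)  # L(V_c, V_c)
    external = defaultdict(int)  # L(V_c, complement of V_c)
    for u, v in edges:
        if labels[u] == labels[v]:
            internal[labels[u]] += 2
        else:
            external[labels[u]] += 1
            external[labels[v]] += 1
    ra = sum(internal[c] / size[c] for c in size)
    rc = sum(external[c] / size[c] for c in size)
    return ra, rc


def modularity_density(edges, labels):
    """D = RA - RC (to be maximised)."""
    ra, rc = ratio_association_and_cut(edges, labels)
    return ra - rc


def two_objectives(edges, labels):
    """(NRA, RC) = (-RA, RC), both to be minimised."""
    ra, rc = ratio_association_and_cut(edges, labels)
    return -ra, rc
```

For two triangles joined by a single bridge and split into the two triangles, RA = 6/3 + 6/3 = 4 and RC = 1/3 + 1/3. This gives D = 10/3.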

Networks Karate Dolphins Football Polbooks Jazz

Table 4. Performance analysis of empirical networks based on maximum modularity Q on single-layer networks Fast TGA LGA MOPSO MOEA/D MOEA/D-TS MENSGA MOEA/DMOEA/D CC-GA greedy (Gog (S. Li et (Coello & M (Lotfi & Karimi, 2017) (Yun Li et Net (Gong (Zhang & (Said et (Newman, et al., al., Lechuga, (Ju et al., al., 2018) al., 2013) et al., 2012) Li, 2007) Qmax Qavg Qstd 2004) 2007) 2010) 2002) 2016) 0.380 0.403 0.419 0.419 0.420 0.420 0.4198 0.4198 0.4198 0.421 0.420 0.001 0.495 0.524 0.528 0.527 0.529 0.520 0.521 0.5265 0.5265 0.5297 0.528 0.002 0.557 0.593 0.604 0.604 0.594 0.604 0.6044 0.6033 0.6046 0.6044 0.597 0.005 0.502 0.524 0.527 0.526 0.527 0.527 0.5269 0.3600 0.5269 0.527 0.518 0.004 0.435 0.44 0.444 0.444 0.444 NA NA NA NA 0.4442 0.444 0.000

To evaluate the algorithms more precisely, we apply the Wilcoxon test to the results of the previous experiment under the same conditions. Table 5 summarizes the p-values produced by the Wilcoxon test at a 5% significance level (α = 0.05) for the pairwise comparison of MOEA/D-TS with the nine other algorithms on the "maximum modularity Q" obtained over 20 runs. As the null hypothesis, we assume that there is no significant difference between the modularity values obtained by each pair of algorithms.

Table 5. p-values produced by Wilcoxon test comparing MOEA/D-TS and other algorithms over the maximum modularity Q values from Table 4

MOEA/D-TS vs.   Karate   Dolphin   Football   Polbooks   Jazz
Fast greedy     0.00     0.00      0.00       0.00       0.00
TGA             0.001    0.00      0.001      0.012      0.00
LGA             0.811    0.00      0.00       0.044      0.002
MENSGA          0.058    0.004     0.00       0.013      0.002
CC-GA           0.406    0.017     0.016      0.001      0.003
MOEA/D-Net      _        0.004     0.00       0.004      0.042
MOEA/D          0.073    _         0.012      0.00       0.027
MOPSO           _        0.011     0.00       0.22       0.00
MOEA/DM         0.003    0.00      0.002      0.042      _

P-values below 0.05 in Table 5 provide strong evidence against the null hypothesis. The Wilcoxon signed-rank test results suggest that, with 95% confidence, the difference between MOEA/D-TS and the other algorithms is statistically significant on all benchmarks except in the following cases: a) On the Football dataset, MOEA/D-TS versus MENSGA (p = 0.058) and versus MOEA/D (p = 0.073) did not show a statistically significant difference; in these cases, the median difference between MOEA/D-TS and MENSGA or MOEA/D is not significantly different from zero. b) On the Polbooks dataset, MOEA/D-TS versus LGA (p = 0.811) and versus CC-GA (p = 0.406) did not show a statistically significant difference at the 95% confidence level, so the test provides no statistically significant evidence that these pairs of algorithms differ on this benchmark. As the table shows, MOEA/D-TS improves significantly on the compared algorithms at the α = 0.05 level in most cases. Table 6 presents the comparative results of MOEA/D-TS and nine other algorithms in terms of the NMI measure. As mentioned earlier, the actual partitions of the Zachary Karate Club, Dolphins, Football, and Polbooks networks are known. In Table 6, Tc and Uc denote the number of ground-truth communities in each network and the number of communities uncovered by our method, respectively. NMI = 1 on three test networks shows that MOEA/D-TS recovers the actual communities of these empirical networks exactly, and it reaches an acceptable result on the complex Football network. These results indicate the high reliability of our algorithm in achieving accurate partitions. Besides our method, ELPA, NeTa, and MOEA/D-Net also reach NMI = 1 on the Zachary Karate Club. MOEA/D-Net also ranks second on the Dolphins and Polbooks networks, while InfoMap takes this place on the Football network. For brevity, the algorithms NeTa, LinkCommunity, and COPRA are denoted NT, LC, and C, respectively, in the following tables.

Table 6. Analysis of empirical single-layer networks of best obtained NMI

Algorithm                             Karate   Dolphins   Football   Polbooks
Fast greedy                           0.693    0.495      0.732      0.530
ELPA (Wei Liu et al., 2016)           1        0.667      0.950      NA
LC (Ahn et al., 2010)                 0.461    0.376      0.641      NA
C (Gregory, 2010)                     0.837    0.525      0.867      NA
NT (Wei Liu et al., 2014)             1        0.762      0.950      NA
InfoMap (Rosvall & Bergstrom, 2008)   0.699    0.537      0.969      NA
LPA (Raghavan et al., 2007)           0.999    0.456      0.872      NA
CC-GA                                 0.690    0.594      0.550      NA
MOEA/D-Net                            1        1          0.925      0.596
MOEA/D-TS: Tc                         2        2          12         3
MOEA/D-TS: Uc                         2        2          9          3
MOEA/D-TS: NMI                        1        1          0.973      1
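Table 6 scores partitions with NMI. The sketch below uses the confusion-matrix form of normalized mutual information that is common in community detection: NMI(A, B) = -2 Σ_ij N_ij log(N_ij N / (N_i N_j)) / (Σ_i N_i log(N_i / N) + Σ_j N_j log(N_j / N)). Here N_ij is the number of nodes in community i of A and community j of B, N_i and N_j are the community sizes, and N is the number of nodes. That this is exactly the variant behind Table 6 is an assumption, and the function name `nmi` is hypothetical.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions of the same nodes.

    labels_a[v] and labels_b[v] are the community labels of node v.
    Identical partitions (up to relabelling) give 1.0.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same nodes")
    n = len(labels_a)
    size_a = Counter(labels_a)
    size_b = Counter(labels_b)
    overlap = Counter(zip(labels_a, labels_b))  # confusion matrix N_ij
    numerator = -2.0 * sum(
        n_ij * math.log(n_ij * n / (size_a[a] * size_b[b]))
        for (a, b), n_ij in overlap.items()
    )
    denominator = sum(s * math.log(s / n) for s in size_a.values()) + sum(
        s * math.log(s / n) for s in size_b.values()
    )
    if denominator == 0.0:
        return 1.0  # both partitions put every node in a single community
    return numerator / denominator
```

A value of 1 means that the detected communities match the ground truth exactly up to relabelling. This is the sense in which MOEA/D-TS "recovers" the Karate, Dolphins and Polbooks partitions.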

In Table 7, we compare the modularity obtained by MOEA/D-TS over 20 runs with that of different community detection algorithms on eight real-world networks. MOEA/D-TS achieves the best modularity on the Zachary Karate Club, Dolphins, E.coli, and NetScience networks. On the Polbooks network, CC-GA and our method perform equally well and produce the same modularity value (0.527). On the Protein network, MOEA/D-TS outperforms all other methods except NeTa, which obtains the best result. On the Facebook network, MOEA/D-TS and InfoMap obtain the same modularity value, which is better than those of ELPA, LinkCommunity, NeTa, LPA, and COPRA, but CC-GA achieves the best result for this network. Although MOEA/D-TS does not rank first on the Football, Protein, and Facebook networks, it performs well relative to the other algorithms. The low standard deviations of its results show that the proposed method generates stable results on all the networks, which come from various real-world fields.

Table 7. Analysis of empirical single-layer networks of obtained modularity Q

Algorithm         Karate  Dolphins  Football  Polbooks  E.coli  Protein  NetScience  Facebook
ELPA              0.371   0.511     0.603     0.457     0.791   0.745    0.932       0.795
LC                0.250   0.260     0.340     0.170     0.551   0.606    0.900       0.795
C                 0.380   0.490     0.590     0.520     0.217   0.483    0.871       0.011
NT                0.380   0.480     0.610     0.510     0.890   0.867    0.945       0.731
InfoMap    Qmax   0.402   0.528     0.601     0.523     0.735   0.713    0.930       0.796
           Qavg   0.402   0.524     0.601     0.523     0.732   0.711    0.929       0.795
           Qstd   0.000   0.003     0.000     0.000     0.004   0.001    0.001       0.001
LPA        Qmax   0.402   0.502     0.603     0.526     0.767   0.738    0.915       0.781
           Qavg   0.33    0.492     0.586     0.502     0.746   0.721    0.908       0.78
           Qstd   0.105   0.044     0.019     0.024     0.014   0.011    0.007       0.015
CC-GA      Qmax   0.420   0.529     0.594     0.527     0.786   0.747    0.958       0.809
           Qavg   0.042   0.526     0.563     0.525     0.778   0.740    0.955       0.809
           Qstd   0.000   0.003     0.020     0.002     0.005   0.004    0.002       0.000
MOEA/D-TS  Qmax   0.421   0.5297    0.6044    0.527     0.907   0.846    0.964       0.796
           Qavg   0.420   0.528     0.597     0.518     0.892   0.837    0.957       0.791
           Qstd   0.001   0.002     0.005     0.004     0.015   0.009    0.007       0.005

Analysis of Multiplex networks

This section analyzes the performance of the MOEA/D-TS framework in comparison with other advanced community detection algorithms. The metrics defined for single-layer networks are not sufficient to evaluate the community structure of multiplex networks directly. Moreover, classical supervised indexes such as NMI are impractical for evaluating community detection in multiplex networks whose real community structures are unknown. Hence, in this research, we employ two unsupervised metrics, the redundancy Rc and the multiplex modularity Qm, to appraise the performance of the proposed method on different community detection problems (Ma et al., 2018). As mentioned earlier, the multiplex modularity is an evaluation metric for community detection in multiplex networks, defined in simple form as:

$$Q_m = \frac{1}{2\mu} \sum_{l=1}^{m} \sum_{C} \sum_{i,j \in C} \left[ A^{[l]}_{ij} - \lambda \frac{d^{[l]}_i \, d^{[l]}_j}{2M^{[l]}} \right] \qquad (10)$$

where the normalization factor $\mu = \sum_{l} M^{[l]}$ is the total number of links, m is the number of layers, $M^{[l]}$ is the number of links in the l-th layer, $A^{[l]}$ is the adjacency matrix of the l-th layer network, and $d^{[l]}_i$ ($d^{[l]}_j$) is the degree of node i (j) in the l-th layer network. In most multilayer community detection problems, the resolution parameter λ is set to 1 (Mucha et al., 2010). Accordingly, high values of Qm indicate high-quality community detection, and low values denote low-quality community structures (Said et al., 2018). The Rc measures the fraction of redundant links in a multiplex network (Strehl & Ghosh, 2002). This index is based on the principle that connections among the members of a shared community should be present in multiple layers. Rc is computed as:

$$R_c = \frac{1}{|C|} \sum_{C} \sum_{(p,q) \in V_C^{*}} \frac{\left|\{\, l : A^{[l]}_{pq} \neq 0 \,\}\right|}{m \, |V_C|} \qquad (11)$$

where |C| is the number of communities in the multilayer network, V_C is the set of node pairs (p, q) in community C that are connected in at least one layer, and V_C* is the set of pairs (p, q) in community C that are connected in at least two layers. A high value of Rc for a partition of a multiplex network thus corresponds to an accurate shared community division (Ma et al., 2018). We set all parameters of MOEA/D-TS to the values widely used in related multiplex community detection algorithms (Ma et al., 2014, 2018; Wu & Pan, 2015): for MOEA/D, the IP size N = 200, Pc = 0.9, Pm = 0.2, T = 20, and genmax = 200; for the Tabu Search algorithm, T' = 100 and termination after 100 iterations without any improvement. Although these parameter settings may not be optimal, they allow MOEA/D-TS to converge to good community divisions in multiplex networks. In the first part of the experiments, we investigate the capabilities of single-layer and multilayer community detection algorithms on multiplex networks (Wenfeng Liu et al., 2017). Accordingly, we compare our method with MOEA-MultiNet (Wenfeng Liu et al., 2017) and with the BGLL algorithm, adopted as a method that uses only one layer. Table 8 presents the comparison between MOEA/D-TS, MOEA-MultiNet, and BGLL on two real-world networks, Kapferer Tailor Shop and CS-Aarhus. The results show that BGLL obtains the maximum Qm value of 0.2179 when run on only one layer of the Kapferer Tailor Shop multiplex network. Between MOEA-MultiNet and MOEA/D-TS, the first rank belongs to our proposed method. In terms of the average value of Rc on this network, MOEA/D-TS obtains the maximum value.
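Equations (10) and (11) can be evaluated directly from per-layer edge lists. The sketch below makes several assumptions. The layers are undirected and unweighted over a common node set 0..n-1, given as lists of (u, v) pairs. A single partition is shared by all layers (`labels[v]` is the community of node v). Each community's sum in (11) is normalised by m|V_C|, where m is the number of layers. Communities without internal links contribute zero to Rc. The function names are hypothetical, and this is not the authors' implementation.

```python
from collections import defaultdict


def multiplex_modularity(layers, labels, resolution=1.0):
    """Qm of Eq. (10) for a partition shared by all layers."""
    mu = sum(len(edges) for edges in layers)  # total number of links
    total = 0.0
    for edges in layers:
        m_l = len(edges)  # M^[l], links in this layer
        if m_l == 0:
            continue
        internal = defaultdict(int)  # sum of A_ij over ordered i, j in C
        degree_sum = defaultdict(int)  # sum of d_i over i in C
        for u, v in edges:
            degree_sum[labels[u]] += 1
            degree_sum[labels[v]] += 1
            if labels[u] == labels[v]:
                internal[labels[u]] += 2
        for c, d_c in degree_sum.items():
            # sum over i, j in C of d_i * d_j equals (sum of d_i) ** 2
            total += internal[c] - resolution * d_c * d_c / (2 * m_l)
    return total / (2 * mu)


def redundancy(layers, labels):
    """Rc of Eq. (11)."""
    m = len(layers)
    layers_per_pair = defaultdict(int)  # intra-community pair -> layers linking it
    for edges in layers:
        seen = set()
        for u, v in edges:
            if u == v or labels[u] != labels[v]:
                continue
            pair = (min(u, v), max(u, v))
            if pair not in seen:  # count each layer once per pair
                seen.add(pair)
                layers_per_pair[pair] += 1
    linked_pairs = defaultdict(int)  # |V_C|
    redundant_links = defaultdict(int)  # sum over V_C* of the layer counts
    for (p, _q), k in layers_per_pair.items():
        c = labels[p]
        linked_pairs[c] += 1
        if k >= 2:
            redundant_links[c] += k
    communities = set(labels)
    score = sum(redundant_links[c] / (m * linked_pairs[c]) for c in linked_pairs)
    return score / len(communities)
```

With a single layer and λ = 1, `multiplex_modularity` reduces to ordinary Newman modularity. Duplicating that layer leaves Qm unchanged, because μ and the summed terms both double.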


Similarly, the second part of Table 8 shows the evaluation results for CS-Aarhus. The maximum Qm obtained by BGLL over the five individual layers is 0.4685; among the multilayer methods, MOEA/D-TS obtains 0.4094, slightly higher than the 0.4010 obtained by MOEA-MultiNet. In terms of Rc, MOEA/D-TS achieves the maximum. These results indicate the capability of the proposed method in solving multiplex community detection problems.

Table 8. Comparison of maximum multiplex modularity Qm and Rc of improved MOEA/D-TS with BGLL and MOEA-MultiNet

Network               Type          Algorithm       Layer   Qm       Rc
Kapferer Tailor Shop  Single-layer  BGLL            L1      0.2179   0.3964
                                                    L2      0.2006   0.4717
                                                    L3      0.1380   0.2657
                                                    L4      0.0932   0.4094
                      Multi-layer   MOEA-MultiNet   -       0.2094   0.4735
                      Multi-layer   MOEA/D-TS       -       0.2132   0.4904
CS-Aarhus             Single-layer  BGLL            L1      0.4685   0.2852
                                                    L2      0.1672   0.0472
                                                    L3      0.0832   0.1205
                                                    L4      0.2893   0.1611
                                                    L5      0.4115   0.2715
                      Multi-layer   MOEA-MultiNet   -       0.4010   0.3186
                      Multi-layer   MOEA/D-TS       -       0.4094   0.3206

In the second step, we evaluate the quality of the shared communities uncovered by MOEA/D-TS. For this purpose, we consider five classical shared community detection algorithms as comparison methods: two community-definition-based optimization algorithms, GenLouvin (Mucha et al., 2010) and MMCD (Ma et al., 2018); a consensus clustering algorithm, CBGLL (Lancichinetti & Fortunato, 2012); and two layer-aggregation algorithms, CLPAm and Cinfomap (Tang & Liu, 2010). The results of MOEA/D-TS and these algorithms over 30 independent runs are reported in Table 9 and Table 10. According to Table 9, CLPAm is the weakest algorithm and does not perform well on any of the tested multiplex networks. The low quality of these results is attributed to the global link properties of the networks and to the omission of a step that separates merged clusters (Ma et al., 2018). GenLouvin performs better than CBGLL and Cinfomap, but it is not the best. Among all algorithms, MOEA/D-TS and MMCD achieve high values of Qm on all tested multiplex networks, with MOEA/D-TS obtaining the highest Qm on every network, which is evidence of the capability and superior performance of the proposed method in multiplex community detection. In Table 9, Qm, Qma, and Qms denote the maximum, average, and standard deviation of the multiplex modularity over 30 independent runs. The results in Table 9 also show that MOEA/D-TS is a powerful algorithm with high stability in finding robust solutions (its standard deviation Qms is small on all networks). The strength of this algorithm derives from its hybrid global-local search and from its use of network-specific knowledge, including the clustering coefficient and information from neighboring subproblems, in the optimization process.
Utilizing this network-specific information guides the algorithm in an accurate direction and avoids inefficient search. In our method, MOEA/D is responsible for the global search that finds potential community regions, and IDTS performs a local search that improves the quality of the communities and reaches optimal partitions around these regions.

Table 9. The comparative results of multiplex community detection algorithms in terms of multiplex modularity Qm

                                                   London     CKM Physicians  Plasmodium  Celegans  Arabidopsis  MUS
Algorithm                                 Metric   Transport  Innovation      GPI         GPI       GPI          GPI
Cinfomap (Tang & Liu, 2010)               Qm/Qma   0.7456     0.6579          0.4601      0.4905    0.6576       0.6108
CBGLL (Lancichinetti & Fortunato, 2012)   Qm/Qma   0.7352     0.6833          0.4734      0.4732    0.6471       0.6273
GenLouvin (Mucha et al., 2010)            Qm/Qma   0.7892     0.6944          0.5183      0.5445    0.7067       0.6794
CLPAm (Tang & Liu, 2010)                  Qm       0.5496     0.6169          0.3456      0.4099    0.6052       0.5208
                                          Qma      0.5352     0.5702          0.3406      0.3998    0.5700       0.5106
                                          Qms      0.0064     0.0284          0.0023      0.0043    0.0193       0.0052
MMCD (Ma et al., 2018)                    Qm       0.7956     0.7058          0.5310      0.5571    0.7199       0.6875
                                          Qma      0.7932     0.7056          0.5237      0.5524    0.7165       0.6840
                                          Qms      0.0012     0.0008          0.0045      0.0037    0.0041       0.0020
MOEA/D-TS                                 Qm       0.8014     0.7142          0.5382      0.5592    0.7263       0.6914
                                          Qma      0.7999     0.7123          0.5354      0.5569    0.7237       0.6896
                                          Qms      0.0009     0.0019          0.0025      0.0033    0.0026       0.0014
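The local-search role assigned to IDTS can be illustrated with a plain tabu search that refines a partition by moving one node at a time into a neighbour's community. This is a generic sketch, not the paper's IDTS. It uses Newman modularity as the objective and takes the best non-tabu move at every step, even when that move lowers modularity. It forbids moving the same node again for `tabu_life` iterations and stops after `max_no_improve` iterations without a new best, loosely mirroring the Tabu Life and no-improvement termination settings reported earlier. The aspiration criterion and tabu-list size limit of a full implementation are omitted, and all names are hypothetical.

```python
def modularity(n, edges, labels):
    """Newman modularity Q of a partition of an undirected graph."""
    m = len(edges)
    degree = [0] * n
    internal = {}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] = internal.get(labels[u], 0) + 1
    degree_sum = {}
    for node in range(n):
        degree_sum[labels[node]] = degree_sum.get(labels[node], 0) + degree[node]
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())


def tabu_refine(n, edges, labels, tabu_life=5, max_no_improve=20):
    """Tabu search over single-node moves; returns (best labels, best Q)."""
    neighbours = [set() for _ in range(n)]
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    current = list(labels)
    best, best_q = list(current), modularity(n, edges, current)
    tabu_until = [0] * n  # node may not move while iteration <= tabu_until[node]
    iteration = stall = 0
    while stall < max_no_improve:
        iteration += 1
        move, move_q = None, float("-inf")
        for v in range(n):
            if tabu_until[v] >= iteration:
                continue
            for c in {current[u] for u in neighbours[v]} - {current[v]}:
                old = current[v]
                current[v] = c
                q = modularity(n, edges, current)
                current[v] = old
                if q > move_q:
                    move, move_q = (v, c), q
        if move is None:  # no admissible (non-tabu) move exists
            break
        v, c = move
        current[v] = c
        tabu_until[v] = iteration + tabu_life
        if move_q > best_q + 1e-12:
            best, best_q = list(current), move_q
            stall = 0
        else:
            stall += 1
    return best, best_q
```

Accepting worsening moves while forbidding immediate reversals is what lets a tabu search escape the local optima that a purely greedy node-moving step would stop in.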

Table 10 compares the algorithms in terms of the average Rc of the detected communities over 30 independent runs. MOEA/D-TS obtains the maximum Rc on the London Transport, CKM Physicians Innovation, and Celegans GPI networks. On the remaining networks, the maximum Rc is obtained by MMCD (Plasmodium GPI) and by Cinfomap (Arabidopsis GPI and MUS GPI). Recall that the Qm values obtained by MMCD and Cinfomap are smaller than those of MOEA/D-TS on all tested networks. Moreover, our method achieves higher Rc than MMCD on all networks except Plasmodium GPI. We attribute this to the parallel computing capacity of MOEA/D and the neighborhood search capability of Tabu Search in the optimization process. Relative to the average Rc over all compared algorithms, MOEA/D-TS achieves above-average values on all tested networks except Arabidopsis GPI, whereas MMCD does so only on London Transport, Plasmodium GPI, and Celegans GPI.

Table 10. The comparative results of different community detection algorithms in terms of Rc

Network                     Cinfomap  CBGLL    GenLouvin  CLPAm    MMCD     MOEA/D-TS
London Transport            0.1172    0.1162   0.1655     0.1167   0.2048   0.2251
CKM Physicians Innovation   0.4809    0.4311   0.4688     0.5497   0.4756   0.5618
Plasmodium GPI              0.0023    0.0013   0.0014     0.0014   0.0032   0.0028
Celegans GPI                0.0016    0        0.0001     0        0.0041   0.0047
Arabidopsis GPI             0.0229    0.0049   0.0181     0.0201   0.0129   0.0142
MUS GPI                     0.0145    0.0067   0.0059     0.0138   0.0085   0.0103

Modularity Q is the fundamental evaluation metric for community detection in single-layer networks. Fig. 12 shows the modularity Q obtained by the different community detection algorithms on each layer of the empirical networks. Note that in this experiment the layers of each network are kept in their original order. Fig. 12 (a), (b), and (c) show that MOEA/D-TS obtains larger modularity values than the comparison algorithms on all three layers of London Transport, CKM Physicians Innovation, and Plasmodium GPI. According to Fig. 12 (d), on the Celegans GPI network MOEA/D-TS obtains smaller modularity values than GenLouvin and MMCD on most layers, and the same holds for the other networks, as shown in Fig. 12 (e) and (f). These per-layer results do not agree with the multiplex modularity results in Table 9, where MOEA/D-TS achieves the highest Qm on all test networks. We therefore conclude that the optimal community structure of a multiplex network cannot be found from the community structures of its individual layers (Wenfeng Liu et al., 2017). Communities that are good for a single layer do not properly represent the structure shared across layers, and the complete topological connectivity of a multiplex network should be considered to detect shared communities more precisely (Ma et al., 2018). The results also show that MOEA/D-TS achieves larger Qm and Q values on the different layers of multiplex networks than CLPAm, Cinfomap, and CBGLL. These achievements come from the comprehensive search capability of the proposed method, which considers the global definition of shared communities in multiplex networks and makes it an efficient community-definition-based optimization method.


Fig. 12. The comparison results of obtained modularity by improved MOEA/D-TS, MMCD, Cinfomap, CBGLL, GenLouvin, and CLPAm under each layer of real multiplex networks. (a) London Transport. (b) CKM Physicians Innovation. (c) Plasmodium GPI. (d) Celegans GPI. (e) Arabidopsis GPI. (f) MUS GPI. The x and y axes show the layer index and the modularity, respectively.

4.4. Discussion

This paper employs an improved version of MOEA/D-TS that uses the Clustering Coefficient to generate the initial population. The results indicate that using the Clustering Coefficient to generate the IP of MOEA/D-TS yields high-quality populations and faster modularity improvement, as shown in Fig. 8. Accordingly, the algorithm reaches acceptable convergence, or the optimal solution, in fewer iterations than with Random-IP. In our experiments, we compared the average modularity obtained in 20 independent runs of MOEA/D-TS using Random-IP and CC-based IP over 100 generations. The results show a 24.5 percent improvement in the obtained modularity using CC-based IP versus Random-IP on the Zachary Karate Club. The corresponding improvements on the Football, Dolphins, Polbooks, and Jazz networks are 60, 37.5, 90, and 289 percent, respectively. These outcomes show that MOEA/D-TS with CC-IP generates a high-quality population, which significantly increases the speed of convergence to the optimal solution. In the reliability evaluation, the proposed method achieved an NMI of one on three test networks (Zachary Karate Club, Dolphins and Polbooks) and 0.973 on the Football network. As shown in Table 6, MOEA/D-TS discovered the actual communities of the first three networks correctly, and no other approach obtained a higher NMI on them. The average execution times over 20 runs are 0.53, 0.87, and 1.15 seconds for these three datasets, respectively. On the American College Football network, the complex structure of the network allows the algorithm to discover only nine of the twelve real clusters, in 2.23 seconds on average; even so, our algorithm obtains the best NMI value. These results indicate that MOEA/D-TS can detect the actual communities of empirical networks exactly and reaches an acceptable result on the complex Football network.
Apart from the comprehensive search capabilities of MOEA/D-TS, choosing an appropriate fitness function is an important factor in achieving these results. Using modularity density as the fitness function drives community detection toward two competing goals: dividing the network into small communities with dense internal connections, and finding large communities that are sparsely connected with the other communities. MOEA/D-TS, which considers a trade-off between these two objectives, can therefore find a good community structure in single-layer networks. Fig. 10 illustrates the stability of the proposed method, and the findings indicate that MOEA/D-TS generates stable results on all networks from various real-world fields. The average standard deviation of the modularity Q obtained on the eight single-layer networks in Table 7 is 0.006, and the corresponding average standard deviation of the multiplex modularity Qm on the six multiplex networks in Table 9 is 0.0021. These low standard deviations show that the proposed method discovers accurate results that do not depend on specific conditions and are not obtained by chance. The analysis of single-layer community detection demonstrates the capability of MOEA/D-TS compared with other evolutionary-based and non-evolutionary algorithms. According to Tables 4 and 5, MOEA/D-TS performs better than the other algorithms and achieves the best modularity Q on the Zachary Karate Club, Dolphins, Polbooks, and Jazz networks. The only exceptions are MENSGA and MOEA/D on the Football network and LGA and CC-GA on the Polbooks dataset, where the differences are not statistically significant. Compared with the non-evolutionary community detection algorithms, MOEA/D-TS obtains the best results on the Zachary Karate Club, Dolphins, E.coli, and NetScience networks, as shown in Table 7. On the Zachary Karate Club, CC-GA, LPA, and InfoMap also perform well, but LinkCommunity, an overlapping community detection algorithm, cannot compete with the other algorithms, and ELPA, COPRA, and NeTa obtain results in the same range. On the Dolphins network, MOEA/D-TS outperforms all other methods with the highest modularity value.
For the Dolphins network, LPA, ELPA, and InfoMap also produce acceptable results. MOEA/D-TS and CC-GA perform identically on the Polbooks network, both obtaining a modularity value of 0.527, and InfoMap comes very close to them. MOEA/D-TS outperforms all other approaches on the E.coli network, where NeTa obtains the second-highest modularity value. Likewise, on the NetScience network MOEA/D-TS achieves the best modularity value and outperforms all other methods. On the Football network, MOEA/D-TS outperforms all algorithms except NeTa, which ranks first. Even so, the maximum modularity obtained by MOEA/D-TS is very close to that of NeTa (0.6044 versus 0.61). The situation is similar on the Protein network, where MOEA/D-TS obtains Q = 0.846 against Q = 0.867 for NeTa. On the Facebook network, CC-GA ranks first with Q = 0.809, and InfoMap and MOEA/D-TS share second place with Q = 0.796. Although MOEA/D-TS does not rank first on the Football, Protein, and Facebook networks, it remains among the best-performing algorithms. Moreover, MOEA/D-TS achieves above-average modularity on every tested network relative to all other evolutionary and non-evolutionary algorithms.

For multiplex community detection, the proposed method was evaluated using the redundancy Rc and the multiplex modularity Qm. Table 8 shows that the maximum Qm obtained by BGLL on the separate layers of the Kapferer Tailor Shop network is 0.2179. MOEA-MultiNet reaches a maximum multiplex modularity of 0.2094, while the best value obtained by our method is 0.2132.
The average Rc values obtained by BGLL and MOEA-MultiNet on the Kapferer Tailor Shop network are 0.4717 and 0.4735, respectively, and our method increases this to 0.4904. On the CS-Aarhus network, the maximum Qm obtained by BGLL over the five layers is 0.4685. The best value for MOEA-MultiNet is 0.4010, while the proposed method achieves Qm = 0.4097. The best average Rc on this network, 0.3206, is obtained by MOEA/D-TS, and the corresponding values for BGLL and MOEA-MultiNet are 0.2715 and 0.3186, respectively. These results indicate that MOEA/D-TS is superior to MOEA-MultiNet and to most single-layer methods in discovering the shared community structure of multiplex networks. They also indicate that the optimal shared community structure of a multilayer network cannot be obtained from a single layer alone.

According to Table 9, MOEA/D-TS has the highest multiplex modularity on all tested multiplex networks. Next to our method, MMCD also achieves high Qm values on all of them. For example, on the London Transport multiplex network, MOEA/D-TS and MMCD obtain the highest Qm values, 0.8014 and 0.7956, respectively, while the other algorithms obtain no more than 0.7892. Both algorithms define the shared community structure using link properties and incorporate this network-specific knowledge into their search methods. This allows both to find shared communities with high Qm values in multiplex networks. Of the two, MOEA/D-TS obtains the higher values of both Qm and Qma (the maximum and average multiplex modularity) on all tested multiplex networks. The average standard deviation of the obtained multiplex modularity over 30 independent runs on all datasets is 0.0021, which indicates that the proposed method is stable. The comparison of average Rc in Table 10 shows that MOEA/D-TS performs best on the London Transport, Physicians Innovation, and Celegans GPI multiplex networks.
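The two multiplex scores used in this comparison can be illustrated with a small sketch. This section does not restate their definitions, so the Python code below makes two assumptions. First, it treats Qm as the mean, over layers, of the Newman modularity of the shared partition. Second, it treats Rc as the redundancy measure proposed by Berlingerio and colleagues for multidimensional networks, averaged over communities. The paper's exact formulas may differ. All names (`make_layer`, `multiplex_modularity`, `redundancy`, and so on) are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]
Layer = Set[Edge]  # undirected edges of one layer, stored as (smaller, larger)


def pair(u: int, v: int) -> Edge:
    """Canonical (smaller, larger) form of an undirected edge."""
    return (u, v) if u < v else (v, u)


def make_layer(edges: Iterable[Edge]) -> Layer:
    return {pair(u, v) for u, v in edges}


def layer_modularity(layer: Layer, communities: List[Set[int]]) -> float:
    """Newman modularity of the shared partition on one layer (layer must have edges)."""
    m = len(layer)
    degree: Dict[int, int] = {}
    for u, v in layer:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for com in communities:
        l_c = sum(1 for u, v in layer if u in com and v in com)
        d_c = sum(degree.get(u, 0) for u in com)
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q


def multiplex_modularity(layers: List[Layer], communities: List[Set[int]]) -> float:
    """Assumed Qm: mean over layers of the per-layer modularity."""
    return sum(layer_modularity(layer, communities) for layer in layers) / len(layers)


def redundancy(layers: List[Layer], community: Set[int]) -> float:
    """Redundancy of one community.

    P     : node pairs inside the community linked in at least one layer
    P_bar : pairs of P linked in at least two layers
    value : sum over P_bar of (number of layers linking the pair) / (|layers| * |P|)
    """
    linked = 0
    redundant = 0
    for u, v in combinations(sorted(community), 2):
        count = sum(1 for layer in layers if pair(u, v) in layer)
        if count >= 1:
            linked += 1
        if count >= 2:
            redundant += count
    return redundant / (len(layers) * linked) if linked else 0.0


def average_redundancy(layers: List[Layer], communities: List[Set[int]]) -> float:
    return sum(redundancy(layers, c) for c in communities) / len(communities)


if __name__ == "__main__":
    l1 = make_layer([(0, 1), (1, 2), (3, 4), (4, 5), (2, 3)])
    l2 = make_layer([(0, 1), (0, 2), (3, 4), (3, 5)])
    shared = [{0, 1, 2}, {3, 4, 5}]
    print(round(multiplex_modularity([l1, l2], shared), 4))  # 0.4
    print(round(average_redundancy([l1, l2], shared), 4))    # 0.3333
```

In this toy example, only one intra-community pair per community appears in both layers, so Rc stays low even though Qm is moderate. This mirrors the observation that high Qm and high Rc do not necessarily go together.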
On London Transport, Physicians Innovation, and Celegans GPI, MOEA/D-TS improves the average Rc by 9.9%, 2.2%, and 14.6%, respectively. On Plasmodium GPI, MMCD obtains the maximum Rc of 0.0032, and our method obtains the second-best value, 0.0028. Cinfomap achieves the best values on Arabidopsis GPI and MUS GPI. On MUS GPI, MOEA/D-TS ranks third, behind Cinfomap and CLPAm, whereas on Arabidopsis GPI it ranks fourth. Nevertheless, the Rc obtained by MOEA/D-TS is above average on all tested networks except Arabidopsis GPI. The results in Tables 9 and 10 show no clear correlation between Qm and Rc. Hence, finding a shared community structure with both high Qm and high Rc is challenging for most of the compared algorithms. Even so, MOEA/D-TS outperforms MMCD on all six networks in Qm, and on all except Plasmodium GPI in Rc.

Both algorithms follow a similar scheme. They use a global search to discover promising community regions in the whole search space and then a local search to obtain more precise results. Despite this resemblance, MOEA/D-TS has specific features that give it an advantage. (i) MOEA/D-TS generates a high-quality initial population (IP) in which communities are formed from densely connected components of the network and the network is split along local bridges (Said et al., 2018). With a high-quality IP, the algorithm can reach the global optimum in fewer iterations. (ii) MOEA/D-TS decomposes a multi-objective problem (MOP) into several scalar subproblems that are optimized simultaneously using information from their neighboring subproblems (Gong et al., 2012). This makes MOEA/D-TS well suited to large-scale real-world optimization problems.

5. Conclusion and future works

In this paper, we used an improved multi-objective evolutionary algorithm (MOEA/D-TS) for community detection in complex networks. In this algorithm, the Clustering Coefficient (CC) measure was employed to generate the IP and to determine the ordering of the network layers. The results indicate that using the CC measure to generate the IP in MOEA/D-TS provides a high-quality population from the initial stages, which improves the overall convergence of the proposed method. The empirical study compares MOEA/D-TS with various community detection algorithms on nine single-layer and eight multiplex networks of different sizes from various domains. The experimental results indicate that the improved MOEA/D-TS obtains acceptable shared community structures in multiplex networks and performs better than many state-of-the-art community detection algorithms. This work focused on disjoint (non-overlapping) community structures, so the challenging problem of overlapping community detection in multiplex networks will be investigated in future work. Moreover, many real-world complex networks have thousands of layers and millions of active users with complex community structures. Current algorithms do not perform well on such networks, so parallel computing methods are a promising option for addressing this problem in future research.
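The CC-based initialization summarized above can be sketched as follows. This is a hypothetical illustration loosely in the spirit of CC-GA (Said et al., 2018), not the procedure used in this paper. It assumes a locus-based encoding in which each node links to a neighbor with the maximal local clustering coefficient, with ties broken at random. Each individual is then decoded into communities as the connected components of those links. The function names are illustrative, and the CC-based layer ordering is not shown.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected adjacency sets


def clustering_coefficient(graph: Graph, v: int) -> float:
    """Local clustering coefficient of v (Watts & Strogatz, 1998)."""
    nbrs = list(graph[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in graph[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def cc_genotype(graph: Graph, rng: random.Random) -> Dict[int, int]:
    """Hypothetical locus-based individual: each node points to a neighbor
    with maximal clustering coefficient (ties broken at random)."""
    cc = {v: clustering_coefficient(graph, v) for v in graph}
    genes: Dict[int, int] = {}
    for v, nbrs in graph.items():
        if not nbrs:
            genes[v] = v  # an isolated node forms its own community
            continue
        best = max(cc[u] for u in nbrs)
        genes[v] = rng.choice(sorted(u for u in nbrs if cc[u] == best))
    return genes


def decode(genes: Dict[int, int]) -> List[Set[int]]:
    """Communities are the connected components of the links v -> genes[v]."""
    parent = {v: v for v in genes}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for v, u in genes.items():
        parent[find(v)] = find(u)
    groups: Dict[int, Set[int]] = {}
    for v in genes:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def initial_population(graph: Graph, size: int, seed: int = 0) -> List[Dict[int, int]]:
    rng = random.Random(seed)
    return [cc_genotype(graph, rng) for _ in range(size)]
```

On two triangles joined by a single bridge, every individual decodes to the two triangles. The bridge endpoints have a lower clustering coefficient than their triangle neighbors, so no link crosses the bridge. Because randomness enters only through tie-breaking, a practical initializer would likely add further perturbation to keep the population diverse.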

Authors’ contribution statements

FK implemented the computer code, performed the experiments, and prepared the original manuscript. SL led the design and coordination of the study. SL and FK were responsible for computational modeling and carried out all experiments and statistical data analysis. FK, SL, and HI made substantial contributions to the conception, development of methodology, and verification of experimental results. All authors reviewed the manuscript, discussed the results, and approved the final manuscript.

CRediT author statement

Fatemeh Karimi: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing-Original Draft, Writing-Review & Editing. Shahriar Lotfi: Conceptualization, Methodology, Validation, Formal analysis, Supervision, Project administration, Writing-Review & Editing. Habib Izadkhah: Conceptualization, Methodology, Validation, Writing-Review & Editing.

References

Ahn, Y.-Y., Bagrow, J. P., & Lehmann, S. (2010). Link communities reveal multiscale complexity in networks. Nature, 466(7307), 761.
Amelio, A., & Pizzuti, C. (2014). Community detection in multidimensional networks. 2014 IEEE 26th International Conference on Tools with Artificial Intelligence, 352–359.
Benson, A. R., Gleich, D. F., & Leskovec, J. (2016). Higher-order organization of complex networks. Science, 353(6295), 163–166.
Berlingerio, M., Pinelli, F., & Calabrese, F. (2013). Abacus: frequent pattern mining-based community discovery in multidimensional networks. Data Mining and Knowledge Discovery, 27(3), 294–320.
Boccaletti, S., Bianconi, G., Criado, R., Del Genio, C. I., Gómez-Gardenes, J., Romance, M., Zanin, M. (2014). The structure and dynamics of multilayer networks. Physics Reports, 544(1), 1–122.
Coello, C. A. C., & Lechuga, M. S. (2002). MOPSO: A proposal for multiple objective particle swarm optimization. Proceedings of the 2002 Congress on Evolutionary Computation. CEC’02 (Cat. No. 02TH8600), 2, 1051–1056.
Danon, L., Díaz-Guilera, A., & Arenas, A. (2006). The effect of size heterogeneity on community identification in complex networks. Journal of Statistical Mechanics: Theory and Experiment, 2006(11), P11010.
Das, P. K., Behera, H. S., & Panigrahi, B. K. (2016). A hybridization of an improved particle swarm optimization and gravitational search algorithm for multi-robot path planning. Swarm and Evolutionary Computation, 28, 14–28.
De Domenico, M., Lancichinetti, A., Arenas, A., & Rosvall, M. (2015). Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems. Physical Review X, 5(1), 011027.
De Domenico, M., Nicosia, V., Arenas, A., & Latora, V. (2015). Structural reducibility of multilayer networks. Nature Communications, 6, 6864.
Fern, X. Z., & Brodley, C. E. (2004). Solving cluster ensemble problems by bipartite graph partitioning. Proceedings of the Twenty-First International Conference on Machine Learning, 36.
Fortunato, S., & Barthelemy, M. (2007). Resolution limit in community detection. Proceedings of the National Academy of Sciences, 104(1), 36–41.
García, S., Molina, D., Lozano, M., & Herrera, F. (2009). A study on the use of non-parametric tests for analyzing the evolutionary algorithms’ behaviour: A case study on the CEC’2005 Special Session on Real Parameter Optimization. Journal of Heuristics, 15(6), 617–644. https://doi.org/10.1007/s10732-008-9080-4
Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12), 7821–7826.
Gleiser, P. M., & Danon, L. (2003). Community structure in jazz. Advances in Complex Systems, 6(04), 565–573.
Glover, F. (1995). Tabu search fundamentals and uses. Graduate School of Business, University of Colorado Boulder.
Gog, A., Dumitrescu, D., & Hirsbrunner, B. (2007). Community detection in complex networks using collaborative evolutionary algorithms. European Conference on Artificial Life, 886–894.
Gong, M., Fu, B., Jiao, L., & Du, H. (2011). Memetic algorithm for community detection in networks. Physical Review E, 84(5), 056101.
Gong, M., Ma, L., Zhang, Q., & Jiao, L. (2012). Community detection in networks by using multiobjective evolutionary algorithm with decomposition. Physica A: Statistical Mechanics and Its Applications, 391(15), 4050–4060.
Gregory, S. (2010). Finding overlapping communities in networks by label propagation. New Journal of Physics, 12(10), 103018.
Guerrero, M., Montoya, F. G., Baños, R., Alcayde, A., & Gil, C. (2017). Adaptive community detection in complex networks using genetic algorithms. Neurocomputing, 266, 101–113.
Hmimida, M., & Kanawati, R. (2015). Community detection in multiplex networks: A seed-centric approach. Networks and Heterogeneous Media, 10(1), 71–85.
Ho-Huu, V., Hartjes, S., Visser, H. G., & Curran, R. (2018). An improved MOEA/D algorithm for bi-objective optimization problems with complex Pareto fronts and its application to structural optimization. Expert Systems with Applications, 92, 430–446.
Holland, J. H. (1992). Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press.
Interdonato, R., Tagarelli, A., Ienco, D., Sallaberry, A., & Poncelet, P. (2017). Local community detection in multilayer networks. Data Mining and Knowledge Discovery, 31(5), 1444–1479.
Ju, Y., Zhang, S., Ding, N., Zeng, X., & Zhang, X. (2016). Complex network clustering by a multi-objective evolutionary algorithm based on decomposition and membrane structure. Scientific Reports, 6, 33870.
Kanawati, R. (2015). Empirical evaluation of applying ensemble methods to ego-centred community identification in complex networks. Neurocomputing, 150, 417–427.
Kim, J., & Lee, J.-G. (2015). Community detection in multi-layer graphs: A survey. ACM SIGMOD Record, 44(3), 37–48.
Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y., & Porter, M. A. (2014). Multilayer networks. Journal of Complex Networks, 2(3), 203–271.
Kuncheva, Z., & Montana, G. (2015). Community detection in multiplex networks using locally adaptive random walks. Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, 1308–1315.
Lancichinetti, A., & Fortunato, S. (2009). Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E, 80(1), 016118.
Lancichinetti, A., & Fortunato, S. (2012). Consensus clustering in complex networks. Scientific Reports, 2, 336.
Lancichinetti, A., Fortunato, S., & Kertész, J. (2009). Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3), 033015.
Latapy, M. (2008). Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science, 407(1–3), 458–473.
Leskovec, J., & McAuley, J. J. (2012). Learning to discover social circles in ego networks. Advances in Neural Information Processing Systems, 539–547.
Li, S., Chen, Y., Du, H., & Feldman, M. W. (2010). A genetic algorithm with local search strategy for improved detection of community structure. Complexity, 15(4), 53–60.
Li, Yixuan, He, K., Bindel, D., & Hopcroft, J. E. (2015). Uncovering the small community structure in large networks: A local spectral approach. Proceedings of the 24th International Conference on World Wide Web, 658–668.
Li, Yun, Liu, G., & Lao, S. (2013). A genetic algorithm for community detection in complex networks. Journal of Central South University, 20(5), 1269–1276.
Liu, Wei, Jiang, X., Pellegrini, M., & Wang, X. (2016). Discovering communities in complex networks by edge label propagation. Scientific Reports, 6, 22470.
Liu, Wei, Pellegrini, M., & Wang, X. (2014). Detecting communities based on network topology. Scientific Reports, 4, 5739.
Liu, Wenfeng, Wang, S., Gong, M., & Zhang, M. (2017). An improved multiobjective evolutionary approach for community detection in multilayer networks. 2017 IEEE Congress on Evolutionary Computation (CEC), 443–449.
Lotfi, S., & Karimi, F. (2017). A Hybrid MOEA/D-TS for Solving Multi-Objective Problems. Journal of AI and Data Mining, 5(2), 183–195. https://doi.org/10.22044/jadm.2017.886
Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., Slooten, E., & Dawson, S. M. (2003). The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54(4), 396–405.
Ma, L., Gong, M., Liu, J., Cai, Q., & Jiao, L. (2014). Multi-level learning based memetic algorithm for community detection. Applied Soft Computing, 19, 121–133.
Ma, L., Gong, M., Yan, J., Liu, W., & Wang, S. (2018). Detecting composite communities in multiplex networks: A multilevel memetic algorithm. Swarm and Evolutionary Computation, 39, 177–191.
Marra, M., Emrouznejad, A., Ho, W., & Edwards, J. S. (2015). The value of indirect ties in citation networks: SNA analysis with OWA operator weights. Information Sciences, 314, 135–151.
Mei, Y., Tang, K., & Yao, X. (2011). Decomposition-based memetic algorithm for multiobjective capacitated arc routing problem. IEEE Transactions on Evolutionary Computation, 15(2), 151–165.
Mucha, P. J., Richardson, T., Macon, K., Porter, M. A., & Onnela, J.-P. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980), 876–878.
Newman, M. E. J. (2004). Fast algorithm for detecting community structure in networks. Physical Review E, 69(6), 066133.
Newman, M. E. J. (2006). Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3), 036104.
Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69(2), 026113.
Pizzuti, C. (2008). GA-Net: A genetic algorithm for community detection in social networks. International Conference on Parallel Problem Solving from Nature, 1081–1090.
Pizzuti, C. (2009). A multi-objective genetic algorithm for community detection in networks. 2009 21st IEEE International Conference on Tools with Artificial Intelligence, 379–386.
Raghavan, U. N., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3), 036106.
Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4), 1118–1123.
Said, A., Abbasi, R. A., Maqbool, O., Daud, A., & Aljohani, N. R. (2018). CC-GA: A clustering coefficient based genetic algorithm for detecting communities in social networks. Applied Soft Computing, 63, 59–70.
Salwinski, L., Miller, C. S., Smith, A. J., Pettit, F. K., Bowie, J. U., & Eisenberg, D. (2004). The database of interacting proteins: 2004 update. Nucleic Acids Research, 32(suppl_1), D449–D451.
Seifi, M. (2012). Cœurs stables de communautés dans les graphes de terrain. Paris 6.
Shang, R., Bai, J., Jiao, L., & Jin, C. (2013). Community detection based on modularity and an improved genetic algorithm. Physica A: Statistical Mechanics and Its Applications, 392(5), 1215–1231.
Shen-Orr, S. S., Milo, R., Mangan, S., & Alon, U. (2002). Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics, 31(1), 64.
Srinivas, S., & Rajendran, C. (2019). Community Detection and Influential Node Identification in Complex Networks using Mathematical Programming. Expert Systems with Applications.
Strehl, A., & Ghosh, J. (2002). Cluster ensembles – a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3(Dec), 583–617.
Tang, L., & Liu, H. (2010). Community detection and mining in social media. Synthesis Lectures on Data Mining and Knowledge Discovery, 2(1), 1–137.
Tang, L., Wang, X., & Liu, H. (2009). Uncovering groups via heterogeneous interaction analysis. 2009 Ninth IEEE International Conference on Data Mining, 503–512.
Tasgin, M., Herdagdelen, A., & Bingol, H. (2007). Community detection in complex networks using genetic algorithms. arXiv preprint arXiv:0711.0491.
Wang, Z., & Rangaiah, G. P. (2017). Application and analysis of methods for selecting an optimal solution from the Pareto-optimal front obtained by multiobjective optimization. Industrial & Engineering Chemistry Research, 56(2), 560–574.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature, 393(6684), 440.
Wu, P., & Pan, L. (2015). Multi-objective community detection based on memetic algorithm. PLoS ONE, 10(5), e0126845.
Xin, Y., Xie, Z.-Q., & Yang, J. (2016). An adaptive random walk sampling method on dynamic community detection. Expert Systems with Applications, 58, 10–19.
Xu, Y., Xu, H., & Zhang, D. (2015). A novel disjoint community detection algorithm for social networks based on backbone degree and expansion. Expert Systems with Applications, 42(21), 8349–8360.
Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4), 452–473.
Zakrzewska, A., & Bader, D. A. (2015). A dynamic algorithm for local community detection in graphs. Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, 559–564.
Zhang, Q., & Li, H. (2007). MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation, 11(6), 712–731.
Zhang, Q., Liu, W., Tsang, E., & Virginas, B. (2009). Expensive multiobjective optimization by MOEA/D with Gaussian process model. IEEE Transactions on Evolutionary Computation, 14(3), 456–474.
Zhou, L., Lü, K., Yang, P., Wang, L., & Kong, B. (2015). An approach for overlapping and hierarchical community detection in social networks based on coalition formation game theory. Expert Systems with Applications, 42(24), 9634–9646.
Zou, F., Chen, D., Li, S., Lu, R., & Lin, M. (2017). Community detection in complex networks: Multi-objective discrete backtracking search optimization algorithm with decomposition. Applied Soft Computing, 53, 285–295.

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

