Information Sciences 422 (2018) 290–304
Contents lists available at ScienceDirect
Information Sciences journal homepage: www.elsevier.com/locate/ins
A two-level learning strategy based memetic algorithm for enhancing community robustness of networks
Wenfeng Liu, Maoguo Gong∗, Shanfeng Wang, Lijia Ma
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Xidian University, Xi’an, Shaanxi Province 710071, China
Article info
Article history: Received 30 November 2016; Revised 28 August 2017; Accepted 4 September 2017; Available online 8 September 2017
Keywords: Complex network; Memetic algorithm; Robustness; Community structure
Abstract
Community structure is a natural and inherent property of complex networks which can reflect their potential functionality. When the robustness of a network is improved, its community structure should be preserved as much as possible. However, most earlier studies only considered enhancing the network robustness and ignored the analysis of the community structure, which may alter the original topological structure and functionality of networks. In this paper, we propose a new memetic algorithm (MA-CR) with a two-level learning strategy to enhance the community robustness of networks while maintaining the degree distribution and community structure. The proposed MA-CR is a hybrid global-local heuristic search methodology which adopts a genetic algorithm as the global search and the proposed two-level learning strategy as the local search. The two-level learning strategy is designed based on the potential characteristics of the node structure and community structure of networks, and aims at mitigating two-level targeted attacks. Experiments on synthetic scale-free networks as well as real-world networks demonstrate the effectiveness and stability of the proposed algorithm compared with several state-of-the-art algorithms. © 2017 Elsevier Inc. All rights reserved.
∗ Corresponding author. E-mail address: [email protected] (M. Gong).
http://dx.doi.org/10.1016/j.ins.2017.09.021 0020-0255/© 2017 Elsevier Inc. All rights reserved.

1. Introduction

Complex networks are widely adopted as models to explore the properties of complex systems in terms of structure, stability and functionality [2,7,11,19,20,27,35,36]. Among the properties of complex networks, community structure is identified as an intrinsic and important one for reflecting the functionality of complex systems [12]. The community structure represents the functional modules in networks. In the domain of complex networks, communities are described as subsets of networks. Generally, the connections within communities of a network are denser than those among communities [29,33], and nodes belonging to the same community probably have the same or similar properties [39]. For instance, in a protein-protein interaction network [12], communities are functional modules of proteins with the same or similar functions. In aviation networks, communities correspond to the sets of airlines that have more frequent traffic activities.

Robustness is another important property, reflecting the capability of a complex network to remain secure. When networks suffer from random failures or malicious attacks, the breakdown of a critical fraction of nodes or edges can lead to the collapse of networks [35]. In recent years, many methods for enhancing the robustness of networks have been proposed [14,15,35]. Generally, these methods can be divided into three categories [40]. The first category simply adds edges to existing networks [5,16]. Those methods can enhance network robustness, but adding extra edges to a network may change its degree distribution and community structure. Without adding new edges, the second category makes all nodes have similar degrees by reconnecting edges [17]. This solution may significantly enhance network robustness, but it changes nearly all the connections of the network. Moreover, it also influences the degree distribution and the community structure of the network. The third category tends to overcome the obstacles of earlier methods by swapping two randomly chosen edges [24,35], so that the network robustness can be significantly enhanced while the degree distribution of the network is retained. Nevertheless, it may still break the community structure of the network.

In summary, all three categories of methods for robustness enhancement fail to preserve the community structure of networks. Since altering the community structure of a network may damage its topological structure and functionality, the original community structure should be maintained as much as possible.

Traditional measures of network robustness overlook situations in which a network suffers destructive damage without fully collapsing. To solve this problem, Schneider et al. [35] put forward a robustness measure $R_n$, which considers the size of the largest connected component as nodes are removed one by one, namely $R_n = \frac{1}{N}\sum_{q=1}^{N} s(q)$, where $s(q)$ is the fraction of nodes in the largest connected component after $q$ nodes are removed, and $1/N$ is the normalization factor that allows the robustness of networks with different scales to be compared. Generally, a larger $R_n$ indicates a more robust network. After that, Zeng and Liu proposed a new robustness measure, the link robustness $R_l$, to evaluate network robustness under link attacks [41].
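As a rough illustration of the node robustness measure $R_n$ described above, the following sketch removes the node with the highest current degree at each step and averages the relative size of the largest connected component. It is a minimal sketch, not the implementation used in [35]: the adjacency-dictionary representation, the function names `largest_component_size` and `node_robustness`, the recomputation of degrees after every removal and the tie-breaking by node label are all assumptions made here for illustration.

```python
from collections import deque


def largest_component_size(adj, alive):
    """Size of the largest connected component induced by the nodes in `alive`."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def node_robustness(adj):
    """R_n = (1/N) * sum_{q=1..N} s(q) under a highest-degree node attack."""
    n = len(adj)
    alive = set(adj)
    total = 0.0
    for _ in range(n):
        # Assumption: degrees are recomputed on the surviving graph, and ties
        # are broken in favour of the smallest node label.
        target = max(sorted(alive),
                     key=lambda u: sum(v in alive for v in adj[u]))
        alive.remove(target)
        # s(q): fraction of the original N nodes in the largest component.
        total += largest_component_size(adj, alive) / n
    return total / n


# Hypothetical toy graphs: a 5-node star and a 4-node complete graph.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
complete4 = {u: {v for v in range(4) if v != u} for u in range(4)}
print(node_robustness(star), node_robustness(complete4))
```

On these toy graphs the star scores lower than the complete graph, since removing its hub immediately isolates every remaining node.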
Note that the aforementioned measures are both simple and effective, but neither of them considers the integrity of the community structure. In our previous paper [21], we modeled the malicious attack on networks as a two-level targeted one, consisting of a small-scale node attack and a large-scale community attack. Then, we presented a community robustness measure $R_c$ to quantify the community integrity of networks under the two-level targeted attacks. Moreover, in order to preserve the original community structure, we proposed a new constraint that keeps the number of intracommunity links in every community unchanged when links are changed.

Evolutionary algorithms (EAs), due to their inherent global searching abilities, have been found to be very effective for solving many hard optimization problems [13,23,34]. Memetic algorithms (MAs) are an important and popular branch of EAs. MAs are inspired by natural systems and introduce individual learning into population evolution. The term “memetic” derives from Dawkins’ concept of a meme, which can perform local refinements [10]. MAs therefore combine the capabilities of global search and local search, and have been demonstrated to be more efficient than traditional EAs in searching for solutions to many optimization problems [8,18,28,30–32,37,43].

In this study, we propose a new memetic algorithm to enhance the community robustness of networks under two constraints: the degree distribution and the number of intracommunity links of each community must remain unchanged when links are changed. The proposed algorithm, termed MA-CR, adopts a genetic algorithm (GA) [26] as the global search and the proposed two-level learning strategy as the local search. In standard GAs, the evolution process begins with a population of random individuals.
In each iteration, several individuals are first selected from the current population according to their fitness values, and then the recombination and mutation operations are performed to construct a new offspring population for the next iteration. The proposed two-level learning strategy is devised based on the potential characteristics of the node structure and community structure of networks, and aims at mitigating the small-scale node attacks and the large-scale community attacks, respectively. Experiments on both synthetic scale-free networks and two real-world networks reveal that, compared with several state-of-the-art algorithms, the proposed MA-CR finds more robust networks under the two constraints and has better stability.

The remainder of the paper is organized as follows. Section 2 introduces the two-level targeted attacks and the community robustness index. Section 3 gives a detailed description of the proposed algorithm MA-CR. Section 4 conducts the experiments on the synthetic scale-free networks with different scales and two diverse real-world networks to demonstrate the effectiveness of MA-CR. Conclusions are summarized in the final section.

2. Community robustness of networks against two-level targeted attacks

Generally, a network can be modeled as a graph $G = (V, E)$, where $V = \{1, 2, \ldots, N\}$ is the set of $N$ nodes, and $E = \{e_{ij} = (i, j) \mid i, j \in V \text{ and } i \neq j\}$ denotes the set of $M$ edges. In this paper, we mainly focus on undirected and unweighted networks. From another perspective, a network can also be represented as $C = (S, E)$, where $S = \{s_1, s_2, \ldots, s_k\}$ denotes the set of communities of the network and, in this representation, $E = \{c_{ij} = (s_i, s_j) \mid s_i, s_j \in S \text{ and } s_i \neq s_j\}$ is the set of connections between communities. The value of $c_{ij}$ is the number of intercommunity links between communities $s_i$ and $s_j$.
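To make the community-level representation concrete, the sketch below derives the connection counts $c_{ij}$ and the per-community intracommunity link counts from an edge list and a node-to-community map. The function names, the dictionary-based data layout and the toy graph are hypothetical choices for illustration; the paper itself does not prescribe a data structure.

```python
from collections import Counter


def community_network(edges, membership):
    """Return {(s_a, s_b): c_ab} with s_a < s_b, counting intercommunity links."""
    counts = Counter()
    for i, j in edges:
        a, b = membership[i], membership[j]
        if a != b:
            counts[(min(a, b), max(a, b))] += 1
    return dict(counts)


def intra_link_counts(edges, membership):
    """Return {s: number of intracommunity links inside community s}."""
    counts = Counter()
    for i, j in edges:
        if membership[i] == membership[j]:
            counts[membership[i]] += 1
    return dict(counts)


# Hypothetical toy graph: two triangles joined by a single bridge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
membership = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
print(community_network(edges, membership))
print(intra_link_counts(edges, membership))
```

The second function is the quantity that the intracommunity-link constraint keeps fixed for every community.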
Before giving a detailed description of our algorithm, we summarize our previous work [21] to briefly introduce the concepts of the two-level targeted attacks and the community robustness index.

2.1. Multi-level targeted attacks model

In our previous paper [21], besides the damage to nodes or edges of networks, we considered a possible scenario where attacks could also occur on the communities of networks. Moreover, the damage induced by attacking communities is much greater than that induced by attacking certain nodes or edges. Then, we modeled the malicious attack on networks as a two-level targeted procedure. The first level is defined as the small-scale attack on the nodes which have the highest degree, and the second level is the large-scale attack on the communities that have the maximal number of interconnections. These two levels of targeted attacks may occur successively or simultaneously. The successive case is more common in real-world applications and can be modeled as a weighted attack strategy [21]. We thus mainly focus on the weighted attack strategy in this study.

Fig. 1 shows the community detection results of a toy network and a schematic illustration of the weighted attack strategy on the toy network. By using the community detection algorithm BGLL [6], the toy network can be separated into several small parts. As we can see in Fig. 1(a), the toy network G1 is clustered into 3 communities (plotted with different colors). Fig. 1(b) shows an example of the weighted attack strategy on the toy network G1. The damage is caused by the attacks on either nodes or communities.

Fig. 1. Community detection results on the toy network G1 and a schematic illustration of the weighted attack strategy on G1. (a) Toy network G1 features 12 nodes and 16 links, which can be divided into 3 communities. (b) The weighted attack strategy on G1. Different communities are plotted with different colors.

2.2. Definition of community robustness

Based on the proposed attack model, we designed a measure, the community robustness $R_c$, to evaluate the community integrity of networks under the proposed targeted attacks [21]. The definition of community robustness is given as follows:

$$R_c = \frac{1}{t}\sum_{q=1}^{t}\frac{1}{k}\sum_{p=1}^{k} s(p, q) \tag{1}$$
where $t$ is the number of targeted attacks on the network, $k$ is the number of communities in the network and $s(p, q)$ denotes the fraction of the remaining functional nodes in community $p$ when the $q$th attack occurs. $1/t$ and $1/k$ are normalization factors that make the robustness of networks with different numbers of attacks and communities comparable. A higher $R_c$ indicates that the community structure of the network is more robust.

Under the weighted attack strategy, the two levels of targeted attacks are considered separately. Under the small-scale targeted attack, the community robustness of a network is defined as [21]:

$$R_{c1} = \frac{1}{N}\sum_{q=1}^{N}\frac{1}{k}\sum_{p=1}^{k} s(p, q) \tag{2}$$

where $N$ is the number of nodes in the network and $s(p, q)$ denotes the fraction of the remaining functional nodes in community $p$ when $q$ nodes are attacked. $S_1(q) = \frac{1}{k}\sum_{p=1}^{k} s(p, q)$ denotes the community integrity of the network after the removal of $q$ nodes.

Under the large-scale targeted attack, the community robustness index is defined as [21]:

$$R_{c2} = \frac{1}{k}\sum_{u=1}^{k} S_2(u) \tag{3}$$

where $S_2(u)$ is the community integrity of the network when $u$ communities are removed. In this study, we define a community network $C$ whose nodes are the communities of the original network and whose edges are the connections among these communities. Therefore, $R_{c2}$ actually measures the node integrity of the community network.

Under the weighted attack strategy, the community robustness of a network is a weighted combination of the two [21]:

$$R_c = \alpha R_{c1} + (1 - \alpha) R_{c2} \tag{4}$$
where α is a weighting parameter. If 0 ≤ α < 0.5, the network mainly suffers from the large-scale targeted community attacks. If 0.5 < α ≤ 1, the network mainly suffers from the small-scale targeted node attacks. In the following text, we adopt $R_c$ as our objective function, which is also called the fitness function in EAs. The goal is to maximize $R_c$ under the aforementioned two constraints.

3. The proposed memetic algorithm for enhancing community robustness of networks

In this section, we describe in detail the proposed memetic algorithm MA-CR. First, the PopulationInitialization() procedure generates the initial population. The CrossoverOperation() function performs the crossover operation, and the LocalSearch() function performs the local search operation. The UpdatePopulation() procedure constructs a population P that combines the parent population Pμ and the offspring population Pλ. The TwoTournamentSelection() function generates the next population for mating. Two-tournament selection runs a tournament between two randomly chosen individuals in the current population, and the individual with the better fitness is selected for the next population. The new population Pμ for the next generation is generated by this procedure. The framework of MA-CR is given as Algorithm 1.

Algorithm 1 The framework of MA-CR.
Input: The initial network G0; the size of population μ; the crossover probability pc; the local search probability pl.
Pμ ← PopulationInitialization(G0, μ);
repeat
    Pλ ← CrossoverOperation(Pμ, pc);
    Pλ ← LocalSearch(Pλ, pl);
    P ← UpdatePopulation(Pμ, Pλ);
    Pμ ← TwoTournamentSelection(P);
until the maximum number of iterations is reached
Output: The network Gbest with the highest robustness.
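The control flow of Algorithm 1 can be sketched as a generic loop in which the operators are passed in as functions. This is a minimal sketch under stated assumptions, not the authors' MATLAB code: the names `memetic_search` and `two_tournament_selection`, the callable signatures, the seeded random generator, and the choice to remember the best individual seen in any pool (Algorithm 1 only states that Gbest is output) are all hypothetical.

```python
import random


def two_tournament_selection(population, fitness, mu, rng):
    """Select mu individuals; each is the fitter of two randomly drawn ones."""
    selected = []
    for _ in range(mu):
        a, b = rng.sample(population, 2)
        selected.append(a if fitness(a) >= fitness(b) else b)
    return selected


def memetic_search(g0, mu, max_iter, fitness,
                   initialize, crossover, local_search, seed=0):
    """Skeleton of Algorithm 1 with pluggable (hypothetical) operators."""
    rng = random.Random(seed)
    parents = initialize(g0, mu, rng)                 # PopulationInitialization
    best = max(parents, key=fitness)
    for _ in range(max_iter):
        offspring = crossover(parents, rng)           # CrossoverOperation
        offspring = [local_search(g, rng) for g in offspring]  # LocalSearch
        pool = parents + offspring                    # UpdatePopulation
        # Assumption: keep the best individual ever seen, since tournament
        # selection alone does not guarantee that it survives.
        best = max(pool + [best], key=fitness)
        parents = two_tournament_selection(pool, fitness, mu, rng)
    return best


# Toy usage with integers standing in for networks and identity fitness.
result = memetic_search(
    0, mu=4, max_iter=3, fitness=lambda x: x,
    initialize=lambda g0, mu, rng: [g0] * mu,
    crossover=lambda parents, rng: list(parents),
    local_search=lambda g, rng: g + 1,
)
print(result)
```

In MA-CR the fitness would be $R_c$ and the operators would be those of Algorithms 2 to 5; the integer toy problem only exercises the loop structure.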
3.1. Representation and initialization

In this study, each individual represents the connections of a network, encoded as the adjacency matrix of the network. Each element $A_{ij}$ of the individual is a binary value which indicates whether a link exists between nodes i and j. Thus, a population with μ individuals represents μ networks. In the initialization procedure, each individual is set to the initial network G0. Such a population lacks diversity and quality. Therefore, we then randomly swap a small fraction of the edges of each individual. Any swap operation that satisfies the two constraints is accepted. We repeat this swap operation m times for each individual in the initial population, where m is a predefined parameter. The population initialization procedure is shown in Algorithm 2.

Algorithm 2 Population initialization.
Input: The initial network G0; the size of population μ.
Generate a population Pμ, each individual Gk of which is set to G0, where k = 1, 2, ..., μ.
for each individual Gk do
    counter ← 0;
    repeat
        Randomly select two edges $e_{ij}$ and $e_{kl}$ from Gk;
        if the two links changing constraints are satisfied then
            Remove $e_{ij}$ and $e_{kl}$ from Gk, add $e_{ik}$ and $e_{jl}$ to Gk;
        end if
        counter ← counter + 1;
    until counter = m
end for
Output: Initial population Pμ.
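The constrained swap at the heart of Algorithm 2 can be sketched as follows. This is an illustrative sketch under assumptions: networks are stored as sets of sorted node pairs rather than adjacency matrices, the helper names (`norm`, `intra_profile`, `try_swap`, `initialize_population`) are hypothetical, and the explicit checks against self-loops and duplicate edges are added here because the text does not spell them out. Swapping $e_{ij}, e_{kl}$ into $e_{ik}, e_{jl}$ leaves every node degree unchanged by construction, so only the intracommunity-link constraint needs an explicit test.

```python
import random
from collections import Counter


def norm(u, v):
    """Canonical (sorted) form of an undirected edge."""
    return (u, v) if u < v else (v, u)


def intra_profile(edge_list, membership):
    """Number of intracommunity links per community among the given edges."""
    return Counter(membership[u] for u, v in edge_list
                   if membership[u] == membership[v])


def try_swap(edges, membership, e1, e2):
    """Swap e_ij, e_kl -> e_ik, e_jl in place if allowed; return success."""
    (i, j), (k, l) = e1, e2
    if len({i, j, k, l}) < 4:            # would create a self-loop or no-op
        return False
    new1, new2 = norm(i, k), norm(j, l)
    if new1 in edges or new2 in edges:   # would create a duplicate edge
        return False
    # Constraint: intracommunity link count of every community is unchanged.
    if intra_profile([e1, e2], membership) != intra_profile([new1, new2], membership):
        return False
    edges.difference_update({norm(*e1), norm(*e2)})
    edges.update({new1, new2})
    return True


def initialize_population(g0_edges, membership, mu, m, seed=0):
    """Algorithm 2: mu copies of G0, each perturbed by m attempted swaps."""
    rng = random.Random(seed)
    base = {norm(u, v) for u, v in g0_edges}
    population = []
    for _ in range(mu):
        edges = set(base)
        for _ in range(m):
            e1, e2 = rng.sample(sorted(edges), 2)
            if rng.random() < 0.5:       # assumption: random edge orientation
                e2 = (e2[1], e2[0])
            try_swap(edges, membership, e1, e2)
        population.append(edges)
    return population
```

Rejected swaps still count towards m, mirroring the counter in Algorithm 2, so individuals may differ from G0 in fewer than m swaps.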
3.2. Crossover operation

The crossover operation is another effective search operation in MAs, which can search for solutions in a global area. However, the crossover operation usually works on two individuals, and the problem is how to retain the community structure and the degree distribution of the parent individuals. To overcome this dilemma, we modify the crossover operation in [42] and present a new crossover operation. The new crossover operation works on two parent individuals and keeps the number of intracommunity links of each community and the degree distribution invariant in the two offspring individuals. The motivation of the crossover operation is to swap part of the network structure of the two parent individuals.

Select two individuals Ga and Gb. For each node i, perform the following swap operations with the crossover probability pc. First, obtain the sets of neighbors of the current node i in Ga and Gb and remove their common neighbors. Thus we obtain two totally different neighbor sets: $N_i^{G_a}$ and $N_i^{G_b}$. Next, for each node $j \in N_i^{G_a}$, pick a node $k \in N_i^{G_b}$ that has not been selected. To keep the degree distribution of the network invariant, randomly select another node l such that edge $e_{kl}$ exists in Ga but edge $e_{jl}$ does not exist in Ga. If $e_{ij}$ and $e_{kl}$ in Ga satisfy the other links changing constraint, swap their connections. Gb is processed with the analogous swap operations. This operation generates two offspring individuals. The details of the crossover operation are shown in Algorithm 3.

Algorithm 3 Crossover operation.
Input: Two parent individuals Ga and Gb; the crossover probability pc.
for each node i ∈ Ga do
    Randomly generate a value r within [0, 1];
    if r < pc then
        Obtain $N_i^{G_a}$ and $N_i^{G_b}$;
        for each node $j \in N_i^{G_a}$ do
            Randomly select a node $k \in N_i^{G_b}$;
            Randomly select another node l such that edge $e_{kl}$ exists in Ga but edge $e_{jl}$ does not exist in Ga;
            if $e_{ij}$ and $e_{kl}$ in Ga satisfy the links changing constraints then
                Remove $e_{ij}$ and $e_{kl}$ from Ga, add $e_{ik}$ and $e_{jl}$ to Ga;
            end if
            Randomly select another node m such that edge $e_{jm}$ exists in Gb but edge $e_{km}$ does not exist in Gb;
            if $e_{ik}$ and $e_{jm}$ in Gb satisfy the links changing constraints then
                Remove $e_{ik}$ and $e_{jm}$ from Gb, add $e_{ij}$ and $e_{km}$ to Gb;
            end if
        end for
    end if
end for
Gc ← Ga, Gd ← Gb;
Output: The two child individuals Gc and Gd.
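The crossover of Algorithm 3 can be sketched with one helper that rewires a single edge of node i from an old neighbor to a new one through a degree-preserving swap. This is a minimal sketch under assumptions: edge sets of sorted pairs stand in for adjacency matrices, the names `rewire` and `crossover` are hypothetical, the candidate node is drawn only among nodes distinct from i and the old neighbor, and each k is paired with at most one j by shuffling and zipping the two exclusive neighbor lists.

```python
import random
from collections import Counter


def norm(u, v):
    return (u, v) if u < v else (v, u)


def neighbors(edges, u):
    return {b if a == u else a for a, b in edges if u in (a, b)}


def intra_profile(edge_list, membership):
    """Number of intracommunity links per community among the given edges."""
    return Counter(membership[a] for a, b in edge_list
                   if membership[a] == membership[b])


def rewire(edges, membership, i, old, new, rng):
    """Replace e_{i,old} by e_{i,new} via one constrained swap, if possible."""
    if norm(i, old) not in edges or norm(i, new) in edges:
        return False
    # Candidate l: e_{new,l} exists, e_{old,l} does not, all four nodes differ.
    candidates = [l for l in sorted(neighbors(edges, new))
                  if l not in (i, old) and norm(old, l) not in edges]
    if not candidates:
        return False
    l = rng.choice(candidates)
    removed = [norm(i, old), norm(new, l)]
    added = [norm(i, new), norm(old, l)]
    # Constraint: intracommunity link counts per community stay unchanged.
    if intra_profile(removed, membership) != intra_profile(added, membership):
        return False
    edges.difference_update(removed)
    edges.update(added)
    return True


def crossover(ga, gb, membership, pc, seed=0):
    """Return two children; the parents are left untouched."""
    rng = random.Random(seed)
    gc, gd = set(ga), set(gb)
    for i in sorted({u for e in gc for u in e}):
        if rng.random() >= pc:
            continue
        na, nb = neighbors(gc, i), neighbors(gd, i)
        only_a, only_b = sorted(na - nb), sorted(nb - na)
        rng.shuffle(only_b)
        for j, k in zip(only_a, only_b):          # each k is used at most once
            rewire(gc, membership, i, j, k, rng)  # child c adopts e_ik
            rewire(gd, membership, i, k, j, rng)  # child d adopts e_ij
    return gc, gd
```

Because every accepted rewiring is a two-edge swap that passes the intracommunity check, both children keep the degree sequence and the per-community intracommunity link counts of their parents.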
3.3. Local search operation

To enhance the community robustness of networks without changing the original community structure and the degree distribution, we propose a novel local search operation based on a two-level learning strategy for individual refinement. The proposed local search works on the network at the node level and the community level, which aims at mitigating the small-scale and the large-scale targeted attacks, respectively.

3.3.1. The node-level learning strategy

A node-level learning strategy is designed to make the nodes with high degree connect only with nodes in the same community [40]. The motivation of this strategy is to prevent important intercommunity links from being removed first. If those intercommunity links disappear, the network tends to break into isolated parts, and the community robustness under the small-scale targeted attack declines sharply. In addition, under malicious attacks, nodes with high degree are always removed first. Hence, we swap edges so that those important nodes connect only with nodes in the same communities.

Suppose G is an individual for local refinement. First assign G to G∗; then, for each existing intercommunity link $e_{ij}$ in G∗, perform the following operations with the probability pl. First, determine three quantities $s_i$, $d_i$ and $d_{max}^{s_i}$, where $s_i$ denotes the community that node i belongs to, $d_i$ denotes the degree of node i and $d_{max}^{s_i}$ is the maximal degree in community $s_i$. If the degree of node i is the maximum in community $s_i$, randomly find another edge $e_{kl}$ in $s_i$ such that $e_{ik}$ does not exist. Then remove $e_{ij}$ and $e_{kl}$ from G∗ and add $e_{ik}$ and $e_{jl}$ to G∗. Finally, if the community robustness of G∗ is larger than that of G, assign G∗ to G. The details of the node-level learning strategy are shown in Algorithm 4.

Algorithm 4 Node-level learning strategy.
Input: The individual G; the local search probability pl.
G∗ ← G;
for each existing intercommunity link $e_{ij}$ in G∗ do
    Randomly generate a value r within [0, 1];
    if r < pl then
        Determine $s_i$, $d_i$ and $d_{max}^{s_i}$;
        if $d_i = d_{max}^{s_i}$ then
            Randomly find another edge $e_{kl}$ that exists in $s_i$ such that $e_{ik}$ does not exist in $s_i$;
            Remove $e_{ij}$ and $e_{kl}$ from G∗, add $e_{ik}$ and $e_{jl}$ to G∗;
        end if
    end if
end for
if $R_c$(G∗) > $R_c$(G) then
    G ← G∗;
end if
Output: The individual G after performing the node-level learning strategy.

Fig. 2 shows how the node-level learning strategy can be applied to mitigate the small-scale node attacks. In Fig. 2, different communities with different colors are separated by the red dashed lines, and the node with the highest degree is removed (attacked) first. A toy network G2 is shown in Fig. 2(a), in which the node with the highest degree connects with a node in another community by an intercommunity link. As mentioned before, the node with the highest degree is removed first. Thus, the intercommunity link is eliminated and the toy network is divided into two isolated parts, which makes the $R_{c1}$ of the network decline sharply. In contrast, we swap the two chosen edges (plotted with bold black lines) in the toy network G2 to generate the new toy network G3 shown in Fig. 2(b). Now the node with the highest degree connects only with nodes in the same community. After removing it, the network is still a connected component. Thus, the community robustness of the network under the small-scale node attacks is improved considerably.

Fig. 2. Illustration of the node-level learning strategy. (a) The toy network G2 is divided into two communities and the node with the highest degree connects with another node in a different community. After removing it, the toy network G2 is divided into two isolated parts. (b) The new toy network G3 is generated by swapping two chosen edges in the toy network G2 and the node with the highest degree currently only connects with a node in the same community. After removing it, the toy network G3 is still a connected component. Different communities with different colors are separated by the red dashed lines and the bold black lines are the four edges involved in the swap operations. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3.3.2. The community-level learning strategy

A community-level learning strategy is designed as our second-level learning strategy, which works on the generated community network. The motivation of this strategy is to make the community network exhibit an “onion-like” structure.
Under large-scale community attacks, $R_{c2}$ actually measures the node integrity of the community network. It has been found and verified that complex networks with “onion-like” structures are very robust against targeted high-degree attacks [14,35]. The “onion-like” structure is a special network structure in which nodes with almost the same degree are connected. Hence, following these studies, we swap some intercommunity links to make the community network exhibit an “onion-like” structure. Moreover, the scale of the generated community network is much smaller than that of the original network. Therefore, the community robustness of networks can be significantly improved by swapping a small fraction of intercommunity links.

Suppose G is an individual for local refinement. First construct the community network C. Suppose two connections $c_{ij}$ and $c_{kl}$ in C are selected to swap. If

$$|d_{s_i} - d_{s_k}| + |d_{s_j} - d_{s_l}| < \beta \times \left(|d_{s_i} - d_{s_j}| + |d_{s_k} - d_{s_l}|\right) \tag{5}$$

then swap the connections in C, where $d_{s_i}$, $d_{s_j}$, $d_{s_k}$ and $d_{s_l}$ denote the number of intercommunity links of $s_i$, $s_j$, $s_k$ and $s_l$ in the community network C, respectively. β is a control parameter which lies in the interval [0, 1] and controls the extent of the decrease in degree difference. In [42], the authors demonstrated that the enhancement of robustness is highest when the value of β is 0.9. We therefore set β to 0.9 in the following experiments, which also gave the best results.

If the condition in Eq. (5) is satisfied, swap edges in the original network G to change the connections in the community network C. First, assign G to G∗ and obtain two sets of edges denoted as $E_{ij}$ and $E_{kl}$, where $E_{ij} = \{e_{mn} \mid m \in s_i, n \in s_j \text{ and } s_i \neq s_j\}$ is the set of intercommunity links between communities $s_i$ and $s_j$, and $E_{kl} = \{e_{pq} \mid p \in s_k, q \in s_l \text{ and } s_k \neq s_l\}$ is the set of intercommunity links between communities $s_k$ and $s_l$. Next, repeat the following procedure until no edges can be swapped. Randomly select an edge $e_{mn}$ in $E_{ij}$ and another edge $e_{pq}$ in $E_{kl}$. If neither $e_{mp}$ nor $e_{nq}$ exists in G∗, remove $e_{mn}$ and $e_{pq}$ from G∗, and add $e_{mp}$ and $e_{nq}$ to G∗. It should be pointed out that $e_{mn}$ and $e_{pq}$ satisfy the links changing constraints without any further check. Finally, if the community robustness of G∗ is larger than that of G, assign G∗ to G. The details of the community-level strategy are shown in Algorithm 5.
Algorithm 5 Community-level learning strategy.
Input: The individual G; the local search probability pl; control parameter β.
Construct the community network C;
for each existing connection $c_{ij} \in C$ do
    Randomly generate a value r within [0, 1];
    if r < pl then
        Randomly select another existing connection $c_{kl} \in C$;
        if $|d_{s_i} - d_{s_k}| + |d_{s_j} - d_{s_l}| < \beta \times (|d_{s_i} - d_{s_j}| + |d_{s_k} - d_{s_l}|)$ then
            G∗ ← G;
            Determine $E_{ij}$ and $E_{kl}$;
            repeat
                Randomly select an edge $e_{mn}$ in $E_{ij}$, and another edge $e_{pq}$ in $E_{kl}$;
                if $e_{mp}$ and $e_{nq}$ do not exist in G∗ then
                    Remove $e_{mn}$ and $e_{pq}$ from G∗, and add $e_{mp}$ and $e_{nq}$ to G∗;
                end if
            until no edges can be swapped
            if $R_c$(G∗) > $R_c$(G) then
                G ← G∗;
            end if
        end if
    end if
end for
Output: The individual G after performing the community-level learning strategy.
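The acceptance test of Eq. (5) used in Algorithm 5 is simple enough to sketch directly. The sketch assumes that $d_s$ counts the intercommunity links incident to community s, and that the right-hand side of Eq. (5) is the parenthesized sum of both absolute differences; the function names and the example degrees are hypothetical.

```python
from collections import Counter


def community_degrees(edges, membership):
    """d_s: number of intercommunity links incident to each community s."""
    d = Counter()
    for u, v in edges:
        if membership[u] != membership[v]:
            d[membership[u]] += 1
            d[membership[v]] += 1
    return d


def swap_condition(d, si, sj, sk, sl, beta=0.9):
    """Eq. (5): should connections c_ij, c_kl be replaced by c_ik, c_jl?"""
    after = abs(d[si] - d[sk]) + abs(d[sj] - d[sl])    # degree gaps after swap
    before = abs(d[si] - d[sj]) + abs(d[sk] - d[sl])   # degree gaps before swap
    return after < beta * before


# Hypothetical community degrees: communities 0 and 2 are well connected,
# communities 1 and 3 are sparsely connected.
d = {0: 5, 1: 1, 2: 4, 3: 2}
print(swap_condition(d, 0, 1, 2, 3))   # pair high with high, low with low
print(swap_condition(d, 0, 2, 1, 3))   # the reverse move
```

The first call proposes linking communities of similar degree, which is the direction an "onion-like" community network favours, while the second proposes undoing such a pairing.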
Fig. 3 shows how the community-level learning strategy is used to mitigate the large-scale community attacks. We only consider swap operations on intercommunity links. In Fig. 3(a), both pairs of intercommunity links satisfy the links changing constraints (plotted with bold black lines). The total number of connections of the community network remains unchanged after swapping the two pairs of intercommunity links. In this case, based on the studies of the onion-like structure, the community network may change to a more robust network structure. In the other two cases, shown in Fig. 3(b) and Fig. 3(c), only one pair of intercommunity links satisfies the links changing constraints (plotted with bold black lines). The total number of connections of the community network increases after swapping this pair of intercommunity links. In these cases, the community robustness of the network under the large-scale targeted attacks is likely to be improved.
Fig. 3. Illustration of the community-level learning strategy. (a) Two pairs of intercommunity links satisfy the links changing constraints (plotted with bold black lines). The total number of connections of the community network remains unchanged after the edge swaps. (b) One pair of intercommunity links satisfies the links changing constraints (plotted with bold black lines). The total number of connections of the community network increases after the edge swaps. (c) One pair of intercommunity links satisfies the links changing constraints (plotted with bold black lines). The total number of connections of the community network increases after the edge swaps. Different communities with different colors are separated by the red dashed lines. The bold black lines are the intercommunity links used for edge swaps. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
4. Experimental results

In this section, BA networks of different scales generated by the well-known Barabási–Albert model [1,3] and two diverse real-world networks are used to test the effectiveness of MA-CR. We compare the proposed algorithm MA-CR with a greedy algorithm and GA in the following experiments. The greedy algorithm was proposed in [21] to enhance the community robustness against network attacks. GA is the genetic algorithm we devised to mitigate network attacks. All the experiments are run in MATLAB on a PC with Intel (R), Core (TM), 5 CPU 3.2 GHz, 4 GB RAM.
The performance of MA-CR and GA is influenced by several important parameters. In this study, the population size μ and the maximum number of iterations gmax are both set to 100, and the crossover probability pc and the local search probability pl are set to 0.5 and 0.8, respectively. These parameter values are selected by trial and error, a method that has been widely used to select parameter values in the literature [22,38,42].

4.1. Evaluation metrics

In addition to the community robustness index $R_c$, we adopt some other criteria to evaluate the difference between the optimized and the original networks in this study. First, we introduce the normalized mutual information (NMI) [9], which is employed to measure the similarity between two network partitions. Suppose A and B are the partitions of the original and the optimized network, respectively. Let F be a confusion matrix whose element $F_{ij}$ indicates the number of nodes that exist in community i of partition A and also in community j of partition B. $F_{i\cdot}$ ($F_{\cdot j}$) is the sum of the elements in the i-th row (the j-th column) of F. Thus, the NMI of A and B is written as:

$$\mathrm{NMI}(A, B) = \frac{-2\sum_{i=1}^{n_A}\sum_{j=1}^{n_B} F_{ij}\log\left(F_{ij}N/(F_{i\cdot}F_{\cdot j})\right)}{\sum_{i=1}^{n_A} F_{i\cdot}\log(F_{i\cdot}/N) + \sum_{j=1}^{n_B} F_{\cdot j}\log(F_{\cdot j}/N)} \tag{6}$$
where $n_A$ ($n_B$) is the number of clusters in partition A (B). The value of NMI lies in the interval [0, 1]. If partition A is equal to B, NMI(A, B) = 1. If partition A is totally different from B, NMI(A, B) = 0. The more similar the two partitions are, the larger the value of NMI is.

Another criterion is the difference of intracommunity links ($E_c$), which is used to evaluate the difference between the numbers of intracommunity links of two networks [21]. Suppose the original and the optimized networks are denoted by the adjacency matrices $M$ and $M'$, respectively. The criterion is then written as:

$$E_c = \sum_{i=1}^{k}\sum_{p, q \in s_i} \left(M'_{pq} - M_{pq}\right)/2 \tag{7}$$
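Both criteria can be computed directly from their definitions. The sketch below is illustrative only: partitions are given as node-to-label dictionaries and networks as undirected edge lists (so the division by 2 in Eq. (7), which compensates for the symmetric adjacency matrix, is not needed), natural logarithms are used, and returning 1.0 when both partitions consist of a single community is a convention assumed here to avoid a zero denominator. The function names are hypothetical.

```python
import math
from collections import Counter


def nmi(part_a, part_b):
    """Normalized mutual information of two partitions, following Eq. (6)."""
    n = len(part_a)
    f = Counter((part_a[v], part_b[v]) for v in part_a)   # confusion matrix F
    row = Counter(part_a.values())                        # row sums F_i.
    col = Counter(part_b.values())                        # column sums F_.j
    num = -2.0 * sum(c * math.log(c * n / (row[i] * col[j]))
                     for (i, j), c in f.items())
    den = (sum(c * math.log(c / n) for c in row.values())
           + sum(c * math.log(c / n) for c in col.values()))
    # Assumption: two single-community partitions are treated as identical.
    return num / den if den != 0 else 1.0


def intra_link_difference(orig_edges, opt_edges, membership):
    """E_c of Eq. (7): intracommunity links gained (+) or lost (-)."""
    def intra(edges):
        return sum(1 for u, v in edges if membership[u] == membership[v])
    return intra(opt_edges) - intra(orig_edges)


a = {1: 0, 2: 0, 3: 1, 4: 1}
b = {1: 0, 2: 1, 3: 0, 4: 1}
print(nmi(a, a), nmi(a, b))
print(intra_link_difference([(1, 2), (3, 4), (2, 3)],
                            [(1, 3), (2, 4), (2, 3)], a))
```

Under the intracommunity-link constraint used by MA-CR, the second function is expected to return 0 for every accepted swap.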
If $E_c < 0$, then $|E_c|$ intracommunity links in the original network have been changed into intercommunity links after optimization. If $E_c > 0$, then $|E_c|$ intercommunity links in the original network have been changed into intracommunity links after optimization. If $E_c = 0$, the total number of intracommunity links has not changed. $E_c$ is an intuitive criterion to measure the difference between the original and the optimized networks.

4.2. Experiments on BA networks

In this experiment, we employ synthetic scale-free networks with various scales to verify the effectiveness of MA-CR. The BA model is applied to construct scale-free networks. The BA model works as follows. It begins with a small connected network of N0 nodes. At each step, a new node is introduced to the existing network and connected to M0 existing nodes simultaneously, where 1 ≤ M0 ≤ N0. The existing nodes are selected with probabilities proportional to their degrees. This strategy is called preferential attachment and can be observed in numerous real-world networks and systems.

All three algorithms, namely the greedy algorithm, GA and MA-CR, adopt $R_c$ as the objective function to optimize the network structure under the two constraints. Fig. 4 shows the results of these three algorithms on the constructed BA networks with 100, 200, 300 and 500 nodes, where N0 = 3 and M0 = 2. In this experiment, the results are averaged over 30 independent trials. As we can see from Fig. 4, the community robustness obtained by MA-CR is always larger than that of the other two algorithms. Table 1 further lists the best and worst robustness obtained by the different algorithms, together with the average robustness and the variance over the 30 trials.
From Table 1, we also find that the best and worst robustness found by MA-CR are nearly always better than those of the other two algorithms, and the variances over 30 trials of MA-CR are the smallest. This suggests that MA-CR finds more robust networks and is more stable than the other two algorithms. Figs. 5–8 show the community integrity of the four tested networks under attacks at each level. The results show that the proposed MA-CR and the greedy algorithm have similar capabilities in enhancing the community robustness of the BA networks under small-scale node attacks, but the networks optimized by MA-CR are the most robust under large-scale community attacks. This indicates that the solutions obtained by the greedy algorithm and GA are not global optima, and that the solutions obtained by MA-CR are closer to global optima. In other words, the greedy algorithm is easily trapped in local optima, which leads to poor global search ability. The proposed MA-CR effectively combines the capabilities of global search and local search, which helps it approach global optima.

4.3. Experiments on real-world networks

In this part, we test the performance of MA-CR on two empirical real-world networks (see Table 2 for a brief description): (i) The electronic circuit network [25]: It contains 122 nodes and 189 edges. Nodes denote electrical elements and edges
Fig. 4. The comparison among three algorithms on the BA networks with different sizes. We set α as 0.4 in this experiment. The results are given by the average of 30 independent trials.

Table 1
The community robustness Rc (for α = 0.4) of networks optimized by different algorithms on BA networks.

Size  Algorithms        Best    Worst   Average ± Variance
100   Greedy algorithm  0.3231  0.3055  0.3139 ± 3.0533 × 10^-5
100   GA                0.3190  0.2879  0.3080 ± 6.9963 × 10^-5
100   MA-CR             0.3196  0.3120  0.3155 ± 4.6278 × 10^-6
200   Greedy algorithm  0.3030  0.2792  0.2877 ± 3.5374 × 10^-5
200   GA                0.2878  0.2615  0.2762 ± 4.8277 × 10^-5
200   MA-CR             0.3050  0.2900  0.2970 ± 8.9251 × 10^-6
300   Greedy algorithm  0.3036  0.2781  0.2874 ± 3.6404 × 10^-5
300   GA                0.2974  0.2739  0.2874 ± 4.2040 × 10^-5
300   MA-CR             0.3119  0.3008  0.3054 ± 5.6058 × 10^-6
500   Greedy algorithm  0.2608  0.2499  0.2551 ± 1.0028 × 10^-5
500   GA                0.2713  0.2514  0.2630 ± 4.3180 × 10^-5
500   MA-CR             0.2857  0.2812  0.2834 ± 2.8230 × 10^-6
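The Best, Worst and Average ± Variance columns of Table 1 summarize the Rc values obtained from repeated runs. A minimal Python sketch of such a summary follows. The function name `summarize_trials` is hypothetical, the sample values are made up for illustration, and the use of the population variance is an assumption, since the text does not state which variance estimator is reported.

```python
from statistics import mean, pvariance
from typing import Dict, Sequence


def summarize_trials(rc_values: Sequence[float]) -> Dict[str, float]:
    """Summarize the Rc values of independent trials (larger Rc is better)."""
    return {
        "best": max(rc_values),
        "worst": min(rc_values),
        "average": mean(rc_values),
        # Assumption: population variance; a sample variance would divide
        # by (number of trials - 1) instead.
        "variance": pvariance(rc_values),
    }


# Made-up Rc values from three hypothetical trials.
summary = summarize_trials([0.30, 0.31, 0.32])
print(summary)
```

Note that the second number in each table entry is a variance, not a standard deviation, so it is on the scale of Rc squared.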
Fig. 5. The fraction S1 (q) and S2 (u) belonging to the largest connected components versus the removal of q nodes or u communities for the BA network with 100 nodes where N0 = 3 and M0 = 2. (a) The fraction S1 (q) versus the removal of q nodes. (b) The fraction S2 (u) versus the removal of u communities. Results are given by the average of 30 independent trials.

Table 2
Properties of the two real-world networks. N: number of nodes; M: number of edges.

Networks            N    M     Description
Electronic circuit  122  189   The electronic circuits network
USAir               332  2126  The US air transportation network
Fig. 6. The fraction S1 (q) and S2 (u) belonging to the largest connected components versus the removal of q nodes or u communities for the BA network with 200 nodes where N0 = 3 and M0 = 2. (a) The fraction S1 (q) versus the removal of q nodes. (b) The fraction S2 (u) versus the removal of u communities. Results are given by the average of 30 independent trials.
Fig. 7. The fraction S1 (q) and S2 (u) belonging to the largest connected components versus the removal of q nodes or u communities for the BA network with 300 nodes where N0 = 3 and M0 = 2. (a) The fraction S1 (q) versus the removal of q nodes. (b) The fraction S2 (u) versus the removal of u communities. Results are given by the average of 30 independent trials.
Fig. 8. The fraction S1 (q) and S2 (u) belonging to the largest connected components versus the removal of q nodes or u communities for the BA network with 500 nodes where N0 = 3 and M0 = 2. (a) The fraction S1 (q) versus the removal of q nodes. (b) The fraction S2 (u) versus the removal of u communities. Results are given by the average of 30 independent trials.
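Panel (a) of Figs. 5–8 plots, for each number q of removed nodes, the fraction of nodes that remain in the largest connected component. The following Python sketch shows one way such a node-level curve could be computed. It is an illustration rather than the procedure used in the experiments: the attack order (repeatedly removing the node with the currently highest degree, ties broken by the smallest node id), the normalization by the original number of nodes, and the names `largest_component_size` and `node_attack_curve` are all assumptions.

```python
from collections import deque
from typing import Dict, List, Set


def largest_component_size(adj: Dict[int, Set[int]], alive: Set[int]) -> int:
    """Size of the largest connected component among the surviving nodes."""
    seen: Set[int] = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search restricted to surviving nodes
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w in alive and w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def node_attack_curve(adj: Dict[int, Set[int]]) -> List[float]:
    """Return [S1(1), ..., S1(N)] under the assumed degree-targeted attack."""
    n = len(adj)
    alive = set(adj)
    curve = []
    for _ in range(n):
        # Degree is recomputed among surviving nodes before every removal.
        target = max(sorted(alive),
                     key=lambda v: sum(1 for w in adj[v] if w in alive))
        alive.remove(target)
        curve.append(largest_component_size(adj, alive) / n)
    return curve


# Toy example: the path 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(node_attack_curve(path))
```

A curve that stays high for longer indicates a network that keeps a large connected core under the assumed attack.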
Table 3
Properties of the electronic circuit and USAir networks: the community robustness index Rc (for α = 0.4), normalized mutual information (NMI), the difference of intracommunity links (Ec) and the number of communities k. Results are given by the average of 30 independent trials.

Networks            Algorithms        Rc      NMI     Ec  k
Electronic circuit  Original          0.2272  1       0   13
Electronic circuit  Greedy algorithm  0.3075  0.7972  0   14.75
Electronic circuit  GA                0.2951  0.7702  0   14.63
Electronic circuit  MA-CR             0.3174  0.7388  0   14.43
USAir               Original          0.2436  1       0   9
USAir               Greedy algorithm  0.2883  0.8031  0   7.400
USAir               GA                0.2940  0.8453  0   7.467
USAir               MA-CR             0.3041  0.8632  0   7.433
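The NMI column of Table 3 compares the community partition of each optimized network with that of the original network. The sketch below computes NMI from two label lists using the confusion-matrix form of Danon et al. [9]; it is assumed here that this is the form used for the NMI criterion in this paper. The function name `nmi` is hypothetical, and returning 1 when both partitions consist of a single cluster is a convention chosen for this sketch.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence


def nmi(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Normalized mutual information between two partitions of the same nodes."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both partitions must label the same non-empty node set")

    joint = Counter(zip(labels_a, labels_b))  # confusion matrix entries C_ij
    row = Counter(labels_a)                   # row sums C_i.
    col = Counter(labels_b)                   # column sums C_.j

    numerator = -2.0 * sum(
        c * log(c * n / (row[a] * col[b])) for (a, b), c in joint.items()
    )
    denominator = (
        sum(c * log(c / n) for c in row.values())
        + sum(c * log(c / n) for c in col.values())
    )
    if denominator == 0:
        return 1.0  # convention: both partitions are a single cluster
    return numerator / denominator


# Identical partitions give 1; statistically independent partitions give 0.
same = nmi([0, 0, 1, 1], [5, 5, 7, 7])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
print(same, independent)
```

The cluster labels themselves do not matter, only which nodes are grouped together, which is why the relabelled partition in the example still scores 1.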
Table 4
The community robustness Rc (for α = 0.4) of networks optimized by different algorithms on the two real-world networks.

Networks            Algorithms        Best    Worst   Average ± Variance
Electronic circuit  Greedy algorithm  0.3247  0.2880  0.3075 ± 6.6119 × 10^-5
Electronic circuit  GA                0.3078  0.2612  0.2951 ± 1.2850 × 10^-4
Electronic circuit  MA-CR             0.3248  0.3115  0.3174 ± 1.3241 × 10^-6
USAir               Greedy algorithm  0.3140  0.2580  0.2883 ± 1.7200 × 10^-4
USAir               GA                0.3041  0.2817  0.2964 ± 5.1469 × 10^-5
USAir               MA-CR             0.3081  0.3005  0.3041 ± 3.7657 × 10^-6
Fig. 9. The fraction S1 (q) and S2 (u) belonging to the largest connected components versus the removal of q nodes or u communities for the electronic circuit network. (a) The fraction S1 (q) versus the removal of q nodes. (b) The fraction S2 (u) versus the removal of u communities. Results are given by the average of 30 independent trials.
correspond to the interconnections among them. (ii) The USAir network [4]: It is the US air transportation network with 332 nodes and 2126 edges. Nodes denote airports and edges indicate airlines. Both networks are undirected and unweighted. As we can see from Table 3, the proposed MA-CR outperforms the other two algorithms in terms of Rc. As to the criteria NMI and |Ec|, the networks optimized by the three algorithms have similar values of NMI, and the values of |Ec| are all 0, which means that the proposed MA-CR largely retains the original community structure. In Table 4, we list the best and worst robustness among the different runs of each algorithm, together with the average robustness and the variance over all trials. We find that the worst robustness obtained by MA-CR is always better than that of the other two algorithms, that the best robustness is better except on the USAir network, where the best run of the greedy algorithm (0.3140) exceeds that of MA-CR (0.3081), and that the variances over 30 trials of MA-CR are the smallest. This suggests that MA-CR finds more robust networks on average and is more stable than the other two algorithms. To support these observations, we also compare the community integrity of the two networks at each level in Figs. 9 and 10. By analyzing the results obtained by the three algorithms, we observe that the networks optimized by MA-CR are the most robust under large-scale targeted attacks. Under small-scale targeted attacks, the networks optimized by the greedy algorithm are more robust. This further indicates that the solutions obtained by the greedy algorithm are not global optima, and that the solutions obtained by MA-CR are closer to global optima. The results demonstrate the effectiveness of the proposed MA-CR.
Fig. 10. The fraction S1 (q) and S2 (u) belonging to the largest connected components versus the removal of q nodes or u communities for the USAir network. (a) The fraction S1 (q) versus the removal of q nodes. (b) The fraction S2 (u) versus the removal of u communities. Results are given by the average of 30 independent trials.
Fig. 11. The robustness improvement of the networks optimized by three algorithms when α changes from 0 to 1. The two test networks are (a) the electronic circuit network and (b) the USAir network. Results are given by the average of 30 independent trials.
Additionally, we analyze the effects of the parameter α on the different algorithms. The community robustness of the networks optimized by the different algorithms with different values of α is given in Fig. 11. As can be seen, in the electronic circuit network, all three algorithms significantly improve the community robustness of the network, and the networks optimized by MA-CR are always the most robust as α changes from 0 to 1. In the USAir network, all three algorithms still greatly improve the community robustness over the initial network, and the networks optimized by MA-CR are nearly always the most robust under the two-level targeted attacks as α changes from 0 to 1. In our previous paper [21], we showed that the community robustness can be greatly improved while the community structure of networks remains basically unchanged when α is set to 0.4. Thus, we set the same value of α for the comparison experiments in this study. Furthermore, the NMI, Q and Ec values of the networks optimized by MA-CR with different α are reported in Fig. 12. As we can see from Fig. 12, the NMI and Q values remain basically unchanged for different α, which means that the community structure of the networks optimized by MA-CR is not sensitive to α. The Ec values are always 0 for different α, which further confirms this observation.

To demonstrate the effectiveness of the proposed two-level learning-strategy-based local search, the convergence trajectories of MA-CR and GA on the electronic circuit and USAir networks are shown in Fig. 13. The experimental results are averaged over 30 independent trials. For the electronic circuit network, GA tends to converge at around the 50th generation: Rc increases sharply from the 0th to the 50th generation, the running time of these 50 generations is 354 s, and the Rc value stays at 0.295 after the 50th generation. In contrast, the Rc value of the network optimized by MA-CR reaches 0.295 at around the 10th generation, and MA-CR needs only 189 s to reach 0.295. We also find that GA cannot find the optimal solution, whereas the solution optimized by MA-CR is much closer to the optimal solution.
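As a quick worked check of the electronic circuit figures, the ratio of the two reported running times needed to reach Rc = 0.295 is given below. The symbols t_GA and t_MA-CR are introduced here only for this calculation and do not appear elsewhere in the paper.

\[ \frac{t_{\mathrm{GA}}}{t_{\mathrm{MA\text{-}CR}}} = \frac{354\ \mathrm{s}}{189\ \mathrm{s}} \approx 1.87 \]

That is, on this network MA-CR reaches the level at which GA stagnates in roughly 53% of GA's running time.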
Fig. 12. The NMI, Q and Ec values of the networks optimized by MA-CR when α changes from 0 to 1. The two test networks are (a) the electronic circuit network and (b) the USAir network. Results are given by the average of 30 independent trials.
Fig. 13. The convergence trajectories of MA-CR and GA on the electronic circuit and USAir networks. (a) The electronic circuit network and (b) the USAir network. Results are given by the average of 30 independent trials.
For the USAir network, GA tends to converge at around the 70th generation. The Rc value increases sharply from the 0th to the 70th generation, the running time is 3226 s, and the Rc value then stays at 0.287. In contrast, the Rc of the network optimized by MA-CR reaches 0.287 at the 40th generation, and MA-CR needs 2962 s to reach 0.287. Again, the solution optimized by MA-CR is much closer to the optimal solution. The results illustrate that, to reach the same Rc value, MA-CR needs less running time than GA (189 s versus 354 s on the electronic circuit network, and 2962 s versus 3226 s on the USAir network). If GA and MA-CR are run for the same wall-clock time, MA-CR finds a more robust network. This shows that the proposed knowledge-based local search is effective and efficient.

5. Conclusions

In this paper, we propose a memetic algorithm based on a two-level learning strategy to enhance the community robustness of networks. The proposed MA-CR efficiently combines GA with the proposed two-level learning-strategy-based local search procedure. The two-level learning strategy is devised based on the potential characteristics of the node structure and community structure of networks, and aims at mitigating two-level targeted attacks. Experiments on both synthetic and real-world networks show the effectiveness and stability of the proposed MA-CR compared with several state-of-the-art algorithms. Meanwhile, the proposed MA-CR effectively retains the community structure of networks.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant nos. 61772393, 61422209), the National Program for Support of Top-notch Young Professionals of China, and the National Key Research and Development Program of China (Grant no. 2017YFB0802200).
References

[1] R. Albert, A.L. Barabási, Statistical mechanics of complex networks, Rev. Mod. Phys. 74 (1) (2002) 47.
[2] R. Albert, H. Jeong, A.L. Barabási, Error and attack tolerance of complex networks, Nature 406 (6794) (2000) 378–382.
[3] A.L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (5439) (1999) 509–512.
[4] V. Batagelj, A. Mrvar, "Pajek datasets", URL http://vlado.fmf.uni-lj.si/pub/networks/data/default.htm.
[5] A. Beygelzimer, G. Grinstein, R. Linsker, I. Rish, Improving network robustness by edge modification, Physica A 357 (3) (2005) 593–612.
[6] V.D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, J. Stat. Mech. 2008 (10) (2008) P10008.
[7] Q. Cai, M. Gong, L. Ma, S. Ruan, F. Yuan, L. Jiao, Greedy discrete particle swarm optimization for large-scale social network clustering, Inf. Sci. 316 (2015) 503–516.
[8] F. Caraffini, F. Neri, G. Iacca, A. Mol, Parallel memetic structures, Inf. Sci. 227 (2013) 60–82.
[9] L. Danon, A. Diaz Guilera, J. Duch, A. Arenas, Comparing community structure identification, J. Stat. Mech. 2005 (09) (2005) P09008.
[10] R. Dawkins, The Selfish Gene, Oxford University Press, 1976.
[11] L.N. Ferreira, L. Zhao, Time series clustering via community detection in networks, Inf. Sci. 326 (2016) 227–242.
[12] S. Fortunato, Community detection in graphs, Phys. Rep. 486 (3) (2010) 75–174.
[13] M. Gong, J. Yan, B. Shen, L. Ma, Q. Cai, Influence maximization in social networks based on discrete particle swarm optimization, Inf. Sci. 367 (2016) 600–614.
[14] H.J. Herrmann, C.M. Schneider, A.A. Moreira, J.S. Andrade Jr, S. Havlin, Onion-like network topology enhances robustness against malicious attacks, J. Stat. Mech. 2011 (01) (2011) P01027.
[15] S. Iyer, T. Killingback, B. Sundaram, Z. Wang, Attack robustness and centrality of complex networks, PLoS ONE 8 (4) (2013) e59613.
[16] Z.Y. Jiang, M.G. Liang, Improving the network load balance by adding an edge, in: Advanced Materials Research, vol. 433, Trans Tech Publ, 2012, pp. 5147–5151.
[17] H.S. Koppula, Study and Improvement of Robustness of Overlay Networks, Department of Computer Science & Engineering, Indian Institute of Technology–Kharagpur, 2008. Ph.D. thesis.
[18] N. Krasnogor, J. Smith, A tutorial for competent memetic algorithms: model, taxonomy, and design issues, IEEE Trans. Evol. Comput. 9 (5) (2005) 474–488.
[19] D. Li, B. Fu, Y. Wang, G. Lu, Y. Berezin, H.E. Stanley, S. Havlin, Percolation transition in dynamical traffic network with evolving critical bottlenecks, Proc. Natl. Acad. Sci. U.S.A. 112 (3) (2015) 669–672.
[20] J. Ludescher, A. Gozolchiani, M.I. Bogachev, A. Bunde, S. Havlin, H.J. Schellnhuber, Improved El Niño forecasting by cooperativity detection, Proc. Natl. Acad. Sci. U.S.A. 110 (29) (2013) 11742–11745.
[21] L. Ma, M. Gong, Q. Cai, L. Jiao, Enhancing community integrity of networks against multilevel targeted attacks, Phys. Rev. E 88 (2) (2013) 022810.
[22] L. Ma, M. Gong, J. Liu, Q. Cai, L. Jiao, Multi-level learning based memetic algorithm for community detection, Appl. Soft Comput. 19 (2014) 121–133.
[23] L. Ma, M. Gong, J. Yan, F. Yuan, H. Du, A decomposition-based multi-objective optimization for simultaneous balance computation and transformation in signed networks, Inf. Sci. 378 (2017) 144–160.
[24] S. Maslov, K. Sneppen, Specificity and stability in topology of protein networks, Science 296 (5569) (2002) 910–913.
[25] R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen Orr, I. Ayzenshtat, M. Sheffer, U. Alon, Superfamilies of evolved and designed networks, Science 303 (5663) (2004) 1538–1542.
[26] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press, 1998.
[27] F. Morone, H.A. Makse, Influence maximization in complex networks through optimal percolation, Nature 524 (2015) 65–68.
[28] P. Moscato, A. Mendes, R. Berretta, Benchmarking a memetic algorithm for ordering microarray data, BioSystems 88 (1) (2007) 56–75.
[29] M.E. Newman, M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69 (2) (2004) 026113.
[30] Q.H. Nguyen, Y.S. Ong, M.H. Lim, A probabilistic memetic framework, IEEE Trans. Evol. Comput. 13 (3) (2009) 604–623.
[31] Y.S. Ong, A.J. Keane, Meta-lamarckian learning in memetic algorithms, IEEE Trans. Evol. Comput. 8 (2) (2004) 99–110.
[32] Y.S. Ong, M.H. Lim, X. Chen, Research frontier-memetic computation-past, present & future, IEEE Comput. Intell. Mag. 5 (2) (2010) 24.
[33] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, D. Parisi, Defining and identifying communities in networks, Proc. Natl. Acad. Sci. U.S.A. 101 (9) (2004) 2658–2663.
[34] E. Sanchez, G. Squillero, A. Tonda, Industrial Applications of Evolutionary Algorithms, 34, Springer, 2012.
[35] C.M. Schneider, A.A. Moreira, J.S. Andrade, S. Havlin, H.J. Herrmann, Mitigation of malicious attacks on networks, Proc. Natl. Acad. Sci. U.S.A. 108 (10) (2011) 3838–3841.
[36] C. Song, S. Havlin, H.A. Makse, Self-similarity of complex networks, Nature 433 (7024) (2005) 392–395.
[37] K.C. Tan, S.C. Chiam, A. Mamun, C.K. Goh, Balancing exploration and exploitation with adaptive variation for evolutionary multi-objective optimization, Eur. J. Oper. Res. 197 (2) (2009) 701–713.
[38] K. Tang, Y. Mei, X. Yao, Memetic algorithm with extended neighborhood search for capacitated arc routing problems, IEEE Trans. Evol. Comput. 13 (5) (2009) 1151–1166.
[39] B. Yan, S. Gregory, Finding missing edges in networks based on their community structure, Phys. Rev. E 85 (5) (2012) 056112.
[40] Y. Yang, Z. Li, Y. Chen, X. Zhang, S. Wang, Improving the robustness of complex networks with preserving community structure, PLoS ONE 10 (2) (2015) e0116551.
[41] A. Zeng, W. Liu, Enhancing network robustness against malicious attacks, Phys. Rev. E 85 (6) (2012) 066130.
[42] M. Zhou, J. Liu, A memetic algorithm for enhancing the robustness of scale-free networks against malicious attacks, Physica A 410 (2014) 131–143.
[43] Z. Zhu, Y.S. Ong, M. Dash, Wrapper–filter feature selection algorithm using a memetic framework, IEEE Trans. Syst. Man Cybern. Part B 37 (1) (2007) 70–76.