Analysis of complex network performance and heuristic node removal strategies

Ehsan Jahanpour, Xin Chen

Industrial and Manufacturing Engineering Program, Southern Illinois University, Edwardsville, IL 62026-1805, USA

Article history: Received 27 August 2012; Received in revised form 26 April 2013; Accepted 30 April 2013; Available online 15 May 2013

Keywords: Complex network; Network centrality; Node removal

Abstract

Removing important nodes from complex networks is a great challenge in fighting criminal organizations and preventing disease outbreaks. Six network performance metrics, including four new metrics, are applied to quantify networks' diffusion speed, diffusion scale, homogeneity, and diameter. In order to efficiently identify nodes whose removal maximally destroys a network, i.e., minimizes network performance, ten structured heuristic node removal strategies are designed using different node centrality metrics, including degree, betweenness, reciprocal closeness, complement-derived closeness, and eigenvector centrality. These strategies are applied to remove nodes from the September 11, 2001 hijackers' network, and their performance is compared to that of a random strategy, which removes randomly selected nodes, and the locally optimal solution (LOS), which removes nodes to minimize network performance at each step. The computational complexity of the 11 strategies and the LOS is also analyzed. Results show that the node removal strategies using degree and betweenness centralities are more efficient than the other strategies.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction and background

Many complex networks display different levels of vulnerability against node removals [1,2]. Such differences are caused by network structure and the relative location of a node in the network. Mishkovski et al. [3] proposed normalized average edge betweenness as a vulnerability index to measure the robustness of different networks. They applied this index to assess several synthetic and real-world networks including Erdos Renyi, geometric random, small world, human brain, United States Power Grid, collaboration, urban transport, and European Union Power Grid networks. Moreover, Grubesic et al. [4] introduced three classes of structured approaches for evaluating network performance and vulnerability: network attributes, connectivity, and capacity.

Identifying and removing important nodes from complex networks is a great challenge in many real-world applications. For example, to destroy criminal networks, it is desirable to break a network into smaller components or increase the distance between nodes. A component is comprised of a group of nodes that connect to each other directly or indirectly; nodes in one component do not connect to nodes in another component. To prevent the spread of infectious diseases, it is important to reduce the speed at which diseases spread and the scale of outbreaks.

The objective of this research is to investigate how different node removal strategies affect the structure, and therefore the performance, of complex networks. Network performance is a network's ability to diffuse information or entities. Holme et al. [5] introduced two network performance metrics along with four different node and link attack strategies to study the efficiency of the attack strategies over six types of networks,


including scientific collaboration, computer, Erdos Renyi, Watts-Strogatz, Barabasi-Albert, and clustered scale-free networks. In this study, six network performance metrics, including four new metrics, from four categories (speed, scale, homogeneity, and diameter) are applied to quantify network structure and performance. Eleven node removal strategies are designed to select and remove nodes, and their impacts on the six network metrics are analyzed.

After the catastrophic event of September 11, 2001, there has been increasing interest in studying the impact of node removals on complex networks. Krebs [6] mapped the communication network of the hijackers of September 11, 2001, whose structure was further analyzed in several studies, e.g., [7–9]. The use of email as a communication and data transmission tool in criminal and terrorist groups has also been studied [10]. Previous research [11] suggested that the unique structural properties of criminal networks include low connectedness, high centrality, covertness, a dynamic nature, and the dominant roles of certain nodes.

In social science, quantifying the importance of a node in a social network is of great interest, since the importance of a node indicates how fast information transmits from that node and how influential the node is in the network. For a given network, the importance of a node is determined by the relative location of the node. Freeman [12] and Cornwell [13] quantified the importance of nodes in social networks using node degree, closeness, complement-derived closeness, and betweenness. Eigenvector centrality was also used to measure the popularity and importance of a node in social networks [14–16]; it measures the propensity of a node to connect to other nodes. These centrality metrics measure the impact of a node on network performance. For instance, betweenness centrality [12] measures the degree to which a node falls on the shortest paths connecting other pairs of nodes.

Borgatti [9] defined two types of problems to assess the importance of nodes. The Key Player Problem-Positive (KPP-POS) studies the extent to which a node is embedded in the network. The Key Player Problem-Negative (KPP-NEG) studies the reduction in the cohesiveness of a network after a node is eliminated. The centrality metrics developed in previous research may be applied to the KPP-POS to identify the importance of nodes. For example, Zhang and Han [17] applied node degree, betweenness, and reciprocal distance closeness to identify important nodes in supply-chain networks. In the railway industry [18], important nodes were identified to improve inspection and maintenance procedures: the performance of a given railway network was measured in terms of the speed at which trains can travel, and a Monte Carlo simulation approach was applied to identify the section of the railway network whose removal decreases network performance the most and to prioritize improvement actions. There are, however, numerous problems in complex network analysis that are classified as the KPP-NEG, for instance, selecting a criminal or a group of criminals for neutralization, or quarantining a selected group of people to reduce the spread of diseases. It is important to understand that removing a node from a network not only eliminates the effect of that node, but may also change the impact of other nodes on the network. Everett and Borgatti [19] introduced the induced centrality of a group of nodes in a complex network.
Induced centrality, in essence, measures the difference between the centrality of a full network and the centrality of the network after a group of nodes is removed. Albert et al. [1] studied the changes in network diameter when a small fraction of nodes is removed.

Mathematical and optimization techniques have been proposed to model complex networks and optimize network performance. Shen and Smith [20] developed an optimal polynomial-time dynamic programming algorithm for (i) maximizing the number of connected components and (ii) minimizing the maximum component size in trees when a limited number of removals are allowed. Although this algorithm was developed for trees, it can be applied to general networks with increased computational complexity. The authors also proved that finding the optimal solution for weighted networks, in which deleting each node incurs a different cost, is NP-hard. Zhang et al. [21] developed a multi-objective model to identify important nodes whose removal can (i) minimize the link removal cost and (ii) minimize reliability; a multi-objective evolutionary algorithm was used to identify the most important nodes of a network. Although optimization techniques may help find the optimal solution, i.e., a set of important nodes, and the optimal value for network performance metrics, they are computationally infeasible (NP-complete or NP-hard) for large networks.

The purpose of this study is to design computationally efficient techniques for identifying a set of important nodes whose removal has a significantly large impact on network performance. Six network performance metrics, the largest geodesic distance, the network reciprocal distance, and four new metrics (average node coverage, largest coverage probability, expected scale, and shortest distance homogeneity), are used to assess the impact of node removals on a complex network. The focus of this research is the KPP-NEG; the objective is therefore to minimize the six network performance metrics. To find the optimal value for any network metric, a brute force evaluation of all possible removals may be applied. For instance, finding the optimal r nodes in a network of n nodes (r ≤ n) requires \binom{n}{r} t time units, where t denotes the computation time required to calculate the network performance metric of a network of n − r nodes. The computational complexity of the brute force evaluation increases exponentially as the network size n increases. Efficient heuristic node removal strategies are needed to reduce computational complexity while achieving optimal or near-optimal network performance. Eleven node removal strategies are investigated in this research: the one-time-calculated degree, multiple-time-calculated degree, one-time-calculated betweenness, multiple-time-calculated betweenness, one-time-calculated reciprocal closeness, multiple-time-calculated reciprocal closeness, one-time-calculated complement-derived closeness, multiple-time-calculated complement-derived closeness, one-time-calculated eigenvector centrality, and multiple-time-calculated eigenvector centrality strategies, plus a random strategy. During node removals, it is assumed that the network structure is known and does not change due to factors other than node removals.


The rest of this article is organized as follows: Section 2.1 introduces the six network performance metrics and Section 2.2 illustrates the eleven node removal strategies. In Section 3, the node removal strategies are applied to the Krebs network [6] and compared to the locally optimal solution. Impacts of the strategies on the six network performance metrics are analyzed in Section 4. Section 5 concludes the article with future research directions.

2. Network performance modeling and node removal strategies

The six network performance metrics quantify networks' structural properties and their abilities to diffuse information or entities. The eleven node removal strategies help select and remove nodes to minimize the network performance metrics.

2.1. Network performance metrics

The network performance metrics are applied to measure networks' diffusion speed, diffusion scale, homogeneity, and diameter. Diffusion speed measures how fast information or entities distribute throughout a network and is of great interest to network performance analysis. For example, one of the objectives in neutralizing a criminal network is to increase the time it takes to pass information between nodes by removing important nodes, and thereby break the network or at least provide law enforcement agencies with more time to respond. The objective of node removal is to minimize diffusion speed. Two metrics are designed to measure diffusion speed in a network. The first metric, the Network Reciprocal Distance (NRD), calculates the average reciprocal of the geodesic distance between every pair of nodes (Eq. (1)). This metric is similar in computation to the cohesiveness measurement proposed by Borgatti [9]:

NRD = \frac{2 \sum_{i=1}^{n} \sum_{j>i} \frac{1}{d(p_i, p_j)}}{n(n-1)}.   (1)

In Eq. (1), n is the size of the network and d(p_i, p_j) denotes the geodesic distance, i.e., the length of the shortest path, between nodes p_i and p_j. Assuming that every node sends information to all its directly connected nodes in one time unit, it takes d(p_i, p_j) time units for information to be sent from p_i to p_j; information diffuses at the speed of 1/d(p_i, p_j) distance per time unit. The NRD measures the average diffusion speed in a network, with 1/d(p_i, p_j) = 0 if p_i and p_j are disconnected.

Another new network diffusion speed metric describes how fast information or entities are distributed from a node (center) to other nodes (periphery) [7]. Using the center-periphery concept, the percentage of nodes that receive information at each time step is calculated. The Average Node Coverage (ANC; Eq. (2)), however, computes the average number of nodes that receive information or entities in one time unit, given that all connected nodes receive information or entities and it takes one time unit for a node to pass information to all its directly connected nodes:

ANC = \frac{\sum_{i=1}^{n} \frac{CS(p_i) - 1}{LGD(p_i)}}{n(n-1)}.   (2)

In Eq. (2), the component size, CS(p_i), is the number of nodes in the component that contains node p_i; CS(p_i) − 1 is the number of nodes that are connected to node p_i. LGD(p_i) is the largest geodesic distance of the component to which node p_i belongs. Since information or entities diffuse through the shortest paths, the LGD of a component indicates the longest possible time it takes for information or entities to diffuse to all nodes in the component. (CS(p_i) − 1)/LGD(p_i) measures the average number of nodes that receive the information or entities originated from p_i in one time unit; it equals zero for a component of a single node p_i. n(n − 1) is the normalization constant.
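For illustration, the two diffusion speed metrics can be computed as follows. This is a minimal sketch in Python using the networkx library, not the authors' C++ implementation; the function names nrd and anc are ours.

import networkx as nx

def nrd(G):
    # Network Reciprocal Distance (Eq. (1)): average reciprocal geodesic distance.
    n = G.number_of_nodes()
    dist = dict(nx.all_pairs_shortest_path_length(G))
    nodes = list(G)
    total = 0.0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            d = dist[u].get(v)        # absent if u and v are disconnected
            if d:                     # 1/d(p_i, p_j) = 0 for disconnected pairs
                total += 1.0 / d
    return 2.0 * total / (n * (n - 1))

def anc(G):
    # Average Node Coverage (Eq. (2)): each of the CS nodes in a component
    # contributes (CS - 1)/LGD, so a component adds CS*(CS - 1)/LGD in total.
    n = G.number_of_nodes()
    total = 0.0
    for comp in nx.connected_components(G):
        cs = len(comp)
        if cs > 1:                    # (CS - 1)/LGD = 0 for single-node components
            total += cs * (cs - 1) / nx.diameter(G.subgraph(comp))
    return total / (n * (n - 1))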

Similar to diffusion speed, diffusion scale is analyzed using two metrics; the objective of node removal is to minimize diffusion scale. The Largest Coverage Probability (LCP; Eq. (3)) calculates the probability that the largest portion of a network receives information or entities:

LCP = \frac{\max\{CS(c_l)\}}{n}, \quad l = 1, 2, \ldots, m.   (3)

In Eq. (3), m is the number of components in a network and CS(c_l) is the size of the lth component. A component is small if it has at most 0.5n^{2/3} nodes and large if it has at least n^{2/3} nodes [22]. If LCP ≤ n^{-1/3}/2, the network contains only small components, whereas if LCP ≥ n^{-1/3}, the network contains at least one large component.

The Expected Scale (ES; Eq. (4)) computes the expected value of diffusion scale given that each node has the same probability of originating the diffusion of information or entities:

ES = \sum_{l=1}^{m} \frac{CS(c_l)}{n} \, CS(c_l) = \sum_{l=1}^{m} \frac{CS(c_l)^2}{n}.   (4)

In Eq. (4), CS(c_l)/n is the probability that the node initiating diffusion belongs to component c_l, and CS(c_l) is the size of component c_l, which indicates the maximum diffusion scale if a node in that component initiates diffusion.
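The two diffusion scale metrics reduce to simple computations over the components of a network. A minimal sketch, again assuming networkx and with function names of our choosing:

import networkx as nx

def lcp(G):
    # Largest Coverage Probability (Eq. (3)): size of the largest component over n.
    return max(len(c) for c in nx.connected_components(G)) / G.number_of_nodes()

def es(G):
    # Expected Scale (Eq. (4)): sum of squared component sizes over n.
    return sum(len(c) ** 2 for c in nx.connected_components(G)) / G.number_of_nodes()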


Structural properties are also important parameters for evaluating network performance. Watts and Strogatz [23] quantified the structural properties of networks using path length and clustering coefficient. Albert et al. [1] used the average degree of a network to describe its homogeneity and computed the average length of the shortest paths to quantify interconnectedness; they also studied network robustness against random and targeted attacks. In this research, homogeneity is defined as the average length of the shortest paths divided by the LGD. The objective of node removal is to minimize homogeneity. Eq. (5) defines a new metric, the Shortest Distance Homogeneity (SDH), where d(p_i, p_j) is the geodesic distance between p_i and p_j and equals zero for disconnected nodes. A larger SDH indicates higher homogeneity: the geodesic distances between nodes do not vary widely, and the importance of nodes in diffusing information or entities does not vary significantly.

SDH = \frac{2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} d(p_i, p_j)}{LGD \cdot n (n-1)}.   (5)

The LGD (Eq. (6)), or network diameter [22], is a useful network performance metric on its own. The LGD differentiates networks whose other properties, e.g., average degree and size, are known to be similar, and it indicates the maximum time it takes to diffuse information or entities to connected nodes. The objective of node removal is to minimize the LGD:

LGD = \max\{d(p_i, p_j)\}, \quad d(p_i, p_j) = 0 \text{ if } p_i \text{ and } p_j \text{ are disconnected}.   (6)
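The homogeneity and diameter metrics can be sketched in the same style (networkx assumed, function names ours); lgd takes the maximum diameter over components so that disconnected pairs, for which d = 0 by convention, are ignored:

import networkx as nx

def lgd(G):
    # Largest Geodesic Distance (Eq. (6)): the largest finite geodesic distance,
    # i.e., the maximum diameter over the connected components.
    return max((nx.diameter(G.subgraph(c)) for c in nx.connected_components(G)),
               default=0)

def sdh(G):
    # Shortest Distance Homogeneity (Eq. (5)); d(p_i, p_j) = 0 for disconnected pairs.
    n = G.number_of_nodes()
    lengths = dict(nx.all_pairs_shortest_path_length(G))
    total = sum(d for row in lengths.values() for d in row.values()) / 2.0  # unordered pairs
    diameter = lgd(G)
    return 2.0 * total / (diameter * n * (n - 1)) if diameter else 0.0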

2.2. Node removal strategies

Various algorithms have been proposed to remove nodes from a network. Optimization techniques may only be applied to a limited number of problems [20,21]. Node-based measurements were used to select nodes for removal [1,9,17], but their performance was not analyzed or validated using the necessary network performance metrics. Five node centrality metrics are studied in this research: degree, betweenness, reciprocal closeness, complement-derived closeness, and eigenvector centrality. Each centrality metric is either calculated only once for a network or recalculated after each node removal. Ten structured node removal strategies (2 strategies per centrality metric × 5 centrality metrics) are used to remove nodes. In addition, a random node removal strategy is used as a baseline against which to examine the performance of the 10 structured node removal strategies. The five node centrality metrics are defined below.

The degree of a node p_k in a network is the number of links connected to p_k [24]. This node metric is an indicator of the node's connections with other nodes in the network. Eq. (7) calculates the degree, Deg(p_k), of node p_k, where n is the size of the network; b(p_i, p_k) = 1 if p_i and p_k are connected by an undirected link and b(p_i, p_k) = 0 otherwise:

Deg(p_k) = \sum_{i=1}^{n} b(p_i, p_k).   (7)

The betweenness of a node measures the frequency with which the node falls on the shortest paths connecting pairs of other nodes [12]. This metric indicates the potential of a node to control communications in a network. Eq. (8) calculates the betweenness, Bet(p_k), of node p_k, where b_{ij}(p_k) (Eq. (9)) is the portion of the shortest paths connecting p_i and p_j that contain node p_k; g_{ij} in Eq. (9) is the total number of shortest paths connecting p_i and p_j, and g_{ij}(p_k) is the number of shortest paths that connect p_i and p_j and contain p_k:

Bet(p_k) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} b_{ij}(p_k),   (8)

b_{ij}(p_k) = \frac{g_{ij}(p_k)}{g_{ij}}.   (9)

The closeness of a node measures the degree to which the node is close to all other nodes in the network [12]. Two closeness metrics are used in this research. The reciprocal closeness, RCl(p_k), of node p_k is the sum of the reciprocals of the geodesic distances between p_k and the other nodes (Eq. (10)), where 1/d(p_i, p_k) = 0 if p_i and p_k are disconnected:

RCl(p_k) = \sum_{i \neq k} \frac{1}{d(p_i, p_k)}.   (10)

The complement-derived closeness [13] of a node is the same as the closeness centrality [12] of the node if the network has only one (connected) component. If a network has multiple (disconnected) components, the closeness centrality may be calculated by letting 1/d(p_i, p_k) = 0 if p_i and p_k are disconnected. The complement-derived closeness is calculated by constructing a complementary network of the original network. Eq. (11) calculates the complement-derived closeness, CCl(p_k), of node p_k, where G represents the original network; g_c is the number of nodes in G^c, the complement of G; d_{G^c}(p_k, p_j) denotes the distance between p_k and p_j in G^c; g(p_k) is the number of nodes in p_k's component in G; and C'_c(p_k) is the closeness centrality of p_k in G^c:

CCl(p_k) = \left(1 - \left[\sum_{j=1}^{g_c} d_{G^c}(p_k, p_j)\right]^{-1} \left(g_c - g(p_k)\right)\right) C'_c(p_k).   (11)


The eigenvector centrality of a node identifies the importance of the node according to the number and quality of its connections. A node with a large eigenvector centrality is important because it is either connected to many nodes or connected to nodes that themselves have many connections. Eq. (12) calculates the eigenvector centrality, EVC(p_k), of node p_k, where λ is the largest eigenvalue of the adjacency matrix A of the network; A_{ik} is the ikth element of A, with A_{ik} = 1 if p_i and p_k are directly connected and A_{ik} = 0 otherwise; and EVC = [EVC(p_1), EVC(p_2), \ldots, EVC(p_n)]^T is the eigenvector corresponding to λ:

EVC(p_k) = \frac{1}{\lambda} \sum_{i=1}^{n} A_{ik} \, EVC(p_i).   (12)
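All five centrality metrics are available in, or easily assembled from, standard graph libraries. A sketch assuming networkx, whose betweenness routine implements Brandes' algorithm [26]; note that the complement-derived line below simply evaluates closeness on the complement graph, a simplification of the full transformation in Eq. (11):

import networkx as nx

def centralities(G):
    reciprocal_closeness = {
        u: sum(1.0 / d
               for v, d in nx.single_source_shortest_path_length(G, u).items()
               if v != u)                                # Eq. (10); disconnected pairs add 0
        for u in G
    }
    return {
        "degree": dict(G.degree()),                                     # Eq. (7)
        "betweenness": nx.betweenness_centrality(G, normalized=False),  # Eqs. (8) and (9)
        "reciprocal_closeness": reciprocal_closeness,
        "complement_closeness": nx.closeness_centrality(nx.complement(G)),  # cf. Eq. (11)
        "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),     # Eq. (12), unit-normalized
    }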

Fig. 1 illustrates a network of 10 nodes and their degree and eigenvector centrality. Although node 7 has the largest degree, node 6 has the largest eigenvector centrality because it is connected to more important nodes than node 7.

Fig. 1. Eigenvector centrality for a network of ten nodes.

The five node centrality metrics may be used to select nodes for removal. Since each metric can be computed once for the original network or multiple times as nodes are removed, there are 11 strategies in total: 10 structured node removal strategies and the random node removal strategy. A sketch of the two calculation schemes follows the list.

(1) One-time-calculated Degree Strategy (ODS) selects nodes according to their degree in the original network. Nodes with larger degree are removed first. If multiple nodes have the same degree, one of them is randomly selected for removal; this tie-breaking rule is applied to all 11 node removal strategies.
(2) Multiple-time-calculated Degree Strategy (MDS) selects the node with the largest degree at each step and removes it. The degree is recalculated at each step.
(3) One-time-calculated Betweenness Strategy (OBS) selects nodes with the largest betweenness calculated for the original network and removes them.
(4) Multiple-time-calculated Betweenness Strategy (MBS) selects nodes with the largest betweenness and removes them. The betweenness is recalculated at each step.
(5) One-time-calculated Reciprocal closeness Strategy (ORS) selects nodes with the largest reciprocal closeness calculated for the original network and removes them.
(6) Multiple-time-calculated Reciprocal closeness Strategy (MRS) selects nodes with the largest reciprocal closeness and removes them. The reciprocal closeness is recalculated at each step.
(7) One-time-calculated Complement-derived closeness Strategy (OCS) selects nodes with the largest complement-derived closeness calculated for the original network and removes them.
(8) Multiple-time-calculated Complement-derived closeness Strategy (MCS) selects nodes with the largest complement-derived closeness and removes them. The complement-derived closeness is recalculated at each step.
(9) One-time-calculated Eigenvector centrality Strategy (OES) selects nodes with the largest eigenvector centrality calculated for the original network and removes them.
(10) Multiple-time-calculated Eigenvector centrality Strategy (MES) selects nodes with the largest eigenvector centrality and removes them. The eigenvector centrality is recalculated at each step.
(11) The random strategy randomly selects nodes and removes them.
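A minimal sketch of the one-time- vs. multiple-time-calculated schemes, shown with degree centrality (the ODS and MDS); swapping in another centrality function yields the remaining structured strategies. The function remove_nodes and its parameters are ours, and networkx is assumed:

import random
import networkx as nx

def remove_nodes(G, r, centrality=nx.degree_centrality, recalculate=True):
    # recalculate=True gives the multiple-time-calculated variant (e.g., MDS);
    # recalculate=False ranks nodes once on the original network (e.g., ODS).
    G = G.copy()
    scores = centrality(G)
    for _ in range(r):
        if recalculate:
            scores = centrality(G)        # rescore after every removal
        best = max(scores.values())
        ties = [v for v, s in scores.items() if s == best]
        target = random.choice(ties)      # ties broken at random, as in all 11 strategies
        G.remove_node(target)
        scores.pop(target)
    return G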


3. Numerical example

The 11 node removal strategies are applied to remove nodes from the main component of the Krebs criminal network [6], which consists of 63 nodes. After removing each node, the six network performance metrics are calculated to compare the performance of the removal strategies. C++ code is written to simulate the node removal strategies and calculate the network performance metrics.

The Krebs network is the network of hijackers who participated in the September 11, 2001 attacks. Although complete data on such a network might not exist, Krebs [6] provided detailed information on how the network was structured. The Krebs network consists of 74 suspected members; Borgatti [9] analyzed the main component, i.e., the 63 individuals who played the most important roles in the group. These 63 nodes and the undirected links between them are used in the numerical example to analyze the performance of different node removal strategies. Fig. 2 illustrates the main component of the Krebs network; each square represents a node and each line represents an undirected link between two nodes.

Table 1 shows the values of the six network performance metrics for the main component of the Krebs network. The NRD reveals that, along the path from one node to another, information covers on average 40% of the path during each time unit; in other words, it takes on average 1/0.4 = 2.5 time units for information to diffuse from one randomly chosen node to another. The ANC shows that on average 21.7% of nodes receive information during each time unit. The LCP of 1 denotes that the largest component comprises all 63 nodes. The ES shows that information is expected to be received by all 63 nodes. The SDH indicates that the average geodesic distance between all nodes is 49.2% of the largest geodesic distance; the distance between certain nodes is about twice the average geodesic distance. Finally, the LGD indicates that the largest geodesic distance is six; it takes at most six time units to diffuse information throughout the network.

Since the 11 node removal strategies are heuristic algorithms, it is necessary to compare their performance to the locally optimal value of each network performance metric. To find the locally optimal solution at each step, each network performance metric is calculated for the network assuming a node is removed; the node whose removal results in the minimum value of the metric is removed, and the minimum value is used to assess the performance of the node removal strategies. The minimum value identified at each step may not be the global optimal value for the network performance metric. For instance, removing node 5 from Fig. 2 results in the minimum ES (56.19), and node 5 is removed at the first step. At the second step, removing node 23 results in the minimum ES (49.69), and node 23 is removed. Removing nodes 34 and 27, however, results in a lower ES (29.95) than removing nodes 5 and 23. The minimum value identified at each step is therefore a locally optimal value of the network performance metric, which might not be the same as the global optimal value; the node selected for removal is a Locally Optimal Solution (LOS). In this research, the performance of the 11 node removal strategies is compared to the LOS; a sketch of the greedy LOS step appears after Fig. 2.

Fig. 3(a) and (b) describe how the NRD and ANC, respectively, change as nodes are removed from the Krebs network; both metrics measure diffusion speed. A total of 60 nodes are removed from the Krebs network because at least three nodes are needed to apply the OBS and MBS. Fig. 3(a) and (b) each have 12 saw-toothed lines, each of which represents the performance of one of the 11 node removal strategies or the LOS. In Fig. 3(a), for instance, NRD = 0.0970 after 10 nodes are removed according to the MRS, whereas the minimum NRD = 0.0773 is achieved by the LOS. Similarly, Fig. 4(a) and (b) describe how the LCP and ES, respectively, change as nodes are removed from the Krebs network; both metrics measure diffusion scale.

Fig. 2. Main component of the Krebs network.
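A sketch of the greedy LOS step described above, assuming networkx graphs and a metric function such as es from Section 2.1; at each step the node whose removal minimizes the chosen metric is deleted:

def los(G, r, metric):
    # Locally Optimal Solution: r greedy removals, each minimizing `metric`
    # for that single step (not necessarily the global optimum).
    G = G.copy()
    for _ in range(r):
        best_node, best_value = None, float("inf")
        for v in list(G):
            H = G.copy()
            H.remove_node(v)          # evaluate the metric with v removed
            value = metric(H)
            if value < best_value:
                best_node, best_value = v, value
        G.remove_node(best_node)
    return G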


Table 1
Network performance metrics for the main component of the Krebs network.

  Metric   NRD     ANC     LCP     ES       SDH     LGD
  Value    0.400   0.217   1.000   63.000   0.492   6

Fig. 3. Network diffusion speed under different node removal strategies.

In addition to the 11 node removal strategies and the LOS, Fig. 4(a) also shows two lines, Large Comp and Small Comp, which represent the normalized minimum size of a large component, n^{2/3}/n = n^{-1/3}, and the normalized maximum size of a small component, 0.5n^{2/3}/n = n^{-1/3}/2, respectively. Fig. 5 describes how the SDH, which measures network homogeneity, changes

as nodes are removed. Fig. 6 describes how the LGD, which measures network diameter, changes as nodes are removed. Figs. 3–6 reveal several important insights:

(1) The 10 structured node removal strategies are more efficient than the random strategy because they decrease the network performance metrics faster than the random strategy does. At least one of the 10 strategies achieves the same or better network performance (lower values) than the random strategy regardless of how many nodes are removed.

(2) The LOS is not always better than the 10 structured strategies because the LOS may not be the same as the global optimum of the network performance metrics.

Fig. 4. Network diffusion scale under different node removal strategies.


Fig. 5. Network homogeneity under different node removal strategies.

Fig. 6. Network diameter under different node removal strategies.

(3) The importance of a node may change significantly after one or multiple node removals. For instance, a member who was not important in a criminal organization could become important after one or more members were removed from the organization. The multiple-time-calculated strategies (MDS, MBS, MRS, MCS, and MES) are generally more effective in identifying important nodes than the one-time-calculated strategies (ODS, OBS, ORS, OCS, and OES) because the multiple-time-calculated strategies select nodes with the largest node centrality metrics at each step. Sometimes, however, the performance of the one-time-calculated strategies is better than that of the multiple-time-calculated strategies.

(4) All six network performance metrics sometimes increase as nodes are removed, although the objective is to minimize the metrics. An increase in a network performance metric indicates that (a) a node removal strategy has become ineffective, or (b) the network performance metric cannot be decreased further.

4. Discussion

In order to assess the efficiency of the 10 structured node removal strategies and compare their performance to that of the random strategy and the LOS, an efficiency index ρ_{s,t} (Eq. (13)) is used to calculate the relative efficiency of each node removal strategy for each network performance metric. Here r is the total number of nodes removed from the network, s refers to the 11 node removal strategies and the LOS, t refers to the six network performance metrics, and i refers to the ith node removed from the network. α_{s,t,i} = 1 if s achieves the best (minimum) among all 12 values for metric t after the ith node is removed; α_{s,t,i} = 0 otherwise. Σ_s α_{s,t,i} is the total number of strategies, including the LOS, that achieve the best performance for metric t, and 0 ≤ α_{s,t,i}/Σ_s α_{s,t,i} ≤ 1. α_{s,t,i}/Σ_s α_{s,t,i} = 0 indicates s is not one of the best strategies for removing the ith node; α_{s,t,i}/Σ_s α_{s,t,i} = 1 indicates s is the only best strategy for removing the ith node. If multiple strategies achieve the best performance, i.e., Σ_s α_{s,t,i} > 1, one of them should be randomly selected to remove the ith node. α_{s,t,i}/Σ_s α_{s,t,i} is therefore the conditional probability that s is selected for the removal of the ith node given that the best strategy is used to remove the node. ρ_{s,t} is the average conditional probability.


Table 2
Efficiency indices (ρ_{s,t}) of node removal strategies.

Table 3
Computational complexity of the node removal strategies.

  Node removal strategy   Computational complexity
  ODS                     O(n^3 log n)
  MDS                     O(n^4)
  OBS                     O(n^5 log n)
  MBS                     O(n^6)
  ORS                     O(n^5 log n)
  MRS                     O(n^6)
  OCS                     O(n^5 log n)
  MCS                     O(n^6)
  OES                     O(n^5 log n)
  MES                     O(n^6)
  LOS                     O(n^6) for NRD, ANC, LCP, and SDH; O(n^5) for ES and LGD
  Random                  O(n^2)

0 ≤ ρ_{s,t} ≤ 1. The larger ρ_{s,t} is, the more efficient s is for metric t. For a given metric t, the strategy s with the maximum ρ_{s,t} should be used to remove nodes:

\rho_{s,t} = \frac{1}{r} \sum_{i=1}^{r} \frac{\alpha_{s,t,i}}{\sum_{s} \alpha_{s,t,i}}.   (13)
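A sketch of Eq. (13), assuming values[s][i] holds the value of metric t under strategy s after the (i + 1)th removal; the container name and the tie tolerance are ours:

def efficiency_indices(values, r, tol=1e-12):
    # values: {strategy: [metric value after each of the r removals]}
    strategies = list(values)
    rho = {s: 0.0 for s in strategies}
    for i in range(r):
        best = min(values[s][i] for s in strategies)
        winners = [s for s in strategies if values[s][i] <= best + tol]  # alpha_{s,t,i} = 1
        for s in winners:
            rho[s] += 1.0 / len(winners)      # conditional selection probability
    return {s: rho[s] / r for s in strategies}  # average over the r removals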

Table 2 shows ρ_{s,t} of each node removal strategy for each of the six network performance metrics; the largest ρ_{s,t} for each metric is highlighted. None of the strategies has the largest ρ_{s,t} for all six network performance metrics. The LOS should be used to decrease the NRD and LGD, the MDS to decrease the ANC and SDH, the MCS to decrease the LCP, and the MBS to decrease the ES. Among the 10 structured node removal strategies, on average, the MDS and MBS are more efficient than the other strategies. The MBS is the most efficient structured strategy during the removal of the first 24 nodes (Figs. 3–6); its performance deteriorates afterwards because the network becomes dominated by components of a single node or two connected nodes.

Computational complexity of the node removal strategies needs to be taken into consideration in choosing the best strategies. The complexity of each strategy is determined by the computation time of the node centrality metric and the number of times the metric needs to be calculated. The computational complexity of the degree of a node is O(n); to compute the degree of all n nodes, the complexity is O(n^2). With the ODS, nodes must be sorted according to their degree so that nodes with larger degree are removed first; the complexity of the best sorting algorithms is O(n log n) [25]. Overall, the complexity of the ODS is O(n^3 log n). With the MDS, the node with the largest degree needs to be identified with complexity O(n), and the process must be repeated at each step; the complexity of the MDS is O(n^4). Using Brandes' algorithm [26], calculating the betweenness of a node requires O(ne) time, where e denotes the number of links in a network. Since there are at most n(n − 1)/2 links, the complexity of calculating betweenness is O(n^3). The OBS and MBS have time complexity of O(n^5 log n) and O(n^6), respectively. The complexity of computing the reciprocal closeness centrality, complement-derived closeness centrality, and eigenvector centrality is O(n^3) [27]. Similar to the OBS and MBS, the complexity of the ORS, OCS, and OES is O(n^5 log n) and the complexity of the MRS, MCS, and MES is O(n^6).

The complexity of the LOS depends on the network performance metric. The complexity of computing the NRD of a network is O(n^3). At each step, n nodes can be removed and the NRD is calculated for each remaining network; in addition, the minimum NRD must be identified at each step. The complexity of each step is therefore O(n^5), and since there are n steps, the overall complexity of the LOS for the NRD is O(n^6). Because the complexity of computing the ANC, LCP, and SDH is O(n^3), the complexity of the LOS for these three metrics is also O(n^6). Because the complexity of computing the ES and LGD is O(n^2), the complexity of the LOS for these two metrics is O(n^5). The complexity of the random strategy is O(n^2) because the complexity


of removing a randomly selected node at each step is O(n). Table 3 compares the computational complexity of all 11 node removal strategies and the LOS.

5. Conclusions and future study

Identifying and removing the most important nodes from complex networks is a practical but NP-complete or NP-hard problem. Nodes are removed from a network with various objectives that aim at minimizing diffusion speed, diffusion scale, homogeneity, and/or diameter. Six network performance metrics, including four new metrics, are applied in this research to quantify these objectives. Ten structured heuristic node removal strategies are designed and applied to remove nodes from the September 11, 2001 airplane hijackers' network (the Krebs network). These 10 strategies are compared to two extreme cases: the random strategy, which is the least structured, and the LOS, which achieves the local optima of the network performance metrics. The comparisons (Figs. 3–6 and Table 2) show that the structured strategies are more efficient than the random strategy in terms of all six network performance metrics. None of the structured strategies, however, performs the best for all six metrics, and some structured strategies are more efficient than the LOS. For most network performance metrics, the MBS and MDS decrease them more efficiently than the other strategies.

Analysis of computational complexity (Table 3) reveals that the multiple-time-calculated strategies require more computation time than the one-time-calculated strategies, although the former perform better than the latter. The random strategy takes the least amount of time. Among the 10 structured strategies and the LOS, the ODS and MDS are the least complex because the node degree can be calculated faster than all network performance metrics and other node centrality metrics. Excluding the ODS and MDS, the LOS is less complex than the OBS, MBS, ORS, MRS, OCS, MCS, OES, and MES for the two network performance metrics ES and LGD. According to the efficiency indices (Table 2), the MDS and MBS are preferred. Compared to the MBS, the MDS requires less computation time (Table 3); the MBS, however, has a greater impact on network performance (Σ_{t∈T} ρ_{MBS,t} = 1.183 > Σ_{t∈T} ρ_{MDS,t} = 1.161, where T is the set of all six network performance metrics). The MDS, MBS, and other node removal strategies may be integrated into a mixed strategy, which determines the removal strategy for each node.

Future research may be focused on four areas: (1) developing mixed strategies to increase the efficiency index and/or decrease computational complexity; (2) applying the node removal strategies to statistically different networks generated according to the topology of the Krebs network; (3) evaluating the node removal strategies and their performance in large complex networks such as networks of infectious disease [28] and power grids [23]; (4) performing analytical studies on the node removal strategies and network performance metrics to further understand their inherent relationships.

References

[1] Albert R, Jeong H, Barabási A. Error and attack tolerance of complex networks. Nature 2000;406:378–82.
[2] Buldyrev S, Parshani R, Paul G, Stanley H, Havlin S. Catastrophic cascade of failures in interdependent networks. Nature 2010;464:1025–8.
[3] Mishkovski I, Biey M, Kocarev L. Vulnerability of complex networks. Commun Nonlinear Sci Numer Simul 2011;16:341–9.
[4] Grubesic T, Matisziw T, Murray A, Snediker D. Comparative approaches for assessing network vulnerability. Int Reg Sci Rev 2011;34(2):230–52.
[5] Holme P, Kim B, Yoon C, Han S. Attack vulnerability of complex networks. Phys Rev E 2002;65:056109.
[6] Krebs V. Mapping networks of terrorist cells. Connections 2002;24(3):43–52.
[7] Penzar D, Srbljinović A. About modelling of complex networks with applications to terrorist group modelling. Interdiscip Descr Complex Syst 2005;3(1):27–43.
[8] Fellman P, Wright R. Modeling terrorist networks. In: Complexity, ethics and creativity conference, London School of Economics; 2003.
[9] Borgatti S. Identifying sets of key players in a social network. Comput Math Organ Theory 2006;12(1):21–34.
[10] Lim M, Negnevitsky M, Hartnett J. Tracking and monitoring e-mail traffic activities of criminal and terrorist organisations using visualisation tools. In: Proceedings of the sixth Australian information warfare & security conference, Geelong, Victoria, Australia; 2005.
[11] Vos Fellman P. The complexity of terrorist networks. Int J Netw Virtual Organ 2011;8(1):4–14.
[12] Freeman L. Centrality in social networks: conceptual clarification. Soc Netw 1978/79;1(3):215–39.
[13] Cornwell B. A complement-derived centrality index for disconnected graphs. Connections 2005;26(2):70–81.
[14] Kolaczyk E. Statistical analysis of network data. New York: Springer; 2009.
[15] Newman M. The mathematics of networks. In: The New Palgrave Encyclopedia of Economics. Basingstoke: Palgrave Macmillan; 2008.
[16] Bonacich P. Factoring and weighting approaches to status scores and clique identification. J Math Sociol 1972;2(1):113–20.
[17] Zhang X, Han J. Analysis on the importance of nodes in supply chain network. In: International Conference on Business Computing and Global Informatization, Shanghai, China; 2011.
[18] Zio E, Marella M, Podofillini L. Importance measures-based prioritization for improving the performance of multi-state systems: application to the railway industry. Reliab Eng Syst Saf 2007;92(10):1303–14.
[19] Everett M, Borgatti S. Induced, endogenous and exogenous centrality. Soc Netw 2010;32(4):339–44.
[20] Shen S, Smith J. Polynomial-time algorithms for solving a class of critical node problems on trees and series-parallel graphs. Networks 2012;60(2):103–19.
[21] Zhang C, Ramirez-Marquez J, Sanseverino C. A holistic method for reliability performance assessment and critical components detection in complex networks. IIE Trans 2011;43:661–75.
[22] Jackson M. Social and economic networks. Princeton, NJ: Princeton University Press; 2008.
[23] Watts D, Strogatz S. Collective dynamics of 'small-world' networks. Nature 1998;393:440–2.
[24] Newman M. Networks: an introduction. Oxford: Oxford University Press; 2010.


[25] Cormen T, Leiserson C, Rivest R, Stein C. Introduction to algorithms. 2nd ed. McGraw-Hill Higher Education; 2001.
[26] Brandes U. A faster algorithm for betweenness centrality. J Math Sociol 2001;25(2):163–77.
[27] Borgatti S, Everett M, Freeman C. Analytic technologies; accessed on 02/18/2012.
[28] Cauchemez S, Bhattarai A, Marchbanks T, Fagan R, Ostroff S, Ferguson N, Swerdlow D, Pennsylvania H1N1 working group. Role of social networks in shaping disease transmission during a community outbreak of 2009 H1N1 pandemic influenza. Proc Natl Acad Sci USA 2011;108(7):2825–30.