Vulnerability of labeled networks

Vulnerability of labeled networks

Physica A 389 (2010) 5538–5549 Contents lists available at ScienceDirect Physica A journal homepage: www.elsevier.com/locate/physa Vulnerability of...

645KB Sizes 5 Downloads 150 Views

Physica A 389 (2010) 5538–5549

Contents lists available at ScienceDirect

Physica A journal homepage: www.elsevier.com/locate/physa

Vulnerability of labeled networks✩ Daniel Trpevski a,∗ , Daniel Smilkov a , Igor Mishkovski b , Ljupco Kocarev a,c a

Macedonian Academy of Sciences and Arts, Skopje, Macedonia

b

Department of Electronics, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129, Turin, Italy

c

BioCircuits Institute, University of California San Diego, La Jolla, CA, USA

article

info

Article history: Received 12 April 2010 Received in revised form 12 July 2010 Available online 18 August 2010 Keywords: Labeled networks Vulnerability EU power grid AS network

abstract We propose a metric for vulnerability of labeled graphs that has the following two properties: (1) when the labeled graph is considered as an unlabeled one, the metric reduces to the corresponding metric for an unlabeled graph; and (2) the metric has the same value for differently labeled fully connected graphs, reflecting the notion that any arbitrarily labeled fully connected topology is equally vulnerable as any other. A vulnerability analysis of two real-world networks, the power grid of the European Union, and an autonomous system network, has been performed. The networks have been treated as graphs with node labels. The analysis consists of calculating characteristic path lengths between labels of nodes and determining largest connected cluster size under two node and edge attack strategies. Results obtained are more informative of the networks’ vulnerability compared to the case when the networks are modeled with unlabeled graphs. © 2010 Elsevier B.V. All rights reserved.

1. Introduction Many complex systems in the real world can be conceptually described as networks, where nodes represent the system constituents and edges depict the interaction between them. Often enough, the network representation of these systems is an undirected and unweighted graph which greatly simplifies the structure of complex systems. Attempts have been made to include network specific details by annotating graphs representing networks in various ways [1]. In this way heterogeneous networks where different entities may have diverse relationships are more appropriately described. This paper specifically deals with the vulnerability of graphs with node annotations, where a node can have one (or several) of k distinct labels. Real-world networks which can be viewed as having labeled nodes are numerous and include taxonomies of autonomous systems (ASes) in the Internet, the Wikipedia article graph where articles belong to various categories, sources and sinks in Europe’s gas network, maps of university relationships from different countries [2–5], etc. Vulnerability concerns the issue of network robustness to malicious attacks. Two types of studies in vulnerability are met in the literature: one where the effect of removing nodes or edges on the network connectivity is studied [6–11]; the other where an explicit model of entity flow (e.g. electrical current, gas, or Internet data) is presumed for the network, and the effect of removing nodes or edges on the network flow is studied [2,12–15]. The work in this paper falls in the first type. The paper proceeds as follows. In Section 2 we define vulnerability measures which will be used to study networks with node annotations. Section 3 investigates vulnerability of a power grid network in the European Union, while in Section 4 we deal with an AS network. Conclusions are addressed in Section 5. ✩ This work relates to Department of the Navy Grant N62909-10-1-7074 issued by Office of Naval Research Global. The United States Government has a royalty-free license throughout the world in all copyrightable material contained herein. ∗ Corresponding author. Tel.: +389 70 275 592. E-mail addresses: [email protected] (D. Trpevski), [email protected] (D. Smilkov), [email protected] (I. Mishkovski), [email protected] (L. Kocarev).

0378-4371/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.physa.2010.08.008

D. Trpevski et al. / Physica A 389 (2010) 5538–5549

5539

Fig. 1. A graph with three different annotations.

2. Measures of vulnerability for labeled networks 2.1. Edge betweenness and characteristic path length Several measures for network vulnerability have been proposed so far. It is easily shown [16] that a proper vulnerability measure should be derived from the link betweenness and hence, we follow this approach in defining one for labeled networks. Consider an undirected, unweighted graph G = (V , E ), where V represents the node set and E is the edge set. The betweenness of edge e is defined as: be =

− ni,j (e) ni,j

i,j∈V ,i̸=j

.

(1)

ni,j denotes the number of shortest paths between nodes i and j, and ni,j (e) is the number of those paths containing e. Thus, it measures the extent to which edge e is involved in the shortest paths between nodes. The average edge betweenness of the graph is: b(G) =

1 −

|E |

be .

(2)

e∈E

The average edge betweenness is related to the characteristic path length of a graph, which represents an average of all the geodesic distances (shortest path lengths). In the case of multiple shortest paths between any two nodes i, j ∈ V , the characteristic path length is defined as

∑ L(G) =

1



|V | (|V | − 1) i,j∈V ,i̸=j

di,j

g ∈Pi,j

ni,j

,

(3)

where Pi,j is the set of all geodesics between nodes i and j, and di,j is the length of each such geodesic distance. L(G) is related to b(G) by b(G) =

|V | (|V | − 1) L(G). |E |

(4)

Essentially the two quantities express the same information about the graph G [16]. Since for a graph with a fixed number of nodes b(G) and L(G) decrease as the number of edges in the graph increases, it can be said that they represent how ‘‘well connected’’ a graph is. The higher the values of b(G) and L(G), the more vulnerable G is to loss of edges. 2.2. Labeled graphs Further, let G be a labeled graph where each node can have one or several of k distinct labels, i.e. V = ∪ki=1 Vi and Vm ∩ Vn is not necessarily empty, m, n ∈ {1, . . . , k}. In such a setting a valid question concerns the vulnerability of communication between nodes of specific labels. For example, consider the graphs in Fig. 1. Although all of them represent cycles, the difference between them is obvious. While the leftmost graph is unlabeled, nodes of the same label are two hops away in the middle graph, and one hop away in the rightmost graph. Clearly, communicating with a node of the same label requires more edges in the middle graph G2 than in the rightmost graph G3 , which intuitively makes this communication in G2 more vulnerable to loss of edges. To formalize the difference between such graphs, the first step is choosing which of the previously defined metrics is suitable for the matter. We require that a metric for vulnerability of labeled graphs has the following properties:

• Considering a labeled graph as an unlabeled one (i.e. having only one label), it reduces to the corresponding metric for an unlabeled graph.

• It has the same value for all possible labelings in a fully connected graph, reflecting the notion that any arbitrarily labeled fully connected topology is equally vulnerable as any other. Consider the edge betweenness in (1) first, which can be decomposed as: be =

− m,n∈{1,...,k}

beVm ,Vn ,

(5)

5540

D. Trpevski et al. / Physica A 389 (2010) 5538–5549

where bVe m ,Vn =



ni,j (e)

i∈Vm ,j∈Vn

ni , j

,

i ̸= j

(6)

indicates the importance of the edge in the shortest path communication between nodes of sets Vm and Vn , m, n ∈ {1, . . . , k}. Consequently, (2) can be rewritten as: b(G) =

b(G)Vm ,Vn ,



(7)

m,n∈{1,...,k}

where bVm ,Vn (G) =

1 −

|E |

beVm ,Vn ,

(8)

e∈E

m, n ∈ {1, . . . , k}. This stresses the relative importance of communication between particular labels of nodes to the overall betweenness of the graph. In the case of two labels, such as in the example in Fig. 1, (5) and (7) become: be = bVe 1 ,V1 + bVe 1 ,V2 + bVe 2 ,V1 + bVe 2 ,V2

(9)

and b(G) = bV1 ,V1 (G) + bV1 ,V2 (G) + bV2 ,V1 (G) + bV2 ,V2 (G),

(10)

respectively. For the graphs in Fig. 1 one obtains: b(G1 ) = b(G2 ) = b(G3 ) = 2, 1 bV1 ,V1 (G2 ) = bV2 ,V2 (G2 ) = , 2 1 V1 ,V2 V2 ,V1 b (G2 ) = b (G2 ) = , 2 1 V1 ,V1 V2 ,V2 b (G3 ) = b (G3 ) = , 4 3 bV1 ,V2 (G3 ) = bV2 ,V1 (G3 ) = . 4 The results would indicate that in G2 , on average, an edge participates equally in all kinds of shortest path communication, which would also suggest that all communication is of equal vulnerability. Clearly, when k = 1, (10) reduces to (2). However, for an arbitrarily labeled fully connected graph the values of bVm ,Vn (G) depend on the number of nodes in Vm and Vn , which gives an inaccurate result of non-equal vulnerability. Next, consider the second metric, L(G). For a graph with k distinct labels, the characteristic path length between nodes of class m and n, m, n ∈ {1, . . . , k} is:

∑ L

Vm ,Vm

(G) =

1

di,j

g ∈Pi,j



|Vm | (|Vm | − 1) i,j∈Vm ,i̸=j

ni,j

(11)

and

∑ L

Vm ,Vn

(G) =

1



di,j

g ∈Pi,j

|Vm ||Vn | i∈Vm ,j∈Vn

ni , j

(12)

when m ̸= n. Calculating these for G2 and G3 in Fig. 1 gives: LV1 ,V1 (G2 ) = LV2 ,V2 (G2 ) = 2, LV1 ,V2 (G2 ) = LV2 ,V1 (G2 ) = 1, LV1 ,V1 (G3 ) = LV2 ,V2 (G3 ) = 1,

LV1 ,V2 (G3 ) = LV2 ,V1 (G3 ) = 1.5, revealing the average distance between nodes of respective classes and giving some indication as to the topology of the graph. In the case of G2 it suggests the more intuitive result that communication between nodes of the same label is more vulnerable than that of nodes in different labels.

D. Trpevski et al. / Physica A 389 (2010) 5538–5549

5541

Table 1 The 5 most and least interconnected pairs of labels in the EU power grid in the sense of shortest path communication. 5 lowest LVm ,Vn (G) L

Vm ,Vn

(G)

26.7148 27.1145 28.0086 28.0409 28.9582

5 highest LVm ,Vn (G) Label m

Label n

LVm ,Vn (G)

Label m

Label n

3 1 5 2 4

3 1 5 3 3

38.3702 37.7245 34.0184 33.3845 32.4937

5 1 4 2 3

6 6 6 6 6

When there is only one label in the graph (11) clearly reduces to (3). Note that the characteristic path length as defined with (11) and (12) is 1 for an arbitrarily labeled fully connected graph of any size, reflecting the notion that any labeled fully connected topology is equally vulnerable as any unlabeled one. Also note that

|Vm | (|Vm | − 1) Vm ,Vm L (G), |E | |Vm ||Vn | Vm ,Vn bVm ,Vn (G) = L (G), m ̸= n |E | bVm ,Vm (G) =

i.e. the characteristic path length is the normalized average edge betweenness which is invariant to the number of nodes in Vm and Vn in a fully connected topology. Therefore we propose inspecting the characteristic path lengths between nodes of specific labels to gain insight into how well these nodes are interconnected in the sense of shortest path communication. A noteworthy remark is that characteristic path length has been used to inspect a network’s efficiency [8,9]. Finally, consider graph G2 in Fig. 1. Note that there are two paths of length 2 connecting nodes of the same label. The characteristic path length is oblivious of this fact. Indeed, imagine deleting an edge from G2 : again LV1 ,V1 (G2 ) = LV2 ,V2 (G2 ) = 2, but only one path exists between nodes of the same label, intuitively making G2 with a missing edge more vulnerable than the original G2 . Therefore, in order to determine the vulnerability of a network to edge deletions, one needs to simulate node and edge loss and calculate how the characteristic path length is affected. 2.3. Size of the largest connected cluster When deleting nodes or edges, the network may become fragmented into several disconnected components. Therefore, another quantity of interest when studying network vulnerability is the size S of the largest connected cluster. The slower the decay of S with node or edge removal, the more robust is the network. In labeled networks, removing nodes or edges with specific labels will yield more insight as to their importance in the network and their effect on the size and composition of the largest connected cluster. 3. EU power grid In this section we present the vulnerability analysis for a network representing the power grid of the European Union. The network was extracted from the Platts electricity transmission lines dataset [2]. It contains a total of 12 741 nodes of the following 6 types: 1. 2. 3. 4. 5. 6.

Gas node (a junction, intersection, or end node) — 2212 nodes Electricity substation — 5096 nodes Natural gas power plant — 998 nodes Generic power plant — 4383 nodes DC line connected substation — 31 nodes Liquefied natural gas (LNG) terminal — 21 nodes

connected by 17 798 edges which have been treated as undirected. Fig. 2 depicts the degree distribution for the EU power grid network, plotted on a log–log scale. A power-law of the form p(k) ∝ k−γ has been fitted to the data, and the plausibility of this power-law model has been checked according to the method described in [17]. Since the p-value of the statistical test is lower than 0.1 (p = 0.02), a power-law model is not a good description of the degree distribution in the power grid. The characteristic path length between node labels ranges approximately from 26 to 39, as the results in Table 1 show. The most efficiently connected nodes in terms of distance are natural gas power plants, gas nodes, and substations connected by DC lines. In addition, electricity substations and generic power plants are similarly distanced to natural gas power plants. On the other hand, the longest characteristic path lengths are observed between LNG terminals and every other type of node in the network, which is due to the peripheral geographical position of these terminals in the network as well as to their late appearance in the power grid network.

5542

D. Trpevski et al. / Physica A 389 (2010) 5538–5549

Fig. 2. Node degree distribution for EU power grid.

Fig. 3. Relative change in L(G) between node labels as a function of the removed nodes when the first attack strategy is applied.

3.1. Node removal We now investigate how the characteristic path length L(G) and the size S of the largest connected cluster change when nodes are removed from the EU power grid network. We simulate intentional node attacks by removing the nodes in decreasing order of their degree k. Two attack strategies are used: removal of the most connected nodes not taking their label into consideration, and removal of the most connected nodes of a given label. For brevity, we do not show all the results when specific labels of nodes are removed. Fig. 3 shows the change in L(G) between types of nodes when the first attack strategy is applied. The characteristic path length between the natural gas power plants (label 3) has the greatest increase, revealing that although the natural gas power plants have the best shortest path communication in the unperturbed network, it is also very vulnerable to node attacks. On the other hand, the characteristic path length between the gas nodes (label 1) is not affected much, i.e. it has the smallest increase. This is the second most robust communication in the undamaged EU power grid, and remains robust under the first attack strategy. The results when nodes are removed according to the second attack strategy are shown in Fig. 4. Particularly, when removing the highest degree electricity substations (label 2), one can notice how the interconnectedness between the generic power plants is lowered (label 4). The most robust communication in terms of interconnectedness is again, the communication between the gas nodes (label 1). Fig. 5 shows how the largest connected cluster size S changes when the first attack strategy is applied. When 5.5% of the most connected nodes are removed from the network, the size of the largest connected cluster is reduced to half. Removing electricity substations (label 2) greatly reduces the largest connected cluster, followed by gas nodes (label 1), natural gas power plants (label 3) and generic power plants (label 4). This reveals how important a particular type of node is to the network. As can be expected, the electricity substations provide the network connectivity, whereas the generic and natural gas power plants play no role in keeping the network connected. Since the gas nodes in the network include intersections

D. Trpevski et al. / Physica A 389 (2010) 5538–5549

5543

Fig. 4. Relative change in L(G) between nodes of specific labels as a function of removed nodes of label 2.

Fig. 5. Size of the largest connected cluster as a function of the fraction of removed nodes.

and junctions as well as end nodes, removing the most connected of them reduces the largest connected cluster to about 80% of the original size. The amount of remaining nodes of each label in the largest connected cluster under the first attack strategy is as follows (figure not shown). When 5.5% of the nodes are removed, only 30% of the generic power plants and LNG terminals remain in the largest connected cluster, and about 50% of natural gas power plants remain. The least affected are the gas nodes, 80% remain in the largest connected cluster. The result is significant since it indicates a disproportionate decrease in the suppliers (natural gas power plants) and demanding sinks (gas nodes), which might cause an overload in the fragmented network. A detailed flow model could further clarify this point. Also, removing only 5.5% gas nodes reduces the percentage of LNG terminals to around 5% in the largest connected cluster, due to the fact that liquefied natural gas must pass through gas junctions or intersections in order to get to its destination. In the same manner, the removal of 5.5% of the electricity substations cuts off almost 70% of the generic power plants. On the other hand, results show that removing natural gas power plants and generic power plants do not cut off any other type of nodes. 3.2. Edge removal In this section we investigate how the characteristic path length L(G) and the size S of the largest connected cluster change under edge removal from the EU power grid network. Again, intentional edge attacks are simulated, and edges are removed in decreasing order of their betweenness. Two attack strategies are used: removal of edges without considering the label of nodes they connect, and removal of edges that connect nodes of a given label. Results on how L(G) increases between certain node labels when the first attack strategy is applied are given in Fig. 6 and are similar to those in Fig. 3 when nodes are removed. It is evident that by removing 5.5% of the edges the characteristic path length is increased by approximately 33%. Once again, the smallest increase is between the gas nodes (label 1).

5544

D. Trpevski et al. / Physica A 389 (2010) 5538–5549

Fig. 6. Relative change in L(G) between types of nodes as a function of removed edges (disregarding the edge type).

Fig. 7. Relative change in L(G) between certain node labels as a function of removed edges connecting nodes of label 2.

When only edges between electricity substations (label 2) are removed (see Fig. 7), the largest increase in L(G) is between generic power plants (label 4), and between electricity substations (label 2) and DC line connected substations (label 5). The fragmentation of the network when up to 30% of the edges are removed is shown in Fig. 8. Removing up to 5.5% of any type of edges does not do serious damage to the network since S is ≈90% of its original value. Beyond this point the most harm to the largest connected cluster is done by attacking with the first strategy; S is reduced to around 10% of the initial size. Furthermore, removing edges between electricity substations has a significant effect on S, reducing it up to 40%. Removing edges between gas nodes has a limited effect on S — deleting more than 10% of them does not decrease S significantly below 80%. With the first attack strategy the change in the amount of remaining nodes in the network follows the change in S (figure not shown). When edges connecting gas nodes are removed, only the gas nodes and LNG terminals are affected (figure not shown). This is similar to the effect of removing only gas nodes from the network. The number of gas nodes rapidly drops by 80%, while approximately half of the LNG terminals remain in the largest connected cluster when 30% of these edges are removed. 4. AS network In this section we present results obtained from the vulnerability analysis for a graph of autonomous systems (AS graph). The dataset used represents a labeled AS graph, obtained by applying machine learning techniques on extensive data sources [3], with the intent of identifying a more complete graph than relying solely on BGP-derived AS graphs. Each of the 19,537 autonomous systems in the network is classified as belonging to one of 6 classes: 1. Universities 2. Customer ASes

D. Trpevski et al. / Physica A 389 (2010) 5538–5549

5545

Fig. 8. Size of the largest connected cluster as a function of the fraction of removed edges.

Fig. 9. The complementary cumulative node degree distribution for the AS network.

3. 4. 5. 6.

Network Information Centers (NICs) Small Internet service providers (ISPs) Large ISPs Internet exchange points (IXPs),

which are described in [3] as follows. Large ISPs cover large backbone providers, tier-1 ISPs, with intercontinental networks. Small ISPs stand for regional and access providers with small metropolitan or regional networks. Customer ASes represent companies or organizations such as web hosting companies, technology companies, consulting companies, hospitals, banks, military networks, government networks, etc., that run their own networks but do not provide Internet connectivity services to other ASes. Universities are university or college networks, different from typical customer ASes for having substantially larger networks that serve thousands of end hosts. IXPs are small networks which serve as interconnection points for large and small ISPs. Finally, NICs represent networks that host important network infrastructure, such as root or top-level domain servers. 923 of the ASes in the network have not been classified by the method in [3]. They remain in the network and are labeled as ‘‘Abstained’’. The degree distribution of the AS network is shown in Fig. 9 on a log–log scale along with a fitted power-law. The method for fitting and verifying the plausibility of the power-law model is the same as for the EU power grid. The p-value of the statistical test is p = 0.72, much larger than 0.1, so it is very likely that the degree distribution of this network is a discrete power-law which applies for degrees larger than kmin = 7 and has a scaling exponent γ = 2.0987. The network’s edges are also annotated, information which we disregard for simplicity. The results are given in Tables 2 and 3 for the 10 most interconnected (low LVm ,Vn (G)) and 10 least interconnected types of ASes (high LVm ,Vn (G)), respectively. As expected, communication between tier-1 providers is the most efficient in terms of path length, the average being around 1.5 edges. Following is the communication between all other types of autonomous systems and the tier-1 providers, pointing out the central role of large ISPs in the Internet connectivity. The role of IXPs as interconnecting ASes is also evident, as large and small ISPs both have short path lengths to IXPs.

5546

D. Trpevski et al. / Physica A 389 (2010) 5538–5549

Table 2 The top 10 pairs of AS types with lowest characteristic path length. LVm ,Vn (G)

Label m

Label n

1.529 2.448 2.626 2.633 2.783 2.862 2.880 3.352 3.598 3.635

Large ISP Large ISP Large ISP Large ISP Large ISP Large ISP Large ISP IXP Small ISP Customer AS

Large ISP IXP Small ISP Customer AS Abstained University NIC IXP IXP IXP

Table 3 The top 10 pairs of AS types with highest characteristic path length. LVm ,Vn (G)

Label m

Label n

4.179 4.116 4.091 4.038 4.028 4.019 4.006 3.977 3.963 3.899

University NIC University Abstained University Small ISP Customer AS Small ISP Customer AS Small ISP

NIC Abstained Abstained Abstained University NIC NIC University University Abstained

The longest characteristic path lengths in the AS network also reveal the relative position of AS types in the network. For example, universities are embedded deeper in the network with an average distance of 4 hops between themselves. As customer ASes, universities need to be connected to a provider, and according to the results, on average there are three providers between them. 4.1. Node removal The same analysis as for the EU power grid is done for the AS network, and results are presented in this section. We simulate intentional node attacks by removing the nodes in decreasing order of their degree k. Two attack strategies are used: removal of the most connected nodes not taking their label into consideration, and removal of the most connected nodes of a given label. Fig. 10 shows how L(G) changes between types of ASes for the first attack strategy. The greatest increase occurs for universities (label 1) and IXPs (label 6), universities and customer ASes (label 2), customer ASes with themselves, small ISPs (label 4), small ISPs (label 4) and NICs (label 3), and happens when around 2.25% of the nodes are removed from the AS network. This result is easily explained since the network has a power-law distribution and the highest degree nodes are the large ISPs which provide backbone connectivity, thus being on the paths on almost all the nodes in the network. Conversely, L(G) between large ISPs on one hand, and NICs and IXPs on the other quickly disappears because of the absence of paths between these types of nodes in the remaining connected component. Although the latter communicating groups of nodes have the shortest paths in the unperturbed network, their communication is the most vulnerable. At 2.5% of removed nodes, there is an overall sharp drop in L(G) since the largest connected cluster is greatly diminished. When removing university nodes, only their connection with the other type of nodes is affected slightly (results not shown). This is also true for NICs and customer ASes, since these do not provide Internet connectivity services, and therefore cannot affect communication occurring between other types of ASes. On the other hand, removing small ISPs affects communication between all types of nodes, although very slightly. Concerning the fragmentation process, the largest connected cluster is reduced the most when the first attack strategy is applied — almost to S = 0 when 4% of the nodes are removed from the network (see Fig. 11). This is expected since the network has a power-law distribution, and removing the hubs severely affects its connectivity. Naturally, removing only the small ISPs (label 4) also reduces the largest cluster, namely, by 30% when 6% of these nodes are deleted. As the removal of the hubs is detrimental to the network, the amount of each type of AS in the remaining network also sharply decreases, eventually reducing close to 0, along with S (figure not shown). Small ISPs provide Internet access to end hosts as well as to other AS types. Removing 6% of them from the AS network has the following effect: 75% of the customer ASes, around 80% of university nodes and 85% of IXPs and NICs remain in the network (figure not shown). Removing all of the small ISPs gives a largest connected cluster half the size of the original one and the remaining nodes in it are: 68% of the customer ASes, 71% of university nodes, 82% of the NICs and 79% of the IXPs.

D. Trpevski et al. / Physica A 389 (2010) 5538–5549

5547

Fig. 10. L(G) between types of ASes as a function of removed ASes with highest degree.

Fig. 11. Size of the largest cluster as a function of the fraction of the removed nodes.

Naturally, all large ISPs remain in the largest connected cluster. This points out the extent to which Internet access to other ASes is provided by small ISPs, as opposed to directly connecting to large ISPs. Evidently, most of the customer ASes receive their services directly via large ISPs. However, removing only a small amount of small ISPs will cut off most of the customers being served by the small ISPs. Needless to say, ASes which do not provide Internet access, such as customer ASes, universities and NICs do not reduce S significantly when they are removed, and only their number in the largest connected cluster is affected. 4.2. Edge removal In this section we investigate how the characteristic path length L(G) and the size S of the largest connected cluster change under edge removal from the AS network. As for the EU power grid, intentional edge attacks are simulated, and edges are removed in decreasing order of their betweenness. Two attack strategies are used: removal of edges without considering the label of nodes which they are connecting, and removal of edges that connect nodes of a given label. The change in L(G) for the first attack strategy is given in Fig. 12. The amount of increase of L(G) is between 15% and 30% for every type of nodes when 5.5% of the edges are removed. The biggest increase is observed between large ISPs and NICs, and large ISPs and customer ASes. The largest connected cluster size S is reduced the most when the first attack strategy is applied (see Fig. 13). Removing 30% of the edges reduces S to 65% of its original size. We already saw that small ISPs have a limited role in providing Internet access (Section 4.1). Similarly, removing edges between them has a limited effect on S, which is reduced to ≈90%. The percentage of each type of nodes which remain in the largest connected cluster, when the first edge attack strategy is applied, follows the decline in S closely (figure not shown). When up to 15% of the edges are removed, the number of all ASes in the largest connected component does not fall below 90% of the respective original values. Afterwards, the decline is similar to that of S in Fig. 13, the most affected nodes being the NICs, reduced to below 55%, and the least affected being the

5548

D. Trpevski et al. / Physica A 389 (2010) 5538–5549

Fig. 12. L(G) between certain types of ASes as a function of removed edges.

Fig. 13. Size of the largest cluster as a function of the fraction of the removed edges.

customer ASes, above 70% of their initial number. A most interesting observation is that all of the large ISPs remain in the largest connected cluster under this edge removal strategy. Since removing edges according to the second attack strategy does not show a significant effect on S, we omit the results on the number of remaining nodes in the largest connected cluster. 5. Conclusions Modeling networks with annotated graphs captures much more of the network specific details. In this paper we analyze the structural vulnerability of two networks with node annotations: the EU power grid, and an extensive AS network. Considering the networks as annotated rather than unlabeled graphs reveals more information about their vulnerability when nodes and edges are attacked. To measure vulnerability we inspect the changes in characteristic path lengths between specific labels of nodes under node and edge deletions. The characteristic path length has two important properties for which we choose it over the average edge betweenness:

• Considering a labeled graph as an unlabeled one, it reduces to the corresponding metric for an unlabeled graph. • It has the same value for all possible labelings in a fully connected graph, reflecting the notion that any arbitrarily labeled fully connected topology is equally vulnerable as any other. Moreover, we determine how the size of the largest connected cluster changes when nodes (edges) are removed. Two attack strategies have been used: removing highest degree nodes (highest betweenness edges) without regard to the node labels, and removing highest degree nodes (highest betweenness edges) of a specific label. All of this yields more information for the networks’ vulnerability as opposed to just viewing them as unlabeled graphs, and the results are as follows.

D. Trpevski et al. / Physica A 389 (2010) 5538–5549

5549

In the EU power grid, the natural gas power plants have the shortest path length between them, followed by gas nodes and by DC line connected substations. The longest path lengths exist between DC line connected substations and LNG terminals. The connection between natural gas power plants is among the most vulnerable ones when attacking nodes with the first strategy, even though before the attack this is where the shortest paths are. The most robust connection is that between the gas nodes. Moreover, the number of suppliers (natural gas power plants) and sinks (gas nodes) in the largest connected cluster under this attack strategy is lowered to 50% and 80%, respectively. This change is disproportionate and warrants additional study to see whether it might cause an overload in the network. The role of the electricity substations as interconnection points between generic power plants is quantified, in the sense that only 70% of the latter remain in the largest connected cluster which is 60% of its original size when 5.5% of the electricity substations are attacked. Similar insights are obtained for edge removal. Edges connecting gas nodes have limited effect on the largest connected cluster size, and disconnect only gas nodes and LNG terminals from it. Edges connecting electricity substations have a more substantial effect. As for the AS network, the characteristic path length is the shortest among large ISPs, followed by the connections between large ISPs and all other AS types, pointing out their central role as tier-1 and backbone providers. The longest characteristic path lengths also give information about the relative positions of ASes. For example, the distance between two universities is three Internet service providers, on average. The first attack strategy removes the hubs from this power-law network and has a detrimental effect. Communication among large ISPs is the most vulnerable, and many other characteristic path lengths are affected because of the backbone role of large ISPs. The largest connected cluster is severely damaged under this attack strategy, reduced almost to 0% of its original size. Removing non-provider ASes on the other hand, does not have a significant effect on other communications besides the one in which the specific label is involved. The role of small ISPs has also been quantified: 32% of the customer ASes and 29% of the universities receive their services from small ISPs and removing only 6% of the small ISPs will cut off Internet connectivity to most of the dependent customer ASes and universities. Under edge removal no large ISPs are removed from the largest connected cluster. Acknowledgements We thank EU MANMADE project for support. LK thanks the Ministry of Education and Science of Republic of Macedonia for support (project: Annotated graphs in system biology). References [1] X. Dimitropoulos, D. Krioukov, A. Vahdat, G. Riley, Graph annotations in modeling complex network topologies, ACM Trans. Model. Comput. Simul. 19 (4) (2009) 17. [2] R. Carvalho, L. Buzna, F. Bono, E. Gutierrez, W. Just, D. Arrowsmith, Robustness of trans-European gas networks, Phys. Rev. E 80 (2009) 016106. [3] X. Dimitropoulos, D. Krioukov, G. Riley, K.C. Claffy, Revealing the autonomous system taxonomy: the machine learning approach, Passive and Active Measurements Workshop (PAM) (Mar.) (2006). [4] T. Zesch, I. Gurevych, Analysis of the wikipedia category graph for NLP applications, in: Proceedings of the TextGraphs-2 Workshop (NAACL-HLT), 2007. [5] J.L. Ortega, I. Aguillo, Mapping world-class universities on the Web, Inf. Process. Manage. 2 (45) (2009) 272–279. [6] R. Albert, A.-L. Barabasi, Statistical mechanics of complex networks, Rev. Modern Phys. 74 (2002) 47. [7] R. Albert, A.L.H. Jeong, A.-L. Barabasi, Nature (London) 406 (2000) 378. [8] P. Holme, B.J. Kim, C.N. Yoon, S.K. Han, Attack vulnerability of complex networks, Phys. Rev. E 65 (2002) 056109. [9] P. Crucitti, V. Latora, M. Marchiori, A. Rapisarda, Efficiency of scale-free networks: error and attack tolerance, Phys. A 320 (2003) 622–642. [10] R. Albert, I. Albert, G.L. Nakarado, Structural vulnerability of the North American power grid, Phys. Rev. E 69 (2004) 025103(R). [11] V. Latora, M. Marchiori, Vulnerability and protection of infrastructure networks, Phys. Rev. E 71 (2005) 015103(R). [12] P. Crucitti, V. Latora, M. Marchiori, Model for cascading failures in complex networks, Phys. Rev. E 69 (2004) 045104(R). [13] L. Huang, L. Yang, K. Yang, Geographical effects on cascading breakdowns of scale-free networks, Phys. Rev. E 73 (2006) 036102. [14] A.E. Motter, Y.-C. Lai, Cascade-based attacks on complex networks, Phys. Rev. E 66 (2002) 065102(R). [15] A.E. Motter, Cascade control and defense in complex networks, Phys. Rev. Lett. 93 (2004) 098701. [16] S. Boccaletti, J. Buldu, R. Criado, J. Flores, V. Latora, J. Pello, M. Romance, Multiscale vulnerability of complex networks, Chaos 17 (2007) 043110. [17] A. Clauset, C.R. Shalizi, M.E.J. Newman, Power-law distributions in empirical data, http://arxiv.org/abs/0706.1062v2.