Computer Communications 33 (2010) 2001–2011
Computer Communications journal homepage: www.elsevier.com/locate/comcom
Characterizing and modeling the Internet Router-level topology – The hierarchical features and HIR model
Jun Zhang a,b,*, Hai Zhao b, Jiuqiang Xu b, Zheng Liu b
a Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, Shenyang 110819, China
b College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
Article info
Article history: Received 29 October 2009; received in revised form 27 April 2010; accepted 20 July 2010; available online 24 July 2010
Keywords: Complex network; Hierarchy; k-Core decomposition; Modeling; Internet Router-level topology
Abstract
In this paper, we seek to understand the properties related to the hierarchical structure of the Internet Router-level topology. We first analyzed the hierarchical features of the Internet Router-level topology using actual topology measurement data provided by the CAIDA (Cooperative Association for Internet Data Analysis) Skitter project. We then analyzed the distribution of the nodes in each shell and the distribution of the edges within each shell and toward the higher shells, and derived fitting functions for them. Based on these conclusions, we proposed a new framework for modeling the Internet Router-level topology and implemented it as the HIR model. The HIR model is based on the k-core decomposition in order to reveal the hierarchy of the network. The analysis shows that the HIR model reproduces most properties of the Internet and produces a hierarchy close to that of the Internet. The experiments show that the HIR model fully reflects the hierarchical characteristics of the Internet Router-level topology while preserving the power-law degree distribution.
Crown Copyright 2010 Published by Elsevier B.V. All rights reserved.
1. Introduction
The Internet is a prototypical complex network, and research on modeling its topology has become a hot topic that attracts increasing attention from academia. In recent years, researchers in this field have made considerable progress, especially on the Autonomous System level (AS-level) Internet topology. However, because the Internet Router-level topology is so complex, it is still not clearly understood. This seemingly chaotic network still hides regularities that we do not yet know and that need further investigation. Research on, and modeling of, the hierarchy of the Internet Router-level topology can help us better understand the characteristics of the Internet's basic architecture. This will benefit the design of the next-generation Internet and research on Internet-related protocol systems.
This paper focuses on the Internet Router-level topology. Recent studies show that the Internet Router-level topology not only exhibits power-law degree distributions but also exhibits evident hierarchical properties. Beyond the traditional LAN/WAN and AS/Router-level hierarchies, the Internet also exhibits spontaneous, multi-level hierarchical features [1]. Analyzing the hierarchy of the Internet Router-level topology with the k-core decomposition, and then modeling it, is necessary for protocol performance evaluation and for simulating a variety of network problems. In this paper, we start from the k-core decomposition and use the large data set provided by the CAIDA Skitter project to analyze the hierarchical features of the Internet Router-level topology. Based on this analysis, we studied how to model the hierarchy of the Internet Router-level topology and proposed a hierarchical model based on the k-core decomposition, the HIR model. Experiments show that the HIR model reflects not only the power-law degree distribution of the network but also its hierarchical features.
* Corresponding author at: Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, Shenyang 110819, China. Tel.: +86 024 83681067 10; fax: +86 024 83681067 17. E-mail address: [email protected] (J. Zhang).
0140-3664/$ - see front matter Crown Copyright 2010 Published by Elsevier B.V. All rights reserved. doi:10.1016/j.comcom.2010.07.010
2. Related work
Internet research requires realistic models that correctly generate Internet-like networks. Internet topology modeling is a complex task that involves network measurement, graph theory, algorithm design, statistics, data mining, visualization, mathematical modeling and other research areas [2]. This complexity and difficulty have drawn many researchers to the field. Early modeling efforts focused on random graphs with relatively regular degree distributions; a typical example is the model proposed by Waxman [3]. This model was later abandoned in favor of models such as GT-ITM [4], which incorporated the hierarchical structures observed in the Internet. With the rapid growth of the network and persistent network measurement efforts [5–7], real topology data started to become available, in particular at the AS level. Using such data, Faloutsos et al. first observed that the degree distribution of the AS-level topology is consistently highly skewed [8]. Structural models such as GT-ITM failed to reproduce this specific form of degree distribution. Consequently, the research community has shown considerable interest in topology models that better resemble the real data. It has introduced many Internet models, such as the BRITE model [9], the Inet model [10], the BA model [11], the AB model [12], the GLP model [13], the PFP model [14,15], the DP model [16], the TANG model [17,18], the MLW model [19] and Nem [20]. However, most of these generative models aim at modeling (explaining) just a single property of the network. The graphs they generate do not match the observed topologies across the wide range of metrics considered important in the literature. Among them, only the PFP model and Nem [21] can accurately reproduce a large number of topological characteristics. However, the PFP model was built for the AS-level topology, and parameters for modeling the Router-level topology have not been obtained so far. Moreover, the PFP model was originally designed to match only two properties of the Internet AS-level topology, namely the exact form of the degree distribution and the rich-club connectivity, so it cannot simulate the hierarchical features of the topology. Nem [22] can generate Internet topologies at both the Router level and the AS level, and it reflects Internet properties such as power laws and distances well, but it does not consider hierarchical properties.
Here, we consider the k-core decomposition as a method for studying and modeling the hierarchical structure of the Internet Router-level topology. The notion of k-cores in graph theory was introduced by Seidman [23] and by Bollobás [24] as a method of (destructively) simplifying graph topology to aid analysis and visualization. At present, the main applications of the k-core decomposition in complex network research are: (1) analysis of the hierarchy of large-scale complex network topologies [25,26]; (2) visualization of large-scale complex network topologies [27,28]; (3) validation of models of large-scale complex networks [29].
3. Main definitions of k-core decomposition
The k-core decomposition makes it possible to characterize networks beyond the degree distribution. It also uncovers structural properties and hierarchies that arise from the specific architecture of the system. The k-core decomposition [23,24,29–31] identifies particular subsets of the network, called k-cores, each obtained by a recursive pruning strategy. It therefore provides a probe for studying the hierarchical properties of large-scale networks, focusing on the network's regions of increasing centrality and connectedness. The main definitions are as follows. Consider a graph $G = (V, E)$ with $|V| = n$ vertices and $|E| = e$ edges; a k-core is defined as follows [26]:
Definition 1 (k-core [26]). A subgraph $H = (C, E|_C)$ induced by the set $C \subseteq V$ is a k-core, or a core of order k, iff $\forall v \in C: \mathrm{degree}_H(v) \ge k$, and H is the maximum subgraph with this property. The number of vertices in the k-core is called the size of the k-core.
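Definition 1 is easiest to see operationally. The following Python sketch peels a simple undirected graph for k = 1, 2, ... and records, for each node, the last core it belonged to. That value is exactly the shell-index of Definition 2 below. The function name `shell_indices` and the edge-list input format are our own illustrative choices, not the authors' code. Removing one node can push a neighbor below the threshold, so the pruning is driven by a work stack rather than done in a single pass.

```python
from collections import defaultdict

def shell_indices(edges):
    """Return {node: shell-index} for a simple undirected graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                          # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}   # degree among remaining nodes
    remaining = set(adj)
    shell = {}
    k = 1
    while remaining:
        # Compute the k-core: remove every node whose remaining degree is below k.
        # Each removal lowers neighbors' degrees, which may push them below k too.
        stack = [v for v in remaining if degree[v] < k]
        while stack:
            v = stack.pop()
            if v not in remaining:          # already removed via another path
                continue
            remaining.discard(v)
            shell[v] = k - 1                # in the (k-1)-core but not in the k-core
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] < k:
                        stack.append(w)
        k += 1
    return shell
```

Applied to the star-like example discussed after Definition 2, the hub gets shell-index 1 regardless of its degree. This sketch rescans the remaining nodes once per k, which is fine for illustration. For large graphs, the NetworkX function `core_number` computes the same quantity.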
According to Definition 1, the k-core of a network is the set of nodes and edges that remains after all nodes with degree less than k, and the edges attached to them, have been removed. Note that this process is not equivalent to a single pass that prunes the nodes of a certain degree. Removing nodes of a given degree changes the degree of the remaining nodes, so the process must be repeated until every node in the remaining graph has degree at least k.
Definition 2 (shell-index [26]). A vertex i has shell-index k if it belongs to the k-core but not to the (k + 1)-core.
Note that the shell-index of a node is not equivalent to its degree. Consider a star-like subgraph formed by a high-degree node connected to many degree-one nodes and attached to the rest of the graph by only a single edge. Its center has shell-index one no matter how large its degree is.
Definition 3 (k-shell [26]). A k-shell $S_k$ is composed of all the vertices whose shell-index is k. The maximum value of k such that $S_k$ is not empty is denoted $k_{max}$. The k-core is thus the union of all shells $S_c$ with $c \ge k$. The number of nodes in the k-shell is called the size of the k-shell.
The k-core decomposition therefore progressively identifies inner cores and decomposes the network layer by layer, revealing the structure of the different k-cores from the outermost one to the innermost one.
4. Data choice and access
4.1. Data choice
The complexity of the Internet directly results in the complexity of its topology, especially the Router-level topology. Facing millions of Internet routers, the first difficulty is how to measure them. The Embed Technology Laboratory of Northeastern University in China was authorized by CAIDA in 2005. Since the first CAIDA node in China (the Neu node) was established [32], it has actively researched the characteristics of the Internet topology.
The Embed Technology Laboratory of Northeastern University can obtain topology measurement data from CAIDA monitors around the world. It can also analyze first-hand topology information from the Neu node in a timely and dynamic manner. This gives us abundant data resources and convenient conditions for researching the characteristics of the Internet Router-level topology, so we chose CAIDA's Router-level Internet topology measurement data as the basic data for this paper.
4.2. Results of data access
The data used in this paper are CAIDA's Router-level Internet topology measurement data from May 2007. We obtained the Router-level measurement results from 15 CAIDA monitors around the world and resolved their IP aliases using CAIDA's iffinder IP alias resolution tool. The results are shown in Table 1. To mitigate sampling bias, we combined the measurement results of the 15 monitors in Table 1, obtaining a graph with 360,652 nodes and 925,769 edges. Its maximum degree is 1206 and its highest shell-index is 25.
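Combining the monitors amounts to taking the union of their router-level edge sets. A minimal sketch follows. It assumes each monitor's data has already been reduced to a list of (router, router) pairs after alias resolution; that input format and the name `combine_monitors` are hypothetical. Storing each undirected edge in one canonical orientation is what merges links seen by several monitors.

```python
from collections import Counter

def combine_monitors(per_monitor_edges):
    """Union several monitors' router-level edge lists into one simple undirected graph.

    Returns (edge set, number of nodes, maximum degree). Router IDs must be
    mutually comparable (e.g. all strings) so each edge can be stored as (min, max).
    """
    edges = set()
    for edge_list in per_monitor_edges:
        for u, v in edge_list:
            if u != v:                                  # drop self-loops
                edges.add((min(u, v), max(u, v)))       # canonical orientation
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return edges, len(degree), max(degree.values(), default=0)
```

The node count, edge count and maximum degree returned here correspond to the summary figures quoted for the combined graph.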
Table 1
The monitors and their measurement results in May 2007.
Monitor     Edges     Nodes
arin        381,069   286,554
cam         371,167   282,019
champagne   388,662   290,672
d-root      371,876   282,048
e-root      380,607   288,271
f-root      375,229   283,454
h-root      366,709   275,589
i-root      383,232   290,073
lhr         367,260   281,534
m-root      202,611   151,214
neu         397,299   266,844
riesling    384,090   287,254
sjc         359,032   273,192
uoregon     357,952   272,362
yto         371,977   280,616
5. The distribution of the nodes in each shell
Fig. 1. Distribution of the nodes in k-shells of the Internet Router-level topology.
5.1. The distribution of the nodes in each shell
The k-core decomposition recursively removes from the Internet map the nodes that have fewer than k neighbors, until every node in the remaining graph has at least k neighbors. The remaining graph is the k-core. The nodes removed in this step (those of the (k − 1)-core that do not survive), together with the edges among them, form the (k − 1)-shell. We can therefore use the k-core decomposition to peel the network layer by layer, revealing the structure of the different cores and shells from the outermost one to the inner ones. Peeling the combined graph described above in this way divides the Internet Router-level topology into 25 shells. The number of nodes in each shell is listed in Table 2 and shown in Fig. 1. Table 2 shows that in the outer shells the number of nodes decreases from outer to inner, whereas in the inner shells it fluctuates noticeably. It is therefore difficult to find a single function for the number of nodes in each shell. However, the remaining graph shrinks at every step of the k-core decomposition, so the number of nodes in the k-cores decreases monotonically from the outer cores to the inner ones. We can therefore first compute the number of nodes in each core, and then obtain the number of nodes in each shell as the difference between the sizes of two adjacent cores. The distribution of the nodes in each core is shown in Fig. 2(a). The fitted curve is shown in Fig. 2(b), where the x-axis is the core index k and the y-axis is the base-10 logarithm of the ratio of the number of nodes in the k-core to the total number of nodes in the network. The fitting result is:
$$ Y = -0.1347X - 0.0967 \tag{1} $$
So the ratio $p_k$ of the number of nodes in the k-core, $n_k$, to the total number of nodes, $N$, is:
$$ p_k = 10^{-0.1347k - 0.0967} \tag{2} $$
Then the number of nodes in the k-core is:
$$ n_k = 10^{-0.1347k - 0.0967} N \tag{3} $$
where N is the total number of nodes in the network. Because the fitted values deviate considerably from the actual data for cores 1–3, we adjust the results for these cores as follows:
$$ n'_1 = N, \qquad n'_k = (N - n_1)\,\frac{10 - k^2}{10} + n_k, \quad k = 2, 3 \tag{4} $$
The number of nodes in the k-shell, $s_k$, is then obtained by subtracting the number of nodes in the (k + 1)-core from that in the k-core:
$$ s_K = n_K, \qquad s_k = n_k - n_{k+1}, \quad k = 1, 2, \ldots, K - 1 \tag{5} $$
where K is the highest shell-index of the Internet Router-level topology. For $k \le 3$, the adjusted values $n'_k$ of Eq. (4) take the place of $n_k$.
5.2. The distribution of the isolated nodes in each shell
Table 2
The number of nodes in each shell.
Shell  Nodes      Shell  Nodes
1      115,636    14     778
2      119,251    15     750
3      41,840     16     360
4      30,459     17     267
5      14,599     18     275
6      11,487     19     143
7      7039       20     230
8      6052       21     171
9      3209       22     21
10     3188       23     107
11     2110       24     17
12     1131       25     157
13     1375
The results of the k-core decomposition show that every shell except the highest one contains some isolated nodes. These isolated nodes have no edges to other nodes in their own shell. They are linked only to nodes in higher shells, and each isolated node in the k-shell has exactly k edges to nodes in the higher shells. The distribution of the isolated nodes in each shell is shown in Fig. 3(a). The fitted curve is shown in Fig. 3(b), where the x-axis is the shell index k and the y-axis is the natural logarithm of the ratio of the number of isolated nodes in the shell to the total number of nodes in the network. The fitting result is as follows:
$$ Y = -0.3753X - 2.0923 \tag{6} $$
So the number of isolated nodes in the k-shell, $I_k$, is:
$$ I_k = e^{-0.3753k - 2.0923} N, \quad k = 1, 2, \ldots, K - 1 \tag{7} $$
One should note that there are no isolated nodes in the highest shell.
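Eqs. (3)–(5) and (7) can be evaluated directly for given N and K. The sketch below does this; the function names are illustrative. We keep the values as floats, because how the authors round them to integers is not stated. Following step 1 of the algorithm in Section 7.2, we assume that the adjusted values of Eq. (4) replace $n_1$, $n_2$, $n_3$ before Eq. (5) is applied.

```python
import math

def core_sizes(N, K):
    """n_k for k = 1..K: Eq. (3), with cores 1-3 replaced by the adjusted Eq. (4)."""
    assert K >= 4, "Eq. (4) adjusts cores 1-3, so K should exceed 3"
    fitted = {k: 10 ** (-0.1347 * k - 0.0967) * N for k in range(1, K + 1)}
    n = dict(fitted)
    n[1] = N
    for k in (2, 3):
        # (N - n_1) uses the *fitted* n_1, not the adjusted n'_1 = N
        n[k] = (N - fitted[1]) * (10 - k ** 2) / 10 + fitted[k]
    return n

def shell_sizes(N, K):
    """s_k from Eq. (5): s_K = n_K and s_k = n_k - n_{k+1} for k < K."""
    n = core_sizes(N, K)
    s = {k: n[k] - n[k + 1] for k in range(1, K)}
    s[K] = n[K]
    return s

def isolated_counts(N, K):
    """I_k from Eq. (7) for k = 1..K-1; the highest shell has no isolated nodes."""
    I = {k: math.exp(-0.3753 * k - 2.0923) * N for k in range(1, K)}
    I[K] = 0.0
    return I
```

With N = 360,652 and K = 25, `shell_sizes` gives roughly 116,000 nodes for the 1-shell (Table 2 lists 115,636). The shell sizes sum back to N because Eq. (5) telescopes.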
Fig. 2. Distribution of the nodes in the k-cores of the Internet Router-level topology.
Fig. 3. The distribution of the isolated nodes in each shell.
6. The distribution of the edges in the network
The Internet Router-level topology consists of nodes and edges: the nodes represent routers, and the edges represent links between pairs of routers. Having analyzed the distribution of the nodes in each shell, we now analyze the distribution of the edges in each shell. The edges of a shell fall into two parts. Edges that connect nodes in the same shell are called the edges within the shell. Edges that connect nodes in the shell to nodes in higher shells are called the edges between the shells. Fig. 4(a) shows the distribution of the edges in each k-shell, where the x-axis is the shell index k and the y-axis is the number of edges in the k-shell. Except for the outermost shell, the number of edges per shell decreases from the outer shells to the inner shells. The fit on a logarithmic scale, shown in Fig. 4(b), is not satisfactory. We therefore compute the ratio of the number of edges in the shell, $L_k$, to the number of nodes in the shell, $s_k$, and fit it on a log–log scale, as shown in Fig. 5.
The fitness result is as follows:
$$ Y = 0.9473X - 0.0050 \tag{8} $$
Then the number of edges in the k-shell, $L_k$, is:
$$ L_k = e^{0.9473 \ln k - 0.0050}\, s_k, \quad k = 1, 2, \ldots, K \tag{9} $$
6.1. The distribution of the edges within the shell
The edges of a shell can be divided into two parts, the edges within the shell and the edges between the shells. We denote them as $L_k^k$ and $L_k^{k'}$ ($k' > k$), respectively. We first analyze the distribution of the edges within the shell. Fig. 6 plots the ratio of the number of edges within each shell to the number of nodes in that shell. From it we obtain the function for the number of edges within the shell:
$$ L_k^k = (0.3303k - 0.1382)\, s_k, \quad k = 1, 2, \ldots, K - 1 \tag{10} $$
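Eqs. (9) and (10) turn the shell sizes into edge budgets. In this sketch the input is any mapping from shell index to $s_k$, for instance the output of Eq. (5); the function name is hypothetical. For the top shell we use $L_K^K = L_K$, as the text explains right after Eq. (10), instead of Eq. (10) itself. Each edge between shells is counted with its lower shell, so summing $L_k$ over all shells estimates the total number of edges.

```python
import math

def shell_edge_counts(s):
    """Return (L, within): L[k] from Eq. (9) and within[k] = L_k^k from Eq. (10)."""
    K = max(s)
    L = {k: math.exp(0.9473 * math.log(k) - 0.0050) * s[k] for k in s}
    within = {k: (0.3303 * k - 0.1382) * s[k] for k in s if k < K}
    within[K] = L[K]   # no shell lies above K, so all its edges are internal
    return L, within
```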
Fig. 4. Distribution of the edges in each shell.
The fitting result shows that the deviation for the highest shell is very large. However, because no shell lies above it, the number of edges in the highest shell equals the number of edges within it, that is, $L_K^K = L_K$, so it need not be computed by Eq. (10).
6.2. The distribution of the edges from one shell to the highest shell
Among the edges between shells, the edges linked to nodes in the highest shell differ from the others and need to be analyzed separately. The distribution of the edges between each shell and the highest shell is shown in Fig. 7. There, the x-axis is the shell index k and the y-axis is the ratio of the number of edges from the k-shell to the highest shell to the number of edges in the k-shell. From it we obtain the function for the number of edges from each shell to the highest shell:
$$ L_k^K = 2^{(k - (K - 1))/2}\, L_k, \quad k = 1, 2, \ldots, K - 2; \qquad L_{K-1}^K = L_{K-1} - L_{K-1}^{K-1} \tag{11} $$
Fig. 5. Fitness graph of the ratio of the edges with the nodes in the shell.
Fig. 6. The distribution of the edges within each shell.
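Eq. (11) in code. This sketch assumes `L` and `within` are mappings from shell index to $L_k$ and $L_k^k$, for instance from Eqs. (9) and (10); the function name is illustrative.

```python
def edges_to_top(L, within):
    """L_k^K from Eq. (11): number of edges from the k-shell to the highest shell K."""
    K = max(L)
    top = {k: 2 ** ((k - (K - 1)) / 2) * L[k] for k in range(1, K - 1)}
    # The (K-1)-shell has no intermediate shell above it, so every edge of it
    # that is not internal must end in the K-shell.
    top[K - 1] = L[K - 1] - within[K - 1]
    return top
```

The factor $2^{(k-(K-1))/2}$ halves with every two shells of distance from the top, so outer shells send very few edges directly to the highest shell.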
Fig. 7. The distribution of the edges from the nodes in the k-shell to the nodes in the highest shell.
Fig. 8. The distribution of the edges from the nodes in the k-shell to the nodes in the higher shells.
6.3. The distribution of the edges from one shell to the higher shells
Fig. 8 shows the distribution of the edges from one shell to the higher shells. The x- and y-axes are the source shell index k and the target shell index k'. The z-axis is the ratio of the number of edges from the k-shell to the k'-shell (excluding the highest shell) to the number of edges in the k-shell. The range of k is from 1 to 24. From Fig. 8 we obtain the following function:
$$ L_k^{k'} = e^{a_1 k' + a_2}\, L_k, \quad k' = k + 1, \ldots, K - 1 \tag{12} $$
Fitting the coefficients $a_1$ and $a_2$ as functions of k gives:
$$ a_1 = -0.0015k^2 + 0.0398k - 0.375 \tag{13} $$
$$ a_2 = 0.0051k - 0.8229 \tag{14} $$
Combining Eqs. (12)–(14), we obtain the number of edges from the k-shell ($1 \le k \le K - 2$) to each higher shell:
$$ L_k^{k'} = e^{(-0.0015k^2 + 0.0398k - 0.375)k' + 0.0051k - 0.8229}\, L_k, \quad k' = k + 1, \ldots, K - 1 \tag{15} $$
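Eqs. (12)–(15) give one edge count per pair of shells. A sketch that tabulates them, assuming `L` maps each shell index to $L_k$ (the function name is hypothetical):

```python
import math

def edges_to_higher(L, K):
    """L_k^{k'} from Eqs. (12)-(15) for 1 <= k <= K-2 and k < k' <= K-1.

    L maps shell index -> L_k. Returns {(k, k'): number of edges}.
    """
    out = {}
    for k in range(1, K - 1):
        a1 = -0.0015 * k ** 2 + 0.0398 * k - 0.375   # Eq. (13)
        a2 = 0.0051 * k - 0.8229                      # Eq. (14)
        for kp in range(k + 1, K):
            out[(k, kp)] = math.exp(a1 * kp + a2) * L[k]   # Eq. (12)
    return out
```

Because $a_1 < 0$ for the shells in this range, the edge count for a fixed k decays geometrically as the target shell k' moves inward. Edges to the highest shell K are excluded here and come from Eq. (11).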
7. The hierarchical model of the Internet Router-level topology
Combining the results of the analysis above with some classical Internet topology models, we now give a new framework for modeling the Internet Router-level topology. It uses the k-core decomposition as its basis and incorporates preferential attachment. In this way it fully reflects the hierarchical structure of the Internet Router-level topology while preserving the power-law degree distribution of the network.
7.1. Design thinking
The main ideas of the model are as follows:
(1) According to the k-core decomposition, we can regard the network as evolving outward from the highest shell. So when modeling the Internet topology, we begin with the highest shell. We then add the second highest shell and the links between the two, and so on, down to the lowest shell.
(2) The above analysis shows that the highest shell differs from the other shells in properties such as the number of nodes, the distribution of node degrees and the ratio of edges. It plays a crucial role, so it must be treated separately.
(3) Building on the highest shell, when we add a new shell we first add edges among the nodes in that shell. We then add edges between the nodes in that shell and the nodes in the higher shells.
(4) The analysis above shows that every shell except the highest contains some isolated nodes. These nodes have no edges to other nodes in their shell; they are all linked to the higher shells, each with exactly k edges to nodes in the higher shells. So when we construct the k-core on top of the (k + 1)-core, we first add new nodes according to Eq. (5). We then choose some of them, according to Eq. (7), to be connected only to the higher shells. The remaining nodes are linked both to each other and to nodes in the higher shells.
(5) The number of nodes in each shell of the model is computed by Eq. (5), and the number of isolated nodes in each shell by Eq. (7). The number of edges within the shell is computed by Eq. (10). The number of edges from the nodes in the shell to the nodes in the highest shell is computed by Eq. (11), and the number of edges to each other higher shell by Eq. (15).
(6) When choosing nodes in the higher shells to connect with nodes in the new shell, we use preferential attachment as the basis and choose node i with the probability given by Eq. (16):
$$ P_i = \frac{d_i^{\,1 + 0.28 \log d_i}}{\sum_j d_j^{\,1 + 0.28 \log d_j}} \tag{16} $$
where $d_i$ is the current degree of node i. In the algorithm below, the sum runs over the nodes of the target shell.
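Eq. (16) makes the attachment probability grow faster than linearly in degree. The sketch below samples one endpoint with these weights. We assume the logarithm is base 10, because the text writes only "log", and `alpha` is our name for the constant 0.28. Nodes of degree 0 are given weight 0 here, since $\log 0$ is undefined; in the algorithm, the candidates are nodes of already-built higher shells.

```python
import math
import random

def attachment_weight(d, alpha=0.28):
    """Unnormalized Eq. (16) weight d^(1 + alpha*log10 d); log base 10 is assumed."""
    return d ** (1 + alpha * math.log10(d)) if d > 0 else 0.0

def choose_target(candidates, degree, rng=random):
    """Pick one candidate with probability proportional to its Eq. (16) weight."""
    weights = [attachment_weight(degree[v]) for v in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]   # choices normalizes the weights
```

For example, a node of degree 100 receives a weight about 69 times that of a node of degree 10, rather than 10 times as in linear preferential attachment.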
7.2. The modeling algorithm
The modeling algorithm is as follows:
Input: N, the number of nodes in the network; K, the highest shell-index of the Internet Router-level topology.
Output: an Internet Router-level topology.
1. Compute the number of nodes in each core, $n_k$, according to Eqs. (3) and (4);
2. Compute the number of nodes in each shell, $s_k$, according to Eq. (5);
3. Compute the number of isolated nodes in each shell, $I_k$, according to Eq. (7);
4. Compute the number of edges in each shell, $L_k$, according to Eq. (9);
5. Compute the number of edges within each shell, $L_k^k$, according to Eq. (10);
6. Compute the number of edges to the highest shell, $L_k^K$, according to Eq. (11);
7. Compute the number of edges to each higher shell, $L_k^{k+1}, L_k^{k+2}, \ldots, L_k^{K-1}$, according to Eq. (15);
8. Construct the highest shell (the K-shell) from its numbers of nodes and edges;
9. for (k = K - 1; k > 0; k--)
10.   Add $s_k$ nodes and set their initial degree to 0;
11.   Construct $L_k^k$ edges among the $s_k - I_k$ non-isolated nodes of the new shell;
12.   for (k' = K; k' > k; k'--)
13.     for (i = 1; i <= $L_k^{k'}$; i++)
14.       Choose one node, v1, in the k'-shell with probability $P_i = d_i^{1 + 0.28 \log d_i} / \sum_j d_j^{1 + 0.28 \log d_j}$;
15.       Put the nodes of the k-shell whose degree is less than k into a set S;
16.       if (S is not empty)
17.         choose a node, v2, randomly from S;
18.       else
19.         choose a node, v2, among the non-isolated nodes of the k-shell;
20.       end if;
21.       Add one edge between v1 and v2 and increase the degree of each by 1;
22.     end for;
23.   end for;
24. end for;
25. The end.
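Steps 10–23 can be sketched in Python as follows. This is a hypothetical helper: `add_shell` and its arguments are our names, and it takes integer edge counts for one shell. It stores the graph as adjacency sets, so a repeated pair collapses into one edge, a simplification the pseudocode does not make. It picks which new nodes are isolated arbitrarily and uses Eq. (16) with a base-10 logarithm. Step 8, building the highest shell, is not shown.

```python
import math
import random

def add_shell(adj, shell_of, k, s_k, I_k, L_within, L_up, rng):
    """Sketch of steps 10-23: grow the k-core on top of the existing (k+1)-core.

    adj      -- dict node -> set of neighbors; existing node IDs assumed to be 0..len(adj)-1
    shell_of -- dict node -> shell index (mutated in place)
    L_up     -- dict k' -> number of edges from the k-shell to the k'-shell
    """
    def weight(d):                       # Eq. (16), log base 10 assumed
        return d ** (1 + 0.28 * math.log10(d)) if d > 0 else 0.0

    def add_edge(u, v):                  # step 21; a node's degree is len(adj[node])
        adj[u].add(v)
        adj[v].add(u)

    start = len(adj)
    new = list(range(start, start + s_k))            # step 10: s_k new nodes of degree 0
    for v in new:
        adj[v] = set()
        shell_of[v] = k
    isolated = set(new[:I_k])                        # arbitrary choice of the I_k isolated nodes
    linked = [v for v in new if v not in isolated]

    for _ in range(L_within):                        # step 11: random pairs of non-isolated nodes
        u, v = rng.sample(linked, 2)
        add_edge(u, v)

    for kp in sorted(L_up, reverse=True):            # step 12: k' from K down to k+1
        targets = [v for v, s in shell_of.items() if s == kp]
        for _ in range(L_up[kp]):                    # step 13
            v1 = rng.choices(targets, weights=[weight(len(adj[t])) for t in targets])[0]  # step 14
            low = [v for v in new if len(adj[v]) < k]                         # step 15
            v2 = rng.choice(low) if low else rng.choice(linked)               # steps 16-20
            add_edge(v1, v2)
    return new
```

Choosing v2 from the under-degree set first is what guarantees that isolated nodes end up attached only to higher shells. Recomputing that set for every edge is quadratic, which is acceptable for a sketch but not for a network of 360,652 nodes.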
Note: Steps 1–7 compute the parameters of the model. Step 8 constructs the highest shell. The degree of each node in it is first set to 0 and is increased by 1 each time an edge is attached to the node. K edges are constructed for each node in the shell, and edges are then added between pairs of randomly chosen nodes until the number of edges in the K-shell reaches the required value. Steps 9–24 construct the (K − 1)-core, the (K − 2)-core, and so on down to the outermost core, the 1-core. Step 11 adds the edges within the k-shell. Until $L_k^k$ edges have been added, it repeatedly adds one edge between two nodes chosen randomly among the non-isolated nodes of the k-shell and increases the degree of both nodes by 1. Steps 12–23 construct the edges between the k-shell and the k'-shell for each k' higher than k. For each edge, one endpoint is chosen in the k'-shell according to the probability formula, and the other is chosen randomly in the k-shell. Nodes whose degree is less than k are chosen first, so that every node in the k-shell ends up with degree at least k. If there is no such node, a node is chosen among the non-isolated nodes, so that each isolated node in the k-shell has exactly k edges to the nodes in the higher shells.
7.3. Experiment and analysis
7.3.1. Comparison of basic properties
We implemented the model according to the above algorithm and call it the HIR model. We discuss its advantages and disadvantages by comparing it with the actual Internet Router-level topology. Table 3 compares the basic properties of the HIR model with the actual measurement data of the Internet Router-level topology. These include the number of nodes, the number of edges, the maximum degree, the average degree, the highest shell-index, the average shell-index, the average path length, the assortative coefficient and the average clustering coefficient. The HIR model simulates the Internet Router-level topology well for most of these properties.
Its values are close to the actual Internet measurements for most properties, especially the highest shell-index, which is identical (25 in both cases). The HIR model can not only simulate the basic properties of the Internet Router-level topology but also fully exhibit its hierarchical structure. For the assortative coefficient, however, the deviation between the two is large. Both values are negative, which means that both are disassortative networks, so the essential character is unchanged.
7.3.2. The comparison of the features in k-cores
To illustrate the advantage of the HIR model in exhibiting the hierarchical properties of the Internet, we analyze the number of nodes, the number of edges and the average degree in each core of the HIR model and of the actual Internet data, as shown in Fig. 9. The HIR model is close to the actual Internet data in each core. This further illustrates its advantage in describing the hierarchical properties of the Internet Router-level topology.
7.3.3. Comparison of the degree distribution
The degree distribution is the most commonly used characterization of a network. It is defined as follows. Let n(d) denote the number of nodes of degree d in the network. The probability that a randomly chosen node has degree d is then p(d) = n(d)/n, and p(d) is the degree distribution of the network. If $p(d) \propto d^{-r}$ with r positive, the network is called a scale-free network. The degree distributions of the actual Internet data and the HIR model on a log–log scale are shown in Fig. 10. Their trends are consistent, and the HIR model simulates the power-law degree distribution of the Internet Router-level topology well.
7.3.4. Comparison of the shell-index distribution
The shell-index distribution is defined as follows. Let n(k) denote the number of nodes whose shell-index is k. The probability that a randomly chosen node has shell-index k is then p(k) = n(k)/n, and p(k) is the shell-index distribution of the network. The shell-index distributions of the HIR model and the actual Internet data on a log–log scale are shown in Fig. 11.
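Both p(d) and p(k) are normalized histograms of a per-node quantity. In the sketch below (function names illustrative), `distribution` builds p(x) = n(x)/n from a list of degrees or shell-indices. `loglog_slope` gives a rough estimate of the exponent r in $p(d) \propto d^{-r}$ by least squares on the log–log points. Least-squares fitting of a raw histogram is a crude and often biased estimator. It is shown only to make the definition concrete, not as the method used in the paper.

```python
import math
from collections import Counter

def distribution(values):
    """Empirical p(x) = n(x)/n for a list of per-node values (degrees or shell-indices)."""
    n = len(values)
    return {x: c / n for x, c in sorted(Counter(values).items())}

def loglog_slope(p):
    """Least-squares slope of log10 p(x) against log10 x (x > 0, p(x) > 0 only)."""
    pts = [(math.log10(x), math.log10(y)) for x, y in p.items() if x > 0 and y > 0]
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    cov = sum((a - mx) * (b - my) for a, b in pts)
    var = sum((a - mx) ** 2 for a, _ in pts)
    return cov / var   # approximately -r for a power law
```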
In Fig. 11 the two curves are very close to each other, which illustrates that the HIR model has the same shell-index distribution as the Internet.
7.3.5. Comparison of the average degree of the nearest neighbors
Degree correlation is another important measure of a network. It can be described by the average degree of the nearest neighbors, $d_{nn}(d)$, of nodes of degree d, defined as follows:
Definition 4 (average degree of the nearest neighbors [33]). The average degree of the nearest neighbors $d_{nn}(d)$ of nodes of degree d is:
$$ d_{nn}(d) = \frac{1}{n_d} \sum_{j:\, d_j = d} \frac{1}{d_j} \sum_{i \in V(j)} d_i \tag{17} $$
where V(j) is the set of the $d_j$ neighbors of node j and $n_d$ is the number of nodes of degree d.
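Eq. (17) averages, over all nodes of degree d, the mean degree of their neighbors. A direct sketch on an adjacency-set representation follows; the function name and input format are illustrative.

```python
from collections import defaultdict

def avg_neighbor_degree(adj):
    """d_nn(d) from Eq. (17); adj maps node -> set of neighbors (simple undirected graph)."""
    total = defaultdict(float)   # sum over nodes j of degree d of their mean neighbor degree
    count = defaultdict(int)     # n_d: number of nodes of degree d
    for j, nbrs in adj.items():
        d = len(nbrs)
        if d == 0:
            continue             # an isolated node has no neighbors to average
        total[d] += sum(len(adj[i]) for i in nbrs) / d
        count[d] += 1
    return {d: total[d] / count[d] for d in sorted(count)}
```

On a star with three leaves this gives $d_{nn}(1) = 3$ and $d_{nn}(3) = 1$, a decreasing curve, which is the disassortative signature discussed below. NetworkX's `average_degree_connectivity` computes the same quantity for unweighted graphs.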
Table 3
The basic properties of Internet and HIR model.
Source      Nodes     Edges     dmax   ⟨d⟩     kmax   ⟨k⟩     ⟨D⟩     ac       cc
Internet    360,652   925,769   1206   5.134   25     2.806   8.864   −0.024   0.079
HIR model   360,652   958,039   1122   5.313   25     2.930   5.296   −0.294   0.053
Note: dmax = the maximum degree, ⟨d⟩ = the average degree, kmax = the highest shell-index, ⟨k⟩ = the average shell-index, ⟨D⟩ = the average path length, ac = the assortative coefficient, cc = the average clustering coefficient.
Fig. 9. The comparison of the features in k-cores of Internet and HIR model. (a) Comparison of the nodes in linear-log scale; (b) comparison of the edges in linear-log scale; (c) comparison of the average degree.
Fig. 10. The comparison of degree distribution under the log–log scale.
Fig. 11. The comparison of shell-index distribution under the log–log scale.
If $d_{nn}(d)$ increases with d, the network is assortative, which indicates that high-degree nodes preferentially connect to other high-degree nodes. If $d_{nn}(d)$ decreases with d, the network is disassortative, which corresponds to a structure in which low-degree nodes preferentially connect to high-degree nodes. The average degree of the nearest neighbors of the HIR model and the actual Internet data is shown in Fig. 12. Both decrease with degree d, which means that both are disassortative networks.
7.3.6. Comparison of local clustering coefficient
Another frequently studied quantity is the clustering coefficient [34], which measures local group cohesiveness. For any node j, it is defined as the fraction of pairs of neighbors of j that are connected to each other:
$$ cc_j = \frac{2E_j}{d_j(d_j - 1)} \tag{18} $$
where $E_j$ is the number of edges between the $d_j$ neighbors of j. The clustering of the network can be characterized by the local clustering coefficient of the nodes of degree d, cc(d), defined as
$$ cc(d) = \frac{1}{n_d} \sum_{j:\, d_j = d} cc_j \tag{19} $$
Fig. 13. The comparison of local cluster-coefficient.
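Eqs. (18) and (19) in code, as a sketch with illustrative names. Nodes with fewer than two neighbors are skipped because Eq. (18) divides by zero for them. Some authors instead assign them $cc_j = 0$, which would change the average clustering coefficient but not cc(d) for $d \ge 2$.

```python
from collections import defaultdict
from itertools import combinations

def local_clustering(adj):
    """cc_j from Eq. (18) for every node with degree >= 2."""
    cc = {}
    for j, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue                                   # Eq. (18) is undefined for d_j < 2
        e_j = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])   # E_j: links among neighbors
        cc[j] = 2 * e_j / (d * (d - 1))
    return cc

def clustering_by_degree(adj):
    """cc(d) from Eq. (19): mean of cc_j over the nodes of degree d."""
    sums, counts = defaultdict(float), defaultdict(int)
    for j, c in local_clustering(adj).items():
        d = len(adj[j])
        sums[d] += c
        counts[d] += 1
    return {d: sums[d] / counts[d] for d in sorted(counts)}
```

For a triangle with one pendant node attached to a corner, the two degree-2 corners have $cc = 1$ and the degree-3 corner has $cc = 1/3$. NetworkX's `clustering` returns the per-node values of Eq. (18).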
The local clustering coefficients of the HIR model and of the actual Internet data are shown in Fig. 13; the two curves are in close agreement.
7.3.7. Comparison of rich-club coefficient
Zhou and Mondragón [35] discovered another hierarchical structure of the Internet, called the rich-club phenomenon, which describes the fact that high-degree ('rich') nodes are tightly interconnected with other rich nodes, forming a core group or club. Subgraphs formed by richer nodes are progressively more interconnected. The interconnectivity among the members of a rich club is measured by the rich-club connectivity. The rank r of a node denotes its position in the list of nodes sorted by non-increasing degree, i.e. r = 1, ..., N, where N is the total number of nodes in the network. If rich-club membership is defined as 'the r best-connected nodes', then the rich-club connectivity Φ(r) is defined as the ratio of the actual number of edges among the rich-club members to the maximum possible number of such edges, r(r − 1)/2. Rich-club connectivity indicates how well club members 'know' each other; e.g. Φ = 1 means that every member has a direct link to every other member, i.e. they form a fully connected mesh, a clique. Fig. 14 illustrates the rich-club connectivity of the HIR model and of the actual Internet data as a function of increasing rich-club membership, measured as the node rank normalised by the number of nodes. The two curves are close to each other.
Fig. 14. The comparison of rich-club coefficient.
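The rich-club connectivity can be computed incrementally. Nodes join the club in rank order, and each new member adds one club edge for every neighbor already in the club. The Python sketch below illustrates this. It is an assumption-laden illustration rather than the authors' code: the graph is assumed to be a dictionary of neighbor sets, the function name `rich_club_connectivity` is hypothetical, and ties in degree are broken by dictionary order, which is an arbitrary choice. Dividing r by N gives the normalised rank used on the horizontal axis of Fig. 14.

```python
def rich_club_connectivity(adj):
    """Return {r: phi(r)} for r = 2..N, where phi(r) is the fraction of the
    r(r-1)/2 possible links that exist among the r highest-degree nodes.

    adj (assumed representation): dict mapping each node to the set of
    its neighbours (undirected). Degree ties are broken by dictionary
    order, which is an arbitrary choice.
    """
    # Rank nodes by non-increasing degree: rank 1 is the best-connected node.
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    members = set()
    links = 0  # number of edges among the current club members
    phi = {}
    for r, v in enumerate(order, start=1):
        # New member v adds one club edge per neighbour already in the club.
        links += sum(1 for u in adj[v] if u in members)
        members.add(v)
        if r >= 2:  # phi(1) is undefined (no possible pairs)
            phi[r] = 2 * links / (r * (r - 1))
    return phi


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
phi = rich_club_connectivity(adj)
n = len(adj)
for r, value in phi.items():
    print(f"r/N = {r / n:.2f}  phi = {value:.3f}")
```

Because each node is added once and only its own adjacency set is scanned, the sweep over all r costs time proportional to the number of edges after the initial sort.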
Fig. 12. The comparison of the average degree of the nearest neighbors.
7.4. Discussion
From the above analysis, we can see that the HIR model reproduces the Internet Router-level topology in degree distribution, shell-index distribution, degree correlation, clustering property, rich-club connectivity, etc. Moreover, the properties within each core of the HIR model are also close to those of the Internet, which illustrates the advantages of the HIR model in simulating the Internet Router-level topology. The HIR model can fully reflect the hierarchical characteristics of the Internet topology while preserving the power-law distribution of node degree.
Fig. 15. The changes in the number of nodes and in the average path length of the largest connected graph after the attack.
It is worth noting that the hierarchy of the HIR model differs from that of the GT-ITM model. The two hierarchies have different origins and motivations. On the one hand, the GT-ITM classification is based on the inference of AS commercial relationships; on the other hand, and from a somewhat opposite point of view, the k-core decomposition classifies the network's nodes without an a priori fixed number of classes or levels, adapting itself to the situation of the network. Moreover, the shell index of a node is not fixed once and for all, but may fluctuate over time owing to connectivity changes. In this respect, such a hierarchy provides very relevant information about the state of the network at a given time.
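Because the shell index depends only on the current connectivity, a snapshot of the hierarchy at a given time is obtained by re-running the k-core decomposition on that snapshot. A minimal Python sketch of the standard peeling procedure follows: repeatedly remove nodes whose residual degree is at most k, and raise k when none remain. It is an illustrative, unoptimised version rather than a linear-time bucket-based algorithm. The function name `shell_index` and the dictionary-of-neighbor-sets input are assumptions.

```python
def shell_index(adj):
    """Return {node: shell index} via k-core peeling.

    The shell index of v is the largest k such that v belongs to the k-core,
    the maximal subgraph in which every node has degree >= k.
    adj (assumed representation): dict mapping each node to the set of
    its neighbours (undirected).
    """
    residual = {v: len(nbrs) for v, nbrs in adj.items()}  # degree in the remaining graph
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # Move to the next non-empty shell.
        k = max(k, min(residual[v] for v in remaining))
        stack = [v for v in remaining if residual[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already peeled via another neighbour
            remaining.discard(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    residual[u] -= 1
                    if residual[u] <= k:
                        stack.append(u)  # u now falls into the current shell
    return shell


# Triangle 0-1-2 with pendant node 3: the triangle is the 2-core.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(shell_index(adj))
```

Re-running the function after edges are added or removed shows how a node's shell index can change over time, which is the adaptive behaviour contrasted here with the fixed GT-ITM levels.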
8. Application of the model
Modeling the Internet Router-level topology has many applications, such as the analysis of network traffic flow, the evaluation of routing protocols, the analysis of security threats posed by viruses, worms, spyware, and spam, and the assessment of tolerance to deliberate attacks. Most importantly, it can provide guidance for the design of the next generation Internet. From the above analysis, we find that the nodes in the highest core play a crucial role in the whole network. Does attacking them, then, affect the robustness of the network? To answer this question, we performed an experiment on the HIR model. Suppose that some nodes in the highest core are attacked and fail; their failure affects their neighbors and may cause some of those neighbors to fail, and so on. Eventually, many nodes may fail, which affects the connectivity of the whole network. We measure the robustness of the network by the number of valid nodes and the average path length of the largest connected graph of the remaining network. The algorithm for simulating an attack on one node in the highest core is as follows:
1. Choose a node in the highest core at random as the attacked node;
2. Remove it and all the edges attached to it from the network;
3. Construct a list to store its nearest neighbor nodes;
4. Choose a neighbor node from the list and reduce its degree by 1;
5. If the neighbor node's degree equals 0, remove it from the network;
6. Repeat steps 3–5 until all the neighbor nodes have been treated;
7. Pick the largest connected graph from the remaining network.
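The attack procedure above, together with the measurement of the largest connected graph, can be sketched in Python as follows. This is a hedged illustration, not the authors' code. The graph is assumed to be a dictionary of neighbor sets, the set of highest-core nodes is assumed to be supplied by the caller, and all function names (`attack`, `largest_component`, `average_path_length`, `simulate`) are hypothetical. Decrementing a neighbor's degree (step 4) is done implicitly by deleting the edge. The average path length is computed exactly by breadth-first search from every node of the largest component. This costs O(n(n + m)) and would likely need sampling on a full router-level map.

```python
import random
from collections import deque


def attack(adj, targets):
    """Apply the attack steps to each target node; return the damaged graph.

    adj (assumed representation): dict mapping each node to the set of its
    neighbours (undirected). The input is not modified; a copy is attacked.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    for v in targets:
        if v not in g:
            continue  # already removed as an isolated neighbour
        neighbours = g.pop(v)        # step 2: remove v and its edges
        for u in neighbours:         # steps 3-6: treat each listed neighbour
            g[u].discard(v)          # step 4: degree of u drops by 1
            if not g[u]:             # step 5: degree 0, so remove u
                del g[u]
    return g


def largest_component(g):
    """Step 7: node set of the largest connected component (BFS)."""
    seen, best = set(), set()
    for s in g:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            for u in g[queue.popleft()]:
                if u not in comp:
                    comp.add(u)
                    queue.append(u)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def average_path_length(g, comp):
    """Mean shortest-path length over all ordered node pairs in comp."""
    total, pairs = 0, 0
    for s in comp:
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            for u in g[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def simulate(adj, core_nodes, fraction, seed=0):
    """Attack a random fraction of the highest-core nodes (hypothetical
    helper); return (size, average path length) of the largest component."""
    rng = random.Random(seed)
    k = min(len(core_nodes), max(1, round(fraction * len(core_nodes))))
    targets = rng.sample(sorted(core_nodes), k)
    g = attack(adj, targets)
    comp = largest_component(g)
    return len(comp), average_path_length(g, comp)


# Triangle 0-1-2 (the highest core) with a tail 2-3-4-5.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
for p in (0.01, 0.5, 1.0):
    print(p, simulate(adj, {0, 1, 2}, p, seed=1))
```

Sweeping `fraction` over 0.01, 0.1, 0.2, ..., 1.0 corresponds to the kind of experiment summarized in Fig. 15. Averaging over several seeds would reduce the dependence on which core nodes happen to be sampled.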
Following this algorithm, we obtained the largest connected graph of the remaining network when 1%, 10%, 20%, 30%, ..., 100% of the nodes in the highest core are attacked, and computed its number of nodes and its average path length. The results are shown in Fig. 15. As the percentage of attacked nodes in the highest core increases, the number of nodes in the remaining largest connected graph decreases slowly and the average path length increases slowly. However, these changes are very small compared with the original network, so attacking the nodes in the highest core does not noticeably affect the robustness of the network. This is because, although the nodes in the highest core are at the center of the network, their degrees are not large compared with those of other nodes: for example, the largest degree in the network we analyzed is 1206, whereas the largest degree in the highest core is only 254. More generally, attacks can be simulated on the model to find fragile nodes and protect them specifically, in order to enhance the security of the network.
9. Conclusion
Recent studies show that the Internet Router-level topology has an evident hierarchy. To better describe the properties of the Internet, a model that reflects the hierarchy of the Internet topology is needed. Up to now, few models can reflect the hierarchy of the Internet Router-level topology. The PFP model, which was originally designed to match only two properties of the Internet AS-level topology (the exact form of the degree distribution and the rich-club connectivity), cannot reflect the hierarchy of the network. In traditional hierarchical models such as the GT-ITM model, the classification is based on the inference of AS commercial relationships rather than on the hierarchy of the Internet topology.
We therefore proposed the HIR model, which is based on the k-core decomposition and can fully reflect the hierarchical structure of the Internet Router-level topology. It can be applied to simulation experiments and can provide guidance for the design of the next generation Internet.
References
[1] K. Calvert, M. Doar, E. Zegura, Modeling Internet topology, IEEE Communications Magazine 35 (1997) 160–163.
[2] Y. Zhang, H.L. Zhang, B.X. Fang, A survey on Internet topology modeling, Journal of Software 15 (8) (2004) 1220–1226.
[3] B.M. Waxman, Routing of multipoint connections, IEEE Journal on Selected Areas in Communications (Special Issue: Broadband Packet Communications) 6 (9) (1988) 1617–1622.
[4] E. Zegura, K. Calvert, S. Bhattacharjee, How to model an internetwork, in: Proceedings of INFOCOM, 1996.
[5] Traceroute.org, Public route server and looking glass site list.
[6] National Laboratory for Applied Network Research, Route Views archive.
[7] Cooperative Association for Internet Data Analysis, CAIDA.
[8] M. Faloutsos, P. Faloutsos, C. Faloutsos, On power-law relationships of the Internet topology, in: Proceedings of ACM SIGCOMM '99, 1999, pp. 251–262.
[9] A. Medina, A. Lakhina, I. Matta, J. Byers, BRITE: an approach to universal topology generation, in: Proceedings of MASCOTS, Washington, 2001, pp. 346–353.
[10] J. Winick, S. Jamin, Inet-3.0: Internet topology generator, Technical Report CSE-TR-456-02, University of Michigan, Ann Arbor, 2002.
[11] A.L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (1999) 509–512.
[12] R. Albert, A.L. Barabási, Topology of evolving networks: local events and universality, Physical Review Letters 85 (24) (2000) 5234.
[13] T. Bu, D. Towsley, On distinguishing between Internet power-law topology generators, in: Proceedings of IEEE INFOCOM 2002, vol. 2, IEEE, New York, 2002, pp. 638–647.
[14] S. Zhou, R.J. Mondragon, Accurately modeling the Internet topology, Physical Review E 70 (2004) 066108.
[15] S. Zhou, R.J. Mondragon, Towards modeling the Internet topology – the interactive growth model, Teletraffic Science and Engineering 5 (2003) 121–130.
[16] S.T. Park, D.M. Pennock, C.L. Giles, Comparing static and dynamic measurements and models of the Internet's topology, in: Proceedings of the 23rd Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 3, 2004, pp. 1616–1627.
[17] S. Bar, M. Gonen, A. Wool, An incremental super-linear preferential Internet topology model, in: Proceedings of the 5th Annual Passive and Active Measurement Workshop, LNCS, vol. 3015, 2004, pp. 53–62.
[18] S. Bar, M. Gonen, A. Wool, A geographic directed preferential Internet topology model, 2005, arXiv:cs.NI/0502061.
[19] G. Chen, Z.P. Fan, X. Li, Modeling the complex Internet topology, in: G. Vattay, L. Kocarev (Eds.), Complex Dynamics in Communication Networks, Springer-Verlag, Berlin, 2005, pp. 213–234.
[20] D. Magoni, Nem: a software for network topology analysis and modeling, in: Proceedings of MASCOTS'02, 2002, pp. 364–371.
[21] D. Magoni, J.J. Pansiot, Evaluation of Internet topology generators by power law and distance indicators, in: Proceedings of ICON'02, 2002, pp. 401–406.
[22] D. Magoni, J.J. Pansiot, Internet topology modeler based on map sampling, in: Proceedings of ISCC'02, 2002, pp. 1021–1027.
[23] S.B. Seidman, Network structure and minimum degree, Social Networks 5 (1983) 269–287.
[24] B. Bollobás, The evolution of sparse graphs, in: B. Bollobás (Ed.), Graph Theory and Combinatorics, Proceedings of the Cambridge Combinatorial Conference in honour of Paul Erdős, Academic Press, London, 1984, pp. 35–37.
[25] A.V. Goltsev, S.N. Dorogovtsev, J.F.F. Mendes, k-core (bootstrap) percolation on complex networks: critical phenomena and nonlocal effects, Physical Review E 73 (2006) 056101.
[26] J.I. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, A. Vespignani, k-core decomposition of Internet graphs: hierarchies, self-similarity and measurement biases, Networks and Heterogeneous Media 3 (2008) 371–393.
[27] M. Baur, U. Brandes, M. Gaertler, D. Wagner, Drawing the AS graph in 2.5 dimensions, in: Proceedings of the 12th International Symposium on Graph Drawing, Springer-Verlag, 2004, pp. 43–48.
[28] J.I. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, A. Vespignani, k-core decomposition: a tool for the visualization of large scale networks, 2005, arXiv preprint cs.NI/0504107.
[29] J.I. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, A. Vespignani, Large scale networks fingerprinting and visualization using the k-core decomposition, in: Y. Weiss, B. Schölkopf, J. Platt (Eds.), Advances in Neural Information Processing Systems, vol. 18, MIT Press, Cambridge, MA, 2006, pp. 41–50.
[30] V. Batagelj, M. Zaversnik, Generalized cores, 2002, arXiv preprint cs/0202039, pp. 1–3.
[31] B. Bollobás, A. Thomason, Random graphs of small order, in: M. Karonski, A. Rucinski (Eds.), Random Graphs '83, Annals of Discrete Mathematics, vol. 28, 1985, pp. 47–97.
[32] Macroscopic Topology Measurements, CAIDA.
[33] R. Pastor-Satorras, A. Vázquez, A. Vespignani, Dynamical and correlation properties of the Internet, Physical Review Letters 87 (2001) 258701.
[34] D.J. Watts, S.H. Strogatz, Collective dynamics of 'small-world' networks, Nature 393 (1998) 440–442.
[35] S. Zhou, R.J. Mondragón, The rich-club phenomenon in the Internet topology, IEEE Communications Letters 8 (3) (2004) 180–182.