Computer Networks 48 (2005) 605–625 www.elsevier.com/locate/comnet
Edge-based traffic engineering for OSPF networks
Jun Wang *, Yaling Yang, Li Xiao, Klara Nahrstedt
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801-2302, United States
Received 22 February 2004; received in revised form 12 October 2004; accepted 1 November 2004
Available online 22 December 2004
Responsible Editor: I. Matta
Abstract

This paper proposes and evaluates a novel, edge-based approach, which we call the k-set Traffic Engineering (TE) method, to perform traffic engineering in OSPF networks by partitioning traffic into k uneven traffic sets. The traffic partitioning and splitting take place only at network edges, leaving the core simple. We prove theoretically that if k is large enough, the k-set TE method achieves the general optimal traffic engineering for which full-mesh overlaying and arbitrary traffic splitting, as in MPLS, would otherwise have to be used. We give an upper bound on the smallest k that achieves such a general optimum. In addition, we provide a constant worst-case performance bound when k is smaller than the optimal k. Finding the optimal traffic splitting and routing for a given k is NP-hard; we therefore present a heuristic algorithm to handle the problem. The performance of the k-set TE method together with the proposed heuristic algorithm is evaluated by simulation. The results confirm that a fairly small k (2 or 4) can achieve good near-optimal traffic engineering. Overall, the k-set TE method provides a simple and efficient solution for achieving load balancing in OSPF networks. It follows the ‘‘smart edge, simple core’’ design rule of the Internet. It is also able to keep ‘‘the same path for the same flow,’’ which is desirable and beneficial to TCP applications.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Traffic engineering; OSPF; Edge-based; Traffic set; Mathematical programming/optimization
This work was supported by NSF Grant under contract number NSF ANI 00-73802 and NSF CISE Grant under contract number NSF EIA 99-72884. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
* Corresponding author. E-mail addresses:
[email protected] (J. Wang),
[email protected] (Y. Yang),
[email protected] (L. Xiao),
[email protected] (K. Nahrstedt).
1. Introduction

Traffic engineering is essential for today's Internet Service Providers (ISPs) because of the rapid growth of the network and the increasing demands coming from end users and new applications. The major task of traffic engineering is to find appropriate routing and traffic allocation schemes
doi:10.1016/j.comnet.2004.11.008
for given physical networks and user traffic demands, so that the traffic load is balanced and the overall network performance is optimized. One way to enable traffic engineering is to deploy new flow-based, connection-oriented protocols, such as the Multi-Protocol Label Switching (MPLS) protocol, where traffic engineering is easy to implement. However, the destination-based hop-by-hop routing protocol, such as the Open Shortest Path First (OSPF) protocol [1], is still the most commonly used intra-domain routing protocol in today's Internet. On one hand, it is simple, robust, and highly scalable. On the other hand, it is believed that OSPF may lead to congestion, and hence suffer from bad performance, if traffic engineering is absent. Therefore, traffic engineering in OSPF networks is extremely important, and a good traffic engineering solution on top of OSPF can both improve network performance and leverage the widespread deployment of OSPF. Some of the latest research results show that the Equal Cost Multi-Path (ECMP) feature 1 in OSPF can help to achieve the traffic engineering purpose. Ideally, if we assume that traffic can be split arbitrarily, then by properly setting link weights [2], we can achieve load balancing in OSPF that is comparable to MPLS networks [3]. However, in reality, ECMP supports only even traffic splitting, which is not enough to approximate the general optimal result as in MPLS. Therefore, Sridharan et al. proposed an enhanced method in [4], where only a subset of the next hops of the equal-cost paths between a source and a destination is used for distributing traffic. By carefully choosing the next-hop subsets, this method mimics uneven traffic splitting and better approximates the general optimal result. Some other methods, such as [5], try to achieve better traffic engineering by combining the OSPF ECMP and MPLS techniques. We will summarize and compare the existing traffic engineering approaches in detail in Section 2.
1 The ECMP feature allows traffic to be distributed equally among multiple next hops of the equal-cost paths between a source and a destination [1].
All existing approaches for OSPF traffic engineering are based on ECMP and link weight manipulation. Besides the even-splitting constraint, another significant disadvantage of ECMP is that packets from a source to a destination may no longer travel along the same path, even if they belong to a single TCP flow. Since multiple paths may have very different delays and jitters (to obtain multiple equal-cost paths and thus trigger ECMP, we may not be able to take delays into account when setting link weights), TCP flows may suffer from bad performance due to out-of-order packet delivery. Furthermore, some routers in a network may not have the ECMP feature turned on. Therefore, a big challenge is to achieve better traffic engineering in OSPF networks without the use of ECMP. We solve this challenge in this paper by pushing traffic engineering decisions to the network edge. Our approach is inspired by the Differentiated Services (DiffServ) scheme [6,7]. DiffServ was originally proposed as a cost-effective and scalable solution for providing better end-to-end QoS to applications. The basic idea of DiffServ fits the ‘‘smart edge, simple core’’ design rule of the Internet very well; i.e., push complex operations (such as packet classification, profiling, and advanced weighted scheduling) to the network edge and keep the core simple. On one hand, because edge nodes have detailed flow information, they can perform finer-grained control on packets and partition traffic along flow boundaries. On the other hand, keeping the network core simple lets the system scale well. Apart from the original QoS goal of DiffServ, our goal is traffic load balancing. Our approach is called the k-set edge-based traffic engineering method (k-set TE method, or simply k-set method for short). We intentionally use the ‘‘traffic set’’ terminology instead of ‘‘traffic class’’ to differentiate our traffic partition from the service-class concept in the QoS domain. In this paper, we define a traffic set as a fraction of the traffic that is split off at the network edge for traffic engineering and load balancing purposes. As we will see later, the packets in one traffic set, flowing from the source to the destination, follow one simple path, and no further splitting is allowed along the path. This means that our
approach partitions traffic into k uneven traffic sets according to certain ratios 2 and routes these k traffic sets in proper ways to balance the traffic load in the entire network. Since the traffic splitting is pushed to the edge, where flow information is available, we can perform uneven traffic splitting to get better near-optimal results, while strictly following the ‘‘simple core’’ rule of the Internet. More specifically, to implement the route differentiation and traffic splitting among multiple traffic sets, we take advantage of the Type of Service (ToS) field [8] or the Differentiated Services (DS) field [6,9] in IP headers and the QoS routing table extension in OSPF [2], where different next hops for different traffic sets can be recorded. Because our approach can be incorporated into existing protocols or their extensions, no substantial change is needed to deploy it, except for the packet classification and traffic allocation at network edges. In this paper, we also prove that for any given network there always exists a large enough k that achieves the general optimum of traffic engineering. 3 Moreover, this optimal k is less than or equal to the total number of links in the given network. If k is fixed and smaller than the optimal k, then how to partition and route traffic to achieve the optimal result under the k-set constraint (i.e., splitting only at edges and hop-by-hop routing for each traffic set) constitutes the k-set TE problem, which is NP-hard. Note that this k-set optimum is not the general optimum, but the optimum under the k-set constraint. We give a theoretical upper bound on the gap between the k-set optimum and the general optimum. Experimental results show that the k-set optimum is an excellent approximation to the general optimum
2 The traffic partition is packet-wise. However, since we perform the partition only at network edges, we can enforce the partition ratios and follow flow boundaries at the same time.
3 In this paper, we define the general optimum of traffic engineering to be the best load allocation we can get by using the optimal general routing or explicit flow-based routing as in MPLS. In this case, each flow is optimally distributed over all paths between a source and a destination [3,4,10]. The general optimum is equivalent to the optimum obtained by using shortest paths with arbitrary traffic splitting [10]. Furthermore, it can be obtained by solving linear programs (see details in Section 3.3).
even with a very small k. We also present a fast heuristic solution that approximates the k-set optimum based on Dijkstra's algorithm and linear programming. Performance is evaluated via simulations. The results show that our k-set TE method approximates the general optimum very effectively even when k = 2 or 4, making our approach simple, efficient, and practical. The rest of the paper is organized as follows. Section 2 summarizes and compares existing traffic engineering research, followed by an introduction to some fundamentals of traffic engineering in Section 3. Section 4 gives an overview of our k-set TE method. Then, in Section 5, we formulate the k-set traffic splitting and allocation problem using mixed integer programming and propose our heuristic solution to the problem. Performance issues are investigated in Section 6 by simulation. Finally, Section 7 concludes this paper.
2. Related work

As mentioned in Section 1, traffic engineering is of great importance in the Internet. Existing traffic engineering techniques can be roughly classified into two categories in terms of the different underlying routing paradigms they assume. One category is based on flow-based explicit routing (also called constraint-based routing). Traffic engineering solutions in this category mainly focus on flow-based networks, such as MPLS networks [11–13]. The other category is based on destination-based hop-by-hop routing. Approaches in this category assume that the underlying networks use the traditional hop-by-hop forwarding paradigm, such as OSPF networks [1]. The major difference between these two categories is the traffic engineering granularity. Methods in the first category aim at finding optimal routing and traffic allocation schemes for individual flows. Because each flow can explicitly select its own paths, finer control granularity and better results (with respect to balancing traffic load) can be obtained. However, as argued in [3,4,14], such methods have significant limitations because of the disadvantages of the flow-based protocols (e.g., MPLS) on which they rely. First, MPLS is not
yet widely deployed. Second, MPLS is more complex and less robust (than OSPF). Third, MPLS is less scalable due to the large size of its routing tables and state information. The approaches of the second category do not assume the existence of MPLS. They rely on the most widely used intra-domain routing protocols in today's Internet, such as OSPF or IS-IS. Forwarding decisions are based solely on the destination addresses in packets' IP headers, and routing table construction is based on Dijkstra's algorithm. In fact, if we look at each destination node and there is no traffic splitting, the routes from all source nodes to the destination form a spanning tree rooted at the destination [15,16]. The existing approaches, as well as our k-set TE method, are compared in Table 1 with respect to different criteria. In the table, ‘‘Ar-sp,’’ ‘‘Un-sp,’’ ‘‘Ev-sp,’’ ‘‘Ev-sp, ext.,’’ and ‘‘Lmt-sp’’ stand for Arbitrarily splittable, Un-splittable, Evenly splittable, Evenly splittable with subset-selecting extension, and Limited splittable, respectively. The flow-based approaches have more precise control and can achieve the general optimum of traffic engineering, but their scalability is low because of the ‘‘N-square’’ problem [10]. Some other disadvantages of the flow-based approaches have already been discussed above. In the destination-based category, the arbitrarily splittable approach, which is equivalent to the general optimal method in the flow-based category, enjoys the finest control,
but there is no existing protocol that supports it. The evenly splittable approach with subset-selecting extension, proposed recently by Sridharan et al. in [4], has fine-grained control and can approximate the arbitrarily splittable approach's performance. However, it has to explicitly change every router's routing table to rule out some feasible next hops for each destination, thus introducing ‘‘hand-crafting’’ overhead. Moreover, since traffic may split at any location in the network, packets from the same flow may follow very different paths, leading to out-of-order delivery problems that deteriorate the performance of TCP flows. The original evenly splittable approach suffers from the same problem. The unsplittable approach does not have this problem, but it cannot achieve good traffic engineering results because its traffic control is too coarse. Compared with all these existing approaches, our k-set method has, in general, the following advantages: (1) Its traffic control granularity is tunable by changing the number of traffic sets it uses. (2) It does not require any significant changes to existing protocols, so it is practical and applicable. (3) Due to edge-only traffic splitting, it can force all packets of a flow to follow the same path, thereby eliminating the TCP performance deterioration problem. (4) Finally, it follows the ‘‘smart edge, simple core’’ design rule of the Internet and therefore retains the scalability and stability of the current Internet.
Table 1
Comparison between different traffic engineering approaches (including our approach)

| Criterion | Flow-based: Ar-sp | Flow-based: Un-sp | Dest-based: Ar-sp | Dest-based: Ev-sp, ext. | Dest-based: Ev-sp | Dest-based: Un-sp | Traffic-set based: Lmt-sp |
|---|---|---|---|---|---|---|---|
| Control granularity | Finest | Medium | Fine | Fine | Medium | Coarse | Tunable |
| Applicability (protocol support) | Yes (MPLS) | Yes (MPLS) | No | Yes (OSPF) | Yes (OSPF) | Yes (OSPF) | Yes (OSPF) |
| Scalability | Low | Low | Medium | High | High | High | High |
| Traffic from same flow follows same path? | No | Yes | No | No | No | Yes | Yes |
| Splitting location (edge or core nodes) | Any | None | Any | Any | Any | None | Edge only |
| Routing complexity | P | NPC | P | NPC | NPC | P/NPC | NPC |
| Reference(s) | [11] | [11] | [10] | [4] | [3,4,10] | [1,2,15,17] | This paper |

(The first two columns are flow-based explicit routing, the next four are destination-based hop-by-hop forwarding, and the last column is traffic-set based hop-by-hop forwarding.)
3. Traffic engineering in OSPF networks: fundamentals
In this section, a network model will be presented. Based on the network model, some fundamentals of traffic engineering (TE) will be introduced, including the commonly used cost and objective functions for quantitatively evaluating and comparing different TE methods [3,4]. We will also define the general optimum of traffic engineering and introduce a method to obtain it based on linear programming. The general optimum will be used as a benchmark later, when our k-set TE method is evaluated.
3.1. Network model

A network is defined as a directed graph G = (V, E), where V is the node set and E is the link set. V_e ⊆ V is the edge node set, which contains all edge nodes 4 in the network. All other nodes are core nodes. We use (x, y) to denote a link from node x to node y. For any link (x, y) ∈ E, c(x, y) represents its capacity and f(x, y) represents the amount of traffic load that actually goes through the link. The utilization 5 of link (x, y), denoted by u(x, y), is then defined as u(x, y) = f(x, y)/c(x, y). Furthermore, we use D to denote the traffic demand matrix among edge nodes. For any x, y ∈ V, D(x, y) = d_xy gives the traffic demand
4 Edge nodes are the nodes generating and absorbing traffic. Edge nodes have better information about individual flows; therefore, they are capable of performing fine-grained control on traffic.
5 In some papers, link utilization is also called the relative congestion of that link. Note that here link utilization can be larger than 1, because the amount of traffic demand placed onto a link can exceed the link's capacity. When a link's utilization is larger than 1, the link is overloaded or congested, and some portion of the traffic on this link will be dropped. How to handle such congestion is the task of congestion control or queue management, and is out of the scope of this paper. Our load balancing goal is to minimize the possibility that some links are overloaded due to unbalanced traffic allocation/routing in the network. Therefore, when we pre-compute the traffic allocation during the traffic engineering procedure, if a link's utilization is close to or exceeds 1, we assign a very large cost/weight to that link to discourage its further use.
Fig. 1. Link cost φ(u(e)) as a function of link utilization u(e).
from x to y. If x and y are not both edge nodes, i.e., x ∉ V_e or y ∉ V_e, then d_xy = 0. We assume that D is given. 6

3.2. Cost function and optimization objective

The cost function and the optimization objective can be defined in different ways, independent of the specific traffic engineering technique. For comparability, we use the commonly used objective and cost functions of [3,4]. For any link e ∈ E, suppose its capacity is c(e) and the total traffic load on it is f(e); then u(e) = f(e)/c(e) is the utilization of link e. The objective function and the piece-wise linear cost function are defined as follows:

\[
\Phi = \sum_{e \in E} \phi(u(e)),
\]

where, for all e ∈ E, φ(0) = 0 and
\[
\phi'(x) =
\begin{cases}
1 & \text{for } 0 \le x < 1/3,\\
3 & \text{for } 1/3 \le x < 2/3,\\
10 & \text{for } 2/3 \le x < 9/10,\\
70 & \text{for } 9/10 \le x < 1,\\
500 & \text{for } 1 \le x < 11/10,\\
5000 & \text{for } 11/10 \le x < \infty.
\end{cases}
\tag{1}
\]
6 How to get such traffic matrices is out of our research scope in this paper. Good references are [18–20].
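Eq. (1) specifies only the derivative φ'(x); the cost φ(u) itself is the convex, piece-wise linear function with φ(0) = 0, which can equivalently be written as the maximum of the six affine pieces that reappear as constraints (3) in Section 3.3. The following short Python sketch (our illustration, not part of the paper) evaluates φ and the total cost Φ:

```python
# Illustrative sketch (not from the paper): the per-link cost phi(u) of Eq. (1),
# written as the maximum of its six affine pieces, and the network cost Phi.

def phi(u):
    """Per-link cost as a function of link utilization u."""
    return max(u,
               3 * u - 2 / 3,
               10 * u - 16 / 3,
               70 * u - 178 / 3,
               500 * u - 1468 / 3,
               5000 * u - 16318 / 3)

def total_cost(loads, capacities):
    """Network cost Phi = sum over links of phi(f(e)/c(e))."""
    return sum(phi(f / c) for f, c in zip(loads, capacities))

print(round(phi(1 / 3), 4), round(phi(2 / 3), 4))   # 0.3333, 1.3333
print(total_cost([30, 95, 120], [100, 100, 100]))   # one link lightly loaded, one near 1, one overloaded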
To better illustrate how the cost function evolves with link utilization, the curve is shown in Fig. 1. The basic idea is that the higher the link utilization, the more expensive the link becomes. We impose very heavy penalties on a link when its utilization approaches 1.0, to prevent the link from being overloaded. For our optimization objective, we sum up the costs of all links and then minimize this sum. Note that different optimal results may exist if we use different traffic engineering techniques. For example, if we use flow-based traffic engineering (such as in MPLS), we can achieve the best result, because the flow-based technique has the finest control and the fewest constraints in terms of how traffic is split and routed.

3.3. General optimum with arbitrary traffic splitting

If we are allowed to split and allocate traffic freely among any paths between sources and destinations, without any additional constraints, we can obtain the best traffic engineering solution, which is called the general optimum (OPT for short) [3,4]. The total cost under the general optimum solution is denoted by Φ*. Note that the general optimum will be the target that we want to approach and compare against when the k-set constraints are imposed later.

Given a network G = (V, E), the edge node set V_e, the traffic demand matrix D = {d_it}, and the cost and objective functions defined above in Eq. (1), we can obtain the general optimum (Φ*) by solving the following linear program. Note that although the general optimum can be obtained theoretically in polynomial time, in reality it is not achievable in OSPF networks even with ECMP, because traffic in such networks can NOT be split and allocated freely.

Find variables f_ijt which satisfy

\[
\Phi^* = \min \sum_{(i,j) \in E} \phi_{ij}
\]

subject to

\[
\begin{aligned}
&\sum_{j=1}^{n} f_{ijt} - \sum_{j=1}^{n} f_{jit} = d_{it}, && i \in V,\ t \in V_e,\ i \ne t,\\
&u_{ij} = \sum_{t \in V_e} f_{ijt} / c_{ij}, && (i,j) \in E,\\
&f_{ijt} \ge 0, && (i,j) \in E,\ t \in V_e,
\end{aligned}
\tag{2}
\]

and, for every link (i, j) ∈ E,

\[
\begin{aligned}
&\phi_{ij} \ge u_{ij}, \qquad
\phi_{ij} \ge 3u_{ij} - \tfrac{2}{3}, \qquad
\phi_{ij} \ge 10u_{ij} - \tfrac{16}{3},\\
&\phi_{ij} \ge 70u_{ij} - \tfrac{178}{3}, \qquad
\phi_{ij} \ge 500u_{ij} - \tfrac{1468}{3}, \qquad
\phi_{ij} \ge 5000u_{ij} - \tfrac{16318}{3}.
\end{aligned}
\tag{3}
\]
Note that the constraints in (3) define the cost on each link; they are an implementation of the cost function defined in Eq. (1). Solving this linear program gives a traffic allocation {f_ijt} on each link (i, j) such that the objective function is minimized. We can also obtain the corresponding link weights for constructing shortest-path routing by solving the dual of the linear program [10]. However, here we are only interested in the optimal cost Φ* for later comparison with our k-set TE method.
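For concreteness, here is a minimal sketch of the linear program (2)–(3) on a tiny assumed topology, written with the open-source PuLP modeling library (the paper itself solves its programs with ILOG CPLEX). The node names, capacities, and demand values below are ours, purely for illustration:

```python
# Sketch (our illustration): the general-optimum LP of Eqs. (2)-(3) on a toy network.
import pulp

links = {("s", "a"): 10, ("a", "t"): 10, ("s", "t"): 10}   # (i, j) -> c_ij (assumed)
nodes = {"s", "a", "t"}
edge_nodes = {"s", "t"}
demand = {("s", "t"): 12.0}                                 # d_it, zero if omitted

prob = pulp.LpProblem("general_optimum", pulp.LpMinimize)

# f[i, j, t]: traffic on link (i, j) destined to t; cost[i, j]: per-link cost variable phi_ij.
f = {(i, j, t): pulp.LpVariable(f"f_{i}_{j}_{t}", lowBound=0)
     for (i, j) in links for t in edge_nodes}
cost = {(i, j): pulp.LpVariable(f"phi_{i}_{j}", lowBound=0) for (i, j) in links}

prob += pulp.lpSum(cost.values())                           # objective: sum of phi_ij

# Flow conservation for every node i and destination t (i != t).
for t in edge_nodes:
    for i in nodes:
        if i == t:
            continue
        outflow = pulp.lpSum(f[i, j, t] for (x, j) in links if x == i)
        inflow = pulp.lpSum(f[j, i, t] for (j, x) in links if x == i)
        prob += outflow - inflow == demand.get((i, t), 0.0)

# Link cost constraints (3), with u_ij = sum_t f_ijt / c_ij.
pieces = [(1, 0), (3, 2 / 3), (10, 16 / 3), (70, 178 / 3), (500, 1468 / 3), (5000, 16318 / 3)]
for (i, j), c in links.items():
    u_ij = pulp.lpSum(f[i, j, t] for t in edge_nodes) * (1.0 / c)
    for slope, offset in pieces:
        prob += cost[i, j] >= slope * u_ij - offset

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("Phi* =", pulp.value(prob.objective))
```

Scaling this sketch to the topologies used later in the paper only changes the `links`, `edge_nodes`, and `demand` inputs; the constraint structure stays the same.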
4. k-Set TE method: overview

Traffic cannot be split and routed freely in OSPF networks, and OSPF ECMP is not good enough to approximate the general optimum (OPT) because of the limitation of equal splitting. The basic idea of our k-set TE method is that (1) we virtually divide a given physical network into k overlays, (2) we partition the traffic into k uneven traffic sets, only at edge nodes, and identify these k traffic sets using the ToS/DS field in IP headers, and (3) we route the traffic sets independently over the k virtual overlays on top of the given physical network. Note that the traffic splitting and allocation happen only at edge nodes, but such splitting and allocation can be uneven among traffic sets, to achieve better load balancing. Since the edge nodes have much less traffic to handle than the core nodes do, uneven traffic splitting is feasible at the network edge. (This is exactly the same argument as used by the DiffServ model for pushing
intelligence and complexity to the network edge.) Moreover, since the edge nodes have flow information, they can partition traffic more precisely, following flow boundaries. No further traffic splitting or re-allocation is allowed in the core. If we look at only one traffic set at a time, say Traffic Set 1, it is routed exactly the same way as in an OSPF network without ECMP splitting. That is, for every destination node t, the routes that Traffic Set 1 travels from all the sources to t form a shortest-path spanning tree rooted at t.

Fig. 2 shows simple examples of the k-set TE method and the general optimum (OPT) method. In the given original topology, s and t are the only two edge nodes. Suppose the traffic demand is from s to t. Then the OPT method with arbitrary splitting can use at least five distinct routes to split the traffic: ⟨s, a, t⟩, ⟨s, b, t⟩, ⟨s, c, t⟩, ⟨s, c, b, t⟩, and ⟨s, c, d, t⟩. There are two splitting locations: s and c (a core node). If we use the k-set TE method, then possible results are shown in the two bottom sub-figures, with k = 2 and 4, respectively. In the case k = 2, we have to choose only two distinct routes to perform traffic engineering; one possible choice is ⟨s, a, t⟩ and ⟨s, c, t⟩. Similarly, if k = 4, we can split traffic onto four routes: ⟨s, a, t⟩, ⟨s, b, t⟩, ⟨s, c, t⟩, and ⟨s, c, d, t⟩, as shown in the sub-figures. In both cases of the k-set TE method, we can see that only one splitting location, s, exists, and it is an edge node. Note
Fig. 2. Simple example to show the k-set TE method and the OPT method (panels: original topology; OPT method, arbitrary splitting; k-set method, k = 2; k-set method, k = 4; locations of traffic splitting are marked).
that in the case of k = 4, it looks as if a split happens at node c, but it does not. This is because the two traffic sets, routed independently on ⟨s, c, t⟩ and ⟨s, c, d, t⟩, have already been partitioned and allocated by node s before they reach c. That is, although the two traffic sets physically share the same link (s, c), they are separated virtually. From this example, we can observe that the OPT method (with arbitrary splitting) may achieve the best TE result because it can split traffic into more streams, and at any place in the network, thus having the finest control granularity. We can also expect that the larger the k we use, the better the result the k-set TE method may achieve.

Several questions arise. Is it possible for the k-set TE method to achieve the same result as the general optimum? If so, how large should k be? If a small k is used, then what is the best result that the k-set method can achieve? We call this best result the k-set optimum and denote its cost by Φ*_k. (Note that this k-set optimum is not the general optimum, but the optimum under the k-set constraint.) Finally, what is the performance gap between the k-set optimum and the general optimum (i.e., the difference between Φ* and Φ*_k)? We focus on these questions in Section 5.

Briefly, our target of traffic engineering is the general optimum Φ*. As discussed in Section 3.3, Φ* can be achieved theoretically by solving a linear program. Since this method requires arbitrary traffic splitting, it is infeasible in real OSPF networks, where only even traffic splitting is allowed. By using our k-set TE method for a given k, we expect Φ*_k to be as close to Φ* as possible. In fact, we will show in Theorem 1 (Section 5) that if k is sufficiently large, then the k-set TE method achieves the general optimum (i.e., Φ*_k = Φ*). If k is not sufficiently large, we study the gap between Φ* and Φ*_k and give a worst-case constant bound on Φ*_k/Φ* in Theorem 2. Our bound is tight and is the same as the bound given by Fortz and Thorup in [3]. Furthermore, if k is small, we will show that searching for the k-set optimum Φ*_k is NP-hard. Therefore, we have devised a heuristic algorithm to approximate Φ*_k for a small k (Section 5.3). We denote the result of
the heuristic algorithm by Φ̂_k. That is, Φ̂_k is the direct approximation to Φ*_k. Since the gap between Φ* and Φ*_k is very small in real networks (Section 5.2), we expect that, through approaching Φ*_k, Φ̂_k is also a good approximation to the general optimum Φ* (our final target). 7 Finally, we summarize the relationship between Φ*, Φ*_k, and Φ̂_k in Fig. 3; Φ̂_k can be regarded as a bridge from Φ*_k to Φ*.

Fig. 3. Roadmap of the k-set TE method.

7 The performance will be evaluated in Section 6 by simulations.

5. k-Set traffic engineering method

In this section, we elaborate on the k-set TE method by answering the questions raised in the previous section (Section 4). We first formulate the problem of how to achieve the k-set optimum for a given k, which is called the k-set TE problem and is proven to be NP-hard. Then, we study the questions associated with the number of traffic sets, k. Finally, we present our heuristic solution to the k-set TE problem, followed by some implementation issues and discussions.
5.1. k-Set TE problem formulation

The k-set TE problem is the fundamental problem behind our k-set TE method. Under the k-set constraint, the problem is how to split traffic at each source node and how to route the resulting k traffic sets so that the objective Φ, defined in Eq. (1), is minimized. We use Φ*_k to denote the minimal Φ under the k-set constraint. (Note that Φ*_k is optimal under the k-set TE method; it is not necessarily the general optimum Φ*.) In the next subsection, we determine the number of traffic sets, k, and the gap between Φ*_k and the general optimum Φ* achieved by the TE method proposed in [10], where traffic is arbitrarily splittable.

The k-set TE problem is NP-hard. To see this, consider a special case of the k-set TE problem where k = 1 and there is only one destination node. Even this special case is NP-hard, because it is exactly a variation of the minimal-cost Steiner tree problem, which is a well-known NP-hard problem.

The k-set TE problem can be mathematically formulated as a mixed integer programming problem. Given an IP network G = (V, E) and the demand matrix D = {d_st | s, t ∈ V}, 8 and assuming V = {1, 2, ..., n} is the set of nodes, we define the following three groups of decision variables in Eq. (4), where T^a_t denotes the spanning tree for Traffic Set a, rooted at t and covering all edge nodes in the network G:

\[
\begin{aligned}
d^a_{it} &= \text{traffic (demand) allocation to Traffic Set } a \text{ at edge node } i,
          \text{ with } \sum_{a=1}^{k} d^a_{it} = d_{it},\ \forall i,\ t \in V_e;\\
x^a_{ijt} &= \begin{cases} 1, & (i,j) \in E,\ t \in V_e,\ (i,j) \in T^a_t,\\ 0, & \text{otherwise;} \end{cases}\\
f^a_{ijt} &= \text{traffic fraction of Traffic Set } a \text{ on link } (i,j) \text{ for destination } t,\ \forall (i,j) \in E,\ t \in V_e.
\end{aligned}
\tag{4}
\]

8 If nodes s and t are not edge nodes, then d_st = 0.

Note that the x^a_{ijt} are binary variables, and f^a_{ijt} = 0 if x^a_{ijt} = 0. Finally, based on the objective and cost functions given in Eq. (1), the k-set TE problem is formulated as follows.

Find variables d^a_{it}, x^a_{ijt}, and f^a_{ijt} which satisfy

\[
\Phi^*_k = \min \sum_{(i,j)\in E} \phi_{ij}
\tag{5}
\]

subject to

\[
\sum_{j=1}^{n} x^a_{ijt} = 1, \qquad i \in V,\ t \in V_e,\ i \ne t,\ a = 1,\ldots,k,
\tag{6}
\]
\[
\sum_{j=1}^{n} f^a_{ijt} - \sum_{j=1}^{n} f^a_{jit} = d^a_{it}, \qquad i \in V,\ t \in V_e,\ a = 1,\ldots,k,
\tag{7}
\]
\[
\sum_{a=1}^{k} d^a_{it} = d_{it}, \qquad i \in V,\ t \in V_e,
\tag{8}
\]
\[
\begin{aligned}
&f^a_{ijt} \ge 0,\quad d^a_{it} \ge 0, && (i,j) \in E,\ t \in V_e,\ a = 1,\ldots,k,\\
&f^a_{ijt} = 0 \text{ if } x^a_{ijt} = 0,\quad x^a_{ijt} \in \{0,1\}, && (i,j) \in E,\ t \in V_e,\ a = 1,\ldots,k,
\end{aligned}
\]
\[
u_{ij} = \sum_{t \in V_e} \sum_{a=1}^{k} \frac{f^a_{ijt}}{c_{ij}}, \qquad (i,j) \in E.
\tag{9}
\]
Here we omit the link cost constraints, which are similar to Eq. (3). In Eq. (5), we let φ_ij = φ(u_ij). Constraints (6) state that every node has exactly one outgoing link (next hop) on which to forward each traffic set. Constraints (7) are flow conservation constraints. Together, (6) and (7) enforce the tree structure towards each destination for each traffic set. Constraints (8) enforce the proper traffic splitting at each edge node. Eq. (9) defines the utilization of each link.

5.2. How many traffic sets do we need?

In this subsection, our focus is on how the value of k affects the performance of the k-set TE method. Intuitively, the more traffic sets we use, the better the result we get. However, for scalability and applicability, this number should be kept small. The natural questions are then: (1) Can we achieve, using the k-set method with a certain k, the general optimal result of arbitrary traffic splitting? If so, what should the optimal k be? (2) How many traffic sets do we need to achieve a reasonably good sub-optimal result? We address these questions as follows.
First, we prove that, theoretically, the k-set TE method can achieve Φ* (the general optimum of arbitrary traffic splitting) if k is sufficiently large.

Theorem 1. For any given topology G = (V, E) and traffic matrix D, there always exist large enough k's and corresponding traffic allocation strategies such that the k-set method achieves the general optimum Φ* (defined in Eq. (2)). Furthermore, the smallest such optimal k is less than or equal to |E|. That is, we can always find a k ≤ |E| such that Φ*_k = Φ*.

Proof. The proof is given in Appendix A. □

Although we have proved theoretically that an optimal k can always be found such that Φ*_k = Φ*, this k might be too large to be practical. For a fixed small k, since Φ*_k is the best approximation to Φ* obtainable with the k-set TE method, we would like to know whether Φ*_k/Φ* (i.e., the gap between the k-set optimum and the general optimum) is bounded by a constant.

Theorem 2. Given a fixed k (less than the value that achieves the general optimum), if the cost and objective functions in Eq. (1) are used, then Φ*_k/Φ* ≤ 5000, and the upper bound is tight.

Proof. The proof is given in Appendix B. □
The constant bound obtained in Theorem 2 is a worst-case upper bound in theory, and it equals the bound given by Fortz and Thorup in [3]. In practice, the difference between Φ*_k and Φ* may not be that large. In what follows, we test the actual gap between Φ*_k and Φ* in realistic networks by solving both the mixed integer program in Eqs. (5)–(9) and the linear program in Eq. (2). We generate a network topology using the BRITE [21] topology generator with the Waxman model. The topology consists of 20 nodes, 14 of which are edge nodes (70% of the total). The number of links is 80. Link capacities are randomly generated in [10, 1024]. The traffic matrix is randomly generated using Method 1 in Section 6.1. In summary, the setup is the same as that of simulation Case 1 in Section 6.1 (see Table 2),
Table 2
Setup of the three simulation cases

| Case | # Nodes | # Edge nodes | # Links | Link capacities | Traffic matrix | k |
|---|---|---|---|---|---|---|
| Case 1 | 100 | 70 | 400 | Unif(10, 1024) | Unif(1, maxCap(x, y)) (Method 1) | 2, 4 |
| Case 2 | 100 | 70 | 400 | Fixed to 100^a | Unif(1, maxCap(x, y)) (Method 1) | 2, 4 |
| Case 3.1 | 200 | 100 | 800 | Unif(10, 1024) | Unif(1, maxCap(x, y)) (Method 1) | 2, 4 |
| Case 3.2 | 200 | 100 | 800 | Unif(10, 1024) | Randomly, with hot spots (Method 2) | 2, 4 |

^a The exact number here is not important because we will scale the traffic matrix accordingly.
Fig. 4. Performance comparison between the general optimal solution and the k-set method, with k = 1, 2, 4 (20 nodes, 14 edge nodes, 80 links). (a) Cost in real values; (b) cost normalized by OPT, in logarithm (log Φ*_k/Φ*).
except for the smaller network size. We use ILOG CPLEX [22] as the optimization solver. Fig. 4 shows the results of the experiments. The x-axis represents the overall input traffic load, scaled and normalized by the total amount of capacity (the sum of the capacities of all links). In Fig. 4a, the curves of Φ*, Φ*_2 and Φ*_4 collapse onto each other. To show the differences more clearly, Fig. 4b plots the cost normalized by Φ* in logarithm (i.e., the curves represent log Φ*_k/Φ*, k = 1, 2, 4). Even so, the differences are still very difficult to discern; in fact, our data shows that Φ*_4 is almost identical to Φ*. The results clearly show that, in a realistic network of small size, a fairly small k, say 2 or 4, is able to achieve very good performance as long as we can find the optimal solutions to the k-set TE problem. However, since this problem is NP-hard in nature, it is not
practical to obtain such optimal solutions for large-scale networks. Therefore, we propose a heuristic algorithm next, in Section 5.3, to search for good near-optimal solutions quickly. Its performance in large-scale networks is studied by simulation in Section 6.

5.3. Heuristic solution to the k-set TE problem

In the previous subsection, we showed that if we find the optimal solutions to the k-set TE problem, then the k-set TE method indeed approaches the general optimal traffic allocation very well, even when k is very small, for example, k = 2 or 4 (only 1 or 2 ToS/DS bits in the IP header). But for large-scale networks, it is not practical to find such optimal solutions to the k-set TE problem because it is NP-hard. Therefore, we would like to
design a heuristic solution to this problem in this subsection. Our heuristic solution consists of two steps. The basic idea is first to search for k near-optimal overlays, one for each traffic set, and then to optimize the traffic allocation among these overlays.

Step 1: Search for the k near-optimal virtual overlays (routing solution), one for each traffic set. Given a topology G = (V, E), the edge node set V_e and the traffic demand matrix D = {d_ij}, the following pseudo-code presents the algorithm sketch. Since we use a basic Dijkstra-based shortest path algorithm, the Inverse Capacity Shortest-Path algorithm (InvCap for short), as our search engine, we call our algorithm k-SET-INVCAP (Algorithm 1).

Algorithm 1. k-SET-INVCAP(G, V_e, D, k)   {Find k virtual overlays}
  B ← 0                                   {Init bandwidth usage matrix B = {b_ij} to 0}
  W ← {1/c_ij}                            {Init link weight to reciprocal of link capacity: 1/c_ij}
  for a = 1 to k do
    P_a ← DIJKSTRA(G, V_e, W)             {Find all paths for the a-th overlay}
    UPDATE-LINK-USAGE(B, D/2^a, P_a)      {Subtract this overlay's share from the network}
    W ← {1/(c_ij − b_ij)}
  end for
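To make Step 1 concrete, the following Python sketch (our rendering, not the authors' code) implements Algorithm 1 with networkx providing the Dijkstra searches; the function name, the `capacity` edge attribute, and the demand dictionary are our own assumptions:

```python
# Sketch (our rendering of Algorithm 1): Step 1 of the heuristic, using networkx.
import networkx as nx

def k_set_invcap(G, edge_nodes, demand, k):
    """Return k overlays (dicts of shortest paths between edge nodes), one per traffic set.

    G is a directed graph whose edges carry a 'capacity' attribute; demand maps
    (source, destination) pairs of edge nodes to traffic volumes.
    """
    usage = {e: 0.0 for e in G.edges}                  # B = {b_ij}, bandwidth usage
    overlays = []
    for a in range(1, k + 1):
        # W = {1 / (c_ij - b_ij)}: inverse residual capacity (guard against depletion).
        for i, j in G.edges:
            residual = max(G[i][j]["capacity"] - usage[(i, j)], 1e-9)
            G[i][j]["weight"] = 1.0 / residual
        # P_a <- DIJKSTRA(G, Ve, W): shortest paths between all edge-node pairs.
        overlay = {(s, t): nx.dijkstra_path(G, s, t, weight="weight")
                   for s in edge_nodes for t in edge_nodes if s != t}
        overlays.append(overlay)
        # UPDATE-LINK-USAGE: demand shares are D/2, D/4, ..., D/2^(k-1), D/2^(k-1) (they sum to D).
        share = 0.5 ** a if a < k else 0.5 ** (k - 1)
        for (s, t), volume in demand.items():
            path = overlay[(s, t)]
            for i, j in zip(path, path[1:]):
                usage[(i, j)] += share * volume
    return overlays
```

Building G amounts to an `nx.DiGraph()` plus `add_edge(i, j, capacity=...)` calls; Step 2 below then solves the linear program (10) over the returned overlays.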
In Algorithm 1, B = {b_ij} and W = {w_ij} represent the link bandwidth usage matrix and the link weight matrix, respectively (recall that b_ij is the bandwidth usage on link (i, j) and w_ij is the weight of link (i, j)). The function call DIJKSTRA(G, V_e, W) returns a virtual overlay among all edge nodes, consisting of shortest paths among them with respect to the given link weight matrix W. The function call UPDATE-LINK-USAGE(B, D', P) increases each link's bandwidth usage by routing the given demand matrix D' onto the given overlay P. That is, for any edge nodes s and t, there exists a path p_st ∈ P, and we increase each on-path link's bandwidth usage by the given demand from s to t (i.e., d'_st). As shown in the algorithm, we in fact divide the traffic demand into k matrices in an exponential way:

\[
D = \frac{D}{2} + \frac{D}{4} + \cdots + \frac{D}{2^{k-1}} + \frac{D}{2^{k-1}}.
\]
At each iteration, we find an overlay and update the link usage by deploying the corresponding fraction of the traffic demand onto that overlay. The algorithm we use to find the overlay at each iteration is simply Dijkstra's algorithm with each link's weight equal to the reciprocal of its residual bandwidth; i.e., if w_ij is the weight of (i, j), then w_ij = 1/(c_ij − b_ij). Note that the computation of the weight matrices is based solely on the link capacities {c_ij} and the traffic demand matrix D, which are static and have nothing to do with the traffic allocation. Therefore, the static weight matrices can be assigned to the links and used to construct the routing table at each node in a decentralized manner (see the more detailed discussion in Section 5.5.3).

Step 2: Determine the traffic allocation among the k traffic sets based on the virtual overlays (routing solution) found in Step 1. Having found the k near-optimal virtual overlays for the traffic sets, we next want to find the best traffic allocation among them. That is:

Given fixed x^a_{ijt}, find variables d^a_{it} and f^a_{ijt} which satisfy

\[
\hat{\Phi}_k = \min \sum_{(i,j)\in E} \phi_{ij}
\]

subject to

\[
\begin{aligned}
&\sum_{j=1}^{n} f^a_{ijt} - \sum_{j=1}^{n} f^a_{jit} = d^a_{it}, && i \in V,\ t \in V_e,\ a = 1,\ldots,k,\\
&d^a_{it} \ge 0,\quad f^a_{ijt} \ge 0, && (i,j) \in E,\ t \in V_e,\ a = 1,\ldots,k,\\
&\sum_{a=1}^{k} d^a_{it} = d_{it}, && i \in V,\ t \in V_e,\\
&f^a_{ijt} = 0 \text{ if } x^a_{ijt} = 0, && (i,j) \in E,\ t \in V_e,\ a = 1,\ldots,k,\\
&u_{ij} = \sum_{t\in V_e}\sum_{a=1}^{k} \frac{f^a_{ijt}}{c_{ij}}, && (i,j) \in E.
\end{aligned}
\tag{10}
\]
Again, we omit the link cost constraints in Eq. (10), which are similar to Eq. (3). Note that Eq. (10) is a linear program, not a mixed integer program, because the x^a_{ijt} are now fixed constants (0 or 1). At this step, we find the traffic allocation {d^a_{it}} for each node i. Note that Σ^k_{a=1} d^a_{it} = d_{it}, where d_{it} is the total traffic demand from i to t. If i is a core node, then {d^a_{it}} is always
equal to 0, because no traffic is generated at i, i.e., d_{it} = 0 for any destination t. The network operator should enforce that every edge node complies with the traffic allocation {d^a_{it}}.
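The paper leaves the enforcement mechanism at the edge unspecified. One simple possibility (our assumption, sketched below in Python) is to assign each new flow to a traffic set by hashing its 5-tuple against the cumulative splitting ratios: this approximates {d^a_{it}} on average while keeping every packet of a flow in one set, and hence on one path.

```python
# Sketch (our illustration; the paper does not prescribe a mechanism): an edge node
# assigning flows to traffic sets so that the per-destination ratios are respected
# on average while every packet of a flow stays in one set.
import hashlib

def pick_traffic_set(flow_tuple, ratios):
    """Map a flow (e.g., its 5-tuple) to a traffic set index, weighted by 'ratios'.

    ratios[a] is the fraction of this destination's demand given to Traffic Set a;
    the fractions are assumed to sum to 1.
    """
    digest = hashlib.sha1(repr(flow_tuple).encode()).digest()
    point = int.from_bytes(digest[:8], "big") / 2 ** 64     # deterministic, uniform in [0, 1)
    cumulative = 0.0
    for a, r in enumerate(ratios):
        cumulative += r
        if point < cumulative:
            return a
    return len(ratios) - 1                                  # guard against rounding

# Example with the ratios of the example in Section 5.4 (2/3 vs. 1/3) and a hypothetical 5-tuple.
flow = ("10.0.0.1", "10.0.9.9", 6, 34567, 80)
print(pick_traffic_set(flow, [2 / 3, 1 / 3]))
```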
5.4. Example

Fig. 5 illustrates an example of how the heuristic algorithm works. In the original topology, there are 6 nodes in total, 3 of which are edge nodes (s1, s2, t). Each link has capacity 1, except c(k, t) = 2. The only traffic demands are d_{s1,t} = 1 and d_{s2,t} = 1. At Step 1, we first use the Inverse Capacity Shortest Path (InvCap) scheme to find the overlay for Traffic Set 1, shown in the upper figure in the middle. Then we update the bandwidth usage on each link of the Traffic Set 1 overlay, increasing it by 1/2. On the ‘‘residual’’ topology, we apply InvCap once again and find the second overlay, for Traffic Set 2, shown in the bottom figure in the middle. As we can see, both overlays have a tree structure. At Step 2, we optimize the traffic allocation over the two overlays found at Step 1. By solving the linear program given in Eq. (10), we obtain d^1_{s1,t} = 2/3, d^2_{s1,t} = 1/3, d^1_{s2,t} = 2/3, and d^2_{s2,t} = 1/3, as shown in the right-hand graph. The total cost is 3φ(2/3) + 4φ(1/3) = 4 + 4/3 = 5⅓. It is easy to verify that this is exactly the general optimum.
Fig. 5. Example of how the k-set heuristic algorithm works (left: original topology with link capacities; middle: Step 1, the overlays found for Traffic Set 1 and Traffic Set 2, with residual capacities; right: Step 2, the optimized traffic allocation on the given overlays: d^1_{s1,t} = 2/3, d^2_{s1,t} = 1/3, d^1_{s2,t} = 2/3, d^2_{s2,t} = 1/3).
5.5. Implementation issues and discussions

5.5.1. Use of bits in the ToS/DS field

The ToS field was first defined in RFC 791 [8], which describes one entire byte (8 bits) in the IP header; let us write it as [b0 b1 b2 b3 b4 b5 b6 b7]. The three most significant bits, [b0 b1 b2], are called the precedence, which defines the priority or importance of an IP packet. For example, Precedence 7 ([b0 b1 b2] = [1 1 1]) refers to ‘‘Network Control,’’ Precedence 6 ([1 1 0]) refers to ‘‘Internetwork Control,’’ and Precedence 0 ([0 0 0]) refers to ‘‘Routine.’’ The rest of the ToS field is used as follows: [b3] specifies delay (0 = Normal, 1 = Low), [b4] specifies throughput (0 = Normal, 1 = High), [b5] specifies reliability (0 = Normal, 1 = High), and [b6 b7] is reserved for future use. Since the one-byte ToS field had remained almost completely unused since it was defined almost 20 years ago, DiffServ was proposed, and the ToS field was redefined and superseded by the Differentiated Services (DS) field to support multiple service classes [6,9]. Basically, the DiffServ standard maintains backward compatibility with RFC 791 but allows more efficient use of [b3 b4 b5]. That is, Precedences 7 and 6 remain the same; Precedence 5 is redefined as ‘‘Expedited Forwarding (EF);’’ Precedences 4–1 define four ‘‘Assured Forwarding (AF)’’ classes; and Precedence 0 identifies best-effort traffic. In addition, [b3 b4] defines further priority granularity for the AF classes by specifying different dropping probabilities. Finally, [b5] is always 0, and [b6 b7] is still ignored.

As we can see, both the ToS and the DS definitions ignore the last two bits ([b6 b7]). We can use these two bits to identify the traffic sets in our k-set TE method. On one hand, since these two bits are not interpreted even by DiffServ-enabled routers, our k-set TE method will not interfere with DiffServ or any other existing router configurations. On the other hand, since we focus only on an intra-domain environment, configuring routers to recognize the traffic sets (the unused bits in the ToS/DS field) is implementable. Finally, although only 2 bits can be used (supporting up to 4 traffic sets), we will show in Section 6 that 4 traffic sets are sufficient.

Another choice is to use [b3 b4 b5] when [b0 b1 b2] = [0 0 0]; i.e., to use [b3 b4 b5] to identify traffic sets only for best-effort packets, because these three bits are not (or are rarely) used by best-effort traffic. In that case, we can only split best-effort traffic to achieve load balancing. However, in view of the dominant amount of best-effort traffic in today's Internet, this is still a feasible choice.
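As a small illustration of the bit layout just described (our example; modern stacks use [b6 b7] as the ECN field, so this is purely an illustration, not a deployment recipe), the traffic-set identifier can be written into, and read back from, the two low-order bits of the ToS/DS byte as follows:

```python
# Sketch (our illustration): placing the traffic-set identifier in the two low-order
# bits [b6 b7] of the ToS/DS byte, and recovering it at a core node.

def mark_traffic_set(tos_byte: int, set_id: int) -> int:
    """Return tos_byte with [b6 b7] replaced by set_id (0..3); upper 6 bits untouched."""
    assert 0 <= set_id <= 3
    return (tos_byte & 0xFC) | set_id

def extract_traffic_set(tos_byte: int) -> int:
    """Read the traffic-set identifier back from [b6 b7]."""
    return tos_byte & 0x03

# Example: a best-effort packet (ToS 0x00) assigned to Traffic Set 2.
marked = mark_traffic_set(0x00, 2)
print(hex(marked), extract_traffic_set(marked))   # 0x2 2
```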
5.5.2. Heuristic algorithms for the k-set TE problem

In Section 5.3, we proposed one heuristic algorithm for the k-set TE problem. Since the routes (overlays) and traffic allocation are pre-computed by the network operator [3], we may in fact apply more advanced heuristic search techniques, such as simulated annealing, Tabu search, or genetic algorithms, to the problem. However, according to our simulation results, the performance of the heuristic algorithm is not that critical: as long as the resulting k overlays are fairly good, the overall performance of the k-set TE method (with respect to the cost Φ̂_k) will be sufficiently close to the general optimum Φ*. This is due to the optimization of the traffic allocation at Step 2, after the k overlays are found.

5.5.3. Traffic allocation and routing table computation

Our k-set TE method pre-computes the traffic allocation (splitting ratios at each edge node) after
finding the k virtual overlays (one for each traffic set). The network operator should distribute these splitting ratios to the edge nodes, and we assume each edge node is responsible for enforcing them. Core nodes do not need to know the traffic allocation, because no traffic splitting happens there at all. Their responsibility is to classify packets into traffic sets and forward them to the corresponding next hops based on both the destination addresses and the ToS/DS field. This multi-field packet classification and routing is feasible thanks to the latest advances in IP router design [23,24].

There are two ways to handle routing table construction. Recall that all the routes are pre-computed by the network operator. The first way is therefore to distribute the routing tables directly to each node in the network. However, this centralized method may not scale well, and it may not handle link failures well: if failures happen, all the routing tables need to be re-computed centrally by the network operator and re-distributed to all the routers.

In light of the drawbacks of the centralized way of constructing routing tables, the second way is based on a decentralized mechanism. Note that in our k-set TE method, we find the routes/overlays before we allocate traffic onto them. 9 More importantly, only static weights are used to find the routes/overlays. As we can see in Algorithm 1, only the link capacities {c_ij} and the traffic matrix D = {d_ij}, which are assumed to remain static, are needed to determine the k sets of link weights that define the k overlays and the associated routes. Therefore, the network operator can assign the k sets of weights (which are all static) to the links in the network. The link weights are broadcast among nodes as in a classic OSPF network; the only difference is that k weights are associated with each link instead of just one. Then each node can reconstruct the routes for each overlay (by shortest-path search based on the appropriate
9 This is the opposite of the method proposed in [10], where traffic allocations are first determined and then link weights are resolved by solving associated linear programs.
set of weights) and build its own routing tables accordingly. This decentralized method can handle link failures just as a classic OSPF network does. Note that this method does require a consistent tie-breaking rule so that every node reconstructs the same original overlays/routes, but this requirement is not difficult to meet; for example, node IDs can be used for this purpose, assuming that they are unique within the same network domain.

5.5.4. Coexistence with legacy nodes

In the k-set TE method, we assume that all nodes can interpret the ToS/DS field in packets' IP headers and find the correct next hops based on both the destination addresses and these bits. Legacy nodes, however, may not be able to do so. If there are legacy nodes in a network, we should use the k-set TE method more carefully. One way to handle this issue is to add more constraints to our linear programs to accommodate such legacy nodes, although the performance might not be as good as in a network without legacy nodes. Another interesting issue associated with coexistence is how to find the critical nodes on which to deploy our new method to obtain the maximal performance gain, if we are allowed to deploy the k-set method on only part of the nodes in a network. We will investigate both issues in future work.
6. Performance evaluation of the heuristic k-set TE method

6.1. Simulation setup

We use the topology generator BRITE [21] to randomly generate our network topologies, where the Waxman model is used and the nodes are placed according to a heavy-tailed distribution. Edge nodes are randomly selected from the entire node set. Link capacities are randomly generated within the interval [10, 1024]. The generation of traffic demands among the edge nodes is more involved. We use two different methods in our study: one is based on random demand generation using uniform distributions;
the other uses random demand generation with modeling of hot spots [3]. The second method is supposed to be more realistic and to have larger variation.

Method 1. For each pair of edge nodes x and y, we first perform a Dijkstra-based search to find the capacity of the widest path from x to y, denoted by maxCap(x, y). The traffic demand from x to y is then randomly generated within [1.0, maxCap(x, y)] following a uniform distribution.

Method 2. We use the demand generation method proposed by Fortz and Thorup in [3]. Briefly, for each node x, we generate two random numbers O_x, D_x ∈ [0, 1]. Further, for each pair (x, y) of nodes, we pick a random number C(x, y) ∈ [0, 1]. The traffic demand from x to y is then determined by

\[
\alpha\, O_x D_y\, C(x, y)\, e^{-d(x,y)/2\Delta},
\]

where α is a scaling parameter, d(x, y) is the Euclidean distance from x to y, and Δ is the largest Euclidean distance between any pair of nodes. ‘‘The O_x and D_y model that different nodes can be more or less active senders and receivers, thus modeling hot spots on the net’’ [3]. The factor e^{-d(x,y)/2Δ} models the fact that there is higher demand between close pairs of nodes. This method generates traffic matrices with higher variation.
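For reference, Method 2 transcribes directly into a short Python routine (our code; the node coordinates and the scaling parameter α are assumed inputs):

```python
# Sketch (our transcription of Method 2, the Fortz-Thorup generator): random demands
# with hot spots. 'coords' maps each node to its (x, y) position; 'alpha' scales the demands.
import math
import random

def generate_demands(nodes, coords, alpha):
    O = {x: random.random() for x in nodes}          # how active a sender node x is
    D = {x: random.random() for x in nodes}          # how active a receiver node x is
    delta = max(math.dist(coords[x], coords[y]) for x in nodes for y in nodes)
    demand = {}
    for x in nodes:
        for y in nodes:
            if x == y:
                continue
            c = random.random()                       # C(x, y)
            dist = math.dist(coords[x], coords[y])
            demand[(x, y)] = alpha * O[x] * D[y] * c * math.exp(-dist / (2 * delta))
    return demand

# Example with three edge nodes at assumed coordinates.
nodes = ["a", "b", "c"]
coords = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (10.0, 0.0)}
print(generate_demands(nodes, coords, alpha=100.0))
```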
To simulate different traffic demand loads, we first use one of the above methods to generate a base traffic matrix and then scale it up or down by different ratios to generate traffic matrices with the loads we want. The same scaling scheme was also used in [3] and [4].

Since the distribution of link capacities is critical in traffic engineering, we design the first two simulation cases by varying the assignment of link capacities. In both cases, the total number of nodes is 100, 70% of which are edge nodes; the edge nodes are randomly selected from the entire node set, and the number of links is 400. In the first case, the link capacities are randomly generated within the interval [10, 1024]. We use this case to mimic a very heterogeneous link-capacity environment. In the second case, we let all links be equally capacitated, an extreme case of very high homogeneity. During the proof of Theorem 2, we suspected that our approach prefers heterogeneous networks and might not work very well in a homogeneous environment. However, the simulation results show clearly that our k-set TE method performs consistently well in both cases, even when k is small. We test k = 2 and k = 4 for each case. When k = 4, the k-set TE method almost achieves the general optimum Φ* (where arbitrary traffic splitting is allowed). In the third simulation case, we increase the numbers of nodes, edge nodes, and links to 200, 100, and 800, respectively, to study the performance in a larger-scale network. In this case, we also test the performance under two different traffic matrix setups. Table 2 summarizes the three simulation cases that we focus on.

To evaluate the performance, we are interested not only in the overall network cost Φ (which is the objective to optimize), but also in the maximum and the variance of link utilization. We find that, besides Φ, the variance of link utilization is also a good metric for evaluating how well a TE approach performs: a good approach should be able to balance traffic among all links, so the utilization variance should be small. Again, we use ILOG CPLEX as the tool to solve the traffic allocation linear program of Step 2 in Section 5.3.

6.2. Simulation results and analysis

6.2.1. Case 1. A heterogeneous environment with link capacities randomly generated within [10, 1024]

Fig. 6 shows the simulation results for Case 1. The x-axis represents the overall input traffic load, scaled and normalized by the total amount of capacity (the sum of the capacities of all links). In this case, we first observe that the hop-count shortest path method (‘‘Hop-cnt SP’’ in Fig. 6) performs extremely badly: the maximum link utilization goes beyond 1 even when the traffic demand is very light. Tracing the result, we find that there are
a few ‘‘stringent’’ links in the network that have very small capacities but are in critical positions; i.e., they are on shortest paths between some edge nodes. Therefore, even when the traffic load is light, these stringent links are congested, making the overall cost and the maximum link utilization large. This result demonstrates the effectiveness and necessity of traffic engineering.

Fig. 6a presents the cost Φ versus the scaled traffic load. For comparison, we also show the result of Inverse Capacity OSPF (InvCap) in the graph; it is in fact equivalent to the k-set method with k = 1. All curves have an exponential-like shape because of the cost function we use (Eq. (1)). As we can see, our k-set method (k = 2, 4) consistently outperforms both Hop-cnt SP and InvCap. Moreover, when we set k = 4, it approximates the general optimum, and even when k = 2, its performance is still acceptable. Fig. 6b and c show the maximum and the variance of link utilization, respectively; they also confirm the effectiveness of the k-set method. Lastly, remember that the extra overhead incurred by the k-set method is only 1 or 2 bits in the IP header for k = 2 or 4.

6.2.2. Case 2. A homogeneous environment with equal link capacities

We designed this simulation case because we suspected that the k-set method might degrade if link capacities are homogeneously distributed. Here we consider the extreme case where all links have the same capacity. Then Hop-count SP and InvCap are equivalent, because all links have the same weight in terms of the reciprocal of their capacities. This is confirmed in Fig. 7, where both methods produce the same curves with respect to all metrics. As before, the k-set method successfully approximates the general optimum under all traffic load conditions with respect to the cost, the maximum link utilization, and the variance of link utilization. All results in the three sub-figures verify that the k-set method is still effective even when links have homogeneous capacities.

6.2.3. Case 3. A larger-scale network with link capacities randomly generated within [10, 1024]

We now study performance issues on larger-scale networks. The topology setup is the
Fig. 6. BRITE, 100 nodes, 70 edge nodes and 400 links, very heterogeneous link capacities. (a) Cost (Φ), (b) maximal link utilization and (c) variance of link utilization.
6.2.3. Case 3. A larger scale network with link capacities randomly generated within [10, 1024]

We study performance on larger scale networks in this case. The topology setup is the same as in Case 1 except that the numbers of nodes, edge nodes, and links are increased to 200, 100, and 800, respectively. Link capacities are randomly picked in [10, 1024]. In order to examine the performance of the k-set TE method under different traffic distributions, we design two sub-cases (Case 3.1 and Case 3.2) with different traffic demand generation methods. In Case 3.1, the traffic demand between each pair of edge nodes, x and y, is generated uniformly at random within the interval [1.0, maxCap(x, y)] (Method 1, Section 6.1), while in Case 3.2 a more sophisticated scheme with larger variations and modeling of hot spots is used (Method 2, Section 6.1). Fig. 8a and b depict the simulation results for Case 3.1 and Case 3.2, respectively. As both figures show, the k-set TE method achieves results very close to the general optimum (OPT) when k is set to 2 or 4, no matter which traffic distribution is used. We conclude that our k-set TE method performs consistently well even in large-scale networks under different traffic distributions.
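As a concrete illustration, the Case 3.1 demand generation (Method 1) can be sketched as follows. This is only a minimal sketch: the helper max_cap is a placeholder for whatever maxCap(x, y) denotes in Section 6.1, which is not reproduced here.

    import random

    def generate_demands(edge_nodes, max_cap):
        # Draw a demand for every ordered pair of distinct edge nodes,
        # uniformly at random from [1.0, maxCap(x, y)] (Method 1, Section 6.1).
        demands = {}
        for x in edge_nodes:
            for y in edge_nodes:
                if x != y:
                    demands[(x, y)] = random.uniform(1.0, max_cap(x, y))
        return demands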
Fig. 7. BRITE 100 nodes, 70 edge nodes and 400 links, equal link capacities. (a) Cost (U), (b) maximal link utilization and (c) variance of link utilization.
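For reference, the utilization statistics plotted in Figs. 6–8 (maximum and variance of link utilization, against the total demand normalized by the total capacity) can be computed directly from per-link loads and capacities. The sketch below assumes these are available as NumPy arrays; the names are illustrative, not from the paper.

    import numpy as np

    def utilization_metrics(load, capacity, total_demand):
        # load[i] and capacity[i] refer to the same link i.
        util = load / capacity
        return {
            "normalized_demand": total_demand / capacity.sum(),  # x-axis of Figs. 6-8
            "max_utilization": util.max(),
            "utilization_variance": util.var(),  # small variance = well-balanced load
        }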
6.2.4. Some discussions

(1) An interesting observation from Figs. 6b and 7b is that the general optimal method (OPT) does NOT always achieve the optimal max link utilization. This is reasonable, because the general optimal method is optimal only with respect to the overall network cost U, not the max link utilization. Fig. 9 illustrates a simple but convincing example (a small numerical sketch is given after this list). If every link has capacity 1 and the only traffic demand from s to t is 1, then the general optimal TE solution is to put 1/3 of the traffic onto ⟨s, a, t⟩ and 2/3 onto ⟨s, t⟩. The total cost is 2 (U = 2u(1/3) + u(2/3) = 2) and the max link utilization is 0.667. However, if we put 1/2 of the traffic onto each path, the total cost rises to 2.5 (U = 3u(1/2) = 2.5), but the max link utilization drops to 0.5. In fact, this is not limited to the max link utilization: for other metrics, such as the average link utilization, the OPT method does not necessarily achieve the optimal value either.
Fig. 8. Larger scale network: BRITE 200 nodes, 100 edge nodes and 800 links, heterogeneous link capacities, with different traffic distributions. (a) Cost (U): the traffic demands among edge nodes are randomly generated according to a uniform distribution. (b) Cost (U): the traffic demands among edge nodes are randomly generated with hot spots [3] (larger variations).
Fig. 9. Example showing why the general optimal method does not achieve the optimal max link utilization.
(2) All simulation results show that the k-set method can achieve performance very close to the general optimum when k = 4. This strongly suggests that in real OSPF networks, k = 4 is large enough to obtain good near-optimal performance and no larger k is needed.

(3) Finally, in terms of route computation and traffic allocation, the k-set method incurs very low overhead. At Step 1 (searching overlays), it takes only k times as long as Dijkstra's algorithm to converge; at Step 2 (traffic allocation), the linear program is solved in polynomial time. In our 100-node cases, both steps typically converge within seconds; the worst case of Step 2 takes several minutes. Our computation platform was a SUN Blade-1000 workstation with a 600 MHz SPARCV9 CPU.
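The Fig. 9 numbers in point (1) can be reproduced with a piecewise-linear link cost. The sketch below assumes Eq. (1) is the standard Fortz–Thorup cost of [3], expressed in terms of the utilization x (breakpoints 1/3, 2/3, 9/10, 1, 11/10 and slopes 1, 3, 10, 70, 500, 5000); only the slope bounds 1 and 5000 are stated explicitly in this paper, so the exact breakpoints are an assumption.

    def u(x):
        # Piecewise-linear per-link cost as a function of utilization x.
        breaks = [1/3, 2/3, 9/10, 1.0, 11/10, float("inf")]
        slopes = [1, 3, 10, 70, 500, 5000]
        cost, prev = 0.0, 0.0
        for b, s in zip(breaks, slopes):
            seg = min(x, b) - prev
            if seg <= 0:
                break
            cost += s * seg
            prev = b
        return cost

    # Optimal w.r.t. cost: 1/3 on <s,a> and <a,t>, 2/3 on <s,t>.
    print(2 * u(1/3) + u(2/3))  # 2.0, with max link utilization 0.667
    # Even split: 1/2 on every link.
    print(3 * u(1/2))           # 2.5, but max link utilization only 0.5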
7. Conclusion and future work

Traffic engineering is crucial for load balancing and QoS provisioning in today's Internet, especially in OSPF networks. Almost all existing approaches rely on the ECMP feature of OSPF and split traffic anywhere in a network. In this paper, we propose a novel, edge-based traffic engineering approach for OSPF networks, which classifies and splits traffic only at the network edge instead of in the core. Performance was addressed both theoretically and by simulation, and the results showed clearly that our approach performs well even with a very small number of traffic sets (2 or 4). We conclude by summarizing our contributions: (1) We presented a new approach to traffic engineering in OSPF networks that is simple, flexible, and efficient. It strictly
follows the ‘‘smart edge, simple core’’ design rule of the Internet. It is also beneficial to TCP applications because it keeps ‘‘the same path for the same flow’’. (2) We proved that our k-set approach achieves the general optimum if k is large enough; if k is smaller, we provided a constant worst-case performance bound. (3) We developed a new heuristic algorithm to solve the k-set traffic allocation and routing problem behind our new approach. Simulation results confirmed its effectiveness.

For future work, we would like to investigate the two-fold problem of coexistence with legacy nodes in a network. On one hand, we will modify our approach to accommodate legacy nodes on which the new approach has not yet been deployed, and evaluate performance in that scenario. On the other hand, if we are allowed to deploy our new approach only at some of the nodes in a network, we will identify the critical nodes that yield the maximal performance gain.

Acknowledgment

The authors would like to thank William Yurcik, National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign, for his valuable comments and discussion.

Appendix A. Proof of Theorem 1

One straightforward observation is given in Lemma 1.

Lemma 1. If we can use k traffic sets to achieve the general optimal traffic allocation (i.e., U_k = U), then we can always use any k' > k traffic sets to achieve the general optimum, too.

This is because we can pick any traffic set and virtually split it further into two traffic sets. Physically, these two traffic sets follow the same routes and the overall traffic allocation remains intact, so the general optimum is still achieved by the k + 1 traffic sets. We can iterate this splitting process until we reach k' traffic sets.
Then, we show that the k-set TE method can achieve U (the general optimum under arbitrary traffic splitting) if k is large enough.

Proof of Theorem 1. First, we prove the single-destination case. Assume the general optimal traffic allocation is given and there is only one destination node t. We construct a traffic graph (t-graph) by eliminating from the original topology all links that carry no traffic allocation. We then perform the following iterative operations. We pick any source node, say s1, and find a path from s1 to t. Such a path must exist, because otherwise the flow conservation rule would be violated. Along the path we find the ‘‘bottleneck’’ link(s), say e1, where the traffic allocation is the smallest among all on-path links. We assign the path and that amount of traffic to a traffic set, say Traffic Set 1, and reduce the traffic allocation on each on-path link by the same amount (so the flow conservation rule still holds after the reduction). The ‘‘bottleneck’’ link is reduced to 0 and is therefore eliminated from the t-graph. We iterate this process until all links are eliminated from the t-graph, at which point the general optimal traffic allocation has been achieved using the traffic sets we have found. At each iteration we assign a new traffic set and eliminate at least one link. Since there are at most |E| links in the t-graph, the number of traffic sets used is at most |E|.

Now consider multiple destinations. Since we are under the destination-based routing paradigm, the traffic allocation and routing for different destinations do not interfere with each other. Therefore, we can repeat the above process for one destination at a time, and find that for destination t_i, k(t_i) traffic sets are needed to achieve the general optimum. By Lemma 1, max_i{k(t_i)} traffic sets then achieve the general optimum for the entire network. Finally, because k(t_i) ≤ |E|, we are guaranteed to find a k, k ≤ |E|, that achieves the general optimum. □

Alternatively, the multi-destination case can be proved by adding a virtual destination node T and repeating the proof for the single-destination case.
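For concreteness, the path-stripping construction in the proof above can be sketched in a few lines. This is only a sketch under the proof's assumptions (a single destination t and a given optimal per-link allocation satisfying flow conservation); the names alloc and find_path are illustrative, not from the paper.

    def decompose_into_traffic_sets(alloc, t, eps=1e-9):
        # alloc[(u, v)] is the optimal amount of traffic toward t carried by link (u, v).
        alloc = dict(alloc)      # work on a copy of the t-graph
        traffic_sets = []        # list of (path, amount) pairs

        def find_path(s):
            # Depth-first search from s to t over links that still carry traffic.
            stack, seen = [(s, [s])], {s}
            while stack:
                node, path = stack.pop()
                if node == t:
                    return path
                for (u, v), a in alloc.items():
                    if u == node and a > eps and v not in seen:
                        seen.add(v)
                        stack.append((v, path + [v]))
            return None

        while any(a > eps for a in alloc.values()):
            # Any tail of a loaded link has positive outflow; by flow conservation
            # a path from it to t exists in the t-graph.
            s = next(u for (u, v), a in alloc.items() if a > eps)
            path = find_path(s)
            links = list(zip(path, path[1:]))
            bottleneck = min(alloc[e] for e in links)   # smallest on-path allocation
            traffic_sets.append((path, bottleneck))     # one new traffic set per iteration
            for e in links:
                alloc[e] -= bottleneck                  # the bottleneck link drops to zero
        return traffic_sets                             # at most |E| sets, as in Theorem 1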
Fig. B.1. Worst case scenario where only k out of |E| edges can be used by the k-set TE method to balance traffic.
Appendix B. Proof of Theorem 2

Now we prove that the k-set TE method has a worst case performance bound and that the bound is tight. We prove it only for the single-source single-destination (SSSD) case, because all other cases can be converted into the SSSD case by adding one virtual source S, one virtual destination T, and a set of virtual links connecting all sources and destinations to S and T, respectively.

Proof of Theorem 2. We construct the SSSD case shown in Fig. B.1, where all links have the same capacity c. Assume the cost and objective functions given in Eq. (1) are used and the total demand from s to t is d. With general optimal traffic engineering under arbitrary splitting, the total cost is
U = |E| · u(d/(c|E|)) ≥ |E| · d/(c|E|) = d/c,

because according to Eq. (1), the smallest value of the piecewise slope of the cost function is 1 (i.e., u'(x) ≥ 1 and u(0) = 0). Similarly, we can calculate the cost when the k-set TE method is used:

U_k = k · u(d/(kc)) ≤ 5000 · k · d/(kc) = 5000 · d/c,

because according to Eq. (1), the largest value of the piecewise slope of the cost function is 5000 (i.e., u'(x) ≤ 5000 and u(0) = 0). Then we can easily find that the worst case performance bound is 5000, i.e.,
U_k / U ≤ 5000.

It is easy to verify that this is the worst case scenario, because if the links have different capacities or are shared by multiple paths, then our k-set TE method can obtain better results by choosing better paths (with larger capacities). Finally, since the worst case scenario is reachable, we have proved both the upper bound and the tightness of the bound. □

References

[1] J. Moy, OSPF Version 2, RFC 2328, April 1998.
[2] G. Apostolopoulos, D. Williams, S. Kamat, R. Guerin, A. Orda, T. Przygienda, QoS Routing Mechanisms and OSPF Extensions, RFC 2676, August 1999.
[3] B. Fortz, M. Thorup, Internet traffic engineering by optimizing OSPF weights, in: Proceedings of IEEE INFOCOM 2000, Tel-Aviv, Israel, 2000.
[4] A. Sridharan, R. Guerin, C. Diot, Achieving near-optimal traffic engineering solutions for current OSPF/IS-IS networks, in: Proceedings of IEEE INFOCOM 2003, San Francisco, CA, 2003.
[5] A. Riedl, Optimized routing adaptation in IP networks utilizing OSPF and MPLS, in: Proceedings of IEEE ICC 2003, Anchorage, Alaska, 2003.
[6] S. Blake et al., An Architecture for Differentiated Services, RFC 2475.
[7] K. Nichols, V. Jacobson, L. Zhang, A Two-bit Differentiated Services Architecture for the Internet, RFC 2638.
[8] Internet Protocol, RFC 791.
[9] F. Baker, D. Black, S. Blake, K. Nichols, Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers, RFC 2474.
[10] Y. Wang, Z. Wang, L. Zhang, Internet traffic engineering without full mesh overlaying, in: Proceedings of IEEE INFOCOM 2001, 2001.
[11] Y. Wang, Z. Wang, Explicit routing algorithms for internet traffic engineering, in: Proceedings of ICCCN'99, 1999.
[12] D. Awduche et al., Requirements for Traffic Engineering Over MPLS, RFC 2702, September 1999.
[13] J. Wang, S. Patek, H. Wang, J. Liebeherr, Traffic engineering with AIMD in MPLS networks, in: Proceedings of the Protocols for High-Speed Networks Workshop, 2002, pp. 192–210, citeseer.nj.nec.com/512747.html.
[14] S. Yilmaz, I. Matta, On the scalability-performance tradeoffs in MPLS and IP routing, in: Proceedings of SPIE: Scalability and Traffic Control in IP Networks, 2002.
[15] J. Wang, K. Nahrstedt, Hop-by-hop routing algorithms for premium traffic, ACM Computer Communication Review 32 (5) (2002) 73–88.
[16] J. Wang, K. Nahrstedt, Hop-by-hop routing algorithms for premium-class traffic in DiffServ networks, in: Proceedings of IEEE INFOCOM 2002, 2002.
[17] J. Wang, L. Xiao, K.-S. Lui, K. Nahrstedt, Bandwidth sensitive routing in DiffServ networks with heterogeneous bandwidth requirements, in: Proceedings of IEEE ICC 2003, Anchorage, Alaska, 2003.
[18] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford, Deriving traffic demands for operational IP networks: methodology and experience, IEEE/ACM Transactions on Networking 9 (2001) 265–278.
[19] Y. Zhang, M. Roughan, C. Lund, D. Donoho, An information-theoretic approach to traffic matrix estimation, in: Proceedings of ACM SIGCOMM'03, Karlsruhe, Germany, 2003.
[20] Y. Zhang, M. Roughan, N. Duffield, A. Greenberg, Fast accurate computation of large-scale IP traffic matrices from link loads, in: Proceedings of ACM SIGMETRICS'03, San Diego, CA, USA, 2003.
[21] A. Medina, A. Lakhina, I. Matta, J. Byers, BRITE: an approach to universal topology generation, in: Proceedings of the International Workshop on MASCOTS '01, Cincinnati, Ohio, 2001.
[22] ILOG CPLEX, http://www.ilog.com/products/cplex/.
[23] P. Gupta, S. Lin, N. McKeown, Routing lookups in hardware at memory access speeds, in: Proceedings of IEEE INFOCOM 1998, San Francisco, vol. 3, 1998, pp. 1240–1247.
[24] P. Gupta, N. McKeown, Packet classification on multiple fields, ACM Computer Communication Review 29 (1999) 147–160.
Jun Wang received the B.S. and M.Engr. degrees in computer science and technology from Tsinghua University, Beijing, China, and the Ph.D. degree in computer science from the University of Illinois at Urbana-Champaign in 2003. From 2003 to 2004, he was a postdoctoral associate with the National Center for Supercomputing Applications (NCSA) and the Department of Computer Science
at the University of Illinois at Urbana-Champaign. He is currently a senior security engineer at NCSA, University of Illinois at Urbana-Champaign. His research interests include computer networks and data communications, network survivability and security, network QoS, multimedia systems, and distributed systems.
Yaling Yang received the B.S. degree in telecommunications from the University of Electronic Science and Technology of China, Chengdu, Sichuan, China, in 1999. She is currently a Ph.D. student at the University of Illinois at Urbana-Champaign. Her research interests include network QoS and resource management.
Li Xiao (S'02) received his B.S. and M.Eng. degrees in automatic control from Tsinghua University, Beijing, China. He also received the M.S. degree in computer science from the University of Illinois at Urbana-Champaign, where he is currently a Ph.D. candidate. His research interests are computer networks and data communication, with focus on network routing, QoS and network resilience.
Klara Nahrstedt is an associate professor at the University of Illinois at Urbana-Champaign, Computer Science Department. Her research interests are directed towards multimedia middleware systems, quality of service (QoS), QoS routing, QoS-aware resource management in distributed multimedia systems, and multimedia security. She is the coauthor of the widely used multimedia book ‘Multimedia: Computing, Communications and Applications’ published by Prentice Hall, and the recipient of the Early NSF Career Award, the Junior Xerox Award, and the IEEE Communication Society Leonard Abraham Award for Research Achievements. Since 2001, she has been the editor-in-chief of the ACM/Springer Multimedia Systems Journal, and the Ralph and Catherine Fisher Associate Professor. She received her B.A. in mathematics from Humboldt University, Berlin, in 1984, and her M.Sc. degree in numerical analysis from the same university in 1985. She was a research scientist in the Institute for Informatik in Berlin until 1990. In 1995 she received her Ph.D. from the University of Pennsylvania in the Department of Computer and Information Science.