Computer Communications 35 (2012) 970–979
Contents lists available at SciVerse ScienceDirect
Computer Communications journal homepage: www.elsevier.com/locate/comcom
A Local Fast-Reroute mechanism for single node or link protection in hop-by-hop routed networks Hui-Kai Su ⇑ Dept. of Electrical Engineering, National Formosa University, Yunlin 632, Taiwan, ROC
Article info
Article history:
Received 22 March 2010
Received in revised form 13 February 2012
Accepted 15 February 2012
Available online 24 February 2012
Keywords:
Network protection
Network survivability
Fast reroute
Network routing
Internet protocol
Abstract
Network survivability has become one of the most important QoS (Quality of Service) parameters in IP network-based applications, particularly for real-time multimedia applications. IP-based protection that enables recovery from failure in just a few milliseconds can provide greater network resilience than traditional routing recovery or other lower-layer recovery technologies. This paper proposes an IP protection scheme, called IP Local Fast-Reroute (IPLFRR), for single node or link protection. This scheme works in an intra-area routing domain, providing a simple and efficient solution to improve the survivability of IP networks. Unlike MPLS Fast-Reroute, which requires an extra MPLS layer and related protocols, the proposed scheme is applicable to a network employing conventional IP routing and forwarding. Moreover, our mechanism is capable of preventing service disruptions and packet loss caused by the transient loops that normally occur during reconvergence of the network following a failure. Because the backup next-hops are predetermined, service interruption can be limited to a few milliseconds, which is on par with the failure detection time. Simulation results show that IPLFRR is capable of improving network survivability following the failure of a single node or link. © 2012 Elsevier B.V. All rights reserved.
1. Introduction The internet (IP network) is a hop-by-hop routed network. The recent maturation of the internet has seen all-IP solutions applied in numerous communication networks. For real-time broadband multimedia applications, network availability is one of the important QoS (Quality of Service) parameters in IP transport networks. A set level of service must be guaranteed, regardless of the scale, duration, and type of failure. The two main approaches used to improve network resilience in the IP layer are IP restoration and IP protection. IP restoration attempts to find a new route by which to restore connectivity once a failure has occurred [1], e.g., Interior Gateway Protocol (IGP) routing recovery. IP protection, the intention of which is to achieve rapid recovery from failure (in just a few milliseconds), is based on fixed and predetermined failure recovery, where the selection of the next-hop is performed in conjunction with the identification of a backup for the next-hop. However, IP protection differs from the mechanisms of lower-layer failure recovery, such as a SONET Protection Switch or MPLS fast re-route techniques, due to the distinct routing characteristics of packet-switching networks and circuit-switching networks. In IP networks, the packet forwarding information is aggregated within
⇑ Tel.: +886 5 6315619; fax: +886 5 6315609. E-mail address:
[email protected] 0140-3664/$ - see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.comcom.2012.02.007
the next-hop; therefore, the concept of a backup path is unable to deal with the affected packets when a link or node failure occurs. Since the ARPANET was first established, IP networks have had the feature of restoration. Recently, many protection and restoration schemes have been provided in the lower layers in IP networks, e.g., SONET APS (Automatic Protection Switching) and MPLS Fast-Reroute [2]. Due to its distributed and connectionless architecture, an IP network is much more difficult to protect than connection-oriented networks. However, present networks are unable to satisfy the critical requirements of the growing number of real-time multimedia applications. There are three key reasons. The first reason is that current approaches to IP restoration take too long. The speed and volume of transmissions have increased, leading to an increase in packet loss when a link or node fails. Recovery times are depicted in Fig. 1. In a typical link-state routing protocol, the time to recover encompasses failure detection, propagation of the failure information, and convergence to new routes. Failure detection time depends on the physical layer or the hello routing protocol. When failure is detected on the physical layer, it may take only a few milliseconds. Propagation delay and flooding delay are the key determinants of the propagation of failure information, typically consuming 10 ms to 100 ms per hop. Finally, an SPF (shortest path first) algorithm computes the new routes and installs them into the routing table; however, total convergence time may be up to several tens of seconds depending on network size.
H.-K. Su / Computer Communications 35 (2012) 970–979
[Figure 1 labels: Link failure detection; Propagation of the failure information; Convergence to new routes; Reaction to the failure with IP protection scheme; Recovery time of IP routing convergence; Recovery time of IP protection.]
Fig. 1. The recovery times of IP routing convergence and IP protection.
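As a rough illustration of the components in Fig. 1, the following Python sketch sums the recovery-time terms described in this section. The function names and all numeric inputs are assumptions chosen for illustration: the per-hop delay is taken from the 10 ms to 100 ms range quoted in the text, while the detection, SPF, and reaction times are hypothetical and are not measurements from the paper.

```python
def restoration_time_ms(detect_ms, per_hop_ms, hops, spf_ms):
    """Recovery time of IP routing convergence: detection + flooding + route computation."""
    return detect_ms + per_hop_ms * hops + spf_ms


def protection_time_ms(detect_ms, react_ms):
    """Recovery time of IP protection: detection + local switch to the backup next-hop."""
    return detect_ms + react_ms


# Hypothetical inputs, for illustration only.
restore = restoration_time_ms(detect_ms=5, per_hop_ms=50, hops=6, spf_ms=2000)
protect = protection_time_ms(detect_ms=5, react_ms=1)
voip_budget_ms = 200  # lower end of the VoIP tolerance quoted in this section

print(restore, protect)
print(restore > voip_budget_ms, protect > voip_budget_ms)
```

With these assumed inputs, only the protection case stays inside the VoIP budget, which is the qualitative point the figure makes.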
Any link or node failure in a routed network disrupts the delivery of traffic until the network routes re-converge on the new topology. Packets may be dropped or enter a loop if their forwarding paths traverse the failed component. Such disruptions often last several tens of seconds or longer, and approximately 54% of network failures exceed 1 min [1]. Most applications have been constructed to tolerate short-time failures, but such disruptions are intolerable for interactive real-time applications and non-interactive real-time streaming applications, such as Voice over IP (VoIP), Video on Demand (VoD), and P2P streaming applications. The jitter buffer (or playout buffer) in real-time applications is designed to address packet jitter and network congestion in IP networks and can tolerate only a brief transport disruption, e.g., 200 ms to 450 ms for VoIP, and 1 s to 30 s for VoD. Therefore, IP restoration alone appears incapable of providing adequate network survivability for real-time applications; protection mechanisms are also needed. IP protection is capable of preserving the flow of data during IP restoration. This limits the period of disruption to the time required for failure detection and the reaction of the IP protection mechanism, such that IP restoration can be performed as usual. Finally, after the IGP converges, packets can be delivered according to the new route. The second reason that present networks are unable to satisfy the requirements of real-time multimedia applications is their inability to detect failures occurring in higher layers, despite the fact that lower-layer protection and restoration may work faster than IP protection. For example, an optical protection mechanism can protect against link failures but not against failure of an IP router or forwarding software. By contrast, higher-layer entities may be able to protect against lower-layer failures if there is an alternate route between communicating entities. 
The third reason is that traditional IGPs incur packet loops and losses due to transient loops or micro-loops [3]. With IGPs, each time the network topology changes, some of the routers update their Forwarding Information Base (FIB) to take the new topology into account. Each change in topology causes a convergence phase. During this phase, routers may have transient inconsistencies in their FIBs, which often cause packet loops and losses, even if the reachability of the destination is not compromised by the change in topology. Packet losses and transient loops are also caused by link-down events resulting from maintenance operations, even though such operations are predictable and not urgent. Thus, IP protection mechanisms are necessary to enhance the availability of IP networks.
This paper proposes an IP Local Fast-Reroute (IPLFRR) mechanism, with two algorithms for node protection and link protection, IPLFRR-N (IPLFRR for Node protection) and IPLFRR-L (IPLFRR for Link protection), in an intra-area routing domain. The underlying concepts can be found in [4,5]. Our mechanism provides simple, effective protection of IP networks. Unlike MPLS Fast-Reroute, the proposed mechanism is applicable to networks employing conventional IP routing and forwarding, and it is able to prevent service disruptions and packet loss caused by the transient loops that commonly occur during the re-convergence of the network following a failure. Moreover, no additional control protocols or enhanced routing protocols are required; current link-state routing protocols suffice. The candidates for the backup next-hop are predetermined when IP routing converges. When a node or a link fails, detection by an adjacent node allows local rerouting of the packets to the backup next-hop. Because the backup next-hop has been determined beforehand, the interruption to service can be limited to a few milliseconds. In simulations, only the backup next-hop information for each destination needed to be calculated, and extra control messages were unnecessary. Simulation results show that most failures were recovered efficiently.
The remainder of the paper is organized as follows. In Section 2, we introduce related studies dealing with the protection of IP networks. The IP Local Fast-Reroute framework is introduced in Section 3. In Section 4, details of the underlying mechanism and the IPLFRR algorithms for node protection and link protection are explained. In Section 5, we present simulations of network resilience. Finally, Section 6 provides our conclusions.
2. Background This paper focuses on IP network protection; however, improvements in IP network resilience in the lower layers can be found in [1,6–10], e.g., SONET/SDH protection, optical network protection, and MPLS Fast-Reroute. Convergence of IP routing can also be enhanced using a novel router architecture, an incremental SPF algorithm, and schemes to prioritize and update IP network prefixes [11]. The issue of IP protection has been discussed since 2002, when a scheme for the precomputation of the second shortest paths was introduced in [11]. In this scheme, the minimum-cost path of each node to every other node in the network is computed. An alternate path is computed only if the primary path becomes unusable due to a failure. An alternative is for each node to anticipate the failure and precompute feasible backup routes to all other nodes. When the primary route fails, the packets are quickly rerouted to the second shortest path; however, in practice it has been difficult to provide an efficient and fast rerouting service capable of avoiding looping. An IP Fast-Reroute (IPFRR) framework [12,13] and loop-free alternate selection scheme [14] were proposed by the IETF Routing Area Working Group. IPFRR is compatible with current link-state routing protocols, such as OSPF and IS–IS. The IPFRR framework introduces three mechanisms to repair paths: equal cost multipaths (ECMP), loop-free alternate paths (LFAP), and multi-hop repair paths. ECMP and LFAP are the simplest methods to repair paths and are considered sufficient for approximately 80% of failures. Multi-hop repair paths are considerably more complex [15], requiring extra control protocols or enhanced routing protocols. It is anticipated that approximately 98% of failures could be repaired using this method. Thus, IP protection schemes can be divided into two types: local and global. 
LFAP [14] is a local IP protection scheme, in which the packets affected by a failure in a link or node are rerouted directly to the backup next-hop. Local IP protection schemes are simple to implement; however, the protection performance depends on network topology. By contrast, global IP protection schemes are capable of 100% protection for the network; however, complex calculations and complicated algorithms are required. Global schemes typically use multiple routing configurations [16], Not-via addresses [17,18] or IP tunnels [19,20]. The proposed mechanism falls under the class of local IP protection, in which IPLFRR-L is equivalent to schemes involving loop-free alternate paths in the IETF IPFRR framework. The proposed IPLFRR-N achieves results similar to those of multi-hop repair paths in the IETF IPFRR framework; however, our approach does not require additional control protocols or enhanced routing protocols.
3. IP Local Fast-Reroute framework
An IP network environment is illustrated in Fig. 2. The core of this system is an IP network with 8 routers, where each link is labeled with its cost. Two real-time multimedia applications, a VoIP application and a multimedia streaming application, are also shown. (In principle, other multimedia applications, such as P2P streaming applications, could also be operated on the proposed framework.) According to IP routing, the packets for both applications are delivered to their destination along the shortest path. In the event that Router 2 fails, both applications are interrupted until the IGP converges. IP Local Fast-Reroute is used to deal with this short disruption, such that the users of real-time applications observe no interruption to service resulting from failures at a single node or link.
3.1. Link-state routing mechanism
The proposed IP Local Fast-Reroute mechanism is based on current link-state routing protocols, such as OSPF [21] and OSI's IS-IS [22]. A link-state routing protocol requires that each router maintain an area map of the network. When a network link changes state (up or down), a notification called a link state advertisement (LSA) is distributed throughout the network. All of the routers note the change and recompute their routes accordingly. If a router receives a new LSA, the SPT (shortest path tree) is rebuilt using an SPF (shortest path first) algorithm, such as Dijkstra's algorithm, in accordance with the new area map of the network. The SPT is built as a source tree for each destination, after which the routing information is deduced from the SPT and stored in the router's own routing table.
3.2.
IP Local Fast-Reroute mechanism
The functional blocks of IP routing in a router are shown in Fig. 3. IP Local Fast-Reroute requires three additional functions in the router. First, the backup next-hop calculation of IPLFRR must be added to the existing link-state routing protocol: after the current link-state routing protocol and SPF algorithm have selected the primary next-hops, the IPLFRR algorithms are executed to select their backup next-hops. Second, the routing table stores the information related to the backup next-hop. Finally, if the next-hop for the destination address is unavailable, the backup next-hop is referenced and the affected packets are rerouted to the backup next-hop as rapidly as possible.
In this paper, we assume that all nodes are located in the same routing area. Each router with link-state routing protocols maintains an identical database describing the topology of the entire area. A routing table with primary next-hops is calculated from this database by constructing a shortest-path tree. After calculating the primary routing information, the IPLFRR algorithms are executed to select the candidates for the backup next-hop according to the topology database. If a node or link failure occurs along the primary route, the affected packets are locally rerouted to the backup next-hop as rapidly as possible and end users perceive no service interruption.
An example of node protection using IPLFRR-N is illustrated in Fig. 4(a), which depicts a network sector comprising 8 nodes and 13 links, as well as the link costs, the primary next-hop to the destination, and the backup next-hop to that destination. After the routing converges, the compact routing tables store the data related to destination (Des.), primary next-hop (NH), and backup next-hop (BNH), as shown in Fig. 4(b). The destination in these compact tables stands for the network prefix and network mask of a real routing table. A backup next-hop of null means that the failure is unrecoverable using IPLFRR.
Consider the transmission of packets from $N_6$ to $N_0$. Each node along the routing path $\{N_6, N_3, N_2, N_0\}$ forwards the packets to its primary next-hop, according to the normal routing mechanism. Upon failure of $N_2$, the adjacent nodes of $N_2$, which are $N_3$ and $N_4$, are the first to detect the node failure. These upstream adjacent nodes trigger IPLFRR and immediately deliver the packets to their backup next-hops ($N_1$ and $N_3$). After new routing is established, $N_3$ and $N_4$ suspend IPLFRR and deliver the packets using the normal routing mechanism according to the new routing table. Upon the establishment of a new route, new backup next-hops are also selected.

[Figure 2: routers R0 to R7 forming a hop-by-hop routed network (an IP network), with two IP phones, a media player, and a media server attached.]
Fig. 2. An IP network environment.

4. Backup next hop selection for IP Local Fast-Reroute
4.1. Destination routing characteristic
Once the topology and link state of the IP network have stabilized, traffic flowing from every ingress node to a particular egress node can be represented as a destination tree. Fig. 5 shows the destination tree to node $N_0$. The network topology can be modeled as a graph $G(N, L)$, where $N$ is the router set and $L$ represents the set of physical or logical links. $P(o, d)$ represents the path set of an Origin–Destination (O–D) pair, where $o \in N$, $d \in N$ and $o \neq d$. However, the traffic is delivered along the shortest path $\hat{P}(o, d)$, where $\hat{P}(o, d) \subseteq P(o, d)$ for $\forall o \in N$, $\forall d \in N$ and $o \neq d$. Additionally, the shortest path must satisfy Eq. (1). $L_{\hat{P}(o,d)}$ is defined as the link set of the shortest path $\hat{P}(o, d)$, $L_p$ is defined as the link set of an O–D path $p$, and $w_i$ is defined as the cost/weight of link $i$.

$$\sum_{i \in L_{\hat{P}(o,d)}} w_i \le \sum_{j \in L_p} w_j, \quad \text{where } p \in P(o, d),\ \forall d \in N,\ \forall o \in N,\ \text{and } o \neq d \qquad (1)$$

Due to the distributed routing architecture of an IP network, each node selects its own best next-hop to an egress node, and the nodes jointly produce the shortest path $\hat{P}(o, d)$ for $\forall o \in N$ and $\forall d \in N$. All paths to the egress node $d$ form a destination tree $T(d)$. The notation $T(d)$ expresses a simplex destination tree to the egress node $d$, for example, $d = N_0$ in Fig. 5. Additionally, $T(d)$ can be represented by Eq. (2).

$$T(d) = \bigcup_{l \in L_{\hat{P}(o,d)},\ o \in N \setminus \{d\}} l \qquad (2)$$

This study proposes two IPLFRR algorithms based on the constructed destination routing: one for node protection and the other for link protection. The behaviors of single node failure and single link failure are explained below.
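To make Eqs. (1) and (2) concrete, the sketch below builds a destination tree $T(d)$ by running Dijkstra's algorithm from the egress node over an undirected weighted graph and recording each node's next-hop toward $d$. The four-node graph, its costs, and the function name `destination_tree` are hypothetical. The tie-break toward the larger node ID mirrors the rule the paper mentions for equal-cost paths in Section 4.2; everything else is an illustration under these assumptions rather than the author's implementation.

```python
import heapq


def destination_tree(links, d):
    """Return ({node: next_hop toward d}, {node: cost to d}) for an undirected graph.

    links maps frozenset({u, v}) -> positive cost. Dijkstra runs from the egress d,
    so the predecessor of each node in the search is its next-hop toward d.
    """
    adj = {}
    for link, cost in links.items():
        u, v = tuple(link)
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))
    dist = {d: 0}
    next_hop = {}
    heap = [(0, d)]
    done = set()
    while heap:
        du, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, cost in adj.get(u, []):
            cand = du + cost
            better = v not in dist or cand < dist[v]
            # Equal-cost alternative: keep the next-hop with the larger node ID.
            tie = (v in dist and cand == dist[v] and v not in done
                   and u > next_hop.get(v, -1))
            if better or tie:
                dist[v] = cand
                next_hop[v] = u
                heapq.heappush(heap, (cand, v))
    return next_hop, dist


# Hypothetical graph: node 3 reaches node 0 at cost 5 via node 1 or via node 2.
links = {
    frozenset({0, 1}): 2,
    frozenset({0, 2}): 2,
    frozenset({1, 3}): 3,
    frozenset({2, 3}): 3,
}
next_hop, dist = destination_tree(links, d=0)
tree_links = {frozenset({n, nh}) for n, nh in next_hop.items()}  # T(d) as a link set, cf. Eq. (2)
print(next_hop, dist[3])
```

In this toy graph both paths from node 3 cost 5, so Eq. (1) holds with equality for the alternative path and the tie-break selects node 2 as the next-hop.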
Fig. 3. The functional blocks of IP routing in a router.

Fig. 4. (a) An example of node protection with IPLFRR scheme, and (b) compact routing tables of each node.

Node    Dest.            NH               BNH
Node0   1 2 3 4 5 6 7    1 2 2 2 2 2 2    null null 1 1 1 1 1
Node1   0 2 3 4 5 6 7    0 3 3 3 3 3 3    null 0 null 0 0 0 0
Node2   0 1 3 4 5 6 7    0 3 3 4 3 3 4    null 0 null null 4 4 3
Node3   0 1 2 4 5 6 7    2 1 2 4 5 6 4    1 null null null null null 6
Node4   0 1 2 3 5 6 7    2 3 2 3 5 3 7    3 2 null null null 5 null
Node5   0 1 2 3 4 6 7    3 3 3 3 4 6 6    4 4 4 null null null 3
Node6   0 1 2 3 4 5 7    3 3 3 3 3 5 7    7 7 7 null 5 null null
Node7   0 1 2 3 4 5 6    4 4 4 4 4 6 6    6 6 6 6 null 4 null

Fig. 5. A destination tree of traffic flows in an IP network. (Legend: primary next hop to node N0; backup next hop to node N0.)

Fig. 6. (a) Divided subtrees while single node failure occurs, and (b) a new destination tree of traffic flows with IPLFRR-N scheme.
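The forwarding decision implied by the compact routing tables of Fig. 4(b) can be sketched as below. The two table rows are the ones given in the text for destination $N_0$ (primary next-hop $N_2$ for both $N_3$ and $N_4$, with backup next-hops $N_1$ and $N_3$). The `Route` type, the `select_next_hop` function, and the `failed` set standing in for local failure detection are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    nh: int             # primary next-hop (NH)
    bnh: Optional[int]  # backup next-hop (BNH); None plays the role of "null"


# Rows for destination N0 at nodes N3 and N4, as stated in the text.
TABLES = {
    3: {0: Route(nh=2, bnh=1)},
    4: {0: Route(nh=2, bnh=3)},
}


def select_next_hop(node: int, dest: int, failed: set) -> Optional[int]:
    """Pick the primary next-hop, or the backup if the primary is locally seen as down."""
    route = TABLES[node][dest]
    if route.nh not in failed:
        return route.nh
    if route.bnh is not None and route.bnh not in failed:
        return route.bnh
    return None  # no local repair available; wait for IGP convergence


print(select_next_hop(3, 0, failed=set()))  # primary path via N2
print(select_next_hop(3, 0, failed={2}))    # N2 down: reroute via N1
print(select_next_hop(4, 0, failed={2}))    # N2 down: reroute via N3
```

Because the backup entry is already stored next to the primary one, the switch needs no route computation at failure time, which is why the interruption can be bounded by the detection time.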
4.1.1. Single node failure
In the first scenario, if node $N_i$ fails, the tree $T(d)$, where $d \in N$, is divided into $m + 1$ subtrees, where $m$ is the number of upstream nodes adjacent to the failed node $N_i$ on $T(d)$. For example, $i = 2$ and $m = 2$ in Fig. 6(a). If $N_2$ fails, $T(N_0)$ is separated into $T'(N_0)$, $T'(N_3)$ and $T'(N_4)$. If a node failure occurs, the nodes in $T(d)$ to the same egress can be classified into four categories:
Pseudo failure node: In our scheme, the primary next-hop is assumed to be a pseudo failure node for the selection of the second best feasible next-hop as the candidate for the backup next-hop. For example, $N_2$ is a pseudo failure node of $N_3$ and $N_4$ to $N_0$.
Aware nodes: Aware nodes are upstream nodes adjacent to the failure node on $T(d)$. Aware nodes are capable of immediately sensing failure in the next-hop according to linked physical signals, link detection in the data-link layer, or fast hello protocols in the network layer. For example, if $N_2$ fails, $N_3$ and $N_4$ are the aware nodes on $T(N_0)$.
Upstream unaware nodes: Upstream unaware nodes are upstream nodes that are not directly connected to the failure node. Until information related to the failure has been exchanged, the upstream unaware nodes do not know of the failure and continue forwarding packets according to the obsolete routing information. For example, if $N_2$ fails, $N_5$, $N_6$ and $N_7$ are the upstream unaware nodes on $T(N_0)$.
Unaffected nodes: Unaffected nodes (also called downstream unaware nodes) are nodes downstream of the failure node on $T(d)$. The failure of a node does not disrupt the routes in the unaffected nodes, and packets can still be delivered to their destination using the old routing information. For example, if $N_2$ fails, $N_1$ is the unaffected node.
Three characteristics can be observed in the destination tree $T(d)$. First, a failure in the pseudo failure node only influences the routes of its upstream nodes to the egress, comprising the aware nodes and the upstream unaware nodes. Second, this failure is immediately detected only by the adjacent nodes, i.e., the aware nodes. Third, before exchanging information related to the failure, the upstream unaware nodes do not know of the failure and continue forwarding packets according to the obsolete routing information.
Because only the aware nodes are capable of detecting failures before the exchange of routing information, they are responsible for performing the Local Fast-Reroute. The new path $P'(o, d)$ is constructed by the aware nodes to replace $\hat{P}(o, d)$ for $\forall o \in N$. Therefore, if the subtrees of the aware nodes ($T'(N_3)$ and $T'(N_4)$ in our example) are able to connect directly to the subtree $T'(d)$ or via other subtrees of the aware nodes, $P'(o, d)$ for $\forall o \in N$ exists for IP Local Fast-Reroute on the aware node. Otherwise, $P'(o, d)$ for $\forall o \in N$ is nonexistent and IPLFRR will fail as a result of the node failure event.
There are two constraints on the aware nodes in the selection of the best feasible backup next-hop. First, the new path $P'(o, d)$ must not involve the failure node. Second, if the subtree of the aware nodes is unable to connect to $T'(d)$ or other aware nodes, the packets must not be rerouted to the upstream unaware nodes of the disconnected subtree. The second requirement prevents the rerouted packets from entering a loop. It should be noted that the aware nodes locally reroute packets to the backup next-hop as rapidly as possible during IGP convergence. This prevents any disruption in IP transport service. For example, in Fig. 6(b), the aware nodes $N_3$ and $N_4$ can locally reroute the packets to their backup next-hops to $N_0$ in the event that node $N_2$ fails.
4.1.2. Single link failure
In the second scenario, if a bidirectional link $\{N_i, N_j\}$ fails, where $\{N_i, N_j\} \in L$, the tree $T(d)$ is divided into two subtrees. For example, the failure of the bidirectional link $\{N_3, N_2\}$ in Fig. 7(a) causes the division of $T(N_0)$ into $T'(N_0)$ and $T'(N_3)$. However, many logical links may be aggregated into one physical link. The failure of a physical link may induce multiple logical link failures, and all of the affected logical links would be marked as pseudo failure links. Multiple-link failure can also be resolved by applying the proposed method. This situation is similar to the node protection shown in the previous sub-section. The tree $T(d)$ is segmented into multiple subtrees, such that if the aware nodes of each broken subtree can connect directly to the subtree $T'(d)$ or via other subtrees of the aware nodes, the multiple-link failure can be bypassed. However, to simplify the description, we explain only the condition of single link failure below.
Single link failures differ from single node failures in that the directed link to the next-hop is defined as the pseudo failure link on $T(d)$ instead of the pseudo failure node. For example, the link $\{N_3, N_2\}$ is a pseudo failure link of $N_3$ on $T(N_0)$. A destination tree is divided into two subtrees as a result of link failure. Aware nodes, upstream unaware nodes, and unaffected nodes are defined in the same manner as in the single node failure. Once the link $\{N_i, N_j\}$ fails, it is initially detected by both edge nodes associated with the link. The upstream unaware nodes and the unaffected nodes continue forwarding packets to the destination $d$ according to the original primary route. Therefore, if the subtree $T'(N_i)$ is able to link to the subtree $T'(d)$, $N_i$ can locally reroute the packets to the backup next-hop during IGP convergence, without disrupting the service. For example, in Fig. 7(b), the aware node $N_3$ locally reroutes packets to $N_0$ when the link $\{N_3, N_2\}$ fails.

Fig. 7. (a) Divided subtrees while single link failure occurs, and (b) a new destination tree of traffic flows with IPLFRR-L scheme.

4.2. IPLFRR-N algorithm
A formal algorithmic description of the IPLFRR-N algorithm is shown in Fig. 8, and the notation is explained in Table 1. After the calculation and exchange of routing information, IPLFRR-N is performed to determine backup next-hops for each destination network address by node $n_{self}$, where $n_{self} \in N$. At the initiation of the algorithm, all backup next-hops ($next_b(i)$) to each egress node ($i$, $\forall i \in N \setminus \{n_{self}\}$) are set to null. For each egress node $i$, the procedure is divided into three phases. The first phase involves the prediction of failure, the analysis of topology, and the classification of nodes. The second phase involves the division of a destination tree into subtrees due to the failure of the primary next-hop. Finally, a repair of the destination tree is attempted, at which point the feasible backup next-hop is selected.
In the first phase, a destination tree of $d$ is built based on the well-known shortest path algorithm, whereupon all of the aware nodes, upstream unaware nodes, unaffected nodes, and pseudo failure nodes for the primary next-hop on $T(d)$ are classified.
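The node classification of the first phase can be illustrated with a short sketch. It takes the primary next-hops toward $N_0$ from the routing tables of Fig. 4(b) and classifies the nodes for a pseudo failure of $N_2$, following the definitions in Section 4.1.1. The function name `classify` and the dictionary representation of $T(d)$ are assumptions made for this illustration; this is not the procedure of Fig. 8.

```python
def classify(next_hop, d, failed):
    """Classify the nodes of the destination tree T(d) for a pseudo failure node.

    next_hop maps each node (except d) to its primary next-hop toward d.
    Returns (aware, upstream_unaware, unaffected) as sets of node IDs.
    """
    aware, upstream_unaware, unaffected = set(), set(), set()
    for node in next_hop:
        if node == failed:
            continue
        # Walk the primary path toward d and note whether it crosses the failed node.
        hop, crosses = node, False
        while hop != d:
            hop = next_hop[hop]
            if hop == failed:
                crosses = True
                break
        if not crosses:
            unaffected.add(node)
        elif next_hop[node] == failed:
            aware.add(node)
        else:
            upstream_unaware.add(node)
    return aware, upstream_unaware, unaffected


# Primary next-hops toward N0 (the "Dest. 0" rows of the routing tables in Fig. 4(b)).
T_N0 = {1: 0, 2: 0, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4}
aware, upstream_unaware, unaffected = classify(T_N0, d=0, failed=2)
print(sorted(aware), sorted(upstream_unaware), sorted(unaffected))
```

For this input the three sets agree with the examples given in Section 4.1.1: $N_3$ and $N_4$ are aware, $N_5$, $N_6$ and $N_7$ are upstream unaware, and $N_1$ is unaffected.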
Additionally, to apply this scheme to a realistic network, the equal-cost multi-path (ECMP) technique has been considered in the algorithm. ECMP is capable of balancing traffic flow in networks, but it may introduce routing loops when nodes or links fail. Consistency in the flow of traffic through each node is easily controlled in the proposed scheme by selecting a single best path, as in conventional IP routing mechanisms. For example, if multiple alternative paths exist, the next-hop with the maximal node ID is selected. Hence, the destination trees of the same egress calculated for each node can be synchronized, and the issue of looping associated with ECMP is avoided.
In the second phase, the primary next-hop (pseudo failure node) $N_f$ is assumed to fail, and the destination tree $T(d)$ is divided into $|N_a| + 1$ subtrees. Moreover, all of the connected links of $N_f$ are removed from the temporal link sets of related nodes. In the third phase, an attempt is made to link the root nodes of each subtree to the destination tree $T'(d)$ directly or via other subtrees of aware nodes. Finally, a feasible backup next-hop ($next_b(d) = pred(n_{self})$) can be determined. However, if no feasible backup next-hop to egress node $d$ for the failure of $next_p(d)$ can be selected, IP protection fails, and $next_b(d) = null$. Thus, the complexity of the algorithm is $O(|N|^3)$, making the mechanism suitable even for small-scale networks.

Fig. 8. IP Local Fast-Reroute algorithm for node protection.
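The second and third phases can be sketched in simplified form. The code below is an illustration under stated assumptions, not the algorithm of Fig. 8: it takes the destination tree toward $N_0$ from Fig. 4(b), assumes an adjacency list inferred from the next-hops that appear in those routing tables (so it may omit links of the original figure), ignores link costs, and returns the set of feasible backup next-hop candidates per aware node rather than a single best choice. The names `backup_candidates`, `ADJ`, and `T_N0` are hypothetical.

```python
def backup_candidates(adj, next_hop, d, failed):
    """Feasible backup next-hops for each aware node when `failed` goes down.

    A neighbour is feasible if its primary path reaches d without the failed node,
    or if it lies in the subtree of another aware node repaired in an earlier round.
    """
    def root_of(node):
        # Aware node whose subtree contains `node`, or None if node is unaffected.
        while node != d:
            if next_hop[node] == failed:
                return node
            node = next_hop[node]
        return None

    aware = [n for n in next_hop if next_hop[n] == failed]
    repaired = {}  # aware node -> set of feasible neighbours
    progress = True
    while progress:
        progress = False
        newly = {}
        for a in aware:
            if a in repaired:
                continue
            ok = set()
            for b in adj[a]:
                if b == failed:
                    continue  # first constraint: never use the failed node
                r = root_of(b)
                # Second constraint: never hand packets back into a's own subtree.
                if r is None or (r != a and r in repaired):
                    ok.add(b)
            if ok:
                newly[a] = ok
        if newly:
            repaired.update(newly)  # visible only from the next round on
            progress = True
    return {a: repaired.get(a, set()) for a in aware}


# Assumed adjacency (inferred from the next-hops in Fig. 4(b)) and tree toward N0.
ADJ = {0: [1, 2], 1: [0, 3], 2: [0, 3, 4], 3: [1, 2, 4, 5, 6],
       4: [2, 3, 5, 7], 5: [3, 4, 6], 6: [3, 5, 7], 7: [4, 6]}
T_N0 = {1: 0, 2: 0, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4}
cands = backup_candidates(ADJ, T_N0, d=0, failed=2)
print(cands)
```

The backup next-hops reported in the text for this failure ($N_1$ for $N_3$, and $N_3$ for $N_4$) both fall inside the candidate sets computed here; $N_7$ is excluded for $N_4$ because it would forward the packets straight back.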
4.3. IPLFRR-L algorithm
The IPLFRR-L algorithm in Fig. 9 is similar to the IPLFRR-N algorithm. The differences are explained below. First, a pseudo failure link $L_f$ between $next_p(d)$ and $n_{self}$ is used instead of the pseudo failure node in the first phase. Aware nodes, upstream unaware nodes, and unaffected nodes are then classified in the same way. Second, according to the pseudo failure link $L_f$, the destination tree $T(d)$ is divided into two subtrees, and the pseudo failure link is removed from the temporal link sets of $n_{self}$ and $next_p(d)$. Finally, in the same manner, an attempt is made to repair the destination tree $T'(d)$, whereupon the feasible backup next-hop ($next_b(d) = pred(n_{self})$) can be resolved. However, if no feasible backup next-hop to $d$ for the failure of the bidirectional link $\{n_{self}, next_p(d)\}$ can be selected, IP protection fails and $next_b(d) = null$. The complexity of the algorithm is $O(|N|^3)$.

Table 1
Notations for IPLFRR-N and IPLFRR-L algorithms.

Notation      Description
$N$           A set of nodes in a topology
$|N|$         The total number of nodes in a topology
$A(i)$        The set of arcs (links) of the node $i$
$A'(i)$       The temporal set of arcs (links) of the node $i$ for backup next hop calculating
$S$           The distance label of each node in the set $S$ is optimal
$\bar{S}$     The distance label of each node in S is the shortest path length from the root provided that each internal node in the path lies in $S$
$n_{self}$    A node that is performing the IPLFRR algorithm to decide candidates for backup next hop by itself
$next_p(d)$   The primary next hop to the egress node on the node $n_{self}$
$next_b(d)$   The backup next hop to the egress node on the node $n_{self}$
$N_{T(d)}$    A node set on the tree $T(d)$ whose root is node $d$
$N_a$         A set of aware nodes
$N_{ua}$      A set of upstream unaware nodes
$N_{un}$      A set of unaffected nodes
$N_f$         A set of pseudo failure nodes
$L_f$         A set of pseudo failure links
$C_{ij}$      The cost of the link $\{i, j\}$
$dis(i)$      The distance from the root of the SPT to the node $i$
$pred(i)$     The predecessor of the node $i$ on a SPT
$run$         An iterative counter to limit the searching times of linked subtree

5. Simulations
The goal of the following simulations was to verify the performance of the IPLFRR-N and IPLFRR-L schemes in terms of survivability and protectability. Survivability is defined as the ratio of the surviving O–D pairs to all O–D pairs in a network following the occurrence of a failure. However, O–D pairs with the failed node as destination node or source node are excluded from the calculation. For example, if 10 O–D pairs exist in a network and only 6 O–D pairs survive when a failure occurs, survivability = 0.6. In addition, protectability is defined as the ratio of protectable O–D pairs to recoverable O–D pairs. As with the above example, after IGP convergence, the 6 failure paths can be repaired. However, according to the protection scheme, only 3 paths can be protected, and their backup paths are determined prior to failure. Thus, protectability = 0.5.
Using the C programming language, we developed our own simulation tool comprising three parts. The first part deals with importing the network topology. The second part deals with the calculation of the routing table, including the calculation of the primary next-hop and backup next-hop. Finally, the third part deals with failure testing and performance statistics. The simulations were run under Ubuntu 10.10 Linux on a personal computer with an Intel Core 2 Quad Q9500 CPU at 2.83 GHz and 4 GB of RAM.
In the following sub-sections, we present the topology $G(N, L)$ and discuss the typical NSFNET network, real ISP backbone networks, and representative internet topologies. After importing the network topology, the SPF, LFAP, IPLFRR-N, and IPLFRR-L algorithms are executed to calculate the routing tables for each node.
In the simulations, the routing table is a compact routing table that includes the attributes destination (Des.), primary next-hop (NH), and backup next-hop (BNH) for every mechanism involved in IP network protection. Finally, all of the scenarios for each node/link failure were simulated, and all O–D pairs were tested according to the routing tables. In this manner, the available and unavailable paths were determined, and the average survivability and protectability were computed.

5.1. NSFNET network

In the computational experiments, we first considered the NSFNET backbone network with 15 nodes and 22 duplex links shown in Fig. 10 [23]. In the first scenario, we observed the network survivability and protectability for a single node failure. The average network survivability without the IPLFRR-N scheme was 0.90; the IPLFRR-N scheme improved this to 0.97. After the IGP converged, the average network survivability was 0.99. The average protectability of IPLFRR-N in the NSFNET network was 0.81. The simulation results for node failures are listed in Table 2. We first observe that the existence of 2-degree nodes strongly influences the IPLFRR-N performance. Under the IPLFRR-N scheme, traffic through a 2-degree node cannot be rerouted via another adjacent node (a backup next-hop) to the destination when its next-hop has failed, as occurred in the failures of Node 4, Node 5, Node 7 and Node 9. In addition, the protectability of IPLFRR-N also depended on the network topology and the structure of the SPT. For example, the failures of Node 1, Node 3, and Node 10 did not achieve 100% protectability. Thus, with the exception of the nodes adjacent to the 2-degree nodes and a number of nodes restricted by the topology, all node failures were effectively recovered using the IPLFRR-N scheme.

In the second scenario, we simulated network routing with a failure at any single link. The average network survivability without the IPLFRR-L scheme was 0.90; the IPLFRR-L scheme improved this to 0.94. Following IGP convergence, the average network survivability was 0.99. The average protectability was 0.58. Statistics related to the link failures are listed in Table 3. The 2-degree nodes also had a direct influence on the performance of IPLFRR-L. Under the IPLFRR-L scheme, traffic through a 2-degree node cannot be rerouted over another direct link (toward a backup next-hop) to the destination when its direct link to the next-hop fails, as occurred in failures {4, 6}, {6, 7}, {5, 8} and {8, 9}. In addition, the performance was also affected by the network topology and the structure of the SPT. With the exception of these special links, all link failures were effectively recovered using the IPLFRR-L scheme prior to IGP convergence.
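The failure-testing step described in this section (a compact routing table with Des., NH and BNH, followed by a test of every O–D pair under every single failure) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' simulator: the 5-node topology is hypothetical, all link costs are 1, only link failures are tested, and the backup next-hop is chosen with the basic loop-free condition of RFC 5286 [14] as a simple stand-in rather than with the IPLFRR-N or IPLFRR-L algorithm.

```python
from collections import deque
from itertools import permutations

# Hypothetical 5-node topology with unit link costs (not from the paper).
LINKS = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (3, 4)]
NODES = sorted({n for link in LINKS for n in link})
ADJ = {n: set() for n in NODES}
for a, b in LINKS:
    ADJ[a].add(b)
    ADJ[b].add(a)


def hop_distances(src):
    """Breadth-first search; equals shortest-path cost for unit link costs."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in sorted(ADJ[u]):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


DIST = {n: hop_distances(n) for n in NODES}


def build_routing_table(s):
    """Compact table for node s: destination -> (NH, BNH)."""
    table = {}
    for d in NODES:
        if d == s:
            continue
        nh = min(sorted(ADJ[s]), key=lambda n: DIST[n][d])
        # Stand-in backup rule (RFC 5286 loop-free condition):
        # dist(n, d) < dist(n, s) + dist(s, d)
        backups = [n for n in sorted(ADJ[s])
                   if n != nh and DIST[n][d] < DIST[n][s] + DIST[s][d]]
        bnh = min(backups, key=lambda n: DIST[n][d]) if backups else None
        table[d] = (nh, bnh)
    return table


TABLES = {n: build_routing_table(n) for n in NODES}


def delivered(src, dst, failed_link, use_backup):
    """Forward hop by hop; return True if the packet reaches dst."""
    node = src
    for _ in range(len(NODES)):  # hop limit guards against loops
        if node == dst:
            return True
        nh, bnh = TABLES[node][dst]
        if frozenset((node, nh)) != failed_link:
            node = nh            # primary next-hop is still reachable
        elif use_backup and bnh is not None:
            node = bnh           # local repair via the backup next-hop
        else:
            return False         # packet is dropped at this node
    return node == dst


def average_survivability(use_backup):
    """Average over all single-link failures of the surviving O-D ratio."""
    pairs = list(permutations(NODES, 2))
    ratios = []
    for link in LINKS:
        failed = frozenset(link)
        ok = sum(delivered(s, d, failed, use_backup) for s, d in pairs)
        ratios.append(ok / len(pairs))
    return sum(ratios) / len(ratios)


print("without backup:", round(average_survivability(False), 3))
print("with backup:   ", round(average_survivability(True), 3))
```

In this toy topology node 4 has a single neighbor, so its table holds no backup next-hop and a failure of link {3, 4} cannot be repaired locally, which mirrors the effect of low-degree nodes noted above.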
Fig. 9. IP Local Fast-Reroute algorithm for link protection.

Table 2
Node protection in 15-node and 22-link NSFNET network.

Failure node   S_IGP   P_IPLFRR   S_IPLFRR   S'_IGP
0              0.96    1.00       1.00       1.00
1              0.95    0.60       0.98       1.00
2              0.95    1.00       1.00       1.00
3              0.91    0.65       0.97       1.00
4              0.91    0.69       0.97       1.00
5              0.80    0.59       0.92       1.00
6              0.96    1.00       1.00       1.00
7              0.82    0.50       0.91       1.00
8              0.93    1.00       1.00       1.00
9              0.69    0.50       0.77       0.86
10             0.92    0.64       0.97       1.00
11             1.00    1.00       1.00       1.00
12             0.91    1.00       1.00       1.00
13             0.87    1.00       1.00       1.00
14             1.00    1.00       1.00       1.00

S_IGP: average network survivability before IGP converges. P_IPLFRR: average network protectability with IPLFRR-N or IPLFRR-L scheme. S_IPLFRR: average network survivability with IPLFRR-N or IPLFRR-L scheme. S'_IGP: average network survivability after IGP converges.

Fig. 10. 15-node NSFNET network.

Table 3
Link protection in 15-node and 22-link NSFNET network.

Failure link   S_IGP   P_IPLFRR   S_IPLFRR   S'_IGP
{0, 1}         0.94    1.00       1.00       1.00
{0, 3}         0.91    0.42       0.95       1.00
{0, 2}         0.94    1.00       1.00       1.00
{1, 2}         0.96    1.00       1.00       1.00
{1, 7}         0.88    0.42       0.93       1.00
{2, 5}         0.87    0.48       0.93       1.00
{3, 10}        0.88    0.42       0.93       1.00
{3, 4}         0.92    0.35       0.95       1.00
{4, 5}         0.89    0.52       0.95       1.00
{4, 6}         0.90    0.40       0.94       1.00
{5, 8}         0.89    0.33       0.92       1.00
{5, 12}        0.87    0.39       0.92       1.00
{6, 7}         0.89    0.33       0.92       1.00
{7, 9}         0.80    0.12       0.82       1.00
{8, 9}         0.87    0.43       0.92       1.00
{9, 11}        0.94    1.00       1.00       1.00
{9, 13}        0.86    0.63       0.95       1.00
{9, 14}        0.87    1.00       0.87       0.87
{10, 11}       0.97    1.00       1.00       1.00
{10, 13}       0.89    0.54       0.95       1.00
{11, 12}       0.95    0.50       0.98       1.00
{12, 13}       0.90    0.50       0.95       1.00

5.2. ISP backbone networks

The topology of the internet is evolving rapidly. In addition to the NSFNET backbone network, we also considered the actual topologies of the recent ISP backbone networks listed in Tables 4 and 5 [24,25]. First, the use of IPLFRR-N and IPLFRR-L enhanced survivability to a level approaching the survivability after IGP recovery. Second, the use of IPLFRR-N and IPLFRR-L raised the protectability of these topologies to roughly 0.95 or higher, such that the averages exceeded those of the NSFNET backbone network. The schemes therefore protect well against any single node failure or any single link failure. Thus, our IPLFRR schemes are suitable for actual IP networks with high connectivity-degree nodes.
Table 5
Link failure on ISP backbone networks.

Name      AS no.   Nodes   Links   S_IGP   P_IPLFRR   S_IPLFRR   S'_IGP
Sprint    1239     44      106     0.975   0.947      0.993      0.995
Ebone     1755     28      66      0.965   0.946      0.994      0.997
AT&T      7018     108     141     0.976   0.979      0.989      0.99
Level3    3356     53      456     0.996   0.998      1.000      1.000
Tiscali   3257     51      129     0.981   0.957      0.995      0.996
Table 4
Node failure on ISP backbone networks.

Name      AS no.   Nodes   Links   S_IGP   P_IPLFRR   S_IPLFRR   S'_IGP
Sprint    1239     44      106     0.96    0.976      0.987      0.988
Ebone     1755     28      66      0.951   0.994      0.992      0.992
AT&T      7018     108     141     0.978   0.992      0.986      0.987
Level3    3356     53      456     0.985   1.000      0.999      0.999
Tiscali   3257     51      129     0.971   0.994      0.990      0.990

5.3. Representative internet topology

In the previous simulations, we observed that performance is largely determined by network topology. We were also interested in the performance on random flat topologies, which are recoverable following the occurrence of a single node failure. We employed the BRITE topology generator to produce random flat topologies for the simulation [26]. New nodes are connected to candidate neighbor nodes using Waxman's probability function ($\alpha = 0.19$ and $\beta = 0.2$) and the total node number $|N|$, after which a topology $G(N, L)$ can be generated. The connectivity of each node, $d_G(n)$ (where $n \in N$), refers to the number of links at node $n$. Additionally, all link costs are constant and equal, i.e., the same link bandwidth.

Fig. 11(a) and (b) show the relationships between protectability and network scalability following a node failure in the random flat topologies, in which the connectivity of each node is equal to 4 and 6, respectively. The figures compare the performance of the IPLFRR-N scheme, the LFAP scheme, and LFAP using the ECMP scheme [14]. First, with an increase in network scalability, the protectability of the three schemes decreased. Although many available paths can be found in a large IP network, traffic is still constrained to the shortest paths as a result of destination-based routing. Thus, the performance does not match that of other protection schemes in connection-oriented networks, e.g., MPLS, SONET, and optical networks; however, IP protection for node failure is effective for small area networks. Second, IP protection against node failure is suitable for networks with high connectivity-degree nodes. In a small network, the computational complexity would not be a major factor in the performance of the IPLFRR-N scheme. Finally, the IPLFRR-N scheme achieved performance superior to that of the other schemes as well as the previous simulation.

Fig. 11. The performance of node protection in random flat topologies.

Fig. 12(a) and (b) illustrate the relationships between average network protectability and network scalability following the occurrence of a single link failure in random flat topologies, the average connectivity of which is equal to 4 and 6, respectively. The performance of the IPLFRR-L scheme was superior to that of the LFAP scheme, particularly in widely connected networks. A number of ECMP next-hops may be feasible as loop-free backup next-hops, but they do not satisfy the criteria of the LFAP scheme. If LFAP is performed using ECMP, the average network protectability is the same as for the IPLFRR-L scheme. In addition, an increase in network scalability led to a decrease in
the protectability of both schemes. Although many paths are available in large IP networks, traffic is still constrained to the shortest paths due to destination-based routing. Thus, the performance does not match that of other protection schemes in connection-oriented networks. However, the IPLFRR-L scheme provides a simple and effective solution to the issue of protection.

Fig. 12. The performance of link protection in random flat topologies.

6. Conclusion

This paper proposed an IP Local Fast-Reroute (IPLFRR) mechanism with two algorithms for node protection and link protection in an intra-area routing domain, providing a simple and effective solution to protect multimedia networks. In the proposed mechanism, IPLFRR-L is equivalent to the mechanism of loop-free alternate paths in IETF IPFRR. IPLFRR-N is an instance of multi-hop repair paths in IETF IPFRR; however, additional control protocols are not required if a link-state routing protocol is used. Based on the destination shortest path tree, candidates for feasible backup next-hops can be selected easily in advance to avoid temporary looping problems. Because this approach is based on the concept of the destination SPT, it could be extended to deal with the failure of multiple nodes or links by anticipating such failures. Because IP packets are routed with destination information under the shortest-path constraint, the performance of the IPLFRR schemes depends on the topology of the network and the structure of the SPT. However, our simulations indicate that IPLFRR-L provides efficient protection against link failure, just as IPLFRR-N protects against node failure, and both schemes work well on both a small and a large scale in an intra-area domain. Finally, IPLFRR provides a simple, yet reliable, transport service over IP networks. In a future study, we will investigate 100% IP network protection and traffic engineering based on the mechanism of IPLFRR.

Acknowledgement

We would like to thank the National Science Council (NSC) in Taiwan. This research was supported in part by the NSC under the Grant No. NSC-98-2218-E-150-006 and NSC100-2221-E-150077.

References
[1] G. Iannaccone, C.-N. Chuah, S. Bhattacharyya, C. Diot, Feasibility of IP restoration in a tier 1 backbone, IEEE Network 18 (2) (2004) 13–19.
[2] P. Pan, G. Swallow, A. Atlas, Fast reroute extensions to RSVP-TE for LSP tunnels, RFC 4090, May 2005.
[3] P. Francois, O. Bonaventure, M. Shand, S. Bryant, S. Previdi, C. Filsfils, Loop-free convergence using oFIB, IETF Draft, draft-ietf-rtgwg-ordered-fib-04.txt, 2010.
[4] H.-K. Su, C.-S. Wu, Y.-S. Chu, IP local node protection, in: Proceedings of the Second International Conference on Systems and Networks Communications (ICSNC 2007), 2007.
[5] H.-K. Su, C.-S. Wu, Local link protection scheme in IP networks, in: Proceedings of International Conference on Computational Science (ICCS 2007), LNCS, 4490, 2007.
[6] C. Metz, IP protection and restoration, IEEE Internet Computing 4 (2) (2000) 97–102.
[7] P.-H. Ho, H. Mouftah, Framework of spare capacity re-allocation with S-SLSP for mesh WDM networks, Computer Networks 40 (1) (2002) 167–179.
[8] A. Fumagalli, L. Valcarenghi, IP restoration vs. WDM protection: is there an optimal choice, IEEE Network 14 (6) (2000) 34–41.
[9] L. Sahasrabuddhe, S. Ramamurthy, B. Mukherjee, Fault management in IP-over-WDM networks: WDM protection versus IP restoration, IEEE Journal on Selected Areas in Communications 20 (1) (2002) 21–33.
[10] G. Suwala, G. Swallow, SONET/SDH-like resilience for IP networks: a survey of traffic protection mechanisms, IEEE Network 18 (2) (2004) 20–25.
[11] C. Alaettinoglu, A. Zinin, IGP fast reroute, in: IETF Routing Mtg., Atlanta, GA, USA, 2002.
[12] M. Shand, S. Bryant, IP fast reroute framework, RFC 5714 (2010).
[13] M. Shand, S. Bryant, A framework for loop-free convergence, RFC 5715 (2010).
[14] A. Atlas, A. Zinin, Basic specification for IP fast reroute: loop-free alternates, RFC 5286 (2008).
[15] A. Atlas, R. Torvi, G. Choudhury, D. Fedyk, OSPFv2 extensions for link capabilities to support U-turn alternates for IP/LDP fast-reroute, IETF Draft, draft-atlas-ospf-local-protect-cap-02.txt, February 2006.
[16] T. Cicic, A. Hansen, A. Kvalbein, M. Hartmann, R. Martin, M. Menth, S. Gjessing, O. Lysne, Relaxed multiple routing configurations: IP fast reroute for single and correlated failures, IEEE Transactions on Network and Service Management 6 (1) (2009) 1–14.
[17] M. Shand, S. Bryant, S. Previdi, IP fast reroute using not-via addresses, IETF Draft, draft-ietf-rtgwg-ipfrr-notvia-addresses-06.txt, 2010.
[18] G. Enyedi, P. Szilagyi, G. Retvari, A. Csaszar, IP fast reroute: lightweight not-via without additional addresses, in: INFOCOM 2009, IEEE, 2009, pp. 2771–2775.
[19] L. Pan, M. Xu, Q. Li, D. Jen, Lightweight IP fast reroute with tunnel-at, in: 18th International Workshop on Quality of Service (IWQoS) 2010, 2010, pp. 1–2.
[20] Y. Yang, M. Xu, Q. Li, A lightweight IP fast reroute algorithm with tunneling, in: IEEE International Conference on Communications (ICC) 2010, 2010, pp. 1–5.
[21] J. Moy, OSPF version 2, RFC 2328, April 1998.
[22] R. Callon, Use of OSI IS-IS for routing in TCP/IP and dual environments, RFC 1195, December 1990.
[23] K. Claffy, G. Polyzos, H.-W. Braun, Traffic characteristics of the T1 NSFNET backbone, in: Proceedings of IEEE INFOCOM '93, 1993.
[24] N. Spring, R. Mahajan, D. Wetherall, T. Anderson, Measuring ISP topologies with rocketfuel, IEEE/ACM Transactions on Networking 12 (1) (2004) 2–16.
[25] Rocketfuel maps and data.
[26] A. Medina, A. Lakhina, I. Matta, J. Byers, BRITE: an approach to universal topology generation, in: Proceedings of IEEE Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2001.