ARTICLE IN PRESS
JID: COMPNW
[m3Gdc;September 25, 2015;12:4]
Computer Networks xxx (2015) xxx–xxx
Contents lists available at ScienceDirect
Computer Networks journal homepage: www.elsevier.com/locate/comnet
Bringing high availability to BGP: A survey
J. Camilo Cardona1,∗, Pierre Francois1, Bruno Decraene2, John Scudder3, Adam Simpson4, Keyur Patel5
Avenida Mar Mediterraneo 22, Madrid 28918, Spain
Article info
Article history: Received 24 March 2015; revised 16 July 2015; accepted 14 September 2015; available online xxx.
Keywords: BGP; High availability; Survey

Abstract
The Border Gateway Protocol (BGP) interconnects the Autonomous Systems forming the global Internet. The high stability and scalability offered by the protocol motivated its use for other critical networking services, making BGP a key component of IP networks. BGP was, however, unable to adapt to network changes fast enough to fulfill the Service Level Agreements imposed on operators. Different proposals have been made in recent years to adapt BGP to these stricter requirements. This document surveys these enhancements, focusing on those already accessible to network operators or in an advanced state of standardization. We describe these techniques, compare them with different alternatives, and provide basic recommendations for their deployment and operation. © 2015 Published by Elsevier B.V.
1. Introduction
The Border Gateway Protocol (BGP) allows Autonomous Systems (ASes) to exchange inter-domain routing information and provide global connectivity. BGP was conceived as a scalable and decentralized routing protocol, offering the ability to implement independent policies within each AS. Because of its flexibility, BGP has become a key component of the IP routing suite and is increasingly used to support a broad range of new services. For instance, BGP has been extended to exchange reachability information for other communication protocols (e.g. IPv6) and to support Virtual Private Network (VPN) services (e.g. BGP/MPLS VPNs). Due to the broad range of services it supports, BGP implementations face much stricter requirements than the ones for which the protocol was originally designed. Current networking services must comply with rigorous Service Level Agreements (SLAs), which typically demand 99.999% service availability. As BGP provides reachability information to these services, any disruption caused by BGP convergence can lead to SLA violations. BGP, however, was initially designed to prioritize stability and scalability over convergence speed. Hence, the protocol often could not adapt to network changes quickly enough to avoid affecting service operation. Manufacturers, ISPs, and researchers recognized that BGP had to be enhanced for high availability, i.e. adapted to provide an availability equal to or greater than the 99.999% usually demanded by clients. In this document, we survey the main changes supported by the major BGP implementations to improve availability. Among the many proposals of recent years to achieve this goal, we focus on those either running in operational networks or in an advanced state of standardization. Our objective is to recommend best practices to network operators and to illustrate the most relevant changes to a broader audience. We classify the improvements to BGP availability into four types:
• Path diversity (Section 3). While ISPs normally receive multiple paths to external prefixes at the border of their network, BGP does not reflect such diversity in the ISP's internal routers [19,94]. When an active path is lost, the dissemination of alternative paths within the network can take seconds [67]. Path diversity techniques reduce the convergence time by efficiently pre-disseminating the alternative paths to each router within the network.
• Session management (Section 4). The simple state machine that governs the behavior of vanilla BGP sessions facilitated the implementation of the protocol and its operation. This mechanism, however, made it very hard for the protocol to react optimally to different types of network events. Legacy BGP implementations required several seconds to detect connectivity loss to adjacent BGP routers. Additionally, BGP frequently resorted to session resets, which can take hundreds of seconds or even minutes to converge [6,15,18,93], even when they could have been avoided. Improvements to the BGP session management state machine were defined so as to provide an appropriate reaction to specific events disrupting a BGP session.
• Router architecture (Section 5). Routers with legacy architectures could easily take hundreds of milliseconds to update their forwarding tables, even if they knew valid alternate paths at the moment of the failure [25,26]. In recent years, improvements to router architectures have shortened this period of unavailability by leveraging path diversity in the data plane and implementing more efficient data structures in the Forwarding Information Base (FIB) of the router.
• Multiprotocol BGP and BGP/MPLS VPNs (Section 6). BGP is used to exchange reachability information for multiple services through an extension of the protocol called Multiprotocol BGP (MP-BGP). Besides IPv4, BGP/MPLS VPNs are probably the best-known service that BGP supports. Network environments implementing these services bring their own particular high availability challenges, which are addressed by the techniques included in this category.

∗ Corresponding author. Tel.: +34 914816210/644213402; fax: +34 914816965. E-mail address: [email protected] (J.C. Cardona).
1 IMDEA Networks. 2 Orange. 3 Juniper Networks. 4 Alcatel-Lucent. 5 Cisco Systems.
http://dx.doi.org/10.1016/j.comnet.2015.09.005 1389-1286/© 2015 Published by Elsevier B.V.
Please cite this article as: J.C. Cardona et al., Bringing high availability to BGP: A survey, Computer Networks (2015), http://dx.doi.org/10.1016/j.comnet.2015.09.005
The rest of the document is structured as follows: Section 2 includes an overview of BGP. In Section 7, we discuss the main unsolved problems (industry and research) and the ongoing efforts to solve them. Finally, we conclude in Section 8.
2. Background
BGP version 4 is the base specification that any router implementing BGP should support; it is specified in RFC 4271 [74].6 This section provides an overview of this version of the protocol, divided into four parts:
• BGP sessions. BGP speakers connect with each other using BGP sessions in order to exchange paths towards destinations, encoded as Network Layer Reachability Information (NLRI). We provide details on how BGP sessions are formed between peers, as well as how speakers manage session events, in Section 2.1.
• iBGP and eBGP. BGP does not only convey routing information among Autonomous Systems; the protocol is also used to exchange paths within Autonomous Systems. We describe the differences between these two roles of BGP, respectively referred to as eBGP and iBGP, in Section 2.2.
• BGP router design and decision process. Each BGP speaker selects the best path for each destination among the set of received paths, through what is called the BGP decision process. This process uses the information included in BGP paths, referred to as BGP path attributes, to compare the known paths to the same NLRI. Since any BGP speaker can modify path attributes upon processing, ASes can influence the selection process to implement their policies. These aspects of BGP are presented in Section 2.3.
• BGP extensions. BGP provides different mechanisms that can be used to support extensions to the protocol. These mechanisms can be implemented without compromising compatibility with devices that only support the base version of the protocol. We describe such mechanisms in Section 2.4.

6 RFCs 6286, 6608, and 6793 update the specifications of RFC 4271 and are thus also considered part of the base version of the protocol.

Table 1
BGP message types [16,74].
OPEN: OPEN messages are used by a router to request the establishment of a BGP session with a neighbor.
KEEPALIVE: After a session is established, KEEPALIVE messages are exchanged between BGP peers to ensure that the devices are still active.
NOTIFICATION: NOTIFICATION messages are used to inform BGP peers of anomalies in the BGP session and to close the session.
UPDATE: The UPDATE message conveys the routing information between the peers.
ROUTE-REFRESH: Exchanged between routers supporting the Route Refresh capability, ROUTE-REFRESH messages allow a BGP speaker to ask its peer to re-advertise all its routes [16].
2.1. BGP sessions and the BGP Finite State Machine
The BGP session between two peers is maintained over a TCP connection through which the peers exchange routing information, using different types of messages [6,18]. Table 1 provides a description of each type of message. The rules that govern the behavior of BGP speakers are described by the Finite State Machine (FSM) of the BGP standard (Fig. 1) [74]. The BGP FSM defines six session states and the events that trigger state changes. The BGP FSM was designed to be simple and did not differentiate disruptive events (e.g. node failure) from events that only partially interrupt the connection (e.g. a planned node restart). This simplicity can trigger session resets after events in which a less drastic approach could have been followed. Since a session reset typically triggers convergence on large sets of BGP destinations, potentially harming availability, new techniques were proposed to preserve availability in cases where a less radical session recovery procedure can be applied. We describe these techniques in Section 4.
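The reset behavior described above can be made concrete with a heavily simplified sketch of the six-state FSM. The event names are borrowed loosely from RFC 4271, and only a small subset of its transitions is modeled. The sketch also assumes that any event without an explicit transition, such as HoldTimer_Expires or a received NOTIFICATION, sends the session back to Idle. The point is that an Established session reacts the same way to every such event, whatever its cause.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


# (current state, event) -> next state. Only an illustrative subset of the
# RFC 4271 transitions is modeled here.
TRANSITIONS = {
    (State.IDLE, "ManualStart"): State.CONNECT,
    (State.CONNECT, "TcpConnectionConfirmed"): State.OPEN_SENT,  # OPEN is sent
    (State.CONNECT, "ConnectRetryTimer_Expires"): State.CONNECT,
    (State.CONNECT, "TcpConnectionFails"): State.ACTIVE,
    (State.ACTIVE, "TcpConnectionConfirmed"): State.OPEN_SENT,
    (State.ACTIVE, "ConnectRetryTimer_Expires"): State.CONNECT,
    (State.OPEN_SENT, "BGPOpen"): State.OPEN_CONFIRM,  # KEEPALIVE is sent
    (State.OPEN_SENT, "TcpConnectionFails"): State.ACTIVE,
    (State.OPEN_CONFIRM, "KeepAliveMsg"): State.ESTABLISHED,
    (State.ESTABLISHED, "KeepAliveMsg"): State.ESTABLISHED,
    (State.ESTABLISHED, "UpdateMsg"): State.ESTABLISHED,
}


def next_state(state: State, event: str) -> State:
    """Return the next state; any unmodeled event resets the session to Idle."""
    return TRANSITIONS.get((state, event), State.IDLE)


def run(events, state: State = State.IDLE) -> list:
    """Feed a sequence of events to the FSM and return the visited states."""
    trace = [state]
    for event in events:
        state = next_state(state, event)
        trace.append(state)
    return trace


if __name__ == "__main__":
    setup = ["ManualStart", "TcpConnectionConfirmed", "BGPOpen", "KeepAliveMsg"]
    print([s.value for s in run(setup)])
    # A hold-timer expiry and a NOTIFICATION tear the session down alike.
    print(next_state(State.ESTABLISHED, "HoldTimer_Expires").value)
    print(next_state(State.ESTABLISHED, "NotifMsg").value)
```

Mapping every error to the same Idle state keeps the machine small. It is also precisely what makes a session reset, and therefore convergence on all affected destinations, the default reaction.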
2.2. eBGP, iBGP, and route reflection
BGP defines different behaviors for external and internal BGP peers; these are referred to as eBGP and iBGP, respectively. While eBGP can be considered as the mechanism to exchange paths among ASes, iBGP is used to distribute external paths among the routers of an AS. Fig. 2 illustrates iBGP and eBGP sessions. BGP defines several path propagation rules for iBGP sessions, such as the constraint of only announcing to iBGP peers routes received from eBGP peers. This rule requires the establishment of a full mesh of iBGP sessions for path dissemination. As this approach can create scalability and operational issues in large networks, operators typically base the iBGP topology on route reflection [5,64]. Route reflectors help control scalability in terms of the number of BGP sessions that each speaker must maintain.7 A route reflector is a BGP router that relays paths received from iBGP peers to other iBGP peers. The routers to which a route reflector can reflect iBGP routes are its clients. Operators can configure several route reflectors in their network and interconnect them using a full iBGP mesh or other route reflectors. Bates et al. [5] describe the rules governing the behavior of route reflectors.

Route reflectors are key components in today's iBGP networks. However, the inclusion of route reflectors in a network can reduce route diversity and lead to slow convergence in some situations [38,56]. These problems arise because route reflectors must still obey the rule of only propagating a single best route for a given NLRI. Knowing multiple paths for each destination, at each node of the network, is critical for BGP high availability, as it can shorten convergence after failures by multiple seconds [26,94]. We discuss the proposed solutions to these issues in Section 3.

7 BGP confederations [57] are an alternative to route reflectors for reducing the scalability issues of iBGP sessions in ASes. Confederations divide ASes into smaller sub-autonomous systems, within which configuring an iBGP full mesh is feasible. The high availability techniques described in this paper can also be implemented in BGP confederations.

Fig. 1. BGP Finite State Machine. Rekhter et al. [74] describe each state and the conditions for transition between states.
Fig. 2. A sample BGP network.
Fig. 3. BGP RIBs.
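The route reflection rules of Section 2.2, and the scalability argument behind them, can be sketched in a few lines. This is a simplified illustration assuming the RFC 4456 rules: routes learned from non-clients are reflected only to clients, while routes learned from clients or eBGP peers go to every other peer. Outbound policy is ignored. The peer names, and the assumption that every client peers with all reflectors, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    kind: str  # "client" or "nonclient" (both iBGP), or "ebgp"


def reflect_targets(source: Peer, peers: list) -> list:
    """Peers to which a route reflector advertises a best path learned from
    `source` (simplified RFC 4456 rules; outbound policy is ignored)."""
    targets = []
    for peer in peers:
        if peer == source:
            continue  # do not send the route back to the peer it came from
        if source.kind == "nonclient" and peer.kind == "nonclient":
            continue  # the classic iBGP rule still holds between non-clients
        targets.append(peer)
    return targets


def full_mesh_sessions(n: int) -> int:
    """iBGP sessions needed to fully mesh n routers."""
    return n * (n - 1) // 2


def rr_sessions(n: int, r: int) -> int:
    """Sessions with r fully meshed reflectors, each client peering with all r."""
    return r * (r - 1) // 2 + (n - r) * r


if __name__ == "__main__":
    c1, c2 = Peer("C1", "client"), Peer("C2", "client")
    nc, ext = Peer("RR2", "nonclient"), Peer("X", "ebgp")
    peers = [c1, c2, nc, ext]
    print([p.name for p in reflect_targets(nc, peers)])  # ['C1', 'C2', 'X']
    print([p.name for p in reflect_targets(c1, peers)])  # ['C2', 'RR2', 'X']
    print(full_mesh_sessions(100), rr_sessions(100, 2))  # 4950 197
```

With 100 routers, a full mesh needs 4950 sessions, while two meshed reflectors serving 98 clients need 197. The price is that each reflector still relays a single best path per NLRI, which is the root of the path diversity problem.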
2.3. BGP router design
Each BGP router constantly processes path updates received from neighboring BGP peers. Specifically, the router must decide whether paths should be filtered out on reception or have their attributes modified, which paths should be selected as best and installed in the routing table, and which paths should be announced to other BGP peers. BGP defines three conceptual types of Routing Information Base (RIB) that model this process inside a router. An illustration of the RIBs and their relationships is provided in Fig. 3. Note that the RIBs and relationships described here are abstract concepts that model the management of routes in a BGP router; the actual implementation of the RIBs and their operation depends on the router manufacturer.
1. Adj-RIBs-In. A BGP speaker maintains the routes received from its BGP peers in the Adj-RIBs-In and applies the locally configured policy to them. The policies applied to the received paths aim at rejecting incoming routes or modifying their attributes in order to tweak the selection of the best path, according to the needs of the ISP. For example, paths received from customers tend to be preferred over paths received from transit providers [32,36]. This policy is usually implemented by setting the local preference attribute of customer paths to a higher value than the one set for provider paths.
2. Loc-RIB. After applying the policies, the BGP speaker selects the set of best paths using the best-path selection algorithm (Fig. 4) and stores them in the Loc-RIB. A router processes the Loc-RIB, together with other routes available to the router, to select its best path for each destination NLRI, and ultimately stores these paths in the router's main RIB. The RIB is then further translated into a Forwarding Information Base (FIB) that the router uses to forward packets. Note that, as routes from other routing protocols might be preferred over BGP routes, not all routes in the Loc-RIB find their way into the FIB.
3. Adj-RIBs-Out. Finally, the router maintains RIBs that track which paths were announced over which BGP session. These RIBs are called Adj-RIBs-Out and are populated after applying the outbound policies to the routes present in the Loc-RIB. BGP outbound policies are necessary, as not all paths are to be propagated to neighboring ASes. For example, a path from a settlement-free peer should not be propagated to transit providers, since this would generate connectivity costs without providing any benefit for the ISP.

As the creation of each Adj-RIB-Out requires processing the Loc-RIB, it is not scalable for a BGP speaker to maintain a separate Adj-RIB-Out for each of its peers. Therefore, BGP routers allow operators to group peers that share the same outbound policy into peer-groups [103]. By employing peer-groups, routers can prepare updates for more than one peer after processing their RIBs only once, thus reducing the consumption of system resources. The RIBs are part of what is called the control plane of the router, while the FIB is part of the data plane. Legacy routers used to place control and data plane functionalities in the same components. Modern router architectures have decoupled the two planes, which now reside in separate entities. Further improvements in router architecture were designed to improve BGP availability, as described in Section 5.

BGP best-path selection algorithm
1. Prefer path with highest local preference.
2. Prefer path originated by local router.
3. Prefer path with shorter AS-path length.
4. Prefer path with lowest origin code.
5. Prefer path with lower MED (only done if neighboring AS is the same).
6. Prefer eBGP to iBGP.
7. Prefer path with closest next-hop.
8. Prefer oldest path, if eBGP.
9. Prefer path in which the Router ID of the NH is lowest.
Fig. 4. BGP best-path selection algorithm [74].
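Fig. 4 can be read as a sequence of elimination rounds over the candidate paths. The sketch below follows that reading under several simplifying assumptions. The `Path` fields are hypothetical. "Closest next-hop" is modeled as the lowest IGP cost, "oldest" as the largest `age` value, and step 9 compares a single `router_id` field. Because MED is only compared between paths from the same neighboring AS, step 5 filters each neighbor-AS group separately instead of using one global sort key. Real implementations differ in exactly how they handle this step.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    local_pref: int
    locally_originated: bool
    as_path_len: int
    origin: int       # 0 = IGP, 1 = EGP, 2 = INCOMPLETE
    med: int
    neighbor_as: int
    ebgp: bool
    igp_cost: int     # IGP distance to the next hop
    age: int          # larger value = learned earlier (older)
    router_id: int


def keep_best(paths, key):
    """Keep only the paths that minimize `key`."""
    best = min(key(p) for p in paths)
    return [p for p in paths if key(p) == best]


def select_best(paths):
    c = list(paths)
    c = keep_best(c, lambda p: -p.local_pref)             # 1. highest local pref
    c = keep_best(c, lambda p: not p.locally_originated)  # 2. locally originated
    c = keep_best(c, lambda p: p.as_path_len)             # 3. shortest AS path
    c = keep_best(c, lambda p: p.origin)                  # 4. lowest origin code
    groups = {}                                           # 5. MED, per neighbor AS
    for p in c:
        groups.setdefault(p.neighbor_as, []).append(p)
    c = [p for g in groups.values() for p in keep_best(g, lambda q: q.med)]
    c = keep_best(c, lambda p: not p.ebgp)                # 6. eBGP over iBGP
    c = keep_best(c, lambda p: p.igp_cost)                # 7. closest next hop
    if all(p.ebgp for p in c):
        c = keep_best(c, lambda p: -p.age)                # 8. oldest, if eBGP
    return min(c, key=lambda p: p.router_id)              # 9. lowest router ID


if __name__ == "__main__":
    p_a = Path("a", 100, False, 1, 0, 20, 2, True, 10, 5, 1)
    p_b = Path("b", 100, False, 1, 0, 10, 2, True, 30, 5, 2)
    p_c = Path("c", 100, False, 1, 0, 50, 3, True, 20, 5, 3)
    # a loses to b on MED (same neighbor AS 2); b and c are then split on IGP cost.
    print(select_best([p_a, p_b, p_c]).name)  # c
```

The example also shows why step 5 cannot be a plain sort key. Path a has the lowest IGP cost, yet it is eliminated by a MED comparison that c never takes part in.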
2.4. Capabilities, address families and communities
In this section, we describe three elements that allow the functionality of BGP to be extended without compromising backward compatibility with devices that only support the base version of the protocol. We first introduce capabilities. We then explain how address families and communities can be used to convey new types of information through BGP.
• Capabilities. The introduction of new BGP features might require the modification of specific components of the protocol, such as the FSM or the format of BGP messages. These modifications might break compatibility with BGP speakers that do not support the features. BGP capabilities were introduced to provide a method for each side to announce the features it supports [81]. An optional parameter exchanged in the OPEN message of a BGP session lists the capabilities supported by each side. Based on this exchange, speakers can decide whether an optional protocol feature should be enabled. Many standardized extensions to BGP define whether a feature can be used if the peer does not announce its support with a capability. Alternatively, a BGP speaker can be configured not to pursue the establishment of a session when the other side does not support a specific capability. Several of the functions included in this document, such as Graceful Restart (Section 4.3) or ADD-PATH (Section 3.4), use BGP capabilities.
• Address families. Although it was initially designed for IPv4, BGP has been extended to announce routing information for other types of services [4]. The use of BGP to carry routing information for multiple services is referred to as Multiprotocol BGP (MP-BGP). For example, MPLS L3VPNs, one of the most common types of service offered by ISPs, use MP-BGP to propagate the information of each VPN [70,77]. MP-BGP speakers identify the protocols that they support with their peers using Address Family Identifiers and Subsequent Address Family Identifiers (AFI/SAFI). MP-BGP can face particular challenges for service availability that are not present in the base protocol. We discuss them, together with the techniques to counter them, in Section 6.
• Attributes and communities. Attributes convey characteristics of the paths and are included in the BGP UPDATE message. AS-PATH and Next-Hop are examples of path attributes. Although it is possible, protocol designers seldom extend the list of attributes, as this would require an intrusive modification of the BGP implementation. A better option for appending new path information is to use communities [13]. Communities are optional transitive attributes, i.e. attributes that are not necessarily supported by a router but must be transmitted to the next BGP speaker. BGP communities are a list of numbers, similar to tags, that provide a way of conveying properties of the paths included in the UPDATE. Some communities have a well-known, standardized meaning, such as NO-EXPORT (0xFFFFFF01) [58], which marks prefixes that should not be propagated further to external ASes. Operators and designers can assign semantic meaning to specific communities for their own purposes and applications. G-shutdown (Section 4.2) is an example of a high availability method that uses communities for its operation.
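The bit layout of a standard community, and the simplest form of capability negotiation, can be sketched in a few lines. The sketch assumes the standard 32-bit community format of RFC 1997, which is a 16-bit AS number followed by a 16-bit value. For capabilities, it makes the simplifying assumption that a feature is enabled only when both peers list it in their OPEN messages; as noted above, individual extensions define their own rules. The community 65000:100 and the capability names are illustrative.

```python
NO_EXPORT = 0xFFFFFF01  # well-known community defined in RFC 1997


def encode_community(asn: int, value: int) -> int:
    """Pack an 'ASN:value' community into its 32-bit representation."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves of a standard community are 16 bits")
    return (asn << 16) | value


def decode_community(community: int) -> tuple:
    """Split a 32-bit community into its (high, low) 16-bit halves."""
    return community >> 16, community & 0xFFFF


def may_send_to_ebgp(communities: set) -> bool:
    """A path tagged NO-EXPORT must not be advertised to external ASes."""
    return NO_EXPORT not in communities


def negotiated(local_caps: set, peer_caps: set) -> set:
    """Simplification: enable a feature only if both OPENs advertised it."""
    return local_caps & peer_caps


if __name__ == "__main__":
    print(decode_community(NO_EXPORT))                       # (65535, 65281)
    print(may_send_to_ebgp({encode_community(65000, 100)}))  # True
    print(may_send_to_ebgp({NO_EXPORT}))                     # False
    print(sorted(negotiated({"ADD-PATH", "Route Refresh"}, {"Route Refresh"})))
```

Because communities are opaque 32-bit values to any router that does not assign them a meaning, new semantics such as those used by G-shutdown can be introduced without touching the UPDATE format.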
3. Increasing path diversity
In this section, we describe several methods proposed in recent years to overcome path hiding within networks. We begin by briefly describing the reason behind this phenomenon and its overall consequences. Network operators strive to maintain a good level of redundancy in the external connectivity of their networks. This interconnection approach typically provides the network with a large number of different paths to each prefix. Indeed, several studies have observed large path diversity at the edge of ISP networks [19,92,94,97]. However, since BGP routers can only advertise their selected best path, such path diversity is not always available to all routers of the network. The existence of paths with better attributes (higher local preference, shorter AS-path length, etc.) can hinder the propagation of alternate exit points for a given destination prefix. This situation is exacerbated by the use of route reflectors, which further limit path propagation [19,94]. Without multiple paths for the same destination prefix, routers lose a good opportunity to load balance traffic. Furthermore, this condition slows down the recovery of the network from disruptive events such as link or node failures: if the best path for a prefix fails, affected routers might lose reachability to that prefix until the network re-converges. Therefore, the presence of several disjoint paths to the same prefix in the RIB of each router is beneficial for network availability.

Fig. 5. Interconnection between three ASes.

Let us illustrate the problem of path hiding using Fig. 5, which depicts the interconnection of three ASes. AS1 is connected to AS2 through two parallel connections (EA1 to EA2 and EB1 to EB2). AS1 can also reach AS2 via AS3 using EC1. Two route reflectors (RRA1 and RRB1) distribute the external routes within AS1. We assume that AS1 sets the same local preference for routes received from AS2 and AS3. Although AS1 has three possible paths to reach the prefixes announced by D2, routers such as O1 may receive only a single path. In this particular example, the longest available path to reach D2, via AS3, is not announced to the two route reflectors, as EC1 prefers either of the paths directly connected to AS2. The other two paths to AS2 are equivalent up to the IGP step of the BGP decision process. If the two route reflectors were located at the same point of the network, both of them would choose the same exit point towards D2. Thus, O1 only receives one of the three available paths, losing its ability to load balance or to switch quickly to a valid alternative path in case the main path is lost.

In the next sections, we examine some of the methods that have been proposed to augment and manage path diversity within BGP routers. We provide a summary of them in Table 2.
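The path hiding effect of this example can be reproduced with a toy model. The sketch makes several simplifying assumptions that are not part of the original discussion: equal local preference everywhere, a decision process reduced to AS-path length followed by IGP distance, and hypothetical IGP distances from each route reflector to each exit. It shows that EC1's path never reaches the reflectors. It also shows that co-located reflectors hand O1 a single path, while moving RRB1 closer to EB1 (the kind of placement change discussed in Section 3.1) yields a second one.

```python
# Toy model of Fig. 5: AS-path length of the eBGP path learned at each exit.
EXITS = {"EA1": 1, "EB1": 1, "EC1": 2}


def announced_into_ibgp(exits):
    """Exits whose own eBGP path is also their overall best path.

    A border router only announces its best path. With equal local preference,
    an exit whose eBGP path is longer than a path it learns over iBGP (EC1
    here) selects that iBGP path and therefore announces nothing of its own.
    """
    shortest = min(exits.values())
    return {e: n for e, n in exits.items() if n == shortest}


def rr_choice(candidates, igp_from_rr):
    """Best path at one route reflector: shortest AS path, then lowest IGP cost."""
    shortest = min(candidates.values())
    tied = [e for e, n in candidates.items() if n == shortest]
    return min(tied, key=lambda e: igp_from_rr[e])


def paths_received_by_client(exits, igp):
    """Set of exits learned by a client (e.g. O1) peering with every RR."""
    candidates = announced_into_ibgp(exits)
    return {rr_choice(candidates, dist) for dist in igp.values()}


# Hypothetical IGP distances; both reflectors sit at the same point.
co_located = {"RRA1": {"EA1": 5, "EB1": 8}, "RRB1": {"EA1": 5, "EB1": 8}}
# Same network, but RRB1 moved closer to EB1.
spread = {"RRA1": {"EA1": 5, "EB1": 8}, "RRB1": {"EA1": 9, "EB1": 4}}

if __name__ == "__main__":
    print(paths_received_by_client(EXITS, co_located))          # {'EA1'}
    print(sorted(paths_received_by_client(EXITS, spread)))      # ['EA1', 'EB1']
```

Even in the better placement, the path via AS3 remains hidden. Recovering that path requires changing what border routers announce, not only where reflectors sit.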
3.1. iBGP topology planning
The path diversity of the network depends on various factors, such as the iBGP topology, the location of exit points, the position of route reflectors, and the attributes of the paths received over eBGP sessions. One approach to managing the path diversity of the network is for ISPs to account for it during network design [64]. Proper placement of route reflectors is essential for the propagation of paths that are equal up to the IGP step of the BGP selection algorithm. For instance, in Fig. 5, if the RRs were closer to router EA1 than to EB1, both RRs would select the same path. The operator could place one of the route reflectors closer to EB1 to solve this issue. This solution, however, becomes very complex when the network designer must consider all prefixes and neighboring ASes [10,11,39].

One particular method that network operators can use to increase the number of available paths is to create additional iBGP sessions among routers in the network [69]. For instance, let us assume that the route reflectors of Fig. 5 are closer to router EA1 and that the operator cannot move either of them. The operator can add a direct iBGP session between O1 and EB1, thus ensuring that an additional path is available to O1. Although operators usually avoid configuring many iBGP sessions, path diversity can easily be improved in some cases by adding a few additional sessions.
3.2. Best external
Even a good iBGP topology design cannot ensure the propagation of alternative paths if edge routers receive better ones from internal iBGP peers. For example, a single path with the highest local preference value will prevent edge routers from propagating alternate external paths in the iBGP topology. Marques et al. [53] propose a modification to BGP, referred to as best-external, which allows routers to announce their best external path even if their decision process favored an internal path. Best-external also defines the mechanisms that allow route reflectors to disseminate alternative routes to their clients and non-client peers [53]. In Fig. 5, if best-external is enabled on EC1, the router will propagate the path it receives from AS3 to its route reflectors, even if the paths via EA1 or EB1 are selected as best. Thanks to this approach, if the primary paths to AS2 fail, the route reflectors already know an alternate path and can propagate it to their clients, instead of temporarily withdrawing the path until EC1 propagates the alternate path. Most vendors support the configuration of the best-external behavior. This technique does not by itself provide a definitive solution for path diversity, but it constituted a first attempt at addressing the need for non-best path propagation within ISP networks. A disadvantage of this solution lies in the coarse granularity of the decision to propagate a best external path. For example, an edge router connected to a transit provider would propagate alternate non-best paths learned from that provider even when multiple paths via customers are already known within the topology. This overloads the BGP control plane of route reflectors with paths that they would never select as best.
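The difference between the standard announcement rule and best-external at a border router such as EC1 can be sketched as follows. This fragment assumes a decision process reduced to local preference, then AS-path length, then eBGP over iBGP. The `Route` fields and values are hypothetical. Vendor implementations apply the full decision process and additional conditions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    via: str
    local_pref: int
    as_path_len: int
    ebgp: bool


def rank(route: Route) -> tuple:
    """Higher tuple = more preferred (reduced decision process)."""
    return (route.local_pref, -route.as_path_len, route.ebgp)


def ibgp_announcement(routes, best_external: bool = False) -> Optional[Route]:
    """Route a border router announces to its iBGP peers, if any."""
    best = max(routes, key=rank)
    if best.ebgp:
        return best  # the overall best path is external: announce it
    if best_external:
        external = [r for r in routes if r.ebgp]
        if external:
            return max(external, key=rank)  # best external, not the overall best
    return None  # an iBGP-learned best path is not re-announced to iBGP peers


if __name__ == "__main__":
    # EC1 in Fig. 5: its own path via AS3 is longer than the iBGP path via EA1.
    routes = [Route("AS3", 100, 2, True), Route("EA1", 100, 1, False)]
    print(ibgp_announcement(routes))                          # None
    print(ibgp_announcement(routes, best_external=True).via)  # AS3
```

The announcement decision depends only on the router's local view. This is why, as noted above, best-external cannot tell whether the extra path it injects will ever be useful to the route reflectors.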
3.3. Diverse BGP paths via route reflector planes
Operators can obtain some path diversity in their networks by employing best external together with iBGP topology planning. This process, nevertheless, is different for every network and can be considered as a complex network management task. ISPs require solutions that ensure path diversity without relying on ad-hoc designs. Diverse BGP paths, defined in RFC 6774 [71], is a mechanism that attempts to
398
Please cite this article as: J.C. Cardona et al., Bringing high availability to BGP: A survey, Computer Networks (2015), http://dx.doi.org/10.1016/j.comnet.2015.09.005
399 400 401 402 403 404
ARTICLE IN PRESS
JID: COMPNW 6
[m3Gdc;September 25, 2015;12:4]
J.C. Cardona et al. / Computer Networks xxx (2015) xxx–xxx
Table 2 Summary of path diversity techniques.
Technique
405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442
Number of possible additional paths
iBGP topology planning
Depends on final design. If direct iBGP sessions are not established, the maximum number of paths is limited by the route reflectors implemented in the network
Best external
Each border router or route reflector can announce an alternative path. If the feature is only enabled in border routers, the paths would only be propagated until the first RR
Diverse Paths
Potentially as many as route reflector planes are configured
ADD-PATH
Depends on the ADD-PATH mode selected [87]
Type of additional paths
Compatibility required
Operational effort
AS-Wide best paths that are hidden by route reflectors. AS-Wide best paths are paths that are considered as best when applying rules of the BGP decision process up to the IGP tie-break step Border routers announce the best external path. Route reflectors may disseminate alternative paths to client or non-clients peers depending on which router announces the best overall path Any type of path
None
An operator must plan and model each network individually
Border routers must support the feature
None
Route reflectors must support the feature
Depends on the ADD-PATH mode selected [87]
Sending and receiving routers must support the feature
Design of route reflector planes and configuration of features in the edge devices Configuration of the feature in devices. Monitoring of resources might be necessary
solve this by providing operators the option to configure redundant route reflectors so as to let them announce different paths. Each route reflector supporting Diverse Paths still sends a single path per destination to their clients, but the route reflector architecture is now considered as a set of “Route Reflection Planes”, each plane playing a different role in the path dissemination infrastructure. One plane can, for example, be configured to announce a path different from the one that it would select as best, according to the standard BGP selection process and its local configuration. Typically, a route reflector plane can announce the overall best path to their clients, while a second plane would be configured to announce the second-best path. Using as an example the network depicted in Fig. 5, we can form two route reflector planes, one with RRA1 and another with RRB1. Even though the path via EA1 is preferred for both route reflectors due to its attributes and the position of both route reflectors, the first route reflector announces the best path via EA1, while the second route reflector, configured to announce alternate paths, announces its secondbest path (via EB1). Operators can configure as many route reflection planes as needed to obtain the desired path diversity. The network design just needs to ensure that the route reflectors in the different planes classify the sets of available paths in the same order using the standard BGP decision process (Fig. 4). Two approaches are proposed to achieve this. The first approach is to always co-locate route reflector for every required plane. This guarantees that each route reflector will consider equal IGP distance to nexthops, globally providing a consistent view of the ordering among best paths from each route reflection location. 
The second approach consists of disabling the IGP tie-break step in the BGP decision process, so that all route reflectors order the available paths identically. By using Diverse Paths, a standard BGP speaker configured with multiple iBGP sessions to the various route reflection planes can obtain path diversity without requiring changes to its implementation. To keep paths available in case of route reflector failure or maintenance, however, such speakers need to be configured with at least two sessions to each route reflection plane.
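Why the route reflectors of different planes must rank paths identically can be shown with a small sketch. It is not the Diverse Paths specification: the `Path` fields, the reduced decision process (LOCAL_PREF, AS_PATH length, optional IGP cost, router ID) and the IGP costs are assumptions chosen to mimic two route reflectors at different locations, as RRA1 and RRB1 could be. Plane 0 announces the best path of its own ranking and plane 1 the second-best.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Path:
    exit_router: str   # BGP next hop, e.g. "EA1"
    local_pref: int
    as_path_len: int
    igp_cost: int      # IGP distance from this route reflector to the next hop
    router_id: int     # final tie-breaker (lower wins)


def decision_key(p: Path, use_igp: bool):
    # Reduced BGP decision process (assumption): higher LOCAL_PREF, shorter
    # AS_PATH, lower IGP cost (only if the tie-break is enabled), lower router ID.
    return (-p.local_pref, p.as_path_len, p.igp_cost if use_igp else 0, p.router_id)


def plane_announcement(view: List[Path], plane: int, use_igp: bool) -> Optional[Path]:
    # Plane k announces the (k+1)-th path of its own ranking, if it exists.
    ranked = sorted(view, key=lambda p: decision_key(p, use_igp))
    return ranked[plane] if plane < len(ranked) else None


# The same two paths, seen from two route reflectors at different IGP positions.
view_rra1 = [Path("EA1", 100, 2, 10, 1), Path("EB1", 100, 2, 20, 2)]
view_rrb1 = [Path("EA1", 100, 2, 30, 1), Path("EB1", 100, 2, 5, 2)]

for use_igp in (True, False):
    first = plane_announcement(view_rra1, 0, use_igp)
    second = plane_announcement(view_rrb1, 1, use_igp)
    print(f"IGP tie-break={use_igp}: plane 0 -> {first.exit_router}, "
          f"plane 1 -> {second.exit_router}")
```

With the IGP tie-break enabled, the two reflectors disagree on the ordering and both planes end up announcing EA1, so clients gain no diversity; with the tie-break disabled, the rankings agree and the second plane announces EB1.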
3.4. ADD-PATH
One proposal that has been in development for the last few years is the extension of BGP to allow the propagation of more than one path per destination over a single BGP session. At the time of this writing, this technique, called ADD-PATH, is still undergoing standardization [60,87,97,98]. Most vendors have, however, already made first implementations of ADD-PATH available [75]. While ADD-PATH was initially devised as a way to mitigate the MED oscillation problem [38,56], it has become the general method considered for disseminating multiple paths per destination within a service provider network. ADD-PATH allows an UPDATE message to carry multiple paths for the same destination by adding a path identifier field to the NLRI encoding. A router that supports ADD-PATH and is willing to exchange multiple paths with a peer should advertise the ADD-PATH capability when initializing the BGP session [97]. ADD-PATH describes the extensions to BGP that allow the propagation of multiple paths between BGP speakers, but it does not specify which paths should be advertised. The selection of paths depends on the supported applications and operator requirements. Multiple approaches for selecting the paths to be advertised by a given BGP speaker have been investigated [87]. These variants of path selection are called ADD-PATH “modes”, and manufacturers are free to choose which ones they support based on their customers' requirements. Simpson et al. [87] provide a guide to best practices for the implementation of ADD-PATH, including all the path selection modes suggested by vendors and operators. We next describe the three main ADD-PATH modes.
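Before turning to the modes, the NLRI change described above can be sketched for IPv4 unicast. This is a minimal illustration assuming the encoding of the ADD-PATH specification, in which a 4-octet Path Identifier precedes the usual prefix-length octet and prefix bytes; the function names are hypothetical, and the sketch omits the capability negotiation that must precede the use of this encoding.

```python
import ipaddress
import struct
from typing import List, Tuple


def encode_nlri(path_id: int, prefix: str) -> bytes:
    """Encode one IPv4 prefix as <Path Identifier (4 octets), Length (1 octet), Prefix>."""
    net = ipaddress.IPv4Network(prefix)
    nbytes = (net.prefixlen + 7) // 8          # only the significant octets are sent
    return struct.pack("!IB", path_id, net.prefixlen) + net.network_address.packed[:nbytes]


def decode_nlri(data: bytes) -> List[Tuple[int, str]]:
    """Decode a sequence of ADD-PATH IPv4 NLRI entries into (path_id, prefix) pairs."""
    entries = []
    offset = 0
    while offset < len(data):
        path_id, plen = struct.unpack_from("!IB", data, offset)
        offset += 5
        nbytes = (plen + 7) // 8
        addr = data[offset:offset + nbytes] + bytes(4 - nbytes)  # pad back to 4 octets
        offset += nbytes
        entries.append((path_id, f"{ipaddress.IPv4Address(addr)}/{plen}"))
    return entries


# Two paths towards the same destination, distinguished only by their Path Identifier.
nlri = encode_nlri(1, "203.0.113.0/24") + encode_nlri(2, "203.0.113.0/24")
print(nlri.hex())
print(decode_nlri(nlri))
```

Because the two entries differ only in their Path Identifier, a receiver can keep both paths instead of treating the second announcement as an implicit replacement of the first.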
Please cite this article as: J.C. Cardona et al., Bringing high availability to BGP: A survey, Computer Networks (2015), http://dx.doi.org/10.1016/j.comnet.2015.09.005
ADD-PATH N. Selecting the ADD-PATH mode for a network is not trivial, as each mode carries its own benefits and drawbacks. To provide high availability, ADD-PATH can be used to let route reflectors advertise N paths with different next hops to each client (where N is larger than 1). This is referred to as Add-N mode. Conceptually, a BGP speaker advertising N alternate paths per destination in Add-N mode can run the BGP decision process, select the best path, and then re-run the decision process while disregarding the previously selected best path, as well as the other paths having the same NEXTHOP and Originator attributes. This procedure can be repeated as many times as necessary to obtain the N paths (or fewer, if no more paths are available).
ADD-PATH ALL. The choice of the number and type of paths that routers announce to their peers is left to operators and manufacturers. A simple choice is to advertise all available paths, which is called the ADD-PATH ALL mode. Operators using this mode must be careful, as routers in large networks could lack the memory and processing resources needed to process all paths.
ADD-PATH multipath mode. ADD-PATH N and ADD-PATH ALL do not fit all the requirements faced by operators. Content distribution networks, for instance, are more interested in restricting the propagated paths to those that routers can install in their FIBs when BGP multipath [35] is configured. Although this mode does not guarantee the propagation of multiple paths, some operators find it attractive because it aligns the path propagation policy of the route reflector with the BGP multipath configuration.
ADD-PATH example. In Fig. 5, AS1 has three possible paths to reach the prefixes announced by D2. If the operator implements ADD-PATH using the “Add-N, N = 2” mode on its route reflectors, the path via AS3 would not be announced to O1. The network designer may consider this level of path diversity sufficient for the network.
However, the operation of EA1 and EB1 could depend on common elements (the same power source, shared fibers, etc.). If any of these elements fails, both EA1 and EB1 would suffer a disruption. In this scenario, the propagation of the path via EC1 could be essential for the availability of the connectivity from AS1 to AS2. In the end, operators must choose an ADD-PATH mode that balances functionality and resource utilization.
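The three modes can be contrasted with a small sketch. It assumes a reduced decision process (higher LOCAL_PREF, shorter AS_PATH, lower IGP cost, lower router ID), a simplified multipath eligibility rule (a tie with the best path on every step except the final router-ID tie-break), and hypothetical attribute values loosely modelled on the Fig. 5 example; real implementations apply the full decision process and vendor-specific multipath rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Path:
    next_hop: str
    originator: str
    local_pref: int
    as_path_len: int
    igp_cost: int
    router_id: int


def rank(p: Path):
    # Reduced BGP decision process (assumption); lower tuple = more preferred.
    return (-p.local_pref, p.as_path_len, p.igp_cost, p.router_id)


def add_n(paths: List[Path], n: int) -> List[Path]:
    """Add-N: repeatedly pick the best path, then drop it and every path sharing
    its NEXTHOP and Originator before running the selection again."""
    remaining, selected = list(paths), []
    while remaining and len(selected) < n:
        best = min(remaining, key=rank)
        selected.append(best)
        remaining = [p for p in remaining
                     if (p.next_hop, p.originator) != (best.next_hop, best.originator)]
    return selected


def add_all(paths: List[Path]) -> List[Path]:
    """ADD-PATH ALL: advertise every available path (here, in preference order)."""
    return sorted(paths, key=rank)


def add_multipath(paths: List[Path]) -> List[Path]:
    """Multipath mode: advertise only the paths tied with the best one up to the
    router-ID tie-break, i.e. those a multipath-enabled router could install."""
    best = min(paths, key=rank)
    return [p for p in paths if rank(p)[:-1] == rank(best)[:-1]]


# Hypothetical attributes: EA1 and EB1 connect directly to AS2, EC1 goes via AS3.
paths = [
    Path("EA1", "EA1", 100, 1, 10, 1),
    Path("EB1", "EB1", 100, 1, 10, 2),
    Path("EC1", "EC1", 100, 2, 5, 3),
]
print([p.next_hop for p in add_n(paths, 2)])
print([p.next_hop for p in add_all(paths)])
print([p.next_hop for p in add_multipath(paths)])
```

With N = 2 the path via EC1 (through AS3) is not advertised, which is the trade-off discussed in the example above: only ADD-PATH ALL, or a larger N, keeps it available if EA1 and EB1 fail together.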
3.5. Conclusion on path diversity techniques
Transmitting a single path per NLRI limits the diversity of paths available at the router level within an AS. Backup paths can be leveraged by High Availability techniques, such as IP Fast ReRoute (FRR) [86], to switch swiftly to alternative paths when the main path fails. In [26], the authors analyze path diversity and the effects of path hiding in a Tier-1 network. In the studied network, 95% of the prefixes had more than one path available; however, only 20% of the routers received multiple paths for more than 50% of the prefixes. Network operators currently have multiple options at their disposal to distribute path diversity in their networks. The final choice of technique depends on the requirements and resources of each network. ISP networks usually consist of heterogeneous devices, many of which
support different features. An operator must understand the capabilities of each router and how to leverage them to obtain the desired path diversity. Of the techniques introduced in this section, ADD-PATH offers the greatest flexibility and functionality. If all routers in the network support ADD-PATH, the operator only has to choose the mode that offers good performance without compromising routing resources. Van den Schrieck et al. [97] analyze the benefits and performance costs of different ADD-PATH modes on simulated topologies resembling Tier-1 and transit networks. For both types of network, the average number of paths per router doubles when ADD-PATH with N = 2 is enabled (a result similar to that of using Diverse Paths, in a two-level architecture, together with best external). When using ADD-PATH ALL, the increase depends on the type of network. For example, the average number of paths maintained by a router increases up to eightfold for the studied Tier-1 networks. ADD-PATH required implementation effort from router manufacturers, in terms of code optimization and interoperability. When some network elements do not support ADD-PATH, the operator can use the other available techniques, such as best external and Diverse Paths, to achieve path diversity. These techniques, however, require a more elaborate network design that must be set up in accordance with the network topology. Table 2 contains a summary of the techniques introduced in this section.
4. Session level improvements
Historically, the BGP state machine has been kept simple to facilitate implementation and interoperability. This simplicity, however, prevents the protocol from dealing efficiently with different types of network events. Service provider networks constantly experience events that affect the state of their nodes or links, such as hardware failures, software upgrades, and configuration changes. Each type of event affects the network in different ways and takes place in a different operational context; however, BGP treats them all in the same way. For instance, a BGP session restart on a router can be carried out without interrupting packet forwarding, and should therefore be treated differently from, for example, a sudden fiber cut that breaks data-plane connectivity. In the last few years, network operators and vendors have proposed various techniques to improve BGP session management by tailoring its behavior to the different events that affect sessions. This section provides an overview of the most important of these techniques, which we summarize in Table 3.
4.1. Fast neighbor failure detection
When a BGP peer fails, the router must remove all paths received over that session and choose alternative paths. The mechanisms defined by the BGP protocol to verify peer availability are too slow for current application demands. BGP defines two timers for this purpose. The first is the keepalive timer, which sets the number of seconds between consecutive KEEPALIVE messages. The second is the hold timer, which sets the time that a router
waits before declaring a neighbor down. The recommended value for the hold time is 90 s, while the keepalive timer should be one third of the hold timer [74]. With these values, a router could wait up to 90 s after the failure of a peer before moving to backup paths. Many manufacturers allow the manual configuration of these timers; however, setting them to very low values can overload the central route processor. Various techniques can be deployed to detect BGP peer failures faster, and they differ between iBGP and eBGP sessions. An iBGP neighbor is usually not directly connected to the router. Hence, detecting the failure of an iBGP neighbor can only rely on the IGP of the network, which can usually detect failures in less than a second. Most vendors provide options to tear down iBGP sessions upon the loss of IGP connectivity to the peer. This feature, however, must be deployed with care, as IGP route flapping can cause unnecessary iBGP session disruptions. For eBGP peers the problem is different, as the routers are commonly connected through the same L2 domain, often via intermediate devices (e.g. a switch). Under these circumstances, the state of the local Ethernet interface is ineffective for detecting connectivity failures, since a failure beyond the intermediate device does not bring the local link down. Bidirectional Forwarding Detection (BFD), specified in [47], can be used to detect failures in those environments. BFD is an efficient hello protocol that is independent of BGP. One advantage of BFD is that it can easily be implemented in hardware, allowing fast hellos without affecting the main CPU [31].

Table 3
Summary of session level techniques.
Technique | Network events it addresses | Compatibility required | Operational effort
Fast failure detection - BFD | Fast detection of failure of eBGP neighbor | Support of feature on both devices | Configuration of feature in the routers
Fast failure detection - IGP | Fast detection of failure of iBGP peer | Router must support fast de-peering upon loss of connectivity for iBGP peers | Configuration of feature. Only recommended for networks in which IGP route flapping does not occur
Graceful shutdown | Prepare network for the shutdown of a node or link | A device must support the automatic trigger of the mechanism. A manual implementation has no requirements | If the automatic feature is available, the deployment is simple. A manual implementation of the technique might require a more complex operation
Graceful restart | Maintain network availability while restarting the control plane of routers | Feature support | Configuration of feature in the device
Revised error handling | Avoid session breakdowns after specific BGP errors | Procedure support | Enabling feature in the device (in case it is not enabled by default)

4.2. Graceful shutdown
Network operators often perform maintenance tasks that require shutting down one or more active BGP peering sessions with neighboring ASes. These operations can cause a convergence period in which routers might drop packets, even when alternative paths exist [101]. For example, [28] reports that, in a large VPN service provider, 46% of the unavailability was, on average, related to maintenance operations. The relative frequency of these operations highlights the need for a solution that prepares networks for planned operations in which peering sessions are disabled. For outbound
traffic, path diversity methods (Section 3) together with fast detection mechanisms (Section 4.1) can help reduce the risk of dropping packets. However, the maintenance may affect all propagated paths and thus still cause packet loss. The availability of inbound traffic, on the other hand, relies on the ability of the external AS to detect the event and converge to alternative routes. The requirements for a solution that can maintain availability upon a planned BGP session shutdown, called graceful shutdown, are described in RFC 6198 [22]. The specifications from [22] are flexible enough to allow a trade-off between configuration simplicity and available features. The document requires that solutions support incremental deployment and provide the mechanisms necessary to return the network to its prior state after the procedure is over. One example of a technique that fulfills the requirements of RFC 6198 is described in [30]. The authors propose a make-before-break solution that modifies the local preference of the paths affected by the operation, thereby allowing the propagation of alternative paths in the network. The solution recommends the use of a specific community (g-shut community) to identify the affected prefixes. Operators should configure the edge routers of the network to assign a low local preference to the prefixes tagged with the g-shut community and to strip the community before further propagating them. Before the procedure, an engineer can attach the community to all affected paths, using manual or automated mechanisms. Thanks to the lower local preference, the network re-converges to a state in which the affected paths are not used. After the maintenance tasks are executed, the operator can remove the community from the paths and the network returns to its initial state. Let us illustrate the operation of the solution proposed in [30] using the network from Fig. 6.
Fig. 6. Connectivity between three ASes.
The network is composed of three ASes. AS1 connects to AS2 through two direct links. An intermediate AS, AS3, provides an alternative (indirect) path between AS1 and AS2. Let us take the point of view of AS1 while the operators of AS2 perform maintenance on their infrastructure. We assume that O1 is load-balancing traffic to AS2 over the two links connecting both ASes. If the operators of AS2 disable the link between EA1 and EA2, the packets heading to AS2 via EA2 will be lost until O1 converges. To avoid packet loss, both operators can implement the guideline from [30]. The operator of AS2 would trigger EA2 to mark the paths announced to EA1 with the g-shut community. EA1 would then lower the local preference of the paths received from EA2 and strip the community before propagating them. Once O1 receives these routes from EA1 with the lower local preference, it sends traffic only via EB1. Note that if the operator of AS2 needs to perform maintenance on both links to AS1, the same mechanism would allow AS1 to select the path through AS3 to reach the prefixes announced by AS2. From the point of view of AS2 in the same situation, graceful shutdown can also improve availability for traffic flowing between D2 and O1. In that case, router EA2 could apply an internal policy that tags the paths received over the links under maintenance with the g-shut community. EA2 would then lower the local preference of these paths, thereby preparing the internal network for the procedure. Francois et al. [23,28] investigate the effects of graceful shutdown on both large and small ISP networks. First, the authors explored the impact of maintenance on those topologies. The loss of connectivity during maintenance ranged between 0 and 3.3 s, depending on the architecture of each network (full mesh, RRs, BGP-free core, etc.). For the same cases, enabling graceful shutdown reduced the loss of connectivity to 0 for most of the architectures, and limited it to 0.4 s in the worst case.
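The edge-router policy described above can be sketched as an import filter. The sketch is ours, not the implementation from [30]: the community value 65535:0 (the well-known GRACEFUL_SHUTDOWN value of RFC 8326), the low LOCAL_PREF of 0 and the documentation prefix standing in for D2's prefixes are assumptions, and O1's load-balancing set is simplified to the routes sharing the highest LOCAL_PREF.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Tuple

Community = Tuple[int, int]
G_SHUT: Community = (65535, 0)   # assumed g-shut community value
LOW_LOCAL_PREF = 0               # assumed "low" LOCAL_PREF


@dataclass(frozen=True)
class Route:
    prefix: str
    via: str                     # edge router of AS1 that learned the route
    local_pref: int = 100
    communities: FrozenSet[Community] = frozenset()


def edge_import_policy(route: Route) -> Route:
    """Lower the LOCAL_PREF of g-shut tagged routes and strip the community."""
    if G_SHUT in route.communities:
        return replace(route, local_pref=LOW_LOCAL_PREF,
                       communities=route.communities - {G_SHUT})
    return route


def load_balancing_set(routes: List[Route]) -> List[str]:
    """Edge routers O1 would use: all routes sharing the highest LOCAL_PREF."""
    top = max(r.local_pref for r in routes)
    return [r.via for r in routes if r.local_pref == top]


# Normal operation: O1 load-balances over EA1 and EB1.
normal = [Route("198.51.100.0/24", "EA1"), Route("198.51.100.0/24", "EB1")]
print(load_balancing_set([edge_import_policy(r) for r in normal]))

# Maintenance: EA2 tags the routes it sends to EA1 with the g-shut community.
maintenance = [Route("198.51.100.0/24", "EA1", communities=frozenset({G_SHUT})),
               Route("198.51.100.0/24", "EB1")]
imported = [edge_import_policy(r) for r in maintenance]
print(load_balancing_set(imported))
print(imported[0])
```

Once the tag is removed after maintenance, the same policy leaves both routes at LOCAL_PREF 100 and O1 returns to load-balancing, which reflects the reversibility that RFC 6198 asks for.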
4.3. Graceful restart for BGP
The original BGP standard specifies that, upon loss of TCP connectivity to a peer, the peer must be declared down and all paths received from it removed from the routing table [74]. The procedure is the same when a session with a BGP peer restarts. This causes a convergence period that can lead to packet loss. Session restarts are common in ISPs, mostly induced by maintenance operations or Route Processor failovers. Network operators, therefore, requested features that could reduce the unavailability period in these cases. Recent routers decouple the elements performing control plane operations from those performing data plane operations. The route processors send FIB updates to the line cards, which forward packets based on the FIB [62]. Thanks to this
architecture, the router can keep forwarding packets independently of the state of the routing protocol sessions [15]. Operating in this state is recommended only for short periods, to limit the risk of forwarding along outdated routes, but it can be leveraged in case of a session restart. We discuss this topic in more detail in Section 5.1. Graceful restart, defined in [79], describes a procedure that allows BGP speakers to communicate their ability to maintain packet forwarding over short disruptions of the BGP session. If a router knows that a BGP peer supports this feature, it can keep forwarding packets to the peer upon session failure. Routers that cannot themselves preserve forwarding across a session restart may still understand the capability, which allows their peers that do support the feature to use it. Sangli et al. [79] also define a marker, called End-of-RIB, that signals the end of the transmission of the whole RIB. The End-of-RIB marker is required by the GR mechanism. The operation of graceful restart is as follows. Routers announce the graceful restart capability when establishing a session with a peer. The message describing the capability lists the address families for which the capability is supported. If a router announces the GR capability without listing any address family, it cannot preserve its own forwarding function, but it supports the procedure when performed by its peers. After a BGP session is torn down, routers that understand the GR capability continue forwarding packets to the affected router and mark the routes received from it as stale. If the session was ended by a NOTIFICATION message, the normal procedure for tearing down the session is followed (see footnote 8). There is a limit on the time during which routers blindly forward the packets.
If the session is still down when this timer expires, the router removes the paths received from the restarting peer from its routing table. After the restarting router establishes a new TCP session with the peer, an initial exchange of all paths takes place. When the exchange is over, the peer sends an End-of-RIB marker, which signals to the receiving router that it can resume its convergence process. The router must then eliminate all routes still marked as stale, i.e., those that were not refreshed during this phase. Graceful restart must be used with caution, as an inadequate deployment can lead to unnecessary traffic loss. It is recommended to use this mechanism together with the graceful restart mechanisms of the IGPs. When deployed correctly, however, GR can avoid unavailability of several tens of seconds due to BGP session restarts [84,89]. Current high-end routers support Non-Stop Routing procedures (Section 5.2), which can maintain the routing session even after switching over to the backup RP. This type of procedure might replace graceful restart, as it does not depend on the cooperation of neighboring routers. Nevertheless, it is important to understand and consider GR for cases in which not all routers support NSR or in which the session restart is not due to an RP switchover.
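The bookkeeping performed by the router receiving a restarting peer's routes (stale marking, End-of-RIB sweep, timer expiry) can be sketched as a small state holder. It is a simplified illustration for a single peer and a single address family, not the procedure text of [79]; the class and method names are hypothetical, and timers are modelled as explicit method calls rather than real clocks.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class GracefulRestartHelper:
    """Routes learned from one GR-capable peer, as kept by the receiving router."""
    routes: Dict[str, str] = field(default_factory=dict)  # prefix -> next hop
    stale: Set[str] = field(default_factory=set)

    def on_update(self, prefix: str, next_hop: str) -> None:
        self.routes[prefix] = next_hop
        self.stale.discard(prefix)            # route refreshed by the peer

    def on_withdraw(self, prefix: str) -> None:
        self.routes.pop(prefix, None)
        self.stale.discard(prefix)

    def on_session_down(self, by_notification: bool = False) -> None:
        if by_notification:
            # Normal procedure: the peer is declared down, all its paths are removed.
            self.routes.clear()
            self.stale.clear()
        else:
            # Keep forwarding with the existing routes, but mark them all stale.
            self.stale = set(self.routes)

    def _flush_stale(self) -> None:
        for prefix in self.stale:
            del self.routes[prefix]
        self.stale.clear()

    def on_end_of_rib(self) -> None:
        self._flush_stale()                   # whatever was not refreshed is removed

    def on_timer_expired(self) -> None:
        self._flush_stale()                   # the peer did not come back in time


helper = GracefulRestartHelper()
helper.on_update("192.0.2.0/24", "EA1")
helper.on_update("198.51.100.0/24", "EA1")
helper.on_session_down()                      # TCP session lost, no NOTIFICATION
print(sorted(helper.routes), sorted(helper.stale))
helper.on_update("192.0.2.0/24", "EA1")       # the peer re-advertises one prefix
helper.on_end_of_rib()
print(helper.routes)
```

Between the session loss and the End-of-RIB marker, both prefixes stay usable for forwarding; only the prefix the peer did not re-advertise is removed once the marker arrives.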
8 The IETF is currently discussing an extension to GR that would allow NOTIFICATION messages to also trigger the described mechanism [83].
As we just described, graceful restart requires peers to exchange all their paths upon session re-establishment. This procedure is time-consuming, but necessary for the peers to detect stale routes. A recently proposed improvement to this mechanism, called Enhanced Graceful Restart [65], seeks to speed up convergence by avoiding the re-advertisement of all paths. In Enhanced GR, routers exchange “version” numbers of their routing information, which helps them keep track of the state of the updates received from their peers. The version number allows routers to identify the routes that the remote speaker preserved across the session restart. Enhanced GR is a new BGP capability, independent of graceful restart, and BGP speakers can announce both. Patel et al. [65] explain the details of the operation of this mechanism.
4.4. Revised error handling for BGP
RFC 4271 [74] defines the procedures that BGP speakers should execute after encountering errors in any type of BGP message. If the error is in an UPDATE message, the specification requires that the session with the peer be reset. This reset also affects the connectivity of all other NLRI exchanged over the session. This error handling procedure has caused undesired effects in real networks. For instance, in 2010 a router taking part in a research experiment announced an experimental attribute that triggered disruptions in a significant part of the Internet [76]. These events motivated the definition of an improved error handling mechanism that would prevent such incidents. The specification of the new error handling mechanism is still ongoing; its current status is described in [82] and [85]. Although details are still being worked out, the basic idea is to define error handling mechanisms in which errors in UPDATE messages do not affect the service of other NLRI. In case of corrupted UPDATE messages, BGP speakers should treat the affected NLRIs as withdrawn instead of resetting the session.
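The difference between the two behaviours can be shown with a toy receiver. This is a sketch under simplifying assumptions: the `malformed` flag stands in for whatever attribute check fails, the class and field names are hypothetical, and only the "withdraw the affected NLRI" case of the revised procedures is modelled (the drafts in [82] and [85] distinguish several error classes).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Update:
    nlri: List[str]
    next_hop: str
    malformed: bool = False      # stands in for a failed attribute check


class Receiver:
    def __init__(self, revised_error_handling: bool):
        self.revised = revised_error_handling
        self.adj_rib_in: Dict[str, str] = {}
        self.session_resets = 0

    def receive(self, update: Update) -> None:
        if not update.malformed:
            for prefix in update.nlri:
                self.adj_rib_in[prefix] = update.next_hop
        elif self.revised:
            # Revised handling: only the NLRI carried in the faulty UPDATE are withdrawn.
            for prefix in update.nlri:
                self.adj_rib_in.pop(prefix, None)
        else:
            # RFC 4271 behaviour: reset the session, losing every path from the peer.
            self.adj_rib_in.clear()
            self.session_resets += 1


for revised in (False, True):
    rx = Receiver(revised)
    rx.receive(Update(["192.0.2.0/24", "198.51.100.0/24"], "EA1"))
    rx.receive(Update(["203.0.113.0/24"], "EA1", malformed=True))
    print(revised, sorted(rx.adj_rib_in), rx.session_resets)
```

A single malformed UPDATE wipes out both healthy prefixes under the original procedure, whereas with revised handling they remain untouched.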
4.5. Conclusion on session level techniques
In this section, we examined several techniques proposed in recent years to refine BGP session management. Each of the introduced methods focuses on maintaining network availability under a particular type of event. Table 3 summarizes these techniques, including the compatibility and operational information that operators must consider before deploying them.
5. Router architecture improvements
Router architectures have changed in the last few years, driven by the need for faster forwarding. Early router architectures were non-modular: the Route Processor (RP), the entity that performs the control plane operations, was tightly coupled to the elements that perform packet forwarding. Current routers follow a distributed architecture, in which the RP is independent of the forwarding functions, which are entirely performed by line cards [7,66,91]. This separation of functions decoupled the control and data planes within the router. Although the main objective was to design routers
that could forward packets as fast as possible, manufacturers also leveraged such distributed router architectures to improve network availability. This section describes some of the most relevant mechanisms introduced in these architectures for this purpose.
5.1. Nonstop Forwarding
Most manufacturers nowadays support maintaining forwarding state when the control plane is restarted on the same or on a different RP [88,91]. This feature is usually called Non-Stop Forwarding (NSF). Modern routers can implement NSF thanks to the separation of control-plane and data-plane functions within the router [8]. Under this architecture, the RP uses the routing protocols to calculate the FIB and then sends the FIB updates to the line cards. This approach allows routers to keep forwarding independently of the state of the route processor. On its own, NSF cannot improve the availability of routers. It is, however, leveraged by other methods, such as graceful restart (Section 4.3), to maintain packet forwarding during short disruptions of BGP sessions.
5.2. Nonstop Routing for BGP
High-performance routers require redundancy for all their key hardware components, such as RPs. The backup RP is activated if the primary fails or must be disabled for maintenance. The switchover procedure creates an unavailability period in which packets are lost. Due to the relative frequency of this operation, there was demand for a cleaner switchover between the active and backup RPs. Early router architectures kept the backup RP in a cold state. In case of a switchover, the backup RP needed to restart, in a procedure that could take several seconds [43,90]. A second generation of routers implemented partial synchronization between the main and backup route processors. The backup RP maintained some information on the state of the router, which allowed a faster restart [43,90]. However, the backup RP still lacked the session state of the routing protocols. Therefore, the IGP and BGP processes needed to restart after a switchover. In the last few years, manufacturers have been producing routers that keep the backup RP fully synchronized with the primary RP. This allows switchovers between the primary and backup route processors without restarting sessions. This mechanism is called Non-Stop Routing (NSR) and is currently supported on some high-end routing platforms [2,8,9,41,46,88,91]. Graceful restart (Section 4.3) and Non-Stop Routing can both be used to maintain network availability during RP switchovers. If both features are available, the latter may be preferred, as it depends neither on a protocol extension nor on the cooperation of peer routers. However, GR is still important for cases in which routers do not support NSR.
5.3. Improvements in FIB implementation
This section describes improvements towards high availability stemming from enhancements to the implementation of the FIB.
Fig. 7. Dual connected ASes.
Routers maintain the paths from all routing protocols in a data structure called the RIB. Routing protocols might provide the router with more than one valid path to reach a prefix. From the available paths, a router chooses a set to forward packets; these paths are stored in the FIB (see footnote 9). Modern routers are normally built with a distributed architecture, in which the line cards are interconnected through a switching fabric. While RIBs are maintained by the route processor, FIBs are downloaded into the line cards. After receiving a packet, a line card uses the FIB to determine whether it can send the packet out of one of its own ports or must send it to another line card. The initial implementations of FIBs contained only the destination prefix and enough information to reach the next hop on the network. When the next hop is reached through an Ethernet interface without encapsulation, the FIB links the destination prefix to the outgoing interface and the MAC address of the next hop. This implementation of the FIB is usually called a plain FIB [26]. The reason behind this simple structure was performance: with plain FIBs, the line cards can forward a packet after a single FIB look-up. In Fig. 8a we show an example of a plain FIB for the reference scenario depicted in Fig. 7. If the best path towards the prefixes announced by router D2 is via EA1, the FIB only contains information on how to reach the direct next hop IA1 (the next hop on the shortest path towards EA1). While plain FIBs were developed to achieve good forwarding performance, they can be inefficient when failures occur. The topology from Fig. 7 can be used to illustrate this issue. Let us consider that router IA1 fails. With a well-configured IGP, router O1 can identify the failure and calculate a backup path through router IB1 within a few tens of milliseconds [26,44].
However, O1 then needs additional time to update its forwarding table. If router D2 announces a large number of BGP prefixes, this additional time can dominate the convergence. For instance, if AS2 announces around 500k prefixes, the failure could lead to a BGP convergence time of several tens of seconds [26]. Although all prefixes announced by D2 share the same path, the plain FIB treats them independently, so every affected entry must be rewritten. An evolution of the plain FIB, called hierarchical FIB, tackles this problem by adding levels of indirection [26,44]. An example of a hierarchical FIB is depicted in Fig. 8b.
9 Note the difference between the RIB and the FIB. The RIB is the combination of the routes from all routing protocols in the router. The subset of routes that are used for forwarding forms the FIB. The number of routes per destination in the FIB can vary depending on the network topology, the routing protocol operation, and the ability of the router to support multipath.
Fig. 8. Different implementations of FIBs.
As shown in the figure, all prefixes announced by router D2 now point to a shared structure for their indirect (BGP) next-hop EA1, which in turn resolves to the direct next-hop IA1. If IA1 fails, the hierarchical FIB only needs to modify the direct next-hop used to reach EA1 (IB1 in this case). This single change can be executed much faster than rewriting every entry of a plain FIB.

In Section 3 we discussed the convergence benefits of maintaining additional paths. Nevertheless, if a FIB only stores the best paths, the router must still update the FIB after a failure. A hierarchical FIB can leverage path diversity by pre-storing valid backup paths, so that it can quickly switch to them if necessary. We can illustrate this case using the topology from Fig. 7. After employing any of the techniques discussed in Section 3, O1 could receive a set of paths to D2's prefixes via EA1 and another set of paths to the same prefixes via EB1. As depicted in Fig. 8c, the hierarchical FIB can keep the paths through EB1 in memory. If EA1 fails, O1 can quickly switch to the backup paths.

In [26], the authors perform various experiments to quantify the benefits of the hierarchical FIB. For both edge and core failures, they report that the hierarchical FIB effectively decouples the convergence time of the network from the number of affected prefixes. In cases affecting more than 300k prefixes, a hierarchical FIB can reduce convergence time from multiple tens of seconds to a few hundred milliseconds [21,27]. The techniques described in this section can be extended to cases in which more than one path is used (i.e. load balancing) or in which some kind of encapsulation is used (e.g. MPLS, GRE). The hardware implementation of a hierarchical FIB is challenging, as the number of look-ups per packet increases while overall forwarding performance must not be degraded.
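The indirection and the pre-stored backup can be sketched in Python as follows, assuming a two-level structure: each prefix points to a shared BGP path-list object (primary BGP next-hop plus an optional pre-installed backup), and each BGP next-hop resolves through a separate IGP table to its direct next-hop. The class and method names are hypothetical, the router labels follow Fig. 7, and we additionally assume that EB1 is reached through IB1; real routers implement these structures in line-card hardware, so this is only a model of the data layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class BgpPathList:
    """Shared by every prefix learned via the same BGP next-hop(s)."""
    primary: str            # primary BGP (indirect) next-hop, e.g. "EA1"
    backup: Optional[str]   # pre-installed backup BGP next-hop, e.g. "EB1"
    primary_up: bool = True

    def active(self) -> str:
        if self.primary_up or self.backup is None:
            return self.primary
        return self.backup


class HierarchicalFib:
    def __init__(self) -> None:
        self.igp: Dict[str, str] = {}               # BGP next-hop -> direct next-hop
        self.prefixes: Dict[str, BgpPathList] = {}  # prefix -> shared path list
        self.writes = 0                             # FIB objects modified by repairs

    def lookup(self, prefix: str) -> Tuple[str, str]:
        # Two dereferences instead of the single lookup of a plain FIB.
        bgp_nh = self.prefixes[prefix].active()
        return bgp_nh, self.igp[bgp_nh]

    def igp_reroute(self, bgp_nh: str, new_direct_nh: str) -> None:
        # Core failure (e.g. IA1 down): one write repairs every prefix behind bgp_nh.
        self.igp[bgp_nh] = new_direct_nh
        self.writes += 1

    def bgp_next_hop_down(self, path_list: BgpPathList) -> None:
        # Edge failure (e.g. EA1 down): flip the shared path list to its backup.
        path_list.primary_up = False
        self.writes += 1


fib = HierarchicalFib()
fib.igp["EA1"] = "IA1"
fib.igp["EB1"] = "IB1"  # assumption: EB1 is reached through IB1
shared = BgpPathList(primary="EA1", backup="EB1")
for i in range(500_000):
    fib.prefixes[f"p{i}"] = shared  # all of D2's prefixes share one object

fib.igp_reroute("EA1", "IB1")         # IA1 fails: EA1 is now reached via IB1
print(fib.lookup("p42"), fib.writes)  # ('EA1', 'IB1') 1
fib.bgp_next_hop_down(shared)         # EA1 fails: switch to the stored EB1 paths
print(fib.lookup("p42"), fib.writes)  # ('EB1', 'IB1') 2
```

In both failure cases the number of writes stays constant no matter how many prefixes share the path list, which is the decoupling from prefix count reported in [26]; the cost is the extra dereference on every lookup, which is the hardware challenge mentioned above.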
Please cite this article as: J.C. Cardona et al., Bringing high availability to BGP: A survey, Computer Networks (2015), http://dx.doi.org/10.1016/j.comnet.2015.09.005
The availability, performance, and constraints of the features described in this section vary by manufacturer [8].

6. High availability in MP–BGP and BGP/MPLS L3VPNs
MP–BGP allows the exchange of routing information for multiple network-layer protocols [4]. MPLS Virtual Private Networks (MPLS VPNs) are one of the most prominent network services that rely on MP–BGP. Some of the techniques described in previous sections can be extended to MP–BGP or BGP/MPLS VPNs. For instance, graceful restart for BGP with MPLS is defined in [73], and the techniques described in Section 5.3 can be extended to MPLS networks [26,44]. Nevertheless, MP–BGP faces other specific challenges that cannot be tackled with any of the methods described so far. In this section, we describe some of the challenges to service availability faced by MP–BGP and BGP/MPLS VPNs, and provide an overview of mechanisms specifically designed to handle them.
6.1. Maintaining one session per AFI/SAFI
Some service providers use a common route reflection infrastructure to distribute IPv4 Internet routes, IPv6 Internet routes, and BGP/MPLS VPN routes. Since the amount of information carried by each address family can be significant, and all address families are by default served over the same sessions, a resource-intensive update process for one address family might delay the propagation of updates for the others. This “Head-of-Line” blocking delays convergence. Another concern with mixing AFI/SAFI families stems from the current approach to errors in UPDATE messages, in which an error forces a session reset. When paths of all AFI/SAFIs are announced over the same session, an error in one message tears down the session, thereby affecting reachability for all address families. An operational solution to this problem is to explicitly configure separate BGP sessions for each type of service. However, this approach is operationally cumbersome, since it requires an independent configuration for each session. A new capability for automatic MP–BGP multisession is being developed [80]. This capability lets routers establish disjoint sessions with each peer, one per AFI/SAFI, and thereby achieve the same level of isolation.

6.2. Constrained route distribution

BGP/MPLS VPN networks may serve a large number of routes over the same infrastructure. The amount of memory required on a Provider Edge (PE) router to hold the routing tables of all VPNs can be considerable. To reduce memory consumption, edge routers can be configured to keep only the paths that belong to locally configured VPNs [70,77]. PEs identify the paths that must be imported into their routing tables using an extended community called the Route Target (RT). Any path that does not carry one of the RTs corresponding to locally served VPNs can thus be discarded by the router upon reception. This mechanism allowed operators to successfully use middle-end routers in points of presence that serve only a few VPN clients. However, when the set of locally configured VPNs changes, the PE might need to import paths that it previously rejected. The PE must then send a ROUTE-REFRESH message to its route reflectors in order to re-obtain the full routing table from them. In large networks, the exchange of the routing information of all VPNs can take minutes, which defeats the purpose of reducing overhead and “Head-of-Line” blocking. Ideally, a PE should simply be able to indicate to its route reflectors which routes it must import [59]. This has been achieved with “Constrained Route Distribution for BGP/MPLS IP VPNs”, defined in [54]. Using this technique, PEs transmit the RTs of the routes they need to import to their route reflectors, and each route reflector only announces the routes tagged with the requested RTs.10 This solution is now supported by most router vendors [3,20,45].

10 A related feature, called Outbound Route Filtering [17], is used in typical BGP sessions to signal to a BGP peer the filters that will be applied to incoming routes, thus avoiding their unnecessary transmission.

7. Current and future developments

In this paper, we have described the main techniques developed to improve the availability of networks running BGP. Even if some of these methods are still in the process of standardization, different manufacturers already support them, and most of them are already deployed in operational networks. BGP is, nonetheless, still expanding. In this section, we briefly introduce potential developments of BGP towards future high availability features and their impact on research. We envision the inclusion of high availability features in BGP at four different levels: (1) further specialization of the protocol for network events; (2) the inclusion of high availability in new BGP services; (3) the interplay of BGP HA techniques with Software Defined Networks (SDN); and (4) the analysis of HA techniques in global-scale convergence. We explain each of these levels below.

Further specialization of BGP. We have already discussed several mechanisms that attempt to specialize BGP for network events in Section 4. Similar techniques may be developed in the coming years. These types of extensions are usually very specific and useful only for particular networks. BGP persistence [95] is an example of this type of proposal. This technique expands the concept of graceful restart, described in Section 4.3, and allows a router to keep routes that were received over a recently failed BGP session. Unlike graceful restart, BGP persistence lowers the preference of stale routes but, if no alternate route is found, allows them to stay in the routing table for a configurable amount of time. The foreseen application for such a feature is a very controlled environment, typically reachability to the prefixes addressing the edge networks of an eyeball ISP. In that scenario, if connectivity to these prefixes is lost, it is unlikely to be recovered through another part of the network. The routes maintained by this technique hence serve as “last resort” routes and are useful in situations in which the control plane of the network fails while the forwarding plane still functions. The scope of BGP persistence is known to be only suitable for
such limited but realistic cases. Uses of the feature in more complex environments could lead to forwarding loops. This technique is recommended only for networks in which the operator has carefully assessed its proper use [95].

HA in new BGP services. BGP supports different services that were created in the last few years or that are still in development. Examples of these services are Internet eXchange Point (IXP) route servers [42] and BGP for large-scale data centers [51]. If any of these new services becomes important for network operation in the future, it will be compelling to add high availability features to them. IXP route servers, for instance, are extensions to eBGP that allow a centralized BGP speaker to redistribute the prefixes announced by the members of an IXP. Route servers help maintain configuration scalability for IXP members and play a role similar to that of route reflectors within networks. However, like route reflectors, route servers can also harm route diversity, as they only advertise a single route per prefix [29,42]. Furthermore, as IXPs act as a single broadcast domain and there is no IGP running among IXP members, the deployment of a route server can lead to network “black holes” inside the IXP. A black hole is a network failure that is not detected by any control plane protocol. In the case of an IXP with a route server, a black hole is created when the connectivity between two members fails while both members still have connectivity with the route server. IXPs are important pieces of the Internet fabric [12]; it therefore makes sense to implement high availability features that solve these kinds of problems. Techniques such as Diverse Paths (Section 3.3) or Add-Paths (Section 3.4) can be adapted to the IXP environment and used to solve the lack of path diversity [29]. Regarding network black holes, BFD (Section 4.1), or extensions of it [1], can be run between IXP members in order to quickly detect failures between them.

Table 4
Summary of BGP HA techniques (for each technique: description, objective, networks for which it is recommended, and considerations).

Path diversity mechanisms
Description: Mechanisms that allow the announcement of more than one path per prefix.
Objective: Augment diversity to reduce path hiding after network failures.
Recommended for: Networks with high connectivity diversity.
Considerations: From the methods exposed in Section 3, an operator must choose the one(s) suitable for the network.

BFD
Description: Independent protocol for detecting correct connectivity with a neighbor.
Objective: Fast detection of connectivity disruption with a neighbor.
Recommended for: All types of networks.

Graceful shutdown
Description: Reducing the preference of paths affected by network maintenance.
Objective: Pre-convergence of the network before planned maintenance of devices.
Recommended for: All types of networks.
Considerations: Feature support by vendors can provide automatic graceful shutdown operation.

Graceful restart
Description: BGP capability and mechanism that permit a router to keep forwarding packets to a recently lost BGP peer for a short period of time.
Objective: Preserve packet forwarding during a short downtime of the control plane, when the data plane of the router still functions.
Recommended for: Networks formed by high-end routers that support the feature.

NSR
Description: Maintains session states between active and backup route processors.
Objective: Avoid packet loss upon a switchover to the backup route processor (RP).
Recommended for: Networks containing devices equipped with redundant route processors that support the feature.

Revised error handling for BGP
Description: Improved error handling mechanism for BGP sessions.
Objective: Avoid the restart of a whole session due to malformed updates for a single NLRI.
Recommended for: All types of networks.
Considerations: The mechanism must be standardized before it can be deployed.

Hierarchical FIB
Description: Additional level of indirection in the FIB that allows the direct next-hop of multiple prefixes to be modified quickly, in a single step. Furthermore, it can store alternative paths in the FIB.
Objective: Reduce packet loss after failures in networks with high levels of diversity.
Recommended for: All types of networks.
Considerations: Only supported on routers with modern architectures. Path diversity methods should be implemented for an efficient deployment.

Multisession BGP
Description: Creation of an individual session per BGP address family.
Objective: For BGP speakers exchanging more than one address family, protect each address family from failures in the others.
Recommended for: Networks containing routers exchanging more than one address family via BGP.
Considerations: Manual configuration is always possible. A new feature for automatic multisession establishment is in the standardization process.

Constrained route distribution
Description: Permits a device to ask for the VPN NLRIs marked with specific Route Targets.
Objective: Avoid the need for a complete route refresh after a new VPN is configured on a device.
Recommended for: Networks with MPLS-VPN services.

BGP in SDN. The future of BGP may depend strongly on its interplay with new protocols and solutions born from Software Defined Networking (SDN). SDN is an umbrella term that refers to a series of new technologies that decouple the
control plane from forwarding devices [40]. SDN applications can offer network engineers tools that facilitate the implementation of new functionalities and services. SDN solutions could interact with BGP in different ways. Some SDN devices, for instance, could coexist in the network with legacy BGP speakers [78]. Other SDN applications can leverage the flexibility offered by BGP and use it to exchange different types of information [34,50,55]. Network programmers should be careful when designing highly available services with SDN tools. For example, SDN applications that interact directly with the RIBs or FIBs of a router [40] should rely on features such as the hierarchical FIB (Section 5.3) to allow the router to quickly switch to a backup path in case of a failure. SDN applications that communicate with BGP speakers using the BGP protocol itself should be aware of features like graceful restart (Section 4.3) to maintain connectivity during certain network events. In conclusion, the development of SDN standards and architectures should preserve the benefits of BGP HA features, or provide alternatives that match the resilience of BGP [72,96,102].

Effects on global convergence. In this paper, we covered techniques that provide high availability against internal disruptive events. Several authors have analyzed the dynamics of the convergence of the global routing table [24,37,48,49,52,63] and the impact of external disruptions on individual networks [61,99,100], and have suggested different solutions to accelerate global convergence [14,33,68]. We did not include techniques that aim to speed up global convergence because (1) a single AS has little control over external disruptions and global convergence and (2) an internal event can potentially affect a larger amount of traffic for the specific AS [93]. Nevertheless, the effect of a large-scale deployment of the techniques we described is still an open question.
Could, for instance, the widespread deployment of ADD-PATH or graceful shutdown speed up the convergence of the global routing table? Future research could measure or analyze this effect.
8. Conclusion
BGP currently supports a large number of valuable network services. Over the last years, service providers, manufacturers, and researchers have developed several techniques that can improve the availability of BGP networks. In this paper we have provided a survey of these techniques. In Table 4, we summarize the techniques described here, their application, and the types of networks where they are recommended. There is no single guideline for a correct implementation of HA in BGP networks. The methods described here are complementary, as they attempt to improve the availability of BGP at different levels. From the available techniques, a network engineer must select the features that are most suitable for their network, in terms of support, operational effort, and benefits.
Acknowledgments
Camilo Cardona and Pierre Francois were partially funded for this work by Cisco Systems, under the Silicon Valley Community Foundation grant #CG 573362. The opinions and views expressed in this paper are those of the authors and
do not represent the views of Cisco Systems or the Silicon Valley Community Foundation.
References
[1] N. Akiya, C. Pignataro, D. Ward, Seamless Bidirectional Forwarding Detection (S-BFD) for IPv4, IPv6 and MPLS, draft-ietf-bfd-seamless-ip-01. Work in Progress. IETF Draft (2015).
[2] Alcatel-Lucent, Highly Reliable IP Networks. Application Note, 2007. (WEB). URL http://www3.alcatel-lucent.com/solutions/mpls4ips/docs/HiRel_an.pdf
[3] Alcatel-Lucent, BGP, 2015. (https://infoproducts.alcatel-lucent.com/html/0_add-h-f/93-0074-HTML/7750_SROS_Routing_Procols_Guide/bgp.pdf).
[4] T. Bates, R. Chandra, D. Katz, Y. Rekhter, Multiprotocol extensions for BGP-4, IETF RFC 4760 (2007).
[5] T. Bates, E. Chen, R. Chandra, BGP route reflection: an alternative to full mesh IBGP, IETF RFC 4456 (2006).
[6] Z. Ben Houidi, M. Meulle, R. Teixeira, Understanding slow BGP routing table transfers, in: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, ACM, 2009, pp. 350–355.
[7] V. Bollapragada, C. Murphy, R. White, Inside Cisco IOS Software Architecture, Cisco Press, 2000.
[8] C. Bookham, et al., Versatile Routing and Services with BGP: Understanding and Implementing BGP in SR-OS, John Wiley & Sons, 2014.
[9] Brocade, BROCADE MLX SERIES, Multiservice IP/MPLS Routers, 2010. (WEB). URL http://www.brocadechina.com/download/MLX/Brocade_MLX_Service_Provider_SB.pdf
[10] M.-O. Buob, M. Meulle, S. Uhlig, Checking for optimal egress points in iBGP routing, in: Proceedings of the 6th International Workshop on Design and Reliable Communication Networks, 2007. DRCN 2007., IEEE, 2007, pp. 1–8.
[11] M.-O. Buob, S. Uhlig, M. Meulle, Designing optimal iBGP route-reflection topologies, in: Proceedings of the NETWORKING 2008 Ad Hoc and Sensor Networks, Wireless Networks, Next Generation Internet, Springer, 2008, pp. 542–553.
[12] J.C. Cardona Restrepo, R. Stanojevic, A history of an internet exchange point, ACM SIGCOMM Comput. Commun. Rev. 42 (2) (2012) 58–64.
[13] R. Chandra, P. Traina, T. Li, BGP communities attribute, IETF RFC 1997 (1996).
[14] J. Chandrashekar, Z. Duan, Z.-L. Zhang, J. Krasky, Limiting path exploration in BGP, in: Proceedings of the 24th Annual Joint Conference of the IEEE Computer and Communications Societies INFOCOM 2005, 4, IEEE, 2005, pp. 2337–2348.
[15] D.-F. Chang, R. Govindan, J. Heidemann, An empirical study of router response to large BGP routing table load, in: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, ACM, 2002, pp. 203–208.
[16] E. Chen, Route refresh capability for BGP-4, IETF RFC 2918 (2000).
[17] E. Chen, Y. Rekhter, Outbound route filtering capability for BGP-4, IETF RFC 5291 (2006).
[18] P.-c. Cheng, J.H. Park, K. Patel, S. Amante, L. Zhang, Explaining BGP slow table transfers, in: Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems (ICDCS), IEEE, 2012, pp. 657–666.
[19] J. Choi, J.H. Park, P.-c. Cheng, D. Kim, L. Zhang, Understanding BGP next-hop diversity, in: Proceedings of the 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), IEEE, 2011, pp. 846–851.
[20] Cisco, Configuring BGP: RT Constrained Route Distribution, 2010. (http://www.cisco.com/c/en/us/td/docs/ios/iproute_bgp/configuration/guide/15_1s/irg_15_1s_book/irg_rt_filter.pdf).
[21] Cisco, BGP From Dinosaur to Racecar, 2012. (http://www.ciscoknowledgenetwork.com/files/202_03-05-2012_20120305_EMEAR_Webinar_v6.pdf?PRIORITY_CODE=).
[22] B. Decraene, P. Francois, C. Pelsser, A.J. Elizondo, Z. Ahmad, T. Takeda, Requirements for the graceful shutdown of BGP sessions, IETF RFC 6198 (2011).
[23] B. Decraene, P. Francois, Improving Network Availability Through the Graceful Shutdown of BGP Sessions, 2008. (http://data.proidea.org.pl/plnog/1edycja/materialy/prezentacje/decraene.pdf).
[24] A. Fabrikant, U. Syed, J. Rexford, There's something about MRAI: timing diversity can exponentially worsen BGP convergence, in: Proceedings of the INFOCOM, 2011, IEEE, 2011, pp. 2975–2983.
[25] C. Filsfils, BGP convergence in much less than a second, NANOG40 (2006).
[26] C. Filsfils, P. Mohapatra, J. Bettink, P. Dharwadkar, P. De Vriendt, Y. Tsier, V. Van Den Schrieck, O. Bonaventure, P. Francois, BGP Prefix Independent Convergence (PIC), Technical Report, Cisco, 2011.
[27] D. Fishburne, Routed Fast Convergence and High Availability, 2013 (http://monsterdark.com/wp-content/uploads/Routed-FastConvergence-and-High-Availability.pdf).
[28] P. Francois, O. Bonaventure, B. Decraene, P.-A. Coste, Avoiding disruptions during maintenance operations on BGP sessions, IEEE Trans. Netw. Serv. Manag. 4 (3) (2007) 1–11.
[29] P. Francois, J.C. Cardona Restrepo, A. Simpson, ADD-PATH for Route Servers, draft-francois-idr-rs-addpaths-01. Work in Progress. IETF Draft (2014).
[30] P. Francois, B. Decraene, C. Pelsser, K. Patel, C. Filsfils, Graceful BGP session shutdown, draft-ietf-grow-bgp-gshut-06. Work in Progress. IETF Draft (2014).
[31] P. Francois, C. Filsfils, J. Evans, O. Bonaventure, Achieving sub-second IGP convergence in large IP networks, ACM SIGCOMM Comput. Commun. Rev. 35 (3) (2005) 35–44.
[32] L. Gao, On inferring autonomous system relationships in the Internet, IEEE/ACM Trans. Netw. (ToN) 9 (6) (2001) 733–745.
[33] L. Gao, J. Rexford, Stable Internet routing without global coordination, IEEE/ACM Trans. Netw. (ToN) 9 (6) (2001) 681–692.
[34] H. Gredler, J. Medved, A. Farrell, S. Previdi, S. Ray, North-Bound Distribution of Link-State and TE Information using BGP, draft-ietf-idr-ls-distribution-10. Work in Progress. IETF Draft (2015).
[35] B.R. Greene, P. Smith, Cisco ISP Essentials, Cisco Press, 2002.
[36] T. Griffin, G. Huston, BGP wedgies, IETF RFC 4264 (2005).
[37] T.G. Griffin, B.J. Premore, An experimental analysis of BGP convergence time, in: Proceedings of the 9th International Conference on Network Protocols, 2001, IEEE, 2001, pp. 53–61.
[38] T.G. Griffin, G. Wilfong, Analysis of the MED oscillation problem in BGP, in: Proceedings of the 10th IEEE International Conference on Network Protocols, 2002, IEEE, 2002, pp. 90–99.
[39] E. Gutiérrez, D. Agriel, E. Saenz, E. Grampin, RRLOC: a tool for iBGP route reflector topology planning and experimentation, in: Proceedings of the Network Operations and Management Symposium (NOMS), 2014 IEEE, IEEE, 2014, pp. 1–4.
[40] S. Hares, R. White, Software-defined networks and the interface to the routing system (I2RS), IEEE Internet Comput. 17 (4) (2013) 84–88.
[41] Huawei, Feature Description - VRPv8, 2011, (WEB).
[42] E. Jasinska, N. Hilliard, R. Raszuk, N. Bakker, Internet Exchange Route Server, draft-ietf-idr-ix-bgp-route-server-06. Work in Progress. IETF Draft (2014).
[43] C.R. Johnson, Y. Kogan, Y. Levy, F. Saheban, P. Tarapore, VoIP reliability: a service provider's perspective, IEEE Commun. Mag. 42 (7) (2004) 48–54.
[44] Juniper, Ensuring Rapid Restoration in JUNOS OS-Based Networks, Whitepaper, 2010.
[45] Juniper, Configuring BGP Route Target Filtering for VPNs, 2014 (http://www.juniper.net/documentation/en_US/junos14.1/topics/usage-guidelines/vpns-configuring-bgp-route-target-filtering-invpns.html).
[46] Juniper, Nonstop Active Routing Concepts, 2014, (WEB). URL http://www.juniper.net/documentation/en_US/junos13.2/topics/concept/nsroverview.html
[47] D. Katz, D. Ward, Bidirectional forwarding detection (BFD), IETF RFC 5880 (2010).
[48] C. Labovitz, A. Ahuja, A. Bose, F. Jahanian, Delayed internet routing convergence, ACM SIGCOMM Comput. Commun. Rev. 30 (4) (2000) 175–187.
[49] C. Labovitz, A. Ahuja, R. Wattenhofer, S. Venkatachary, The impact of Internet policy and topology on delayed routing convergence, in: Proceedings of the 20th Annual Joint Conference of the IEEE Computer and Communications Societies. IEEE INFOCOM 2001, 1, IEEE, 2001, pp. 537–546.
[50] P. Lapukhov, A. E., P. Marques, E. Nkposong, Use of BGP for opaque signaling, draft-lapukhov-bgp-opaque-signaling-00. Work in Progress. IETF Draft (2014).
[51] P. Lapukhov, A. Premji, E. Mitchell, Use of BGP for routing in large-scale data centers, draft-ietf-rtgwg-bgp-routing-large-dc-01. Work in Progress. IETF Draft (2015).
[52] Z.M. Mao, R. Govindan, G. Varghese, R.H. Katz, Route flap damping exacerbates Internet routing convergence, in: Proceedings of the ACM SIGCOMM Computer Communication Review, 32, ACM, 2002, pp. 221–233.
[53] P. Marques, R. Fernando, P. Mohapatra, H. Gredler, E. Chen, Advertisement of the best external route in BGP, draft-ietf-idr-best-external-05. Work in Progress. IETF Draft (2012).
[54] P. Marques, J. Guichard, R. Raszuk, R. Bonica, K. Patel, L. Fang, L. Martini, Constrained Route Distribution for Border Gateway Protocol/MultiProtocol Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual Private Networks (VPNs), IETF RFC 4684 (2006).
[55] P. Marques, M. Napierala, L. Fang, N. Bitar, A. Shukla, P. Pan, BGP-signaled end-system IP/VPNs, draft-ietf-l3vpn-end-system-04. Work in Progress. IETF Draft (2014).
[56] D. McPherson, V. Gill, D. Walton, A. Retana, Border gateway protocol (BGP) persistent route oscillation condition, IETF RFC 3345 (2002).
[57] D. McPherson, J.G. Scudder, Autonomous system confederations for BGP, IETF RFC 5065 (2007).
[58] D. Meyer, BGP communities for data collection, IETF RFC 4384 (2006).
[59] I. Minei, P.R. Marques, Scalability considerations in BGP/MPLS IP VPNs, IEEE Commun. Mag. 45 (4) (2007) 26–31.
[60] P. Mohapatra, R. Fernando, R. Raszuk, C. Filsfils, Fast Connectivity Restoration Using BGP Add-path, draft-pmohapat-idr-fast-conn-restore-03. Work in Progress. IETF Draft (2013).
[61] E.S. Myakotnykh, O.J. Wittner, B.E. Helvik, A. Abdelkefi, J.K. Hellan, O. Kvittem, T. Skjesol, A. Øslebø, An analysis of interdomain availability and causes of failures based on active measurements, Telecommun. Syst. 52 (2) (2013) 847–860.
[62] K.-K. Nguyen, B. Jaumard, A. Agarwal, A distributed and scalable routing table manager for the next generation of IP routers, IEEE Netw. 22 (2) (2008) 6–14.
[63] R. Oliveira, B. Zhang, D. Pei, L. Zhang, Quantifying path exploration in the internet, IEEE/ACM Trans. Netw. 17 (2) (2009) 445–458.
[64] J.H. Park, R. Oliveira, S. Amante, D. McPherson, L. Zhang, BGP route reflection revisited, IEEE Commun. Mag. 50 (7) (2012) 70–75.
[65] K. Patel, E. Chen, J. Scudder, R. Fernando, Accelerated routing convergence for BGP graceful restart, draft-ietf-idr-enhanced-gr-05. Work in Progress. IETF Draft (2014).
[66] D.E. Pavlichek, R. Chowbay, W.W. Downing III, et al., Juniper Networks Reference Guide: JUNOS Routing, Configuration, and Architecture, Addison-Wesley Professional, 2003.
[67] D. Pei, J. Van der Merwe, BGP convergence in virtual private networks, in: Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, ACM, 2006, pp. 283–288.
[68] C. Pelsser, O. Maennel, P. Mohapatra, R. Bush, K. Patel, Route flap damping made usable, in: Passive and Active Measurement, Springer, 2011, pp. 143–152.
[69] C. Pelsser, T. Takeda, E. Oki, K. Shiomoto, Improving route diversity through the design of iBGP topologies, in: Proceedings of the IEEE International Conference on Communications, 2008. ICC'08, IEEE, 2008, pp. 5732–5738.
[70] I. Pepelnjak, J. Guichard, MPLS and VPN Architectures: CCIP Edition, 1, Cisco Press, 2002.
[71] R. Raszuk, R. Fernando, K. Patel, D. McPherson, K. Kumaki, Distribution of diverse BGP paths, IETF RFC 6774 (2012).
[72] M. Reitblatt, M. Canini, A. Guha, N. Foster, FatTire: declarative fault tolerance for software-defined networks, in: Proceedings of the 2nd ACM SIGCOMM Workshop on Hot topics in Software Defined Networking, ACM, 2013, pp. 109–114.
[73] Y. Rekhter, R. Aggarwal, Graceful restart mechanism for BGP with MPLS, IETF RFC 4781 (2007).
[74] Y. Rekhter, T. Li, S. Hares, Border gateway protocol 4, IETF RFC 4271 (2006).
[75] A. Retana, Advertisement of Multiple Paths in BGP: Implementation Report, draft-ietf-idr-add-paths-implementation-00. Work in Progress. IETF Draft (2015).
[76] RIPE NCC, Duke University BGP Experiment, 2010, (WEB). URL https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgpexperiment
[77] E.C. Rosen, Y. Rekhter, BGP/MPLS IP virtual private networks (VPNs), IETF RFC 4364 (2006).
[78] C.E. Rothenberg, M.R. Nascimento, M.R. Salvador, C.N.A. Corrêa, S. Cunha de Lucena, R. Raszuk, Revisiting routing control platforms with the eyes and muscles of software-defined networking, in: Proceedings of the 1st Workshop on Hot topics in Software Defined Networks, ACM, 2012, pp. 13–18.
[79] S. Sangli, E. Chen, R. Fernando, J. Scudder, Y. Rekhter, Graceful restart mechanism for BGP, IETF RFC 4724 (2007).
[80] J. Scudder, C. Appanna, I. Varlashkin, Multisession BGP, draft-ietf-idr-bgp-multisession-07. Work in Progress. IETF Draft (2012).
[81] J. Scudder, R. Chandra, Capabilities advertisement with BGP-4, IETF RFC 5492 (2009).
[82] J. Scudder, E. Chen, P. Mohapatra, K. Patel, Revised Error Handling for BGP UPDATE Messages, draft-ietf-idr-error-handling-18. Work in Progress. IETF Draft (2014).
[83] J. Scudder, R. Fernando, K. Patel, J. Haas, Notification Message support for BGP Graceful Restart, draft-ietf-idr-bgp-gr-notification-03. Work in Progress. IETF Draft (2014).
[84] P. Sebos, J. Yates, G. Li, D. Rubenstein, M. Lazer, An integrated IP/optical approach for efficient access router failure recovery, in: Proceedings of the Optical Fiber Communications Conference, IEEE, 2003.
Please cite this article as: J.C. Cardona et al., Bringing high availability to BGP: A survey, Computer Networks (2015), http://dx.doi.org/10.1016/j.comnet.2015.09.005
[85] R. Shakir, Operational Requirements for Enhanced Error Handling Behaviour in BGP-4, draft-ietf-grow-ops-reqs-for-bgp-error-handling-07. Work in Progress. IETF Draft (2014).
[86] M. Shand, S. Bryant, IP fast reroute framework, IETF RFC 5714 (2010).
[87] A. Simpson, P. Francois, K. Patel, J. Uttaro, R. Fragassi, Best Practices for Advertisement of Multiple Paths in IBGP, draft-ietf-idr-add-paths-guidelines-07. Work in Progress. IETF Draft (2014).
[88] J. Sonderegger, O. Blomberg, K. Milne, S. Palislamovic, JUNOS High Availability, O’Reilly, 2009.
[89] K. Sriram, D. Montgomery, O. Borchert, O. Kim, D.R. Kuhn, Study of BGP peering session attacks and their impacts on routing performance, IEEE J. Selected Areas Commun. 24 (10) (2006) 1901–1915.
[90] S. Srivastava, Redundancy management for network devices, in: Proceedings of the 9th Asia-Pacific Conference on Communications, 2003. APCC 2003, vol. 3, IEEE, 2003, pp. 1157–1162.
[91] M. Tahir, M. Ghattas, D. Birhanu, S.N. Nawaz, Cisco IOS XR Fundamentals, Pearson Education, 2009.
[92] R. Teixeira, K. Marzullo, S. Savage, G.M. Voelker, In search of path diversity in ISP networks, in: Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement, ACM, 2003, pp. 313–318.
[93] R. Teixeira, J. Rexford, Managing routing disruptions in internet service provider networks, IEEE Commun. Mag. 44 (3) (2006) 160–165.
[94] S. Uhlig, S. Tandel, Quantifying the BGP routes diversity inside a tier-1 network, in: Proceedings of the Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communications Systems NETWORKING 2006, Springer, 2006, pp. 1002–1013.
[95] J. Uttaro, E. Chen, B. Decraene, J. Scudder, Support for Long-lived BGP Graceful Restart, draft-uttaro-idr-bgp-persistence-03. Work in Progress. IETF Draft (2013).
[96] N.L. Van Adrichem, B.J. Van Asten, F.A. Kuipers, Fast recovery in software-defined networks, in: Proceedings of the 3rd European Workshop on Software Defined Networking (EWSDN), Budapest, Hungary, 1–3 September 2014, EWSDN, 2014.
[97] V. Van den Schrieck, P. Francois, O. Bonaventure, BGP add-paths: the scaling/performance tradeoffs, IEEE J. Selected Areas Commun. 28 (8) (2010) 1299–1307.
[98] D. Walton, A. Retana, J. Scudder, E. Chen, Advertisement of Multiple Paths in BGP, draft-ietf-idr-add-paths-10. Work in Progress. IETF Draft (2014).
[99] F. Wang, L. Gao, J. Wang, J. Qiu, On understanding of transient interdomain routing failures, in: Proceedings of the 13th IEEE International Conference on Network Protocols. ICNP 2005, IEEE, 2005, p. 10.
[100] F. Wang, Z.M. Mao, J. Wang, L. Gao, R. Bush, A measurement study on the impact of routing events on end-to-end Internet path performance, ACM SIGCOMM Comput. Commun. Rev. 36 (4) (2006) 375–386.
[101] L. Wang, M. Saranu, J.M. Gottlieb, D. Pei, Understanding BGP session failures in a large ISP, in: Proceedings of the 26th IEEE International Conference on Computer Communications INFOCOM 2007, IEEE, 2007, pp. 348–356.
[102] D. Williams, H. Jamjoom, Cementing high availability in OpenFlow with RuleBricks, in: Proceedings of the 2nd ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking, ACM, 2013, pp. 139–144.
[103] R. Zhang, M. Bartell, BGP Design and Implementation, Cisco Press, 2003.
Juan Camilo Cardona graduated from the University Santo Tomás in Colombia as a Telecommunications Engineer. In 2008 he received an M.Sc. in Communications Engineering from Technische Universität München, Germany. He is currently a Ph.D. candidate in Telematics Engineering at University Carlos III of Madrid. Before joining the IMDEA Networks Institute, Juan Camilo worked for several years at Internet Service Providers and network integrators.
Pierre Francois obtained his Ph.D. degree in Computer Science from Université catholique de Louvain, Belgium, in 2007. He is now a staff researcher at the IMDEA Networks Institute, where he carries out research on network management in collaboration with ISPs. He is also a consultant on routing technologies at Cisco Systems. His work includes several papers published in top conferences and journals in the networking field, as well as multiple Internet Engineering Task Force (IETF) Working Group documents and RFCs in various working groups of the IETF Routing and IETF Operations and Management areas.
Bruno Decraene is an IP networking expert at Orange. His main topics of interest are unicast intra-domain and inter-domain routing in Service Provider networks. He works on a range of short- to long-term activities, such as engineering studies, IP network architecture, and longer-term IETF standardization, with the aim of bringing innovation to Orange networks. Before joining Orange Labs, he worked at France Telecom on information systems for network management and network mediation for billing.
John Scudder is a Distinguished Engineer at Juniper Networks. He has worked in the Internet industry since 1990, when he joined the Internet Engineering team at Merit Network, Inc., doing network engineering and support for the NSFNET. Since then he has worked at a variety of Internet companies, large and small. His interests include routing protocols, particularly BGP, and routing security. He co-chairs the IETF IDR (which standardizes BGP and its extensions) and SPRING working groups, and is a past co-chair of the IETF Routing Area working group.
Adam Simpson is a Product Manager in the Service Router Product Group at Alcatel-Lucent. He has more than 15 years of experience in network device manufacturing and has been involved in the product development of different networking technologies, including wireless, carrier Ethernet, and BGP. He has contributed to standardization through numerous IETF RFCs.
Keyur Patel is a Distinguished Engineer at Cisco Systems. His current interests include the evolution of technologies involving inter-domain routing (BGP), routing policies, L3VPNs, MVPNs, and L2VPNs. He is an active member of the IETF, where he has authored several RFCs and drafts. He holds various patents and has authored academic papers.