On surviving dual-link failures in path protected optical WDM mesh networks


Optical Switching and Networking 3 (2006) 71–88 www.elsevier.com/locate/osn

Mahesh Sivakumar, Krishna M. Sivalingam ∗
Department of CSEE, University of Maryland, Baltimore County, Baltimore, MD 21250, United States
Received 8 March 2005; received in revised form 3 April 2006; accepted 22 April 2006
Available online 3 July 2006

Abstract

In this paper, we investigate the problem of enhancing dual-failure restorability in path protected mesh-restorable optical Wavelength Division Multiplexed (WDM) networks. Recent studies have demonstrated the need to survive simultaneous dual-link failures and have also provided solutions for handling such failures. A key finding of these early efforts is that designs providing complete (i.e. 100%) protection from all dual-failures need almost triple the spare capacity compared to a system that protects against all single-link failures. However, it has also been shown that systems designed for 100% single-link failure protection can provide reasonable protection from dual-link failures [M. Clouqueur, W. Grover, Mesh-restorable networks with enhanced dual-failure restorability properties, in: Proc. SPIE OPTICOMM, Boston, MA, 2002, pp. 1–12]. Thus, the motivation for this work is to develop a hybrid mechanism that provides maximum (close to 100%) dual-failure restorability with minimum additional spare capacity. The system architecture considered is circuit-switched with dynamic arrival of session requests. We propose an adaptive mechanism, which we term active protection, that builds upon a proactive path protection model (that provides complete single-failure restorability), and adds dynamic segment-based restoration to combat dual-link failures. The objective is to optimize network survivability to dual-link failures while minimizing additional spare capacity needs. We also propose a heuristic constraint-based routing algorithm, which we term best-fit, that aids backup multiplexing among additional spare paths towards this goal. Our findings indicate that the proposed active protection scheme achieves close to complete (100%) dual-failure restorability with only a maximum of 3% wavelength-links needing two backups, even at high loads. Moreover, at moderate to high loads, our scheme attains close to 16% improvement over the base model that provides complete single-failure restorability. Also, the best-fit routing algorithm is found to significantly assist backup multiplexing, with around 15%–20% improvement over first-fit at all loads. The segment-based restoration algorithm reiterates the importance of utilizing wavelength converters in protection and is seen to provide around 15%–20% improvement over link restoration, especially at moderate to high loads.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Optical networks; WDM; Survivability; Reconfigurability; Wavelength converters; Time-slot interchangers; Architectures; Heuristics

Part of the research was supported by a grant from Cisco Systems, San Jose, CA, Intel Corporation and NSF grant No. ANI-0322959. A preliminary version of this paper was presented at the SPIE Opticomm 2003 conference at Dallas, TX in October 2003.
∗ Corresponding address: UMBC, Department of CSEE, 1000 Hilltop Circle, Room 325, MD 21250 Baltimore, United States. E-mail addresses: [email protected] (M. Sivakumar), [email protected] (K.M. Sivalingam).
© 2006 Elsevier B.V. All rights reserved. 1573-4277/$ - see front matter doi:10.1016/j.osn.2006.04.004

1. Introduction

This paper studies the problem of providing survivability against failures in optical Wavelength Division Multiplexed (WDM) networks [2,3]. Network studies have shown that frequent cable cuts result in link failures, which result in tremendous data loss due to the


amount of traffic carried on those links. Protection from single-link failures in mesh-based networks has been extensively studied in previous work [4–9]. However, it has been recently observed that networks can also frequently experience multiple simultaneous link failures, due to several reasons including: (i) During the time to repair a broken link, of the order of a few hours to a few days, it is likely to have another link failure resulting in two links being down at the same time [10]; and (ii) Instances where individual links share the same fiber duct to the extent of a few hundred meters cause a logical two-link failure if the fiber duct is damaged. The links sharing the same fiber duct are termed Shared Risk Link Groups (SRLG) and have been studied in [11]. It is hence imperative that we address the problem of providing recovery from multiple simultaneous link failures. The work presented in this paper considers the problem of providing recovery from dual-link failures, which assumes that a maximum of two fiber links can fail simultaneously at any given instant in time. We do not consider dual- and multiple-link failures caused by SRLG failures in this study but present a discussion on how the technique we propose to protect from general dual-link failures can be used to protect failures resulting from SRLGs in Section 6. In designing a system that recovers from every type of link failure (single or multiple), one of the most important factors to be considered relates to the spare capacity required to meet system design goals. A key finding of the early efforts relating to spare capacity requirements for dual-link failure recovery [1] is that designs providing complete (i.e. 100%) protection from every dual-link failure need almost triple the spare capacity of a system that protects only against every single-link failure. 
Not surprisingly, one of the primary objectives in designing a network resilient to dual-link failures is to minimize the spare capacity needed. A good place to start would be to identify the extent to which networks designed for complete single-failure restorability protect against dual-link failures. Accordingly, it has been shown that systems designed for 100% single-link failure protection can provide reasonable protection from dual-link failures [1]. Also, in spite of the fact that dual-link failures are a likely occurrence, they are not expected to be as frequent as single-link failures. Hence, a clever choice would be to protect the network against dual-link failures when they occur rather than having to dedicate spare capacity at all times. Thus, the motivation of this work is to develop a hybrid mechanism that provides maximum (close to 100%) dual-failure restorability with minimum additional spare capacity

over what is required by a design that provides 100% single-link failure recoverability, while leveraging the benefits of both proactive and reactive approaches to survivability.

The system architecture considered is circuit-switched with dynamic arrival of session requests. We also consider limited wavelength conversion [12], where all nodes have a limited number of converters. The basic system design guarantees 100% protection from all single-link failures using a proactive path protection technique, wherein an incoming session request is provided with a link-disjoint primary and backup route. The Predetermined Backup Route (PBR) will be used when one or more of the primary links fails. In identifying scenarios that need additional spare capacity, we identify vulnerable lightpaths that will be rendered unrestorable by a dual-link failure, and try to optimize the spare capacity needs of these vulnerable lightpaths. Towards this end, we propose an adaptive mechanism, which we term active protection, that builds upon the proactive path protected model (that provides complete single-failure restorability) and adds dynamic segment-based restoration to protect from dual-link failures. The objective is to optimize network survivability (and minimize spare capacity needs) to dual-link failures while maintaining close to complete single-failure restorability. The segments of the dynamic restoration scheme are designed based on wavelength converter availability at the intermediate nodes of the lightpath. Furthermore, to increase benefits from backup multiplexing among the backup paths, we present a best-fit routing algorithm to choose link-disjoint pairs for incoming connection requests.
The paper’s main contributions are: (i) by appropriately combining both proactive and reactive approaches to dual-link failure survivability, we can obtain high dual-failure restorability and optimize spare capacity needs; (ii) by carefully designing the primary and backup route computation algorithm, we can increase backup resource utilization by selecting routes that will allow maximum backup multiplexing among protection paths. We use discrete event simulations for our study and compare the proposed active protection scheme to the base model which provides 100% protection to single-link failures and the proposed best-fit routing algorithm with the first-fit routing algorithm. For the active protection scheme, we measure the dual-failure restorability, the additional spare capacity optimization provided and the extent to which vulnerable lightpaths need to be protected to obtain close to complete dual-failure restorability. For the best-fit algorithm, we measure the improvement in spare capacity


utilization obtained. The study was done for the 20-node 32-link Arpanet and the 11-node 22-link modified NJLATA topologies [13]. In addition, the 14-node NSFNET topology was used for evaluating the best-fit algorithm. Simulation results (for the studied topologies and parameters) show that the proposed active protection scheme achieves close to complete dual-failure restorability with only a maximum of 3% of the wavelength-links having two backup paths. At moderate to high loads, our scheme attains close to 16% improvement over the base model. Interestingly, it is seen that by protecting only around 80%–85% of the vulnerable lightpaths, we can obtain close to complete dual-failure restorability, making the hybrid approach of using protection and restoration more appealing. The segment-based restoration algorithm emphasizes the importance of wavelength converters in protection and is seen to provide around 15%–20% improvement over link restoration, especially at moderate to high loads. The proposed constraint-based best-fit routing algorithm is found to significantly assist backup multiplexing, with around 15%–20% improvement in spare capacity utilization over a non-constraint-based routing model (first-fit), at all loads.

The rest of the paper is organized as follows. Section 2 provides the necessary background and related work on dual-failures. A detailed description of the proposed Active Protection scheme is provided in Section 3. Techniques to reduce additional spare capacity requirements and the proposed best-fit heuristic for routing link-disjoint pairs are described in Section 4. The performance of the proposed scheme is then analyzed in Section 5. We then present a brief discussion on how the proposed scheme can be extended to protect multiple link failures arising due to SRLG failures in Section 6. Finally, Section 7 concludes the paper.

2. Background and related work

This section summarizes previous research efforts on fault tolerance in optical WDM mesh networks that are relevant to our study. In designing a survivable mesh network, the various mechanisms can be classified as either proactive protection or reactive restoration [14]. Protection reserves backup resources for each link or path during the connection setup or network design time. Restoration reacts after a failure occurs and tries to reroute the affected traffic around the failed links. Protection is faster and guarantees recovery since backup capacity is already reserved. However, it leads to inefficient utilization due to unused backup resources


when failures do not occur. On the other hand, restoration utilizes resources more efficiently by using them only when needed. However, it does not guarantee recovery from failures and may take a longer time to establish recovery paths for all affected failures. In this paper, we present an approach that tries to combine the advantages of both these techniques to achieve high restoration speed and high guarantees, while achieving high resource utilization.

2.1. Previous work on dual-failures

Most of the earlier work has concentrated on single-link failures, where only one network link fails at a given instant in time. It has been recently recognized that dual-link failures should also be considered in designing failure recovery mechanisms [10]. Dual-link failures are said to occur when two fiber links fail at the same time. The classification of possible dual-link failures in optical mesh networks has been investigated in [10] and [15]. The former analyzes the spare capacity requirements for a network employing link protection to provide complete double-link failure recovery. The latter provides a hierarchical classification for dual-link failures and assesses the algorithms’ capability to recover from failures in each class. The work presented in [13] extended the study in [10] to include backup multiplexing that improves resource utilization and also formulated an ILP towards the same. Spare capacity requirements for handling dual-link failures in networks that are designed to combat all single-link failures were researched in [16] and [17]. Finally, a protection model to incorporate varied protection to different service classes with the objective of optimizing the investment in capacity to provide high availability can be found in [1]. A study of spare-channel design schemes for self-healing networks can be found in [18].
We next present an availability analysis of dual-link failures for networks designed to provide complete single-failure restorability (through path protection), to identify scenarios that require additional spare capacity.

2.2. Classification of dual-failures

This section analyzes the possible dual-link failure scenarios in mesh topologies to identify their implications for spare capacity requirements. We assume that two links in the network can fail in any arbitrary order and the network uses path protection, i.e. a link-disjoint backup path is provided to each connection at setup time. The various scenarios are illustrated in Fig. 1 for one lightpath affected by the failure. For simplicity, we


Fig. 1. Dual-failure models: (a) Spatially disjoint failures; (b) Spatially contending failures; (c) Backup-link failures — Case i; (d) Backup-link failures — Case ii.

only show one lightpath traversing the link. In what follows, we assume that the failure of link e is followed by the failure of link f. A similar analysis for link protection can be found in [10].

• Spatially disjoint failures: Consider Fig. 1(a) which shows two primaries, one using link e and the other using link f. The backup paths of the primaries are link disjoint as shown. If link e fails and is followed by the failure of link f before link e is repaired, we have a dual-link failure scenario. In this case, the appropriate backup paths can be used to reroute the affected primary traffic. Such failures are termed Spatially Disjoint since the backup paths of the affected lightpaths do not share resources. Hence this dual-failure scenario can be handled by a design that protects single-link failures without requiring additional spare capacity.

• Spatially contending failures: These failures represent the case where the backup path of a lightpath using link e and the backup path of a lightpath using link f compete for resources. This occurs only when backup multiplexing is allowed. Such failures render one of the two lightpaths

unrestorable if additional backup paths are not provided. This can be avoided by not allowing backup multiplexing at the expense of lower resource utilization. Fig. 1(b) shows one such failure where the backup path of the primary using link e and that of the primary using link f share a link between nodes E and F. Simultaneous failure of both links will result in these backup paths contending for resources on the common link (assuming that they were backup multiplexed on that link).

• Backup link failures (BLF): If the second link failure (f) occurs on one of the backup link(s) of a lightpath affected by the first link failure (e), it will result in disruption of the traffic restored after the first failure. Such failures are termed Backup Link Failures. Depending on the spatial nature of the backup paths for the primary traffic traversing link f, two situations may arise:

(1) The backup paths of the lightpaths affected by link f’s failure do not traverse link e: In this case, the lightpaths affected by f’s failure can be rerouted on their appropriate backup paths while the traffic affected by e’s failure that is traversing f will be disrupted. If p1 represents the traffic


rerouted through f from e (backup traffic) and p2 represents the traffic affected by f’s failure (primary traffic), two solutions are possible. One way is to have the backup path(s) of f carry p1 and p2 and the other way is to reroute p1 on to another path (possibly precomputed) that does not pass through f. The first possibility is shown in Fig. 1(c), where failure of links e and f would require the backup path of f to carry the traffic affected by f (p2) and e’s traffic routed on f (p1).

(2) The backup path of a lightpath affected by f’s failure passes through e: In this case, more than one backup path will be required for each such affected primary path, irrespective of whether link or path protection is used. This is because the backup paths of both the failed links are rendered useless as illustrated in Fig. 1(d) where the lightpaths passing through links e and f have their backup paths traversing each other. Hence, a failure in those two links would leave both failures unrestored.

Thus, it is seen that backup link failures are the only ones that will need additional spare capacity beyond what is provided by proactive protection mechanisms that provide complete single-failure restorability. Our proposed algorithm exploits this property and minimizes the additional spare capacity needed to handle such failures.

• Unrecoverable failures: These failures occur when the network has nodes of degree two. The failure of those links would isolate the node from the rest of the network and cannot be recovered from, irrespective of the protection mechanism used. In Fig. 1(d), if the links between nodes A, B and A, E fail simultaneously, node A would be isolated from the network, rendering the failures unrecoverable.

3. Proposed active protection mechanism

This section presents the details of the hybrid dual-failure recovery mechanism, which we term Active Protection. We first describe the network model followed by the basic concepts.
The detailed algorithm and the mechanisms involved are then described. Finally, we illustrate with examples the details of how the proposed scheme recovers from dual-link failures that require additional spare capacity.

3.1. Network model

We consider a circuit-switched all-optical network. We assume that connections are established and torn


down dynamically based on lightpath requests that arrive at a source based on a Poisson process with destinations chosen in a uniformly random manner. The input to the problem is a network topology represented by G(N, L), where N is the number of nodes and L is the number of links in the network. Each link has a fixed number of wavelengths, W. The number of wavelengths per link (W), the link capacity and the number of wavelength converters available (the given wavelength converters are uniformly divided among the nodes in the network) are also provided as input. Fixed alternate path routing is used to route the incoming paths, for simplicity. This technique involves maintaining a fixed set of predetermined link-disjoint routes (say k) for each source–destination pair.

The basic network is assumed to provide 100% single-failure guarantees. The design achieves this by providing each connection that arrives in the network with a primary lightpath and a link-disjoint predetermined backup lightpath (that does not have any link in common with the primary and is referred to as PBR in our study) between any given source and destination. The Predetermined Backup Route (PBR) will be used when one or more of the primary links fail. Furthermore, a limited wavelength conversion model, wherein a limited number of converters are shared at each node (share-per-node architecture) on a First-Come-First-Served (FCFS) basis, is assumed. For wavelength assignment, the Minimized First-Fit (MinFF) algorithm from [19], which tries to reduce the number of wavelength converters required and effectively utilize the existing ones, is used.

3.2. Basic concepts

The design objectives of the proposed hybrid scheme, which we term active protection, can be summarized as follows:

(1) To enhance dual-failure restorability in a network designed to provide 100% single-link failure guarantee, with minimal additional spare capacity.
(2) To improve additional spare capacity (the spare capacity that is used in protecting dual-link failures, in addition to that used for single-link failure survivability) utilization.
(3) To obtain high restoration speed and guarantees while efficiently utilizing the resources (wavelengths and wavelength converters).

To achieve the above objectives, the proposed strategy is to provide additional spare capacity only when needed, i.e., only to the dual-link failure


scenarios where necessary (backup link failures) and only when necessary. For each primary connection, a Predetermined Backup Route (PBR) is established when the lightpath is set up and a Restored Backup Route (RBR) is calculated when any link on the primary path fails. Thus, while the PBR exists at all times, the RBR exists only for the duration of the failure. In other words, lightpaths are given additional spare capacity only when required. By appropriately combining both proactive and reactive methodologies, the proposed scheme tries to attain high restoration guarantees and speed of restoration while efficiently utilizing the resources. Moreover, the restoration algorithm, as will be seen, is not bounded by time and is designed in such a way as to improve restoration guarantees by enhancing the utilization of wavelength converters.

We first define the concepts of vulnerable primary and vulnerable backup paths, when a link failure (denoted by e) occurs.

• Vulnerable backup: For a given lightpath, whose traffic is currently carried over its predetermined backup route (PBR) due to a failure e, assume that a second link failure occurs on the PBR. The lightpath potentially cannot be restored due to the unavailability of both its primary and backup (PBR) routes. We term such lightpaths vulnerable backups. In Fig. 1(c), the lightpath (PBR(e), traversing link f) whose primary path contains link e is a vulnerable backup as the failure of both e and f would render the lightpath unrestorable.

• Vulnerable primary: A primary lightpath, whose PBR contains the failed link (e), is said to be a vulnerable primary since its PBR cannot be utilized if any of its primary path’s links fail before e is fixed. In Fig. 1(d), the lightpath whose primary contains link f and whose PBR contains link e is a vulnerable primary.
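The two definitions amount to a simple membership test over the lightpaths in the network. The following is a minimal sketch and not the authors' implementation: the `Lightpath` record, its field names, the `link` helper and the small example topology are all hypothetical, and links are modeled as undirected node pairs.

```python
from dataclasses import dataclass


def link(u, v):
    """Undirected link between nodes u and v (order-independent key)."""
    return frozenset((u, v))


@dataclass
class Lightpath:
    name: str
    primary: list  # links of the primary route
    pbr: list      # links of the predetermined backup route (link-disjoint)


def classify_vulnerable(lightpaths, e):
    """Classify lightpaths after a first failure on link e.

    Vulnerable backups: primary contains e, so traffic now rides the PBR.
    Vulnerable primaries: PBR contains e, so the PBR is unusable until repair.
    """
    vulnerable_backups = [lp.name for lp in lightpaths if e in lp.primary]
    vulnerable_primaries = [lp.name for lp in lightpaths if e in lp.pbr]
    return vulnerable_backups, vulnerable_primaries


# Hypothetical example: three lightpaths on a small four-node mesh.
lightpaths = [
    Lightpath("P1", [link("A", "B"), link("B", "C")], [link("A", "D"), link("D", "C")]),
    Lightpath("P2", [link("A", "D")], [link("A", "B"), link("B", "D")]),
    Lightpath("P3", [link("D", "C")], [link("D", "B"), link("B", "C")]),
]
e = link("A", "B")
print(classify_vulnerable(lightpaths, e))
```

In this example P1 becomes a vulnerable backup (its primary uses link A-B), P2 becomes a vulnerable primary (its PBR uses link A-B), and P3 is unaffected by the first failure.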
Note that these are the only paths that may require additional spare capacity to what is provided initially in order to obtain complete (100%) dual-failure restorability. As will be seen in Section 5, it may not be necessary to protect all such paths in order to obtain nearly complete (close to 100%) dual-failure restorability.

3.3. Algorithm description

We now describe the active protection algorithm. When a link in the network fails (denoted by link e), the following operations are performed:

(1) The end nodes of the failed link signal the source and destination of all the affected lightpaths. The

traffic on each such lightpath is then switched on to its corresponding PBR path. This follows the classical proactive path protection mechanism.

(2) Next, the system attempts to determine and establish recovery capacity for the two vulnerable path types described earlier.
(a) For each vulnerable backup and primary affected by failure e, a search for a Restored Backup Route (RBR) is initiated. This is based on a segment-based restoration scheme, explained in Section 3.4. If a restoration route is indeed found for a given lightpath, resources are reserved on those paths for the duration of the link failure. If such routes are not found in the first attempt for a particular lightpath, subsequent attempts may be made a little later. If no route is found by the time the first failure is fixed, the search is abandoned. Also, all the RBRs calculated remain either until the corresponding failed link is repaired or until the primary lightpath containing the failed link is torn down.

(3) Assume that another link fails (denote this f) during the time the first link is being repaired. The system then performs the following operations:
(a) For a lightpath whose traffic was already on a PBR and when f happens to be on this PBR, its traffic will be transferred to the RBR calculated in Step 2(a).
(b) All the primary lightpaths traversing link f will be rerouted on to their PBRs as long as the PBR does not contain a failed link.
(c) In the above step, if a lightpath’s PBR did contain a failed link, the corresponding RBR would have been calculated in Step 2(a). Hence, traffic would be rerouted on to this RBR.

Note that there is still a finite possibility that a lightpath’s traffic may not be successfully rerouted, depending on whether an RBR is found successfully in Step 2. Thus, the overall dual-failure restorability of our mechanism is determined by the guarantees provided by the dynamic segment-based restoration algorithm used in Step 2.
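The control flow of Steps 1 to 3 can be sketched as two event handlers. This is an illustrative sketch under stated assumptions rather than the authors' simulator: lightpaths are plain dictionaries, links are opaque string labels, and `find_rbr` is a hypothetical caller-supplied function standing in for the segment-based search of Section 3.4 (it returns a route avoiding e, or `None` when no route is found). Retries of the RBR search and the release of RBRs on repair are omitted.

```python
def make_lightpath(name, primary, pbr):
    """Hypothetical lightpath record; 'active' names the route carrying traffic."""
    return {"name": name, "primary": primary, "pbr": pbr,
            "rbr": None, "active": "primary"}


def on_first_failure(lightpaths, e, find_rbr):
    """Steps 1 and 2: switch affected traffic to the PBR, then look for RBRs."""
    for lp in lightpaths:
        if e in lp["primary"]:
            lp["active"] = "pbr"            # Step 1: classical path protection
        if e in lp["primary"] or e in lp["pbr"]:
            lp["rbr"] = find_rbr(lp, e)     # Step 2(a): vulnerable backup/primary


def on_second_failure(lightpaths, e, f):
    """Step 3: handle a second failure f while e is still being repaired."""
    for lp in lightpaths:
        if lp["active"] is None or f not in lp[lp["active"]]:
            continue                        # traffic not hit by f
        if lp["active"] == "primary" and e not in lp["pbr"]:
            lp["active"] = "pbr"            # Step 3(b): PBR is still intact
        elif lp["rbr"] is not None and f not in lp["rbr"]:
            lp["active"] = "rbr"            # Steps 3(a) and 3(c)
        else:
            lp["active"] = None             # no usable route: unrestored


# Hypothetical scenario: link "e" fails first, then link "f".
lightpaths = [
    make_lightpath("P1", ["e", "a"], ["f", "b"]),  # vulnerable backup
    make_lightpath("P2", ["f", "g"], ["h"]),       # only hit by f
    make_lightpath("P3", ["f", "k"], ["e", "m"]),  # vulnerable primary
    make_lightpath("P4", ["e"], ["f"]),            # RBR search fails
]
stub_routes = {"P1": ["c", "d"], "P3": ["n"]}      # stand-in for the RBR search
on_first_failure(lightpaths, "e", lambda lp, e: stub_routes.get(lp["name"]))
on_second_failure(lightpaths, "e", "f")
result = {lp["name"]: lp["active"] for lp in lightpaths}
print(result)
```

P4 illustrates the residual case noted above: when no RBR was found after the first failure, the second failure leaves the lightpath unrestored.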
We next describe the segment-based restoration technique used to calculate the second backup path (RBR) for vulnerable lightpaths.

3.4. Segment restoration

Consider the link restoration mechanism, where an alternate route is found around an affected link. If the end nodes of the failed link do not have a wavelength


Fig. 2. Segment restoration: (a) Nodes B and E of the segment have converters; (b) No intermediate node has a converter, thus segment restoration is equivalent to path restoration; and (c) End nodes of the failure have a converter, thus segment restoration is equivalent to link restoration.

converter, the restoration path needs to be set up on the same wavelength that is being used in the primary. This can seriously limit the chances of finding the alternate path, thereby reducing the overall restoration efficiency. We propose a segment-based restoration technique to overcome this drawback. Work on segment protection has been done in [20–24]. Our approach differs in the way the segments are calculated and the fact that we use the segment-based technique for dynamic restoration.

We define a segment as a set of links that has at least one wavelength converter available (at the instant when it is formed) at the start and end nodes of the segment. Thus, the length of the segment varies dynamically in accordance with the availability of wavelength converters on the path nodes. Segment-based restoration is illustrated in Fig. 2. Fig. 2(a) shows a lightpath spanning nodes A to F. It is assumed that nodes B and E have wavelength converters (we assume a sparse wavelength conversion model). If link C–D fails, the failure is detected by nodes C and D and propagated towards their source and destination. The signal stops at nodes that have a wavelength converter to spare (nodes B and E in the example). These nodes now participate in a search for a restoration route around the segment formed. In the example, a segment is formed between nodes B and E. The advantage of this approach is that it does not require the restoration path to be on the same wavelength as the primary, thereby increasing the probability of finding a backup segment.
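Segment formation as described above can be sketched as a walk outward from the failed link until a node with a spare converter is reached. The sketch below is an illustration under assumptions, not the authors' code: the function name and arguments are hypothetical, the lightpath is given as an ordered list of nodes, and the source or destination is used as the segment end when no converter is found on that side.

```python
def segment_endpoints(path, failed_link, spare_converter_nodes):
    """Return (start, end) of the restoration segment around failed_link.

    path: ordered list of nodes from source to destination.
    failed_link: pair of adjacent nodes on the path.
    spare_converter_nodes: nodes that currently have a converter to spare.
    """
    i, j = sorted(path.index(n) for n in failed_link)
    if j != i + 1:
        raise ValueError("failed_link must join adjacent nodes on the path")

    start = i
    # Propagate the failure signal towards the source.
    while start > 0 and path[start] not in spare_converter_nodes:
        start -= 1

    end = j
    # Propagate the failure signal towards the destination.
    while end < len(path) - 1 and path[end] not in spare_converter_nodes:
        end += 1

    return path[start], path[end]


path = ["A", "B", "C", "D", "E", "F"]
print(segment_endpoints(path, ("C", "D"), {"B", "E"}))  # segment between B and E
print(segment_endpoints(path, ("C", "D"), set()))       # whole path, A to F
print(segment_endpoints(path, ("C", "D"), {"C", "D"}))  # failed link only, C to D
```

The three calls mirror the three cases of Fig. 2: a proper segment, the path restoration case, and the link restoration case.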

Segment restoration reduces to path restoration when no intermediate node in the lightpath has a converter as shown in Fig. 2(b). Here, the restoration path is found between nodes A and F, the source and destination of the path respectively. When the end nodes of the failed link have one or more converters as shown in Fig. 2(c) (nodes C and D), segment restoration is equivalent to link restoration and a restoration path is found around the failed link. Thus, the objective of the segment restoration technique is to enhance restoration guarantees and wavelength converter utilization. Further, forming a segment is simple and does not need extensive computations. As will be seen from the performance studies, segment restoration outperforms link restoration at moderate to high loads.

3.5. Protecting dual-link failures

Fig. 3 illustrates with examples the details on how the active protection scheme handles the backup link failure scenarios presented in Fig. 1. In what follows, link e is assumed to fail first followed by link f. For simplicity, we assume that the source and destination node of the failed link have converters. The backup link failure scenario is explained in Figs. 3(a) and (b). Fig. 3(a) shows how a vulnerable backup is protected. Here, the second link failure (f) occurs on PBR(P1) (the vulnerable backup) of a lightpath P1 affected by the first link (e) failure. The PBR of the lightpath using link f (PBR(P2)) does not traverse e. When link e fails, the affected traffic


Fig. 3. Handling dual-link failures: (a) Vulnerable backups; (b) Vulnerable primaries.

is routed on to PBR(P1) and RBR(P1) is calculated. When f fails, the affected primary traffic (P2) is rerouted on to its backup path (PBR(P2)) and the traffic rerouted from e (backup traffic of P1 on link f) will now be diverted on to RBR(P1). If the PBR of the lightpath traversing link f does use link e, as shown in Fig. 3(b), the primary traversing link f (P2) would have been considered vulnerable when link e failed and hence would have had an RBR calculated. This restored backup of f (RBR(P2)) will be used when link f fails, while traffic rerouted from e on link f (PBR(P1)) will use RBR(P1).

3.5.1. Advantages

The benefits obtained from the proposed scheme are summarized as follows:

(1) The time for which a failure exists is much less than the time for which a session exists. Since the restored paths (RBRs) exist only for the time the failures persist, additional capacity is reserved for small durations as opposed to lightpath lifetimes required when 100% dual-link failure protection is provided.
(2) The RBRs are used under two cases: (i) for vulnerable backups and (ii) for vulnerable primary lightpaths. In both cases, the restored path only needs to be available if and when the second failure occurs. Thus, the algorithm used to calculate RBRs has more relaxed computation time limits compared to a pure restoration scheme, which has to quickly compute restoration routes after failure occurrence.
(3) In both the cases mentioned above, an RBR is utilized only when the second link failure occurs. If we assume that, most of the time, at most two link failures will happen at a given instant, no two RBRs will be used simultaneously, thus allowing

them to share resources, i.e., RBRs can be backup multiplexed. 4. Minimizing additional spare capacity The proposed active protection heuristic tries to find an RBR for every vulnerable path resulting from a link failure. This section explores ways to reduce the Additional Spare Capacity (ASC) thus found. One way to reduce spare capacity requirements is to consider backup multiplexing among the RBRs. One can also consider load balancing algorithms and Quality of Protection (QoP) to achieve the same. Due to lack of space, however, we only explore RBR multiplexing here. Discussions on reducing ASC with load balancing (which includes an ILP for path protection that aims at minimizing the maximum link utilization) and QoP can be found in [25]. 4.1. RBR multiplexing Dual-link failure models are based on the assumption that there can be a maximum of two link failures in the network at any instant in time. As a result, the vulnerable paths resulting from a link failure can share the second backup (RBR), provided they are link disjoint. Consider the example shown in Fig. 4. The figure shows two primaries (P2 and P3 ) and one backup (B1 ) using link (2, 3). Assume nodes 2 and 3 to have converters. If link (2, 3) fails, lightpaths P1 , B2 and B3 become vulnerable to a second failure. Since all these paths are link disjoint, it is sufficient to provide just one RBR around the failed link. This is because, a second link failure can affect at most one of the vulnerable paths. Note that this improvement is made possible for two reasons:


Fig. 4. Backup multiplexing among RBRs.

(i) Presence of wavelength converters at the end nodes of the failed link: If we assume a sparse conversion model with k of the n nodes in the network having converters, the probability that both end nodes of the failed link have converters is $(k/n) \times (k/n) = k^2/n^2$.
(ii) Link disjoint vulnerable lightpaths.

In this paper, we propose a heuristic routing algorithm that attempts to maximize the number of vulnerable lightpaths (for each link) that are link disjoint. If the vulnerable paths are made completely link disjoint, we would require just one RBR for all the affected paths (best case). On the other hand, even if they are not completely link disjoint, by minimizing the number of common links we can minimize the maximum number of vulnerable paths that can be affected by a dual-link failure.

4.2. Best-fit routing algorithm

In this section, we describe the proposed routing algorithm that aims to minimize the intersection (the number of common links) among the vulnerable paths for each link. We first define the concept of VPaths and VLinks for a link i.

Vulnerable paths (VPaths): For each link i, the VPaths are those lightpaths which will be rendered unrestorable by a second link failure, assuming that the first failure occurred on link i. In Fig. 4, P1, B2 and B3 will be rendered vulnerable to a dual-link failure if the second failure occurs on one of their links, given that the first failure occurred on link (2, 3).
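As a small numerical illustration of the sparse conversion expression in item (i), the following sketch simply evaluates (k/n)^2. The function name is hypothetical, and the expression treats the two end nodes as independent draws, exactly as the formula in the text does.

```python
def both_ends_have_converters(k: int, n: int) -> float:
    """Evaluate the (k/n)^2 expression from the text.

    k: number of nodes equipped with converters
    n: total number of nodes in the network
    The two end nodes are treated as independent, as in the text's formula.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    return (k / n) ** 2


# Hypothetical example: converters at half of the nodes of a 20-node network.
probability = both_ends_have_converters(10, 20)
```

With converters at half of the nodes, only a quarter of the link failures would fall into the case where a single RBR around the failed link is enough, which is why the second condition (link disjoint vulnerable lightpaths) matters as well.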

Vulnerable links (VLinks): The VLinks of a link i are those links that make up the VPaths of the link. VLinks for a path p is the union of VLinks for each link i on p. In Fig. 4, the VLinks of link (2, 3) are the links of the paths P1, B2 and B3, i.e., VLinks(i) = {(1, 5), (5, 6), (6, 4), (7, 8), (8, 3), (2, 9), (9, 10)}. The VLink shared by the maximum number of VPaths determines the number of RBRs that need to be found to ensure complete dual-failure restorability.

Routing function: Given a network topology and the current network state, provide the incoming connection C_i with a primary P_i and link disjoint backup PBR_i such that the primary-backup pair intersects with the minimum number of VLinks of both P_i and PBR_i, on an average.

Data structures: Each link i in the network stores the following details for every other link j.
• VLinkStatus: the number of VPaths of link i that use link j. In the example in Fig. 4, VLinkStatus[(2, 3), (5, 6)] = 1 indicates that link (5, 6) is used by one of the VPaths of link (2, 3).
• VPathList: list of VPaths of j that traverse the link i (as s–d pairs); VPathList(i, j) ⇐⇒ VPathList(j, i). In Fig. 4, VPathList[(2, 3), (5, 6)] = [(1 → 4)] indicates that one of the VPaths of link (5, 6) (between nodes 1 and 4) is traversing link (2, 3).

Each node maintains a routing table that has an entry for every other node in the network. This table contains the precomputed link disjoint pairs for each of those destinations.
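The definitions above can be illustrated with a minimal sketch. It is not the pseudo-code of the best-fit algorithm, which is not reproduced in this paper. The grouping of the Fig. 4 VLinks into the three VPaths, the nodes 11 to 13, and the cost function `pair_cost` are assumptions made for illustration only: the cost counts how many links of one path of a candidate pair fall in the VLinks of the other path, averaged over the two paths, which is one possible reading of the routing function.

```python
from collections import Counter


def link(u, v):
    """Undirected link as an order-independent key."""
    return frozenset((u, v))


def vlink_status(vpaths):
    """For one failed link i: how many of i's VPaths use each link j."""
    status = Counter()
    for path in vpaths:
        status.update(set(path))
    return status


def rbr_required(status):
    """RBRs needed for link i: largest number of VPaths sharing one VLink."""
    return max(status.values(), default=0)


def path_vlinks(path, vlinks_of):
    """VLinks of a path: union of the VLinks of every link on the path."""
    result = set()
    for l in path:
        result |= vlinks_of.get(l, set())
    return result


def pair_cost(path_a, path_b, vlinks_of):
    """ASSUMED cost: links of each path lying in the VLinks of the other path."""
    hits_a = len(set(path_b) & path_vlinks(path_a, vlinks_of))
    hits_b = len(set(path_a) & path_vlinks(path_b, vlinks_of))
    return (hits_a + hits_b) / 2


def best_fit(candidate_pairs, vlinks_of):
    """Pick the cheapest link-disjoint pair; the shorter path is the primary."""
    a, b = min(candidate_pairs, key=lambda p: pair_cost(p[0], p[1], vlinks_of))
    return (a, b) if len(a) <= len(b) else (b, a)


# VPaths of link (2, 3) in Fig. 4; the split of the links into paths is assumed.
p1 = [link(1, 5), link(5, 6), link(6, 4)]
b2 = [link(7, 8), link(8, 3)]
b3 = [link(2, 9), link(9, 10)]
status = vlink_status([p1, b2, b3])
vlinks_of = {link(2, 3): set(status)}

# Two hypothetical candidate pairs for a new connection routed over link (2, 3).
through_23 = [link(1, 2), link(2, 3), link(3, 4)]
pair_x = (through_23, [link(1, 5), link(5, 6), link(6, 4)])
pair_y = (through_23, [link(1, 11), link(11, 12), link(12, 13), link(13, 4)])
primary, backup = best_fit([pair_x, pair_y], vlinks_of)
```

Because the three VPaths share no link, `rbr_required(status)` is 1, matching the observation that a single RBR suffices when the vulnerable paths are link disjoint. Under the assumed cost, `pair_y` is preferred even though its backup is longer, since its backup avoids the existing VLinks of link (2, 3).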


Algorithm: We now discuss the working of the best-fit routing algorithm. Given the source s, destination d, and the set of link disjoint pairs between s and d, the algorithm computes the link disjoint pair that has minimum intersection with the VPaths of the links on their path. Due to lack of space, the algorithm is not illustrated here. The pseudo-code for the algorithm can be found in [25]. The result of the algorithm is an implicit specification of the link disjoint pair that best fits the constraints imposed in the problem statement. The link disjoint pair with the least intersection with existing VLinks (on an average) is chosen and the wavelengths on the appropriate links are set to be busy. Among the two paths in the link-disjoint pair, the one with smaller hop length is chosen as the primary and the other is chosen as the PBR for the primary. Also, the VLinkStatus and VPathList of each link in the primary and PBR are updated to reflect the arrival of a new connection. When a connection leaves the network, the wavelengths used by the primary and backup of the connection are set to be free. The VLinkStatus and VPathList of the links in the primary and backup are updated.

5. Performance evaluation

In this section, we present the performance evaluation of the proposed mechanisms.

5.1. Simulation model

The network topologies studied in evaluating the performance of the active protection scheme are the 20-node 32-link Arpanet [10] and the 11-node 22-link modified NJLATA [13]. For the best-fit routing algorithm, we also studied the 14-node NSFNET. Neither of these two topologies has nodes with degree two or less. As a result, all dual-link failures are potentially recoverable (i.e., no unrecoverable failures can occur). All links are assumed to be bidirectional. Each simulation is run for 1 million connection requests. Traffic arrivals are based on a Poisson process at each node, with a service time exponentially distributed with mean 1 unit.
Destinations for a session request are chosen based on a uniform random process. Connection arrival rates on the routes are obtained by using a scaling factor γ. In our simulation, we choose the scaling factor in such a way as to keep the average blocking probability approximately in the neighborhood of $10^{-3}$. Failure insertion is also based on a Poisson process and the links to be failed are chosen randomly.
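The traffic model described above can be sketched as follows. This is an illustrative generator, not the simulator used for the results: the function name, the equal arrival rate at every node, and the fixed seed are assumptions. It relies on the fact that the superposition of independent Poisson processes is a Poisson process with the summed rate, so one merged arrival stream is drawn and each arrival is assigned to a uniformly chosen source node.

```python
import random


def generate_requests(num_nodes, rate_per_node, num_requests,
                      mean_holding=1.0, seed=0):
    """Return a list of (arrival_time, source, destination, holding_time).

    Arrivals: Poisson at each node (equal rates assumed), merged into one stream.
    Holding times: exponential with the given mean (1 unit in the text).
    Destinations: uniform over all nodes other than the source.
    """
    rng = random.Random(seed)
    total_rate = num_nodes * rate_per_node
    now = 0.0
    requests = []
    for _ in range(num_requests):
        now += rng.expovariate(total_rate)          # exponential inter-arrival gap
        src = rng.randrange(num_nodes)
        dst = rng.choice([n for n in range(num_nodes) if n != src])
        hold = rng.expovariate(1.0 / mean_holding)  # mean = mean_holding
        requests.append((now, src, dst, hold))
    return requests


# Hypothetical run: 20 nodes (as in Arpanet), 1000 requests.
sample = generate_requests(num_nodes=20, rate_per_node=0.5, num_requests=1000)
```

Failure insertion could be generated the same way with its own rate and a mean holding time of 0.1 units, subject to the constraint stated in the text that at most two links are failed at any instant.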

The failures are inserted in such a way that only a maximum of two links fail at any instant in time. Also, the failure holding times are taken to be much less (0.1 units) compared to the session holding times. All nodes in the network are given a limited number of converters, each capable of full wavelength conversion. We only show the results for the case when 30 converters (evenly distributed among all the nodes) were given to the network, since similar trends are seen when the number of converters was varied between 10 and 50.

In evaluating the performance of the Active Protection scheme, three different mechanisms, described below, are studied:

No additional protection: This is the base model we start with, where every lightpath in the network is protected only by its PBR. This model is capable of protecting against dual-link failures on non-VPaths. The number of dual-link failures restored in this case indicates the percentage of such failures that do not need more than one backup path for their primaries.

Vulnerable Path Protection (VPP): Here, we provide RBRs to both vulnerable primaries and vulnerable backups.

Vulnerable Backup Protection (VBP): Here, only vulnerable backups (PBRs) are protected by RBRs.

The motivation for considering both VBP and VPP is to evaluate the percentage of additional spare capacity (RBRs) reserved to protect vulnerable primaries, which represent the lightpaths that were guaranteed 100% protection by the initial model, and the vulnerable backups, which were not guaranteed any protection. The provisioning of additional protection paths (RBRs) does not significantly affect the blocking probability, since the probability that a new session request originates when a link fails and needs the same resources used by the RBRs is small. Hence we only consider the blocking performance as a metric to evaluate the best-fit routing algorithm. The other performance metrics considered are:

a. Dual-Failure Restorability (DFR): The fraction of dual-link failures that were successfully rerouted (either on to their PBR or on to their RBR).

Fig. 5. Performance of the three mechanisms with respect to the dual-failure restorability (DFR) metric, varying load, for Arpanet and NJLATA topologies with 20 wavelengths per link. Scaling factor varies between 0 and 1. (a) Arpanet with 20 nodes and 32 links. (b) Modified NJLATA with 11 nodes and 23 links.

b. Maximum additional spare capacity (ASCmax): The maximum fraction of the network capacity that will require additional spare capacity to ensure complete restorability for any dual-link failure. Towards this, we measure the maximum number of lightpaths which will require an RBR (ASCPath) and the number of links used by these paths (ASCWave) over the period of the simulation. While ASCPath indicates the maximum number of paths that need to have 200% backup capacity (two backups) to ensure complete dual-failure restorability, ASCWave indicates the number of wavelength-links used by the paths. The maximum fraction of the network capacity that will need two backup paths at any instant of time (ASCmax) is then given by

$\mathrm{ASC}_{\max} = \mathrm{ASC}_{\mathrm{Wave}} / \mathit{num\_wave\_links}$,

where $\mathit{num\_wave\_links} = L \times W$ is the total number of wavelength-links in the network. If $T_s$ and $T_f$ are the average session and failure holding times respectively, $(T_s / T_f)$ represents the average time for which these additional paths will be protected by an RBR.

c. Number of RBRs required (RBRreq): The number of RBRs that need to be calculated when a link i fails in order to ensure complete dual-failure restorability. RBRreq represents the number of RBRs that were reserved by the active protection algorithm when a link fails, and is defined as

$\mathrm{RBR}_{req} = \max_{j \in \mathrm{VLinks}(i)} \mathrm{VLinkStatus}(i, j)$.

d. Number of RBRs used (RBRused): The number of RBRs that were actually needed when a second link failure (j) occurs on the VLink of the first link (i) failure. RBRused represents the actual

number of RBRs that would have been sufficient to obtain complete dual-failure restorability and is defined as RBRused = VLinkStatus(i, j). e. % RBRs found (Frbr ) The fraction of RBRs that were successfully found, and reflects the restoration efficiency of the segment restoration algorithm. It indicates (on the average), the percentage of lightpaths (%ASC Path) that were given 200% backup capacity by the active protection algorithm. f. Improvement of VPP over VBP in handling BLFs The improvement obtained by using VPP over VBP in handling backup-link failures. This is useful in deciding between the two especially when network designers are able to assess how frequently a vulnerable primary or vulnerable backup is affected in a real network. 5.2. Performance results This section presents the detailed simulation-based performance evaluation. Comparison of the schemes Fig. 5 presents the performance of the three studied mechanisms with respect to the Dual-Failure Restorability (DFR) metric for Arpanet and NJLATA topologies, with 20 wavelengths per link. As seen in Fig. 5(a) presenting Arpanet results, VBP and VPP outperform the basic mechanism for all loads and VPP obtains close to 100% restorability. The results also show that providing protection to vulnerable backups alone substantially increases DFR. For Arpanet, at low


Table 1
Vulnerable primary restorability provided by the Active Protection algorithm for varying loads, in Arpanet and NJLATA topologies

        Dual-failure restorability (DFR)
Load    ARPANET    NJLATA
0.05    99         99
0.10    96.33      99
0.20    93         98
0.40    92         95
0.60    92         94.5
0.80    91         93

Scaling factor varies between 0 and 1.

loads, the increase in DFR is close to 6% for VBP and 10% for VPP compared to the basic mechanism. This is due to the fact that at low loads the number of paths affected is less, and hence the scheme with no RBRs is able to provide 90% restorability. However, as the load increases, we can observe more improvement. At high loads, the basic scheme provides only around 80% restorability, while improvements of 10% for VBP and 17% for VPP are observed. Similar results are seen for NJLATA too as shown in Fig. 5(b). Note that vulnerable primaries represent the lightpaths that were provided 100% guarantees on singlefailure restorability. On the other hand, the vulnerable backups are not provided with any protection guarantee. The percentage of vulnerable primaries restored successfully by the algorithm is illustrated in Table 1, for Arpanet and NJLATA respectively. As can be seen, the algorithm manages to achieve close to complete vulnerable primary restorability at low to moderate loads. Comparison of VBP and VPP The performance comparison of VBP and VPP in the 32-link Arpanet and the 22-link modified NJLATA topology are shown in Table 2(a) and (b) respectively. The number of wavelengths per link was fixed at 20.

The DFR values for the two schemes are repeated from Fig. 5. As seen in the table (and also in the previous figure), VPP attains higher DFR as compared to VBP for all loads. This is expected since we provide protection to both vulnerable primaries and vulnerable backups in VPP and only to vulnerable backups in VBP. The improvement is seen to be higher as the load increases. The second row shows the extent to which VPP is better in handling backup link failures and in particular the vulnerable primaries, since VBP does not handle them. The actual choice between VBP and VPP can be made based on this improvement, and this in turn depends on how frequently link failures affect vulnerable backups and primaries. In the case of Arpanet, VPP provides an average of 40% improvement over VBP for moderate and high loads. The modified NJLATA network, which is denser, sees even more improvement (close to 55% for moderate and high loads). As expected, at low loads, not many resources are utilized and hence the improvements are not so pronounced (27% for Arpanet and 34% for NJLATA respectively). To further compare the two mechanisms, the table presents the overall additional spare capacity required to obtain complete dual-failure restorability and the maximum number of paths and links (wavelengths) that require 200% capacity at any point of time. For example, at low load for the Arpanet topology, VBP requires 200% protection capacity for 3 paths and 5 wavelength-links, compared to 4 paths and 5 wavelength-links for VPP. At low loads, both schemes require almost identical spare capacity. This is as expected since there are not many paths affected. As the load increases, the number of vulnerable primaries increases and hence VPP needs to protect more paths using two backups. For example, 20 wavelength-links need two backups at high loads, for the Arpanet topology, using VPP. Hence, the fraction of wavelength-links that need 200% backup capacity only

Table 2
Performance comparison of the VBP and VPP schemes with regard to the various performance metrics

(a) Arpanet with 20 nodes and 32 links

                                          Low load        Moderate load   High load
Parameters measured                       VBP     VPP     VBP     VPP     VBP     VPP
DFR                                       95.5    99      93.5    97      91.5    96.5
% Improvement of VPP over VBP for BLFs    –       27      –       42      –       40
%ASC                                      0.7     0.7     1.09    2.03    1.4     3.1
ASCWave                                   5       5       7       13      9       20
ASCPath                                   3       4       4       8       5       12
Frbr                                      99      85      92      84      87      82

(b) Modified NJLATA with 11 nodes and 23 links

                                          Low load        Moderate load   High load
Parameters measured                       VBP     VPP     VBP     VPP     VBP     VPP
DFR                                       96      99      93.5    97      93.5    97
% Improvement of VPP over VBP for BLFs    –       34      –       53      –       57
%ASC                                      0.68    1.5     1.36    2.9     1.5     2.7
ASCWave                                   3       7       6       13      7       12
ASCPath                                   3       7       5       11      7       12
Frbr                                      93      84      85      83      83      83
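As a worked check of the ASCmax definition against Table 2(a), the sketch below evaluates ASCWave / (L × W). Reading L as the number of links and W as the number of wavelengths per link is an assumption (the symbols are not spelled out in the text), and the function name is hypothetical. With the Arpanet values at high load under VPP (ASCWave = 20, 32 links, 20 wavelengths per link) the ratio is 20/640, which agrees with the 3.1% listed in the %ASC row.

```python
def asc_max(asc_wave: int, num_links: int, wavelengths_per_link: int) -> float:
    """Fraction of wavelength-links needing two backups: ASCWave / (L * W)."""
    num_wave_links = num_links * wavelengths_per_link
    if num_wave_links <= 0:
        raise ValueError("the network must have at least one wavelength-link")
    return asc_wave / num_wave_links


# Arpanet, high load, values taken from Table 2(a).
vpp_high = asc_max(asc_wave=20, num_links=32, wavelengths_per_link=20)  # 20/640
vbp_high = asc_max(asc_wave=9, num_links=32, wavelengths_per_link=20)   # 9/640
```

The same calculation for VBP gives roughly 1.4%, so the extra capacity needed to protect vulnerable primaries as well is under two percentage points of the wavelength-links.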

Fig. 6. Comparison of segment and link restoration, in terms of success in finding RBRs, for the Arpanet topology with 20 wavelengths per link. (a) All nodes have 3 wavelength converters. (b) Even-numbered nodes have no wavelength converters while the others have 3 converters.

amount to 3% for VPP and 1.5% for VBP even at high loads while being negligible at low loads. Since VPP provides significant improvements in handling backup link failures (BLFs) with only a 1.5% increase in spare capacity needs as compared to VBP, it is better to protect both vulnerable primaries and vulnerable backups. Ideally, if enough spare capacity were available for all these wavelength-links, then we could provide 100% protection against all dual-link failures. However, due to capacity limitations, the mechanism may not always find an RBR for the vulnerable paths, leading to lower DFR values. The table also presents the metric Frbr, which captures the extent to which RBRs are successfully found. This is higher for VBP compared to VPP, since the latter protects both vulnerable backups and primaries. For example, this is seen to be 87% and 82% respectively for VBP and VPP, for Arpanet at high loads. However, even though only 80%–85% of the additional spare capacity is available, we are able to achieve close to 100% dual-failure restorability. This is because a lightpath is not rerouted only if the following two conditions are met: (i) the lightpath was vulnerable, and (ii) an RBR could not be found to protect it. The occurrence of both these conditions is seen to have low probability and hence the algorithm that finds the restoration route (RBR) need not be very effective to guarantee close to complete dual-failure restorability.

Segment vs. link restoration

We next compare the segment restoration approach used to find RBR for vulnerable backup paths, to link restoration. Fig. 6 presents the percentage of RBRs

determined for the Arpanet topology. The two different scenarios studied are: (i) all nodes have 3 wavelength converters and (ii) even-numbered nodes have no converters while the others have 3 converters. At low loads, both the algorithms provide similar results since resources are hardly used at such loads. Hence, the limiting factor at these loads is the length of the backups found. Since segment restoration tends to protect a larger set of links, it is likely that the backup path has more hops, resulting in link restoration performing better. This can be noticed in Fig. 6(a) for the case when the load is 0.2. Moreover, when there are no converters at some nodes, segment restoration may be as bad as path restoration, and this factor dominates at low loads as can be seen in Fig. 6(b). At moderate and high loads, segment restoration provides significant improvements over link restoration. For example, an improvement of nearly 21% is seen at the load value of 0.8, for the scenario presented in Fig. 6(a).

Comparison of routing algorithms: best-fit vs. first-fit

We finally compare the performance of the proposed best-fit routing algorithm with the first-fit algorithm. Since the best-fit algorithm tries to minimize the intersection of a link's VPaths, it is likely to increase the possibility of VPaths being link-disjoint and therefore reduce the average number of RBRs required (average RBRreq) for a link failure.¹ We first compare the performance of the two routing algorithms with respect to the average value of RBRreq obtained, at different loads, for Arpanet and NSFNET, in Fig. 7(a) and (b) respectively.

¹ RBRreq will be less than the number of VPaths affected by a link failure when one or more of the VPaths are link-disjoint.

Fig. 7. Comparison of average RBRreq for the best-fit and first-fit routing algorithm as a function of load, with 30 wavelengths per link. (a) Arpanet with 20 nodes and 32 links. (b) NSFNET with 14 nodes and 23 links.

Fig. 8. Blocking performance of best-fit and first-fit routing algorithm as a function of load, with 30 wavelengths per link. (a) Arpanet with 20 nodes and 32 links. (b) NSFNET with 14 nodes and 23 links.

It can be seen from the graphs that the best-fit algorithm outperforms the first-fit algorithm at all loads by 15%–20%. This is expected since the first-fit algorithm might not choose a primary-backup pair that is link-disjoint with one or more VPaths of its links even if one exists. The improvement is slightly more at low loads as compared to that obtained at high loads. This is due to the fact that at high loads it is difficult to find a primary-backup pair that is link disjoint with the VPaths of its links. It is, however, interesting to note the average number of RBRs required by both algorithms at different loads. For instance, in Fig. 7(a), when the best-fit algorithm is used and the average load at the nodes is 30 erlang, close to 60% of the VPaths affected by a link failure require an RBR. However, when the average load at the nodes is 200 erlang, only 45% of the affected VPaths require an RBR. A similar trend can also be seen when the first-fit algorithm is used. This

is also expected due to the following: with increasing loads, the average link utilization increases and hence there is more opportunity to find link-disjoint VPaths. This suggests that the percentage of affected VPaths that require additional spare capacity reduces as load increases, irrespective of the routing algorithm used. The blocking performance of the two routing algorithms for Arpanet and NSFNET is presented in Fig. 8(a) and (b) respectively. As seen in the graphs, the blocking performance of the best-fit algorithm is better than that obtained with the first-fit algorithm at moderate loads. This is due to the fact that the best-fit algorithm tends to distribute the load on different paths for connections on the same source–destination pair (due to the constraint-based routing used), even though it is likely to choose paths with a longer hop length. The improvement is significant at moderate loads, and when the loads are high both algorithms provide almost

Fig. 9. RBRreq vs. RBRused for the best-fit routing algorithm as a function of load, with 30 wavelengths per link. (a) Arpanet with 20 nodes and 32 links. (b) NSFNET with 14 nodes and 23 links.

similar performance. For instance, in Fig. 8(a), the blocking performance with best-fit is seen to be ten times better than that obtained with first-fit when the average load at the nodes is 70 erlang, and almost similar when the average load at the nodes is 150 erlang. This is because, at moderate loads, the blocking performance is not constrained as significantly by larger hop lengths as it would be at high loads, when there are very few resources available. Therefore, at high loads, the improvement obtained with load distribution is offset by the additional blocking caused by longer hop lengths.

In comparing VBP with VPP, we had discussed the efficiency of the restoration technique required to achieve close to 100% dual-failure restorability. To recapitulate, the results showed that it was not necessary for the restoration algorithm to be very effective in order to obtain close to complete dual-failure restorability. This was because, even if an RBR was not found for a VPath V_p, it is necessary for the second link failure to occur on one of its links for V_p to go unrestored. We performed a similar evaluation with the best-fit algorithm and the results are illustrated in Fig. 9(a) and (b) for Arpanet and NSFNET respectively. The graphs present the average number of RBRs reserved by the algorithm (RBRreq) and the average number of RBRs actually used (on a dual-link failure) for varying loads. For instance, in Fig. 9(a), when the load per node is 25 erlang, the average number of RBRs provided by the algorithm is 67% of the link utilization whereas the average number of RBRs actually required was only 35% of the link utilization. This difference is more significant as the load increases. For instance, at high loads, almost a 40% improvement can be noted and the average number of RBRs used was only 10% of the link utilization. This goes to show that the active

protection scheme would be very efficient at high loads and in general, provide close to 100% dual-failure restorability with minimal addition to spare capacity requirements. 6. Extension to SRLG Shared Risk Link Groups (SRLGs) refer to situations where links in a network share a common fiber (or a common physical attribute). If one link fails, other links in the group will also fail. Links in the group are thus said to have a shared risk. In general, to avoid this, the backup paths should avoid using links in the same SRLG group as the links they are protecting (the links of the primary). Otherwise, when the protected link fails the backup path will also fail, resulting in a dual-link failure scenario. In general, there are two ways for a backup path to avoid the SRLGs of its protected interface: (1) The system only allows creation of backups that are SRLG-disjoint with the links of the corresponding primary. (2) The system tries to avoid SRLGs of the protected interface, but if that is not possible it creates the backup path anyway. In this case there are two explicit paths. The first explicit path tries to avoid the SRLGs of the protected interface. If that does not work, the backup path uses the second path (which ignores SRLGs). There are heuristics described in literature that calculate SRLG-disjoint paths [26–29]. Exploring a new method for the same would fall outside the scope of this article, given space limitations. We hence provide a qualitative discussion on the various options available


in extending the current scheme in countering dual-link failures caused by SRLGs. In the proposed active protection scheme, there are at most two instances where backup paths are found for a primary. The first is when the connection is established at which time the primary and a link-disjoint Predetermined Backup Route (PBR) are provided for the connection. The second is when a failure occurs on a link through which the primary traverses. A Restored Backup Route (RBR) for the segment formed around the failed link is found to protect the primary segment. The proposed scheme can be extended to handle SRLGs by requiring both PBR and RBR to be SRLG disjoint with the links of the primary path (in case of a PBR) or the segment (in case of an RBR). Based on the availability of such SRLG-disjoint backup paths, the following cases are possible: (1) The PBR is not SRLG-disjoint with the primary: Under this scenario, one or more of the links in the primary are shared by the PBR (i.e. they are part of the same SRLG) and if a failure affects one of those links, we will have a dual-link failure caused by SRLG and both the primary and PBR will be rendered unavailable. The RBR that will be found in this case is required to be SRLG-disjoint with the primary and the PBR. If such an RBR is not found, then the SRLG failure is not restorable. (2) The PBR is SRLG-disjoint with the primary: In this case, there are no links in the primary and PBR that are from a single SRLG. As a result, a failure on one of the links of the primary would not prevent the lightpath from being restored on to its PBR. However, the RBR for the primary segment will be required to be SRLG disjoint with the primary and PBR in order to avoid a second link failure on the PBR to be unrestored. 
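The two ingredients of the extension discussed above, forming a segment around the failed link and checking a candidate RBR for SRLG-disjointness with that segment, can be sketched as follows. This is an illustration under stated assumptions rather than the scheme's actual procedure: the segment is assumed to run from the nearest upstream node with a converter (or the source) to the nearest downstream node with a converter (or the destination), which reproduces the two limiting cases described for Fig. 2. The node names, the detour nodes G and H, the SRLG label and all function names are hypothetical.

```python
def key(u, v):
    """Undirected link as an order-independent key."""
    return frozenset((u, v))


def form_segment(path_nodes, failed_link, converter_nodes):
    """Return the (start, end) nodes of the segment around failed_link.

    ASSUMED rule: walk upstream to the nearest converter node (or the source)
    and downstream to the nearest converter node (or the destination).
    """
    u, v = failed_link
    i = path_nodes.index(u)
    if i + 1 >= len(path_nodes) or path_nodes[i + 1] != v:
        raise ValueError("failed_link is not on the path in this direction")
    start = i
    while start > 0 and path_nodes[start] not in converter_nodes:
        start -= 1
    end = i + 1
    while end < len(path_nodes) - 1 and path_nodes[end] not in converter_nodes:
        end += 1
    return path_nodes[start], path_nodes[end]


def srlg_disjoint(links_a, links_b, srlg_of):
    """True if no link in links_a shares an SRLG with any link in links_b."""
    groups_a = set().union(*(srlg_of.get(l, set()) for l in links_a))
    groups_b = set().union(*(srlg_of.get(l, set()) for l in links_b))
    return groups_a.isdisjoint(groups_b)


# Lightpath A..F with a failure on link C-D, as in the Fig. 2 discussion.
path = ["A", "B", "C", "D", "E", "F"]
no_converters = form_segment(path, ("C", "D"), set())        # path restoration
both_ends = form_segment(path, ("C", "D"), {"C", "D"})      # link restoration

# Hypothetical SRLG map: links C-D and C-G share one duct.
srlg_of = {key("C", "D"): {"duct-1"}, key("C", "G"): {"duct-1"}}
segment_links = [key("C", "D")]
rbr_via_g = [key("C", "G"), key("G", "D")]
rbr_via_h = [key("C", "H"), key("H", "D")]
```

In this example the detour through G would fail together with the protected link, so only the detour through H qualifies as an SRLG-disjoint RBR. The shorter the segment, the fewer SRLGs the RBR has to avoid, which is the advantage noted for segments over full primary paths.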
From the aforementioned cases it is clear that for each lightpath (connection), irrespective of whether the PBR is SRLG-disjoint with the primary or not, 100% dual-failure restorability can still be achieved if the RBR is SRLG-disjoint with the primary segment it is protecting. A dual-link failure scenario involving that lightpath will not be restored only when both the PBR and RBR are not SRLG-disjoint with the primary elements they are protecting. The advantage of this is that the RBR would in general protect only a subset of the links of the primary, and it will be relatively easier to find an SRLG-disjoint path for a shorter sub-path (segment) as compared to finding one that is SRLG-disjoint with all the links of the primary. In the best case, if the end nodes of the failed link have converters,

it will be sufficient to find an alternate path around the failed link that is SRLG-disjoint with it. It is evident that this will also benefit from the presence of wavelength converters in the network. There are other techniques that can be used to handle the problem, as described below.

Segment formation: Currently, the proposed mechanism identifies segments on the primary path based on the availability of wavelength converters at the time of a failure. This can be extended or modified to form segments which have SRLG-disjoint paths. Further, the calculation of segments will be done at connection setup time to avoid segment formation delays during restoration. This will ensure that an RBR can be found around the segment when a failure occurs (subject to capacity availability). Alternately, segments can be identified after the occurrence of a failure. In this case, avoiding start and end nodes whose outgoing or incoming link is part of an SRLG will increase the chances of finding an SRLG-disjoint RBR around the failed link. This will be an added constraint in forming segments around the failed link, in addition to the requirement of wavelength converters at the end nodes.

Two backup paths: As mentioned earlier, regardless of whether the PBR is SRLG-disjoint or not, the RBR needs to be SRLG-disjoint in order to prevent SRLG-based dual-link failures that cannot be handled. If an SRLG-based dual-link failure affects the primary and the PBR of a lightpath, the restorability depends upon the availability of an SRLG-disjoint RBR. Further, there will be data loss until such a path is found. This would negate the advantages that the current scheme provides, wherein it tries to avoid the delay for restoration of the second failure. One way of reducing the delays will be to find two backup paths for those lightpaths for which an SRLG-disjoint PBR cannot be found.
The RBRs will be found for each segment of the path at connection set-up time to reduce the delay in restoration. The RBRs, however, will be required to be SRLG-disjoint with the primary path to guarantee dual-link failure recovery. 7. Conclusions In this paper, we presented a hybrid failure recovery mechanism that tries to utilize the potential in mesh


networks to handle link failures. In particular, we studied the dual-link failure problem and tried to minimize the spare capacity essential to provide complete survivability in such cases. We presented an algorithm that attempts to achieve high restoration speed and guarantees while optimizing the spare capacity requirements. We also proposed a routing algorithm to increase opportunities for backup multiplexing among additional backup paths. Furthermore, we presented a segment-based restoration approach that relaxed the need to have the same wavelength in the backup paths by better utilizing the wavelength converters. We find that complete dual-failure restorability can be obtained with very little addition to spare capacity. Also, we showed that it is worthwhile to protect both vulnerable primaries and vulnerable backups. Furthermore, we quantified the importance of wavelength conversion in protection with the segment restoration algorithm. Finally, we also showed that by increasing opportunities for backup multiplexing, a considerable reduction in ASC requirements can be obtained. A discussion on extending the scheme to Shared Risk Link Groups is also included.

Acknowledgments

The authors sincerely thank Manav Mishra and Christian Maciocco of Intel Corporation, for their comments, and Sundar Subramani of UMBC for his help with part of the simulations.

References

[1] M. Clouqueur, W. Grover, Mesh-restorable networks with enhanced dual-failure restorability properties, in: Proc. SPIE OPTICOMM, Boston, MA, 2002, pp. 1–12.
[2] K. Sivalingam, S. Subramaniam (Eds.), Emerging Optical Network Technologies, Springer Publishers, Boston, MA, 2004.
[3] O. Gerstel, R. Ramaswami, Optical layer survivability-an implementation perspective, IEEE Journal on Selected Areas in Communications 18 (10) (2000) 1885–1923.
[4] J.B. Slevinsky, W.D. Grover, M.H. MacGregor, An algorithm for survivable network design employing multiple self-healing rings, in: Proc.
IEEE GLOBECOM, Houston, TX, 1993, pp. 1568–1572. [5] M. Medard, S.G. Finn, R.A. Barry, WDM loop-back recovery in mesh networks, in: Proc. IEEE INFOCOM, New York, NY, 1999. [6] D. Stamatelakis, W.D. Grover, IP layer restoration and network planning based on virtual protection cycles, IEEE Journal on Selected Areas in Communications 18 (10) (2000) 1938–1949. [7] G. Ellinas, A.G. Hailemariam, T.E. Stern, Protection cycles in mesh WDM networks, IEEE Journal on Selected Areas in Communications 18 (10) (2000) 1924–1937. [8] S. Ramamurthy, B. Mukherjee, Survivable WDM mesh networks Part 1 — Protection, in: Proc. IEEE INFOCOM, vol. 2, New York, NY, 1999, pp. 744–751.

[9] S. Ramamurthy, B. Mukherjee, Survivable WDM mesh networks, Part II - Restoration, in: Proc. International Conference on Communications (ICC), Vancouver, Canada, 1999, pp. 2023–2030.
[10] H. Choi, S. Subramaniam, H.-A. Choi, On double-link failure recovery in WDM optical networks, in: Proc. IEEE INFOCOM, New York, NY, 2002.
[11] J. Doucette, W.D. Grover, Capacity design studies of span-restorable mesh networks with shared-risk link group (SRLG) effects, in: Proc. SPIE OPTICOMM, Boston, MA, 2002, pp. 25–38.
[12] R. Ramaswami, G. Sasaki, Multiwavelength optical networks with limited wavelength conversion, in: Proc. IEEE INFOCOM, Kobe, Japan, 1997.
[13] W. He, M. Sridharan, A.K. Somani, Capacity optimization for surviving double-link failures in mesh-restorable optical networks, in: Proc. SPIE OPTICOMM, Boston, MA, 2002, pp. 13–24.
[14] M. Sivakumar, R. Shenai, K.M. Sivalingam, Protection and restoration for optical WDM networks: A survey, in: K. Sivalingam, S. Subramaniam (Eds.), Emerging Optical Network Technologies, Springer Publishers, 2004, pp. 297–332.
[15] M. Clouqueur, W.D. Grover, Computational and design studies on the unavailability of mesh-restorable networks, in: Proc. IEEE/VDE Design of Reliable Communication Networks (DRCN), Munich, Germany, 2000, pp. 181–186.
[16] M. Clouqueur, W.D. Grover, Availability analysis of span-restorable mesh networks, IEEE Journal on Selected Areas in Communications 20 (4) (2002) 810–821.
[17] S. Lumetta, M. Medard, Classification of two-link failures in all-optical networks, in: Optical Fiber Communication Conference, Vancouver, Canada, 2001.
[18] H. Sakauchi, Y. Okanoue, S. Hasegawa, Spare-channel design schemes for self-healing networks, IEICE Transactions on Communications (1992) 624–633.
[19] M. Sivakumar, S. Subramaniam, On the performance impact of wavelength assignment, wavelength conversion architecture and placement algorithms, Optical Networks Magazine 3 (2) (2002).
[20] R. Srinivasan, A.K. Somani, A generalized framework for analyzing time-space switched optical networks, IEEE Journal on Selected Areas in Communications (2002) 202–215.
[21] J. Wang, L. Sahasrabuddhe, B. Mukherjee, Path vs. subpath vs. link restoration for fault management in IP-over-WDM networks: Performance comparisons using GMPLS control signaling, IEEE Communications Magazine (2002) 80–87.
[22] A. Todimala, B. Ramamurthy, A dynamic partitioning subpath protection routing technique in WDM mesh networks, in: Proceedings of ICCC, Mumbai, India, 2002.
[23] K.P. Gummadi, M.J. Pradeep, C.S.R. Murthy, An efficient primary-segmented backup scheme for dependable real-time communication in multihop networks, IEEE/ACM Transactions on Networking 11 (1) (2003) 81–94.
[24] K.P. Gummadi, M.J. Pradeep, C.S.R. Murthy, A segmented backup scheme for dependable real-time communication in multihop networks, in: Proceedings of the Eighth International Workshop on Parallel and Distributed Real Time Systems, Cancun, Mexico, 2000.
[25] M. Sivakumar, K. Sivalingam, On surviving dual-link failures in path protected optical WDM mesh networks, Tech. Rep., University of Maryland at Baltimore County (UMBC), http://dawn.cs.umbc.edu/wdm-pubs.html, June 2004.

[26] D. Xu, Y. Xiong, C. Qiao, A new PROMISE algorithm in networks with shared risk link groups, in: Proc. GLOBECOM, San Francisco, CA, 2003, pp. 2356–2540.
[27] G. Li, C. Kalmanek, R. Doverspike, Fiber span failure protection in mesh optical networks, SPIE Optical Networks Magazine 3 (3) (2002).
[28] R. Bhandari, Survivable Networks - Algorithms for Diverse Routing, Kluwer Academic Publishers, 1998.
[29] E. Bouillet, J.-F. Labourdette, G. Ellinas, R. Ramamurthy, S. Chaudhuri, Stochastic approaches to compute shared mesh restored lightpaths in optical network architectures, in: Proc. IEEE INFOCOM, New York, NY, 2002, pp. 801–807.