Computer Communications 28 (2005) 987–999 www.elsevier.com/locate/comcom
Restoration mechanisms for handling channel and link failures in optical WDM networks: tunable laser-based switch architectures and performance analysis
Harini Krishnamurthy a,1, Krishna M. Sivalingam b,2,*, Manav Mishra c,3
a Amazon.com Inc., Seattle, WA 98144, USA
b Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County (UMBC), 1000 Hilltop Circle, Baltimore, MD 21250, USA
c Microsoft Corporation, Redmond, WA 98052, USA
Received 24 July 2003; revised 12 November 2004; accepted 24 November 2004 Available online 16 December 2004
Abstract In this paper, we study restoration mechanisms to handle channel and link failures in an optical wavelength division multiplexed (WDM) wavelength-routed wide-area backbone network based on a mesh topology. The solution uses a small number of tunable lasers to provide restoration capability. We consider two types of failures: link failures and individual channel (or wavelength) failures that occur when one or more transceivers fail at a node that is the source of lightpath(s) or due to a failure in an intermediate node’s optical switch fabric. We use the restoration mechanism that attempts to find alternate paths and resources after failure occurs. In our proposed mechanism, restoration is first attempted using the tunable lasers to transmit on the failed wavelengths. If all the failed lightpaths cannot be restored using the tunable lasers, unused wavelengths on the same link are used, if optical wavelength conversion is available. For the remaining lightpaths requiring restoration, two different link-level restoration mechanisms called redirection algorithm (RDA) and disjoint path algorithm (DPA) are used. Results based on discrete-event simulations to understand the performance of the proposed mechanisms, in terms of restoration efficiency and restoration times, are presented. The results show that for networks of varying size and node degree with 32 wavelengths on each link, using as few as eight tunable lasers per link provides good restoration efficiency under moderate traffic load. The performance of the proposed algorithms is compared to an earlier restoration mechanism based on broadcast, and it is seen that the proposed mechanism performs better, by offering both lower restoration times and higher restoration efficiency even with a small number of lasers. The impact of the number of tunable lasers on the performance is studied for failures occurring simultaneously on two links. 
It is seen that for a small number of such channel failures, as few as four tunable lasers per link are sufficient to recover from failures on a single link and on two links. © 2004 Elsevier B.V. All rights reserved.
Keywords: Optical networks; Wavelength division multiplexed (WDM) networks; Failure recovery; Survivability; Restoration; Channel failures
* Corresponding author. Tel: +1 410 455 3961; fax: +1 410 455 3969. E-mail addresses:
[email protected] (H. Krishnamurthy),
[email protected] (K.M. Sivalingam),
[email protected] (M. Mishra).
1 Most of this author’s contributions were done while the author was a graduate student at Washington State University, Pullman and at University of Maryland, Baltimore County.
2 Part of the research was supported by a grant from Cisco Systems, San Jose, CA, Intel Corporation and NSF grant No. ANI-0322959. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
3 Most of this author’s contributions were done while he was with Intel Corporation in Hillsboro, OR, USA.
0140-3664/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.comcom.2004.11.004
1. Introduction
Significant advances in the field of optical communications and networking over the past few years have resulted in the widespread deployment of optical wide-area and metropolitan networks [1], based on optical Wavelength Division Multiplexing (WDM) technology. WDM technology enables the multiplexing of several independent streams of information, each on a different wavelength, on the same fiber [2]. This helps reduce the impact of the opto-electronic speed mismatch by providing multiple channels that operate closer to the electronic processing speed. Optical WDM
networks have been considered for access, local area, metropolitan area, and wide area networks. In this paper, we consider a circuit-switched, mesh topology based wide-area wavelength routed network that consists of optical cross-connect routers connected by WDM links. Traffic session requests arrive based on a stochastic process and the system establishes a lightpath (an end-to-end all-optical circuit-switched connection) for each session.
Network monitoring statistics show that failures are not an uncommon occurrence in backbone networks [3–5]. Hence, fault-tolerance (or survivability) is an important consideration for such high capacity networks, since failures may result in significant degradation of network performance. The common types of failures that have been studied are node and link failures. The recovery mechanisms for such failures may be broadly classified as: (i) protection or pro-active schemes, where the recovery path with partially or fully allocated resources is determined at the time of session establishment; and (ii) restoration or reactive schemes, where the recovery path is determined after failure occurs. Another classification is to categorize the mechanisms as link-level, where the traffic is re-routed around the affected link leaving the rest of the path intact, and path-level, where the traffic is re-routed on a different path between the source and the destination.
In this paper, we consider link failures and channel failures. It is assumed that each link has a transmitter array at one end and a receiver array at the other. A channel failure occurs when one or more transmitters fail at the source of a lightpath, or when there is a failure in the switching fabric. Thus, recovery mechanisms are necessary for the lightpaths using the affected channels. A link failure can be considered as a special case of channel failures where all channels fail. Our solutions are based on the restoration (i.e.
reactive) method, where there is no prior resource allocation on a recovery path. The mechanism uses a small number of tunable lasers at every link. Tunable laser diodes have been widely accepted as essential components for future optical networks [6–11]. Tunable lasers are typically used in inventory control and in sparing, to reduce the overhead associated with maintaining fixed lasers as spares. This paper is the first, to the best of our knowledge, to consider the use of tunable lasers for restoration. The design goal is to use a small number of these devices, given their higher cost relative to fixed-tuned lasers. The objective of the proposed tunable-laser based mechanism is to re-route affected traffic on the same link and use the link-level restoration mechanism only when absolutely necessary. To recover from channel failures, local restoration is first attempted by sending an affected session’s traffic on its original wavelength using a tunable laser. If enough tunable lasers are not available to switch all the affected lightpaths, then unused wavelengths on the same link may be used if wavelength conversion is present at the routers. The remaining affected lightpaths, if any, are
restored using a link-level restoration scheme. Two algorithms for link-level restoration are presented in the paper: the Neighbor Redirection Algorithm (RDA), where a node uses one of its neighbors as the designated redirector node for each of its links, and the Disjoint Paths Algorithm (DPA), where the affected lightpaths are re-routed on two link-disjoint paths.
We study the performance of the proposed mechanisms using discrete-event simulation for different network topologies and system configurations. The results indicate that as the spare capacity (i.e. the number of tunable lasers on a link) increases, the restoration efficiency increases. It is also observed that having four tunable lasers on a 32-wavelength link results in a fairly high restoration efficiency for moderate network loads for NSFNET, ARPANET, and other randomly generated topologies with up to 50 nodes. The performance of the algorithms is compared to a restoration mechanism described in [12] and it is observed that the proposed algorithms result in about 40% lower restoration times and higher restoration efficiencies even with a small number of tunable lasers. The impact of the number of tunable lasers on the restoration achieved for simultaneous failures on two links is also studied and compared to that when a single link is affected. It is observed that for a small number of channel failures, as few as four tunable lasers are sufficient to recover from failures on a single link and on two links.
The rest of this paper is organized as follows. Section 2 discusses related work on protection and restoration for optical WDM networks. Section 3 describes the network architecture used in the study and Section 4 presents the details of the proposed algorithms. Section 5 includes a study of the performance based on simulation. Section 6 summarizes the paper.
2. Related work
This section summarizes some of the previous work done in the area of recovery mechanisms for mesh-topology based optical networks. As described earlier, the mechanisms are classified as protection or restoration and also as link-level or path-level protection or restoration [3,13]. Models based on Integer Linear Programming (ILP) have been proposed to solve the problem of determining primary and backup paths when protection schemes are used, as in [4,14,15]. The formulation in [4] seeks to jointly optimize the set-up of primary paths and protection paths, while maximizing the revenue generated by the network. In [15], the problem is modeled as three different ILPs: the first to maximize restoration, the second to minimize the capacity while setting up backup paths, and the third, a joint optimization to minimize capacity. A scheme to control the connections in a network in which path protection is employed is presented in [5].
In [12], restoration mechanisms based on broadcast are used to determine the restoration paths. The work presented in [16] describes experiments that have been conducted to study end-to-end restoration, when Generalized Multi-protocol Label Switching (GMPLS) is used as the control protocol. Hybrid mechanisms that combine the relative benefits of protection and restoration have been presented in [17,18]. Some of the mechanisms used for mesh network protection can be achieved using well-known techniques from ring topologies, such as ring covers [19]. In [20], the concept of virtual protection cycles for recovery from link and node failures in IP over WDM networks is introduced. Another approach using protection cycles to provide automatic protection switching is presented in [21]. Failure handling in optical networks has also been studied from the perspective of providing user-specified quality of protection. The work in [22] introduces a parameter ‘Quality of Protection’ to measure the level of protection that is achieved in the network. In [23], a connection reliability parameter is used to provide a partial restoration path.
All of the above work considers protection and restoration mechanisms against single link failures. Schemes to handle dual link failures have also been studied in the literature. In [24], it has been proposed that computing two disjoint backup paths for every link in the network would result in 100% protection from dual link failures. The work also proposes a graph contraction and expansion mechanism to achieve 100% restoration using a single protection path for every link. In [25], the authors extend
the work in [24] and include backup multiplexing in their ILP model of the problem. A hybrid mechanism that combines protection and restoration has been presented in [26]. A specific form of dual link failure known as co-incident Shared Risk Link Groups (SRLG), where the two links that fail are co-incident on a common node, is considered in [27]. The work concludes that introducing spare capacity in the network to recover from the failure of co-incident SRLGs is sufficient to recover from a comparable number of general dual link failures.
The above paragraphs summarized recent research on recovery mechanisms for mesh-topology optical WDM networks. The proposed work differs from the previous work by considering channel-level failures and the use of tunable lasers to support restoration.
3. System architecture and failure model
This section describes the system and network architectures and the failure models considered.
3.1. Network and cross-connect architecture
The network topology under consideration is a mesh topology of optical cross-connects (or switches) interconnected by WDM links. We consider systems with and without optical wavelength conversion capabilities. Fig. 1 presents the architecture of a generic optical cross-connect
Fig. 1. A generic cross-connect node architecture that has fixed lasers as spares.
Table 1
Different types of tunable lasers and their characteristics

Technology                               Tuning range (no. of channels)   Tuning latency   Channel spacing (GHz)
Distributed Bragg Reflector (DBR)        40                               1s               50
Sampled Grating-DBR                      80                               20 ns            50
Vertical Cavity Surface Emitting Laser   80                               10 ms            50
and its key components. Each outgoing link is associated with a fixed transmitter laser array, where each laser is tuned to a particular wavelength. In current optical switches, for every one of these fixed lasers, another one tuned to the same wavelength is maintained as a backup [9], as shown in Fig. 1. This implies that there is 100% additional capacity for sparing. This is excessive, given the low frequency of failures. Hence, we propose to replace the backup set of fixed lasers with a smaller set of tunable lasers, with the aim of reducing the overhead associated with spare capacity maintenance. Thus, a set of TL tunable lasers, with $TL \le W$, is available at the switch [28]. The fixed and tunable lasers are directly connected to the switch fabric, in order to add lightpaths originating from the node. The output multiplexer is capable of multiplexing signals from the transit lightpaths and the lightpaths originating at the node.
Tunable lasers. Tunable lasers are chosen as they are capable of tuning to a wide range of wavelengths [6–9]. Tunable lasers will be an important influence in future optical networks, since they have wide-ranging applications such as inventory, sparing and dynamic lightpath provisioning. The technology is maturing and we can expect volume production in the years to come.
The important characteristics of a tunable laser are the tuning range and the tuning latency. It is important to have short tuning latencies since they directly affect restoration times, as will be seen later. The tunable lasers are used only when a link or channel failure occurs. There are several mechanisms used to produce tunable lasers [10,11]. The tuning ranges and latencies of these technologies are shown in Table 1. In this paper, we consider the use of SG-DBR lasers, which have a switching latency of less than 20 ns; this is necessary to achieve low restoration times.
We consider two different tunable laser based switch configurations: share-per-node and share-per-link. In the share-per-node configuration, a bank of tunable lasers is shared by all the lightpaths originating at the node, as shown in Fig. 2(a). In a share-per-link configuration, a bank of tunable lasers is available for each outgoing link, as shown in Fig. 2(b). The share-per-link configuration is expected to perform better than the share-per-node configuration since there are a greater number of lasers available for restoration. However, the share-per-link configuration is more expensive and the utilization of the tunable lasers can be quite low if failures are infrequent. The design objective is to trade off the restoration achieved against the cost associated with both the configurations.
3.2. Failure model
Studies to date on fault recovery mechanisms in networks have considered link failures and node failures. In the former case, the link fails while the end nodes are still active. In the latter case, one or both of the end nodes fail. This paper considers a third type of failure that is unique to optical WDM networks, namely channel failures. Here, a subset of the channels on a link may fail simultaneously at a given instant. Channel failures may occur due to failures of fixed
Fig. 2. Share-per-node and Share-per-link configuration of tunable lasers.
Table 2
Summary of notations used in Sections 4 and 5

M                Number of nodes in the network
E                Number of links (edges) in the network
$d_{av}$         Average node degree
$L_{av}$         Average link load
W                Number of wavelengths per link
TL               Total number of tunable lasers per link
$N_{lp}^{l}$     Total number of active lightpaths on link l
$N_{f}^{l}$      Number of lightpaths on link l affected by the failure
$t_l$            Tuning latency of the tunable laser
$t_s$            Switch configuration time
$t_c$            Time to send a control message
$R_{rda}$        Route calculated by the redirector node in RDA
$R_{dpa}^{i}$    ith disjoint route used in DPA
H(r)             Number of hops on route r
lasers at the source of the lightpaths or due to failures in the switching matrix. In the event that all channels on a link fail, it may be treated as a link failure. The restoration mechanism presented in Section 4 handles multiple channel failures on a single link (and single link failures) and failures of two links incident on a common node. Section 4 describes the solutions proposed to handle these failures.
4. Proposed solutions
The proposed mechanism is based on the restoration (i.e. reactive) approach that attempts to restore affected connections after failure occurs. This does not require backup resource allocation during session setup, unlike proactive mechanisms. For such restoration mechanisms, the two main design criteria considered are: (i) maximization of restoration efficiency, defined as the fraction of connections
that are restored when failure occurs, and (ii) minimization of restoration time, defined as the time between failure detection and connection restoration.
4.1. Channel failures
We differentiate between channel failures due to transmitter failure at the source of a lightpath and failures in the switching fabric at an intermediate node along the path. For the former case, a tunable laser at the source is used to transmit the affected lightpath on the original channel. For the latter case, the affected lightpaths are terminated and the traffic is buffered and transmitted on the same or a different channel via the tunable lasers and the other fixed lasers available at the node [28]. Due to the O/E/O conversion, the end-to-end delays of the restored lightpaths will increase until the original paths are reestablished. This temporary increase in end-to-end delay may be acceptable in most situations, as a price to pay for restoration. However, the low tuning latencies of the tunable lasers result in fast restoration times, which is usually the important metric for restoration schemes. The details of the restoration procedure are provided below. Table 2 summarizes the notations used in this section.
Restoration procedure. Consider a link l that is currently carrying $N_{lp}^{l}$ lightpaths, of which $N_{f}^{l}$ are affected by a failure of some channels on the link or of the entire link itself. The restoration procedure is described as follows:
(1) If the available number of spare tunable lasers for the link, TL, is sufficient to restore all the failed connections (i.e. $TL \ge N_{f}^{l}$), the affected connections are switched to the same wavelength on which they
Fig. 3. Re-routing over a free channel in the event of channel failure. In the example, lightpaths on wavelengths w2 and w4 are affected by the failure; tunable lasers l1 and l2 are, respectively, tuned to w2 and w4 to provide recovery.
were set up using the tunable lasers. This is shown in Fig. 3, where transmitters for wavelengths w2 and w4 have failed; tunable lasers l1 and l2 are tuned to w2 and w4, respectively, to provide recovery. Note that this does not require any wavelength conversion or control messages to other nodes.
(2) If not enough tunable lasers are available to restore all failed connections, then the connections are switched to unused wavelengths on the same link. This requires wavelength conversion at the affected node in order to convert the incoming wavelength to the newly selected wavelength. Thus, if $N_{f}^{l} \le TL + W - N_{lp}^{l}$, then all connections can be handled locally by the affected node. This will require control messages to the neighboring node to alter its switching configuration, followed by an acknowledgment from that neighbor. This method will also require a change of switching configuration at the node under consideration.
(3) If there still exist un-restored connections, they are handled by using a link-level restoration mechanism described in Section 4.2. This involves determining by-pass route(s) around the link, and transferring data to those route(s).
Note that in the event of a link failure, Step 3 is invoked to handle all the affected lightpaths.
End of procedure
4.2. Link failure
In this section, we present two link-level restoration algorithms called the Redirection Algorithm (RDA) and the Disjoint Paths Algorithm (DPA). In order to recover from link failures, pre-computed paths are used in an effort to reduce the restoration time.
4.2.1. The redirection algorithm (RDA)
With RDA, for each outgoing link of a given node, one of the node’s neighbors is selected as the redirector node. Consider a link l from node j to node k and let node r, a neighbor of node j, be the redirector node. Node j determines the appropriate redirector node for a given link based on the current loads on all its other neighboring links.
One possibility is to select the neighbor link with the lowest load. The redirector node’s role is to determine an alternate route from itself to node k. The redirector node uses simple shortest-path routing to determine its path to the specified node. The redirector node simply excludes node j (and all its incident links) whom it is serving so that traffic does not go through that node. The node uses the topology information gathered from the periodic routing update information. When link l fails, all affected connections that cannot be handled in Steps 1 and 2 above are forwarded by node j to the redirector node r. The latter then reroutes as many connections as possible on the by-pass route. Thus, a node
Fig. 4. Examples to illustrate the RDA and DPA algorithms.
attempts to balance the forwarding by sharing the rerouting load among its neighbor nodes. An example usage of the RDA algorithm is shown in Fig. 4, where node A has selected node C as the redirector node for link AD. Node C has pre-computed the path CGD as the bypass route to node D. When link AD fails, affected connections are first re-directed to node C, which in turn forwards the traffic along path CGD to node D. In summary, this approach makes use of pre-determined neighboring nodes to handle affected lightpath traffic that requires re-routing.
4.2.2. Disjoint paths algorithm (DPA)
In the Disjoint Paths Algorithm (DPA), every node precomputes two static link-disjoint backup paths for each of its outgoing links. If a failure occurs on a link, the affected connections are re-routed over the two precomputed disjoint paths. This is shown in Fig. 4, where A-C-E-F-B and A-H-B are the two backup paths for the link AB. The motivation behind having two different routes in DPA is to increase the restoration efficiency. When only one alternate route is used, it may not be possible to restore all affected connections. We have used Suurballe’s algorithm [29] to determine two link-disjoint routes from j to every one of its neighbors. Note that it is possible to extend the scheme to have a node compute more than two alternate by-pass paths for each of its links. This will help better distribute the load and increase restoration efficiency.
4.3. State information and route determination
Both of the above link restoration algorithms are distributed algorithms that require periodic information exchanges. It is assumed that all the restoration-related
Table 3
State information maintained by the link restoration algorithms for the example network shown in Fig. 4

Neighbor table at node A, for RDA
Neighbor   Redirector assigned   Lightpath list   Number of wavelengths on link
D          C                     {1}              4
C          D                     {3}              4
H          C                     {4,5}            4

Redirector table at redirector node C, for RDA
Destination   Neighbor   Route
D             A          C-G-D
H             A          C-E-F-B-F-H

Disjoint route table at node A, for DPA
Neighbor   Route 1    Route 2
B          A-H-B      A-C-E-F-B
C          A-D-G-C    A-B-F-E-C
D          A-C-G-D    N/A
information is exchanged during the link summary exchanges of the routing algorithm. Additional network and link state information are to be maintained at each node, as summarized in Table 3. In RDA, each node stores a neighbor table that holds information about the neighbors, the redirectors assigned for links to those neighbors, the number of wavelengths available on the link and a list of lightpaths passing through the link to that neighbor. In addition, each node serving as a redirector node needs to store additional routing information for the nodes that it serves. The routes to the destinations are determined using Dijkstra’s shortest path algorithm. For example, with RDA, node A has an entry for each of its neighbors: node D is the redirector node if there is a failure (partial or complete) on link AC, and node C is the redirector node if either link AD or AH fails (partially or completely). Node C maintains the alternate route C-G-D to node D and route C-E-F-B-F-H to node H. With DPA, a node stores two link-disjoint routes to each neighbor that are determined using Suurballe’s algorithm [29]. In the example shown, node A maintains two routes each to nodes B and C, and a single route to node D (for which only one path is available). Note that the actual implementations may also maintain other pertinent state information such as the currently used and available wavelengths on each link.
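The redirector's route computation in RDA can be illustrated with a minimal sketch. It assumes unit link costs, so a breadth-first search returns the same fewest-hop route that Dijkstra's algorithm would; the link list `EXAMPLE_LINKS` is hypothetical, loosely reconstructed from the routes listed in Table 3, and may not match the actual topology of Fig. 4. The function and variable names are illustrative only.

```python
from collections import deque

# Hypothetical link list, loosely reconstructed from the routes in Table 3.
# The real Fig. 4 topology may differ.
EXAMPLE_LINKS = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "H"),
    ("C", "G"), ("G", "D"), ("C", "E"), ("E", "F"),
    ("F", "B"), ("H", "B"),
]


def build_adjacency(links):
    """Turn an undirected link list into a node -> set-of-neighbors map."""
    adjacency = {}
    for u, v in links:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def redirector_route(adjacency, redirector, target, served_node):
    """Fewest-hop route from the redirector to the target that never
    passes through the node being served (node j in the text).
    Returns a list of nodes, or None if no such route exists."""
    parents = {redirector: None}
    queue = deque([redirector])
    while queue:
        node = queue.popleft()
        if node == target:
            route = []
            while node is not None:
                route.append(node)
                node = parents[node]
            return route[::-1]
        for neighbor in sorted(adjacency[node]):
            # Excluding the served node also excludes all its incident links.
            if neighbor != served_node and neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


adjacency = build_adjacency(EXAMPLE_LINKS)
# Node C acts as redirector for link A-D and must avoid node A.
print(redirector_route(adjacency, "C", "D", served_node="A"))  # ['C', 'G', 'D']
```

On this assumed topology the sketch reproduces the C-G-D by-pass route used in the RDA example. With non-uniform link costs, the breadth-first search would need to be replaced by a weighted shortest-path computation.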
5. Performance analysis
This section describes the performance of the proposed algorithms. The performance was studied using discrete event simulation, based on YACSIM [30], a C-based simulation package.
5.1. Simulation details
Simulations were conducted for four different networks: (i) the 14-node 21-link NSFNET, (ii) the 20-node 32-link ARPANET, (iii) a 15-node 21-link inter-connected ring network [12] and (iv) a randomly generated 50-node network with average degree of 3.25. Each link in the network has 32 wavelengths. Failures on a single link and on two links are considered. The performance of the algorithms in the absence and presence of wavelength conversion is evaluated.
A pseudo-dynamic traffic demand is assumed, where for a given traffic load in the network, connections are dynamically admitted into the network. During this time, connection requests arrive according to a Poisson process, and connections are established and torn down. After steady state is achieved, the network is frozen and no more connection requests are accepted. Failures are then randomly introduced in the network for the existing connections and the performance of the algorithms is studied. In the simulations, 100,000 connection requests are admitted into the network before the network state is frozen to introduce failures. This is similar to the approach in [12]. The system load is denoted by the average link utilization in the network, as given by
$$L_{av} = \frac{1}{E} \sum_{l \in E} \frac{N_{lp}^{l}}{W}$$
The performance metrics considered are: (i) the restoration efficiency, defined as $\eta = N_R / N_F$, where $N_R$ denotes the total number of restored connections and $N_F$ the total number of affected connections, during the entire simulation run, and (ii) the restoration time, per failure instance.
5.2. Analysis of failures on a single link
This section studies the performance of the algorithm when the failure affects a single link in the network. The tunable lasers are assumed to be in a share-per-link configuration.
5.2.1. No conversion
Fig. 5 shows the restoration efficiency ($\eta$) for the 50-node random network with no wavelength conversion, for the RDA and DPA algorithms, varying the number of failed channels per failure ($N_{f}^{l}$) and the number of spare tunable lasers (TL) per link. Note that $N_{f}^{l} = 32$ corresponds to link failure. It is observed that, for both RDA and DPA, $\eta$ decreases as $N_{f}^{l}$ increases. It is also seen that, for a given $N_{f}^{l}$, the achieved $\eta$ increases with increasing spare capacity (i.e. TL). When $TL \ge N_{f}^{l}$, 100% restoration is achieved; this is expected since there are adequate spare lasers to restore the failed ones. It is also observed that the DPA algorithm always provides better restoration than RDA. This is due to the fact that DPA provides better load sharing of the affected lightpaths by using two disjoint paths. RDA, on the other
Fig. 5. Restoration efficiency studies for a random 50-node network with $d_{av} = 3.25$ and $L_{av} = 63\%$ using the Redirection Algorithm (RDA) and the Disjoint Paths Algorithm (DPA) in the absence of wavelength conversion.
hand, uses a single path through the redirector node to re-route all the affected lightpaths. For example, for $N_{f}^{l} = 8$ and TL = 2, it is found that DPA has about 10% higher restoration efficiency than RDA. When link failures occur, the restoration efficiencies do not exceed 35% for either algorithm when $TL \le 2$. This is due to the lack of sufficient spare capacity given the high network load of 63%. However, for higher values of TL, much higher restoration is observed.
5.2.2. With conversion
Fig. 6 presents the performance of the algorithms for the NSFNET topology with the nodes possessing optical
wavelength conversion capabilities. The performance trends are similar to the earlier analysis where conversion was not present. As before, DPA is seen to outperform RDA, and restoration efficiency close to 100% is obtained for link failures even when the value of TL is as low as 4. This shows that the introduction of wavelength conversion in the network results in a substantial improvement in the performance of the algorithms.
5.2.3. Comparison of algorithms in the absence and presence of wavelength conversion
Fig. 7 shows the comparative performance of the algorithms with and without wavelength conversion for
Fig. 6. Restoration efficiency for NSFNET with $L_{av} = 76\%$ using the Redirection Algorithm (RDA) and the Disjoint Paths Algorithm (DPA) in the presence of wavelength conversion.
Fig. 7. Comparison of restoration efficiency in the presence and absence of wavelength conversion under varying loads for (a) NSFNET with two tunable lasers per link and 16 channels failed per link (b) ARPANET with four tunable lasers per link and the entire link failed.
the NSFNET and ARPANET topologies under varying average link load. It is seen that, as the average link load increases, the restoration efficiency decreases for all the algorithms. DPA with wavelength conversion performs the best since two routes are available for restoration and also due to the fact that more wavelengths are available. There is a significant difference between the performance of the algorithms in the presence and absence of wavelength conversion, as observed earlier. For instance, when 16 channels fail and there are two tunable lasers per link, the restoration efficiency of RDA with wavelength conversion, for an average link load of 66%, is almost twice that obtained by
the RDA with no wavelength conversion. A similar trend is observed in Fig. 7(b) when the entire link is assumed to fail.
5.2.4. Comparison of link-level restoration algorithms
The performance of the proposed algorithms is compared to the broadcast-based restoration mechanism described in [12]. In the broadcast mechanism, when a link fails, the source node of the link propagates broadcast messages to all nodes in the network, in an attempt to discover a route to the destination of the failed link. The route taken by the first broadcast message to reach the destination is used for restoration. The comparison of the algorithms is shown in Fig. 8.
Fig. 8. Comparison of restoration efficiency of DPA and the Broadcast mechanism for (a) 0 and one tunable laser per link (b) one and four tunable lasers per link for the 15-node inter-connected ring network.
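The difference between a pre-computed restoration route and a route discovered by broadcast can be illustrated with a minimal sketch. It is not the mechanism of [12] or the RDA/DPA implementation: the five-node topology, the wavelength tables, and the function names `route_is_free` and `flood_restore` are all hypothetical, wavelength conversion is assumed absent, and every hop is assumed to have the same delay, so the first broadcast message to arrive follows a fewest-hop feasible route.

```python
from collections import deque


def route_is_free(route, wavelength, free):
    """True if `wavelength` is free on every link of `route` (no conversion)."""
    return all(wavelength in free[frozenset(hop)] for hop in zip(route, route[1:]))


def flood_restore(adj, free, src, dst, wavelength, failed_link):
    """Breadth-first flood from `src`; returns the first route that reaches `dst`.

    Assumes equal per-hop delay, so the first arrival is a fewest-hop route.
    """
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in adj[node]:
            link = frozenset((node, nxt))
            if nxt in seen or link == failed_link or wavelength not in free[link]:
                continue
            seen.add(nxt)
            queue.append(path + [nxt])
    return None  # no route carries this wavelength end to end


# Hypothetical 5-node topology; link A-B has failed.
adj = {"A": ["B", "C", "D"], "B": ["A", "C", "E"], "C": ["A", "B"],
       "D": ["A", "E"], "E": ["D", "B"]}
free = {  # free wavelengths per link (illustrative values)
    frozenset("AB"): set(),
    frozenset("AC"): {3},
    frozenset("CB"): {1},  # wavelength 3 is busy on this link
    frozenset("AD"): {3},
    frozenset("DE"): {3},
    frozenset("EB"): {3},
}
failed = frozenset("AB")
precomputed = ["A", "C", "B"]  # shortest detour, fixed in advance

precomputed_ok = route_is_free(precomputed, 3, free)
flooded = flood_restore(adj, free, "A", "B", 3, failed)
print(precomputed_ok, flooded)
```

In this toy case the session on wavelength 3 stays unrestored on the pre-computed two-hop detour, while the flood restores it over a three-hop route, which matches the qualitative trade-off described in the text: a higher chance of restoration at the price of a longer route.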
Fig. 8(a) shows the performance of the DPA algorithm for $TL = 0$ and $TL = 1$ and of the broadcast algorithm. It is observed that the DPA algorithm with $TL = 1$ has the highest restoration efficiency amongst the cases studied. The performance of the broadcast mechanism is slightly higher than that of DPA when no tunable lasers are available. The broadcast mechanism is more likely to find a restoration route than the RDA and DPA algorithms because the route is determined dynamically. On the other hand, RDA and DPA pick pre-computed shortest routes as the restoration routes. In RDA and DPA, if the session's wavelength is not available on the pre-computed restoration routes, the session remains unrestored. In the broadcast mechanism, the session remains unrestored only if a wavelength is not found on any of the possible routes. This implies that the broadcast mechanism may use a longer route for restoration than the RDA and DPA algorithms. Fig. 8(b) shows the performance comparison for the DPA algorithm for $TL = 1$ and $TL = 4$ and the broadcast algorithm. As expected, the restoration efficiency of the DPA algorithm is substantially higher when the number of tunable lasers is increased.
5.2.5. Restoration time calculation
The restoration time is defined as the time elapsed between failure detection and connection restoration. The restoration time for a given instance of failure is calculated based on the different cases as shown below. Let $T_L$ denote the time to restore channels on the same link, $T_R$ the time to restore channels by rerouting, and $T$ the total restoration time. For other notations, please refer to Table 2.
Case 1 ($N_{lf} \le TL + W - N_{llp}$): All the affected lightpaths are switched to their original wavelengths using the tunable lasers or to unused wavelengths on the same link, if wavelength conversion is present. If wavelength conversion is absent, the fixed lasers may not be used for restoration since the affected session then has to be switched on the original wavelength.
If adequate tunable lasers are available, then the restoration time is given by $T = T_L = t_l$. If wavelength conversion is necessary, then a control message is sent to the neighbor (followed by an acknowledgment) requesting switch configuration of the neighbor. In this case, the restoration time is given by $T = T_L = \max(t_l, 2t_c + t_s)$.
Case 2 ($N_{lf} > TL + W - N_{llp}$): As many lightpaths as possible are switched on the same link, and the remaining are re-routed on alternate routes. The restoration time for this case is calculated as follows:
$$T_R = 2 t_c H(R_{rda}) + t_s \left( H(R_{rda}) - 1 \right) \quad \text{for RDA} \tag{1}$$
$$T_R = \max_{i=1,2} \left[ 2 t_c H(R^{i}_{dpa}) + t_s \left( H(R^{i}_{dpa}) - 1 \right) \right] \quad \text{for DPA} \tag{2}$$
$$T = T_L + T_R \quad \text{(sequential execution of local and rerouting)} \tag{3}$$
$$T = \max(T_L, T_R) \quad \text{(parallel execution of local and rerouting)} \tag{4}$$

Table 4
Restoration times for networks, single link failure, no tunable lasers per link for RDA and DPA

Network and load                      Restoration time (ms)
                                      RDA     DPA     Broadcast
14-Node NSFNET, 54%                   4.47    5.01    7.63
15-Node interconnected ring, 38%      3.85    3.49    6.72
20-Node ARPANET, 25%                  4.34    4.78    8.19
50-Node, $d_{av} = 3.25$, 55%         3.95    3.82    8.95
All control messages are followed by an acknowledgment; this accounts for the factor 2 in the calculations.
Numerical analysis. The restoration times obtained by RDA and DPA with no tunable lasers per link and by the broadcast mechanism for link failures in different networks are shown in Table 4. The values of $t_c$ and $t_s$ are assumed to be 550 µs and 500 µs, respectively [5]. Each entry denotes the network topology and the average link load; both medium and low loads have been considered. The restoration time for an affected session is directly proportional to the number of hops in the restoration route. Since the broadcast mechanism may use a restoration route with a greater hop count than the RDA and the DPA algorithms, its restoration time is higher. For instance, in the 50-node network, the restoration times obtained by the RDA and DPA mechanisms are about 40% lower than that obtained by the broadcast mechanism. Table 5 shows the restoration time when there are two tunable lasers per link, with four channels failed per link. The restoration times obtained are seen to be very low using RDA and DPA. It is to be noted that these times depend upon the tuning latencies of the tunable lasers in use. Hence, to achieve millisecond and sub-millisecond restoration times, it is essential to use an appropriate low-latency tunable laser technology.
Table 5
Restoration times for networks, number of channels failed = 8, two tunable lasers per link for RDA and DPA

Network and load                      Restoration time (ms)
                                      RDA     DPA
14-Node NSFNET, 57%                   1.78    1.28
15-Node interconnected ring, 38%      1.69    1.20
20-Node ARPANET, 42%                  1.32    1.13
50-Node, $d_{av} = 3.25$, 55%         1.68    1.23
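The case analysis and Eqs. (1)-(4) of Section 5.2.5 can be written out as a short calculation. The sketch below is only an illustration under stated assumptions: times are in milliseconds with $t_c = 0.55$ and $t_s = 0.5$ (the scale consistent with the millisecond entries of Table 4), the tuning latency `T_LASER = 0.1` ms and the hop counts are hypothetical placeholders, `local_capacity` stands for the right-hand side of the Case 1 condition, and $T_L$ is taken as zero when nothing can be restored locally.

```python
def reroute_time(hops, t_c, t_s):
    """Eq. (1) for one route: 2*t_c*H + t_s*(H - 1)."""
    return 2 * t_c * hops + t_s * (hops - 1)


def restoration_time(n_failed, local_capacity, route_hops, t_l, t_c, t_s,
                     conversion_used=False, parallel=True):
    """Total restoration time T for one failure instance.

    route_hops holds one hop count for RDA or two for DPA (Eq. (2) takes the max).
    """
    # T_L: restoration on the same link (assumed 0 if no local capacity exists).
    if local_capacity == 0:
        t_local = 0.0
    elif conversion_used:
        t_local = max(t_l, 2 * t_c + t_s)
    else:
        t_local = t_l
    if n_failed <= local_capacity:  # Case 1: everything is restored locally
        return t_local
    # Case 2: the remaining lightpaths are rerouted.
    t_reroute = max(reroute_time(h, t_c, t_s) for h in route_hops)
    return max(t_local, t_reroute) if parallel else t_local + t_reroute  # Eq. (4) / Eq. (3)


T_C, T_S = 0.55, 0.5  # ms, assumed scale
T_LASER = 0.1         # ms, hypothetical tuning latency

rda = restoration_time(16, 0, [3], T_LASER, T_C, T_S)       # about 4.3 ms
dpa = restoration_time(16, 0, [2, 4], T_LASER, T_C, T_S)    # about 5.9 ms
local = restoration_time(4, 8, [3], T_LASER, T_C, T_S)      # tuning latency only
seq = restoration_time(8, 2, [3], T_LASER, T_C, T_S, parallel=False)
print(rda, dpa, local, seq)
```

With these assumed values a three-hop reroute costs roughly 4.3 ms, which is of the same order as the RDA and DPA entries in Table 4, whereas a failure that is absorbed entirely by the tunable lasers costs only the tuning latency.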
Fig. 9. Restoration efficiency for a varying number of tunable lasers shared per link and shared per node, using the Disjoint Paths Algorithm for (a) NSFNET with $L_{av} = 54\%$, number of channels failed = 8 and (b) NSFNET with $L_{av} = 76\%$, entire link failed, in the case of simultaneous failure on two links incident on a common node.
5.3. Failures on two links
This section presents an analysis of the performance of the restoration algorithms when simultaneous failures occur on two links incident on a common node. This is similar to the SRLGs studied in [27]. The restoration efficiency obtained with the DPA algorithm is studied for two node architectures, with the tunable lasers in share-per-node and share-per-link configurations.
5.3.1. Comparison of share-per-node and share-per-link architectures
Fig. 9 shows the performance comparison of the share-per-link and share-per-node configurations of the tunable lasers. As expected, the share-per-link configuration gives better performance than the share-per-node configuration. It is noted that for channel failures, as the number of tunable lasers increases, the restoration efficiencies using both architectures converge. However, this is not the case for link
Fig. 10. Comparison of DPA algorithm for failure on a single link and dual failure for nodes with tunable lasers shared per link for ARPANET with (a) number of failed channels = 8 and (b) the entire link failed.
failures. When channels fail, restoration may be achieved by switching failed connections on the same link. When only a few channels fail, the two architectures perform identically once the number of tunable lasers is large enough. For instance, when a maximum of four channels fail on each link, and they are switched on the same link, having eight tunable lasers shared-per-node and shared-per-link will result in identical performance. It may thus be inferred that for a small number of channel failures, it is possible to achieve high restoration levels at a lower cost. However, when the entire link fails, all affected sessions have to be rerouted around the failed link. Since many more sessions are affected, there is a significant difference between having the tunable lasers shared-per-link and shared-per-node.
5.3.2. Comparison with performance of algorithms for failure on single link
The restoration efficiency comparison for failures on a single link and on two links is shown in Fig. 10. A significant difference is observed in the performance when a single link fails and when two links fail (Fig. 10(b)). This is due to the fact that the spare capacity is insufficient to restore all the failed sessions when two links fail. However, when only a few channels fail, the restoration efficiencies for recovering from failure on a single link and on two links converge quickly as the number of tunable lasers per link increases. This implies that, in the case of channel failures, a low spare capacity is sufficient to recover from failures both on a single link and on two links.
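The four-channel example of Section 5.3.1 can be checked with a simple count. The sketch below is a simplification, not the switch model of the paper: it assumes that share-per-link gives each link its own set of `lasers` tunable lasers, that share-per-node gives one pool of `lasers` to all links of the node, and that any tunable laser can take over any failed channel. The function name `locally_restorable` is hypothetical.

```python
def locally_restorable(failed_per_link, lasers, shared_per_node):
    """Number of failed channels that tunable lasers can take over locally."""
    if shared_per_node:
        # One pool of lasers serves every failed link at the node.
        return min(sum(failed_per_link), lasers)
    # Each link has its own lasers.
    return sum(min(failed, lasers) for failed in failed_per_link)


# Four channels fail on each of two links incident on a common node.
failures = [4, 4]
for lasers in (2, 4, 8):
    per_link = locally_restorable(failures, lasers, shared_per_node=False)
    per_node = locally_restorable(failures, lasers, shared_per_node=True)
    print(lasers, per_link, per_node)
```

Under these assumptions the two configurations differ for two and four lasers and coincide at eight, where both cover all eight failed channels, in line with the convergence reported for channel failures.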
6. Conclusions
This paper proposed a restoration mechanism to handle channel and link failures on a single link and on two links. To achieve recovery from channel failures, a local restoration algorithm, which attempts to switch the affected sessions on the same link using tunable lasers, is initiated. If local restoration is not possible due to insufficient spare capacity at the switch, a link-level restoration scheme is adopted. Two link-level restoration algorithms, the Redirection Algorithm (RDA) and the Disjoint Paths Algorithm (DPA), are proposed. A detailed study of the performance of the algorithms in terms of the restoration efficiency and restoration time was presented. The results show that for failures on a single link, as the spare capacity (i.e. the number of tunable lasers) increases, the restoration efficiency also increases. It is seen that as few as four tunable lasers per link result in a high restoration efficiency for moderately high loads. The restoration procedure results in much lower restoration times, due to the low tuning latency of the tunable lasers. A comparison of the link-level restoration algorithms with the mechanism described in [12] showed that the DPA
algorithm achieves the highest restoration efficiency. It is also seen that about 40% lower restoration times are obtained by RDA and DPA, as compared to the mechanism in [12]. For simultaneous failures on two links, it is seen that the share-per-link configuration of the tunable lasers results in better restoration as compared to the share-per-node configuration. It is also observed that for channel failures under moderate loads, a low spare capacity is sufficient to recover from failures both on a single link and on two links.
Acknowledgements A part of the research was conducted while the first author was on a summer internship at Intel Corporation.
References
[1] P. Green, Progress in optical networking, IEEE Communications Magazine 39 (1) (2001) 54–61.
[2] K. Sivalingam, S. Subramaniam (Eds.), Emerging Optical Network Technologies, Springer, Boston, MA, 2004.
[3] O. Gerstel, R. Ramaswami, Optical layer survivability: an implementation perspective, IEEE Journal on Selected Areas in Communications 18 (10) (2000) 1885–1923.
[4] M. Sridharan, A. Somani, Revenue maximization in survivable WDM networks, in: Proceedings of the SPIE OPTICOMM, Dallas, TX, 2000, pp. 291–302.
[5] H. Zang, B. Mukherjee, Connection management for survivable wavelength-routed WDM mesh networks, Optical Networks Magazine 2 (4) (2001) 17–28.
[6] A. Neukermans, R. Ramaswami, MEMS technology for optical networking applications, IEEE Communications Magazine 39 (1) (2001) 62–69.
[7] C.J. Chang-Hasnain, Tunable VCSEL, IEEE Journal on Selected Topics in Quantum Electronics 6 (6) (2000) 978–987.
[8] R. O'Dowd, Tunable and agile laser transmitter developments for future DWDM optical networks: towards managed wavelength control and switching, Photonic Network Communications 2 (1) (2000) 97–103.
[9] New Focus TLS 420C: Using Tunable Lasers to Improve Network Efficiency and Meet Customer Requirements, White Paper, 2001 (http://www.newfocus.com).
[10] G. Sarlet, J. Wesstrom, P. Rigole, B. Broberg, Widely tunable edge emitters, White Paper (http://www.adc.com).
[11] G.A. Fish, Monolithic, widely-tunable, DBR lasers, in: Proceedings of the Optical Fiber Communications (OFC), Anaheim, CA, 2001.
[12] S. Ramamurthy, B. Mukherjee, Survivable WDM mesh networks, Part II: restoration, in: Proceedings of the International Conference on Communications (ICC), Vancouver, Canada, 1999, pp. 2023–2030.
[13] M. Sivakumar, R. Shenai, K.M. Sivalingam, Protection and restoration for optical WDM networks: a survey, in: K. Sivalingam, S. Subramaniam (Eds.), Emerging Optical Network Technologies, Springer, Berlin, 2004.
[14] S. Ramamurthy, B. Mukherjee, Survivable WDM mesh networks, Part I: protection, in: Proceedings of the IEEE INFOCOM, vol. 2, New York, NY, 1999, pp. 744–751.
[15] B. Doshi, S. Dravida, P. Harshavardhana, O. Hauser, Y. Wang, Optical network design and restoration, Bell Labs Technical Journal (1999) 58–83.
[16] G. Li, J. Yates, R. Doverspike, D. Wang, Experiments in fast restoration using GMPLS in optical/electronic mesh networks, in: Proceedings of the Optical Fiber Communications (OFC), Anaheim, CA, 2001.
[17] R. Shenai, M. Venkatachalam, C. Maciocco, K.M. Sivalingam, Threshold based selective survivability for optical WDM mesh networks, in: Proceedings of the First International Conference on Broadband Networks, Optical Networking Symposium, San Jose, CA, 2004.
[18] R. Shenai, C. Maciocco, M. Mishra, K.M. Sivalingam, Threshold based selective link restoration for optical WDM mesh networks, in: Proceedings of the International Workshop on the Design of Reliable Communication Networks (DRCN), Banff, Canada, 2003.
[19] D. Zhou, S. Subramaniam, Survivability in optical networks, IEEE Network Magazine 14 (6) (2000) 16–23.
[20] D. Stamatelakis, W.D. Grover, IP layer restoration and network planning based on virtual protection cycles, IEEE Journal on Selected Areas in Communications 18 (10) (2000) 1938–1949.
[21] G. Ellinas, A.G. Hailemariam, T.E. Stern, Protection cycles in mesh WDM networks, IEEE Journal on Selected Areas in Communications 18 (10) (2000) 1924–1937.
[22] O. Gerstel, G. Sasaki, Quality of protection (QoP): a quantitative unifying paradigm to protection service grades, in: Proceedings of the SPIE OPTICOMM, Denver, CO, 2001, pp. 12–23.
[23] C. Vijayasaradhi, C.S.R. Murthy, Routing differentiated reliable connections in single and multi-fiber WDM optical networks, in: Proceedings of the SPIE OPTICOMM, Denver, CO, 2001, pp. 24–35.
[24] H. Choi, S. Subramaniam, H.-A. Choi, On double-link failure recovery in WDM optical networks, in: Proceedings of the IEEE INFOCOM, New York, NY, 2002.
[25] W. He, M. Sridharan, A.K. Somani, Capacity optimization for surviving double-link failures in mesh-restorable optical networks, in: Proceedings of the SPIE OPTICOMM, Boston, MA, 2002.
[26] M. Sivakumar, C. Maciocco, M. Mishra, K.M. Sivalingam, A hybrid protection–restoration mechanism for enhancing dual-failure restorability in optical mesh-restorable networks, in: Proceedings of the SPIE OPTICOMM, Dallas, TX, 2003.
[27] J. Doucette, W.D. Grover, Capacity design studies of span-restorable mesh networks with shared-risk link group (SRLG) effects, in: Proceedings of the SPIE OPTICOMM, Boston, MA, 2002.
[28] H. Krishnamurthy, K.M. Sivalingam, M. Mishra, Restoration mechanisms based on tunable lasers for handling channel and link failures in optical WDM networks, in: Proceedings of the SPIE OPTICOMM, Boston, MA, 2002.
[29] R. Bhandari, Survivable Networks: Algorithms for Diverse Routing, Kluwer Academic Publishers, Dordrecht, 1998.
[30] J.R. Jump, YACSIM Reference Manual, Rice University, Department of Electrical and Computer Engineering, first ed., 1992.
Harini Krishnamurthy is presently with Amazon.com in Seattle, WA. She received her M.S. degree in Computer Science from Washington State University, Pullman, WA, and her M.C.A. degree from Anna University in Madras, India.
Krishna M. Sivalingam is an Associate Professor in the Dept. of CSEE at University of Maryland, Baltimore County. Previously, he was with the School of EECS at Washington State University, Pullman from 1997 until 2002; and with the University of North Carolina Greensboro from 1994 until 1997. He has also conducted research at Lucent Technologies' Bell Labs in Murray Hill, NJ, and at AT&T Labs in Whippany, NJ. He received his Ph.D. and M.S. degrees in Computer Science from State University of New York at Buffalo in 1994 and 1990 respectively; and his B.E. degree in Computer Science and Engineering in 1988 from Anna University, Chennai (Madras), India. While at SUNY Buffalo, he was a Presidential Fellow from 1988 to 1991. His research interests include wireless networks, optical wavelength division multiplexed networks, and performance evaluation. He holds three patents in wireless networks and has published several research articles including more than thirty journal publications. He has published an edited book on Wireless Sensor Networks in 2004 and edited books on optical WDM networks in 2000 and 2004. He served as a Guest Co-Editor for special issues of the ACM MONET journal on "Wireless Sensor Networks" in 2003 and 2004; and an issue of the IEEE Journal on Selected Areas in Communications on optical WDM networks (2000). He is co-recipient of the Best Paper Award at the IEEE International Conference on Networks 2000 held in Singapore. His work has been supported by several sources including AFOSR, NSF, Cisco, Intel and Laboratory for Telecommunication Sciences. He is a member of the Editorial Board for ACM Wireless Networks Journal, IEEE Transactions on Mobile Computing, Ad Hoc and Sensor Wireless Networks Journal, and KICS Journal of Computer Networks. He serves as Steering Committee Co-Chair for the International Conference on Broadband Networks (BroadNets) that was created in 2004.
He is currently serving as General Co-Vice-Chair for the Second Annual International Mobiquitous conference to be held in San Diego in 2005 and as General Co-Chair for the First International Conference on Security and Privacy for Emerging Areas in Communication Networks to be held in Athens, Greece in Sep. 2005. He served as Technical Program Co-Chair for the First IEEE Conference on Sensor and Ad Hoc Communications and Networks (SECON) held at Santa Clara, CA in 2004; as General Co-Chair for SPIE Opticomm 2003 (Dallas, TX) and for ACM Intl. Workshop on Wireless Sensor Networks and Applications (WSNA) 2003 held in conjunction with ACM MobiCom 2003 at San Diego, CA; as Technical Program Co-Chair of SPIE/IEEE/ACM OptiComm conference at Boston, MA in July 2002; and as Workshop Co-Chair for WSNA 2002 held in conjunction with ACM MobiCom 2002 at Atlanta, GA in Sep 2002. He is a Senior Member of IEEE and a member of ACM.
Manav Mishra received his M.S. degree in Electrical Engineering from Washington State University, Pullman in 1999, and his B.E. degree in Electrical and Electronics Engineering from College of Engineering, Guindy, Anna University, Madras, India in 1997. He is presently with Microsoft Research in Redmond, WA, USA. From 1999 until 2003, he was employed by Intel Corporation, Hillsboro, Oregon, USA. His research interests include all-optical WDM networks, wireless communications and networking and performance analysis. He is co-recipient of the Best Paper award at the IEEE International Conference on Networks held at Singapore in September 2000.