Load sharing in Call Server clusters


Computer Communications 30 (2007) 3027–3045 www.elsevier.com/locate/comcom

Muhammad Asif a, Shikharesh Majumdar a,*, Gerald Kopec b

a Department of Systems and Computer Engineering, Carleton University, 1125 Colonel By Drive, Ottawa, ON, Canada K1S 5B6
b Nortel, Ottawa, ON, Canada K2H 8E9

Available online 27 June 2007

Abstract

A Call Server in a telephone switch is responsible for connecting the calling and called subscribers. A cluster of Call Servers that serves the subscribers in a large geographical area can provide geographic redundancy, scalability, reduced operational cost, fault tolerance and load sharing. This paper focuses on load sharing in Call Server clusters, particularly on strategies for redirecting new calls from one Call Server to another. It investigates the relative performance of static and adaptive approaches for effectively handling the load spikes occurring on the system. A detailed simulation model is developed using the OPNET tool. The model is used to study the performance of Call Server clusters under different load conditions. A comparison of the performances of different redirection policies is presented. The impact of different load sharing thresholds on performance is analyzed, and various insights gained into system behavior and performance are described.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Load sharing; Redirection policy; Call Server; Call processing; Call Server clusters

1. Introduction

With the increasing demand for telecommunication services, both wireline and wireless, voice and data call switches have to provide a large call handling capacity. The Internet and mobile devices have contributed to the increased demand for switching capacity of telecommunication servers. Because of these demanding applications, the capacities of existing Call Servers are being pushed to their limits. In order to handle the increased capacity demand, provide the required fault tolerance, and minimize the operational costs for phone companies, Nortel and Carleton University have started to investigate Call Server clustering. Such clusters must be able to work with both circuit-switched networks and Voice over IP networks. By avoiding a single point of failure, a Call Server cluster can also address the security challenges that have characterized the beginning of this millennium. By providing geographic redundancy, a Call Server failure due to a


terrorist attack or a natural calamity can be handled effectively by the other Call Servers in the cluster, which can share the workload of the failed Call Server. In the context of this paper, a Call Server (CS) is an element consisting of a call control engine that performs call processing. A Call Server cluster is a group of multiple individual Call Servers covering a designated geographical area. As discussed, Call Server clustering provides the advantages of both scalability and reliability. Load sharing is an important aspect of Call Server clusters. One of the Call Servers in the cluster may be overloaded while other Call Servers may be operating below their designated load levels. In such an unbalanced situation, it is possible to distribute the additional traffic of the overloaded Call Server across the other Call Servers in the cluster. This level of load sharing prevents a single Call Server from becoming the performance bottleneck. This paper focuses on load sharing in Call Server clusters for achieving high system performance. Normally, a new call is processed by the Call Server that contains the subscriber data associated with the initiator of the call. In case of an overload, however, a number of calls waiting for service at a Call Server are transferred to a lightly loaded Call

Server. Load sharing systems described in the literature are composed of a number of components that are briefly discussed here. The transfer policy is used to decide when to start transferring load from a node; it determines whether a node is overloaded or underloaded. Sharing of load information among nodes is guided by an information policy. The location policy is used to determine from where or to which location a task is to be transferred. A great deal of work on these components exists in the literature (see [1,2] for example). In most of these previous works, however, load sharing is done by transferring one task at a time. In case of a large number of waiting tasks on an overloaded node, transferring one task at a time is costly and may not be able to take load off the overloaded node in a timely manner. This ability to perform a timely load transfer is particularly important in the context of a soft real-time system such as a CS cluster. In such situations, it is very important to transfer multiple tasks at a time to speed up load sharing and to amortize the cost of transfers over multiple tasks. However, little work has been done on determining the number of tasks to be transferred at a time such that effective load sharing is achieved. The load sharing strategies presented in this paper focus on investigating different techniques for calculating the proportion of new incoming calls that should be redirected to another Call Server to achieve high performance. The policy used to determine the number of calls to be redirected from an overloaded node to another lightly loaded node is termed the redirection policy. To the best of our knowledge, this is one of the first works that investigate the performance of redirection policies in the context of a telecommunication system. The main contributions of this paper are summarized next.

• The paper demonstrates that the redirection policy used has a strong impact on system performance.
• A number of static and dynamic redirection policies are proposed.
• Based on a simulation model, a detailed investigation of the performance of these redirection policies under various combinations of workload parameters is performed.
• The impact of various system parameters on performance is analyzed. For example, the effect of the threshold values used in the transfer and the location policies on performance is presented.
• A number of insights into system behavior and performance under different overload scenarios are described.

This research concerns Call Server clusters that span a large geographic area such as a city. Each CS in the cluster is responsible for the processing of calls related to voice and data traffic initiated by a subscriber. After processing a call, the switching network is used to connect the calling and called subscribers. Since the processing of calls on a CS can cause a high CPU utilization, leading to intermittent overloads on individual Call Servers, we focus on

investigating load sharing techniques on Call Server clusters. The results of this research will be useful primarily to telecommunication equipment manufacturers and telecommunication service providers. The insights gained from this research are also of interest in the context of other cluster systems that process a transaction-oriented workload.

This paper is organized as follows. Section 2 discusses the related work on different components of load sharing. Section 3 describes the concept of Call Server clustering. Section 4 discusses the proposed redirection policies for load sharing in Call Servers. Section 5 gives an overview of the simulation model for the Call Server cluster system. Section 6 describes the experimental setup and the results. Finally, conclusions and future work are presented in Section 7.

2. Related work

Load sharing in distributed systems has been investigated for a number of years and a significant body of knowledge exists in the area. The novelty of this paper lies in adapting these techniques to the domain of Call Server clusters and in investigating the redirection policies (introduced later in this section), on which little work seems to have been done. A representative set of papers on load sharing in distributed systems is discussed in this section. Load sharing techniques used in distributed systems are generally classified into two main categories: static and dynamic. Static techniques use predefined formulas or numbers for decision making, without involving any run-time state information of the system. A simple example of static load sharing is the distribution of load by a single broker using a round-robin scheme. Static load sharing is observed to be effective if there is only one centralized broker or task dispatcher and tasks are of the same size (equal processing time). This approach is not considered very useful for distributed load sharing [1,3]. Dynamic load sharing techniques use system state information on the fly for decision making. The decision making is performed at different levels [3]. Dynamic load sharing techniques are also referred to as adaptive load sharing techniques in the literature [1,2]. In adaptive load sharing techniques, the performance of the system can be used to adjust the degree of load sharing at run time. Load sharing has been investigated in the context of various environments. Chang et al. [4] have proposed a two-stage load balanced switch to scale up to the speed of fiber optics. They have shown that load balancing is more effective than other approaches such as i-SLIP, a conflict resolution algorithm based on iterative round-robin matching with slip. In [5], it is shown that for Multi-Protocol Label Switching (MPLS) traffic, dynamic load sharing algorithms perform better than the shortest path algorithm. Long et al. [5] have also observed that dynamic load balancing algorithms are effective for both light and heavy load conditions.
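As a simple illustration of the static round-robin scheme mentioned earlier in this section, the sketch below dispatches tasks to servers in strict rotation. It is our own illustration; the class and variable names do not correspond to any of the cited systems.

```python
from itertools import cycle

class RoundRobinDispatcher:
    """Static load sharing: a single broker hands tasks to servers in turn,
    ignoring any run-time load information."""

    def __init__(self, servers):
        self._next_server = cycle(servers)

    def dispatch(self, task):
        server = next(self._next_server)
        server.append(task)      # each 'server' is modeled here as a task list
        return server

# Example: three servers receive tasks 0..8 in strict rotation.
servers = [[], [], []]
dispatcher = RoundRobinDispatcher(servers)
for task in range(9):
    dispatcher.dispatch(task)
print(servers)   # [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```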


A request redirecting algorithm for web server clusters is investigated in [6]. The redirecting algorithm classifies each new request on the basis of its expected impact on server resources and each request is assigned a weight that is used to compute the server load. In order to reduce the load information collection time, only two servers from the server pools are selected and the new request is forwarded to the least loaded server. Xiao et al. [7] have proposed load sharing strategies which are not only based on the CPU resources but also consider memory resources. The objective of the proposed load sharing strategies is to minimize both the CPU idle times and the number of page faults in heterogeneous distributed systems. Cao et al. [8] have used mobile agents to achieve load sharing in a wide area network environment such as the Internet. Mobile agents act as coordinators on behalf of the servers and present themselves at remote sites to coordinate the job relocation activities. As indicated in Section 1, the load sharing policies available in the literature generally consist of four components [9]. These are: the transfer policy, the selection policy, the location policy and the information policy. In [10,11], different transfer policies are discussed in detail. Two simple and effective policies, in addition to round robin and random, are based on a fixed threshold and an adaptive threshold. The former policy compares the current load information of the local node to a predefined fixed threshold. If the current load goes above the fixed threshold, it attempts to transfer some of its load to another node. The dynamic or adaptive threshold based transfer policy first determines a threshold based on the load information of all the nodes in the system. Then this threshold is used in the decision making by the transfer policy. If the current load of a node is higher than this ‘evaluated’ threshold, then it starts transferring load to the lightly loaded node. The dynamic approach has an overhead of calculating the dynamic threshold. It is also observed that the simple fixed threshold policy gives better results than adaptive threshold based policy [11]. A fixed threshold based policy is used for load sharing in the Call Server cluster discussed in this paper. The location policy determines from where or to where a task is to be transferred. In the literature, location policies are broadly categorized as sender-initiated, receiver-initiated and symmetrically-initiated [12,13]. In the sender-initiated policies the potential senders of tasks search for receivers where the load can be transferred while in the receiver-initiated policy [13], potential receivers of tasks search for senders. In a symmetrically-initiated policy, both senders and receivers search for complementary nodes. For this research, a sender-initiated policy is adopted because of the nature of the redirection policies that are explained in Section 4. The key aspect of the sender-initiated location policy is the criteria used for selecting a receiver. The first step is to look for potential receivers in the system. A threshold based approach is more common for selecting a potential receiver [14]. If the load on the node is less than


a certain threshold, then that node is considered a potential receiver. The next step is to choose a single or multiple receivers from the list of potential receivers. The very important issue here is that multiple senders can choose one receiver at the same time, and start transferring tasks to it simultaneously. This will result in overloading of the receiver. This problem can be minimized by probing the receiver before transferring the load to it. Lee et al. [15] have proposed a prediction based adaptive location policy to handle this problem. They assumed all the nodes are connected in the form of a ring, and the sender will look from the list of potential receivers which is ahead of it in the ring. In [16], Genova et al. suggested to select k receivers from the list of potential receivers randomly and then select the least loaded receiver from the list of k receivers. Selection policy determines which task is to be transferred for remote execution. Both preemptive and non-preemptive policies are possible [9]. The scope of this paper is limited to non-preemptive tasks, because only new incoming calls can be considered for transfer in Call Server clusters. Once processing of a call starts then it is not efficient to migrate the processing to another Call Server. Most of the work in the literature [17,18] for selecting tasks to be transferred is based on criteria that use different task characteristics. These characteristics include memory requirement, frequency of I/O access and task execution times [18]. For achieving high efficiency, a Call Server typically processes a batch of new call arrival events periodically [19]. New call messages arrive and are queued at the Call Server that periodically processes the new calls. As a result, if a Call Server is overloaded, there are multiple calls that can be potentially off loaded to other Call Servers. So the selection policy in case of Call Server clusters is entirely focused on determining what proportion of the new calls is to be processed locally and what proportion is to be transferred to a remote Call Server. The selection policy is thus referred to as a redirection policy in this paper. This policy is used to determine the number of new incoming calls currently waiting to be processed at a Call Server that should be redirected to another Call Server to distribute the load within the cluster. Sit et al. [1] have proposed adaptive techniques based on data mining to determine the number of tasks to be migrated to different nodes in a distributed system. But they used a central task dispatcher to redirect new tasks to different nodes (servers) and a central task balancer to migrate tasks between different servers to balance the load. On such a system, the central task dispatcher is a single point of failure and can become a performance bottleneck. Al-Amri et al. [20] introduced the idea of multiple task transfers with other components on a load shared system. Selection of multiple tasks is based on their dependencies on other nodes in the system. A number of redirection algorithms were explored in [21] for web server traffic, but all these algorithms assumed a central load balancer. Information policy concerns how and when to share load information among different nodes of the system [9]. There are three different types of information policies.


The first type is the periodic information policy, in which either every node transfers its load information to other nodes after a fixed period of time or every node probes other nodes in the system on a periodic basis to collect the load information. In the second type called the demand driven information policy, the load information is only fetched from other nodes when it is required. In the third type called the state change-based information policy, the load information is only broadcast to other nodes if the load on a node changes by a certain degree. A state change-based policy is used in this research. 3. Call Server cluster Majumdar et al. have introduced the concept of Call Server clusters in [22]. A telecommunication switch is a system that provides connections between two or more subscribers for the interchange of voice and data. A Call Server and the switching network are two important components of a telecommunication switch (see Fig. 1). The CS performs call processing, as well as various activities relevant to billing and maintenance whereas the switching network is responsible for providing a path for the connection of two subscribers. With a Call Server cluster, a pool of Call Servers is connected to the switching network that provides the switching fabric required for connecting the subscribers. The servers in the cluster inter-communicate by a dedicated interconnection network. Call Server clustering can be performed at various levels. It can include CS blades mounted on a single frame, multiple frames of Call Servers in one site as well as geographically distributed sites each of which contains one or more Call Servers (see Fig. 1). A given Call Server cluster is associated with a number of subscribers. Connection between subscribers that belong to separate CS clusters uses trunks that interconnect CS clusters. Once a subscriber goes offhook, the call processing functionality of the CS performs various activities required for setting up the call between

two subscribers. Once the identity of both the calling and the called subscribers are known, the CS instructs the appropriate gateway controllers to set up the path between the subscribers using the switching network. Using a Call Server cluster-based approach has a number of advantages over the conventional independent switch-based telecommunication network. These include the following. • Single logical switch view and the reduction of operations and management (OAM) costs: A Call Server clusterbased system includes multiple Call Servers that function as a single logical system. Such a cluster can be used to serve a larger geographic area than that currently handled by a single conventional CS. This significantly reduces the OAM cost which is of great interest to the telecommunications service provider. • Scalability: A CS cluster may start with a small number of Call Servers and additional Call Servers can be incrementally added as the demand on capacity grows. • Fault tolerance (including geographic redundancy): Since any call can be processed by any CS in the cluster, the lines handled by a particular CS can be distributed among other Call Servers in the cluster when a Call Server fails. If the Call Servers are located sufficiently far from one another, the CS cluster can gracefully handle a Call Server failure due to a natural calamity or a terrorist attack. • Load sharing: Current telecommunication networks deploying multiple independent Call Servers start rejecting call attempts when the current call volume at a given CS becomes too high to meet the grade of service requirements or when the utilization of a key system resource exceeds a predetermined threshold. In a CS cluster, instead of rejecting a call, it can be steered to another lightly loaded Call Server that still has the spare capacity to switch additional calls. This paper focuses on this load sharing issue that is discussed in the next section.

Fig. 1. A Call Server cluster spanning multiple geographic sites.


Clustering of Call Servers is relatively a new concept. There are a number of challenges that have to be addressed for clustering of Call Servers. These include the selection of a Call Server for call processing, inter-connection topologies to be used for connecting the Call Servers within a cluster, load sharing among Call Servers as well as billing and maintenance. In a Call Server cluster, any Call Server can be selected to process a call, so the first action on receiving a new call is to select an appropriate Call Server that will process that call. A number of Call Server selection strategies that are based on the characteristics of the call is presented [22]. The Call Server, which has the ownership of the state data of the originator of the call, is selected for processing the new call. The state data of a subscriber is the one which changes from idle to busy when the subscriber participates in a call. Once a call arrives, the state data of the calling server needs to be accessed. Since remote data access is more expensive than local, a class of techniques in which the Call Server that stores the state data of the calling subscriber is selected for call switching [22]. After the dial digits are collected, the identity of the called subscriber is known and its state data is required to complete the call processing. Three techniques for called subscriber data access, remote, replication and migration are experimented with [22]. The three strategies behave the same if the state data of the called subscriber is local to the Call Server processing the call. Differences in the strategies occur when the state data of the called subscriber is available at a remote Call Server. In case of remote, each called subscriber state data access is performed remotely. In case of replication, the called subscriber state data is copied at the Call Server switching (processing) the call. Each subsequent read operation is performed locally while messages to the remote Call Server are sent to maintain data consistency when a write operation is performed. The called subscriber state data is moved to the Call Server switching (processing) the call when the migration strategy is used. After the call is completed, the state data is moved back to the Call Server originally storing the data. The relative performances of the policies depend on the number of data accesses per call and the Read/Write ratio. The remote policy has been used to access calling and called subscriber state data in this research. The data management policy is not expected to have a strong impact on the relative performances of load sharing techniques that this paper focuses on [23]. Cisco has also introduced clustering of CallManagers for IP phones [24]. To the best of our knowledge, this is the only other clustering work existing in the domain of telecommunication Call Servers. The Cisco CallManager is the software-based call processing component of the Cisco enterprise IP telephony solution. The capacity of a single CallManager based server is only 7500 IP phones. Cisco CallManager clustering can yield scalability up to 30,000 IP phones per cluster. The subscribers are statically assigned to Call Managers that perform call switching.


Although the system provides fault tolerance, the dynamic sharing of load addressed in this paper is not handled by the system. In other commercial non-clustered CS systems, a call is rejected when an overload occurs. The research presented in this paper focuses on redirecting a call from an overloaded CS to an underloaded CS. The CS clusters addressed in this paper are expected to be scalable to hundreds of thousands of subscribers and to perform intelligent dynamic Call Server selection as well as load sharing at run time while significantly reducing the operation and maintenance cost.

4. Design of redirection policies

Before discussing redirection policies, a brief overview of the load index used in the load sharing of Call Servers is presented.

4.1. Load index

A candidate load index should be easy to compute and should also be related to the main performance objective of the system, such as the system response time. Response times of calls are usually measured as the origination delay. Origination delay is the time a new call message waits before it starts receiving service from the call processing system. In case of Call Servers, the 95th percentile of the origination delay (95th POD) is found to be an appropriate load index. Use of the 95th percentile value is also based on the fact that most performance specifications of telephone switches are expressed as 95th percentiles [25]. Further description of the process of selecting the load index is beyond the scope of this paper. A detailed discussion is provided in [23].

4.2. Redirection policies

The process of load sharing carried out by a Call Server is based on the flow chart presented in Fig. 2. In Fig. 2, Nt is the total number of available calls on a Call Server that can be redirected to a remote receiver Call Server, and d is the redirection factor, that is, the proportion of the total number of new calls to be sent to the remote CS. Thus, the number of new calls sent to the remote Call Server is given by

n = d * Nt

The main features of this flow chart are described next. (1) Every time a new incoming call arrives, the Call Server decides whether the current CS is a sender node or not (transfer policy). (2) If the current CS is a sender node, it looks for an appropriate receiver CS to which it can transfer the load of new incoming calls (location policy). (3) After locating a receiver CS successfully, the sender CS applies the redirection policy to determine the values of d and n. (4) n of the calls are transferred to the remote receiver CS while the rest of the calls (Nt - n) are processed at the local CS. (5) If the local CS is not a sender node or there is no available receiver CS, all the incoming calls are processed at the local CS.


Fig. 2. A high level algorithm for load sharing.
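The decision flow of Fig. 2 can be sketched as follows. This is a minimal illustration rather than the OPNET model used in the paper; the 95th-percentile helper, the threshold defaults (taken from Table 2) and all function names are our own choices.

```python
import math

def load_index(origination_delays_ms):
    """Load index: 95th percentile of the origination delay (Section 4.1)."""
    delays = sorted(origination_delays_ms)
    rank = max(0, math.ceil(0.95 * len(delays)) - 1)
    return delays[rank]

def share_load(new_calls, local_delays_ms, receiver_indices,
               transfer_threshold=175.0, location_threshold=105.0, d=0.15):
    """Return (calls processed locally, calls redirected, chosen receiver CS)."""
    # Transfer policy: am I a sender CS?
    if load_index(local_delays_ms) <= transfer_threshold:
        return new_calls, [], None            # process all Nt calls locally

    # Location policy: is a receiver CS available?
    candidates = [cs for cs, li in receiver_indices.items()
                  if li < location_threshold]
    if not candidates:
        return new_calls, [], None            # no receiver, keep everything

    receiver = candidates[0]                  # receiver choice simplified here

    # Redirection policy: n = d * Nt.  The static policy keeps d fixed;
    # the adaptive policies recompute d from sender and/or receiver load.
    n = round(d * len(new_calls))
    return new_calls[n:], new_calls[:n], receiver
```

With the static policy of Section 4.2.1 the redirection factor d stays fixed at 0.15, while the adaptive policies described below replace the last step with load-dependent calculations of d and n.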

In this paper, five redirection policies are investigated: one static and four adaptive policies, described in the following subsections. The four adaptive policies differ in the way the number of redirected calls n is calculated using the load information of the sender Call Server and/or the receiver Call Server. Different degrees of knowledge of system state are used in these policies. Adaptive #1 and Adaptive #3 use load information of the sender CS and the receiver CS, respectively. Adaptive #2 and Adaptive #4 use load information of both the sender CS and the receiver CS but deploy different algorithms for using the information. In case of Adaptive #2, the maximum number of incoming calls that can be redirected to a remote CS is fixed, whereas it is a variable parameter for Adaptive #4. The different policies are investigated to gain insight into the degree of knowledge of system state that is required to attain high system performance.

4.2.1. Static policy

This policy transfers a fixed proportion of the total incoming calls waiting to be processed by the call processing module of a Call Server to a receiver Call Server. The proportion of calls transferred is termed the redirection factor (d). This policy is static in the sense that it does not use any load information of either the sender or the receiver Call Server in determining d, which is held at a fixed value. A value of 0.15 is used for d in most of the experiments described in this paper; this value is chosen based on the experimental analysis described in Section 6.3.1.

4.2.2. Adaptive #1

This policy considers the load status of the sender Call Server to calculate the redirection factor (d). The challenging part of this policy is to determine the redirection factor when Call Servers are operating beyond their engineered capacity (E). The engineered capacity of a Call Server is defined as the load intensity that can be handled by a Call Server with a 95th POD less than a certain specified value (140 ms for our CS simulation model). We are informed by Nortel experts that such an engineered capacity is appropriate for the nature of the research presented in this paper. The 95th POD of a Call Server operating at E will be referred to as TE in this paper. For a CS operating above E, assuming a simple linear relationship between d and the load of the sender CS is not effective. The example graph shown in Fig. 3a shows the relationship between the load intensity and the 95th POD of the Call Server. The non-linear part of the curve beyond the engineered capacity (the part of the curve above the dotted line) is sub-divided into four linear segments a, b, c and d. The horizontal lengths of these four segments are equal to one another. The exact relationship between d and the load index of the sender CS is determined separately for each segment. In order to determine the maximum value of d for a given segment, the possible range of d (0–1) is divided into four equal parts.

Fig. 3. Computation of redirection factor at sender CS: (a) relationship between 95th POD (load index) and load intensity; (b) relationship between d and load index.

The maximum values of d corresponding to segments a, b, c and d are set at 0.25, 0.5, 0.75 and 1.0, respectively (see Fig. 3b). If the load index of the sender Call Server, Ls, lies in segment a, the value of d is computed using the equation of a straight line:

da = 0.25 * (Ls - 140)/(230 - 140)

The values 140 and 230 correspond to the 95th PODs at the two corners of segment a. Similarly, the equations for segments b, c and d can be written as

db = 0.25 * [1 + (Ls - 230)/(440 - 230)]
dc = 0.25 * [2 + (Ls - 440)/(850 - 440)]
dd = 0.25 * [3 + (Ls - 850)/(1600 - 850)]

Depending on which segment the load index of the sender CS (Ls) lies in, the corresponding equation is used to compute the value of d. Once d is found, the number of redirected calls n (a rounded off value) is calculated using the equation

n = d * P * Nt    (1)

where P is a constant that corresponds to the maximum proportion of incoming calls that can be redirected to a remote Call Server. P is held at 40% for the experiments described in this paper. The value of 40% is determined using an experiment described in Section 6.3.3.1.
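The segment equations above can be transcribed directly. The sketch below assumes the 95th POD break points shown in Fig. 3 (140, 230, 440, 850 and 1600 ms); it is only an illustration of Adaptive #1, not code from the simulation model.

```python
# Segment boundaries of the load index (95th POD, in ms) taken from Fig. 3,
# each segment contributing up to 0.25 to the redirection factor d.
SEGMENTS = [(140.0, 230.0), (230.0, 440.0), (440.0, 850.0), (850.0, 1600.0)]

def redirection_factor_adaptive1(load_index_sender):
    """Piecewise-linear redirection factor d for Adaptive #1."""
    for i, (low, high) in enumerate(SEGMENTS):
        if load_index_sender <= high:
            d = 0.25 * (i + (load_index_sender - low) / (high - low))
            return min(max(d, 0.0), 1.0)   # clamp to the valid range 0..1
    return 1.0                             # load index beyond the last segment

def calls_to_redirect(load_index_sender, n_total, p=0.40):
    """Eq. (1): n = d * P * Nt, rounded to the nearest call."""
    d = redirection_factor_adaptive1(load_index_sender)
    return round(d * p * n_total)
```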

4.2.3. Adaptive #2

This policy uses load information of both the sender Call Server and the receiver Call Server to calculate n. It is very similar to Adaptive #1 and differs in using a dynamic value of P instead of a fixed one. The calculation of d uses the same algorithm as described for Adaptive #1, based on the load information of the sender CS. Based on the load status of the receiver CS, four availability levels that correspond to the spare capacity of the receiver CS are proposed. If the load index of a receiver CS is denoted by Lr, then the available capacity, A, of that receiver CS is defined as the difference between the location policy threshold, THL, and Lr (A = THL - Lr).

The location policy threshold (THL) is a fixed value; a Call Server is designated as a receiver node if its load index is lower than THL. The four availability levels are defined in Table 1. A set of four values of P (P1 < P2 < P3 < P4) is used. Each value of P in the set corresponds to one availability level of the receiver CS. For example, P1 corresponds to availability level 1, P2 corresponds to availability level 2 and so on. Once the availability level of the receiver CS is determined, the corresponding value of P from the set (P1, P2, P3, P4) is used to determine the number of redirected calls, n, using Eq. (1). Thus both the sender and the receiver Call Server information is used in determining n: the sender Call Server load is used to determine d and the receiver Call Server load is used to determine P.

4.2.4. Adaptive #3

This policy uses load information of the receiver Call Server only to calculate the redirection factor d. The algorithm is based on a more precise relationship between the 95th POD and the load intensity in comparison to those used in Adaptive #1 and Adaptive #2. It has been observed that this relationship is approximately linear up to the load index corresponding to TE/2 and is non-linear beyond that point (see [23] for a more detailed discussion). If the load index of the receiver CS is denoted by Lr, the available spare capacity of the receiver CS by A, and the threshold of the location policy by THL, then the redirection factor can be computed by a linear interpolation given by

d = A/THL, where A = THL - Lr


For a load index range of (TE/2 < Lr < THL), it has been observed experimentally that if load is transferred by assuming a linear relationship between the 95th POD and the load intensity, then the receiver CS gets overloaded.

Table 1
Availability levels for the spare capacity of the receiver Call Server

Level 1: 0 < A <= 0.125 * THL
Level 2: 0.125 * THL < A <= 0.25 * THL
Level 3: 0.25 * THL < A <= 0.5 * THL
Level 4: 0.5 * THL < A <= THL
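A possible rendering of the Adaptive #2 selection of P from Table 1 is sketched below. The actual set (P1, P2, P3, P4) used in the experiments is not reproduced here, so the values in p_set are purely illustrative, as are the function names.

```python
def availability_level(load_index_receiver, th_location):
    """Map the receiver's spare capacity A = THL - Lr to a level (Table 1)."""
    a = th_location - load_index_receiver
    if a <= 0:
        return None                       # not a candidate receiver at all
    if a <= 0.125 * th_location:
        return 1
    if a <= 0.25 * th_location:
        return 2
    if a <= 0.5 * th_location:
        return 3
    return 4

def max_redirected_proportion(level, p_set=(0.10, 0.20, 0.30, 0.40)):
    """Adaptive #2: choose P according to the availability level.
    p_set stands in for the set P1 < P2 < P3 < P4; values are illustrative."""
    return p_set[level - 1]
```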


To avoid overloading the receiver CS, a low fixed proportion of calls is redirected when Lr > TE/2. The higher the value of A, the higher the fixed value of d used. These fixed values of d are, however, lower than the values computed using the relation d = A/THL. A more detailed analysis is presented in [23]. Once d is calculated, the value of n can be computed using Eq. (1) with the value of P set to 100%.

4.2.5. Adaptive #4

This policy uses load information of both the sender and the receiver CSs and computes the redirection factor using an approach that combines the approaches used in Adaptive #1 and Adaptive #3. The sender and the receiver components contribute equally to the determination of n. The number of calls to be transferred, n, is given by n = D * Nt, where D = 0.5 * d1 + 0.5 * d2. Here d1 is determined using Adaptive #1 as described earlier and d2 is calculated using Adaptive #3.

5. Simulation system overview

To study the relative performance of different redirection policies, a simulation model of an eXtended-Architecture (XA-Core) based Digital Multiplex Switch (DMS) [25] is developed in OPNET [26]. OPNET provides a comprehensive development environment supporting the modeling of communication networks and distributed systems. A high level architecture of the model of a Call Server is shown in Fig. 4. The simulation model consists of two main sub-models: one for the peripheral module (PM) and one for the multiprocessor-based core Call Server. These two sub-models work as a client–server model. The clients represent the multiple OPNET processes used for modeling peripheral modules, and the OPNET processes in the XA-Core Call Server represent the servers in the client–server model. The peripheral module is modeled for plain old telephone service (POTS), which is used only for voice calls.

The model of the XA-Core Call Server is further decomposed into smaller component models: the Input/Output process model, the broker and the call processing subsystem. A UML collaboration diagram [27] for a model of the Call Server is shown in Fig. 5. As expected, the Input/Output process handles all incoming and outgoing traffic of the core Call Server. A broker in each Call Server is responsible for load sharing. The call processing subsystem provides the core functionality of call processing. Shared memory is used in the form of built-in queue models of OPNET. The ‘Input Q’ shown in Fig. 5 is an input queue for the messages sent from external modules. The Input/Output process extracts these messages every time it is invoked by the scheduler. The messages that are to be sent outside the Call Server are put in the ‘Output Q’ queue. The call processing subsystem and the broker process put outgoing messages in the output queue. Again, the Input/Output process processes these messages every time it is invoked by the scheduler. The ‘XB-Q’ queue is an input queue of the broker process; it stores the new call messages and the load information messages sent by the Input/Output process. The broker process extracts these messages every time it is invoked by the scheduler. Using the load sharing algorithm, the broker process puts (Nt - n) new call messages in the origination queue (shown as ‘Orig Q’ in Fig. 5) for local processing and n new call messages in the ‘Output Q’ for remote processing. The new call messages wait in the origination queue for processing by the call processing subsystem. All the messages other than the new call messages are put directly in the ‘Progress Q’ queue by the Input/Output process. The progress queue is the second input queue for the call processing subsystem. The stereotypes used in this diagram are introduced to represent various OPNET processes and built-in queues. The first numbers of the numeric labels (1.y, 2.y, 3.y) in the messages shown in Fig. 5 are based on the assumption that the scheduler invokes the Input/Output process first, followed by the broker and then the call processing subsystem processes.

Fig. 4. High level architecture of a Call Server.


Fig. 5. Internal details of a Call Server simulation model.
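The queue hand-off described above can be mimicked with ordinary FIFO queues. The sketch below only illustrates the message flow of Fig. 5 under the assumed invocation order; it is not the OPNET implementation, and the message format and function names are our own.

```python
from collections import deque

# Shared-memory queues of Fig. 5, modeled as simple FIFOs.
input_q, xb_q, orig_q, progress_q, output_q = (deque() for _ in range(5))

def io_process():
    """Input/Output process: route arriving messages, drain the output queue."""
    while input_q:
        msg = input_q.popleft()
        if msg["type"] in ("origination", "load_info"):
            xb_q.append(msg)          # new calls and load info go to the broker
        else:
            progress_q.append(msg)    # all other call-progress messages
    output_q.clear()                  # stands in for sending messages out

def broker_process(n_redirect):
    """Broker: keep (Nt - n) new calls locally, redirect n to a remote CS."""
    originations = [m for m in xb_q if m["type"] == "origination"]
    xb_q.clear()                      # load information would be consumed here
    output_q.extend(originations[:n_redirect])   # redirected calls
    orig_q.extend(originations[n_redirect:])     # processed locally

def call_processing():
    """Call processing subsystem: consume progress messages, then new calls."""
    while progress_q:
        progress_q.popleft()
    while orig_q:
        orig_q.popleft()

# One scheduler round in the order assumed for the labels of Fig. 5;
# n_redirect would be supplied by the redirection policy of Section 4.
for run in (io_process, lambda: broker_process(n_redirect=2), call_processing):
    run()
```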

In reality, however, the order of the invocation of these processes is not fixed. The messages handled by a particular process (e.g. the messages with labels x.1, x.2, etc. for the Input/Output process) follow the sequence shown in Fig. 5.

5.1. Transfer, location and information policies

As described in Section 2, threshold-based location and transfer policies are used in our simulated system. The location policy identifies a receiver node to which calls will be transferred from a sender node. Any Call Server whose load index is lower than the location policy threshold can be a receiver node. To avoid a situation in which multiple sender nodes select the same receiver node every time, a weighted random policy is used for choosing a receiver node from the set of candidate receiver nodes. This policy uses the available capacity A of a receiver node defined in Section 4.2.3. A node i from the set is selected with a probability of Ai / ΣjAj, where Ai is the available capacity of node i and the sum is taken over all candidate receiver nodes j.
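The weighted random receiver choice described above might look as follows; the default threshold is the location policy threshold of Table 2, and the function name and data layout are our own assumptions.

```python
import random

def choose_receiver(load_indices, th_location=105.0, rng=random):
    """Pick a receiver CS with probability Ai / sum(Aj), where Ai = THL - Li."""
    capacities = {cs: th_location - li
                  for cs, li in load_indices.items() if li < th_location}
    if not capacities:
        return None                      # no candidate receiver in the cluster
    nodes = list(capacities)
    weights = [capacities[cs] for cs in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]

# Example: CS2 and CS3 qualify; CS3, with more spare capacity, is chosen more often.
print(choose_receiver({"CS1": 180.0, "CS2": 90.0, "CS3": 60.0}))
```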

The transfer policy identifies a sender node. When the load index of a Call Server exceeds the transfer policy threshold, it is considered a sender node. The thresholds used in our experiments are provided in Table 2. The location and the transfer policy thresholds correspond to values that are slightly lower and slightly higher, respectively, than the 95th POD achieved at the engineered capacity (E) for a single CS-based system. A state change-based information policy is used: state information is sent by a

Table 2
Default values of simulation parameters

Cluster size: 3
Load intensity corresponding to the engineered capacity (E): 3157 kcalls/h
Think times (dialing, answering, talking and disconnecting): Truncated exponential (mean of 5 s)
Data sharing policy: Remote access
Transfer policy threshold: 175 ms
Location policy threshold: 105 ms
Locality factor: 50%


node when its load index changes by a predetermined value. 5.2. Simulation parameters and performance indices The workload parameters are varied for different sets of experiments to investigate the effect of the parameters on the behavior of the Call Server cluster. Most of the workload parameters are submitted as an input to the ‘Simulation Sequence’ tool provided by OPNET. The important workload parameters are load intensity, think time, service time and the locality factor. The parameter values were chosen in consultation with Nortel experts to model a Call Server cluster that spans a large geographic area such as a city and switch calls carrying voice and data traffic among subscribers. The default values of the important parameters are presented in Table 2. The experiments are performed using a cluster size of two and three. Only the results for a cluster size of three are included in this paper, so we set the default value of cluster size to three. Such a cluster size can provide call switching in a large geographic area such as an entire city. It has been observed that the relative performance of redirection policies is the same for a cluster size of two [23]. We expect the relative performance of the redirection policies to be the same for cluster sizes of four or more. The experiments with higher cluster sizes were not performed because this leads to significantly larger simulation times. Load intensity is the average number of call arrivals per unit time. The call arrival process is assumed to be a Poisson process. The load intensity corresponding to the engineered capacity of the simulation model of our Call Server is 3157 kcalls/h [22]. Various types of think times include the time taken by a subscriber to dial a number, to answer a call, to talk and to disconnect after the other subscriber hangs up. Durations of think time events are provided through a truncated exponential distribution (0.5–10 s) with a mean of 5 s. We have performed selected experiments with other mean values but the relative performance of the redirection strategies is observed to be insensitive to the mean value used. A mean value of 5 s is chosen to achieve a reasonable simulation time. The exponential distribution produces variability in the duration of the think time events and the truncation is done to eliminate very high or very low values (<0.5 s) of think times which are not possible in real scenarios. As mentioned in Section 3, a remote access policy is used to access the calling and the called subscriber state data. The remote access policy is chosen because it is simple to implement and is observed to give rise to superior performance for a broad class of workload parameters [22]. Service times are the times required to process different call processing messages. These service times are different for different messages. The values of these service times cannot be reported due to a non-disclosure agreement with Nortel. The values used for the transfer policy threshold and the

location policy threshold are determined by experiments and explained in Section 6.1. The experiments are conducted for a high speed network. The inter-connection link between the two Call Servers is assumed to be 100 Mbps which was effectively supported by the technology available at the time the simulation experiments were conducted. A similar inter-connection link is assumed between a peripheral module and a Call Server. Due to the small length of messages transferred over such a dedicated link, the intercommunication delay between a peripheral module and a Call Server is assumed to be negligible in comparison to the call processing times. The locality factor is the probability that the state data of a called subscriber is available locally on a Call Server that is designated to process the call when the call arrives. The default value assumed for the locality factor is 50%. The locality factor of 50% means the probability of the data to be available locally is equal to that of the data to be located on a remote Call Server. This is the intermediate value in a set of values experimented with (see also Section 6.4). A number of performance indices are observed to study the relative performance of different redirection policies. The key performance parameter is the origination delay. Ninety-fifth percentile of the origination delay (95th POD) is computed for individual Call Servers and for the overall Call Server cluster. In order to highlight the performance improvement produced by a redirection policy over a system that does not use any load sharing, the origination delay improvement (ODI) factor is introduced. It is the ratio of the overall 95th POD of the cluster without load sharing and the overall 95th POD of the cluster with load sharing under same conditions. A number of additional performance indices including the average origination delay are included in [23]. Due to space limitations only the 95th POD is discussed in this paper. The ODI factor and average origination delay are also shown for the main scenarios. After discarding the data for the initial warm up period, the simulations were run for at least 600,000 call arrivals. This was deemed adequate by Nortel experts for analyzing the performance of the Call Server cluster. The simulation results are discussed in the next section. 6. Experimental results The main focus of these experiments is to investigate the performance of the load sharing strategies. Note that all the load sharing strategies differ only in terms of redirection policies. The other components of the strategies are the same. The impacts of different workload parameters on the performance of the redirection policies are discussed. A factor at a time approach is used: one parameter is varied at a time while the others are held at their default values presented. Results of a set of pilot experiments are discussed in Sections 6.1 and 6.2. The appropriate values of the location

and the transfer policy thresholds that are used in subsequent experiments are investigated in Section 6.1. The impact of load sharing on the performance of various Call Server cluster configurations is presented in Section 6.2. Based on the experimental results, the cluster with three Call Servers, with one overloaded and two underloaded Call Servers, is chosen for more detailed analysis in the following sections. Section 6.3 presents a detailed analysis of the performance of the various redirection policies introduced in this paper. The impact of the locality factor on the performance is discussed in Section 6.4. Section 6.5 presents the summary of the performance evaluation of the experimental results obtained in this section. The results of the experiments are captured in Figs. 6–15. Note that for the bar graphs in Figs. 8–15, the sequence of the bars in a figure corresponds to the sequence of legends indicated in the figure.

6.1. Threshold analysis As discussed in Section 5.1, there are two types of thresholds corresponding to the two different components of load sharing. One threshold is for the transfer policy and the other is for the location policy. These thresholds should correspond to the border line between an overloaded system and a non-overloaded system. For telephone switches, this border line is the engineered capacity of the system and the 95th POD on this border line is denoted

by TE (as defined in Section 4.2.2). A Call Server that has a 95th POD above TE is considered overloaded and one that has a 95th POD below TE is considered a lightly loaded CS. The set of experiments described in this section assumes the default values of the input parameters listed in Table 2, except for the thresholds used in the transfer policy and in the location policy. The threshold of the transfer policy is varied in the first experiment and the threshold of the location policy is varied in the next.

Fig. 6. Comparison of system performance for different values of the transfer policy threshold.

Fig. 7. Comparison of system performance for different values of the location policy threshold.


6.1.1. Transfer policy threshold In the first experiment, the threshold of the location policy is fixed at 140 ms (TE), the 95th POD corresponding to the engineered capacity of a CS (E) and the threshold of the transfer policy is varied from 100 to 240 ms. To see the effectiveness of load sharing, the load intensity of one CS (CS1) is kept slightly above (2.5%) E while the other two CSs (CS2 and CS3) are operated at 2.5% below E. The results of this experiment are presented in Fig. 6. The first bar in Fig. 6 (with No LS label) shows the results when no load sharing is used. For threshold values of 105 and 140 ms, it has been observed that the 95th POD of the overloaded CS is reduced significantly. But this increased transfer of calls pushes the 95th PODs of the lightly loaded CSs to higher values. The overall 95th POD of the cluster is observed to be lower than that achieved with no load sharing. As the transfer policy 700

1600

600 95th POD (msec)

1400 95th POD (msec)

3037

1200 1000 800 600 400

500 400 300 200 100

200

0

0 2CS-1OL-1LL

3CS-1OL-2LL

3CS-2OL-1LL

2CS-1OL-1LL

CS1

CS2

CS3

3CS-1OL-2LL

3CS-2OL-1LL

Cluster Configuration

Cluster Configuration

Cluster

CS1

CS2

CS3

Cluster

Fig. 8. Effect of cluster composition and size on the 95th PODs of the Call Server cluster with (a) no load sharing; (b) load sharing.

Fig. 9. Performance comparison of static and adaptive redirection policies (the order of d and that of the Adaptive policies follow the order used in the legend).

threshold is increased, performance improves at first but is observed to deteriorate when the threshold value is increased beyond 175 ms (see Fig. 6). With a threshold value of 175 ms, the 95th POD of the overloaded CS is reduced significantly without pushing the 95th PODs of the lightly loaded CSs to significantly higher values. For higher values of threshold such as 200 and 240 ms, there is not much reduction in the 95th POD of the overloaded CS because of a reduction in number of calls transferred and therefore the overall 95th POD of the cluster is high as well. Based on this experiment, a threshold value of 175 ms is chosen as the default value for the transfer policy (see Table 2). This value is slightly higher than TE and ensures that the overloaded CS will not start transferring calls due to small spikes in its load.

6.1.2. Location policy threshold

The location policy threshold is used to identify the potential receiver Call Servers. In this experiment, the threshold for the transfer policy is set equal to 175 ms and the value of the threshold for the location policy is varied from 70 to 175 ms. Fig. 7 shows the overall 95th POD of the cluster. The first bar in Fig. 7 (No LS) shows the results when no load sharing is used. As in the case of Fig. 6, a non-monotonic behavior is observed. The overall 95th POD of the cluster is observed to decrease first with an increase in the location policy threshold and then to increase after a threshold value of 105 ms. The results for threshold values of 70 and 105 ms are better than the results achieved with threshold values of 140 and 175 ms. This is because of the more sensitive behavior of Call Servers when they operate near the engineered capacity, which corresponds to the knee of the curve plotting the 95th POD versus load intensity for a single Call Server (see Fig. 7). A small increase in the number of new incoming calls from a remote sender CS can easily overload such a receiver CS. So the threshold for the location policy should be chosen to be significantly lower than TE. The default value for the location policy threshold is chosen to be 105 ms (see Table 2) because the overall 95th POD of the cluster is the minimum observed in Fig. 7.

The experiments with both the transfer policy threshold and the location policy threshold demonstrate a non-monotonic behavior (see Figs. 6 and 7). Clearly, very low and very high values for these parameters lead to suboptimal performance. Very low values for the transfer policy threshold lead to a large number of call transfers even when a node is not overloaded. Very high values, on the other hand, are detrimental because the load transfer process starts too late. A similar rationale exists for the location policy threshold. Using a very low location policy threshold means that only a few very lightly loaded nodes can act as receivers and some nodes with medium load do not qualify as receivers, even though they have spare capacity to offload the overloaded nodes. Using a very large value for the threshold, on the other hand, increases the probability of offloading calls to nodes that are already busy. For both thresholds, using an intermediate value seems to lead to the best system performance. Although the values of the thresholds can be fine-tuned on the system, starting with a transfer policy threshold that is slightly higher than, and a location policy threshold that is slightly lower than, the 95th POD corresponding to E seems to be a reasonable choice.

Fig. 10. Performance comparison of the different redirection policies for different degrees of overloading for CS1: (a) 95th POD; (b) ODI factor; (c) average origination delay.


Fig. 11. Performance comparison of the different redirection policies for different loads of the lightly loaded Call Servers: (a) 95th POD; (b) ODI factor.

Fig. 12. Comparison of the 95th PODs of the cluster using different redirection policies at 25% and 30% overloading levels.

Fig. 14. Effect of different sets of P on the performance of the Call Server cluster.


Fig. 13. Effect of P on the performance of the Call Server cluster.


In this set of experiments, the performance of the cluster system is investigated for different sizes and compositions of the cluster. For this analysis, experiments with three different configurations are carried out. These configurations are described next and restated in data form in the sketch that follows the list.

• 2CS-1OL-1LL: This simple configuration has only two Call Servers in the cluster. One Call Server (CS1) is operating at a load intensity of 5% above E and the second Call Server (CS2) is operating at a load intensity of 5% below E.

• 3CS-1OL-2LL: This configuration is composed of three Call Servers. One Call Server (CS1) is overloaded 5% above E and the other two Call Servers (CS2 and CS3) are operating 5% below E.

• 3CS-2OL-1LL: This configuration has three Call Servers. Two Call Servers (CS1 and CS2) are overloaded 5% above E and one Call Server (CS3) is operating 5% below E.
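
The sketch below restates the three configurations as per-Call-Server load offsets relative to the engineered capacity E, together with the ratio of lightly loaded to overloaded Call Servers that the discussion of the results refers to. The dictionary layout is only an illustrative convention; the configuration names and the ±5% offsets come directly from the list above.

```python
# The cluster configurations of Section 6.2, expressed as load offsets relative
# to the engineered capacity E (+0.05 means a load intensity 5% above E).
CLUSTER_CONFIGURATIONS = {
    "2CS-1OL-1LL": {"CS1": +0.05, "CS2": -0.05},
    "3CS-1OL-2LL": {"CS1": +0.05, "CS2": -0.05, "CS3": -0.05},
    "3CS-2OL-1LL": {"CS1": +0.05, "CS2": +0.05, "CS3": -0.05},
}

def lightly_loaded_per_overloaded(config: dict) -> float:
    """Ratio of lightly loaded to overloaded CSs; larger values leave more
    spare capacity for absorbing redirected calls."""
    overloaded = sum(1 for offset in config.values() if offset > 0)
    lightly_loaded = sum(1 for offset in config.values() if offset < 0)
    return lightly_loaded / overloaded

for name, cfg in CLUSTER_CONFIGURATIONS.items():
    print(f"{name}: {lightly_loaded_per_overloaded(cfg):.1f} lightly loaded CS(s) per overloaded CS")
```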



Fig. 15. Effect of locality factor on the 95th PODs of the Call Server cluster when (a) load sharing is not used; (b) load sharing (Adaptive #2) is used.

The other input parameters used for these experiments are held at their default levels listed in Table 2. The performance of these three cluster configurations is shown in Fig. 8a (without load sharing) and in Fig. 8b (with load sharing and using the Adaptive #2 policy). For the load configurations 2CS-1OL-1LL and 3CS-2OL-1LL, the 95th PODs of the overloaded CS (CS1) and of the cluster system are very high in comparison to those achieved for the 3CS-1OL-2LL configuration (see Fig. 8a). This is because of the lower number of lightly loaded CSs in these cluster configurations. In addition to its own load intensity, the 95th POD of a CS is also affected by the number of remote Call Servers and the load intensity on those remote Call Servers. A higher load intensity and a larger number of remote CSs give rise to a higher number of data sharing requests (for the called subscriber) to a given Call Server. This is the reason for the degradation of performance for the 2CS-1OL-1LL and 3CS-2OL-1LL configurations.

Fig. 8b shows that load sharing has improved system performance to a great extent for all the cluster configurations. But the improvement in performance is greater for those configurations that have a higher ratio of the number of lightly loaded Call Servers to the number of overloaded Call Servers. The cluster configuration 3CS-1OL-2LL has two lightly loaded CSs, so the best overall 95th POD results are observed for this configuration. The 2CS-1OL-1LL configuration, with one lightly loaded CS and one overloaded CS, also shows better performance in comparison to the 3CS-2OL-1LL configuration, in which the number of overloaded CSs is higher than the number of lightly loaded CSs. The results achieved for the 3CS-2OL-1LL configuration show that our load sharing technique does not try to overload the lightly loaded CS (CS3). This is observed from the value of the 95th POD of CS3 in Fig. 8b: the 95th POD of CS3 is increased only up to nearly 150 ms, although the overloaded CSs (CS1 and CS2) are still operating in the overloaded state. Clearly, more underloaded Call Servers are required to further improve the performance of the cluster. A system with three Call Servers, one overloaded and two underloaded, which produces a large improvement in performance when load sharing is used, is investigated in detail in the following section.
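
The comparisons above, and those that follow, all use the 95th percentile of the origination delay (95th POD) as the load and performance index. For reference, the sketch below computes the metric from a sample of per-call origination delays; the nearest-rank convention used here is an assumption, since the paper does not specify an interpolation rule.

```python
# Minimal sketch: computing the 95th percentile of origination delay (95th POD)
# from a sample of per-call origination delays in milliseconds. The nearest-rank
# convention is an illustrative choice, not one prescribed by the paper.
import math

def pod_95(origination_delays_ms: list) -> float:
    """Return the 95th percentile of the observed origination delays."""
    if not origination_delays_ms:
        raise ValueError("no origination delays observed")
    ordered = sorted(origination_delays_ms)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]

# Example: a hypothetical sample of delays clustered near 120 ms with a tail.
sample = [110.0, 115.0, 118.0, 120.0, 122.0, 125.0, 130.0, 145.0, 160.0, 210.0]
print("95th POD:", pod_95(sample), "ms")
```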

6.3. Analysis of the performance of the redirection policies

As already mentioned, all the load sharing strategies differ only in terms of the redirection policies; the other components of load sharing described in Section 5.1 are the same. In this section, a detailed analysis and comparison of the different redirection policies is presented. In addition, experiments that investigate the effect of different parameters associated with these redirection policies on performance are also discussed.

6.3.1. Static versus adaptive redirection policies

In this experiment, a cluster of three Call Servers is analyzed to study the performance of the static and adaptive policies. One of the Call Servers (CS1) is operating approximately 5% above the engineered capacity (E) and the other two Call Servers are operating 5% below E. For the engineered capacity used in the simulation model employed in these experiments, 5% of E corresponds to 158 kcalls/h. This number of calls per unit time is found to be sufficient to saturate the Call Server operating above E. The overall 95th percentile origination delay is observed (Fig. 9) for no load sharing (the 'No LS' bar), for the static redirection policy with four different values of d (0.05, 0.1, 0.15, 0.2) and for the adaptive redirection policies. Note that for the no load sharing case, the overall 95th POD reaches a very high value of more than 1600 ms. From Fig. 9, it can be concluded that the static policy performs better than the no load sharing situation for all values of d, but the performance of the adaptive policies is always better than that of the static policy. A large variation in the performance of the static policy is observed for different values of d. For d = 0.05, the adaptive policies achieve at least a 100% improvement over the static policy. The best performance exhibited by the static policy occurs at d = 0.15, and even this is inferior to the performance of all the adaptive policies.

6.3.2. Different load imbalance scenarios

In this section, three sets of experiments are presented; each corresponds to a different load imbalance scenario. The first set of experiments is carried out to investigate the effectiveness of the redirection policies when the load intensity of one overloaded CS is varied above E by small amounts (0–7.5%). In the second set, the load intensity of the two lightly loaded Call Servers is varied. These two sets of experiments are labeled "Load Variation of a Single Overloaded Call Server" and "Load Variation of Two Lightly Loaded Call Servers", respectively. In the third set, labeled the "High Overload Scenario", experiments are carried out to see the effectiveness of all the redirection policies for high levels of overloading such as 25% and 30% above the engineered capacity. For the static policy, the value of d is held at 0.15 in all the experiments.


6.3.2.1. Load variation of a single overloaded Call Server.

In these experiments, a cluster of three Call Servers is used. The load intensities of two of the Call Servers (CS2 and CS3) are maintained at 5% below E for all these experiments, while the load intensity of the third Call Server (CS1) is varied from 0% to 7.5% above E. The results are presented in Fig. 10. Different load configurations based on the load intensity of CS1 are labeled as: 'CS1 = 0%' for CS1 operating at the engineered capacity (E), 'CS1 = 2.5%' for CS1 operating 2.5% above E, and so on. In Fig. 10a, the first bar at each load configuration shows the overall 95th POD of the cluster without using any load sharing. The rest of the bars in the graph display the overall 95th POD for the different redirection policies. The differences between the first bar and the other bars indicate how effectively these redirection policies have reduced the 95th POD of the cluster from the 95th POD achieved without load sharing.

It is important to observe that in the first two situations (CS1 = 0%, CS1 = 2.5%), when CS1 is not overloaded by a great amount, the load sharing policies perform only slightly better than the no load sharing situation. But as soon as the load on the overloaded CS is increased by a significant amount (CS1 = 5%, CS1 = 7.5%), the load sharing policies demonstrate large improvements in performance. Among all the load sharing policies, Adaptive #2 performs very well for all the load configurations. This is because of its capability to compute the value of d according to the load of both the sender and the receiver CS. So for the different load configurations described in this scenario, it can be concluded that the sender/receiver load based redirection policy shows a superior performance in comparison to all the other redirection policies. The improvement in the 95th POD is also measured by the ODI factor. A comparison of the ODI factors for the different load configurations is shown in Fig. 10b; the ODI factor achieved with Adaptive #2 is the highest for all the load configurations. The average origination delay is also presented in Fig. 10c. The ranking of the different redirection policies in terms of the average origination delay is observed to be the same as that seen for the 95th POD. The results of experiments with two overloaded Call Servers in the cluster have also shown the same pattern of performance improvement [23], but not as much as seen in the current scenario of one overloaded CS. This is due to the presence of only one lightly loaded CS in the cluster.


6.3.2.2. Load variation of two lightly loaded Call Servers.

In this scenario, the performances of the different redirection policies are investigated for different load levels of the lightly loaded Call Servers. For these experiments, one overloaded Call Server (CS1) in the cluster is operating 5% above E, and the load intensities of the two lightly loaded Call Servers (CS2 and CS3) are varied from 0% to 10% below E. Different load configurations based on the load intensities of CS2 and CS3 are labeled as: 'CS2/3 = 0%' for CS2 and CS3 operating at the engineered capacity (E), 'CS2/3 = 2.5%' for CS2 and CS3 operating 2.5% below E, and so on. By comparing the overall 95th POD achieved with all the redirection policies at the different load configurations (Fig. 11), Adaptive #2 is found to consistently perform better than the other policies in all cases except the last configuration. For all the load configurations, the adaptive policies demonstrate significant performance improvements over a system with no load sharing. For the first two load configurations (CS2/3 = 0% and CS2/3 = 2.5%), the lightly loaded CSs have very little spare capacity to absorb the extra load of the overloaded CS. This is the reason for the higher values of the 95th POD observed for the adaptive policies in these two load configurations. At the load configurations CS2/3 = 5% and CS2/3 = 10%, the performances of all the redirection policies are significantly improved because of sufficient spare capacity available on the lightly loaded Call Servers. This is captured by the ODI factor displayed in Fig. 11b. Adaptive #2 is the winner for the load configuration CS2/3 = 5%. Adaptive #3 and Adaptive #4 show better performance than the others at the load configuration CS2/3 = 10%. For Adaptive #3 and Adaptive #4, the redirection factor, d, is proportional to the spare capacity of the receiver CS, so with the two lightly loaded CSs operating 10% below E, the last configuration (CS2/3 = 10%) gives rise to the improved performances of Adaptive #3 and Adaptive #4. Using an appropriate redirection policy seems to have a strong impact on performance. The average origination delay captured for these load configurations leads to the same ranking of the performances of the redirection policies.

6.3.2.3. High overload scenario.

To investigate the effectiveness of the redirection policies for higher levels of overloading, additional experiments are performed. One of the Call Servers (CS1) is overloaded up to 25% and 30% above the engineered capacity (E) and the two lightly loaded Call Servers (CS2 and CS3) are operating 50% below E. Two different levels of overloading (25% and 30%) are used to see which policy demonstrates a more consistent performance. The results of all the redirection policies for these two levels of overloading are presented in Fig. 12. The static policy demonstrates a poor performance for both levels of overloading: the fixed value of the redirection factor (d = 0.15) is not efficient in absorbing the extra load of the overloaded CS, which can change with time. The performances of Adaptive #1 and Adaptive #2 are better than the static policy, but their 95th PODs are higher than the desired value of TE (140 ms). However, Adaptive #2 performs better than Adaptive #1. Adaptive #3 and Adaptive #4 demonstrate a superior performance in comparison to the other redirection policies. The overall 95th PODs produced by both policies are under the specified TE, and the results obtained for these two policies are consistently better than those of the other policies at both levels (25% and 30%) of overloading. Adaptive #3 is observed to perform slightly better than Adaptive #4 at both levels of overloading.


6.3.3. Further analysis of Adaptive #1 and Adaptive #2

The Adaptive #1 and Adaptive #2 redirection policies use the parameter P that corresponds to the maximum proportion of new calls that can be redirected from a sender Call Server to a receiver Call Server. The effect of P on the performance of these two policies is discussed in the following subsections. The experiments discussed in the following subsections use the default values of the input parameters listed in Table 2. One of the Call Servers (CS1) is operating at a load intensity that is 2.5% above E and the other two Call Servers (CS2 and CS3) are operating at load intensities 2.5% below E.

6.3.3.1. Adaptive #1: Effect of P.

In Adaptive #1, the redirection factor (d) is adaptive and changes with the current load of the sender CS: the higher the load on the sender CS, the higher the number of calls transferred to the receiver CS. The value of P is a constant and can be anything between 0% and 100%. For this experiment, an analysis of the effect of different values of P on the performance of the system is carried out. The experiment starts with a value of 25% for P and then P is incremented in steps of 15%. The results are summarized in Fig. 13. The first bar in Fig. 13 (No LS) shows the results achieved without load sharing. Using this sender load based redirection policy, the best results are observed for a value of P equal to 40%. For all other values of P, the 95th POD of the overloaded CS (CS1) is improved but the overall 95th POD of the cluster is worse than the overall 95th POD achieved without using any load sharing. For P = 25%, the number of redirected calls from the overloaded CS (CS1) to the lightly loaded CSs (CS2 and CS3) is not large enough to improve on the performance achieved in the 'No LS' case: the 95th POD of the overloaded CS (CS1) is improved only slightly, the 95th PODs of the lightly loaded CSs (CS2 and CS3) are moderately increased, and the overall 95th POD of the cluster is slightly worse in comparison to the situation without load sharing. For P = 40%, both the 95th POD of the overloaded CS and the overall 95th POD of the cluster are improved significantly in comparison to the situation without load sharing. For higher values of P such as 55% and 70%, a degradation in performance is observed. The 95th PODs of the lightly loaded Call Servers (CS2 and CS3) are pushed to high values but the 95th POD of the overloaded CS (CS1) is not reduced accordingly. This is because of a thrashing phenomenon: the overloaded Call Server (CS1) tries to send too many calls to the receiver CSs (CS2 and CS3), which overloads the receiver Call Servers; they, in turn, start sending some of their calls back to the Call Server that was originally overloaded (CS1).
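
A minimal sketch of the Adaptive #1 idea is given below: the redirection factor d grows with the load of the sender CS but is capped by the maximum redirected proportion P. The linear ramp between TE and an assumed saturation delay is an illustrative assumption rather than the policy's actual formula, but it shows why a large P can push enough calls onto the receivers to trigger the thrashing described above.

```python
# Illustrative sketch of a sender-load-based redirection factor in the spirit
# of Adaptive #1. The linear ramp between TE and an assumed saturation delay is
# not the paper's formula; only the cap P (maximum redirected proportion) and
# the sender-load dependence are taken from the policy description.

TE_MS = 140.0              # 95th POD of a CS operating at the engineered capacity
SATURATION_POD_MS = 350.0  # assumed delay at which d reaches its cap

def redirection_factor_adaptive1(sender_pod_ms: float, p_cap: float = 0.40) -> float:
    """Redirection factor d: 0 at or below TE, rising linearly to p_cap."""
    if sender_pod_ms <= TE_MS:
        return 0.0
    ramp = (sender_pod_ms - TE_MS) / (SATURATION_POD_MS - TE_MS)
    return min(p_cap, p_cap * ramp)

def calls_to_redirect(new_calls: int, sender_pod_ms: float, p_cap: float = 0.40) -> int:
    """Number of new incoming calls to redirect in the current interval."""
    return int(redirection_factor_adaptive1(sender_pod_ms, p_cap) * new_calls)

# Example: a heavily loaded sender with P = 40% versus P = 70%; the larger cap
# redirects many more calls and risks overloading the receivers (thrashing).
print(calls_to_redirect(1000, sender_pod_ms=300.0, p_cap=0.40))
print(calls_to_redirect(1000, sender_pod_ms=300.0, p_cap=0.70))
```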

6.3.3.2. Adaptive #2: Effect of different sets of P.

As described in Section 4.2.3, this policy considers the load of both the sender and the receiver CS to calculate the redirection factor, d, at run time. For this policy, P is assumed to be a set of four values (P1, P2, P3, P4) in increasing order, and each value of P in the set corresponds to one availability level of the receiver CS, as explained in Section 4.2.3. Based on the availability level of the receiver CS, the corresponding value of P is selected and applied in Eq. (1) to calculate the number of calls to transfer. In the experiment described in this section, the effect of different sets of P (shown in Table 3) on performance is investigated. These five sets of P are labeled Pa, Pb, Pc, Pd and Pe. The results are shown in Fig. 14. The first bar, with a 'No LS' label, in Fig. 14 shows the results without using any load sharing.

As seen in Fig. 14, the results for the set Pa are not good because of the low number of calls transferred from the overloaded Call Server (CS1) to the lightly loaded Call Servers (CS2 and CS3). The 95th POD of CS1 is only marginally improved and the effect on the 95th PODs of the lightly loaded CSs is also negligible. The overall 95th POD for this set of P likewise shows no improvement in performance over the 'No LS' case. As the values in the sets of P are increased, more improvements in the performance of the Call Server cluster are observed. For set Pb, the 95th POD of the overloaded CS (CS1) is reduced to nearly 195 ms and the overall 95th POD is 140 ms (see Fig. 14). The set Pc uses higher values of P (14, 28, 42, 55); the 95th PODs of the overloaded CS and of the overall cluster are improved further to nearly 180 and 137 ms, respectively. The improvement continues with set Pd (18, 36, 54, 70). But this trend does not continue with Pe, which includes the highest level of the redirected proportion (100%). The results obtained for Pe (see Fig. 14) indicate that these higher values degrade the performance of the cluster: a transfer of a large number of new calls from the overloaded CS (CS1) tends to overload the lightly loaded Call Servers (CS2 and CS3). This can be observed in the higher values of the 95th PODs of the lightly loaded CSs.

Table 3
Different sets of P

Set labels    Set of P (P1, P2, P3, P4)
Pa            (6, 12, 18, 25)
Pb            (10, 20, 30, 40)
Pc            (14, 28, 42, 55)
Pd            (18, 36, 54, 70)
Pe            (25, 50, 75, 100)
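
A sketch of how a set of P values might be applied is shown below: the receiver's availability level, derived from how far its load index sits below TE, selects one of the four values of P, which then bounds the proportion of new calls redirected to that receiver. The four-way banding of the receiver load index and the use of the selected P as a simple upper bound are assumptions made for illustration; the actual computation uses Eq. (1) and the availability levels defined in Section 4.2.3, which are not reproduced here.

```python
# Illustrative sketch of selecting a value of P from a set (P1, P2, P3, P4)
# based on the availability level of the receiver CS, in the spirit of
# Adaptive #2. The banding below is an assumption; the paper defines the
# availability levels in Section 4.2.3.

TE_MS = 140.0
P_D = (0.18, 0.36, 0.54, 0.70)  # set Pd of Table 3, expressed as fractions

def availability_level(receiver_pod_ms: float) -> int:
    """Map the receiver's 95th POD to an availability level 1 (low) .. 4 (high)."""
    slack = max(TE_MS - receiver_pod_ms, 0.0) / TE_MS
    if slack >= 0.75:
        return 4
    if slack >= 0.50:
        return 3
    if slack >= 0.25:
        return 2
    return 1

def max_calls_to_redirect(new_calls: int, receiver_pod_ms: float,
                          p_set: tuple = P_D) -> int:
    """Upper bound on new calls redirected to this receiver in the interval."""
    p = p_set[availability_level(receiver_pod_ms) - 1]
    return int(p * new_calls)

# Example: a nearly idle receiver accepts a larger share than a busy one.
print(max_calls_to_redirect(1000, receiver_pod_ms=30.0))   # high availability
print(max_calls_to_redirect(1000, receiver_pod_ms=120.0))  # low availability
```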


The 95th POD of CS1, which was originally overloaded, is only reduced marginally. This is possibly because of the transfer of calls back to CS1 from CS2 and CS3 when they get overloaded, the thrashing phenomenon discussed in the previous section. From this analysis, the set Pd is chosen as the default parameter for Adaptive #2.

6.4. Effect of locality factor

The locality factor (defined in Section 5.2), which depends on the location of the called subscriber state data, has a direct effect on the performance of the cluster. If the called subscriber state data is local to the Call Server that is handling the call, the performance and capacity of the cluster system are improved [22]. In this set of experiments, the effect of the locality factor on the performance of a Call Server cluster with and without load sharing is investigated. One of the Call Servers (CS1) is operating at a load intensity that is 5% above E and the other two Call Servers (CS2 and CS3) are operating at load intensities 5% below E. The results are shown in Fig. 15a (without load sharing) and Fig. 15b (with load sharing and using the Adaptive #2 policy). The three data sets shown in Fig. 15 are for three values of the locality factor (25%, 50% and 75%).

From Fig. 15a, it can be seen that the 95th PODs of the individual CSs and of the cluster are high for lower values of the locality factor. By increasing the locality factor, the performance of the system is improved significantly. For a locality factor of 75%, the overall 95th POD is reduced significantly in comparison to the 95th POD achieved with a locality factor of 50%. It is important to note that it is not easy to accurately capture the impact of the locality factor when load sharing is used, because if a call is transferred to a remote Call Server, the called subscriber's state data that was originally remote can become local, and vice versa. From Fig. 15b, it is observed that load sharing leads to a performance improvement for all three values of the locality factor. It is also seen that the higher the locality factor, the lower the overall 95th POD for the cluster. For a locality factor of 25%, the overall 95th POD is improved from 1600 ms to approximately 493 ms when load sharing is used. These results show that load sharing always improves the performance of the system, but the locality factor is another important factor that affects system performance, with and without load sharing.

6.5. Performance analysis

Based on OPNET, a simulation model of a CS cluster was built and a number of experiments were conducted. The results of the experiments gave rise to a number of insights into system behavior and performance, which are summarized next.


• The redirection policy adopted plays an important role in load sharing on CS clusters. The 95th POD of the CS cluster is observed to depend on the specific load sharing strategy used.

• Both static and adaptive redirection policies are investigated. The proposed adaptive policies are observed to perform better in comparison to the static policy (see Section 6.3.1).

• Four adaptive redirection policies are experimented with. Each policy uses a different degree of knowledge of system state in determining the proportion of the new incoming calls to be transferred from an overloaded CS. The inferior performance of Adaptive #1 suggests that using load information of only the sender CS is not adequate. Adaptive #2, which uses load information from both the sender and receiver Call Servers, is observed to perform better than Adaptive #1. As shown in Section 6.3.2, Adaptive #4, which deploys a more complex algorithm, does not perform well at lower levels of overloading for the sender Call Server.

• The Adaptive #1 and Adaptive #2 redirection policies use a parameter P that corresponds to the maximum proportion of the new calls that can be redirected. The analyses presented in Sections 6.3.3.1 and 6.3.3.2 show that using a moderate value of P leads to the best performance. Very small and very large values of P are observed to give rise to worse performance.

• Adaptive #2 and Adaptive #3 are observed to perform better in comparison to the other adaptive policies for a broad range of workload variations.

• Two different types of load imbalance scenarios are considered in Sections 6.3.2.1 and 6.3.2.2. The load sharing strategies give rise to a significant improvement in performance in comparison to a non-load-shared system in each case. A higher degree of performance improvement is observed when the difference between the load of the overloaded Call Server and the engineered capacity, as well as the difference between the load of the lightly loaded Call Servers and the engineered capacity, is increased.

• With high overloads (see Section 6.3.2.3), Adaptive #3 is observed to demonstrate superior performance. Using knowledge of the load of the receiver CS only seems to be apt for controlling such system overloads.

• Choosing appropriate values for the transfer policy threshold and the location policy threshold is important. Selecting a transfer policy threshold that is substantially higher (approximately 25%) than the 95th POD of a Call Server operating at its engineered capacity seems to be appropriate for the parameters experimented with. Choosing a value above TE with a substantial margin has the advantage that a Call Server does not become a sender because of minor load spikes. For the experiment on the location policy described in Fig. 7, the threshold value (105 ms), which is approximately 25% lower than TE, is observed to be effective for selecting a potential receiver CS. Choosing a value that is substantially lower than the 95th POD that corresponds to E ensures that the receiver node has sufficient spare capacity to accept transferred calls from a sender node.


• The locality factor has a strong impact on system performance. The adaptive load sharing policies are observed to give rise to performance improvements over a non-load-shared system for a wide range of locality factors.

7. Conclusions and future work

This paper concerns load sharing in Call Server clusters and focuses on the redirection policies to be used by an overloaded (sender) CS to transfer new incoming calls to another underloaded (receiver) CS. Adaptive policies that base the proportion of new incoming calls to be transferred on the current system state are observed to perform better in comparison to a static policy, for which only a fixed proportion of the new incoming calls is transferred from an overloaded CS to an underloaded CS. Both Adaptive #2 and Adaptive #3, which make use of load information from the receiver CS, demonstrate good performance for a broad range of workload parameters. However, Adaptive #3 is considered a better choice because it uses less state information than Adaptive #2, and so the concomitant overhead is expected to be lower. It has also been observed that the values of the transfer policy threshold and the location policy threshold have an impact on system performance. The transfer policy threshold should be chosen slightly higher than the load index of a Call Server operating at the engineered capacity, and the location policy threshold should be marginally lower than the value of the load index of a Call Server operating at the engineered capacity.

A number of topics can be investigated in more detail in the future for enhancing system performance. The simulation model presented in this paper did not account for the overhead associated with the execution of the algorithm of a given redirection policy; we intend to include this factor in our future studies. A dynamic threshold based policy, in which the threshold depends on the load information of the Call Servers in the cluster, is worthy of research. In this paper, calls are transferred to one receiver node at a time. An enhanced location policy that can select more than one receiver node to transfer the load may lead to higher system performance. Such a location policy warrants further investigation.

Acknowledgements

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and Nortel. We thank NSERC and members of the Systems Engineering group at Nortel, Ottawa, Canada for their valuable help and support in carrying out this research.

References

[1] H.Y. Sit, K.S. Ho, H.V. Leong, W.P. Robert, An adaptive clustering approach to dynamic load balancing, in: Proceedings of the 7th International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'04), Hong Kong, May 2004, pp. 415–420.
[2] S.K. Das, D.J. Harvey, R. Biswas, Adaptive load balancing algorithms using symmetric broadcast networks, in: Proceedings of the 26th International Conference on Parallel Processing (ICPP'97), Bloomingdale, IL, USA, August 1997, pp. 360–367.
[3] M. Dahlin, Interpreting stale load information, in: Proceedings of the 10th IEEE International Conference on Distributed Computing Systems (ICDCS'99), Washington, DC, USA, June 1999, pp. 285–296.
[4] C. Chang, D. Lee, Y. Jou, Load balanced Birkhoff-von Neumann switches, in: Proceedings of the 2001 IEEE Workshop on High Performance Switching and Routing (HPSR'01), Dallas, TX, USA, May 2001, pp. 276–280.
[5] K. Long, Z. Zhang, S. Cheng, Load balancing algorithms in MPLS traffic engineering, in: Proceedings of the 2001 IEEE Workshop on High Performance Switching and Routing (HPSR'01), Dallas, TX, USA, May 2001, pp. 175–179.
[6] Y. Li, Q. Zhu, Y. Cao, A request dispatching policy for web server cluster, in: Proceedings of the 2005 IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE'05), Hong Kong, March–April 2005, pp. 391–394.
[7] L. Xiao, X. Zhang, Y. Qu, Effective load sharing on heterogeneous networks of workstations, in: Proceedings of the 14th International Parallel and Distributed Processing Symposium (IPDPS 2000), Cancun, Mexico, May 2000, pp. 431–438.
[8] J. Cao, X. Wang, S.K. Das, A framework of using cooperating mobile agents to achieve load sharing in distributed web server groups, in: Proceedings of the Fifth International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP'02), Beijing, China, October 2002, pp. 118–125.
[9] M. Singhal, N. Shivaratri, Advanced Concepts in Operating Systems: Distributed, Database, and Multiprocessor Operating Systems, McGraw-Hill Inc, New York, NY, USA, 1994.
[10] J. Balasubramanian, D.C. Schmidt, L. Dowdy, O. Othman, Evaluating the performance of middleware load balancing strategies, in: Proceedings of the 8th IEEE International Enterprise Distributed Object Computing Conference (EDOC'04), Monterey, CA, USA, September 2004, pp. 135–146.
[11] P. Krueger, R. Chawla, The Stealth distributed scheduler, in: Proceedings of the 11th International Conference on Distributed Computing Systems (ICDCS'91), Arlington, TX, USA, May 1991, pp. 336–343.
[12] D.L. Eager, E.D. Lazowska, J. Zahorjan, Adaptive load sharing in homogeneous distributed systems, IEEE Transactions on Software Engineering 12 (5) (1986) 662–675.
[13] H.G. Rotithor, S.S. Pyo, Decentralized decision making in adaptive task sharing, in: Proceedings of the 2nd IEEE Symposium on Parallel and Distributed Processing (IPDPS'90), Dallas, TX, USA, December 1990, pp. 34–41.
[14] N.G. Shivaratri, P. Krueger, Two adaptive location policies for global scheduling algorithms, in: Proceedings of the 10th IEEE International Conference on Distributed Computing Systems (ICDCS'90), Paris, France, June 1990, pp. 502–509.
[15] G. Lee, H. Lee, J. Cho, A prediction-based adaptive location policy for distributed load balancing, Journal of Systems Architecture 42 (1) (1996) 1–18.
[16] Z. Genova, K.J. Christensen, Challenges in URL switching for implementing globally distributed web sites, in: Proceedings of the 29th International Conference on Parallel Processing (ICPP'00), Toronto, Canada, August 2000, pp. 89–94.
[17] J. Ju, G. Xu, K. Yang, An intelligent dynamic load balancer for workstation clusters, ACM SIGOPS Operating Systems Review 29 (1) (1995) 7–16.
[18] C. Wang, P. Krueger, M. Liu, Intelligent job selection for distributed scheduling, in: Proceedings of the 13th International Conference on Distributed Computing Systems (ICDCS'93), Pittsburgh, PA, USA, May 1993, pp. 517–524.
[19] B. Keiser, E. Strange, Digital Telephony and Network Integration, Van Nostrand Reinhold, New York, NY, USA, 1995.
[20] M.S. Al-Amri, R.E. Ahmed, New job selection and location policies for load-distributing, International Journal of Network Management 12 (3) (2002) 165–178.
[21] V. Cardellini, M. Colajanni, P.S. Yu, Redirection algorithms for load sharing in distributed web-server systems, in: Proceedings of the 19th IEEE International Conference on Distributed Computing Systems (ICDCS'99), Austin, TX, USA, May/June 1999, pp. 528–535.
[22] S. Majumdar, B. Sun, Call server clusters: year 1 report (project: high performance telecommunication servers SER03-104), Technical Report, Nortel, Ottawa, ON, Canada, September 2004.
[23] M. Asif, Load sharing in Call Server clusters, M.A.Sc. Thesis, Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada, August 2005.
[24] Cisco Systems Inc., Cisco CallManager System Guide (Release 3.1), Corporate Headquarters, Cisco Systems Inc., CA, USA, 2001.
[25] B. Beninger, XA-Core: Guide to Multiprocessing, Nortel, Ottawa, ON, Canada, 1998.
[26] OPNET Technologies Inc., OPNET Modeler 8.0: Online Documentation, OPNET Headquarters, Bethesda, MD, USA, 2002.
[27] Unified Modeling Language Specification, Version 2.0, October 2004, available from http://www.uml.org (accessed in July 2005).

Muhammad Asif is a Ph.D. candidate in the Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada. He received a Master's degree in Electrical Engineering from Carleton University in 2005 and a B.Sc. degree in Electronics and Communications Engineering from the University of Engineering and Technology, Lahore, Pakistan in 1998. He worked with Techlogix Inc. (www.techlogix.com) from 1999 to 2003. Asif is a co-inventor of a US (6,459,974) and a European patent (EP1262376). His research interests include application partitioning, mobile web services, web applications, load sharing, machine learning and classification algorithms.

Shikharesh Majumdar is a Professor and the Director of the Real Time and Distributed Systems group at the Department of Systems and Computer Engineering at Carleton University in Ottawa, Canada. He holds an M.Sc. and a Ph.D. degree in Computational Science from the University of Saskatchewan, Saskatoon, Canada. Before his graduate studies in Canada he did a Bachelor of Electronics and Telecom Engineering and a Post-Graduate Diploma in Computer Science (hardware) from Jadavpur University in India and a Corso Di Perfezionamento from Politecnico Di Torino in Italy. Dr. Majumdar worked at the R&D Wing of Indian Telephone Industries (Bangalore) for 6 years. His research interests are in the areas of grid-based systems, operating systems, middleware and performance evaluation. He has received two awards related to publications in these areas. Dr. Majumdar is a member of ACM and IEEE. He was a Distinguished Visitor for the IEEE Computer Society from 1998 to 2001.

Gerald Kopec is a senior system engineer at Nortel with over a decade of experience in the engineering and capacity optimization of communication Call Servers. He holds a Bachelor of Applied Science degree in Computer Engineering from the University of Waterloo.