Lyapunov stability of SIP systems and its application to overload control
Mojtaba Jahanbakhsh, Seyed Vahid Azhari∗, Hani Nemati
School of Computer Engineering, Iran University of Science & Technology, Tehran, Iran
Article history: Received 22 June 2016; Revised 12 November 2016; Accepted 24 January 2017.
Keywords: Session initiation protocol; Overload control; Lyapunov stability; Fluid model
Abstract: We propose a deterministic fluid model for the operation of two tandem SIP proxies in overload conditions. Using our model, we categorize overload scenarios into several classes, gaining better understanding of SIP overload. We then use the theory of Lyapunov for system stability and characterize stabilizing SIP overload control schemes over the state-space representation of a SIP server. A simple Lyapunov function is then utilized to build a Proportional-Integral-Derivative (PID) controller for performing overload control in a tandem system of SIP servers. The proposed PID overload control algorithm is then evaluated using experiments showing it can successfully restore throughput close to maximum system capacity during overload. Furthermore, we study the effect of overload detection lag on system stability, showing that even a small delay in reacting to overload can postpone system stabilization by multiple minutes. © 2017 Elsevier B.V. All rights reserved.
1. Introduction

As more telecommunication operators migrate their legacy TDM-switch infrastructure to next generation networks (NGN), the Session Initiation Protocol (SIP) [1] is gaining widespread use as a signaling protocol. For instance, the IP Multimedia Subsystem (IMS), proposed as the signaling plane of LTE by the 3rd Generation Partnership Project (3GPP), is entirely based on SIP [2]. The extensive use of SIP in carrier networks demands high availability, especially in overload situations, which is challenging for a protocol that was not originally designed to deliver carrier-grade performance [3,4]. The problem of overload control appears in many facets of networking, but approaches can be broadly categorized into service plane and data plane ones. In the data plane, Active Queue Management (AQM) schemes such as the family of BLUE [5] algorithms, or the classical Random Early Discard (RED) and its variants [6], have long been used to prevent congestion. These schemes apply to greedy TCP senders with bulk traffic of an elastic nature. AQM schemes anticipate and prevent congestion by providing early warning to senders over a bottleneck link. However, SIP signaling traffic is usually not elastic in nature, and overload is a temporary event, which can be long or short term. Therefore, overload is detected as opposed to being anticipated. More importantly, SIP overload control schemes have to resort to individual rejection of excess requests in cases where the
∗ Corresponding author. E-mail address: [email protected] (S.V. Azhari).
SIP server is downstream of many independent UACs. Such a condition does not occur in the AQM framework. Also, SIP servers are highly capable, custom-designed equipment that enables more efficient overload control techniques. Hence, the problem of overload control in SIP networks needs to be dealt with using specially designed algorithms. Overload in SIP is a particularly difficult problem to tackle, as load surges can happen with almost no notice regardless of over-provisioning and load balancing techniques. In fact, IETF RFCs 7068 and 5390 list several causes for SIP overload [7,8]. Fundamentally, an overload condition causes performance degradation of SIP entities due to the interplay of the following:

1. The excess SIP messages keep the server CPU busy and increase response time.
2. Message re-transmissions are triggered by the SIP reliability mechanism after a long response time is experienced (i.e., more than half a second).
3. Head-of-line blocking occurs at SIP-server queues, where messages belonging to ongoing SIP sessions are held behind those awaiting session initiation. Head-of-line blocking also happens when SIP response messages from a downstream entity are queued behind SIP INVITEs from the upstream, and it results in further re-transmissions.

Eventually, messages will build up at server queues, leading to loss and consequently call failure. More importantly, an overload condition can spread to other entities if not promptly dealt with. Controlling an overload condition can become extremely difficult when complex load relationships exist among network entities.
Such conditions have not been previously considered, to the best of our knowledge. In this paper, we present a novel framework for studying and controlling SIP overload. Our main contributions are as follows:

1. Development of a fluid-flow model for the basic building block of SIP signaling networks, that is, a tandem upstream-downstream server pair. This model also includes SIP rejection of excess requests.1
2. Applying Lyapunov stability theory to SIP and deriving stability conditions of the system.
3. Characterizing the state-space trajectory of stabilizing and non-stabilizing control regimes.
4. Proposing a simple SIP overload control (SIP-OC) method based on a suitable Lyapunov function.
5. Considering the effect of overload detection lag on system stability and SIP-OC response time.

Furthermore, our proposed SIP-OC algorithm is evaluated using an experimental test-bed, and our fluid model is verified by simulations run over a detailed SIP server model using the NS-2 simulator. In the next section, we address the body of research on SIP-OC. Section 3 describes our proposed fluid-flow model for two tandem servers. This analytical model is compared against simulations in Section 4.1. In Section 5, we use Lyapunov stability theory to find stabilizing conditions in terms of rejection probabilities for the tandem system. Many other interesting findings based on the model are also presented, such as an insightful discussion of stabilizing state-space trajectories and the effect of overload detection lag on SIP-OC response time. Section 6 proposes a SIP-OC algorithm using the Lyapunov method, which is then evaluated by actual experiments. We finally conclude the paper in Section 7 and discuss future research directions.

2. Research background

A SIP overload control (SIP-OC) algorithm consists of overload detection and control mechanisms [8,9]. Overload detection is the process of determining whether some SIP/non-SIP entity is overloaded. This is accomplished either locally at the overloaded entity by monitoring certain trigger functions, or remotely by an upstream server using some form of explicit feedback, or perhaps even implicitly by monitoring certain system parameters such as transaction completion time. A SIP-OC mechanism consists of the action taken whenever overload is detected to restore normal server operating conditions, usually by adjusting the rate of calls forwarded to the overloaded entity. This is generally performed using conventional flow control schemes such as rate-based, window-based, loss-based and call gapping techniques [8,10–14]. SIP-OC mechanisms can be categorized into implicit and explicit methods based on the type of feedback they use for detecting an overload. Implicit methods use no feedback from the downstream overloaded server and generally make use of information, such as the transaction completion delay, that is entirely calculated at the upstream entity to detect overload [13,15,16]. Explicit methods, on the other hand, make use of information that is sent back by the overloaded server to detect overload and also to determine the correct amount of load to be imposed on the overloaded server. This information may include CPU occupancy [17–20], delay [21], queue length [17,22–28], and SIP transaction state [29], to name a few.

1 Note that, without loss of generality, a redirect-when-overload policy can also be adopted within the same modeling framework as opposed to a reject-when-overload policy.
In addition, SIP-OC methods differ by the level of cooperation among various entities. Local SIP-OC methods simply get rid of the excess load locally at the affected SIP server. This is generally done by issuing a 503-REJECT response to excess INVITE requests [17,25]. Tackling overload at the affected server, unfortunately, wastes valuable server resources. Hence, if the amount of overload exceeds a certain limit, then even full rejection will not restore server stability. As a result, local SIP-OC is only used at the edge of a SIP network, where SIP User Agents (UA) connect and there is no option but to offload excess demand one UA at a time. Alternatively, if the immediate upstream entity of an overloaded SIP server is responsible for throttling requests, then we have hop-by-hop (HbH) SIP-OC [15,20,26–30]. Furthermore, if throttling is handed back to the first upstream server on the signaling path, then end-to-end (E2E) SIP-OC is realized [13,18,19,21]. E2E approaches have the advantage of dealing with excess load at the point of origin, where network resources have not yet been wasted. HbH methods, on the other hand, reject requests that have already used processing resources of further upstream servers. Nevertheless, HbH approaches are sometimes the only possibility, for example, when overload occurs in another administrative domain. In such cases, the overloaded server may not be able to trace back the signaling flow to its originating entity, preventing the use of E2E SIP-OC schemes.

Once overload is detected, a throttling mechanism should be invoked. Various throttling mechanisms are available, as also described in RFC 6357 [10], each imposing load limitation through a certain control variable. Rate-based methods directly control the request rate to an overloaded downstream server [20,21,26–29,31]. Window-based mechanisms control the actual amount of workload that is delivered to the overloaded server through a sliding window [13,15,16]. Loss-based approaches, on the other hand, adjust the fraction of requests that should be rejected and do not directly focus on the actual amount of load imposed on the overloaded server [17–19,25]. Under a loss-based regime, a SIP server asks an upstream neighbor to reduce the number of requests it would normally forward to this server by a certain percentage [11].

Finally, there are a number of alternative measures that improve SIP performance during overload. These include limiting request queue size, priority queueing [11,32,33] and controlling the re-transmission rate to the overloaded server [30,34]. While these approaches improve system performance during an overload condition, they are unable to completely mitigate its ramifications. However, in two recent independent works, a very simple transport layer scheme is used to completely mitigate SIP overload [35,36]. The proposed methods deliberately drop TCP SYN packets for excess INVITE requests during an overload period, stopping excess requests at the connection establishment stage. Their methods are, however, only applicable to edge SIP proxies which directly serve UACs. We have summarized some of the body of research on SIP-OC in Table 1. The algorithm column briefly describes the throttling mechanism used by each approach.

2.1. IETF developments

Early after the conception of SIP and the ratification of RFC 2543 [37] in 1999, it became clear that SIP was not equipped with proper measures to counter overload.
In 2002, RFC 2543 was obsoleted by the current base standard, RFC 3261 [1], which introduced the 503 response code to signal overload as well as many other failure conditions. However, this also posed new instability problems, and it was soon realized that SIP would suffer greatly in overload conditions. This was an obstacle to SIP becoming a carrier-grade option as a signaling protocol. Consequently, in 2008 the IETF proposed RFC 5390 [8], which acknowledged the shortcomings of the 503 response code and pro-
Table 1
Comparison of related work on SIP overload control. For each approach the table lists the detection mechanism (e.g., CPU occupancy, memory, queue length, queueing delay, round-trip time, upstream re-transmission rate, or implicit transaction delay), the control variable (loss based, rate based, window based, call gapping, or TCP connection drop), the level of cooperation (local, hop-by-hop, end-to-end), the control algorithm (e.g., PI or proportional controllers, AIMD/MIMD adaptation, heuristic or hysteretic thresholds, NLMS prediction, token bucket, M/G/1/R and Markovian models, priority queueing and scheduling), and additional notes. Covered references: [13], [15–21], [25–32], [35,36], [38–47].
    a MIMD: Multiplicative Increase Multiplicative Decrease.
    b AIMD: Additive Increase Multiplicative Decrease.
    c NLMS: Normalized Least Mean Square.
vided 23 requirements for a SIP overload control solution. This was followed by the establishment of the SIP overload control (SOC) working group at the IETF, which in 2011 issued its first RFC, RFC 6357 [10]. RFC 6357 puts forward a framework for SIP-OC by presenting different design considerations and models. In particular, it outlines various types of SIP-OC algorithms based on controlling the rate, loss probability or window size used by the upstream entity, as well as on/off and signal-based methods. It also emphasizes the notion of fairness during overload, and classifies different SIP-OC schemes into local, hop-by-hop, and end-to-end, depending on the extent of cooperation between SIP servers. The last two standards proposed by the SOC working group, RFCs 7339 [11] and 7415 [14], present loss-based and rate-based SIP-OC schemes, respectively. Following this, the SOC working group was concluded in 2015.
2.2. Modeling approaches

Recently, there has been some work on modeling the performance of SIP servers. A queuing system model with finite buffer has been developed in [48,49], along with an analytical performance model for a hysteresis-based, loss-based overload control scheme, where a server is assumed to be in any of three states: normal, overload and blocking. The rate-based version of SIP-OC was later analyzed in [39], where a token bucket rate limiter controlled by a hysteretic algorithm considering buffer length on the downstream side was proposed. However, it was assumed that the upstream SIP server has infinite capacity. In addition, Azhari et al. [50] propose a fluid-flow model of two tandem SIP proxies and then use the Lyapunov method to derive its stability conditions during overload.2 In [26–28] a SIP server is modeled as a Markov M/G/1/R queue with batch arrivals, to which a bi-level hysteretic queue-length control scheme is applied. However, SIP call signaling is approximated as consisting of a single message. In our work, on the other hand, we consider both INVITEs and OKs as well as message re-transmissions in a priority queuing system.
2 The cited work includes some of our preliminary results which are also repeated in this paper for completeness.
In [40,41], a Markov model is provided for a SIP tandem system considering two INVITE and non-INVITE queues, where non-INVITEs are given strict priority during overload. An overload condition is detected whenever the non-INVITE queue length exceeds a certain threshold. Two service disciplines, namely gated and exhaustive priority service, are considered, and it is found that the former provides higher server utilization while minimizing server overload probability. The paper, however, does not provide an analysis of the effect of the queue size threshold for detecting an overload condition.

Moreover, [51] considers the IMS network, which is a SIP based signaling network. Two classes of signaling traffic are considered, namely high priority non-INVITE messages and low priority INVITE messages. A Markov chain is then used to model the IMS network as a multi-server cut-off priority system, and a Generalized Stochastic Petri Net is applied to model its performance under normal load as well as overload. The method, however, ignores the fact that non-INVITE messages are strongly related to previous INVITE messages. In addition, overload mitigation is achieved based on server over-provisioning.

Hong et al. [52] created a Markov-Modulated Poisson Process (MMPP) model to analyze the queuing mechanism of a SIP server under two typical service states. They also developed a fluid model to capture the dynamic behavior of the SIP re-transmission mechanism for a single server with infinite buffer. Another novel strategy was also introduced to classify different types of re-transmission messages, along with a fluid model for a system of tandem overloaded SIP servers with finite buffer. Reference [53] proposes a fluid model for SIP networks to dimension buffer size for ensuring reasonable packet drop and system stability. However, they consider a linear system, which is an approximation, and use transfer functions to show stability; whereas we consider a non-linear system and propose a Lyapunov approach to stability, which is applicable to any non-linear system, as is the case with overloaded SIP servers. Moreover, our algorithms are entirely different; Hong's algorithm is based on re-transmissions, whereas ours is based on queue length and queue length variation. More importantly, we use complementary strategies to SIP-OC. Hong et al. mitigate short-term overloads by limiting re-transmissions, whereas our approach is to immediately throttle actual load to remove both short and long-term overloads. Their approaches are implicit, requiring no feedback from downstream, which is very suitable in situations where there is no cooperation between adjacent SIP servers, for example, in cross-carrier scenarios. However, we use explicit feedback, which is more precise. We also differentiate between response processing overhead and timer expiration, which are considered to have the same overhead in Hong's work. Lastly, our model also considers rejection and its overhead, unlike Hong's.

Although much modeling work has been done for SIP networks, we are not aware of one which uses Lyapunov stability theory as it is used in this paper to propose a SIP overload control algorithm. However, Lyapunov theory has been used in other contexts, such as stabilizing an overloaded network of queues [54], calculating a throughput-stabilizing routing in wireless sensor networks [55], and proposing a stabilizing flow control algorithm for congested networks [56]. Nevertheless, these methods are not directly applicable to a SIP signaling network.
2.3. Open challenges and issues

Although the body of work on SIP-OC is considerably large, there are several issues that either have not been considered yet or lack solid work. We have enumerated some of these issues below; in our work, we focus on the first three.

1. Few solid works on system stability, given that a SIP system is inherently non-linear.
2. No thorough analysis of the effect of overload detection lag on system convergence time and stability.
3. Very few works studying the effect of re-transmissions.
4. Lack of SIP-OC consideration over complex topologies that arise, for example, in the IP Multimedia Subsystem, which is a network of SIP elements used as the signaling core of next generation networks.
5. Lack of an appropriate autonomous scheme for finding the source of overload in a general SIP network containing many different signaling paths and entities.
6. The stability properties of various SIP-OC mechanisms, including loss/rate/window-based schemes, need investigation.
7. The effect of virtualization technology on SIP-OC mechanisms needs to be investigated.

There are a few papers, such as [33], that consider virtualization of SIP servers, but their scope is limited to migration, and a complete treatment of overload for reasons other than migration is not provided. In another work, throughput maximization and admission control is studied over a load-balanced farm of SIP servers [57]. They show that their problem is NP-hard and propose a heuristic resource allocation scheme to improve total system throughput. However, no overload control scheme is considered for such a system. More importantly, none of the existing schemes consider variation in server capacity due to resource scaling and sharing as a result of virtualization.

3. A fluid-flow model for two tandem SIP proxies

In this section, we model the state-space of the system composed of two tandem SIP proxies using deterministic fluid-flow techniques. The tandem system shown in Fig. 1 is the basic building block of more complex SIP networks and is therefore important to study. Our model includes the UACs initiating sessions, upstream and downstream proxies in stateful mode with no authentication, and the UASs which terminate sessions. In a more general topology, the UACs and UASs can be replaced with other upstream and downstream proxies, respectively. In our system, proxies give higher priority to non-INVITE messages such as RINGING and OK. It has been shown that this improves throughput during overload because responses received for forwarded INVITEs are not put in the congested INVITE queue of the overloaded proxy [25,33,58]. We also assume that queues are infinite, that network latency, packet loss and the processing times of the UAC and UAS are negligible, and that the SIP signaling only consists of INVITE and OK messages. The processing overhead of TRYING, RINGING, and ACK is appropriately factored into the processing time of INVITE and OK messages to produce the correct load on proxies.3 The group of UACs is collectively modeled as a deterministic fluid source of rate λ0 primary INVITE requests per second, as shown in Fig. 1. Re-transmissions are generated using appropriate delay elements introduced to the primary INVITE requests. A UAS simply passes through all flow (INVITEs) that it receives, ruas, and feeds it back into the downstream proxy to model the OK response with flow rate r2,ok. The flow of OK responses is then fed back into the upstream proxy with rate r1,ok and is consumed by UACs with rate ruac,ok, which marks the end of the call flow cycle in the tandem system. Note that ruac,ok is the rate of OK messages
3 We have neglected BYE transactions. We claim that since every successful INVITE is accompanied by a BYE then the processing overhead of BYE transactions can also be factored into INVITE and OK messages. A similar approach has been previously taken in [34]. We compare our model with ns-2 simulations which include the BYE transaction.
Fig. 1. Diagram showing the system model (flows of primary INVITEs and INVITE re-transmissions through delay elements of 0.5, 1, 2, 4, 8 and 16 s, and flows of OK and rejection messages).
sent back to the primary INVITEs issued by the UACs and is thus the system throughput.4 Also note that under the assumptions of no latency, zero packet loss, infinite queue size and prioritized service of non-INVITE messages we have that

$$r_{uas}(t) = r_{1,ok}(t) = r_{2,ok}(t) = r_{uac,ok}(t). \tag{1}$$

In addition, these assumptions imply that no message re-transmissions will be triggered by the downstream proxy or UAS. However, the UAC and the upstream proxy may initiate re-transmissions for the primary INVITE requests. It should be noted that, in a realistic scenario with reasonably large queue sizes, small network latency and negligible packet loss, re-transmissions from the downstream proxy or UAS will also be rare, provided that non-INVITE messages are prioritized. Moreover, each proxy rejects incoming INVITE requests with some tunable probability, resulting in rejection rates of r1,rej and r2,rej at the upstream and downstream proxies, respectively. The rejection mechanism is equivalent to sending a stateful 503-REJECT message in response to an INVITE request. We also assume that an upstream agent receiving a REJECT message will not retry. In the following three subsections, we elaborate on the system model derivation by considering the proxy queue dynamics, the upstream proxy re-transmission mechanism and the UAC re-transmission mechanism.

3.1. Proxy queue dynamics

A significant part of the dynamics of the tandem system is captured by the INVITE and non-INVITE queues of the upstream and downstream proxies. The lengths of these queues comprise our state variables and are denoted by xi and xi,ok, for the INVITE and non-INVITE (i.e., OK) queues, respectively. Here i = 1 for the upstream proxy and i = 2 for the one downstream. It is immediately observed that xi,ok = 0 due to treating non-INVITE queues with highest priority. This implies that OK messages will not experience timeout nor will they be re-transmitted. The state variable xi holds the weighted amount of INVITEs in the input queue of proxy i. A primary INVITE has unit weight and a re-transmission weighs wretx/winv < 1. Table 2 lists the weights of different messages, which represent the relative amount of processing power required by the CPU of a proxy to process each message. Table 3 also lists the various variables and parameters used in our fluid-flow model.

4 This assumption holds as long as session setup delay is below the thresholds set by SIP timers at the UAC, e.g., 32 s.

Table 2
SIP message processing weights normalized to a complete call sequence.
    winv = 0.53    wretx = 0.068    wok = 0.47    wrej = 0.20
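For concreteness, the short sketch below (our own illustration, not part of the authors' toolchain; the function name is ours) shows how the weighted INVITE backlog xj just described would be computed from raw queue contents using the Table 2 weights: a primary INVITE counts as one unit, and a re-transmitted INVITE counts as wretx/winv < 1 of a unit.

```python
# Illustrative only: weighted INVITE backlog x_j of a proxy queue.  A primary
# INVITE is counted as 1 unit and a re-transmitted INVITE as w_retx / w_inv
# of a unit (Table 2 weights).
W_INV, W_RETX = 0.53, 0.068

def weighted_backlog(n_primary: int, n_retx: int) -> float:
    """Weighted INVITE queue length for n_primary primary INVITEs and
    n_retx re-transmitted INVITEs waiting in the queue."""
    return n_primary + (W_RETX / W_INV) * n_retx

print(weighted_backlog(100, 50))   # 100 primaries + 50 re-transmissions -> ~106.4
```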
The state equations for x1 and x2 can be written as
$$\dot{x}_1(t) = \begin{cases} r_{1,inv}(t) - \dfrac{c_{1,res}(t)}{\bar{w}_{1,inv}(t)}, & x_1(t) > 0 \\ \left(r_{1,inv}(t) - \dfrac{c_{1,res}(t)}{\bar{w}_{1,inv}(t)}\right)^{+}, & x_1(t) = 0 \end{cases} \tag{2}$$

$$\dot{x}_2(t) = \begin{cases} r_{2,inv}(t) - \dfrac{c_{2,res}(t)}{\bar{w}_{2,inv}(t)}, & x_2(t) > 0 \\ \left(r_{2,inv}(t) - \dfrac{c_{2,res}(t)}{\bar{w}_{2,inv}(t)}\right)^{+}, & x_2(t) = 0 \end{cases} \tag{3}$$
where x˙ is the time derivative of x. Primary and re-transmitted SIP INVITE requests flow into the input queue of the jth proxy with flow rate of rj, inv (t). For the upstream proxy this is the sum of primary INVITE rate, λ0 , and the jth-level re-transmission rate, λj , issued by UACs, and is expressed as
$$r_{1,inv}(t) = \lambda_0(t - d_{uac}) + \sum_{j=1}^{6} \frac{w_{retx}}{w_{inv}}\,\lambda_j(t - d_{uac}) \tag{4}$$
where duac is the network delay between the UAC and the upstream proxy. Similarly, for the downstream proxy, r2,inv(t) is the sum of the rate at which primary INVITEs are forwarded by the upstream proxy, $r^{(0)}_{2,inv}$, and all the jth-level INVITE re-transmission rates, $r^{(j)}_{2,inv}$, triggered by the upstream proxy, and is expressed as
$$r_{2,inv}(t) = r^{(0)}_{2,inv}(t - d_{proxy}) + \sum_{j=1}^{6} \frac{w_{retx}}{w_{inv}}\,r^{(j)}_{2,inv}(t - d_{proxy}) \tag{5}$$
where dproxy is the network delay between the upstream and downstream proxies. INVITE requests in the jth proxy queue, xj, are forwarded if they are primary INVITEs, consuming a fraction winv of the CPU resources required by a call. Re-transmitted INVITE requests are simply ignored, consuming a fraction wretx of a call. Finally, primary INVITEs may also be rejected with probability pj,rej, imposing a processing overhead of wrej compared to a call. It follows that the average amount of time taken by proxy j to process an INVITE
Table 3
System parameters.
    cj          Capacity of the jth proxy in calls per second (cps)
    cj,res      Residual capacity of the jth proxy left for processing INVITEs (cps)
    winv        Weight of processing an INVITE relative to a complete call
    wretx       Weight of processing a re-transmitted INVITE relative to a complete call
    wrej        Processing weight of rejecting an INVITE relative to a complete call
    wok         Weight of processing an OK relative to a complete call sequence
    w̄j,inv      Average weight of processing an INVITE relative to a complete call at the jth proxy
    xj          Weighted length of the INVITE queue of the jth proxy
    xjp         Number of primary INVITEs in the jth proxy's queue
    xj,ok       Number of non-INVITE (OK) messages queued up in the jth proxy
    yi          Number of ith-level INVITE re-transmission timers expired and yet to be served
    λ0          SIP call request rate from the UAC
    λi          ith-level INVITE re-transmission rate from the UAC
    λ'i         Potential ith-level INVITE re-transmission rate from the UAC
    rj,inv      Total weighted INVITE rate sent to the jth proxy
    r(i)2,inv   ith-level INVITE (re-transmission) rate from the upstream proxy
    ryi         r(i)2,inv, i = 1···6; introduced for notational convenience
    ry'i        ith-level INVITE re-transmission timer scheduling rate at the upstream proxy
    ry''i       ith-level INVITE re-transmission timer expiration rate at the upstream proxy
    rj,ok       Total rate of OK sent to the jth proxy
    pj,rej      INVITE rejection probability at the jth proxy
    rj,rej      INVITE rejection rate at the jth proxy
    ruas        INVITE forwarding rate by the downstream proxy to the UAS
    ruac,ok     OK forwarding rate by the upstream proxy to the UAC (call completion rate)
    duac        Network latency between the UAC and the upstream proxy
    dproxy      Network latency between the upstream and downstream proxies
    duas        Network latency between the downstream proxy and the UAS
relative to one complete call sequence is

$$\bar{w}_{j,inv}(t) = \frac{x_{jp}(t)}{x_j(t)}\left(w_{inv}\,(1 - p_{j,rej}(t)) + w_{rej}\,p_{j,rej}(t)\right) + w_{inv}\left(1 - \frac{x_{jp}(t)}{x_j(t)}\right) \tag{6}$$

where xjp(t) is an auxiliary state variable holding the amount of primary INVITEs residing in the queue of the jth proxy, and $\frac{x_{jp}}{x_j}$, which will be denoted by αj, is the fraction of primary INVITEs in that queue. It follows that αj is approximately the fraction of time that the jth proxy is busy processing primary INVITEs. The state equations for xjp are

$$\dot{x}_{1p}(t) = \begin{cases} \lambda_0(t) - r^{(0)}_{2,inv}(t) - r_{1,rej}(t), & x_{1p}(t) > 0 \\ \left(\lambda_0(t) - r^{(0)}_{2,inv}(t) - r_{1,rej}(t)\right)^{+}, & x_{1p}(t) = 0 \end{cases} \tag{7}$$

$$\dot{x}_{2p}(t) = \begin{cases} r^{(0)}_{2,inv}(t) - r_{uas}(t) - r_{2,rej}(t), & x_{2p}(t) > 0 \\ \left(r^{(0)}_{2,inv}(t) - r_{uas}(t) - r_{2,rej}(t)\right)^{+}, & x_{2p}(t) = 0. \end{cases} \tag{8}$$

Introducing xjp simplifies our model and removes the need for storing the entire contents of the queue. The drawback of this approach is that the temporal relationship between INVITEs and their re-transmissions is blurred. For instance, when the proxy receives a large amount of re-transmitted INVITEs, our model predicts that the processing time of messages in the INVITE queue decreases immediately, whereas in reality it should decrease when earlier primary INVITE messages in the queue are served. This approximation, however, only affects the transient dynamics of the system and does not change its long term behavior. So temporary deviations from simulation results are expected, but the system model is expected to converge to the simulated system. Therefore, it is a good candidate model to study stability.

We next calculate the depletion rate of the proxy INVITE queues, as the proxy capacity in calls per second (cps) divided by the average INVITE processing overhead, w̄j,inv. However, since non-INVITEs and timer expirations are treated with higher priority, we should consider the residual capacity of the proxies (cj,res), which is left for processing INVITE requests, derived as

$$c_{1,res}(t) = \left(c_1(t) - w_{ok}\,r_{1,ok}(t - (2d_{uas} + d_{proxy})) - r_{1,timer}(t)\right)^{+} \tag{9}$$

$$c_{2,res}(t) = \left(c_2(t) - w_{ok}\,r_{2,ok}(t - 2d_{uas})\right)^{+} \tag{10}$$

where duas is the network delay between the downstream proxy and the UAS. Also, r1,timer is the rate of INVITE re-transmission timer expiration at the upstream proxy (see Eq. (21)), and wok is the processing weight of an OK response compared to a call. Clearly, the emptying rate of the INVITE queue of proxy j is $\frac{c_{j,res}}{\bar{w}_{j,inv}}$, justifying (2) and (3) regarding INVITE queue dynamics. Moreover, the rate at which the upstream proxy forwards INVITEs to the downstream proxy is

$$r^{(0)}_{2,inv}(t) = \begin{cases} \alpha_1(t)\,\dfrac{c_{1,res}(t)}{\bar{w}_{1,inv}(t)}\,(1 - p_{1,rej}(t)), & x_1(t) > 0 \\ \min\!\left(\dfrac{\lambda_0(t - d_{uac})}{r_{1,inv}(t)}\cdot\dfrac{c_{1,res}(t)}{\bar{w}_{1,inv}(t)},\; \lambda_0(t - d_{uac})\right)(1 - p_{1,rej}(t)), & x_1(t) = 0. \end{cases} \tag{11}$$

When the upstream queue is non-empty (i.e., x1(t) > 0), this is related to the fraction of primary INVITEs that exist in the queue and are not rejected, or to the rate at which primary INVITEs enter the queue when it is empty (i.e., x1(t) = 0). Furthermore, the rate at which primary INVITEs are rejected by the upstream proxy is

$$r_{1,rej}(t) = \begin{cases} \alpha_1(t)\,\dfrac{c_{1,res}(t)}{\bar{w}_{1,inv}(t)}\,p_{1,rej}(t), & x_1(t) > 0 \\ \min\!\left(\dfrac{\lambda_0(t - d_{uac})}{r_{1,inv}(t)}\cdot\dfrac{c_{1,res}(t)}{\bar{w}_{1,inv}(t)},\; \lambda_0(t - d_{uac})\right)p_{1,rej}(t), & x_1(t) = 0. \end{cases} \tag{12}$$
Similarly, the INVITE forwarding rate from the downstream proxy to the UAS is

$$r_{uas}(t) = \begin{cases} \alpha_2(t)\,\dfrac{c_{2,res}(t)}{\bar{w}_{2,inv}(t)}\,(1 - p_{2,rej}(t)), & x_2(t) > 0 \\ \min\!\left(\dfrac{r^{(0)}_{2,inv}(t - d_{proxy})}{r_{2,inv}(t)}\cdot\dfrac{c_{2,res}(t)}{\bar{w}_{2,inv}(t)},\; r^{(0)}_{2,inv}(t - d_{proxy})\right)(1 - p_{2,rej}(t)), & x_2(t) = 0. \end{cases} \tag{13}$$

Interestingly, referring to (1), ruas is then the system throughput as long as the session setup delay is within the bounds set by the UAC (e.g., 32 seconds). On the other hand, the downstream rate of rejecting primary INVITEs is

$$r_{2,rej}(t) = \begin{cases} \alpha_2(t)\,\dfrac{c_{2,res}(t)}{\bar{w}_{2,inv}(t)}\,p_{2,rej}(t), & x_2(t) > 0 \\ \min\!\left(\dfrac{r^{(0)}_{2,inv}(t - d_{proxy})}{r_{2,inv}(t)}\cdot\dfrac{c_{2,res}(t)}{\bar{w}_{2,inv}(t)},\; r^{(0)}_{2,inv}(t - d_{proxy})\right)p_{2,rej}(t), & x_2(t) = 0. \end{cases} \tag{14}$$

3.2. The INVITE re-transmission mechanism

This section describes the dynamics of INVITE re-transmissions generated by the UACs and the upstream proxy. Note that re-transmission may only occur for INVITE messages, since others are treated with high priority. In addition, the downstream proxy and the UAS will not issue any re-transmission, as their immediate next hop is not overloaded. UAC re-transmissions are modeled by feeding the flow of primary INVITE requests of rate λ0 through a cascade of delay elements corresponding to the different timeout intervals, namely, 0.5, 1, 2, 4, 8 and 16 s, as illustrated in Fig. 1 and expressed in the following mathematical form

$$\lambda'_k(t) = \lambda_{k-1}(t - 2^{k-1}T_1), \quad \forall k = 1..6. \tag{15}$$

$$\lambda_k(t) = \lambda'_k(t)\,\mathbf{1}\!\left\{x_1(t - 2^{k-1}T_1) > 2^{k-1}T_1\,c_1(t) - \textstyle\sum_{i=t-2^{k-1}T_1}^{t} r_{1,timer}(i)\right\}. \tag{16}$$

Here, T1 = 0.5 s is the initial SIP re-transmission timer and the subscript k = 1···6 indicates a certain level of re-transmission timers being fired, in particular, after 0.5, 1, 2, 4, 8 and 16 s. These potential re-transmissions of rate λ'k are then gated through a suitable trigger function, expressed by the indicator function 1{·} in (16), which eventually decides whether a particular re-transmission should be issued.

Similarly, the re-transmissions issued by the upstream proxy (taking ry0 ≡ r(0)2,inv) are fed into a cascade of delay elements, resulting in potential re-transmissions of rate

$$ry'_k(t) = ry_{k-1}(t - 2^{k-1}T_1), \tag{17}$$

which are then gated through an indicator function, resulting in actual INVITE re-transmission timeouts of rate

$$ry''_k(t) = ry'_k(t)\,\mathbf{1}\!\left\{x_2(t - 2^{k-1}T_1) > 2^{k-1}T_1\,c_2(t)\right\}. \tag{18}$$

However, unlike independent UACs, which immediately issue a re-transmission request, the upstream proxy should enqueue timeout events to be sent sequentially. The state transition equation for the length of this timeout queue yk for the kth level of timeouts is

$$\dot{y}_k(t) = \left(ry''_k(t) - ry_k(t)\right)^{+} \quad \forall k = 1..6. \tag{19}$$

It follows that the rate of re-transmission from the upstream proxy to the one downstream will be

$$ry_k(t) = \begin{cases} \left(c_1(t) - w_{ok}\,r_{1,ok}(t)\right)\dfrac{y_k(t)}{\sum_{j=1}^{6} y_j(t)}, & \sum_{j=1}^{6} y_j(t) > 0 \\ \min\!\left(ry''_k(t),\; \left(c_1(t) - w_{ok}\,r_{1,ok}(t)\right)\dfrac{ry''_k(t)}{\sum_{j=1}^{6} ry''_j(t)}\right), & \sum_{j=1}^{6} y_j(t) = 0 \\ 0, & \sum_{j=1}^{6} ry''_j(t) = 0 \end{cases} \tag{20}$$

and the total rate of timer expirations at the upstream proxy is

$$r_{1,timer}(t) = \sum_{j=1}^{6} ry_j(t). \tag{21}$$
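Before moving to the overload classification, the following sketch (ours, not the authors' MATLAB implementation) illustrates how a heavily reduced version of the state equations (2)-(3) can be stepped numerically with forward Euler. To stay short it drops network delays, all re-transmission dynamics, the OK-processing overhead and the αj bookkeeping (it assumes every queued INVITE is primary), so it only conveys the mechanics of integrating the fluid model, not its full behavior.

```python
# Minimal, simplified sketch of integrating the tandem fluid model (2)-(3).
# Assumptions (ours): no network delays, no re-transmissions, no OK-processing
# overhead, alpha_j = 1 (all queued INVITEs are primary).  The paper's full
# model includes all of these effects and is solved in MATLAB.
W_INV, W_REJ = 0.53, 0.20     # Table 2 weights
DT = 0.01                     # Euler step (s)

def step(x1, x2, lam0, c1, c2, p1_rej, p2_rej=0.0):
    """One Euler step of the reduced upstream/downstream queue dynamics."""
    # average INVITE processing weight, a degenerate form of Eq. (6)
    w1 = W_INV * (1 - p1_rej) + W_REJ * p1_rej
    w2 = W_INV * (1 - p2_rej) + W_REJ * p2_rej
    out1 = c1 / w1 if x1 > 0 else min(lam0, c1 / w1)        # cf. Eq. (2)
    fwd = out1 * (1 - p1_rej)                               # cf. Eq. (11)
    out2 = c2 / w2 if x2 > 0 else min(fwd, c2 / w2)         # cf. Eq. (3)
    x1 = max(0.0, x1 + DT * (lam0 - out1))
    x2 = max(0.0, x2 + DT * (fwd - out2))
    return x1, x2

# Example: lam0 = 400 cps offered against c1 = 320 cps and c2 = 200 cps.
for p in (0.0, 0.5):
    x1 = x2 = 0.0
    for _ in range(int(60 / DT)):                           # one minute of load
        x1, x2 = step(x1, x2, lam0=400, c1=320, c2=200, p1_rej=p)
    print(p, round(x1), round(x2))
# With p1_rej = 0 the downstream backlog grows without bound in this reduced
# model; with p1_rej = 0.5 both backlogs stay near zero.
```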
4. Classification of overload dynamics and model verification
We study the different cases of overload that may happen in a tandem SIP system, summarized in Table 4, and for which example state-space trajectories are shown in Fig. 2. The first and second cases correspond to when only the downstream or only the upstream server is overloaded, respectively. The downstream server becomes exclusively overloaded when the upstream server has much larger capacity than the downstream, as outlined in Table 4. As a result, the upstream re-transmissions triggered due to downstream overload take up less than the available capacity at the upstream entity. In case-2 overload, only the upstream proxy is overloaded. This scenario happens when the capacity of the upstream proxy is smaller than that of the downstream, so that it can never overload it. The third case of overload corresponds to when the downstream proxy is overloaded first, which then spreads to the upstream. Overload spreads because of the re-transmissions that are triggered upstream. The spread of overload reduces the upstream effective capacity and the forwarding rate to the downstream server, which eventually relieves the downstream from overload. The upstream server, however, remains in overload due to the excessive request backlog. This behavior can be clearly observed in Fig. 2 by the curve labeled "Case-3". Such overload occurs whenever the upstream server has a slightly larger capacity than the downstream, as noted in Table 4. If the amount of overload is large enough, then a case-3 overload can end up overloading both the downstream and upstream servers. This happens when the upstream server has enough capacity to flood the downstream even after losing some of its capacity to handling re-transmissions. This is what happens in overload cases four and five, as illustrated in Fig. 2 with parameters according to Table 4. Case five differs from case four in the extent of overload imposed on the downstream server.
4.1. Comparing with simulations
We compare our fluid-based model against simulations performed in ns-2. The model is realized through its state-space equations, which are solved numerically using MATLAB. A fully detailed simulation model of the tandem SIP system is also developed in ns-2, which completely models the SIP state machine for two transaction-stateful proxies connected in tandem according to RFC 3261 [1]. In our comparisons we are interested in the asymptotic behavior of our model during overload. As a result, our model is considered appropriate as long as it asymptotically estimates an upper bound on queue sizes for both proxies. In fact, simulations were run long enough to observe that this trend is maintained. Without loss of generality, we only considered case-3 overload for comparing queue lengths in the upstream and downstream servers, shown in Figs. 3 and 4. It can be observed that the error between
Fig. 2. Types of overload in the tandem system (state-space trajectories of x2 versus x1, in messages, for Cases 1-5).
Fig. 3. Case 3 Overload: INVITE Queue Size of Upstream Proxy (x1), model versus simulation (messages over time in seconds).
Fig. 4. Case 3 Overload: INVITE Queue Size of Downstream Proxy (x2), model versus simulation (messages over time in seconds).
Table 4
Different cases of overload.
    Case 1: Downstream only. Example: c1 = 400, c2 = 200, λ0 = 210
    Case 2: Upstream only. Example: c1 = 190, c2 = 200, λ0 = 300
    Case 3: Temporary downstream overload takes the upstream into permanent overload. Example: c1 = 280, c2 = 200, λ0 = 210
    Case 4: Both downstream and upstream become overloaded. Example: c1 = 320, c2 = 200, λ0 = 250
    Case 5: Both downstream and upstream become overloaded, under a heavier load than case 4. Example: c1 = 320, c2 = 200, λ0 = 400
queue lengths from the model and simulations starts to increase toward the end of the simulation time. Nevertheless, this shows that our model is suitable for overload conditions, as it provides an upper bound on the simulation results. This means that if the real system goes into overload, then the model will also behave as overloaded, because it has a larger backlog. More importantly, any stabilizing regime dictated by the model, such as our proposed Lyapunov method, will also stabilize the actual system, which has a smaller backlog.

5. Lyapunov stability of the system

When a proxy becomes overloaded, its INVITE queue length, x, will increase to the point that all call setup requests will incur large delay or even time out. The tandem system shown in Fig. 1 is overloaded when at least one of the proxies becomes overloaded. Hence, overload can be viewed as the system described by the variables (x1, x2) becoming unstable. To bring the system out of overload, queue lengths should be reduced below a certain value. Without loss of generality, we take this value to be zero for both x1 and x2. In fact, for our deterministic fluid model, a system working under normal load will always have x1 and x2 equal to zero. Hence, we take (x1, x2) = (0, 0) to be the equilibrium of the set of state equations given by (2) and (3). Now we can define some Lyapunov function L(x1, x2) to represent the stability of the tandem proxy system. This function should be strictly positive everywhere, except possibly at the origin. Then, according to the Lyapunov stability theorem, our system is stable if and only if the time derivative of this function, L̇(x1, x2), is strictly negative everywhere except at the origin. We have chosen the following Lyapunov function,
$$L(x_1, x_2) = x_1(t) + x_2(t) \tag{22}$$
It follows that for our system to be stable we should have,
$$\dot{L}(x_1, x_2) = \dot{x}_1(t) + \dot{x}_2(t) < 0, \quad \forall (x_1, x_2) \neq (0, 0) \tag{23}$$
Our objective then is to find rejection probabilities at both proxies such that (23) is satisfied. It is also possible to control the speed of convergence to stability by forcing L˙ (x1 , x2 ) to be more negative. In general, we can define the following stabilizing (overload control) condition,
$$\dot{L}(x_1, x_2) = \dot{x}_1(t) + \dot{x}_2(t) < -z, \quad \forall (x_1, x_2) \neq (0, 0) \tag{24}$$
where z > 0 is the rate at which the sum of the proxy queues is emptied. It is worth mentioning that (22) is not the only choice for a Lyapunov function. In fact, one could use any positive definite function over any set of state variables, such that the state becoming indefinitely large implies overload and vice versa. The general rule is that if L(.) goes to infinity then the system must have been overloaded, and if the system becomes overloaded then L(.) should go to infinity. A simple function such as (22) is sufficient for this purpose.
For the set of state variables, however, there are many options available. These include the number of primary INVITEs, xjp, or the length of the timer expiration queue, yk. Nevertheless, selecting the total number of INVITEs, as done in (22), has two significant benefits over other alternative state variable combinations. First, it is a practically obtainable quantity in a SIP server, as opposed to non-retransmitted INVITEs or the timer expiration backlog. Second, it better reflects the severity of an overload because it also takes into account INVITE re-transmissions, which are indicative of the overload depth. So now, combining (24), (2) and (3), it is possible to derive rejection probabilities that ensure stability. The formula derived for p1,rej and p2,rej has a closed form, but we do not provide it here as it is complex. Nevertheless, the resulting rejection probabilities have the following form,
$$(p_{1,rej}, p_{2,rej}) = G(z, c_1, c_2, r_{1,inv}, r_{2,inv}, r_{uas}) \tag{25}$$
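Although we do not reproduce the closed form G(·) here, the following sketch (our own illustration; all constants and names are assumptions) shows how condition (24) can be used operationally: the total weighted INVITE backlog x1 + x2, i.e., the Lyapunov function (22), is sampled periodically, and the upstream rejection probability is nudged until the estimated drift satisfies (24). The paper's actual controller is the PID scheme developed in Section 6; this is only a proportional caricature of the same idea.

```python
# Illustrative sketch (not the paper's PID controller): adjust the upstream
# rejection probability so that the sampled Lyapunov drift satisfies (24),
# i.e. d/dt (x1 + x2) < -z.  DELTA, Z and GAIN are assumed example values.
DELTA = 1.0      # sampling period (s)
Z     = 100.0    # desired backlog drain rate (messages/s)
GAIN  = 0.001    # rejection-probability step per unit of excess drift

def update_rejection(p_rej, prev_backlog, curr_backlog):
    """Return the new upstream rejection probability given two consecutive
    samples of the total weighted INVITE backlog x1 + x2 (Eq. (22))."""
    drift = (curr_backlog - prev_backlog) / DELTA   # finite-difference estimate of L_dot
    excess = drift + Z                              # > 0 means condition (24) is violated
    if excess > 0:
        p_rej = min(1.0, p_rej + GAIN * excess)     # reject more aggressively
    else:
        p_rej = max(0.0, p_rej - 0.1 * GAIN * (-excess))  # relax slowly when stable
    return p_rej
```

In a real deployment the backlog samples would simply come from the proxy's weighted INVITE queue counters.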
As an example, we have included Fig. 5, which shows the range of stabilizing rejection probabilities for different values of z when the SIP call request rate is λ0 = 500 cps and the capacities of the upstream and downstream proxies are 400 and 200 cps, respectively. This scenario is a case-5 overload, in which the downstream proxy is overloaded, consequently overloading the upstream proxy as well.

5.1. Optimal rejection policy

Fig. 5 shows that for a given speed of convergence, z, our stabilizing algorithm can choose a range of rejection probabilities for each proxy. For instance, to barely stabilize the system of Fig. 5 (i.e., for z = 0), one can choose (p1,rej, p2,rej) to be any of (0.39, 0) or (0, 0.75). Then what is the difference between each set of rejection probabilities? We prove that if rejection is only performed by the upstream proxy then system throughput is maximized.5 Consider an overloaded system with non-zero queue lengths, i.e., x1, x2 > 0. Recall that throughput is ruas and we have ruas = r1,ok = r2,ok. Substituting (10) for c2,res in (13), using r2,ok = ruas and solving for ruas, we get
$$r_{uas} = \frac{\alpha_2\,c_2\,(1 - p_{2,rej})}{(-\alpha_2 w_{ok} - w_{inv} + w_{rej})\,p_{2,rej} + \alpha_2 w_{ok} + w_{inv}} \tag{26}$$

where $\alpha_2 = \frac{x_{2p}}{x_2}$. The derivative of ruas with respect to p2,rej is

$$\frac{\partial r_{uas}}{\partial p_{2,rej}} = \frac{-\alpha_2\,c_2\,w_{rej}}{\left((-\alpha_2 w_{ok} - w_{inv} + w_{rej})\,p_{2,rej} + \alpha_2 w_{ok} + w_{inv}\right)^2} < 0 \tag{27}$$
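As a quick check of (27) (our own verification step, not part of the original derivation): writing the denominator of (26) as $Bp_{2,rej} + C$ with $B = -\alpha_2 w_{ok} - w_{inv} + w_{rej}$ and $C = \alpha_2 w_{ok} + w_{inv}$, the quotient rule gives

$$\frac{\partial r_{uas}}{\partial p_{2,rej}} = \frac{-\alpha_2 c_2\,(B p_{2,rej} + C) - \alpha_2 c_2 (1 - p_{2,rej})\,B}{(B p_{2,rej} + C)^2} = \frac{-\alpha_2 c_2\,(B + C)}{(B p_{2,rej} + C)^2} = \frac{-\alpha_2 c_2\,w_{rej}}{(B p_{2,rej} + C)^2},$$

since $B + C = w_{rej}$; the numerator is negative and the denominator positive, which is exactly (27).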
Therefore, ruas is a strictly decreasing function of p2,rej and p2,re j = 0 for maximum throughput. The intuition behind this “upstream rejects” policy is to spend less total proxy time for an excess INVITE request by rejecting it as soon as possible. We have also verified this result by simulating a variety of scenarios. We adopt this rejection policy throughout the paper. 5 We prove this result when c1 ≥ c2 . In the other case, however, it is obvious that the upstream proxy is always the one responsible for rejection, since the downstream proxy is receiving from a slower one and will not become overloaded.
Fig. 5. Rejection probabilities obtained from solving Eq. (24) when c1 = 400, c2 = 200 and λ0 = 500.
Table 5
Maximum withstand-able overload, c1 = 400, c2 = 200.
    K:          0     1     2     3     4     5     6
    λ0 (cps):   1152  733   537   424   350   298   260

Table 6
Minimum upstream proxy capacity to withstand an overload of λ0 = 500 cps when c2 = 200.
    K:          0     1     2     3     4     5     6
    c1 (cps):   263   323   383   443   503   563   623
5.2. Maximum withstand-able overload

A crucial and rather interesting question is how much overload our system will be able to withstand under full rejection. We distinguish between two cases, namely, c1 < c2 and c1 ≥ c2. In either case, the maximum throughput is min(c1, c2). In the first case, the downstream proxy with capacity c2 receives from a slower proxy and will not become overloaded. The upstream proxy, on the other hand, has to reject excess INVITEs to prevent overload, at the cost of reduced system throughput, i.e., ruas ≤ min(c1, c2). This is a classical case-2 overload, for which we can never obtain the maximum throughput of min(c1, c2) during overload. It is easy to see that system throughput in this case is simply the INVITE forwarding rate from upstream to downstream, $r^{(0)}_{2,inv}$, given by (11). On the other hand, c1 ≥ c2 creates a rather interesting case. Let us find the maximum amount of overload λ0 that can be handled by our optimal rejection policy (see Section 5.1), such that maximum system throughput is retained, i.e., ruas = min(c1, c2) = c2. Note that any stabilizing control should stop re-transmissions from the upstream proxy (r1,timer = 0). Moreover, since calls are only rejected at the upstream proxy, throughput will be equal to the INVITE forwarding rate of the upstream proxy, i.e., $r^{(0)}_{2,inv} = c_2 = r_{uas}$. Then, equating (11) when x1 = 0 with c2 and also substituting r1,ok = r2,ok = c2 and r1,timer = 0 in (9), we obtain
$$\lambda_0 \le \frac{c_1 - c_2 + w_{rej}\,c_2}{w_{rej} + K\,w_{retx}} \tag{28}$$
where λ0 is the maximum overload that the system can withstand without throughput deterioration and K = 0···6 is the number of re-transmission levels triggered from the UACs. The largest value for λ0 is obtained when there are no re-transmissions from UACs, i.e., K = 0. For instance, Table 5 shows the maximum λ0 for each value of K, when the upstream and downstream proxies have capacities of 400 and 200 cps, respectively. Results from this table can be interpreted as follows: say we have an overload of 1000 cps < 1152. Then our overload control scheme will stabilize the system if it can take it out of overload within 0.5 s, before the first round of re-transmissions from UACs. If it misses this deadline, then there is no way it can relieve the system unless the load subsides to less than 733 cps. In the worst case scenario, the load has to come down to 260 cps, much lower
than the capacity of the upstream proxy, for the system to be stabilized. This example shows the importance of response time and speed of the overload control scheme. A general rule is to “relieve system load before any re-transmissions happen”. This example also illustrates the devastating effect of re-transmissions on system performance. Re-transmissions from the upstream proxy can be eliminated using methods such as the one proposed in [34]. However, re-transmissions from UACs cannot be prevented unless UACs are sent back a 503 Service Unavailable response promptly. Therefore having a rejection policy is inevitable for any SIP overload control scheme. The question of maximum withstand-able overload can also be posed from a capacity planning perspective. Take the case where the maximum overload is roughly known and we wish to provision enough capacity to handle it. Rewriting (28) we get the following relationship for c1 in terms of λ0 , c2 and K,
$$c_1 \ge \lambda_0\,(w_{rej} + K\,w_{retx}) + c_2\,(1 - w_{rej}) \tag{29}$$
Table 6 shows the minimum upstream proxy capacity for each value of K. It is easy to see that c1 needs to be larger as more re-transmissions come from UACs, even requiring more than twice as much capacity under full re-transmission compared to no re-transmission. This result again emphasizes the importance of fast response to overload. In practice, however, any upstream proxy may also be downstream for some other SIP call flows. Then how is the result in (29) useful? Let us assume that we can control the CPU share of the upstream proxy allocated to processing incoming and outgoing calls, denoted by (1 − η) and η, respectively. Given that the proxy capacity is c, it can then handle incoming and outgoing calls with capacities of (1 − η)c and ηc, respectively. Let us denote the capacity of the immediate next-hop proxy allocated to incoming calls from this proxy by cnexthop. Then, replacing c2 with cnexthop and c1 with ηc in (29), we obtain
$$\eta \ge \frac{\lambda_0\,(w_{rej} + K\,w_{retx}) + c_{nexthop}\,(1 - w_{rej})}{c} \tag{30}$$
This is a very interesting result, which can be interpreted in two ways. From a capacity planning point of view, if we require the
Fig. 6. Stable and non-stable system trajectories.
capacity of a proxy for incoming calls to be some value cin, then using $c = \frac{c_{in}}{1-\eta}$ and (30) we obtain

$$c \ge c_{in} + \lambda_0\,(w_{rej} + K\,w_{retx}) + c_{nexthop}\,(1 - w_{rej}) \tag{31}$$
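To give (31) a concrete reading, the sketch below (our own example; the traffic figures are made up and are not meant to reproduce Tables 5 and 6) evaluates the total capacity required for a proxy that must reserve cin = 100 cps for its own incoming calls while protecting a next-hop share of cnexthop = 300 cps against an overload of λ0 = 600 cps, for each re-transmission level K.

```python
# Hedged numeric reading of Eq. (31), using the Table 2 weights.  The traffic
# figures (c_in, lam0, c_nexthop) are invented for illustration only.
W_RETX, W_REJ = 0.068, 0.20

def min_capacity(c_in, lam0, c_nexthop, K):
    """Right-hand side of Eq. (31): minimum total proxy capacity (cps)."""
    return c_in + lam0 * (W_REJ + K * W_RETX) + c_nexthop * (1 - W_REJ)

for K in range(7):
    print(K, round(min_capacity(c_in=100, lam0=600, c_nexthop=300, K=K)))
# K = 0 needs about 460 cps, while K = 6 (all re-transmission levels fire)
# needs about 705 cps, again showing the cost of reacting slowly to overload.
```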
From an overload control point of view, any upstream proxy has to dynamically tune its value of η such that (30) is satisfied. We believe this novel result to be of practical significance.

5.3. State-space representation of stabilizing algorithms

Let us contemplate how the state-space trajectory of an overloaded system looks when different rejection probabilities are applied at the upstream proxy. We apply a surge of 10,000 calls to the system and then immediately start rejecting with a fixed probability at the upstream. Fig. 6 plots state-space trajectories for different rejection rates. The markers on each curve are a second apart and the system state trajectories are almost piece-wise linear. The first segment corresponds to the load surge, jumping x1 to a little above the surge value due to the first round of re-transmissions after half a second. Rejection starts at the beginning of the second segment (i.e., at the first marker), where the rate of downstream queue buildup catches up with that of the upstream due to re-transmissions from upstream to downstream, taking the system deeper into overload as the trajectory diverges from the origin. It is clearly observed that the length of this second segment, which we call the "overload buildup", becomes smaller as the rejection rate increases, ranging from 3 s when not rejecting at all to almost zero when rejecting at 50%. The third segment contributes less increase to the upstream queue, but the flow of re-transmissions from upstream to downstream is still flooding x2. This is because by this time, which is almost 4 s into overload, the first three re-transmissions have occurred and the next one will not be due for almost another four seconds. This is a good time for the overload control algorithm to take advantage and bring the system back to stability. For instance, rejecting at 60% apparently achieves this goal and changes the system trajectory toward the origin. In order to better identify which segment of these trajectories is stabilizing and which segment is not, we evaluate our simple Lyapunov function's derivative at these points. Recall that to achieve stability we should have $\dot{L}(x_1, x_2) = \dot{x}_1 + \dot{x}_2 < 0$, which can be rewritten as
$$\dot{x}_1 \left( 1 + \frac{\dot{x}_2}{\dot{x}_1} \right) < 0, \quad \dot{x}_1 \neq 0. \qquad (32)$$

Note that $\frac{\dot{x}_2}{\dot{x}_1}$ is simply the slope of the system trajectory; hence, Inequality (32) only holds when the slope of the state-space trajectory is between 3π/4 and 7π/4, that is, more tilted than the parallel dash-dotted lines drawn over Fig. 6. Following this result, only the trajectories for rejection rates of 40% and more are stabilizing. When rejection is at 20% or 30%, however, the trajectory is non-stabilizing. In fact, the backlog of the downstream proxy becomes larger during the last segment, which we call the "overload spread" stage, where overload has now contaminated the downstream proxy as well as the upstream. Nevertheless, these trajectories eventually become stabilizing once x1 = 0 and the curve starts descending vertically toward the origin at a slope of 3π/2. When no rejection is performed, or even when it is merely at 10%, the trajectory will clearly never become stabilizing and the overload control algorithm will fail indefinitely. An interesting case is the state trajectory for 30% rejection, which temporarily becomes stabilizing during its third segment, during which upstream re-transmissions have calmed down for only a short period of time. Unfortunately, as the fourth wave of re-transmissions arrives, the system again deviates from its stabilizing trajectory and will not recover until its trajectory hits the vertical axis after a very long period. During this entire time, system throughput is greatly deteriorated and will eventually reach zero as call requests begin to age and time out. The above example nicely illustrates the influential role of re-transmissions on SIP network stability. Although re-transmissions are part of the standard and provide reliability over the commonly used unreliable UDP transport, this example is convincing enough to justify limiting them to two or three attempts. If that were the case, then even the 20% rejection case would almost qualify as a stabilizing trajectory.
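As a small illustration of condition (32), the following sketch classifies the segments of a sampled state trajectory as stabilizing or non-stabilizing by checking whether $\dot{x}_1 + \dot{x}_2 < 0$, i.e., whether the trajectory direction lies between 3π/4 and 7π/4. The (x1, x2) samples are invented for the example and only loosely mimic the piece-wise linear curves of Fig. 6.

```python
import math

# Hypothetical (x1, x2) samples, one per second, roughly mimicking the
# piece-wise linear trajectories of Fig. 6 (values are illustrative only).
trajectory = [(0, 0), (10500, 200), (11000, 2500), (9000, 4000), (6000, 3000)]

for (x1a, x2a), (x1b, x2b) in zip(trajectory, trajectory[1:]):
    dx1, dx2 = x1b - x1a, x2b - x2a      # per-second rates of change
    lyapunov_dot = dx1 + dx2             # L_dot for L = x1 + x2
    angle = math.atan2(dx2, dx1)         # trajectory direction; stabilizing
    stabilizing = lyapunov_dot < 0       # iff angle is in (3*pi/4, 7*pi/4)
    print(f"segment direction {math.degrees(angle):7.1f} deg, "
          f"L_dot = {lyapunov_dot:6d} -> "
          f"{'stabilizing' if stabilizing else 'non-stabilizing'}")
```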
Let us look at this from another perspective by generalizing the Lyapunov time derivative to be less than some negative value −z instead of zero, i.e.,

$$\dot{L}(x_1, x_2) = \dot{x}_1 + \dot{x}_2 < -z, \quad z \ge 0. \qquad (33)$$
Note that our Lyapunov function represents the total system workload. It follows that by picking a larger z we are forcing the system to deplete its workload at a quicker pace, in fact at a pace of z messages per second. Therefore, we expect to reach stability sooner as we increase z, especially in cases of excessive overload. For example, Fig. 7 shows the system state trajectory when an overload of 10,000 cps is applied to the system, but this time it is maintained for 10 s. The upstream and downstream proxies have a capacity of 400 cps and 200 cps, respectively. Clearly, setting z = 300 forces the system onto a stabilizing track right after the overload stops (note that an overload of 10,000 cps is not stabilizable even under full rejection until it recedes). On the other hand, when z = 200 the system converges to a stabilizing trajectory only when it becomes tangent to the 3π/4 baseline tracks, which happens some time after the overload ends. However, z = 100 is not capable of forcing the system trajectory to tilt below the baseline tracks until it hits the vertical axis in the very distant future.
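As a rough, idealized reading of (33), and assuming the controller could enforce $\dot{L} = -z$ exactly from a backlog of $L(0)$ messages, the workload would drain in roughly

$$T_{\text{drain}} \approx \frac{L(0)}{z},$$

so doubling z about halves the recovery time. The actual trajectories in Fig. 7 take longer than this estimate suggests, since the ongoing surge and re-transmissions keep adding work while the backlog drains.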
Fig. 7. The effect of parameter z on system trajectory.
Fig. 8. The effect of overload trigger lag on system response time.
5.4. Overload detection and triggering lag

Up to this point we have assumed that an overload event will be immediately detected at the upstream proxy, which will then promptly trigger the OC algorithm. In practice, however, this process may take on the order of a few seconds, mainly due to the lag involved in detecting overload at the upstream, which is sometimes even necessary to prevent false triggering. In this section we apply a single surge of 10,000 calls to the system and consider the effect of various OC triggering lags and values of z on the OC response time (see Fig. 8). The response time of an overload control algorithm is defined as the time from the moment overload control is triggered until the system converges to a stabilizing trajectory.

According to Fig. 8, for a moderate z = 200 the response time triples by merely triggering one second after the surge happens, reaching 34 s. If the triggering lag is 10 s, then it takes the OC mechanism an additional 12 s to force the system trajectory onto a stabilizing path. During this time system throughput will most likely reach zero, as the majority of requests have timed out, causing a temporary performance collapse. Nevertheless, Fig. 8 shows that triggering lag can to some extent be compensated by a larger choice of z, at the cost of rejecting more requests. For instance, moving up to z = 300 makes the one-second trigger lag case perform just like prompt triggering, with a response time of 6.6 s. During this time at most four rounds of re-transmissions have happened and the system throughput is still expected to be well above zero. However, when the trigger lag is 10 s, no amount of z can completely compensate for it and the response time bottoms out at about 13 s. In fact, moving beyond z = 300 will never improve response time in any of the cases, since the system is already operating at full rejection. Moreover, Fig. 8 also suggests that for small z, the response time becomes very large and is dominated by z as opposed to trigger lag.
Fig. 9. Heuristic PID overload control system based on Lyapunov function.
From this discussion it follows that it is equally important to promptly trigger the overload control mechanism and to pick the right value of z, which depends on the extent of the overload. We believe that a suitable choice is to set z adaptively based on the amount of system workload during an overload period.

6. An overload control algorithm

We now use the previously proposed Lyapunov function, but require
$$\dot{L}(x_1, x_2) = -L(x_1, x_2), \qquad (34)$$
as the stability condition. Such a condition is known to yield "exponential stability"; that is, the system state converges at an exponential pace. This results in the following control law for a stabilizing overload control algorithm:
$$\dot{x}_1 + \dot{x}_2 + x_1 + x_2 = 0 \qquad (35)$$
The solution to this differential equation is, however, complex in terms of the system parameters, so an efficient control law is not directly obtainable. Hence, we propose to use a heuristic controller in the manner shown in Fig. 9, which monitors $L + \dot{L}$ and adjusts the appropriate control signal to the upstream proxy in the form of an INVITE rejection probability, $p_{rej}$. We have chosen the well-known Proportional-Integral-Derivative (PID) controller as our control algorithm. The control algorithm is applied periodically over cycles of length Δt. Each server i estimates its queue variation rate, $\dot{x}_i(t) = \frac{x_i(t) - x_i(t - \Delta t)}{\Delta t}$, and its queue size, $x_i$, and sends them to the upstream server as feedback at the end of each cycle. The upstream proxy then computes $\dot{x}_1(t) + \dot{x}_2(t) + x_1(t) + x_2(t)$ and its error e(t) with respect to the target value of 0, and sets $p_{rej}(t)$ for the next cycle as,

$$p_{rej}(t) = p_{rej}(t - \Delta t) + k_p e(t) + k_i \sum_{n=0}^{N} e(t - n\Delta t)\,\Delta t + k_d \frac{\Delta e(t)}{\Delta t} \qquad (36)$$
where kp, ki and kd are the proportional, integral, and derivative gains of the PID controller, respectively. The proportional part of the controller plays the main role in adjusting the rejection probability to achieve the objective of zero error. A large proportional gain, kp, results in a shorter response time but may cause fluctuations. The derivative gain, kd, counteracts these fluctuations by observing the rate of change of the error term. Furthermore, the integral term removes any residual steady-state error, so that the system converges to stability. Note also that a negative rejection probability implies no rejection, whereas a rejection probability larger than one implies full rejection. It should be noted that our proposed scheme fulfills most of the requirements set by RFC 5390 [8] for any SIP-OC scheme. A detailed discussion of the various requirements and the extent to which they are met is provided in the Appendix at the end of the paper.
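As a minimal illustration of the update law (36), the sketch below implements the per-cycle rejection-probability computation. The class name and code structure are ours; only the arithmetic follows (36), and the gains and cycle length shown are those reported for the testbed in the next subsection.

```python
from collections import deque

class LyapunovPIDRejector:
    """Illustrative sketch of the PID rejection controller of Eq. (36).
    Names and structure are ours; only the update law follows the text."""

    def __init__(self, kp, ki, kd, dt, history=10):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.errors = deque(maxlen=history)   # last N+1 error samples
        self.p_rej = 0.0                      # raw (unclamped) probability

    def update(self, x1, x1_prev, x2, x2_prev):
        # Queue variation rates estimated over one control cycle.
        x1_dot = (x1 - x1_prev) / self.dt
        x2_dot = (x2 - x2_prev) / self.dt
        # Error of x1_dot + x2_dot + x1 + x2 with respect to its target of zero.
        error = x1_dot + x2_dot + x1 + x2
        prev_error = self.errors[-1] if self.errors else error
        self.errors.append(error)
        # Discrete PID update of the rejection probability, Eq. (36).
        self.p_rej += (self.kp * error
                       + self.ki * sum(self.errors) * self.dt
                       + self.kd * (error - prev_error) / self.dt)
        # Values below 0 mean no rejection; above 1 mean full rejection.
        return min(max(self.p_rej, 0.0), 1.0)

# Example cycle with hypothetical queue sizes (messages):
ctrl = LyapunovPIDRejector(kp=0.003, ki=0.005, kd=0.0001, dt=0.1)
print(ctrl.update(x1=1200, x1_prev=800, x2=600, x2_prev=500))
```

In practice each downstream server reports its own $x_i$ and $\dot{x}_i$ as feedback at the end of every cycle; here both queue samples are passed in directly for brevity.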
6.1. Experimental evaluation of the Lyapunov-based heuristic overload control

We have evaluated the Lyapunov-based PID overload control algorithm by running an actual experiment over the testbed illustrated in Fig. 10, where both the upstream and downstream proxies are down-clocked to a capacity of 440 cps. The control cycle is set to 100 ms and the PID gains were empirically set to kp = 0.003, ki = 0.005 and kd = 0.0001, while the latest ten error samples were used for the integral term (i.e., N = 9). The PID gains were tuned by trial and error, but were found to be suitable for a large range of input loads. According to the load profile shown in Fig. 11, both proxies become overloaded during [50 s, 100 s]. It is observed that the PID control approach based on the proposed Lyapunov function is able to maintain throughput between 420 and 440 cps, which is close to the nominal system capacity of 440 cps. Moreover, the controller suitably and promptly backs out of rejection when the overload recedes at 100 s. Fig. 12 further illustrates the queue size of each proxy. At the 50th second, the PID overload controller senses a spike in the error function and reacts by promptly rejecting some of the excess load. This prevents further growth of the queue, bringing it down from 100 KB to 40 KB at the downstream proxy. The fact that the upstream queue size does not jump suggests that our PID overload controller reacts promptly to the overload condition.

We next present the results of another experiment in which two independent upstream proxies send to the downstream proxy and the downstream becomes overloaded. In this scenario the first upstream proxy has a nominal capacity of 2500 cps, while the downstream proxy and the second upstream both have a nominal capacity of 1250 cps. Fig. 13 shows the throughput achieved by each upstream. For the first 50 s, both send 700 cps, slightly overloading the downstream and resulting in a total throughput of 1150 cps with a reasonable downstream queue size of 20 KB, as shown in Fig. 14. From 50 to 100 s, upstream-1, having the larger capacity, increases its load to 1400 cps, which overloads the downstream by an excess load of 600 cps and also causes a jump in the downstream queue size to 120 KB. The Lyapunov-based SIP-OC immediately reacts at both upstream servers independently, forcing upstream-1 and upstream-2 to send roughly 800 and 400 cps, respectively. This is a total of 1200 cps, which is just below the downstream capacity of 1250 cps. At this point, each upstream proxy is throttling nearly 43% of its traffic. Hence, our proposed algorithm provides proportional fairness as well. From 150 to 200 s, both upstreams send at the same rate of 1400 cps, which at first pushes the queue size to about 180 KB. Again, both upstreams independently react to this event and throttle their sending rates by 60%, achieving an equal throughput of almost 570 cps each with a downstream queue size of 150 KB. During 200–250 s, upstream-2 reduces its sending rate back to 700 cps, and the SIP-OC residing at upstream-1 takes advantage of the freed capacity.

We also compare our approach to [17], where two PI controllers are used to regulate both queue size and CPU occupancy.
Fig. 10. Testbed setup for PID overload control.
Fig. 11. Evaluation of Lyapunov-based PID overload control algorithm.
Fig. 12. Queue length of each proxy.
Fig. 13. Evaluation of Lyapunov-based PID overload control algorithm for two upstreams overloading a single downstream.
Fig. 14. Queue length of each proxy.
Fig. 15. Throughput comparison with [17].
Furthermore, they also evaluate their method using experiments. Since they only consider a standalone proxy, however, we perform the comparison for the case in which the upstream proxy is exclusively overloaded, across a range of offered loads. Fig. 15 shows that our proposed Lyapunov stabilizing algorithm maintains nearly 15% higher throughput during overload, but suffers more throughput variation when the overload becomes very large.
7. Conclusion
In this paper we developed a deterministic fluid flow model to analyze a tandem system of SIP servers during overload. We then used the theory of Lyapunov stability to propose a simple overload control algorithm, enforced by a heuristic PID controller. This overload controller was tested experimentally, showing that it can suitably control overload. Moreover, we studied the state-space representation of the tandem SIP system and characterized stabilizing and non-stabilizing state trajectories, providing valuable insight into the overload control problem. In addition, we studied the effect of overload control triggering latency on system stability, showing that prompt reaction to overload is critical for withstanding such a situation.

Appendix. Discussion of IETF SIP-OC requirements

REQ1 Maintain maximum throughput during overload: met.

REQ2 Does not allow overload to spread: met.

REQ3 Minimize amount of configuration: partially met. The control interval and feedback function need to be configured.

REQ4 Capability to deal with non-supporting elements: met. If the downstream proxy does not support it, the local queue length of the upstream proxy is used as a measure. This guarantees stability for overload cases 2–5, since x1 will increase if x2 is still in overload. Alternatively, if the upstream proxy does not support the proposed method and the one downstream is overloaded, then the downstream proxy will have to take matters into its own hands and start rejecting locally. This is non-optimal but feasible. However, for a case-1 overload, where the downstream is exclusively overloaded and does not send back any feedback, the upstream proxy is configured to initiate rejection after not receiving any response. A similar policy is also used in RFC 7339 [11].

REQ5 Should not assume completely trusted elements: partially met. An untrusted entity which does not conform to SIP-OC feedback can be limited to receiving only a certain fraction of the overloaded proxy capacity. Excess requests will be individually rejected by the proxy. However, this mechanism is out of the scope of this work and is not considered.

REQ6 The overload feedback should be unambiguous: met. We do not re-use the 503 rejection response code; instead, we embed overload control feedback information in the Via header.

REQ7 The overload feedback should not have an on/off nature: met. The overload decision is made at the upstream entity according to the value of the Lyapunov derivative, which is a continuous quantity.

REQ8 Requests should not be retried on overloaded entities: out of the scope of this work.

REQ9, 14 Rejected requests should not be denied service indefinitely and should be instructed when to retry: out of the scope of this work.

REQ10, 11 Should support an enumerable/non-enumerable set of upstream servers: met. We have set no constraint on the set of upstream elements. Moreover, our experiments show that our method also provides proportionally fair service to upstream servers.

REQ12 Should work across domains: met. The only requirement is that interconnection points re-populate feedback information after passing SIP requests across the domain border. In case the servers of a certain domain do not implement our scheme, REQ4 applies.

REQ13 Should not enforce a certain session priority: met. We only set priority on which messages to process, while the preference for processing different types of sessions, such as emergency ones, is irrelevant to our scheme.

REQ15 Should work in case network elements cannot provide explicit feedback: met. In this case the upstream proxy will only use its own queue backlog to form the Lyapunov function. This is sufficient because any overload at the downstream entity will eventually cause backlog at the upstream proxy as well; our proposed Lyapunov method will therefore detect it and react accordingly. Also refer to REQ4.

REQ16 Should limit the overhead of SIP-OC: met. We use the top-most Via header and no new messages are generated.

REQ17 Must not provide an avenue for attacks: out of scope, but can be partially met by establishing a confidential channel between adjacent entities.

REQ18 Overload feedback should clearly discriminate among an IP:port, host, or URI to which the feedback applies, so it is distinguishable by the upstream entity: met. Queue level feedback is sent along with the IP:port of the issuing downstream entity.

REQ19 Message priority during overload should be determined: met. We give higher priority to non-INVITEs over INVITE messages.

REQ20 No disproportionate benefit shall accrue to the users or operators of the elements that do not implement the mechanism: met.
If an overloaded element does not implement our mechanism, its load will not be adequately throttled, which degrades its throughput. On the other hand, if its immediate upstream proxy does not implement it, the overloaded proxy will resort to local rejection, which again does not provide the upstream proxy with any advantage.

REQ21 The overload mechanism should ensure that the system remains stable: met. This is ensured by directly applying stability theory to the SIP-OC problem and is illustrated using many graphs and experiments.

REQ22 It must be possible to disable the reporting of load information towards upstream targets based on the identity of those targets: this is out of the scope of our work but can easily be met.

REQ23 The SIP-OC scheme should work with load balancers: not considered. However, we believe that our approach can be made to work in such a condition as follows. If the load balancer is a SIP entity, then it will receive overload feedback from individual downstream servers and would also send upstream overload indications if necessary. On the other hand, if the load balancer has no knowledge of SIP and the entire farm of downstream servers is viewed as one aggregate server, then it is possible to send an aggregate overload feedback directly to the upstream server. This feedback is sent considering the total length of all server farm queues, and the sender can be any of the downstream entities. This approach should work easily since the load balancer effectively masks the entire server farm, replacing it with a single downstream server with much larger capacity and queue length.
References

[1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler, SIP: Session Initiation Protocol, 2002, (RFC 3261 (Proposed Standard)).
[2] S.G.A. Ali, M.D. Baba, M.A. Mansur, L.M. Abdullah, Studying ims signalling in lte-based femtocell network, Int. J. Recent Trends Eng. Technol. 11 (2) (2014) 102.
[3] C. Shen, H. Schulzrinne, A. Koike, A Mechanism for Session Initiation Protocol (SIP) Avalanche Restart Overload Control, Internet-Draft, Internet Engineering Task Force, 2014. Work in Progress.
[4] O. Dementev, O. Galinina, M. Gerasimenko, T. Tirronen, J. Torsner, S. Andreev, Y. Koucheryavy, Analyzing the overload of 3gpp lte system by diverse classes of connected-mode mtc devices, in: Internet of Things (WF-IoT), 2014 IEEE World Forum on, IEEE, 2014, pp. 309–312.
[5] W.-c. Feng, K.G. Shin, D.D. Kandlur, D. Saha, The blue active queue management algorithms, IEEE/ACM Trans. Netw. (ToN) 10 (4) (2002) 513–528.
[6] J. Sun, K.-T. Ko, G. Chen, S. Chan, M. Zukerman, Pd-red: to improve the performance of red, IEEE Commun. Lett. 7 (8) (2003) 406–408.
[7] E. McMurry, B. Campbell, Diameter Overload Control Requirements, 2013, (RFC 7068 (Informational)).
[8] J. Rosenberg, Requirements for Management of Overload in the Session Initiation Protocol, 2008, (RFC 5390 (Informational)).
[9] S.A. Baset, V.K. Gurbani, A.B. Johnston, H. Kaplan, B. Rosen, J.D. Rosenberg, The session initiation protocol (sip): An evolutionary study, J. Commun. 7 (2) (2012) 89–105.
[10] V. Hilt, E. Noel, C. Shen, A. Abdelal, Design Considerations for Session Initiation Protocol (SIP) Overload Control, 2011, (RFC 6357 (Informational)).
[11] V. Gurbani, V. Hilt, H. Schulzrinne, Session Initiation Protocol (SIP) Overload Control, 2014, (RFC 7339 (Proposed Standard)).
[12] Y. Wang, Sip overload control: a backpressure-based approach, in: SIGCOMM '10: Proceedings of the ACM SIGCOMM 2010 Conference on SIGCOMM, ACM, New York, NY, USA, 2010, pp. 399–400.
[13] S.V. Azhari, M. Homayouni, H. Nemati, J. Enayatizadeh, A. Akbari, Overload control in sip networks using no explicit feedback: A window based approach, Comput. Commun. 35 (12) (2012) 1472–1483.
[14] E. Noel, P.M. Williams, Session Initiation Protocol (SIP) Rate Control, 2015, (RFC 7415).
[15] A.R. Montazerolghaem, M.H. Yaghmaee, Sip overload control testbed: Design, building and evaluation, arXiv preprint arXiv:1307.3411 (2013).
[16] A. Montazerolghaem, M.H.Y. Moghaddam, An overload window control method based on fuzzy logic to improve sip performance, Int. J. Inf. Commun. Technol. Res. 6 (4) (2014) 13–26.
[17] L. De Cicco, G. Cofano, S. Mascolo, Local sip overload control: controller design and optimization by extremum seeking, IEEE Trans. Control Netw. Syst. 2 (3) (2015) 267–277.
[18] V. Hilt, I. Widjaja, Controlling overload in networks of sip servers, in: IEEE International Conference on Network Protocols, 2008, pp. 83–93.
[19] H. Tarasiuk, J. Rogowski, On Performance Evaluation of Loss-Based Overload Control Mechanism in Signaling System with SIP Protocol, Springer International Publishing, Cham, pp. 345–354.
[20] B. Upadhyay, A. Mishra, S.B. Upadhyay, Aipc: Counter-active analysis of overload control mechanism for sip server, Int. J. Comput. Eng. Technol. (IJCET) 5 (1) (2014) 128–140.
[21] J. Wang, J. Liao, T. Li, J. Wang, J. Wang, Q. Qi, Probe-based end-to-end overload control for networks of sip servers, J. Netw. Comput. Appl. 41 (2014) 114–125.
[22] M. Ohta, Overload control in a sip signaling network, Int. J. Electr. Electron. Eng. 3 (2) (2009) 87–92.
[23] E.C. Noel, C.R. Johnson, Initial simulation results that analyze sip based voip networks under overload, in: International Teletraffic Congress, 2007, pp. 54–64.
[24] S. Montagna, M. Pignolo, Performance evaluation of load control techniques in sip signaling servers, in: ICONS, 2008, pp. 51–56.
[25] R.G. Garroppo, S. Giordano, S. Spagna, S. Niccolini, Queueing strategies for local overload control in sip server, in: Proceedings of the 28th IEEE Conference on Global Telecommunications, in: GLOBECOM'09, IEEE Press, Piscataway, NJ, USA, 2009, pp. 466–471.
[26] P. Abaev, Y. Gaidamaka, A. Pechinkin, R. Razumchik, S. Shorgin, Simulation of overload control in sip server networks, in: Proc. of the 26th European Conference on Modelling and Simulation, ECMS, 2012, pp. 533–539.
[27] P. Abaev, R.V. Razumchik, Queuing model for sip server hysteretic overload control with bursty traffic, in: Internet of Things, Smart Spaces, and Next Generation Networking, Springer, 2013, pp. 383–396.
[28] Y. Gaidamaka, A. Pechinkin, R. Razumchik, K. Samouylov, E. Sopin, Analysis of an m|g|1|r queue with batch arrivals and two hysteretic overload control policies, Int. J. Appl. Math. Comput. Sci. 24 (3) (2014) 519–534.
[29] R.G. Garroppo, S. Giordano, S. Niccolini, S. Spagna, A prediction-based overload control algorithm for sip servers, IEEE Trans. Netw. Ser. Manage. 8 (1) (2011) 39–51.
[30] Y. Hong, C. Huang, J. Yan, Applying control theoretic approach to mitigate sip overload, Telecommun. Syst. 54 (4) (2013) 387–404.
[31] A. Akbar, S.M. Basha, S.A. Sattar, A novel rate based overload control method for sip servers, in: Emerging Research in Computing, Information, Communication and Applications, Springer, 2016, pp. 143–152.
[32] K.K. Guduru, J. Usha, Queuing strategies for self overload control in sip servers, in: Contemporary Computing and Informatics (IC3I), 2014 International Conference on, IEEE, 2014, pp. 1007–1011.
[33] T. Nozoe, M. Noguchi, M. Sakuma, M. Isawa, Live migration of virtualized carrier grade sip server, Int. J. Commun. Netw. Inf. Secur. 8 (2) (2016) 57.
[34] Y. Hong, C. Huang, J. Yan, Mitigating sip overload using a control-theoretic approach, in: Global Telecommunications Conference (GLOBECOM 2010), IEEE, 2010, pp. 1–5.
[35] M. Jahanbakhsh, S. Azhari, J. Enayati-Zadeh, M. Baghdadi, Local and distributed sip overload control solution improving sustainability of sip networks, Int. J. Commun. Syst. (2016), n/a–n/a.
[36] A. Montazerolghaem, S.A. Hosseini Seno, M.H. Yaghmaee, F. Tashtarian, Overload mitigation mechanism for voip networks: a transport layer approach based on resource management, Trans. Emerg. Telecommun. Technol. 27 (6) (2016) 857–873.
[37] M.J. Handley, SIP: Session Initiation Protocol, 1999, (RFC 2543). URL https://rfc-editor.org/rfc/rfc2543.txt.
[38] J. Liao, J. Wang, T. Li, J. Wang, J. Wang, X. Zhu, A distributed end-to-end overload control mechanism for networks of sip servers, Comput. Netw. 56 (12) (2012) 2847–2868.
[39] K. Samouylov, Y. Gaidamaka, P. Abaev, M. Talanova, O. Pavlotsky, Analytical modeling of rate-based overload control with token bucket traffic shaping on client side, in: Proceedings of the 29th European Conference on Modelling and Simulation (ECMS 2015), Digitaldruck Pirrot GmbH, Albena, 2015, pp. 669–674.
[40] Y. Gaidamaka, Model with threshold control for analyzing a server with an sip protocol in the overload mode, Autom. Control Comput. Sci. 47 (4) (2013) 211–218.
[41] S. Shorgin, K. Samouylov, Y. Gaidamaka, S. Etezov, Polling system with threshold control for modeling of sip server under overload, in: Advances in Systems Science, Springer, 2014, pp. 97–107.
[42] C. Shen, H. Schulzrinne, E. Nahum, Session initiation protocol (sip) server overload control: Design and evaluation, in: Principles, Systems and Applications of IP Telecommunications. Services and Security for Next Generation Networks, Springer, 2008, pp. 149–173.
[43] E. Noel, C. Johnson, Novel overload controls for sip networks, in: 21st International Teletraffic Congress, 2009, pp. 1–8.
[44] S. Montagna, M. Pignolo, Comparison between two approaches to overload control in a real server: "local" or "hybrid" solutions? in: Proceedings of the 15th IEEE Mediterranean Electrotechnical Conference, in: MELECON 2010, IEEE Press, Piscataway, NJ, USA, 2010, pp. 845–849.
[45] J. Sun, J. Hu, R. Tian, B. Yang, Flow management for sip application servers, in: ICC, 2007, pp. 646–652.
[46] J. Sun, H. Yu, W. Zheng, Flow management with service differentiation for sip application servers, in: CHINAGRID '08: Proceedings of the The Third ChinaGrid Annual Conference, IEEE Computer Society, Washington, DC, USA, 2008, pp. 272–277.
[47] T. Nozoe, M. Noguchi, M. Sakuma, K. Misawa, M. Isawa, Priority-based queue management scheme to reduce overloads on sip servers, Int. J. Comput. Theory Eng. 8 (5) (2016) 389.
[48] K. Samouylov, P. Abaev, Y. Gaidamaka, A. Pechinkin, R. Razumchik, Analytical modelling and simulation for performance evaluation of sip server with histeretic overload control, in: Proc. of the 28th European Conference on Modelling and Simulation (ECMS 2014), May, 2014, pp. 27–30.
[49] P. Abaev, A. Pechinkin, R. Razumchik, Analysis of queueing system with constant service time for sip server hop-by-hop overload control, in: Modern Probabilistic Methods for Analysis of Telecommunication Networks, Springer, 2013, pp. 1–10.
[50] S.V. Azhari, H. Nemati, Stability analysis of tandem sip proxies, in: IEEE International Communications Conference, ICC, 2012, pp. 1244–1248.
[51] G. Mishra, S. Dharmaraja, S. Kar, Performance analysis of sip signaling network using hierarchical modeling, in: Communications (NCC), 2014 Twentieth National Conference on, 2014, pp. 1–5.
[52] Y. Hong, C. Huang, J. Yan, A comparative study of sip overload control algorithms, in: J.H. Abawajy, M. Pathan, M. Rahman, A.K. Pathan, M.M. Deris (Eds.), Network and Traffic Engineering in Emerging Distributed Computing Applications, Hershey: IGI Global, 2013.
[53] Y. Hong, C. Huang, J. Yan, Modeling chaotic behaviour of sip retransmission mechanism, Int. J. Parallel Emergent Distrib. Syst. 28 (2) (2013) 95–122.
[54] L. Leskelä, Stabilization of an overloaded queueing network using measurement-based admission control, J. Appl. Probab. (2006) 231–244.
[55] Y. Wang, Y. Liu, F. Wu, B. Tang, Load balance in wireless sensor network through stability routing, J. Inf. Comput. Sci. 8 (6) (2011) 903–910.
[56] C.-p. Li, E. Modiano, Receiver-based flow control for networks in overload, in: INFOCOM, 2013 Proceedings IEEE, 2013, pp. 2949–2957.
[57] A. Montazerolghaem, M.H.Y. Moghaddam, F. Tashtarian, Overload control in sip networks: A heuristic approach based on mathematical optimization, in: 2015 IEEE Global Communications Conference (GLOBECOM), 2015, pp. 1–6.
[58] M. Ohta, Overload protection in a sip signaling network, in: International Conference on Internet Surveillance and Protection, 2006, pp. 11–11.