Computer Communications 29 (2006) 3679–3690 www.elsevier.com/locate/comcom
A robust packet scheduling algorithm for proportional delay differentiation services
Jianbin Wei a,*, Cheng-Zhong Xu a, Xiaobo Zhou b, Qing Li a
a Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202, USA
b Department of Computer Science, University of Colorado at Colorado Springs, Colorado Springs, CO 80918, USA
Received 20 November 2004; received in revised form 15 June 2006; accepted 20 June 2006 Available online 12 July 2006
Abstract
The proportional delay differentiation (PDD) model is an important approach to relative differentiated services provisioning on the Internet. It aims to maintain pre-specified packet queueing-delay ratios between different classes of traffic at each hop. Existing PDD packet scheduling algorithms are able to achieve this goal in long time-scales when the system is highly utilized. This paper presents a new PDD scheduling algorithm, called Little’s average delay (LAD), based on a proof of Little’s Law. It monitors the arrival rate of the packets in each traffic class and the cumulative delays of the packets, and schedules packets according to their transient queueing properties in order to achieve the desired class delay ratios in both short and long time-scales. Simulation results show that LAD is able to provide predictable and controllable services in various system conditions and that such services, whenever feasible, can be guaranteed independent of the distributions of packet arrivals and sizes. In comparison with other PDD scheduling algorithms, LAD provides the same level of service quality in long time-scales and more accurate and robust control over the delay ratio in short time-scales. In particular, LAD outperforms its main competitors significantly when the desired delay ratio is large.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Quality of service; Packet scheduling; Proportional delay differentiation; Little’s law
1. Introduction
The past decade has seen an increasing demand for provisioning different levels of quality of service (QoS) on the Internet to support different types of network applications and different user requirements. To meet this demand, two service architectures have been proposed: Integrated Services (IntServ) [4] and Differentiated Services (DiffServ) [2]. IntServ requires reserving router resources along the service delivery paths, using a protocol such as the Resource Reservation Protocol (RSVP), to guarantee QoS. Since all the routers need to maintain per-flow state information, this requirement hinders the IntServ architecture from widespread deployment.
* Corresponding author. Address: Department of Mathematics and Computer Science, South Dakota School of Mines & Technology, Rapid City, SD 57701, USA. Tel.: +1 13135775147; fax: +1 13135771101.
E-mail addresses: [email protected] (J. Wei), [email protected] (C.-Z. Xu), [email protected] (X. Zhou), [email protected] (Q. Li).
0140-3664/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.comcom.2006.06.009
In contrast, DiffServ aims to provide differentiated services among classes of aggregated traffic flows, instead of offering absolute QoS measures to individual flows. It is implemented by stateless priority scheduling in the core routers, in collaboration with stateful resource management at the network edges. To receive different levels of QoS, application packets are assigned to different service types or traffic classes at the network edges [21]; DiffServ-compatible routers in the network core perform stateless prioritized packet forwarding, so-called “per-hop behaviors” (PHBs), on the classified packets. Owing to its per-class stateless routing, the DiffServ architecture exhibits good scalability. Early PHB proposals of DiffServ focused on the construction of versatile end-to-end services with guaranteed QoS. Two examples are “expedited forwarding” [10] and “assured forwarding” [9]. An alternative to absolute DiffServ is a relative differentiated services model, which quantifies the difference of QoS between classes of traffic. In this model, the network traffic is divided into a number of classes
J. Wei et al. / Computer Communications 29 (2006) 3679–3690
with ordered QoS requirements in such a way that the traffic of a higher ranked class receives better (or at least no worse) service than the traffic of lower ranked classes, in terms of local (per-hop) metrics such as queueing delay and packet loss [5]. Internet traffic is classified by applications and users at the network edges according to various service-level cost/performance agreements and policy constraints. Due to the lack of admission control or resource reservation in the network core, relative DiffServ provides no QoS guarantee. However, with the support of server-side QoS adaptation, DiffServ-capable routers can assure end-to-end relative service differentiation. Although absolute DiffServ is desirable for Internet services with hard time constraints, such as audio/video streaming, relative DiffServ with respect to delay is sufficient for soft real-time applications such as e-Commerce transactions.
Recently, Dovrolis et al. defined a proportional delay differentiation (PDD) model in support of relative DiffServ [6,7]. It requires the quality spacing between classes of traffic to be proportional to certain pre-specified class differentiation parameters. Since then, many packet scheduling algorithms have been developed to implement the PDD model. Representatives include backlog-proportional rate (BPR) [6], joint buffer management and scheduling (JoBS) [14], waiting-time priority (WTP) [7], adaptive WTP [12], hybrid proportional delay (HPD) [7], and mean-delay proportional (MDP) [18]. These algorithms exhibit different characteristics in support of the PDD model under different class load conditions and time-scales. Most of them are capable of achieving desired delay ratios, if the ratios are feasible, under heavy load conditions and in long time-scales.
For example, HPD takes into account the delay of the head-of-line packet of a backlogged class and the delay of departed packets simultaneously, and achieves desired delay ratios on average in both short and long time-scales when the delay ratio is small. However, the ratios it achieves in short time-scales show large statistical variations. For large desired delay ratios, large relative errors (on average) are observed as well. Details of the PDD algorithms are reviewed in Section 2.
In this paper, we present a new PDD algorithm, called Little’s average delay (LAD), based on a proof of Little’s Law. Little’s Law states the stationary relationship between the average queue length, arrival rate, and queueing delay of a queueing system in the long run [15]. Its proof reveals a transient property regarding the queue length [22]: over any time interval, the average queue length of a class equals the product of its average arrival rate in that interval and its average per-packet delay, where the delay accounts for both the experienced delay of departed packets and the waiting time so far of backlogged packets. Accordingly, LAD monitors the average arrival rate of every traffic class and the queueing delay of its arrived packets, including both the packets waiting in the queue and the departed packets, in order to control the delay ratio in both long and short time-scales. Simulation results show that LAD overcomes the limitations of its main competitors: AWTP, HPD, and MDP. Specifically, whenever the PDD model with a desired class
delay ratio is feasible, LAD provides more accurate and robust control over the delay ratio than its competitors in short time-scales. The improvement is significant when the desired delay ratio is large. In long time-scales, LAD performs no worse than its competitors under any load condition. Moreover, because of the generality of Little’s Law, the performance of LAD is independent of the distributions of packet arrivals and packet sizes.
The remainder of the paper is organized as follows. Section 2 gives an overview of the PDD model and a brief review of the existing PDD algorithms. Section 3 presents the LAD algorithm and discusses its design and implementation issues. Section 4 evaluates the algorithm via extensive simulation and compares it with other PDD algorithms. We conclude this paper in Section 5.
2. Background and related work
2.1. Proportional Delay Differentiation Model
We consider packet scheduling on a lossless, work-conserving, and non-preemptive link that serves $M$ ($M \ge 2$) first-come-first-served (FCFS) queues, one for each traffic class (hereinafter the terms “queue” and “class” are used interchangeably). The lossless property requires that the average arrival rate of the aggregate traffic be less than the link capacity and that there be enough buffer space to hold backlogged packets. The work-conserving property means that the link is never left idle as long as there are backlogged packets waiting for service. The non-preemptive property means that the transmission of a packet cannot be interrupted.
It is assumed that the traffic in different classes has independent arrival and packet size processes. Therefore, the aggregate traffic of the queueing system is the superposition of the $M$ traffic streams. Denote by $\lambda_i$ the arrival rate of class $i$, $1 \le i \le M$. It follows that the arrival rate of the aggregate traffic of the system is $\lambda = \sum_{i=1}^{M} \lambda_i$. Let $C$ represent the link capacity.
The system utilization rate is $\rho = \left(\sum_{i=1}^{M} \lambda_i x_i\right)/C$, where $x_i$ represents the average packet size of class $i$. The objective of the PDD model is to control the quality spacing between different classes so that their average delays are proportional to certain class differentiation parameters pre-defined by network operators. Let $W_i$ denote the average delay of class $i$, and $\delta_i$ its pre-defined delay differentiation parameter. The PDD model requires that, for any two classes $i$ and $j$, $1 \le i, j \le M$,
$$\frac{W_i}{W_j} = \frac{\delta_i}{\delta_j}. \qquad (1)$$
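As a small illustration of (1), the achieved ratios can be checked against the targets once per-class average delays have been measured. The helper below is a hypothetical sketch, not part of the paper: it takes the last class as the reference class $j$, computes $W_i/W_M$ and $\delta_i/\delta_M$, and reports their relative error, which is one plausible form of the ratio-error metric used in the simulations.

```python
def ratio_errors(avg_delays, deltas):
    """Relative error between achieved and target delay ratios, Eq. (1).

    Hypothetical helper: avg_delays[i] is the measured average delay W_i and
    deltas[i] the delay differentiation parameter delta_i. The last class
    serves as the reference class j in W_i / W_j = delta_i / delta_j.
    """
    if len(avg_delays) != len(deltas):
        raise ValueError("one delay and one parameter per class are required")
    ref_w, ref_d = avg_delays[-1], deltas[-1]
    errors = []
    for w, d in zip(avg_delays, deltas):
        achieved = w / ref_w   # W_i / W_M
        target = d / ref_d     # delta_i / delta_M
        errors.append((achieved - target) / target)
    return errors


# Three classes with (delta_1, delta_2, delta_3) = (4, 2, 1); class 1 falls
# about 10% short of its target ratio of 4.
print(ratio_errors([36.0, 20.0, 10.0], [4, 2, 1]))
```

A signed error is reported so that under- and over-differentiation can be told apart; averaging absolute values over runs would be an equally reasonable choice.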
Notice that the PDD model is not always feasible. Because of the additional constraint imposed by the Conservation Law in priority queueing systems, a work-conserving scheduler that meets the constraint (1) may not exist. It is known that the average delay of a class has a minimum value determined by its inherent class load, and that this minimum can be achieved by the use of a strict priority
based scheduler over the classes. In the strict priority based scheduler, class $i$ cannot be serviced until the queues of classes $i+1, i+2, \ldots, M$ are all empty. Let $W_i^{sp}$ be the average delay of class $i$ under strict priority scheduling. The upper bound of the feasible delay ratio for a G/G/1 system with two classes is $W_1^{sp}/W_2^{sp}$ [7]. For an M/G/1 system with two classes, the upper bound is $1/(1-\rho)$ [3,12].
The PDD model requires the differentiated services to be predictable and controllable, in the sense that network operators should be able to adjust the service quality spacing between any two classes by setting the delay differentiation parameters, and that the average delay ratios of different classes should be consistent with their delay differentiation parameters in both long and short time-scales. Such consistency should also be maintained for individual packets departed successively from different classes. In addition, the service differentiation should be independent of the traffic characteristics of the class loads: regardless of the distributions of packet arrivals and sizes, the consistency should be maintained whenever the PDD model is feasible.
2.2. PDD scheduling algorithms
Since Dovrolis et al. formulated the PDD model in 1999 [5], many packet scheduling algorithms have been proposed for this model. The existing PDD algorithms can be classified into three categories: rate-based, time-dependent priority based, and Little’s Law based. Rate-based algorithms, as exemplified by BPR [6] and JoBS [14], adjust the service rate allocations of classes dynamically to meet the proportional delay differentiation constraints. BPR adjusts the service rate of a class according to its backlogged queue length, while JoBS allocates the service rate of a class based on delay predictions for its backlogged traffic. Other examples in this category include dynamic weighted fair queueing [13] and the proportional queue control mechanism [17].
Rate-based scheduling algorithms are able to provide different levels of QoS to different classes. However, the accuracy of their control over the delay ratio depends on class load conditions [6]. Due to the dynamic nature of Internet traffic, the class load distribution on a router tends to change quickly over time. This limits the applicability of the rate-based PDD algorithms.
Time-dependent priority based algorithms adjust the priority of a backlogged class according to the experienced delay of its head-of-line packet. WTP [7] and adaptive WTP [12] fall into this category. In WTP, the priority of a backlogged class is adjusted to be proportional to its head-of-line packet’s delay normalized with respect to its delay differentiation parameter. It uses a set of control variables $b_i$, $1 \le i \le M$, as scaling factors of the adjustment. Let $\hat{W}_i$ denote the delay of the head-of-line packet of class $i$. According to WTP, the priority $P_i$ of backlogged class $i$ is set to $b_i \hat{W}_i/\delta_i$ dynamically on the departure of each packet. The packet of the backlogged class with the highest priority is forwarded next. Albeit simple, WTP implements the PDD model only when the system utilization rate $\rho$
approaches unity [20,19]. In Section 4, we will also show that the class delay ratios it achieves in short time-scales exhibit high statistical variations. Note also that the WTP control parameters $b_i$ depend on the class load distribution. In [12], Leung et al. derived a necessary condition, with respect to the class load distribution, for feasible WTP control parameters to achieve desired class delay ratios. The derivation assumes that the arrivals of each traffic class form a Poisson process. They developed an adaptive WTP (AWTP) algorithm that adjusts the feasible set of control parameters $\{b_i\}$ according to the delay of the head-of-line packet in each class and the class load distribution. The authors demonstrated the accuracy and adaptivity of the algorithm, in comparison with WTP, under various system utilization rates and in both short and long time-scales. They argued that AWTP is also applicable to traffic with more realistic Pareto distributions; their simulation assumed a Pareto distribution with shape parameter $\alpha = 1.9$. We note that the shape parameter $\alpha$ characterizes the degree of self-similarity of network traffic: the larger $\alpha$ is, the less bursty and self-similar the traffic, as observed in trace studies [11]. In Section 4, we will show that AWTP fails to realize the PDD model for Pareto distributions with small $\alpha$.
The third category of PDD algorithms is based on Little’s Law, which relates the average queue length (in terms of the number of packets in the queue) to the average arrival rate and the average waiting time of packets. For a given packet arrival rate, these algorithms control the actual delay ratio between different classes by equalizing their queue lengths normalized with respect to the pre-defined delay differentiation parameters. The equalization is a feedback control process based on the average delay of the packets arrived in a time window.
PAD, HPD, and MDP are three representatives of this category, and the LAD algorithm proposed in this paper belongs to it as well. They differ in how the average delay is calculated. At time $t$, the packets of a class that arrived in a time window $[t-s, t]$ are in one of two states: departed or waiting in the queue. PAD considers only the average delay of the packets departed in the time window. It is capable of meeting the PDD model constraints in various load conditions, provided that the desired class delay ratios are feasible. However, because it ignores the waiting time of backlogged packets, PAD exhibits a pathological behavior in short time-scales: higher classes occasionally experience larger delays than lower classes. To solve this issue, HPD takes into account the average delay of departed packets and the delay of the head-of-line packet at the same time. Let $\tilde{W}_i$ denote the average delay of the departed packets of class $i$. HPD uses a simple linear function, $g\tilde{W}_i + (1-g)\hat{W}_i$ with $0 \le g \le 1$, to measure the queueing delay of class $i$. The weighting parameter $g$ can be adjusted according to network operators’ requirements. HPD reduces to PAD when $g = 1$ and to WTP when $g = 0$. HPD improves on the average control quality of PAD while avoiding its pathological behavior. However, in Section 4, we will show that the class delay ratios HPD achieves in short time-scales have large statistical variations.
MDP considers the delay of all packets of each class arrived in a time window $[t-s, t]$. In addition to the delay of these packets within the window, it also takes into account the estimated delay of backlogged packets in the future, $[t, \infty)$. In Section 4, we will show that MDP delivers performance comparable to HPD in both short and long time-scales, with smaller variations of the achieved class delay ratio. However, its performance deteriorates as the target quality spacing between the classes is enlarged.
Note that PAD, HPD, and MDP aim to equalize the normalized queue lengths of different classes based on heuristic delay information of arrived packets. LAD, presented in this paper, is instead based on a proof of Little’s Law [22]. It considers the delay of departed packets as well as the delay of the packets in the backlogged queue in the time window $[t-s, t]$.
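For comparison with LAD, the per-class delay estimates used by WTP, PAD, and HPD can be written side by side. This is a hypothetical sketch rather than any author’s code: the function names are illustrative, it assumes that each estimate is divided by $\delta_i$ to form a priority (the text states this normalization explicitly only for WTP), and WTP’s scaling factor $b_i$ defaults to 1.

```python
def wtp_priority(hol_delay, delta, b=1.0):
    """WTP: b_i * W^_i / delta_i, where W^_i is the head-of-line delay."""
    return b * hol_delay / delta


def pad_priority(avg_departed_delay, delta):
    """PAD: W~_i / delta_i, where W~_i is the average delay of departed packets."""
    return avg_departed_delay / delta


def hpd_priority(avg_departed_delay, hol_delay, delta, g):
    """HPD: (g * W~_i + (1 - g) * W^_i) / delta_i with 0 <= g <= 1."""
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must lie in [0, 1]")
    return (g * avg_departed_delay + (1.0 - g) * hol_delay) / delta


def pick(priorities):
    """Index of the class with the highest priority (backlogged classes only)."""
    return max(range(len(priorities)), key=lambda i: priorities[i])


# g = 1 recovers PAD and g = 0 recovers WTP (with b_i = 1):
print(hpd_priority(5.0, 3.0, 2.0, 1.0), pad_priority(5.0, 2.0))
print(hpd_priority(5.0, 3.0, 2.0, 0.0), wtp_priority(3.0, 2.0))
```

What to do for a class with no departed packets yet (so that $\tilde{W}_i$ is undefined) is left to the caller here.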
3. Little’s average delay algorithm
3.1. Little’s Law
For a G/G/1 queueing system, Little’s Law states that the average number of packets in the system is equal to the product of the average arrival rate of packets and the average waiting time of the packets in the system. Define $L(T)$ as the average number of packets in the system during the time interval $[0, T]$, $W(T)$ as the waiting time per packet averaged over all packets, and $\lambda(T)$ as the average arrival rate. Suppose $W(T)$ and $\lambda(T)$ have limits as $T \to \infty$, that is,
$$W = \lim_{T\to\infty} W(T) \quad \text{and} \quad \lambda = \lim_{T\to\infty} \lambda(T).$$
Then the limit of $L(T)$, denoted by $L$, exists and
$$L = \lambda W. \qquad (2)$$
The beauty of Little’s Law (2) is that it depends neither on any particular queueing discipline (packet scheduling algorithm) nor on any specific assumption regarding the packet arrival distribution or the packet size distribution. It is therefore applicable to the queueing system of each traffic class in the PDD model, and the LAD algorithm controls the delay ratios between different classes on this basis. Substituting $L_i/\lambda_i$ for $W_i$ in (1), i.e., $\frac{W_i}{W_j} = \frac{L_i/\lambda_i}{L_j/\lambda_j} = \frac{\delta_i}{\delta_j}$, the objective of the PDD model leads to a new constraint:
$$\frac{L_i}{\lambda_i \delta_i} = \frac{L_j}{\lambda_j \delta_j} \qquad (3)$$
for any two classes $i$ and $j$. That is, to ensure proportional delay differentiation between two classes, their queue lengths normalized with respect to their arrival rates and delay differentiation parameters should be kept equal. The LAD algorithm controls the delay ratio by adjusting the average queue lengths of the classes according to their arrival rates.
Notice that (2) reveals an asymptotic (or stationary) relationship between the queue length, packet arrival rate, and packet waiting time. It is not sufficient to guide PDD scheduling, because the proportional delay objective needs to be met in small time windows as well. Because most Web requests are small in size [1], provisioning relative delay differentiation in short time-scales is as important as in long time-scales. The LAD algorithm is based on a transient property of the queueing system, as revealed by a proof of Little’s Law [22]. The following is a sketch of the proof.
Suppose that packets $p_1, p_2, \ldots$ arrive at times $t_1, t_2, \ldots$ ($0 \le t_i < t_{i+1}$) and depart at times $t_1^d, t_2^d, \ldots$. The packets are not necessarily forwarded in FCFS order. Denote by $N(T)$ the total number of packets arrived in the time interval $[0, T]$, and by $N^d(T)$ and $N^c(T)$ the number of departed packets and the number of packets waiting in the queue, respectively. It follows that at time $T$,
$$N(T) = N^d(T) + N^c(T). \qquad (4)$$
Define $I_i(t)$ as the presence (indicator) function of packet $p_i$ at time $t$, that is,
$$I_i(t) = \begin{cases} 1, & \text{if packet } p_i \text{ is present at time } t, \\ 0, & \text{otherwise.} \end{cases}$$
Then we have
$$N^c(T) = \sum_{i=1}^{N(T)} I_i(T). \qquad (5)$$
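Since packet $p_i$ stays in the queue during the interval $[t_i, t_i^d]$ and its queueing delay is $w_i = t_i^d - t_i$, we have, for any packet with $t_i \le T$,
$$\int_0^T I_i(t)\,dt = \begin{cases} w_i, & t_i^d \le T, \\ T - t_i, & t_i^d > T. \end{cases} \qquad (6)$$
Therefore, the cumulative queue length in the interval $[0, T]$ is
$$\int_0^T N^c(t)\,dt = \int_0^T \sum_{i=1}^{N(T)} I_i(t)\,dt = \sum_{i=1}^{N^d(T)+N^c(T)} \int_0^T I_i(t)\,dt = \sum_{\{i:\, t_i^d \le T\}} w_i + \sum_{\{i:\, t_i \le T,\ t_i^d > T\}} (T - t_i), \qquad (7)$$
and the average queue length in the interval $[0, T]$ is
$$L(T) = \frac{1}{T}\int_0^T N^c(t)\,dt = \lambda(T)\,W(T), \qquad (8)$$
where
$$\lambda(T) = \frac{N(T)}{T}, \qquad (9)$$
$$W(T) = \frac{\sum_{\{i:\, t_i^d \le T\}} w_i}{N(T)} + \frac{\sum_{\{i:\, t_i \le T,\ t_i^d > T\}} (T - t_i)}{N(T)}. \qquad (10)$$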
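Because (8)–(10) hold for every finite $T$, not only in the limit, they can be checked numerically. The sketch below uses hypothetical function names: `lambda_times_w` evaluates $\lambda(T)W(T)$ from (9)–(10) using only per-packet arrival and departure times, while `time_avg_queue` independently integrates the step function $N^c(t)$ by sweeping arrival and departure events. The two agree whatever order the packets are served in.

```python
def lambda_times_w(packets, T):
    """lambda(T) * W(T) from Eqs. (9)-(10); packets are (t_i, t_i^d) pairs."""
    arrived = [(a, d) for a, d in packets if a <= T]
    n = len(arrived)                                      # N(T)
    departed = sum(d - a for a, d in arrived if d <= T)   # sum of w_i
    backlog = sum(T - a for a, d in arrived if d > T)     # sum of (T - t_i)
    lam = n / T                                           # Eq. (9)
    w = departed / n + backlog / n                        # Eq. (10)
    return lam * w


def time_avg_queue(packets, T):
    """L(T) = (1/T) * integral of N^c(t) over [0, T], via an event sweep."""
    events = [(a, +1) for a, d in packets if a <= T]
    events += [(d, -1) for a, d in packets if a <= T and d <= T]
    events.sort()
    area, level, last = 0.0, 0, 0.0
    for t, step in events:
        area += level * (t - last)   # N^c(t) is constant between events
        level += step
        last = t
    area += level * (T - last)       # tail segment up to T
    return area / T


# Two packets depart by T = 4; the third is still queued at T.
pkts = [(0.0, 2.0), (1.0, 3.0), (2.5, 6.0)]
print(lambda_times_w(pkts, 4.0), time_avg_queue(pkts, 4.0))
```

Both values equal 11/8 in this example (up to floating-point rounding). The backlogged packet contributes its waiting time so far, $T - t_i$, which is exactly the term LAD relies on.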
Assuming that $\lambda(T)$ and $W(T)$ have limits as $T \to \infty$, (8) leads to
$$L = \lim_{T\to\infty} \lambda(T)\,W(T) = \lambda W. \qquad (11)$$
This completes the proof.
3.2. The LAD algorithm
The basic idea of the LAD algorithm is to control the delay ratio of classes by monitoring their arrival rates and the queueing delays of their arrived packets, based on the transient relationship between the queue length, arrival rate, and waiting time revealed by (8). In particular, (10) defines the average waiting time per packet in a window of size $T$. The numerator of the first term represents the accumulated delays of all departed packets, and the numerator of the second term represents the accumulated waiting time, so far, of the packets in the backlogged queue at time $T$.
Accordingly, we define the LAD algorithm as follows. For class $i$, the LAD scheduler maintains three control variables to monitor its traffic flow over a finite time window $T$: the cumulative delay of departed packets $W_i^d$, the number of arrived packets $N_i$, and the current queue length $N_i^c$. These variables are (re)initialized at the beginning of each time window. Note that the size of $T$ is measured in terms of the number of successively departed packets from the system. The control variables are updated according to the following rules:
• At the beginning of each time window, $N_i \leftarrow N_i^c$ and $W_i^d \leftarrow 0$.
• Upon the receipt of a packet of class $i$, the packet is timestamped, $N_i \leftarrow N_i + 1$, and $N_i^c \leftarrow N_i^c + 1$.
• After transmitting a packet of class $i$, $N_i^c \leftarrow N_i^c - 1$ and $W_i^d \leftarrow W_i^d + w$, where $w$ is the measured delay of the packet.
Let $W_i^c$ denote the current cumulative delay of the backlogged packets in queue $i$. According to (10), we set the priority of class $i$ as
$$P_i = \frac{W_i^d + W_i^c}{N_i \delta_i}. \qquad (12)$$
Whenever the queueing system is available for packet transmission, a backlogged packet of class $j^*$ with the highest priority is selected. That is,
$$j^* = \arg\max_{1 \le i \le M} P_i. \qquad (13)$$
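The control-variable rules and the selection in (12)–(13) can be sketched as follows. This is an illustrative, hypothetical implementation (class and function names are not from the paper): it serves FCFS within each class, recomputes $W_i^c$ by scanning the queue instead of using a recursive update, leaves it to the caller to invoke `reset_window` after every $T$ departures, and breaks priority ties in favour of the earliest-arrived head-of-line packet, following the tie-breaking rule stated with (13).

```python
from collections import deque


class LadClass:
    """Per-class LAD state (hypothetical sketch; FCFS inside the class)."""

    def __init__(self, delta):
        self.delta = delta        # delay differentiation parameter delta_i
        self.queue = deque()      # arrival timestamps of backlogged packets
        self.n_arrived = 0        # N_i: packets counted in the current window
        self.w_departed = 0.0     # W_i^d: cumulative delay of departed packets

    def reset_window(self):
        """Start of a window: N_i <- N_i^c and W_i^d <- 0."""
        self.n_arrived = len(self.queue)
        self.w_departed = 0.0

    def arrive(self, now):
        """Timestamp the packet; N_i and N_i^c (= len(queue)) both grow by one."""
        self.queue.append(now)
        self.n_arrived += 1

    def depart(self, now):
        """Transmit the head-of-line packet and add its delay w to W_i^d."""
        w = now - self.queue.popleft()
        self.w_departed += w
        return w

    def priority(self, now):
        """Eq. (12): P_i = (W_i^d + W_i^c) / (N_i * delta_i).

        W_i^c is recomputed here by scanning the queue, for clarity.
        N_i >= len(queue) > 0 for any backlogged class, so no division by zero.
        """
        w_backlog = sum(now - t for t in self.queue)
        return (self.w_departed + w_backlog) / (self.n_arrived * self.delta)


def select_class(classes, now):
    """Eq. (13): backlogged class with maximal P_i; ties favour the class
    whose head-of-line packet arrived earliest. None if all queues are empty."""
    backlogged = [i for i, c in enumerate(classes) if c.queue]
    if not backlogged:
        return None
    return max(backlogged,
               key=lambda i: (classes[i].priority(now), -classes[i].queue[0]))


low, high = LadClass(2.0), LadClass(1.0)   # delta = 2 and delta = 1
low.arrive(0.0)
high.arrive(0.0)
print(select_class([low, high], now=1.0))  # 1: the delta = 1 class goes first
```

The queue scan makes each priority evaluation linear in the backlog; the recursive update of $W_i^c$ discussed below avoids that cost.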
Ties for the highest priority are broken by serving the packet that entered the queueing system earliest. Note that the validity of Little’s Law does not depend upon any particular queueing discipline. Therefore, the packet forwarded next from the selected class can be any of its backlogged packets if a more sophisticated intra-class scheduling policy is needed.
There are some important issues in the implementation of the LAD algorithm. The foremost is the choice of the time window
size $T$. It is known that Little’s Law holds when the time window is sufficiently large. However, provisioning PDD services in short time-scales is as important as in long time-scales. A good choice of $T$ should strike a balance between system stability and responsiveness. On the one hand, a large $T$ avoids abrupt changes of the average queueing delay due to bursty traffic; in particular, when $T$ is sufficiently large, the average delay of the packets in the time window hides the effect of the distributions of packet arrivals and packet sizes. On the other hand, a small $T$ leads to an agile scheduler that responds quickly to changes of traffic conditions. As we shall present in Section 4.4, LAD is able to provide PDD services in both long and short time-scales. We therefore believe that $T = 100$ packets is a good choice, since it gives LAD good responsiveness.
Another important implementation issue is the calculation of the cumulative delay of backlogged packets in each queue, $W_i^c$. It is too costly to scan each queue to recalculate $W_i^c$ every time a packet is to be transmitted and the priority of each class needs to be adjusted. Instead, we calculate $W_i^c$ recursively as follows. Suppose that at time $u$, when the last transmitted packet was selected from class $i$, the class has $m$ backlogged packets $p_1, p_2, \ldots, p_m$ with arrival times $t_1, t_2, \ldots, t_m$, respectively. Since the queueing system does not assume FCFS scheduling, the next packet to be selected for forwarding from class $i$ can be any packet in the queue. Without loss of generality, we assume that packet $p_k$ is forwarded at time $u + s$; that is, the time interval between the two successive packet departures is $s$. Suppose there are $n$ new packet arrivals during this interval, with arrival times $t_{m+1}, t_{m+2}, \ldots, t_{m+n}$. It follows that
$$W_i^c(u+s) = W_i^c(u) - (u - t_k) + (m-1)s + \sum_{j=1}^{n} (u + s - t_{m+j}). \qquad (14)$$
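The recursion (14) can be checked against a direct rescan of the queue. The helper names below are hypothetical; the first function implements (14) term by term, and the second is the costly scan that (14) is meant to replace. The example deliberately forwards a packet that is not at the head of the queue, since (14) does not assume FCFS service.

```python
def update_backlog_delay(wc_u, u, s, t_k, m, new_arrivals):
    """Eq. (14): advance W_i^c from time u to time u + s.

    wc_u         -- W_i^c(u), summed waiting time of the m packets queued at u
    t_k          -- arrival time of packet p_k, forwarded at u + s
    new_arrivals -- arrival times t_{m+1}, ..., t_{m+n} within (u, u + s]
    """
    return (wc_u
            - (u - t_k)                               # p_k leaves the backlog
            + (m - 1) * s                             # other old packets wait s longer
            + sum(u + s - t for t in new_arrivals))   # waiting so far of new arrivals


def backlog_delay_by_scan(arrival_times, now):
    """Reference value: rescan the queue and sum (now - t_j)."""
    return sum(now - t for t in arrival_times)


# Non-FCFS example: at u = 10 the queue holds packets that arrived at 2, 5, 7;
# the packet that arrived at 5 is forwarded at u + s = 12, and two packets
# arrive at 10.5 and 11 in between.
wc_10 = backlog_delay_by_scan([2.0, 5.0, 7.0], 10.0)               # 8 + 5 + 3 = 16
wc_12 = update_backlog_delay(wc_10, 10.0, 2.0, 5.0, 3, [10.5, 11.0])
print(wc_12, backlog_delay_by_scan([2.0, 7.0, 10.5, 11.0], 12.0))  # 17.5 17.5
```

Per departure, the update costs one subtraction, one multiplication, and one term per new arrival, which matches the cost argument that follows.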
Recall that the traffic of class $i$ has an arrival rate of $\lambda_i$, so during an interval of length $s$ the average number of class-$i$ packet arrivals is $\lambda_i s$. Note that $E[s]$ is the average service time of a packet; for a stable system, it should be less than or equal to the average inter-arrival time. Thus, for $E[n]$, the average number of packets entering the system during $s$, we have $E[n] = \sum_{i=1}^{M} E[n_i] \le 1$, where $E[n_i]$ is the average number of packets arrived from class $i$. Therefore, the update (14) adds on average at most one new term per departure, and its main computational cost is the multiplication $(m-1)s$, which is affordable in real environments [7].
For each packet transmission, LAD needs to calculate and compare the priorities of all backlogged classes, which requires at most $M$ priority calculations and $M-1$ comparisons. The remaining overhead is mainly due to the update of control variables and timestamping. The update cost is small because it involves only a few additions. Timestamping is needed so that the delay of a packet can be measured. As pointed out in [7], we do not expect this requirement to be an important
difficulty in practice. Furthermore, timestamping is required by all PDD algorithms. AWTP, WTP, PAD, HPD, and MDP also require $M-1$ comparisons to select the backlogged class with the highest priority. In addition, AWTP needs to record the number of arrived packets to estimate the load of every class, which requires a few additions. PAD needs to record the delay of all departed packets, again with a few additions. HPD is a combination of PAD and WTP; it requires the same number of additions as PAD, since WTP requires no such additions. MDP requires more operations than LAD, since it needs to estimate the average delay of backlogged packets as well as calculate the delay of all arrived packets. In summary, the complexity of LAD is smaller than that of MDP, similar to those of PAD and HPD, and slightly larger than those of AWTP and WTP.
4. Simulation results
In this section, we present simulation results of LAD to demonstrate its performance and properties. We first investigate the predictability and controllability of LAD under different system conditions (i.e., class load distribution and system utilization) and in various time-scales. Then, we illustrate its generality by using arbitrary distributions of packet arrivals and sizes. Finally, we compare LAD with other PDD algorithms, including WTP, AWTP, PAD, HPD, and MDP. A primary performance metric is the error between the desired class delay ratio and the achieved ratio. The results are averages over 1000 runs. The experiments are based on a model that consists of three main components:
• $M$ packet generators, which generate packets with independent arrival and size distributions by using the GNU Scientific Library [8].
• $M$ packet queues, which hold the packets of the corresponding classes. Although LAD imposes no requirement on packet scheduling within each queue, the simulator uses the FCFS principle to ensure the same service order in different runs.
• A packet scheduler that performs LAD and other PDD algorithms under various settings of the proportional delay differentiation parameters.
The distributions of packet arrivals and sizes in the experiments are similar to those in [7,12]. That is, the inter-arrival times between packets of a class are independent and identically distributed (i.i.d.) random variables following a Pareto or an exponential distribution (i.e., Pareto or Poisson arrivals). The packets are either uniform in size or variable with a small number of choices. The transmission time of a packet is proportional to its size. The probability density function $P(y)$ of the Pareto distribution and its mean $\mu$ are
$$P(y) = \frac{\alpha\beta^{\alpha}}{y^{\alpha+1}}, \quad y \ge \beta, \qquad (15)$$
$$\mu = \frac{\alpha\beta}{\alpha-1}, \qquad (16)$$
where $\alpha$, $0 < \alpha < 2$, is the shape parameter (also called the “tail” index) and $\beta$ is the scale parameter; the mean (16) is finite only for $\alpha > 1$. It is known that $\alpha$ characterizes the degree of self-similarity of network traffic: the larger $\alpha$ is, the less bursty and self-similar the traffic, as observed in trace studies [11]. As in [7], we set the shape parameter $\alpha = 1.5$ in the experiments. With the Pareto distribution, the system utilization is controlled by adjusting $\beta$. When $\alpha = 1.5$, by (16), the system utilization rate is
$$\rho = \frac{1}{\mu} = \frac{\alpha-1}{\alpha\beta} = \frac{1}{3\beta}. \qquad (17)$$
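Inverting (17) gives the scale parameter for a target utilization, $\beta = (\alpha-1)/(\alpha\rho)$. The sketch below is a hypothetical helper, not the paper’s simulator (which used the GNU Scientific Library). Like (17), it assumes that the utilization equals the inverse of the mean inter-arrival time, i.e., a unit mean service time. It draws samples with Python’s standard `random.Random.paretovariate`, which samples a Pareto law with scale 1, so multiplying by $\beta$ yields the density (15). For $\alpha = 1.5$ the variance is infinite, so sample means converge slowly.

```python
import random


def scale_for_utilization(rho, alpha=1.5):
    """Pareto scale beta such that rho = 1/mu with mu = alpha*beta/(alpha - 1).

    Like Eq. (17), this assumes the utilization equals the inverse of the mean
    inter-arrival time (unit mean service time); alpha must exceed 1.
    """
    if alpha <= 1:
        raise ValueError("the Pareto mean is infinite for alpha <= 1")
    if not 0 < rho < 1:
        raise ValueError("rho must lie in (0, 1)")
    return (alpha - 1) / (alpha * rho)


def pareto_interarrivals(n, alpha, beta, rng):
    """n i.i.d. Pareto(alpha, beta) inter-arrival times, each >= beta.

    random.Random.paretovariate(alpha) samples the Pareto law with scale 1,
    so multiplying by beta gives the density of Eq. (15).
    """
    return [beta * rng.paretovariate(alpha) for _ in range(n)]


beta = scale_for_utilization(0.9)      # alpha = 1.5, so beta = 1 / (3 * 0.9)
gaps = pareto_interarrivals(5, 1.5, beta, random.Random(1))
print(round(beta, 4), min(gaps) >= beta)
```

Passing an explicit `random.Random` instance keeps runs reproducible, which matters when results are averaged over many seeded runs.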
Finally, we note that a possible performance factor of the PDD algorithms is the class load distribution. The load of a traffic class is measured in terms of service time; when the packets in different classes have the same size, the load distribution between two classes $i$ and $j$ equals the ratio of $\lambda_i$ to $\lambda_j$.
4.1. Predictability of LAD
We investigated the predictability of LAD in experiments with three classes of Pareto-distributed traffic ($\alpha = 1.5$). Their delay differentiation parameters $(\delta_1, \delta_2, \delta_3)$ were set to $(4, 2, 1)$, and the class load distribution $(\lambda_1, \lambda_2, \lambda_3)$ was varied among $(1, 1, 1)$, $(1, 2, 4)$, and $(4, 2, 1)$. We obtained simulation results in short ($T = 100$), moderate ($T = 1000$), and long ($T = 10{,}000$) time-scales as the system utilization rate $\rho$ varies. Due to space limitations, Fig. 1 shows only the results for short and long time-scales. From Fig. 1, we can see that LAD achieves the desired delay ratios accurately in long time-scales in all test cases, independent of class load distributions. As the time-scale decreases, the error between the achieved and desired delay ratios under moderate system utilization rates is no longer negligible, particularly for the desired delay ratio of 4. This is mainly because of the difference between the system utilization in short and long time-scales. Although the experiments assumed stable system utilization rates in the long run, these rates were hardly maintained accurately in the short run. Due to the burstiness of Internet traffic, the transient system utilization rates in short time windows were often lower than the stationary rates. This means that even when the desired delay ratio can be achieved in long time-scales, it may be infeasible in short time-scales. For example, when the long-run system utilization is 65%, the transient system utilization in short time-scales in most runs would be too low to ensure the feasibility of the desired delay ratio of 4 between class 1 and class 3.
When the long-run utilization reaches 80%, the transient utilization rates in most runs become high enough to achieve the target delay ratios.
Fig. 1. Delay ratios of three classes using LAD in different system utilizations and time-scales. (a) T = 100 packets; (b) T = 10,000 packets.
Fig. 2. Individual packet delays of three classes using LAD in different system utilizations. (a) $\rho = 90\%$; (b) $\rho = 65\%$.
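The time-scale $T$ in these experiments counts packets. The following is a minimal sketch of how an achieved delay ratio could be measured per window, assuming the ratio is taken between the average queueing delays of packets departing within consecutive, non-overlapping windows of $T$ packets; the section does not restate its exact measurement procedure, and the function name is hypothetical.

```python
from collections import defaultdict


def windowed_delay_ratios(departures, T, cls_i, cls_j):
    """Achieved ratio of average delays (class i over class j) per window.

    `departures` is a sequence of (class_id, queueing_delay) pairs in
    departure order.  Consecutive, non-overlapping windows of T packets are
    used; a trailing partial window is dropped, and windows in which either
    class has no departures (or class j has zero delay) are skipped because
    the ratio is undefined there.
    """
    ratios = []
    for start in range(0, len(departures) - T + 1, T):
        totals = defaultdict(float)
        counts = defaultdict(int)
        for cls, delay in departures[start:start + T]:
            totals[cls] += delay
            counts[cls] += 1
        if counts[cls_i] and counts[cls_j] and totals[cls_j] > 0:
            mean_i = totals[cls_i] / counts[cls_i]
            mean_j = totals[cls_j] / counts[cls_j]
            ratios.append(mean_i / mean_j)
    return ratios


if __name__ == "__main__":
    # Toy trace: class 1 waits 8 or 10 units, class 2 waits 4 or 5 units.
    trace = [(1, 8.0), (2, 4.0), (1, 10.0), (2, 5.0)] * 50   # 200 departures
    print(windowed_delay_ratios(trace, T=100, cls_i=1, cls_j=2))  # [2.0, 2.0]
```

Percentiles of the returned list, such as the 5th, 50th, and 95th used in Section 4.4.2, summarize how tightly a scheduler holds the ratio at a given time-scale.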
Note that the dependence of LAD's feasibility on the system utilization rate does not mean that LAD requires an accurate estimate of the system utilization rate. This can be seen from the results for the class delay ratio of 2 in Fig. 1: as long as the utilization is high enough for the PDD ratio to be feasible, LAD achieves the ratio in both short and long time-scales, independent of class load distributions. To investigate the behavior of individual packets from different classes, Fig. 2 plots the individual delays of packets that departed between the 3000th and 4000th time units. Fig. 2(a) shows the results when the system utilization rate was set to 90% and all classes had the same load (i.e., $\lambda_1 = \lambda_2 = \lambda_3$). From this figure, we can see that individual packets of a higher class exhibit smaller delay variations than those of a lower class in both the time dimension (x axis) and the delay dimension (y axis). We refer to this as a “scale-difference”. For example, in the delay dimension, the delays of individual packets in class 3 range from 15 to 55 time units, while those of class 1 range from 110 to 210 time units. This scale-difference suggests that when each class has backlogged packets in queue, the packets
in class 3 (the higher quality class) tend to be selected with a higher probability than those in class 1. In the time dimension, the periods of monotonic delay increase or decrease in class 3 last between 10 and 50 time units, while those of class 1 last between 50 and 150 time units. This time-wise scale-difference implies that the backlogged packets in class 3 tend to be forwarded at a faster rate than those in class 1. When the system load is low, the scale-difference phenomenon is not obvious in the time dimension, but it remains evident in the delay dimension. Fig. 2(b) depicts the individual packet delays when the system utilization was set to 65%. In general, the queues in a lightly utilized system are short and their length differences are often small. This leads to short periods of monotonic change in each class in the time dimension and minimal differences between those periods. In the delay dimension, the backlogged packets of a higher class still have a higher probability of being forwarded than those of a lower class and therefore experience smaller delays. In Fig. 2(a) and (b), there are no signs of the pathological behavior among the packets in different classes. In summary, we conclude that LAD is capable of providing predictable delay differentiation services in both short and long time-scales. In heavy load conditions, packets in different classes receive different qualities of service, consistent with their per-class differentiation parameters.

4.2. Controllability of LAD

We studied the controllability of LAD through an experiment over two classes of traffic with an equal load distribution (i.e., $\lambda_1 = \lambda_2$). Fig. 3 plots the achieved class delay ratios in various time-scales, in comparison with the desired delay ratios, as the system utilization rate changes. From this group of figures, it can be observed that LAD achieves the target delay ratio of 2 in all the time-scales that we tested under medium or high system utilization rates. When the desired delay differentiation ratio $\delta_1/\delta_2$ increases to 4, a marginal error (less than 10%) occurs. Fig. 3(b) shows that when the desired ratio goes up to 8, LAD cannot meet the PDD constraints in the long time-scale unless the system utilization rate is higher than 70%. As noted earlier, this is because the ratio is infeasible in the long time-scale when $\rho \le 70\%$. Whenever
the ratio is feasible, the service difference between the two classes can be controlled accurately by network operators. As the desired class delay ratio $\delta_1/\delta_2$ increases, the minimum system utilization rate that makes the given delay ratio feasible also increases. Comparing Fig. 3(b) with Fig. 3(a), it can be seen that the error of the achieved ratio decreases as the time-scale increases. For example, the delay ratio $\delta_1/\delta_2 = 16$ cannot be achieved in the short time-scale of 100 packets unless the system utilization rate $\rho \ge 85\%$. When the time-scale $T$ increases to 10,000 packets, the desired ratio can be approximated once $\rho \ge 75\%$. This is due to the difference between the transient system utilization in short time-scales and the stationary system utilization in long time-scales, as explained in Section 4.1. Since the transient utilization rate is often smaller than the stationary utilization rate for bursty traffic, a delay ratio that is feasible in a short time-scale tends to be feasible in long time-scales too under the same traffic conditions.

4.3. Generality of LAD
Fig. 3. Delay ratios of class 1 to class 2 using LAD in different system utilizations and time-scales. The desired delay ratios are 2, 4, 8, 16, and 32. (a) T = 100 packets; (b) T = 10,000 packets.
Recall that the generality of a PDD algorithm means that its performance should be independent of the distributions of packet arrivals and sizes. The results in the preceding experiments are for Pareto-distributed traffic with a uniform packet size. We studied the generality of LAD through experiments over two classes with equal class loads in the long time-scale. In addition, we assumed that the packet arrivals of each class followed a Pareto or a Poisson distribution and that the packet sizes of each class were variable. The variable packet sizes were set in the same pattern as in [6]: 40% of the packets were 40 bytes, 50% were 550 bytes, and 10% were 1500 bytes. The transmission time of a packet of average size ($0.4 \times 40 + 0.5 \times 550 + 0.1 \times 1500 = 441$ bytes) is referred to as one time unit. We carried out the experiments for traffic with different packet arrival distributions: Pareto, Poisson, and mixed. In the mixed arrival distribution, we assumed that packets of class 1 followed a Poisson distribution and packets of class 2 were Pareto distributed. Fig. 4 presents the simulation results of the experiments with variable packet sizes. To measure the actual feasible range for the same packet stream, the results of the strict priority-based algorithm are included as well. From Fig. 4, we can observe that the feasible delay ratio increases with the system utilization rate and that whenever the desired ratio is feasible, LAD achieves it accurately; otherwise, the ratio achieved by LAD is very close to that measured under the strict priority-based algorithm. For example, Fig. 4(a) shows that the maximum achievable delay ratio is 22 when the system utilization rate $\rho = 85\%$. Although the desired ratio of 32 is infeasible, LAD achieves the maximum feasible ratio. Note that in Fig. 4(b), the feasible ranges differ from those calculated using $1/(1-\rho)$. This is due to the difference between the transient system utilization rate in a short time window and the stationary utilization rate in the long run. This difference becomes smaller as the time window grows, so the maximum feasible delay ratio approaches the theoretical upper bound. For example, with the upper bound of 10 at $\rho = 90\%$, when the time window increases from 100,000 to 1,000,000 packets, the measured feasible ratio increases slightly from 9.65 to 9.84. In summary, Fig. 4 shows that, whenever the desired delay ratio is feasible, the performance of LAD is not affected by system utilization rates and is independent of the packet arrival distribution. In comparison with the results for uniform packet sizes in the preceding experiments, we can also observe independence of packet sizes. This generality of LAD is attributed to the generality of Little's Law.

Fig. 4. Delay ratios of class 1 to class 2 using LAD and priority-based packet scheduling algorithms. The packets have various sizes. (a) Pareto arrival distribution; (b) Poisson arrival distribution; (c) Mixed arrival distribution.

4.4. Comparison with other PDD algorithms
We compared LAD with other PDD algorithms, including WTP, AWTP, PAD, HPD, and MDP. In the experiments, we assumed two classes of traffic with equal class loads. The packet arrivals of each class followed a Pareto distribution ($\alpha = 1.5$), and all packets had the same size. We generated a packet stream beforehand and used the same stream in all experiments with the different PDD algorithms. Recall that AWTP adjusts the feasible set of control parameters according to the delay of the head-of-line packet in each class and the system utilization. Our implementation of AWTP used the jumping window method, as suggested in [12], to estimate the arrival rate of traffic. HPD is a hybrid of WTP and PAD with a weighting parameter $g$. We set $g$ to 0.875, as recommended in [7]. MDP takes into account the delays of departed packets and the estimated delays of all waiting packets when determining class priorities. Although the MDP authors suggested a simplified method of approximating the average delay of all arrived packets as a tradeoff between quality and run-time overhead [18], we implemented its original version in this experiment.

4.4.1. Comparison in short time-scales

We first compared the short time-scale performance of the algorithms under different system utilization rates. The time window was set to $T = 100$ packets. The simulation results for $\delta_1/\delta_2 = 2$ and 8 are plotted in Fig. 5(a) and (b), respectively. Fig. 5(a) shows that all the PDD algorithms except AWTP meet the PDD constraints to an acceptable extent for a small delay ratio under moderate and high system load conditions. In particular, LAD consistently achieves the desired delay ratio with minimal error. In contrast, HPD and MDP perform well under moderate load conditions but yield relatively large errors when the system utilization rate goes up to 90%. Recall that HPD is a hybrid of WTP and PAD.
Both WTP and PAD gain performance as the utilization rate increases, but at different rates. Hence, a linear combination of WTP and PAD with a constant weighting parameter $g$, as in HPD, is expected to produce a convex performance curve with respect to the utilization rate. The impact of this linear combination can be seen more clearly in Fig. 5(b), for the case of a large desired delay ratio.
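The HPD selection rule discussed above can be sketched as follows, assuming the formulation of [7] as we understand it: WTP ranks each backlogged class by the waiting time of its head-of-line packet normalized by $\delta_i$, PAD ranks it by the average delay of its departed packets normalized by $\delta_i$, and HPD serves the class with the largest weighted sum of the two terms. The dictionary-based interface and the name `hpd_select` are hypothetical.

```python
def hpd_select(backlogged, hol_wait, avg_delay, delta, g=0.875):
    """Class that an HPD-style scheduler would serve next (sketch).

    backlogged   : iterable of class ids that currently have queued packets
    hol_wait[i]  : waiting time of class i's head-of-line packet (WTP term)
    avg_delay[i] : average delay of class i packets departed so far (PAD term)
    delta[i]     : delay differentiation parameter of class i
    g            : weight of the WTP term; g = 1 reduces to WTP, g = 0 to PAD
    """
    def priority(i):
        wtp_term = hol_wait[i] / delta[i]     # normalized head-of-line wait
        pad_term = avg_delay[i] / delta[i]    # normalized average delay
        return g * wtp_term + (1.0 - g) * pad_term

    # Serve the backlogged class with the largest combined priority.
    return max(backlogged, key=priority)


if __name__ == "__main__":
    delta = {1: 2.0, 2: 1.0}          # class 1 should see twice class 2's delay
    hol_wait = {1: 10.0, 2: 6.0}
    avg_delay = {1: 20.0, 2: 8.0}
    for g in (1.0, 0.875, 0.0):       # WTP, HPD with g = 0.875, PAD
        print(g, hpd_select([1, 2], hol_wait, avg_delay, delta, g))
```

With these example values, WTP ($g = 1$) and HPD ($g = 0.875$) pick class 2, while PAD ($g = 0$) picks class 1. Because $g$ is fixed, HPD blends the two policies in the same proportion at every utilization level, even though the two policies improve at different rates.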
Fig. 5. Delay ratios of class 1 to class 2 using different PDD algorithms in different system utilizations. The classes have the same load and the time window is 100 packets. (a) $\delta_1/\delta_2 = 2$; (b) $\delta_1/\delta_2 = 8$.

The reason for the inaccuracy of MDP in highly utilized systems is the error in estimating, at any time $t$, the delays of backlogged packets over the time window $[t-\tau, \infty)$. Although MDP can measure the delays of packets within the window $[t-\tau, t]$, it uses a lower bound to estimate the delays of those packets over the future interval $[t, \infty)$. As the system utilization increases, more packets are backlogged during the interval $\tau$, and consequently the estimation error increases. When the system utilization rate goes beyond a certain point, the impact of the estimation error becomes significant and the overall performance of MDP starts to deteriorate. Fig. 5(b) shows that the estimation error is amplified in the case of a large desired delay ratio, and the gap between LAD and MDP widens.

Fig. 5 also shows that WTP yields relatively large errors when the system utilization rate is moderate. This is consistent with the findings of other researchers [7,12]. AWTP was proposed as a remedy for this problem [12]. It relies on a policy iteration algorithm to adjust the feasible set of control parameters according to the delay of the head-of-line packet in each class and the class load distributions. The algorithm is based on the assumption that the arrival process of each traffic class is Poisson. Its authors showed that AWTP is applicable to traffic with a Pareto distribution of shape parameter $\alpha = 1.9$. We note that the shape parameter $\alpha$ characterizes the degree of self-similarity and burstiness of network traffic: the larger $\alpha$, the less bursty and self-similar the behavior observed in trace studies [11]. Given that $0 < \alpha < 2$, a Pareto distribution with $\alpha = 1.9$ bears much resemblance to a Poisson distribution. Fig. 5 shows the results for a Pareto distribution with $\alpha = 1.5$. We experimented with both AWTP and LAD for more Pareto distributions with various values of $\alpha$ and plot the results in Fig. 6. From this figure, we observe that the control parameters of AWTP are unable to meet the PDD constraints for general Pareto-distributed traffic. By contrast, LAD is insensitive to the shape of the Pareto distribution. From Fig. 5 we can also observe that the performance of AWTP is close to that of LAD when the system utilization is as high as 95%. This is because a heavily loaded system can be described by a fluid model, which is independent of the distribution of packet arrivals [16].
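The experiments drive the schedulers with Pareto traffic of shape $\alpha$. As a minimal sketch of one way to generate such a stream, assuming i.i.d. Pareto interarrival times scaled to a target arrival rate (the section does not specify how its generator was parameterized), the code below uses Python's standard `random.Random.paretovariate`, which samples a Pareto distribution with minimum value 1. The helper names are hypothetical.

```python
import random


def pareto_scale_for_rate(alpha, rate):
    """Minimum gap x_m that gives Pareto(alpha, x_m) gaps a mean of 1/rate.

    A Pareto distribution has mean alpha * x_m / (alpha - 1), finite only for
    alpha > 1; its variance is infinite whenever alpha <= 2.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for a finite mean")
    return (alpha - 1.0) / (alpha * rate)


def pareto_arrivals(n, alpha, rate, seed=None):
    """Arrival times of n packets with i.i.d. Pareto interarrival gaps."""
    rng = random.Random(seed)
    x_m = pareto_scale_for_rate(alpha, rate)
    t, times = 0.0, []
    for _ in range(n):
        # paretovariate(alpha) has minimum 1, so scale it by x_m.
        t += x_m * rng.paretovariate(alpha)
        times.append(t)
    return times


if __name__ == "__main__":
    for alpha in (1.5, 1.9):
        ts = pareto_arrivals(10_000, alpha, rate=0.5, seed=42)
        gaps = [b - a for a, b in zip([0.0] + ts, ts)]
        print(f"alpha={alpha}: mean gap {sum(gaps) / len(gaps):.2f}, "
              f"largest gap {max(gaps):.1f}")
```

Since the variance is infinite for $\alpha \le 2$, a smaller $\alpha$ tends to produce many short gaps punctuated by occasional very long ones, which is the kind of burstiness that a Poisson-based parameter setting does not anticipate.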
Fig. 6. Impact of the shape parameter $\alpha$ on AWTP when applied to Pareto-distributed traffic. $\delta_1/\delta_2 = 2$.

4.4.2. Comparison in long time-scales

We compared LAD with the other PDD algorithms, focusing on their robustness in different time-scales. The experiment settings remain the same as in the previous experiment, except that the system utilization rate is fixed at 90%. Fig. 7(a) and (b) show three percentiles (the 5th, 50th, and 95th) of the achieved delay ratios for target ratios of 2 and 8, respectively. For some of the large percentiles, the values are given directly in the figures. Fig. 7 shows that LAD achieves the target ratios accurately in all of the time-scales that we tested and consistently outperforms its competitors in terms of the errors at the various percentiles. This implies that LAD is more robust in keeping the class delay ratio under control and in delivering the desired ratio with small statistical variations. Although all the algorithms meet the PDD constraints in terms of their medians with small deviations in long time-scales, LAD stands out in providing tight and robust control, in a statistical sense, over the class delay ratio in short time-scales.

Fig. 7. Percentiles of achieved delay ratios using different PDD algorithms in different time-scales. The two classes have the same load and the system utilization rate is 90%. (a) $\delta_1/\delta_2 = 2$; (b) $\delta_1/\delta_2 = 8$.

Fig. 7(a) shows that all the PDD algorithms except MDP achieve the delay ratio of 2 with a high probability in the short time-scale of 100 packets. LAD demonstrates excellent robustness: more than 90% of the runs produce ratios between 1.6 and 2.4. MDP is robust as well, but its achieved ratios center around 1.6. In contrast, AWTP, HPD, and PAD exhibit a “heavy tail” property: under their control, the majority of the runs lead to delay ratios close to the target ratio of 2, but the algorithms occasionally lose control. Fig. 7(a) also shows that the success probability of the algorithms increases with the time-scale. In the long time-scale of 10,000 packets, all the algorithms achieve the target delay ratio robustly. Comparing with Fig. 7(b), we observe that all the PDD algorithms lose a certain degree of robustness when the desired delay ratio $\delta_1/\delta_2$ is large. In the short time-scale
of 100 packets, LAD performs slightly better than WTP and AWTP, but significantly outperforms PAD, HPD, and MDP in terms of their medians. This also indicates that 100 packets is a good choice of $T$. The good performance of WTP and AWTP is mainly due to the high utilization (90%) that we assumed in this experiment. WTP and AWTP provide consistent levels of QoS, independent of the desired delay ratio, because they use extra control parameters to adjust the impact of the predefined delay ratio. However, they lack robustness, since their achieved ratios show large statistical variations around the median. PAD, HPD, and MDP behave similarly to LAD; they differ in how they estimate the delays of arrived packets. Fig. 7(b) shows that their performance gap in short time-scales widens as the delay ratio increases. As the time-scale increases, all the PDD algorithms gain more control over the delay ratio. In the long time-scale of 10,000 packets, LAD provides levels of QoS similar to those of HPD and MDP.

We conclude that in short time-scales, LAD consistently outperforms its competitors for large target delay ratios, and that 100 packets is a good choice of the time window size $T$. For small target ratios, most of the algorithms can provide an acceptable quality of service. Under heavy load conditions and in long time-scales, LAD performs similarly to HPD and MDP. WTP, AWTP, and PAD are not as robust as the others because of their large statistical variations.

5. Conclusions

We have proposed a new proportional delay differentiation scheduling algorithm, called LAD, to implement the PDD model. The algorithm is derived from a proof of Little's Law. It monitors the arrival rate of the packets in each traffic class and their cumulative delays, and it achieves the desired class delay ratios in both short and long time-scales. Simulation results have shown that LAD is able to meet the PDD constraints, independent of the distributions of packet arrivals and packet sizes.
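LAD's generality is attributed to Little's Law. The sketch below is not LAD. It only illustrates, on a hypothetical single FIFO queue, the sample-path form of Little's result [15,22]: over a horizon that starts and ends with an empty system, the time-average number of packets present, $L$, equals the arrival rate $\lambda$ times the mean time in system, $W$, whatever the arrival and service patterns. The function names and the toy trace are illustrative assumptions.

```python
def fifo_sojourns(arrivals, services):
    """Time each packet spends in a single FIFO server (wait + service)."""
    free_at, sojourns = 0.0, []
    for a, s in zip(arrivals, services):     # arrivals sorted ascending
        start = max(a, free_at)              # wait if the server is busy
        free_at = start + s
        sojourns.append(free_at - a)
    return sojourns


def time_average_in_system(arrivals, departures, horizon):
    """Integrate N(t), the number of packets present, over [0, horizon]."""
    events = sorted([(t, 1) for t in arrivals] + [(t, -1) for t in departures])
    area, n, last = 0.0, 0, 0.0
    for t, step in events:
        area += n * (t - last)               # N(t) is constant between events
        n, last = n + step, t
    area += n * (horizon - last)
    return area / horizon


if __name__ == "__main__":
    arrivals = [0.0, 1.0, 1.5, 4.0]
    services = [2.0, 1.0, 1.0, 0.5]
    horizon = 5.0                            # system is empty at 0 and at 5
    w = fifo_sojourns(arrivals, services)
    departures = [a + x for a, x in zip(arrivals, w)]
    L = time_average_in_system(arrivals, departures, horizon)
    lam = len(arrivals) / horizon
    W = sum(w) / len(w)
    print(L, lam * W)                        # both equal 1.4 (up to rounding)
```

Nothing in the identity depends on how the arrivals or service times were drawn, which is the property that lets a Little's-Law-based estimator stay insensitive to arrival and size distributions.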
In comparison with other PDD algorithms, LAD provides the same level of service quality in long time-scales and more accurate and robust control over the delay ratio in short time-scales. Our future work will focus on the proportional loss differentiation model and on combining it with the PDD model. Requirements for absolute QoS, such as end-to-end delay, will also be investigated.

References

[1] M. Arlitt, C.L. Williamson, Internet web servers: workload characterization and performance implications, IEEE/ACM Transactions on Networking 5 (5) (1997) 631–645.
[2] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss, An Architecture for Differentiated Services, IETF, Request for Comments 2475, December 1998.
[3] G. Bolch, S. Greiner, H. de Meer, K.S. Trivedi, Queueing Networks and Markov Chains, John Wiley & Sons, 1999.
[4] R. Braden, D. Clark, S. Shenker, Integrated services in the Internet architecture: an overview, Request for Comments 1633, June 1994.
[5] C. Dovrolis, P. Ramanathan, A case for relative differentiated services and the proportional differentiation model, IEEE Network 13 (5) (1999) 26–34.
[6] C. Dovrolis, D. Stiliadis, P. Ramanathan, Proportional differentiated services: delay differentiation and packet scheduling, in: Proceedings of SIGCOMM, 1999, pp. 109–120.
[7] C. Dovrolis, D. Stiliadis, P. Ramanathan, Proportional differentiated services: delay differentiation and packet scheduling, IEEE/ACM Transactions on Networking 10 (1) (2002) 12–26.
[8] Free Software Foundation, GSL – GNU Scientific Library.
[9] J. Heinanen, F. Baker, W. Weiss, J. Wroclawski, Assured forwarding PHB group, Network Working Group, Request for Comments 2597, June 1999.
[10] V. Jacobson, K. Nichols, K. Poduri, An expedited forwarding PHB, Network Working Group, Request for Comments 2598, 1999.
[11] W.E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the self-similar nature of Ethernet traffic (extended version), IEEE/ACM Transactions on Networking 2 (1) (1994) 1–15.
[12] M.K. Leung, J.C. Lui, D.K. Yau, Adaptive proportional delay differentiated services: characterization and performance evaluation, IEEE/ACM Transactions on Networking 9 (6) (2001) 801–817.
[13] C.-C. Li, S.-L. Tsao, M.C. Chen, Y. Sun, Y.-M. Huang, Proportional delay differentiation service based on weighted fair queueing, in: Proceedings of IEEE International Conference on Computer Communications and Networks (ICCCN), October 2000, pp. 418–423.
[14] J. Liebeherr, N. Christin, JoBS: joint buffer management and scheduling for differentiated services, in: Proceedings of IWQoS 2001, Karlsruhe, Germany, June 2001, pp. 404–418.
[15] J.D. Little, A proof of the queueing formula $L = \lambda W$, Operations Research 9 (1961) 383–387.
[16] V. Misra, W.-B. Gong, D. Towsley, Fluid-based analysis of a network of AQM routers supporting TCP flows with an application to RED, in: Proceedings of SIGCOMM, 2000, pp. 151–160.
[17] Y. Moret, S. Fdida, A proportional queue control mechanism to provide differentiated services, in: Proceedings of International Symposium on Computer and Information System (ISCIS), Belek, Turkey, October 1998.
[18] T. Nandagopal, N. Venkitaraman, R. Sivakumar, V. Bharghavan, Delay differentiation and adaptation in core stateless networks, in: Proceedings of IEEE Infocom, Tel-Aviv, Israel, April 2000, pp. 421–430.
[19] R.D. Nelson, Heavy traffic response times for a priority queue with linear priorities, Operations Research 38 (3) (1990) 560–563.
[20] A. Netterman, I. Adiri, A dynamic priority queue with general concave priority functions, Operations Research 27 (6) (1979) 1088–1100.
[21] K. Nichols, S. Blake, F. Baker, D.L. Black, Definition of the differentiated services field (DS Field) in the IPv4 and IPv6 headers, Network Working Group, Request for Comments 2474, December 1998.
[22] S.J. Stidham, A last word on $L = \lambda W$, Operations Research 22 (2) (1974) 417–421.
Jianbin Wei received the BS degree in computer science from Huazhong University of Science and Technology, China, in 1997, and the MS and PhD degrees in computer engineering from Wayne State University in 2003 and 2006, respectively. His research interests are in computer communications and networks, distributed and Internet computing systems.
Cheng-Zhong Xu received the BS and MS degrees in computer science from Nanjing University in 1986 and 1989, respectively, and the PhD degree in computer science from the University of Hong Kong in 1993. He is an Associate Professor in the Department of Electrical and Computer Engineering of Wayne State University. His research interests are in distributed and parallel systems, particularly in resource management for high performance cluster and grid computing and scalable and secure Internet services. He has published more than 100 peer-reviewed articles in journals and conference proceedings in these areas. He is the author of the book Scalable and Secure Internet Services and Architecture (CRC Press, 2005) and a coauthor of the book Load Balancing in Parallel Computers: Theory and Practice (Kluwer Academic, 1997). He serves on the editorial boards of J. of Parallel and Distributed Computing, J. of Parallel, Emergent, and Distributed Systems, J. of High Performance Computing and Networking, and J. of Computers and Applications. He was the founding program co-chair of the International Workshop on Security in Systems and Networks (SSN), the general co-chair of the IFIP 2006 International Conference on Embedded and Ubiquitous Computing (EUC’06), and a member of the program committees of numerous international conferences. His research was supported in part by the US National Science Foundation, NASA, and Cray Research. He is a recipient of the Faculty Research Award of Wayne State University in 2000, the President’s Award for Excellence in Teaching in 2002, and the Career Development Chair Award in 2003. He is a senior member of the IEEE.
Xiaobo Zhou received the BS, MS, and PhD degrees in computer science from Nanjing University in 1994, 1997, and 2000, respectively. He is an assistant professor in the Department of Computer Science at the University of Colorado at Colorado Springs. His research interests are in scalable distributed systems and Internet services, network communications, and security. He has published about 40 articles in journals and peer-reviewed conference proceedings in these areas. He was a founding program co-chair of the IEEE International Workshop on Security in Systems and Networks (SSN), the workshops chair of the IFIP 2006 International Conference on Embedded and Ubiquitous Computing (EUC’06), and a program committee member of numerous IEEE conferences. He was a guest co-editor of the Journal of Parallel and Distributed Computing and the Journal of Network and Computer Applications. He serves on the editorial board of the Journal of Autonomic and Trusted Computing. Dr. Zhou’s research was supported in part by the US Air Force Research Laboratory. He was a visiting scientist in 1999 and a postdoctoral research associate in 2000 at the Paderborn Center for Parallel Computing, University of Paderborn, Germany. From January 2001 to August 2003, he was a visiting assistant professor in the Department of Computer Science at Wayne State University, Detroit. He received the Outstanding Researcher of the Year award of the EAS College and the CRCW award of the University of Colorado at Colorado Springs in 2005. He is a member of the IEEE Computer Society.

Qing Li received the BS and MS degrees in civil engineering from Huazhong University of Science and Technology, China, in 1997 and 2000, respectively. She received the MS degree in computer engineering from Wayne State University in 2003.