Computer Networks 36 (2001) 709–728
www.elsevier.com/locate/comnet
A load adaptive mechanism for buffer management

James Aweya *, Michel Ouellette, Delfin Y. Montuno, Alan Chapman

Nortel Networks, P.O. Box 3511, Station C, Ottawa, Canada K1Y 4H7

Received 17 March 2000; received in revised form 4 October 2000; accepted 30 January 2001

Responsible Editor: I. Stavrakakis
Abstract

The basic idea behind active queue management schemes such as random early detection (RED) is to detect incipient congestion early and to convey congestion notification to the end-systems, allowing them to reduce their transmission rates before queues in the network overflow and packets are dropped. The basic RED scheme (and its newer variants) maintains an average of the queue length which it uses together with a number of queue thresholds to detect congestion. RED schemes drop incoming packets in a random probabilistic manner where the probability is a function of recent buffer fill history. The objective is to provide a more equitable distribution of packet loss, avoid the synchronization of flows, and at the same time improve the utilization of the network. The setting of the queue thresholds in RED schemes is problematic because the required buffer size for good sharing among TCP connections is dependent on the number of TCP connections using the buffer. This paper describes a technique for enhancing the effectiveness of RED schemes by dynamically changing the threshold settings as the number of connections (and system load) changes. Using this technique, routers and switches can effectively control packet losses and TCP timeouts while maintaining high link utilization. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: TCP; Congestion control; Active queue management; Random early detection; Adaptive buffer management
* Corresponding author. Tel.: +1-613-763-6491; fax: +1-613-765-0698. E-mail addresses: [email protected] (J. Aweya), [email protected] (M. Ouellette), [email protected] (D.Y. Montuno), [email protected] (A. Chapman).

1. Introduction

Congestion control in packet networks has proven to be a difficult problem in general. The problem is particularly challenging in the Internet, where congestion control is mainly provided by end-to-end mechanisms in TCP [1,2] in cooperation with queueing strategies in routers [3]. A large factor in TCP's widespread use is its ability to utilize a network bottleneck, adapting quickly to changes in offered load or available bandwidth. A more thorough treatment of TCP algorithms is given in [4]. Studies have shown that TCP's ability to share a bottleneck fairly and efficiently decreases as the number of flows increases [1,5–7]. The performance of TCP becomes significantly degraded when the number of active TCP flows exceeds the network's bandwidth-delay product measured in packets [7]. There is a significant performance degradation when a TCP source operates with a window less than 1, which is mainly due to TCP entering exponential back-off and slow-start (see
details in [4]). When the TCP sender's congestion window becomes less than 4 packets, TCP is no longer able to recover from a single packet loss since Fast-Retransmit [1,2] needs at least 3 duplicate acknowledgments (ACKs) to be triggered. Thus, windows below 4 are not amenable to the fast retransmission feature of TCP, and a single packet loss will send the connection into timeout. With inadequate buffering, TCP connections will tend to keep the buffers full and the resulting packet losses will force many of the connections into timeout. As link utilization grows, premature loss may occur long before full bottleneck utilization is achieved due to the bursty nature of IP traffic. In addition, most TCP flows today can be characterized as short-lived flows. The fast retransmission feature of TCP requires a window size of several packets in order to work, and short-lived flows often never achieve a large enough window size to be able to recover from a packet loss without entering timeout. Fine grain bandwidth sharing, where sharing is achieved over time intervals under 1 or 2 s, is important for interactive applications, but is not possible unless connections avoid timeouts.

One way to solve this problem is to provision routers with not just one round-trip time (RTT) of buffering, but buffering proportional to the total number of active flows. Ref. [7] suggests 10 times as many buffers as flows. Large buffers should be possible in routers since the cost of memory is dropping rapidly because of demands in the computer industry. Many router vendors adopt the ``one RTT'' buffering approach. Although this is a step in the right direction, it only addresses the link utilization problem, not the packet loss problem. It is important to note that the requirement to support large aggregations of traffic is of interest for any large deployment of IP, including existing or planned commercial IP services. Even though buffering proportional to the total number of active flows is required to support a high number of flows and aggregations of TCP flows, it is the combination of adequate buffering and RED schemes [3,8] that provides the means to support a large number of flows attempting to use the full link bandwidth [5,7]. This combination allows high link utilization and eliminates the possibility of degradation or collapse in link utilization due to excessive window size limits set by end users.

The basic idea behind active queue management schemes such as random early detection (RED) is to detect incipient congestion early and to convey congestion notification to the end-systems, allowing them to reduce their transmission rates before queues in the network overflow and excessive packet loss occurs. The basic RED scheme (and its newer variants) maintains an average of the queue length which it uses together with a number of queue thresholds to detect congestion. RED schemes drop incoming packets in a random probabilistic manner where the probability is a function of recent buffer fill history. The objective is to provide a more equitable distribution of packet loss, avoid the synchronization of flows, and at the same time improve the utilization of the network. The setting of the queue thresholds in RED schemes is problematic because the required buffer size for good sharing among TCP connections is dependent on the number of TCP connections using the buffer. To keep latency at the router low, it may be desirable to set the thresholds low. But setting them too low will cause many timeouts (when the number of connections is large), which drastically degrade the latency perceived by the user. On the other hand, setting the thresholds too high unnecessarily increases the latency when operating with a small number of connections. This means the setting of the thresholds should not be done in an ad hoc manner but should be tied to the number of active connections sharing the buffer. The danger of not tying the RED thresholds (and buffer size) to the number of active connections is that a RED queue operating with a large number of connections can cause the average queue size to exceed the maximum RED threshold, causing the router to operate as a drop-tail router and resulting in high packet loss rates and TCP timeouts.

It is important to note that high network utilization is only good when the packet loss rate is low. This is because high packet loss rates can negatively impact overall network and end-user performance. A lost packet consumes network resources before it is dropped, thereby impacting the efficiency in other parts of the network. As
noted earlier, a high packet loss rate also causes long and unpredictable delays as a result of TCP timeouts. It is therefore desirable to achieve high network utilization but with low packet loss rates. This means that even if large buffers are used in the network to achieve high utilization, appropriate steps must also be taken to ensure that packet losses are low. This paper describes a mechanism to enhance the effectiveness of RED schemes by dynamically changing the threshold settings as the number of connections changes. Previous work in this area has involved actually counting the number of connections by inspecting packet control headers [9], which is impractical in real core networks and problematic when encryption is present. The technique proposed here adjusts to the number of connections by inspecting the packet loss rate and adjusting the thresholds to keep the loss rate at a specified value that would not cause too many timeouts.

The behavior of TCP when RED is used is of interest here. When the packet loss rate is low, TCP can sustain its sending rate by Fast-Retransmit/Fast-Recovery. Fast-Retransmit/Fast-Recovery helps in the case of isolated losses, but not in the case of bursty losses (i.e., losses in a single window). As the packet loss rate increases, retransmissions become driven by timeouts. Packet loss affects the achievable bandwidth of a TCP connection. The achievable bandwidth can be computed as the window size W of a connection divided by its RTT. In an ideal scenario where flows are relatively long, where flows do not experience any timeouts, and where losses are spread uniformly over time, a simple model [10–12] for the average size of the congestion window W of a TCP connection in the presence of loss is

   W ≈ √(8 / (3·b·p)),

where b is the number of packets that are acknowledged by a received ACK (typically 2, since most TCP implementations employ ack-every-other-packet policies) and p is the probability that a packet is dropped. As p increases, W becomes smaller. This equation can be seen as
approximating the average number of packets a TCP source will have in flight given the loss rate. However, it is known that the equation overestimates the average number of packets in flight [14,15]. For N connections, the number of packets in flight will be proportionally larger, and in the presence of congestion most of these packets will be stored in the congested buffer. Therefore, if the loss rate is to be maintained around a target level over a wide range of connections, then in order to prevent congestion collapse, it is desirable to adapt the buffering according to the system load. It can be deduced from the above equation that buffering that automatically adapts to the number of flows, while at the same time limiting timeouts and queue length and maintaining high utilization, is desirable. The same observation is made in [7]. This is the objective of the control strategy proposed in this paper, which does not involve the flow accounting complexities of [9].
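To make the use of this relationship concrete, the short Python sketch below (an illustration of ours, not part of the original study) evaluates the model above and inverts it to obtain the drop probability that corresponds to a chosen minimum ``safe'' window, such as the 4-packet window needed for fast retransmit discussed in Section 1. The function names and numeric values are assumptions for illustration only.

   import math

   def avg_window(p, b=2):
       # Average congestion window (packets) for drop probability p,
       # with b packets acknowledged per ACK, per the idealized model above.
       return math.sqrt(8.0 / (3.0 * b * p))

   def loss_rate_for_window(w_min, b=2):
       # Invert the model: drop probability giving an average window of w_min.
       return 8.0 / (3.0 * b * w_min ** 2)

   # With ack-every-other-segment (b = 2), a 2% drop rate sustains an average
   # window of about 8 packets, while a 4-packet minimum safe window
   # corresponds to a drop rate of roughly 8%.
   print(avg_window(0.02))           # ~8.16 packets
   print(loss_rate_for_window(4.0))  # ~0.083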
The above discussion shows that when the packet loss rate is high, TCP is not able to sustain its sending rate. The behavior of TCP becomes driven by timeouts; TCP needs to wait for a retransmission timer to expire, and then sets the congestion window size to 1 and performs slow-start. When a timeout occurs, TCP stalls for a substantial period depending on the RTT and its variance. The coarse timers used in TCP implementations also have a great impact on the timeout duration. For example, BSD-based systems have a timer granularity of 500 ms and a minimum timeout duration of 2 ticks; the minimum timeout is 750 ms on average. It should be noted also that timeouts result in wide variations in TCP behavior, which is problematic for conformance tests (e.g., fair share computations, rate metering, rate policing, rate conformance tests, etc.). A statistical test is applicable only over sufficiently large samples. With timeouts, the number of required samples will be much larger because the packet arrival rate will be much lower [13].

An active queue management scheme will balance the queue at a buffer fill which causes just enough loss to stabilize the aggregate input to the outgoing link rate. The loss rate applied reflects itself directly in the size of window that the connections can achieve. Windows below a certain size are not amenable to the fast retransmission feature of TCP, and a single packet loss will send the connection into timeout. Thus, we can determine the loss rate corresponding to a minimum safe window. By inspecting the current loss rate and changing the buffer management parameters (i.e., thresholds) accordingly, we can ensure that the loss rate is maintained around a target level. As the number of connections decreases, the threshold will be adjusted downwards to keep the latency at the router as small as possible. This is to prevent excessive queue buildup when the number of flows is small. The mechanism described here is in the context of the DRED [8] algorithm. DRED is an improved algorithm for active queue management and is capable of stabilizing a router queue at a pre-specified level independent of the number of active connections. However, the main concepts are general enough to be applicable to other RED schemes. An application to the RED algorithm [3] is given in [16].

2. A control technique for online adjustment of queue thresholds

We describe in this section techniques that can be used to adjust, in an online fashion, the parameters of a random packet drop mechanism to accommodate changes in process dynamics and disturbances (e.g., changes in the number of connections, traffic load changes, etc.). This adaptation is important because the required buffer size (and consequently the threshold settings) for good sharing among TCP connections is dependent on the number of TCP connections using the buffer. To use the technique it is necessary to find measurable variables that correlate well with changes in process dynamics. The variable can be, for instance, a measured signal (e.g., traffic load), the control signal (e.g., drop rates), etc. The measured or computed packet drop rate is a good choice in our buffer management problem since TCP timeouts and time delays are closely related to the loss rate. A block diagram of the control technique is shown in Fig. 1. The system can be viewed as
Fig. 1. Technique for online adjustment of queue thresholds.
having two loops. There is an inner loop, composed of the process and the packet drop controller, and an outer loop, which adjusts the drop controller parameters based on the operating conditions. The diagram shows a parameter estimator that determines the parameters of the process based on observations of process inputs (e.g., packet drop rates) and, if required, outputs (e.g., queue sizes). There is also a control (or design) block that computes the packet drop controller parameters from the estimated process parameters. The process parameters are computed (or estimated) continuously and the packet drop controller parameters are updated when new parameter values are obtained. The technique described in Fig. 1 can be used to dynamically change the queue threshold(s) of the drop controller as the number of connections in the queue changes. The controller adjusts to the number of connections by inspecting the current estimated value of the packet loss rate and adjusting the thresholds to keep the loss rate at a value that would not cause many timeouts. As the number of connections decreases, the threshold will be adjusted downwards to keep the latency at the router as small as possible. This is to prevent excessive queue buildup when the number of flows is low. A two-level control strategy as described in Fig. 1 is adopted, where a high-level controller (operating in the outer loop on a slower time scale) sets the queue threshold(s) and a low-level controller (operating in the inner loop on a faster time scale) computes the packet drop rate. We assume that there is sufficient buffering B such that the queue threshold(s) can be varied as needed. Ref. [7] suggests 10 times as many buffers as the number
Fig. 2. Two-level control strategy.
of expected flows. The two-level control strategy is illustrated in Fig. 2. The high-level controller can be viewed as a ``quasi-static'' or ``quasi-stationary'' controller. That is, when there are any system disturbances or perturbations (due to, for example, a change in the number of connections), the system is allowed to settle into a new steady state before the queue threshold(s) is changed. This ensures that the threshold(s) is not changed unnecessarily in a way that affects the computation (and stability) of the packet drop rate, which in turn depends on the threshold settings. The queue threshold(s) is, as a result, typically piece-wise constant with changes occurring at a slower pace. The packet drop rate can be estimated by observing the packet arrivals and losses at the queue, or, in some random drop schemes [3,8], the packet drop rate can be explicitly computed. In these schemes, the computed packet drop rate is a good measure of the actual loss rate since it approximates the actual loss rate asymptotically. The measured or computed packet drop rate can therefore be used as an indicator for varying the queue threshold(s) to further control packet losses. The queue threshold(s) can be varied dynamically to keep the packet loss rate close to a pre-specified target loss rate value h_max, since TCP timeouts are very dependent on losses. Note that a target loss rate can only be attained if the network is properly engineered and there are adequate resources (e.g., buffers, capacity, etc.). Most random drop schemes typically have two queue thresholds, an upper threshold T and a lower threshold L. In our control approach, the upper threshold T can be selected as the manipulating variable (to achieve the desired control behavior), while the lower threshold L can be tied to the upper threshold through a simple linear relationship,
e.g., L = bT, where b (0 < b < 1) is a constant relating L to T. The control target T can then be varied dynamically to keep the packet loss rate close to the pre-specified value h_max. The measured or computed packet loss rate p_l is filtered (or smoothed) to remove transient components before being used in the high-level controller. The smoothed signal p̂_l is obtained using an EWMA (exponentially weighted moving average) filter with gain c (more weight given to the history):

   p̂_l = (1 − c)·p̂_l + c·p_l,   0 < c < 1.

The actual packet loss rate is thus approximated by p̂_l. If no filtering is required, then p̂_l = p_l. The target loss rate is selected, for example, to be h_max = 2%. We describe below two algorithms for dynamically adjusting the threshold T to achieve the desired control performance.

Algorithm 1. Basic mechanism: If |p̂_l − h_max| > e for d seconds, then T = [T + ΔT·sgn(p̂_l − h_max)], with T confined to the range [T_min, T_max].

Sample implementation:

   begin: get new p̂_l value
      start = time_now
      while |p̂_l − h_max| > e
         stop = time_now
         if stop − start ≥ d seconds
            T = [T + ΔT·sgn(p̂_l − h_max)], confined to [T_min, T_max]
            break out of while loop
         endif
         get new p̂_l value
      endwhile
      go to begin
   /* time_now is a free-running system clock */

Here e is a tolerance value used to eliminate unnecessary updates of T (e.g., e = 1%); d is an elapse time used to check the loss rate mismatch (e.g., d = 1 s); ΔT is the control step size and is given as ΔT = B/K, i.e., the buffer size B is divided into K bands (e.g., K = 8, 10, 12, etc.); sgn[.] denotes the sign of its argument; T_max is an upper bound on T, since T cannot be allowed to be close to B, which would result in drop-tail behavior; and T_min is a lower bound on T needed to maintain high link utilization, since T
should not be allowed to be close to 0. T_min can be set to one bandwidth-delay product worth of data. The above procedure ensures that T is only changed when it is certain that the packet loss rate has deviated from the target by more than e (1%). The system therefore tries to maintain losses within the range h_max ± e.
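A compact Python rendering of Algorithm 1 is sketched below. It is only an illustration under the assumptions stated above: the caller supplies a fresh smoothed loss estimate p̂_l at each invocation (every 100 ms in the simulations of Section 4), and the function, argument and state names are ours rather than part of the original specification.

   import time

   def algorithm1_step(T, p_hat, h_max, eps, delta, dT, T_min, T_max,
                       state, now=None):
       # One invocation of Algorithm 1. 'state' is a dict whose 'start' entry
       # records when the current loss-rate mismatch began (None if the loss
       # rate is inside the band). Returns the possibly updated threshold T.
       now = time.time() if now is None else now
       if abs(p_hat - h_max) <= eps:
           state['start'] = None            # back inside the band h_max +/- eps
           return T
       if state['start'] is None:
           state['start'] = now             # mismatch has just started
           return T
       if now - state['start'] >= delta:    # mismatch persisted for d seconds
           step = dT if p_hat > h_max else -dT
           T = min(max(T + step, T_min), T_max)
           state['start'] = None            # restart the mismatch timer
       return T

This reproduces the behaviour of the pseudocode: T moves by ΔT only after the mismatch has persisted for d seconds, and is always kept within [T_min, T_max].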
Fig. 3. Limiting the impact of setpoint changes.
Algorithm 2. Basic mechanism: The queue threshold T is only changed when the estimated packet loss rate p̂_l is either above or below the target loss rate h_max and the loss rate is deviating away from the target loss rate.

Sample implementation:

   every d seconds do the following:
      if p̂_l(now) > h_max then
         if p̂_l(now) > p̂_l(previous)
            increase T: T = T + ΔT, limited to at most T_max
         else
            do not change T
      else if p̂_l(now) < h_max then
         if p̂_l(now) < p̂_l(previous)
            decrease T: T = T − ΔT, limited to at least T_min

In the above procedure there is no notion of a loss tolerance e. Only once the loss rate goes above/below the target loss rate and there is an indication that it is drifting away from the target is the queue threshold T increased/decreased. Otherwise the threshold is ``frozen'', since further changes could cause the loss rate to deviate further from the target loss rate. The deviations p̂_l(now) < p̂_l(previous) (or p̂_l(now) > p̂_l(previous)) in the above procedure can be computed over shorter sampling intervals or over longer intervals, e.g., every 1 s, depending on the measurement overhead allowed.
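The Python fragment below is one possible reading of Algorithm 2; the function name and arguments are illustrative assumptions, and the caller is assumed to invoke it once every d seconds with the current and previous smoothed loss estimates.

   def algorithm2_step(T, p_now, p_prev, h_max, dT, T_min, T_max):
       # One invocation of Algorithm 2, executed every d seconds. T is raised
       # only when the loss rate is above target and still rising, lowered only
       # when it is below target and still falling; otherwise T stays frozen.
       if p_now > h_max and p_now > p_prev:
           T = min(T + dT, T_max)
       elif p_now < h_max and p_now < p_prev:
           T = max(T - dT, T_min)
       return T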
2.1. Queue threshold changes and transients

In most buffer management schemes, the control loops have constant thresholds (or setpoints). But as we have described above, the thresholds may change at certain time instants because of a desire to change operating conditions such as user delays, loss rates, etc. A threshold is, as a result, typically piece-wise constant with changes occurring less frequently.
Fig. 4. Block diagram of a ramp unit.
It is therefore suitable to view the threshold as a step function (from a system/control theoretic point of view). Since the threshold is a system disturbance 1 that we have access to, it is possible to feed it through a low-pass filter or a ramping module before it enters the low-level controller. In this way, the step function can be made smoother. This property can be useful, since most control designs with good rejection of load disturbances give too large an overshoot after a sudden change in the threshold. Smoothing of the queue threshold is particularly important when the step change ΔT is large. This way the command signals from the high-level controller can be limited so that setpoint changes are not generated at a rate faster than the system can cope with. We have not implemented filtering of the threshold T in our simulation since ΔT was chosen reasonably small. The two-level control strategy with a mechanism for limiting the rate of change of the threshold(s) is shown in Fig. 3. The ramp unit or rate limiter is shown in Fig. 4. The output of the ramp unit will attempt to follow the input signal.
1 There are typically three types of system disturbances, namely, setpoint changes, load disturbances, and measurement noise.
Since there is an integral action in the ramp unit, the input and the output will be identical in steady state. Since the output is generated by an integrator with a limited input signal, the rate of change of the output will be limited to the bounds given by the limiter. Fig. 4 can be described by the following equations:

· in the continuous-time domain:

   dy/dt = sat(e) = sat(T − y);

· in the discrete-time domain:

   Δy(n) = y(n) − y(n−1) = sat(T − y(n−1)),
   y(n) = y(n−1) + sat(T − y(n−1)).

The amplitude limiter or saturation sat(e) is defined as

   sat(e) = −a   if e ≤ −a,
            e    if |e| < a,
            a    if e ≥ a,

where the limit a can be defined as a small fraction of the step size ΔT, i.e., a = g·ΔT, 0 < g < 1, if ΔT is large enough to require a smoothing of T. Smoothing may not be required for small ΔT. In Fig. 4, y is the smoothed threshold input to the low-level controller. The smoothing of T can be implemented in discrete time steps simply as follows:

   initialize y = 0
   while e = T − y ≠ 0
      y = y + sat(e)
      pass y to the low-level controller
      wait for next y computing time
   endwhile

The time interval between the y computations should be smaller than the time interval between the threshold changes.
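For illustration, the ramp unit can be sketched in Python as follows, using the discrete-time form above; the example values for T, ΔT and the limit a are arbitrary assumptions, not settings taken from the paper.

   def sat(e, a):
       # Amplitude limiter: clamp e to the range [-a, a].
       return max(-a, min(e, a))

   def ramp_step(y, T, a):
       # One discrete step of the ramp unit: move the smoothed threshold y
       # toward the commanded threshold T by at most a per step.
       return y + sat(T - y, a)

   # Example: ramp from y = 0 toward a new threshold T = 500 packets with
   # dT = 100 and a = g*dT, g = 0.2 (so at most 20 packets per step).
   y, T, dT = 0.0, 500.0, 100.0
   a = 0.2 * dT
   while abs(T - y) > 1e-9:
       y = ramp_step(y, T, a)
       # here y would be passed to the low-level controller, once per step period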
3. Load adaptive buffer management using the DRED algorithm

The Dynamic-RED (DRED) algorithm [8] is a new active queue management technique which
uses a simple control-theoretic approach to stabilize a router queue occupancy at a pre-specified level independent of the number of active connections. The benefits of a stabilized queue in a network are high resource utilization, predictable delays, ease in buffer provisioning, and traffic-load-independent network performance in terms of traffic intensity and number of connections. The actual queue size in the router is assumed to be sampled every Δt units of time (seconds), and the packet drop controller provides a new value of the drop probability p_d every Δt units of time. Therefore, Δt is the sampling/control interval of the system. Let q(n) denote the actual queue size at discrete time n, where n = 1Δt, 2Δt, 3Δt, ..., and let T denote the target buffer occupancy. The goal of the controller is therefore to adapt p_d so that the magnitude of the error signal e(n) = q(n) − T is kept as small as possible. A lower queue threshold parameter L is introduced in the control process to help maintain high link utilization and keep the queue size around the target level. The parameter L is typically set a little smaller than T, e.g., L = bT, b ∈ [0.8, 0.9]. DRED does not drop packets when q(n) < L in order to maintain high resource utilization and also not to further penalize sources which are in the process of backing off in response to (previous) packet drops. Note that there is always a time lag between the time a packet is dropped and the time a source responds to the packet drop. The computation of p_d, however, still continues even if packet dropping is suspended (when q(n) < L). The DRED computations can be summarized as follows, assuming a slowly varying threshold T(n).

DRED control parameters: control gain α; filter gain β; target buffer occupancy T(n); minimum queue threshold L(n) = bT(n), b ∈ [0.8, 0.9].

At time n:
· Sample queue size: q(n).
· Compute current error signal: e(n) = q(n) − T(n).
· Compute filtered error signal: ê(n) = (1 − β)·ê(n−1) + β·e(n).
· Compute the current drop probability:

   p_d(n) = min{ max{ p_d(n−1) + α·ê(n)/(2T(n)), 0 }, p_max },   p_max ≤ 1.

The 2T(n) term in the above computation is simply a normalization parameter, so that the chosen control gain α can be preserved throughout the control process even though T can vary. The normalization constant in the original DRED algorithm [8] is the buffer size B.
· Use p_d(n) as the drop probability until time n + 1, when a new p_d is to be computed again.
· Store ê(n) and p_d(n) to be used at time n + 1.

In DRED, a good measure of the actual packet loss rate is the packet drop probability p_d(n): p_d(n) converges asymptotically to the actual loss rate. Thus, p_d(n) approximates the actual loss rate very well. We therefore use p_l(n) = p_d(n) as an indicator for varying the control target T in order to reduce losses. The p_l(n) = p_d(n) values are filtered as described earlier to obtain p̂_l(n), which is then used in the high-level controller. The filtered values are computed as follows:

   p̂_l(n) = (1 − c)·p̂_l(n−1) + c·p_d(n),   0 < c < 1.
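The sketch below ties the pieces of this section together in Python: the low-level DRED update (run every Δt) and the high-level threshold adaptation using Algorithm 2 (run every d seconds). It is an illustrative reading of the equations above; the class and attribute names are ours, and the numeric defaults are placeholders rather than the settings used in Section 4.

   class AdaptiveDRED:
       # Sketch of DRED with a load adaptive target threshold (Algorithm 2).
       def __init__(self, T=500, alpha=0.00005, beta=0.002, c_gain=0.01,
                    h_max=0.02, dT=100, T_min=250, T_max=3500, p_max=1.0):
           self.T = T                    # target buffer occupancy (adapted)
           self.alpha = alpha            # control gain
           self.beta = beta              # error-filter gain
           self.c_gain = c_gain          # loss-rate filter gain (the gain c above)
           self.h_max = h_max            # target loss rate
           self.dT, self.T_min, self.T_max = dT, T_min, T_max
           self.p_max = p_max
           self.e_hat = 0.0              # filtered error signal
           self.p_d = 0.0                # drop probability
           self.p_hat = 0.0              # filtered loss-rate estimate
           self.p_hat_prev = 0.0

       def low_level_update(self, q):
           # Run every sampling interval (Delta t): recompute the drop probability.
           L = 0.9 * self.T              # no-drop threshold, L = 0.9T as in Section 4.1
           e = q - self.T
           self.e_hat = (1.0 - self.beta) * self.e_hat + self.beta * e
           self.p_d = min(max(self.p_d + self.alpha * self.e_hat / (2.0 * self.T), 0.0),
                          self.p_max)
           self.p_hat = (1.0 - self.c_gain) * self.p_hat + self.c_gain * self.p_d
           return 0.0 if q < L else self.p_d   # dropping suspended below L

       def high_level_update(self):
           # Run every d seconds: adapt the target T using Algorithm 2.
           if self.p_hat > self.h_max and self.p_hat > self.p_hat_prev:
               self.T = min(self.T + self.dT, self.T_max)
           elif self.p_hat < self.h_max and self.p_hat < self.p_hat_prev:
               self.T = max(self.T - self.dT, self.T_min)
           self.p_hat_prev = self.p_hat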
4. Numerical results

We conduct our simulations using the OPNET Modeler. Fig. 5 shows a simple bottleneck network configuration with two routers and a number of TCP connections. This configuration can represent the interconnection of LANs to WANs, or of dial-up access users through WANs as in the case of ISP networks. The routers are interconnected with a T3 (45 Mbps) link. All other links have an equal speed such that they do not create any bottlenecks. The allocated buffer size B of the bottleneck router (IP Router 1) is set to a relatively large value so that the DRED target buffer occupancy T can be varied as needed. All links have a propagation delay of 10 ms. The round-trip time (RTT) of the network is therefore 60 ms. The active queue management algorithm running in each router is DRED [8]. The TCP sources are based on a TCP-Reno implementation. The Reno version uses the fast-retransmit and fast-recovery mechanisms. The TCP connections are modeled as greedy FTP connections; that is, they always have data to send as long as their congestion windows permit. The maximum segment size (MSS) for TCP is set to 536 bytes. The receiver's advertised window size is set sufficiently large so that TCP connections are not constrained at the destination. The ack-every-segment strategy is used. The TCP timer granularity `tick' is set to 500 ms and the minimum retransmission timeout is set to two ticks. The primary performance metrics in our simulation experiments are the DRED queue size q(n), the DRED drop probability p_d(n) and the target buffer occupancy T. The queue size is sampled and
Fig. 5. Bottleneck configuration.
plotted every 10 ms. The drop probability is plotted every Δt = 10 packet transmission times [8]. Other metrics of interest include the average packet loss rate, the average fraction of time spent in timeout and the average number of timeouts per packet. The average packet loss rate is calculated as the number of packets dropped at the queue divided by the total number of packets that arrive at the queue. The average fraction of time spent in timeout is calculated as the total amount of time all sources spent in timeout divided by the average number of sources and the simulation time. The average number of timeouts per packet is calculated as the total number of timeouts from all sources divided by the total number of packets sent by all sources. For Algorithm 1, the decision whether or not to change the queue threshold is made at the end of each d-second interval. The computed or measured packet loss rate values p̂_l are obtained every 100 ms.

4.1. Results without the load adaptive mechanism for buffer management

This section investigates the network performance when the load adaptive mechanism is disabled. That is, the target buffer occupancy T is fixed, and does not vary over time. With respect to Fig. 1, the high-level controller is disabled. The DRED algorithm is parameterized as Δt = 10 packet transmission times, α = 0.00005, β = 0.002 and L = 0.9T, with a target buffer occupancy of T = BDP, where BDP is the bandwidth-delay product of the network. The normalization parameter of the error signal is set to two times the target buffer occupancy T. In the first and second scenarios, the initial number of connections is set to 100. In addition, for the first scenario, 900 connections have a start time that is uniformly distributed over 200 s (i.e., all connections stay active to the end of the simulation once started), while for the second scenario, 900 connections have a start time and a stop time uniformly distributed over 200 s. The purpose of doing so is to have a traffic scenario with random start and stop times, thus simulating staggered connection setup and termination.
Figs. 6(a)–(c) present, respectively, the DRED queue size q(n), the DRED drop probability p_d(n) and the filtered packet loss rate for the first scenario. Figs. 7(a)–(c) present the corresponding metrics for the second scenario. The filtered packet loss rate gives an estimate of the actual packet loss rate of the network (IP Router 1). Its value is plotted every 100 ms and is obtained by computing the packet loss rate over 100 ms intervals and then averaging using a moving average filter. The queue size figures show that the DRED controller (i.e., the low-level controller) is effective at stabilizing the queue size at the target buffer occupancy T. Although DRED can effectively stabilize the queue size independent of the number of active connections, the drop probability, however, strongly depends on the number of active connections. This is required to effectively control the queue size. From the figures, it can also be seen that the drop probability computed by the DRED algorithm converges asymptotically to the filtered packet loss rate. Thus, the DRED drop probability is a good estimate of the actual packet loss rate.

Table 1 presents other performance metrics of interest. The average packet loss rate, average fraction of time spent in timeout and timeouts per packet are presented. These performance metrics will be used for comparison purposes in subsequent sections. The average fraction of time spent in timeout for Scenario 2 is not available, since an average number of sources cannot be computed for this scenario.

It is generally accepted in the Internet community that packet loss rates well under 10% are required to deliver acceptable TCP services and that properly engineered networks are expected to achieve packet loss rates of not more than 2%. The drop probability/packet loss rate figures presented above, as well as the values presented in Table 1, are far from achieving this goal. For the scenario under investigation, the figures show that the packet loss rate can increase well beyond 10% during periods of congestion. High loss rates can be an indication of congestion collapse and should be avoided, since they simply force most TCP connections into timeout periods, resulting in a significant degradation of the TCP service offered to users. Morris [14] showed via an analytical
Fig. 6. (a) DRED queue size. (b) DRED drop probability. (c) Filtered packet loss rate. (Scenario 1: 100 connections start at time 0.0 s, 900 connections start at random times).
model and simulation that the average fraction of time spent in timeout increases dramatically when the packet loss rate is of the order of a few percent. This was attributed to the difficulty the fast-retransmit/fast-recovery mechanisms of TCP have in operating well in the presence of small windows. It was noted that for the proper operation of TCP's fast-retransmit and fast-recovery mechanisms, there should be at least five packets worth of buffering for each active TCP connection.

On a T3 link with predominantly dial-up traffic, the maximum number of connections that should be supported is roughly 1300 for a minimum throughput per connection of 33.3 Kbps. As mentioned above, the proper operation of fast-retransmit and fast-recovery (for 1300 connections) will be ensured when the network has a buffering capacity of about 6500 packets (equivalent to five packets worth of buffering per connection). However, the buffering capacity for the
Fig. 7. (a) DRED queue size. (b) DRED drop probability. (c) Filtered packet loss rate. (Scenario 2: 100 connections start at time 0.0 s, 900 connections start and stop at random times).

Table 1
Loss rates, fraction of time spent in timeouts and timeouts per packet

   Scenario   Average packet     Average fraction of time   Average timeouts
              loss rate (%)      spent in timeout (%)       per packet
   1          6.9                46.6                       0.02
   2          4.3                Not applicable (NA)        0.0073
network topology under investigation is only 1100 packets. The network can therefore deliver satisfactory performance only up to approximately 200 connections, and beyond that, the network is
forced to drop several packets in order to control the excess traffic (and buffer occupancy) caused by an increase in the system load. Consequently, congestion collapse arises and many connections
are forced into timeout periods (because TCP congestion windows decrease below values that result in unsatisfactory performance) due to the high loss rates imposed by the network. Thus, there is a need to operate at loss rates that do not cause excessive TCP timeouts, by adaptively controlling the buffer occupancy, while supporting a large number of connections.

4.2. Results with the load adaptive mechanism for buffer management and using the DRED drop probability as the drop rate estimate

This section investigates the network performance when the load adaptive mechanism is enabled. By enabling the high-level controller shown in Fig. 1, the target buffer occupancy T is allowed to change dynamically over time. The scenarios are the same as in Section 4.1. Section 4.2.1 shows the results for Algorithm 1, while Section 4.2.2 presents the results for Algorithm 2.

4.2.1. Results for Algorithm 1

The DRED algorithm is parameterized as indicated in Section 4.1. The high-level controller is parameterized with a target loss rate of h_max = 2%, a loss tolerance of e = 1%, a control step size of ΔT = 100 packets, d = 1.0 s, T_min = BDP and T_max = 20 BDP. Figs. 8(a)–(c) present, respectively, the queue size q(n), the DRED drop probability p_d(n) and the DRED target buffer occupancy T for the first scenario. Figs. 9(a)–(c) present the corresponding metrics for the second scenario. As expected, the queue size follows the change in the target buffer occupancy T. Thus, the DRED controller is effective at stabilizing the queue size around the ``moving'' target buffer occupancy. It can be seen that the target buffer occupancy varies when the drop probability increases beyond h_max + e or decreases below h_max − e. We denote the region within h_max ± e as the ``band''. When the drop probability goes outside the band, the effect of varying the control target makes the drop probability converge back within the band. This is the desired operation of the high-level controller (i.e., the load adaptive mechanism). Once the drop probability is within the band, the high-level controller is ``frozen'' (i.e., the target buffer occupancy T is maintained as is) to allow the queue size and drop probability to stabilize around the target buffer occupancy T and the target loss rate h_max, respectively. In addition, when the system operates with a small number of connections and a target buffer occupancy equal to the minimum threshold value T_min, the drop probability is allowed to decrease below the band, since the desired operation is to maintain the loss rate around or below the target loss rate. Thus these results show that the two-level control strategy is effective at keeping the drop probability close to the target loss rate h_max = 2%. This solution is preferred over a static target buffer occupancy since it significantly helps in controlling the amount of packet loss, the queuing delay and ultimately the number of TCP timeouts. This can be seen by comparing Table 2 with Table 1, where the presented metrics clearly show the improvement. The results show that the use of a load adaptive mechanism in conjunction with active queue management can effectively control and reduce packet loss rates and TCP timeouts during periods of congestion. Although this solution comes at the expense of an increase in the buffering used (which translates into higher queuing delays at the point of congestion), higher queuing delays are far more acceptable than extremely high loss rates and timeout delays.

As previously observed, the proper operation of TCP's fast-retransmit and fast-recovery mechanisms can be ensured when there are at least five packets worth of buffering for each active connection. Since the system stabilizes with 1000 active connections (in the first scenario), a total of 5000 packets worth of buffering would be required for proper TCP operation. It can be seen from Fig. 8(c) that this is indeed the case, where the target buffer occupancy of the router stabilizes around 5000 packets. Using the window size and drop probability relationship derived in [14], one would compute a value of around 11,000 packets worth of buffering for this scenario. This shows that the equation overestimates the simulation value. Nevertheless, the equation is helpful in explaining TCP behavior. These results also suggest that the use of the load adaptive mechanism in conjunction with
Fig. 8. Algorithm 1: Results for a target loss rate of h_max = 2%. (a) DRED queue size. (b) DRED drop probability. (c) DRED target buffer occupancy. (Scenario 1: 100 connections start at time 0.0 s, 900 connections start at random times).
active queue management can not only control packet loss rates and TCP timeouts but can also significantly increase the number of connections a network can support.

4.2.2. Results for Algorithm 2

This section investigates the performance of Algorithm 2. The results are based on the same simulation scenarios and algorithm parameter values given in Section 4.2.1. Note that this algorithm does not have a loss tolerance parameter, as explained in Section 2.
Figs. 10(a)–(c) present, respectively, the queue size q(n), the DRED drop probability p_d(n) and the DRED target buffer occupancy T for the first scenario. Figs. 11(a)–(c) present the corresponding metrics for the second scenario. This algorithm retains all the strong features of Algorithm 1 shown in Section 4.2.1. In addition, this algorithm results in a tighter control of the drop probability around the target loss rate, without the need for the band or loss tolerance parameter. Note also that the target buffer occupancy changes more rapidly, since the
Fig. 9. Algorithm 1: Results for a target loss rate of h_max = 2%. (a) DRED queue size. (b) DRED drop probability. (c) DRED target buffer occupancy. (Scenario 2: 100 connections start at time 0.0 s, 900 connections start and stop at random times).
Table 2
Loss rates, fraction of time spent in timeouts and timeouts per packet

   Scenario   Average packet     Average fraction of time   Average timeouts
              loss rate (%)      spent in timeout (%)       per packet
   1          2.5                8.2                        0.0033
   2          2.0                NA                         0.0026
algorithm constantly tries to maintain the drop probability around the target loss rate. The algorithm quickly tracks any deviation from the target loss rate and responds by changing the target buffer occupancy.
Table 3 presents other metrics of interest. The results are similar to the results shown in Table 2 for Algorithm 1. It can be seen that the average packet loss rates are indeed lower and closer to the target loss rate for this algorithm, mainly due to the fact
Fig. 10. Algorithm 2: Results for a target loss rate of h_max = 2%. (a) DRED queue size. (b) DRED drop probability. (c) DRED target buffer occupancy. (Scenario 1: 100 connections start at time 0.0 s, 900 connections start at random times).
that Algorithm 2 does not have any loss tolerance parameter. In addition, these results are much better than the results presented in Table 1, where the target buffer occupancy was static over time.

4.3. Results with the load adaptive mechanism for buffer management and using the actual packet loss rate as the drop rate estimate

This section presents results for the case where the measured actual packet loss rate is used as the input to the high-level controller. However, previous sections have shown that the computed DRED drop probability p_d(n) is a good estimate of the actual packet loss rate. Using the actual packet loss rate as controller input requires some form of traffic measurement, which can increase the complexity of the system. The measured actual packet loss rate needs to be monitored. For instance, it could be measured every 100 ms as the number of packets dropped at the queue divided by the total number of packets that arrive at the queue. Then, this quantity could be filtered, for instance using an EWMA
Fig. 11. Algorithm 2: Results for a target loss rate of h_max = 2%. (a) DRED queue size. (b) DRED drop probability. (c) DRED target buffer occupancy. (Scenario 2: 100 connections start at time 0.0 s, 900 connections start and stop at random times).
Table 3
Loss rates, fraction of time spent in timeouts and timeouts per packet

   Scenario   Average packet     Average fraction of time   Average timeouts
              loss rate (%)      spent in timeout (%)       per packet
   1          2.0                7.1                        0.0026
   2          1.9                NA                         0.0024
with a filter gain of 0.01 (more weight given to the history). This is the technique used in our simulations. This filtered quantity is plotted for each scenario, and is denoted as the filtered packet loss rate.
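For completeness, a sketch of such a measurement-based estimator is given below. The counter and class names are ours, while the 100 ms sampling interval and the 0.01 EWMA gain follow the description above.

   class MeasuredLossRate:
       # Estimate the actual packet loss rate from arrival/drop counters,
       # sampled every 100 ms and smoothed with an EWMA filter (gain 0.01).
       def __init__(self, gain=0.01):
           self.gain = gain
           self.arrivals = 0
           self.drops = 0
           self.filtered = 0.0

       def on_packet(self, dropped):
           # Call once per arriving packet; 'dropped' says whether it was discarded.
           self.arrivals += 1
           if dropped:
               self.drops += 1

       def sample(self):
           # Call every 100 ms: compute the loss rate over the interval,
           # fold it into the EWMA and reset the counters.
           raw = self.drops / self.arrivals if self.arrivals else 0.0
           self.filtered = (1.0 - self.gain) * self.filtered + self.gain * raw
           self.arrivals = 0
           self.drops = 0
           return self.filtered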
We use the same simulation scenarios and algorithm parameter values given in Section 4.2. Only the results for Algorithm 2 are presented here, since Algorithm 1 exhibits similar behavior.
4.4. Results for Algorithm 2

Figs. 12(a)–(c) present, respectively, the DRED queue size q(n), the DRED drop probability p_d(n) together with the filtered actual loss rate, and the DRED target buffer occupancy T for the first scenario. Note that the filtered actual loss rate is used to adjust the target buffer occupancy. Figs. 13(a)–(c) present the corresponding metrics for the second scenario. It is observed that using the measured actual loss rate does not provide any significant
advantage or performance improvement. The network performance is about the same as in Section 4.2.2 (Algorithm 2), where the computed DRED drop probability was used as the drop rate estimate. In addition, the metrics listed in Table 4 indicate similar performance. These results show that there is no clear advantage to this extra complexity. Using the DRED drop probability as the drop rate estimate and feeding it to the high-level controller works satisfactorily.
Fig. 12. Results for a target loss rate of h_max = 2%. (a) DRED queue size. (b) DRED drop probability. (c) DRED target buffer occupancy. (Scenario 1: 100 connections start at time 0.0 s, 900 connections start at random times).
Fig. 13. Results for a target loss rate of h_max = 2%. (a) DRED queue size. (b) DRED drop probability. (c) DRED target buffer occupancy. (Scenario 2: 100 connections start at time 0.0 s, 900 connections start and stop at random times).

Table 4
Loss rates, fraction of time spent in timeouts and timeouts per packet

   Scenario   Average packet     Average fraction of time   Average timeouts
              loss rate (%)      spent in timeout (%)       per packet
   1          2.0                7.3                        0.0026
   2          1.9                NA                         0.0024
5. Conclusion

High loss rates in TCP/IP networks indicate that there is an excess of traffic and that the network has insufficient buffering or bandwidth capacity. To relieve the congestion, the network has to drop many packets. However, dropping too many packets tends to penalize TCP users, forcing
them into long timeout periods. The problem worsens as the system load (number of competing connections) increases. In this paper, we have shown that the proposed load adaptive mechanism for buffer management enables a network (IP routers) to store extra packets during periods of congestion, instead of dropping them. This significantly reduces the packet loss rate, and ultimately the number of TCP timeouts. We showed that the proposed load adaptive mechanism automatically achieves a good trade-off between packet loss rate and queuing delay over a wide range of loads. It was shown that this mechanism can be configured to operate around a pre-specified target loss rate, for instance 2%. It can also be concluded that such a mechanism can significantly increase the number of connections a network can support. Finally, it was shown that using the computed DRED drop probability (instead of the measured actual packet loss rate) as the drop rate estimate supplied to the high-level controller resulted in satisfactory performance, without requiring any additional packet accounting complexity. This mechanism can be effectively used in routers with large buffers without causing any excessive queuing delay, since the buffers are consumed in proportion to the level of congestion. This mechanism is attractive for the deployment of large IP networks supporting a wide range of connections and large aggregations of traffic.

There are still some remaining issues to be investigated regarding the proposed load adaptive buffer management mechanism. However, our preliminary studies indicate that this mechanism is a very good candidate for controlling the packet loss rates in TCP/IP networks. Some of the future studies will include a thorough sensitivity analysis of the load adaptive buffer management mechanism and its parameter space (e.g., ΔT, d, e, etc.) using diverse network topologies, link rates, traffic patterns, a mix of short- and long-lived flows and other networking conditions found in the Internet.

The major transport protocol in use over the Internet is TCP, which provides end-to-end congestion control. However, with the proliferation of applications and users, it is no longer possible to rely exclusively on the end systems to implement
end-to-end congestion control. There is a growing set of UDP-based applications whose congestion control algorithms are inadequate or nonexistent (i.e., the application does not throttle back upon receipt of congestion notification). Such UDP applications include streaming applications like packet voice and video, and also multicast bulk data transport. To support UDP-based applications over the Internet, it is necessary to provide bandwidth to the UDP applications within the network so that the performance of the UDP applications will not be seriously affected during periods of congestion. UDP flows do not typically back off when they encounter congestion. Thus, UDP flows aggressively use up more bandwidth than TCP-friendly flows. Therefore, while it is important to have router algorithms support UDP flows by assigning appropriate bandwidth, it is also necessary to protect responsive TCP flows from unresponsive or aggressive UDP flows so that all users get a reasonable service quality. Several router-based mechanisms comprising queueing policies, resource reservation, and metering have been proposed in the literature for handling Internet traffic consisting of a mix of TCP and UDP flows (e.g., [9,11,13,15]). These router-based mechanisms aim to provide rate guarantees to UDP flows, protection of well-behaved TCP flows from unresponsive UDP flows, and bandwidth fairness between TCP flows. In the same spirit, as future extensions and study of the proposed load adaptive buffer management scheme, mechanisms to detect and restrict unresponsive or high-bandwidth best-effort flows in times of congestion will be investigated. We also intend to explore these mechanisms in complex scenarios with multiple congested gateways, more realistic traffic models for UDP traffic, and higher-priority real-time traffic.

References

[1] V. Jacobson, Congestion avoidance and control, in: Proc. ACM SIGCOMM'88, August 1988, pp. 314–329.
[2] W. Stevens, TCP slow start, congestion avoidance, fast retransmit, and fast recovery algorithms, IETF RFC 2001, 1997.
[3] S. Floyd, V. Jacobson, Random early detection gateways for congestion avoidance, IEEE/ACM Trans. Networking 1 (4) (1993) 397–413.
[4] W.R. Stevens, TCP/IP Illustrated, Volume 1: The Protocols, Addison-Wesley, Reading, MA, 1994.
[5] C. Villamizar, C. Song, High performance TCP in ANSNET, ACM Comput. Commun. Rev. 24 (5) (1995).
[6] C. Eldridge, Rate control in standard transport layer protocols, ACM Comput. Commun. Rev. 22 (3) (1992).
[7] R. Morris, TCP behavior with many flows, in: Proc. IEEE International Conference on Network Protocols, Atlanta, GA, 1997.
[8] J. Aweya, M. Ouellette, D.Y. Montuno, A control theoretic approach to active queue management, Computer Networks 36 (2001) 203–235.
[9] D. Lin, R. Morris, Dynamics of random early detection, in: Proc. ACM SIGCOMM'97, Cannes, France, September 1997, pp. 127–137.
[10] M. Mathis, J. Semke, J. Mahdavi, T. Ott, The macroscopic behavior of the TCP congestion avoidance algorithm, ACM Comput. Commun. Rev. 27 (3) (1997).
[11] S. Floyd, K. Fall, Promoting the use of end-to-end congestion control in the Internet, IEEE/ACM Trans. Networking 7 (4) (1999) 458–472.
[12] J. Padhye, V. Firoiu, D. Towsley, J. Kurose, Modeling TCP throughput: a simple model and its empirical validation, in: Proc. ACM SIGCOMM'98, Vancouver, Canada, September 1998, pp. 303–314.
[13] K. Cho, Flow-valve: embedding a safety-valve in RED, in: Proc. IEEE Globecom'99, Rio de Janeiro, Brazil, 5–9 December 1999.
[14] R. Morris, Scalable TCP congestion control, Ph.D. Thesis, Harvard University, January 1999.
[15] W. Feng et al., Techniques for eliminating packet loss in congested TCP/IP networks, Technical Report CSE-TR-349-97, University of Michigan, November 1997.
[16] J. Aweya, M. Ouellette, D.Y. Montuno, A. Chapman, Enhancing TCP performance with a load adaptive RED mechanism, Int. J. Network Management 11 (2001) 31–50.

James Aweya received the B.Sc. degree in Electrical and Electronics Engineering from the University of Science & Technology, Kumasi, Ghana, the M.Sc. degree in Electrical Engineering from the University of Saskatchewan, Saskatoon, and the Ph.D. degree in Electrical Engineering from the University of Ottawa. Since 1996 he has been with Nortel Networks, where he is a Systems Architect. He is currently involved in the design of resource management functions for IP and ATM networks, architectural issues and design of switches and IP routers, and network architectures and protocols. He has published more than 25 journal and
conference papers and has a number of patents pending. In addition to communication networks and protocols, he has other interests in neural networks, fuzzy logic control and the application of artificial intelligence to computer networking.

Michel Ouellette received the B.A.Sc. and M.A.Sc. degrees in Electrical & Computer Engineering from the School of Information Technology and Engineering (University of Ottawa), Ottawa, Canada in 1995 and 1998, respectively. He completed his B.A.Sc. at l'Ecole Nationale Superieure des Telecommunications (Telecom Paris), Paris, France in 1995. His Masters thesis was part of a joint collaboration with the Stentor Alliance and Bell Canada, where he focussed on Advanced Intelligent Networks design. He joined Nortel Networks as a Systems Design Engineer in September 1997. His current interests are in voice over IP, TCP/IP congestion control and avoidance, active queue management, content switching and networking, and performance analysis and algorithm design. Since joining Nortel Networks, he has published a number of journal and conference papers and has a number of patents pending.

Delfin Y. Montuno obtained the B.Sc. degree in Physics from Ateneo de Manila University, Philippines in 1972, the M.Sc. degree in Electronics from Yamanashi University, Japan, and the Doctor of Information Engineering from Nagoya University, Japan, in 1976 and 1980, respectively, and the Ph.D. in Computer Science from the University of Toronto, Canada, in 1985. He has been with Nortel Networks (then BNR) since 1984, where he has worked on a number of projects including the development of VLSI layout algorithms. He is currently involved in the development of high capacity switch fabrics, and resource management algorithms and mechanisms for high-speed networks. His other research interests include constraint logic programming, neural networks, fuzzy logic, and computational geometry and their applications to network traffic control.

Alan Chapman has worked in the telecommunications industry for 45 years, initially in the United Kingdom. He joined Nortel Networks in 1969 and has played a key role in the development of the company's switching products. For the last ten years he has concentrated on the evolution of data networks and in particular controlled performance in IP networks. As well as publishing various papers, Mr. Chapman has been awarded many patents in the area of telecommunications systems and next generation networks.