Computer Networks 54 (2010) 977–990
Contents lists available at ScienceDirect
Computer Networks journal homepage: www.elsevier.com/locate/comnet
Traceband: A fast, low overhead and accurate tool for available bandwidth estimation and monitoring ☆
Cesar D. Guerrero ¹, Miguel A. Labrador *
University of South Florida, Tampa, FL 33620, United States
Article info
Article history: Available online 29 October 2009
Keywords: Internet measurement; Network monitoring; Hidden Markov models; Moving average; Available bandwidth; Network capacity
Abstract
Available bandwidth estimation techniques are being used in network monitoring and management tools to provide information about the utilization of the network and to verify compliance with service level agreements. However, the use of these techniques in other applications and network environments is limited by their long convergence times, accuracy errors, and the amount of overhead that they introduce. In this paper, we introduce Traceband, a hidden Markov model-based technique for end-to-end available bandwidth estimation and monitoring that improves these performance metrics and therefore promises to expand the use of these techniques to other scenarios. Traceband is evaluated and compared with Spruce and Pathload using Poisson and self-similar cross-traffic. Experimental results in a controlled environment with Poisson cross-traffic demonstrate that Traceband is as accurate as Spruce and Pathload but considerably faster, and introduces less overhead. Traceband's convergence time is demonstrated using bursty cross-traffic, as it is the only tool that accurately reacts to zero-traffic periods, which may be particularly useful for those applications that need to make decisions in real time. Using self-similar traffic, Traceband's mean accuracy and variability degrade with the Hurst parameter, but it still performs within reasonable limits. A general and optional moving average algorithm is also introduced to solve these issues. © 2009 Elsevier B.V. All rights reserved.
☆ Preliminary results of this work have been recently published in [1].
* Corresponding author. Tel.: +1 813 974 3260. E-mail addresses: [email protected] (C.D. Guerrero), labrador@cse.usf.edu (M.A. Labrador).
¹ In part supported by the Universidad Autonoma de Bucaramanga, Colombia.
1389-1286/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2009.09.024

1. Introduction

Recently, the estimation of the available bandwidth (AB) of an end-to-end path has received considerable attention due to its applicability in several network applications. For example, the estimation of the available bandwidth is an essential component in network management tools and platforms that monitor large networked systems, as it provides information about the current utilization of the network resources. It can be used to monitor and verify service level agreements, providing users and service providers with accurate information to manage their contracts. Transport layer protocols might also use AB information to change the transmission rate according to the amount of bandwidth available in the path, using the network resources efficiently while avoiding congestion. Traffic engineering mechanisms would be able to perform real-time resource provisioning while balancing the load of the network. Overlay networks could use AB estimations to find the most appropriate topology. Call admission control mechanisms might take advantage of AB information to either admit or reject a new incoming connection, avoiding network congestion and guaranteeing the quality of service of current connections. Also, an estimation of the available bandwidth could provide very valuable information to detect network failures and malicious attacks.

Currently, two main approaches have been utilized to estimate the end-to-end available bandwidth. One approach, called the Probe Gap Model (PGM), estimates
C.D. Guerrero, M.A. Labrador / Computer Networks 54 (2010) 977–990
the available bandwidth as the difference between the capacity and an estimated value of the amount of cross traffic. The most representative tool using this approach is Spruce [2]. The other approach, called the Probe Rate Model (PRM), estimates the available bandwidth by finding the rate at which the one-way delay of a train of probing packets shows an increasing trend. The most representative tool in this approach is Pathload [3].

Although Pathload and Spruce have been recognized as the best performing tools in their respective estimation approaches, their applicability has been limited to network management tools in wired networks, where the accuracy, overhead, and convergence time requirements are not as strict as in other applications or network environments. Otherwise, performance trade-offs must be made, as shown in [4]. Pathload is accurate, but it takes too long to provide an estimate and introduces considerable overhead. This overhead might not be acceptable for applications running over wireless networks. Spruce is faster and less intrusive than Pathload but, in general, it is less accurate [5]. Further, neither tool is able to react fast enough to rapidly changing cross-traffic conditions, which is especially important for applications that need to make decisions in real time, such as an available bandwidth-based transport layer protocol that needs to make transmission rate decisions according to network load fluctuations, or a routing protocol that uses the available bandwidth as the routing metric.

This paper includes several contributions to the state-of-the-art in end-to-end available bandwidth estimation. First, it presents an accurate, non-intrusive and fast end-to-end available bandwidth estimation approach that utilizes a hidden Markov model (HMM) using observations from the dispersion of probing packet pairs. Second, this new approach is implemented in a new estimation tool called Traceband.
Third, the paper includes a moving average technique to provide smoother estimations of the available bandwidth that can be incorporated not only in Traceband but in any other estimation tool. Finally, the paper includes a thorough performance evaluation comparing Traceband, Spruce and Pathload in a controlled environment using Poisson and self-similar cross-traffic. The tools have been additionally evaluated in a real network path at the University of South Florida.

Our experimental results demonstrate that the hidden Markov model allows Traceband to provide fast and accurate estimates using a low probing traffic rate. The tool is also able to quickly and accurately react to cross-traffic variations, like those present in links loaded with bursty traffic. These features not only make Traceband a better tool to perform available bandwidth estimations for network management tools (no trade-offs are necessary) but also open up the possibility of incorporating the tool in other scenarios and applications not possible before.

The remainder of the paper is organized as follows. Section 2 defines the available bandwidth and presents the current estimation approaches and most representative tools available in the literature. Section 3 explains the hidden Markov model and the method used to obtain observations from the network. Section 4 presents Traceband's implementation details. Section 5 compares the performance of Traceband with Pathload and Spruce using Poisson, bursty, and self-similar synthetically generated traffic in a controlled testbed. Section 6 describes a general and optional moving average algorithm that improves Traceband's mean accuracy and variability. Finally, Section 7 concludes the paper.

2. Background and related work

The available bandwidth of an end-to-end path is a time-varying metric related to the individual utilization of each link throughout the path. Defining $T$ as the averaging timescale of the available bandwidth [6], the average utilization of link $k$ during an interval of time $T$ is given by
$$u_k(t, t+T) = \frac{1}{T} \int_{t}^{t+T} u_k(\tau)\, d\tau, \tag{1}$$
where $0 \le u_k(t, t+T) \le 1$. For a link $k$ with capacity $C_k$, the AB of the link in the interval $(t, t+T)$ can be defined as the average non-utilized capacity during the time $T$. That is,
$$AB_k(t, t+T) = C_k \left[ 1 - u_k(t, t+T) \right]. \tag{2}$$
For an end-to-end path with $H$ hops, the available bandwidth is given by the link with the minimum non-utilized capacity in the path, as follows:

$$AB(t, t+T) = \min_{k=1,\ldots,H} AB_k(t, t+T). \tag{3}$$
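As an illustration of Eqs. (1)-(3), the following minimal sketch replaces the integral in Eq. (1) with the mean of equally spaced utilization samples. The function names and the three-hop example path are hypothetical and are not taken from Traceband.

```python
from statistics import fmean


def average_utilization(samples):
    """Discrete stand-in for Eq. (1): mean of equally spaced samples in [0, 1]."""
    if not samples:
        raise ValueError("need at least one utilization sample")
    if any(u < 0.0 or u > 1.0 for u in samples):
        raise ValueError("utilization samples must lie in [0, 1]")
    return fmean(samples)


def link_available_bandwidth(capacity_bps, samples):
    """Eq. (2): non-utilized capacity of one link over the averaging window."""
    return capacity_bps * (1.0 - average_utilization(samples))


def path_available_bandwidth(links):
    """Eq. (3): minimum link AB; returns (index of the tight link, AB in bps)."""
    per_link = [link_available_bandwidth(c, u) for c, u in links]
    tight = min(range(len(per_link)), key=per_link.__getitem__)
    return tight, per_link[tight]


# Hypothetical three-hop path: (capacity in bps, utilization samples over T).
example_path = [
    (100e6, [0.95, 0.97, 0.96]),  # fast link, heavily loaded
    (10e6, [0.25, 0.35, 0.30]),   # narrow link (smallest capacity)
    (1e9, [0.10, 0.10, 0.10]),
]
tight_index, ab_bps = path_available_bandwidth(example_path)
```

In this made-up path the first link has about 4 Mbps of spare capacity while the 10 Mbps link has about 7 Mbps, so the link that limits the path is not the one with the smallest capacity.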
In the literature, the link with the minimum capacity is called the narrow link and the link with the minimum available bandwidth is called the tight link, which is considered the bottleneck of the path and the link that determines the end-to-end available bandwidth.

To estimate the available bandwidth, current techniques use the single link model shown in Fig. 1. In this model, two consecutive probing packets arrive at the node with a given time separation between them ($\Delta_{in}$). After interacting with the cross-traffic coming from different sources, the probing packets leave the output queue with a different time separation ($\Delta_{out}$). That variation in the probing packet time separation, or in the probing packet rate due to the interaction with the cross-traffic, has inspired researchers to formulate two different models to estimate the available bandwidth in an end-to-end path: the Probe Gap Model (PGM) and the Probe Rate Model (PRM).

Fig. 1. Single link model for bandwidth estimations.

The Probe Gap Model bases the estimation on the gap separation or dispersion between two consecutive probing packets at the receiver, which has a strong correlation with the amount of cross-traffic in the tight link. That dispersion is used to estimate the amount of cross-traffic in the tight link during $T$, which is subtracted from the tight link capacity to estimate the AB in the path. This method suffers from several possible sources of error when used in real networks. First, probing packet pair dispersions may be affected by the presence of cross traffic in hops before the tight link. Second, cross traffic in links after the tight link may also affect the $\Delta_{out}$ measured at the receiver. Third, the estimation of the capacity, as required by PGM tools to determine the $\Delta_{in}$, may introduce additional errors. Finally, PGM tools assume that the tight and narrow links are the same, which may lead to erratic estimations. Examples of tools in this category are Spruce [2], Delphi [7], Abing [8], and IGI [9].

The Probe Rate Model is based on the idea of induced congestion, in which the tools send probing packets at increasing rates and the receiver looks for the turning point, that is, the point at which the delay of the probing packets starts increasing on a consistent basis. The available bandwidth is then estimated from the probing packet rate in use when the turning point is found. Pathload [3], TOPP [10], and Pathchirp [11] are examples of tools utilizing this approach.

Pathload and Spruce are the most representative tools from the Probe Rate and Probe Gap models, respectively. They have been previously evaluated in several papers [2,12,4]. Pathload uses the principle of Self-Loading Periodic Streams (SLoPS) [13]. SLoPS is based on the fact that the one-way delay of a periodic packet stream increases when the rate of the probing traffic is higher than the available bandwidth in the path. Otherwise, there is no increase in the measured delay. A fleet of streams (of a fixed number of packets each) is sent at varying rates and the one-way delay trend of each stream is then characterized at the receiver as either increasing or decreasing. When that delay is in a grey region (neither clearly increasing nor clearly decreasing), the methodology reports a variation range (upper and lower limits) of the available bandwidth. Pathload sends periodic packet streams of UDP traffic and uses a TCP connection to send trend results back to the sender. Given a desired stream rate $R$, Pathload sets the packet inter-departure time $t$ to 100 μs and calculates the necessary packet size $L$ to satisfy $R = L/t$. If $L$ is less than 96 bytes, Pathload uses this minimum value and calculates $t$ instead.

Spruce uses the Probe Gap Model. It sends a Poisson sample of pairs of 1500-byte UDP packets with an intra-pair gap equal to the narrow link transmission time of a 1500-byte packet.
That guarantees that the second packet arrives at the narrow link queue before the first packet leaves that queue. Using the dispersion of the probing packets measured at the receiver, Spruce calculates the average rate of the traffic that arrives at the queue between the two packets. The available bandwidth is determined by subtracting that cross-traffic rate from the capacity of the bottleneck link. Spruce therefore requires a prior calculation of the tight link capacity.

The implementation and evaluation of a bandwidth estimation tool require a great deal of practical programming and networking knowledge. Several aspects such as timestamps, operating system-dependent functions, clock
granularity, etc. have to be considered in their implementation. Similarly, link capacities, traffic loads, type of cross traffic, propagation delay of the links, tools to measure the real available bandwidth and traffic generators, etc. are among the practical aspects that need to be considered in the evaluation. Most of these critical aspects for the correct implementation and evaluation of available bandwidth tools can be found in the papers describing the respective tools and also in [14,5].

3. Estimation model

The estimation of the available bandwidth is a difficult task. First, due to the bursty nature of the cross traffic and the load in the network, a single pair of probing packets is not enough to capture the entire behavior of the underlying process. As a result, estimation tools based on both the PGM and PRM approaches use a train of probing packets. Second, there are many possible sources of noise in the system. Tools based on the probe gap model have difficulties calculating the right packet pair dispersion, as mentioned in Section 2. Estimation tools can also produce inaccurate estimations because of errors generated by the end hosts and routers along the end-to-end path, link failures, changes in routes, out-of-order packet delivery, packet replications, errors in the probing packets due to link quality issues, incorrect packet time stamps, poor Network Interface Card (NIC) utilization, etc. [15–17]. Most of these errors can be corrected in a controlled testing environment but not in a real scenario where users run the estimation tool but have no knowledge about how to conveniently set up their machines, links and routers.

The estimation tool presented in this paper does not prevent these errors from occurring but builds a model of the available bandwidth that statistically adjusts erratic measurements. The description of such a model is detailed next. Table 1 summarizes all the variables used to define it.

Table 1. Model variables.
Variable          Description
$C$               Tight link capacity
$\Delta_{in}$     Packet pair separation before the tight link
$\Delta_{out}$    Packet pair separation after the tight link
$N$               Number of states representing AB levels
$S$               Set of states (low to high): $S = S_1, S_2, \ldots, S_N$
$M$               Number of distinct observation outcomes
$V$               Set of observations: $V = v_1, v_2, \ldots, v_M$
$A$               State transition probability matrix
$B$               Observation probabilities
$\Pi$             Initial state probabilities
$T$               Sampling period, $t = 1, \ldots, T$
$\epsilon_t$      Relative time dispersion at time $t$
$X_t$             State at time $t$
$Q$               State sequence: $Q = X_1, X_2, \ldots, X_T$
$\xi_t$           Observation symbol at time $t$
$O$               Observation sequence: $O = \xi_1, \xi_2, \ldots, \xi_T$

3.1. Probing sampling method

In order to get information about the AB dynamics during a period $T$, the network is sampled using the Probe Gap Model. Assuming the single tight link model shown in Fig. 1, a probing packet pair enters the router with a $\Delta_{in}$ separation. Then, due to the interaction of the probing packet pairs with the cross-traffic in the router's output queue, the packets leave the link with a different separation, or dispersion, $\Delta_{out}$. It has been shown in [2] that this variation has a strong correlation with the amount of cross-traffic in the queue during the sampling period, which can be used to estimate the available bandwidth. The relative dispersion suffered by the probing packets can be defined as:
$$\epsilon_t = \frac{\Delta_{out} - \Delta_{in}}{\Delta_{in}}, \qquad t = 1, \ldots, T, \tag{4}$$
which is a measure of the tight link utilization as seen by a probing packet pair at time t. Then, knowing the capacity of the tight link C, the end-to-end available bandwidth at time t can be estimated by:
$$AB_t = C\,(1 - \epsilon_t) = C \left[ 1 - \frac{\Delta_{out} - \Delta_{in}}{\Delta_{in}} \right]. \tag{5}$$

Similar to other PGM tools, it is assumed in this paper that the tight link is also the narrow link for the path. The value of the narrow link capacity ($C$) can be calculated using well-known accurate tools, like Pathrate [18].

3.2. Hidden Markov Model (HMM)

The available bandwidth in an end-to-end path can be modeled by $N$ states, each one representing a certain level of availability. For example, in the five-state representation shown in Fig. 2, the AB could be in one of the Low (L), Medium Low (ML), Medium (M), Medium High (MH), and High (H) states. That is, the fraction of spare capacity could fall in any of the ranges [0, 0.2), [0.2, 0.4), [0.4, 0.6), [0.6, 0.8), or [0.8, 1]. If we could observe the sequence of states visited during time $T$, then the average of the middle points of each state range could be an estimation of the available bandwidth during that period of time. The larger the number of states defined, the smaller the AB range represented by each state, and the better the precision of the estimation. However, AB states cannot be directly observed; they are hidden, since the end-to-end estimators do not have information about bandwidth consumption in intermediate routers. Rather, AB estimators sample the network path with probing packets that convey packet dispersion information, which can be used by a hidden Markov model to infer the non-observable states.

Since in the AB model proposed in Fig. 2 the states cannot be directly observed, an HMM approach can be used to find the state sequence associated with the dispersions observed during the sampling period. The model, which is shown in Fig. 3, is a hidden Markov model with discrete hidden states $X$ representing the available bandwidth levels (ranges) and discrete observation variables $\xi$ representing probing packet pair dispersions. Each observation has an associated probability, given by $B$, of being generated by a particular hidden state. Available bandwidth transitions go from time $t = 1$ to time $t = T$. Transitions between states are governed by probabilities specified in the transition probability matrix $A$. This model, which is refined with every new observation, is used to determine the most probable state sequence ($Q = X_1, X_2, \ldots, X_T$) responsible for what has been observed during $T$. Finally, the average state in the estimated sequence of states gives the average available bandwidth during time period $T$. Considering the model characterization given in [19], this work implements a hidden Markov model with the following five elements:
3.2.1. Number of states in the model (N)

The set of states is defined by $S = \{S_1, S_2, \ldots, S_N\}$, where the available bandwidth level grows from $S_1$ (low) to $S_N$ (high). The state at time $t$ is denoted by $X_t$. The default number of states in the estimation tool presented in this work is ten, representing available bandwidth ranges of [0, 0.1), [0.1, 0.2), ..., and [0.9, 1]. This value was set after testing the model with 5, 10, 15, and 20 states, and obtaining the best results with 10 states.

Fig. 2. AB Markov model. States represent levels of bandwidth availability.

Fig. 3. Hidden Markov model for available bandwidth estimations.

3.2.2. Number of distinct observation symbols per state (M)

These are all the possible outcomes of a state, that is, the set of symbols associated with observed dispersions from the probing sampling method. The default number of distinct observation symbols in the estimation tool presented in this work is ten, and the set is denoted by $V = \{v_1, v_2, \ldots, v_{10}\}$. Each symbol is an integer from 1 to 10. These ten symbols represent observed relative dispersion values in the ranges $v_1$: [0, 0.1), $v_2$: [0.1, 0.2), ..., and $v_{10}$: [0.9, 1]. Therefore, every single observation $\epsilon_t$ at time $t$ has to be converted to a discrete value $\xi_t$ associated with a symbol $v$ by:

$$\xi_t = \left\lceil M \cdot |1 - \epsilon_t| \right\rceil. \tag{6}$$

That is, Eq. (6) determines the discrete observation value $\xi_t$ that corresponds to a continuous observation $\epsilon_t$, where $\xi_t \in V$. There is also a relation between states (AB) and observations (associated with $\epsilon$), as defined in Eq. (5).

3.2.3. State transition probability matrix (A)

$A = [a_{ij}]$ where $a_{ij} = P(X_{t+1} = S_j \mid X_t = S_i)$, $1 \le i, j \le N$. In a hidden Markov model it is convenient to initially populate this matrix with "educated guess" values. In this work, the values of the initial matrix $A$ were set assuming that transitions would occur only between adjacent states. This assumption facilitates initial calculations to find the model parameters, since the number of unknown elements in the matrix is reduced to the three main diagonals. This is a good assumption if the probing packet pairs interact with cross traffic. However, this is not always the case and, in practice, there are transitions between non-adjacent states as a result of one packet pair finding cross traffic and the subsequent one finding the channel idle. Nevertheless, the HMM learns about these transitions over time and adjusts the transition probabilities accordingly.

$$A = \begin{bmatrix}
a_{1,1} & a_{1,2} & 0 & \cdots & 0 \\
a_{2,1} & a_{2,2} & a_{2,3} & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & a_{N-1,N-2} & a_{N-1,N-1} & a_{N-1,N} \\
0 & \cdots & 0 & a_{N,N-1} & a_{N,N}
\end{bmatrix}.$$

3.2.4. Observation probabilities (B)

This is a set of probabilities that indicates how likely it is that at time $t$ an observation symbol $\xi_t$ is generated by each state from the set $S$. More specifically, $B = [b_j(m)]$ where $b_j(m) = P(\xi_t = v_m \mid X_t = S_j)$ for $1 \le m \le M$, $1 \le j \le N$, and $\sum_{m=1}^{M} b_j(m) = 1$:

$$\begin{aligned}
b_1 &= [P(v_1 \mid S_1) \;\; P(v_2 \mid S_1) \;\; \cdots \;\; P(v_M \mid S_1)] \\
b_2 &= [P(v_1 \mid S_2) \;\; P(v_2 \mid S_2) \;\; \cdots \;\; P(v_M \mid S_2)] \\
b_3 &= [P(v_1 \mid S_3) \;\; P(v_2 \mid S_3) \;\; \cdots \;\; P(v_M \mid S_3)] \\
&\;\;\vdots \\
b_N &= [P(v_1 \mid S_N) \;\; P(v_2 \mid S_N) \;\; \cdots \;\; P(v_M \mid S_N)]
\end{aligned}$$

It is expected that small values of $\xi$ are the result of a highly loaded network and are therefore more likely generated by a low order state (one indicating low available bandwidth), and conversely. Based on this rationale, probability values can be assigned and fixed in the model with fairly accurate "educated guess" values for matrix $B$. Since the number of observation symbols and the number of states in the tool presented in this work are both equal to ten, the matrix values were fixed as follows:
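$$\begin{aligned}
b_1 &= [0.35 \;\; 0.25 \;\; 0.15 \;\; 0.10 \;\; 0.05 \;\; 0.03 \;\; 0.03 \;\; 0.02 \;\; 0.01 \;\; 0.01], \\
b_2 &= [0.20 \;\; 0.35 \;\; 0.20 \;\; 0.10 \;\; 0.05 \;\; 0.03 \;\; 0.03 \;\; 0.02 \;\; 0.01 \;\; 0.01], \\
b_3 &= [0.08 \;\; 0.20 \;\; 0.35 \;\; 0.20 \;\; 0.07 \;\; 0.03 \;\; 0.03 \;\; 0.02 \;\; 0.01 \;\; 0.01], \\
b_5 &= [0.02 \;\; 0.03 \;\; 0.08 \;\; 0.20 \;\; 0.35 \;\; 0.20 \;\; 0.07 \;\; 0.02 \;\; 0.02 \;\; 0.01], \\
b_6 &= [0.01 \;\; 0.02 \;\; 0.02 \;\; 0.07 \;\; 0.20 \;\; 0.35 \;\; 0.20 \;\; 0.08 \;\; 0.03 \;\; 0.02], \\
b_7 &= [0.01 \;\; 0.01 \;\; 0.02 \;\; 0.03 \;\; 0.07 \;\; 0.20 \;\; 0.35 \;\; 0.20 \;\; 0.08 \;\; 0.03], \\
b_8 &= [0.01 \;\; 0.01 \;\; 0.02 \;\; 0.03 \;\; 0.03 \;\; 0.07 \;\; 0.20 \;\; 0.35 \;\; 0.20 \;\; 0.08], \\
b_9 &= [0.01 \;\; 0.01 \;\; 0.02 \;\; 0.03 \;\; 0.03 \;\; 0.05 \;\; 0.10 \;\; 0.20 \;\; 0.35 \;\; 0.20], \\
b_{10} &= [0.01 \;\; 0.01 \;\; 0.02 \;\; 0.03 \;\; 0.03 \;\; 0.05 \;\; 0.10 \;\; 0.15 \;\; 0.25 \;\; 0.35].
\end{aligned}$$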
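The mapping from a measured packet pair to an observation symbol (Eqs. (4)-(6)) and the one-step structure of the initial matrix $A$ can be sketched as follows. This is an illustrative sketch, not Traceband's code: the function names are hypothetical, clamping the symbol to the range 1..M is an assumption made here for dispersions that fall outside [0, 1), and the way the random rows are normalized is also an assumption.

```python
import math
import random


def relative_dispersion(delta_in, delta_out):
    """Eq. (4): relative change of the packet pair gap."""
    if delta_in <= 0:
        raise ValueError("delta_in must be positive")
    return (delta_out - delta_in) / delta_in


def instantaneous_ab(capacity_bps, epsilon):
    """Eq. (5): available bandwidth seen by one packet pair."""
    return capacity_bps * (1.0 - epsilon)


def observation_symbol(epsilon, num_symbols=10):
    """Eq. (6): discrete symbol in 1..M for a continuous dispersion.

    Clamping to [1, M] is an assumption of this sketch.
    """
    symbol = math.ceil(num_symbols * abs(1.0 - epsilon))
    return min(max(symbol, 1), num_symbols)


def random_tridiagonal_matrix(num_states, rng):
    """Random row-stochastic matrix with non-zero entries only on the
    three main diagonals (transitions between adjacent states)."""
    matrix = []
    for i in range(num_states):
        row = [0.0] * num_states
        neighbours = [j for j in (i - 1, i, i + 1) if 0 <= j < num_states]
        weights = [rng.random() + 1e-9 for _ in neighbours]
        total = sum(weights)
        for j, w in zip(neighbours, weights):
            row[j] = w / total
        matrix.append(row)
    return matrix


# Hypothetical pair: sent 1.2 ms apart, received 1.5 ms apart, 10 Mbps tight link.
eps = relative_dispersion(0.0012, 0.0015)
symbol = observation_symbol(eps)
ab_bps = instantaneous_ab(10e6, eps)
A0 = random_tridiagonal_matrix(10, random.Random(1))
```

With these made-up gaps the relative dispersion is about 0.25, which corresponds to roughly 7.5 Mbps of available bandwidth and to symbol 8.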
Note that 0.35 and 0.25 are high probability values assigned to more likely states.

3.2.5. Initial state probabilities (Π)

This is a vector with the probabilities that each state is the first in the state sequence that generated the observations. $\Pi = [\pi_i]$ where $\pi_i = P(X_1 = S_i)$ for $1 \le i \le N$. Therefore,

$$\Pi = [\pi_1 \;\; \pi_2 \;\; \cdots \;\; \pi_N] = [P(X_1 = S_1) \;\; P(X_1 = S_2) \;\; \cdots \;\; P(X_1 = S_N)].$$

The last three sets of probabilities are usually denoted as $\lambda = (A, B, \Pi)$ to indicate the complete parameter set of the model. It is worth noting that the values of $N$, $M$, and $B$ were found from experimentation; a deeper analysis is required to study the effect of varying these parameters.

3.3. Parameter estimation

Given an observation sequence $O = \xi_1, \xi_2, \ldots, \xi_T$, that is, a set of samples from the network during $T$, it is desired to estimate the model $\lambda$ that most likely generated that sequence, i.e. the model $\lambda = (A, B, \Pi)$ for which $P(O \mid \lambda)$ is maximized. The solution to this problem is given by an iterative procedure formulated in the Baum–Welch algorithm [20]. The purpose of using this algorithm is to update the available bandwidth model with every new sequence of observations from the network. The estimation tool presented in this work implements a modified version of the Baum–Welch algorithm written in C by Tapas Kanungo [21]. There are two main modifications to the algorithm. The first one is that the initial transition probability matrix $A$ is randomly generated to be a one-step transition matrix. Therefore, only the three main diagonals in the matrix have probability values. The second modification is that the observation probability matrix $B$ is fixed, so that the probabilities of observations being generated by the states do not change. This is due to the fact that highly congested links are expected to increase the dispersion between packets, and vice versa. Indeed, a link with zero cross-traffic generates zero (or close to zero) dispersion between the pair of packets. The algorithm, which has a time complexity of $O(N^2 T)$, runs as follows:
1. Set the initial model $\lambda_0$ with a randomly generated one-step transition matrix $A_0$ and initial state probability vector $\Pi_0$. Matrix $B_0$ is initialized as explained in the previous section.
2. Calculate a new $\lambda = (A, B_0, \Pi)$ based on $\lambda_0$ and the observation sequence $O$. See [19] for more details.
3. If $\log P(O \mid \lambda) - \log P(O \mid \lambda_0) < 0.001$ then stop; otherwise set $\lambda_0 \leftarrow \lambda$ and go to step 2.

Note that $B = B_0$ at all times, as explained before.

3.4. State sequence estimation

With an updated model $\lambda$, the next problem is to find the state sequence $Q = X_1, X_2, \ldots, X_T$ that maximizes $P(X_1, X_2, \ldots, X_T \mid O, \lambda)$. That state sequence is used to calculate the average available bandwidth during $T$. This is done by using another iterative algorithm, the Viterbi algorithm [22], which also has $O(N^2 T)$ time complexity. For each state, the algorithm keeps the most likely path leading to that state among all possible paths. See [19] or [21] for implementation details of the algorithm. The final most likely path represents the levels of AB that the probing packets have observed during the sampling time. As defined in Eq. (1), the final estimation is based on the average utilization observed during $T$. Therefore, the AB is calculated as the average of all states in the sequence. Table 1 summarizes all the variables utilized in the estimation model.

4. Traceband

Traceband is a client–server tool written in ANSI C that uses the described hidden Markov representation of the available bandwidth dynamics to provide fast, continuous, and accurate AB estimates. The Traceband client runs in cycles of ten estimations. In the first estimation the tool sends 50 UDP packet pairs, each packet 1498 bytes long. The nine remaining estimations are performed with 30 packet pairs each. This reduction is possible thanks to the HMM, which is able to learn the AB dynamics with an initial sample and keep the model updated with samples of reduced size.
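The estimation steps of Sections 3.3 and 3.4 can be sketched as below. This is a minimal pure-Python illustration and not the C implementation from [21] used by Traceband: it re-estimates only $A$ and $\Pi$ while $B$ stays fixed, decodes the state sequence with a log-space Viterbi pass, and averages the midpoints of the decoded state ranges. Using range midpoints, 0-based state and symbol indices, and the small three-state model are assumptions made for the example.

```python
import math


def forward(obs, A, B, pi):
    """Scaled forward pass: returns (alpha, scales, log P(O | model))."""
    n, alpha, scales = len(pi), [], []
    for t, o in enumerate(obs):
        if t == 0:
            row = [pi[j] * B[j][o] for j in range(n)]
        else:
            row = [sum(alpha[-1][i] * A[i][j] for i in range(n)) * B[j][o]
                   for j in range(n)]
        c = sum(row)
        alpha.append([x / c for x in row])
        scales.append(c)
    return alpha, scales, sum(math.log(c) for c in scales)


def backward(obs, A, B, scales):
    """Scaled backward pass that matches forward()."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scales[t + 1]
    return beta


def baum_welch_fixed_b(obs, A, B, pi, tol=1e-3, max_iter=100):
    """Re-estimate A and pi only (B is never updated) until the
    log-likelihood gain is below tol. Returns (A, pi, log-likelihood)."""
    n = len(pi)
    alpha, scales, loglik = forward(obs, A, B, pi)
    for _ in range(max_iter):
        beta = backward(obs, A, B, scales)
        new_pi = [alpha[0][i] * beta[0][i] for i in range(n)]
        new_A = []
        for i in range(n):
            num = [sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                       / scales[t + 1] for t in range(len(obs) - 1))
                   for j in range(n)]
            den = sum(num)
            new_A.append([x / den for x in num] if den > 0 else list(A[i]))
        A, pi = new_A, new_pi
        alpha, scales, new_loglik = forward(obs, A, B, pi)
        gain, loglik = new_loglik - loglik, new_loglik
        if gain < tol:
            break
    return A, pi, loglik


def viterbi(obs, A, B, pi):
    """Most likely state sequence, computed in log space."""
    def log(x):
        return math.log(x) if x > 0 else float("-inf")
    n = len(pi)
    score = [log(pi[i]) + log(B[i][obs[0]]) for i in range(n)]
    back = []
    for o in obs[1:]:
        step, ptr = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: score[i] + log(A[i][j]))
            ptr.append(best)
            step.append(score[best] + log(A[best][j]) + log(B[j][o]))
        score = step
        back.append(ptr)
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def ab_from_states(path, num_states, capacity_bps):
    """Average the midpoints of the decoded state ranges, scaled by capacity."""
    return capacity_bps * sum((s + 0.5) / num_states for s in path) / len(path)


# Hypothetical 3-state, 3-symbol model (0-based states and symbols).
B = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]
A0 = [[0.5, 0.5, 0.0], [0.3, 0.4, 0.3], [0.0, 0.5, 0.5]]
pi0 = [1 / 3, 1 / 3, 1 / 3]
obs = [2, 2, 1, 2, 2, 2, 1, 1, 0, 1, 2, 2]
A, pi, loglik = baum_welch_fixed_b(obs, A0, B, pi0)
path = viterbi(obs, A, B, pi)
estimate_bps = ab_from_states(path, 3, 10e6)
```

Because each re-estimated entry of $A$ is proportional to its previous value, entries that start at zero stay at zero in this sketch, so the tridiagonal structure of the initial matrix is preserved across iterations.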
It was found from experimentation that re-learning every ten estimations was enough to maintain good accuracy with low overhead. Using 50 packet pairs to train the model showed the best estimation results. Future work will study the implications on the accuracy of the tool when this number of probing packets is varied.

Traceband utilizes different values for the intra-gap and inter-gap times of packet pairs. The intra-gap refers to the time between the two packets of each packet pair. The intra-gap, or $\Delta_{in}$, is specified at the sender and is set equal to the transmission time of a single probing packet in the tight link. In that way, the packet pair will be able to capture cross-traffic in the queue, if any. The inter-gap refers to the time between pairs of probing packets, i.e. the time between the second packet of probing pair $i - 1$ and the first packet of probing pair $i$. These times are obtained using the gettimeofday function, and their values are sent to the receiver in the packet payload. Similar to Spruce [2],
Traceband performs a Poisson sampling process of the available bandwidth of the path by using exponentially distributed inter-gap times. In order to keep the overhead controlled and low, the mean inter-gap time value is calculated so that the maximum overhead introduced by the tool is 5% or less of the tight link capacity. At the receiver side, the tool's server application timestamps each received probing packet at the kernel level. This is performed by setting the SO_TIMESTAMP option in the socket. Packets are numbered to determine which packets are in the same pair and to calculate the correct relative time dispersion ($\epsilon$) between them. By applying Eq. (6), the corresponding observation symbol for the HMM is determined for each packet pair. The HMM module in Traceband reads the values of $N$, $M$ and $B$ from a file to compute the model $\lambda$ based on the 50 (or fewer) observations calculated at the receiver. The model is used to determine the most likely sequence of states that generated the observations. For every new estimation, the initial model $\lambda_0$ is the output of the previous estimation. The sequence of states is then averaged and multiplied by the tight link capacity to provide a final AB estimation. The main Traceband algorithm running at the receiver is shown in Fig. 4.

Fig. 4. Traceband receiver pseudo code.

5. Performance evaluation

The performance of Traceband is evaluated and compared with Pathload and Spruce. In the evaluation these tools were run with their default parameters, including their default time scales. We did not attempt to change these parameters and rather assumed that these tools were optimized to offer their best performance with those settings.

Fig. 5. Testbed used in the performance evaluation of the available bandwidth estimation tools.

Fig. 6. Available bandwidth estimation for a 10 Mbps tight link with 30% of Poisson cross-traffic: (a) Pathload, (b) Spruce, (c) Traceband. Each panel plots the available bandwidth (bps) against time (s), showing the real AvBw, the mean real AvBw, and the estimated AvBw.

The evaluation is performed using the testbed shown in Fig. 5. This is a fully controlled environment with a 10 Mbps capacity tight link. Cross-traffic is generated from the host called US to the host called China, and the estimation is performed from Sender to Receiver. The Multi-Generator MGEN [23] is used to generate the Poisson and bursty cross-traffic; it allows sending cross-traffic at different rates and with different probability distributions. Self-similar cross-traffic is generated using a trace produced by [24]. A computer using tcpdump sniffs the output link in the router and records a trace with the combined cross-traffic and probing traffic. This trace is used to calculate and plot the average utilization of the link every 1/10 s.
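The windowed utilization computed from the recorded trace can be sketched as follows. The input format, a list of (timestamp in seconds, packet size in bytes) tuples, is an assumption of this example; parsing the tcpdump output itself is not shown, and the function names are hypothetical.

```python
def utilization_per_window(packets, capacity_bps, window_s=0.1):
    """Return one link utilization value in [0, 1] per window of window_s seconds."""
    if not packets:
        return []
    start = min(ts for ts, _ in packets)
    end = max(ts for ts, _ in packets)
    num_windows = int((end - start) / window_s) + 1
    bits = [0] * num_windows
    for ts, size_bytes in packets:
        bits[int((ts - start) / window_s)] += 8 * size_bytes
    window_capacity = capacity_bps * window_s  # bits the link can carry per window
    return [min(b / window_capacity, 1.0) for b in bits]


def real_ab_per_window(packets, capacity_bps, window_s=0.1):
    """Non-utilized capacity per window, following Eq. (2)."""
    return [capacity_bps * (1.0 - u)
            for u in utilization_per_window(packets, capacity_bps, window_s)]


# Hypothetical trace: thirty 1250-byte packets in the first 0.1 s, one in the second.
trace = [(0.001 * i, 1250) for i in range(30)] + [(0.15, 1250)]
utilization = utilization_per_window(trace, 10e6)
real_ab = real_ab_per_window(trace, 10e6)
```

Averaging the per-window values over a tool's estimation period gives the reference value against which that tool's estimate can be compared.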
Table 2. Performance evaluation for 30% Poisson cross-traffic with a 95% confidence interval.
Tool         Estimation error (%)    Estimations/min     Overhead (%)
Pathload     6.71 ± 1.17             1.754 ± 0.066       6.57 ± 0.20
Spruce       7.77 ± 0.98             5.579 ± 0.059       1.41 ± 0.02
Traceband    8.83 ± 0.43             11.645 ± 0.132      1.96 ± 0.03
The performance metrics used in the evaluation are accuracy, overhead, and estimation rate. The accuracy metric compares the estimation provided by the tool with the real average value obtained from the tcpdump trace during the tool's estimation period. In this paper, the accuracy is given by the relative error according to Eq. (7), where $m_{AB}$ is the value given by the tool and $\mu_{AB}$ is the real AB value from the trace.
$$\mathrm{error} = \frac{m_{AB} - \mu_{AB}}{\mu_{AB}} \times 100\%. \qquad (7)$$
The overhead is related to the amount of probing packets that the tool needs to inject into the network in order to perform the estimation. Although the overhead is usually expressed in bps, in this article it is defined as the tool traffic rate (tool traffic divided by the tool running time) expressed as a percentage of the total capacity of the tight link.
Finally, the estimation rate shows how often the tool is able to provide an estimate. This rate is given in estimations per minute. The shorter the convergence time, the higher this value can be. Pathload and Traceband directly report the estimation time. Spruce's estimation time was recorded using a script that computes the difference between the times before and after running the tool.
The tools were evaluated as if they were performing a continuous network monitoring task during a period of 200 s. In the case of Pathload and Spruce, it was necessary to run the tools in a loop. In the case of Traceband, the tool has an option to set the estimation period. For every experiment, the output of the tool was redirected to a log file that was processed to extract information about the time, number, and values of the estimations. The tight link was loaded with 3 Mbps (30% of its capacity) of Poisson, bursty, and self-similar (Hurst parameter = 0.8) cross-traffic. Every experiment was repeated five times and 95% confidence intervals were calculated.
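The three metrics and the confidence intervals described above can be illustrated with a short Python sketch. It is not the authors' processing script: the function names and arguments are hypothetical, the relative error is computed as a signed value following Eq. (7) (wrap it in `abs()` if only the magnitude is of interest), and the use of a Student-t quantile for the five repetitions is an assumption, since the text does not state how the 95% intervals were obtained.

```python
from statistics import mean, stdev

# Two-sided 95% Student-t quantile for 4 degrees of freedom (five runs).
T_95_DF4 = 2.776


def relative_error_pct(estimated_bps, real_bps):
    """Eq. (7): signed relative error of a tool estimate, in percent."""
    return (estimated_bps - real_bps) / real_bps * 100.0


def overhead_pct(probe_bytes, run_time_s, capacity_bps):
    """Tool traffic rate as a percentage of the tight link capacity."""
    tool_rate_bps = probe_bytes * 8 / run_time_s
    return tool_rate_bps / capacity_bps * 100.0


def estimations_per_minute(n_estimations, run_time_s):
    """Estimation rate over one monitoring run."""
    return n_estimations * 60.0 / run_time_s


def ci95_five_runs(values):
    """Mean and 95% half-width for exactly five repetitions."""
    if len(values) != 5:
        raise ValueError("T_95_DF4 is only valid for five repetitions")
    half_width = T_95_DF4 * stdev(values) / len(values) ** 0.5
    return mean(values), half_width
```

For instance, under these definitions a tool that injects 5,000,000 bytes of probing traffic during a 200 s run on a 10 Mbps tight link has an overhead of 2%.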
Fig. 7. Available bandwidth estimation for a 10 Mbps tight link with 30% of bursty cross-traffic. Panels: (a) Pathload, (b) Spruce, (c) Traceband. Axes: time (s) versus available bandwidth (bps, ×10^6). Series: Real AvBw, Mean Real AvBw, Estimated AvBw.
Fig. 8. Available bandwidth estimation for a 10 Mbps tight link with 30% of self-similar (H = 0.8) cross-traffic. Panels: (a) Pathload, (b) Spruce, (c) Traceband. Axes: time (s) versus available bandwidth (bps, ×10^6). Series: Real AvBw, Mean Real AvBw, Estimated AvBw.
Table 3 Performance evaluation for 30% self-similar cross-traffic with a 95% confidence interval.
Tool        Estimation error (%)   Estimations/min    Overhead (%)
Pathload    4.68 ± 1.29            1.600 ± 0.092      6.31 ± 0.22
Spruce      7.72 ± 1.57            5.482 ± 0.060      1.36 ± 0.04
Traceband   10.48 ± 1.33           12.543 ± 0.051     2.09 ± 0.02
5.1. Experiments with Poisson cross-traffic
Fig. 6 shows the tools' estimations when the tight link is loaded with an average of 3 Mbps of Poisson cross-traffic. The mean value of the real available bandwidth is calculated as the average of all real AB values observed between two consecutive estimations of each tool. This is done because the tools also provide an average over the estimation period. For comparison purposes, Pathload's single points were calculated as the midpoint of the range reported by the tool. In the experiment shown in Fig. 6, Pathload makes 1.86 estimations per minute, inserts 6.86% of the tight link capacity as tool overhead, and presents an average estimation error of 6.92%. Spruce performs 5.49 estimations per minute, inserts 1.42% of the tight link capacity as tool overhead, and has an average estimation error of 8.54%. Finally, Traceband performs an average of 11.42 estimations per minute, inserts 1.90% of the tight link capacity as tool overhead, and presents an average estimation error of 8.40%. These results correspond to a single experiment. Table 2 shows 95% confidence intervals for each performance metric as a result of running the experiments five times. In this Poisson cross-traffic scenario, the three tools under evaluation showed estimation errors below 10%, which, according to evaluations performed by other authors such as [12], can be considered high accuracy. Compared with Spruce, Traceband performs twice the number of estimations per minute with similar total overhead. Pathload is more than three times more intrusive and more than six times slower than Traceband.
Fig. 9. Effect of the Hurst parameter in the estimation error. Axes: Hurst parameter (0.5 to 0.8) versus estimation error (%). Series: Traceband, Pathload, Spruce.
Fig. 10. Moving average algorithm.
5.2. Experiments with bursty cross-traffic
Fig. 7 shows the results of running the tools when the network is loaded with 3 Mbps of bursty cross-traffic. The length of the bursts and the burst interarrival times are both exponentially distributed, with averages of 5 and 10 s, respectively. In this case, Pathload makes 2.45 estimations per minute, inserts 7.65% of the tight link capacity as tool overhead, and presents an average estimation error of
Fig. 11. Available bandwidth estimation for a 100 Mbps tight link with Internet cross traffic. Panels: (a) Pathload, (b) Spruce, (c) Traceband. Axes: time (s) versus available bandwidth (bps, ×10^7). Series: Mean Real AvBw, Estimated AvBw.
Table 4 Performance evaluation with real cross traffic in a 100 Mbps path.
Tool        Estimation error (%)   Estimations/min
Pathload    13.89                  4.20
Spruce      11.07                  5.91
Traceband   10.95                  109.85
12.06%. Spruce performs an average of 5.46 estimations per minute, inserts 1.34% of the tight link capacity as tool overhead, and has an average estimation error of 8.21%. Traceband performs an average of 11.86 estimations per minute, inserts 1.98% of the tight link capacity as tool overhead, and presents an average estimation error of 4.12%. As in the Poisson cross-traffic case, the overhead introduced by Traceband is considerably lower than that of Pathload. As can be observed from Fig. 7, since Traceband performs more estimations per minute, the tool is able to accurately react to periods where the tight link has no cross-traffic. Further, during those empty periods, the HMM provides 100% accuracy, setting the estimations to the state representing the highest availability. To the best of the authors' knowledge, this is the first tool capable of reacting in such a fast and accurate manner.
Table 5 Estimation error after applying moving average to experiment results in Table 2.
Tool        Estimation error (%)
Pathload    4.88 ± 2.13
Spruce      3.84 ± 1.92
Traceband   2.93 ± 1.42
5.3. Experiments with self-similar cross-traffic
Fig. 8 shows the results of running the tools when the network is loaded with 3 Mbps of self-similar cross-traffic with a Hurst parameter equal to 0.8. Since the estimation methodology assumes that the available bandwidth can be modeled by a hidden Markov model, the memoryless property holds when the cross-traffic is Poisson but not when it is self-similar. Therefore, as expected, Traceband shows a higher but still low estimation error. The 95% confidence intervals calculated for the three evaluated tools
Fig. 12. Moving average post processing to experiments in Fig. 6 (Poisson cross-traffic). Panels: (a) Pathload, (b) Spruce, (c) Traceband. Axes: time (s) versus available bandwidth (bps, ×10^6). Series: Real AvBw, Mean Real AvBw, Estimated AvBw.
Fig. 13. Moving average post processing to experiments in Fig. 8 (self-similar cross-traffic with H = 0.8). Panels: (a) Pathload, (b) Spruce, (c) Traceband. Axes: estimation time (s) versus available bandwidth (bps, ×10^6). Series: Real AvBw, Mean Real AvBw, Estimated AvBw.
under self-similar cross-traffic are shown in Table 3. Overhead and estimation rates are very similar to the results obtained in the Poisson cross-traffic scenario (Table 2). The estimation error, however, is higher but still in a low range (around 10%) with low variability.
One additional experiment was made to look at the effect of the Hurst parameter on the estimation error of the tools. Fig. 9 shows these results. From the figure, it is clear that Pathload is the most accurate tool, followed by Spruce and Traceband, in that order. As expected, the self-similarity level affects the accuracy of the evaluated tools and, in particular, the performance of Traceband, for which not only the estimation error but also its variability increases.
5.4. Experiments with Internet traffic
In these experiments, we evaluated the estimation tools using two computers connected to the network of the University of South Florida. The end-to-end path consisted of three switches (four links), with one intermediate switch connected to the Internet. The path had a 100 Mbps tight link. Since this is not a fully controlled environment, the traffic traces were provided by the network administrator of the university and have a granularity of 10 s.
Table 6 Ninety-five percent confidence intervals on the estimation error of the tools after applying the moving average algorithm to the experiment results in Table 3.
Tool        Estimation error (%)
Pathload    3.90 ± 1.72
Spruce      4.67 ± 1.24
Traceband   5.12 ± 0.49
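The moving average algorithm whose effect is reported in Table 6 can be sketched as follows. This is a minimal Python sketch based on the description given in Section 6 (a window of five estimations and the interval of Eq. (8)), not the authors' implementation. The class name `MovingAverageFilter`, the use of the population standard deviation, the storage of adjusted values in the window, and the warm-up behavior (no output until five estimations are available) are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class MovingAverageFilter:
    """Smooths a stream of available bandwidth estimations."""

    def __init__(self, window=5):
        self.window = deque(maxlen=window)

    def update(self, estimate):
        """Feed one raw estimation; return the smoothed value or None."""
        if len(self.window) < self.window.maxlen:
            # Warm-up: the first estimations are stored unmodified.
            self.window.append(estimate)
            if len(self.window) < self.window.maxlen:
                return None
            return mean(self.window)
        centre = mean(self.window)
        spread = pstdev(self.window)
        lower, upper = centre - spread, centre + spread
        # A "peak" outside [lower, upper] is moved to the nearest limit.
        adjusted = min(max(estimate, lower), upper)
        # Appending to a full deque drops the oldest value (window shift).
        self.window.append(adjusted)
        return mean(self.window)
```

With this warm-up rule a tool has to complete five estimations before the first smoothed value is produced, so a tool with a low estimation rate produces its first smoothed value late.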
The average values plotted in Fig. 11a–c are summarized in Table 4. In these experiments, Traceband was also considerably faster and more accurate than the other tools when compared with the 10-s granularity real traffic trace.
6. Moving average algorithm
In this section, we introduce a moving average algorithm meant to improve the estimation error and variability of Traceband. The idea of the algorithm is similar to the one proposed in [25] to filter out abrupt changes in the received signal strength of wireless devices. It calculates the mean $\overline{AB}$ and the standard deviation $std$ of five consecutive estimations.
$$\mathit{Interval} = [\overline{AB} - std,\ \overline{AB} + std]. \qquad (8)$$
If the next single estimation lies above the upper limit or below the lower limit calculated using Eq. 8, that estimation is considered a "peak" (a very rare sample) and it is changed to the upper or lower limit of the interval. Then a new interval is calculated with the last five estimations (the window of five estimations is shifted by one position every time). The smoothed estimation is therefore the result of averaging the last five measurements after adjusting those out of the limits. It is worth noting that this technique is general and could be incorporated into any other available bandwidth estimation tool. The algorithm is shown in Fig. 10.
This optional moving average algorithm available in Traceband was evaluated using Poisson and self-similar cross-traffic. In the case of Poisson cross-traffic, we used the same results shown in Fig. 6 and, for comparison purposes, the technique was also applied to the estimation results of Pathload and Spruce. From Fig. 12, it can be observed that, since Traceband's estimations are more symmetric around the mean value than Pathload's and Spruce's, the tool shows the best accuracy and the lowest variability after applying the moving average technique. It is worth noting that, given the low estimation rate of Pathload, with this filtering algorithm the tool is not able to produce its first estimate before 150 s. As before, a 95% confidence interval was calculated for the set of five experiments. The overhead and estimation rate are the same as in Table 2, and the estimation error results are shown in Table 5.
In the case of self-similar cross-traffic, in which Traceband's performance worsens with the Hurst parameter, the moving average algorithm brings important improvements in both the estimation error and its variability. Fig. 13 shows the results of applying the moving average algorithm to the same data used to plot Fig. 8, with a Hurst parameter of H = 0.8. As before, for comparison purposes, the technique was also applied to Pathload and Spruce. From the figure, it can be observed that Traceband's estimation error is reduced considerably using this methodology, as is its variability. Further, the algorithm improves Pathload's and Spruce's performance as well. The 95% confidence intervals are shown in Table 6, which shows that Traceband is now the tool with the lowest variability. If we apply the moving average algorithm to the results shown in Fig. 9, where the Hurst parameter was varied from 0.5 to 0.8, Fig. 14 shows that all the tools present fairly similar results, which are better than those obtained without the moving average algorithm.
Fig. 14. Effect of the Hurst parameter in the estimation error with the moving average algorithm. Axes: Hurst parameter (0.5 to 0.8) versus estimation error (%). Series: Traceband, Pathload, Spruce.
7. Conclusions
This paper introduces a hidden Markov model approach to end-to-end available bandwidth estimation and monitoring. The new approach is implemented in a tool called Traceband that, compared with Pathload and Spruce, not only provides better performance results overall, but is also able to react and accurately estimate the available bandwidth under abrupt changes in cross-traffic. Experimental results using synthetic and real cross-traffic show that Traceband provides more estimations per unit time than Pathload and Spruce, with comparable accuracy and less probing traffic. Traceband also includes an optional moving average technique that smooths out the estimations and improves its accuracy even further. With these results, it is expected that available bandwidth estimation tools will be used in a larger number of applications and networking scenarios.
References
[1] C.D. Guerrero, M.A. Labrador, A hidden Markov model approach to available bandwidth estimation and monitoring, in: Proceedings of the Internet Network Management Workshop, Orlando, FL, 2008.
[2] J. Strauss, D. Katabi, F. Kaashoek, A measurement study of available bandwidth estimation tools, in: Proceedings of the 3rd ACM SIGCOMM conference on Internet Measurement, 2003, pp. 39–44.
[3] M. Jain, C. Dovrolis, Pathload: A measurement tool for end-to-end available bandwidth, in: Proceedings of the 3rd Passive and Active Measurements Workshop, vol. 11, 2002, pp. 14–25.
[4] C.D. Guerrero, M.A. Labrador, Experimental and analytical evaluation of available bandwidth estimation tools, in: Proceedings of the IEEE Local Computer Networks, 2006, pp. 710–717.
[5] C.D. Guerrero, M.A. Labrador, On the applicability of available bandwidth estimation techniques and tools, Computer Communications, 2009. doi:10.1016/j.comcom.2009.08.010.
[6] R. Prasad, C. Dovrolis, M. Murray, K. Claffy, Bandwidth estimation: metrics, measurement techniques, and tools, IEEE Network 17 (6) (2003) 27–35.
[7] V. Ribeiro, M. Coates, R. Riedi, S. Sarvotham, B. Hendricks, R. Baraniuk, Multifractal cross-traffic estimation, in: Proceedings of the ITC Conference on IP Traffic, Modeling and Management, 2002.
[8] J. Navratil, R.L. Cottrell, Abwe: A practical approach to available bandwidth estimation, in: Proceedings of the 4th Passive and Active Measurement Workshop, 2003.
[9] N. Hu, P. Steenkiste, Evaluation and characterization of available bandwidth probing techniques, IEEE Journal on Selected Areas in Communications 21 (6) (2003) 879–894.
[10] B. Melander, M. Bjorkman, P. Gunningberg, A new end-to-end probing and analysis method for estimating bandwidth bottlenecks, in: Proceedings of the IEEE Global Telecommunications Conference, vol. 1, San Francisco, CA, USA, 2000, pp. 415–420.
[11] V.J. Ribeiro, R.H. Riedi, R.G. Baraniuk, J. Navratil, L. Cottrell, pathchirp: Efficient available bandwidth estimation for network paths, in: Proceedings of the 4th Passive and Active Measurements Workshop, vol. 2, 2003.
[12] A. Shriram, M. Murray, Y. Hyun, N. Brownlee, A. Broido, M. Fomenkov, K. Claffy, Comparison of public end to end bandwidth estimation tools on high speed links, in: Proceedings of the 6th Passive and Active Measurements Workshop, 2005, pp. 306–320.
[13] M. Jain, C. Dovrolis, End-to-end available bandwidth: measurement methodology, dynamics, and relation with TCP throughput, IEEE/ACM Transactions on Networking 11 (4) (2003) 537–549.
[14] A. Shipalov, C.D. Guerrero, M.A. Labrador, M. Alzate, On the implementation of a capacity estimator for wireless Ad Hoc Networks, in: Proceedings of the IEEE SoutheastCon, Atlanta, 2009.
[15] H. Zhou, Y. Wang, X. Wang, X. Huai, Difficulties in Estimating Available Bandwidth, in: IEEE International Conference on Communications, vol. 2, 2006, pp. 704–709.
[16] V. Paxson, End-to-end Internet packet dynamics, IEEE/ACM Transactions on Networking 7 (3) (1999) 277–292.
[17] F. Michaut, F. Lepage, Application-oriented Network metrology: metrics and active measurement tools, IEEE Communications Surveys & Tutorials 7 (2) (2005) 2–24.
[18] C. Dovrolis, P. Ramanathan, D. Moore, Packet-dispersion techniques and a capacity-estimation methodology, IEEE/ACM Transactions on Networking 12 (6) (2004) 963–977.
[19] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (2) (1989) 257–286.
[20] L.E. Baum, T. Petrie, G. Soules, N. Weiss, A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, The Annals of Mathematical Statistics 41 (1) (1970) 164–171.
[21] T. Kanungo, HMM toolkit, 1999.
[22] G.D. Forney Jr., The Viterbi algorithm, Proceedings of the IEEE 61 (3) (1973) 268–278.
[23] B. Adamson, S. Gallavan, MGEN, 1997.
[24] K. Christensen, Traffic generator, 2005.
[25] P. Wightman, D. Jabba, M.A. Labrador, An RSSI-based filter for mobility control of mobile wireless ad hoc-based unmanned ground vehicles, in: Proceedings of SPIE, 2008.
Cesar D. Guerrero is a Ph.D. candidate in the Department of Computer Science and Engineering at the University of South Florida. He received his M.S. degree in Computer Science from the Instituto Tecnologico de Estudios Superiores de Monterrey (Mexico) in 2002 and his M.S. in Computer Engineering from the University of South Florida (USA) in 2007. He is a Fulbright scholar who works with Universidad Autonoma de Bucaramanga (Colombia). His research interests include Bandwidth Estimation and Network Measurement.
Miguel A. Labrador received the M.S. in Telecommunications and the Ph.D. degree in Information Science with concentration in Telecommunications from the University of Pittsburgh, in 1994 and 2000, respectively. Since 2001, he has been with the University of South Florida, Tampa, where he is currently an Associate Professor in the Department of Computer Science and Engineering, and the Director of the Research Experiences for Undergraduates Program. Before joining USF, he worked at Telcordia Technologies, Inc., NJ, in the Broadband Networking Group of the Professional Services Business Unit. His research interests are in design and performance evaluation of computer networks and communication protocols for wired, wireless, and optical networks, energy-efficient mechanisms for wireless sensor networks, bandwidth estimation techniques, and location-based services. Dr. Labrador has served as Technical Program Committee member of many IEEE conferences and is currently a member of the Editorial Board of Computer Communications (Elsevier Science). He served as Secretary of the IEEE Technical Committee on Computer Communications (TCCC), Chair of the IEEE VTC 2003 Transport Layer Protocols over Wireless Networks Symposium, and guest editor of the special issue of Computer Communications on "Advanced Location-Based Services."