The Journal of Systems and Software 83 (2010) 2317–2321
A pattern-based prediction: An empirical approach to predict end-to-end network latency
JunSeong Kim ∗, Jongsu Yi
School of Electrical and Electronics Engineering, Chung-Ang University, Seoul, Republic of Korea
Article info
Article history: Received 6 May 2010; received in revised form 5 July 2010; accepted 6 July 2010; available online 15 July 2010.
Keywords: End-to-end network latency; Pattern-based method; Latency prediction; Network stability; Network-based applications
Abstract
Understanding latency in network-based applications has received considerable attention as a means of providing consistent and acceptable levels of service. This paper presents an empirical approach, a pattern-based prediction method, to predict end-to-end network latency. The key idea of the approach is to use the past history of latency and its variation patterns in latency predictions. After a preliminary study of simple numerical prediction models, we examine the effectiveness of the proposed method with real latency data and various definitions of network stability. Our results show that the pattern-based method outperforms every single numerical model, reaching an overall prediction accuracy of 86.2%.
Crown Copyright © 2010 Published by Elsevier Inc. All rights reserved.
1. Introduction
The importance of understanding communication overhead in network-based applications cannot be overemphasized, since network delays often affect application performance and thus degrade the end-user experience. For example, in networked control systems (NCS) such as telerobotics, which enable tasks to be executed over long distances through networks, variations in network delay result in delayed command order or decreased control latency (Mirfakhrai and Payandeh, 2002). This can lead to instability as well as performance degradation. In network-based computing such as cluster or grid computing, which performs a single task with a group of linked end-systems, different runs of the same application might have different execution times due to changes in network status (Yang and Guo, 2006). The results can become meaningless when the application is time critical. Network latency is therefore of great interest in network-based applications, and understanding the dynamic variations of latency is critical to providing consistent and acceptable levels of service.

There have been many research efforts on network latency modeling and prediction. Queueing theory is used for small-scale networks for which information on the individual links is available (Hasib et al., 2007; Yang et al., 2004). However, the computation cost grows dramatically as the size of the network increases. System identification and time series approaches such as the prediction error identification method (PEM), autoregressive (AR), and moving average (MA) models are used for modeling network dynamics (Mirfakhrai and Payandeh, 2002; Yang et al., 2004; Ohsaki et al., 2001). However, the time-varying and nonlinear nature of network delay limits their applicability. Artificial neural networks approximate the dynamics of network delay through their strong adaptive learning ability (Belhaj and Tagina, 2009; Parlos, 2002). However, they have high computational complexity and require extra training processes. Network latencies are random processes that are too complex to be modeled mathematically and are therefore hard to forecast accurately.

This paper presents an empirical approach to predicting end-to-end network latency: a pattern-based prediction method. The basic idea of the approach is that network traffic repeats cycles of congested states and that a variation of network latency is strongly correlated with the past history of the latency. In order to examine the effectiveness of the approach, we define various levels of network stability based on variations of network latency. For a stable network, one of several simple numerical models is dynamically selected for predictions. For an unstable network, the method also takes patterns of latency variation into account. Experiments with real latency data show that the proposed approach improves performance without the added burden of complex computations. It is a simple and extensible method that can be used for scheduling network resources in network-based applications. Furthermore, it can help with network monitoring and planning, routing protocol design, and flow control algorithms.

In the remainder of the paper, Section 2 briefly surveys network latency by identifying the sources of delay. Section 3 examines several simple numerical prediction models used in network-based applications. Section 4 describes the pattern-based prediction method in detail. Section 5 then evaluates the proposed method on a set of real network latency data. Finally, Section 6 summarizes our results and conclusions.

∗ Corresponding author.
0164-1212/$ – see front matter. Crown Copyright © 2010 Published by Elsevier Inc. All rights reserved. doi:10.1016/j.jss.2010.07.014

Fig. 1. The round trip time to send a 64 byte probing packet, measured every 10 s over a 24-h period.
2. End-to-end network latency
End-to-end network latency is the time delay for a packet to be transmitted from source to destination (Belhaj and Tagina, 2009; Lumezanu et al., 2009). Since there are typically several intermediate nodes in between, it is the sum of the delays at each hop along the transmission path. Each end-to-end latency can be expressed by:

\[
\mathrm{latency} = t_D + t_S + t_Q + t_F + t_P
\]

where
• $t_D$ is distance (or propagation) delay. It is determined by the physical characteristics and geographic distance of the transmission path. For a given network path it is a constant.
• $t_S$ is serialization delay. It is the amount of time for a network interface to perform bitwise transmission of a packet to the physical media. For a given packet size it is a constant.
• $t_Q$ is queuing delay. It is the amount of time a packet spends in a buffer waiting for its turn to be transmitted. On a network interface it depends on the incoming packet arrival rate and the outgoing packet transmission rate.
• $t_F$ is forwarding delay. It is the amount of time for a network device to look up the destination address of a packet and to forward the packet to the outbound interface. It is typically a constant.
• $t_P$ is protocol delay. It is the amount of time to process transmission algorithms at the end points of a network path. For a given source and destination pair it is a constant.

$t_S$, $t_Q$, and $t_F$ are per-hop delays, since each network switch or router receives a packet, performs a checksum operation, and then forwards it. Variations in network latency result mainly from $t_Q$, due to packet spacing from multiple sources at a common outbound interface; partly from $t_S$, due to the variance in packet sizes; and less frequently from $t_D$, due to different routes taken from source to destination. In theory, we can characterize the network latency mathematically. However, networks consist of multiple administrative domains, and the complexity and heterogeneity of numerous factors, including congestion, QoS, and routing protocols, make it very difficult to derive an accurate model of the latency.

3. A preliminary study
We collect network latency using the ping tool. Although the forward and backward network delays may be asymmetric, we use round trip time (RTT) in this experiment, as in many studies on latency dynamics (Yang et al., 2004; Belhaj and Tagina, 2009). We measured network latency between nodes within the Chung-Ang University campus over a 24-h period. Fig. 1 shows the 24-h sample, with the X-axis representing the elapsed time from 8:00 a.m. on Thursday, September 24, 2009 to 8:00 a.m. the following day in units of 10 s (giving 8640 points during a day). The Y-axis represents the round trip time of a 64 byte probing packet sent between two fixed nodes.

3.1. Simple numerical prediction models
We use several numerical models, as in the network weather service (NWS) and the network status predictor (NSP), to predict network latency based on measurements of latency history (Wolski, 1997; Kim and Lilja, 1999). We use a sliding window mechanism to record the latency history. We denote a window record that consists of the most recent K latency values by $\mathrm{Swin}_K$. Let $\mathrm{Swin}_K(i, t)$ be the (i + 1)th element in the window $\mathrm{Swin}_K$ at time t. Then we can summarize the latency characteristics of the window $\mathrm{Swin}_K$ by

\[
\begin{aligned}
\mathrm{avg}_K(t) &= \frac{1}{K} \sum_{i=0}^{K-1} \mathrm{Swin}_K(i, t) \\
\mathrm{med}_K(t) &= \mathrm{SortWin}_K\big((K-1)/2,\, t\big) \\
\mathrm{min}_K(t) &= \mathrm{SortWin}_K(0, t) \\
\mathrm{max}_K(t) &= \mathrm{SortWin}_K(K-1, t) \\
\mathrm{half}_K(t) &= \frac{\mathrm{min}_K(t) + \mathrm{max}_K(t)}{2}
\end{aligned}
\]

where $\mathrm{SortWin}_K(i, t)$ is the (i + 1)th element of the sequence obtained by sorting $\mathrm{Swin}_K$ in increasing order at time t. When K is even, $\mathrm{med}_K(t) = [\mathrm{SortWin}_K(K/2 - 1, t) + \mathrm{SortWin}_K(K/2, t)]/2$. While the average value gives equal weight to all K elements within the window $\mathrm{Swin}_K$, the median value may be more representative when the events tend to show great variability with an asymmetric distribution. Since burstiness is a well-known feature of network latency, a mechanism to eliminate any outliers in latency history would be
Fig. 2. A comparison of the prediction accuracy of the simple numerical models over the 24-h samples.
helpful. That is, we want to leave out of consideration the window's largest group, which is typically too large, and its smallest group, which is typically too small. From the trimmed list, considering only the central (K − 2T) elements of $\mathrm{SortWin}_K$, we have

\[
\begin{aligned}
\mathrm{t\_avg}_K(t) &= \frac{1}{K - 2T} \sum_{i=T}^{K-T-1} \mathrm{SortWin}_K(i, t) \\
\mathrm{t\_min}_K(t) &= \mathrm{SortWin}_K(T, t) \\
\mathrm{t\_max}_K(t) &= \mathrm{SortWin}_K(K - T - 1, t) \\
\mathrm{t\_half}_K(t) &= \frac{\mathrm{t\_min}_K(t) + \mathrm{t\_max}_K(t)}{2}
\end{aligned}
\]
where the value T determines how many elements are ignored at each end of the sorted window $\mathrm{Swin}_K$ (2T elements in total).

3.2. Performance comparison
To test the effectiveness of a prediction model we define a performance metric, prediction accuracy. It measures the correctness of predictions and is defined as the ratio of the number of correct predictions to the total number of predictions made. We call a prediction correct when the prediction error is less than a given error bound, expressed as a percentage of the actual latency value. Fig. 2 shows the prediction accuracy of the simple numerical models when applied to the 24-h sample data. We consider window sizes of K = 1, 3, 5, 7, 9 and error bounds of 1%, 5%, and 10%. For the trimmed list we set T = 1, so that only the largest and the smallest elements of $\mathrm{Swin}_K$ are ignored. We can see that the $\mathrm{med}_9$ model (the median-based prediction with K = 9) shows the best performance of 80.8% and that the $\mathrm{t\_min}_5$ model (the trimmed minimum-based prediction with K = 5) shows a comparable result of 80.2% when the error bound is 10%. These results coincide with the fact that median or minimum values are used in many network coordinate systems and internet routing policies (Lumezanu et al., 2009; Dabek et al., 2004). In general, the simple numerical models with trimmed lists give more consistent and higher performance than those using the entire latency history of $\mathrm{Swin}_K$.
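The window statistics of Section 3.1 and the prediction accuracy metric can be illustrated with a minimal Python sketch. The function names `window_stats` and `prediction_accuracy`, the dictionary keys, and the choice to predict the next sample directly from the statistic of the preceding K samples are assumptions made for illustration; the paper does not give an implementation.

```python
from statistics import mean, median


def window_stats(window, trim=1):
    """Candidate predictions for one latency window (Swin_K).

    `trim` plays the role of T: that many values are dropped from each
    end of the sorted window before the trimmed statistics are computed.
    """
    k = len(window)
    s = sorted(window)                      # SortWin_K
    stats = {
        "avg": mean(window),
        "med": median(s),                   # averages the two middle values when K is even
        "min": s[0],
        "max": s[-1],
        "half": (s[0] + s[-1]) / 2,
    }
    if k > 2 * trim:                        # trimmed models need K - 2T >= 1 elements
        core = s[trim:k - trim]
        stats.update({
            "t_avg": mean(core),
            "t_min": core[0],
            "t_max": core[-1],
            "t_half": (core[0] + core[-1]) / 2,
        })
    return stats


def prediction_accuracy(series, k, model, bound=0.10, trim=1):
    """Ratio of correct predictions to all predictions made.

    A prediction is counted as correct when its absolute error is less
    than `bound` times the actual latency value.
    """
    correct = total = 0
    for t in range(k, len(series)):
        predicted = window_stats(series[t - k:t], trim)[model]
        actual = series[t]
        total += 1
        if abs(predicted - actual) < bound * actual:
            correct += 1
    return correct / total if total else 0.0
```

For instance, `window_stats([300, 310, 305, 900, 302])` gives a median of 305 and a trimmed minimum of 302, while the plain average is pulled up by the single burst value of 900. This is the kind of outlier sensitivity that the trimmed models are meant to reduce.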
4. A pattern-based prediction method
It is known that networks repeat the cycle of being congested-empty-congested indefinitely (Hasib et al., 2007; Zhang and Clark, 1990). In fact, when we look carefully into the results of the preliminary study, we find that there are certain patterns in the variations of network latency. For example, when latency remains stable it is much more predictable, and any single abrupt change in latency history can be ignored in predictions. We expect that network latency is strongly correlated with the past behavior of the latency variations. In addition, we do not expect any single model to work well throughout different traffic situations. Instead, by combining multiple models we can obtain better predictions than by using a single model for all traffic situations.

To exploit the behavior of network latency in a wide range of traffic situations, we define network status based on stability. First, we classify each RTT according to its value and distribution. For a given network path and probing packet, the queuing delay $t_Q$ mainly determines the RTT, reflecting the instantaneous network load. Next, for a given window $\mathrm{Swin}_K$ we define the network status of stability such that $\mathrm{stablew}_K(t) = 1$ if all K elements belong to the same class of RTT and $\mathrm{t\_stablew}_K(t) = 1$ if the K − 2T elements of the trimmed list belong to the same class of RTT. That is, we say a network is stable by the criterion of $\mathrm{stablew}_K$ when there are no abrupt changes in the last K latency values. Similarly, a network is stable by the criterion of $\mathrm{t\_stablew}_K$ when there is no more than a single high peak and a single low peak in the last K latency values. By using different window sizes K we can define various levels of network stability.

Fig. 3. The structure of the pattern-based latency prediction method.

Fig. 3 shows the structure of our pattern-based latency prediction method. It consists of a network status of stability detector (NSSD) and a latency pattern history table (LPHT). The NSSD captures the level of network stability by the criteria of $\mathrm{stablew}_K$ and $\mathrm{t\_stablew}_K$ from the record of $\mathrm{Swin}_K$ and dynamically selects one among multiple models for a latency prediction. The best matches between prediction models and levels of network stability can be set in advance by examining the prediction accuracy as in Section
3.2. The LPHT records the variation patterns of network latency in terms of RTT classes. The operations of the LPHT are quite similar to those of branch predictors in computer architecture (Shen and Lipasti, 2005). Each entry of the LPHT contains four fields: latency pattern, target class, count, and dominant. The latency pattern field consists of a series of RTT classes for N consecutive RTT values. For a window $\mathrm{Swin}_K$ at time t, the last N of the K (N ≤ K) RTT values are used for indexing the LPHT. The target class field is used for the latency prediction. It is determined by the actual RTT values observed for a specific latency pattern. When the same latency pattern results in different outputs, a new entry is allocated in the LPHT instead of replacing an existing one. This allows multiple matches for an LPHT index, and the count and dominant fields are used for resolving the conflicts. When conflicting matches occur, the one whose dominant field is set is chosen for prediction. If there is no dominant entry in the LPHT for a latency pattern, the pattern is treated as if it had no match. The count field is increased by one whenever the combination of a latency pattern and the corresponding target class is found. The dominant field is set by comparing the values in the count fields of all conflicting matches. The degree of dominance can be set in advance.

The performance of the pattern-based prediction method depends on how the LPHT is managed. In this experiment, we assume an LPHT of infinite size and limit the use of the LPHT to networks in low levels of stability: an entry in the LPHT is allocated when a prediction turns out to be wrong in low levels of network stability. Periods of high network stability are desirable for maintaining a stable end-to-end network latency and are quite predictable.

5. Experiments and results
Fig. 4 shows the RTT distribution for the 24-h sample latency data. From the graph we have five RTT classes such that each class is centered around the four different peaks.
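The stability test and the LPHT bookkeeping described in Section 4 can be sketched in Python on top of such RTT classes. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `classes_of`, `is_stable`, and `LPHT` are hypothetical, class boundaries are passed in as (low, high] intervals that may overlap, the table is an unbounded dictionary, and the paper's allocation policy (allocating entries only after a wrong prediction in low levels of stability) is omitted. The default dominance threshold of 0.73 follows the value reported in the experiments.

```python
from collections import defaultdict


def classes_of(rtt, bounds):
    """Ids (1-based) of all classes (low, high] containing rtt; classes may overlap."""
    return {i for i, (lo, hi) in enumerate(bounds, start=1) if lo < rtt <= hi}


def is_stable(window, bounds, trim=0):
    """stablew_K for trim=0, t_stablew_K for trim=T > 0.

    The window is stable when a single RTT class contains every value
    that remains after dropping `trim` values from each end of the
    sorted window.
    """
    s = sorted(window)
    core = s[trim:len(s) - trim]
    if not core:
        return False
    common = set.intersection(*(classes_of(r, bounds) for r in core))
    return bool(common)


class LPHT:
    """Latency pattern history table: pattern -> {target class: count}."""

    def __init__(self, n, dominance=0.73):
        self.n = n                      # pattern length N
        self.dominance = dominance      # degree of dominance, set in advance
        self.table = defaultdict(lambda: defaultdict(int))

    def update(self, history, target):
        """Count one occurrence of (last N classes in history) -> target class."""
        self.table[tuple(history[-self.n:])][target] += 1

    def predict(self, history):
        """Return the dominant target class for the pattern, or None."""
        counts = self.table.get(tuple(history[-self.n:]))
        if not counts:
            return None                 # no match for this pattern
        target, count = max(counts.items(), key=lambda kv: kv[1])
        if count > self.dominance * sum(counts.values()):
            return target               # this entry is dominant
        return None                     # conflicting matches, none dominant
```

In use, a predictor would first call `is_stable` with decreasing strictness to pick a numerical model, and fall back to `LPHT.predict` on the recent class history when the network is in a low level of stability. The boundaries in any call, such as `[(float("-inf"), 320), (270, 650)]`, are placeholders to be replaced by the classes derived from measured data.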
Since the interval of each RTT class is too large, we further divide each RTT class into three to five subclasses such that a median value can represent the corresponding subclass. That is, the boundaries of a subclass lie within the error bound of 10% from the median value. Table 1 shows the classification of RTT for the 24-h samples into 5 classes and 17 subclasses. By allowing overlaps between RTT classes we provide flexibility in determining the network status of stability. In this experiment, we consider five different levels of network stability: $\mathrm{stablew}_9$, $\mathrm{stablew}_7$, $\mathrm{t\_stablew}_9$, $\mathrm{t\_stablew}_8$, and unstable, in order from high to low levels of stability. In general, the higher levels of stability are subsets of the lower levels of stability. At time t, with a record of $\mathrm{Swin}_9$, we first examine the network stability by the criterion of $\mathrm{stablew}_9$. If all of the last K = 9 RTT values belong to the same RTT class then we say that the network is in the highest status
Table 1. The classification of RTT for the 24-h samples.

Class     Class range                  Subclass      Subclass range
Class 1   rtt ≤ 320 µs                 Subclass 1    rtt ≤ 230 µs
                                       Subclass 2    230 µs < rtt ≤ 270 µs
                                       Subclass 3    270 µs < rtt ≤ 320 µs
Class 2   270 µs < rtt ≤ 650 µs        Subclass 4    320 µs < rtt ≤ 380 µs
                                       Subclass 5    380 µs < rtt ≤ 450 µs
                                       Subclass 6    450 µs < rtt ≤ 550 µs
                                       Subclass 7    550 µs < rtt ≤ 650 µs
Class 3   550 µs < rtt ≤ 1300 µs       Subclass 8    650 µs < rtt ≤ 750 µs
                                       Subclass 9    750 µs < rtt ≤ 900 µs
                                       Subclass 10   900 µs < rtt ≤ 1100 µs
                                       Subclass 11   1100 µs < rtt ≤ 1300 µs
Class 4   1100 µs < rtt ≤ 2500 µs      Subclass 12   1300 µs < rtt ≤ 1500 µs
                                       Subclass 13   1500 µs < rtt ≤ 1800 µs
                                       Subclass 14   1800 µs < rtt ≤ 2100 µs
                                       Subclass 15   2100 µs < rtt ≤ 2500 µs
Class 5   2100 µs < rtt                Subclass 16   2500 µs < rtt ≤ 3000 µs
                                       Subclass 17   3000 µs < rtt
of stability. Otherwise, we examine the stability by the criteria of $\mathrm{stablew}_7$, $\mathrm{t\_stablew}_9$, and $\mathrm{t\_stablew}_8$, in that order. Fig. 5 shows the prediction accuracy of the simple numerical models, presenting the best five for each level of network stability. We can see that different models perform best at different levels of stability. In high levels of stability, which show a very stable pattern in latency variations, the $\mathrm{half}_K$ and $\mathrm{avg}_K$ based models achieve higher prediction accuracy, around 91%, than the other models. In low levels of stability, which show a very bursty pattern, the $\mathrm{t\_min}_K$ based models achieve higher prediction accuracy, around 73%, than the others. In between, where there are only occasional bursts of traffic, the $\mathrm{med}_K$ based models achieve higher prediction accuracy, around 83%, than the others. These observations are used in the design of the NSSD, for dynamically combining the multiple models into a single predictor for better predictions, and in the design of the LPHT, for determining the degree of dominance of a latency pattern. When there are multiple matches for an LPHT index, the entry whose count value is higher than 73% of the sum of all the matched entries' count values has its dominant field set.

Table 2 shows the prediction accuracy of the pattern-based prediction method. Also shown for comparison are the prediction accuracy of the $\mathrm{med}_9$ model, which is the single best model among those used, and that of the combined method of multiple models, which corresponds to the pattern-based prediction without the LPHT. We can see that simply combining multiple models improves prediction accuracy by 2.3 percentage points and that the pattern-based method further improves it by 3.1 percentage points when we consider the error bound of 10%. From the result we can conclude that network latency is strongly correlated with the latency variation history and that utilizing the variation patterns results in accurate latency predictions.

Fig. 4. The distribution of RTT values for the 24-h samples.

Fig. 5. The prediction accuracy of the simple numerical models for the different levels of network stability.

Table 2. The comparison of prediction accuracy (%) of the single best $\mathrm{med}_9$ model, the combined method of multiple models, and the pattern-based prediction method.

Error bound   med_9 model prediction   Multiple model prediction   Pattern-based prediction
1%            14.1                     14.4                        14.4
5%            58.2                     59.5                        62.1
10%           80.8                     83.1                        86.2

6. Conclusions
Understanding the dynamics of network latency is of great interest in network-based applications where the quality of information available in real time is a major concern. In this paper, we presented an empirical approach, a pattern-based prediction method, to predict network latency. It is a simple and extensible method that relies on the past history of network latency and its variation patterns. Since no single model works well for all traffic situations, we combine multiple models into a single one based on network stability. In addition, dominant patterns of latency variations are recorded and used to complement the multiple model predictions. The effectiveness of the proposed method has been examined experimentally with a set of real latency data. The pattern-based method improves prediction accuracy by 5.4 percentage points over the best single prediction model, $\mathrm{med}_9$. The pattern-based prediction method would be helpful in effectively scheduling resources in network-based applications. As future work, it would be important to investigate the effectiveness of the pattern-based method using different probing packets and probing rates.

Acknowledgement
This research was supported by the Chung-Ang University Research Grants in 2010.
References
Belhaj, S., Tagina, M., 2009. Modeling and prediction of the internet end-to-end delay using recurrent neural networks. Journal of Networks 4 (6), 528–535.
Dabek, F., Cox, R., Kaashoek, F., Morris, R., 2004. Vivaldi: a decentralized network coordinate system. In: Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 15–26.
Hasib, M., Schormans, J., Timotijevic, T., 2007. Accuracy of packet loss monitoring over networked CPE. IET Communications 1 (3), 507–513.
Kim, J., Lilja, D.J., 1999. A network status predictor to support dynamic scheduling in network-based computing systems. In: International Parallel Processing Symposium, pp. 372–378.
Lumezanu, C., Baden, R., Spring, N., Bhattacharjee, B., 2009. Triangle inequality variations in the internet. In: ACM SIGCOMM Conference on Internet Measurement, pp. 177–183.
Mirfakhrai, T., Payandeh, S., 2002. A delay prediction approach for teleoperation over the internet. In: IEEE International Conference on Robotics and Automation, pp. 2178–2183.
Ohsaki, H., Murata, M., Miyahara, H., 2001. Modeling end-to-end packet delay dynamics of the internet using system identification. In: International Teletraffic Congress, pp. 1027–1038.
Parlos, A.G., 2002. Identification of the internet end-to-end delay dynamics using multi-step neuro-predictors. In: International Conference on Neural Networks, pp. 2460–2465.
Shen, J.P., Lipasti, M.H., 2005. Modern Processor Design: Fundamentals of Superscalar Processors. McGraw-Hill Companies, Inc.
Wolski, R., 1997. Dynamically forecasting network performance to support dynamic scheduling using the network weather service. In: International Symposium on High Performance Distributed Computing, pp. 316–325.
Yang, L.T., Guo, M., 2006. High-Performance Computing: Paradigm and Infrastructure. John Wiley and Sons, Inc.
Yang, M., Li, X.R., Chen, H., 2004. Predicting internet end-to-end delay: an overview. In: IEEE Southeastern Symposium on Systems Theory, pp. 210–214.
Zhang, L., Clark, D.D., 1990. Oscillating behavior of network traffic: a case study simulation. Internetworking: Research and Experience, pp. 101–112.

JunSeong Kim received a PhD in Electrical Engineering from the University of Minnesota at Minneapolis, and an M.S. and a B.S., both in Electronics Engineering, from Chung-Ang University in Seoul, Korea. He is currently an associate professor in the School of Electrical and Electronics Engineering at Chung-Ang University in Seoul.

Jongsu Yi received an M.S. and a B.S., both from the School of Electrical and Electronics Engineering, Chung-Ang University, Seoul, Korea. He is currently a PhD student in the School of Electrical and Electronics Engineering, Chung-Ang University.