Future Generation Computer Systems 20 (2004) 257–263
An efficient event publish technique for real-time monitor on real-time grid computing
Eui-Nam Huh a,∗, Y. Mun b, Hyoung-Woo Park c
a Seoul Women’s University, Seoul, South Korea
b Soongsil University, Seoul, South Korea
c KISTI, Daejun, South Korea
Abstract
Schedulability analysis (SA) approaches for real-time constraints in large-scale distributed systems are conventionally based on a priori information. Such approaches assume fixed execution times with constant workload, work well in many application domains, and allow pre-deployment guarantees of real-time performance, as in rate monotonic analysis (RMA) [J. ACM 20 (1973) 46]. However, certain grid applications must operate in highly dynamic environments, which precludes accurate characterization of the applications’ workloads by static models. This paper considers the issues that a new SA trigger must address to guarantee real-time performance at run-time in a dynamic environment in which applications experience large variations in workload. In particular, for dynamic real-time grid computing environments, it describes an efficient publishing (or policing) technique for events that are monitored in real time. The technique triggers SA, or the reporting of monitored data, only when appropriate, and uses a dynamic threshold that becomes more sensitive as the quality of service (QoS) of the dynamic real-time application approaches its deadline or constraint.
© 2003 Elsevier B.V. All rights reserved.
Keywords: Schedulability analysis; Real-time; Computing
∗ Corresponding author. E-mail addresses: [email protected] (E.-N. Huh), [email protected] (Y. Mun), [email protected] (H.-W. Park).
0167-739X/$ – see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/S0167-739X(03)00140-7

1. Introduction
Real-time grid computing is an emerging technology that uses remote resources and should meet the quality of service (QoS) requirements of an application. Schedulability analysis (SA) approaches that allocate a program to remote resources on the basis of a priori worst-case execution time (WCET) work well in many real-time application domains [1] and allow pre-deployment guarantees of performance. However, certain real-time grid applications that use geographically distributed resources must operate in highly dynamic environments, which precludes accurate characterization of the applications’ workloads by static models. The execution times of the applications considered in this paper therefore vary widely during run-time. Thus, real-time grid applications in a dynamic environment should monitor system resources and workload changes in real time, publish or report events to a collector (or storage), trigger reanalysis of schedulability when a violation occurs, and have the resource manager reallocate resources. If events are published frequently, the SA policing (SAP) introduced in [2] becomes too sensitive and triggers reanalysis too often, which eventually incurs system overhead. If events are published too rarely, the real-time performance of the application cannot be certified. Therefore, a novel, efficient SAP, that is, an intelligent decision procedure for SA, is a strongly required system component: it supports QoS requirements that change dynamically and enables the system to be certified as a real-time system. A monitored event may also be discarded rather than stored if the SAP does not take the change in events into account.

The primary QoS changes of applications in dynamic grid computing environments, such as a weather forecasting system, are caused by dynamic workload changes. Furthermore, the resource usage of each workload depends on the properties of the application. Thus, it is hard to identify the causes of a change in the QoS of an application by monitoring only workload changes. In other words, it is hard to say how much workload will affect the QoS critically.

An important design decision in SA techniques concerns the tradeoff between the accuracy of SA and the overhead involved in achieving it. The triggering rate (the rate at which the SAP triggers reanalysis of schedulability depending on monitored events) affects both of these issues. If SA is never triggered, the approach is equivalent to other offline approaches; at the other extreme, if SA is triggered too frequently, the overhead becomes prohibitive. Thus, we have experimented with several triggering rates with both the Adaptive RMA (ARMA) and the Dynamic QoS Prediction (DQP) techniques presented in [10], functioning in the Dynamic SA (DSA) architecture depicted in Fig. 1. The ARMA SA approach uses RMA with
a posteriori execution time of an application rather than a priori WCET. The DQP-based SA, in turn, tests the performance of an application against its QoS requirement (or deadline). The results of the DSA techniques are used to allocate or control computing resources dynamically by assessing QoS metrics and resource utilization metrics that are determined a posteriori.

In our model, SA is invoked dynamically by the SAP. Conditions in the SAP are determined by the host and application status, which can be monitored periodically by external monitors: the host monitor, the workload monitor and the performance monitor. That is, the SAP should decide efficiently when to trigger an SA request in order to verify the schedulability of the real-time application. The system event changes in our model include: (1) the workload (denoted tl) of a real-time application ‘A’, (2) the CPU utilization percentage (CUP) of a host Hk and (3) the desired control level (DCL) of ‘A’ at the current cycle c. We assume that a periodic real-time application ‘A’ is running on a host ‘Hk’ at cycle c with an amount of workload ‘tl’, execution time C, and performance λ. Thus, the SA tests the performance of an application such that λ(c)/λreq(A) ≤ 1, where λreq(A) is the QoS requirement (performance) of the application ‘A’.

The remainder of this paper presents experiments for SA in Section 2; SAP with a proportional, integral, derivative (PID) controller, together with experimental evaluations of the effectiveness of these techniques, in Section 3; and conclusions for real-time grid computing operating in dynamic environments in Section 4.
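The difference between static RMA and ARMA, and the QoS test λ(c)/λreq(A) ≤ 1, can be illustrated with a minimal sketch. It assumes the classical utilization bound n(2^(1/n) − 1) of rate monotonic analysis as the schedulability test, assumes that the a posteriori execution times are simply the most recent measurements, and treats λ as a latency-like measure for which smaller values are better. The task set and all function names are hypothetical and are not taken from the experiments in this paper.

```python
def rma_utilization_bound(n: int) -> float:
    """Classical rate monotonic bound n * (2**(1/n) - 1) for n periodic tasks."""
    if n <= 0:
        raise ValueError("n must be positive")
    return n * (2 ** (1 / n) - 1)


def utilization(exec_times, periods) -> float:
    """Total CPU utilization: sum of C_i / T_i over all tasks."""
    return sum(c / t for c, t in zip(exec_times, periods))


def rma_schedulable(exec_times, periods) -> bool:
    """Sufficient (not necessary) test: utilization within the RMA bound."""
    return utilization(exec_times, periods) <= rma_utilization_bound(len(periods))


def qos_satisfied(observed: float, required: float) -> bool:
    """SA test from the text: lambda(c) / lambda_req(A) <= 1."""
    return observed / required <= 1.0


# Hypothetical task set (seconds); values are illustrative only.
periods = [0.1, 0.2, 0.5]
wcet = [0.04, 0.08, 0.15]            # a priori worst-case execution times
observed_exec = [0.02, 0.05, 0.10]   # assumed a posteriori measurements

static_rma_ok = rma_schedulable(wcet, periods)        # static RMA with WCET
arma_ok = rma_schedulable(observed_exec, periods)     # ARMA-style check
```

With these illustrative numbers the WCET-based test rejects the task set while the test on measured execution times accepts it, which is the kind of pessimism in static RMA that the ARMA approach is meant to avoid.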
Fig. 1. An architecture for DSA. (Components: PID controller, SAP, SA, Performance Monitor, Workload Monitor, Host Monitor; signals: DCL(c+1), tl(c), λ(c), CUP(Hk), action request.)

2. Experiments for SAP
In this section, experimental results are presented to measure the accuracy of DQP and SAP. Our DQP approach, which uses the probabilistic contention analysis technique introduced in [10], takes the current system status and the profile data of applications and predicts the response time of the application [3–5]. Fig. 2(a) shows the results for DQP with two different SA triggers: one fires when the workload (tl) changes by 1 (one additional task or data item to process) and the other when tl changes by 100. The results in both cases were similar. This indicates that a precise SA is not always necessary: it is almost as good to reevaluate SA when tl changes by 100 as
Fig. 2. DQP and SA triggers. (a) DQP and SA triggers with workload changes. (b) DQP and SA triggers with CPU changes. (Panel (a): "Probabilistic Response Time Prediction", response time (sec.) versus tl (= label * 25 + 1000); series Lobs, DQP(tl1), DQP(tl100). Panel (b): "Response Time Prediction", response time (sec.) versus tl (= label * 25 + 1000); series Lobs, DQP(CUP1%), DQP(CUP10%), DQP(CUP25%).)
Fig. 3. Resource (CPU) requirement analysis by RMA and ARMA. (CUP(H) versus tl (= label * 25 + 1000); series CUPobs, RMA(CUP), ARMA(tl1), ARMA(tl100), ARMA(CUP1%), ARMA(CUP10%), ARMA(CUP25%).)
when it changes by only 1; thus the overhead of DSA can be reduced significantly if triggering is moderate. Fig. 2(b) shows the results when three SA triggers relating to changes in CUP (CPU events) were employed; specifically, we triggered SA whenever CUP changed by 1, 10 and 25%. The 1 and 10% triggers performed reasonably, whereas the 25% trigger performed poorly. (“Lobs” is the observed performance in the figures.) Thus, it is not necessary to perform SA for minute changes in CUP, because little (if any) accuracy is gained for the additional intrusiveness required. To keep the environment identical across experiments, an experimental generator program reads the same experiment file to increase the loads (host load and workload). For the dynamic real-time environment, sensing applications in DynBench [6] are used, and a filter application is used as the target task whose response time is predicted. The results of evaluating (1) RMA and (2) ARMA with the same triggers that were used for DQP are given in Fig. 3. Note that (as expected) the static RMA approach performed very poorly, with an average error of 21%. For ARMA, the 1 and 10% CUP triggers and the 1 and 100 tl triggers give similar results. RMA always considers the worst-case CPU utilization of applications, whatever the SA trigger. Therefore, RMA always computes the utilization bound for SA with the WCET, even when the application uses a small amount of resources.
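The effect of the trigger granularity on the number of SA invocations can be sketched as follows. The sketch assumes that a trigger fires when the monitored value has moved by at least a fixed step since the last time it fired (the text does not state whether the change is measured against the last trigger or the last sample), and it uses the workload sequence tl = label * 25 + 1000 from the figure axes purely as illustrative input. The class and function names are hypothetical.

```python
class ChangeTrigger:
    """Fires when the value has moved by at least `step` since the last firing."""

    def __init__(self, step: float, initial: float):
        self.step = step
        self.reference = initial  # value seen at the last firing

    def observe(self, value: float) -> bool:
        if abs(value - self.reference) >= self.step:
            self.reference = value
            return True
        return False


def count_triggers(samples, step) -> int:
    """Number of SA invocations a change trigger of size `step` would cause."""
    trigger = ChangeTrigger(step, samples[0])
    return sum(trigger.observe(v) for v in samples[1:])


# Workload values along the figure axis: labels 0..43, tl = label * 25 + 1000.
workload = [label * 25 + 1000 for label in range(44)]

fine_triggers = count_triggers(workload, step=1)      # SA on every change of tl
coarse_triggers = count_triggers(workload, step=100)  # SA when tl moved by 100
```

On this input the fine trigger invokes SA at every sample, while the coarse trigger invokes it roughly a quarter as often, which is where the overhead reduction comes from; whether accuracy is preserved has to be checked experimentally, as in Figs. 2 and 3.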
3. DSA with PID controller
A feedback controller generates an output that causes some corrective effort to be applied to a process so as to drive a measurable process variable towards a desired value known as the set point. In the Quasar scheduler [7], a basic feedback controller, specifically a PID controller, is used to reserve CPU proportion in a feedback manner. The proposed approach applies the PID controller to the QoS management of dynamic real-time applications over distributed clusters. In DeSiDeRaTa [8], QoS monitors and host resource monitors are among the middleware components. Hence, with a PID controller, QoS can be managed much like controlling a water level. The PID controller is explained well in [9], as follows:
• The Proportional term causes a larger control action to be taken for a larger error.
• The Integral term is added to the control action if the error has persisted for some time.
• The Derivative term supplements the control action if the error is changing rapidly with time.
The PID controller formula is:

$$u(k) = K_p e(k) + K_i \sum_{j=1}^{k} e(j) + K_d \{e(k) - e(k-1)\}, \qquad (1)$$

where K_p, K_i, and K_d are tunable parameters. The resource control mechanism using the PID controller is shown in Fig. 4. The resource manager, acting as a “Tuner”, changes the real-time priority of the target process to tune the QoS. Because different priority levels lead to different queuing delays, the QoS of the process is adjusted towards the required QoS. The queuing delay, shown as the “error” in Fig. 4, is delivered every cycle. This feedback controller is adaptable to dynamic real-time systems whose resource usage varies during run-time. In Fig. 1, the SA techniques consider the H/W and S/W system status. Deciding the proper timing of SA is hard, as we have seen in Fig. 3. To find a proper decision for SA triggering, the PID controller is employed. The PID controller by itself cannot react quickly to the current error e(k) in dynamic environments, because its integral term holds all the previous errors ($\sum_{j=1}^{k-1} e(j)$).
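Eq. (1) can be written as a small discrete controller. The following is a minimal sketch only: the gains are illustrative, e(0) is assumed to be 0, and the class name is hypothetical rather than part of the Quasar or DeSiDeRaTa software.

```python
from dataclasses import dataclass


@dataclass
class DiscretePID:
    """Eq. (1): u(k) = Kp*e(k) + Ki*sum_{j=1..k} e(j) + Kd*(e(k) - e(k-1))."""

    kp: float
    ki: float
    kd: float
    error_sum: float = 0.0   # running sum e(1) + ... + e(k)
    prev_error: float = 0.0  # e(k-1); e(0) is assumed to be 0

    def update(self, error: float) -> float:
        """Consume the error of the current cycle and return u(k)."""
        self.error_sum += error
        u = (self.kp * error
             + self.ki * self.error_sum
             + self.kd * (error - self.prev_error))
        self.prev_error = error
        return u


# Illustrative gains and a constant error over three cycles.
pid = DiscretePID(kp=0.5, ki=0.1, kd=0.2)
outputs = [pid.update(e) for e in (1.0, 1.0, 1.0)]
```

With a constant error the proportional and derivative contributions settle, but the integral term keeps growing, so the output still increases by Ki * e in every cycle. This is the accumulation of previous errors referred to in the text.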
Fortunately, for the SA, when the QoS does not change, the DCL (u(k)) of an application or a set of applications computed by the PID controller changes little from the previous one (u(k − 1)). Moreover, the response time or QoS does not change significantly even when the host load is changed significantly by lower-priority tasks. In that case, if DSA with the CUP 1% method is applied, the SAP would of course trigger SA. However, the PID controller keeps increasing u(k) (the desired queuing delay) and asking for a priority change, although the QoS change from the previous cycle is very small. Hence, a priority change need not be triggered for every request from the PID controller. Therefore, it is necessary to decide an appropriate threshold for the DCL change.

Fig. 4. PID controller mechanism. (Blocks: PID controller, Resource Manager, application (process); signals: current status (error), desired queuing delay (error), control signal.)

3.1. Dynamic threshold of PID controller
Because of the problem, seen above, of the PID controller triggering resource control such as priority control, a threshold is required to reduce the overhead of the SA module. Furthermore, a fixed threshold cannot adapt to dynamic system changes. For example, when the QoS is close to the requirement (set-point), it is better to have a sensitive SAP that reacts to the environment. Fig. 5 shows how to compute the dynamic threshold, Threshold_PID(k). It is designed to reduce SA triggers and to make the SAP sensitive when the DCL is close to the set-point. From Eq. (1), consider the case in which the error changes are small (steady state), that is, e(k) ≅ e(k − 1) ≅ e(k − 2), where k is the cycle c of the application. If the error changes are small, the SAP may ignore the changes and not trigger SA.

Condition (steady state): $e(k) \cong e(k-1) \cong e(k-2)$

$$|u(k) - u(k-1)| = \Big(K_p e(k) + K_i \sum_{j=1}^{k} e(j) + K_d \{e(k) - e(k-1)\}\Big) - \Big(K_p e(k-1) + K_i \sum_{j=1}^{k-1} e(j) + K_d \{e(k-1) - e(k-2)\}\Big)$$
$$= K_p \{e(k) - e(k-1)\} + K_i \Big(\sum_{j=1}^{k} e(j) - \sum_{j=1}^{k-1} e(j)\Big) + K_d \{e(k) - 2e(k-1) + e(k-2)\}$$
$$\cong K_i e(k) = \mathrm{Threshold\_PID}(k)$$

Fig. 5. The dynamic threshold. (Under the steady-state condition the K_p and K_d terms vanish, leaving only the K_i term.)

To apply the PID controller to dynamic QoS management, DCL changes are carefully considered to trigger SA. The QoS requirement is used as the “set-point”. Here, the DCL is not used as the target of the QoS to be reached. Therefore, from Fig. 6, the condition employed in each cycle in order to trigger SA is as follows: if |u(k) − u(k − 1)| > Threshold_PID(k), then trigger SA. The overall algorithm to trigger SA is given in Fig. 6. The main idea of the algorithm is “consulting with PID”: if the DCL change from the previous DCL is greater than the dynamic threshold, then trigger SA; otherwise, ignore SA.

Procedure main
    for (i = 0; i < number_of_host; i++) {
        Load_Change_of_Host[i] = current_load_host[i] - prev_load_host[i];
        total_load_changes += |Load_Change_of_Host[i]|;
    }
    if total_load_changes > Threshold_Load
        consult_with_PID();

Procedure consult_with_PID
    DCL_Changes = current_DCL - Prev_DCL;
    if |DCL_Changes| > Threshold_PID
        Trigger SA
    else
        Ignore load changes

Fig. 6. SAP algorithm with PID.

Fig. 7 shows that the number of SA triggers with a sensitive SAP, such as 1% CPU change plus the dynamic threshold of the PID controller, is reduced by 33%. (An additional sensing application in DynBench with a fixed workload runs in the background to impact the host load.) The DSA technique labeled “CUP1%withPID” in Fig. 8, which consults the PID controller whenever a 1% CPU change is detected, performs accurately and appropriately.

Fig. 8. Accuracy of DSA with PID controller. ("Resource Requirement Analysis with DSA", CPU requirement versus cycles; series CUPobs, ARMA(tl1), ARMA(CUP10%), ARMA(CUP1%), ARMA(CUP1%withPID).)

The next experiment, shown in Fig. 9, illustrates that the dynamic threshold makes the SAP more sensitive, so that SA is triggered more frequently when the current QoS is close to the QoS requirement, whereas a fixed threshold has no ability to reflect system saturation. The additional portion of “CUP1%withPID” over the shaded bar of Fig. 9 is the number of additional SA triggers with a stricter QoS constraint than
the previous experiment. This experiment shows that the dynamic threshold enables the SAP to trigger an additional 25%. Therefore, the designed threshold performs very well in terms of sensitivity and accuracy: it removes many unnecessary SA triggers even though it uses a very sensitive SAP strategy. Additionally, as shown in Fig. 10, this scheme is applied to network traffic control. The dynamic threshold reduces the event publish rate by 30% relative to any fixed threshold, with network noise of up to 10% of the real data included. Across several experiments, system reliability or safety is also remarkably improved, by a factor of 30. As shown in Fig. 10(a), the network traffic state using the dynamic PID threshold approaches (converges to) the desired network level (5 MB bandwidth) as time increases, whereas in (b), which uses a fixed threshold in the control module, it is unstable at all times.

Fig. 7. Comparison of SA triggers. (Number of triggers for CUP1% and CUP1% with PID.)
Fig. 9. Sensitivity of the dynamic threshold with PID controller. (Number of triggers for CUP1% and CUP1%withPID.)
Fig. 10. Network traffic control with PID dynamic threshold. (a) Dynamic threshold with 10% noise. (b) Fixed threshold with 10% noise. (Both panels: "Response", bandwidth [MB] versus time [sec]; series in (a): x(i), b(i)-b(i-1), m*Ki*e(i); series in (b): x(i), e(i), threshold.)

4. Conclusion
We have developed two DSA approaches for dynamic real-time systems: adaptive RMA (ARMA) and dynamic QoS prediction (DQP). We have identified engineering tradeoffs that must be made in the design of SAP mechanisms, and have experimentally evaluated several SAP implementations to show the importance of these tradeoffs. With the PID controller, the dynamic threshold removed many unnecessary SA triggers even though it uses a very sensitive SAP strategy, such as 1% CPU change. The trigger is also very accurate in terms of QoS. Therefore, the designed threshold performs very well in terms of sensitivity and accuracy, and it can be a very effective method for dynamic load balancing. These results also help to identify significant system changes, and the overhead on the host or network is reduced. Finally, the experiments in this paper show that the dynamic threshold is a very effective and necessary component for dynamic real-time grid computing systems.
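As an illustration of the trigger rule summarized here (Eq. (1), the threshold of Fig. 5 and the procedure of Fig. 6), the following minimal sketch updates the DCL with the PID formula in every cycle and triggers SA only when the total host load change exceeds Threshold_Load and the DCL change exceeds Threshold_PID(k) = Ki * e(k). Several points are assumptions of the sketch rather than statements of the paper: the absolute value taken on the threshold, the initial values of the state, the gains, and the load and error values, which are chosen only to make the arithmetic exact. The class name is hypothetical.

```python
class SapWithPid:
    """SAP that consults a PID controller before triggering SA (cf. Fig. 6)."""

    def __init__(self, kp, ki, kd, load_threshold, initial_loads):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.load_threshold = load_threshold    # Threshold_Load
        self.prev_loads = list(initial_loads)   # previous load of each host
        self.error_sum = 0.0                    # e(1) + ... + e(k)
        self.prev_error = 0.0                   # e(k-1), assumed 0 at start
        self.prev_dcl = 0.0                     # u(k-1), assumed 0 at start

    def step(self, loads, error) -> bool:
        """Process one cycle; return True if SA should be triggered."""
        # PID update of Eq. (1); the DCL is recomputed every cycle.
        self.error_sum += error
        dcl = (self.kp * error + self.ki * self.error_sum
               + self.kd * (error - self.prev_error))
        dcl_change = dcl - self.prev_dcl
        # Dynamic threshold of Fig. 5 (absolute value is an assumption here).
        threshold_pid = abs(self.ki * error)
        # Procedure main: total absolute load change over all hosts.
        total_load_change = sum(
            abs(cur - prev) for cur, prev in zip(loads, self.prev_loads))
        self.prev_loads = list(loads)
        self.prev_error = error
        self.prev_dcl = dcl
        if total_load_change > self.load_threshold:
            # Procedure consult_with_PID.
            return abs(dcl_change) > threshold_pid
        return False  # load change too small: ignore


sap = SapWithPid(kp=0.5, ki=0.25, kd=0.125,
                 load_threshold=0.01, initial_loads=[0.40, 0.30])
cycles = [([0.45, 0.30], 1.0), ([0.50, 0.30], 1.0), ([0.55, 0.30], 1.0),
          ([0.60, 0.30], 2.0), ([0.60, 0.30], 2.0)]
decisions = [sap.step(loads, error) for loads, error in cycles]
```

In this run the host load keeps changing during the first three cycles, but SA is triggered only in the first one, because the error is steady afterwards; it is triggered again in the fourth cycle, when the error jumps, and not in the fifth, when the load no longer changes. Because the threshold is proportional to e(k), it shrinks as the error becomes small, which is what makes the trigger more sensitive near the set-point.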
Acknowledgements
This work is also supported by Seoul Women’s University special research fund in 2003.

References
[1] L. Sha, M.H. Klein, J.B. Goodenough, Rate monotonic analysis for real-time systems, in: A.M. van Tilborg, G.M. Koob (Eds.), Scheduling and Resource Management, Kluwer Academic Publishers, Dordrecht, 1991, pp. 129–156.
[2] A. Atlas, A. Bestavros, Statistical rate monotonic scheduling, in: Proceedings of the 19th IEEE Real-time Systems Symposium, IEEE Computer Society Press, Silver Spring, MD, 1998, pp. 123–132.
[3] J.P. Lehoczky, Fixed priority scheduling of periodic task sets with arbitrary deadlines, in: Proceedings of the IEEE Real-time System Symposium, IEEE Computer Society Press, Los Alamitos, CA, 1990.
[4] L.R. Welch, A.D. Stoyenko, T.J. Marlowe, Response time prediction for distributed processes specified in CaRT-Spec, Contr. Eng. Prac. 3 (5) (1995) 651–664.
[5] N.C. Audsley, A. Burns, M.F. Richardson, A.J. Wellings, Applying new scheduling theory to static priority pre-emptive scheduling, Report YCS-92-171, Department of Computer Science, York University, 1992.
[6] L.R. Welch, B. Shirazi, A dynamic real-time benchmark for assessment of QoS and resource management technology, IEEE Real-time Appl. Syst. (1999).
[7] A. Goel, D. Steere, C. Pu, J. Walpole, Adaptive resource management via modular feedback control, HOTOS, in: Proceedings of IEEE Real-Time Applications and System, 1999, submitted for publication.
[8] L.R. Welch, B.A. Shirazi, B. Ravindran, C. Bruggeman, DeSiDeRaTa: QoS Management Technology for Dynamic, Scalable, Dependable, Real-time Systems, IFAC, 1998, pp. 7–11.
[9] PID Tutorial. http://www.engin.umich.edu/group/ctm/PID/PID.html.
[10] E.-N. Huh, L.R. Welch, Y. Mun, Response time estimation for dynamic, distributed real-time systems, Lecture Notes in Computer Science, Springer, Berlin, vol. 2331, May 2002, pp. 1071–1079.
Eui-Nam Huh earned a BS degree from Busan National University in Korea, a Master’s degree in Computer Science from University of Texas in 1995, and a PhD degree from The Ohio University in 2002. He was director of the Computer Information Center and an assistant professor at Sahmyook University during the 2001 and 2002 academic years. He also served the WPDRTS/IPDPS community as Program Chair in 2003. He has been an editor of the Journal of Korean Society for Internet Information and Co-Chair of the Korea Grid QoS Working Group since 2002. He is now an assistant professor at Seoul Women’s University. His research interests are high performance networks, distributed real-time systems, grid middleware, monitoring, and QoS.

Hyoung-Woo Park earned a BA degree from University of Seoul in 1985, and MS and PhD degrees from Sungkyunkwan University in 1996 and 2001, respectively. He is currently a director at the supercomputing center of the Korea Institute of Science and Technology Information. His research interests are grid computing and the next generation of TCP.