Information and Software Technology 39 (1997) 195-204

Predictive capacity planning: a proactive approach

Choon-Ling Sia, Yin-Seong Ho

Department of Information Systems and Computer Science, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260, Republic of Singapore

Received 7 November 1995; revised 12 August 1996; accepted 15 August 1996
Abstract

There are a number of difficulties associated with traditional capacity planning studies which adopt a ‘fix-it-later’ approach. An underestimation of the costs of remedial action which produces only incremental improvements is an example of such difficulties. To address these difficulties, a proactive approach is proposed which uses a prediction-based performance model within a capacity planning framework to recommend either the configuration for a new computer system or the reconfiguration of an installed system. The model used in the framework predicts the resource usage of applications, comprising the potential workload of a system, given the structure of the application programs, the resource usage of their basic components, and forecasted workload characteristics. This information is used to identify applications which may compete for certain system resources if they are executed concurrently in the system. By characterizing the resource usage of the forecasted workload, the occurrence of capacity problems may be pre-empted. © 1997 Elsevier Science B.V. All rights reserved.

Keywords: Proactive approach; Performance modelling; Capacity planning
1. Introduction

Capacity planning is the ‘class of the problems related to the prediction of when in the future the capacity of an existing system will become insufficient to process the installation’s workload with a given level of performance’ [1]. However, traditional capacity planning is mainly based on the ‘fix-it-later’ approach which is deemed to be inappropriate today [2]. Smith [2] highlighted a number of difficulties associated with the ‘fix-it-later’ approach. Firstly, it is easy to lose control of budgets for machine upgrades and maintenance plans. Secondly, tuning the system is more likely to yield only modest incremental improvements. Thirdly, costs tend to escalate when problems are fixed later rather than earlier. On top of that, poor system performance can leave a long-lasting negative impression.

To alleviate the difficulties, a model for predicting the performance behaviour of application programs is proposed within a capacity planning framework. The predictive model is designed to estimate the performance of application programs with respect to their usage of the various system resources. Consequently, performance bottlenecks arising from any forecasted or hypothetical workload can
be assessed before they occur. If a proactive approach is taken by institutionalizing such modelling within a computer installation on a regular basis, many difficulties of traditional capacity planning may be alleviated.

This paper is divided into two main parts. The first part describes the predictive model and the evaluation tests performed on the model to determine its accuracy in estimating the resource usage of programs. The model is then incorporated into a capacity planning framework proposed in the second part of this paper, which is aimed at identifying and planning for the forecasted capacity problems before they actually occur.

The work described in this paper is based on the UNIX operating system running C language applications, a combination which has been gaining immense popularity in recent years. Thus, a predictive model for C applications on UNIX is likely to be useful to many UNIX installations. An IBM-compatible personal computer is used as the demonstrative platform on which the accuracy and effectiveness of the predictive model are validated. The hardware and software specifications for the data collection are:

Machine                  : KBS 486DX Personal Computer with an 80487 numeric coprocessor
Ideal CPU speed          : 33 MHz/11 MIPS/11 SPECmark89
Actual CPU speed         : 32.14 MHz
RAM                      : 640 KB (base), 19456 KB (extended)
Hard disk capacity       : 210 MB
Hard disk transfer rate  : 15 ms or 931.7 KB per second per track
Operating system         : USL UNIX System V Release 4.2 Version 1
Language used            : C programming language [3]

2. Literature review
2.1. UNIX

UNIX is a highly modular operating system that has been described as an ‘onion-style’ implementation in [4] in that high-level applications and services are built on progressively lower-level, more primitive functions and services. Each layer interacts with the next lower layer by invoking services provided by the lower layer. At the lowest level just before the hardware level is the UNIX kernel. User processes gain access to the kernel through system calls. System calls are system functions for managing the UNIX system’s resources such as processes, primary storage, secondary storage and peripherals. The modular and layered architecture of UNIX makes it particularly appropriate for our study because the communication rules between and within layers are clear and unambiguous.

2.2. Performance modelling
Software Performance Engineering (SPE) is a methodology advanced by Smith [2] to develop software that meets performance objectives. SPE efforts permeate the entire software lifecycle. Performance considerations are factored into the software development early so as to prevent, rather than remedy, the occurrence of performance problems. Throughout the software lifecycle, use of the SPE helps to predict and manage the performance of the software as it evolves. This is achieved by comparing the actual performance of the software against the predictions and performance objectives. Performance of the software is predicted by carrying out a static analysis. A static analysis requires the construction of a software execution model based on abstracting the resource requirements of various workload scenarios. Following the static analysis, a system execution model which represents key computer resources as queueing networks, is used to evaluate whether the execution of the workload meets the performance objectives. To date, an architecture for an SPE tool which can interface with Computer-Aided Software Engineering (CASE) tools has been proposed [5]. In addition, the effectiveness of SPE has been demonstrated using a real-world example [6]. Queueing Network Models (QNM) have also been used
by performance analysts to evaluate the performance of computer systems [7,8]. Key interconnected computer resources are modelled as service centres connected in a network, each with a specified service time distribution. Service requests originate from customers who queue to place their requests at servers or terminals. Each service request demands different sequences and amounts of service from some or all of the service centres. By modelling the network of queues and service centres, QNMs enable the performance of a computer system with specific workload characteristics to be modelled.

In addition to the above, a number of researchers, namely Feder [9], Loukides [10] pp. 3-4, Mikkilineni and Utter [11] pp. 135-140, and Beizer [12] pp. 57-64, have proposed models to characterize the performance of programs with the objective of improving their performance. These works model the performance of programs by looking at different components which contribute to their execution times.
3. The predictive model

3.1. Proposal of the model and data collection

The predictive model proposed in this study adopts a different orientation from the performance models documented in the literature review. It looks at the performance of programs written in C on UNIX systems from an applications perspective by analysing their usage of major system resources. While the SPE predicts software performance mainly from the transactions which make up the workload, the model proposed here takes a different stance by characterizing the resource usage of individual language primitives which comprise the processes in the likely workload of an installation. The rationale for characterizing the resource usage of language primitives is that once this information is collected for a particular installation, the resource usage of any new applications which change the nature of the workload can be calculated easily. The other contribution of this study is that the predictive model is applied to a different context from the above models: that of predicting system bottlenecks with the ultimate goal of enabling preventive capacity planning as discussed in Section 4.

The goal of predictive modelling of an application/program is to make predictions about the execution time of the program based on its projected usage of system resources. The usage can be estimated by knowing the nature of the program components and their resource usage characteristics [11] pp. 135-140, [12] pp. 57-64, [13] pp. 1659-1679, and [14] pp. 317-320. C-language applications on UNIX systems are particularly suitable for this study due to their modular nature. The nature of the program components is modelled by determining the entire hierarchical structure of applications which comprises the workload of an installation, from the applications at
the highest level to the system calls, commands and language primitives at the lowest levels (a program analyser [15] can be used to automate the extraction of such information) for a UNIX-based system. The resource usage characteristics of the program components are determined by measuring their use of the major system resources, namely the CPU, the disk subsystem and the memory subsystem. Thus, it is hypothesized that the execution time of a program when executed alone is:

$$T = T_{CPU} + T_{I/O} + T_{MEM}$$

where
$T$ = total processing time (real time, CPU SYS or CPU USER) for the program
$T_{CPU}$ = processing time of those system calls and other program components which use the CPU
$T_{I/O}$ = processing time of those system calls and other program components which use the I/O subsystem
$T_{MEM}$ = processing time for the declaration and access of memory within the program

The proposed model as applied to each of the resources is outlined below.

3.1.1. CPU

Since each process has the CPU entirely to itself during its allocated timeslice [4] p. 76, the CPU usage of a C program is a function of the processing of its program components. From the CPU times required to process each of the program components, and the structure of each program with respect to these components, the CPU usage of the program can be predicted. It is thus hypothesized that the CPU time of each program is:

$$T_{CPU} = E + (F \cdot N_F) + \sum_j (S_j \cdot N_{S_j}) + \sum_i (L_i \cdot N_{L_i}) + \sum_k (SC_k \cdot N_{SC_k}) + \sum_f (LF_f \cdot N_{LF_f}) + \sum_m (PGC_m \cdot N_{PGC_m})$$

where
$E$ = processing time of an empty program
$F$ = unit processing time of overhead of function
$N_F$ = number of functions
$S_j$ = unit processing time of C statements of type j
$N_{S_j}$ = number of C statements of type j
$L_i$ = unit processing time of loop/conditions of type i
$N_{L_i}$ = number of loops/conditions of type i
$SC_k$ = unit processing time of system call k
$N_{SC_k}$ = number of system calls k
$LF_f$ = unit processing time of library function f
$N_{LF_f}$ = number of library functions f
$PGC_m$ = unit processing time of programming command m
$N_{PGC_m}$ = number of programming commands m

Data collection using the hardware and software platform described in Section 1 was conducted to demonstrate the feasibility of the predictive model and the capacity planning framework proposed in Section 4. In analysing the data collected, it was found that:

1. Empty programs consume minimal resources. The average (over three samples) resource usages of empty programs are:

Real time (secs)   CPU system (secs)   CPU user (secs)
0.0267             0.02                0.0067

CPU system refers to the amount of CPU time used by the system for administrative purposes (such as starting up the process) during the execution of a program, while CPU user is the amount of CPU time used in the actual processing of the user program.

2. There was no statistical difference (determined using the t-Test at the 0.05 level of significance) among the resource (CPU) usage of C statements, such as the assignment statement, arithmetic statements, ‘goto’, ‘if’, and ‘return’, and the function calls. The average (over three samples) resource usage of ten thousand assignment statements is chosen as the representative of C statements in the model:

Real time (secs)   CPU system (secs)   CPU user (secs)
0.12               0.05                0.01

3. The t-Test applied on the CPU usage of program loops (‘for’, ‘while’, and ‘do’) and the conditional statement ‘switch’ did not reveal any significant variance. The resource usage of the ‘for’ loop will be used in the model. The average CPU usage of one unit of the ‘for’ loop is:

Real time (secs)   CPU system (secs)   CPU user (secs)
1.633 × 10^-?      3.333 × 10^-?       1.567 × 10^-?

4. The CPU usage of system calls is reported in [15].

5. The collection procedures for the CPU usage of library functions and programming commands are very similar to that for system calls and can be performed easily if required.
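As an illustration only, the CPU-time hypothesis of Section 3.1.1 can be evaluated with a few lines of Python. This is a minimal sketch, not the authors' tool: the names `Component` and `predict_cpu_time` are hypothetical, all component classes (statements, loops, system calls, library functions, programming commands) are folded into one list of (unit time, count) pairs, and the example program of 50,000 statements is invented. The two constants are the real-time figures quoted in findings 1 and 2.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Component:
    """One class of program component (statement, loop, system call, ...)."""
    unit_time: float  # unit processing time in seconds (S_j, L_i, SC_k, ...)
    count: float      # estimated number of executions (N_Sj, N_Li, N_SCk, ...)


def predict_cpu_time(empty_time: float,
                     function_overhead: float,
                     n_functions: float,
                     components: Iterable[Component]) -> float:
    """Evaluate T_CPU = E + F*N_F + sum(unit_time * count)."""
    total = empty_time + function_overhead * n_functions
    for c in components:
        total += c.unit_time * c.count
    return total


# Real-time figures quoted in the findings: an empty program takes 0.0267 s,
# and ten thousand assignment statements take 0.12 s.
E_REAL = 0.0267
ASSIGNMENT_UNIT_REAL = 0.12 / 10_000

# Hypothetical program: 50,000 C statements and no separately costed functions.
example = predict_cpu_time(E_REAL, 0.0, 0,
                           [Component(ASSIGNMENT_UNIT_REAL, 50_000)])
```

Keeping the per-installation unit times separate from the per-program counts mirrors the rationale given in Section 3.1: the unit times are measured once, and only the counts change when a new application joins the workload.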
3.1.2. I/O

The only I/O operations studied in this paper are those involving the disk. The system call which performs disk input operation is the ‘read’ system call, while the ‘write’ system call handles output to the disk. The characteristics of disk usage are actually very similar to those of the CPU, since only one process can make use of the disk subsystem at any one time [4], p. 7. The usage of the disk is manifested in the execution time of the program. Thus, disk usage can be modelled by:

$$T_{I/O} = \sum_{i,m} (IO_{i,m} \cdot N_{IO_{i,m}})$$

where
$IO_{i,m}$ = unit processing time for an I/O command (system call, library function or programming command) of type i transferring m bytes of data
$N_{IO_{i,m}}$ = number of I/O commands of type i transferring m bytes of data in the program

The average resource usage for one ‘read’ system call under varying amounts of iterations and different numbers of characters transferred is shown below. (The performance indexes Device avque, Device avwait and Device avserv are calculated by taking the difference of the performance index between the program with and the program without the ‘read’ system call.)

Characters read = 1
Indexes        Iterations: 1    10          1,000       10,000
Real time      0.01             0.001       0.00016     0.000151
CPU system     0.003333         0.001333    0.000157    0.00014
CPU user       0                -0.00067    0.00001     0.000011
avque          0                0.333333    0           0
avwait         0                7.666667    0           0
avserv         -0.4             -0.83333    -4.66667    1.666667

Iterations = 1,000
Indexes        Characters read: 1    10          100         1,000
Real time      0.00016               0.00014     0.000173    0.00034
CPU system     0.000157              0.000127    0.000167    0.00033
CPU user       0.00001               0.000027    0.00001     0.00001
avque          0                     0           -0.1        0
avwait         0                     0           -3          0
avserv         -4.66667              -0.66667    0.633333    0.983333

The average resource usage for one ‘write’ system call under varying amounts of iterations and different numbers of characters transferred is shown below. (The performance indexes Device avque, Device avwait and Device avserv are calculated by taking the difference of the performance index between the program with and the program without the ‘write’ system call.)

Characters written = 1
Indexes        Iterations: 1    10          1,000       10,000
Real time      0.113333         0.014       0.000427    0.000296
CPU system     0.01             0.001333    0.000277    0.000252
CPU user       -0.00333         0.000333    0           0.000013
avque          0                0           0.066667    0.133333
avwait         0                0           2.333333    3.166667
avserv         1.333333         -0.03333    0.266667    -0.86667

Iterations = 1,000
Indexes        Characters written: 1    10          100         1,000
Real time      0.000427                 0.000727    0.000692    0.002992
CPU system     0.000277                 0.00034     0.000365    0.00112
CPU user       0                        0           0.000032    0.000015
avque          0.066667                 0.5         0.1         0.283333
avwait         2.333333                 14.3        2.6         7.05
avserv         0.266667                 -1          -1.93333    -5.38333

The data indicated that the average usage for the transfer of one character (for both ‘read’ and ‘write’ system calls) decreases as the number of iterations under which the data was collected increases. In addition, using simple regression analysis [16], it was found that the correlation between real time and the number of characters transferred, and that between CPU system and the number of characters transferred, are very high for both ‘read’ and ‘write’ system calls (‘r’ values of greater than 0.98) [15]. On the other hand, the values of CPU user remain relatively constant irrespective of the number of characters transferred. Thus, the real time and CPU system can be predicted by the number of ‘read’/‘write’ calls made, the number of characters transferred per call, and the resource usage per ‘read’/‘write’ call.

3.1.3. Memory

Only the usage behaviour of the declaration and access of arrays of integers is included in this study. In [17] p. 6-32, it was noted that a high %idle coupled with poor response may indicate a memory bottleneck. It can thus be deduced that request and use of runtime memory will contribute to the execution time of a program accessing that memory. Thus:

$$T_{MEM} = \sum_m MEM_{m,(y,i),z}$$
where
$MEM_{m,(y,i),z}$ = the processing time necessary to process array m with y elements of data type i and accessed z times.

The average (over three samples) memory-related performance data collected for different amounts of array elements declared and accessed in a program is shown below (real time in secs):

Declared      Access: 1 million   2 million   4 million
1 million     0.6733              0.7267      1.0733
2 million     0.68                1.34        1.9
4 million     0.68                1.3367      58.6467

The above data, together with more detailed data collected and analysed in [15], suggested that in predicting the portion of the program execution time due to memory usage, it is necessary to know:

1. The amount of memory declared.
2. The estimated access to the memory declared.
   If Memory Declared ≤ Memory Accessed, then use the processing time associated with the MA (Declared * Access) for the amount of memory declared (Memory Declared) and the amount of memory accessed (Memory Accessed).
   If Memory Declared > Memory Accessed, then use the processing time associated with the MA when Memory Declared is the same as Memory Accessed, based on the Memory Accessed amount.
3. The pattern of access to memory (sequentially or randomly).

For a particular program component PC, such as a system call, there are a number of modifiers to its invocations within the program: loops (e.g. ‘for’) and conditions (e.g. ‘if-else’). Thus, to provide an estimate of the number of times PC will be invoked in the program, all the loops and conditions leading to each instance of PC, together with the associated probabilities [12], pp. 57-64, have to be determined. In addition, it is necessary to determine the probabilities of following the processing paths to each loop occurrence. The number of times a particular instance of PC is invoked in a program is modelled by:

$$N_{PC_i,j} = P_{PC_i,j} \cdot N_{PC_i,j-1} \qquad (j \ge 1,\ N_{PC_i,0} = 1)$$

where
$N_{PC_i,j}$ is the number of times the ith occurrence of PC is invoked with j levels of invocation modifiers along the processing path leading to $PC_i$.
$P_{PC_i,j}$ is the probability of taking the next level (step) of processing path when a condition is encountered, or the estimated number of iterations when a loop is encountered, at level j for program component $PC_i$.
$N_{PC_i,j-1}$ is the number of invocations up to the (j - 1)th level for $PC_i$.

3.2. Validation of the predictive model

To validate the accuracy of the predictive model, a set of test programs was written [15] pp. 127-129, which comprises different combinations of program components. The predictive model is used to estimate the resource usage of these programs [15] pp. 161-163, which is then compared to their resource usage during actual execution of those programs using the χ² Test [15] pp. 130-138. A summary of the χ² values obtained for each program is shown in the following table.

Program       Real time (secs)   CPU system (secs)   CPU user (secs)
Program1_1    0.125              0.542               0.14
Program1_2    0.737              1.274               0.045
Program2_1    1.045              0.017               2.178
Program2_2    0.367              0.05                0.398
Progmk1_1     0.044              0.543               0.217
Progmk1_2     0.27               0.966               0.14
Progmk2_1     0.731              0.021               1.402
Progmk2_2     0.248              0.127               0.155
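The comparison behind the χ² table can be sketched as follows. The paper does not spell out the statistic it computes, so this sketch assumes the usual Pearson form, with the model's prediction taken as the expected value for every observed run. The function names are hypothetical, and the critical value is supplied by the caller rather than derived here.

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[float], predicted: float) -> float:
    """Pearson-style statistic: sum over runs of (observed - predicted)^2 / predicted."""
    if predicted <= 0:
        raise ValueError("predicted value must be positive")
    return sum((o - predicted) ** 2 / predicted for o in observed)


def within_critical_value(observed: Sequence[float],
                          predicted: float,
                          critical_value: float) -> bool:
    """True when the statistic does not exceed the chosen critical value,
    i.e. the prediction is not judged significantly different from the runs."""
    return chi_square_statistic(observed, predicted) <= critical_value
```

With n observed runs the test has n - 1 degrees of freedom, so the critical value has to be looked up for the sample size actually used.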
The sample size for the actual program execution is ten. Thus, there are nine (n - 1) degrees of freedom for each program. At the 0.05 level of significance, the critical χ² value is 3.33 [18], p. 1130. From the above table, it is clear that none of the χ² values is greater than 3.33, implying that none of the predictions for the set of test programs is significantly different from the actual data collected. The less-than-perfect accuracy of the predictive model can be attributed to the exclusion of certain factors which contribute to the processing time of a process, such as the start-up activities during process creation [1], p. 17, [4], pp. 57-70, and page-fault activities [11], p. 137. However, these factors are not significant.

4. Prediction-based capacity planning framework

The framework for the prediction-based capacity planning is shown in Fig. 1. It is applicable to situations where the computer systems may or may not already have been installed. The framework is developed so that the steps can be incorporated into the life cycle phases described in the Installation Life Cycle for packaged software in [19] and the Enhanced Life Cycle Model in [20]. The capacity planning framework is applicable to an installation with a UNIX-based environment, as long as its present, future or potential workload can be characterized and the appropriate performance data gathered for that installation.

[Fig. 1. Prediction-based capacity planning framework. Recoverable box labels: Step 1.1; Step 1.2; Workload; Step 4, Determine unit resource usage for program components; Step 5, Application of predictive model to workload; Actual workload; Continuous monitoring.]
Step 1.1 Determine performance objectives

It is first necessary to establish a performance contract between the user management and the computer department of the organization [10] pp. 8-10. The contract, containing the performance objectives, reflects the user’s expectations [20], pp. 3.39-3.40, of the computer system’s service levels, and the computer department’s agreement to meet the expectations. It is usually meaningful to specify these objectives in a statistical form, such as: 90% of the response times of transactions should be less than 5 seconds, 98% less than 8 seconds, etc.

Step 1.2 Workload forecasting

Workload forecasting requires the user and system administrator to estimate the workload, including the workload applications and their usage patterns, likely to be encountered by the system. The estimate should allow for future expansion to the workload using techniques discussed in [21], [22] and [23]. The workload forecast may be estimated using Natural Forecasting Units (NFU) proposed by Friedman in [24], so that the forecast can be more intuitively specified by the user management. The usage patterns should be classified according to meaningful periods of time in which a set of applications/transactions are likely to be executed concurrently for the installation. The time period may be any arbitrary period meaningful to the installation. A sample workload forecast is presented in Fig. 2. The ‘/’ symbol separating items in the figure means that either one of the items may be executed at one time, but not both.

Time period   Application   Programs/Transactions   Frequency   Forecasted (increase in 1 year: 10%)
1             A             A.1/A.4                 1           1.1
                            A.2/A.3                 10          11
                            A.6/A.7                 5           5.5
              B             B.2                     1000        1100
                            B.6/B.7                 1           1.1
                            B.8                     200         220
2             A             A.2                     5           5.5
                            A.3                     10          11
              D             D.1                     3000        3300
                            D.2                     100         110

Fig. 2. A sample workload forecast.

Step 2 Analysis of programs in workload

Preliminary work for this step requires the programmers of the applications to document, for each program, the numbers of iterations of loops used and the probabilities of following each processing path. In addition, the program structure,
system calls, library functions and programming commands invoked within each program may be extracted using tools such as the Program Analyser of [15]. Also, it is possible for the program developer to specify the I/O and memory requirements of each program. The memory required for the process execution and run-time memory declared by each program may be obtained by using the method proposed by Loukides in [lo], pp. 121-129. In the case where the system has already been installed and used for some time, and Steps 7.2 and 1.2 reveal that the forecasted workload characteristics have changed, then the program structure of the changed portion of the workload will have to be extracted. The next task is the estimation of the I/O requirements for each program/transaction in the workload, which can be provided by the user with the help of the system administrator. All the preliminary work described above requires the cooperation and some discipline on the part of the application developers. However, if such documentation becomes a standard practice among software developers, then the benefits of the prediction-based capacity planning methodology can be fully realized. With the memory requirements and the I/O requirements known, Fig. 2 can be augmented to include the resource requirements, as shown in Fig. 3. Step 3 System conjigurationlreconfiguration Given the resource requirements (for memory and disk) of the projected workload, and an estimate of the CPU usage by the vendor: An initial configuration (types of CPU, different amounts of memory and different disk capacities with different speeds of transfers etc) may be recommended by the vendor during the pre-installation of the computer system; or Changes to the existing configuration (reconfiguration) of the system resources may be recommended by the vendor or requested by the user after the computer system has already been installed. 
There are two situations under which reconfiguration are done: when the nature of the workload has changed significantly as detected in Step
201
resource requirement.
7.2 (Situation 1 reconfiguration), or when the system configuration proposed has been found to be inadequate for handling the workload in Steps 5 and 6 (Situation 2 reconfiguration). The configuration or reconfiguration (for Situation 1 reconfiguration only) can be done by inspecting the application resource usage table, such as Fig. 3. From the usage table, the computer system specifications may be estimated for initial configuration, or more accurately predicted using the predictive model for reconfiguration (see Step 5 Point 13), by the vendor using the following method: 1. Determine the CPU requirements for each time period. For the initial configuration, it is necessary for the vendor to give a very rough estimate of the likely CPU usage by the workload based on his/her experience. For reconfiguration, the CPU usage may be predicted by calculating the scaled instructions (see Step 5 Point 24) required to process the workload at each time period and using the maximum scaled instructions (based on the current processor) among all relevant time periods (plus some allowance for future increases in demand) as the ideal capacity for the CPU. Determine the disk requirements for each time period. Sum up the disk space required by all the applications used by the installation, the projected space required by the data files, the swap memory, the database (if any) and other miscellaneous disk space requirements. In addition, the access speeds and transfer rates required may be roughly estimated by looking at the disk contention within each time period. Additional space should be added to the disk capacity requirements for future growth. _. Determine the memory requirements for each trme
’ The use of predictive model is described in Step 5 under Point 1. It is not explained here hecause the proposed framework is an iterative one and the predictive model is used during reconfiguration here when Step 5 should have already been performed. 4 The concept of scaled (normalized) instructions is described in Step 5 under Point 2. It is not explained here because the proposed framework is an iterative one and the scaled instructions are used during reconfiguration here when Step 5 should have already been performed.
C. -L. Sia, Y.-S. Hollnformation and Software Technology 39 (1997) 195-204
202 Time Period 1
Appln
A B
2
A D
Progs/ Trns
Forecasted Freq A.llA.4 1.1 A.21A.3 11 A.6iA.7 5.5 B.2 1100 B.61B.7 1.1 B.8 220 A.2 5.5 A.3 11 D.l 3300 D.2 110
REAL (SECS) 5 10 7 20 2 10 5 10 1022 32
CPU Total CPU USER Mem SYS (SECS) (SECSI 5 0 1OOKB 9 1 IOKB 6 1 50KB 9 2 200KB 1 0.5 150KB 7 1.1 180KB 4 1 1OKB 9 1 15KB 986 36 350KB 30 2 150KB
Fig. 4. A sample updated workload
period. Sum up the memory requirements for each time period and take the maximum over all time periods, plus the memory requirements by the operating system (including system tables), and other systems software such as library functions and utilities, approximated run-time memory required, buffers, and other miscellaneous requirements. A percentage should be incorporated into the memory requirements to allow room for unexpectedly high demands, and increases in future demand. 4. Match the resource requirement specifications to the hardware available, keeping in mind the performance objectives agreed upon in Step 1.1. The mapping from resource requirements to the hardware specifications can, at worst, be estimated by the vendor experts (marketing representatives or systems engineers), and at best can be empirically studied through the collection of statistically significant samples of such mapping data from past experiences of installation and reconfiguration. The situation 2 reconfiguration is needed because of an estimation error when the vendor proposed a resource (the overloaded resource found in Step 5) which is not able to handle the usage requirements of the workload. Thus, the remedy for this is to recommend a resource with a bigger capacity. Step 4 Determine unit resource usage for program comvonents Using the machine configuration proposed in Step 3, it is now possible to collect the resource usage of the >rogram components (see [ 15]), such as the program commands, I/O accesses, memory accesses and library functions used. This data will form the input to the predictive model used in Step 5. Step 5 Application of predictive model to workload The Following activities are carried out in this step: The predictive model will be used to estimate the resource usage of each program/transaction in the forecasted workload. 
The outputs from the predictive model will be used to update the resource usage table with the REAL, CPU SYS and CPU USER for each program/ transaction, resulting in a table such as Fig. 4. One possible indicator of the system workload [15] is to compare the number of machine instructions used by the
Avg Rd 100 0 0 0 100 500 300 100 200 0
Chars Wr 0 1000 10 0 0 0 10 0 200 0 _
resource requirement.
workload in one second, with the capacity of the system (11 MIPS/11 SPECmark89 for the machine used in the test) [12], p. 65. Using this indicator, it is possible to characterize any workload by determining the number of machine instructions required by the program and its components. For this study, the program components of interest are the system calls, programming commands and library functions, since they are the main contributors to program performance [15]. The number of instructions required per program component PC, normalized with respect to the ideal capacity (11 MIPS or 11,000,000 instructions per second for the 486DX Personal Computer used in this study), is called X or the X-value, and is given by:
\[ X = \frac{\mathrm{IPS}}{N_{PC} / RT_{PC}} \]
or
\[ X = \mathrm{IPS} \times RTU_{PC} \]
where
\mathrm{IPS} is the machine capacity in Instructions Per Second (11,000,000 for the 486DX PC).
N_{PC} is the number of times PC is invoked.
RT_{PC} is the real time in seconds required to process the N_{PC} invocations of PC.
RTU_{PC} is the real time in seconds required to process one unit of PC (that is, RT_{PC} / N_{PC}).
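The X-value formula above can be illustrated with a minimal Python sketch. The function name `x_value` and the sample measurement (1000 invocations taking 0.5 s in total) are hypothetical and are not taken from the study; only the 11,000,000 IPS figure comes from the text.

```python
IPS_486DX = 11_000_000  # ideal capacity quoted in the text for the 486DX PC


def x_value(n_pc: int, rt_pc: float, ips: float = IPS_486DX) -> float:
    """Return the X-value of a program component.

    n_pc  -- number of times the component PC is invoked (N_PC)
    rt_pc -- real time in seconds to process all n_pc invocations (RT_PC)
    ips   -- machine capacity in instructions per second (IPS)
    """
    if n_pc <= 0:
        raise ValueError("n_pc must be positive")
    if rt_pc < 0:
        raise ValueError("rt_pc must be non-negative")
    rtu_pc = rt_pc / n_pc  # real time per unit of PC (RTU_PC)
    return ips * rtu_pc    # X = IPS * RTU_PC = IPS / (N_PC / RT_PC)


# Hypothetical measurement: 1000 invocations processed in 0.5 s overall.
example_x = x_value(n_pc=1000, rt_pc=0.5)
```

Under these assumed numbers, each invocation takes 0.0005 s, so the component accounts for roughly 5,500 scaled instructions per invocation.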
Note that X can also be calculated for a program or a set of programs comprising a workload, since the number of machine instructions to be processed (scaled to the ideal capacity of the system) for the program/workload can be calculated by knowing the program components (using automated tools such as the Program Analyser in [15], pp. 139-149), and the execution time can be predicted using the predictive model of [15].
To determine the degree to which the capacity of a resource has been used, it is necessary to differentiate program components according to the resource they utilize. Subsequently, the X-value of each program for each resource will have to be calculated. The tests in [15] discovered that:
(a) if the workload consists of a number of programs which are executed simultaneously and the sum of scaled machine instructions (the X-value) of the programs in the workload is greater than the ideal capacity of the machine, then the system is overloaded (a capacity problem) [12]. It was determined in [15], pp. 64-66, that the execution time of a workload begins to increase drastically (indicating system/resource overload) when the sum of scaled instructions in that workload is approximately equal to three times the ideal capacity of the machine;
(b) thus, if the number of scaled machine instructions to be processed in the workload is large (greater than the ideal IPS of the machine), then the capacity of one or more resources of the system might be reached; and
(c) the resource which is overloaded can be found by determining which program components are most heavily used in the workload and what resources these components require.
The profile (CPU-intensive, I/O-intensive or Memory-intensive) of the workload must be determined to decide which system resources are overloaded. This profile can be obtained by determining which workload program components (whether CPU-, I/O- or memory-related) contribute most to the estimated execution times. Workload performance data may have to be collected to verify the profile using the following rules from [15], pp. 57-66:
(i)
If runq-sz is greater than or equal to 3 AND %wio is less than 10 AND Mean Size (K) is small (less than 1), then the workload or program is CPU-intensive.
(ii) If avque is greater than or equal to 3 AND avwait is greater than 100 AND %wio is greater than 50, then the workload is I/O-bound.
(iii) If %idle is relatively high (greater than 10) AND %wio is low (less than 10) AND Mean Size (K) is high (greater than 1), then the workload is Memory-bound.
Here:
runq-sz is the number of processes in memory waiting to run.
%wio is the percentage of time the CPU is waiting for I/O requests to be completed.
%idle is the percentage of time the CPU is in the idle state.
avque is the average number of disk requests outstanding (queued) at a particular point in time.
avwait is the average time that I/O requests wait for their turn to be processed.
Mean Size is the mean core memory in kilobytes used by a process during its execution.
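Rules (i) to (iii) can be expressed as a small classifier. The following Python sketch is illustrative only: the `Snapshot` class, its field names and the `classify_workload` function are hypothetical, the thresholds are copied from the rules above, and avwait is assumed to be in whatever unit the monitoring tool reports, since the text gives none.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One set of monitored values (field names are hypothetical)."""
    runq_sz: float      # processes in memory waiting to run
    pct_wio: float      # % of time the CPU waits for I/O completion
    pct_idle: float     # % of time the CPU is idle
    avque: float        # average number of outstanding disk requests
    avwait: float       # average wait of I/O requests (unit as reported)
    mean_size_k: float  # mean core memory per process, in kilobytes


def classify_workload(s: Snapshot) -> str:
    """Apply rules (i)-(iii); return 'undetermined' if none matches."""
    if s.runq_sz >= 3 and s.pct_wio < 10 and s.mean_size_k < 1:
        return "CPU-intensive"   # rule (i)
    if s.avque >= 3 and s.avwait > 100 and s.pct_wio > 50:
        return "I/O-bound"       # rule (ii)
    if s.pct_idle > 10 and s.pct_wio < 10 and s.mean_size_k > 1:
        return "Memory-bound"    # rule (iii)
    return "undetermined"


# Hypothetical snapshot: long run queue, little I/O wait, small processes.
profile = classify_workload(
    Snapshot(runq_sz=4, pct_wio=2, pct_idle=0, avque=0.5,
             avwait=10, mean_size_k=0.4)
)
```

The three rules cannot fire together, because (i) and (iii) require opposite Mean Size conditions and (ii) requires a %wio value that excludes both. A snapshot matching none of them is reported as undetermined rather than forced into a profile.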
Using the above workload analysis method, the X-value of each program component and the scaled instructions to be processed by the workload per time period (see footnote 5) may be calculated and used to determine whether the capacity of the system, or of its resources, has been reached. The resource with the highest utilization (highest number of scaled instructions) is the bottleneck for the system [12], p. 65. By inspecting the updated resource usage table and matching it to the resource consumption characteristics of workload programs, the overloaded resource(s) can be determined. The overloaded resource is likely to be the one which contributes most to the processing times of the workload and which is heavily demanded, in most of the time periods of interest, by the components of the programs concurrently executing in the workload. Identification of the overloaded resource(s) can be further supplemented and confirmed through the use of QNMs [7,8].
Step 6 Potential problem?
This step is a decision by the user and system administrator on whether the performance objectives have been met. If they have not, it needs to be determined whether it is the system as a whole or some of the resources that are causing the problems. If it has been agreed that the service levels need to be improved, then Step 7.1 may be done next; otherwise Step 7.2 will be performed.
Step 7.1 Confirmation with actual workload data
The purpose of this step is to verify that a performance problem actually exists and to validate that a particular resource (or resources) is causing a drop in service levels. The X-value of the workload may be calculated to confirm that the processing capability of the system has indeed deteriorated as a result of the system or resource overload. This step may also be used to fine-tune the predictive model and to provide additional data to the workload analysis heuristics discussed previously.
The collection of confirmatory data is not expected to place undue load on the system, as only a few snapshots of accounting data (by application programs) will be needed. Following Step 7.1, the methodology iterates back to Step 1.1 to determine the need to make changes to the performance contract and objectives.
Step 7.2 Continuous monitoring
When no performance problem has been predicted for the newly installed system, or experienced by the system, the system need only be subjected to regular (but not necessarily frequent) monitoring. The purpose of collecting such data is to detect performance bottlenecks which occur as a result of changes in the nature or characteristics of the workload, and to identify differences in the trends of resource usage. When these happen, tuning of the system may be carried out, and if this fails to improve system performance, then a reassessment of the workload characteristics (that is, planning for future capacity) is necessary, and Step 1.2 will be performed next.
With the exception of Steps 7.1 and 7.2, the steps in the prediction-based capacity planning framework may be carried out without actually having to use the installed system. Steps 4 and 5 may be performed using a test system, provided by the computer vendor, with the same configuration as the proposed or installed system. One consequence is that user management can pose what-if scenarios to help decide on the most suitable configuration for the organization. These are the advantages of the prediction-based capacity planning framework.
Footnote 5: Sum, over the programs constituting the workload in a time period, of the product of the frequency of program invocation and the scaled instructions for that program.
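The per-period workload sum of footnote 5, together with the overload and bottleneck heuristics of Step 5, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the resource labels and the sample figures are hypothetical; invocation frequencies are assumed to be expressed per second so that the sum is directly comparable with IPS; and the factor of three is treated as a hard threshold although the text describes it as approximate.

```python
IPS_486DX = 11_000_000  # ideal capacity quoted in the text


def period_load(programs):
    """Sum frequency * scaled instructions per resource for one time period.

    programs -- list of (frequency, {resource: scaled_instructions}) pairs
    Returns a dict mapping each resource to its total scaled instructions.
    """
    totals = {}
    for frequency, per_resource in programs:
        for resource, scaled in per_resource.items():
            totals[resource] = totals.get(resource, 0) + frequency * scaled
    return totals


def assess_period(totals, ips=IPS_486DX):
    """Return (status, bottleneck) for one period's per-resource totals."""
    ratio = sum(totals.values()) / ips
    if ratio >= 3:       # execution time reported to rise drastically
        status = "severe overload expected"
    elif ratio > 1:      # more scaled instructions than the ideal capacity
        status = "capacity exceeded"
    else:
        status = "within capacity"
    # The resource with the most scaled instructions is the likely bottleneck.
    bottleneck = max(totals, key=totals.get) if totals else None
    return status, bottleneck


# Hypothetical workload for one period: two programs with CPU and I/O demand.
workload = [
    (10, {"cpu": 400_000, "io": 100_000}),
    (2, {"cpu": 1_000_000, "io": 3_000_000}),
]
totals = period_load(workload)
status, bottleneck = assess_period(totals)
```

With these assumed figures, the CPU total is 6,000,000 and the I/O total is 7,000,000 scaled instructions, so the period exceeds the 11,000,000 ideal capacity and I/O is flagged as the likely bottleneck.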
5. Conclusion
The purpose of the predictive performance model is to enable its users to estimate the usage of various system resources by a program/application without having to execute the program, by analysing the profile of its components. The proposed predictive model is incorporated into a capacity planning framework. The framework aims to relieve the system administrator of as much work as possible and to take as much load as possible away from the system during performance studies, while still providing a reasonably accurate picture of the degree to which the capacities of the different resources have been utilized. However, when a more accurate assessment of system performance is required, a more powerful modelling tool such as QNM is recommended. In addition, the predictive model would not be appropriate when the performance of individual programs has to be optimized, as the model does not require that performance considerations be built into the software system. The SPE approach would be more suitable for applications where the performance of individual programs is critical, such as real-time systems (e.g. [6]). UNIX running on a microcomputer has been used to validate the predictive model in this study. However, this does not limit the applicability of the predictive model to other UNIX-based systems with different and larger configurations.
References
[1] D. Ferrari, G. Serazzi and A. Zeigner, Measurement and tuning of computer systems, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1983.
[2] C.U. Smith, Performance engineering of software systems, Addison-Wesley Publishing Company, Reading, MA, 1990.
[3] B.W. Kernighan and D.M. Ritchie, The C programming language, Prentice-Hall Inc., Englewood Cliffs, New Jersey, Second Edition, 1988.
[4] P.K. Andleigh, UNIX system architecture, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1990.
[5] C.U. Smith, Integrating new and 'used' modeling tools for performance engineering, In G. Balbo and G. Serazzi (editors), Computer Performance Evaluation: Modelling Techniques and Tools, Elsevier Science Publishers B.V., Amsterdam, The Netherlands, 1992, pp. 153-163.
[6] C.U. Smith and L.G. Williams, Software performance engineering: a case study including performance comparison with design alternatives, IEEE Trans. on Software Engineering, 19 (7) (1993) 720-741.
[7] R. Jain, The art of computer systems performance analysis: techniques for experimental design, measurement, simulation, and modelling, John Wiley & Sons, New York, NY, 1991.
[8] E.D. Lazowska, J. Zahorjan, S.G. Graham and K.C. Sevcik, Quantitative system performance: computer system analysis using queueing network models, Prentice-Hall, Englewood Cliffs, New Jersey, 1984.
[9] J. Feder, The evolution of UNIX system performance, UNIX Systems Readings and Applications Volume II, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1987.
[10] M. Loukides, System performance tuning, O'Reilly & Associates Inc., 1990.
[11] R.P. Mikkilineni and D.F. Utter, Designing software for maintenance and performance, Proc. of the Twelfth COMPSAC (Annual International Computer Software & Applications Conference), IEEE Computer Society Press, October 1988, pp. 135-140.
[12] B. Beizer, Micro-analysis of computer system performance, Van Nostrand Reinhold Company, 1978.
[13] R.H. Saavedra-Barrera, A.J. Smith and E. Miya, Machine characterization based on an abstract high-level language machine, IEEE Trans. on Computers, 38 (12) (December 1989) 1659-1679.
[14] E.W. Brehm, S.S. Chow and R. Goettge, Measurement and prediction of computer performance at the FAA's en route air traffic control centers, Proc. of the Int. Conf. on Management and Performance Evaluation of Computer Systems (CMG '86), The Computer Measurement Group Inc., 1986, pp. 317-320.
[15] C.L. Sia, A predictive model for the capacity management of UNIX systems, M.Sc. Dissertation, National University of Singapore, 1993.
[16] H. Kobayashi, Modelling and analysis: an introduction to system performance evaluation methodology, Addison-Wesley Publishing Company, October 1981.
[17] UNIX SVR4.2, Advanced system administration, Edited by Kathy O'Leary & Mathew Wood, UNIX Press (A Prentice Hall Title), ISBN 0-13-042565-6, 1992.
[18] J. Neter, W. Wasserman and G.A. Whitmore, Applied statistics, Allyn and Bacon Inc., Third Edition, 1988.
[19] C.L. Sia and A. Teo, EXOTUS report, Publication No. 0013599, Published by Dept. of Information Systems and Computer Science (DISCS) of the National University of Singapore (NUS), 1990.
[20] K.S. Raman, Capacity planning for information systems, Proc. of IFIP TC-8 Open Conference, Published by Department of Information Systems and Computer Science (DISCS) of the National University of Singapore (NUS), pp. 3.40-3.55, 1988.
[21] F.A. Wong and F. Pedriana, Automated forecasting of large and complex workloads, Proc. of CMG '86, December 1986, pp. 121-132.
[22] J.H. Smith and R.L. Gimarc, A unified methodology for data center planning, Proc. of the CMG '88, The Computer Measurement Group Inc., 1988, pp. 100-102.
[23] P.M. Chen and D.A. Patterson, A new approach to I/O performance evaluation - self-scaling I/O benchmarks, predicted I/O performance, Proc. of ACM SIGMETRICS (Conf. on Measurement & Modelling of Computer Systems), May 1993, pp. 1-12.
[24] E.M. Friedman, Workload forecasting: predicting resource requirements for new and existing workload, Proc. of the CMG '86, The Computer Measurement Group Inc., 1986, pp. 533-541.