Performance Evaluation 45 (2001) 125–146

Performance validation tools for software/hardware systems

Ulrich Herzog a,∗, Jerry Rolia b,1

a University of Erlangen, Informatik 7, Martensstr. 3, D-91058 Erlangen, Germany
b Internet and Mobile Systems Lab, Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304, USA

Abstract

It is common for software/hardware systems to be fully designed and functionally tested before an attempt is made to verify performance characteristics. Unsatisfactory performance, when discovered late in a system’s development, can cause a costly redesign and implementation of software or hardware and is likely to result in late system delivery. The goal of system performance validation is to provide assurance that a system as a whole is likely to meet its quantitative goals before the system is complete. It exploits performance engineering methods and tools to systematically construct and evaluate predictive system models. The tools must correctly predict system performance at some useful level of abstraction. This paper compares layered queueing models (LQMs) and stochastic process algebras (SPAs) and their support for system performance validation. In particular, we focus on abstraction and level of detail within models and on automated model building. © 2001 Published by Elsevier Science B.V.

Keywords: Performance validation; Software performance engineering; Automated model building; Layered queueing models; Stochastic process algebras

1. Introduction

System performance validation provides assurance that a system as a whole is likely to meet its quantitative performance goals. It applies performance engineering methods and tools during a system’s development and has to be integrated into the design process from the very beginning through to system deployment. Performance engineering (PE) means describing, analyzing and optimizing the dynamic, time-dependent behavior of systems. The main steps are summarized in Fig. 1. First, desired characteristics for system performance must be established and critical system performance risks must be identified. Next, it is necessary to decide the modeling abstraction needed to address a particular concern. Specifically, we must choose system features, workloads and performance objectives for the study. Measurement and/or modeling studies are then conducted as is appropriate to verify whether or not the objectives are feasible and to provide feedback to system developers. This method must be repeated as the system is further developed.

∗ Corresponding author. Tel.: +49-9131-85-27041; fax: +49-9131-85-27409. E-mail addresses: [email protected] (U. Herzog), [email protected] (J. Rolia).
1 Present address: 1501 Page Mill Rd, Palo Alto, CA 94304, USA, 1U-17. Tel.: +1-650-236-8439.


Fig. 1. Performance engineering methodology.

Parts of a system are studied in isolation, with the more critical aspects of the system being studied in greater detail. As performance objectives are validated, the artifacts of studies must be synthesized; in effect, the results of one study affect others. The final result of this process is a system-level validation.

An important challenge for performance engineers is to choose modeling abstractions. Abstraction identifies the queueing and synchronization resources to be reflected in a model, workload classes, and how each class of work makes use of system resources. Furthermore, an appropriate level of detail must also be chosen. Detail determines how many parameters are used to describe each part of the model. It must be possible to create and maintain representative model descriptions within a project’s schedule and budget. By representative we mean that the model accurately reflects system behavior at its level of abstraction. For this reason a model should be as abstract as possible and include the minimal detail needed to validate performance. Models typically become less abstract and more detailed as a system is developed. Care must be taken to ensure that models do not include more features than necessary to validate performance and that they do not have more detail (parameters) than has been accurately characterized. The goal of automated model building techniques is to reduce the effort needed to create and maintain representative model descriptions so that performance engineers can focus on interpreting results and influencing system development.

In this paper, we concentrate on two analytic modeling approaches for supporting system performance validation. These are used for modeling studies as illustrated in Fig. 1. First, we deal with layered queueing models (LQMs). LQMs are characterized in terms of features typical of distributed and internet systems. Performance prediction for LQMs exploits fast mean value analysis (MVA) techniques. Secondly, we consider stochastic process algebras (SPAs). SPAs offer more general support for abstraction. They rely on continuous time Markov chains (CTMCs) for performance prediction and can be used for the detailed study of specific subsystems. After presenting basic ideas, principles and examples for both, we compare them with respect to their flexibility in terms of modeling abstraction and detail, and the task of building models of systems.

Section 2 of this paper describes LQMs, various predictive modeling tools and issues specific to LQMs, and a set of related model building tools that rely on system measurements to automate the construction of LQMs. Similarly, Section 3 describes SPAs. Section 4 compares the two approaches. Summary and concluding remarks are given in Section 5.


2. Layered queueing models

LQMs were developed [1,2] to study multi-process software systems that share hardware resources such as processors, disks and network elements. They are extended queueing network models (QNMs) [3] that take into account both demands at hardware resources and visits and queueing between software components. Like QNMs, they operate at a high level of abstraction and are suitable for comparing system designs and for capacity planning. These models are best suited to systems with relatively large numbers of clients, for example, transaction processing and internet systems.

2.1. Describing systems

Fig. 2 gives an example of an LQM. Each process provides a set of services (methods of objects). The services interact to accomplish the application’s goals. The layering of processes, and the possibility of queueing when requiring access to devices as well as to services in other processes, gives the model its name. Devices in LQMs include queueing centers such as processors, disks and network elements. Software servers can be single or multi-threaded.

There are several analytic performance evaluation techniques for LQMs [1,2,4]. [1,2] are aimed towards large systems and have parameters as described in this paper. [4] is aimed towards smaller systems with a more detailed characterization of model parameters. The solution techniques for LQMs are all based on MVA [5]. In general, the layered models are partitioned into a sequence of two-level submodels that are solved using variants of MVA. The results of one submodel affect the input parameters of the others. The techniques differ in the way they integrate submodel results, their extensions to MVA, and in the features of systems that are described.

Fig. 2. A layered queueing model.
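As a concrete illustration of the MVA step mentioned above, the following is a minimal sketch of exact MVA for a closed, single-class network of queueing centers with delay (think time) clients. The two-center configuration and its demand values are assumptions invented for the example, not parameters taken from the paper.

    #include <stdio.h>

    #define K 2  /* number of queueing centers (here: CPU and disk) */

    int main(void) {
        double D[K] = {0.005, 0.030}; /* service demands in seconds (assumed) */
        double Z = 1.0;               /* client think time in seconds */
        int N = 50;                   /* client population */
        double q[K] = {0.0, 0.0};     /* mean queue lengths, empty at n = 0 */
        double R = 0.0, X = 0.0;

        for (int n = 1; n <= N; n++) {
            double r[K];
            R = 0.0;
            for (int k = 0; k < K; k++) {
                r[k] = D[k] * (1.0 + q[k]); /* arrival theorem */
                R += r[k];                  /* system response time R(n) */
            }
            X = n / (R + Z);                /* throughput X(n) = n / (R + Z) */
            for (int k = 0; k < K; k++)
                q[k] = X * r[k];            /* Little's law at each center */
        }
        printf("X = %.2f req/s, R = %.3f s\n", X, R);
        return 0;
    }

LQM solvers apply computations of this kind repeatedly: the layered model is split into two-level submodels, each is solved with such an MVA variant, and the results of one submodel feed the parameters of the others.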


2.2. Tool support

There are three tools for describing and solving LQMs. The layered queueing network solver [6] integrates the research efforts of several researchers and extends them further. The tool is aimed towards medium-sized systems having many classes of clients, but only a single customer per class. [4] describes a tool for LQMs with models that have Coxian phase-type descriptions for software visit patterns. This goes well beyond the detail of the basic QNM parameter descriptions for phases typical of [1,2]. Their tool deals with detailed interactions between asynchronous processes in a very accurate manner and is targeted towards smaller systems with small numbers of concurrent customers where these interactions may dominate performance. The method of layers (MOL) tool [2,6] is aimed at very large systems with many classes of customers having many clients per class. It has special support for studying the scalability of distributed application systems [7].

This paper describes the MOL tool as an example of an LQM modeling package. The purpose is to illustrate support for system performance validation based on LQMs. Later, we focus on tools and techniques that automate model building for the MOL tool. These results could apply equally to other LQM tools.

The MOL tool characterizes systems and experiments in the following way. It presents:
• A hierarchical system configuration description that describes:
  ◦ objects and their methods, and operating system processes;
  ◦ nodes and their number of processors and disks;
  ◦ the networking infrastructure that connects the nodes.
• A simple language to describe software interactions and use of resources. It expresses:
  ◦ named activities that have resource demands on processors and disks;
  ◦ UML-like sequences of activities (with loops, conditionals and barrier synchronization) and RPC interactions (synchronous, asynchronous and forwarding) between the methods of objects;
  ◦ the makeup of client sessions (job classes), which are based on a mix of the sequences.
• Support for experimental design that includes:
  ◦ the placement of objects within processes;
  ◦ the selection of process threading levels;
  ◦ resource demands of activities (processing, disk, think time and delay);
  ◦ workload mix and intensity;
  ◦ network configuration;
  ◦ for scalability analysis: the replication of processes, nodes, and/or higher level packages.

The MOL tool permits an analyst to describe application level objects, their methods and coarse level activities that are associated with resource demands. Sequences describe the paths through these activities. The objects are then allocated to processes, processes to nodes, and nodes to networks. Processes, nodes and networks of nodes can be included in hierarchically organized packages that can be replicated. By varying the number of replicates of each package an analyst can represent more users or more system resources. In Fig. 2, this could correspond to varying the number of client nodes or web server nodes (or other kinds of packages). System design alternatives can be considered by varying other aspects of the model description. For example, the business logic object could be deployed on the web server node. By changing a sequence or the resource demands of activities within a sequence, a performance engineer can reflect the impact of different software implementations or of different hardware characteristics. Because the tool is intended to support distributed and internet systems, its abstraction is quite fixed. Models characterize: hierarchical packages, processors, disks, network elements, processes, objects, methods, sequences and activities.
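As a concrete, hypothetical illustration of these concepts, and of the demand aggregation that the tool's front end performs (described next), the sketch below renders one service and its activities as C data and sums their hardware demands. All names and numbers are invented for the example; this is not the MOL tool's actual input format.

    #include <stdio.h>

    /* One named activity with its visit count and device demands (assumed). */
    typedef struct {
        const char *name;
        double visits;   /* average visits per service invocation */
        double cpu_ms;   /* CPU demand per visit, in milliseconds */
        double disk_ms;  /* disk demand per visit, in milliseconds */
    } Activity;

    int main(void) {
        /* Activities of a hypothetical "BusinessLogic/checkout" method. */
        Activity acts[] = {
            {"parse_request", 1.0, 0.4,  0.0},
            {"read_customer", 2.0, 0.3,  8.0},
            {"write_order",   1.0, 0.6, 12.0},
        };
        double cpu = 0.0, disk = 0.0;
        for (int i = 0; i < 3; i++) {   /* aggregate per-invocation demand */
            cpu  += acts[i].visits * acts[i].cpu_ms;
            disk += acts[i].visits * acts[i].disk_ms;
        }
        printf("checkout: %.1f ms CPU, %.1f ms disk per invocation\n",
               cpu, disk);
        return 0;
    }

Varying such visit counts and demands, or moving a service to another process or node, is how alternative designs and hardware are explored with the tool.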


The MOL tool’s front end performs a preprocessing step to deduce the average number of visits between services and the average service demands at hardware resources, including processors, disks and networks, as well as at software resources. In this way the system’s software description characterizes both software interactions and load on hardware resources. These values are then used as inputs to the MVA-based performance evaluation engine.

The MOL tool’s features present the performance engineer’s degrees of modeling freedom. Other LQM-based modeling packages have different but comparable support for abstraction. SPAs, as described in Section 3, are far more general in their support for expressing abstraction. The next subsection explains how the level of detail within MOL tool models can be controlled.

2.3. Building models

To build an LQM, it is necessary to identify the queueing and synchronization points that affect performance most and to characterize the resource demands between them. For transaction processing and internet systems the key queueing points typically include hardware resources such as processors, disks and network elements, and also the operating system processes that act as software servers. Examples include HTTP servers, application logic servers and databases. Synchronization points most often relate to transactional commit protocols. Application-specific queueing (for example, queues that are managed by application logic) and synchronization points must be introduced to a model when they are expected to have a significant impact on system behavior.

Though the MOL tool permits an analyst to reflect some aspects of synchronization behavior in models, the SPA techniques described in Section 3 are much more expressive. Because the performance evaluation method for SPAs is based on CTMCs (as opposed to MVA), they are also better able to predict the impact of more general synchronization behavior. Similarly, for some studies an MVA-based model for the underlying hardware may not be adequate to assure system performance. A different abstraction or more detail for the model may be required. Techniques such as SPAs should be considered for these specific subsystems. Ideally LQM, SPA and other tool kits should be able to exchange information in an iterative manner so that the most appropriate tool can be used for each modeling task.

Model building for the MOL tool has the following steps:

• identify the processes and services (object methods) that contribute to the application;
• identify the client sessions (client sequences and activities) that make use of the services;
• for each sequence, estimate the resource demands of its activities within services;
• associate objects with processes, and processes with node and network elements.

The notion of service (object method) within the MOL tool is quite general and addresses the issue of model detail. A method in a model can correspond directly to a method as encoded in a programming language. Alternatively, it could reflect a lower or higher level of detail. For example, a method may behave differently depending on how it is used. In this case, we may choose to represent a single method encoded in a programming language as several methods in a corresponding model, one for each behavior. This increases the level of detail in the model. Detail can be suppressed by representing more complex software processing (within the same operating system process) with a single method in the model.

We now consider several tools that support the automated building of LQMs for the MOL tool [8,9]. Automated model building enables the creation of models that more precisely reflect certain aspects of system structure. Such techniques decrease the likelihood of error when creating and maintaining models, enable


timely support for greater detail, and permit performance engineers to focus on interpreting results and providing feedback to system developers.

The Angio tracing tool [8] supports the construction of models very early in a system’s design stage and relies on an instrumented design emulation environment. The emulation environment’s runtime system is instrumented to output a trace that identifies the sequence of service names caused by each request and the causal relationship between successive services. In the emulation environment, interactions are by asynchronous message. A Prolog-based pattern matching technique is used to decide whether sequences of interactions are synchronous, asynchronous or forwarding in nature. In this way, processing a trace gives the structure of a performance model. Resource demand estimates for the activities of services are provided by analysts and stored in a database. These are summed up as the trace is processed to give a service’s aggregate hardware demands. A major advantage of this model building approach is that as a design description evolves, new models can easily be regenerated. Models can always be kept in step with the design. Note that if the nature of each interaction is recorded in the trace then the pattern matching stage is not necessary. More recently, Hrischuk [10] has applied more robust graph grammar methods to identify the nature of interactions based on asynchronous messages.

For distributed application systems, middleware and operating system overheads can have a significant impact on performance. It can be difficult to accurately estimate such overheads. A performance prototype can be used to support the measurement of overheads so that they can be reflected in models. A performance prototype is an application that expresses critical parts of a design or architecture and attempts to exploit the application’s key technologies in the way they will be used by the final system. Performance prototypes support measurement studies as illustrated in Fig. 1. Performance measurements from the prototype can provide structural and resource demand information for corresponding predictive models. The predictive models can then be enhanced to express features of the system under development that are not possible to implement in prototype form.

Performance measurements from a prototype can be deduced from traces [8,11] or from periodic reports from performance monitors. Most tracing tools impose a relatively high overhead on the system being measured. They generate, and store or forward, information for every measured event. In general, this monitoring perturbs the system. Tracing tools are most appropriate for learning about software interactions, in particular about unexpected visits between system software components. For resource demands, statistical confidence is required. This suggests long measurement intervals and the need for lower overheads. Commercial operating system monitors are better suited for measuring mean resource demands than tracing techniques. However, the demands must be correlated with application instrumentation to help complete a model. Many application instrumentation systems have been proposed that integrate application instrumentation with information from operating system monitors. An approach should be chosen that can be used in heterogeneous environments and that supports our need to build models throughout a system’s lifetime.
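Returning briefly to trace processing: the sketch below shows one simple heuristic, invented for illustration and not the Angio tool's Prolog-based algorithm, for classifying an interaction from a trace of causally ordered events. The record layout and event names are assumptions made for the sketch.

    #include <stdio.h>
    #include <string.h>

    /* One trace record: which process emitted which event, and toward whom. */
    typedef struct { const char *proc, *event, *peer; } Rec;

    /* Classify the interaction opened by the request at index req:
       synchronous if the caller stays silent until a reply addressed to it,
       forwarding if that reply comes from a third party, and asynchronous
       if the caller emits further events before any reply arrives. */
    static const char *classify(const Rec *t, int n, int req) {
        const char *caller = t[req].proc, *callee = t[req].peer;
        for (int i = req + 1; i < n; i++) {
            if (!strcmp(t[i].event, "reply") && !strcmp(t[i].peer, caller))
                return strcmp(t[i].proc, callee) ? "forwarding"
                                                 : "synchronous";
            if (!strcmp(t[i].proc, caller))
                return "asynchronous"; /* caller proceeded without the reply */
        }
        return "asynchronous";         /* no reply ever observed */
    }

    int main(void) {
        Rec trace[] = {
            {"client",    "request", "webserver"},
            {"webserver", "request", "database"}, /* web server forwards */
            {"database",  "reply",   "client"},   /* third party replies */
        };
        printf("%s\n", classify(trace, 3, 0));    /* prints: forwarding */
        return 0;
    }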
The application response measurement (ARM) application programming interface (API) [12] was proposed by Hewlett-Packard and IBM/Tivoli as a vendor-neutral distributed application instrumentation API. It is now a standard of the Open Group, with its development managed by the “Distributed Application Performance” subgroup formed within the Desktop Management Task Force (DMTF). Many vendors now provide monitoring infrastructures that are compatible with the ARM API. Data collected by multiple vendors can be coalesced to produce a single view or model of a heterogeneous system.


The ARM 2.0 API is oriented around application names, transaction names and handles. An application name typically identifies an operating system process; a transaction name is associated with a service within a process; and a handle identifies an individual execution of a transaction. To reduce monitoring overhead, instead of reporting every measure to a performance database, similar measures (for example, a series of a specific transaction’s response times) are mapped onto a common performance object. Each performance object captures count, mean and other distributional measures. The aggregated performance information is then periodically reported to a performance database, for example, every 5 min. This greatly reduces the monitoring overhead. ARM 2.0 also introduces the concept of a correlator. The correlator provides a way to characterize causal interactions (visits) between services across application tiers. Essentially, the correlator must be passed with each message. The correlator contains information that permits tools to correctly interpret monitoring data in heterogeneous environments with many vendors.

[9] describes an ARM 2.0 prototype. The prototype implements the ARM API. The prototype’s runtime environment is a library that is linked to application processes. It receives start and stop ARM transaction API calls from within applications and manages performance objects. The runtime periodically reports the resulting performance object data to a performance database. A model builder uses the information to automatically generate an LQM for the MOL tool. The automatically generated model currently includes the following information:
• structural information that identifies processes, sequences, services, and the use of services by sequences;
• per-service resource demand information as captured from the host operating system.

This provides the information needed to create LQMs similar to that of Fig. 2. The prototype has extended ARM 2.0 in several important ways [9,13] to better support model building exercises. Three key additions are:
• Poly-abstract monitoring, i.e. models can be built with different levels of detail. For example, each process in the model could have one service or many. The fewer the services, the less detail about the process and the lower the monitoring overhead. This is implemented by supporting hierarchical ARM transaction names, for example, object1/method1 and object2/method2. The hierarchy can be used to control the number of services, and the corresponding level of detail, that appears in the model. For example, a model could include a single service that is an aggregate of method1 and method2, or it can have two services that represent method1 and method2 separately.
• Work flows; these identify sequences for the MOL tool. The sequences span operating system process boundaries.
• Multi-tier monitoring to distinguish user behavior across multiple tiers in a distributed application system.

These features, combined with the periodic reporting of performance information, provide for a low overhead automated model building solution. The disadvantage is that an application’s performance prototype must be instrumented to include the ARM API calls and be linked with the ARM runtime library. We note that if such instrumentation is included in the final system then it is always possible to regenerate models throughout the system’s lifetime. The models can be used for capacity planning and to support future design exercises.
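As a rough sketch of what this instrumentation looks like in application code, the fragment below uses the ARM 2.0 C calls (arm_init, arm_getid, arm_start, arm_stop, arm_end). The application and transaction names, including the hierarchical object/method form used for poly-abstract monitoring, are hypothetical; error handling is omitted; and the integer handle types are abbreviated, the exact typedefs coming from the vendor's arm.h.

    #include <arm.h>  /* ARM 2.0 header, supplied with a vendor's agent */

    void do_checkout(void) { /* body of the instrumented service (stub) */ }

    int main(void) {
        /* Register the process and one hierarchically named transaction;
           "WebStore" and "BusinessLogic/checkout" are hypothetical names. */
        long appl = arm_init("WebStore", "*", 0, NULL, 0);
        long tran = arm_getid(appl, "BusinessLogic/checkout", NULL,
                              0, NULL, 0);

        /* One execution: the runtime accumulates the response time into a
           performance object and reports it periodically, not per call. */
        long handle = arm_start(tran, 0, NULL, 0);
        do_checkout();
        arm_stop(handle, ARM_GOOD, 0, NULL, 0); /* ARM_GOOD: success status */

        arm_end(appl, 0, NULL, 0);
        return 0;
    }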


Fig. 3. An architecture for a distributed management system.

2.4. Experiences

In [7], a model was developed to study the design of a next generation distributed system management infrastructure. The purpose of the study was to validate the scalability of the proposed design and to gain insight into its potential capacity. The architecture for the system is shown in Fig. 3. The system includes several management domains connected by a wide area network (WAN), each of which has three kinds of nodes connected by a local area network (LAN) that support various kinds of processes. The nodes represent management stations, managed systems and management data repositories. Management requests make use of the processes within and between domains. Each of the nodes had its own processors and disks. Communications took place over the LANs and WANs.

A simulation model was developed to support system validation. The simulation model reflected queueing resources similar to those of LQMs but had more detail. In particular, it used more general parameter distributions and also expressed some detailed data dependent application behavior. The system under study had a significant amount of asynchronous messaging and some join behavior. A corresponding LQM for the MOL tool was then created by hand, based on the simulation model, to determine whether the LQM was as useful as the simulation model for supporting system validation.

The resulting LQM identified the key parameters that affected performance most in the same way as the simulator. Both could identify a latency issue that was caused by frequent RPCs for small amounts of information. The problem in the system design was overcome by making fewer RPC requests that returned more data. Mean response time estimates for key transactions were within 15% of simulated values. The analytic model provided much more timely performance estimates, with estimates taking a few seconds per model. Simulations were only performed for small system configurations; large systems could quickly be assessed by varying the replicates of nodes within the analytic model. In this way LQMs proved to be a useful tool for system performance validation.

A major problem with the by-hand model building approach was keeping the LQM in step with the rapidly evolving simulation model. Ensuring the simulation model remained representative of the system design was a similar but greater challenge. This motivated the development of a monitoring API that could be used to instrument prototypes and systems to support automatic model building. That API was then adapted to the ARM 2.0 API and later extended to provide more robust model generation capabilities [9,13]. Other examples of case study work and applications of LQMs are described in [14–27,49].

Next, we consider process algebras and their support for system performance validation.


3. Process algebras

Classical process algebras, among them CCS [28], CSP [29] and LOTOS [30], were designed as formal description techniques for concurrent systems. Therefore, they are well suited to describing reactive systems, such as operating systems, automation systems, hierarchies of communication protocols, etc. From the very beginning, the basic idea of process algebras was to systematically construct complex systems from smaller building blocks and to check formally whether systems behave equivalently or not. The behavior of each building block, whether hardware, software or a combination of both, is described as a process which may communicate with other processes. Standard operators allow various kinds of process composition, synchronization and communication. Therefore, software processes can be mapped onto other, more elementary software processes (this may be done repeatedly) and finally mapped onto and executed by computer or communication hardware. Such systems can be combined again to build a network, etc. Since such network systems are very complex, another important operator allows one to abstract from internal details at any level of system description.

Unique features of process algebras are their carefully introduced formal semantics and several notions of equivalence, or other relations, between an abstract specification and its implementation. An algebraic framework eases the handling and comparison of specifications by means of equational laws. These central ingredients are graphically summarized in Fig. 4 and explained next (cf. Section 3.1). Classical process algebras dealt exclusively with functional aspects, while features for performance evaluation have been added during the last decade (cf. Section 3.2). We also briefly mention these important extensions.

3.1. Describing systems and equivalent behavior

An effective way to accurately describe the behavior of modern computer and communication systems is to use discrete state transition systems, usually called state machines. Such a representation is usually given by a list of states and a list of transitions between the states. States describe the totality of features characterizing the behavior and properties of a system at one time instant. The transitions represent events such as the arrival of a message or the occurrence of an interrupt. Process algebras are based on this scheme. However, instead of describing the state transition system directly, a two-step methodology is used (cf. Fig. 4):

Fig. 4. Basic concept of process algebras. In the case of SPAs, temporal information is added at each level, and the analysis includes performance measures as well as mixed properties.


• The system is described with a high-level language which is quite user-friendly and design-oriented (our input language is an enhanced version of BASIC LOTOS — Language of Temporal Ordering Specification — the core language of ISO standard 8807).
• A rule system, called the formal semantics, allows the language expressions to be translated automatically into the states and transitions of a labeled transition system.

Two systems or system components are considered equivalent if their transition systems show the same (functional) behavior. There are several possibilities to define this formally: trace equivalence means that all possible sequences of actions are identical. Bisimulation is also popular; it means that both systems (or components) simulate each other in any situation. Having defined equivalent behavior, equational laws — deduced by axiomatisation — reflect these equivalences on the system description level. Comparison of two systems or system components is therefore possible on both levels, and for this reason one speaks of process algebras.

3.2. Stochastic process algebras

The main motivation behind the development of SPAs has been to accurately describe and investigate the behavior of resource-sharing systems. To achieve this goal, temporal information is attached to process descriptions in the form of (continuous time) random variables. These random variables allow one to represent time instants as well as the durations of activities. Beyond this, the concept of SPAs follows the lines of classical process algebras. As before — cf. Fig. 4 — the main ingredients are a formal mapping from the system description to a transition system and substantive notions of equivalence. Equational laws reflect these equivalences on the system description level. Rather than considering only the functional behavior, we add stochastic timing information. This additional information in the semantic model allows the evaluation of various system aspects:
• functional behavior (e.g. liveness or deadlocks);
• temporal behavior (e.g. throughput, waiting times, reliability);
• combined properties (e.g. probability of timeout, duration of certain event sequences).

The stochastic process associated with every specification is the source for the derivation of performance results. Its characteristics clearly depend on the class of random distributions that are incorporated in the system description. Several attempts have been made to incorporate generally distributed random variables in the model [31–34]. However, the general approach suffers from the lack of efficient analysis techniques as well as of general algebraic laws. Therefore, usually exponential or phase-type distributions are embedded into the basic functional system description. Some simple examples of such process algebra descriptions with embedded exponential phases are shown next.
• The sequential arrival of three different jobs is specified by a process Jobstream describing explicitly each arrival point before halting: Jobstream := (job1, λ1).(job2, λ2).(job3, λ3).Stop.
• Consequently, a Poisson arrival process is defined by an infinite sequence of incoming requests (in, λ).(in, λ).(in, λ). . . , which can be formulated recursively: Poisson := (in, λ).Poisson.


• A service process consisting of an Erlangian distribution of order 2 is given by Erl2 := (end1, µ).(end2, µ).Stop.

Both a precise and a concise description of many service or arrival processes is possible; this is illustrated by a so-called train process, which is important for modeling file transfers in local area networks. The overlap and interleaving of different ‘trains’ is captured by the parallel operator (|||): Train := (lok, µ).{((wag1, µ).(wag2, µ) · · · (wagn, µ).Stop) ||| Train}.

Mapping software components onto other software modules or onto hardware is a very important modeling step. Let us consider a very simple example:
• There is a software job consisting of two independent tasks; their different functionality is expressed by different exponential times: Job := start.(task1, λ1).Stop ||| start.(task2, λ2).Stop, where ||| again indicates the parallel operator without synchronization. We also have off-the-shelf processors, the first of which (Proc1) has unit speed, while processor Procx is x times as fast: Procx := start.(taski, x).Procx.
• Mapping the job onto a specific processor is given by the parallel composition of both, taking into consideration that they have to synchronize on the events start and taski (the |[..]| operator). Since we are not interested in “seeing” the timeless start signal, we may hide it. Then, the behavior of the complete hardware/software system is given by Systemx := hide start in (Job |[start, taski]| Procx), where the behavior of Job and Procx is given by the above descriptions. Of course, arbitrary task and processor behaviors can be modeled.

From such a modular system description, we automatically generate the labeled transition system; it contains all functional and temporal information. By hiding its functional information we directly derive the underlying CTMC. The state-space explosion problem associated with the CTMC can be reduced significantly by compositional model generation and reduction. This is a very important feature, which is demonstrated with an example in Section 3.4.2. In effect, a performance engineer chooses the abstraction and level of detail to be included in process specifications in such a way as to increase the sizes of models that can be considered. Another promising approach uses functionally correct and temporally approximate decomposition and reduction techniques; this permits the analysis of specific system models with a very large number of states [38,39]. In each case, the performance engineer exploits modeling abstraction not only to characterize the system but also to influence the efficiency of computing measures from the CTMC. This differs from LQMs, where the modeling abstraction is fixed and already chosen for solution efficiency.
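To illustrate the final numerical step, the sketch below computes the steady-state distribution of a small CTMC by power iteration on the uniformized chain, one of the standard stationary solution methods. The three-state generator matrix is an arbitrary toy example, not the chain derived from Systemx above.

    #include <stdio.h>

    #define S 3  /* number of states in the toy chain */

    int main(void) {
        /* Generator matrix Q (rates invented for the example); rows sum
           to zero, Q[i][j] is the transition rate from state i to j. */
        double Q[S][S] = {
            {-2.0,  2.0,  0.0},
            { 0.0, -3.0,  3.0},
            { 4.0,  0.0, -4.0},
        };
        double unif = 5.0;  /* uniformization rate >= max |Q[i][i]| */
        double pi[S] = {1.0, 0.0, 0.0}, next[S];

        /* Iterate pi <- pi * P with P = I + Q/unif until convergence. */
        for (int it = 0; it < 10000; it++) {
            for (int j = 0; j < S; j++) {
                next[j] = pi[j];
                for (int i = 0; i < S; i++)
                    next[j] += pi[i] * Q[i][j] / unif;
            }
            for (int j = 0; j < S; j++)
                pi[j] = next[j];
        }
        for (int j = 0; j < S; j++)
            printf("pi[%d] = %.4f\n", j, pi[j]);
        return 0;
    }

Performance measures such as throughput or waiting times are then obtained as reward-weighted sums over these state probabilities; the TIPPtool's numerical solution modules serve this purpose.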


3.3. Tool support

At the moment there are three tools available: the PEPA Workbench [35], Two Towers [36] and our TIPPtool [37]. The TIPPtool is a prototype modeling tool which contains most of the specification and evaluation features of today’s SPAs. Its main characteristics are:
• a LOTOS oriented input language as well as a graphical notation based on extended finite state machines;
• an implementation of the structured operational semantics;
• investigation of functional properties by reachability analysis;
• analysis of temporal properties with various numerical solution modules for the transient as well as stationary analysis of CTMCs;
• user guided exploitation of symmetries and exact or approximate compositional reduction techniques;
• computation of standard performance and reliability measures.

The tool is implemented in Standard ML and C and has an elaborate graphical interface. Medium size problems of up to 100K states can be solved easily; very complex application problems with a certain structure may be solved by the approximate decomposition techniques [38,39] or via a transformation to stochastic graphs [32,40,41]. We also use our tool in combination with mature tools for purely functional specification, analysis and code generation.

3.4. Experiences

There are about 15 research groups dealing with SPAs. A complete theory and tools are available for models with Markovian assumptions, including immediate transitions (actions consuming no time compared to the others). Many small and medium size examples have shown the practicability of the SPA concept. Recently, researchers began to solve large non-trivial problems, e.g.:

• an adaptive mechanism for packetized audio over the internet;
• an ATM switch with explicit rate marking;
• a Manhattan style mobile communication system;
• the Erlangen hospital communication system (HCS);
• a plain-old telephone system (POTS).

While the first two examples model systems with generally or constantly distributed time intervals and evaluate them by simulation [36,50], the others embed exponential and phase-type distributions and use the PEPA Workbench [35] or the compositional model construction and analytic techniques of the TIPPtool. The following example illustrates the use of SPAs, how abstraction is used to describe systems in a compact manner, and how it improves solution efficiency.

3.4.1. Global structure of the HCS
This study is part of an ongoing performance measurement and modeling project which is being conducted at the University of Erlangen [42–44]. The hospital communication system provides a communication infrastructure that is used by medical subsystems for exchanging information such as patient data, observation results, medical images and accounting data. The system consists of a huge number of interacting subsystems, among them the hospital’s main laboratory, an observations processing system and the operations documentation system.

In a large clinic such as the Erlangen University hospital there exists a great variety of subsystems associated with different departments and institutions. For historical reasons, these decentralized information processing systems are mostly incompatible. In the past, communication between subsystems was based on proprietary one-to-one relations. Integration efforts have led to the use of standardized


Fig. 5. Base model of the Erlangen HCS.

message formats (e.g. the health level 7 message standard developed for the healthcare sector) and the deployment of a central communication server. In Erlangen, the communication server DataGate from STC is used, whose tasks are the reception, checking, processing, routing and forwarding of (standardized) messages between medical subsystems.

Among the subsystems in the Erlangen HCS, the patient management system, an SAP R/3 IS-H product, is the central business application. Besides the patient management system, there is a second large database, the communication database, which mirrors parts of the patient management system and contains additional medical information. The communication database serves as a fast data buffer which, from the point of view of the subsystems, provides data access about 10 times faster than the patient management system itself, and as a side effect also significantly reduces the load on the latter.

We present only a rudimentary model of the Erlangen hospital communication system, which during our project has been extended in various directions. Fig. 5 shows the basic structure of the model. It consists of the communication server (CS), the communication database (CDB) and two medical subsystems, the main laboratory system (MLS) and an observations processing system (OPS). Since almost all of the subsystems’ demands for data can be satisfied by the CDB, we do not model the patient management system (PMS) at this stage (therefore the PMS is drawn gray in the figure). There is an “artificial” subsystem representing an adjustable background load (BL), caused by those subsystems which are not explicitly modeled (BL actually consists of two sub-processes, a source and a sink, which communicate with CS via the actions load in and load out). The top-level specification for the TIPPtool is as follows:

    hide request, response in
      (MLS ||| OPS)
      |[request, response]|
      ( hide query, answer in
          ( hide load in, load out in
              BL |[load in, load out]| CS )
        |[query, answer]|
        CDB )


The subsystems MLS and OPS run in parallel but without synchronization (the |||-operator), while BL, CS and CDB synchronize via different events (the |[..]|-operator). Since the subsystems’ communications are not visible at the system level they are all hidden (the hide . . . in operator). Rather than presenting the complete model specification — which can be found in [44,45] — we show next the advantage of compositional modeling and some experimental results.

3.4.2. Compositional specification and analysis
Describing the overall behavior of the HCS and transforming it as a whole leads to a large state-space. This is particularly true if we model the complete HCS with some 20 clinics. Here, the unique features of process algebras reduce the state-space explosion problem considerably, especially when there are several symmetric subsystems (e.g. input stations); process algebras allow one to model and analyze the complete system model in a stepwise fashion out of smaller building blocks, the first fundamental principle of process algebras [28]. The second feature is also used systematically: process algebras allow one to precisely describe equivalent behavior, i.e. one can (using the corresponding algorithm) formally compare system descriptions and filter out equivalent states. This stepwise composition and aggregation technique is illustrated in Fig. 6 by means of a small example.

Fig. 6. Compositional aggregation applied to the HCS (comparison of states and transitions before and after aggregation).


Fig. 7. Percentage of time the main laboratory system is waiting for the communication system with varying BL.

We gradually cluster the different subsystems while eliminating equivalent states at each composition step. Then, instead of generating and analyzing the state-space of the overall model (in our example, 4951 states and 16236 transitions), the automatically generated state-space is much smaller (2294 states and 7340 transitions). The modest reduction is due to the diversity of components; it increases drastically if there are, as in the real HCS, several components of the same type. Recently, the combination of our prototype TIPPtool and CAESAR/ALDEBARAN [46,47], a very efficient tool for functional analysis, permitted the analysis of a system with more than 10^7 states in total [48]. Here, we note the relationship between the notion of replication of resources of the same type in SPAs and package replication in LQMs. In each case a system’s symmetry is exploited to simplify model description and improve solution efficiency.

We have calculated a variety of numerical results for the Erlangen HCS model. These results coincide quite nicely with throughput measurements on the actual system configuration and predict limits for various performance characteristics. For example, experiments revealed that under heavy background load the subsystem MLS spends up to 11% of the total time waiting for the CS (cf. Fig. 7). In this as in other experiments, the offered background load was increased exponentially, starting with an initial value load0 = 1 request per second, which was doubled in every step. Fig. 8 shows the proportion of time the CS is idle, depending on the offered background load. It should be noted that due to the average message processing time of 50 ms per message, a maximum of 20 messages per second can be carried by the CS, no matter how much background load is offered. Other interesting results show the influence of different buffer sizes, the effect of decreasing bit rates in the CDB due to update failures, and the merit of parallel CDB processes.
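The saturation limit quoted above follows from the mean message processing time. As a small worked check, treating the CS as a single server and ignoring all traffic other than the background load (an assumption made only for this back-of-envelope bound):

    X_max = 1 / S_CS = 1 / (50 ms) = 20 messages per second,
    U_CS  = min(1, λ · 0.05 s),

so for an offered load of λ messages per second the CS idle fraction decreases roughly as 1 − λ · 0.05 s until λ reaches 20 messages per second, beyond which the CS is saturated.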


Fig. 8. Percentage of idle time of the communication system with varying BL.

Besides performance results, functional properties such as the absence of deadlocks or loops could also be proved. All these experiments helped significantly to better understand the actual system and to extend it in an efficient way [44].

4. Comparison of techniques

This section compares the two modeling techniques with respect to modeling abstraction, detail, and their relationship with model building. There are many similarities between the techniques, but there are also fundamental differences. Their main features are summarized in Table 1.

Both tool sets use abstraction to reduce the problem of describing models and to provide for more efficient performance evaluation. Each is able to express some portion of system behavior in a manner that can be reused in larger models. LQMs represent the components of a system’s software/hardware hierarchy and their mutual use. They abstract away all functional details at the start of the performance evaluation process — in particular when moving from a software description expressed using sequences, objects and methods to a corresponding queueing model with processes, hardware and networks. The queueing model focuses directly on resource allocation and processing requirements to estimate response times and utilizations. The queueing model is partitioned into a sequence of QNMs that are analyzed using MVA to give estimates for major performance characteristics.


Table 1
Comparison of LQM and SPA techniques

Basic paradigm
  LQM: Multi-level queueing models and MVA.
  SPA: SPA models and solution of CTMCs.

Model generation
  LQM: Data from instrumented applications can be used to automatically generate the LQM; the LQM can be characterized with different levels of detail.
  SPA: Specifications (LOTOS) of the system components are directly used and enhanced by (measured) time attributes.

Depth of system study
  LQM: Fixed level of abstraction suitable for comparing system designs and for configuration/capacity planning.
  SPA: Hierarchical design including very detailed information at low levels.

Results
  LQM: Characteristic mean performance values.
  SPA: Performance values, functional properties and mixed results; formal comparison of components and systems, verification.

Accuracy
  LQM: Good.
  SPA: Good to high.

Evaluation cost
  LQM: Very low/reasonable.
  SPA: High in most cases.

Tools
  LQM: Mature prototypes.
  SPA: Prototypes.

Acceptance
  LQM: Growing.
  SPA: Still low because of the unconventional theoretical foundation.

The intent of SPAs is very similar. However, at each level of modeling both functional and temporal requirements and dependencies are considered and analyzed. It is possible to get more information, i.e. to evaluate and reason about functional, temporal and combined properties of a software/hardware system. This leads to more precise but also more complex models that have greater solution costs. Compositional analysis and reduction techniques influence a performance engineer’s selection of modeling abstractions and are applied to alleviate these problems. System specifications and implementations can be compared formally (verification).

Today’s SPAs usually assume exponential time intervals. Exact distribution functions (e.g. for a response time) cannot be transferred to the next level of abstraction; rather, they are approximated by exponential or phase-type distributions. More experience and research on SPAs with generally distributed time intervals are necessary. Similar issues exist for LQMs. [4] describes the use of phase-type distributions for propagating distribution information across layers in LQMs.

Building models for complex systems is a challenging task. Systems and their designs have many components and can change rapidly. Systems and their prototypes can be used to obtain estimates for resource demand values and for visits between software components. We note that the quality of resource demand measures is determined by operating systems and is limited. Typically, mean values are reported for CPU demands; the accuracy of CPU demand measures is affected by interrupt rates; and physical disk operations are often reported on a per process or per host basis, not on a per service basis. Furthermore, the combined effects of real world schedulers, memory management systems, and caches are complex and limit the accuracy of any resulting absolute performance predictions. For these reasons there are limits to the amount of detail that should be included in any model.

Because of the close relationship between LQMs and distributed application system middleware platforms it is possible to automatically build significant portions of LQM models [8,9,13]. An LQM’s


abstraction is fixed in terms of processes, hardware, and networks, but in [9,13] the level of detail (number of services) within processes can be controlled by the monitoring system and model builder. An analyst can maintain detail where it is needed most and suppress it in other areas. A resulting model can then be modified by hand to describe the planned system under alternative circumstances.

SPA models rely on hierarchical abstraction to help manage and overcome the state-space explosion problem. The natural and ideal way is that this hierarchy is already given by the functional specification of the system (e.g. by the LOTOS specification available for ISO protocols). Model parameters are then derived from a prototype implementation or emulation. However, currently and most often, the choice of the model hierarchy and the specification of model parameters are done by the analyst and are part of the design exercise. Research is needed to better understand the possibilities for automatically building SPA models that best exploit the SPA notion of hierarchical abstraction for both clarity and solution efficiency.

To summarize, a performance model should be sufficiently precise to help draw correct conclusions for the problems at hand. LQMs have structure and parameters that are closely matched with measures that can be obtained from application and operating system monitors. SPA models can describe the same features but with a greater performance evaluation cost. We note that SPA models describe functional, temporal and combined properties of systems and as a result can provide for a richer set of experimental design alternatives and analysis results. They offer a more direct approach for describing complex micro- and macro-level interaction patterns that, except for special case studies [27], are lacking in LQMs. However, the extra detail must be supported by a very careful characterization of model parameters. SPAs and their underlying CTMCs are most descriptive, whereas LQMs and MVA-based performance evaluation are most efficient. The choice of modeling approach therefore depends on the detail and timeliness required for the problems at hand. Research is needed to compare the accuracy of these techniques for specific problem domains and to determine how they can support one another for system performance validation.

5. Summary and conclusions

Performance evaluation has to be integrated into the design process from the very beginning. We discussed and compared two promising approaches, LQMs and SPAs.

LQMs have been studied since the mid-1980s. Recent extensions include phase-type descriptions of visit and service time information [4] and support for scalability evaluation [7]. The models are well suited for transaction processing and internet systems. Significant efforts have been targeted towards automatically building models based on measurements of prototype and production systems. Several examples of automated model building of LQMs for the MOL tool are described. Several case studies offer verified predictive results [14,15,49] and motivate the use of LQMs for system performance validation. Much work must still be done to reflect the complex effects of operating system and middleware heuristics on distributed application system behavior. In the meantime the results of these models appear adequate for assessing alternative designs and for capacity planning purposes.

SPAs are a new approach with unique features: the abstraction process for complex system modeling is supported, the state-space can be reduced automatically, and compositional modeling allows a significant


state-space reduction. Most SPAs deal with systems assuming exponential or phase-type distributions. Compact state-space representations and efficient evaluation procedures are important topics of current research. There is, however, a clear trend to also investigate general distribution functions. Then, the dream of systematic hierarchical modeling with exact interface representation between layers seems to be reachable. Integrating the fundamental ideas of LQM and SPA research and the automation of model building are promising challenges for the future.

Acknowledgements

The authors thank the anonymous reviewers for their helpful comments as well as Xiaoyun Zhu and Sujata Banerjee of HP Labs. This work was supported by grants from the Natural Sciences and Engineering Research Council of Canada, Hewlett-Packard and the German Research Society (DFG).

References

[1] C.M. Woodside, J.E. Neilson, D.C. Petriu, S. Majumdar, The stochastic rendezvous network model for performance of synchronous client–server-like distributed software, IEEE Trans. Comput. 44 (1995) 20–34.
[2] J. Rolia, K.C. Sevcik, The method of layers, IEEE Trans. Softw. Eng. 21 (8) (1995) 689–700.
[3] E.D. Lazowska, J. Zahorjan, G.S. Graham, K. Sevcik, Quantitative System Performance: Computer System Analysis Using Queueing Network Models, Prentice-Hall, Englewood Cliffs, NJ, 1984.
[4] S. Ramesh, H.G. Perros, A multi-layer client–server queueing network model with synchronous and asynchronous messages, in: Proceedings of the First International Workshop on Software and Performance (WOSP ’98), Santa Fe, NM, October 1998, pp. 107–119.
[5] M. Reiser, A queuing network analysis of computer communication networks with window flow control, IEEE Trans. Commun. 27 (1979) 1201–1209.
[6] G. Franks, A. Hubbard, S. Majumdar, D. Petriu, J. Rolia, C.M. Woodside, A toolset for performance engineering and software design of client–server systems, Perform. Evaluation 24 (1–2) (1996) 117–135.
[7] F. Sheikh, J. Rolia, P. Garg, S. Frolund, A. Shepherd, Layered modeling of large-scale distributed applications, in: Proceedings of the First World Congress on Systems Simulation, Quality of Service Modeling, Singapore, September 1–3, 1997, pp. 247–254.
[8] C. Hrischuk, C.M. Woodside, J. Rolia, R. Iversen, Trace-based load characterization for generating software performance models, IEEE Trans. Softw. Eng. 25 (1) (1999) 122–135.
[9] F. El-Rayes, J. Rolia, R. Friedrich, The performance impact of workload characterization for distributed applications using ARM, in: Proceedings of the Computer Measurement Group (CMG’98), Anaheim, CA, December 1998, pp. 821–830.
[10] C. Hrischuk, Trace-based load characterization for the automated development of software performance models, Ph.D. Thesis, Carleton University, Canada, 1998.
[11] Microsoft Visual Studio Analyzer. http://www.microsoft.com/Mind/0699/analyzer/analyzer.htm.
[12] CMG ARM Homepage. http://www.cmg.org/regions/cmgarmw/index.html.
[13] F. El-Rayes, Design and implementation of an ARM 2.0 compatible performance monitoring and model building framework, M.Eng. Thesis, Carleton University, Canada, May 1999.
[14] D. Krishnamurthy, Performance characterization of web-based shopping systems, M.Eng. Thesis, Department of Systems and Computer Engineering, Carleton University, Canada, September 1998.
[15] J. Dilley, R. Friedrich, T. Jin, J. Rolia, Web server performance measurement and modeling techniques, Perform. Evaluation 33 (1998) 5–26.


[16] M. Litoiu, D. Krishnamurthy, J. Rolia, Performance testing for distributed object applications, in: Web Proceedings of the ICSE Workshop on Testing Distributed Component-based Systems, Los Angeles, CA, May 17, 1999. http://www.siemens.com/ICSE99workshop/prog.html.
[17] D. Krishnamurthy, M. Litoiu, J. Rolia, Performance stress conditions and capacity planning for E-business applications, in: Proceedings of the International Symposium on Electronic Commerce, Beijing, PR China, May 17–20, 1999.
[18] R. Friedrich, J. Rolia, Applying performance engineering to a distributed application monitoring system, in: A. Schill, C. Mittasch, O. Spaniol, C. Popien (Eds.), Distributed Platforms, Chapman & Hall, New York, 1996, pp. 258–271.
[19] H. El-Sayed, D. Cameron, C.M. Woodside, Automated performance modeling from scenarios and SDL designs of telecom systems, in: Proceedings of the International Symposium on Software Engineering for Parallel and Distributed Systems (PDSE’98), Kyoto, April 1998.
[20] F. Sheikh, C.M. Woodside, Sensitivity analysis of performance predictions of distributed application models, in: Proceedings of the Second International Symposium on Sensitivity Analysis of Model Output, Venice, April 1998.
[21] F. Sheikh, C.M. Woodside, Layered analytic performance modeling of a distributed database system, in: Proceedings of the 1997 International Conference on Distributed Computing Systems, May 1997, pp. 482–490.
[22] P.P. Jogalekar, G. Boersma, R. MacGillivray, C.M. Woodside, TINA architectures and performance: a telepresence case study, in: Proceedings of TINA’95, Melbourne, Australia, February 1995.
[23] D.C. Petriu, C.M. Woodside, Approximate MVA from Markov model of software client/server systems, in: Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing, Dallas, TX, December 1–5, 1991, pp. 322–329.
[24] D.C. Petriu, X. Wang, Deriving software performance models from architectural patterns by graph transformations, in: Proceedings of the Sixth International Workshop on Theory and Applications of Graph Transformations (TAGT’98), Paderborn, Germany, November 1998, pp. 340–347.
[25] C. Shousha, D.C. Petriu, A. Jalnapurkar, K. Ngo, Applying performance modeling to a telecommunication system, in: Proceedings of the First International Workshop on Software and Performance (WOSP’98), Santa Fe, NM, October 1998, pp. 1–6.
[26] G. Franks, C.M. Woodside, Performance of multi-level client–server systems with parallel service operations, in: Proceedings of the First International Workshop on Software and Performance (WOSP’98), Santa Fe, NM, October 1998, pp. 120–130.
[27] D.C. Petriu, S. Majumdar, J. Lin, C. Hrischuk, Analytic performance estimation of client–server systems with multi-threaded clients, in: Proceedings of the International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS’94), January 1994, pp. 96–100.
[28] R. Milner, Communication and Concurrency, Prentice-Hall, Englewood Cliffs, NJ, 1989.
[29] C.A.R. Hoare, Communicating Sequential Processes, Prentice-Hall, Englewood Cliffs, NJ, 1985.
[30] T. Bolognesi, E. Brinksma, Introduction to the ISO specification language LOTOS, Comput. Networks ISDN Syst. 14 (1987) 25–59.
[31] N. Götz, U. Herzog, M. Rettelbach, TIPP — Einführung in die Leistungsbewertung von verteilten Systemen mit Hilfe von Prozessalgebren, in: Verteilte Systeme, Grundlagen und Zukünftige Entwicklungen aus der Sicht des SFB 182, BI-Wissenschaftsverlag, 1993.
[32] U. Herzog, A concept for graph-based stochastic process algebras, generally distributed activity times and hierarchical modeling, in: M. Ribaudo (Ed.), Proceedings of the Fourth Workshop on Process Algebras and Performance Modeling, Università di Torino, CLUT, 1996.
[33] E. Brinksma, J.P. Katoen, R. Langerak, D. Latella, Partial order models for quantitative extensions of LOTOS, Comput. Networks ISDN Syst. 30 (1998) 925–950.
[34] C. Priami, Stochastic π-calculus with general distributions, in: M. Ribaudo (Ed.), Proceedings of the Fourth Workshop on Process Algebras and Performance Modeling, Università di Torino, CLUT, 1996.
[35] S. Gilmore, J. Hillston, The PEPA workbench: a tool to support a process algebra-based approach to performance modeling, in: G. Haring, G. Kotsis (Eds.), Proceedings of the Seventh International Conference on Modeling Techniques and Tools for Computer Performance Evaluation, Wien, 1994.
[36] M. Bernardo, W.R. Cleaveland, S.T. Sims, W.J. Stewart, Two towers: a tool integrating functional and performance analysis of concurrent systems, in: Proceedings of FORTE/PSTV ’98, Paris, 1998.
[37] H. Hermanns, V. Mertsiotakis, M. Rettelbach, A construction and analysis tool based on the stochastic process algebra TIPP, in: Proceedings of the Second International Workshop on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’96), Lecture Notes in Computer Science, Vol. 1055, Springer, Berlin, 1996.
[38] J. Hillston, V. Mertsiotakis, A simple time scale decomposition technique for stochastic process algebras, in: S. Gilmore, J. Hillston (Eds.), Proceedings of the Third Workshop on Process Algebras and Performance Modeling (special issue), Comput. J. 38 (7) (1995), Oxford University Press, Oxford.
[39] V. Mertsiotakis, M. Silva, Throughput approximation of decision-free processes using decomposition, in: Proceedings of the Seventh International Workshop on Petri Nets and Performance Models, St. Malo, IEEE Computer Society Press, June 1997.
[40] P. Dauphin, F. Hartleb, M. Kienow, V. Mertsiotakis, A. Quick, PEPP: Performance Evaluation of Parallel Programs — User’s Guide, Version 3.3, Technical Report 17, Universität Erlangen, IMMD7, 1993.
[41] K. Marzbani, Hierarchische Beschreibung und Analyse von Kommunikationssystemen mittels graphbasierter Prozessalgebren, Master’s Thesis, Universität Erlangen, IMMD7, 1997.
[42] M. Siegle, B. Wentz, A. Klingler, M. Simon, Neue Ansätze zur Planung von Klinikkommunikationssystemen mittels stochastischer Leistungsmodellierung, in: R. Muche, G. Büchele, D. Harder, W. Gaus (Eds.), 42. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), MMV Medien & Medizin Verlag, Ulm, September 1997, pp. 188–192.
[43] M. Siegle, D. Kraska, M. Simon, B. Wentz, Analyse des Erlanger Klinikkommunikationssystems mit Hilfe von Leistungsmessungen, in: E. Greiser, M. Wischnewsky (Eds.), 43. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), MMV Medien & Medizin Verlag, Bremen, September 1998, CD-ROM C24.
[44] B. Aures, Modellierung des Erlanger Klinikkommunikationssystems mit Hilfe von stochastischen Prozessalgebren und TIPPtool, Studienarbeit, Universität Erlangen-Nürnberg, IMMD7, October 1998.
[45] H. Hermanns, U. Herzog, U. Klehmet, V. Mertsiotakis, M. Siegle, Compositional performance modeling with the TIPPtool (extended version of the TOOLS’98 paper, Lecture Notes in Computer Science, Vol. 1469), Perform. Evaluation 39 (2000) 5–35.
[46] H. Garavel, OPEN/CAESAR: an open software architecture for verification, simulation and testing, in: B. Steffen (Ed.), Tools and Algorithms for the Construction and Analysis of Systems, Lecture Notes in Computer Science, Vol. 1384, Springer, Berlin, 1998, pp. 68–84.
[47] M. Bozga, J.-C. Fernandez, A. Kerbrat, L. Mounier, Protocol verification with the ALDÉBARAN toolset, Int. J. Softw. Tools Techn. Transf. 1 (1/2) (1997) 166–184.
[48] H. Hermanns, J.P. Katoen, Automated compositional Markov chain generation for a plain old telephony system, Sci. Comput. Program. 36 (1) (2000) 97–127.
[49] Y.-C. Chu, C.J. Antonelli, T. Teorey, Performance measurement of the PeopleSoft multi-tier remote computing application, Parts I and II, in: Proceedings of CMG’98, Anaheim, December 1998, pp. 1–25.
[50] A. Aldini, M. Bernardo, R. Gorrieri, An algebraic model for evaluating the performance of an ATM switch with explicit rate marking, in: Proceedings of the PAPM Workshop (PAPM’99), Zaragoza, 1999.

Ulrich Herzog received all his degrees in electrical engineering from the University of Stuttgart. In 1964, he joined the Institute for Switching Techniques and Data Processing at the University of Stuttgart, working in the area of telephone switching systems and teletraffic research. He then spent two years in the Teleprocessing System Optimization group at the IBM Thomas J. Watson Research Center. Since 1976, he has been a full Professor at the University of Erlangen-Nürnberg, and since 1981 he has held the chair in computer architecture and performance evaluation. His current research and teaching interests are the architecture and performance evaluation of computer systems and communication networks. In particular, he is involved in projects on system design methodology, the integration of process algebras and performance modeling, and rapid prototyping of real-time systems.


Jerry Rolia received his Ph.D. from the University of Toronto in 1992. He was an Associate Professor in the Department of Systems and Computer Engineering at Carleton University in Ottawa, Canada, until 2000 and is now a researcher with Hewlett-Packard Laboratories. His interests include performance evaluation and management, large-scale system architecture, programmable networking and systems, and software performance engineering. He can be contacted at his e-mail address.