Quality Assessment of a Parallel System Simulator




Parallel Computing: Fundamentals, Applications and New Directions E.H. D'Hollander, G.R. Joubert, F.J. Peters and U. Trottenberg (Editors) © 1998 Elsevier Science B.V. All rights reserved.


Remo Suppi, Emilio Luque, Joan Sorribes
Computer Architecture and Operating Systems Group, Computer Science Dept., Building C, University Autonoma of Barcelona, Bellaterra (08193), Spain. E-mail: [email protected]

We have developed a set of tools for software engineering in parallel processing as part of an EU-funded project, whose aim is to develop methodologies and tools for parallel software engineering. The main tool is an event-driven simulator that uses synthetic descriptions of a parallel programme and a parallel architecture. This work deals with model verification, data collection and simulator validation.

1. INTRODUCTION

Spectacular growth in the development of high-performance parallel (and distributed) systems has been observed over the last decade. Parallel and distributed computer systems draw their power from the possibility of executing multiple tasks co-operatively. Achieving this objective involves several factors, such as interconnection structures, technological factors, granularity, algorithms and system policies. In response to this problem, many researchers have begun to develop novel approaches towards suitable methodologies and tools for parallel programming.[1-3] In this work we use a set of tools for software engineering in parallel processing, developed as part of an EU-funded project whose aim is to develop methodologies and tools for parallel software engineering. The present project is the result of the evolution of a previous simulator (PSEE) built upon a simpler model.[4-6]

When simulation is used for parallel programme design, the life cycle is 'design of parallel programme, simulation, analysis and redesign'. This methodology gives a shorter and more flexible development cycle than working directly on the real system. Simulation provides behaviour information to the designer at earlier stages of the design process, and the user can analyse the programme behaviour interactively from the point of view of performance, estimating the influence of the different parameters involved in the design.

The most important requirement on a simulation methodology is that it yields behaviour and results close to those of the real system. To reach this goal a careful model is necessary: it must have enough detail to match the modelled system to the real system, but more detail usually implies more complexity, so a trade-off between detail and complexity must be reached.[7,8] We have developed models for parallel algorithms and architectures to support a sound performance evaluation analysis, and based on these models we have built a set of tools to simulate the performance behaviour of parallel algorithms on parallel architectures.

The main tool is an event-driven simulator that uses synthetic descriptions of a parallel programme and a parallel architecture.[9,4] This work deals with model verification, data collection and simulator validation. The first implies that the model contains enough elements of the real system to represent it at a given level of detail. The second serves to list the model parameters whose values have to be specified numerically. The objective of the third is to prove that the simulator is a correct implementation of the model and that its results stay within a limited error range of those obtained from the real system.[9]
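As a rough illustration of the event-driven approach (not the project's actual code), the following sketch shows the kind of loop such a simulator is built around: events drawn from a time-ordered queue are processed by a handler that may schedule further events. The event representation, the `handle_event` callback and the heap-based queue are assumptions made for this example.

```python
import heapq

def simulate(initial_events, handle_event, horizon=float("inf")):
    """Minimal event-driven simulation loop (illustrative sketch only).

    initial_events: iterable of (time, kind, data) tuples, e.g. the start of a
                    computation burst on a node or a message injected on an arc.
    handle_event:   callback returning the new (time, kind, data) events that
                    the processed event gives rise to.
    """
    queue, seq = [], 0
    for time, kind, data in initial_events:
        heapq.heappush(queue, (time, seq, kind, data))   # seq breaks time ties
        seq += 1
    clock = 0.0
    while queue:
        time, _, kind, data = heapq.heappop(queue)
        if time > horizon:
            break
        clock = time                                     # advance simulated time
        for new_time, new_kind, new_data in handle_event(clock, kind, data):
            heapq.heappush(queue, (new_time, seq, new_kind, new_data))
            seq += 1
    return clock
```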

2. MODEL VALIDATION & VERIFICATION METHODOLOGIES

When a simulator programme produces coherent results, the next step consists of verifying that the simulation programme is a valid implementation of the model. For small models this task can be carried out by inspection of the simulation trace files; for large programmes an analysis method is necessary. Verification requires, as a minimum, a comparative analysis of the description of the model and of the simulation programme. Validation, on the other hand, has as its objective to demonstrate that the simulation model is a faithful representation of the real system: that it reproduces the behaviour with sufficient accuracy to satisfy the objectives of the analysis.

In our case the validation objectives are centred on the requirement that the real execution of an application and its simulation must be equivalent. For this it is necessary to have the parallel or distributed architecture available in order to show the degree of similarity between both results. It is first necessary to adjust the parameters of the models, extracting accurate data for the model to simulate, which is itself a problem to consider.

The methodology applied in our case can be summarised as follows. First, the representation of the algorithm model is considered; that is, the model must have all the elements needed to represent parallel code written in a language of conventional use, in our case Parallel C and PVM as parallel and distributed languages. As a second step we assign values to the parameters characterising the algorithm, the architecture and the operating system (OS); this includes the stochastic analysis that determines the distribution functions that fit the model in each particular case. Finally, we compare the simulation results with the real execution. The objective is to validate the simulation tool and to draw conclusions about the degree of accuracy and precision that it possesses.

In order to show that the model of the parallel algorithm possesses all the elements needed to be a valid representation of code written in a high-level language, well-known applications have been selected: the Travelling Salesman algorithm and a Master-Worker algorithm. A detailed analysis and references are given in [9]. From this part we can conclude that the model has all the elements needed to model all the sentences of the code for these languages.

As mentioned previously, the design of a model involves two tasks: the development of a representation of the system (the model) and the representation of the work that the system performs. When the values of the model parameters have to be known, two types of parameters can be considered. Deterministic parameters do not change over time; this situation is manageable, since it is only necessary to obtain these parameters from the real system. Stochastic parameters, on the other hand, change as a function of time; in this case it is necessary to take as the value of the parameter its associated distribution and not its mean value.

For example, in our model the computing volume is a variable that can have either deterministic or stochastic behaviour. We have developed a tool to obtain the parameter values for the algorithm and the OS. Using the results of a set of executions, this tool can extract the variation law of the parameters that are not deterministic in this model. A detailed description of this tool can be found in [9].
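To make the deterministic/stochastic distinction concrete, the sketch below shows one simple way a parameter-extraction tool could classify a measured parameter (such as a node's computing volume) from repeated executions. The tolerance threshold and the mean/standard-deviation summary are assumptions made for this illustration, not a description of the tool in [9].

```python
import statistics

def characterise_parameter(samples, rel_tolerance=0.01):
    """Classify a measured model parameter as deterministic or stochastic.

    samples: values of one parameter (e.g. a node's computing volume)
    collected over repeated real executions.  If the relative spread is
    below the tolerance, the parameter is treated as a constant; otherwise
    the parameters of a fitted distribution (here simply the mean and the
    standard deviation) are returned.
    """
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    if mean != 0 and stdev / abs(mean) < rel_tolerance:
        return {"type": "deterministic", "value": mean}
    return {"type": "stochastic", "mean": mean, "stdev": stdev}

# Hypothetical example: computing volume of one node measured over five runs.
print(characterise_parameter([1210.0, 1190.0, 1225.0, 1205.0, 1198.0]))
```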

3. VALIDATION OF THE SIMULATION TOOL

The objective of this phase is to validate and verify the behaviour of the simulator. To validate it, a Master-Worker (MW) algorithm implemented on PVM has been used; a full validation analysis, using more complicated algorithms and architectures, is developed in [9]. The procedure is:

• Real execution of the MW algorithm in order to obtain the parameters of the application (values for the nodes and arcs: computing volume, communication volume, probabilities, arrival-departure points of communications, etc.). These parameters are obtained with the tool mentioned in the previous section.
• Simulation of this application and adjustment of the parameters.
• Comparison of the results and conclusions about the precision and predictive capacity of the simulator, as well as considerations on the adjustment of the parameters.

Figures 1 and 2 show the results of the real execution of the MW algorithm for 1, 2, 3 and 4 workers on one processor (together with the master) and for the master and workers on different processors. These measurements were taken under optimal operating conditions, that is, without network load and with minimal influence of the operating system. The graphics compare the simulation results with the real execution results, together with the error between the real measurement and the simulated one for both cases. As can be observed from these data, the results obtained by simulation are a valid representation of the real system, and we can conclude that the simulator reproduces the behaviour with sufficient fidelity to satisfy the objectives of the analysis (a more detailed analysis for different algorithms can be found in [9]).

Figure 3 shows that the greatest error is produced when the master and the workers are on different processors. To reduce this error a more accurate adjustment of the communication speed of the architecture link would be necessary: in the simulated example the connection between the processors has been considered free of external load, while on the real link (an Ethernet network) there are other messages in addition to those of the application.

As a final test, a prediction with the simulator has been carried out. It consists of an execution with 10 workers, and the prediction has been verified against the real execution of the programme. The results obtained are: total execution time by simulation = 1.535 sec; real execution time of the application on an HP712/60 = 1.412 sec (error: 8.7%). The local CPU administration algorithm accounts for the difference between the prediction and the real execution; adjusting the parameters that model the processor speed and the round-robin quantum of the CPU scheduling algorithm would reduce the error. This analysis has been carried out under optimal conditions (low network load and minimum OS influence); a complete prediction analysis to estimate the influence of the number of processors, the OS overhead and the network load is being developed, based on matrix multiplication and image convolution algorithms. As a final point of the experiment, it can be concluded that the proposed objectives have been achieved with very good results.
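The error figure quoted above amounts to a simple relative-error comparison between simulated and measured execution times; the short sketch below reproduces the calculation using the 10-worker prediction values reported in the text.

```python
def relative_error(simulated, measured):
    """Relative error of a simulated execution time against the measured one."""
    return abs(simulated - measured) / measured

# 10-worker prediction from the text: simulated 1.535 s vs. 1.412 s measured
# on the HP712/60.
sim_time, real_time = 1.535, 1.412
print(f"error = {relative_error(sim_time, real_time):.1%}")   # ≈ 8.7%
```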

Fig. 1. MW in the same processor: simulated vs. real execution time for 1 to 4 workers.

Fig. 2. MW in different processors: simulated vs. real execution time for 1 to 4 workers.

Fig. 3. Simulation vs. real execution error for the MW in the same processor and in different processors, by number of workers.

This work has been supported by the CICYT under contract TIC 95-0868.

REFERENCES

[1] Computer (IEEE). Special Issue on Parallel & Distributed Processing. Vol. 28, No. 11, November 1995.
[2] Ierotheou, C. et al. Computer Aided Parallelization Tools. Parallel Computing, Vol. 22, No. 2, pp. 37-45, 1996.
[3] Zeigler, B. et al. A Simulation Environment for Intelligent Machine Architectures. Journal of Parallel & Distributed Computing, Vol. 18, pp. 77-88, 1993.
[4] Parallel System Evaluation Environment: User's Guide. Computer Architecture & Operating Systems Group, University Autonoma of Barcelona, www.caos.uab.es/coper.html, January 1997.
[5] Luque, E., Suppi, R., Sorribes, J., Cesar, E., Falguera, J., Serrano, M. PSEE: A Tool for Parallel Systems Learning. Computers & Artificial Intelligence, Vol. 14, No. 1, pp. 319-339, 1996.
[6] Luque, E., Suppi, R., Sorribes, J. PSEE: Parallel Systems Evaluation Environment. Proceedings of the 5th International PARLE Conference (PARLE '93), Vol. 1, pp. 696-699, 1993.
[7] Keith, A. et al. Systems Modelling & Computer Simulation. Dekker, September 1995.
[8] MacDougall, M. Simulating Computer Systems. The MIT Press, 1987.
[9] Suppi, R. Models & Simulation of Parallel Systems. PhD Thesis, Computer Architecture & Operating Systems Group, University Autonoma of Barcelona, June 1996 (in Spanish).