Future Generation Computer Systems 22 (2006) 742–744 www.elsevier.com/locate/fgcs
Editorial
Special section: Systems performance analysis and evaluation
Geyong Min a,∗, Mohamed Ould-Khaoua b
a Department of Computing, School of Informatics, University of Bradford, Bradford BD7 1DP, UK
b Department of Computing Science, University of Glasgow, Glasgow G12 8RZ, UK
Available online 2 May 2006
∗ Corresponding author. Tel.: +44 1274 234021. E-mail address: [email protected] (G. Min).
© 2006 Elsevier B.V. All rights reserved. 0167-739X/$ - see front matter. doi:10.1016/j.future.2006.02.010

The study of performance analysis and evaluation is of immense importance because it is needed at every stage in the life cycle of a system, including design, manufacturing, sale, use, and upgrade. This special issue contains 11 papers, selected from 32, which are extended and revised versions of work originally presented at the International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS’03), held in conjunction with IEEE IPDPS’03, 22–26 April 2003, Nice, France. The purpose of the workshop was to promote discussion on the development of innovative tools and techniques to keep up with the rapid evolution and ever-increasing complexity of parallel and distributed systems. Each paper was fully peer-reviewed according to the practice of this journal.

The selected papers span important aspects of systems performance analysis and evaluation, including middleware services, large-scale distributed interactive simulation, parallel programming languages, sorting algorithms, system call tracing, interconnection networks, multicast communication, fault-tolerant routing, and analytical performance tools based on queueing systems, Markov chains and semi-Markov processes. The contributions of these papers are outlined below.

Performance-based middleware services are set to play an increasingly important role in the management of resources and distributed workloads in emerging wide-area, heterogeneous distributed computing environments. The paper by Jarvis, Spooner, Lim Choi Keung, Cao, Saini, and Nudd documents two prediction-based middleware services that address the implications of executing a particular workload on a given set of resources. The results of this study show that the improvements in system- and user-level quality of service brought about by these middleware services are significant at both the intra-domain and multi-domain levels in heterogeneous distributed computing environments.

The selection of an appropriate data distribution management scheme can be a critical factor in accomplishing efficient data exchange within a large-scale distributed interactive simulation. Data distribution management (DDM) is a service that seeks to control the volume of messages exchanged during a simulation, thereby decreasing the workload for the simulation hosts. Boukerche and Dzermajko define and describe the various methods of DDM, contrast their similarities and differences, and discuss the implementation of several DDM methods. This work promotes an understanding of the benefits of DDM and offers a detailed explanation of the available DDM methods.

UPC, also known as Unified Parallel C, is an explicit parallel programming language based on the ISO C standard and the Partitioned Global Address Space (PGAS) programming model. El-Ghazawi et al. define a methodology for evaluating the performance of PGAS compilers. Their strategy includes emulating potential UPC compiler optimizations and comparing the results with unoptimized versions of the codes as well as with MPI and SHMem implementations. Comparing measurements of the raw performance against those obtained with the emulated optimizations reveals whether a given optimization is implemented in current compilers, and indicates the potential performance impact of improving the quality of its implementation.

Cerin, Koskas, Fkaier, and Jemni investigate the effects of cache between the first two levels of the memory hierarchy for sequential in-core sorting algorithms in the frameworks of two applications, namely parallel sorting for heterogeneous clusters and the development of an SQL service.
This research uses hardware performance counters available on most modern microprocessors to observe performance, exploit functional units in processors, and optimize the use of the memory system and different functional units. The performance results reveal that on an Athlon processor (a three-way superscalar x86 processor) first-level data cache misses are not the central problem; rather, a subtle proportion of independent retired instructions is advisable to obtain acceptable performance for in-core sorting.

System call tracing can be a very effective way of capturing an interactive workload. Burton and Kelly present a technique for enhancing system call tracing to capture an application’s paging behaviour, and evaluate the capture overheads and predictive value of the approach. This method leads to a slight loss in the degree of accuracy. Using a suite of memory-intensive applications, they evaluate how faithfully the performance of alternative system configurations can be predicted using recorded workload traces.

The choice of an interconnection network for a parallel computer depends on a large number of performance factors, which are very often application dependent. Aljundi, Dekeyser, Kechadi, and Scherson propose a multi-criteria evaluation and comparison methodology for Multistage Interconnection Networks (MINs). This methodology is based on different performance measures, and its main feature is its ability to incorporate a number of performance factors simultaneously. The methodology can serve in evaluating the use of multistage interconnection networks as an intercommunication medium in current multiprocessor systems; to demonstrate this, the authors apply it to the recently introduced Chordal ring-based MIN.

Multicast, the delivery of a message from a given source to a group of destinations, is one of the most important collective communication operations. Al-Dubai, Ould-Khaoua, and Mackenzie present a new path-based multicast algorithm that divides the destinations in a way that balances the traffic load on network channels during the propagation of the multicast message.
Results from extensive simulations confirm that the new grouping algorithm can achieve a high degree of parallelism and low communication latency over a wide range of traffic loads in the mesh network.

A feasible path from source to destination may not be guaranteed at the source based on local safety information when a network contains a large number of faults. Xiang proposes a fault-tolerant routing algorithm in hypercubes, which needs to set up a partial path based on local safety information. Simulation results show that the partial path set-up scheme is quite useful for fault-tolerant routing while the extra cost caused by path set-up is trivial.

Thomas presents an iterative technique for deriving numerical solutions for marginal queue size probabilities for a class of queueing models. The proposed approach uses fewer states in approximation than is traditional and is shown to give a reasonable degree of accuracy under certain circumstances. In addition, two approximations are proposed for the joint queue size distribution and are evaluated numerically through two examples. It is shown that in some circumstances a naive product form approximation can work well, but in others it can be improved by considering additional information that is available from the iterative solution.
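To make the idea of a naive product form approximation concrete, the following minimal sketch builds an approximate joint queue size distribution for two queues as the product of their marginals. It is an illustration only and not the technique of the paper summarised above: the function names are hypothetical, and the marginals are assumed to be those of M/M/1 queues (geometric in the utilisation rho), truncated at a maximum queue length and renormalised. The total variation distance is included as one possible way to compare such an approximation against a reference joint distribution.

```python
def mm1_marginal(rho, max_len):
    """Queue size probabilities p(n) proportional to rho**n (M/M/1 form),
    truncated at max_len and renormalised. Assumed marginal, for illustration."""
    if not 0 <= rho < 1:
        raise ValueError("utilisation rho must satisfy 0 <= rho < 1")
    weights = [rho ** n for n in range(max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def product_form_joint(marginal_a, marginal_b):
    """Naive product form: P(i, j) is approximated by p_a(i) * p_b(j)."""
    return [[pa * pb for pb in marginal_b] for pa in marginal_a]


def total_variation(joint_x, joint_y):
    """Total variation distance between two joint distributions of equal shape."""
    return 0.5 * sum(
        abs(x - y)
        for row_x, row_y in zip(joint_x, joint_y)
        for x, y in zip(row_x, row_y)
    )


# Hypothetical utilisations and truncation level, chosen only for the example.
marg_a = mm1_marginal(0.5, 20)
marg_b = mm1_marginal(0.8, 20)
joint = product_form_joint(marg_a, marg_b)
```

By construction the approximation treats the two queues as independent, so summing a row of `joint` recovers the corresponding entry of `marg_a`. Any dependence between the queues is lost, which is the kind of gap that additional information from an iterative solution could help to close.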
Semi-Markov processes (SMPs) are expressive tools for modelling and evaluating parallel and distributed systems; they are a generalization of Markov processes that allow for arbitrarily distributed sojourn times. Bradley, Dingle, Harrison, and Knottenbelt present an iterative technique for transient and passage time analysis of large structurally unrestricted semi-Markov processes. Their method is based on the calculation and subsequent numerical inversion of Laplace transforms and is amenable to a highly scalable distributed implementation. The authors then derive passage time densities, quantiles and transient state distributions for distributed systems with underlying semi-Markov state spaces of up to 1.1 million states.

Continuous Time Markov Chains (CTMC) facilitate the performance analysis of dynamic systems in many areas of application and are particularly well adapted to the study of parallel and distributed systems. Benoit, Plateau, and Stewart present a new algorithm for computing the solution of large Markov chain models whose generators can be represented in the form of generalized tensor algebra. They propose an improvement of the standard numerical shuffle algorithm and study the performance on the basis of both memory needs and execution time.

Acknowledgments
We would like to express our deep thanks to the Editor-in-Chief, Professor Peter Sloot, for providing us with the opportunity to host this special issue in Future Generation Computer Systems. We also thank the authors for their contributions, including those whose papers were not included. Last but not least, we are grateful for the thoughtful work of the many reviewers who provided invaluable evaluations and recommendations.

Geyong Min received the Ph.D. degree in Computing Science from the University of Glasgow, United Kingdom, in 2003, and the B.Sc. degree in Computer Science from Huazhong University of Science and Technology, China, in 1995.
He is currently a lecturer in the Department of Computing at the University of Bradford, United Kingdom. His research interests include Performance Modelling/Evaluation, Parallel and Distributed Systems, Computer Networks, Mobile Computing and Wireless Networks, and Grid Computing. Dr. Min is the Founding Co-Chair of the International Workshop on Performance Modelling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS) held in conjunction with IEEE/ACM-IPDPS and the International Workshop on Performance Analysis and Enhancement of Wireless Networks (PAEWN) held in conjunction with IEEE-AINA. He serves as an Associate Editor of the Journal of Wireless and Mobile Computing and the Guest Editor for the journals Computation and Concurrency: Practice and Experience, Future Generation Computer Systems, Supercomputing, and Cluster Computing. He is the Program Vice-Chair of the International Conference on High Performance Computing and Communications (HPCC’2005) and the International Conference on Wireless Networks (ICWN’2005). He served on program committees of 30 professional conferences/workshops including GLOBECOM and ICCCN. He is a member of the IEEE Computer Society.

Mohamed Ould-Khaoua received his B.Sc. degree from the University of Algiers, Algeria, in 1986, and the M.App.Sci. and Ph.D. degrees in Computer Science from the University of Glasgow, UK, in 1990 and 1994, respectively. He is currently a Reader in the Department of Computing Science at the University of Glasgow, UK. His research focuses on applying theoretical results from stochastic processes and queuing theory to the quantitative study of hardware and software architectures. Dr. Ould-Khaoua is an Associate Editor for the International Journal of Computers & Applications, and the International Journal of High-performance Computing & Networking. He is the Guest Editor of six special issues related to performance modelling and evaluation of computer systems and networks in the journals Computation and Concurrency: Practice & Experience, Performance Evaluation, Supercomputing, Journal of Parallel & Distributed Computing, and IEE Proceedings - Computers & Digital Techniques. He is the Co-Chair of the International Workshop Series on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS). He has served on program committees of many international conferences and workshops, including MASCOTS, ICPADS, PDCN, MSWiM, PDCS, HET-NETs, ASMTA, CIC, SPECT, HPCS, IPCCC, and NPDPA. Dr. Ould-Khaoua’s current research interests are performance modelling/evaluation of parallel and distributed systems, and wired/wireless communication networks. He is a member of the IEEE-CS.