JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 12, 297-299 (1991)

Special Issue on Modeling of Parallel Computers

Guest Editors' Introduction

VERNON J. REGO

Department of Computer Sciences, Purdue University, West Lafayette, Indiana 47906

AND

SATISH K. TRIPATHI

Department of Computer Science, University of Maryland, College Park, Maryland 20742

The many revolutionary changes brought about by the integrated chip, in the form of significant improvements in processing, storage, and communications, have also brought a host of related problems for designers and users of parallel and distributed systems. These systems are developing and proliferating with amazing momentum, motivating research into the understanding and testing of complex distributed systems. Unfortunately, these relatively expensive systems are being designed, built, used, refined, and rebuilt (at perhaps avoidable expense) even before we have developed a methodology for understanding the underlying principles of their behavior. Though it is not realistic to expect that the current rate of manufacturing can be slowed to accommodate research in design principles, it behooves us to draw attention to the importance of design methodology and performance understanding for such systems and, in this way, to attempt to influence parallel system design in a positive manner.

At the present time there is considerable debate among various schools of thought on parallel machine architectures, with different schools proposing different architectures and design philosophies. Consider, for example, one such debate, involving tightly coupled systems. Early on, Minsky [1] conjectured a somewhat pessimistic bound of log n for the typical speedup on n processors. Since then, researchers [3] have shown that certain characteristics of programs, such as DO loops in Fortran, can often be exploited to yield more optimistic levels of speedup. Other researchers [2] counter this optimism by pointing out that parallel and vector processing is limited in its potential speedup (i.e., Amdahl's law), to the extent that speedup is bounded from above by n/(s·n + 1 - s), where s is the fraction of a computation that must be done serially. This suggests that it makes more sense to concentrate first on achieving the maximum speedup possible with a single powerful processor; in this view, the distributed approach is not as attractive an option. More recently, work on hypercubes [4] appears to indicate that the arguments underlying Amdahl's speedup bound are not particularly appropriate for the current approach to massive ensemble parallelism. Experiments performed on the Ncube hypercube indicate that (1 - s), the fraction of a computation that can be done in parallel, often depends on n: in practice, problem size scales with the number of processors.
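For concreteness, the two sides of this debate can be written out. The first bound restates the expression above; the second is the standard scaled-speedup form advocated in [4], with s measured on the parallel system:

\[
S_{\mathrm{Amdahl}}(n) \;=\; \frac{n}{s\,n + 1 - s} \;\le\; \frac{1}{s},
\qquad
S_{\mathrm{scaled}}(n) \;=\; s + (1 - s)\,n .
\]

The first expression saturates at 1/s no matter how many processors are added; the second grows linearly in n because the parallel fraction of the work is allowed to grow with the machine, which is precisely the behavior observed in the Ncube experiments.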

There are a number of similar issues in the performance of parallel and distributed systems, many unexplored and many unresolved, including shared vs distributed memory, circuit switching vs message passing, the cost of computation vs the cost of communication, scheduling strategies, and problems of stability.

Our original intent in publishing a special issue was to solicit novel modeling techniques for the many difficult problems in the parallel computer domain. In particular, we were interested in exploratory methodology that takes into account randomness in large parallel or distributed systems. We feel this subject is especially important because systems are bound to get larger, capable of generating many more dependent events in a given time span and of behavior significantly more complex than that of the systems of today.

The variety of papers in this issue gives some indication of the rich nature of modeling problems in the parallel computer domain. Though we received a number of good papers, the time and page limitations of the issue allowed for the acceptance of only a fraction of the submissions. The papers in this issue may be divided into three categories: parallel computer benchmarking, stochastic models of parallel systems, and performance issues in multiprocessing.

PARALLEL COMPUTER BENCHMARKING

The paper by Gustafson, Rover, Elbert, and Carter on "The Design of a Scalable, Fixed-Time Computer Benchmark" contains a novel and interesting design for the scalable



benchmarking of parallel computers. The authors point out problems with previous benchmarks and show that the proposed benchmark, called SLALOM, overcomes some of these limitations. The benchmark can be used on a variety of parallel machines and allows for a comparative look at performance across a wide range of architectures. Empirical results show benchmark figures differing by up to six orders of magnitude for given problems on different architectures. Of particular interest is the presence of superlinear speedup with problem scaling under certain conditions.

Andrews and Polychronopoulos propose a different kind of benchmarking, in order to assess performance/cost trade-offs between heterogeneous and homogeneous parallel machine architectures. In "An Analytical Approach to Performance/Cost Modeling of Parallel Computers," the authors use the perspectives of centralized and distributed control to study how one compares the cost/performance ratio of one architecture against that of another; the comparison is between architectures with fast centralized control of slower processors and comparable architectures with distributed control of homogeneous processors. Arguing in terms of averages, the authors show that the former has a better cost/performance ratio than the latter.

Yet another form of benchmarking is discussed in the paper "Synthetic Models of Distributed Memory Parallel Programs," by Poplawski. Here the benchmarking involves the use of simple programs synthesized to constructively model the behavior of more complex programs. Three methods are proposed for synthesizing such programs, using delays to model the computation between instances of communication. These synthetic benchmarks can be used to predict the performance of complex parallel programs; a sketch of the delay-based idea follows.
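The sketch below is our illustration of that idea, not code from Poplawski's paper: the phase count, the amount of filler work, and the stubbed communication step are all assumptions. A synthetic program alternates a tunable block of dummy computation with a communication point and is timed as a whole, so its timing profile can be matched to a target application without reproducing the application's logic.

    /* Sketch of a delay-based synthetic benchmark (illustrative only). */
    #include <stdio.h>
    #include <time.h>

    /* Filler work standing in for the computation between communications. */
    static double dummy_compute(long iters)
    {
        double x = 0.0;
        for (long i = 1; i <= iters; i++)
            x += 1.0 / (double)i;
        return x;
    }

    int main(void)
    {
        const int  phases = 100;     /* assumed number of communication points */
        const long work   = 200000;  /* assumed per-phase computational "delay" */
        double sink = 0.0;

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int p = 0; p < phases; p++) {
            sink += dummy_compute(work);
            /* A real synthetic benchmark would exchange messages here;
               on a single machine we only mark the phase boundary. */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (double)(t1.tv_sec - t0.tv_sec)
                    + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("phases=%d elapsed=%.3f s (sink=%g)\n", phases, secs, sink);
        return 0;
    }

Varying the delay and the number of phases lets the synthetic program stand in for a complex application whose performance is to be predicted.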
STOCHASTIC MODELS OF PARALLEL SYSTEMS

The paper by Plateau and Fourneau, entitled "A Methodology for Solving Markov Models of Parallel Systems," describes a novel approach to modeling the performance of communicating automata. Together with an accompanying software package (PEPS) for solving communicating automata network models, the new Stochastic Automata Network (SAN) formalism competes well with the more established Generalized Stochastic Petri Net (GSPN) approach to modeling parallel systems. Its advantages, which include greater expressive power, lower storage requirements, and the exploitation of structure, are likely to make this work important in providing useful insights into how structure can be exploited so that really large systems become amenable to analysis.

The paper by Harrison on "Analytic Models for Multistage Interconnection Networks" focuses on approximate probability models of packet-switched and circuit-switched multistage interconnection networks (MINs). For packet-switched models, Harrison introduces both exact and approximate models; the former are useful for evaluating approximations, and the latter are useful for predicting performance over a variety of traffic levels. For circuit-switched models, the paper describes a good approximation for MINs under certain conditions. In both cases, the author points out some open problems in modeling these networks under conditions of blocking and partial path release.
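As background on the flavor of such models (this is the classical single-pass recurrence for unbuffered MINs built from k x k crossbar stages under uniform random traffic, in the style of Patel's analysis; it is offered only to illustrate the genre and is not taken from Harrison's paper), the probability that an output link of a stage carries a packet, given an input-link load of p_i, is

\[
p_{i+1} \;=\; 1 - \left( 1 - \frac{p_i}{k} \right)^{k},
\]

iterated stage by stage from the offered load p_0; the decay of p_i across the stages yields the network's approximate throughput.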

PERFORMANCE ISSUES IN MULTIPROCESSING

In the paper entitled "Resource Contention in Shared-Memory Multiprocessors: A Parameterized Performance Degradation Model," Nanda, Shing, Tzen, and Ni propose two new and interesting normalized output metrics in order to effect a comparative assessment of performance loss between systems that are essentially different. Focusing on shared-memory multiprocessors, the authors examine how performance degrades with increased contention for shared-memory resources. The paper contains a comparative study of contention-based performance degradation for the BBN Butterfly-I, the BBN Butterfly-II, and the Sequent Balance multiprocessors.

Memory-based modeling is also the topic of the paper by Johnsson, entitled "Performance Modeling of Distributed Memory Architectures." In modeling distributed-memory machines it is crucial to understand how data are moved about in the system, with particular emphasis on communication. The paper presents a high-level architectural model and indicates that communication primitives are well modeled separately, with detailed attention also given to the local memory hierarchy, data allocation, and data density in local memory. Benchmarks performed on the Intel iPSC/1 and the CM-2 exhibit improvements of an order of magnitude from data-motion-based algorithm enhancements.

As its title suggests, "Modeling and Measuring Multiprogramming and System Overheads on a Shared-Memory Multiprocessor: A Case Study" involves a detailed examination of multiprogramming overhead in a parallel execution environment, with empirical work performed on the Alliant FX/8 and FX/80 under the Cedar Operating System. In seeking accurate parameter values for modeling overhead effects in multiprogramming, Dimpsey and Iyer conduct sampling experiments to obtain real workload measurements, in addition to other measurements. The analysis of overhead effects is based on statistical correlations.

In the paper entitled "Modeling Overlapped Operation between the Control Unit and Processing Elements in an SIMD Machine," Kim, Nichols, and Siegel present a deterministic (homogeneous processor) model for analyzing SIMD machines with overlap of control unit and processor operations. Aiming to balance the computational workload across the control unit and the processing elements, the authors demonstrate how migrating work between these units helps reduce execution time. The proposed architecture, known as Overlapped SIMD (OSIMD), achieves a speedup of 2 over a non-overlapped SIMD under ideal operating conditions. The proposed model will be of use to system designers and SIMD machine programmers.
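The factor-of-2 ceiling is simple arithmetic (our gloss of the ideal case, with T_CU and T_PE denoting the control-unit and processing-element work per step; the paper's model is more detailed): overlapping the two units reduces the step time from T_CU + T_PE to max(T_CU, T_PE), so the speedup is

\[
S \;=\; \frac{T_{\mathrm{CU}} + T_{\mathrm{PE}}}{\max(T_{\mathrm{CU}},\, T_{\mathrm{PE}})} \;\le\; 2,
\]

with equality exactly when the two workloads are balanced and overlap completely, which is why migrating work between the units pays off.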

Finally, in the paper on "Performance of Hashed Cache Data Migration Schemes on Multicomputers," Hiranandani, Saltz, Mehrotra, and Berryman propose and study schemes for the automatic migration of data across distributed memories. Since system performance depends on the efficient handling of data, evaluating schemes for data migration can lead to high-performance remote data access mechanisms on multiprocessors. The authors evaluate a set of proposed hashed cache schemes experimentally, reporting results for selected model problems on a 32-node Intel iPSC/2.

We are deeply indebted to the more than 100 referees of this special issue for taking pains to provide helpful comments under rather stringent time constraints. We thank Professor Kai Hwang for encouraging us to put together this special issue and are appreciative of his ready and willing help in the form of editorial and technical advice.

REFERENCES

1. Minsky, M., and Papert, S. On some associative, parallel and analog computations. In Jacks, E. J. (Ed.), Associative Information Technologies. Elsevier/North-Holland, New York, 1971.


2. Amdahl, G. Validity of the single processor approach to achieving large scale computing capabilities. Proc. AFIPS Conference, Vol. 30. Thompson Books, Washington, DC, Apr. 1967, pp. 483-485.

3. Polychronopoulos, C., et al. Execution of parallel loops on parallel processor systems. Proc. 1986 International Conference on Parallel Processing, 1986, pp. 519-527.

4. Gustafson, J. L. Reevaluating Amdahl's law. Comm. ACM 31, 5 (May 1988).

VERNON J. REGO is an associate professor of computer sciences at Purdue University, West Lafayette, Indiana. His research interests include parallel and distributed systems, computer networks, probabilistic modeling, concurrent simulation, and performance evaluation. He received the M.Sc. (Hons.) degree in mathematics from B.I.T.S. (Pilani), India, in 1979. Prior to joining Purdue, he was a statistical computing consultant with the Academic Computer Laboratory at Michigan State University (East Lansing, MI), where he obtained his M.S. and Ph.D. degrees in computer science in 1985. He is an ACM national lecturer and a member of the IEEE.

SATISH K. TRIPATHI is a professor and the Chairman of the Department of Computer Science at the University of Maryland, College Park. His research interests are in the areas of performance evaluation, distributed and parallel systems, real-time systems, fault tolerance, and high-speed networks. He is on the editorial board of Theoretical Computer Science. He has been a guest editor for IEEE Transactions on Software Engineering and has edited a book for North-Holland. He attended Banaras Hindu University, the Indian Statistical Institute, the University of Alberta, and the University of Toronto. He received a Ph.D. in computer science from the University of Toronto in 1979.