A step towards large scale parallelism: Building a parallel computing environment from heterogeneous resources


Future Generation Computer Systems 11 (1995) 491-498

Kimmo Koski *

Center for Scientific Computing (CSC), P.O. Box 405, Tietotie 6, FIN-02101 Espoo, Finland

Abstract

The MPP industry has recently suffered from economic problems: various companies have disappeared from the market. Traditional vector system vendors, such as IBM, Cray and Convex, have entered the marketplace with new systems based on standard RISC technology. Competition and risks in massively parallel processing have increased. The real breakthrough of parallel systems has not yet happened, and the lack of software and tools makes the transition to MPP slow. In order to minimize the risks, a careful approach is to train the user base first with the existing computer resources, which generally include some RISC processors or clusters, using the tools available to the user, such as PVM. Pilot projects to promote MPP programming are necessary. In a heterogeneous environment, special attention should be given to load balancing and efficient usage of the resources. The Center for Scientific Computing (CSC) in Finland has been running moderately parallel shared memory systems since the end of the 1980s, and additionally PVM clusters and an IBM SP2 distributed memory system for the last few years, in order to prepare for large scale parallelism. At the end of 1994 CSC made a decision to purchase a 192-processor Cray future generation MPP system, which will be installed during 1996. The selection of parallel tools is large and not every one of them can be supported; in order to concentrate the efforts, a choice of technology has to be made. The next few years will demonstrate the abilities of the maturing compiler technology of parallel systems. One choice of supported software tools and an assumption about compiler technology development are presented in this paper.

Keywords: Metacomputing; MPP; Cluster; Load balancing; PVM; CSC

* Email: [email protected]
0167-739X/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved
SSDI 0167-739X(95)00018-6

1. Introduction

During recent years parallel processing, and lately especially massively parallel processing, has been discussed intensively. In certain application areas the need for parallel usage is emerging due to the parallel nature of the problem or the high cost of vector supercomputers. However, massively parallel processor (MPP) systems have not yet been able to demonstrate a mature and adequate computing environment for solving problems from various different application areas. It is also still unclear in which direction and in which time frame parallel compiler technology and architectures are evolving. Most high-performance computing centers today run a set of different (mainly UNIX based) resources, such as vector supercomputers, massively parallel computers, shared memory multiprocessors and efficient RISC workstations.


Their basic problem is often how to run each job on the computer best suited to that particular problem, in order to use the environment as efficiently as possible. This interoperability of different kinds of resources is called Metacomputing. The optimal usage of parallel processing in a Metacomputing environment raises several questions. Some of the parallel systems are also used in serial mode, so that a task executes on one node at the same time as part of the machine executes parallel jobs. In the case of a large set of heterogeneous jobs this brings out the need to weigh the throughput capacity obtained from serial usage against the faster (but probably more resource consuming) problem solving time of parallel usage. One of the key questions is how to balance parallel and serial, as well as interactive and batch, usage optimally across a set of different resources. There are also different approaches to parallel computing. The usage can be moderate, using for example a small set of processors and shared memory, as has already been done for several years. More interesting cases are related to the usage of tens, hundreds or even more processors within the same computer system, or possibly across different computer systems. In some cases the parallel computing environment has to be realized with quite a moderate budget, which makes the possibility of using some of the already existing equipment appealing. Today the number of different parallel computing and programming tools has grown considerably. Although such tools offer various means to parallelize code and run jobs on heterogeneous platforms, the lack of standards still creates a somewhat unclear vision of the direction in which parallel computing is heading. Changes in the industry and the disappearance of some traditional MPP vendors have also radically affected the market.

2. Motivation for building the environment from heterogeneous resources

Presently, scientific computing environments are often dominated by large and expensive vector computers, multiprocessor servers or clusters of efficient workstations. Traditional massively parallel computers have not fulfilled their promise to spread quickly through the market, mainly due to their difficult user interfaces and lack of software. In many cases purchased MPP computers serve as a playground for message passing type programming or for very specific applications. On the other hand, it is inevitable that high performance computing will increasingly use more parallelism, because peak performance from a single processor is expensive to achieve and processors are getting near their practical limits. Also, some problems are by their nature suitable for parallel processing. It is necessary for all high performance computing centers to build an environment for parallel computing, due to the inevitable development towards increasingly massive parallelism. The state of the massively parallel computing industry is at the moment very unstable. The traditional MPP vendors, such as Intel, Thinking Machines, Meiko, Maspar, Kendall Square Research and some others, have successfully been attacked by new products from vendors previously known for their vector processor supercomputers, such as the Cray T3D, Convex Exemplar, Fujitsu VPP500 and IBM SP systems. The market, and especially the profits from MPP systems, are obviously too small for such a large vendor base. The last year, 1994, reduced the number of MPP vendors, and I would not be surprised if their number were to decrease further. At the same time as MPP vendors are suffering from hard competition and decreasing profits, some centers are suffering from decreasing investment funds. Thus it is not always possible to purchase an MPP system for dedicated use.
This makes it appealing to provide an environment built from existing resources, by coupling them together or by obtaining an MPP system whose single processor performance is high enough that the system can efficiently take care of serial jobs in addition to parallel tasks. This kind of solution also represents a relatively low risk: if the parallel usage does not seem to attract users, one can always use the high single processor performance system as a throughput machine for serial load. However, sharing a resource between both parallel and serial load can cause a new kind of practical problem: finding a suitable number of free nodes for a parallel job is not always easy on a fully loaded machine. The development of different interconnections and switches will make it possible to couple different systems together more tightly than before. In some of the clustered systems the latencies have decreased from hundreds to tens of microseconds, getting near to the latencies of 'traditional' MPP systems. In a few years it might not be possible to tell much difference between the systems we today call clusters with high performance interconnects and those which we name massively parallel computers. Thus, in the future, when the interconnections and communication systems have similar latencies and capabilities, it might also become attractive to couple together processors from different computer systems and use this kind of entity as a parallel computing resource instead of purchasing a dedicated MPP system.

3. Parallel computing tools

During the last few years there have been many efforts to design appropriate tools for the parallel computing environment [1]. Several research projects and computing centers have come out with different systems, of which the most popular public domain software package is Parallel Virtual Machine (PVM) from the University of Tennessee [2]. In addition to the public domain packages, a set of commercial packages has been published, such as Express from Applied Parallel Research (GENIAS GmbH in Europe) [3] and PARMACS from Pallas GmbH [4]. These tools provide the possibility of using several types of computers together and in this way building a parallel computing resource. However, in many cases parallel usage is limited by the communication latencies between nodes and by the speed of the connections. The tools are also often quite difficult to use and require much effort from the end user, and their command languages are not consistent with each other. An effort to ease PVM usage through an interface called EASYPVM has been made at the Center for Scientific Computing by Sami Saarinen [5]. A standardization effort for message passing programming, the Message Passing Interface (MPI) [6], has been started, but it will probably still take some time before this standard is in general use. In addition to the parallel computing tools there is a set of other programming tools intended to help the user write more efficient code, modify old programs or locate the critical parts of the code. The variety of these tools is large and the optimal selection is not obvious. One of the best known is Forge-90, an interactive parallelization tool package. In the future it is probable that parallel computing capabilities and advanced programming features will be included in the compilers. Standard compilers, for Fortran and High Performance Fortran (HPF), will probably be used on a variety of different computing resources, thus providing the end user with a relatively easy to use, standard interface to a parallel computing resource. However, compiler development and standardisation efforts are slow, and it will still take some time before compiler technology has reached a sufficient level of effectiveness and automation. Currently there are some quite advanced attempts to hide the difficulties of parallel programming from the user behind an intelligent compiler. For example, the Convex Exemplar has an advanced compiler which offers logically shared memory although the system beneath it is a distributed memory system. Similar attempts are visible in other systems as well, for example in Cray's T3D system. At the moment it would be useful to be able to predict the future and choose a set of tools that will probably still exist, in such a way that the work done during the next two years can be profited from with the future systems [7].
The limited manpower resources have to be directed towards the right selection in order to avoid overhead work and at the same time serve the supported environment efficiently. Fig. 1 demonstrates the appropriate tools and one possible selection.


Fig. 1. The selection of different software tools available today is large [1]. This figure presents the part of the scope which contains the most important products for CSC. Shadowed boxes represent the supported products and white boxes those that are currently being evaluated at CSC. NQE also includes NQS. LSF has not been evaluated due to the concentration on NQE instead. Note that relations between the layers cannot be directly inferred from the placement of the boxes, because these products do not simply work on top of each other [9].

4. Load balancing

Balancing a heterogeneous load in a supercomputing environment often involves compromising between parallel and serial load if there are not enough dedicated resources for parallel processing. The idea of metacomputing is to run each job on the resource which suits that particular job best [8]. Thus massively parallel load should be run on an MPP type of computer, vectorizable load on a vector supercomputer, serial scalar load on a workstation, and so on. The selection of available systems and the profile of the load are the key characteristics for deciding where to run the current jobs. There are no general systems available at the moment which could automatically and effectively distribute the load to the appropriate computing resource. Intelligent queuing systems, such as the Distributed Queueing System (DQS), CODINE from GENIAS GmbH and LoadLeveler from IBM, are being developed for batch load, but presently these systems still lack some features, their decision algorithms do not always adjust dynamically to changes, and interactive load is not necessarily considered thoroughly. To complete the decision phase, information about the system status of the various resources is needed, and several tools are available for collecting this kind of information [7]. In a metacomputer, building the parallel computing environment from heterogeneous resources generally requires that interactive load and serial batch load can be taken care of efficiently at the same time as the parallel load. This assumes that there is no system dedicated only to parallel usage (which could serve all the parallel usage needs) and that a constant lack of resources exists, as it generally does. In this case the development of queuing systems is significant, as are the configuration of the queues and the ability to change them dynamically. During 1994 some new systems have come onto the market. Load Sharing Facility (LSF) from Platform Computing Corporation, a commercial product based on the earlier Utopia system from the University of Toronto, has raised some interest. LSF is available on a variety of platforms and is also used for load balancing of Convex Exemplar systems. CraySoft has also developed a general network queuing system, the Network Queuing Environment (NQE), which is likewise available on multiple platforms. NQE has NQS as a subset, which provides compatibility. In addition to the NQS batch system, NQE contains a Network Load Balancer (NLB) and a File Transfer Agent (FTA). The initial testing experiences with NQE at CSC are quite promising.

5. Case example: Center for Scientific Computing (CSC), Finland

Center for Scientific Computing is the Finnish national supercomputer center, serving mainly the Finnish universities, research centers, the Finnish Meteorological Institute and some industrial users. CSC's computing environment is presented in the appendices. The main resources are Cray C94 (replaced an X-MP in May 1995) and Convex C3840 vector supercomputers, a 24-processor IBM SP2 system with a high performance switch, and two Silicon Graphics Power Challenge systems with ten and six processors and two and one gigabytes of central memory. CSC has also obtained a next generation Cray massively parallel system with close to 200 processors, which will be installed in two phases during 1996 and will thus serve as a large scale parallel computing resource until the end of the century. CSC's computing environment has for a long time been based on moderately (1 to 8 processors) parallel systems, like large vector supercomputers and shared memory multiprocessors. CSC has chosen to proceed in small steps towards parallelism and has invested in scalable systems which provide considerable single processor performance together with parallel capabilities. CSC's parallel computing environment has been built with a relatively low budget, and because of this the resources have to be used as efficiently as possible. At this stage of the MPP industry CSC has also estimated that choosing a 'traditional' MPP system carries risks, due to the unstable nature of the market and the future. In the supercomputer procurement processed in 1994, CSC made a long term commitment to invest in massively parallel computing by purchasing Cray's next generation MPP system. One reason for this was that we felt the MPP industry would soon be ready to fulfil part of its promises, so that 1996 will be a suitable timeframe for starting the MPP transition, even though the traditional vector and shared memory resources have to be kept operational at the same time. Another reason for optimism was the promising development of MPP compilers and network queuing systems. The metacomputing concept has been studied and further developed at CSC during the last years [7,8]; its goal is to use CSC's existing resources as efficiently as possible. One part of the CSC metacomputing concept is to provide an environment for parallel computing purposes.
The current IBM SP2 system provides a first phase MPP system on which distributed memory programming experience is being gathered. During 1994 the parallel resource has mainly been the IBM SP system, although PVM was also run across the rest of the environment. The parallel usage of the IBM has consisted of several pilot projects which have given some useful experience. The parallel jobs have shared the machine with a very intense serial load, which has caused some practical difficulties in getting the necessary number of free nodes for parallel usage. The IBM SP2 system at CSC has to be considered only a first step towards parallel usage, not because of the system's capabilities but mainly because of the policy of its usage. The second step, which we consider already full scale production parallel usage, will follow in 1996 when Cray's MPP system is taken into production. Usage of the predecessor system (T3D) will start already in 1995, and the goal is to have the most promising parallel projects ready to be used on the new platform as soon as the system is up and running. The estimated sustained performance for well parallelizable programs is expected to reach 8-10 Gflop/s, depending on the application [10,11]. The IBM system uses, among others, a proprietary queuing system (LoadLeveler) and the Message Passing Library (MPL). In addition, PVM has been used at CSC for a few years and experience has been gathered. To make it easier for users to start using PVM, a special subroutine library, EASYPVM, has been designed by Sami Saarinen (CSC) [5]. PVM is the base product for parallel computing in CSC's heterogeneous environment, and it is likely to remain so for the next years, not only due to the product's capabilities but also because of the large amount of work which has been done around the world in developing PVM and PVM enhancements. Cray's NQE system will probably later replace the IBM LoadLeveler and become the main batch system solution for the whole environment. Additionally, CSC is following Express and PARMACS development. This is necessary in case these products develop in such a way that their larger usage becomes relevant.
At the moment intensive use of these products is not possible due to limited resources, although the situation might change in the near future. MPI will also be followed, and it is likely to be taken into full scale use some time during the next few years. Some MPI implementations are already available.


The parallel programming is mainly concentrated on Fortran. The Forge-90 interactive parallelization tool package will be used in the near future, and HPF development will also be followed with interest. CSC's choice of software building blocks for the parallel computing environment is illustrated in Fig. 1. CSC's policy for parallel computing in the first stage is to concentrate on MPP systems whose single processor performance is sufficiently high to obtain considerable performance when using a limited number of processors. This is because many programs reach optimal performance with a moderate number of processors: after a certain stage, adding more processors may result in such an increase in communication that the speedup is no longer sufficient. Single processor performance is important for real performance also with a massive number of processors. At a later stage, when larger parallel systems are available, it will also be possible to concentrate more on scalability and on the communication speed between processors, and to obtain impressive performance figures with a large number of processors.

6. Results

Building the parallel computing environment basically depends on two things: appropriate hardware and the selection of supported software products. The variety of products has been studied in various research projects [1]. In Fig. 1 one possible software product selection has been illustrated (CSC's approach), which together with the hardware environment (Appendix A) provides a relatively low risk environment for moderate parallelism. This kind of approach has been considered a starting phase for more serious steps towards more massively parallel computing [9]. In addition to the hardware environment and software tools, appropriate timing is an important factor in optimizing the workload and the transition costs between different systems and tools. To be able to choose the right systems at the right time, some predictions for the future have to be made. One prediction which serves CSC's needs is presented in Fig. 2. Many of the predictions of recent years, like the breakthrough of MPP systems and the death of vector supercomputers, have proven to be quite optimistic; that might well be the case also with the timeframes presented in Fig. 2. Together with the parallel computing tools, the development of queuing systems has to be considered carefully. Optimal load balancing makes resources available. Parallel usage and serial throughput type usage often compete for the same resources, and an efficient queueing policy is necessary. This is especially important if the computing resources are limited. During 1994 a number of parallel projects have been carried out with the first stage parallel IBM

[Figure 2: a timeline from 1994 to 1997 with overlapping phases for PVM, MPI, modestly parallel usage, restricted MPP, vector supercomputing and general MPP.]

Fig. 2. Assumed timeframes for the development of a parallel computing environment. These assumptions are based on various discussions with several people involved in the computer industry and research. Restricted MPP is still used with a very limited number of different application software packages. It is predicted that from the beginning of 1997 the application base of MPP systems will already be significant.


SP2 system. Some of the most interesting ongoing or finished projects are:
(1) Tuomo Kauranne, Juha Oinonen, Sami Saarinen, Olli Serimaa, Jarkko Hietaniemi: Parallelization of the HIRLAM weather model. CSC, University of Joensuu, Helsinki University of Technology.
(2) Raino A.E. Mäkinen: Parallelization of a FEM code. University of Jyväskylä, Laboratory of Scientific Computing.
(3) Risto Nieminen, Fabio Bernardini: Car-Parrinello calculations. Helsinki University of Technology, Laboratory of Physics.
(4) Jouko Nieminen et al.: Simulation of spreading of a polymer droplet. Tampere University of Technology.
(5) Mikael Fogelström, Juhani Kurkijärvi: Inhomogeneous superfluids. Åbo Akademi.

7. Discussion and conclusions

The usability of different parallel computing tools can be discussed further. The number of products is increasing, and it is becoming more difficult to select an appropriate combination that will still develop and exist after a few years. The lack of standardization makes the choice even more risky. Users have to think carefully about where to put their limited resources in order to avoid useless work. The question of what the future will bring also arises among the different types of massively or moderately parallel computer systems. Should you buy a large 'traditional' MPP system, or should you start with a smaller system or with a cluster? Which MPP vendors will still exist after one year? When will there be enough software, and for which types of systems? How should you build an environment for parallel computing with a low risk, and what kind of resources should it contain? CSC has chosen a moderate approach which we believe will provide a scalable and low-risk upgrade path towards more massively parallel computing, as the systems and their computing environments, together with application software packages, evolve. We have also wondered when the compilers will be intelligent enough to

Fig. 3. The Center for Scientific Computing (CSC) metacomputing architecture, coupling vector multiprocessor supercomputers, RISC-based parallel supercomputers and visualization resources.


handle the virtually shared memory model on scalable distributed memory systems in such a way that a typical user can program the computer system relatively easily. Some promising products in this field are already available, such as the Convex Exemplar and Cray T3D compilers. However, we came to the conclusion that during the next two to three years users will probably have to settle for some kind of message passing interface in order to get their code to run with maximum performance. Thus, building a parallel computing environment with a suitably user friendly interface will remain a relatively non-trivial task for at least the next few years.

Acknowledgements

The author would like to thank especially scientific director Risto Nieminen and application specialist Sami Saarinen from CSC for their useful comments and discussions on this subject.

Appendix A. The metacomputing environment of the Center for Scientific Computing

The metacomputing environment of the Center for Scientific Computing is given in Fig. 3.

References

[1] L.H. Turcotte, A survey of software environments for exploiting networked computing resources, Engineering Research Center for Computational Field Simulation, June 1993.
[2] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek and V. Sunderam, PVM 3 user's guide and reference manual, Oak Ridge National Laboratory, May 1993.
[3] Writing programs for parallel and distributed computing, Course Material, Parasoft Corporation, 1993.
[4] Comparison between PVM and PARMACS, Report, Pallas GmbH, 1993.
[5] S. Saarinen, EASYPVM - Enhanced subroutine library for PVM, Conference presentation at HPCN Europe, April 1994.
[6] J.J. Dongarra et al., Message Passing Interface (MPI), Standards draft (1993), Standard (1994).
[7] K. Koski, K. Lindberg and S. Saarinen, Effective use of supercomputing resources, CSC Research Reports R01/93.
[8] K. Koski, Metacomputing technologies at Center for Scientific Computing in Finland, CSC Research Reports R02/93.
[9] K. Koski, Building a parallel computing environment from heterogeneous resources, Conference presentation at HPCN Europe, April 1994.
[10] J. Oinonen, O. Serimaa, S. Saarinen, T. Kauranne and J. Hietaniemi, Parallelizing the operational HIRLAM weather model, CSC News 6 (4) (Dec. 1994).
[11] K. Koski, The procurement for supercomputing, Request for proposals, Feb. 1994.

Kimmo Koski (Lic. Tech.) is currently working as Manager of the Section for Operations at the Center for Scientific Computing (CSC), Finland. His responsibilities include the maintenance and development of the current metacomputing environment at CSC. During 1994 Koski was responsible, among other things, for the supercomputer procurement, which resulted in the purchases of Cray C94 and 192-processor T3E systems together with some Silicon Graphics and IBM SP2 scalar systems. Koski's recent research work includes papers and conference proceedings on metacomputing, efficiency of the supercomputing environment, benchmarking and file server development.