Parallel processing with an array of microcomputers


The availability of powerful and cheap microcomputers has focused attention on computer arrays. B Braunleder and R Kober describe a multicomputer system used to solve complex problems

This article describes a multicomputer system which can effectively use up to several hundred microcomputers to solve complex problems and outlines its operating schema, especially task interoperation and communication, as well as its areas of application. Reliability studies based on practical experience with an experimental system demonstrate how spare module computers and reconfiguration can significantly enhance reliability.

Siemens AG, Central Research and Development Department, ZFE FL FKS, Otto Hahn Ring 8, D-8000 Munich 83, FRG

vol 5 no 6 july/august 1981

The evolution of computer architectures is stimulated by the rapid progress in technology. New computer architectures aim at modularity, high performance and low cost, and demand new operating structures and software system design 1-3. Horizontal and vertical migration is a major method of meeting this challenge: on the one hand, dedicated functions of the instruction set, the operating system or basic utilities are realized by highly integrated hardware, i.e. microprogrammable processor architectures; on the other hand, different application programs in a multiuser environment, or fairly independent parts of one large complex program, are handled by several distributed processors or computers linked together to form a multiprocessor/multicomputer system.

SMS is a microprocessor-based structured multicomputer system designed for the parallel solution of large, mainly numerical, problems. The main difficulty was the development of an effective organization to exploit the multiple hardware and its parallel operating mode as far as possible. This has been achieved by implementing a clear and straightforward operating scheme, thus providing a large degree of independence in the processing units in a simple fashion and avoiding complicated control structures which tend to limit system performance.

SYSTEM ARCHITECTURE

High performance for time-consuming computations can be obtained by combining many microcomputers to form a single mainframe. Figure 1 shows the basic architecture of the structured multiprocessor system from the Siemens Research Laboratories. It is hierarchically organized thus:

• the frontend computer acts as user interface and supervises the control computer

• the control computer itself controls the module computers (the module array)
• the module computers are identical microcomputers consisting of a link to the interconnection network, a two-port memory, a processor and a private I/O interface

Figure 1. Basic architecture of a hierarchically organized multicomputer system. The module computers are complete microcomputers with a private memory and a private I/O interface

SMS operation is self-synchronizing. It is determined by a strict regulation of flow control and a clear assignment of tasks to the frontend computer, control computer and module array. The frontend computer (Figure 2a) controls and starts the control computer and waits for completion or error messages. The data flow, on the other hand, is more flexible. For performance reasons, it is possible to transfer data between the frontend computer and the module array without the interaction of the array controller. Therefore the frontend computer has access to the memories of the control computer and the module computers (Figure 2b). The control computer has access to the module computers (Figure 2c).

The module computers perform the computation of the problem to be solved. No module processor can access the memory of other modules directly. Communication between module computers must be handled by the control computer or the frontend computer. This restriction is the prerequisite for a clear and nonconflicting operating schema which has proven its worth in a number of applications. The SMS architecture suits nearly all application areas where many identical or similar tasks have to be executed.

0141-9331/81/060241-05 $02.00 © 1981 IPC Business Press

Figure 2. Flow of information in the multicomputer system: a) control flow; b) data flow supervised by the frontend computer; c) data flow supervised by the control computer

Figure 3. Plug-in unit using 16 SAB 8080-based microcomputer modules

We call this type of application 'task-parallel problems' (Table 1). They can be divided into vectorizable and nonvectorizable problems.

• vectorizable problems stem from regular, homogeneous application areas - for SMS that means all computer modules execute the same program in the same sequence (SIMD-type application*)
• nonvectorizable problems stem from irregular, nonhomogeneous or stochastic applications - in SMS the module computers run identical programs in a different manner, or they execute different programs (MIMD-type application†)

Table 1. Task-parallel problems

Vectorizable problems            Nonvectorizable problems
Linear partial differential      Nonlinear inhomogeneous problems
  equations                      Monte Carlo methods
Matrix algebra                   Sparse matrices
Sparse matrices (?)              System optimization
                                 Decision tree problems

OPERATIONAL SMS SYSTEMS

For two years the experimental system SMS 201 has demonstrated its practical worth in more and more applications. It is a two-level system consisting of a minicomputer, which acts as both a frontend computer and a control computer, and an array of 128 microcomputer modules 4,5. The microcomputers are based on the 8-bit microprocessor SAB 8080. They contain 16 kbyte of private RAM and up to 4 kbyte of EPROM. The total memory capacity is 2 Mbyte of RAM and up to 0.5 Mbyte of EPROM. The computing power of the SMS 201 is comparable to that of a state-of-the-art mainframe. (Figures 3 and 4 show examples of the SMS 201 system.)

Much more power is provided by the SMS 3, a new experimental system based on the 16-bit microprocessor SAB 8086. The numerical performance will be significantly improved by adding the numerical data processor 8087 to each module 6. The SMS 3 is a three-level system. The frontend computer is a standard mainframe from the Siemens system 7000 (a 7531 is used). The control computer is a hard disc 8086 system with powerful microprogrammed extensions for control and communications functions.

*SIMD - single instruction multiple data
†MIMD - multiple instruction multiple data

Figure 4. SMS 201 multiprocessor system with 128 microcomputer modules

OPERATING SCHEMA

The operating schema that determines the interoperation of the control computer and module computers is based on a clear separation of the global control flow and data flow. The control computer coordinates the computations performed by, and the communication between, the module computers. Thus the organization of the program run is handled by the control computer, whereas the module computers can be considered slaves for performing subproblem calculations. The global flow of data depends on how problems are decomposed into subproblems and turns out to be the main aspect of parallelization and its implementation. For a systematic approach it is appropriate to split the function of the control computer into two system tasks:

• T1 - subproblem calculation on the module computers
• T2 - communication of data between the module computers

microprocessors and microsystems

Their dynamic execution is linked to the two system processes:

• P1 - calculation process
• P2 - communication process

Initializing the SMS corresponds to starting P1 and P2. After that, the SMS is ready to begin operation by alternately activating P1 and P2. P1 starts the user programs in the module computers, which perform the parallel computation of the user problem. After reaching a certain state in the partial results, P2 ensures that every module computer receives from the other module computers the current data for further computations. The final results having been obtained, P1 and P2 are deleted. A program run can be considered a cyclic repetition of states of either process (Figure 5). In order to process the tasks T1 and T2, P1 and P2 are active. P1 or P2 is deactivated by the 'done' signal from all module computers to the control computer. Because the two processes are activated alternately and interact, the process which has just been active is blocked. The other process which has been blocked will be in the ready state before being activated. The ready state is an intermediate state in which the control computer can implement certain coordinating measures.
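The cyclic alternation of P1 and P2 described above can be sketched as a small state machine. This is only an illustrative Python model, not SMS code: the class and function names are invented, and the 'done' signals from all module computers are collapsed into single state transitions.

```python
from enum import Enum

class State(Enum):
    READY = 0    # intermediate state for coordinating measures
    ACTIVE = 1
    BLOCKED = 2

class SystemProcess:
    """One of the two control-computer system processes (P1 or P2)."""
    def __init__(self, name, state):
        self.name = name
        self.state = state

def run_cycle(p1, p2, steps):
    """Alternate the calculation process P1 and the communication
    process P2: the active process is blocked by the 'done' signal,
    the blocked one moves to ready and is then activated."""
    history = []
    active, blocked = (p1, p2) if p1.state is State.ACTIVE else (p2, p1)
    for _ in range(steps):
        history.append((active.name, blocked.name))
        active.state = State.BLOCKED     # 'done' from all module computers
        blocked.state = State.READY      # coordinating measures may run here
        blocked.state = State.ACTIVE     # activation by the control computer
        active, blocked = blocked, active
    return history

p1 = SystemProcess("P1", State.ACTIVE)   # calculation
p2 = SystemProcess("P2", State.BLOCKED)  # communication
print(run_cycle(p1, p2, 4))              # P1 and P2 take turns being active
```

Exactly one process is busy at any time, matching the alternate property stated later in the text.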


SYSTEM BEHAVIOUR

A more systematic description is given in the following considerations. The behaviour of system processes at runtime is coordinated by synchronizing the states according to two principles: alternate activation and interaction of the processes, and cyclic repetition of states within each process. The states are synchronized at fixed time-points in a predetermined order of states and at a problem-dependent time of occurrence (Figure 6). With (ti), i ∈ N, being the sequence of these time-points, a characterization of system behaviour means we must consider the process state relationships at time t ∈ (ti, ti+1) for all i ∈ N.

Definition

Sb is the set of busy states (Sb = {ready, active})
Si is the set of idle states (Si = {blocked})

Natural order of states: ready < active < blocked

Figure 6. Schematic view of system processes at runtime

Characterization

Cyclic property: each process passes S = Sb ∪ Si in its natural order and continues in cyclic repetition.
Alternate property: if one process is idle, the other is busy.

COMMUNICATION PROTOCOL

The communication process requires special regulation in accordance with the decentralized arrangement of the module computer array. The following communication protocol guarantees that the maximum volume of information is exchanged in parallel. Every module computer uses a part of the module memory as communication storage. It is accessed by either the module processor or the control computer. This storage has a separate compartment which corresponds to an equal-sized compartment in each of the other module computers. The size of the compartments can be varied to suit the problem being dealt with and to fit the data structure in the working storage.

The exchange of information between the module computers is timed by the control computer. Firstly, preparatory measures are implemented in each module computer, i.e. data packets are formed and then filled into its own compartment. Secondly, the data packets are copied to the corresponding compartment in each of the other communication storages. Thus each data packet remains associated with the sending module computer. Thirdly, each module computer accesses the data from those compartments of its private communication storage which are allocated to the relevant module computers 7.

The flow of data during the first and third stages of the information exchange is dependent on the calculation structure. It underlines the problem structure and the partitioning into parallel subproblems. This requires a description of the entire data structure, indicating which data must be available for which subproblem. In general, this interlinkage of data must be described in full, whereas for regular data structures it will be given as an algorithmic description.
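The three-stage exchange can be sketched in Python. This is an illustrative model under assumed names (`exchange`, `comm`, `needs` are all invented): each communication storage becomes a list with one compartment per module, and the control computer's copy phase is a pair of loops.

```python
def exchange(modules):
    """Three-stage information exchange timed by the control computer.
    Each module's communication storage is a list of compartments,
    one per module; compartment j always holds the packet that
    originated in module j, so packets stay associated with senders."""
    # Stage 1: each module forms a data packet in its own compartment
    for j, m in enumerate(modules):
        m["comm"][j] = m["make_packet"](m)
    # Stage 2: the control computer copies each packet into the
    # corresponding compartment of every other module
    for j, src in enumerate(modules):
        for k, dst in enumerate(modules):
            if k != j:
                dst["comm"][j] = src["comm"][j]
    # Stage 3: each module reads only the compartments allocated to
    # the module computers it actually depends on
    for m in modules:
        m["inbox"] = [m["comm"][j] for j in m["needs"]]

# toy setup: the packet is just a string; module i needs data from
# its two ring neighbours i-1 and i+1
modules = [
    {"id": i,
     "comm": [None] * 4,
     "needs": [(i - 1) % 4, (i + 1) % 4],
     "make_packet": lambda m: f"data-from-{m['id']}"}
    for i in range(4)
]
exchange(modules)
print(modules[0]["inbox"])   # packets from modules 3 and 1
```

The `needs` lists play the role of the communication tables mentioned later in the text: they describe which data must be available for which subproblem.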

Figure 5. Process state diagram

PARALLELIZATION OF PROBLEMS

Parallelization must aim at splitting large complex problems into subproblems of equal size requiring much more time for their solution than for their interaction. SMS can be used most efficiently in solving many largely self-contained subproblems. Decomposition strategy requires a careful analysis of the user problem. This means looking for basic computational units, combining them to form reasonable subproblems, analysing the basic dependency relations and transposing them to subproblem interactions.

Regular calculations in general are of an iterative nature: a complete set of new values is computed from old values, and this is repeated until the final solution is reached. Doing this sequentially is very time-consuming, and sophisticated algorithms have been developed to meet this challenge. Parallelizing these problems promises to yield good results 8. In general, the greater the number of processors used, the greater the advantage of parallel computation. It is somewhat diminished, however, by the amount of time and additional storage needed for the coordination of, and communication between, the module computers. An effective decomposition into subproblems will aim at reducing this as much as possible. The solution of regular problems is a major impetus behind the establishment of SMS systems and allows better insight into problem realization.
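The tradeoff described above - more processors shorten the per-module computation but raise the coordination and communication cost - can be made concrete with a toy cost model. The formula and the numbers below are illustrative assumptions, not measurements from SMS.

```python
def speedup(p, t_seq, t_comm_per_step):
    """Hypothetical cost model: the computation divides perfectly
    across p module computers, but each step adds a coordination/
    communication cost that grows with the number of modules."""
    t_par = t_seq / p + t_comm_per_step * p
    return t_seq / t_par

# more processors help until the coordination cost dominates
for p in (1, 16, 64, 128, 512, 2048):
    print(p, round(speedup(p, t_seq=10_000.0, t_comm_per_step=0.05), 1))
```

With these assumed costs the speedup rises steeply at first, flattens, and eventually falls again once communication outweighs the saved computation - which is exactly why an effective decomposition tries to minimize subproblem interaction.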

MODEL PROBLEM

The solution of partial differential equations by discretization is a typical regular problem. The differential equation is thereby transformed into a difference equation and the solution is computed at a grid of points spread over the region considered. The computation of semiconductor structures is a well known problem in the area of field calculations. It requires a typical grid size of 100 x 100 points. Suitable subproblems to be allocated to the module computers are formed by splitting the points into a set of disjoint subregions of equal size.

The relation for computing the value at an individual point is determined by the discretization form selected. A nonlinear Poisson equation, for example, is discretized to a nonlinear system of equations. Following Newton's method, the system is solved by an iterative calculation of new layers of values for the whole grid. The discretization form usually chosen gives the local relation for computing the value of an individual point as one quarter of the sum of the values of its four neighbours.

With regard to this form of computation, the points of a subregion can be divided into two classes: one class of interior points requiring information from its own subregion only; the other class of boundary points requiring additional information from other subregions. The four-point formula necessitates that the boundaries of the subregions belong to the second class and that each side of that boundary interrelates with the corresponding side of the opposite subregion (Figure 7). These points are essential for parallelization. After each autonomous computation of the data from all subregions in all module computers, i.e. a complete layer of new values, interaction between the module computers requires precisely these data to arrange the new state in all of them and to enable the next step of the autonomous computation.

Figure 7 shows the partitioning of the points in subregion Si. The boundary points of other subregions must be known by Si to be autonomous in the computation of its own points. As can be seen, Si interdepends with parts of the boundary of Si+1, Si+n, Si-1 and Si-n, and thus reflects the calculation form of an individual point. This is the basis for an algorithmic representation of the overall communication structure. It is converted into communication tables and distributed to the module computers, all of which thus have the necessary information to participate in the flow of communication 9.

Figure 7. Model problem: regular operation and decomposition of data array into subarrays
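A minimal sketch of the four-point iteration follows. It is an illustrative Python version of the model problem on a small grid, with the whole grid held in one process; in SMS each subregion would live on its own module computer and the boundary points would travel through the communication protocol after every sweep. All names and the toy boundary condition are invented.

```python
def jacobi_step(grid):
    """One complete layer of new values: each interior point becomes
    one quarter of the sum of its four neighbours (the discretized
    four-point formula from the text)."""
    n = len(grid)
    new = [row[:] for row in grid]          # edges keep their old values
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                grid[i][j - 1] + grid[i][j + 1])
    return new

def solve(n=10, sweeps=500):
    """Toy Laplace problem: top edge held at 1.0, other edges at 0.0.
    On SMS, the grid would be split into subregions Si; after each
    sweep, Si would receive the boundary points of Si+1, Si+n, Si-1
    and Si-n before the next autonomous computation could start."""
    grid = [[0.0] * n for _ in range(n)]
    for j in range(n):
        grid[0][j] = 1.0
    for _ in range(sweeps):
        grid = jacobi_step(grid)
    return grid

g = solve()
print(g[5][5])   # an interior value between the hot and cold edges
```

The interesting point for parallelization is that `jacobi_step` touches only nearest neighbours, so a subregion needs just one row or column of points from each adjacent subregion per sweep.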

RELIABILITY ASPECTS

Large arrays of identical microcomputers can provide high reliability. There are two reasons for this: the use of standard mass-produced components; and the addition of free modules to the system to replace modules which have failed. The experience gained using the system SMS 201 in extended non-stop operation forms the background for the following investigations. We look at the reliability of the microcomputer array without the interconnection network and without either the frontend or the control computer. Suppose we require n1 computer modules to solve the application problem. If spare modules are added so that n2 modules are installed in total, the system reliability RA(t) is 10:

R_A(t) = \sum_{i=n_1}^{n_2} \binom{n_2}{i} a_M^i (1 - a_M)^{n_2 - i}    (1)

where a_M is the reliability of a single module:

a_M = e^{-\lambda t}    (2)

where \lambda is the failure rate of a single module. Thus

R_A(t) = \sum_{i=n_1}^{n_2} \binom{n_2}{i} e^{-\lambda t i} (1 - e^{-\lambda t})^{n_2 - i}    (3)
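Equations 1-3 are straightforward to evaluate numerically. The sketch below is illustrative: the function names are invented, and the failure rate of roughly 10^-5 per hour is an assumption in the region suggested by the SMS 201 test run, not a quoted figure.

```python
from math import comb, exp

def module_reliability(lam, t):
    """Equation 2: reliability a_M = exp(-lambda * t) of one module."""
    return exp(-lam * t)

def array_reliability(n1, n2, lam, t):
    """Equation 3: probability that at least n1 of the n2 installed
    module computers are still working at time t."""
    a = module_reliability(lam, t)
    return sum(comb(n2, i) * a**i * (1 - a)**(n2 - i)
               for i in range(n1, n2 + 1))

lam = 1.0e-5          # assumed failure rate per hour
t = 4380.0            # about six months of non-stop operation
bare  = array_reliability(100, 100, lam, t)   # no spares
spare = array_reliability(100, 120, lam, t)   # 20 spare modules
print(bare, spare)
```

Even with these rough inputs the effect reported in the text is reproduced: without spares the array reliability collapses, while a modest number of spare modules keeps it close to one.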

In Figure 8, Equation 3 is drawn for n1 = 100 and various n2. The failure rate λ of a single module is taken from a one-year test run of the SMS 201; it corresponds to about one failure per 90 000 h 10. The increase in system reliability which is achieved by adding some spare modules is clearly shown. For example, the reliability after six months increases from 0.04 to 0.95 if 20 spare modules have been added. In SMS 3, modules which have failed can be deactivated and replaced automatically under software control. The user program has to be reloaded after reconfiguration.


Figure 8. Reliability RA of an array of computers against time t (n1 = 100 out of n2 microcomputers are required to perform the calculations)

If checkpointing is used, the computation can continue after the last checkpoint. Without checkpointing the user program has to be restarted from the beginning.

FUTURE TRENDS

Today multimicroprocessor systems such as SMS already represent an economical alternative to large scientific computers. Owing to the rapid progress in semiconductor technology, especially in very large scale integration, the immediate future will bring still more powerful systems with improved price:performance ratios. To make use of this trend necessitates further endeavours, primarily in two directions: firstly, with regard to the further development of parallel algorithms and procedures; and secondly, against a background of constantly rising software costs, with regard to the testing and introduction of higher-level hardware-independent languages for formulating parallel processes for multicomputer systems.

REFERENCES

1 Flynn, M J 'Some computer organizations and their effectiveness' IEEE Trans. Comput. Vol C-21 (1972) pp 948-960
2 Rumbaugh, J 'A data flow multiprocessor' IEEE Trans. Comput. Vol C-26 No 2 (1977) pp 138-146
3 Wallach, Y and Konrad, V 'On block-parallel methods for solving linear equations' IEEE Trans. Comput. Vol C-29 No 5 (1980) pp 354-359
4 Kober, R 'Parallel system structures' Siemens Res. Devel. Report Vol 7 No 6 (1978) pp 316-318
5 Kober, R and Kuznia, C 'SMS - a multiprocessor architecture for high speed numerical calculations' Proc. Int. Conf. on Parallel Processing (1978) pp 18-24
6 Palmer, J et al 'Making mainframes accessible to microcomputers' Electronics Vol 53 No 11 (1980) pp 114-121
7 Braunleder, B, Goetz, P and Tanner, G 'Parallel processing with 128 microprocessors - part I: architecture and operation of a multiprocessor system' Siemens Res. Devel. Report Vol 9 No 6 (1980) pp 330-333
8 Braunleder, B 'Decomposition into quasi-independent subproblems - the solution of large problems with a structured multimicroprocessor' Proc. Int. Microcomputers, Minicomputers, Microprocessors Conf. (1980) pp 165-170
9 Braunleder, B and Tanner, G 'Parallel processing with 128 microprocessors - part II: programming and use of a multiprocessor system' Siemens Res. Devel. Report Vol 10 No 1 (1981)
10 Tomann, S and Liedl, J 'Reliability in microcomputer arrays' Microprocessors and Microprogramming Vol 7 No 3 (March 1981) pp 185-190
