A Computer Called Ramrod


Copyright © IFAC 9th Triennial World Congress, Budapest, Hungary


A COMPUTER CALLED RAMROD

A. E. Rabinowitz and M. G. Rodd
Department of Electrical Engineering, University of the Witwatersrand, Johannesburg, South Africa

Abstract. As the demands made on process-control computers increase with the sophistication of applications, designers of such computers are being forced to review the fundamental architecture. Process-control computers will have to achieve high reliability through fault-tolerance, while providing for intensive parallel processing in order to meet Real-Time requirements. Despite recent advances in the design of custom VLSI components, the cost of microprocessors continues to plummet. It seems reasonable, then, to find computing structures which will utilize these components as parallel processing elements to achieve the stated requirements. This paper describes a highly parallel, closely-coupled distributed computing system, called RAMROD, which utilizes such freely-available microprocessor components. It achieves its desired goals through the use of a data-flow architecture, combined with inherent fault-tolerant features. The paper discusses the architecture and describes actual results that have been obtained.

Keywords. Multiprocessors, Data Flow Computers, Parallel Processors, Process Control Computers.

1. INTRODUCTION

Current research into future computer systems for use in process-control is aimed at producing structures which:

* have a high throughput
* exhibit a high degree of fault-tolerance
* are cost-effective
* are simple, from both hardware and software points of view
* are reliable and easy to maintain
* are able to be tested, despite possible complexity.

Various methods of achieving some or all of these objectives have been proposed, but as yet no completely satisfactory solutions have emerged. Whilst many researchers are putting their efforts into the use of inherently ultra-high-speed devices, these solutions seem very expensive, requiring highly sophisticated environmental conditions. A more reasonable approach appears to lie in the adoption of structures which are capable of a high degree of parallel processing. Numerous systems have been suggested and, in fact, some have been developed, but generally they tend to suffer from complexity.

In order to achieve high speeds in computation, the data-flow approach offers much for future machines. Data-flow machines attempt to provide concurrency, and a data-flow language should reflect the programmer's thoughts while making the computer's architecture transparent. To address this objective, a structure is proposed in this paper which defines the features being explored. The guidelines for the design may be broadly stated as:

1. The processing elements are to be identical and must make use of freely-available, low-cost microprocessor chips.

2. No distinction is to be made between local memory and global memory.

3. Overall control is to be invested in a high-speed, hardware-based Real-Time operating system.

Clearly, system software will be an essential feature of the ultimate system. The first stage of the project has resulted in a skeleton operating system. Parallel to the work described in this paper are various software engineering tasks which will address problems such as program decomposition, system verification and language primitives. The computer system developed is intended for use in Real-Time process-control applications. Clearly, in addition to a high response time, reliability is an essential feature of such machines. While the problem of software reliability is still a major issue, this paper is directed towards hardware reliability. Here, an important premise is the assumption that a processor will fail at some time, and hence the use of redundancy becomes inevitable.

As will be shown in this paper, such redundancy can be achieved at a relatively low cost. A failure in one processor will cause another to take over its function, thus resulting in a fault-tolerant structure. If each processor is identical to the next and they are transparent to each other, then servicing becomes easier, and the maintenance technician simply needs to take out the processor board and replace it with another. This is also a factor adding to the system's simplicity, especially as self-diagnosis is to be an inherent feature of the ultimate structure. In terms of Kopetz's definition (Ref. 8), a processor unit or a memory unit is regarded as the "smallest replaceable unit".

Resource sharing, of all processing facilities as well as of peripheral devices, also contributes to the reduced cost of the system, thus giving increased performance. The multiprocessor system developed is modular and can be expanded up to a maximum configuration which has been set at 50 processors.
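The data-flow style of decomposition behind these guidelines can be illustrated with a small sketch. This is our own illustration, not the authors' software: independent "processor" nodes each reduce their own subarray, so all of them may fire concurrently, and a final node combines the partial results (the same maximum-of-an-array exercise is later used to evaluate RAMROD). Function and parameter names here are hypothetical.

```python
# Illustrative sketch (ours) of data-flow decomposition: each node
# depends only on its own subarray, so the partial reductions can run
# concurrently; a final node computes the "absolute maximum".
from concurrent.futures import ThreadPoolExecutor

def parallel_max(values, n_workers=8):
    """Split 'values' into n_workers subarrays, reduce each
    independently, then reduce the partial results."""
    chunk = max(1, -(-len(values) // n_workers))  # ceiling division
    subarrays = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_maxima = list(pool.map(max, subarrays))
    return max(partial_maxima)  # the combining node
```

Because no node alters another node's variables, the decomposition has no side-effects, which is precisely the modularity property claimed for data-flow programs.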

2. DESIGN STRATEGY

In any multi-computing system there are many options available for interconnecting the processors and memory. This section provides a brief review of the types of processor-memory switches and justifies the final design philosophy developed.

Shared Bus

The simplest switch for a multiprocessor system is a common bus, connecting the units as shown in Figure 1a. The shared bus can be centrally polled, i.e. the processors only transmit when selected by the controller. Bus contention is avoided by using schemes such as:

1. Fixed priorities, which allow a processor with a high priority to gain access to the bus even if another, lower-priority processor presently has access.

2. First-in-first-out: the processor which first made the request is granted access to the bus.

3. Daisy chaining: the processors are asked in turn whether they have made a request, and a processor can be given access to the bus only when its turn arrives.
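The three polling schemes can be viewed as grant-selection rules over the set of processors currently requesting the bus. The sketch below is purely illustrative (the names and data structures are ours, not part of the original design):

```python
# Illustrative grant-selection rules for the three polling schemes.
from collections import deque

def fixed_priority_grant(requests):
    """Fixed priorities: lower processor id = higher priority."""
    return min(requests) if requests else None

class FifoArbiter:
    """First-in-first-out: grants are issued in request-arrival order."""
    def __init__(self):
        self.queue = deque()
    def request(self, proc_id):
        self.queue.append(proc_id)
    def grant(self):
        return self.queue.popleft() if self.queue else None

def daisy_chain_grant(requests, n_procs, start=0):
    """Daisy chaining: poll processors in turn from 'start'; the first
    one found with a pending request is granted the bus."""
    for k in range(n_procs):
        p = (start + k) % n_procs
        if p in requests:
            return p
    return None
```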

On the other hand, the bus may be interrupt-driven by the processors, which request bus usage. This scheme allows random usage of the bus. The bus can be a Time-Division Multiplexed (TDM) bus, where each processor is allocated a time slot, or it can be Frequency-Division Multiplexed (FDM), in which each processor has a particular transmit/receive frequency. This scheme is the least costly in terms of the hardware used, and is also the least complex in terms of components, as the bus can be totally passive.

Cross-Bar Switch

The cross-bar switch, as shown in Figure 1b, has separate paths from each processor to each memory and I/O unit. The functional units (processors, memories and I/O) need not be concerned with the bus interface, as the switch contains all the necessary logic. This is the most complex interconnection scheme, because the number of connections is necessarily large and because of the extra logic needed in the switch. However, reliability is reasonable and can be improved by redundancy of the units.

Multiport Memory

In a multiport memory system the control, switching and priority arbitration are concentrated at the interface to the passive units, and not in the switch as in the cross-bar scheme. Figure 1c shows that each processor has a separate port and bus connecting it to each memory and I/O unit. This approach is the most expensive, since multiport memories are costly and each memory has to have contention logic built in, in order to arbitrate between processors competing for the resource.

Shared Memory

All of the interconnection strategies previously mentioned typically use shared memory, which provides a means of interaction between microprocessors. This interaction can be enhanced if there is no distinction between the local memory of a single processor and the global memory of the multiprocessor system. Inherent in Enslow's definition (Ref. 1) is the concept of a multiprocessor system being a so-called "tightly-coupled" distributed computer. This implies that the various processors should be in close proximity to each other and should typically have access to a common memory and I/O system.

3. A PROPOSED SOLUTION

The architecture of the tightly-coupled distributed computer proposed in this paper has been suggested directly by Enslow's definition. All microprocessors are identical and transparent to each other. They share a common memory by using a Time-Division Multiplexed bus, and they share a common Input/Output bus system (see Figure 1a). The initial key to the success of such a system lies in the system-management methodology selected. In line with a policy of using as much high-level hardware as possible within the design (and hence decreasing the dependence on software), the controlling operating system is deeply embedded in an advanced hardware structure. The feasibility of such an operating system has been investigated, and shown to hold much promise (Ref. 2). In this investigation a real-time operating system was developed and implemented in a bit-slice, microprocessor-based hardware unit, controlled via microcode. The system was used to control up to 5 different computers, working together in a tightly-coupled fashion.

The second important aspect of the proposed approach lies in the design of the processor-memory switch. It is proposed that a single, Time-Multiplexed, Processor-Memory switch (i.e. a shared bus) be used. Control over the switch is effected by the hardware-based real-time operating system, termed a "Master Controller" (MC). The full operation of this bus will be discussed in a later section.

One of the most interesting aspects of such a system is that of speed. To make use of conventional microprocessors with their relatively slow processing times, a system must be produced to run these devices without any additional overheads being introduced by the Processor-Memory switch. The answer to this lies in the timing concepts illustrated in Figure 2. Using a conventional microprocessor, the average cycle time for a fetch instruction is approximately 1 microsecond, this time being set primarily by the speed of memory access. While the memory is being accessed, the processor (and therefore the bus) is idle, and this time can be utilized by other processors. Thus, each processor uses the bus for a very short period; in the chosen example, this will amount to 20 nanoseconds. The need, therefore, is for a bus system which switches very rapidly between processors. It is here that advanced, high-speed logic designs are required. Emitter-Coupled Logic (ECL) has been chosen as the logic of the shared bus, being the fastest technology currently available. Each processor has a TTL-ECL interface to the bus,


which is itself controlled by the MC. Each microprocessor also has an interface to the intelligent Input/Output bus, onto which all peripherals are connected. The principle of the system is shown in a simplified form in Figure 3.

4. MASTER CONTROLLER

The Master Controller (MC), as stated above, is the single operating system of the tightly-coupled distributed computer system. It has many of the functions of a normal operating system, e.g. memory allocation, scheduling and dispatching. Tasks are loaded from an external medium into the system by the MC. Each task is allocated to an available processor, and may not exceed a specified length. A processor can have the status of "running", "available" or "failed", and the MC keeps a list showing the status of each. A Task Control Block is kept for each task, indicating where the task lies in memory and to which processor it has been allocated. Dispatching of tasks to processors is effected by the MC, which programs the particular processor with the starting address of the task. The concept is, in fact, that of a virtual-machine system, in which a given processor has available only a block of memory within its normal address range. This overcomes the need for a memory-mapping hardware unit. An inherent feature resulting from this approach is that a processor can in fact address more than one block of memory, thus producing a true common-memory structure. The MC schedules a processor to run by allowing it access to the common bus and the common memory. As all processors are identical and transparent to one another, any processor can execute any task. Updating of the processor-availability queues is achieved by constant monitoring. If a processor has failed it is effectively removed from the bus, and when it is replaced, the MC must detect this. When a task has successfully run to completion, the MC can add that processor to the available list as well.

The Master Controller is implemented using bit-slice microprocessor components. This approach was adopted because a bit-slice-based processor can be optimally designed for a particular function requiring more speed than a conventional microprocessor can provide. It has been shown (Ref. 2) that the MC must be at least an order of magnitude faster than any individual system processor. The Master Controller is divided into two units, the Computer Control Unit (CCU) and the Arithmetic Unit (Figure 4). The CCU consists of a Microprogram Sequencer and a Multiplexer unit, which address the microprogram memory. The Sequencer addresses memory under microprogram control, and can perform functions such as "continue", "branch", "test" etc. The Pipeline Register makes the next word available to the rest of the system while the current word is being executed, resulting in a significant improvement in performance. The Arithmetic Unit has an ALU which can do normal arithmetic functions; a fast-carry look-ahead generator improves the speed of calculation. A Status Register holds all the system flags for use by the Multiplexer and Sequencer in the conditional branch instructions. External memory is available for scratch-pad use; data and addresses from the processor pass through latches, enabled under program control. The Master Controller has a 16-bit internal data word and a 16-bit internal address space. It is designed from 4-bit elements, which are cascaded to give the 16-bit word. Apart from the CCU, ALU and memory, an interrupt unit is provided so that a failed processor can alert the MC.
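The MC's bookkeeping (per-processor status lists and a Task Control Block per task) can be modelled schematically. The model below is our own, hypothetical reconstruction of the behaviour described above, not the actual microcode; all class and method names are assumptions.

```python
# Schematic model (ours) of the Master Controller's bookkeeping:
# per-processor status and a Task Control Block recording where the
# task lies in memory and to which processor it was dispatched.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaskControlBlock:
    task_id: int
    memory_block: int                 # start address of the task's memory block
    processor: Optional[int] = None   # filled in at dispatch time

class MasterController:
    def __init__(self, n_procs):
        self.status = {p: "available" for p in range(n_procs)}
        self.tcbs = {}

    def dispatch(self, tcb):
        """Allocate the task to any available processor; since all
        processors are identical, any one will do."""
        for p, s in self.status.items():
            if s == "available":
                self.status[p] = "running"
                tcb.processor = p
                self.tcbs[tcb.task_id] = tcb
                return p
        return None  # no processor free; task must wait

    def task_complete(self, task_id):
        """A successful completion returns the processor to the list."""
        p = self.tcbs[task_id].processor
        self.status[p] = "available"

    def processor_failed(self, p):
        self.status[p] = "failed"  # removed from the bus until replaced
```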

5. BUS STRUCTURE

The adoption of a Time-Multiplexed bus imposes severe constraints on the technology used in its implementation. Excluding some of the more exotic technologies currently under development (such as IBM's Josephson Junction technique), only two logic families appear to offer suitable characteristics. These are Emitter-Coupled Logic (ECL) and Advanced Schottky TTL (AST). ECL has about twice as fast a propagation time as AST, but its logic levels are not TTL-compatible, leading to interfacing difficulties. A bi-directional, logic-level-translating latch is now available in ECL, however, and this provides a reasonably rapid means of converting from the normal microprocessor logic levels to those of ECL. "Emitter-Coupled" refers to the manner in which the emitters of a differential amplifier within an integrated circuit are connected. The differential amplifier provides high-impedance inputs and low-impedance outputs. Thus ECL circuits can drive transmission lines, and have a high fanout capability. This makes them suitable for driving the bus structure required in this design. Preliminary calculations and experimental experience have shown that the bus can have up to 50 processors connected to it on one side, with appropriate memory blocks on the other.

The buses are time-shared among all processors. Thus, if a processor cycle is 1 microsecond, then the time during which each processor has access to the bus is 1 microsecond/50, i.e. 20 nanoseconds. The cyclic operation is as follows: each processor deposits data and addresses into its latches, and when these are given access to the bus, the data and addresses are transferred into latches on the other side of the bus. Memory is thus addressed; in a similar fashion, data is retrieved from memory by the processor. Note that the data and addresses can be sent to more than one set of memory latches. A shift register controlled by the MC is the basic bus controller. This shifts at a frequency of 50 MHz, and every processor's set of latches is enabled in turn, irrespective of whether a processor requires memory access or not. Bus contention is avoided by this time-sharing process. It does obviously imply that there will be some redundancy, but this is preferable to complex contention-resolving circuitry.
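The slot arithmetic above can be checked directly: a 1-microsecond cycle divided among 50 processors gives 20 ns per slot, i.e. a 50 MHz shift clock. The few lines below (ours, purely illustrative) also show the fixed round-robin slot assignment performed by the shift register.

```python
# Back-of-the-envelope check of the shared-bus slot timing.
CYCLE_S = 1e-6          # one microprocessor memory cycle (1 microsecond)
N_PROCESSORS = 50       # maximum configuration

slot_s = CYCLE_S / N_PROCESSORS   # 20 ns of bus time per processor
shift_clock_hz = 1.0 / slot_s     # 50 MHz shift-register clock

def slot_owner(slot_index, n=N_PROCESSORS):
    """Round-robin assignment: slot k always belongs to processor
    k mod n, whether or not that processor has a request pending --
    the deliberate redundancy that avoids contention logic."""
    return slot_index % n
```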

6. INPUT/OUTPUT BUS

To avoid complicating the above bus structure, it was decided to allow processor-to-processor communication through the I/O bus. There is, however, also the possibility of communicating via the common memory, and this will be investigated at a later stage. The I/O bus also serves to communicate with peripherals, and thus it must have a high degree of intelligence. Of the intelligent buses available, Ethernet (Ref. 3) was chosen. Ethernet has in fact the highly desirable feature that no bus controller is required: any device wishing to use the bus simply "listens" and seizes it when the bus is free. Simultaneous transmissions are ignored and re-transmission takes place after a "random" wait time.
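The listen-then-seize discipline can be sketched as a simple decision rule. This is a loose illustration (ours), not Ethernet's exact algorithm; real Ethernet uses carrier sense with truncated binary exponential backoff, and the parameter names below are assumptions.

```python
# Loose sketch of a listen/seize/backoff rule in the Ethernet style.
import random

def try_transmit(bus_busy, collided, attempt, max_backoff_slots=16):
    """Return ('send', 0) when the device may seize the bus, or
    ('wait', slots) when it must defer or back off."""
    if bus_busy:
        return ("wait", 1)   # keep listening until the bus is free
    if collided:
        # the "random" wait time before re-transmission
        slots = random.randint(0, min(2 ** attempt, max_backoff_slots) - 1)
        return ("wait", slots)
    return ("send", 0)       # bus free, no collision: seize it
```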

7. SOFTWARE

As a research vehicle, much interest lies in the use of the basic structure being developed to investigate various parallel-processing ideas. For the immediate project, the following software was developed, and will provide a framework for further work:

1. A test program to test the hardware facilities of the bit-slice Master Controller.

2. The operating system, to be resident as a microprogram in the MC, consisting of:

(a) a scheduler, with associated queues and lists,
(b) a monitor, for determining a processor failure,
(c) subroutines for determining successful completion of tasks,
(d) a dispatcher which starts or stops the shift register, and
(e) a memory manager, which loads the tasks into memory.

3. A user interface for task programming.

4. A test task for the microprocessor modules.

5. A system diagnosis program.

6. Monitor programs incorporated within each processor module solely for the purpose of self-testing. These will normally be invoked after the completion of every task-processing session.

The application software for RAMROD is based on data-flow principles. A data-flow language is defined as a "language based entirely upon the notion of data flowing from one functional entity to another" (Ref. 4). Data-flow programs are extremely modular. Each module can be understood entirely on its own and has no side-effects, such as altering another module's variables. Data-flow languages can thus be represented graphically. Modules that look independent can be executed independently, and modules can therefore run concurrently. The data-flow concept can be extrapolated to conventional von Neumann structures by having processing elements operating simultaneously on tasks. This concept lends itself ideally to an array-processor structure. Further information may be found in Ref. 5.

8. ERROR DETECTION

The system developed has many features which make for ease of error detection. It has become clear from many studies that the bulk of run-time error conditions may be detected in the time domain rather than in the data domain. The structure of RAMROD allows for this, each processor being able to generate an error signal via a time-out. The MC can set limits on the time for which a specific task is permitted to run, so if the processor exceeds this time, an error condition is detected by the MC. In either case, the MC will simply not make use of the failed module until its faults have been corrected. Likewise, the condition of the MC can be monitored by the various processors: if they do not receive a task for any length of time, they themselves can pass an error message. Potentially, then, the system offers automatic error detection and inherent fault-tolerance. Another area of potential difficulty is the high-speed bus. This is presently merely duplicated, but future work is required in order to overcome any bus error.

9. RESULTS OBTAINED

As the architecture of RAMROD is similar to that of an array processor, it was decided to evaluate RAMROD by making it perform an exercise on an array of integers. The program, based on data-flow principles, calculates the maximum of an array of integers. Various processors calculate the maxima of subarrays, and then another processor calculates the absolute maximum. The results obtained indicate that RAMROD performs very well under the given conditions. The total run-time for RAMROD to calculate the maximum of 96 integers is approximately 1 millisecond, while it takes one slave processor 2.6 milliseconds to do the same operation. It should be noted that the Ethernet I/O system has not been fully constructed, due to the unavailability of components. In the interim, a system of direct I/O from specific processors was adopted.

10. CONCLUSION

The tightly-coupled distributed microprocessor system described in this paper has been aimed at exploring the use of readily-available, state-of-the-art components to produce a highly efficient and fault-tolerant computer. Clearly the weakness, from the point of view of fault-tolerance, lies in the bus structure, which could be the bottleneck and the weak link in the chain of control. To overcome this, the bus is designed as a duplicated system, where a fault detected on one bus would cause control to be switched to the second bus. The MC would possibly have to be duplicated. It is also apparent, however, that a failure of the master controller does not necessarily imply that the system as a whole will fail. A degree of operation can be maintained, and only the shift registers will have to be fault-tolerant. Finally, the actual processors can themselves be used as watchdogs on the MC.
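The mutual watchdog arrangement just mentioned, together with the time-domain error detection described earlier, reduces to two simple checks. The sketch below is our own schematic illustration; the function names and thresholds are assumptions, not part of the original design.

```python
# Schematic illustration (ours) of time-domain error detection.

def mc_checks_processor(elapsed, limit):
    """MC side: a task exceeding its permitted run time marks the
    processor as failed until its faults have been corrected."""
    return "failed" if elapsed > limit else "running"

def processor_checks_mc(idle_time, threshold):
    """Processor side: a long silence from the MC (no task received)
    is itself treated as an error condition."""
    return "mc_error" if idle_time > threshold else "ok"
```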

11. ACKNOWLEDGEMENTS

The authors wish to express their gratitude to the University of the Witwatersrand for providing the facilities for the work undertaken and for assistance in the preparation of this paper. They also wish to thank the CSIR and Perseus Computing and Automation for funding the project.

12. REFERENCES

1. Enslow, P., "Multiprocessors and Parallel Processing", John Wiley and Sons, New York, 1974.

2. [Reference not legible in the source.]

3. Metcalfe, R.M., and Boggs, D.R., "Ethernet: Distributed Packet Switching for Local Computer Networks", Comm. ACM, Vol. 19, No. 7, July 1976, pp. 395-404.

4. Ackerman, W.B., "Data Flow Languages", IEEE Computer, Vol. 15, No. 2, February 1982.

5. Rabinowitz, A.E., "RAMROD - An Experimental Multi-Microprocessor", PhD Thesis, University of the Witwatersrand, 1982.

6. Guth, R., and LaLive d'Epinay, Th., "The Distributed Data Flow Architecture of Industrial Computer Systems", Brown-Boveri Research Center, Baden, Switzerland, 1983.

7. Davis, A.L., and Keller, R.M., "Data-flow Program Graphs", IEEE Computer, Vol. 15, No. 2, February 1982.

8. Kopetz, H., et al., "The Architecture of MARS", Technical University of Berlin Report MA 82/2, April 1982.

[The figures are not reproduced in this text; their captions were:]

Figure 1a: Shared Bus
Figure 1b: Cross-bar Switch
Figure 1c: Multiport Memory
Figure 2: Timing on the Shared Bus (address to memory and data from memory, per processor, within one microprocessor cycle)
Figure 3: RAMROD Block Diagram (Master Controller designed from the 2900 bit-slice series family; random-access memory)
Figure 4: Master Controller (Computer Control Unit and Arithmetic Unit, with address and data buses)