North-Holland Microprocessing and Microprogramming 23 (1988) 149-152
A NETWORK OF TRANSPUTERS TO EMULATE A PARALLEL SYMBOLIC PROCESSOR
G. Goncalves, M.P. Lecouffe, B. Toursel, S. Niar
Université des Sciences et Techniques de Lille Flandres-Artois, Laboratoire d'Informatique Fondamentale de Lille (LIFL), 59655 VILLENEUVE D'ASCQ CEDEX, FRANCE
This paper deals with the N-ARCH project, developed at the University of Lille, and its implementation on a network of INMOS transputers. The aim of the project is to design a parallel architecture with non-von Neumann control modes, suited to the processing of declarative languages [Tour 87]. The first section describes the main characteristics of the architecture. The second details the basic functions and the structure of a network node. The final section presents a prototype implementation of N-ARCH based on a network of transputers.
I. MAIN CHARACTERISTICS OF THE ARCHITECTURE
We are presently investigating the design of a Parallel Symbolic Processor (PSP) in which objects are distributed uniformly in order to provide maximum parallelism. One of the most important characteristics of the PSP architecture is the use of LINET, the Lille Intelligent NETwork. In LINET, each node of the network incorporates communication and processing facilities and an associative memory. Both the allocation of objects among the nodes and the routing of messages are realized by a similar method, based on hashing techniques. The hashing functions, applied to the pair (object name, context), are used to determine in which node a piece of information is stored (global searching) and then to switch messages from one node to another. Collision problems are solved using spanning trees [Gonc 86]. This method allows a uniform distribution of information and traffic over the network of PSPs. Message routing is done concurrently with evaluation inside the nodes.
Another major point is the use, in each node of LINET, of content-addressable memories for local searching. All objects (data, expressions, functions, ...) are represented by tagged information, and the object name and the context are incorporated into the tags. These tags can be used as selection criteria for associative memory operations.
Another significant difference from other work is the ability to use distinct control modes simultaneously. Messages circulating in LINET may be either request packets for an object or answers delivered by a node for some object known by its name. The request mechanism permits the evaluation of functional programs by reduction; answers may be either values or expression definitions, which allows us to implement graph reduction, string reduction or data-flow control. Moreover, depending on object characteristics (such as size or type), it would be possible to use string reduction in some cases and graph reduction in others within a single program evaluation.
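The hashing-based allocation described above can be illustrated by a minimal Python sketch. The paper does not give the hashing function, so SHA-256, the name `home_node` and the constant `NODE_COUNT` are assumptions made for illustration only; collision handling by spanning trees [Gonc 86] is not shown.

```python
import hashlib
from collections import Counter

NODE_COUNT = 16  # assumed network size, not a value fixed by the architecture


def home_node(name: str, context: str, node_count: int = NODE_COUNT) -> int:
    """Map the pair (object name, context) to a node number by hashing.

    SHA-256 is only a stand-in for the unspecified hashing function.
    """
    key = f"{context}\x00{name}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % node_count


# The same function is used to store an object and to look it up, so a
# requester can locate the holder without any central directory.
placement = Counter(home_node(f"x{i}", "main") for i in range(1600))
```

With a well-mixing hash, `placement` should show a roughly even number of objects per node, which is the uniform distribution the text aims for.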
II. AN N-ARCH NODE
The properties of systems which deviate from the traditional "von Neumann" architecture (reduction machines or data-flow computers) show that associative mechanisms are needed for fast processing, i.e. in order to isolate a particular piece of information inside a large space. During the last five years, with the advance of VLSI technology, many researchers have designed large associative memories [Kadota 85] or "intelligent memories" [O'Don 85] in which information is accessed according to its content. In N-ARCH we introduce the concept of "intelligent" nodes which incorporate such memories.
II.1. The basic functions of a node
Each node must be able to perform the following tasks:
- Object storing is achieved by means of a global hashing function; both static objects (definitions of expressions, ...) and dynamic objects (request packets, ...) are stored and distributed using this method.
- Updating: in demand-driven or data-driven control it is always necessary to update objects or control information in the memory (in graph reduction, when an expression is completely reduced, its value must replace its definition).
- Object selection is always done by associative techniques. Objects are selected either by their name (request or result packets) or by their type (executable instruction or reducible expression).
- Expression evaluation, which must be done to process expressions which are totally reduced (graph reduction) or executable instructions (data-flow mode).
- Communication with neighbouring nodes. A node is able to send messages to (or receive messages from) the nodes which are physically connected to it, by means of a specific communication mechanism.
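The selection and updating tasks listed above can be sketched in Python as follows. This is only an illustration: the class names `TaggedObject` and `AssociativeMemory`, the field names and the `kind` values are hypothetical, and the sequential scan stands in for a content-addressable memory, which would compare all cells in parallel.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class TaggedObject:
    name: str      # object name, part of the tag
    context: str   # context, part of the tag
    kind: str      # assumed type tag, e.g. "expression" or "value"
    value: Any


class AssociativeMemory:
    """Software stand-in for a content-addressable memory."""

    def __init__(self) -> None:
        self._cells: List[TaggedObject] = []

    def store(self, obj: TaggedObject) -> None:
        self._cells.append(obj)

    def select(self, **criteria: Any) -> List[TaggedObject]:
        """Return every cell whose tag fields match all given criteria."""
        return [cell for cell in self._cells
                if all(getattr(cell, k) == v for k, v in criteria.items())]

    def update(self, name: str, context: str, value: Any) -> bool:
        """Replace the definition of a reduced expression by its value."""
        for cell in self.select(name=name, context=context):
            cell.kind = "value"
            cell.value = value
            return True
        return False
```

Selection by name corresponds to `select(name=..., context=...)` and selection by type to `select(kind=...)`.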
II.2. The structure of a node
The structure of a node is shown in figure 1.
[fig 1: block diagram of a node, showing the Input Unit, the Control Unit, the associative memories (one labelled "packets"), the Processing Unit and the Output Unit]
Two associative memories are used: the first one stores programs and the second one stores dynamic objects such as packets (result or request packets). The control unit examines the messages received by the input unit to determine whether each message is in a routing step or in a searching step. In a routing step, the node is not the consumer of the message, so the message is sent to the output unit. In a searching step, a request is made to the associative memory to determine whether the searched object is present; if it is not, a local hashing function is applied to give the number of a successor node where the search must continue. The processing unit performs the evaluation of expressions, and the output unit sends the result and request packets produced by the node as well as routed messages.
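The decision taken by the control unit can be sketched in Python as follows. The paper does not say how a message indicates its step, so the `target_node` field is an assumption, as are the names `Message`, `Node` and `handle`; the ring-shaped `successor` function is only a placeholder for the local hashing function.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Message:
    target_node: int  # assumed field: node designated by the global hashing
    name: str
    context: str


class Node:
    def __init__(self, number: int, node_count: int) -> None:
        self.number = number
        self.node_count = node_count
        self.memory: Dict[Tuple[str, str], object] = {}

    def successor(self, msg: Message) -> int:
        # Placeholder for the local hashing function (assumption):
        # here simply the next node on a ring.
        return (self.number + 1) % self.node_count

    def handle(self, msg: Message) -> Tuple[str, object]:
        # Routing step: this node is not the consumer of the message.
        if msg.target_node != self.number:
            return ("route", msg.target_node)
        # Searching step: query the local associative memory.
        key = (msg.name, msg.context)
        if key in self.memory:
            return ("found", self.memory[key])
        # Object absent: the search continues at a successor node.
        return ("continue", self.successor(msg))
```

The three outcomes correspond to forwarding through the output unit, answering locally, and continuing the search at the successor node.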
III. PROTOTYPE IMPLEMENTATION
Currently, we are investigating the design of a prototype implementation of N-ARCH in two distinct ways:
* The study of VLSI chips to realize the different units of a node, such as the associative memory.
* The use of a network of transputers which emulates the basic functions of a node (described in the following part).
Transputer implementation
In this implementation we use an AT computer with an IMS B004 add-in board (INMOS) connected to a network of 8 to 16 T414 Transputers (IMS B003 boards). A Transputer (INMOS) is a VLSI microcomputer which integrates a processor, memory and inter-computer data links on a single chip [Wal 85]. The data links ease the design of various network topologies (hypercube, bubble, ...). The processor supports concurrency with a microcoded scheduler. The OCCAM programming language, based on the CSP model, is directly handled by the transputer.
A first version of a minimal N-ARCH kernel has been written in OCCAM. It emulates the basic functions of a node and incorporates facilities to execute functional programs in a by-reduction control mode. As a first step these programs consist only of expressions; function evaluation will be integrated later. In this version each Transputer of the network runs the same N-ARCH kernel program.
The kernel is structured as concurrent processes so that no time is wasted during inter-computer communications. We distinguish two types of process: communication processes and internal processes. A communication process needs, during its execution, to send or receive information through a physical link unit (external channel). An internal process requires only internal channels to execute, so only the processor unit is used. Because the four link units and the processor unit of the Transputer are autonomous, communication processes and internal processes can run concurrently. Figure 2 shows the decomposition of the N-ARCH kernel into OCCAM parallel processes.
[fig 2: decomposition of the N-ARCH kernel into processes, with 4 input channels, 4 output channels, memory and the execution process]
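The idea of a kernel built from processes that communicate only through channels can be approximated in Python with threads and queues. This is a loose analogy, not the Occam kernel: the names `stage`, `run_pipeline` and `STOP` are hypothetical, only two stages are shown, and Python threads do not provide the hardware concurrency of the Transputer's link units and processor.

```python
import queue
import threading

STOP = None  # sentinel used to shut the pipeline down (assumption of this sketch)


def stage(work, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """One process: read from its input channel, work, write to its output."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


def run_pipeline(packets):
    link_in, to_control, link_out = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        # Stand-in for a communication process: passes packets on unchanged.
        threading.Thread(target=stage, args=(lambda p: p, link_in, to_control)),
        # Stand-in for an internal process: marks each packet as handled.
        threading.Thread(target=stage,
                         args=(lambda p: ("handled", p), to_control, link_out)),
    ]
    for t in threads:
        t.start()
    for p in packets:
        link_in.put(p)
    link_in.put(STOP)
    for t in threads:
        t.join()
    results = []
    while True:
        item = link_out.get()
        if item is STOP:
            break
        results.append(item)
    return results
```

Each stage blocks only on its own channels, so a slow stage does not prevent the others from accepting work as long as the queues have room.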
At the top level, the Occam translation of this scheme is:
  PAR
    INPUT.PROCESS(...)
    INPUT.QUEUE.PROCESS(...)
    CONTROL.PROCESS(...)
    PROGRAM.MEM.PROCESS(...)
    PACKET.MEM.PROCESS(...)
    EXECUTION.PROCESS(...)
    OUTPUT.QUEUE.PROCESS(...)
    OUTPUT.PROCESS(...)
Communication processes
The Input process and the Output process each use the link units to communicate with the neighbouring nodes.
Input process: the function of this process is to receive input messages on each of the four external channels by means of four parallel processes, one per channel. In OCCAM, we may write:
  PAR i = [0 FOR 4]
    WHILE TRUE
      SEQ
        receive.packet(channel[i])
        send.to.queue(to.input.queue[i], to.output.queue[i])
  :
Output process: in a similar way, the output process consists of four processes. Each process continually asks its corresponding sub-queue for a message to send to the node which is physically connected to it.
Internal processes
The internal part of the kernel consists of five processes: the Input queue process, the Output queue process, the Control process, the Memory process and the Execution process.
Input queue process: acts as a buffer (FIFO memory) between the Input process and the Control process, which run at different speeds.
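The buffering role of the Input queue process can be sketched in Python as a FIFO bounded by free space. The attribute `free_place` mirrors the variable `free.place` of the Occam text, and the number of stored packets plays the role of `packet.nbr`; the class name `InputQueue`, the methods `offer` and `take`, and the representation of a packet as a list of words are assumptions.

```python
from collections import deque
from typing import Deque, List, Optional


class InputQueue:
    """FIFO buffer that accepts a packet only if enough space is free."""

    def __init__(self, capacity: int) -> None:
        self.free_place = capacity            # free space, in words
        self.packets: Deque[List[int]] = deque()

    def offer(self, packet: List[int]) -> bool:
        """Input side: refuse the packet when it does not fit."""
        if len(packet) > self.free_place:
            return False
        self.packets.append(packet)
        self.free_place -= len(packet)
        return True

    def take(self) -> Optional[List[int]]:
        """Control side: hand over the oldest packet, if there is one."""
        if not self.packets:
            return None
        packet = self.packets.popleft()
        self.free_place += len(packet)
        return packet
```

The two methods correspond to the two guarded alternatives of the queue: accept input while space remains, and serve the Control process while at least one packet is stored.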
  WHILE TRUE
    ALT
      ALT i = [0 FOR 4]
        length <= free.place & from.input.process[i] ? length
          SEQ
            packet.in.memory(from.input.process[i])
            free.place := free.place - length
            packet.nbr := packet.nbr + 1
      packet.nbr > 0 & from.control.process ? ANY
        SEQ
          packet.out.memory(to.control.process)
          free.place := free.place + length
          packet.nbr := packet.nbr - 1
  :
Control process: realizes the most important function of the node, i.e. it implements the demand-driven control mode. It consumes packets from the Input queue process and performs a task that depends on the packet type.
- For a load packet (used initially to load the program to execute), it asks the Program memory process for a free place to store the new packet. If this does not succeed (collision case), the packet is routed to one of the Output sub-queues.
- For a request packet, it asks the Program memory process whether an object known by its name is present. If the name is found, then depending on the state of the requested expression the control process receives either a value or an expression. For a value, it builds a result packet and sends it to the output queue process. For an expression, it creates an incomplete (waiting) result packet and sends it to the packet memory; moreover, if the expression has not yet been requested, one request packet is generated for each identifier in the expression and sent to the output queue process.
- For a result packet, there is only one access to the program associative memory: the expression is replaced by its value (received in the packet).
  WHILE TRUE
    SEQ
      receive.packet(from.input.queue)
      to.prog.mem ! type
      IF
        type = load
          SEQ
            from.prog.mem ? answer
            IF
              answer
                send.packet(to.prog.mem)
              TRUE
                send.packet(to.out.queue)
        type = request
          do.work.control(request)
        type = result
          send.packet(to.prog.mem)
  :
Memory process: the associative memories present in the structure of the node have been simulated by two Occam processes. The Program memory process stores a part of the program to execute (i.e. expressions). It can be considered as a server for the Control process and the Execution process. The Control process requests can be:
- to load an expression,
- to search for an expression by its name,
- to update an expression.
On the other hand, the Execution process requests are:
- to search for an executable expression,
- to replace an expression by its value.
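The way the Control process uses the Program memory for the three packet types can be sketched in Python as follows. This is an illustration under several assumptions: packets are plain tuples, expressions are binary with identifier operands, the `waiting` list stands in for the packet memory, the `output` list stands in for the output queue, and a request for an unknown name is simply forwarded. The class name `ControlSketch` and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union

Expr = Tuple[str, str, str]        # (operator, left identifier, right identifier)
Definition = Union[int, Expr]      # a value or an expression still to reduce


@dataclass
class ControlSketch:
    capacity: int                                          # free places in program memory
    program: Dict[str, Definition] = field(default_factory=dict)
    requested: Set[str] = field(default_factory=set)
    waiting: List[str] = field(default_factory=list)       # stand-in for packet memory
    output: List[tuple] = field(default_factory=list)      # stand-in for output queue

    def handle(self, packet: tuple) -> None:
        kind, name = packet[0], packet[1]
        if kind == "load":
            if len(self.program) < self.capacity and name not in self.program:
                self.program[name] = packet[2]
            else:
                self.output.append(packet)       # collision: route the packet onwards
        elif kind == "request":
            if name not in self.program:
                self.output.append(packet)       # assumption: search continues elsewhere
                return
            definition = self.program[name]
            if isinstance(definition, int):
                self.output.append(("result", name, definition))
            else:
                self.waiting.append(name)        # incomplete result packet
                if name not in self.requested:
                    self.requested.add(name)
                    for ident in definition[1:]:
                        self.output.append(("request", ident))
        elif kind == "result":
            self.program[name] = packet[2]       # value replaces the expression
```

A request for an expression therefore produces one request per identifier the first time only, which matches the "not yet been requested" condition in the text.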
The Packet memory process stores packets waiting for results (incomplete packets). Once an incomplete packet receives its result, it is immediately sent to the output queue.
Execution process: this process will be responsible for arithmetic and logic operations. The processing of an expression starts with a request sent to the Program memory process. When there is an executable expression in the program memory, it is sent to the Execution process, which evaluates it and sends the result to both the program memory and the packet memory.
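One evaluation step of the Execution process can be sketched in Python as follows. The sketch assumes that an expression is "executable" when all of its operands are already integer values, that expressions are binary, and that a `results` list stands in for the packet memory; the names `find_executable`, `execute_one` and `OPS` are hypothetical.

```python
import operator
from typing import Dict, List, Optional, Tuple, Union

Operand = Union[int, str]                   # a value or an unresolved identifier
Expr = Tuple[str, Operand, Operand]
Definition = Union[int, Expr]

# Assumed operator set covering arithmetic and logic operations.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "and": operator.and_, "or": operator.or_}


def find_executable(program: Dict[str, Definition]) -> Optional[str]:
    """Return the name of an expression whose operands are all values."""
    for name, definition in program.items():
        if isinstance(definition, tuple) and all(
                isinstance(arg, int) for arg in definition[1:]):
            return name
    return None


def execute_one(program: Dict[str, Definition], results: List[tuple]) -> bool:
    """Evaluate one executable expression and publish its value."""
    name = find_executable(program)
    if name is None:
        return False
    op, left, right = program[name]
    value = OPS[op](left, right)
    program[name] = value                       # update the program memory
    results.append(("result", name, value))     # notify the packet memory
    return True
```

The function returns `False` when nothing is executable yet, which would correspond to the Execution process waiting on the Program memory.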
CONCLUSION
The N-ARCH emulator described in this paper is still being implemented, so we cannot yet give benchmark results. Occam seems very attractive for implementing the top level of the N-ARCH kernel by means of Occam parallel processes. However, because Occam lacks powerful data structures, we plan to use a C programming language interface (available today) to write specific low-level processes such as the Execution process.
REFERENCES
[Tour 87] TOURSEL B., LECOUFFE M.P., GONCALVES G. "N-ARCH : une architecture parallèle pour programmes déclaratifs". C3 Pole Architectures, Journées architectures de Nice, juillet 87.
[Kadota 85] KADOTA H. et al. "An 8-kbit content addressable and reentrant memory". IEEE Journal of Solid-State Circuits, Vol. 20, no. 5, October 1985.
[Wal 85] WALKER P. "The transputer". Byte, May 1985.
[Gonc 86] GONCALVES G., LECOUFFE M.P., TOURSEL B. "Le projet N-ARCH : conception d'une architecture parallèle pour programmes déclaratifs". Technical report LIFL, Université de Lille I, no. 48, Oct. 1986.
[O'Don 85] O'DONNEL J.T. "An architecture that efficiently updates associative aggregates in applicative programming language". Functional programming language and computer architecture, Nancy, Sept. 85.