North-Holland Microprocessing and Microprogramming 25 (1989) 133-138

A Data Flow Numerical Processing Operator

P. Abellard and B. Barbagelata
Laboratoire d'Automatique et d'Informatique Appliquées de Toulon, Université de Toulon, 83130 La Garde, France
1 - INTRODUCTION
Efficiently carrying out a parallel calculation on a conventional multiprocessor machine poses some difficulties and does not always yield the expected performance. The three main difficulties to be solved are :
- Limited concurrency : the use of von Neumann's model, characterized by a sequential control mode and the sharing of memory areas, reduces the concurrency between the various tasks of the program and the number of tasks carried out simultaneously.
- A complex control system : in order to manage processor-to-processor and processor-to-memory communications, and to resolve the conflicts arising in access to resources, a complex control system is required. Bus contention and saturation complicate this control and reduce its efficiency.
- Programming difficulties : the problem in programming multiprocessor machines is to avoid interference between the various instructions of the parallel program. Such interference is significant when a conventional program is carried out by a multiprocessor system.
A representation model of parallel calculation is absolutely necessary, on the one hand to express the parallelism, and on the other hand to determine the dynamic behaviour of the system (6). Among the existing models, an extension of the basic Petri Nets, dubbed "Data Flow Petri Nets", has been selected (7).
3 - DATA FLOW PETRI NETS
3-1 : Reminder of the definition.
A Data Flow Petri Net is a 7-uple $\langle R, \omega, \mu, \psi, X, O, C \rangle$ in which :
- $R$ is a conformable two-part places Petri Net, $P_v$ and $P_o$ being the sets of places associated respectively with variables and operators.
- $\omega$ is a surjective mapping, with $\omega|_{P_v} : P_v \to X$ and $\omega|_{P_o} : P_o \to O$, such that $\forall p_i \in P_o$, $\forall p_j \in P_o$ with $\omega(p_i) = \omega(p_j)$ for $i \neq j$, then : $\forall t_l \in {}^{\circ}p_i$, $\forall t_k \in {}^{\circ}p_j$, $\{\omega({}^{\circ}t_l)\} \neq \{\omega({}^{\circ}t_k)\}$, so two identical operators cannot work on the same set of data.

All of this makes the analysis of the system behaviour more difficult and more complex.

In a Data Flow architecture, these three problems are non-existent. This architecture is structurally different from that of conventional multiprocessor machines and does not imply additional hardware requirements for the execution of parallel programs (1, 2, 3, 4).
2 - DATA FLOW ARCHITECTURE (Figure 1)
In a Data Flow architecture, there is no central processor : it is replaced by a section of processor modules (arithmetic and logic units, input/output processors, ...). There is no central RAM memory either : it is replaced by a section of memory modules holding addresses, operation codes, operands, ... Neither is there any program counter : a decision-making array connects a memory module section output to the appropriate processing unit, and a distribution array connects a processing unit output to the memory module section concerned (5).
- $\mu$ is an injective mapping $\mu : X \to M$ such that $\forall p \in P_v$, $\exists ME \in M$ : $ME = \mu(\omega(p))$ ; $M = \{ME_1, \ldots, ME_u\}$ is called the set of memory areas.
- $\psi$ is a surjective mapping $\psi : T \to C$.
- $X = \{x_1, x_2, \ldots, x_u\}$ is a set of variables (real, integer, logical) with values in domains $D_1, D_2, \ldots, D_u$.
- $O = \{o_1, o_2, \ldots, o_t\}$ is a set of operators defined as internal mappings of $D_1 \times D_2 \times \cdots \times D_u$.
P. Abellard, B. Barbagelata / A Data Flow Numerical Processing Operator
Figure 1 : Data Flow Architecture (labels : processing units, processing section, pack of results, pack of operations, instructions).
- $C = \{c_1, c_2, \ldots, c_r\}$ is a set of conditions (predicates) on the variables of $X$.
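As a reading aid, the 7-uple can be written down as a small data structure. The sketch below is only an illustration in Python, not the authors' software tool : the class name `DataFlowPetriNet`, its field names (`omega`, `mu` and `psi` stand for the three mappings of the definition), the `check` method and the one-operator example net are all assumptions. Only the surjectivity and injectivity properties stated in the definition are verified.

```python
from dataclasses import dataclass


@dataclass
class DataFlowPetriNet:
    """Hypothetical container for the seven components of the definition."""
    variable_places: set   # P_v, part of R
    operator_places: set   # P_o, part of R
    transitions: set       # T, part of R
    omega: dict            # place -> variable (on P_v) or operator (on P_o)
    mu: dict               # variable -> memory area
    psi: dict              # transition -> condition
    X: set                 # variables
    O: set                 # operators
    C: set                 # conditions (predicates on X)

    def check(self):
        """Return the list of violated properties (empty if none)."""
        errors = []
        # omega restricted to P_v must reach every variable of X.
        if {self.omega[p] for p in self.variable_places} != self.X:
            errors.append("omega restricted to Pv is not surjective onto X")
        # omega restricted to P_o must reach every operator of O.
        if {self.omega[p] for p in self.operator_places} != self.O:
            errors.append("omega restricted to Po is not surjective onto O")
        # mu must give distinct memory areas to distinct variables.
        if set(self.mu) != self.X or len(set(self.mu.values())) != len(self.mu):
            errors.append("mu is not an injective mapping defined on X")
        # psi must be defined on T and reach every condition of C.
        if set(self.psi) != self.transitions or set(self.psi.values()) != self.C:
            errors.append("psi is not a surjective mapping from T onto C")
        return errors


# One-operator net for z = a - b (all names chosen for illustration only).
net = DataFlowPetriNet(
    variable_places={"p_a", "p_b", "p_z"},
    operator_places={"p_sub"},
    transitions={"t_in", "t_out"},
    omega={"p_a": "a", "p_b": "b", "p_z": "z", "p_sub": "sub"},
    mu={"a": "ME1", "b": "ME2", "z": "ME3"},
    psi={"t_in": "true", "t_out": "true"},
    X={"a", "b", "z"},
    O={"sub"},
    C={"true"},
)
print(net.check())
```

Keeping the three mappings as plain dictionaries makes each property a one-line set comparison; a real tool would probably also check the structure of $R$ itself, which this sketch does not attempt.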
3-2 : Representation.
Figure 2 shows the representation of an operation carried out with an operator and a set of variables. Software and hardware simulations of those nets have been realized with elementary modules, studied separately and assembled together according to the net architecture (8).
- $t_i$ and $t_j$ are respectively called the input and output transitions of the operator $o_r$ associated with the place $p_{ij}$.
- ${}^{\circ}t_i = \{p_1, p_2, \ldots, p_r\}$ and $t_j^{\circ} = \{p'_1, p'_2, \ldots, p'_s\}$ respectively represent the data necessary for the operator $o_r$ and the results obtained.

Figure 2 : Representation of an operation (data places $p_1, \ldots, p_r$, transition $t_i$, operator place $p_{ij}$, transition $t_j$, result places $p'_1, \ldots, p'_s$).

3-3 : Marking.
A mark put down in a "variable" place means that the value of the variable is written. A mark put down in an "operator" place means that the operator is activated. We assume that a place stores at most one mark, so the nets are safe.

3-4 : Example.
Figure 3 shows the Data Flow Petri Net of the calculation : $z_{22} = a_{22} - w_{21} z_{12} - w_{24} z_{42}$.

3-5 : Marker graph.
In order to study the dynamic behaviour of the net we can use the marker graph (figure 4).

4 - APPLICATION TO COMPUTATIONAL LINEAR ALGEBRA
Some recent parallel algorithms are well suited to implementation on such a parallel architecture. This is the case of Evans's Quadrant Interlocking Method for solving linear systems of equations, which, contrary to the Gaussian method or triangular LU decomposition, does not lead to sequential but to concurrent relations (9).
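The calculation $z_{22} = a_{22} - w_{21} z_{12} - w_{24} z_{42}$ used as the example of section 3 is one such concurrent relation. A minimal sketch of how the safe marking rule drives it, and of how a marker graph can be built by exhaustive firing, is given below in Python. The place and transition names, the intermediate places `m1` and `m2`, and the net layout are assumptions made for illustration; the net of Figure 3 may be organized differently.

```python
from collections import deque

# Hypothetical net for z22 = a22 - w21*z12 - w24*z42.
# Each transition maps to (input places, output places).
TRANSITIONS = {
    "t1": (("w21", "z12"), ("mul1",)),      # activate the first multiplier
    "t2": (("mul1",), ("m1",)),             # write the product w21*z12
    "t3": (("w24", "z42"), ("mul2",)),      # activate the second multiplier
    "t4": (("mul2",), ("m2",)),             # write the product w24*z42
    "t5": (("a22", "m1", "m2"), ("sub",)),  # activate the subtractor
    "t6": (("sub",), ("z22",)),             # write the result z22
}


def enabled(marking, name):
    """Safe-net rule: every input place is marked, every output place is empty."""
    inputs, outputs = TRANSITIONS[name]
    return all(p in marking for p in inputs) and not any(p in marking for p in outputs)


def fire(marking, name):
    """Return the marking reached by firing an enabled transition."""
    inputs, outputs = TRANSITIONS[name]
    return (marking - frozenset(inputs)) | frozenset(outputs)


def marker_graph(initial):
    """Breadth-first construction of the set of reachable markings and its edges."""
    edges, seen, todo = [], {initial}, deque([initial])
    while todo:
        marking = todo.popleft()
        for name in TRANSITIONS:
            if enabled(marking, name):
                reached = fire(marking, name)
                edges.append((marking, name, reached))
                if reached not in seen:
                    seen.add(reached)
                    todo.append(reached)
    return seen, edges


# Initial marking: the five input variables are written.
m0 = frozenset({"w21", "z12", "w24", "z42", "a22"})
markings, edges = marker_graph(m0)
dead = [m for m in markings if not any(enabled(m, t) for t in TRANSITIONS)]
print(len(markings), "markings; final:", sorted(dead[0]))
```

In this sketch `t1` and `t3` are both enabled in the initial marking, which is where the parallelism of the relation shows up; the only marking without an enabled transition is the one where `z22` alone is written.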
Figure 3 : Data Flow Petri Net of $z_{22} = a_{22} - w_{21} z_{12} - w_{24} z_{42}$.

Figure 4 : Marker graph (the marking $M_r = 00000000001$ is labelled in the figure).

Let the system $Ax = b$ be solved, and let us consider the decomposition of $A$ into "butterfly" matrices : $A = W Z$ with :
$$W = \begin{pmatrix} 1 & 0 & 0 & 0 \\ w_{21} & 1 & 0 & w_{24} \\ w_{31} & 0 & 1 & w_{34} \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad Z = \begin{pmatrix} z_{11} & z_{12} & z_{13} & z_{14} \\ 0 & z_{22} & z_{23} & 0 \\ 0 & z_{32} & z_{33} & 0 \\ z_{41} & z_{42} & z_{43} & z_{44} \end{pmatrix}$$

In order to determine the coefficients of $W$ and $Z$, we write out the equality $A = W Z$, and thus we obtain, for rows I and IV :

I : $z_{11} = a_{11}$, $z_{12} = a_{12}$, $z_{13} = a_{13}$, $z_{14} = a_{14}$
IV : $z_{41} = a_{41}$, $z_{42} = a_{42}$, $z_{43} = a_{43}$, $z_{44} = a_{44}$

which require no calculation. In the same manner, we have for row II :

II : $w_{21} z_{11} + w_{24} z_{41} = a_{21}$, $w_{21} z_{12} + z_{22} + w_{24} z_{42} = a_{22}$, $w_{21} z_{13} + z_{23} + w_{24} z_{43} = a_{23}$, $w_{21} z_{14} + w_{24} z_{44} = a_{24}$.

From the first and the last equations, we obtain :

$$w_{21} = \frac{a_{24} - z_{44} a_{21} / z_{41}}{z_{14} - z_{11} z_{44} / z_{41}} \quad \text{and} \quad w_{24} = \frac{a_{24} - z_{14} a_{21} / z_{11}}{z_{44} - z_{41} z_{14} / z_{11}},$$

and by substitution in the second and third equations, we obtain :

$$z_{22} = a_{22} - w_{21} z_{12} - w_{24} z_{42} \quad \text{and} \quad z_{23} = a_{23} - w_{21} z_{13} - w_{24} z_{43}.$$

Similarly, for row III, we have :

III : $w_{31} z_{11} + w_{34} z_{41} = a_{31}$, $w_{31} z_{12} + z_{32} + w_{34} z_{42} = a_{32}$, $w_{31} z_{13} + z_{33} + w_{34} z_{43} = a_{33}$, $w_{31} z_{14} + w_{34} z_{44} = a_{34}$.

As above, the values of $w_{31}$ and $w_{34}$ are deduced from the first and the last equations, and by substitution in the second and the third equations, we obtain $z_{32}$ and $z_{33}$. The Data Flow Petri Nets modelling these calculations are illustrated in figures 5 and 6.

By using the relation $A = W Z$, the linear system $Ax = b$ can now be expressed as two linear systems : $W y = b$ and $Z x = y$. For solving $W y = b$, we proceed as follows :
Figure 5 : Data Flow Petri Net of decomposition 1.

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ w_{21} & 1 & 0 & w_{24} \\ w_{31} & 0 & 1 & w_{34} \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix}$$
We can see immediately that $y_1 = b_1$, $y_4 = b_4$, $w_{21} y_1 + y_2 + w_{24} y_4 = b_2$ and $w_{31} y_1 + y_3 + w_{34} y_4 = b_3$, from which $y_2 = b_2 - y_1 w_{21} - y_4 w_{24}$ and $y_3 = b_3 - y_1 w_{31} - y_4 w_{34}$. Once $y$ is known, we have to solve $y = Z x$ by proceeding as follows :
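$$\begin{pmatrix} z_{11} & z_{12} & z_{13} & z_{14} \\ 0 & z_{22} & z_{23} & 0 \\ 0 & z_{32} & z_{33} & 0 \\ z_{41} & z_{42} & z_{43} & z_{44} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}$$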
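Starting from the center, we solve the system $z_{22} x_2 + z_{23} x_3 = y_2$ and $z_{32} x_2 + z_{33} x_3 = y_3$ to obtain :

$$x_3 = \frac{y_3 - z_{32} y_2 / z_{22}}{z_{33} - z_{32} z_{23} / z_{22}} \quad \text{and} \quad x_2 = \frac{y_3 - z_{33} y_2 / z_{23}}{z_{32} - z_{33} z_{22} / z_{23}}.$$

By substitution in $z_{11} x_1 + z_{12} x_2 + z_{13} x_3 + z_{14} x_4 = y_1$ and $z_{41} x_1 + z_{42} x_2 + z_{43} x_3 + z_{44} x_4 = y_4$, we can calculate $x_1$ and $x_4$.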
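The decomposition and the two-stage solution can be checked numerically on a small case. The following Python sketch is written under stated assumptions : the function names `qif_factor` and `qif_solve` and the sample matrix are hypothetical, the last step (solving for $x_1$ and $x_4$) uses Cramer's rule, which the text does not spell out, and the formulas divide by $z_{11}$, $z_{41}$, $z_{22}$ and $z_{23}$, so these are assumed to be non-zero.

```python
def qif_factor(a):
    """Factor a 4x4 matrix a (list of rows) as a = W Z with butterfly factors."""
    w = [[float(i == j) for j in range(4)] for i in range(4)]
    z = [[0.0] * 4 for _ in range(4)]
    z[0] = [float(v) for v in a[0]]   # row I of Z is row I of A
    z[3] = [float(v) for v in a[3]]   # row IV of Z is row IV of A
    z11, z14, z41, z44 = z[0][0], z[0][3], z[3][0], z[3][3]
    for i in (1, 2):                  # rows II and III
        # The first and last equations of the row give the two entries of W.
        w[i][0] = (a[i][3] - z44 * a[i][0] / z41) / (z14 - z11 * z44 / z41)
        w[i][3] = (a[i][3] - z14 * a[i][0] / z11) / (z44 - z41 * z14 / z11)
        # Substitution in the second and third equations gives the entries of Z.
        for j in (1, 2):
            z[i][j] = a[i][j] - w[i][0] * z[0][j] - w[i][3] * z[3][j]
    return w, z


def qif_solve(w, z, b):
    """Solve W y = b, then Z x = y, in the order used in the text."""
    y = [float(v) for v in b]         # y1 = b1 and y4 = b4
    y[1] = b[1] - y[0] * w[1][0] - y[3] * w[1][3]
    y[2] = b[2] - y[0] * w[2][0] - y[3] * w[2][3]
    # Centre 2x2 system in x2 and x3.
    x3 = (y[2] - z[2][1] * y[1] / z[1][1]) / (z[2][2] - z[2][1] * z[1][2] / z[1][1])
    x2 = (y[2] - z[2][2] * y[1] / z[1][2]) / (z[2][1] - z[2][2] * z[1][1] / z[1][2])
    # Outer 2x2 system in x1 and x4, solved here by Cramer's rule (an assumption).
    r1 = y[0] - z[0][1] * x2 - z[0][2] * x3
    r4 = y[3] - z[3][1] * x2 - z[3][2] * x3
    det = z[0][0] * z[3][3] - z[0][3] * z[3][0]
    x1 = (r1 * z[3][3] - z[0][3] * r4) / det
    x4 = (z[0][0] * r4 - z[3][0] * r1) / det
    return [x1, x2, x3, x4]


A = [[4, 1, 2, 1], [1, 5, 1, 2], [2, 1, 6, 1], [1, 2, 1, 7]]   # arbitrary sample
b = [16, 22, 26, 36]                                            # A times (1, 2, 3, 4)
W, Z = qif_factor(A)
x = qif_solve(W, Z, b)
print([round(v, 9) for v in x])
```

For this sample the computed solution should agree with $(1, 2, 3, 4)$ up to rounding. Rows II and III are handled by the same loop body and do not depend on each other, which is the concurrency the method is chosen for.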
Figure 6 : Data Flow Petri Net of decomposition 2 (places $w_{21}$, $z_{12}$, $w_{24}$, $z_{42}$, $a_{22}$ and $z_{22}$).
.EQUATE HOST=0;
.MODULE EXONE=1;
.INPUT ARC1,ARC2,ARC3,ARC4;
.OUTPUT ARC8;
.LINK ARC5=NODE1(ARC1,ARC2);
.LINK ARC6=NODE2(ARC3,ARC4);
.LINK ARC7=NODE3(ARC5,ARC6);
.LINK ARC8=NODE4(ARC7, );
.FUNCTION NODE1=MUL,QUEUE(QUE1,1);
.FUNCTION NODE2=MUL,QUEUE(QUE2,1);
.FUNCTION NODE3=ADD(QUEUE,QUE3,1);
.FUNCTION NODE4=OUT1(HOST,0);
.MEMORY QUE1=AREA(1);
.MEMORY QUE2=AREA(1);
.MEMORY QUE3=AREA(1);
.START;
.DATA EXEC(EXONE,ARC1);
.DATA EXEC(EXONE,ARC2);
.DATA EXEC(EXONE,ARC3);
.DATA EXEC(EXONE,ARC4);
.END.

Figure 7 : Implementation of a calculation on a µPD 7281 processor.
We can see that the two nets described above can be used again. Those nets feature a high parallelism rate.

5 - OPERATOR IMPLEMENTATION
The nets obtained can be directly implemented on a data flow processor µPD 7281 (10, 11). Figure 7 gives an example of implementation for the calculation : $y_2 = z_{22} x_2 + z_{23} x_3$.
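The graph declared by the .LINK and .FUNCTION lines of Figure 7 (two MUL nodes feeding an ADD node, then an output node) can be imitated in a few lines of Python to show the data-driven firing order. This is only a sketch : the scheduling loop, the Python names, the assignment of ARC1 to ARC4 to the operands $z_{22}$, $x_2$, $z_{23}$, $x_3$, and the sample values are assumptions, and nothing here models the actual behaviour or timing of the µPD 7281.

```python
from collections import deque
import operator

# Node table read off the .LINK/.FUNCTION lines: (function, input arcs, output arc).
NODES = {
    "NODE1": (operator.mul, ("ARC1", "ARC2"), "ARC5"),
    "NODE2": (operator.mul, ("ARC3", "ARC4"), "ARC6"),
    "NODE3": (operator.add, ("ARC5", "ARC6"), "ARC7"),
    "NODE4": (lambda value: value, ("ARC7",), "ARC8"),  # OUT1: hand the result to the host
}
ARC_NAMES = ("ARC1", "ARC2", "ARC3", "ARC4", "ARC5", "ARC6", "ARC7", "ARC8")


def run(inputs):
    """Fire any node whose input arcs all hold a token, until no node can fire."""
    arcs = {name: deque() for name in ARC_NAMES}
    for arc, value in inputs.items():
        arcs[arc].append(value)
    trace = []
    fired = True
    while fired:
        fired = False
        for name, (func, ins, out) in NODES.items():
            if all(arcs[a] for a in ins):
                # Consume one token per input arc and produce one output token.
                operands = [arcs[a].popleft() for a in ins]
                arcs[out].append(func(*operands))
                trace.append(name)
                fired = True
    return list(arcs["ARC8"]), trace


# y2 = z22*x2 + z23*x3 with arbitrary sample values z22=2, x2=3, z23=4, x3=5.
result, trace = run({"ARC1": 2.0, "ARC2": 3.0, "ARC3": 4.0, "ARC4": 5.0})
print(result, trace)
```

No program counter appears anywhere : a node runs as soon as its operands have arrived, so the two multiplications are independent and only the addition has to wait for both.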
6 - CONCLUSION
Data flow, or data-driven, machines exploit a high degree of parallelism between the tasks that can be carried out in parallel, and consequently reach a level of performance substantially higher than that of conventional machines. In this paper, a static data flow architecture has been presented. Its regular and cellular structure, in which communications are simple, makes it possible to take advantage of the progress achieved in Large Scale Integration to build fast and low-cost machines with data flow processors. Although the architecture presented corresponds to an operator for solving linear systems of equations (an important subset of filtering and image processing problems), the principles discussed remain valid for numerous other applications, such as robotics. Data Flow Petri Nets have been used for modelling data flow parallel programs. A software tool has been developed for their automatic print-out, validation and arrangement, in order to know the temporal behaviour of the programs represented and to optimize their implementation on a data flow machine (12).

REFERENCES
1 - RUMBAUGH J : A Data Flow Multiprocessor, IEEE Transactions on Computers, Vol C26, n° 2, pp 138-146, February 1977.
2 - AGERWALA T, ARVIND : Data Flow Systems, IEEE Computer, February 1982.
3 - ARVIND, VINOD KATHAIL, PINGALI : A Data Flow Architecture with tagged tokens. Laboratory for Computer Science, MIT/LCS/TM-114, September 1980.
4 - ARVIND, VINOD KATHAIL : A multiple processor data flow machine that supports generalized procedures. Eighth Annual Architecture Conference IEEE, pp 291-302, May 1981, Minneapolis.
5 - DONAN A : Paradocs, a highly parallel data flow computer and its data flow language. Microprocessors and Microsystems, n° 7, pp 20-31, 1981.
6 - KRISHNA M KAVI, BILL P BUCKLES, U NARAYAN BHAT : A formal definition of data flow graph models, IEEE Vol C35, n° 1, pp 940-948, November 1986.
7 - ALMHANA J : Modélisation par Réseaux de Petri à Flux de Données. Application à la synthèse de l'opérateur de Riccati rapide. Thèse de Doctorat d'Etat, Marseille, Juin 1983.
8 - BARBAGELATA B, ABELLARD P : Parallel processing modelling with Data Flow Petri Nets. First European Workshop on Parallel Processing Techniques for Simulation, UMIST, 28-29 October 1985, Manchester.
9 - EVANS D.J : Design of parallel numerical algorithms. First European Workshop on Parallel Processing Techniques for Simulation, UMIST, 28-29 October 1985, Manchester.
10 - BARBAGELATA B, ABELLARD P : Opérateurs de calcul parallèle modélisés par Réseaux de Petri à Flux de Données. Onzième Colloque GRETSI, 1-5 Juin 1987, Nice.
11 - BARBAGELATA B, ABELLARD P : Data Flow Petri nets for Data Flow processors. The Petri Net Newsletter, n° 24, pp 18-20, August 1986.
12 - BARBAGELATA B, ABELLARD P : A Data Flow Multiprocessor, Eighth European Workshop on Applications and Theory of Petri Nets, 24-26 June 1987, Zaragoza.
13 - GUILIERI A, BARBAGELATA B, ABELLARD P : Systolic arrays modelling by Data Flow Petri Nets, First European Workshop on Parallel Processing Techniques for Simulation, UMIST, 28-29 October 1985, Manchester.
14 - ABELLARD P, BARBAGELATA B : Systolic array design, Ninth European Workshop on Applications and Theory of Petri Nets, 22-24 June 1988, Venice.
15 - MESHACH W : Data Flow IC makes short work of tough processing chores, Electronic Design n° 17, pp 191-206, May 1984.
16 - ABELLARD P : Contribution à l'étude d'extensions des réseaux de Petri à Flux de Données pour la télésymbiotique assistée par calculateur. Thèse de Doctorat ès Sciences, 14 Juin 1988, Toulon.