Information Processing Letters 46 (1993) 13-18
Elsevier
29Ap~l 1993
G.M. Megson
Department of Computing Science, University of Newcastle-upon-Tyne, Newcastle-upon-Tyne NE1 7RU, United Kingdom
Communicated by R.S. Bird
Received 2 November 1992
Revised 5 February 1993
Keywords: Systolic arrays; dynamic programming; parallel processing; design of algorithms
The partition problem can be stated simply as follows: given a finite set $A$ and a size $s(a) \in \mathbb{Z}^+$ for each $a \in A$, is there a subset $A' \subseteq A$ such that

$$\sum_{a \in A'} s(a) = \sum_{a \in A - A'} s(a)? \qquad (1.1)$$
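Before turning to dynamic programming, it may help to see (1.1) checked literally. The Python sketch below simply enumerates every subset $A'$, so its cost grows exponentially with $|A|$; the function name `has_partition_bruteforce` is illustrative and not taken from the paper.

```python
from itertools import combinations


def has_partition_bruteforce(sizes):
    """Decide (1.1) literally: is there a subset A' of A whose sizes sum to
    the same value as the sizes of the complement A - A'?

    Every subset is tried, so the running time is exponential in len(sizes).
    """
    total = sum(sizes)
    if total % 2:  # an odd total can never be split into two equal halves
        return False
    half = total // 2
    for r in range(len(sizes) + 1):
        for subset in combinations(range(len(sizes)), r):
            if sum(sizes[i] for i in subset) == half:
                return True
    return False


print(has_partition_bruteforce([3, 1, 1, 2, 2, 1]))  # True: {3, 2} vs {1, 1, 2, 1}
print(has_partition_bruteforce([2, 4]))              # False
```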
It is well known that (1.1) belongs to the class of NP-complete problems and plays an important role in the proof and classification of NP problems in general [3]. In particular, partition is applicable to problems involving numerical parameters (such as lengths, weights, costs, capacities, etc.), which include processor scheduling and the so-called knapsack problem [4]. Systolic solutions to (1.1) are of interest for a number of reasons. Firstly, it is precisely the type of problem from NP which can benefit from improved computational performance using massively parallel architectures. Secondly, the dynamic programming solution to (1.1) produces a recurrence formulation with peculiarities that cause existing automatic array synthesis techniques to break down [1]. Thirdly, once a systolic solution to partition is known, the properties of
transitivity and polynomial time reducibility, which have been extremely effective in algorithm analysis, can be employed to provide systolic solutions for a wide range of problems for very little effort. In this paper our objective is twofold. First, and foremost, we show how to remove the peculiarity from the dynamic programming version of partition and produce an equivalent system of Conditional Uniform Recurrence Equations (CUREs). Second, we apply traditional synthesis techniques to derive systolic arrays. The first task can be shown to be a special case of a more general mapping technique which widens the applicability of regular array synthesis techniques. The formal explanation of this new technique is beyond the scope of the current paper, but the reader is referred to [5] for the proofs of the important results. Finally, we point out that, for the sake of brevity, the rest of the paper assumes a working knowledge of existing regular array synthesis techniques. The derivation here relies on the methods in [6] and [7]; for conciseness we state only the initial conditions and the final result of their application.

Correspondence to: Dr G.M. Megson, Department of Computing Science, University of Newcastle-upon-Tyne, Newcastle-upon-Tyne NE1 7RU, United Kingdom.
This work is supported by SERC Grant GR/F…
© 1993 Elsevier Science Publishers B.V. All rights reserved
Volume 46, Number 1

Let the set $A = \{a_1, a_2, \ldots, a_n\}$ and the sizes $s(a_1), s(a_2), \ldots, s(a_n)$ in $\mathbb{Z}^+$ be a given arbitrary instance of partition. Define $B = \sum_{a \in A} s(a)$. If $B$ is not evenly divisible by two then no partition exists and the answer to the problem is false. When $B$ is evenly divisible by two we introduce indices $i$ and $j$ with ranges $1 \le i \le n$, $0 \le j \le B/2$, and denote by $t(i, j)$ the result (true or false) of finding or not finding a subset of $\{a_1, a_2, \ldots, a_i\}$ whose sizes sum to exactly $j$. Then
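$$t(1, j) = \begin{cases} \text{true} & \text{if } j = 0 \text{ or } j = s(a_1), \\ \text{false} & \text{otherwise}, \end{cases} \qquad (2.1)$$

$$t(i, 0) = \text{true}, \quad 1 \le i \le n, \qquad (2.2)$$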
because the sum over the empty range is zero, and

$$t(i, j) = \begin{cases} \text{true} & \text{if } t(i-1, j) = \text{true, or } s(a_i) \le j \text{ and } t(i-1, j - s(a_i)) = \text{true}, \\ \text{false} & \text{otherwise}, \end{cases} \qquad (2.3)$$

for $1 < i \le n$ and $0 \le j \le B/2$. The values of (2.1) and (2.2) form the left and upper boundaries of a rectangular table which is filled in by applying (2.3) row by row. Hence the solution of the partition problem is given by the value $t(n, B/2)$. As a simple example consider the case when $n = 5$ and $B/2 = 13$. By constructing the table the reader can verify that $t(5, 13) = \text{true}$; the result is confirmed by exhibiting a subset of the $a_i$ whose sizes sum to 13.

A further backtracking stage is required to recover the actual partition; it involves tracing the truth values back through the table to retrieve the actual elements in each set of the partition. Computing the table row by row requires $O(nB)$ steps, while backtracking is only $O(n + B)$. Nevertheless, the arbitrary size of the $s(a_i)$ values forces the problem to stay in NP. We will not consider the backtracking phase. For our purposes a more suitable form of (2.3) is

$$t(i, j) = f\big(t(i-1, j),\ s(i, 0),\ J(0, j),\ t(i-1, j - s(i, 0))\big), \qquad (2.4)$$

where $s(i, 0) = s(a_i)$ and $J(0, j) = j$ are fully indexed, and $f(\,)$ represents the same formula as (2.3). Traditional synthesis techniques rely on the use of affine or linear variable index functions to guarantee linear schedules resulting from the solution of automatically generated linear programming problems. Linear allocations (or projections) of the computation domain onto an array topology are also used frequently. Clearly the value of $t(i-1, j - s(i, 0))$ depends indirectly on the variable $i$, thus contravening those assumptions. Below we demonstrate how (2.4) can be re-written as a system of CUREs which, by definition, are affine and therefore permit the application of existing synthesis theory.
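For reference, a direct sequential transcription of (2.1)-(2.3) is sketched below in Python. It fills the table row by row and returns $t(n, B/2)$; the helper names (`partition_table`, `has_partition`, `recover_subset`) are hypothetical, and `recover_subset` only illustrates the backtracking stage that the paper deliberately leaves aside.

```python
def partition_table(sizes):
    """Fill t(i, j), 1 <= i <= n, 0 <= j <= B/2, using (2.1)-(2.3).

    Returns (t, half); row 0 of t is unused. Returns (None, None) when
    B is odd, since no partition can then exist.
    """
    if not sizes:
        raise ValueError("the instance must contain at least one element")
    n, total = len(sizes), sum(sizes)
    if total % 2:
        return None, None
    half = total // 2
    t = [[False] * (half + 1) for _ in range(n + 1)]
    for j in range(half + 1):          # (2.1): first row
        t[1][j] = j == 0 or j == sizes[0]
    for i in range(1, n + 1):          # (2.2): the empty subset sums to zero
        t[i][0] = True
    for i in range(2, n + 1):          # (2.3): filled row by row
        s_i = sizes[i - 1]             # s(a_i); the Python list is 0-based
        for j in range(half + 1):
            t[i][j] = t[i - 1][j] or (s_i <= j and t[i - 1][j - s_i])
    return t, half


def has_partition(sizes):
    """Return t(n, B/2), or False when B is odd."""
    t, half = partition_table(sizes)
    return t is not None and t[len(sizes)][half]


def recover_subset(sizes):
    """Backtrack through the table; returns the 1-based indices i of a subset
    summing to B/2, or None when no partition exists."""
    t, half = partition_table(sizes)
    if t is None or not t[len(sizes)][half]:
        return None
    chosen, j = [], half
    for i in range(len(sizes), 1, -1):
        if not t[i - 1][j]:            # a_i must be used to reach j
            chosen.append(i)
            j -= sizes[i - 1]
    if j:                              # by (2.1), the remainder equals s(a_1)
        chosen.append(1)
    return sorted(chosen)


print(has_partition([3, 1, 1, 2, 2, 1]))   # True
print(recover_subset([3, 1, 1, 2, 2, 1]))  # [1, 2, 3]
```

The table has $n(B/2 + 1)$ entries, which is the pseudo-polynomial cost referred to above.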
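The main idea is to observe that if (2.4) is embedded into $\mathbb{Z}^3$ rather than $\mathbb{Z}^2$ (as dictated by the variable indices), the non-uniformity above can be reduced to a routing problem. To see this, re-write (2.4) as follows:

$$t(q) = f\big(t(C_1 q + d_1),\ s(C_2 q + d_2),\ J(C_3 q + d_3),\ t(C_4 q + d_4 + s(C_2 q + d_2)\, e_2)\big), \quad q \in D, \qquad (3.1)$$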
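where for $q = (i, j, k)^T$, $t(q)$ is undefined when $k > 0$ and $t(q) = t(i, j)$ when $k = 0$, and

$$C_1 = C_4 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad C_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad C_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad (3.2)$$

$$d_1 = (-1, 0, 0)^T, \quad d_2 = (0, 0, 0)^T, \quad d_3 = (0, 0, 0)^T, \quad d_4 = (-1, 0, 0)^T, \quad e_2 = (0, -1, 0)^T. \qquad (3.3)$$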
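The index maps in (3.2) and (3.3) can be checked mechanically against the arguments of (2.4). In the Python sketch below (the helper names `affine` and `routed_source` are ours), evaluating each map at a point $q = (i, j, 0)^T$ on the $k = 0$ plane recovers the indices of $t(i-1, j)$, $s(i, 0)$, $J(0, j)$ and $t(i-1, j - s)$.

```python
# Matrices and offsets of (3.2) and (3.3), written as nested tuples.
C1 = C4 = ((1, 0, 0), (0, 1, 0), (0, 0, 0))
C2 = ((1, 0, 0), (0, 0, 0), (0, 0, 0))
C3 = ((0, 0, 0), (0, 1, 0), (0, 0, 0))
d1 = d4 = (-1, 0, 0)
d2 = d3 = (0, 0, 0)
e2 = (0, -1, 0)


def affine(C, q, d):
    """Return C q + d for a 3x3 matrix C and 3-vectors q and d."""
    return tuple(sum(C[r][c] * q[c] for c in range(3)) + d[r] for r in range(3))


def routed_source(q, s):
    """Return C4 q + d4 + s e2, the index of the non-uniform argument of (3.1),
    where s stands for the value s(C2 q + d2)."""
    base = affine(C4, q, d4)
    return tuple(base[r] + s * e2[r] for r in range(3))


q = (5, 13, 0)                 # a point (i, j, 0) on the k = 0 plane
print(affine(C1, q, d1))       # (4, 13, 0): t(i-1, j)
print(affine(C2, q, d2))       # (5, 0, 0):  s(i, 0)
print(affine(C3, q, d3))       # (0, 13, 0): J(0, j)
print(routed_source(q, 9))     # (4, 4, 0):  t(i-1, j-s) with s = 9
```

The same helper also shows that $(0, 1, 0)^T$ lies in the kernel of $C_2$ and $(1, 0, 0)^T$ in the kernel of $C_3$, which is what allows the $s$ and $J$ values to be pipelined along those directions.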
In the parlance of synthesis methods, the convex domain in $\mathbb{Z}^2$ defined by $i$ and $j$ is extended into an equivalent domain $D$ in $\mathbb{Z}^3$. An upper bound on $k$ is required to ensure that $D$ remains convex and finite, but for now just assume an appropriate one exists. Since $C_4 q = q$ on the $k = 0$ plane, the data dependency between $t(q)$ and $t(C_4 q + d_4 + s(C_2 q + d_2) e_2)$ is given by

$$\theta = q - \big[C_4 q + d_4 + s(C_2 q + d_2)\, e_2\big] = -\big[d_4 + s(C_2 q + d_2)\, e_2\big],$$
which can be written

$$\theta = -\big[d_4 + \alpha\,(e_2 + e_3) + \beta\, e_2 + \alpha\,(e_2 - e_3)\big], \qquad (3.4)$$

where $s = s(C_2 q + d_2)$, $\alpha = \lfloor s/2 \rfloor$, $\beta = \lceil s/2 \rceil - \lfloor s/2 \rfloor$ and $e_3 = (0, 0, -1)^T$, so that the $e_2$ terms sum to $(2\alpha + \beta)\, e_2 = s\, e_2$ and the $e_3$ terms cancel.

The connection identified by (3.4) can now be replaced by a three-stage routing path in $\mathbb{Z}^3$. This path is non-unique: any vector summation which is not cyclic can be used. Our choice attempts to minimise the volume of the domain extension and the total length of the routing path, and leads to several interesting systolic arrays. It is also worth noting at this point that equivalent routings in $\mathbb{Z}^2$ are possible, but they tend to extend the path outside the original 2-D domain, lead to overloading of the input-output (i.e., many connections and unbounded I/O area) and, more seriously, produce data collisions preventing the derivation of a timing schedule in the traditional way. Denote the three stages in the routing path by $R^{(1)}$, $R^{(2)}$, $R^{(3)}$ respectively; then (3.1) can be replaced by the following equivalent system of conditional uniform recurrence equations:

$$s(q) = s(q - v_1), \quad q \in D, \qquad (3.5a)$$

$$J(q) = J(q - v_2), \quad q \in D, \qquad (3.5b)$$

$$R^{(2)}(q) = \begin{cases} R^{(1)}(q), & k > 0 \wedge w(q) = 0 \wedge v(q) \ne 1, \\ R^{(1)}(q + e_2), & \text{otherwise}, \end{cases} \qquad (3.5e)$$

$$R^{(3)}(q) = R^{(2)}(q + e_2 - e_3), \quad k > 0, \qquad (3.5f)$$
where (3.5a) and (3.5b) pipeline the $s(i, 0)$ and $J(0, j)$ values using the pipelining vectors $v_1 = (0, 1, 0)^T = -e_2$ and $v_2 = (1, 0, 0)^T$, belonging to the kernels of $C_2$ and $C_3$ respectively (see [2]). A dummy value, which does not affect the cell computation, is used temporarily. Equations (3.5c) and (3.5d) correspond to control signals that switch the data from one routing stage to another. In particular, (3.5c) carries the $\alpha$ values while (3.5d) carries the $\beta$ values in (3.4). On the outward trip (i.e., as $k$ increases) $\alpha$ is decremented, and on reaching zero the data associated with $t(C_4 q + d_4 + s(C_2 q + d_2) e_2)$ on the routing path $R^{(1)}$ is directed back towards the original domain at $k = 0$. The $\beta$ value acts as a toggle, extending the path by one step when $s$ is odd.
$$t(q) = f\big(t(q + d_1),\ s(q),\ J(q),\ R^{(3)}(q + d_4)\big), \quad k = 0.$$
Observe that the conditions can be evaluated at run-time and, provided that the equations are computed in the order (3.5a), (3.5b), …, all the information required to compute $t(q)$ is at $q$ or its nearest neighbours. Finally, in order to simplify the operations, additional variables can be introduced to carry precomputed $\alpha$ and $\beta$ terms, thus avoiding the implied repeated evaluation of the floor and ceiling functions at each $q$ on the $k = 0$ plane. The change adds more connections but is only cosmetic and is omitted.
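The geometry of the three-stage route can be illustrated by generating it step by step. The Python sketch below assumes the decomposition $\theta = -[d_4 + \alpha(e_2 + e_3) + \beta e_2 + \alpha(e_2 - e_3)]$ with $\alpha = \lfloor s/2 \rfloor$, $\beta = \lceil s/2 \rceil - \lfloor s/2 \rfloor$ and $e_3 = (0, 0, -1)^T$, and assumes that the unit step in $i$ (the $-d_4$ term) is taken last. It traces the path of a single value only, not the control signals $w$ and $v$; all function names are illustrative.

```python
E2 = (0, -1, 0)
E3 = (0, 0, -1)   # assumed: -e3 points upwards in k
D4 = (-1, 0, 0)


def neg(v):
    return tuple(-x for x in v)


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def routing_steps(s):
    """Unit steps carrying t(i-1, j-s, 0) to (i, j, 0) for a given s >= 0."""
    alpha = s // 2                   # floor(s/2)
    beta = (s + 1) // 2 - alpha      # ceil(s/2) - floor(s/2): 1 iff s is odd
    up = neg(add(E2, E3))            # outward stage R(1): (0, 1, 1), k increases
    flat = neg(E2)                   # toggle step: (0, 1, 0)
    down = neg(add(E2, neg(E3)))     # return stage: (0, 1, -1), k decreases
    return [up] * alpha + [flat] * beta + [down] * alpha + [neg(D4)]


def trace_route(i, j, s):
    """Lattice points visited, starting at the source (i-1, j-s, 0)."""
    point = (i - 1, j - s, 0)
    path = [point]
    for step in routing_steps(s):
        point = add(point, step)
        path.append(point)
    return path


path = trace_route(5, 13, 9)
print(path[0], path[-1])             # (4, 4, 0) (5, 13, 0)
print(max(p[2] for p in path))       # 4, i.e. floor(9/2)
```

Every step has a positive $j$ component or is the final unit step in $i$, so no step is purely along $k$; this is what later allows a schedule that is independent of $k$.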
Given a set of CUREs, systolic arrays can be defined by the pair $(\mathrm{time}(q), \mathrm{place}(q))$, where $\mathrm{time}(q)$ is the time at which the equations (3.5) are evaluated at point $q$ in $\mathbb{Z}^3$ and $\mathrm{place}(q)$ is the cell of the systolic array where the computations are performed. The domain $D$ of (3.5) is given by $0 < i \le n$, $0 \le j \le B/2$ and $0 \le k \le B/2$. The bounds on $i$ and $j$ follow from (2.1), while the range of $k$ is determined by $\max_i \lfloor s(a_i)/2 \rfloor$, which may be bounded by $B/2$ ($B$ being evenly divisible by two). The data dependency graph (DDG) for (3.5) is obtained by connecting the points in $D$ according to the left- and right-hand sides of (3.5), with the directions of the connections implying data flow. However, the lengths of the routing paths depend on $s(a_i)$, so the DDG changes structure from one problem instance to another. A static DDG is produced by considering all the possible DDGs and forming a single
Fig. 1. Systolic arrays for partition. (a) $O(nB/2)$ array ($n = 4$, $b = 7$). (b) $O(B^2/4)$ array ($b = 7$).
graph from the union of their dependencies; this amounts to replicating the dependencies of (3.5) across the whole domain. It is clear that $D$ is a convex domain. Consequently the vertices and rays of the smallest convex cone containing $D$ are easily obtained, and these, together with the data dependencies of (3.5), produce a linear programming problem. Solving this linear program produces the feasible linear timing schedule
$$\mathrm{time}(q) = \lambda^T q, \qquad \lambda = (1, 1, 0)^T,$$
which is independent of $k$ and can be interpreted as a wavefront passing over the original domain in $\mathbb{Z}^2$ and as a plane extending in the $k$ direction for $\mathbb{Z}^3$. Any projection direction $x \in \mathbb{Z}^3$ for which $\lambda^T x \ne 0$ can be used to derive a valid systolic array. The two we consider are the directions $(-1, 0, 0)^T$ and $(0, -1, 0)^T$, corresponding to the axes of $\mathbb{Z}^3$ (note that the direction $(0, 0, -1)^T$ is not permitted). Thus
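$$\mathrm{place}_1(q) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} q, \qquad \mathrm{place}_2(q) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} q$$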
specify the two arrays shown in Fig. 1. Each node computes the modified form of (3.5) in a single step. By projecting parameterised (i.e., problem-size related) versions of the vertices one can verify that, in general, the former allocation requires $O(B^2/4)$ cells and the latter $O(nB/2)$ cells, and that $n + B/2 - 1$ steps are required to compute the partition result on either array. The two designs also trade off the number of stationary data elements against the number of external input-output connections. For example, $\mathrm{place}_1$ requires $B/2$ connections for the $J(0, j)$ values while $\mathrm{place}_2$ requires only one. Similarly, $\mathrm{place}_2$ requires $n$ connections for the $s(i, 0)$ values whereas $\mathrm{place}_1$ requires only one. The inputs to the array are easily derived from the allocation, but observe that routing paths that ideally begin outside the domain require appropriately synchronised data entry. Generally, data values in the $k > 0$ portion of the arrays start up with
Fig. 2. Trapezoidal array for partition (when sizes are sorted).
values which do not affect each other in computations. Further improvements to the arrays are possible; we mention just two here. First, the size of the domain in the $k$ direction depends upon $s_{\max} = \max_i s(a_i)$. Consequently the cell requirements can be reduced to $O(s_{\max} B/2)$ and $O(n s_{\max})$ respectively. If $s_{\max}$ is known and in addition $s_{\max} \ll B$, considerable savings can be made. For instance, in the example in Section 2, $s_{\max} = 9$ and $B/2 = 13$, saving $(B/2)(B/2 - s_{\max}) = 52$ and $n(B/2 - s_{\max}) = 20$ cells for the two arrays considered. Second, if we can also assume that the $s(a_i)$ are sorted, the bounds on the routing path lengths make the domain $D$ prism shaped and the $\mathrm{place}_2$ allocation results in a trapezoidal array, as shown in Fig. 2 (for the example in Section 2). Rows of the array with ragged ends are padded with dummy cells, which do not affect the results, to maintain regularity.
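The schedule, the admissible projection directions and the cell arithmetic above can be checked with a few lines of Python. This sketch assumes the linear schedule $\lambda = (1, 1, 0)^T$ and takes as dependency vectors the pipelining vectors $v_1$, $v_2$ together with the unit routing steps $(0, 1, 1)^T$ and $(0, 1, -1)^T$; the savings use the leading-order counts of the text rather than exact cell counts for a particular layout. All names are illustrative.

```python
LAMBDA = (1, 1, 0)   # assumed linear schedule: time(q) = i + j
DEPENDENCIES = [(0, 1, 0), (1, 0, 0), (0, 1, 1), (0, 1, -1)]
PROJECTIONS = {"place1": (-1, 0, 0), "place2": (0, -1, 0), "k-axis": (0, 0, -1)}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def schedule_is_valid(lam, deps):
    """Every dependency must advance time by at least one step."""
    return all(dot(lam, d) >= 1 for d in deps)


def projection_is_valid(lam, x):
    """A projection direction x is admissible when lam^T x != 0."""
    return dot(lam, x) != 0


def time_steps(n, half):
    """Distinct values of time(q) = i + j over the rows 2 <= i <= n filled by
    (2.3) and the columns 0 <= j <= B/2."""
    return len({dot(LAMBDA, (i, j, 0)) for i in range(2, n + 1) for j in range(half + 1)})


def cell_savings(n, half, s_max):
    """Leading-order savings when the k extent is s_max instead of B/2."""
    return half * (half - s_max), n * (half - s_max)


print(schedule_is_valid(LAMBDA, DEPENDENCIES))   # True
print({name: projection_is_valid(LAMBDA, x) for name, x in PROJECTIONS.items()})
print(time_steps(5, 13))                         # 17, i.e. n + B/2 - 1
print(cell_savings(5, 13, 9))                    # (52, 20)
```

The check on the third projection reproduces the remark that $(0, 0, -1)^T$ cannot be used, since $\lambda^T x = 0$ for that direction.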
In this paper we have derived systolic arrays for the so-called partition problem using the dynamic programming method. The recurrence formulation associated with this method involves some peculiar non-uniformities which cause existing synthesis methods to break down. We have shown how such difficulties can be overcome with … can be produced. The consequences of the results are twofold. Firstly, any NP problem that can be transformed or reduced to partition automatically acquires a systolic implementation, albeit after a polynomial time transformation of the data. Secondly, the range of recurrences for which existing regular array synthesis techniques can be applied has been improved.
[1] …, … for the knapsack problem, in: … 92, Lecture Notes in Computer Science (Springer, Berlin, 1992).
[2] J.A.B. Fortes and D.I. Moldovan, Data broadcasting in …
[3] M.R. Garey and D.S. Johnson, Computers and Intractability …
[4] E. Horowitz and S. Sahni, Fundamentals of Computer Algorithms (…, London, 1978).
[5] …, … onto regularly connected arrays, Tech. Rept. 397, Dept. of Computer Science, …
[6] P. Quinton and V. van Dongen, The mapping of linear recurrence equations on regular arrays, J. VLSI Signal Process. 1(2) …
[7] …, … signals from recurrence equations, Distributed Comput. (1989) 88-105.