Science of Computer Programming 15 (1990) 201-215
North-Holland

A DERIVATION OF A SERIAL-PARALLEL MULTIPLIER

J.C. EBERGEN and Rob R. HOOGERWOORD
Department of Mathematics and Computing Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, Netherlands

Revised June 1990
Abstract. A serial-parallel multiplier is developed systematically from functional specification to circuit implementation. First, a functional program is derived and, second, a parallel program for a systolic computation is constructed. The parallel program is derived from the functional program. Both synchronous and asynchronous circuit implementations for the parallel program are discussed. The latter implementation has a pipeline structure with bounded response time.
1. Introduction
In VLSI textbooks serial-parallel multipliers are usually explained in terms of pictures and diagrams. The purpose of this paper is to derive a design of a serial-parallel multiplier in a calculational style, thereby revealing in detail all properties used in such a derivation. Such a derivation starts with a functional specification and ends with a circuit implementation. The derivation consists of three steps. In the first step we derive a functional program for the serial-parallel multiplier. In the second step we derive a parallel program that expresses serial-parallel multiplication as a distributed computation performed by communicating components. The difference between the functional and the parallel program is that the parallel program specifies not only the values computed by each component, but also the communications among the components. In the last step we briefly indicate how the parallel program can be mapped on several types of circuit implementations. Such an implementation may be a synchronous or an asynchronous circuit. Our design allows a nice pipelined implementation. Furthermore, its response time is bounded, i.e., after receipt of each input the next output is produced within an amount of time that is independent of the length of the pipeline.

0167-6423/90/$3.50 © 1990 Elsevier Science Publishers B.V. (North-Holland)
2. Specification

The problem is to design an efficient machine that multiplies numbers by a fixed number. Here, numbers are natural numbers. The numbers input to and output from the machine are represented by lists of binary digits ("bits"), in the usual way. In order to formalise the specification, we introduce an abstraction function ⟦·⟧ that maps finite lists of bits to the numbers they represent. Its definition is

  ⟦bs⟧ = (Σ i : 0 ≤ i < #bs : bs.i * 2^i)

where #bs denotes the length of list bs of bits and bs.i denotes element i in list bs. Notice that in this representation the least significant bits occur at the head of the list. Without proof we state that this representation has the following properties. For bit b and list bs of bits, we have

  ⟦[ ]⟧ = 0                                              (1)
  ⟦b:bs⟧ = b + 2 * ⟦bs⟧                                  (2)
  n < 2^k  ≡  n is representable by a list of length k   (3)
where [ ] denotes the empty list and where ":" (cons) denotes list prefixing with one element.

First, we consider the machine as a function mul, say, that maps lists of bits to lists of bits. Although one of the operands of the multiplications performed by the machine remains fixed, we do introduce a parameter for this operand. Thus, we increase our manipulative freedom. For lists qs and as of bits, mul.qs.as is a list of bits satisfying

  #(mul.qs.as) = #as                                     (4)
  ⟦mul.qs.as⟧ = ⟦qs⟧ * ⟦as⟧                              (5)

(Function application is denoted by ".".) Parameter qs represents the fixed multiplicand of machine mul.qs. This machine produces a list of bits for its output that is as long as its input, which is represented by parameter as; hence (4). In order that (4) be satisfiable, the product ⟦qs⟧ * ⟦as⟧ must be representable by a list of length #as. On account of (3) this amounts to

  ⟦qs⟧ * ⟦as⟧ < 2^#as                                    (6)

Therefore, we impose (6) as a precondition onto the parameters of mul. The specification of function mul now reads

  (6) ⇒ (4) ∧ (5)

for lists qs and as of bits.
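As a quick sanity check, the abstraction function and properties (1)-(3) can be transcribed into Python (a sketch; the function name `val` is our own, standing in for the paper's ⟦·⟧):

```python
def val(bs):
    # [bs] = (sum i : 0 <= i < #bs : bs.i * 2^i); least significant bit first
    return sum(b * 2 ** i for i, b in enumerate(bs))

assert val([]) == 0                            # property (1)
assert val([1, 0, 1]) == 1 + 2 * val([0, 1])   # property (2): [b:bs] = b + 2*[bs]
assert val([1, 0, 1]) == 5                     # lists are LSB-first
# property (3): the largest value representable in k bits is 2^k - 1
assert all(val([1] * k) == 2 ** k - 1 for k in range(8))
```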
Remark. Actually, we are heading for a machine that outputs one bit after each reception of a bit via its input. This is possible, because the lesser significant bits of the (representation of the) product depend on the lesser significant bits of the operands only.
Precondition (6) is not too restrictive. For arbitrary qs and as, it can be met by suffixing as with #qs zeroes. Notice that ⟦as ++ zeroes.#qs⟧ = ⟦as⟧, where zeroes.k represents the list consisting of k zeroes, and that ⟦qs⟧ * ⟦as⟧ < 2^(#qs + #as) always holds, since ⟦qs⟧ < 2^#qs and ⟦as⟧ < 2^#as.
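The padding argument can be checked on a concrete instance (a sketch; `val` and `zeroes` are our own names for ⟦·⟧ and zeroes):

```python
def val(bs):
    return sum(b * 2 ** i for i, b in enumerate(bs))

def zeroes(k):
    return [0] * k

qs, as_ = [1, 1], [1, 0, 1]                      # [qs] = 3, [as] = 5
assert not val(qs) * val(as_) < 2 ** len(as_)    # (6) fails for as itself: 15 >= 8
padded = as_ + zeroes(len(qs))
assert val(padded) == val(as_)                   # suffixing zeroes keeps the value
assert val(qs) * val(padded) < 2 ** len(padded)  # ... and establishes (6): 15 < 32
```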
3. Derivation
We derive a recursive definition for mul by induction on qs. We start with (5), since this is the more important part of the specification.

Base:

    ⟦mul.[ ].as⟧
  =   {Specification of mul (5)}
    ⟦[ ]⟧ * ⟦as⟧
  =   {Definition of ⟦·⟧ (1), arithmetic}
    0
  =   {zeroes.#as is a list of #as zeroes, definition of ⟦·⟧}
    ⟦zeroes.#as⟧

If it were not for (4), we could have chosen mul.[ ].as = [ ]. Now we must choose mul.[ ].as = zeroes.#as, which satisfies all conditions of the specification of mul.

Step:

    ⟦mul.(q:qs).as⟧
  =   {Specification of mul (5)}
    ⟦q:qs⟧ * ⟦as⟧
  =   {Definition of ⟦·⟧ (2)}
    (q + 2 * ⟦qs⟧) * ⟦as⟧
  =   {arithmetic}
    q * ⟦as⟧ + 2 * ⟦qs⟧ * ⟦as⟧
  =   {Specification of mul (by induction hypothesis), see below}
    q * ⟦as⟧ + 2 * ⟦mul.qs.as⟧
  =   {Definition of ⟦·⟧ (2)}
    q * ⟦as⟧ + ⟦0:mul.qs.as⟧
  =   {Specification of sum, see below}
    ⟦sum.q.as.(0:mul.qs.as)⟧
The introduction of a recursive application of mul generates a proof obligation: its arguments must satisfy the precondition of mul. This proof obligation is

  ⟦qs⟧ * ⟦as⟧ < 2^#as

which is trivially satisfied, because (⟦qs⟧ ≤ ⟦q:qs⟧) ∧ (0 ≤ ⟦as⟧).

We continue with a specification for the newly introduced function sum. In the last line of the above derivation we have introduced sum in order to define

  mul.(q:qs).as = sum.q.as.(0:mul.qs.as)

This is correct if, for bit q and bit sequences as and bs, sum.q.as.bs is a bit sequence satisfying

  (7) ⇒ (8) ∧ (9)

with

  q * ⟦as⟧ + ⟦bs⟧ < 2^#as              (7)
  #(sum.q.as.bs) = #as                 (8)
  ⟦sum.q.as.bs⟧ = q * ⟦as⟧ + ⟦bs⟧      (9)

Precondition (7) is (again) necessary to make (8) ∧ (9) satisfiable. Conditions (8) and (9) follow from the above derivation and the specification of mul, in particular (4). That (7) holds for sum's arguments in sum.q.as.(0:mul.qs.as) follows from precondition (6) of mul and the above derivation. We now derive a definition for sum by induction on as and bs.
Base:

    ⟦sum.q.[ ].bs⟧
  =   {Specification of sum (9)}
    q * ⟦[ ]⟧ + ⟦bs⟧
  =   {By precondition (7): q * ⟦[ ]⟧ + ⟦bs⟧ < 2^0 = 1}
    0
  =   {Definition of ⟦·⟧ (1)}
    ⟦[ ]⟧

So we choose sum.q.[ ].bs = [ ], which also satisfies (8). Similarly, we can derive ⟦sum.q.as.[ ]⟧ = q * ⟦as⟧. Although it is possible to rewrite expression q * ⟦as⟧ further into an expression satisfying (8), it is simpler to avoid it. We strengthen sum's precondition with

  #as ≤ #bs

From this precondition it follows that (bs = [ ]) ⇒ (as = [ ]). Consequently, there is no need for a separate definition of sum.q.as.[ ]. Notice also that this precondition is satisfied in sum's application in the definition mul.(q:qs).as =
sum.q.as.(0:mul.qs.as).

Step:

    ⟦sum.q.(a:as).(b:bs)⟧
  =   {Specification of sum (9)}
    q * ⟦a:as⟧ + ⟦b:bs⟧
  =   {Definition of ⟦·⟧ (2)}
    q * (a + 2 * ⟦as⟧) + b + 2 * ⟦bs⟧
  =   {arithmetic}
    (q * a + b) + 2 * (q * ⟦as⟧ + ⟦bs⟧)
  =   {induction hypothesis}
    (q * a + b) + 2 * ⟦sum.q.as.bs⟧
  =   {(q * a + b) is not a bit, (q * a + b) mod 2 is}
    (q * a + b) mod 2 + 2 * ((q * a + b) div 2 + ⟦sum.q.as.bs⟧)

Now we are stuck: we see no way to get rid of the term (q * a + b) div 2. We observe, however, that both the expressions ⟦sum.q.as.bs⟧ and (q * a + b) div 2 + ⟦sum.q.as.bs⟧ are of the form c + ⟦sum.q.as.bs⟧ for some c, 0 ≤ c (actually, 0 ≤ c < 2, but we do not need this here). Therefore, we generalise function sum (using a technique called generalisation by abstraction in [4]) as follows. We introduce function sum1 that for natural c, bit q, and bit sequences as and bs satisfies

  (10) ⇒ (11) ∧ (12)

where

  c + q * ⟦as⟧ + ⟦bs⟧ < 2^#as ∧ #as ≤ #bs    (10)
  #(sum1.q.c.as.bs) = #as                     (11)
  ⟦sum1.q.c.as.bs⟧ = c + q * ⟦as⟧ + ⟦bs⟧      (12)

Notice that sum1 is indeed a generalisation of sum, because sum.q = sum1.q.0. By (very) similar derivations as above we arrive at

  sum1.q.c.[ ].bs = [ ]
  sum1.q.c.(a:as).(b:bs) = d mod 2 : sum1.q.(d div 2).as.bs  where d = c + q * a + b end

Since we used a recursive function application in the second line above, we must verify that this recursive application satisfies precondition (10). This is easy.
Combining all results we obtain

  mul.[ ].as = zeroes.#as
  mul.(q:qs).as = sum1.q.0.as.(0:mul.qs.as)
  sum1.q.c.[ ].bs = [ ]
  sum1.q.c.(a:as).(b:bs) = d mod 2 : sum1.q.(d div 2).as.bs  where d = c + q * a + b end
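The program above is executable as it stands; a direct Python transcription (our own rendering, with `as_` for the reserved word `as`) lets us test it against conditions (4) and (5):

```python
def zeroes(k):
    return [0] * k

def val(bs):
    return sum(b * 2 ** i for i, b in enumerate(bs))

def mul(qs, as_):
    # mul.[].as = zeroes.#as ; mul.(q:qs).as = sum1.q.0.as.(0:mul.qs.as)
    if not qs:
        return zeroes(len(as_))
    return sum1(qs[0], 0, as_, [0] + mul(qs[1:], as_))

def sum1(q, c, as_, bs):
    # sum1.q.c.[].bs = [] ; sum1.q.c.(a:as).(b:bs) = d mod 2 : sum1.q.(d div 2).as.bs
    if not as_:
        return []
    d = c + q * as_[0] + bs[0]
    return [d % 2] + sum1(q, d // 2, as_[1:], bs[1:])

qs, as_ = [1, 1], [1, 0, 1, 0, 0]   # [qs] * [as] = 3 * 5 = 15 < 2^5, so (6) holds
assert len(mul(qs, as_)) == len(as_)             # condition (4)
assert val(mul(qs, as_)) == val(qs) * val(as_)   # condition (5)
```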
Finally, we remark that the above derivation can also be given for number representations with bases other than 2. The only changes in the program and the derivation are that all occurrences of constant 2 must be replaced by the new base.
4. Towards a parallel program

Our next task is to implement the above functional program as a set of communicating components. Each component should carry out a simple computation, like addition modulo 2 or (integer) division by 2. The challenge here is to design the pattern of the communications among these components in such a way that the composite performs the specified computation. Furthermore, we wish our design to be as efficient as possible. For example, we try to achieve bounded response time.

The communication behaviour between a component and its environment is specified by a program as well. This program prescribes the order in which input and output values are communicated via the channels between the component and its environment. Our program notation is a simple CSP-like notation [3], where communications on a common channel are assumed to be synchronised by definition. Later, we transform the program into a version in which communications may be considered to be asynchronous. This simplifies its implementation as a circuit.

First, we specify the order of the communications of multiplier MUL.qs with input channel a and output channel z. The sequence of values communicated via a is list as and the sequence of values communicated via z is list zs; i.e., in order that MUL.qs is an implementation of mul.qs, we must have zs = mul.qs.as. The input sequence is input serially in order of increasing index, and the multiplier produces an output bit after each receipt of an input bit. This communication behaviour is expressed by

  pref *[a?; z!]
Here, "a?" denotes an input action at channel a, "z!" denotes an output action at channel z, ";" denotes concatenation, "*[...]" denotes repetition of the enclosed (also called Kleene's closure), and pref denotes, so-called, prefix-closure. (That is, pref E indicates that any prefix of a sequence expressed by E is a valid sequence.) Examples of the sequences defined by pref *[a?; z!] are ε (the empty sequence), az, azaz, also a, and aza, but not aa, nor z.
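The prefix-closure of *[a?; z!] can be characterised directly: a trace is allowed exactly when its symbols alternate a, z, a, z, ... starting with a. A small Python predicate (our own) reproduces the examples:

```python
def in_pref_star_az(trace):
    # A trace belongs to pref *[a?; z!] iff it is a prefix of the
    # infinite alternation a z a z ...
    expected = 'a'
    for sym in trace:
        if sym != expected:
            return False
        expected = 'z' if expected == 'a' else 'a'
    return True

assert in_pref_star_az('')        # the empty sequence
assert in_pref_star_az('az')
assert in_pref_star_az('azaz')
assert in_pref_star_az('a')
assert in_pref_star_az('aza')
assert not in_pref_star_az('aa')
assert not in_pref_star_az('z')
```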
Besides the order of the communications between component and environment, we must also specify which values are communicated along the channels. A suitable program for MUL.[ ] is

  com MUL.[ ] (a?, z!: B):
    pref *[a?; vz := 0; z!]
  moc

This program expresses that component MUL.[ ] has two channels: input channel a and output channel z. With each channel a variable is associated: va for channel a and vz for channel z. Both variables are binary variables, i.e., va, vz ∈ B, where B = {0, 1}. Furthermore, a? denotes receipt of a value at channel a, after which this value is assigned to variable va, and z! denotes sending the value of vz along channel z. Values received at channel a are always assigned to va, and output actions at channel z always output the value of vz. Accordingly, a? and z! can be considered as abbreviations of a?va and z!vz. The above program prescribes that, repeatedly, a value is received at input channel a and, subsequently, the value 0 is output along channel z. Obviously, this realises zs = mul.[ ].as. Notice that, when we strip all value information from the above program, we are left with pref *[a?; z!], which expresses the desired order of the communications.
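Operationally, MUL.[ ] is a trivial state machine. Modelled as a Python generator (our own modelling of the com/moc program, with the channel as an iterable), it outputs 0 for every bit received:

```python
def mul_empty(a_values):
    # pref *[a?; vz := 0; z!]: after each receipt on a, output vz = 0 on z
    for _ in a_values:
        vz = 0
        yield vz

assert list(mul_empty([1, 0, 1])) == [0, 0, 0]   # zs = mul.[].as = zeroes.#as
```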
5. A parallel program
In the previous section we have decided that the communication behaviour of MUL.qs will be pref *[a?; z!]. We now derive, in a number of steps, a (recursive) decomposition for component MUL.(q:qs). Inspired by the definition

  mul.(q:qs).as = sum1.q.0.as.(0:mul.qs.as)

we decide to compose MUL.(q:qs) from two subcomponents MUL.qs and SUM1.q, in such a way that the latter implements sum1.q.0, defined by:

  sum1.q.c.[ ].bs = [ ]
  sum1.q.c.(a:as).(b:bs) = d mod 2 : sum1.q.(d div 2).as.bs  where d = c + q * a + b end

Function sum1.q.0 has two parameters, as and bs; therefore, we provide SUM1.q with two input channels a and b, and one output channel z. In order that SUM1.q can be used to implement MUL.(q:qs), its communication behaviour, with respect to a and z, must be the same as that of MUL.qs, viz. pref *[a?; z!]. Because of the symmetry between a and b, we choose the communication behaviour of SUM1.q to be

  pref *[(a? || b?); z!]

Here, the operator "||" is called weaving; in our programs it denotes arbitrary interleaving of its operands.
Having introduced subcomponents for the implementation of mul.qs and sum1.q.0, we are left with the problem what to do with 0: in 0:mul.qs.as. We might introduce a (very simple) subcomponent to implement it, but it is easier to modify the specification of SUM1.q slightly such that it realises function F defined by

  F.as.bs = sum1.q.0.as.(0:bs)

For this purpose, we rewrite the communication behaviour of SUM1.q as follows

  pref((a? || b?); *[z!; (a? || b?)])

where we have used the property that, for any x and y,

  pref(*[x; y]) = pref(x; *[y; x])

Function 0: in the definition of F can now be implemented by replacing the first occurrence of b? by vb := 0. As a result, the communication behaviour of SUM1.q becomes

  pref(a?; *[z!; (a? || b?)])

Because of the simple (tail-)recursive structure of F, component SUM1.q can be defined by means of iteration, instead of recursion. Parameter c of sum1 can be implemented as a local variable, with initial value 0. Thus, we obtain the following program for SUM1.q.

  com SUM1.q (a?, b?, z!: B):
    var c: B;
    pref(a? || vb := 0 || c := 0
    ; *[ vz, c := (c + q * va + vb) mod 2, (c + q * va + vb) div 2
       ; z!; (a? || b?)
       ]
    )
  moc
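The iteration of SUM1.q can be mimicked by a Python generator (our own model: channels become iterators, and the initial vb := 0 replaces the first b?). It reproduces the carry-propagating loop of the component:

```python
def sum1_component(q, a_chan, b_chan):
    # behaviour pref(a?; *[z!; (a? || b?)]), with vb := 0 instead of the first b?
    va, vb, c = next(a_chan), 0, 0
    while True:
        vz, c = (c + q * va + vb) % 2, (c + q * va + vb) // 2
        yield vz                                   # z!
        try:
            va, vb = next(a_chan), next(b_chan)    # a? || b?
        except StopIteration:                      # finite input streams: stop
            return

# q = 1, as = [1,0,1]; the b-stream is mul.qs.as without its 0-prefix,
# since the prefix is supplied by the initial vb := 0
assert list(sum1_component(1, iter([1, 0, 1]), iter([1, 0]))) == [1, 1, 1]
```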
Remark. The above transformation of a functional program into a component is so simple that we do not wish to prove its correctness each time we apply this transformation. In order to avoid such proofs and still be convinced of the correctness of the design, we need a (yet to be formulated) theorem stating the precise conditions under which such a transformation is correct. This is a topic for further research that goes beyond the scope of this paper.

The decomposition of component MUL.(q:qs) into its subcomponents can now be expressed by the following program. In this program the subcomponents have been given local names s and t; notice that, in order to avoid name clashes between the channels of different (sub)components, the channel names of s and t have been prefixed with "s." and "t.". This (de)composition is illustrated in Fig. 1.
Fig. 1. First decomposition of MUL.(q:qs). [Diagram: subcomponents t: SUM1.q and s: MUL.qs, connected by channels a = t.a, s.a = a, t.b = s.z, z = t.z.]
  com MUL.(q:qs) (a?, z!: B):
    sub s: MUL.qs; t: SUM1.q;
    bus t.a = a, s.a = a, t.b = s.z, z = t.z
  moc

Subsequently, we investigate the response time of this design. Since the communication behaviour of MUL.qs is pref *[a?; z!], the following communication behaviour actually emerges for SUM1.q:

  pref(a?; *[z!; b?; a?])

That is, communications b? and a? are ordered because of the presence of subcomponent MUL.qs. Hence, after each communication on z, the next communication on a is delayed until a communication on b has taken place. By induction on the structure of the decomposition, it then follows that after each z, #(q:qs) communication actions take place before a next communication on a can take place. Consequently, MUL.(q:qs) has a response time proportional to #(q:qs).

Fortunately, the efficiency of this design can be improved. For this purpose, we notice that changing the order of the communications in z!; (a? || b?) in SUM1.q does not affect the correctness of the design, as long as we do not introduce deadlock or livelock [3]. The most liberal ordering would be z! || a? || b?, but, since the ordering with respect to a and z should be pref *[a?; z!], we restrict it to (z!; a?) || b?. With the communication behaviour for SUM1.q given by

  pref(a?; *[(z!; a?) || b?])

we infer that communications on z and b can take place in parallel. From this observation it follows that the response times of MUL.(q:qs) and MUL.qs are the same. Since MUL.[ ] has bounded response time, we infer by induction on the structure of the decomposition that MUL.(q:qs) has bounded response time as well.

Next, we observe that the values on channel a are received by two components. This is correct as long as we consider communications to be synchronous. In this case, this means that a?, t.a?, and s.a? are assumed to be executed "simultaneously". Under the asynchronous interpretation used in the next section this design is, however, not correct. We remedy this by providing component SUM1 with an additional output channel y along which the values received via input a are sent to MUL.qs. This provides the extra synchronisation needed to allow asynchronous communications.
We call the component thus obtained SUM2. Because the communication behaviour of MUL.qs is pref *[a?; z!], the communication behaviour of SUM2.q with respect to y and b must be pref *[y!; b?]. By combining this with the communication behaviour of SUM1 we obtain the following communication behaviour for SUM2:

  pref(a?; *[(z!; a?) || (y!; b?)])
It so happens that we can indeed implement the correct value transfers between channels a and y by means of this communication behaviour. These changes give rise to the following, final, programs for MUL.(q:qs) and SUM2.q.

  com MUL.(q:qs) (a?, z!: B):
    sub s: MUL.qs; t: SUM2.q;
    bus t.a = a, s.a = t.y, t.b = s.z, z = t.z
  moc

  com SUM2.q (a?, b?, y!, z!: B):
    var c: B;
    pref(a? || vb := 0 || c := 0
    ; *[ vy, vz, c := va, (c + q * va + vb) mod 2, (c + q * va + vb) div 2
       ; (z!; a?) || (y!; b?)
       ]
    )
  moc
This decomposition is illustrated in Fig. 2. Finally, we remark that the second design for the serial-parallel multiplier has bounded response time as well. This can be seen as follows. Notice that in the repetition in SUM2.q both the sequence z!; a? and the sequence y!; b? occur. These sequences may be interleaved arbitrarily. Consequently, in component MUL.(q:qs) symbols from {t.a, t.z} can be generated at the same rate as symbols from {t.b, t.y}. Accordingly, MUL.(q:qs) and MUL.qs have the same response time. Since the last component in the decomposition, MUL.[ ], has bounded response time, it follows by induction that MUL.(q:qs) has bounded response time as well.
Fig. 2. Second decomposition of MUL.(q:qs). [Diagram: subcomponents t: SUM2.q and s: MUL.qs, connected by channels a = t.a, t.y = s.a, t.b = s.z, z = t.z.]
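The second decomposition can be simulated sequentially. In the following Python sketch (our own model, not the program notation), each SUM2 stage keeps its carry c and the b-value received from its successor in the previous round; the chain ends in the constant-0 behaviour of MUL.[ ]:

```python
def make_multiplier(qs):
    # one (c, vb) state per SUM2 stage; the chain ends in a constant-0 stage
    state = [{"c": 0, "vb": 0} for _ in qs]

    def step(a):
        # one input bit in, one output bit out; the a-bit is forwarded
        # along the y-channels, so every stage processes the same a
        outs = []
        for q, st in zip(qs, state):
            d = st["c"] + q * a + st["vb"]
            outs.append(d % 2)
            st["c"] = d // 2
        # each stage's next b is its successor's current z (the end supplies 0)
        for i, st in enumerate(state):
            st["vb"] = outs[i + 1] if i + 1 < len(outs) else 0
        return outs[0] if outs else 0
    return step

m = make_multiplier([1, 1])                                  # fixed multiplicand, value 3
assert [m(a) for a in [1, 0, 1, 0, 0]] == [1, 1, 1, 1, 0]    # 3 * 5 = 15, LSB first
```

This sequential model only checks functional correctness; the per-input work here is proportional to #qs, whereas in the asynchronous pipeline the stages operate concurrently, which is what makes the response time independent of the pipeline length.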
6. Towards a circuit implementation
In order to implement the parallel program by means of a circuit, we have to realise, among other things, synchronisation of the communications on a channel. We avoid the introduction of synchronisation primitives by requiring that the communications in our parallel program satisfy an additional condition: whenever a component in a network of communicating components can produce an output, the receiving component is in such a state that it can accept that output. This condition is sometimes called absence of computation interference. Notice that this requirement introduces an asymmetry between inputs and outputs with respect to their synchronisation: the process performing the output action initiates the communication and the process "at the other end of the channel" is always able to perform the corresponding input action.

If we consider the network of components MUL.qs, SUM1.q (with the new ordering), and their environment, we can characterise their respective communication behaviours with respect to the symbols a, b, z by

  pref *[a?; b!],   pref(a?; *[(z!; a?) || b?]),   pref *[a!; z?]
We observe that after the common behaviour a; z the environment can produce the next a. Component MUL.qs, however, is not yet able to accept the next a, since it has to produce output b first. Consequently, the first decomposition does not satisfy the extra condition.

If we consider the network of components SUM2.q, MUL.qs, and their environment, we can specify their respective communication behaviours with respect to the symbols a, b, y, z by

  pref(a?; *[(z!; a?) || (y!; b?)]),   pref *[y?; b!],   pref *[a!; z?]
Here the extra condition is indeed satisfied.

In the next step towards a circuit implementation for component SUM2.q, we implement local variable c by a channel. For this purpose we compose SUM2.q from component SUM3.q and an auxiliary component

  pref *[x?; vc := vx; c!]

which simply copies input values to its output. Component SUM3.q is defined as follows:

  com SUM3.q (a?, b?, c?, x!, y!, z!: B):
    pref(a? || vb := 0 || vc := 0
    ; *[ vy, vz, vx := va, (vc + q * va + vb) mod 2, (vc + q * va + vb) div 2
       ; (z!; a?) || (y!; b?) || (x!; c?)
       ]
    )
  moc

Notice that each variable in this program is associated with a channel and that the concurrent assignment in the repetition has a list of output variables as its left-hand side and a list of expressions in terms of the input variables as its right-hand side. Because of this property, component SUM3.q can be decomposed further into components that implement the proper initialisations for variables vc and vb, the expressions in the right-hand side of the concurrent assignment, and a, so-called, SYN-component. A SYN-component is a primitive component that synchronises the transfer of values between input and output channels. For example, suppose that the communication behaviours

  pref *[a?; z!]   and   pref *[b?; y!]

should be synchronised each time when inputs a and b have been received, and that values received at channel a should be transferred to output channel z and values received at channel b should be transferred to output channel y. The following simple SYN-component with input channels a, b and output channels y, z, all of type B, realises this synchronisation.
  com SYN (a?, b?, y!, z!: B):
    pref(a? || b?; *[vy, vz := vb, va; (z!; a?) || (y!; b?)])
  moc
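The value transfer realised by SYN can be mimicked with a Python generator (our own model): per round it receives a pair (va, vb) and emits (vy, vz) = (vb, va):

```python
def syn(a_chan, b_chan):
    # pref(a? || b?; *[vy, vz := vb, va; (z!; a?) || (y!; b?)])
    for va, vb in zip(a_chan, b_chan):
        vy, vz = vb, va          # a-values go to z, b-values go to y
        yield vy, vz

assert list(syn([1, 0], [0, 1])) == [(0, 1), (1, 0)]
```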
If we would have had the concurrent assignment vy, vz := va, vb, instead of vy, vz := vb, va, then values received on channels a and b would be transferred to channels y and z respectively.

For the implementation of the initialisation of the variables vb and vc to 0, we use two components I0. For implementing the functions in the concurrent assignment, we use a component M.q, which computes q * va for any input value va, and a, so-called, full adder FA. Notice that addition, mod 2 and div 2, of three bits is exactly the function implemented by a full adder. The complete decomposition of SUM3.q is depicted in Fig. 3, whereas the complete design for MUL.qs is given in Fig. 4. Notice that MUL.[ ] can be implemented simply with a component C0 that outputs the constant 0 after each received input.

Fig. 3. Decomposition of SUM3.q.

Fig. 4. Complete decomposition of MUL.qs.

There are several ways to continue from here. One way is to choose a synchronous implementation. The data communications are then encoded using a
traditional level encoding and each beginning of a cycle is signalled by a clock pulse. The components I0, FA, and M.q are implemented in the standard way, where component I0 boils down to some initialisation circuitry, component FA to a full adder, and component M.q to a simple AND gate of which one input is connected to the constant q and the other to channel a. The SYN-components are implemented by clocked registers. All registers in the lower half of Fig. 4 are clocked on the same phase, and all registers in the upper half are clocked on the same, opposite phase. The response time of this synchronous implementation, however, also depends on the length of the pipeline, because of the clock distribution.

In order to avoid timing problems introduced by the clock and to add some more flexibility to the design, one can implement the design as an asynchronous circuit. In this case, one can encode the data communications using a data-bundling encoding scheme as explained in [2, 9], for example. In such an implementation, however, there are some delay constraints that have to be adhered to. In case one wants an implementation in which there are no delay constraints, one can implement the design as a delay-insensitive circuit [6]. Then, the correctness of the design is independent of any delays in the response times of the elements and connection wires. In this case, the data communications may be encoded by a double-rail two-cycle signalling scheme, for example. This last alternative is explained in [8]. More formal characterisations of several types of asynchronous circuits can be found in [1]. In both asynchronous implementations mentioned above, the response time is independent of the length of the pipeline.
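The FA component's function, addition mod 2 and div 2 of three bits, is small enough to check exhaustively:

```python
def full_adder(a, b, c):
    # sum and carry of three bits: d mod 2 and d div 2 for d = a + b + c
    d = a + b + c
    return d % 2, d // 2

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, carry = full_adder(a, b, c)
            assert a + b + c == s + 2 * carry   # (s, carry) loses no information
```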
7. Concluding remarks

There is something strange about our specifications. We have designed a machine that on input of k bits produces k bits of output, for any k. A convenient way to specify such machines is to consider them as functions from infinite lists to infinite lists. Infinite lists do not represent numbers, but their finite prefixes do. The obvious way, therefore, is to specify our function mul as follows. For finite list qs of bits
and infinite list as of bits, mul.qs.as is an infinite list of bits satisfying (with xs↑i denoting the finite prefix of xs of length i, for any infinite list xs)

  ⟦mul.qs.as ↑ i⟧ = (⟦qs⟧ * ⟦as ↑ i⟧) mod 2^i   for all natural i

This specification, however, is rather awkward because of the occurrences of ↑i and mod 2^i; now, all sorts of properties of div and mod sneak into the derivations. The
use of finite lists avoids this complexity.

Although the specification of the multiplier mentions a single multiplication only, the machine can be used repeatedly (for fixed qs). We have

  mul.qs.(as ++ zeroes.#qs ++ bs) = (mul.qs.(as ++ zeroes.#qs)) ++ (mul.qs.bs)

i.e., if after every input sequence as, #qs zeroes are input, the multiplier produces the output sequences corresponding to the successive multiplications.

One important difference between our programs and the traditional CSP programs is the asymmetry between input and output actions. In CSP semantics, inputs and outputs are synchronised by definition, whereas in our program notation we have to prove that correct synchronisation takes place, i.e., that an output can always be accepted as an input. The asymmetry in the synchronisation of communications simplifies the transformation from a program to a circuit implementation. One could also choose the traditional CSP semantics and transform a program into a circuit implementation using a handshake protocol for implementing the synchronisation of input and output actions. This approach is applied by Martin [5].

In the literature several serial-parallel multipliers have been presented. In [10] a (synchronous) pipeline design of a serial-parallel multiplier is given. This design also has bounded response time, but takes #qs clock cycles to respond with the first output bit. In [3], a similar design for a serial-parallel multiplier is presented briefly using the traditional CSP semantics. Our design of the serial-parallel multiplier also shows similarity with the designs for bit convolution and polynomial multiplication/division given in [7]. In our approach, however, we have separated the discussions on the values to be computed and on the pattern of communications needed for these values as much as possible. We believe that this contributes to a better disentanglement of different aspects of the problem.
For the former, functional programming techniques, such as discussed in [4], turn out to be valuable, whereas formal techniques for the latter still require further development.
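The reuse property stated above can be checked on the functional program (a sketch; `mul` and `sum1` transcribe the definitions of Section 3, and the operands are chosen so that precondition (6) holds for each multiplication):

```python
def zeroes(k):
    return [0] * k

def mul(qs, as_):
    if not qs:
        return zeroes(len(as_))
    return sum1(qs[0], 0, as_, [0] + mul(qs[1:], as_))

def sum1(q, c, as_, bs):
    if not as_:
        return []
    d = c + q * as_[0] + bs[0]
    return [d % 2] + sum1(q, d // 2, as_[1:], bs[1:])

qs = [1, 1]                                 # [qs] = 3
as_, bs = [1, 0, 1], [1, 0, 0, 0, 0]        # 3*5 and 3*1 both fit their lengths
lhs = mul(qs, as_ + zeroes(len(qs)) + bs)
rhs = mul(qs, as_ + zeroes(len(qs))) + mul(qs, bs)
assert lhs == rhs                           # the runs concatenate as claimed
```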
Acknowledgement
Maitham Shams is gratefully acknowledged for directing our attention to the design of delay-insensitive multipliers. We also thank the Eindhoven VLSI Club for their stimulating criticisms on an earlier version of this paper. This work was partially supported by the Natural Sciences and Engineering Research Council of Canada under Grant OGPO041920.
References

[1] J.A. Brzozowski and J.C. Ebergen, Recent developments in the design of asynchronous circuits, in: J. Csirik, J. Demetrovics and F. Gécseg, eds., Fundamentals of Computation Theory, Lecture Notes in Computer Science 380 (Springer, Berlin, 1989) 78-94.
[2] W.A. Clark et al., Macromodular computer systems, in: Proceedings Spring Joint Computer Conference (AFIPS, Reston, VA, 1967) 335-393.
[3] C.A.R. Hoare, Communicating Sequential Processes (Prentice-Hall International, London, 1985).
[4] R.R. Hoogerwoord, The design of functional programs: A calculational approach, Ph.D. Thesis, Eindhoven University of Technology (1989).
[5] A.J. Martin, Formal program transformations for VLSI circuit synthesis, in: E.W. Dijkstra, ed., Formal Development of Programs and Proofs (Addison-Wesley, Reading, MA, 1989) 59-80.
[6] C.E. Molnar, T.-P. Fang and F.U. Rosenberger, Synthesis of delay-insensitive modules, in: H. Fuchs, ed., Proceedings 1985 Chapel Hill Conference on VLSI (Computer Science Press, Rockville, MD, 1985) 67-86.
[7] M. Rem, Multiplication and division of polynomials, in: E.W. Dijkstra, ed., Formal Development of Programs and Proofs (Addison-Wesley, Reading, MA, 1989) 159-169.
[8] C.L. Seitz, System timing, in: C. Mead and L. Conway, eds., Introduction to VLSI Systems (Addison-Wesley, Reading, MA, 1980) 218-262.
[9] I.E. Sutherland, Micropipelines, Comm. ACM 32 (1989) 720-738.
[10] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective (Addison-Wesley, Reading, MA, 1985).
C.L. Seitz. System timing. in: C. Mead and L. Conway, eds. introdui*tion ta VLSI Systems (AddisonWesley. Reading. MA. 1980) 218-262. I.E. Sutherland. Micropipelines. Comm. ACM 32 t 19891 720-738. N. Weste and K. Eshragian, Principles qf CMOS VLSI Design: A Systems Perspective (AddisonWesley. Reading, MA. 198s).