A derivation of a serial-parallel multiplier

Science of Computer Programming 15 (1990) 201-215
North-Holland

A DERIVATION OF A SERIAL-PARALLEL MULTIPLIER

J.C. EBERGEN and Rob R. HOOGERWOORD
Department of Mathematics and Computing Science, Eindhoven University of Technology,
P.O. Box 513, 5600 MB Eindhoven, Netherlands

Revised June 1990

Abstract. A serial-parallel multiplier is developed systematically, from functional specification to circuit implementation. First, a functional program is derived and, second, a parallel program for a systolic computation is constructed. The parallel program is derived from the functional program. Both synchronous and asynchronous circuit implementations for the parallel program are discussed. The latter implementation has a pipeline structure with bounded response time.

1. Introduction

In VLSI textbooks serial-parallel multipliers are usually explained in terms of pictures and diagrams. The purpose of this paper is to derive a design of a serial-parallel multiplier in a calculational style, thereby revealing in detail all properties used in such a derivation. Such a derivation starts with a functional specification and ends with a circuit implementation.

The derivation consists of three steps. In the first step we derive a functional program for the serial-parallel multiplier. In the second step we derive a parallel program that expresses serial-parallel multiplication as a distributed computation performed by communicating components. The difference between the functional and the parallel program is that the parallel program specifies not only the values computed by each component, but also the communications among the components. In the last step we briefly indicate how the parallel program can be mapped on several types of circuit implementations. Such an implementation may be a synchronous or an asynchronous circuit. Our design allows a nice pipelined implementation. Furthermore, its response time is bounded, i.e., after receipt of each input the next output is produced within an amount of time that is independent of the length of the pipeline.

0167-6423/90/$3.50 © 1990 Elsevier Science Publishers B.V. (North-Holland)


2. Specification

The problem is to design an efficient machine that multiplies numbers by a fixed number. Here, numbers are natural numbers. The numbers input to and output from the machine are represented by lists of binary digits ("bits"), in the usual way. In order to formalise the specification, we introduce an abstraction function ⟦·⟧ that maps finite lists of bits to the numbers they represent. Its definition is

⟦bs⟧ = (Σ i : 0 ≤ i < #bs : bs.i * 2^i)

where #bs denotes the length of list bs of bits and bs.i denotes element i in list bs. Notice that in this representation the least significant bits occur at the head of the list. Without proof we state that this representation has the following properties. For bit b and list bs of bits, we have

⟦[ ]⟧ = 0                                              (1)
⟦b:bs⟧ = b + 2 * ⟦bs⟧                                  (2)
n < 2^k  ≡  n is representable by a list of length k   (3)

where [ ] denotes the empty list and where ":" (cons) denotes list prefixing with one element.

First, we consider the machine as a function mul, say, that maps lists of bits to lists of bits. Although one of the operands of the multiplications performed by the machine remains fixed, we do introduce a parameter for this operand. Thus, we increase our manipulative freedom. For lists qs and as of bits, mul.qs.as is a list of bits satisfying

#(mul.qs.as) = #as                                     (4)
⟦mul.qs.as⟧ = ⟦qs⟧ * ⟦as⟧                              (5)

(Function application is denoted by ".".) Parameter qs represents the fixed multiplicand of machine mul.qs. This machine produces a list of bits for its output that is as long as its input, which is represented by parameter as; hence (4). In order that (4) be satisfiable, the product ⟦qs⟧ * ⟦as⟧ must be representable by a list of length #as. On account of (3) this amounts to

⟦qs⟧ * ⟦as⟧ < 2^#as                                    (6)

Therefore, we impose (6) as a precondition on the parameters of mul. The specification of function mul now reads

(6) ⇒ (4) ∧ (5)

for lists qs and as of bits.
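As a quick sanity check, the abstraction function and properties (1)-(3) can be transcribed into Python; the function name val and the sample lists are ours, not the paper's.

```python
def val(bs):
    """The abstraction function: val(bs) = (sum i : 0 <= i < #bs : bs.i * 2^i),
    i.e. the number represented by bit list bs, least significant bit first."""
    return sum(b * 2**i for i, b in enumerate(bs))

# Property (1): the empty list represents 0
assert val([]) == 0

# Property (2): [b:bs] = b + 2 * [bs]
bs = [1, 1, 0, 1]
for b in (0, 1):
    assert val([b] + bs) == b + 2 * val(bs)

# Property (3): n is representable by a list of length k iff n < 2**k;
# e.g. [1,0,1] represents 1 + 0*2 + 1*4 = 5, and 5 < 2**3
assert val([1, 0, 1]) == 5 and 5 < 2**3
```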

Remark. Actually, we are heading for a machine that outputs one bit after each reception of a bit via its input. This is possible, because the lesser significant bits of the (representation of the) product depend on the lesser significant bits of the operands only.


Precondition (6) is not too restrictive. For arbitrary qs and as, it can be met by suffixing as with #qs zeroes. Notice that

⟦as ++ zeroes.#qs⟧ = ⟦as⟧

where zeroes.k represents the list consisting of k zeroes and ++ denotes list concatenation, and that

⟦qs⟧ * ⟦as⟧ < 2^#(as ++ zeroes.#qs)

because ⟦qs⟧ < 2^#qs and ⟦as⟧ < 2^#as.

3. Derivation

We derive a recursive definition for mul by induction on qs. We start with (5), since this is the more important part of the specification.

Base:

    ⟦mul.[ ].as⟧
  =   {Specification of mul (5)}
    ⟦[ ]⟧ * ⟦as⟧
  =   {Definition of ⟦·⟧ (1), arithmetic}
    0
  =   {zeroes.#as is a list of #as zeroes, definition of ⟦·⟧}
    ⟦zeroes.#as⟧

If it were not for (4), we could have chosen mul.[ ].as = [ ]. Now we must choose mul.[ ].as = zeroes.#as, which satisfies all conditions of the specification of mul.

Step:

    ⟦mul.(q:qs).as⟧
  =   {Specification of mul (5)}
    ⟦q:qs⟧ * ⟦as⟧
  =   {Definition of ⟦·⟧ (2)}
    (q + 2 * ⟦qs⟧) * ⟦as⟧
  =   {arithmetic}
    q * ⟦as⟧ + 2 * ⟦qs⟧ * ⟦as⟧
  =   {Specification of mul (by induction hypothesis), see below}
    q * ⟦as⟧ + 2 * ⟦mul.qs.as⟧
  =   {Definition of ⟦·⟧ (2)}
    q * ⟦as⟧ + ⟦0:mul.qs.as⟧
  =   {Specification of sum, see below}
    ⟦sum.q.as.(0:mul.qs.as)⟧


The introduction of a recursive application of mul generates a proof obligation: its arguments must satisfy the precondition of mul. This proof obligation is

⟦qs⟧ * ⟦as⟧ < 2^#as

which is trivially satisfied, because (⟦qs⟧ ≤ ⟦q:qs⟧) ∧ (0 ≤ ⟦as⟧).

We continue with a specification for the newly introduced function sum. In the last line of the above derivation we have introduced sum in order to define

mul.(q:qs).as = sum.q.as.(0:mul.qs.as)

This is correct if, for bit q and bit sequences as and bs, sum.q.as.bs is a bit sequence satisfying

(7) ⇒ (8) ∧ (9)

with

q * ⟦as⟧ + ⟦bs⟧ < 2^#as                 (7)
#(sum.q.as.bs) = #as                    (8)
⟦sum.q.as.bs⟧ = q * ⟦as⟧ + ⟦bs⟧         (9)

(7) is (again) necessary to make (8) ∧ (9) satisfiable. Conditions (8) and (9) follow from the above derivation and the specification of mul, in particular (4). That (7) holds for sum's arguments in sum.q.as.(0:mul.qs.as) follows from precondition (6) of mul and the above derivation.

We now derive a definition for sum by induction on as and bs.

Base:

    ⟦sum.q.[ ].bs⟧
  =   {Specification of sum (9)}
    q * ⟦[ ]⟧ + ⟦bs⟧
  =   {By precondition (7): q * ⟦[ ]⟧ + ⟦bs⟧ < 2^0, hence this expression equals 0}
    0
  =   {Definition of ⟦·⟧ (1)}
    ⟦[ ]⟧

So we choose sum.q.[ ].bs = [ ], which also satisfies (8). Similarly, we can derive ⟦sum.q.as.[ ]⟧ = q * ⟦as⟧. Although it is possible to rewrite expression q * ⟦as⟧ further into an expression satisfying (8), it is simpler to avoid it. We strengthen sum's precondition with

#as ≤ #bs

From this precondition it follows that (bs = [ ]) ⇒ (as = [ ]). Consequently, there is no need for a separate definition of sum.q.as.[ ]. Notice also that this precondition is satisfied in sum's application in the definition mul.(q:qs).as = sum.q.as.(0:mul.qs.as).

Step:

    ⟦sum.q.(a:as).(b:bs)⟧
  =   {Specification of sum (9)}
    q * ⟦a:as⟧ + ⟦b:bs⟧
  =   {Definition of ⟦·⟧ (2)}
    q * (a + 2 * ⟦as⟧) + b + 2 * ⟦bs⟧
  =   {arithmetic}
    (q * a + b) + 2 * (q * ⟦as⟧ + ⟦bs⟧)
  =   {induction hypothesis}
    (q * a + b) + 2 * ⟦sum.q.as.bs⟧
  =   {(q * a + b) is not a bit, (q * a + b) mod 2 is}
    (q * a + b) mod 2 + 2 * ((q * a + b) div 2 + ⟦sum.q.as.bs⟧)

Now we are stuck: we see no way to get rid of the term (q * a + b) div 2. We observe, however, that both the expressions ⟦sum.q.as.bs⟧ and (q * a + b) div 2 + ⟦sum.q.as.bs⟧ are of the form c + ⟦sum.q.as.bs⟧ for some c, 0 ≤ c (actually, 0 ≤ c < 2, but we do not need this here). Therefore, we generalise function sum, using a technique called generalisation by abstraction in [4], as follows. We introduce function sum1 that for natural c, bit q, and bit sequences as and bs satisfies

(10) ⇒ (11) ∧ (12)

where

c + q * ⟦as⟧ + ⟦bs⟧ < 2^#as  ∧  #as ≤ #bs    (10)
#(sum1.q.c.as.bs) = #as                       (11)
⟦sum1.q.c.as.bs⟧ = c + q * ⟦as⟧ + ⟦bs⟧        (12)

Notice that sum1 is indeed a generalisation of sum, because sum.q = sum1.q.0. By (very) similar derivations as above we arrive at

sum1.q.c.[ ].bs = [ ]
sum1.q.c.(a:as).(b:bs) = d mod 2 : sum1.q.(d div 2).as.bs  where d = c + q * a + b end

Since we used a recursive function application in the second line above, we must verify that this recursive application satisfies precondition (10). This is easy.

Combining all results we obtain

mul.[ ].as = zeroes.#as
mul.(q:qs).as = sum1.q.0.as.(0:mul.qs.as)
sum1.q.c.[ ].bs = [ ]
sum1.q.c.(a:as).(b:bs) = d mod 2 : sum1.q.(d div 2).as.bs  where d = c + q * a + b end

Finally, we remark that the above derivation can also be given for number representations with bases other than 2. The only changes in the program and the derivation are that all occurrences of constant 2 must be replaced by the new base.
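The derived functional program can be transcribed almost literally into Python and checked against specification (4) and (5); the helper names val and zeroes, and the test values, are ours. A sketch, with bit lists least significant bit first:

```python
def val(bs):
    # the abstraction function: [bs] = (sum i : 0 <= i < #bs : bs.i * 2^i)
    return sum(b * 2**i for i, b in enumerate(bs))

def zeroes(k):
    # zeroes.k: the list of k zeroes
    return [0] * k

def sum1(q, c, as_, bs):
    # sum1.q.c.[ ].bs = [ ]
    if not as_:
        return []
    # sum1.q.c.(a:as).(b:bs) = d mod 2 : sum1.q.(d div 2).as.bs
    #   where d = c + q*a + b end
    d = c + q * as_[0] + bs[0]
    return [d % 2] + sum1(q, d // 2, as_[1:], bs[1:])

def mul(qs, as_):
    if not qs:
        return zeroes(len(as_))                  # mul.[ ].as = zeroes.#as
    # mul.(q:qs).as = sum1.q.0.as.(0:mul.qs.as)
    return sum1(qs[0], 0, as_, [0] + mul(qs[1:], as_))

# Specification check, under precondition (6): [qs]*[as] < 2^#as
qs, as_ = [1, 1], [1, 0, 1, 0, 0, 0]             # 3 * 5 = 15 < 2**6
zs = mul(qs, as_)
assert len(zs) == len(as_)                       # condition (4)
assert val(zs) == val(qs) * val(as_)             # condition (5)
```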

4. Towards a parallel program

Our next task is to implement the above functional program as a set of communicating components. Each component should carry out a simple computation, like addition modulo 2 or (integer) division by 2. The challenge here is to design the pattern of the communications among these components in such a way that the composite performs the specified computation. Furthermore, we wish our design to be as efficient as possible. For example, we try to achieve bounded response time.

The communication behaviour between a component and its environment is specified by a program as well. This program prescribes the order in which input and output values are communicated via the channels between the component and its environment. Our program notation is a simple CSP-like notation [3], where communications on a common channel are assumed to be synchronised by definition. Later, we transform the program into a version in which communications may be considered to be asynchronous. This simplifies its implementation as a circuit.

First, we specify the order of the communications of multiplier MUL.qs with input channel a and output channel z. The sequence of values communicated via a is list as and the sequence of values communicated via z is list zs; i.e., in order that MUL.qs is an implementation of mul.qs, we must have zs = mul.qs.as. The input sequence is input serially in order of increasing index, and the multiplier produces an output bit after each receipt of an input bit. This communication behaviour is expressed by

pref *[a?; z!]

Here, "a?" denotes an input action at channel a, "z!" denotes an output action at channel z, ";" denotes concatenation, "*[...]" denotes repetition of the enclosed (also called Kleene closure), and pref denotes, so-called, prefix-closure. (That is, pref E indicates that any prefix of a sequence expressed by E is a valid sequence.) Examples of the sequences defined by pref *[a?; z!] are ε (the empty sequence), az, azaz, also a, and aza, but not aa, nor z.


Besides the order of the communications between component and environment, we must also specify which values are communicated along the channels. A suitable program for MUL.[ ] is

com MUL.[ ] (a?, z!: B):
  pref *[a?; vz := 0; z!]
moc

This program expresses that component MUL.[ ] has two channels: input channel a and output channel z. With each channel a variable is associated: va for channel a and vz for channel z. Both variables are binary variables, i.e., va, vz ∈ B, where B = {0, 1}. Furthermore, a? denotes receipt of a value at channel a, after which this value is assigned to variable va, and z! denotes sending the value of vz along channel z. Values received at channel a are always assigned to va and output actions at channel z always output the value of vz. Accordingly, a? and z! can be considered as abbreviations of a?va and z!vz. The above program prescribes that, repeatedly, a value is received at input channel a and, subsequently, the value 0 is output along channel z. Obviously, this realises zs = mul.[ ].as. Notice that, when we strip all value information from the above program, we are left with pref *[a?; z!], which expresses the desired order of the communications.
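For simulation purposes, MUL.[ ] can be modelled as a Python generator that alternates the input action a? with the output action z!; the send/yield modelling style is our own convention, not part of the paper's notation.

```python
def MUL_empty():
    """Model of MUL.[ ]: pref *[a?; vz := 0; z!].
    Repeatedly receive a bit on a, then output 0 on z."""
    vz = 0
    while True:
        va = yield      # a?  (receive a bit into va)
        vz = 0
        yield vz        # z!  (send the value of vz)

m = MUL_empty()
next(m)                 # run the component to its first input action a?
outputs = []
for bit in [1, 0, 1]:
    outputs.append(m.send(bit))   # drive a?, collect the z! value
    next(m)                       # advance past z! back to a?
assert outputs == [0, 0, 0]       # zs = mul.[ ].as
```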

5. A parallel program

In the previous section we have decided that the communication behaviour of MUL.qs will be pref *[a?; z!]. We now derive, in a number of steps, a (recursive) decomposition for component MUL.(q:qs). Inspired by the definition

mul.(q:qs).as = sum1.q.0.as.(0:mul.qs.as)

we decide to compose MUL.(q:qs) from two subcomponents MUL.qs and SUM1.q, in such a way that the latter implements sum1.q.0, defined by:

sum1.q.c.[ ].bs = [ ]
sum1.q.c.(a:as).(b:bs) = d mod 2 : sum1.q.(d div 2).as.bs  where d = c + q * a + b end

Function sum1.q.0 has two parameters, as and bs; therefore, we provide SUM1.q with two input channels a and b, and one output channel z. In order that the composition implements MUL.(q:qs), its communication behaviour, with respect to a and z, must be the same as that of MUL.qs, viz. pref *[a?; z!]. Because of the symmetry between a and b, we choose the communication behaviour of SUM1.q to be pref *[(a? || b?); z!]. Here, the operator "||" is called weaving; in our programs it denotes arbitrary interleaving of its operands.


Having introduced subcomponents for the implementation of mul.qs and sum1.q.0, we are left with the problem what to do with 0: in 0:mul.qs.as. We might introduce a (very simple) subcomponent to implement it, but it is easier to modify the specification of SUM1.q slightly such that it realises function F defined by

F.as.bs = sum1.q.0.as.(0:bs)

For this purpose, we rewrite the communication behaviour of SUM1.q as follows

pref((a? || b?); *[z!; (a? || b?)])

where we have used the property that, for any x and y,

pref(*[x; y]) = pref(x; *[y; x])

Function 0: in the definition of F can now be implemented by replacing the first occurrence of b? by vb := 0. As a result, the communication behaviour of SUM1.q becomes

pref(a?; *[z!; (a? || b?)])

Because of the simple (tail-)recursive structure of F, component SUM1.q can be defined by means of iteration, instead of recursion. Parameter c of sum1 can be implemented as a local variable, with initial value 0. Thus, we obtain the following program for SUM1.q.

com SUM1.q (a?, b?, z!: B):
  var c: B
  pref( a? || vb := 0 || c := 0
      ; *[ vz, c := (c + q * va + vb) mod 2
                 , (c + q * va + vb) div 2
         ; z!; (a? || b?)
         ]
      )
moc
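The iterative program for SUM1.q can likewise be modelled as a Python generator; delivering a and b together as a pair at each step is our own modelling convention, a sketch rather than the paper's notation.

```python
def SUM1(q):
    """Sketch of SUM1.q:
    pref( a? || vb := 0 || c := 0
        ; *[ vz, c := (c + q*va + vb) mod 2, (c + q*va + vb) div 2
           ; z!; (a? || b?) ] )"""
    vb, c = 0, 0
    va = yield                  # initial a?  (vb and c initialised to 0)
    while True:
        d = c + q * va + vb
        vz, c = d % 2, d // 2   # the concurrent assignment
        va, vb = yield vz       # z!, then a? || b? (received as a pair)

# Serially add the a-stream and the (one-step-delayed) b-stream, with q = 1
s = SUM1(1)
next(s)                         # run to the initial a?
z0 = s.send(1)                  # a = 1          -> z = 1
z1 = s.send((1, 1))             # a = 1, b = 1   -> z = 0, carry 1
z2 = s.send((0, 1))             # a = 0, b = 1   -> z = 0, carry 1
assert [z0, z1, z2] == [1, 0, 0]
```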

Remark. The above transformation of a functional program into a component is so simple that we do not wish to prove its correctness each time we apply this transformation. In order to avoid such proofs and still be convinced of the correctness of the design, we need a, yet to be formulated, theorem stating the precise conditions under which such a transformation is correct. This is a topic for further research that goes beyond the scope of this paper.

The decomposition of component MUL.(q:qs) into its subcomponents can now be expressed by the following program. In this program the subcomponents have been given local names s and t; notice that, in order to avoid name clashes between the channels of different (sub)components, the channel names of s and t have been prefixed with "s." and "t.". This (de)composition is illustrated in Fig. 1.

Fig. 1. First decomposition of MUL.(q:qs).

com MUL.(q:qs) (a?, z!: B):
  sub s: MUL.qs; t: SUM1.q
  bus t.a = a, s.a = a, t.b = s.z, z = t.z
moc

Subsequently, we investigate the response time of this design. Since the communication behaviour of MUL.qs is pref *[a?; z!], the following communication behaviour actually emerges for SUM1.q:

pref(a?; *[z!; b?; a?])

That is, communications b? and a? are ordered because of the presence of subcomponent MUL.qs. Hence, after each communication on z, the next communication on a is delayed until a communication on b has taken place. By induction on the structure of the decomposition, it then follows that after each z, #(q:qs) communication actions take place before a next communication on a can take place. Consequently, MUL.(q:qs) has a response time proportional to #(q:qs).

Fortunately, the efficiency of this design can be improved. For this purpose, we notice that changing the order of the communications in z!; (a? || b?) in SUM1.q does not affect the correctness of the design, as long as we do not introduce deadlock or livelock [3]. The most liberal ordering would be z! || a? || b?, but, since the ordering with respect to a and z should be pref *[a?; z!], we restrict it to (z!; a?) || b?. With the communication behaviour for SUM1.q given by

pref(a?; *[(z!; a?) || b?])

we infer that communications on z and b can take place in parallel. From this observation it follows that the response times of MUL.(q:qs) and MUL.qs are the same. Since MUL.[ ] has bounded response time, we infer by induction on the structure of the decomposition that MUL.(q:qs) has bounded response time as well.

Next, we observe that the values on channel a are shared by both subcomponents. This is correct as long as we consider communications to be synchronous. In this case, this means that a?, t.a?, and s.a? are assumed to be executed "simultaneously". Under the asynchronous interpretation used in the next section this design is, however, not correct. We remedy this by providing component SUM1 with an additional output channel y along which the values received via input a are sent to MUL.qs. This provides the extra synchronisation needed to allow asynchronous communications.


We call the component thus obtained SUM2. Because the communication behaviour of MUL.qs is pref *[a?; z!], the communication behaviour of SUM2.q with respect to b and y must be pref *[y!; b?]. By combining this with the communication behaviour of SUM1 we obtain the following communication behaviour for SUM2:

pref(a?; *[(z!; a?) || (y!; b?)])

It so happens that we can indeed implement the correct value transfers between channels a and y by means of this communication behaviour. These changes give rise to the following, final, programs for MUL.(q:qs) and SUM2.q.

com MUL.(q:qs) (a?, z!: B):
  sub s: MUL.qs; t: SUM2.q
  bus t.a = a, s.a = t.y, t.b = s.z, z = t.z
moc

com SUM2.q (a?, b?, y!, z!: B):
  var c: B
  pref( a? || vb := 0 || c := 0
      ; *[ vy, vz, c := va
                 , (c + q * va + vb) mod 2
                 , (c + q * va + vb) div 2
         ; (z!; a?) || (y!; b?)
         ]
      )
moc
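The final decomposition can be simulated with a step-function model (one input bit in, one output bit out per cycle) that mirrors the recursive structure, SUM2.q wired to MUL.qs, with the inner multiplier's output fed back as the next vb. All Python names are ours; this is a behavioural sketch, not the paper's notation.

```python
def MUL(qs):
    """Build a step function for MUL.qs: call it with one input bit,
    it returns one output bit."""
    if not qs:
        return lambda a: 0           # component C0: constant 0 per input
    q, rest = qs[0], MUL(qs[1:])     # inner multiplier MUL.qs
    state = {'vb': 0, 'c': 0}        # vb := 0, c := 0
    def step(a):
        d = state['c'] + q * a + state['vb']
        z = d % 2                    # vz
        state['c'] = d // 2
        # y!; b? : pass a on to the inner multiplier; its answer is next vb
        state['vb'] = rest(a)
        return z                     # z!
    return step

def val(bs):
    return sum(b * 2**i for i, b in enumerate(bs))

qs = [1, 1, 0, 1]                    # multiplicand 11
as_ = [1, 0, 1, 0, 0, 0, 0, 0]       # 5, padded so that 11*5 = 55 < 2**8
m = MUL(qs)
zs = [m(a) for a in as_]             # one output bit per input bit
assert val(zs) == val(qs) * val(as_)
```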

This decomposition is illustrated in Fig. 2. Finally, we remark that the second design for the serial-parallel multiplier has bounded response time as well. This can be seen as follows. Notice that in the repetition in SUM2.q both the sequence z!; a? and the sequence y!; b? occur. These sequences may be interleaved arbitrarily. Consequently, in component MUL.(q:qs) symbols from {t.a, t.z} can be generated at the same rate as symbols from {t.b, t.y}. Accordingly, MUL.(q:qs) and MUL.qs have the same response time. Since the last component in the decomposition, MUL.[ ], has bounded response time, it follows by induction that MUL.(q:qs) has bounded response time as well.

Fig. 2. Second decomposition of MUL.(q:qs).

6. Towards a circuit implementation

In order to implement the parallel program by means of a circuit, we have to realise, among other things, synchronisation of the communications on a channel. We avoid the introduction of synchronisation primitives by requiring that the communications in our parallel program satisfy an additional condition: whenever a component in a network of communicating components can produce an output, the receiving component is in such a state that it can accept that output. This condition is sometimes called absence of computation interference. Notice that this requirement introduces an asymmetry between inputs and outputs with respect to their synchronisation: the process performing the output action initiates the communication and the process "at the other end of the channel" is always able to perform the corresponding input action.

If we consider the network of components SUM1.q (with the new ordering), MUL.qs, and their environment, we can characterise their respective communication behaviours with respect to the symbols a, b, z by

pref(a?; *[(z!; a?) || b?]),    pref *[a?; b!],    pref *[a!; z?]

We observe that after the common behaviour a; z the environment can produce the next a. Component MUL.qs, however, is not yet able to accept the next a, since it has to produce output b first. Consequently, the first decomposition does not satisfy the extra condition.

If we consider the network of components SUM2.q, MUL.qs, and their environment, we can specify their respective communication behaviours with respect to the symbols a, b, y, z by

pref(a?; *[(z!; a?) || (y!; b?)]),    pref *[y?; b!],    pref *[a!; z?]

Here the extra condition is indeed satisfied.

In the next step towards a circuit implementation for component SUM2.q, we implement local variable c by a channel. For this purpose we compose SUM2.q from component SUM3.q and an auxiliary component

pref *[x?; vc := vx; c!]

which simply copies input values to its output. Component SUM3.q is defined as follows:

com SUM3.q (a?, b?, c?, x?, y!, z!: B):
  pref( a? || vb := 0 || vc := 0
      ; *[ vy, vz, vx := va
                 , (vc + q * va + vb) mod 2
                 , (vc + q * va + vb) div 2
         ; (z!; a?) || (y!; b?) || (x!; c?)
         ]
      )
moc

Notice that each variable in this program is associated with a channel and that the concurrent assignment in the repetition has a list of output variables as its left-hand side and a list of expressions in terms of the input variables as its right-hand side. Because of this property, component SUM3.q can be decomposed further into components that implement the proper initialisations for variables vc and vb, the expressions in the right-hand side of the concurrent assignment, and a, so-called, SYN-component. A SYN-component is a primitive component that synchronises the transfer of values between input and output channels. For example, suppose that the communication behaviours

pref *[a?; z!]

and

pref *[b?; y!]

should be synchronised each time inputs a and b have been received, that values received at channel a should be transferred to output channel z, and that values received at channel b should be transferred to output channel y. The following simple SYN-component with input channels a, b and output channels y, z, all of type T, realises this synchronisation.

com SYN (a?, b?, y!, z!: T):
  pref( a? || b?; *[ vy, vz := vb, va; (z!; a?) || (y!; b?) ] )
moc

If we would have had the concurrent assignment vy, vz := va, vb instead of vy, vz := vb, va, then values received on channels a and b would be transferred to channels y and z respectively.

For the implementation of the initialisation of the variables vb and vc to 0, we use two components I0. For implementing the functions in the concurrent assignment, we use a component M.q, which computes q * va for any input value va, and a, so-called, full adder FA. Notice that addition, mod 2 and div 2, of three bits is exactly the function implemented by a full adder. The complete decomposition of SUM3.q is depicted in Fig. 3, whereas the complete design for MUL.qs is given in Fig. 4. Notice that MUL.[ ] can be implemented simply with a component C0 that outputs the constant 0 after each received input.

There are several ways to continue from here. One way is to choose a synchronous implementation. The data communications are then encoded using a

Fig. 3. Decomposition of SUM3.q.

Fig. 4. Complete decomposition of MUL.qs.

traditional level encoding and each beginning of a cycle is signaled by a clock pulse. The components I0, FA, and M.q are implemented in the standard way, where component I0 boils down to some initialisation circuitry, component FA to a full adder, and component M.q to a simple AND gate of which one input is connected to the constant q and the other to channel a. The SYN-components are implemented by clocked registers. All registers in the lower half of Fig. 4 are clocked on the same phase, and all registers in the upper half are clocked on the same, opposite phase. The response time of this synchronous implementation, however, also depends on the length of the pipeline, because of the clock distribution.

In order to avoid timing problems introduced by the clock and to add some more flexibility to the design, one can implement the design as an asynchronous circuit. In this case, one can encode the data communications using a data-bundling encoding scheme as explained in [2, 9], for example. In such an implementation, however, there are some delay constraints that have to be adhered to. In case one wants an implementation in which there are no delay constraints, one can implement the design as a delay-insensitive circuit [6]. Then, the correctness of the design is independent of any delays in the response times of the elements and connection wires. In this case, the data communications may be encoded by a double-rail two-cycle signaling scheme, for example. This last alternative is explained in [8]. More formal characterisations of several types of asynchronous circuits can be found in [1]. In both asynchronous implementations mentioned above, the response time is independent of the length of the pipeline.
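The claim that addition, mod 2 and div 2, of three bits is exactly the function of a full adder is easy to check exhaustively; the function name full_adder is ours.

```python
def full_adder(a, b, c):
    """Sum and carry of three bits: d mod 2 and d div 2, for d = a + b + c."""
    d = a + b + c
    return d % 2, d // 2       # (sum bit, carry bit)

# Exhaustive check against the usual gate-level formulation of a full adder
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, carry = full_adder(a, b, c)
            assert s == a ^ b ^ c                          # sum: parity
            assert carry == (a & b) | (a & c) | (b & c)    # carry: majority
```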

7. Concluding remarks

There is something strange about our specifications. We have designed a machine that on input of k bits produces k bits of output, for any k. A convenient way to specify such machines is to consider them as functions from infinite lists to infinite lists. Infinite lists do not represent numbers, but their finite prefixes do. The obvious way, therefore, is to specify our function mul as follows. For finite list qs of bits


and infinite list as of bits, mul.qs.as is an infinite list of bits satisfying, with xs↾i denoting the finite prefix of xs of length i, for any infinite list xs,

⟦(mul.qs.as)↾i⟧ = (⟦qs⟧ * ⟦as↾i⟧) mod 2^i,  for all natural i

This specification, however, is rather awkward because of the occurrences of ↾i and mod 2^i; now, all sorts of properties of div and mod sneak into the derivations. The use of finite lists avoids this complexity.

Although the specification of the multiplier mentions a single multiplication only, the machine can be used repeatedly (for fixed qs). We have

mul.qs.((as ++ zeroes.#qs) ++ bs) = (mul.qs.(as ++ zeroes.#qs)) ++ (mul.qs.bs)

i.e., if after every input sequence as, #qs zeroes are input, the multiplier produces the output sequences corresponding to the successive multiplications.

One important difference between our programs and the traditional CSP programs is the asymmetry between input and output actions. In CSP semantics, inputs and outputs are synchronised by definition, whereas in our program notation we have to prove that correct synchronisation takes place, i.e., that an output can always be accepted as an input. The asymmetry in the synchronisation of communications simplifies the transformation from a program to a circuit implementation. One could also choose the traditional CSP semantics and transform a program into a circuit implementation using a handshake protocol for implementing the synchronisation of input and output actions. This approach is applied by Martin [5].

In the literature several serial-parallel multipliers have been presented. In [10] a (synchronous) pipeline design of a serial-parallel multiplier is given. This design also has bounded response time, but takes #qs clock cycles to respond with the first output bit. In [3], a similar design for a serial-parallel multiplier is presented briefly using the traditional CSP semantics. Our design of the serial-parallel multiplier also shows similarity with the designs for bit convolution and polynomial multiplication/division given in [7]. In our approach, however, we have separated the discussions on the values to be computed and on the pattern of communications needed for these values as much as possible. We believe that this contributes to a better disentanglement of different aspects of the problem. For the former, functional programming techniques, such as discussed in [4], turn out to be valuable, whereas formal techniques for the latter still require further development.
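Both the prefix view and the repeated-use property can be checked mechanically on the derived functional program, transcribed here in Python; helper names and test values are ours.

```python
def val(bs):
    # [bs], least significant bit first
    return sum(b * 2**i for i, b in enumerate(bs))

def sum1(q, c, as_, bs):
    if not as_:
        return []
    d = c + q * as_[0] + bs[0]
    return [d % 2] + sum1(q, d // 2, as_[1:], bs[1:])

def mul(qs, as_):
    if not qs:
        return [0] * len(as_)
    return sum1(qs[0], 0, as_, [0] + mul(qs[1:], as_))

qs = [1, 0, 1]                        # multiplicand 5
as_ = [1, 1, 0, 1, 0, 0, 0]           # 11, padded so that 5*11 = 55 < 2**7
zs = mul(qs, as_)

# Each length-i output prefix is the product of [qs] and the
# corresponding input prefix, taken modulo 2^i
for i in range(len(as_) + 1):
    assert val(zs[:i]) == (val(qs) * val(as_[:i])) % 2**i

# Repeated use: inputs separated by #qs zeroes give independent products
as1 = [1, 1] + [0] * len(qs)          # 3, followed by #qs zeroes
bs1 = [1, 0, 1] + [0] * len(qs)       # 5, also padded for its own product
assert mul(qs, as1 + bs1) == mul(qs, as1) + mul(qs, bs1)
```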

Acknowledgement

Maitham Shams is gratefully acknowledged for directing our attention to the design of delay-insensitive multipliers. We also thank the Eindhoven VLSI Club for their stimulating criticisms on an earlier version of this paper. This work was partially supported by the Natural Sciences and Engineering Research Council of Canada under Grant OGPO041920.


References

[1] J.A. Brzozowski and J.C. Ebergen, Recent developments in the design of asynchronous circuits, in: J. Csirik, J. Demetrovics and F. Gécseg, eds., Fundamentals of Computation Theory, Lecture Notes in Computer Science 380 (Springer, Berlin, 1989) 78-94.
[2] W.A. Clark et al., Macromodular computer systems, in: Proceedings Spring Joint Computer Conference (AFIPS, Reston, VA, 1967) 335-393.
[3] C.A.R. Hoare, Communicating Sequential Processes (Prentice-Hall International, London, 1985).
[4] R.R. Hoogerwoord, The design of functional programs: a calculational approach, Ph.D. Thesis, Eindhoven University of Technology (1989).
[5] A.J. Martin, Formal program transformations for VLSI circuit synthesis, in: E.W. Dijkstra, ed., Formal Development of Programs and Proofs (Addison-Wesley, Reading, MA, 1989) 59-80.
[6] C.E. Molnar, T.P. Fang and F.U. Rosenberger, Synthesis of delay-insensitive modules, in: H. Fuchs, ed., Proceedings 1985 Chapel Hill Conference on VLSI (Computer Science Press, Rockville, MD, 1985) 67-86.
[7] M. Rem, Multiplication and division of polynomials, in: E.W. Dijkstra, ed., Formal Development of Programs and Proofs (Addison-Wesley, Reading, MA, 1989) 159-169.
[8] C.L. Seitz, System timing, in: C. Mead and L. Conway, eds., Introduction to VLSI Systems (Addison-Wesley, Reading, MA, 1980) 218-262.
[9] I.E. Sutherland, Micropipelines, Comm. ACM 32 (1989) 720-738.
[10] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective (Addison-Wesley, Reading, MA, 1985).