Modeling power consumption in arithmetic operators


Microelectronic Engineering 39 (1997) 245-253, Elsevier

Alain Guyot and Sélim Abou-Samra
TIMA Laboratory, 46 Avenue Félix Viallet, F-38031 Grenoble

Abstract: This chapter will first address the following issues: why a voltage transition causes power dissipation, what causes a transition, what are useful and redundant transitions, how information redundancy may reduce the number of transitions, how to make information redundant by adding dependent bits, how to statistically measure the average number of transitions (or activity) and reduce it through redundancy. Then this chapter will concentrate on addition and answer the following questions: how to compute the average activity using statistics, how to model activity in the simplest adder, the carry ripple adder, how to extend the model to the carry select adder and to the carry lookahead adder, and finally how to make adders redundant.

1. REDUCING POWER

Designing low-power high-speed circuits requires a combination of techniques at four levels: technology, circuitry, architectures and algorithms [BCS92]. This work concentrates on the architecture level and considers a static CMOS technology.

Let us consider a part of a circuit that we call a functional cell. The power consumption of a functional cell is highly data dependent. So is the cell delay. For example, if a combinatorial cell receives twice the same input vector, the delay as well as the power consumption will be zero. Nevertheless, the approaches to evaluate the power and the delay are quite different. For the delay, we are interested in the worst case, given by the slowest possible path in the circuit, called the critical path. For the power, we are interested in the average consumption, so the approach will be statistical. It is worth noting that in self-timed design we are also interested in average delays, and that for power supply wire sizing we may be interested in the worst case current.

0167-9317/97/$17.00 © Elsevier Science B.V. All rights reserved. PII: S0167-9317(97)00179-2

Work supported in part by Hiperlogic Esprit 20023

1.1. How is energy dissipated

Energy is dissipated because some signal voltage changes and charges or discharges a parasitic capacitance $C_l$ (figure 1). For a $0 \to V_{dd}$ transition, switch 1 is closed, an energy $E_{0 \to 1} = C_l V_{dd}^2$ is drawn from the power supply $V_{dd}$ and an energy $E_C = \frac{1}{2} C_l V_{dd}^2$ is saved in the capacitance $C_l$. The other $\frac{1}{2} C_l V_{dd}^2$ is dissipated. For a $V_{dd} \to 0$ transition, switch 2 is closed, no energy is drawn from $V_{dd}$, but the energy stored in $C_l$ is dissipated. To exhibit this result, we can consider that the current $I$ flowing through the switch is constant. In fact this is almost true in CMOS and anyway the result would hold for any current.

[Figure 1: switch model of an output $V_{out}$ loaded by the parasitic capacitance $C_l$]

[Figure 2: waveforms between 0 and $V_{dd}$ over the interval 0 to T]

We have $I = \frac{dq}{dt}$ and $q = C_l V_{out}$. The energy $W_s$ saved in the capacitance is:

$$W_s = \int V_{out}\, I\, dt = \int \frac{q}{C_l}\,\frac{dq}{dt}\, dt = \int_0^{C_l V_{dd}} \frac{q}{C_l}\, dq = \frac{1}{2} C_l V_{dd}^2$$
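The claim that half of the supplied energy is stored, whatever the current profile, can be checked numerically. The sketch below is only an illustration under an assumption that is not in the text: switch 1 is modelled as a linear resistor `r_switch` (a hypothetical parameter), so the current decays exponentially instead of being constant. The function name and the numerical values are also illustrative.

```python
import math

def charging_energies(c_load, v_dd, r_switch, steps=200_000, settle=30.0):
    """Integrate a 0 -> Vdd charging transient with the trapezoidal rule.

    Returns (energy drawn from the supply, energy stored in the capacitor).
    r_switch is an assumed on-resistance of switch 1; it only sets the
    time scale and should not change either energy.
    """
    tau = r_switch * c_load
    dt = settle * tau / steps             # integrate over `settle` time constants
    drawn = 0.0
    prev_i = v_dd / r_switch              # current at t = 0
    for k in range(1, steps + 1):
        i_now = (v_dd / r_switch) * math.exp(-k * dt / tau)
        drawn += v_dd * 0.5 * (prev_i + i_now) * dt   # integral of Vdd * I dt
        prev_i = i_now
    v_out = v_dd * (1.0 - math.exp(-settle))          # final output voltage
    stored = 0.5 * c_load * v_out ** 2
    return drawn, stored

if __name__ == "__main__":
    for r in (1e3, 1e4):                  # two very different switch resistances
        drawn, stored = charging_energies(c_load=1e-13, v_dd=5.0, r_switch=r)
        print(f"R = {r:.0e}: drawn = {drawn:.3e} J, stored/drawn = {stored / drawn:.6f}")
```

With both resistances the ratio stays at one half, which is consistent with the statement that the result does not depend on the shape of the current.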

The total energy $W_d$ drawn from $V_{dd}$, all of which is eventually dissipated, is:

$$W_d = \int V_{dd}\, I\, dt = V_{dd} \int_0^{C_l V_{dd}} dq = C_l V_{dd}^2$$

Half of it is lost during the charge itself.

1.2. What causes transitions

Transitions at a cell output are obviously caused by the input transitions. In a synchronous machine, all switching activity ultimately derives from the clock transitions. Let us see what activity the input transition may cause on a 4-output cell.

[Figure 3: chronogram of the four outputs a, b, c and d]

In this example a does not change, b has one useful transition, c has two redundant transitions and finally d has two redundant transitions and one useful. Useful transitions are easier to analyse, since they follow the rules of Boolean algebra. Redundant transitions are caused by different delays from the inputs to the same output and are more difficult to analyse and minimise. In the previous example we can count on the output chronogram that there is twice as much redundant activity as useful activity at the cell output.

$$A = A_{useful} + A_{redundant}$$

1.3. Redundant transition taxonomy

The redundant transitions are also called glitches or hazards or spurious transitions. An even number of hazards is often called a static hazard (output c), and an odd number of transitions a dynamic hazard (output d).

1.4. Transition injection

In static CMOS, transitions are injected into a circuit by a change of input. The average number of injected transitions is the sum of the number of changes over all possible input pattern pairs divided by the number of pattern pairs. We call that number activity. In dynamic CMOS, there is a precharge between every new input evaluation. So the activity does not depend on the input transitions but only on input values. Nevertheless, every logic value "0" causes two transitions, one for the discharge and another for the subsequent precharge.

1.5. Examples of activity reduction

1.5.1. Information is redundant

Suppose that we are not interested in the actual value of two bits x and y, with $xy \in \{00, 01, 10, 11\}$, but in the sum $s = x + y \in \{0, 1, 2\}$. Let us introduce $o = x \lor y$ and $a = x \land y$. o and a are not independent ($a \le o$), so they carry less information than x and y.

x   y   o = x∨y   a = x∧y   s = x+y   o+a
0   0      0         0         0       0
0   1      1         0         1       1
1   0      1         0         1       1
1   1      1         1         2       2

We note that s = x+y = o+a. If x and y are equiprobable and independent, the probability of each line is the same: 1/4. To count up the transitions, we draw a table with the four possible old and new values for (xy), and the number of transitions (Hamming distance) between them.

xy    00   01   10   11
00     0    1    1    2
01     1    0    2    1
10     1    2    0    1
11     2    1    1    0

All occurrences in the table are equiprobable. The average number of transitions is 16/16 = 1. Now we derive another table by recoding (xy) into (oa).

oa    00   10   10   11
00     0    1    1    2
10     1    0    0    1
10     1    0    0    1
11     2    1    1    0

Again, all occurrences in the table are equiprobable. The average number of transitions is now 12/16. The activity gain is 25%.
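The two averages above can be reproduced by enumerating every (old, new) input pair and summing Hamming distances. The following is a minimal sketch; the function names `average_activity`, `plain` and `or_and` are illustrative and do not come from the text.

```python
from itertools import product

def average_activity(encode, n_bits=2):
    """Average Hamming distance between the codes of all (old, new) input pairs."""
    inputs = list(product((0, 1), repeat=n_bits))
    total = 0
    for old, new in product(inputs, repeat=2):
        # number of code bits that differ between the old and the new value
        total += sum(u != v for u, v in zip(encode(old), encode(new)))
    return total / len(inputs) ** 2

def plain(xy):
    """Transmit (x, y) unchanged."""
    return xy

def or_and(xy):
    """Recode (x, y) into (o, a) = (x OR y, x AND y)."""
    x, y = xy
    return (x | y, x & y)

if __name__ == "__main__":
    print(average_activity(plain))    # 16/16
    print(average_activity(or_and))   # 12/16
```

The same function can be reused for any other recoding by passing a different `encode` callable.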


1.5.2. Information is made redundant

Let us suppose now that we are interested in the actual value of the two bits x and y, with $xy \in \{00, 01, 10, 11\}$. We add a third bit i, and code $a = x \oplus i$ and $b = y \oplus i$. To decode x and y from (i,a,b) is straightforward since $x = a \oplus i$ and $y = b \oplus i$. The code is redundant since for each value of (x,y) there are two possible values of (i,a,b).

xy     00    01    10    11
iab    000   001   010   011
       111   110   101   100

The value of i is chosen in order to reduce the number of transitions. This table gives the choice of the new value of (i,a,b) according to the new value of (x,y) (columns) and the old value of (i,a,b) (rows).

old iab \ new xy    00    01    10    11
000                 000   001   010   100
001                 000   001   101   011
010                 000   110   010   011
011                 111   001   010   011
100                 000   110   101   100
101                 111   001   101   100
110                 111   110   010   100
111                 111   110   101   011

Again, all the occurrences in the table are equiprobable. The minimum number of transitions is 0, the maximum number is 1 and the average is 24/32. As in the previous example, the activity is reduced by 25%.

1.5.3. Formalisation

If all the bit values are equiprobable and independent the average activity is $\frac{1}{2^n} \sum_{i=0}^{n} i \binom{n}{i}$. If only a maximum of n/2 bits are allowed to change, the activity becomes $\frac{1}{2^n} \sum_{i=0}^{n/2} i \binom{n+1}{i}$ [StBu95].

1.5.4. The trade off

In §1.5.1 and §1.5.2 examples were provided on how to exploit redundancy or make information redundant in order to decrease activity and consequently power. But to achieve activity reduction, extra logic is added. This logic will bring both its own consumption and its own delay. The extra silicon area is not an issue here.

2. RIPPLE CARRY ADDER

In this section, a model for the activity of a Ripple Carry Adder (RCA) is derived without taking into account the attenuation of the spurious transitions. Let us consider an n-bit adder that computes S = A + B, from $A = \sum_{i=0}^{n-1} a_i 2^i$ and $B = \sum_{i=0}^{n-1} b_i 2^i$.

2.1. Useful activity in adders

If we suppose that there is no temporal correlation between the successive inputs and that the values are uniformly distributed then the average number of useful transitions at the output S of an adder is equal to half the number of bits.

$$A_{useful} = n/2$$

2.2. Redundant activity in adders

Let us consider a carry ripple adder (figure 4).

[Figure 4: 8-bit carry ripple adder, a chain of full adders (FA) with inputs $a_7 b_7 \dots a_0 b_0$ and outputs $s_7 \dots s_0$]

In this figure, the most significant bit $s_7$ depends on all the inputs $a_i$ and $b_i$ through the carry path, with different delays, so it may exhibit a lot of redundant transitions. At every position i, the next carry $c_{i+1}$ is either generated ($c_{i+1} = 1$), propagated ($c_{i+1} = c_i$) or killed ($c_{i+1} = 0$) according to the values of the digits $a_i$ and $b_i$.

So three signals can be defined, one to control each case: $g_i = a_i \land b_i$, $p_i = a_i \oplus b_i$, $k_i = \overline{a_i} \land \overline{b_i}$. Given the probability of $a_i$ and $b_i$, we get the probability P and activity A of $g_i$, $p_i$ and $k_i$, then derive the activity of $c_i$ and $s_i$.

$P(a_i = 1) = 1/2$ and $P(a_i = 0) = 1 - P(a_i = 1) = 1/2$.

We suppose that there is one addition per clock cycle and that the adder inputs are latched and change only at the beginning of the clock period $t_0$.

$P_{t_0}(a_i : 1 \to 0) = P_{t_{-1}}(a_i = 1) \cdot P_{t_0}(a_i = 0)$

$A_{t_0}(a_i) = P_{t_0}(a_i : 1 \to 0) + P_{t_0}(a_i : 0 \to 1) = P_{t_0}(a_i : 0 \leftrightarrow 1) = 1/2$

$A_{t_k}(a_i) = 0$ for k > 0: A and B do not change until the next addition.

                  a_i    b_i    g_i    p_i    k_i
probability P     1/2    1/2    1/4    1/2    1/4
activity A        1/2    1/2    3/8    4/8    3/8

As far as the carry is concerned, an addition is composed of two phases. During the initiate phase $t_1$ the carries are generated or killed. The duration of this phase is the FA delay. Then comes the propagate phase that ends when all the carries stop propagating. After that the adder is static. It is worth mentioning that the sum $s_i$ may change during both phases.

2.3. Worst case activity for RCA

The worst case delay of an n-bit ripple carry adder is linearly proportional to n since the carry may have to propagate through all the n FAs. During all this time, the adder is active, meaning that there is at least one transition, but there may be many more. Suppose that at time $t_{-1}$ the following addition had just completed:

  A        1 0 1 0 1 0 1 0
+ B        1 0 1 0 1 0 1 0
  S      1 0 1 0 1 0 1 0 0

and at time $t_0$ the next one is started:

  A        0 1 1 0 1 0 0 1
+ B        1 0 0 1 0 1 1 1
  S      1 0 0 0 0 0 0 0 0

The two operations are described by $g_i$, $p_i$ and $k_i$:

t-1    g7  k6  g5  k4  g3  k2  g1  k0
t0     p7  p6  p5  p4  p3  p2  p1  g0

then the carry chain will have the values:

       c8  c7  c6  c5  c4  c3  c2  c1
t-1     0   1   0   1   0   1   0   0
t0      0   1   0   1   0   1   0   1
t1      1   0   1   0   1   0   1   1
t2      0   1   0   1   0   1   1   1
t3      1   0   1   0   1   1   1   1
t4      0   1   0   1   1   1   1   1
t5      1   0   1   1   1   1   1   1
t6      0   1   1   1   1   1   1   1
t7      1   1   1   1   1   1   1   1

and the sum will have the values sequence:

       s7  s6  s5  s4  s3  s2  s1  s0
t-1     0   1   0   1   0   1   0   0
t0      1   0   1   0   1   0   1   0
t1      0   1   0   1   0   1   0   0
t2      1   0   1   0   1   0   0   0
t3      0   1   0   1   0   0   0   0
t4      1   0   1   0   0   0   0   0
t5      0   1   0   0   0   0   0   0
t6      1   0   0   0   0   0   0   0
t7      0   0   0   0   0   0   0   0

In this worst case example the activity of both the carry $c_i$ and the sum $s_i$ is $O(n^2/2)$.

2.4. Average output S activity

In the ripple carry adder, when all the inputs are applied at $t_0$, any activity after $t_1$ is due to the propagation of the carry through a chain of $p_i = 1$. So any activity of $s_i$ that shows up at time $t_k$, $A_{t_k}(s_i)$, has been injected at time $t_0$, initiated at time $t_1$ with a probability 1/2 and propagated to $t_k$ through (k-1) $p_i$ with a probability $(1/2)^{k-1}$. So $A_{t_k}(s_i) = 2^{-k}$ if $i \ge k$ and 0 otherwise. The average activity is the sum of the output activity over the time, that is $\sum_{k=1}^{i} 2^{-k}$. This value is rather close to 1 when i is large. Unfortunately we cannot just add the average activities of all outputs $s_i$ because adjacent outputs are highly dependent through the adjacent carry relation $c_{i+1} = g_i \lor (p_i \land c_i)$.
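The per-output estimate of §2.4 can be evaluated exactly with rational arithmetic. The sketch below is illustrative only: the function names are not from the text, and the probability of a propagate chain is obtained by enumerating equiprobable, independent input bits, which is the assumption already made above.

```python
from fractions import Fraction
from itertools import product

def propagate_chain_probability(length):
    """Exact probability that `length` adjacent positions all have p_i = a_i XOR b_i = 1."""
    pairs = list(product((0, 1), repeat=2))      # the four equiprobable (a_i, b_i)
    words = product(pairs, repeat=length)        # every input on `length` positions
    hits = sum(all(a ^ b for a, b in word) for word in words)
    return Fraction(hits, 4 ** length)

def activity_at(i, k):
    """A_tk(s_i): initiated at t1 with probability 1/2, then propagated through k-1 positions."""
    if not 1 <= k <= i:
        return Fraction(0)
    return Fraction(1, 2) * propagate_chain_probability(k - 1)

def average_sum_bit_activity(i):
    """Sum of A_tk(s_i) over k = 1..i."""
    return sum(activity_at(i, k) for k in range(1, i + 1))

if __name__ == "__main__":
    for i in (1, 2, 4, 8):
        print(i, average_sum_bit_activity(i))
```

The printed values equal $1 - 2^{-i}$, which is the sense in which the sum is "rather close to 1" for large i.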

2.5. Average carry propagation

Let us call T(n,k) the number of chains of k consecutive "1"s in a binary word of length n, counted over all $2^n$ words. It is clear that T(n,0) = 0 (no "zero" chain), T(n,n) = 1 (111...11) and T(n,n-1) = 2 (2 possibilities: 011...111 or 11...1110). Let us now compute the general term T(n,k) for 0 < k < n. Since the word extremities as well as the bit value 0 act as chain separators, we distinguish two cases. When the chain is at one of the two ends of the n-bit word, there are $2^{n-(k+1)}$ different values for the n-(k+1) bits outside the chain:

11...10 x...x     (chain and separator: k+1 bits, free bits: n-(k+1))
x...x 011...1     (free bits: n-(k+1), separator and chain: k+1 bits)

There are n-(k+1) possibilities for the chain to be in the middle of the word and for each position there are $2^{n-(k+2)}$ values of the n-(k+2) remaining bits:

x...x 01...10 x...x     (separators and chain: k+2 bits, free bits: n-(k+2))

so

$$T(n,k) = 2^{n-k}\left(1 + \frac{n-k-1}{4}\right) \quad \text{for } 0 < k < n$$

and T(n,0) = 0, T(n,n) = 1.

2.6. Average adder activity

The activity at $t_1$ is given by:

$$A_{t_1} = \frac{1}{2^n} \sum_{k=1}^{n} k \cdot T(n,k)$$

The ripple chains of length k that exist at time $t_2$ are those of length k+1 at $t_1$, thus:

$$A_{t_2} = \frac{1}{2^n} \sum_{k=1}^{n-1} k \cdot T(n,k+1)$$

This sum goes to n-1 because in a word of length n, there is no ripple carry chain longer than n. More generally, the activity at time $t_i$ is given by:

$$A_{t_i} = \frac{1}{2^n} \sum_{k=1}^{n-i+1} k \cdot T(n,k+i-1) \approx 2^{-i}(n - i + 1)$$

As seen in §2.3, the activity caused by a carry propagating over k positions is proportional to $k^2/2$ [MoPa96]. Thus the average activity is:

$$A = \frac{1}{2^n} \sum_{k=0}^{n} \frac{k^2}{2} \cdot T(n,k)$$

Let us recall some useful identities [Kre93], written here with only their dominant correction term:

$$\sum_{i=0}^{n} i \cdot 2^{-i} \approx 2 - \frac{n}{2^n}, \qquad \sum_{i=0}^{n} i^2 \cdot 2^{-i} \approx 6 - \frac{n^2}{2^n}, \qquad \sum_{i=0}^{n} i^3 \cdot 2^{-i} \approx 26 - \frac{n^3}{2^n}$$

that allow to simplify the expression of activity A:

$$A = \frac{3n-4}{4} - \frac{3n^2}{2^{n+3}} \;\to\; \frac{3n-4}{4} \quad \text{with } n \to \infty$$

With these identities, one can easily verify §2.1:

$$\frac{1}{2^n} \sum_{k=0}^{n} k \cdot T(n,k) = \frac{n}{2} - \frac{3n}{2^{n+2}} \;\to\; \frac{n}{2} \quad \text{with } n \to \infty$$

In the following, the higher order terms are neglected, i.e. it is assumed that $A = \frac{3n-4}{4}$. The table below gives the relative error for 8, 16, 32 and 64 bits.

# bits    Activity (%)    Delay (%)    η (%)
8         9.37            2.34         8.27
16        0.15            0.02         0.04
32        8.94e-06        5.59e-07     4.26e-06
64        8.33e-15        2.60e-16     2.58e-06

2.7. Decreasing activity

The total activity A is split in two parts: $A = A_{useful} + A_{redundant}$. Thus, the ratio of redundant over total activity is:

$$\eta = \frac{A_{redundant}}{A} = \frac{A - A_{useful}}{A} = \frac{n-4}{3n-4}$$

For large values of n, η = 1/3. This result is consistent with the ROBDD simulations using a constant delay and with [LMJ95]. In other words, the average activity overhead in a naive carry ripple adder is 50%.

2.8. Adder power consumption

The actual power consumption of a ripple carry adder is obtained from the FA state transition diagram (figure 5) where each vertex is a state of the FA three inputs $a_i$, $b_i$ and $c_i$ and the edges are labelled by the energy dissipated when the inputs change from one state to another.
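The counting model of §2.5 and §2.6 can be cross-checked by brute force on short words. The sketch below is an illustration, not the authors' procedure: it counts maximal runs of ones over all $2^n$ words, compares them with the closed form for T(n,k), and evaluates the average activity sum exactly. All function names are illustrative.

```python
from fractions import Fraction

def chains_closed_form(n, k):
    """T(n, k) as given in section 2.5."""
    if k == 0:
        return 0
    if k == n:
        return 1
    return Fraction(2 ** (n - k)) * (1 + Fraction(n - k - 1, 4))

def chains_brute_force(n, k):
    """Count maximal runs of exactly k ones (k >= 1) over all 2**n words of n bits."""
    count = 0
    for word in range(2 ** n):
        bits = format(word, f"0{n}b")
        # splitting on "0" isolates the maximal runs of ones
        count += sum(1 for run in bits.split("0") if len(run) == k)
    return count

def average_activity(n):
    """A = 2**-n * sum over k of (k**2 / 2) * T(n, k), evaluated exactly."""
    total = sum(Fraction(k * k, 2) * chains_closed_form(n, k) for k in range(n + 1))
    return total / 2 ** n

if __name__ == "__main__":
    for n in (4, 8):
        agree = all(chains_closed_form(n, k) == chains_brute_force(n, k)
                    for k in range(1, n + 1))
        print(n, agree, float(average_activity(n)), (3 * n - 4) / 4)
```

For these word lengths the closed form and the enumeration agree, and the exact sum stays close to $(3n-4)/4$, with a gap that shrinks quickly as n grows.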

[Figure 5: FA input state transition diagram]

The energies of all 32 transitions are obtained by SPICE simulation for static complementary gates in a 0.7 µm CMOS technology. They show that the energy depends mostly on the output $c_{i+1}$ and $s_i$ transitions and very little on the inputs. An output state transition diagram gives:

[Figure 6: FA output state transition diagram]

Only 4 different energies are measured in pico-Joule:

Transition       1       2       3       4
Energy (pJ)      0.086   0.071   0.047   0.015

3. FAST ADDER ARCHITECTURE

This section is organised as follows: the Δ operator introduced by Brent and Kung [BrKu82] is first recalled. Then it is used to describe several well known adder architectures by a unified formalism.

3.1. The Δ operator

Let us note $P_{i,j}$ the group propagate and $G_{i,j}$ the group generate, with $n-1 \ge i \ge j \ge 0$. $P_{i,j}$ means that the carry propagates from position j up to position i, that is $c_{i+1}$ is equal to $c_j$:

$$P_{i,j} = \prod_{k=j}^{i} p_k$$

We have already seen that $P(P_{i,j} = 1) = 2^{-(i-j+1)}$, so its activity is rather low. $G_{i,j}$ means that a carry is generated somewhere between j and i and propagated from this location up to position i and yields $c_{i+1} = 1$:

$$G_{i,j} = g_i \lor \bigvee_{k=j}^{i-1} \left(P_{i,k+1} \land g_k\right)$$

We have the following: $P_{i,i} = p_i = a_i \oplus b_i$, $G_{i,i} = g_i = a_i \land b_i$, $P_{i,j} \land G_{i,j} = 0$ and $c_{i+1} = G_{i,0}$ (there is no $c_0$ input; if there were one it would be considered as $g_{-1}$). For any k such that $n-1 \ge i \ge k > j \ge 0$, the pair of bits $(P_{i,j}, G_{i,j})$ can be computed from $(P_{i,k}, G_{i,k})$ and $(P_{k-1,j}, G_{k-1,j})$ in the following recursive way:

$$(P_{i,j}, G_{i,j}) = (P_{i,k} \land P_{k-1,j},\; G_{i,k} \lor (P_{i,k} \land G_{k-1,j}))$$

We note Δ the operator such that $(P_{i,j}, G_{i,j}) = (P_{i,k}, G_{i,k})\ \Delta\ (P_{k-1,j}, G_{k-1,j})$. In the subsequent figures a dedicated icon is used for the 4 bit input, 2 bit output Δ-cell. It is easy to check that Δ is associative, non commutative and idempotent. Any $(P_{i,j}, G_{i,j})$ requires (i-j) Δ-cells to be computed from the adder inputs. Intermediate results from the Δ-cells may be reused, thus reducing the total number of Δ-cells, but increasing the fan-out of some of them.
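The recursion defining Δ can be written directly as a two-input function on (P, G) pairs. The following minimal sketch applies it serially, which corresponds to a ripple carry chain; the function names `delta`, `bit_signals` and `add` are illustrative, and the serial ordering is only one of the possible ways to combine the cells.

```python
def delta(high, low):
    """(P, G) of a group built from its upper part `high` and its lower part `low`."""
    p_hi, g_hi = high
    p_lo, g_lo = low
    return (p_hi & p_lo, g_hi | (p_hi & g_lo))

def bit_signals(a, b, n):
    """Initiate phase: (p_i, g_i) = (a_i XOR b_i, a_i AND b_i) for i = 0..n-1."""
    return [(((a >> i) ^ (b >> i)) & 1, ((a >> i) & (b >> i)) & 1) for i in range(n)]

def add(a, b, n):
    """n-bit addition without carry input, using c_{i+1} = G_{i,0}."""
    pg = bit_signals(a, b, n)
    carries = [0]                      # c_0: there is no carry input
    group = pg[0]                      # (P_{0,0}, G_{0,0})
    carries.append(group[1])           # c_1 = G_{0,0}
    for i in range(1, n):
        group = delta(pg[i], group)    # (P_{i,0}, G_{i,0})
        carries.append(group[1])       # c_{i+1} = G_{i,0}
    total = carries[n] << n            # the last carry is the top bit of the sum
    for i in range(n):
        total |= (pg[i][0] ^ carries[i]) << i   # result phase: s_i = c_i XOR p_i
    return total

if __name__ == "__main__":
    print(add(0b0110, 0b0111, 4))      # 6 + 7
```

Because Δ is associative, the same `delta` function can be regrouped into trees, which is what the architectures of §3.2 exploit.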

3.2. Adder Architectures [GBB94]

Now a fast addition is composed of three phases. During the initiate phase, the $p_i$ and $g_i$ are obtained. Then during the propagate phase, all the $c_i = G_{i-1,0}$ are computed. In the result phase the $s_i = c_i \oplus p_i$ are eventually computed. Let us examine now some well-known adder architectures, their delay (number of Δ-cells along the critical path), their cost (the total number of Δ-cells) and their activity. Figures 7 and 8 only show the propagate phase since the others are trivial.

[Figure 7: A 2-level Carry Select Adder]

3.2.1. Two Level Carry Select Adder

The two level carry-select-adder (figure 7), also named conditional-sum-adder or carry-increment-adder, is based on the RCA truncated into blocks of varying sizes. Its cost is in $O(2n)$ and its delay in $O(\lceil \sqrt{2n} \rceil)$. More precisely, with k Δ-cells along the critical path, the number n of bits such an adder can accommodate is:

$$n = 1 + \sum_{i=1}^{k} i = 1 + \frac{k(k+1)}{2}$$

3.2.2. Sklansky Adder

The Sklansky adder [Skla60] has proved to be among the fastest architectures. Its cost is $O(\lceil \frac{n}{2} \log_2(n) \rceil)$, and its delay $O(\lceil \log_2(n) \rceil)$. The main drawback of this adder is that the fan-out grows exponentially from the inputs to the outputs along the critical path and the transistors must be sized.

3.2.3. Kogge & Stone and Han & Carlson Adders

The most significant bit of a Brent and Kung adder as well as of a Sklansky adder is obtained by a perfectly balanced binary tree in time $\log_2 n$. If the tree for the most significant position is just copied for all other positions, the Kogge and Stone adder (figure 8) is obtained. The fan-out is reduced to just two, at the expense of a larger number of Δ-cells, that becomes $O(n(\log_2 n - 1) + 1)$ cells. As for the Sklansky adder, the delay is $O(\lceil \log_2 n \rceil)$. In order to reduce the number of cells of the Kogge and Stone adder, Han and Carlson [HaCa87] have proposed to compute only the odd positions, and then to add a layer to compute the even positions from the odd ones. The delay is slightly increased to $O(\lceil \log_2(n) + 1 \rceil)$, while the complexity becomes $O(\frac{n}{2} \lceil \log_2(n) + 1 \rceil)$.

[Figure 8: A 32 bit Kogge and Stone adder]
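The cell count and depth quoted for the Kogge and Stone adder can be checked by building its propagate phase explicitly. This is a minimal sketch for word lengths that are powers of two; the names `delta`, `bit_signals` and `kogge_stone_prefix` are illustrative, and the cell counter simply counts calls to the Δ function.

```python
def delta(high, low):
    """Delta operator on (P, G) pairs: upper group `high`, lower group `low`."""
    p_hi, g_hi = high
    p_lo, g_lo = low
    return (p_hi & p_lo, g_hi | (p_hi & g_lo))

def bit_signals(a, b, n):
    """(p_i, g_i) for i = 0..n-1."""
    return [(((a >> i) ^ (b >> i)) & 1, ((a >> i) & (b >> i)) & 1) for i in range(n)]

def kogge_stone_prefix(pg):
    """Return ([(P_{i,0}, G_{i,0}) for each i], number of delta cells, number of levels)."""
    n = len(pg)
    current = list(pg)
    cells = levels = 0
    distance = 1
    while distance < n:
        following = list(current)
        for i in range(distance, n):
            # position i combines its group with the group ending `distance` below it
            following[i] = delta(current[i], current[i - distance])
            cells += 1
        current = following
        levels += 1
        distance *= 2
    return current, cells, levels

if __name__ == "__main__":
    _, cells, levels = kogge_stone_prefix(bit_signals(0, 0, 32))
    print(cells, levels)               # n*log2(n) - n + 1 cells, log2(n) levels
```

For n = 32 the sketch uses 129 cells on 5 levels, in line with $n \log_2 n - n + 1$ and $\log_2 n$.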

4. FAST ADDERS MODELLING

The carry ripple model is extended to the adders that can be obtained by an association of ripple carry chains, like the carry select adder or the Kogge and Stone adder for example. In the computation of η, the useful activity $A_{useful}$ is assumed to be half the number of Δ-cells since "0" and "1" are equiprobable (this is consistent with the BDD simulations). The error due to the approximations made for the computation of A is too big for small ripple carry chains (typically $\log_2 n$ for parallel architectures). Thus, the formulae below are meaningful for large values of $\log_2 n$ or $\sqrt{2n}$.

4.1. Two Level Carry Select Adder

The 2CSA adder is a RCA truncated into blocks. For n bits, the length of these blocks varies from 1 to $\sqrt{2n} - 1$. Thus an n-bit 2CSA can be viewed as a set of RCAs of length varying from 1 to $\sqrt{2n} - 1$ (first level) plus a row of cells that form the second level (figure 7).

4.1.1. First level

The total activity of the first level of the 2CSA is given by the sum of the activities of the ripple carry chains, as they are independent from each other:

$$A_{1st\,level} = \sum_{i=1}^{\sqrt{2n}-1} A_{RCA}(i)$$

where $A_{RCA}(i)$ is the activity of an i-bit ripple carry chain as modelled in §2.6.

4.1.2. Second level

The second level of the 2CSA is approximated here by a ripple carry chain of length $\sqrt{2n}$, in which a cell at position i is duplicated i times. This approach neglects the activity generated at the second level by the ripple of the outputs at the first level. The activity is deduced from the RCA activity by assuming that the capacitance of the i-th cell is i instead of 1.

4.1.3. Total Activity

The total activity of the 2CSA is the sum of the activities of the first and second levels:

$$A_{Total} = A_{1st\,level} + A_{2nd\,level}$$

Assuming that the useful activity is given by half the number of cells, the redundant to total activity ratio can be computed:

$$\eta_{2CSA} = \frac{A - A_{useful}}{A}$$

4.2. Kogge and Stone adder

Each bit of the Kogge and Stone adder is obtained by a balanced binary tree [KoSt73], thus the output of any cell can change only once during a clock cycle: there are no redundant transitions. The carry propagation is the result of a logical AND, thus its activity decreases very rapidly with the depth (like $2^{-i}$), but the transition probability of the carry generation is almost constant (1/2). These considerations allow us to approximate the activity of the Kogge and Stone adder by half the number of its cells:

$$A_{K\&S} \approx \frac{n}{2} \log_2 n \quad \text{and} \quad \eta_{K\&S} = 0$$

4.3. Comparison

Table 1 [TVG95, Zim96]: adder architectures compared by number of Δ-cells, delay (in Δ-cells), maximum fan-out and useful activity.

Type of adder   | # of Δ-cells                | Delay (Δ-cells)     | Max. fan-out | Useful activity
Ripple          | n - 1                       | n - 1               | 2            | n/2
2-level CS      | ⌈2n - √(2n)⌉                | ⌈√(2n)⌉             | ⌈√(2n)⌉      | ⌈n - √(2n)/2⌉
3-level CS      | ⌈(5/2)n - 3 log2(n/2)⌉      | ⌈∛(6n)⌉             | ⌈∛(6n)⌉      | N.A.
Brent-Kung      | ⌈2n - log2(n) - 2⌉          | ⌈2 log2(n) - 2⌉     | ⌈log2(n)⌉    | N.A.
Sklansky        | ⌈(n/2) log2(n)⌉             | ⌈log2(n)⌉           | n/2          | ⌈(n/4) log2(n)⌉
Kogge & Stone   | ⌈n log2(n) - n + 1⌉         | ⌈log2(n)⌉           | 2            | ⌈n log2(n) - n + 1⌉/2
Han and Carlson | ⌈n log2(n) - n + 1⌉/2       | ⌈log2(n)⌉ + 1       | 2            | ⌈n log2(n) - n + 1⌉/4

5. REDUNDANT ADDITION

Carry propagation free adders mostly use the carry-save representation, that is the digit set {0,1,2} in radix 2. In this system some values have several digit representations, hence the name redundant. This system is very popular in DSP but also in fast multiplication and division. Since there are more digit values, the activity is definitely larger than with conventional binary notation; this is the price to pay for no carry propagation. On the other hand, the code can be optimised. Let x and y be the two bits representing a digit (there are 24 possibilities). We can minimise the activity A(x) + A(y) at the adder output by the choice of the code according to the digit transition probabilities P(0↔1), P(0↔2) and P(1↔2). The analytic modelling of the transition probabilities is still to be performed, but activity measures [LMC96] have been conducted through simulation.

                  P(0↔1)   P(0↔2)   P(1↔2)
accumulation       28%      22%      16%
multiplication     33%      10%      12%
counting           25%      16%      14%

They clearly show that 0 and 1 must have adjacent codes, and the code for 2 depends on the application.
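The conclusion drawn from the table can be reproduced by trying the 24 possible assignments of 2-bit codes to the digits 0, 1 and 2. The sketch below assumes, as an interpretation that is not spelled out in the text, that the cost of a code is the sum of each digit transition probability weighted by the Hamming distance between the two codes involved. The function names are illustrative.

```python
from itertools import permutations

def hamming(u, v):
    """Number of differing bits between two 2-bit codes given as integers."""
    return bin(u ^ v).count("1")

def best_code(p01, p02, p12):
    """Try the 24 assignments of distinct 2-bit codes to digits 0, 1, 2; keep the cheapest."""
    best = None
    for c0, c1, c2 in permutations(range(4), 3):
        cost = p01 * hamming(c0, c1) + p02 * hamming(c0, c2) + p12 * hamming(c1, c2)
        if best is None or cost < best[0]:
            best = (cost, (c0, c1, c2))
    return best

if __name__ == "__main__":
    measures = {
        "accumulation": (0.28, 0.22, 0.16),
        "multiplication": (0.33, 0.10, 0.12),
        "counting": (0.25, 0.16, 0.14),
    }
    for name, probabilities in measures.items():
        cost, codes = best_code(*probabilities)
        print(name, [format(c, "02b") for c in codes], round(cost, 2))
```

Any three distinct 2-bit codes contain exactly one complementary pair, so the search amounts to giving the distance-2 pair to the least frequent digit transition: (1,2) for accumulation and counting, (0,2) for multiplication.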

6. CONCLUSION

We have seen on one hand that adder activity can be modelled and on the other hand that the choice of the code may reduce the output activity, sometimes at the expense of more delay and/or complexity. The activity can accurately predict the power consumption. So far no straightforward strategy has been devised to minimise it.

7. REFERENCES

[BCS92] R. Brodersen, A. Chandrakasan and S. Sheng, "Low-Power Signal Processing Systems", VLSI Signal Processing, Yao, Jain, Przytula and Rabaey, editors. IEEE Press, 1992.
[BrKu82] R. P. Brent, H. T. Kung, "A Regular Layout for Parallel Adders", IEEE Transactions on Computers, Vol. C-31, pp. 261-264, March 1982.
[GBB94] A. Guyot, M. Belrhiti, G. Bosco, "Adders Synthesis", Logic and Architecture Synthesis, G. Saucier and A. Mignotte, editors. Chapman & Hall, 1994.
[HaCa87] T. Han, D. A. Carlson, "Fast Area-Efficient VLSI Adders", in proc. of 8th Symposium on Computer Arithmetic, Como, Italy, May 1987.
[KoSt73] P. M. Kogge and H. S. Stone, "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations", IEEE Trans. Comput., vol. 22, no. 8, pp. 783-791, Aug. 1973.
[Kre93] E. Kreyszig, "Advanced Engineering Mathematics", Wiley, New York.
[LMC96] T. Lang, E. Mussol and J. Cortadella, "Redundant adders for reduced output transitions", in proc. 11th DCIS'96, Barcelona, Nov. 1996.
[LMJ95] J. Leijteen, J. van Meerbergen, J. Jess, "Analysis and Reduction of Glitches in Synchronous Networks", in proc. ED&TC, Paris, March 1995.
[MoPa96] L. Montalvo and K. K. Parhi, "Estimation of Average Energy Consumption of Ripple-Carry Adder Based on Average Length Carry Chains", in proc. 11th DCIS'96, Barcelona, Nov. 1996.
[Skla60] J. Sklansky, "Conditional Sum Addition Logic", IRE Transaction EC-9(2), June 1960, pp. 226-231.
[StBu95] M. Stan and W. Burleson, "Bus-Invert Coding for Low Power I/O", IEEE Trans. on VLSI Systems, March 1995.
[TVG95] V. Tchoumatchenko, T. Vassileva and A. Guyot, "Timing Modelling for Adders Optimisation", in proc. PATMOS 95, Oldenburg, Sept. 1995.
[Zim96] R. Zimmermann, "Non-Heuristic Optimization and Synthesis of Parallel-Prefix Adders", in proc. of IFIP workshop, Grenoble, December 1996.