Exponentiation using canonical recoding

Exponentiation using canonical recoding

Theoretical Elsevier Computer Science 407 129 (1994) 407417 Note Exponentiation recoding 6mer using canonical Egecioglu Department of Comput...

620KB Sizes 25 Downloads 114 Views

Theoretical Elsevier

Computer

Science

407

129 (1994) 407417

Note

Exponentiation recoding 6mer

using canonical

Egecioglu

Department

of Computer

Science,

University

of California,

Sanla Barbara,

CA 93106. USA

Cetin Kaya KOG* Departmenr USA

of Electrical and Computer

Engineering,

Oregon State University,

Corvallis, OR 97331,

Communicated by O.H. Ibarra Received April 1993 Revised August 1993

Abstract Egecioglu, 6. and C.K. Koc, Science 129 (1994) 4077417.

Exponentiation

using

canonical

recoding,

Theoretical

Computer

The canonical bit recoding technique can be used to reduce the average number of multiplications required to compute XE provided that X-i is supplied along with X. We model the generation of the digits of the canonical recoding D of an n-bit long exponent E as a Markov chain, and show that binary, quatemary, and octal methods applied to D require f n, J n, and g n multiplications, compared Mn required by these methods applied to E. We show that, in general, the canonically totn,yn,and recoded m-ary method for constant m requires fewer multiplications than the standard m-ary method. However, when m is picked optimally for each method for a given n, then the average number of multiplications required by the standard method is fewer than those required by the recoded version.

1. Introduction The binary multiplications

method [S] computes Y=XE using IZ- 1 squarings and as many as one less than the number of nonzero bits in the binary expansion of

Correspondence to: 6. Egecioglu, Department of Computer Science, University Barbara, CA 93106, USA. E-mail: [email protected]. *Supported in part by RSA Data Security Inc., Redwood City, CA, USA. 0304-3975/94/$07.00 c 1994-Elsevier SSDI 0304-3975(93)E0200-N

Science B.V. All rights reserved

of California,

Santa

408

ii. E@io$u,

C.K.

Koc

the exponent, where n = 1 + Llog, E]. It is well known that IZ- 1 is a lower bound for the number of squaring operations required. However, it is possible to reduce the number

of subsequent

multiplications

using a recoding of the exponent

[4,6,7,10,16].

Recoding techniques (Booth recoding, bit-pair recoding, etc.) for sparse representations of binary numbers have been effectively used in multiplication algorithms [3,17]. For example, the original Booth recoding technique [l] scans the bits of the multiplier

one bit at a time, and adds or subtracts

the multiplicand

to or from the

partial product, depending on the value of the current bit and the previous bit. The modified versions of the Booth algorithm scan the bits of the multiplier two bits at a time [9] or three bits at a time [17]. These techniques are equivalent in the sense that a signed-digit representation which is based on the identity 2’+j-2’=2’+jW1 + 2i+j-2+...+2’+1 +2’ is used to collapse blocks of l’s appearing in a binary representation. In a signed-digit number with radix 2, three symbols (i, 0, l} are allowed for the digit set, in which 1 and i in bit position i represent +2’ and -2’, respectively. Bit recoding techniques applied to E can be used for the exponentiation problem provided that X-l is supplied along with X. Throughout this paper, we will ignore the preprocessing time required for the computation of X - ’ and treat it as part of the input.

2. Canonical recoding A signed-digit vector D of E is a sparse recoding of E using digits from the set {i,O, 1). The recoding is canonical if D contains no adjacent nonzero digits [3,8,13]. Thus, a canonical signed-digit vector of E is of the form D=(D,_ 1D,-* . ..D.,) with Dig (1,0, 1) and Di’Di_1=0

for l
It can be shown that the canonical signed-digit vector for E is unique if the binary expansion of E is viewed as padded with an initial zero. This canonical signed-digit vector can be constructed by the canonical recoding algorithm of Reitwiesner [13]. Reitwiesner’s algorithm computes D starting from the least significant digit and proceeding to the left. First the auxiliary carry variable Co is set to 0 and subsequently the binary expansion of E is scanned two bits at a time. The canonically recoded digit Di and the next value of the auxiliary binary variable Ci+ 1 for i =O, 1,2, . . , n are generated using Table 1. As an example, when E = 3038, we compute the canonical signed-digit vector D as E=(0101111011110)=2”+29+28+27+2~+24+23+22+2’,

D=(~0iooooi000i0)=2~2-2~0-25-2~. Note that canonically

in this example the exponent E contains nine nonzero bits while its recoded version contains only four nonzero digits. Consequently, the

Exponentiation using canonical recoding

Table 1 Canonical

recoding

&+I

Ei

c,

D,

ci+1

0

0 0

0

0

1

1

0

1

0

I 0 0 1 1

0 1 0 1 0 1

1 0 0 i I 0

0 0 0 1 0 1 1 1

0

1 1 1 1

409

binary method requires 11 + 8 = 19 multiplications to compute X303s when applied to the binary expansion of E, but only 12+3 = 15 multiplications when applied to the canonical signed-digit vector D. The canonical signed-digit vector D is optimal in the sense that it has the minimum number of nonzero digits among all signed-digit vectors representing the same number.

3. The standard m-ary method The binary method can be generalized to the (standard) m-ary method [S, 12,161 which scans the digits of E expressed in radix m. We restrict our attention to the case when m=2d for some d. Let E=(E,_,E,_z ...Er E,) be the binary expansion of the exponent. We will assume that the most significant bit is equal to zero, i.e., En_ 1 = 0. This representation of E is partitioned into k blocks of length d each, for kd = n (if d does not divide n, the exponent is padded with at most d- 1 zeros). Define d-l

(1) Note that 0 < Fi < 2d - 1 and E = Et:,’ Fi2id. In the preprocessing phase of the m-ary method, the values of XF for F = 2,3 , . . . , 2d - 1 corresponding to all possible values of the length-d bit-sections are computed. Next, the bits of E are scanned d bits at a time from the most significant to the least significant. At each step the partial result is raised to the 2d power and multiplied with Xc, where Fi is the (nonzero) value of the current bit section. Standard m-ary method Input: X,E,n,d where n=l+Llog,Eland n=kd for k31. Output: Y= XE. 1. Decompose E into d-bit words FL for i = 0, 1,2, . . . , k - 1. 2. Compute and store XF for all F = 2,3,4, . . . , 2d - 1.

410

ii. EBecioBlu,

C.K. Koq

3 . y.=XC, . 4. for i=k-2 downto 0 4a. Y:= Y2d 4b. if Fi#O then Y:= Y.XK 5. return Y

The preprocessing part in Step 2 of the m-ary method requires 2d - 2 multiplications. The number of squaring operations in Step 4a is equal to (k- 1)d. Multiplications in Step 4b are performed for nonzero values of Fi. Since m - 1 out of m possible values of Fi are nonzero, the average number of subsequent multiplications required is (k- l)((ma total of

1)/m). Thus,

we find that,

on the average,

the m-ary

method

requires

multiplications. The average number of squarings plus multiplications for the binary (d = l), the quaternary (d = 2), and the octal (d = 3) methods are found from (2) as T,(n, l)=$n-3,

T,(n,2)=++--&

T,(n,3)=$$n-7,

(3)

respectively.

4. The recoded m-ary method In the recoded m-ary method, we partition the canonical signed-digit vector D produced by Reitwiesner’s algorithm instead of the exponent E itself. In other words, in the recoded m-ary method the d-bit-at-a-time partitioning that determines the bit sections Fi in (1) is applied to the canonical signed-digit vector D.

Recoded m-ary method Input: X,X-‘,E,n,d where n=l+Llog,E] and n=kd for k>l. Output: Y=XE. 1. Compute the canonical signed-digit recoding D of E using Reitwiesner’s algorithm. 2. Decompose D into d-bit words Fi for i = 0, 1,2, , . . , k - 1. 3. Compute and store XF for all possible length-d bit-sections that can appear in D. 4. y.=XCI 5. for i=k-2 downto 0 5a. Y:= Y2d 5b. if Fi#O then Y:= Y.XF~ 6. return Y

Exponentiation using canonical recoding

5. Analysis of the recoded wary In the analysis length-d

411

method

of the recoded

m-ary method

sections F of a canonical

signed-digit

we need the number

vector to estimate

of all possible

the work involved

in

Step 3 of the recoded m-ary method. To compute this, denote by 9 the formal language of all words w over the alphabet (i, 0,l)in which none of the patterns 11, appears.

ii,

ii,

ii

Thus, the words w of length d in 9 correspond

to possible

length-cl sections

Fi of a canonical signed-digit vector. For d 2 0, let TVdenote the total number of length d in 9. We have the following result.

of words

Lemma 5.1. Td=4[2d+2 +(Proof. By considering

l)d+‘].

(4)

the words in 9

according

to their first letter, we see

~=~+i+i+iosf+io~+0~, where h denotes the empty generating function f.(t)= It

1 t’“‘= WELP

word

1

and

(5) +

denotes

disjoint

union.

Consider

the

zdtd.

d>O

follows from (5) that f9 satisfies f&t) = 1 + 2t + 2t *&f(t) + tfy(t),

and therefore (6) Now (4) follows by equating

the coefficient

of td on both sides of (6).

0

Since we are interested in the cost of multiplications required in exponentiation, we do not concern ourselves with the bit level preprocessing required for the computation of the canonical recoding D in Step 1 of the algorithm. The number of multiplications necessary in the preprocessing stage Step 3 of the recoded m-ary method is given by the following lemma. Lemma 5.2. The number of multiplications required in the preprocessing of the recoded

m-ary method is

Td-3=+[2d+2+(-l)d+1]-3.

phase (Step 3)

412

ii. ECjecio~lu,

Proof. We need to compute XF for all length-d canonical

C.K.

KOC

the number of multiplications required to compute signed-digit vectors F. First we compute all quantities

XF where F contains only one nonzero letter. Since 1, X, and X- ’ are already available, this step requires 2(d- 1) multiplications. After this, each value XF, where F contains k> 1 nonzero digits, can be computed recursively from the already computed values of XF where F contains fewer than k nonzero digits by a single multiplication. For k>O, let ck denote the number of words of length d in _Y with exactly k nonzero

letters. Then the total number

of multiplications

required

for the prepro-

cessing step is

Since cO= 1, c1 =2d and

it follows that the total number t,-3. 0

of multiplications

required

is zd - (1 + 2d) + 2(d - 1) =

Now we turn to the computation of the probability that a length-d canonically recoded bit-section F consists of d zeros, since the multiplication in Step 5b of the recoded m-ary method is carried out only for nonzero F. An n-bit binary number E uniformly distributed in the range [0,2”- 11 can be viewed as the output of a random process that generates one bit at a time. Each bit assumes a value of zero or one with equal probability and there is no dependency between any two bits generated. Thus, Y(Ei = 0) = 9(Ei = 1) = 4 for 0 < i < n - 1. The signed-digit numbers produced by the canonical recoding algorithm can be modeled using a finite Markov chain. The state variables are taken to be the triplets (Ei + 1, Ei, Ci). There are eight states for the eight possible combinations of input as given in Table 1. The state transitions given in Table 2 are produced by considering all eight states labeled s0 to s7 and their successors from Table 1. As an example, consider state s0 which represents (Ei+l, Ei, Ci) = (O,O, 0). We compute the output (Di, Ci+ 1) as (0,O) from Table 1. Thus, the next state is (Ei+2,Ei+l,Ci+l)=(Ei+z,O,O). Since ~(Ei+*=O)=~(Ei+2=1)=~, there are transitions from state s0 to the states s0 = (O,O, 0) and s4 = (LO, 0), with equal probability. Let Pij denote the probability that the successor state of Si is sj. From the above analysis PO0 = PO4 = i and Yoj = 0 for j = 1,2,3,5,6,7. After computing the probabilities Pjj for all i and j from Table 2, we find that the one-step transition probability

Exponentiation

Table 2 State transition

table for the canonical

using canonical

recoding

413

recoding

algorithm Next state

State

output

si

&+,,&,G)

E a+*--0

(Di,C,.,)

&+,=l

(0,0, 0) (O,O,l) (0, LO) (0, 1, 1) (LO,O) (LO>1) (1, LO) (LL 1)

matrix

of the chain is

P=

‘l/2

0

0

0

l/2

0

0

0’

l/2

0

0

0

l/2

0

0

0

l/2

0

0

0

l/2

0

0

0

0

l/2

0

0

0

l/2

0

0

0

0

l/2

0

0

0

l/2

0

0

0

0

l/2

0

0

0

l/2

0

0

0

l/2

0

0

0

l/2

0

0

0

l/2

0

0

0

l/2_

The limiting probability ni of state si can be found by solving equations rcP=z with x0 + 7tr + ... + n7 = 1. This gives

(7)

the system of linear

Having computed 71iand ~ij for all 0 d i, j < 7, we can easily prove several properties of the canonical recoding algorithm. For example, the probability that a digit in a canonical signed-digit number D is equal to zero is found by summing the limiting probabilities of the states for which output Di=O. From Table 2 and (8) we get

In particular, the average number of nonzero digits in the canonically recoded binary number D is equal to 4 n. Therefore, the average number of squarings plus multiplications required by the recoded binary method for large n is $n + O(l), which is better than T,(n, l)=$n+ O(1) required by the standard binary method.

414

ii. E&xioglu,

Theorem 5.3. The probability signed-digit

C.K. Koq

that a length-d bit-section

F in a canonically

recoded

vector has all bits equal to zero is 1 d-12

0 z

1 j=p.

Proof. We have already Di = 0 is found as

p(Di+ 1~

seen that P(Di=O)=j.

The probability

that Di+ 1 =0 when

~O~Oj+n3~3j+n4~4j+~7~7j 0(Q=o)=cj=o~3~4~7

=-

1

2’

no+n3+n4+n7

It follows that for d
Di+d_2=O,

Di+d_3=O, . . ..Di=O)=

0

1 d-12 3. z

q

Combining Theorem 5.3 and the preprocessing cost for the recoded m-ary method given in Lemma 5.2, we find that, on the average, the recoded m-ary method with m = 2’ requires a total of

squarings and multiplications. Figure 1 compares the average number of multiplications required by the standard and the recoded m-ary methods, respectively, as a function of n=27,28, . . . . 216 and d=1,2, . . . . 15. The average number of squarings plus multiplications for the recoded binary (d = l), the recoded quaternary (d = 2), and the recoded octal (d = 3) methods are found from (9) as T,(n, 1)=4n-4,

T,(n,2)=jn-4,

T,(n,3)=f#n++,

(10)

respectively.

6. Comparison

of standard and recoded m-ary methods

For large n and fixed d, the behavior of T,(n, d) given in (9) and T,(n,d) of the standard m-ary method given in (2) is governed by the coefficient of IZ.In Table 3 we compare the values T,(n, d)/n and T,(n, d)/n for large n. We can compute directly from the expressions in (2) and (9) that for constant d lim

nAo0

r,(n,d)

(d+ 112”~4<

T,(n,=(d+

1)2d- 1

1



(11)

Exponentiation using canonical recoding

415

1.8

1.6

Fig. 1. The standard

Table 3 The average

number

of multiplications

versus recoded

for the recoded

m-ary methods.

and standard

m-ary methods

d=log,m

1

2

3

4

5

6

7

8

T,h 4/n

1.5 1.33333

1.375 1.33333

1.29167 1.27778

1.23437 1.22917

1.19375 1.19167

1.16406 1.16319

1.14174 1.14137

1.12451 1.12435

T,(n,d)ln

It is interesting

to note that if we consider

the optimal values d, and d, of d (which required by the

depend on n) which minimize the average number of multiplications standard and the recoded m-ary methods, respectively, then

(12) for large n. To prove (12), we consider the behavior of T,(n,d) and T,(n,d) for large n and ignore the lower-order terms involving d in (2) and (9). By differentiation, the optimal values d =d, and d =d, of the lengths of the bit-sections in the standard and

416

a. Egecioelu, C.K. Koq

the recoded to be

m-ary methods

that minimize

d222dlog2-d210g2 2d-dlog2-1 respectively.

and

=’

Since d increases

without

the number

of multiplications

are found

4d222dlog2-4d210g2

bound

32d-4dlog2-4

=Iz’

in each of these equations

as n gets large,

d, and d, satisfy di2dslog2cn

and

d,?2’rlog2z$n.

The function d22d is an increasing expressions in (2) and (9) we get T,(n, d,)

1+ l/d,

r,(n,=

1+ l/ds

function

(13) of d and therefore

d, cd,.

Now from the

for large n, which implies (12). Exact values of d, and d, for a given n can be obtained by enumeration. These optimal values of d, and d, are given in Table 4 together with the corresponding values of T, and T, for each n = 27, 28,. . . , 216.

7. Remarks Algorithms for computing XE using as few multiplications as possible are crucial in many important applications in computer science and engineering. Recent applications in cryptography, for example, the RSA algorithm [14], the ElGamal signature scheme [2], and the recently proposed digital signature standard (DSS) of National Institute for Standards and Technology [ 111, require the computation of X E (mod M) for large values of E (usually n=log, E > 512). The recoded m-ary method can be useful for these particular applications if X - 1 can be supplied without too much extra cost. Even though the inverse X-i (modM) can easily be computed using the

Table 4 Optimal values of d, and d, together with T, and r,

128 256 512 1024 2048 4096 8192 16 384 32 768 65 536

4 4 5 5 6 1 8 8 9 10

168 326 636 1247 2440 4195 945-l 18669 36902 73 095

3 4 4 5 6 I 7 8 9 10

168 328 643 1255 2458 4836 9511 18751 37 070 73 433

Exponentiation

using canonical

417

recoding

extended Euclid algorithm, the cost of this computation far exceeds the time gained by the use of the recoding technique in exponentiation. Thus, at this time the recoding techniques do not seem to be particularly applicable to these cryptosystems. However, the recoding techniques may be useful for computations on elliptic curves over finite fields, since in these cases the inverse is available at no additional cost [6, lo]. In this context, one computes E. X, where E is a large integer and X is a point on the elliptic curve. The multiplication operator is determined by the group law of the elliptic curve. An algorithm for computing XE is easily converted to an algorithm for computing E. X, where we replace multiplication the inverse) by subtraction.

by addition

and division

(multiplication

with

References [l] [2] [3] [4] [S] [6]

[7] [S] [9] [lo] [l l] [12] [13] [I41 [15] 1161 1171

A.D. Booth, A signed binary multiplication technique, Quart. J. Mech. Appl. Math. 4 (1951) 236-240 (also reprinted in 115, pp. 100-1041). T. ElGamal, A public key cryptosystem and a signature scheme based on discrete logarithms, IEEE Trans. Inform. Theory 31 (1985) 469-472. K. Hwang, Computer Arithmetic, Principles, Architecture, and Design (Wiley, New York, 1979). J. Jedwab and C.J. Mitchell, Minimum weight modified signed-digit representations and fast exponentiation, Electron. Lett. 25 (1989) 1171-1172. D.E. Knuth, The Art of Computer Programming, Vol. 2; Seminumerical Algorithms (Addison-Wesley, Reading, MA, 2nd ed., 1981). N. Koblitz, CM-curves with good cryptographic properties, in: J. Feigenbaum, ed., Advances in Cryptology - Proc. CR YP TO ‘91, Lecture Notes in Computer Science, Vol. 576 (Springer, New York, 1991) 279-287. C.K. Koq, High-radix and bit recoding techniques for modular exponentiation, Internat. J. Comput. Math. 40 (1991) 139-156. I. Koren, Computer Arithmetic Algorithms (Prentice-Hall, Englewood Cliffs, NJ, 1993). O.L. MacSorley, High-speed arithmetic in binary computers, Proc. IRE 49 (1961) 67-91 (also reprinted in [15, pp. 14-381). F. Morain and J. Olivos, Speeding up the computations on an elliptic curve using addition-subtraction chains, Rapport de Recherche 983, INRIA, 1989. National Institute for Standards and Technology, Digital signature standard (DSS), Federal Register 56 (1991) 169. H. Orup and P. Kornerup, A high-radix hardware algorithm for calculating the exponential ME modulo N, Proc. 10th Symp. Comput. Arith. (1991) 51-56. G.W. Reitwiesner, Binary arithmetic, Adv. Comput. 1 (1960) 231-308. R.L. Rivest, A. Shamir and L. Adleman, A method for obtaining digital signatures and public-key cryptosystems, Commun. ACM 21 (1978) 120-126. E.E. Swartzlander, ed., Computer Arithmetic, Vol. I (IEEE Computer Sot. Press, Los Alamitos, CA, 1990). N. Takagi, A radix-4 modular multiplication hardware algorithm for modular exponentiation, IEEE Trans. Comput. 41 (1992) 949-956. S. Waser and M.J. Flynn, Introduction to Arithmetic for Digital System Designers (Freeman, New York, 1982).