Theoretical Elsevier
Computer
Science
407
129 (1994) 407417
Note
Exponentiation recoding 6mer
using canonical
Egecioglu
Department
of Computer
Science,
University
of California,
Sanla Barbara,
CA 93106. USA
Cetin Kaya KOG* Departmenr USA
of Electrical and Computer
Engineering,
Oregon State University,
Corvallis, OR 97331,
Communicated by O.H. Ibarra Received April 1993 Revised August 1993
Abstract Egecioglu, 6. and C.K. Koc, Science 129 (1994) 4077417.
Exponentiation
using
canonical
recoding,
Theoretical
Computer
The canonical bit recoding technique can be used to reduce the average number of multiplications required to compute XE provided that X-i is supplied along with X. We model the generation of the digits of the canonical recoding D of an n-bit long exponent E as a Markov chain, and show that binary, quatemary, and octal methods applied to D require f n, J n, and g n multiplications, compared Mn required by these methods applied to E. We show that, in general, the canonically totn,yn,and recoded m-ary method for constant m requires fewer multiplications than the standard m-ary method. However, when m is picked optimally for each method for a given n, then the average number of multiplications required by the standard method is fewer than those required by the recoded version.
1. Introduction The binary multiplications
method [S] computes Y=XE using IZ- 1 squarings and as many as one less than the number of nonzero bits in the binary expansion of
Correspondence to: 6. Egecioglu, Department of Computer Science, University Barbara, CA 93106, USA. E-mail:
[email protected]. *Supported in part by RSA Data Security Inc., Redwood City, CA, USA. 0304-3975/94/$07.00 c 1994-Elsevier SSDI 0304-3975(93)E0200-N
Science B.V. All rights reserved
of California,
Santa
408
ii. E@io$u,
C.K.
Koc
the exponent, where n = 1 + Llog, E]. It is well known that IZ- 1 is a lower bound for the number of squaring operations required. However, it is possible to reduce the number
of subsequent
multiplications
using a recoding of the exponent
[4,6,7,10,16].
Recoding techniques (Booth recoding, bit-pair recoding, etc.) for sparse representations of binary numbers have been effectively used in multiplication algorithms [3,17]. For example, the original Booth recoding technique [l] scans the bits of the multiplier
one bit at a time, and adds or subtracts
the multiplicand
to or from the
partial product, depending on the value of the current bit and the previous bit. The modified versions of the Booth algorithm scan the bits of the multiplier two bits at a time [9] or three bits at a time [17]. These techniques are equivalent in the sense that a signed-digit representation which is based on the identity 2’+j-2’=2’+jW1 + 2i+j-2+...+2’+1 +2’ is used to collapse blocks of l’s appearing in a binary representation. In a signed-digit number with radix 2, three symbols (i, 0, l} are allowed for the digit set, in which 1 and i in bit position i represent +2’ and -2’, respectively. Bit recoding techniques applied to E can be used for the exponentiation problem provided that X-l is supplied along with X. Throughout this paper, we will ignore the preprocessing time required for the computation of X - ’ and treat it as part of the input.
2. Canonical recoding A signed-digit vector D of E is a sparse recoding of E using digits from the set {i,O, 1). The recoding is canonical if D contains no adjacent nonzero digits [3,8,13]. Thus, a canonical signed-digit vector of E is of the form D=(D,_ 1D,-* . ..D.,) with Dig (1,0, 1) and Di’Di_1=0
for l
It can be shown that the canonical signed-digit vector for E is unique if the binary expansion of E is viewed as padded with an initial zero. This canonical signed-digit vector can be constructed by the canonical recoding algorithm of Reitwiesner [13]. Reitwiesner’s algorithm computes D starting from the least significant digit and proceeding to the left. First the auxiliary carry variable Co is set to 0 and subsequently the binary expansion of E is scanned two bits at a time. The canonically recoded digit Di and the next value of the auxiliary binary variable Ci+ 1 for i =O, 1,2, . . , n are generated using Table 1. As an example, when E = 3038, we compute the canonical signed-digit vector D as E=(0101111011110)=2”+29+28+27+2~+24+23+22+2’,
D=(~0iooooi000i0)=2~2-2~0-25-2~. Note that canonically
in this example the exponent E contains nine nonzero bits while its recoded version contains only four nonzero digits. Consequently, the
Exponentiation using canonical recoding
Table 1 Canonical
recoding
&+I
Ei
c,
D,
ci+1
0
0 0
0
0
1
1
0
1
0
I 0 0 1 1
0 1 0 1 0 1
1 0 0 i I 0
0 0 0 1 0 1 1 1
0
1 1 1 1
409
binary method requires 11 + 8 = 19 multiplications to compute X303s when applied to the binary expansion of E, but only 12+3 = 15 multiplications when applied to the canonical signed-digit vector D. The canonical signed-digit vector D is optimal in the sense that it has the minimum number of nonzero digits among all signed-digit vectors representing the same number.
3. The standard m-ary method The binary method can be generalized to the (standard) m-ary method [S, 12,161 which scans the digits of E expressed in radix m. We restrict our attention to the case when m=2d for some d. Let E=(E,_,E,_z ...Er E,) be the binary expansion of the exponent. We will assume that the most significant bit is equal to zero, i.e., En_ 1 = 0. This representation of E is partitioned into k blocks of length d each, for kd = n (if d does not divide n, the exponent is padded with at most d- 1 zeros). Define d-l
(1) Note that 0 < Fi < 2d - 1 and E = Et:,’ Fi2id. In the preprocessing phase of the m-ary method, the values of XF for F = 2,3 , . . . , 2d - 1 corresponding to all possible values of the length-d bit-sections are computed. Next, the bits of E are scanned d bits at a time from the most significant to the least significant. At each step the partial result is raised to the 2d power and multiplied with Xc, where Fi is the (nonzero) value of the current bit section. Standard m-ary method Input: X,E,n,d where n=l+Llog,Eland n=kd for k31. Output: Y= XE. 1. Decompose E into d-bit words FL for i = 0, 1,2, . . . , k - 1. 2. Compute and store XF for all F = 2,3,4, . . . , 2d - 1.
410
ii. EBecioBlu,
C.K. Koq
3 . y.=XC, . 4. for i=k-2 downto 0 4a. Y:= Y2d 4b. if Fi#O then Y:= Y.XK 5. return Y
The preprocessing part in Step 2 of the m-ary method requires 2d - 2 multiplications. The number of squaring operations in Step 4a is equal to (k- 1)d. Multiplications in Step 4b are performed for nonzero values of Fi. Since m - 1 out of m possible values of Fi are nonzero, the average number of subsequent multiplications required is (k- l)((ma total of
1)/m). Thus,
we find that,
on the average,
the m-ary
method
requires
multiplications. The average number of squarings plus multiplications for the binary (d = l), the quaternary (d = 2), and the octal (d = 3) methods are found from (2) as T,(n, l)=$n-3,
T,(n,2)=++--&
T,(n,3)=$$n-7,
(3)
respectively.
4. The recoded m-ary method In the recoded m-ary method, we partition the canonical signed-digit vector D produced by Reitwiesner’s algorithm instead of the exponent E itself. In other words, in the recoded m-ary method the d-bit-at-a-time partitioning that determines the bit sections Fi in (1) is applied to the canonical signed-digit vector D.
Recoded m-ary method Input: X,X-‘,E,n,d where n=l+Llog,E] and n=kd for k>l. Output: Y=XE. 1. Compute the canonical signed-digit recoding D of E using Reitwiesner’s algorithm. 2. Decompose D into d-bit words Fi for i = 0, 1,2, , . . , k - 1. 3. Compute and store XF for all possible length-d bit-sections that can appear in D. 4. y.=XCI 5. for i=k-2 downto 0 5a. Y:= Y2d 5b. if Fi#O then Y:= Y.XF~ 6. return Y
Exponentiation using canonical recoding
5. Analysis of the recoded wary In the analysis length-d
411
method
of the recoded
m-ary method
sections F of a canonical
signed-digit
we need the number
vector to estimate
of all possible
the work involved
in
Step 3 of the recoded m-ary method. To compute this, denote by 9 the formal language of all words w over the alphabet (i, 0,l)in which none of the patterns 11, appears.
ii,
ii,
ii
Thus, the words w of length d in 9 correspond
to possible
length-cl sections
Fi of a canonical signed-digit vector. For d 2 0, let TVdenote the total number of length d in 9. We have the following result.
of words
Lemma 5.1. Td=4[2d+2 +(Proof. By considering
l)d+‘].
(4)
the words in 9
according
to their first letter, we see
~=~+i+i+iosf+io~+0~, where h denotes the empty generating function f.(t)= It
1 t’“‘= WELP
word
1
and
(5) +
denotes
disjoint
union.
Consider
the
zdtd.
d>O
follows from (5) that f9 satisfies f&t) = 1 + 2t + 2t *&f(t) + tfy(t),
and therefore (6) Now (4) follows by equating
the coefficient
of td on both sides of (6).
0
Since we are interested in the cost of multiplications required in exponentiation, we do not concern ourselves with the bit level preprocessing required for the computation of the canonical recoding D in Step 1 of the algorithm. The number of multiplications necessary in the preprocessing stage Step 3 of the recoded m-ary method is given by the following lemma. Lemma 5.2. The number of multiplications required in the preprocessing of the recoded
m-ary method is
Td-3=+[2d+2+(-l)d+1]-3.
phase (Step 3)
412
ii. ECjecio~lu,
Proof. We need to compute XF for all length-d canonical
C.K.
KOC
the number of multiplications required to compute signed-digit vectors F. First we compute all quantities
XF where F contains only one nonzero letter. Since 1, X, and X- ’ are already available, this step requires 2(d- 1) multiplications. After this, each value XF, where F contains k> 1 nonzero digits, can be computed recursively from the already computed values of XF where F contains fewer than k nonzero digits by a single multiplication. For k>O, let ck denote the number of words of length d in _Y with exactly k nonzero
letters. Then the total number
of multiplications
required
for the prepro-
cessing step is
Since cO= 1, c1 =2d and
it follows that the total number t,-3. 0
of multiplications
required
is zd - (1 + 2d) + 2(d - 1) =
Now we turn to the computation of the probability that a length-d canonically recoded bit-section F consists of d zeros, since the multiplication in Step 5b of the recoded m-ary method is carried out only for nonzero F. An n-bit binary number E uniformly distributed in the range [0,2”- 11 can be viewed as the output of a random process that generates one bit at a time. Each bit assumes a value of zero or one with equal probability and there is no dependency between any two bits generated. Thus, Y(Ei = 0) = 9(Ei = 1) = 4 for 0 < i < n - 1. The signed-digit numbers produced by the canonical recoding algorithm can be modeled using a finite Markov chain. The state variables are taken to be the triplets (Ei + 1, Ei, Ci). There are eight states for the eight possible combinations of input as given in Table 1. The state transitions given in Table 2 are produced by considering all eight states labeled s0 to s7 and their successors from Table 1. As an example, consider state s0 which represents (Ei+l, Ei, Ci) = (O,O, 0). We compute the output (Di, Ci+ 1) as (0,O) from Table 1. Thus, the next state is (Ei+2,Ei+l,Ci+l)=(Ei+z,O,O). Since ~(Ei+*=O)=~(Ei+2=1)=~, there are transitions from state s0 to the states s0 = (O,O, 0) and s4 = (LO, 0), with equal probability. Let Pij denote the probability that the successor state of Si is sj. From the above analysis PO0 = PO4 = i and Yoj = 0 for j = 1,2,3,5,6,7. After computing the probabilities Pjj for all i and j from Table 2, we find that the one-step transition probability
Exponentiation
Table 2 State transition
table for the canonical
using canonical
recoding
413
recoding
algorithm Next state
State
output
si
&+,,&,G)
E a+*--0
(Di,C,.,)
&+,=l
(0,0, 0) (O,O,l) (0, LO) (0, 1, 1) (LO,O) (LO>1) (1, LO) (LL 1)
matrix
of the chain is
P=
‘l/2
0
0
0
l/2
0
0
0’
l/2
0
0
0
l/2
0
0
0
l/2
0
0
0
l/2
0
0
0
0
l/2
0
0
0
l/2
0
0
0
0
l/2
0
0
0
l/2
0
0
0
0
l/2
0
0
0
l/2
0
0
0
l/2
0
0
0
l/2
0
0
0
l/2
0
0
0
l/2_
The limiting probability ni of state si can be found by solving equations rcP=z with x0 + 7tr + ... + n7 = 1. This gives
(7)
the system of linear
Having computed 71iand ~ij for all 0 d i, j < 7, we can easily prove several properties of the canonical recoding algorithm. For example, the probability that a digit in a canonical signed-digit number D is equal to zero is found by summing the limiting probabilities of the states for which output Di=O. From Table 2 and (8) we get
In particular, the average number of nonzero digits in the canonically recoded binary number D is equal to 4 n. Therefore, the average number of squarings plus multiplications required by the recoded binary method for large n is $n + O(l), which is better than T,(n, l)=$n+ O(1) required by the standard binary method.
414
ii. E&xioglu,
Theorem 5.3. The probability signed-digit
C.K. Koq
that a length-d bit-section
F in a canonically
recoded
vector has all bits equal to zero is 1 d-12
0 z
1 j=p.
Proof. We have already Di = 0 is found as
p(Di+ 1~
seen that P(Di=O)=j.
The probability
that Di+ 1 =0 when
~O~Oj+n3~3j+n4~4j+~7~7j 0(Q=o)=cj=o~3~4~7
=-
1
2’
no+n3+n4+n7
It follows that for d
Di+d_2=O,
Di+d_3=O, . . ..Di=O)=
0
1 d-12 3. z
q
Combining Theorem 5.3 and the preprocessing cost for the recoded m-ary method given in Lemma 5.2, we find that, on the average, the recoded m-ary method with m = 2’ requires a total of
squarings and multiplications. Figure 1 compares the average number of multiplications required by the standard and the recoded m-ary methods, respectively, as a function of n=27,28, . . . . 216 and d=1,2, . . . . 15. The average number of squarings plus multiplications for the recoded binary (d = l), the recoded quaternary (d = 2), and the recoded octal (d = 3) methods are found from (9) as T,(n, 1)=4n-4,
T,(n,2)=jn-4,
T,(n,3)=f#n++,
(10)
respectively.
6. Comparison
of standard and recoded m-ary methods
For large n and fixed d, the behavior of T,(n, d) given in (9) and T,(n,d) of the standard m-ary method given in (2) is governed by the coefficient of IZ.In Table 3 we compare the values T,(n, d)/n and T,(n, d)/n for large n. We can compute directly from the expressions in (2) and (9) that for constant d lim
nAo0
r,(n,d)
(d+ 112”~4<
T,(n,=(d+
1)2d- 1
1
’
(11)
Exponentiation using canonical recoding
415
1.8
1.6
Fig. 1. The standard
Table 3 The average
number
of multiplications
versus recoded
for the recoded
m-ary methods.
and standard
m-ary methods
d=log,m
1
2
3
4
5
6
7
8
T,h 4/n
1.5 1.33333
1.375 1.33333
1.29167 1.27778
1.23437 1.22917
1.19375 1.19167
1.16406 1.16319
1.14174 1.14137
1.12451 1.12435
T,(n,d)ln
It is interesting
to note that if we consider
the optimal values d, and d, of d (which required by the
depend on n) which minimize the average number of multiplications standard and the recoded m-ary methods, respectively, then
(12) for large n. To prove (12), we consider the behavior of T,(n,d) and T,(n,d) for large n and ignore the lower-order terms involving d in (2) and (9). By differentiation, the optimal values d =d, and d =d, of the lengths of the bit-sections in the standard and
416
a. Egecioelu, C.K. Koq
the recoded to be
m-ary methods
that minimize
d222dlog2-d210g2 2d-dlog2-1 respectively.
and
=’
Since d increases
without
the number
of multiplications
are found
4d222dlog2-4d210g2
bound
32d-4dlog2-4
=Iz’
in each of these equations
as n gets large,
d, and d, satisfy di2dslog2cn
and
d,?2’rlog2z$n.
The function d22d is an increasing expressions in (2) and (9) we get T,(n, d,)
1+ l/d,
r,(n,=
1+ l/ds
function
(13) of d and therefore
d, cd,.
Now from the
for large n, which implies (12). Exact values of d, and d, for a given n can be obtained by enumeration. These optimal values of d, and d, are given in Table 4 together with the corresponding values of T, and T, for each n = 27, 28,. . . , 216.
7. Remarks Algorithms for computing XE using as few multiplications as possible are crucial in many important applications in computer science and engineering. Recent applications in cryptography, for example, the RSA algorithm [14], the ElGamal signature scheme [2], and the recently proposed digital signature standard (DSS) of National Institute for Standards and Technology [ 111, require the computation of X E (mod M) for large values of E (usually n=log, E > 512). The recoded m-ary method can be useful for these particular applications if X - 1 can be supplied without too much extra cost. Even though the inverse X-i (modM) can easily be computed using the
Table 4 Optimal values of d, and d, together with T, and r,
128 256 512 1024 2048 4096 8192 16 384 32 768 65 536
4 4 5 5 6 1 8 8 9 10
168 326 636 1247 2440 4195 945-l 18669 36902 73 095
3 4 4 5 6 I 7 8 9 10
168 328 643 1255 2458 4836 9511 18751 37 070 73 433
Exponentiation
using canonical
417
recoding
extended Euclid algorithm, the cost of this computation far exceeds the time gained by the use of the recoding technique in exponentiation. Thus, at this time the recoding techniques do not seem to be particularly applicable to these cryptosystems. However, the recoding techniques may be useful for computations on elliptic curves over finite fields, since in these cases the inverse is available at no additional cost [6, lo]. In this context, one computes E. X, where E is a large integer and X is a point on the elliptic curve. The multiplication operator is determined by the group law of the elliptic curve. An algorithm for computing XE is easily converted to an algorithm for computing E. X, where we replace multiplication the inverse) by subtraction.
by addition
and division
(multiplication
with
References [l] [2] [3] [4] [S] [6]
[7] [S] [9] [lo] [l l] [12] [13] [I41 [15] 1161 1171
A.D. Booth, A signed binary multiplication technique, Quart. J. Mech. Appl. Math. 4 (1951) 236-240 (also reprinted in 115, pp. 100-1041). T. ElGamal, A public key cryptosystem and a signature scheme based on discrete logarithms, IEEE Trans. Inform. Theory 31 (1985) 469-472. K. Hwang, Computer Arithmetic, Principles, Architecture, and Design (Wiley, New York, 1979). J. Jedwab and C.J. Mitchell, Minimum weight modified signed-digit representations and fast exponentiation, Electron. Lett. 25 (1989) 1171-1172. D.E. Knuth, The Art of Computer Programming, Vol. 2; Seminumerical Algorithms (Addison-Wesley, Reading, MA, 2nd ed., 1981). N. Koblitz, CM-curves with good cryptographic properties, in: J. Feigenbaum, ed., Advances in Cryptology - Proc. CR YP TO ‘91, Lecture Notes in Computer Science, Vol. 576 (Springer, New York, 1991) 279-287. C.K. Koq, High-radix and bit recoding techniques for modular exponentiation, Internat. J. Comput. Math. 40 (1991) 139-156. I. Koren, Computer Arithmetic Algorithms (Prentice-Hall, Englewood Cliffs, NJ, 1993). O.L. MacSorley, High-speed arithmetic in binary computers, Proc. IRE 49 (1961) 67-91 (also reprinted in [15, pp. 14-381). F. Morain and J. Olivos, Speeding up the computations on an elliptic curve using addition-subtraction chains, Rapport de Recherche 983, INRIA, 1989. National Institute for Standards and Technology, Digital signature standard (DSS), Federal Register 56 (1991) 169. H. Orup and P. Kornerup, A high-radix hardware algorithm for calculating the exponential ME modulo N, Proc. 10th Symp. Comput. Arith. (1991) 51-56. G.W. Reitwiesner, Binary arithmetic, Adv. Comput. 1 (1960) 231-308. R.L. Rivest, A. Shamir and L. Adleman, A method for obtaining digital signatures and public-key cryptosystems, Commun. ACM 21 (1978) 120-126. E.E. Swartzlander, ed., Computer Arithmetic, Vol. I (IEEE Computer Sot. Press, Los Alamitos, CA, 1990). N. Takagi, A radix-4 modular multiplication hardware algorithm for modular exponentiation, IEEE Trans. Comput. 41 (1992) 949-956. S. Waser and M.J. Flynn, Introduction to Arithmetic for Digital System Designers (Freeman, New York, 1982).