An efficient common-multiplicand-multiplication method to the Montgomery algorithm for speeding up exponentiation


Information Sciences 179 (2009) 410–421 Contents lists available at ScienceDirect Information Sciences journal homepage: www.elsevier.com/locate/ins...



Chia-Long Wu *
Department of Aviation and Communication Electronics, Chinese Air Force Institute of Technology, Kaohsiung 82042, Taiwan, ROC

Article info

Article history: Received 22 February 2008 Received in revised form 19 September 2008 Accepted 3 October 2008

Keywords: Modular arithmetic Exponentiation Common-multiplicand-multiplication Signed-digit recoding Public-key cryptography Complexity analysis

Abstract

Modular exponentiation is a common operation for scrambling secret data and is used by several public-key cryptosystems, such as the RSA scheme and the DSS digital signature scheme. However, the calculations involved in modular exponentiation are time-consuming, especially when performed in software. In this paper, we propose an efficient CMM–MSD Montgomery algorithm that combines the Montgomery modular reduction method, the common-multiplicand-multiplication (CMM) method, and the minimal-signed-digit (MSD) recoding technique for fast modular exponentiation. By recording the common signed-digit representations in the grouped substrings of the exponent, our algorithm improves the efficiency of both the original CMM exponentiation algorithm and the Montgomery multiplication algorithm. The fast modular exponentiation algorithm developed in this paper can be easily implemented on a general signed-digit computing machine, and is therefore well suited to parallel implementation for fast evaluation of modular exponentiation. Moreover, with the proposed CMM–MSD Montgomery algorithm, the total number of single-precision multiplications is reduced on average by about 38.9% and 26.68% compared with Dusse–Kaliski's Montgomery algorithm and Ha–Moon's Montgomery algorithm, respectively.

© 2008 Elsevier Inc. All rights reserved.

1. Introduction

Efficient algorithms that speed up software implementations of modular exponentiation are of practical significance for cryptographic applications such as the RSA public-key cryptosystem [17], the Diffie–Hellman key distribution scheme [4], and the ElGamal scheme [7]. The modular exponentiation problem can be described as follows: given M (message), E (public key), and N (modulus), compute the ciphertext $C = M^E \bmod N$. The most intuitive way to evaluate $M^E \bmod N$ is to break the modular exponentiation into a series of modular multiplications. As efficient evaluation of $M^E \bmod N$ is very useful for public-key cryptosystems, we need fast multiplication designs or novel exponentiation algorithms such as the Montgomery modular reduction method [14], the binary (square-and-multiply) method [10], the common-multiplicand-multiplication (CMM) method [23,27], the signed-digit recoding method [8], the exponent-folding method [13,25], and nonstandard arithmetic methods [5]. Moreover, detailed analyses of fast exponentiation techniques have been given by Gordon [8].

This paper presents an efficient method for speeding up modular exponentiation by using the Montgomery modular reduction algorithm, the CMM technique and the MSD recoding method. The paper is organized as follows. Some related works on efficiently solving modular exponentiation are introduced in Section 2. In Section 3, we introduce the CMM–MSD Montgomery algorithm for fast modular exponentiation. Then, the complexity of the proposed algorithm is detailed in Section 4. Finally, we conclude our work in Section 5.

* Tel.: +886 7 6258738; fax: +886 7 6252530. E-mail address: [email protected]
0020-0255/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2008.10.004

2. The modular exponentiation

In 1976, Diffie and Hellman proposed the first public-key algorithm, the famous "Diffie–Hellman key exchange scheme" [4]. In 1978, Rivest et al. proposed the RSA public-key cryptosystem [17], which is widely used in digital communication today. In 1985, ElGamal proposed the ElGamal scheme [7], which can be used for both digital signatures and encryption.

Modular exponentiation is composed of repeated modular multiplications and is often the dominant part of modern cryptographic algorithms for key exchange, digital signatures, and authentication. Taking the RSA public-key cryptosystem as an example, both the encryption and decryption operations are accomplished by modular exponentiation. However, high bit lengths are required to provide adequate security for electronic commerce over telecommunication networks; a 1024-bit length is considered secure against attack in the near future [18].

There are two commonly used ways to reduce the execution time of modular exponentiation. One is to reduce the number of modular multiplications. The other is to reduce the execution time of each modular multiplication. In this paper, we improve the Montgomery multiplication method by using the common-multiplicand-multiplication method and the minimal-signed-digit recoding technique to effectively reduce the number of multiplications needed for fast evaluation of exponentiation.

In this section, we give a brief review of the binary (square-and-multiply) method, the common-multiplicand-multiplication method, and the Montgomery modular reduction method. Most importantly, we also introduce some essential concepts related to the proposed CMM–MSD Montgomery modular exponentiation algorithm.

2.1. The binary (square-and-multiply) method

The most commonly used algorithms for computing exponentiation are the binary methods (also called square-and-multiply methods) [12].
The basic idea of the binary method is to compute modular exponentiation from the binary expansion of the exponent E, so that the exponentiation is broken into a series of squaring and multiplication operations. Let m denote the bit length of the exponent E; then E can be expressed in binary representation as $E = (e_{m-1} e_{m-2} \cdots e_1 e_0)_2 = \sum_{i=0}^{m-1} e_i \cdot 2^i$, where $e_i \in \{0, 1\}$. Two binary algorithms [10] convert the modular exponentiation into a sequence of modular multiplications: the LSB (least-significant-bit) algorithm and the MSB (most-significant-bit) algorithm. The LSB algorithm, which computes the modular exponentiation starting from the least-significant bit of the exponent and proceeding to the left, is depicted as follows:

LSB binary modular exponentiation algorithm
Input: Message: M; Exponent: $E = (e_{m-1} e_{m-2} \cdots e_1 e_0)_2$; $0 \le i \le m-1$ and $e_i \in \{0, 1\}$;
Output: Ciphertext: $C = M^E \bmod N$;
C = 1; S = M;
for i = 0 to m − 1 do                      /*scan from right to left*/
begin
    if ($e_i$ = 1) then C = C × S mod N;   /*modular multiplication*/
    S = S × S mod N;                       /*modular squaring*/
end;
Output C.

The MSB algorithm (also called the left-to-right binary algorithm) computes the modular exponentiation starting from the most-significant bit of the exponent and proceeding to the right. The MSB algorithm is depicted as follows:

MSB binary modular exponentiation algorithm
Input: Message: M; Exponent: $E = (e_{m-1} e_{m-2} \cdots e_1 e_0)_2$; $0 \le i \le m-1$ and $e_i \in \{0, 1\}$;
Output: Ciphertext: $C = M^E \bmod N$;
C = 1;
for i = m − 1 downto 0 do                  /*scan from left to right*/
begin
    C = C × C mod N;                       /*modular squaring*/
    if ($e_i$ = 1) then C = C × M mod N;   /*modular multiplication*/
end;
Output C.
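As a minimal sketch of the two binary methods above, the following Python functions follow the LSB and MSB scans directly. The function names `lsb_modexp` and `msb_modexp` are illustrative choices, not names from the paper, and Python's built-in `pow` is used only as a reference value in the check.

```python
def lsb_modexp(M: int, E: int, N: int) -> int:
    """Right-to-left (LSB-first) square-and-multiply."""
    C, S = 1 % N, M % N
    while E > 0:
        if E & 1:              # e_i = 1: modular multiplication
            C = (C * S) % N
        S = (S * S) % N        # modular squaring
        E >>= 1                # move to the next more significant bit
    return C


def msb_modexp(M: int, E: int, N: int) -> int:
    """Left-to-right (MSB-first) square-and-multiply."""
    C = 1 % N
    for bit in bin(E)[2:]:     # bits e_{m-1} ... e_0
        C = (C * C) % N        # modular squaring
        if bit == "1":         # e_i = 1: modular multiplication
            C = (C * M) % N
    return C


if __name__ == "__main__":
    # Parameters reused from the paper's worked example (N = 37 * 41).
    print(lsb_modexp(127, 1213, 1517), msb_modexp(127, 1213, 1517))
```

Both functions perform one squaring per exponent bit and one extra multiplication per nonzero bit, which is the operation count analysed in the text.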


Both the MSB and LSB binary algorithms perform the same multiplication and squaring operations and therefore have the same computational complexity, which is summarized as follows. For an m-bit exponent in the average case, we assume that bits "1" and "0" occur with equal probability, so the expected numbers of "1" bits and "0" bits are both m/2. A "1" bit costs two multiplications (one squaring and one multiplication) and a "0" bit costs one, so on average both algorithms require $2 \times (m/2) + 1 \times (m/2) = 1.5m$ multiplications.

2.2. The common-multiplicand-multiplication (CMM) method

In 1993, Yen and Laih [28] developed the common-multiplicand-multiplication algorithm to improve the performance of the LSB binary exponentiation algorithm. The method focuses on the computations $\{X \times Y_i \mid i = 1, 2, \ldots, t;\ t \ge 2\}$: a bit-level arithmetic algorithm [9] is proposed and applied to the LSB binary algorithm for the efficient evaluation of exponentiation. The following variables are required in the CMM method:

$$Y_{common} = \mathrm{AND}_{i=1}^{t}\, Y_i, \tag{1}$$

$$Y_{i,c} = Y_i \ \mathrm{XOR}\ Y_{common} \quad \text{for } i = 1, 2, \ldots, t. \tag{2}$$

Hence, each $Y_i$ can be represented as

$$Y_i = Y_{i,c} + Y_{common}. \tag{3}$$

Therefore, the common-multiplicand-multiplications $X \times Y_i$ (i = 1, 2, ..., t) can be computed with the assistance of $X \times Y_{common}$ as

$$X \times Y_i = X \times Y_{i,c} + X \times Y_{common} \quad \text{for } i = 1, 2, \ldots, t. \tag{4}$$
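Eqs. (1)–(4) can be illustrated with ordinary Python integers used as bit vectors. This is only a sketch of the decomposition: the name `cmm_products` is hypothetical, and the product with the common part is formed with the built-in multiplication rather than with the bit-level additions counted in the paper.

```python
from typing import List, Sequence


def cmm_products(X: int, Ys: Sequence[int]) -> List[int]:
    """Compute X * Y_i for every Y_i, sharing the product with the common part."""
    y_common = Ys[0]
    for y in Ys[1:]:
        y_common &= y                      # Eq. (1): bitwise AND of all Y_i
    x_common = X * y_common                # computed once and reused
    products = []
    for y in Ys:
        y_rest = y ^ y_common              # Eq. (2): Y_{i,c} = Y_i XOR Y_common
        products.append(X * y_rest + x_common)   # Eq. (4)
    return products


if __name__ == "__main__":
    print(cmm_products(13, [0b1011, 0b1110]))
```

Because `y_common` only contains bits that are set in every $Y_i$, the XOR in Eq. (2) simply clears those bits, which is why the sum in Eq. (3) reproduces $Y_i$ exactly.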

The basic idea of the CMM technique is to extract the common parts of the multiplicands and thereby save the binary additions otherwise spent on recomputing those common parts. With the above algorithm, the computations $\{X \times Y_1, X \times Y_2\}$ can be represented as $\{X \times Y_{1,c} + X \times Y_{common},\ X \times Y_{2,c} + X \times Y_{common}\}$. Let X and all $Y_i$ be k-bit integers. On average, the Hamming weights of $Y_i$, $Y_{common}$ and $Y_{i,c}$ are $k/2$, $k/2^t$ and $(k/2 - k/2^t)$, respectively. The total number of binary additions for the common-multiplicand-multiplication evaluation of $\{X \times Y_i \mid i = 1, 2, \ldots, t;\ t \ge 2\}$ is $k/2^t + t \times (k/2 - k/2^t)$. Without the CMM algorithm, the products $\{X \times Y_i \mid i = 1, 2, \ldots, t;\ t \ge 2\}$ are computed one after another independently, using a total of $t \times (k/2)$ binary additions. The performance improvement [27] of the common-multiplicand-multiplication algorithm can be expressed as

$$\frac{t \cdot \frac{k}{2}}{\frac{k}{2^t} + t\left(\frac{k}{2} - \frac{k}{2^t}\right)} = \frac{t}{t + (1 - t) \cdot 2^{1-t}}. \tag{5}$$

Based on Eq. (5), the best performance improvement, 4/3, is obtained when t = 2, which implies that $1.5 = 2 \times \tfrac{3}{4}$ multiplications are needed by the CMM algorithm to evaluate $X \times Y_1$ and $X \times Y_2$. On average, by applying the CMM algorithm to the LSB binary algorithm, exponentiation can be evaluated using $(1.5 + 1)m/2 = 1.25m$ [27] multiplications for an m-bit exponent E. Consider the application to public-key cryptography for evaluating $M^E \bmod N$, where both M and E are 512-bit: the LSB binary algorithm and the CMM algorithm combined with the LSB binary algorithm require 768 and 640 multiplications, respectively.

2.3. The Montgomery modular reduction and exponentiation algorithm

In 1985, Montgomery [14] first introduced the modular reduction algorithm (also known as the Montgomery modular multiplication algorithm). The Montgomery multiplication algorithm speeds up the modular multiplications and squarings required for exponentiation [9]. Suppose that we want to compute the modular multiplication $AB \bmod N$, where A, B and the modulus N are n-digit integers represented in base 2. Hence

$$A = \sum_{i=0}^{n-1} A_i \cdot 2^i, \quad B = \sum_{i=0}^{n-1} B_i \cdot 2^i, \quad N = \sum_{i=0}^{n-1} N_i \cdot 2^i, \tag{6}$$

where $A_i$, $B_i$ and $N_i$ are elements of {0, 1} for all i. In 1990, Dusse and Kaliski [6] proposed an efficient Montgomery modular reduction (MMR) algorithm to perform both multiplication and modular reduction simultaneously. Let $R = 2^n$. The modular reduction of a given positive T in the N-residue is defined as $T' = \mathrm{MMR}(T) = (T + \xi \cdot N)/R = T \cdot R^{-1} \bmod N$, where $R > N$ is an integer with $\gcd(R, N) = 1$. To obtain the correct result $T'$, an additional quantity $N'$ is needed, where $R \cdot R^{-1} - N \cdot N' = 1$ and $\xi = T \cdot N' \bmod 2^n$. Note that, because the MMR algorithm is processed in the N-residue, it allows the precomputation of $N'_0 = -N_0^{-1} \bmod 2$ instead of $N' = -N^{-1} \bmod 2^n$. Let $N = N_{n-1} \cdot 2^{n-1} + \cdots + N_1 \cdot 2 + N_0$ and $X = X_{n-1} \cdot 2^{n-1} + \cdots + X_1 \cdot 2 + X_0$; we can then compute X one digit per modular reduction step instead of computing the whole X at one time. Dusse–Kaliski's Montgomery modular reduction algorithm can be depicted as follows:


Montgomery modular reduction (MMR) algorithm
Input: A, B, N                       /*A, B and N are n-digit integers in base 2*/
Output: X                            /*$X = AB \cdot 2^{-n} \bmod N$*/
X = 0;                               /*$X = (X_{n-1} \cdots X_1 X_0)_2$ where n is the bit length of X*/
begin
    for i = 0 to n − 1 do            /*scan A from the least-significant bit*/
    begin
        X = X + $A_i$ B;
        Y = $X_0 N'_0 \bmod 2$;      /*$N'_0 = -N_0^{-1} \bmod 2$, used in place of $N' = -N^{-1} \bmod 2^n$*/
        X = (X + Y × N)/2;
    end;
    if (X ≥ N) X = X − N;            /*$X = AB \cdot 2^{-n} \bmod N = \mathrm{MMR}(AB)$*/
end.
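The bit-serial reduction above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function name `mmr` and its explicit bit-length parameter `n` are choices made here, N is assumed odd with $N < 2^n$, and the operands are assumed to satisfy $A < 2^n$ and $B < N$.

```python
def mmr(A: int, B: int, N: int, n: int) -> int:
    """Bit-serial Montgomery product: returns A * B * 2^(-n) mod N.

    Assumes N is odd, N < 2^n, A < 2^n and B < N.
    """
    X = 0
    for i in range(n):
        X += ((A >> i) & 1) * B    # X = X + A_i * B
        Y = X & 1                  # Y = X_0 * N'_0 mod 2, and N'_0 = 1 for odd N
        X = (X + Y * N) >> 1       # X + Y*N is even, so the halving is exact
    if X >= N:                     # X < 2N here, so one subtraction suffices
        X -= N
    return X


if __name__ == "__main__":
    N, n = 1517, 11                # 2^11 = 2048 > 1517
    X = mmr(127, 1000, N, n)
    # Multiplying back by 2^n must recover A * B mod N.
    print(X, (X << n) % N == (127 * 1000) % N)
```

Since N is odd, its lowest bit $N_0$ is 1 and $-N_0^{-1} \bmod 2 = 1$, so the quotient digit Y is simply the lowest bit of the running sum.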

From the MMR algorithm shown above, we should note that both $R^{-1}$ and $N'$ can be computed using the Euclidean algorithm [10,12], where $R = 2^n$. Assuming $N'$ is precomputed and stored before the MMR algorithm is used, the modular reduction result in the N-residue can be computed using Eqs. (7)–(11) below:

$$A' = \mathrm{MMR}(A R') = AR \bmod N, \tag{7}$$

$$B' = \mathrm{MMR}(B R') = BR \bmod N, \tag{8}$$

$$X = A' \cdot B' = AR \cdot BR \bmod N, \tag{9}$$

$$C' = \mathrm{MMR}(X) = ABR \bmod N, \tag{10}$$

$$C = \mathrm{MMR}(C') = AB \bmod N. \tag{11}$$

As mentioned in Section 2.1, modular exponentiation can be reduced to a series of modular multiplications and squarings. By applying the LSB binary method to the MMR algorithm shown above, we obtain the Montgomery modular exponentiation (MME) algorithm [6]:

Montgomery modular exponentiation (MME) algorithm
Input: M, R, N, E                          /*M and N are n-digit integers in base 2*/
                                           /*$E = (e_{m-1} \cdots e_1 e_0)$ where $0 \le i \le m-1$ and $e_i \in \{0, 1\}$*/
Output: C
begin
    S = M × R mod N, C = R mod N;          /*$R = 2^n$*/
    for i = 0 to m − 1 do                  /*scan exponent E from right to left*/
    begin
        if ($e_i$ = 1) then C = MMR(S·C);  /*handle the nonzero digit of E*/
        S = MMR(S·S);
    end;
    C = MMR(C);                            /*$C = M^E \bmod N$*/
end.
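A minimal Python sketch of the MME algorithm is given below. It is an illustration under assumptions rather than the paper's code: `mmr` is a bit-serial Montgomery product written for this example, `mme` chooses $n$ as the bit length of N so that $R = 2^n > N$, and N is assumed odd.

```python
def mmr(A: int, B: int, N: int, n: int) -> int:
    """Bit-serial Montgomery product A * B * 2^(-n) mod N (N odd, A < 2^n, B < N)."""
    X = 0
    for i in range(n):
        X += ((A >> i) & 1) * B
        Y = X & 1                  # quotient digit; N'_0 = 1 for odd N
        X = (X + Y * N) >> 1
    return X - N if X >= N else X


def mme(M: int, E: int, N: int) -> int:
    """Montgomery modular exponentiation with an LSB-first exponent scan."""
    n = N.bit_length()             # assumption of this sketch: R = 2^n > N
    R = 1 << n
    S = (M * R) % N                # M in the N-residue
    C = R % N                      # 1 in the N-residue
    while E > 0:
        if E & 1:                  # handle a nonzero digit of E
            C = mmr(S, C, N, n)
        S = mmr(S, S, N, n)
        E >>= 1
    return mmr(C, 1, N, n)         # leave the N-residue: C * R^(-1) mod N


if __name__ == "__main__":
    print(mme(127, 1213, 1517) == pow(127, 1213, 1517))
```

The two initial multiplications by R are ordinary modular operations; every later step stays in the N-residue, so only the final call converts back.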

3. The proposed CMM–MSD Montgomery algorithm

In the following, we first introduce the basic concept of minimal-signed-digit (MSD) recoding arithmetic. We then summarize some important mathematical preliminaries and formulas for our improved Montgomery algorithm using the common-multiplicand-multiplication (CMM) method and the MSD recoding technique. Finally, we give a detailed description of the proposed CMM–MSD Montgomery algorithm and of its improved version for fast modular exponentiation in public-key cryptosystems.

3.1. Minimal-signed-digit recoding arithmetic

A signed-digit (SD) vector representation of an integer a in radix r is a sequence of digits $a = (\ldots, a_i, \ldots, a_2, a_1, a_0)$ with $a_i \in \{0, \pm 1, \ldots, \pm(r-1)\}$ such that $a = \sum_{i=0}^{\infty} a_i \cdot r^i$. Redundant representations of this form have been used successfully in many arithmetic applications [2,16], including the modular exponentiation and modular multiplication used in public-key cryptosystems [19,24]. In 1993, Arno and Wheeler [1] proposed the signed-digit recoding (redundant representation) algorithm for minimal Hamming weight arithmetic. For signed-digit recoding systems, we write $\bar{1}$ for −1 and we denote the bit length of a by |a|. If $a = \sum_{i=0}^{m} a'_i \cdot 2^i$ denotes the binary signed expansion with signed-digit recoding of radix r = 2 for a (hence $a'_i \in \{\bar{1}, 0, 1\}$), then we write $a = (a'_m, \ldots, a'_0)_{SD2}$.

Let the radix $r \ge 2$ be an integer, and let $S_r$ (the signed-digit vector representations of an integer a in radix r) denote the sequences of digits $a = (\ldots, a_2, a_1, a_0)$ with $a_i \in \{0, \pm 1, \ldots, \pm(r-1)\}$. The Hamming weight of an element $a \in S_r$, denoted w(a), is defined to be the number of nonzero terms in a.


The mapping $\pi : S_r \to \mathbb{Z}$ defined by

$$\pi(a) = \sum_{i=0}^{m} a_i \cdot r^i \tag{12}$$

associates an integer with each element $a \in S_r$. We refer to $S_r$ as the set of all signed-digit radix-r representations of elements of $\mathbb{Z}$. We define the negative digit $\bar{x}$ by $\bar{x} = x - \mathrm{sgn}(x) \cdot r$, where sgn(x) is 0, 1, −1 depending on whether x is zero, positive, or negative. Let $a, b \in S_r$. If $a = (a_m, \ldots, a_2, a_1, a_0)$ and $b = (b_m, \ldots, b_2, b_1, b_0)$, then we define the addition $c = a + b = (c_m, \ldots, c_2, c_1, c_0)$ in $S_r$, where $e_{-1} = 0$, as

$$\begin{cases} e_i = 0,\quad c_i = a_i + b_i + e_{i-1} & \text{if } -r < a_i + b_i + e_{i-1} < r, \\ e_i = \mathrm{sgn}(a_i + b_i + e_{i-1}),\quad c_i = a_i + b_i + e_{i-1} - e_i \cdot r & \text{otherwise.} \end{cases} \tag{13}$$

We denote the generic packet of data $(a_t, a_{t+1}, e_t)$ as $(x, y, e)$ and produce the output digit $\hat{y}$ and the new carry $\hat{e}$; the new data packet then becomes $(a_{t+1}, a_{t+2}, e_{t+1})$. The output digit $\hat{y}$ and the carry $\hat{e}$ are generated as follows:

$$\begin{cases} \hat{e} = 0,\quad \hat{y} = a_i + b_i + e & \text{if } (a_i + b_i + e)(x + \mathrm{sgn}(a_i + b_i + e)) \not\equiv 0 \pmod{r}, \\ \hat{e} = \mathrm{sgn}(a_i + b_i + e),\quad \hat{y} = a_i + b_i + e - \mathrm{sgn}(a_i + b_i + e) \cdot r & \text{otherwise.} \end{cases} \tag{14}$$

The problem of finding a minimal-weight binary representation is usually referred to as "canonical Booth recoding" [3]. A signed-digit recoding is canonical if its signed-digit representation contains no adjacent nonzero digits. Based on [11], the canonical-signed-digit recoding algorithm is depicted as follows:

Canonical-signed-digit recoding algorithm
Input: $a \in S_r$ with $\pi(a) = n$;       /*a is a redundant representation of n*/
Output: $A(a) \in S_r$;                     /*A(a) denotes the action of this algorithm on a*/
begin
    t = 0;
    while $(\ldots, a_{t+2}, a_{t+1}, a_t) \ne (\ldots, 0, 0, 0)$ do
    begin
        if $a_t \ne 0$ then
        begin
            $b = (\ldots, \mathrm{sgn}(a_t), -\mathrm{sgn}(a_t) \cdot r, 0, \ldots, 0)$;   /*nonzeros at t and t + 1*/
            c = a + b;
            if $c_{t+1} = 0$ then a = c;
        end;
        t = t + 1;
    end;
end.

In 1960, Reitwiesner [16] proposed the minimal-signed-digit (MSD) recoding algorithm for producing the canonical-signed-digit representation with minimal Hamming weight (he proved that this representation is unique if the binary representation is treated as padded with an initial "0"). This minimal-signed-digit recoding algorithm is defined as follows:

Minimal-signed-digit (MSD) recoding algorithm
Input: $E = (r_{m-1} r_{m-2} \cdots r_1 r_0)_2$;
Output: $E_{MSD} = (e_m e_{m-1} \cdots e_1 e_0)_{SD2}$ where $0 \le i \le m$ and $e_i \in \{\bar{1}, 0, 1\}$;
begin
    $c_0 = 0$; $r_{m+1} = 0$; $r_m = 0$;
    for i = 0 to m do
    begin
        $c_{i+1} = \lfloor (c_i + r_i + r_{i+1})/2 \rfloor$;
        $e_i = c_i + r_i - 2c_{i+1}$;
    end;
end.
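The recoding loop can be sketched in Python as below. The function name `msd_recode` and the least-significant-first output order are choices made for this illustration; the padding bits $r_m = r_{m+1} = 0$ are appended explicitly so that the look-ahead $r_{i+1}$ is always defined.

```python
from typing import List


def msd_recode(E: int) -> List[int]:
    """Return signed digits e_0 .. e_m (least significant first), each in {-1, 0, 1}."""
    m = max(E.bit_length(), 1)
    r = [(E >> i) & 1 for i in range(m)] + [0, 0]   # r_m = r_{m+1} = 0
    c = 0                                           # c_0 = 0
    digits = []
    for i in range(m + 1):
        c_next = (c + r[i] + r[i + 1]) // 2         # c_{i+1}
        digits.append(c + r[i] - 2 * c_next)        # e_i
        c = c_next
    return digits


if __name__ == "__main__":
    d = msd_recode(1213)
    print(d)
    print(sum(e * 2**i for i, e in enumerate(d)))   # value is preserved: 1213
```

For E = 1213 the nonzero digits land at positions 0, 2, 6, 8 and 10 (with signs +, −, −, +, +), that is, 1213 = 1024 + 256 − 64 − 4 + 1, and no two nonzero digits are adjacent.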

3.2. Mathematical preliminaries

Here we give some mathematical preliminaries, including lemmas and definitions, for the proposed fast modular exponentiation method. The basic idea of the proposed CMM–MSD Montgomery modular exponentiation algorithm is to first extract the common recoding parts of the multiplicands and thereby save the binary additions needed for computing the common parts. We define the following variables for the proposed CMM–MSD Montgomery algorithm:


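$$E_{common} = \mathrm{AND}_{i=1}^{3}\, E_i = E_1 \ \mathrm{AND}\ E_2 \ \mathrm{AND}\ E_3, \tag{15}$$

$$E_{1,c} = E_1 \ \mathrm{XOR}\ E_{common}, \tag{16}$$

$$E_{2,c} = E_2 \ \mathrm{XOR}\ E_{common}, \tag{17}$$

$$E_{3,c} = E_3 \ \mathrm{XOR}\ E_{common}. \tag{18}$$

Hence, $E_1$, $E_2$, and $E_3$ can be represented as follows:

$$E_1 = E_{1,c} + E_{common}, \tag{19}$$

$$E_2 = E_{2,c} + E_{common}, \tag{20}$$

$$E_3 = E_{3,c} + E_{common}. \tag{21}$$

Thus, the common-multiplicand-multiplications $X \times E_i$ (i = 1, 2, and 3) can be computed with the assistance of $X \times E_{common}$ as

$$X \times E_1 = X \times E_{1,c} + X \times E_{common}, \tag{22}$$

$$X \times E_2 = X \times E_{2,c} + X \times E_{common}, \tag{23}$$

$$X \times E_3 = X \times E_{3,c} + X \times E_{common}. \tag{24}$$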
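Eqs. (15)–(21) can be checked on signed-digit vectors with a short Python sketch. The digit-wise operators below are written to match the entries of Table 1 (AND keeps a digit only when both operands agree; XOR returns 0 when both operands are nonzero); the function names and the least-significant-first list layout are assumptions of this example.

```python
from typing import List, Tuple


def sd_and(x: int, y: int) -> int:
    """Digit-wise AND on {-1, 0, 1}: the common digit if both agree, else 0."""
    return x if x == y else 0


def sd_xor(x: int, y: int) -> int:
    """Digit-wise XOR on {-1, 0, 1}: 0 if both are nonzero, else the nonzero one."""
    return 0 if (x != 0 and y != 0) else x + y


def value(digits: List[int]) -> int:
    """Integer value of a radix-2 signed-digit vector, least significant first."""
    return sum(d * 2**i for i, d in enumerate(digits))


def decompose(E1: List[int], E2: List[int], E3: List[int]) -> Tuple[List[int], ...]:
    """Return (E_common, E_1c, E_2c, E_3c) for three equal-length segments."""
    common = [sd_and(sd_and(a, b), c) for a, b, c in zip(E1, E2, E3)]   # Eq. (15)
    rests = [[sd_xor(d, k) for d, k in zip(E, common)] for E in (E1, E2, E3)]
    return (common, *rests)                                             # Eqs. (16)-(18)


if __name__ == "__main__":
    E1, E2, E3 = [1, 0, -1, 1], [1, 0, -1, 0], [1, 1, -1, 0]
    common, E1c, E2c, E3c = decompose(E1, E2, E3)
    print(common, E1c, E2c, E3c)
    print(value(E1) == value(E1c) + value(common))   # Eq. (19)
```

Because each digit of the common part is either equal to the corresponding segment digit or zero, the remainder and the common part never overlap, which is what makes Eqs. (19)–(21) exact.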

Here we depict the bitwise logical "AND" and "XOR" operators in Table 1. The first two of the following lemmas enable us to generate a canonical recoding output with minimal weight and to efficiently calculate the modular inverse result for handling the negative signed-digit $\bar{1}$ in the proposed CMM–MSD Montgomery algorithm. The third lemma enables us to describe the expected value of the signed-digit recoding number based on the probability distribution property.

Lemma 1. Let S be a string and let the sequence $[S]^l$ denote S, S, ..., S (repeated l times). The following two equivalences hold in our SD radix-2 representation:

$$(1)\quad ([0, 1]^l, 1)_{SD2} = (1, [0, \bar{1}]^l)_{SD2}, \tag{25}$$

$$(2)\quad ([0, \bar{1}]^l, \bar{1})_{SD2} = (\bar{1}, [0, 1]^l)_{SD2}. \tag{26}$$

Proof. (1) Based on the signed-digit arithmetic defined in Section 3.1, we have

$$([0, 1]^l, 1)_{SD2} = 1 + \sum_{i=1}^{l} 2^{2i-1} = 2^{2l} - \sum_{i=1}^{l} 2^{2i-2}. \tag{27}$$

However, $2^{2l} - \sum_{i=1}^{l} 2^{2i-2}$ can also be represented as $(1, [0, \bar{1}]^l)_{SD2}$. Therefore, the first equivalence $([0, 1]^l, 1)_{SD2} = (1, [0, \bar{1}]^l)_{SD2}$ holds.

(2) Again, based on the signed-digit arithmetic representation, we know

$$([0, \bar{1}]^l, \bar{1})_{SD2} = -1 - \sum_{i=1}^{l} 2^{2i-1} = \sum_{i=1}^{l} 2^{2i-2} - 2^{2l}. \tag{28}$$

However, $\sum_{i=1}^{l} 2^{2i-2} - 2^{2l}$ can also be represented as $(\bar{1}, [0, 1]^l)_{SD2}$. Therefore, the second equivalence $([0, \bar{1}]^l, \bar{1})_{SD2} = (\bar{1}, [0, 1]^l)_{SD2}$ holds. □

Lemma 2. The relation $a^{-1} = r^{-1} \bmod N$ holds when $r = a \bmod N$.

Proof. Based on the extended Euclid theorem [10], we have $a^{-1} \bmod N = x$ and $a \cdot x \bmod N = 1$; hence $a \cdot x - b \cdot N = 1$, where b is an integer. Let $r = a \bmod N$ and $a = Q \cdot N + r$. Since $a \cdot a^{-1} = 1 \bmod N$, we obtain $QN \cdot a^{-1} + r \cdot a^{-1} = 1 \bmod N$ and thus $r \cdot a^{-1} = 1 \bmod N$. As $r \cdot r^{-1} = 1 \bmod N$, this yields $r \cdot a^{-1} = r \cdot r^{-1} \bmod N$. Hence $a^{-1} = x = r^{-1} \bmod N$, which completes the proof. □

Table 1
Bitwise logical operators "AND" and "XOR".

$$\begin{array}{c|ccc} \mathrm{AND} & \bar{1} & 0 & 1 \\ \hline \bar{1} & \bar{1} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{array} \qquad \begin{array}{c|ccc} \mathrm{XOR} & \bar{1} & 0 & 1 \\ \hline \bar{1} & 0 & \bar{1} & 0 \\ 0 & \bar{1} & 0 & 1 \\ 1 & 0 & 1 & 0 \end{array}$$

Lemma 3. Let $\lambda_r$ be a random variable on the space of m-digit integers denoting the minimal Hamming weight of the radix-r representation. With Exp(a) denoting the expected value of a, the expected value of $\lambda_r$ is

$$\mathrm{Exp}(\lambda_r) = \frac{(r - 1)}{r + 1}\, m \quad (\text{as } m \to \infty). \tag{29}$$

Proof. Let n be a random m-digit radix-r integer with $0 \le n < r^m$, whose standard representation is given by $a = (\ldots, 0, a_m, a_{m-1}, a_{m-2}, \ldots, a_1, a_0)$ and whose minimal representation is given by $\hat{a} = (\ldots, 0, \hat{a}_m, \hat{a}_{m-1}, \hat{a}_{m-2}, \ldots, \hat{a}_1, \hat{a}_0)$. Let $z_m$ denote the number of zeros in $\hat{a}$. Based on [1], the expected value of $z_m$ for an m-digit radix-r recoded integer n is

$$\mathrm{Exp}(z_m) = \frac{2}{(r + 1)}\, m \quad (\text{as } m \to \infty). \tag{30}$$

Let Prob(a) denote the probability distribution of a, and let $w_m^{+}$ and $w_m^{-}$ denote the number of positive digits and the number of negative digits of $\hat{a}$, respectively. We have the following probability distributions [1]:

$$\mathrm{Prob}(w_m^{+}) = \frac{(r - 1)^2}{r(r + 1)} \quad \text{and} \quad \mathrm{Prob}(w_m^{-}) = \frac{(r - 1)}{r(r + 1)} \quad (\text{as } m \to \infty). \tag{31}$$

Therefore, we have the following expected values:

$$\mathrm{Exp}(w_m^{+}) = \frac{(r - 1)^2}{r(r + 1)}\, m \quad \text{and} \quad \mathrm{Exp}(w_m^{-}) = \frac{(r - 1)}{r(r + 1)}\, m \quad (\text{as } m \to \infty). \tag{32}$$

Recall that $\lambda_r$ is independent of the signed-digit representation produced by the minimal-signed-digit recoding algorithm depicted in Section 3.1; thus we have

$$\lambda_r = w_m^{+} + w_m^{-} \quad \text{and} \quad \mathrm{Exp}(\lambda_r) = \mathrm{Exp}(w_m^{+}) + \mathrm{Exp}(w_m^{-}). \tag{33}$$

Therefore, $\mathrm{Exp}(\lambda_r) = \frac{(r-1)^2}{r(r+1)}\, m + \frac{(r-1)}{r(r+1)}\, m = \frac{(r-1)}{(r+1)}\, m$ (as $m \to \infty$). □
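For radix r = 2, Lemma 3 predicts an asymptotic density of nonzero digits of $(r-1)/(r+1) = 1/3$. The following Python sketch checks this empirically by exhaustive enumeration; it uses the standard non-adjacent-form rule (digit +1 when $n \equiv 1 \pmod 4$, digit −1 when $n \equiv 3 \pmod 4$) as a stand-in for the minimal-weight recoding, and the function names are hypothetical.

```python
def naf_weight(n: int) -> int:
    """Number of nonzero digits in the radix-2 non-adjacent form of n >= 0."""
    w = 0
    while n:
        if n & 1:
            n -= 2 - (n % 4)   # subtract the digit: +1 if n % 4 == 1, -1 if n % 4 == 3
            w += 1
        n >>= 1
    return w


def average_density(m: int) -> float:
    """Average fraction of nonzero digits over all integers in [0, 2^m)."""
    total = sum(naf_weight(n) for n in range(1 << m))
    return total / (m * (1 << m))


if __name__ == "__main__":
    for m in (8, 10, 12, 14):
        print(m, round(average_density(m), 4))   # slowly approaches 1/3 as m grows
```

The convergence is slow because the average weight carries a constant offset on top of m/3, so the density for small m sits somewhat above 1/3.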

3.3. The proposed CMM–MSD Montgomery modular exponentiation algorithm

Having stated the definitions and lemmas above, we now propose the CMM–MSD Montgomery algorithm (using the Montgomery reduction algorithm, the CMM method and the MSD technique) for fast evaluation of $M^E \bmod N$ as follows:

CMM–MSD Montgomery modular exponentiation algorithm
Input: M, $E_{MSD}$, N, R                  /*$E_{MSD} = (e_m e_{m-1} \cdots e_2 e_1)_{SD2}$, $R = 2^n$*/
Output: $C_1 = M^{E_1[1]}$; $C_2 = M^{E_2[1]}$; $C_3 = M^{E_3[2]}$; $D_1 = M^{E_1[\bar{1}]}$; $D_2 = M^{E_2[\bar{1}]}$; $D_3 = M^{E_3[\bar{2}]}$;
$C_1 = C_2 = C_3 = D_1 = D_2 = D_3 = 2^n$;   /*M and N are n-digit integers in base 3*/
S = M × R mod N;                             /*$E_1$, $E_2$ and $E_3$ are m/3-signed-digit in radix-3*/
begin
    for i = 1 to m do                        /*scan exponent $E_{MSD}$ from right to left*/
    begin
        if ($e_{1i} = 1$) then $C_1$ = MMR($S C_1$);                /*evaluate $M^{E_1}$ for positive signed-digit*/
        if ($e_{1i} = \bar{1}$) then $D_1$ = MMR($S^{-1} D_1$);     /*evaluate $M^{E_1}$ for negative signed-digit*/
        if ($e_{2i} = 1$) then $C_2$ = MMR($S C_2$);                /*evaluate $M^{E_2}$ for positive signed-digit*/
        if ($e_{2i} = \bar{1}$) then $D_2$ = MMR($S^{-1} D_2$);     /*evaluate $M^{E_2}$ for negative signed-digit*/
        if ($e_{3i} = 2$) then $C_3$ = MMR($S C_3$);                /*evaluate $M^{E_3}$ for positive signed-digit*/
        if ($e_{3i} = \bar{2}$) then $D_3$ = MMR($S^{-1} D_3$);     /*evaluate $M^{E_3}$ for negative signed-digit*/
        S = MMR(S·S);
    end;
end.

To execute the exponentiation operation, we divide the m-bit exponent $E_{MSD}$ into three equal-length parts $E_1$, $E_2$, and $E_3$, each of m/3 bits. In the proposed CMM–MSD Montgomery algorithm, we store the operation results for positive digits in the registers $C_1$, $C_2$, and $C_3$, and the operation results for negative digits in the registers $D_1$, $D_2$, and $D_3$. $C_1$ and $D_1$ store the operation results for decomposition segment $E_1$ of the minimal-signed-digit exponent $E_{MSD}$, $C_2$ and $D_2$ store the results for segment $E_2$, and $C_3$ and $D_3$ store the results for segment $E_3$.


Our main goal in applying the common-multiplicand-multiplication technique is that the common part among MMR($SC_1$), MMR($SD_1$), MMR($SC_2$), MMR($SD_2$), MMR($SC_3$), MMR($SD_3$), and MMR(SS) is computed just once. To achieve this goal, we define the Montgomery modular reduction computation of MMR(AB) as follows:

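$$\begin{aligned} \mathrm{MMR}(AB) &= ABR^{-1} \bmod N \\ &= A\,(B_{n-1} \cdot 2^{n-1} + B_{n-2} \cdot 2^{n-2} + \cdots + B_1 \cdot 2 + B_0) \cdot 2^{-n} \bmod N \\ &= \big(B_{n-1}(A \cdot 2^{\lambda-1} \bmod N) + B_{n-2}(A \cdot 2^{\lambda-2} \bmod N) + \cdots + B_1(A \cdot 2^{\lambda-n+1} \bmod N) + B_0(A \cdot 2^{\lambda-n} \bmod N)\big) \cdot 2^{-\lambda} \bmod N, \end{aligned} \tag{34}$$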

where A, B and N are n-digit integers in base 2 and $R = 2^n \bmod N$. From the above, the optimal intermediate reduction result of MMR(AB), namely $B_{n-1}(A \cdot 2^{\lambda-1} \bmod N) + \cdots + B_1(A \cdot 2^{\lambda-n+1} \bmod N) + B_0(A \cdot 2^{\lambda-n} \bmod N)$, is bounded by an $(n + 1) + \lceil \log_2 n \rceil$-bit integer when $\lambda = 2$. If we denote T as $S \cdot 2^{-i+1} \bmod N$, the operations $S \cdot 2^{-i} \bmod N$ for $1 \le i \le n - 2$ are repeatedly computed using the previous computation result T, where $S \cdot 2^{-i} \bmod N = T \cdot 2^{-1} \bmod N$. Also notice that each step of a single modular multiplication in the CMM–MSD Montgomery algorithm has a similar step in the Montgomery modular reduction algorithm depicted in Section 2.3. Assume we have three n-bit exponent segments $E_1$, $E_2$, and $E_3$ (where $n = m/3$ and m is the bit length of the exponent E). Based on Eqs. (15)–(24) in Section 3.2, the exponentiation operation $M^{E_{MSD}}$ can be depicted as

$$M^{E} = M^{E_1 \| E_2 \| E_3} = M^{E_1 \cdot 2^{2m/3}} \times M^{E_2 \cdot 2^{m/3}} \times M^{E_3} \quad (\text{"}\|\text{" is the concatenation operator}). \tag{35}$$

Let the decomposition segments $E_1$, $E_2$, and $E_3$ be expressed as $E_1 = \sum_{i=0}^{m-1} e_{1i} \cdot 2^i$, $E_2 = \sum_{i=0}^{m-1} e_{2i} \cdot 2^i$, and $E_3 = \sum_{i=0}^{m-1} e_{3i} \cdot 2^i$, where $e_{1i}$, $e_{2i}$, and $e_{3i}$ are minimal-signed-digits and $e_{1i}, e_{2i}, e_{3i} \in \{0, 1, 2, \bar{1}, \bar{2}\}$. Moreover, the exponentiations $M^{E_1[1]}$, $M^{E_2[1]}$, and $M^{E_3[2]}$ are evaluated for handling the positive signed-digits in $E_1$, $E_2$, and $E_3$, respectively. Similarly, the exponentiations $M^{E_1[\bar{1}]}$, $M^{E_2[\bar{1}]}$, and $M^{E_3[\bar{2}]}$ are evaluated for handling the negative signed-digits in $E_1$, $E_2$, and $E_3$, respectively. Finally, we output the exponentiation results $M^{E_1[1]} \bmod N$, $M^{E_2[1]} \bmod N$, $M^{E_3[2]} \bmod N$, $M^{E_1[\bar{1}]} \bmod N$, $M^{E_2[\bar{1}]} \bmod N$, and $M^{E_3[\bar{2}]} \bmod N$ into six different registers $C_1$, $C_2$, $C_3$, $D_1$, $D_2$, $D_3$, respectively. By applying the common-part extraction technique of the common-multiplicand-multiplication method (depicted in Section 2.2) to the three exponent segments $E_1$, $E_2$, and $E_3$, together with the related equations in Section 3.2, we obtain the four temporary exponents $E_{common}$, $E_{1,c}$, $E_{2,c}$, and $E_{3,c}$ using Eqs. (15)–(18). It should be noted that these temporary exponents are bitwise mutually exclusive [26]. Therefore, all the following eight exponentiations:

$$M^{E_{common}[1]},\ M^{E_{1,c}[1]},\ M^{E_{2,c}[1]},\ M^{E_{3,c}[2]},\ M^{E_{common}[\bar{1}]},\ M^{E_{1,c}[\bar{1}]},\ M^{E_{2,c}[\bar{1}]},\ M^{E_{3,c}[\bar{2}]}$$

can be evaluated in a batch more efficiently using a modified version of the proposed CMM–MSD Montgomery exponentiation algorithm, as detailed in the next section.

3.4. The improved CMM–MSD Montgomery modular exponentiation algorithm

In the improved version of the CMM–MSD Montgomery algorithm, we define six exponentiation results $M^{E_{common}[1]}$, $M^{E_{1,c}[1]}$, $M^{E_{2,c}[1]}$, $M^{E_{common}[\bar{1}]}$, $M^{E_{1,c}[\bar{1}]}$ and $M^{E_{2,c}[\bar{1}]}$. Based on Lemma 2 described in Section 3.2, we can improve the proposed CMM–MSD Montgomery algorithm (depicted in Section 3.3) by replacing the multiplicative inverse operation with a multiplication operation as follows:

Improved CMM–MSD Montgomery modular exponentiation algorithm
Input: M, $E_{MSD}$, N, R                  /*$E_{MSD} = (e_m e_{m-1} \cdots e_2 e_1)_{SD2}$*/
Output: $C_1 = M^{E_{common}[1]}$; $C_2 = M^{E_{1,c}[1]}$; $C_3 = M^{E_{2,c}[1]}$; $C_4 = M^{E_{3,c}[2]}$;
        $D_1 = M^{E_{common}[\bar{1}]}$; $D_2 = M^{E_{1,c}[\bar{1}]}$; $D_3 = M^{E_{2,c}[\bar{1}]}$; $D_4 = M^{E_{3,c}[\bar{2}]}$;
begin                                        /*M and N are n-digit integers in base 2*/
    $C_1 = C_2 = C_3 = C_4 = D_1 = D_2 = D_3 = D_4 = 2^n$;
    S = M × R mod N;                         /*$R = 2^n$*/
    for i = 1 to m do                        /*scan exponent $E_{MSD}$ from right to left*/
    begin
        if ($e'_{1i} = 1$) then $C_1$ = MMR($S C_1$);          /*evaluate $M^{E_{common}}$ for positive signed-digit*/
        if ($e'_{1i} = \bar{1}$) then $D_1$ = MMR($S D_1$);    /*evaluate $M^{E_{common}}$ for negative signed-digit*/
        if ($e'_{2i} = 1$) then $C_2$ = MMR($S C_2$);          /*evaluate $M^{E_{1,c}}$ for positive signed-digit*/
        if ($e'_{2i} = \bar{1}$) then $D_2$ = MMR($S D_2$);    /*evaluate $M^{E_{1,c}}$ for negative signed-digit*/
        if ($e'_{3i} = 1$) then $C_3$ = MMR($S C_3$);          /*evaluate $M^{E_{2,c}}$ for positive signed-digit*/
        if ($e'_{3i} = \bar{1}$) then $D_3$ = MMR($S D_3$);    /*evaluate $M^{E_{2,c}}$ for negative signed-digit*/
        if ($e'_{4i} = 2$) then $C_4$ = MMR($S C_4$);          /*evaluate $M^{E_{3,c}}$ for positive signed-digit*/
        if ($e'_{4i} = \bar{2}$) then $D_4$ = MMR($S D_4$);    /*evaluate $M^{E_{3,c}}$ for negative signed-digit*/
        S = MMR(S·S);
    end;
end.


Note that the eight exponents $E_{common}[1]$, $E_{1,c}[1]$, $E_{2,c}[1]$, $E_{common}[\bar{1}]$, $E_{1,c}[\bar{1}]$, $E_{2,c}[\bar{1}]$, $E_{3,c}[2]$, and $E_{3,c}[\bar{2}]$ in the improved CMM–MSD Montgomery modular exponentiation algorithm are bitwise mutually exclusive, as defined previously in Section 3.3. By applying this property, we can evaluate the modular exponentiation more efficiently, with fewer binary additions as well as bit multiplications [26]. We put the positive and negative signed-digit recoding operation results of $M^{E_{1,c}}$ in registers $C_2$ and $D_2$, respectively; those of $M^{E_{2,c}}$ in registers $C_3$ and $D_3$, respectively; and those of $M^{E_{3,c}}$ in registers $C_4$ and $D_4$, respectively. To describe the improved algorithm in detail, we define the eight registers $C_1$, $C_2$, $C_3$, $C_4$, $D_1$, $D_2$, $D_3$, $D_4$ of the improved CMM–MSD Montgomery algorithm as follows:

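$$C_1 = M^{E_{common}[1]} \bmod N, \tag{36}$$

$$D_1 = M^{E_{common}[\bar{1}]} \bmod N, \tag{37}$$

$$C_2 = M^{E_{1,c}[1]} \bmod N, \tag{38}$$

$$D_2 = M^{E_{1,c}[\bar{1}]} \bmod N, \tag{39}$$

$$C_3 = M^{E_{2,c}[1]} \bmod N, \tag{40}$$

$$D_3 = M^{E_{2,c}[\bar{1}]} \bmod N, \tag{41}$$

$$C_4 = M^{E_{3,c}[2]} \bmod N, \tag{42}$$

$$D_4 = M^{E_{3,c}[\bar{2}]} \bmod N. \tag{43}$$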

From Eq. (15) depicted in Section 3.2, we define M

M

Ecommon½1

M

Ecommon½1 

¼M ¼M

E1½1 ANDE2½1

Ecommon½1

and M

Ecommon½1 

as

ð44Þ

;

 E1½1  ANDE2 ½1

ð45Þ

:

From Eq. (16) depicted in Section 3.2, we define M

E1;c½1

and M

E1;c½1 

as

ME1;c½1 ¼ M E1½1 XOREcommon½1 ;

ð46Þ

ME1;c½1 ¼ M E1½1 XOREcommon½1 :

ð47Þ

From Eq. (17) depicted in Section 3.2, we define M

M

E2;c½1

M

E2;c½1 

¼M

E1½1 XOREcommon½1

;

¼M

E2½1  XOREcommon½1 

:

M M

E3;c½2 

¼M

E3½2 XOREcommon½1

;

¼M

E3½2  XOREcommon½1 

:

and M

E2;c½1 

as

ð48Þ ð49Þ

From Eq. (18) depicted in Section 3.2, we define M E3;c½2

E2;c½1

E3;c½2

and M

E3;c½2 

as

ð50Þ ð51Þ E1 ½1

E2 ½1

Note that the exponentiations M and M are evaluated for handling positive signed-digit in exponent segments E1 and E2, respectively. Based on the definitions of Eqs. (19) and (21), we can have the following:

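$$M^{E_1[1]} = M^{E_{1,c[1]} + E_{\mathrm{common}[1]}}, \tag{52}$$

$$M^{E_2[1]} = M^{E_{2,c[1]} + E_{\mathrm{common}[1]}}, \tag{53}$$

$$M^{E_3[2]} = M^{E_{3,c[2]} + E_{\mathrm{common}[1]}}. \tag{54}$$

The exponentiations $M^{E_1[\bar{1}]}$, $M^{E_2[\bar{1}]}$, and $M^{E_3[\bar{2}]}$ are evaluated for handling the negative signed digits in $E_1$, $E_2$, and $E_3$, respectively. Similarly, we obtain the following:

$$M^{E_1[\bar{1}]} = M^{E_{1,c[\bar{1}]} + E_{\mathrm{common}[\bar{1}]}}, \tag{55}$$

$$M^{E_2[\bar{1}]} = M^{E_{2,c[\bar{1}]} + E_{\mathrm{common}[\bar{1}]}}, \tag{56}$$

$$M^{E_3[\bar{2}]} = M^{E_{3,c[\bar{2}]} + E_{\mathrm{common}[\bar{1}]}}. \tag{57}$$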
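The AND/XOR decomposition of Eqs. (44)–(49) and the recombination of Eqs. (52)–(53) can be checked numerically. The following is a minimal sketch, assuming that each positive-digit pattern is held as an ordinary integer bit mask; the function name `split_common` and the two sample masks are illustrative and are not taken from the paper.

```python
def split_common(e1, e2):
    """Return (common, e1_c, e2_c) for two bit masks e1 and e2."""
    common = e1 & e2          # bits set in both exponents (AND)
    e1_c = e1 ^ common        # bits that belong to e1 only (XOR)
    e2_c = e2 ^ common        # bits that belong to e2 only (XOR)
    return common, e1_c, e2_c


N, M = 1517, 127              # illustrative modulus and message
e1, e2 = 0b1011, 0b0110       # hypothetical positive-digit masks

common, e1_c, e2_c = split_common(e1, e2)

# The three parts are bitwise mutually exclusive, so XOR and + agree.
assert common & e1_c == 0 and common & e2_c == 0 and e1_c & e2_c == 0
assert e1 == e1_c + common and e2 == e2_c + common

# M^common is computed once and shared by both exponentiations.
m_common = pow(M, common, N)
m_e1 = (pow(M, e1_c, N) * m_common) % N
m_e2 = (pow(M, e2_c, N) * m_common) % N
```

Because the common part and the remainders never share a set bit, adding them reproduces the original exponent, which is why the shared factor `m_common` can be reused for both results.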

To calculate the result for the three n-bit (n = m/3) exponent segments $E_1$, $E_2$, and $E_3$, we define

$$M^{E_1} = M^{E_1[1] + E_1[\bar{1}]}, \tag{58}$$

$$M^{E_2} = M^{E_2[1] + E_2[\bar{1}]}, \tag{59}$$

$$M^{E_3} = M^{E_3[2] + E_3[\bar{2}]}. \tag{60}$$


From the definition of the m-bit minimal-Hamming-weight signed-digit recoding exponent $E_{MSD}$ with $E_1$, $E_2$, and $E_3$, we have

$$M^{E_{MSD}} = M^{E_1 \| E_2 \| E_3} \quad (\text{where ``}\|\text{'' is the concatenation operator}). \tag{61}$$

To obtain the final result, based on Eq. (35) defined earlier, we have

$$M^{E_{MSD}} \bmod N = M^{E_1 \cdot 2^{2n/3}} \times M^{E_2 \cdot 2^{n/3}} \times M^{E_3} = M^{(E_1 \cdot 2^{2n/3}) + (E_2 \cdot 2^{n/3}) + E_3} \bmod N. \tag{62}$$
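The recombination of segment results in Eq. (62) can be illustrated with plain integers. This is a minimal sketch, assuming unsigned segments of equal length `seg_bits` (a name introduced here for illustration), so that the segments are weighted by $2^{2 \cdot seg\_bits}$, $2^{seg\_bits}$, and 1; it does not model the signed-digit recoding, and `exp_by_segments` is a hypothetical helper.

```python
def exp_by_segments(M, segments, seg_bits, N):
    """Compute M^(E1 || E2 || ...) mod N, processing segments left to right."""
    result = 1
    for seg in segments:
        # Shift the accumulated exponent left by seg_bits: seg_bits squarings.
        for _ in range(seg_bits):
            result = (result * result) % N
        # Append the current segment to the exponent.
        result = (result * pow(M, seg, N)) % N
    return result


N, M, E = 1517, 127, 1213
seg_bits = 4
# 1213 = (0100 1011 1101)_2, split into three 4-bit segments.
E1, E2, E3 = 0b0100, 0b1011, 0b1101
assert (E1 << 2 * seg_bits) + (E2 << seg_bits) + E3 == E

value = exp_by_segments(M, [E1, E2, E3], seg_bits, N)
```

The loop relies only on the identity $M^{a \cdot 2^k + b} = (M^a)^{2^k} \cdot M^b$, so the same structure applies whatever segment length is used.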

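Example. We assume N = 37 × 41 = 1517, message M = 127, and E = 1213, and we evaluate $M^E \bmod N$.

Sol. $E = (10010111101)_2 = (10100\bar{2}00\bar{1}01)_{SD}$; $E_1 \| E_2 \| E_3 = (101)_{SD} \| (00\bar{2}0)_{SD} \| (0\bar{1}01)_{SD}$.

For the positive value:

$C_1 \equiv M^{(101)_{SD}} \bmod N \equiv M^{(5)_{10}} \bmod N$; $C_2 \equiv M^{(0000)_{SD}} \bmod N \equiv M^{(0)_{10}} \bmod N$; and $C_3 \equiv M^{(0001)_{SD}} \bmod N \equiv M^{(1)_{10}} \bmod N$.

For the negative value:

$D_1 \equiv M^{(000)_{SD}} \bmod N \equiv M^{(0)_{10}} \bmod N$; $D_2 \equiv M^{(0020)_{SD}} \bmod N \equiv M^{(4)_{10}} \bmod N$; and $D_3 \equiv M^{(0100)_{SD}} \bmod N \equiv M^{(4)_{10}} \bmod N$.

$E = (10100000001)_{SD} + (\bar{1}000\bar{1}00)_{SD} = (10100000001)_2 - (1000100)_2 = A - B$. So $A = (10100000001)_2 = 1281$ and $B = (1000100)_2 = 68$. Hence $M^E \bmod N \equiv 127^{1281-68} \bmod 1517 \equiv 1048$.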
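The arithmetic of this example can be reproduced with a short script. The sketch below assumes Python 3.8 or later (for `pow(x, -1, N)`) and represents the barred digits as negative integers; the helper name `split_signed` is illustrative and does not appear in the paper.

```python
def split_signed(digits):
    """Split signed digits (most significant first, binary weights)
    into the value A of the positive part and the value B of the negative part."""
    a = b = 0
    for d in digits:
        a, b = 2 * a, 2 * b
        if d > 0:
            a += d
        elif d < 0:
            b += -d
    return a, b


N, M = 1517, 127
# E = (1 0 1 0 0 -2 0 0 -1 0 1)_SD, where -2 and -1 stand for the barred digits.
digits = [1, 0, 1, 0, 0, -2, 0, 0, -1, 0, 1]

A, B = split_signed(digits)           # A = 1281, B = 68
E = A - B                             # 1213

# M^E = M^A * (M^B)^(-1) mod N; one modular inverse handles the negative part.
result = (pow(M, A, N) * pow(pow(M, B, N), -1, N)) % N
print(A, B, E, result)                # 1281 68 1213 1048
```

The positive and negative parts are exponentiated independently and combined only at the end, which is the property the registers C and D exploit.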

4. Computational complexity analyses

In this section, we describe in detail the theoretical analyses of the performance of the proposed CMM–MSD Montgomery algorithm. We first analyze the performance of the proposed algorithm and then calculate the number of binary additions and modular multiplications it needs. $S^{-1}$ is the inverse of S under modulus N. $S^{-1}$ can be pre-computed using the Euclidean algorithm or Euler's theorem [10]. Let the m-bit exponent E be recoded as the radix-r (r > 0) signed-digit representation $E_{MSD}$. Based on Lemma 3 given previously, the probabilities for the occurrence of the positive digit "$r$" and the negative digit "$\bar{r}$" are $\frac{(r-1)^2}{r(r+1)}$ and $\frac{(r-1)}{r(r+1)}$, respectively, and the probability for the occurrence of digit "0" is $\frac{2}{(r+1)}$. Since the radix-3 nonzero digits "1" and "$\bar{1}$", and "2" and "$\bar{2}$", occur with equal probability [1], we have the occurrence probabilities of "0", "1", "$\bar{1}$", "2", and "$\bar{2}$" as $\{P_{rob}(0) = 2/3;\ P_{rob}(1) = P_{rob}(\bar{1}) = P_{rob}(2) = P_{rob}(\bar{2}) = 1/12\}$.
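The pre-computation of $S^{-1}$ with the Euclidean algorithm, mentioned above, can be sketched as follows. The choice $S = 2^{11}$ for N = 1517 is an assumption made only for illustration (any S coprime to N works), and the function name `mod_inverse` is not from the paper.

```python
def mod_inverse(s, n):
    """Return s^(-1) mod n using the extended Euclidean algorithm."""
    old_r, r = s % n, n
    old_x, x = 1, 0            # invariant: old_r == old_x * s (mod n)
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
    if old_r != 1:
        raise ValueError("s and n are not coprime")
    return old_x % n


N = 1517                       # illustrative odd modulus (37 * 41)
S = 1 << 11                    # assumed constant with S > N and gcd(S, N) = 1
S_inv = mod_inverse(S, N)
```

Since N is odd, any power of two is coprime to it, so the inverse always exists for this choice of S and needs to be computed only once per modulus.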

In the proposed CMM–MSD Montgomery algorithm, the computation of $(2^i S \bmod N)$ needs $(n-2) \times (n+1)$ single-precision multiplications for $1 \le i \le n-2$. Based on the computational analyses of the Montgomery reduction algorithm from [9], the probability of executing the modular multiplication MMR(SC1), MMR(SC2), MMR(SC3), or MMR(SC4) in the CMM–MSD Montgomery algorithm is equivalent to the occurrence probability of the signed digits "1" and "2" in $E_{MSD}$. Therefore, the operations MMR(SC1), MMR(SC2), MMR(SC3), and MMR(SC4) require in total

$$6 \times \frac{1}{12} \times [1.5m \times (n-2) \times (n+1)] = \frac{3}{4} m \times (n^2 - n - 2)$$

single-precision multiplications. Similarly, the operations MMR(SD1), MMR(SD2), MMR(SD3), and MMR(SD4) require in total

$$6 \times \frac{1}{12} \times [1.5m \times (n-2) \times (n+1)] = \frac{3}{4} m \times (n^2 - n - 2)$$

single-precision multiplications.


The operation MMR(SS) requires

$$\frac{2}{3} \times [0.5m \times (n-2) \times (n+1)] = \frac{1}{3} m \times (n^2 - n - 2)$$

single-precision multiplications, as the $(2^i S \bmod N)$ operations are computed exactly once. The Montgomery modular reduction (MMR) algorithm requires on average $1.5m \times (2n^2 + n)$ multiplications. Meanwhile, Ha–Moon's improved Montgomery binary algorithm takes $0.5m \times (5n^2 + 4n)$ multiplications [9] for calculating an exponentiation with an m-bit exponent. However, the proposed CMM–MSD Montgomery modular exponentiation algorithm only takes

$$\frac{3}{4} m \times (n^2 - n - 2) + \frac{3}{4} m \times (n^2 - n - 2) + \frac{1}{3} m \times (n^2 - n - 2) \approx 1.833m \times (n^2 - n - 2)$$

single-precision multiplications. Taking a 512-bit exponent E and a $2^{16}$-base N to evaluate $(M^E \bmod N)$ as an example, Ha–Moon's improved Montgomery algorithm reduces the overall number of single-precision multiplications by about 16% [6]. On average, the proposed CMM–MSD Montgomery algorithm in this paper reduces the overall number of single-precision multiplications (compared to Dusse–Kaliski's Montgomery algorithm [6]) by about

$$1 - \frac{1.833m \times (n^2 - n - 2)}{1.5m \times (2n^2 + n)} \approx \frac{1.167}{3} \approx 38.9\%.$$

Moreover, on average the proposed CMM–MSD Montgomery algorithm reduces the overall number of single-precision multiplications (compared to Ha–Moon's Montgomery algorithm [9]) by about

$$1 - \frac{1.833m \times (n^2 - n - 2)}{0.5m \times (5n^2 + 4n)} \approx \frac{0.667}{2.5} \approx 26.68\%.$$
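The two percentages follow from the leading $n^2$ terms of the three operation counts. The following sketch evaluates the formulas as stated in this section; the function names are illustrative, and n = 32 is an assumed value used only to show how a finite size differs from the large-n approximation.

```python
from fractions import Fraction


def count_proposed(m, n):
    # 3/4 + 3/4 + 1/3 = 11/6 (about 1.833) times m * (n^2 - n - 2)
    coeff = 2 * (6 * Fraction(1, 12) * Fraction(3, 2)) + Fraction(2, 3) * Fraction(1, 2)
    return coeff * m * (n * n - n - 2)


def count_dusse_kaliski(m, n):
    return Fraction(3, 2) * m * (2 * n * n + n)


def count_ha_moon(m, n):
    return Fraction(1, 2) * m * (5 * n * n + 4 * n)


def reduction(baseline, m, n):
    """Fraction of multiplications saved relative to a baseline count."""
    return float(1 - count_proposed(m, n) / baseline(m, n))


m = 512
for n in (32, 10**6):          # a finite size and a large-n approximation
    print(n,
          round(reduction(count_dusse_kaliski, m, n), 4),
          round(reduction(count_ha_moon, m, n), 4))
```

For large n the two ratios approach $1 - (11/6)/3$ and $1 - (11/6)/2.5$, which agree with the figures quoted above up to rounding of the coefficient 1.833; for small n the lower-order terms shift the values somewhat.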

5. Conclusions

As modular exponentiation is one of the most important operations in public-key cryptography, its efficient implementation has become a key factor affecting the performance of public-key cryptosystems [22]. In this paper, a new method (the CMM–MSD Montgomery algorithm) for speeding up modular exponentiation is investigated by using the Montgomery modular reduction method, the common-multiplicand-multiplication method, and the minimal-signed-digit exponent recoding technique. Based on the computational complexity analyses of the proposed modular exponentiation algorithm, we have the following observations. First, the proposed CMM–MSD Montgomery algorithm requires extra additions and shift operations compared with some modern Montgomery modular exponentiation algorithms. Nevertheless, on average the overhead of these extra operations is less significant in practice than the saving from the reduced number of multiplications. Secondly, the evaluations of $M^{E_{\mathrm{common}[1]}}$, $M^{E_{1,c[1]}}$, $M^{E_{2,c[1]}}$, and $M^{E_{3,c[2]}}$ are independent of the evaluations of $M^{E_{\mathrm{common}[\bar{1}]}}$, $M^{E_{1,c[\bar{1}]}}$, $M^{E_{2,c[\bar{1}]}}$, and $M^{E_{3,c[\bar{2}]}}$. Therefore, these two groups of operations can be executed concurrently, depending on the positive and negative signed-digit representations of the exponent segments. Hence, the proposed CMM–MSD Montgomery algorithm can be made to work more efficiently by using parallel computing techniques. Moreover, the multiplicative inverse operation can be cheaply evaluated, as for elliptic curves [15,29] or in the finite field using a normal basis [20,21]. Therefore, we can further speed up the proposed CMM–MSD Montgomery modular exponentiation algorithm by evaluating the multiplicative inverse operation over the finite field using a normal basis.
Furthermore, by using the proposed CMM–MSD Montgomery algorithm, on average the total number of single-precision multiplications can be reduced by about 38.9% and 26.68% as compared with Dusse–Kaliski's Montgomery algorithm [6] and Ha–Moon's Montgomery algorithm [9], respectively.

References

[1] S. Arno, F.S. Wheeler, Signed digit representations of minimal Hamming weight, IEEE Transactions on Computers 42 (8) (1993) 1007–1010.
[2] A. Avizienis, Signed digit number representation for fast parallel arithmetic, IRE Transactions on Electronic Computers EC-10 (3) (1961) 389–400.
[3] A.D. Booth, A signed binary multiplication technique, Quarterly Journal of Mechanics and Applied Mathematics 4 (1951) 236–240.
[4] W. Diffie, M.E. Hellman, New directions in cryptography, IEEE Transactions on Information Theory 22 (6) (1976) 644–654.
[5] V.S. Dimitrov, G.A. Jullien, W.C. Miller, Complexity and fast algorithms for multiexponentiations, IEEE Transactions on Computers 49 (2) (2000) 141–147.
[6] S.R. Dusse, B.S. Kaliski, A cryptographic library for the Motorola DSP 56000, in: Advances in Cryptology – Proceedings of EUROCRYPT'90, LNCS, vol. 73, Springer-Verlag, 1990, pp. 230–244.
[7] T. ElGamal, A public key cryptosystem and a signature scheme based on discrete logarithms, IEEE Transactions on Information Theory 31 (1985) 469–472.
[8] D.M. Gordon, A survey of fast exponentiation methods, Journal of Algorithms 27 (1) (1998) 129–146.
[9] J.-C. Ha, S.-J. Moon, A common-multiplicand method to the Montgomery algorithm for speeding up exponentiation, Information Processing Letters 66 (2) (1998) 105–107.
[10] D.E. Knuth, The Art of Computer Programming, Seminumerical Algorithms, vol. II, Addison-Wesley, MA, 1997.
[11] C.K. Koc, Tech. Notes, High-Speed RSA Implementation, RSA Labs. Tech. Note TR 201, August 14, 2007.
[12] (a) S.T. Klein, Should one always use repeated squaring for modular exponentiation?, Information Processing Letters 106 (6) (2008) 232–237; (b) D.-C. Lou, C.-C. Chang, Fast exponentiation method obtained by folding the exponent in half, Electronics Letters 32 (11) (1996) 984–985.
[13] I. Koren, Computer Arithmetic Algorithms, second ed., A.K. Peters, Natick, MA, 2002.
[14] P.L. Montgomery, Modular multiplication without trial division, Mathematics of Computation 44 (170) (1985) 519–521.
[15] Y.-H. Park, S. Jeong, J. Lim, Fast exponentiation in subgroups of finite fields, Electronics Letters 38 (13) (2002) 629–630.
[16] G.W. Reitwiesner, Binary Arithmetic, Advances in Computers, vol. 1, Academic Education Press, New York, 1960, pp. 231–308.
[17] R.L. Rivest, A. Shamir, L. Adleman, A method for obtaining digital signatures and public key cryptosystems, Communications of the ACM 21 (2) (1978) 120–126.
[18] W. Stallings, Cryptography and Network Security: Principles and Practice, Prentice-Hall, 1999.
[19] M. Syuto, E. Satake, K. Tanno, O. Ishizuka, A high-speed binary to residue converter using a signed-digit number representation, IEICE Transactions on Information and Systems E85-D (5) (2002) 903–905.
[20] N. Takagi, J. Yoshiki, K. Takagi, A fast algorithm for multiplicative inversion in GF(2^n) using normal basis, IEEE Transactions on Computers 50 (5) (2001) 394–398.
[21] Y. Watanabe, N. Takagi, K. Takagi, A VLSI algorithm for division in GF(2^m) based on extended binary GCD algorithm, IEICE Transactions on Fundamentals E85-A (5) (2002) 994–999.
[22] C.-L. Wu, Fast exponentiation based on common-multiplicand multiplication and minimal-signed-digit techniques, International Journal of Computer Mathematics 84 (10) (2007) 1405–1415.
[23] T.-C. Wu, Y.-S. Chang, Improved generalization common-multiplicand multiplications algorithm of Yen and Laih, Electronics Letters 31 (20) (1995) 1738–1739.
[24] C.-L. Wu, D.-C. Lou, T.-J. Chang, Fast binary multiplication by performing dot counting and complement recoding, Applied Mathematics and Computation 191 (1) (2007) 132–139.
[25] C.-L. Wu, D.-C. Lou, J.-C. Lai, T.-J. Chang, Fast modular multi-exponentiation using modified complex arithmetic, Applied Mathematics and Computation 186 (2) (2007) 1065–1074.
[26] J.-H. Yang, C.-C. Chang, Efficient residue number system iterative modular multiplication algorithm for fast modular exponentiation, Computers and Digital Techniques, IET 2 (1) (2008) 1–5.
[27] S.-M. Yen, Improved common-multiplicand multiplication and fast exponentiation by exponent decomposition, IEICE Transactions on Fundamentals E80-A (6) (1997) 1160–1163.
[28] S.-M. Yen, C.-S. Laih, Common-multiplicand multiplication and its applications to public key cryptography, Electronics Letters 29 (17) (1993) 1583–1584.
[29] N. Zhang, Z. Chen, G. Xiao, Efficient elliptic curve scalar multiplication algorithms resistant to power analysis, Information Sciences 177 (10) (2007) 2119–2129.