Information Sciences 179 (2009) 410–421
An efficient common-multiplicand-multiplication method to the Montgomery algorithm for speeding up exponentiation
Chia-Long Wu *
Department of Aviation and Communication Electronics, Chinese Air Force Institute of Technology, Kaohsiung 82042, Taiwan, ROC
Article info
Article history: Received 22 February 2008; received in revised form 19 September 2008; accepted 3 October 2008
Keywords: Modular arithmetic; Exponentiation; Common-multiplicand-multiplication; Signed-digit recoding; Public-key cryptography; Complexity analysis
Abstract
Modular exponentiation is a common operation for scrambling secret data and is used by several public-key cryptosystems, such as the RSA scheme and the DSS digital signature scheme. However, the calculations involved in modular exponentiation are time-consuming, especially when performed in software. In this paper, we propose an efficient CMM–MSD Montgomery algorithm that combines the Montgomery modular reduction method, the common-multiplicand-multiplication (CMM) method, and the minimal-signed-digit (MSD) recoding technique for fast modular exponentiation. By recording the common signed-digit representations in the grouped substrings of the exponent, our algorithm improves the efficiency of both the original CMM exponentiation algorithm and the Montgomery multiplication algorithm. The fast modular exponentiation algorithm developed in this paper can be easily implemented on a general signed-digit computing machine and is therefore well suited to parallel implementation for fast evaluation of modular exponentiation. Moreover, with the proposed CMM–MSD Montgomery algorithm, the total number of single-precision multiplications is reduced on average by about 38.9% and 26.68% compared with Dusse–Kaliski's Montgomery algorithm and Ha–Moon's Montgomery algorithm, respectively.
© 2008 Elsevier Inc. All rights reserved.
1. Introduction
Efficient algorithms that speed up software implementations of modular exponentiation are of practical significance for cryptographic applications such as the RSA public-key cryptosystem [17], the Diffie–Hellman key distribution scheme [4], and the ElGamal scheme [7]. The modular exponentiation problem can be described as follows. Given M (message), E (public key), and N (modulus), compute the ciphertext $C = M^E \bmod N$. The most intuitive way to evaluate $M^E \bmod N$ is to break the modular exponentiation into a series of modular multiplications. As efficient evaluation of $M^E \bmod N$ is very useful for public-key cryptosystems, we need fast multiplication designs or novel exponentiation algorithms such as the Montgomery modular reduction method [14], the binary (square-and-multiply) method [10], the common-multiplicand-multiplication (CMM) method [23,27], the signed-digit recoding method [8], the exponent-folding method [13,25], and nonstandard arithmetic methods [5]. Moreover, detailed analyses of fast exponentiation techniques have been given by Gordon [8]. This paper presents an efficient method for speeding up modular exponentiation by using the Montgomery modular reduction algorithm, the CMM technique, and the MSD recoding method.
This paper is organized as follows. Some related works on efficiently solving the modular exponentiation are introduced in Section 2. In Section 3, we introduce the CMM–MSD Montgomery algorithm for fast modular exponentiation. Then, the complexity of the proposed algorithm is detailed in Section 4. Finally, we conclude our work in Section 5.

* Tel.: +886 7 6258738; fax: +886 7 6252530. E-mail address: [email protected]
0020-0255/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2008.10.004
2. The modular exponentiation
In 1976, Diffie and Hellman proposed the first public-key algorithm, the famous "Diffie–Hellman key exchange scheme" [4]. In 1978, Rivest et al. proposed the RSA public-key cryptosystem [17], which is widely used in digital communication today. In 1985, ElGamal proposed the ElGamal scheme [7], which can be used for both digital signatures and encryption. Modular exponentiation consists of repeated modular multiplications and is often the dominant part of modern cryptographic algorithms for key exchange, digital signatures, and authentication. Taking the RSA public-key cryptosystem as an example, both the encryption and decryption operations are accomplished by modular exponentiation. However, long operands are required to provide adequate security for electronic commerce over telecommunication networks; a 1024-bit length is considered secure against attack in the near future [18].
There are two commonly used ways to reduce the execution time of the modular exponentiation. One is to reduce the number of modular multiplications. The other is to reduce the execution time of each modular multiplication. In this paper, we improve the Montgomery multiplication method by using the common-multiplicand-multiplication method and the minimal-signed-digit recoding technique to effectively reduce the number of multiplications needed to evaluate an exponentiation. In this section, we give a brief review of the binary (square-and-multiply) method, the common-multiplicand-multiplication method, and the Montgomery modular reduction method. We also introduce some essential concepts related to the proposed CMM–MSD Montgomery modular exponentiation algorithm.
2.1. The binary (square-and-multiply) method
The most commonly used algorithms for computing exponentiation are the binary methods (also called square-and-multiply methods) [12].
The basic idea is to compute the modular exponentiation from the binary expression of the exponent E, so that the exponentiation is broken into a series of squaring and multiplication operations. Let m denote the bit length of the exponent E. Then E can be expressed in binary representation as $(e_{m-1} e_{m-2} \cdots e_1 e_0)_2$ with $E = \sum_{i=0}^{m-1} e_i 2^i$, where $e_i \in \{0, 1\}$. Two binary algorithms [10] convert the modular exponentiation into a sequence of modular multiplications: the LSB (least-significant-bit) algorithm and the MSB (most-significant-bit) algorithm. The LSB algorithm, depicted below, computes the modular exponentiation starting from the least-significant bit of the exponent and proceeding to the left:

LSB binary modular exponentiation algorithm
Input: Message: M; Exponent: E = (e_{m-1} e_{m-2} ... e_1 e_0)_2; 0 ≤ i ≤ m - 1 and e_i ∈ {0, 1};
Output: Ciphertext: C = M^E mod N;
    C = 1; S = M;
    for i = 0 to m - 1 do                 /* scan from right to left */
    begin
        if (e_i = 1) then C = C × S mod N;   /* modular multiplication */
        S = S × S mod N;                  /* modular squaring */
    end;
    Output C.
The MSB algorithm (the so-called left-to-right binary algorithm) computes the modular exponentiation starting from the most-significant bit of the exponent and proceeding to the right. The MSB algorithm is depicted as follows:

MSB binary modular exponentiation algorithm
Input: Message: M; Exponent: E = (e_{m-1} e_{m-2} ... e_1 e_0)_2; 0 ≤ i ≤ m - 1 and e_i ∈ {0, 1};
Output: Ciphertext: C = M^E mod N;
    C = 1;
    for i = m - 1 to 0 do                 /* scan from left to right */
    begin
        C = C × C mod N;                  /* modular squaring */
        if (e_i = 1) C = C × M mod N;     /* modular multiplication */
    end;
    Output C.
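The two binary algorithms above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation; the function names `modexp_lsb` and `modexp_msb` are hypothetical, and plain `%` reduction stands in for whatever modular multiplier is used in practice.

```python
def modexp_lsb(M, E, N):
    """Right-to-left (LSB-first) square-and-multiply: returns M**E mod N."""
    C, S = 1 % N, M % N
    while E > 0:
        if E & 1:              # current exponent bit e_i is 1
            C = (C * S) % N    # modular multiplication
        S = (S * S) % N        # modular squaring
        E >>= 1                # move to the next more significant bit
    return C


def modexp_msb(M, E, N):
    """Left-to-right (MSB-first) square-and-multiply: returns M**E mod N."""
    C = 1 % N
    for bit in bin(E)[2:]:     # binary digits, most significant first
        C = (C * C) % N        # modular squaring
        if bit == "1":
            C = (C * M) % N    # modular multiplication
    return C
```

Both functions perform one squaring per exponent bit and one extra multiplication per nonzero bit, which is where the average count of 1.5m multiplications comes from.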
The MSB and LSB binary algorithms perform the same multiplication and squaring operations and therefore have the same computational complexity, which is summarized as follows. For an m-bit exponent in the average case, we assume that bits "1" and "0" occur with equal probability, so the expected numbers of "1" bits and "0" bits are both m/2. Each "1" bit costs two multiplications (one multiplication and one squaring) and each "0" bit costs one, so on average both algorithms need 2 × (m/2) + 1 × (m/2) = 1.5m multiplications.
2.2. The common-multiplicand-multiplication (CMM) method
In 1993, Yen and Laih [28] developed the common-multiplicand-multiplication algorithm to improve the performance of the LSB binary exponentiation algorithm. Focusing on the computations $\{X \times Y_i \mid i = 1, 2, \ldots, t;\ t \ge 2\}$, a bit-level arithmetic algorithm [9] is applied to the LSB binary algorithm for the efficient evaluation of exponentiation. The following variables are required in the CMM method:
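$$Y_{\mathrm{common}} = \mathrm{AND}_{i=1}^{t}\, Y_i, \tag{1}$$
$$Y_{i,c} = Y_i \ \mathrm{XOR}\ Y_{\mathrm{common}} \quad \text{for } i = 1, 2, \ldots, t. \tag{2}$$
Hence, each $Y_i$ can be represented as
$$Y_i = Y_{i,c} + Y_{\mathrm{common}}. \tag{3}$$
Therefore, the common-multiplicand multiplications $X \times Y_i$ (i = 1, 2, ..., t) can be computed with the assistance of $X \times Y_{\mathrm{common}}$ as
$$X \times Y_i = X \times Y_{i,c} + X \times Y_{\mathrm{common}} \quad \text{for } i = 1, 2, \ldots, t. \tag{4}$$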
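Eqs. (1)–(4) can be checked on small integers with the sketch below. It is only an illustration of the decomposition; the names `cmm_split` and `cmm_products` are hypothetical, and Python's built-in integer multiplication stands in for the shift-and-add multiplier whose additions the method is meant to save.

```python
def cmm_split(ys):
    """Return (Y_common, [Y_{i,c}]) following Eqs. (1) and (2)."""
    common = ys[0]
    for y in ys[1:]:
        common &= y                       # bitwise AND of all multiplicands
    residues = [y ^ common for y in ys]   # bits of Y_i that are not shared
    return common, residues


def cmm_products(x, ys):
    """Compute every X * Y_i via Eq. (4), evaluating X * Y_common only once."""
    common, residues = cmm_split(ys)
    x_common = x * common                 # shared partial product
    return [x * r + x_common for r in residues]
```

Because `Y_{i,c}` and `Y_common` have no set bits in common, their sum equals `Y_i`, which is why Eq. (3) holds with ordinary addition.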
The basic idea of the CMM technique is to extract the common parts of the multiplicands and thereby save the binary additions that would otherwise be repeated for the common parts. With the above algorithm, the computations $\{X \times Y_1, X \times Y_2\}$ can be represented as $\{X \times Y_{1,c} + X \times Y_{\mathrm{common}},\ X \times Y_{2,c} + X \times Y_{\mathrm{common}}\}$. Let X and all $Y_i$ be k-bit integers. On average, the Hamming weights of $Y_i$, $Y_{\mathrm{common}}$ and $Y_{i,c}$ are $k/2$, $k/2^t$ and $(k/2 - k/2^t)$, respectively. The total number of binary additions for the common-multiplicand-multiplication evaluation of $\{X \times Y_i \mid i = 1, 2, \ldots, t;\ t \ge 2\}$ is $k/2^t + t\,(k/2 - k/2^t)$. Without the CMM algorithm, the products $\{X \times Y_i\}$ are computed one after another independently, using $t\,(k/2)$ binary additions in total. The performance improvement [27] of the common-multiplicand-multiplication algorithm can be denoted as
$$\frac{t\,\frac{k}{2}}{\frac{k}{2^t} + t\left(\frac{k}{2} - \frac{k}{2^t}\right)} = \frac{t}{t + (1 - t)\,2^{1-t}}. \tag{5}$$
Based on Eq. (5), the optimal performance improvement of 4/3 is obtained when t = 2, which implies that evaluating $X \times Y_1$ and $X \times Y_2$ with the CMM algorithm costs the equivalent of $2 \times \frac{3}{4} = 1.5$ multiplications. On average, by applying the CMM algorithm to the LSB binary algorithm, an exponentiation can be evaluated using (1.5 + 1)m/2 = 1.25m multiplications [27] for an m-bit exponent E. Consider the application to public-key cryptography for evaluating $M^E \bmod N$, where both M and E are 512-bit: the LSB binary algorithm requires 768 multiplications, and the CMM algorithm combined with the LSB binary algorithm requires 640 multiplications.
2.3. The Montgomery modular reduction and exponentiation algorithm
In 1985, Montgomery [14] first introduced the modular reduction algorithm (also known as the Montgomery modular multiplication algorithm). The Montgomery multiplication algorithm speeds up the modular multiplications and squarings required for exponentiation [9]. Suppose that we want to compute the modular multiplication $AB \pmod N$, where A, B and the modulus N are n-digit integers represented in base 2. Hence
$$A = \sum_{i=0}^{n-1} A_i 2^i, \qquad B = \sum_{i=0}^{n-1} B_i 2^i, \qquad N = \sum_{i=0}^{n-1} N_i 2^i, \tag{6}$$
where $A_i$, $B_i$ and $N_i$ are elements of {0, 1} for all i. In 1990, Dusse and Kaliski [6] proposed an efficient Montgomery modular reduction (MMR) algorithm that performs multiplication and modular reduction simultaneously. Let $R = 2^n$. The modular reduction of a given positive T in the N-residue is defined as $T' = \mathrm{MMR}(T) = (T + \eta N)/R = T R^{-1} \bmod N$, where $R > N$ is an integer with $\gcd(R, N) = 1$. To obtain the correct result $T'$, an additional quantity $N'$ is needed, where $R R^{-1} - N N' = 1$ and $\eta = T N' \bmod 2^n$. Note that, because the MMR algorithm is processed in the N-residue, it allows the precomputation of $N'_0 = -N_0^{-1} \bmod 2$ instead of $N' = -N^{-1} \bmod 2^n$. Let $N = N_{n-1} 2^{n-1} + \cdots + N_1 2 + N_0$ and $X = X_{n-1} 2^{n-1} + \cdots + X_1 2 + X_0$; we can then compute X one digit per modular reduction step instead of computing the whole X at one time. Dusse–Kaliski's Montgomery modular reduction algorithm can be depicted as follows:
Montgomery modular reduction (MMR) algorithm
Input: A, B, N                /* A, B and N are n-digit integers in base 2 */
Output: X                     /* X = A·B·2^(-n) mod N */
    X = 0;                    /* X = (X_{n-1} ... X_1 X_0)_2 where n is the bit length of X */
    begin
        for i = 0 to n - 1 do /* scan from least-significant-bit */
        begin
            X = X + A_i·B;
            Y = X_0·N'_0 mod 2;   /* N'_0 = -N_0^(-1) mod 2; N' = -N^(-1) mod 2^n */
            X = (X + Y·N)/2;
        end;
        if (X ≥ N) X = X - N;     /* X = A·B·2^(-n) mod N = MMR(AB) */
    end.
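The bit-serial reduction above can be sketched in Python as follows. This is an illustrative sketch under the assumptions that N is odd and that 0 ≤ A, B < N < 2^n; the function name `mmr_bitserial` and the explicit parameter `n` are hypothetical. For odd N the constant N'_0 equals 1, so Y is simply the lowest bit of X.

```python
def mmr_bitserial(A, B, N, n):
    """Return A * B * 2**(-n) mod N for odd N with 0 <= A, B < N < 2**n."""
    X = 0
    for i in range(n):                 # scan A from its least-significant bit
        X += ((A >> i) & 1) * B        # X = X + A_i * B
        Y = X & 1                      # Y = X_0 * N'_0 mod 2, with N'_0 = 1 for odd N
        X = (X + Y * N) >> 1           # X + Y*N is even, so the halving is exact
    if X >= N:                         # X < 2N here, one subtraction suffices
        X -= N
    return X
```

The result carries the extra factor 2^(-n), which is why operands are first moved into the N-residue before a chain of such products is evaluated.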
From the MMR algorithm shown above, we should note that both $R^{-1}$ and $N'$ can be computed using the Euclidean algorithm [10,12], where $R = 2^n$. Assuming that $N'$ is precomputed and stored before the MMR algorithm is used, the modular reduction result in the N-residue can be computed using Eqs. (7)–(11) below. For Eqs. (7) and (8) to hold, the constant $R'$ must satisfy $R' \equiv R^2 \pmod N$.
$$A' = \mathrm{MMR}(A R') = AR \bmod N, \tag{7}$$
$$B' = \mathrm{MMR}(B R') = BR \bmod N, \tag{8}$$
$$X = A' B' = AR \cdot BR \bmod N, \tag{9}$$
$$C' = \mathrm{MMR}(X) = ABR \bmod N, \tag{10}$$
$$C = \mathrm{MMR}(C') = AB \bmod N. \tag{11}$$
As mentioned in Section 2.1, modular exponentiation can be reduced to a series of modular multiplications and squarings. By applying the LSB binary method to the MMR algorithm shown above, we obtain the Montgomery modular exponentiation (MME) algorithm [6]:

Montgomery modular exponentiation (MME) algorithm
Input: M, R, N, E             /* M and N are n-digit integers in base 2 */
Output: C                     /* E = (e_{m-1} ... e_1 e_0) where 0 ≤ i ≤ m - 1 and e_i ∈ {0, 1} */
    begin
        S = M × R mod N, C = R mod N;     /* R = 2^n */
        for i = 0 to m - 1 do             /* scan exponent E from right to left */
        begin
            if (e_i = 1) then C = MMR(SC);    /* handle the nonzero digit of E */
            S = MMR(SS);
        end;
        C = MMR(C);                       /* C = M^E mod N */
    end.
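A compact Python sketch of the MME algorithm follows. It uses the word-level form of the reduction, MMR(T) = (T + ηN)/R with η = T·N' mod R, rather than the bit-serial loop; the helper names `make_mmr` and `mme` are hypothetical, N is assumed odd, and R is chosen here as 2 raised to the bit length of N (an assumption that guarantees R > N). The three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
def make_mmr(N, n):
    """Build MMR(T) = T * 2**(-n) mod N for odd N, valid for 0 <= T < N * 2**n."""
    R = 1 << n
    N_prime = (-pow(N, -1, R)) % R        # N' = -N^{-1} mod 2^n

    def mmr(T):
        eta = (T * N_prime) % R           # eta = T * N' mod 2^n
        X = (T + eta * N) >> n            # T + eta*N is divisible by R
        return X - N if X >= N else X

    return mmr


def mme(M, E, N):
    """Montgomery modular exponentiation, scanning E from right to left."""
    n = N.bit_length()
    R = 1 << n
    mmr = make_mmr(N, n)
    S = (M * R) % N                       # M in the N-residue
    C = R % N                             # 1 in the N-residue
    while E > 0:
        if E & 1:
            C = mmr(S * C)                # handle a nonzero bit of E
        S = mmr(S * S)
        E >>= 1
    return mmr(C)                         # leave the N-residue
```

For instance, `mme(127, 1213, 1517)` agrees with Python's built-in `pow(127, 1213, 1517)`.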
3. The proposed CMM–MSD Montgomery algorithm
In the following, we first introduce the basic concept of minimal-signed-digit (MSD) recoding arithmetic. We then summarize some important mathematical preliminaries and formulas for our improved Montgomery algorithm using the common-multiplicand-multiplication (CMM) method and the MSD recoding technique. Finally, we give a detailed description of the proposed CMM–MSD Montgomery algorithm and of its improved version for fast modular exponentiation in public-key cryptosystems.
3.1. Minimal-signed-digit recoding arithmetic
A signed-digit (SD) vector representation of an integer a in radix r is a sequence of digits $a = (\ldots, a_i, \ldots, a_2, a_1, a_0)$ with $a_i \in \{0, \pm 1, \ldots, \pm(r-1)\}$ such that $a = \sum_{i=0}^{\infty} a_i r^i$. Redundant representations of this form have been used successfully in many arithmetic applications [2,16], including the modular exponentiation and modular multiplication used in public-key cryptosystems [19,24]. In 1993, Arno and Wheeler [1] proposed the signed-digit recoding (redundant representations) algorithm for minimal Hamming weight arithmetic. For signed-digit recoding systems, we write $\bar{1}$ for $-1$ and we denote the bit length of a by $|a|$. If $a = \sum_{i=0}^{m} a'_i 2^i$ denotes the binary signed expansion with signed-digit recoding of radix r = 2 for a (hence, $a'_i \in \{\bar{1}, 0, 1\}$), then we write $a = (a'_m, \ldots, a'_0)_{SD2}$.
Let the radix $r \ge 2$ be an integer, and let $S_r$ (the signed-digit vector representation of an integer a in radix r) denote the sequence of digits $a = (\ldots, a_2, a_1, a_0)$ with $a_i \in \{0, \pm 1, \ldots, \pm(r-1)\}$. The Hamming weight of an element $a \in S_r$, denoted w(a), is defined to be the number of nonzero terms in a.
The mapping $p : S_r \to \mathbb{Z}$ defined by
$$p(a) = \sum_{i=0}^{m} a_i r^i \tag{12}$$
associates an integer with each element $a \in S_r$. We refer to $S_r$ as the set of all signed-digit radix-r representations of elements of $\mathbb{Z}$. We define the negative digit $\bar{x}$ by $\bar{x} = x - \mathrm{sgn}(x)\, r$, where sgn(x) is 0, 1, or $-1$ depending on whether x is zero, positive, or negative. Let $a, b \in S_r$. If $a = (a_m, \ldots, a_2, a_1, a_0)$ and $b = (b_m, \ldots, b_2, b_1, b_0)$, then we define the addition $c = a + b = (c_m, \ldots, c_2, c_1, c_0)$ in $S_r$, where $e_{-1} = 0$, as
$$\begin{cases} e_i = 0,\quad c_i = a_i + b_i + e_{i-1} & \text{if } -r < a_i + b_i + e_{i-1} < r, \\ e_i = \mathrm{sgn}(a_i + b_i + e_{i-1}),\quad c_i = a_i + b_i + e_{i-1} - e_i r & \text{otherwise.} \end{cases} \tag{13}$$
^ and the new carry We denote the generic notation of the packet of data (at, at+1, et) as (x, y, e) and produce the output digit y ^e, and the new data packet becomes (at+1, at+2, et+1). The output digit y ^ and the carry ^e are generated as follows:
8 ^e ¼ 0 > > if ðai þ bi þ eÞðx þ sgnðai þ bi þ eÞÞ–0 ðmod rÞ; >
^e ¼ sgnðai þ bi þ eÞ > > otherwise: : ^ ¼ ai þ bi þ e sgnðai þ bi þ eÞ r y
ð14Þ
The problem of finding a minimal-weight binary representation is usually referred to as "canonical Booth recoding" [3]. A signed-digit recoding is canonical if its signed-digit representation contains no adjacent nonzero digits. Based on [11], the canonical-signed-digit recoding algorithm is depicted as follows:

Canonical-signed-digit recoding algorithm
Input: a ∈ S_r with p(a) = n;         /* a is a redundant representation of n */
Output: A(a) ∈ S_r;                   /* A(a) denotes the action of this algorithm on a */
    begin
        t = 0;
        while (..., a_{t+2}, a_{t+1}, a_t) ≠ (..., 0, 0, 0) do
        begin
            if a_t ≠ 0 then           /* nonzeros at t and t + 1 */
            begin
                b = (..., sgn(a_t), -sgn(a_t)·r, 0, ..., 0);
                c = a + b;
                if c_{t+1} = 0 then a = c;
            end;
            t = t + 1;
        end;
    end.
In 1960, Reitwiesner [16] proposed the minimal-signed-digit (MSD) recoding algorithm for producing the canonical signed-digit representation with minimal Hamming weight (he proved that this representation is unique if the binary representation is treated as padded with an initial "0"). This minimal-signed-digit recoding algorithm is defined as follows:

Minimal-signed-digit (MSD) recoding algorithm
Input: E = (r_{m-1} r_{m-2} ... r_1 r_0)_2;
Output: E_MSD = (e_m e_{m-1} ... e_1 e_0)_SD2 where 0 ≤ i ≤ m and e_i ∈ {0, 1, -1};
    begin
        c_0 = 0; r_{m+1} = 0; r_m = 0;
        for i = 0 to m do
        begin
            c_{i+1} = ⌊(c_i + r_i + r_{i+1})/2⌋;
            e_i = c_i + r_i - 2c_{i+1};
        end;
    end.
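The recoding loop above translates directly into Python. The sketch below is illustrative only; the function name `msd_recode` is hypothetical, and the digits are returned least-significant first, with the two padding bits r_m = r_{m+1} = 0 appended explicitly.

```python
def msd_recode(E):
    """Reitwiesner-style recoding of E >= 0 into digits in {-1, 0, 1}, LSB first."""
    m = E.bit_length()
    r = [(E >> i) & 1 for i in range(m)] + [0, 0]   # r_m = r_{m+1} = 0
    digits = []
    c = 0                                           # c_0 = 0
    for i in range(m + 1):
        c_next = (c + r[i] + r[i + 1]) // 2         # c_{i+1}
        digits.append(c + r[i] - 2 * c_next)        # e_i
        c = c_next
    return digits
```

For E = 1213 this yields nonzero digits +1 at positions 0, 8 and 10 and -1 at positions 2 and 6, that is, 1213 = 1024 + 256 - 64 - 4 + 1, with no two adjacent nonzero digits.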
3.2. Mathematical preliminaries Here we give some mathematical preliminaries including lemmas and definitions for the proposed fast modular exponentiation method. The basic idea of the proposed CMM–MSD Montgomery modular exponentiation algorithm is to first extract the common recoding parts of multiplicands and save the number of binary additions for the computation of common parts. We here define some variables for the proposed CMM–MSD Montgomery algorithm as follows:
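$$E_{\mathrm{common}} = \mathrm{AND}_{i=1}^{3}\, E_i = E_1 \ \mathrm{AND}\ E_2 \ \mathrm{AND}\ E_3, \tag{15}$$
$$E_{1,c} = E_1 \ \mathrm{XOR}\ E_{\mathrm{common}}, \tag{16}$$
$$E_{2,c} = E_2 \ \mathrm{XOR}\ E_{\mathrm{common}}, \tag{17}$$
$$E_{3,c} = E_3 \ \mathrm{XOR}\ E_{\mathrm{common}}. \tag{18}$$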
Hence, E1, E2, and E3 can be represented as follows:
$$E_1 = E_{1,c} + E_{\mathrm{common}}, \tag{19}$$
$$E_2 = E_{2,c} + E_{\mathrm{common}}, \tag{20}$$
$$E_3 = E_{3,c} + E_{\mathrm{common}}. \tag{21}$$
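Thus, the common-multiplicand multiplications $X \times E_i$ (i = 1, 2, and 3) can be computed with the assistance of $X \times E_{\mathrm{common}}$ as
$$X \times E_1 = X \times E_{1,c} + X \times E_{\mathrm{common}}, \tag{22}$$
$$X \times E_2 = X \times E_{2,c} + X \times E_{\mathrm{common}}, \tag{23}$$
$$X \times E_3 = X \times E_{3,c} + X \times E_{\mathrm{common}}. \tag{24}$$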
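A small sketch of Eqs. (15)–(21) on signed-digit vectors is given below. It assumes digit-wise operators in which AND keeps a digit only when both operands agree and XOR cancels equal digits while leaving a digit paired with 0 unchanged; the names `sd_and`, `sd_xor` and `sd_split` are hypothetical, and digits are restricted here to {-1, 0, 1}.

```python
from functools import reduce


def sd_and(a, b):
    """Digit-wise AND on {-1, 0, 1}: keep the digit only if both operands agree."""
    return a if a == b else 0


def sd_xor(a, b):
    """Digit-wise XOR on {-1, 0, 1}: equal digits cancel, x XOR 0 = x, 1 XOR -1 = 0."""
    return 0 if a == b else a + b


def sd_split(segments):
    """Return (E_common, [E_{i,c}]) for equal-length signed-digit segments."""
    common = [reduce(sd_and, column) for column in zip(*segments)]
    residues = [[sd_xor(e, c) for e, c in zip(seg, common)] for seg in segments]
    return common, residues
```

Adding `E_common` back to each residue digit by digit recovers the original segment, which is the content of Eqs. (19)–(21).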
Here we depict the bitwise logical "AND" and "XOR" operators in Table 1. The first two of the following lemmas make it possible to generate a canonical recoding output with minimal weight and to efficiently calculate the modular inverse result for handling the negative signed-digit $\bar{1}$ in the proposed CMM–MSD Montgomery algorithm. The third lemma describes the expected value of the signed-digit recoding number based on the probability distribution property.
Lemma 1. Let S be a string and let the sequence $[S]^l$ denote S, S, ..., S (repeated l times). The following two equivalences hold in the SD radix-2 representation:
$$(1)\quad ([0, 1]^l, 1)_{SD2} = (1, [0, \bar{1}]^l)_{SD2}; \tag{25}$$
$$(2)\quad ([0, \bar{1}]^l, \bar{1})_{SD2} = (\bar{1}, [0, 1]^l)_{SD2}. \tag{26}$$
Proof. (1) Based on the signed-digit arithmetic defined in Section 3.1, we have
$$([0, 1]^l, 1)_{SD2} = 1 + \sum_{i=1}^{l} 2^{2i-1} = 2^{2l} - \sum_{i=1}^{l} 2^{2i-2}. \tag{27}$$
However, $2^{2l} - \sum_{i=1}^{l} 2^{2i-2}$ can also be represented as $(1, [0, \bar{1}]^l)_{SD2}$. Therefore, the first equivalence $([0, 1]^l, 1)_{SD2} = (1, [0, \bar{1}]^l)_{SD2}$ holds.
(2) Again, based on the signed-digit arithmetic representation, we know
$$([0, \bar{1}]^l, \bar{1})_{SD2} = -1 - \sum_{i=1}^{l} 2^{2i-1} = \sum_{i=1}^{l} 2^{2i-2} - 2^{2l}. \tag{28}$$
However, $\sum_{i=1}^{l} 2^{2i-2} - 2^{2l}$ can also be represented as $(\bar{1}, [0, 1]^l)_{SD2}$. Therefore, the second equivalence $([0, \bar{1}]^l, \bar{1})_{SD2} = (\bar{1}, [0, 1]^l)_{SD2}$ holds. □

Lemma 2. The relation $a^{-1} = r^{-1} \bmod N$ holds when $r = a \bmod N$.
Proof. Based on the extended Euclid theorem [10], we have $a^{-1} \bmod N = x$ and $a \times x \bmod N = 1$; hence $a \times x - b \times N = 1$, where b is an integer. Let $r = a \bmod N$ and $a = Q \times N + r$. Since $a \times a^{-1} = 1 \bmod N$, we obtain $Q N a^{-1} + r a^{-1} = 1 \bmod N$ and therefore $r a^{-1} = 1 \bmod N$. As $r r^{-1} = 1 \bmod N$, this yields $r a^{-1} = r r^{-1} \bmod N$. Hence $a^{-1} = x = r^{-1} \bmod N$, which completes the proof. □

Table 1
Bitwise logical operators "AND" and "XOR".

AND        | $\bar{1}$ | 0 | 1          XOR        | $\bar{1}$ | 0 | 1
$\bar{1}$  | $\bar{1}$ | 0 | 0          $\bar{1}$  | 0 | $\bar{1}$ | 0
0          | 0 | 0 | 0                  0          | $\bar{1}$ | 0 | 1
1          | 0 | 0 | 1                  1          | 0 | 1 | 0

Lemma 3. Let $k_r$ be a random variable on the space of m-digit integers denoting the minimal Hamming weight in radix r. With Exp(a) denoting the expected value of a, the expected value of $k_r$ is
$$\mathrm{Exp}(k_r) = \frac{(r-1)}{r+1}\, m \quad (\text{as } m \to \infty). \tag{29}$$
Proof. Let n be a random m-digit radix-r integer with $0 \le n < r^m$, whose standard representation is given by $a = (\ldots, 0, a_m, a_{m-1}, a_{m-2}, \ldots, a_1, a_0)$ and whose minimal representation is given by $\hat{a} = (\ldots, 0, \hat{a}_m, \hat{a}_{m-1}, \hat{a}_{m-2}, \ldots, \hat{a}_1, \hat{a}_0)$. Let $z_m$ denote the number of zeros in $\hat{a}$. Based on [1], the expected value of $z_m$ for an m-digit radix-r recoded integer n is
$$\mathrm{Exp}(z_m) = \frac{2}{(r+1)}\, m \quad (\text{as } m \to \infty). \tag{30}$$
Let Prob(a) denote the probability distribution of a, and let $w^{+}_m$ and $w^{-}_m$ denote the number of positive digits and the number of negative digits of $\hat{a}$, respectively. We have the following probability distributions [1]:
$$\mathrm{Prob}(w^{+}_m) = \frac{(r-1)^2}{r(r+1)} \quad \text{and} \quad \mathrm{Prob}(w^{-}_m) = \frac{(r-1)}{r(r+1)} \quad (\text{as } m \to \infty). \tag{31}$$
Therefore, we have the following expected values:
$$\mathrm{Exp}(w^{+}_m) = \frac{(r-1)^2}{r(r+1)}\, m \quad \text{and} \quad \mathrm{Exp}(w^{-}_m) = \frac{(r-1)}{r(r+1)}\, m \quad (\text{as } m \to \infty). \tag{32}$$
Recall that $k_r$ is independent of the signed-digit representation from the minimal-signed-digit recoding algorithm depicted in Section 3.1; thus we have
$$k_r = w^{+}_m + w^{-}_m \quad \text{and} \quad \mathrm{Exp}(k_r) = \mathrm{Exp}(w^{+}_m) + \mathrm{Exp}(w^{-}_m). \tag{33}$$
Therefore, $\mathrm{Exp}(k_r) = \frac{(r-1)^2}{r(r+1)}\, m + \frac{(r-1)}{r(r+1)}\, m = \frac{(r-1)}{(r+1)}\, m$ (as $m \to \infty$). □
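Lemmas 1 and 2 can be spot-checked numerically with the short sketch below. It is a sanity check on small cases, not a proof; the helper names `sd_value`, `lemma1_holds` and `lemma2_holds` are hypothetical, signed-digit strings are written most-significant digit first with -1 standing for the digit "1 bar", and the modular inverse uses Python 3.8+ `pow(x, -1, N)`.

```python
def sd_value(digits_msb_first):
    """Integer value of a radix-2 signed-digit string (most significant first)."""
    value = 0
    for d in digits_msb_first:
        value = 2 * value + d
    return value


def lemma1_holds(l):
    """Check both equivalences of Lemma 1 for a given repetition count l."""
    first = sd_value([0, 1] * l + [1]) == sd_value([1] + [0, -1] * l)
    second = sd_value([0, -1] * l + [-1]) == sd_value([-1] + [0, 1] * l)
    return first and second


def lemma2_holds(a, N):
    """Check that a and r = a mod N have the same inverse modulo N."""
    return pow(a, -1, N) == pow(a % N, -1, N)
```

For l = 2, for example, both sides of Eq. (25) evaluate to 11, and both sides of Eq. (26) evaluate to -11.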
3.3. The proposed CMM–MSD Montgomery modular exponentiation algorithm
Based on the definitions and lemmas given above, we propose the CMM–MSD Montgomery algorithm (using the Montgomery reduction algorithm, the CMM method and the MSD technique) for fast evaluation of the $M^E \bmod N$ operation as follows:

CMM–MSD Montgomery modular exponentiation algorithm
Input: M, E_MSD, N, R         /* E_MSD = (e_m e_{m-1} ... e_2 e_1)_SD2, R = 2^n */
Output: C1 = M^(E1[1]); C2 = M^(E2[1]); C3 = M^(E3[2]); D1 = M^(E1[-1]); D2 = M^(E2[-1]); D3 = M^(E3[-2]);
    C1 = C2 = C3 = D1 = D2 = D3 = 2^n;    /* M and N are n-digit integers in base 3 */
    S = M × R mod N;                      /* E1, E2 and E3 are m/3-signed-digit in radix-3 */
    begin
        for i = 1 to m do                 /* scan exponent E_MSD from right to left */
        begin
            if (e_1i = 1) then C1 = MMR(S·C1);        /* evaluate M^(E1) for positive signed-digit */
            if (e_1i = -1) then D1 = MMR(S^(-1)·D1);  /* evaluate M^(E1) for negative signed-digit */
            if (e_2i = 1) then C2 = MMR(S·C2);        /* evaluate M^(E2) for positive signed-digit */
            if (e_2i = -1) then D2 = MMR(S^(-1)·D2);  /* evaluate M^(E2) for negative signed-digit */
            if (e_3i = 2) then C3 = MMR(S·C3);        /* evaluate M^(E3) for positive signed-digit */
            if (e_3i = -2) then D3 = MMR(S^(-1)·D3);  /* evaluate M^(E3) for negative signed-digit */
            S = MMR(S·S);
        end;
    end.
To execute the exponentiation, we divide the m-bit exponent $E_{MSD}$ into three parts of equal length, $E_1$, $E_2$, and $E_3$, each of m/3 bits. In the proposed CMM–MSD Montgomery algorithm, the results for positive digits are kept in the registers C1, C2, and C3, and the results for negative digits are kept in the registers D1, D2, and D3. C1 and D1 store the results for segment $E_1$ of the minimal-signed-digit exponent $E_{MSD}$, C2 and D2 store the results for segment $E_2$, and C3 and D3 store the results for segment $E_3$.
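The register layout described above can be sketched in Python as follows. This is a simplified illustration under several assumptions that differ from the paper's algorithm: plain `%` reduction replaces MMR, the exponent is recoded into the binary non-adjacent form with digits in {-1, 0, 1} only, each D register accumulates a positive power that is inverted once at the end (so M must be coprime to N), and the three segment results are recombined by repeated squaring. The names `naf` and `segmented_sd_modexp` are hypothetical.

```python
def naf(E):
    """Binary non-adjacent form of E >= 0, least-significant digit first."""
    digits = []
    while E > 0:
        if E & 1:
            d = 2 - (E % 4)          # +1 if E = 1 (mod 4), -1 if E = 3 (mod 4)
            E -= d
        else:
            d = 0
        digits.append(d)
        E >>= 1
    return digits


def segmented_sd_modexp(M, E, N, parts=3):
    """Evaluate M**E mod N with per-segment C (positive) and D (negative) registers."""
    digits = naf(E)
    seg_len = -(-len(digits) // parts)                 # ceiling division
    digits += [0] * (seg_len * parts - len(digits))    # pad the top segment
    segs = [digits[j * seg_len:(j + 1) * seg_len] for j in range(parts)]  # segs[0] is lowest
    C = [1] * parts
    D = [1] * parts
    S = M % N
    for i in range(seg_len):
        for j in range(parts):                         # every update shares multiplicand S
            if segs[j][i] == 1:
                C[j] = (C[j] * S) % N
            elif segs[j][i] == -1:
                D[j] = (D[j] * S) % N
        S = (S * S) % N
    result = 1
    for j in reversed(range(parts)):                   # highest segment first
        for _ in range(seg_len):
            result = (result * result) % N             # shift by one segment length
        result = (result * C[j] * pow(D[j], -1, N)) % N
    return result
```

All register updates in one loop iteration multiply by the same value S, which is the situation the common-multiplicand technique is designed to exploit.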
Our main goal in applying the common-multiplicand-multiplication technique is that the common part among MMR(SC1), MMR(SD1), MMR(SC2), MMR(SD2), MMR(SC3), MMR(SD3), and MMR(SS) is computed just once. To achieve this goal, we define the Montgomery modular reduction computation MMR(AB) as follows:
$$\begin{aligned} \mathrm{MMR}(AB) &= ABR^{-1} \bmod N = A\,(B_{n-1} 2^{n-1} + B_{n-2} 2^{n-2} + \cdots + B_1 2 + B_0)\, 2^{-n} \bmod N \\ &= \big(B_{n-1}(A\, 2^{k-1} \bmod N) + B_{n-2}(A\, 2^{k-2} \bmod N) + \cdots + B_1 (A\, 2^{k-n+1} \bmod N) + B_0 (A\, 2^{k-n} \bmod N)\big)\, 2^{-k} \bmod N, \end{aligned} \tag{34}$$
where A, B and N are n-digit integers in base 2 and $R = 2^n \bmod N$. From the above, the intermediate reduction result $B_{n-1}(A\, 2^{k-1} \bmod N) + \cdots + B_1(A\, 2^{k-n+1} \bmod N) + B_0(A\, 2^{k-n} \bmod N)$ of MMR(AB) is bounded by an $((n + 1) + \lceil \log_2 n \rceil)$-bit integer when k = 2. If we denote T as $S\, 2^{-i+1} \bmod N$, the operations $S\, 2^{-i} \bmod N$ for $1 \le i \le n - 2$ are computed repeatedly from the previous result T, where $S\, 2^{-i} \bmod N = T\, 2^{-1} \bmod N$. Also notice that each step of a single modular multiplication in the CMM–MSD Montgomery algorithm has a similar step in the Montgomery modular reduction algorithm depicted in Section 2.3. Assume we have three exponent segments $E_1$, $E_2$, and $E_3$, each of m/3 digits, where m is the bit length of exponent E. Based on Eqs. (15)–(24) in Section 3.2, and since each segment is m/3 digits long, the exponentiation $M^{E_{MSD}}$ can be depicted as
$$M^{E} = M^{E_1 \| E_2 \| E_3} = M^{E_1 \cdot 2^{2m/3}} \times M^{E_2 \cdot 2^{m/3}} \times M^{E_3} \quad (\text{"}\|\text{" is the concatenation operator}). \tag{35}$$
Let the decomposition segments $E_1$, $E_2$, and $E_3$ be expressed as $E_1 = \sum_{i=0}^{m-1} e_{1i} 2^i$, $E_2 = \sum_{i=0}^{m-1} e_{2i} 2^i$, and $E_3 = \sum_{i=0}^{m-1} e_{3i} 2^i$, where $e_{1i}$, $e_{2i}$, and $e_{3i}$ are minimal signed digits with $e_{1i}, e_{2i}, e_{3i} \in \{0, 1, 2, \bar{1}, \bar{2}\}$. The exponentiations $M^{E_1[1]}$, $M^{E_2[1]}$, and $M^{E_3[2]}$ are evaluated for handling the positive signed digits in $E_1$, $E_2$, and $E_3$, respectively. Similarly, the exponentiations $M^{E_1[\bar{1}]}$, $M^{E_2[\bar{1}]}$, and $M^{E_3[\bar{2}]}$ are evaluated for handling the negative signed digits in $E_1$, $E_2$, and $E_3$, respectively. Finally, we output the exponentiation results $M^{E_1[1]} \bmod N$, $M^{E_2[1]} \bmod N$, $M^{E_3[2]} \bmod N$, $M^{E_1[\bar{1}]} \bmod N$, $M^{E_2[\bar{1}]} \bmod N$, and $M^{E_3[\bar{2}]} \bmod N$ into six different registers C1, C2, C3, D1, D2, D3, respectively. By applying the common-part extraction technique of the common-multiplicand-multiplication method (Section 2.2) to the three exponent segments $E_1$, $E_2$, and $E_3$, together with the related equations of Section 3.2, we obtain the four temporary exponents $E_{\mathrm{common}}$, $E_{1,c}$, $E_{2,c}$, and $E_{3,c}$ using Eqs. (15)–(18). It should be noted that all of these temporary exponents are bitwise mutually exclusive [26]. Therefore, the following eight exponentiations
$$M^{E_{\mathrm{common}}[1]},\ M^{E_{1,c}[1]},\ M^{E_{2,c}[1]},\ M^{E_{3,c}[2]},\ M^{E_{\mathrm{common}}[\bar{1}]},\ M^{E_{1,c}[\bar{1}]},\ M^{E_{2,c}[\bar{1}]},\ M^{E_{3,c}[\bar{2}]}$$
can be evaluated in a batch more efficiently using a modified version of the proposed CMM–MSD Montgomery exponentiation algorithm, as detailed in the next section.
3.4. The improved CMM–MSD Montgomery modular exponentiation algorithm
In the improved version of the CMM–MSD Montgomery algorithm, we define six exponentiation results $M^{E_{\mathrm{common}}[1]}$, $M^{E_{1,c}[1]}$, $M^{E_{2,c}[1]}$, $M^{E_{\mathrm{common}}[\bar{1}]}$, $M^{E_{1,c}[\bar{1}]}$, and $M^{E_{2,c}[\bar{1}]}$. Based on Lemma 2 in Section 3.2, we can improve the proposed CMM–MSD Montgomery algorithm of Section 3.3 by replacing the multiplicative inverse operation with a multiplication operation as follows:
Improved CMM–MSD Montgomery modular exponentiation algorithm
Input: M, E_MSD, N, R         /* E_MSD = (e_m e_{m-1} ... e_2 e_1)_SD2 */
Output: C1 = M^(Ecommon[1]); C2 = M^(E1,c[1]); C3 = M^(E2,c[1]); C4 = M^(E3,c[2]);
        D1 = M^(Ecommon[-1]); D2 = M^(E1,c[-1]); D3 = M^(E2,c[-1]); D4 = M^(E3,c[-2]);
    begin                                 /* M and N are n-digit integers in base 2 */
        C1 = C2 = C3 = C4 = D1 = D2 = D3 = D4 = 2^n;
        S = M × R mod N;                  /* R = 2^n */
        for i = 1 to m do                 /* scan exponent E_MSD from right to left */
        begin
            if (e'_1i = 1) then C1 = MMR(S·C1);    /* evaluate M^(Ecommon) for positive signed-digit */
            if (e'_1i = -1) then D1 = MMR(S·D1);   /* evaluate M^(Ecommon) for negative signed-digit */
            if (e'_2i = 1) then C2 = MMR(S·C2);    /* evaluate M^(E1,c) for positive signed-digit */
            if (e'_2i = -1) then D2 = MMR(S·D2);   /* evaluate M^(E1,c) for negative signed-digit */
            if (e'_3i = 1) then C3 = MMR(S·C3);    /* evaluate M^(E2,c) for positive signed-digit */
            if (e'_3i = -1) then D3 = MMR(S·D3);   /* evaluate M^(E2,c) for negative signed-digit */
            if (e'_4i = 2) then C4 = MMR(S·C4);    /* evaluate M^(E3,c) for positive signed-digit */
            if (e'_4i = -2) then D4 = MMR(S·D4);   /* evaluate M^(E3,c) for negative signed-digit */
            S = MMR(S·S);
        end;
    end.
Note that the eight exponents $E_{\mathrm{common}}[1]$, $E_{1,c}[1]$, $E_{2,c}[1]$, $E_{\mathrm{common}}[\bar{1}]$, $E_{1,c}[\bar{1}]$, $E_{2,c}[\bar{1}]$, $E_{3,c}[2]$, and $E_{3,c}[\bar{2}]$ in the improved CMM–MSD Montgomery modular exponentiation algorithm are bitwise mutually exclusive, as defined in Section 3.3. By applying this property, we can evaluate the modular exponentiation more efficiently, with fewer binary additions as well as bit multiplications [26]. The positive and negative signed-digit recoding results of $M^{E_{1,c}}$ are kept in registers C2 and D2, respectively; those of $M^{E_{2,c}}$ are kept in registers C3 and D3, respectively; and those of $M^{E_{3,c}}$ are kept in registers C4 and D4, respectively. To describe the improved CMM–MSD Montgomery modular exponentiation algorithm in detail, we define the eight registers C1, C2, C3, C4, D1, D2, D3, D4 as follows:
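$$C_1 = M^{E_{\mathrm{common}}[1]} \bmod N, \tag{36}$$
$$D_1 = M^{E_{\mathrm{common}}[\bar{1}]} \bmod N, \tag{37}$$
$$C_2 = M^{E_{1,c}[1]} \bmod N, \tag{38}$$
$$D_2 = M^{E_{1,c}[\bar{1}]} \bmod N, \tag{39}$$
$$C_3 = M^{E_{2,c}[1]} \bmod N, \tag{40}$$
$$D_3 = M^{E_{2,c}[\bar{1}]} \bmod N, \tag{41}$$
$$C_4 = M^{E_{3,c}[2]} \bmod N, \tag{42}$$
$$D_4 = M^{E_{3,c}[\bar{2}]} \bmod N. \tag{43}$$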
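From Eq. (15) in Section 3.2, we define $M^{E_{\mathrm{common}}[1]}$ and $M^{E_{\mathrm{common}}[\bar{1}]}$ as
$$M^{E_{\mathrm{common}}[1]} = M^{E_1[1]\ \mathrm{AND}\ E_2[1]}, \tag{44}$$
$$M^{E_{\mathrm{common}}[\bar{1}]} = M^{E_1[\bar{1}]\ \mathrm{AND}\ E_2[\bar{1}]}. \tag{45}$$
From Eq. (16) in Section 3.2, we define $M^{E_{1,c}[1]}$ and $M^{E_{1,c}[\bar{1}]}$ as
$$M^{E_{1,c}[1]} = M^{E_1[1]\ \mathrm{XOR}\ E_{\mathrm{common}}[1]}, \tag{46}$$
$$M^{E_{1,c}[\bar{1}]} = M^{E_1[\bar{1}]\ \mathrm{XOR}\ E_{\mathrm{common}}[\bar{1}]}. \tag{47}$$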
From Eq. (17) in Section 3.2, we define $M^{E_{2,c}[1]}$ and $M^{E_{2,c}[\bar{1}]}$ as
$$M^{E_{2,c}[1]} = M^{E_2[1]\ \mathrm{XOR}\ E_{\mathrm{common}}[1]}, \tag{48}$$
$$M^{E_{2,c}[\bar{1}]} = M^{E_2[\bar{1}]\ \mathrm{XOR}\ E_{\mathrm{common}}[\bar{1}]}. \tag{49}$$
From Eq. (18) in Section 3.2, we define $M^{E_{3,c}[2]}$ and $M^{E_{3,c}[\bar{2}]}$ as
$$M^{E_{3,c}[2]} = M^{E_3[2]\ \mathrm{XOR}\ E_{\mathrm{common}}[1]}, \tag{50}$$
$$M^{E_{3,c}[\bar{2}]} = M^{E_3[\bar{2}]\ \mathrm{XOR}\ E_{\mathrm{common}}[\bar{1}]}. \tag{51}$$
Note that the exponentiations $M^{E_1[1]}$ and $M^{E_2[1]}$ are evaluated for handling the positive signed digits in the exponent segments $E_1$ and $E_2$, respectively. Based on the definitions of Eqs. (19)–(21), we have the following:
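$$M^{E_1[1]} = M^{E_{1,c}[1] + E_{\mathrm{common}}[1]}, \tag{52}$$
$$M^{E_2[1]} = M^{E_{2,c}[1] + E_{\mathrm{common}}[1]}, \tag{53}$$
$$M^{E_3[2]} = M^{E_{3,c}[2] + E_{\mathrm{common}}[1]}. \tag{54}$$
The exponentiations $M^{E_1[\bar{1}]}$, $M^{E_2[\bar{1}]}$, and $M^{E_3[\bar{2}]}$ are evaluated for handling the negative signed digits in $E_1$, $E_2$, and $E_3$, respectively. Similarly, we obtain the following:
$$M^{E_1[\bar{1}]} = M^{E_{1,c}[\bar{1}] + E_{\mathrm{common}}[\bar{1}]}, \tag{55}$$
$$M^{E_2[\bar{1}]} = M^{E_{2,c}[\bar{1}] + E_{\mathrm{common}}[\bar{1}]}, \tag{56}$$
$$M^{E_3[\bar{2}]} = M^{E_{3,c}[\bar{2}] + E_{\mathrm{common}}[\bar{1}]}. \tag{57}$$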
To calculate the result for the three exponent segments $E_1$, $E_2$, and $E_3$ (each of m/3 digits), we define
$$M^{E_1} = M^{E_1[1] + E_1[\bar{1}]}, \tag{58}$$
$$M^{E_2} = M^{E_2[1] + E_2[\bar{1}]}, \tag{59}$$
$$M^{E_3} = M^{E_3[2] + E_3[\bar{2}]}. \tag{60}$$
From the definition of the m-bit minimal-Hamming-weight signed-digit recoded exponent $E_{MSD}$ with segments $E_1$, $E_2$, and $E_3$, we have
$$M^{E_{MSD}} = M^{E_1 \| E_2 \| E_3} \quad (\text{where "}\|\text{" is the concatenation operator}). \tag{61}$$
To obtain the final result, based on Eq. (35) defined earlier and with each segment m/3 digits long, we have
$$M^{E_{MSD}} \bmod N = M^{E_1 \cdot 2^{2m/3}} \times M^{E_2 \cdot 2^{m/3}} \times M^{E_3} = M^{(E_1 \cdot 2^{2m/3}) + (E_2 \cdot 2^{m/3}) + E_3} \bmod N. \tag{62}$$
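Example. We assume N = 37 × 41 = 1517, message M = 127, and E = 1213, and evaluate $M^E \bmod N$.
Sol. $E = (10010111101)_2 = (10100\bar{2}00\bar{1}01)_{SD}$; $E_1 \| E_2 \| E_3 = (101)_{SD} \| (00\bar{2}0)_{SD} \| (0\bar{1}01)_{SD}$.
For the positive value:
$C_1 \equiv M^{(101)_{SD}} \bmod N \equiv M^{(5)_{10}} \bmod N$; $C_2 \equiv M^{(0000)_{SD}} \bmod N \equiv M^{(0)_{10}} \bmod N$; and $C_3 \equiv M^{(0001)_{SD}} \bmod N \equiv M^{(1)_{10}} \bmod N$.
For the negative value:
$D_1 \equiv M^{(000)_{SD}} \bmod N \equiv M^{(0)_{10}} \bmod N$; $D_2 \equiv M^{(0020)_{SD}} \bmod N \equiv M^{(4)_{10}} \bmod N$; and $D_3 \equiv M^{(0100)_{SD}} \bmod N \equiv M^{(4)_{10}} \bmod N$.
$E = (10100000001)_{SD} + (\bar{1}000\bar{1}00)_{SD} = (10100000001)_2 - (1000100)_2 = A - B$. So, $A = (10100000001)_2 = 1281$ and $B = (1000100)_2 = 68$.
$M^E \bmod N \equiv 127^{1281-68} \bmod 1517 \equiv 1048$.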
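The example can be checked numerically by evaluating the positive part A and the negative part B separately and then dividing modulo N. This is a minimal sketch using Python's built-in `pow` (Python 3.8 or later for the modular inverse), not the Montgomery routines of the paper.

```python
N, M = 37 * 41, 127
A = 0b10100000001   # positive digits of the recoded exponent (1281)
B = 0b1000100       # magnitudes of the negative digits (68)
assert A - B == 1213

pos = pow(M, A, N)                    # power for the positive digits
neg = pow(M, B, N)                    # power for the negative digits
result = pos * pow(neg, -1, N) % N    # M^(A-B) = M^A * (M^B)^(-1) mod N

assert result == pow(M, 1213, N)
print(result)   # prints 1048, matching the value stated in the example
```

The two powers `pos` and `neg` do not depend on each other, which is what allows them to be evaluated concurrently.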
4. Computational complexity analyses
In this section, we describe in detail the theoretical analyses of the performance of the proposed CMM–MSD Montgomery algorithm. We first analyze the performance of the proposed algorithm and then calculate the number of binary additions and modular multiplications it needs. $S^{-1}$ is the inverse of S under modulus N; it can be pre-computed using the Euclidean algorithm or Euler's theorem [10]. Let the m-bit exponent E be recoded as the radix-r (r > 0) signed-digit representation $E_{MSD}$. Based on Lemma 3 given previously, the probabilities for the occurrences of positive digit "$r$" and negative digit "$\bar{r}$" (or "$-r$") are $\frac{(r-1)^2}{r(r+1)}$ and $\frac{(r-1)}{r(r+1)}$, respectively, and the probability for the occurrence of digit "0" is $\frac{2}{(r+1)}$. As the radix-3 nonzero digits "1" and "$\bar{1}$", and "2" and "$\bar{2}$", occur with equal probability [1], the occurrence probabilities of "0", "1", "$\bar{1}$", "2", and "$\bar{2}$" are $\{Prob(0) = 2/3;\ Prob(1) = Prob(\bar{1}) = Prob(2) = Prob(\bar{2}) = 1/12\}$.
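The paragraph above notes that the inverse of S modulo N can be pre-computed with the Euclidean algorithm. The following is a minimal sketch of the extended Euclidean algorithm for that purpose; the function name `mod_inverse` and the value chosen for S are placeholders for illustration only and are not taken from the paper.

```python
def mod_inverse(s, n):
    """Return the inverse of s modulo n via the extended Euclidean algorithm."""
    old_r, r = s % n, n
    old_x, x = 1, 0          # invariant: old_r == old_x * s (mod n), r == x * s (mod n)
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
    if old_r != 1:
        raise ValueError("s is not invertible modulo n")
    return old_x % n


N = 1517            # modulus from the worked example
S = 2 ** 16         # placeholder value, assumed only for this sketch
S_inv = mod_inverse(S, N)
assert S * S_inv % N == 1
```

An odd modulus N guarantees that any power of two is invertible, so the error branch is not reached for this choice of S.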
In the proposed CMM–MSD Montgomery algorithm, the computation of $(2^i S \bmod N)$ needs $(n-2) \times (n+1)$ single-precision multiplications for $1 \le i \le n-2$. Based on the computational analyses of the Montgomery reduction algorithm from [9], the probability of executing modular multiplication MMR(SC1), MMR(SC2), MMR(SC3), or MMR(SC4) in the CMM–MSD Montgomery algorithm is equal to the occurrence probability of signed digits "1" and "2" in $E_{MSD}$. Therefore, the operations MMR(SC1), MMR(SC2), MMR(SC3), and MMR(SC4) require in total
\[ 6 \times \frac{1}{12} \times [1.5m \times (n-2) \times (n+1)] = \frac{3}{4}\, m \times (n^2 - n - 2) \]
single-precision multiplications. Similarly, operations MMR(SD1), MMR(SD2), MMR(SD3), and MMR(SD4) require in total
\[ 6 \times \frac{1}{12} \times [1.5m \times (n-2) \times (n+1)] = \frac{3}{4}\, m \times (n^2 - n - 2) \]
single-precision multiplications.
The operation MMR(SS) requires
\[ \frac{2}{3} \times [0.5m \times (n-2) \times (n+1)] = \frac{1}{3}\, m \times (n^2 - n - 2) \]
single-precision multiplications, as the $(2^i S \bmod N)$ operations are computed exactly once. Dusse–Kaliski's Montgomery algorithm [6], which adopts the Montgomery modular reduction (MMR) algorithm, requires on average $1.5m \times (2n^2 + n)$ multiplications. Meanwhile, Ha–Moon's improved Montgomery binary algorithm takes $0.5m \times (5n^2 + 4n)$ multiplications [9] for an exponentiation with an m-bit exponent. However, the proposed CMM–MSD Montgomery modular exponentiation algorithm only takes
\[ \frac{3}{4}\, m (n^2 - n - 2) + \frac{3}{4}\, m (n^2 - n - 2) + \frac{1}{3}\, m (n^2 - n - 2) \approx 1.833m \times (n^2 - n - 2) \]
single-precision multiplications. Taking a 512-bit exponent E and a $2^{16}$-base N to evaluate $(M^E \bmod N)$ as an example, Ha–Moon's improved Montgomery algorithm reduces the overall number of single-precision multiplications by about 16% [9]. On average, the proposed CMM–MSD Montgomery algorithm in this paper reduces the overall number of single-precision multiplications (compared to Dusse–Kaliski's Montgomery algorithm [6]) by about
\[ 1 - \frac{1.833m \times (n^2 - n - 2)}{1.5m \times (2n^2 + n)} \approx \frac{1.167}{3} \approx 38.9\%. \]
Moreover, on average the proposed CMM–MSD Montgomery algorithm reduces the overall number of single-precision multiplications (compared to Ha–Moon's Montgomery algorithm [9]) by about
\[ 1 - \frac{1.833m \times (n^2 - n - 2)}{0.5m \times (5n^2 + 4n)} \approx \frac{0.667}{2.5} \approx 26.68\%. \]
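The three cost formulas can be tabulated directly to see how the reductions behave for a concrete size. The following is a minimal sketch; the function name `mult_counts` is hypothetical, and the choice n = 32 assumes a 512-bit modulus stored in 16-bit words, which the text does not state explicitly.

```python
def mult_counts(m, n):
    """Single-precision multiplication counts from the formulas in this section."""
    dusse_kaliski = 1.5 * m * (2 * n**2 + n)
    ha_moon = 0.5 * m * (5 * n**2 + 4 * n)
    proposed = (3 / 4 + 3 / 4 + 1 / 3) * m * (n**2 - n - 2)
    return dusse_kaliski, ha_moon, proposed


# Assumed sizes: 512-bit exponent, modulus of 32 words in base 2^16.
dk, hm, new = mult_counts(m=512, n=32)
print(f"reduction vs Dusse-Kaliski: {1 - new / dk:.1%}")
print(f"reduction vs Ha-Moon:       {1 - new / hm:.1%}")
```

The 38.9% and 26.68% figures keep only the leading $n^2$ terms, so the values printed for a finite n differ somewhat from them and approach them as n grows.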
5. Conclusions
Modular exponentiation is one of the most important operations in public-key cryptography; therefore, its efficient implementation has become a key factor affecting the performance of public-key cryptosystems [22]. In this paper, a new method (the CMM–MSD Montgomery algorithm) for speeding up modular exponentiation is investigated by using the Montgomery modular reduction method, the common-multiplicand-multiplication method, and the minimal-signed-digit exponent recoding technique. Based on the computational complexity analyses for the proposed modular exponentiation algorithm, we have the following observations. First, the proposed CMM–MSD Montgomery algorithm requires extra additions and shift operations compared with some modern Montgomery modular exponentiation algorithms. Nevertheless, on average the overhead of these extra operations is less significant in practice than the saving obtained by reducing the multiplications. Secondly, the evaluations of $M^{E_{common}[1]}$, $M^{E_{1,c}[1]}$, $M^{E_{2,c}[1]}$, $M^{E_{3,c}[2]}$ are independent of the evaluations of $M^{E_{common}[\bar{1}]}$, $M^{E_{1,c}[\bar{1}]}$, $M^{E_{2,c}[\bar{1}]}$, $M^{E_{3,c}[\bar{2}]}$. Therefore, these two groups of operations can be executed concurrently, depending on the positive and negative signed-digit representations of the exponent segments. Hence, the proposed CMM–MSD Montgomery algorithm can be made to work more efficiently by using parallel computing techniques. Moreover, the multiplicative inverse operation can be cheaply evaluated as for elliptic curves [15,29] or in the finite field using normal basis [20,21]. Therefore, we can further speed up the proposed CMM–MSD Montgomery modular exponentiation algorithm by evaluating the multiplicative inverse operation over the finite field using normal basis.
Furthermore, by using the proposed CMM–MSD Montgomery algorithm, on average the total number of single-precision multiplications can be reduced by about 38.9% and 26.68% as compared with Dusse–Kaliski's Montgomery algorithm [6] and Ha–Moon's Montgomery algorithm [9], respectively.
References
[1] S. Arno, F.S. Wheeler, Signed digit representations of minimal Hamming weight, IEEE Transactions on Computers 42 (8) (1993) 1007–1010.
[2] A. Avizienis, Signed digit number representation for fast parallel arithmetic, IRE Transactions on Electronic Computers EC-10 (3) (1961) 389–400.
[3] A.D. Booth, A signed binary multiplication technique, Quarterly Journal of Mechanics and Applied Mathematics 4 (1951) 236–240.
[4] W. Diffie, M.E. Hellman, New directions in cryptography, IEEE Transactions on Information Theory 22 (6) (1976) 644–654.
[5] V.S. Dimitrov, G.A. Jullien, W.C. Miller, Complexity and fast algorithms for multiexponentiations, IEEE Transactions on Computers 49 (2) (2000) 141–147.
[6] S.R. Dusse, B.S. Kaliski, A cryptographic library for the Motorola DSP 56000, in: Advances in Cryptology – Proceedings of EUROCRYPT'90, LNCS, vol. 473, Springer-Verlag, 1990, pp. 230–244.
[7] T. ElGamal, A public key cryptosystem and a signature scheme based on discrete logarithms, IEEE Transactions on Information Theory 31 (1985) 469–472.
[8] D.M. Gordon, A survey of fast exponentiation methods, Journal of Algorithms 27 (1) (1998) 129–146.
[9] J.-C. Ha, S.-J. Moon, A common-multiplicand method to the Montgomery algorithm for speeding up exponentiation, Information Processing Letters 66 (2) (1998) 105–107.
[10] D.E. Knuth, The Art of Computer Programming, Seminumerical Algorithms, vol. II, Addison-Wesley, MA, 1997.
[11] C.K. Koc, Tech. Notes, High-Speed RSA Implementation, RSA Labs. Tech. Note TR 201, August 14, 2007.
[12] (a) S.T. Klein, Should one always use repeated squaring for modular exponentiation?, Information Processing Letters 106 (6) (2008) 232–237; (b) D.-C. Lou, C.-C. Chang, Fast exponentiation method obtained by folding the exponent in half, Electronics Letters 32 (11) (1996) 984–985.
[13] I. Koren, Computer Arithmetic Algorithms, second ed., A.K. Peters, Natick, MA, 2002.
[14] P.L. Montgomery, Modular multiplication without trial division, Mathematics of Computation 44 (170) (1985) 519–521.
[15] Y.-H. Park, S. Jeong, J. Lim, Fast exponentiation in subgroups of finite fields, Electronics Letters 38 (13) (2002) 629–630.
[16] G.W. Reitwiesner, Binary Arithmetic, Advances in Computers, vol. 1, Academic Education Press, New York, 1960, pp. 231–308.
[17] R.L. Rivest, A. Shamir, L. Adleman, A method for obtaining digital signatures and public key cryptosystems, Communications of the ACM 21 (2) (1978) 120–126.
[18] W. Stallings, Cryptography and Network Security: Principles and Practice, Prentice-Hall, 1999.
[19] M. Syuto, E. Satake, K. Tanno, O. Ishizuka, A high-speed binary to residue converter using a signed-digit number representation, IEICE Transactions on Information and Systems E85-D (5) (2002) 903–905.
[20] N. Takagi, J. Yoshiki, K. Takagi, A fast algorithm for multiplicative inversion in GF(2^n) using normal basis, IEEE Transactions on Computers 50 (5) (2001) 394–398.
[21] Y. Watanabe, N. Takagi, K. Takagi, A VLSI algorithm for division in GF(2^m) based on extended binary GCD algorithm, IEICE Transactions on Fundamentals E85-A (5) (2002) 994–999.
[22] C.-L. Wu, Fast exponentiation based on common-multiplicand multiplication and minimal-signed-digit techniques, International Journal of Computer Mathematics 84 (10) (2007) 1405–1415.
[23] T.-C. Wu, Y.-S. Chang, Improved generalization common-multiplicand multiplications algorithm of Yen and Laih, Electronics Letters 31 (20) (1995) 1738–1739.
[24] C.-L. Wu, D.-C. Lou, T.-J. Chang, Fast binary multiplication by performing dot counting and complement recoding, Applied Mathematics and Computation 191 (1) (2007) 132–139.
[25] C.-L. Wu, D.-C. Lou, J.-C. Lai, T.-J. Chang, Fast modular multi-exponentiation using modified complex arithmetic, Applied Mathematics and Computation 186 (2) (2007) 1065–1074.
[26] J.-H. Yang, C.-C. Chang, Efficient residue number system iterative modular multiplication algorithm for fast modular exponentiation, Computers and Digital Techniques, IET 2 (1) (2008) 1–5.
[27] S.-M. Yen, Improved common-multiplicand multiplication and fast exponentiation by exponent decomposition, IEICE Transactions on Fundamentals E80-A (6) (1997) 1160–1163.
[28] S.-M. Yen, C.-S. Laih, Common-multiplicand multiplication and its applications to public key cryptography, Electronics Letters 29 (17) (1993) 1583–1584.
[29] N. Zhang, Z. Chen, G. Xiao, Efficient elliptic curve scalar multiplication algorithms resistant to power analysis, Information Sciences 177 (10) (2007) 2119–2129.