Applied Mathematics and Computation 191 (2007) 132–139 www.elsevier.com/locate/amc
Fast binary multiplication by performing dot counting and complement recoding Chia-Long Wu a
a,*
, Der-Chyuan Lou
b,*
, Te-Jen Chang
b
Department of Aviation and Communication Electronics, Chinese Air Force Institute of Technology, Kaohsiung 82042, Taiwan, ROC b Department of Electrical Engineering, Chung Cheng Institute of Technology, National Defense University, Tahsi, Taoyuan 33509, Taiwan, ROC
Abstract In this paper, we present a new computation method by combining the dots counting method and the complement recoding method to efficiently evaluate modular multiplication in binary multiplications. The LUT (Look-Up Table) technique can be adopted in our proposed method to efficiently reduce the number of multiplications. And the proposed method can be easily implemented for the hardware. Numerous examples are provided to show the efficient and easy operations of the binary multiplication method. In average case, the worst case for the Hamming weight of the product X*Y is uv , where u is the bit-length of multiplicand X, v is the bit-length of multiplier Y. If the proposed method is applied, we 4 could effectively reduce the Hamming weight of the product X*Y to 0.25u. Ó 2007 Elsevier Inc. All rights reserved. Keywords: Look-Up Table; Hamming weight; Binary multiplication; Complement recoding method; Dots counting method
1. Introduction With the increasing popularity of electronic communication, data security becomes more and more important. Binary multiplication is an important operation in modern electronic machines. All these systems for their security are based on some mathematical functions. RSA [1] is the most widely used public key cryptosystem. An RSA operation is a modular exponentiation operation, which requires repeated modular multiplication. So we need some novel and fast modular multiplication algorithms such as the Montgomery reduction method [2,3], high-radix method [4], addition chains method [5], square-and-multiply method [6,7], exponent-folding method [8], residue number conversion method [9], key-partitioning method [10], and signed-digit recoding method [11], etc. to solve this operation. Thus, it can be seen how to reduce ‘‘multiplications’’ is very important in many aspects especially for cryptographic application. Liu et al. introduce ‘‘New methods for binary multiplication’’ in [12]. They present a new method ‘‘dots counting’’ to speed up binary multiplication. In this paper, we will describe a novel method *
Corresponding authors. E-mail addresses:
[email protected] (C.-L. Wu),
[email protected] (D.-C. Lou).
0096-3003/$ - see front matter Ó 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.amc.2007.02.136
C.-L. Wu et al. / Applied Mathematics and Computation 191 (2007) 132–139
133
to combine the dots counting method with the complement recoding method to speed up the operation of multiplications. The rest of the paper is organized as follows. In Section 2, the modular multiplication method, the dots counting method and the complement recoding method are depicted. In Section 3, combining the dots counting method with the complement recoding method is described in the proposed method. The computational complexity analyses will be described in Section 4. Finally, we give the conclusions and future works in Section 5. 2. Mathematical preliminaries There are some important methods for speeding up the computation of exponentiations. They are described as follows. 2.1. Modular multiplication method In 1978, Rivest et al. [1] introduced one of the most successful and widely used public key cryptosystems based on computing modular exponentiations and modular multiplications. The public key ðP ; N Þ is used to encrypt a message and the private key ðQ; N Þ is used to decrypt a message in RSA cryptosystem. The message M is divided into a sequence of blocks and each block is considered as an integer between 1 and N. We use C M Q mod N to encrypt the message for each message block M and use M C P mod N for each ciphertext block to decrypt the ciphertext C. Exponentiations are performed by repeated squaring and multiplication operations. Let the binary representation of the exponent E been en1 . . . e2 e1 , then we can have n1
ME ¼ M2
en
2
M2
e3
1
M2
e2
0
M2
e1
ð1Þ
:
A simple way to perform modular exponentiation is to repeat the modular squaring and modular multiplication operations from the Least-Significant Bit (LSB) of E to the Most-Significant Bit (MSB) of E, where E is the bit-length of the exponent. In this method, modular multiplications are the fundamental operations during iterations. 2.2. Multiplication by counting dots The multiplicand X and the multiplier Y can be expressed in binary form as follows. X ¼ ðX u X u1 . . . X 2 X 1 Þ2 and Y ¼ ðY v Y v1 . . . Y 2 Y 1 Þ2 are two unsigned binary integers, where u P v. It means there are u and v bits for the bit-length of X and Y, respectively. The multiplicand X and the multiplier Y are written in two rows and keep abreast of right. Insert 2u-1 spaces and each space is expressed by a ‘‘–’’ between the two rows to represent the 2u 1 new positions as shown in Fig. 1. The row of new positions is defined by Q(new). The ith position which is denoted Qi(new) is counted from right to left in Q(new) for i ¼ 1; 2; . . . ; 2u 1 [12]. The dots counting method for multiplication can be completed using Step 1 and Step 2 as follows. Step 1: Dots determination We check each bit in the multiplicand X and the multiplier Y. If there are k pairs of symmetric ‘‘1’’ from the row above and below Qi(new), then place k dots above the ‘‘–’’ at Qi(new). This process should be preceded for all bits in X and Y. Let us take an example as shown in Example 1.
Xu
Xu-1
Xu-2 … X2
X1
Yv
Yv-1
Yv-2 …
Y1
Y2
⇐ X ⇐ Q (new) ⇐ Y
Fig. 1. The process of inserting a row of spaces for computing the product X*Y.
134
C.-L. Wu et al. / Applied Mathematics and Computation 191 (2007) 132–139
Example 1. Let X ¼ ð11111Þ2 and Y ¼ ð1001Þ2 . Determine the dots positions. Sol. There are two pairs of ‘‘1’’ symmetric Q4(new): ðX 1 ; Y 4 Þ and ðX 4 ; Y 1 Þ as shown in Fig. 2. Other dots can be deduced following the above process. They are shown in Fig. 2. Step 2: Binary-restoring descriptions We assume a sequence S ¼ S m S m1 . . . S 2 S 1 , where each Si is a non-negative decimal integer. Binary-restoring S is a procedure of transforming the decimal number into the binary number in this sequence [1,13–15]. This procedure is defined as follows. Start from S1 and check each Si from right to left. Set S i ¼ j and S iþ1 ¼ S iþ1 þ k (carry propagation with k as the carry.) Let us take two examples as shown in Examples 2 and 3. Example 2. Let a sequence S ‘‘011022011’’ in decimal number. Evaluate the binary-restoring result of the sequence. Sol. Because the sequence in decimal number is 11022011, it means S 8 ¼ S 7 ¼ S 2 ¼ S 1 ¼ 1; S 5 ¼ S 4 ¼ 2. We assume the binary-restoring result of the sequence is binary number (S 08 S 07 S 06 S 05 S 04 S 03 S 02 S 01 Þ2 . The following procedures are described for the binary-restoring method: S 01 :
ðS 1 þ 0Þ ¼ 0 . . . 1 ) S 01 ¼ 1; 2
ð2Þ
S 02 :
ðS 2 þ 1Þ ¼ 0 . . . 1 ) S 02 ¼ 1; 2
ð3Þ
S 03 :
ðS 3 þ 1Þ ¼ 0 . . . 0 ) S 03 ¼ 0; 2
ð4Þ
S 04 :
ðS 4 þ 0Þ ¼ 1 . . . 0 ) S 04 ¼ 0; 2
ð5Þ
S 05 :
ðS 5 þ 1Þ ¼ 1 . . . 1 ) S 05 ¼ 1; 2
ð6Þ
S 06 :
ðS 6 þ 1Þ ¼ 0 . . . 1 ) S 06 ¼ 1; 2
ð7Þ
S 07 :
ðS 7 þ 0Þ ¼ 0 . . . 1 ) S 07 ¼ 1; 2
ð8Þ
S 08 :
ðS 8 þ 0Þ ¼ 0 . . . 1 ) S 08 ¼ 1: 2
ð9Þ
Then we get the binary-restoring result of the sequence ‘‘11022011’’ is 11110011.
X5 1
X4 1
X3 1
X2 1
X1 1
X
Q (new) 1 0 0 1 Y4 Y3 Y2 Y1 1 1 0 2 2 0 1 1
Y X*Y
Fig. 2. Dots determination for multiplication.
C.-L. Wu et al. / Applied Mathematics and Computation 191 (2007) 132–139
135
Example 3. Let a sequence in decimal number be 413. Evaluate the binary-restoring result of the sequence. Sol. Because the sequence in decimal number is 413, it means S 3 ¼ 4, S 2 ¼ 1, and S 1 ¼ 3. We assume the binaryrestoring result of the sequence is the binary number ðS 05 S 04 S 03 S 02 S 01 Þ2 . The following procedures are described for the binary-restoring method: S 01 :
ðS 1 þ 0Þ ¼ 1 . . . 1 ) S 01 ¼ 1; 2
ð10Þ
S 02 :
ðS 2 þ 1Þ ¼ 1 . . . 0 ) S 02 ¼ 0; 2
ð11Þ
ðS 3 þ 1Þ ¼ 2 . . . 1 ) S 03 ¼ 1; 2 ðS 4 þ 2Þ ¼ 1 . . . 0 ) S 04 ¼ 0; S 04 : 2 S 03 :
ð12Þ ð13Þ
ðS 5 þ 1Þ ¼ 0 . . . 1 ) S 05 ¼ 1: 2 Then, we get the binary-restoring result of the sequence ‘‘413’’ is 10101. S 05 :
ð14Þ
Example 4. Let X ¼ ð10101011Þ2 and Y ¼ ð1011101Þ2 . Evaluate the product of X Y . Sol. The dots are computed on the pairs of symmetric ‘‘1’’ in X and Y in Step 1. The binary-restoring procedure is determined in Step 2. The dots descriptions and the result are shown in Fig. 3. 2.3. Complement recoding method There are two categories in the complement expressions. One is 1’s complement and the other is 2’s complement. We describe them respectively as follows. 2.3.1. 1’s complement Let W be an integer in binary representation and there are r bits in W. W is indicated as W r W r1 . . . W 2 W 1 and the leftmost bit is the most significant one [15–17]. We define the symbol of the 1’s complement of W to be W . A way to find the 1’s complement is simply to take the complement of the binary number bit by bit. For example: W ¼ ð00000110Þ2 and W ¼ ð11111001Þ2 (replace ‘‘0’’ with ‘‘1’’ and ‘‘1’’ with ‘‘0’’). We can use Eq. (15) to compute the complement of W: W ¼ ð100 . . . 00Þðrþ1Þbits W 1;
ð15Þ
where W ¼ W r W r1 . . . W 1 and W i ¼ 0 if W i ¼ 1, or W i ¼ 1 if W i ¼ 0, for i ¼ 1; 2; 3; . . . ; r.
1
0
1 0 1 0 . . . . . . . . . . . . . . . . . . - - - - - - - - - - - - 1 0 1 1 1 1021314232 1111100001 .
1 1 . . . . . . - - - 0 1 3111 1111
X Step 1 Q(new) Y (decimal) Step 2
Fig. 3. Computational results in Example 4.
136
C.-L. Wu et al. / Applied Mathematics and Computation 191 (2007) 132–139
Example 5. Let W ¼ ð101101Þ2 . Evaluate 1’s complement of W. Sol. According to Eq. (15), W ¼ ð1000000Þ2 ð101101Þ2 1 ¼ ð010010Þ2 is obtained. 2.3.2. 2’s complement Let Q be an integer in binary representation and there are d bits in Q. Q is indicated as Qd Qd1 . . . Q2 Q1 and the leftmost bit is the most significant one. Calculate the 2’s complement of Q by following the two steps as shown. Step 1: Invert the binary equivalent of the number by changing all of the ones to zeros and all of the zeros to ones. Step 2: The above result adds 1 and the final result is obtained (2’s complement of Q). Let us take an example as shown in Example 6. Example 6. Let Q ¼ ð00010001Þ2 . Evaluate 2’s complement of Q. Sol. Step 1: (00010001)
!
invert bits
(11101110).
Step 2: ð00010001Þ2 þ ð1Þ2 ¼ ð11101111Þ2 . Then, we can obtain 2’s complement of Q is (11101111)2. 3. Proposed method In 2003, Chang, Kuo, and Lin proposed an efficient method to compute the general multiplication operation and for the common-multiplicand multiplication by performing complement recoding method. In this paper, the fast binary multiplication method is proposed by combining the dots counting method with the complement recoding method. Based on the definitions which are shown in Section 2.3, we can transform X and Y into 1’s complements of X and Y, which are defined X and Y , respectively. So 1’s complements of X and Y are shown as follows: X ¼
u X
xi 2i ¼ ð10 . . . 0Þðuþ1Þbits X 1;
ð16Þ
i¼1
where X ¼ xu xu1 . . . x1 , and ( xi ¼ 0 if xi ¼ 1; xi ¼ 1
if xi ¼ 0;
for i ¼ 1; 2; . . . ; u: Y ¼
v X
y i 2i ¼ ð10 . . . 0Þðvþ1Þbits Y 1;
i¼1
where Y ¼ y v y v1 . . . y 1 , and ( y i ¼ 0 if y i ¼ 1; y i ¼ 1 if y i ¼ 0;
ð17Þ
C.-L. Wu et al. / Applied Mathematics and Computation 191 (2007) 132–139
137
for i ¼ 1; 2; . . . ; v: Then the original multiplication result of X Y can be rewritten as follows. X Y ¼ ½ð100 . . . 00Þuþ1 X 1 ½ð100 . . . 00Þvþ1 Y 1 ¼ ð100 . . . 00Þuþvþ1 Y ð100 . . . 00Þuþ1 ð100 . . . 00Þuþ1 X ð100 . . . 00Þvþ1 þ X Y þ X ð100 . . . 00Þvþ1 þ Y þ 1:
ð18Þ
The proposed algorithm is depicted as shown in Algorithm 1. Algorithm 1 (The proposed algorithm). Input: X ; Y Output: C begin Count the Hamming weight of X, which denotes as Ham(X). Count the Hamming weight of Y, which denotes as Ham(Y). if Ham ðX Þ P u2 and Ham ðY Þ P 2v , Store the ‘‘dots counting’’ information of Y into Look-Up Table in advance. C ¼ ð100 . . . 00Þuþvþ1 Y ð100 . . . 00Þuþ1 ð100 . . . 00Þuþ1 X ð100 . . . 00Þvþ1 þ X Y þ X ð100 . .. 00Þvþ1 þ Y þ 1:= u is the bit-length of X and v is the bit-length of Y*/ if Ham ðX Þ P u2 and Ham ðY Þ < 2v , Store the ‘‘dots counting’’ information of Y into Look-Up Table in advance. C ¼ ð100 . . . 00Þuþvþ1 Y ð100 . . . 00Þuþ1 ð100 . . . 00Þuþ1 X ð100 . . . 00Þvþ1 þ X Y þ X ð100 .. .00Þvþ1 þ Y þ 1:= u is the bit-length of X and v is the bit-length of Y*/ if Ham ðX Þ < u2 and Ham ðY Þ P 2v , Store the ‘‘dots counting’’ information of Y into Look-Up Table in advance. C ¼X Y
if Ham(X Þ < u2 and Ham(Y Þ < 2v , Store the ‘‘dots counting’’ information of Y into Look-Up Table in advance.
C ¼X Y
ð19Þ
return C end. 4. Computational complexity analyses In this section, we will describe the computational complexity of the proposed algorithm. We define the bitlength of X is u and the bit-length of Y is v. So, four cases will be discussed for the product of X Y [18,22]. That is,
Case 2: Ham ðX Þ Case 3: Ham ðX Þ Case 4: Ham ðX Þ
u
and Ham ðY Þ P 2v , 2u P 2 and Ham ðY Þ < 2v , < u2 and Ham ðY Þ P 2v , < u2 and Ham ðY Þ < 2v .
Case 1: Ham ðX Þ P
Because the ‘‘dots counting’’ information of the multiplier Y can be stored by LUT in advance, the computational complexity of the multiplier Y can be negligible as compared with a multiplication operation [3,5,16,19,20] and the Ham (X) is composed of the following two groups.
138
C.-L. Wu et al. / Applied Mathematics and Computation 191 (2007) 132–139
Group 1: Ham ðX Þ P u2 including Ham ðY Þ P 2v , and Ham ðY Þ < 2v ,We can consider uthe conditions of , in other words, Case 1 and Case 2 simultaneously as Group 1. When we assume Ham ðX Þ P 2 Ham ðX Þ < u2 is obtained. Although there are nine items in Eq. (18), we only consider X Y item and other items are only considered the shift position problem [17,21]. That is, other eight items are not considered for computational complexity because we can use ‘‘LUT’’ for storing the ‘‘dots counting’’ information of Y in advance. Then the number of multiplications for the product of X*Y is u2 , where u is the Hamming weight of X. Example 7. Let X ¼ ð10101011Þ2 and Y ¼ ð11011111Þ2 . Evaluate the product of X*Y. Sol. Here, Ham ðX Þ > 4 and Ham ðY Þ > 4; so X Y ¼ ð01010100Þ2 ð00100000Þ2 ¼ ð101010000000Þ2 . Then X Y ¼ ½ð100000000Þ2 ð01010100Þ2 1½ð100000000Þ2 ð00100000Þ2 -1] = (1001010011110101)2. We know the computational complexity for the product of X*Y is 3 in Eq. (18). Group 2: Ham ðX Þ < u2 including Ham ðY Þ P 2v and Ham ðY Þ < 2v .Since the Hamming weight of X is smaller than u2 and the ‘‘dots counting’’ information of the multiplier Y can be stored by LUT in advance, the number of multiplications for the product X*Y are u2 . Example 8. Let X ¼ ð110000Þ2 and Y ¼ ð100100Þ2 . Evaluate the product of X Y . Sol. Here, Ham ðX Þ < 4 and Ham ðY Þ < 4, then X Y ¼ ð110000Þ2 ð100100Þ2 ¼ ð11011000000Þ2 . We know the computational complexity for the product of X*Y is 1 in Eq. (19). Let X ¼ ðX u X u1 . . . X 2 X 1 Þ2 and Y ¼ ðY v Y v1 . . . Y 2 Y 1 Þ2 be two unsigned binary integers. We assume u P v. In average, the Hamming weights of X and Y are u2 and 2v, respectively. If we calculate the product of X Y directly, the worst case for the Hamming weight of the product is uv 4 . But we let ‘‘dots counting’’ information of the multiplier Y store in a LUT in advance [23], it means the LUT can be pre-computed and the computational complexity of multiplier Y is negligible as compared with a multiplication operation [24], i.e. the bit-length v of Y is no longer considered, then we could effectively reduce the Hamming weight of the product of X Y to become 0.25u. 5. Conclusions and future work From the above discussions and complexity analyses, both the occurrence probability of Group 1 and Group 2 are 12, and the average Hamming weight of non-zero in binary representation for multiplicand X and Y are 12, respectively. So, on average case, the number of multiplications for X*Y by using dots counting method and the complement recoding method is 12 12 u ¼ 14 u, where u is the bit-length of multiplicand X. We know the Lou–Wu-Chen method [8] takes 0.689u + 11 multiplications, where u is the bit-length of the multiplicand or the multiplier and the Yang–Guan–Laih’s method [25] takes 1.509u multiplications, where u is the bit-length of the multiplicand or the multiplier. However, the proposed method can reduce the number of multiplications to 0.25u. In conclusion, the proposed method can efficiently reduce the number of multiplications for the product of X*Y by combining dots and complement recoding methods. Furthermore, for computational complexity analyses aspect, we can easily prove that the proposed fast binary multiplication method can work very effectively by reducing the number of multiplications. References [1] R.-L. Rivest, A. Shamir, L. Adleman, A method for obtaining digital signatures and public-key cryptosystems, Communication of ACM 21 (2) (1978) 120–126. [2] O. Nibouche, A. Bouridane, M. Nibouche, Architectures for Montgomery’s multiplication, IEE Proceedings of Computers and Digital Techniques 150 (6) (2003) 361–368.
C.-L. Wu et al. / Applied Mathematics and Computation 191 (2007) 132–139
139
[3] P.L. Montgomery, Modular multiplication without trial division, Mathematics of Computation 44 (170) (1985) 519–521. [4] Y. Wang, J. Leiwo, T. Srikanthan, Efficient high radix modular multiplication for high-speed computing in re-configurable hardware, Proceedings of the IEEE International Symposium on Circuits and Systems 2 (May) (2005) 1226–1229. [5] L.M. Mourelle, N. Nedjah, Reconfigurable hardware for addition chains based modular exponentiation, Proceedings of the Information Technology: International Conference on Coding and Computing 1 (April) (2005) 603–607. [6] M. Khabbazian, T.A. Gulliver, V.K. Bhargava, A new minimal average weight representation for left-to-right point multiplication methods, IEEE Transactions on Computers 54 (11) (2005) 1454–1459. [7] D.E. Knuth, The Art of Computer Programming Vol. II: Seminumerical Algorithms, third ed., Addition-Wesley, MA, 1997. [8] D.-C. Lou, C.-L. Wu, C.-Y. Chen, Fast exponentiation by folding the signed-digit exponent in half, International Journal of Computer Mathematics 80 (10) (2003) 1251–1259. [9] H. Nozaki, A. Shimbo, S. Kawamura, RNS Montgomery multiplication algorithm for duplicate processing of base transformations, IEICE Transactions on Fundamentals E.86-A (1) (2003) 89–97. [10] S.-Y. Lee, Y.-J. Jeong, O.-J. kawon, A faster modular multiplication based on key size partitioning for RSA public-key cryptosystem, IEICE Transactions on Information and System E85-D (4) (2002) 789–791. [11] C. Heuberger, H. Prodinger, Carry propagation in signed digit representations, European Journal of Combinatorics 24 (3) (2003) 293–320. [12] B.-Q. Liu, D.-H. Chen, C.-G. Xiong, K. Xing, New methods for binary multiplication, International Journal of Computer Mathematics 82 (1) (2005) 13–22. [13] N. Clarke, A. Cantoni, Implementation of dynamic look-up tables, IEEE Proceedings of Computers and Digital Techniques 141 (6) (1994) 391–397. [14] C. McIvor, M. McLoone, J.-V. McCanny, Modified Montgomery modular multiplication and RSA exponentiation techniques, IEEE Proceedings of Computers and Digital Techniques 151 (6) (2004) 402–408. [15] C.-C. Chang, Y.-T. Kuo, C.-H. Lin, Fast algorithms for common-multiplicand multiplication and exponentiation by performing complements, in: Proceedings of 17th IEEE International Conference on Advanced Information Networking and Applications, March 2003, pp. 807–811. [16] A.-A. Ziya, S. Remziye, A hardware version of the RSA using the Montgomery’s algorithm with systolic arrays, Integration, The VLSI Journal 38 (2) (2004) 299–307. [17] S. Itagaki, M. Mambo, H. Sshizuya, On the strength of the strong RSA assumption, IEICE Transactions Fundamentals E86-A (5) (2003) 1164. [18] O. Nibouche, M. Nibouche, B. Ahmed, New iterative algorithms for modular multiplication, Signal Processing 84 (10) (2004) 1919– 1930. [19] F. Huang, Z.-H. Guan, A modified method of a class of recently presented cryptosystems, Chaos, Solitons and Fractals 23 (5) (2005) 1893–1899. [20] C.-L. Wu, D.-C. Lou, T.-J. Chang, Fast binary multiplication method for RSA cryptosystem, in: Proceedings of the Symposium on Applications of International, Management and Communication Technology, Session D, Jun. 2005, pp. 28–36. [21] J.-H. Hong, B.-Y. Tsai, L.-T. Lu, S.-H. Shieh, A novel radix-4 bit-level modular multiplier for fast RSA cryptosystem, in: Proceedings of the 2004 International Symposium on Circuits and Systems, vol. 2, May 2004, pp. II 837–840. [22] N. Nedjah, L.M. Mourelle, A review of modular multiplication methods and respective, Informatica on Hardware Implementations 30 (1) (2006) 111–129. [23] M.-M. Luiza, N. Nediah, Fast reconfigurable hardware for the m-ary modular exponentiation, in: Proceedings of the EUROMICRO Systems on Digital System Design, 2004, pp. 516–523. [24] N. Nedjah, L.M. Mourelle, Four hardware implementations for the m-ary modular exponentiation, in: Proceedings of the Third International Conference on Information Technology, New Generations, April 2006, pp. 210–215. [25] W.-C. Yang, D.J. Cuan, C.-S. Laih, Algorithm of asynchronous binary signed-digit recoding on fast multi-exponentiation, Applied Mathematics and Computation 167 (1) (2005) 108–117.