High performance FPGA-based decimal-to-binary conversion schemes for decimal arithmetic



Microprocessors and Microsystems 37 (2013) 287–298

Contents lists available at SciVerse ScienceDirect

Microprocessors and Microsystems journal homepage: www.elsevier.com/locate/micpro

Osama Al-Khaleel a,⇑, Zakaria Al-Qudah b, Mohammad Al-Khaleel c, Christos Papachristou d

a Department of Computer Engineering, Jordan University of Science and Technology, Irbid, Jordan
b Department of Computer Engineering, Yarmouk University, Irbid, Jordan
c Department of Mathematics, Yarmouk University, Irbid, Jordan
d Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, United States

Article info

Article history: Available online 6 February 2013

Keywords: FPGAs; Decimal arithmetic; BCD; Conversion; LUT; Schemes

Abstract

Although decimal arithmetic has been recognized as more suitable than binary arithmetic for human-centric applications, binary arithmetic is still predominant in today's computers. One approach to bridging this gap involves converting the decimal operands to binary, performing arithmetic in binary, and converting the result back to decimal. Based on this approach, this paper presents novel high-performance decimal-to-binary conversion circuits to support decimal arithmetic over different FPGA families. Our circuits are based on a simple, yet effective idea. Bits of the BCD input are grouped into a number of groups. The contribution of each group to the overall binary result is computed separately. These contributions are then added to form the final binary result. The performance evaluation presented in this paper indicates that the proposed circuits perform significantly better than existing BCD-to-binary conversion circuits. Furthermore, for a given FPGA family, the comparison reveals that certain bit groupings may perform better than others. In addition, we have studied the growth in area and time for each bit-grouping scheme with respect to the number of digits in the BCD input.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Many of today's applications, such as financial, Internet-based, and scientific applications, handle decimal operands. There are two major approaches for performing arithmetic on decimal operands.1 The first approach involves directly manipulating these decimal operands (such as [1]), which has the advantage of reducing potential rounding errors [2]. Furthermore, for applications that require extensive processing of decimal operands, direct manipulation of these decimal numbers might promise better performance due to the elimination of BCD-to-binary and binary-to-BCD conversions. The second approach, as illustrated in Fig. 1, involves converting the decimal operands to binary, performing the required arithmetic in binary, and converting the result back to decimal. The advantage of this approach is that it utilizes the already predominant binary arithmetic hardware. Furthermore, some applications require converting the operands to binary once, performing several operations on the converted operands, and converting the final result back to decimal. For these applications, the second approach promises better performance due to the use of the already optimized binary arithmetic hardware.

Fig. 1. BCD arithmetic based on binary hardware. (Data flow: BCD operands → BCD2BIN conversion circuits → binary operands → binary arithmetic hardware → binary results → BIN2BCD conversion circuits → BCD results.)

In this paper, we present several high performance architectures for decimal-to-binary conversion to support decimal arithmetic. We first split the BCD input into several groups of bits. We then compute the contribution of each group to the final result and add the contributions to form the final result. For example, suppose that the BCD operand to be converted to binary is $(346)_{10}$ or $(0011\ 0100\ 0110)_{BCD}$. For a group size of 4 bits (one BCD digit), the digit 6 contributes $(0110)_2$ to the final binary result. The digit 4 contributes $(40)_{10}$ or $(101000)_2$. The digit 3 contributes $(300)_{10}$ or $(100101100)_2$. After obtaining the contribution of each digit in the BCD operand, we add these contributions to obtain the final result. Therefore, adding $(0110)_2 + (101000)_2 + (100101100)_2$ in binary results in $(101011010)_2 = (346)_{10}$.

While this approach seems simple, we show in Section 6 that it performs significantly better than existing techniques for converting BCD numbers to their binary equivalents on FPGAs. There are several reasons for this superior performance. First, all the contributions are computed in parallel, and the addition of these contributions is done via the fast carry-chain logic in the FPGA, which results in a fast circuit. Second, the outputs of the contribution generation circuit for a given group are functions of a number of variables equal to the group size. For example, in a 4-bit grouping, the outputs of the group contribution generation are functions of the 4 bits of the group. By choosing the group size that matches the look-up table size on an FPGA family, each function requires only one look-up table (LUT) in the FPGA, which results in a compact overall design.

The rest of this paper is organized as follows: Section 2 discusses techniques for decimal-to-binary conversion and other related work. Section 3 discusses the proposed techniques in detail. Section 4 provides area and delay analysis for the 4-bit grouping scheme. Section 5 discusses the implementation of our schemes on various FPGA architectures. Section 6 discusses the performance of our techniques. We conclude in Section 7.

⇑ Corresponding author. Tel.: +962 772107130. E-mail addresses: [email protected] (O. Al-Khaleel), [email protected] (Z. Al-Qudah), [email protected] (M. Al-Khaleel), [email protected] (C. Papachristou).
1 We assume Binary-Coded-Decimal (BCD) representation for these decimal operands. Therefore, we use the terms decimal and BCD numbers interchangeably.
0141-9331/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.micpro.2013.01.002

2. Related work

As mentioned in Section 1, there are two major approaches to decimal arithmetic: direct manipulation of BCD numbers and BCD arithmetic based on binary hardware. For example, the authors in [3,4] present architectures for BCD digit by BCD digit multiplication. Vazquez et al. [1] present an architecture that operates on multiple BCD digits. The authors in [5,6] present techniques for BCD addition/subtraction on two BCD operands, while [7] operates on multiple operands.

The architectures described in [8–10] are examples of BCD arithmetic based on binary hardware (the BCD number is first converted to its binary equivalent, the arithmetic is performed in binary, and the result is converted back to BCD). The authors highlight several techniques to perform decimal-to-binary conversion. For example, one technique is the traditional successive division by two. In this technique, the BCD input is shifted right by one bit position. Each BCD digit in the shifted number is tested. If the BCD digit is greater than or equal to 8, the number 3 is subtracted from the digit [8,11–13], or the most significant bit in the digit is cleared and the number 5 is added to the digit [8]. The procedure is repeated until all bits are generated.

Another method for converting a BCD number to its binary equivalent is direct computation based on the following formula in binary:

$$D_{n-1} \cdots D_1 D_0 = D_{(n-1)} \times 10^{n-1} + \cdots + D_1 \times 10^1 + D_0 \times 10^0. \tag{1}$$

A disadvantage of converting BCD numbers based on this formula is that it requires multiplication by powers of 10. As the number of digits grows, the size of the multipliers needed quickly grows. The authors in [8] re-write the above equation using Horner's rule as follows:

$$D = (((D_{n-1} \times 10 + D_{n-2}) \times 10 + \cdots) \times 10 + D_0). \tag{2}$$

The authors in [8] also present other arrangements of this formula to increase parallelism of computations as follows:

$$D = (((D_{n-1} \times 10 + D_{n-2}) \times 100 + \cdots) \times 100 + (D_1 \times 10 + D_0)), \tag{3}$$

$$D = (((D_{n-1} \times 100 + D_{n-2} \times 10 + D_{n-3}) \times 10^3 + \cdots) \times 10^3 + (D_2 \times 100 + D_1 \times 10 + D_0)), \tag{4}$$

$$D = ((((D_{n-1} \times 10 + D_{n-2}) \times 100 + D_{n-3} \times 10 + D_{n-4})) \times 10^4 + \cdots) \times 10^4 + ((D_7 \times 10 + D_6) \times 100 + D_5 \times 10 + D_4) \times 10^4 + ((D_3 \times 10 + D_2) \times 100 + D_1 \times 10 + D_0). \tag{5}$$
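To make the Horner-style arrangements concrete, the following is a minimal software sketch of Eq. (2) and Eq. (3). It is only an illustration of the arithmetic, not the hardware of [8]; the function names are hypothetical, and padding an odd digit count with a leading zero in the pairwise version is an assumption made here for convenience.

```python
def horner_radix10(digits):
    """Eq. (2): chain one multiply-by-10 and one add per digit.

    digits lists the decimal digits most significant first.
    """
    acc = 0
    for d in digits:
        acc = acc * 10 + d
    return acc


def horner_radix100(digits):
    """Eq. (3): combine digit pairs first, then chain with multiply-by-100.

    The pair values (D1*10 + D0, ...) are independent of each other, which is
    where the extra parallelism comes from. An odd digit count is padded with
    a leading zero (an assumption of this sketch).
    """
    if len(digits) % 2:
        digits = [0] + list(digits)
    pairs = [digits[i] * 10 + digits[i + 1] for i in range(0, len(digits), 2)]
    acc = 0
    for p in pairs:
        acc = acc * 100 + p
    return acc


if __name__ == "__main__":
    print(bin(horner_radix10([3, 4, 6])))   # 0b101011010
    print(horner_radix100([3, 4, 6]))       # 346
```

Both functions return the same value; the pairwise form halves the length of the sequential multiply-add chain at the cost of wider multiplications.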

In [13], the authors proposed a method to convert a BCD number to its binary equivalent based on expanding the BCD number and then shifting the individual BCD digits to the left (multiplying by powers of 2) according to their position in the BCD number. For example, the BCD number 76 is expanded to $7 \times 10 + 6 \times 1 = 7 \times (8 + 2) + 6 \times 1 = 7 \times 2^3 + 7 \times 2^1 + 6 \times 2^0$. This means that the binary equivalent of the BCD number 76 can be obtained by adding 7 shifted to the left 3 times $(0111000)_2$, 7 shifted to the left 1 time $(01110)_2$, and 6 not shifted $(0110)_2$. The binary result is $(0111000)_2 + (01110)_2 + (0110)_2 = (1001100)_2$. The authors in [13] employed 4-bit carry-look-ahead addition in a complex tree structure to design an 8-digit to 27-bit converter.

The authors in [14] proposed a faster implementation for the expansion method of [13]. The method was demonstrated by presenting the implementation of a 7-digit to 24-bit converter. Instead of using 4-bit carry-look-ahead adders to add the bits of the expanded numbers, the bits are carefully grouped according to their positions such that the result of adding the arranged bits within each group does not exceed $(15)_{10} = (1111)_2$. By applying this rule, partial sums are obtained without any carry propagation at this stage of the design. PROMs are used to generate these partial sums. The same approach is followed in the second level of logic. The final result, which is the sum of the outcomes of the first and the second stages and any individual bits that do not belong to any group, is obtained in the third stage, where 4-bit carry-look-ahead adders and other logic are used whenever needed.

The BCD-to-binary conversion method presented in [15] employs a code converter that converts consecutive pairs of BCD digits to their binary equivalent. The binary codes are generated using PROMs. The final binary result is obtained by adding up individual bits in the binary codes using the same approach as in [14].
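The expansion idea attributed to [13] can be sketched in a few lines. This is an illustrative software model only: it generalizes the two-digit example by expanding each weight $10^i$ into its set bits, which is an assumption of this sketch rather than a description of the adder tree in [13]. The names `shifted_terms` and `expand_and_add` are hypothetical.

```python
def shifted_terms(digits):
    """Return (digit, shift) pairs whose shifted sum equals the decimal value.

    digits lists the decimal digits most significant first. The weight 10**i
    of each digit is expanded into powers of two, so multiplying by the weight
    becomes a sum of left shifts (for example 10 = 2**3 + 2**1).
    """
    terms = []
    n = len(digits)
    for pos, d in enumerate(digits):
        weight = 10 ** (n - 1 - pos)
        for bit in range(weight.bit_length() - 1, -1, -1):
            if (weight >> bit) & 1:
                terms.append((d, bit))
    return terms


def expand_and_add(digits):
    """Add all shifted copies of the digits to obtain the binary value."""
    return sum(d << shift for d, shift in shifted_terms(digits))


if __name__ == "__main__":
    print(shifted_terms([7, 6]))        # [(7, 3), (7, 1), (6, 0)]
    print(bin(expand_and_add([7, 6])))  # 0b1001100
```

For the number 76 the terms are 7 shifted by 3, 7 shifted by 1, and 6 unshifted, matching the example in the text.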
In this paper we compute the binary equivalent of a BCD number based on the direct formula (1) using a novel method. Instead of using multipliers to compute each term of the formula, we compute the contribution of each digit using direct Boolean functions. These contributions are then added appropriately to form the final binary result. The authors in [10] have employed a similar technique to design a 4-digit BCD-to-binary conversion circuit, which they used in developing an iterative decimal multiplier. However, the BCD-to-binary circuit they developed does not constitute a generic BCD-to-binary conversion circuit. Furthermore, they did not discuss the implementation and performance of this conversion circuit as a separate module, since their focus was on creating an iterative BCD multiplier. In our case, we focus on designing a generic circuit for parallel BCD-to-binary conversion which can be used in any BCD arithmetic circuit that utilizes binary hardware. While the technique in [15] groups input bits into groups of 8 bits, similar to one of the techniques presented in this work (the 8-bit grouping), the design of the code converter and the addition stages in [15] is different from ours. In addition, the design in [15] always separates the least significant BCD digit and treats it as a group by itself. More importantly, we explore various grouping schemes in addition to the 8-bit grouping. When implementing the work of [14,15] for performance evaluation, we found various typos that we report in Section 6.
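As a rough software analogue of computing each digit's contribution without a multiplier, the sketch below precomputes one small table per digit position and then only looks up and adds. It models the idea, not the circuit: the tables stand in for the Boolean functions, and the names `build_contribution_tables` and `bcd_to_binary` as well as the packed-integer input format are assumptions of this sketch.

```python
def build_contribution_tables(n_digits):
    """tables[i][d] holds the contribution of digit value d at position i."""
    return [[d * 10 ** i for d in range(10)] for i in range(n_digits)]


def bcd_to_binary(bcd, n_digits, tables):
    """Convert a packed BCD integer (4 bits per digit, D0 in the low nibble).

    Returns the binary value and the list of per-digit contributions,
    least significant digit first.
    """
    contributions = []
    for i in range(n_digits):
        digit = (bcd >> (4 * i)) & 0xF
        if digit > 9:
            raise ValueError("invalid BCD digit")
        contributions.append(tables[i][digit])  # lookup, no multiplication
    return sum(contributions), contributions


if __name__ == "__main__":
    tables = build_contribution_tables(3)
    value, parts = bcd_to_binary(0x346, 3, tables)
    print([bin(p) for p in parts])  # ['0b110', '0b101000', '0b100101100']
    print(bin(value))               # 0b101011010
```

The three lookups reproduce the contributions of the digits 6, 4, and 3 from the introductory example.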

3. Proposed architecture

Our scheme is based on splitting the input BCD number into groups of consecutive bits from the least significant position to the most significant position. Throughout this work each group will be denoted as $G_i$, where i is an integer that represents the position of the group starting from right to left. For example, if there are M groups, then the least significant group would be denoted as $G_0$ and the most significant group would be denoted as $G_{M-1}$. The binary contribution of each group varies based on its position or index. It is generated using digital hardware that has the individual bits of the group as inputs and the individual bits of the binary contribution as outputs. The binary contributions from all groups are added using a binary addition stage to generate the binary equivalent of the input BCD number. This section discusses three different bit-grouping schemes that have been investigated in this work for BCD-to-binary conversion. These are 4-bit grouping, 6-bit grouping, and 8-bit grouping.

3.1. 4-Bit grouping

In this scheme the size of the group is 4 bits (1 BCD digit). The scheme is outlined in Fig. 2, where $WD_0$ is used to represent the size of the output of the $D_0$ contribution generator unit, $WD_1$ is used to represent the size of the output of the $D_1$ contribution generator unit, and so on. The BCD input size is N BCD digits, $D_{N-1}D_{N-2} \cdots D_1D_0$. Each digit is fed to its corresponding contribution generator unit, which computes the contribution of that digit to the final binary result. The contribution of $D_0$ is the same four bits representing $D_0$. For example, a $(4)_{10} = (0100)_{BCD}$ contributes $(0100)_2$ to the final binary result. $D_1$ contributes the binary equivalent of $10 \times D_1$ to the final binary result. In general, $D_i$ contributes the binary equivalent of $10^i \times D_i$. We list the contributions of each digit to the final result and derive the Boolean functions that compute these contributions. Table 1 lists these contributions for the digit $D_2$ as an example. For the invalid inputs (greater than 1001), the functions' outputs are don't care. As shown, $D_2$'s contribution to the final binary result, $wd_2$, is 10 bits in size.

Table 1
Example: D2 contribution generator output functions.

BCD digit    Contribution in binary (wd2)
0000         00 0000 0000
0001         00 0110 0100
0010         00 1100 1000
0011         01 0010 1100
0100         01 1001 0000
0101         01 1111 0100
0110         10 0101 1000
0111         10 1011 1100
1000         11 0010 0000
1001         11 1000 0100

From the table, we can derive the equations for these bits as follows:

$$
\begin{aligned}
wd_2[9] &= A_3 + A_2 A_1,\\
wd_2[8] &= \overline{A_2} A_1 A_0 + A_3 + A_2 \overline{A_1},\\
wd_2[7] &= A_3 A_0 + \overline{A_2} A_1 \overline{A_0} + A_2 \overline{A_1} + A_2 A_0,\\
wd_2[6] &= A_1 \overline{A_0} + \overline{A_3}\,\overline{A_1} A_0,\\
wd_2[5] &= \overline{A_3} A_0 + A_3 \overline{A_0},\\
wd_2[4] &= A_2,\\
wd_2[3] &= A_1,\\
wd_2[2] &= A_0,\\
wd_2[1] &= 0,\\
wd_2[0] &= 0,
\end{aligned}
$$

where $A_3 \cdots A_0$ are the 4-bit BCD representation of $D_2$. These Boolean functions represent the contribution generation box corresponding to $D_2$ among the contribution generation boxes shown in Fig. 2.
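The Boolean functions for $wd_2$ can be checked exhaustively against the values $100 \times D_2$ listed in Table 1. The sketch below does this in software; the complemented literals are written with an explicit helper, and the function names are hypothetical.

```python
def wd2_bits(a3, a2, a1, a0):
    """Evaluate the ten output bits of the D2 contribution generator.

    Inputs are the BCD bits A3..A0 (each 0 or 1). The returned list is
    indexed so that bits[k] corresponds to wd2[k].
    """
    def inv(x):
        return 1 - x  # logical complement of a single bit

    bits = [0] * 10  # wd2[1] and wd2[0] stay 0
    bits[9] = a3 | (a2 & a1)
    bits[8] = (inv(a2) & a1 & a0) | a3 | (a2 & inv(a1))
    bits[7] = (a3 & a0) | (inv(a2) & a1 & inv(a0)) | (a2 & inv(a1)) | (a2 & a0)
    bits[6] = (a1 & inv(a0)) | (inv(a3) & inv(a1) & a0)
    bits[5] = (inv(a3) & a0) | (a3 & inv(a0))
    bits[4] = a2
    bits[3] = a1
    bits[2] = a0
    return bits


def wd2_value(digit):
    """Assemble the 10-bit contribution of a valid BCD digit (0..9)."""
    a3, a2, a1, a0 = (digit >> 3) & 1, (digit >> 2) & 1, (digit >> 1) & 1, digit & 1
    return sum(bit << k for k, bit in enumerate(wd2_bits(a3, a2, a1, a0)))


if __name__ == "__main__":
    for d in range(10):
        print(d, format(wd2_value(d), "010b"), wd2_value(d) == 100 * d)
```

Only the ten valid digits are checked, since the outputs for inputs above 1001 are don't care.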

Fig. 2. Architecture for the 4-bit grouping scheme. (Each 4-bit digit $D_{N-1}, D_{N-2}, \ldots, D_1, D_0$ feeds its own contribution generator with output widths $WD_{N-1}, WD_{N-2}, \ldots, WD_1, WD_0$; the contributions are summed by binary adders organized in a tree structure to produce the W-bit result.)


We note that a digit at position i contributes $wd_i$ to the final binary result with size $WD_i$, where

$$WD_i = \lfloor \log_2(9 \times 10^i) \rfloor + 1 \text{ bits}, \quad i = 0, 1, \ldots, N-1, \tag{6}$$

and the final binary result w requires W bits, where

$$W = \lfloor \log_2(10^N - 1) \rfloor + 1 \text{ bits}, \tag{7}$$

where, for any number x, $\lfloor x \rfloor = \ell$ and $\ell$ is the unique integer such that $\ell \le x < \ell + 1$.

We observe the following characteristics of these contributions.

The contribution of a digit at position i always has its least significant i bits equal to zero (the same observation was reported in [10]). This follows from $10^i = 2^i \times 5^i$. In the example for $D_2$ above, the least significant 2 bits are equal to zero for all possible input combinations. In the contributions addition stage, these bits are not added with other contributions, which results in smaller adders.

Some other bits of the contributions are always zero as well for all possible combinations. For example, for $D_5$, bit number 14 of the contribution is always zero. We note, however, that the number of these zero bits is small.

Furthermore, the contribution generation Boolean equations are functions of 4 bits ($A_3 \cdots A_0$). Therefore, each of these functions fits into a single look-up table on FPGAs with 4-input LUTs, which results in an overall design with a small area.

We note that the contribution generation for all digits is done in parallel. The next stage is to add these contributions to form the final binary result. We organize the adders in a tree-like structure to speed up the addition. Further, the fast carry chain logic available in the FPGA is used for a speedy addition.

An example to illustrate the 4-bit grouping approach is shown in Fig. 3. In this example, the input size is 9 BCD digits (i.e., nine 4-bit groups). The binary contribution of each group is first generated, and then all these contributions are added using the binary addition stage to generate the binary equivalent.
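Eqs. (6) and (7) and the zero-bit observations can be reproduced with integer arithmetic alone. The sketch below uses Python's `int.bit_length()`, which equals $\lfloor \log_2 x \rfloor + 1$ for a positive integer x and so avoids floating-point logarithms; the function names are hypothetical.

```python
def contribution_width(i):
    """Eq. (6): bits needed for the largest contribution 9 * 10**i."""
    return (9 * 10 ** i).bit_length()


def result_width(n):
    """Eq. (7): bits needed for the largest N-digit value 10**n - 1."""
    return (10 ** n - 1).bit_length()


def always_zero_bits(i):
    """Bit positions that are 0 in d * 10**i for every digit d in 0..9."""
    used = 0
    for d in range(10):
        used |= d * 10 ** i  # accumulate every bit that is ever set
    return [b for b in range(contribution_width(i)) if not (used >> b) & 1]


if __name__ == "__main__":
    print([contribution_width(i) for i in range(9)])
    # [4, 7, 10, 14, 17, 20, 24, 27, 30]
    print(result_width(9))       # 30
    print(always_zero_bits(2))   # [0, 1]
    print(14 in always_zero_bits(5))  # True
```

The widths for positions 0 to 8 are the ones that apply to the nine-digit example of Fig. 3.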

3.2. 6-Bit grouping

In this scheme, the size of each group is 6 bits (1.5 BCD digits). The BCD input size is N BCD digits $D_{N-1}D_{N-2} \ldots D_1D_0$. The groups are referenced as $G_0, G_1, G_2, \ldots, G_m$, where $m = \lceil \frac{2N}{3} - 1 \rceil$. For an even integer i, the least significant four bits of group $G_i$ are composed of the BCD digit $D_{3i/2}$. On the other hand, the most significant two bits of group $G_i$ are the least significant two bits of the BCD digit $D_{3i/2+1}$. For an odd integer i, the least significant two bits of group $G_i$ are composed of the most significant two bits of the BCD digit $D_{(3i-1)/2}$, while the most significant four bits of group $G_i$ are the BCD digit $D_{(3i-1)/2+1}$.

Given a group $G_i$, if i is even then the greatest decimal value of the group is $39 \times 10^{3i/2}$. This is because the most significant two bits of the group come from the least significant two bits of a BCD digit. Therefore, they can be only 00, 01, 10, 11 in binary or 0, 1, 2, 3 in decimal, whereas the least significant four bits can be any BCD digit (0–9). On the other hand, if i is odd then the greatest decimal value of the group is $98 \times 10^{(3i-1)/2}$. This is because the least significant two bits of the group come from the most significant two bits of a BCD digit. Therefore, they can take values of 00, 01, and 10 in binary or 0, 4, and 8 in decimal, whereas the most significant four bits can be any BCD digit (0–9). It should be pointed out that the size of the most significant group in this scheme can be 6 bits, 4 bits, or 2 bits according to the number of BCD digits in the input.

When converting the decimal equivalent of each group to binary, the least significant K bits of the binary equivalent are zeros. K increases according to the position of the group from right to left. For a given group $G_i$, $K_i$ is calculated using the following equation:

$$K_i = \begin{cases} \dfrac{3i}{2}, & i \text{ even},\\[4pt] \dfrac{3i+1}{2}, & i \text{ odd}. \end{cases} \tag{8}$$

Fig. 3. An example to illustrate the approach for the case of 4-bit grouping (1 BCD digit). (The figure shows the nine decimal digits D8 to D0 = 1001 0111 0110 0010 0100 0000 0101 1001 0010, their decimal equivalents 9×10^8, 7×10^7, 6×10^6, 2×10^5, 4×10^4, 0×10^3, 5×10^2, 9×10^1, 2×10^0, a "To Binary" block per digit, and a binary addition stage producing a 30-bit Binary_out.)


Fig. 4. An example to illustrate the approach for the case of 6-bit grouping (1.5 BCD digit). (The figure shows the same nine decimal digits D8 to D0 = 1001 0111 0110 0010 0100 0000 0101 1001 0010 split into the 6-bit groups G5 = 1001 01, G4 = 11 0110, G3 = 0010 01, G2 = 00 0000, G1 = 0101 10, G0 = 01 0010, with decimal equivalents 94×10^7, 36×10^6, 24×10^4, 00×10^3, 58×10^1, 12×10^0, a "To Binary" block per group, and a binary addition stage producing a 30-bit Binary_out.)

Also, the size of the binary equivalent of each group increases according to the position of the group from right to left. If the size of the binary equivalent of group $G_i$ is $WG_i$, then $WG_i$ is calculated using the following equation:

$$WG_i = \begin{cases} \lfloor \log_2(39 \times 10^{3i/2}) \rfloor + 1, & i \text{ even},\\[4pt] \lfloor \log_2(98 \times 10^{(3i-1)/2}) \rfloor + 1, & i \text{ odd}. \end{cases} \tag{9}$$

The Boolean equation of each bit in the binary equivalent of group $G_i$ is formulated based on the different combinations of the group and its position among all groups. An example to illustrate the 6-bit grouping approach is shown in Fig. 4. In this example, the input size is 9 BCD digits (i.e., six 6-bit groups). The decimal equivalent of each group is formed based on the 2 and 4 bits that compose the group, as mentioned above. The binary contribution of each group is first generated, and then all these contributions are added using the binary addition stage to generate the binary equivalent.
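The 6-bit grouping rules, together with Eqs. (8) and (9), can be exercised on the nine-digit example of Fig. 4. The following is a minimal sketch; the function name `six_bit_groups`, the digit-list input format, and the handling of a short most significant group by treating missing digits as zero are assumptions of this sketch.

```python
def six_bit_groups(digits):
    """Split BCD digits into 6-bit groups and return per-group data.

    digits[k] is the value of BCD digit D_k (D_0 first). Each result entry is
    (decimal_value, K_i, WG_i): the group's contribution, its guaranteed
    number of trailing zero bits (Eq. 8), and its width for a full-size
    group (Eq. 9).
    """
    n = len(digits)
    num_groups = -(-2 * n // 3)  # ceil(2N/3) groups: G_0 .. G_m
    groups = []
    for i in range(num_groups):
        if i % 2 == 0:
            k = 3 * i // 2
            upper = digits[k + 1] & 0b0011 if k + 1 < n else 0  # weights 2, 1
            value = (digits[k] + 10 * upper) * 10 ** k
            zeros = 3 * i // 2
            width = (39 * 10 ** k).bit_length()
        else:
            k = (3 * i - 1) // 2
            lower = digits[k] & 0b1100  # weights 8 and 4 of digit D_k
            upper = digits[k + 1] if k + 1 < n else 0
            value = (lower + 10 * upper) * 10 ** k
            zeros = (3 * i + 1) // 2
            width = (98 * 10 ** k).bit_length()
        groups.append((value, zeros, width))
    return groups


if __name__ == "__main__":
    digits = [2, 9, 5, 0, 4, 2, 6, 7, 9]  # D0..D8 of the Fig. 4 example
    groups = six_bit_groups(digits)
    print([g[0] for g in groups])
    print(sum(g[0] for g in groups))  # 976240592
```

Summing the six group values returns the original decimal number, and each value is a multiple of $2^{K_i}$.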

3.3. 8-Bit grouping

In this approach the size of the group is 8 bits (2 BCD digits). The number of groups in this case is $\lceil N/2 \rceil$. Group $G_i$ is composed of the BCD digit $D_{2i}$, which lies on the right-hand side of the group, and the BCD digit $D_{2i+1}$, which lies on the left-hand side of the group. The largest decimal equivalent of $G_i$ is $99 \times 10^{2i}$. If the size of the binary equivalent of group $G_i$ is $WG_i$, then $WG_i$ is given by $\lfloor \log_2(99 \times 10^{2i}) \rfloor + 1$, and the number of least significant bits in the binary equivalent of group $G_i$ that are zeros is $2i$. The size of the most significant group in this scheme can be 8 bits or 4 bits according to the number of BCD digits in the input.

4. Area and delay analysis for the 4-bit grouping scheme

This section presents an estimation of the area and delay for the implementation of the 4-bit grouping scheme on 4-input LUT FPGAs. A similar discussion can be easily derived for the other two grouping schemes (6-bit and 8-bit on 6-input LUT FPGAs and 8-input LUT FPGAs, respectively). As mentioned before, the proposed BCD-to-binary converter using the 4-bit grouping has two main stages: the BCD digits' contributions generation stage and the binary addition stage. Each BCD digit consists of 4 bits, and therefore, the individual bits of the contribution of a BCD digit are computed based on its four bits. Hence, the Boolean logic functions of the contribution generator blocks of Fig. 2 are functions of four variables. This means that targeting an FPGA family of 4-input LUTs or higher would result in the BCD digits' contributions generation stage being implemented with a single LUT per Boolean function. Since all of these functions are computed concurrently, the logic delay of the contributions generation stage is equal to the delay of a single LUT. If the logic delay for a single LUT is $\Delta T_{LUT}$, then:

$$CG_{logic\_delay} = \Delta T_{LUT}, \tag{10}$$

where $CG_{logic\_delay}$ is the logic delay of the contributions generation stage. It should be noted that $CG_{logic\_delay}$ is independent of the number of BCD digits. To estimate the area of the contributions generation stage, the number of Boolean functions that come out of this stage is computed using Eq. (6) by substituting the values of i and then adding up the number of Boolean functions per contribution generator block. Observing that the least significant i bits of the contribution of the ith BCD digit are always zeros, for N BCD digits the total number of Boolean functions (NBF) is calculated as follows:

$$
\begin{aligned}
NBF &= [(\lfloor \log_2(9 \times 10^0) \rfloor + 1) - 0] + [(\lfloor \log_2(9 \times 10^1) \rfloor + 1) - 1] + \cdots \\
&\quad + [(\lfloor \log_2(9 \times 10^{N-2}) \rfloor + 1) - (N-2)] + [(\lfloor \log_2(9 \times 10^{N-1}) \rfloor + 1) - (N-1)] \\
&= \sum_{i=0}^{N-1} \{(\lfloor \log_2(9 \times 10^i) \rfloor + 1) - i\}.
\end{aligned}
\tag{11}
$$

Since each of these functions can be mapped to a single LUT, the total area of the contributions generation stage ($CG_{area}$) can be calculated as:

$$CG_{area} = NBF \times \Delta Area_{LUT}, \tag{12}$$

where $\Delta Area_{LUT}$ is the area of a single LUT.

In the binary addition stage, the adders are arranged in a tree structure. Let us assume, for simplicity, that N is a power of two. Then the number of levels of the tree is $\log_2(N)$. The number of adders in each level from top to bottom is $\frac{N}{2}, \frac{N}{4}, \frac{N}{8}, \ldots, 2, 1$. The size of the adders in the same level increases from right to left. Therefore, the logic delay of the addition stage is dominated by the delay of the group of adders located on the leftmost side of the architecture of Fig. 2. In FPGAs, binary adders are implemented using the fast carry chain. An n-bit binary adder consumes n LUTs (n stages of the fast carry chain). The size of the leftmost adder in the top level of the addition stage is equal to the number of Boolean functions from the $D_{N-1}$ contribution generator block of Fig. 2, which is $(\lfloor \log_2(9 \times 10^{N-1}) \rfloor + 1)$. If we assume that the size of the leftmost adder in each level is the same as the size of this adder (which is the worst case), and if we assume that the delay of a single stage in the fast carry chain is $\Delta T_{FCC}$, then the estimated logic delay of the binary addition stage in the proposed architecture ($BA_{logic\_delay}$) is calculated using the following equation:

$$BA_{logic\_delay} = \log_2(N) \times (\lfloor \log_2(9 \times 10^{N-1}) \rfloor + 1) \times \Delta T_{FCC}. \tag{13}$$

To estimate the area of the binary addition stage, one should observe that in the top level of the tree, the contributions of two consecutive BCD digits are added using one adder, and the adder size is dominated by the size of the contribution of the digit in the odd position (given that positions start from 0). Since the adders are to be implemented using the fast carry chain, the number of LUTs required to implement a level in the addition stage is equal to the sum of the adder sizes in that level. Based on this, and using Eq. (6), the total number of LUTs (NLUTs) of the binary addition stage is as follows:

$$
\begin{aligned}
NLUTs &= \sum_{i=0}^{\frac{N}{2}-1} \{(\lfloor \log_2(9 \times 10^{2i+1}) \rfloor + 1) - (2i+1)\} \\
&\quad + \sum_{i=0}^{\frac{N}{4}-1} \{(\lfloor \log_2(9 \times 10^{4i+3}) \rfloor + 1) - (4i+3)\} + \cdots \\
&\quad + [(\lfloor \log_2(9 \times 10^{N-1}) \rfloor + 1) - (N-1)].
\end{aligned}
\tag{14}
$$

Therefore, the area of the binary addition stage ($BA_{area}$) in terms of $\Delta Area_{LUT}$ is:

$$BA_{area} = NLUTs \times \Delta Area_{LUT}. \tag{15}$$

Based on Eqs. (10) and (13), the time delay ($Archi_{time}$) of the proposed architecture is estimated as follows:

$$Archi_{time} = CG_{logic\_delay} + BA_{logic\_delay} + \Delta T_{routing}, \tag{16}$$

where $\Delta T_{routing}$ is the routing delay after the placement and routing of the design in the FPGA. The estimated area (in terms of LUTs) of the proposed architecture ($Archi_{area}$) is calculated using Eqs. (12) and (15) as follows:

$$Archi_{area} = CG_{area} + BA_{area}. \tag{17}$$

Note that Eq. (17) does not account for the area needed for the routing logic, as the area in the equation is measured in terms of the number of LUTs. Furthermore, we note that some of the contribution bits might be functions of one variable and therefore do not require a LUT. Therefore, Eq. (17) represents an upper bound on the area (in number of LUTs).
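The estimates of Eqs. (10) to (17) are straightforward to evaluate numerically. The sketch below is one possible reading of those equations for N a power of two; the function names are hypothetical, the delay parameters `dt_lut`, `dt_fcc`, and `dt_routing` are placeholders with no relation to any real device, and the area is returned as a LUT count (that is, in units of the single-LUT area).

```python
def contribution_width(i):
    """Eq. (6): floor(log2(9 * 10**i)) + 1, computed with integers."""
    return (9 * 10 ** i).bit_length()


def num_boolean_functions(n):
    """Eq. (11): contribution bits per digit, minus the i always-zero bits."""
    return sum(contribution_width(i) - i for i in range(n))


def adder_luts(n):
    """Eq. (14): sum of adder sizes over all tree levels (n a power of two)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, as assumed in the text")
    total, step = 0, 2
    while step <= n:
        for i in range(n // step):
            pos = step * i + step - 1  # digit position that sets the adder size
            total += contribution_width(pos) - pos
        step *= 2
    return total


def estimate(n, dt_lut, dt_fcc, dt_routing=0.0):
    """Return (delay, area_in_luts) following Eqs. (10), (13), (16), (17)."""
    levels = n.bit_length() - 1  # log2(n) for a power of two
    ba_delay = levels * contribution_width(n - 1) * dt_fcc
    delay = dt_lut + ba_delay + dt_routing
    area = num_boolean_functions(n) + adder_luts(n)
    return delay, area


if __name__ == "__main__":
    # Placeholder timing values chosen only to exercise the formulas.
    print(num_boolean_functions(4), adder_luts(4))
    print(estimate(4, dt_lut=1.0, dt_fcc=0.1))
```

For N = 4 the sketch gives 29 Boolean functions and 28 adder LUTs, so the upper bound of Eq. (17) is 57 LUTs.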

5. FPGA implementation

The implementation of our various bit-grouping schemes varies from one FPGA family to another based on the size of the look-up table (LUT) and the fabrication technology of the FPGA family. For example, a logic function of 4 variables fits into a single 4-input LUT. On the other hand, a 6-variable function requires a hierarchy of 4-input LUTs to be implemented on 4-input LUT FPGAs. A 4-variable function would fit in a single 6-input LUT, but the utilization of the LUT would be low because only 25% of the LUT is used and the rest is wasted (i.e., a 4-input function has 16 different combinations, whereas the 6-input LUT has the capacity of 64 combinations).

Fig. 5 shows the expected implementation of a four BCD digits to binary converter on 4-input LUT FPGAs. In this figure, the implementation of the binary contribution of digit $D_1$ or $G_1$ (D1_Cont) is shown in detail, where each bit in this contribution is generated using a single 4-input LUT, as each bit is a function of four variables (the bits of $D_1$). The least significant bit of D1_Cont is zero. All binary contributions of the BCD digits are added using the binary addition stage that employs the dedicated fast carry chain in the FPGA.

The implementation of the 6-bit grouping scheme on 4-input LUT FPGAs requires each bit in the binary contributions of the groups to be implemented using a hierarchy of 4-input LUTs, since each bit is a function of six variables. However, the number of levels in the addition stage and the number of adders in each level are reduced by approximately a factor of 1.5 when compared to the case of 4-bit grouping. On the other hand, the adders in the addition stage become larger than those in the 4-bit grouping. The implementation example of the four BCD digits to binary converter using 6-bit grouping on 4-input LUT FPGAs is illustrated in Fig. 6. Again, we show the implementation of the bits of the binary contribution of $G_1$ (G1_Cont) in detail. The least significant two bits (G1_Cont[0] and G1_Cont[1]) are both zeros.

The implementation of the 4-bit grouping on 6-input LUT FPGAs is illustrated in Fig. 7. In this case, the binary addition stage is similar to that of the 4-bit grouping on 4-input LUTs. The drawback here is that the utilization of the 6-input LUTs used to implement the binary contributions generation stage is low because two out of the six inputs of the look-up table are unused.

The implementation of the 4-digit BCD-to-binary converter using 6-bit grouping on 6-input LUT FPGAs produces an addition stage similar to that of the 6-bit grouping on 4-input LUT FPGAs. However, in this case each bit in the binary contribution generator is implemented using one 6-input LUT, as opposed to the case of 6-bit grouping over 4-input LUTs, in which a hierarchy of LUTs is needed. An illustration of this implementation is shown in Fig. 8. The implementation of the binary contribution of the group $G_1$ (G1_Cont) is shown in detail.

In the case of 8-bit grouping, both 4-input and 6-input LUT FPGAs require a hierarchy of LUTs to implement the contributions generation stage. However, the number of levels in the addition stage as well as the number of adders in each level are reduced by a factor of two (compared to 4-bit grouping) at the expense of larger sizes for the adders. In the next section, we experimentally evaluate the different bit-grouping schemes on FPGAs with different look-up table sizes and fabrication technology.
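The trade-offs described above (fewer groups and adders for larger groups, low utilization when a small function occupies a large LUT) can be tabulated with a short sketch. The helper names are hypothetical, the adder-tree depth assumes a balanced binary tree over the groups, and the 8-bit width formula is the one stated for that grouping, $\lfloor \log_2(99 \times 10^{2i}) \rfloor + 1$.

```python
def num_groups(n_digits, group_bits):
    """Number of groups when 4 * n_digits input bits are split into groups."""
    return -(-4 * n_digits // group_bits)  # ceiling division


def adder_tree_levels(groups):
    """Depth of a balanced binary adder tree (an assumption of this sketch)."""
    return (groups - 1).bit_length()  # ceil(log2(groups)) for groups >= 1


def lut_utilization(function_inputs, lut_inputs):
    """Fraction of a LUT's truth table used by a smaller function."""
    return 2 ** function_inputs / 2 ** lut_inputs


def eight_bit_group_width(i):
    """Width of the binary equivalent of 8-bit group G_i."""
    return (99 * 10 ** (2 * i)).bit_length()


if __name__ == "__main__":
    n = 8  # example input size in BCD digits
    for bits in (4, 6, 8):
        g = num_groups(n, bits)
        print(bits, g, adder_tree_levels(g))
    print(lut_utilization(4, 6))  # 0.25
    print([eight_bit_group_width(i) for i in range(4)])
```

For an 8-digit input, 8-bit grouping halves the number of groups relative to 4-bit grouping, while the group widths grow accordingly.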

Fig. 5. The implementation of 4-digit BCD to binary conversion using 4-bit groups on 4-input look-up tables (LUTs-4). (Each bit D1_Cont[1] to D1_Cont[6] is produced by one LUT-4 driven by D1[0] to D1[3], and D1_Cont[0] is tied to 0; the contributions D0_Cont, D1_Cont, D2_Cont, and D3_Cont, of 4, 7, 10, and 14 bits, are summed by binary adders using the fast carry chain to produce Binary_Out.)

Fig. 6. The implementation of 4-digit BCD to binary conversion using 6-bit groups on 4-input look-up tables (LUTs-4). (Each bit G1_Cont[2] to G1_Cont[9] is produced by a hierarchy of LUTs-4 driven by G1[0] to G1[5], and the two least significant bits are tied to 0; the contributions G0_Cont, G1_Cont, and G2_Cont, of 6, 10, and 14 bits, are summed by binary adders using the fast carry chain to produce Binary_Out.)

6. Results

In this section, we first compare the performance of the proposed 4-bit grouping (one BCD digit grouping) scheme to that of existing schemes. We then compare the performance of the different grouping schemes (4-bit or 1 BCD digit, 6-bit or 1.5 BCD digits, and 8-bit or 2 BCD digits) on a variety of FPGA devices (devices with 4-input and 6-input LUTs).

6.1. 4-Bit grouping scheme versus existing schemes

We first select the 4-bit grouping scheme for comparison with existing schemes, since most of these schemes are implemented on 4-input LUT FPGAs and we believe that the best performance will be obtained with 4-bit grouping in this case. We have implemented the 4-bit grouping scheme along with six different existing schemes for BCD-to-binary conversion (discussed in Section 2)

Fig. 7. The implementation of 4-digit BCD to binary conversion using 4-bit groups on 6-input look-up tables (LUTs-6). [Diagram: each bit of D1_Cont shown (D1_Cont[1], D1_Cont[2], ..., D1_Cont[6]) is generated by one LUT-6 driven by D1[0]–D1[3]; D0_Cont, D1_Cont, D2_Cont and D3_Cont are summed by binary adders using the fast carry chain to form Binary_Out.]

Fig. 8. The implementation of 4-digit BCD to binary conversion using 6-bit groups on 6-input look-up tables (LUTs-6). [Diagram: each bit of G1_Cont shown (G1_Cont[2], G1_Cont[3], ..., G1_Cont[9]) is generated by one LUT-6 driven by G1[0]–G1[5]; the two least significant bits are tied to 0; G0_Cont, G1_Cont and G2_Cont are summed by binary adders using the fast carry chain to form Binary_Out.]

using the Verilog HDL data flow model. All schemes are functionally verified by simulation. The code of each scheme is then synthesized using the Xilinx ISE 10.1 CAD tool targeting a Xilinx Virtex-4 SX35-12 FPGA. Table 2 shows the area results and Table 3 shows the delay results. As shown, our scheme achieves between 31% and 65% reduction in delay for various input sizes (number of BCD

Table 2
Comparison of resources utilization of our scheme (the 4-bit grouping) versus other schemes (in number of LUTs).

Design                  Number of digits
                        2       4       8       16
Our 4-bit grouping      6       34      141     538
ShiftAdd5 [8]           20      133     575     2429
ShiftSub3 [8,11–13]     22      142     653     2792
Eq. (2) [8]             11      48      205     838
Eq. (3) [8]             9       43      173     671
Eq. (4) [8]             9       54      236     915
Eq. (5) [8]             9       43      170     627

Table 3
Comparison of delay of our scheme (the 4-bit grouping) versus other schemes (in ns).

Design                  Number of digits
                        2       4       8       16
Our 4-bit grouping      1.02    2.99    4.49    6.46
ShiftAdd5 [8]           5.06    14.41   30.33   63.40
ShiftSub3 [8,11–13]     5.06    14.41   31.11   64.18
Eq. (2) [8]             2.03    6.49    15.45   33.42
Eq. (3) [8]             1.99    5.42    12.12   25.82
Eq. (4) [8]             1.99    4.33    11.18   22.81
Eq. (5) [8]             1.99    5.42    10.00   19.39

Table 4
Performance comparison between our scheme (the 4-bit grouping) and [13] for an 8-digit BCD to binary convertor.

Design                  Area (LUTs)     Delay (ns)
[13]’s Design           156             5.47
Our 4-bit grouping      141             4.49

Table 5
Performance comparison between our scheme (the 4-bit grouping) and [14] for a 7-digit BCD to binary convertor.

Design                  Area (LUTs)     Delay (ns)
[14]’s Design           152             8.57
Our 4-bit grouping      102             4.38

Table 6
Performance comparison between our scheme (the 4-bit grouping) and [15].

Design                  Delay (ns)              Area (LUTs)
                        8-Digit     16-Digit    8-Digit     16-Digit
Our 4-bit grouping      4.49        6.46        141         538
[15]’s Design           7.66        12.37       426         2005

digits) when compared to the best of the other schemes. Furthermore, compared to the best of the other schemes, our scheme achieves 33%, 20%, 17%, and 14% reduction in area for operands of two, four, eight, and 16 digits, respectively.

We then used the same implementation environment (Xilinx ISE 10.1 CAD tool targeting a Xilinx Virtex-4 SX35-12 FPGA) to implement the designs presented in [13–15] and compared their results to ours. In the implementation of the design of [13], we have used the fast carry chain instead of the 4-bit carry-lookahead additions (which increases the performance of their scheme). We note that we did not attempt to design larger conversion circuits based on their approach, since the design complexity increases dramatically as the BCD input size increases. Therefore, we implement an 8-digit BCD to binary convertor based on their approach and compare it with an 8-digit BCD to binary convertor using our 4-bit grouping approach. The results are shown in Table 4. Despite the fact that we have used the fast carry chain in implementing the design of [13], our 4-bit grouping scheme achieves a 9.6% reduction in size and an 18% speedup.

In [14] (see footnote 2), instead of using the old PROM-based technology as they suggest, we use 4-bit adders structured in a way similar to their approach. Again, we implement only a 7-digit BCD to binary convertor based on their approach, since implementing larger converters involves a significant design effort to find the appropriate groups. We then compare their scheme with a 7-digit BCD to binary

Footnote 2. We note that the scheme as described in Fig. 3 in [14] involves a typo. Specifically, it is not clear where bits E2 and E1 at position 9 and E2, E1, F8, and F2 at position 10 should be added. We found two possible ways to account for these bits: either individually or with group P7, as adding these bits to P7 in the first stage of the design preserves the property of the group sum not exceeding 15 (i.e., no carry out). We have verified the correctness of both options and found that the best results are achieved when the bits are included in P7. Therefore, we report these results.

Fig. 9. Area comparison of the implementations of a 16-digit BCD to binary convertor using the three grouping schemes and targeting different FPGA families. [Plot: area (number of LUTs) of the 1, 1.5 and 2 BCD digit groupings on Virtex-4, Virtex-5, Virtex-6 and Virtex-7.]

convertor based on our 4-bit grouping. The results are shown in Table 5. Our 4-bit grouping scheme achieves about a 33% reduction in area and about a 49% speedup.

For the case of [15], we have replaced the PROMs with Boolean equations. However, we have preserved the same addition stage structure that the authors of [15] presented. Table 6 shows the results (see footnote 3). Our 4-bit grouping scheme achieves 41% and 48% speedup for 8-digit and 16-digit BCD to binary converters, respectively. Furthermore, 67% and 73% reductions in area are achieved for the 8-digit and 16-digit BCD to binary converters, respectively.

Footnote 3. The design they provided also includes several typos. (i) In Fig. 3B in [15], at position 31 the bits g20 and h18 should not be included in group z4, to ensure that the binary addition of the group does not generate a carry. Also, these bits have been included in group N in Fig. 4B. (ii) In Fig. 4B, the bit c4 at position 11 in group W should be replaced with e4. (iii) Bit z44 at position 32 in group N should be replaced by z41. (iv) Bit x42 at position 33 in group M should be replaced by y42. We note that we discovered these typos while implementing their approach up to 16 digits. Therefore, other typos may exist for larger converters.

Fig. 10. Delay comparison of the implementations of a 16-digit BCD to binary convertor using the three grouping schemes and targeting different FPGA families. [Plot: delay (ns) of the 1, 1.5 and 2 BCD digit groupings on Virtex-4, Virtex-5, Virtex-6 and Virtex-7.]

6.2. Comparisons of the various grouping schemes

The performance of a particular scheme on a particular FPGA architecture is affected by the performance of the contributions generation stage and the performance of the addition stage. A larger bit grouping may require a hierarchy of LUTs on FPGAs with smaller LUTs, which may result in poor contributions generation performance. On the other hand, a larger grouping requires fewer levels in the addition stage and fewer adders in each level, with larger adder sizes. Conversely, a smaller bit grouping may allow each bit of the contributions generation stage to fit into one LUT (i.e., may eliminate the need for a LUT hierarchy), with the penalty of requiring more levels and a larger number of smaller adders per level in the addition stage. Therefore, it is not clear beforehand which bit grouping will win


on a particular architecture, especially since the routing delay may contribute significantly to the overall system delay.

To evaluate the performance of our various schemes (i.e., the various bit groupings) on a variety of FPGA architectures, we implement (in addition to the 4-bit grouping scheme) the 6-bit grouping (i.e., 1.5 BCD digits) and the 8-bit grouping (i.e., 2 BCD digits) in Verilog HDL for a 16-digit BCD input. The Verilog data flow modeling is used. Each of these three schemes (1, 1.5, and 2 BCD digit groupings) has been synthesized on an FPGA device with 4-input LUTs (Xilinx Virtex-4 xc4vlx200-11-ff1513) and on three FPGA devices with 6-input LUTs (Xilinx Virtex-5 xc5vlx330t-2-ff1738, Xilinx Virtex-6 xc6vlx760-2-ff1760, and Xilinx Virtex-7 xc7v2000t-2-ffg1925). The area results are shown in Fig. 9 and the delay results are shown in Fig. 10.

As shown in Fig. 9, on Virtex-4, the best bit grouping is one BCD digit, followed by 1.5 and then 2 BCD digits. This indicates that the performance gain achieved by fitting each bit of the contributions into one LUT outweighs the performance loss caused by a larger binary addition hierarchy. The same observation holds true for the delay performance of the schemes, as shown in Fig. 10.

On 6-input LUT FPGAs (Virtex-5, Virtex-6 and Virtex-7), the delay and area (in terms of the number of LUTs) of the 6-bit grouping are more or less similar to those of the 4-bit grouping. We note, though, that in the case of 4-bit grouping on 6-input LUTs, the utilization of the LUTs is very low. While the 6-bit grouping beats the 4-bit grouping in terms of area on the three 6-input LUT FPGAs we use, the delay of the 6-bit grouping is better in some cases and worse

Fig. 11. Growth in delay and area of the 4-bit grouping scheme as the number of BCD digits to be converted to binary grows. [Two plots: delay (ns) and area (LUTs) versus the number of BCD digits, for Virtex-4, Virtex-5, Virtex-6 and Virtex-7.]

Fig. 12. Growth in delay and area of the 6-bit grouping scheme as the number of BCD digits to be converted to binary grows. [Two plots: delay (ns) and area (LUTs) versus the number of BCD digits, for Virtex-4, Virtex-5, Virtex-6 and Virtex-7.]

Fig. 13. Growth in delay and area of the 8-bit grouping scheme as the number of BCD digits to be converted to binary grows. [Two plots: delay (ns) and area (LUTs) versus the number of BCD digits, for Virtex-4, Virtex-5, Virtex-6 and Virtex-7.]

in others than that of the 4-bit grouping. This indicates that some FPGA devices optimize the routing delay better than others for our architectures. For example, despite the fact that the 6-bit grouping results in fewer levels in the binary addition stage than the 4-bit grouping, the overall delay of the 4-bit grouping is better than that of the 6-bit grouping on Virtex-5. The same does not hold true for Virtex-6 and Virtex-7.

For the 8-bit grouping scheme, a hierarchy of LUTs is needed for the contributions generation stage in all FPGA devices we use. As a general observation, the scheme works better on 6-input LUT FPGAs than on 4-input LUT FPGAs. Among the 6-input LUT devices, the scheme shows similar performance on Virtex-6 and Virtex-7 FPGAs. However, the scheme occupies significantly smaller area on Virtex-5, with slightly more delay than in the case of Virtex-6 and Virtex-7.

The other aspect that we evaluate for our schemes is the growth in area and delay as the number of digits in the BCD operand increases. We evaluate the area and delay for 4-bit, 6-bit, and 8-bit groupings on Virtex-4, Virtex-5, Virtex-6, and Virtex-7 FPGAs with 2-digit, 4-digit, 8-digit, and 16-digit BCD inputs. Figs. 11–13 show the results. As a general note, the area of our schemes grows exponentially as the number of digits increases, whereas the delay grows in a logarithmic fashion. As shown, while the Virtex-6 and Virtex-7 FPGAs perform similarly for all schemes, Virtex-5 (which also has 6-input LUTs) performs very differently. While we do not know for sure the root cause of this behavior, these results indicate significant architectural differences between the Virtex-5 FPGA on the one hand and Virtex-6 and Virtex-7 on the other.
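As a rough numerical companion to the growth discussion, the sketch below takes the "Our 4-bit grouping" rows of Tables 2 and 3 and computes how area and delay change each time the number of digits doubles. Note the assumption involved: those tables were obtained on the Virtex-4 SX35-12 device, not on the devices used for Figs. 11–13, so this only illustrates the kind of trend being described. The variable names are chosen here for illustration.

```python
digits = [2, 4, 8, 16]
area_luts = [6, 34, 141, 538]        # Table 2, row "Our 4-bit grouping"
delay_ns = [1.02, 2.99, 4.49, 6.46]  # Table 3, row "Our 4-bit grouping"

# Multiplicative change in area for each doubling of the digit count.
area_ratios = [b / a for a, b in zip(area_luts, area_luts[1:])]

# Additive change in delay for each doubling of the digit count.
delay_steps = [b - a for a, b in zip(delay_ns, delay_ns[1:])]

if __name__ == "__main__":
    for d, r, s in zip(digits[1:], area_ratios, delay_steps):
        print(f"{d // 2:>2} -> {d:>2} digits: area x{r:.2f}, delay +{s:.2f} ns")
```

For these numbers, the delay increases by about 1.5 to 2 ns per doubling of the digit count. A roughly constant increment per doubling is what logarithmic growth looks like. Over the same steps, the area is multiplied by a factor of roughly 4 to 6.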

7. Conclusions

In this paper, we present a range of efficient decimal-to-binary conversion schemes to support BCD arithmetic based on binary hardware. Our circuits employ several ideas. First, we split the BCD input into several groups of bits and compute the binary contribution of each group to the overall binary result. The contributions are then added using a tree-structured bank of adders utilizing the fast carry chain logic available in FPGAs. We select the group size such that it matches the size of the look-up tables on the target FPGA. Due to this choice, each function among the outputs of the circuit that computes the contribution of a given group fits exactly in one look-up table, which results in a compact design. We demonstrate in this paper that the proposed architecture outperforms existing architectures in terms of area and speed.

Furthermore, we have discussed in this paper the growth in area and delay of the proposed schemes on various FPGA families as the number of BCD digits in the input grows. The general conclusion is that the area grows in an exponential fashion whereas the delay grows in a logarithmic fashion.

References

[1] A. Vazquez, E. Antelo, P. Montuschi, A new family of high performance parallel decimal multipliers, in: 18th IEEE Symposium on Computer Arithmetic, 2007, ARITH ’07, pp. 195–204.
[2] M.F. Cowlishaw, Decimal floating-point: algorism for computers, in: Proceedings of the 16th IEEE Symposium on Computer Arithmetic (ARITH16’03), ARITH ’03, IEEE Computer Society, Washington, DC, USA, 2003, p. 104.
[3] G. Jaberipur, A. Kaivani, Binary-coded decimal digit multipliers, IET Computers & Digital Techniques 1 (2007) 377–381.
[4] R. James, T. Shahana, K. Jacob, S. Sasi, Decimal multiplication using compact BCD multiplier, in: International Conference on Electronic Design, 2008, ICED, 2008, pp. 1–6.
[5] A. Singh, A. Gupta, S. Veeramachaneni, M.B. Srinivas, A high performance unified BCD and binary adder/subtractor, in: Proceedings of the 2009 IEEE Computer Society Annual Symposium on VLSI, IEEE Computer Society, Washington, DC, USA, 2009, pp. 211–216.
[6] M. Vazquez, G. Sutter, G. Bioul, J.P. Deschamps, Decimal adders/subtractors in FPGA: efficient 6-input LUT implementations, in: International Conference on Reconfigurable Computing and FPGAs, vol. 0, 2009, pp. 42–47.
[7] R.D. Kenney, M.J. Schulte, High-speed multioperand decimal adders, IEEE Transactions on Computers 54 (2005) 953–963.
[8] M. Véstias, H. Neto, Parallel decimal multipliers using binary multipliers, in: VI Southern Programmable Logic Conference (SPL), 2010, pp. 73–78.
[9] H. Neto, M. Véstias, Decimal multiplier on FPGA using embedded binary multipliers, in: International Conference on Field Programmable Logic and Applications, 2008, FPL, 2008, pp. 197–202.
[10] M. Véstias, H. Neto, Iterative decimal multiplication using binary arithmetic, in: VII Southern Conference on Programmable Logic (SPL), 2011, pp. 257–262.
[11] BCD-to-Binary/Binary-to-BCD Number Converter MC-4001P, Application Note: Motorola semiconductor products, 1969.
[12] R.F. Tinder, Engineering Digital Design, second ed., Elsevier, 2002.
[13] L.C. Beougher, A method for high speed BCD-to-binary conversion, Computer Design (1973) 53–59.
[14] L.P. Flora, D.P. Wiener, BCD-to-Binary Converter. US patent, 1982.
[15] D. Wiener, BCD to Binary Converter. US patent, 1982.

Osama Al-Khaleel, assistant professor of Computer Engineering in the Department of Computer Engineering of Jordan University of Science and Technology (Irbid, Jordan), received his B.S. in Electrical Engineering from Jordan University of Science and Technology in 1999, and his M.Sc. and Ph.D. in Computer Engineering from Case Western Reserve University, Cleveland, OH, USA, in 2003 and 2006, respectively. Currently, his main research interests are in embedded systems design, reconfigurable computing, computer arithmetic, and logic design.

Zakaria Al-Qudah is an assistant professor of computer engineering at Yarmouk University, Jordan. He earned his Ph.D. and M.Sc. degrees from the Electrical Engineering and Computer Science (EECS) department at Case Western Reserve University (CWRU) in 2010 and 2007, respectively. He received his B.Sc. degree from Yarmouk University, Jordan, in 2004. He is interested generally in distributed systems and Internet research. Specific subjects include the performance and security of Content Delivery Networks (CDNs), efficient utility computing platforms, and Internet measurements.

Mohammad Al-Khaleel received the M.Sc. and Ph.D. degrees in Applied Mathematics-numerical analysis from McGill University, Montreal, QC, Canada, in 2003 and 2007, respectively. Since 2007, he has been an Assistant Professor of Mathematics with the Department of Mathematics, Yarmouk University, Irbid, Jordan.

Chris Papachristou is Professor of Electrical Engineering and Computer Science at Case Western Reserve. He received the Ph.D. degree in Electrical Engineering and Computer Science from Johns Hopkins University. His research interests include Design Automation, Testing and Reliability of VLSI Systems, Reconﬁgurable Computing Architecture Design, and Wireless Digital Systems. He has published numerous articles in these areas, consulted with industry and government, served as Program Chair and General Chair of several IEEE/ACM conferences and been on the program committees of many international conferences and workshops. He is a Fellow of the IEEE and a member of the ACM and Sigma XI, and is listed in Who’s Who in America.
