Low power single precision BCD floating–point Vedic multiplier


Microprocessors and Microsystems 72 (2019) 102930


Microprocessors and Microsystems journal homepage: www.elsevier.com/locate/micpro

Low power single precision BCD floating-point Vedic multiplier

V. Ramya a,∗, R. Seshasayanan b

a Department of Electronics and Communication Engineering, Anna University, India
b Department of Electronics and Communication Engineering, Meenakshi College of Engineering, Chennai, India

Article info

Article history: Received 16 March 2019; Revised 18 October 2019; Accepted 24 October 2019; Available online 26 October 2019.

Keywords: Binary floating-point multiplier (BFPM); BCD floating-point multiplier (BCD-FPM); Urdhva-Tiryakbhyam (UT) sutra; Kogge stone adder (KSA); Binary to BCD converter (B2BCD); BCD to binary converter (BCD2B).

Abstract

In this paper, the binary coded decimal floating-point multiplier (BCD-FPM) and the binary floating-point multiplier (BFPM) with a binary to BCD (B2BCD) converter are proposed using the Urdhva-Tiryakbhyam (UT) sutra. Two methods are proposed for the BCD-FPM, and a comparison is made between the BCD-FPM and the BFPM with B2BCD converter. The designs are modelled in Verilog HDL and synthesized with the 90 nm standard cell library in the Cadence EDA tool. Comparisons are based on the synthesis report generated by the Cadence RTL Compiler, and the designs are implemented in the Encounter RTL-to-GDSII system. The results show that the BCD-FPM has better performance in terms of delay and power. The power for Method II is reduced by 59.47% and 73.40% when compared with Method I and the BFPM with B2BCD converter, respectively. The delay for Method II is reduced by 6.9% compared with Method I and by 30.37% compared with the BFPM with B2BCD converter. A pipelined architecture is designed for Method II, as it is more efficient than the other multipliers; its delay is reduced by 65.82% after pipelining. © 2019 Published by Elsevier B.V.

1. Introduction

The multiplier is the major unit in the arithmetic and logic unit. Since it consumes considerable power and area, there is a need for multipliers that are efficient in terms of area, power, and latency. Floating-point arithmetic is commonly used in most digital signal processors. Large errors cannot be tolerated in application-oriented sectors such as banking, commerce, science, accounting, insurance, and other user-related functions. Binary arithmetic is widely used in digital circuits to perform arithmetic and logic operations, owing to its simple numerical properties and easy implementation in digital systems. Fixed-point arithmetic tends to lose accuracy when applied in user-oriented applications, and eventually the error can reach a peak. To eliminate this error and increase the range of representation, the floating-point technique is used. Fixed-point multiplication suffers from truncation and hence degrades precision. The truncation error is reduced by slightly modifying the partial products of Booth multiplication, and an error compensation technique is implemented [1,21,23,24]. A modified Booth multiplier is used for the multiplication process, and it is shown that the mean square error is reduced to 12.3% and 6.3% for 16-bit and 8-bit fixed-point multiplication [1,2]. The error characteristics are studied in [3], which states that the error can be further reduced when the approximate modified



Corresponding author. E-mail address: [email protected] (V. Ramya).

https://doi.org/10.1016/j.micpro.2019.102930 0141-9331/© 2019 Published by Elsevier B.V.

Booth multiplication technique is applied. Two techniques are implemented using MBE along with a Wallace tree, which reduce the error probability to 12.5% and 25%, respectively. Fixed-point multiplication has three stages: partial product generation, partial product reduction, and the final result. The latency is predominantly incurred in the second stage. The Vedic technique can reduce the delay efficiently when compared with the modified Booth algorithm [4]. The partial products and their sums are produced in a single step by a Vedic multiplier [5,14]. Q15 and Q31 format multipliers [5] are proposed using 8×8 and 16×16 Urdhva-Tiryakbhyam (UT) blocks. Larger values cannot be processed using fixed point, which may lead to inaccuracy, so floating-point arithmetic has recently been considered in many research areas. A 32-bit binary Vedic multiplier designed and simulated in Xilinx ISE 13.4 shows that the conventional binary multiplier uses more LUTs and I/Os and has comparatively more delay [6]. Further, 4-bit and 8-bit Vedic multipliers are proposed using the Urdhva-Tiryakbhyam (UT) sutra [7,15,16]. This architecture is realized in 45 nm CMOS technology in the Cadence EDA tool, and the proposed designs are efficient in terms of power, area, and speed. The Vedic multiplier in combination with the Kogge stone adder has been implemented [8] and shown to be the fastest design. The Vedic sutra along with 4:2 and 7:2 compressors for addition is explained in [9,20] and compared with a conventional multiplier; the results show that the implemented design performs better in terms of area and delay. A single and double precision floating-point multiplier [10] using the UT sutra with a carry save adder gives higher speed than the conventional multiplier. A single precision floating-point multiplier [18,19] is discussed using an RCA and it is


compared with different floating-point multipliers. The design is modelled in Verilog HDL and synthesized with the TSMC 180 nm standard cell library. The comparison shows that the Vedic multiplier reduces the power by 26% compared with the modified Booth multiplier. Different adders for the binary floating-point multiplier using the Vedic sutra have been studied, and the Sklansky prefix adder shows the best performance [22]. Decimal multiplication is preferred in many real-time applications [13,25]. Not every decimal digit can be represented exactly in binary with a finite number of bits; the value is either rounded or truncated. Many applications cannot tolerate the errors that result from the conversion between binary and decimal formats. Thus, to overcome these problems, binary coded decimal is used. BCD multiplication is carried out in [11] for fixed point, and it is claimed that the proposed partial product generator architecture saves 30% of the area. For the multiplication process, the Vedic sutra is utilized after converting two 4-bit binary numbers to BCD [17], and the resultant binary product is again converted to BCD using a binary to BCD converter. A 32-digit binary coded decimal multiplier is proposed using a novel binary counter for addition, a BCD full adder, and a binary to BCD converter [12]. This paper is organized as follows. In Section 2, the binary floating-point format and architecture are briefly reviewed. The study of Vedic maths is presented in Section 3. Section 4 describes the proposed BCD floating-point multiplier. The pipelined BCD-FPM is discussed in Section 5. In Section 6, the comparison and the performance metric analysis are discussed. The conclusion is given in Section 7 and future work in Section 8.

Table 1. Floating-point format.

Precision                    Sign   Exponent   Mantissa
Single precision (32 bits)   1      8          23 + 1 (implicit bit)
Double precision (64 bits)   1      11         52 + 1 (implicit bit)

2. Binary floating-point representation

Fig. 1. Binary floating-point architecture.

A number is generally represented by a fixed number of significant digits and scaled using an exponent in some fixed base. A fixed-point operation can produce a result whose width exceeds that of the operands; in the case of multiplication, the width of the result is the sum of the operand widths. To fit the result into the same size as the operands, truncation and rounding come into the picture, with the possibility of information loss and hence accuracy issues. The problem is that fixed point is prone to loss of precision when large numbers are evaluated. The other major issue is that integer fixed point is tedious to use in a processor due to overflow conditions. In a floating-point number, the decimal point can be shifted to the right or left of a fixed position. Compared with a fixed-point number, a floating-point number can represent very large and very small values, thereby expanding the range of representation. A floating-point number is typically represented by a sign (s), an exponent (e) and a mantissa (M). The standard format for a floating-point number is depicted as

(−1)^S × b^e × M    (1)

The IEEE standard for floating-point arithmetic (IEEE 754) was established in 1985 by the Institute of Electrical and Electronics Engineers for floating-point computation. Half, single, double, extended and quad are the precisions formulated by the IEEE 754 standard; of these, single precision and double precision are dominantly used. Table 1 shows the bit format of single and double precision. A single precision binary floating-point number is represented in 32 bits, which comprise the MSB as the sign bit, an 8-bit exponent and a 23-bit mantissa. For double precision the MSB is the sign bit, the exponent is 11 bits and the mantissa is 53 bits including the implicit bit. The implicit bit is the hidden leading bit of the mantissa in both formats. The standard format for representing a single precision floating-point number is



(−1)^s × 2^E × b0.b1 b2 b3 … b(p−1)    (2)

The fractional part and the exponent field are given as

f = b0.b1 b2 b3 … b(p−1),    e = E + 127    (3)

Fig. 1 gives the architecture of the BFPM. The multiplication process involves the computation of the sign, the exponent and the mantissa. The sign bit is 0 if the number is positive and 1 if it is negative; an XOR operation on the two sign bits gives the resultant sign bit (the MSB). The exponent is computed using the Kogge stone adder: the 8-bit exponents of both operands are summed and the result is adjusted by the bias of 127 for single precision. The mantissa is computed using the Vedic multiplication technique. The Urdhva-Tiryakbhyam sutra is used for the mantissa calculation, which results in a 48-bit product since both operands are 24 bits long. As the result is twice the length of the operand, normalization is done by eliminating the leading one in the mantissa and adjusting the exponent accordingly. The exponent is increased by one if the leading bit of the 48-bit product is high, and the 23-bit mantissa is taken from the succeeding bits. If the leading bit is zero, the mantissa is taken from the (n−2) bit position downwards and the exponent is left unchanged. When the exponent is between 1 and 254, the number is either a positive or a negative floating-point number, as determined by the MSB of the result. The binary result is then converted to BCD using a B2BCD converter. The value after the decimal point cannot always be inferred correctly: the conversion from binary to decimal introduces an error, and when larger numbers are used these errors can further increase.


that is smaller than the smallest normal number is a denormalized number. The production of a denormalized number is sometimes called gradual underflow, because it allows a calculation to lose precision slowly when the result is small.

2.1. Algorithm for binary floating-point multiplier

Input: two binary floating-point numbers A and B
Output: binary result C, BCD result D

SINGLE PRECISION:
  M[47:0]  ← {implicit bit, A[22:0]} × {implicit bit, B[22:0]}
  C[22:0]  ← normalized mantissa from M
  C[31]    ← A[31] XOR B[31]
  C[30:23] ← A[30:23] + B[30:23] − 127 (bias)
  D        ← B2BCD(C)

DOUBLE PRECISION:
  M[105:0] ← {implicit bit, A[51:0]} × {implicit bit, B[51:0]}
  C[51:0]  ← normalized mantissa from M
  C[63]    ← A[63] XOR B[63]
  C[62:52] ← A[62:52] + B[62:52] − 1023 (bias)
  D        ← B2BCD(C)
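The single-precision path can also be sketched in software. The following is an illustrative model (ours, not the authors' Verilog), operating on raw 32-bit encodings and ignoring the special cases of Table 2 (zeros, infinities, NaN) as well as rounding:

```python
def fp32_multiply(a_bits, b_bits):
    """Sketch of the single-precision flow: XOR of sign bits,
    exponent addition with one bias of 127 removed, 24x24-bit
    mantissa product, then normalization of the 48-bit result."""
    sign = ((a_bits >> 31) ^ (b_bits >> 31)) & 1
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF
    man_a = (a_bits & 0x7FFFFF) | (1 << 23)   # restore implicit bit
    man_b = (b_bits & 0x7FFFFF) | (1 << 23)
    m = man_a * man_b                          # 48-bit product M[47:0]
    exp = exp_a + exp_b - 127                  # remove one bias
    if m & (1 << 47):                          # leading one at bit 47
        exp += 1
        mantissa = (m >> 24) & 0x7FFFFF        # drop implicit bit, keep 23 bits
    else:                                      # leading one at bit 46
        mantissa = (m >> 23) & 0x7FFFFF
    return (sign << 31) | (exp << 23) | mantissa
```

For the Table 3 operands, fp32_multiply(0xC1980000, 0x41180000), i.e. −19.0 × 9.5, yields 0xC3348000, the encoding of −180.5.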

2.2. Simulation result

The simulation of the BFPM is performed in Cadence SimVision. The applied inputs and the corresponding output for the BFPM are illustrated in Table 3, and the corresponding waveform is given in Fig. 3.

Fig. 2. Flowchart for binary floating point.

Table 2. Exceptional cases in the IEEE 754 standard.

Sign   Exponent   Mantissa   Representation
0/1    1–254      Anything   Positive/negative floating point
0/1    0          0          Positive/negative zero
0/1    0          Non-zero   Denormalized number
0/1    255        0          Positive/negative infinity
0/1    255        Non-zero   Not-a-Number

To avoid these kinds of errors, the BCD-FPM is proposed. The flow of the binary floating-point operation is shown in Fig. 2. The IEEE 754 standard provides for special cases such as overflow and underflow conditions. When the exponent is too large to be represented in the exponent field, the checker unit indicates an overflow condition; when the negative exponent becomes too large, it indicates an underflow condition. The checker unit checks for five exceptional cases: zero, negative zero, positive infinity, negative infinity and Not-a-Number (NaN). The different cases are summarized in Table 2. A number is zero if every bit in the representation is zero. NaN is a value that does not make sense, such as a non-real number or the result of an operation like infinity times zero. A number is infinite if all the bits of the exponent are 1 and the mantissa is zero; positive or negative infinity is indicated by the sign bit. Any non-zero number

3. Vedic maths

Vedic mathematics is an ancient mathematical technique that was rediscovered from the Vedas (1911–1918) by Sri Bharati Krishna Tirthaji Maharaj (1884–1960). Vedic is a Sanskrit word derived from the word 'Vedas', which means 'knowledge'. Regular mathematical methods consume more time and are sometimes complex in operation. Vedic mathematics is a collection of sutras used to solve arithmetic calculations in a simple, efficient and fast way. It has 16 sutras or aphorisms and 13 sub-sutras [27]. Of these, 3 sutras and 2 sub-sutras are used for multiplication, as listed below:

1. Urdhva-Tiryakbhyam
2. Nikhilam Navatashcaramam Dashatah
3. Anurupyena
4. Ekanyunena Purvena
5. Antyayordasake'pi

Among all these methods, Urdhva-Tiryakbhyam (vertical and crosswise) is the universally adopted method, as it is suitable for both the binary and the decimal number system. The Nikhilam sutra is based on the subtraction of a number from the nearest power of 10; it is not suitable for general decimal multiplication because at least one operand has to be near a power of 10. The Anurupyena sutra is another Vedic multiplication trick for when both numbers are not close to a power of 10 but are close to a multiple of 10 and close to each

Table 3. Inputs and output in the simulated result (BFPM, binary).

Input              S   E          M
A (−19.0)          1   10000011   00110000000000000000000
B (9.5)            0   10000010   00110000000000000000000
Output (−180.5)    1   10000110   01101001000000000000000


Fig. 3. BFPM waveform.

other. Ekanyunena Purvena is applicable only when every digit of the multiplier is 9, so it is not suitable for all types of numbers. Antyayordasake'pi can be applied when the last digits of the two numbers sum to 10. Except for Urdhva-Tiryakbhyam, all the other sutras are specific multiplication methods: they can be applied only when the numbers satisfy certain conditions, such as both numbers being close to a power of 10, the numbers being close to each other, or the last digits of the two numbers summing to 10. The multiplication sutra used for the BCD-FPM and the BFPM is Urdhva-Tiryakbhyam, as it is a general method that suits all types of numbers. Table 4 defines the conditions for the different sutras used for multiplication.
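As a concrete illustration of the Nikhilam idea (our example, not code from the paper): for operands near a base that is a power of 10, the multiplication reduces to one small product of the deficits, since the identity (x − dy)·base + dx·dy = x·y holds exactly.

```python
def nikhilam_multiply(x, y, base=100):
    """Nikhilam sutra: for x, y near a power of 10 (the base),
    work with the deficits dx = base - x and dy = base - y.
    Left part: x - dy (equivalently y - dx); right part: dx * dy."""
    dx, dy = base - x, base - y
    left = x - dy              # cross subtraction
    right = dx * dy            # small product of the deficits
    return left * base + right
```

For example, 97 × 96 with base 100 uses deficits 3 and 4: (97 − 4)·100 + 3·4 = 9312.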

Fig. 4. Line diagram for 3 bit UT sutra.

3.1. Urdhva-Tiryakbhyam

The Vedic multiplier technique employed is Urdhva-Tiryakbhyam, which originates from Sanskrit words meaning 'vertical' and 'crosswise'. This method is preferable as it can be applied to all types of numbers. The major advantage of the UT sutra is that all the partial products are generated concurrently. This multiplication technique is faster and more efficient than conventional multipliers [18]. The line diagram for two 3-bit numbers, a2a1a0 and b2b1b0, using the UT sutra is given in Fig. 4. The final result of the 3-bit multiplier is c4s4s3s2s1s0. Fig. 5 gives an example of the UT sutra using two decimal numbers. Considering the numbers A = a2 a1 a0 and B = b2 b1 b0:



s0   = a0·b0
c1s1 = a1·b0 + a0·b1
c2s2 = c1 + a1·b1 + a0·b2 + a2·b0
c3s3 = c2 + a1·b2 + a2·b1
c4s4 = c3 + a2·b2                    (4)

Fig. 5. Example for UT sutra.
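The column pattern of Eq. (4) generalizes to any digit length and any base. A small sketch (an illustration, not the paper's hardware) that forms all crosswise partial products per column concurrently and then resolves the carries:

```python
def ut_multiply(a_digits, b_digits, base=10):
    """Urdhva-Tiryakbhyam (vertical and crosswise) multiplication.
    Digits are given least-significant first; all column partial
    products are formed independently, then carries are propagated."""
    n = len(a_digits)
    assert len(b_digits) == n
    # column k collects every a_i * b_j with i + j == k (crosswise products)
    cols = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            cols[i + j] += a_digits[i] * b_digits[j]
    # propagate carries to obtain the final digits
    out, carry = [], 0
    for c in cols:
        total = c + carry
        out.append(total % base)
        carry = total // base
    while carry:
        out.append(carry % base)
        carry //= base
    return out

def to_int(digits, base=10):
    return sum(d * base ** k for k, d in enumerate(digits))
```

With base=2 the same routine models the 3-bit binary multiplier of Fig. 4; with base=10 it reproduces the decimal example of Fig. 5.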

The algorithm for the UT sutra is given as follows (for 3 digits):

Step 1: Arrange the numbers vertically. For two operands with unequal numbers of digits, prefix the shorter operand with zeros until it has the same number of digits as the other operand.
Step 2: Consider the columns from left to right. Vertical multiplication is done for the leftmost column.

Table 4. Vedic sutras applicable for multiplication.

Name                                Meaning                                  Condition
Nikhilam Navatashcaramam Dashatah   All from 9 and the last from 10          Applicable when the numbers are near a power of 10
Urdhva Tiryakbhyam                  Vertically and crosswise                 Applicable to all types of numbers
Anurupye Shunyamanyat               If one is in ratio, the other is zero    Applicable when the numbers are closer to multiples of 10
Ekanyunena Purvena                  By one less than the previous one        Applicable when the multiplier has only 9s as digits
Antyayordasake'pi                   Last totaling to 10                      Applicable when the last digits of both numbers sum to 10


Fig. 6. BCD Floating point format.

Step 3: Crosswise multiplication is carried out for the first two columns from the left, and the products are summed.
Step 4: Vertical multiplication is done for the centre column, crosswise multiplication is performed for the remaining two columns, and the results are aggregated.
Step 5: Crosswise multiplication is performed for the two columns from the right, and the result is summed.
Step 6: Vertical multiplication is done for the rightmost column.

4. Proposed method

Many applications use the BCD floating-point format because they need high precision. BCD float can be used with float (32 bits), double (48 bits) and long double (56 bits). Each digit in BCD is represented by a fixed number of bits, commonly 4 or 8. BCD can be represented in packed and unpacked formats. In the packed format each digit is represented in 4 bits (i.e. 9 = 1001), whereas in the unpacked format one byte (i.e. 9 = 00001001) is needed to store a single digit. As a result, the unpacked format wastes space and is not preferable. The BCD floating-point format for single precision [26] is illustrated in Fig. 6. The MSB is the sign bit, as in the binary floating-point representation. M represents the 24-bit mantissa value in BCD. E represents the binary coded decimal exponent value, which is 6 bits wide. N indicates that the given number is BCD. Multiplication of BCD floating-point numbers can be done in two ways. In the first, the BCD mantissa is converted to binary, processed as a binary number and converted back to BCD at the end. In the second, the BCD mantissa is kept as such and processed directly; however, the product of two BCD numbers is a binary number, which has to be converted to BCD.

4.1. Method I

In this method, the input is a 32-bit BCD word, which comprises the MSB as the sign followed by the bit that indicates the number is BCD. The next 6 bits express the exponent, followed by the mantissa.
Here the BCD M and E are not converted to binary; instead, the computation is done directly on BCD. The product of any two BCD numbers is binary, and the same is true for their sum, so the result is converted to BCD using a binary to BCD converter (B2BCD). The sign bit is calculated by performing XOR on the MSB bits. The adder used for the exponent addition is a Kogge stone adder. The architecture proposed for Method I is portrayed in Fig. 7.

4.1.1. Simulation result

The simulation of the BCD-FPM (Method I) is performed in Cadence SimVision. The applied inputs and the corresponding output for Method I are illustrated in Table 5, and the corresponding waveform is given in Fig. 8.

4.2. Method II

The 32-bit BCD format is applied as input to the multiplier. The 24-bit mantissa and the 6-bit exponent are converted to binary using the BCD2B converter. The Kogge stone adder is used for the summation of the binary exponents, as it reduces the latency and power

Fig. 7. Architecture for Method I.

Table 5. Inputs and output in the simulated result (Method I & II).

Input              S   N   E        M
A (19.5)           0   1   001000   000110010101000000000000
B (−82.5)          1   1   001000   100000100101000000000000
Output (−1608.75)  1   1   010110   000101100000100001110101

Table 6. Binary to BCD conversion (example: binary 101000 = 40).

Operation    10's   1's    Binary
Shift left   –      –      101000
Shift left   –      1      01000
Shift left   –      10     1000
Add 3        –      101    000
Shift left   –      1000   000
Shift left   1      0000   00
Shift left   10     0000   0
–            100    0000   –

consumption. It generates carries in O(log n) time, is considered one of the fastest adders, and is widely used to achieve high performance in arithmetic circuits. The Kogge stone adder has three stages: pre-processing, the carry look-ahead (prefix) network, and post-processing. The pre-processing stage computes the generate and propagate signals corresponding to each bit of both operands. The second stage computes the carry corresponding to each bit, and the final stage computes the sum bits.



p_i     = A_i XOR B_i
g_i     = A_i AND B_i
P_(i:j) = P_(i:k+1) AND P_(k:j)
G_(i:j) = G_(i:k+1) OR (P_(i:k+1) AND G_(k:j))
S_i     = p_i XOR C_(i−1)                    (5)
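The recurrences of Eq. (5) can be exercised with a small software model of the prefix network. This is an illustrative sketch (the names are ours, not the paper's RTL): per-bit generate/propagate, log2(width) combining levels, then the sum bits.

```python
def kogge_stone_add(a, b, width=8):
    """Kogge-Stone adder model: pre-processing computes per-bit
    generate/propagate, the prefix network combines (G, P) pairs
    over log2(width) levels, post-processing forms the sum bits."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]   # g_i = A_i AND B_i
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]   # p_i = A_i XOR B_i
    G, P = g[:], p[:]
    d = 1
    while d < width:                          # prefix (carry look-ahead) network
        G = [G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(width)]
        P = [P[i] & P[i - d] if i >= d else P[i] for i in range(width)]
        d *= 2
    carries = [0] + G[:width - 1]             # carry into bit i (no carry-in)
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total | (G[width - 1] << width)    # carry-out becomes the top bit
```

Each while-loop iteration doubles the span of the group generate/propagate signals, which is why only log2(width) levels are needed.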

The Vedic multiplier using the UT sutra is used for the calculation of the mantissa. The obtained result is converted to BCD using the binary to BCD converter (B2BCD). The mantissa bits are normalized after the conversion to BCD: the normalizer rounds the mantissa off to a length of 24 bits by considering the range from the MSB. The architecture proposed for Method II is shown in Fig. 9. The sign of the product is decided by an XOR operation on the sign bits of the two operands. N is the one-bit value used to indicate that the given number is BCD. The final 32-bit result is in BCD format.


Fig. 8. BCD-FPM waveform (Method I).

Fig. 9. Architecture for Method II.

Table 7. BCD to binary conversion (example: BCD 0010 0100 = 24).

Operation     10's   1's    Binary
Shift right   0010   0100   –
Shift right   0001   0010   0
Sub 3         0000   1001   00
Shift right   0000   0110   00
Shift right   0000   0011   000
Shift right   0000   0001   1000
–             0000   0000   11000

The architecture for the 6-bit binary to BCD converter is given in Fig. 11.

4.4. BCD to binary converter

4.2.1. Simulation result

The input and output values applied for the BCD multiplication (Method II) are given in Table 5, and its waveform is shown in Fig. 10.

4.3. Binary to BCD converter

The product term obtained after the mantissa calculation is converted to BCD digits. The technique used for the conversion is Shift and Add-3. The binary and BCD values are the same for numbers up to 9; beyond that, 6 must be added to the binary value to obtain the corresponding BCD value. Checking the number and adding 6 whenever it is greater than 9 would produce 5-bit results; to avoid the extra bits, the Shift and Add-3 technique is used (Table 6). Shifting the number to the left multiplies it by 2, and 3 is added to a 4-bit BCD group whenever that group is equal to or greater than 5. The process continues until all the bits have been shifted left. For example, if 6 bits are considered, there will be a tens place and a ones place, each with 4 bits. The algorithm for Shift and Add-3 is as follows:

Algorithm for Shift and Add-3
Input: binary number A
Output: BCD number C
1. Clear all bits of C to zero.
2. Shift A left by one bit into C.
3. If any 4-bit BCD group of C is greater than 4, add 3 to that group; otherwise go to step 2.
4. Repeat until all the bits have been shifted.
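The Shift and Add-3 steps can be modelled directly in software. The sketch below (ours, for illustration) treats the BCD register as a packed integer and applies the add-3 correction to every 4-bit group that is 5 or more before each shift:

```python
def binary_to_bcd(value, digits=4):
    """Shift and Add-3 (double dabble): shift the binary number
    left into a packed-BCD register, adding 3 to any 4-bit group
    that is 5 or greater before each shift."""
    bcd = 0
    for i in range(value.bit_length() - 1, -1, -1):
        for d in range(digits):                       # correct each BCD digit
            if ((bcd >> (4 * d)) & 0xF) >= 5:
                bcd += 3 << (4 * d)
        bcd = (bcd << 1) | ((value >> i) & 1)         # shift the next bit in
    return bcd
```

For example, binary_to_bcd(0b101000) returns 0x40, the packed-BCD encoding of 40.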

The exponent and the mantissa are converted to binary in Method II before the data processing is carried out. The technique preferred for the BCD2B conversion is Shift and Sub-3 (Table 7). Here the bits are shifted to the right, and each 4-bit group is checked to see whether its value is greater than 7; if so, 3 is subtracted from that group. The architecture for the 6-bit BCD to binary converter is shown in Fig. 12.

Algorithm for Shift and Sub-3
Input: BCD number A
Output: binary number C
1. Clear all bits of C to zero.
2. Shift A right by one bit into C.
3. If any 4-bit BCD group is greater than 7, subtract 3 from that group; otherwise go to step 2.
4. Repeat until all the bits have been shifted.
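Similarly, a software model of Shift and Sub-3 (again illustrative, not the paper's RTL) shifts the packed-BCD register right, collects the shifted-out bits as the binary result, and subtracts 3 from any 4-bit group greater than 7 after each shift:

```python
def bcd_to_binary(bcd, digits=4):
    """Shift and Sub-3 (reverse double dabble): each right shift
    emits one binary bit (LSB first); any 4-bit group left greater
    than 7 is corrected by subtracting 3."""
    value = 0
    for i in range(4 * digits):
        value |= (bcd & 1) << i                       # shifted-out bit -> binary bit i
        bcd >>= 1
        for d in range(digits):                       # correct each BCD digit
            if ((bcd >> (4 * d)) & 0xF) > 7:
                bcd -= 3 << (4 * d)
    return value
```

This inverts the Shift and Add-3 conversion: bcd_to_binary(0x40) gives 40, i.e. binary 101000.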

When the two methods are compared, Method II is more efficient than Method I in terms of power and delay. To further reduce the delay of Method II, a pipelined architecture is proposed.

5. Pipelined BCD-FPM architecture

The pipelining technique decomposes a function into consecutive sub-functions called stages. Each stage performs a specific operation and produces an intermediate result. Pipelining exploits parallelism by overlapping the execution process. The clock is connected to the latches, and at each clock pulse every stage transfers


Fig. 10. BCD-FPM waveform (Method II).

Fig. 11. 6 bit binary to BCD converter architecture.

Fig. 12. 6 bit BCD to binary converter architecture.

its intermediate result to the input latch of the next stage. The final result is obtained once the data have passed through the entire pipeline. The period of the clock pulse should be sufficient for the data to traverse each stage. Pipelining is used to improve resource utilization and decrease the effective delay, thereby increasing the throughput. This is achieved by inserting latches between the logic blocks; a latch-based system gives significantly more flexibility in implementing a pipelined design and offers higher performance. A 3-stage pipeline is used, so the latency is approximately reduced to one third, while the core area and the power consumed increase. The pipelining is applied to Method II so that its delay is further reduced. The pipelined architecture for Method II is shown in Fig. 13.

Fig. 13. Pipelined architecture.

6. Results and discussions

6.1. Synthesis report

The proposed schemes are synthesized using the Cadence RTL Compiler and implemented in the Encounter RTL-to-GDSII system using 90 nm technology. Table 8 shows the synthesized results for the BFPM, Method I and Method II using the KSA without pipelining. The delay of Method II is reduced by 6.9% and 30.37% when compared with Method I and the BFPM with B2BCD converter, respectively. The area increases for the proposed BCD multipliers relative to the BFPM as the gate and cell counts increase. When

Fig. 14. Overall comparison chart.


Fig. 15. Comparison between pipelined and non-pipelined BCD-FPM.
Fig. 16. Area chart.

Table 8. Comparison of synthesized results without pipelining.

Parameters             BFPM with B2BCD converter (90 nm)   Method I (90 nm)   Method II (90 nm)
Cells                  4697                                5317               8004
Area (μm²)             29829                               32693.5            53568.1
Leakage power (nW)     158970.79                           184986.22          303370.20
Switching power (nW)   1313332.38                          787357.05          88204.79
Total power (nW)       1472303.175                         966343.275         391574.990
Delay (ps)             21558                               16138              15009

Table 9. Comparison of synthesized results for pipelined and non-pipelined architecture.

Parameters             Method II with pipelining (90 nm)   Method II without pipelining (90 nm)
Cells                  10932                               8004
Area (μm²)             80969                               53568.1
Leakage power (nW)     485447.207                          303370.20
Switching power (nW)   254553.965                          88204.79
Total power (nW)       740001.172                          391574.990
Delay (ps)             5130                                15009
PDP (pJ)               3.324                               5.877149

Fig. 17. Power chart.

Method II is compared with Method I and the BFPM, its total power is reduced by 59.47% and 73.40%, respectively. As a result, the power delay product (PDP) of Method II decreases by 62.31% compared with Method I and by 81.48% compared with the BFPM. The overall performance of Method II is better than that of Method I for the BCD-FPM. Table 9 shows the comparison between the non-pipelined and the pipelined Method II. The pipelined result shows that the delay is reduced by 65.82% compared with the non-pipelined BCD-FPM, but the core area and the power consumption increase because latches are inserted between the logic stages.

Fig. 18. Delay chart.

6.2. Comparison metrics

The comparison metrics for the BFPM using the RCA and the KSA are shown in Table 10. The overall comparison is shown in Fig. 20.

power of the proposed BFPM using the KSA is reduced by 1.06% when compared with the proposed BFPM using the RCA. The power chart is given in Fig. 17.

• Delay comparison

• Area comparison

The area of the proposed BFPM architecture using the RCA is reduced by 9.136% when compared with [19]. When the KSA is used instead of the RCA, the area increases by 7.8% compared with [19]. The area of the proposed BFPM using the KSA increases by 1.39% when compared with the proposed BFPM using the RCA. Fig. 16 shows the area comparison chart.

• Power comparison

The BFPM architecture proposed using the RCA reduces the power by 61.38% when compared with [19]. When the RCA is replaced by the KSA, the power decreases by 65.5% compared with [19]. The

The delay increases in the proposed BFPM architecture using the RCA when compared with [19]. When the RCA is replaced by the KSA, the delay further increases by 4.3%. Fig. 18 gives the delay comparison chart.

• PDP comparison

The PDP of the proposed BFPM architecture using the RCA is reduced by 16.85% when compared with [19]. The PDP of the proposed BFPM architecture with the KSA is reduced by 22.53% compared with [19]. When the RCA is replaced by the KSA in the proposed BFPM architecture, the PDP decreases by 6.83%. Fig. 19 shows the PDP chart.


Table 10. Comparison of performance metrics of BFPM (180 nm).

Design                                        Delay (ns)   Area (μm²)   P.D (mW)   PDP (pJ)
(1) BFPM using RCA [18]                       8.0          71946        20.305     162.44
(2) BFPM using RCA [19]                       9.2          65109        15.024     138.22
(3) Proposed BFPM using RCA (binary result)   19.807       59160        5.802      114.92
(4) Proposed BFPM using KSA (binary result)   20.659       59995        5.183      107.07

CRediT authorship contribution statement

V. Ramya: Conceptualization, Data curation, Writing - original draft, Writing - review & editing. R. Seshasayanan: Conceptualization, Data curation, Writing - original draft, Writing - review & editing.

Acknowledgment

Fig. 19. PDP chart.

The authors extend their sincere thanks to the Centre for Research, Anna University, Chennai, India, for supporting this research work under the Anna Centenary Research Fellowship (ACRF).

References

Fig. 20. Overall comparison chart.

7. Conclusion

A low-power, delay-efficient BCD floating-point multiplier (BCD-FPM) for single precision is designed using the UT sutra. Two architectures have been proposed for the BCD-FPM, Method I and Method II. For comparison, a BFPM is also designed using the KSA. The results show that the BCD-FPM Method II outperforms the BFPM in power by 73.41% and in delay by 30.37%, and the BCD-FPM Method I in power by 59.48% and in delay by 6.9%. To further enhance the performance, BCD-FPM Method II is pipelined. Though the area increases for the pipelined structure, the power-delay product improves by 43.44% when compared with the architecture without pipelining. Thus, it can be concluded that the pipelined BCD-FPM Method II has better performance metrics than the BCD-FPM Method I and the BFPM.

8. Future work

The proposed BCD-FPM can be used in arithmetic unit designs, and the KSA in the architecture can be replaced by Vedic adders. The multiplier can also be extended to double precision.

Declaration of Competing Interest

None.

[1] J.P. Wang, S.R. Kuang, S.C. Liang, High-accuracy fixed-width modified Booth multipliers for lossy applications, IEEE Trans. Very Large Scale Integr. VLSI Syst. 19 (1) (2011) 52–60.
[2] Y.H. Seo, D.W. Kim, A new VLSI architecture of parallel multiplier–accumulator based on the Radix-2 modified Booth algorithm, IEEE Trans. Very Large Scale Integr. VLSI Syst. 18 (2) (2010) 201–208.
[3] W. Liu, L. Qian, C. Wang, H. Jiang, J. Han, F. Lombardi, Design of approximate radix-4 Booth multipliers for error-tolerant computing, IEEE Trans. Comput. (2017).
[4] A. Mittal, A. Nandi, D. Yadav, Comparative study of 16-order FIR filter design using different multiplication techniques, IET Circ. Devices Syst. (2017).
[5] M. Ashwath, B.S. Premananda, The signed fixed-point multiplier for DSP using the vertically and crosswise algorithm, in: Computing, Communications and Networking Technologies (ICCCNT), 2013 Fourth International Conference on, IEEE, 2013, pp. 1–6.
[6] A. Bisoyi, M. Baral, M.K. Senapati, Comparison of a 32-bit Vedic multiplier with a conventional binary multiplier, in: Advanced Communication Control and Computing Technologies (ICACCCT), 2014 International Conference on, IEEE, 2014, pp. 1757–1760.
[7] S. Tripathy, L.B. Omprakash, S.K. Mandal, B.S. Patro, Low power multiplier architectures using Vedic mathematics in 45 nm technology for high-speed computing, in: Communication, Information & Computing Technology (ICCICT), 2015 International Conference on, IEEE, 2015, pp. 1–6.
[8] R. Anjana, B. Abishna, M.S. Harshitha, E. Abhishek, V. Ravichandra, M.S. Suma, Implementation of Vedic multiplier using Kogge-Stone adder, in: Embedded Systems (ICES), 2014 International Conference on, IEEE, 2014, pp. 28–31.
[9] R. Gupta, R. Dhar, K.L. Baishnab, J. Mehedi, Design of high performance 8-bit Vedic multiplier using compressor, in: Advances in Engineering and Technology (ICAET), 2014 International Conference on, IEEE, 2014, pp. 1–5.
[10] S.S. Mahakalkar, S.L. Haridas, Design of high-performance IEEE 754 floating point multiplier using Vedic mathematics, in: Computational Intelligence and Communication Networks (CICN), 2014 International Conference on, IEEE, 2014, pp. 985–988.
[11] G. Jaberipur, A. Kaivani, Binary-coded decimal digit multipliers, IET Comput. Digital Tech. 1 (4) (2007) 377–381.
[12] S. Veeramachaneni, M.B. Srinivas, Novel high-speed architecture for 32-bit binary coded decimal (BCD) multiplier, in: Communications and Information Technologies (ISCIT), 2008 International Symposium on, IEEE, 2008, pp. 543–546.
[13] S. Gonzalez-Navarro, C. Tsen, M.J. Schulte, Binary integer decimal-based floating-point multiplication, IEEE Trans. Comput. 62 (7) (2013) 1460–1466.
[14] S.S. Saokar, R.M. Banakar, S. Siddamal, High speed signed multiplier for digital signal processing applications, in: Signal Processing, Computing and Control (ISPCC), 2012 IEEE International Conference on, IEEE, 2012, pp. 1–6.
[15] M. Ramalatha, K.D. Dayalan, P. Dharani, S.D. Priya, High-speed energy efficient ALU design using Vedic multiplication techniques, in: Advances in Computational Tools for Engineering Applications (ACTEA), 2009 International Conference on, IEEE, 2009, pp. 600–603.
[16] S. Patil, D.V. Manjunatha, D. Kiran, Design of speed and power efficient multipliers using Vedic mathematics with VLSI implementation, in: Advances in Electronics, Computers and Communications (ICAECC), 2014 International Conference on, IEEE, 2014, pp. 1–6.
[17] A.K. Mehta, M. Gupta, V. Jain, S. Kumar, High-performance Vedic BCD multiplier and modified binary to BCD converter, in: India Conference (INDICON), 2013 Annual IEEE, IEEE, 2013, pp. 1–6.


[18] B. Sharma, R. Mishra, Comparison of single precision floating point multiplier using different multiplication algorithms, Int. J. Electr. Electron. Data Commun. 3 (2015) 106–109, ISSN 2320-2084.
[19] B. Sharma, A. Bakshi, Design and implementation of an efficient single precision floating multiplier using Vedic multiplication.
[20] Y. Bansal, C. Madhu, A novel high-speed approach for 16 × 16 Vedic multiplication with compressor adders, Comput. Electr. Eng. 49 (2016) 39–49.
[21] S. Anjana, C. Pradeep, P. Samuel, Synthesize of high-speed floating-point multipliers based on Vedic mathematics, Procedia Comput. Sci. 46 (2015) 1294–1302.
[22] K.V. Gowreesrinivas, P. Samundiswary, Comparative study on the performance of a single precision floating point multiplier using Vedic multiplier and different types of adders, in: Control, Instrumentation, Communication and Computational Technologies (ICCICCT), 2016 International Conference on, IEEE, 2016, pp. 466–471.
[23] S. Havaldar, K.S. Gurumurthy, Design of Vedic IEEE 754 floating point multiplier, in: Recent Trends in Electronics, Information & Communication Technology (RTEICT), 2016 IEEE International Conference on, IEEE, 2016, pp. 1131–1135.
[24] A. Jais, P. Palsodkar, Design and implementation of a 64-bit multiplier using the Vedic algorithm, in: Communication and Signal Processing (ICCSP), 2016 International Conference on, IEEE, 2016, pp. 0775–0779.
[25] A. Vazquez, E. Antelo, P. Montuschi, Improved design of high-performance parallel decimal multipliers, IEEE Trans. Comput. 59 (5) (2010) 679–693.
[26] http://www.fsinc.com/reference/html/com9anm.htm.
[27] Jagadguru Swami Sri Bharati Krisna Tirthaji Maharaja, Vedic Mathematics: Sixteen Simple Mathematical Formulae from the Veda, 1965.

V. Ramya received the B.Tech. degree in Electronics and Communication Engineering from Sri Manakula Vinayagar Engineering College, Pondicherry University, Puducherry, in 2015, and the M.E. degree in VLSI Design from the College of Engineering Guindy, Anna University, Chennai, in 2017. She is currently pursuing the Ph.D. in VLSI Design at the Department of Electronics and Communication Engineering, College of Engineering Guindy, Anna University, Chennai. She receives the Anna Centenary Research Fellowship from the Centre for Research, Anna University, Chennai. Her areas of interest include digital circuit design and VLSI design.

R. Seshasayanan received his M.E. degree and Ph.D. from Anna University, India, in 1983 and 2008 respectively. He is presently working as Associate Professor in the Department of Electronics and Communication, Anna University, India, and his areas of interest include MIMO system architecture, modulation and coding, multi-user communication techniques, and reconfigurable architecture.