Design of High Speed Carry Select Adder using Modified Parallel Prefix Adder

Available online at www.sciencedirect.com

ScienceDirect

www.elsevier.com/locate/procedia

Procedia Computer Science 143 (2018) 317–324

8th International Conference on Advances in Computing and Communication (ICACC-2018)

Abhishek R Hebbar, Piyush Srivastava, Vinod Kumar Joshi*

Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, Karnataka, India

Abstract

We propose a modified Carry Select Adder (CSLA) structure which uses a parallel prefix structure with a Binary to Excess-1 Converter (BEC). The proposed adder has been compared with Conventional, BEC, Brent-Kung (BK), Ladner-Fischer (LF) and Kogge-Stone (KS) based CSLAs in terms of area, power consumption and performance. The proposed CSLA shows a significant decrease in area and power compared to the KS based CSLA. In particular, the proposed CSLA structure exhibits a significant improvement in speed of 54.41%, 7.95%, 7.82% over the Conventional CSLA, 65.75%, 24.65%, 21.61% over the BEC-CSLA, 50.79%, 13.83%, 9.30% over the BK-CSLA, 43.12%, 8.99%, 5.35% over the LF-CSLA and 44.64%, 10.50%, 6.30% over the KS-CSLA for 4 bit, 8 bit and 16 bit respectively. All the CSLA structures are designed using Verilog HDL, and simulations and synthesis have been performed in the Cadence tool using 0.18 µm CMOS technology.

© 2018 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Selection and peer-review under responsibility of the scientific committee of the 8th International Conference on Advances in Computing and Communication (ICACC-2018).

Keywords: CSLA, Parallel prefix tree, Group PG logic, BEC, BK, LF, KS etc.

* Corresponding author. Tel.: +91-7892544581. E-mail address: [email protected]
1877-0509 © 2018 The Authors. Published by Elsevier B.V. 10.1016/j.procs.2018.10.402

1. Introduction

Area, power and delay are the three key factors in the design of complex digital systems. In digital adders, achieving high speed is the bottleneck for most time critical applications. In general, the performance of a processor is decided by the slowest operation in the design.


Since addition is one of the most commonly used operations in high speed processing, the design of multi-bit adders for fast computation has always been challenging [1, 2]. In a conventional adder, the sum at each bit position depends on the carry produced by the previous bit position, and the generated carry is propagated to the next position. Conventional adder circuits limit speed because of this carry propagation, which most authors in this field regard as a major problem. A Ripple Carry Adder (RCA) is a series of cascaded full adders (FAs). The number of FAs required to build the RCA depends on the number of bits in the two numbers being added. It is named RCA because the carry ripples from the lower bits to the higher bits. Propagation delay (PD) arises in the RCA because the sum and carry-out (Cout) bits of any adder stage depend on the previous stage. A Carry Look Ahead (CLA) adder reduces the PD at the cost of increased area, so CLAs are preferred over RCAs for time critical applications. The conventional CSLA provides a trade-off between RCA and CLA in terms of area and delay [3]. A conventional CSLA uses a pair of RCAs that generate sum and carry bits for both anticipated carry-in (Cin) values, Cin = 0 and Cin = 1. The two RCAs perform the addition in parallel: the upper RCA block takes Cin = 0 and the lower RCA block takes Cin = 1. The final sum and Cout bits are then selected by a multiplexer once the correct Cin is known. The conventional CSLA is not area efficient because it uses a pair of RCAs to produce the sum and carry. To overcome this problem, the conventional structure is modified by replacing the lower RCA block with a BEC, which reduces area and power consumption [4]. A high speed adder can be designed using a parallel prefix structure, which minimizes carry propagation throughout the adder.
A modified CSLA structure is obtained by replacing the upper and lower RCA blocks with a parallel prefix adder (PPA) and a BEC [4,5]. The BK adder is one of the parallel prefix adders used for high speed computation of arithmetic operations. A PPA is constructed in 3 stages [5]: (i) pre-processing, (ii) carry generation network and (iii) post-processing. The BK adder places asymmetrical loading on its intermediate stages, and its logic depth of $2\log_2 N - 1$ levels, where N is the number of input bits, increases the delay. A regular CSLA can be modified by replacing the upper RCA block with a BK adder while retaining the lower RCA; a mux then selects the output of either the BK adder or the RCA depending on the actual Cin. The BK adder uses carry generate and carry propagate signals to compute the carry of each stage, which minimizes carry propagation through the stages and hence the delay of the circuit. The BK adder has a higher delay than other PPAs because it has the maximum logic depth, while on the other hand its minimum number of nodes helps to reduce area and power dissipation [5,6]. Ladner and Fischer [7] proposed a parallel prefix adder with a logic depth of $\log_2 N + 1$ levels. The number of computational cells used in this approach is $(N/4)\log_2 N$. Its delay is much lower than that of the BK adder because fewer computations are involved in producing the carry generate and carry propagate signals. The KS adder has $\log_2 N$ stages and a fan-out of 2 at each stage [8,9]. To achieve this, it requires long wires between stages. The parallel prefix tree of the KS adder also contains more computation cells, which increases both the area and the power dissipation. There is therefore scope for achieving high speed and less area by modifying the parallel prefix tree.
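The logic depths quoted above can be tabulated for the word sizes used later in the paper. The following is a minimal sketch that simply evaluates the expressions as stated in the text; the function name `logic_levels` is a hypothetical helper, and N is assumed to be a power of two.

```python
import math

def logic_levels(n: int) -> dict:
    """Logic depth of each prefix tree, using the expressions quoted in the text.

    Assumes n is a power of two, so log2(n) is an exact integer.
    """
    lg = int(math.log2(n))
    return {
        "BK": 2 * lg - 1,  # Brent-Kung: 2*log2(N) - 1
        "LF": lg + 1,      # Ladner-Fischer: log2(N) + 1, as stated in the text
        "KS": lg,          # Kogge-Stone: log2(N)
    }

for n in (4, 8, 16):
    print(n, logic_levels(n))
```

The gap between BK and KS widens with N, which is the depth-versus-cell-count trade-off the paragraph describes.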
In this paper, we present a modified CSLA structure in which the upper RCA block is replaced with a modified parallel prefix tree while the lower RCA block is retained as a BEC. Using this technique, we obtained a significant improvement in performance compared to the conventional CSLA, BEC-CSLA, BK-CSLA, LF-CSLA and KS-CSLA reported in the literature. The paper is organized as follows: Section 2 gives a brief description of the BEC. Section 3 introduces the parallel prefix computation and the proposed CSLA. Finally, Sections 4 and 5 discuss the experimental results and conclusions.

2. Binary to Excess – 1 Conversion [4]

The proposed CSLA structure is obtained by replacing the conventional RCAs with a BEC and a modified PPA. The BEC based CSLA simplifies the logic structure by using fewer logic resources than the RCA, which provides a significant area reduction but a marginally higher carry propagation delay [4]. Ramkumar et al. proposed this structure, in which the RCA block with Cin = 1 is replaced to compute the sum and carry. The gate level modified structure (Fig. 1) is obtained by solving the function table for the 4-bit BEC shown in Table 1 [4], where B denotes the 4-bit binary number and E is its 4-bit binary to excess-1 conversion.

Table 1. 4-bit binary to excess-1 conversion table [4].

| B[3:0] | E[3:0] |
|--------|--------|
| 0000   | 0001   |
| 0001   | 0010   |
| ⋮      | ⋮      |
| 1110   | 1111   |
| 1111   | 0000   |

The reduced expressions for the 4-bit binary to excess-1 conversion obtained from Table 1 are as follows [4],

$$E_0 = {\sim}B_0 \tag{1}$$
$$E_1 = B_0 \oplus B_1 \tag{2}$$
$$E_2 = B_2 \oplus (B_0 \cdot B_1) \tag{3}$$
$$E_3 = B_3 \oplus (B_0 \cdot B_1 \cdot B_2) \tag{4}$$

where ⊕ represents the EXOR operation, ‘·’ represents the AND operation and ‘~’ represents the NOT operation.
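Eqs. (1) to (4) can be checked against the function table with a few lines of code. The sketch below is a behavioural model only, not the gate-level netlist of Fig. 1; the function name `bec4` is a hypothetical helper introduced for illustration.

```python
def bec4(b: int) -> int:
    """4-bit binary to excess-1 conversion built from Eqs. (1)-(4)."""
    b0, b1, b2, b3 = ((b >> k) & 1 for k in range(4))
    e0 = b0 ^ 1                # Eq. (1): E0 = ~B0
    e1 = b0 ^ b1               # Eq. (2)
    e2 = b2 ^ (b0 & b1)        # Eq. (3)
    e3 = b3 ^ (b0 & b1 & b2)   # Eq. (4)
    return e0 | (e1 << 1) | (e2 << 2) | (e3 << 3)

# Every 4-bit input should map to (B + 1) modulo 16, as in Table 1.
for b in range(16):
    assert bec4(b) == (b + 1) % 16
```

Each output bit toggles only when all lower input bits are 1, which is why the BEC needs no full adders.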

Fig. 1. 4-bit BEC internal structure reproduced from the work of Ramkumar et al.

3. Parallel Prefix computation [9]

In this section, we briefly explain the parallel prefix computation using two N-bit binary operands, ‘a’ (a1, a2, …, aN) and ‘b’ (b1, b2, …, bN). In a parallel prefix scheme, two signals, carry generate ($G_i$) and carry propagate ($P_i$), are needed to compute the carry signal efficiently from the 2N input bits [9]. These signals can be generalized to describe a group of input bits ($a_i$, $b_i$). A group of bits generates a carry if its Cout is true independent of Cin, and it propagates a carry if its Cout is true when there is a Cin. The base case of the recursion, taken from Weste et al., is given by Eqs. (5) and (6),

$$P_{i:i} = P_i = a_i \oplus b_i \tag{5}$$
$$G_{i:i} = G_i = a_i \cdot b_i \tag{6}$$

These signals can be written recursively for $i \ge k > j$ as [9, 10],

$$G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j} \tag{7}$$
$$P_{i:j} = P_{i:k} \cdot P_{k-1:j} \tag{8}$$

Eqs. (5) and (6) are called the bitwise PG logic [9]. The Cin bit must be defined specifically, with $C_0 = C_{in}$ and $C_N = C_{out}$. Thus, the propagate and generate signals for bit 0, corresponding to i = 0 in Eqs. (5) and (6), are

$$P_{0:0} = 0 \tag{9}$$
$$G_{0:0} = C_{in} \tag{10}$$

Fig. 2. Generalized structure of 4-bit parallel prefix tree has been redrawn from Weste et al.

We compute the sum bits $S_i$ in terms of the propagate and group generate signals using Eq. (11),

$$S_i = P_i \oplus G_{i-1:0} \tag{11}$$

Finally, from the above equations, addition with the parallel prefix tree structure can be computed in 3 steps: (i) first compute the bitwise $G_i$ and $P_i$ signals using Eqs. (5) and (6), (ii) then obtain the group generate $G_{i-1:0}$ from the PG signals, where N ≥ i ≥ 1, using Eqs. (7) and (8), (iii) finally calculate the sum using Eq. (11).
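The three steps can be sketched in a few lines of code. This is a minimal behavioural model under stated assumptions: the function name `prefix_add` is hypothetical, step (ii) is evaluated serially by applying Eq. (7) with k = i (a real prefix tree computes the same group generates in parallel), and each sum bit uses the group generate of the lower bits, $G_{i-1:0}$, which is the carry into position i obtained in step (ii).

```python
def prefix_add(a: int, b: int, cin: int, n: int) -> tuple:
    """Add two n-bit operands using the bitwise PG / group PG / sum steps.

    Bit positions run from 1 to n; position 0 holds Cin, Eqs. (9) and (10).
    """
    # Step (i): bitwise PG logic, Eqs. (5) and (6).
    p = [0] + [((a >> (i - 1)) ^ (b >> (i - 1))) & 1 for i in range(1, n + 1)]
    g = [cin] + [((a >> (i - 1)) & (b >> (i - 1))) & 1 for i in range(1, n + 1)]

    # Step (ii): group generate G[i:0] via Eq. (7) with k = i.
    # Evaluated serially here; a prefix tree produces the same values in parallel.
    group_g = [g[0]]
    for i in range(1, n + 1):
        group_g.append(g[i] | (p[i] & group_g[i - 1]))

    # Step (iii): sum bit i is P_i XOR the carry into position i, G[i-1:0].
    s = 0
    for i in range(1, n + 1):
        s |= (p[i] ^ group_g[i - 1]) << (i - 1)
    return s, group_g[n]  # (sum, Cout)

print(prefix_add(0b1011, 0b0110, 1, 4))
```

The different prefix adders (BK, LF, KS) differ only in how step (ii) is arranged into a tree; steps (i) and (iii) are common to all of them.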




The generalized structure of the parallel prefix tree is shown in Fig. 2. It consists of bitwise PG logic, group PG logic and sum logic. We have made architectural modifications to the group PG logic of the parallel prefix tree, which not only increase the speed by minimizing the number of prefix computations but also reduce the area because fewer logic resources are used.

3.1. Proposed Group PG Logic

In this section, we discuss the modifications and working of the logic structure in the group PG logic. PG logic is an example of a prefix computation, which forms an integral part of the parallel prefix tree. The proposed group PG logic structure is shown in Fig. 3 (a). Let $B_{i,k}$ and $B_{k-1,j}$ be two adjacent blocks in an adder module. These two blocks consist of $(i - k + 1)$ and $(k - j)$ bits respectively, and block $B_{i,k}$ holds more significant bits than $B_{k-1,j}$. The generate and propagate signals of the two adjacent blocks are denoted $(G_{i,k}, P_{i,k})$ and $(G_{k-1,j}, P_{k-1,j})$ respectively. If the two adjacent blocks are combined into one continuous block, that block has $(i - j + 1)$ bits, and its generate and propagate signals are computed using Eqs. (7) and (8).

Fig. 3. (a) Proposed Group PG Logic (b) Black, Gray and Buffer cell used in parallel tree structure.

Fig. 3 (b) shows the black, gray and white (buffer) cells. Black cells contain the group generate and group propagate logic. Gray cells contain only the group generate logic. The buffer simply passes the value from input to output and also minimizes the load on the critical path. A black cell consists of 2 AND gates and 1 OR gate and generates 2 outputs ($G_{i:j}$, $P_{i:j}$), while a gray cell is realized using 1 AND gate and 1 OR gate to produce a single output ($G_{i:j}$), as shown in Fig. 4.

Fig. 4. Internal structure of Black, Gray and Buffer cells using AND-OR Logic.
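The three cell types map directly onto Eqs. (7) and (8). The sketch below models them as small functions on single-bit values; the names `black_cell`, `gray_cell` and `buffer_cell` are hypothetical and chosen only to mirror the figure.

```python
from itertools import product

def black_cell(g_hi: int, p_hi: int, g_lo: int, p_lo: int) -> tuple:
    """Group generate and propagate, Eqs. (7) and (8): 2 AND gates, 1 OR gate."""
    return g_hi | (p_hi & g_lo), p_hi & p_lo

def gray_cell(g_hi: int, p_hi: int, g_lo: int) -> int:
    """Group generate only, Eq. (7): 1 AND gate, 1 OR gate."""
    return g_hi | (p_hi & g_lo)

def buffer_cell(g: int, p: int) -> tuple:
    """Passes its inputs through unchanged."""
    return g, p

# A gray cell is a black cell with the propagate output (and its AND gate) dropped.
for g_hi, p_hi, g_lo, p_lo in product((0, 1), repeat=4):
    assert gray_cell(g_hi, p_hi, g_lo) == black_cell(g_hi, p_hi, g_lo, p_lo)[0]
```

Gray cells are sufficient wherever the group propagate output is not consumed by a later stage, which saves one AND gate per cell.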


Fig. 5. Internal Gate-Level structure of Proposed Group PG Logic.

Fig. 5 shows the internal gate level structure of the proposed group PG logic, obtained by replacing the black, gray and white cells of Fig. 3 (a) with equivalent AND-OR logic. The prefixes $\{G_{5:0}, \ldots, G_{1:0}\}$ are computed using Eqs. (12) to (16), obtained from Fig. 5.

$$G_{1:0} = G_1 + P_1 \cdot C_{in} \tag{12}$$
$$G_{2:0} = G_2 + P_2 \cdot C_{in} \tag{13}$$
$$G_{3:0} = G_3 + P_3 (G_2 + P_2 \cdot C_{in}) \tag{14}$$
$$G_{4:3} = G_4 + P_4 \cdot G_3 \tag{15}$$
$$C_{out} = G_{5:0} = G_4 + P_4 \cdot G_3 + P_3 \cdot P_4 (G_2 + P_2 \cdot C_{in}) \tag{16}$$
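Taking the expressions exactly as printed, Eq. (16) has the same form as Eq. (7): it merges the block generate of Eq. (15), the block propagate $P_4 \cdot P_3$ and the prefix of Eq. (13). The following sketch checks that identity exhaustively; the function names are hypothetical, and the check says nothing beyond the algebraic equivalence of the printed expressions.

```python
from itertools import product

def eq13(g2: int, p2: int, cin: int) -> int:
    """G2:0 as printed in Eq. (13)."""
    return g2 | (p2 & cin)

def eq15(g4: int, p4: int, g3: int) -> int:
    """G4:3 as printed in Eq. (15)."""
    return g4 | (p4 & g3)

def eq16(g2: int, g3: int, g4: int, p2: int, p3: int, p4: int, cin: int) -> int:
    """Cout as printed in Eq. (16)."""
    return g4 | (p4 & g3) | (p3 & p4 & (g2 | (p2 & cin)))

mismatches = 0
for g2, g3, g4, p2, p3, p4, cin in product((0, 1), repeat=7):
    # Eq. (7) form: block generate OR (block propagate AND lower prefix).
    merged = eq15(g4, p4, g3) | ((p4 & p3) & eq13(g2, p2, cin))
    if merged != eq16(g2, g3, g4, p2, p3, p4, cin):
        mismatches += 1

print("mismatches:", mismatches)
```

In this form Cin reaches Cout through a single AND-OR merge after the block signals are ready.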

It can be observed from Fig. 3 (a) that the proposed PG logic structure requires fewer black and gray cells, which in turn reduces the area and the delay in the critical path compared to other CSLA based structures. Eq. (16) shows that the prefix expression used to compute Cout is less dependent on Cin. The CSLA with BEC [4] is modified by replacing the upper RCA block with the proposed parallel prefix tree, while the BEC circuit used in place of the lower RCA block is retained as it is. The final sum and Cout are obtained by using a MUX to select the partial output of either the proposed tree or the BEC structure. The proposed 16-bit CSLA structure is shown in Fig. 6.

Fig. 6. Proposed Carry select adder.
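The selection principle can be illustrated with a behavioural sketch. It is not the authors' gate-level design: a ripple-carry adder stands in for the proposed prefix tree in the Cin = 0 path, the BEC is modelled as a plain increment, and uniform 4-bit groups are assumed because the grouping in Fig. 6 is not reproduced here. The names `rca`, `csla_group` and `csla16` are hypothetical.

```python
def rca(a: int, b: int, cin: int, n: int) -> tuple:
    """Ripple-carry addition of two n-bit operands, one full adder per bit."""
    s, c = 0, cin
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (c & (ai ^ bi))
    return s, c

def csla_group(a: int, b: int, cin: int, n: int = 4) -> tuple:
    """One carry-select group: adder with Cin = 0, BEC for Cin = 1, then a mux."""
    s0, c0 = rca(a, b, 0, n)                  # upper path, computed for Cin = 0
    word = s0 | (c0 << n)                     # (n+1)-bit word {c0, s0}
    inc = (word + 1) & ((1 << (n + 1)) - 1)   # behavioural BEC: excess-1 of the word
    s1, c1 = inc & ((1 << n) - 1), inc >> n   # result for Cin = 1
    return (s1, c1) if cin else (s0, c0)      # mux selects once Cin is known

def csla16(a: int, b: int, cin: int = 0) -> tuple:
    """16-bit adder built from four 4-bit groups (grouping is an assumption)."""
    total, carry = 0, cin
    for k in range(0, 16, 4):
        s, carry = csla_group((a >> k) & 0xF, (b >> k) & 0xF, carry)
        total |= s << k
    return total, carry

print(csla16(1234, 4321))
```

Both candidate results of a group are available before its carry-in arrives, so only the mux lies on the inter-group carry path.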




4. Results

Table 2 shows the experimental results of the modified CSLA compared to the conventional CSLA, BEC-CSLA, BK-CSLA, LF-CSLA and KS-CSLA with respect to area, power and delay. All simulations and synthesis were carried out using the NC launch and RC compiler tools respectively. The area refers to the total cell area, the power dissipation is the combination of leakage power and dynamic power, and the delay is the maximum arrival time at the output port of the synthesized design. Compared to the KS based CSLA, the proposed CSLA shows an area reduction of 84.19%, 14.43%, 11.57% and a power reduction of 40.25%, 14.24%, 11.52% for 4, 8 and 16 bits respectively. The tree structure achieves comparatively lower area and power because the number of gray and black cells is minimized. Fig. 7 shows the corresponding graph of percentage area and power reduction with respect to the KS based CSLA. The area and power reduction is significant for the lower word size, while for the higher word sizes the improvement is marginal. Table 2 also shows that the delay of the proposed CSLA is lower than that of the other CSLA structures for word sizes of 4, 8 and 16 bits. Fig. 8 compares delay versus word size for the proposed CSLA and the conventional CSLA, BEC-CSLA, BK-CSLA, LF-CSLA and KS-CSLA structures. It can be observed from the graph that the delay increases with word size and that the proposed CSLA performs better than the other CSLA structures. This is due to the efficient propagation of the Cin signal throughout the parallel prefix tree without being dependent on the input bits.

Table 2. Comparison of Power, Area and Delay for different CSLA structures.

| Word size | CSLA structure | Area (µm²) | Area comparison with proposed CSLA | Total Power (µW) | Power comparison with proposed CSLA | Delay (ns) | Delay comparison with proposed CSLA |
|---|---|---|---|---|---|---|---|
| 4-bit | Conventional CSLA [3] | 669 | 62.18% | 33.56 | 62.40% | 1.36 | 54.41% |
| 4-bit | BEC-CSLA [4] | 532 | 52.44% | 27.10 | 53.43% | 1.81 | 65.75% |
| 4-bit | BK-CSLA [5] | 293 | 13.65% | 15.35 | 17.79% | 1.26 | 50.79% |
| 4-bit | LF-CSLA [7] | 299 | 15.39% | 15.58 | 19.00% | 1.09 | 43.12% |
| 4-bit | KS-CSLA [8] | 466 | 84.19% | 21.12 | 40.25% | 1.12 | 44.64% |
| 4-bit | Proposed CSLA | 253 | - | 12.62 | - | 0.62 | - |
| 8-bit | Conventional CSLA [3] | 978 | 4.81% | 54.29 | 9.23% | 1.76 | 7.95% |
| 8-bit | BEC-CSLA [4] | 931 | 0 | 53.61 | 8.08% | 2.15 | 24.65% |
| 8-bit | BK-CSLA [5] | 815 | -14.23% | 46.79 | -5.32% | 1.88 | 13.83% |
| 8-bit | LF-CSLA [7] | 828 | -12.44% | 43.02 | -14.55% | 1.78 | 8.99% |
| 8-bit | KS-CSLA [8] | 1088 | 14.43% | 57.46 | 14.24% | 1.81 | 10.50% |
| 8-bit | Proposed CSLA | 931 | - | 49.28 | - | 1.62 | - |
| 16-bit | Conventional CSLA [3] | 2315 | 11.19% | 131.36 | 12.37% | 3.07 | 7.82% |
| 16-bit | BEC-CSLA [4] | 1979 | -3.89% | 116.89 | 1.52% | 3.61 | 21.61% |
| 16-bit | BK-CSLA [5] | 1859 | -10.60% | 110.27 | -4.39% | 3.12 | 9.30% |
| 16-bit | LF-CSLA [7] | 1879 | -9.42% | 104.38 | -10.28% | 2.99 | 5.35% |
| 16-bit | KS-CSLA [8] | 2325 | 11.57% | 130.10 | 11.52% | 3.02 | 6.30% |
| 16-bit | Proposed CSLA | 2056 | - | 115.11 | - | 2.83 | - |
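The delay comparison column appears to be the reduction relative to each reference design, (reference − proposed) / reference. This is an inference from the tabulated numbers rather than a formula stated in the paper; the sketch below reproduces the 4-bit delay figures under that assumption, with `reduction_pct` as a hypothetical helper name.

```python
def reduction_pct(reference: float, proposed: float) -> float:
    """Percentage reduction of the proposed design relative to a reference design."""
    return round(100 * (reference - proposed) / reference, 2)

# 4-bit delays in ns, copied from Table 2.
delay_4bit = {"Conventional": 1.36, "BEC": 1.81, "BK": 1.26, "LF": 1.09, "KS": 1.12}
proposed_delay = 0.62

for name, delay in delay_4bit.items():
    print(f"{name}-CSLA: {reduction_pct(delay, proposed_delay)}%")
```

A negative entry in the table under this reading means the proposed design is larger or consumes more power than the reference for that word size.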


Fig. 7. Percentage area and power reduction of proposed CSLA with respect to KS-CSLA.

Fig. 8. Comparison of delay for various CSLA structures at word sizes of 4, 8 and 16 bits.

5. Conclusion

It can be concluded from the results that the modified CSLA structure combines the area reduction provided by the BEC with the high speed achieved through the parallel prefix tree. Future work includes exploring better parallel prefix structures and implementing the modified CSLA structure for 64-bit and 128-bit word sizes.

References

[1] Raghava et al., High Speed Power Efficient Carry Select Adder Design, IEEE Computer Society Annual Symposium on VLSI, 2017, pp. 3237.
[2] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective, Upper Saddle River, NJ: Prentice-Hall, 2001.
[3] O. J. Bedrij, Carry Select Adder, IRE Transactions on Electronic Computers, 1962, Vol. EC-11, No. 3, pp. 340–346.
[4] B. Ramkumar and Harish M. Kittur, Low-Power and Area-Efficient Carry Select Adder, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, February 2012, Vol. 20, No. 2, pp. 371–375.
[5] Pallavi Saxena, Design of Low Power and High Speed Carry Select Adder Using Brent Kung Adder, International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI-SATA), January 2015, pp. 1–6.
[6] K. Golda Hepzibha et al., A Novel Implementation of High Speed Modified Brent Kung Carry Select Adder, International Conference of Intelligent Systems and Control, January 2016, pp. 1–5.
[7] Richard E. Ladner et al., Parallel Prefix Computation, Journal of the Association for Computing Machinery, October 1980, Vol. 27, No. 4, pp. 831–838.
[8] Peter M. Kogge and Harold S. Stone, Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations, IEEE Transactions on Computers, August 1973, Vol. C-22, No. 8.
[9] Neil H. E. Weste and D. Harris, CMOS VLSI Design, 4th ed. Addison Wesley, Pearson Education, 1993.
[10] Sabyasachi Das and Sunil P. Khatri, A Novel Hybrid Parallel-Prefix Adder Architecture with Efficient Timing-Area Characteristic, IEEE Transactions on VLSI Systems, March 2008, Vol. 16, No. 3.