Performance Comparison of 64-bit Carry Look-Ahead Adders Using 32nm CMOS Technology


Available online at www.sciencedirect.com ScienceDirect Materials Today: Proceedings 4 (2017) 4153–4168 www.materialstoday.com/proceedings I3C4N P...



I3C4N


T.D. Subash (a), Ajaiyan (b), T.D. Subha (c)

(a) Head of ECE Department, Holy Grace Academy of Engineering, Mala, Thrissur, India

(b) Assistant Professor of ECE Department, Holy Grace Academy of Engineering, Mala, Thrissur, India

(c) Assistant Professor of ECE Department, Holy Grace Academy of Engineering, Mala, Thrissur, India

Abstract

In this paper, the Manchester carry chain (MCC) with the 8-bit carry chain proposed by Costas Efstathiou and the corresponding adder circuits are designed using 32nm CMOS technology with a supply voltage of 0.9 V, and the effect of temperature on circuit performance is analyzed. The 4-bit adder requires inputs a0 to a3 and b0 to b3 along with the initial carry cin; its sum outputs are s0 to s3 and its carry outputs are c0 to c3. The new high-speed MCC requires inputs a0 to a7, b0 to b7, cin and clock, and its corresponding outputs are c0 to c7. The circuits are implemented using HSPICE.

© 2017 Published by Elsevier Ltd. Selection and Peer-review under responsibility of Conference Committee Members of International Conference on Computing, Communication, Nanophotonics, Nanoscience, Nanomaterials and Nanotechnology.

Keywords: Conventional domino MCC (Manchester carry chain); CLA (Carry look-ahead adders); multi-output domino logic.

2214-7853 © 2017 Published by Elsevier Ltd. Selection and Peer-review under responsibility of Conference Committee Members of International Conference on Computing, Communication, Nanophotonics, Nanoscience, Nanomaterials and Nanotechnology.


I. Introduction

Addition is the most frequently used arithmetic operation and is also the speed-limiting element in making VLSI processors faster. As the demand for higher-performance processors increases, there is a continuing need to improve the performance of arithmetic units and to increase their functionality. High-speed adder architectures include carry-select adders, carry-skip adders, conditional-sum adders, carry look-ahead (CLA) adders, and combinations of these structures [1]–[4]. High-speed adders based on the CLA principle remain dominant, since the carry delay can be improved by calculating each stage in parallel. The CLA algorithm was first introduced in [5], and many variants have been developed since. The Manchester carry chain (MCC) is the most common dynamic (domino) CLA adder architecture, with a regular, fast, and simple structure well suited to implementation in VLSI [6], [7]. The aim of this work is to reduce both area and power.

A. Full Adder

A full adder takes the bits a and b and an input carry as inputs, and generates a sum and a carry as outputs. A 4-bit adder uses four full adder blocks, each with separate inputs and outputs. The blocks are connected in series: the carry out of the first full adder is the carry input of the second, and so on. Fig. 1 shows the block diagram of the 4-bit adder.

Fig. 1. Block diagram of 4-bit full adders (Wikipedia).
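The serial connection of full adders described above can be illustrated with a small behavioural model. This is only a sketch at the logic level, not the transistor-level circuit; the function names `full_adder`, `ripple_carry_add`, `to_bits` and `from_bits` are assumptions introduced for illustration, and bit lists are taken to be little-endian (index 0 is the least significant bit).

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Chain full adders so each carry out feeds the next carry in."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    """Little-endian bit list of an unsigned integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    s, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(s), cout)  # 11 + 6 = 17, i.e. sum bits 0001 with carry out 1
```

Because the loop consumes the carry of the previous stage before it can produce the next one, the model makes the serial dependency of the ripple structure explicit.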

B. Carry Look-ahead Adder

A carry look-ahead adder (CLA), or fast adder, is a type of adder used in digital logic. It improves speed by reducing the time required to determine the carry bits. It can be contrasted with the simpler, but usually slower, ripple-carry adder, in which the carry bit is calculated alongside the sum bit and each bit must wait until the previous carry has been calculated before it can begin calculating its own result and carry bits. The carry look-ahead adder instead calculates one or more carry bits before the sum, which reduces the wait time to calculate the result of the higher-order bits. Fig. 2 shows a 4-bit adder with carry look-ahead. It generates the propagate and generate signals and produces the carry and sum as outputs. Examples of this type of adder are the Kogge-Stone adder and the Brent-Kung adder. Charles Babbage recognized the performance penalty imposed by ripple carry and developed mechanisms for anticipating carriage in his computing engines.

Figure 2: Block diagram of a 4-bit CLA (2).

To reduce the computation time when adding two binary numbers, carry chains are used. They work by creating two signals, the carry propagate and the carry generate, denoted P and G respectively. The carry generate signal produces an output carry regardless of the input carry, and the carry propagate signal passes an incoming carry on to the next stage, as shown in Fig. 2. The carry look-ahead adder can be split into two modules: (1) the PFA (Partial Full Adder), which generates Pi, Gi and Si; and (2) the Carry Look-ahead Logic, which generates the carry-out bits. The 4-bit adder can then be built from four PFAs and the Carry Look-ahead Logic block, as shown in Fig. 2. One disadvantage of the carry look-ahead adder is that the carry logic becomes quite complicated for more than 4 bits. For this reason, carry look-ahead adders are usually implemented as 4-bit modules, and wider adders are built in multiples of 4 bits.

The domino circuit representation is shown in Fig. 4. A wider adder makes use of the same CLA logic block as the one used in the 4-bit adder: each 4-bit adder generates a group generate and a group propagate signal, which are used by the CLA logic block. Likewise, a new 8-bit carry chain adder block in multi-output domino CMOS logic is proposed. The odd and even carries of this adder are computed in parallel by two independent 4-bit carry chains. Implementations of wider adders based on the new 8-bit adder show that it operates at higher speed. Fig. 3 shows the block diagram of a 16-bit CLA adder.


Figure 3: Block diagram of a 16-bit CLA Adder (2).
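The split of a 4-bit CLA into partial full adders and carry look-ahead logic can be sketched behaviourally as follows. The standard definitions Pi = ai XOR bi and Gi = ai AND bi are assumed, the carry out of bit i is named ci (so the sum bit i uses the carry of bit i-1), and the names `cla4` and `to_bits` are hypothetical helpers introduced only for this illustration.

```python
def to_bits(value, width):
    """Little-endian bit list: index 0 is the least significant bit."""
    return [(value >> i) & 1 for i in range(width)]


def cla4(a_bits, b_bits, cin=0):
    """Behavioural 4-bit carry look-ahead adder.

    Returns (sum_bits, carries, group_generate, group_propagate).
    """
    # Partial full adders: per-bit propagate and generate signals.
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]

    # Carry look-ahead logic: each carry depends only on p, g and cin,
    # never on another carry, so all carries are available in parallel.
    c0 = g[0] | (p[0] & cin)
    c1 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c2 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & cin))

    # Group signals, as used by a second-level CLA block in wider adders.
    group_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    group_p = p[0] & p[1] & p[2] & p[3]
    c3 = group_g | (group_p & cin)

    carries = [c0, c1, c2, c3]
    carry_in = [cin, c0, c1, c2]
    sums = [p[i] ^ carry_in[i] for i in range(4)]
    return sums, carries, group_g, group_p


if __name__ == "__main__":
    sums, carries, gg, gp = cla4(to_bits(9, 4), to_bits(7, 4))
    print(sums, carries, gg, gp)
```

The group generate and group propagate outputs are what a 16-bit structure such as the one in Fig. 3 would feed into its second-level look-ahead block.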

CMOS is the technology used for constructing integrated circuits. It is used in microprocessors, static RAM, etc. It is also used for many analog circuits such as data converters, image sensors (CMOS sensors) and other types of transceivers used for communication. Frank Wanlass patented CMOS in 1963 while working for Fairchild Semiconductor. The term "complementary-symmetry" refers to the fact that the typical CMOS design style uses complementary and symmetrical pairs of n-type and p-type MOSFETs. The two major characteristics of CMOS devices are low static power consumption and high noise immunity. Because one transistor of the pair is always off, the series combination draws significant power only momentarily during switching between the on and off states. Consequently, CMOS devices do not produce as much waste heat as other forms of logic, for example NMOS logic or transistor–transistor logic (TTL), which normally have some standing current even when not changing state. CMOS also allows a high density of logic functions on a chip. It was primarily for this reason that CMOS became the most used technology for implementing VLSI chips. The domino logic and its operation are discussed below.

II. Previous work and its discussion

This section explains domino logic. The domino logic circuit implementations of the generate signal and of the XOR and OR propagate signals are shown in Fig. 4.


Fig. 4. Domino circuit implementation for the (a) generate, (b) XOR propagate, and (c) OR propagate signals (3).

Fig. 5. Conventional domino 4-bit MCC (3).

Let $A = a_{n-1}a_{n-2}\ldots a_1a_0$ and $B = b_{n-1}b_{n-2}\ldots b_1b_0$ represent two binary numbers. A and B are added to produce the sum $S = s_{n-1}s_{n-2}\ldots s_1s_0$. In the following, the symbols $\cdot$, $+$ and $\oplus$ denote the AND, (inclusive) OR and EXCLUSIVE OR logical operations, respectively, and an overbar denotes NOT. The recursive formula for the carry computation in binary addition, as given by Costas Efstathiou, is

$$c_i = g_i + z_i \cdot c_{i-1} \qquad (1)$$

where $g_i = a_i \cdot b_i$ and $z_i$ are the carry generate and carry propagate terms, respectively. For INCLUSIVE OR adders the propagate term is defined as $z_i = t_i = a_i + b_i$, and for EXCLUSIVE OR adders it is defined as $z_i = p_i = a_i \oplus b_i$. Fig. 4 shows the implementation of the generate and the two types of propagate signals in domino CMOS logic. By expanding relation (1), each carry bit $c_i$ can be expressed as

$$c_i = g_i + z_i g_{i-1} + z_i z_{i-1} g_{i-2} + \ldots + z_i z_{i-1} \cdots z_1 g_0 + z_i z_{i-1} \cdots z_0 c_{-1} \qquad (2)$$

The sum bits of an adder are defined as $s_i = p_i \oplus c_{i-1}$, where $c_{-1}$ is the input carry.

The MCC presented in [15], [17] generates the carries according to relation (2) using an iterative shared-transistor design. To limit the number of series-connected transistors, the length of the carry chain is restricted to four bits. Fig. 5 shows the conventional implementation of the 4-bit carry chain using multi-output domino CMOS logic. MCC adders are EXCLUSIVE OR adders, i.e., the carry propagate signal is defined as $z_i = p_i = a_i \oplus b_i$, to avoid false discharges at the outputs of the carry chain multi-output gates. For the implementation of the sum signals, the domino chain is terminated, and the sum bits of the MCC adder are implemented using static CMOS XOR gates, as described in [26] and shown in Fig. 6. Several variations are introduced in [14] and [17], where the static CMOS and MCC implementations are also explained in detail. To improve performance, a high-speed MCC design is used.

Fig. 6. Static CMOS implementation of the XOR gate for the sum computation (3).

III. New double carry chain adders with high-speed

As discussed above, technological constraints limit the length of the carry chain to 4 bits. Wider adders are therefore built from 4-bit adders, as described in [14], [18], [22]. The 8-bit adder module considered in this paper is composed of two independent carry chains. The length of each chain, measured as the maximum number of series-connected transistors, is equal to that of the 4-bit MCC adder. Using the proposed 8-bit adder as the basic block, rather than the 4-bit MCC adder, leads to higher speed in wider adders. The derived carry equations are similar to the carries proposed by Ling, see [25]–[27]. Carry equations are derived for both even and odd carries, and the even and odd carries are computed separately. This separation allows the implementation to be based on two independent 4-bit carry chains: one chain computes the even carries and the other computes the odd carries. The proposed method is explained using MCC adders. Fig. 9 shows the sum bit implementation and Fig. 10 shows the ripple carry chains of the 4-bit and 8-bit MCC adders.

Fig. 7. Proposed carries’ implementation for (a) the even carry chain and (b) the odd carry chain (3).
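Relations (1) and (2) of Section II, on which the derivations below rely, can be checked against each other numerically. The following sketch is an illustration only; it assumes little-endian bit lists, treats `cin` as $c_{-1}$, and uses hypothetical function names (`carries_recursive`, `carry_expanded`). The flag `use_xor` selects between $z_i = p_i$ (EXCLUSIVE OR adders) and $z_i = t_i$ (INCLUSIVE OR adders).

```python
def to_bits(value, width):
    """Little-endian bit list: index 0 is the least significant bit."""
    return [(value >> i) & 1 for i in range(width)]


def propagate(a, b, use_xor):
    """z_i = p_i = a XOR b, or z_i = t_i = a OR b."""
    return (a ^ b) if use_xor else (a | b)


def carries_recursive(a_bits, b_bits, cin, use_xor=True):
    """Relation (1): c_i = g_i + z_i * c_{i-1}, evaluated bit by bit."""
    carry = cin
    carries = []
    for a, b in zip(a_bits, b_bits):
        carry = (a & b) | (propagate(a, b, use_xor) & carry)
        carries.append(carry)
    return carries


def carry_expanded(a_bits, b_bits, cin, i, use_xor=True):
    """Relation (2): c_i as a sum of products, with no carry feedback."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    z = [propagate(a, b, use_xor) for a, b in zip(a_bits, b_bits)]
    result = 0
    product = 1  # running product z_i * z_{i-1} * ... * z_{j+1}
    for j in range(i, -1, -1):
        result |= product & g[j]
        product &= z[j]
    # Last term of (2): z_i * ... * z_0 * c_{-1}
    return result | (product & cin)


if __name__ == "__main__":
    a_bits, b_bits = to_bits(0b1011, 4), to_bits(0b0110, 4)
    print(carries_recursive(a_bits, b_bits, 0))
    print([carry_expanded(a_bits, b_bits, 0, i) for i in range(4)])
```

Both forms give the same carries for either choice of propagate signal, which is why the derivation can switch between $t_i$ and $p_i$ as convenient.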

A. Computation for Even Carry Generation When

i  0 and z o  t 0 , from equation (1), the obtained result is c0  g 0  t 0 .c1 . If the relation g i  g i .t i

holds, then the carry is mentioned as c0

 t 0 .( g 0  c1 )  t 0 .h0 , where h0  g 0  c1 in which c1 is the new

carry. From equation (2), for

i  2 and z i  pi , we obtained that

c2  g 2  p 2 g1  p2 p1 g 0  p 2 p1 p0 c1 . Since g i  pi .g i 1  g i  t i .g i 1 and pi  pi .t i , then the carry is c 2  t 2 ( g 2  g1  p 2 p1 g 0  p 2 p1 p 0 c 1 )  t 2 ( g 2  g1  p 2 p1t 0 ( g 0  c 1 ))  t 2 .h2 Where

h2  g 2  g1  p2 p1t 0 ( g 0  c1 ) is the new carry and is denoted as C2.

Similarly, the new carries for i  4,6 are computed as follows,

4160

T.D.Subash, Ajaiyan, T.D.Subha/ Materials Today: Proceedings 4 (2017) 4153–4168

h4  g4  g3  p4 p3t2 (g2  g1  p2 p1t0 (g0  c1 ))

h6  g6  g5  p6 p5t4  (g4  g3  p4 p3t2 (g2  g1  p2 p1t0 (g0  c1 )))
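The relation $c_i = t_i \cdot h_i$ for the even carries can be confirmed by exhaustive evaluation. The sketch below is a behavioural check, not a circuit description; `even_new_carries` and `conventional_carry` are hypothetical helper names, bit lists are little-endian, and `cin` stands for $c_{-1}$.

```python
def to_bits(value, width):
    """Little-endian bit list: index 0 is the least significant bit."""
    return [(value >> i) & 1 for i in range(width)]


def even_new_carries(a, b, cin):
    """Evaluate h0, h2, h4, h6 for 8-bit operands; also return t."""
    a_bits, b_bits = to_bits(a, 8), to_bits(b, 8)
    g = [x & y for x, y in zip(a_bits, b_bits)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    t = [x | y for x, y in zip(a_bits, b_bits)]

    # Each new carry reuses the previous even one, which is exactly
    # the nested form of the h4 and h6 expressions above.
    h0 = g[0] | cin
    h2 = g[2] | g[1] | (p[2] & p[1] & t[0] & h0)
    h4 = g[4] | g[3] | (p[4] & p[3] & t[2] & h2)
    h6 = g[6] | g[5] | (p[6] & p[5] & t[4] & h4)
    return {0: h0, 2: h2, 4: h4, 6: h6}, t


def conventional_carry(a, b, cin, i):
    """Carry out of bit position i, taken from integer addition."""
    mask = (1 << (i + 1)) - 1
    return ((a & mask) + (b & mask) + cin) >> (i + 1)


def check_even_carries():
    """Return True if c_i == t_i AND h_i for every 8-bit input pair."""
    for a in range(256):
        for b in range(256):
            for cin in (0, 1):
                h, t = even_new_carries(a, b, cin)
                for i, h_i in h.items():
                    if conventional_carry(a, b, cin, i) != (t[i] & h_i):
                        return False
    return True


if __name__ == "__main__":
    print(check_even_carries())
```

Note that $h_i$ by itself is not the conventional carry; it only becomes $c_i$ after the AND with $t_i$, which is the step that later moves into the sum logic.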

Fig. 8. New circuit (a) generate and (b) propagate signals implemented in domino CMOS logic (3).

Fig. 9. Sum bit implementation (3).


Fig. 10. Ripple carry chains based on (a) the proposed 8-bit MCC adder module and (b) the conventional 4-bit MCC adder module (3).

B. Computation for Odd Carry Generation

The new carries for odd values of $i$ are obtained in the same way as the even carries:

$$h_1 = g_1 + g_0 + p_1 p_0 c_{-1}$$

$$h_3 = g_3 + g_2 + p_3 p_2 t_1 (g_1 + g_0 + p_1 p_0 c_{-1})$$

$$h_5 = g_5 + g_4 + p_5 p_4 t_3 (g_3 + g_2 + p_3 p_2 t_1 (g_1 + g_0 + p_1 p_0 c_{-1}))$$

$$h_7 = g_7 + g_6 + p_7 p_6 t_5 (g_5 + g_4 + p_5 p_4 t_3 (g_3 + g_2 + p_3 p_2 t_1 (g_1 + g_0 + p_1 p_0 c_{-1})))$$

The new generate and propagate signals are given by $G_i = g_i + g_{i-1}$ and $P_i = p_i \cdot p_{i-1} \cdot t_{i-2}$, respectively. The equations above describe the new groups of carries, which are computed in parallel in multi-output domino CMOS logic, as shown in Fig. 7. The new generate and propagate signals $G_i$ and $P_i$ are easily proven to be mutually exclusive, which avoids false node discharges. Their domino CMOS circuit implementation is shown in Fig. 8. The relation $c_{i-1} = t_{i-1} \cdot h_{i-1}$ holds between the new and the conventional carries; hence the sum bits are computed as $s_i = p_i \oplus (t_{i-1} \cdot h_{i-1})$. According to [17] and [18], the computation of the sum bits can be performed as follows:

$$s_i = \overline{h_{i-1}} \cdot p_i + h_{i-1} \cdot (p_i \oplus t_{i-1}) \qquad (3)$$

for $i > 0$, while $s_0 = p_0 \oplus c_{-1}$. Equation (3) can be implemented using a 2 → 1 multiplexer which selects either $p_i$ or $p_i \oplus t_{i-1}$ according to the value of $h_{i-1}$, as shown in Fig. 9. Considering that an XOR gate introduces the same delay as a 2 → 1 multiplexer, and that both terms $p_i$ and $p_i \oplus t_{i-1}$ are calculated faster than $h_{i-1}$, no additional delay is introduced by using the proposed carry signals for the calculation of the sum bits according to (3). A multiplexer selects one of several inputs and forwards it to a single output. For implementing the sum signals, the domino chain is terminated and static CMOS technology is used for the $p_i \oplus t_{i-1}$ gate and the final 2 → 1 multiplexer. The design of the XOR gate is similar to that of Fig. 6. An efficient static CMOS implementation of the sum bit is shown in Fig. 9.
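The complete 8-bit module, with both carry chains and the multiplexer-based sum of equation (3), can be modelled behaviourally as shown below. This is a logic-level sketch under stated assumptions, not the domino circuit: the function name `mcc8_add` is hypothetical, `cin` stands for $c_{-1}$, and the final carry out is taken as $c_7 = t_7 \cdot h_7$, following the relation between the new and the conventional carries.

```python
def mcc8_add(a, b, cin=0):
    """Behavioural model of the 8-bit double-carry-chain adder.

    Returns (sum, carry_out) for 8-bit unsigned operands a and b.
    """
    bits = range(8)
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in bits]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in bits]
    t = [((a >> i) & 1) | ((b >> i) & 1) for i in bits]

    h = [0] * 8
    h[0] = g[0] | cin                          # start of the even chain
    h[1] = g[1] | g[0] | (p[1] & p[0] & cin)   # start of the odd chain
    for i in range(2, 8):
        big_g = g[i] | g[i - 1]                # G_i = g_i + g_{i-1}
        big_p = p[i] & p[i - 1] & t[i - 2]     # P_i = p_i p_{i-1} t_{i-2}
        # h_i depends only on h_{i-2}, so even and odd chains never mix.
        h[i] = big_g | (big_p & h[i - 2])

    # Equation (3): a 2-to-1 multiplexer controlled by h_{i-1}.
    s = [p[0] ^ cin]
    for i in range(1, 8):
        s.append((p[i] ^ t[i - 1]) if h[i - 1] else p[i])

    total = sum(bit << i for i, bit in enumerate(s))
    carry_out = t[7] & h[7]
    return total, carry_out


if __name__ == "__main__":
    print(mcc8_add(200, 100))  # 300 = 256 + 44, so the result is (44, 1)
```

In this model the loop interleaves the two chains for brevity, but because $h_i$ reads only $h_{i-2}$, the even and odd carries could be evaluated by two separate loops with identical results.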


Fig.11. input signal waveforms.


Fig.12. Output signal waveforms.


Fig.13. Effects of temperature on power of sum circuit.

Fig.14. Effects of temperature on delay of sum circuit.


Fig.15. Effects of temperature on delay of carry circuit.

Fig.16. Effects of temperature on power of 8-bits sum circuit.


Fig.17. Effects of temperature on delay of 8-bits sum circuit.

Fig.18. Effects of temperature on delay of 8-bits carry circuit.

IV. Comparisons of MCC design and conventional design

To analyze the speed performance of the proposed (PROP) design, adders of different bit widths are designed according to the carry chain principle shown in Fig. 10(a) and (b), respectively, and simulated using HSPICE in a standard 32-nm CMOS technology (VDD = 0.9 V). The conventional 8-, 16-, 32-, and 64-bit MCC adders are designed by cascading two, four, eight, and sixteen 4-bit MCC adder modules.


TABLE I. COMPARISON OF DELAY AND POWER OF THE CONVENTIONAL MCC ADDER

No. of bits    Sum delay (ps)    Carry delay (ps)    Sum power (µW)
4-bit          75.11             69.32               4.73
8-bit          150.22            138.64              9.46
16-bit         300.44            277.28              17.72
32-bit         600.88            554.56              37.84
64-bit         1201.76           1109.12             75.68

Table I shows the simulation results for the delay and power of the conventional MCC adder. For the 4-bit adder, the sum delay, carry delay and sum power are 75.11 ps, 69.32 ps and 4.73 µW, respectively. For the 64-bit adder, the sum power is 75.68 µW. This shows that the power required increases as the number of bits increases.

Fig. 11 shows the input signal waveforms. The 64-bit conventional MCC adder is built from 4-bit modules; each 4-bit module requires inputs a0 to a3 and b0 to b3 along with the initial carry cin, and produces the sum outputs s0 to s3 and the carry outputs c0 to c3. The new high-speed MCC requires inputs a0 to a7, b0 to b7, cin and clock, and its corresponding outputs are c0 to c7. Fig. 12 shows the output signal waveforms.
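Since the conventional adders are cascades of identical 4-bit modules, the entries of Table I can be compared with simple multiples of the 4-bit row. The sketch below makes that comparison; it assumes a purely linear model (delay and power proportional to the number of modules), which is a simplification introduced here and not a claim of the simulation, and the names `TABLE_I`, `scaled_from_4bit` and `deviations` are hypothetical.

```python
import math

# Table I: bits -> (sum delay in ps, carry delay in ps, sum power in uW)
TABLE_I = {
    4: (75.11, 69.32, 4.73),
    8: (150.22, 138.64, 9.46),
    16: (300.44, 277.28, 17.72),
    32: (600.88, 554.56, 37.84),
    64: (1201.76, 1109.12, 75.68),
}
COLUMNS = ("sum_delay", "carry_delay", "sum_power")


def scaled_from_4bit(width):
    """Linear model: (width / 4) cascaded modules times the 4-bit values."""
    modules = width // 4
    return tuple(modules * value for value in TABLE_I[4])


def deviations(tol=0.01):
    """List the (width, column) entries that differ from the linear model."""
    found = []
    for width, row in TABLE_I.items():
        expected = scaled_from_4bit(width)
        for name, got, exp in zip(COLUMNS, row, expected):
            if not math.isclose(got, exp, abs_tol=tol):
                found.append((width, name))
    return found


if __name__ == "__main__":
    print(deviations())
```

Under this model the tabulated delays are exact multiples of the 4-bit delays; the 16-bit sum power (17.72 µW, against 4 × 4.73 = 18.92 µW) is the only entry that does not follow the same multiple.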

Fig. 13 shows the effect of temperature on the power of the sum circuit. At 30°C the power is 78 µW and at 80°C it is 66 µW, so the power is reduced by 12 µW as the temperature increases from 30°C to 80°C. Fig. 14 shows the effect of temperature on the delay of the sum circuit. At 30°C the delay is 66 ps and at 100°C it is 79 ps, so the delay increases by 13 ps as the temperature increases. Fig. 15 shows the effect of temperature on the delay of the carry circuit. At 30°C the delay is 61 ps and at 100°C it is 79 ps, so the delay increases by 18 ps.

Fig. 16 shows the effect of temperature on the power of the 8-bit sum circuit. At 20°C the power is 168 µW and at 80°C it is 132 µW, so the power is reduced by 36 µW. Fig. 17 shows the effect of temperature on the delay of the 8-bit sum circuit. At 30°C the delay is 88 ps and at 100°C it is 108 ps, so the delay increases by 20 ps. Fig. 18 shows the effect of temperature on the delay of the 8-bit carry circuit. At 30°C the delay is 68 ps and at 100°C it is 84 ps, so the delay increases by 16 ps.

V. Conclusion

The MCC plays an important role in the construction of adders. In this paper, the concept of the new MCC adder is explained briefly. To improve the speed, two independent carry chains are used, and the carry chains are computed in parallel. The design of the new 4-bit MCC adder is implemented using HSPICE. A 64-bit MCC adder can likewise be implemented using HSPICE, but it requires 3000 transistors to design the circuit. The 4-bit MCC adder is implemented with 67 µW sum power, 75.11 ps sum delay and 69.32 ps carry delay.

References

[1] M. H. Hjakazemi and A. Baniasadi, “An alternative hybrid power-aware adder for high-performance processors,” Journal of Low Power Electronics, vol. 10, no. 1, pp. 38–44, 2014.
[2] J. Samanta, M. Halder, and B. P. De, “Performance analysis of high speed low power carry look-ahead adder using different logic styles,” International Journal of Soft Computing and Engineering (IJSCE), ISSN 2231-2307, 2013.
[3] Costas Efstathiou, Zaher Owda and Yiorgos Tsiatouhas, “New high-speed multioutput carry look-ahead adders,” in Circuits and Systems II: Express Briefs, 2013 IEEE 56th International Midwest Symposium on. IEEE, 2013, pp. 1387–1390.
[4] H. Elmiligi, M. W. El-Kharashi, and F. Gebali, “Power consumption of 3d networks-on-chips: Modeling and optimization,” Microprocessors and Microsystems, vol. 37, no. 6, pp. 530–543, 2013.
[5] R. Uma, V. Vijayan, M. Mohanapriya, and S. Paul, “Area, delay and power comparison of adder topologies,” International Journal of VLSI design & Communication Systems (VLSICS), vol. 3, no. 1, 2012.
[6] M. Hajkazemi, A. Haghdoost, and A. Baniasdi, “Reconfiguring the carry look-ahead adder using application behavior in embedded processors,” in Electrical Engineering/Electronics Computer Telecommunications and Information Technology (ECTI-CON), 2010 International Conference on. IEEE, 2010, pp. 183–187.
[7] R. Singh, P. Kumar, and B. Singh, “Performance analysis of fast adders using vhdl,” in Advances in Recent Technologies in Communication and Computing, 2009. ARTCom’09. International Conference on. IEEE, 2009, pp. 189–193.
[8] P. Gurjar, R. Solanki, P. Kansliwal, and M. Vucha, “Vlsi implementation of adders for high speed alu,” in India Conference (INDICON), 2011 Annual IEEE. IEEE, 2011, pp. 1–6.
[9] P. Celinski, J. F. López, S. Al-Sarawi, and D. Abbott, “Low depth, low power carry lookahead adders using threshold logic,” Microelectronics Journal, vol. 33, no. 12, pp. 1071–1077, 2002.
[10] K.-H. Cheng, S.-W. Cheng, and W.-S. Lee, “64-bit pipeline carry lookahead adder using all-n-transistor tspc logics,” Journal of Circuits, Systems, and Computers, vol. 15, no. 01, pp. 13–27, 2006.
[11] F. Gebali and A. Ibrahim, “Optimized structures of hybrid ripple carry and hierarchical carry lookahead adders,” Microelectronics, 2015, in print.
[12] B. H. Meyer, J. J. Pieper, J. M. Paul, J. E. Nelson, S. M. Pieper, and A. G. Rowe, “Power-performance simulation and design strategies for single-chip heterogeneous multiprocessors,” Computers, IEEE Transactions on, vol. 54, no. 6, pp. 684–697, 2005.
[13] R. Kumar and S. Dahiya, “Performance analysis of different bit carry look ahead adder using vhdl environment,” vol. 2, 2013.
[14] K. Ueda, H. Suzuki, K. Suda, H. Shinohara, and K. Mashiko, “A 64-bit carry look ahead adder using pass transistor bicmos gates,” Solid-State Circuits, IEEE Journal of, vol. 31, no. 6, pp. 810–818, 1996.
[15] M. C. Osorio, C. Sampaio, A. Reis, R. P. Ribas et al., “Enhanced 32-bit carry look-ahead adder using multiple output enable-disable cmos differential logic,” pp. 181–185, 2004.
[16] C. Dacheng, “Vhdl implementation of a fast adder tree,” 2005.
[17] C.-C. Wang, P.-M. Lee, R.-C. Lee, and C.-J. Huang, “A 1.25 ghz 32-bit tree-structured carry lookahead adder,” vol. 4, pp. 80–83, 2001.
[18] S. H. Kim and S.-K. Chin, “Formal verification of tree-structured carry-lookahead adders,” pp. 232–232, 1999.
[19] R. Zlatanovici, S. Kao, and B. Nikolic, “Energy–delay optimization of 64-bit carry-lookahead adders with a 240 ps 90 nm cmos design example,” Solid-State Circuits, IEEE Journal of, vol. 44, no. 2, pp. 569–583, 2009.
[20] H. Dao and V. G. Oklobdzija, “Application of logical effort techniques for speed optimization and analysis of representative adders,” in Signals, Systems and Computers, 2001. Conference Record of the Thirty-Fifth Asilomar Conference on, vol. 2. IEEE, 2001, pp. 1666–1669.
[21] G. A. Ruiz and M. Granda, “An area-efficient static CMOS carry-select adder based on a compact carry look-ahead unit,” Microelectron. J., vol. 35, no. 12, pp. 939–944, Dec. 2004.
[22] M. Osorio, C. Sampaio, A. Reis, and R. Ribas, “Enhanced 32-bit carry look-ahead adder using multiple output enable-disable CMOS differential logic,” in Proc. 17th Symp. Integr. Circuits Syst. Design, 2004, pp. 181–185.
[23] S. Perri, P. Corsonello, F. Pezzimenti, and V. Kantabutra, “Fast and energy-efficient Manchester carry-bypass adders,” Proc. Inst. Elect. Eng. - Circuits Devices Syst., vol. 151, no. 6, pp. 497–502, Dec. 2004.
[24] C. Efstathiou, H. T. Vergos, and D. Nikolos, “Ling adders in CMOS standard cell technologies,” in Proc. 9th ICECS, Sep. 2002, vol. 2, pp. 485–489.
[25] B. Parhami, Computer Arithmetic, Algorithms and Hardware. New York, NY, USA: Oxford Univ. Press, 2000.
[26] A. Weinberger and J. L. Smith, “A logic for high speed addition,” Nat. Bureau Stand. Circulation, vol. 591, pp. 3–12, 1958.
[27] N. Weste and D. Harris, CMOS VLSI Design, A Circuit and System Perspective. Reading, MA, USA: Addison-Wesley, 2004.
[28] P. K. Chan and M. D. F. Schlag, “Analysis and design of CMOS Manchester adders with variable carry-skip,” IEEE Trans. Comput., vol. 39, no. 8, pp. 983–992, Aug. 1990.
[29] Z. Wang, G. Jullien, W. Miller, J. Wang, and S. Bizzan, “Fast adders using enhanced multiple-output domino logic,” IEEE J. Solid State Circuits, vol. 32, no. 2, pp. 206–214, Feb. 1997.
[30] G. Dimitrakopoulos and D. Nikolos, “High-speed parallel-prefix VLSI Ling adders,” IEEE Trans. Comput., vol. 54, no. 2, pp. 225–231, Feb. 2005.