A novel high speed Low Latency Column Bit Compressed MAC architecture for Wireless Sensor Network applications


Computer Communications 150 (2020) 739–746

Contents lists available at ScienceDirect

Computer Communications journal homepage: www.elsevier.com/locate/comcom

R. Suguna ∗, R. Vimalathithan
Department of Electronics & Communication Engg, Karpagam College of Engineering, India

ARTICLE INFO

Keywords: VLSI, MAC, WSN, Area, Delay, Power

ABSTRACT

Wireless Sensor Networks (WSNs) pose significant challenges for applications such as distributed control and digital signal processing. The Multiply-Accumulate (MAC) unit plays a significant role in kernel computation and determines the speed and power of the entire system. Constructing a low power, high speed MAC is therefore critical to utilizing VLSI technologies in WSNs. This research work concentrates on a fast, low power and reduced delay Low Latency Column Bit Compressed (LLCBC) MAC. The proposed MAC design, based on binary stacking counters, is designed to optimize delay, power, area and hardware complexity. Increased operational speed is attained by using 6:3 and 7:3 binary stacking counters in the taller columns instead of conventional full adders. The proposed method is simulated in the Cadence environment, where superior outcomes are attained. The parameters Area, Cells, Leakage Power (nW), Dynamic Power (nW), Total Power (nW) and Delay (ps) are evaluated. For instance, at 16 bits, the proposed LLCBC MAC consumes 14.4% of the Area, 17.5% of the Total Power and 46.2% of the Delay of the maximum value among all MAC units compared. The proposed architecture is simulated and synthesized, and place & route is done with a 90 nm standard CMOS library using Cadence SoC Encounter; an effectual improvement in terms of Power, Area and Delay is attained.

1. Introduction

Wireless Sensor Networks (WSNs) provide network connectivity with compact micro sensors that are able to communicate wirelessly. These micro sensors are abundantly available since they are inexpensive. WSNs handle many data-gathering applications, from military to health care services. Designers encounter architectural challenges such as power, area, energy, delay, cost, resources and sensing ability. A WSN is similar to an embedded system in that it comprises both hardware and software. Every sensor node is equipped with a processing unit, and many diverse kinds of processing units are integrated with sensor nodes. A large number of commercially available microcontrollers, field programmable gate arrays (FPGAs) and DSPs [1] offer great flexibility during execution. All sensor nodes need either a microcontroller or a microprocessor. Generally, FPGAs are not used in sensor nodes, mainly because their power consumption is higher and they are not compatible with programming methods. For this reason VLSI is not adopted in wireless sensor networks for the sensor nodes. However, VLSI techniques may be required to handle the computational complexity of large WSN applications [2].

Communication and processing in wireless sensor networks have many applications in wireless communication. Filtering is one of the main tasks in signal processing applications such as environmental monitoring, object tracking, surveillance and numerous others. These applications require energy-efficient resources in the computational nodes of the WSN, yet most sensor nodes do not possess the computational resources to complete numerous communication tasks repeatedly. As the MAC unit [3] is a significant kernel in various applications, an efficient MAC design [4] is proposed in this investigation. Reducing delay and power consumption with the proposed MAC unit has a significant impact on the energy consumption of every node [5]. As a result [6], an efficient MAC unit improves the efficiency of WSNs and enhances their computational capabilities, and can be applied to various WSN applications. In this work, a fast, low power and reduced latency [7] MAC unit is designed, and its performance factors are compared with existing MACs [8,9]. The proposed Low Latency Column Bit Compressed (LLCBC) MAC unit improves performance in terms of power, delay and area [10].

∗ Corresponding author. E-mail addresses: [email protected] (R. Suguna), [email protected] (R. Vimalathithan).

https://doi.org/10.1016/j.comcom.2019.11.013 Received 22 May 2019; Received in revised form 9 October 2019; Accepted 11 November 2019 Available online 19 November 2019 0140-3664/© 2019 Elsevier B.V. All rights reserved.


The significant contributions of designing the Low Latency Column Bit Compressed (LLCBC) MAC unit are given below:

• By using 6:3 and 7:3 [11,12] counters based on symmetric stacking, the delay is reduced by shortening the critical path caused by the XOR gates. This yields the Low Latency Column Bit Compressed (LLCBC) MAC unit, thereby reducing the overhead in terms of energy, latency and area.
• The LLCBC MAC architecture with binary stacking is generally faster, leading to a better delay balance among the proposed stages.

2. General MAC unit

The Multiply and Accumulate unit plays a vital role in WSNs. Constructing a processing unit for energy-limited sensing nodes is a great challenge. The main operation required in a WSN is filtering. Digital Infinite Impulse Response (IIR) and Finite Impulse Response (FIR) filters find applications in many fields. The significant challenge for researchers is the computational complexity and energy involved in filtering, since most sensor nodes do not possess the computational resources to perform many signal-processing tasks. The MAC is one of the important kernels in DSP architectures, and saving power in it conserves energy, which in turn increases the computational efficiency and lifetime of WSNs. Every wireless communication system requires a power- and area-efficient MAC unit. The MAC unit significantly influences the speed of the processor. It is the aggregation of an accumulator, a multiplier [13,14] and an adder. The inputs are fed into the multiplier, the result is given to the adder, and the sum is accumulated in the accumulator. Generally, one clock cycle is needed to complete the process. The MAC unit [6] efficiently carries out repeated multiplication and addition of each product to the previously accumulated result. The general structure of the MAC design [6] is shown in Fig. 1.

3. Various MAC architecture

In this work, four architectures that share a similar structure for the partial product (PP) unit, accumulate adder and final adder are considered, together with the proposed LLCBC MAC:

• MAC 2C-Conventional: The conventional MAC 2C concentrates on reducing the delay of the accumulate adder relative to the PP unit, owing to the delay of XOR gates compared with that of a full adder, as in Fig. 2.
• MAC 2C-RA (Register Altered): In this MAC the critical path moves through the final adder and the partial product unit, as in Fig. 3.
• MAC 3C: In this 3-cycle MAC the critical path lies in the PP unit, as in Fig. 4.
• MAC 2C-CSK (Carry Skip Adder): It is similar to MAC 2C-RA, but carry skip logic is used in the partial product addition, as in Fig. 5.
• Low Latency Column Bit Compressed (LLCBC) MAC unit: The critical path delay of the LLCBC MAC is reduced by means of 6:3 and 7:3 counter based stacking in the partial product addition, as in Fig. 6.
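The multiply-accumulate operation described for the general MAC unit can be sketched in software as a behavioural reference. This is only an illustrative model and not the authors' hardware: the function name `mac_accumulate` and the wrap-around accumulator width `acc_bits` are assumptions made for the example.

```python
def mac_accumulate(pairs, acc_bits=40):
    """Behavioural model of a MAC unit: sum of products with a fixed-width accumulator.

    `acc_bits` is a hypothetical accumulator width; results wrap modulo 2**acc_bits.
    """
    mask = (1 << acc_bits) - 1
    acc = 0
    for a, b in pairs:
        product = a * b                # multiplier stage
        acc = (acc + product) & mask   # adder stage feeding the accumulator register
    return acc


# FIR-style dot product of samples (1, 2, 3) with coefficients (4, 5, 6)
print(mac_accumulate([(1, 4), (2, 5), (3, 6)]))
```

Each loop iteration corresponds to one multiply-accumulate step, which is the operation a filter tap performs once per input sample.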

Fig. 1. General MAC unit.

Fig. 2. Block diagram of MAC 2C-Conventional.

Fig. 3. Block diagram of MAC 2C-RA (Register Altered).

Fig. 4. Block diagram of MAC 3C.

4. Low Latency Column Bit Compressed (LLCBC) MAC architecture

The Low Latency Column Bit Compressed (LLCBC) MAC architecture consists of a counter based Wallace multiplier and a carry save adder, designed to increase speed by reducing latency and thereby also reducing power and area. The carry save adder is well known in multiplier architectures and is used for efficient CMOS implementation of a wide variety of algorithms for high speed digital signal processing. The latency reduction is achieved by shortening the critical path through the XOR gates, using 7:3 and 6:3 counters built on the symmetric stacking concept explained in the rest of this section.

Usually, the Multiply-Accumulate (MAC) unit executes the MAC instruction, which is needed throughout a DSP processor. In order to enhance the speed of the multiplication operation, the partial product step is also enhanced. This can be carried out in two ways:

• Construct the partial products in a quicker way.
• Implement the proposed LLCBC MAC unit, as done in this work. The proposed LLCBC MAC unit architecture is superior in terms of speed, latency, area and power.

Two comparisons with the conventional designs are worth noting:

• Compared with the conventional 2-cycle MAC, the proposed architecture requires stacking based partial product reduction.
• Compared with the conventional 3-cycle MAC, the proposed architecture uses binary stacking approaches with corresponding clock energy, without degrading the system performance and speed.

Fig. 5. Block diagram of MAC 2C-CSK.

Fig. 6. Block Diagram of LLCBC MAC.

Usually, in a high performance MAC unit, powerful computational capabilities and higher throughput are considered specifically so that it is a competitive candidate for wireless sensor networks. Such a MAC unit is realized in this work by adopting the symmetric structure using binary stacking. In this section, the proposed architecture is illustrated in detail along with the advantages associated with the LLCBC MAC unit.

To attain greater efficiency, a large number of bits with equal weight are considered together. The bits of a single column are counted to generate bits with different weights. For instance, a 7:3 counter circuit takes 7 bits of the same weight and determines the number of '1' bits; this count is output as 3 bits of increasing weight. 7:3 and 6:3 counter circuits can be built from half and full adders, but the XOR gates on the critical path are the main reason for the higher delay of such counter circuits. For this reason, numerous fast parallel counter architectures have been proposed. For example, a parallel 7:3 counter based on high speed counters was used to construct a Wallace tree multiplier [15]. In addition, multiplexers were used in these circuits to reduce the XOR gate count. Some MUXs can be implemented with transmission gate logic to construct higher speed designs [16]. The counting approach presented here uses bit stacking circuits and then combines two smaller stacks to build bigger stacks. No XOR gates or multiplexers lie on the critical path of the 6:3 stack based counter [17]. The stacking based counter is faster than the conventional method and consumes less power.

Bit stacking

The 6:3 counter based on symmetric stacking is constructed from 3-bit stacker circuits, in which the '1' bits are grouped on the left side by means of OR gates and the '0' bits are grouped on the right side by means of AND gates, as in Fig. 8. With the help of the symmetric technique, a 6-bit stack is generated.

Three-bit stacking circuit

Consider three inputs U0, U1, U2, from which the 3-bit stacker circuit provides three outputs V0, V1, V2. The '1' bits are gathered on the left side by the OR gate and the '0' bits on the right side by the AND gate, as given in Eqs. (1)–(3) and shown in Fig. 7.

Fig. 7. Three bit stacker circuit.

\[ V_0 = U_0 + U_1 + U_2 \tag{1} \]

\[ V_1 = U_0 U_1 + U_0 U_2 + U_1 U_2 \tag{2} \]

\[ V_2 = U_0 U_1 U_2 \tag{3} \]
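Eqs. (1)–(3) can be checked exhaustively with a few lines of Python. This is a minimal sketch, assuming '+' denotes logical OR and juxtaposition denotes logical AND; the function name `stack3` is introduced here only for illustration.

```python
from itertools import product


def stack3(u0, u1, u2):
    """3-bit stacker: gathers the '1' bits of (u0, u1, u2) on the left."""
    v0 = u0 | u1 | u2                        # Eq. (1): at least one input is set
    v1 = (u0 & u1) | (u0 & u2) | (u1 & u2)   # Eq. (2): at least two inputs are set
    v2 = u0 & u1 & u2                        # Eq. (3): all three inputs are set
    return v0, v1, v2


# Exhaustive check: the output is always a run of ones followed by zeros.
for bits in product((0, 1), repeat=3):
    ones = sum(bits)
    assert stack3(*bits) == tuple(1 if i < ones else 0 for i in range(3))
```

The loop confirms that the number of '1' bits is preserved and that they always occupy the leftmost output positions, which is the property the larger stacks rely on.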

The V1 output is the most complex term; it can be generated by means of a single complementary MOS gate.

Stacking concept

The 6-bit stack is constructed from 3-bit stacks, as discussed in this section. Consider six inputs, namely U0, U1, U2, U3, U4 and U5. Partition the inputs into 2 groups and stack each group using a 3-bit stacking circuit: U0, U1, U2 are stacked into signals A0, A1, A2, and U3, U4, U5 are stacked into B0, B1, B2. First, reverse the first three bits to obtain the sequence A2, A1, A0, B0, B1, B2. In this sequence the '1' bits form a continuous run surrounded by '0' bits. To form a proper stack, this run of '1' bits must start from the leftmost position. The 6-bit stack is constructed by generating two more 3-bit vectors, namely D0, D1, D2 and E0, E1, E2. The idea is to fill the D vector with ones first, before filling the E vector, as in Eqs. (4)–(6):

\[ D_0 = A_2 + B_0 \tag{4} \]

\[ D_1 = A_1 + B_1 \tag{5} \]

\[ D_2 = A_0 + B_2 \tag{6} \]

In this manner, the first three successive '1' bits are guaranteed to fill the D bits, even though they may not form a proper stack. To avoid counting any bit twice, the E bits are formed from the same inputs using AND gates, as in Eqs. (7)–(9):

\[ E_0 = A_2 B_0 \tag{7} \]

\[ E_1 = A_1 B_1 \tag{8} \]

\[ E_2 = A_0 B_2 \tag{9} \]

Because the two inputs of each AND gate are three positions apart in the sequence, the E bits are all logic '0' when the run of successive ones is three bits or shorter. An E bit is high only when the run covers two positions that are three apart, that is, when it contains at least four ones. In general, D0, D1, D2 and E0, E1, E2 together still hold the same number of 1's as the input, but now the D bits are filled with 1's before the E bits. D0, D1, D2 and E0, E1, E2 are then each stacked by a 3-bit stacking circuit, and the outputs of those two circuits are concatenated to generate the stack outputs V5, V4, V3, V2, V1, V0, as shown in Fig. 8.

Fig. 8. Six bit stacker circuit.

Bit stack to binary number

This section describes the conversion from bit stack to binary number. For a fast count, the intermediate values A, B and E are used to evaluate every output bit directly, without the need for the bottom layer of stackers. The output bits S, C1 and C2 represent, in binary form, the number of '1' input bits. To evaluate S, the parity of the outputs of the first two 3-bit stackers is determined. Even parity occurs in A if zero or two '1' bits appear in U0, U1, U2. Ae and Be signify even parity in the A and B bits, as given in Eqs. (10) and (11):

\[ A_e = \overline{A_0} + A_1 \overline{A_2} \tag{10} \]

\[ B_e = \overline{B_0} + B_1 \overline{B_2} \tag{11} \]

Here, S signifies odd parity over the A and B bits together, as in Eq. (12):

\[ S = A_e \oplus B_e \tag{12} \]

This single XOR gate does not contribute to the critical path delay. In the next step, C1 and C2 are calculated. The output C1 is high if the count is 2, 3 or 6, and it is obtained in three steps. First, check for at least two set inputs: either one of the top level stackers holds a stack of length two (A1 or B1), or both hold a stack of length one (A0 B0). Second, check for no more than three set inputs: the E bits should all be reset. Finally, check whether all six inputs are '1'. As A and B are bit stacks, this is verified with the rightmost bit of each stack, which gives A2 B2. From these, Eq. (13) is obtained:

\[ C_1 = (A_1 + B_1 + A_0 B_0)\,\overline{(E_0 + E_1 + E_2)} + A_2 B_2 \tag{13} \]

C2 is easily computed: it is set whenever at least 4 input bits are set:

\[ C_2 = E_0 + E_1 + E_2 \tag{14} \]

With the above equations, the delay of the 6:3 counter circuit is improved. The proposed Low Latency Column Bit Compressed (LLCBC) MAC performs better with the aid of 6:3 and 7:3 counters by eliminating XOR gates from the critical path. One small drawback is the slight increase in wiring complexity. The 6:3 counter with symmetric stacking is shown in Fig. 9. In this investigation, the 6:3 counter designs are described in VHDL, and the LLCBC MAC constructed using the counters is simulated and synthesized with a 90 nm standard CMOS library using Cadence SoC Encounter. Reducing the XOR gates shortens the critical path delay and hence enhances the speed, so the design works faster than the conventional MAC designs [18]. The novel counting method through bit stacking therefore enables counters that give a substantial performance increase in the LLCBC MAC architecture. The critical path delay of the 6:3 counter is seven basic gates: two basic gates in the 3-bit stacker circuit, followed by five basic gates starting from where the E bit is taken, namely the AND gate forming it, an OR gate, an inverter, an AND gate and the OR gate from which the output C1 is taken.
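The 6-bit stacker and the 6:3 counter outputs can be verified by brute force over all 64 input patterns. The sketch below is an illustrative model rather than the authors' VHDL. It assumes that the D bits are the OR terms of Eqs. (4)–(6) and the E bits the AND terms of Eqs. (7)–(9), that A0 and A2 (and B0 and B2) enter the even-parity terms of Eqs. (10) and (11) as shown with overbars, and that the complemented term in C1 and the sum in C2 are formed from the AND terms. The names `stack3` and `counter63` are hypothetical.

```python
from itertools import product


def stack3(u0, u1, u2):
    """3-bit stacker, Eqs. (1)-(3)."""
    return (u0 | u1 | u2,
            (u0 & u1) | (u0 & u2) | (u1 & u2),
            u0 & u1 & u2)


def counter63(u):
    """Return (6-bit stack, (C2, C1, S)) for a 6-tuple of input bits."""
    a = stack3(*u[:3])
    b = stack3(*u[3:])
    d = (a[2] | b[0], a[1] | b[1], a[0] | b[2])   # Eqs. (4)-(6), OR terms
    e = (a[2] & b[0], a[1] & b[1], a[0] & b[2])   # Eqs. (7)-(9), AND terms
    stack = stack3(*d) + stack3(*e)               # ones first, then zeros

    a_even = (1 - a[0]) | (a[1] & (1 - a[2]))     # Eq. (10), complements assumed
    b_even = (1 - b[0]) | (b[1] & (1 - b[2]))     # Eq. (11), complements assumed
    s = a_even ^ b_even                           # Eq. (12)

    four_or_more = e[0] | e[1] | e[2]             # assumed form of C2
    c1 = ((a[1] | b[1] | (a[0] & b[0])) & (1 - four_or_more)) | (a[2] & b[2])
    c2 = four_or_more
    return stack, (c2, c1, s)


for u in product((0, 1), repeat=6):
    n = sum(u)
    stack, (c2, c1, s) = counter63(u)
    assert stack == tuple(1 if i < n else 0 for i in range(6))
    assert 4 * c2 + 2 * c1 + s == n
```

Under these assumptions every pattern yields a proper stack and a binary count equal to the number of set inputs, with the only XOR gate appearing in the S output.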

7:3 counter

The 7:3 counter with symmetric stacking is shown in Fig. 10. When the last bit U6 is high, C1 = 1 if the count of the remaining six input bits is 1, 2, 5 or 6, and C2 = 1 if that count is at least 3, as given below:

\[ C_1 = (A_0 + B_0)\,\overline{D_0 D_1 D_2} + A_2 B_1 + A_1 B_2 \]

\[ C_2 = D_0 D_1 D_2 \]

Both versions of C1 and C2 (for U6 = 0, as in the 6:3 counter, and for U6 = 1) are evaluated, and a MUX chooses between them based on U6. The design therefore has MUXs on the critical path, as shown in Fig. 10. For the stacker based 7:3 counter the critical path is computed in the same manner as for the 6:3 counter and consists of seven basic gates along with the MUXs. The two MUXs on the critical path can be implemented with transmission gate logic or GDI, which enhances the speed.

Fig. 9. 6:3 Counter with symmetric stacking.

Fig. 10. 7:3 Counter with symmetric stacking.

5. Numerical results and discussions

The LLCBC MAC is constructed using fast 6:3 and 7:3 counters and is simulated and synthesized using Cadence RTL Compiler. The results are tabulated with the performance factors and compared with the existing designs in Table 1. In this investigation five MAC designs are considered: MAC 2C-RA, MAC 3C, MAC 2C-CSK, MAC 2C-Conventional and the Low Latency Column Bit Compressed MAC unit, as in Fig. 11. The percentages below give each LLCBC MAC value as a fraction of the corresponding value of MAC 2C-RA, MAC 3C, MAC 2C-CSK and MAC 2C-Conventional, respectively, so a smaller percentage indicates a larger saving.

The 16-bit LLCBC MAC requires 15.5%, 14.39%, 18.65% and 18.7% of the area; 18%, 14.4%, 18% and 18.26% of the leakage power; 48%, 22%, 47% and 17% of the dynamic power; 41%, 21%, 41% and 17% of the total power; and 53%, 46%, 53% and 51% of the delay.

The 32-bit LLCBC MAC requires 8.6%, 6.9%, 8.4% and 8.4% of the area; 8.6%, 7.2%, 8.4% and 8.4% of the leakage power; 15.3%, 16.7%, 15.9% and 6.2% of the dynamic power; 14.3%, 14.9%, 14.7% and 6.5% of the total power; and 49%, 41.2%, 50.1% and 49% of the delay.

The 64-bit LLCBC MAC requires 16.67%, 12%, 14.6% and 14.67% of the area; 73%, 12.75%, 14.9% and 14.95% of the leakage power; 48%, 32%, 43% and 16.5% of the dynamic power; 37.64%, 29.9%, 38.79% and 16.46% of the total power; and 49%, 41%, 50% and 49.9% of the delay.

Table 1. MAC unit types and parameter comparison.

| Operand size | Parameter | MAC 2C-Conventional | MAC 2C-RA | MAC 3C | MAC 2C-CSK | LLCBC MAC |
|---|---|---|---|---|---|---|
| 16-bit | Area (cell area) (μm²) | 11599 | 13930 | 15079 | 11653 | 2170 |
| 16-bit | Leakage power (nW) | 57735.459 | 58122.269 | 72880.207 | 58133.645 | 10543.009 |
| 16-bit | Dynamic power (nW) | 569358.845 | 206501.585 | 436990.303 | 206467.742 | 99012.67 |
| 16-bit | Total power (nW) | 627094.304 | 264623.854 | 509870.509 | 264601.387 | 109555.68 |
| 16-bit | Delay (ps) | 16045 | 15430 | 17812 | 15430 | 8224 |
| 32-bit | Area (cell area) (μm²) | 51583 | 50286 | 62398 | 51546 | 4350 |
| 32-bit | Leakage power (nW) | 247344.457 | 242656.02 | 289932.663 | 248097.365 | 20894.625 |
| 32-bit | Dynamic power (nW) | 3245200.589 | 1344930.625 | 1227185.28 | 1288584.606 | 206163.369 |
| 32-bit | Total power (nW) | 3492545.046 | 1587586.646 | 1517117.95 | 1536681.971 | 227057.995 |
| 32-bit | Delay (ps) | 32500 | 32912 | 39336 | 32361 | 16221 |
| 64-bit | Area (cell area) (μm²) | 103165 | 90818 | 124758 | 103093 | 15140 |
| 64-bit | Leakage power (nW) | 494684.38 | 10.0617 | 580012.864 | 496194.729 | 73979.193 |
| 64-bit | Dynamic power (nW) | 6670748.168 | 0.9001 | 3362918.61 | 2544983.538 | 1105822.069 |
| 64-bit | Total power (nW) | 7165432.548 | 3133730.984 | 3942931.47 | 3041178.267 | 1179801.262 |
| 64-bit | Delay (ps) | 32499 | 32850 | 39332 | 32361 | 16221 |

6. Conclusion

Energy efficient computation is challenging when enhancing performance in power limited systems. This work has presented an LLCBC MAC unit that can carry out data-intensive and complex computational operations in VLSI technologies. The Low Latency Column Bit Compressed (LLCBC) MAC architecture is designed using 6:3 and 7:3 counters based on symmetric stacking to sum up the partial products. For instance, relative to the maximum value among the compared MAC units, the 16-bit LLCBC MAC saves 85.6% of the area, 82.5% of the total power and 53.8% of the delay. The proposed architecture is simulated and synthesized in Cadence SoC Encounter, and the GDS-II is also extracted for all the LLCBC MAC designs. The proposed LLCBC MAC design can be used in WSN applications for the purpose of spectrum sensing. Emerging technologies have created a severe scarcity of spectrum, so the spectrum should be used properly by means of spectrum sensing techniques. The fast LLCBC MAC may yield an effective reduction in power, area and delay in spectrum sensing, and can further be applied in the energy detector part of spectrum sensing in order to use the spectrum holes effectively.
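The 16-bit headline figures quoted in the abstract and conclusion can be reproduced from the 16-bit rows of Table 1. The following is a small worked check, not part of the authors' tool flow; the dictionary layout and the helper name `share_of_max` are assumptions made for this example, and the column order is MAC 2C-Conventional, MAC 2C-RA, MAC 3C, MAC 2C-CSK, LLCBC MAC.

```python
# 16-bit rows of Table 1; the last entry of each list is the LLCBC MAC.
table_16bit = {
    "area_um2": [11599, 13930, 15079, 11653, 2170],
    "total_power_nW": [627094.304, 264623.854, 509870.509, 264601.387, 109555.68],
    "delay_ps": [16045, 15430, 17812, 15430, 8224],
}


def share_of_max(values):
    """LLCBC value (last column) as a percentage of the largest value in the row."""
    return 100.0 * values[-1] / max(values)


for name, values in table_16bit.items():
    share = share_of_max(values)
    print(f"{name}: {share:.1f}% of the maximum, {100.0 - share:.1f}% saved")
```

Rounded to one decimal place, the shares are 14.4% (area), 17.5% (total power) and 46.2% (delay), and their complements give the 85.6%, 82.5% and 53.8% savings.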

Fig. 11. Normalized values of power, delay and area for 16, 32 and 64 bit LLCBC MAC.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] A. Abdelgawad, High speed and area-efficient multiply accumulate (MAC) unit for digital signal processing applications, 2007, pp. 3199–3202.
[2] R. Malleshwari, FPGA implementation of low power and high speed 64-bit multiply accumulate unit for wireless applications, Int. J. Sci. Res. 5 (4) (2016) 1462–1467.
[3] R. Pawar, Review on multiply-accumulate unit, Int. J. Eng. Res. Appl. 7 (2017) 09–13, http://dx.doi.org/10.9790/9622-0706040913.
[4] K. Taoumanis, S. Xydis, C. Efstathiou, N. Moschopoulos, K. Pekmestzi, An optimized modified booth recorder for efficient design of the add-multiply operator, IEEE Trans. Circuits Syst. 61 (4) (2014) 1133–1143, http://dx.doi.org/10.1109/TCSI.2013.2283695.
[5] Ku. Shweta, N. Yengade, Review on design of low power multiply and accumulate unit using Baugh–Wooley based multiplier, Int. Res. J. Eng. Technol. 04 (2) (2017) 1–5.
[6] N. Priyanka, Multiplier-accumulator (MAC) unit, Int. J. Digit. Appl. Contemp. Res. 5 (3) (2016) 1–4.
[7] K. Swathi, Low latency MAC design for low power DSP applications, Int. J. Sci. Eng. Adv. Technol. 5 (7) (2017) 840–844.
[8] T.T. Hoang, P. Larsson-Edefors, A high-speed, energy-efficient two-cycle multiply-accumulate (MAC) architecture and its application to a double-throughput MAC unit, IEEE Trans. Circuits Syst. 57 (12) (2010) 3073–3081, http://dx.doi.org/10.1109/TCSI.2010.2091191.
[9] V. Vimal Raj, Low power and area efficient 2C multiply-accumulate unit and its application to a DTMAC unit, Int. J. Adv. Res. Comput. Sci. 3 (7) (2012) 45–48.
[10] C.N. Marimuthu, P. Thangaraj, Low power high performance multiplier, ICGST-PDCS 8 (1) (2008) 31–38.
[11] S. Asif, Y. Kong, Design of an algorithmic wallace multiplier using high speed counters, in: Proceedings of the IEEE Computer Engineering & Systems (ICCES), 2015, pp. 133–138, http://dx.doi.org/10.1109/ICCES.2015.7393033.
[12] S. Asif, Y. Kong, Analysis of different architectures of counter based wallace multipliers, in: Proceedings of the 10th International Conference on Computer Engineering & Systems, ICCES, 2015, pp. 139–144.
[13] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay, A 1.2-ns 16 × 16-bit binary multiplier using high speed compressors, Int. J. Electron. Electr. Eng. 4 (3) (2010) 234–239.
[14] R.P. Rajput, M.N.S. Swamy, High speed modified booth encoder multiplier for signed and unsigned numbers, in: UKSim 14th International Conference on Computer Modelling and Simulation, 2012, pp. 649–654, http://dx.doi.org/10.1109/UKSim.2012.99.
[15] S. Veeramachaneni, L. Avinash, M. Krishna, M.B. Srinivas, Novel architectures for efficient (m, n) parallel counters, in: Proceedings of the 17th ACM Great Lakes Symposium on VLSI, 2007, pp. 188–191, http://dx.doi.org/10.1145/1228784.1228833.
[16] S. Mukherjee, Energy efficient multiplier for high speed DSP application, Int. J. Comput. Sci. Mobile Comput. 4 (6) (2015) 66–75.
[17] P. Aliparast, Ziaadin D. Koozehkanani, Farhad Nazari, An ultra high speed digital 4-2 compressor in 65-nm CMOS, Int. J. Comput. Theory Eng. 5 (4) (2013) http://dx.doi.org/10.7763/IJCT.
[18] T. Shaikh, FPGA implementation of multiply accumulate (MAC) unit based on block enable technique, Int. J. Innov. Res. Comput. Commun. Eng. 3 (4) (2015) 188–191.