Vedic algorithm for cubic computation and VLSI implementation


Engineering Science and Technology, an International Journal xxx (2017) xxx–xxx

Contents lists available at ScienceDirect

Engineering Science and Technology, an International Journal journal homepage: www.elsevier.com/locate/jestch

Full Length Article

Vedic algorithm for cubic computation and VLSI implementation

Deepak Kumar a,⇑, Prabir Saha b, Anup Dandapat b

a Department of Computer Science and Engineering, National Institute of Technology Meghalaya, Shillong 793003, Meghalaya, India
b Department of Electronics and Communication Engineering, National Institute of Technology Meghalaya, Shillong 793003, Meghalaya, India

Article info

Article history:
Received 24 May 2017
Revised 10 August 2017
Accepted 3 October 2017
Available online xxxx

Keywords: Anurupyena; Cubic; High speed; Vedic mathematics; Yavadunam sutra (YVDN)

Abstract

An algorithm for cubic computation and its VLSI implementation, based on ‘Vedic mathematics’ formulae, is described in this paper. An N-bit cubic implementation circuit was structured into two cubic subgroups (bit length N/2 or less), a multiplier and an adder. VLSI aspects of the circuitry such as propagation delay and dynamic power consumption were reduced notably by reducing the number of partial products. Design implementation and estimation of the performance parameters (delay and power consumption) were carried out in Spice Spectre with 90 nm CMOS technology. The estimated propagation delay and power consumption of the reported 8-bit cubic circuitry were 5.5 ns and 2.6 mW respectively. Propagation delay has been improved by 12% and power consumption reduced by 22% in comparison to its counterpart (traditional architecture).

© 2017 Karabuk University. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Exponentiation, the operation of raising a number to a power, is frequent in scientific computing and digital signal processing [1,2]. Iterative techniques such as Newton-Raphson and Taylor series expansion are important for computing the reciprocal, square root and other elementary functions, and higher-order function approximations reduce the number of iterations needed to reach the required precision. Fast and efficient parallel exponentiation units therefore reduce the number of iterations and improve the overall latency of elementary function computation [3,4].

Numerous methods have so far been implemented to speed up cubic circuitry [1–6]. Traditionally, a cube is realized by repeated multiplication [1]: an n-bit number multiplied by itself produces a 2n-bit square, and multiplying that square by the original number generates the 3n-bit result. Multipliers of different lengths, such as (n × n) and (2n × n), therefore play a crucial role in handling the partial products of a cube architecture [5]. As alternatives, parallel rearrangement [2,4] and grouping [4] of the partial products have been developed to improve the VLSI aspects, but the computation strategy is the same in all cases [1–6].

‘Vedic mathematics’ is a renowned mathematical system that offers natural techniques for algebraic problem solving [7,8]. The Yavadunam formula (YVDN; English translation: ‘whatever the extent of its deficiency’) and Anurupyena (English translation: ‘proportionately’) [8] were utilized to devise the cubic computation methodology. The binary implementation of an N-bit cubic architecture has been investigated through one smaller cubic unit (bit length < N), a multiplier unit (bit length < N) and an adder unit. The ‘Anurupyena’ technique was employed to eliminate redundant partial products, while the YVDN methodology was utilized to transform multiplication operations into shift operations. The Urdhva Tiryagbyham (‘vertically and crosswise’) sutra is useful for implementing multiplication as well as cubic operations; Mehta et al. [9] showed that a multiplier implementation based on the Urdhva Tiryagbyham sutra is equivalent to the traditional implementation. Some other sutras, such as Nikhilam Navatashcaramam Dashatah (‘all from 9 and the last from 10’) [8,10], are useful only for special-case multiplication, and have been used in this paper for multiplication.

A parallel implementation methodology has been utilized to achieve high-speed operation, and algorithmic optimization has been incorporated to reduce the number of partial products significantly. A simplified structural design with a regular array realization has been used to improve performance. Delay and power have been examined in Spice with 90 nm CMOS technology and compared against counterpart architectures. The reported circuitry exhibits a delay of 5.5 ns along with a power consumption of 2.6 mW.

⇑ Corresponding author. E-mail addresses: [email protected] (D. Kumar), [email protected] (P. Saha), [email protected] (A. Dandapat).

Peer review under responsibility of Karabuk University.
https://doi.org/10.1016/j.jestch.2017.10.001
2215-0986/© 2017 Karabuk University. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Please cite this article in press as: D. Kumar et al., Vedic algorithm for cubic computation and VLSI implementation, Eng. Sci. Tech., Int. J. (2017), https://doi.org/10.1016/j.jestch.2017.10.001


2. Review of applicable formulae

Vedic formulae offer techniques for solving algebraic problems that are analogous to mental calculation and generate answers quickly. In this paper, the YVDN and Anurupyena sutras (formulae) are applied to cubic implementation.

2.1. YVDN method

The sutra, which signifies ‘whatever the extent of its deficiency’ [8], is relevant for cubic computation.

2.1.1. Examples

Computing examples for the decimal number system are elaborated in Fig. 1 using the YVDN sutra, followed by the explanation given in Table 1.

Fig. 1. Examples: cube of a number employing the YVDN sutra (formula) (a) while the base of operation (BO) is near to the number (BO: 1000); (b) while the BO is far away from the number (new BO: 500).

Fig. 2. (a) Algebraic proof of the anurupyena sutra (formula); (b) numerical example of the anurupyena sutra with the decimal number system.

Table 1
Cube calculation steps: (i) example number close to the BO, explained in Fig. 1(a); (ii) example number not close to the BO, explained in Fig. 1(b).

(i) Number is close to the BO
1) Here BO = 1000. Subtract 993 from 1000; the result is 7.
2) Calculate the 1st term, the cube of 7, i.e. $7^3 = 343$.
3) Compute the 2nd term, $3 \times 7^2 = 147$. Since BO = $10^3$, the power is 3, so 147 is left shifted by 3 positions.
4) Calculate the 3rd term by subtracting $2 \times 7$ from the actual number, i.e. $(993 - 14) = 979$, and shifting it left by power × 2 = 3 × 2 = 6 positions.
5) Finally, result = 3rd term + 2nd term − 1st term (‘−’ since the actual number 993 was subtracted from the BO), i.e., the result is equal to 979146657.

(ii) Number is not close to the BO
1) Here BO = 500. Subtract 500 from 521; the result is 21.
2) Calculate the 1st term, the cube of 21, i.e. $21^3 = 9261$.
3) Calculate the 2nd term, $3 \times 5 \times 21^2 = 6615$. Since BO = 500 ($5 \times 10^2$), the power is 2, so 6615 is left shifted by 2 positions.
4) Calculate the 3rd term by adding $2 \times 21$ to the actual number, i.e. $(521 + 2 \times 21 = 563)$, then multiplying it by $5^{power} = 5^2$, i.e. $563 \times 5^2 = 14075$, and shifting it left by power × 2 = 2 × 2 = 4 positions.
5) Finally, result = 3rd term + 2nd term + 1st term (‘+’ since the BO was subtracted from the actual number), i.e., the result is equal to 141420761.

2.1.2. Mathematical derivation

$X_n$ is considered as an n-digit unsigned number:

$$X_n = \sum_{i=0}^{n-1} x_i 10^i, \quad \text{where } x_i \in \{0, 1, \ldots, 9\} \tag{1}$$

Eq. (1) can be rewritten as:

$$X_n = \begin{cases} 10^n - \bar{X}_n, & BO = 10^n \\[4pt] \sum_{i=n/2}^{n-1} x_i 10^i + \sum_{i=0}^{n/2-1} x_i 10^i, & BO = \sum_{i=n/2}^{n-1} x_i 10^i \end{cases} \tag{2}$$

so that its cube becomes:

$$X_n^3 = \begin{cases} 10^{2n}\left(X_n - 2\bar{X}_n\right) + 3 \times 10^n \bar{X}_n^2 - \bar{X}_n^3, & BO = 10^n \\[4pt] \left(\sum_{i=n/2}^{n-1} x_i 10^i\right)^2 \left(X_n + 2\sum_{i=0}^{n/2-1} x_i 10^i\right) + 3\left(\sum_{i=n/2}^{n-1} x_i 10^i\right)\left(\sum_{i=0}^{n/2-1} x_i 10^i\right)^2 + \left(\sum_{i=0}^{n/2-1} x_i 10^i\right)^3, & BO = \sum_{i=n/2}^{n-1} x_i 10^i \end{cases} \tag{3}$$
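As a minimal sketch (not taken from the paper), the two cases of the YVDN cube can be checked numerically in Python. The function names `yvdn_cube_near_power` and `yvdn_cube_with_base` are hypothetical; the second one takes the BO as an explicit argument, which is an assumption made here so that the example of Fig. 1(b) (521 with BO = 500) can be reproduced directly.

```python
def yvdn_cube_near_power(x: int, n: int) -> int:
    """Cube x with BO = 10**n (number close to a power of ten)."""
    c = 10**n - x                          # deficiency of x with respect to the BO
    third = (x - 2 * c) * 10**(2 * n)      # 3rd term, shifted left by 2n digits
    second = 3 * c**2 * 10**n              # 2nd term, shifted left by n digits
    first = c**3                           # 1st term, subtracted at the end
    return third + second - first


def yvdn_cube_with_base(x: int, bo: int) -> int:
    """Cube x around an arbitrary BO (number not close to a power of ten)."""
    d = x - bo                             # excess of x over the BO
    third = bo**2 * (x + 2 * d)            # 3rd term
    second = 3 * bo * d**2                 # 2nd term
    first = d**3                           # 1st term, added at the end
    return third + second + first


if __name__ == "__main__":
    print(yvdn_cube_near_power(993, 3))    # 979146657, as in Table 1(i)
    print(yvdn_cube_with_base(521, 500))   # 141420761, as in Table 1(ii)
```

Both functions reduce to the ordinary binomial expansion of the cube, so they agree with `x**3` for any integer input; only the choice of BO changes how small the remaining cube is.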


Let us assume that subtracting the number $X_n$ from the base of operation (BO) gives its complement $\bar{X}_n$. Fig. 1(a) has been considered for simplicity, where the BO is 1000 and the number whose cube has to be calculated is 993, so $\bar{X}_n = 7$.

2.2. Anurupyena method

The meaning of ‘anurupyena sutra’ in English is ‘proportionately formula’, and it can be applied to all cubic operations [8].

2.2.1. Examples

The algebraic proof of the anurupyena formula is shown in Fig. 2(a) and its depiction with a decimal value (i.e., 23) is given in Fig. 2(b). For example, to calculate the cube of the decimal value 23, the steps to be executed are as follows (where x = 2 and y = 3):

Step 1: First compute the cube of the rightmost digit, i.e., $3^3 = 27$.
Step 2: Compute $xy^2$ ($2 \times 3^2 = 18$). Put the value one column before the column of digit ‘7’, as depicted in Fig. 2(b).
Step 3: Compute $2xy^2$ ($2 \times 2 \times 3^2 = 36$). Put the value one column before the column of digit ‘7’, as depicted in Fig. 2(b).
Step 4: Compute $x^2y$ ($2^2 \times 3 = 12$). Put the value one column before the column of digit ‘6’, as depicted in Fig. 2(b).
Step 5: Compute $2x^2y$ ($2 \times 2^2 \times 3 = 24$). Put the value one column before the column of digit ‘6’, as depicted in Fig. 2(b).
Step 6: Finally compute the cube of the leftmost digit, i.e., $2^3 = 8$, and keep it at the proper position as depicted in Fig. 2(b).

Adding the columns from right to left, with carries, yields $23^3 = 12167$.

2.2.2. Mathematical derivation

The mathematical formulation of the sutra (formula) is shown in Eqs. (4)–(8), where the decimal number is $X_n$ [Eq. (1)] and its cube is $X_n^3$. Eq. (1) can be rewritten as:

$$X_n = \sum_{i=0}^{n-1} x_i 10^i = x_{n-1} 10^{n-1} + \sum_{i=0}^{n-2} x_i 10^i \tag{4}$$

$$X_n = x_{n-1} 10^{n-1} + X_{n-1} \tag{5}$$

Fig. 3. Flow chart diagram of proposed cubic function implementation procedure.


where $X_{n-1} = \sum_{i=0}^{n-2} x_i 10^i$. Then the cubic operation can be formulated as:

$$X_n^3 = \left(x_{n-1} 10^{n-1} + X_{n-1}\right)^3 \tag{6}$$

$$X_n^3 = \left(x_{n-1} 10^{n-1}\right)^3 + X_{n-1}^3 + 3 x_{n-1} 10^{n-1} X_{n-1} \left(x_{n-1} 10^{n-1} + X_{n-1}\right) \tag{7}$$

Similarly,

$$X_n^3 = \left(x_{n-1} 10^{n-1}\right)^3 + \left(x_{n-2} 10^{n-2}\right)^3 + X_{n-2}^3 + 3 x_{n-1} 10^{n-1} \left(x_{n-2} 10^{n-2} + X_{n-2}\right) \left(x_{n-1} 10^{n-1} + x_{n-2} 10^{n-2} + X_{n-2}\right) + 3 x_{n-2} 10^{n-2} X_{n-2} \left(x_{n-2} 10^{n-2} + X_{n-2}\right) \tag{8}$$

3. Architecture design

For cube computation, the performance parameters (propagation delay, power consumption and layout area) have been reduced by using a Vedic multiplier and squarer. In this work, a function-specific fast cubic circuit has been targeted to achieve low propagation delay, low switching power consumption and reduced area. Vedic mathematics [8], reformulated for binary numbers, has been used for the proposed architecture to achieve this goal. With the help of optimization techniques, the proposed architecture obtained noteworthy improvements in the performance parameters. A flow chart for the cube calculation of a binary number is depicted in Fig. 3. The YVDN formula for the cube has been integrated with a simpler cubic calculation function (realized through ‘anurupyena’) to realize the target. Besides, addition or subtraction is used for the realization of such functions wherever possible. Combinational shift circuitry has been used to insert the partial results at the appropriate places and then concatenate them to compute the final result. As shown in the flow chart, an n-bit binary input number (N) is subtracted from $2^n$ (BO), and if the difference is not less than $2^{n/2} - 1$, the input number is instead subtracted from $2^{n-1}$ (BO) to compute the difference. Two examples are demonstrated in Fig. 4 to clarify the process.

Fig. 4. Cubic circuitry implementation example through the flow chart diagram of Fig. 3. (a) when number is close to BO (b) when number is not close to BO.

3.1. Implementation

Implementation features of the reported method are discussed in this section. Fig. 5(a) gives a functional depiction of the proposed architecture. The preprocessing unit in this architecture contains the base of operation selection module [9], which is shown in Fig. 5(b). In this unit, the values of ($2^n$ − input number) and (input number − $2^{n-1}$) are first calculated in parallel. In addition, the partial result is deducted from $2^{n/2} - 1$ and the carry is taken for further processing [9]. The input number was partitioned into a most significant portion (MSP), i.e. L, and a least significant portion, i.e. R, as formulated in Eqs. (2) and (3) and also shown in Fig. 3. The BO was then decided as shown in the flow chart. The elementary cubic circuit has been produced by employing the ‘anurupyena’ sutra.

Fig. 5. Hardware realization of proposed cubic circuit (a) block diagram of proposed architecture (b) block diagram of associated BO generation unit.

3.2. Elementary cubic structure

Cube computation for a 4-bit number produces 64 partial products, as given in Fig. 6. The partial products have been minimized using binary arithmetic to obtain the reduced sets shown in stage 2 of Fig. 6. The number of full adders and half adders used for partial product minimization can be reduced by using special adders (known as compressors [11–13]), which are able to add more bits (>3) in the same time. Binary counter characteristics have been included for the implementation of high-order compressors. In the end, the final result is computed by the parallel adder [12].

4. Results and discussions

The design and simulation of the reported algorithm for up to 8 bits were checked using the Spice Spectre simulator with 90 nm CMOS technology, operating at 250 MHz with a 1 V supply. Performance comparisons between the proposed architecture and the conventional ones are given in Table 2. Appropriate optimizations were applied at the circuit and structural design levels, and performance was evaluated in terms of delay, power consumption and their products. The energy delay product (EDP) and power delay product (PDP) were also calculated for the other architectures and are shown in Table 2. EDP ($10^{-21}$ J-s) and PDP ($10^{-12}$ J) are quantitative measures that show the association of speed with power following technology miniaturization. Inputs as a function of the number of bits were taken in regular fashion. For transitional calculation, the propagation delay is measured from 50% of the input voltage swing to 50% of the output voltage swing. For the comparative study, the implementation methodologies reported in the references [4] were implemented in a similar environment (Spice Spectre with standard 90 nm CMOS technology). The worst-case delay and power values were measured as a function of the possible bit combinations. Table 2 shows that the 8 × 8 × 8 bit cubic needs 5.51 ns to propagate a signal from input to output and consumes 2.61 mW. The reported architecture operates 15% and 9% faster (propagation delay) than the traditional and grouping architectures, respectively, and requires 23% and 14% less power than the same architectures, respectively. The schematic (layout) of the 8-bit cubic circuitry is shown in Fig. 7; it occupies an area of 0.063 mm².
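A software reference model is one common way to cross-check a design of this kind before circuit simulation. The sketch below is an illustrative Python model of the base-selection flow of Section 3, not the authors' implementation. Several points are assumptions: the threshold $2^{n/2} - 1$ is read from the flow chart description, the difference for BO = $2^{n-1}$ is taken as (input − $2^{n-1}$) and may be negative, the small cube `d ** 3` stands in for the anurupyena-based elementary cubic unit, and the names `select_base` and `cube` are hypothetical.

```python
def select_base(x: int, n: int):
    """Return (k, d) with BO = 2**k and x = BO + d for an n-bit input x."""
    deficiency = (1 << n) - x                  # 2^n - input number
    if deficiency < (1 << (n // 2)) - 1:       # assumed threshold 2^(n/2) - 1
        return n, -deficiency                  # input is close to 2^n
    return n - 1, x - (1 << (n - 1))           # otherwise use BO = 2^(n-1)


def cube(x: int, n: int) -> int:
    """Cube an n-bit number using shifts around a power-of-two BO."""
    k, d = select_base(x, n)
    third = (x + 2 * d) << (2 * k)             # BO^2 * (x + 2d) as a shift by 2k
    second = (3 * d * d) << k                  # 3 * BO * d^2 as a shift by k
    first = d ** 3                             # smaller cube (sign follows d)
    return third + second + first


if __name__ == "__main__":
    # Exhaustive check of the 8-bit case against the built-in power operator.
    assert all(cube(x, 8) == x ** 3 for x in range(256))
    print(select_base(250, 8), cube(250, 8))
```

Because the BO is a power of two, every multiplication by the BO or its square becomes a left shift, which mirrors the role of the shift circuitry described in Section 3. The model says nothing about delay or power.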

5. Conclusions

A high-speed, low-power cubic implementation circuit devised from ancient ‘Vedic mathematics’ formulae has been reported in this paper. By combining Boolean algebra with Vedic mathematics, an N-bit cubic circuit has been transformed into one small cubic unit (bit length < N) and associated circuit components such as a squarer and a multiplier. By dividing a general cubic operation into two much smaller cubic operations and computing them in parallel, high-speed operation that avoids direct partial product generation has been achieved. The approach, which involves simplified arithmetic computations in conjunction with an array-like structure, offered substantial improvement in performance. Implementation at the transistor level was performed in Spice, and the results were compared with grouping and traditional architectures. Significant improvement in speed (15%) and appreciably lower power consumption (23%) have been achieved by the proposed architecture


Fig. 6. Generated Partial products for the computation of ‘4-bit’ cubic of a number.

Table 2
Performance comparison in terms of propagation delay (ns), avg. dynamic power consumption (mW), EDP ($10^{-21}$ J-s), PDP ($10^{-12}$ J), and % improvements in respect of delay and power consumption. (* indicates traditional architecture, and Grou. indicates grouping architecture.)

| Number of bits | Considered architecture | Propag. delay (ns) | Dynamic power (mW) | EDP ($10^{-21}$ J-s) | PDP ($10^{-12}$ J) | Improvement in delay (%) | Improvement in power (%) |
|---|---|---|---|---|---|---|---|
| 4 | Tra.* | 3.82 | 1.2 | 18 | 4.6 | 19 | 25 |
| 4 | Grou. [4] | 3.4 | 1.1 | 13 | 3.7 | 9.5 | 18 |
| 4 | Proposed | 3.08 | 0.9 | 9 | 2.7 | – | – |
| 8 | Tra.* | 6.48 | 3.39 | 142 | 22 | 15 | 23 |
| 8 | Grou. [4] | 6.07 | 3.06 | 113 | 19 | 9.2 | 14 |
| 8 | Proposed | 5.51 | 2.61 | 79 | 14 | – | – |
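The derived columns of Table 2 can be cross-checked from the delay and power columns. The following Python sketch assumes the usual definitions PDP = power × delay and EDP = PDP × delay, and takes the improvement as the relative reduction with respect to the reference architecture; the helper names are hypothetical. With delay in ns and power in mW, the PDP comes out directly in units of $10^{-12}$ J and the EDP in units of $10^{-21}$ J-s.

```python
def pdp(delay_ns: float, power_mw: float) -> float:
    """Power delay product in units of 1e-12 J (mW * ns)."""
    return power_mw * delay_ns


def edp(delay_ns: float, power_mw: float) -> float:
    """Energy delay product in units of 1e-21 J-s (mW * ns * ns)."""
    return pdp(delay_ns, power_mw) * delay_ns


def improvement_pct(reference: float, proposed: float) -> float:
    """Relative reduction of the proposed value with respect to the reference."""
    return 100.0 * (reference - proposed) / reference


if __name__ == "__main__":
    # 8-bit rows of Table 2: (delay in ns, power in mW)
    traditional = (6.48, 3.39)
    proposed = (5.51, 2.61)
    print(round(pdp(*traditional)), round(edp(*traditional)))        # 22 142
    print(round(pdp(*proposed)), round(edp(*proposed)))              # 14 79
    print(round(improvement_pct(traditional[0], proposed[0])))       # 15
    print(round(improvement_pct(traditional[1], proposed[1])))       # 23
```

Small differences from individual tabulated entries can arise from rounding of the published delay and power values.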

Fig. 7. Schematic (layout) of ‘8-bit’ cubic circuitry.

in comparison to the traditional one, due to the substantial reduction in partial product generation stages. Furthermore, the reported architecture offered a 9% improvement in speed and about a 14% improvement in power consumption compared with the grouping-based [4] architecture.

References

[1] V. Kunchigi, L. Kulkarni, S. Kulkarni, Low power square and cube architectures using Vedic sutras, in: Proc. of Fifth Int. Conf. on Signal and Image Processing (ICSIP), Jeju Island, 2014, pp. 354–358.
[2] H. Thapliyal, S. Kotiyal, M.B. Srinivas, Design and analysis of a novel parallel square and cube architecture based on ancient Indian Vedic mathematics, in: 48th Midwest Symp. on Circuits and Systems, KY, USA, 2005, pp. 1462–1465.
[3] A.A. Liddicoat, M.J. Flynn, Parallel square and cube computations, in: Thirty-Fourth Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA, USA, 2000, pp. 1325–1329.
[4] J.E. Stine, J.M. Blank, Partial product reduction for parallel cubing, in: IEEE Computer Society Annual Symp. on VLSI, Porto Alegre, 2007, pp. 337–342.
[5] A. Deshpande, J. Draper, Comparing squaring and cubing units with multipliers, in: 55th Int. Midwest Symp. on Circuits and Systems (MWSCAS), Boise, ID, 2012, pp. 466–469.
[6] A.G.M. Strollo, D.D. Caro, N. Petra, Elementary functions hardware implementation using constrained piecewise-polynomial approximations, IEEE Trans. Comput. 60 (3) (March 2011) 418–432.
[7] P.D. Chidgupkar, M.T. Karad, The implementation of Vedic algorithms in digital signal processing, Global J. Eng. Edu. 8 (2) (2004) 153–158.
[8] J.S.S.B.K.T. Maharaja, Vedic Mathematics, Motilal Banarsidass Publishers Pvt. Ltd, New Delhi, 2001.
[9] P. Mehta, D. Gawali, Conventional versus Vedic mathematical method for hardware implementation of a multiplier, in: Int. Conference on Advances in Computing, Control, and Telecommunication Technologies, Trivandrum, Kerala, 2009, pp. 640–642.
[10] P. Saha, D. Kumar, P. Bhattacharyya, et al., Design of 64-bit squarer based on Vedic mathematics, J. Circuits Syst. Comput. 22 (8) (2014) 1450092:1-19.
[11] I. Jahangir, A. Das, M. Hasan, Design of novel quaternary encoders and decoders, in: Int. Conf. on Informatics, Electronics & Vision, Dhaka, 2012, pp. 1021–1026.
[12] P.K. Saha, A. Banerjee, A. Dandapat, High speed low power complex multiplier design using parallel adders and subtractors, Int. J. Electron. Electric. Eng. 07 (11) (2009) 38–46.
[13] C.H. Chang, J. Gu, M. Zhang, Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits, IEEE Trans. on Circuits and Systems-I 51 (10) (2004) 1985–1997.
