Transistor and pin reordering for leakage reduction in CMOS circuits

Microelectronics Journal 53 (2016) 25–34

Contents lists available at ScienceDirect

Microelectronics Journal journal homepage: www.elsevier.com/locate/mejo

Jae Woong Chun a,*, C.Y. Roger Chen b

a Department of Electrical and Electronic Engineering, Anyang University, Gyeonggi-do 430-714, Korea
b Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13244, USA

Article info

Article history: Received 1 August 2014; received in revised form 16 February 2016; accepted 9 April 2016

Keywords: Transistor reordering; Pin reordering; Gate leakage; Subthreshold leakage; Leakage power reduction; Power state dependency

Abstract

Leakage power is currently a critical problem in nanometer-scale CMOS circuit technology. In this paper, a novel reordering method for reducing the overall leakage currents is proposed for CMOS logic gates, including CMOS complex gates. This new method takes into account the subthreshold leakage current (ISUB) and gate leakage current (IG) and includes the often-ignored reverse gate tunneling current (IRG). Additionally, this method considers the interaction between leakage components based on the stacking/non-stacking effect case and different W/L ratios of an on-/off-transistor block in a stack. Thus, unlike existing approaches, the proposed method can generate the best configuration for leakage reduction even in CMOS complex gates, and can be used in combination with other leakage reduction techniques to achieve further improvement. © 2016 Elsevier Ltd. All rights reserved.

1. Introduction

In the nanometer-scale CMOS technology era, leakage power has become a major component of the total power dissipation due to the downscaling of threshold voltage (Vth) and gate oxide thickness (Tox). Leakage power consumption has received even more attention with the increasing demand for mobile devices. Because mobile devices spend the majority of their time in standby mode, leakage power savings in standby states are critical to extending battery life. For this reason, low power consumption has become a major factor in designing CMOS circuits.

Many techniques have therefore been proposed to minimize leakage power dissipation. The dual Vth [1–3] approach uses higher-Vth devices along non-critical paths to reduce leakage power, while lower-Vth devices applied to the critical path maintain performance. The multi-Vth CMOS (MTCMOS) [4–6] technique places high-Vth sleep transistors in series with low-Vth circuitry for leakage reduction. The input vector control (IVC) [7–9] technique reduces leakage during standby mode by applying a minimum-leakage input vector to the primary inputs of a circuit block. Several research efforts [10–12] related to transistor reordering have also been reported. However, the aim of these techniques is to minimize dynamic power consumption during active mode rather than to reduce leakage power during standby mode.

* Corresponding author. E-mail addresses: [email protected] (J.W. Chun), [email protected] (C.Y.R. Chen). http://dx.doi.org/10.1016/j.mejo.2016.04.005. 0026-2692/© 2016 Elsevier Ltd. All rights reserved.

Pin reordering [13] and input/transistor reordering [14] techniques have been proposed to reduce gate leakage and the effect of transistor aging, respectively. However, these methods are not sufficiently accurate for analyzing the overall leakage reduction achieved through transistor reordering in CMOS circuits because they do not take into account the interaction of IRG with IG and ISUB, or the impact of stacking effects and different W/L ratios of transistors on the overall leakage savings. It will be demonstrated in this paper that, in many cases, reordering for leakage reduction without considering IRG leads to misleading results when the existing reordering method is applied to CMOS complex gates. Additionally, pin reordering has limitations in many cases, and thus transistor reordering is inevitable. For this reason, a better understanding and a more accurate method of transistor reordering are essential.

In this paper, we propose a novel reordering method for overall leakage reduction in CMOS circuits that considers all components of IG, including IRG, as well as the interactions between ISUB and IG under various conditions with different numbers and W/L ratios of on-/off-transistor blocks in a stack. This method provides direct and accurate algorithms for implementing a minimal-leakage structure in CMOS logic gates, which reduces the overall leakage dissipation of CMOS circuits.

The organization of this paper is as follows. In Section 2, we discuss the state dependence of leakage current under various conditions for a single transistor and the low-leakage input states. In Section 3, we analyze the effect of pin reordering on IG and ISUB and note the problems and limitations of the existing pin reordering technique when it is applied to CMOS complex gates. In Section 4, the proposed transistor reordering method for leakage reduction in complex gates is described. Experimental results, using the ISCAS85 benchmark circuits, are presented in Section 5. Finally, Section 6 concludes the paper.

2. Leakage current and input state dependence

The main components of the leakage current are ISUB and IG. The total leakage power has increased enormously in nanometer-scale CMOS devices due to the scaling of Vth and Tox in each technology generation.

2.1. Leakage currents of MOS transistors

ISUB is the current flowing between the drain and source nodes of an off MOS transistor when the gate voltage is below Vth. The IG of a MOS transistor depends on the bias voltages between the gate node and the other three nodes: gate-to-source voltage (VGS), gate-to-drain voltage (VGD) and gate-to-body voltage (VGB). IG can be divided according to its current paths: gate to source (IGS), gate to drain (IGD), and gate to body (IGB). Thus, IG is equal to the sum of IGS, IGD and IGB.

Fig. 1 shows the major leakage currents of typical steady states [15] in NMOS (S1N–S5N) and PMOS (S1P–S5P) transistors. The body (substrate) nodes (B) of the NMOS and PMOS transistors are connected to ground and VDD, respectively. The body and gate (G) nodes have full logic values (either VDD or ground) in steady-state conditions, while the other nodes (drain (D) and source (S)) have either full or close to full logic values depending on the circuit structure. The logic values “1” and “0” are represented by the high and low level of each terminal node, respectively. The direction of current flow in CMOS transistors is determined by the bias voltage between the nodes. Among the steady states, IRG exists in S3N/P and S5N/P.

To gain a better understanding of leakage behavior, we present five distinct types of steady state based on the leakage components of a single transistor: least leakage state (SLEAST: S1N and S1P), most gate leakage state (SGATE: S2N and S2P), subthreshold leakage state (SSUB: S3N and S3P), least gate leakage state (SLG: S4N and S4P), and reverse gate leakage state (SRG: S5N and S5P).

2.2. Input state dependence

The total leakage current of a CMOS circuit is dependent on the applied input vector.
For instance, leakage currents of a 2-input NAND gate are listed in Table 1. Unless otherwise noted, all of the experiments are conducted with HSPICE using the 32 nm technology PTM [16] model with VDD = 0.9 V, and the channel width/length (W/L) ratio and P/N ratio are set to 4 and 2, respectively.

Table 1 Leakage current (nA) of 2-input NAND gate.

Input   ISUB     IG      ITotal
00      0.726    0.440   1.166
01      24.640   1.692   26.332
10      6.869    0.118   6.987
11      82.824   2.566   85.390

Among the possible states in a 2-input NAND gate, the total leakage is the lowest when both inputs are zero (“00”) due to the stacking effect [17]. However, when both inputs are one (“11”), the pull-up network (PUN) has two off transistors in parallel, forming less resistance than the series case, which leads to the leakiest state. There is only one off transistor in the pull-down network (PDN) when the inputs are either “10” or “01”. The input “10” is less leaky than “01” due to drain-induced barrier lowering (DIBL).

In CMOS logic gates, there are two types of input vectors: one produces the same gate input state (either the high or the low logic value is present at all gate nodes, e.g., “11” or “00” in a NAND2 gate), and the other produces different gate input states (both high and low logic values are present at the gate nodes, e.g., “01” or “10” in a NAND2 gate). Note that, in this paper, we do not take into account the input vectors that yield the same gate input state of CMOS logic gates because pin/transistor reordering does not have any effect on the components of their steady states.
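As a small illustration of this input-state dependence, the following Python sketch ranks the four input vectors of the NAND2 gate by total leakage, using the values listed in Table 1. The names `TABLE1_NA`, `total_leakage` and `rank_vectors` are hypothetical and chosen only for this example; the numbers are copied from Table 1 rather than simulated.

```python
# Leakage components (nA) of the 2-input NAND gate, copied from Table 1.
# Keys are input vectors; values are (I_SUB, I_G).
TABLE1_NA = {
    "00": (0.726, 0.440),
    "01": (24.640, 1.692),
    "10": (6.869, 0.118),
    "11": (82.824, 2.566),
}


def total_leakage(vector: str) -> float:
    """Total leakage as the sum of subthreshold and gate leakage."""
    i_sub, i_g = TABLE1_NA[vector]
    return i_sub + i_g


def rank_vectors() -> list:
    """Input vectors ordered from least leaky to most leaky."""
    return sorted(TABLE1_NA, key=total_leakage)


if __name__ == "__main__":
    for vector in rank_vectors():
        print(vector, round(total_leakage(vector), 3))
```

The ranking places “00” first (stacking effect) and “10” ahead of “01”, which is the pair of vectors that pin reordering can exchange.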

Fig. 1. Typical bias conditions for CMOS transistors in a circuit.

3. Analysis of the effects of pin reordering on the leakage current

In this section, we analyze the effect of pin reordering on the IG and ISUB of typical CMOS logic gates, such as NAND, NOR, XOR, XNOR, AOI and OAI. Under different gate input states in an off-state network (OSN) of CMOS logic gates, SSUB (S3N and S3P) can be divided into two cases (SSUB1 and SSUB2) based on its position relative to the conducting transistor. To demonstrate this, consider the series-connected parts of a NAND or NOR gate, which are in an off-state when the input vectors are “10” and “01”, as shown in Fig. 2. When the source node voltages (VS) of SSUB1 and SSUB2 are the same in S3N/S3P, the leakage current of SSUB1 is less than that of SSUB2 because the drain-to-source voltage (VDS) and VGD of SSUB2 are approximately Vth higher than those of SSUB1, as shown in Fig. 2. Similarly, there are two types of conducting transistors, SGATE (S2N/S2P) and SLG (S4N/S4P); the type depends on the position relative to SSUB. SGATE is present when the conducting transistor is located below (above) the SSUB in the PDN (PUN), whereas SLG is present when the conducting transistor is located above (below) SSUB in the PDN (PUN), as shown in Fig. 2. Note that SLG leaks significantly less than SGATE, typically 3 to 6 orders of magnitude less, because the magnitude of IG is a strong function of the applied bias [18].

Fig. 2. Comparison of the steady states in an off-state network of 2-input NAND (a) and NOR (b) gate with different gate input states.

Thus, leakage reduction can be achieved by pin reordering: all of the conducting (non-conducting) transistors of series-connected transistors in an off-state PDN (PUN) are located above the non-conducting (conducting) transistor(s). Hence, the total leakage current can be minimized by replacing SGATE (S2N/S2P) and SSUB2 with SLG (S4N/S4P) and SSUB1, respectively. In other words, the pin reordering technique produces the least leaky steady state for certain input combinations by eliminating SGATE; replacing SGATE with SLG results in SSUB changing from SSUB2 to SSUB1.

3.1. A combination of serial and parallel MOS structures in an off-state network

Unlike basic logic gates, such as NAND and NOR gates, a complex gate is formed by the combination of serial and parallel MOS structures with complementary pull-up and pull-down logic. To evaluate the effect of pin reordering on leakage reduction in complex logic gates, we use an off-state network of typical CMOS complex logic gates, such as and-or-inverter (AOI), or-and-inverter (OAI), exclusive-or (XOR) and exclusive-nor (XNOR).

Fig. 3 shows the different implementations of OAI-21 (F = ¬(A(B + C))) and AOI-21 (F = ¬(A + BC)) gates. Table 2 lists the leakage current and steady state components for AOI-21 and OAI-21 logic gates in an off-state network.

Fig. 3. OAI-21 with different pull-down structures ((a) Type1, (b) Type2), and AOI-21 with different pull-up structures ((c) Type1, (d) Type2).

Table 2 Leakage current (nA) and steady state components for AOI-21 and OAI-21 logic gates in an off-state network (OSN).

Gate type  OSN  Input (ABC)  Type        Steady state type          ITotal
                                         A       B       C
OAI-21     PDN  001          Type1       SSUB1   SRG     SLG        7.44
                             Type2       SSUB2   SLEAST  SGATE      26.28
                010          Type1       SSUB1   SRG     SLG        7.44
                             Type2       SSUB2   SGATE   SLEAST     26.28
                011          Type1       SSUB1   SLG     SLG        7.63
                             Type2       SSUB2   SGATE   SGATE      27.53
                100          Type1       SGATE   SSUB2   SSUB2      51.11
                             Type2       SLG     SSUB1   SSUB1      12.71
           PUN  101          Both types  SSUB2   SGATE   SSUB2      82.69
                110          Both types  SSUB2   SSUB1   SLG        12.71
AOI-21     PUN  011          Type1       SGATE   SSUB2   SSUB2      82.05
                             Type2       SLG     SSUB1   SSUB1      13.86
                100          Type1       SSUB1   SLG     SLG        8.66
                             Type2       SSUB2   SGATE   SGATE      41.42
                101          Type1       SSUB1   SLG     SRG        7.77
                             Type2       SSUB2   SGATE   SLEAST     41.30
                110          Type1       SSUB1   SRG     SLG        7.77
                             Type2       SSUB2   SLEAST  SGATE      41.30
           PDN  001          Both types  SSUB2   SSUB2   SGATE      51.21
                010          Both types  SSUB2   SLG     SSUB1      31.97

In an off-state pull-down (pull-up) network of OAI-21 (AOI-21) gates, the Type 1 structure does not present SSUB2 and SGATE other than in the input “100” (“011”) case; hence, the Type 1 structure is less leaky than the Type 2 structure other than in the input “100” (“011”) case. In contrast, the Type 2 structure avoids SSUB2 and SGATE only in the input “100” (“011”) case, where it is less leaky than the Type 1 structure. In an off-state pull-up (pull-down) network of OAI-21 (AOI-21) gates, input “110” (“010”) is less leaky than input “101” (“001”) because of the steady states of the B and C transistors; the steady states of the B and C transistors in “110” (“010”) are SSUB1 and SLG, respectively, while the steady states of B and C in “101” (“001”) are SGATE and SSUB2, respectively.

Fig. 4 (Fig. 5) shows the XOR2 (XNOR2) CMOS gate with two and three different input locations of pull-up and pull-down networks, respectively. An important point to note is that if XOR2 and XNOR2 gates are implemented by a CMOS complex gate realization, none of the input vectors of the XOR and XNOR gates produces the same gate input state, due to the presence of negated inputs. For example, if the inputs of the XOR2 (XNOR2) gate are A and B, the logical expression of XOR2 (XNOR2) is A·¬B + ¬A·B (A·B + ¬A·¬B), and thus both the original inputs (A and B) and the negated inputs (¬A and ¬B) are present in each CMOS network, as shown in Fig. 4 (Fig. 5). Consequently, unlike the other logic gates (such as NOR, NAND, AOI and OAI), which present the same gate input state when the input vector consists of a single logic value, all of the input vectors are considered when evaluating the effects of pin reordering in the XOR (XNOR) gate, as shown in Table 3.

Fig. 4. 2-input XOR gate with different input positions in PUN ((a) Type1, (b) Type2) and PDN ((c) Type3, (d) Type4, (e) Type5).

Fig. 5. 2-input XNOR gate with different input positions in PUN ((a) Type1, (b) Type2) and PDN ((c) Type3, (d) Type4, (e) Type5).

3.2. Limitation of using pin reordering in CMOS complex gates

There is a limitation to using the pin reordering technique [13] in CMOS complex gates. To illustrate this, consider the different implementations of OAI-21 (F = ¬(A(B + C))) and AOI-21 (F = ¬(A + BC)) gates, as shown in Fig. 3. In this structure, only the OR (AND) components (inputs B and C) of the OAI-21 (AOI-21) gate can be interchanged with each other, whereas inputs A and B or A and C cannot be interchanged because this transformation would change the gate's logic function. Thus, in many cases of CMOS complex gates, pin reordering is only applicable to one of the CMOS networks (either PUN or PDN). For instance, as shown in Table 4, the PUN (PDN) in OAI-21 (AOI-21) can benefit from pin reordering, whereas the PDN (PUN) cannot.
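The series-stack rule used throughout this section (conducting transistors above non-conducting ones in an off-state PDN, and the opposite in an off-state PUN) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the dictionary is assumed to list interchangeable series pins from top to bottom, and “top” is taken as the end of the stack nearest the output in a PDN and nearest VDD in a PUN.

```python
def is_conducting(value: int, network: str) -> bool:
    """NMOS (PDN) conducts on logic 1; PMOS (PUN) conducts on logic 0."""
    if network == "PDN":
        return value == 1
    if network == "PUN":
        return value == 0
    raise ValueError("network must be 'PDN' or 'PUN'")


def reorder_series_pins(pins: dict, network: str) -> list:
    """Return pin names top-to-bottom for a series stack in an off-state network.

    `pins` maps the names of interchangeable series inputs to their logic
    values. Pins that may not be swapped without changing the logic function
    (e.g., A with B in OAI-21) must not be passed in together.
    """
    on = [p for p, v in pins.items() if is_conducting(v, network)]
    off = [p for p, v in pins.items() if not is_conducting(v, network)]
    # PDN: conducting above non-conducting; PUN: non-conducting above conducting.
    return on + off if network == "PDN" else off + on


if __name__ == "__main__":
    # NAND2 pull-down stack with inputs "01": the logic-1 pin moves to the top.
    print(reorder_series_pins({"A": 0, "B": 1}, "PDN"))
    # OAI-21 pull-up series pair (B, C) under input "101".
    print(reorder_series_pins({"B": 0, "C": 1}, "PUN"))
```

In both networks the rule places the pins driven to logic 1 at the top of the stack, which corresponds to turning the effective input “01” into “10” for the NAND2 gate.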

Table 3 Leakage current (nA) and steady state components for a 2-input XOR (XNOR) logic gate in an OSN with different input vectors. Steady state columns follow the transistor order XOR: A, B, ¬A, ¬B (XNOR: A, ¬B, ¬A, B).

OSN  Input (AB):   Type    Steady state type                ITotal
     XOR (XNOR)            1st     2nd     3rd     4th
PUN  00 (01)       Type1   SGATE   SGATE   SSUB2   SSUB2    82.640
                   Type2   SLG     SLG     SSUB1   SSUB1    15.509
     11 (10)       Type1   SSUB1   SSUB1   SLG     SLG      15.509
                   Type2   SSUB2   SSUB2   SGATE   SGATE    82.640
PDN  01 (00)       Type3   SSUB2   SGATE   SLG     SSUB1    33.249
                   Type4   SSUB1   SLG     SLG     SSUB1    13.939
                   Type5   SSUB2   SGATE   SGATE   SSUB2    52.491
     10 (11)       Type3   SLG     SSUB1   SSUB2   SGATE    33.249
                   Type4   SGATE   SSUB2   SSUB2   SGATE    52.491
                   Type5   SLG     SSUB1   SSUB1   SLG      13.939

Table 4 Leakage current saving (%) obtained through pin and transistor reordering.

Gate type      Reordering type         OSN  Input (ABC or AB)        Saving (%)
OAI-21         Transistor reordering   PDN  001: Type2 → Type1       71.7
                                            010: Type2 → Type1       71.7
                                            011: Type2 → Type1       72.3
                                            100: Type1 → Type2       75.1
               Pin reordering          PUN  101 → 110                40.5
AOI-21         Pin reordering          PDN  001 → 010                37.6
               Transistor reordering   PUN  011: Type1 → Type2       83.1
                                            100: Type2 → Type1       79.1
                                            101: Type2 → Type1       81.2
                                            110: Type2 → Type1       81.2
XOR2 (XNOR2)   Pin reordering          PDN  01(00): Type3 → Type4    58.1
                                            01(00): Type5 → Type4    73.4
                                            10(11): Type3 → Type5    58.1
                                            10(11): Type4 → Type5    73.4
               Pin reordering          PUN  00(01): Type1 → Type2    81.2
                                            11(10): Type2 → Type1    81.2
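The savings in Table 4 appear to be the relative reduction in ITotal between the configuration before and after reordering. The short Python sketch below checks this reading against a few entries, using ITotal values from Tables 2 and 3; the function name `saving_percent` is hypothetical, and the formula is an assumption inferred from the tabulated numbers.

```python
def saving_percent(before_na: float, after_na: float) -> float:
    """Relative leakage saving in percent: 100 * (before - after) / before."""
    if before_na <= 0:
        raise ValueError("the reference leakage must be positive")
    return 100.0 * (before_na - after_na) / before_na


if __name__ == "__main__":
    # OAI-21 PDN, input 001: Type2 (26.28 nA) -> Type1 (7.44 nA)
    print(round(saving_percent(26.28, 7.44), 1))
    # AOI-21 PUN, input 011: Type1 (82.05 nA) -> Type2 (13.86 nA)
    print(round(saving_percent(82.05, 13.86), 1))
    # AOI-21 PDN, pin reordering 001 (51.21 nA) -> 010 (31.97 nA)
    print(round(saving_percent(51.21, 31.97), 1))
    # XOR2 PDN, input 01: Type3 (33.249 nA) -> Type4 (13.939 nA)
    print(round(saving_percent(33.249, 13.939), 1))
```

These four cases reproduce the corresponding Table 4 entries (71.7, 83.1, 37.6 and 58.1).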

4. Transistor reordering for leakage reduction in CMOS complex gates

In CMOS complex gates, pin reordering is limited by the gate logic function, as discussed in the previous section. To overcome this limitation, transistor reordering is inevitable. When both pin and transistor reordering techniques are applied, all of the networks of CMOS complex gates can benefit, as listed in Table 4. In this section, we propose a novel transistor reordering method for leakage reduction in CMOS complex gates. For an accurate analysis of transistor reordering, we consider all of the components of IG as well as ISUB.

The previous reordering rule [13,14] for leakage reduction, placing all off (on) transistors at the bottom of the stack for each gate in the PDN (PUN), works well in simple complex logic gates such as the 3-input OAI-21 (AOI-21) gate, as we explored previously. However, this approach is not fully optimized for every complex logic gate because it does not take into account IRG and its interaction with ISUB. This can result in an inaccurate analysis of the effect of transistor reordering on leakage reduction.

To demonstrate this, consider the OAI-4211 gate shown in Fig. 6(a). Note that the following analysis is performed for an NMOS pull-down stack, but is equally applicable to a PMOS stack. This complex gate can be transformed into the simplified form shown in Fig. 6(b) when parallel transistors are considered as one single entity. When the input vector of this gate is “abcdefgh” (“ABCD”) = “00011000” (“0011”), the worst transistor order of the simplified form (top-to-bottom order) is “off-off-on-on” (“0011”); the possible transistor orders are “ABCD”, “ABDC”, “BADC” and “BACD”. The best transistor order of the simplified form (top-to-bottom order) is “on-on-off-off” (“1100”); the possible transistor orders are “CDAB”, “CDBA”, “DCAB” and “DCBA”. Fig. 7 depicts transistor reordering to minimize leakage current. After this transformation is done, the OSN can be divided into two block bindings: conducting block binding (CBB) and non-conducting block binding (NCBB). In this formation, as seen from Fig. 8, the steady state(s) of each conducting block (CB) of an OAI gate consists of either SLG (S4N) or SLG along with SRG (S5N).

Fig. 6. OAI-4211 gate (a) and its simplified formation (b).

Fig. 7. Transistor reordering for leakage reduction in an off-state network when the simplified input (ABCD) of OAI-4211 gate is “0011”.

Fig. 8. Different conducting block orders in CBB of OAI-4211 gate: (a) “DC” (b) “CD”.

Fig. 8 shows the two possible transistor (conducting block) orders (“CD” and “DC”) in the CBB of the OAI-4211 gate under the input “abcdefgh” (“ABCD”) = “00011000” (“0011”). In this example, if we ignore the IRG of S5N in the CB, the difference in leakage current between the two transistor orders is negligible. However, when we take IRG into account, there is a large discrepancy in the total leakage current (ITotal) between the two transistor orders (“DCXX” and “CDXX”, where “XX” stands for either “AB” or “BA”), as listed in Table 5. It can be observed that the measured leakage current is inaccurate when IRG is not considered, which results in misreading the effect of transistor reordering on leakage savings, as shown in Table 6. Moreover, ISUB also varies with the transistor (non-conducting block) order; the “XXBA” transistor order is less leaky (more leakage saving) than the “XXAB” transistor order, where “XX” stands for either “DC” or “CD”. Hence, simply placing all off transistors at the bottom of the stack for each gate in the PDN is not a fully optimized strategy for every complex logic gate. To overcome these problems, the impact of transistor reordering on IRG and its interaction with ISUB should be considered to achieve the least leaky structure, as described in the following subsections.

Table 5 Comparison of leakage current (nA) without IRG and with IRG in different transistor orders in an OSN of an OAI-4211 gate.

Transistor order:               ISUB   Without IRG         With IRG
top → bottom (condition)               IG        ITotal    IG     ITotal
DCAB (on-on-off-off)            1.10   5.48E-3   1.10      1.70   2.80
CDAB (on-on-off-off)            1.09   6.04E-3   1.10      0.80   1.89
DCBA (on-on-off-off)            0.68   5.33E-3   0.68      1.83   2.50
CDBA (on-on-off-off)            0.67   5.92E-3   0.68      0.92   1.59

Table 6 Comparisons of leakage saving obtained without IRG and with IRG in different transistor orders in an OSN of the OAI-4211 gate.

Transistor reordering:   Leakage saving (%)
0011 → 1100              Without IRG             With IRG
                         ISUB   IG     ITotal    ISUB   IG     ITotal
ABDC → DCAB              16.6   99.8   72        16.6   42.1   34
ABDC → CDAB              17.0   99.8   72        17.0   72.8   56
ABDC → DCBA              48.4   99.8   82        48.4   37.9   41
ABDC → CDBA              48.8   99.8   82        48.8   68.7   63

In the rest of this section, we assume that the first step of transistor reordering has been conducted in an OSN of a CMOS logic gate; that is, all conducting (non-conducting) blocks are located above all non-conducting (conducting) blocks in the off-state PDN (PUN).

4.1. Transistor reordering in conducting block binding

For the analysis of complex logic gates, we use the following notation to describe their structure. Given parallel transistors within a MOS stack, each conducting block (CB) and non-conducting block (NCB) is numbered based on its location in the CBB and NCBB, respectively; blocks are numbered in ascending order from top to bottom in each block binding, as shown in Fig. 9(a). In addition, the internal node voltages of the CBB (Vicb) and NCBB (Vincb) are also numbered in ascending order from top to bottom in each block binding, while the internal node voltage between the CBB and NCBB is Vibou (internal boundary node voltage).

To gain a better understanding of leakage behavior in complex gates, we present the simplified formation of a complex gate, as illustrated in Fig. 9(b). This formation is based on the gate input state and width of the transistors in each block; if the parallel transistors in a block have the same gate input state, they are replaced with a single transistor with a size equal to the sum of their sizes. For example, the three transistors of CB1 in Fig. 9(a) are replaced with two types of transistor based on the gate input state, as shown in Fig. 9(b): the conducting transistor in CB (CB1-CT) and the non-conducting transistor in CB (CB1-NCT).

Fig. 9. Schematic and notation for complex gate analysis: a PDN of OAI-333321 (a) and its simplified formation (b).

In CMOS complex logic gates, there are two types of conducting blocks (CBs) in the CBB. The first one (CB_case 1) consists of only a conducting transistor (SLG), such as CB3 in Fig. 9; the second one (CB_case 2) consists of both a conducting transistor (SLG) and a non-conducting transistor (SRG), such as CB1 and CB2 in Fig. 9. Hence, a CBB can be divided into two cases depending on the absence or presence of the non-conducting transistor. Let us therefore consider these two cases separately.

4.1.1. Conducting transistors only exist in CBB

To analyze the effect of transistor (CB) reordering on the leakage current of a PDN in the first case (CB_case 1), in which the CBB contains no non-conducting transistor, consider the different CB orders in the CBB of an OAI-5311 gate with three CBs and one NCB (W/L = 4). The leakage currents of the candidates for the best transistor order are listed in Table 7. Note that IG in the CBB is forward gate tunneling current (IFG), whereas IG in the NCBB is IRG.

Table 7 Leakage current (nA) for the OAI-5311 gate in an OSN with different CB orders.

W/L in CB(n)th-CT      Vibou   CBB     NCBB             ITotal
CB1    CB2    CB3      (V)     IG      IG      ISUB
20     12     4        0.622   0.029   0.073   5.590    5.692
20     4      12       0.624   0.036   0.074   5.639    5.749
12     20     4        0.624   0.040   0.074   5.640    5.754
12     4      20       0.625   0.057   0.075   5.672    5.804
4      20     12       0.635   0.058   0.080   5.982    6.120
4      12     20       0.635   0.063   0.080   5.985    6.128

It can be observed that, among the candidates for the best CB order, the lowest leakage current occurs in both the CBB and NCBB when the CBs are sorted by W/L in descending order from top to bottom in the CBB, whereas the highest leakage current occurs when the CBs are sorted in the reverse order. This is basically because of the internal node voltages of an OSN. In the CBB, the nth internal node voltage (Vicb(n)) is higher than the (n+1)th internal node voltage (Vicb(n+1)) because of the voltage drop across each CB in the stack (e.g., Vout > Vicb1 > Vicb2 > Vibou in Fig. 9). Thus, the VGD and VGS of CB(n)th-CT are less than those of CB(n+1)th-CT. For instance, in Fig. 9, VGD (= 0) and VGS (= VDD − Vicb1) of CB1-CT are less than VGD (= VDD − Vicb1) and VGS (= VDD − Vicb2) of CB2-CT, respectively.

Consequently, in terms of leakage reduction, a higher location of CB-CT in the CBB has better leakage conditions (lower VGD and VGS) than a lower one. Hence, the best CB (transistor) order in the CBB for leakage reduction is obtained when the CBs are arranged in descending order of width from the output node to the internal boundary node (the internal node between the CBB and NCBB), because IG is proportional to the width of a transistor. Furthermore, this CB order in the CBB has a positive impact on the leakage current in the NCBB as well; the best conducting block order in the CBB exhibits the lowest leakage current in the NCBB because it produces the lowest Vibou, as shown in Table 7. A lower Vibou means a lower VDS (less DIBL effect) and a lower |VGD| (less IRG) in NCB1, which results in lower ISUB and IG through the NCBB.

4.1.2. Both conducting and non-conducting transistors exist in CBB

To analyze the effect of transistor reordering on the leakage current of a PDN in the second case (CB_case 2), consider the different CB orders in the CBB of an OAI-531 gate with two CBs and one NCB (W/L = 4). The leakage currents of the candidates for the best transistor order are listed in Table 8.

Table 8 Leakage current (nA) for an OAI-531 gate in an OSN with different CB orders.

W/L in CB-CT & CB-NCT                      Vibou   CBB             NCBB            ITotal
CB1-CT   CB1-NCT   CB2-CT   CB2-NCT        (V)     IFG     IRG     IRG     ISUB
4        16        8        4              0.635   0.017   2.269   0.080   5.983   8.349
8        4         4        16             0.626   0.011   1.287   0.075   5.691   7.064

As seen in Table 8, compared to the IFG in the CBB, IRG exhibits different leakage behavior depending on the location in the CBB. In the case of IFG in the CBB, as discussed previously, a higher location of CB-CT is less leaky than a lower one, and thus IFG is reduced when the wider CB-CT (W/L = 8) is located above the narrower CB-CT (W/L = 4). However, in the case of IRG in the CBB, a higher location of CB-NCT is leakier than a lower one because the gate input state of CB-NCT is 0, and thus the |VGD| and |VGS| of CB(n)th-NCT are greater than those of CB(n+1)th-NCT. For instance, in Fig. 9, |VGD| (= VDD) and |VGS| (= Vicb1) of CB1-NCT are higher than |VGD| (= Vicb1) and |VGS| (= Vicb2) of CB2-NCT, respectively. Thus, IRG in the CBB is reduced when the wider CB-NCT (W/L = 16) is located below the narrower CB-NCT (W/L = 4), as shown in Table 8.

It should be noted that if a CBB contains a non-conducting transistor, the dominant leakage current is not the IFG but the IRG (which is exhibited only in a non-conducting transistor) because this leakage is at least two orders of magnitude higher than the IFG (which is exhibited only in a conducting transistor) in the CBB, as listed in Table 8. Hence, if a CBB contains a non-conducting transistor, the best CB order in the CBB for leakage reduction is obtained when the CBs are arranged by the widths of CB-NCT in ascending order from the output node to the internal boundary node. In addition, experimental data indicate that this conducting block order in the CBB exhibits the lowest leakage current in the NCBB because it produces the lowest Vibou.

Algorithm 1. Transistor reordering for CBB ().

for all CB ∈ CBB do
    Calculate the W/L of CB-CT and CB-NCT;
    Classify the CB as CB_case1 or CB_case2;
end for
if CBB consists of only CB_case1 then
    CBs are sorted by their W/L of CB-CT in descending order from top of CBB;
else if CBB consists of only CB_case2 then
    CBs are sorted by their W/L of CB-NCT in descending order from bottom of CBB;
else  // CBB consists of both CB_case1 and CB_case2
    Place all CBs of CB_case1 above the CBs of CB_case2;
    for CBs in CB_case1 do
        CBs are sorted by their W/L of CB-CT in descending order from top of CBB;
    end for
    for CBs in CB_case2 do
        CBs are sorted by their W/L of CB-NCT in descending order from bottom of CBB;
    end for
end if
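Algorithm 1 can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the class `ConductingBlock` and the function `reorder_cbb` are hypothetical names, and representing a CB_case 1 block by `nct_width = 0` is an assumption made for this example. Sorting CB_case 2 blocks "in descending order from the bottom" is expressed as an ascending sort from the top.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConductingBlock:
    name: str
    ct_width: float          # W/L of the merged conducting transistor (CB-CT)
    nct_width: float = 0.0   # W/L of the merged non-conducting transistor (CB-NCT); 0 if absent


def reorder_cbb(blocks: List[ConductingBlock]) -> List[ConductingBlock]:
    """Return the conducting blocks of a PDN in top-to-bottom order."""
    case1 = [b for b in blocks if b.nct_width == 0]   # conducting transistor only
    case2 = [b for b in blocks if b.nct_width > 0]    # also holds a non-conducting transistor
    # CB_case1: widest CB-CT at the top (smallest VGD and VGS).
    case1_sorted = sorted(case1, key=lambda b: b.ct_width, reverse=True)
    # CB_case2: widest CB-NCT at the bottom (smallest |VGD| and |VGS|).
    case2_sorted = sorted(case2, key=lambda b: b.nct_width)
    # All CB_case1 blocks are placed above the CB_case2 blocks.
    return case1_sorted + case2_sorted


if __name__ == "__main__":
    # CB widths of the OAI-5311 example (Table 7).
    oai5311 = [ConductingBlock("x", 4), ConductingBlock("y", 20), ConductingBlock("z", 12)]
    print([b.ct_width for b in reorder_cbb(oai5311)])
    # CB widths of the OAI-531 example (Table 8).
    oai531 = [ConductingBlock("p", 4, 16), ConductingBlock("q", 8, 4)]
    print([(b.ct_width, b.nct_width) for b in reorder_cbb(oai531)])
```

For the widths used in Tables 7 and 8, the sketch returns the orders (20, 12, 4) and (CT 8 / NCT 4 above CT 4 / NCT 16), which are the lowest-ITotal rows of those tables.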

32

J.W. Chun, C.Y.R. Chen / Microelectronics Journal 53 (2016) 25–34

Table 9 Leakage current (nA) for OAI-211/-311/-411 gates in an OSN with different NCB orders under one CB (W /L = 4) and two NCBs. Gate type

W /L in NCB

Vincb1

CBB

NCBB

NCB1

(V )

IFG

IRG

ISUB

NCB2

ITotal

OAI-211

8 4

4 8

0.117 0.081

0.001 0.001

0.299 0.141

0.703 1.175

1.003 1.317

OAI-311

12 4

4 12

0.126 0.070

0.001 0.001

0.439 0.134

0.743 1.654

1.184 1.789

OAI-411

16 4

4 16

0.132 0.062

0.001 0.002

0.573 0.129

0.773 2.094

1.347 2.224 Fig. 10. ISUB and IRG reduction obtained by NCB reordering for ISUB and IRG, respectively.

Based on the above observations, we propose a transistor reordering procedure for CBB of PDN to minimize overall leakage, which is described in Algorithm 1. 4.2. Transistor reordering in non-conducting block binding To analyze the effect of transistor (NCB) reordering on leakage current in NCBB of PDN, consider the different NCB orders in NCBB of OAI-211, OAI-311 and OAI-411 gates with one CB (W /L = 4) and two non-conducting blocks (NCBs). The leakage currents of the candidates for the best transistor order are listed in Table 9. In the case of IRG in NCBB, for the same reason as IRG in the CBB case (CB-NCT), the IRG in the NCBB is minimized when the larger width of NCB is located below the smaller width of NCB. In the case of ISUB in NCBB, ISUB varies with the width of NCB1 as listed in Table 9; the larger block width of NCB1 exhibits more stacking effects than the smaller one because a larger block width of NCB1 yields more positive potential at its source node (Vincb1) than a smaller one. Thus, ISUB in NCBB is minimized when the larger width of NCB is located above the smaller width of NCB. As seen from Table 9, IRG and ISUB show different leakage behaviors in transistor (NCB) order; IRG in NCBB is minimized when the NCBs are arranged by those widths in ascending order from the top of the NCBB (internal boundary node) to the ground, whereas ISUB in NCBB is minimized when a reversed transistor order for IRG reduction is applied to the NCBB. It can be observed that when the number of NCBs is two, NCB reordering for ISUB reduction is more effective (less leaky) than for IRG reduction. However, when the number of NCBs is more than two, NCB reordering for IRG reduction is more effective than for ISUB reduction, as shown in Table 10. Fig. 10 shows the amount of ISUB reduction ( ▵ISUB ) and IRG reduction ( ▵IRG ) when the NCB order changed from IRG reduction to ISUB reduction and vice versa, respectively. 
ΔISUB is obtained by subtracting ISUB in the NCB order for ISUB reduction from ISUB in the NCB order for IRG reduction, while ΔIRG is obtained by subtracting IRG in the NCB order for IRG reduction from IRG in the NCB order for ISUB reduction. IRG reordering becomes the better choice for larger stacks because the influence of the stacking effect on ISUB reduction decreases as the number of non-conducting transistors in the stack increases [19], and thus the benefit of NCB reordering for ISUB reduction also decreases. In contrast, NCB reordering for IRG reduction exhibits consistent results regardless of the number of NCBs, as shown in Fig. 10.

Table 10. Leakage current (nA) for OAI-2111/-3111/-4111 gates in an OSN with different NCB orders under one CB (W/L = 4) and three NCBs.

Gate type   W/L in NCB           CBB      NCBB              ITotal
            NCB1   NCB2   NCB3   IFG      IRG      ISUB
OAI-2111    8      4      4      0.001    0.317    0.461    0.780
            4      4      8      0.001    0.158    0.603    0.762
OAI-3111    12     4      4      0.001    0.461    0.474    0.936
            4      4      12     0.001    0.156    0.691    0.848
OAI-4111    16     4      4      0.001    0.598    0.483    1.082
            4      4      16     0.001    0.155    0.742    0.898

Between the two types of NCB reordering for leakage reduction in the NCBB (one for ISUB reduction and the other for IRG reduction), the favorable one for any given technology node can be chosen by comparing their effects on leakage as a function of the number of NCBs, that is, by finding the number of NCBs at which a reordering aimed at ISUB reduction becomes less effective than a reordering aimed at IRG reduction, i.e., ΔISUB < ΔIRG. For instance, in a 32 nm technology node, if the number of NCBs is less than three, a reordering aimed at ISUB reduction is applied to the NCBB. Otherwise, a reordering aimed at IRG reduction is applied. Algorithm 2 details the transistor reordering procedure for the NCBB described above.

Algorithm 2. Transistor reordering for NCBB ().
Find the number (n) of NCBs when ΔISUB < ΔIRG
for all NCB ∈ NCBB do
    Calculate the W/L of NCBs;
    Count the number (m) of NCBs in NCBB;
end for
if m < n then
    NCBs are sorted by their W/L in descending order from top of NCBB;
    \\ Transistor reordering for ISUB reduction
else
    NCBs are sorted by their W/L in ascending order from top of NCBB;
    \\ Transistor reordering for IRG reduction
end if
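The selection rule of Algorithm 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `TABLE_10`, `deltas`, and `reorder_ncbb` are hypothetical, the leakage values are copied from Table 10, and the threshold n = 3 is taken from the 32 nm example in the text (other technology nodes may yield a different n).

```python
from typing import Dict, List, Tuple

# (IRG, ISUB) in nA from Table 10 for the two candidate NCB orders.
# "isub_order": W/L descending from the top of the NCBB (e.g. 8, 4, 4).
# "irg_order":  W/L ascending from the top of the NCBB (e.g. 4, 4, 8).
TABLE_10: Dict[str, Dict[str, Tuple[float, float]]] = {
    "OAI-2111": {"isub_order": (0.317, 0.461), "irg_order": (0.158, 0.603)},
    "OAI-3111": {"isub_order": (0.461, 0.474), "irg_order": (0.156, 0.691)},
    "OAI-4111": {"isub_order": (0.598, 0.483), "irg_order": (0.155, 0.742)},
}


def deltas(rows: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Return (delta_isub, delta_irg) as defined in the text."""
    irg_in_isub_order, isub_in_isub_order = rows["isub_order"]
    irg_in_irg_order, isub_in_irg_order = rows["irg_order"]
    delta_isub = isub_in_irg_order - isub_in_isub_order
    delta_irg = irg_in_isub_order - irg_in_irg_order
    return delta_isub, delta_irg


def reorder_ncbb(ncb_wl: List[float], n: int) -> List[float]:
    """Order NCB W/L values from the top of the NCBB (index 0) to ground.

    n is the assumed number of NCBs at which delta_isub < delta_irg first holds.
    """
    m = len(ncb_wl)
    if m < n:
        return sorted(ncb_wl, reverse=True)  # reordering for ISUB reduction
    return sorted(ncb_wl)                    # reordering for IRG reduction


for gate, rows in TABLE_10.items():
    d_isub, d_irg = deltas(rows)
    print(f"{gate}: delta_isub={d_isub:.3f} nA, delta_irg={d_irg:.3f} nA")

print(reorder_ncbb([4, 8], n=3))     # two NCBs: wider block on top
print(reorder_ncbb([4, 8, 4], n=3))  # three NCBs: wider block at the bottom
```

For all three gates in Table 10, ΔISUB is smaller than ΔIRG, which agrees with the statement that the IRG-reducing order is preferable once three NCBs are present.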

4.3. A transistor reordering procedure for leakage reduction in CMOS complex gates

With this understanding of the leakage behavior of the circuit structure, we propose a transistor reordering method for leakage reduction in CMOS complex gates. Algorithm 3 shows the overall transistor reordering procedure for leakage reduction in the PDN of CMOS complex gates. The reordering procedure of the previous method in [14] corresponds to step 2 of the proposed Algorithm 3. Therefore, the previous transistor reordering approach can provide the best transistor order only if the following conditions are met: all CBs' W/L components (W/L of CB-CT and CB-NCT) are equal, and all NCBs' W/L values are equal. In other words, it cannot generate the minimum-leakage transistor order outside those conditions, as described in the OAI-4211 example earlier in this section. However, the proposed method can generate the best transistor order by considering all components of IG, including IRG, as well as the interactions between ISUB and IG under various conditions with different numbers and W/L ratios of on-/off-transistor blocks in a stack.

Algorithm 3. Overall transistor reordering procedure for leakage minimization ().
Step 1. Divide blocks in OSN into two categories: CBs and NCBs.
Step 2. Generate the two block bindings (CBB and NCBB) in OSN by placing all CBs above the NCBs.
Step 3. If all CBs' W/L components (CB-CT and CB-NCT) are equal and all NCBs' W/L values are equal, go to step 6. Otherwise, go to step 4.
Step 4. Sort the conducting blocks in CBB using the proposed transistor reordering scheme for CBB (Algorithm 1).
Step 5. Sort the non-conducting blocks in NCBB using the proposed transistor reordering scheme for NCBB (Algorithm 2).
Step 6. Exit.
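The control flow of Algorithm 3 can be sketched as follows. This is a minimal sketch under stated assumptions: the `Block` record, `sort_ncbb`, and `reorder_osn` are hypothetical names, the NCBB sorter follows Algorithm 2 with an assumed threshold n = 3, and the CBB sorter (Algorithm 1) is left as a caller-supplied function because its details are not reproduced in this section.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Block:
    name: str
    conducting: bool      # True for a CB, False for an NCB
    wl: float             # W/L of an NCB, or of the CB-CT part of a CB
    nct_wl: float = 0.0   # W/L of the CB-NCT part (used for CBs only)


Sorter = Callable[[List[Block]], List[Block]]


def sort_ncbb(ncbs: List[Block], n: int = 3) -> List[Block]:
    """Algorithm 2: descending W/L from the top if m < n, else ascending."""
    return sorted(ncbs, key=lambda b: b.wl, reverse=len(ncbs) < n)


def reorder_osn(blocks: List[Block], sort_cbb: Sorter,
                sort_ncbb_fn: Sorter = sort_ncbb) -> List[Block]:
    """Return the blocks ordered from the top of the stack (index 0) down."""
    # Step 1: split the blocks into CBs and NCBs.
    cbs = [b for b in blocks if b.conducting]
    ncbs = [b for b in blocks if not b.conducting]
    # Step 3: skip sorting when all CBs and all NCBs are uniform.
    uniform = (len({(b.wl, b.nct_wl) for b in cbs}) <= 1
               and len({b.wl for b in ncbs}) <= 1)
    if not uniform:
        cbs = sort_cbb(cbs)        # Step 4 (Algorithm 1, supplied by caller)
        ncbs = sort_ncbb_fn(ncbs)  # Step 5 (Algorithm 2)
    # Step 2: the CBB is placed above the NCBB.
    return cbs + ncbs


blocks = [
    Block("ncb_a", conducting=False, wl=4.0),
    Block("cb", conducting=True, wl=4.0, nct_wl=4.0),
    Block("ncb_b", conducting=False, wl=8.0),
]
# Placeholder CBB sorter that keeps the given order (stand-in for Algorithm 1).
order = reorder_osn(blocks, sort_cbb=lambda cbs: list(cbs))
print([b.name for b in order])
```

Passing the CBB sorter as a parameter keeps the sketch honest about what is shown here: only the block partitioning, the uniformity test of step 3, and the NCBB ordering are modeled.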

5. Experimental results

The proposed method was implemented and tested using ISCAS85 benchmark circuits. All circuits were synthesized and mapped using the ABC tool [20] and the 44-6.genlib library with up to five inputs to obtain gate-level netlists. The extracted gate-level netlists were then fed into the HSPICE simulator to measure the leakage power consumption and the delay. The leakage power consumption of each circuit was measured for 1000 random input vectors using the 45 nm (VDD = 1.0 V), 32 nm (VDD = 0.9 V), and 22 nm (VDD = 0.8 V) PTM technologies. In this experiment, each reordering technique is applied to reduce the leakage power dissipation without considering the delay effects of pin/transistor reordering on the circuit performance.

Fig. 11 shows how the leakage power savings (%) of each reordering technique ([13,14] and the proposed method) and the average delay overhead (%) of the proposed method depend on the input states of the c1908 circuit in 22 nm technology. The propagation delay of the c1908 circuit is measured from the point at which the primary input edge reaches half of VDD to the point at which the primary output edge reaches half of VDD. We select 100 transitions between primary input vectors that cause a transition at the primary output of the critical path to measure the average delay overhead of the c1908 circuit. The results show that our reordering technique yields an average leakage power reduction of 24.62% (ranging up to 29.15%) with an overall average delay overhead of 1.43% (ranging up to 6.31%). Selecting an appropriate pin/transistor order can not only reduce the leakage power but can also reduce the circuit delay, as shown in Fig. 11. Therefore, the reordering approach can improve both the leakage and the circuit performance simultaneously. For example, the proposed approach improves the delay by up to 3.68% with a 23.20% leakage power saving in the c1908 circuit.

The experimental results are listed in Table 11. Three different technologies were used to analyze the effects of technology scaling on the reordering techniques. The minimum, maximum and average leakage power savings of the pin reordering method [13], the input and transistor reordering method [14], and the proposed transistor and pin reordering method are listed in columns 3–5, 6–8, and 9–11, respectively. Columns 12 and 13 show the maximum leakage power savings of the proposed method when compared with the methods of [13] and [14], respectively. The results show that the proposed method is highly consistent with technology scaling and is more effective than the other reordering methods for leakage reduction. Our approach achieves leakage power savings of up to 32.96% (average: 27.61%), 38.03% (average: 31.31%), and 39.12% (average: 32.45%) for the 45 nm, 32 nm, and 22 nm technology nodes, respectively. The proposed reordering technique yields additional leakage power savings of up to 24.13% (average: 18.36%) and 12.09% (average: 8.98%) when compared with the methods of [13] and [14] in the 22 nm technology, respectively.
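As a quick arithmetic check on the "up to" and "average" figures quoted above, the short sketch below recomputes them from the per-circuit maximum savings of the proposed method in Table 11. The dictionary and function names are illustrative only, and the values are copied from the table.

```python
from statistics import mean
from typing import List, Tuple

# "Max" column of the proposed method in Table 11, in percent, for
# c432, c499, c880, c1355, c1908, and c3540 (in that order).
PROPOSED_MAX = {
    "45 nm": [32.81, 24.56, 32.96, 23.08, 24.27, 27.95],
    "32 nm": [37.69, 27.20, 38.03, 25.17, 27.96, 31.78],
    "22 nm": [39.12, 28.65, 38.88, 26.44, 29.15, 32.45],
}


def summarize(values: List[float]) -> Tuple[float, float]:
    """Return (largest saving, mean saving) over the benchmark circuits."""
    return max(values), mean(values)


for node, values in PROPOSED_MAX.items():
    best, avg = summarize(values)
    print(f"{node}: up to {best}%, average about {avg:.3f}%")
```

The recomputed means agree with the reported averages to within rounding of the second decimal place.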

Fig. 11. Experimental results for c1908 circuit in 22 nm technology using 1000 random input vectors.

Table 11. Experimental results for ISCAS85 benchmark circuits (leakage power savings in %).

Technology  Circuit  Savings by [13]        Savings by [14]        Savings by proposed    Max. savings compared to
                     Min    Max    Avg      Min    Max    Avg      Min    Max    Avg      [13]    [14]
45 nm       c432     9.57   21.68  15.47    14.58  29.98  22.01    19.02  32.81  25.67    18.71   11.55
            c499     8.80   14.01  11.72    14.16  21.10  17.58    17.81  24.56  20.79    13.03   6.08
            c880     4.03   17.49  12.55    7.05   27.20  20.32    10.96  32.96  25.68    19.73   8.29
            c1355    9.52   15.33  12.19    12.26  20.01  16.14    17.40  23.08  20.01    10.84   6.90
            c1908    6.94   12.94  10.18    12.33  19.41  16.22    17.10  24.27  20.78    14.48   6.89
            c3540    9.96   15.89  12.65    15.30  23.92  19.59    18.32  27.95  23.84    14.49   6.03
            Avg      8.14   16.23  12.46    12.62  23.60  18.64    16.77  27.61  22.79    15.21   7.62
32 nm       c432     9.61   23.52  16.52    15.11  35.59  25.05    20.63  37.69  28.67    22.09   12.03
            c499     8.42   13.90  11.57    15.13  23.53  19.18    19.36  27.20  22.91    16.05   7.09
            c880     4.44   19.20  13.53    8.48   32.26  23.76    13.68  38.03  29.60    24.03   9.03
            c1355    9.86   16.62  12.48    12.91  22.57  17.59    18.79  25.17  21.68    12.96   7.29
            c1908    7.08   13.84  10.68    14.04  22.71  18.57    19.44  27.96  23.68    17.95   7.89
            c3540    10.77  17.02  13.52    17.37  27.62  22.65    20.44  31.78  26.88    17.26   6.04
            Avg      8.36   17.35  13.05    13.84  27.38  21.13    18.72  31.31  25.57    18.39   8.23
22 nm       c432     10.24  25.19  17.24    16.82  35.12  25.57    21.35  39.12  29.21    21.35   12.09
            c499     9.30   14.95  12.55    15.71  24.20  19.84    20.73  28.65  24.12    15.99   8.15
            c880     4.54   19.90  14.07    8.43   32.60  23.91    14.73  38.88  30.43    24.13   10.10
            c1355    10.34  16.97  13.26    13.53  22.67  18.14    19.90  26.44  22.79    13.10   8.47
            c1908    7.61   14.54  11.35    14.35  22.84  18.90    20.37  29.15  24.62    18.41   8.53
            c3540    11.43  17.89  14.25    17.74  28.02  22.98    21.10  32.45  27.61    17.16   6.52
            Avg      8.91   18.24  13.79    14.43  27.57  21.56    19.69  32.45  26.46    18.36   8.98

6. Conclusion

In this paper, we demonstrated how IG and ISUB vary when the order of transistors and pins in a stack is varied, and we addressed the problems and limitations of the existing reordering approach. To solve these problems, we presented a novel reordering method for leakage power reduction in CMOS circuits by comprehensively analyzing various types of transistor configurations. Experimental results on ISCAS85 benchmark circuits show that our reordering design method can achieve improvements in leakage power savings that range from 6% to 24% when compared with the previous works. Defining the least leaky state for each input combination of a logic gate is necessary to achieve minimum power dissipation of CMOS circuits in standby mode operation. Thus, this method provides circuit designers with a new choice for managing the leakage power problem. In addition, the proposed method can be easily integrated into existing CAD tools and other leakage reduction techniques, such as IVC, to achieve further improvement.

References

[1] Q. Wang, S.B.K. Vrudhula, Algorithms for minimizing standby power in deep submicrometer, dual-Vt CMOS circuits, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 21 (3) (2002) 306–318, http://dx.doi.org/10.1109/43.986424.
[2] J. Gu, G. Qu, L. Yuan, Enhancing dual-Vt design with consideration of on-chip temperature variation, in: 2010 IEEE International Conference on Computer Design (ICCD), 2010, pp. 542–547, doi: http://dx.doi.org/10.1109/ICCD.2010.5647619.
[3] R. Fan, Z. Dandan, Y. Xiaolang, An algorithm for reducing leakage power based on dual-threshold voltage technique, in: 2013 Fourth International Conference on Digital Manufacturing and Automation (ICDMA), 2013, pp. 132–134, doi: http://dx.doi.org/10.1109/ICDMA.2013.31.
[4] N. Sirisantana, L. Wei, K. Roy, High-performance low-power CMOS circuits using multiple channel length and multiple oxide thickness, in: Proceedings of International Conference on Computer Design, 2000, pp. 227–232, doi: http://dx.doi.org/10.1109/ICCD.2000.878290.
[5] A. Abdollahi, F. Fallah, M. Pedram, A robust power gating structure and power mode transition strategy for MTCMOS design, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 15 (1) (2007) 80–89, http://dx.doi.org/10.1109/TVLSI.2007.891093.
[6] S. Hemantha, A. Dhawan, H. Kar, Multi-threshold CMOS design for low power digital circuits, in: TENCON 2008 - 2008 IEEE Region 10 Conference, 2008, pp. 1–5, doi: http://dx.doi.org/10.1109/TENCON.2008.4766689.
[7] A. Abdollahi, F. Fallah, M. Pedram, Leakage current reduction in CMOS VLSI circuits by input vector control, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 12 (2) (2004) 140–154, http://dx.doi.org/10.1109/TVLSI.2003.821546.
[8] F. Gao, J. Hayes, Exact and heuristic approaches to input vector control for leakage power reduction, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 25 (11) (2006) 2564–2571, http://dx.doi.org/10.1109/TCAD.2006.875711.
[9] F. Firouzi, S. Kiamehr, M. Tahoori, Power-aware minimum NBTI vector selection using a linear programming approach, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 32 (1) (2013) 100–110, http://dx.doi.org/10.1109/TCAD.2012.2211103.
[10] R. Hossain, M. Zheng, A. Albicki, Reducing power dissipation in CMOS circuits by signal probability based transistor reordering, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 15 (3) (1996) 361–368, http://dx.doi.org/10.1109/43.489107.
[11] S.C. Prasad, K. Roy, Transistor reordering for power minimization under delay constraint, ACM Trans. Des. Autom. Electron. Syst. 1 (2) (1996) 280–300, http://dx.doi.org/10.1145/233539.233543.
[12] T.-W. Chiang, C.-Y. Chen, W.Y. Chen, A technique for selecting CMOS transistor orders, in: 25th International Conference on Computer Design, 2007 (ICCD 2007), 2007, pp. 438–443, doi: http://dx.doi.org/10.1109/ICCD.2007.4601936.
[13] D. Lee, D. Blaauw, D. Sylvester, Gate oxide leakage current analysis and reduction for VLSI circuits, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 12 (2) (2004) 155–166, http://dx.doi.org/10.1109/TVLSI.2003.821553.
[14] S. Kiamehr, F. Firouzi, M.B. Tahoori, Input and transistor reordering for NBTI and HCI reduction in complex CMOS gates, in: Proceedings of the Great Lakes Symposium on VLSI (GLSVLSI '12), ACM, New York, NY, USA, 2012, pp. 201–206, doi: http://dx.doi.org/10.1145/2206781.2206829. URL 〈http://doi.acm.org/10.1145/2206781.2206829〉.
[15] R. Rao, J. Burns, A. Devgan, R. Brown, Efficient techniques for gate leakage estimation, in: Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003 (ISLPED '03), 2003, pp. 100–103, doi: http://dx.doi.org/10.1109/LPE.2003.1231843.
[16] Predictive technology model. URL 〈http://ptm.asu.edu〉.
[17] S. Narendra, S. Borkar, V. De, D. Antoniadis, A. Chandrakasan, Scaling of stack effect and its application for leakage reduction, in: International Symposium on Low Power Electronics and Design, 2001, pp. 195–200, doi: http://dx.doi.org/10.1109/LPE.2001.945400.
[18] K. Cao, W.-C. Lee, W. Liu, X. Jin, P. Su, S.-H. Fung, J. An, B. Yu, C. Hu, BSIM4 gate leakage model including source-drain partition, in: International Electron Devices Meeting, 2000 (IEDM '00), Technical Digest, 2000, pp. 815–818, doi: http://dx.doi.org/10.1109/IEDM.2000.904442.
[19] A. Agarwal, S. Mukhopadhyay, A. Raychowdhury, K. Roy, C. Kim, Leakage power analysis and reduction for nanoscale circuits, IEEE Micro 26 (2) (2006) 68–80, http://dx.doi.org/10.1109/MM.2006.39.
[20] Berkeley logic synthesis and verification group, ABC: a system for sequential synthesis and verification. URL 〈http://www.eecs.berkeley.edu/~alanmi/abc/〉.