Microelectronics Journal 46 (2015) 1313–1324
Microelectronics Journal journal homepage: www.elsevier.com/locate/mejo
Variability modeling in near-threshold CMOS digital circuits
M. Slimani a,*, F. Silveira b, P. Matherat a
a Institut Télécom/Télécom ParisTech, CNRS-LTCI UMR 5141, Paris, France
b Universidad de la República, Montevideo, Uruguay
Article info
Article history: Received 13 May 2015; received in revised form 30 September 2015; accepted 4 October 2015; available online 2 November 2015.
Keywords: Sub-threshold logic; Near-threshold operation; Variability; Modeling

Abstract
Sub-threshold operation is an efficient solution for ultra low power applications. However, it is very sensitive to process variability, which can impact the robustness and effective performance of the circuit. On the other hand, this sensitivity decreases toward near-threshold operation. In this paper, the impact of variability on sub-threshold and near-threshold circuit performance is investigated through analytical modeling and circuit simulation in a 65 nm industrial low power CMOS process. It is shown that variability moves the effective minimum energy point toward the near-threshold region. Also, when variability is taken into account, a complete model including the near-threshold region (moderate inversion) is required to correctly model circuit performance around the minimum energy point. An analytical solution for the optimum supply voltage that minimizes the total energy per operation, while considering variability effects, is provided. Additionally, the resulting speed-consumption trade-off in a variability aware analysis of sub-threshold and near-threshold operation is presented.
© 2015 Elsevier Ltd. All rights reserved.
* Corresponding author. E-mail addresses: [email protected] (M. Slimani), silveira@fing.edu.uy (F. Silveira).
http://dx.doi.org/10.1016/j.mejo.2015.10.001
0026-2692/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction
Over the last decade, sub-threshold logic has been an ideal option to achieve ultra low energy consumption for applications with modest speed requirements. The term sub-threshold refers to the weak inversion (WI) region, where the drain current depends exponentially on the gate voltage [1] and where the minimum energy can be achieved using a supply voltage (VDD) well below the threshold voltage. Traditionally, digital circuits have a supply voltage VDD well above the threshold voltage. In that case, transistors that are on operate in the strong inversion region, where the dependence of drain current on gate voltage is quadratic (or near-linear when the velocity saturation effect dominates [2]). Between weak and strong inversion lies the moderate inversion region, or near-threshold region, which occurs for VDD around the threshold voltage and in which the transistor ID–VG characteristic is neither exponential nor quadratic [2]. It has recently been shown that the near-threshold region provides convenient trade-offs [3] in the design of ultra low energy digital circuits, and this paper contributes to this topic.

As interest in ultra low energy digital circuits has increased, research related to sub-threshold logic has attained considerable importance. Modeling and characterization of sub-threshold operation for standard CMOS cell designs have been investigated for energy and performance analysis [1,4]. However, several works have shown that variability in sub-threshold logic is a critical limitation to achieving robust ultra low energy devices [5–7]. Because the current in the weak inversion region depends exponentially on the threshold voltage (Vt), random Vt variations significantly affect the on and off currents, the gate delays and the output swings, and may result in functional failures of some gates. Moreover, the minimum energy point can be strongly affected by variability. Models that account for variability have been developed in previous works, specifically for the sub-threshold (weak inversion) region [7,8]. Using Monte Carlo Spice simulations, we have shown that variability moves the effective minimum energy point toward the near-threshold region (moderate inversion). Thus, models restricted to the weak inversion region (exponential drain current vs. gate voltage) can no longer model circuit performance around the minimum energy point.

Consequently, a designer who considers a low-VDD design to reduce energy consumption needs a simple way to assess the energy-speed-variability trade-offs in the process (or processes) that she is targeting, as well as an overall understanding of the reasons behind these trade-offs and how they interplay. The goal of this paper is to fulfill this need. As mentioned, such a procedure requires considering both the sub- and near-threshold regions (i.e. the weak and moderate inversion regions). Furthermore, since the goal is to have an orientation tool
for the designer and to explore the trade-offs, the proposed approach favours a model with the minimum complexity needed to show and understand the relevant trends, rather than modeling in detail the many effects required to predict performance with high accuracy. Accordingly, well-known modeling issues, e.g. in the definition of a gate delay, are not considered; we favour a basic model that reaches our goal with a few adjustable parameters.

In this paper, we apply a complete and compact transistor model valid from weak to strong inversion in order to correctly model circuit performance in a variability aware analysis. We provide an analytical solution for the optimum supply voltage (VDDopt) that minimizes the total energy per operation while considering variability effects. To our knowledge, this is the first attempt to derive an analytical solution for VDDopt while considering mismatch effects. The model requires simulations of the transistor and inverter chain characteristics [9]. Then, the model is validated for predicting the behaviour of more complex circuits, using a 32-bit adder and an 8-bit array multiplier as test cases. The proposed model is a simple tool for assessing the trade-offs between energy, speed and variability often faced when designing in the sub/near-threshold regions.

This paper is organized as follows. Section 2 presents the concept of sub-threshold operation and investigates the impact of process variations on the minimum energy point. In Section 3, a complete model that includes the near-threshold region is developed and the analytical expression for the optimum VDD is derived. Both are then validated on the benchmark circuits. Section 4 summarizes the main conclusions of the paper.
2. Sub-threshold circuit design
In this section, the concept of sub-threshold operation is briefly explained. The impact of variability on sub-threshold circuit performance is then presented, via Monte Carlo Spice simulations of an inverter chain implemented in a low power technology from an industrial foundry.

2.1. Sub-threshold operation
Sub-threshold operation consists in reducing the power supply voltage VDD below the threshold voltage Vt in order to achieve minimum energy consumption. The concept is simple: as the dynamic power consumption Pdyn is proportional to the square of VDD, a small reduction in supply voltage causes a quadratic decrease in this power consumption, at the cost of a significant increase in delay. The longer delay in turn increases the leakage energy, as the clock period needs to be extended proportionally to the delay of the circuit. The opposing trends of the two forms of energy (dynamic and leakage) lead to a minimum energy point achieved at an optimum supply voltage. This point often occurs in the weak inversion region, where sub-threshold leakage currents are used as the active drain current.

Fig. 1 plots the drain current ID versus the gate-to-source voltage VGS for an NMOS transistor in a 65 nm technology. The curve points out two main characteristics of sub-threshold operation. First, in the WI region, the drain current depends exponentially on VGS. Second, lowering the supply voltage degrades the ratio of active to idle currents (Ion/Ioff), where Ion is the sub-threshold current when VGS = VDD and Ioff is the leakage current when VGS = 0. The decrease of this ratio depends on the sub-threshold swing S, defined as S = n U_T ln 10, where n is the sub-threshold slope factor and U_T is the thermal voltage [1]. Moreover, a reduced Ion/Ioff ratio can impact the output swing and result in a functional failure, especially when process variation effects are introduced.
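To make the dependence of the Ion/Ioff ratio on the supply voltage concrete, the following minimal sketch evaluates the sub-threshold swing and the resulting ratio. It assumes a temperature of 300 K (not stated in the text), borrows the slope factor n = 1.22 reported in Section 3.1, and uses the idealized weak inversion relation of one decade of current per S of gate voltage, so it ignores DIBL and only holds while VDD stays below Vt. The function names are illustrative.

```python
import math

def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage U_T = kT/q in volts (300 K is an assumed temperature)."""
    k_boltzmann = 1.380649e-23      # J/K
    q_electron = 1.602176634e-19    # C
    return k_boltzmann * temp_kelvin / q_electron

def subthreshold_swing(n, u_t):
    """Sub-threshold swing S = n * U_T * ln(10), in volts per decade."""
    return n * u_t * math.log(10.0)

def on_off_ratio_weak_inversion(vdd, n, u_t):
    """Idealized weak inversion Ion/Ioff: one decade of current per S of VDD."""
    return 10.0 ** (vdd / subthreshold_swing(n, u_t))

u_t = thermal_voltage()
n = 1.22  # slope factor quoted in Section 3.1 for the 65 nm NMOS transistor
s = subthreshold_swing(n, u_t)
print(f"S = {s * 1e3:.1f} mV/decade")
for vdd in (0.3, 0.2, 0.1):
    ratio = on_off_ratio_weak_inversion(vdd, n, u_t)
    print(f"VDD = {vdd:.1f} V -> Ion/Ioff ~ {ratio:.3g}")
```

Each reduction of VDD by one swing S costs one decade of Ion/Ioff, which is why the ratio collapses quickly at very low supply voltages.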
Fig. 1. ID versus VGS curve for NMOS transistor. (Plot: ID (A) on a logarithmic axis versus VGS (V) from 0 to 1.4 V, with the sub-threshold and super-threshold regions separated at VT and with Ioff and the slope S^-1 marked.)
Note that the exponential decrease of Ion in the weak inversion region leads to an exponential increase of the delay. Hence, the application of sub-threshold logic is limited to circuits with low to medium throughput constraints. Wireless sensor networks are an example of an application that would benefit from sub-threshold operation: energy consumption is the primary concern, since small batteries with a long lifetime are needed.

However, since sub-threshold leakage currents depend exponentially on the threshold voltage, operating the circuit in the weak inversion region results in an increased sensitivity to process variation. Process variations are fluctuations around the intended value of a design parameter, caused by the manufacturing process. They are typically classified into global (extrinsic) and local (intrinsic) variations [10]. Global variation affects every structure on a die equally and can induce different characteristics from one die to another. It generally results from factors such as temperature effects and equipment properties. Local variation, however, affects structures on the same die differently. Three sources are of particular importance: random dopant fluctuations (RDF: random placement of dopant atoms in the channel region), line edge roughness and oxide thickness (Tox) variations. Both global and local variations can induce Vt variations and impact the functionality of sub-threshold circuits. However, the authors of [11] show that global Vt variation can be compensated using an adaptive body biasing (ABB) technique. Further analysis of the efficiency of this technique for sub-threshold circuits can be found in [8]. For local variation, RDF has the most significant impact on Vt variability [8].

Fig. 2 plots the delay distribution of a 32-bit adder in sub-threshold (0.2 V) and above-threshold (1.2 V) operation, resulting from a 1000-point Monte Carlo simulation in a 65 nm process technology. As can be seen in Fig. 2, the standard deviation in the sub-threshold regime is one order of magnitude larger than that at nominal voltage, implying a higher variability of sub-threshold operation that must be considered when designing sub-threshold circuits. Also, at very low supply voltages, variability compromises the correct functionality of sub-threshold circuits. For example, if variations make NMOS devices stronger than PMOS ones, the pull-up network can fail to drive the output to the correct logic value VDD, and vice versa.

It has been previously proposed to design sub-threshold circuits while considering failure due to process variation and a low ratio of on to off currents. In 2004, Wang et al. used process corner analysis to determine the minimum operating voltage and the minimum transistor sizing that ensure correct functionality of the gate [4]. Wang's method
searched for the minimum ratio of PMOS width to NMOS width in a standard cell library that maintained an output swing of 10–90% of the supply voltage in the worst-case corners. Later, Kwong et al. applied an efficient method to verify logic gate output levels (VOL and VOH) based on a static noise margin (SNM) approach [6]. Kwong et al.'s “butterfly plot” superimposes the voltage transfer characteristic (VTC) of the gate in question on the mirrored VTC of a NOR gate in order to verify VOL, and on the mirrored VTC of a NAND gate to verify VOH, as these gates have the worst-case VIL and VIH, respectively. A negative SNM means that the VOL of the gate is above the required input voltage (VIL) of the succeeding gate. This implies a functional failure, and thus the gates cannot operate at this supply voltage.

2.2. Variability impact on minimum energy point
To show the impact of variability on the minimum energy point, Monte Carlo Spice simulations with 1000 points have been performed for different values of VDD. Fig. 3 shows the evolution of total energy consumption without and with the variability effect, considering the typical and the 3σ worst-case delay, respectively. We observe that variability leads to a considerable increase in leakage energy. As a result, the minimum energy point increases by 8% and moves closer to the moderate inversion region.
3. Variability aware modeling in sub-threshold operation
Modeling the minimum energy point has been addressed in previous works based on the simple sub-threshold current model (WI model: exponential drain current vs. gate voltage) [1,7,8]. However, as we have seen in Section 2.2, variability considerably affects the minimum energy point, which moves toward the moderate inversion region. In [1], modeling in moderate inversion with an all-region transistor model is also considered, but the impact of variability is not analyzed in that case. In 2010, Markovic et al. proposed working in the near-threshold voltage region in order to recover some of the delay performance at the expense of a small energy increase [3]. To that end, an energy-delay modeling framework including all inversion regions is developed in Markovic's work. However, variability is still not considered in these models. A variability aware model that extends over the weak and moderate inversion regions is presented in this section.

Fig. 2. Delay distribution of a 32-bit adder: (a) in sub-threshold (0.2 V), (b) in above-threshold (1.2 V). (Histograms of occurrences versus delay normalized to the mean.)

Fig. 3. The evolution of dynamic, leakage and total energy consumption of a 32-bit adder with and without variability considerations.

This model
is based on the EKV model expressions [1]. We start with a simple model based on the characteristics of an inverter. Then, we show that this model can predict the behaviour of more complex circuits. Spectre and Matlab simulations demonstrate that the WI model is no longer sufficient to model the performance of a system exposed to process variations. The proposed model is a simple tool for assessing the trade-offs between energy, speed and variability when designing in the sub- and near-threshold regions.

3.1. Current and delay model under variability analysis
In the weak and moderate inversion regions, the drain current has previously been expressed as in Eq. (1) [1]:

$$I_{DS} = I_S\left[\ln^2\!\left(1+\exp\frac{V_{GS}-V_t}{2nU_T}\right) - \ln^2\!\left(1+\exp\frac{V_{GS}-V_t}{2nU_T}\,\exp\frac{-V_{DS}}{2U_T}\right)\right] \qquad (1)$$

where n is the sub-threshold slope factor, Vt is the threshold voltage, and VGS and VDS are the gate-to-source and drain-to-source voltages, respectively. IS is the specific current given by the following equation:

$$I_S = 2n\mu C_{ox} U_T^2\, W/L = 2n\beta U_T^2 \qquad (2)$$

where μ is the mobility, Cox is the oxide capacitance, UT is the thermal voltage and W/L denotes the channel width-to-length ratio of the transistor. Eq. (1) tends to the classical exponential WI model when V_GS − V_t tends to minus infinity:

$$I_{DS} = I_S \exp\!\left(\frac{V_{GS}-V_t}{nU_T}\right)\left[1-\exp\!\left(-\frac{V_{DS}}{U_T}\right)\right] \qquad (3)$$

The model of Eq. (1) is a long channel model that does not include effects such as mobility reduction, velocity saturation and drain induced barrier lowering (DIBL). Mobility reduction and velocity saturation have a large impact in the strong inversion region. DIBL has some impact on the performance in the WI and MI regions [8]. For simplicity's sake and considering the stated goal of our work, DIBL was not included in the model and, as shown later, reasonable agreement is achieved anyway. Although DIBL becomes more important as technology scales, we focus on the sub/near-threshold regions, where the range of variation of VDD is small, and so is the impact of VDD variation, through DIBL, on the developed model. Leakage currents can be determined from the IDS expression with V_GS = 0.

Fig. 4 shows ID versus VGS for the NMOS transistor as determined with the weak inversion model (Eq. (3)), the complete model (Eq. (1)) and Spice simulations. As expected, the weak inversion model
is not sufficient to model the current in the near-threshold region. Moreover, we observe that the complete model is less accurate in the strong inversion region, because it does not model mobility reduction and velocity saturation [12].

Fig. 4. ID versus VGS curves for NMOS transistor of inverter (W = 0.2 μm and L = 65 nm).

To extract model parameters, we have applied the method based on the gm/ID curve described in [12]. The following values were obtained for the NMOS transistor of a basic inverter from the considered 65 nm LP industrial library:

n = 1.22; V_t = 0.38 V; β = 4.83e-4 A/V^2.

The expression of the inverter's delay is estimated from the current model as follows:

$$T_d = \frac{C_L V_{DD}}{I_{on}} \qquad (4)$$

where CL is the equivalent load capacitance, VDD is the supply voltage and Ion is the saturated on-current. Though this model is a rough one, which neglects the fact that the charge/discharge current is not constant and does not take into account the slope of the input signal, it proves acceptable for our goal of having an orientation tool for the expected trade-offs in the design space. In the same sense, since the delay is fitted using the equivalent load capacitance value (CL), it proves sufficient to consider just the characteristics of one transistor type, the NMOS in our case.

For variability analysis, random Vt and β variations were considered, modeled as normal distributions with parameters (μ_Vt, σ_Vt, μ_β, σ_β). In order to determine σ_Vt and σ_β, a set of ID versus VGS curves was obtained through Spice Monte Carlo simulations with the statistical parameters provided by the foundry. For each of these curves, Vt and β were extracted as previously discussed, and from this set of Vt and β values the standard deviations σ_Vt and σ_β were calculated. The resulting values are:

σ_Vt = 20e-3 V; σ_β = 80e-6 A/V^2.

The current model considering process variations is derived from Eq. (1) by replacing β and Vt with values drawn from normal distributions with the extracted mean and sigma values. Global variations are not included in this work, as adaptive body biasing can be efficiently used to compensate such variations [11].

It is well known that the current, and hence the delay, is normally distributed in the strong inversion region and lognormally distributed in the weak inversion region [13]. However, the distribution in the moderate inversion region has not been well described to date. This is vital to correctly model the circuit characteristics, as they have different expressions depending on the considered distribution. We consider this issue in the next section.

3.2. Current and delay distribution in different operating regions
The probability density function (PDF) of a normal distribution is given by the following equation:

$$P(X) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left[-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2\right] \qquad (5)$$

where μ and σ represent the mean and the standard deviation of the variable X, respectively. A lognormal distribution is closely related to the normal one. If X = log(Y) is normally distributed with mean μ and standard deviation σ, then the random variable Y follows a lognormal distribution characterized by the mean m and the variance v given by:

$$m = \exp\!\left(\mu + \frac{\sigma^2}{2}\right) \qquad (6)$$

$$v = \exp(2\mu + \sigma^2)\,\bigl(\exp(\sigma^2) - 1\bigr) \qquad (7)$$

Fig. 5. (a) Lognormal distribution, (b) Normal distribution. (PDF curves.)
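As a numerical illustration of the current model of Section 3.1, the sketch below evaluates the complete model of Eq. (1) and the weak inversion model of Eq. (3) with the extracted NMOS parameters. The thermal voltage value (about 300 K) is an assumption, the function names are illustrative, and the code is not the authors' Matlab implementation.

```python
import math

U_T = 0.02585   # thermal voltage in volts near 300 K (assumed, not given in the paper)
N_SLOPE = 1.22  # sub-threshold slope factor (extracted value quoted in Section 3.1)
V_T = 0.38      # threshold voltage in volts (extracted value quoted in Section 3.1)
BETA = 4.83e-4  # beta in A/V^2 (extracted value quoted in Section 3.1)

def specific_current(beta=BETA):
    """Specific current I_S of Eq. (2)."""
    return 2.0 * N_SLOPE * beta * U_T ** 2

def ids_complete(vgs, vds, vt=V_T, beta=BETA):
    """Drain current of the complete model, Eq. (1)."""
    x = (vgs - vt) / (2.0 * N_SLOPE * U_T)
    forward = math.log1p(math.exp(x)) ** 2
    reverse = math.log1p(math.exp(x - vds / (2.0 * U_T))) ** 2
    return specific_current(beta) * (forward - reverse)

def ids_weak_inversion(vgs, vds, vt=V_T, beta=BETA):
    """Drain current of the weak inversion model, Eq. (3)."""
    gate_term = math.exp((vgs - vt) / (N_SLOPE * U_T))
    drain_term = 1.0 - math.exp(-vds / U_T)
    return specific_current(beta) * gate_term * drain_term

# Compare both models with VGS = VDS = VDD, as for the on-current of Eq. (4).
for vdd in (0.1, 0.2, 0.4, 0.6):
    complete = ids_complete(vdd, vdd)
    weak = ids_weak_inversion(vdd, vdd)
    print(f"VDD = {vdd:.1f} V: complete = {complete:.3e} A, "
          f"WI = {weak:.3e} A, WI/complete = {weak / complete:.2f}")
```

Under these assumptions the two models agree closely well below Vt, while the exponential model increasingly overestimates the current as VDD approaches and exceeds Vt.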
Fig. 5(a) and (b) plot the PDF of a lognormal and a normal distribution with μ = 0 and σ = 1, respectively. Visually, what differentiates the two distributions is symmetry: the normal distribution is symmetric, whereas the lognormal distribution is highly asymmetric, with a long right tail. However, this might not be clearly seen in real data. For instance, Fig. 6 shows the delay distribution of the 32-bit adder for different supply voltages from a 1000-point Monte Carlo simulation with mismatch effect. As expected, due to the exponential dependence of the sub-threshold currents on Vt, the delay histogram in the WI region (VDD = 0.2 V) resembles a lognormal distribution. The delay in the SI region (VDD = 1.2 V) clearly resembles a normal distribution, which is due to the linear dependence of the currents on Vt in the SI region. However, it is difficult to distinguish between the lognormal and normal distributions when looking at the delay distribution for VDD = 0.4 V and VDD = 0.6 V.

Fig. 6. 32-bit adder delay distribution for different supply voltages. (Histograms of occurrences versus delay normalized to the mean, for VDD = 0.2 V, 0.4 V, 0.6 V and 1.2 V.)
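The change in the shape of the delay distribution with VDD can be explored with a small Monte Carlo experiment on the analytical model alone. The sketch below draws Vt and β from normal distributions with the extracted means and sigmas, evaluates the inverter delay of Eq. (4) with the on-current of Eq. (1), and reports the skewness of the delay and of its logarithm. It is an illustration under stated assumptions, not a reproduction of the Spice results: the thermal voltage, the 1 fF load (which cancels in the skewness), the sample count, the seed and the resampling of non-positive β draws are all choices made here.

```python
import math
import random
import statistics

U_T, N_SLOPE = 0.02585, 1.22          # U_T assumed (about 300 K); n from Section 3.1
MU_VT, SIGMA_VT = 0.38, 20e-3         # extracted Vt statistics (V)
MU_BETA, SIGMA_BETA = 4.83e-4, 80e-6  # extracted beta statistics (A/V^2)

def on_current(vdd, vt, beta):
    """Saturated on-current from Eq. (1) with VGS = VDS = VDD."""
    i_s = 2.0 * N_SLOPE * beta * U_T ** 2
    x = (vdd - vt) / (2.0 * N_SLOPE * U_T)
    forward = math.log1p(math.exp(x)) ** 2
    reverse = math.log1p(math.exp(x - vdd / (2.0 * U_T))) ** 2
    return i_s * (forward - reverse)

def sample_delays(vdd, count, rng, c_load=1e-15):
    """Monte Carlo samples of the inverter delay of Eq. (4); c_load is hypothetical."""
    delays = []
    for _ in range(count):
        vt = rng.gauss(MU_VT, SIGMA_VT)
        beta = rng.gauss(MU_BETA, SIGMA_BETA)
        while beta <= 0.0:  # resample the (extremely rare) non-physical draws
            beta = rng.gauss(MU_BETA, SIGMA_BETA)
        delays.append(c_load * vdd / on_current(vdd, vt, beta))
    return delays

def skewness(values):
    """Population skewness: zero for symmetric data, positive for a right tail."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

rng = random.Random(2015)
for vdd in (0.2, 0.4, 0.6, 1.2):
    delays = sample_delays(vdd, 5000, rng)
    logs = [math.log(d) for d in delays]
    print(f"VDD = {vdd:.1f} V: skew(delay) = {skewness(delays):.2f}, "
          f"skew(log delay) = {skewness(logs):.2f}")
```

A delay that is close to lognormal shows a clearly positive skewness while the skewness of its logarithm stays near zero, which gives a quick numerical complement to the visual inspection of the histograms.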
Fig. 7. Fitting of the Monte Carlo data of the 32-bit adder delay for VDD = 0.2 V: (a) normal distribution, (b) lognormal distribution. The blue (dark) points are original data, while the cyan (light) points correspond to a set of 10,000 synthetic data forming the envelope QQ-plot. (Axes: original data quantiles versus synthetic data quantiles.)
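The quantile comparison of Fig. 7 is graphical. A simple numerical proxy, sketched below, is the correlation between the sorted data and the quantiles of a standard normal distribution, computed once on the raw data (normal hypothesis) and once on its logarithm (lognormal hypothesis). This correlation criterion is a stand-in chosen here for illustration, not the envelope QQ-plot procedure used in the paper, and the synthetic sample parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist

def qq_correlation(samples):
    """Pearson correlation between sorted samples and standard normal quantiles."""
    ordered = sorted(samples)
    count = len(ordered)
    normal = NormalDist()
    theory = [normal.inv_cdf((i + 0.5) / count) for i in range(count)]
    mean_x = sum(theory) / count
    mean_y = sum(ordered) / count
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(theory, ordered))
    var_x = sum((x - mean_x) ** 2 for x in theory)
    var_y = sum((y - mean_y) ** 2 for y in ordered)
    return cov / math.sqrt(var_x * var_y)

def best_fit(samples):
    """Return the better-fitting label and both QQ correlations (samples must be > 0)."""
    r_normal = qq_correlation(samples)
    r_lognormal = qq_correlation([math.log(v) for v in samples])
    label = "lognormal" if r_lognormal > r_normal else "normal"
    return label, r_normal, r_lognormal

# Synthetic stand-in for a sub-threshold delay sample (hypothetical parameters).
rng = random.Random(7)
delays = [rng.lognormvariate(-12.0, 0.6) for _ in range(1000)]
label, r_normal, r_lognormal = best_fit(delays)
print(f"best fit: {label} (r_normal = {r_normal:.4f}, r_lognormal = {r_lognormal:.4f})")
```

A correlation close to 1 means that the points of the QQ-plot lie close to a straight line, i.e. that the hypothesized distribution fits the data well.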
To determine the distribution that best fits the data, we use the graphical QQ-plot method, which compares two distributions by plotting their quantiles against each other [14]. Fig. 7(a) and (b) show the QQ-plots of the Monte Carlo data of the 32-bit adder delay for VDD = 0.2 V fitted using the normal and lognormal distributions, respectively. We observe that the lognormal distribution is indeed the best fit for the delay data in the sub-threshold region. The QQ-plot technique was applied to the delay data at each supply voltage to determine the distribution that gives the best fit in the moderate inversion region. The results show that the delay from VDD = 0.2 V up to VDD = 0.7 V is best fitted with a lognormal distribution. Beyond VDD = 0.7 V, the data are considered normally distributed.

3.3. Model vs. the logic depth
For a complex circuit with a logic depth LD, corresponding to the number of inverter delays that compose the critical path of the circuit, the delay is:

$$T_{circuit} = L_D \cdot T_d \qquad (8)$$
Simulating the circuit's critical path delay and normalizing it to the delay of an inverter at the same supply voltage provides LD. As previously stated, the current, and thus the delay, can have a normal or lognormal distribution depending on the operating region. Consequently, the delay model has different expressions depending on the considered distribution. Because of this difference, we extract LD for a circuit at VDD = 0.2 V, which we take as representative of the sub/near-threshold region on which we focus.

Case of normal distribution: If Td has a normal distribution defined by (μ_delay, σ_delay), Tcircuit, which is the sum of LD independent, normally distributed Td, is also normally distributed, with parameters (L_D μ_delay, sqrt(L_D) σ_delay). Therefore, the delay variability (σ_delay/μ_delay) of the circuit is given by:

$$\mathrm{var}_{\mathrm{circuit\,delay}} = \frac{1}{\sqrt{L_D}}\,\mathrm{var}_{\mathrm{inverter\,delay}} \qquad (9)$$

Table 1 lists the delay variability for different logic depths at VDD = 1.2 V. As expected, the delay variability of a circuit decreases as its logic depth increases, and the decrease closely follows the 1/sqrt(L_D) law.

Table 1
Delay variability (σ_delay/μ_delay) for different logic depths at VDD = 1.2 V.

LD    Spice simulation    Analytical model
1     0.085               –
8     0.025               0.03
15    0.019               0.021
23    0.014               0.016

Case of lognormal distribution: If Td is lognormally distributed, the sum of several lognormal random variables can be approximated by another lognormal random variable, as shown in [15]. Matching the first two moments as in [15], we obtain:

$$\mu(\ln T_{circuit}) = \mu_{log} + \frac{1}{2}\sigma_{log}^2 + \frac{1}{2}\ln\!\left(\frac{L_D^3}{L_D - 1 + \exp(\sigma_{log}^2)}\right) \qquad (10)$$

$$\sigma(\ln T_{circuit}) = \sqrt{\ln\!\left(1 + \frac{\exp(\sigma_{log}^2) - 1}{L_D}\right)} \qquad (11)$$

where μ_log and σ_log are respectively the mean and the standard deviation of the normal distribution log(Td).

Similar equations have been shown in the work of Zhai [13]. However, while that work considers only Vt variations with WI models, our concern is to derive a generic model that takes into account all operating regions and considers both Vt and β variations. We use Eqs. (6) and (7) to calculate the mean and the variance (and hence the standard deviation) of the distribution of Tcircuit. Fig. 8 shows the delay variability of a chain of inverters with different logic depths at VDD = 0.2 V and VDD = 0.5 V. We observe that the model closely matches Spice simulation for both supply voltages.

One of the most interesting parameters is the 3σ worst-case delay, defined as the value exceeded by only about 0.13% of the data (the one-sided 3σ tail of the underlying normal distribution). This parameter is also computed differently depending on the considered distribution. If the delay Td is normally distributed with mean μ and standard deviation σ, the 3σ worst-case delay is calculated as:

$$T_{d,3\sigma} = \mu + 3\sigma \qquad (12)$$

Otherwise, for VDD values where Td follows a lognormal distribution with mean μ and standard deviation σ, the 3σ worst-case Td is given by:

$$T_{d,3\sigma} = \exp(\mu_{log} + 3\sigma_{log}) \qquad (13)$$

where μ_log and σ_log are the mean and the standard deviation of the normal distribution log(Td), respectively, and can be computed as follows:

$$\mu_{log} = \ln\!\left(\frac{\mu^2}{\sqrt{\sigma^2 + \mu^2}}\right) \qquad (14)$$

$$\sigma_{log} = \sqrt{\ln\!\left(\frac{\sigma^2}{\mu^2} + 1\right)} \qquad (15)$$

Fig. 9 shows the evolution of the typical and 3σ worst-case (WC) delay of a chain of inverters with logic depth LD = 23. As expected, the delay increases exponentially in the sub-threshold region as VDD decreases. The complete model, with and without variability, closely follows the Spice simulations, whereas the delay of the WI model deviates for VDD above 0.3 V. This shows, firstly, that the proposed variability model, which considers only Vt and β variations, predicts well the effect of mismatch on the circuit delay and, secondly, that, as expected, the WI model is of limited use at near-threshold voltages and above. This is further illustrated in Fig. 10, where the delay variability obtained with the WI model remains constant across supply voltages, while that of the complete model has a shape similar to the Spice simulations. Fig. 11 shows that the WC delay determined by the Zhai equations, where only Vt variations are considered, is less accurate than the one derived from our model, which takes into account both Vt and β variations. We conclude that β effects on delay variability are not negligible.

3.4. Energy model under variability analysis
The total energy consumed by the circuit is the sum of the dynamic energy Edyn, required to charge and discharge parasitic capacitances during logic transitions, and the static energy Estat due to leakage currents (Ileak). This can be summarized in the following expression, as shown by Calhoun in [16]:

$$E_{tot} = C_{eff} V_{DD}^2 + V_{DD}\, W_{eff}\, I_{leak}\, T_{circuit} \qquad (16)$$

where Ceff denotes the average switched capacitance of the circuit. It is estimated from a typical simulation of the circuit by solving C_eff = I_avg T_circuit / V_DD. Weff denotes the normalized width that contributes to leakage current. It is calculated by measuring the average leakage current of the circuit and normalizing it to the inverter's leakage current (Ileak).
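The delay statistics of Section 3.3 translate directly into a few helper functions. The sketch below implements Eqs. (9) to (15) as written above; the function names and the normalized inverter statistics used in the example are hypothetical and serve only to exercise the formulas.

```python
import math

def variability_normal_chain(inverter_variability, logic_depth):
    """Eq. (9): sigma/mu of a chain of independent, normally distributed delays."""
    return inverter_variability / math.sqrt(logic_depth)

def lognormal_params_from_moments(mean, std):
    """Eqs. (14)-(15): parameters of log(Td) from the mean and std of Td."""
    mu_log = math.log(mean ** 2 / math.sqrt(std ** 2 + mean ** 2))
    sigma_log = math.sqrt(math.log(std ** 2 / mean ** 2 + 1.0))
    return mu_log, sigma_log

def circuit_lognormal_params(mu_log, sigma_log, logic_depth):
    """Eqs. (10)-(11): lognormal approximation of the sum of logic_depth delays."""
    spread = math.exp(sigma_log ** 2)
    mu_circuit = (mu_log + 0.5 * sigma_log ** 2
                  + 0.5 * math.log(logic_depth ** 3 / (logic_depth - 1.0 + spread)))
    sigma_circuit = math.sqrt(math.log(1.0 + (spread - 1.0) / logic_depth))
    return mu_circuit, sigma_circuit

def worst_case_delay_normal(mean, std):
    """Eq. (12): 3-sigma worst-case delay for a normal distribution."""
    return mean + 3.0 * std

def worst_case_delay_lognormal(mu_log, sigma_log):
    """Eq. (13): 3-sigma worst-case delay for a lognormal distribution."""
    return math.exp(mu_log + 3.0 * sigma_log)

# Hypothetical inverter delay statistics, normalized to a mean of 1.
mu_log, sigma_log = lognormal_params_from_moments(mean=1.0, std=0.7)
for depth in (1, 8, 23):
    mu_c, sigma_c = circuit_lognormal_params(mu_log, sigma_log, depth)
    typical = math.exp(mu_c + 0.5 * sigma_c ** 2)   # mean of the chain delay, Eq. (6)
    worst = worst_case_delay_lognormal(mu_c, sigma_c)
    print(f"LD = {depth:2d}: mean = {typical:.2f}, 3-sigma WC = {worst:.2f}, "
          f"WC/mean = {worst / typical:.2f}")
```

Two sanity checks follow from the formulas themselves: for LD = 1 the chain parameters reduce to those of the inverter, and the mean of the chain delay equals LD times the mean inverter delay.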
Without loss of generality, we consider just-in-time operation [17], where the circuit works at its maximum frequency, i.e. the clock period is set to the critical path delay of the circuit. To consider variability in the energy analysis, the 3σ worst-case delay and the mean Ileak are used in the static energy calculation as follows:

$$E_{stat} = V_{DD}\, \mu_{I_{leak}}\, T_{circuit,3\sigma} \qquad (17)$$

Fig. 8. Delay variability at sub-threshold voltages (VDD = 0.5 V; VDD = 0.2 V) for different logic depths. (Delay variability σ/μ versus LD; Spice simulation and analytical model.)

Fig. 9. Evolution of typical and 3σ WC delay of a chain of inverters with logic depth LD = 23 for different VDD. (Delay (s) on a logarithmic axis versus VDD (V); curves: Spice, complete model and WI model, typical and 3σ WC.)

Fig. 10. Delay variability for different VDD (V) of a chain of inverters with LD = 23. (σ/μ versus VDD (V); curves: Spectre, complete model and WI model.)

3.5. Summary of methodology
The methodology proposed in this paper is intended to give a simple way of assessing the energy-speed-variability trade-offs for sub- and near-threshold operation. The flow diagram summarizing the different steps to extract model parameters is shown in Fig. 12. From the diagram it can be seen that, in order to apply the proposed modeling approach, we need to extract parameters that are related to the process (n, Vt, β, σ_Vt, σ_β, CL) and parameters that are related to the particular circuit (LD, Ceff, Weff). The process parameters require typical DC and Monte Carlo Spice simulations of one transistor, plus delay and energy Spice simulations of an inverter at only one VDD value in each case (the nominal one, 1.2 V, for the energy and the minimum one, 0.2 V, for the delay). The same applies to the circuit-related parameters: only delay and energy Spice simulations of the circuit, at a single VDD in each case, are required. This is a large saving in simulation effort compared to performing delay and energy Monte Carlo simulations of the circuit for many VDD values.

3.6. Test cases: 32-bit adder and 8-bit multiplier
Fig. 11. Comparison of our model (Vt and β variations) and the model with Zhai equations (Vt variations). [Panels: 3σ WC delay (s) versus VDD (V) for Spice simulation, our model and the model using Zhai equations; error (%) versus VDD (V) for our model and Zhai equations.]
The proposed methodology was tested and validated on two benchmark circuits. The results of the proposed modeling approach were checked against extensive Spice Monte Carlo simulations for both circuits. The first benchmark circuit is a 32-bit carry save adder used in a FIR filter for biomedical applications. The second benchmark circuit is an 8-bit array multiplier, used as a test circuit in [8]. Both circuits have been synthesized in the considered 65 nm Low Power process technology. Table 2 lists the values of all the parameters of our model for the benchmark circuits. CL is the equivalent load capacitance of the inverter. LD is obtained by normalizing the circuit delay to that of the inverter at VDD = 0.2 V. σVt and σβ are obtained by 1 K Monte
Carlo simulation. The values of Ceff and Weff needed for energy modeling are extracted from Spice simulation at VDD = 1.2 V. Fig. 13 compares the delay models (WI and complete) with Spice simulations for the considered benchmark circuits. The results show once again that the complete model is required to correctly model the delay under variability in the near-threshold region.
Fig. 14 shows the energy consumed by the benchmark circuits under mismatch variations. The total energy obtained by simulation differs slightly from that predicted by the models. At the minimum energy point, the error is 7% (6%) for the complete model and 9% (7%) for the WI model for the considered 32-bit adder (8-bit multiplier) benchmark circuit. Unexpectedly, the error of the WI model is comparable to that of the complete model, as shown in Fig. 14 (right). This can be explained by the opposing trends of the variability and the delay predicted by the WI model. On the one hand, the variability given by the WI model is constant whatever the value of the supply voltage. Hence, it is overestimated in the moderate and strong inversion regions. On the other hand, the delay obtained with the WI model is underestimated with respect to the one obtained by Spice simulations, as observed in Fig. 9. As the energy contains the product of the variability and the delay, the two errors compensate, which allows the WI model to remain an adequate model of energy consumption even under variability analysis and outside the WI region.
Overall, the results of applying the proposed approach to these two test cases were verified against extensive Monte Carlo Spice simulations. This verification shows that the proposed method fulfills the goal of easily providing a tool for assessing the delay-energy tradeoff in a variability aware analysis.
When dealing with larger circuits and complete systems, the designer may apply the proposed method to the key building blocks of the system in order to quickly analyze the trade-offs. Verifying the precision of the method against Spice simulations for large circuits is beyond the scope of this work.
Fig. 12. Diagram summarizing the different steps to extract model parameters.

3.7. Analytical solution of minimum energy point under variability analysis

In this section, we derive an analytical expression for the optimum voltage VDDopt corresponding to the minimum energy point under variability analysis. In [16], the authors provided such an expression, but without considering the effects of variations. To our knowledge, this is the first attempt to derive an analytical solution for VDDopt while considering mismatch effects. The expression determined by Calhoun in [16] is the solution of a set of equations derived from a WI model. As we have demonstrated in Section 3.4 that the WI model is still a good model for energy consumption under variability consideration, we will use the simple WI model and follow the same steps as in [16]. Let us start with the expression of the total energy Etot shown in Eq. (16). Under variability consideration, this expression is:

\[ E_{tot,var} = C_{eff} V_{DD}^2 + V_{DD} W_{eff}\, \mu_{leak}\, T_{circuit,3\sigma} \tag{18} \]

Using the WI model as in Eq. (3), μleak and Tcircuit can be written as follows:

\[ \mu_{leak} = 2 n \beta U_T^2 \exp\left(-\frac{V_t}{n U_T}\right) \tag{19} \]

\[ T_{circuit} = L_D T_d, \qquad T_d = \frac{C V_{DD}}{2 n \beta U_T^2 \exp\left(\frac{V_{DD} - V_t}{n U_T}\right)} \tag{20} \]

Since we know that Tcircuit follows a lognormal distribution in the WI region and the relevant part of the MI region, the 3σ WC Tcircuit is:

\[ T_{circuit,3\sigma} = \exp\left(\mu_{log,circuit} + 3\sigma_{log,circuit}\right) \tag{21} \]

where $\mu_{log,circuit} = \mathrm{mean}(\log(T_{circuit}))$ and $\sigma_{log,circuit} = \mathrm{StdDev}(\log(T_{circuit}))$. They can be written as functions of μlog and σlog, the mean and the standard deviation of the logarithm of the delay of one inverter Td, as shown in Eqs. (10) and (11), respectively. Applying the logarithm to Td in Eq. (20), we get:

\[ \mu_{log} = \ln(V_{DD}) + \ln\left(\frac{C}{2 n \beta U_T^2}\right) - \frac{V_{DD}}{n U_T} + \frac{V_t}{n U_T} \tag{22} \]

\[ \sigma_{log} = \sqrt{\frac{\sigma_{V_t}^2}{(n U_T)^2} + \frac{\sigma_{beta}^2}{\mu_{beta}^2}} \tag{23} \]

Now, substituting Eqs. (19) and (21) into Eq. (18) gives the expression of the total energy under variability analysis, Etot,var, which
can be written as follows:

\[ E_{tot,var} = C_{eff} V_{DD}^2 + A V_{DD} \exp\left(\ln(V_{DD}) - \frac{V_{DD}}{n U_T} + B\right) = C_{eff} V_{DD}^2 + A V_{DD}^2 \exp\left(B - \frac{V_{DD}}{n U_T}\right) \tag{24} \]

where

\[ A = 2 n \beta U_T^2 W_{eff} \exp\left(-\frac{V_t}{n U_T}\right) \]

\[ B = \frac{V_t}{n U_T} + \ln\left(\frac{C \sqrt{L_D^3}}{2 n \beta U_T^2}\right) + 0.5 \ln\left(\frac{1}{L_D - 1 + \exp(\sigma_{log}^2)}\right) + 3\sqrt{\ln\left(1 + \frac{1}{L_D}\left(\exp(\sigma_{log}^2) - 1\right)\right)} + 0.5\,\sigma_{log}^2 \]

with $\sigma_{log}^2 = \sigma_{V_t}^2/(n U_T)^2 + \sigma_{beta}^2/\mu_{beta}^2$ as in Eq. (23).

To search for VDDopt giving the minimum energy point, we solve $\partial E_{tot,var}/\partial V_{DD} = 0$. The derivative of Etot,var with respect to VDD is:

\[ \frac{\partial E_{tot,var}}{\partial V_{DD}} = 2 C_{eff} V_{DD} + A V_{DD} \left(2 - \frac{V_{DD}}{n U_T}\right) \exp\left(B - \frac{V_{DD}}{n U_T}\right) \tag{25} \]

Equating Eq. (25) to 0 and making a simple change of variables yields:

\[ \left(2 - \frac{V_{DDopt}}{n U_T}\right) \exp\left(2 - \frac{V_{DDopt}}{n U_T}\right) = \frac{-2 C_{eff}}{A \exp(B - 2)} \tag{26} \]

Therefore, the analytical expression of VDDopt is:

\[ V_{DDopt} = n U_T \left(2 - \mathrm{lambertW}\left(\frac{-2 C_{eff}}{A \exp(B - 2)}\right)\right) \tag{27} \]

where the function W = lambertW(x) is the solution to W exp(W) = x.
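As an illustration, Eq. (27) can be evaluated numerically. The Python sketch below is not part of the original work and rests on several assumptions: the form of B follows a lognormal-sum (Fenton-Wilkinson type) reading of Eqs. (10) and (11), which are not reproduced in this section; μbeta is taken equal to the β of Table 2 and C equal to CL; the adder column of Table 2 is used as read from the flattened table; and the lower real branch of the Lambert W function is selected, since the argument is negative and the optimum is expected above 3nUT. All function and variable names are hypothetical, and the computed value is not guaranteed to reproduce Table 3.

```python
import math


def lambert_w_minus1(x, tol=1e-12, max_iter=200):
    """Lower real branch W_{-1}(x), defined for -1/e < x < 0.

    Solves ln(-w) + w = ln(-x) for w < -1 by Newton iteration.
    """
    if not (-1.0 / math.e < x < 0.0):
        raise ValueError("W_{-1}(x) is real only for -1/e < x < 0")
    target = math.log(-x)
    w = target - math.log(-target)  # asymptotic initial guess, always < -1
    for _ in range(max_iter):
        step = (math.log(-w) + w - target) * w / (w + 1.0)
        w -= step
        if abs(step) <= tol * abs(w):
            break
    return w


def coeff_a(n, ut, beta, vt, weff):
    """A in Eq. (24): 2*n*beta*UT^2*Weff*exp(-Vt/(n*UT))."""
    return 2.0 * n * beta * ut ** 2 * weff * math.exp(-vt / (n * ut))


def coeff_b(n, ut, beta, vt, sigma_vt, sigma_beta, c, ld):
    """B in Eq. (24), in the assumed lognormal-sum form."""
    nut = n * ut
    s2 = (sigma_vt / nut) ** 2 + (sigma_beta / beta) ** 2  # sigma_log^2, Eq. (23)
    return (vt / nut
            + math.log(c * ld ** 1.5 / (2.0 * n * beta * ut ** 2))
            - 0.5 * math.log(ld - 1.0 + math.exp(s2))
            + 3.0 * math.sqrt(math.log(1.0 + (math.exp(s2) - 1.0) / ld))
            + 0.5 * s2)


def energy(vdd, ceff, a, b, nut):
    """Total energy per operation under variability, Eq. (24)."""
    return ceff * vdd ** 2 + a * vdd ** 2 * math.exp(b - vdd / nut)


def vdd_opt(ceff, a, b, nut):
    """Eq. (27), evaluated on the lower Lambert W branch (assumption)."""
    x = -2.0 * ceff / (a * math.exp(b - 2.0))
    return nut * (2.0 - lambert_w_minus1(x))


# Values as read from Table 2 for the 32-bit adder (column assignment assumed).
N, UT, BETA, VT = 1.22, 0.026, 4.83e-4, 0.38
SIGMA_VT, SIGMA_BETA, CL = 20e-3, 80e-6, 3e-15
LD, CEFF, WEFF = 43, 8.34e-14, 300

NUT = N * UT
A = coeff_a(N, UT, BETA, VT, WEFF)
B = coeff_b(N, UT, BETA, VT, SIGMA_VT, SIGMA_BETA, CL, LD)
V_OPT = vdd_opt(CEFF, A, B, NUT)

# Cross-check the closed form against a brute-force scan of Eq. (24).
grid = [0.15 + 0.0005 * i for i in range(701)]  # 0.15 V to 0.50 V
V_SCAN = min(grid, key=lambda v: energy(v, CEFF, A, B, NUT))
print(f"closed form: {V_OPT:.4f} V, grid scan: {V_SCAN:.4f} V")
```

The principal branch of the Lambert W function would return a value between -1 and 0 for the same argument, which places the corresponding stationary point between 2nUT and 3nUT. The lower branch is therefore the one that matches an optimum in the range reported for the benchmark circuits. The grid scan is only a sanity check that the closed form and Eq. (24) agree under the stated assumptions.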
Analytical solution vs. Simulation: The VDDopt values calculated with Eq. (27) for the considered benchmark circuits are shown in Table 3. A good agreement with the values obtained by Spice simulation is achieved (less than 5% error).

Table 2
Model parameter values for the benchmark circuits. Parameters with a single value are common to both circuits.

Parameter   32-bit adder      8-bit multiplier
n           1.22
UT          0.026 (V)
β           4.83e-4 (A/V^2)
Vt          0.38 (V)
σVt         20e-3 (V)
σβ          80e-6 (A/V^2)
CL          3e-15 (F)
LD          43                21
Ceff        8.34e-14 (F)      6.60e-13 (F)
Weff        300               880

4. Conclusion
This paper has examined modeling under variability consideration for sub-threshold and near-threshold operation. We have presented a complete model to assess the delay-energy trade-off in a variability aware analysis. However, for energy modeling we have shown that the WI model is suitable for both
sub- and near-threshold regimes. Therefore, we implemented the simple equations of the WI model to introduce an analytical solution for the optimum supply voltage that minimizes the total energy considering process variation effects. The proposed model allows designers to easily assess energy, speed and variability tradeoffs that arise in the design of sub/near-threshold digital circuits.

Fig. 13. Evolution of typical and 3σ WC delay (right) and the delay variability (σdelay/μdelay) (left) obtained by Spice simulation, complete model and WI model. The top plots correspond to the 32-bit adder and the bottom plots to the 8-bit multiplier.

Fig. 14. Consumed energy under mismatch variations (left), and % of complete and WI model error compared to Spice simulations (right) of a 32-bit CSA (top) and 8-bit multiplier (bottom).

Table 3
Analytical and simulation results for optimal VDD.

Circuit            Analytical (V)   Spice simulation (V)   Error (%)
32-bit adder       0.2574           0.27                   4.66
8-bit multiplier   0.1966           0.20                   1.7

Acknowledgement
The authors would like to acknowledge the financial support of STIC AmSud Projects NanoRadio and RELEMED and CSIC, Universidad de la República.
References
[1] E. Vittoz, Weak inversion for ultimate low-power logic, in: Christian Piguet (Ed.), Low-Power CMOS Circuits, CRC, Boca Raton, 2006.
[2] Y.P. Tsividis, C. McAndrew, Operation and Modeling of the MOS Transistor, Oxford University Press, New York, 2011.
[3] D. Markovic, C. Wang, L. Alarcon, J. Rabaey, Ultra-low-power design in near-threshold region, Proc. IEEE 98 (2) (2010) 237–252.
[4] A. Wang, A. Chandrakasan, A 180-mV subthreshold FFT processor using a minimum energy design methodology, IEEE J. Solid-State Circuits 40 (1) (2004) 310–319.
[5] S. Hanson, B. Zhai, D. Blaauw, D. Sylvester, A. Bryant, X. Wang, Energy optimality and variability in subthreshold design, in: Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006. ISLPED'06, IEEE, 2006, pp. 363–365.
[6] J. Kwong, A. Chandrakasan, Variation-driven device sizing for minimum energy sub-threshold circuits, in: Proceedings of the 2006 International Symposium on Low Power Electronics and Design, ACM, 2006, pp. 8–13.
[7] D. Bol, D. Kamel, D. Flandre, J. Legat, Nanometer MOSFET effects on the minimum-energy point of 45 nm subthreshold logic, in: Proceedings of the 14th ACM/IEEE International Symposium on Low Power Electronics and Design, ACM, 2009, pp. 3–8.
[8] D. Bol, R. Ambroise, D. Flandre, J. Legat, Interests and limitations of technology scaling for subthreshold logic, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 17 (10) (2009) 1508–1519.
[9] M. Slimani, F. Silveira, P. Matherat, Variability-speed-consumption trade-off in near threshold operation, Integr. Circuit Syst. Des. Power Timing Model., Optim. Simul. (2011) 308–316.
[10] K. Bernstein, D.J. Frank, A.E. Gattiker, W. Haensch, B.L. Ji, S.R. Nassif, E.J. Nowak, D.J. Pearson, N.J. Rohrer, High-performance CMOS variability in the 65-nm regime and beyond, IBM J. Res. Dev. 50 (4.5) (2006) 433–449.
[11] J. Kao, M. Miyazaki, A. Chandrakasan, A 175-mV multiply-accumulate unit using an adaptive supply voltage and body bias architecture, IEEE J. Solid-State Circuits 37 (11) (2002) 1545–1554.
[12] P. Jespers, The Gm/ID Methodology, a Sizing Tool for Low-voltage Analog CMOS Circuits: The Semi-Empirical and Compact Model Approaches, Springer Verlag, Dordrecht, 2009.
[13] B. Zhai, S. Hanson, D. Blaauw, D. Sylvester, Analysis and mitigation of variability in subthreshold design, in: Proceedings of the 2005 International Symposium on Low Power Electronics and Design, ACM, 2005, pp. 20–25.
[14] H. Thode, Testing for Normality, 164, CRC, New York, 2002.
[15] N. Beaulieu, A. Abu-Dayya, P. McLane, Comparison of methods of computing lognormal sum distributions and outages for digital wireless applications, in: IEEE International Conference on Communications, 1994. ICC'94, SUPERCOMM/ICC'94, Conference Record, 'Serving Humanity Through Communications.', IEEE, 1994, pp. 1270–1275.
[16] B. Calhoun, A. Chandrakasan, Characterizing and modeling minimum energy operation for subthreshold circuits, in: Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004. ISLPED'04, IEEE, 2005, pp. 90–95.
[17] M. Seok, S. Hanson, D. Sylvester, D. Blaauw, Analysis and optimization of sleep modes in subthreshold circuit design, in: Proceedings of the 44th Annual Design Automation Conference, ACM, 2007, pp. 694–699.