Microelectronics Reliability 52 (2012) 1837–1842
Statistical model of NBTI and reliability simulation for analogue circuits

Z. Lv a,b, L. Milor a,*, S. Yang b
a School of ECE, Georgia Institute of Technology, Atlanta, GA 30332, USA
b Automation Dept., Tsinghua University, Beijing 100084, China
Article history: Received 3 June 2012; Accepted 18 June 2012; Available online 25 July 2012
Abstract

The faults caused by process variation and by degradation differ, which makes reliability difficult to handle. This paper proposes a statistical model of NBTI that captures the variations arising from circuit use conditions, and presents a framework for analogue reliability simulation, with which reliability can be addressed as early as the design phase. A feed-forward equalizer (FFE) was studied. For this circuit, we have identified the limiting performances for reliability, which helps to enable the design of on-line tests for reliability. © 2012 Elsevier Ltd. All rights reserved.
1. Introduction

Process variation poses serious reliability problems for ultra-scaled CMOS integrated circuits [1]. More seriously, due to transistor degradation, integrated circuits become increasingly susceptible to failure during their lifetimes [2]. Immediately after circuit production there is a spread of device parameters due to process variation, shown in Fig. 1a, resulting in a spread of circuit performances and a less than 100% time-zero yield, shown in Fig. 1b. As time goes on, the spread of device parameters changes due to aging. This change includes a shift in location, due to the deterministic physical process of aging, and an increase in the spread, due to the statistical variation in degradation. This change causes the spread of the circuit performances to move, and as a result more samples fail the specifications. To guarantee circuit functionality throughout a circuit's lifetime, designers need to take both process variation and degradation into account during the design phase. The degradation mechanisms include electromigration, gate oxide breakdown, hot carrier injection, negative bias temperature instability, positive bias temperature instability, backend dielectric breakdown, and stress migration [3]. Negative bias temperature instability (NBTI) is one of the important reliability concerns that most drastically impacts circuit performances. This paper focuses on NBTI to provide an example of a methodology to analyze process variation and degradation.

2. Background

Many physical models have been developed at the device level for NBTI, but there is a gap between device design and circuit

* Corresponding author. Tel.: +1 404 894 4793; fax: +1 404 894 4641. E-mail address: [email protected] (L. Milor).
0026-2714/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.microrel.2012.06.039
design [4]. As a result, circuit designers cannot easily estimate the reliability of a newly designed circuit. This problem is more serious for analogue circuits, because they are more sensitive to device mismatch, which makes it harder to estimate the impact of changing device parameters on circuit performances. The aging model for failure mechanisms therefore also needs to capture mismatch in degradation. To close this gap, some work has been done recently. In [5], the authors developed a framework for reliability simulation and applied it to a flash ADC. They considered various failure mechanisms, including NBTI, analyzed the impact on circuit performance, and found the reliability-critical devices and failure mechanisms. However, they used a deterministic physical model to inject degradation and did not take into account the randomness in the degradation process. The authors of [6] took the variation of the process and of the stress voltages into account and used a quasi-static method to calculate the threshold-voltage shift. In [1], the authors also considered process variations and used an integration-based approach to deal with stress voltage variation. They proposed techniques such as Monte Carlo analysis, a screening experimental design, regression, and a response surface model for reliability-aware design of analogue circuits. Considerable work has been done on developing statistical models for NBTI degradation. The authors of [7–10] have carefully considered the intrinsic NBTI variability in trap creation. They use quantitative simulations to estimate the effect of each trapped charge on the threshold voltage change. In these works, the trapped charges are generated and placed randomly in the three-dimensional channel. This model has been used to obtain the distribution of the threshold voltage shift. The resulting distribution has been applied to a four-bit fast-carry adder to analyze its gate propagation delay, static power, and dynamic power in [8], and to an SRAM to determine its noise margin in [9,10]. In [8] the authors also considered line edge roughness (LER) and poly-Si granularity (PSG). Other process variations, such as the initial threshold voltage
Fig. 1. Impact of process variation and degradation: the spread of (a) device parameters and (b) circuit performances.

distribution and the gate oxide thickness, also have an effect on NBTI variability. To model these factors, the authors of [11] applied a stochastic collocation method, which was originally proposed to model gate delay under uncertainty [12]. In [13–15], the authors considered the effect of the gate area and oxide thickness on NBTI variability. They calculate the variance of the threshold voltage shift first and assume a normal distribution. However, in [16,17] the authors found that the distribution of the threshold voltage shift is not normal; analysis and experimental data showed that the Skellam distribution is more suitable, and they also provided a method to calculate the variance. Another limitation is that these models mainly consider two kinds of random sources: process variation and intrinsic aging fluctuation (random trap creation). Extrinsic causes (those that come from use conditions), other than the stress voltage, are not taken into account. For extrinsic causes, the authors of [18] developed a dynamic temperature model. They use integration to obtain the cumulative effect of the temperature history. Although they proposed some computational shortcuts, the calculation is time-consuming. The distribution of the threshold voltage shift, taking extrinsic randomness into account, is more complex. Generally the Weibull distribution models the time-to-failure of a device well. Therefore we combine the Weibull distribution of time-to-failure with the NBTI physical mechanism to obtain the distribution of the threshold voltage shift. To guarantee circuit reliability, the most widely used technique is redundancy [19]. Improving the original circuit design based on reliability simulation results can also improve reliability [4]. Additionally, proper tests, which cover most aging failures, can raise an alarm when the circuit is faulty, prompting maintenance or replacement. In this way, reliability can also be improved. However, this is challenging, since many analogue circuits have a large number of specifications. In our case, at least 54 performances need to be tested (details can be found in Section 5). It is expensive to test all of them, especially with an on-line procedure. Since degradation proceeds in a quasi-deterministic way (the physical model shows that devices degrade deterministically, while variation in use conditions increases the spread), there are dependencies among the degraded performances. By recognizing the limiting performances, we can reduce the number of tests needed to detect degradation. In this work, we use a Weibull distribution to model the randomness in real use conditions, combined with the physical models, to describe stochastic NBTI degradation. We propose a framework for reliability simulation covering both process variation and degradation. We consider a complex circuit, an FFE, and analyze the impact on all of its performances. We have found the limiting performances from a reliability perspective for the FFE. This paper is organized as follows. Section 3 describes the statistical model of NBTI degradation. We then describe the aging simulation procedure in Section 4. The application to the FFE is discussed in Section 5, followed by the conclusion in Section 6.

3. Statistical model of NBTI taking into account use conditions
NBTI is due to the presence of interface traps at the gate oxide interface. There are several models of NBTI degradation, including the reaction–diffusion theory [20] and the hole trapping/de-trapping theory [21]. This work is based on the reaction–diffusion theory. According to the reaction–diffusion theory, NBTI is modeled based on electrochemical reactions. Models involve a forward rate constant (kf) for the rate of dissociation of the Si-H bonds at the interface and a reverse rate constant (kr) for the self-annealing process [20]. The total number of interface traps (NIT), as a function of the forward and reverse rate constants and time, t, is
N_IT = sqrt(k_f N_0 / k_r) (Dt)^n    (1)
where N_0 is the concentration of inversion carriers, D is a constant modeling the diffusion of hydrogen in the oxide, and n ≈ 1/6 [22]. The interface traps result in an increase in charge in the capacitor formed by the channel and the gate, which, in turn, increases the threshold voltage (V_tp). Hence the increase in V_tp due to NBTI aging when the device is under stress follows a power law function:
ΔV_tp = ((m + 1) q / C_ox) sqrt(k_f N_0 / k_r) (Dt)^n    (2)
where Cox is the gate oxide capacitance per unit area, q is the electron charge, and m is a correction factor to accommodate the decrease in mobility. When the stress is removed, a recovery phase is initiated and the number of interface traps decreases according to
N_IT = N_IT-init (1 − f_1 √t) / (1 + f_2 √t)    (3)
for t > 0, where f_1 and f_2 are constants which depend on the oxide thickness and the back-diffusion rate of hydrogen [20]. The threshold voltage shifts accordingly. This model does not account for bias, temperature, and gate area. Hence, Eq. (2) must be supplemented with temperature and bias [22] and gate area, A_GOX, [15,23], i.e.,

ΔV_tp = A A_GOX^(−1/2) exp(γ V_gs) exp(−E_0/kT) t^n    (4)
where A and γ are process constants and E_0 is the activation energy of the NBTI process; γ is approximately 0.75 [24]. In this analysis
we assume a constant temperature. Therefore, we set C = A exp(−E_0/kT). Consequently,

ΔV_tp(t) = C A_GOX^(−1/2) exp(γ V_gs) t^n    (5)
Let us assume that the PMOS lifetime is defined at the process level as a 10% degradation of the threshold voltage of a minimum-sized device with area A_GOX-MIN when subjected to a stress of the supply voltage, i.e., V_gs = V_dd. Then,

ΔV_tp(t_fail) = 0.1 V_tp-nom = C A_GOX-MIN^(−1/2) exp(γ V_dd) t_fail^n    (6)
where V_tp-nom is the nominal PMOS threshold voltage. Let A_ratio = A_GOX/A_GOX-MIN. Then, replacing C in (5) using the t_fail of minimum-sized devices, we have

ΔV_tp = 0.1 V_tp-nom A_ratio^(−1/2) exp(γ (V_gs − V_dd)) (t/t_fail)^n    (7)
For circuits undergoing identical stress, tfail for minimum-sized devices with supply voltage stress is not a constant, but a distribution. We model this with a two-parameter Weibull distribution. The time-to-failure, tfail, relates to the probability, p, as
p(t_fail) = 1 − exp[−(t_fail/η)^β]    (8)
where η is the characteristic lifetime and β is the shape parameter. Therefore,
t_fail = η (−ln(1 − p))^(1/β)    (9)
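As a concrete illustration, Eq. (9) is an inverse-CDF sampler for the Weibull distribution of Eq. (8). In the sketch below, η = 10 years and β = 2 are illustrative placeholders, not fitted process data.

```python
import math
import random

def sample_tfail(eta, beta, p):
    """Invert the Weibull CDF of Eq. (8), giving Eq. (9):
    t_fail = eta * (-ln(1 - p))^(1/beta)."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)

rng = random.Random(42)
# eta = 10 years, beta = 2 are illustrative values only.
samples = [sample_tfail(10.0, 2.0, rng.random()) for _ in range(20000)]
mean = sum(samples) / len(samples)
# The Weibull mean is eta * Gamma(1 + 1/beta); for beta = 2 this is
# eta * sqrt(pi) / 2, about 8.86 years here.
```

Each uniform draw p corresponds to one device's time-to-failure, so repeating the draw reproduces the spread of lifetimes across devices under identical stress.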
As we vary p within [0,1], we generate the distribution of tfail for devices undergoing identical stress. Combining (7) and (9), we have that
Fig. 3. Mean value of DVtp vs. the standard deviation of DVtp.
ΔV_tp(t) = 0.1 V_tp-nom A_ratio^(−1/2) exp(γ (V_gs − V_dd)) [t / (η (−ln(1 − p))^(1/β))]^n    (10)
The trajectory of the threshold voltage of each device in a circuit is a function of the applied stress, the gate oxide area, the statistics of failure for the minimum-sized device (modeled with η and β), and the random probability point, p, drawn from a uniform distribution on [0, 1]. Note that we do not assume that a circuit fails based on the 10% shift assumed for process data. Instead, we look at the failure of the circuit specifications. Hence, a larger or smaller shift can constitute failure, depending on the sensitivity of the circuit design to shifts in threshold voltage. Fig. 2 shows the threshold shifts generated using our model. The curves are fit to the experimental data in Fig. 9 of [25]. The three fitted curves in each plot are for data measured at different temperatures. Compared to the measured data, the mean values of the threshold voltages agree well, as shown in Fig. 2a, because we have taken the physical mechanism into account. Additionally, there are variations at each age. The standard deviation of the threshold shifts versus time is shown in Fig. 2b; it increases with time. Prior work has shown the relationship between the average threshold voltage shift and the sigma of the threshold voltage shift in Fig. 6 of [15]. We also plot this relationship in Fig. 3. Compared to Fig. 6 of [15], the standard deviation in our work is larger because we have considered randomness from use conditions, while prior work [15] only considered the intrinsic randomness in trap creation.
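A minimal sketch of generating threshold-shift samples with Eq. (10) follows. The nominal threshold voltage (0.45 V), η, β, and the bias values are illustrative assumptions, not the paper's fitted parameters; γ = 0.75 and n = 1/6 follow the text.

```python
import math
import random

def delta_vtp(t, vgs, vdd, a_ratio, eta, beta, p,
              vtp_nom=0.45, gamma=0.75, n=1.0 / 6.0):
    """Threshold-voltage shift of Eq. (10). vtp_nom, eta and beta are
    placeholder values; gamma = 0.75 and n = 1/6 follow the text."""
    t_fail = eta * (-math.log(1.0 - p)) ** (1.0 / beta)  # Eq. (9)
    return (0.1 * vtp_nom / math.sqrt(a_ratio)
            * math.exp(gamma * (vgs - vdd))
            * (t / t_fail) ** n)

rng = random.Random(1)
# Minimum-sized device stressed at Vgs = Vdd = 1.8 V: each Monte Carlo
# sample draws its own probability point p.
shifts_1y = [delta_vtp(1.0, 1.8, 1.8, 1.0, 10.0, 2.0, rng.random())
             for _ in range(1000)]
# For a fixed p the shift grows as t^n, so later ages shift more.
```

Note the sanity check built into the model: at t = t_fail with V_gs = V_dd and A_ratio = 1, the shift equals 0.1 V_tp-nom, i.e., the 10% process-level lifetime criterion.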
4. Architecture of aging simulation
Fig. 2. (a) Mean value of DVtp. (b) Standard deviation of DVtp.
The degradation model is a statistical one. We account for the randomness in the degradation process, which results in an increase in the standard deviation of the threshold voltages of all of the devices in the circuit as a function of time. Therefore, we use Monte Carlo simulation to analyze circuit reliability. The architecture for reliability simulation of aging analogue circuits in this paper is shown in Fig. 4. The device aging behavioral model captures both process variation and degradation. Process parameter variation from manufacturing is modeled with a normal distribution. It includes variations of channel length, channel width, threshold voltage, gate-oxide thickness, and carrier mobility, as well as the mismatches in channel length, channel width, threshold voltage, and gate-oxide thickness. These random parameters are assigned to each device in the circuit in each iteration of the Monte Carlo analysis.
Fig. 4. Architecture of reliability simulation for both process variation and degradation.
The parameter degradations are generated using our model derived in Section 3, which captures the variations from use conditions. The degradation of each parameter is a function of both the use conditions and the intrinsic random variation in aging, modeled with (10). Using a circuit simulator, such as Spice, we obtain the performance of an aged circuit. Multiple independent runs via Monte Carlo analysis enable the determination of the most frequently degraded performances, called the limiting performances. The simulation results also capture the dependencies among performances as they degrade. That is, a sample that fails one specification may also fail another specification at the same time. If both specifications degrade together, measuring only one of them is enough to identify the bad circuits. Based on these results, we can select the proper tests for reliability. The following case study demonstrates this capability.
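The simulation loop of Fig. 4 can be sketched as follows. The `simulate` callback stands in for the Spice run, and `toy_simulate`, its parameter names, and the specification limits are all hypothetical placeholders used only to show the structure of the flow.

```python
import random

def monte_carlo_reliability(n_runs, age, specs, simulate, rng=None):
    """Sketch of the Fig. 4 flow: each run draws process parameters and
    per-device aging randomness, simulates the aged circuit, and records
    which specifications fail. `specs` maps a performance name to a
    pass/fail predicate; `simulate` is a placeholder for the circuit
    simulator, not the paper's actual tool chain."""
    rng = rng or random.Random(0)
    fail_counts = {name: 0 for name in specs}
    failed_samples = []
    for _ in range(n_runs):
        process = {"dvth_mm": rng.gauss(0.0, 0.01)}  # process variation + mismatch
        aging_p = rng.random()                       # probability point p of Eq. (10)
        perf = simulate(process, aging_p, age)       # aged-circuit performances
        failed = {name for name, ok in specs.items() if not ok(perf[name])}
        for name in failed:
            fail_counts[name] += 1
        failed_samples.append(failed)
    # The most frequently failing spec is the limiting performance.
    limiting = max(fail_counts, key=fail_counts.get)
    return fail_counts, failed_samples, limiting

# Toy stand-in for the circuit simulator: bandwidth droops with age.
def toy_simulate(process, aging_p, age):
    bw = 6.0 - 0.8 * age * aging_p + process["dvth_mm"] * 10
    gain = 0.25 + process["dvth_mm"]
    return {"bandwidth_GHz": bw, "gain": gain}

specs = {"bandwidth_GHz": lambda v: v >= 5.0, "gain": lambda v: v >= 0.2}
counts, failures, limiting = monte_carlo_reliability(1000, 2.0, specs, toy_simulate)
```

The per-sample sets of failed specifications (`failures`) are exactly the data the test selection step in Section 5 consumes.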
5. Application

Our methodology has been applied to a feed-forward equalizer (FFE) [26], which has a delay line of nine delay cells and nine variable-gain amplifiers (VGAs). The block diagram is shown in Fig. 5. Each delay cell is composed of six analogue delay units and creates a delay of about 60 ps. The output of each delay cell is amplified by a VGA and then summed at the output node. The mathematical model for this circuit is a finite-impulse-response (FIR) filter. The delay cell specification is 50–70 ps. Each VGA has a control voltage varying from 0 V to 1.2 V to obtain a gain ranging from −0.25 to 0.25. There are nine control voltages in the FFE, so the settings produce a nine-dimensional state space. We cannot enumerate all of them, and therefore consider only the corners of the voltage settings. But there are still 18 settings. The three main specifications of the FFE are gain, delay, and bandwidth, so the total number of specifications is at least 18 × 3 = 54. The circuit has 305 MOSFETs, among which 63 are PMOS devices that suffer from NBTI degradation. It is implemented in 0.18 μm CMOS. The gate-oxide thickness is 4 nm. The supply voltage is 1.8 V. Eighteen PMOS devices work as amplifiers with a nominal stress of about 0.67 V, 27 work as active loads with a nominal stress of 0.56 V, and the remaining PMOS devices are used for bias circuits with a nominal stress of 1.34 V. The stress voltages change with the input signal. We ran simulations for one VGA and for the delay line. The results are shown in the Weibull plots of Fig. 6. From Fig. 6a, we can see that the bandwidth dominates the degradation of the VGA. From Fig. 6b we find that the bandwidth at tap one dominates the degradation of the delay line. For the VGA, to obtain Fig. 6a, 50 ms is needed for one AC simulation to get the gain and bandwidth. We ran 1000 iterations at six time points: [0.5, 1, 2, 4, 8, 16] years. So the estimated total simulation time was 50 ms × 1000 × 6 = 300 s; the actual simulation time was 330 s. For the FFE, 0.6 s was needed for one AC simulation to get the bandwidth of one tap at one control-voltage setting. So the estimated total time was 0.6 s × 18 (settings) × 1000 (iterations) × 6 (time points) = 18 h; the actual simulation time to obtain Fig. 6b was about 19 h. The FFE circuit is large and has eighteen different settings, so the simulation time greatly increased compared to the VGA. The system degradation is also dominated by bandwidth. Fig. 7 shows the distribution of the bandwidth of the second tap of the FFE (a) at time-zero and (b) after 1 year of aging. We can see that the mean value of the bandwidth changed, which corresponds to the shift in threshold voltages discussed in Section 1. And the
Fig. 5. Structure of the feed-forward equalizer, VGA = variable gain amplifier, D = delay element.
Fig. 8. Faulty samples of the bandwidth at the second tap of the FFE due to 1 year NBTI degradation.
Fig. 6. Simulation results of the FFE: (a) Weibull plot for one VGA, and (b) Weibull plot for bandwidths of each tap of the delay line.

Do the Monte Carlo simulation to get the degraded performances at the given age.
Let T = ∅, f_T = 0.
Specify f_T* (the required fault coverage).
While f_T < f_T* {
  Find the samples (S_F) uncovered by T and their failed performances.
  Select the test j* that covers the most failed samples of S_F.
  Add j* to T; recalculate f_T.
}
T is the resulting test set; f_T is the fault coverage of T.
Fig. 9. The test selection algorithm.
Fig. 7. The spread of bandwidth of the second tap of the FFE (a) due to process variation at time-zero, and (b) after 1 year of NBTI degradation.
variance also became larger, because aging increases the spread of device parameters. In Fig. 7b, there is a second peak that is not present in Fig. 7a. Fig. 8 shows examples of the faulty circuits associated with the second peak in Fig. 7b. Fig. 8 contains the Bode plots at the second tap of the FFE. For a good sample, the solid curve in Fig. 8, the Bode plot reaches the cut-off frequency beyond 6 GHz. But for a bad sample due to NBTI, the dashed curve, the bandwidth does not degrade gradually, but drops sharply at about 2.5 GHz. These bad samples constitute the second peak in Fig. 7b. Using our model, we clearly captured this distinctive degradation feature of circuit performance. We also developed a greedy algorithm to select on-line tests for reliability failures. For the FFE, there are 54 major performances, as discussed before. When one or more of them fail their specifications, the FFE is considered faulty. The performance degradations are not independent of each other: when one performance fails its specification, another performance may fail at the same time. In this case, by testing one of them, we can identify the bad circuits, so not all 54 performances need to be tested to make sure the FFE is fault-free. Our testing scheme aims to detect most wearout faults using as few tests as possible. The reliability tests measure the performances of the FFE output directly, so there are 54 tests for 54 performances. We ran a Monte Carlo simulation with 1000 iterations. Some of the samples in the simulation may fail one performance specification; for example, a sample may fail the bandwidth requirement of the first tap, so the test for the bandwidth of the first tap will identify this bad sample. The test for the most vulnerable performance is the most valuable. Our algorithm therefore selects the most valuable test first, and then repeatedly selects the most valuable test for the still-uncovered faulty samples until the fault coverage reaches the requirement. The algorithm is shown in Fig. 9, where T denotes the set of tests and f_T is the fault coverage.
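The Fig. 9 procedure is a greedy set cover. A sketch under hypothetical sample data follows; the performance names `bw1`, `gain2`, and `delay3` are invented for illustration.

```python
def select_tests(failed_samples, all_tests, target_coverage):
    """Greedy version of the Fig. 9 algorithm. failed_samples is a list
    of per-sample sets of failed performances from the Monte Carlo run;
    a test covers a faulty sample if it measures one of that sample's
    failed performances."""
    faulty = [s for s in failed_samples if s]  # samples that fail something
    if not faulty:
        return [], 1.0
    selected, covered, coverage = [], set(), 0.0
    while coverage < target_coverage and len(selected) < len(all_tests):
        # Pick the test covering the most still-uncovered faulty samples.
        best = max((t for t in all_tests if t not in selected),
                   key=lambda t: sum(1 for i, s in enumerate(faulty)
                                     if i not in covered and t in s))
        selected.append(best)
        covered |= {i for i, s in enumerate(faulty) if best in s}
        coverage = len(covered) / len(faulty)
    return selected, coverage

# Hypothetical data: four faulty samples and one fault-free sample.
samples = [{"bw1"}, {"bw1", "gain2"}, {"gain2"}, {"delay3"}, set()]
tests, cov = select_tests(samples, ["bw1", "gain2", "delay3"], 0.7)
# Two tests already cover 3 of the 4 faulty samples (75% coverage).
```

Because the dependencies among degraded performances make one test cover many faulty samples, the selected set is typically much smaller than the full list of 54 tests, as Table 1 shows.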
Table 1
Test selection results.

Age (years) | Number of tests for 80% fault coverage | Number of tests for 90% fault coverage
0.5         | 9                                      | 14
1           | 11                                     | 22
2           | 12                                     | 23
4           | 14                                     | 23
8           | 11                                     | 26
16          | 13                                     | 18
Table 1 shows the test selection results at different ages. When the age is 0.5 years, nine tests cover 80% of the failures of all 54 specifications, while fourteen tests give 90% coverage. Within 16 years, at most fourteen tests are needed for 80% fault coverage, and 26 tests for 90% fault coverage. Generally, the number of tests needed increases with time, and the tests themselves also change, so the reliability testing scheme needs to be updated over time. The framework in this paper provides this information for testing scheme design.

6. Conclusion

A statistical model of NBTI, which captures the variations from circuit use conditions, and a framework for analogue reliability simulation have been proposed. Based on this model and framework, we can capture distinctive degradation features of analogue circuits, find limiting performances, and select tests for reliability.

References
[1] Maricau E et al. Efficient variability-aware NBTI and hot carrier circuit reliability analysis. IEEE Trans Comput-Aided Des Integr Circ Syst 2010;29(12):1884–93.
[2] Maricau E et al. Stochastic circuit reliability analysis. 11 March 2011:1–6.
[3] Strong AW et al. Reliability wear-out mechanisms in advanced CMOS technologies. New Jersey (USA): John Wiley & Sons; 2009.
[4] Gielen G et al. Analogue circuit reliability in sub-32 nanometer CMOS: analysis and mitigation. 2011(March):1–6.
[5] Yan B et al. Reliability simulation and circuit-failure analysis in analogue and mixed-signal applications. IEEE Trans Dev Mater Reliab 2009;9(3):339–47.
[6] Latif MAA, Hussin, et al. NBTI-induced 8-bit DAC circuit mismatch in System-on-Chip (SoC). In: ASQED; 2011. p. 29–36.
[7] Brown AR et al. Statistical simulation of progressive NBTI degradation in a 45 nm technology pMOSFET. IEEE Trans Electron Dev 2010;57(9):2320–3.
[8] Tang TB et al. Statistical NBTI-effect prediction for ULSI circuits. In: ISCAS; May 2010. p. 2494–7.
[9] Cheng B et al. Impact of NBTI/PBTI on SRAM stability degradation. IEEE Electron Dev Lett 2011;32(6):740–2.
[10] Asenov A et al. Statistical aspects of NBTI/PBTI and impact on SRAM yield. 2011:14–8.
[11] Lu Y et al. Statistical reliability analysis under process variation and aging effects. In: DAC'09; July 2009. p. 26–31.
[12] Kumar SY et al. A probabilistic collocation method based statistical gate delay model considering process variations and multiple input switching. 2005:7–11.
[13] Vaidyanathan B et al. Intrinsic NBTI-variability aware statistical pipeline performance assessment and tuning. In: ICCAD; 2009. p. 2–5.
[14] Kang K et al. Estimation of statistical variation in temporal NBTI degradation and its impact on lifetime circuit performance. In: ICCAD; 2007. p. 4–8.
[15] Pae S et al. Effect of BTI degradation on transistor variability in advanced semiconductor technologies. IEEE Trans Dev Mater Reliab 2008;8(3):519–25.
[16] Huard V et al. NBTI degradation: from transistor to SRAM arrays. In: IRPS; 2008. p. 289–300.
[17] Rauch SE et al. Review and reexamination of reliability effects related to NBTI-induced statistical variations. IEEE Trans Dev Mater Reliab 2007;7(4):524–30.
[18] Zhang B et al. Modeling of NBTI-induced PMOS degradation under arbitrary dynamic temperature variation. In: ISQED; 2008. p. 17–19.
[19] Siemaszko D et al. Impact of modularity and redundancy in optimising the reliability of power systems that include a large number of power converters. Microelectron Reliab 2011;51:1484–8.
[20] Alam M et al. A comprehensive model of PMOS NBTI degradation. Microelectron Reliab 2005;45:71–81.
[21] Grasser T et al. The paradigm shift in understanding the bias temperature instability: from reaction/diffusion to switching oxide traps. IEEE Trans Electron Dev 2011;58(11):3652–66.
[22] Kumar SV et al. A finite-oxide-thickness-based analytical model for negative bias temperature instability. IEEE Trans Dev Mater Reliab 2009;9(4):537–56.
[23] Rauch SE. The statistics of NBTI-induced VT and β mismatch shifts in pMOSFETs. IEEE Trans Dev Mater Reliab 2002;2(4):89–93.
[24] Jha NK et al. NBTI degradation and its impact for analogue circuit reliability. IEEE Trans Electron Dev 2005;52(12):2609–15.
[25] Alam M et al. A comprehensive model of PMOS NBTI degradation: recent progress. Microelectron Reliab 2007;47:853–62.
[26] Kim HS et al. Performance analysis of balanced and unbalanced feed-forward equalizer structures for multi-gigabit applications in 0.18 μm CMOS process. In: EuMIC; 2008. p. 143–6.