Microelectronics Reliability 47 (2007) 1336–1342 www.elsevier.com/locate/microrel
An approach to statistical analysis of gate oxide breakdown mechanisms
Cher Ming Tan *, Nagarajan Raghavan
School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore 639798, Singapore
Received 4 July 2007; available online 20 August 2007
* Corresponding author. Tel.: +65 6790 4567; fax: +65 6792 0415. E-mail address: [email protected] (C.M. Tan).
Abstract

A credible statistical algorithm is required to determine the parameters of the bimodal Weibull mixture distribution exhibited by the gate oxide breakdown phenomenon, which consists of extrinsic early-life defect-induced failures and an intrinsic wear-out failure mechanism. We use a global maximization algorithm called simulated annealing (SA) in conjunction with the expectation–maximization (EM) algorithm to maximize the log-likelihood function of multi-censored gate oxide failure data. The results show that the proposed statistical algorithm provides a good fit to the stochastic nature of the test failure data. The Akaike information criterion (AIC) is used to verify the number of failure mechanisms in the given set of data, and Bayes' posterior probability theory is utilized to determine the probability of each failure data point belonging to the different failure mechanisms.
© 2007 Elsevier Ltd. All rights reserved.
1. Introduction

Gate oxide failure study is increasingly important as we downscale devices further and advance towards the 45 and 32 nm technologies with gate oxide thicknesses (tox) of 1 nm and below. The physics that governs gate oxide failure has been well established, and it is found that gate oxide failures are generally caused by two failure mechanisms [1]. One of them is the extrinsic early-life breakdown due to large defects induced by low-quality processing. The other is the intrinsic wear-out failure mechanism, where percolation paths are gradually formed which eventually short the gate to the substrate. However, as the gate oxide thickness undergoes further downscaling, resulting in an increased electric field, it may be possible for new failure mechanisms to occur. In such cases, the number of components in the multi-modal distribution describing the time-dependent dielectric breakdown (TDDB) data may be greater than the two failure mechanisms currently observed, since every failure mechanism has its own characteristic distribution and associated parameters. The Akaike information criterion (AIC) [2,3]
will be used in this study to determine the number of failure mechanisms in a given set of TDDB data. The presence of two or more failure mechanisms implies that the TDDB data consist of a mixture of statistical distributions. In order to analyze such mixture distributions, the expectation–maximization (EM) algorithm [4,5] has conventionally been used. However, since the EM algorithm performs only local optimization, it is necessary to use other algorithms in conjunction with it so that the globally optimal values of the distribution parameters for the various failure mechanisms are obtained. In this study, we make use of a global optimization algorithm known as simulated annealing (SA) [6,7], in conjunction with the EM algorithm, for this purpose. Most reliability tests are time-terminated, where the accelerated tests on a sample of test units are stopped after some time t = τ. When analyzing such time-terminated test data, it is necessary to consider the unfailed test units in addition to the failed test units. In this case, the unfailed test units are interpreted as censored data at time t = τ. Additionally, some test units may be purposely withdrawn during the TDDB test for other reasons even before they fail. The time of withdrawal of these units from the test must also be treated as censored data. It is therefore
necessary to have a statistical algorithm that is capable of analyzing mixture distribution data which is multi-censored in nature. The EM and SA algorithms are in fact capable of such a data analysis. It is essential to consider censored data during failure data analysis, because ignoring censored data can lead to a large underestimation of the device reliability, especially when extrapolated to normal use conditions. Considering the time and expense incurred in extensive failure analysis (FA), it is also valuable that the proposed statistical algorithm can determine the probability of each failure data point belonging to the different failure mechanisms using Bayes' posterior probability theory [5]. This technique enables us to selectively identify a small number of failed units from the entire test sample on which FA needs to be performed. The failure analysis on these selected units is sufficient to uncover the physical nature of breakdown for all the possible failure mechanisms in the given test. The novelty of this work lies in using the EM and SA algorithms together, for the first time, for analyzing multi-censored mixture distribution TDDB data. Furthermore, the Akaike information criterion (AIC) is used to determine the number of failure mechanisms in a given set of failure data, and Bayes' posterior probability theory is applied to determine the probability of each failed unit belonging to the different failure mechanisms. The main motivation behind this work is to illustrate the applicability and usefulness of statistics in the semiconductor industry for precise data analysis. Since the formation of percolation breakdown paths in the TDDB process is catastrophic in nature, the Weibull distribution is the best description of the statistical nature of gate oxide failures [8]. Therefore, in this work, we apply the mixture distribution theory to Weibull statistics.

2. Experimental details

TDDB time-terminated tests were conducted on 51 MOS capacitors with an oxide thickness tox = 11 nm, with the accelerated E-field stress set to 10.4 MV/cm. A total of 44 failures were observed; the remaining 7 devices correspond to censored data removed from the test for various reasons. The set of failure data obtained is tabulated in Table 1.
Note that the test was time-terminated at τ = 207.5 s, when 3 out of the 51 units had not yet failed. The censored times for the 7 devices are 0.15, 2.5, 19.03, 120.21, 207.5, 207.5 and 207.5 s.

3. Akaike information criterion (AIC)

Although past research reveals that TDDB failures always exhibit two failure mechanisms, it may be possible for other new failure mechanisms to exist in today's 65 nm and below technologies. Therefore, a robust statistical technique is needed to determine the possible number of failure mechanisms in a given set of data. For this purpose, we make use of the Akaike information criterion (AIC) [2,3], which was derived based on the χ²-statistic. The AIC is a measure of the goodness of fit of a proposed statistical mixture model and is used to determine the optimal number of mixture components that fit a given set of failure data. The expression for the AIC is given in (1), where L is the maximum likelihood estimate (MLE) of the fitted mixture model and m is the number of degrees of freedom (d.o.f.) in a k-failure-mode mixture distribution; the relationship between m and k is m = 3k − 1. The term 2m in (1) is a penalty factor which prevents redundant mixture components from being added merely to fit the data more closely. The number of components k for which the AIC is lowest is taken as the best estimate of the number of failure mechanisms in the test data.

AIC = −2 log[L(k)] + 2m    (1)
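As an illustration of the selection rule in (1), the short Python sketch below computes AIC = −2 log[L(k)] + 2m with m = 3k − 1 for the maximized log-likelihood values reported later in Table 2 and picks the k with the smallest AIC. The helper name aic is ours, not the authors'.

```python
def aic(log_likelihood: float, k: int) -> float:
    m = 3 * k - 1                        # d.o.f. of a k-mode Weibull mixture
    return -2.0 * log_likelihood + 2 * m

# maximized log-likelihood values from Table 2 for k = 1..4 components
log_L = {1: -906.73, 2: 60.54, 3: 63.43, 4: 63.89}

aic_values = {k: aic(ll, k) for k, ll in log_L.items()}
best_k = min(aic_values, key=aic_values.get)   # k = 2 gives the lowest AIC
print(aic_values, best_k)
```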
Table 2 lists the MLE values for different assumed numbers of mixture components for the failure data set of Table 1, obtained using the expectation–maximization (EM) algorithm. From Table 2, it can be seen that a two-component mixture model (k = 2) best fits the TDDB test failure data of Table 1, since it has the lowest AIC value. The optimal number of components that fit a given set of failure data is taken to indicate the number of failure mechanisms in that data set. Therefore, since k = 2, there are two failure mechanisms in the data set of Table 1.
Table 1
TDDB failure data (s) for the accelerated test at an E-field stress of 10.4 MV/cm

5.847 × 10^−10   4.628 × 10^−7   2.246 × 10^−4   0.142    19.205   145.08   157.17   168.56   173.75   182.26   186.69
5.543 × 10^−9    5.631 × 10^−7   8.172 × 10^−4   3.217    21.347   145.98   159.13   169.78   174.30   186.43   187.29
2.999 × 10^−8    5.301 × 10^−5   2.09 × 10^−3    6.515    72.218   146.67   159.75   171.77   176.38   186.44   206.07
5.138 × 10^−8    8.097 × 10^−5   0.112           8.599    131.85   154.37   164.04   172.89   180.43   186.69   207.5
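For readers who wish to reproduce the analysis, the failure and censoring times of Table 1 and Section 2 can be collected into arrays as in the short Python sketch below; the variable names are ours and are reused in the later sketches.

```python
import numpy as np

# 44 failure times (s) from Table 1 (E-field stress of 10.4 MV/cm)
failure_times = np.array([
    5.847e-10, 5.543e-9, 2.999e-8, 5.138e-8,
    4.628e-7, 5.631e-7, 5.301e-5, 8.097e-5,
    2.246e-4, 8.172e-4, 2.09e-3, 0.112,
    0.142, 3.217, 6.515, 8.599,
    19.205, 21.347, 72.218, 131.85,
    145.08, 145.98, 146.67, 154.37,
    157.17, 159.13, 159.75, 164.04,
    168.56, 169.78, 171.77, 172.89,
    173.75, 174.30, 176.38, 180.43,
    182.26, 186.43, 186.44, 186.69,
    186.69, 187.29, 206.07, 207.5,
])

# 7 censoring times (s) quoted in Section 2
censor_times = np.array([0.15, 2.5, 19.03, 120.21, 207.5, 207.5, 207.5])
```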
Table 2
AIC values for different numbers of mixture components (k)

k    m    MLE (log-likelihood)   Penalty (2m)   AIC
1    2    −906.73                +4             1817.47
2    5    60.54                  +10            −111.08
3    8    63.43                  +16            −110.86
4    11   63.89                  +22            −105.79

4. Simulated annealing (SA)

4.1. Theory of simulated annealing

The simulated annealing (SA) algorithm, as its name suggests, is a technique adopted from the thermodynamic
processes involved in the physical annealing of atoms in a solid, where the energy of the atoms eventually settles down to its global minimum equilibrium value, irrespective of the initial energy, provided that the annealing rate is slow [7]. Applying SA to a mathematical function F(x), where x is a vector whose elements are the independent variables of F, a step-by-step procedure to determine the global minimum of F(x) is developed as follows:

(a) Define the range of values [xmin, xmax] for every element of the input vector x.
(b) Perform a set of simulations by a randomized search over the entire input parameter subspace to determine roughly the range of the objective function, |Fmax − Fmin|. Set the "initial temperature" of this mathematical optimization system to T0 = |Fmax − Fmin|.
(c) For this initial temperature cycle T = T0, and starting from the initial state function value F0 corresponding to an input vector x0, perform a randomized search of the input parameter subspace and move to a random point x1 with a corresponding state function value F1. Compare F1 with F0 and do the following:

    If F1 < F0 → transit from state S0 → S1.
    Else if F1 > F0 and e^((F0 − F1)/T0) > U[0, 1] → transit from state S0 → S1.
    Else → remain in state S0 and repeat the steps for another x1.

In the statements above, U[0, 1] represents a random number drawn from a uniform distribution with lower and upper limits of 0 and 1 respectively, and the exponential expression resembles the Boltzmann probability distribution. The exponential condition in the else-if statement above is called the Metropolis acceptance criterion [6,7]. The conditional logic above is executed repeatedly as lower values of F are detected and state transitions F0 → F1 → F2 → F3 → ⋯ → FJ occur. This process of successive state transitions resembles a Markov process, and it is performed until a quasi-static equilibrium state is reached, wherein no more function transitions occur because a global minimum equilibrium state has been attained for the current temperature cycle T = T0. Generally, around 1000–2000 random simulations are required to achieve a quasi-static equilibrium condition.
(d) After a quasi-static equilibrium state is reached for the initial cycle T = T0 with a final objective function value FJ, the temperature is reduced according to an annealing schedule given by T_N = λ^N · T0, N = 0, 1, 2, …, with 0 < λ < 1. The symbol λ is called the temperature reduction coefficient and its value is generally taken to be around 0.70–0.95. The larger the λ value, the slower the annealing (cooling) rate, the more accurate the final solution, the longer the computational time and the larger the system memory requirements. After reducing the temperature, step (c) is performed again, starting from the previous function state FJ and undergoing further state transitions FJ → FJ+1 → ⋯ → FK, etc., based on the Metropolis acceptance criterion. This process of temperature reductions and state transitions is performed iteratively.
(e) For every temperature TN, the ratio of successful state transitions to the total number of simulation runs (space searches) is determined. This ratio is termed the acceptance ratio (AR). The value of AR for the initial high temperature (T0) can be quite high, around 90–95%, since the Metropolis criterion is satisfied in most cases at high temperatures. As the temperature is successively reduced, the objective function moves closer to the global minimum value and the proportion of further successful transitions reduces, causing the AR value to decrease sharply.
(f) The stopping criterion for the SA algorithm is defined as follows: when the AR value decreases to below 0.01%, the algorithm is terminated. An AR value of 0.01% indicates that the function value has almost reached the global minimum.

After all these steps are executed, it can be confidently concluded that the SA algorithm terminates with the function value very close to the global minimum. The corresponding set of input parameter values which minimized the objective function can then be easily determined from the simulation results.
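For concreteness, a minimal Python sketch of the procedure in steps (a)–(f) is given below. It is our own illustration rather than the authors' implementation; the cooling coefficient (0.9), chain length (1500) and stopping threshold (0.01%) are example values within the ranges quoted above. Note that, as described in step (c), each candidate point is drawn uniformly from the whole parameter box rather than by a local perturbation.

```python
import math
import random

def simulated_annealing(F, x_min, x_max, lam=0.9, chain_len=1500, ar_stop=1e-4):
    """Minimize F(x) over the box [x_min, x_max] following steps (a)-(f)."""
    def rand_point():
        return [random.uniform(lo, hi) for lo, hi in zip(x_min, x_max)]

    # (b) probe the objective to set the initial temperature T0 = |Fmax - Fmin|
    probe = [F(rand_point()) for _ in range(200)]
    T = max(probe) - min(probe)

    x = rand_point()
    fx = F(x)
    best_x, best_f = x, fx

    while True:
        accepted = 0
        # (c) Markov chain of random space searches at the current temperature
        for _ in range(chain_len):
            x_new = rand_point()
            f_new = F(x_new)
            # Metropolis acceptance criterion
            if f_new < fx or math.exp((fx - f_new) / T) > random.random():
                x, fx = x_new, f_new
                accepted += 1
                if fx < best_f:
                    best_x, best_f = x, fx
        ar = accepted / chain_len        # (e) acceptance ratio of this cycle
        if ar < ar_stop:                 # (f) stop once AR < 0.01%
            return best_x, best_f
        T *= lam                         # (d) annealing schedule T_N = lam^N * T0
```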
4.2. Application to TDDB analysis

The probability density function (p.d.f.) of a bimodal Weibull mixture distribution is expressed in (2) below, where ηi and βi are the Weibull scale and shape parameters of the ith component distribution respectively, pi is the mixing weight of each distribution, f(t) is the mixture p.d.f. and ψ is the set of all component distribution parameters (ηi, βi, pi).

f(t; ψ) = Σ_{i=1}^{2} pi · fi(t | ηi, βi),   Σ_i pi = 1    (2)
An accurate estimate of the parameters in (2) can be obtained by global maximization of the log-likelihood function (L) expressed in (3) below, where n is the number of failure data, r is the number of censored data, (n + r) is the test sample size, tf is the "failure" data, tc is the "censored" data and R refers to the reliability at the time of
censoring. It is well known that the global maximum of (L) corresponds to the best-fit parameters of the mixture model [5]. As described in Section 4.1, a very efficient technique for this global optimization is simulated annealing [6,7]. Since the SA algorithm determines a global minimum, we find the global minimum of (−L) instead of the global maximum of (L). The distribution parameters (ηi, βi, pi) which optimize the log-likelihood function are the best-fit optimal parameters for the mixture distribution.

L(t; ψ) = ln[ n! / (n − r)! ] + Σ_f ln[f(tf | ψ)] + Σ_c ln[R(tc | ψ)]    (3)
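To make (2) and (3) concrete, a minimal Python sketch of the objective is shown below; the function and variable names (weibull_pdf, neg_log_likelihood, failure_times, censor_times, simulated_annealing) are ours, referring back to the earlier sketches. The parameter-independent combinatorial term in (3) is dropped because it does not change the location of the optimum, and the negated log-likelihood is what would be handed to a minimizer.

```python
import numpy as np

def weibull_pdf(t, eta, beta):
    return (beta / eta) * (t / eta) ** (beta - 1.0) * np.exp(-(t / eta) ** beta)

def weibull_rel(t, eta, beta):
    return np.exp(-(t / eta) ** beta)

def mixture_pdf(t, params):
    """Eq. (2): f(t) = p1*f1 + p2*f2, with p2 = 1 - p1."""
    eta1, beta1, eta2, beta2, p1 = params
    return p1 * weibull_pdf(t, eta1, beta1) + (1 - p1) * weibull_pdf(t, eta2, beta2)

def mixture_rel(t, params):
    eta1, beta1, eta2, beta2, p1 = params
    return p1 * weibull_rel(t, eta1, beta1) + (1 - p1) * weibull_rel(t, eta2, beta2)

def neg_log_likelihood(params, t_fail, t_cens):
    """Negative of Eq. (3), omitting the parameter-independent constant."""
    ll = np.sum(np.log(mixture_pdf(t_fail, params)))
    ll += np.sum(np.log(mixture_rel(t_cens, params)))
    return -ll

# Example use with the SA sketch and the parameter box of Table 3:
# x_min = [0.001, 0.01, 130, 6, 0]; x_max = [100, 2, 250, 12, 1]
# best, _ = simulated_annealing(
#     lambda x: neg_log_likelihood(x, failure_times, censor_times), x_min, x_max)
```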
4.3. SA algorithm results

The set of multi-censored TDDB failure data in Table 1 was input to the simulated annealing algorithm, with the ranges for the various distribution parameters specified in Table 3. These ranges can easily be determined by inspecting the Weibull probability plot of the failure data. Notice in Table 3 that the ranges specified for the parameters are very wide; despite this, the SA algorithm is able to fit the data reasonably well. The results of the SA algorithm are provided in Table 4, along with the Weibull probability plot in Fig. 1.
Table 3
Input range of values for all parameters

Parameter   Minimum   Maximum
η1          0.001     100
β1          0.01      2
η2          130       250
β2          6         12
p1          0         1

Table 4
Optimal values of the parameters after SA algorithm execution

Parameter            Value
η1                   3.128
β1                   0.155
p1                   0.52
η2                   182.49
β2                   11.19
p2                   0.48
No. of simulations   82446

Fig. 1. Weibull probability plot of the TDDB data fitted with the simulated annealing results (step 1), with failure mechanisms 1 and 2 indicated.

Fig. 2. Convergence of the negative log-likelihood (−L) towards the global minimum, (−L)min = +79.67, as the number of successful Markov transitions increases.

Fig. 3. Trend of decrease in the acceptance ratio (AR) as the number of Markov temperature cycles (N) is increased; the algorithm stops when AR < 0.01%.
The convergence of the negative log-likelihood function (−L) towards the global minimum during execution of the annealing algorithm is clearly illustrated by Fig. 2. Fig. 3 shows the reduction in the acceptance ratio (AR) as the temperature of the optimization system is reduced from T0 → T1 → ⋯ to the final temperature cycle, at which the AR value is as low as 0.01%, implying that we have come very close to the global minimum. Having approached the global minimum of (−L), or equivalently the global maximum of (L), we have entered the well containing the global optimum, and we may now use the local-optimization EM algorithm to reach the exact optimum point reliably. In this case, although the EM algorithm performs only a local maximization, we attain the globally optimal result because the preceding SA execution has brought us into the global well.
5. EM algorithm

5.1. Theory of the EM algorithm

In spite of its drawback of performing only local optimization, the EM algorithm can be very efficient after the execution of the SA algorithm. The EM algorithm solves the partial-derivative equations in (4), using local optimization methods such as the Newton–Raphson method and the simplex search method, to maximize the log-likelihood function (L).

∂L/∂ηi = ∂L/∂βi = ∂L/∂pi = 0    (4)

The expressions in (2)–(4) may be modified to give Eqs. (5)–(8) for ease of solving. The expression Pr{i|tf} in (5) is the probability that failure data tf belongs to component i of the mixture distribution; it is called the posterior probability and is a result of Bayes' theory. Similarly, the expression in (6) is the probability that censored data tc belongs to component i of the mixture distribution. The symbol y in (8) is replaced by ηi or βi as required.

Pr{i | tf} = pi fi(tf | ηi, βi) / Σ_{all i} pi fi(tf | ηi, βi)    (5)

Pr{i | tc} = pi Ri(tc | ηi, βi) / Σ_{all i} pi Ri(tc | ηi, βi)    (6)

pi = (1/n) ( Σ_f Pr{i | tf} + Σ_c Pr{i | tc} )    (7)

Σ_f Pr{i | tf} ∂ln[fi(tf | ηi, βi)]/∂y + Σ_c Pr{i | tc} ∂ln[Ri(tc | ηi, βi)]/∂y = 0    (8)

The sequence of solving the equations above to determine the optimal distribution parameter values is as follows:

1. Start with an initial guess for each component's parameters {pi, ηi, βi}.
2. For iteration j, determine the posterior probabilities Pr{i|tf} and Pr{i|tc}.
3. Find the MLE for ηi^(j+1) by minimizing the absolute value of the function in (8), using the simplex search method, for y = ηi (iteration j + 1).
4. Perform step 3 for y = βi (iteration j + 1).
5. Determine the modified value of pi, using the values of ηi and βi just found in steps 3 and 4, based on Eq. (7).

Steps 2–5 are executed repeatedly until the desired level of accuracy is reached. In this study, we define the accuracy as 0.1%, which means that the algorithm terminates when the subsequent changes in all the parameter values are less than 0.1%.
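The sketch below illustrates one such EM iteration in Python. It is not the authors' Matlab code: instead of driving |(8)| to zero for ηi and βi one parameter at a time, it maximizes the equivalent posterior-weighted log-likelihood of each component with a Nelder–Mead (simplex) search, whose stationarity condition is exactly (8), and it normalizes the mixing weights by the total sample size so that they sum to one.

```python
import numpy as np
from scipy.optimize import minimize

def weibull_pdf(t, eta, beta):
    return (beta / eta) * (t / eta) ** (beta - 1.0) * np.exp(-(t / eta) ** beta)

def weibull_rel(t, eta, beta):
    return np.exp(-(t / eta) ** beta)

def em_step(params, t_fail, t_cens):
    """One EM iteration for a 2-component censored Weibull mixture.

    params = [(p1, eta1, beta1), (p2, eta2, beta2)]
    """
    # E-step: posterior probabilities of Eqs. (5) and (6)
    pdf = np.array([p * weibull_pdf(t_fail, eta, beta) for p, eta, beta in params])
    rel = np.array([p * weibull_rel(t_cens, eta, beta) for p, eta, beta in params])
    post_f = pdf / pdf.sum(axis=0)            # Pr{i | t_f}
    post_c = rel / rel.sum(axis=0)            # Pr{i | t_c}

    n_total = len(t_fail) + len(t_cens)
    new_params = []
    for i, (_p, eta, beta) in enumerate(params):
        # M-step for (eta_i, beta_i): maximize the weighted log-likelihood
        # whose stationarity condition is Eq. (8), via a simplex search
        def neg_q(x, i=i):
            eta_i, beta_i = x
            if eta_i <= 0 or beta_i <= 0:
                return np.inf
            q = np.sum(post_f[i] * np.log(weibull_pdf(t_fail, eta_i, beta_i)))
            q += np.sum(post_c[i] * np.log(weibull_rel(t_cens, eta_i, beta_i)))
            return -q
        res = minimize(neg_q, x0=[eta, beta], method="Nelder-Mead")
        # mixing-weight update in the spirit of Eq. (7), normalized so
        # that the weights sum to one over the total sample
        p_new = (post_f[i].sum() + post_c[i].sum()) / n_total
        new_params.append((p_new, res.x[0], res.x[1]))
    return new_params
```

Starting from the SA output of Table 4, e.g. params = [(0.52, 3.128, 0.155), (0.48, 182.49, 11.19)], and iterating em_step until all parameter changes fall below 0.1% should give values close to those of Table 5.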
5.2. Application and results of the EM algorithm

The EM algorithm is implemented on the Matlab platform. The results from the SA algorithm in Table 4 are fed as input to the EM algorithm. The simulation results indicate that the EM algorithm is very time-efficient and quickly converges to the optimal solution, within around 15 s. The optimal values of the mixture distribution parameters obtained from the EM algorithm are tabulated in Table 5.

Table 5
Final results after executing the EM algorithm

Parameter   Value
η1          0.887
β1          0.124
p1          0.445
η2          180.26
β2          9.93
p2          0.555

Fig. 4 shows the final cumulative probability plot for the TDDB mixture distribution obtained from the EM algorithm. Notice that the final result after the sequential execution of the SA and EM algorithms is more accurate; this is partly because the SA results serve as a very good initial guess for the EM algorithm. The results in Table 5 suggest that the extrinsic and intrinsic failure mechanisms are almost equally important in the gate oxide breakdown process for this test sample, as indicated by the comparable mixing weights of p1 = 44.5% and p2 = 55.5% respectively. The least-square residual (LSR) of the fit to the TDDB data is found to be as low as 0.047, showing the effectiveness of the combined use of the SA and EM algorithms. In Table 5, since β1 < 1 and β2 ≫ 1, failure mechanism 1 (FM 1) corresponds to early failures while FM 2 corresponds to wear-out failures.
Fig. 4. Final Weibull probability plot of the TDDB data after execution of the EM algorithm (input: simulated annealing estimates; output: EM algorithm fit), with failure mechanisms 1 and 2 indicated. The data fit is now very accurate after the sequential execution of the SA and EM algorithms.
Table 6 is the indicator matrix showing the probability of some of the failure data belonging to the two different failure mechanisms: the extrinsic mechanism, i.e. early failures (FM 1), and the intrinsic mechanism, i.e. wear-out failures (FM 2). This information is extracted from the Bayes' posterior probability theory given by (5).

Table 6
Indicator matrix showing the probability of "failure" data belonging to FM 1 (extrinsic) / FM 2 (intrinsic)

TTF (s)      Pr (FM 1)   Pr (FM 2)   Remark
5.847e−10    1           6.85e−112   Extrinsic
5.543e−9     1           2.79e−102   Extrinsic
2.999e−8     1           4.583e−95   Extrinsic
5.138e−8     1           9.175e−93   Extrinsic
6.5149       1           1.252e−12   Extrinsic
8.599        1           2.002e−11   Extrinsic
19.205       0.9999999   6.182e−08   Extrinsic
21.347       0.9999998   1.782e−07   Extrinsic
72.218       0.9651696   0.0348304   Extrinsic
131.85       0.0639903   0.9360097   Intrinsic
145.08       0.0272867   0.9727133   Intrinsic
145.98       0.0258675   0.9741325   Intrinsic
146.67       0.0248414   0.9751586   Intrinsic
186.44       0.0080863   0.9919137   Intrinsic
186.69       0.0081294   0.9918706   Intrinsic
187.29       0.008242    0.991758    Intrinsic
206.07       0.0317072   0.9682928   Intrinsic
207.5        0.0385043   0.9614957   Intrinsic
Based on the data in Table 6, it can be confidently concluded that the first 19 failures, up to a TTF of 72.218 s, are due to extrinsic breakdown (FM 1), while the remaining failures belong to the intrinsic failure mechanism (FM 2), as indicated by the corresponding extreme probability values. Since we are able to identify the failure mechanism to which each failure data point most likely belongs, it is not necessary to perform failure analysis on all the failed units. It would suffice to investigate just two failed units: one which is expected to have failed by the extrinsic mechanism (e.g. TTF = 8.599 s) and the other by the intrinsic mechanism (e.g. TTF = 174.3 s), based on the results in Table 6.
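As an illustration of how such an indicator matrix follows from (5), the short sketch below evaluates the posterior probability of FM 1 for the two candidate units mentioned above, using the fitted parameters of Table 5; the results should be close to the corresponding entries of Table 6. The helper names are ours.

```python
import numpy as np

def weibull_pdf(t, eta, beta):
    return (beta / eta) * (t / eta) ** (beta - 1.0) * np.exp(-(t / eta) ** beta)

# fitted parameters from Table 5
p1, eta1, beta1 = 0.445, 0.887, 0.124
p2, eta2, beta2 = 0.555, 180.26, 9.93

def pr_fm1(t):
    """Eq. (5): posterior probability that a failure at time t belongs to FM 1."""
    num = p1 * weibull_pdf(t, eta1, beta1)
    return num / (num + p2 * weibull_pdf(t, eta2, beta2))

for t in (8.599, 174.3):
    print(t, pr_fm1(t))
```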
6. Conclusion

A comprehensive outlook on the statistical nature of gate oxide TDDB failures has been provided. The accuracy of the combined use of the SA and EM algorithms for multi-censored TDDB data analysis is evident from the results shown in the previous sections. The AIC helped verify that there are indeed only two failure mechanisms in the data set, and Bayes' theory helped identify the most probable failure mechanism with which each failure data point can be associated. In spite of its efficiency, the EM algorithm, which has conventionally been used in the past, has a few drawbacks. It performs only local optimization, and therefore the accuracy of the final solution depends heavily on the user's initial guess for the distribution parameters. Moreover, the EM algorithm is gradient-sensitive, since it makes use of the Newton–Raphson method. These drawbacks have been overcome in this work by combining the EM algorithm with the SA algorithm. Since the SA algorithm is based only on a random space search, and since it spans the entire parameter space defined, it is independent of the initial guess and insensitive to the gradient of the objective function (L). The combined use of the SA and EM algorithms has resulted in a more robust method, which overcomes the inherent drawbacks of the EM algorithm and solves the mixture distribution problem more reliably and accurately. The use of the posterior probability theory has helped selectively identify failed units for failure analysis (FA), thereby reducing the need for extensive FA in industry. This can in turn reduce the cost and time spent on FA, which is a very expensive process in the semiconductor industry.

Acknowledgements

The authors thank the School of EEE, Nanyang Technological University (NTU), Singapore for the logistical support provided during the course of this research work.

References

[1] Degraeve R, Ogier JL, Bellens R, Roussel PJ, Groeseneken G, Maes HE. A new model for the field dependence of intrinsic and extrinsic time-dependent dielectric breakdown. IEEE Trans Electron Dev 1998;45(2):472–81.
[2] Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr 1974;AC-19(6):716–23.
[3] Bucar T, Nagode M, Fajdiga M. Reliability approximation using finite Weibull mixture distributions. Reliab Eng Syst Saf 2004;84(3):241–51.
[4] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc B 1977;39(1):1–38.
[5] Jiang S, Kececioglu D. Maximum likelihood estimates, from censored data, for mixed-Weibull distributions. IEEE Trans Reliab 1992;41(2):248–55.
[6] Li Y, Yao J, Yao D. An efficient composite simulated annealing algorithm for global optimization. Int Conf Commun Circ Syst 2002;2:1165–9.
[7] Brooks SP, Morgan BJT. Optimization using simulated annealing. Statistician 1995;44(2):241–57.
[8] Ohring M. Reliability and failure of electronic materials and devices. Academic Press; 1998.