Improved step stress accelerated life testing method for electronic product

Microelectronics Reliability 52 (2012) 2773–2780

Contents lists available at SciVerse ScienceDirect

Microelectronics Reliability journal homepage: www.elsevier.com/locate/microrel

He Qingchuan ⇑, Chen Wenhua ⇑, Pan Jun, Qian Ping

Zhejiang Province’s Key Laboratory of Reliability Technology for Mechanical and Electronic Product, Zhejiang Sci-Tech University, Hangzhou 310018, China

Article info

Article history: Received 8 July 2011. Received in revised form 26 March 2012. Accepted 5 April 2012. Available online 7 June 2012.

Abstract

Quantitative accelerated life testing (ALT) is designed to quantify the life characteristics of a product under normal use conditions. The step stress ALT (SSALT) method is usually the first choice for quantitative ALT planning when limited availability of test prototypes and/or test equipment restricts the number of samples that can be tested. This paper describes the main limitations of the SSALT method and proposes an improved SSALT method for electronic products. Without changing the ALT planning, the new method can be used to duplicate a failure, to verify whether all samples survive well at all stress levels and whether the failure modes occurring at high stress levels differ from those occurring at low stress levels, and to help confirm the leading cause of failures without additional testing. A case study illustrates the implementation of the new method. Guidelines for the new method are then provided in the discussion and conclusions.

© 2012 Elsevier Ltd. All rights reserved.

⇑ Corresponding authors. Tel.: +86 0571 86843742; fax: +86 0571 86843367 (H. Qingchuan). 0026-2714/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.microrel.2012.04.003

1. Introduction

In today’s competitive marketplace, electronic product design teams are under immense pressure to reduce product qualification time. Consequently, product qualification testing is usually performed through quantitative accelerated life testing (ALT) [1]. Quantitative ALT is designed to quantify the life characteristics of the product under normal use conditions and thereby to provide reliability information, including the probability of failure under use conditions, the mean life under use conditions, and the projected warranty period. In addition, this reliability information can be used to assist in risk assessments, design comparisons, etc.

Before quantitative ALT is performed, it is extremely important to do ALT planning, including test condition selection, ALT method selection, stress level selection, etc. Studies [1–4] have reported how to identify the dominant failure modes and failure mechanisms and then select the appropriate acceleration loads; how to determine the test procedures; how to determine the ALT method, which is the focus of this paper; how to determine the stress levels; how to perform the tests; and how to interpret the test data, which includes extrapolating the accelerated testing results to normal operating conditions. When limited availability of test prototypes and/or test equipment restricts the number of samples that can be tested, the step stress ALT (SSALT) method is the first one selected for ALT planning [5–8]. It is capable of providing much reliability information, avoiding a high stress start point, and reducing the test time and hence the test expense [5–10].

During quantitative ALT of an electronic product, a failed component that is repairable should be repaired and placed back on test, and a failed component that is unrepairable should be replaced by a new one, which is then placed on test [10]. For example, a personal computer (PC) mainly contains a display, motherboard, central processing unit (CPU), graphics processing unit (GPU), memory card, hard disk, network card, power supply, fans, and other input/output ports. While the ALT of the PC is being conducted, if a fan fails we should replace it with a new one and then continue the ALT; if the motherboard fails and can still be repaired (e.g. the cause of failure is the breakdown of the South Bridge or North Bridge chipset), we should repair it and then continue the ALT. We hope that the repaired or new component can experience the same test conditions and stress conditions as the failed component experienced, without changing the ALT planning; the purpose is to duplicate the failure and determine the root cause [3,10,11]. This practice is commonly incorporated in the test-analyze-and-improve development strategy of high-end, low-production-run products, but this aim is inconsistent with the traditional SSALT procedure. Moreover, some samples may fail at a low stress level because of the cumulative effect of exposure at successive stresses, and, as a result, not all samples can experience all accelerated stress levels before they are taken off test for a permanent failure. Thus, it is difficult to determine whether all samples can operate well for a long period of time at all accelerated stress levels. These issues are discussed in more detail in Section 2.

For electronic products, no-fault-found (NFF) failures have been widely observed during field use and qualification testing.
NFF implies that a failure (fault) occurred or was reported to have occurred during a product’s use, and the product was then analyzed or tested to confirm the failure, but no failure or fault could be found [11]. A high NFF rate in a product can cause customer inconvenience and loss of customer confidence and can damage a company’s reputation. If the information on NFF failures collected from qualification testing is credible, the comprehensive cause and effect diagram provided by Qi et al. [11] can be used to identify the possible causes of a field failure, and the products can then be improved in time to prevent the impact of NFF failures from spreading among customers. Moreover, when an electronic product has been used for several years or decades, it is very common that a high NFF rate causes so much inconvenience that we stop using the product even though it can still operate. This phenomenon indicates that the frequency of NFF failures in a given period could be used as the termination criterion of the qualification test and the ALT, provided that the information on NFF failures is reliable. Reliable information on NFF failures can therefore be very useful for terminating the test in time and hence avoiding wasted test expense. When ALT planning is designed using the traditional SSALT method, much of the information on NFF failures is unacceptable for determining the root cause, and hence the frequency of NFF failures cannot be used as the termination criterion. The reasons are given in Section 2.

The purpose of this paper is to propose an ALT method for electronic products that makes up for some shortcomings of the traditional SSALT method. The paper describes the main limitations of the traditional SSALT method when it is applied to quantitative ALT planning for electronic products and proposes an improved SSALT method. Then, a case of implementing the new ALT method on a DC–DC converter is presented. Lastly, the paper gives the discussion and conclusions on the new method.

2. Descriptions of traditional SSALT method’s limitations and improved SSALT method’s advantages

Nelson [2] was the first to propose the step stress scheme, together with the cumulative exposure model (CEM) and a method of analysis. Fig. 1 depicts a traditional simple SSALT procedure with three stress levels, where stress S0 is the usual stress level. Initially, n test samples are placed at stress S1 and tested for a period of time τ1. Then the stress is increased to S2, and the test is continued until time τ2, when the stress is increased to S3. The test is continued until all samples fail or until a pre-specified censoring time T, whichever comes first. In the test, a total of ni failures are observed at times ti,j (j = 1, 2, ..., ni) while testing at Si (i = 1, 2, 3), and nc (nc = n − n1 − n2 − n3) samples remain unfailed at the censoring time T.

Fig. 1. A traditional simple SSALT schematic.

The choice of stress levels is very important for obtaining credible reliability information. As shown in Fig. 2, the stress limits include the specification limits, design limits, operating limits and destruct limits [1]. The specification limits are provided by the manufacturer to limit the use conditions by the customer. The design limits are the stress conditions at which the product is designed to survive. The operating limits of the product are reached when the product can no longer function at the accelerated conditions due to a recoverable failure. The stress value at which the product fails permanently and catastrophically is identified as the destruct limit. As the stress level becomes higher, the required test duration decreases, but as the stress level moves away from the design limits, the uncertainty in the extrapolation increases [2]. In addition, it is the total time of the quantitative test, not the number of failures, that determines the accuracy of the product qualification test. Normally, stress levels for ALT should fall outside the product specification limits but inside the operating limits. This ensures that the failure mechanisms and failure modes in the accelerated environment are the same as those observed under actual usage conditions [1].

Fig. 2. Stress limits diagram.

In Fig. 1, let us suppose that S1 is equal to the upper specification limit, S2 is equal to the upper design limit and S3 is a value between the upper design limit and the upper operating limit. In the traditional SSALT procedure the test stress is time-dependent, and hence the stress loading procedure is not reversible. That is, while the ALT plan is being carried out, the samples cannot be subjected to S2 from 0 to τ1 or from τ2 to T unless the test planning is changed. Suppose a repairable component in a product fails at S2 and is then repaired and placed back on test, or suppose the failed component is unrepairable and is replaced by a new one that is placed on test. When the ALT continues to run under S2, the repaired or new component cannot experience S1 without changing the ALT planning. Later, if the repaired or new component fails again at S2, we cannot be sure whether the component is unable to survive a long period of testing under S2, because the stresses that the repaired or new component experiences are different from those the failed component experienced.
As S2 is equal to the upper design limit, it is possible that the component is not appropriate for use in the product. Thus, we would have to do some additional tests to duplicate the failure and determine the root cause [1,11]. Moreover, when an unrepairable component fails at S3, the new component will be tested starting from S3, which falls outside the design limit. This is also inconsistent with the requirement that all test samples and components should avoid being tested from the highest stress S3, so as to decrease the uncertainty and risk of the ALT [6–8].

As described in the first paragraph of this section, a sample may fail at S1, S2 or S3 because of the cumulative effect of exposure at successive stresses. If a sample fails at S2 and is unrepairable, no reliability information on its operation at S3 can be obtained. For electronic products, a sample that operates well until the censoring time T clearly survives well at S3. A sample that failed at a low stress level might also have survived well at S3, but since it has failed, this assumption cannot be verified with the "lifeless" sample. The time-to-failure at a low stress level can be extrapolated to a high stress level based on the assumption that "the remaining life of specimens depends only on the current cumulative fraction failed and current stress, regardless how the fraction accumulated" [2]. However, the sample that has failed at a low stress level may not be capable of operating under a higher stress level, or not for a long period of time. If this case can be verified, the uncertainties of the reliability assessment can also be reduced.

The "bathtub curve" describes the fact that products have a higher failure rate in their "old stage". Samples tested in ALT may have entered the "old stage" because of the cumulative effect of exposure at the successive stresses S1 and S2; hence most of the failures of samples (NFF failures and permanent failures) may occur at S3. In addition, if some electronic components continually undergo S3 for a long period, there is a high risk of inducing unexpected failures. These effects could cause a high frequency of failures from τ2 to T, whereas the samples might have few failures when operating under S3 in their "young stage". Thus, the reliability of the products operating under S3 may be underestimated when using the data collected from traditional SSALT.

NFF failures can occur during ALT even though the stress levels are appropriately selected. As shown in Fig. 1, suppose that some NFF failures occur at S2. In this case, we cannot confirm that S2 is the leading cause of failure; the cumulative effect of exposure at successive stresses may have resulted in the NFF failures. We then cannot conclude that the sample has a higher NFF rate when it is operating under S2, and hence cannot be sure whether the product can satisfy the customer’s needs. Furthermore, even if no NFF failure occurs under S2, we cannot conclude that NFF failures will not be observed when the product operates under S2, because the cumulative effect of exposure at successive stresses might not have been harsh enough to result in an NFF failure.
Moreover, NFF failures that occur at S3 are not credible because that stress level falls outside the product design limit, yet many NFF failures will occur from τ2 to T because of the cumulative effect of exposure at the successive stresses S1 and S2.

Now suppose that the traditional SSALT is divided into m (m ∈ N+) sub-ALTs, so that the test period p of each sub-ALT is T/m and the change times of the ith (i = 1, 2, ..., m) sub-ALT can be expressed as follows:

\[
\begin{cases}
\tau'_{i,1} = \tau_1/m + (i-1)p \\
\tau'_{i,2} = (\tau_2 - \tau_1)/m + \tau'_{i,1} \\
\tau'_{i,3} = (T - \tau_2)/m + \tau'_{i,2}
\end{cases}
\]

This gives a new type of improved SSALT procedure, shown in Fig. 3a. If temperature is used as the accelerated stress, the ramp rate of the stress change should be controlled so that it accelerates the products’ failures under consideration but does not introduce failure modes that would never occur under use conditions [2,3,10]. This gives the other type of improved SSALT procedure, shown in Fig. 3b.

Fig. 3. Improved SSALT schematic: (a) without controlling ramp rate of stress changing and (b) with controlling ramp rate of stress changing.

As shown in Fig. 3, while the improved step stress ALT is performed, samples are periodically subjected to the same stress levels in every sub-ALT. For a repairable or unrepairable component, suppose that its first failure occurs at any stress level in the ith sub-ALT; it is then repaired or replaced and placed back on test at the beginning of the (i + 1)th sub-ALT. The repaired or new component can experience the same test conditions and stress levels as the failed component experienced, and can be tested starting from S1. When the component fails again at any stress level after it has experienced a number of sub-ALTs, we can conclude that the accumulation of stress damage is the leading cause of failure. Furthermore, the sample clearly can operate under all stresses S1, S2 and S3, and thereby credible information on its operation under these stress levels is obtained. If some components frequently fail at S2 or S3, this indicates that the stress level is the leading cause of failure. Once the leading cause of failure is determined, measures can be taken to improve the products.

It is also easy to allocate testing time to each sub-ALT by changing the test period p and the corresponding change times τ′i,1 and τ′i,2. This can help decrease the risk of inducing unexpected failures when the samples continually undergo the highest stress level for a long period. In addition, we can verify whether the failure mechanisms and failure modes under the test conditions and stress levels in the current sub-ALT are still the same as those in previous sub-ALTs.
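The change-time expressions for the sub-ALTs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sub_alt_change_times` and the plan values (300 h, 600 h, 1200 h, 50 sub-ALTs) are hypothetical and do not come from the paper. Ramp times (Fig. 3b) are ignored.

```python
from typing import List, Tuple


def sub_alt_change_times(tau1: float, tau2: float, T: float, m: int) -> List[Tuple[float, float, float]]:
    """Return the change times (tau'_{i,1}, tau'_{i,2}, tau'_{i,3}) for i = 1..m.

    tau1, tau2 and T are the change times and censoring time of the
    traditional three-level SSALT; m is the number of sub-ALTs.
    """
    if m < 1 or not (0 < tau1 < tau2 < T):
        raise ValueError("need m >= 1 and 0 < tau1 < tau2 < T")
    p = T / m  # test period of one sub-ALT
    schedule = []
    for i in range(1, m + 1):
        t1 = tau1 / m + (i - 1) * p   # stress changes from S1 to S2
        t2 = (tau2 - tau1) / m + t1   # stress changes from S2 to S3
        t3 = (T - tau2) / m + t2      # end of the ith sub-ALT, equal to i * p
        schedule.append((t1, t2, t3))
    return schedule


# Hypothetical plan (values are assumptions, not taken from the paper):
# tau1 = 300 h, tau2 = 600 h, T = 1200 h, divided into m = 50 sub-ALTs.
schedule = sub_alt_change_times(300.0, 600.0, 1200.0, 50)
print(schedule[0])  # (6.0, 12.0, 24.0)
print(schedule[1])  # (30.0, 36.0, 48.0)
```

Each sub-ALT keeps the same proportion of time at S1, S2 and S3 as the traditional plan, and the last change time of the mth sub-ALT equals T.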
As an example of such verification, if some abnormal failures occur only at S3 from the ith sub-ALT onward, this indicates that the sample’s failure mechanism under S3 has changed, while the failure mechanism under S1 and S2 is unchanged. This advantage is very useful for identifying whether the initial choice of accelerated stress levels is still appropriate for the product qualification test. With the traditional SSALT method, by contrast, it is difficult to verify whether the failure mechanisms and failure modes under the current test conditions and stress level are still the same as those under previous test conditions and stress levels.

The improved SSALT can also help us obtain much useful information on NFF failures to confirm the leading cause of failure. For every NFF failure, the stress level at which it occurs and the corresponding time-to-failure are known. If the NFF failures occur randomly (that is, at any stress level and at any time), we can determine that they mainly depend on the cumulative effect of exposure at successive stresses. If most of the NFF failures occur at S3, we can determine that they mainly depend on S3. If no NFF failure occurs in the initial sub-ALTs, we can conclude that few NFF failures will be observed when the product operates under stresses S1, S2 and S3. Furthermore, the data on NFF failures can be used to assess the NFF failure rate of the products in field use, and hence we can decide whether the products should be improved to reduce the NFF failure rate. If the products are improved, the improved SSALT method can be used to verify whether the NFF failure rate has been reduced.

In performing an ALT analysis, the first step is to determine an appropriate statistical distribution to describe the life data at fixed stress levels of the accelerating variables.
In a SSALT analysis, practitioners usually assume a statistical distribution to describe the life data and use it at all stress levels according to the location-scale distribution model [2,12]. For example, for the Weibull distribution, the scale parameter η is chosen as the "life characteristic" that is stress-dependent, while the shape parameter β is assumed to remain constant, that is, it does not depend on the stress level. These parameters can then be inferred from the test data according to the CEM proposed by Nelson [2].
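For reference, the cumulative exposure model for the three-level test of Fig. 1 with Weibull lifetimes is commonly written as below. The paper does not print this expression, so it is offered only as a sketch of the standard form; the notation η_i for the scale parameter at stress S_i is an assumption introduced here, β is the common shape parameter, and τ1, τ2 are the change times.

```latex
F(t) =
\begin{cases}
1 - \exp\!\left[-\left(\dfrac{t}{\eta_1}\right)^{\beta}\right], & 0 \le t < \tau_1 \\[2ex]
1 - \exp\!\left[-\left(\dfrac{t - \tau_1}{\eta_2} + \dfrac{\tau_1}{\eta_1}\right)^{\beta}\right], & \tau_1 \le t < \tau_2 \\[2ex]
1 - \exp\!\left[-\left(\dfrac{t - \tau_2}{\eta_3} + \dfrac{\tau_2 - \tau_1}{\eta_2} + \dfrac{\tau_1}{\eta_1}\right)^{\beta}\right], & \tau_2 \le t \le T
\end{cases}
```

Each term inside the brackets is the exposure accumulated at one stress level, scaled by that level's characteristic life, which is how the model expresses that remaining life depends only on the fraction already failed and the current stress.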


For the improved SSALT, according to its principle, all surviving samples have the same probability of failure in each sub-ALT. Thus, the improved SSALT can be described as a transformed "constant stress" ALT with equal inspection periods. The inspection period is equal to p when one test period is regarded as a unit of testing and the step stress loading in a sub-ALT is regarded as a transformed "constant stress" loading. Suppose n test samples are placed on test from the first sub-ALT until all samples fail; the interval-censored failure times tj (j = 1, 2, ..., n), expressed in units of the test period, can then be obtained. For a constant stress ALT, according to the location-scale distribution model, if the inspection period is small enough, the lifetime distribution of the product can be described by the lifetime distribution fitted to the interval-censored data [2,12]. Similarly, this principle can be extended to the improved SSALT. The statistical distribution obtained by fitting the interval-censored data may be more reliable and realistic than an assumed statistical distribution.

The second step is to determine a life-stress model (i.e. acceleration model) to describe the relationship between the lifetime distribution characteristics and the accelerating variables. For some situations, a physical life-stress model (e.g. Arrhenius relationship, Eyring relationship, inverse power law relationship) or an empirical life-stress model (e.g. the quadratic models used in Meeker and Escobar 1988, section 17.5 [12]) can be applied. Once the life-stress model is determined, the lifetime at high levels of the accelerating variables can be extrapolated to use conditions.
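As a small illustration of a physical life-stress model, the sketch below evaluates the textbook Arrhenius acceleration factor between two temperatures. It is not the paper's own acceleration factor expression. The function name, the activation energy of 0.7 eV and the two temperatures are hypothetical values chosen only to make the example run.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev: float, use_temp_c: float, stress_temp_c: float) -> float:
    """Textbook Arrhenius acceleration factor between a use and a stress temperature.

    ea_ev is the activation energy in eV; temperatures are given in degrees Celsius
    and converted to kelvin before use.
    """
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k))


# Hypothetical inputs (assumptions, not values from the paper).
af = arrhenius_acceleration_factor(ea_ev=0.7, use_temp_c=55.0, stress_temp_c=85.0)
print(f"acceleration factor: {af:.2f}")
```

A factor greater than 1 means that one hour at the stress temperature is treated as equivalent to that many hours at the use temperature, which is the sense in which results at high stress are extrapolated to use conditions.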

3. Case study – improved SSALT of DC–DC converter

3.1. DC–DC converter description

Fig. 4 shows a custom-built DC–DC converter, which is an important part of a CNC controller for a CNC machine tool. Abnormal fluctuations of the output voltage will cause faults of the CNC controller, and these, in turn, may cause the CNC machine tool to fail. The DC–DC converter’s main hardware components include MOSFETs, power rectifiers, voltage regulators, isolating transformers, a pulse width modulation (PWM) controller chip and filter electrolytic capacitors. Under normal conditions, the DC–DC converter has a variable input ranging from 18 V to 30 V DC, and the DC output voltages and output ripple noise are required to remain within the regulation ranges (see Table 1). For the DC–DC converter’s upper temperature limits, the upper specification limit is 60 °C, the upper design limit is 75 °C, and the upper operating limit is close to 90 °C.

Fig. 4. Inside view of a CNC controller.

3.2. Stress levels and test conditions

Nine samples were used in the case study. Fig. 5 shows one sub-ALT loading profile. The stress level ranges from 55 °C to 85 °C. The 55 °C stress is used to verify whether the sample can operate under normal use. The 85 °C stress is used to verify whether the sample can operate for a long period of time near the operating limit. Each sample experienced successively higher levels of stress at a fixed period. Initially each sample was tested at stress S1 for t1, and then the stress was increased to S2 at a ramp rate of 1 °C/min. Next, the samples were tested at stress S2 for t2 from τ′i,1, and then the stress was increased to S3 at a ramp rate of 1 °C/min. Continuing in this way, after the samples had been tested at stress S5 for t5 from τ′i,4, the stress was decreased to S1 at a ramp rate of 1 °C/min. The test then ran continually as described above. Stresses S1, S2, S3, S4 and S5 are 55 °C, 65 °C, 75 °C, 80 °C and 85 °C, respectively. The test period p of each sub-ALT is 24 h, the corresponding test times t1, t2, t3, t4 and t5 are 230 min, 230 min, 235 min, 355 min and 330 min (the test times are constant during testing), and the change times are expressed as follows:

\[
\begin{cases}
\tau'_{i,1} = 4\,\mathrm{h} + 24(i-1)\,\mathrm{h} \\
\tau'_{i,2} = 4\,\mathrm{h} + \tau'_{i,1} \\
\tau'_{i,3} = 4\,\mathrm{h} + \tau'_{i,2} \\
\tau'_{i,4} = 6\,\mathrm{h} + \tau'_{i,3} \\
\tau'_{i,5} = 6\,\mathrm{h} + \tau'_{i,4}
\end{cases}
\qquad (i = 1, 2, \ldots, m)
\]

In the course of testing, the +5 V DC output was monitored at full load with a sampling rate of 20 Hz. The ±13 V DC outputs ran at 50% load and were not monitored. All the samples were powered by one 24 V DC regulated power supply. A failure occurs when a sample fails to meet the criterion that its output voltages must conform to the specifications (see Table 1).

3.3. Experimental results and discussion

Table 2 summarizes the occurrence of NFF failures and permanent failures in each sub-ALT. As shown in Table 2, several NFF failures can occur for a sample in one sub-ALT. For example, for sample 3, two NFF failures occurred in the 43rd sub-ALT: one at S4 and the other at S5. Fig. 6 illustrates the output voltages of sample 3: Fig. 6a shows the output voltages in the 41st sub-ALT, Fig. 6b those in the 43rd sub-ALT and Fig. 6c those in the 48th sub-ALT. As shown in Fig. 6b, the ripple noise is not within specifications (see Table 1) under stresses S4 and S5. By comparison with the normal output, such abnormal voltage output was considered an NFF failure. In this case study, NFF failures always followed the pattern that the ripple noise was not within specifications, and a permanent failure meant that the capacitive load capability of the sample had degraded to below 50% of its original value.

For sample 1, permanent failure occurred at the end of the 43rd sub-ALT (1032 h), and only one NFF failure occurred, at S3 in the 31st sub-ALT. For sample 2, permanent failure occurred at the end of the 49th sub-ALT (1176 h) and no NFF failure occurred. For sample 3, permanent failure occurred at the end of the 51st sub-ALT (1224 h); the first NFF failure occurred at S5 in the 18th sub-ALT and the others occurred in the 43rd and 46th sub-ALTs. For sample 4, permanent failure also occurred at the end of the 51st sub-ALT; the first NFF failure occurred at S5 in the 48th sub-ALT and the others occurred in the 50th and 51st sub-ALTs. For sample 5, permanent failure occurred at the end of the 52nd sub-ALT (1248 h); the first NFF failure occurred at S4 in the 46th sub-ALT and the others occurred in the 47th, 49th, 51st and 52nd sub-ALTs.
For sample 6, permanent failure also occurred at the end of the 52nd sub-ALT; the first NFF failure occurred at S5 in the 49th sub-ALT and the others occurred in the 50th, 51st and 52nd sub-ALTs. For sample 7, permanent failure occurred at the end of the 55th sub-ALT (1320 h); the first NFF failure occurred at S5 in the 19th sub-ALT and the others occurred in the 43rd, 51st, 54th and 55th sub-ALTs. For sample 8, permanent failure occurred at the end of the 60th


Table 1. DC output voltage regulation.

Output    | Min (V) | Nom (V) | Max (V) | Max ripple noise (mV) | Max current (A) | Nominal working (V)
+5 V DC   | +4.5    | +5.1    | +6.0    | 50                    | 6.0             | 5.1 to 5.5
+13 V DC  | +12.0   | +13.0   | +14.0   | 130                   | 3.0             | 12.5 to 13.5
−13 V DC  | −12.0   | −13.0   | −14.0   | 130                   | 2.0             | 12.5 to 13.5

Fig. 5. One sub-ALT loading profile.
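The loading profile of one sub-ALT can be cross-checked numerically from the stress levels, test times and ramp rate given in Section 3.2. The sketch below assumes that each change time marks the end of a dwell plus the ramp to the next level, and that the ramp from S5 back to S1 belongs to the same sub-ALT; the function names are hypothetical.

```python
from itertools import accumulate

STRESS_C = [55, 65, 75, 80, 85]        # S1..S5 in degrees Celsius
DWELL_MIN = [230, 230, 235, 355, 330]  # t1..t5 in minutes
RAMP_RATE = 1.0                        # degrees Celsius per minute


def segment_minutes(levels, dwells, rate):
    """Dwell at each level plus the ramp to the next level (S5 wraps back to S1)."""
    n = len(levels)
    return [dwells[k] + abs(levels[(k + 1) % n] - levels[k]) / rate for k in range(n)]


def change_times_hours(i, segments):
    """Change times tau'_{i,1..5} in hours for the ith sub-ALT (i starts at 1)."""
    period_h = sum(segments) / 60.0
    offset = (i - 1) * period_h
    return [offset + c / 60.0 for c in accumulate(segments)]


segments = segment_minutes(STRESS_C, DWELL_MIN, RAMP_RATE)
print(segments)                         # [240.0, 240.0, 240.0, 360.0, 360.0]
print(change_times_hours(1, segments))  # [4.0, 8.0, 12.0, 18.0, 24.0]
```

Under this assumption the five segments last 4 h, 4 h, 4 h, 6 h and 6 h, which agrees with the change-time expressions and with the 24 h test period.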

Table 2 Data on occurrence of NFF failures and permanent failures. Sub-ALT ID #

Sample ID # 1

2

3

4

5

6

7

8

9

18 19 30

N N N

N N N

S5 N N

N N N

N N N

N N N

N S5 N

N N N

31 43

S3 F

N N

N N

N N

N N

N S5

N N

N N S1 S2 S3 N N

N

S4 S5 S4 N S3–S4 S5 N

N

N

N

S4

N N S5

N N N

N N N

S4 N S3–S4

S5

N

N

S4

S5

S5

N

S3–S4 F

S3–S4 F

N

N

53

N

N

54

S3 S4 S3 F

S3–S4 S4 S5

S3 S5 S3–S4 S4–S5 S3–S4 S4–S5 S2–S3 S4–S5 S2–S3

46

N

N S4 S5 S3–S4

47 48 49

N N F

N N N

N S5 N

50

N

S5

51

F

S4–S5 F

52

55 56 57 58 59 60 61 62

S2–S3 S4–S5 S2 S3 S3–S4 S5 S3 S4–S5 S4–S5 F

S3 S5 S5 S4 S4–S5 S3–S4 S4–S5 S3–S4 S4–S5 S4–S5 S4–S5 S3 F

Si: NFF failure occurred at Si; Si–Si+1: NFF failure occurred while the stress was increasing from Si to Si+1; N: no failure occurred; F: permanent failure; blank: sample had been removed from test.

sub-ALT (1440 h); NFF failures occurred frequently from the 54th to the 60th sub-ALT. For sample 9, permanent failure occurred at the end of the 62nd sub-ALT (1488 h); the first NFF failure occurred in the 30th sub-ALT, and NFF failures occurred frequently from the 53rd to the 62nd sub-ALT.
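The permanent-failure data listed above can be tabulated in a short script that converts sub-ALT counts to hours and computes the mean and standard deviation of the natural logarithms, the quantities reported later in this section as 3.9609 and 0.1085. The sketch treats each failure as an exact time at the end of its sub-ALT, which is a simplifying assumption; the paper describes the data as interval censored and fits them with Weibull++.

```python
import math

# Sub-ALT in which each of samples 1..9 failed permanently (from the text above).
FAILURE_PERIODS = [43, 49, 51, 51, 52, 52, 55, 60, 62]
PERIOD_HOURS = 24  # test period p of one sub-ALT

failure_hours = [k * PERIOD_HOURS for k in FAILURE_PERIODS]

# Log-scale statistics of the lifetimes expressed in units of test periods.
logs = [math.log(k) for k in FAILURE_PERIODS]
n = len(logs)
mu = sum(logs) / n
sum_sq = sum((x - mu) ** 2 for x in logs)
sigma_ml = math.sqrt(sum_sq / n)          # divisor n (exact-time maximum likelihood)
sigma_sample = math.sqrt(sum_sq / (n - 1))  # divisor n - 1 (sample standard deviation)

print(failure_hours)
print(round(mu, 4), round(sigma_ml, 4), round(sigma_sample, 4))
```

Under this simplification the log-mean agrees with the reported 3.9609, and the reported 0.1085 is matched by the n − 1 form rather than the n form. That difference may come from the interval-censored treatment or from software settings, which this sketch does not reproduce.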

Fig. 6. Output voltage of sample 3: (a) in 41st sub-ALT, (b) in 43rd sub-ALT and (c) in 48th sub-ALT.

In general, there were no NFF failures up to the end of the 17th sub-ALT (408 h). From the first to the 42nd sub-ALT, only six NFF failures occurred, at different stress levels. This indicated that the samples’ failure mechanisms and failure modes in the accelerated environment had not changed, and hence it can be concluded that the stress levels were appropriately selected at the beginning of the ALT. Furthermore, we were sure that all samples operated well at all stress levels. By analyzing these failure data, it can also be concluded that the occurrence of NFF failures mainly depends on the accumulation of stress damage. The results also showed that the stress levels were still appropriate for the samples until the 42nd sub-ALT, but that the cumulative effect of exposure at successive stresses was becoming harsh enough to result in failures from the beginning of the 18th sub-ALT.

From the 43rd to the 62nd sub-ALT, most of the NFF failures occurred at S4, at S5 and during stress changes, and these failures all occurred in the "old stage" of each sample. Furthermore, it could be concluded that the leading cause of the failures collected from the 43rd to the 62nd sub-ALT was the stress level, especially stresses S4 and S5. This indicated that the samples’ stress limits might have changed because of the cumulative effect of exposure at successive stresses, and hence that stresses S4 and S5 were no longer appropriate for the samples. The components’ performance clearly degraded as the cumulative effect of exposure at successive stresses increased, and, as a result, the samples’ operating limit slowly approached the design limit or specification limit [2]. Once a sample’s operating limit is close to the design limit or specification limit, the initial selection of stress levels is no longer appropriate for that sample either.

It was also observed that some NFF failures occurred while the stress was being increased from a lower to a higher stress level; the first of these was observed in the 46th sub-ALT. For samples 8 and 9, many NFF failures occurred during stress changes (see Table 2). These results showed that the ramp rate was also an important stress that could affect the reliability of the DC–DC converters. In this case study, if we had decreased the stresses S2, S3, S4 and S5 to 60 °C, 65 °C, 70 °C and 75 °C from the beginning of the 44th sub-ALT, without changing the planned test times and change times, we would have prolonged the total test time and hence obtained more credible test results.

For the DC–DC converter, as the capacitance of the electrolytic capacitors decreases and the equivalent series resistance (ESR) increases, the lines start to show voltage ripple [13,14]. The ripple on the voltage regulators may result in an unanticipated output from the analog signal generating circuit, and this, in turn, may result in NFF failures. According to the comprehensive cause and effect diagram and the test results, we could identify that the low quality electrolytic capacitors used as filters in the DC–DC converter resulted in the high NFF rate. The designers did not use an appropriate testing method to verify whether the low quality filter electrolytic capacitors could satisfy the customer’s needs, and thus they thought the test prototype could pass the qualification test. Usually, low quality filter electrolytic capacitors are much cheaper than high quality ones but have a lower reliability. To save product cost, the manufacturer is willing to choose the low quality filter electrolytic capacitors. The test results (too many NFF failures during testing) show that the filter electrolytic capacitors used in the DC–DC converter should be replaced in time by ones of higher quality and reliability, to prevent the impact of NFF failures from spreading among customers.

Further analysis shows that the cause of the permanent failures is that the filter electrolytic capacitors failed. All the failed electrolytic capacitors show one symptom: bulging of the vent on the top of the capacitor (see Fig. 7).
Their capacitance had degraded to 25–40% of the original value. However, the designers had assumed that it might drop up to 50% of its original value over its life. Thus, the samples' load capability and output-voltage stability were compromised by such a dramatic drop in capacitance. Dissection of the failed filter electrolytic capacitors revealed that they were still moist, indicating little loss of electrolyte during the ALT. There was no sign of electrolyte leakage around the rubber packing/rubber interface areas. The aluminum lead tabs were inspected and no trace of corrosion was detected. The capacitor was unwound and both the anode and the cathode were inspected. A trace of pitting corrosion could be seen on the surfaces of the anode and cathode foils, and a small area of discoloration was visible to the naked eye. These are usually caused during anode etching and/or formation and pose no threat to capacitor performance. The analysis results mentioned above indicate that the electrolytic capacitor's failure mode could be categorized as a degradation failure and the failure mechanism as a wearout mechanism. As shown in Table 2, the numbers of test periods that the samples completed before the permanent failure occurred were 43, 49, 51, 51, 52, 52, 55, 60 and 62, respectively.

Fig. 7. A picture of bulging of the vent on the top of the capacitor.

In this case study, the life data are first analyzed graphically by using the software Weibull++. Fig. 8 gives the lognormal probability plot, and the plotted points fall roughly along a straight line. This indicates that the lognormal distribution is adequate for describing the life data. The graphical analysis is followed by maximum likelihood (ML) estimation. By using ML, we obtain the estimates of the mean $\mu = 3.9609$ and the standard deviation $\sigma = 0.1085$ of the natural logarithm of the life (in test periods). The Arrhenius relationship could be applied to describe the relationship between $\mu$ and the temperature stress [2], and thus the acceleration factor is given by the following equation:

$$
AF(T_x) = \frac{\text{life at } T_0}{\text{life at } T_x} = \exp\left[\frac{E_a}{k}\left(\frac{1}{T_0} - \frac{1}{T_x}\right)\right] \tag{1}
$$

where $E_a$ is the activation energy (in eV), $k$ is Boltzmann's constant ($8.6171 \times 10^{-5}$ eV/K), temperatures are expressed in kelvin (K = °C + 273.15), $T_0$ is the normal working environment temperature, and $T_x$ is the temperature used in the ALT, that is, the stress level $S_x$ ($x = 1, 2, 3, \ldots$). In this case study, because the permanent failures were caused by failure of the filter electrolytic capacitors (that is, the life of the DC–DC converter is limited by the lifetime of the filter electrolytic capacitor), we could suppose that the DC–DC converter's median lifetime can be described by the electrolytic capacitor's median lifetime. Ideally, we could assume that the filter electrolytic capacitor's median lifetime $L_0$ is 50,000 h at $T_0$ = 303.15 K (30 °C) under the same loading conditions as in this paper [14], and that the remaining life of the filter electrolytic capacitor depends only on the current cumulative fraction failed and the current stress [2]. Then $E_a$ can be computed from:

$$
\sum_{x} t_x \, AF(S_x) = \frac{L_0}{\bar{p}} \tag{2}
$$

where $t_x$ is the dwell time at $S_x$ in each test period (in hours), and $\bar{p} = \exp(\mu)$ is the mean number of test periods (strictly, the median of the fitted lognormal distribution) that the samples completed before the permanent failure occurred. Substituting the dwell times, the stress levels, $L_0$ and $\mu$ into Eq. (2) gives:

$$
\sum_{x=1}^{5} t_x \, AF(S_x) = 4AF(S_1) + 4AF(S_2) + 4AF(S_3) + 6AF(S_4) + 6AF(S_5) = \frac{50000}{\exp(3.9609)} \tag{3}
$$

where $AF(S_x)$ can be calculated by Eq. (1). From Eq. (3), we get $E_a$ = 0.7198 eV. Once $E_a$ has been obtained, the median lifetime at any other working environment temperature follows by dividing $L_0$ by the corresponding acceleration factor, which gives the lifetime distribution based on the scale-accelerated failure-time model. For example, we obtain $L$ = 13638.5 h at 45 °C, that is, $\mu = \log(L) = 9.5206$, and then the cumulative distribution function (CDF) of the product at 45 °C is:

$$
F(t; \mu, \sigma) = \Phi_{\mathrm{nor}}\left(\frac{\log(t) - 9.5206}{0.1085}\right), \quad t > 0
$$

where $\Phi_{\mathrm{nor}}$ is the standard normal CDF. In addition, the lognormal $P$ quantile can be expressed as:

$$
t_P = \exp\left[9.5206 + 0.1085\,\Phi_{\mathrm{nor}}^{-1}(P)\right]
$$

Fig. 8. The lognormal probability plot.
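The numbers quoted in this analysis can be checked with a short script. The following is a minimal sketch, not the Weibull++ procedure used in the paper: it treats the nine test-period counts as exact failure times (a simplification of the interval-censored fit), in which case the mean and the n − 1 sample standard deviation of the logarithms agree with the quoted 3.9609 and 0.1085 to four decimals. It then evaluates Eq. (1) with the reported activation energy to recover the median life at 45 °C. All function and variable names are hypothetical, and the 12,000 h and 10% values in the last lines are arbitrary illustration points.

```python
import math
from statistics import NormalDist, mean, stdev

# Test periods to permanent failure for the nine samples, as quoted in the text.
periods = [43, 49, 51, 51, 52, 52, 55, 60, 62]

# Simplification (assumption): treat each count as an exact failure time.
log_periods = [math.log(p) for p in periods]
mu_periods = mean(log_periods)   # mean of ln(test periods)
sigma = stdev(log_periods)       # sample (n - 1) standard deviation of ln(test periods)

K_BOLTZMANN = 8.6171e-5  # eV/K, value quoted in the text
E_A = 0.7198             # eV, activation energy reported from Eq. (3)
T0 = 303.15              # K (30 degC), normal working temperature
L0 = 50_000.0            # h, assumed median life at T0


def acceleration_factor(temp_k, ea=E_A, t0=T0):
    """Arrhenius acceleration factor of Eq. (1) at temp_k relative to t0."""
    return math.exp(ea / K_BOLTZMANN * (1.0 / t0 - 1.0 / temp_k))


def median_life(temp_c):
    """Median life in hours at temp_c (degC): L0 divided by the acceleration factor."""
    return L0 / acceleration_factor(temp_c + 273.15)


def lognormal_cdf(t, mu, sig):
    """Fraction failed by time t for a lognormal life with log-mean mu and log-sd sig."""
    return NormalDist().cdf((math.log(t) - mu) / sig)


def lognormal_quantile(p, mu, sig):
    """Time by which a fraction p of units has failed."""
    return math.exp(mu + sig * NormalDist().inv_cdf(p))


life_45 = median_life(45.0)
mu_45 = math.log(life_45)

print(f"mu (test periods) = {mu_periods:.4f}, sigma = {sigma:.4f}")
print(f"median life at 45 degC = {life_45:.1f} h, mu = {mu_45:.4f}")
print(f"F(12000 h) = {lognormal_cdf(12_000, mu_45, sigma):.4f}")
print(f"t_0.10 = {lognormal_quantile(0.10, mu_45, sigma):.0f} h")
```

Carrying the same $\sigma$ over to 45 °C mirrors the scale-accelerated failure-time assumption in the text: only the location parameter moves with temperature.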

4. Discussion and conclusion

ALT plays an important role in quantitative testing because of its advantage of time compression. The test results can help verify whether the product meets or exceeds the reliability and quality requirements of the intended application. Therefore, it is very important to select an ALT method that suits the characteristics of the product and can produce more credible and useful test results for product qualification. When resource limitations on the availability of test prototypes and/or test equipment restrict the number of samples that can be tested, the SSALT method is usually the first choice for conducting the quantitative test. In fact, the traditional SSALT method has its limitations when it is applied to the quantitative testing of electronic products. The main limitations are: (a) it is impossible to duplicate the failures that occur in testing without changing the ALT planning; (b) it is impossible to verify the assumption that a sample that failed at a low stress level may survive well at a high stress level; (c) it is difficult to determine the leading cause of failures without additional testing; (d) it is difficult to verify the assumption that failure modes occurring at a low stress level may differ from those occurring at a high stress level. The focus of this paper is to propose a method that solves the problems mentioned above. The new method is an improved SSALT method obtained by dividing the traditional step stress ALT into m sub-ALTs. The approaches to test condition selection and stress level selection used for the traditional SSALT method can be adopted for the new method. In the improved SSALT procedure, the samples are periodically subjected to the same stress levels in every sub-ALT. If a sample fails in the ith sub-ALT, it can be repaired or replaced by a new one and placed back on test at the beginning of the (i + 1)th sub-ALT without changing the ALT planning.
This ensures that the new components, repaired components and un-failed components in a sample experience the same test conditions and stress levels. The test results contain more credible and useful reliability information than those collected from a traditional SSALT, and hence can help us solve the problems mentioned in the first paragraph of this section. The test period p and change time s can be easily determined and allocated to each sub-ALT. Thus, the total time that the samples spend at the highest stress level is also distributed across the sub-ALTs. This decreases the risk of inducing unexpected failures in the samples through continuous exposure to the highest stress level for a long period. When little failure data has been collected by the censoring time T, we could prolong the test by increasing the number of sub-ALTs. In the course of the improved SSALT, it can be identified in time whether the failure mechanisms and failure modes under the current stress level are still the same as those under previous test conditions and stress levels. In field use, an electronic product might repeatedly undergo cyclic stress loading, but the effect of cyclic stress on product life cannot be taken into account when traditional SSALT planning is used to perform the product qualification test. For electronic products, cyclic stress can expose some unknown failure modes during product qualification. Hence, if we take into account some effects of cyclic stress on product life when we make an ALT plan, the uncertainty in the reliability assessment could be reduced. The improved SSALT method provides an approach to subject the samples to the effect of cyclic stress.
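The sub-ALT procedure described in this section can be sketched as a repeating schedule. The sketch below makes two assumptions that should be checked against the actual test plan: that one sub-ALT corresponds to one test period, and that the dwell times per stress level are the 4, 4, 4, 6 and 6 h appearing in Eq. (3). The names `Segment`, `build_schedule`, `hours_at_level` and `re_entry_time` are hypothetical.

```python
from dataclasses import dataclass

# Assumed dwell time (hours) at stress levels S1..S5 within one sub-ALT, per Eq. (3).
DWELL_HOURS = (4.0, 4.0, 4.0, 6.0, 6.0)


@dataclass(frozen=True)
class Segment:
    sub_alt: int    # 1-based index i of the sub-ALT
    level: int      # 1-based stress level index x in S_x
    start_h: float  # cumulative test hours at the start of the dwell
    end_h: float    # cumulative test hours at the end of the dwell


def build_schedule(dwell_hours, n_sub_alts):
    """Repeat the same step-stress profile once per sub-ALT."""
    schedule = []
    clock = 0.0
    for i in range(1, n_sub_alts + 1):
        for x, dwell in enumerate(dwell_hours, start=1):
            schedule.append(Segment(i, x, clock, clock + dwell))
            clock += dwell
    return schedule


def hours_at_level(schedule, level):
    """Total hours spent at one stress level over the whole schedule."""
    return sum(s.end_h - s.start_h for s in schedule if s.level == level)


def re_entry_time(failed_in_sub_alt, dwell_hours):
    """Test hour at which a repaired or replaced sample rejoins the test:
    the beginning of sub-ALT (i + 1) when the failure occurred in sub-ALT i."""
    return failed_in_sub_alt * sum(dwell_hours)


schedule = build_schedule(DWELL_HOURS, n_sub_alts=3)
for seg in schedule[:5]:
    print(seg)
print("hours at S5 over 3 sub-ALTs:", hours_at_level(schedule, 5))
print("re-entry after a failure in sub-ALT 43:", re_entry_time(43, DWELL_HOURS), "h")
```

Because every sub-ALT repeats the same profile, prolonging the test only means raising `n_sub_alts`, and the time at the highest level is spread over the sub-ALTs instead of being applied in one long block.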


The experimental results indicate that the samples' stress limits could change because of the cumulative effect of exposure to successive stresses. Therefore, when the improved SSALT method is applied to quantitative ALT planning for electronic products, the cumulative effect of exposure to successive stresses on the samples' stress limits should be considered. It is better to decrease the stress levels when the samples' operating limit approaches the design limit. Once this premise is accepted, the improved step stress ALT method can help us obtain more credible product qualification test results. The improved SSALT also has some limitations for product qualification testing. In this paper, the lifetime distribution was fitted from interval-censored data. In fact, interval censoring can affect the accuracy of the fitted lifetime distribution, and subsequently the extrapolation to use conditions. Thus, care must be taken to set the test period short enough to observe the spread of the failures. For example, if the test period is too long, all the samples in the test may fail within the same interval, and then no lifetime distribution can be obtained. Yet the test period should not be set too short either, otherwise the improved SSALT becomes a cyclic stress ALT (i.e. each sub-ALT can no longer be regarded as an SSALT). This departs from our original intention, which is that the samples should be tested at constant stress as much as possible, so that the location-scale distribution model and the CEM used for the traditional SSALT analysis can be applied to the improved SSALT analysis. However, the location-scale distribution model and the CEM also have some disadvantages, which are well described by Nelson [2] and Meeker and Escobar [12]. Thus, to obtain a reliable and realistic statistical distribution that describes the product under use conditions, the statistical analysis methods for the improved SSALT need further study.
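The remark on interval censoring can be illustrated with a small likelihood calculation. This is a sketch under stated assumptions, not the estimation routine used in the paper: each failure is taken to lie somewhere in the test period in which it was observed, i.e. in the interval (p − 1, p] measured in test periods, and the lognormal parameters are found by a coarse grid search (a hypothetical stand-in for a proper optimizer). The second part shows the degenerate case described above, where one long interval contains every failure.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

# Test periods in which the permanent failures were observed (from the text).
periods = [43, 49, 51, 51, 52, 52, 55, 60, 62]

# Assumption: a failure seen in period p occurred somewhere in (p - 1, p].
fine = [(p - 1.0, float(p)) for p in periods]

# Degenerate case: one long interval (40, 64] that contains every failure.
coarse = [(40.0, 64.0)] * len(periods)


def interval_log_likelihood(mu, sigma, intervals):
    """Lognormal log-likelihood for failures known only to lie in (lo, hi]."""
    total = 0.0
    for lo, hi in intervals:
        prob = (_STD.cdf((math.log(hi) - mu) / sigma)
                - _STD.cdf((math.log(lo) - mu) / sigma))
        total += math.log(max(prob, 1e-300))  # guard against log(0)
    return total


def grid_fit(intervals):
    """Coarse grid search for the (mu, sigma) pair with the highest likelihood."""
    best = None
    for i in range(81):          # mu from 3.80 to 4.20 in steps of 0.005
        mu = 3.80 + 0.005 * i
        for j in range(61):      # sigma from 0.05 to 0.20 in steps of 0.0025
            sigma = 0.05 + 0.0025 * j
            ll = interval_log_likelihood(mu, sigma, intervals)
            if best is None or ll > best[0]:
                best = (ll, mu, sigma)
    return best[1], best[2]


mu_hat, sigma_hat = grid_fit(fine)
print(f"fine intervals: mu = {mu_hat:.3f}, sigma = {sigma_hat:.4f}")

# With one long interval the likelihood is essentially flat in sigma:
for s in (0.02, 0.05):
    print(f"coarse, sigma = {s}: log-likelihood = "
          f"{interval_log_likelihood(math.log(50), s, coarse):.6f}")
```

With the fine intervals the likelihood has a clear maximum, whereas with the single long interval any sufficiently small $\sigma$ gives a log-likelihood of practically zero, so the spread of the failures cannot be estimated.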
Moreover, the failure mechanisms in the improved SSALT may differ from those in a traditional SSALT. The main reason is, again, that the samples are subjected to the effect of cyclic stress in the improved SSALT. This may also increase the uncertainty in the results when we use the CEM to perform the statistical analysis of the improved SSALT, because the CEM does not take into account the effect of cyclic stress [2]. Every coin has two sides. The effect of cyclic stress could reduce the uncertainty in the reliability assessment, but it could also increase the uncertainty in the results when we use the CEM to perform the statistical analysis of the improved SSALT. Thus, we should keep a balance between reducing the uncertainty in the reliability assessment and ensuring that the failure mechanisms in the improved SSALT are the same as those in a traditional SSALT. How to plan a sound sub-ALT loading profile therefore needs further study.

Acknowledgements

This study was financially supported by Major Projects of CNC (No. 2009ZX04014-013-02), National Natural Science Foundation of China (No. 51075370) and Zhejiang's Key Science and Technology Innovation Team (No. 2010R50005).

References

[1] Wang WQ, Azarian M, Pecht M. Qualific for Prod Develop, ICEPT-HDP 2008:1–12.
[2] Nelson WB. Accelerated testing: statistical models, test plans, and data analysis. New York: John Wiley & Sons Inc; 2004.
[3] Pecht M. Prognostics and health management of electronics. New York: John Wiley & Sons Inc; 2008.
[4] He QC, Chen WH, Pan J, Wang SJ. Challenges in reliability assessment for electronics. Adv Mater Res 2010;118–120:419–23.
[5] Qin J. A new physics-of-failure based VLSI circuits reliability simulation and prediction methodology. USA: University of Maryland; 2007.
[6] Monroe EM. Optimal experimental designs for accelerated life tests with censoring and constraints. USA: Arizona State University; 2009.
[7] Li CH. Optimal step-stress plans for accelerated life testing considering reliability-life prediction. USA: Northeastern University, Boston; 2009.
[8] Ma HM. New developments in planning accelerated life tests. USA: Iowa State University; 2009.
[9] McPherson JW. Reliability physics and engineering: time-to-failure modeling. New York: Springer; 2010.
[10] Wasserman GS. Reliability verification, testing, and analysis in engineering design. New York: Marcel Dekker Inc; 2003.
[11] Qi HY, Ganesan S, Pecht M. No-fault-found and intermittent failures in electronic products. Microelectron Reliab 2008;48:663–74.
[12] Meeker WQ, Escobar LA. Statistical methods for reliability data. New York: John Wiley & Sons Inc; 1998.
[13] Wikipedia, Switched-mode power supply; February 1, 2012. [accessed 18.02.12].
[14] Wikipedia, Capacitor plague; February 11, 2012. [accessed 18.02.12].