Microelectron. Reliab., Vol. 32, No. 6, pp. 759-762, 1992. Printed in Great Britain.
0026-2714/92 $5.00 + .00 © 1992 Pergamon Press Ltd
BURN-IN TO IMPROVE WHICH MEASURE OF RELIABILITY?

FRANK GUESS and ESTEBAN WALKER
Department of Statistics, University of Tennessee, Knoxville, TN 37996-0532, U.S.A.

and

DORINDA GALLANT
Department of Mathematics, Winthrop College, Rock Hill, SC 29733, U.S.A.

(Received for publication 22 March 1991)

Abstract--Burn-in plans or screens are often used to improve the "reliability" of systems, subassemblies, and components. There are, however, different ways of measuring reliability, and different measures sometimes give contradictory results, because each measure assesses reliability from a different point of view. These differences are crucial for designing in reliability and for devising burn-in plans. It is known, for example, that a burn-in plan that optimizes (or improves) one reliability measure does not necessarily yield an optimum (or an improvement) for another measure. After introducing and discussing several reliability measures, we present examples that illustrate how differently they behave. We stress the importance of understanding what the end user of a product needs in terms of "reliability" before the design stage, through all the developmental stages, and for burn-in plans.
1. INTRODUCTION

Burn-in is used to improve the reliability of a component, subsystem, or system in a variety of industries. Jensen and Petersen [1] write, "Burn-in has long been recognized as a useful method for screening out early failures. It might be argued, very reasonably, that if all parts were made properly in the first place, then burn-in, or indeed any other form of post-production screening, should not be necessary. For quite a large number of electronic components or mechanical devices with a long history of production this will certainly be true. However, in a time of rapidly changing technologies and production methods, coupled with an increasing awareness of reliability, most manufacturers will be forced to instigate some sort of reliability screening to minimize the number of early failures in the field..."

Tustin [2] provides an excellent short introduction to ESS (Environmental Stress Screening). He discusses an example from IBM of a screen being used upstream "... in the design and production of the company's Model 4234 printer." According to Tustin, the screens saved an estimated $1,000,000 or more by improving design and production and thereby preventing early warranty failures.

Burn-in plans for devices with standard bathtub-shaped (and some other standard) failure rates are presented in Jensen and Petersen [1]. For additional information see the paper by Wurnik and Pelloth [3] on integrated circuits and the recent book by Nelson [4] on accelerated life testing.
In some cases, minimizing the failure rate can also maximize the mean residual life (see, for example, Guess and Park [5]). In other situations, however, the point at which the mean residual life is maximized is strictly earlier than the time point that minimizes the failure rate [6]. Similar situations are possible for other reliability measures (these terms are formally defined in Section 2).

In some settings it may be more important to optimize one particular measure of reliability; e.g., if the goal is to improve the average life, then the mean residual life (MRL) is the relevant measure. The MRL could also be the measure of interest for staffing policies for certain positions with high turnover. A new policy identifying key variables (e.g., training, merit incentives, and appropriate benefits) for longer average retention (MRL) could save a company money and keep well-trained staff from leaving too early.

A costly screen of a product might actually succeed in minimizing (or reducing) the failure rate, but at the expense of an MRL below its optimum. On the other hand, a less costly screen could, in some cases, make the MRL optimal, although the failure rate would not be at its minimum. It is crucial to understand what the end user of a product needs in terms of "reliability" before the design stage, through all the developmental stages, and for burn-in plans.

In the following section, various measures of reliability are presented and discussed. Two examples are included in Section 3. A summary and conclusions are given in Section 4.
2. MEASURES OF RELIABILITY
We first state notation and standard definitions of some basic measures of reliability. Let $X$ be the random life (strength) of a component, subassembly, or system, with $f(x)$ and $F(x)$ its density and distribution functions, respectively. We define

$$\bar F(x) = 1 - F(x) = P(X > x) = \int_x^\infty f(u)\,du \quad \text{(reliability function)}$$

$$\bar F(x \mid t) = \bar F(x + t)/\bar F(t) = P(X > x + t \mid X > t) \quad \text{(conditional reliability)}$$

$$r(t) = f(t)/\bar F(t) \quad \text{(failure rate)}$$

$$m(t) = E(X - t \mid X > t) = \int_t^\infty \bar F(u)\,du \,/\, \bar F(t) \quad \text{(mean residual life),}$$
where $\bar F(t) > 0$ for the conditional reliability, failure rate, and mean residual life.

The reliability function and the conditional reliability (CR) are relevant when the interest is in the probability of surviving a "mission" of fixed length $x$. The CR can be thought of as the reliability function after a burn-in of time $t$; this follows from the definition of $\bar F(x \mid t)$ and the fact that $\bar F(x) = \bar F(x \mid 0)$. The failure rate (FR) is a measure of "local" reliability that is related to the number of failures in a short time interval after the burn-in time $t$. In contrast, the nature of the mean residual life (MRL) makes it more of a "long term" measure of reliability. See Guess and Proschan [7] for more details.

The objective of a burn-in plan is to determine the time $t$ that optimizes one of the above measures of reliability. As will be seen later, these optimal burn-in times frequently do not coincide. Therefore, the decision of which reliability measure is most relevant in a particular situation becomes crucial.

When there exists a $t$ such that $\bar F(x \mid t) > \bar F(x)$, a burned-in device of age $t$ is more reliable than a new device, in terms of the probability of surviving a mission of length $x$. If no such $t$ exists for that particular $x$, then burn-in is not needed and the device is said to be new-better-than-used for a mission of length $x$. (Note that the general condition of new-better-than-used, NBU, implies that no such $t$ exists for any mission length.)

The effect of burn-in depends on the properties of the life-length distribution of the item under study. See, for example, Guess et al. [8], Hollander et al. [9, 10], Klefsjö [11], Tiwari and
Zalkikar [12], or Wells and Tiwari [13] and their discussion of various classes of distributions.

Chandrasekaran [6] compares the MRL and the FR. He gives situations where the maximum MRL occurs at time $t_{MRL}$, while the FR is minimized at $t_{FR}$, with $t_{MRL} < t_{FR}$. If the customer really needs a larger average life rather than fewer initial failures, then the goal should be to maximize the MRL instead of minimizing the FR. Note that the producer also benefits from using the smaller burn-in time, $t_{MRL}$. For other articles on the MRL and/or the FR see Guess and Proschan [7] (a survey paper on MRL), Guess and Park [14], Guess and Kitchin [15], Gupta and Keating [16], and Gupta and Kirmani [17]. Compare also Launer [18] and his references on median (percentile) residual life.

In some situations, $t_{FR} = t_{CR}(x)$, where $t_{CR}(x)$ is the burn-in time that maximizes the CR for a mission of length $x$. For products with very short mission length $x$, a burn-in time $t$ for a low (or minimum) FR will also yield a high (or maximum) CR because

$$\bar F(x \mid t) = \exp\left(-\int_t^{t+x} r(u)\,du\right) \approx \exp(-x\,r(t)).$$

The above approximation holds in any situation in which the FR is (approximately) constant on the interval $(t, t + x)$, with equality when the FR is exactly constant there. This is the case, for example, for bathtub and other FRs for which the flatter useful life after burn-in $t$ is longer than $x$ (see the second example of Section 3).

Similarly, the optimal burn-in for the MRL is not necessarily the same as that for the CR. This is illustrated in the first example of Section 3.

3. EXAMPLES
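The empirical MRL and CR used in the first example can be computed directly from a sample. The following Python sketch is an illustration, not the authors' code: it assumes that the 49 values of Table 1 are complete (uncensored) lifetimes, it uses simple plug-in estimators of the Section 2 definitions, and it searches only a grid made of 0 and the observed failure times. The helper names `survivors`, `empirical_cr`, `empirical_mrl`, `burn_in_grid`, and `best_burn_in` are hypothetical.

```python
from fractions import Fraction

# Table 1: Kevlar 49/epoxy strand lives in hours at 7.008 kg stress.
# Assumption: all 49 values are treated as complete (uncensored) lifetimes.
KEVLAR_HOURS = [
    1051, 1337, 1389, 1921, 1942, 2322, 3629,
    4006, 4012, 4063, 4921, 5445, 5620, 5817,
    5905, 5956, 6068, 6121, 6473, 7501, 7886,
    8108, 8546, 8666, 8831, 9106, 9711, 9806,
    10205, 10396, 10861, 11026, 11214, 11362, 11604,
    11608, 11745, 11762, 11895, 12044, 13520, 13670,
    14110, 14496, 15395, 16179, 17092, 17568, 17568,
]


def survivors(data, t):
    """Return the lifetimes that exceed age t."""
    return [life for life in data if life > t]


def empirical_cr(data, x, t):
    """Plug-in conditional reliability: #{X > t + x} / #{X > t}."""
    alive = survivors(data, t)
    if not alive:
        raise ValueError("no unit survives to age t")
    return Fraction(len(survivors(data, t + x)), len(alive))


def empirical_mrl(data, t):
    """Plug-in mean residual life: average of X - t over units with X > t."""
    alive = survivors(data, t)
    if not alive:
        raise ValueError("no unit survives to age t")
    return Fraction(sum(alive), len(alive)) - t


def burn_in_grid(data):
    """Candidate burn-in times: 0 and every failure time except the largest."""
    return [0] + sorted(set(data))[:-1]


def best_burn_in(measure, grid):
    """Smallest grid point at which measure(t) is largest."""
    # max() returns the first maximal element, so sorting gives the smallest.
    return max(sorted(grid), key=measure)


if __name__ == "__main__":
    grid = burn_in_grid(KEVLAR_HOURS)
    t_cr = best_burn_in(lambda t: empirical_cr(KEVLAR_HOURS, 2000, t), grid)
    t_mrl = best_burn_in(lambda t: empirical_mrl(KEVLAR_HOURS, t), grid)
    print("grid maximizer of the empirical CR (x = 2000 h):", t_cr)
    print("grid maximizer of the empirical MRL:", t_mrl)
```

Exact fractions are used so that ties and comparisons between burn-in times are not affected by floating-point rounding. Changing the mission length passed to `empirical_cr` shows how the CR optimum moves with $x$ while the MRL optimum does not.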
The following data on Kevlar 49/epoxy strand lifetimes at 7.008 kg stress were analyzed by Schmoyer [19]. He writes, "Kevlar 49 is an organic reinforcing fiber made by Du Pont, which is used in the manufacture of, among other things, high-quality bicycle tires and canoes." Its many uses also include bulletproof vests, tennis rackets, and boat hulls. These data, from 49 tested strands, are reproduced in Table 1.

Table 1. Kevlar 49/epoxy strand lives in hours at 7.008 kg stress

 1051   1337   1389   1921   1942   2322   3629
 4006   4012   4063   4921   5445   5620   5817
 5905   5956   6068   6121   6473   7501   7886
 8108   8546   8666   8831   9106   9711   9806
10205  10396  10861  11026  11214  11362  11604
11608  11745  11762  11895  12044  13520  13670
14110  14496  15395  16179  17092  17568  17568

Schmoyer's [19] purpose was to extrapolate low-stress response probabilities of survival (failure) from higher-stress response data, assuming only mild, reasonable nonparametric conditions. Our goal is to
illustrate the different behaviors of the reliability measures.

Figure 1 displays the empirical MRL and the CR for a mission of length $x = 2000$ h as functions of the burn-in time $t$. As is apparent from the figure, the MRL is maximized at $t_{MRL} = 0$, implying that burn-in does not improve the average remaining life of the product. The CR function, on the other hand, indicates that the probability of surviving a mission of length 2000 h is maximized if the product is burned in for $t_{CR}(2000) = 1942$ h. Recall that the CR depends on the mission length $x$. For example, when the mission length is changed to $x = 3500$ h, the optimal burn-in times for the CR and the MRL coincide: $t_{MRL} = t_{CR}(3500) = 0$.

The second example illustrates that the optimal burn-in time for the FR is not necessarily the optimal time for the CR. It is easy to extend this simple example to more complicated bathtub-shaped FRs via polynomial insertions and connections over appropriate intervals. Let

$$r(t) = \begin{cases} 1 & \text{for } 0 \le t < 1, \\ b & \text{for } 1 \le t < 2, \\ c & \text{for } t \ge 2, \end{cases}$$
where $0 < b < 1 < c$. Clearly this FR is minimal for any time in the interval $[1, 2)$; thus, the optimal economical burn-in time for the FR is $t_{FR} = 1$. If the concern is for the best "reliability" as measured by the probability of survival for a mission of length $x$, then $\bar F(x \mid t)$ is maximized at the same time, $t_{CR}(x) = 1$, only if $x \le 1$; otherwise the optimum is $t_{CR}(x) = \max\{0, 2 - x\} < t_{FR}$. For example, if $x \ge 2$ then the optimum is $t_{CR} = 0$, i.e., burn-in is not needed. To see this, note again that
$$\bar F(x \mid t) = \exp\left(-\int_t^{t+x} r(u)\,du\right),$$

and therefore, in order to maximize $\bar F(x \mid t)$, we need to minimize the cumulative FR on the interval $[t, t + x)$.

Fig. 1. Empirical mean residual life (---) and conditional reliability (*) for a mission of length x = 2000. Kevlar data. [Plot not reproduced; vertical axes: mean residual life, 1000 to 9000 h, and conditional reliability, 0.00 to 1.00; horizontal axis: burn-in time, 0 to 18000 h.]

4. CONCLUSIONS

Burn-in is a valuable technique to improve the "reliability" of a product. The objective of a burn-in plan is to find the burn-in time $t$ that optimizes the appropriate reliability measure. Unfortunately, the times at which the different measures of reliability are optimized (or improved) are not necessarily the same. In designing a burn-in, one has to be keenly aware of the relevant measure of reliability for each particular end user. We want to emphasize that the reliability measure which is most important for the customer can vary. It is important to be aware of these issues before the design stage and also in planning any possible burn-in that improves (optimizes) the relevant measures of reliability.

For some products there is a need to include burn-in to push the reliability limits even higher than the current best design and production process technologies permit. For products with extremely costly field repairs, burn-in and screens are important even with "high quality design and production" (e.g., the Hubble Telescope was not screened adequately). Satellite devices, undersea cable components, etc. need burn-in and screening because of the tremendous expense of field failures.

Acknowledgements--Dorinda Gallant acknowledges research support from the Ronald McNair Post Baccalaureate Achievement Program sponsored by the United States Department of Education, Washington, DC, and thanks R. McFadden for making the grant funding possible. Esteban Walker and Frank Guess appreciate research funding from the University of Tennessee, College of Business Administration, Faculty Research Fellowships.

REFERENCES
1. F. Jensen and N. E. Petersen, Burn-in: An Engineering Approach to the Design and Analysis of Burn-in Procedures. John Wiley, New York (1982).
2. W. Tustin, Recipe for reliability: shake and bake. IEEE Spectrum 86, 37-42 (1986).
3. F. Wurnik and W. Pelloth, Functional burn-in for integrated circuits. Microelectron. Reliab. 30, 265-274 (1990).
4. W. Nelson, Accelerated Testing: Statistical Models, Test Plans, and Data Analyses. John Wiley, New York (1990).
5. F. Guess and D. H. Park, Modeling discrete bathtub and upside down bathtub mean residual life functions. IEEE Trans. Reliab. 37, 545-549 (1988).
6. R. Chandrasekaran, Optimal policies for burn-in procedures. Opsearch 14, 149-160 (1977).
7. F. Guess and F. Proschan, Mean residual life: theory and applications. In Handbook of Statistics: Quality Control and Reliability (edited by P. R. Krishnaiah and C. R. Rao), Vol. 7, pp. 215-224. North Holland, Amsterdam (1988).
8. F. Guess, M. Hollander and F. Proschan, Testing exponentiality versus a trend change in mean residual life. Ann. Statist. 14, 1388-1398 (1986).
9. M. Hollander, D. H. Park and F. Proschan, Testing whether new is better than used of a specified age, with randomly censored data. Can. J. Statist. 13, 45-52 (1985).
10. M. Hollander, D. H. Park and F. Proschan, Testing whether F is "more NBU" than is G. Microelectron. Reliab. 26, 39-44 (1986).
11. B. Klefsjö, Testing against a change in the NBUE property. Microelectron. Reliab. 29, 559-570 (1989).
12. R. C. Tiwari and J. N. Zalkikar, Testing whether F is "more IFRA" than is G. Microelectron. Reliab. 28, 703-712 (1988).
13. M. T. Wells and R. C. Tiwari, Testing whether F is "more NBU" than is G with randomly censored data. Microelectron. Reliab. 28, 901-908 (1988).
14. F. Guess and D. H. Park, Nonparametric confidence bounds on the mean residual life using censored data. IEEE Trans. Reliab. 40, 78-80 (1991).
15. F. Guess and J. Kitchin, Mean time to system recovery. Quality Reliab. Engng Int. 7, 5-6 (1991).
16. R. C. Gupta and J. P. Keating, Relations for reliability measures under length biased sampling. Scand. J. Statist. 13, 49-56 (1986).
17. R. C. Gupta and S. N. U. A. Kirmani, On order relations between reliability measures. Commun. Statist. Stochastic Models 3, 149-156 (1987).
18. R. L. Launer, Some properties and applications of the percentile residual life function. University of South Carolina Statistics Technical Report No. 151 (1989).
19. R. L. Schmoyer, Linear interpolation with a nonparametric accelerated failure-time model. J. Am. Statist. Assoc. 83, 441-449 (1988).
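The second example of Section 3 can also be checked numerically. The Python sketch below is illustrative only: it assumes the piecewise-constant failure rate $r(t) = 1$ on $[0, 1)$, $b$ on $[1, 2)$, and $c$ on $[2, \infty)$ with $0 < b < 1 < c$, it picks the arbitrary values $b = 0.5$ and $c = 2$, and it searches a finite grid of burn-in times. The function names are hypothetical.

```python
import math


def cumulative_failure_rate(s, b, c):
    """Integral of r(u) over [0, s) for r = 1 on [0,1), b on [1,2), c on [2, inf)."""
    first = min(s, 1.0)                    # length of [0, 1) covered by [0, s)
    second = max(0.0, min(s, 2.0) - 1.0)   # length of [1, 2) covered by [0, s)
    third = max(0.0, s - 2.0)              # length of [2, inf) covered by [0, s)
    return 1.0 * first + b * second + c * third


def conditional_reliability(x, t, b=0.5, c=2.0):
    """P(X > t + x | X > t) = exp(-integral of r over [t, t + x))."""
    hazard = cumulative_failure_rate(t + x, b, c) - cumulative_failure_rate(t, b, c)
    return math.exp(-hazard)


def best_burn_in_cr(x, b=0.5, c=2.0, t_max=3.0, points_per_unit=100, tol=1e-12):
    """Smallest grid time whose CR is within tol of the best CR on the grid."""
    n_points = int(t_max * points_per_unit) + 1
    grid = [i / points_per_unit for i in range(n_points)]
    values = [conditional_reliability(x, t, b, c) for t in grid]
    best = max(values)
    # Ties (a flat optimum) are resolved in favor of the shortest burn-in.
    return next(t for t, v in zip(grid, values) if v >= best - tol)


if __name__ == "__main__":
    for x in (0.5, 1.0, 1.5, 2.0, 2.5):
        print("mission length", x, "-> grid-optimal burn-in", best_burn_in_cr(x))
```

The tolerance `tol` reflects a design choice: for short missions the CR is flat over a range of burn-in times, and returning the smallest such time corresponds to the "economical" optimum. The grid result can then be compared with $t_{FR} = 1$ and with $\max\{0, 2 - x\}$.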