SYSTEM RELIABILITY PREDICTION USING GLOBAL OPTIMIZATION
J. Usher†, B. Stuckman‡
† Department of Industrial Engineering, University of Louisville
‡ Department of Electrical Engineering, University of Louisville
Louisville, Kentucky 40292
Abstract.
The prediction of system reliability is an important consideration in manufacturing system design. In realistic scenarios, this problem involves the maximization of a likelihood function over a large multidimensional space. If system failure data are available in which the exact cause of failure is known, then the maximization can be performed analytically. In many situations, however, system failure data are masked, meaning that the cause of failure can only be narrowed to some subset of possible components. In this case, a global optimization algorithm must be used to maximize the likelihood function. This paper presents the mathematical framework for the formation of the likelihood function for Weibull distributed component failures in the presence of masked data. A case study is presented involving the reliability of a system based upon ten component categories. Simulated annealing is used to find the maximum likelihood estimates of the component parameters. Results are compared with other methods of estimation.
Keywords.
Optimization; global optimization; simulated annealing; reliability; maximum likelihood estimation; masked data analysis.
INTRODUCTION
Estimating the reliability of components from system life data is generally accomplished by making a series system assumption and applying a competing risks model. The observable quantities of interest are the lifelength of the system (failure or censoring time) and the exact component causing failure. Finding maximum likelihood estimates (MLEs) for component life distribution parameters has been widely addressed in the literature. For numerous references and results, see for example Nelson (1982).
The component reliability estimates that are obtained from analysis of system life data are extremely useful because they reflect the reliability of components after their assembly into an operational system. As such, the estimates account for the many degrading effects introduced by the system manufacturing, assembly, distribution and installation processes. Because of these advantages, companies such as IBM are beginning to implement computer plans designed to track such system life data and generate component reliability estimates, c.f. Usher, Alexander, and Thompson (1990).
In practice, however, this type of analysis is often confounded by the problem of masking, i.e., where the exact cause of system failure is unknown. This occurs frequently in complex systems and in field data where the cause of failure may only be isolated to some subset of components, such as a circuit card containing many individual components. The resulting quantities observed are then the lifelength of the system and partial information on the cause of failure. Due to the increasing costs of system downtime and the complexity of detailed failure analysis, masked system life data is becoming more widespread. Generally this forces many analysts to discard data that could be effectively used in the estimation process.
In this research, we address the problem of obtaining MLEs of component reliability from masked system life data. The method of maximum likelihood yields estimators with excellent statistical properties but, as we will show here, it is particularly difficult to apply in the masked data case.
Section 2 describes the statistical development of the likelihood function for the masked data case. In Section 3 we briefly describe the simulated annealing method of global optimization which has been used to maximize the likelihood function. Section 4 describes the data set used in the research and the estimates that have been found using global optimization. In Section 5 we discuss future directions for the research.
THE STATISTICAL MODEL
As an extension to the work of Guess, Usher and Hodgson (1991), and Usher and Guess (1989), assume that n non-identical systems have been placed onto test. Further suppose that each of the components in these systems can be placed into one of K classes. Let $N_{ik}$ represent the number of type-k components in the ith system. As before, let $T_i$ represent the time to failure of the ith system. Let $f_k(t)$ and $R_k(t)$ be the density and reliability function for a type-k component lifelength with parameter vector $\underline{\theta}_k$. Let $\delta_i$ be the indicator variable to denote system failure or censoring, where we define:
$$\delta_i = \begin{cases} 1 & \text{if system } i \text{ fails} \\ 0 & \text{if system } i \text{ is censored} \end{cases}$$
To allow for the effect of masking, let $Q_{ik}$ denote the quantity of type-k components that are suspected of causing failure in the ith system. Then, letting $L_i$ represent the contribution of system i to the full likelihood, we find the generalized likelihood expression:
[Equation (1), the expression for $L_i$, is not legible in the source.]
The likelihood of the full sample is then given as:
$$L = \prod_{i=1}^{n} L_i \qquad (2)$$
Under the assumption that all components have identically distributed Weibull life, we let:
$$f_k(t) = \frac{\beta_k}{\theta_k}\left(\frac{t}{\theta_k}\right)^{\beta_k - 1} \exp\left[-\left(\frac{t}{\theta_k}\right)^{\beta_k}\right]$$
The MLEs, $\hat{\theta}_k$, $\hat{\beta}_k$ for k = 1, 2, ..., K, are the values of $\theta_k$, $\beta_k$ that maximize (2) or, equivalently, $\mathcal{L} = \ln(L)$. The complexity of maximizing (2) is brought on by the fact that (1) represents a product of summation terms. Thus, the optimization must be performed over all parameters simultaneously, rather than separably.
Even a cursory review of the literature would reveal an abundance of well-known local optimization methods. However, these algorithms may not be applicable because they do not guarantee that a global optimum will be found. In multi-parameter situations they often converge to a local optimum. Most of these methods also require good initial starting points to guarantee convergence. In addition, the faster gradient methods, such as Newton-Raphson, require first and second partial derivatives of the likelihood function. For the masking case, these are not available in closed form, therefore requiring even more numerical computation. The overall result is that MLEs, with their desirable statistical properties (invariance, asymptotic efficiency and normality, etc.), are extremely expensive and difficult to obtain without the use of sophisticated global optimization methods.
SIMULATED ANNEALING
In order to maximize (2), we have employed a general simulated annealing algorithm. There are many different simulated annealing algorithms. Collins, Eglese, and Golden (1990) give an annotated bibliography of many of these methods and applications. The general method can be explained in two parts.
Step 1) A starting point is chosen, either randomly or by the user. Next, a new point is selected randomly from the neighborhood of the previous point. This neighborhood may be described by a uniform distribution of fixed circular or square radius or by virtually any other distribution centered around the previous point. If the new point yields a function evaluation, f(X), which is less than the current minimum function value, f_min (in the case of minimization; greater in the case of maximization), the process is repeated; otherwise step 2 is performed.
Step 2) If the new function evaluation, f(X), is greater than the current function minimum, f_min, then an acceptance expression is used as the decision function to determine if the new point will be accepted. The acceptance expression is the failsafe which allows the algorithm to exit the region of a local solution by deliberately selecting a point which is not an improvement over the previous point. This acceptance expression can vary as the search progresses and may be given as a Boltzmann distribution or a Cauchy distribution.
Simulated annealing algorithms can have computational complexity of O(i), where i is the number of iterations. These algorithms do not require bounded variables; thus, they attempt to search a
space from -∞ to +∞ in each dimension. As indicated above, the search is initialized by a single point, chosen either randomly or by the user. These algorithms are, by nature, nondeterministic since the progression of the search is dictated by a series of random numbers. Convergence of the algorithm on the global solution can be shown probabilistically for a bounded search space.
ANALYSIS AND RESULTS
System Description
The system studied in this research consists of 145 components configured in a series arrangement. To facilitate analysis, the components have been placed into 10 categories as follows:
Component         Code
Transmit          1
NMOS              2
Capacitors        3
Connectors        4
Bipolar           5
BiCMOS            6
Oscillators       7
Var. Resistors    8
CM                9
Power             10
Qty. (per-category values; row alignment not recoverable): 1, 111, 4, 1, 15, 2, 7, 1, 1
Sample Data
The life data used for estimation of these component reliabilities consist of a sample of 7700 systems. Through careful analysis of these data we find that they consist of 466 failure times and 7234 censoring times. Of the 466 system failures, 358 have a known cause of failure. That is, there are 108 system failures where the cause of failure has been isolated only to some subset of components. These masked failures can be summarized as follows:
69 failures with cause isolated to one of 2 components
31 failures with cause isolated to one of 3 components
12 failures with cause isolated to one of 4 components
6 failures with cause isolated to one of 5 components
Total 108 masked failures
Data Analysis
The simulated annealing algorithm, described in Section 3, was applied to the problem of maximizing equation (2) for the given data set. For the system considered here, i.e., 10 component categories each having two parameters, the problem is one of maximizing a non-linear function over a 20-dimensional search space.
Two separate runs were made with different starting values to ensure that the true maximum was being found. The starting values for Run #1 were $\theta_k^{(0)} = 10{,}000$ and $\beta_k^{(0)} = 2.0$ for k = 1, 2, ..., 10. The procedure was allowed to run in batch mode for a period of approximately 30 days on the VAX cluster at the University of Louisville Computing Center. In this 30 day period, approximately 360 hours of CPU time were charged. The results of the iterative search procedure are presented in Table 1 and the resulting MLEs are given in Table 2.
From Table 1 we see that the initial starting point yielded a log-likelihood value of ln L = -12437.5. After 1,190,000 iterations the procedure had found a maximum at ln L = -5645.52. Looking further at Table 1 we see that the search gets near the maximum very quickly, i.e., after only 100,000 iterations (approximately 30 minutes of CPU time) it has found a value of ln L = -5649.0. However, the algorithm requires considerable time to significantly improve upon its quick success. This is an inherent feature of the simulated annealing algorithm.
The resulting MLEs for each of the 10 component types are given in Table 2. From the table we see that each component category has a shape parameter greater than 1. This indicates that the components are exhibiting an increasing failure rate, a surprising result in that most electronic components yield decreasing failure rates. Further work is underway to understand the implications of this result.
To help ensure that the search was in fact finding a global maximum, the procedure was repeated with different starting values. The starting values for Run #2 were $\theta_k^{(0)} = 1{,}000$ and $\beta_k^{(0)} = 0.5$ for k = 1, 2, ..., 10. The procedure was again allowed to run in batch mode for a period of approximately 14 days on the VAX cluster at the University of Louisville Computing Center. In this 14 day period, approximately 100 hours of CPU time were charged. The results of the iterative search procedure are presented in Table 3 and the resulting MLEs are given in Table 4.
From Table 3, we again see that the procedure quickly converged toward the maximum. However, once near the maximum, the procedure required extensive time and effort to move further towards the true maximum. From Table 4 we see that the MLEs are consistent with those obtained in Run #1, indicating that the procedure has converged to a solution at or near the true global maximum.
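The search procedure used in both runs can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. Because the per-system likelihood (1) is not reproduced in this text, the code assumes a standard series-system form, $L_i = [\sum_k Q_{ik} h_k(t_i)]^{\delta_i} \prod_k R_k(t_i)^{N_{ik}}$ with Weibull hazard $h_k = f_k / R_k$. The Gaussian neighborhood, the Boltzmann acceptance rule, the geometric cooling schedule, the search over log-parameters, the synthetic two-type data set, and every function name below are likewise hypothetical choices made for the example.

```python
import math
import random

def weibull_log_lik(params, systems):
    """Masked-data log-likelihood for series systems (assumed form, see text).

    params:  list of (theta_k, beta_k), one Weibull scale/shape pair per type.
    systems: list of (t, delta, N, Q); N[k] is the number of type-k components,
             Q[k] the number of type-k components suspected of causing failure.
    """
    total = 0.0
    for t, delta, N, Q in systems:
        hazard_sum = 0.0
        for k, (theta, beta) in enumerate(params):
            z = t / theta
            total -= N[k] * z ** beta  # N_ik * ln R_k(t)
            hazard_sum += Q[k] * (beta / theta) * z ** (beta - 1.0)
        if delta == 1:
            total += math.log(hazard_sum)  # ln of the summed suspect hazards
    return total

def anneal(objective, x0, n_iter=5000, step=0.1, temp=1.0, cooling=0.999, seed=0):
    """Maximize objective(x) by simulated annealing with Boltzmann acceptance."""
    rng = random.Random(seed)
    x, fx = list(x0), objective(x0)
    best_x, best_f = list(x), fx
    for _ in range(n_iter):
        # Step 1: propose a point from a Gaussian neighborhood of the current one.
        cand = [xi + rng.gauss(0.0, step) for xi in x]
        fc = objective(cand)
        # Step 2: keep improvements; keep worse points with probability
        # exp((fc - fx) / temp), which lets the search leave local optima.
        if fc >= fx or rng.random() < math.exp((fc - fx) / temp):
            x, fx = cand, fc
            if fx > best_f:
                best_x, best_f = list(x), fx
        temp *= cooling  # geometric cooling schedule (an assumption)
    return best_x, best_f

def make_demo_data(n=100, censor_time=150.0, mask_prob=0.3, seed=1):
    """Synthetic two-type series systems; all values here are hypothetical."""
    rng = random.Random(seed)
    true_params = [(100.0, 1.5), (200.0, 2.0)]
    systems = []
    for _ in range(n):
        lives = [rng.weibullvariate(th, be) for th, be in true_params]
        t, cause = min(lives), lives.index(min(lives))
        if t > censor_time:
            systems.append((censor_time, 0, [1, 1], [0, 0]))
        elif rng.random() < mask_prob:
            systems.append((t, 1, [1, 1], [1, 1]))  # cause masked
        else:
            q = [0, 0]
            q[cause] = 1
            systems.append((t, 1, [1, 1], q))  # cause known
    return systems

def make_objective(systems):
    """Search over log-parameters so that theta and beta stay positive."""
    def objective(x):
        try:
            params = [(math.exp(x[i]), math.exp(x[i + 1]))
                      for i in range(0, len(x), 2)]
            return weibull_log_lik(params, systems)
        except (OverflowError, ValueError, ZeroDivisionError):
            return -math.inf  # treat numerically extreme points as rejected
    return objective

if __name__ == "__main__":
    data = make_demo_data()
    obj = make_objective(data)
    start = [math.log(50.0), 0.0, math.log(50.0), 0.0]
    x_best, f_best = anneal(obj, start)
    print("start log-likelihood:", obj(start))
    print("best log-likelihood: ", f_best)
    print("estimates:", [round(math.exp(v), 3) for v in x_best])
```

Searching over the logarithms of the parameters is one simple way to keep the scale and shape positive without bounding the search space. Repeating the call to `anneal` from a different starting vector mirrors the two-run check described above.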
DIRECTIONS FOR FUTURE WORK
Our research on the problem of finding MLEs of component reliability from masked system life data is continuing. We are currently investigating the use of other optimization techniques, including the multi-univariate method, Stuckman (1989), and a second Bayesian search method, DIRECT, under development at the University of Louisville in conjunction with the General Motors Research Labs. In particular, we are interested in evaluating which methods yield both a high probability of convergence on the true maximum and require a minimum of CPU time. Upon completion of our research, we plan to have comparative results from which decisions can be made as to the appropriate optimization method to use for a particular type and size of data set. Those results will be documented in forthcoming reports.
Table 1 - Computational Results of Simulated Annealing (Run #1)
Iterations     Log-Likelihood
0              -12437.5
20,000         -5664.6
60,000         -5651.7
100,000        -5649.0
120,000        -5648.9
160,000        -5648.1
190,000        -5647.6
200,000        -5647.1
250,000        -5646.8
310,000        -5646.8
410,000        -5646.7
440,000        -5646.7
590,000        -5646.4
740,000        -5646.2
890,000        -5646.0
1,190,000      -5645.5
CPU Time ≈ 3 minutes per 10,000 iterations (VAX)
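The diminishing returns visible in Table 1 can be made explicit with a short calculation. The sketch below re-enters the Table 1 values and reports the log-likelihood gain per 10,000 iterations between successive rows. The variable and function names are illustrative only.

```python
# Table 1 values (Run #1): (iterations, log-likelihood)
table1 = [
    (0, -12437.5), (20_000, -5664.6), (60_000, -5651.7), (100_000, -5649.0),
    (120_000, -5648.9), (160_000, -5648.1), (190_000, -5647.6),
    (200_000, -5647.1), (250_000, -5646.8), (310_000, -5646.8),
    (410_000, -5646.7), (440_000, -5646.7), (590_000, -5646.4),
    (740_000, -5646.2), (890_000, -5646.0), (1_190_000, -5645.5),
]

def gains_per_10k(rows):
    """Log-likelihood gain per 10,000 iterations between successive rows."""
    out = []
    for (i0, l0), (i1, l1) in zip(rows, rows[1:]):
        out.append((i1, (l1 - l0) / ((i1 - i0) / 10_000)))
    return out

if __name__ == "__main__":
    for iters, gain in gains_per_10k(table1):
        print(f"up to {iters:>9,} iterations: {gain:10.4f} per 10,000")
```

Nearly all of the improvement occurs in the first interval, and the gain per 10,000 iterations falls by several orders of magnitude by the final rows.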
REFERENCES
Collins, N.E., Eglese, R.W., and Golden, B.L. (1990) "Simulated Annealing - An Annotated Bibliography", Am. Journal of Math. & Management Sciences, Vol. 8, Nos. 3 & 4, pp. 209-307.
Guess, F.M., Usher, J.S., and Hodgson, T.J. (1991) "Estimating system and component reliabilities under partial information on the cause of failure", to appear in Journal of Statistical Planning and Inference.
Nelson, W. (1982) Applied Life Data Analysis, New York, Wiley.
Stuckman, B.E. (1989a) "A Global Search Method for Optimizing Nonlinear Systems", IEEE Transactions on Systems, Man and Cybernetics, Vol. 18, No. 6, pp. 965-977.
Usher, J.S., Alexander, S.M., and Thompson, J.D. (1990) "System Reliability Prediction Based On Historical Data", Quality and Reliability Engineering International, Vol. 6, pp. 209-218.
Usher, J.S. and Guess, F.M. (1989) "An Iterative Approach For Estimating Component Reliability From Masked System Life Test Data", Quality and Reliability Engineering International, Vol. 5, pp. 257-261.
Table 3 - Computational Results of Simulated Annealing (Run #2)
Iterations     Log-Likelihood
0              -908,364.3
100            -207,786.0
10,000         -5843.0
40,000         -5811.9
90,000         -5804.0
140,000        -5647.9
190,000        -5647.3
CPU Time ≈ 3 minutes per 10,000 iterations (VAX)
Table 2 - MLEs of Weibull Parameters
(Run #1 - Starting Values: θ=10,000, β=2.0)
Comp. Type        Code    θ (hrs.)    β
Transmit          1       7,080       1.552
NMOS              2       50,567      1.284
Capacitors        3       63,392      2.623
Connectors        4       55,924      2.048
Bipolar           5       30,454      1.474
BiCMOS            6       55,283      1.860
Oscillators       7       31,743      1.268
Var. Resistors    8       26,361      2.377
CM                9       11,866      2.487
Power             10      28,846      2.206
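The link between a shape parameter above 1 and an increasing failure rate follows from the Weibull hazard function, $h(t) = (\beta/\theta)(t/\theta)^{\beta - 1}$, which rises with t whenever β > 1, is constant for β = 1, and falls for β < 1. The sketch below evaluates this hazard at two times for three of the Table 2 estimates. The evaluation times and the helper name `weibull_hazard` are illustrative choices.

```python
def weibull_hazard(t, theta, beta):
    """Weibull failure rate h(t) = (beta/theta) * (t/theta)**(beta - 1)."""
    return (beta / theta) * (t / theta) ** (beta - 1.0)

# (theta in hours, beta) for three rows of Table 2 (Run #1)
table2_rows = {
    "NMOS": (50_567.0, 1.284),
    "Capacitors": (63_392.0, 2.623),
    "Oscillators": (31_743.0, 1.268),
}

if __name__ == "__main__":
    for name, (theta, beta) in table2_rows.items():
        early = weibull_hazard(1_000.0, theta, beta)
        late = weibull_hazard(10_000.0, theta, beta)
        print(f"{name:12s} h(1,000 h) = {early:.3e}   h(10,000 h) = {late:.3e}")
```

For each of these rows the hazard at 10,000 hours exceeds the hazard at 1,000 hours, which is the increasing failure rate behavior discussed in the text.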
Table 4 - MLEs of Weibull Parameters
(Run #2 - Starting Values: θ=1,000, β=0.5)
Comp. Type        Code    θ (hrs.)    β
Transmit          1       6,979       1.519
NMOS              2       43,585      1.329
Capacitors        3       60,736      2.657
Connectors        4       59,114      2.001
Bipolar           5       49,550      1.292
BiCMOS            6       40,684      2.015
Oscillators       7       37,320      1.192
Var. Resistors    8       30,041      2.324
CM                9       20,033      1.949
Power             10      30,863      2.132
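One way to examine how closely the two runs agree is to compare the component reliabilities they imply, using the Weibull reliability function $R(t) = \exp[-(t/\theta)^{\beta}]$. The sketch below does this for a selection of component types from Table 2 and Table 4. The choice of types, the evaluation time of 5,000 hours, and the function names are assumptions made for illustration, not part of the original analysis.

```python
import math

def weibull_reliability(t, theta, beta):
    """Weibull reliability R(t) = exp(-(t/theta)**beta)."""
    return math.exp(-((t / theta) ** beta))

# (theta in hours, beta) from Table 2 (Run #1) and Table 4 (Run #2)
run1 = {"NMOS": (50_567.0, 1.284), "Capacitors": (63_392.0, 2.623),
        "Bipolar": (30_454.0, 1.474), "CM": (11_866.0, 2.487),
        "Power": (28_846.0, 2.206)}
run2 = {"NMOS": (43_585.0, 1.329), "Capacitors": (60_736.0, 2.657),
        "Bipolar": (49_550.0, 1.292), "CM": (20_033.0, 1.949),
        "Power": (30_863.0, 2.132)}

def compare_runs(t):
    """Return {type: (R from Run #1, R from Run #2)} at time t (hours)."""
    return {name: (weibull_reliability(t, *run1[name]),
                   weibull_reliability(t, *run2[name]))
            for name in run1}

if __name__ == "__main__":
    for name, (r1, r2) in compare_runs(5_000.0).items():
        print(f"{name:11s} Run #1: {r1:.4f}   Run #2: {r2:.4f}")
```

Comparing reliabilities rather than raw parameters is a design choice: the scale and shape estimates can trade off against each other, so two parameter pairs that look different may still describe similar reliability over the time range of interest.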