System Reliability Prediction Using Global Optimization


J. Usher†, B. Stuckman‡

† Department of Industrial Engineering, University of Louisville

‡ Department of Electrical Engineering, University of Louisville

Louisville, Kentucky 40292

Abstract. The prediction of system reliability is an important consideration in manufacturing system design. In realistic scenarios, this problem involves the maximization of a likelihood function over a large multidimensional space. If there is access to system failure data where the exact cause of failure is known, then the maximization can be performed analytically. In many situations, however, system failure data is masked, meaning that the cause of failure can only be narrowed to some subset of possible components. In this case, a global optimization algorithm must be used to maximize the likelihood function. This paper presents the mathematical framework for the formation of the likelihood function for Weibull distributed component failures in the presence of masked data. A case study is presented involving the reliability of a system based upon ten components. Simulated annealing is used to find the maximum likelihood estimates of the component parameters. Results are compared with other methods of estimation.

Keywords. Optimization; global optimization; simulated annealing; reliability;

maximum likelihood estimation; masked data analysis.

INTRODUCTION

Estimating the reliability of components from system life data is generally accomplished by making a series system assumption and applying a competing risks model. The observable quantities of interest are the lifelength of the system (failure or censoring time) and the exact component causing failure. Finding maximum likelihood estimates (MLEs) for component life distribution parameters has been widely addressed in the literature. For numerous references and results, see for example Nelson (1982).

The component reliability estimates that are obtained from analysis of system life data are extremely useful because they reflect the reliability of components after their assembly into an operational system. As such, the estimates account for the many degrading effects introduced by the system manufacturing, assembly, distribution and installation processes. Because of these advantages, companies such as IBM are beginning to implement computer plans designed to track such system life data and generate component reliability estimates, c.f. Usher, Alexander, and Thompson (1990).

In practice, however, this type of analysis is often confounded by the problem of masking, i.e., where the exact cause of system failure is unknown. This occurs frequently in complex systems and in field data where the cause of failure may only be isolated to some subset of components, such as a circuit card containing many individual components. The resulting quantities observed are then the lifelength of the system and partial information on the cause of failure.

Due to the increasing costs of system downtime and the complexity of detailed failure analysis, masked system life data is becoming more widespread. Generally this forces many analysts to discard data that could be effectively used in the estimation process.

In this research, we address the problem of obtaining MLEs of component reliability from masked system life data. The method of maximum likelihood yields estimators with excellent statistical properties but, as we will show here, it is particularly difficult to apply in the masked data case.

Section 2 describes the statistical development of the likelihood function for the masked data case. In Section 3 we briefly describe the simulated annealing method of global optimization, which has been used to maximize the likelihood function. Section 4 describes the data set used in the research and the estimates that have been found using global optimization. In Section 5 we discuss future directions for the research.

THE STATISTICAL MODEL

As an extension to the work of Guess, Usher and Hodgson (1991), and Usher and Guess (1989), assume that $n$ non-identical systems have been placed onto test. Further suppose that each of the components in these systems can be placed into one of $K$ classes. Let $N_{ik}$ represent the number of type-$k$ components in the $i$th system. As before, let $T_i$ represent the time to failure of the $i$th system. Let $f_k(t)$ and $R_k(t)$ be the density and reliability function for a type-$k$ component lifelength with parameter vector $\underline{\theta}_k$. Let $\delta_i$ be the indicator variable to denote system failure or censoring, where we define:

$$\delta_i = \begin{cases} 1 & \text{if system } i \text{ fails} \\ 0 & \text{if system } i \text{ is censored} \end{cases}$$

To allow for the effect of masking, let $Q_{ik}$ denote the quantity of type-$k$ components that are suspected of causing failure in the $i$th system. Then, letting $L_i$ represent the system $i$ contribution to the full likelihood, we find the generalized likelihood expression:

[Equation (1), the expression for $L_i$, is missing from the source text.]

The likelihood of the full sample is then given as:

$$L = \prod_{i=1}^{n} L_i \qquad (2)$$

Under the assumption that all type-$k$ components have identically distributed Weibull life, we let:

$$f_k(t) = \frac{\beta_k}{\theta_k} \left( \frac{t}{\theta_k} \right)^{\beta_k - 1} \exp\left[ -\left( \frac{t}{\theta_k} \right)^{\beta_k} \right]$$

The MLEs, $\hat{\theta}_k$, $\hat{\beta}_k$ for $k = 1, 2, \ldots, K$, are the values of $\theta_k$, $\beta_k$ that maximize (2) or, equivalently, $\ell = \ln(L)$. The complexity of maximizing (2) is brought on by the fact that (1) represents a product of summation terms. Thus, the optimization must be performed over all parameters simultaneously, rather than separably.

Even a cursory review of the literature would reveal an abundance of well-known local optimization methods. However, these algorithms may not be applicable because they do not guarantee that a global optimum will be found. In multi-parameter situations they often converge to a local optimum. Most of these methods also require good initial starting points to guarantee convergence. In addition, the faster gradient methods, such as Newton-Raphson, require first and second partial derivatives of the likelihood function. For the masking case, these are not available in closed form, therefore requiring even more numerical computation.

The overall result is that MLEs, with their desirable statistical properties (invariance, asymptotic efficiency and normality, etc.), are extremely expensive and difficult to obtain without the use of sophisticated global optimization methods.

SIMULATED ANNEALING

In order to maximize (2), we have employed a general simulated annealing algorithm. There are many different simulated annealing algorithms. Collins (1990) gives an annotated bibliography of many of these methods and applications. The general method can be explained in two parts.

Step 1) A starting point is chosen, either randomly or by the user. Next, a new point is selected randomly from the neighborhood of the previous point. This neighborhood may be described by a uniform distribution of fixed circular or square radius or by virtually any other distribution centered around the previous point. If the new point yields a function evaluation, $f(X)$, which is less than the current minimum function value, $f_{min}$ (in the case of minimization; greater, in the case of maximization), the process is repeated; otherwise step 2 is performed.

Step 2) If the new function evaluation, $f(X)$, is greater than the current function minimum, $f_{min}$, then an acceptance expression is used as the decision function to determine if the new point will be accepted. The acceptance expression is the failsafe which allows the algorithm to exit the region of a local solution by deliberately selecting a point which is not an improvement over the previous point. This acceptance expression can vary as the search progresses and may be given as a Boltzmann distribution or a Cauchy distribution.

Simulated annealing algorithms can have a computational complexity of $O(i)$, where $i$ is the number of iterations. These algorithms do not require bounded variables; thus, they attempt to search a space from $\pm\infty$ in each dimension.
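The two-step procedure can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `simulated_annealing`, the cooling schedule `t0 / (1 + i)`, the square neighborhood of half-width `step`, and the toy objective are all assumptions made for illustration. The sketch is written for maximization, uses a Boltzmann acceptance expression, and compares each candidate with the current point (the usual Metropolis form) while separately tracking the best value seen.

```python
import math
import random


def simulated_annealing(f, x0, n_iter=20000, step=0.5, t0=1.0, seed=0):
    """Maximize f from starting point x0 (illustrative sketch only)."""
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    best_x, best_f = list(x), fx
    for i in range(n_iter):
        # Assumed cooling schedule: temperature falls as the search progresses.
        temp = t0 / (1.0 + i)
        # Step 1: draw a candidate uniformly from a square neighborhood of x.
        cand = [xi + rng.uniform(-step, step) for xi in x]
        f_cand = f(cand)
        # An improvement is always accepted. Step 2: a worse point is accepted
        # with Boltzmann probability exp((f_cand - fx) / temp), which lets the
        # search leave the region of a local solution.
        if f_cand >= fx or rng.random() < math.exp((f_cand - fx) / temp):
            x, fx = cand, f_cand
            if fx > best_f:
                best_x, best_f = list(x), fx
    return best_x, best_f


def toy_objective(x):
    """Hypothetical smooth test function with its maximum of 0 at (1, -2)."""
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)
```

Calling `simulated_annealing(toy_objective, [0.0, 0.0])` returns a point close to (1, -2). Because the search is driven by random numbers, a different `seed` gives a different path, which is why repeating a run from different starting values is a natural check on the result.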

As indicated above, the search is initialized by a single point, chosen either randomly or by the user. These algorithms are, by nature, nondeterministic since the progression of the search is dictated by a series of random numbers. Convergence of the algorithm on the global solution can be shown probabilistically for a bounded search space.

ANALYSIS AND RESULTS

System Description

The data used in this research come from a system of 145 components configured in a series arrangement. To facilitate analysis, the components have been placed into 10 categories as follows:

Component        Code
Transmit         1
NMOS             2
Capacitors       3
Connectors       4
Bipolar          5
BiCMOS           6
Oscillators      7
Var. Resistors   8
CM               9
Power            10

Qty. (values in the order extracted; row alignment uncertain): 1, 111, 4, 1, 15, 2, 7, 1, 1

Sample Data

The life data used for estimation of these component reliabilities consists of a sample of 7700 systems. Through careful analysis of this data we find that it consists of 466 failure times and 7234 censoring times. Of the 466 system failures, 358 have known cause of failure. That is, there are 108 system failures where the cause of failure has been isolated only to some subset of components. These masked failures can be summarized as follows:

69 failures with cause isolated to one of 2 components
31 failures with cause isolated to one of 3 components
12 failures with cause isolated to one of 4 components
6 failures with cause isolated to one of 5 components
Total 108 masked failures

Data Analysis

The simulated annealing algorithm, described in Section 3, was applied to the problem of maximizing equation (2) for the given data set. For the system considered here, i.e., 10 component categories each having two parameters, the problem is one of maximizing a non-linear function over a 20-dimensional search space.

Two separate runs were made with different starting values to ensure that the true maximum was being found. The starting values for Run #1 were $\theta_k^{(0)} = 10{,}000$ and $\beta_k^{(0)} = 2.0$ for $k = 1, 2, \ldots, 10$. The procedure was allowed to run in batch mode for a period of approximately 30 days on the VAX cluster at the University of Louisville Computing Center. In this 30 day period, approximately 360 hours of CPU time were charged. The results of the iterative search procedure are presented in Table 1 and the resulting MLEs are given in Table 2.

From Table 1 we see that the initial starting points yielded a log-likelihood value of $\ell = -12437.5$. After 1,190,000 iterations the procedure had found a maximum at $\ell = -5645.52$. Looking further at Table 1 we see that the search gets near the maximum very quickly, i.e., after only 100,000 iterations (approximately 30 minutes of CPU time) it has found a value of $\ell = -5649.0$. But the algorithm requires considerable time to significantly improve upon its quick success. This is an inherent feature of the simulated annealing algorithm.

The resulting MLEs for each of the 10 component types are given in Table 2. From the table we see that each component category has a shape parameter greater than 1. This indicates that the components are exhibiting an increasing failure rate, a surprising result in that most electronic components yield decreasing failure rates. Further work is underway to understand the implications of this result.

To help ensure that the search was in fact finding a global maximum, the procedure was repeated with different starting values. The starting values for Run #2 were $\theta_k^{(0)} = 1{,}000$ and $\beta_k^{(0)} = 0.5$ for $k = 1, 2, \ldots, 10$. The procedure was again allowed to run in batch mode for a period of approximately 14 days on the VAX cluster at the University of Louisville Computing Center. In this 14 day period, approximately 100 hours of CPU time were charged. The results of the iterative search procedure are presented in Table 3 and the resulting MLEs are given in Table 4.

From Table 3, we again see that the procedure quickly converged toward the maximum. But once near the maximum, the procedure required extensive time and effort to move further towards the true maximum. From Table 4 we see that the MLEs are consistent with those obtained in Run #1, indicating that the procedure has converged to a solution at or near the true global maximum.
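For orientation, the objective in this analysis is the log-likelihood $\ell = \ln(L)$ of Section 2. The sketch below shows one way such a function might be evaluated in Python. It assumes the usual series-system competing-risks form, in which system $i$ contributes $\left[\sum_k Q_{ik} h_k(t_i)\right]^{\delta_i} \prod_k R_k(t_i)^{N_{ik}}$ with Weibull hazard $h_k = f_k / R_k$; that form is an assumption here, not a transcription of the paper's equation (1), and all function and argument names are hypothetical.

```python
import math


def weibull_hazard(t, theta, beta):
    """Weibull hazard h(t) = (beta/theta) * (t/theta)**(beta - 1), for t > 0."""
    return (beta / theta) * (t / theta) ** (beta - 1.0)


def weibull_log_reliability(t, theta, beta):
    """ln R(t) = -(t/theta)**beta for a Weibull lifelength."""
    return -((t / theta) ** beta)


def masked_log_likelihood(times, failed, n_comp, suspects, theta, beta):
    """Log-likelihood of masked series-system data (assumed standard form).

    times[i]       lifelength (failure or censoring time) of system i
    failed[i]      1 if system i failed, 0 if it was censored
    n_comp[i][k]   number of type-k components in system i
    suspects[i][k] number of type-k components suspected in system i
    theta[k], beta[k]  Weibull scale and shape for component type k
    """
    total = 0.0
    for t, d, n_i, q_i in zip(times, failed, n_comp, suspects):
        # Every component in the series system survived up to time t.
        total += sum(n * weibull_log_reliability(t, th, b)
                     for n, th, b in zip(n_i, theta, beta))
        if d:
            # The failure was caused by one of the suspected components.
            total += math.log(sum(q * weibull_hazard(t, th, b)
                                  for q, th, b in zip(q_i, theta, beta)))
    return total
```

With 10 component types, `theta` and `beta` together hold the 20 parameters of the search space. The logarithm of a sum over suspected types is what prevents the parameters of different types from being estimated separately when failures are masked.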

Table 1 - Computational Results of Simulated Annealing (Run #1)

Iterations     Log-Likelihood
0              -12437.5
20,000         -5664.6
60,000         -5651.7
100,000        -5649.0
120,000        -5648.9
160,000        -5648.1
190,000        -5647.6
200,000        -5647.1
250,000        -5646.8
310,000        -5646.8
410,000        -5646.7
440,000        -5646.7
590,000        -5646.4
740,000        -5646.2
890,000        -5646.0
1,190,000      -5645.5

CPU Time ≈ 3 minutes per 10,000 iterations (VAX)

Table 2 - MLEs of Weibull Parameters (Run #1 - Starting Values: $\theta$ = 10,000, $\beta$ = 2.0)

Comp. Type       Code   $\theta$ (hrs.)   $\beta$
Transmit         1      7,080             1.552
NMOS             2      50,567            1.284
Capacitors       3      63,392            2.623
Connectors       4      55,924            2.048
Bipolar          5      30,454            1.474
BiCMOS           6      55,283            1.860
Oscillators      7      31,743            1.268
Var. Resistors   8      26,361            2.377
CM               9      11,866            2.487
Power            10     28,846            2.206

Table 3 - Computational Results of Simulated Annealing (Run #2)

Iterations     Log-Likelihood
0              -908,364.3
100            -207,786.0
10,000         -5843.0
40,000         -5811.9
90,000         -5804.0
140,000        -5647.9
190,000        -5647.3

CPU Time ≈ 3 minutes per 10,000 iterations (VAX)

Table 4 - MLEs of Weibull Parameters (Run #2 - Starting Values: $\theta$ = 1,000, $\beta$ = 0.5)

Comp. Type       Code   $\theta$ (hrs.)   $\beta$
Transmit         1      6,979             1.519
NMOS             2      43,585            1.329
Capacitors       3      60,736            2.657
Connectors       4      59,114            2.001
Bipolar          5      49,550            1.292
BiCMOS           6      40,684            2.015
Oscillators      7      37,320            1.192
Var. Resistors   8      30,041            2.324
CM               9      20,033            1.949
Power            10     30,863            2.132

DIRECTIONS FOR FUTURE WORK

Our research on the problem of finding MLEs of component reliability from masked system life data is continuing. We are currently investigating the use of other optimization techniques, including the multi-univariate method, Stuckman (1989), and a second Bayesian search method, DIRECT, under development at the University of Louisville in conjunction with the General Motors Research Labs. Those results will be documented in forthcoming reports. In particular, we are interested in evaluating which methods yield both a high probability of convergence on the true maximum and require a minimum of CPU time. Upon completion of our research, we plan to have comparative results from which decisions can be made as to the appropriate optimization method to use for a particular type and size of data set.

REFERENCES

Collins, N.E., Eglese, R.W., and Golden, B.L. (1990) "Simulated Annealing -- An Annotated Bibliography", Am. Journal of Math. & Management Sciences, Vol. 8, Nos. 3 & 4, pp. 209-307.

Guess, F.M., Usher, J.S., and Hodgson, T.J. (1991) "Estimating system and component reliabilities under partial information on the cause of failure", to appear in Journal of Statistical Planning and Inference.

Nelson, W. (1982) Applied Life Data Analysis, New York, Wiley.

Stuckman, B.E. (1989a) "A Global Search Method for Optimizing Nonlinear Systems", IEEE Transactions on Systems, Man and Cybernetics, Vol. 18, No. 6, pp. 965-977.

Usher, J.S., Alexander, S.M., and Thompson, J.D. (1990) "System Reliability Prediction Based On Historical Data", Quality and Reliability Engineering International, Vol. 6, pp. 209-218.

Usher, J.S. and Guess, F.M. (1989) "An Iterative Approach For Estimating Component Reliability From Masked System Life Test Data", Quality and Reliability Engineering International, Vol. 5, pp. 257-261.