Reliability Engineering and System Safety 112 (2013) 82–93
Probabilistic sensitivity analysis of system availability using Gaussian processes

Alireza Daneshkhah, Tim Bedford
Department of Management Science, University of Strathclyde, Glasgow G1 1QE, UK
Article history: Received 7 January 2012; received in revised form 30 October 2012; accepted 1 November 2012; available online 27 November 2012.

Abstract
The availability of a system under a given failure/repair process is a function of time which can be determined through a set of integral equations and is usually calculated numerically. We focus here on the issue of carrying out sensitivity analysis of availability to determine the influence of the input parameters. The main purpose is to study the sensitivity of the system availability with respect to changes in the main parameters. In the simplest case, where the failure/repair process is (continuous-time, discrete-state) Markovian, explicit formulae are well known. Unfortunately, in more general cases availability is often a complicated function of the parameters without a closed form solution, so the computation of sensitivity measures can be time-consuming or even infeasible. In this paper, we show how Sobol and other related sensitivity measures can be cheaply computed to measure how changes in the model inputs (failure/repair times) influence the outputs (availability measure). We use a Bayesian framework, called Bayesian analysis of computer code output (BACCO), which is based on using a Gaussian process as an emulator (i.e., an approximation) of complex models/functions. This approach allows effective sensitivity analysis to be achieved with far smaller numbers of model runs than other methods require. The emulator-based sensitivity measure is used to examine the influence of the parameters of the failure and repair densities on the system availability. We discuss how to apply the methods practically in the reliability context, considering in particular the selection of parameters and prior distributions, and how we can ensure these may be considered independent, one of the key assumptions of the Sobol approach. The method is illustrated on several examples, and we discuss the further implications of the technique for reliability and maintenance analysis.
Keywords: Sensitivity analysis; Sobol indices; Dependency; Computer code model output; Emulator-based sensitivity; Gaussian process; System availability
1. Introduction

In this paper, we present a new approach to the sensitivity analysis of availability. In general, sensitivity analysis is concerned with understanding how changes in the model inputs (distribution parameters) influence the output. Suppose that our deterministic model can be written as y = f(x), where x is a vector of input variables (or parameters) and y is the model output. For example, the inputs could be the parameters of the failure and repair densities, θ, and the output could be the availability A(t, θ) at time t. The traditional method of examining the sensitivity of a model with respect to changes in its input variables is local sensitivity analysis, which is based on derivatives of f(·) evaluated at some 'base-line' (or central estimate) x = x_0 and indicates how the output y will change if the base-line input values are slightly perturbed
(see [11] for the different local sensitivity measures commonly used in Bayesian analysis). This is clearly of limited value in understanding the consequences of real uncertainty about the inputs, which would in practice require more than infinitesimal changes in the inputs. Furthermore, these methods are computationally very expensive for complex models and usually require a considerable number of model runs if a Monte Carlo based method is used to compute the sensitivity measures. For instance, Marseguerra et al. [21] used Monte Carlo simulation to calculate the first-order differential sensitivity indices of the basic events characterising the reliability behaviour of a nuclear safety system. They reported that the computation of the sensitivity indices for the system unavailability at the mission time by Monte Carlo simulation requires $10^7$ iterations. In another study, Reedijk [28] reported that first order reliability methods and Monte Carlo simulation have certain disadvantages and that some problems could not be solved with these methods. This issue is particularly pressing when the model is computationally expensive, so that simply computing the output for any given set of input values is a non-trivial task.
This is especially the case for large process models in engineering, environmental science, reliability analysis, etc., which may be implemented in complex computer codes requiring many minutes, hours or even days for a single run. However, in order to implement many of the standard sensitivity techniques discussed in [30] we require a very large number of model runs, so even for a model that takes just one second to run, many sensitivity analysis measures may take too long to compute. The most frequently used sensitivity indices are due to Sobol [32]. However, these require an assumption of independence (as discussed by Bedford [1]). Hence in Section 3 we discuss how one might go about choosing an appropriate parameterisation in which the sensitivity analysis can be carried out using independent uncertainty variables. It should be noted that probabilistic sensitivity analyses are often carried out with efficient sampling procedures (e.g., [14,15]), but these procedures remain computationally very expensive. We therefore present an alternative computational tool for sensitivity analysis based on the work of [22]. This is a Bayesian approach to sensitivity analysis which unifies the various methods of probabilistic sensitivity analysis briefly introduced in Section 2. The approach is computationally highly efficient and allows effective sensitivity analysis to be achieved with far smaller numbers of model runs than Monte Carlo methods require. The range of tools used in this approach also enables us to do uncertainty analysis, prediction, optimisation and calibration. Section 4 presents this method.

This paper extends work carried out by Daneshkhah and Bedford [10], where emulators were used to examine the influence of the failure and repair densities' parameters on the system availability of a simple one-component repairable system, with exponential and Weibull distributions considered for the failure and repair processes. Here, we consider the sensitivity analysis of repairable systems with more than one component, chosen so that we can use numerical integration to compute availability. The systems are: a parallel system with two components where the failure and repair distributions are exponential; a well-known standby-redundancy system where there are three parameters to be examined and the failure and repair distributions are also exponential; and a move-drive system with eight components and 17 parameters, where the repair rate is constant but the failure distributions are Weibull. There are closed forms for the availability functions of the first two systems, and we use them to validate the method. There is no closed form for the third system, and evaluating its availability at the selected parameter values requires an expensive numerical method. These examples are presented in Section 5, and we give some conclusions and possible future developments in Section 6.
2. Probabilistic sensitivity analysis

Local sensitivity analysis evaluates the influence of uncertain inputs around a point in the input space and generally relies on the estimation, at this point, of the partial derivatives of the output with respect to the inputs. This is known as a one-at-a-time measure of sensitivity because it measures the effect on the response of varying one input alone by a fixed fraction of its value (assumed known). As a consequence, if the model is nonlinear, the relative importance of the model inputs depends on the chosen point. Several investigators have tried to get around this limitation by evaluating averages of the partial derivatives at different points in the input space. Conversely, global sensitivity analysis of model output evaluates the relative importance of inputs when they are varied generously, i.e., when their uncertainty is acknowledged over a
wide range. One approach to global sensitivity analysis is the analysis of variance of the model response, originally proposed by [32]. In this setting nonlinearity in the model is not an issue. The approach can capture the fraction of the model response variance explained by a model input on its own or by a group of model inputs. In addition, it can provide the total contribution of a given input to the output variance, that is, its marginal contribution and its cooperative contribution. There are different computational techniques to perform this sensitivity analysis, e.g., [32,30,33]. We will focus on the emulator-based method for calculating this sensitivity measure presented by [22].

Borgonovo et al. [4] introduced the moment-independent sensitivity methods, which have recently attracted interest from practitioners (particularly in the environmental sciences). Similar to the emulator-based sensitivity method, these methods provide a thorough way of investigating the sensitivity of model output under uncertainty. However, their estimation is challenging, especially in the presence of computationally intensive models. Borgonovo et al. [4] suggest replacing the original model by a metamodel to lower the computational burden. They utilise the emulator proposed in [27] as an efficient metamodel. Their results show that the emulator allows an accurate estimation of density-based sensitivity measures when the main structural features of the original model are captured (see also [3]). Ratto et al. [26] also reported that emulators (also referred to as metamodels in the literature) provide an efficient means of performing sensitivity analysis for large and expensive models; they provide some tools and applications of sensitivity analysis in the context of environmental modelling. Caniou and Sudret [6] propose an alternative to the variance-based sensitivity methods mentioned above (e.g., Sobol indices), which are relatively expensive because of the conditional moment estimation required to quantify the uncertainty of the model output due to the uncertain input variables. They substitute the initial model by a metamodel called a polynomial chaos expansion; the process is applied to numerical test cases and the results are discussed and compared with reference results obtained with variance-based methods. In [7], these authors extended this work to the situation where the input parameters are correlated.

In this paper, we focus on the method suggested by [22]. We shall consider how a function f(x) depends on its input variables; in our case f will typically be the function that computes system availability as a function of a vector of quantities such as failure and repair distribution parameters. Note that we consider availability at different time points as distinct functions, so time is not treated as an input variable. We also assume the existence of a distribution G representing the uncertainty of the parameters. We discuss below appropriate choices for the vector of inputs and the prior distribution G. For the purpose of computing Sobol indices it is often assumed that G has to be an independent distribution, and we discuss this further below.

Some notation is introduced first. Write a d-dimensional random vector as X = (X_1, ..., X_d), where X_i is the ith element of X. The subvector (X_i, X_j) is written X_{i,j} and, in general, if p is a set of indices, X_p denotes the subvector of X whose elements have those indices.
We also write X_{-i} for the subvector of X containing all elements except X_i. Similarly, x = (x_1, ..., x_d) denotes an observed value of the random vector X.
2.1. Main effects and interactions

Since f is a function of the uncertain quantities X, we can consider its expected value (when f is availability at a given time, the expected value is the predictive availability at that time).
Write z_0 = E[f(X)]. The function z_i(x_i) = E[f(X) | x_i] - E[f(X)] is called the main effect of variable i. It is the function of x_i only that best approximates f in the sense of minimising the variance (calculated over the other variables)

$$\operatorname{var}_{X_{-i}}\big( f(X_1, \ldots, X_{i-1}, x_i, X_{i+1}, \ldots, X_d) - z_i(x_i) \big).$$

The main effect has a straightforward interpretation in our context: it is the expected change to the availability that would be obtained if we were to know that parameter i has value x_i, taking into account the residual uncertainty in the other parameters. Furthermore, the well-known decomposition of variance formula

$$\operatorname{var}(Y) = E(\operatorname{var}(Y \mid X_i)) + \operatorname{var}(E(Y \mid X_i))$$

implies that the variance of the main effect for parameter i is the expected amount by which the variance of f would be reduced if we were to know the value of parameter i.

The above definition and interpretation do not require that G is an independent distribution. If the parameters are not independent, then the variance reduction from specifying one parameter might change the amount of variance reduction that could be achieved by specifying another parameter. See the discussion in [1] for ways to define variance reduction in this case.

Assuming that the parameters are independent, one can go much further. Sobol [32] proves that any square integrable function can be decomposed as follows:

$$y = f(x) = E[f(X)] + \sum_{i=1}^{d} z_i(x_i) + \sum_{i<j} z_{i,j}(x_{i,j}) + \sum_{i<j<k} z_{i,j,k}(x_{i,j,k}) + \cdots + z_{1,2,\ldots,d}(x), \qquad (1)$$

where

$$z_i(x_i) = E[f(X) \mid x_i] - E[f(X)],$$
$$z_{i,j}(x_{i,j}) = E[f(X) \mid x_{i,j}] - z_i(x_i) - z_j(x_j) - E[f(X)],$$
$$z_{i,j,k}(x_{i,j,k}) = E[f(X) \mid x_{i,j,k}] - z_{i,j}(x_{i,j}) - z_{i,k}(x_{i,k}) - z_{j,k}(x_{j,k}) - z_i(x_i) - z_j(x_j) - z_k(x_k) - E[f(X)],$$

and so on. This is called the Sobol decomposition. Note that Ratto et al. [26,33] also observe that the starting point of the metamodel building process is the representation of y = f(x) in terms of the ANOVA model given in (1). We call z_{i,j}(x_{i,j}) the first-order interaction between x_i and x_j, and so on. It is easy to see that the functions in the Sobol decomposition are pairwise orthogonal, that is, for each (i_r, ..., i_s) ≠ (i_q, ..., i_t),

$$E[z_{i_r,\ldots,i_s}(X_{i_r}, \ldots, X_{i_s})\, z_{i_q,\ldots,i_t}(X_{i_q}, \ldots, X_{i_t})] = 0,$$

because of the independence assumption. By assumption all the terms (except the constant term) have mean zero. It is straightforward to show that if we require the functions in a Sobol decomposition to be orthogonal with zero mean, then they must be the functions given above; that is, the Sobol decomposition is unique. Ways of defining a Sobol decomposition for dependent variables are discussed in [1]. It is important to remember that the Sobol decomposition does, however, depend on the distribution G of the uncertain inputs, and the sensitivity measures are determined with respect to G. Nevertheless, they allow us to identify which elements of X are the most influential in inducing the uncertainty in f. The main effects and first-order interactions and their plots are a powerful visual tool for investigating how the model output responds to each individual input, and how those inputs interact in their influence on the model output.

2.2. Variance-based methods

As mentioned above, the variance of the main effect can be interpreted as the amount by which the overall variance of f would be reduced if we knew x_i. The main effect variance for parameter i is therefore defined as

$$V_i = \operatorname{var}\{E(Y \mid X_i)\}.$$

A second measure, first proposed by [18], is

$$V_{Ti} = \operatorname{var}(Y) - \operatorname{var}\{E(Y \mid X_{-i})\},$$

which is the remaining uncertainty in Y that is unexplained after everything has been learnt except x_i. These two measures can be converted into scale-invariant measures by dividing by var(Y):

$$S_i = \frac{V_i}{\operatorname{var}(Y)}, \qquad S_{Ti} = \frac{V_{Ti}}{\operatorname{var}(Y)} = 1 - S_{-i},$$

where S_i is the main effect index of x_i and S_{Ti} is the total effect index of x_i. Details of other sensitivity analysis methods, such as variance decomposition and regression components, can be found in [22]. The variance measures are linked to the Sobol decomposition when the parameters are independent, because the independence of the terms in the Sobol decomposition implies that the total variance of f can be represented as the sum of the variances of the individual terms,

$$\operatorname{var}(f) = \sum_i V_i + \sum_{i<j} V_{i,j} + \cdots + V_{1,2,\ldots,d},$$

where $V_{i_1,\ldots,i_k} = \operatorname{var}(z_{i_1,\ldots,i_k})$. Hence the main effect variance is one of the terms in the variance decomposition, and the total variance for i is the sum of the variances of all terms involving i.
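When f is cheap to evaluate, these indices can be estimated directly by Monte Carlo. The sketch below, in Python, is a minimal illustration under the independence assumption: it applies the standard pick-freeze estimators (Saltelli's for V_i, Jansen's for V_Ti) to a toy stand-in for an availability function, namely the steady-state availability μ/(λ+μ) of a one-component system. The function, parameter ranges and sample size are illustrative choices, not taken from the paper.

```python
import numpy as np

def sobol_indices(f, sampler, d, n=100_000, seed=0):
    """Pick-freeze Monte Carlo estimates of the first-order indices S_i
    (Saltelli's estimator) and total indices S_Ti (Jansen's estimator)."""
    rng = np.random.default_rng(seed)
    A, B = sampler(rng, n), sampler(rng, n)   # two independent input samples
    fA, fB = f(A), f(B)
    var_y = np.var(np.concatenate([fA, fB]))  # overall output variance var(Y)
    S, ST = np.empty(d), np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                   # replace column i of A by column i of B
        fABi = f(ABi)
        S[i] = np.mean(fB * (fABi - fA)) / var_y         # estimate of V_i / var(Y)
        ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / var_y  # estimate of V_Ti / var(Y)
    return S, ST

# Toy stand-in for an availability function: the steady-state availability
# mu / (lambda + mu) of a one-component repairable system.
f = lambda x: x[:, 1] / (x[:, 0] + x[:, 1])
sampler = lambda rng, n: np.column_stack([rng.uniform(0.005, 0.010, n),  # lambda
                                          rng.uniform(0.01, 0.05, n)])   # mu
S, ST = sobol_indices(f, sampler, d=2)
print("S_i:", S, "S_Ti:", ST)
```

For the availability models considered later, each evaluation of f involves solving a system of differential equations, which is exactly why the emulator-based route of Section 4 is preferred to direct estimators of this kind.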
3. Choice of parameters and prior distribution G

In order to give a useful operational meaning to the sensitivity measures described in this paper, it is important that the parameters used have a reasonable operational interpretation. In this section we discuss how to ensure that the parameters are meaningful and how the uncertainty on the parameters should be modelled.

Bedford and Cooke [2] take the view that uncertainty quantification has to be about observable quantities, or at least about quantities that could in principle be observed. This is based around the notion of subjective probability, which is used to quantify uncertainties. Hence we could in principle use probabilities to describe the uncertainty in the event that my car will start tomorrow morning, giving us probabilities of single events. Similarly, we could use probabilities to consider the event that my car will fail by a given time t, giving us cumulative probability distributions. However, we can also use probabilities at a second order level if we are happy to consider families of events that we judge to be exchangeable. The classical example of this is a sequence of coin tosses judged to be exchangeable: in principle we could consider the long-run average number of heads to be an observable, and therefore use probabilities to quantify the uncertainty in the long-run average. This approach, while allowing us to consider "a probability of a probability", does not allow anything further (a probability of a probability of a probability), because it ties the second level probability into an exchangeable sequence. It also allows us to quantify the uncertainty in median lifetime, or the uncertainty in other quantiles (admittedly, explaining to a domain expert what a quantile of a quantile is, is not easy, but at least it makes philosophical sense). This approach gives a clear articulation of the difference between
aleatory and epistemic uncertainty, as also discussed, for example, in [12,13,17,24,25].

From this perspective, and in a reliability context, it makes sense to consider sensitivity analysis for observables as described above: for example, mean time to failure, mean time to repair, quantiles of failure time distributions, etc. The parameters of the classical lifetime distributions do not always have a direct operational interpretation, but may sometimes be derived from something with such an interpretation. For example, the constant failure rate of an exponential distribution is one over the mean time to failure. A less obvious example is the Weibull, where the shape and scale parameters might not be directly considered as observables, but could be calculated if two quantiles are specified. Quantiles, as discussed above, can be considered observable. Hence uncertainty on the values of two quantiles (for definiteness we consider the 50% and 95% quantiles) can be transformed into uncertainty on the scale and shape parameters. Alternatively, uncertainty about these parameters could be given an operational meaning by transforming it into uncertainty about quantile values. This type of approach is used in [16] to specify uncertainty in hazard curves for seismic events. In practical terms the initial specification through uncertainty on quantiles might come through expert judgement, while uncertainties on parameters might be available through generic databases. In either case, reliability managers can find themselves designing new systems and having to deal with uncertainty assessments about the components of the system. In order to reduce the overall uncertainty for the system, the managers will wish to explore which parameters could provide the biggest decrease in uncertainty if they could be determined more precisely. That is the problem context for this paper.

Fig. 1. Random sample of 100 points from an independent distribution on t_50 and EF = t_95/t_50.

Focussing particularly on the Weibull distribution, we use the parameterisation

$$F(t) = 1 - \exp\left( -\frac{k t^{1+a}}{1+a} \right), \qquad t \ge 0,\ k > 0,\ a > -1,$$
where k is the shape parameter and a is the scale parameter. The failure rate is λ(t) = k t^a. By taking logarithms twice we have

$$\ln(-\ln(1 - F(t))) = \ln(k) - \ln(1+a) + (1+a)\ln(t). \qquad (2)$$

Hence if we have the 50% and 95% quantiles, t_50 and t_95, then the following equations hold:

$$\ln(-\ln(0.5)) = \ln(k) - \ln(1+a) + (1+a)\ln(t_{50}), \qquad (3)$$

$$\ln(-\ln(0.05)) = \ln(k) - \ln(1+a) + (1+a)\ln(t_{95}), \qquad (4)$$

from which we determine k and a. There is no reason why the uncertainty distributions defined on k and a should be product distributions (i.e., independent). If we were to consider quantiles for a given lifetime distribution, then the ordering requirement (a 50% quantile is always less than a 95% quantile) would imply that the joint uncertainty cannot be independent. However, if we consider the median t_50 and the error factor EF = t_95/t_50 as the parameters of the Weibull, then the uncertainties in these quantities might be considered independent. The shape parameter k and the scale parameter a can be determined in terms of t_50 and EF = t_95/t_50 as follows:

$$a = \frac{\ln(-\ln(0.5)) - \ln(-\ln(0.05))}{\ln(t_{50}) - \ln(t_{50}\,EF)} - 1, \qquad (5)$$

$$k = \exp\big(\ln(-\ln(0.5)) + \ln(1+a) - (1+a)\ln(t_{50})\big). \qquad (6)$$

We shall illustrate the use of these parameters in Section 5.3.
Fig. 2. Random sample of 100 points from an independent distribution on t_50 and EF = t_95/t_50, shown in the a and k parameters.
As discussed before, the Sobol expansion used in sensitivity analysis assumes that the parameters are independent. For the two possible parameterisations considered here (that is, k and a, or t_50 and t_95/t_50) it makes a considerable difference which one is chosen. This is illustrated in Figs. 1 and 2, where we show 100 points drawn randomly from independent uniform distributions on (1480, 1500) for t_50 and (3, 3.2) for EF = t_95/t_50. The same 100 points are used to calculate 100 values of the a and k parameters in Fig. 2, and it can be seen that these are far from independently distributed.
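The experiment behind Figs. 1 and 2 is easy to reproduce. The following sketch samples t_50 and EF independently from the uniform ranges quoted above and pushes the sample through Eqs. (5) and (6); the strong correlation of the resulting (a, k) pairs illustrates why the sensitivity analysis should be parameterised in (t_50, EF) rather than (k, a).

```python
import numpy as np

rng = np.random.default_rng(1)
t50 = rng.uniform(1480.0, 1500.0, 100)   # median lifetime t_50
ef = rng.uniform(3.0, 3.2, 100)          # error factor EF = t_95 / t_50

# Eqs. (5) and (6): map (t50, EF) to the Weibull parameters (a, k).
a = (np.log(-np.log(0.5)) - np.log(-np.log(0.05))) \
    / (np.log(t50) - np.log(t50 * ef)) - 1.0
k = np.exp(np.log(-np.log(0.5)) + np.log(1.0 + a) - (1.0 + a) * np.log(t50))

# (t50, EF) were sampled independently, but the transformed pair (a, k)
# is strongly correlated, as Fig. 2 shows.
print(np.corrcoef(a, k)[0, 1])
```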
4. Emulator-based sensitivity analysis

In principle, if f(x) is sufficiently simple it would be possible to compute the sensitivity measures described in the previous sections analytically. With more complex models this is not possible, and we have to derive the desired measures computationally.
If f(x) is cheap enough that we can evaluate it easily for many different inputs, standard Monte Carlo methods would suffice to estimate var(Y). However, the computational techniques proposed by [32,9,31] demand many thousands of function evaluations, and are thus impractical for more expensive functions. We use the Bayesian inference tools developed by [22], briefly introduced in the following section, to tackle this computational issue. Using this method we are able to estimate all the quantities required for sensitivity analysis in reliability analysis.

The most important aspect of this Bayesian approach is that the model (function) f(·) is treated as uncertain. Specifically, the value of f(x) is unknown for any particular input configuration x until we actually run the model for those inputs. The Bayesian model specifies a prior joint distribution for the values taken by f(x) at different values of x. This prior is then updated according to the usual Bayesian paradigm, using as data the outputs y_i = f(x_i), i = 1, ..., N, from a set of runs of the model. The result is a posterior distribution for f(·), which is used to make formal Bayesian inferences about the sensitivity measures introduced in the previous section. Although we remain uncertain about the function f at parameter values where it was not evaluated, the uncertainty is very much reduced by the assumed correlation of function values from one point to another. Typically the expected value of the posterior distribution is used as a point estimate for f.

The reader should be alert to the fact that there are therefore two different distributions being used in the sensitivity analysis computation: the first is the distribution G, which represents the uncertainty in the availability model parameters x and is propagated to the output values through the function f; the second is the (posterior) distribution on f, which plays a purely computational role, can be reduced as much as required by computing the function f at more points x, and does not have any operational interpretation. Expected values computed over the second distribution will be denoted by E_GP (GP standing for Gaussian process) to avoid potential confusion with expected values taken over G. The next section briefly describes the Bayesian set-up and the second distribution.
4.1. Inference about functions using Gaussian processes

We consider the function or complex model under study as a deterministic code that returns an output y = f(x) at input vector x. A Gaussian process provides a prior model for our uncertain knowledge of the values taken by this function before it is evaluated through the deterministic code. The evaluation of the code at different input values gives data with which a posterior distribution is derived. In fact, the prior distribution considered for f(·) is a two-stage prior: in the first stage, given some hyperparameters, a Gaussian process is chosen as a prior distribution on f(·); the mean vector and covariance matrix of the Gaussian process are then determined by estimating the hyperparameters in the second stage. The key requirement for using a Gaussian process is that f(·) should be a smooth function, so that if we know the value of f(x) we have some idea about the value of f(x′) for x′ close to x. This basic assumption of a smooth, continuous function gives the Gaussian process major computational advantages over Monte Carlo methods, since the expected proximity of function values evaluated at nearby points is usually ignored in the Monte Carlo simulation approach.

Using a Gaussian process prior for f(·) implies that the uncertainty about {f(x_1), ..., f(x_n)}, for any set of points {x_1, ..., x_n}, can be represented through a multivariate normal distribution. We therefore need to make tractable prior assumptions about the mean and covariance. The mean of f(x) conditional on the hyperparameters, β,
is modelled as

$$E[f(x) \mid \beta] = h(x)^T \beta, \qquad (7)$$

where h(·) is a vector of q known functions of x, and β is a vector of coefficients. The choice of h(·) is arbitrary, but it should be chosen to incorporate any beliefs that we might have about the form of f(·). The covariance between f(x) and f(x′) is given by

$$\operatorname{cov}(f(x), f(x') \mid \sigma^2) = \sigma^2 c(x, x'), \qquad (8)$$

where c(·,·) is a correlation function with c(x, x) = 1 which decreases as |x - x′| increases. Furthermore, the function c(·,·) must ensure that the covariance matrix of any set of outputs {y_1 = f(x_1), ..., y_n = f(x_n)} is positive semi-definite. Throughout this paper we use the following correlation function, which satisfies the conditions mentioned above and is widely used for its computational convenience:

$$c(x, x') = \exp\{-(x - x')^T B (x - x')\}, \qquad (9)$$

where B is a diagonal matrix of positive smoothness parameters. The matrix B has the effect of rescaling the distance between x and x′; thus B determines how close two inputs x and x′ need to be for the correlation between f(x) and f(x′) to take a particular value.

Oakley and O'Hagan [22] suggest the following conjugate prior, a normal inverse gamma distribution, for (β, σ²):

$$p(\beta, \sigma^2) \propto (\sigma^2)^{-(d+q+2)/2} \exp\big\{ -\{(\beta - z)^T V^{-1} (\beta - z) + a\}/(2\sigma^2) \big\}$$

for fixed hyperparameters z, V, a and d. The output of f(·) is observed at n design points x_1, ..., x_n to obtain y = {f(x_1), ..., f(x_n)}, considered as data. It should be noted that these points, in contrast with Monte Carlo methods, are not chosen randomly but are selected to give good information about f(·). The design points will usually be spread to cover $\mathcal{X}$, the input space of X. Since X is unknown, beliefs about X are represented by the probability distribution G(X); the choice of design points will therefore also depend on G(·) (the choice of design points is discussed in [29]). The standardised posterior distribution of f(·) given y = {f(x_1), ..., f(x_n)} is

$$\frac{f(x) - m^*(x)}{\hat{\sigma} \sqrt{c^*(x, x)}} \,\Big|\, y \sim t_{d+n}, \qquad (10)$$

where t_{d+n} is a Student t random variable with n + d degrees of freedom, d is the dimension of x [19],

$$m^*(x) = h(x)^T \hat{\beta} + t(x)^T A^{-1} (y - H\hat{\beta}), \qquad (11)$$

$$c^*(x, x') = c(x, x') - t(x)^T A^{-1} t(x') + \big(h(x)^T - t(x)^T A^{-1} H\big)\big(H^T A^{-1} H\big)^{-1} \big(h(x')^T - t(x')^T A^{-1} H\big)^T, \qquad (12)$$

and

$$t(x)^T = (c(x, x_1), \ldots, c(x, x_n)), \qquad H^T = (h(x_1), \ldots, h(x_n)),$$

$$A = \begin{pmatrix} 1 & c(x_1, x_2) & \cdots & c(x_1, x_n) \\ c(x_2, x_1) & 1 & & \vdots \\ \vdots & & \ddots & \\ c(x_n, x_1) & \cdots & & 1 \end{pmatrix},$$

$$\hat{\beta} = V^* (V^{-1} z + H^T A^{-1} y), \qquad \hat{\sigma}^2 = \frac{a + z^T V^{-1} z + y^T A^{-1} y - \hat{\beta}^T (V^*)^{-1} \hat{\beta}}{n + d - 2},$$

$$V^* = (V^{-1} + H^T A^{-1} H)^{-1}.$$

The outputs corresponding to any set of inputs will now have a multivariate t-distribution, with the covariance between any two outputs given by Eq. (12). Note that the t-distribution arises as a marginal distribution for f(·) after integrating out the hyperparameters β and σ².
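To make the construction concrete, the following sketch implements the posterior mean (11) directly, with h(x) = (1, x), the Gaussian correlation (9) with a fixed smoothness matrix B, and the weak-prior limit V^{-1} → 0, in which β̂ reduces to the generalised least squares estimate. It is a bare-bones illustration of Eqs. (7)-(12), not a substitute for the full treatment in [22] or for the GEM-SA package; the design, the stand-in model and the value of B are arbitrary choices for the example.

```python
import numpy as np

def fit_emulator(X, y, B):
    """Posterior mean of the GP emulator, Eq. (11), with h(x) = (1, x),
    the Gaussian correlation (9), a fixed smoothness matrix B, and the
    weak-prior limit in which beta-hat is the GLS estimate."""
    n = X.shape[0]
    H = np.column_stack([np.ones(n), X])                    # rows h(x_i)^T
    def corr(U, V):                                         # Eq. (9)
        D = U[:, None, :] - V[None, :, :]
        return np.exp(-np.einsum('ijk,kl,ijl->ij', D, B, D))
    A = corr(X, X) + 1e-10 * np.eye(n)                      # jitter for stability
    Ainv = np.linalg.inv(A)
    beta = np.linalg.solve(H.T @ Ainv @ H, H.T @ Ainv @ y)  # beta-hat (GLS)
    e = Ainv @ (y - H @ beta)                               # A^{-1}(y - H beta-hat)
    def mean(Xs):                                           # m*(x) of Eq. (11)
        Hs = np.column_stack([np.ones(Xs.shape[0]), Xs])
        return Hs @ beta + corr(Xs, X) @ e
    return mean

# Ten "model runs" of a cheap stand-in model, then a prediction at a new input.
rng = np.random.default_rng(0)
X = rng.uniform([0.005, 0.01], [0.010, 0.05], size=(10, 2))  # (lambda, mu) designs
y = X[:, 1] / (X[:, 0] + X[:, 1])
m = fit_emulator(X, y, B=np.diag([1.0e4, 1.0e3]))
print(m(np.array([[0.0075, 0.03]])))                         # true value is 0.8
```

Fitting from only 10 runs is possible because the smoothness assumption lets nearby design points share information; in practice B would itself be estimated from the posterior, as discussed next.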
In practice, further hyperparameters, the smoothness parameters B, will be associated with the modelling of the correlation function c(·,·). It is not practical to give B a fully analytical Bayesian treatment, since it is generally impossible to integrate the posterior distribution analytically with respect to these further parameters. The simplest option is to keep B fixed. Another option is to integrate the posterior distribution numerically, in particular by Markov chain Monte Carlo (MCMC) sampling, but this is computationally very intensive; a pragmatic and robust approach is simply to estimate the hyperparameters of c(·,·) from the posterior distribution and then to substitute these estimates into c(·,·) wherever they appear in the above formulae (see [19]). These estimates can be obtained using the posterior mode approach or a cross-validation approach [23]. The GEM-SA package can estimate the smoothness parameters using both methods. We give more details in the next section.

4.2. Inference for main effects and interactions

In this section we explain how the sensitivity measures introduced in the preceding sections can be estimated from the Gaussian process posterior distribution derived in Section 4.1. One of the key insights in [22] is that inference about f can be used to obtain inference about the main and interaction effects of f, because these are simply linear functionals of f. In particular, when the posterior distribution for f is t_{d+n} after standardising as shown in (10), so is the resulting posterior for the main and interaction effects. Specifically, if the posterior mean for f is given by Eq. (11), then for

$$E(Y \mid x_p) = \int_{\mathcal{X}_{-p}} f(x) \, dG_{-p|p}(x_{-p} \mid x_p) \qquad (13)$$

(recalling that $\mathcal{X}_{-p}$ denotes the input space associated with x_{-p}, and $G_{-p|p}(x_{-p} \mid x_p)$ is the conditional distribution of x_{-p} given x_p under G), the posterior mean of this quantity can be written as

$$E_{post}\{E(Y \mid x_p)\} = R_p(x_p)\hat{\beta} + T_p(x_p)e,$$

where

$$R_p(x_p) = \int_{\mathcal{X}_{-p}} h(x)^T \, dG_{-p|p}(x_{-p} \mid x_p), \qquad T_p(x_p) = \int_{\mathcal{X}_{-p}} t(x)^T \, dG_{-p|p}(x_{-p} \mid x_p)$$

and $e = A^{-1}(y - H\hat{\beta})$. The posterior means of the main effects and interactions can be obtained similarly:

$$E_{post}\{z_i(x_i)\} = \{R_i(x_i) - R\}\hat{\beta} + \{T_i(x_i) - T\}e,$$

$$E_{post}\{z_{i,j}(x_{i,j})\} = \{R_{i,j}(x_{i,j}) - R_i(x_i) - R_j(x_j) + R\}\hat{\beta} + \{T_{i,j}(x_{i,j}) - T_i(x_i) - T_j(x_j) + T\}e,$$

where R and T denote the corresponding unconditional integrals (p empty). In a similar way we can derive the standard deviations of the main effects and interactions; see [22] for the details of this computation. We can plot the posterior mean of the main effect, E_post(z_i(x_i)), against x_i, with bounds of, for example, plus and minus two posterior standard deviations. By standardising the input variables we are able to draw E_post(z_i(x_i)) for i = 1, ..., d on a single plot, giving a good graphical summary of the influence of each variable. We present such plots for the examples given in Section 5. Direct posterior inference for the variance-based measures introduced in Section 2.2, V_i and V_Ti, is more complex, as these measures are quadratic functionals of f(·); we refer to [22] for a discussion of ways of dealing with this.
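In the same spirit, once a cheap approximation m(·) to f is available (for instance the posterior mean (11)), the main effect z_i(x_i) can be estimated by averaging m over the remaining inputs, a simple Monte Carlo version of Eq. (13) for an independent G. The sketch below is illustrative only; the stand-in for m(·) and the parameter ranges are assumptions, not the paper's models.

```python
import numpy as np

def main_effect(m, sampler, i, xi_grid, n=20_000, seed=0):
    """Estimate z_i(x_i) = E[m(X) | x_i] - E[m(X)] by fixing X_i and
    averaging the cheap emulator mean m over the other (independent) inputs."""
    rng = np.random.default_rng(seed)
    Xmc = sampler(rng, n)                 # sample of all inputs from G
    z0 = np.mean(m(Xmc))                  # overall mean E[m(X)]
    z = np.empty(len(xi_grid))
    for j, xi in enumerate(xi_grid):
        Xc = Xmc.copy()
        Xc[:, i] = xi                     # condition on X_i = x_i
        z[j] = np.mean(m(Xc)) - z0
    return z

sampler = lambda rng, n: np.column_stack([rng.uniform(0.005, 0.010, n),  # lambda
                                          rng.uniform(0.01, 0.05, n)])   # mu
m = lambda x: x[:, 1] / (x[:, 0] + x[:, 1])  # stand-in for the emulator mean
print(main_effect(m, sampler, i=1, xi_grid=np.linspace(0.01, 0.05, 9)))
```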
5. Application in reliability analysis

We present five examples in which we examine the sensitivity of quantities of interest in reliability analysis with respect to changes in the relevant parameters. We are particularly interested in the sensitivity of the availability function with respect to uncertainty in the parameters of the failure and repair distributions at given times of interest. In the first two examples we consider a one-component repairable system, where in the first example the failure and repair distributions are assumed to be exponential, while in the second example these distributions are Weibull. These two examples are discussed in detail in [10], and we only briefly report their findings in Section 5.1. We then move to repairable systems with more than one component. We are interested in parallel systems, such as a two-component standby-redundancy system or a three-component redundancy system; the failure and repair rates of each component are assumed to be constant, corresponding to exponential failure and repair distributions. These two examples are given in Section 5.2. In the last example, given in Section 5.3, we study a more complex system, the move-drive system. This system consists of eight components with constant repair rates and time-dependent failure rates, where the failure times follow Weibull distributions.

To build the emulator of the system availability required to conduct the sensitivity analysis, we need to evaluate this availability at selected parameter points at time t. In the following sections we briefly present the methods used to evaluate the system availability at the given parameter values. There are no restrictions on the values of the inputs selected to build the emulator, but a good emulator (one presenting small uncertainties from relatively small numbers of runs) usually requires a well spaced set of inputs covering the range over which emulation is required. We use maximin Latin hypercube or LP-tau designs [29] to generate an efficient set of data points or inputs. Suppose we wish to draw n random values of x = (x_1, ..., x_d). For i = 1, ..., d we divide the sample space of x_i into n regions of equal marginal probability and draw one random value of x_i from each region, giving realisations x_{i1}, ..., x_{in}. To obtain one random value of x we sample without replacement from the values x_{i1}, ..., x_{in} for i = 1, 2, ..., d. This ensures that each dimension of the input space is fully represented: the Latin hypercube sampling scheme is designed to cover the whole sample space of x without requiring a very large sample size. After building the emulator, it is then trivial to calculate main effects and joint effects to aid model interpretation and model checking as part of a sensitivity analysis, when direct evaluation of the model output is too costly.
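A minimal Python sketch of the Latin hypercube scheme just described (without the maximin refinement) is given below; the dimension and ranges are placeholder values.

```python
import numpy as np

def latin_hypercube(rng, n, lows, highs):
    """Plain Latin hypercube: one point per equal-probability stratum in each
    dimension, with the strata independently permuted across dimensions."""
    lows, highs = np.asarray(lows), np.asarray(highs)
    d = len(lows)
    strata = rng.permuted(np.tile(np.arange(n), (d, 1)), axis=1).T  # (n, d)
    u = (strata + rng.uniform(size=(n, d))) / n      # jittered points in [0, 1)
    return lows + u * (highs - lows)

rng = np.random.default_rng(0)
print(latin_hypercube(rng, n=10, lows=[0.005, 0.01], highs=[0.010, 0.05]))
```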
5.1. One-component repairable system

We assume that the systems under study here and later are repairable, and that at any given time a component is either functioning normally or failed (and under repair), so that the component state changes as time evolves. We further assume that repairs restore the component to a condition as good as new. Thus, the process consists of alternating repetitions of the repair-to-failure and the failure-to-repair processes. The availability at time t, denoted by A(t), is the probability that the component is functioning at time t. The availability of a system provides a measure of the readiness of the system for use at any instant of time. The availability analysis of a system requires knowledge of the following aspects: the system configuration, describing how the components are functionally connected; the failure process of the components; the method of operation and the definition of failure of the system; and the repair or maintainability policy.

A variety of failure and repair time distributions can be used in availability analysis; two well-known choices are the exponential and the Weibull. In [10] we examined the emulator-based sensitivity analysis of the availability when the failure and repair distributions are either both exponential or both Weibull. In the case where the failure and repair rates are both constant, with exponential densities, we showed that the system availability becomes more sensitive to the repair rate as time increases from t = 0.01 to t = 1000 (for t ≥ 1000 the steady-state availability is obtained), its variance contribution correspondingly increasing from 0.1% to 84.71%. At early times the system is more sensitive to the failure rate: the percentage variance contribution of its main effect was 98.84% at t = 0.01 and decreased to 14.51% as time increased to t = 1000 (the details can be found in [10]). In the Weibull case, the main effect plots suggest that as time increases, for a fixed scale parameter of the time-to-failure distribution, the system availability is more sensitive to the scale parameter of the repair density, the variance contribution of its main effect varying from 87% to 91%. The computation of the availability at the selected parameter values is trivial in the first case but, as expected, evaluating the system availability when the failure and repair rates are time dependent is not an easy task and requires a numerical method.

In the next section, we show that the emulator-based sensitivity measures introduced in this paper can be easily calculated for systems with a higher number of components, greater complexity and more parameters to be estimated. In particular, we first focus on parallel systems and then study a real system with more complexity and more parameters. The systems investigated here have been selected so that the availability function can, in principle, be calculated by numerical integration, without Monte Carlo simulation.
5.2. Availability of parallel systems

Consider a parallel system consisting of two components A and B. For this system, there are three possible states:

1. State 0, both components operating.
2. State 1, one component operating and the other under repair.
3. State 2, both components under repair.

We first calculate the availability of this system under two maintenance policies: one repair person, and two repair persons. In the first case there is only a single repair person to service the two components. The availability of this system at any time t can be obtained by solving the following differential equations:

$$\frac{dP_0(t)}{dt} = -2\lambda P_0(t) + \mu P_1(t),$$

$$\frac{dP_1(t)}{dt} = 2\lambda P_0(t) - (\lambda + \mu) P_1(t) + \mu P_2(t),$$

$$\frac{dP_2(t)}{dt} = \lambda P_1(t) - \mu P_2(t), \qquad (14)$$

where P_0(t) is the probability that both components are operating at time t, P_1(t) is the probability that one component is operating and the other is under repair at time t, and P_2(t) is the probability that both components are under repair at time t, so that P_0(t) + P_1(t) + P_2(t) = 1; λ and μ denote the common failure rate and repair rate, respectively, per unit time of the components. These differential equations can easily be solved numerically (for example, in MATLAB) under the initial conditions P_0(0) = 1, P_1(0) = 0, P_2(0) = 0, corresponding to the system starting off "as good as new". Since states 0 and 1 constitute the operation of the system, the availability of this system at time t is given by

$$A(t) = P_0(t) + P_1(t).$$

If there are two repair persons, one repair person can be assigned to each component, and the availability of this system can be obtained by solving the following differential equations:

$$\frac{dP_0(t)}{dt} = -2\lambda P_0(t) + \mu P_1(t),$$

$$\frac{dP_1(t)}{dt} = 2\lambda P_0(t) - (\lambda + \mu) P_1(t) + 2\mu P_2(t),$$

$$\frac{dP_2(t)}{dt} = \lambda P_1(t) - 2\mu P_2(t). \qquad (15)$$

To examine the sensitivity of A(t) at time t with respect to the uncertainties in λ and μ, we need to calculate the main effects, their plots and the other sensitivity measures described in Section 4. Plots of main effects provide a cheap and effective tool for examining which of the parameters A(t) is significantly sensitive to, and the nature of the possible relationships between A(t) and the parameters.
Fig. 3. Estimated main effects for the availability of the system described in (14) at t = 10, based on 10 design points.
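Each training output used to build the emulator is one such availability evaluation. As a hedged illustration (not the authors' code), Eqs. (14) and (15) can be integrated with an off-the-shelf ODE solver; the parameter values below are arbitrary points inside the ranges used in Fig. 3.

```python
import numpy as np
from scipy.integrate import solve_ivp

def availability(t, lam, mu, repairers=1):
    """A(t) = P0(t) + P1(t) for the two-component parallel system, solving
    Eqs. (14) (one repair person) or (15) (two repair persons)."""
    r = 2.0 * mu if repairers == 2 else mu            # repair rate out of state 2
    def rhs(_, P):
        P0, P1, P2 = P
        return [-2.0 * lam * P0 + mu * P1,
                2.0 * lam * P0 - (lam + mu) * P1 + r * P2,
                lam * P1 - r * P2]
    sol = solve_ivp(rhs, (0.0, t), [1.0, 0.0, 0.0],   # "as good as new" start
                    t_eval=[t], rtol=1e-8, atol=1e-10)
    return sol.y[0, 0] + sol.y[1, 0]

print(availability(10.0, lam=0.0075, mu=0.03))        # one training output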
Table 1
The emulator-based sensitivity analysis of the system availability based on 10, 20 and 40 design points at t = 10.

Measurement          10 points   20 points   40 points
σ²                   0.219781    0.0759006   0.0351314
A_Em(10)             0.995543    0.995527    0.995523
EmStd(A(10))         0.0017      0.001634    0.00163
Variance (%) of μ    96.03       95.1        95.44
Variance (%) of λ    3.05        4.27        3.94
Total variance (%)   99.0767     99.3671     99.3733
Fig. 4. Transition diagram for cold, warm, or hot standby. (a) Diagram for cold standby. (b) Diagram for warm or hot standby.
Fig. 3 illustrates the estimated main effects, E(A(t) | λ) and E(A(t) | μ), defined in the general case in (13) and approximated as in the previous section, for the system availability based on 10 design points at t = 10. Increasing the number of design points reduces the uncertainty of the fitted emulator (see Table 1). The thickness of the band in Fig. 3 shows the emulator uncertainty associated with each input. For instance, fixing μ = 0.025, the corresponding point shows the expected value of the output (obtained by averaging over λ). Note that fixing λ at its central value and simply comparing the outputs at μ = 0.01 and μ = 0.05 would underestimate the influence of this input; the emulator instead allows small groups of inputs to vary while the others are averaged over. Comparing the thickness of the main effect plots of λ and μ, it can be concluded that there is more uncertainty about μ than about λ.

The sensitivity analysis of the system availability A(t) with respect to changes in μ and λ is shown in Table 1 for different numbers of design points. The system availability is more sensitive to changes in μ regardless of the number of design points, with μ accounting for more than 95% of the output variance. The table also shows that the emulator standard error, denoted EmStd(A(t)), decreases as the number of design points increases; this standard error reflects the uncertainty in the model output induced by the input/parameter uncertainties (the details can be found in [22]). Furthermore, the estimate of the emulator parameter σ², obtained by maximising the emulator posterior distribution, is also given in the table. The value of 1/σ² can be considered the precision of the fitted Gaussian process required to implement the sensitivity and uncertainty analysis; this precision increases with the number of design points. The variance percentages in the table enable us to conduct a variance-based sensitivity analysis to identify which uncertain parameters drive the uncertainty in the system availability: 96%, 95.1% and 95.4% of the total variance (the main effect, explained in Section 4) can be explained by μ based on 10, 20 and 40 points, respectively, while the main effects of λ account for only 3%, 4.3% and 4%. We can therefore conclude that the system availability is more sensitive to μ, and that 10 points are enough to examine the sensitivity of the system availability. We take a product uniform distribution as the probability distribution G(·) of the parameters in this example and
the other examples given in the rest of this paper. We also checked a product normal distribution for G(·) and obtained similar results. Similar results to those above are likewise obtained for the two-repair-person system presented in (15).

Fig. 5. Estimated main effects for the system availability of the standby-redundancy system presented in Eqs. (16) at t = 10, based on 10 design points.

We now study the emulator-based sensitivity analysis of the availability of a more complex system with respect to changes in its parameters. The availability of the following standby-redundancy system is studied in [20]. The behaviour of this system, consisting of two components A and B, is shown in Fig. 4. Each rectangle represents a redundancy state. Rectangle 1 represents the state where component B is in standby and A is operating; similarly, rectangle 4 represents the state where component B is operating and A is under repair. Possible state transitions are shown in the same figure. The warm or hot standby has transitions from state 1 to 3 and from state 2 to 4, whereas the cold standby does not. For warm or hot standby, the standby component fails with constant failure rate η. For hot standby, η is equal to λ, the failure rate of the principal components; for cold standby, η = 0. The warm standby (0 ≤ η ≤ λ) thus has as special cases the hot standby (η = λ) and the cold standby (η = 0). Two or fewer components can be repaired at a time, and each component has a
constant repair rate μ. In all cases, the system fails when it enters state 5. We denote the probability that the redundant system is in state i at time t by P_i(t), and its derivative by dP_i(t)/dt. The availability of this system can be obtained by solving the following ordinary differential equations [20, pp. 428-430]:

$$\frac{dP^{(0)}(t)}{dt} = -(\lambda + \eta) P^{(0)}(t) + \mu P^{(1)}(t),$$

$$\frac{dP^{(1)}(t)}{dt} = (\lambda + \eta) P^{(0)}(t) - (\lambda + \mu) P^{(1)}(t) + 2\mu P^{(2)}(t),$$

$$\frac{dP^{(2)}(t)}{dt} = \lambda P^{(1)}(t) - 2\mu P^{(2)}(t), \qquad (16)$$
where P^(0)(t) = P_1(t) + P_2(t), P^(1)(t) = P_3(t) + P_4(t), P^(2)(t) = P_5(t), μ is the constant repair rate for each component, and η and λ are the constant failure rates of the standby and principal components in the different situations explained above. Eqs. (16) can be solved numerically in the same way as Eqs. (14) and (15).

Figs. 5 and 6 illustrate the estimated main effects (E(A(t) | λ), E(A(t) | η) and E(A(t) | μ)) for the system availability presented in Eq. (16) at t = 10, based on 10 and 30 design points, respectively. The thickness of the band in each plot shows the uncertainty associated with each input. For instance, based on 10 design points there is more uncertainty about μ and about the larger values of η (0.004 < η < 0.005), as illustrated in Fig. 5. As expected, these uncertainties decrease as more design points are used to approximate the corresponding main effects. From these figures, we can conclude that the system availability at the given time (t = 10) is most sensitive to changes in λ, and then in μ. This conclusion emerges in more detail from Table 2, which shows the sensitivity analysis of A(t) with respect to changes in λ, η and μ. The main effects of λ are almost 83% and 85% based on 10 and 30 points, respectively, while μ covers only 10% and 12% of the total variance; changes in η have hardly any influence on the uncertainty of the system availability. It is reasonable to conclude that the system availability is mainly influenced by the failure rate at early times.

For larger t, for example t = 500, the system availability becomes more sensitive to changes in μ. As shown in Table 3, μ's contribution to the total variance is 82% based on 10 design points and 86% based on 30 points, while the availability at t = 500 is only slightly sensitive to changes in λ (its contribution to the variance is 11% and 9% for 10 and 30 points, respectively), the opposite of the results obtained above for t = 10. It is likewise reasonable to conclude that the system availability is mainly influenced by the repair rate at late times. In addition, regardless of the time at which the system availability is evaluated, increasing the number of points increases the precision of the fitted emulator, as measured by 1/σ²: Tables 2 and 3 show that this precision increases more than tenfold when the number of points is increased from 10 to 30.

The method used here shows that the availability is sensitive to different parameters at early and late times, the late times corresponding more or less to the steady state. Note also that the calculations of the sensitivity indices and availability are fairly robust to the number of points chosen to build the emulator model, and that by determining the indices at different time points we could identify the intervals on which each parameter is the most important, or more generally the periods over which sensitivity to the parameters remains broadly unchanged.
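The same recipe used for Eqs. (14) and (15) applies to Eqs. (16); one run of the sketch below supplies a single training output for the emulator. The parameter values are illustrative mid-range points from the ranges shown in Figs. 5 and 6.

```python
import numpy as np
from scipy.integrate import solve_ivp

def standby_availability(t, lam, eta, mu):
    """A(t) = P(0)(t) + P(1)(t) for the standby-redundancy system, Eqs. (16)."""
    def rhs(_, P):
        p0, p1, p2 = P
        return [-(lam + eta) * p0 + mu * p1,
                (lam + eta) * p0 - (lam + mu) * p1 + 2.0 * mu * p2,
                lam * p1 - 2.0 * mu * p2]
    sol = solve_ivp(rhs, (0.0, t), [1.0, 0.0, 0.0], t_eval=[t],
                    rtol=1e-8, atol=1e-10)
    return sol.y[0, 0] + sol.y[1, 0]

# One evaluation at central values of the parameter ranges in Figs. 5 and 6.
print(standby_availability(10.0, lam=0.0075, eta=0.00375, mu=0.03))
```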
Fig. 6. Estimated main effects for the system availability of the standby-redundancy system presented in Eqs. (16) at t = 10, based on 30 design points.
Table 2
The emulator-based sensitivity analysis of the system availability of the standby-redundancy system presented in Eqs. (16) at t = 10, based on 10 and 30 design points.

Measurement          10 points    30 points
σ²                   0.935836     0.0881324
A_Em(10)             0.996918     0.996943
EmStd(A(10))         1.2255e-05   4.5408e-06
Variance (%) of λ    82.6633      84.7421
Variance (%) of η    4.66386      3.2605
Variance (%) of μ    12.0513      10.6746
Total variance (%)   99.3785      98.6772
Table 3
The emulator-based sensitivity analysis of the system availability of the standby-redundancy system presented in Eqs. (16) at t = 500, based on 10 and 30 design points.

Measurement          10 points    30 points
σ²                   9.75993      0.502583
A_Em(500)            0.951366     0.952061
EmStd(A(500))        0.0011       0.00055
Variance (%) of λ    11.5649      9.37515
Variance (%) of η    0.120892     0.575127
Variance (%) of μ    81.8986      85.711
Total variance (%)   93.5844      95.6613
5.3. Move-drive system

We consider a more complex system, three special vehicles of the same kind, taken from Zhang and Horigome [34]. The purpose is to examine the performance of the move-drive system of the vehicles. The system consists of eight LWTSS (load wheel and torsional shaft suspension) components, which can be treated as being of the same kind and which form a series system in logic. Since the components are alike, we aggregate the states using the following definition: State i means that i out of the eight components have failed, i = 0, 1, ..., 8. On this basis the system has N = 9 states.
Table 4
The possible ranges for t_50 and t_95/t_50.

Component i   t_50             t_95/t_50
1             (1480, 1500)     (3, 3.2)
2             (3750, 3850)     (4.2, 4.4)
3             (11400, 11500)   (4.2, 4.4)
4,5,6,7,8     (22700, 23000)   (4.2, 4.4)
Fig. 7. System state-transition diagram of move-drive system.
According to the information given in [34], it is not unreasonable to assume that the failure rate functions have the form λ_i(t) = k_i t^{a_i}. It is also assumed that there are at most three repair facilities for each vehicle, all with the same proficiency in operation, and that the repair rate is constant (the repair rate for a failed component under repair by one repair person is independent of time). Fig. 7 shows the resulting system state-transition diagram, from which Eq. (17) is derived:

$$\frac{dP(t)}{dt} = T_{0R}(t) P(t), \qquad (17)$$

where P(t) = (P_0(t), P_1(t), ..., P_8(t)),

$$T_{0R}(t) = \begin{pmatrix}
-c & \mu & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\lambda_1 & -\mu & 2\mu & 0 & 0 & 0 & 0 & 0 & 0 \\
\lambda_2 & 0 & -2\mu & 3\mu & 0 & 0 & 0 & 0 & 0 \\
\lambda_3 & 0 & 0 & -3\mu & 3\mu & 0 & 0 & 0 & 0 \\
\lambda_4 & 0 & 0 & 0 & -3\mu & 3\mu & 0 & 0 & 0 \\
\lambda_5 & 0 & 0 & 0 & 0 & -3\mu & 3\mu & 0 & 0 \\
\lambda_6 & 0 & 0 & 0 & 0 & 0 & -3\mu & 3\mu & 0 \\
\lambda_7 & 0 & 0 & 0 & 0 & 0 & 0 & -3\mu & 3\mu \\
\lambda_8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -3\mu
\end{pmatrix}, \qquad (18)$$

$$c = \sum_{i=1}^{8} \lambda_i,$$
where λ_i = k_i t^{a_i}. The initial condition for (17) is

$$(P_0(0), P_1(0), \ldots, P_8(0)) = (1, 0, \ldots, 0).$$

No closed form for the system availability exists here, and Zhang and Horigome [34] used numerical methods to obtain it. Their method involves integrating convolution-type integrals and then solving a matrix differential equation for a linear time-varying system with the given initial condition (the details can be found in [34]). In order to examine the sensitivity of the system availability with respect to changes in the parameters, we use this numerical method to evaluate the system availability at the training points obtained from the maximin Latin hypercube design. To generate training points for the corresponding parameters, we first need to know the possible ranges of t_50 and t_95/t_50 for each component; these can be obtained using expert judgement or available failure data. By studying the system and the data available in [34], the ranges shown in Table 4 are proposed for each component. We are now able to generate 200 design points for t_50 and t_95/t_50 over the ranges proposed in Table 4 and then, using Eqs. (5) and (6), obtain the corresponding parameter values {(k_i, a_i), i = 1, 2, ..., 8}. In a similar way we obtain 200 design points for the repair rate, μ ∈ (0.02, 0.08). We can now use these values to calculate the system availability, as proposed in [34], at the requested time (t = 100).
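As an illustration of one such model run, the sketch below assembles T_0R(t) from Eq. (18) with λ_i(t) = k_i t^{a_i}, integrates Eq. (17) numerically, and reads off the availability, taking the series-system logic to mean that the system is available only in state 0. The parameter values are hypothetical, chosen to be roughly consistent with Table 4 after the transformation of Eqs. (5) and (6); they are not the values used in [34].

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameters, roughly consistent with Table 4 via Eqs. (5)-(6).
k = np.array([7.0e-5, 1.8e-4, 5.9e-5, 3.0e-5, 3.0e-5, 3.0e-5, 3.0e-5, 3.0e-5])
a = np.array([0.29, 0.004, 0.004, 0.004, 0.004, 0.004, 0.004, 0.004])
mu = 0.05

def rhs(t, P):
    lam = k * t**a                     # time-dependent failure rates lambda_i(t)
    T = np.zeros((9, 9))
    T[0, 0] = -lam.sum()               # -c, with c the sum of the lambda_i
    T[1:, 0] = lam                     # first column of Eq. (18)
    for i in range(1, 9):
        r = min(i, 3) * mu             # at most three repair facilities
        T[i, i] -= r                   # repair outflow from state i
        T[i - 1, i] += r               # repair inflow to state i-1
    return T @ P

P0 = np.zeros(9); P0[0] = 1.0          # initial condition (1, 0, ..., 0)
sol = solve_ivp(rhs, (0.0, 100.0), P0, t_eval=[100.0], rtol=1e-8, atol=1e-10)
print(sol.y[0, 0])                     # A(100) = P_0(100), available in state 0 only
```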
Fig. 8 shows the main effects of the system availability A(t) for the parameters obtained as discussed above, at t = 100. From this figure, and from the information on the main effects given in Table 5, we can conclude that when 100 design points are used the availability is most sensitive to the uncertainties in μ, a_4 and k_1, while with 200 design points the system availability is most sensitive to changes in μ and k_1. The method also enables us to revise the sensitivity analysis if the values of some of these parameters become known or fixed. For instance, with 200 design points, if the values of μ and k_1 were somehow known, the system availability given these parameters would be most sensitive to k_2 (with 29.95% of the variance contribution), a_3 (13.82%), k_5 (12.70%) and k_4 (12.29%). The total variance contribution would then drop to 94.8222%, while the estimated system availability increases to 0.960544. Note that the emulator was constructed using 10 times as many points as before because of the larger dimensionality of the parameter space.
6. Discussion

An alternative sensitivity analysis of quantities of interest in reliability analysis, such as the availability/unavailability function, with respect to changes in uncertain parameters has been presented. The method was originally introduced in [22] to examine the sensitivity of a complex model to changes in its inputs, based on an emulator built to approximate the model. It enables us to decompose the output variance into components representing main effects and interactions, and it allows effective sensitivity analysis to be achieved with a far smaller number of model runs than standard Monte Carlo methods. For example, Marseguerra et al. [21] employ a Monte Carlo method with 10^7 runs to compute the first-order differential sensitivity indices of the basic events characterising the reliability behaviour of a transport system, whereas our approach might require a few hundred runs to achieve the same accuracy, while providing broader sensitivity measures. In the reliability context it is very useful to compute sensitivities to different parameters in order to understand which has the largest impact and therefore which has to be assessed more accurately. In our final example we showed that it was possible to assess this for a system with a more realistic number of parameters, and that only a small number of the parameters generated most of the uncertainty about the availability. It is worth mentioning that the availability function considered here is a function of time and should therefore, in principle, be studied through time. In this paper we have presented a method to examine the sensitivity of the system availability at particular times, such as shortly after the system starts or very close to the steady state. In [10], for a one-component repairable system, we examined the sensitivity of the availability function at several further time points.
Fig. 8. Estimated main effects for the availability at t = 100, based on 200 design points, for the move-drive system given in (17).
Table 5
The emulator-based sensitivity analysis of the system availability of the move-drive system at t = 100, based on 100 and 200 design points.

Measurement           100 points   200 points
Variance (%) of k1    3.59         4.53
Variance (%) of α1    0.82         0.86
Variance (%) of k2    0.42         0.08
Variance (%) of α2    0.22         0.02
Variance (%) of k3    0.87         0.61
Variance (%) of α3    0.85         0.51
Variance (%) of k4    0.56         0.17
Variance (%) of α4    9.53         0.53
Variance (%) of k5    1.57         0.12
Variance (%) of α5    0            0
Variance (%) of k6    1.15         0.14
Variance (%) of α6    0            0
Variance (%) of k7    1.56         0.12
Variance (%) of α7    0.02         0
Variance (%) of k8    1.13         0.20
Variance (%) of α8    0.02         0
Variance (%) of μ     76.94        91.57
Total variance (%)    99.2358      99.4702
A_Em(100)             0.907689     0.905833
EmStd(A(100))         0.00034      0.00031
There is certainly a case for studying the sensitivity through time. When the dynamic behaviour of the reliability quantity is of interest, variants of the approaches used above can be employed. Conti and O'Hagan [8] suggested the following procedures to emulate a dynamic simulator (a minimal sketch of the second appears after the list):

1. The first method uses a multi-output emulator, developed in [8], in which the dimension of the output space is q = T and the outputs of the dynamic model are represented by y = (y_1, ..., y_T). This approach emulates all time points simultaneously.

2. The second approach uses a single-output emulator, following an idea originally mentioned by Kennedy and O'Hagan [19] for analysing spatial outputs. Time is treated as an extra input to the model, and the output y_t = f_t(x) is then represented as f(x, t), where t = 1, ..., T.

3. The third approach is to emulate the T outputs separately, each via a single-output emulator; the data for the t-th emulator are then provided by the corresponding column of the data set.

Any of these methods can be used to emulate A(t), and the fitted emulator can then be used to examine the sensitivity of A(t) (and of other quantities of interest in reliability analysis) over time.
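A minimal sketch of the second approach under stated assumptions: time is treated as one more emulator input, so a single GP approximates f(x, t). It reuses the hypothetical availability solver sketched earlier; X is an assumed (n x 17) design matrix ordered as (k_1..k_8, α_1..α_8, μ).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

times = np.linspace(10.0, 200.0, 20)      # time grid spanning the study window
# Stack (x, t) pairs: every design point is evaluated on the whole time grid.
XT = np.array([np.append(x, t) for x in X for t in times])
y = np.array([availability(t, x[:8], x[8:16], x[16])
              for x in X for t in times])
gp_dyn = GaussianProcessRegressor(RBF(np.ones(XT.shape[1])),
                                  normalize_y=True).fit(XT, y)
# gp_dyn.predict can now be queried at any (x, t), so the sensitivity
# measures above can be recomputed as functions of time.
```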
Buzzard [5] proposed an alternative method in which sparse grid interpolation is used to provide good approximations to smooth functions in high dimensions from relatively few function evaluations. He uses an efficient conversion from the interpolating polynomial provided by evaluations on a sparse grid to a representation in terms of orthogonal polynomials, and shows how these relatively few function evaluations can be used to estimate several types of sensitivity coefficients and to provide estimates of local minima and maxima. Applying Buzzard's method to the sensitivity analyses of interest in this paper, and comparing it with the emulator-based method proposed here, is left for future work. Having a solution to the computational problem of computing sensitivity for large-scale models does not, however, by itself make these models easy to use in a practical context. The uses for this kind of analysis are, in our view, twofold:
- A high-level screening of reliability parameters, to identify which parameters are the most significant and hence to reduce the number of parameters being considered.
- An exploratory approach to look at gross model sensitivities.
The second point may require us to reparameterise our reliability models to insert extra parameters that represent model features. For example, when a material is being used in a novel context, there may be considerable uncertainty about the time at which aging begins. Hence, instead of (say) a plain Weibull model, we could use a distribution that includes a specific time to onset of aging, as sketched below. Similarly, when investigating the impact of possible logistical delays, extra parameters can be introduced to model the repair delay.
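For illustration only, the following is one hedged way such a reparameterised failure rate might look: constant before an explicit onset-of-aging time τ and Weibull-like after it. The parameters τ, lam0, a and b are hypothetical extra parameters of the kind this reparameterisation would introduce, not quantities from the examples above.

```python
import numpy as np

def hazard(t, lam0, tau, a, b):
    """lam0 for t < tau; lam0 + a*(t - tau)**b once aging has begun.

    The sensitivity of availability to tau then measures directly how
    much the uncertain onset of aging matters.
    """
    t = np.asarray(t, dtype=float)
    aged = a * np.clip(t - tau, 0.0, None) ** b   # zero before onset
    return lam0 + np.where(t < tau, 0.0, aged)
```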
Finally, it is clear that by modelling the costs associated with the repair process we can compute the uncertainties in costs up to time t alongside the availability. The approach given above assumes that we are able to determine a distribution G for the model parameters. Typically these distributions should come from generic databases for new equipment, or be informed by industrial data records where similar equipment is used in a similar context. The uncertainty represents state-of-knowledge uncertainty about the parameter values, which will be reduced through time as we learn from operating experience. Hence the methods discussed here are most appropriate in the design phase for new equipment and in the planning stage for maintenance, when these state-of-knowledge uncertainties are greatest.
Acknowledgements

This research was supported by the Engineering and Physical Sciences Research Council (Grant EP/E018084/1). The authors also wish to acknowledge discussions with Dr. John Quigley.

References

[1] Bedford T. Sensitivity indices for (tree)-dependent variables. In: Chan K, Tarantola S, Campolongo F, editors. SAMO'98: proceedings of the second international symposium on sensitivity analysis of model output. EUR 17758 EN, JRC-EC, Ispra; 1998. p. 17-20.
[2] Bedford T, Cooke R. Probabilistic risk analysis: foundations and methods. Cambridge: Cambridge University Press; 2001.
[3] Borgonovo E. A new uncertainty importance measure. Reliability Engineering and System Safety 2007;92(6):771-84.
[4] Borgonovo E, Castaings W, Tarantola S. Model emulation and moment-independent sensitivity analysis: an application to environmental modelling. Environmental Modelling and Software 2012;34:105-15.
[5] Buzzard GT. Global sensitivity analysis using sparse grid interpolation and polynomial chaos. Reliability Engineering and System Safety 2012;107:82-9.
[6] Caniou Y, Sudret B. Distribution-based global sensitivity analysis using polynomial chaos expansions. Procedia - Social and Behavioral Sciences 2010;2(6):7625-6.
[7] Caniou Y, Sudret B. Distribution-based global sensitivity analysis in case of correlated input parameters using polynomial chaos expansions. In: Nishijima K, editor. Applications of statistics and probability in civil engineering. CRC Press; 2011. p. 695-702.
[8] Conti S, O'Hagan A. Bayesian emulation of complex multi-output and dynamic computer models. Journal of Statistical Planning and Inference 2010;140:640-51.
[9] Cukier RI, Fortuin CM, Shuler KE, Petschek AG, Schaibly JH. Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. I. Theory. Journal of Chemical Physics 1973;59(8):3873-8.
[10] Daneshkhah A, Bedford T. Sensitivity analysis of a reliability system using Gaussian processes. In: Bedford T, Quigley J, Walls L, Alkali B, Daneshkhah A, Hardman G, editors. Advances in mathematical modeling for reliability. Amsterdam: IOS Press; 2008. p. 46-62.
[11] Gustafson P. The local sensitivity of posterior expectations. Unpublished PhD thesis. Department of Statistics, Carnegie Mellon University; 1994.
[12] Helton JC. Treatment of uncertainty in performance assessments for complex systems. Risk Analysis 1994;14(4):483-511.
[13] Helton JC. Uncertainty and sensitivity analysis in the presence of stochastic and subjective uncertainty. Journal of Statistical Computation and Simulation 1997;57(1-4):3-76.
[14] Helton JC, Davis FJ. Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliability Engineering and System Safety 2003;81(1):23-69.
[15] Helton JC, Johnson JD, Sallaberry CJ, Storlie CB. Survey of sampling-based methods for uncertainty and sensitivity analysis. Reliability Engineering and System Safety 2006;91(10-11):1175-209.
[16] Helton JC, Sallaberry CJ. Yucca Mountain 2008 performance assessment: incorporation of seismic hazard curve uncertainty. In: Proceedings of the 13th international high-level radioactive waste management conference (IHLRWMC), Albuquerque, NM, April 10-14, 2011. La Grange Park, IL: American Nuclear Society; 2011. p. 1041-8.
[17] Hoffman FO, Hammonds JS. Propagation of uncertainty in risk assessments: the need to distinguish between uncertainty due to lack of knowledge and uncertainty due to variability. Risk Analysis 1994;14(5):707-12.
[18] Homma T, Saltelli A. Importance measures in global sensitivity analysis of model output. Reliability Engineering and System Safety 1996;52:1-17.
[19] Kennedy MC, O'Hagan A. Bayesian calibration of computer models (with discussion). Journal of the Royal Statistical Society: Series B 2001;63:425-64.
[20] Kumamoto H, Henley EJ. Probabilistic risk assessment and management for engineers and scientists. Piscataway, NJ: IEEE Press; 1996.
[21] Marseguerra M, Zio E, Podofillini L. First-order differential sensitivity analysis of a nuclear safety system by Monte Carlo simulation. Reliability Engineering and System Safety 2005;90:162-8.
[22] Oakley JE, O'Hagan A. Probabilistic sensitivity analysis of complex models: a Bayesian approach. Journal of the Royal Statistical Society: Series B 2004;66:751-69.
[23] O'Hagan A, Kennedy M, Oakley JE. Uncertainty analysis and other inference tools for complex computer codes (with discussion). In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian statistics, vol. 6. Oxford, UK: Oxford University Press; 1999. p. 503-24.
[24] Paté-Cornell ME. Uncertainties in risk analysis: six levels of treatment. Reliability Engineering and System Safety 1996;54(2-3):95-111.
[25] Parry GW. The characterization of uncertainty in probabilistic risk assessments of complex systems. Reliability Engineering and System Safety 1996;54(2-3):119-26.
[26] Ratto M, Castelletti A, Pagano A. Emulation techniques for the reduction and sensitivity analysis of complex environmental models. Environmental Modelling and Software 2012;34:1-4.
[27] Ratto M, Pagano A. Using recursive algorithms for the efficient identification of smoothing spline ANOVA models. Advances in Statistical Analysis 2010;94:367-88.
[28] Reedijk CI. Sensitivity analysis of model output: performance of various local and global sensitivity measures on reliability problems. Master's thesis, Delft University of Technology; 2000.
[29] Sacks J, Welch WJ, Mitchell TJ, Wynn HP. Design and analysis of computer experiments (with comments and a rejoinder by the authors). Statistical Science 1989;4(4):409-35.
[30] Saltelli A, Chan K, Scott EM, editors. Sensitivity analysis. New York: Wiley; 2000.
[31] Saltelli A, Tarantola S, Chan KPS. A quantitative model-independent method for global sensitivity analysis of model output. Technometrics 1999;41(1):39-56.
[32] Sobol' IM. Sensitivity analysis for nonlinear mathematical models. Mathematical Modelling and Computational Experiment 1993;1:407-14.
[33] Sudret B. Global sensitivity analysis using polynomial chaos expansions. Reliability Engineering and System Safety 2008;93(7):964-79.
[34] Zhang T, Horigome M. Availability and reliability of system with dependent components and time-varying failure and repair rates. IEEE Transactions on Reliability 2001;50(2):151-8.