Computers & Industrial Engineering 62 (2012) 190–197
Contents lists available at SciVerse ScienceDirect
Computers & Industrial Engineering journal homepage: www.elsevier.com/locate/caie
Optimal maintenance policies for systems subject to a Markovian operating environment Yisha Xiang a,⇑, C. Richard Cassady b,1, Edward A. Pohl b,2 a b
Sun Yat-sen Business School, Sun Yat-sen University, 135 W. Xingang Rd., Guangzhou 510275, China Department of Industrial Engineering, University of Arkansas, Fayetteville, AR 72701, USA
a r t i c l e
i n f o
Article history: Received 15 March 2011 Received in revised form 7 September 2011 Accepted 9 September 2011 Available online 17 September 2011 Keywords: Stochastic degradation Dynamic environment Condition-based maintenance
a b s t r a c t Many stochastic models of repairable equipment deterioration have been proposed based on the physics of failure and the characteristics of the operating environment, but they often lead to time to failure and residual life distributions that are quite complex mathematically. The first objective of our study is to investigate the potential for approximating these distributions with traditional time to failure distribution. We consider a single-component system subject to a Markovian operating environment such that the system’s instantaneous deterioration rate depends on the state of the environment. The system fails when its cumulative degradation crosses some random threshold. Using a simulation-based approach, we approximate the time to first failure distribution for this system with a Weibull distribution and assess the quality of this approximation. The second objective of our study is to investigate the cost benefit of applying a condition-based maintenance paradigm (as opposite to a scheduled maintenance paradigm) to the repairable system of interest. Using our simulation model, we assess the cost benefits resulting from condition-based maintenance policy, and also the impact of the random prognostic error in estimating system condition (health) on the cost benefits of the condition-based maintenance policy. Ó 2011 Elsevier Ltd. All rights reserved.
1. Introduction A repairable system is a device or unit of equipment which after failure can be restored to an operating condition by maintenance actions including but not limited to replacing the entire system. The maintenance of a repairable system may include not only responses to failures, but also actions intended to delay failures and improve the day-to-day performance of the system. Most industrial and military organizations depend upon the effective operation of individual repairable systems or fleets of repairable systems (e.g. vehicles, machines, etc.) to successfully complete their mission. In order to keep the system operating at a desirable condition, scheduled maintenance actions have traditionally been used to delay or prevent system failure. Scheduled maintenance is initiated based on some measure of elapsed time, and maintenance schedules are determined typically using a probabilistic model of repairable system operation, failure, and maintenance. The literature on the use of mathematical modeling for analyzing and ⇑ Corresponding author. Tel.: +86 20 84112601; fax: +86 20 84036588. E-mail addresses:
[email protected] (Y. Xiang),
[email protected] (C.R. Cassady),
[email protected] (E.A. Pohl). 1 Tel.: +1 479 5756735. 2 Tel.: +1 479 5756042. 0360-8352/$ - see front matter Ó 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.cie.2011.09.006
optimizing scheduled maintenance plans is extensive. Early reviews on scheduled maintenance include Pierskalla and Voelker (1976), Sherif and Smith (1981), Valdez-Flores and Feldman (1989), Cho and Parlar (1991) and Wang (2002). More recent work of scheduled maintenance in industrial applications can be found in Das and Acharya (2004), Budai, Huisman, and Dekker (2006), Kenné, Gharbi, and Beit (2007), Panagiotidou and Tagaras (2007), Alardhi and Labib (2008), Canto (2008), Samrout, Châtelet, Kouta, and Chebbo (2009) and Moghaddam and Usher (2011). Because scheduled maintenance policies are based on probabilistic time to failure models, implementing scheduled maintenance policies does not eliminate the risk of system failure and implies that systems in an operating condition will be shut down for maintenance. Recently, condition-based maintenance policies have received more and more attentions. This type of policies takes into account updated risks of failure, and suggests system inspection and maintenance action based on the currently observed system state (Liao, Elsayed, & Chan, 2006). The ultimate aim of condition-based maintenance is to eliminate the wasted operating time and risk of failure associated with using a scheduled maintenance policy. The available accurate sensor technologies that can continuously provide performance indicators at low cost make condition-based maintenance more appealing. Sensor data on system condition (health), especially if it is collected in real time, can be analyzed so that maintenance technicians can intervene
Y. Xiang et al. / Computers & Industrial Engineering 62 (2012) 190–197
immediately before system failure (see examples in Elwany and Gebraeel (2008), Kaiser and Gebraeel (2009) and Elwany, Gebraeel, and Maillart (2011)). Jardine (2002) provides a review of common strategies for implementing smart condition-monitoring approaches such as trend analysis and expert systems. The substantial economic benefits associated with this approach to the optimization of condition-based maintenance decisions are shown through case examples. Bloch-Mercier (2002) and Chen, Chen, and Yuan (2003) propose state-dependent preventive maintenance policy for a Markovian deteriorating multi-state system. However, the classification of the multiple states is usually arbitrary and probabilities of transition in a Markov chain can be difficult to evaluate in practice, so a more realistic modeling scheme is to treat the system as a continuous-state system (Dieulle, Bérenguer, Grall, & Roussignol, 2003; Grall, Bérenguer, & Dieulle, 2002; Grall, Dieulle, Bérenguer, & Roussignol, 2006; Liao & Rausch, 2010; Liao et al., 2006). van der Weide, Pandey, and van Noortwijk (2010) and van der Weide and Pandey (2011) consider a maintained system subject to random shocks, and a combined condition-based and age-based preventive maintenance is proposed. The recent popularity of condition-based maintenance in the research community relies heavily on the development of stochastic degradation models. Early work on degradation models uses regression-based methods for lifetime estimation due to their simplicity (Boulanger & Escobar, 1994; Lu & Meeker, 1993; Meeker, Escobar, & Lu, 1998). One of the widely adopted approaches to stochastic failure models is describing wear by a diffusion process, such as a Wiener process (Chien-Yu & Sheng-Tsaing, 2010; Joseph & Yu, 2006; Wang, Carr, Xu, & Kobbacy, 2011; Whitmore, Crowder, & Lawless, 1998; Whitmore & Schenkelberg, 1997), or a gamma process (Chih-Chun, Sheng-Tsaing, & Balakrishnan, 2011; Liao & Rausch, 2010; Liao et al., 2006; Pan & Balakrishnan, 2011; Park & Padgett, 2005; van Noortwijk, 2009; van Noortwijk, van der Weide, Kallen, & Pandey, 2007). Other degradation-based approaches include using a stochastic process to model failure rate (Aalen & Hakon, 2001) or dynamic operating environment, for example, Markovian environment (Kharoufeh, 2003; Kharoufeh & Cox, 2005; Kharoufeh & Mixon, 2009; Kharoufeh, Solo, & Ulukus, 2010). Overviews of stochastic-process-based failure models can be found in Singpurwalla (1995) and Si, Wang, Hu, and Zhou (2011). The fundamental theme of these models is that they are derived by using stochastic processes to describe the underlying mechanisms, such as degradation and wear that cause system failure. Most of the aforementioned condition-based maintenance policies are developed for systems subject to stochastic degradation. Much less efforts are devoted to maintenance management for systems subject to dynamic environments (see example in Kurt and Kharoufeh (2010)). Relative to degradation models, most of the existing literature assumes that failure threshold remains constant. Limited degradation models address random failure thresholds (Wang et al., 2011). In this paper, we attempt to fill this gap by developing cost models to determine optimal maintenance policies for systems with random failure thresholds operating in a Markovian environment. Our interest is in stochastic degradation models of the type proposed by Kharoufeh and Cox (2005) where a single-unit system operates in a Markovian environment. We use a simulation model to generate time to first failure data in such an environment and use Weibull distribution to fit the data. We then develop condition-based maintenance and age-based preventive maintenance for the defined system, and assess the cost benefits resulting from condition-based maintenance. We further test the impact of prognostic error in estimating equipment health on the cost benefits of condition-based maintenance. There are three main contributions in this study. Firstly, investigate the potential for approximating the time to failure probability distribution
191
resulting from a class of stochastic degradation models with traditional time to failure distributions (e.g. the Weibull). The Weibull distribution and its relatives have been around for about 70 years and have widespread usage. By showing that such an approximation is plausible, credibility is lent to the use of stochastic degradation models by those practitioners experienced and comfortable with the traditional, scheduled maintenance paradigm. Secondly, investigate the cost benefit of applying a condition-based maintenance paradigm as opposed to a scheduled maintenance paradigm to the repairable system of interest. Lastly, assess the impact on the cost benefits if the condition-based maintenance policy is subject to prognostic error. Our results show that moderate prognostic error can render condition-based maintenance inferior to age-based preventive maintenance. The remainder of the paper is organized as follows. Section 2 is devoted to the description of the system under consideration and a numerical experimentation to assess the quality of approximating time to failure data with the Weibull distribution. The development of optimal condition-based and age-based maintenance policies for the system of our interest are presented in Section 3. Section 3 also examines the impact of random prognostic error on the cost benefits from the condition-based maintenance policy. Section 4 provides conclusion and future direction. 2. Stochastic deterioration model 2.1. System definition Consider a repairable system almost identical to the type modeled by Kharoufeh and Cox (2005) that operates in a stochastic environment. The environment in which the system operates is a three-state, continuous-time Markov chain. Let X(t) denote the state of the environment at time t, t P 0, and note that: (1) X(t) e {1, 0, +1}, for all t P 0; (2) X(0) = 0. Environment state 0 corresponds to the system’s nominal environment; environment state 1 corresponds to a less severe than nominal environment; environment state +1 corresponds to a more severe than nominal environment. Let qx,x0 denote the transition rate of the continuous-time Markov chain from state x to state x0 , for all x e {1, 0, +1}, x0 e {1, 0, +1}, x – x0 . Furthermore, let vx denote the transition rate of the continuous-time Markov out of state x for all x e {1, 0, +1}, and let px,x0 denote the transition probability of the continuous-time Markov chain from state x to state x0 , for all x e {1, 0, +1}, x0 e {1, 0, + 1}, x – x0 . Note that
qx;x0 ¼ px;x0 v x for all x e {1, 0, +1}, x0 e {1, 0, +1}, x – x0 . We assume q1,+1 = q+1,1 = 0, i.e. from the less (or more) severe than nominal state, the environment can only transition to the nominal state. The system is subject to stochastic deterioration, and the instantaneous rate of system deterioration is a function of the current state of the environment. Let r(X(t)) denote the instantaneous
Fig. 1. State transition diagram for the assumed operating environment.
192
Y. Xiang et al. / Computers & Industrial Engineering 62 (2012) 190–197
rate of system deterioration at time t, t P 0, where r(1) 6 r(0) 6 r(+1). Fig. 1 contains a graphical representation of the states of the environment and the potential transitions among the states. Let Y(t) denote the cumulative deterioration of the system at time t, t P 0,
YðtÞ ¼
Z
t
rðXðuÞÞdu
0
The system fails for the first time when the cumulative deterioration reaches the value W where W is a non-negative, continuous random variable having cumulative distribution function H. We assume that W is a random variable having mean 1, and we consider four specific forms of the probability distribution of W: 1. W is an exponential random variable
HðwÞ PrðW 6 wÞ ¼ 1 expðwÞ 2. W is a Weibull random variable (h > 0)
h ! 1 HðwÞ PrðW 6 wÞ ¼ 1 exp w C 1 þ h 3. W is a gamma random variable (a > 0)
HðwÞ PrðW 6 wÞ ¼
Z
w
0
aa xa1 eax dx CðaÞ
4. W is a lognormal random variable (l < 0)
HðwÞ PrðW 6 wÞ ¼
Z 0
w
! 1 ðln x lÞ2 pffiffiffiffiffiffiffiffiffiffiffiffiffiffi exp dx 4l x 4lp
Let T denote the time to first failure of the system,
Fig. 2. Example time to first failure data.
is replicated a user-specified number of times n. The other user-defined inputs include the transition rates associated with the continuous-time Markov chain and the deterioration rates associated with each state of the continuous-time Markov chain. Consider an example system (which we refer to as Example System 1) such that the failure threshold is a Weibull random variable (h = 2), r(1) = 0.7, r(0) = 1, r(+1) = 1.5, q1,0 = 40, q0,1 = 35, q0,+1 = 80, and q+1,0 = 45. Fig. 2 contains the adjusted relative frequency based on n = 1000 simulated observations on system time to first failure. Let F denote the cumulative distribution function of the Weibull random variable used to approximate T,
PrðT 6 tÞ ffi FðtÞ ¼ 1 expððt=gÞb Þ and let f denote the corresponding probability density function. Note that b > 0 and g > 0. Let {t1, t2, . . . , tn} denote the time to first failure values obtained from the simulation model. We use maximum likelihood estimation to estimate the values of b and g. The log-likelihood function is constructed as follows
T ¼ minfYðtÞ P Wg tP0
Note that T is a non-negative, continuous random variable. Kharoufeh and Cox (2005) show that the probability distribution of this type of time to failure random variable is unwieldy when the failure threshold W is constant. 2.2. Approximating time to failure behavior The time to first failure (T) of the system defined in Section 2.1 has a probability distribution that is unwieldy. In addition, the Weibull probability distribution is widely accepted by maintenance planning researchers and practitioners as a model of time to first failure behavior. Therefore, our goal in this section is to define a methodology for approximating the true time to failure probability distribution of the system defined in Section 2.1 with a Weibull probability distribution. Our approach in developing this approximation is motivated by the data-driven approach typically used in practice to fit a Weibull distribution to time to first failure data. The outline of our approach is as follows. We first construct a discrete-event simulation model to mimic the transition behavior of the dynamic environment, generate time to first failure data in a Markovian environment, use Weibull distribution to fit the simulated data, and conduct large numerical experiments to assess the quality of the fit by a K–S goodness-of-fit test. To generate time to first failure data, a discrete-event simulation model of the system defined in Section 2.1 was constructed. This model is initialized with a new system in state 0, mimics the transitions of the underlying continuous-time Markov chain (accumulating deterioration over time), terminates when the cumulative deterioration crosses the randomly-generated failure threshold, and outputs the time to first failure value. This process
Kðb; gÞ ¼ ln
n Y
f ðt1 Þ
ð1Þ
i¼1
The maximum likelihood estimate (MLE) of b and g can be obtained by solving the following two equations
g^ ¼
n 1X ^ tb n i¼1 i
!1b ð2Þ
Pt
^ b n 1X 1 t i ln t i 1 lnðti Þ ¼ 0 Pn b^ ^ n b 1 1 ti
ð3Þ
Farnum and Booth (1997) show that the maximum likelihood estimates for the two-parameter Weibull distribution are unique, therefore, bisection search(Monahan, 2001) is used to find the root ^ and then for Eq. (3), which is the maximum likelihood estimate b, substitution into Eq. (2) is used to identify the maximum likelihood ^ . The bisection search is terminated when estimate g
jbkþ1 bk j 6 where bk is the value of b in iteration k and bk+1 is the value of b in the iteration k + 1. We use = 0.005. We established the initial interval of uncertainty on b by trying different lower and upper bounds, and we found that an initial lower bound of 0.2 and an initial upper bound of 9 work very well. For Example System 1, the ^ and g ^ are 2.03 and 0.99, respectively. Note resulting estimates of b ^ > 1 is consistent with the belief that systems subthat a value of b ject to deterioration processes exhibit increasing failure rate behavior. In Fig. 3, the probability density function corresponding to these estimates is superimposed onto the histogram from Fig. 2.
193
Y. Xiang et al. / Computers & Industrial Engineering 62 (2012) 190–197
P0;þ1 ¼ 1 p0;1
Fig. 3. Example time to failure histogram with approximate density function.
The final step in our analysis of the simulated time to first failure data is motivated by researchers who test the quality of pseudo-random number generators by subjecting the pseudorandom numbers to statistical tests for randomness even though they know the null hypothesis to be false. Based on this motivation, we subject a subset of the simulated data to a Kolmogorov– Smirnoff (K–S) goodness-of-fit test (with a level of significance of 0.05) based on the approximate Weibull probability distribution (see Kelton and Law (2000) for the details associated with performing a K–S goodness-of-fit test). Since it is unlikely that an analyst attempting to fit a probability distribution to a time to first failure data set would have access to a large data set, only fifty observations from the simulated data set are utilized. The test statistic from the K–S test is 0.680, which is less than the critical value 0.856 for a level of significance of 0.05 and a sample size of 50. Therefore, the Weibull probability distribution provides a reasonable fit to the simulated time to failure data for Example System 1. 2.3. Numerical assessment of the quality of the approximation The results for Example System 1 presented in Section 2.2 demonstrate that it is possible to fit a Weibull distribution to the time to first failure data from a system of the type defined in Section 2.1. The purpose of this section is to define a larger numerical experiment that covers a wider and reasonable range of system parameter values. Without loss of generality, we assume r(0) = 1 for all experiments. For each experiment, the system deterioration rate in the less severe than nominal environment should be smaller than r(0), and the system degradation rate in the more severe than nominal environment should be larger than r(0). Therefore, r(1) is drawn from a uniform probability distribution over the range (0.25, 0.75), 4Þ. Regarding the i.e. r(1) U(0.25, 0.75), and rðþ1Þ Uð1:3; failure threshold, our intention is to include a wide variety of the four different probability distributions. When the threshold is exponentially distributed, since the failure threshold mean is equal to 1, the rate of the exponential distribution is 1 and no other parameter needs be specified. When the threshold is a Weibull random variable, h U(0.5, 3.5); when the threshold is a gamma random variable, a U(0.5, 3.5); when the threshold is lognormally distributed, l U(1, 0.1). For the continuous-time Markov chain that governs changes in the system operating environment, our intention in designing the experiments was for the typical system to transition among environmental states many time before system failure. Therefore, m0 U(50, 200). In addition, our intention was for dwell times in the nominal environmental state to be longer than the dwell times in the other two states. Therefore, m1 U(1.5, 4) m0 and m+1 U(1.5, 4) m0. The transition probabilities for the environmental states are drawn according to p0,1 U(0.25, 0.75) and
Using this methodology, 10,000 experimental systems were created for each of the four failure threshold distributions (40,000 total experimental systems). For each system, a time to first failure data set of size 50 was created using the simulation model. For each data set, the parameters of the approximate Weibull distribution were estimated, and the corresponding K–S test was performed using a level of significance of 0.05. The results are summarized in Table 1. From Table 1, we can see that for more than 90% of the tests involving exponential, Weibull, and gamma failure threshold distributions, the Weibull probability distribution provides a good approximation for the time to first failure behavior of the system. The quality of the Weibull fit is not as good when the failure threshold is a lognormal random variable. This is because the lognormal distribution has the thicker tail among the four distributions used here. When the random failure threshold has a thick tail, time to failure has a thick tail, too. The MLE is notorious in extrapolating from a fit that is dominated from the middle of the data, and frequently provides bad fits in tails where the (simulated) data are scarce. In an effort to gain insight into the quality of the Weibull approximation, we attempted to develop a statistical model of the K–S test result as a function of the system parameters. No adequate model forms could be identified. Furthermore, a one-factor-at-a-time graphical analysis revealed no obvious patterns in the test results. Since the failure of the system of interest is governed by a cumulative deterioration process, we expected the system to demonstrate increasing failure rate behavior. Therefore, we were quite ^ < 1. Upon surprised at the number of experiments resulting in b further analysis of the results from the experiments involving gamma and Weibull failure threshold distributions, we found that when h > 1 or a > 1, the resulting approximated Weibull distribution almost always has increasing failure rate (IFR), and when h < 1 or a < 1, the resulting approximated Weibull distribution almost always has a decreasing failure rate (DFR). Based on all of these results, we hypothesize that the behavior time to first failure distribution for the repairable system of interest mimics the general behavior of the failure threshold distribution. Since the exponential distribution is a special case of the Weibull distribution and the gamma distribution has similar behavior to the Weibull distribution, we think it is reasonable that the Weibull approximation of the time to first failure distribution is very good for the experiments involving exponential, gamma, and Weibull failure threshold distributions. 3. Maintenance models 3.1. Scheduled and condition-based maintenance policies for the system of interest In this section, we develop both scheduled and condition-based maintenance policies for the system of our interests and assess the
Table 1 Experimental results for approximating time to failure with the Weibull distribution. Failure threshold distribution
Experiments resulting ^>1 in b
Experiments resulting in rejection of the null hypothesis in the K–S test
Exponential Weibull Gamma Lognormal
5586 8361 8375 5164
599 (5.99%) 547 (5.47%) 864 (8.64%) 3923 (39.23%)
(55.86%) (83.61%) (83.75%) (51.64%)
194
Y. Xiang et al. / Computers & Industrial Engineering 62 (2012) 190–197
potential cost benefits from implementing condition-based maintenance policies. Based on the approximation of Weibull distribution described in Section 2.2, an analytical cost model is developed to determine optimal scheduled maintenance policies. Simulation optimization is used to seek optimal condition-based maintenance policies. Large numerical experiments are conducted to compare the long-run average cost rates under different maintenance strategies. We also examine the robustness of the cost savings by considering prognostic errors. Under a scheduled maintenance paradigm, the approximate Weibull probability distribution of T can be used to develop a scheduled maintenance policy for the repairable system. In this paper, we consider an age-based preventive maintenance policy. Under this policy, instantaneous, perfect, corrective maintenance is performed if the repairable system fails. If the repairable system operates without failure for s time units (where s > 0), then instantaneous, perfect, preventive maintenance is performed. We evaluate the cost of this age-based preventive maintenance policy using the long-run average cost of system maintenance. Let cCM denote the cost of a corrective maintenance action, and let cPM denote the cost of a preventive maintenance action. We assume 0 < cPM < cCM. Since both corrective and preventive maintenance restore the repairable system to an as good as new condition, the operation and maintenance of the repairable system can be modeled by a renewal process where the renewal points are the completion of any maintenance action. Let l(s) denote the expected value of the duration of one cycle of the renewal process, i.e. the expected time between the completion of two consecutive maintenance actions given a scheduled maintenance policy s. Assuming that the approximate Weibull time to failure distribution is valid,
lðsÞ ¼
Z s
tf ðtÞdt þ sð1 FðsÞÞ
0
Let q(s) denote the expected value of the maintenance cost incurred in one cycle of the renewal given a scheduled maintenance policy s. Assuming that the approximate Weibull time to failure distribution is valid,
qðsÞ ¼ cCM FðsÞ þ cPM ð1 FðsÞÞ Let cSM(s) denote the long-run expected maintenance cost per unit time given a scheduled maintenance policy s. Assuming that the approximate Weibull time to failure distribution is valid,
cSM ðsÞ ¼
qðsÞ lðsÞ
ð4Þ
Numerical integration is required to evaluate Eq. (4) and equal interval search (Arora, 2004) is used to find the recommended agebased preventive maintenance policy s. The algorithm is terminated when
jskþ1 sk j s sk where sk is the value of s in iteration k and sk+1 is the value of s in the iteration k + 1. We use es = 0.005. We initiate the numerical search from a very small value,
s0 ¼ 0:01 MTTF where MTTF is the mean time to failure based on the approximate Weibull distribution. For Example System 1 with CPM = 1 and CCM = 1.5, the recommended age-based preventive maintenance policy is s = 1.38 and cSM(s) = 1.745. Since the Weibull probability distribution is approximate, the value of cSM(s) does not reflect the true long-run expected maintenance cost per unit time given the recommended scheduled maintenance policy. We extend the simulation model described in
Section 2.2 to include corrective maintenance, and the recommended scheduled maintenance policy, to estimate the true long-run expected maintenance cost per unit time given the recommended scheduled maintenance policy. We denote to this ^SM ðs Þ. Estimating the cost in this manner introduces estimate as c a new issue: simulation run length. We use the procedure recommended by Law and Carson (1979) to terminate the simulation. Their method requires specification of a confidence level and q, an upper limit on the ratio of the confidence interval half width to the point estimate. We use a confidence level of 95% and q = 0.025, as suggested in Law and Carson (1979). Based on the simulation, the estimate of the actual long-run cost per unit time ^SM ð1:38Þ of the recommended scheduled maintenance policy is c which is approximately 20% greater than cSM(1.38), the cost indicated by the renewal-reward process. The difference between cost rates resulted from simulation and analytical analysis show that there is some gap between the Weibull approximation and the true failure distribution. Weibull distribution is a reasonable approximation statistically, however, a better maintenance strategy based on the underlying failure mechanism is needed to eliminate the risk of system failure and achieve cost efficiency. This further motivates exploring potential benefits of implementing conditionbased maintenance for the system of our interests. Suppose the defined repairable system is such that inspection can be used to observe the current value of the cumulative deterioration of the system. As an alternative to the age-based, scheduled preventive maintenance policy, we consider a condition-based maintenance policy based on periodic inspection of the repairable system. Under such a policy, instantaneous, perfect, corrective maintenance is performed if the repairable system fails. Furthermore, the cumulative deterioration of the system is observed (through instantaneous, perfect inspection) every d time units after each maintenance action (where d > 0). Upon any inspection, if the cumulative deterioration exceeds n (where n > 0), then instantaneous, perfect, preventive maintenance is performed. We evaluate the cost of this periodic-inspection, conditionbased maintenance policy using the long-run average cost of system maintenance. Let c1 denote the cost on inspection. We assume 0 < c1 < cPM. Because of the complexity of the stochastic deterioration process, we modify our existing simulation model to estimate cCBM(d, n), the long-run expected maintenance cost per unit time given a condition-based maintenance policy (d, n). We denote this ^CBM ðd; nÞ. estimate as c We embed this simulation model into an optimization algorithm to determine recommended values of d and n. The optimization algorithm we use is the cyclic coordinate method (Bazaraa, Sherali, & Shetty, 2006), a multidimensional search technique which does not utilize the derivatives of the objective function, and equal interval search (Arora, 2004) for each dimension. The algorithm is terminated when the number of iterations exceeds N, or
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðdkþ1 dk Þ2 þ ðnkþ1 nk Þ2 where (dk, nk) is the solution in iteration k and (dk+1, nk+1) is the solution in iteration k + 1. We use e = 0.1 and N = 1000. This search algorithm does not guarantee a global optimum, therefore we replicate the process multiple times with different starting solutions and select the values of d and n that generate the smallest value ^CBM ðd; nÞ across all replications. We initiate the search process of c using a very small inspection interval
d0 ¼ 0:01 MTTF and a very low critical deterioration level
n0 ¼ fn : PrðW 6 nÞ ¼ 0:05g ¼ 0:256
195
Y. Xiang et al. / Computers & Industrial Engineering 62 (2012) 190–197 Table 2 Experimental results for assessing the cost benefit of condition-based maintenance (CBM) as opposed to preventive maintenance (PM). Failure threshold distribution
Number of experiments
Experiments resulting ^SM ðsÞ ^CBM ðd; nÞ < c in c
Average cost savings resulting from CBM as opposed to PM (%)
Median cost savings resulting from CBM as opposed to PM (%)
Standard deviation of cost savings resulting from CBM as opposed to PM (%)
Exponential Weibull Gamma Lognormal
5586 8361 8375 5164
5523 8175 8331 5115
30.98 23.17 44.88 30.20
31.91 23.66 47.32 29.56
11.41 10.71 12.69 12.76
(98.87%) (97.78%) (99.47%) (99.05%)
Table 3 Experimental results (5586 experiments) for assessing the cost benefit of condition-based maintenance (CBM) as opposed to preventive maintenance (PM) under prognostic error when the failure threshold is an exponential random variable. Standard deviation of prognostic error (r)
Experiments resulting ^SM ðsÞ ^CBM ðd; nÞ < c in c
Average cost savings resulting from CBM as opposed to PM (%)
Median cost savings resulting from CBM as opposed to PM (%)
Standard deviation of cost savings resulting from CBM as opposed to PM (%)
0.03 0.06 0.09 0.12 0.15
2938 2913 2898 2910 2894
1.16 1.18 1.01 1.00 1.22
1.32 1.05 1.08 1.12 0.85
21.20 21.00 20.92 20.56 20.67
(52.60%) (52.15%) (51.88%) (52.09%) (51.81%)
Table 4 Experimental results (8361 experiments) for assessing the cost benefit of condition-based maintenance (CBM) as opposed to preventive maintenance (PM) under prognostic error when the failure threshold is a Weibull random variable. Standard deviation of prognostic error (r)
Experiments resulting ^SM ðsÞ ^CBM ðd; nÞ < c in c
Average cost savings resulting from CBM as opposed to PM (%)
Median cost savings resulting from CBM as opposed to PM (%)
Standard deviation of cost savings resulting from CBM as opposed to PM (%)
0.03 0.06 0.09 0.12 0.15
4807 4594 4357 4161 4050
1.23 0.18 1.27 2.28 3.15
3.03 2.27 0.94 0.08 0.55
17.37 17.61 18.38 18.68 19.21
(57.49%) (54.95%) (52.11%) (49.77%) (48.44%)
We then replicate the search process using different starting points by making d0 a random (uniformly distributed) proportion of MTTF,
d0 ¼ k MTTF 0 < k < 1. For Example System 1 with cPM = 1, cCM = 1.5, and cI = 0.1, the recommended condition-based maintenance policy is ^CBM ð0:78; 1:18Þ ¼ 1:81. The cost resulting (d = 0.78, n = 1.18) and c from the condition-based maintenance policy represents a savings of approximately 14% as compared to the recommended age-based preventive maintenance policy. 3.2. Numerical assessment of the cost benefits of the condition-based maintenance The results for the example system presented in Section 3.1 demonstrate that there is potential cost savings through implementing a condition-based maintenance policy as opposed to a scheduled maintenance policy. The purpose of this section is to define a larger numerical experiment that covers a wider and reasonable range of system parameter values and to use these experiments to assess the potential cost benefits of a conditionbased maintenance paradigm. Since preventive maintenance is only a reasonable activity for repairable systems having an increasing failure rate, we begin by considering only the experiments from Section 2.3 resulting in ^ > 1. For each of the experiments, without loss of generality, we b assume that cPM = 1, cCM U(1.5, 10), and c1 U(0.005, 0.025). For each experiment, numerical search (as described in Section 3.1) is conducted to find a recommended age-based preventive mainte-
nance policy and a recommended condition-based maintenance policy. The long-run costs per unit time resulting from these two maintenance policies are compared using the previously described simulation model. The results of these experiments, as described by the cost savings resulting from condition-based maintenance (as compared to age-based preventive maintenance), are summarized in Table 2. The results imply that even a simplistic condition-based maintenance policy provide a great opportunity for reducing maintenance costs for systems of the type defined in Section2.1. 3.3. Impact of prognostic errors on system performance under condition-based maintenance As Carrasco and Cassady (2006) point out, a condition-based maintenance tool that provides a perfect estimate of equipment health is an unrealistic standard because prognostic tools associated with condition-based maintenance are typically subject to prognostic error. These prognostic errors can lead to unnecessary preventive maintenance and unnecessary system failures. In this section, we attempt to address this concern by determining how these prognostic errors impact the cost savings described in Section 3.2. To model the prognostic errors associated with the conditionbased maintenance tool, we assume that the estimated cumulative degradation of the system observed as time t can be modeled as
Y read ðtÞ ¼ YðtÞ þ where the prognostic error e is a normal random variable having a mean of zero and a standard deviation of r (note that r = 0
196
Y. Xiang et al. / Computers & Industrial Engineering 62 (2012) 190–197
Table 5 Experimental results (8375 experiments) for assessing the cost benefit of condition-based maintenance (CBM) as opposed to preventive maintenance (PM) under prognostic error when the failure threshold is a gamma random variable. Standard deviation of prognostic error (r)
Experiments resulting ^SM ðsÞ ^CBM ðd; nÞ < c in c
Average cost savings resulting from CBM as opposed to PM (%)
Median cost savings resulting from CBM as opposed to PM (%)
Standard deviation of cost savings resulting from CBM as opposed to PM (%)
0.03 0.06 0.09 0.12 0.15
4865 4781 4719 4610 4488
1.76 1.30 0.76 0.16 0.06
3.53 3.18 2.44 2.06 1.56
17.26 17.36 17.76 17.74 17.84
(58.09%) (57.09%) (56.35%) (55.04%) (53.59%)
Table 6 Experimental results (5164 experiments) for assessing the cost benefit of condition-based maintenance (CBM) as opposed to preventive maintenance (PM) under prognostic error when the failure threshold is a lognormal random variable. Standard deviation of prognostic error (r)
Experiments resulting ^CBM ðd; nÞ < c ^SM ðsÞ in c
Average cost savings resulting from CBM as opposed to PM (%)
Median cost savings resulting from CBM as opposed to PM (%)
Standard deviation of cost savings resulting from CBM as opposed to PM (%)
0.03 0.06 0.09 0.12 0.15
3156 3051 2973 2894 2872
2.41 1.87 1.12 0.37 0.17
4.99 3.91 3.56 2.59 2.26
20.32 19.75 19.96 20.51 20.55
(61.12%) (59.08%) (57.57%) (56.04%) (55.62%)
corresponds to perfect prognostics). This error may cause unnecessarily early maintenance due to overestimation of system deterioration and system failure due to underestimation of system deterioration. We consider the experiments from Section 3.2 for and evaluate the performance of the recommended condition-based maintenance policy under five different levels of prognostic error in the prognostic tool. The experimental results are summarized in Tables 3–6. The general patterns observed in Tables 3–6 are: (1) even small levels of prognostic error can render condition-based maintenance no better than or even worse than scheduled, preventive maintenance; (2) as prognostic errors increases, the relative performance of condition-based maintenance (as compared to scheduled maintenance) degrades. 4. Conclusions and opportunities for future work In this study, we present a simulation-and-optimization-based approach for the estimation of full lifetime distributions for single-component systems operated under a Markovian environment. Both age-based and condition-based maintenance policies are developed for the defined systems and a large numerical experiment is conducted to assess the cost benefits of the conditionbased maintenance. Our results show that there is significant potential for cost savings resulting from shifting from a scheduled maintenance paradigm to a condition-based maintenance paradigm. However, the results also suggest that moderate prognostic error can render condition-based maintenance inferior to age-based preventive maintenance. Our approach is novel in that we combine analytic stochastic deterioration modeling techniques with traditional life distribution estimation approaches. We also consider the impact of prognostic error in estimating the system condition on the cost benefits, which has rarely been addressed in the existing literature. There are several obvious extensions to this research. Relative to the environment, expansion of the environment state space and consideration of semi-Markov or non-Markovian environments are both worthwhile extensions. Relative to the system, consideration of multi-unit systems is an obvious extension. Relative to maintenance policies, there are also several worthwhile
extensions to this research. These extensions include consideration of: (1) non-periodic inspection policies, (2) sensor data that is only correlated with the true cumulative deterioration, and (3) realtime sensor data. Acknowledgements Xiang acknowledges helpful conversations on this topic with Professor David Coit. Xiang gratefully acknowledges support from Chinese Ministry of Education under Grant 11YJC630228, Natural Science Foundation of Guangdong under Grant 2011040002092, and China Postdoctoral Science Foundation under Grant 20110490948. References Aalen, O. O., & Hakon, K. G. (2001). Understanding the shape of the hazard rate: A process point of view. Statistical Science, 16(1), 1–14. Alardhi, M., & Labib, A. W. (2008). Preventive maintenance scheduling of multicogeneration plants using integer programming. Journal of the Operational Research Society, 59(4), 503–509. Arora, J. (2004). Introduction to optimum design. Academic Press. Bazaraa, M. S., Sherali, H. D., & Shetty, C. M. (2006). Nonlinear programming: Theory and algorithms. LibreDigital. Bloch-Mercier, S. (2002). A preventive maintenance policy with sequential checking procedure for a Markov deteriorating system. European Journal of Operational Research, 142(3), 548–576. Boulanger, M., & Escobar, L. A. (1994). Experimental design for a class of accelerated degradation tests. Technometrics, 36(3), 260–272. Budai, G., Huisman, D., & Dekker, R. (2006). Scheduling preventive railway maintenance activities. Journal of the Operational Research Society, 57, 1035–1044. Canto, S. P. (2008). Application of Benders’ decomposition to power plant preventive maintenance scheduling. European Journal of Operational Research, 184(2), 759–777. Carrasco, M., & Cassady, C. R. (2006, 23–26 January 2006). A study of the impact of prognostic errors on system performance. Paper presented at the reliability and maintainability symposium, 2006. RAMS0 06. Annual. Chen, C.-T., Chen, Y.-W., & Yuan, J. (2003). On a dynamic preventive maintenance policy for a system under inspection. Reliability Engineering and System Safety, 80(1), 41–47. Chien-Yu, P., & Sheng-Tsaing, T. (2010). Progressive-stress accelerated degradation test for highly-reliable products. Reliability, IEEE Transactions on, 59(1), 30–37. Chih-Chun, T., Sheng-Tsaing, T., & Balakrishnan, N. (2011). Optimal burn-in policy for highly reliable products using gamma degradation process. Reliability, IEEE Transactions on, 60(1), 234–245.
Y. Xiang et al. / Computers & Industrial Engineering 62 (2012) 190–197 Cho, D. I., & Parlar, M. (1991). A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1), 1–23. Das, A. N., & Acharya, D. (2004). Age replacement of components during IFR delay time. Reliability, IEEE Transactions on, 53(3), 306–312. Dieulle, L., Bérenguer, C., Grall, A., & Roussignol, M. (2003). Sequential conditionbased maintenance scheduling for a deteriorating system. European Journal of Operational Research, 150(2), 451–461. Elwany, A. H., & Gebraeel, N. (2008). Sensor-driven prognostic models for equipment replacement and spare parts inventory. IIE Transactions, 40(7), 629–639. Elwany, A. H., Gebraeel, N., & Maillart, L. M. (2011). Structured replacement policies for components with complex degradation processes and dedicated sensors. Operations Research, 59, 684–695. Farnum, N. R., & Booth, P. (1997). Uniqueness of maximum likelihood estimators of the 2-parameter Weibull distribution. Reliability, IEEE Transactions on, 46(4), 523–525. Grall, A., Bérenguer, C., & Dieulle, L. (2002). A condition-based maintenance policy for stochastically deteriorating systems. Reliability Engineering and System Safety, 76(2), 167–180. Grall, A., Dieulle, L., Bérenguer, C., & Roussignol, M. (2006). Asymptotic failure rate of a continuously monitored system. Reliability Engineering and System Safety, 91(2), 126–130. Jardine, A. K. S. (2002). Optimizing condition based maintenance decisions. Paper presented at the reliability and maintainability symposium, 2002. Proceedings. Annual. Joseph, V. R., & Yu, I. T. (2006). Reliability improvement experiments with degradation data. Reliability, IEEE Transactions on, 55(1), 149–157. Kaiser, K. A., & Gebraeel, N. Z. (2009). Predictive maintenance management using sensor-based degradation models. Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on, 39(4), 840–849. Kelton, W. D., & Law, A. M. (2000). Simulation modeling and analysis (Vol. 2). McGraw Hill. Kenné, J. P., Gharbi, A., & Beit, M. (2007). Age-dependent production planning and maintenance strategies in unreliable manufacturing systems with lost sale. European Journal of Operational Research, 178(2), 408–420. Kharoufeh, J. P. (2003). Explicit results for wear processes in a Markovian environment. Operations Research Letters, 31(3), 237–244. Kharoufeh, J., & Cox, S. (2005). Stochastic models for degradation-based reliability 1. IIE Transactions, 37(6), 533. Kharoufeh, J. P., & Mixon, D. G. (2009). On a Markov modulated shock and wear process. Naval Research Logistics (NRL), 56(6), 563–576. Kharoufeh, J. P., Solo, C. J., & Ulukus, M. Y. (2010). Semi-Markov models for degradation-based reliability. IIE Transactions, 42(8), 599–612. Kurt, M., & Kharoufeh, J. P. (2010). Monotone optimal replacement policies for a Markovian deteriorating system in a controllable environment. Operations Research Letters, 38(4), 273–279. Law, A. M., & Carson, J. S. (1979). A sequential procedure for determining the length of a steady-state simulation. Operations Research, 27(5), 1011–1025. Liao, H. T., & Rausch, M. (2010). Spare part inventory control driven by condition based maintenance. In Annual reliability and maintainability symposium, 2010 Proceedings. Liao, H., Elsayed, E. A., & Chan, L.-Y. (2006). Maintenance of continuously monitored degrading systems. European Journal of Operational Research, 175(2), 821–835. Lu, J., & Meeker, W. Q. (1993). Using degradation measures to estimate a time-tofailure distribution. Technometrics, 35(2), 161–174.
197
Meeker, W. Q., Escobar, L. A., & Lu, C. J. (1998). Accelerated degradation tests: Modeling and analysis. Technometrics, 40(2), 89–99. Moghaddam, K. S., & Usher, J. S. (2011). Sensitivity analysis and comparison of algorithms in preventive maintenance and replacement scheduling optimization models. Computers and Industrial Engineering, 61(1), 64–75. Monahan, J. F. (2001). Numerical methods of statistics. Cambridge, New York: Cambridge University Press. Pan, Z., & Balakrishnan, N. (2011). Reliability modeling of degradation of products with multiple performance characteristics based on gamma processes. Reliability Engineering and System Safety, 96(8), 949–957. Panagiotidou, S., & Tagaras, G. (2007). Optimal preventive maintenance for equipment with two quality states and general failure time distributions. European Journal of Operational Research, 180(1), 329–353. Park, C., & Padgett, W. (2005). Accelerated degradation models for failure based on geometric Brownian motion and gamma processes. Lifetime Data Analysis, 11(4), 511–527. Pierskalla, W. P., & Voelker, J. A. (1976). A survey of maintenance models: The control and surveillance of deteriorating systems. Naval Research Logistics Quarterly, 23(3), 353–388. Samrout, M., Châtelet, E., Kouta, R., & Chebbo, N. (2009). Optimization of maintenance policy using the proportional hazard model. Reliability Engineering and System Safety, 94(1), 44–52. Sherif, Y. S., & Smith, M. L. (1981). Optimal maintenance models for systems subject to failure – A review. Naval Research Logistics Quarterly, 28(1), 47–74. Si, X.-S., Wang, W., Hu, C.-H., & Zhou, D.-H. (2011). Remaining useful life estimation – A review on the statistical data driven approaches. European Journal of Operational Research, 213(1), 1–14. Singpurwalla, N. D. (1995). Survival in dynamic environments. Statistical Science, 10(1), 86–103. Valdez-Flores, C., & Feldman, R. M. (1989). A survey of preventive maintenance models for stochastically deteriorating single-unit systems. Naval Research Logistics (NRL), 36(4), 419–446. van der Weide, J. A. M., & Pandey, M. D. (2011). Stochastic analysis of shock process and modeling of condition-based maintenance. Reliability Engineering and System Safety, 96(6), 619–626. van der Weide, J. A. M., Pandey, M. D., & van Noortwijk, J. M. (2010). Discounted cost model for condition-based maintenance optimization. Reliability Engineering and System Safety, 95(3), 236–246. van Noortwijk, J. M. (2009). A survey of the application of gamma processes in maintenance. Reliability Engineering and System Safety, 94(1), 2–21. van Noortwijk, J. M., van der Weide, J. A. M., Kallen, M. J., & Pandey, M. D. (2007). Gamma processes and peaks-over-threshold distributions for timedependent reliability. Reliability Engineering and System Safety, 92(12), 1651–1658. Wang, H. (2002). A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3), 469–489. Wang, W., Carr, M., Xu, W., & Kobbacy, K. (2011). A model for residual life prediction based on Brownian motion with an adaptive drift. Microelectronics Reliability, 51(2), 285–293. Whitmore, G. A., Crowder, M. J., & Lawless, J. F. (1998). Failure inference from a marker process based on a bivariate wiener model. Lifetime Data Analysis, 4(3), 229–251. Whitmore, G., & Schenkelberg, F. (1997). Modelling accelerated degradation data using Wiener diffusion with a time scale transformation. Lifetime Data Analysis, 3(1), 27–45.