An integrated framework for health measures prediction and optimal maintenance policy for mechanical systems using a proportional hazards model

An integrated framework for health measures prediction and optimal maintenance policy for mechanical systems using a proportional hazards model

Mechanical Systems and Signal Processing 111 (2018) 285–302 Contents lists available at ScienceDirect Mechanical Systems and Signal Processing journ...

692KB Sizes 0 Downloads 27 Views

Mechanical Systems and Signal Processing 111 (2018) 285–302

Contents lists available at ScienceDirect

Mechanical Systems and Signal Processing journal homepage: www.elsevier.com/locate/ymssp

An integrated framework for health measures prediction and optimal maintenance policy for mechanical systems using a proportional hazards model Chaoqun Duan a,b,⇑, Viliam Makis b,⇑, Chao Deng a a b

School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, PR China Department of Mechanical and Industrial Engineering, University of Toronto, 5 King’s College Road, Toronto, ON M5S 3G8, Canada

a r t i c l e

i n f o

Article history: Received 12 May 2017 Received in revised form 10 November 2017 Accepted 14 February 2018

Keywords: Health measures Multi-state deterioration process Conditional-based maintenance Proportional hazards model Gamma process

a b s t r a c t This paper considers an integrated framework for health measures prediction and optimal maintenance policy for mechanical systems subject to condition monitoring (CM) and random failure. We propose the proportional hazards model (PHM) to consider CM information as well as the age of the mechanical systems. Although the form of health prediction for the mechanical systems under periodic monitoring in the PHM with Markov chain was developed previously, the case of the continuous-state degradation process allowing possible degradation between the inspections still has not appeared. To this aim, the paper allows the use of Gamma process with non-constant degradation, which broadens the application area of PHM. A matrix-based approximation method is employed to compute health measures of the machine, such as condition reliability, mean residual life, residual life distribution. Based on the health measures, the optimal maintenance policy, which considers both hazard rate control limit and age control limit, is proposed and the optimization problem is formulated and solved in a semi-Markov decision process (SMDP) framework. The objective is to minimize the long-run expected average cost. The method is illustrated using two real data sets obtained from feed subsystem of a boring machine and GaAs lasers collected at regular time epochs, respectively. A comparison with other methods is given, which illustrates the effectiveness of our approach. Ó 2018 Elsevier Ltd. All rights reserved.

1. Introduction With the development of sensor technologies and CM techniques, advanced studies in reliability and maintenance framework have been proposed based on the actual information regarding the degradation process for efficient health monitoring of aging mechanical systems [1,2]. Comprehensive literature reviews on the reliability modeling and maintenance policy of mechanical systems have been provided in [3–6]. Among these approaches, degradation process can be modeled as discretestate stochastic process, continuous-state stochastic process, or PHM. Discrete-state degradation process includes Markov and semi-Markov models. Kim et al. [7] provided an effective hidden Markov model in Bayesian control chart for the oil transmission unit fault detection. Peng and Dong [8] presented a four-state age dependent hidden semi-Markov model

⇑ Corresponding authors at: School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, PR China (C. Duan) and Department of Mechanical and Industrial Engineering, University of Toronto, 5 King’s College Road, Toronto, ON M5S 3G8, Canada (V. Makis). E-mail addresses: [email protected] (C. Duan), [email protected] (V. Makis). https://doi.org/10.1016/j.ymssp.2018.02.029 0888-3270/Ó 2018 Elsevier Ltd. All rights reserved.

286

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302

for prediction of remaining useful life of the hydraulic pumps. Subsequently, Liu et al. [9] proposed an adaptive four-state hidden semi-Markov model for the integrated diagnosis and prognosis of multi-sensor equipment. In contrast with discrete-state degradation process, continuous-state degradation process has also been widely used, such as regressionbased method, Gamma process and Wiener process. Gebraeel et al. [10] developed a regression-based degradation model for computing residual life distributions of rotating machinery operating under time-varying environmental conditions. Nicolaia at al. [11] considered non-stationary Gamma process to measure the deterioration of organic coating layer of steel structures. Huang et al. [12] proposed an adaptive skew-Wiener process for assessment of residual life of ball bearings in rotating electrical machines. Other approaches about continuous-state degradation process modeling can be found in [13,6]. As the failure of a mechanical system is a developing process involving the load action and damage accumulation [14], a good health assessment method should take advantage of mutual information from multiple features for system degradation modeling [15]. PHM combines the system age and CM information as covariate, in which can be modeled by discrete-state or continuous-state process. [16–18]. Makis and Jardine [16] were the first to present PHM with continuous time Markov chain covariate for system degradation modeling, and provided an optimal replacement policy. Banjevic et al. [17] extended the Makis-Jardine model by relaxing the monotonicity assumption of the hazard function. Makis and Jiang [19], Ghasemi et al. [20] used PHM to model degrading systems under partial observations and derived an optimal replacement. These models assumed degradation process is constant between inspections and state transitions only can be observable at inspection epochs, which is not actual case for most real world systems. Even although PHM with discrete-state degradation process has been utilized in practice for many years, few papers have considered continuous-state degradation process covarate in PHM. Tang et al. [21] presented an autoregressive model as a covariate in PHM to model the GaAs laser degradation path. In their paper, the degradation process is assumed to be constant between inspections for convenient computation. Another continuous-state stochastic process covariate presented by Liu et al. [22] has considered system degradation level following a time-varying Gaussian distribution, which means the degradation level was determined before condition monitoring and new CM information was not taken into account for both health prediction and maintenance decision making. This also leads to limited applications. So far, from most existing research works, to derive health measures based on PHM is still a great challenge. An exception to this is the papers by Wu and Ryan [23,24], who considered non-constant Markov model and semi-Markov model in PHM to model a deteriorating system under various monitoring schemes. Nevertheless, the proposed approach only works for the discrete-state degradation process with very limited states, i.e. Markov or semi-Markov, and cannot deal with continuousstate degradation process or large numbers of discrete degradation states due to the dramatically increasing of the computational complexity. To overcome the challenge above, a continuous-state deterioration process with non-constant degradation mechanism is considered in this paper. Since the most mechanical systems present continuous and gradual degradation process where damages accumulate monotonically over time in a sequence of tiny increments, we consider Gamma process to model the accumulative damages and use PHM to describe the hazard rate. A matrix-based approach is employed to derive the health measures and a control policy is proposed to make maintenance decision for mechanical systems. It should be noted, to mutually and sufficiently exploit the CM information, our approach has combined the health prediction and maintenance policy to make a maintenance decision whenever monitoring information is collected. The predicted health measures serve as indicators for maintenance decision making, the maintenance policy can therefore prolong the lifetime of mechanical system dependent on the system health condition. Unlike previous approaches which prediction computation is usually timeconsuming and maintenance optimization is unpractical or complicated, our approach uses matrix operations to obtain health measures and solves maintenance optimization problem in SMDP framework with matrix form. The closed form expressions of health measures and corresponding optimal maintenance action can be obtained through the simple multiplication of matrices, which is particularly attractive for practical applications. Once the health measures are predicted, an optimal maintenance policy for monotonically degrading systems is presented. Similarly, Tang et al. [21], Jafari and Makis [25] proposed hazard rate-based maintenance policy using PHM for systems subject to condition monitoring. As we consider the system deterioration is influenced by age and degradation process simultaneously, it is natural to make an extension by considering both preventive age threshold and hazard rate threshold in our policy. The health condition of mechanical system is determined by the age and covariate values, and the hazard rate is calculated right after obtaining the new information through CM. Whenever the hazard rate of system exceeds certain level or the system reaches the preventive age level, the preventive maintenance will be set up. The control limits as hazard rate threshold and age threshold are decision variables in our maintenance policy, which will be determined by formulating and analyzing the decision problem in a SMDP framework. To our knowledge, this is the first paper applying an integrated framework for matrix-based estimation and two-level maintenance control policy to degrading systems based on PHM with practical degradation mechanisms. The main contributions of this paper can be summarized as follows: (I) Incorporation of the influence of both aging and degradation state in modeling the hazard rate. The continuous, nonconstant degradation process is considered. (II) Development of matrix-based approximation method which provides closed form expressions for health measures prediction.

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302

287

(III) Development of the SMDP framework and the algorithm to obtain optimal maintenance decision based on the actual health status of the mechanical systems. (IV) Integration of health assessment and maintenance policy to make efficient maintenance decision. (V) Development of two case studies and a comparison with multi-state degradation process covariate in PHM. Although we consider Gamma process in PHM, our method can be applied to a much wider class of stochastic degradation process. For example, our model can be applied to Wiener process or multi-state degradation process. The remainder of the paper is organized as follows. Section 2 summarizes the details of the proposed model and presents problem formulation. Section 3 introduces the integrated health prediction and maintenance policy framework. The matrix-based approximation method and computational algorithm in the SMDP framework are developed. Section 4 deals with PHM with multi-state degradation process covariate. The case application and result are illustrated in Section 5, and a comparison with other models is also given. In Section 6, we conclude the research and discuss possible extensions of our model. 2. Model formulation Generally, the failure occurrence of a mechanical system not only depends on the operational age, but also on the conditions of its deterioration. To describe the behavior of the system deterioration process properly and obtain the optimal condition-based maintenance policy, we assume that the value of covariate process, such as the performance index of manufacturing facility, level of metal particles in engine oil, or a vibration monitoring level, indicates the system deterioration. To incorporate these values into the model, we consider that the hazard rate is described by the PHM, which explicitly includes both the effect of the age and the degradation state. It can be expressed by the following relation:

hðt; Z t Þ ¼ h0 ðtÞ  wðZ t Þ

ð1Þ

where h0 ðtÞ denotes baseline hazard rate at time t, and wðZ t Þ is a time-varying covariate indicating the system degradation process. We assume that wðZ t Þ is a homogeneous Gamma process with probability density function:

f Zt ðxÞ ¼ Gaðxjv ðtÞ; uÞ ¼

uv ðtÞ v ðtÞ1 expðuxÞ x Cðv ðtÞÞ

ð2Þ

where the shape parameter v ðtÞ is a right-continuous, real-valued function for t P 0 with v ð0Þ ¼ 0, and the scale parameter R1 u is a positive constant, Cðv ðtÞÞ ¼ 0 zv ðtÞ1 ez dz is the gamma function. The degradation process has infinite divisibility with following properties: 1. For any given time epochs 0 < tn1 < tn < 1, the increments Z tn1  Z tn are independent; 2. For any given time epochs, the increments Z tn1  Z tn have gamma distribution Gaðv ðtn1 Þ  v ðt n Þ; uÞ as their sum. The system condition is monitored at regular inspection times ðD; 2D; . . .Þ, and the values of the covariate process Z t are observed at discrete points of time ðD; 2D; . . .Þ. The condition reliability function is given by

Rðtjz; nDÞ ¼ Prðn > nD þ tjn > nD; Z 1D ; . . . ; Z nD ; Z nD ¼ zÞ   Z nDþt   h0 ðsÞ  wðZ s Þds jZ nD ¼ z ¼ E exp 

ð3Þ

nD

where n is the failure time of the system. z is the value of Z nD , z is the maximum degradation level. Noortwijk [13] has showed the lifetime distribution expression of Gamma process, however, in the PHM, to directly derive the explicit formula of Eq. (3) is quite challenging and even impossible due to the time-varying of the value of Z t . To this aim, the goal of our methodology is to provide useful health measures information and effective maintenance policy for the system. The health measures include conditional reliability, residual life distribution and mean residual life. Once the health measures are obtained, the maintenance action needs to be determined by maintenance policy. The maintenance policy maintains the system upon its failure or at the preventive maintenance (PM) time. The system will be preventively maintained when its hazard rate exceeds the preventive maintenance level W or its age reaches the age level T, and will be given corrective maintenance upon failure. The following cost structure is considered in the model:    

C P : Preventive maintenance cost when PM is performed, which takes T P time units. C F : Failure (replacement) cost incurred when corrective maintenance is performed, which takes T F time units. C I : Inspection cost incurred when inspection is performed. K: Set-up cost.

A maintenance can be either a true physical replacement or an overhaul or repair such that the system is as-good-as-new after the maintenance. Even though both the preventive and the corrective maintenance actions put the system back in the

288

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302

as-good-as-new state, they are not necessarily identical in practice because the corrective maintenance is unplanned. Thus, the corrective maintenance is considered to be more complex and more expensive (i.e. C F > C P ). 3. Integrated health prediction and maintenance policy framework The integrated health prediction and maintenance policy framework for degrading systems subject to condition monitoring and random failure is illustrated schematically in Fig. 1. The degradation data is collected by condition monitoring to obtained operational age and the degradation state of the system. Based on the PHM proposed in Section 2, the health measures of the system are evaluated using a matrix-based approximation method, and the maintenance policy is optimized in SMDP framework. Afterward, the maintenance decision is made based on the actual status of the system. The feedback control is also applied to make dynamic health evaluation and maintenance decision. To apply the matrix-based approximation method and SMDP algorithm, one needs to discrete the degradation process into finite set of states. We therefore partition the range of degradation process ½0; z into M small intervals with equal length d, and define the state M when the process is above the z. We have the degradation state space X ¼ f0; 1; . . . ; Mg. The system state is defined as follows:  State ð0; 0Þ: System is in ‘‘as good as new” condition.  State ði; tÞ: The first component represents the code value of the degradation state in covariate, and the second component  T is predetermined maximum life limit of the system. is the operating age of the machine, where i 2 X, 0 6 t 6 T,  State PM: The value of hazard rate crosses the preventive maintenance level, so PM action is performed. Note that PM action is also performed when the age of the machine exceeds the specified age T. Based on the system state, we define state space of SMDP as S ¼ S1 [ S2 [ S3 , where S1 ¼ fði; tÞji 2 X; t 2 Ng, S2 ¼ fPMg and S3 ¼ fð0; 0Þg. In the following, we develop the matrix-based approximation method and SMDP approach. 3.1. Health measures prediction In this subsection, we emphasize on the development of the matrix-based approximation method. Considering nonconstant and stochastic degradation process within inspections, the possible degradation process in a small amount of time ^ h, the joint should be recorded. At an arbitrary time point kh, k 2 Nþ , where h is a sufficiently small time interval with D ¼ n one-step transition probability from state ði; khÞ to state ðj; ðk þ 1ÞhÞ can be written as

Kij ðkhÞ ¼ Prðn > ðk þ 1Þh; Z t ¼ hj jn > kh; Z l ¼ hi ; Z; . . . ; Z 0 Þ ¼ Prðn > ðk þ 1Þh; Z ðkþ1Þh ¼ hj jn > kh; Z kh ¼ hi Þ ¼ PrðZ ðkþ1Þh ¼ hj jn > ðk þ 1Þh; Z kh ¼ hi Þ  Prðn > ðk þ 1Þhjn > kh; Z kh ¼ hi Þ

ð4Þ

where i; j 2 X, hi and hj are the degradation levels at states i and j, respectively. KðkhÞ is one-step transition matrix at time kh. P Recall that in Eq. (3), the Rðhji; nhÞ ¼ jPi Kij ðkhÞ, so for a relatively long time interval the calculation of Rðtji; nhÞ for t > h should consider all the probabilities of degradation transition occurrences in every single h. First, we continue to compute one-step transition probability for a single h. For a large M, the length of the d is small, and we can approximate its degra-

Condition Monitoring Operating age

Degradation state

Matrix-based approximation method SMDP framework Feedback control Health measures

Control policy

Optimal maintenance decision Fig. 1. Health prediction and maintenance policy scheme.

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302

289

dation level by the mid-point of the intervals. The first probability term in (4) can be computed by the cumulative density function of Gamma process in the interval with state j, for which the three scenarios are given by  When i; j 2 f0; 1; . . . ; M  1g and i ¼ j, then:

Z

PrðZ ðkþ1Þh ¼ hj jn > ðk þ 1Þd; Z kd ¼ hi Þ ¼

uDv ðkhÞ

ðjiÞdþ2d

xDv ðkhÞ1 expðuxÞdx

ð5Þ

uDv ðkhÞ xDv ðkhÞ1 expðuxÞdx CðDv ðkhÞÞ

ð6Þ

CðDv ðkhÞÞ

0

 When i; j 2 f0; 1; . . . ; M  1g and j > i, then:

Z

PrðZ ðkþ1Þh ¼ hj jn > ðk þ 1Þd; Z kd ¼ hi Þ ¼  When i 2 f0; 1; . . . ; M  1g, j ¼ M then:

ðjiÞd2d

Z

PrðZ ðkþ1Þh ¼ hj jn > ðk þ 1Þd; Z kd ¼ hi Þ ¼

ðjiÞdþ2d

þ1

ðjiÞd2d

uDv ðkhÞ xDv ðkhÞ1 expðuxÞdx CðDv ðkhÞÞ

ð7Þ

where Dv ðkdÞ ¼ v ððk þ 1ÞdÞ  v ðkdÞ. The second term in (4) can be evaluated as

Z Prðn > ðk þ 1Þhjn > kh; Z kh ¼ hi Þ ¼ exp 

ðkþ1Þh

!

h0 ðtÞ  wðidÞ

ð8Þ

kh

where id is the mid-point of state i. Therefore, we have transition matrix representing each small degradation time interval h.

KðnhÞ ¼ ðKij ðnhÞÞ

ð9Þ

i; j 2 X; 0 6 n 6 N  1

 The system is subject to random failure which is denoted by state F, where N is maximum number of intervals and Nh ¼ T. when system goes directly to failure state in the next time interval h from an operating state ði; nhÞ, the transition probability can be expressed as

Pi;F ðnhÞ ¼ 1 

M X Kij ðnhÞ

ð10Þ

jPi

Due to the incorporating the age factor and degradation, the one-step transition probability matrix is non-homogeneous with respect to the monitoring time. Once we obtain the one-step transition of PHM, we consider the probability the system in state ðj; nhÞ will not fail at next two time intervals ðn þ 1Þh and ðn þ 2Þh is

Prðn > ðn þ 2Þh; jn > nh; Z nh ¼ hi Þ ¼ Kii ðnhÞKii ððn þ 1ÞhÞ þ Kii ðnhÞKiðiþ1Þ ððn þ 1ÞhÞ þ Kiðiþ1Þ ðnhÞKðiþ1Þðiþ1Þ ððn þ 1ÞhÞ þ    þ KiM ðnhÞKMM ððn þ 1ÞhÞ

ð11Þ

¼ Ki ðnhÞ  Kððn þ 1ÞhÞ where Ki ðnhÞ is the i th row of KðnhÞ. We found that probability expression can be simplified by matrix operations. Inspired by this result, a matrix is employed to represent the transition probabilities at each time point of entire life of the system, where any state ðZ nh ; nhÞ can make transition to arbitrary state ðZ ðnþ1Þh ; ðn þ 1ÞhÞ represented by probability matrix Kij ðnhÞ, or directly to failure state by probability ðI  Ki ðnhÞÞ1.

3 0 Kð0Þ 0  0  0 ðI  Kð0ÞÞ1 7 6 0 KðhÞ    0  0 ðI  KðhÞÞ1 7 6 0 7 6 7 6 .. 7 6 7 6    0   . 7 6 7 6 0  0 KðnhÞ    0 ðI  KðnhÞÞ1 7 6 0 7 D¼6 7 6 .. 7 6      . 7 6  7 6 7 6 0 0  0    KððN  2ÞhÞ ðI  KððN  2ÞhÞÞ1 7 6 0 7 6 6 0 0 0  0    KððN  1ÞhÞ ðI  KððN  1ÞhÞÞ1 7 5 4 2

0

0

0



0



0

ð12Þ

1

where I is ðM þ 1Þ  ðM þ 1Þ identity matrix; 0 is ðM þ 1Þ  ðM þ 1Þ zero matrix; the vector matrix 1 has each of its ðM þ 1Þ elements equal to unity. It should be noted this manipulation in Eq. (12) can be used to easily calculate the health measures in subsequent derivations. To simplify the matrix D and reduce computational complexity, a smart partition approach which presented by Brook and Evans (see detail in [26,27]) is employed to deal with computation problem. The matrix can be partitioned and expressed as

290

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302 ⎡



⎢ ⎢⎣

⎥ ⎥⎦

where R is the N  ðM þ 1Þ dimensional matrix in D and PF is a vector of N  ðM þ 1Þ elements, then

PF ¼ ðI  RÞ1 Since D is a stochastic transition matrix and hence its row sums are equal to unity. It follows that

It is not difficult to see that Rr consists of non-zero ðM þ 1Þ  ðM þ 1Þ block matrices located r positions to the right from the main block diagonal with form

KðnhÞ  Kððn þ 1ÞhÞ      Kððn þ r  1ÞhÞ where n ¼ 0; 1; 2; . . . ; N  1, r 6 N  n. In the light of partitioned matrix, the matrix expressions of the conditional reliability function for the system at state ði; nhÞ is given by

Rðrhji; nhÞ ¼ Rðrhjn > nh; Z nh ¼ hi Þ ¼ Prðn > ðn þ rÞhjn > nh; Z nh ¼ hi Þ

ð13Þ

¼ 1  pi ðnÞðI  Rr Þ1 Another important measure to evaluate the health condition of system is residual life distribution. Let residual life of the system at state ði; nhÞ be RLi;nh , then probability mass function of RLi;nh is given by

PrðRLi;nh ¼ rhjn > nh; Z nh ¼ hi Þ ¼ PrðRLi;nh 6 rhjn > nh; Z nh ¼ hi Þ  PrðRLi;nh 6 ðr  1Þhjn > nh; Z nh ¼ hi Þ

ð14Þ

¼ pi ðnÞðI  Rt Þ1  pi ðnÞðI  Rt1 Þ1 ¼ pi ðnÞRr1 ðI  RÞ1 where pi ðnÞ is 1  NðM þ 1Þ matrix with ðnðM þ 1Þ þ iÞth element equal to 1, and zeros elsewhere. Hence, according to Eq. (14), the mean residual life of system at state ði; nhÞ can be derived as below

MRLi;nh ¼ EðRLi;nh jn > nh; Z nh ¼ hi Þ ¼

Nr X PrðRLi;nh ¼ rhjn > nh; Z nh ¼ hi Þ  rh

ð15Þ

r¼1

¼ hpi ðnÞðI  RÞ1 1 Some other health measures, i.e. conditional hazard rate function, conditional cumulative hazard rate function or cumulative hazard rate can be derived in a similar way. One of advantages of this approximation method is that the moment generating function of residual life RL can be obtained easily. By that, the moments of residual life distribution can be figured out by differentiating moment generating function. Due to the space limitation, interested readers can refer to for the derivation of moments of RL [26]. It can be found that the health measures are the functions of matrix R, which can be known at the time of PHM parameters obtained. Through simple multiplication of matrix R, the system actual health status can be known. In the following subsection, we formulate maintenance problem and solve it in a SMDP framework. 3.2. Optimal maintenance policy For the smooth system operation and effective maintenance planning, it is important to develop an efficient maintenance policy for a degrading system. Using the information when the system will be likely fail and when a preventive maintenance should take place, the maintenance engineers are able to prepare the maintenance ahead of the time when maintenance starts (e.g. ordering necessary parts for maintenance). In this subsection, the predicted health measures are integrated into the SMDP framework to determine when a preventive maintenance should be given. When a new CM data is obtained, the machine health status and corresponding maintenance decision can be given simultaneously. We now develop a computational algorithm in the SMDP framework to address the maintenance optimization problem. The covariate process is monitored at each inspection epoch. Suppose that the machine is operational at the n th inspection epoch ðnDÞ, and the hazard rate is computed using Eq. (1). We partition the interval of the hazard rate ½0; f  into a fixed large L subintervals, where f is suitably selected upper bound for the hazard rate. In [16], it has been proved that there exists one and only 

one f that minimizes the long-run expected average cost, therefore the largest control limit f has no influence on the compu-

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302

291



tational result of the optimal control limit f [21]. The number of subintervals, L should be selected properly. The more accurate results will be obtained by choosing the larger number of subintervals, L, however the computational time increases. Accordingly, the number of subintervals should be selected big enough to get sufficiently precise results in a reasonable time. The objective of maintenance policy is to determine optimal decision variables ðW  ; T  Þ. Using SMDP state space defined previously, for the cost minimization problem, the SMDP can be determined by the following quantities: 1. P m;k is the probability that the system will be in state k 2 S at the next decision epoch given the current state is m 2 S. 2. sm is the expected sojourn time until the next decision epoch given the current state is m 2 S. 3. C m is the expected cost incurred until the next decision epoch given the current state is m 2 S. Once all these quantities are defined, for a fixed preventive maintenance level W and age level T, the long-run expected average cost gðT; WÞ can be obtained by solving the following system of linear equations [28]:

V m ¼ C m  gðT; WÞsm þ

X Pm;k  V m

ð16Þ

k2S

V j ¼ 0; for an arbitrary selected single state; j 2 S where V m values are the unknown values together with gðT; WÞ which are determined by solving Eq. (16). fV m g values are related to the so-called relative values (see [28], pp. 168–169 for more details). Next, the derivation of the SMDP quantities is considered, i.e. transition probabilities, expected sojourn times and expected costs. The SMDP transition probabilities for the states defined above are calculated as follows: 1. Assume that the system is in the state ðz; ðn  1ÞDÞ, where hððn  1ÞD; zÞ < W and nD < T. Then the transition probability to the next state ðz0 ; nDÞ, where, hðnD; z0 Þ < W, is given by

Pðz;ðn1ÞDÞ;ðz0 ;nDÞ ¼ PrðZ nD ¼ hz0 ; n > nDjZ ðn1ÞD ¼ hz ; n > ðn  1ÞDÞ

ð17Þ

It is the probability that the value of the hazard rate will not exceed the preventive maintenance level W, the age of system is below the T and system will not fail in the next inspection interval. Then, this probability can be calculated as

Pðz;ðn1ÞDÞ;ðz0 ;nDÞ ¼ PrðZ nD ¼ hz0 ; n > nDjZ ðn1Þ D ¼ hz ; n > ðn  1ÞDÞ ¼ PrðZ nD ¼ hz0 jn > nD; Z ðn1Þ D ¼ hz ; n > ðn  1ÞDÞ

ð18Þ

 Prðn > nD; Z ðn1Þ D ¼ hz ; n > ðn  1ÞDÞ ¼ Pz;z0 ðDÞ  RðDjz; ðn  1ÞDÞ

The first probability term can be assessed by (5)–(7), and the second term can be obtained by (13). 2. If the hazard rate crosses the preventive maintenance level (i.e. hðnD; z0 Þ P W) or the age of the machine exceeds the age level T, where hððn  1ÞD; zÞ < W, then the system goes to the PM state and the transition probability is given by

Pðz;ðn1ÞDÞ;ðPMÞ ¼

8X < PrðZ nD ¼ hz0 jZ ðn1ÞD ¼ hz ; n > ðn  1ÞDÞ  RðDjz; ðn  1ÞDÞ; nD < T :

z0

RðDjz; ðn  1ÞDÞ;

ð19Þ

nD P T

where hz0 are the covariate values that cause the hazard rate to exceed the preventive maintenance level. 3. If the machine failure occurs, then the corrective maintenance will be performed which brings the system to the as good as new condition. Therefore, the next state will be ð0; 0Þ and the transition probability Pðz;ðn1ÞDÞ;ð0;0Þ from state ðz; ðn  1ÞDÞ to state ð0; 0Þ, where 0 6 hððn  1ÞD; zÞ < W, nD < T, can be calculated as follows:

Pðz;ðn1ÞDÞ;ð0;0Þ ¼ 1  RðDjz; ðn  1ÞDÞ; n P 1

ð20Þ

4. If the system is in the PM state, then mandatory preventive maintenance is performed and the system goes back to state ð0; 0Þ. We have

PðPMÞ;ð0;0Þ ¼ 1

ð21Þ

Then the expected sojourn times are calculated as

sðz;ðn1ÞDÞ ¼ Eðsojourntimejz; n  1Þ ¼ Cðz; ðn  1ÞDÞ þ ð1  RðDjz; ðn  1ÞDÞÞ  T F

sPM

¼ EðsojourntimejPMÞ

ð22Þ

¼ TP

Cðz; ðn  1ÞDÞ is the expected sojourn time for system operating during inspection interval, and P^ Cðz; ðn  1ÞDÞ ¼ nt¼1 Rðthjz; ðn  1ÞDÞ  h. where

292

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302

The expected cost incurred at each SMDP state is computed, and given by

C ðz;ðn1ÞDÞ ¼ Eðcostjz; n  1Þ ¼ C I þ ð1  RðDjz; ðn  1ÞDÞÞ  ðC F þ KÞ ¼ Eðcostjz; nÞ

C PM

ð23Þ

¼ CP þ K After obtaining the SMDP quantities P m;k , sm and C m , we used embedded technique to support our integrated framework. For convenient computation, we transform the first equation in Eq. (16) to following form:

8 V 1 ¼ C 1  gðT; WÞs1 þ ðP 11 V 1 þ P12 V 2 þ    þ P1H V H Þ > > > < V 2 ¼ C 2  gðT; WÞs2 þ ðP 21 V 1 þ P22 V 2 þ    þ P2H V H Þ >  > > : V H ¼ C H  gðT; WÞsH þ ðPH1 V 1 þ PH2 V 2 þ    þ PHH V H Þ

ð24Þ

where H denotes the number of liner equations, H ¼ ðn þ 1Þ  ðM þ 1Þ þ 1 and n D ¼ T  . These equations can be rewritten into matrix form:

2 6 6 6 6 4

3

C 1 C 2 .. .

2

P11  1 P 12  7 6 P P  1   21 22 7 6 7¼6 .. 7 6 5 4 .

C Lþ2

PH1

PH2

P1H P1H

   PHH

2 3 s1 6 6 s1 7 76 6 7 .. 76 6 5 . 6 4 sH

V1 V2 .. . VH

3 7 7 7 7 7 7 7 5

ð25Þ

gðT; WÞ

There have H equations with H þ 1 unknown variables ðV 1 ; . . . ; V H ; gðT; WÞÞ in the matrix, we need to add a new equation V j ¼ 0, for some j 2 S, and extend (25) to,

3 2 P11  1 V1 P12    P1H 7 6 6 V P P  1    P1H 7 6 6 2 21 22 7 6 6 7 6 6 .. .. 7¼6 6 . . 7 6 6 7 6 6 PH2    PHH 4 V Hþ1 5 4 PH1 gðT; WÞ 0 1 0  2

s1

31 2

7 s1 7 7 .. 7 . 7 7 7 sH 5 0

3 C 1 7 6 6 C 1 7 7 6 6 .. 7 6 . 7 7 6 7 6 4 C H 5 0

ð26Þ

Then the unknown variables ðW ; T Þ and corresponding gðW ; T Þ can be solved by the matrix (26). To reduce the computational effort and human operation error, we embed matrix (26) and matrix R into the online monitor fixed on the machine. The prediction process is shown in Fig. 2. The PHM parameters are estimated offline and the online monitor collects the new observations. Through the embedded matrices, the health measures (13)–(15) and maintenance decision ðW  ; T  Þ can be quickly obtained. 4. PHM using multi-state degradation process To predict health status and determine maintenance decision variable, we only need to derive two embedded matrices: matrix R and matrix (26). The PHM using multi-state degradation process with conventional maintenance policies are developed in this section. The multi-state degradation process is represented by a continuous time Markov chain and conventional maintenance policies are considered as three well-known policies in our integrated framework. The first one is hazard-rate based maintenance which performs preventive maintenance as the hazard rate exceeds a certain level until its maximum life

Parameters of PHM

Off-line

Health measures On-line

New observations

Embedded matrices Maintenance decision Fig. 2. Flowchart of integrated framework.

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302

293

limit. The second one is age-based maintenance policy which correctively maintains the system at prescheduled maintenance time, and the last one is run-to-failure maintenance policy which only performs corrective maintenance at the time of failure. The continuous time Markov chain representing multi-state degradation process has the state space X ¼ f0; 1; . . . ; Mg with following characteristics:  M can be a large number of states which leads to sufficient accuracy. State 0 indicates the best possible state, M is the worst state and is absorbing state.  The state may transit at any time between two inspections, not just at inspection epochs.  By relaxing the sequential degradation, the state can make transition to neighbor (next) state or jump to any higher state, e.g. i to i þ j, i; i þ j 2 X, j P 1. The Z process is a monotonic continuous time Markov chain, in which sate process is non-decreasing with probability 1, i.e. qij ¼ 0 for all j < i, and its instantaneous transition rates qij , i, j 2 X, are defined by

qij ¼ limh!þ1 X qij ¼  qij ;

PðX h ¼ jjX 0 ¼ iÞ < þ1; h

i–j i ¼ j;

ð27Þ

i–j

Since the degradation process is happening in any time unit throughout the lifetime of the system, we consider a sufficiently small time intervals h for approximation. We first note that since the amount of a time until a transition occurs is exponentially distributed, it follows that the probability of two or more transitions in a time h is o(h) [29]. The two or more transitions happened in a small time interval h can be neglected. Only zero or one possible transition occurrence is considered in the interval h. Thus, for PHM model, the one-step transition probability Kij ðnhÞ can be derived by  When Z t transits from state ði; nhÞ to state ðj; ðn þ 1ÞhÞ, j – i and i; j 2 0; 1; . . . ; M, suppose transition occurs at u 2 ðnh; ðn þ 1ÞhÞ, then:

Kij ðnhÞ ¼ Prðn > ðn þ 1Þh; Z ðnþ1Þh ¼ jjn > nh; Z nh ¼ iÞ ! Z nhþu Z h exp  hðt; iÞdt  k2i u expðki uÞ ¼ Pij 0

nh

Z  exp 

ðnþ1Þh

hðt; jÞdt

!

ð28Þ



  expðkj ðh  uÞÞ  ð1 þ kj ðh  uÞÞ du

nhþu

 When Z t remains in state ði; nhÞ in the next time interval with ði; ðn þ 1ÞhÞ, then

Kii ðnhÞ ¼ Prðn > ðn þ 1Þh; Z ðnþ1Þh ¼ ijn > nh; Z nh ¼ iÞ ! Z ðnþ1Þh hðt; iÞdt  ðki h  1Þ expðki hÞ ¼ exp 

ð29Þ

nh

The proof of one-step transition probability Kij ðnhÞ is in Appendix A. Then we have transition matrix representing each small monitoring time interval.

KðnhÞ ¼ ðKij ðnhÞÞ i; j 2 X; 0 6 n 6 N  1

ð30Þ

 When the joint degradation process transits from state ði; nhÞ to failure state F in the next time interval h, then where Nh ¼ T. P the transition probability is given by P i;F ðnhÞ ¼ 1  M jPi Kij ðnhÞ. Here we have all the elements to fill matrix (12) to obtain embedded matrix R. Next, to obtain the embedded SMDP matrix, we first need to derive the SMDP quantities, i.e., transition probabilities, sojourn times and expected costs. The algorithm developed in Section 3.2 can also be used to calculate the minimum long-run expected average cost per unit time for the hazard-rate based maintenance policy. However, since the derivation of SMDP quantities of hazard-rate based maintenance is quite similar to the proposed model, we only present derivation of optimizing other two polices. Because the age-based maintenance does not take CM information into consideration, we need to define the new state space of SMDP. When the system is inspected at n th inspection epoch, the SMDP process is defined to be in k for k 2 ½0; n , n D is age control limit. When the system is performed PM action, the SMDP is defined to be in state PM. So the state space of SMDP can be denoted by X ¼ fkjk 2 ½0; n g [ fPMg. According to the SMDP iteratively linear equations in (16), we have

294

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302

8 V 0 ¼ C 0  gðn Þs0 þ P00 V 0 þ P01 V 1 > > > >  > > < V 1 ¼ C 1  gðn Þs1 þ P10 V 0 þ P12 V 2  > > > V PM ¼ C PM  gðn ÞsPM þ PPM0 V 0 > > > : V0 ¼ 0

ð31Þ

the embedded SMDP matrix is given by

3 2 P00  1 P01 V0 7 6 6 V 1 6 1 7 6 P10 7 6 6 6 .. 7 6 .. 6 . 7¼6 . 7 6 6 7 6 6 0 4 V PM 5 4 1 gðn Þ 1 0 2

 

0

0

s0

31 2

7 0  0 s1 7 7 .. 7 . 7 7 7    0 1 sPM 5    0

P12  

3 C 0 7 6 6 C 1 7 7 6 6 .. 7 6 . 7 7 6 7 6 4 C PM 5 0

ð32Þ

The SMDP quantities: one-step transition probabilities can be derived as

Pkkþ1 ¼ RðDjkDÞ Pk0

¼ 1  RðDjkDÞ

PPM0 ¼ 1 where RðDjkDÞ ¼

sk

PM 0

RðDjz; kDÞ, and the expected sojourn times are given by

¼ EðsojourntimejkDÞ ¼ CðkDÞ þ ð1  RðDjkDÞÞ  T F

ð33Þ

sPM ¼ EðsojourntimejPMÞ ¼ TP where CðkDÞ ¼

C0

Pn^

t¼1 RðthjkDÞ,

and the expected costs are given by

¼ Eðcostj0Þ ¼ ð1  RðDjkDÞÞ  ðC F þ KÞ

ð34Þ

C PM ¼ EðcostjnÞ ¼ CP þ K

For run-to-failure maintenance policy, which does not consider both degradation state and operating age, we use renewal theory, the long-run expected average cost per unit time for this policy Cðn DÞ is given by

 ¼ CðTÞ

Expected Cycle Cost EðCCÞ ¼ Expected Cycle Length EðCLÞ

ð35Þ

 The expected cycle cost can be obtained by EðCCÞ ¼ ðK þ C F Þ  where EðCLÞ is the maximum life limit T.

Positioning accuracy (um)

40 35 30 25 20 15 10 5 0 0

5

10

15

20

Insepction epoch Fig. 3. Degradation paths of the feed subsystem.

25

R T 0

ðhðt; Z t ÞÞdt.

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302

295

5. Application and results 5.1. Case study 5.1.1. Data acquisition and model parameter estimation To evaluate the performance of the developed approach for mechanical system, it was tested using real degradation data from a feed subsystem of a boring machine. The real monitoring data were obtained by measuring positioning accuracy of the feed subsystem, which collected every 200 h by laser interferometer under standard condition in ISO 230-2: 2014. This data set consists of 24 histories, 11 of them ended with an observable failure and the remaining 13 were suspended. The recorded data is from normally operating condition of feed subsystem and its degradation paths are depicted in Fig. 3. The positioning accuracy serves as degradation indicator of the feed subsystem, and this degradation will increase the hazard rate of feed subsystem and eventually result in system failure. The degradation process is modeled as covariate in PHM in accordance with Section 2. The parameters of PHM are estimated by the following procedures. For n independently observed data histories, let ðT i ; C i ; Z is ; s 6 T i Þ, i ¼ 1; 2; . . . ; n be the i th degradation history, which is ended by censoring or the failure occurrence at T i . Z is denotes the its positioning accuracy at time s. The censoring indicator C i is defined as



1 if failure occurs

Ci ¼

0 if the system is censored

Define the parameters of hazard rate function as h ¼ fb; a; cg. The likelihood of the sample is given by

LðhÞ /

n Y

ðiÞ

Ci

ðiÞ

hðT i ; Z T i ; hÞ  SðT i ; Z T i ; hÞ

1C i

i¼1

where

ðiÞ hðT i ; Z T i ; hÞ

ðiÞ

is hazard function and SðT i ; Z T i ; hÞ is the reliability function. Due to that the hazard rate increases with time,

we consider a parametric PHM with baseline Weibull hazard function:

h0 ðtÞ ¼

bt b1

ab

where a is scale parameter and b is sharp parameter. The covariate is assumed to be exponential function having form ðiÞ

wðZ t Þ ¼ expðcZ t Þ, c is the regression coefficient. Then hazard function hðT i ; Z T i ; hÞ is given by



bT b1

ðiÞ ðiÞ h T i ; Z T i ; h ¼ ib exp cZ T i

a



ðiÞ Let 0 ¼ T i0 < T i1 <    < T idi ¼ T i be the actual inspection points for the i th history. The S T i ; Z T i ; h can be calculated by

( d 1 Z i X ðiÞ SðT i ; Z T i ; hÞ ¼ exp  j¼0

T iðjþ1Þ

T ij



ðiÞ exp cZ T i dðt=aÞb

)

The total log-likelihood l ¼ lnðLðhÞÞ can be obtained by

l ¼ lðb; a; cÞ / nf lnðb=aÞ þ ðb  1Þ 

Z T iðjþ1Þ d i 1 X j¼0

T ij

nf X i



ðiÞ exp cZ T i dðt=aÞb

lnðT i =aÞ þ

nf X

cZ TðiÞi

i

ð36Þ

where nf is the number of the failures. The parameter estimation will be reprocessed since the new data history is available or the working environment changes. By maximizing the log-likelihood function presented in Eq. (36), we obtain the estimates for PHM parameters, where

a ¼ 2:05  103 , b ¼ 4:63, c ¼ 0:281. The shape parameter v ðtÞ and scale parameter u for covariate process are estimated

as v ðtÞ ¼ 1:15  102 t, u ¼ 4:86 [13]. The inspection interval D ¼ 200 h, and the predetermined maximum life limit T is ^ and L are considered as 64. We also consider the following parameters in our case to illustrate proset as 5000 h. The M, n posed maintenance policy: C I ¼ $50, C P ¼ $1560, C F ¼ $6780, K ¼ $2000; T P ¼ 40 h, T F ¼ 40 h. In order to validate the fitted model, we use Anderson-Darling (AD) test and the probability plot to assess whether the increments of positioning accuracy degradation follow the gamma process and compare the cumulative distribution function of fitted PHM with value from Kaplan-Meier estimator. We found that the AD value is 0.284, and p-value is above 0.500. From Fig. 4a and b, the probability plot and cumulative distribution function graph show that the fitted model is well suited for the data.

296

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302 0.5 0.995 0.99

Cumulative distribution function

0.95 0.9

Probability

PHM Kaplan–Meier estimator

0.45

0.75 0.5 0.25 0.1 0.05 0.01 0.005

0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

0

2

4

6

8

Positioning accuracy increment

10

12

14

16

18

20

Insepction epoch

State (3, 200) State (9, 200) State (15, 200)

1

-3

MRL

x 10

2

0.8

Probability

Conditional Reliability Function

Fig. 4. (a) Probability plot of the increments for the degradation data; (b) cumulative distribution function estimated by fitted PHM and Kaplan-Meier estimator.

0.6

1.5 1 0.5

0.4

0 20

0.2

15

1th inspection epoch

0

0

5

10

15

Ins 20

25

30

Insepction epoch

35

40

45

50

ep

8000 6000

10

cti

on

4000

5

ep

oc

h

2000 0

0

L MR

Fig. 5. (a) Conditional reliability function at 1th inspection epoch; (b) mean residual life and residual life distribution.

5.1.2. Results of health measures prediction and maintenance policy Using the approximation and embedded matrices in Section 3, the related health measures and maintenance control limits of the feed subsystem can be obtained. To illustrate the performance of proposed prediction method, a failure history with failure time 4320 h with states ð3; 200Þ, ð9; 800Þ, ð15; 1600Þ, ð24; 2400Þ, ð36; 3200Þ monitored respectively at the 1th, 4th, 8th, 12th, 16th inspection epochs has been selected for this purpose. Fig. 5a shows the conditional reliability prediction of system at the 1th inspection with three different states ð3; 200Þ, ð9; 200Þ, ð15; 200Þ. The speed of decreasing in reliability at future time are different with different initial states. We can notice that when feed subsystem operates with an older state at same time point, the conditional reliability decreases much more dramatically. We also illustrate the mean residual life of the feed subsystem and its residual life distribution with the same failure history at the 1th, 4th, 8th, 12th and 16th inspection epochs, respectively in Fig. 5b. The sharp of residual life distribution shows a narrow trend with aging, and corresponding mean residual life reduction is greater after 12th inspection when the covariate value jumps to higher degradation state. By calculating moment generating function of residual life [26], we found the variance of residual life in each corresponding time are 645,440, 368,926, 160,978, 62,540, respectively, which verifies the shorter prediction interval of the residual life and the higher failure probability with deterioration. Same failure history is selected for the maintenance decision making, where the optimal control limits are obtained as W  ¼ 62:5  105 and T  ¼ 3200 h with 347.5 of minimum expected average cost per inspection cycle. We illustrate the procedure of applying the proposed optimal maintenance policy to the data history from feed subsystem with the optimal decision variables (W  , T  ) in Fig. 6a. The stars show the hazard rate at each inspection epoch. Once the hazard rate exceeds the preventive maintenance level or the age of system reaches the age level, PM will be performed. For the data history in Fig. 6a, the hazard rate exceeds at the 15th inspection epoch before age control limit due to the dramatically increasing of degradation.

297

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302 -4

x 10

hazard rate

4.5

Optimal control limits

Age control limit(Inspection epoch)

5

4 3.5

Hazard rate

W* 3 2.5 *

2

T

1.5 1

17

16

15 10

0.5 0

5 0

2

4

6

8

10

12

14

16

18

C /C F

20

P

Insepction epoch

2.6 0

2.4

3.2

3.4

it 2.8 ol limx 10-4 on tr c e rat ar d Ha z 3

Fig. 6. (a) The illustration of the proposed approach to monitor the hazard rate at each inspection epoch; (b) optimal control limits for different cost ratios C F =C P :.

When the corrective maintenance cost is high, which the unexpected economic (production) loss caused by sudden breakdown, the changes in optimal control limits are demonstrated in Fig. 6b. The maintenance control limits show a decreasing trend as the cost ratio of the expected cost of corrective maintenance to the expected cost of preventive maintenance (i.e. C F =C P ) increases. This result corresponds to an intuitive result in maintenance practice: a higher risk of failure results in a more conservative condition-based maintenance policy. The approach for deriving the health measures and optimal decision variables requires an appropriate discretization level to get the sufficiently precise result. A basic approach of stopping rule presented in [21] is used to determine the discretization level. Using the proposed model in Section 3, the results for different discretization levels are demonstrated in Table 1. When the number of levels exceeds 32, the health measures, maintenance decision variables and corresponding expected average cost have not changed much, and the same situation also happens when the levels continue to increase. By the stop^ ¼ 64, L ¼ 64 are sufficient and the discretization levels do not need to be chosen ping rule, we can conclude that M ¼ 64, n very large.

5.2. Another case example To further verify the proposed approach, we study the real GaAs laser degradation data presented by Meeker and Escobar (refer to [31]). The data set describes laser’s operating current. When the current increases, the device are more prone to fail. The operating currents are inspected every 100 h up to 2000 h or until failure happens. The maintenance parameters are considered as: C I ¼ $100, C P ¼ $1000, C F ¼ $5000, K ¼ $2000; T P ¼ 20 h, T F ¼ 20 h. Using the parameter estimation method presented in Section 5.1.1, the estimates for PHM are obtained, where a ¼ 8:48  102 , b ¼ 1:42, c ¼ 0:35. The shape param-

eter and scale parameter for covariate are v ðtÞ ¼ 4:77  102 t, u ¼ 19:5, respectively. The goodness-of-fit test can be found in [2], which shows an excellent fit of the chosen model. Using approach proposed in Section 3, the health measures and maintenance decision variables can be obtained. The mean residual life and residual life distribution in states ð6; 200Þ, ð11; 400Þ, ð24; 600Þ, ð25; 800Þ, ð34; 1000Þ monitored at the 2th, 4th, 6th, 8th, 10th inspection epochs are depicted in Fig. 7a, and the optimal maintenance policy are illustrated with that data history in Fig. 7b. We can observe that the hazard rate control limit does not prevent sharply increasing of potential failure before age control limit, which is different with previous failure history that age control limit does not prevent potential failure before hazard rate control limit. This signifies that hazard rate control limit and age control limit have jointly played an important role in reducing the risks of the systems.

Table 1 The effect of discretization level on results of case study of feed subsystem. Discretization level

16

32

64

128

CRF at 16th inspection MRL at 16th inspection

0.732 1118 6.563

0.731 1112 6.25

0.733 1128 6.25

0.733 1128 6.25

3400 365.4

3200 352.5

3200 347.5

3200 347.5

Hazard rate limit (104 ) Age control limit Average cost

298

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302 -3

1.5

x 10

MRL

-3

x 10

hazard rate

W*

5

Hazard rate

Probability

6

4 3 2 1

1

0.5

0 10 8

Ins

ep

2000

cti

on

1000

4

ep

T*

1500

6

oc

h

500 2

0

MR

L

0

0

2

4

6

8

10

12

14

16

Insepction epoch

Fig. 7. (a) Mean residual life and residual life distribution; (b) the illustration of the proposed approach to monitor the hazard rate at each inspection epoch.

Table 2 The effect of discretization level on the results of case study of GaAs laser. Discretization level

16

32

64

128

CRF at 11th inspection MRL at 11th inspection

0.708 347 12.5

0.711 355 10.94

0.711 355 10.94

0.711 355 10.94

1000 165.4

1100 162.5

1100 162.5

1100 162.5

Hazard rate limit (104 ) Age control limit Average cost

We also investigate the effect of discretization level on the results of health measures and maintenance decision variables, which is shown in Table 2. It can be seen that when the discretization level is above 32 in this case, the results almost remain unchanged. Same situation happens when discretization level continues to increase. Hence, the discretization level 64 is sufficient to get the precise result. 5.3. Comparison with discrete multi-state degradation model We again use the real positioning degradation data set and GaAs laser data set considered in the previous subsections to compare with PHM using multi-state degradation process presented in Section 4. To fit the continuous time Markov chain, we now identify each state of observations. Here, k-means procedure [30] is used to partition the observation sequences extracted from a data set into k disjoint subsets. The k-means approach is an unsupervised pattern classification method and able to be applied directly to industrial environments without the need of being trained by data measured on a machine under a fault condition. Further advantages of the method are its ease of programming and the accomplishment of a good trade-off between achieved performance and computational complexity. Our objective is to convert original feature values to discrete levels in the sense that the distribution of the feature does not change very much. To cope with the challenge of using k-means approach, i.e., the problem of finding the optimal value of the number of states, the modified version of the nonparametric test called Mann-Kendall [32,1], where weights are considered according to the distance between points, can be used. Now, for n sequences of CM data, this index can be calculated as:

KðmÞ ¼

dk X dk n X X



ðkÞ ðkÞ ðT kj  T ki Þ  sgn Z T kj  Z T ki

ð37Þ

k¼1 i¼1 j¼1;j>i

where

8 > < 1 if x < 0 sgnðxÞ ¼ 0 if x ¼ 0 > : 1 if x > 0 ðkÞ

and Z T kj is degradation state at monitoring time T kj , dk is the number of measurements of history k, n is the total number of degradation histories, and m is the number of states. Now, the best value for m is selected in the sense that increasing it does not change very much the index KðmÞ.

299

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302 4

35 History #1 History #2 Discretized state

30

3

degradation path

Percent change in K(m)

3.5

2.5 2 1.5

25

20

15

1

10

0.5

5

0 0

5

10

15

20

0

25

0

2

4

6

Number of states

8

10

12

14

16

18

20

22

Insepction epoch

Fig. 8. (a) Change in the Mann-Kendall monotonicity test versus the number of degradation states and (b) the illustration of fitted discretized states for degradation histories #1 and #2 with m ¼ 11.

Fig. 8a presents the change in the index KðmÞ with increasing the number of states of positioning data. It can be seen that index stays almost unchanged for m > 11. Therefore, the number of possible CM outcomes is determined as 11, that is M ¼ 10. To illustrate the effectiveness of the degradation discretization method, Fig. 8b shows the fitted discretization paths for histories #1, #2. The excellent agreement among the real data and the discretized counterparts confirms the effectiveness of the proposed method. Through the k-means approach, the optimal M for laser current data set is 12. Using the data histories, the estimated transition rates of the state process for positioning data, Q p , and laser current data, Q c , are given in (38) and (39).

2

6 6 6 6 6 6 6 6 6 6 6 3 Q p  10 ¼ 6 6 6 6 6 6 6 6 6 6 4

2 6 6 6 6 6 6 6 6 6 6 6 6 6 3 Q c  10 ¼ 6 6 6 6 6 6 6 6 6 6 6 6 6 4

8:02

5:33

0

8:56

0

0

0

0

0

0

0

0

0

0

0

0

0 0

0 0

0

0

6:25 6:25 0

2:69

0

0

0

0

0

0

0

5:02

3:22

0:32

0

0

0

0

0

0 0

0

0

0

0

0

0

0

0

0

0

0

0

0 0

0 0

0

0

3

0 7 7 7 7:22 5:94 1:28 0 0 0 0 0 0 7 7 0 7:36 5:26 2:1 0 0 0 0 0 7 7 7 0 0 7:50 5:66 1:84 0 0 0 0 7 7 7 0 0 0 6:72 4:77 1:95 0 0 0 7 7 0 0 0 0 7:12 5:02 2:1 0 0 7 7 7 0 0 0 0 0 6:69 5:55 1:14 0 7 7 0 0 0 0 0 0 6:82 5:25 1:57 7 7 7 0 0 0 0 0 0 0 6:72 6:72 5 0

0

0

0

0

0

0

0

0

ð38Þ

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

3

0 7 7 7 7 4:51 4:51 0 0 0 0 0 0 0 0 0 7 7 0 4:60 4:60 0 0 0 0 0 0 0 0 7 7 0 0 6:15 6:15 0 0 0 0 0 0 0 7 7 7 0 0 0 6:22 6:22 0 0 0 0 0 0 7 7 0 0 0 0 7:45 7:45 0 0 0 0 0 7 7 7 0 0 0 0 0 7:18 6:68 0 0 0 0:5 7 7 0 0 0 0 0 0 7:26 7:26 0 0 0 7 7 7 0 0 0 0 0 0 0 8:21 7:21 0 1 7 7 0 0 0 0 0 0 0 0 8:21 7:71 0:5 7 7 7 0 0 0 0 0 0 0 0 0 8:10 8:10 5

5:35 5:35

0 0

0

0

0

0

0

0

0

0

0

0

0

0

ð39Þ

300

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302 2500

10000 Actual residul life Gamma process model Multi-state process model

9000

2000

7000

Prediciton interval

Prediciton interval

8000

Actual residul life Gamma process model Multi-state process model

6000 5000 4000

1500

1000

3000

500

2000 1000 0

0

2

4

6

8

10

12

14

16

18

0

0

2

4

6

8

10

12

Inspection epoch

Inspection epoch

Fig. 9. (a) Feed subsystem residual life with 95% confidence interval; (b) GaAs laser Residual life with 95% confidence interval.

To evaluate the performance of the proposed prediction model, we used 95% confidence interval of residual life as comparison criterion. The PHM with Gamma process covariates is compared with PHM with multi-state process covariates for the two case studies. Figs. 9a and b show the corresponding estimated 95% confidence intervals as well as the actual residual life (ARL) and mean residual life for the investigated failure histories for both Gamma process and multi-state process covariates in PHM, respectively. The results show that for both Gamma process model and multi-state process model, the estimated 95% confidence intervals cover the ARL but the proposed Gamma process model provides considerably shorter interval with lower variability and the mean residual life is closer to ARL. We have also used the rest of 10 failure histories of positioning accuracy data and 6 failure histories of laser current data to compute the 95% confidence interval of residual life and mean residual life for both models. The results reveal that our predicted distribution of residual life is tighter and has smaller variability than that of obtained by multi-state modeling. We also verified the superiority of the proposed approach by comparing the results with the PHM using multi-state process covariate combining hazard rate-based maintenance policy (model 2), PHM using multi-state process covariate combining age-based policy (model 3), and the run-to-failure policy (model 4) which do not make good use of CM information. These policies were described in Section 4 and the long-run expected average cost are obtained. Considering the same parameters as in the previous example, the minimum long-run expected average cost using the convention maintenance policies increased to 385.8, 468.8, 700.6 respectively, which are higher than the optimal average cost obtained using the proposed approach with hazard rate control limit and age control limit. Table 3 shows the long-run expected average cost per inspection cycle and the optimal decision variables for the convention maintenance policies. Same situation has happened when we apply to the laser data, which is shown in Table 4. Finally, we analyzed the algorithm complexity of the proposed approach. There are mainly two steps in the computation: health measures derivations and maintenance decision variables determination. In the first step, health measures can be computed using (13)–(15) and embedded matrix R. The number of operations required for solving health measures is Oð1Þ. In the second step, based on the embedded matrix (26), the algorithm searches the maintenance variables for cycles to determine the optimal values. The number of operations is OðnÞ Hence, the overall time complexity required to evaluate all variables is OðnÞ, which is extremely fast for practical applications. 6. Conclusions and future research In this paper, we proposed an integrated framework for health measures prediction and optimal maintenance policy for degrading mechanical systems subject to condition monitoring and random failure. The gradation process is modeled as Gamma process in covariates of PHM. Although Wu and Ryan [23] solved approximation problem for the Z t process as non-constant within inspections, the approximation for continuous-state degradation process or a large number of discrete

Table 3 The optimal control limits and the long-run expected average cost for the case of feed subsystem. Maintenance policy

Hard rate limit (104 )

Age limit

Average cost

Proposed approach Model 2 Model 3 Model 4

6.25 5.625 – –

3200 – 3200 –

347.5 385.8 468.8 700.6

301

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302 Table 4 The optimal control limits and the long-run expected average cost for the case of GaAs laser. Maintenance policy

Hard rate limit (104 )

Age limit

Average cost

Proposed approach Model 2 Model 3 Model 4

10.94 12.5 – –

1100 – 1100 –

162.5 186.8 229.6 316.2

degrading states was left intact and is still a hard problem. To cope with this difficulty, a matrix-based approximation method is used to predict health measures and then the optimal maintenance policy based on both age control limit and hazard rate control limit is proposed and solved in SMDP framework with matrix form. The health measures and maintenance decision variables can be easily obtained through the embedded matrices. We presented both continuous-state degradation process covariate and multi-state process covariate in PHM, and verified the proposed approach by using two case studies from feed subsystem of a boring machine and GaAs laser. A comparison with other methods was also given. It was found that our approach achieved better health prediction compared to multi-state continuous time Markov covariate and gained the lowest total maintenance cost. The characteristics of proposed model is good for online prediction and maintenance decision making and the policy is easy to implement and also appealing for real applications. Although we analyze data coming from a boring machine and GaAs lasers, our method can be applied to a much wider class of applications. For example, our model can be applied to gearboxes, oil engines with condition monitoring data. There are several possible directions for future research. One extension is to consider uncertainty in the inspections or consider nonhomogeneous Markov chain in PHM with flexible transition mechanisms or their combination. Another interesting future research topic could be the consideration of multivariate CM information. Such an extension may lead to both interesting theoretical and practical results. Acknowledgment The authors would like to express their gratitude to the technical editor and anonymous referees for their valuable and constructive comments which contributed to a significant improvement of the original version of this paper. Financial support provided by China Scholarship Council (CSC) (Grant No. 201506160096) and National Key Research and Development program of China (Grant No. 2016YFE0121700) is also gratefully acknowledged. Appendix A A.1. Proof of one-step transition probability Kij ðnhÞ  When Z t transits from state ði; nhÞ to state ðj; ðn þ 1ÞhÞ, j – i and i; j 2 0; 1; . . . ; M, suppose transition occurs at time u 2 ðnh; ðn þ 1ÞhÞ, then:

Kij ðnhÞ ¼ Prðn > ðn þ 1Þh; Z ðnþ1Þh ¼ jjn > nh; Z nh ¼ iÞ i Rh h R nhþh ¼ P ij 0 E expð nh hðt; Z t ÞdtÞjt i ¼ u k2i u expðki uÞ   expðkj ðh  uÞÞ  ð1 þ kj ðh  uÞÞ du Rh R nhþu ¼ Pij 0 expð nh hðt; iÞdtÞ  k2i u expðki uÞ R

  ðnþ1Þh  exp  nhþu hðt; jÞdt  expðkj ðh  uÞÞ  ð1 þ kj ðh  uÞÞ du

 When Z t remains in state ði; nhÞ in the next time interval h, then

Kii ðnhÞ ¼ Prðn > ðn þ 1Þh; Z ðnþ1Þh ¼ ijn > nh; Z nh ¼ iÞ i R þ1 h R ðnþ1Þh ¼ h E expð nh hðt; Z t ÞdtÞjti ¼ u  k2i u expðki uÞdu h i R nhþh ¼ E expð nh hðt; Z t ÞdtÞji  ðki h þ 1Þ expðki hÞ R

ðnþ1Þh hðt; iÞdt  ðki h þ 1Þ expðki hÞ ¼ exp  nh

When joint degradation process transits from state ði; nhÞ to failure state F in the next time interval h, then the transition probability is given by

Pi;F ðnhÞ ¼ 1 

M X Kij ðnhÞ jPi

302

C. Duan et al. / Mechanical Systems and Signal Processing 111 (2018) 285–302

References [1] R. Moghaddass, M.J. Zuo, An integrated framework for online diagnostic and prognostic health monitoring using a multistate deterioration process, Reliab. Eng. Syst. Saf. 124 (2014) 92–104. [2] D. Tang, J. Yu, Optimal replacement policy for a periodically inspected system subject to the competing soft and sudden failures, Eksploatacja i Niezawodnosc-Maint. Reliab. 17 (2) (2015) 228–235. [3] W. Kuo, M.J. Zuo, Optimal Reliability Modeling: Principles and Applications, John Wiley & Sons, 2003. [4] X.S. Si, W. Wang, C.H. Hu, D.H. Zhou, Remaining useful life estimation-a review on the statistical data driven approaches, Eur. J. Oper. Res. 213 (1) (2011) 1–14. [5] A.K. Jardine, D. Lin, D. Banjevic, A review on machinery diagnostics and prognostics implementing condition-based maintenance, Mech. Syst. Sig. Process. 20 (7) (2006) 1483–1510. [6] S. Alaswad, Y. Xiang, A review on condition-based maintenance optimization models for stochastically deteriorating system, Reliab. Eng. Syst. Saf. 157 (2017) 54–63. [7] M.J. Kim, R. Jiang, V. Makis, C.G. Lee, Optimal Bayesian fault prediction scheme for a partially observable system subject to random failure, Eur. J. Oper. Res. 214 (2) (2011) 331–339. [8] Y. Peng, M. Dong, A prognosis method using age-dependent hidden semi-Markov model for equipment health prediction, Mech. Syst. Sig. Process. 25 (1) (2011) 237–252. [9] Q. Liu, M. Dong, W. Lv, X. Geng, Y. Li, A novel method using adaptive hidden semi-Markov model for multi-sensor monitoring equipment health prognosis, Mech. Syst. Sig. Process. 64 (2015) 217–232. [10] N. Gebraeel, J. Pan, Prognostic degradation models for computing and updating residual life distributions in a time-varying environment, IEEE Trans. Reliab. 57 (4) (2008) 539–550. [11] R.P. Nicolai, R. Dekker, J.M. Van Noortwijk, A comparison of models for measurable deterioration: An application to coatings on steel structures, Reliab. Eng. Syst. Saf. 92 (12) (2007) 1635–1650. [12] Z. Huang, Z. Xu, X. Ke, W. Wang, Y. Sun, Remaining useful life prediction for an adaptive skew-Wiener process model, Mech. Syst. Sig. Process. 87 (2017) 294–306. [13] J.M. Van Noortwijk, A survey of the application of gamma processes in maintenance, Reliab. Eng. Syst. Saf. 94 (1) (2009) 2–21. [14] Q. Zhang, C. Hua, G. Xu, A mixture Weibull proportional hazard model for mechanical system failure prediction utilising lifetime and monitoring data, Mech. Syst. Sig. Process. 43 (1) (2014) 103–112. [15] R. Huang, L. Xi, X. Li, C.R. Liu, H. Qiu, J. Lee, Residual life predictions for ball bearings based on self-organizing map and back propagation neural network methods, Mech. Syst. Sig. Process. 21 (1) (2007) 193–207. [16] V. Makis, A.K.S. Jardine, Optimal replacement in the proportional hazards model, INFOR: Inform. Syst. Oper. Res. 30 (1) (1992) 172–183. [17] D. Banjevic, A.K.S. Jardine, V. Makis, M. Ennis, A control-limit policy and software for condition-based maintenance optimization, INFOR: Inform. Syst. Oper. Res. 39 (1) (2001) 32–50. [18] H.T. Pham, B.S. Yang, T.T. Nguyen, Machine performance degradation assessment and remaining useful life prediction using proportional hazard model and support vector machine, Mech. Syst. Sig. Process. 32 (2012) 320–330. [19] V. Makis, X. Jiang, Optimal replacement under partial observations, Math. Oper. Res. 28 (2) (2003) 382–394. [20] A. Ghasemi, S. Yacout, M.S. Ouali, Optimal condition based maintenance with imperfect information and the proportional hazards model, Int. J. Prod. Res. 45 (4) (2007) 989–1012. [21] D. Tang, J. Yu, X. Chen, V. Makis, An optimal condition-based maintenance policy for a degrading system subject to the competing risks of soft and hard failure, Comput. Ind. Eng. 83 (2015) 100–110. [22] B. Liu, Z. Liang, A.K. Parlikad, M. Xie, W. Kuo, Condition-based maintenance for systems with aging and cumulative damage based on proportional hazards model, Reliab. Eng. Syst. Saf. 193 (2017) 381–391. [23] X. Wu, S.M. Ryan, Value of condition monitoring for optimal replacement in the proportional hazards model with continuous degradation, IIE Trans. 42 (8) (2010) 553–563. [24] X. Wu, S.M. Ryan, Optimal replacement in the proportional hazards model with semi-Markovian covariate process and continuous monitoring, IEEE Trans. Reliab. 60 (3) (2011) 580–589. [25] L. Jafari, V. Makis, Joint optimal lot sizing and preventive maintenance policy for a production facility subject to condition monitoring, Int. J. Prod. Econ. 169 (2015) 156–168. [26] D. Brook, D.A. Evans, An approach to the probability distribution of CUSUM run length, Biometrika 59 (3) (1972) 539–549. [27] S. Zhao, V. Makis, S. Chen, Y. Li, On-line reliability assessment for an electronic system subject to condition monitoring, 2016 IEEE International Conference on, IEEE, 2016. [28] H.C. Tijms, Stochastic Models: an Algorithmic Approach, John Wiley & Sons Inc, 1994, p. 303. [29] S.M. Ross, Introduction to Probability Models, Academic Press, 2014. [30] C.T. Yiakopoulos, K.C. Gryllias, I.A. Antoniadis, Rolling element bearing fault detection in industrial environments based on a K-means clustering approach, Expert Syst. Appl. 38 (3) (2011) 2888–2911. [31] W.Q. Meeker, L.A. Escobar, Statistical Methods for Reliability Data, John Wiley & Sons, 1998. [32] S. Yue, P. Pilon, A comparison of the power of the t test, Mann-Kendall and bootstrap tests for trend de-tection/Une comparaison de la puissance des tests t de Student, de Mann-Kendall et du bootstrap pour la detection de tendance, Hydrol. Sci. J. 49 (1) (2004) 21–37.