Optimal Bayesian early fault detection for CNC equipment using hidden semi-Markov process

Optimal Bayesian early fault detection for CNC equipment using hidden semi-Markov process

Mechanical Systems and Signal Processing 122 (2019) 290–306 Contents lists available at ScienceDirect Mechanical Systems and Signal Processing journ...

1MB Sizes 1 Downloads 20 Views

Mechanical Systems and Signal Processing 122 (2019) 290–306

Contents lists available at ScienceDirect

Mechanical Systems and Signal Processing journal homepage: www.elsevier.com/locate/ymssp

Optimal Bayesian early fault detection for CNC equipment using hidden semi-Markov process Chaoqun Duan a,b,c, Viliam Makis b,⇑, Chao Deng c a

School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China Department of Mechanical and Industrial Engineering, University of Toronto, 5 King’s College Road, Toronto, ON M5S 3G8, Canada c School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China b

a r t i c l e

i n f o

Article history: Received 29 June 2017 Received in revised form 16 November 2018 Accepted 21 November 2018

Keywords: Early fault detection Multivariate Bayesian control policy Condition-based maintenance Mean residual life Hidden semi-Markov model

a b s t r a c t The goal of a computer numerically controlled (CNC) equipment is to guarantee high specified performance and to maintain it effectively over its life cycle time. Thus, the assessment of health condition is crucial for industrial systems. However, very few papers have dealt with the cost-optimal early fault detection and remaining useful life prediction of CNC equipment using multivariate positioning data. The novel approach presented here is based on vector autoregressive (VAR) degradation modeling and hidden semi-Markov modeling using the optimal Bayesian control technique. System condition is modeled using a continuous time semi-Markov chain with three states, i.e. unobservable healthy state 1, unobservable warning state 2 and observable failure state 3. Model parameter estimates are calculated using the expectation-maximization (EM) algorithm. The optimal control policy for the three-state model is represented by a Bayesian control chart for a multivariate observation process. The optimization problem is formulated and solved in the semiMarkov decision process (SMDP) framework. A formula for the mean residual life (MRL) is also derived based on Bayesian approach, which enables the estimation of the remaining useful life and early maintenance planning based on the observed data. The validation of the proposed methodologies is carried out using actual multivariate degradation data obtained from a CNC equipment. A comparison with the multivariate Bayesian control chart based on a hidden Markov model (HMM) is given, which illustrates the effectiveness of the proposed approach. Ó 2018 Elsevier Ltd. All rights reserved.

1. Introduction The CNC equipment is widely used in industry for achieving productivity performance goals, and its fault detection and failure prediction are important for plant profitability and human security [1]. The mechanism of fault of a piece of equipment is always complicated and can be traced to an underlying machine degradation process. Degradation process may involve corrosion, material fatigue, wear and fracturing and the accumulation of damage throughout machine’s life is irreversible and may lead to failure, if it is not properly maintained [2]. For mechanical system, the deterioration of the machine tools results in changing of positioning precision and stability which carry important information about the machine tool’s condition [3]. Since the failure of the machine tool always incurs high costs (e.g. due to production losses, low quality) and

⇑ Corresponding author. E-mail address: [email protected] (V. Makis). https://doi.org/10.1016/j.ymssp.2018.11.040 0888-3270/Ó 2018 Elsevier Ltd. All rights reserved.

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306

291

safety hazards, an important topic in reliability and maintenance engineering is determining how positioning degradation data should be properly integrated into fault detection and failure prevention of machine tools. The degradation data analysis started over two decades ago considering crack length, oil measurements, vibration data collection and analysis, etc [4–8]. Although these methods worked well for general systems, for specific CNC equipment the positioning degradation data would be a more effective and comprehensive characteristic to indicate any abnormal operation and internal physical changes [9]. Current research usually uses motion degradation data in machine design and manufacturing [10,11]. Very little work has been done using methods from statistics and operational research to model and analyze the motion data for the purpose of CNC equipment fault prediction and maintenance decision making. In this paper, we present a method for assessing the health condition and predicting failures of CNC equipment. We analyze real multivariate data obtained from positioning data of circular motion collected at regular time epochs from a CNC machine. The condition of the machine system is modeled as a 3-state continuous time semi-Markov chain, where states 1 and 2 represent good (healthy) and warning (unhealthy) operational states, and state 3 represents the failure state. The system is in state 1 as long as it remains in a healthy condition. Over time the system deteriorates with age and usage and, as a result, its failure rate increases. Eventually, the system can no longer be considered to be in a good or healthy condition, even though it can still operate. The real states (healthy or unhealthy) cannot be observed directly when the machine is operational (i.e. in states 1 or 2), but can only be estimated using the positioning data obtained through system condition monitoring. In particular, while the system is operational, multivariate positioning data Z 1 ; Z 2 ; :::; is sampled at equidistant time points D; 2D; :::. When the system fails (i.e. enters state 3), it can no longer operate, and no further condition monitoring data is collected. A VAR model is fitted to the healthy portions of data histories and residuals of the fitted model are chosen as the observation process in the hidden semi-Markov framework. Compared with HMM for which sojourn time in each operational state of the system is assumed to follow exponential distribution, which has ‘‘memoryless” property, the hidden semi-Markov model (HSMM) is much closer to represent real machine degradation process. Liu and Dong [12,13] proposed a diagnosis and prediction method of equipment based on the HSMM with the sojourn times following normal distributions. Rui [14] provided a minor extension to HMM, considering a HSMM with the sojourn time following two-phase Erlang distribution, which still has limited applications. Moghaddass [15] presented a nonhomogeneous continuous-time hidden semi-Markov process for online diagnostic and prognostic health monitoring, however it is only applicable to univariate monitoring, and may not be effective when multivariate observations are available. We apply the EM algorithm to estimate the state and observation parameters of the HSMM. The parameter updates in each iteration of the EM algorithm have explicit formulas, which is computationally attractive for real applications. Once the parameters of the HSMM are estimated, we develop a multivariate Bayesian control chart for failure prediction [16]. After each multivariate observation is collected, the posterior probability that the system is in a warning state is updated using Bayes’ theorem, and plotted on the chart. Once the posterior probability exceeds a control limit, full system inspection is carried out. We develop an efficient computational algorithm which solves the optimization problem in SMDP framework. Generally, there are typically two criteria for maintenance optimization over long-term horizons: cost minimization and availability maximization. In this paper, we consider the average cost objective although our analysis can be easily modified for the availability objective. In the maintenance literature, no paper has considered Bayesian control chart design based on HSMM for CNC equipment. Although we analyze positioning data coming from a milling machine, our method can be applied to all types of CNC machines. The remainder of the paper is organized as follows. In Section 2, we fit a VAR model to the positioning data, and obtain the residuals for both the healthy and unhealthy portions of the data histories. In Section 3, we model the residual process and the state process as a HSMM. Using the EM algorithm, we estimate both the state and residual model parameters. In Section 4, a cost-optimal Bayesian control chart is constructed and applied as the fault detection scheme. Conditional reliability and remaining useful life prediction formulas are developed in Section 5, and a comparison with other approach is given. Section 6 provides concluding remarks. 2. Data pre-processing Since machine tool achieves its manufacturing function through the motion control of terminal executive unit, the deterioration of the machine can be reflected by the abnormal change of the motion trajectory. The circular motion test is the most common and representative performance test for multi-axis linkage machine tools. The circular motion, which is executed by synthesizing the interpolation instructions, contains not only information related to the geometric accuracy, position error, repeatability accuracy of the CNC equipment, but also the dynamic error information related to the feed rate and servo control system, such as equipment crawling, the scale error, the backlash, the servo gain mismatch, the machining radius due to the servo response hysteresis, etc. Thus, circular motion information can be a comprehensive reflection of changes in CNC equipment condition. This study adopts the circular motion information of a CNC milling machine, which comes from the type OTM-650 in Huazhong University of Science and Technology. 2.1. Data collection and analysis The experimental setup includes a CNC milling machine, a HEIDENHAIN KGM 182 grid encoder, and a PC computer, as shown in Fig. 1. The KGM 182 grid encoder is a non-contact measurement with high precision for two-dimensional motion

292

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306

accuracy test, consisting of a grid plate, non-contact scanning head, IK220 acquisition card and ACCOM evaluation software. The grid plate is embedded in a mounting base and fixed to the machine table, the scanning head is locked into the spindle end, and the acquisition card is installed on the PC computer, as shown in Fig. 2. The KGM182 grid encoder was used to dynamically test the circular path of the milling machine. The experiment was implemented under standard condition specified in ISO 230-4:2005 [17]. The circular path was tested every D ¼ 20 hours, and the workbench of the milling machine moves in a circle with a radius of 50 mm and feed rate of 2000 mm/min on the X-O-Y plane with anticlockwise and clockwise motions. The KGM 182 grid encoder measures the actual path of traverse, and the ACCOM evaluation software determines deviations from an ideal circular path and calculates the numerical values. Four different accuracy indices, which are circular hysteresis, circular deviation, minimum radius deviation and maximum radius deviation, are measured at each sampling epoch. The circular hysteresis is the maximum radial difference between two actual paths, where one path is carried out by a clockwise contouring motion and the other one by an anticlockwise contouring motion. The circular deviation is measurement of the minimum radial separation of two concentric circles enveloping the actual path. The radius deviation is obtained by deviation between actual path and ideal path, with maximum value measured away from the center of the circle, and minimal value towards the center of the circle. The actual circular path of circular test is shown in Fig. 3. The total number of data histories recorded is 24, which consist of N ¼ 11 failure histories and M ¼ 13 suspension histories. The failure history is defined as the history that ends with sudden failure or functional failure, and the suspension history is defined as the history that ends when the machine is still in operation and has not failed. The degradation signals obtained in machine condition monitoring are 4-dimensional time series data. Fig. 4 shows that from the 1st to 12th sampling epoch the measurements are stable, indicating that the machine is operating under normal conditions. This is considered as the healthy portion of the data history. Beyond 12th sampling epoch, the measurement indices dramatically increase, which indicates severe wear of the machine. This is classified as the unhealthy portion of the data history. Next, we present the development of a VAR model for the healthy part of the processed signal which is then used to obtain the residuals. 2.2. VAR modeling and computation of residuals In order to fit the VAR model, we conservatively segment the healthy portion of processed signals to assure accurate early fault detection and failure prediction. The residuals of the fitted model are computed for both the healthy and unhealthy portions of the data histories. If the machine runs in the healthy state, the residual of the signal shows only the prediction error and obeys Gaussian distribution, and if the machine moves to an unhealthy state, the characteristics of the residuals will change, signalling severe machine deterioration. The residuals will then be chosen as the observation process in the continuous time HSMM. The 4-dimensional observed healthy data histories have the following representation

Z n  d0 ¼

p X

Ui ðZ ni  d0 Þ þ en ;

n2Z

ð1Þ

i¼1

where d0 2 R4 is the mean; Ui 2 R44 are the autocorrelation matrices; p 2 N is the model order; en follows N 4 ð0; CÞ with the covariance matrix C 2 R44 . P We set d ¼ d0  pi¼1 Ui d0 , and re-write Eq. (1) in the standard form. The whole procedure of the standard form VAR model fitting and parameter estimation is given in Appendix A. The model order p ¼ 4 has provided the best fit, and the VAR model parameter estimates are given by:

Fig. 1. Experimental setup.

293

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306

IK220 data acquisition card

Scanning head Mounting base

Ball screw

Workbench (XOY plane)

Bearing

Nut

Slide rail

ACCOM evaluation software

Fig. 2. Diagram of experiment.

Anticlockwise

Starting point

Centre

Clockwise

Fig. 3. Actual path of circular motion.

0

0:229 B 0:034 ^1 ¼B U B @ 0:151 0:655 0 0:048 B 0:036 B ^4 ¼B U @ 0:093 0:016

0:365 0:155 0:240 0:166 0:022 0:010 0:004 0:038

0:151 0:016 0:352 0:275 0:020 0:080 0:071 0:023

1 0 1 0 0:151 0:237 0:536 0:127 0:193 0:208 0:063 B 0:182 0:116 0:057 0:204 C B 0:06 0:048 0:024 C C ^ B C ^ B C; U 2 ¼ B C; U ¼ B @ 0:518 0:428 0:443 0:105 A 3 @ 0:105 0:103 0:632 A 0:123 0:101 0:238 0:615 0:167 0:193 0:055 1 0 1 0 1 0:019 0:0198 21:3 6:96 4:95 5:55 C B C B C 0:075 C ^ B 0:0115 C ^ B 6:96 3:00 4:58 2:36 C 6 C; d ¼ B C; C  10 ¼ B C: @ 0:0161 A @ 4:95 4:58 38:9 4:69 A 0:014 A 0:006

0:0142

5:55

0:223 0:130 0:026 0:057

1 0:113 0:108 C C C; 0:079 A 0:194

2:36 4:69 2:90

^ are utilized to calculate the residuals Y n of the ^ 1; U ^ 2; U ^ 3; U ^ 4 ; CÞ ^ ¼ ð^ ^; U The estimated VAR model parameters b d; p ðZ n : n 2 NÞ signals:

  Y n :¼ Z n  Eb^ Z n j~ Z n1

ð2Þ

! ^. For n 6 p ^, we recursively compute Y n using the Kalman filter, applying the procedure prowhere Z n1 ¼ ðZ 1 ;    ; Z n1 Þ, n > p vided in [18]. The residuals are then computed for both the healthy and unhealthy portions of each data history. Hence, when the system does in fact move to an unhealthy state, the characteristics of the residuals will change, signalling system deterioration. Using the fitted VAR model, the residuals for the machine degradation process are obtained and illustrated in Fig. 5. The statistical test of independence and normality of the 4-dimensional residuals is performed using Portmanteau Independence Test [19] and Henze-Zirkler Multivariate Normality Test [20], and the resulted p-values are provided in Table 1, which show that the residuals of the fitted model are independent and have a multivariate normal distribution. The ultimate goal of our methodology is to model the hidden system state process with more realistic sojourn time distribution, and use the fitted model to design an optimal Bayesian control chart for fault detection. In the next section, the

294

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306

0.06 0.05

Circular test measurement (mm)

0.04 0.03 0.02 0.01

Healthy portion

0.00

Unhealthy portion

-0.01 -0.02 -0.03 -0.04

Circular hysteresis Circular deviation Minimum radius deviation Maximum radius deviation

-0.05 -0.06 -0.07 0

5

10

15

20

Sampling epoch Fig. 4. Accuracy measurements of circular motion.

Residuals

Residuals

Residuals

Residuals

Circular hysteresis 0.02 0.01 0 -0.01 0.02 0.01 0 -0.01

0

5

10 15 Circular deviation

Unhealthy portion

0

5

10 15 Unhealthy portion Minimum radius deviation

0

5

10 15 Unhealthy portion Maximum radius deviation

0

5

0 -0.02 -0.04

0.02 0.01 0 -0.01

10 15 Sampling epoch Fig. 5. Scatter plot of the residuals.

Unhealthy portion

25

295

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306 Table 1 Residual independence and normality tests. Test

Healthy data set

Unhealthy data set

Independence (Portmanteau test) Normality (Henze-Zirkler test)

0.2463 0.1386

0.4031 0.3245

residuals of the fitted model defined in Eq. (2) are chosen as the observation process in the HSMM framework, and the EM algorithm is used to estimate the state and observation parameters of the HSMM. 3. Hidden semi-Markov modeling and parameter estimation In this section, the residual data is considered as the partial observation data for fitting the hidden semi-Markov model. We assume that the condition of the machine can be categorized into one of three states: a healthy or ‘‘good” state, an unhealthy or ‘‘warning” state, and a failure state, denoted as state 1, state 2, state 3, respectively. Since the machine deterioration has monotonic behavior over time, the state process can be well represented by a non-decreasing continuous time semi-Markov process. It should be noted that for mechanical system, it is usually sufficient (and even preferable) to consider only two operational states: a healthy state (state 1) and a warning state (state 2) [24]. Accordingly, instead of considering N states for milling machine, in this paper we shall focus on the case with the state space X ¼ f1; 2g [ f3g, i.e. N ¼ 3. The machine starts in a healthy state, i.e. X 0 ¼ 1, and moves through unhealthy state to failure or jumps to failure directly. The state 1 and state 2 are hidden and only the failure state is observable. In general, for semi-Markov modeling the system transition probability matrix can be written as:

0

0 p12 B P ¼ @0 0 0 0

p13

1

C 1 A 1

Since the probability distribution of any positive random variable can be arbitrarily closely approximated by a mixture of Erlangian distributions with the same rate (scale) parameters, the sojourn times in state i for i ¼ 1; 2, are modeled by a general Erlang distribution with shape parameter ki 2 Nþ and density function:

f i ðtÞ ¼

kki tki 1 ekt ðki  1Þ!

for t P 0

ð3Þ

where k > 0 is an unknown rate parameter. The Erlang distribution is one of phase-type distributions, which can be modeled as a series of a given number of independent exponential phases that are sequentially processed. The sojourn time is completed when the last exponential phase is completed. System stays in state 1 until finishing last phase and transits to state 2 with probability p12 or fails with probability p13 , where p12 þ p13 ¼ 1. Consider k1 phases in state 1 and k2 phases in state 2 with rate k. Then the state space for HSMM is G ¼ fK 1 ; K 2 ; K 3 g, where K 1 ¼ f1;    ; k1 g denotes the set of states where the machine is operating in a healthy condition, K 2 ¼ fk1 þ 1;    ; k1 þ k2 g denotes the set of states where the machine is working in a warning condition, and K 3 ¼ fk1 þ k2 þ 1g represents the failure state of the machine. The system state transition of HSMM for machine is depicted in Fig. 6. The HSMM with Erlang distribution in each hidden state is essentially the multi-state Markov process. The instantaneous transition rates Q ¼ ðqi;j Þi;j2G for the state process from state i to state j are given by:

8 k > > > < p12 k qij ¼ > > p13 k > : 0

if i–k1 ; j ¼ i þ 1 if i ¼ k1 ; j ¼ k1 þ 1 if i ¼ k1 ; j ¼ k1 þ k2 þ 1 otherwise;

qii ¼ 

X

qij

j–i

p13

1

2 Healthy state (Hidden)

k1

p12

k1 1

k1 k2

Unhealthy state (Hidden)

p23

k1 k2 1 Failure state

Fig. 6. State transition structure of HSMM for the machine degradation and failure.

296

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306

The transition probability matrix of the state process PðtÞ ¼ ðP i;j ðtÞÞi;j2G with its elements P i;j ðtÞ ¼ PðGt ¼ jjG0 ¼ iÞ has the following form [25]: ji

Pi;j ðtÞ ¼ ðktÞ ekt ; ðjiÞ!

8 i; j 2 K 1 or i; j 2 K 2 j i 6 j ji

ekt ; 8i 2 K 1 ; j 2 K 2 P i;j ðtÞ ¼ p12 ðktÞ ðjiÞ! P Pi;k1 þk2 þ1 ðtÞ ¼ 1  P ij ðtÞ; 8i 2 G

ð4Þ

jPi

The residual process vectors ðY n : n 2 NÞ obtained in (2) are independent given the state of the machine system. For the periodic sampling, the Y D ; Y 2D ;    ; Y kD denote d-dimensional observation vector at each sampling epoch. Then, Y kD given the state X nD ¼ x, x ¼ 1; 2 follows multivariate normal distribution N d ðlx ; Rx Þ with density

  0 1 1 f Y kD jX nD ðyjiÞ ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi exp  y  li R1 i ðy  li Þ 2 ð2pÞd jRi j

ð5Þ

l1 ; l2 2 Rd and R1 ; R2 2 Rdd are the unknown observation process parameters. Once the system fails, PðY kD ¼ gjX kD ¼ 3Þ ¼ 1, where g R Rd represents a failure signal. We assume that N failure histories and M suspension histowhere

ries have been collected, which are represented by F 1 ;    ; F N , and S1 ;    ; SM , respectively. Y i ¼ ðY i1 ;    ; Y iT i Þ represents collection of all the data points of ith failure history with failure time ni ¼ t i , where T i D < t i 6 ðT i þ 1ÞD. Similarly, Y j ¼ ðY 1j ;    ; Y Tj j Þ denotes jth failure history with failure time nj P T i D. Let O ¼ fF 1 ;    ; F N ; S1 ;    ; SM g represent all observable data and LðK; WjOÞ be the associated likelihood function, where K ¼ ðk; P 12 ; k1 ; k2 Þ and W ¼ ðl1 ; l2 ; R1 ; R2 Þ are the sets of unknown state and observation parameters, respectively. Since the sample paths of the state process in HSMM are unobservable, it is not feasible to maximize LðK; WjOÞ. We use EM algorithm to estimate K and W by iteratively maximizing the so-called pseudo likelihood function. The diagram illustrating EM algorithm is in Fig. 7. The specific EM algorithm for HSMM works as follows: 1. Let K0 ; W0 be some initial values of the unknown parameters. 2. E-step. For n P 0, compute the pseudo log-likelihood function defined by: Q ðK; WjKn ; Wn Þ ¼ EKn ;Wn ðlnLðK; WjCÞjOÞ

Start

Select initial parameters * , (State and observation )

*

Compute the pseudo likelihood function Q

,

| ˆ, ˆ

such that

*

E ˆ , ˆ ln L

*

,

|C |O

,

Choose * , arg max Q

*

,

| ˆ, ˆ

,

*

,

*

ˆ, ˆ

Yes Stop Fig. 7. The diagram of EM algorithm.

No

297

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306

3. M-step. Choose Knþ1 and Wnþ1 such that ðKnþ1 ; Wnþ1 Þ 2 arg max Q ðK; WjKn ; Wn Þ K;W

4. If k ðKnþ1 ; Wnþ1 Þ  ðKn ; Wn Þ k < e for 





e > 0, put ðK ; W Þ ¼ ðKnþ1 ; Wnþ1 Þ and stop, otherwise go to step 2.



where C ¼ fF 1 ; . . . ; F N ; S1 ; . . . ; SM g represents the complete data set, in which each failure history F i and suspension history Sj of the observable data set O has been augmented with the unobservable sample path information of the state process. The new parameter estimates K, W are then used in the E-step, and the E-step and M-step continue until k ðKnþ1 ; Wnþ1 Þ  ðKn ; Wn Þ k < e, for small e > 0. Detailed formulas for the likelihood function lnLðK; WjCÞ and the pseudo likelihood function Q ðK; WjKn ; Wn Þ can be found in [25] (see Appendix B). We assign initial values to K0 , W0 and use the convergence criterion k ðKnþ1 ; Wnþ1 Þ  ðKn ; Wn Þ k < 104 for the iteration. We obtained the estimated parameters using the residual data. The iterations of EM algorithm and the results are shown in Table 2. In order to do a model comparison later on, we have chosen the HMM developed by Kim et al. [26]. The transition rate matrix is given by:

0 B Q ¼@

ðk12 þ k13 Þ

k12

0

k23

0

0

k13

1

C k23 A 0

where k12 , k13 , k23 2 ð0; þ1Þ are unknown state parameters. The parameters of HMM are estimated using EM algorithm, applying the estimation procedure in [26]. The similar results of HMM are shown in Table 3. In the next section, the estimates are used to develop an optimal multivariate early fault detection scheme. 4. Multivariate Bayesian fault detection scheme In this section, we describe an optimal multivariate Bayesian control chart to detect early fault of the system. Bayesian control chart is monitoring the posterior probability that system is in a warning state. When multivariate observations Y 1 ;    ; Y n have been collected, the posterior probability that the system is in state i, 1 6 i 6 k1 þ k2 , at the nth decision epoch is denoted by

Pn ðiÞ ¼ PðGnD ¼ ijn > nD; Y 1 ;    ; Y n ; Pn1 Þ where P0 ð1Þ ¼ 1,

Pk1 þk2

!

Pn ðiÞ ¼ 1 and P n ¼ ðPn ð1Þ; Pn ð2Þ;    ; Pn ðk1 þ k2 ÞÞ. It was proved that the posterior probability statistic that the system is in warning state is sufficient for early fault detection and decision making [15]. Let P2n be the posi¼1

terior probability that system is in warning state, that is

P2n ¼

X

PðGnD ¼ ijn > nD; Y 1 ;    ; Y n ; Pn1 Þ ¼

i2K 2

X

Pn ðiÞ

ð6Þ

i2K 2

Table 2 Optimal parameter estimates for HSMM. Parameters

Initial values

1st iteration

2nd iteration

Last iteration

^ k ^12 p ^13 p ^ k

0.2

0.22

0.28

0.26

0.5 0.5 2

0.91 0.09 3

0.92 0.18 4

0.917 0.083 4

^ k 2 l^ 1

2 0

1 0

2 0

2 0

1

l^ 2

^1  e þ 6 R

^2  e þ 6 R

1 0:2 B 0:2 C C B @ 0:2 A 0:2 0 1 0:3 B 0:3 C B C @ 0:3 A 0:3 0 10 10 B 10 20 B @ 10 10 10 10 0 40 20 B 20 50 B @ 20 20 20 20

10 10 30 10 20 20 60 20

1 10 10 C C 10 A 40 1 20 20 C C 20 A 70

1 0:02 B 0:04 C C B @ 0:05 A 0:02 0 1 0:26 B 0:28 C B C @ 0:31 A 0:27 0 1 9 7 5 6 B 7 3 5 5 C B C @ 5 5 9 8 A 6 5 8 6 0 1 21 17 17 15 B 17 20 15 16 C B C @ 17 15 23 14 A 15 16 14 19

1 0:01 B 0:03 C C B @ 0:01 A 0:02 0 1 0:014 B 0:017 C B C @ 0:033 A 0:013 0 12 8 B 8 4 B @ 6 6 7 3 0 11 6 B 6 4 B @ 7 6 8 6

6 6 13 6 7 6 13 13

1 7 3 C C 6 A 3 1 8 6 C C 13 A 14

1 0:004 B 0:005 C C B @ 0:007 A 0:003 0 1 0:014 B 0:016 C B C @ 0:034 A 0:013 0 10:3 7:0 B 7:0 3:0 B @ 5:0 4:6 5:6 2:4 0 10:6 6:0 B 6:0 3:8 B @ 6:6 5:8 7:6 6:2

5:0 4:6 10:9 4:7 6:6 5:8 13:28 13:6

1 5:6 2:4 C C 4:7 A 2:9 1 7:6 6:2 C C 13:6 A 14:0

298

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306

Table 3 Optimal parameter estimates for HMM. Parameters

Initial values

1st iteration

2nd iteration

Last iteration

^ k12 ^ k13

0.2

0.12

0.07

0.059

0.1

0.02

0.007

0.005

^ k23 l^ 1

0.1 0 1 0:2 B 0:2 C B C @ 0:2 A 0:2 1 0 0:3 B 0:3 C C B @ 0:3 A 0:3 0 10 10 B 10 20 B @ 10 10 10 10 0 40 20 B 20 50 B @ 20 20 20 20

0.14 0 1 0:02 B 0:04 C B C @ 0:05 A 0:02 1 0 0:26 B 0:28 C C B @ 0:31 A 0:27 0 9 9 B 9 3 B @ 1 2 6 5 0 28 18 B 18 24 B @ 10 5 15 16

0.146 0 1 0:01 B 0:03 C B C @ 0:01 A 0:02 1 0 0:014 B 0:017 C C B @ 0:033 A 0:013 0 11 8 B 8 4 B @ 5 7 7 3 0 12 7 B 7 5 B @ 7 6 8 6

0.132 0 1 0:004 B 0:005 C B C @ 0:007 A 0:003 1 0 0:014 B 0:016 C C B @ 0:034 A 0:013 0 10:3 7:0 B 7:0 3:0 B @ 5:0 4:6 5:6 2:4 0 10:6 6:0 B 6:0 3:8 B @ 6:6 5:8 7:6 6:2

l^ 2

^1  e þ 6 R

^2  e þ 6 R

10 10 30 10 20 20 60 20

1 10 10 C C 10 A 40 1 20 20 C C 20 A 70

1 1 6 2 5 C C 9 3 A 3 6 1 10 15 5 16 C C 25 11 A 11 39

5 7 12 6 7 6 13 13

1 7 3 C C 6 A 3 1 8 6 C C 13 A 15

5:0 4:6 10:9 4:7 6:6 5:8 13:28 13:6

1 5:6 2:4 C C 4:7 A 2:9 1 7:6 6:2 C C 13:6 A 14:0

! At each sampling epoch, the posterior probability vector P n ¼ ðPn ð1Þ;    ; Pn ðk1 þ k2 ÞÞ is updated. At sampling epoch nD, 

if P2n exceeds the control limit P , the full system inspection is performed to check whether the system is in the healthy state or in the warning state. If the system is found to be in the healthy state, then a false alarm occurred and with some small adjustment and negligible time and cost the system will be operational. If the system is found to be in the warning state, preventive maintenance (PM) is carried out. After full system inspection, the system will restart in as good as new condition. Full system inspection takes T insp time units with C insp cost, and preventive maintenance takes T prev time units with C prev cost. Upon system failure, corrective maintenance is carried out which takes T fail time units with C fail cost. A corrective maintenance or PM action can be either a true physical replacement or an overhaul or repair such that the system is as-goodas-new after the replacement. The objective is to determine the optimal stopping time for control policy that minimizes the long-run expected average cost per unit time. The stopping time represents the first sampling epoch at which full system inspection should take place. 

From renewal theory, the cost minimization problem is equivalent to finding the optimal value of the control limit P , such that

   g P ¼ inf

EP ðCCÞ

P 2½0;1

EP ðCLÞ

where CC and CL denote the system cost in one cycle and cycle length, respectively. Using the Bayes’ theorem, the posterior probability that system is in the state i at nth decision epoch can be recursively computed with following formula:

Pn ðiÞ ¼

where

8 > > P > < > > > :

ciP

n1 1=2

j

8j2K 1

cP

n1

expð12ðV n þDÞÞ

þðjR1 j1 jR2 jÞ

ciP

ðjR1 jjR2 j1 Þ

1=2

expð12ðV n þDÞÞ

ciPn1 ¼ R16k6i Pn1 ðkÞPki ðDÞ,

P

n1

cj

8j2K 1 Pn1

þ

P

c

j

if

i 2 K1

if

i 2 K2

8j2K 2 Pk1

P

cj

ð7Þ

8j2K 2 Pk1

1 1 1 0 D ¼ ðl20 R1 2 l2  l10 R1 l1 Þ  B ðR2 l2  R1 l1 Þ,

1 A ¼ R1 2  R1 ,

1

1 B ¼ ðR1 2  R1 Þ

0 1 ðR1 2 l2  R1 l1 Þ, V n ¼ ðY n  BÞ AðY n  BÞ. The detailed derivation of (7) is given in Appendix C. When we obtained the Bayesian signal, next, we develop a computational algorithm in SMDP framework to determine the 

optimal control limit P . The proposed algorithm is suitable for a wide class of degrading models, and it will be presented in detail considering the positioning degradation data introduced earlier in this paper. Since the knowledge of posterior probability P2n is not sufficient to calculate the essential components (i.e. transition probabilities, mean costs and sojourn times) ! in SMDP, then the vector P n is used in SMDP components derivation to find the optimal control limit for P2n . In general, the computation in SMDP framework requires discretization of possible range of values of posterior probability [0,1] into finite set of states. We partition the interval [0,1] into L subintervals. We have found in practice that when L P 25, the partition leads to a high degree of precision, so L does not need to be chosen very large. Suppose that at sampling time nD the system ! has not failed. The posterior probability vector P n is updated using (7), and we denote the updated vector as pn . Then for fixed L, the process is defined to be in state ði1 ; i2 ;    ; ik1þk21 Þ if the 1st to ðk1 þ k2  1Þth elements of pn , i.e.

299

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306 

þ



þ



þ





ðpn ð1Þ;    ; pn ðk1 þ k2  1ÞÞ, lie in corresponding intervals ½i1 ; i1 Þ, ½i2 ; i2 Þ,  , ½ik1þk21 ; ik1þk21 Þ, where i ¼ ði  1Þ P =L and 

þ



i ¼ i P =L. If at sampling epoch nD, the posterior probability P2n is above the control limit, i.e. P2n P P, full inspection is performed and the process is defined to be in inspection state fðI; i1 ; i2 ;    ; ik1þk21 Þg. Upon full system inspection, if the system is found to be in healthy state, then a false alarm occurs and the process is assumed to be in initial healthy state ð1; 0;    ; 0Þ. Similarly, if upon full system inspection the system is found to be in unhealthy state, a true alarm is triggered and the process is defined to be in preventive maintenance state fPMg.   P P Let S ¼ fði1 ; i2 ;    ; ik1þk21 Þ : k1þk21 is 6 L; s 2 Nþ ; p2n < Pg, where p2n ¼ s2K 2 pðsÞ. Therefore, the state space for the s¼1 

SMDP is defined by S ¼ S [fð1; 0;    ; 0Þg [ fðI; i1 ; i2 ;    ; ik1þk21 Þg [ fPMg. With this definition of the state space of SMDP for the optimal cost minimization problem, the SMDP is determined by the following quantities [29]: px;x0 : the probability that at the next decision epoch the system will be in state x0 given the current state is x, for x; x0 2 S. sx : the expected sojourn time until the next decision epoch given the current state is x 2 S. C x : the expected cost incurred until the next decision epoch given the current state is x 2 S. 

With the quantities defined before for the SMDP framework, the long-run expected average cost gðPÞ based on optimal control limit can be obtained by solving the following system of linear equations [29]: 

v x ¼ C x  gðPÞsx þ

P j2S

vk ¼ 0

px;x0 v x0

8x; x0 2 S

ð8Þ

for some k 2 S 





The optimal control limits P 2 ½0; 1 and the corresponding optimal average cost gðP Þ ¼ inf gðPÞ can be obtained by P2½0;1

iteratively solving Eq. (8). The remainder of the mathematical analysis is devoted to deriving the closed form expressions for the SMDP quantities px;x0 ; sx ; C x , for x; x0 2 S. 0

0

0

Assume that the milling machine in state ði1 ; i2 ; . . . ; ik1 þk2 1 Þ at ðn  1Þth decision epoch transits to ði1 ; i2 ; . . . ; ik1 þk2 1 Þ at the next

decision

pði1 ;i2 ;...;ik

1 þk2 1

epoch,

Þ;ði01 ;i02 ;...;i0k

pði1 ;i2 ;...;ik

1 þk2 1

1 þk2 1

Þ

 is

and

¼

0 ðis



þ 1Þ=L; is

¼ is0 =L

where ði1 ; i2 ; . . . ; ik1 þk2 1 Þ,

Þ;ði01 ;i02 ;...;i0k

1 þk2 1

for

s ¼ 1; . . . ; k1 þ k2  1.

0 0 0 ði1 ; i2 ; . . . ; ik1 þk2 1 Þ

The

SMDP

transition

2 S can be computed as follows:

Þ

! ; n > nDjn > ðn  1ÞD; P n1 ¼ pn1 Þ  i0k þk 1 1 i0k þk 1 i0 1 i0 ¼ Pr 1 L 6 Pn ð1Þ < L1 ; . . . ; 1 2L 6 Pn ðk1 þ k2  1Þ < 1 L2 jn > nD; pn1 RðDjpn1 Þ

¼

i0 1 Prð 1 L



probabilities



i01 L

6 Pn ð1Þ < ; . . . ;

i0k

1 þk2 1

1

L

6 Pn ðk1 þ k2  1Þ <

i0k

1 þk2 1

L

ð9Þ

where RðDjpn1 Þ denotes the conditional reliability at nth decision epoch, which is given by:

Rðtjpn1 Þ ¼ Prðn > nD þ tjn > nD; pn1 Þ P 1 þk2 ð1  Pi;k1 þk2 þ1 ðtÞÞpn1 ðiÞ ¼ ki¼1

ð10Þ

and P ij ðtÞ is the element of transition probability matrix PðtÞ which can be computed using Eq. (4). The second term in Eq. (9) can be computed using the conditional reliability function defined in Eq. (10). Since the SMDP at time ðn  1ÞD is in state ði1 ; i2 ; . . . ; ik1 þk2 1 Þ, for large L, the lengths of the intervals ½ðis  1Þ=L; is =LÞ for s ¼ 1; . . . ; k1 þ k2  1 are small, and therefore pn1 ¼ ðpn1 ð1Þ; :::; pn1 ðkP 1 þ k2 ÞÞ can be approximated by the mid-point of the intervals given by pn1 ðsÞ ¼ ðis  0:5Þ=L and also, pn1 ðk1 þ k2 Þ ¼ 1  s ðis  0:5Þ=L. Khaleghei [27] showed the procedure of computing the first probability term in Eq. (9), and the detailed derivation is given in Appendix D. 

When the posterior probability P2n exceeds the control limit P, a full inspection is performed and the system will make   0 0 0 transition to the inspection state I; i1 ; i2 ; ::::; ik1 þk2 1 . The corresponding transition probabilities are given by:

pðI;i1 ;i2 ;...;ik

1 þk2 1

P



Þ;ðI;i01 ;i02 ;::::;i0k

i01

1 þk2 1

Þ

i01

6 Pn ð1Þ < L ; :::; p L n o A ¼ i 2 Sji P P

¼

2 2A n

pr

i0k

1 þk2 1

L

1

6 Pn ðk1 þ k2  1Þ <

i0k

1 þk2 1

L

 jn > nD; pn1  RðDjpn1 Þ

ð11Þ

When inspection is performed, then it can be a false alarm or true alarm. If a false alarm occurred, then the milling machine is in as good as new condition, so the system makes transition to state ð1; 0;    ; 0Þ. Otherwise, the true alarm is triggered and the system goes to the PM state, and the corresponding transition probabilities are given by:

300

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306

pðI;i1 ;i2 ;...;ik

Þ;ð1;0;;0Þ

pðI;i1 ;i2 ;...;ik

Þ;ðPMÞ

1 þk2 1 1 þk2 1

¼ 1  p2n1

ð12Þ

¼ p2n1 

When observable failure occurs at state ði1 ; i2 ; . . . ; ik1 þk2 1 Þ 2 S during operational time, then the corrective maintenance is performed which brings the system back to the initial healthy state, the corresponding transition probability is given by:

pði1 ;i2 ;...;ik

1 þk2 1

Þ;ð1;0;;0Þ

¼ 1  RðDjpn1 Þ

ð13Þ

When the machine is in the PM state, mandatory PM is performed and a new system cycle begins for the state process which is represented by the equation:

pðPMÞ;ð1;0;;0Þ ¼ 1

ð14Þ

Then the corresponding mean costs and sojourn times for the SMDP are given by:

C ði1 ;i2 ;...;ik

1 þk2 1

C ðI;i1 ;i2 ;...;ik

Þ

1 þk2 1

¼ ð1  RðDjpn1 ÞÞC fail Þ

¼ C insp

C PM ¼ C prev Z

sði1 ;i2 ;...;ik1 þk2 1 Þ ¼

0

D

Rðtjpn1 Þdt þ ð1  RðDjpn1 ÞÞT fail

ð15Þ

sðI;i1 ;i2 ;...;ik1 þk2 1 Þ ¼ T insp sPM ¼ T prev where the parameters T insp , C insp , T prev , C prev , T fail , C fail are the times and costs required to complete full system inspection, preventive maintenance and corrective maintenance, respectively. Note that the inspection is only performed at a sampling epoch, but the failure can happen at any time. Hence, if the system is normally operating below the control limit, the expected sojourn time for the system to go to the next state depends on the expected sojourn time for the system to continue operation and to be correctively maintained. This completes the derivations of SMDP quantities. Now, we apply the Bayesian control policy to our CNC equipment. The inspection and maintenance time parameters T insp ¼ 0:2, T prev ¼ 1, T fail ¼ 1, and the maintenance cost parameters C insp ¼ 20, C prev ¼ 200, C fail ¼ 800. Coding Eqs. (8)–(15), 

the optimal control limit was computed as P ¼ 0:25 with the corresponding expected average cost 23.8. The examples of the Bayesian control policies for HSMM developed in this paper and for HMM are given in Fig. 8 for one of the failure histories. 

Once the posterior probability that the system is in warning state exceeds P ¼ 0:25 at a sampling epoch, full system inspection is triggered to check the machine’s condition. For the failure history in Fig. 8, this occurred at the 18th sampling epoch with HSMM. The result is compared with Kim’s model [4], which considered Bayesian control approach based on HMM. The posterior probability is computed using estimated parameters of HMM provided in Table 3. The chart signals

1 HSMM HMM

0.9 0.8

Posterior probability

0.7 0.6 0.5 0.4 Control limit for HMM 0.3 Control limit for HSMM

0.2 0.1 0

0

5

10 15 Sampling epoch

20

Fig. 8. Illustration of Bayesian control chart.

301

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306 

alarm at the 20th sampling epoch. The optimal control limit P ¼ 0:3 and the minimal average cost has increased 8.8% to 25.9. It demonstrates that the Bayesian control approach based on HSMM is more efficient than the one based on HMM. The latter overestimates the healthy condition and allows a higher operational risk with a higher expected cost. 5. Residual life prediction For the smooth system operation and efficient production, it is important to predict the residual life of a degrading system ahead of failure. In this section, we derive explicit formulas for the conditional reliability function and the MRL function for the model proposed in this paper. Each of these quantities can be expressed in terms of the posterior probability statistics. We can assess the remaining useful life of the system at the nth decision epoch by computing the MRL using the posterior ! probability vector P n ¼ pn . The deterioration process of the milling machine is a hidden semi-Markov process with 4 and 2 phases in hidden states 1 and 2, respectively. Following the derivation in Eq. (4) in Section 3 and Eq. (7) in Section 4, for any t P 0 the conditional reliability function at the nth decision epoch is given by:

! Rðtjpn Þ ¼ Prðn > nD þ tjn > nD; P n ¼ pn Þ ! ¼ PrðGnDþt R K 3 jn > nD; P n ¼ pn Þ P6 P6 ¼ i¼1 j¼1 Pij ðtÞpn ðiÞ jPi



ð16Þ

  j i P P þ pn ð2Þekt 1 þ 2j¼1 ðktÞ þ 4i¼3 p12 ðktÞ j! i!     i i P P þ pn ð4Þekt 1 þ 2i¼1 p12 ðktÞ þpn ð3Þekt 1 þ kt þ 3i¼2 p12 ðktÞ i! i!

¼

pn ð1Þekt 1 þ

P3

j¼1

ðktÞ j j!

þ

P5

i¼4 p12

ðktÞi i!



þpn ð5Þekt ð1 þ kt Þ þ pn ð6Þekt Then, the MRL function for milling machine at the nth decision epoch can be computed as follows:

lnD ðpn Þ ¼ Eðn  n > tDjn > nD; pn Þ R1 ¼ 0 Rðtjpn Þdt ¼ pn ð1Þ 1k ð4 þ 2p12 Þ þ pn ð2Þ 1k ð3 þ 2p12 Þ þ pn ð3Þ 1k ð2 þ 2p12 Þ þpn ð4Þ 1k ð1 þ 2p12 Þ þ pn ð5Þ 2k þ pn ð6Þ 1k

ð17Þ

! With the updated posterior probability vector P n ¼ ðPn ð1Þ; Pn ð2Þ; Pn ð3Þ; Pn ð4Þ; Pn ð5Þ; Pn ð6ÞÞ at each sampling epoch, the conditional reliability functions at different epochs are plotted in Fig. 9a. For the 1st, 6th and 12th sampling points, the value of the conditional reliability function monotonically decreases with the increasing system deterioration. The estimated MRLs for HSMM at each sampling epoch for the investigated failure history are plotted in Fig. 9b. A comparison with Bayesian prediction using HMM is also depicted, where the parameters of HMM are in Table 3. We note that the prediction using HMM gives a wider range of estimated time, while the proposed approach is much closer to the actual remaining life (ARL) of the milling machine at each sampling epoch. Particularly, after the 19th sampling epoch the MRL

1 1st sampling 6th sampling 12th sampling

0.9 0.8

Mean residual life (h)

350

Conditional reliab ility

0.7 0.6 0.5 0.4 0.3

300 250 200 150 100

0.2

50

0.1 0

HMM HSMM ARL

400

0

5

10

15

20

25

30

35

40

45

50

0

2

4

6

8

10

12

14

Time ( 20h)

Time ( 20h)

(a)

(b)

Fig. 9. (a) Conditional reliability function and (b) Mean residual life prediction.

16

18

20

22

24

302

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306

value for the investigated failure history is less than 2 sampling intervals. To validate the effectiveness of the proposed approach, we also computed for the available failure histories the MRL values at the last three sampling epochs. They were found to be less than 2 sampling intervals in 80% of the investigated failure histories. This result suggests that on average 80% of failures can be predicted using the proposed HSMM model. We also found that for the HMM, all failures have the MRL value greater than 2 sampling intervals, when computed at the last three sampling epochs before failure. This suggests that a large percentage of failures would not be predicted using the MRL statistic given by the HMM. This clearly indicates that the Bayesian prediction using HSMM is a considerably better indicator of machine failure. 6. Conclusions In this paper, we have modeled a mechanical equipment subject to condition monitoring as a partially observable deteriorating system with multivariate observations. A fault detection scheme based on Bayesian estimation and control methodologies has been developed by utilizing multivariate degradation data obtained from a milling machine. Machine deterioration process is described by a hidden, three-state homogeneous continuous-time semi-Markov process. A vector autoregressive model was fitted to the healthy portions of the data histories, and residuals of the fitted model were chosen as the observation process in the HSMM framework. The EM algorithm has been applied to obtain the model parameter estimates and a multivariate Bayesian control problem has been formulated and solved in SMDP framework to find the optimal control limit and the optimal average cost. The developed Bayesian control chart initiates full system inspection when posterior probability that the system is in unhealthy state exceeds a certain level. The conditional reliability function and remaining useful life prediction formulas have also been developed. It has been demonstrated that compared with the HMM, our approach achieved a more effective early fault detection and agreed better with the actual remaining life. This is the first paper that applies Bayesian estimation and control methods to multivariate deteriorating data for machine tool. The model and methods developed in this paper can be applied to a wide range of degrading machines with multivariate continuous monitoring data. Further improvement of the model to consider competing risks of soft and hard failure is a suitable topic for future research. Acknowledgments This research was supported by China Scholarship Council (CSC) (Grant No. 201506160096) and National Key Research and Development Program of China (Grant No. 2016YFE0121700). Appendix A 0

The 4-dimensional observed healthy data histories fzl1 ;    ; zltl g, zlt ¼ ðzl1t ; zl2t ; zl3t ; zl4t Þ for l ¼ 1;    ; N þ M have the following representation

Z n  d0 ¼

p X

Ui ðZ ni  d0 Þ þ en ;

n2Z

i¼1

where d0 2 R4 is the mean; Ui 2 R44 are the autocorrelation matrices; p 2 N is the model order; en follows N 4 ð0; CÞ with the P covariance matrix C 2 R44 . All the parameters are unknown and need to be estimated. By setting d ¼ d0  pi¼1 Ui d0 , the VAR process can be written in the standard form

Zn ¼ d þ

p X

Ui Z ni þ en ;

n2Z

i¼1

Thus the observed healthy data histories fzl1 ;    ; zltl g, l ¼ 1;    ; N þ M, have the regression representation

W ¼ VA þ E where

h i0 NþM 1 1 W ¼ zNþM tNþM    ztpþ1    zt 1    zt pþ1 2

1

6 zNþM 6 tNþM 1 6 V ¼6 . 6 .. 4 zNþM t NþM p

1

1

zNþM tp

z1tNþM 1



.. .

zNþM 1

zNþM t NþM p

1

30

7 7 7 .. .. 7  .  .7 5 z11 z1tP

303

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306

A ¼ ½d U1    Up 0 E¼

h

NþM 1 1 eNþM t NþM    etpþ1    et 1    etpþ1

i0

Using the least squares method, Reinsel [18] showed that the estimates of A and C have the form

^ ¼ V 0 V 1 V 0 W A  0 ^ ðW  V AÞ ^ C^ ¼ ðT  ð4p þ 1ÞÞ1 W  V A

ðA:1Þ

P where T ¼ NþM l¼1 ðt 1  pÞ is the total number of available data points of healthy portions. Several information criteria exist to determine the overall VAR model order and to measure the fitting quality of the model taking its complexity into account, such as AIC criterion, BIC criterion, likelihood ratio statistic, and HQ criterion [21,22]. The AIC model selection criterion [23] is defined as

^ r jÞ þ AIC ¼ logðjC

2r T

^ r is the corresponding maximum likelihood residual where r is the number of parameters estimated in the VAR model, C covariance matrix estimate of C, and T is the sample size. For the 4-dimensional data, we have found that the optimal model ^ ¼ 4. Also the adequacy of model order is verified using HQ criterion, order p corresponding to the minimum value of AIC is p and shows the minimum value HQ ¼ 91:07 obtained for p ¼ 4, which indicates that p ¼ 4 has achieved the best quality of fitting. Using (A.1), the VAR model parameter estimates are given by 1 1 0 0 0:229 0:365 0:151 0:151 0:237 0:536 0:127 0:193 0:208 C C B B B B 0:034 0:155 0:016 0:024 C B 0:182 0:116 0:057 0:204 C B 0:06 ^1 ¼ B C ^ C ^ B B U B 0:151 0:240 0:352 0:632 C; U2 ¼ B 0:518 0:428 0:443 0:105 C; U3 ¼ B 0:105 A A @ @ @ 0:655 0:166 0:275 0:123 0:101 0:238 0:615 0:167 0:193 0

0:063 0:223 0:113

1

C 0:048 0:130 0:108 C C; 0:103 0:026 0:079 C A 0:055 0:057 0:194

1 0 1 1 0 0:0198 0:048 0:022 0:020 0:019 21:3 6:96 4:95 5:55 C B C C B B B 0:0115 C B 0:036 0:010 0:080 B 6:96 0:075 C 3:00 4:58 2:36 C ^4 ¼ B C; C^  106 ¼ B C; ^d ¼ B C U B 0:0161 C B 0:093 0:004 0:071 0:014 C B 4:95 4:58 38:9 4:69 C: A @ A A @ @ 0:0142 0:016 0:038 0:023 0:006 5:55 2:36 4:69 2:90 0

Appendix B ^1 ; k ^2 are given by ^13 ; ^ ^12 ,p k; k The explicit formulas for the state parameters p

PN

PM ^ j j¼1 b1 PM ^ j ; i ^ ^ i¼1 ða1 þ xÞ þ j¼1 b1

^12 ¼ P p N

i¼1

a^ i1 þ

PN

^13 ¼ P p N

i¼1

^i x

^ i¼1 ð 1

^ Þi þ a þx

PM ^ j j¼1 b1

^1 , k ^2 can be obtained by maximizing: ^ k, k

max

k1 ;k2 ;k

þ

N P i¼1 N P

! j M  N N M P P P P ^ ^1 þ x ^ i1 k2 lnk þ a ^ i3 ðk2  1Þ þ ^Þ þ ^1 k1 lnk þ a b1 þ c ða i

j¼1

^2 þ x ^ lntÞ þ ða i

i¼1



N P

M  P j¼1

i

^1 þ x ^Þ þ ða

i¼1

^ þc ^3 b 3

j

i¼1

!

i¼1

ðk1  1Þ 

N P i¼1

! j M  P ^ þc ^ b lnðk1  1Þ!  1 1

N P

j¼1

a^ i1 þ

M P j¼1

^ j lnðk2  1Þ! b 1 i

^1 þ x ^Þ þ t ða

i¼1

PN

^ ni ;C

i

PM

^ n1 ;D

j¼1

j ^j s1 ;D

j¼1

s2 ;D

i¼1

s1 ;C

þ

i¼1

s2 ;C

þ

j



;

j

þ

PN i ^ i PM j ^ j

n ;C þ n ;D 2 i Pj¼1 j2 j ; l^ 2 ¼ Pi¼1 N M i ^ ^

PN

^ ni ;C

i

PM

^ n3 ;D

j¼1

j ^j s1 ;D

j¼1

s2 ;D

i¼1

s1 ;C

þ

i¼1

s2 ;C

þ

j



j

þ

^ 1 ¼ Pi¼1 3 i Pj¼1 R N M i ^

! j M  P ^ þc ^ b k 2 2

j¼1

The formulas for the observation parameter estimates are given by

1 i Pj¼1 l^ 1 ¼ Pi¼1 N M i ^

j¼1

!

PN i ^ i PM j ^ j

n ;C þ n ;D ^ 2 ¼ Pi¼1 4 i Pj¼1 j4 j

R N M i ^ ^

j ha4 ;g^i ^f j

304

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306

where ha; bi ¼ a0 b,

ni1 ¼ ð0; yi1 ;

2 X

yin ; . . . ;

n¼1 Ti X

ni2 ¼

yin ;

n¼1

ni3

¼

Ti X

Ti X

0

yin Þ ; si1 ¼ ð0; 1; . . . ; T i Þ

n¼1

!0 yin ; . . . ; yiT i ; 0

; si2 ¼ ðT i ; . . . ; 1; 0Þ0

n¼2

0; ðyi1

 l1 Þ

0

ðyi1

 l1 Þ; . . . ;

Ti X

!0 ðyin

 l1 Þ

0

ðyin

 l1 Þ

n¼1 Ti X

ni4 ¼

0

ðyin  l2 Þ ðyin  l2 Þ;

n¼1

Ti X

!0 0

0

ðyin  l2 Þ ðyin  l2 Þ; . . . ; ðyiT i  l2 Þ ðyiT i  l2 Þ; 0

n¼2

Appendix C The posterior probability that system is in the state i at nth decision epoch can be recursively computed by using the Bayes’ theorem,

!

Pn ðiÞ ¼ PðGnD ¼ ijn > nD; Y n ; P n1 Þ

! ! n1 ÞPðGnD ¼ijn>nD;Y n1 ; P n1 Þ ¼ PgðY n jGnD ¼i;n>nD;Y n1 ; P! ! j

gðY n jGnD ¼j;n>nD;Y n1 ;

ðC:1Þ

P n1 ÞPðGnD ¼jjn>nD;Y n1 ; P n1 Þ

where

P

P

16k6i P ki ðDÞ n1 ðkÞ Pk1 þk2 Pkj ðDÞ n1 ðkÞ P kj k¼1

! PðGnD ¼ ijn > nD; Y n1 ; P n1 Þ ¼ Pk

1 þk2

P

Considering multivariate normal distribution in Eq. (5), the probability term in (C.1) is given by

! gðY n jGnD ¼ i; n > nD; Y n1 ; P n1 Þ ¼



f ðY n jl1 ; R1 Þ f ðY n jl2 ; R2 Þ

if i 2 K 1 if i 2 K 2

Therefore, the posterior probability Pn ðiÞ can be further written as

Pn ðiÞ ¼

8 > P > > < f ðY n jl1 ;R1 Þ

P 16i6j

P

f ðY n jl1 ;R1 Þ

P ðDÞPn1 ðkÞ 16k6i k;i

P ðDÞPn1 ðiÞþf ðY n jl2 ;R2 Þ 8j2K 1 i;j

P

16i6j

Under the assumption that R1 –R2 in (5), we set f ð Y n jl 1 ;R1 Þ

1=2

0

ðð2pÞd jR1 jÞ expð12ðY n l1 Þ R1 1 ðY n l1 ÞÞ 1=2 0 f ð Y n jl 2 ;R2 Þ ðð2pÞd jR2 jÞ expð12ðY n l2 Þ R1 1 ðY n l2 ÞÞ  1=2 1  1 ¼ jR1 j  jR2 j exp 2 ðV n þ DÞ ¼

where

V n ¼ ðY n  BÞ0 AðY n  BÞ 1 A ¼ R1 2  R1

 1   1 1 B ¼ R1 R1 2  R1 2 l 2  R1 l 1 

P ðDÞPn1 ðiÞ 8j2K 2 i;j

P > f ðY n jl2 ;R2 Þ P ðDÞPn1 ðkÞ > 16k6i k;i > P P : f ðY jl ;R ÞP P Pi;j ðDÞPn1 ðiÞþf ðY n jl2 ;R2 Þ n 1 1 16i6j 8j2K 16i6j 8j2K 1



P





1 0 1 1 l20 R1 2 l2  l10 R1 l1  B R2 l2  R1 l1



2

P i;j ðDÞPn1 ðiÞ

if i 2 K 1 ðC:2Þ if i 2 K 2

305

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306

If we define ciPn1 ¼ R16k6i Pn1 ðkÞP ki ðDÞ, the posterior probability in (C.2) can be simplified to

Pn ðiÞ ¼

8 > > >P < > > > :ðR j

ciP c

j

8j2K 1 Pn1

1

þðjR1 j

jR2 jÞ

expð12ðV n þDÞÞ

ciP 1 jjR2 j

1 1=2

Þ

P

n1 1=2

P

P

n1

expð12ðV n þDÞÞ

cj

8j2K 1 Pn1

þ

c

j

if

i 2 K1

if

i 2 K2

8j2K 2 Pk1

cj

8j2K 2 Pk1

Appendix D The first probability term in Eq. (9) can be computed as follows:

  Pr h11 6 Pn ð1Þ < h21 ; . . . ; h1k1 þk2 1 6 Pn ðk1 þ k2  1Þ < h2k1 þk2 1 jn > nD; pn1 ¼

k1P þk2 i¼1

¼

k1P þk2 i¼1

¼

k1P þk2 i¼1

for any 0 <

h1i

  ! Pr h11 6 Pn ð1Þ < h21 ; . . . ; h1k1 þk2 1 6 Pn ðk1 þ k2  1Þ < h2k1 þk2 1 jn > nD; P n1 ; GnD ¼ i PrðGnD ¼ ijpn1 Þ   ciP Pr Kðh11 Þ 6 V n < Kðh21 Þ; . . . ; Kðh1k1 þk2 1 Þ 6 V n < Kðh2k1 þk2 1 ÞjGnD ¼ i Pk1 þkn1 2 j j¼1



ðD:1Þ

cP

n1

     ciP 1 Pr max Kðh11 Þ; . . . ; Kðh2k1 þk2 1 Þ 6 V n < min Kðh21 Þ; . . . ; hðik1 þk2 1 Þ jGnD ¼ i Pk1 þkn1 2 j j¼1

<

h2i

< 1, and the term

Kðhni Þ

cP

n1

for n ¼ 1; 2, in (D.1) is given by

8  0 1 P 1=2 > ciP hni cj > ðjR1 jjR2 j1 Þ > j2K 1 Pðn1Þ ðn1Þ > AD P > 2ln@ j > hni c > < j2K 2 Pðn1Þ n  Kðhi Þ ¼ 0 1 P > j > ciP hni cP ðjR1 j1 jR2 jÞ1=2 > j2K > AD > P2 ðn1Þ 2ln@ ðn1Þ > > hni cj : j2K 1 Pðn1Þ

if i 2 K 1

if i 2 K 2

~ l  B; R2 Þ, the probabilities in (D.1) Since V n ¼ ðY n  BÞ0 AðY n  BÞ; Y n  BjGnD 2 K 1  Nðl1  B; R1 Þ, and Y n  BjGnD 2 K 2 Nð 2 can be computed explicitly using Theorem 3.1 proposed by Provost [28], who provided a closed form expression for the cumulative distribution function of V n jGnD . References [1] D.A. Tobon-Mejia, K. Medjaher, N, CNC machine tool’s wear diagnostic and prognostic by using dynamic bayesian networks, Mech. Syst. Sig. Process. 28 (2012) 167–182. [2] J.L. Bogdanoff, F. Kozin, Probabilistic Models of Cumulative Damage, John Wiley & Sons, 1985. [3] X. Yang, D. Lu, C. Ma, J. Zhang, W. Zhao, Analysis on the multi-dimensional spectrum of the thrust force for the linear motor feed drive system in machine tools, Mech. Syst. Sig. Process. 82 (2017) 68–79. [4] M.J. Kim, R. Jiang, V. Makis, C.G. Lee, Optimal Bayesian fault prediction scheme for a partially observable system subject to random failure, Eur. J. Oper. Res. 214 (2) (2011) 331–339. [5] F.D. Maio, K.L. Tsui, E. Zio, Combining relevance vector machines and exponential regression for bearing residual life estimation, Mech. Syst. Sig. Process. 31 (2012) 405–427. [6] S.J. Bae, W. Kuo, P.H. Kvam, Degradation models and implied lifetime distributions, Reliab. Eng. Syst. Saf. 92 (5) (2007) 601–608. [7] A. Lehmann, Joint modeling of degradation and failure time data, J. Stat. Plann. Inference 139 (5) (2009) 1693–1706. [8] M. Shafiee, M. Finkelstein, C. Berenguer, An opportunistic condition-based maintenance policy for offshore wind turbine blades subjected to degradation and environmental shocks, Reliab. Eng. Syst. Saf. 142 (2015) 463–471. [9] C. Duan, V. Makis, C. Deng, An integrated framework for health measures prediction and optimal maintenance policy for mechanical systems using a proportional hazards model, Mech. Syst. Sig. Process. 111 (2018) 285–302. [10] Y. Ye, C.B. Yin, Y. Gong, J. Zhou, Position control of nonlinear hydraulic system using an improved pso based pid controller, Mech. Syst. Sig. Process. 83 (2017) 241–259. [11] C.H. Lin, M.D. Hsiao, K.J. Lai, A study on using image serving technology for high precision mechanical positioning, Mech. Syst. Sig. Process. 81 (2016) 493–514. [12] Q. Liu, M. Dong, Y. Peng, A novel method for online health prognosis of equipment based on hidden semi-markov model using sequential monte carlo methods, Mech. Syst. Sig. Process. 32 (2012) 331–348. [13] Q. Liu, M. Dong, W. Lv, X. Geng, Y. Li, A novel method using adaptive hidden semi-markov model for multi-sensor monitoring equipment health prognosis, Mech. Syst. Sig. Process. 64 (2015) 217–232. [14] R. Jiang, System Availability Maximization and Residual Life Prediction Under Partial Observations PhD dissertation, University of Toronto, 2011. [15] R. Moghaddass, M.J. Zuo, An integrated framework for online diagnostic and prognostic health monitoring using a multistate deterioration process, Reliab. Eng. Syst. Saf. 124 (2014) 92–104. [16] V. Makis, Multivariate Bayesian control chart, Oper. Res. 56 (2) (2008) 487–496. [17] ISO 230-4, Test code for machine tools; Part 4: Circular tests for numerically controlled machine tools, Geneva, Switzerland, 2005. [18] G.C. Reinsel, Elements of Multivariate Time Series Analysis, Springer, New York, 1997. [19] J.B. Cromwell, Multivariate Tests for Time Series Models (No. 100), Sage Publications, 1994. [20] N. Henze, B. Zirkler, A class of invariant consistent tests for multivariate normality, Commun. Stat.-Theory Methods 19 (10) (1990) 3595–3617. [21] B.G. Quinn, Order determination for a multivariate autoregression, J. Roy. Stat. Soc.: Ser. B (Methodol.) (1980) 182–185. [22] E.J. Hannan, B.G. Quinn, The determination of the order of an autoregression, J. Roy. Stat. Soc.: Ser. B (Methodol.) (1979) 190–195.

306

C. Duan et al. / Mechanical Systems and Signal Processing 122 (2019) 290–306

[23] H. Akaike, Information Theory and an Extension of the Maximum Likelihood Principle, Selected Papers of Hirotugu Akaike, Springer, New York, 1998, pp. 199–213. [24] M.J. Kim, V. Makis, Joint optimization of sampling and control of partially observable failing systems, Oper. Res. 61 (3) (2013) 777–790. [25] A. Khaleghei, V. Makis, Model parameter estimation and residual life prediction for a partially observable failing system, Nav. Res. Logist. 62 (3) (2015) 190–205. [26] M.J. Kim, V. Makis, R. Jiang, Parameter estimation for partially observable systems subject to random failure, Appl. Stochastic Models Bus. Ind. 29 (3) (2013) 279–294. [27] A. Khaleghei, Modeling, Estimation, and Control of Partially Observable Failing Systems Using Phase Method PhD dissertation, University of Toronto, 2016. [28] S.B. Provost, E.M. Rudiuk, The exact distribution of indefinite quadratic forms in noncentral normal vectors, Ann. Inst. Stat. Math. 48 (2) (1996) 381– 394. [29] H.C. Tijms, Stochastic models: an algorithmic approach, John Wiley & Sons, New York, 1994.