
An LSSVR-based algorithm for online system condition prognostics

Jian Qu, Ming J. Zuo*

Department of Mechanical Engineering, University of Alberta, Edmonton, Alberta, Canada T6G 2G8

Expert Systems with Applications 39 (2012) 6089–6102

Keywords: Machine condition prognostics; Least square support vector machine; Cumulative sum technique; Genetic algorithms

Abstract: Online machine condition prognostics is useful for condition-based maintenance decision making, helping to prevent unexpected machine breakdowns, human injuries, and costs due to lost productivity. Noise present in the measured condition indicators that represent machine conditions may, however, adversely affect prognostic results. In the literature, machine condition prognostics that accounts for noisy observations of condition indicators is rarely reported. In this work, we propose an algorithm to predict the true values of condition indicators based on noisy observations. The proposed algorithm jointly uses least square support vector regression (LSSVR), genetic-algorithm-based optimization, and the cumulative sum (CUSUM) technique. LSSVR is used to predict the true values of condition indicators. To handle noise effects in observations, the parameters of LSSVR are selected by an optimization process whose model is specially developed. Genetic algorithms (GA) are used to solve the optimization problem. To accommodate changes in indicator values and noise effects, CUSUM is employed to trigger re-determination of the LSSVR parameters when the current predictions are no longer acceptable. The proposed algorithm is compared with five reported methods using two simulation datasets and two experimental datasets. © 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Prediction of the deterioration trend of machine condition, known as machine condition prognostics, is a useful way to make decisions for maintaining high system reliability and availability (Samanta & Nataraj, 2008). Machine condition is usually represented by a condition indicator. The condition indicator can be either directly observed from monitoring instruments or extracted from raw condition monitoring signals such as vibration signals and acoustic emission signals. In a machine condition prognostics problem, the observations of a condition indicator available up to the current time point are used to predict its value at a future time point of interest. As observations are updated, condition indicator values are successively predicted for future time points. This is essentially a time series prediction problem. Data-driven approaches can be used to address it (Jardine, Lin, & Banjevic, 2006). Methods reported in this realm include artificial neural networks (ANN), the adaptive neuro-fuzzy inference system (ANFIS), autoregressive moving average (ARMA) models, etc. Samanta and Nataraj (2008) adopted ANFIS and ANN for gearbox condition prognostics. Tse and Atherton (1999) predicted the amplitude of vibration signals for gearbox condition prognostics using recurrent neural networks.

* Corresponding author. Address: Department of Mechanical Engineering, University of Alberta, Edmonton, Alberta, Canada T6G 2G8. Tel.: +1 (780) 492 4466; fax: +1 (780) 492 2200. E-mail address: [email protected] (M.J. Zuo). doi:10.1016/j.eswa.2011.12.002

Yan, Koc, and Lee (2004) adopted ARMA to predict condition indicators for elevator condition prognostics. Support vector regression (SVR), which is based on VC theory (Vapnik, 1998), has recently emerged as an effective method for prediction problems. SVR is a mathematical framework that estimates a dependency by learning from finite samples. SVR simultaneously minimizes estimation errors and model complexity, which gives it good generalization ability and makes it less prone to over-fitting. Because of these properties, SVR has attracted increasing attention in machine condition prognostics in recent years. In reported studies, researchers usually consider prognostic problems with a single condition indicator, and noise effects in the observations are not considered (Chen, 2007; Ding et al., 2008; Hong & Pai, 2006; Jiang & Zuo, 2008; Pai, 2006; Samanta, 2009; Zhao, Chen, & Xu, 2009). Nevertheless, observations collected from the field are usually contaminated by noise due to various factors such as environmental conditions and instrument errors. In addition, observations in a time series may also show noisy patterns because of large variations of their values occurring within a short time interval. Intuitively, an SVR model obtained by learning from such noisy observations tends to give unreliable predictions whose values are far from the true values of the condition indicator. Hence, there is a need for a method to predict the true values of the condition indicator based on noisy observations. This is a challenging problem, because it aims to predict the trend of the condition indicator for future time points rather than extract a trend for past time points, which is a widely studied problem known as trend extraction (Charbonnier, Garcia-Beltan, Cadet, & Gentil, 2005).


SVR governs the learning process using pre-specified model parameters. This gives SVR the flexibility to seek different expected results through different selections of these parameters. Two kinds of methods, analytical methods and optimization-based methods, have been reported for selecting SVR parameters in consideration of noise effects. Analytical methods provide mathematical expressions that directly compute the SVR parameters. In the literature, it is acknowledged that one parameter of SVR, the parameter ε, should be selected considering the noise level (standard deviation of the noise). Cherkassky and Ma (2004) formulated an expression to select the parameter ε in terms of noise level and sample size. Chalimourda, Schölkopf, and Smola (2004) and Schölkopf and Smola (2002) stated that the optimal ε is proportional to the noise level with a factor of 0.612. Kowk and Tsang (2003) derived an expression indicating that the optimal parameter ε is proportional to the noise level with a factor of 0.9846. Huber (1964) proposed a loss function, known as the least-modulus (LM) loss, which practically suggests a zero value for the parameter ε. Analytical methods are, however, not case-specific, so they do not ensure good results for a specific problem. Optimization-based methods select SVR parameters through an optimization process. In such methods, the SVR model parameters are treated as decision variables and the objective is to minimize a measure that evaluates the errors between actual observations and the corresponding predictions. The normalized root mean square error (NRMSE) is usually used as the measure. Cross validation is also incorporated into the optimization model in order to avoid over-fitting caused by noise (Chen, 2007). Chen (2007) and Pai (2006) predicted a condition indicator of system reliability using SVR, where cross validation was incorporated into a GA-based optimization process for obtaining the SVR parameters. These optimization-based methods are case-specific, but there is no term directly relevant to noise effects. Recently, Qu and Zuo (2010) proposed an optimization model that addresses noise effects in its fitness function for SVR-based machine condition prognostics. In these reported studies, the SVR parameters are determined only once, based on observations at the beginning of the monitoring period, which correspond to the initial machine condition. Since observation values may change significantly over the whole monitoring period, parameters selected for the initial machine condition are insufficient. In addition, another type of SVR, called least square SVR (LSSVR), is reported to be effective in predicting true values in cases where observations are subject to Gaussian noise. Sun, Huang, and Fang (2005) used LSSVR to extract the trend of a noisy lidar signal. Suykens, Brabanter, Lukas, and Vandewalle (2002) proposed a weighted LSSVR method for data prediction. These studies are, however, related to trend extraction rather than trend prediction. In this work, we propose an algorithm for machine condition prognostics that addresses the challenge of trend prediction based on noisy data. The proposed algorithm jointly uses LSSVR, genetic-algorithm-based optimization, and the cumulative sum (CUSUM) technique. LSSVR is used for prediction. The LSSVR parameters are determined by an optimization model proposed to directly handle noise effects. Genetic algorithms (GA) are used to solve this optimization problem.
CUSUM is used to trace deviations of the predictions from the true values and to trigger a re-determination of the LSSVR parameters. The proposed algorithm is compared with three analytical methods and two optimization-based methods using four datasets: two simulation datasets, one slurry pump dataset, and one planetary gearbox dataset. Section 2 introduces the basics of SVR and LSSVR. Section 3 presents the rationale of the proposed algorithm. Section 4 describes the test preliminaries, Section 5 presents the results on the four datasets, and Section 6 draws conclusions.

2. Support vector regression

An SVR model is obtained by minimizing the regression risk of SVR, denoted as $R_{SVR}$:

$R_{SVR} = R_{STR} + R_{EMR}, \quad (1)$

where $R_{STR}$ represents the structural risk (model complexity), and $R_{EMR}$ represents the empirical risk (estimation error) assessed by a loss function. The type of SVR depends on the loss function used. The standard SVR employs an ε-insensitive loss function, which constructs the resulting regression model from selected samples known as support vectors. Another SVR, called least square SVR (LSSVR), employs a squared loss function and constructs the resulting regression model using all samples. One can refer to Schölkopf and Smola (2002) for more details of SVR loss functions. The loss function of LSSVR corresponds to a Gaussian density model; hence, it has superior performance for data with Gaussian additive noise (Suykens et al., 2002). We introduce the basics of these two SVR models in this section, because LSSVR is a building block of the proposed algorithm and the standard SVR is used as a prediction method for comparisons.

2.1. Standard SVR (ε-insensitive SVR)

Assume that a given set of data $\{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)\} \subset \chi \times \mathbb{R}$ is generated by an underlying functional dependency plus additive noise, $y_i = f(\mathbf{x}_i) + \eta_i$. If it can be further assumed that the data are linearly describable, the following expression exists:

$f(\mathbf{w}, \mathbf{x}) = \mathbf{w}^T\mathbf{x} + b, \quad \mathbf{w} \in \chi, \; b \in \mathbb{R}, \quad (2)$

where $\mathbf{w}$ denotes a weight vector and $b$ is a scalar. In SVR theory, the norm of $\mathbf{w}$, $\|\mathbf{w}\|^2 = \mathbf{w}^T\mathbf{w}$, is used to estimate the $R_{STR}$ term in Eq. (1). It is reported in Vapnik (1998) that the smaller the norm value, the lower the complexity of the SVR model, and a low model complexity corresponds to a flat shape of $f(\mathbf{x})$. The standard SVR adopts an ε-insensitive loss function to determine the tolerated errors between actual and predicted outputs, which is formulated as:

$|y - f(\mathbf{w}, \mathbf{x})|_{\varepsilon} = \begin{cases} 0, & \text{if } |y - f(\mathbf{w}, \mathbf{x})| \le \varepsilon, \\ |y - f(\mathbf{w}, \mathbf{x})| - \varepsilon, & \text{otherwise}. \end{cases} \quad (3)$

This loss function indicates that one is not concerned about the absolute values of errors smaller than ε (ε > 0), but has to care about those larger than ε. It means that SVR only considers data points lying outside the ε-insensitive zone. Using the ε-insensitive loss function, a standard SVR model can be formulated as:

$\text{Minimize } \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*), \quad (4)$

$\text{Subject to } \begin{cases} y_i - \mathbf{w}^T\mathbf{x}_i - b \le \varepsilon + \xi_i, \\ \mathbf{w}^T\mathbf{x}_i + b - y_i \le \varepsilon + \xi_i^*, \\ \xi_i, \, \xi_i^* \ge 0, \end{cases} \quad (5)$

where the slack variables $\xi_i$ and $\xi_i^*$ ($\xi_i \ge 0$, $\xi_i^* \ge 0$, $i = 1, 2, \ldots, n$) are adopted to measure deviations outside the ε-insensitive zone, and the parameter C is a positive constant called the regularization parameter, which determines a compromise between the model complexity and the amount up to which deviations larger than ε are tolerated. The parameters C and ε need to be pre-specified, for which analytical methods and optimization-based methods are reported in the literature, as mentioned in Section 1. In SVR theory, Eqs. (4) and (5) are solved using the Lagrange multiplier method:


$\text{Maximize } L(\mathbf{w}, b, \xi, \xi^*) = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*) - \sum_{i=1}^{n}\alpha_i(\varepsilon + \xi_i - y_i + \mathbf{w}^T\mathbf{x}_i + b) - \sum_{i=1}^{n}\alpha_i^*(\varepsilon + \xi_i^* + y_i - \mathbf{w}^T\mathbf{x}_i - b) - \sum_{i=1}^{n}(\lambda_i\xi_i + \lambda_i^*\xi_i^*), \quad (6)$

where $\alpha_i$, $\alpha_i^*$, $\lambda_i$, and $\lambda_i^*$ are Lagrange multipliers subject to non-negativity requirements. The conditions of optimality can be obtained by taking the partial derivatives with respect to the variables $\mathbf{w}$, $b$, $\xi_i$ and $\xi_i^*$:

$\begin{cases} \partial L/\partial \mathbf{w} = 0 \;\rightarrow\; \mathbf{w} = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\mathbf{x}_i, \\ \partial L/\partial b = 0 \;\rightarrow\; \sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0, \\ \partial L/\partial \xi_i = 0 \;\rightarrow\; C = \alpha_i + \lambda_i, \quad i = 1, 2, \ldots, n, \\ \partial L/\partial \xi_i^* = 0 \;\rightarrow\; C = \alpha_i^* + \lambda_i^*, \quad i = 1, 2, \ldots, n. \end{cases} \quad (7)$

Incorporating Eq. (7) into Eq. (6) makes $\mathbf{w}$, $b$, $\xi_i$, $\xi_i^*$, $\lambda_i$, and $\lambda_i^*$ vanish and yields a quadratic optimization problem with respect to $\alpha_i$ and $\alpha_i^*$:

$\text{Maximize } -\frac{1}{2}\sum_{i,j=1}^{n}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\mathbf{x}_i^T\mathbf{x}_j - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{n}y_i(\alpha_i - \alpha_i^*), \quad (8)$

$\text{Subject to } \sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0 \;\text{ and }\; \alpha_i, \alpha_i^* \in [0, C]. \quad (9)$

From the first line of Eq. (7), we have $\mathbf{w}$ as an expression of $\alpha_i$ and $\alpha_i^*$. Therefore, Eq. (2) can be rewritten as:

$f(\mathbf{x}) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\mathbf{x}_i^T\mathbf{x} + b, \quad (10)$

where the term $b$ can be obtained based on the Karush–Kuhn–Tucker (KKT) conditions, under which the product of dual variables and constraints at the optimal solution has to vanish:

$\begin{cases} \alpha_i(\varepsilon + \xi_i - y_i + \mathbf{w}^T\mathbf{x}_i + b) = 0, \\ \alpha_i^*(\varepsilon + \xi_i^* + y_i - \mathbf{w}^T\mathbf{x}_i - b) = 0, \\ (C - \alpha_i)\xi_i = 0, \\ (C - \alpha_i^*)\xi_i^* = 0. \end{cases} \quad (11)$

It can be seen that only data points corresponding to $\alpha_i = C$ or $\alpha_i^* = C$ lie outside the ε-insensitive zone and yield a non-zero coefficient $\mathbf{w}$. Such data points are referred to as support vectors. As $\alpha_i$ and $\alpha_i^*$ cannot be non-zero at the same time, this yields the solution for the term $b$:

$\begin{cases} b = y_i - \mathbf{w}^T\mathbf{x}_i - \varepsilon, & \text{for } 0 < \alpha_i < C, \\ b = y_i - \mathbf{w}^T\mathbf{x}_i + \varepsilon, & \text{for } 0 < \alpha_i^* < C. \end{cases} \quad (12)$

Eq. (10) is derived under the assumption that the data can be linearly represented; if that is not the case, Eq. (10) is no longer applicable. In SVR theory, one introduces a mapping function $\varphi(\cdot)$ to project the original inputs into a high-dimensional feature space in which the data can be linearly represented, and Eq. (10) becomes applicable again. Accordingly, the regression model is formulated as:

$f(\mathbf{x}) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\varphi(\mathbf{x}_i)^T\varphi(\mathbf{x}) + b. \quad (13)$

The function $\varphi(\cdot)$ is usually difficult to determine. Fortunately, Eq. (13) reveals that only an inner product of $\varphi(\cdot)$ is needed in computations. In SVR theory, this is addressed using a kernel function:

$K(\mathbf{x}_k, \mathbf{x}_j) = \varphi(\mathbf{x}_k)^T\varphi(\mathbf{x}_j), \quad k, j = 1, \ldots, n. \quad (14)$

Any function that satisfies Mercer's condition can be treated as a kernel function. One can refer to Schölkopf and Smola (2002) for more information about kernel functions. A kernel function and its parameters need to be pre-specified for an SVR model. Incorporating the kernel function $K(\mathbf{x}_i, \mathbf{x})$, Eq. (13) can be rewritten as:

$f(\mathbf{x}) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)K(\mathbf{x}_i, \mathbf{x}) + b. \quad (15)$

2.2. Least square SVR (LSSVR)

Compared with the standard SVR, LSSVR is based on a squared loss function and is optimal for the prediction of data with Gaussian additive noise (Suykens et al., 2002). The LSSVR model is formulated as:

$\text{Minimize } J(\mathbf{w}, e) = \frac{1}{2}\|\mathbf{w}\|^2 + \frac{1}{2}C\sum_{i=1}^{n}e_i^2,$
$\text{Subject to } y_i = \mathbf{w}^T\varphi(\mathbf{x}_i) + b + e_i \quad (i = 1, 2, \ldots, n), \quad (16)$

where the mapping $\varphi(\cdot)$ is directly adopted to allow LSSVR to work in a linear space, similar to the standard SVR. The parameter C is the regularization parameter, which needs to be pre-specified. Eq. (16) has equality constraints resulting from the squared loss function. Using the Lagrange multiplier method for Eq. (16) yields an unconstrained optimization problem:

$\text{Maximize } L(\mathbf{w}, b, e, \alpha) = J(\mathbf{w}, e) - \sum_{i=1}^{n}\alpha_i(\mathbf{w}^T\varphi(\mathbf{x}_i) + b + e_i - y_i). \quad (17)$

The conditions for optimality are thus given by:

$\begin{cases} \partial L/\partial \mathbf{w} = 0 \;\rightarrow\; \mathbf{w} = \sum_{i=1}^{n}\alpha_i\varphi(\mathbf{x}_i), \\ \partial L/\partial b = 0 \;\rightarrow\; \sum_{i=1}^{n}\alpha_i = 0, \\ \partial L/\partial e_i = 0 \;\rightarrow\; \alpha_i = Ce_i, \quad i = 1, 2, \ldots, n, \\ \partial L/\partial \alpha_i = 0 \;\rightarrow\; \mathbf{w}^T\varphi(\mathbf{x}_i) + b + e_i - y_i = 0, \quad i = 1, 2, \ldots, n. \end{cases} \quad (18)$

These conditions are similar to those of the standard SVR except for the condition $\alpha_i = Ce_i$. It can be seen that $\alpha_i$ depends on $e_i$, which is usually non-zero for all data points, so for LSSVR all data points are involved in constructing the regression model; this is different from the standard SVR. By incorporating the first three lines of Eq. (18) into the fourth line, $\mathbf{w}$ and $e$ vanish. One obtains equations containing only $\alpha$ and $b$. For simplicity, these equations can be expressed in the form of a matrix equation:

$\begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & \Omega + C^{-1}I \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{y} \end{bmatrix}, \quad (19)$

where $\Omega_{k,j} = \varphi(\mathbf{x}_k)^T\varphi(\mathbf{x}_j)$, $k, j = 1, \ldots, n$, $\mathbf{y} = [y_1; \ldots; y_n]$, $\alpha = [\alpha_1; \ldots; \alpha_n]$, and $\mathbf{1} = [1; \ldots; 1]$. The matrix $\Omega + C^{-1}I$ is symmetric and positive definite, so its inverse exists. The solution of Eq. (19) can thus be obtained:


$\begin{cases} \alpha = (\Omega + C^{-1}I)^{-1}(\mathbf{y} - b\mathbf{1}), \\ b = \dfrac{\mathbf{1}^T(\Omega + C^{-1}I)^{-1}\mathbf{y}}{\mathbf{1}^T(\Omega + C^{-1}I)^{-1}\mathbf{1}}. \end{cases} \quad (20)$

In the theory, one adopts a kernel function to address the inner product $\Omega$, exactly as used in Eq. (14). Therefore, the LSSVR model can be given by:

$f(\mathbf{x}) = \sum_{i=1}^{n}\alpha_i K(\mathbf{x}_i, \mathbf{x}) + b. \quad (21)$
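The linear system of Eqs. (19)–(20) and the predictor of Eq. (21) translate directly into code. The following is a minimal sketch, assuming NumPy and an RBF kernel K(x, z) = exp(−‖x − z‖²/kp²) (the paper does not fix the kernel at this point, and the function names are illustrative):

```python
import numpy as np

def rbf_kernel(X1, X2, kp):
    """RBF kernel matrix K(x, z) = exp(-||x - z||^2 / kp^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / kp ** 2)

def lssvr_fit(X, y, C, kp):
    """Solve the LSSVR linear system of Eq. (19)/(20) for alpha and b."""
    n = len(y)
    Omega = rbf_kernel(X, X, kp)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # first row: [0, 1^T]
    A[1:, 0] = 1.0                      # first column: [0; 1]
    A[1:, 1:] = Omega + np.eye(n) / C   # Omega + C^{-1} I
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]
    return alpha, b

def lssvr_predict(X_train, alpha, b, kp, X_new):
    """Evaluate Eq. (21): f(x) = sum_i alpha_i K(x_i, x) + b."""
    return rbf_kernel(X_new, X_train, kp) @ alpha + b
```

Solving the dense (n + 1) × (n + 1) system is the main cost, which is why the amount of training data b used per prediction (Section 3.3) stays moderate.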

3. The proposed algorithm for machine condition prognostics

3.1. Assumptions

Problems of machine condition prognostics are diversified. In this work, we focus on addressing problems subject to the following assumptions.

(1) One condition indicator is always available for use. It can be directly observed from instruments or extracted from condition monitoring (CM) data, e.g. vibration data or acoustic emission data.
(2) Observations of the condition indicator are not noise-free. The noise is due to various independent random factors. It is known from the central limit theorem that, irrespective of the distribution that each random factor follows, their combination can be reasonably approximated by a Gaussian distribution with mean zero and a certain standard deviation.
(3) Observations of the condition indicator form a time series subject to a deterioration trend; however, the trend need not be strictly monotonic.
(4) The machine of interest can work properly for a sufficient time span after starting operation. This ensures that a certain amount of observations is available for utilization.
(5) A threshold is available for the boundary of machine deterioration. The setup of the threshold implies that once the prediction of the indicator exceeds the threshold, a fault of an unacceptable level or a failure is detected, and the machine has to be shut down for maintenance immediately.

3.2. Modeling observations

Given a time series of condition indicator values, $Y_t = [y_1, \ldots, y_t]$, where $y_t$ is an observation at time $t$, and based on assumption (2), the observations can be modeled as:

$y_t = y_{t(true)} + e_t, \quad (22)$

where $e_t$ represents noise following a normal distribution with mean zero and standard deviation $\sigma_{t(noise)}$. $\sigma_{t(noise)}$ may change over time, but a particular $\sigma_{t(noise)}$ will remain stable over a certain time span. $y_{t(true)}$ represents the true value of the condition indicator at time $t$.

3.3. Prediction using LSSVR

For illustration purposes, we express the process of LSSVR-based prediction for the time point a steps ahead of time t as:

$\hat{y}_{t+a} = H(Y_{bt}, p_{LSSVR}), \quad (23)$

where $\hat{y}_{t+a}$ represents the prediction a steps ahead of time t, $p_{LSSVR}$ represents the LSSVR parameters, including the parameter C, the kernel function $k_f$, and the kernel parameter $k_p$, and $Y_{bt}$ represents the time series $[y_{t-b}, \ldots, y_t]$, where $y_{t-b}$ is the observation b steps behind time t. If a = 1, the prediction is a one-step forecast, while when a > 1 the prediction is a multi-step forecast. Although multi-step forecasting may capture some system dynamics, its performance is poor due to the accumulation of errors (Chen, 2007). For this reason, we consider a = 1 in this paper. Eq. (23) can be interpreted as predicting $\hat{y}_{t+a}$ using the LSSVR predictor H(·) based on the observations between time points t − b and t. A sketch of this one-step prediction is given after Fig. 1 below.

3.4. The proposed algorithm

The proposed prognostic algorithm aims to predict the true values of the condition indicator for a series of future monitoring intervals based on updated observations, until a prediction exceeds a specified threshold representing an unacceptable working condition. We develop an optimization model for the proposed algorithm in order to select appropriate LSSVR parameters with which true values can be predicted based on noisy observations. In order to accommodate variations of the observation values and variations of the noise magnitude, the cumulative sum (CUSUM) technique is employed to detect the variations and accordingly trigger re-determination of the LSSVR parameters. Fig. 1 shows the procedure of the proposed algorithm. For a succinct presentation, in the following we may jointly introduce several blocks of Fig. 1 in a single step.

Step 1: Collect CM data, compute the condition indicator and store observations. The proposed algorithm keeps collecting observations of the condition indicator in a dataset until a specified number N of observations are available. This is subject to assumption (4), which enables sufficient data to be used in the following steps. Fig. 1 considers the situation where the condition indicator is not available from a direct measurement but can be extracted from the collected CM data.

Step 2: Check data sufficiency. The relationship between t and N is checked, where t is the time label starting from 1. If t < N, observations continue to be gathered and stored in the dataset. If t = N, the proposed algorithm conducts the determination of the LSSVR parameters for the first time in the next step. If t > N, the observations are directly used to train the LSSVR model for prediction. The observations in the dataset are transferred through this step without any loss.

Fig. 1. Flow chart of the proposed algorithm. [Flow chart: at each time t, collect CM data and compute the condition indicator; store observations up to time t; if t = N, determine the LSSVR model parameters using GA and establish the LSSVR model using the optimal parameters; predict the condition indicator at time t + 1 using the established LSSVR model; if the CUSUM criterion is not fulfilled, re-determine the LSSVR parameters; otherwise trend the machine condition using the prediction at time t + 1; stop if the prediction exceeds the threshold, else set t = t + 1 and repeat.]
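As referenced in Section 3.3, the one-step forecast of Eq. (23) can be sketched as follows, reusing the hypothetical lssvr_fit/lssvr_predict helpers from the earlier sketch. The paper does not spell out how the window Y_bt is turned into regression samples; a lag-embedding, assumed here, is one common choice:

```python
import numpy as np

def one_step_forecast(y_window, C, kp, lag=5):
    """Sketch of Eq. (23) with a = 1: predict y_{t+1} from [y_{t-b}, ..., y_t].

    Each target y_k in the window is regressed on the previous `lag`
    observations (an assumption; `lag` is an illustrative parameter).
    """
    y = np.asarray(y_window, dtype=float)
    X = np.array([y[k - lag:k] for k in range(lag, len(y))])  # lagged inputs
    targets = y[lag:]
    alpha, b = lssvr_fit(X, targets, C, kp)     # from the earlier LSSVR sketch
    x_next = y[-lag:].reshape(1, -1)            # the last `lag` observations
    return lssvr_predict(X, alpha, b, kp, x_next)[0]
```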

Fig. 2. Procedure of determining LSSVR model parameters. [Flow chart: generate the individuals of the initial population; code C and kp for each individual; train the LSSVR model using the available observations; calculate the fitness value; if the stopping criterion is not fulfilled, apply selection, crossover, and mutation to obtain the winner individuals for the next generation; otherwise return the optimal values of C and kp.]

Step 3: Determine LSSVR parameters. This is an optimization process that is triggered for the first time when the condition t = N is satisfied, or whenever the CUSUM criterion (Step 5) is not satisfied. We solve this optimization problem using GA, and this GA-based optimization is implemented through the MATLAB GA toolbox. Fig. 2 shows the procedure of its implementation, and we describe it step by step in the following. For detailed information on each sub-step, one can refer to the Help documentation of MATLAB.

Step 3-1: Coding. Two parameters, C and kp, of the LSSVR model are the decision variables of the optimization problem. We adopt a real-value coding strategy for them, which means these two parameters are real numbers during the GA optimization.

Step 3-2: Generate the initial population. An individual (chromosome) is composed of the two parameters in the form [C, kp]. A population of 100 individuals is generated at random for the initial population.

Step 3-3: Train the LSSVR model. The LSSVR model with the parameters carried by each individual is trained based on the available observations.

Step 3-4: Calculate the fitness value. A fitness value based on the training data, such as the NRMSE, is easy to compute; however, it tends to cause over-fitting due to noise. We propose a fitness function to address this problem. The fitness function is designed to compel the difference between observations and corresponding predictions to be as close to the noise as possible. The mathematical expression of the fitness function and the constraints it is subject to are given as:

$\text{Minimize } \left|\sqrt{\frac{1}{l}\sum_{i=t-l+1}^{t}\Big[\Delta y_i - \frac{1}{l}\sum_{j=t-l+1}^{t}\Delta y_j\Big]^2} - \hat{\sigma}\,\right| + \left|\frac{1}{l}\sum_{j=t-l+1}^{t}\Delta y_j - 0\right|, \quad (24)$

$\text{Subject to } 0 < C \le C_{lim}, \quad \Delta y_i = y_i - \hat{y}_i, \quad \hat{y}_{i+1} = H(Y_{bi}; C, k_f, k_p),$

where l represents the number of predictions needed for the computation, $C_{lim}$ represents the upper bound of the parameter C, $\hat{\sigma}$ represents the estimate of the noise level, and $\Delta y$ represents the difference between the observations and the corresponding predictions. Based on Eq. (22), if $\Delta y_i$ is equal to the noise term $e_i$, the true values $y_{i(true)}$ are obtained as the predictions. Gaussian noise can be described by its mean and standard deviation, so we build the fitness function to drive $\Delta y_i$ towards the same mean and standard deviation as $e_i$. In this way, the LSSVR predictions can reach or approach the true values. In Eq. (24), the term inside the first absolute value sign is the difference between the standard deviation of $\Delta y_i$ and the estimated standard deviation of the noise. The estimated standard deviation of the noise is obtained using an estimation method presented below. The term inside the second absolute value sign is the difference between the mean of $\Delta y_i$ and the ideal mean of the noise, zero. We expect a scenario where the fitness function returns a zero value, which yields $\hat{y}_i = y_{i(true)}$. Nevertheless, this scenario is unlikely to happen due to the randomness of the noise. For this reason, we desire $\hat{y}_i$ to be as close to $y_{i(true)}$ as possible. The effectiveness of Eq. (24) for SVR-based prediction has been studied in our preliminary work (Qu & Zuo, 2010), so it is directly used in our proposed algorithm without further demonstration. The estimated standard deviation of the noise is obtained using k-nearest-neighbors regression (Cherkassky & Ma, 2004):

$\hat{\sigma} = \sqrt{\frac{n^{1/5}k}{n^{1/5}k - 1}\cdot\frac{1}{n}\sum_{i=1}^{n}(z_i - \hat{z}_i)^2}, \quad (25)$
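A minimal sketch of Eqs. (24) and (25) is given below, assuming NumPy. The choice of neighbourhood in the k-nearest-neighbour estimate (nearest points in time) is an assumption, since the paper does not state how the neighbours of each data point are selected:

```python
import numpy as np

def knn_noise_std(z, k=3):
    """Sketch of Eq. (25): k-nearest-neighbour noise level estimate
    (Cherkassky & Ma, 2004). Each z_i is estimated by the mean of its
    k nearest neighbours in time (an assumption)."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    z_hat = np.empty(n)
    for i in range(n):
        # indices of the k closest time points to i, excluding i itself
        idx = np.argsort(np.abs(np.arange(n) - i))[1:k + 1]
        z_hat[i] = z[idx].mean()
    factor = (n ** 0.2 * k) / (n ** 0.2 * k - 1)
    return np.sqrt(factor * np.mean((z - z_hat) ** 2))

def fitness(dy, noise_std):
    """Eq. (24): |std(dy) - sigma_hat| + |mean(dy)|, where dy holds the
    differences y_i - y_hat_i over the last l one-step predictions."""
    dy = np.asarray(dy, dtype=float)
    return abs(dy.std() - noise_std) + abs(dy.mean())
```

In the GA of Step 3, each candidate [C, kp] would be scored by producing the last l one-step predictions with those parameters and passing the resulting differences to fitness().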

where $\hat{z}_i$ represents the estimate of the data point $z_i$ obtained by k-nearest-neighbors regression, k is the number of nearest data points used to estimate $z_i$, and n is the sample size of the whole dataset.

Step 3-5: Selection. A stochastic uniform method is employed to select individuals. This is the default option of MATLAB.

Step 3-6: Crossover. An arithmetic crossover is employed, which creates children (new individuals) that are the weighted arithmetic mean of two parents (old individuals). The weight is 0.8 for the individual with the greater fitness value and 0.2 for the other individual.

Step 3-7: Mutation. A uniform mutation is employed to provide the necessary diversity of the population. The mutation fraction is set to 0.1.

Step 3-8: Stopping criterion. The optimization process stops if there is no improvement in the objective function for 10 generations.

Step 4: Establish the LSSVR model and do prediction. The LSSVR model is established using the optimal parameters returned by Step 3. This model is then used for prediction as shown in Eq. (23).

Step 5: Fulfill the CUSUM criterion. CUSUM is adopted to determine the adequacy of the current LSSVR model and to trigger, if needed, the optimization process (Step 3) to re-determine the optimal LSSVR parameters based on the updated observations. Suppose that a random variable Z is drawn from a normal distribution with mean $\mu$ and standard deviation $\sigma$. CUSUM detects the deviation of a certain observation from the mean by:

$d_i = \frac{z_i - \mu}{\sigma}, \quad (26)$

where $d_i$ represents the multiple of $\sigma$ by which a certain observation $z_i$ deviates from $\mu$. For a large sample size, the two statistical parameters, $\mu$ and $\sigma$, can be replaced by the sample mean and sample standard deviation.

Fig. 3. CUSUM criterion. [Flow chart: compute μ, σ, and d at the current time t; compute UB and LB at the current time t; if either UB or LB exceeds the threshold, reset UB = 0 and LB = 0.]
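The CUSUM check of Fig. 3 can be sketched as follows; the update rule is formalized in Eqs. (27)–(29) below, and the sketch simply standardizes the prediction errors and accumulates the two sums:

```python
import numpy as np

def cusum_triggered(dy, m=0.5, h=3.0):
    """Sketch of the CUSUM criterion (Eqs. (26)-(27) below).

    dy holds the differences y_i - y_hat_i accumulated since the last
    parameter determination; returns True if either sum exceeds h,
    i.e. the LSSVR parameters should be re-determined."""
    dy = np.asarray(dy, dtype=float)
    mu, sigma = dy.mean(), dy.std()
    ub = lb = 0.0
    for d in (dy - mu) / sigma:       # Eq. (26)
        ub = max(0.0, (d - m) + ub)   # upper sum, Eq. (27)
        lb = max(0.0, (-d - m) + lb)  # lower sum, Eq. (27)
        if ub > h or lb > h:
            return True
    return False
```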

CUSUM adopts two sums to detect an unallowable deviation, which are given by Ryan (1989):

$UB_i = \max[0, (d_i - m) + UB_{i-1}], \quad LB_i = \max[0, (-d_i - m) + LB_{i-1}], \quad (27)$

where $UB_i$ is for detecting positive deviations and $LB_i$ is for detecting negative deviations. The values of $UB_0$ and $LB_0$ are zero. The value of m is usually selected to be 0.5, which is appropriate for detecting a 1-sigma deviation. There is a threshold, h, such that when it is exceeded by either sum, an unallowable deviation is detected. The threshold h is usually selected to be 3, as suggested in Ryan (1989). We expect the predictions of LSSVR to be as close to the true values of the condition indicator as possible. As a result, the differences between the observations, $y_i$, and the corresponding predictions, $\hat{y}_i$, are normally distributed with mean zero and a certain standard deviation. CUSUM can therefore be used to control the differences $y_i - \hat{y}_i$. Fig. 3 shows the CUSUM criterion, in which $\mu$ and $\sigma$ of the differences up to time t are estimated by the sample mean and sample standard deviation:

$\mu = \frac{1}{t - t_0 + 1}\sum_{i=t_0}^{t}(y_i - \hat{y}_i), \quad (28)$

$\sigma = \sqrt{\frac{1}{t - t_0}\sum_{i=t_0}^{t}(y_i - \hat{y}_i - \mu)^2}, \quad (29)$


where $t_0$ represents the time point at which CUSUM begins to operate. Once either of the two sums exceeds the threshold, the sum values are reset to zero.

Step 6: Trend prediction. If the CUSUM criterion is fulfilled, the prediction at time t + 1 is accepted to represent the machine condition.

Step 7: Check threshold. According to assumption (5), if the prediction from Step 5 is greater than the threshold, the proposed algorithm is stopped; otherwise, Steps 1 to 5 are repeated.

4. Test preliminaries

The proposed algorithm is examined using two simulation datasets and two experimental datasets. The portion of Step 7 that checks whether a prediction exceeds the threshold is not included in this study, as that portion is for decision making, which is not our focus in this paper.

4.1. Methods for comparisons

The following five methods are adopted for comparison with the proposed algorithm.

Method 1: The analytical method proposed in Cherkassky and Ma (2004), where $C = \max(|\bar{y} + 3\sigma_y|, |\bar{y} - 3\sigma_y|)$ and $\varepsilon = 3\sigma\sqrt{\ln(n)/n}$.

Method 2: The analytical method proposed in Kowk and Tsang (2003), where $\varepsilon = 0.9846\,\sigma$. No method for the parameter C is reported in Kowk and Tsang (2003), so the expression $C = \max(|\bar{y} + 3\sigma_y|, |\bar{y} - 3\sigma_y|)$ proposed in Cherkassky and Ma (2004) is used.

Method 3: The analytical method proposed in Chalimourda et al. (2004), where $C = n\,y_{max}$ and $\varepsilon = 0.612\,\sigma$.

Method 4: The GA-based optimization method proposed in Chen (2007), where the SVR parameters C, ε and kp are the decision variables of the optimization. The implementation procedure is similar to that shown in Fig. 2, except that the fitness function in Step 3-4 is the minimization of the NRMSE obtained from 5-fold cross validation, the same as used in Chen (2007).

Method 5: Similar to Method 4, but SVR is replaced with LSSVR, as LSSVR is optimal for the prediction of data with Gaussian noise (Suykens et al., 2002). The LSSVR parameters C and kp are the decision variables.

The notations used in the above methods are defined as follows: $\bar{y}$ represents the mean of the training outputs, $\sigma_y$ represents the standard deviation of the training outputs, n represents the number of training data, $y_{max}$ represents the maximal value of the training outputs, $\sigma$ represents the standard deviation of the noise, and kp represents the parameter of the kernel function.

4.2. Measures and criterion

4.2.1. Relative noise level (RNL)

A measure called the relative noise level (RNL) (Qu & Zuo, 2010) is used to represent noise effects on the observations of the condition indicator. RNL is defined as the ratio of the noise level to the standard deviation of the observations. RNL values approximately range from zero to one, so noise effects on the observations can be easily perceived through it. It is straightforward that the higher the RNL value, the greater the noise effects.

4.2.2. Normalized root mean square error (NRMSE)

Tests using simulation datasets are necessary, since the true values of the condition indicator are available for simulation data. We evaluate the five methods and the proposed algorithm using the NRMSE, which is given by:

$\text{NRMSE} = \sqrt{\sum_{i=1}^{n}(y_{i(true)} - \hat{y}_i)^2 \Big/ \sum_{i=1}^{n}y_{i(true)}^2}, \quad (30)$

where $y_{i(true)}$ and $\hat{y}_i$ represent the true values (known for simulation data) and the corresponding predictions, respectively. The smaller the NRMSE value, the better the method performs.

4.2.3. Criterion for evaluation

Using the NRMSE measure alone is not enough to accurately evaluate the performance of different methods, because a small NRMSE value may correspond to a trend with unacceptable fluctuations that still shows a noisy pattern. For this reason, visual comparison of a predicted trend with the corresponding trend of true values is necessary. The criterion is that a predicted trend is regarded as superior to its counterpart if it is smoother, given that its NRMSE value is comparable to that of its counterpart. Furthermore, we prefer a smooth trend with a slightly greater NRMSE value to a noisy trend with a smaller NRMSE value, because the former carries less uncertainty and is easier to use in decision making.
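The two measures of Section 4.2 are simple to compute; a minimal sketch, assuming NumPy, is:

```python
import numpy as np

def rnl(observations, noise_std):
    """Relative noise level (Section 4.2.1): noise level divided by the
    standard deviation of the observations."""
    return noise_std / np.std(observations)

def nrmse(y_true, y_pred):
    """Eq. (30): sqrt( sum((y_true - y_pred)^2) / sum(y_true^2) )."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.sum((y_true - y_pred) ** 2) / np.sum(y_true ** 2))
```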

J. Qu, M.J. Zuo / Expert Systems with Applications 39 (2012) 6089–6102

For the simulation datasets, the NRMSE value and a plot of the trend are both given; for the experimental datasets, only a plot of the trend is given, since the true values of the condition indicators are not available in this situation.

4.3. Datasets

4.3.1. Simulation dataset #1

Machine condition deterioration usually corresponds to a monotonic trend of condition indicator values. We simulate the observations in this situation by the following expression:

$y(t) = 10 + 10^{-3}\exp(t) + c \cdot e(t), \quad (31)$

where t is an integer representing the time label, e(t) represents additive noise following a normal distribution with mean zero and a standard deviation of one, and c is a parameter for setting the noise level. For simulation dataset #1, we generated six groups of data based on the RNL values 0.2, 0.3, 0.4, 0.5, 0.6, and 0.7. The data length is 400 for each RNL value.

4.3.2. Simulation dataset #2

In practice, a degradation trend may contain fluctuations due to seasonal components. We simulate the observations in this situation by the following expression:

$y(t) = 10 + \sin(2\pi f t) + 10^{-3}\exp(t) + c \cdot e(t), \quad (32)$

where f is the sampling frequency and is equal to 500. The other parameters have the same meanings as those of simulation dataset #1. We generated six groups of data in the same way as for simulation dataset #1. The data length is 1000 for each RNL value in this dataset.
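The noise scale c that yields a target RNL can be derived from the RNL definition (noise standard deviation over observation standard deviation), assuming the trend and the noise are independent. The sketch below leaves the trend as an input, since the exact scaling of the exponential term in Eqs. (31)–(32) is not essential to the point:

```python
import numpy as np

def noisy_series(trend, target_rnl, rng=None):
    """Generate y(t) = trend(t) + c * e(t), e ~ N(0, 1), with c chosen so the
    relative noise level matches target_rnl.

    Derivation (an assumption of independence): var(y) = var(trend) + c^2 and
    RNL = c / std(y), hence c^2 = RNL^2 * var(trend) / (1 - RNL^2).
    """
    rng = np.random.default_rng() if rng is None else rng
    trend = np.asarray(trend, dtype=float)
    c = np.sqrt(target_rnl ** 2 * trend.var() / (1.0 - target_rnl ** 2))
    return trend + c * rng.standard_normal(trend.size)
```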


4.3.3. Slurry pump dataset

Fig. 4 illustrates the lab slurry pump test rig that we established to collect slurry pump CM data. Based on examinations of impellers taken off from the field, we manually created two damage modes on pump impellers: impeller vane trailing edge damage and impeller vane leading edge damage (see Fig. 5). Four wear degrees, namely baseline (no wear yet), slight, moderate, and severe, were created on the impellers for each damage mode. Experiments were operated under pump speeds of 1800, 2200, and 2600 rpm. Vibration data were collected through three accelerometers mounted at different places on the pump casing. Each accelerometer collected data in three directions, x, y, and z. A sampling frequency of 9 kHz was used for the experiments. One can refer to Zuo, Wu, Feng, Aulakh, and Miao (2007) for more details of this experiment. Vibration data with a 5-min time span were collected for each combination of damage mode, wear degree and pump speed. The data acquisition software automatically split and saved the data into 40 separate data files, each of which is an 8-s time record. As a result, each file should contain 72,000 (9000 × 8) data points; however, a close look at all the data files showed that some had fewer than 72,000 data points. For this reason, we selected 3 consecutive data segments from each data file. Each segment contained 18,000 data points. This data length is sufficient to capture the lowest frequency component of interest, 30 Hz, which corresponds to the pump speed of 1800 rpm. Therefore, for a combination of pump speed and damage mode, we had 120 (3 × 40) data segments for a single damage degree and 480 (4 × 120) data segments for all four damage degrees. We connected these data segments from baseline to severe to construct a series of

Fig. 4. Schematics of the lab slurry pump system.

Fig. 5. Impeller vane trailing edge damage (left) and leading edge damage (right).


degradation data. We built the slurry pump datasets using the vibration signal from the A1-z direction, collected under a pump speed of 2600 rpm. Kurtosis and the second harmonic of the pump speed (2X) were computed and employed as the indicators to represent pump conditions.

4.3.4. Planetary gearbox dataset

Fig. 6 illustrates the test rig from which we collected CM data. The test rig was designed to provide full capabilities for performing controlled experiments towards a reliable diagnostic system for the planetary gearbox. The planetary gearbox has an over-hung floating configuration to mimic the support used in the field gearboxes of Syncrude mining operations. The main components of the test rig include one 20 HP drive motor, one stage of bevel gearbox, two stages of planetary gearboxes, two stages of speed-up gearboxes, and one 40 HP load motor. A run-to-failure (RTF) experiment was conducted on the test rig. The second-stage planetary gearbox is our study object. We chose a softer gear set for the second-stage planetary gearbox so that these gears wear out before wear happens to the other gears. With this arrangement, we are able to study the wear progression of the planetary gearbox. The parameter settings for the experiment are: motor speed 1200 rpm, applied load 20 klb·in, and sampling frequency 5 kHz. The RTF experiment was operated for 762 h. In the interim, it was suspended 19 times for inspections. Pictures were taken of every gear in the second stage during each suspension. We terminated the experiment because significant wear (60% material loss on the tooth surface) was observed on the sun gear, as shown in Fig. 7. Vibration signals with a 5-min time span were collected every two hours for each of the 19 runs. Each time span of data was further split into 10 segments of equal length. All data segments were connected to construct a series of degradation data. Many features are available to detect faults in fixed-shaft gearboxes (Lei, Zuo, He, & Zi, 2010), but this is not the case

for planetary gearboxes. We select a sideband feature to represent the condition of the planetary gearbox based on the study in Inalpolat and Kahraman (2009).

5. Results and analysis

This section shows the results of the proposed algorithm and of the five methods used for comparison. The condition indicator values shown in the plots are normalized using (y − min(y))/(max(y) − min(y)), where y is the observation of a condition indicator. For the simulation datasets, NRMSE values are given for all RNL cases, but only the plots for the cases RNL = 0.3 (small noise effects) and RNL = 0.7 (large noise effects) are given due to paper length. The parameters used for Method 4, Method 5, and the proposed algorithm are given in Table 1. Parameter N denotes the number of observations available (see Fig. 1), l denotes the number of predictions used to determine the SVR/LSSVR parameters (see Eq. (24)), and b denotes the number of observations used for training (see Eq. (23)). Parameter N is arbitrarily set to 100 for simulation datasets #1 and #2 and the slurry pump dataset. For the planetary gearbox dataset, N is set to 200. For Methods 4 and 5, l is set to 5, as suggested by Chen (2007). Because we use a different optimization model, l is set to 30 for the proposed algorithm in order to provide sufficient predictions to compute the mean and the standard deviation of the noise. The selection of b is subject to the constraint that b + l is no larger than N. Parameters m and h are for the CUSUM of the proposed algorithm (see Step 5 in Section 3.4 for their selection).

5.1. Results of simulation dataset #1

Table 2 lists the NRMSE values of all six methods for simulation dataset #1. It shows that the proposed algorithm consistently provided

Fig. 6. View of the test rig.

Fig. 7. Progression of wear on sun gear.

Table 1. Parameter settings.

Dataset                    | Methods 4 and 5 (N, l, b) | The proposed algorithm (N, l, b, m, h)
Simulation dataset #1      | 100, 5, 50                | 100, 30, 50, 0.5, 3
Simulation dataset #2      | 100, 5, 50                | 100, 30, 50, 0.5, 3
Slurry pump dataset        | 100, 5, 50                | 100, 30, 50, 0.5, 3
Planetary gearbox dataset  | 200, 5, 50                | 200, 30, 50, 0.5, 3

the smallest NRMSE values for all RNL cases. Fig. 8 shows the predicted trends for the case RNL = 0.3. The "Errors" curve (red dashed curve) in the legend denotes the difference between the true values and the corresponding predictions. It can be seen that Methods 1 and 2 provided basically similar trends, while Method 3 provided one that is still noisy. Nevertheless, Table 2 shows that Methods 1 and 2 provided higher NRMSE values than did Method 3. This is caused by the large deviation occurring when the true value increases rapidly. Method 4 basically captured the trend of the true values; however, it provided a trend with small fluctuations. Method 5 provided a very smooth trend, since its Errors curve almost merges with the zero reference line, but the trend deviates from the true value when the true value increases. This is similar to Methods 1 and 2. The proposed algorithm performed comparably to Method 5 over the period in which the true value is basically constant. When the true value is increasing and the deviation becomes greater, the proposed algorithm triggered another optimization process to re-determine the LSSVR parameters. This happens at time 312, highlighted by a vertical line. Because of this adjustment, the proposed algorithm proceeded to capture the true value after time 312, but we can still see a clear deviation after time 346. This is because the training data for determining the second set of parameters still correspond to constant true values, which is not capable of capturing the increasing trend


Table 3 lists the NRMSE values of all six methods for simulation dataset #2. The proposed algorithm almost provided the smallest NRMSE value for every RNL case. Fig. 10 shows the results of case RNL = 0.3. The observations are basically same as those of simulation dataset #1. Methods 1 and 2 performed comparably. Method 3 provided a smaller NRMSE value than Methods 1 and 2, but its predicted trend is still noisy. Method 4 provided a small NRMSE value of 0.0819 but its trend has fluctuations which display a pattern of noise; in contrast, the trend of Method 5 is quite smooth, but still has deviation when the true value increases. The distinction from

1.2

Reference line

Condition indicator

5.2. Results of simulation dataset #2

Method 2

Method 1 1.2 1

of true values. Hence, it can be predicted that another optimization process would be triggered soon after time 400. Fig. 9 shows the results for the case RNL = 0.7. The observations for Methods 1, 2, and 3 are basically the same as those for the case RNL = 0.3. Method 4 provided a trend with larger fluctuations than Method 5. Method 5 provided a trend with deviations when the true value increases. The proposed algorithm triggered the optimization process at times 223, 249, 283, 309, 349, and 360, which allows it to predict a trend close to that of the true value when the true value increases.


Fig. 8. Trend prediction (RNL = 0.3).



Table 2. NRMSE values for simulated degradation dataset #1.

RNL | Method 1 | Method 2 | Method 3 | Method 4 | Method 5 | The proposed algorithm
0.2 | 0.2676   | 0.2728   | 0.2037   | 0.1342   | 0.1077   | 0.0881
0.3 | 0.2746   | 0.2810   | 0.2176   | 0.1803   | 0.2992   | 0.1306
0.4 | 0.2465   | 0.2528   | 0.2181   | 0.1634   | 0.1709   | 0.1548
0.5 | 0.1960   | 0.2086   | 0.1890   | 0.1035   | 0.1071   | 0.0823
0.6 | 0.1726   | 0.1796   | 0.1904   | 0.1847   | 0.1111   | 0.0814
0.7 | 0.1379   | 0.1435   | 0.1760   | 0.1241   | 0.1126   | 0.0755

Method 1

Observations Errors

1

Observations Errors

0.8

0.4

0.8

Condition indicator

Condition indicator

0.6

0.6 0.4

0.6 0.4 0.2

0.2

0.2

0

0

0

-0.2

-0.2

-0.2

100

Method 4

Time label

Method 5

The proposed method

1.2 Reference line

1.2 Reference line

Observations Errors

1

Predictions

Reference line

Observations Errors

1

Predictions

0.6 0.4

0.6 0.4

0.8 0.6 0.4

0.2

0.2

0.2

0

0

0

-0.2

100

-0.2

100

Time label

Observations Errors Predictions

0.8

Condition indicator

Condition indicator

0.8

-0.2

100

Time label

1.2 1

-0.4

100

Time label

Observations Errors Predictions

Predictions

0.8

Condition indicator

Reference line

Reference line

Predictions

Condition indicator

1

1.2 Reference line

1

Method 3

Method 2

1.2

Time label

100

223 249 283 309 Time label

349 360

Fig. 9. Trend prediction (RNL = 0.7).

Table 3 NRMSE values for simulated degradation dataset #2. RNL

Method 1

Method 2

Method 3

Method 4

Method 5

The proposed algorithm

0.2 0.3 0.4 0.5 0.6 0.7

0.1275 0.1173 0.1157 0.1006 0.1073 0.0687

0.1336 0.1235 0.1240 0.1061 0.1124 0.0676

0.1066 0.1124 0.1243 0.1194 0.1399 0.1192

0.0747 0.0819 0.0632 0.0719 0.1484 0.2151

0.1530 0.1472 0.1366 0.1069 0.1227 0.0507

0.0751 0.0544 0.0619 0.0579 0.0635 0.0428

simulation dataset #1 is that the deviations happen quite early for Methods 1, 2, 5 and 6 which is due to a large scale fluctuation generated by the sinusoidal term in Eq. (32). The same is true of the proposed algorithm (see deviation happens at time 153), but the proposed algorithm triggered an optimization process at time 222, which compels the predicted trend return to the right path. The optimization process has been triggered 7 times which indicates the complexity of this dataset and accordingly the effectiveness of the proposed algorithm. Fig. 11 shows the results of case RNL = 0.7. A significant noisy pattern was observed from trends provided by Methods 1, 3 and 4. We saw that the trend of Method 4 is even noisier than the actual observation. The trend of Method 5 was subject to a smallscale fluctuation throughout the whole time span. This is distinct

from its smooth trends for other RNL cases. For interests, we independently ran Method 5 ten times, but the obtained trend patterns were very similar. The proposed algorithm also provided a trend with some small-scale fluctuations in certain time spans, but these fluctuations were quickly corrected. We also found that Method 5 provided a fairly small NRMSE value which is even comparable to that of the proposed algorithm, but because its trend is fluctuated, the trend of the proposed algorithm is more favorable. We conclude the performances of the six methods based on the criterion mentioned in Section 4.2 (3) in the following. Method 3 is the worst among all three analytical methods as its predicted trend is quite noisy. Compared to Method 1, Method 2 provided smoother trends and comparable NRMSE values. Method 4 provided smaller NRMSE values than Method 5, but its predicted trend has


1.2

Method 2

1.2

Observations Errors

1

0.6 0.4

Predictions

0.8

Condition indicator

Condition indicator

Condition indicator

0.8

0.6 0.4

0.8 0.6 0.4

0.2

0.2

0

0

0

-0.2

-0.2

-0.2

100

0.2

100

Method 4

Time label

Method 5

The proposed method

1.2

1.2

Reference line

Reference line

Observations Errors

Observations Errors

1

Predictions

Observations

1

Errors

Predictions

0.6 0.4

0.8

Condition indicator

Condition indicator

0.8

0.6 0.4

0.6 0.4

0.2

0.2

0

0

0

-0.2

100

-0.2

100

Time label

Predictions

0.8

0.2

-0.2

100

Time label

Reference line

1

Observations Errors

1

Predictions

Time label

Condition indicator

Reference line

Observations Errors

1

Predictions

1.2

Method 3

1.2

Reference line

Reference line

Time label

X: 153 Y: -0.004957

100

222 313

445 586 Time label

879 912

Fig. 10. Trend prediction (RNL = 0.3).

Method 1

1.2

Method 2

1.2

Reference line Observations Errors

1

0.6 0.4

Predictions

0.8

Condition indicator

Condition indicator

Condition indicator

0.8

0.6 0.4

0.8 0.6 0.4

0.2

0.2

0.2

0

0

0

-0.2

100

-0.2

100

Time label

Method 5

1.2

Reference line

Reference line

Observations Errors

1

Predictions

Observations Errors

1

Predictions

0.4 0.2

Predictions

0.8

Condition indicator

Condition indicator

0.6

0.6 0.4

0.8 0.6 0.4

0

0.2

0.2

-0.2

0

0

-0.4

The proposed method

1.2

Reference line

Observations Errors

0.8

100

Time label

Method 4

1

Observations Errors

1

Predictions

Time label

Condition indicator

Reference line

Observations Errors

1

Predictions

-0.2

Method 3

1.2

Reference line

-0.2

100

-0.2

100

Time label

Time label

100

264 354 395439 562 595647 Time label

794

Fig. 11. Trend prediction (RNL = 0.7).

more fluctuations. Furthermore, its trend is as noisy as the actual observations in case RNL = 0.7 of simulation dataset #2. This shows its deficiency in providing consistently good performance. In conclusion, Method 2 performed best among the analytical methods and

Method 5 outperformed in the optimization-based methods. Due to the paper length, only these two methods will be used in comparison of the proposed algorithm for the following two experimental datasets.


224 to 228. The proposed algorithm took 16 time intervals (from 228 to 244) to capture this change and triggered the re-determination of the LSSVR parameters at time 244. As a result, the predicted kurtosis value is lowered to 0.1228, compared to 0.1783 for Method 5.

5.3. Results of slurry pump dataset

5.3.1. Results of trailing edge damage

As mentioned in Section 4.3.3, we considered two condition indicators, 2X and kurtosis. Fig. 12 shows the predicted trend of 2X. It basically shows an increasing trend along with the increase of damage degree. Because the true values of 2X are unknown, we cannot calculate NRMSE values; this is true of all experimental datasets. We can see that the trend of Method 2 is the noisiest among the three methods. Method 5 performs well, but it shows greater fluctuations than the proposed algorithm. Fig. 13 shows the predicted trend of kurtosis. It shows a decreasing trend along with the increase of damage degree. The trend of Method 2 is still noisy, especially at the beginning. Method 5 and the proposed algorithm perform comparably. It can be seen that the observations of kurtosis have a sudden decrease from times

Method 5

Method 2

1

1

0.9

Predictions

0.8

0.8

0.7

0.7

0.6

0.6

Predictions

Observations

0.9

Predictions

0.8

0.5

0.7 0.6

0.5

2X

2X

2X

The proposed method 1

Observations

Observations

0.9

5.3.2. Results of leading edge damage Fig. 14 shows the predicted trend of 2X. Basically, the observations of 2X do not show a degradation trend. The magnitude of observations changes significantly and the magnitude of noise also varies noticeably. This is obviously not a good trend for decision making; however, we found an interesting property of the proposed algorithm through it. The proposed algorithm predicted two sudden peaks appearing at times 359 and 439. This is good because peaks may carry useful information of machine conditions that one does not want to lose. A possible explanation of this is that a fresh set of LSSVR parameters is determined at time 353 which

0.5

0.4

0.4

0.3

0.3

0.3

0.2

0.2

0.2

0.1

0.1

0.1

0

0

100

0.4

0

100

100

229 Time label

Time label

Time label

323

Fig. 12. Predicted trend of 2X for impeller vane trailing edge damage.

Method 2

Method 5

1

1 0.9

Predictions

0.8

0.8

0.7

0.7 KURTOSIS

KURTOSIS

0.9

0.6 0.5 0.4

0.9

0.2

0.1

0.1 0

Predictions

0.7

0.4

0.2

Observations

0.8

0.5

0.3

100

Predictions

0.6

0.3

0

The proposed method 1

Observations

X: 224 Y: 0.2924 X: 228 Y: 0.1232

KURTOSIS

Observations

0.6 0.5 0.4

X: 224 Y: 0.2863

0.3

X: 244 Y: 0.1783

0.2 0.1

Time label

X: 228 Y: 0.1232

0

100

100

Time label

X: 244 Y: 0.1228 244 Time label

387

Fig. 13. Predicted trend of kurtosis for impeller vane trailing edge damage.

Observations

0.9

Predictions

0.9

Predictions

0.8

0.8

0.7

0.7

0.7

0.6

0.6

0.6

0.5

2X

0.8

2X

2X

1

1

Observations

0.9

The proposed method

Method 5

Method 2 1

0.5 0.4

0.4

0.3

0.3

0.3

0.2

0.2

0.2

0.1

0.1 0

100 Time label

Predictions

X: 359 Y: 0.6654 X: 439 Y: 0.6299

0.5

0.4

0

Observations

0.1 0

100 Time label

Fig. 14. Predicted trend of 2X for impeller vane leading edge damage.

100

276 Time label

353

6101

J. Qu, M.J. Zuo / Expert Systems with Applications 39 (2012) 6089–6102 Method 2

Method 5

1

1

Observations

0.9

Predictions

0.8

0.7

0.7 KURTOSIS

0.8

0.6 0.5 0.4

0.7

0.4

0.2

0.2

0.1

0.1

0

0

Predictions

0.8

0.5

0.3

Observations

0.9

0.6

0.3

100

Predictions

X: 229 Y: 0.2678

KURTOSIS

0.9

KURTOSIS

The proposed method

1 Observations

0.6 0.5 0.4

X: 229 Y: 0.2653

0.3 0.2

X: 232 Y: 0.1505

0.1

Time label

X: 232 Y: 0.1505

0

100

100

224 Time label

Time label

382

Fig. 15. Predicted trend of kurtosis for impeller vane leading edge damage.

Method 2

1 Predictions

Predictions

0.8

0.7

0.7

0.7

0.4

Sideband feature

0.8

0.5

0.6 0.5 0.4

0.5 0.4

0.3

0.3

0.2

0.2

0.2

0.1

0.1

0.1

0

0

0

200

Time label

Time label

Predictions

0.6

0.3

200

Observations

0.9

0.8

0.6

The proposed method

1

Observations

0.9

Sideband feature

Sideband feature

0.9

Method 5

1

Observations

200

512 727 783 828938

1263 1485 17051880 1944 20242233 2377 Time label

Fig. 16. Predicted trend of sideband feature for planetary gearbox.

coincidentally happens in the middle of a process where the observation values increase, so the parameters determined are able to capture a rapidly increasing behavior of the observation values, namely the peak. Nevertheless, the proposed algorithm may not always successfully predict a peak, because when to trigger the determination of the LSSVR parameters is governed by CUSUM. This leaves us a problem for future work. Fig. 15 shows the predicted trend of kurtosis. Similar to the trailing edge damage, a sudden decrease was found from times 229 to 232. The behavior of the proposed algorithm is interesting in this case. It re-determined the LSSVR parameters at time 224, which is actually in advance of the sudden decrease. As a result, the proposed algorithm basically captured this decreasing change and provided a reasonable trend for this time span. A possible explanation of this phenomenon is that the trend of the observations generally declines from the starting time point 100, whereas the LSSVR model was established upon the first 100 data points, whose trend is otherwise increasing; the deviation of the predictions from the unknown true values therefore accumulated until it exceeded the threshold, which caused the LSSVR parameters to be re-determined at time 224.

5.4. Planetary gearbox dataset

Fig. 16 shows the predicted trend of the sideband condition indicator. The trend of Method 5 seems to have larger fluctuations than that of Method 2 for this dataset. This is different from the slurry pump dataset. In contrast, the proposed algorithm provided a much smoother trend and successfully captured the peaks. It is observed that the proposed algorithm triggered the re-determination of the LSSVR parameters 14 times, which reveals the complexity of this dataset.

6. Conclusions

In this paper, an algorithm is proposed to address online machine condition prognostics problems. The proposed algorithm aims to predict the true values of condition indicators representing machine conditions based on noisy observations. This task is achieved by jointly using GA-based optimization, LSSVR, and CUSUM. GA is used to solve an optimization problem whose model is developed to determine the LSSVR model parameters. LSSVR is used to predict the true values of the condition indicators. CUSUM is adopted to monitor the prediction process and triggers re-determination of the LSSVR parameters when the accumulated deviations between true values and predictions can no longer be tolerated. The proposed algorithm is compared with five reported methods based on four datasets. The results show that the proposed algorithm is able to consistently provide better performance than its counterparts. The performance of the proposed algorithm on the two experimental datasets reveals its potential in industrial applications.

Acknowledgment

This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

References

Chalimourda, A., Schölkopf, B., & Smola, A. J. (2004). Experimentally optimal nu in support vector regression for different noise models and parameter settings. Neural Networks, 17, 127–141.
Charbonnier, S., Garcia-Beltan, C., Cadet, C., & Gentil, S. (2005). Trend extraction and analysis for complex system monitoring and decision support. Engineering Applications of Artificial Intelligence, 18, 21–36.


Chen, K. Y. (2007). Forecasting systems reliability based on support vector regression with genetic algorithms. Reliability Engineering and System Safety, 92, 423–432. Cherkassky, V., & Ma, Y. (2004). Practical selection of SVM parameters and noise estimation for SVM regression. Neural Computation, 17, 113–126. Ding, F., He, Z.J., Zi, Y.Y., Chen, X.F., Tan, J.Y., Cao, H.R., & Chen, H.X. (2008). Application of support vector machine for equipment reliability forecasting. In The IEEE International Conference on Industrial Informatics, DCC, Daejeon, Korea (pp. 526–530). Hong, W. C., & Pai, P. F. (2006). Predicting engine reliability by support vector machines. International Journal of Advanced Manufacturing Technology, 28, 154–161. Huber, P. (1964). Robust estimation of a location parameter. Annals of Mathematical Statistics, 35, 73–101. Inalpolat, M., & Kahraman, A. (2009). A theoretical and experimental investigation of modulation sidebands of planetary gear sets. Journal of Sound and Vibration, 323, 667–696. Jardine, A. K. S., Lin, D., & Banjevic, D. (2006). A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mechanical Systems and Signal Processing, 20, 1483–1510. Jiang, F., & Zuo, M. J. (2008). Forecasting machine vibration trends using support vector machines. International Journal of Performability Engineering, 4(2), 169–181. Kowk, J. T., & Tsang, I. W. (2003). Linear dependency between p and the input noise in e-support vector regression. IEEE Transactions on Neural Networks, 14(3), 544–553. Lei, Y. G., Zuo, M. J., He, Z. J., & Zi, Y. Y. (2010). A multidimensional hybrid intelligent method for gear fault diagnosis. Expert Systems with Applications, 37(2), 1419–1430. Pai, P. F. (2006). System reliability forecasting by support vector machines with genetic algorithms. Mathematical and Computer Modeling, 43, 262–274.

Qu, J., & Zuo, M. J. (2010). SVM-based prognosis of machine health condition. In Proceedings of the 2010 international conference on mechanical, industrial, and manufacturing technologies, Sanya, China, January 22–24, 2010 (pp. 337–341). Ryan, T. P. (1989). Statistical methods for quality improvement. John Wiley & Sons. Samanta, B. (2009). Prognostics of machine condition using energy based monitoring index and computational intelligence. Journal of Computing and Information Science in Engineering, 9(4), 044502. Samanta, B., & Nataraj, C. (2008). Prognostics of machine condition using soft computing. Robotics and Computer-Integrated Manufacturing, 24, 816–823. Schölkopf, B., & Smola, A. (2002). Learning with kernels: Support vector machines, regularization, and beyond. Cambridge, MA: MIT Press. Sun, B. Y., Huang, D. S., & Fang, H. T. (2005). Lidar Signal denoising using least-squares support vector machine. IEEE Signal Processing Letters, 12(2), 101–104. Suykens, J. A. K., Brabanter, J. D., Lukas, L., & Vandewalle, J. (2002). Weighted least squares support vector machines: Robustness and sparse approximation. Neurocomputing, 48, 85–105. Tse, P. W., & Atherton, D. P. (1999). Prediction of machine deterioration using vibration based fault trends and recurrent neural networks. Journal of Vibration and Acoustics, 121, 355–362. Vapnik, V. N. (1998). Statistical learning theory. New York: John Wiley & Sons. Yan, J. H., Koc, M., & Lee, J. (2004). A prognostic algorithm for machine performance assessment and its application. Production Planning and Control, 15(8), 796–801. Zhao, F., Chen, J., & Xu, W. (2009). Condition prediction based on wavelet packet transform and least squares support vector machine methods. Proceedings of the Institution of Mechanical Engineers, Part E, Journal of Process Mechanical Engineering, 223(2), 71–79. Zuo, M. J., Wu, S. Y., Feng, Z. P., Aulakh, A., & Miao, C. X. (2007). Condition monitoring of slurry pumps for wear assessment. Final Technical Report. Department of Mechanical Engineering, University of Alberta, March 30, 2007.