Mechanical Systems and Signal Processing 139 (2020) 106602
A new data-driven transferable remaining useful life prediction approach for bearing under different working conditions
Jun Zhu a,b, Nan Chen a,b,⇑, Changqing Shen c,⇑
a Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore
b Sembcorp-NUS Corporate Laboratory, Singapore
c School of Rail Transportation, Soochow University, Suzhou, Jiangsu, PR China
Article info
Article history: Received 26 February 2019; Received in revised form 13 September 2019; Accepted 23 December 2019
Keywords: Transfer learning; Hidden Markov model; Remaining useful life estimation
Abstract
Remaining useful life (RUL) estimation plays a pivotal role in ensuring the safety of a machine and can further reduce the cost caused by unwanted downtime or failures. A variety of data-driven methods based on artificial intelligence have been proposed to predict the RUL of key components such as bearings. However, many existing approaches have the following two shortcomings: 1) the fault occurrence time (FOT) is ignored or selected subjectively; 2) the training and testing data are assumed to follow the same data distribution. An inappropriate FOT will either include unrelated information such as noise or exclude critical degradation information. A prognostic model trained with a dataset from one working condition cannot generalize well to a dataset from a different working condition owing to the distribution discrepancy. In this paper, to handle these two shortcomings, a hidden Markov model (HMM) is first employed to automatically detect the state change so that the FOT can be located. Then a novel transfer learning method based on a multilayer perceptron (MLP) is presented to solve the distribution discrepancy problem. An experimental study on RUL estimation of bearings is carried out to illustrate the effectiveness of the proposed method. The results demonstrate that the proposed framework can detect the FOT adaptively and, at the same time, provide reliable transferable prognostics performance under different working conditions.
© 2019 Elsevier Ltd. All rights reserved.
⇑ Corresponding authors at: School of Rail Transportation, Soochow University, Suzhou, Jiangsu, PR China (C. Shen).
https://doi.org/10.1016/j.ymssp.2019.106602

1. Introduction
The safe operation of a machine is essential to avoid sudden failures, which can have catastrophic consequences. While fault diagnosis detects and isolates faults that have already appeared, prognosis can be more valuable since the future health condition is predicted in advance. Remaining useful life (RUL) prediction plays a vital role in preventing a component from performance deterioration, malfunction and sudden failure, which guarantees safe operation and facilitates predictive maintenance decision making.
According to the recent review paper [1], RUL estimation methods can be classified into four categories: physics model-based methods [2,3], statistical model-based methods such as the inverse Gaussian process model, Wiener process model and Gamma process model [4–6], artificial intelligence-based methods such as artificial neural network, support vector machine, relevance vector machine and random forest [7–10], and hybrid methods [11,12]. These data-driven methods have been demonstrated to be effective in taking advantage of measured condition monitoring data for RUL prediction. Although statistical model-based methods can deal with prognostics under varying working conditions, they usually assume that the degradation signal follows a parametric process model, which may not be the case in reality. Moreover, these models usually require a failure threshold to determine RUL, which is hard to define from empirical knowledge.
In particular, artificial intelligence-based methods remain a research focus because of their powerful capacity for building the mapping between condition monitoring data and the corresponding RUL. For example, typical features are fed into traditional shallow models such as artificial neural network (ANN) [7], support vector machine (SVM) [8], relevance vector machine (RVM) [9] and random forest (RF) [10] to predict RUL. Recently, deep learning models have also been applied to RUL estimation, such as convolutional neural network (CNN) [13,14], auto-encoder (AE) [15,16], restricted Boltzmann machine (RBM) [17,18] and recurrent neural network (RNN) [19,20].
Although satisfactory performance has been achieved by the methods above, two deficiencies still exist, which limit the successful application of intelligent fault prognostics. The first shortcoming is that the fault occurrence time (FOT) is neglected or determined subjectively. However, as mentioned in [21], adaptively identifying the FOT is very important, since an improper FOT will either include unrelated information such as noise or exclude critical degradation information. A component usually undergoes a stable period, during which it is at the healthy stage. After the FOT, the degradation becomes more severe and the component is at the failure stage. This phenomenon is commonly seen in bearings, as shown in Fig. 1. It is difficult to detect the FOT by observing the plot of an extracted feature. Several approaches for detecting the FOT can be found in the literature. Li et al. [21] determined the FOT as the time when the kurtosis metric exceeds the 3σ interval.
In [22], the FOT is selected based on hypothesis testing theory, using root mean square (RMS) or peak value as the degradation index. Antoni et al. [23] developed a health indicator (HI) called spectral kurtosis for fault detection and diagnosis of bearings. Antoni [24] further proposed spectral negative entropy to deal with vibration fault detection in the presence of repetitive transients. A smoothness index was proposed by Bozchalooi et al. [25] to guide the selection of proper wavelet parameters for fault detection. Zhang et al. [19] proposed a waveform entropy index to identify the FOT for bearing degradation signals. In [26], a generalized dimensionless HI was formally defined to find the FOT for bearing performance degradation. Wang et al. [27] detected the FOT of bearing degradation signals by statistical modeling based on observations of an HI. However, relying on a single feature is not reliable when that feature fails to reflect the change of the degradation. Moreover, it is laborious to build such an ideal HI from domain knowledge. The FOT can also be detected by binary classification. In [19], bearing running states are identified by a binary classifier so that the FOT can be detected. However, the samples for the normal state and the fault state are usually determined manually in order to train the binary classifier, so human intervention is involved.
Fig. 1. Bearing degradation process.
The second shortcoming is that the training and testing data are assumed to come from the same distribution. Unfortunately, in practical situations, owing to variation of the working condition, a distribution discrepancy generally exists between training and testing data, which degrades RUL prediction performance. In other words, the prediction model established with training samples (source domain) cannot generalize well to the testing samples (target domain). To solve this distribution discrepancy (domain shift) problem, transfer learning (TL) has been proposed to minimize the distribution discrepancy [28]. According to [28], TL approaches can be generally classified into four categories: instance-based TL, parameter-based TL, relation-based TL and feature-based TL. Instance-based TL reweights the source data using information contained in the target data. Parameter-based TL keeps part of the parameters pretrained in the source domain and transfers them for use in the target domain. Relation-based TL addresses TL in relational domains, in which data can be expressed by multiple relations. Feature-based TL has attracted more attention since it has shown satisfactory merits in finding a common latent space where features are domain-invariant for the source and target domains.
Many research works using transfer learning can be found in different areas, such as image classification, natural language processing and object recognition [29–31]. Fault diagnosis based on transfer learning has been reported in [32–35]. Lu et al. [32] proposed a deep neural network with domain adaptation for fault diagnosis. Wen et al. [33] developed a transfer learning method based on a three-layer sparse AE for fault diagnosis of bearings. Guo et al. [34] proposed a deep convolutional transfer learning network containing two modules: condition recognition and domain adaptation. To deal with cross-domain problems, Li et al. [35] presented a cross-domain fault diagnosis method based on a deep generative neural network that artificially creates fake samples for domain adaptation. However, to the authors' best knowledge, transfer learning has rarely been investigated in the area of fault prognostics.
To overcome the two shortcomings above, this paper proposes a novel data-driven framework for RUL prediction based on the hidden Markov model (HMM) and transfer learning. The HMM is first employed to automatically detect the hidden state change so that the FOT can be determined. Compared to existing methods such as building an ideal HI or binary classification, the HMM can adaptively identify the hidden state through unsupervised learning, so the human effort and intervention based on domain knowledge are greatly reduced. Although an HMM has been reported in [36] for tracking bearing wear with features extracted from the time-frequency domain, in this paper the HMM accepts a feature set containing time-domain, time-frequency domain and trigonometric features, so that comprehensive degradation information is included. Then transfer learning based on a multilayer perceptron (MLP) is presented to address the distribution discrepancy problem. It includes two modules: RUL prediction and domain adaptation. The RUL prediction module attempts to find the relationship between the extracted features and the corresponding RUL. The domain adaptation module aims to acquire domain-invariant features by means of a domain classifier and a domain distribution discrepancy metric. Through these two modules, the RUL prediction model trained with labeled data measured under one working condition is expected to effectively predict the RUL of unlabeled data measured under a different working condition. The effectiveness of the proposed method is validated on bearing run-to-failure datasets acquired under different working conditions.
The contributions of this paper are summarized as follows:
1) Automatic FOT detection by HMM is proposed. To reduce human labor, multiple features providing degradation information from different aspects are fed into the HMM to predict the hidden state in an unsupervised manner.
2) A novel transfer learning method based on MLP for RUL prediction is presented, which addresses the problem of distribution discrepancy. Two key modules, RUL prediction and domain adaptation, are included in the transfer learning model, which maintains RUL prediction performance and learns domain-invariant features simultaneously.
The remainder of this paper is organized as follows. Section 2 presents the framework of the proposed data-driven transfer prognostics approach. Section 3 illustrates the effectiveness and superiority of the proposed method by experimental analysis. Finally, conclusions are given in Section 4.

2. Framework of proposed data-driven transfer learning prognostics approach
Fig. 2 illustrates the framework of the proposed transfer learning for fault prognosis. The three key steps are as follows.
1) Feature extraction: This first step extracts multiple features to form a feature set, which contains common statistical time-domain features, time-frequency features and trigonometric features.
2) HMM for FOT detection: This second step aims to identify the FOT adaptively through the HMM. The feature set is converted into an observation sequence to predict the hidden states in an unsupervised manner. Once the faulty state appears in succession, the FOT is identified.
3) Transfer learning for RUL prediction: This third step deals with RUL prediction when the training and testing data have a distribution discrepancy. Through domain adaptation, domain-invariant features can be learned to realize transfer learning for prognostics.
Fig. 2. Framework of proposed transfer learning for RUL prediction.
2.1. Step1: feature extraction
After acquiring raw vibration data, we extract the 13 time-domain features, 16 time-frequency domain features, and 3 features based on trigonometric functions listed in Table 1. The 13 time-domain features cover a wide range of popular time-domain characteristics. The 16 time-frequency domain features are the energies of sixteen frequency bands generated by a four-level wavelet packet decomposition. The 3 trigonometric features are the standard deviation of the inverse hyperbolic cosine (SD of IHC), the standard deviation of the inverse hyperbolic sine (SD of IHS), and the standard deviation of the inverse tangent (SD of IT). Time-domain features can be sensitive to different faults or to the level of degradation. For instance, kurtosis is more sensitive to incipient faults, and RMS can monitor the long-term degradation severity. For the time-frequency domain features, rather than depending on certain frequency bands, we take advantage of the information from the entire frequency spectrum so that potentially useful information in the other frequency bands is not lost. Although wavelet packet decomposition is not optimal, it can effectively extract time-frequency domain features from the nonstationary bearing degradation signal. As for the trigonometric features, trigonometric functions transform the input signal into different scales so that the features have better trends. Frequency-domain features are not extracted here because the frequency resolution of the vibration signal is too low, as reported in [1]. Since there are two data acquisition channels (vibration in the x and y directions), the resulting feature set contains 64 features in total.
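A minimal sketch of how some of the features in Table 1 could be computed for one vibration record is given below. The paper does not spell out the formulas, so the common textbook definitions of the shape, crest, impulse and clearance factors are assumed here, and the function names are hypothetical. The wavelet packet energies and the SD of IHC are left out: the former needs a wavelet library, and the latter requires inputs of at least 1, for which the handling is not stated in the text.

```python
import math
import statistics


def time_domain_features(x):
    """Compute several Table 1 time-domain features for a non-constant signal x.

    The factor definitions below are the usual textbook ones (an assumption).
    """
    n = len(x)
    mean = sum(x) / n
    abs_x = [abs(v) for v in x]
    rms = math.sqrt(sum(v * v for v in x) / n)             # F3: root mean square
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)   # F13: population standard deviation
    kurtosis = sum((v - mean) ** 4 for v in x) / (n * std ** 4)  # F4: non-excess kurtosis
    peak = max(abs_x)                                       # F7: max absolute
    mean_abs = sum(abs_x) / n                               # F6: mean absolute
    smr = (sum(math.sqrt(a) for a in abs_x) / n) ** 2       # F5: square mean root
    return {
        "rms": rms,
        "std": std,
        "kurtosis": kurtosis,
        "max_abs": peak,
        "mean_abs": mean_abs,
        "square_mean_root": smr,
        "shape_factor": rms / mean_abs,     # F9
        "clearance_factor": peak / smr,     # F10
        "impulse_factor": peak / mean_abs,  # F11
        "crest_factor": peak / rms,         # F12
    }


def trigonometric_features(x):
    """Standard deviation of the signal after asinh and atan transforms (F31, F32)."""
    return {
        "sd_ihs": statistics.pstdev([math.asinh(v) for v in x]),
        "sd_it": statistics.pstdev([math.atan(v) for v in x]),
    }
```

Applying both functions to the x and y channels of each record, and adding the band energies, would give one row of the feature set described above.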
Table 1. Feature set.
Type                              Feature
Time-domain features              F1: Entropy; F2: Energy; F3: Root Mean Square; F4: Kurtosis; F5: Square Mean Root; F6: Mean Absolute; F7: Max Absolute; F8: Skewness; F9: Shape Factor; F10: Clearance Factor; F11: Impulse Factor; F12: Crest Factor; F13: Standard Deviation
Time-frequency domain features    F14-F29: Energies of sixteen bands
Trigonometric features            F30: SD of IHC; F31: SD of IHS; F32: SD of IT
2.2. Step2: HMM for FOT detection
An HMM is a statistical model that represents a system evolving through a limited number of states [37,38]. Miao et al. [39] took advantage of HMM in processing information acquired from vibration signals for classification. Wang et al. [40] utilized HMM for early gear fault diagnosis and performance degradation assessment. Dong et al. [41] further developed the hidden semi-Markov model (HSMM), which can overcome the modeling limitation of HMM. Although the working states of the bearing are not directly observed, it is possible to detect the working state change by analyzing the above feature set so that the FOT can be localized. An HMM has three parameters: $\lambda = (A, B, \pi)$. The elements of an HMM are explained as follows.
1) A set of $N$ hidden states $H = \{H_1, H_2, \ldots, H_N\}$. The state at time $t$ is given by $s_t \in H$, $0 \le t \le T$, where $T$ is the length of the observation sequence and $s_t$ represents the current state.
2) An initial probability value ($\pi$) that denotes how likely it is for an input measure to start in a given state
$$\pi_i = P(s_0 = H_i), \quad 1 \le i \le N \tag{1}$$
3) A transition probability matrix ($A = \{a_{ij}\}$) that gives the likelihood of changing from one state $H_i$ at time $t$ to another state $H_j$ at time $t + 1$.
$$a_{ij} = P(s_{t+1} = H_j \mid s_t = H_i), \quad 1 \le i, j \le N \tag{2}$$
$$a_{ij} \ge 0, \quad \sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i, j \le N$$
4) An emission probability matrix ($B = \{b_j(o_t)\}$) that denotes how likely it is for a certain observation value to come from a given state
$$b_j(o_t) = P(o_t \mid s_t = H_j), \quad 1 \le j \le N \tag{3}$$
where $b_j(o_t)$ is the probability of emitting an observation $o_t$ in the state $H_j$ at time $t$.
To estimate these parameters, an iterative expectation maximization (EM) algorithm, also known as the Baum-Welch algorithm, is used. Then, given the observation sequence $O = O_1, O_2, \ldots, O_T$ and the model parameters $\lambda$, the Viterbi algorithm is used to identify the single best state sequence $S = s_1, s_2, \ldots, s_T$ that maximizes $P(S \mid \lambda, O)$. Moreover, we assume that the emission probability follows a Gaussian distribution. To adapt the model to our task of hidden state prediction, we set the number of hidden states to $N = 2$, representing the normal and faulty states. The FOT is detected when the faulty states in $S$ appear in succession.
As mentioned previously, the degradation process of a bearing typically contains two stages: in the first stage the bearing is healthy, and the second stage indicates that a fault has occurred. Accurately identifying the FOT plays a key role in RUL estimation. If information from the first stage is included when training a data-driven model, unrelated degradation information is introduced, which interferes with model building. At the same time, critical degradation information may be excluded if the FOT is not detected in time.

2.3. Step3: transfer learning based on MLP
The basic notation is first introduced to explain the RUL estimation problem by transfer learning. Let $\chi$ be a feature space, $X$ be a particular sample and $P(X)$ be a marginal probability distribution. Then a domain is defined by $D = \{\chi, P(X)\}$, where $X \in \chi$. Traditional machine learning problems assume that the training and testing data are in the same domain, which means they share the same feature space and probability distribution. Transfer learning, however, addresses the case in which training and testing data are sampled from different probability distributions. In general, given a source domain $D_S$ with labeled data and a target domain $D_T$ without labeled data, transfer learning aims to enhance the model prediction capacity in the target domain by taking advantage of knowledge learned from the source domain.
In this paper, it is assumed that the labeled samples in the source domain are sufficient to build a satisfactory RUL predictor. This assumption is realistic since we have access to the labeled source samples. We also assume that the features from the source and target domains differ only in their respective data distributions, which means only the marginal distribution is considered ($\chi_S = \chi_T$ and $P_S(X_S) \ne P_T(X_T)$). We do not consider the conditional distribution discrepancy because we are dealing with unsupervised domain adaptation, in which the target samples do not have labels.
Our task here is to build a regression model with labeled data from one working condition (source domain) and unlabeled data from a different working condition (target domain) so that transfer RUL estimation can be accomplished across working conditions. The key lies in learning domain-invariant features, which follow the same or nearly the same distribution regardless of whether the input data come from the source domain or the target domain. If the learned features are domain invariant, the regression model trained with source domain data can effectively make predictions for the features learned from the target domain.
As shown in Fig. 3, the features from the source domain (labeled) and target domain (unlabeled) are simultaneously sent into the MLP; then, through the proposed training objective (described later), the trained model is expected to generalize well to the target domain. The transfer learning model contains two modules: RUL prediction and domain adaptation. RUL prediction is realized by MLP-based regression. Domain adaptation is achieved by a domain classifier and a distribution discrepancy metric. The features before the last layer are used by the domain adaptation module to learn domain-invariant features.

2.3.1. RUL prediction
Since MLP-based RUL prediction is in fact a regression problem, we introduce a feed forward neural network (FFNN) with four layers: one input layer, two hidden layers (HL1 and HL2) and one output layer (OL). The first three layers can be treated as a salient feature learner and the last layer is regarded as the RUL regression. The input layer is fed with the extracted feature set. For a sample $x$ in the data set $X = \{x_1, x_2, \ldots, x_n\}$ consisting of $n$ features, the activation $h_1$ of layer HL1 is calculated as
Fig. 3. The structure of the transfer learning model.
$$h_1 = f(W_1 x + b_1) \tag{4}$$
where $W_1$ is the weight connecting the input layer and layer HL1, and $b_1$ is the associated bias. $f$ is the activation function, here the rectified linear unit (ReLU). The activation $h_2$ of layer HL2 is calculated as
$$h_2 = f(W_2 h_1 + b_2) \tag{5}$$
where $W_2$ is the weight connecting layer HL1 and layer HL2, and $b_2$ is the associated bias. Based on $h_2$, RUL prediction is achieved in layer OL by regression,
$$\hat{y} = \varphi(W_3 h_2 + b_3) \tag{6}$$
where $W_3$ is the weight connecting layer HL2 and layer OL, and $b_3$ is the associated bias. $\varphi$ is the sigmoid activation function and $\hat{y} \in [0, 1]$ is the predicted life percentage (LP). Because RUL estimation here is a supervised learning problem, the output must be labeled. As mentioned in the previous section, the data samples are labeled with the true LP calculated after the FOT. For example, if the failure time of a bearing is 3000 s and the FOT is 1000 s, the output label at time 2000 s is $(2000 - 1000)/(3000 - 1000) \times 100\% = 50\%$.

2.3.2. Domain adaptation
The domain adaptation module contains a domain classifier and a domain distribution discrepancy metric. The domain classifier contains two layers: a hidden layer HL3 and a domain discrimination output layer (DDOL). The output of layer HL2 is the input of layer HL3. A binary classification by logistic regression is executed in layer DDOL.
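The forward computations of the regression branch (Eqs. (4)-(6)) and of the domain classifier branch just described can be sketched in plain Python as follows. The layer sizes 64, 32, 16, 1 and 8 are those reported later in the experimental section; the weight initialization, the ReLU activation in HL3, and all function names are assumptions made only for illustration.

```python
import math
import random

# (outputs, inputs) per layer; sizes follow the experimental section of the paper.
LAYER_SIZES = {"HL1": (32, 64), "HL2": (16, 32), "OL": (1, 16),
               "HL3": (8, 16), "DDOL": (1, 8)}


def init_params(rng):
    """Small uniform weights and zero biases (initialization scheme is an assumption)."""
    params = {}
    for name, (n_out, n_in) in LAYER_SIZES.items():
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        params[name] = (weights, [0.0] * n_out)
    return params


def dense(weights, bias, inputs):
    """Affine map W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def forward(x, params):
    """Return the shared feature h2, the predicted LP, and the domain probability d."""
    h1 = relu(dense(*params["HL1"], x))            # Eq. (4)
    h2 = relu(dense(*params["HL2"], h1))           # Eq. (5)
    y_hat = sigmoid(dense(*params["OL"], h2)[0])   # Eq. (6): life percentage in (0, 1)
    f3 = relu(dense(*params["HL3"], h2))           # HL3 activation assumed to be ReLU
    d = sigmoid(dense(*params["DDOL"], f3)[0])     # logistic regression in layer DDOL
    return h2, y_hat, d
```

Both branches read the same HL2 output, which is what lets the domain adaptation terms shape the features used by the regressor.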
$$d = \frac{1}{1 + e^{-(W_d f_3 + b_d)}} \tag{7}$$
where $W_d$ is the weight connecting HL3 and DDOL, $b_d$ is the associated bias, and $f_3$ denotes the output of layer HL3. The binary classifier in layer DDOL serves as a domain classifier that distinguishes whether a given sample comes from the source domain or the target domain.
Compared to parametric measures such as the Kullback-Leibler divergence and the Jensen-Shannon divergence, maximum mean discrepancy (MMD) is a non-parametric measure of distribution discrepancy that avoids calculating the intermediate density, which is always a hard task. Let $h_2^S$ and $h_2^T$ be the outputs of layer HL2 for the source domain and target domain. The squared formulation of MMD is defined as follows
$$\mathrm{MMD}^2(h_2^S, h_2^T) = \left\| \frac{1}{n^S} \sum_{i=1}^{n^S} \phi(h_{2i}^S) - \frac{1}{n^T} \sum_{j=1}^{n^T} \phi(h_{2j}^T) \right\|_{\mathcal{H}}^2 \tag{8}$$
where $n^S$ is the number of samples from the source domain, $n^T$ is the number of samples from the target domain, $\|\cdot\|_{\mathcal{H}}$ is the norm of a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$, and $\phi: h_2^S, h_2^T \to \mathcal{H}$.

2.3.3. Training optimization objective
The proposed transfer learning objective function has the following three parts: 1) the basic RUL regression error term on the source domain data; 2) the domain classification error term on the source and target domain data; 3) the MMD term between the source and target domain data.
Regression error term: The RUL prediction module attempts to find the mapping between the extracted features and the RUL directly on the source domain data. The loss function, defined as the mean square error (MSE), should be minimized
$$L_r = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 \tag{9}$$
where $m$ is the batch size of the training samples and $y_i$ is the true label value.
Domain classification error term: The features are domain invariant if a domain classifier fails to distinguish whether they come from the source or the target domain, which is encouraged by maximizing the domain classification error [42]. This idea is very similar to the generative adversarial network (GAN), which uses two players to align distributions in an adversarial manner; here the players are the domain classifier and the feature miner. Features from the feature miner are shared by the regressor and the domain classifier. The domain classifier (the binary classifier in layer DDOL) is trained to discriminate the domain labels (source: 0, target: 1) of the features generated by the feature miner, whereas the feature miner is trained to deceive the domain classifier. The loss function, defined as the cross entropy loss, should be maximized
$$L_d = -\frac{1}{m} \sum_{i=1}^{m} \left( D_i \log d(x_i) + (1 - D_i) \log(1 - d(x_i)) \right) \tag{10}$$
where $D_i$ is the actual domain label and $d(x_i)$ is the predicted domain label, which indicates whether $x_i$ comes from the source domain or the target domain.
MMD error term: The features in the higher layers usually contain salient information, which also affects transferability. To lower the distribution discrepancy between the source and target domain data, we minimize the MMD distance between $h_2^S$ and $h_2^T$ through the practical computation
$$L_{MMD} = \frac{4}{m^2} \sum_{i=1}^{m/2} \sum_{j=1}^{m/2} K(h_{2i}^S, h_{2j}^S) + \frac{4}{m^2} \sum_{i=1}^{m/2} \sum_{j=1}^{m/2} K(h_{2i}^T, h_{2j}^T) - \frac{8}{m^2} \sum_{i=1}^{m/2} \sum_{j=1}^{m/2} K(h_{2i}^S, h_{2j}^T) \tag{11}$$
where $K(\cdot, \cdot)$ is a kernel function and each batch contains $m/2$ samples from the source domain and $m/2$ samples from the target domain. The widely used Gaussian radial basis function (RBF) is chosen as the kernel function here. The final optimization function is summarized as follows
$$L_{final} = L_r - \lambda L_d + \mu L_{MMD} \tag{12}$$
where $\lambda$ and $\mu$ are both non-negative hyper-parameters, which balance the trade-off among the three terms. Let $\theta_f$, $\theta_r$ and $\theta_d$ be the parameters of the salient feature miner, the RUL regression and the domain classifier, respectively. Training the transfer learning model then amounts to optimizing
$$L_{final}(\theta_f, \theta_r, \theta_d) = L_r(\theta_f, \theta_r) - \lambda L_d(\theta_f, \theta_d) + \mu L_{MMD}(\theta_f) \tag{13}$$
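by finding the saddle point $\hat{\theta}_f, \hat{\theta}_r, \hat{\theta}_d$ so that
$$(\hat{\theta}_f, \hat{\theta}_r) = \arg\min_{\theta_f, \theta_r} L_{final}(\theta_f, \theta_r, \hat{\theta}_d) \tag{14}$$
$$\hat{\theta}_d = \arg\max_{\theta_d} L_{final}(\hat{\theta}_f, \hat{\theta}_r, \theta_d) \tag{15}$$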
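The three loss terms and their combination in Eq. (12) can be illustrated with a small pure-Python sketch. The RBF bandwidth is not specified in the text, so the default value below is an assumption, as are the function names and the clipping constant used to keep the logarithm finite.

```python
import math


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel; the bandwidth value is an assumption."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_loss(h_src, h_tgt, bandwidth=1.0):
    """Eq. (11): batch of m samples, m/2 from each domain."""
    if len(h_src) != len(h_tgt):
        raise ValueError("Eq. (11) assumes equal numbers of source and target samples")
    m = 2 * len(h_src)
    k_ss = sum(rbf_kernel(a, b, bandwidth) for a in h_src for b in h_src)
    k_tt = sum(rbf_kernel(a, b, bandwidth) for a in h_tgt for b in h_tgt)
    k_st = sum(rbf_kernel(a, b, bandwidth) for a in h_src for b in h_tgt)
    return (4.0 * k_ss + 4.0 * k_tt - 8.0 * k_st) / m ** 2


def regression_loss(y_true, y_pred):
    """Eq. (9): mean square error on labeled source samples."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def domain_loss(domain_labels, domain_probs, eps=1e-12):
    """Eq. (10): cross entropy with labels source = 0, target = 1."""
    total = 0.0
    for label, prob in zip(domain_labels, domain_probs):
        prob = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
        total += label * math.log(prob) + (1 - label) * math.log(1.0 - prob)
    return -total / len(domain_labels)


def final_loss(l_r, l_d, l_mmd, lam, mu):
    """Eq. (12): L_final = L_r - lambda * L_d + mu * L_MMD."""
    return l_r - lam * l_d + mu * l_mmd
```

In an adversarial setup of this kind, the feature and regression parameters are driven to lower `final_loss`, while the domain classifier parameters are driven to lower `domain_loss` alone.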
By stochastic gradient descent (SGD), the saddle point can be updated as
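$$\theta_f \leftarrow \theta_f - \eta \left( \frac{\partial L_r}{\partial \theta_f} - \lambda \frac{\partial L_d}{\partial \theta_f} + \mu \frac{\partial L_{MMD}}{\partial \theta_f} \right) \tag{16}$$
$$\theta_r \leftarrow \theta_r - \eta \frac{\partial L_r}{\partial \theta_r} \tag{17}$$
$$\theta_d \leftarrow \theta_d - \eta \frac{\partial L_d}{\partial \theta_d} \tag{18}$$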
where $\eta$ is the learning rate.

3. Experimental study
3.1. Experimental data description
To validate the effectiveness of the proposed method, analysis is performed on the experimental data from PRONOSTIA in the IEEE 2012 Data Challenge, where run-to-failure datasets under different working conditions were collected [43]. The platform in Fig. 4 includes three key parts: a rotating part, a degradation generation part and a data collection part. A radial load force is imposed to speed up the degradation. Vibration signals in the x and y directions are acquired with a sampling frequency of 25,600 Hz. Each record lasts 0.1 s and the recording interval is 10 s. When the vibration level of the measured data exceeds a certain threshold, the test is stopped to avoid potential damage. The basic characteristics of the test bearings are listed in Table 2. Table 3 gives a detailed description of the measured data, where three different working conditions are included. When a fault occurs, it usually makes a certain fault characteristic frequency appear in the envelope spectrum. Based on the shaft speed and bearing characteristics, we can calculate the theoretical fault frequencies of the inner race, outer race, rolling element and cage using the equations below. Taking a bearing from Dataset1 as an example, $f_i = 221.7$ Hz, $f_o = 168.3$ Hz, $f_r = 215.4$ Hz and $f_c = 12.9$ Hz.
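As a numerical check of the values just quoted, the sketch below evaluates the fault frequency formulas of Eqs. (19)-(22) for the Dataset1 speed (1800 rpm) and the bearing geometry of Table 2. The function name is hypothetical, and the rolling element frequency is taken in its common form with the ratio $D_p/D_r$ in front, which is an assumption about the intended equation. Under that assumption the results agree with the quoted frequencies to within about 0.1 Hz.

```python
import math


def fault_frequencies(speed_rpm, n_rollers, roller_diameter, pitch_diameter,
                      contact_angle_deg=0.0):
    """Return (inner race, outer race, rolling element, cage) frequencies in Hz."""
    f_rot = speed_rpm / 60.0                      # shaft rotation frequency
    ratio = roller_diameter / pitch_diameter      # D_r / D_p
    cos_phi = math.cos(math.radians(contact_angle_deg))
    f_inner = 0.5 * n_rollers * f_rot * (1.0 + ratio * cos_phi)
    f_outer = 0.5 * n_rollers * f_rot * (1.0 - ratio * cos_phi)
    # Common form of the rolling element frequency (assumed here).
    f_roller = (pitch_diameter / roller_diameter) * f_rot * (1.0 - (ratio * cos_phi) ** 2)
    f_cage = 0.5 * f_rot * (1.0 - ratio * cos_phi)
    return f_inner, f_outer, f_roller, f_cage


# Dataset1: 1800 rpm, 13 rollers, 3.5 mm roller diameter, 25.6 mm pitch diameter.
f_i, f_o, f_r, f_c = fault_frequencies(1800.0, 13, 3.5, 25.6)
```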
Fig. 4. The experimental platform.

Table 2. Characteristics of the tested bearings.
Pitch diameter    Diameter of the roller    Number of the roller    Contact angle
25.6 mm           3.5 mm                    13                      0°

Table 3. Descriptions of the experiment data.
               Dataset1       Dataset2       Dataset3
Load (N)       4000           4200           5000
Speed (rpm)    1800           1650           1500
Bearings       Bearing 1_1    Bearing 2_1    Bearing 3_1
               Bearing 1_2    Bearing 2_2    Bearing 3_2
               Bearing 1_3    Bearing 2_3    Bearing 3_3
               Bearing 1_4    Bearing 2_4
               Bearing 1_5    Bearing 2_5
               Bearing 1_6    Bearing 2_6
               Bearing 1_7    Bearing 2_7

$$f_i = \frac{N_b}{2} f_{rot} \left( 1 + \frac{D_r}{D_p} \cos\varphi \right) \tag{19}$$
$$f_o = \frac{N_b}{2} f_{rot} \left( 1 - \frac{D_r}{D_p} \cos\varphi \right) \tag{20}$$
$$f_r = \frac{D_p}{D_r} f_{rot} \left( 1 - \frac{D_r^2}{D_p^2} \cos^2\varphi \right) \tag{21}$$
$$f_c = \frac{1}{2} f_{rot} \left( 1 - \frac{D_r}{D_p} \cos\varphi \right) \tag{22}$$
where $N_b$ is the number of rollers, $f_{rot}$ is the rotation frequency, $D_r$ is the diameter of the roller, $D_p$ is the pitch diameter and $\varphi$ is the contact angle.

3.2. Analysis of the proposed method on experimental data
As displayed in Table 4, we assess the proposed method on two transfer fault prognostics tasks: Dataset1 → Dataset2 and Dataset1 → Dataset3. The source domain data come from Dataset1 and the target domain data come from Dataset2 and Dataset3. In line with the standard evaluation protocol for unsupervised transfer learning tasks, the training dataset contains all the labeled data from the source domain and part of the unlabeled data from the target domain, while the testing dataset contains the remaining unlabeled data from the target domain [34].

Table 4. Transfer fault prognostics task.
Transfer prognostics    Training dataset                                                                  Testing dataset
Dataset1 → Dataset2     Labeled: Bearing 1_1–Bearing 1_7; Unlabeled: Bearing 2_1 and Bearing 2_2          Bearing 2_6
Dataset1 → Dataset3     Labeled: Bearing 1_1–Bearing 1_7; Unlabeled: Bearing 3_1 and Bearing 3_2          Bearing 3_3

The experimental study is performed on a 64-bit Windows server with 64 GB of RAM, two Intel(R) Xeon(R) Gold 5115 CPUs and one Nvidia(R) Quadro(R) P6000 GPU.
3.2.1. Feature extraction
As mentioned in Section 2.1, 64 features are first extracted to form a feature set, which describes the degradation of the bearing from different aspects. For example, the features calculated from Bearing 1_1 in the horizontal direction are plotted in Fig. 5. The feature plots for the remaining bearings are omitted to save space.
Fig. 5. Features for Bearing 1_1 from the horizontal direction.
3.2.2. FOT detection by HMM
Before the feature set is sent into the HMM, Min-Max scaling is applied so that all features have the same range. Through the EM algorithm, the hidden states for the source and target data are predicted, as shown in Figs. 6 and 7. Based on the hidden states (0 stands for the healthy state and 1 stands for the faulty state), the FOT is determined when more than five faulty states appear in succession. For Bearings 1_1, 1_2, 1_3, 1_5, 1_6, 1_7 and 3_2, the faulty state occurs but soon disappears; we do not regard this situation as a fault indication. We also assume that after the FOT is detected, the system continues degrading to the end. So for Bearings 1_5, 1_6, 2_6 and 3_1, the situation in which the faulty state switches back to the healthy state can be considered a recovery of the bearing. To further illustrate the effectiveness of the HMM for FOT detection, the hidden state prediction result, the vibration signal of the whole lifetime, the vibration signal at the FOT and the envelope spectrum of the vibration signal at the FOT are plotted in Fig. 8. From the envelope spectrum, we can clearly see the fault characteristic frequency caused by the localized periodic impact. Moreover, for comparison, an HI-based method [44] and a binary classification-based method [19] are also introduced. The FOT detection results are listed in Table 5. Compared to the HI-based method, nearly the same detection results are found for Bearings 1_2 and 1_4. For Bearings 1_1, 1_5, 1_6 and 1_7, the proposed HMM method detects the fault earlier, which avoids detection delay. Compared to the binary classification-based method, except for Bearing 1_2, the proposed HMM method detects the fault without delay since early symptoms of the fault can be effectively captured.

3.2.3. Transfer prognostics
After the FOT is determined, we can label the output with the corresponding LP.
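The run-length rule of Section 3.2.2 and the LP labeling can be sketched as follows, given a hidden state sequence that has already been decoded by the HMM. Taking the FOT as the first index of the qualifying run is an assumption (the text only says that more than five faulty states must appear in succession), and the function names are hypothetical.

```python
def detect_fot(states, min_run=6):
    """Return the start index of the first run of at least `min_run` faulty states.

    States use 0 for healthy and 1 for faulty; "more than five in succession"
    is read as a run of at least six. Returns None if no such run exists.
    """
    run_start = None
    run_length = 0
    for index, state in enumerate(states):
        if state == 1:
            if run_length == 0:
                run_start = index
            run_length += 1
            if run_length >= min_run:
                return run_start
        else:
            run_length = 0  # short faulty bursts are not treated as a fault indication
    return None


def life_percentage(times, fot, failure_time):
    """Label samples at or after the FOT with LP = (t - FOT) / (failure_time - FOT)."""
    span = failure_time - fot
    return [(t - fot) / span for t in times if t >= fot]
```

With the numbers from the example in Section 2.3.1, `life_percentage([2000], 1000, 3000)` gives a label of 0.5, that is, 50%.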
In the RUL prediction module, the sizes of the input layer, layer HL1, layer HL2 and layer OL are 64, 32, 16 and 1, respectively. In the domain adaptation module, the size of layer FC3 is 8. The penalty parameters $\lambda$ and $\mu$ are set to the values that give the best RUL prediction performance. The learning
Fig. 6. Hidden state detection results for the source data.
Fig. 7. Hidden state detection results for the target data.
Fig. 8. Bearing 1_1: (a) hidden state detection; (b) vibration signal of the whole lifetime; (c) vibration signal at FOT; (d) envelope spectrum of vibration signal at FOT.
Table 5
FOT detection results.

Bearing   Failure time   HMM method   Method in [44]   Method in [19]
1_1       2803           1490         2118             2174
1_2       871            827          826              784
1_3       2375           1684         1613             1842
1_4       1428           1083         1082             1108
1_5       2463           680          2306             1653
1_6       2448           649          2035             1656
1_7       2259           1026         2030             2233
rate $\eta$ is set to 0.0001, the batch size m is 40, and training runs for 50,000 steps. The training loss of the proposed approach is shown in Fig. 9. With these parameters, the RUL prediction results for bearings 2_6 and 3_3 are shown in Figs. 10 and 11. The predicted LP shows an overall increasing trend and approximates the true LP well, which demonstrates the effectiveness of the proposed data-driven transfer prognostics framework. To measure performance quantitatively, the mean absolute error (MAE) and the normalized root mean square error (NRMSE) are reported in addition to the MSE. These errors are listed in Tables 6 and 7.
$$\mathrm{MAE} = \frac{1}{L}\sum_{i=1}^{L}\left|y_i - \hat{y}_i\right| \tag{23}$$

$$\mathrm{NRMSE} = \frac{\sqrt{\frac{1}{L}\sum_{i=1}^{L}\left(y_i - \hat{y}_i\right)^2}}{\frac{1}{L}\sum_{i=1}^{L}\hat{y}_i} \tag{24}$$
where L is the length of the testing data.
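The metrics in Eqs. (23) and (24) can be computed with a few lines of NumPy. The sketch below assumes that y_i is the true LP, that the hatted quantity is the predicted LP, and that the NRMSE is normalized by the mean predicted value; the function names are illustrative only.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over the L test samples."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (23)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def nrmse(y_true, y_pred):
    """RMSE divided by the mean prediction, Eq. (24) as read here."""
    y_pred = np.asarray(y_pred, float)
    return float(np.sqrt(mse(y_true, y_pred)) / np.mean(y_pred))

# Toy example: every prediction is off by 0.1.
y_true = [0.0, 0.5, 1.0]
y_pred = [0.1, 0.4, 0.9]
print(mse(y_true, y_pred), mae(y_true, y_pred), nrmse(y_true, y_pred))
```

With this normalization, a method that systematically predicts large LP values obtains a smaller NRMSE for the same RMSE, so the NRMSE is best read together with the MSE and MAE.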
Fig. 9. Training loss: (a) Dataset1 → Dataset2; (b) Dataset1 → Dataset3.
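To make the reported layer sizes and batch size concrete, the following sketch runs one forward pass of a 64-32-16-1 MLP on a source and a target batch and evaluates a single-kernel Gaussian MMD between their HL2 features. It is a simplified illustration under several assumptions that are not taken from the paper: ReLU hidden activations, a sigmoid output, He-style initialization, a kernel bandwidth of 1, adaptation at layer HL2, random stand-in data, and a single hypothetical weight `lam`. Layer FC3 and the second penalty term are omitted, so this is not the authors' objective.

```python
import numpy as np

rng = np.random.default_rng(0)
SIZES = [64, 32, 16, 1]   # input, HL1, HL2, OL (sizes from the text)
BATCH = 40                # batch size m (from the text)

def init_params(sizes, rng):
    """He-style random weights and zero biases (an assumption)."""
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return (HL2 features, predicted LP); ReLU and sigmoid are assumptions."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    lp = 1.0 / (1.0 + np.exp(-(h @ w + b)))
    return h, lp.ravel()

def gaussian_mmd2(a, b, sigma=1.0):
    """Biased estimate of the squared MMD with one Gaussian kernel."""
    def kernel(u, v):
        d2 = ((u[:, None, :] - v[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return kernel(a, a).mean() + kernel(b, b).mean() - 2.0 * kernel(a, b).mean()

params = init_params(SIZES, rng)
x_src = rng.random((BATCH, SIZES[0]))        # stand-in source features
x_tgt = rng.random((BATCH, SIZES[0])) + 0.5  # stand-in shifted target features
y_src = rng.random(BATCH)                    # placeholder LP labels

f_src, lp_src = forward(params, x_src)
f_tgt, _ = forward(params, x_tgt)
lam = 1.0                                    # hypothetical penalty weight
loss = np.mean((y_src - lp_src) ** 2) + lam * gaussian_mmd2(f_src, f_tgt)
print(f_src.shape, lp_src.shape, float(loss))
```

Only the source batch contributes to the regression term because target labels are assumed to be unavailable during training; the discrepancy term uses both batches.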
Fig. 10. Prediction results for Bearing 2_6: (a) the proposed method; (b) MLP without transfer; (c) TCA; (d) DANN; (e) ATNN.
3.3. Comparison results

To demonstrate the superiority of the proposed data-driven transfer prognostics method, four methods are used for comparison: MLP without transfer learning, transfer component analysis (TCA) [45], domain adaptive neural networks (DANN) [46] and domain-adversarial training of neural networks (ATNN) [47]. The proposed method is clearly better than MLP without transfer learning because the domain discrepancy problem has been addressed. TCA, a traditional transfer learning approach based on MMD-regularized subspace learning, fails to predict LP with an increasing trend, and some of its predicted LP values even exceed 1. A possible explanation is that it is sometimes difficult to find domain-invariant features in a subspace through TCA. Compared with the two state-of-the-art transfer learning methods, DANN and ATNN, the proposed method learns more representative features that are domain invariant and predictive for both the source and target data samples. The predicted results and performance metrics of these methods are shown in Figs. 10 and 11 and in Tables 6 and 7. For bearing 2_6, the NRMSE of TCA is smaller than that of the proposed method. This is because TCA overestimates LP, which enlarges the mean prediction in the denominator of Eq. (24); the MSE and MAE of TCA are nevertheless larger than those of the proposed method. Moreover, the LP predicted by TCA does not follow an increasing trend. Overall, these results show that our method achieves the best performance among the compared methods, which further validates that the proposed method is both effective and superior in predicting RUL.

3.4. Discussion

The effectiveness and advantages of the proposed data-driven transfer prognostics method have been shown by experimental analysis. Several points remain open and indicate directions for further work.
Fig. 11. Prediction results for Bearing 3_3: (a) the proposed method; (b) MLP without transfer; (c) TCA; (d) DANN; (e) ATNN.
Table 6
Performance metrics for Bearing 2_6.

Performance metrics   Proposed method   MLP      TCA [45]   DANN [46]   ATNN [47]
MSE                   0.0890            0.1333   0.1118     0.1000      0.1140
MAE                   0.2546            0.3101   0.2761     0.2684      0.2878
NRMSE                 1.1697            1.9010   0.4423     1.3432      1.5703
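The low NRMSE of TCA for Bearing 2_6 can be checked directly from the tabulated values. If the NRMSE is the RMSE divided by the mean predicted LP, as Eq. (24) is read here, and the MSE is the plain mean squared error on the same test samples, then the mean predicted LP implied by each column is sqrt(MSE) / NRMSE. The short sketch below performs this arithmetic; the dictionary and variable names are illustrative.

```python
import math

# (MSE, NRMSE) pairs copied from Table 6 (Bearing 2_6).
table6 = {
    "Proposed": (0.0890, 1.1697),
    "MLP": (0.1333, 1.9010),
    "TCA": (0.1118, 0.4423),
    "DANN": (0.1000, 1.3432),
    "ATNN": (0.1140, 1.5703),
}

# Mean predicted LP implied by the two metrics under the stated assumptions.
implied_mean = {name: math.sqrt(m) / n for name, (m, n) in table6.items()}

for name, value in implied_mean.items():
    print(f"{name:8s} implied mean predicted LP = {value:.3f}")
```

Under these assumptions the implied mean prediction of TCA is roughly 0.76, about three times that of the other methods (roughly 0.19 to 0.26), which is consistent with the explanation that TCA's small NRMSE comes from overestimated LP rather than from smaller errors.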
Table 7
Performance metrics for Bearing 3_3.

Performance metrics   Proposed method   MLP      TCA [45]   DANN [46]   ATNN [47]
MSE                   0.0159            0.0468   0.1054     0.0460      0.0309
MAE                   0.0983            0.1682   0.2755     0.1665      0.1391
NRMSE                 0.2291            0.4299   0.5185     0.4866      0.3649
1) The RUL prediction module in our method is in fact a traditional shallow model whose input is the extracted feature set. To remove this limitation, a deep learning approach such as a CNN, which accepts raw data as input, is a potential solution.
2) FOT detection is achieved by predicting the hidden states of an HMM through the EM algorithm, which is an unsupervised learning algorithm. In practice, however, supervised training of the HMM is more attractive. If certain hidden states at the beginning are known to be healthy and certain states at the end are known to be faulty, this information can be incorporated into HMM training in a supervised manner.
3) Multi-layer adaptation and multi-kernel MMD are more appealing than single-layer adaptation and single-kernel MMD when the source and target domains have a large distribution discrepancy [48].
4) RUL prediction uncertainty is a critical issue because the future cannot be known. Unlike statistical model-based approaches, which introduce random variation into the model parameters to describe uncertainty, our proposed method (in fact an ANN-based method) assumes that an input sample deterministically represents the actual RUL, so uncertainties are hard to explain owing to the lack of transparency. To address this flaw, Sbarufatti et al. [49] combined feed-forward neural networks with sequential Monte-Carlo sampling to predict the RUL of fatigue cracks. Deutsch et al. [50] proposed a deep belief network to predict RUL, in which confidence bounds are obtained by a resampling technique known as the jackknife (a linear approximation to the bootstrap method). Peng et al. [51] presented Bayesian deep learning based prognostics to deal with uncertainty, where a variational inference based method is introduced for network learning and inference. Inspired by these works, incorporating prognostic uncertainty into our proposed transfer learning framework is left for future work.

4. Conclusions

In this paper, we proposed a new data-driven framework for the transfer prognostics task. To address the limitations of traditional data-driven methods based on artificial intelligence, we first took advantage of HMM to detect the FOT adaptively. A novel transfer learning method based on MLP was then presented to solve the distribution discrepancy problem. The method comprises two modules, RUL prediction and domain adaptation. Through these two modules, the RUL prediction ability is maintained while domain-invariant features are learned.
By analyzing experimental run-to-failure data from different working conditions, the new data-driven framework was shown to be effective and advantageous in transfer RUL prediction. Further research directions were also discussed.

Acknowledgments

This research was supported by the National Research Foundation, Sembcorp Industries Ltd and National University of Singapore under the Sembcorp-NUS Corporate Laboratory (R-261-513-003-281), and by the National Natural Science Foundation of China (51875375).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.ymssp.2019.106602.

References

[1] Y. Lei, N. Li, L. Guo, N. Li, T. Yan, J. Lin, Machinery health prognostics: a systematic review from data acquisition to RUL prediction, Mech. Syst. Signal Process. 104 (2018) 799–834.
[2] L. Cui, X. Wang, Y. Xu, H. Jiang, J. Zhou, A novel switching unscented Kalman filter method for remaining useful life prediction of rolling bearing, Measurement 135 (2019) 678–684.
[3] D. Liu, X. Yin, Y. Song, W. Liu, Y. Peng, An on-line state of health estimation of lithium-ion battery using unscented particle filter, IEEE Access 6 (2018) 40990–41001.
[4] N. Chen, K.L. Tsui, Condition monitoring and remaining useful life prediction using degradation signals: revisited, IIE Trans. 45 (9) (2013) 939–952.
[5] C. Paroissin, Inference for the Wiener process with random initiation time, IEEE Trans. Reliab. 65 (1) (2016) 147–157.
[6] W. Peng, Z.-S. Ye, N. Chen, Joint online RUL prediction for multivariate deteriorating systems, IEEE Trans. Ind. Inf. 15 (5) (2018) 2870–2878.
[7] N. Gebraeel, M. Lawley, R. Liu, V. Parmeshwaran, Residual life predictions from vibration-based degradation signals: a neural network approach, IEEE Trans. Ind. Electron. 51 (3) (2004) 694–700.
[8] T.H. Loutas, D. Roulias, G. Georgoulas, Remaining useful life estimation in rolling bearings utilizing data-driven probabilistic ε-support vectors regression, IEEE Trans. Reliab. 62 (4) (2013) 821–832.
[9] P. Wang, B.D. Youn, C. Hu, A generic probabilistic framework for structural health prognostics and uncertainty management, Mech. Syst. Signal Process. 28 (2012) 622–637.
[10] D. Wu, C. Jennings, J. Terpenny, R.X. Gao, S. Kumara, A comparative study on machine learning algorithms for smart manufacturing: tool wear prediction using random forests, J. Manuf. Sci. Eng. 139 (7) (2017) 071018.
[11] C. Chen, G. Vachtsevanos, M.E. Orchard, Machine remaining useful life prediction: an integrated adaptive neuro-fuzzy and high-order particle filtering approach, Mech. Syst. Signal Process. 28 (2012) 597–607.
[12] P. Baraldi, M. Compare, S. Sauco, E. Zio, Ensemble neural network-based particle filtering for prognostics, Mech. Syst. Signal Process. 41 (1–2) (2013) 288–300.
[13] J. Zhu, N. Chen, W. Peng, Estimation of bearing remaining useful life based on multiscale convolutional neural network, IEEE Trans. Ind. Electron. 66 (4) (2019) 3208–3216.
[14] G.S. Babu, P. Zhao, X.-L. Li, Deep convolutional neural network based regression approach for estimation of remaining useful life, in: International Conference on Database Systems for Advanced Applications, Springer, 2016, pp. 214–228.
[15] L. Ren, Y. Sun, J. Cui, L. Zhang, Bearing remaining useful life prediction based on deep autoencoder and deep neural networks, J. Manuf. Syst. 48 (2018) 71–77.
[16] M. Xia, T. Li, T. Shu, J. Wan, C.W. de Silva, Z. Wang, A two-stage approach for the remaining useful life prediction of bearings using deep neural networks, IEEE Trans. Ind. Inf. (2018), https://doi.org/10.1109/TII.2018.2868687, 1–1.
[17] L. Liao, W. Jin, R. Pavel, Enhanced restricted Boltzmann machine with prognosability regularization for prognostics and health assessment, IEEE Trans. Ind. Electron. 63 (11) (2016) 7076–7083.
[18] J. Deutsch, D. He, Using deep learning-based approach to predict remaining useful life of rotating components, IEEE Trans. Syst. Man Cybern.: Syst. 48 (1) (2018) 11–20.
[19] B. Zhang, S. Zhang, W. Li, Bearing performance degradation assessment using long short-term memory recurrent network, Comput. Ind. 106 (2019) 14–29.
[20] C. Huang, H. Huang, Y. Li, A bi-directional LSTM prognostics method under multiple operational conditions, IEEE Trans. Ind. Electron. (2019), https://doi.org/10.1109/TIE.2019.2891463, 1–1.
[21] N. Li, Y. Lei, J. Lin, S.X. Ding, An improved exponential model for predicting remaining useful life of rolling element bearings, IEEE Trans. Ind. Electron. 62 (12) (2015) 7762–7773.
[22] K. Li, J. Wu, Q. Zhang, L. Su, P. Chen, New particle filter based on GA for equipment remaining useful life prediction, Sensors 17 (4) (2017) 696.
[23] N. Sawalhi, R. Randall, H. Endo, The enhancement of fault detection and diagnosis in rolling element bearings using minimum entropy deconvolution combined with spectral kurtosis, Mech. Syst. Signal Process. 21 (6) (2007) 2616–2633.
[24] J. Antoni, The infogram: entropic evidence of the signature of repetitive transients, Mech. Syst. Signal Process. 74 (2016) 73–94.
[25] I.S. Bozchalooi, M. Liang, A smoothness index-guided approach to wavelet parameter selection in signal de-noising and fault detection, J. Sound Vib. 308 (1–2) (2007) 246–267.
[26] D. Wang, K.-L. Tsui, Theoretical investigation of the upper and lower bounds of a generalized dimensionless bearing health indicator, Mech. Syst. Signal Process. 98 (2018) 890–901.
[27] D. Wang, K.-L. Tsui, Statistical modeling of bearing degradation signals, IEEE Trans. Reliab. 66 (4) (2017) 1331–1344.
[28] S.J. Pan, Q. Yang, A survey on transfer learning, IEEE Trans. Knowl. Data Eng. 22 (10) (2010) 1345–1359.
[29] V.M. Patel, R. Gopalan, R. Li, R. Chellappa, Visual domain adaptation: a survey of recent advances, IEEE Signal Process. Mag. 32 (3) (2015) 53–69.
[30] J. Lu, V. Behbood, P. Hao, H. Zuo, S. Xue, G. Zhang, Transfer learning using computational intelligence: a survey, Knowl.-Based Syst. 80 (2015) 14–23.
[31] V. Jayaram, M. Alamgir, Y. Altun, B. Schölkopf, M. Grosse-Wentrup, Transfer learning in brain-computer interfaces, IEEE Comput. Intell. Mag. 11 (1) (2016) 20–31.
[32] W. Lu, B. Liang, Y. Cheng, D. Meng, J. Yang, T. Zhang, Deep model based domain adaptation for fault diagnosis, IEEE Trans. Ind. Electron. 64 (3) (2017) 2296–2305.
[33] L. Wen, L. Gao, X. Li, A new deep transfer learning based on sparse auto-encoder for fault diagnosis, IEEE Trans. Syst. Man Cybern.: Syst. 49 (1) (2019) 136–144, https://doi.org/10.1109/TSMC.2017.2754287.
[34] L. Guo, Y. Lei, S. Xing, T. Yan, N. Li, Deep convolutional transfer learning network: a new method for intelligent fault diagnosis of machines with unlabeled data, IEEE Trans. Ind. Electron. (2018), https://doi.org/10.1109/TIE.2018.2877090, 1–1.
[35] X. Li, W. Zhang, Q. Ding, Cross-domain fault diagnosis of rolling element bearings using deep generative neural networks, IEEE Trans. Ind. Electron. (2018), https://doi.org/10.1109/TIE.2018.2868023, 1–1.
[36] H. Ocak, K.A. Loparo, F.M. Discenzo, Online tracking of bearing wear using wavelet packet decomposition and probabilistic modeling: a method for bearing prognostics, J. Sound Vib. 302 (4–5) (2007) 951–961.
[37] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE 77 (2) (1989) 257–286.
[38] J.A. Bilmes, A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models, Int. Comput. Sci. Inst. 4 (510) (1998) 126.
[39] Q. Miao, V. Makis, Condition monitoring and classification of rotating machinery using wavelets and hidden Markov models, Mech. Syst. Signal Process. 21 (2) (2007) 840–855.
[40] D. Wang, Q. Miao, Q. Zhou, G. Zhou, An intelligent prognostic system for gear performance degradation assessment and remaining useful life estimation, J. Vib. Acoust. 137 (2) (2015) 021004.
[41] M. Dong, D. He, P. Banerjee, J. Keller, Equipment health diagnosis and prognosis using hidden semi-Markov models, Int. J. Adv. Manuf. Technol. 30 (7–8) (2006) 738–749.
[42] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, J.W. Vaughan, A theory of learning from different domains, Mach. Learn. 79 (1–2) (2010) 151–175.
[43] P. Nectoux, R. Gouriveau, K. Medjaher, E. Ramasso, B. Chebel-Morello, N. Zerhouni, C. Varnier, PRONOSTIA: an experimental platform for bearings accelerated degradation tests, in: IEEE International Conference on Prognostics and Health Management, PHM’12, IEEE Catalog Number: CPF12PHMCDR, 2012, pp. 1–8.
[44] X. Jin, Y. Sun, Z. Que, Y. Wang, T.W. Chow, Anomaly detection and fault prognosis for bearings, IEEE Trans. Instrum. Meas. 65 (9) (2016) 2046–2054.
[45] S.J. Pan, I.W. Tsang, J.T. Kwok, Q. Yang, Domain adaptation via transfer component analysis, IEEE Trans. Neural Networks 22 (2) (2011) 199–210.
[46] M. Ghifary, W.B. Kleijn, M. Zhang, Domain adaptive neural networks for object recognition, in: Pacific Rim International Conference on Artificial Intelligence, Springer, 2014, pp. 898–904.
[47] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, V. Lempitsky, Domain-adversarial training of neural networks, J. Mach. Learn. Res. 17 (1) (2016) 2096–2030.
[48] M. Long, Y. Cao, J. Wang, M.I. Jordan, Learning transferable features with deep adaptation networks, arXiv:1502.02791.
[49] C. Sbarufatti, M. Corbetta, A. Manes, M. Giglio, Sequential Monte-Carlo sampling based on a committee of artificial neural networks for posterior state estimation and residual lifetime prediction, Int. J. Fatigue 83 (2016) 10–23.
[50] J. Deutsch, D. He, Using deep learning-based approach to predict remaining useful life of rotating components, IEEE Trans. Syst. Man Cybern.: Syst. 48 (1) (2017) 11–20.
[51] W. Peng, Z. Ye, N. Chen, Bayesian deep learning based health prognostics towards prognostics uncertainty, IEEE Trans. Ind. Electron., https://doi.org/10.1109/TIE.2019.2907440.