A novel bearing intelligent fault diagnosis framework under time-varying working conditions using recurrent neural network



ISA Transactions xxx (xxxx) xxx

Contents lists available at ScienceDirect

ISA Transactions journal homepage: www.elsevier.com/locate/isatrans

Research article

A novel bearing intelligent fault diagnosis framework under time-varying working conditions using recurrent neural network ∗

Zenghui An a, Shunming Li a, Jinrui Wang b, Xingxing Jiang c

a College of Energy and Power Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
b College of Mechanical and Electronic Engineering, Shandong University of Science and Technology, Qingdao, China
c School of Rail Transportation, Soochow University, Suzhou, China

Article info

Article history: Received 22 May 2018; Received in revised form 5 November 2019; Accepted 5 November 2019; Available online xxxx

Keywords: Fault diagnosis; Time-varying working condition; Data-driven; LSTM

Abstract
Normal operation of bearings is key to ensuring the reliability and safety of rotary machinery, which makes bearing fault diagnosis highly significant. However, the large amount of data collected by modern data acquisition systems and time-varying working conditions make it hard to diagnose faults using traditional methods. To overcome these difficulties, we propose a new intelligent fault diagnosis framework inspired by the infinitesimal method. The proposed model consists of three parts and can ignore the effect of different rotational speeds. First, each sample is segmented, and the dimension of every segment is extended by an input network to ensure adequate information memory space. Second, the classification information is stored and transferred in a long short-term memory (LSTM) network and passed to the third part. In this process, the working condition information is ignored owing to the function of the gate units. Finally, an output network gives the likelihood used to classify the health conditions. In addition, we propose a loss function that combines the outputs of all time steps and employ dropout to train the model, which increases the training efficiency and diagnosis ability. Bearing datasets under time-varying speeds and loads are used to verify the proposed method. The results show that our method achieves higher accuracy with a simpler structure and is superior to traditional methods in bearing fault diagnosis. Moreover, we give a physical interpretation of the proposed model. © 2019 ISA. Published by Elsevier Ltd. All rights reserved.

1. Introduction

Bearings are widely applied yet easily damaged components in machinery [1,2]. Normal operation of bearings is key to ensuring the reliability and safety of rotary machinery; research on bearing fault diagnosis is therefore both practical and necessary [3,4]. With the rapid development of sensors and the Internet of Things (IoT) [5,6], the amount of data collected by modern monitoring systems is huge, which marks the arrival of the era of mechanical big data [7]. In general, rotary machinery often runs at time-varying speed because of time-varying power and loads, which poses a great challenge for fault diagnosis [8]: a time-varying rotational speed causes smearing in the spectrum, so the fault characteristic frequencies can no longer be observed and detected.

Traditionally, order tracking is an effective technique for addressing this issue [9]. Its framework consists of three main steps: (1) rotational speed extraction, (2) resampling, and (3) order spectral analysis and fault diagnosis. Rotational speed extraction is the most important step because it directly determines the diagnosis accuracy. In this step, time–frequency analysis techniques are preferred over tachometer-based methods because the installation space for a tachometer is limited. In the second step, a mature resampling technique is employed to process the signal based on the extracted rotational speed. In the final step, order spectral analysis is applied to the processed signal, and the fault is diagnosed according to prior knowledge. For instance, Wang et al. [10] used a reference signal obtained from the current measured at the stator of the generator for vibration order tracking. Shi et al. [11] extracted the envelope by a windowed fractal dimension transform and used it to obtain a time–frequency representation by the short-time Fourier transform. Jiang et al. [12] proposed time–frequency ridge fusion and a logarithm transformation to track the targeted ridge curve reliably, which improved the diagnosis accuracy. However, the above studies may suffer from three deficiencies. (1) Much of the actual effort goes into designing algorithms that produce clear time–frequency representations from which the rotational speed can be extracted, so signal processing expertise is necessary; yet the extracted rotational speed, which costs considerable human labor, is only a by-product of fault diagnosis. (2) Even when a correct and clear rotational speed is obtained, it is difficult to diagnose the fault, because this depends on diagnostic expertise, and heavy background noise also affects the diagnosis result. (3) Most existing methods have a high computational cost, which makes processing big data difficult.

∗ Corresponding author. E-mail addresses: [email protected] (Z. An), [email protected] (S. Li), [email protected] (J. Wang), [email protected] (X. Jiang).

https://doi.org/10.1016/j.isatra.2019.11.010 0019-0578/© 2019 ISA. Published by Elsevier Ltd. All rights reserved.

Please cite this article as: Z. An, S. Li, J. Wang et al., A novel bearing intelligent fault diagnosis framework under time-varying working conditions using recurrent neural network. ISA Transactions (2019), https://doi.org/10.1016/j.isatra.2019.11.010.

Z. An, S. Li, J. Wang et al. / ISA Transactions xxx (xxxx) xxx

Recently, as the capability of computer hardware has quickly improved [13], various deep networks such as Artificial Neural Networks (ANN) [14,15], Deep Belief Networks (DBN) [16,17], Autoencoders (AE) [18], Restricted Boltzmann Machines (RBM) [19], Convolutional Neural Networks (CNN) [20–22] and Sparse Filtering [23] have been successfully applied to intelligent health monitoring of machinery. Jia et al. [18] adopted the frequency spectra of vibration signals as the input of an AE to diagnose rolling bearing faults. Wang et al. [24] introduced batch normalization into the AE to improve the efficiency of rolling bearing fault diagnosis and to make the AE model better suited to large amounts of vibration data. Jia et al. [25] proposed a framework called deep normalized CNN (DNCNN) with normalized layers and a weight normalization strategy, which helps improve the accuracy of fault diagnosis when only a few fault samples are available. Zhuang et al. [26] proposed a new intelligent fault diagnosis method, multi-scale deep CNN (MS-DCNN). Guo [27] adopted the continuous wavelet transform (CWT) to process raw vibration signals and employed a CNN for the fault diagnosis of gas turbines. However, under varying rotational speed, such data-driven models are difficult to employ, mainly because the training dataset is difficult to obtain. In practice, not only the time-varying rotational speed but also the time-varying acceleration can cause smearing of features. Therefore, training a model for fault diagnosis under varying rotational speed requires training samples collected under different rotational speeds and different accelerations, which is a tough task.
However, the difficulty of collecting a training dataset can be effectively reduced if the model is able to ignore one of the working conditions (rotational speed or acceleration). For example, if the model can ignore the acceleration, it can be trained using samples within a certain speed range, which are easy to obtain. Unfortunately, no such model has been reported. The infinitesimal method is a basic idea and an important tool for handling irregularly varying quantities, and it is widely applied in various fields, such as calculus in mathematics, Duhamel's integral in vibration theory and the finite element method in engineering [28,29]. Its key idea is that a changing curve can be regarded as the combination of a great number of small constant segments, which inspires us to use it for intelligent fault diagnosis under varying rotational speed. Following this idea, a varying rotational speed can be considered as a combination of small segments of constant rotational speed. Therefore, a micro model to extract features and a memory cell to store information are necessary. In this paper, we present a new intelligent fault diagnosis framework for bearings under time-varying rotational speed, inspired by the idea of the infinitesimal method. A recurrent neural network (RNN) with long short-term memory (LSTM) [30,31] is employed to realize this idea. The main contributions of our work are summarized as follows. (1) The idea of the infinitesimal method is introduced, and a novel intelligent fault diagnosis model under time-varying working conditions is proposed. (2) To increase the training efficiency and diagnosis ability, we propose a loss function that combines the outputs of all time steps. (3) We give a physical interpretation of the proposed model and study why it can ignore the changing working condition. The remainder of the paper is organized as follows.
In Section 2, the motivation from the infinitesimal method and other theoretical background are reviewed. The proposed method is presented in Section 3. In Sections 4 and 5, bearing datasets under time-varying speed and load are used to verify the proposed method. Section 6 elaborates the physical interpretation of the proposed model. Section 7 concludes the paper.

2. Theory background

2.1. Motivation of proposed method

To illustrate the infinitesimal method, we use the experimental data described in Section 4.1. The spectra and rotational speeds of the normal health condition under constant and varying rotational speed are shown in Fig. 1. When the rotational speed is a steady 1000 rpm, the characteristic frequencies are obvious and their amplitudes are very high, so it is quite easy to classify the health condition. However, when the rotating machinery runs at a time-varying speed (uniform deceleration), the fault features shift, which causes smearing of the spectrum [32]. Hence, the primary reason why traditional signal processing approaches are ineffective is that a sample under time-varying speed spans a large range of rotational speeds, as shown in Fig. 1. Reducing the speed range of varying-speed samples may therefore yield discriminative features similar to those of samples under a constant speed. To verify this assumption, the Kullback–Leibler (K–L) divergence [33] is used. The K–L divergence between two probability distributions P and Q is defined as

$$D(P \parallel Q) = \sum_{j=1}^{n} P(j) \log \frac{P(j)}{Q(j)} \tag{1}$$

where P and Q both have n elements. A low value of D(P ∥ Q) indicates high similarity between the two distributions. For the time-varying speed signal, because the rotational speed declines at a constant rate, as shown in Fig. 1, the length of the signal is equivalent to its speed range: a longer signal covers a larger speed range. We therefore reduce the speed range by reducing the number of sample points. For the time-varying rotational speed samples, we collect segments centered on 1000 rpm. For the constant speed signal, we take segments of the corresponding size to calculate the K–L divergence. Note that we replace the sum in Eq. (1) with the average value and call the result the average Kullback–Leibler (AKL) divergence, because each K–L divergence is calculated from samples of a different size. The results are shown in Fig. 2, where S1 and S2 represent the spectra of the constant speed and varying speed samples, respectively. In Fig. 2, all three curves decline as the sample size decreases. This means that a varying speed sample of small size can be regarded as a sample of constant speed because their features (such as spectra) are similar, which conforms to the infinitesimal method. Based on this idea, a sample under varying rotational speed can be considered as a combination of small segments under constant rotational speed, as shown in Fig. 3(a). However, it is almost impossible to train a model with samples at every possible constant speed. Therefore, applying the infinitesimal method in reverse, a micro model could be trained on a signal covering a certain speed range, as shown in Fig. 3(b), and then adopted to diagnose samples under arbitrarily varying speed. An important problem remains: if the micro model processes segments of small size, the segments contain too little information. To study this problem, Principal Component Analysis (PCA) is employed.
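To make Eq. (1) and its averaged variant concrete, here is a short NumPy sketch. It assumes that each magnitude spectrum is normalized to sum to 1 so that it can play the role of P or Q; the text does not state how the spectra were normalized. The synthetic constant-frequency tone and linearly decelerating sweep only stand in for the measured signals, and all function names are hypothetical.

```python
import numpy as np

def normalized_spectrum(x):
    # Magnitude spectrum scaled to sum to 1 so it can be treated as a
    # discrete distribution in Eq. (1); this normalization is an assumption.
    mag = np.abs(np.fft.rfft(np.asarray(x, dtype=float)))
    return mag / mag.sum()

def kl_divergence(p, q, eps=1e-12):
    # Eq. (1): D(P || Q) = sum_j P(j) * log(P(j) / Q(j)).
    # eps guards against log(0) and division by zero.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def akl_divergence(p, q, eps=1e-12):
    # AKL: the sum in Eq. (1) replaced by the mean over the n elements,
    # so that pairs of different sample sizes can be compared.
    return kl_divergence(p, q, eps) / len(p)

FS = 25_600  # sampling frequency reported in Section 4.1

def segment_pair(n, f0=1000.0, slope=-2000.0):
    # Hypothetical test signals: a constant tone at f0 and a sweep whose
    # instantaneous frequency f0 + slope * t passes through f0 at the centre.
    t = (np.arange(n) - n / 2) / FS
    const = np.sin(2 * np.pi * f0 * t)
    vary = np.sin(2 * np.pi * (f0 * t + 0.5 * slope * t ** 2))
    return normalized_spectrum(const), normalized_spectrum(vary)

# Shorter segments cover a narrower speed range around the centre point.
for n in (4000, 2000, 200):
    s1, s2 = segment_pair(n)
    print(n, akl_divergence(s1, s2))
```

Dividing by n rather than re-deriving a new measure keeps AKL proportional to the K–L divergence for a fixed size, which is the only property the comparison across sizes relies on.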
We use constant speed samples under different health conditions: normal condition (Nor), inner race fault (IF), roller fault (RF), outer race fault (OF) and concurrent faults in the outer race and roller (ORF), which are detailed in Section 4.1. The spectra of samples with 4000 and 200 data points are analyzed by PCA, and the results are shown in Fig. 4. In Fig. 4(a), although the features of the different health conditions are not fully separated, a basic trend of classification appears. In Fig. 4(b), however, the principal components are almost mixed together, indicating that the features of samples with 200 data points are difficult to use for classification. Therefore, we need not only a micro model to extract features, but also a feature memory cell to store the classifiable information continually. Inspired by the micro model with memory, an RNN with LSTM cells is employed for the proposed method in this paper.

Fig. 1. Spectrum and the rotational speeds of normal health condition vibration signals under constant rotational speed and time-varying rotational speed.

Fig. 2. AKL divergence between constant-speed and varying speed samples.

Fig. 3. Approximate rotational speed based on infinitesimal method: (a) Time-varying rotational speed, (b) Uniform acceleration and uniform deceleration.

Fig. 4. PCA of sample spectrum: (a) Samples with 4000 data points, (b) Samples with 200 data points.

2.2. RNN with LSTM cell

ANN and CNN have no memory of sequences, but RNN does. RNNs have therefore exhibited state-of-the-art performance on a wide range of complicated sequential problems, including signal processing, speech classification and video captioning [34,35]. The structure of the RNN is presented in Fig. 5. As the figure shows, there are three layers, i.e., the input layer, the hidden layer and the output layer, whose features are $x_t^{(i)}$, $h_t^{(i)}$ and $o_t^{(i)}$, respectively, where the superscript denotes the sample number and the subscript denotes the time step. U, V and W are weight matrices. The forward propagation of the standard RNN is defined as follows:

$$h_t^{(i)} = f\left(W h_{t-1}^{(i)} + U x_t^{(i)} + b_u\right) \tag{2}$$
$$o_t^{(i)} = V h_t^{(i)} + b_v \tag{3}$$

where $b_u$ and $b_v$ are the bias vectors and f(·) is the nonlinear activation function [36]. The standard RNN often suffers from the long-term dependency problem: as the sequence grows, the RNN becomes unable to learn to connect the information [37]. A popular solution to this problem is the LSTM network architecture. LSTM improves the RNN hidden layer, as shown in Fig. 6, where the yellow rectangles and the pink circles represent matrix and pointwise operations, respectively. As a gated RNN, LSTM has a forget gate $f_t^{(i)}$, an input gate $in_t^{(i)}$ and an output gate $out_t^{(i)}$. The operations in the LSTM cell are as follows:

$$f_t^{(i)} = \sigma\left[W_f\left(h_{t-1}^{(i)} + u_t^{(i)}\right) + b_f\right] \tag{4}$$
$$in_t^{(i)} = \sigma\left[W_{in}\left(h_{t-1}^{(i)} + u_t^{(i)}\right) + b_{in}\right] \tag{5}$$
$$out_t^{(i)} = \sigma\left[W_{out}\left(h_{t-1}^{(i)} + u_t^{(i)}\right) + b_{out}\right] \tag{6}$$
$$\hat{C}_t^{(i)} = \tanh\left[W_C\left(h_{t-1}^{(i)} + u_t^{(i)}\right) + b_C\right] \tag{7}$$
$$C_t^{(i)} = f_t^{(i)} \circ C_{t-1}^{(i)} + in_t^{(i)} \circ \hat{C}_t^{(i)} \tag{8}$$
$$h_t^{(i)} = out_t^{(i)} \circ \tanh\left(C_t^{(i)}\right) \tag{9}$$

where $u_t^{(i)}$ and $h_t^{(i)}$ are the input and hidden features, respectively. W and b are weights and biases; their subscripts f, in, out and C denote the forget gate, input gate, output gate and cell of the LSTM, respectively, and '◦' denotes the Hadamard product of two vectors. The sigmoid function (σ) is used to compose the gates, and the hyperbolic tangent function (tanh) is employed to extract features. In the LSTM cell, the memory cell $C_t^{(i)}$ is the key to solving the long-term dependency problem. As Fig. 6 shows, the dark green line represents the propagation of the memory cell, on which only a few linear operations are performed. The memory cell therefore acts as a main line along which information can be stored for a long time.

Fig. 5. Structures of the RNN.

Fig. 6. Structure of the LSTM cell.

2.3. Dropout

Neural networks inevitably suffer from overfitting. As mentioned in Section 2.1, for fault diagnosis under time-varying working conditions the training samples are scarce, which aggravates the overfitting problem. An appropriate method to overcome this problem is therefore necessary. Dropout has been well studied and proven useful for mitigating overfitting in practice [38]. It can be conveniently used in nearly any network trained by gradient descent. During training, dropout can be regarded as a Hadamard product layer: a feature is multiplied elementwise by a vector d whose elements d(j) follow a Bernoulli distribution, i.e., d(j) ∼ Bernoulli(p).

This means that features are randomly selected and set to 0 during training. The dropout operation is written as

$$\mathrm{input}^{(l)} = \mathrm{output}^{(l-1)} \circ d \tag{10}$$

3. Proposed method

The proposed method is based on the LSTM framework. The illustration and training flowchart of the model are shown in Fig. 7.

Fig. 7. Illustration of (a) proposed framework, and (b) training strategy.

3.1. Proposed framework

As shown in Fig. 7(a), the proposed model is divided into three parts: an input neural network, an LSTM network and an output neural network. First, the input network extends the dimension through a fully connected layer parameterized by a weight matrix U and a bias vector $b_u$. Following the infinitesimal method, the input size $N_{in}$ is quite small, which limits the information capacity. Therefore, to free the LSTM memory cell from this constraint, the small segment $x_t^{(i)}$ is extended to $u_t^{(i)}$ of dimension $N_h$, which ensures adequate information memory space. Second, LSTM cells are employed to build the recurrent framework. The feature $u_t^{(i)}$ extracted by the input network is combined with the previous hidden unit $h_{t-1}^{(i)}$ as the input information of the LSTM cell. This information contains new discriminative features, which will be remembered in the memory cell $C_t^{(i)}$. It is also the input of the gate units and influences their values, which control the information transmission in the LSTM cell. The forward propagation is detailed in Eqs. (4)–(9). As a result, the new memory cell $C_t^{(i)}$ and hidden unit $h_t^{(i)}$ are generated and transmitted to the next LSTM cell, until the final segment $x_T^{(i)}$ is input to the model and the corresponding hidden unit $h_T^{(i)}$ is generated. Note that all elements of the initial memory cell $C_0^{(i)}$ and hidden unit $h_0^{(i)}$ are 0. Third, the hidden unit $h_T^{(i)}$ is input to the output network, which has two layers. The final layer consists of K softmax classification units, where K is the number of health condition classes. They give the likelihood that the network assigns to each category k at time step T as

$$P(l = k)^{(T)} = \frac{e^{o_{T,k}^{(i)}}}{\sum_{j=1}^{K} e^{o_{T,j}^{(i)}}} \tag{11}$$

where the subscripts j and k index the elements of $o_T^{(i)}$, which is constructed as follows:

$$v_T^{(i)} = \mathrm{ReLU}\left(V_1 h_T^{(i)} + b_{v1}\right) \tag{12}$$
$$o_T^{(i)} = V_2 v_T^{(i)} + b_{v2} \tag{13}$$

where $V_1$, $V_2$ and $b_{v1}$, $b_{v2}$ are the weights and biases of the network layers, respectively. The Rectified Linear Unit (ReLU) is adopted to activate the feature $v_t^{(i)}$. In the same way, the hidden unit $h_t^{(i)}$ can be used to calculate the likelihood at any time step t.

Fig. 8. The arrangement of bearing test bench and bearing with fault.

Fig. 9. Time–frequency representation of normal health condition training sample.

3.2. Training strategy

The parameters to be trained are all the weight matrices {U, W_f, W_in, W_out, W_C, V_1, V_2} and the corresponding bias vectors {b_u, b_f, b_in, b_out, b_C, b_v1, b_v2}. In this section, we describe the training strategy for these parameters in detail; the whole procedure is shown in Fig. 7(b).

(1) Training dataset. Based on the infinitesimal method, the training dataset should contain information on most rotational speeds, whereas different accelerations do not affect the result. To train the parameters, samples within a certain speed range are adopted to compose the training dataset. Each signal is divided into many samples with s% overlap. M samples compose the training set $\{x^{(i)}, l^{(i)}\}_{i=1}^{M}$, where $x^{(i)} \in \mathbb{R}^{N \times 1}$ is the ith sample and $l^{(i)}$ is the health label of $x^{(i)}$.

(2) Cost function. First, the sample $x^{(i)}$ is successively divided into a segment set $\{x_t^{(i)}\}_{t=1}^{T}$, where T is an integer equal to $N/N_{in}$. Every segment $x_t^{(i)}$ is input to the network in turn, and the corresponding output set $\{o_t^{(i)}\}_{t=1}^{T}$ is constructed. Even though the health condition is given by the final output $o_T^{(i)}$, we adopt the whole output set $\{o_t^{(i)}\}_{t=1}^{T}$ to train the model, which supervises the parameters more efficiently. Every output $o_t^{(i)}$ is combined with the health label $l^{(i)}$ to give a single cost function $L_t^{(i)}$, which is the negative log-likelihood cost function [39]:

$$L_t^{(i)} = -\sum_{k=1}^{K} 1\{l^{(i)} = k\} \log \frac{e^{o_{t,k}^{(i)}}}{\sum_{j=1}^{K} e^{o_{t,j}^{(i)}}} \tag{14}$$

where 1{·} denotes the indicator function. The cost function of a sample is thus the sum of $L_t^{(i)}$ over all time steps, and the cost function L of the model is the mean of the cost functions of all samples:

$$L = \frac{1}{M} \sum_{i=1}^{M} \sum_{t=1}^{T} L_t^{(i)} \tag{15}$$

(3) Optimization method. The model is trained with the back-propagation through time algorithm [40] on the sequences in the training dataset. The adaptive moment estimation algorithm (Adam) is adopted to minimize L. To address overfitting, dropout is employed: the probabilities of dropping out neurons at the layers of features $u_t^{(i)}$ and $h_t^{(i)}$ are set to 1 − p. After training, the model can judge which health condition a sample belongs to at time step t according to the maximum probability in $o_t^{(i)}$.

4. Fault diagnosis of rolling bearing under time-varying rotational speed

4.1. Data description

As shown in Fig. 8, the vibration signals of rolling bearings are collected from a test bench consisting of an induction motor, four supporting pillow blocks, a tachometer and a coupling. The tested bearing is installed in the supporting pillow block farthest from the induction motor. An accelerometer is mounted on the supporting pillow block of the tested bearing to measure the vibration signals. The health conditions of the bearing include Nor, IF, RF, OF and ORF. The sampling frequency is 25.6 kHz. Because the proposed method can neglect the effect of different accelerations, the proposed model, trained only with samples in a certain speed range, can diagnose the health condition of samples under random rotational speeds. The training dataset therefore need not contain time-varying accelerations; it only needs to contain time-varying rotational speeds. To consider the extreme condition, the acceleration is kept constant, i.e., the training dataset is composed of vibration signals under a constant acceleration. Taking the training signal of the normal health condition as an example, its time–frequency representation is shown in Fig. 9. We divide the uniform-acceleration signals under the different health conditions into 1990 samples with 50% overlap, each containing 2000 data points.
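The sample preparation can be sketched in NumPy as below, assuming each recording is a 1-D array. It shows three steps: overlapping windows for the training samples (2000 points, 50% overlap), windows drawn at random start positions, as used for the testing datasets in the next paragraph, and the split of one sample into T = N/N_in consecutive segments from Section 3.2. The function names and the synthetic record are hypothetical, and the default N_in = 50 anticipates the value chosen in Section 4.2.

```python
import numpy as np

def overlapping_samples(signal, n=2000, overlap=0.5):
    """Cut a long record into samples of n points with fractional overlap."""
    step = int(round(n * (1 - overlap)))        # 50% overlap -> hop of 1000 points
    starts = range(0, len(signal) - n + 1, step)
    return np.stack([signal[s:s + n] for s in starts])

def random_samples(signal, count, n=2000, rng=None):
    """Draw `count` samples of n points at uniformly random start indices."""
    rng = np.random.default_rng() if rng is None else rng
    starts = rng.integers(0, len(signal) - n + 1, size=count)
    return np.stack([signal[s:s + n] for s in starts])

def to_segments(sample, n_in=50):
    """Split one sample into T = N / N_in consecutive, non-overlapping segments x_t."""
    if len(sample) % n_in:
        raise ValueError("N must be a multiple of N_in")
    return sample.reshape(-1, n_in)

# Usage with a synthetic record (not the paper's data):
record = np.arange(10_000, dtype=float)
train = overlapping_samples(record)              # shape (9, 2000)
test = random_samples(record, 4, rng=np.random.default_rng(1))
segs = to_segments(train[0])                     # shape (40, 50)
```

The overlapping split is deterministic and reproducible, while the random split mirrors the testing procedure; both raise an error from `np.stack` if the record is shorter than one sample.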

To test the method, two testing datasets, composed of varying speed signals and constant-speed signals respectively, are obtained. The signals under large speed oscillation compose the varying speed dataset. Their rotational speeds vary irregularly, as shown in Fig. 10, which allows the diagnosis ability under time-varying speed to be verified. To test robustness, the constant-speed signals under six drive motor speeds (1000, 1100, 1200, 1300, 1400 and 1500 rpm) are used to build the constant-speed testing dataset, which differs most in acceleration from the training dataset. We randomly sample segments of 2000 data points from all the signals as testing samples, composing a varying speed testing dataset of 1000 samples and a constant-speed dataset of 3500 samples.

4.2. Parameter selection of the proposed method

The proposed method has several parameters, i.e., the input dimension $N_{in}$, the hidden dimension $N_h$ and the probability 1 − p of dropping out neurons. We investigate the selection of each of these parameters in turn. Note that 10 trials are carried out for each experiment in the following studies to reduce the effects of randomness. First, the selection of the input dimension $N_{in}$ is investigated. The dropout probability is set to 30%, the dimension of feature $v_t^{(i)}$ is 50, and the hidden dimension $N_h$ is 100 when $N_{in} \le 100$ and 200 when $N_{in} = 200$. The results are shown in Fig. 11. The number of training epochs is determined as the point at which the training and testing accuracies become basically steady. In the figure, the bars show the minimum accuracies and the error bars are the standard deviations. The testing accuracies on the varying speed dataset are over 97.1% when $N_{in} \le 100$, meaning that the proposed method can diagnose the dataset under large speed oscillation with various input dimensions.
Note that as $N_{in}$ increases beyond 50, the testing accuracies on the varying speed dataset decrease, which supports the infinitesimal method. The testing accuracies on the constant-speed dataset are slightly lower than those on the varying speed dataset under the same experimental parameters, because the constant-speed dataset differs most in angular acceleration from the training dataset. Nevertheless, these testing accuracies are over 96.9% when $N_{in} \le 100$, which means that our method is robust to various angular accelerations. Trading off testing accuracy against the average number of training epochs, we choose $N_{in}$ = 50.

Fig. 10. Rotational speeds of varying speed dataset.

Fig. 11. Diagnosis results using various input dimension of proposed method.

Fig. 12. Diagnosis results using various hidden dimension of proposed method.

Then, the selection of $N_h$ is investigated, as shown in Fig. 12. When the hidden dimension is small, such as 50, the average number of training epochs is quite high. This suggests that the classification ability of a small $N_h$ may not be strong, so more training time is needed to fit the parameters. In addition, as the hidden dimension increases, the standard deviations decrease, which indicates that the features are steadier. According to the results in Fig. 12, the recommended $N_h$ is 100. Finally, the probability 1 − p of dropping out neurons is investigated; here, p = 1 indicates that dropout is not used. The diagnosis results are shown in Fig. 13. The testing accuracies are quite low when dropout is not employed (below 92.9% on the varying speed dataset and below 92.3% on the constant-speed dataset), which shows that dropout is necessary. In addition, as the keep probability decreases, the testing accuracies increase, but the average number of training epochs grows. Both accuracy and training epochs must therefore be considered, and we choose 0.7 as the keep probability as a tradeoff between diagnosis accuracy and training epochs.
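With the parameters selected above (N_in = 50, N_h = 100, dim(v_t) = 50, keep probability p = 0.7), the forward pass of Section 3.1 and the multi-step cost of Eqs. (14)–(15) can be sketched in NumPy. This is a minimal, hypothetical illustration, not the authors' code. Several details are assumptions because the text does not fix them: K = 5 classes with 0-based labels, a linear input network, random initialization, dropout applied to u_t and to the copy of h_t fed to the output network, and activations scaled by p at test time as in the original dropout formulation. The gates use W(h_{t-1} + u_t) exactly as written in Eqs. (4)–(7).

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_H, N_V, K = 50, 100, 50, 5   # Section 4.2 choices; K = 5 classes (assumed)
KEEP_PROB = 0.7                      # keep probability p chosen above

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(scale=0.1):
    # Hypothetical random initialization (the text does not specify one).
    p = {"U": rng.normal(0, scale, (N_H, N_IN)), "b_u": np.zeros(N_H)}
    for g in ("f", "in", "out", "C"):
        p["W_" + g] = rng.normal(0, scale, (N_H, N_H))
        p["b_" + g] = np.zeros(N_H)
    p["V1"], p["b_v1"] = rng.normal(0, scale, (N_V, N_H)), np.zeros(N_V)
    p["V2"], p["b_v2"] = rng.normal(0, scale, (K, N_V)), np.zeros(K)
    return p

def dropout(a, train):
    # Eq. (10): a ∘ d with d(j) ~ Bernoulli(p) during training.
    if train:
        return a * (rng.random(a.shape) < KEEP_PROB)
    return a * KEEP_PROB  # test-time scaling (assumption, original dropout paper)

def forward(p, x, train=False):
    """Logits o_t for every time step of one sample x, shape (T, K)."""
    h, c = np.zeros(N_H), np.zeros(N_H)              # h_0 = C_0 = 0
    logits = []
    for x_t in x.reshape(-1, N_IN):                  # T = N / N_in segments
        u = dropout(p["U"] @ x_t + p["b_u"], train)  # input network (linear: assumption)
        z = h + u                                    # W(h_{t-1} + u_t) in Eqs. (4)-(7)
        f = sigmoid(p["W_f"] @ z + p["b_f"])         # Eq. (4)
        i = sigmoid(p["W_in"] @ z + p["b_in"])       # Eq. (5)
        o = sigmoid(p["W_out"] @ z + p["b_out"])     # Eq. (6)
        c_hat = np.tanh(p["W_C"] @ z + p["b_C"])     # Eq. (7)
        c = f * c + i * c_hat                        # Eq. (8)
        h = o * np.tanh(c)                           # Eq. (9)
        h_d = dropout(h, train)                      # dropout on the output path only
        v = np.maximum(0.0, p["V1"] @ h_d + p["b_v1"])  # Eq. (12) at step t
        logits.append(p["V2"] @ v + p["b_v2"])          # Eq. (13) at step t
    return np.stack(logits)

def log_softmax(o):
    o = o - o.max(axis=-1, keepdims=True)            # numerically stable Eq. (11)
    return o - np.log(np.exp(o).sum(axis=-1, keepdims=True))

def cost(p, xs, labels, train=True):
    """Eq. (15): mean over samples of the Eq. (14) NLL summed over all T steps."""
    per_sample = [-log_softmax(forward(p, x, train))[:, l].sum()  # indicator keeps class l
                  for x, l in zip(xs, labels)]
    return float(np.mean(per_sample))

def predict_per_step(p, x):
    # Decision at each time step t: the class with maximum probability in o_t.
    return forward(p, x, train=False).argmax(axis=-1)

# Usage with synthetic inputs (not the paper's data):
params = init_params()
xs = rng.normal(size=(3, 2000))      # three samples of N = 2000 points
labels = [0, 2, 4]                   # 0-based class indices
print(forward(params, xs[0]).shape)  # (40, 5)
print(cost(params, xs, labels))
```

The sketch only evaluates Eq. (15); training it with back-propagation through time and Adam, as the text describes, would normally be done in an automatic-differentiation framework rather than by hand-written gradients.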


Fig. 13. Diagnosis results using various keep probability of proposed method.

Fig. 14. (a) Real-time testing accuracies of dataset Ss using proposed method, and (b) the class likelihood of an example.

4.3. Performance testing of the proposed method
To further test the proposed method, we design the following two experiments. To illustrate that our method can ignore the effect of large rotational speed oscillations, we build a testing dataset Ss consisting of 1000 long sequences of 200 000 data points (close to 8 s), each of which contains at least one large rotational speed oscillation according to Fig. 10. The diagnosis results on Ss and the class likelihood of an example are shown in Fig. 14. The testing accuracies are quite low for the early parts of the sequences, because the trained model can hardly extract obvious, discriminative and steady features from such short sequences. The class likelihood shows the corresponding phenomenon: the likelihoods of several health conditions are unstable. However, after a very short time, the accuracies become quite high. More importantly, the later testing accuracies and likelihoods are both quite steady even though large rotational speed oscillations occur. This means that changes of rotational speed do not affect the diagnosis result.
In actual condition monitoring, the fault status is not fixed and generally worsens as the mechanical system operates. Therefore, we create a testing dataset Sh with time-varying health conditions using the constant-speed signals mentioned before. Two examples of Sh are shown in Fig. 15. To simulate possible samples under time-varying health conditions, the {Nor→RF→ORF} and {Nor→OF→ORF} signals under the same speed are selected to compose the testing samples. Each sample therefore contains three segments of 10 000 data points, and τ1, τ2 are the cutoff points between them. In total, we obtain 1200 testing samples of two types of varying health condition under six rotational speeds. The diagnosis results on Sh are shown in Fig. 16 and the class likelihoods of the two examples are shown in Fig. 17. When diagnosing the first segments of the samples, the performance is similar to that on Ss. When the category changes, the proposed method gives the correct health condition after a short period of wrong judgments. As shown in Fig. 17, the class likelihood is steady except near the cutoff points. Thus, the proposed method performs well when diagnosing sequences with varying health conditions.
4.4. Comparing with related methods
To show the effectiveness and superiority of the proposed method, we compare it with the methods in related work. To verify the motivation of the proposed method, we test all the methods on samples of different sizes. In Section 4.1, the sample length is 2000. We segment each sample into 2 and 4 shorter samples to compose datasets with sample sizes of 1000 and 500, respectively. The testing datasets are composed of signals under large speed oscillation. The related methods are described as follows.
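The real-time accuracy curves of Figs. 14 and 16 can be read as a per-time-step accuracy over the whole testing set. The sketch below, written under stated assumptions, shows how an Sh sample could be composed from three equal-speed segments, how the ground-truth label of every time step follows from the cutoff points τ1 and τ2, and how a per-step accuracy is then obtained. It assumes that each time step reads n_in = 50 new data points (the segment length mentioned in Section 6.2); the function names are hypothetical, and the per-step predictions would come from the trained model, which is not shown.

```python
from typing import List, Sequence, Tuple


def compose_varying_health_sample(
    segments: Sequence[Sequence[float]],
) -> Tuple[List[float], List[int]]:
    """Concatenate equal-speed segments of successive health conditions.

    Returns the combined signal and the cutoff indices (tau_1, tau_2, ...)
    at which each new health condition starts.
    """
    signal: List[float] = []
    cutoffs: List[int] = []
    for k, segment in enumerate(segments):
        if k > 0:
            cutoffs.append(len(signal))  # first index of the new condition
        signal.extend(segment)
    return signal, cutoffs


def step_labels(segment_labels: Sequence[str], segment_len: int, n_in: int) -> List[str]:
    """Ground-truth label of each time step when n_in new points are read per step.

    A step is labelled by the segment that contains its last data point; when
    segment_len is divisible by n_in, every step lies inside a single segment.
    """
    n_steps = len(segment_labels) * segment_len // n_in
    return [segment_labels[((t + 1) * n_in - 1) // segment_len] for t in range(n_steps)]


def realtime_accuracy(
    predictions: Sequence[Sequence[object]], labels: Sequence[Sequence[object]]
) -> List[float]:
    """Accuracy at each time step across all testing samples.

    predictions[s][t] is the class output for sample s after time step t.
    """
    n_samples = len(predictions)
    n_steps = len(predictions[0])
    return [
        sum(predictions[s][t] == labels[s][t] for s in range(n_samples)) / n_samples
        for t in range(n_steps)
    ]


# Usage: one {Nor -> RF -> ORF} sample built from three 10 000-point segments.
signal, cutoffs = compose_varying_health_sample([[0.0] * 10_000] * 3)
labels = step_labels(["Nor", "RF", "ORF"], segment_len=10_000, n_in=50)
print(cutoffs, len(labels), labels[199], labels[200])  # [10000, 20000] 600 Nor RF
```

With 10 000-point segments and 50 points per step, the label switches exactly at steps 200 and 400, so a dip in the per-step accuracy right after those steps corresponds to the short period of wrong judgments near τ1 and τ2.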


Fig. 15. Examples of combined testing samples: (a) {Nor→RF→ORF} and (b) {Nor→OF→ORF}.

Fig. 16. Real-time testing accuracies of testing dataset Sh .

Fig. 17. Class likelihoods of examples.

Artificial Neural Network (ANN) is a basic deep learning framework. The frequency spectra of the samples are the input. The architecture is set as (spectra size)-300-200-100-5, where spectra size is the dimension of the input. The loss function is a softmax (cross-entropy) loss. The activation functions of the hidden layers are Rectified Linear Units (ReLU). The maximum training epoch is 500. The learning rate and the keep probability of the last hidden layer are 0.001 and 0.7, respectively.
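A minimal pure-Python sketch of the ANN baseline's forward pass, assuming fully connected layers of sizes (spectra size)-300-200-100-5, ReLU hidden activations, inverted dropout with keep probability 0.7 applied only to the last hidden layer during training, and a softmax output with cross-entropy loss. The He-style weight initialization, the spectra size of 1000 and all function names are assumptions for illustration; the training loop (500 epochs at learning rate 0.001) is not shown.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_layers(sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    """Dense layers with He-style Gaussian weights (an assumption) and zero biases."""
    rng = random.Random(seed)
    layers: List[Layer] = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        std = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def dense(layer: Layer, x: Sequence[float]) -> List[float]:
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def ann_forward(
    x: Sequence[float],
    layers: List[Layer],
    keep_prob: float = 0.7,
    train: bool = False,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """ReLU hidden layers, dropout on the last hidden layer only, softmax output."""
    *hidden, output = layers
    h = list(x)
    for i, layer in enumerate(hidden):
        h = [max(0.0, v) for v in dense(layer, h)]
        if train and i == len(hidden) - 1:
            rng = rng or random.Random()
            # inverted dropout: keep each unit with prob keep_prob and rescale survivors
            h = [v / keep_prob if rng.random() < keep_prob else 0.0 for v in h]
    return softmax(dense(output, h))


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Softmax cross-entropy loss of one sample with an integer class label."""
    return -math.log(probs[label])


# Usage with a hypothetical spectra size of 1000 (one-sided spectrum of a 2000-point sample).
spectra_size = 1000
layers = init_layers([spectra_size, 300, 200, 100, 5])
x = [random.random() for _ in range(spectra_size)]
probs = ann_forward(x, layers)  # inference: dropout disabled
loss = cross_entropy(ann_forward(x, layers, train=True, rng=random.Random(1)), label=0)
```

Because the dropout is inverted (survivors are divided by the keep probability during training), no rescaling is needed at test time, which is why inference simply skips the mask.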


Fig. 18. Arrangement of the bearing test bench and a bearing with a fault.

Sparse filtering (SF), an unsupervised feature learning method, is employed for comparison. Its parameters are set following Ref. [23]; accordingly, the input and output dimensions of sparse filtering are both 100. Each training sample is randomly divided into 50 segments when we train the model. The training datasets are composed of the same training signals as those of the proposed method, and for the different sample sizes, the signals are alternately divided into samples with 50% overlap.
For the Deep Belief Network (DBN), the frequency spectra of the samples are the input, and the parameters are set following Ref. [18]. We use three hidden layers, and the architecture is set as (spectra size)-300-200-100-5. Each RBM is pre-trained for 100 epochs, and then the whole network is fine-tuned. The learning rate is 0.001 and the maximum training epoch is 500.
For Convolutional Neural Networks (CNN), we follow the method in Ref. [27]. The continuous wavelet transform scalograms of the samples are used as inputs. A CNN with four convolution layers, four pooling layers and two fully connected layers then classifies the health condition, and its loss function is a softmax loss. The CNN architecture is as follows: 20@5 × 5 kernels in the first convolution layer, 30@5 × 5 kernels in the second convolution layer, 50@4 × 4 kernels in the third and fourth convolution layers, and 2 × 2 max pooling kernels in all four pooling layers. The fully connected architecture is set as (feature size)-100-5, where feature size is the output dimension of the last pooling layer. After some trials, the best testing accuracy is obtained with a learning rate of 0.01 and a maximum training epoch of 500.
The average accuracies of all the methods are shown in Table 1. For ANN, the performance is unsatisfactory. When the sample size is small (500), the testing accuracy is 80.4% while the training accuracy is only 95.7%. This means that small samples carry less discriminative information.
As the sample size increases, the testing accuracy decreases, even though the training accuracies reach 100%. The reason is that longer samples contain more acceleration information that differs between the training and testing datasets, and ANN is unable to neglect it. Sparse filtering shows barely satisfactory performance in only one case: when the sample size equals 1000, the testing accuracy reaches 90.2%. However, the testing accuracy declines when the sample size exceeds 1000. The results of DBN and CNN are

Table 1 Classification comparison.

| Method | Accuracy | Sample size 500 | Sample size 1000 | Sample size 2000 |
|---|---|---|---|---|
| ANN | Training | 95.7% | 100% | 100% |
| | Testing | 80.4% | 41.9% | 34.1% |
| SF [23] | Training | 70.4% | 97.2% | 100% |
| | Testing | 52.7% | 90.2% | 86.86% |
| DBN [18] | Training | 96.2% | 100% | 100% |
| | Testing | 83.1% | 51.7% | 45.8% |
| CNN [27] | Training | 96.4% | 100% | 100% |
| | Testing | 86.9% | 83.5% | 79.3% |
| Proposed method | Training | 100% (trained once for all sample sizes) | | |
| | Testing | 93.5% | 98.4% | 99.4% |

Fig. 19. Time–frequency representation of training sample with normal health condition under random load.

similar to those of ANN: for both, the testing accuracy decreases as the sample size increases. Although CNN is more accurate than the other deep learning methods (ANN and DBN), its best average testing accuracy is only 86.9%, obtained at a sample size of 500. In contrast, the proposed method is trained only once and achieves higher testing accuracies that rise with the sample size, which verifies that our method can diagnose samples under time-varying speed.
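The SF baseline described above learns features with an unsupervised sparse-filtering objective. The sketch below assumes the standard formulation (linear features passed through a soft absolute value, each feature normalized across examples, each example normalized across features, then an L1 sum that is minimized); that Ref. [23] uses exactly this form, the random drawing of the 50 segments, and all helper names are assumptions. It also shows the 50%-overlap windowing used to build the training samples. The optimizer and the gradient are omitted.

```python
import math
import random
from typing import List, Optional, Sequence


def overlapping_windows(signal: Sequence[float], length: int, overlap: float = 0.5) -> List[List[float]]:
    """Cut a signal into windows of `length` points that overlap by the fraction `overlap`."""
    step = max(1, int(length * (1.0 - overlap)))
    return [list(signal[s:s + length]) for s in range(0, len(signal) - length + 1, step)]


def random_segments(
    sample: Sequence[float], n_segments: int = 50, seg_len: int = 100,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Draw n_segments random seg_len-point segments from one sample (assumed scheme)."""
    rng = rng or random.Random()
    starts = [rng.randrange(len(sample) - seg_len + 1) for _ in range(n_segments)]
    return [list(sample[s:s + seg_len]) for s in starts]


def _unit(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def sparse_filtering_objective(
    W: Sequence[Sequence[float]], X: Sequence[Sequence[float]], eps: float = 1e-8
) -> float:
    """Sparse-filtering cost for weights W (n_out x n_in) and examples X (m x n_in)."""
    # 1) linear features f = W x, passed through a soft absolute value
    F = [[math.sqrt(sum(w * v for w, v in zip(row, x)) ** 2 + eps) for x in X] for row in W]
    # 2) normalize each feature (row) to unit L2 norm across examples
    F = [_unit(row) for row in F]
    # 3) normalize each example (column) to unit L2 norm across features
    columns = [_unit(col) for col in zip(*F)]
    # 4) L1 sum of the non-negative normalized features, to be minimized
    return sum(sum(col) for col in columns)


# Usage: 50%-overlap samples of 2000 points, 50 random 100-point segments, 100 -> 100 features.
rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(6000)]
samples = overlapping_windows(signal, 2000)  # starts at 0, 1000, ..., 4000
segments = random_segments(samples[0], rng=rng)
W = [[rng.gauss(0.0, 0.1) for _ in range(100)] for _ in range(100)]
print(len(samples), len(segments), round(sparse_filtering_objective(W, segments), 3))
```

After step 3 every example has unit L2 norm, so its L1 norm lies between 1 and the square root of the output dimension; minimizing the sum therefore pushes each example toward a few active features, which is the sparsity the method is named for.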


Table 2 Classification comparison.

| Method | Accuracy | Sample size 500 | Sample size 1000 | Sample size 2000 |
|---|---|---|---|---|
| ANN | Training | 92.5% | 99.6% | 100% |
| | Testing | 75.3% | 36.5% | 32.2% |
| SF [23] | Training | 68.2% | 96.9% | 99.8% |
| | Testing | 53.1% | 86.3% | 71.6% |
| DBN [18] | Training | 96.5% | 100% | 100% |
| | Testing | 77.4% | 63.6% | 55.4% |
| CNN [27] | Training | 97.1% | 100% | 100% |
| | Testing | 83.7% | 78.4% | 72.0% |
| Proposed method | Training | 100% (trained once for all sample sizes) | | |
| | Testing | 91.2% | 97.1% | 97.6% |

Fig. 20. Time–frequency representation of varying-speed testing sample with normal health condition under random load.

5. Fault diagnosis of bearing under time-varying rotational speed and loads
5.1. Data description
A bearing fault dataset under time-varying rotational speed and loads is used to verify the proposed model. The bearing test bench is shown in Fig. 18; it consists of a diesel engine, 5 bearing seats, 3 couplings, a brake disk and other components. The brake disk applies artificial variable loads. The rotational speed of the diesel engine ranges from 850 rpm to 2600 rpm. The data are collected at the tested bearing seats. The health conditions include four types: normal condition (Nor), inner race fault (IF), roller fault (RF) and outer race fault (OF). The sampling frequency used in the experiment is 12.8 kHz. As in Section 4.1, the training dataset is composed of vibration signals under a constant acceleration. The difference in this section is that the diesel engine operates under random load. Take the training signal with normal health condition as an example; its time–frequency representation is shown in Fig. 19. As the figure shows, when the load is applied, the rotational speed decreases and the amplitude increases. We alternately divide the signals under different health conditions into 2000 samples with 50% overlap, each containing 2000 data points. To test the method, two testing datasets, composed of varying-speed signals and constant-speed signals respectively, are collected. To test the effect of load, both kinds of testing signals are recorded under random loads. For the varying-speed testing dataset, the rotational speed of every health condition varies irregularly. Take the varying-speed testing signal with normal health condition as an example; its time–frequency representation is shown in Fig. 20. While the diesel engine runs, time-varying loads are applied. For the constant-speed testing signals, the rotational speeds are the idle speed (850 rpm) and the maximum speed (2600 rpm), and the loads are random. For example, the time–frequency representations of

constant-speed testing samples with normal health condition are shown in Fig. 21. In this figure, when the load is applied, the rotational speed decreases and the amplitude increases, so the rotational speed fluctuates. We randomly sample segments of 2000 data points from all the signals and obtain 1000 samples each for the varying-speed testing dataset and the constant-speed testing dataset.
5.2. Testing results and comparison with related methods
The selection of Nin is investigated in this section, because the trend across input dimensions reflects the motivation of the infinitesimal method. The results are shown in Fig. 22. The other parameters are set to the recommended values in Section 4.2. All training accuracies are over 99.3%. When Nin ≤ 100, the testing accuracies on the varying-speed dataset are over 95.3%, which verifies that our method can diagnose samples under time-varying loads. Note that when Nin increases beyond 50, the testing accuracies on the varying-speed dataset decrease, which supports the motivation of the infinitesimal method. The testing accuracies on the constant-speed dataset are slightly lower than those on the varying-speed dataset, because the constant-speed dataset differs most from the training dataset in acceleration. Nevertheless, the testing accuracies are over 94.7% when Nin ≤ 100, which means that the proposed method is robust to various accelerations and loads. We also apply the comparison methods of Section 4.4, whose details are listed there, to illustrate the superiority of the proposed method. The average accuracies of all the methods are shown in Table 2. For the deep learning comparison methods, i.e., ANN, DBN and CNN, the results follow the same pattern: as the sample size increases, the testing accuracy decreases and the training accuracy increases.
Among the deep learning methods, CNN obtains the highest average accuracy. When CNN diagnoses the samples with 500

Fig. 21. Time–frequency representation of constant-speed testing sample with normal health condition.


Fig. 22. Diagnosis results using various input dimensions.

Fig. 23. Average AMMD-S and AMMD-C of all the studied time steps.

sample points, the testing accuracy is 83.7%. Besides, when the sample size is 1000, SF obtains a testing accuracy of 86.3%, which is the best result apart from the proposed method. The proposed method, in contrast, is trained only once and obtains higher accuracy for every testing sample size.
6. Understanding of the proposed model
6.1. The proposed model shares a physical interpretation
To gain more insight into what the proposed model has learned, we investigate its internal representations at several time steps of the samples. Changing rotational speed is known to strongly affect feature extraction. The central question is therefore how the proposed model ignores the effect of the changing speed and still makes the correct decision. For convenience, the constant-speed signals mentioned in Section 4.1 are used, because they reflect the differences between rotational speeds. To compare activation vectors, the Maximum Mean Discrepancy (MMD), a distance between distributions [41], is used. Because the features of different layers have different dimensions, we use the Average Maximum Mean Discrepancy (AMMD). With the identity mapping as the feature map, the AMMD between distributions P and Q is defined as

$$\mathrm{AMMD}(P,Q)=\frac{1}{d}\left\|\frac{1}{n_P}\sum_{i=1}^{n_P}P^{(i)}-\frac{1}{n_Q}\sum_{i=1}^{n_Q}Q^{(i)}\right\| \qquad (16)$$

where $P^{(i)}, Q^{(i)} \in \mathbb{R}^{1\times d}$ are the activation vectors drawn from P and Q, $n_P$ and $n_Q$ are their numbers, and d is the feature dimension.
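Eq. (16), together with the pair-averaged statistics AMMD-S and AMMD-C defined in the next paragraph and the ratio k = AMMD-C/AMMD-S used later in this section, can be computed directly from activation vectors. The sketch assumes that the norm in Eq. (16) is the Euclidean norm and that the activation vectors of one layer at one time step are grouped by (health condition, rotational speed); the grouping dictionary, the toy vectors and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Group = Tuple[str, int]  # hypothetical key: (health condition, rotational speed in rpm)


def ammd(P: Sequence[Sequence[float]], Q: Sequence[Sequence[float]]) -> float:
    """Eq. (16): (1/d) times the Euclidean norm of the difference of the mean vectors."""
    d = len(P[0])
    mean_p = [sum(col) / len(P) for col in zip(*P)]
    mean_q = [sum(col) / len(Q) for col in zip(*Q)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_p, mean_q))) / d


def _average(values: List[float]) -> float:
    return sum(values) / len(values)


def ammd_s(groups: Dict[Group, List[List[float]]]) -> float:
    """Mean AMMD over pairs with the same health condition but different speeds."""
    return _average([ammd(groups[a], groups[b]) for a, b in combinations(groups, 2)
                     if a[0] == b[0] and a[1] != b[1]])


def ammd_c(groups: Dict[Group, List[List[float]]]) -> float:
    """Mean AMMD over pairs with different health conditions but the same speed."""
    return _average([ammd(groups[a], groups[b]) for a, b in combinations(groups, 2)
                     if a[0] != b[0] and a[1] == b[1]])


def ratio_k(groups: Dict[Group, List[List[float]]]) -> float:
    """k = AMMD-C / AMMD-S; k > 1 means health condition is more separable than speed."""
    return ammd_c(groups) / ammd_s(groups)


# Toy 2-D features: the first coordinate encodes the fault, the second the speed.
groups = {
    ("Nor", 850): [[0.0, 0.0]],
    ("Nor", 2600): [[0.0, 1.0]],
    ("IF", 850): [[2.0, 0.0]],
    ("IF", 2600): [[2.0, 1.0]],
}
print(ammd_s(groups), ammd_c(groups), ratio_k(groups))  # 0.5 1.0 2.0
```

Pairs that differ in both condition and speed are excluded from both averages, so each statistic isolates one factor, as in the definitions of AMMD-S and AMMD-C.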

We calculate two types of AMMD according to how the samples differ. Note that P and Q always refer to the features of the same layer at the same time step.
AMMD-S: the rotational speeds of P and Q differ, but their health condition is the same. We calculate the AMMD of all qualifying feature pairs and average them; a larger AMMD-S therefore means a higher discrimination of speed.
AMMD-C: the health conditions of P and Q differ, but their rotational speed is the same. Again, we calculate the AMMD of all qualifying feature pairs and average them; a larger AMMD-C therefore means a higher discrimination of health condition.
We investigate the features of each layer at several time steps (1, 3, 5, ..., 15). First, the AMMD-S and AMMD-C averaged over all the studied time steps are shown in Fig. 23; they represent the discrimination of the features at each layer. Both AMMDs increase when the data x is transformed into u, which means that the distances between activation vectors increase. Therefore, the input network increases the discrimination between activation vectors equally, without distinguishing between factors. Likewise, the LSTM network decreases and the output network increases the discrimination, but AMMD-S and AMMD-C change by different degrees. Clearly, once the features have passed the input network, the discrimination of health condition is higher than the discrimination of speed. Next, we investigate the discrimination between activation vectors of each layer at different time steps, which reflects how the features change within our model. The results are shown in Fig. 24. For AMMD-C, the distances between activation vectors of u and x remain unchanged as the time step increases, because the input network has no recurrent connections.
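The analysis in this section attributes distinct roles to the forget, input and output gates and to the weight matrix WC. For reference, a minimal pure-Python sketch of one step of a standard LSTM cell with a forget gate (the formulation of Gers et al. [36]) follows; we assume the proposed model uses this standard form, while the toy sizes, the initialization and the helper names are hypothetical. C is the memory cell and h the hidden output, matching the feature-layer names used here.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vec = List[float]
Params = Dict[str, Tuple[List[Vec], Vec]]  # gate name -> (weights, biases)


def sigmoid(v: float) -> float:
    # numerically stable logistic function
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def affine(weights: List[Vec], biases: Vec, z: Sequence[float]) -> Vec:
    return [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(weights, biases)]


def init_lstm(n_in: int, n_hidden: int, seed: int = 0) -> Params:
    """Small uniform weights acting on [h_{t-1}, x_t] and zero biases (assumption)."""
    rng = random.Random(seed)
    n_z = n_hidden + n_in
    return {
        gate: ([[rng.uniform(-0.1, 0.1) for _ in range(n_z)] for _ in range(n_hidden)],
               [0.0] * n_hidden)
        for gate in ("f", "i", "C", "o")
    }


def lstm_step(params: Params, x_t: Sequence[float], h_prev: Sequence[float],
              c_prev: Sequence[float]) -> Tuple[Vec, Vec, Dict[str, Vec]]:
    z = list(h_prev) + list(x_t)  # concatenation [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(*params["f"], z)]          # forget gate
    i = [sigmoid(v) for v in affine(*params["i"], z)]          # input gate
    c_tilde = [math.tanh(v) for v in affine(*params["C"], z)]  # candidate built with W_C
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]  # memory cell C
    o = [sigmoid(v) for v in affine(*params["o"], z)]          # output gate
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]           # hidden output h
    return h, c, {"f": f, "i": i, "o": o}


def run_lstm(params: Params, steps: Sequence[Sequence[float]],
             n_hidden: int) -> Tuple[List[Vec], List[Vec]]:
    """Feed the per-segment inputs one time step at a time; return h and C at every step."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    hs, cs = [], []
    for x_t in steps:
        h, c, _ = lstm_step(params, x_t, h, c)
        hs.append(h)
        cs.append(c)
    return hs, cs


# Usage with toy sizes: 5 time steps of a 3-dimensional input, 4 hidden units.
params = init_lstm(n_in=3, n_hidden=4)
hs, cs = run_lstm(params, [[0.1, 0.2, 0.3]] * 5, n_hidden=4)
print(len(hs), len(hs[0]))  # 5 4
```

The update c = f·c_prev + i·c_tilde is what makes the gate roles discussed here separable: a forget gate near zero discards the previous memory (as when the health condition changes), the input gate and W_C decide what is written, and the output gate decides how much of tanh(C) reaches h.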
Nevertheless, in the other feature layers, the discrimination of health condition grows as the time step increases, which means that discriminative information about the fault type is continuously stored in the memory cell. In this process, the weight matrix WC and the input gate of the LSTM cell play an important role, because they directly determine which features are extracted and stored in the memory cell. However, as shown in Fig. 24(b), the AMMD-S between activation vectors of C also increases with the time step. This means that the memory cell stores discriminative features for several factors, and these factors differ in the weight they receive in the cell. Fortunately, the output gate plays a pivotal role: in Fig. 24(b), once the time step exceeds 13, the AMMD-S of activation vectors in feature layer h no longer increases. This indicates that the


Fig. 24. (a) AMMD-C and (b) AMMD-S of each layer at different time steps.

Fig. 25. Ratio k of each layer at different time steps.

discrimination of speed is suppressed when the features pass the output gate, which answers the question posed at the beginning of this section. In addition, we can speculate on the role of the forget gate. As shown in Fig. 16, when the health condition varies, the

efficiency would have been low if the model relied only on the weight matrix WC and the input gate to extract features, which contradicts Fig. 16. A more efficient way is to reject the previous information, which is the likely role of the forget gate. Then, the


Fig. 26. Feature visualization via t-SNE: feature representations extracted from raw signal in several layers at time steps 1, 5, 10 and 15 respectively for all testing samples.

AMMDs of activation vectors v and o increase further and change basically linearly with those of h. Finally, to compare the model's ability to extract features of different health conditions with its ability to extract features of different rotational speeds, we use a simple ratio k = AMMD-C/AMMD-S. A larger k indicates a stronger ability to classify the health condition than the rotational speed. The results are displayed in Fig. 25. For the feature layers u and x, k nearly equals 1, which means that classifying the health condition is as difficult as classifying the rotational speed. Therefore, according to Figs. 23 and 25, the input network increases the discrimination of all factors. In feature layer C, k begins to increase with time, which indicates that the abilities to classify the two factors begin to diverge. For feature layer h, Fig. 23 showed that the output gate decreases the discrimination. However, as Fig. 25 shows, all the ratios in feature layer h are larger than those in feature layer C, which means that, relative to rotational speed, the features in h discriminate health condition better than the features in C. In addition, the slopes of the lines indicate that, as the time step increases, the discrimination of the features in h grows faster than that of the features in C. The slopes of the lines for v and o remain basically unchanged, which supports the conclusion that the output network only increases the distance between activation vectors. We can approximately regard the rotational speed as representing all factors other than health condition, because it is always the most important influence on the diagnosis result. Therefore, the physical interpretation of the proposed model can be generalized as follows.
Input network: It has two main actions. First, it extends the dimension to ensure that there is adequate space

to store information in the memory cell. Second, it increases the distances between activation vectors, which means that the discrimination of all factors becomes higher.
LSTM network: First, the forget gate rejects the information in the previous memory cell when the health condition changes. Second, the weight matrix WC and the input gate determine which features are extracted and stored in the memory cell. In this process, discriminative information about all factors is remembered, but the emphasis is on health condition. Third, the output gate selects the features that help classify the health condition and suppresses all the other factors.
Output network: It is analogous to the input network. It further increases the discrimination and compiles the features into categorical vectors by dimensionality reduction.
Because of these abilities, the proposed model can ignore the influence of all the other factors, such as rotational speed, and make the correct decision on health condition.
6.2. Networks visualizations
We use t-SNE [42] to visualize and compare the features. The results of diagnosing the varying-speed dataset in Section 4.1 are shown in Fig. 26. It is difficult to identify discriminative features in feature layer u, because the input layer can hardly extract obvious, discriminative and steady features from only 50 data points. For feature layer C, samples with the same health condition are grouped closer together as the time step increases. This means that the trained input gate and weight matrix WC play an important role. First, the amplitudes of unclassifiable features are close to zero when the last memory


cell enters the trained forget gate, and only the discriminative features pass the gate. Second, the trained input gate selects the obvious features extracted from the segments and adds them to the memory cell. Therefore, discriminative information accumulates in the memory cell as the time step increases. Third, the trained output gate selects features from the discriminative information again. This is why the vectors in feature layer h are grouped slightly closer together than those in feature layer C. Besides, in feature layer o, the discrimination between activation vectors is the largest, which ensures the high accuracy.
7. Conclusion
Inspired by the infinitesimal method, an LSTM-based intelligent fault diagnosis framework for time-varying rotational speed is proposed in this paper. We first describe and verify the motivation of the proposed method and then present the method itself. Bearing datasets under time-varying speed and load are used to verify it. The application results show that our method achieves higher accuracy with a simpler structure and is superior to traditional methods in bearing fault diagnosis. Furthermore, a physical interpretation of what the network has learned is given using MMD and t-SNE. First, the forget gate rejects the information in the previous memory cell when the health condition changes. Then, the weight matrix WC and the input gate determine which features are extracted and stored in the memory cell. Finally, the output gate selects the features that help classify the health condition and suppresses all the other features. Therefore, the proposed model can ignore the influence of all the other factors, such as rotational speed, and make the correct decision on health condition.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (51675262), and also by the Fundamental Research Funds for the Central Universities (NP2018304), the Major National Science and Technology Projects (2017-IV0008-0045), the Advance Research Field Fund Project of China (6140210020102) and the China Postdoctoral Science Foundation (2019T120456).
References
[1] Luo M, Li C, Zhang X, Li R, An X. Compound feature selection and parameter optimization of ELM for fault diagnosis of rolling element bearings. ISA Trans 2016;65:556–66.
[2] He Q, Ding X. Sparse representation based on local time–frequency template matching for bearing transient fault feature extraction. J Sound Vib 2016;370:424–43.
[3] Lu S, He Q, Zhao J. Bearing fault diagnosis of a permanent magnet synchronous motor via a fast and online order analysis method in an embedded system. Mech Syst Signal Process 2017;113:36–49.
[4] Saidi L, Ben Ali J, Fnaiech F. Application of higher order spectral features and support vector machines for bearing faults classification. ISA Trans 2015;54:193–206.
[5] Wang Z, Han Z, Gu F, Gu JX, Ning S. A novel procedure for diagnosing multiple faults in rotating machinery. ISA Trans 2015;55:208–18.
[6] Yin S, Li X, Gao H, Kaynak O. Data-based techniques focused on modern industry: An overview. IEEE Trans Ind Electron 2015;62:657–67.


[7] Sinha JK, Elbhbah K. A future possibility of vibration based condition monitoring of rotating machines. Mech Syst Signal Process 2013;34:231–40.
[8] Lu S, Wang X. A new methodology to estimate the rotating phase of a BLDC motor with its application in variable-speed bearing fault diagnosis. IEEE Trans Power Electron 2018;33:3399–410.
[9] Jiang X, Li S. A dual path optimization ridge estimation method for condition monitoring of planetary gearbox under varying-speed operation. Measurement 2016;94:630–44.
[10] Wang J, Peng Y, Qiao W. Current-aided order tracking of vibration signals for bearing fault diagnosis of direct-drive wind turbines. IEEE Trans Ind Electron 2016;63:6336–46.
[11] Shi J, Liang M, Guan Y. Bearing fault diagnosis under variable rotational speed via the joint application of windowed fractal dimension transform and generalized demodulation: A method free from prefiltering and resampling. Mech Syst Signal Process 2016;68–69:15–33.
[12] Jiang X, Li S, Wang Q. A study on defect identification of planetary gearbox under large speed oscillation. Math Probl Eng 2016;2016:1–18.
[13] Li Y, Kurfess TR, Liang SY. Stochastic prognostics for rolling element bearings. Mech Syst Signal Process 2000;14:747–62.
[14] Paya BA, Esat I, Badi M. Artificial neural networks based fault diagnosis of rotating machinery using wavelet transforms as a preprocessor. Mech Syst Signal Process 1997;11:751–65.
[15] Rafiee J, Arvani F, Harifi A, Sadeghi MH. Intelligent condition monitoring of a gearbox using artificial neural network. Mech Syst Signal Process 2007;21:1746–54.
[16] Shao H, Jiang H, Wang F, Wang Y. Rolling bearing fault diagnosis using adaptive deep belief network with dual-tree complex wavelet packet. ISA Trans 2017;69:187–201.
[17] Chen Z, Li W. Multisensor feature fusion for bearing fault diagnosis using sparse autoencoder and deep belief network. IEEE Trans Instrum Meas 2017;1–10.
[18] Jia F, Lei Y, Lin J, Zhou X, Lu N. Deep neural networks: A promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data. Mech Syst Signal Process 2016;72–73:303–15.
[19] Liao L, Jin W, Pavel R. Enhanced restricted Boltzmann machine with prognosability regularization for prognostics and health assessment. IEEE Trans Ind Electron 2016;63:7076–83.
[20] Guo X, Chen L, Shen C. Hierarchical adaptive deep convolution neural network and its application to bearing fault diagnosis. Measurement 2016;93:490–502.
[21] Janssens O, Slavkovikj V, Vervisch B, Stockman K, Loccufier M, Verstockt S, et al. Convolutional neural network based fault detection for rotating machinery. J Sound Vib 2016;377:331–45.
[22] Hu Z-X, Wang Y, Ge M-F, Liu J. Data-driven fault diagnosis method based on compressed sensing and improved multi-scale network. IEEE Trans Ind Electron 2019;1.
[23] Lei Y, Jia F, Lin J, Xing S, Ding SX. An intelligent fault diagnosis method using unsupervised feature learning towards mechanical big data. IEEE Trans Ind Electron 2016;63:3137–47.
[24] Wang J, Li S, An Z, Jiang X, Qian W, Ji S. Batch-normalized deep neural networks for achieving fast intelligent fault diagnosis of machines. Neurocomputing 2019;329:53–65.
[25] Jia F, Lei Y, Lu N, Xing S. Deep normalized convolutional neural network for imbalanced fault classification of machinery and its understanding via visualization. Mech Syst Signal Process 2018;110:349–67.
[26] Zhuang Z, Qin W. Intelligent fault diagnosis of rolling bearing using one-dimensional Multi-Scale Deep Convolutional Neural Network based health state classification. In: IEEE international conference on networking sensing and control; 2018.
[27] Sheng G, Tao Y, Wei G, Chen Z. A novel fault diagnosis method for rotating machinery based on a convolutional neural network. Sensors 2018;18:1429.
[28] Hackbusch W. Numerical tensor calculus. Acta Numer 2014;23:651–742.
[29] Hong J, He X, Zhang D, Zhang B, Ma Y. Vibration isolation design for periodically stiffened shells by the wave finite element method. J Sound Vib 2018;419:90–102.
[30] Graves A, Liwicki M, Fernandez S, Bertolami R, Bunke H, Schmidhuber J. A novel connectionist system for unconstrained handwriting recognition. IEEE Trans Pattern Anal Mach Intell 2009;31:855–68.
[31] Zhang X-Y, Xie G-S, Liu C-L, Bengio Y. End-to-end online writer identification with recurrent neural network. IEEE Trans Hum Mach Syst 2017;47:285–92.
[32] Urbanek J, Barszcz T, Sawalhi N, Randall R. Comparison of amplitude-based and phase-based methods for speed tracking in application to wind turbines. Metrol Meas Syst 2011;18:295–304.
[33] Bu Y, Zou S, Liang Y, Veeravalli VV. Estimation of KL divergence: Optimal minimax rate. IEEE Trans Inf Theory 2018;64:2648–74.


[34] Su B, Lu S. Accurate recognition of words in scenes without character segmentation using recurrent neural network. Pattern Recognit 2017;63:397–405.
[35] Schuster M, Paliwal K. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45:2673–81.
[36] Gers F, Schmidhuber J, Cummins F. Learning to forget: Continual prediction with LSTM. Neural Comput 2000;12:2451–71.
[37] Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5:157–66.
[38] Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15:1929–58.
[39] Wang J, Li S, Jiang X, Cheng C. An automatic feature extraction method and its application in fault diagnosis. J Vibroeng 2017;19:2521–33.
[40] Werbos PJ. Backpropagation through time: what it does and how to do it. Proc IEEE 1990;78:1550–60.
[41] Borgwardt KM, Gretton A, Rasch M, Smola AJ. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 2006;22:49–57.
[42] van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res 2008;9:2579–605.
