A deep convolutional neural networks model for intelligent fault diagnosis of a gearbox under different operational conditions


Measurement 145 (2019) 94–107

Contents lists available at ScienceDirect

Measurement journal homepage: www.elsevier.com/locate/measurement

Guangqi Qiu, Yingkui Gu ⇑, Quan Cai

School of Mechanical and Electrical Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, PR China

Article info

Article history: Received 6 November 2018; Received in revised form 10 May 2019; Accepted 17 May 2019; Available online 27 May 2019

Keywords: Gearbox; DCNN; Intelligent fault diagnosis; Vibration experiment; Feature extraction; SVM

Abstract

For intelligent fault diagnosis of a gearbox using deep convolutional neural networks (DCNNs), we performed a gearbox vibration experiment. To understand the effects of different operational conditions and a cumulative degradation of the operational process of the gearbox, we collected the vertical and horizontal vibration signals of five different degradation states under three different operational conditions (rotational speeds of 1000 r/min, 1050 r/min and 1090 r/min). Using feature extraction methods, such as time-domain analysis, frequency-domain analysis, and wavelet packet decomposition, a feature space with 52 features was constructed. The dimensionality of the extracted features was reduced to 4 by principal component analysis (PCA), with contribution rates of 51.55%, 20.45%, 12.33%, and 5.99%, respectively. To verify the superiority of DCNNs, the performance was compared to a support vector machine (SVM) classifier, and the hyper-parameters of the SVM classifier were optimized using a grid search technique. Results show that the vertical vibration signals are correlated with the degradation of the gear, and the identification accuracy is increased by imposing a certain load. DCNNs have been shown to achieve a higher accuracy than the SVM classifier, indicating that DCNNs are a more suitable method for solving a multi-state fault identification problem. Additionally, by inputting the raw signals directly, the gearbox intelligent fault diagnosis, based on DCNNs, has also achieved a higher accuracy with a lower computational time cost.

© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

The gearbox, as an indispensable universal component in mechanical equipment, plays an important role in connecting and transferring power, and has been widely applied in various industrial fields, such as aerospace, electric power, metallurgy, and energy [1]. A gear is a component that fails frequently, due to its complex structure and harsh working environment, often resulting in the mechanical equipment breaking down. A survey [2] shows that up to 80% of faults in transmission machinery are caused by gear failure. Gears also undergo a cumulative degradation process over time due to inevitable fatigue, and their operational conditions always vary. Therefore, it is necessary to detect a fault in the gear as early as possible to prevent, or at least minimize, the effect of the continued degradation. To render the gears more efficient and reliable, the effects of different operational conditions and cumulative degradation on the operational process of gears should be made clear.

⇑ Corresponding author. E-mail address: [email protected] (Y. Gu). https://doi.org/10.1016/j.measurement.2019.05.057 0263-2241/© 2019 Elsevier Ltd. All rights reserved.

A variety of approaches have been applied to the fault diagnosis of gearboxes [3–5]. Most notably, Lei et al. [6] introduced planetary gearboxes in terms of transmission structures, operation behaviors, characteristic frequencies, and test rigs, and reviewed several methods for the fault diagnosis of planetary gearboxes, such as modeling, signal processing, and intelligent diagnosis. Liang et al. [7] summarized the dynamic modeling of gearbox faults and its applications, covering mesh stiffness evaluation, damage modeling, fault diagnosis techniques, effects of transmission paths, and validation methods. They also noted that limited work in dynamic modeling has been reported on varying operating conditions. Salameh et al. [8] compared the available fault detection approaches for gearboxes in wind turbines, such as lubrication analysis, acoustic emission analysis, and vibration analysis, and noted that vibration analysis had been the most successful approach for fault detection in rotating machinery. Typical artificial intelligence algorithms, such as the self-organizing neural network [9], SVM [10], multimodal deep support vector classification [11], deep random forest [12], genetic algorithms [13], blind source separation [14], fuzzy logic [15], k-nearest neighbor algorithms [16], hidden Markov models [17], hidden semi-Markov models [18], and Bayesian algorithms [19], have also been applied to the fault diagnosis of gearboxes. To the best of our knowledge, there are two main categories of approach for the fault diagnosis of gearboxes: data-driven and physical model-based methods. Although these methods have been successfully applied in many applications, they still have many disadvantages in signal processing and classification, such as the high degree of expertise required by physical model-based methods and the sufficient historical data required by data-driven methods. Additionally, with the help of monitoring techniques, a large amount of data is now easy to collect in the real world. 
The challenges of intelligent fault diagnosis are no longer achieving a higher performance and learning more information with fewer training samples, but are now focused on increasing the performance and reducing the model complexity and computational burden caused by the large amount of raw data.

To address these problems, deep learning approaches have been gaining more attention due to their modeling and representational ability. They have also shown better performance in many fields, such as image classification and retrieval, object detection and tracking, semantic segmentation, human pose estimation, and speech recognition [20]. Deep learning approaches are divided into the following categories: auto-encoders, deep belief networks, deep Boltzmann machines, convolutional neural networks (CNNs), and recurrent neural networks [21]. The CNN is one of the most notable deep learning approaches [22]. Fukushima et al. [23] took the neocognitron as a special form of CNN, based on the concept of local receptive fields. LeCun et al. [24,25] first applied the CNN to handwritten digit recognition, achieving good performance. Jing et al. [26] noted that many researchers recognized the feature learning advantage of deep learning, and in 2015 it was applied to learning features from vibration signals. Gu et al. [27] reviewed the recent advances in CNNs and introduced various applications of CNNs. Guo et al. [28] proposed a health indicator construction method based on CNNs. Lu et al. [29] proposed a hierarchical convolutional network for intelligent fault diagnosis of bearings. For fault diagnosis of gearboxes, Jiao et al. [30] proposed a multivariate encoder information-based convolutional neural network method, Wang et al. [31] proposed an intelligent diagnosis scheme based on generative adversarial deep learning neural networks, and Han et al. [32] applied the CNN to fault diagnosis under varying speeds.

There are two important reasons that the DCNN has become popular in fault diagnosis and prognosis. The first is that a DCNN can automatically extract features from the raw input data. Local receptive fields are introduced into convolution operations to address the classification performance problem caused by losing feature information. The second is that DCNNs achieve a better performance with a lower training time than other variants of neural network architectures. In 2012, Krizhevsky et al. [33] proposed the AlexNet model, which used a DCNN method. In the following years, various outstanding DCNNs have been reported, including AlexNet, network in network, VGG, GoogLeNet, and ResNet [34]. For fault diagnosis and prognosis, Jia et al. [35] proposed a normalized DCNN for imbalanced fault diagnosis, Zhang et al. [36] applied DCNNs to bearing fault diagnosis in a noisy environment under differing working loads, and Li et al. [37] proposed a prognosis method for turbine engines based on DCNNs. In summary, there is little literature on the fault diagnosis of gearboxes using CNNs, and even less on the intelligent fault diagnosis of gearboxes using DCNNs. For the above reasons, this paper focuses on the effects of different operational conditions on intelligent fault diagnosis of a gearbox using DCNNs, and a gearbox vibration experiment is carried out to collect the vertical and horizontal vibration signals of different degradation states under different operational conditions.
Reliable vertical and horizontal vibration signals are ensured by reasonable sensor selection, calibration, and an IFDI (Instrument Fault Detection and Isolation) scheme in the data acquisition phase. Using signal processing methods, such as time-domain analysis, frequency-domain analysis, and wavelet packet decomposition, a feature space is constructed, and the extracted features are reduced in dimension using the PCA method. To verify the superiority of DCNNs, their performance is compared with that of the SVM classifier, and the hyper-parameters of the SVM classifier are optimized using a grid search technique.

The work in this paper is twofold: (1) we carry out a gearbox experimental study on the effect of the continued degradation caused by fatigue, and determine the vibration signals that are more correlated with the degradation of the gear; (2) the diagnosis performance of DCNNs under different working conditions is compared using either raw signals or features extracted from the raw signals as the input information, and the superiority of the DCNN is also validated against the SVM method. The rest of this paper is organized as follows: Section 2 gives an overview of the DCNN algorithm; Section 3 introduces the proposed intelligent diagnosis approach based on DCNNs; Section 4 describes the gearbox vibration experiment; Section 5 discusses the performance of DCNNs under different working conditions in detail; and Section 6 concludes the paper.

2. DCNN algorithm

The DCNN, composed of multiple-layer convolutional neural networks, is a kind of trainable feed-forward neural network. A general DCNN consists of an input layer, convolutional layers, pooling layers (also known as sampling layers), fully connected layers, and an output layer; different kinds of layers play different roles. As shown in Fig. 1, the first layer of a typical DCNN is the input layer, which is usually followed by a structure that combines multiple convolutional and pooling layers. The last layer is a fully connected layer, and a Softmax classifier is employed to classify and achieve fault diagnosis. There are two stages in developing fault diagnosis using DCNNs: feature learning and classification.
The first stage consists of the convolutional, pooling, and fully connected layers, and the last stage contains the Softmax classifier.

The input layer preprocesses the raw data as they are fed into the neural network. To avoid low training performance caused by the large amount of raw data, the training sets of raw data are zero-centered, so that the mean of each dimension of the input data is 0, and normalized, so that all data lie in the same magnitude range. The processed data can then be entered into a DCNN for fault diagnosis.

The convolutional layer is the most important layer in a DCNN. It generates feature maps by convolving the input with filters, each composed of a set of weights. Different filters can be used to obtain different feature maps. Suppose that $M$ feature maps are used as the input and $N$ filters are employed for the convolutional operation. The output feature maps $C_j$ of the $j$th layer can then be obtained as follows:

$$C_j = f\left(\sum_{i} x_i * W_{ij} + b_j\right) \tag{1}$$

where $f$ is the activation function, $x_i$ is the $i$th input map, $*$ denotes the convolutional operator, $W_{ij}$ is the weight of the connection between the $i$th input map and the $j$th output map, and $b_j$ is the bias term. Assuming the size of the filter is $a \times a$, the number of all the parameters in a convolutional layer can be calculated as follows:

$$Q = N \times (a \times a \times M + 1) \tag{2}$$
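As a minimal sketch of Eqs. (1) and (2), the following pure-Python code computes the output maps of one convolutional layer and counts its parameters. It is an illustration under stated assumptions, not the implementation used in the paper: the function names are hypothetical, the filter slides with stride 1 and no padding, the sliding product is computed without flipping the filter (as most deep learning libraries do), and ReLU is assumed as the activation function $f$.

```python
def relu(y):
    """Assumed activation function f: max(0, y)."""
    return max(0.0, y)


def correlate_valid(x, w):
    """Slide the a-by-a filter w over the 2-D map x (stride 1, no padding).

    Assumption: the filter is not flipped, i.e. this is cross-correlation.
    """
    a = len(w)
    rows, cols = len(x) - a + 1, len(x[0]) - a + 1
    return [
        [
            sum(x[r + p][c + q] * w[p][q] for p in range(a) for q in range(a))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def conv_layer(inputs, weights, biases, activation=relu):
    """Eq. (1): C_j = f(sum_i x_i * W_ij + b_j).

    inputs  : list of M input maps (each a list of rows)
    weights : weights[i][j] is the a-by-a filter linking input i to output j
    biases  : list of N bias terms, one per output map
    """
    outputs = []
    for j, bias in enumerate(biases):
        # One filtered map per input map, summed element-wise below.
        partial = [correlate_valid(x, weights[i][j]) for i, x in enumerate(inputs)]
        rows, cols = len(partial[0]), len(partial[0][0])
        outputs.append([
            [activation(sum(p[r][c] for p in partial) + bias) for c in range(cols)]
            for r in range(rows)
        ])
    return outputs


def conv_param_count(m, n, a):
    """Eq. (2): Q = N x (a x a x M + 1)."""
    return n * (a * a * m + 1)


# Hypothetical example: one 3x3 input map, one 2x2 filter, bias 5.
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
w = [[1, 0], [0, -1]]
feature_maps = conv_layer([x], [[w]], [5.0])
```

With these illustrative values, each output entry is the difference between an element and its lower-right neighbour plus the bias, and `conv_param_count(3, 8, 5)` shows how quickly the count grows with the filter size and the number of maps.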

In a traditional artificial neural network, every connection between the input and hidden layers has its own parameter, which makes the number of parameters of the hidden layer too large and training difficult. Fortunately, the number of parameters and the training time cost of a DCNN are effectively reduced by the local receptive field and weight sharing mechanisms, which reduces the complexity of the DCNN.

Fig. 1. A basic architecture of a DCNN.

(1) Local receptive field

The local receptive field is an information processing method inspired by the visual perceptual system in biology. In mechanical fault vibration signals, the correlation between neighboring parts of the signal is higher than that between distant parts. A local receptive field in the convolutional layer means that each neuron does not need to perceive the overall signal, only a local part of it. The full representation of the vibration signal is then obtained by merging all the local receptive information in the next layer.

(2) Weight sharing mechanism

A small part of the signal has the same feature map as the rest of the signal, meaning that the feature learned in the small part of the signal can be applied to the other parts of the signal, i.e. the same learned feature can be used in all bands of the entire signal. For example, a small window is randomly selected as a learning sample in a vibration signal, and the features learned from this small window can be employed as a detector applied to the entire signal. The different features at any band of the entire signal can be calculated by convolving the raw signal with the features learned from the small window.

Common activation functions include three saturating nonlinear functions (sigmoid, softsign, and tanh) and one non-saturating nonlinear function, the rectified linear unit (ReLU). The ReLU is one of the most notable activation functions because it converges faster than the saturating nonlinear functions during gradient descent training. These four activation functions are as follows.

$$
\begin{cases}
\text{sigmoid}: & f = \dfrac{1}{1 + e^{-y}} \\
\text{softsign}: & f = \dfrac{y}{1 + |y|} \\
\text{tanh}: & f = \dfrac{e^{y} - e^{-y}}{e^{y} + e^{-y}} \\
\text{ReLU}: & f = \max(0, y)
\end{cases}
\tag{3}
$$

where $y$ is the input of the activation function.

Although the local receptive field and weight sharing in the convolutional layer effectively reduce the cost of computing, the learning of DCNNs still faces a heavy computational burden, especially for large input data. To accelerate the learning process, a pooling layer is added to the DCNN. The pooling layer is a sub-sampling process that aims to merge similar local features. It lowers the dimension of the feature maps and the number of parameters while maintaining translational invariance, which also improves the robustness of feature learning. Average pooling and max pooling are two common pooling methods. Scherer et al. [38] compared these two pooling methods, and found that max pooling had a faster convergence speed and a better generalization ability. Therefore, this paper adopts the max pooling method, which is expressed as:

$$P_j = \max_{C_j \in S} C_j \tag{4}$$

where $S$ is the pooling window. Assuming that the size of the input feature map is $c \times c$ and the size of the pooling region is $s \times s$, the size of the feature maps $C_1$ in the convolutional layer and the size of the feature maps $S_1$ in the pooling layer can be calculated as:

$$C_1 = (c - a + 1) \times (c - a + 1) \tag{5}$$

$$S_1 = \frac{c - a + 1}{s} \times \frac{c - a + 1}{s} \tag{6}$$
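The activation functions of Eq. (3), the max pooling of Eq. (4), and the size formulas of Eqs. (5) and (6) can be sketched in pure Python as follows. This is only an illustration: the function names are hypothetical, the pooling windows are assumed to be non-overlapping (stride equal to $s$), and $s$ is assumed to divide $c - a + 1$ exactly, which is what Eq. (6) implies.

```python
import math


def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))


def softsign(y):
    return y / (1.0 + abs(y))


def tanh_activation(y):
    # Written out as in Eq. (3); math.tanh is the numerically safer choice
    # for large |y|, where math.exp can overflow.
    return (math.exp(y) - math.exp(-y)) / (math.exp(y) + math.exp(-y))


def relu(y):
    return max(0.0, y)


def max_pool(feature_map, s):
    """Eq. (4) over non-overlapping s-by-s windows (assumed stride = s)."""
    rows, cols = len(feature_map) // s, len(feature_map[0]) // s
    return [
        [
            max(feature_map[r * s + p][c * s + q] for p in range(s) for q in range(s))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def conv_output_side(c, a):
    """Side length of the map in Eq. (5): c - a + 1."""
    return c - a + 1


def pool_output_side(c, a, s):
    """Side length of the map in Eq. (6): (c - a + 1) / s, assumed integral."""
    return (c - a + 1) // s


# Hypothetical 4x4 feature map pooled with a 2x2 window.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 8],
    [7, 6, 3, 2],
]
pooled = max_pool(feature_map, 2)
```

For instance, under these assumptions a 28 × 28 input with a 5 × 5 filter gives a 24 × 24 convolutional map, and a 2 × 2 pooling region halves each side to 12.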


The last layers of the DCNN are fully connected layers, and the Softmax classifier follows the last convolutional and pooling layers. Their purpose is to classify the features extracted from the raw data. Assuming the lengths of the input and output vectors are $M$ and $N$, respectively, the number of parameters in a fully connected layer can be calculated as follows:

$$Q = M \times N + N \tag{7}$$
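A minimal sketch of a fully connected layer feeding a Softmax classifier is given below, together with the parameter count of Eq. (7). The function names, the weight layout (one row of $M$ weights per output), and the example values are assumptions made for illustration; subtracting the maximum score before exponentiating is a common numerical safeguard that does not change the Softmax result.

```python
import math


def dense_param_count(m, n):
    """Eq. (7): Q = M x N + N (weights plus one bias per output)."""
    return m * n + n


def dense_forward(x, weights, biases):
    """Fully connected layer: weights[j] holds the M weights of output j."""
    return [
        sum(w_k * x_k for w_k, x_k in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Normalize exponentiated scores so that they sum to 1."""
    shift = max(scores)  # numerical safeguard, leaves the result unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: 2 input features mapped to 3 output states.
x = [1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
biases = [0.0, 0.0, 0.5]
scores = dense_forward(x, weights, biases)
probabilities = softmax(scores)
predicted_state = probabilities.index(max(probabilities)) + 1  # 1-based label
```

As an example of Eq. (7), a layer that maps a hypothetical 128-element feature vector onto the five degradation states considered in this paper would have 128 × 5 + 5 = 645 parameters.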

Softmax regression, which extends logistic regression from the two-state to the multi-state problem, is usually the first choice of classifier [39]. Logistic regression is an algorithm suited to the two-state classification problem. The training set consists of $k$ labeled samples $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(k)}, y^{(k)})\}$; the state labels are $y^{(i)} \in \{0, 1\}$, and the input features are $x^{(i)} \in \mathbb{R}^{n+1}$. We assume the logistic regression function is as follows:

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}} \tag{8}$$

where $\theta$ are the model parameters that are trained to minimize the cost function $J(\theta)$,

$$J(\theta) = -\frac{1}{k}\left[\sum_{i=1}^{k} \left( y^{(i)} \log h_\theta\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta\left(x^{(i)}\right)\right) \right)\right] \tag{9}$$

For multi-state classification problems, we assume there are $n$ states and $n$ corresponding state labels in a Softmax classifier. The training set is $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(k)}, y^{(k)})\}$, and the state labels are $y^{(i)} \in \{1, 2, \ldots, n\}$. For a given training sample $x$ of $n$ classes, the probability of state $i$ is $p(y = i \mid x)$, and the output of the Softmax regression is as follows:

$$h_\theta(x) = \begin{bmatrix} P(y = 1 \mid x; \theta) \\ P(y = 2 \mid x; \theta) \\ \vdots \\ P(y = n \mid x; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{n} e^{x\theta_j}} \begin{bmatrix} e^{x\theta_1} \\ e^{x\theta_2} \\ \vdots \\ e^{x\theta_n} \end{bmatrix} \tag{10}$$

where $\theta_1, \theta_2, \ldots, \theta_n$ are the model parameters, and $\sum_{j=1}^{n} e^{x\theta_j}$ plays the role of normalization. Thus, the loss function $J'(\theta)$ is as follows:

$$J'(\theta) = -\frac{1}{k}\left[\sum_{i=1}^{k} \sum_{j=1}^{n} 1\left\{y^{(i)} = j\right\} \log \frac{e^{x^{(i)}\theta_j}}{\sum_{l=1}^{n} e^{x^{(i)}\theta_l}}\right] \tag{11}$$

where $1\{\cdot\}$ is the indicator function, and $\theta_1, \theta_2, \ldots, \theta_n$ are the model parameters that are trained to minimize the cost function $J'(\theta)$ to achieve Softmax classification.

Since the 1980s, it has been established that neural networks with multiple hidden layers can be trained by back-propagation using stochastic gradient descent (SGD). Each unit in a neural network is a relatively smooth function of its input and internal weights. The gradient of the loss function with respect to the weights of a multi-layer network can be calculated through back propagation, layer by layer, with the chain rule for derivatives. In SGD, the gradient is estimated from only a few examples at a time (instead of the entire training set), which has made SGD one of the most widely used gradient methods in practice.

Fig. 2 shows the training process of a DCNN. The input signal is propagated forward through the network, and the output data are obtained after it passes through multiple convolutional neural network layers. By comparing the obtained output data with the expected label, an error is generated and then transmitted layer by layer through back propagation. The corresponding weights are updated, and the error decreases as the number of iterations increases. The training of the DCNN ends at convergence.

The weights are updated by back propagation from the last layer: the error gradient with respect to the output node at the current layer is calculated, and the chain rule is then used to pass the error gradient back to the output node of the upper layer. For the $L$th layer of the DCNN, the update formula for the weight $W_{ij}$ between the $i$th input $x_i$ and the $j$th output $y_j$ can be expressed as

$$\Delta w_{ij} = \alpha\, \delta_j X_i \tag{12}$$

If the $L$th layer is the last layer of the DCNN, $\delta_j$ in (12) is calculated as follows:

$$\delta_j = \left(T_j - Y_j\right) f'_L(X_i) \tag{13}$$

where $T_j$ is the $j$th expected label, and $f'_L(X_i)$ is the derivative of the activation function. If the $L$th layer is not the last layer of the DCNN, $\delta_j$ in (12) is calculated as follows:

$$\delta_j = f'_L(X_i) \sum_{n=1}^{N_{L+1}} \delta_n w_{jn} \tag{14}$$

where $N_{L+1}$ is the number of features in the $(L+1)$th layer, and $w_{jn}$ is the weight between the $j$th input and the $n$th output in the $(L+1)$th layer.

3. Proposed intelligent diagnosis approach based on the DCNN

Fig. 3 shows the flowchart of the proposed intelligent diagnosis approach based on the DCNN. There are four steps in developing intelligent diagnosis based on the DCNN: (1) data acquisition; (2) data processing; (3) training; and (4) fault diagnosis.

3.1. Data acquisition

Compared with operational parameters such as temperature, current, voltage, vacuum gauge reading, and pressure gauge reading, vibration signals are widely applied in fault diagnosis. Owing to practical monitoring requirements, many issues should be considered in the data acquisition phase, such as the choice of sensor type, number, and installation location. In engineering applications, the accuracy of the measurement results depends on choosing a reasonable and appropriate sensor. An incorrectly selected sensor does not produce satisfactory test results, even with the most advanced data processing methods. Moreover, to ensure the correctness of the monitoring data, the sensor was calibrated before use. Additionally, an IFDI scheme [40] is applied to eliminate the effect of sensor failure at this stage.

Fig. 2. The DCNN training process.

Fig. 3. A flowchart for the proposed method.

Suppose there are $d$ sensors of the same type measuring the same signal $X_j$, the output $z_i$ of each instrument has a standard uncertainty $u_i^j$, and all outputs $z_i$ are affected by the same determined bias when measuring the same point signal $X_j(t)$ at time instant $t$. The output is then as follows:

$$z_i^j = z^j(t) \pm u_i^j, \quad i = 1, \ldots, d \tag{15}$$

The residual vector $r^j$ can be defined as

$$r_i^j = z_i^j - \sum_{k=1,\, k \neq i}^{d} \frac{z_k^j}{d - 1} \tag{16}$$

When there is no fault and all $u_i$ are assumed to be equal to $u$, the mean of each component of $r^j$ is 0, and the variance of each component is

$$\mathrm{var}\left(r_i^j\right) = \frac{d}{d - 1}\, u^2 \tag{17}$$

If the $i$th sensor fails, all components $r_i^j$ have the same variance but a non-zero mean, which depends on the fault magnitude. The maximum value always appears in the $i$th component, i.e. this component is the most sensitive to a fault on the $i$th sensor. By searching for the largest component of $r^j$, the faulty sensor can be detected, so as to eliminate the effect of sensor failure.

3.2. Data processing

One advantage of the DCNN is that monitoring data collected from sensors can be used directly for fault diagnosis or prognosis, with features extracted automatically. For other fault diagnosis methods, however, the raw data must be analyzed in advance using signal processing techniques, such as time-domain analysis, frequency-domain analysis, information entropy, and wavelet packet decomposition. To demonstrate the superiority of a DCNN, a comprehensive feature space is constructed using time-domain analysis, frequency-domain analysis, and wavelet packet decomposition methods [41]. Principal component analysis (PCA) [42] is employed for feature fusion.

In this paper, 52 serially numbered feature parameters are extracted, including 16 time-domain features, 4 frequency-domain features, and 32 wavelet packet energy features. The sixteen time-domain features are the mean, root mean square, mean square amplitude, mean absolute, skewness, kurtosis, standard deviation, max, min, peak to peak, shape factor, crest factor, impulse factor, clearance factor, skewness factor, and kurtosis factor. The four frequency-domain features are the mean frequency, frequency centre, standard deviation frequency, and root mean square frequency. The wavelet packet energy features are obtained from a 5-level wavelet packet decomposition, which gives $2^5 = 32$ wavelet packet energy features. The fifty-two features are numbered No. 1 to 52 in the above order.

PCA is a linear technique for dimensionality reduction: it embeds the $p$-dimensional data into a linear subspace of lower dimensionality $m$. The calculation procedure of PCA is as follows:

Step 1: Normalization of raw data. To eliminate the influence of the different magnitudes of the raw data, the features are normalized using the following equation:

$$x_{ij}^{R} = \frac{x_{ij} - \min\limits_{1 \le k \le n} x_{kj}}{\max\limits_{1 \le k \le n} x_{kj} - \min\limits_{1 \le k \le n} x_{kj}}, \quad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, m \tag{18}$$

where $\max_{1 \le k \le n} x_{kj}$ and $\min_{1 \le k \le n} x_{kj}$ denote the maximum value and minimum value of the $j$th column of the matrix.

Step 2: Build the raw data matrix $G$ and calculate the corresponding covariance matrix $C = E[(G - \mu)(G - \mu)^{T}]$, where $\mu$ is the mean.

Step 3: Calculate the eigenvalues of the covariance matrix by the orthogonal decomposition $C = e \lambda e^{T}$, with $\lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ and $e = [e_1, e_2, \ldots, e_n]$, where $\lambda_1 \ge \cdots \ge \lambda_n$ are the eigenvalues, and $e_i$ is the eigenvector corresponding to $\lambda_i$, also known as the $i$th component direction.

Step 4: Calculate the contribution rate of each component, $\lambda_i / \sum_{k=1}^{n} \lambda_k$.

Step 5: Choosing components. Rank the contribution rates of the components in descending order, and calculate the cumulative contribution rate of the first $m$ components, $K = \sum_{i=1}^{m} \lambda_i / \sum_{i=1}^{n} \lambda_i$. When the cumulative contribution rate reaches a fixed value (set to 90% in this paper), the new $m$-dimensional data are used instead of the raw data to achieve dimension reduction.

3.3. Training

Training is the core of the DCNN and is discussed in more detail in Section 2. There are two stages in the DCNN: forward learning and back propagation. In the forward learning stage, a DCNN with $N$ convolutional and pooling layers is built with ReLU activations, and the corresponding parameters are initialized. Convolutional operations with local receptive fields and weight sharing, and max pooling operations, are then performed. In the back-propagation stage, the weights are updated by back propagation starting from the last layer: the error gradient with respect to the output node at the current layer is calculated, and the chain rule is then used to pass the error gradient back to the output node of the upper layer. With this strategy, the training of the DCNN ends at convergence.

3.4. Fault diagnosis

After training the DCNN, Softmax regression is employed to classify the features learned from the raw signals, which achieves fault diagnosis. To evaluate the diagnosis performance, we define the "Accuracy" as the rate of correct fault detection in this paper, and the accuracy is given as

$$\text{Accuracy} = \frac{\text{Number of correct classifications}}{\text{Total number of classified samples}} \tag{19}$$

4. The gearbox vibration experiment

Gearboxes are always one of the most frequent components to fail, due to their harsh working environment and the complex structure of the transmission system. The effects of the operational environment, such as loading and rotation speed, etc., cause the gearbox to experience cumulative degradation over time. To clarify the effects of different conditions and degradation on the operational process of the gearbox, a gearbox vibration experiment was performed in the PHM Laboratory of Jiangxi University of Science and Technology. Data acquisition and gearboxes with sensors are discussed in the following subsections, under experimental setup and experimental procedure respectively.

4.1. Experimental setup

Fig. 4 shows the gearbox vibration experimental rig, which includes a driving motor, a gearbox, a brake, a control device, a driven shaft support, and the corresponding electronic units. The experimental scene is shown in Fig. 4(a), the layout of the vertical and horizontal vibration monitoring points is shown in Fig. 4(b), and the four faulty gears are shown in Fig. 4(c). The working principle is as follows: the motor drives the gearbox, a brake controls the rotational speed, and an incremental encoder measures the rotational speed; the vertical and horizontal vibration signals are measured by accelerometers installed on the shaft; the collected signals are stored by the UTekAcqu signal acquisition software through a 24-bit data acquisition card, and are analyzed by the UTekSs signal analysis software. The models of the accelerometer, the incremental encoder, and the 24-bit data acquisition card are CA-YD-107, E6B2-CWZ6C, and UT3408FRS-TCP, respectively. In this test, the sampling frequency is 5120 Hz, the rated rotational speed is 1090 r/min, and the sample length is 1024.

Fig. 4. Gearbox vibration experimental rig.

To clarify the effects of different conditions and cumulative degradation on the operational process of the gearbox, four gears degraded to different severities and a normal gear were used to simulate the degradation process of gears. To compare the effects of different working conditions on the diagnosis performance, the signals were collected at three rotational speeds: 1000 r/min, 1050 r/min, and 1090 r/min. When load is applied by the brake, the gearbox cannot maintain its rotation speed because of the low power of the motor, and slows down correspondingly. Moreover, the applied load was not specifically quantified, and the rotation speed changed only slightly under loading in this test. Therefore, the three rotational speeds of 1000 r/min, 1050 r/min, and 1090 r/min, adjusted by loading from the brake, were chosen to compare the effects of different working conditions on the diagnosis performance.

The four degraded gears were artificially processed, as shown in Fig. 4(c). As shown in Fig. 5, the crack length is the length of the crack along the rounded angle of the tooth root. The crack depth is the length from the rounded angle of the tooth root to the crack end. Previous literature found that the crack depth is usually less than half the chordal tooth thickness. The gear breaks if the crack depth reaches one-half of the chordal tooth thickness; therefore, the crack depth was set to 1/4 of the chordal tooth thickness in this test. The thickness of the crack depends on the thickness of the tool. Owing to the limited processing capability, the thinnest tool available was a hand-held grinding wheel cutter with a thickness of 1 mm. Therefore, the thickness of the gear crack was set to a fixed value of 1 mm. Cracks usually appear at the point of maximum stress, that is, along the normal line of the rounded curve of the tooth root. The angle between the crack and the drawn line is in the range of [40°, 50°], so the crack angle was set to a constant value of 45° in this test. In summary, the depth, thickness, and angle of the crack are constant, and only the effect of the crack length is considered in this paper, i.e. the crack lengths in the different states are set to 25%, 50%, 75%, and 100% of the tooth width, respectively. The different degradation states are shown in Table 1.

4.2. Experimental procedures

The vertical and horizontal vibration signals of five different degradation states under three different conditions were collected in this experiment. The sampling frequency and sample length are 5120 Hz and 1024, respectively. The conditions are three rotational speeds: 1000 r/min, 1050 r/min, and 1090 r/min. There are a total of 1500 data samples with 1024 data points each, as shown in Table 2. For each vibration direction, degradation state, and condition, 50 data samples were taken. Note that only one test gear was employed to simulate the different degradation states; therefore, the crack length of the test gear was processed in ascending order. The experimental procedures were as follows.

(1) The experimental rig was installed and adjusted to operate normally in various conditions. The sensors, data acquisition card, and the corresponding electronic control element were installed on the experimental rig. The test rig was connected to the industrial personal computer.

(2) The test rig was started at the design speed (1090 r/min). To ensure the reliability and accuracy of the experimental data and the test instruments, the vibration signals were collected for initial proofreading.

(3) After completion of step (2), the vertical and horizontal vibration signals of a normal gear under three different rotational speeds, 1000 r/min, 1050 r/min, and 1090 r/min, were collected by a data acquisition card, and the rotational speeds were controlled by adjusting the loading from a brake. For each vibration direction and rotational speed, we took 50 data samples.

Fig. 5. Diagrammatic sketch of a gearbox crack.

Table 1 Description of the degradation states.

States  Length            Depth                        Thickness (mm)  Angle
1       0                 0                            0               0
2       25% tooth width   25% chordal tooth thickness  1               45°
3       50% tooth width   25% chordal tooth thickness  1               45°
4       75% tooth width   25% chordal tooth thickness  1               45°
5       100% tooth width  25% chordal tooth thickness  1               45°

Table 2 Description of the monitoring data.

Groups  State 1    State 2    State 3    State 4    State 5    Conditions  Vibration
1       1–50       51–100     101–150    151–200    201–250    1000 r/min  Vertical
2       251–300    301–350    351–400    401–450    451–500    1000 r/min  Horizontal
3       501–550    551–600    601–650    651–700    701–750    1050 r/min  Vertical
4       751–800    801–850    851–900    901–950    951–1000   1050 r/min  Horizontal
5       1001–1050  1051–1100  1101–1150  1151–1200  1201–1250  1090 r/min  Vertical
6       1251–1300  1301–1350  1351–1400  1401–1450  1451–1500  1090 r/min  Horizontal


(4) One tooth of the test gear was processed to the geometric parameters of state 2 in Table 1, and the processed test gear was reinstalled in the test rig. As in step (3), the vertical and horizontal vibration signals of state 2 under the three rotational speeds, 1000 r/min, 1050 r/min, and 1090 r/min, were collected by the data acquisition card.
(5) As in step (4), the same tooth of the test gear was processed successively to the geometric parameters of states 3–5 in Table 1, and the vertical and horizontal vibration signals of states 3–5 under the three rotational speeds, 1000 r/min, 1050 r/min, and 1090 r/min, were collected by the data acquisition card. A total of 1500 data samples were collected, as shown in Table 2. After the test, the test system was shut down.
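The sample numbering in Table 2 follows a regular pattern: six groups (three speeds, two vibration directions), five states per group, and 50 samples per state. The sketch below reproduces that numbering. The function and constant names are hypothetical and chosen only for illustration; they do not come from the original experiment software.

```python
# Layout constants read from Table 2.
STATES = 5
SAMPLES_PER_STATE = 50
GROUPS = [
    (1000, "vertical"), (1000, "horizontal"),
    (1050, "vertical"), (1050, "horizontal"),
    (1090, "vertical"), (1090, "horizontal"),
]
TOTAL_SAMPLES = len(GROUPS) * STATES * SAMPLES_PER_STATE


def sample_range(group: int, state: int) -> tuple[int, int]:
    """Return the inclusive (first, last) sample numbers of one Table 2 cell."""
    if not (1 <= group <= len(GROUPS) and 1 <= state <= STATES):
        raise ValueError("group must be 1-6 and state must be 1-5")
    first = ((group - 1) * STATES + (state - 1)) * SAMPLES_PER_STATE + 1
    return first, first + SAMPLES_PER_STATE - 1


def locate(sample: int) -> tuple[int, int]:
    """Invert sample_range: return (group, state) for a sample number."""
    if not 1 <= sample <= TOTAL_SAMPLES:
        raise ValueError("sample number out of range")
    group, rest = divmod(sample - 1, STATES * SAMPLES_PER_STATE)
    return group + 1, rest // SAMPLES_PER_STATE + 1
```

For example, `sample_range(4, 2)` gives the horizontal signals of state 2 at 1050 r/min, and `GROUPS[group - 1]` gives the speed and direction of a group.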


5. Experiment results and discussions

5.1. Feature parameter analysis

To verify the effectiveness of the proposed method, a total of 1500 data samples, as shown in Table 2, were collected. Fifty-two features were extracted from these 1500 data samples, including 16 time-domain features, 4 frequency-domain features, and 32 wavelet packet decomposition features. The dimensionality of the extracted features was reduced to 4 using PCA as described in Section 3, and the corresponding contribution rates were 51.55%, 20.45%, 12.33%, and 5.99%, respectively. The 16 time-domain features and 4 frequency-domain features of the vertical and horizontal vibration signals under a speed of 1000 r/min are shown in Fig. 6, where features No. 1 to 20 are arranged from left to right and top to bottom. The ordinate is the normalized feature value, and the abscissa is the sample number. The data samples of the vertical vibration signals are numbered from 1 to 250, and those of the horizontal vibration signals from 251 to 500. In Fig. 6, the trends of the features of the vertical and horizontal vibration signals are not consistent with each other. This indicates that the features extracted from the two directions are not equally consistent with the degradation characteristics of the gear, but it is not intuitive to see which direction is more strongly correlated with the degradation. To clarify which direction of vibration signal is more suitable for characterizing the crack failure of the gear, a DCNN is employed to fuse the features and identify the degradation states.

Fig. 6. 20 features of the vertical and horizontal vibration signals under a speed of 1000 r/min.
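The four contribution rates reported for the PCA can be accumulated to see how much of the total contribution the retained components carry. The following is a minimal sketch of that arithmetic only, not of the PCA itself; the 90% threshold used in the usage note is a hypothetical value for illustration, since the selection criterion applied in Section 3 is not restated here.

```python
from itertools import accumulate

# Contribution rates (%) of the four retained principal components, from the text.
contribution_rates = [51.55, 20.45, 12.33, 5.99]


def cumulative_contribution(rates):
    """Running sum of the contribution rates, rounded to two decimals."""
    return [round(total, 2) for total in accumulate(rates)]


def components_needed(rates, threshold):
    """Smallest number of leading components whose cumulative rate reaches
    the threshold (in %), or None if the threshold is never reached."""
    for count, total in enumerate(cumulative_contribution(rates), start=1):
        if total >= threshold:
            return count
    return None
```

Under these numbers the four components together carry about 90.32% of the total contribution, so `components_needed(contribution_rates, 90.0)` returns 4 for the assumed threshold.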

The 16 time-domain features and 4 frequency-domain features of the vertical vibration signals under the three conditions, 1000 r/min, 1050 r/min, and 1090 r/min, are shown in Fig. 7, where features No. 1 to 20 are arranged from left to right and top to bottom. The ordinate is the normalized feature value, and the abscissa is the sample number. The data samples of the vertical vibration signals are numbered from 1 to 250 at 1000 r/min, from 251 to 500 at 1050 r/min, and from 501 to 750 at 1090 r/min. In Fig. 7, we see that the trends of the features of the vertical vibration signals change as the speed of the gear changes. The trend of some features does not change, but their values do, such as feature #7. For feature #2, both the trend and the value change with the speed. It is concluded that classification performance is affected by external conditions; therefore, the effect of working conditions on classification accuracy should be considered in fault diagnosis. In summary, the vibration becomes more intense as degradation accumulates. The maximum vibration appears in the faulty gear, and the minimum vibration appears in the normal gear. The experimental results of the five degradation states under different conditions are consistent with the actual degradation trends, verifying the correctness of this experiment.

Fig. 7. 20 features of the vertical vibration signals under different speeds.

Table 3 Model parameters of the DCNN.

Layers  Layer types           Model parameters
2       Convolutional layer   Filter width = 4, Filter height = 1, Filter channel = 1, Filter number = 10, Bias = 10
3       Pooling layer         Sub-sampling rate = 2
4       Convolutional layer   Filter width = 4, Filter height = 1, Filter channel = 10, Filter number = 15, Bias = 15
5       Pooling layer         Sub-sampling rate = 2
6       Convolutional layer   Filter width = 18, Filter height = 1, Filter channel = 15, Filter number = 30, Bias = 30
7       Full-connected layer  ReLU activation function
8       Softmax classifier    5 outputs

Learning parameters: Initial learning rate = 0.05, Learning rate decay in each 10 iterations = 20%, Weight decay = 0.04, Maximum number of iterations = 200, Test sampling rate = 50%
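The learning parameters in Table 3 can be read as a step-decay schedule combined with an L2 weight-decay term in the gradient update. The sketch below is one plausible reading, not the authors' training code. It assumes that the 20% decay is multiplicative (the rate is multiplied by 0.8 after every 10 completed iterations) and that weight decay enters the update as `lr * weight_decay * w`; all function names are hypothetical.

```python
# Values taken from Table 3.
INITIAL_LR = 0.05
DECAY_FRACTION = 0.20   # assumed multiplicative: lr <- lr * (1 - 0.20)
DECAY_EVERY = 10        # iterations between decay steps
WEIGHT_DECAY = 0.04
MAX_ITERATIONS = 200


def learning_rate(iteration: int) -> float:
    """Step-decayed learning rate for a 0-based iteration index."""
    if not 0 <= iteration < MAX_ITERATIONS:
        raise ValueError("iteration must be in [0, MAX_ITERATIONS)")
    completed_steps = iteration // DECAY_EVERY
    return INITIAL_LR * (1.0 - DECAY_FRACTION) ** completed_steps


def sgd_step(weights, gradients, iteration):
    """One gradient-descent update with L2 weight decay (assumed form):
    w <- w - lr * (g + WEIGHT_DECAY * w)."""
    lr = learning_rate(iteration)
    return [w - lr * (g + WEIGHT_DECAY * w) for w, g in zip(weights, gradients)]
```

Under these assumptions the rate stays at 0.05 for iterations 0 to 9, drops to about 0.04 for iterations 10 to 19, and has been reduced 19 times by the final iteration.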

5.2. Construction of the DCNN model

A DCNN with one input layer, three convolutional layers, two pooling layers, and a fully connected layer was constructed, and its model parameters were initialized as shown in Table 3. The number of convolutional and pooling layers mainly follows reference [43]. The weights of each layer are trained and optimized with the stochastic gradient descent method, which uses backpropagation to pass the error back layer by layer and update the weights of each layer. As the number of iterations increases, the model error decreases continuously until appropriate weights are obtained for each layer. Additionally, for the SVM method, the five-state classification problem was decomposed by the one-against-one algorithm into ten two-state classifiers, and the hyper-parameters of the SVM were optimized using a grid search algorithm.

5.3. Gearbox fault diagnosis based on the DCNN

To study the effects of different vibration signals and different conditions on the performance of the DCNN, three experimental datasets were constructed (Table 4). The dataset in experiment 1 is composed of groups 3 and 4 in Table 2, i.e. the vertical and horizontal vibration signals under a speed of 1050 r/min. The datasets in experiments 2 and 3 are composed of groups 1, 3, and 5 in Table 2, i.e. the vertical vibration signals under the three conditions, 1000 r/min, 1050 r/min, and 1090 r/min. The datasets in experiments 1 and 2 were processed by the three feature extraction methods, and the dimensionality of the extracted features was reduced to 4 by PCA as described in Section 3. The dataset in experiment 3 consists of raw data, which are input directly into the DCNN. To verify the superiority of the DCNN, its performance was compared with that of the SVM classifier.

Table 4 Description of three experimental datasets.

Experiment    Datasets           Features        Classifiers
Experiment 1  Group 3 and 4      Features + PCA  DCNN or SVM
Experiment 2  Group 1, 3, and 5  Features + PCA  DCNN or SVM
Experiment 3  Group 1, 3, and 5  Raw data        DCNN

5.3.1. Experiment 1

In experiment 1, the vertical and horizontal vibration signals under a speed of 1050 r/min (numbered 501 to 1000 in Table 2) are used to determine which direction of vibration signal is more suitable for characterizing the crack failure of the gear, and to verify the performance of the DCNN. There are a total of 500 data samples; half are used as the training set and half as the test set. In other words, of the 50 data samples for each degradation state in each vibration direction, half are taken as test samples and half as training samples. Therefore, the training set and the test set each contain 250 data samples. Note that the samples are assigned to the test and training sets randomly.
Four-dimensional features were obtained by the feature extraction and PCA methods described in Section 3. The fault diagnosis results obtained using the DCNN are shown in Fig. 8. When the diagnosis state coincides with the actual state, the diagnosis result is correct. Otherwise, the state is misclassified. For comparison, the SVM method was also applied, and its diagnosis results are shown in Fig. 9.

Fig. 8. The diagnosis results of the gear using the DCNN.

Fig. 9. The diagnosis results of the gear using the SVM.

Table 5 Comparison of the accuracy of DCNN and SVM under different vibration directions.

Vibration signals  Experiment 1
                   DCNN   SVM    Error
Vertical           93.6%  90.4%  3.5%
Horizontal         87.2%  83.2%  4.8%

Table 5 summarises the identification accuracy of the DCNN and SVM under different vibration directions. From Figs. 8 and 9 and Table 5, whether the DCNN or the SVM is used, the identification accuracy of the vertical vibration signals is higher than that of the horizontal vibration signals under the same conditions. The relative errors of the DCNN with respect to the SVM are 3.5% and 4.8% for the vertical and horizontal vibration signals, respectively, which indicates that the DCNN outperforms the SVM. It is concluded that the vertical vibration signals are more strongly correlated with the degradation of the gear. Therefore, only the vertical vibration signals are analyzed in the following two experiments.
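The half-and-half division used in experiment 1 (25 of the 50 samples of each degradation state in each vibration direction go to training, the rest to testing, assigned at random) can be sketched as a stratified split. This is an illustrative reconstruction: the paper does not give the splitting routine, and the function name and the fixed seed are assumptions made here for reproducibility.

```python
import random
from collections import defaultdict


def stratified_half_split(labels, seed=0):
    """Split sample indices 50/50 at random within each label group."""
    rng = random.Random(seed)          # hypothetical seed, for repeatability
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    train, test = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        half = len(indices) // 2
        train.extend(indices[:half])
        test.extend(indices[half:])
    return sorted(train), sorted(test)


# Experiment 1 layout: 2 directions x 5 states x 50 samples = 500 samples.
labels = [
    (direction, state)
    for direction in ("vertical", "horizontal")
    for state in range(1, 6)
    for _ in range(50)
]
train_idx, test_idx = stratified_half_split(labels)
```

With this layout both sets contain 250 samples, and every (direction, state) pair contributes 25 samples to each set. The same routine applied to the three conditions of experiment 2 would give 375 samples per set.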

Fig. 10. The diagnosis results of the DCNN and SVM under different conditions.


5.3.2. Experiment 2

In experiment 2, the vertical vibration signals under the three conditions, 1000 r/min, 1050 r/min, and 1090 r/min, are the data samples numbered 1 to 250, 501 to 750, and 1001 to 1250 in Table 2. These data samples are used to analyze the effects of different conditions on the accuracy of gear degradation state diagnosis, and to verify the superiority of the DCNN. There are a total of 750 data samples; half are used as the training set and half as the test set. In other words, of the 50 data samples for each degradation state in each operational condition, half are taken as test samples and half as training samples. Therefore, the training set and the test set each contain 375 data samples. Note that the samples are assigned to the test and training sets randomly. Four-dimensional features were obtained by the feature extraction and PCA methods described in Section 3. The diagnosis results of the DCNN and SVM are shown in Fig. 10. When the diagnosis state coincides with the actual state, the diagnosis result is correct. Otherwise, the state is misclassified. The identification accuracy of the DCNN and SVM under different conditions is shown in Table 6. From Fig. 10 and Table 6, we see that when the gear is idling, the fault vibration signal is not particularly obvious and the diagnosis accuracy is not the highest. However, when a certain load is imposed, the vibration caused by the gear degradation becomes more obvious, and the diagnosis accuracy increases. The imposed load should not be too large, because the vibration caused by a large load interferes with the fault vibration signals and reduces the identification accuracy.

The relative errors of the DCNN with respect to the SVM under the three conditions are 4.8%, 2.4%, and 6.7%, which again indicates that the DCNN outperforms the SVM. Comparing the results of experiments 1 and 2 shows that both the DCNN and the SVM achieve high performance when identifying the vibration signals of the gear crack failure, indicating that the feature extraction and PCA methods described in Section 3 are feasible and effective. In terms of identifying repeated vibration signals, the DCNN has a higher accuracy than the SVM, implying that the DCNN is a more suitable method for solving multi-state fault identification problems.
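For the SVM baseline, Section 5.2 states that the five-state problem was handled with ten two-state classifiers. Ten is the number of unordered pairs of five states, which is what a one-against-one decomposition with voting produces. The sketch below illustrates only that decomposition. The pairwise decision function is a toy stand-in rather than a trained SVM, and the tie-breaking rule (lowest state number wins) is an assumption made here.

```python
from collections import Counter
from itertools import combinations

STATES = [1, 2, 3, 4, 5]
PAIRS = list(combinations(STATES, 2))   # one two-state classifier per pair


def one_vs_one_predict(pairwise_decide, sample):
    """Majority vote over all pairwise decisions.

    pairwise_decide(a, b, sample) must return either a or b.
    Ties are broken in favour of the lowest state number (assumed rule).
    """
    votes = Counter(pairwise_decide(a, b, sample) for a, b in PAIRS)
    best = max(votes.values())
    return min(state for state, count in votes.items() if count == best)


def nearest_state(a, b, sample):
    """Toy two-state rule: pick whichever state number is closer to a scalar."""
    return a if abs(sample - a) <= abs(sample - b) else b
```

Calling `one_vs_one_predict(nearest_state, 3.2)` collects ten votes and returns state 3, which wins all four of the pairings it takes part in.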

Fig. 11. The diagnosis results of the DCNN with raw data under different conditions.

Table 6 Comparison of the accuracy of DCNN and SVM under different conditions.

Conditions (r/min)  Experiment 2
                    DCNN   SVM    Error
1000                87.2%  83.2%  4.8%
1050                94.4%  92.0%  2.4%
1090                89.6%  84.0%  6.7%

5.3.3. Experiment 3

Because a DCNN extracts features automatically, the raw signals can be input directly, and the features are extracted by the convolution and pooling operations. In experiment 3, the raw data underlying experiment 2 are input directly into the DCNN model to validate the feature extraction ability of the DCNN. The diagnosis results of the DCNN using raw data under different conditions are shown in Fig. 11, and the identification accuracy of experiment 3 is shown in Table 7. From Fig. 11 and Table 7, we see that the DCNN achieves high performance when the raw signals are input directly. The identification accuracy under the three conditions reaches 86.4%, 89.6%, and 88.0%, respectively. Compared with the DCNN results in experiment 2, the relative errors of experiment 3 with respect to experiment 2 under the three conditions are 0.9%, 5.1%, and 1.8%. The identification accuracy of experiment 3 is slightly lower, but it is still acceptable, because automatic feature extraction by the DCNN is more effective than the other feature extraction methods at reducing the complexity and computational burden of the model. From a comprehensive perspective, we can conclude that DCNN-based gearbox fault diagnosis achieves high accuracy and low computational cost when the raw signals are input directly. In the future, more effort should be devoted to reducing model complexity and achieving fast execution without loss of precision.

Table 7 Comparison of the accuracy of DCNN under different conditions.

Conditions (r/min)  DCNN
                    Experiment 2  Experiment 3  Error
1000                87.2%         86.4%         0.9%
1050                94.4%         89.6%         5.1%
1090                89.6%         88.0%         1.8%
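The "Error" columns can be checked with a short calculation. The paper does not state the formula, so the sketch below assumes a relative error of the form |value - reference| / reference, with the SVM accuracy as the reference in Table 5 and the experiment 2 accuracy as the reference in Table 7. Under that assumption, the rounded results match the entries of those two tables.

```python
def relative_error_percent(value, reference):
    """Absolute difference relative to the reference, in percent, one decimal."""
    return round(abs(value - reference) / reference * 100.0, 1)


# Accuracies (%) copied from Table 5: (DCNN, SVM).
table5 = {"vertical": (93.6, 90.4), "horizontal": (87.2, 83.2)}
# Accuracies (%) copied from Table 7: (experiment 2, experiment 3).
table7 = {1000: (87.2, 86.4), 1050: (94.4, 89.6), 1090: (89.6, 88.0)}

dcnn_vs_svm = {
    direction: relative_error_percent(dcnn, svm)
    for direction, (dcnn, svm) in table5.items()
}
raw_vs_features = {
    speed: relative_error_percent(exp3, exp2)
    for speed, (exp2, exp3) in table7.items()
}
```

Here `dcnn_vs_svm` holds the two Table 5 errors and `raw_vs_features` holds the three Table 7 errors, keyed by vibration direction and rotational speed respectively.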

6. Conclusions

This paper focuses on intelligent multi-state diagnosis of a gearbox using the DCNN method. To verify the accuracy and effectiveness of the proposed method, a gearbox vibration experiment was performed in the PHM Laboratory of Jiangxi University of Science and Technology. Reasonable sensor selection, calibration, and an IFDI (Instrument Fault Detection and Isolation) scheme ensured that the sensors provided reliable vertical and horizontal vibration signals in the data acquisition phase. We conclude the paper with the following remarks.

(1) The experimental results of the five degradation states under different conditions are consistent with actual degradation trends, verifying the correctness of this experiment. The vibration becomes more intense as degradation accumulates. The maximum vibration appears in the faulty gear, and the minimum vibration appears in the normal gear.
(2) By comparing the identification accuracy of the vertical and horizontal vibration signals, it was found that the vertical vibration signals are more strongly correlated with the degradation of the gear. By comparing the identification accuracy of the vertical vibration signals under the three conditions, 1000 r/min, 1050 r/min, and 1090 r/min, it was found that the identification accuracy is affected by the operating condition and increases when a certain load is imposed. The DCNN was also shown to achieve a higher accuracy than the SVM in experiments 1 and 2.
(3) By comparing the results of the DCNN in experiments 2 and 3, we found that gearbox fault diagnosis based on the DCNN achieves high accuracy and low computational cost when the raw signals are input directly.
(4) In the era of big data, the advantages of the DCNN in data processing will become more prominent as the scale of condition monitoring data grows explosively, and the trade-off between accuracy and computational complexity can be adjusted flexibly in DCNN algorithms.

Therefore, we are convinced that DCNN technology will receive wide attention and application in the field of fault diagnosis in the future.

Acknowledgments

The authors are grateful for the valuable comments and suggestions from the respected reviewers. This study was supported by National Natural Science Foundation of China (Grant no. 61463021), Young Scientists Object Program of Jiangxi Province in China (Grant no. 20144BCB23037), Natural Science Foundation of Jiangxi Province in China (Grant no. 20181BAB202020), Science and Technology Project in Education Department of Jiangxi Province in China (Grant no. GJJ180494), and Doctoral Scientific Research Foundation of Jiangxi University of Science and Technology in China (Grant no. 3401223356).

References

[1] M. Zhang, K. Wang, D. Wei, M.J. Zuo, Amplitudes of characteristic frequencies for fault diagnosis of planetary gearbox, J. Sound Vib. 432 (2018) 119–132.
[2] D. Scherer, A. Müller, S. Behnke, Evaluation of pooling operations in convolutional architectures for object recognition, In Artificial Neural Networks–ICANN 2010 (pp. 92–101), Springer, Berlin, Heidelberg.
[3] J. Yu, Y. He, Planetary gearbox fault diagnosis based on data-driven valued characteristic multigranulation model with incomplete diagnostic information, J. Sound Vib. 429 (2018) 63–77.
[4] Y. Li, K. Ding, G. He, X. Jiao, Non-stationary vibration feature extraction method based on sparse decomposition and order tracking for gearbox fault diagnosis, Measurement 124 (2018) 453–469.
[5] J. Chen, D. Zhou, C. Lyu, C. Lu, An integrated method based on CEEMD-SampEn and the correlation analysis algorithm for the fault diagnosis of a gearbox under different working conditions, Mech. Syst. Sig. Process. 113 (2018) 102–111.

[6] Y. Lei, J. Lin, M.J. Zuo, Z. He, Condition monitoring and fault diagnosis of planetary gearboxes: a review, Measurement 48 (2014) 292–305.
[7] X. Liang, M.J. Zuo, Z. Feng, Dynamic modeling of gearbox faults: a review, Mech. Syst. Sig. Process. 98 (2018) 852–876.
[8] J.P. Salameh, S. Cauet, E. Etien, A. Sakout, L. Rambault, Gearbox condition monitoring in wind turbines: a review, Mech. Syst. Sig. Process. 111 (2018) 251–264.
[9] G. Cheng, Y.L. Cheng, L.H. Shen, J.B. Qiu, S. Zhang, Gear fault identification based on Hilbert–Huang transform and SOM neural network, Measurement 46 (3) (2013) 1137–1146.
[10] Ł. Jedliński, J. Jonak, Early fault detection in gearboxes based on support vector machines and multilayer perceptron with a continuous wavelet transform, Appl. Soft Comput. 30 (2015) 636–641.
[11] C. Li, G. Zurita, M. Cerrada, D. Cabrera, Multimodal deep support vector classification with homologous features and its application to gearbox fault diagnosis, Neurocomputing 168 (2015) 119–127.
[12] C. Li, R.V. Sanchez, G. Zurita, M. Cerrada, D. Cabrera, R.E. Vásquez, Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals, Mech. Syst. Sig. Process. 76–77 (2016) 283–293.
[13] F. Chen, B. Tang, R. Chen, A novel fault diagnosis model for gearbox based on wavelet support vector machine with immune genetic algorithm, Measurement 46 (1) (2013) 220–232.
[14] Z. Li, Z. Peng, A new nonlinear blind source separation method with chaos indicators for decoupling diagnosis of hybrid failures: a marine propulsion gearbox case with a large speed variation, Chaos Solitons Fractals Interdiscip. J. Nonlinear Sci. Nonequilibr. Complex Phenom. 89 (2016) 27–39.
[15] M. Cerrada, C. Li, R.V. Sánchez, F. Pacheco, D. Cabrera, J.V.D. Oliveira, A fuzzy transition based approach for fault severity prediction in helical gearboxes, Fuzzy Sets Syst. 337 (2018) 52–73.
[16] Y. Lei, M.J. Zuo, Gear crack level identification based on weighted k-nearest neighbor classification algorithm, Mech. Syst. Sig. Process. 23 (5) (2009) 1535–1547.
[17] C.U. Mba, V. Makis, S. Marchesiello, A. Fasana, L. Garibaldi, Condition monitoring and state classification of gearboxes using stochastic resonance and hidden Markov models, Measurement 126 (2018) 76–95.
[18] X. Li, V. Makis, H. Zuo, J. Cai, Optimal Bayesian control policy for gear shaft fault detection using hidden semi-Markov model, Comput. Ind. Eng. 119 (2018) 21–35.
[19] R. Jiang, J. Yu, V. Makis, Optimal Bayesian estimation and control scheme for gear shaft fault detection, Comput. Ind. Eng. 63 (4) (2012) 754–762.
[20] S. Khan, T. Yairi, A review on the application of deep learning in system health management, Mech. Syst. Sig. Process. 107 (1) (2018) 241–265.
[21] R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, R.X. Gao, Deep learning and its applications to machine health monitoring, Mech. Syst. Sig. Process. 115 (2019) 213–237.
[22] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, M.S. Lew, Deep learning for visual understanding: a review, Neurocomputing 187 (2016) 27–48.
[23] K. Fukushima, S. Miyake, Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position, Pattern Recogn. 15 (6) (1982) 455–469.
[24] Y. Lecun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, et al., Backpropagation applied to handwritten zip code recognition, Neural Comput. 1 (4) (1989) 541–551.
[25] Y. Lecun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436.
[26] L. Jing, M. Zhao, P. Li, X. Xu, A convolutional neural network based feature learning and fault diagnosis method for the condition monitoring of gearbox, Measurement 111 (2017) 1–10.
[27] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, T. Chen, Recent advances in convolutional neural networks, Pattern Recogn. 77 (2018) 354–377.
[28] L. Guo, Y. Lei, N. Li, T. Yan, N. Li, Machinery health indicator construction based on convolutional neural networks considering trend burr, Neurocomputing 292 (2018) 142–150.
[29] C. Lu, Z. Wang, B. Zhou, Intelligent fault diagnosis of rolling bearing using hierarchical convolutional network based health state classification, Adv. Eng. Inf. 32 (2017) 139–151.
[30] J. Jiao, M. Zhao, J. Lin, J. Zhao, A multivariate encoder information based convolutional neural network for intelligent fault diagnosis of planetary gearboxes, Knowl.-Based Syst. (2018), https://doi.org/10.1016/j.knosys.2018.07.017.
[31] Z. Wang, J. Wang, Y. Wang, An intelligent diagnosis scheme based on generative adversarial learning deep neural networks and its application to planetary gearbox fault pattern recognition, Neurocomputing 310 (2018) 213–222.
[32] Y. Han, B. Tang, L. Deng, Multi-level wavelet packet fusion in dynamic ensemble convolutional neural network for fault diagnosis, Measurement 127 (2018) 246–255.
[33] A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, Adv. Neural Info. Process. Systems (2012) 1097–1105.
[34] H. Wu, J. Zhao, Deep convolutional neural network model based chemical process fault diagnosis, Comput. Chem. Eng. 115 (2018) 185–197.
[35] F. Jia, Y. Lei, N. Lu, S. Xing, Deep normalized convolutional neural network for imbalanced fault classification of machinery and its understanding via visualization, Mech. Syst. Sig. Process. 110 (2018) 349–367.

[36] W. Zhang, C. Li, G. Peng, Y. Chen, Z. Zhang, A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load, Mech. Syst. Sig. Process. 100 (2018) 439–453.
[37] X. Li, Q. Ding, J.Q. Sun, Remaining useful life estimation in prognostics using deep convolution neural networks, Reliab. Eng. Syst. Saf. 172 (2018) 1–11.
[38] D. Scherer, A. Müller, S. Behnke, Evaluation of pooling operations in convolutional architectures for object recognition, In Artificial Neural Networks–ICANN 2010 (pp. 92–101), Springer, Berlin, Heidelberg.
[39] H.O.A. Ahmed, M.L.D. Wong, A.K. Nandi, Intelligent condition monitoring method for bearing faults from highly compressed measurements using sparse over-complete features, Mech. Syst. Sig. Process. 99 (2018) 459–477.
[40] G. Betta, A. Pietrosanto, Instrument fault detection and isolation: state of the art and new research trends, IEEE Trans. Instrum. Meas. 49 (2000) 100–107.
[41] H. Liu, X. Mi, Y. Li, Smart deep learning based wind speed prediction model using wavelet packet decomposition, convolutional neural network and convolutional long short term memory network, Energy Convers. Manage. 166 (2018) 120–131.
[42] Y.T. Feng, T. Zhao, M. Wang, D.R.J. Owen, Characterising particle packings by principal component analysis, Comput. Methods Appl. Mech. Engrg. 340 (2018) 70–89.
[43] L. Jing, T. Wang, M. Zhao, P. Wang, An adaptive multi-sensor data fusion method based on deep convolutional neural networks for fault diagnosis of planetary gearbox, Sensors 17 (2) (2017) 414.