A multi-ensemble method based on deep auto-encoders for fault diagnosis of rolling bearings


Measurement 151 (2020) 107132

Contents lists available at ScienceDirect

Measurement journal homepage: www.elsevier.com/locate/measurement

A multi-ensemble method based on deep auto-encoders for fault diagnosis of rolling bearings

Xianguang Kong a, Gang Mao a, Qibin Wang a,b,*, Hongbo Ma a, Wen Yang a

a School of Mechano-Electronic Engineering, Xidian University, Xi'an 710071, PR China
b Jiangsu Jinxiang Transmission Equipment Co., Ltd, Huaian 223001, PR China

Article info

Article history:
Received 15 July 2019
Received in revised form 15 September 2019
Accepted 4 October 2019
Available online 10 October 2019

Keywords: Rolling bearing; Fault diagnosis; Ensemble strategy; Deep auto-encoder; Multi-ensemble

Abstract

A multi-ensemble method based on deep auto-encoders (DAEs) is proposed for fault diagnosis of rolling bearings. First, several DAEs with different activation functions are trained to obtain different types of features, which are merged into a feature pool. Then, the features in the pool are evaluated and selected: a classifier is constructed for each feature, the classification accuracy is used as the evaluation index, and the good features are retained. Finally, the training data are cross-divided and multiple sample-sets with the selected features are constructed. Each sample-set is used to train a classifier, and the final diagnosis result is obtained by majority voting among the classification results. The proposed method is validated on two experimental bearing vibration signals and compared with other methods. The results reveal that DAEs with different activation functions can extract different types of features, and that the proposed method has high accuracy and good generalization ability.

1. Introduction

Rotating machinery is among the most widely used mechanical equipment in industrial applications [1]. Rolling bearings are the most widely used components in rotating machinery and directly affect its healthy operation [2]. How to accurately diagnose faults of rolling bearings has therefore become a hot topic in bearing research. Two kinds of methods are widely used to diagnose the faults of rolling bearings: signal processing and deep learning. Signal processing has been proved to be an effective approach to feature extraction [3-9]. Since rolling bearing vibration is random cyclostationary, fault diagnosis methods based on spectrum analysis and cyclic spectral analysis have been applied [5,6]. Nevertheless, under speed-varying conditions, the cyclostationarity assumption for rolling-element bearing vibrations no longer holds. Borghesani et al. [7] proposed a cepstrum pre-whitening technique based on cepstral analysis. Dany et al. [8] used the angle/time approach to analyze bearing fault vibration and explore its angle/time cyclostationary property. Based on these studies, Marcin et al. [9] proposed supervised and unsupervised clustering methods for damage classification of rolling bearings. These signal

* Corresponding author at: School of Mechano-Electronic Engineering, Xidian University, Xi'an 710071, PR China. E-mail address: [email protected] (Q. Wang). https://doi.org/10.1016/j.measurement.2019.107132

processing methods can be combined with physical models and achieve good results. However, they rely heavily on expert experience and prior knowledge, and the quality of the manually extracted features directly affects the final results. In recent years, many artificial intelligence methods have been studied for fault diagnosis, because they can adaptively mine the fault information hidden in the measured signal instead of relying on manual feature extraction. The deep belief network (DBN) is a probability generation model stacked from multiple restricted Boltzmann machines. DBNs are widely used in fault diagnosis of rotating machinery, engines and other machines. Qin et al. [10] established an improved DBN model for fault diagnosis of planetary gearboxes, in which a new Isigmoid unit was designed to combine the advantages of the leaky rectified linear unit and the Sigmoid function. Shao et al. [11,12] proposed an improved convolutional deep belief network, combining the DBN and the convolutional neural network (CNN), to diagnose the faults of rotating machines; the results were better than those of the standard DBN, CNN and deep auto-encoder (DAE). The CNN is another common deep learning network. It is a special network structure for processing grid-like data and has achieved great success in image processing. CNNs are also widely used for fault diagnosis and condition monitoring. Zhang et al. [13] proposed a CNN with training interference, adding a dropout rate to the convolution kernels, to solve fault diagnosis under noise and varying working conditions. Wang et al. [14] presented a method for rapidly evaluating reliability and predicting remaining


useful life using a 2-D CNN, in which the one-dimensional signal was converted into a 2-D image; the results showed that the method had good accuracy and fast calculation. In addition, CNN models have been gradually improved for fault diagnosis, such as the multiscale convolutional neural network [15], the CNN with capsule network [16] and the dilated CNN [17]. In contrast to the DBN and CNN, the DAE is an unsupervised feature learning model, and it is also widely used in fault diagnosis. Sun et al. [18] presented a sparse auto-encoder approach to classify the faults of induction motors. Based on the standard DAE, Wang et al. [19] introduced the Gaussian radial basis kernel function into the DAE for fault diagnosis, and the results showed a higher accuracy than standard methods. Besides, the DAE has been used in combination with other methods. Shao et al. [20] proposed an improved DAE for fault diagnosis, in which the wavelet transform was regarded as the activation function of each layer and the extreme learning machine was used as the final classifier. Chen et al. [21] proposed a method for bearing fault diagnosis based on multi-sensor features by combining the DAE and the DBN. These individual deep learning models have achieved good results for fault diagnosis, but they are prone to perturbation by different sample sets, input features, algorithm parameters, output expressions and other factors. To overcome these problems, ensemble learning is widely used in deep learning models; it mainly comprises input ensemble, model ensemble and classifier ensemble strategies. Shi et al. [22] proposed an ensemble DAE model for tool condition monitoring, in which three DAEs were developed with raw data, fast Fourier transform data and wavelet packet transform data as inputs, and the deep features were fused for condition monitoring. Shao et al. [23] established an ensemble DAE for fault diagnosis of rolling bearings. Several DAEs with different activation functions were designed and the features were integrated for fault diagnosis; the results showed good performance in fault diagnosis of rolling bearings. Wang et al. [24] proposed selective ensemble learning for fault diagnosis, investigating a selective ensemble strategy based on bagging and adaptive optimization; the results revealed that the method was effective for fault diagnosis of rotary machinery. In the above studies, ensemble learning was widely applied to sample-set inputs, feature extraction models, algorithm parameters and output expressions, but it is rare to apply ensemble learning jointly to inputs, models and outputs. On the other

hand, the existing ensemble strategies were designed for specific objects and are not suitable for others. Therefore, this paper proposes a multi-ensemble method based on DAEs for fault diagnosis, whose procedure can be summarized as follows. Firstly, several DAEs with different activation functions are developed to obtain different types of features, and these features are merged into a feature pool. Then, the features in the feature pool are evaluated and selected: a fault classifier is constructed for each feature, the classification accuracy is used as the evaluation index, and good features are selected. Finally, the training data are cross-divided and multiple sample-sets with the selected features are constructed. Each sample-set is used to train a classifier, and the final diagnosis result is obtained by majority voting among the classification results. The rest of this paper is organized as follows. The related theory, including the deep auto-encoder and softmax regression, is described in Section 2. The multi-ensemble method based on DAEs is proposed in Section 3. In Section 4, two cases are presented to validate the method and analyze its effectiveness. In Section 5, the conclusions are drawn.

2. Basic theory

2.1. Deep auto-encoder

The auto-encoder (AE) is a kind of neural network that can extract meaningful features from unlabeled data [25]. As shown in Fig. 1, an AE consists of an encoder and a decoder. In the encoder, the input data x = {x1, x2, ..., xm} is transformed into the hidden layer h = {h1, h2, ..., hm} through an activation function, which can be expressed as:

h = ψ_Act(W x + b)    (1)

where ψ_Act(·) represents the activation function, W is the weight matrix and b is a bias vector. In the decoder, the hidden layer h = {h1, h2, ..., hm} is transformed into the reconstruction output x̂ = {x̂1, x̂2, ..., x̂m} through an activation function, which can be expressed as:

x̂ = ψ'_Act(W' h + b')    (2)

where W' is the weight matrix and b' is a bias vector. The parameters of the AE are trained by minimizing the reconstruction error

Fig. 1. The structure of an AE.
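The mappings of Eqs. (1)-(3) can be sketched as a single forward pass. This is a minimal numpy illustration, not the paper's trained model; the batch size, random initialization and the choice of sigmoid for both layers are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of Eqs. (1)-(3): one AE forward pass and its MSE loss.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m, n_in, n_hid = 8, 400, 200                 # M samples, input dim, hidden dim
x = rng.normal(size=(m, n_in))               # input batch

W = rng.normal(scale=0.01, size=(n_hid, n_in));  b = np.zeros(n_hid)    # encoder
W2 = rng.normal(scale=0.01, size=(n_in, n_hid)); b2 = np.zeros(n_in)    # decoder

h = sigmoid(x @ W.T + b)                     # Eq. (1): hidden representation
x_hat = sigmoid(h @ W2.T + b2)               # Eq. (2): reconstruction
loss = np.sum((x_hat - x) ** 2) / (2 * m)    # Eq. (3): MSE reconstruction error
```

In training, this loss would be minimized by back-propagation over W, b, W' and b'.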

(loss function) between the input and output using the BP algorithm. The mean square error (MSE) is used as the loss function in the standard AE, which can be calculated as:

L(θ)_AE = (1/(2M)) Σ_{i=1}^{M} ||x̂ − x||²    (3)

where M is the number of samples and θ = {W, b, W', b'} is the parameter set.

The deep auto-encoder (DAE) is an unsupervised deep learning model. As shown in Fig. 2, it is constructed by stacking multiple AEs, with the hidden-layer features of the previous AE used as the input of the next AE. A DAE can extract deep features of the input data and further improve the classification accuracy, so it is chosen in this paper.

Fig. 2. The structure of a DAE.

2.2. Softmax regression

The softmax regression is a common method for multi-classification problems. It classifies a sample by predicting the probability that the sample belongs to each condition label [26].

Table 1
The common activation functions.

Function name    Equation
Sigmoid          f(x) = 1 / (1 + e^(-x))
Tanh             f(x) = 2 / (1 + e^(-2x)) - 1
Gaussian         f(x) = e^(-x^2)
Arctan           f(x) = arctan(x)
Sinc             f(x) = 1 (x = 0); sin(x)/x (x ≠ 0)
Softsign         f(x) = x / (1 + |x|)
Loglog           f(x) = 1 - e^(-e^x)

Fig. 3. The feature pool construction.


Fig. 4. The procedure of feature selection.

For a data set {x^i, y^i}_{i=1}^{M} (where x^i ∈ R^{N×1} belongs to the training set and y^i ∈ {1, 2, ..., C} belongs to the label set), the softmax regression model predicts the probability p(y^i = c | x^i) for each label c = 1, 2, ..., C and each input sample x^i. The output of the softmax regression is a vector containing the C estimated probabilities of the input sample x^i belonging to every label. Concretely, the hypothesis h_θ(x^i) can be expressed as:

h_θ(x^i) = [p(y^i = 1 | x^i; θ), p(y^i = 2 | x^i; θ), ..., p(y^i = C | x^i; θ)]^T
         = (1 / Σ_{c=1}^{C} e^{θ_c^T x^i}) [e^{θ_1^T x^i}, e^{θ_2^T x^i}, ..., e^{θ_C^T x^i}]^T    (4)

where θ = [θ_1, θ_2, ..., θ_C]^T are the parameters of the softmax regression. Note that the term Σ_{c=1}^{C} e^{θ_c^T x^i} normalizes the distribution, so the elements of the hypothesis sum to 1. The model is trained by minimizing the loss function J(θ) based on the hypothesis:

J(θ) = −(1/M) Σ_{i=1}^{M} Σ_{c=1}^{C} 1{y^i = c} log( e^{θ_c^T x^i} / Σ_{l=1}^{C} e^{θ_l^T x^i} ) + (λ/2) Σ_{c=1}^{C} Σ_{l=1}^{N} θ_{cl}^2    (5)

where 1{·} is an indicator function that returns 1 if the condition is true and 0 otherwise, and λ is the weight decay term. The loss function J(θ) is strictly convex, so the softmax regression model theoretically has a unique solution whenever the weight decay term exists (for any λ > 0) [27].
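Eqs. (4) and (5) can be sketched in a few lines of numpy. Random data stands in for the learned DAE features here, and the data sizes and decay value λ are illustrative assumptions.

```python
import numpy as np

# Sketch of Eqs. (4)-(5): softmax probabilities and the weight-decayed loss.
rng = np.random.default_rng(1)
M, N, C, lam = 6, 80, 12, 1e-3               # samples, feature dim, classes, lambda

X = rng.normal(size=(M, N))                  # stand-in for DAE features
y = rng.integers(0, C, size=M)               # labels in {0, ..., C-1}
theta = rng.normal(scale=0.01, size=(C, N))

logits = X @ theta.T
p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)          # Eq. (4)

# Eq. (5): cross-entropy of the true class plus the weight decay penalty
J = -np.log(p[np.arange(M), y]).mean() + (lam / 2) * np.sum(theta ** 2)
```

Each row of `p` sums to 1, matching the normalization remark above.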


Fig. 5. The process of the ensemble classifiers.

3. The proposed method In this section, a multi-ensemble method based on DAEs for fault diagnosis of rolling bearings is proposed. It mainly includes feature pool construction, feature selection and fault classification.

3.1. Feature pool construction

An individual DAE shows low generalization ability when processing the diverse, complex and massive vibration data collected from rolling bearings [23]. To overcome the limitations of a single auto-encoder and increase the generalization performance, an ensemble of multiple different auto-encoders is a good choice. The activation function provides the nonlinear modeling ability and has a significant impact on the performance of a neural network [28,29]. The equations and waveforms of some common activation functions are listed in Table 1. DAEs with different activation functions can extract different features from the raw data. As shown in Fig. 3, the raw vibration signal is used to train the DAEs with different activation functions in parallel. Then the different deep features are extracted and merged into a feature pool.
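The seven activation functions of Table 1 can be written down directly; a vectorized numpy sketch, one function per DAE of the ensemble:

```python
import numpy as np

# The seven activation functions of Table 1, one per DAE in the ensemble.
activations = {
    "Sigmoid":  lambda x: 1.0 / (1.0 + np.exp(-x)),
    "Tanh":     lambda x: 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0,
    "Gaussian": lambda x: np.exp(-x ** 2),
    "Arctan":   np.arctan,
    # np.sinc(t) is sin(pi t)/(pi t), so this gives sin(x)/x with f(0) = 1
    "Sinc":     lambda x: np.sinc(x / np.pi),
    "Softsign": lambda x: x / (1.0 + np.abs(x)),
    "Loglog":   lambda x: 1.0 - np.exp(-np.exp(x)),
}

x = np.linspace(-3.0, 3.0, 7)
features = {name: f(x) for name, f in activations.items()}   # one feature type per DAE
```

In the full method, each of these would be used as the layer activation ψ_Act of its own DAE, as in Eq. (1).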

3.2. Feature selection

The feature pool contains all the features extracted from the different DAEs, so it inevitably includes some redundant and useless features, which affect the accuracy of the classification model. Therefore, it is necessary to remove the redundant features from the feature pool and retain the useful ones to maximize the final classification performance. A feature selection strategy is designed to select the good features from the feature pool. As shown in Fig. 4, each feature f_i = {f_ij, L_j}_{j=1}^{N_train} (f_ij ∈ R^1, L_j ∈ {1, 2, ..., C}, i ∈ {1, 2, ..., k}) from the feature pool is used to train a softmax classifier A_i that evaluates the feature f_i. The corresponding feature from the validation data, {f_ij, L_j}_{j=1}^{N_validation}, is used for testing, and the accuracy a_i (i ∈ {1, 2, ..., k}) is acquired. The a_i is considered as


Fig. 6. The procedure of the proposed method.

the evaluation index of f_i: a higher a_i means a better classification performance for f_i. Then, all features f_i (i ∈ {1, 2, ..., k}) are sorted in descending order according to the corresponding a_i:

f_ith = Sort(f_i, a_i), i = 1, 2, ..., k    (6)

where Sort{P_i, Q_i}, i = 1, 2, ..., k, denotes a function that sorts P_i in descending order according to the corresponding Q_i. Then, the f_ith (i ∈ {1, 2, ..., k}) are fed into the training set to train classifiers one by one. Firstly, the best feature f_1st is used as training data to train a softmax classifier B_1. The corresponding feature from the validation data is used for testing and the accuracy b_1 is acquired. Secondly, the second-best feature f_2nd is merged into the training data. The training data including the best feature f_1st and the second-best feature f_2nd are used to train a softmax classifier B_2,

Fig. 7. The rolling bearing experiment setup.


Fig. 8. The faults of bearing in three locations: (a) ball fault; (b) inner fault and (c) outer fault.

and the accuracy b_2 is acquired. Similarly, the remaining best features are merged into the training data one by one and the accuracies b_i (i ∈ {1, 2, ..., k}) are acquired. Finally, the maximum b_l among b_1 ~ b_k is found and the corresponding features F_i (i ∈ {1, 2, ..., l}) are selected.

3.3. Fault classification

As shown in Fig. 5, the feature pool is transformed into the feature set F = {F_j, L_j}_{j=1}^{N_train} (F_j ∈ R^l, L_j ∈ {1, 2, ..., C}), and F is divided evenly into m subsets. Then, m sample-sets are constructed. For sample-set 1, subset 1 is removed and sample-set 1 is formed from the remaining subsets (subset i, i = 2, 3, ..., m). For sample-set 2, subset 2 is removed and sample-set 2 is formed from the rest (subset i, i = 1, 3, ..., m). The other sample-sets are constructed in the same way. After that, each sample-set is used to train a classifier, so m softmax regression classifiers are obtained. For a feature F_j, the score of F_j belonging to class c is calculated as:

S(F_j, c) = Σ_{i=1}^{m} P(Classifier C_i(F_j), c), c = 1, 2, ..., C    (7)

where

P(Classifier C_i(F_j), c) = { 1, if Classifier C_i(F_j) = c; 0, if Classifier C_i(F_j) ≠ c }    (8)

and Classifier C_i(F_j) denotes the label predicted by classifier C_i (i = 1, 2, ..., m) for the feature F_j. The final predicted label Pre(F_j) of the feature F_j can be expressed as:

Pre(F_j) = argmax_{c ∈ {1, 2, ..., C}} S(F_j, c)    (9)

where argmax returns the class with the maximum score. The final accuracy can be expressed as:

Acc = ( Σ_{j=1}^{N_test} Num(Pre(F_j), L_j) ) / N_test    (10)

where

Num(Pre(F_j), L_j) = { 1, if Pre(F_j) = L_j; 0, if Pre(F_j) ≠ L_j }    (11)

Table 2
Parameters of drive-end bearing.

Parameter             Value
Bearing type          6205-2RS JEM SKF
Pitch diameter        1.537 in. (39.04 mm)
Ball diameter         0.3126 in. (7.94 mm)
Number of balls       9
Motor speed           1797 rpm
Sampling frequency    12 kHz
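The scoring and voting rule of Eqs. (7)-(11) can be sketched directly; the three-classifier example below is illustrative, not from the paper.

```python
import numpy as np

# Sketch of Eqs. (7)-(11): hard majority voting over the m classifiers.
# predictions[i, j] is the label classifier C_i assigns to sample F_j.
def majority_vote(predictions, n_classes):
    m, n = predictions.shape
    scores = np.zeros((n, n_classes), dtype=int)
    for i in range(m):                        # Eqs. (7)-(8): accumulate votes
        for j in range(n):
            scores[j, predictions[i, j] - 1] += 1
    return scores.argmax(axis=1) + 1          # Eq. (9): class with the most votes

preds = np.array([[1, 2, 3],                  # 3 classifiers x 3 samples
                  [1, 2, 1],
                  [2, 2, 3]])
labels = np.array([1, 2, 3])
final = majority_vote(preds, n_classes=3)     # majority labels: [1, 2, 3]
acc = (final == labels).mean()                # Eqs. (10)-(11): accuracy = 1.0
```

Ties go to the lowest-numbered class here; the paper does not specify a tie-breaking rule.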

3.4. Procedure of the proposed method In this study, an intelligent fault diagnosis method of rolling bearings based on DAEs is proposed. The framework of the proposed method is shown in Fig. 6 and the general procedures can be summarized as follows.

Table 3
12 operating conditions of bearing.

Location     Diameter (inches)  Orientation      Training/validation/testing samples  Label
Normal       0                  –                150/75/75                            1
Ball         0.007              –                150/75/75                            2
Ball         0.014              –                150/75/75                            3
Ball         0.021              –                150/75/75                            4
Inner race   0.007              –                150/75/75                            5
Inner race   0.021              –                150/75/75                            6
Inner race   0.028              –                150/75/75                            7
Outer race   0.007              Vertical @3:00   150/75/75                            8
Outer race   0.007              Center @6:00     150/75/75                            9
Outer race   0.014              Center @6:00     150/75/75                            10
Outer race   0.021              Vertical @3:00   150/75/75                            11
Outer race   0.021              Center @6:00     150/75/75                            12
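The per-condition sample split of Table 3 (300 segments of 400 points, split 150/75/75) can be sketched as follows; the signal here is synthetic noise standing in for one measured condition.

```python
import numpy as np

# Slice one condition's record into 300 samples of 400 points, then split
# 150/75/75 into training/testing/validation, mirroring Table 3.
rng = np.random.default_rng(2)
signal = rng.normal(size=300 * 400)          # stand-in for the vibration record
samples = signal.reshape(300, 400)           # 300 samples x 400 sampling points

idx = rng.permutation(300)                   # random split within the condition
train = samples[idx[:150]]                   # 150 training samples
test = samples[idx[150:225]]                 # 75 testing samples
val = samples[idx[225:]]                     # 75 validation samples
```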


Fig. 9. Vibration signals and spectral distributions of the rolling bearing: (a) normal condition; (b) ball fault condition (0.007); (c) ball fault condition (0.014); (d) ball fault condition (0.021); (e) inner race fault condition (0.007); (f) inner race fault condition (0.021); (g) inner race fault condition (0.028); (h) outer race fault condition (0.007@3:00); (i) outer race fault condition (0.007@6:00); (j) outer race fault condition (0.014@6:00); (k) outer race fault condition (0.021@3:00); (l) outer race fault condition (0.021@6:00).

1) Collect the vibration data of rolling bearings with the acquisition device, and divide the raw vibration data into training, testing and validation samples.
2) Train multiple DAEs with different activation functions using the training data, and construct the feature pool by parallel feature learning with the multiple DAEs.
3) Evaluate all of the features in the feature pool on the validation data and select the good features.
4) Train multiple softmax classifiers by k-fold cross-division of the sample set, and integrate the results by majority voting.
5) Finally, test the proposed model using the testing samples.
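The feature-selection strategy of Fig. 4 (step 3 above) can be sketched as a rank-then-grow loop. `train_and_score` is a hypothetical stand-in that trains a softmax classifier on the given feature columns and returns its validation accuracy; it is not a function from the paper.

```python
import numpy as np

# Sketch of the feature-selection strategy of Fig. 4.
def select_features(F_train, y_train, F_val, y_val, train_and_score):
    k = F_train.shape[1]
    # a_i: validation accuracy of a classifier trained on feature i alone
    a = np.array([train_and_score(F_train[:, [i]], y_train,
                                  F_val[:, [i]], y_val) for i in range(k)])
    order = np.argsort(-a)                   # Eq. (6): sort features by a_i, descending
    # b_i: validation accuracy using the best i features together
    b = np.array([train_and_score(F_train[:, order[:i + 1]], y_train,
                                  F_val[:, order[:i + 1]], y_val) for i in range(k)])
    l = int(np.argmax(b)) + 1                # keep the prefix with the best b_l
    return order[:l]
```

This trains 2k classifiers in total (k for the a_i ranking, k for the b_i prefixes) and returns the indices of the retained features.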

Table 4
Parameters of the DAEs.

Parameter               DAE 1    DAE 2   DAE 3     DAE 4   DAE 5   DAE 6     DAE 7
Activation function     Sigmoid  Tanh    Gaussian  Arctan  Sinc    Softsign  Loglog
Units of input layer    400      400     400       400     400     400       400
Units of first layer    200      200     200       200     200     200       200
Units of second layer   100      100     100       100     100     100       100
Units of third layer    80       80      80        80      80      80        80
Learning rate           0.05     0.05    0.01      0.01    0.008   0.1       0.1
Momentum                0        0       0         0.2     0.2     0.2       0
Sparsity parameter      0.2      0.2     0.1       0.1     0       0         0
Sparse penalty          3        0       6         0       0       0         0
Scaling learning rate   0.99     0.99    0.99      0.99    0.97    0.97      0.98

Table 5
Diagnosis results of different methods.

Method                        Result
Method 1 (DAE with Sigmoid)   87.80%
Method 2 (DAE with Tanh)      74.00%
Method 3 (DAE with Gaussian)  54.50%
Method 4 (DAE with Arctan)    81.17%
Method 5 (DAE with Sinc)      82.80%
Method 6 (DAE with Softsign)  82.00%
Method 7 (DAE with Loglog)    79.50%
Method 8 (features pool)      92.75%

4. Results and discussion

In this section, two cases are used to validate the proposed method. The proposed method is implemented in MATLAB 2018b and run on 64-bit Windows with a Core 7500 CPU and 16 GB RAM.

4.1. Case 1

4.1.1. Bearing experiment

Case 1 uses the rolling bearing data from the Case Western Reserve University Lab [30]. The experimental setup is shown in Fig. 7 and is mainly composed of an induction motor, the test bearings and a loading motor. The test bearings support the motor shaft. The motor bearings were seeded with faults using electro-discharge machining, as shown in Fig. 8. Each bearing was tested under four different loads (0, 1, 2 and 3 hp). Single-point faults with diameters of 0.007, 0.014, 0.021 and 0.028 in. (1 in. = 25.4 mm) were introduced separately at the inner raceway, the rolling element (i.e. ball) and the outer raceway. An accelerometer attached with magnetic bases to the housing near the drive end collected the vibration signals. The parameters of the drive-end bearing are listed in Table 2. The vibration data collected at 1797 rpm (0 hp) are used to validate the proposed method; they cover 12 bearing operating conditions with different fault locations, severities and orientations. Each condition includes 300 samples, and each sample is a vibration signal segment of 400 sampling points. For each condition, 150 randomly selected samples are used as training data, and the remaining 150 samples are divided into testing and validation data. The 12 operating conditions are listed in Table 3. The raw vibration signal samples and spectral distributions of each bearing condition are shown in Fig. 9. The images on the left

Fig. 10. Two-dimensional visualizations of different features using KPCA: (a) DAE with Sigmoid; (b) DAE with TanH; (c) DAE with Gaussian; (d) DAE with Arctan; (e) DAE with Sinc; (f) DAE with Softsign; (g) DAE with Loglog; (h) the features pool.


Fig. 11. (a) Accuracies of features; (b) Accuracies of the best features.

Table 6
Diagnosis results of methods 8 and 9.

Method                                      Result
Method 8 (features pool)                    92.75%
Method 9 (features pool + feature select)   93.56%

column are the raw vibration signals and the images in the right column are the corresponding spectra. In the time-domain signals, the vibration amplitude under the normal condition is relatively small and the vibration is stable (Fig. 9(a)). When the rolling bearing has a ball fault (Fig. 9(b)~(d)), an inner race fault (Fig. 9(e)~(g)) or an outer race fault (Fig. 9(h)~(l)), obvious impact vibrations are produced and the vibration amplitude increases progressively with increasing fault diameter. In the spectral distributions, there is no obvious frequency concentration under the normal condition, while the faulty vibration signals have large amplitudes around 3000 Hz and a wide resonance band; these spectra contain the bearing natural frequencies and the fault frequencies. Except for the normal condition, it is difficult to distinguish the fault location, orientation and severity from the spectral distributions alone.
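A two-dimensional KPCA projection like those in Fig. 10 can be sketched in plain numpy; the RBF kernel width `gamma` and the random stand-in data are illustrative assumptions.

```python
import numpy as np

# Project a feature matrix onto its first two kernel principal components,
# as used for the visualizations in Fig. 10 (RBF kernel; gamma is assumed).
def kpca_2d(X, gamma=0.1):
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))  # RBF kernel
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one       # center the kernel matrix
    vals, vecs = np.linalg.eigh(Kc)                  # eigenvalues in ascending order
    top_vals = np.maximum(vals[-2:][::-1], 0.0)      # two largest, largest first
    top_vecs = vecs[:, -2:][:, ::-1]
    return top_vecs * np.sqrt(top_vals)              # (n, 2): KPC1 and KPC2 scores

X = np.random.default_rng(3).normal(size=(30, 80))   # stand-in for pooled features
Z = kpca_2d(X)                                       # scatter Z[:, 0] vs Z[:, 1] per label
```

Plotting KPC1 against KPC2, colored by the condition label, gives panels like those in Fig. 10.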

4.1.2. Results and discussion

At first, 7 DAEs with different activation functions are developed. The relevant parameters are listed in Table 4. For each DAE, the input is the 400-dimensional raw vibration data and the output is an 80-dimensional feature vector. The softmax

Table 7
The selected features of each DAE.

DAE                          DAE 1    DAE 2  DAE 3     DAE 4   DAE 5  DAE 6     DAE 7
Activation function          Sigmoid  Tanh   Gaussian  Arctan  Sinc   Softsign  Loglog
Number of selected features  75       73     11        3       75     26        4

Fig. 12. (a) Good feature; (b) Bad feature.


Fig. 13. The classification accuracy of each classifier.

Table 8
Diagnosis results of different methods.

Method                                      Result
Method 1 (DAE with Sigmoid)                 87.80%
Method 2 (DAE with Tanh)                    74.00%
Method 3 (DAE with Gaussian)                54.50%
Method 4 (DAE with Arctan)                  81.17%
Method 5 (DAE with Sinc)                    82.80%
Method 6 (DAE with Softsign)                82.00%
Method 7 (DAE with Loglog)                  79.50%
Method 8 (features pool)                    92.75%
Method 9 (features pool + feature select)   93.56%
Method 10 (the proposed method)             96.44%

classifiers are established with the corresponding features to diagnose the bearing faults, and the results are listed in Table 5 (named Methods 1 ~ 7). Then, a feature pool is constructed from the output features of the 7 DAEs; it contains 560 features in total. A softmax classifier is trained using the feature pool of the training data and tested using the validation data (named Method 8). The diagnosis results are also listed in Table 5. From Table 5, it can be seen that the highest accuracy of an individual DAE is 87.80% (Method 1), in which the Sigmoid is used as the activation function, and the lowest is 54.50% (Method 3), in which the Gaussian is used. The accuracies of the other DAEs are around 80.00%. The accuracy of Method 8 reaches 92.75%, which is higher than that of the other methods.

In order to show the feature learning ability of each method (Methods 1 ~ 8), kernel principal component analysis (KPCA) is used for visualization. Fig. 10 shows the two-dimensional visualization of the different kinds of features, in which KPC1 and KPC2 represent the first and second principal components, respectively. The labels 1-12 correspond to the bearing operating conditions listed in Table 3. It can be seen that the points of the same color are relatively dispersed and the features are hard to distinguish from each other in Fig. 10(b)~(g). In Fig. 10(a), although the points of the same color are relatively concentrated, some are still difficult to separate. Therefore, the classification performance of each individual DAE's features is not good. In Fig. 10(h), however, the points of the same color are concentrated and easy for the classifier to separate. This indicates that the features in the feature pool are relatively complete and effectively absorb the feature information extracted by the different DAEs, thus greatly improving the classification performance.

Feature selection has much potential in improving mechanical fault diagnosis [31]. In order to evaluate the effect of redundant features on the results, the features in the feature pool are evaluated and selected (named Method 9). A classifier is constructed for each feature and tested with the corresponding feature from the validation data; the accuracies a_1 ~ a_560 are acquired and shown in Fig. 11(a). Then, all features are sorted in descending order according to the corresponding accuracy a_i. The best feature is used as training data to train a classifier, the corresponding feature from the validation data is used for testing, and the accuracy b_1 is acquired. The second-best feature is then added and the accuracy b_2 is acquired as before. Similarly, b_1 ~ b_560 are acquired and shown in Fig. 11(b). It can be seen that b_289 is the maximum among b_1 ~ b_560, so the best 289 features are selected. These 289 features from the feature pool of the training data are used to train a classifier and the corresponding features from the feature pool of the validation data are used for testing. The results are listed in Table 6 and the distribution of the selected features over the DAEs is shown in Table 7. As listed in Table 6, after feature selection the accuracy (Method 9) increased by 0.81% compared with that without feature selection (Method 8). This is because some useless and bad features (e.g. Fig. 12(b)) in the feature pool are removed, and good features (e.g. Fig. 12(a)) with better classification performance are retained to maximize the overall classification performance. As listed in Table 7, the distribution of the selected features over the DAEs is not uniform. This indicates that DAEs with different activation functions extract different types of features: some have good classification ability and some have poor classification ability.

Finally, the training data are cross-divided and each sample-set with the selected features is used to train a classifier; the final diagnosis result is obtained by majority voting among the classification results. The training samples are divided evenly into m subsets. In this case, the 1800 training samples are divided into 15 subsets (named subset 1 ~ subset 15), each containing 120 samples. Then, 15 sample-sets are constructed. For sample-set 1, subset 1 is removed and the remaining subsets (subset 2 ~ subset 15, 1680 samples) are used to train classifier 1. For sample-set 2,


Fig. 14. Multi-class confusion matrix of the proposed method.

Table 9
Diagnosis results of different methods.

Method                     Result
BPNN with raw data         51.96%
BPNN with 24 features      88.22%
BPNN with whiten           53.08%
BPNN with PCA              78.42%
BPNN with normalization    83.25%
SVM with raw data          51.54%
SVM with 24 features       90.81%
SVM with whiten            71.33%
SVM with PCA               75.25%
SVM with normalization     80.75%
Standard DAE               87.80%
Standard DBN               86.98%
Standard CNN               91.97%
The proposed method        96.44%

Table 10
Parameters of rolling bearings.

Parameter                   Value
Bearing type                Rolling bearing 6308
Ball diameter               15 mm
Number of balls             8
Sampling frequency          10,240 Hz
Groove section size         65.5 mm
Inner diameter of bearing   40 mm
Outer diameter of bearing   90 mm
Contact angle               0°
Rotate speed                1309 r/min

subset 2 is removed and the rest of the subsets (subset 1 and subsets 3 ~ 15, 1680 samples) are used to train classifier 2. Similarly, the other sample-sets (sample-set 3 ~ sample-set 15) are constructed and the other classifiers (classifier 3 ~ classifier 15) are trained. Then, the classification results are obtained by majority voting and shown in Fig. 13. It can be seen that the highest accuracy of the individual classifiers is about 95.67% and the accuracy of the ensemble classifiers is about 96.44%. The accuracy of the ensemble method increased by 0.77% compared with that of the best individual classifier, and its performance is better than that of all the individual classifiers. The results reveal

Table 11
Operation conditions of rolling bearings.

Fault locations    Diameters    Labels
Normal             0            1
Inner race         1 mm²        2
Outer race         7 mm²        3
Ball               7 mm²        4

that the classifier ensemble can significantly improve the fault diagnosis accuracy of rolling bearings by designing sample-sets to build different classifiers. In summary, the proposed method combines an ensemble of multiple DAEs with different activation functions, feature


Fig. 15. Vibration signals of the rolling bearing operating conditions: (a) normal condition; (b) inner race fault condition; (c) outer race fault condition; (d) ball fault condition.

Fig. 16. Multi-class confusion matrix of the proposed method.

Table 12
Diagnosis results of different methods.

Methods                                        Results
Method 1 (DAE with Sigmoid)                    93.33%
Method 2 (DAE with TanH)                       68.17%
Method 3 (DAE with Gaussian)                   71.67%
Method 4 (DAE with Arctan)                     82.17%
Method 5 (DAE with Sinc)                       86.83%
Method 6 (DAE with Softsign)                   86.33%
Method 7 (DAE with Loglog)                     91.00%
Method 8 (feature pool)                        94.17%
Method 9 (feature pool + feature selection)    99.17%
Method 10 (the proposed method)                100.00%

selection and classifiers ensemble. The results of the different methods are listed in Table 8, and the multi-class confusion matrix of the proposed method is shown in Fig. 14. It can be seen that the classification accuracy improves through ensemble learning, and the proposed method achieves the highest accuracy through multi-ensemble learning. The proposed method effectively combines the advantages of the individual methods.

4.1.3. Comparisons with other methods
In this section, two shallow machine learning methods, the BP neural network (BPNN) and the support vector machine (SVM), and three deep learning methods, the standard DAE, standard DBN and standard CNN, are used for comparison. The standard DAE, DBN and CNN involve no signal preprocessing in bearing fault diagnosis; their input is always the raw vibration data. BPNN and SVM each have five types of input: the first is the raw vibration data, the second is 24 features (11 time-domain features and 13 frequency-domain features) [32], and the other three are the whitened data, the normalized data and the data after principal component analysis (PCA). The testing diagnosis results are listed in Table 9. For the BPNN, the classification accuracy is highest (88.22%) when the 24 features are used as input, compared with the other four input types (51.96%, 53.08%, 78.42% and 83.25%). For the SVM, the accuracy is likewise highest with the 24 features (90.81%), compared with the other four input types (51.54%, 71.33%, 75.25% and 80.75%). These results confirm that shallow machine learning can achieve good classification performance, but it relies on domain knowledge to choose

the appropriate data preprocessing method. The two deep learning methods, the standard DAE and DBN, also achieve good classification accuracy (87.80% and 86.98%) without any signal preprocessing. The standard CNN achieves the highest accuracy (91.97%) among all comparison methods, 1.16% higher than the best shallow machine learning result. This is because deep learning can adaptively mine the fault information hidden in the vibration signal. The proposed method extracts features with different sensitivities to different data sets, evaluates and selects them adaptively, and obtains the final diagnosis result by majority voting among multiple classifiers; it achieves the highest accuracy (96.44%), 4.47% higher than the standard CNN. The classification accuracy is thus further improved on the basis of deep learning.

4.2. Case 2

4.2.1. Bearing experiment
To validate the effectiveness and generalization ability of the proposed method, another bearing fault diagnosis case is studied. The experimental setup consists of a governor, a driving motor, a power box, a rolling bearing mounting frame, an axial loading device and a radial loading device. The speed range of the bearing is 0 to 3000 r/min. The parameters of the rolling bearing used in this experiment are listed in Table 10. The bearing contains an outer ring, an inner ring, balls and a cage. As listed in Table 11, four different states are tested: normal, inner ring spalling, outer ring spalling and ball spalling. The spalling sizes are also listed in Table 11. In the experiment, the vibration signals are collected by accelerometer sensors. The sampling time is 2 s and the sampling frequency is 10,240 Hz. The experiment is repeated 12 times for each operation condition.
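Each 2 s record sampled at 10,240 Hz contains 20,480 points. The paper does not state how records are segmented into training samples, so the sketch below assumes a fixed window length and an optional hop, both of which are hypothetical values.

```python
import numpy as np

def segment_record(record, win_len, hop=None):
    """Cut one vibration record into fixed-length samples.
    win_len and hop are assumed values; the paper does not state the
    segment length it used. hop < win_len gives overlapping windows."""
    hop = hop or win_len                         # default: non-overlapping windows
    n = (len(record) - win_len) // hop + 1       # number of full windows that fit
    return np.stack([record[i * hop:i * hop + win_len] for i in range(n)])
```

With a hypothetical win_len of 1024, one 20,480-point record yields 20 non-overlapping samples, while a hop of 397 points yields 50 overlapping samples per record, which would be consistent with obtaining 600 samples from 12 records per condition.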
The data for each operation condition are divided into 600 samples, of which 300 are used as the training set, 150 as the testing set and the remaining 150 as the validation set. Fig. 15 shows raw data samples for each bearing condition.

4.2.2. Results and discussion
Different types of features are extracted by 7 DAEs with different activation functions, and all features are merged into a feature pool. Then, the good features are selected and the training data are


cross-divided into sample-sets. The sample-sets are used to train multiple classifiers, and the final result is obtained by majority voting. To demonstrate the effectiveness of the proposed method, the 7 DAEs with different activation functions are first used individually to diagnose the bearing faults; the results are listed in Table 12 (named Method 1 to Method 7). The best individual DAE reaches 93.33% accuracy, with the Sigmoid activation, and the worst reaches 68.17%, with the TanH activation; the accuracies of the other DAEs are around 80%. The accuracies of the feature pool method, the feature selection method and the proposed method are 94.17%, 99.17% and 100%, respectively. The multi-class confusion matrix of the proposed method is shown in Fig. 16. It reveals that the classification accuracy improves through ensemble learning, and the proposed method achieves the highest accuracy through multi-ensemble learning, effectively combining the advantages of the individual methods. From the above two cases, the accuracy of the proposed method is higher than that of the individual DAEs and the other machine learning methods. Through the ensemble of multiple feature extraction models, feature selection and the classifier ensemble, the classification accuracy improves step by step. In addition, the proposed method adaptively extracts features, selects features and integrates classification results, and thus has good generalization ability.
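The cross-division and majority-voting steps above can be sketched as follows; the classifier itself is left abstract and the function names are illustrative, not from the paper.

```python
import numpy as np

def build_sample_sets(X, y, m=15, seed=0):
    """Divide the training data evenly into m subsets, then form m sample-sets,
    each leaving one subset out, as in the cross-division scheme described
    in the text."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, m)
    for k in range(m):
        # Sample-set k omits fold k and keeps the other m-1 folds.
        keep = np.concatenate([folds[j] for j in range(m) if j != k])
        yield X[keep], y[keep]               # used to train classifier k

def majority_vote(predictions):
    """predictions: (n_classifiers, n_samples) array of integer class labels.
    Returns the most frequent label per test sample."""
    preds = np.asarray(predictions)
    return np.array([np.bincount(col).argmax() for col in preds.T])
```

With 1800 training samples and m = 15, each sample-set contains 1680 samples, matching the subsets described in Case 1.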

5. Conclusions

(1) A novel multi-ensemble method based on DAEs is proposed for fault diagnosis of rolling bearings. Firstly, several DAEs with different activation functions are developed to extract different types of features, which are merged into a feature pool. Secondly, the features in the feature pool are evaluated and selected: a fault classifier is constructed for each feature, the classification accuracy is used as the evaluation index, and the good features are selected. Finally, the training data are cross-divided and multiple sample-sets with the selected features are constructed. Each sample-set is used to train a classifier, and the final diagnosis result is obtained by majority voting among these classification results.

(2) The proposed method is applied to recognize fault conditions of rolling bearings, including fault location, fault size and fault orientation. Its accuracy is higher than that of the individual DAEs and the other machine learning methods. The ensemble of multiple feature extraction models, feature selection and the classifier ensemble all help to improve the classification accuracy. The proposed method effectively combines the advantages of the individual methods and has good generalization ability. It is, however, suited to known operation conditions; for unknown operation conditions the classification performance will decrease, which will be taken into consideration in further research.
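The feature-evaluation step summarized in conclusion (1) — build a classifier per feature, score it by classification accuracy, and keep the good features — can be sketched as follows. The one-dimensional nearest-centroid classifier and the 0.8 threshold are illustrative stand-ins, not the paper's exact choices.

```python
import numpy as np

def nearest_centroid_acc(f_tr, y_tr, f_va, y_va):
    """Accuracy of a minimal one-dimensional nearest-centroid classifier,
    standing in for the per-feature fault classifier."""
    classes = np.unique(y_tr)
    cents = np.array([f_tr[y_tr == c].mean() for c in classes])
    # Assign each validation value to the class with the closest centroid.
    pred = classes[np.argmin(np.abs(f_va[:, None] - cents[None, :]), axis=1)]
    return float(np.mean(pred == y_va))

def select_features(F_tr, y_tr, F_va, y_va, threshold=0.8):
    """Score every column of the feature pool by its individual classification
    accuracy and keep those at or above a threshold (the 0.8 is assumed)."""
    accs = np.array([nearest_centroid_acc(F_tr[:, j], y_tr, F_va[:, j], y_va)
                     for j in range(F_tr.shape[1])])
    return np.flatnonzero(accs >= threshold), accs
```

A discriminative feature column scores near 1.0 and is kept, while an uninformative one scores near chance level and is dropped from the pool.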

Funding This work was supported by Key National Science & Technology Major Project of China (No. 2017ZX04006001).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Y. Lei, J. Lin, Z. He, M.J. Zuo, A review on empirical mode decomposition in fault diagnosis of rotating machinery, Mech. Syst. Sig. Process. 35 (2013) 108–126.
[2] N. Sawalhi, R.B. Randall, Vibration response of spalled rolling element bearings: Observations, simulations and signal processing techniques to track the spall size, Mech. Syst. Sig. Process. 25 (2011) 846–870.
[3] A. Glowacz, Fault detection of electric impact drills and coffee grinders using acoustic signals, Sensors (Basel) 19 (2) (2019) 269.
[4] A. Głowacz, Recognition of acoustic signals of induction motor using FFT, SMOFS-10 and LSVM, Eksploatacja i Niezawodnosc – Maintenance and Reliability 17 (2015) 569–574.
[5] J. Antoni, Cyclic spectral analysis of rolling-element bearing signals: Facts and fictions, J. Sound Vib. 304 (2007) 497–529.
[6] J. Antoni, R.B. Randall, Differential diagnosis of gear and bearing faults, J. Vib. Acoust. 124 (2002) 165–171.
[7] P. Borghesani, P. Pennacchi, R.B. Randall, N. Sawalhi, R. Ricci, Application of cepstrum pre-whitening for the diagnosis of bearing faults under variable speed conditions, Mech. Syst. Sig. Process. 36 (2013) 370–384.
[8] D. Abboud, J. Antoni, M. Eltabach, S. Sieg-Zieba, Angle/time cyclostationarity for the analysis of rolling element bearing vibrations, Measurement 75 (2015) 29–39.
[9] M. Strączkiewicz, P. Czop, T. Barszcz, Supervised and unsupervised learning process in damage classification of rolling element bearings, Diagnostyka 17 (2016).
[10] Y. Qin, X. Wang, J. Zou, The optimized deep belief networks with improved logistic Sigmoid units and their application in fault diagnosis for planetary gearboxes of wind turbines, IEEE Trans. Ind. Electron. 66 (5) (2018) 3814–3824.
[11] H. Shao, H. Jiang, H. Zhang, W. Duan, T. Liang, S. Wu, Rolling bearing fault feature learning using improved convolutional deep belief network with compressed sensing, Mech. Syst. Sig. Process. 100 (2018) 743–765.
[12] H. Shao, H. Jiang, H. Zhang, T. Liang, Electric locomotive bearing fault diagnosis using a novel convolutional deep belief network, Mech. Syst. Sig. Process. 65 (2018) 2727–2736.
[13] W. Zhang, C. Li, G. Peng, Y. Chen, Z. Zhang, A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load, Mech. Syst. Sig. Process. 100 (2018) 439–453.
[14] Q. Wang, B. Zhao, H. Ma, J. Chang, G. Mao, A method for rapidly evaluating reliability and predicting remaining useful life using two-dimensional convolutional neural network with signal conversion, J. Mech. Sci. Technol. 33 (2019) 2561–2571.
[15] J. Zhu, N. Chen, W. Peng, Estimation of bearing remaining useful life based on multiscale convolutional neural network, IEEE Trans. Ind. Electron. 66 (2019) 3208–3216.
[16] Z. Zhu, G. Peng, Y. Chen, H. Gao, A convolutional neural network based on a capsule network with strong generalization for bearing fault diagnosis, Neurocomputing 323 (2019) 62–75.
[17] M.A. Khan, Y.-H. Kim, J. Choo, Intelligent fault detection via dilated convolutional neural networks, IEEE Int. Conf. Big Data Smart Comput. 2018 (2018) 729–731.
[18] W. Sun, S. Shao, R. Zhao, R. Yan, X. Zhang, X. Chen, A sparse auto-encoder-based deep neural network approach for induction motor faults classification, Measurement 89 (2016) 171–178.
[19] F. Wang, B. Dun, X. Liu, Y. Xue, H. Li, Q. Han, An enhancement deep feature extraction method for bearing fault diagnosis based on kernel function and autoencoder, Shock Vib. 2018 (2018).
[20] H. Shao, H. Jiang, X. Li, S. Wu, Intelligent fault diagnosis of rolling bearing using deep wavelet auto-encoder with extreme learning machine, Knowl. Based Syst. 140 (2018) 1–14.
[21] Z. Chen, W. Li, Multisensor feature fusion for bearing fault diagnosis using sparse autoencoder and deep belief network, IEEE Trans. Instrum. Meas. 66 (2017) 1693–1702.
[22] C. Shi, G. Panoutsos, B. Luo, H. Liu, B. Li, X. Lin, Using multiple-feature-spaces-based deep learning for tool condition monitoring in ultraprecision manufacturing, IEEE Trans. Ind. Electron. 66 (2019) 3794–3803.
[23] H. Shao, H. Jiang, Y. Lin, X. Li, A novel method for intelligent fault diagnosis of rolling bearings using ensemble deep auto-encoders, Mech. Syst. Sig. Process. 102 (2018) 278–297.
[24] Z.-Y. Wang, C. Lu, B. Zhou, Fault diagnosis for rotary machinery with selective ensemble neural networks, Mech. Syst. Sig. Process. 113 (2018) 112–130.
[25] L. Wen, L. Gao, X. Li, A new deep transfer learning based on sparse auto-encoder for fault diagnosis, IEEE Trans. Syst. Man Cybern. Syst. 49 (1) (2017) 136–144.
[26] Y. Lei, F. Jia, J. Lin, S. Xing, S.X. Ding, An intelligent fault diagnosis method using unsupervised feature learning towards mechanical big data, IEEE Trans. Ind. Electron. 63 (2016) 3137–3147.
[27] C. Bielza, V. Robles, P. Larrañaga, Regularized logistic regression without a penalty term: an application to cancer classification with microarray data, Expert Syst. Appl. 38 (2011) 5110–5118.
[28] X. Ding, J. Cao, A. Alsaedi, F.E. Alsaadi, T. Hayat, Robust fixed-time synchronization for uncertain complex-valued neural networks with discontinuous activation functions, Neural Netw. 90 (2017) 42–55.
[29] S.S. Liew, M. Khalil-Hani, R. Bakhteri, Bounded activation functions for enhanced training stability of deep neural networks on visual pattern recognition problems, Neurocomputing 216 (2016) 718–734.


[30] https://csegroups.case.edu/bearingdatacenter/pages/project-history.
[31] L. Lu, J. Yan, C.W. de Silva, Dominant feature selection for the fault diagnosis of rotary machines using modified genetic algorithm and empirical mode decomposition, J. Sound Vib. 344 (2015) 464–483.
[32] J. Qu, Z. Zhang, T. Gong, A novel intelligent method for mechanical fault diagnosis based on dual-tree complex wavelet packet transform and multiple classifier fusion, Neurocomputing 171 (2016) 837–853.