Multiscale local features learning based on BP neural network for rolling bearing intelligent fault diagnosis

Jimeng Li*, Xifeng Yao, Xiangdong Wang, Qingwen Yu, Yungang Zhang
College of Electrical Engineering, Yanshan University, Qinhuangdao 066004, PR China
*Corresponding author. Email: [email protected]
Abstract: Traditional intelligent fault diagnosis techniques based on artificially selected features fail to make the most of the raw data and lack the capability of feature self-learning. Moreover, the most informative and discriminative parts of different fault signals account for only a small portion of the time-domain and frequency-domain signals. Therefore, in order to learn discriminative features from the raw data adaptively, this paper proposes a multiscale local feature learning method based on the back-propagation neural network (BPNN) for rolling bearing fault diagnosis. Exploiting the local characteristics of the fault features in both the time domain and the frequency domain, the BPNN is used to locally learn meaningful and dissimilar features from signals at different scales, thus improving the fault diagnosis accuracy. Two rolling bearing datasets are used to verify the validity and superiority of the proposed method through comparison with other methods.
Keywords: Rolling bearing fault diagnosis; Multiscale; Local feature learning; BPNN; SVM
1. Introduction
Rotating machinery plays an increasingly important role in modern industry. As important supporting components, rolling bearings are widely used in various rotating machinery, and their health conditions are closely related to the safe and stable operation of mechanical equipment [1-3]. Therefore, it is of great significance to study effective condition monitoring and fault diagnosis techniques for evaluating the health conditions of rolling bearings quickly and accurately. Vibration signals are an important carrier of equipment status information, and the signals caused by different failures differ. How to extract valuable feature information from complex vibration signals to evaluate the health conditions of the equipment has been the focus of research. Intelligent diagnosis technologies can effectively analyze and process vibration data to obtain accurate diagnosis results by using effective feature information, thus providing a powerful tool for complex data analysis and pattern recognition. Traditional intelligent fault diagnosis techniques, such as neural networks and support vector machines (SVM), almost all need to extract features from the raw data by means of various signal processing methods, and then use the artificially selected feature vectors as the input of intelligent classifiers to realize multi-fault recognition and classification. For example, Yan et al. proposed a fault diagnosis method based on improved multiscale dispersion entropy and maximum relevance minimum redundancy feature selection, and used the extreme learning machine as a classifier to classify the different faults of rolling bearings and gearboxes [4]. Dou et al. used the frequency domain features and the time domain dimensionless features of faulty bearing data as the input of the classifier, and discussed the performance of the K-nearest neighbor algorithm, the probabilistic neural network and the particle swarm optimization-SVM in fault diagnosis [5]. Glowacz et al. applied the FFT, the method of selection of amplitudes of frequencies (MSAF-12) and the mean of vector sum to obtain the feature vectors of vibration signals, and then input them into three classifiers to achieve fault diagnosis of induction motors [6]. Yuan et al. adopted the BPNN as the classifier, and used the improved multivariate multiscale sample entropy of the original signals as the input of the BPNN to achieve fault diagnosis of rolling bearings [7]. Lu et al. proposed an improved sensitive feature selection algorithm to adaptively select parameters from multi-dimensional feature vectors constructed from time and frequency domain characteristics, and then the RLS-BP neural network was used for fault diagnosis by classifying the feature vectors of normal and faulty signals [8]. Although these studies achieved good results, there are still some shortcomings in the methods combining feature extraction algorithms and classification models: (1) the selection of effective features depends on human experience and lacks adaptability, making it difficult to meet the requirements for analyzing various fault data in engineering practice; (2) the feature extraction algorithms
need artificial design, and it is difficult to make the most of the information in the raw data itself. As a new research hotspot in machine learning, deep learning has powerful feature extraction and representation capabilities, and has attracted extensive attention and research in many fields [9-11]. At present, various deep learning algorithms [12-14], such as the convolutional neural network (CNN), the deep belief network (DBN) and the stacked auto-encoder (SAE), have been successfully applied in the field of fault diagnosis. For example, Sun et al. introduced "dropout" technology into the sparse auto-encoder to realize fault identification of asynchronous motors [15]; Shao et al. proposed a fault diagnosis method for rolling bearings based on a deep wavelet auto-encoder network and the extreme learning machine (ELM), which can effectively diagnose the fault location and severity of rolling bearings [16]; Wen et al. proposed a data-driven CNN method for fault diagnosis, which first converted the time domain signals into 2-D images and then used them as the input of the CNN to extract effective features, thereby realizing fault classification [17]. Aiming at the influence of variable working conditions and noisy environments on the performance of intelligent diagnosis algorithms, Zhang et al. proposed a new CNN method with training interference, which can realize fault diagnosis of rolling bearings under noisy environments and different working loads by directly using the raw vibration signals [18]. The method improved the robustness and classification performance of the model by reusing the data points between adjacent samples and optimizing the model parameters. In addition, based on the SAE, Qi et al. adopted EEMD and AR models to preprocess the original vibration signals to obtain AR parameters, which were then used as the input of the SAE to achieve fault diagnosis of rolling bearings and gearboxes [19]. Inspired by the feature extraction capability of auto-encoders and the high training speed of the ELM, Mao et al. proposed an auto-encoder-ELM-based fault diagnosis method [20], which can automatically extract effective features from the spectra of the signals and obtain better classification performance. In view of some shortcomings of denoising autoencoders, Meng et al. proposed an enhanced denoising autoencoder method for rolling bearing fault diagnosis [21]. These methods utilize time domain or frequency domain signals as the input of deep learning models for feature learning and fault classification, and have achieved good research results. However, these methods require a large amount of sample data for parameter learning, and do not consider the influence of effective local feature information in the signals on parameter learning. Thus, according to the nonlinearity of fault signals and the periodicity of the transient features, some researchers have carried out research from the perspective of multiscale or local feature learning. Considering the multiscale characteristics inherent in vibration signals, Jiang et al. studied a multiscale CNN structure to perform multiscale feature extraction, thus identifying wind turbine gearbox faults [22]. By combining the wavelet packet multiscale transform and the DBN, Gan et al. proposed a hierarchical fault
diagnosis network, which used the wavelet packet energy as the input of the DBN to effectively identify the bearing fault location and severity [23]. Yan et al. proposed an intelligent fault diagnosis method based on multi-level wavelet packets and CNNs, in which the wavelet packet transform was employed to construct multi-level wavelet coefficient matrices for representing the nonstationary vibration signal, and the CNNs were then used to learn the multi-level fault features automatically, so as to realize the fault classification of gearboxes [24]. Li et al. used VMD to decompose the vibration signals into narrowband components, and the power spectral entropy of each component was used as the input of a DNN constructed from an autoencoder and the BPNN to realize the reduction of the original features and the fault classification of the planetary gearbox [25]. Sun et al. proposed a convolutional discriminative feature learning method [26], which first utilized the BPNN to learn local features from the raw data, then obtained the final features through the constructed feed-forward convolutional pooling architecture, and finally used the SVM as the classifier to realize fault diagnosis of the induction motor. In view of the shortcomings of traditional autoencoders, Jia et al. presented a local connection network constructed by a normalized SAE for intelligent fault diagnosis [27], which can locally learn various meaningful features from raw vibration signals to recognize mechanical health conditions. Lin et al. studied an integrated hierarchical learning framework based on the autoencoder and the ELM for feature learning and PHM modeling [28]; in this method, considering the redundancy of the data information, the long data samples were divided into multiple segments and then input to the ELM-based autoencoder for local feature learning. By dividing the long data samples into multiple segments, Lei et al. utilized sparse filtering to perform unsupervised feature learning, combined with the Softmax classifier, to realize the fault diagnosis of rolling bearings [29]. The above analysis indicates that the time-domain vibration data not only contains a large amount of redundant information and noise interference, but also that the part with higher discrimination accounts for only a small portion of the signal. If the data sample used for feature learning is long, the extraction accuracy of the effective features will be reduced. Similarly, from the frequency domain perspective, the vibration response frequencies caused by different mechanical failures are also different. If the vibration data in the whole frequency band is used to locally learn various effective features, noise and interference in other frequency bands will still affect the extraction of useful features in the sensitive frequency band, thereby reducing the feature extraction accuracy. Therefore, the vibration response features caused by different mechanical faults have local characteristics not only in the time domain, but also in the frequency domain. The above intelligent diagnosis methods perform local feature learning on vibration data in either the time domain or the frequency domain, and fail to simultaneously consider the local characteristics of the fault features in both domains.
Accordingly, this paper proposes a multiscale local feature learning method for rolling bearing intelligent fault diagnosis. Firstly, the wavelet multiscale transform is employed to decompose the original vibration signals into sub-signals of different scales to prevent frequency components in other frequency bands from affecting the learning of effective features in the sensitive frequency band; secondly, the BPNN is used to perform local feature learning on the sub-signals of different scales to obtain discriminative local features; finally, the features obtained at the different scales of the samples are combined as the input of the classifier to realize fault classification and diagnosis. The main contributions of this paper can be summarized as follows: (1) A multiscale local feature learning strategy is presented, which realizes local feature learning of raw vibration data at different scales and improves the extraction accuracy of various meaningful features. Meanwhile, in view of its good state recognition ability and simple network structure [30], the BPNN is selected as the model for local feature learning in this paper. Moreover, the BPNN can learn robust features in an efficient and effective way to improve the fault diagnosis performance [26]. (2) Two sets of rolling bearing fault diagnosis cases are adopted to verify the proposed method. In the diagnosis case of the rolling bearing test bench, the influence of two important parameters of the proposed method on the classification performance is analyzed in detail. Moreover, the effectiveness and superiority of the proposed method are verified by comparison with other related methods. The rest of the paper is organized as follows. Section 2 provides a brief overview of the theoretical background of the BPNN. Section 3 describes the proposed method in detail. In Section 4, the reliability and superiority of the proposed method are verified using two sets of rolling bearing datasets. Finally, some conclusions are drawn in Section 5.
2. The theoretical basis of BPNN
BPNN is a common supervised machine learning algorithm that uses the BP algorithm and labeled training samples to optimize the randomly initialized network weights for prediction or classification. Fig. 1 shows a three-layer BPNN, including the input layer, the hidden layer and the output layer. In the BPNN, the N-dimensional vector $x = \{x_1, x_2, \ldots, x_N\}$ represents the input of the network, the d-dimensional vector $h = \{h_1, h_2, \ldots, h_d\}$ represents the feature representation of the hidden layer, and the mapping between the input layer and the hidden layer is an affine transform followed by a nonlinear activation, which can be expressed as follows

$$h = f(Wx + b) \qquad (1)$$

where W and b represent the weight matrix and the bias vector of the network, respectively, and f(·) is the sigmoid activation function $f(x) = \dfrac{1}{1 + e^{-x}}$.
Fig. 1 Network structure of BPNN
In the BPNN, the output layer is a Softmax classifier, and the feature representation of the hidden layer is mapped to the c-dimensional output vector $\hat{y}$. Each dimension of $\hat{y}$ represents the probability that the current input features belong to a certain category, and the category corresponding to the maximum probability value is the category to which the current feature representation belongs. The mapping relationship between the hidden layer and the labeled output layer is as follows
$$\hat{y} = f_\theta(h) \qquad (2)$$

where the output-layer classifier $f_\theta$ is the Softmax function, as shown in Eq. (3)

$$f_\theta(h^{(i)}) = \begin{bmatrix} p(y^{(i)}=1 \mid h^{(i)};\theta) \\ p(y^{(i)}=2 \mid h^{(i)};\theta) \\ \vdots \\ p(y^{(i)}=c \mid h^{(i)};\theta) \end{bmatrix} = \frac{1}{\sum_{k=1}^{c} e^{\theta_k^T h^{(i)}}} \begin{bmatrix} e^{\theta_1^T h^{(i)}} \\ e^{\theta_2^T h^{(i)}} \\ \vdots \\ e^{\theta_c^T h^{(i)}} \end{bmatrix} \qquad (3)$$

where $h^{(i)} \in \mathbb{R}^d$, $i = 1, 2, \ldots, m$, denotes the feature vector, $y^{(i)} \in \{1, 2, \ldots, c\}$ is the label set, c denotes the number of categories, and $\theta = [\theta_1^T, \theta_2^T, \ldots, \theta_c^T]^T$ is the classifier parameter matrix. The mean squared error is used as the cost function to optimize the parameters. Furthermore, to alleviate the network overfitting problem, L2-norm regularization is introduced, and the cost function is described as

$$[W^*, b^*] = \arg\min_{W, b} \; \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| y^{(i)} - f_\theta\big(f(x^{(i)})\big) \right\|^2 + \frac{\lambda}{2} \| W \|_2^2 \qquad (4)$$

where $y^{(i)}$ denotes the ith labeled output, and λ is the penalty factor. The BPNN obtains the optimal parameters by minimizing the cost function.
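To make the above formulation concrete, the following minimal NumPy sketch implements the forward pass and the regularized cost of Eqs. (1)-(4). It is an illustrative reconstruction rather than the authors' implementation; the array shapes, random data and hyperparameter values are assumptions, and the gradient-descent (BP) update itself is omitted.

```python
# Minimal sketch of the three-layer BPNN of Section 2 (assumed shapes and hyperparameters).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, W, b, theta):
    """X: (m, N) inputs; W: (N, d), b: (d,) hidden-layer params; theta: (d, c) Softmax params."""
    H = sigmoid(X @ W + b)                       # Eq. (1): h = f(Wx + b)
    Y_hat = softmax(H @ theta)                   # Eqs. (2)-(3): class probabilities
    return H, Y_hat

def cost(Y, Y_hat, W, lam):
    """Eq. (4): L2-regularized mean squared error between one-hot labels Y and predictions."""
    mse = 0.5 * np.mean(np.sum((Y - Y_hat) ** 2, axis=1))
    return mse + 0.5 * lam * np.sum(W ** 2)

# Example with random data: 50 samples of length 200, 10 hidden units, 4 classes (illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
Y = np.eye(4)[rng.integers(0, 4, 50)]
W = rng.standard_normal((200, 10)) * 0.1
b = np.zeros(10)
theta = rng.standard_normal((10, 4)) * 0.1
H, Y_hat = forward(X, W, b, theta)
print(cost(Y, Y_hat, W, lam=1e-3))
```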
3. The proposed method
In this section, the proposed method for rolling bearing intelligent fault diagnosis is described in detail, and the algorithm flowchart is shown in Fig. 2. In the proposed method, the wavelet multiscale transform is employed to decompose the raw data into sub-signals of different scales to construct the samples for local feature extraction at different scales; the mapping relationship between the data features and the labeled data is established by training the BPNN model to obtain discriminative feature representations in the hidden layer; finally, the local features obtained at different scales are combined as the input of the SVM to realize the intelligent fault diagnosis of rolling bearings.
Fig. 2 The flowchart of the proposed method for rolling bearing fault diagnosis
3.1 Data preprocessing based on wavelet multiscale transform
The vibration signals collected from rotating machinery are complex and non-stationary. The valuable feature information caused by localized defects is superimposed with other interference components in the time domain signals, and considering the local characteristics of the fault features in the frequency domain, the features directly learned from the raw data may not be optimal. As an effective signal processing method for multi-resolution analysis, the wavelet transform can decompose the raw signals into sub-signals in different independent frequency bands without redundancy and leakage, which is convenient for analyzing and extracting features at different scales. Therefore, the wavelet transform has been widely studied and applied in the field of mechanical fault diagnosis [31, 32]. In this paper, the discrete wavelet transform (DWT) is adopted as the preprocessing method for the
multiscale analysis of the raw data, so as to realize feature learning and extraction at different scales. The description of the DWT is as follows. Firstly, the original vibration signal x is decomposed onto J scales by the DWT. Let a0 = x; the discrete wavelet decomposition at scale j is

$$a_{j-1}(k) = \sum_{n} a_{j,k}(n)\,\phi_{j,k}(n) + \sum_{n} d_{j,k}(n)\,\psi_{j,k}(n) \qquad (5)$$

where $\phi_{j,k}$ and $\psi_{j,k}$ are the scaling function and the wavelet function, respectively; $a_j$ and $d_j$ represent the approximation coefficients and the detail coefficients, respectively, as shown below

$$a_j(k) = \sum_{n} a_{j-1}(n)\,h(n-2k), \qquad d_j(k) = \sum_{n} a_{j-1}(n)\,g(n-2k), \qquad j = 1, 2, \ldots, J \qquad (6)$$
in which h(n) and g(n) are the low-pass and high-pass filter coefficients for discrete wavelet decomposition. Through the above analysis, the approximation coefficients $a_J$ and the detail coefficients $\{d_J, \ldots, d_2, d_1\}$ of different scales can be obtained. Secondly, single-component reconstruction is performed using the approximation coefficients and the detail coefficients, respectively. That is, when performing the single-component reconstruction using the approximation coefficients $a_J$ or detail coefficients $d_j$, the remaining coefficients are all set to zero. The reconstruction formula of the DWT is as follows

$$a_{j-1}(k) = \sum_{n} a_j(n)\,\tilde{h}(k-2n) + \sum_{n} d_j(n)\,\tilde{g}(k-2n), \qquad j = J, \ldots, 2, 1 \qquad (7)$$

where $\tilde{h}(n)$ and $\tilde{g}(n)$ are the low-pass and high-pass filter coefficients of discrete wavelet reconstruction, respectively. Through the above reconstruction process, the J+1 reconstruction components of different scales $\{A_J, D_J, D_{J-1}, \ldots, D_1\}$ can be obtained. In this paper, the 3-level DWT is performed on the collected vibration signal x to obtain 4 reconstruction components {A3, D3, D2, D1}; then each component is divided into M non-overlapping segments of length Ns to construct the samples for parameter learning and feature extraction. In other words, each sample consists of 4 components, that is, $x_m = \{A_3^m, D_3^m, D_2^m, D_1^m\}$, $m = 1, 2, \ldots, M$, where $x_m \in \mathbb{R}^{N_s \times 4}$ denotes the mth sample.
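The preprocessing step can be sketched as follows with PyWavelets; the toolbox, the "db4" wavelet basis and the signal length are assumptions made here for illustration, since the paper does not specify them. Each single-branch component is reconstructed by zeroing the other coefficient sets before the inverse transform, following Eq. (7).

```python
# Sketch of the 3-level DWT preprocessing of Section 3.1 (PyWavelets and 'db4' are assumptions).
import numpy as np
import pywt

def multiscale_components(x, wavelet="db4", level=3):
    coeffs = pywt.wavedec(x, wavelet, level=level)           # [cA3, cD3, cD2, cD1]
    names = ["A3", "D3", "D2", "D1"]
    components = {}
    for i, name in enumerate(names):
        kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        rec = pywt.waverec(kept, wavelet)                     # single-component reconstruction, Eq. (7)
        components[name] = rec[: len(x)]                      # trim possible boundary padding
    return components

def segment(component, Ns):
    """Divide a component into M non-overlapping segments of length Ns (one row per segment)."""
    M = len(component) // Ns
    return component[: M * Ns].reshape(M, Ns)

x = np.random.randn(120000)                                   # placeholder raw vibration signal
comps = multiscale_components(x)
samples = {name: segment(c, Ns=1200) for name, c in comps.items()}
print({k: v.shape for k, v in samples.items()})               # each component: (100, 1200)
```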
3.2 Parameter learning of BPNN and multiscale local feature extraction
In the fault diagnosis of rotating machinery, the collected vibration signals are polluted by strong noise, and the most informative and discriminative parts of the different fault signals account for only a small portion of the time-domain signals. Therefore, in this paper, firstly, the long-sequence samples are divided into multiple non-overlapping segments and input into the BPNN for local feature parameter learning, thereby obtaining parameter information capable of representing the local features of the samples; secondly, local feature extraction is performed on the 4 components in a sample using the trained BPNN, followed by pooling; thirdly, the pooled local features obtained from each component are combined to get the overall features reflecting the corresponding frequency band information; finally, the overall features corresponding to the 4 components in the sample are combined to obtain multiscale features reflecting the raw data information. In addition, since the sample consists of 4 components and the feature learning process is the same for each component, the component A3 is taken as an example to describe the parameter learning of BPNN and the local feature extraction algorithm in detail; the corresponding algorithm flow is shown in Fig. 3.
Fig. 3 The algorithm flow of parameter learning of BPNN and local feature extraction (taking A3 in sample xm as an example)
Step 1: Parameter learning of BPNN. The component A3 obtained by the 3-level DWT of the original signal is divided into P non-overlapping segments of length Np (far less than Ns) to construct the sample set $\mathbf{A}_3 = \{A_3^1, A_3^2, \ldots, A_3^P\}$, $A_3^p \in \mathbb{R}^{N_p \times 1}$, $p = 1, 2, \ldots, P$. The constructed samples $\mathbf{A}_3$ are used as the input of the BPNN for parameter training to obtain the low-dimensional feature representations of the input data, and the Softmax classifier is employed as the output layer to establish the mapping relationship between the feature representations and the labeled data, so as to obtain the optimal parameters $[W^*, b^*]$.

Step 2: Local feature extraction. The component $A_3^m$ in the mth sample $x_m = \{A_3^m, D_3^m, D_2^m, D_1^m\}$ is divided into $N_s/N_p$ non-overlapping segments of length Np, and local feature extraction is performed on the $N_s/N_p$ segments using the trained BPNN to obtain the local features of each segment $f_{Local}^i$, $i = 1, 2, \ldots, N_s/N_p$. Then, maximum-based pooling and minimum-based pooling with pooling parameter ζ are performed on $f_{Local}^i$ to obtain $\{f_{Local\_max}^i\}$ and $\{f_{Local\_min}^i\}$, respectively. Finally, the pooled local features of each segment are combined to obtain the feature representation $F_{A_3}^{(m)} = \left[ f_{Local\_max}^1, \ldots, f_{Local\_max}^{N_s/N_p}, f_{Local\_min}^1, \ldots, f_{Local\_min}^{N_s/N_p} \right]$ of the component $A_3^m$ in the sample $x_m$.

Step 3: Multiscale local feature combination. The above Steps 1-2 are executed on the remaining three components $D_3^m$, $D_2^m$ and $D_1^m$ in the sample $x_m$ to obtain $F_{D_3}^{(m)}$, $F_{D_2}^{(m)}$ and $F_{D_1}^{(m)}$, respectively; then, the local features corresponding to the different scale components in the sample $x_m$ are combined to obtain a multiscale feature set $F_{out\_train}^{(m)} = \left[ F_{A_3}^{(m)}, F_{D_3}^{(m)}, F_{D_2}^{(m)}, F_{D_1}^{(m)} \right]$ representing the original data information, which is input into the classifier for classification.
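The following sketch summarizes Steps 1-3 for a single sample, assuming the trained hidden-layer parameters (W, b) per component from Step 1 and reading the pooling parameter ζ as the length of the non-overlapping pooling window applied to each segment's hidden-layer feature vector; names and shapes are illustrative, not the authors' code.

```python
# Sketch of Steps 1-3 for one sample x_m = {A3, D3, D2, D1} (assumed reading of the pooling step).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pool(v, zeta, mode="max"):
    """Non-overlapping pooling of a 1-D feature vector v with window size zeta."""
    v = v[: (len(v) // zeta) * zeta].reshape(-1, zeta)
    return v.max(axis=1) if mode == "max" else v.min(axis=1)

def component_features(component, W, b, Np, zeta):
    """component: length-Ns array (e.g. A3 of one sample); returns its pooled feature vector F."""
    segments = component.reshape(-1, Np)                  # Ns/Np segments of length Np
    local = sigmoid(segments @ W + b)                     # local features of each segment (trained BPNN)
    f_max = np.concatenate([pool(f, zeta, "max") for f in local])
    f_min = np.concatenate([pool(f, zeta, "min") for f in local])
    return np.concatenate([f_max, f_min])

def sample_features(sample_dict, params, Np=200, zeta=20):
    """Multiscale combination: concatenate the pooled features of the four components."""
    return np.concatenate([component_features(sample_dict[k], *params[k], Np=Np, zeta=zeta)
                           for k in ("A3", "D3", "D2", "D1")])
```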
3.3 SVM classifier for multi-classification learning
In order to realize the intelligent fault diagnosis of rolling bearings, the SVM is used as the final classifier for dealing with the multi-category classification problem in this paper. The SVM maps input feature vectors nonlinearly into a high-dimensional feature space and constructs the optimal hyperplane for distinguishing which class a sample belongs to. The mathematical description of the SVM is as follows

$$\min_{w, b} \; \frac{1}{2} \| w \|^2 + C \sum_{i=1}^{N} \xi_i \qquad \text{s.t.} \quad y_i (w \cdot x_i + b) \geq 1 - \xi_i \qquad (8)$$

where $\xi_i$ is a slack variable, $x_i$ denotes the ith input feature vector, $y_i$ denotes the ith label indicating the category of $x_i$, and C is the penalty factor. The SVM can transform the nonlinear classification problem into a linear classification problem in a high-dimensional space by introducing the kernel function $K(x_i, x_j)$, and the Gaussian radial basis kernel function is selected as the kernel function of the SVM in this paper:

$$K(x_i, x_j) = \exp\left( -\frac{\| x_i - x_j \|^2}{2\sigma^2} \right) \qquad (9)$$

where σ is the Gaussian kernel parameter. Introducing the Lagrange multipliers $\alpha_i$ to solve problem (8), the optimal classification function f(x) can be obtained as follows

$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x, x_i) + b \right) \qquad (10)$$

The optimal values of the penalty factor C and the kernel parameter σ in the SVM are obtained by cross-validation. In this paper, the SVM model adopts the one-vs-one mode, combining multiple binary SVMs to achieve multi-category classification. The multiscale local feature set $F_{out\_train}$ learned from the training samples is used to train the SVM in a supervised manner, and the multiscale local feature set $F_{out\_test}$ learned from the test data is used as the input of the SVM model to analyze the performance of the proposed method in the intelligent fault diagnosis of rolling bearings.
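A minimal scikit-learn sketch of this classification stage is given below; the library, the candidate grids for C and the kernel width, and the 5-fold cross-validation are assumptions for illustration. In scikit-learn the RBF kernel is parameterized by gamma, which corresponds to 1/(2σ²) in Eq. (9).

```python
# Sketch of the SVM stage of Section 3.3: RBF kernel, one-vs-one, cross-validated C and gamma.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_svm(F_train, y_train):
    grid = {"C": [1, 10, 100, 1000], "gamma": [1e-4, 1e-3, 1e-2, 1e-1]}   # illustrative grids
    svm = GridSearchCV(SVC(kernel="rbf", decision_function_shape="ovo"),
                       grid, cv=5)                                         # cross-validation
    svm.fit(F_train, y_train)
    return svm

# Usage: svm = train_svm(F_out_train, labels_train); acc = svm.score(F_out_test, labels_test)
```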
4. Fault diagnosis cases of rolling bearings
In order to verify the effectiveness of the proposed method, the following experiments are carried out using the rolling bearing test bench data and the Case Western Reserve University bearing data. The influence of the BPNN structural parameters and the pooling parameter ζ on the performance of the proposed method is analyzed. The effectiveness of the proposed method is demonstrated through the visualization of the learned features, and the superiority of the proposed method is validated by comparison with other methods.
4.1 Case 1 - rolling bearing test bench data analysis
4.1.1 Test bench description
In this subsection, the performance of the proposed method is analyzed using the rolling bearing vibration data collected from the rolling bearing fault simulation test bench. In the experiments, the rolling bearing operated under 4 conditions: normal condition, rolling ball fault, inner race fault and outer race fault. For each condition, the rolling bearing operated at 3 different speeds (300 r/min, 400 r/min and 500 r/min). An acceleration sensor mounted on the bearing support was employed to collect the vibration data with a sampling rate of 1280 Hz. Fig. 4 displays the rolling bearing fault simulation test bench, and Fig. 5 shows the rolling bearings in the 4 different conditions.
Fig. 4 The rolling bearing fault simulation test bench
Fig. 5 The rolling bearings in 4 different conditions: (a) normal; (b) rolling ball fault; (c) inner race fault; (d) outer race fault
4.1.2 Parameter analysis
In the proposed method, the algorithm parameter Np and the pooling parameter ζ have an important influence on the performance of the algorithm. The parameter Np, representing the segment size for local feature learning, determines the BPNN structure, while the pooling parameter ζ determines the dimension of the feature vectors input into the SVM classifier. Therefore, we analyzed the influence of different parameters on the classification performance of the proposed method using the vibration data collected at the speed of 300 r/min. In the parameter learning of the proposed method, firstly, the wavelet multiscale transform is performed on the original vibration data in each condition to obtain 4 components {A3, D3, D2, D1}; then, the 4 components are each divided into
120000/Np non-overlapping segments of length Np to construct the sample sets {A3, D3, D2, D1}; finally, 500 samples are randomly selected from all samples constructed under the four conditions and input into the BPNN for parameter learning. In the local feature extraction of the proposed method, the wavelet multiscale transform is performed on the rolling bearing vibration data collected in each condition to obtain 4 components {A3, D3, D2, D1}; then, for each condition, the 4 components are used to construct 100 samples with a length of Ns = 1200 data points, i.e. $x_m = \{A_3^m, D_3^m, D_2^m, D_1^m\}$, $m = 1, 2, \ldots, 100$, $x_m \in \mathbb{R}^{N_s \times 4}$. Table 1 shows the sample composition of the rolling bearing vibration data at the speed of 300 r/min. 100 samples are randomly selected from the 400 samples as training samples, and the rest are used as testing samples. The trained BPNN is used to perform local feature learning on all samples, and the classification accuracy of the testing samples is regarded as the index used to evaluate the performance of the proposed method. The optimal parameters are obtained by analyzing the influence of the BPNN structure and the pooling parameter on the classification accuracy of the testing samples. During the analysis, the BPNN parameters are set as follows: the learning rate is set to 0.05, the momentum is 0.05, the number of iterations is 50, and the batch size is 50. The classification accuracies and the time costs of the proposed method with different parameters Np and ζ are shown in Table 2 and Table 3 respectively, and the displayed results are the average of 5 test results. As shown in Table 2, with the increase of the parameters Np and ζ, the fluctuation of the classification accuracy of the proposed method is small. This indicates that BPNN has
good robustness in feature learning, and the classification performance is relatively stable. It is observed from Table 3 that the larger the parameter Np is, the more time is spent on model training. Meanwhile, with the increase of the pooling parameter ζ, fewer effective features remain, thus reducing the time spent on SVM classifier training. Therefore, based on the above analysis, the parameter Np is set to 200, the BPNN structure is set to [200 100 10], and the pooling parameter ζ is set to 20 in the subsequent analysis.

Table 1 Description of rolling bearing datasets under the speed of 300 r/min
| Fault location | Rotating speed | Number of samples | Category labels |
| --- | --- | --- | --- |
| Normal | 300 r/min | 100 | 1 |
| Rolling ball fault | 300 r/min | 100 | 2 |
| Inner race fault | 300 r/min | 100 | 3 |
| Outer race fault | 300 r/min | 100 | 4 |
Table 2 Classification accuracies of the proposed method with different parameters

| Parameter Np | BPNN structure of input layer and hidden layer | Pooling size ζ=5 | ζ=10 | ζ=15 | ζ=20 | ζ=25 |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | [100 50] | 98.42% | 99.08% | 99.17% | 98.92% | 98.83% |
| 200 | [200 100] | 97.50% | 98.92% | 99.33% | 99.75% | 99.17% |
| 300 | [300 150] | 98.25% | 99.83% | 99.17% | 99.58% | 98.92% |
| 400 | [400 200] | 98.75% | 99.33% | 98.83% | 99.25% | 99.25% |
Table 3 The time cost of the proposed method with different parameters

| Parameter Np | BPNN structure of input layer and hidden layer | Pooling size ζ=5 | ζ=10 | ζ=15 | ζ=20 | ζ=25 |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | [100 50] | 31.4 s | 27.7 s | 22.8 s | 21.6 s | 21.1 s |
| 200 | [200 100] | 42.7 s | 36 s | 34.7 s | 33.3 s | 33.2 s |
| 300 | [300 150] | 57.7 s | 50.6 s | 48 s | 47.4 s | 47.1 s |
| 400 | [400 200] | 79.5 s | 72.7 s | 68.9 s | 68.5 s | 68.3 s |
In addition to effective feature extraction methods, the selection of the classifier is also a key factor affecting the accuracy of fault identification. Currently, three kinds of classifiers, namely the SVM, the ELM and the Softmax classifier, have been widely applied in intelligent fault diagnosis. Therefore, in this paper, the classification accuracy is used as the evaluation indicator to analyze the performance of the three classifiers. All three classifiers utilize the multiscale local features learned by the proposed method as input vectors, and the average of 5 test results is taken as the final result. Table 4 shows the classification accuracies of the 3 classifiers. It is observed that the testing accuracy of the SVM is superior to that of the Softmax classifier and the ELM. Accordingly, the SVM is selected as the classifier to achieve the fault diagnosis of rolling bearings in this paper.

Table 4 Comparison of different classifiers
| Classifiers | Training accuracy (%) | Testing accuracy (%) |
| --- | --- | --- |
| SVM | 100.00 | 99.75 |
| Softmax | 98.75 | 98.25 |
| ELM | 100.00 | 95.58 |
4.1.3 Analysis of diagnosis results
In this subsection, the vibration data collected at the three different speeds is used to analyze the performance of the proposed method. As described in Section 4.1.2, first, 1500 samples are randomly selected for parameter learning of the BPNN. Second, the sample set shown in Table 5 is constructed for multiscale local feature extraction and diagnostic testing. In order to verify the effectiveness of the proposed method in multiscale local feature learning, the proposed method is first used to learn features from two samples randomly selected from all samples of each fault category, and then one feature is randomly selected from the feature vectors of each sample to calculate the correlation coefficient between different samples. The results are shown in Fig. 6. It is observed that the similarity between features of the same fault type is the highest, and the similarity between features of different fault types is small. This shows that the proposed method can learn discriminative features from samples of different fault types, providing the basis for subsequent fault classification.

Table 5 Description of rolling bearing datasets under three speeds
| Fault location | Rotating speed (r/min) | Number of samples | Category labels |
| --- | --- | --- | --- |
| Normal | 300/400/500 | 300 | 1 |
| Rolling ball fault | 300/400/500 | 300 | 2 |
| Inner race fault | 300/400/500 | 300 | 3 |
| Outer race fault | 300/400/500 | 300 | 4 |
Fig. 6 Correlation diagram of different categories of features
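A minimal sketch of this correlation analysis (the Fig. 6-style map) is given below; the feature dimensions and the use of the Pearson coefficient via numpy.corrcoef are assumptions made here for illustration.

```python
# Sketch of pairwise correlation between learned feature vectors (assumed: 2 samples per condition).
import numpy as np

def feature_correlation(feature_vectors):
    """feature_vectors: (n_samples, n_features); returns the (n_samples, n_samples) correlation matrix."""
    return np.corrcoef(feature_vectors)

F = np.random.rand(8, 480)                       # placeholder: 2 samples x 4 bearing conditions
print(np.round(feature_correlation(F), 2))
```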
Furthermore, the “t-SNE” method is also applied to visualize the learned features. The “t-SNE” is an effective way to visualize high-dimensional data by mapping data samples from the original feature space to two-dimensional or three-dimensional space [33]. First, the dimension of the feature vector is reduced to 50 by principal component analysis [34], which facilitates the calculation of the pairwise distance between points and can
suppress some noise; then, the "t-SNE" method is used to convert the 50-dimensional data to a two-dimensional map. In addition, the SAE and CNN methods are also employed to analyze the dataset shown in Table 5, and the learned features are visualized using "t-SNE". During the analysis, the structure of the SAE is [1200 600 300 100 4], and the CNN has two convolutional layers, each of which has 30 one-dimensional convolution kernels of length 100. For comparison, Fig. 7(a) displays the clustering results obtained by "t-SNE" using the raw data, and the clustering maps obtained based on the features learned by these methods are shown in Fig. 7(b), (c) and (d), respectively. It is observed that in Fig. 7(a), the data samples of the four conditions are randomly distributed, which indicates that the difference among the original data is small and it is difficult to achieve effective classification and identification. In Fig. 7(b), the features learned by the SAE have a good clustering effect on bearing outer race faults, but it is still difficult to effectively identify and separate the other faults. Comparing Fig. 7(c) with Fig. 7(d), the two methods almost achieve effective separation of the four fault conditions, and the clustering effect is superior to that of Fig. 7(a) and (b). However, in Fig. 7(c), there are still several overlapping samples between the rolling ball fault and the inner race fault. Therefore, compared with Fig. 7(a), (b) and (c), it can be seen that features of the same condition learned by the proposed method are clustered well while features of different conditions are separated well.
Fig. 7 Features visualization based on "t-SNE": (a) raw data; (b) features learned by SAE; (c) features learned by CNN; (d) features learned by the proposed method
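A minimal sketch of the visualization pipeline described above (PCA to 50 dimensions followed by t-SNE to a 2-D map) is given below, assuming scikit-learn and matplotlib; the perplexity value and plot settings are illustrative assumptions.

```python
# Sketch of the PCA + t-SNE feature visualization used for Fig. 7 (libraries and settings assumed).
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def tsne_map(features, labels):
    reduced = PCA(n_components=50).fit_transform(features)   # suppress noise, speed up pairwise distances
    embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(reduced)
    plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=8)
    plt.xlabel("Dimension 1")
    plt.ylabel("Dimension 2")
    plt.show()
```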
In order to verify the superiority of the proposed method, several intelligent fault diagnosis methods, such as the
SAE + Softmax, SAE + SVM, CNN + Softmax, CNN + SVM and wavelet packet energy + SVM, are also used to analyze the dataset shown in Table 5. In the method of wavelet packet energy + SVM, the 3-level wavelet packet decomposition is performed on the raw data, and the energies of the 8 frequency bands are calculated as the input feature vector of the SVM. In addition, the variant of the proposed method that removes the wavelet multiscale transform, named LFL, is also employed to analyze the dataset; its network structure and parameters are consistent with the proposed method. In order to compare the performance of the different classification methods, the precision, recall, F-measure and accuracy are selected as the evaluation indicators, defined as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \times 100 \qquad (11)$$

$$\text{Recall} = \frac{TP}{TP + FN} \times 100 \qquad (12)$$

$$\text{F-measure} = \frac{2TP}{2TP + FP + FN} \times 100 \qquad (13)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \times 100 \qquad (14)$$

where TP and TN represent the numbers of true positive and true negative instances, respectively, and FP and FN represent the numbers of false positive and false negative instances. From Eq. (13), it can be seen that the F-measure combines the precision and the recall, reaching its best value at 100 and its worst value at 0.
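For reference, a minimal sketch of computing these per-class indicators from a confusion matrix (each class treated one-vs-rest, as the TP/FP/FN/TN counts imply) is shown below; the use of scikit-learn's confusion_matrix is an assumption.

```python
# Sketch of the per-class evaluation indicators of Eqs. (11)-(14), one-vs-rest per class.
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_metrics(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)            # rows: true classes, columns: predicted classes
    TP = np.diag(cm).astype(float)
    FP = cm.sum(axis=0) - TP
    FN = cm.sum(axis=1) - TP
    TN = cm.sum() - TP - FP - FN
    precision = 100 * TP / (TP + FP)                 # Eq. (11)
    recall    = 100 * TP / (TP + FN)                 # Eq. (12)
    f_measure = 100 * 2 * TP / (2 * TP + FP + FN)    # Eq. (13)
    accuracy  = 100 * (TP + TN) / (TP + FP + TN + FN)  # Eq. (14)
    return precision, recall, f_measure, accuracy
```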
Table 6 shows the average of 5 trial results obtained by the different classification methods. As shown in Table 6, the average testing accuracy of the proposed method is 99.07%, which is higher than those of wavelet packet energy + SVM, SAE + Softmax, SAE + SVM, CNN + Softmax and CNN + SVM, which are 92.57%, 63.6%, 65.27%, 95.23% and 97.82%, respectively. Moreover, compared with the proposed method, the LFL method removes the wavelet multiscale transform, which reduces the difference between the learned features of different faults, resulting in a decrease in classification performance. This indicates the effectiveness and superiority of the multiscale local feature learning strategy proposed in this paper. In addition, the standard deviation of the proposed method is 0.25%, which is also smaller than that of the other methods. Meanwhile, based on the precision and recall rates shown in Table 6, the F-measure values of these methods are calculated, as shown in Fig. 8. These comparison results demonstrate that the proposed method is more powerful for feature learning than the other methods, and has good classification accuracy and stability.

Table 6 The diagnostic results of different methods

| Methods | Normal: Precision | Normal: Recall | Rolling ball fault: Precision | Rolling ball fault: Recall | Inner race fault: Precision | Inner race fault: Recall | Outer race fault: Precision | Outer race fault: Recall | Testing accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wavelet packet energy + SVM | 84.87±2.51 | 99.50±0.46 | 94.74±1.64 | 88.02±2.22 | 92.86±2.04 | 87.63±2.82 | 99.87±0.27 | 94.6±0.87 | 92.57±0.83 |
| LFL | 71.04±4.05 | 60.37±1.88 | 79.96±2.96 | 90.7±3.32 | 95.86±1.14 | 97.89±0.31 | 74.64±4.56 | 73.54±7.1 | 80.62±1.04 |
| SAE + Softmax | 61.93±3.03 | 63.02±2.42 | 64.25±0.97 | 65.05±2.08 | 40.25±1.32 | 47.8±2.93 | 98.68±0.9 | 77.68±3.55 | 63.6±0.8 |
| SAE + SVM | 58.62±3.86 | 45.28±4.85 | 53.57±1.68 | 81.92±4.88 | 51.42±3.38 | 37.67±4.13 | 99.2±1.09 | 97.34±0.88 | 65.27±0.68 |
| CNN + Softmax | 99.4±0.49 | 97±2.45 | 86.94±5.93 | 99.2±0.75 | 99.5±0.66 | 89.87±5.4 | 99.8±0.4 | 96.67±3.55 | 95.23±1.01 |
| CNN + SVM | 99.82±0.36 | 99.29±0.53 | 94.16±1.45 | 97.48±2.5 | 97.48±2079 | 94.58±1.17 | 100 | 100 | 97.82±0.35 |
| The proposed method | 99.8±0.44 | 97.91±0.63 | 99.39±0.38 | 98.94±0.68 | 99.12±0.72 | 99.46±0.3 | 97.99±0.71 | 100 | 99.07±0.25 |

(Note: the format of the diagnosis result is average value ± standard deviation.)
Fig. 8 F-measures of all methods
4.2 Case 2 - Case Western Reserve University rolling bearing data analysis
In order to further verify the effectiveness and superiority of the proposed method in the intelligent diagnosis of rolling bearing faults, the proposed method is applied to analyze the bearing dataset provided by Case Western Reserve University [35]. In this dataset, the vibration signals were collected from the drive end bearing of a motor under 4 different conditions: normal condition, outer race fault, inner race fault and rolling ball fault. There were 3 different severity levels (0.007 inch, 0.014 inch and 0.021 inch) for each fault condition. The vibration data collected when the test bench operated under 3 different loads is selected for analysis in this subsection, and the sampling frequency was 12 kHz. Therefore, in this paper, the bearing dataset contains 10 bearing conditions under the three loads. There are 60 samples for each health condition under one load, and the length of each sample is Ns = 2000 data points. The composition of the data samples is described in Table 7.

Table 7 Data description of rolling bearing
| Fault location | Fault diameter (in.) | Loads | Number of samples | Category labels |
| --- | --- | --- | --- | --- |
| Normal | — | 0/1/3 hp | 180 | 1 |
| Rolling ball fault | 0.007 | 0/1/3 hp | 180 | 2 |
| Rolling ball fault | 0.014 | 0/1/3 hp | 180 | 3 |
| Rolling ball fault | 0.021 | 0/1/3 hp | 180 | 4 |
| Inner race fault | 0.007 | 0/1/3 hp | 180 | 5 |
| Inner race fault | 0.014 | 0/1/3 hp | 180 | 6 |
| Inner race fault | 0.021 | 0/1/3 hp | 180 | 7 |
| Outer race fault | 0.007 | 0/1/3 hp | 180 | 8 |
| Outer race fault | 0.014 | 0/1/3 hp | 180 | 9 |
| Outer race fault | 0.021 | 0/1/3 hp | 180 | 10 |
Based on the parameter analysis method in Section 4.1.2, the optimal values of the parameter Np and the pooling parameter ζ are set to 400 and 20, respectively, the BPNN structure is selected as [400 200 10], and the remaining parameters are the same as in Section 4.1.2. Fig. 9 displays the bearing vibration signals collected at different times for the inner race fault (0.007 in.) and the outer race fault (0.007 in.), together with their corresponding features learned by the proposed method. It is observed that the features learned from the different scale components of a sample show obvious differences, the similarity of the features learned from signals with the same fault type is high, and the differences between the features learned from signals with different fault types are relatively large, which is the key to solving the multi-class classification problem. This indicates the effectiveness and practicability of the proposed method in adaptive feature learning.
Fig. 9 (a) Vibration signals of 2 faults at different times; (b) Output features of 2 faults at different times
We randomly select 25% of the samples of the bearing dataset to train the proposed method and use the rest for testing; the testing accuracy of the proposed method is shown in Table 8. It is observed that the proposed method obtains 99.31% testing accuracy in classifying the bearing dataset. Given that the bearing dataset used is a benchmark in mechanical fault diagnosis, the proposed method is compared with other published methods that use this dataset.
In Method 1 [36], empirical mode decomposition (EMD), wavelet kernel local Fisher discriminant analysis (WKLFDA) and the SVM were integrated to identify 10 health conditions of rolling bearings, achieving 98.8% testing accuracy. In Method 2 [37], the obtained bi-spectrum features were input to the SVM to distinguish four health conditions of rolling bearings, and a testing accuracy of 96.98% was obtained. These two methods used traditional signal processing techniques to extract effective features, and then used the SVM to achieve bearing fault classification and recognition; the feature extraction algorithms need manual design and are heavily dependent on human knowledge and experience. Comparing Method 1 and Method 2 with the proposed method indicates that the features automatically learned by the proposed method are more representative than the manually extracted features, which is conducive to the classification and recognition of the health conditions of rolling bearings. In Method 3 [38], by combining various activation functions, the ensemble deep autoencoders (EDAEs) method was proposed to realize the identification and classification of 12 health conditions of rolling bearings, achieving a testing accuracy of 97.18%. Although the EDAEs method makes full use of the information in the vibration signal itself, the training process of multiple DAEs increases the time cost of the algorithm. In Method 4 [28], the integrated hierarchical learning method based on auto-encoders and the ELM was used to perform local feature learning on the vibration signal to classify 4 bearing health conditions, and achieved a testing accuracy of 92.60%. By combining the wavelet packet transform and the DBN, a hierarchical diagnosis network (HDN), namely Method 5 [23], was proposed, which realized the effective identification of 10 bearing conditions with a testing accuracy of 99.03%. Compared with these methods, the proposed method can automatically achieve feature extraction and fault recognition from the raw vibration signals, and obtains higher classification accuracy in bearing fault diagnosis.
Table 8 Classification comparison of the bearing dataset

| Methods | Description | No. of classes | Training samples | Testing accuracy |
| --- | --- | --- | --- | --- |
| Proposed method | Multiscale local feature learning + SVM | 10 | 25% | 99.31% |
| Method 1 [36] | EMD-WKLFDA + SVM | 10 | 40% | 98.80% |
| Method 2 [37] | Bi-spectrum features + SVM | 4 | 50% | 96.98% |
| Method 3 [38] | EDAEs | 12 | 30% | 97.18% |
| Method 4 [28] | Stacked ELM based on auto-encoder | 4 | 30% | 92.60% |
| Method 5 [23] | HDN | 10 | 30% | 99.03% |
5. Conclusion
In order to avoid the influence of human experience on the performance of intelligent fault diagnosis algorithms and to improve the learning of features from the original vibration data, an intelligent fault diagnosis method based on multiscale local feature learning was presented for rolling bearing fault diagnosis. In this method, the 3-level DWT was employed to decompose the original vibration signals into sub-signals of different scales to construct the samples for multiscale feature learning; then, the input samples were divided into segments as the input of the BPNN to perform local feature learning; finally, the local features learned from the sub-signals of different scales were combined as the input of the SVM to realize the effective identification and diagnosis of rolling bearing faults. Two sets of rolling bearing datasets were adopted to verify the effectiveness and superiority of the proposed method. The comparative analysis results indicated that the features learned by the proposed method are meaningful and dissimilar, which helps to improve the accuracy of fault diagnosis. Compared with other methods, the proposed method has high diagnostic accuracy and provides an effective solution for the intelligent diagnosis of rolling bearing faults. In addition, in this paper, the two important parameters of the proposed method were determined manually by parameter analysis. In future work, to increase the practicability of the proposed method, we will consider optimizing the algorithm parameters with optimization algorithms to further improve the performance of the proposed method in intelligent fault diagnosis of machines.
Author Contributions Section Jimeng Li and Xifeng Yao conceived of the presented idea, and wrote the manuscript. Xiangdong Wang, Qingwen Yu and Yungang Zhang planned and carried out the experiments, and analyzed the data. All authors discussed the results and contributed to the final manuscript.
Acknowledgement This work was supported by the National Natural Science Foundation of China (Grant Nos. 51505415 and 61308065), Natural Science Foundation of Hebei Province (Grant Nos. E2017203142 and F2018203413), and Hebei Province Key Research and Development Plan (Grant No. 19214306D). The authors would like to thank the anonymous reviewers for their valuable comments and suggestions. References [1] W. Caesarendra, M. Pratama, B. Kosasih, T. Tegoeh, A. Glowacz, Parsimonious network based on a fuzzy inference system (PANFIS) for time series feature prediction of low speed slew bearing prognosis, Appl. Sci-Basel. 8(12) (2018) 2656_1-21.
[2] X. Zhang, Z. Liu, J. Wang, J. Wang, Time-frequency analysis for bearing fault diagnosis using multiple Q-factor Gabor wavelets, ISA T. 87 (2019) 225-234. [3] B. Chen, B. Shen, F. Chen, H. Tian, W. Xiao, F. Zhang, C. Zhao, Fault diagnosis method based on integration of RSSD and wavelet transform to rolling bearing, Measurement 131 (2019) 400-411. [4] X. Yan, M. Jia, Intelligent fault diagnosis of rotating machinery using improved multiscale dispersion entropy and mRMR feature selection, Knowl.-Based Syst. 163 (2019) 450-471. [5] D. Dou, S. Zhou, Comparison of four direct classification methods for intelligent fault diagnosis of rotating machinery, Appl. Soft. Comput. 46 (2016) 459-468. [6] A. Glowacz, W. Glowacz, J. Kozik, K. Piech , M. Gutten , W.u Caesarendra, H. Liu, F. Brumercik, M. Irfan, Z.F. Khan, Detection of deterioration of three-phase induction motor using vibration signals, Meas. Sci. Rev. 19 (6) (2019) 241-249. [7] R. Yuan, Y. Lv, H. Li, G. Song, Robust fault diagnosis of rolling bearings using multivariate intrinsic multiscale entropy analysis and neural network under varying operating conditions, IEEE Access 7 (2019) 130804-130819. [8] Q. Lu, R. Yang, M. Zhong, Y. Wang, An improved fault diagnosis method of rotating machinery using sensitive features and RLS-BP neural network, IEEE T. Instrum. Meas. (2019) 1-9. DOI: 10.1109/TIM.2019.2913057 [9] Q. Zhang, L.T. Yang, Z. Chen, P. Li, A survey on deep learning for big data, Inform. Fusion. 42 (2018) 146-157. [10] K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE T. Pattern. Anal. 37(9) (2015) 1904-1916. [11] G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, Deep neural networks for acoustic modeling in speech recognition, IEEE Signal Proc. Mag. 29 (2012) 82-97. [12] J. Wang, Y. Ma, L. Zhang, R. Gao, D. Wu, Deep learning for smart manufacturing: Methods and applications, J. Manuf. Syst. 48 (2018) 144-156. [13] S. Munikoti, L. Das, B. Natarajan, B. Srinivasan, Data driven approaches for diagnosis of incipient faults in DC motors, IEEE T. Ind. Inform. 15 (2019) 5299- 5308. [14] J. Xie, G. Du, C. Shen, N. Chen, L. Chen, Z. Zhu, An end-to-end model based on improved adaptive deep belief network and its application to bearing fault diagnosis, IEEE Access 6 (2018) 63584-63596. [15] W. Sun, S. Shao, R. Zhao, R. Yan, Zhang, X. Chen, A sparse auto-encoder-based deep neural network
approach for induction motor faults classification, Measurement 89 (2016) 171-178. [16] H. Shao, H. Jiang, X. Li, S. Wu , Intelligent fault diagnosis of rolling bearing using deep wavelet auto-encoder with extreme learning machine, Knowl.-Based Syst. 140 (2018) 1-14. [17] L. Wen, X. Li, L. Gao, Y. Zhang, A new convolutional neural network based data-driven fault diagnosis method, IEEE T. Ind. Electron. 65 (2017) 5990 - 5998. [18] W. Zhang, C. Li, G. Peng, Y. Chen, Z. Zhang, A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load, Mech. Syst. Signal Process. 100 (2018) 439-453. [19] Y. Qi, C. Shen, D. Wang, J. Shi, X. Jiang, Z. Zhu, Stacked sparse autoencoder-based deep network for fault diagnosis of rotating machinery, IEEE Access, 5 (2017) 15066-15079. [20] W. Mao, J. He, Y. Li, Y. Yan, Bearing fault diagnosis with auto-encoder extreme learning machine: A comparative study, P. I. MECH. ENG. C-J. MEC. 231 (2017) 1560-1578. [21] Z. Meng, X. Zhan, J. Li, Z. Pan, An enhancement denoising autoencoder for rolling bearing fault diagnosis, Measurement 130 (2018) 448-454. [22] G. Jiang, H. He, J. Yan, P. Xie, Multiscale convolutional neural networks for fault diagnosis of wind turbine gearbox, IEEE Trans. Industr. Electron. 66 (4) (2019) 3196-3207. [23] M. Gan, C. Wang, Construction of hierarchical diagnosis network based on deep learning and its application in the fault pattern recognition of rolling element bearings, Mech. Syst. Signal Process. 72 (2016) 92-104. [24] H. Yan, B. Tang ,D. Lei, Multi-level wavelet packet fusion in dynamic ensemble convolutional neural network for fault diagnosis, Measurement 127 (2018) 246-255. [25] Y. Li, G. Cheng , C. Liu ,X. Chen, Study on planetary gear fault diagnosis based on variational mode decomposition and deep neural networks, Measurement 130 (2018) 94-104. [26] W. Sun, R. Zhao, R. Yan, S. Shao, X. Chen, Convolutional discriminative feature learning for induction motor fault diagnosis, IEEE T. Ind. Inform. 13 (2017) 1350-1359. [27] F. Jia, Y. Lei, L. Guo, J. Lin, S. Xing, A neural network constructed by deep learning technique and its application to intelligent fault diagnosis of machines, Neurocomputing 272 (2018) 619-628. [28] Y. Lin, X. Li, Y. Hu, Deep diagnostics and prognostics: An integrated hierarchical learning framework in PHM applications, Appl. Soft. Comput. 72 (2018) 555-564. [29] Y. Lei, F. Jia, J. Lin,S. Xing, S. Ding, An intelligent fault diagnosis method using unsupervised feature learning towards mechanical big data, IEEE T. Ind. Electron. 63(5) (2016) 3137-3147.
[30] X. Zhao, X. Tang, J. Zhao, Y. Zhang, Fault diagnosis of asynchronous induction motor based on BP neural network, in: Proc. IEEE Conf. ICMTMA, March, 2010, 236-239. [31] N. Li, R. Zhou, Q. Hu, X. Liu, Mechanical fault diagnosis based on redundant second generation wavelet packet transform, neighborhood rough set and support vector machine, Mech. Syst. Signal Process. 28 (2012) 608-621. [32] N. Rodriguez, P. Alvarez, L. Barba, G. Cabrera-Guerrero, Combining multi-scale wavelet entropy and kernelized classification for bearing multi-fault diagnosis, Entropy-Switz. 21(2) (2019) 1-17. [33] L. Maaten, G. Hinton, Visualizing data using t-SNE, J. Mach. Learn. Res. (9) (2008) 2579-2605. [34] W. Sun, J. Chen, J. Li, Decision tree and PCA-based fault diagnosis of rotating machinery, Mech. Syst. Signal Process. 21 (3) (2007) 1300-1317. [35] http://csegroups.case.edu/bearingdatacenter/home (accessed 2015.06.20) [36] M. Van, H.J. Kang, Bearing defect classification based on individual wavelet local fisher discriminant analysis with particle swarm optimization, IEEE T. Ind. Inform. 12 (2016) 124–135. [37] L. Saidi, J. B. Ali, F. Fnaiech, Application of higher order spectral features and support vector machines for bearing faults classification, ISA T. 54 (2015) 193-206. [38] H. Shao, H. Jiang, Y. Lin, X. Li, A novel method for intelligent fault diagnosis of rolling bearings using ensemble deep auto-encoders, Mech. Syst. Signal Process. 102 (2018) 278-297.
Table captions
Table 1 Description of rolling bearing datasets under the speed of 300 r/min
Table 2 Classification accuracies of the proposed method with different parameters
Table 3 The cost time of the proposed method with different parameters
Table 4 Comparison of different classifiers
Table 5 Description of rolling bearing datasets under three speeds
Table 6 The diagnostic results of different methods
Table 7 Data description of rolling bearing
Table 8 Classification comparison of the bearing dataset
Figure captions
Fig. 1 Network structure of BPNN
Fig. 2 The flowchart of the proposed method for rolling bearing fault diagnosis
Fig. 3 The algorithm flow of parameter learning of BPNN and local feature extraction (taking A3 in sample xm as an example)
Fig. 4 The rolling bearing fault simulation test bench
Fig. 5 The rolling bearings in 4 different conditions
Fig. 6 Correlation diagram of different categories of features
Fig. 7 Features visualization based on t-SNE: (a) raw data; (b) features learned by SAE; (c) features learned by CNN; (d) features learned by the proposed method
Fig. 8 F-measures of all methods
Fig. 9 (a) Vibration signals of 2 faults at different times; (b) Output features of 2 faults at different times
Tables

Table 1 Description of rolling bearing datasets under the speed of 300 r/min

Fault location       Rotating speed   Number of samples   Category labels
Normal               300 r/min        100                 1
Rolling ball fault   300 r/min        100                 2
Inner race fault     300 r/min        100                 3
Outer race fault     300 r/min        100                 4

Table 2 Classification accuracies of the proposed method with different parameters

Parameter Np   BPNN structure of input      Pooling size
               layer and hidden layer       5         10        15        20        25
100            [100 50]                     98.42%    99.08%    99.17%    98.92%    98.83%
200            [200 100]                    97.50%    98.92%    99.33%    99.75%    99.17%
300            [300 150]                    98.25%    99.83%    99.17%    99.58%    98.92%
400            [400 200]                    98.75%    99.33%    98.83%    99.25%    99.25%
Table 3 The cost time of the proposed method with different parameters

Parameter Np   BPNN structure of input      Pooling size
               layer and hidden layer       5         10        15        20        25
100            [100 50]                     31.4 s    27.7 s    22.8 s    21.6 s    21.1 s
200            [200 100]                    42.7 s    36.0 s    34.7 s    33.3 s    33.2 s
300            [300 150]                    57.7 s    50.6 s    48.0 s    47.4 s    47.1 s
400            [400 200]                    79.5 s    72.7 s    68.9 s    68.5 s    68.3 s
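As a rough illustration of how the parameter study behind Tables 2-3 could be organised, the sketch below (Python) loops over the Np values and pooling sizes from the tables and records accuracy and cost time; only the grid values and the [Np, Np/2] hidden-layer rule come from the tables, and train_and_evaluate is a hypothetical placeholder for the full training pipeline, not the authors' code.

```python
# Hypothetical parameter sweep over segment length Np and pooling size (cf. Tables 2-3).
import time

def train_and_evaluate(Np, hidden, pool_size):
    """Placeholder: train the BPNN-based local feature learner with segment length Np,
    hidden size Np/2 and the given pooling size, then return the SVM test accuracy."""
    return 0.0  # the real training and evaluation would go here

results = {}
for Np in (100, 200, 300, 400):
    hidden = Np // 2                        # BPNN structure [Np, Np/2], as in Table 2
    for pool_size in (5, 10, 15, 20, 25):
        t0 = time.time()
        acc = train_and_evaluate(Np, hidden, pool_size)
        results[(Np, pool_size)] = (acc, time.time() - t0)   # accuracy and cost time
```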
Table 4 Comparison of different classifiers

Classifiers   Training accuracy (%)   Testing accuracy (%)
SVM           100.00                  99.75
Softmax       98.75                   98.25
ELM           100.00                  95.58
Table 5 Description of rolling bearing datasets under three speeds

Fault location       Rotating speed (r/min)   Number of samples   Category labels
Normal               300/400/500              300                 1
Rolling ball fault   300/400/500              300                 2
Inner race fault     300/400/500              300                 3
Outer race fault     300/400/500              300                 4
Table 6 The diagnostic results of different methods

                              Bearing condition
Methods                       Normal                      Rolling ball fault          Inner race fault            Outer race fault            Testing
                              Precision      Recall       Precision      Recall       Precision      Recall       Precision      Recall       accuracy
Wavelet packet energy + SVM   84.87±2.51     99.50±0.46   94.74±1.64     88.02±2.22   92.86±2.04     87.63±2.82   99.87±0.27     94.6±0.87    92.57±0.83
LFL                           71.04±4.05     60.37±1.88   79.96±2.96     90.7±3.32    95.86±1.14     97.89±0.31   74.64±4.56     73.54±7.1    80.62±1.04
SAE + Softmax                 61.93±3.03     63.02±2.42   64.25±0.97     65.05±2.08   40.25±1.32     47.8±2.93    98.68±0.9      77.68±3.55   63.6±0.8
SAE + SVM                     58.62±3.86     45.28±4.85   53.57±1.68     81.92±4.88   51.42±3.38     37.67±4.13   99.2±1.09      97.34±0.88   65.27±0.68
CNN + Softmax                 99.4±0.49      97±2.45      86.94±5.93     99.2±0.75    99.5±0.66      89.87±5.4    99.8±0.4       96.67±3.55   95.23±1.01
CNN + SVM                     99.82±0.36     99.29±0.53   94.16±1.45     97.48±2.5    97.48±2.79     94.58±1.17   100            100          97.82±0.35
The proposed method           99.8±0.44      97.91±0.63   99.39±0.38     98.94±0.68   99.12±0.72     99.46±0.3    97.99±0.71     100          99.07±0.25

(Note: the format of the diagnosis result is average value ± standard deviation.)
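For reference, the per-class precision and recall in Table 6 and the F-measures in Fig. 8 follow the standard definitions, with F = 2·Precision·Recall/(Precision + Recall) per class. A minimal sketch of how such values can be computed with scikit-learn is given below; the label vectors are made-up examples and scikit-learn is an assumed tool, not the authors' implementation.

```python
# Per-class precision, recall and F-measure (illustrative labels only).
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([1, 1, 2, 2, 3, 3, 4, 4])   # ground-truth bearing conditions
y_pred = np.array([1, 1, 2, 3, 3, 3, 4, 2])   # classifier outputs
precision, recall, f_measure, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[1, 2, 3, 4], zero_division=0)
# f_measure[i] == 2 * precision[i] * recall[i] / (precision[i] + recall[i]) for each class
print(precision, recall, f_measure)
```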
Table 7 Data description of rolling bearing

Fault location       Fault diameter (in.)   Loads      Number of samples   Category labels
Normal               —                      0/1/3 hp   180                 1
Rolling ball fault   0.007                  0/1/3 hp   180                 2
                     0.014                  0/1/3 hp   180                 3
                     0.021                  0/1/3 hp   180                 4
Inner race fault     0.007                  0/1/3 hp   180                 5
                     0.014                  0/1/3 hp   180                 6
                     0.021                  0/1/3 hp   180                 7
Outer race fault     0.007                  0/1/3 hp   180                 8
                     0.014                  0/1/3 hp   180                 9
                     0.021                  0/1/3 hp   180                 10

Table 8 Classification comparison of the bearing dataset

Methods           Description                               No. of classes   Training samples   Testing accuracy
Proposed method   Multiscale local feature learning + SVM   10               25%                99.31%
Method 1 [36]     EMD-WKLFDA + SVM                          10               40%                98.80%
Method 2 [37]     Bi-Spectrum features + SVM                4                50%                96.98%
Method 3 [38]     EDAEs                                     12               30%                97.18%
Method 4 [28]     Stacked ELM based on Auto-encoder         4                30%                92.60%
Method 5 [23]     HDN                                       10               30%                99.03%
Figures

Fig. 1 Network structure of BPNN
[Figure: input layer (x1, ..., xN), hidden layer (h1, ..., hd) with weights W, bias b and activation φ, output layer (y1, ..., yc) followed by a Softmax classifier.]
Fig. 2 The flowchart of the proposed method for rolling bearing fault diagnosis
[Figure: the vibration data x of the rolling bearing are decomposed by the wavelet multiscale transform into four components {A3, D3, D2, D1}; for each component of the training samples, BPNN-based local feature learning is carried out (parameter learning), and the learned parameters (W, b) are passed to the testing stage; the multiscale local features are combined and used for SVM model training (fault diagnosis model training) and SVM model-based fault identification (fault diagnosis model testing).]
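To make this flow concrete, the sketch below (Python, using PyWavelets and scikit-learn as assumed tools) decomposes each sample into {A3, D3, D2, D1}, extracts local features per component, combines them and trains an SVM. The db4 wavelet, the number of segments and the simple max/min summary standing in for the BPNN-based local feature learning are illustrative choices, not the authors' implementation; the BPNN detail itself is sketched separately after Fig. 3.

```python
# Minimal sketch of the Fig. 2 pipeline on synthetic data.
import numpy as np
import pywt
from sklearn.svm import SVC

def decompose(x, wavelet="db4", level=3):
    """3-level DWT -> the four components {A3, D3, D2, D1} used as multiscale inputs."""
    cA3, cD3, cD2, cD1 = pywt.wavedec(x, wavelet, level=level)
    return [cA3, cD3, cD2, cD1]

def component_features(c, n_seg=10):
    """Stand-in for BPNN-based local feature learning on one component:
    split it into segments and keep the max and min of each segment."""
    segs = np.array_split(c, n_seg)
    return np.array([f(s) for s in segs for f in (np.max, np.min)])

def multiscale_features(x):
    """Combine the local features extracted from all four components."""
    return np.concatenate([component_features(c) for c in decompose(x)])

# Synthetic demo data: 40 "vibration" samples of length 1024, 4 classes.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 1024))
y = np.repeat([1, 2, 3, 4], 10)

F = np.array([multiscale_features(x) for x in X])  # multiscale local feature matrix
clf = SVC(kernel="rbf").fit(F, y)                  # SVM as the final classifier
print(clf.score(F, y))
```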
Fig. 3 The algorithm flow of parameter learning of BPNN and local feature extraction (taking A3 in sample xm as an example)
[Figure: each sample of length Ns is divided into Ns/Np segments of length Np; the segments are fed to the BPNN input layer, and the parameters (W, b) are learned with a label layer during training and passed on for feature extraction; the hidden-layer activations of each segment are pooled to give the local features f_Local_max and f_Local_min, which are combined into the local feature vector of the component.]
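A minimal sketch of this local feature extraction step is given below, assuming a sigmoid hidden layer and non-overlapping max/min pooling windows over the hidden activations; the trained BPNN parameters W and b are replaced by random placeholders, and the sizes (Np = 200, hidden size Np/2 = 100, pooling size 20) are taken from Table 2 only as one example configuration.

```python
# Sketch of Fig. 3 for one component (e.g. A3) of a single sample.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pool(h, size):
    """Non-overlapping max and min pooling over the hidden activations."""
    h = h[: (len(h) // size) * size].reshape(-1, size)
    return h.max(axis=1), h.min(axis=1)

def local_features(component, W, b, Np=200, pool_size=20):
    """Split the component into length-Np segments, map each through the BPNN
    hidden layer, pool the activations, and combine all local features."""
    n_seg = len(component) // Np
    feats = []
    for i in range(n_seg):
        seg = component[i * Np:(i + 1) * Np]
        h = sigmoid(W.T @ seg + b)          # hidden-layer activations (length Np/2)
        f_max, f_min = pool(h, pool_size)   # local max/min features of this segment
        feats.append(np.concatenate([f_max, f_min]))
    return np.concatenate(feats)            # local feature vector of this component

# Placeholder parameters: input size Np = 200, hidden size Np/2 = 100.
rng = np.random.default_rng(0)
W, b = rng.standard_normal((200, 100)) * 0.01, np.zeros(100)
A3 = rng.standard_normal(1200)              # component of length Ns = 1200
print(local_features(A3, W, b).shape)       # (60,): 6 segments x (5 max + 5 min features)
```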
Fig. 4 The rolling bearing fault simulation test bench
Fig. 5 The rolling bearings in 4 different conditions: (a) Normal; (b) Rolling ball fault; (c) Inner race fault; (d) Outer race fault
Fig. 6 Correlation diagram of different categories of features
[Figure: correlation matrix over category labels 1-4; colour scale from about 0.7 to 1.]
Fig. 7 Features visualization based on t-SNE: (a) raw data; (b) features learned by SAE; (c) features learned by CNN; (d) features learned by the proposed method
[Figure: four scatter plots (Dimension 1 vs. Dimension 2) of the classes Normal, Rolling ball fault, Inner race fault and Outer race fault.]
Fig. 8 F-measures of all methods
[Figure: bar chart of F-measure (%) versus condition label of bearing for Wavelet packet energy + SVM, LFL, SAE + Softmax, SAE + SVM, CNN + Softmax, CNN + SVM and the proposed method.]
Fig. 9 (a) Vibration signals of 2 faults at different times; (b) Output features of 2 faults at different times
[Figure: panels for the inner race fault (0.007 in.) and the outer race fault (0.007 in.); (a) amplitude versus time (s) of the vibration signals; (b) feature values versus number of features for the components A3, D3, D2 and D1.]
Research Highlights
An intelligent fault diagnosis method for rolling bearings is presented.
A multiscale local feature learning strategy based on BPNN is proposed.
The method can directly diagnose bearing faults using the raw vibration signals.
Experimental results verify the validity and superiority of the proposed method.