A new Local-Global Deep Neural Network and its application in rotating machinery fault diagnosis




Neurocomputing 366 (2019) 215–233

Contents lists available at ScienceDirect

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

Xiaoli Zhao, Minping Jia∗
School of Mechanical Engineering, Southeast University, Nanjing 211189, People’s Republic of China

Article info

Article history:
Received 9 November 2018
Revised 12 June 2019
Accepted 4 August 2019
Available online 7 August 2019
Communicated by Dr. Bin Jiang

Keywords:
Fault diagnosis
Rotating machinery
Industrial big data
Fisher-CDBN (Fisher-Convolutional Deep Belief Network)
LGDNN (Local-Global Deep Neural Network)
Local and global information

Abstract

Acquiring broad equipment health information to guarantee safe production and timely fault maintenance is a major challenge in industrial informatization. Conventional machine learning algorithms cannot capture both the local and the global fault feature information of rotating machinery. To address this issue, this paper proposes a novel fault diagnosis method based on the Local-Global Deep Neural Network (LGDNN) algorithm. The method first uses LGDNN to extract local and global structural features directly from the original vibration spectral signals. The extracted high-level features are then classified into different fault conditions by a soft-max classifier. The core of the method is that the first (local) layer of LGDNN uses a novel local feature extractor, an improved Convolutional Deep Belief Network (CDBN) based on the Fisher parameter optimization criterion (called Fisher-CDBN), to efficiently extract local discriminant information from the data. The second (global) layer then uses Kernel Principal Component Analysis (KPCA) as a global feature extractor to reduce the redundancy of the data. Finally, vibration signals from a rolling bearing and a fan gear are used to validate the effectiveness of the proposed method. The experimental results demonstrate that the proposed algorithm and fault diagnosis method capture both local and global fault feature information of rotating machinery. The work also provides a new research reference for deep learning fault diagnosis of rotating machinery. © 2019 Elsevier B.V. All rights reserved.

1. Introduction

Recently, the advent of the “Big Data” age has accelerated the development of industrial informatization [1–3]. Consequently, the primary task of industrial big data is to transform industrial data into useful and valuable information resources [2–3]. At the same time, a core requirement of industrial big data is to protect the safe and reliable operation of industrial equipment by compressing effective information resources [4–5]. Rotating machinery plays an irreplaceable role in industrial systems, and its health monitoring and fault diagnosis are essential for the development of industrial informatization [6–8]. The massive data and information produced by rotating machinery can reflect the operating and health condition of the mechanical system [8–10]. Just as a living system needs medical diagnosis, rotating machinery requires effective fault diagnosis and health assessment [9–16]. With the prominent development of measurement and sensor technology, the acquisition of machinery



∗ Corresponding author: M. Jia.

https://doi.org/10.1016/j.neucom.2019.08.010 0925-2312/© 2019 Elsevier B.V. All rights reserved.

health data has become more and more convenient [17]. Nevertheless, as the amount of data increases exponentially and the sources of data become more diversified, "mechanical big data" problems are unavoidable [10,12]. Therefore, filtering out the real and valuable fault information from massive mechanical data is a challenging task in current fault diagnosis [12,17–18]. Generally, it is necessary to convert mechanical operating data into reliable information for fault diagnosis [3,19]. An effective way to do so is to build a deep representation of the data, i.e. feature extraction [19–21]. Feature extraction not only plays an important role in mining fault information, but is also the most critical step in fault diagnosis [21]. Refs. [22–25] showed that general fault feature extraction methods fall into two categories: global feature extraction and local feature extraction. The major representative global models are Principal Component Analysis (PCA) [26], Linear Discriminant Analysis (LDA) [27], etc. These approaches mine information from the viewpoint of global statistical characteristics. The local models are represented by local manifold learning algorithms, such as Locality Preserving Projection (LPP) [28] and Locality Sensitive Discriminant Analysis (LSDA) [29], which can effectively reveal the local geometric structure of data during low-dimensional embedding.


X. Zhao and M. Jia / Neurocomputing 366 (2019) 215–233

However, the above-mentioned fault feature extraction methods have two distinct shortcomings. 1) Most of them are shallow feature representations, and their ability to characterize the fault information of rotating machinery is insufficient. 2) Most of them depend too much on prior conditions and expert knowledge, which easily leads to poor reliability. These problems seriously hamper the development of intelligent and automated fault diagnosis technology. As mechanical equipment becomes more automated and intelligent, automated and reliable fault diagnosis methods are urgently needed. Deep learning (DL) is one of the most advanced data and information processing tools [30–32]. Compared with traditional machine learning methods, DL focuses more on the depth of the model structure: an abstract and discriminative feature vector is formed in the final layer through multi-layer feature extraction, which distinctly improves the accuracy of classification and prediction. Lately, DL has been widely applied to voice processing [32], image recognition [33], fault diagnosis [34], etc. Since Tamilselvan and Wang [17] first applied the deep belief network (DBN) to the health assessment of aircraft engines in 2013, this research direction has attracted more and more attention. Jia et al. [34] first constructed a novel DL model (Deep Neural Networks (DNNs)) and applied it to the intelligent diagnosis of rotating machinery, obtaining superior diagnosis accuracy compared with existing intelligent diagnosis methods. Liao et al. [35] first proposed a new Regularized Enhancement Restricted Boltzmann Machine (RERBM), which automatically generates features for remaining life expectancy and fault health assessment. Ince et al. [36] successfully applied Convolutional Neural Networks (CNNs) to fault detection on the original time series data of a motor. In CNNs, each neuron is connected with a local receptive field of the previous layer, and its positional relationship is determined according to local features; however, CNN training requires a supervised mode [36]. On the other hand, the purely unsupervised training of DBN can extract sensitive features from the bottom layer to the top layer, but it has more difficulty dealing with large-scale high-dimensional data [35]. Inspired by DBN and the local weight sharing of CNN, the Convolutional Deep Belief Network (CDBN) was first presented by Lee et al. [37] in 2009; it not only has the unsupervised characteristics of DBN, but also inherits the local feature extraction ability of CNN. Shao et al. [11] developed a CDBN-based fault diagnosis method and applied it to experimental signals collected from electric locomotive bearings. Nevertheless, new fault diagnosis methods based on deep learning still need to be explored and applied. At the same time, CDBN generally uses the minimum reconstruction error as the objective function for solving the weights. As a result, the conventional deep neural network learning model requires a large number of labeled samples for training, and it often requires tens of thousands of iterative updates to obtain better recognition performance. To reduce the time complexity and maintain the strong classification ability of the network weights, this paper proposes an improved CDBN algorithm based on the Fisher criterion. The difference between the proposed Fisher Convolutional Deep Belief Network (Fisher-CDBN) and CDBN is that Fisher-CDBN introduces an energy function model with intra-class and inter-class distances to adjust the weights during back propagation. This steers the iterative weight adjustment toward optimal values that are favorable for classification, and it also improves the ability of the model to extract local discriminant information of rotating machinery.

On the other hand, Refs. [23–25] indicate that both the global and the local information of data are useful for dimension reduction and classification. The structural characteristics of data include local structural features, which describe the arrangement of data neighbors, and global structural features, which describe the overall structure of the data. In other words, the former reflect the intrinsic properties of the fault data, and the latter reflect its external shape. The more information is preserved during feature extraction, the more accurate the fault diagnosis is. Conventional deep learning methods, however, cannot handle global and local information extraction at the same time. Meanwhile, the vibration signals of rotating machinery are nonlinear and non-stationary. KPCA (Kernel Principal Component Analysis) [38], a shallow nonlinear feature extraction method, is widely used for global feature extraction. Although the principal components extracted by KPCA retain most of the variation in the raw data set, they capture only the global structure of the processed data; the detailed local structure among the data is ignored. This inner structure represents the detailed relationship between different data samples, which is also an important aspect of the dataset. Losing this information when applying KPCA has a great impact on dimension reduction performance, and thus on fault diagnosis. Both local and global information should therefore be adequately considered for dimension reduction and fault diagnosis: the global structure defines the outer shape of the processed dataset, while the local structure provides its inner organization. However, KPCA and the proposed Fisher-CDBN each consider only one of these two aspects. To sum up, no existing deep learning model extracts both the global and the local feature information of fault data to improve the diagnostic reliability and accuracy of rotating machinery [18–20]. Considering the above-mentioned problems, this paper presents a novel rotating machinery fault diagnosis method based on LGDNN (Local-Global Deep Neural Network). The proposed LGDNN is composed of two feature representation layers, one local and one global. The first layer, consisting of trained Fisher-CDBNs, extracts local information from the data. The second layer, KPCA, receives the output of the first layer and extracts global feature information from the global sub-feature space. Finally, a soft-max classification layer classifies the different health conditions. The main contributions of this paper are outlined as follows.
1) To reduce the time complexity and maintain the strong classification ability of the network weights, an improved CDBN algorithm based on the Fisher criterion (called Fisher-CDBN) is presented. Unlike CDBN, Fisher-CDBN introduces an energy function model with intra-class and inter-class distances to adjust the weights during back propagation.
2) Because conventional deep learning algorithms cannot handle local and global fault feature information together, the improved Fisher-CDBN algorithm is combined with the global feature extraction method KPCA to develop a novel Local-Global Deep Neural Network (LGDNN) model. This deep feature extraction method preserves the local-global feature information of rotating machinery.
3) A new fault diagnosis method for rotating machinery based on LGDNN is presented, in which the spectral signals are input directly into LGDNN to extract local-global deep information from the vibration signals. Soft-max regression is then used to classify the different fault conditions. Finally, vibration signals from rolling bearings and a fan gear validate the effectiveness of the proposed LGDNN model and the LGDNN-based fault diagnosis method.

Fig. 1. The schematic diagram of convolution process based on CRBM.

Fig. 2. The schematic diagram of the probabilistic pooling CRBM.

The remainder of this paper is organized as follows. In Section 2, the feature extractors CDBN and KPCA are introduced. The novel Fisher-CDBN and LGDNN algorithms are put forward in Section 3. In Section 4, a novel deep learning fault diagnosis method for rotating machinery based on LGDNN is proposed. The validity of the proposed fault diagnosis method and algorithm is verified in Section 5. Finally, conclusions are drawn in Section 6.

2. Brief introduction of the basic algorithms

First, the local feature extractor CDBN is briefly introduced in this section. The widely used global feature extractor KPCA is then introduced.

2.1. Introduction to Convolutional Deep Belief Networks (CDBN)

In general, CDBN can handle large-scale data and automatically learn local structure information from the target data.

2.1.1. CRBM (Convolution Restricted Boltzmann Machine)

CRBM [37] is also an energy model; it consists of a visible layer V and a hidden layer H represented by random variable matrices, but the weights between the hidden and visible layers are shared among all locations [11,37]. Assume that the visible layer is made of L channels, each consisting of $N_V \times N_V$ real units. The hidden layer consists of K groups, each made of $N_H \times N_H$ hidden units. Each of the K hidden groups is connected with an $N_W \times N_W$ matrix called a filter ($W^1, W^2, \ldots, W^K$) [11]. The size of the convolution filter is

\[ N_W \times N_W, \quad N_W = N_V - N_H + 1 \tag{1} \]

A standard CRBM with K filters of size 3 × 3 is shown in Fig. 1, which illustrates the convolution and feature extraction process of CRBM; the connection weights are shared within each hidden group. The filter connecting the lth channel to the kth hidden group is $W^{k,l}$, which is shared among the hidden units in the kth group. In addition, each hidden group has an offset value $b_k$.

Similar to the standard Restricted Boltzmann Machine (RBM), the energy function of CRBM [37] is

\[ E(v,h) = -\sum_{k=1}^{K}\sum_{i,j=1}^{N_H}\sum_{r,s=1}^{N_W} h^k_{ij}\, W^k_{rs}\, v_{i+r-1,\,j+s-1} - \sum_{k=1}^{K} b_k \sum_{i,j=1}^{N_H} h^k_{ij} - c \sum_{i,j=1}^{N_V} v_{ij} \tag{2} \]

where $N_H$ and $N_V$ denote the sizes of the hidden groups and of the visible layer, respectively ($N_H \times N_H$ and $N_V \times N_V$ units). There is a bias $b_k$ for each hidden group and a single shared bias c for all visible units. Accordingly, the conditional probability distributions of the CRBM are

\[ P(h^k_{ij} = 1 \mid v) = \sigma\big((\tilde{W}^k * v)_{ij} + b_k\big), \qquad P(v_{ij} = 1 \mid h) = \sigma\Big(\big(\sum_{k} W^k * h^k\big)_{ij} + c\Big) \tag{3} \]

\[ \sigma(x) = \frac{1}{1 + e^{-x}} \tag{4} \]

where σ is the sigmoid function, ∗ denotes convolution, and $\tilde{W}^k$ is the filter $W^k$ flipped in both dimensions [37]. To make CRBM more scalable, Lee et al. [37] further developed probabilistic max-pooling, which shrinks the representation of the higher layers in a probabilistic way. The structure of the pooling CRBM is shown in Fig. 2. For the max-pooling operation, the units of each block in the detection layer and the corresponding unit in the pooling layer are connected by the following constraints: at most one unit in a block of the detection layer may be active, and if one unit in the block is active, the corresponding unit in the pooling layer is active. In Fig. 2, E is the max-pooling ratio. Accordingly, the energy function of the probabilistic max-pooling CRBM [37] is defined as

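\[ E(v,h) = -\sum_{k}\sum_{i,j}\Big( h^k_{i,j}\,(\tilde{W}^k * v)_{i,j} + b_k h^k_{i,j} \Big) - c\sum_{i,j} v_{i,j} \quad \text{subject to} \quad \sum_{(i,j)\in B_\alpha} h^k_{i,j} \le 1, \ \forall k, \alpha \tag{5} \]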

where $B_\alpha = \{(i,j) : h_{ij} \in \alpha\}$ is the set of detection-layer units belonging to pooling block α. Sampling the detection layer H and the pooling layer P given the visible layer leads to the following conditional probabilities of the pooling CRBM.

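\[ P(h^k_{i,j} = 1 \mid v) = \frac{\exp\big(I(h^k_{i,j})\big)}{1 + \sum_{(i',j')\in B_\alpha} \exp\big(I(h^k_{i',j'})\big)} \tag{6} \]

\[ P(p^k_\alpha = 0 \mid v) = \frac{1}{1 + \sum_{(i',j')\in B_\alpha} \exp\big(I(h^k_{i',j'})\big)} \tag{7} \]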
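As an illustration of Eqs. (6) and (7), the following minimal Python sketch computes the detection-unit and pooling-unit probabilities for a single hidden group and a single visible channel. It is not the authors' implementation: the function names, the NumPy-based loops, the non-overlapping square blocks of side `ratio`, and the toy input sizes are all assumptions made for illustration. The bottom-up signal is taken as I(h_ij) = b_k + (W̃^k ∗ v)_ij and is computed as a valid cross-correlation, which is equivalent to convolution with the flipped filter.

```python
import numpy as np


def valid_cross_correlation(v, w):
    """Valid cross-correlation of v with w (= convolution with the flipped filter)."""
    n_v, n_w = v.shape[0], w.shape[0]
    n_h = n_v - n_w + 1  # Eq. (1) rearranged: N_H = N_V - N_W + 1
    out = np.empty((n_h, n_h))
    for i in range(n_h):
        for j in range(n_h):
            out[i, j] = np.sum(w * v[i:i + n_w, j:j + n_w])
    return out


def prob_max_pool(v, w, b, ratio):
    """Return P(h_ij = 1 | v) (Eq. 6) and P(p_alpha = 0 | v) (Eq. 7) for one hidden group.

    Assumption: pooling blocks B_alpha are non-overlapping ratio x ratio squares.
    """
    signal = b + valid_cross_correlation(v, w)  # I(h_ij)
    n_h = signal.shape[0]
    assert n_h % ratio == 0, "hidden size must be divisible by the pooling ratio"
    n_p = n_h // ratio
    # blocks[a, r, c, s] = signal[a * ratio + r, c * ratio + s]
    blocks = signal.reshape(n_p, ratio, n_p, ratio)
    # Subtract a per-block shift (never below 0, the 'off' state) for numerical stability.
    shift = np.maximum(blocks.max(axis=(1, 3), keepdims=True), 0.0)
    expo = np.exp(blocks - shift)
    denom = np.exp(-shift) + expo.sum(axis=(1, 3), keepdims=True)
    p_detect = (expo / denom).reshape(n_h, n_h)
    p_pool_off = (np.exp(-shift) / denom).reshape(n_p, n_p)
    return p_detect, p_pool_off


rng = np.random.default_rng(0)
v_demo = rng.random((8, 8))          # toy visible layer, N_V = 8
w_demo = rng.normal(size=(3, 3))     # toy filter, N_W = 3, so N_H = 6
p_h, p_off = prob_max_pool(v_demo, w_demo, b=0.1, ratio=2)
print(p_h.shape, p_off.shape)
```

Within every pooling block, the detection probabilities and the "off" probability of the pooling unit sum to one, which is the constraint of Eq. (5) expressed in probabilistic form.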


In Eqs. (6) and (7), $I(h^k_{ij}) = b_k + (\tilde{W}^k * v)_{ij}$. Similarly, the hidden layer H is obtained from the visible layer V. More details of CRBMs are described in Refs. [11,37].

2.1.2. CDBN (Convolutional Deep Belief Network)

In general, a CDBN model is obtained by stacking pooling CRBMs, and it can learn representative features at different layers. CDBN can handle large-scale data and automatically learn local structure information from the target data. The Contrastive Divergence (CD) algorithm [11,37] is applied to optimize the connection weights of the deep network. Afterwards, the weights and offsets of CDBN are fine-tuned backwards with labeled samples, so that depth-sensitive multi-layer features are learned layer by layer. More details of CRBMs and CDBN are described in Refs. [11,37].

CDBN generally uses the minimum reconstruction error as the objective function for solving the weights. The model requires a large number of labeled samples for training, and it often requires tens of thousands of iterations to obtain better recognition performance. Such blind optimization of the CDBN objective is not conducive to extracting local fault features with discriminant information.

2.2. Introduction to Kernel Principal Component Analysis (KPCA)

KPCA [38], a conventional shallow feature extraction method, is widely used for global nonlinear feature extraction. KPCA maps the raw signals into a nonlinear kernel space H(x) and extracts the nonlinear fault feature information of vibration signals. Compared with Principal Component Analysis (PCA), its nonlinear feature learning ability is stronger. The main idea of KPCA is as follows. First, the sample dataset $X = \{x_i \mid i = 1, 2, \ldots, n;\ x_i \in R^d\}$ is mapped into a high-dimensional feature space G by a nonlinear function H, that is, $x_i \to H(x_i)$. PCA is then applied in this feature space to extract nonlinear structure information from the raw dataset. If the mapped data have zero mean, the covariance matrix C of the mapped data is expressed by

\[ C = \frac{1}{n}\sum_{i=1}^{n} H(x_i) H(x_i)^{T} \tag{8} \]

The eigenvalue problem of C is then

\[ \lambda v = C v \tag{9} \]

Taking the inner product of both sides of Eq. (9) with each mapped sample $H(x_k)$ gives

\[ \lambda [H(x_k) \cdot v] = H(x_k) \cdot C v, \quad k = 1, 2, \ldots, n \tag{10} \]

The eigenvector v of C in Eq. (10) can be expressed linearly in terms of $H(x_j)$ with coefficients $a_j$, i.e.

\[ v = \sum_{j=1}^{n} a_j H(x_j), \quad j = 1, 2, \ldots, n \tag{11} \]

Introducing the kernel function $K(x_i, x_j) = (H(x_i) \cdot H(x_j))$ and the kernel matrix $K = (K_{ij})$, $K_{ij} = K(x_i, x_j)$, Eq. (10) simplifies to

\[ \bar{\lambda} a = K a \tag{12} \]

where $a = (a_1, \ldots, a_n)^T$ and, after substituting Eqs. (8) and (11) into Eq. (10), $\bar{\lambda} = n\lambda$. The problem thus reduces to the eigen-decomposition of K. The t-th kernel principal component is extracted as

\[ T_t = [v_t \cdot H(x)] = \sum_{i=1}^{n} a_{t,i} K(x_i, x) \tag{13} \]

More details of KPCA are described in Ref. [38].
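The KPCA steps in Eqs. (8)–(13) can be sketched in Python as follows. This is a hedged illustration rather than the implementation used in this paper: the Gaussian (RBF) kernel, its width `gamma`, the explicit centering of the kernel matrix (which enforces the zero-mean assumption made before Eq. (8)), and all function names are assumptions.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel K(a, b) = exp(-gamma * ||a - b||^2) (assumed kernel choice)."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kpca_fit(X, n_components, gamma):
    """Solve the eigenproblem of Eq. (12) on the centered kernel matrix."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(K_centered)      # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]     # keep the largest ones
    # Scale coefficients so that each v_t in Eq. (11) has unit norm in feature space.
    coeffs = eigvecs[:, top] / np.sqrt(eigvals[top])
    return {"X": X, "K": K, "coeffs": coeffs, "gamma": gamma}


def kpca_transform(model, X_new):
    """Project samples onto the kernel principal components, Eq. (13)."""
    X, K, coeffs, gamma = model["X"], model["K"], model["coeffs"], model["gamma"]
    n = X.shape[0]
    K_new = rbf_kernel(X_new, X, gamma)
    one_n = np.full((n, n), 1.0 / n)
    one_m = np.full((X_new.shape[0], n), 1.0 / n)
    K_new_centered = K_new - one_m @ K - K_new @ one_n + one_m @ K @ one_n
    return K_new_centered @ coeffs


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 5))            # toy data: 30 samples, 5 features
kpca_model = kpca_fit(X_demo, n_components=3, gamma=0.1)
T_demo = kpca_transform(kpca_model, X_demo)
print(T_demo.shape)
```

On the training samples, the extracted components are zero-mean and mutually uncorrelated, as expected of principal components computed in the feature space.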

3. The proposed Fisher-CDBN algorithm and LGDNN algorithm

In this section, we first propose a new Fisher-CDBN and then develop a novel LGDNN model which can extract local-global structural features from the original vibration spectral signals.

3.1. The proposed Fisher-Convolutional Deep Belief Network (Fisher-CDBN)

In general, the optimization goal of a deep neural network is based on the least squares criterion. However, in practical engineering the least squares criterion cannot fully utilize the category label information of the data, and the extracted features are not favorable for subsequent classification. Ideally, the extracted features of samples from the same class should be close to each other, and those of samples from different classes should be far apart, so that the extracted features are more sensitive and more beneficial to actual fault classification. The training objective function of the traditional CDBN, however, is based on the least squares criterion, which only guarantees the minimum error of the optimized parameter model. The Fisher criterion, in contrast, achieves the following: when it is used to determine the optimal direction, the expected classification effect is obtained through the optimal projection direction and repeated iteration. An objective function based on the Fisher criterion can therefore make full use of the category information of the fault samples while minimizing the training error, so that the distance between samples of the same class is small and the distance between different classes is large. The Fisher discriminant criterion is thus introduced into CDBN: at each training iteration, the adjustment of each layer's parameters not only keeps the error between the actual output and the labeled value as small as possible, but also brings same-class samples closer together and pushes samples of different classes further apart. Iterating toward this target makes the trained network weights more conducive to classification and identification.

3.1.1. Cost function of Fisher-CDBN and BP (Back Propagation) algorithm

Suppose a sample set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ belongs to c categories, where n is the number of samples and $y_i$ is the category label of sample $x_i$. The basic cost function of CDBN is defined as follows

\[ R = \arg\min_{w,b} R(w,b) = \frac{1}{n}\sum_{i=1}^{n} R(w,b;x_i,y_i) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\left\| h_{w,b}(x_i) - y_i \right\|^2 \tag{14} \]

where w denotes the connection weights between the units of adjacent layers, b is the offset, and $h_{w,b}(x_i)$ is the output of the last layer of the neural network, i.e. the predicted value. The goal of training is to find the parameters w and b that minimize R(w, b). The objective function is optimized by gradient descent, with the iteration equations

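\[ W^{l}_{ij} = W^{l}_{ij} - \beta \frac{\partial R(w,b)}{\partial W^{l}_{ij}} \tag{15} \]

\[ b^{l}_{i} = b^{l}_{i} - \beta \frac{\partial R(w,b)}{\partial b^{l}_{i}} \tag{16} \]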


In Eqs. (15) and (16), β is the learning rate. In the back propagation algorithm, forward propagation is first applied to calculate the final-layer output h of the network, and then the difference between the predicted value h and the actual label y is calculated. This difference is defined as the residual $\zeta_i^{nl}$ (nl denotes the output layer). The partial derivatives in Eqs. (15) and (16) are then calculated from the residuals of the final output layer.

\[ \zeta_i^{nl} = \frac{\partial R}{\partial Z_i^{nl}} = \frac{\partial}{\partial Z_i^{nl}} \frac{1}{2}\left\| h_{w,b}(x_i) - y_i \right\|^2 \tag{17} \]

where $Z_i^{nl}$ is the weighted input sum of sample $x_i$ in the last layer. According to Eqs. (14)–(17), the traditional CDBN does not make full use of the category label information and cannot distinguish heterogeneous samples, so the extracted feature information is not conducive to fault classification. To make the algorithm more conducive to classification, this paper uses the Fisher criterion to propose a new energy function based on intra-class and inter-class distances. R1 is the intra-class similarity measure, defined as the sum of the distances between all samples and the mean of their class. R2 is the inter-class similarity measure, defined as the sum of the distances between the means of all sample categories.

\[ R_1 = \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{c} \left\| h_{w,b}(x_{i,j}) - M^{j} \right\|^2 \tag{18} \]

\[ R_2 = \frac{1}{2}\sum_{j=1}^{c}\sum_{i=j+1}^{c} \left\| M^{j} - M^{i} \right\|^2 \tag{19} \]

where m is the number of samples per class and c is the number of categories (n = m × c). $M^j$ is the mean output of the jth class samples, i.e.

\[ M^{j} = \frac{\sum_{i=1}^{m} h_{w,b}(x_{i,j})}{m} \tag{20} \]

When gradient descent uses R1 as a cost function, each iteration reduces the distance between the predicted value of each sample and the mean predicted value of its category. When R2 is used as a cost function, each iteration makes the distance between heterogeneous samples greater. To make the features learned by each layer of the new CDBN more conducive to classification, an energy function model that incorporates intra-class and inter-class distance constraints is proposed as below.

\[ J_{fisherCDBN} = R + \varepsilon_1 R_1 - (1 - \varepsilon_1) R_2 \tag{21} \]

In this equation, ɛ1 is the adjustment coefficient of the Fisher-CDBN cost function and R is the cost function of the original CDBN. While taking the reconstruction error into account, the new overall cost function $J_{fisherCDBN}$ makes the intra-class distance of the learned features smaller and the inter-class distance larger. Thus, the weights of each layer are adjusted in a direction that is more conducive to classification, so that when the number of samples or the number of training iterations is small, the classification target of $J_{fisherCDBN}$ can be approached more quickly. When the weights are updated with the BP algorithm, the key step is to obtain the residual of each output unit at the final output layer; each sub-function of the objective can be solved separately for its residual. For the intra-class constraint objective function R1, the residual of each output unit of the output layer is

\[
\begin{aligned}
\xi_i &= \frac{\partial R_1}{\partial Z_i^{nl}} = \frac{\partial}{\partial Z_i^{nl}} \left[ \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{c} \left\| h_{w,b}(x_{i,j}) - M^{j} \right\|^2 \right] \\
&= \sum_{i=1}^{m}\sum_{j=1}^{c} \big( h_{w,b}(x_{i,j}) - M^{j} \big) \cdot \big( a_i^{nl} - M^{j} \big)' \\
&= \sum_{i=1}^{m}\sum_{j=1}^{c} \big( a_i^{nl} - M^{j} \big) \cdot a_i^{nl} \big( 1 - a_i^{nl} \big) \Big( 1 - \frac{1}{m} \Big)
\end{aligned}
\tag{22}
\]

where $a_i^{nl}$ is the activation value of the output layer and the prime denotes differentiation with respect to $Z_i^{nl}$. For the inter-class constraint objective function R2, the residual of each output unit of the output layer is

\[
\begin{aligned}
\xi_i &= \frac{\partial R_2}{\partial Z_i^{nl}} = \frac{\partial}{\partial Z_i^{nl}} \left[ \frac{1}{2}\sum_{j=1}^{c}\sum_{i=j+1}^{c} \left\| M^{j} - M^{i} \right\|^2 \right] \\
&= \sum_{j=1}^{c}\sum_{i=j+1}^{c} \big( M^{i} - M^{j} \big) \cdot \big( (M^{i})' - (M^{j})' \big)
\end{aligned}
\tag{23}
\]
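To make Eqs. (14) and (18)–(21) concrete, the sketch below evaluates R, R1, R2 and the combined cost J on a small batch of network outputs. It is an illustrative example under stated assumptions, not the authors' code: the function name `fisher_cdbn_cost`, the array layout, and the use of integer class ids are hypothetical, and only the cost value is computed (no back propagation is performed).

```python
import numpy as np


def fisher_cdbn_cost(outputs, targets, labels, eps1):
    """Evaluate the Fisher-CDBN cost of Eq. (21) (illustrative sketch).

    outputs: (n, d) array of network outputs h_{w,b}(x)
    targets: (n, d) array of label vectors y
    labels:  (n,) array of integer class ids (hypothetical encoding)
    eps1:    adjustment coefficient in [0, 1]
    """
    n = outputs.shape[0]
    # Eq. (14): mean of half squared errors.
    R = np.sum((outputs - targets) ** 2) / (2.0 * n)
    classes = np.unique(labels)
    # Eq. (20): class means of the network outputs.
    means = np.stack([outputs[labels == c].mean(axis=0) for c in classes])
    # Eq. (18): intra-class scatter around the class means.
    R1 = 0.5 * sum(
        np.sum((outputs[labels == c] - means[k]) ** 2) for k, c in enumerate(classes)
    )
    # Eq. (19): squared distances between class means, each pair counted once.
    diff = means[:, None, :] - means[None, :, :]
    R2 = 0.5 * np.sum(np.triu(np.sum(diff ** 2, axis=2), k=1))
    # Eq. (21): combined cost.
    J = R + eps1 * R1 - (1.0 - eps1) * R2
    return J, R, R1, R2


demo_outputs = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
demo_labels = np.array([0, 0, 1, 1])
J_demo, R_demo, R1_demo, R2_demo = fisher_cdbn_cost(
    demo_outputs, demo_outputs.copy(), demo_labels, eps1=0.5
)
print(J_demo, R_demo, R1_demo, R2_demo)
```

In this toy batch the targets equal the outputs, so R vanishes and the cost is driven only by the two Fisher terms: a smaller intra-class scatter R1 or a larger inter-class distance R2 lowers J.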

In this model, after each sub-function has found the residual of the last layer, all weights can be obtained by iterating the BP algorithm. In Eq. (21), the value of the adjustment parameter ɛ1 needs to be determined through experiments; different databases and network structures call for slightly different values. Usually, ɛ1 lies in the range 0 to 1.

3.1.2. Algorithm steps of Fisher-CDBN

In summary, this paper proposes a new CDBN deep model based on the Fisher criterion, Fisher-CDBN. When the weights are adjusted by back propagation, an energy function based on the Fisher criterion is used to make the weight learning process more beneficial to classification. The training steps of Fisher-CDBN are as follows.

Step 1. Fisher-CDBN is pre-trained in an unsupervised, greedy, layer-wise manner using the Contrastive Divergence (CD) algorithm. When the training of a layer is finished, the weight parameters of this layer are temporarily fixed.

Step 2. When all layers have been pre-trained, the network weights are fine-tuned using labeled data with the back propagation algorithm based on the Fisher parameter optimization criterion (i.e. Eq. (21)), which supervises the learning of the entire network.

The schematic comparison between CDBN and Fisher-CDBN is shown in Fig. 3.

3.2. Research motivation of LGDNN (Local-Global Deep Neural Network)

Global information reflects the overall characteristics of data, while local information corresponds to local areas and focuses on details. Global feature information and local feature information are widely used in the areas of


Fig. 3. The schematic comparison between CDBN and Fisher-CDBN.

face recognition and image processing, etc. Yet it is rarely mentioned in fault diagnosis [23–25]. At the same time, local feature information can restore the global information. Therefore, global information and local information are different but equally important [19]. The less loss of information from raw data, the more accurate the subsequent classification is. According to Fisher-CDBN, it can be seen that the local feature information of data can be directly extracted by convolution and localized weight matrix. The proposed Fisher-CDBNs model can extract more local feature information that is conducive to classification. However, the features extracted by Fisher-CDBN do not fully reflect the overall characteristics of the data, and the dimensions and redundancy of the extracted data are too high. Therefore, it is necessary to introduce an effective global feature extraction and effective data dimension reduction method. KPCA has a strong ability to comprehensively distribute the variance information of the original data, and it also can effectively extract nonlinear feature information, but it does not fully consider the local geometric relationship of the data samples. And global information plays an important role in describing the overall scale of the data. In the global scope, the global information of the data shown by the principle of KPCA is reflected in the covariance matrix, and the high-dimensional data space is projected into the lowdimensional space by KL transform, so that the compressed lowdimensional data has the smallest variance and the smallest information loss. Finally, the feature analysis is performed to extract the global nonlinear information of the data by solving the objective function. As mentioned above, the global and local features in the process of fault diagnosis play the different role. To make comprehensive use of more fault diagnosis information, the global and local features are effectively integrated into the together. 
From the point of view of spectral analysis, global characteristics correspond to low frequencies, and local features correspond to high frequencies. From the point of view of information, the global feature is the overall contour information, and the local feature is the detail of the data. In this section, we first illustrate the different roles of global and local features through the structural diagram shown in Fig. 4, and then introduce the proposed global and local feature extraction method. As shown in Fig. 4, the raw input samples consist of sample points on the left and right sides. Since the size and number of data on both sides are roughly the same, we select two sample points of different types as the research centers (i.e., the yellow sample points). Since the global feature reflects the overall change of the samples, the global feature extractor processes the left and right sides of the original sample points as a whole, but it is impossible to discriminate the category to which a center point belongs from the final analysis result, so the feature expression is fuzzy. In contrast, the local feature reflects the detail changes of the data, so the local feature extractor divides the left and right sides of the original samples into local patches. This kind of local feature information can distinguish the classes of the research centers through the neighbor relationship. The local information reflects the details of the samples within a local area, as shown in Fig. 4. These opposite results illustrate the different effects of the two kinds of features, and they also show the necessity of integrating global and local feature extraction methods. From Sections 2 and 3.1, the proposed Fisher-CDBN can fully excavate the local structural information hidden in the deep structure, but it easily ignores global feature information. KPCA, as a statistical global feature learning algorithm, preserves the global variance of the data to maximize the retained information, but it is ineffective at the scale of local structure information. Summarizing the advantages and disadvantages of these two algorithms, a fused local and global deep feature learning model (LGDNN) is proposed. KPCA compensates for the inability of Fisher-CDBN to extract the global feature information of the data, and it further compresses the high-dimensional features extracted by Fisher-CDBN.

3.3. Feature extraction phase of LGDNN

In the LGDNN model, Fisher-CDBN is trained layer by layer to extract the local information of the data. The fully connected feature vectors in the last layer of Fisher-CDBN are regarded as new observation data for the KPCA layer. In the global layer, the extracted local features are mapped into a high-dimensional feature space through the nonlinear kernel function H(X), and KPCA is applied to extract the prominent global structure. Finally, the extracted local-global feature vectors are used as the input of the soft-max classifier to calculate the fault diagnosis accuracy. In summary, the feature extraction process of LGDNN is divided into three stages.

1) First stage: Local feature extraction

The network layer for local feature extraction is divided into a series of adjacent pooling Fisher-CRBMs, which are trained by using the unsupervised CD (Contrastive Divergence) algorithm [30]. The hidden layer vector h1 of the first layer is trained according to p(h1 | v) = p(h1 | v, W). Using h1 as the input of the second layer, the weight W2 of the second layer is trained. Since a single-layer Fisher-CRBM cannot adequately describe the original features of the data, a higher-level network needs to be constructed. A number


Fig. 4. The different roles of global features and local features in fault diagnosis.

of pre-trained Fisher-CRBMs are connected in series. In Fisher-CDBN, the local feature information of the data is extracted directly by convolution with localized weight matrices, so the proposed Fisher-CDBN model can extract more local feature information that is conducive to classification.

2) Second stage: Global feature extraction

After the first stage of deep feature extraction, the local information of the data is preserved, but its global information is ignored. For this reason, the fully connected layer of the Fisher-CRBMs is taken as the new observation data for the KPCA layer. The training data in the global subspace is given by





$$X = [X_1, X_2, \ldots, X_n] \in \mathbb{R}^{k_1 k_2 \times Nnm} \tag{24}$$

The new observation data is mapped into a high-dimensional space through a kernel function, and KPCA is used to minimize the reconstruction error:

$$\min_{V \in \mathbb{R}^{k_1 k_2 \times L}} \left\| X - V V^{T} X \right\|_F^2 \quad \text{s.t.} \quad V^{T} V = I_L \tag{25}$$

where I_L is the L × L identity matrix, and V consists of the first L eigenvectors of the covariance matrix, which represent the principal components (PCs) of the input data. The global PCs of the training data are obtained by unsupervised PCA. The extracted PCs represent the overall features of the data.

3) Third stage: Local-global feature extraction and classification

The network architecture of LGDNN consists of a local layer and a global layer. Fine-tuning starts from the last layer of LGDNN by using the sample labels, and the parameters are updated and saved. After these two stages, the new feature space preserves the local-global feature information of the data. The features extracted by LGDNN are used as the input of the classifier, and the soft-max classifier is trained for fault recognition. The network architecture of LGDNN is shown in Fig. 5, and the steps of the LGDNN algorithm are as follows.

Input: The vibration spectrum signals X and the initialized parameters of LGDNN.
Output: The parameters of the network, the high-level features of LGDNN (Y) and the classification recognition rate.
Step 1: The data X is normalized and denoted as v0, and v0 is then taken as the visible layer input of the first-layer Fisher-CRBM.
Step 2: $h_i^0$, $v_j^1$ and $h_i^1$ are calculated by using the CD algorithm from Eq. (6) and Eq. (7) to update the parameters.
Step 3: Step 2 is repeated for each pooling Fisher-CRBM until the maximum number of iterations is reached or the reconstruction error is small enough, which ends the training of the Fisher-CRBM.
Step 4: The output of the last Fisher-CRBM layer is fully connected and fed into the global feature layer. First, the local features $X \in \mathbb{R}^d$ are mapped into the high-dimensional feature space $H(X) \in \mathbb{R}^h$ ($h \ge d$) by the kernel function. PCA is then used to minimize the reconstruction error. According to Eq. (10), the global maximum variance of the data is obtained. Finally, the network parameters of LGDNN are fine-tuned by using the sample labels.


Fig. 5. The schematic diagram of the proposed LGDNN algorithm.

Fig. 6. The flowchart of rotating machinery fault diagnosis method based on LGDNN.

Step 5: The high-level feature set Y is obtained by training LGDNN, and the soft-max classifier is trained on Y to classify the faults.

4. The proposed fault diagnosis method of rotating machinery based on LGDNN

To learn more effective fault information, a new fault diagnosis method of rotating machinery based on LGDNN is proposed in this section. The method is shown in Fig. 6, and its steps can be summarized as follows.

Step 1: The spectrum signals of the rotating machinery are obtained first. These signals are then divided into training samples and testing samples.
Step 2: The samples are normalized into X¯, and the initial parameters of LGDNN follow a Gaussian distribution.
Step 3: The LGDNN is pre-trained by using unlabeled samples. Given the number of hidden layers N, the output of each hidden layer of Fisher-CDBN is regarded as the input of the next layer until all N layers have been trained. The output local features are then regarded as the input of KPCA to find the global features. Finally, the local-global feature information is extracted.
Step 4: According to the label information of the samples, the network parameters of LGDNN are fine-tuned by back-propagation.
Step 5: The testing samples are tested by the trained LGDNN, and the extracted local-global features are then input into the soft-max classifier to diagnose the faults.

5. Experiment and analysis

Bearings and gears are key components of rotating machinery, and their health conditions usually affect the life of the machine. In this section, diagnostic cases of motor rolling bearing failure [39] and wind turbine gear failure [40] are used to verify the feasibility of the LGDNN-based fault diagnosis method.

5.1. Case 1: fault diagnosis of rolling bearings

The data set in this experiment was collected by the Electronic Engineering Laboratory at Case Western Reserve University, USA.


Fig. 7. Rolling bearing (CWRU) fault simulation test bed.

Fig. 8. The influence of adjustment coefficient for fault diagnosis method based on LGDNN.

Table 1 Fault conditions of data set A.

| Fault type | Fault depth/mm | Labels | Samples number |
|---|---|---|---|
| Normal | 0 | F1 | 100 |
| BF1 | 0.18 | F2 | 100 |
| BF2 | 0.36 | F3 | 100 |
| BF3 | 0.54 | F4 | 100 |
| IF1 | 0.18 | F5 | 100 |
| IF2 | 0.36 | F6 | 100 |
| IF3 | 0.54 | F7 | 100 |
| OF1 | 0.18 | F8 | 100 |
| OF2 | 0.36 | F9 | 100 |
| OF3 | 0.54 | F10 | 100 |

Table 2 The parameter settings of LGDNN.

| Parameter type | Value | Parameter type | Value |
|---|---|---|---|
| Learning rate (Fisher-CDBN) | 0.1 | Layer2.n_map.v (Fisher-CDBN) | 9 |
| Layer number of Fisher-CDBN | 2 | Layer2.n_map.h (Fisher-CDBN) | 16 |
| Sparsity (Fisher-CDBN) | 0.03 | Layer2.filter (Fisher-CDBN) | [5 5] |
| Number of iterations (Fisher-CDBN) | 400 | Layer2.pool (Fisher-CDBN) | [2 2] |
| Batchsize (Fisher-CDBN) | 20 | Regularization (Fisher-CDBN) | 0.1 |
| Layer1.n_map.v (Fisher-CDBN) | 1 | Layer1.pool (Fisher-CDBN) | [2 2] |
| Layer1.n_map.h (Fisher-CDBN) | 9 | Stride (Fisher-CDBN) | [1 1] |
| Layer1.filter (Fisher-CDBN) | [7 7] | Adjustment coefficient (Fisher-CDBN) | ɛ1 = 0.6 |
| Input_type (Fisher-CDBN) | “Binary” | Kernel parameters (KPCA) | Gaussian kernel, kernel parameter = 256 |

The sampling frequency was 12 kHz. The motor-bearing experiment system simulated four kinds of health conditions: normal, outer ring fault, inner ring fault and rolling element fault. Each fault condition has three fault depths. Datasets A and B were constructed under loads of 0 hp and 1 hp, respectively, and each dataset contains 10 health conditions. The original vibration signals were segmented into samples of 1024 points, giving 100 samples for each condition. The composition of dataset A is shown in Table 1, where Normal represents the normal condition of the bearing, and BF, IF and OF denote ball fault, inner ring fault and outer ring fault, respectively. The numbers following BF, IF and OF indicate the different degrees of failure. The health conditions are labeled F1 to F10. The rolling bearing (CWRU) fault simulation test bed is shown in Fig. 7. In this case, 50% of the samples were randomly selected as training samples, and the remaining samples were used as testing samples. The parameters of Fisher-CDBN and KPCA in LGDNN are set according to Refs. [11,37], respectively.
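The sample construction described for this case (1024-point samples, 100 samples per condition, a random 50% training split, spectral input) can be sketched as follows. This is only an illustration under stated assumptions: the non-overlapping segmentation, the one-sided amplitude spectrum, the synthetic 187.5 Hz test tone and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def segment_signal(signal, length=1024, n_samples=100):
    """Cut a 1-D vibration record into non-overlapping samples (assumed scheme)."""
    needed = length * n_samples
    if signal.size < needed:
        raise ValueError("record is too short for the requested samples")
    return signal[:needed].reshape(n_samples, length)

def amplitude_spectrum(samples):
    """One-sided amplitude spectrum of each row, scaled by the sample length."""
    return np.abs(np.fft.rfft(samples, axis=1)) / samples.shape[1]

def random_split(n, train_fraction, rng):
    """Randomly split the indices 0..n-1 into training and testing parts."""
    perm = rng.permutation(n)
    n_train = int(round(n * train_fraction))
    return perm[:n_train], perm[n_train:]

rng = np.random.default_rng(0)
fs = 12_000                                   # sampling frequency in Hz
t = np.arange(1024 * 100) / fs
# Synthetic stand-in for a measured record: a 187.5 Hz tone plus noise.
record = np.sin(2 * np.pi * 187.5 * t) + 0.1 * rng.normal(size=t.size)

samples = segment_signal(record)              # shape (100, 1024)
spectra = amplitude_spectrum(samples)         # shape (100, 513)
train_idx, test_idx = random_split(len(samples), 0.5, rng)
```

With a 12 kHz sampling frequency and 1024 points, the frequency resolution is 12000/1024 ≈ 11.72 Hz, so the 187.5 Hz tone falls exactly on bin 16.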

5.1.1. Parameter selection of the proposed method

The network model of LGDNN mainly involves the local-layer parameters of Fisher-CDBN and the kernel parameters of KPCA in the global layer. These parameters are selected with reference to [11,37]. The visible and hidden layer units of the Fisher-CDBN are set as [1-9-16], and two layers of CRBMs are used. Considering the stability and convergence rate of the algorithm, the parameter settings of LGDNN are shown in Table 2. Following the fault diagnosis flowchart, Fig. 8 shows the influence of the adjustment coefficient on LGDNN-based fault diagnosis. It can be seen that the adjustment coefficient ɛ1 = 0.6 is appropriate. After setting the main parameters of the LGDNN model and constructing the data set, the rolling bearing fault data set was diagnosed according to the diagnostic flowchart. To display more details of the diagnostic information, Fig. 9 shows the confusion matrix of the training samples obtained with the LGDNN-based fault diagnosis flowchart. As can be seen, the classification accuracy for all types of failures is close to 100%. Only the second and third types show a certain degree of aliasing, so the proposed method can separate almost all fault conditions. To illustrate the performance of the improved Fisher-CDBN, Fig. 10 compares the mean square error (MSE) of the Fisher-CDBN network with that of the original CDBN network under the same parameters. It can be seen that Fisher-CDBN reaches a lower MSE and converges faster, whereas the MSE of CDBN remains high. The MSE of Fisher-CDBN converges toward its minimum as the iterations progress, because Fisher-CDBN, guided by the Fisher criterion, obtains fault features that are conducive to classification and thus improves the computation and the classification result. From Fig. 10, we can see that the reconstruction error curve of Fisher-CDBN decreases faster than that of CDBN. After 400 training iterations, the error of Fisher-CDBN reaches 0.01 or less, which indicates that the Fisher-CDBN model fits the input and output better. After 400 iterations of CDBN training, the reconstruction error only reaches 0.1, which demonstrates that the trained CDBN model is mediocre and its information loss is large. From the perspective of information theory, the smaller the training error, the less information is lost and the stronger the learning ability of the model. Compared with the original CDBN, Fisher-CDBN is updated in the direction of the Fisher criterion for data classification in each training iteration, so it is better at extracting the local discriminant information of the data.

Fig. 9. The confusion matrix for training samples based on LGDNN-fault diagnosis method.

5.1.2. Fault classification capability based on LGDNN

According to the diagnostic flowchart in Fig. 6, dataset A is classified by the proposed fault diagnosis method (LGDNN), the Fisher-CDBN-based method and the CDBN-based [5] method, respectively. The recognition results of the testing samples for these three methods are shown in Fig. 11(a), Fig. 11(b) and Fig. 11(c), respectively. It can be seen that these deep learning fault diagnosis methods can take the original signals directly as input and achieve effective fault diagnosis. In Fig. 11, the accuracy of the CDBN-based method is lower than that of LGDNN. The accuracy of LGDNN is close to 100%, especially for the training samples, which demonstrates that our method is robust to the interference of different working conditions and accurately identifies the 10 health conditions of the bearing. To verify that LGDNN has strong and stable feature learning ability, the KPCA+soft-max, CDBN and Fisher-CDBN fault diagnosis methods are used in comparative experiments to classify dataset B. The classification results of 20 runs of these methods are shown in Fig. 12. From the diagnosis results, it can be seen that the diagnostic accuracy of KPCA over the 20 runs lies in the range of 0.51∼0.7 and fluctuates widely. This is because traditional shallow feature learning cannot learn precise features directly from the original vibration signals. The diagnostic accuracy of CDBN is concentrated in the range of 0.82∼0.97, and its recognition rate is relatively stable. Obviously, LGDNN has the highest recognition accuracy, which is close to 100%, owing to the local feature extraction capability of Fisher-CDBN

Fig. 10. The mean square error (MSE) curve based on Fisher-CDBN and CDBN.


Fig. 11. Testing samples’ classification for dataset A based on different fault diagnosis methods.

and the global feature extraction capability of KPCA. It can therefore extract more fault information from the mechanical spectrum signals, which improves its fault diagnosis accuracy. The classification results show that the proposed method has a clear advantage owing to its local-global feature extraction ability.
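The soft-max classifier that receives the extracted features in all of the compared methods can be sketched as plain soft-max regression trained by batch gradient descent. The sketch below is an assumption-laden illustration rather than the authors' classifier: the learning rate, iteration count, feature standardization and the synthetic three-class features are all hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise soft-max with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax(X, y, n_classes, lr=0.5, n_iter=200):
    """Fit weights W and biases b by gradient descent on the cross-entropy loss."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                  # one-hot labels
    for _ in range(n_iter):
        P = softmax(X @ W + b)
        G = (P - Y) / n                       # gradient w.r.t. the logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b

def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)

# Synthetic stand-in for extracted features of three health conditions.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + 0.5 * rng.normal(size=(150, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize the features

W, b = train_softmax(X, y, n_classes=3)
accuracy = float(np.mean(predict(X, W, b) == y))
```

Because the classifier is linear, its accuracy depends on how well the feature extractor separates the classes, which is why the comparison above is effectively a comparison of feature quality.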

5.1.3. Feature extraction capability of LGDNN

To validate that the proposed LGDNN-based fault diagnosis method has strong feature extraction capability, the Fisher-CDBN, CDBN, DBN, KPCA and PCA algorithms were used to extract features from the original fault dataset A, respectively. These are


Fig. 12. The 20 times recognition accuracy based on three fault diagnosis methods.

comparative tests (in which the features of Fisher-CDBN, CDBN and DBN are processed by PCA). For convenience, we denote the six algorithms as {LGDNN, Fisher-CDBN, CDBN, DBN, KPCA, PCA} = {D6, D5, D4, D3, D2, D1}, respectively. To clarify the experimental settings, the main parameters of the comparison methods are listed below (the LGDNN parameter settings are shown in Table 2):

1) Fisher-CDBN: learning rate 0.1; number of layers 2; sparsity 0.03; number of iterations 400; batch size 20; structure 1-9-16; Layer1.filter [7 7]; Layer2.filter [5 5]; Layer.pool [2 2]; stride [1 1]; adjustment coefficient ɛ1 = 0.6; soft-max classifier.
2) CDBN: learning rate 0.1; number of layers 2; sparsity 0.03; number of iterations 400; batch size 20; structure 1-9-16; Layer1.filter [7 7]; Layer2.filter [5 5]; Layer.pool [2 2]; stride [1 1]; soft-max classifier.
3) DBN: learning rate 0.1; number of iterations 400; batch size 20; structure 512-200-100; Activation_function = ’sigm’.
4) KPCA: Gaussian kernel; kernel parameter set to 256 by grid optimization; principal component contribution rate 0.99; soft-max classifier.
5) PCA: principal component contribution rate 0.99; soft-max classifier.

Correspondingly, the three-dimensional principal component scatter plots of the proposed method and the comparison methods are shown in Fig. 13. From Fig. 13, it can be seen that the clustering and classification of the first three features based on LGDNN are the best.
The same health conditions of the motor bearings are well gathered together, and different health conditions are separated effectively. By contrast, the other five feature extraction methods show a certain degree of cross-aliasing. Therefore, the proposed method can basically classify the 10 health conditions. To quantify the feature learning ability of the various methods, the inter-class distance Sb, the intra-class distance Sw and the Sb/Sw index from pattern recognition are used as evaluation indices of feature extraction [41–42]. The higher the Sb/Sw index, the better the clustering and classification performance. The Sb/Sw index of the first three features of the testing and training samples is calculated for the above-mentioned six methods, and the results are shown in Fig. 14. In conjunction with Figs. 13 and 14, the following conclusions can be drawn. From PCA to LGDNN, the relative distance index Sb/Sw increases, which demonstrates that the extracted features become progressively better. Compared with the traditional feature extraction methods, the Sb/Sw indices of CDBN, Fisher-CDBN and LGDNN are clearly higher, and their clustering and classification performance is better. The feature representation of LGDNN is the best, and LGDNN can effectively excavate the local-global information of the data, so LGDNN has strong feature learning ability.

Fig. 13. The first three principal component scatter plots based on different feature extraction methods.

Fig. 14. The Sb/Sw index of training samples and testing samples based on different feature extraction methods.

5.1.4. Comparison with traditional feature extraction methods

At present, the most commonly used manual features include time-domain, frequency-domain and time-frequency-domain characteristics. The manual feature parameters are listed in Table 3. CDBN, manual features+KPCA and manual features are used as comparison models for the LGDNN-based fault diagnosis method. The diagnostic accuracy of the testing samples of bearing dataset B for LGDNN and the comparison methods is listed in Table 4. As seen from Table 4, the features of the two manual feature extraction methods are fed directly into the soft-max classifier. The manual feature extraction methods achieve fault diagnosis only to a limited extent, and their diagnostic results are significantly lower than those of the deep learning methods. The diagnostic accuracy of Fisher-CDBN is 0.988, which is better than the other comparison methods and slightly worse than LGDNN. The proposed LGDNN can accurately diagnose the different fault conditions. The proposed method can adaptively extract deep fault features and complete intelligent diagnosis.

Table 3 The manual feature parameters.

| No. | Features |
|---|---|
| 1 | Mean |
| 2 | Mean square amplitude |
| 3 | Square root amplitude |
| 4 | Average amplitude |
| 5 | Maximum |
| 6 | Minimum |
| 7 | Peak-to-peak |
| 8 | Skewness |
| 9 | Kurtosis |
| 10 | Variance |
| 11 | Waveform index |
| 12 | Peak index |
| 13 | Pulse index |
| 14 | Clearance index |
| 15 | Slope index |
| 16 | Kurtosis index |
| 17 | Mean frequency |
| 18 | Frequency center |
| 19 | Standard deviation frequency |
| 20–24 | 1∼5 times frequency amplitude features |
| 25–34 | The first five layers energy feature of IMFs and five-layer complexity |
| 35–42 | Three-layer wavelet packet decomposition band energy feature |

Table 4 The diagnostic results of bearing dataset based on different feature extraction methods (fault recognition rate).

| Methods | Manual | Manual+KPCA | CDBN | Fisher-CDBN | LGDNN |
|---|---|---|---|---|---|
| F1 | 1 | 1 | 1 | 1 | 1 |
| F2 | 0 | 0.36 | 0.80 | 0.88 | 1 |
| F3 | 0.68 | 0.68 | 1 | 1 | 1 |
| F4 | 0 | 0.1 | 1 | 1 | 1 |
| F5 | 1 | 1 | 1 | 1 | 1 |
| F6 | 0.54 | 0.98 | 0.92 | 1 | 1 |
| F7 | 0 | 1 | 1 | 1 | 1 |
| F8 | 0 | 1 | 1 | 1 | 1 |
| F9 | 0.8 | 1 | 1 | 1 | 1 |
| F10 | 1 | 1 | 1 | 1 | 1 |
| Average | 0.502 | 0.812 | 0.972 | 0.988 | 1 |

5.2. Case 2: fault diagnosis of gear

The experimental vibration data of the high-speed pinion fault of a wind turbine was collected by Eric Bechhoefer, Chief Engineer, NRG Systems [40]. Its radial vibration measurements were

Fig. 15. The fault conditions of wind turbine pinion.

Table 5 Experiment parameters of wind turbine pinion.

| Parameter/device | Value/type |
|---|---|
| Driving speed | 1800 rpm |
| Sampling frequency | 97,656 Hz |
| Sensor type | Accelerometer |
| Machine health condition | 3 |
| Number of pinion teeth | 32 |
| Rated power | 3 MW |

made on 3 MW wind turbine pinions. For fault Condition 1, the initial vibration reading showed a high vibration level, and the machine was stopped after one week. The pinion fault is shown in Fig. 15. The other two vibration records are marked as Condition 2 and Condition 3. The experiment parameters of the wind turbine pinion are shown in Table 5. To form the dataset, 1024 points are intercepted as one sample. Three health conditions based on 100 samples are


Fig. 16. The classification result of gear fault dataset based on LGDNN.

Fig. 17. The recognition rate based on different fault diagnosis methods.

selected as the original gear fault dataset. The experiment is examined from the following aspects.

5.2.1. Fault classification and feature visualization

The gear fault data are classified according to the flowchart of the LGDNN-based fault diagnosis method, and the training and testing results are shown in Fig. 16. In particular, the classification accuracy on the training samples is 100%. Among the testing samples, only two samples of the second label are misclassified, while all other samples are classified correctly. This shows that the LGDNN-based fault diagnosis method is effective for gear fault recognition, and that LGDNN can suppress interference and accurately identify the different health conditions of the gear.
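Per-class and overall recognition rates such as those read from Fig. 16 are typically derived from a confusion matrix. The sketch below is a hypothetical illustration: the number of testing samples per class (50) and the class receiving the two misclassified samples are assumptions, since the text only states that two testing samples of the second label are wrong.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Assumed layout: 3 health conditions with 50 testing samples each.
y_true = np.repeat(np.arange(3), 50)
y_pred = y_true.copy()
y_pred[[50, 51]] = 0      # two samples of the second class go to another class

cm = confusion_matrix(y_true, y_pred, n_classes=3)
per_class_rate = cm.diagonal() / cm.sum(axis=1)   # recognition rate per class
overall_rate = cm.trace() / cm.sum()              # overall recognition rate
```

Under these assumed counts, the second class has a recognition rate of 48/50 and the overall rate is 148/150.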

5.2.2. Comparison of the proposed method and other methods

The LGDNN, Fisher-CDBN, CDBN and KPCA algorithms are used to classify the original gear fault dataset, and a comparison test is carried out to verify the classification performance of LGDNN. To clarify the experimental settings, the main parameters of the comparison methods are listed below (the LGDNN parameter settings are shown in Table 2):

1) Fisher-CDBN: learning rate 0.1; number of layers 2; sparsity 0.03; number of iterations 400; batch size 20; structure 1-9-16; Layer1.filter [7 7]; Layer2.filter [5 5]; Layer.pool [2 2]; stride [1 1]; adjustment coefficient ɛ1 = 0.6; soft-max classifier.
2) CDBN: learning rate 0.1; number of layers 2; sparsity 0.03; number of iterations 400; batch size 20; structure 1-9-16; Layer1.filter [7 7]; Layer2.filter [5 5]; Layer.pool [2 2]; stride [1 1]; soft-max classifier.
3) KPCA: Gaussian kernel; kernel parameter set to 256 by grid optimization; principal component contribution rate 0.99; soft-max classifier.

The classification results are shown in Fig. 17. To validate the feature extraction ability of the four fault diagnosis methods, the extracted high-level features were visualized. The three-dimensional principal component scatter diagrams based on the LGDNN, Fisher-CDBN, CDBN and KPCA algorithms are shown in Fig. 18. From Figs. 17 and 18, it can be seen that the clustering results of the first three features based on LGDNN are the best. The same health conditions of the gears are well gathered together, and different health conditions are separated effectively. However, the features extracted by KPCA, CDBN and Fisher-CDBN each show a certain degree of cross-aliasing. Therefore, LGDNN has powerful feature extraction capability, and it can extract more feature information and achieve higher recognition accuracy.

5.2.3. Local-global information extraction capabilities of LGDNN

To further verify the ability of the LGDNN-based fault diagnosis method to extract local-global information, different combinations of training and testing samples are examined. The ratio of training samples to testing samples is set to 20/80, 30/70, 40/60, 50/50, 60/40, 70/30 and 80/20, respectively. The trend of the recognition results of the testing samples for the above-mentioned four methods is shown in Fig. 19.
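The experiment with varying training/testing ratios can be sketched as a simple sweep. The code below is a stand-in under stated assumptions: it uses a nearest-centroid classifier and synthetic features in place of LGDNN and the gear data, and the per-class (stratified) random split is an assumption, since the text does not say how the samples are drawn.

```python
import numpy as np

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Accuracy of a nearest-centroid classifier (stand-in for the real models)."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(X_te[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(classes[dist.argmin(axis=1)] == y_te))

def sweep_ratios(X, y, train_fractions, rng):
    """Testing accuracy for each training fraction, split separately per class."""
    results = {}
    for frac in train_fractions:
        tr, te = [], []
        for c in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == c))
            k = int(round(frac * idx.size))
            tr.extend(idx[:k])
            te.extend(idx[k:])
        tr, te = np.array(tr), np.array(te)
        results[frac] = nearest_centroid_accuracy(X[tr], y[tr], X[te], y[te])
    return results

# Synthetic stand-in: 3 health conditions, 100 samples each, 4 features.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 100)
X = 6.0 * np.eye(3, 4)[y] + rng.normal(size=(300, 4))

fractions = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
accuracy_by_fraction = sweep_ratios(X, y, fractions, rng)
```

Plotting `accuracy_by_fraction` against the training fraction gives a curve of the same kind as one line in a ratio-versus-accuracy figure; repeating the sweep with several random seeds would give an average curve.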
By analyzing the results in Fig. 19, the following conclusions can be drawn:

1) Generally, the recognition accuracy of the above-mentioned four methods increases as the number of training samples increases. More training samples provide more prior information and reduce over-fitting, so the fault recognition rate increases.
2) As a whole, the decline in accuracy of the deep learning algorithms is not obvious. In particular, the fault recognition rate of LGDNN is very stable and is basically unaffected.
3) As the local and global information changes, LGDNN is the most stable; when the ratio is 40/60, its recognition rate is close to 100%. Moreover, its recognition accuracy is consistently high, which demonstrates that the local-global information mining ability of LGDNN is prominent.

5.3. Case 3: fault diagnosis of rolling bearing fault dataset from ABLT-1A

The ABLT-1A (Accelerated Bearing Life Tester provided by Hangzhou Bearing Test Research Center) used in this section is suitable for accelerated fatigue life tests of rolling bearings with an inner diameter of 10–60 mm. The testing machine is mainly composed of the testing head, testing head base, transmission system, loading system, lubrication system, electrical control system and computer monitoring system. A photograph and the structure diagram of the tester are shown in Fig. 20. In this experiment, an HRB6205 rolling bearing was selected. The faulty bearing was installed at Channel 1 of the sensor. The other three normal bearings were installed at the 2nd, 3rd and

Fig. 18. Three-dimensional scatter plots of gear fault dataset based on different fault diagnosis methods.


Fig. 19. The average recognition accuracy for different number of training samples based on different methods.

Fig. 20. Photograph and structure diagram of the Accelerated Bearing Life Tester (ABLT-1A) test bench.

Fig. 21. Schematic diagram of test bearing installation and sensor arrangement.

Table 6 The experimental parameters and main technical indicators of ABLT-1A.

| Parameter | Value/type |
|---|---|
| Driving speed | 1050 r/min |
| Sampling frequency | 10240 Hz |
| Transducer | PCB vibration acceleration sensor |
| Data acquisition system | NI 9234 |
| PC platform | CPU AMD Ryzen3 3.50 GHz, Matlab2018a, win10 |
| Bearing designation | HRB6205 single row deep groove ball bearing |
| Number of testing bearing | 4 sets |
| Load | 0 kN |
| Machine size | 1500 × 720 × 1300 mm |

4th channels, respectively. The bearing installation and sensor arrangement are shown in Fig. 21. Afterwards, five kinds of health conditions (Normal (AN), Inner ring fault (AIRF), Outer ring fault (AORF), Inner and outer ring composite fault (AIORF), and outer ring ball composite weak fault (ABORF)) were simulated under no load. At a rotating frequency of 17.5 Hz and a sampling frequency of 10240 Hz, the original vibration signals were collected by the acceleration sensor, and the collected electrical signals were converted into digital signals by the data acquisition card. Subsequently, data acquisition and signal analysis were carried out on software platforms such as MATLAB and LabVIEW. The specific experimental parameters are shown in Table 6. Each sample consists of 1024 vibration points, and 80 samples are intercepted for each state, of which the first 50 were used as training samples and the remaining 30 were used

as test samples. The experimental parameters of this section are set as in the above-mentioned section. To validate the effectiveness of the proposed fault diagnosis method (LGDNN), DBN, CDBN and Fisher-CDBN are selected as comparison models. The LGDNN parameter settings are shown in Table 2; the main parameter settings of the comparison methods are as follows:

1) Fisher-CDBN: learning rate 0.1; number of layers 2; sparsity 0.03; number of iterations 400; batch size 20; structure 1-9-16; Layer1.filter [7 7]; Layer2.filter [5 5]; Layer.pool [2 2]; stride [1 1]; adjustment coefficient ɛ1 = 0.6; soft-max classifier.
2) CDBN: learning rate 0.1; number of layers 2; sparsity 0.03; number of iterations 400; batch size 20; structure 1-9-16; Layer1.filter [7 7]; Layer2.filter [5 5]; Layer.pool [2 2]; stride [1 1]; soft-max classifier.
3) DBN: learning rate 0.1; number of iterations 400; batch size 20; structure 512-200-100; Activation_function = 'sigm'.

According to the fault diagnosis flowchart, the features extracted by the four fault diagnosis methods are reduced to three dimensions (PCA is used for preprocessing and visualization), which gives the three-dimensional feature distribution maps of the training and testing samples shown in Fig. 22. As can be seen from Fig. 22, the proposed diagnostic method (LGDNN) can basically separate the four types of training and testing samples under the different health conditions, and the clustering and classification of the testing samples are particularly clear.
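The three-dimensional reduction used for visualization can be sketched as follows. This is a dependency-free illustration and not the authors' code (the experiments were run in Matlab): it computes PCA by power iteration with deflation on the covariance matrix, and the function name `pca_project`, the iteration count and the toy feature matrix are hypothetical choices made for the example.

```python
import random


def pca_project(samples, n_components=3, n_iter=200, seed=0):
    """Project samples (list of equal-length lists) onto the top principal axes."""
    n, d = len(samples), len(samples[0])
    # Center the data so the covariance matrix is computed around the mean.
    mean = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in samples]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        # Power iteration: repeated multiplication converges to the
        # eigenvector with the largest remaining eigenvalue.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]


# Toy 4-D "features" (hypothetical), reduced to 3-D points for plotting.
features = [[float(i), 2.0 * i, 0.1 * ((7 * i) % 5), 0.01 * ((3 * i) % 4)]
            for i in range(20)]
projected = pca_project(features)
```

Each row of `projected` is one sample in the reduced 3-D space; plotting these rows with one color per health condition would give a map of the same kind as the one described above.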


Fig. 22. 3-D feature distribution maps of the training samples and testing samples based on different diagnostic methods.


Fig. 23. The diagnostic recognition rate of 4-fold cross-validation based on different fault diagnosis methods.

The other three fault diagnosis methods show a certain amount of overlap between classes. In particular, AORF and AIORF are not clearly separated, and the inter-class separation and intra-class clustering of the testing samples are only moderate. This comparison indicates that the proposed method effectively extracts both the global and the local features of the data, which benefits the pattern recognition step of fault diagnosis.

A fault diagnosis method should also generalize well to unknown data. To use all the data as both training and testing samples, cross-validation is applied to measure the generalization performance of the fault diagnosis model. The 4-fold cross-validation used in this section proceeds as follows:

1) The original fault data set (80 samples for each state) is divided into 4 equal data subsets.
2) The first data subset is used as the testing sample and the remaining subsets as the training sample to calculate the fault recognition accuracy.
3) Step 2) is repeated with each of the other subsets as the testing sample in turn, and the average testing accuracy is taken as an estimate of the prediction accuracy on unknown data.

The diagnostic recognition rates of the 4-fold cross-validation obtained by the four fault diagnosis models are shown in Fig. 23. Because each testing subset is diagnosed with a different combination of training samples, cross-validation reflects the generalization performance of the fault diagnosis method well. As can be seen from Fig. 23, when the training samples change, the proposed fault diagnosis method LGDNN is more stable than the other methods.
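The 4-fold cross-validation steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the stratified split, the synthetic two-dimensional features, the integer class labels 0 to 3 and the nearest-centroid classifier standing in for the diagnosis model are all assumptions made for the example. Only the fold count (4) and the 80 samples per state come from the text.

```python
import random
from statistics import mean


def stratified_folds(labels, k=4, seed=0):
    """Split sample indices into k folds with the same number per class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


def fit_centroids(X, y):
    """Stand-in model (assumption): one mean vector per class."""
    sums, counts = {}, {}
    for row, lab in zip(X, y):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(row, centroids[lab])))


def cross_validate(X, y, k=4, seed=0):
    """Return the per-fold accuracies and their average."""
    accuracies = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx],
                              [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies, mean(accuracies)


# Synthetic data (hypothetical): 4 health states, 80 samples per state.
rng = random.Random(1)
centers = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
X, y = [], []
for lab, (cx, cy) in enumerate(centers):
    for _ in range(80):
        X.append([cx + rng.gauss(0.0, 0.5), cy + rng.gauss(0.0, 0.5)])
        y.append(lab)

fold_acc, avg_acc = cross_validate(X, y)
```

Here `fold_acc` holds one recognition rate per fold, which corresponds to the per-fold values plotted for each method, and `avg_acc` is the averaged estimate from step 3). Replacing `fit_centroids` and `predict` with any other model leaves the splitting logic unchanged.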
Compared with the other methods, the proposed method also gives better diagnosis results on new samples. This is because it extracts both the global and the local discriminant feature information of the data, which is equivalent to a secondary feature extraction. In addition, the proposed method is more time-consuming than the other methods, but the cost is within an acceptable range.

5.4. Results and discussion

As mentioned above, retaining more information during fault diagnosis is crucial for improving diagnostic accuracy. Traditional deep learning fault diagnosis methods fail to extract both global and local information. Therefore, this paper presents a novel fault diagnosis method for rotating machinery based on LGDNN. The method is validated on rolling bearing and gear fault vibration signals.

Comparison with the other methods validates the effectiveness of the proposed method in terms of feature extraction ability, classification performance and local-global information extraction capability. However, some open questions and research directions remain to be studied.

(1) The core idea of the LGDNN-based rotating machinery fault recognition method is to use deep learning to automatically extract global and local fault feature information. However, labeled information is still required in the classification stage and the fine-tuning stage of LGDNN, so unsupervised learning is not fully utilized. In fact, unsupervised fault diagnosis methods are beneficial to engineering practice. Therefore, developing a novel unsupervised fault diagnosis method based on deep learning is essential.

(2) The parameter optimization of LGDNN needs to be considered. Currently, most deep learning models face parameter optimization problems. The number of layers and the number of neurons still have an important impact on the performance of the LGDNN model, and setting them from the empirical values reported in the references has many shortcomings for model performance. Therefore, how to choose the parameters of the neural network should be considered.

(3) The proposed LGDNN-based rotating machinery fault diagnosis method combines Fisher-CDBN with improved KPCA to improve the recognition accuracy of fault diagnosis, and it has a certain advantage over the existing methods (i.e., Fisher-CDBN, DBN and CDBN). The proposed LGDNN can extract both global and local information of the data, and its feature extraction and classification results are better than those of traditional deep learning fault diagnosis methods. The LGDNN-based fault diagnosis method can basically achieve a 100% recognition rate.
Compared with DBN and CDBN, the global and local information extraction capabilities of the LGDNN-based fault diagnosis method are verified in Figs. 9, 13 and 14.

6. Conclusion

Aiming at the problem that traditional fault diagnosis methods cannot extract both global and local fault information of rotating machinery, a new deep learning fault diagnosis method for rotating machinery based on the Local-Global Deep Neural Network (LGDNN) is proposed in this paper. The core idea is that the proposed LGDNN model directly extracts local fault information and global fault information from the original signal space by using the proposed Fisher-CDBN and KPCA, and thereby obtains a fault feature subset with higher sensitivity and discrimination. The experimental results on motor bearing and fan gear vibration signals indicate that the proposed method achieves efficient extraction of local-global feature information and higher fault diagnosis accuracy. It also provides a new reference for deep learning fault diagnosis of rotating machinery.

Declaration of Competing Interest

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant no. 51675098) and the Postgraduate Research & Practice Innovation Program of Jiangsu Province, China (no. SJKY19_0064). The authors thank the anonymous reviewers and the editor for their valuable comments.

Xiaoli Zhao received his M.S. degree from Lanzhou University of Technology, Lanzhou, China, in 2017. He is now a Ph.D. candidate at Southeast University, Nanjing, China. His main research interests are mechanical signal processing, intelligent monitoring and fault diagnosis of electromechanical equipment, and brain-computer interaction.

Minping Jia received the B.S. and M.S. degrees in mechanical engineering from Nanjing Institute of Technology (now Southeast University), Nanjing, China, in 1982 and 1985, respectively, and the Ph.D. degree in mechanical engineering from Southeast University, Nanjing, China, in 1991. He is currently a Full Professor with Southeast University, Nanjing, China. His research interests include dynamic signal processing, machine fault diagnosis, and vibration engineering applications.