ARTICLE IN PRESS
JID: NEUCOM
[m5G;November 20, 2017;11:0]
Neurocomputing 000 (2017) 1–14
Contents lists available at ScienceDirect
Neurocomputing journal homepage: www.elsevier.com/locate/neucom
A two-stage feature selection and intelligent fault diagnosis method for rotating machinery using hybrid filter and wrapper method Xiaolong Zhang, Qing Zhang∗, Miao Chen, Yuantao Sun, Xianrong Qin, Heng Li School of Mechanical Engineering, Tongji University, No. 4800 Cao’an Road, Shanghai 201804, PR China
Article info
Article history: Received 2 November 2016; Revised 7 April 2017; Accepted 7 November 2017; Available online xxx
Communicated by Shem Yin
Keywords: Fault diagnosis; Feature selection; Intrinsic time-scale decomposition; ReliefF; Binary particle swarm optimization; Support vector machine
Abstract
Selecting the most discriminative features from the original high-dimensional feature space and finding the optimal parameters of the recognition model both have a vital influence on the accuracy of fault diagnosis for complicated mechanical systems. However, as these two processes interact, conducting them separately may result in inferior diagnostic accuracy. This paper presents a feature selection and fault diagnosis framework that selects the optimal feature subset and optimizes the parameters of the SVM classifier synchronously and dynamically, with the ultimate objective of achieving the highest diagnostic accuracy. The proposed method is based on a hybrid Filter and Wrapper framework. Since the high dimensionality of the original features may lower the computational efficiency of synchronous feature selection and SVM parameter optimization, ReliefF is applied to preselect a set of feature candidates. In the reselection process, the selection state of each feature candidate and the values of the classifier parameters are all encoded into BPSO particles, so that the optimal feature subset and the SVM model can be obtained synchronously for high-performance fault diagnosis. Moreover, in the original feature extraction stage, intrinsic time-scale decomposition (ITD) is used to decompose the nonstationary vibration signal into several proper rotation components (PRCs). Statistical parameters of the PRCs in the time and frequency domains are extracted as the original features for each signal sample. Two experimental cases, a rolling bearing fault and a rotor system fault, are used to evaluate the proposed scheme. The results demonstrate that, compared with some existing methods, the proposed one achieves a better overall performance in terms of the number of optimal features, training time and testing accuracy. © 2017 Elsevier B.V. All rights reserved.
1. Introduction Rotating machinery is a kind of widely used mechanical equipment which plays a vital role in various tough industrial applications. Once a failure occurs, it may cause machinery to break down and decrease machinery service performances such as manufacturing quality, operation safety [1]. Thus, monitoring the condition of rotating machinery accurately and diagnosing the faults effectively are extremely significant for guaranteeing the production safety and reliability. In recent years, a considerable number of researches about condition monitoring and fault diagnosis have been conducted in bearing [2], gearbox [3], motor [4], wind turbine [5], rotor [6], etc. The system composition and the working processing of modern mechanical systems are becoming more and more complicated. Conventional model-based approaches, which require prior knowledge about the system and
∗
Corresponding author. E-mail address:
[email protected] (Q. Zhang).
the physical model, are impractical for obtaining from the complicated mechanical systems [7]. Fortunately, with the developing of advanced measurement technology, we can acquire various sensor data such as acoustics, vibration, structural stress and temperature. And these data can be applied to assess the health condition of mechanical equipment. Hence, data-driven method provides us a promising approach for condition monitoring and fault diagnosis. It has been successfully used in the industrial process monitoring and fault-tolerant control systems [8,9]. In a data-driven method of fault diagnosis, selecting the most representative features and constructing an effective recognition model, which are both important for the fault identification performance, are always interactional. Consequently, constructing them separately may result in inferior diagnostic accuracy. Hence, we intend to propose a method which can select the optimal feature subset and optimize the classifier parameters synchronously with the ultimate objective of achieving the highest diagnostic accuracy. Since the vibration signal contains abundant information which can reveal the machinery condition and it is convenient to be measured, it has become a principal tool for diagnosing rotating
https://doi.org/10.1016/j.neucom.2017.11.016 0925-2312/© 2017 Elsevier B.V. All rights reserved.
Please cite this article as: X. Zhang et al., A two-stage feature selection and intelligent fault diagnosis method for rotating machinery using hybrid filter and wrapper method, Neurocomputing (2017), https://doi.org/10.1016/j.neucom.2017.11.016
machinery problems [10]. Traditional time domain and frequency domain analyses cannot accurately identify different fault conditions because mechanical vibration signals are usually nonstationary and nonlinear. Therefore, it is necessary to introduce advanced time-frequency signal processing methods to extract features that clearly reveal the machinery condition [11,12]. The wavelet transform (WT) [13], empirical mode decomposition (EMD) [1] and local mean decomposition (LMD) [6,12] have been successfully applied in fault diagnosis. In addition, Frei and Osorio proposed a novel time-frequency analysis method named intrinsic time-scale decomposition (ITD) in 2007 [14]. ITD automatically decomposes a complex signal into a group of proper rotation components (PRCs) and a trend component [15]. ITD has been widely used to process biomedical signals and is suitable for vibration signals as well [5,16,17]. Additionally, ITD performs better in suppressing the endpoint effect and mode mixing, and it has a low computational complexity [17]. With ITD, a raw signal can be decomposed into several PRCs for further analysis and feature extraction [18]. Although researchers have introduced various characteristic parameters into fault diagnosis, such as several kinds of entropy-based parameters [12,19,20], the most commonly used ones are statistical parameters that are easy to calculate and to interpret [10,21]. Time domain statistical parameters such as root mean square, crest factor, kurtosis and skewness, and frequency domain features such as mean frequency and root mean square frequency, may change as the condition of the rotating machinery deteriorates. After the raw vibration signal is decomposed into several PRCs through ITD, the time domain and frequency domain statistical parameters of each PRC can be assembled into a feature vector that reveals the machinery condition.
However, the raw feature vector is with high dimensionality and it not only includes relevant features for fault diagnosis but also irrelevant and redundant ones [21]. If all the features are employed, it may lead to a complicated classification model with low performance of identification accuracy [22]. Therefore, it is of great importance to carry out an intelligent feature selection method to automatically select representative features which obviously characterize the machinery condition rather than adopting all features [21]. In general, the feature selection methods are broadly categorized as Filter and Wrapper. The Filter method evaluates the importance and relevance of a feature using an evaluation function. Then it ranks the feature weights and selects several highest weight features as a feature subset. So, the Filter method has the advantage that it takes less computational resources, while it is less effective without considering the influence of a classifier. Wrapper method selects the optimal feature subset that can achieve a highest performance with the evaluation of a classifier. So, Wrapper method is more effective, but it is time consuming [19]. As mentioned above, both Filter and Wrapper have merits and drawbacks. It is promising to combine the two feature selection methods as a hybrid one with the application of their merits. ReliefF is a successful Filter method in dealing with multiclass problems. It is an extension from the original Relief which is limited in two-class problems [23,24]. ReliefF gives a higher weight value to a feature that can discriminate from neighbors of different classes and can cluster the same class neighbors in its feature dimensionality. Accordingly, we can apply the ReliefF to select several highest weight features as candidates for the Wrapper procedure. So, the raw features with high dimensionality can be pre-optimized to a feature subset relevant to fault identification in a lower dimensional space. 
However, the type and performance of the classification algorithm in Wrapper method also have effects on the feature selection and fault diagnosis accuracy. So, it is essential to construct an effective classification model
and reselect the features in the Wrapper stage simultaneously [25]. SVM is a machine learning method based on the structural risk minimization principle and developed from statistical learning theory [26]. Many studies in the field of fault diagnosis have demonstrated that SVM can make reliable decisions with small samples [19,27]. However, the parameters of the SVM model need to be optimized to achieve excellent performance. For example, in the commonly used radial basis function (RBF) kernel SVM, the penalty factor C and the RBF kernel parameter γ have a great influence on the performance of the classifier, and a poor choice of these parameters can dramatically degrade classification performance. Many optimization methods have been reported for searching the optimal parameters of machine learning models, such as the genetic algorithm, the ant colony algorithm and particle swarm optimization [20,28,29]. However, feature selection and the classifier parameters are interdependent in the Wrapper stage. If the two are handled separately, the optimal diagnostic accuracy may not be reached. To extract the optimal feature subset with excellent fault diagnosis accuracy, it is important to conduct them synchronously [25], which can be regarded as a combinatorial optimization problem. Binary particle swarm optimization (BPSO), a swarm-intelligence discrete optimization algorithm whose particles are encoded by binary bits, is developed from the traditional PSO method [25]. BPSO has merits such as few parameters, a fast convergence rate and efficient optimization capability. The binary value of each bit in a particle can stand for the selection state of a feature, and a certain number of additional bits in each particle can be used to encode the values of the classifier parameters. 
To achieve a highest performance classification model with the feature number as less as possible, the BPSO can adaptively and concurrently adjust the selection feature subset and the values of classification model parameters. At last, the optimal feature subset and trained SVM model can be utilized to diagnose the newcomers of machinery vibration signal. Some studies have been reported on fault diagnosis by selecting sensitive features from original high dimensional feature set. Islam et al. [4] proposed a hybrid feature selection for bearing diagnosis which employed a GA based Filter analysis to select optimal features and a k-NN average classification accuracy-based Wrapper analysis approach that can select the most optimal features. Van and Kang [21] proposed an automatic fault diagnosis of different rolling element bearing faults using a dual-tree complex wavelet transform, empirical mode decomposition and a novel two-stage feature selection technique. Zhang et al. [31] introduced an intelligent fault diagnosis of rotating machinery using support vector machine with ant colony algorithm for synchronous feature selection and parameter optimization. Lu et al. [32] developed a dominant feature selection method using a genetic algorithm with a dynamic searching strategy for rotating machinery diagnosis. In this paper, we propose an intelligent fault diagnosis scheme utilizing ReliefF for feature preselection and BPSO for synchronous feature reselection and SVM parameters optimization. In the original feature extraction stage of the proposed fault diagnosis framework, each raw vibration signal is decomposed into several PRCs by ITD method. The statistical parameters in time and frequency domains are calculated to form an original high dimensional feature set. Then ReliefF is applied to preselect some top-ranked features advantageous to classification. 
These features are then used as candidates in the next Wrapper stage which is conducted as feature reselection procedure. We employ the SVM as a classifier whose parameters are optimized synchronously with the searching for optimal feature subset. The selection state of each preselected feature in the feature reselection process and the values of SVM parameters are encoded by the bits of each particle in BPSO
algorithm in the feature reselection process. The particle swarm updates the combination of features and parameter values with the goal of achieving the highest classification accuracy with as few features as possible. Feature selection and model training can be conducted off-line with training samples. Then, under on-line conditions, the machinery state and fault types can be identified with the assistance of the known optimal feature indexes and the trained classifier. Based on the above analysis, unlike traditional fault diagnosis methods that conduct feature selection and classification separately, the proposed method dynamically adjusts the selected features and the parameters of the SVM until the best diagnostic accuracy is reached. In addition, ReliefF is used to preselect feature candidates in order to increase the efficiency of the synchronous feature reselection and SVM parameter optimization process. The rest of the paper is organized as follows. Section 2 gives a brief review of the related algorithms, including ITD, ReliefF, BPSO and SVM. Section 3 presents the framework of the proposed hybrid Filter and Wrapper feature selection scheme. Section 4 describes the application results and analyses for two experimental cases. Finally, Section 5 concludes the paper.
2. Brief review of ITD, ReliefF, BPSO and SVM

2.1. Intrinsic time-scale decomposition (ITD)

Given a real-valued discrete signal $X_t$ to be analyzed, we define a baseline-extracting operator $L$, a baseline signal $L_t$ and a proper rotation $H_t$. To simplify the notation, let $X_k$ denote $X(\tau_k)$ and $L_k$ denote $L(\tau_k)$. The detailed ITD method is described as follows [14].

(1) Extract the local extreme points $X_k$ of the time series $X_t$ and the corresponding times $\{\tau_k, k = 1, 2, \ldots, N\}$. For convenience, $\tau_0 = 0$ is defined. The baseline-extracting operator $L$ on the interval $(\tau_k, \tau_{k+1}]$ is defined as follows:

$$ L X_t = L_t = L_k + \frac{L_{k+1} - L_k}{X_{k+1} - X_k}\,(X_t - X_k), \quad t \in (\tau_k, \tau_{k+1}] \tag{1} $$

where

$$ L_{k+1} = \alpha \left[ X_k + \frac{\tau_{k+1} - \tau_k}{\tau_{k+2} - \tau_k}\,(X_{k+2} - X_k) \right] + (1 - \alpha) X_{k+1} \tag{2} $$

and $\alpha$ is a linear scale factor that can be used to adjust the amplitude of the proper rotation, $\alpha \in [0, 1]$. Typically, $\alpha = 0.5$ is set during computation.

(2) Once the baseline signal $L_t = L X_t$ is calculated in accordance with step (1), the proper rotation component (PRC) $H_t$ can be extracted as

$$ X_t = L X_t + (1 - L) X_t = L_t + H_t \tag{3} $$

where $H_t = (1 - L) X_t$ is the proper rotation component, which carries the relatively high-frequency local content of the decomposed signal.

(3) Take the baseline signal $L_t$ as the input signal for the next decomposition and repeat steps (1) and (2). The decomposition is terminated when the baseline signal $L_t$ becomes monotonic or constant. The raw signal $X_t$ is thereby decomposed into a series of PRCs and a trend-like baseline:

$$ X_t = \sum_{i=1}^{p} H_t^{i} + L_t^{p} \tag{4} $$

where $p$ is the number of the obtained PRCs.

2.2. ReliefF

ReliefF, one of the Relief family of algorithms, is a Filter method that selects the optimal feature subset according to the weight of each feature. Kira and Rendell proposed the original Relief method in 1992, which is limited to two-class classification problems [24,33]. Relief evaluates the feature weights based on the relevance between features and class labels, and a feature whose weight is below a certain threshold is discarded. The key principle of Relief is that a discriminative feature keeps instances of the same class close together and instances of different classes far apart. ReliefF, proposed by Kononenko in 1994, is an extension of Relief [34]. The improved algorithm removes the restriction to two-class problems: it handles multiclass problems, is more robust, and can deal with incomplete and noisy data.

Given a training sample set $R = \{R_1, R_2, \cdots, R_m\}$, each sample contains $p$ features denoted by $R_i = \{R_{i1}, R_{i2}, \cdots, R_{ip}\}$, $1 \le i \le m$. When dealing with two-class problems, the algorithm randomly selects a sample $R_i$ from the training set and then searches for two nearest neighbors: one from the same class, called the nearest hit $H$, and the other from the different class, called the nearest miss $M$. For each feature $t$ ($1 \le t \le p$), the weight of feature $t$, denoted by $W_t$, is updated as:

$$ W_t = W_t - \mathrm{diff}(t, R_i, H)/m + \mathrm{diff}(t, R_i, M)/m \tag{5} $$

where the function $\mathrm{diff}(t, R_i, R_j)$ is the difference between the values of feature $t$ for two samples $R_i$ and $R_j$. For nominal features, $\mathrm{diff}(t, R_i, R_j)$ is defined as:

$$ \mathrm{diff}(t, R_i, R_j) = \begin{cases} 0, & R_{it} = R_{jt} \\ 1, & R_{it} \neq R_{jt} \end{cases} \tag{6} $$

and for numerical features as:

$$ \mathrm{diff}(t, R_i, R_j) = \frac{|R_{it} - R_{jt}|}{\max_t - \min_t} \tag{7} $$

where $\max_t$ and $\min_t$ are the maximum and minimum values of feature $t$ over the whole sample set. When dealing with $l$-class problems ($l > 2$), given that the class labels of the training set are $C = \{c_1, c_2, \cdots, c_l\}$, the ReliefF algorithm searches for the $k$ nearest neighbors of $R_i$ from the same class (denoted by the set $H$) and also for the $k$ nearest neighbors from each of the different classes (denoted by the set $M(c)$). The weight of feature $t$ is now updated as:

$$ W_t := W_t - \sum_{j=1}^{k} \frac{\mathrm{diff}(t, R_i, H_j)}{m \cdot k} + \sum_{c \notin \mathrm{class}(R_i)} \left[ \frac{p(c)}{1 - p(\mathrm{class}(R_i))} \sum_{j=1}^{k} \mathrm{diff}(t, R_i, M_j(c)) \right] \Big/ (m \cdot k) \tag{8} $$

where $W_t$ is the weight of feature $t$, and $k$ is the number of neighbors, which can safely be set to 10 for most purposes [24]. $p(c)$ is the prior probability of class $c$ (estimated from the training set). The functions $\mathrm{diff}(t, R_i, H_j)$ and $\mathrm{diff}(t, R_i, M_j(c))$ are defined as in formulas (6) and (7). $m$ is the number of iterations (sampled instances), which should not exceed the number of training samples.

2.3. Binary particle swarm optimization (BPSO)

PSO is a global optimization algorithm based on swarm intelligence, proposed by Kennedy and Eberhart in 1995 [30]. PSO searches for the optimal solution in a complex space through the collaboration and competition of individuals. In PSO, the swarm of particles is initialized randomly in the solution space. Each particle is a potential solution represented by its position, velocity and fitness value. When searching for the optimal value, each particle explores the space iteratively. In each iteration, for each particle there is a current personal best value called Pbest and from all particles there is a global best value
called Gbest. The velocity and position of the particles at the next iteration are updated as follows:

$$ v_{id}(t+1) = w\,v_{id}(t) + c_1 r_1 \big(p_{id}(t) - x_{id}(t)\big) + c_2 r_2 \big(p_{gd}(t) - x_{id}(t)\big) \tag{9} $$

$$ x_{id}(t+1) = x_{id}(t) + v_{id}(t+1) \tag{10} $$

where $i$ is the index of a particle, $t$ is the iteration number, $c_1$ and $c_2$ are acceleration constants, and $r_1$ and $r_2$ are random numbers uniformly distributed in the interval [0, 1]. $p_{id}$ and $p_{gd}$ represent Pbest and Gbest respectively. $v_{id}$ and $x_{id}$ are the velocity and position of the $i$th particle in the $d$th dimension of the search space, and $v_{id}$ is limited to the interval $[-v_{\max}, v_{\max}]$. $w$ is the inertia weight that controls the impact of the previous velocity of the particle on its current one. $w$ can be adjusted during the iteration according to the following equation:

$$ w_t = w_{\max} - \frac{w_{\max} - w_{\min}}{iter_{\max}} \times iter \tag{11} $$

where $w_{\min}$ is the minimal inertia weight, $w_{\max}$ is the maximal inertia weight, $iter$ is the current iteration number and $iter_{\max}$ is the maximum iteration number. Kennedy and Eberhart introduced binary particle swarm optimization (BPSO) in 1997 to solve combinatorial optimization problems in discrete binary space [35]. In the BPSO algorithm, the velocity of a particle is still updated by Eq. (9), where $x_{id}$, $p_{id}$ and $p_{gd}$ are restricted to 0 or 1. The velocity $v_{id}$ determines the probability that the corresponding bit takes the value 1. A sigmoid function is introduced to map $v_{id}$ to the interval (0, 1) by the following equation:

$$ s(v_{id}(t+1)) = \frac{1}{1 + e^{-v_{id}(t+1)}} \tag{12} $$

The position of each particle is updated using the following formula:

$$ x_{id}(t+1) = \begin{cases} 1, & \text{if } rand() < s(v_{id}(t+1)) \\ 0, & \text{otherwise} \end{cases} \tag{13} $$

where $rand()$ is a random number drawn from a uniform distribution on [0, 1].

2.4. Support vector machine (SVM)

The essence of SVM is to map the input vectors, either linearly or nonlinearly, into a higher dimensional space by means of a kernel function, and then to construct a hyperplane in the feature space that linearly separates the two classes. SVM can realize accurate classification not only in two-class problems with a large number of features but also in multi-class problems. Given a training dataset of instance-label pairs $(x_i, y_i)$, $i = 1, 2, \cdots, m$, where $x_i \in R^n$ and $y_i \in \{+1, -1\}$, the optimal separating hyperplane $f(x) = w \cdot x + b$ can be obtained by solving the following optimization problem:

$$ \min_{w, b, \xi} \; \frac{1}{2}\, w \cdot w + C \sum_{i=1}^{m} \xi_i \tag{14} $$

$$ \text{subject to: } y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \cdots, m $$

where $C$ is a penalty parameter that determines the degree of punishment and $\xi_i$ is the non-negative slack variable. To solve this optimization problem, Lagrange multipliers $\alpha_i$ are introduced to form its dual optimization model. Once the optimal values $\alpha_i^*$ are found, the optimal hyperplane parameters $w^*$ and $b^*$ can be determined. The hyperplane is described as $w^* \cdot x + b^* = 0$ or $\sum_{i=1}^{m} \alpha_i^* y_i (x \cdot x_i) + b^* = 0$, and the classification function can be written as

$$ f(x) = \mathrm{sign}(w^* \cdot x + b^*) = \mathrm{sign}\left( \sum_{i=1}^{m} \alpha_i^* y_i (x \cdot x_i) + b^* \right) \tag{15} $$

In a nonlinear input space, the input is mapped into a higher dimensional feature space by the transformation $\phi(x)$ associated with a kernel function. After the transformation, the training samples can be linearly separated by applying the linear SVM again. The scalar product $x_i \cdot x_j$ in the input space becomes $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ in the feature space. The linear SVM is trained in this feature space in the following form:

$$ f(x) = \mathrm{sign}\left( \sum_{i=1}^{m} \alpha_i^* y_i \big(\phi(x) \cdot \phi(x_i)\big) + b^* \right) = \mathrm{sign}\left( \sum_{i=1}^{m} \alpha_i^* y_i K(x, x_i) + b^* \right) \tag{16} $$

Commonly used kernel functions include dot product kernels, polynomial kernels, sigmoid kernels and RBF kernels. The RBF kernel is employed in this paper due to its universal application and superior performance in many cases [25,36]. The RBF kernel takes the following forms:

$$ K(x, x') = \exp\left( -\frac{\|x - x'\|^2}{\sigma^2} \right) \quad \text{or} \quad \exp\left( -\gamma \|x - x'\|^2 \right) \tag{17} $$

When RBF is used as the kernel function, the two major parameters of the SVM are $C$ and $\gamma$. To achieve a higher classification accuracy, it is necessary to search for the optimal $C$ and $\gamma$. For multi-class classification, there are three major multi-class SVM strategies: one-against-all, one-against-one and DAGSVM. Hsu and Lin compared these strategies and concluded that one-against-one is suitable for practical use [37].

3. The framework of a hybrid filter and wrapper feature selection scheme

3.1. An overview of the proposed feature selection and fault diagnosis method
A flowchart of the proposed feature selection scheme and fault diagnosis procedure is shown in Fig. 1. The proposed method consists of original feature extraction, feature preselection, feature reselection and fault identification, as described below.
(1) Original feature extraction: The raw vibration signals are divided into training datasets and testing datasets. Each signal is decomposed into several PRCs by the ITD method, and the first M PRCs are selected to represent the character of the original signal. Both time domain (12 statistical parameters) and frequency domain (4 statistical parameters) feature values are calculated for each PRC, so a 16M-dimensional original feature vector is extracted from each raw signal. The feature matrix of the training datasets has size Nd × 16M, where Nd is the number of training samples.
(2) Feature preselection: If all the original high-dimensional features are fed to a classifier, a great deal of computation time may be consumed while the results may still be unsatisfactory. In this step, ReliefF is therefore applied as a Filter to preselect the Np highest-weight features from the original high-dimensional space. These preselected features are the candidates for the next feature reselection step.
(3) Feature reselection: In this step, feature reselection and SVM parameter optimization are conducted synchronously to obtain an optimal SVM model with the highest training accuracy. The outputs of this step are the optimal feature subset and an optimal SVM model.
(4) Fault identification: The indexes of the optimal features are used to pick the corresponding features from the original high-dimensional feature vectors of the testing dataset. These features are then fed to the trained SVM model, which yields the machinery condition and fault identification result for the testing dataset.
Fig. 1. The flowchart of the proposed feature selection and fault diagnosis method.
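The preselection in step (2) ranks features by their ReliefF weights, Eq. (8). The following is a minimal sketch of that update for numerical features only, not the authors' implementation. The function name `relieff_weights`, the Manhattan distance on range-scaled features used for the neighbor search, and sampling with replacement are assumptions made for illustration.

```python
import random

def relieff_weights(X, y, k=10, m=None, seed=0):
    """ReliefF weights for numerical features, following Eqs. (7) and (8).

    X: list of samples (lists of floats); y: list of class labels.
    Hypothetical helper; assumes every class has at least k other samples.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = n if m is None else m
    # Range of each feature, used to scale diff() as in Eq. (7).
    span = []
    for t in range(p):
        col = [row[t] for row in X]
        span.append((max(col) - min(col)) or 1.0)
    prior = {c: y.count(c) / n for c in set(y)}  # p(c) estimated from the data

    def diff(t, a, b):
        return abs(a[t] - b[t]) / span[t]

    def nearest(i, cls):
        # k nearest neighbors of sample i inside class `cls`
        # (assumed metric: sum of scaled per-feature differences).
        cand = [j for j in range(n) if j != i and y[j] == cls]
        cand.sort(key=lambda j: sum(diff(t, X[i], X[j]) for t in range(p)))
        return cand[:k]

    w = [0.0] * p
    for _ in range(m):
        i = rng.randrange(n)  # randomly selected sample R_i
        hits = nearest(i, y[i])
        for t in range(p):
            w[t] -= sum(diff(t, X[i], X[j]) for j in hits) / (m * k)
        for c in prior:
            if c == y[i]:
                continue
            scale = prior[c] / (1.0 - prior[y[i]])
            misses = nearest(i, c)
            for t in range(p):
                w[t] += scale * sum(diff(t, X[i], X[j]) for j in misses) / (m * k)
    return w

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.0], [0.1, 0.5], [0.05, 1.0], [1.0, 0.0], [0.9, 0.5], [0.95, 1.0]]
y = [0, 0, 0, 1, 1, 1]
weights = relieff_weights(X, y, k=2, m=6)
ranking = sorted(range(len(weights)), key=lambda t: weights[t], reverse=True)
print(ranking)
```

Taking the first Np entries of `ranking` would give the candidate indexes passed on to the reselection step.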
3.2. Original feature extraction of raw vibration signals

Condition monitoring and fault diagnosis are important for keeping rotating machinery running normally. Vibration signals reflect the condition of rotating machinery and can be collected through vibration sensors and a DAQ system. However, as the original signal is often nonstationary, deeper analysis is needed to extract more information about the machinery condition. ITD is self-adaptive in decomposing nonstationary signals and is used as the tool for analyzing the raw vibration signals in this paper. Each sample of the original normal signal and of the different types of fault signal is decomposed into several PRCs and a residual signal by the ITD method. The first several PRCs, which contain almost all the valid information of the original signal, are selected as target signals for feature extraction. When a fault occurs, the time and frequency domain statistical parameters change, and those that are sensitive to different types of fault can identify the fault type. For each PRC, 12 time domain and 4 frequency domain commonly used parameters, as presented in Table 1, are selected as features [10]. Thus 16 × M features are calculated for each signal sample, where M is the number of selected PRCs. In this paper, we select the first 5 PRCs for further analysis, so the original high-dimensional feature vector for each signal sample has 80 dimensions.

3.3. Features ranking and preselection using ReliefF

The original high-dimensional feature set from the training dataset contains not only sensitive features but also redundant ones. If the original high-dimensional feature set is used directly as the input of a classifier, it will take a lot of computing resources and decrease the computational efficiency. As an effective Filter method, ReliefF can evaluate the weight of each feature and preselect the Np highest-weight features to form a feature subset for
Table 1. Time domain and frequency domain statistical parameters.

Time domain statistical parameters
Peak: $F_1 = \max |x(n)|$
Peak-peak: $F_2 = \max(x(n)) - \min(x(n))$
Mean: $F_3 = \frac{\sum_{n=1}^{N} x(n)}{N}$
Standard deviation: $F_4 = \sqrt{\frac{\sum_{n=1}^{N} (x(n) - F_3)^2}{N - 1}}$
Root mean square: $F_5 = \sqrt{\frac{\sum_{n=1}^{N} (x(n))^2}{N}}$
Root amplitude: $F_6 = \left( \frac{\sum_{n=1}^{N} \sqrt{|x(n)|}}{N} \right)^2$
Skewness: $F_7 = \frac{\sum_{n=1}^{N} (x(n) - F_3)^3}{(N - 1) F_4^3}$
Kurtosis: $F_8 = \frac{\sum_{n=1}^{N} (x(n) - F_3)^4}{(N - 1) F_4^4}$
Crest factor: $F_9 = F_1 / F_5$
Clearance factor: $F_{10} = F_1 / F_6$
Impulse factor: $F_{11} = \frac{F_1}{\left( \sum_{n=1}^{N} |x(n)| \right) / N}$
Shape factor: $F_{12} = \frac{F_5}{\left( \sum_{n=1}^{N} |x(n)| \right) / N}$
where x(n) is a signal series for n = 1, 2, ..., N, and N is the number of data points.

Frequency domain statistical parameters
Mean frequency: $F_{13} = \frac{\sum_{k=1}^{K} s(k)}{K}$
Root mean square frequency: $F_{14} = \sqrt{\frac{\sum_{k=1}^{K} f_k^2 s(k)}{\sum_{k=1}^{K} s(k)}}$
Frequency center: $F_{15} = \frac{\sum_{k=1}^{K} f_k s(k)}{\sum_{k=1}^{K} s(k)}$
Standard deviation frequency: $F_{16} = \sqrt{\frac{\sum_{k=1}^{K} (f_k - F_{15})^2 s(k)}{\sum_{k=1}^{K} s(k)}}$
where s(k) is a spectrum for k = 1, 2, ..., K, K is the number of spectrum lines, and fk is the frequency value of the kth spectrum line.
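As an illustration of how the time domain parameters of Table 1 could be computed for one PRC, the sketch below assumes the conventional definitions: crest factor as peak over RMS, impulse and shape factors as peak and RMS over the mean absolute value, and skewness and kurtosis normalized by the standard deviation. The function name and dictionary keys are hypothetical, and a constant signal (zero standard deviation) is not handled.

```python
import math

def time_domain_features(x):
    """Twelve time domain statistics of a signal x (list of floats).

    Hypothetical helper; assumes len(x) > 1 and a non-constant signal.
    """
    n = len(x)
    peak = max(abs(v) for v in x)                                    # F1
    peak_peak = max(x) - min(x)                                      # F2
    mean = sum(x) / n                                                # F3
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))       # F4
    rms = math.sqrt(sum(v * v for v in x) / n)                       # F5
    root_amp = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2          # F6
    mean_abs = sum(abs(v) for v in x) / n
    return {
        "peak": peak,
        "peak_peak": peak_peak,
        "mean": mean,
        "std": std,
        "rms": rms,
        "root_amplitude": root_amp,
        "skewness": sum((v - mean) ** 3 for v in x) / ((n - 1) * std ** 3),  # F7
        "kurtosis": sum((v - mean) ** 4 for v in x) / ((n - 1) * std ** 4),  # F8
        "crest_factor": peak / rms,                                  # F9
        "clearance_factor": peak / root_amp,                         # F10
        "impulse_factor": peak / mean_abs,                           # F11
        "shape_factor": rms / mean_abs,                              # F12
    }

# Usage on a short toy signal standing in for one PRC.
features = time_domain_features([0.2, -0.5, 1.3, -0.7, 0.1, 0.4])
print(sorted(features))
```

Concatenating such a dictionary, together with the four frequency domain values, over the first M PRCs would give the 16M-dimensional vector described in the text.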
Fig. 2. Particle encoding in BPSO.
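The particle layout of Fig. 2 and the bit-to-value mapping of Eq. (19) can be sketched as follows, together with one BPSO bit update built from Eqs. (9), (12) and (13). This is an illustrative sketch under stated assumptions: the bit order (feature bits first, then the C bits, then the γ bits), the search ranges in the usage lines and all function names are hypothetical, not taken from the paper.

```python
import math
import random

def bits_to_value(bits, lo, hi):
    """Eq. (19): map a bit string to a value in [lo, hi].

    bits[0] is treated as bit(1), i.e. the least significant bit.
    """
    n = len(bits)
    integer = sum(b << i for i, b in enumerate(bits))
    return integer / (2 ** n - 1) * (hi - lo) + lo

def decode_particle(bits, n_p, n_c, n_g, c_range, g_range):
    """Split a particle into selected feature indexes, C and gamma."""
    if len(bits) != n_p + n_c + n_g:
        raise ValueError("particle length must equal n_p + n_c + n_g")
    selected = [i for i, b in enumerate(bits[:n_p]) if b == 1]
    c = bits_to_value(bits[n_p:n_p + n_c], *c_range)
    gamma = bits_to_value(bits[n_p + n_c:], *g_range)
    return selected, c, gamma

def update_particle(bits, vel, pbest, gbest, w, c1, c2, v_max, rng):
    """One BPSO step: velocity by Eq. (9), sigmoid by Eq. (12), bits by Eq. (13)."""
    new_bits, new_vel = [], []
    for x, v, p, g in zip(bits, vel, pbest, gbest):
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-v_max, min(v_max, v))      # keep velocity in [-v_max, v_max]
        s = 1.0 / (1.0 + math.exp(-v))      # probability that the bit becomes 1
        new_vel.append(v)
        new_bits.append(1 if rng.random() < s else 0)
    return new_bits, new_vel

# Usage with assumed sizes: 3 feature bits, 2 bits for C, 2 bits for gamma.
particle = [1, 0, 1, 1, 1, 0, 0]
print(decode_particle(particle, 3, 2, 2, (1.0, 100.0), (0.01, 1.0)))
rng = random.Random(0)
print(update_particle(particle, [0.0] * 7, particle, particle, 0.9, 2.0, 2.0, 6.0, rng))
```

Clamping the velocity before the sigmoid also keeps `math.exp` away from overflow for large negative velocities.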
the next Wrapper method. Thus, the preliminary optimal feature subset combined in a low dimensionality can be extracted from the original high dimensional feature set. Before the preselection procedure, feature values are normalized to [0, 1] range since normalization can eliminate the influence caused by different dimensionalities and orders of magnitude. The normalization equation is as follows
x =
x − mina maxa − mina
(18)
where x is the normalized value. x is the original feature value. min a , max a are minimum value and maximum value of feature a respectively. 3.4. Particle encoding in feature reselection The dimensionality of the preselected feature subset is lower than the original one. However, there may be also some redundancy features in it. In order to obtain an optimal feature subset that can be used to identify fault types with high accuracy by using a classifier, we can employ a Wrapper method. In the following process, BPSO-SVM is applied to search for the optimal feature subset and optimal SVM parameters synchronously. RBF kernel function is selected and two parameters C and γ will be optimized along with the optimal feature subset searching. During the optimization, each particle is encoded to represent feature selection state and the values of parameter C and γ as shown in Fig. 2. For each candidate from the previous preselection process, its selection state is represented by one bit of the coded sequence in every BPSO particle. Consequently, Np bits are used to represent feature selection state in a particle. The bit value of “1” means the feature is selected while the value of “0” represents the feature is discarded. For parameter C and γ , Nc and Ng bits are used to represent their values respectively depending on their ranges and precision for optimization. The decimal value of a parameter can be calculated from the value of each bit through the following equation:
$$x_d = \frac{\sum_{i=1}^{N} \mathrm{bit}(i) \cdot 2^{i-1}}{2^{N} - 1}\,(x_{d\max} - x_{d\min}) + x_{d\min} \tag{19}$$
Fig. 3. The flow chart of synchronous feature reselection and SVM parameters optimization with BPSO.
where xd is the decimal value of C or γ, N is the number of bits (Nc for C, Ng for γ), i is the index of the ith bit among these Nc or Ng bits, bit(i) is the value (1 or 0) of the ith bit, and [xdmin, xdmax] is the search interval of C or γ. As illustrated in Fig. 2, each particle consists of Np + Nc + Ng bits that represent the selection state of each feature and the values of parameters C and γ.

3.5. Synchronous feature reselection and SVM parameters optimization with BPSO

Fig. 3 illustrates the procedure of synchronous feature reselection and SVM parameters optimization with BPSO. After the
Fig. 4. The experiment setup of bearing fault diagnosis in Case 1.
BPSO parameter initialization, particle encoding and initialization, searching cycle for the optimal feature subset and best SVM parameters begins. When the BPSO iteration starts, for each particle, a part of bits can represent whether a feature from the preselection feature subset will be reselected and the other bits represent the values of SVM parameters. Then the fitness value of each particle is calculated as function (20) and the Gbest value which indicates the current global best result will be obtained.
$$fit_{val} = A_{tr} - \alpha N_s \tag{20}$$
where fitval is the fitness value, Atr is the training accuracy, Ns is the number of selected features, and α is a weight factor that is given a small value, for instance 0.01. If two feature subsets have the same accuracy, the subset with fewer selected features will have a larger fitness value. When the reselected features of the training datasets are used to train the SVM model, k-fold cross-validation is employed to evaluate the average classification accuracy. If the maximum number of iterations has not been reached, the search continues and the bit values of all particles are updated. Ultimately, the optimal reselected features, together with their indexes in the original high dimensional feature set, and the optimal SVM parameters with the highest training diagnostic accuracy are obtained at the same time. The optimal feature indexes guide the selection of the corresponding features from the testing dataset. The machinery condition and fault diagnosis result can then be obtained by applying the trained SVM.

4. Application results and analyses on two experiment cases

4.1. Experimental setup and vibration data acquisition

To evaluate the effectiveness of the proposed method in fault diagnosis with actually measured vibration signals, results and analyses of applying the proposed method to two experimental cases are presented in this section.

Case 1: Bearing fault diagnosis

To verify the effectiveness of the proposed method in bearing fault diagnosis, measured bearing vibration data are analyzed. The bearing test data are taken from the Bearing Data Center of Case Western Reserve University [38]. As shown in Fig. 4, the experiment stand consists of a 1.5 kW induction driving motor, a torque transducer and a load motor. Single point faults were introduced to the test deep groove ball bearing using electro-discharge machining.
To simulate different severity levels, the fault diameters were set to 0.178 mm for the slight fault and 0.533 mm for the serious fault. The vibration data were collected with an accelerometer mounted at the drive end. The sampling frequency was 12 kHz and the motor speed was 1797 rpm, with no motor load applied.
Fig. 5. The experiment setup of rotor system fault diagnosis in Case 2.
The experimental vibration signals cover 7 conditions: normal (Nor), slight inner race fault (IFSl), slight outer race fault (OFSl), slight ball fault (BFSl), serious inner race fault (IFSe), serious outer race fault (OFSe) and serious ball fault (BFSe). The experimental data analysis is therefore a 7-class recognition problem. The continuous sampling data of each class are split into 50 samples with a segment length of 2400 points; for each class, 30 samples are selected randomly as training data and the other 20 samples are left as testing data. In total, there are 350 samples for the 7 classes, of which 210 are training samples and the remaining 140 are testing samples. The training datasets are employed to search for the optimal features, with their indexes in the original high dimensional feature set, and to train the classifier model. These results can then be applied directly to the testing datasets to validate the performance of the feature selection method as well as that of the proposed fault diagnosis method.

Case 2: Rotor system fault diagnosis

In this case, the experiment was conducted on a rotor test rig, as presented in Fig. 5. The rotor system is driven by a motor whose power is transferred to the transmission shaft and rotating plates through the driving belt and the transmission shaft support. Threaded holes around the circumference of the rotating plates allow mass blocks to be installed, so that a rotor unbalance fault can be simulated. Moreover, to simulate bearing faults, the normal bearing installed in the bearing pedestal (for test bearing supporting) can be replaced by bearings with inner race faults or outer race faults. We tested 5 conditions on this rotor system: normal (Nor), rotor mass unbalance (Unb), bearing inner race fault (InF), bearing outer race fault (OutF), and simultaneous rotor mass unbalance and outer race fault (OutFUnb).
The experiment was carried out at a rotating speed of 1492 rpm. The vibration sensor was mounted on the bearing pedestal (for normal bearing supporting) to measure the vertical vibration at a sampling frequency of 12.8 kHz. For each condition, 200 samples of vibration signals were acquired, each with a sampling length of 0.2 s. For each class, 120 samples are selected randomly as training data and the other 80 samples are left as testing data. In total, there are 1000 samples for the 5 classes, of which 600 are training samples and the remaining 400 are testing samples.
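The sample bookkeeping for Case 1 can be sketched as follows. This is only an illustration under stated assumptions: the segments are taken as non-overlapping, the signals are zero-filled placeholders rather than the measured data, and the helper names `segment` and `split_per_class` are hypothetical and do not come from the paper.

```python
import random

def segment(signal, seg_len):
    """Cut a continuous record into non-overlapping segments (assumption)."""
    n = len(signal) // seg_len
    return [signal[k * seg_len:(k + 1) * seg_len] for k in range(n)]

def split_per_class(samples_by_class, n_train, seed=0):
    """Randomly assign n_train samples of every class to training, the rest to testing."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        idx = list(range(len(samples)))
        rng.shuffle(idx)
        train += [(samples[i], label) for i in idx[:n_train]]
        test += [(samples[i], label) for i in idx[n_train:]]
    return train, test

# Case 1 layout: 7 classes, 50 segments of 2400 points per class, 30 for training.
classes = ["Nor", "IFSl", "OFSl", "BFSl", "IFSe", "OFSe", "BFSe"]
records = {c: [0.0] * (50 * 2400) for c in classes}  # placeholder signals, not real data
samples = {c: segment(sig, 2400) for c, sig in records.items()}
train, test = split_per_class(samples, n_train=30)
print(len(train), len(test))
```

Fixing the random seed keeps the split reproducible; the same helper with `n_train=120` would reproduce the 600/400 split described for Case 2.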
Fig. 6. Waveforms in time domain of 7 vibration signals from Case 1 in different conditions: (a) normal, (b) slight inner race fault, (c) slight outer race fault, (d) slight ball fault, (e) serious inner race fault, (f) serious outer race fault, (g) serious ball fault.
Fig. 7. Waveforms in time domain of 5 vibration signals from Case 2 in different conditions: (a) normal, (b) rotor mass unbalance, (c) bearing inner race fault, (d) bearing outer race fault, (e) rotor mass unbalance and simultaneous outer race fault.
4.2. Feature selection and fault diagnosis results utilizing the proposed method for two cases

Fig. 6 illustrates typical vibration signal waveforms of the normal condition and the 6 fault types of the bearing in Case 1, and Fig. 7 shows typical waveforms of the normal condition and the 4 fault types of the rotor system in Case 2. Figs. 6 and 7 show that the shock characteristics and amplitudes of the vibration signals differ from one condition to another. Nevertheless, it is challenging to accurately distinguish a large number of vibration signals in different conditions based only on the time domain waveforms [39]. Therefore, it is necessary to extract
features in a deeper level and introduce an intelligent fault diagnosis method which can recognize the faults automatically.

With the assistance of the ITD method, the original vibration signals are decomposed into a series of PRCs with decreasing frequencies. Fig. 8 illustrates the first 5 PRCs of the normal bearing vibration signal and the slight inner race fault vibration signal from Case 1, and Fig. 9 shows the first 5 PRCs of the normal signal and of the signal with simultaneous rotor mass unbalance and outer race fault from Case 2. On this basis, time domain and frequency domain features can be calculated for each PRC, so that a high dimensional feature set is obtained for each sample. The first 5 PRCs are analyzed, and 12 time domain statistical parameters and 4 frequency domain statistical parameters are calculated for each PRC. Therefore, for each sample, 5 × (12 + 4) = 80 features compose the original high dimensional feature vector. Consequently, the original high dimensional feature set formed by all training samples from the different conditions is 210 × 80 for Case 1 (7 classes × 30 training samples) and 600 × 80 for Case 2 (5 classes × 120 training samples). Feeding all of these features into a classifier would cost a large amount of training time and lower the fault diagnosis accuracy [3]. Thus, a feature selection method is needed to select the optimal features with the highest fault diagnosis accuracy.

The Filter method ReliefF is employed on the training datasets to preselect the optimal feature candidates. The weight scores provide useful information for selecting a discriminating feature subset. Fig. 10 illustrates the weight scores of the 80 features for Case 1. It indicates that as many as 19 weight scores are larger than 0, which means that these features are relevant to classification. Considering the computational complexity of the next feature reselection stage, we select the 10 features with the highest weight scores as the candidates for feature reselection in this case. Fig. 11 illustrates the weight scores of the 80 features for Case 2. Here, only 7 weight scores are greater than 0. To retain more feature information, we select all 7 of these features as feature reselection candidates in this case. The results of this feature preselection stage are listed in the second column of Table 2. The 10 and 7 features with the highest weight scores for Case 1 and Case 2, respectively, form the two preselected feature subsets. The original 210 × 80 and 600 × 80 feature sets are thereby reduced to 210 × 10 and 600 × 7.

The subsets formed by the preselected features cannot guarantee high recognition accuracy, which is also influenced by the performance of the classifier. Therefore, feature reselection to search for the optimal feature subset and classifier parameter optimization need to be conducted at the same time. The BPSO-SVM feature reselection and fault diagnosis procedure has been described above. The maximum number of BPSO iterations is set to 100 and the particle number is 20. For Case 1 and Case 2, based on the numbers of preselected features, 10 and 7 bits, respectively, are used in the particle encoding step to represent the feature states in the reselection stage. As discussed in the former section, the RBF kernel function is used in SVM classification, and parameter C is limited to the interval [0.001, 100], encoded by 8 bits for each particle. Meanwhile, parameter γ is limited to the interval [0.1, 500], encoded by 16 bits for each particle. Consequently, the total number of bits for each encoded particle is 34 for Case 1 and 31 for Case 2. When the SVM is performed in the multiclass classification model, 10-fold cross validation is applied in order to achieve better estimates of the accuracy rates of the classifiers [40]. The calculation is carried out on a PC with an Intel Core i3 3.4 GHz CPU and 4 GB RAM, using the scientific computation software MATLAB 2014a on Windows 7.
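As a rough illustration of how one 34-bit Case 1 particle could be decoded with Eq. (19) and scored with Eq. (20), consider the sketch below. Several details are assumptions rather than statements from the paper: the field order inside the particle (feature bits, then C, then γ), the training accuracy expressed as a fraction in [0, 1], and the hypothetical helper names. The accuracy itself would come from cross-validated SVM training, which is not shown here.

```python
def bits_to_value(bits, lo, hi):
    """Eq. (19): map a bit list onto [lo, hi]; bits[0] is bit 1 with weight 2**0."""
    n = len(bits)
    integer = sum(b * 2 ** i for i, b in enumerate(bits))
    return integer / (2 ** n - 1) * (hi - lo) + lo

def decode_particle(particle, n_feat, n_c=8, n_g=16,
                    c_range=(0.001, 100.0), g_range=(0.1, 500.0)):
    """Split a particle into the feature mask and the decoded C and gamma.

    The field order (features, C, gamma) is an assumption for illustration.
    """
    assert len(particle) == n_feat + n_c + n_g
    mask = particle[:n_feat]
    c = bits_to_value(particle[n_feat:n_feat + n_c], *c_range)
    gamma = bits_to_value(particle[n_feat + n_c:], *g_range)
    return mask, c, gamma

def fitness(train_accuracy, mask, alpha=0.01):
    """Eq. (20): training accuracy minus a small penalty per selected feature."""
    return train_accuracy - alpha * sum(mask)

# Case 1 layout: 10 feature bits + 8 bits for C + 16 bits for gamma = 34 bits.
particle = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0] + [1] * 8 + [0] * 16
mask, c, gamma = decode_particle(particle, n_feat=10)
print(sum(mask), c, gamma)
print(fitness(1.0, mask))
```

With all C bits set, the decoded value sits at the upper end of its interval, and with all γ bits cleared it sits at the lower end, which is a quick sanity check of the mapping. The penalty term makes the smaller of two equally accurate subsets score higher.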
And in our algorithm, the SVM classifier is implemented by a toolbox named LIBSVM [41]. The preselected features of training datasets are conducted as inputs of SVM. The performance of a feature subset and SVM parameters for fault diagnosis are evaluated as Formula (20) in the
Fig. 8. ITD results of vibration signals from Case 1 in condition of (a) normal and (b) slight inner race fault.
Fig. 9. ITD results of vibration signals from Case 2 in condition of (a) normal and (b) rotor mass unbalance and simultaneous outer race fault.
Table 2 The results of feature selection and fault diagnosis with the proposed method.

| Case | Preselected features' indexes | Reselected features' indexes | Optimal feature number | Optimal SVM parameters | Average training accuracy | Average testing accuracy |
|---|---|---|---|---|---|---|
| Case 1 | F26, F21, F71, F76, F78, F42, F61, F73, F27, F77 | F76, F78, F73, F27 | 4 | C = 94.90, γ = 101.69 | 100% | 98.57% |
| Case 2 | F26, F16, F21, F66, F77, F72, F61 | F26, F66, F72 | 3 | C = 79.22, γ = 116.97 | 100% | 99.75% |
Table 3 Case 1 fault diagnosis results of testing samples in different conditions.

| Bearing condition | Nor | IFSl | OFSl | BFSl | IFSe | OFSe | BFSe | Testing accuracy |
|---|---|---|---|---|---|---|---|---|
| Nor | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 100% |
| IFSl | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 100% |
| OFSl | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 100% |
| BFSl | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 100% |
| IFSe | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 100% |
| OFSe | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 100% |
| BFSe | 0 | 0 | 0 | 2 | 0 | 0 | 18 | 90% |
| Average | | | | | | | | 98.57% |
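The per-condition and average accuracies in Table 3 follow directly from the confusion counts. The short sketch below recomputes them, reading each row as the true condition and each column as the assigned condition; the function names are hypothetical.

```python
labels = ["Nor", "IFSl", "OFSl", "BFSl", "IFSe", "OFSe", "BFSe"]

# Rows: true condition; columns: predicted condition (counts from Table 3).
confusion = [
    [20, 0, 0, 0, 0, 0, 0],
    [0, 20, 0, 0, 0, 0, 0],
    [0, 0, 20, 0, 0, 0, 0],
    [0, 0, 0, 20, 0, 0, 0],
    [0, 0, 0, 0, 20, 0, 0],
    [0, 0, 0, 0, 0, 20, 0],
    [0, 0, 0, 2, 0, 0, 18],
]

def per_class_accuracy(matrix):
    """Fraction of each true class that lands on the diagonal."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

def overall_accuracy(matrix):
    """Correctly classified samples divided by all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

for name, acc in zip(labels, per_class_accuracy(confusion)):
    print(f"{name}: {100 * acc:.2f}%")
print(f"Average: {100 * overall_accuracy(confusion):.2f}%")
```

Because every class has the same number of testing samples, the overall accuracy (138 of 140 samples) equals the mean of the per-class accuracies.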
iterative process of optimization. The feature reselection and the SVM model parameter optimization progress synchronously, as described in Fig. 3. Ultimately, the optimal feature subset and the SVM model are obtained. As listed in column 3 of Table 2, the optimal features are F76, F78, F73 and F27 for Case 1 and F26, F66 and F72 for Case 2, so the optimal feature subsets contain 4 features for Case 1 and 3 for Case 2. It is important to note that the feature preselection with ReliefF and the feature reselection with BPSO are both conducted on the training datasets only. We can then directly select the optimal features from the testing datasets using the indexes listed in Table 2 and identify the fault type with the trained SVM model. Thus, the performance of the optimal feature subset and the optimal SVM model can be validated on the testing dataset. The fault diagnosis results on the testing dataset are provided in Table 2 as well. The average testing classification accuracy reaches 98.57% for Case 1 and 99.75% for Case 2. The process is repeated 10 times, and we report the parameters C and γ with the best diagnostic accuracy on the testing dataset as the results of the scheme, as shown in Table 2. For visual inspection, scatter diagrams of the testing dataset in the reselected feature space are plotted in Fig. 12. Meanwhile, the testing dataset classification results of each type are listed in Table 3 for Case 1 and Table 4 for Case 2.

Table 3 suggests that for Case 1 the recognition accuracy reaches 100% in each condition except BFSe. Correspondingly, Fig. 12(a) indicates that the testing samples in Nor, IFSl, OFSl, IFSe and OFSe can be distinctly separated from each other, while there is some overlap between BFSl and BFSe in the space of the first three optimal features. Besides, Fig. 12(a) also indicates that the Nor samples lie far from those of the fault conditions, which contributes to the accurate distinction between normal and fault conditions. Table 4 shows that Nor, Unb, InF and OutFUnb are classified with 100% accuracy and that one OutF sample is mistakenly classified as OutFUnb. Fig. 12(b) shows that the between-class distance of OutF and OutFUnb is small, which makes it difficult to discriminate them accurately. From another point of view, this demonstrates that in this experiment the influence of the rotor mass unbalance fault on the rotor system is not as obvious as that of the other fault types. Nevertheless, the final average testing accuracy remains high.

Table 4 Case 2 fault diagnosis results of testing samples in different conditions.

| Rotor condition | Nor | Unb | InF | OutF | OutFUnb | Testing accuracy |
|---|---|---|---|---|---|---|
| Nor | 80 | 0 | 0 | 0 | 0 | 100% |
| Unb | 0 | 80 | 0 | 0 | 0 | 100% |
| InF | 0 | 0 | 80 | 0 | 0 | 100% |
| OutF | 0 | 0 | 0 | 79 | 1 | 98.75% |
| OutFUnb | 0 | 0 | 0 | 0 | 80 | 100% |
| Average | | | | | | 99.75% |

Fig. 10. ReliefF weight scores of all features for Case 1.

Fig. 11. ReliefF weight scores of all features for Case 2.

4.3. Comparison with other feature selection and fault diagnosis methods

To compare the performance of feature selection and fault diagnosis with other frameworks, we present 5 additional methods, described in Table 5, some of which have been studied by other investigators as listed. In Method 1, ReliefF is used to select the features with higher weight scores. These features are taken as inputs of the SVM without a feature reselection stage, and BPSO only optimizes the SVM parameters. This method is included to evaluate the effect of the feature reselection process in the proposed method. In Method 2, all features of the original high dimensional set and the SVM parameters are encoded into BPSO particle bits. Afterward, the feature selection and SVM parameter optimization are performed simultaneously.
Compared with the proposed method, feature preselection with ReliefF is not used in Method 2 [42]. Thus, the contribution of the feature preselection process can be evaluated by comparing Method 2 with the proposed method. In Method 3, feature preselection with ReliefF and reselection with BPSO are both conducted. However, the SVM parameters are set to the default values in LIBSVM. Hence, unlike in the proposed method, SVM parameter optimization is not conducted. Besides, since the classifier and the optimization algorithm both influence the results of feature reselection and SVM parameter optimization [43], Method 4 and Method 5 use a different optimization algorithm and a different classifier, respectively, as substitutions in the BPSO + SVM fault diagnosis
Fig. 12. Distributions of testing samples in the first three optimal features dimensionality under different conditions for Case 1 (a) and Case 2 (b).

Table 5 The description of the proposed method and other schemes for comparison.

| Feature selection and fault diagnosis scheme | Method description |
|---|---|
| The proposed method | Preselect features using ReliefF + Reselect features and optimize SVM parameters synchronously using BPSO + Diagnose fault types using the optimal feature subsets and the trained SVM model. |
| Method 1 | Select several features with higher weight scores using ReliefF + Optimize SVM parameters and diagnose fault types using all of the features selected by ReliefF. |
| Method 2 [42] | Select features directly from the original high dimensional feature sets with BPSO and optimize the SVM parameters synchronously + Diagnose faults with the selected features and the SVM model optimized by BPSO. |
| Method 3 | Preselect the features with ReliefF + Reselect features from the preselected features using BPSO + Diagnose fault types with the reselected features and an SVM model with default parameters. |
| Method 4 [36,44] | ReliefF + SVM + GA: using GA as a substitute optimization method in the proposed method. |
| Method 5 [40] | ReliefF + BPNN + BPSO: using BPNN as a substitute classification method in the proposed method. |
Table 6 Performance comparison with other fault diagnosis schemes for Case 1.

| Feature selection and fault diagnosis scheme | Optimal parameters | Number of optimal features | Training time (s) | Average training accuracy | Average testing accuracy |
|---|---|---|---|---|---|
| The proposed method | C = 94.90, γ = 101.69 | 4 | 12.94 | 100% | 98.57% |
| Method 1 | C = 46.28, γ = 157.03 | 10 | 18.04 | 100% | 92.14% |
| Method 2 | C = 10.59, γ = 67.52 | 33 | 36.11 | 100% | 45.71% |
| Method 3 | C = 1, γ = 3 | 4 | 9.01 | 96.67% | 96.43% |
| Method 4 | C = 36.08, γ = 385.13 | 2 | 9.03 | 100% | 95.71% |
| Method 5 | Hidden layer node number = 21 | 3 | 1787 | 99.52% | 98.57% |

architecture. BPSO is replaced by a genetic algorithm (GA) as the optimization algorithm in Method 4 [36,44], and SVM is replaced by a BP neural network (BPNN) as the classifier in Method 5 [40]. ReliefF feature preselection is also used in Method 4 and Method 5, and the numbers of preselected features are 10 and 7 for Case 1 and Case 2, as discussed above. Each individual of the GA in Method 4 is encoded in the same way as the particles in BPSO. The number of individuals is set to 30 and the maximum number of generations is 100. Feature reselection and SVM parameter optimization are conducted by the GA simultaneously. In Method 5, a commonly used 3-layer neural network is constructed. The number of hidden layer nodes influences the classification results, so it is treated as an optimization parameter, like the parameters of the SVM. For each particle in BPSO, 6 bits are used to encode the number of hidden layer nodes, whose range is limited to [1, 63]. The number of iterations is set to 50 and the particle number is 20.

The performances of the 5 methods for comparison and of the proposed method are presented in Table 6 for Case 1 and Table 7 for Case 2. In Method 1, only ReliefF is used for feature selection, so the number of features used to train the SVM is 10 for Case 1 and 7 for Case 2, the same as in the feature preselection stage of the proposed method. Compared with the proposed method, which reduces the number of training features as far as possible, Method 1 consumes more computing time. Moreover, the average testing accuracy of Method 1 is inferior to that of the proposed method, as indicated in Tables 6 and 7. Because the proposed method does not carry all the preselected features through the iterations of SVM parameter optimization, it costs less computation time. Besides, in Method 1, the combination of the features with the highest weight scores does not necessarily result in a good classification model. So, the superiority of simultaneous feature reselection and SVM parameter optimization in the proposed method is revealed by comparing this pair of methods.
Table 7 Performance comparison with other fault diagnosis schemes for Case 2.

| Feature selection and fault diagnosis scheme | Optimal parameters | Number of optimal features | Training time (s) | Average training accuracy | Average testing accuracy |
|---|---|---|---|---|---|
| The proposed method | C = 79.22, γ = 116.97 | 3 | 21.90 | 100% | 99.75% |
| Method 1 | C = 74.90, γ = 216.20 | 7 | 44.34 | 100% | 99.50% |
| Method 2 | C = 95.29, γ = 302.54 | 29 | 172.11 | 100% | 28.75% |
| Method 3 | C = 1, γ = 3 | 4 | 26.83 | 98.83% | 99.25% |
| Method 4 | C = 56.86, γ = 458.85 | 2 | 23.90 | 100% | 99.00% |
| Method 5 | Hidden layer node number = 10 | 3 | 3268 | 100% | 95.50% |

Compared with the proposed method, Method 2 searches over all features of the original high dimensional set. This leads to a far more complicated SVM model. As a result, Method 2 consumes longer calculation time than the proposed method. Even worse, the trained SVM model with as many as 31 (for Case 1) and 29 (for Case 2) features achieves only a very low testing accuracy although the training accuracy is 100%. This means that the generalization capability of this method is extremely poor. Comparing the results of the proposed method and Method 2 shows the necessity and effectiveness of the feature preselection stage.

In Method 3, only the feature states in the feature reselection stage are encoded into each particle, while the SVM parameters are set to the default values in LIBSVM. While the feature reselection is in progress, the algorithm must fit the selected features to the default SVM parameters. Without simultaneous feature reselection and SVM parameter optimization, the trained model may not be the optimal one. Therefore, when the trained model is applied to the testing datasets, it cannot reach a testing accuracy as high as that of the proposed method. The comparison of this pair of methods shows that selecting the optimal model parameters is also important for the diagnosis accuracy. Not only the feature combination but also the SVM model parameters influence the fault diagnosis performance. A well-performing fault recognition model results from the dynamic interplay of simultaneous feature reselection and SVM parameter optimization.

Both Tables 6 and 7 show that Method 4 reselects 2 features in each of the two cases. However, the average testing accuracies of Method 4 are slightly lower than those of the proposed method. This is because only two features are not informative enough to distinguish all the samples.

In Method 5, the BPSO optimization result for the number of hidden layer nodes is 21 for Case 1 and 10 for Case 2. Meanwhile, the number of selected features is 4 for Case 1 and 3 for Case 2. Although high accuracies can also be achieved with Method 5, its training time is far longer than that of the proposed method. This is because training the neural network is time-consuming, especially when the number of hidden layer nodes is large during the optimization process.

Comprehensive analyses demonstrate that the proposed method is effective in feature selection and fault diagnosis for both Case 1 and Case 2. The proposed method achieves a lower calculation time and a lower feature dimensionality while maintaining a high accuracy. Consequently, we can conclude that the proposed method offers a better overall combination of training time and testing accuracy than the other 5 compared methods.

5. Conclusions

This paper has discussed an effective model which can conduct feature selection and fault diagnosis synchronously for the purpose of achieving the optimal diagnostic accuracy. The results of
two experiments verify the superiority of the proposed method. The study can be concluded as follows:

(1) To acquire comprehensive characteristics of the running condition of machinery, we suggest a multi-domain feature extraction procedure. In the original feature extraction process of the proposed approach, ITD is employed to decompose the raw vibration signal into several PRCs, and the first 5 PRCs are used to extract the time and frequency domain features. Consequently, a high dimensional feature set, which contains both representative characteristics and redundant ones, is obtained for each vibration signal sample. Further processing of these high dimensional features is necessary to adaptively select the representative ones.

(2) The original feature dimensionality is high, which may consume too much calculation time and result in inferior recognition accuracy. ReliefF is applied to preselect the features with higher weight scores. These preselected features are adopted as candidates for the next feature reselection stage. In this stage, the bit values of the BPSO particles are encoded to represent the SVM classifier's parameters and the reselection states of the features. The SVM parameter optimization and the feature reselection are conducted synchronously. As the feature reselection and fault diagnosis processes are conducted at the same time, the optimal diagnostic accuracy can be achieved together with the corresponding optimal feature subset.

(3) The fault diagnosis results for the experimental bearing and rotor system demonstrate that the proposed approach can accurately distinguish different fault types. The comparative study shows that the proposed method achieves a lower calculation time and a lower feature dimensionality while maintaining a high accuracy. It is therefore reasonable to expect that this approach is suitable and efficient for feature selection and fault diagnosis of rotating machinery.
In addition, in order to evaluate the adaptive capacity of the proposed method to the changeable industrial field, further study should be performed on field test signals, which may be influenced by variations in the working load and speed of the complicated machinery.

Acknowledgments

This research is supported by the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (Grant No. 2014BAF08B05, 2015BAF06B05). The authors would like to thank Professor Ming Chen and his graduate students for experimental research support. The authors appreciate the anonymous reviewers and the editor for their valuable comments and suggestions for improving this paper.

References

[1] Y. Lei, J. Lin, Z. He, M.J. Zuo, A review on empirical mode decomposition in fault diagnosis of rotating machinery, Mech. Syst. Signal Process. 35 (1-2) (2013) 108–126.
[2] A. Rai, S.H. Upadhyay, A review on signal processing techniques utilized in the fault diagnosis of rolling element bearings, Tribol. Int. 96 (2016) 289–306. [3] B. Li, P. Zhang, H. Tian, S. Mi, A new feature extraction and selection scheme for hybrid fault diagnosis of gearbox, Expert Syst. Appl. 38 (8) (2011) 10000–10009. [4] R. Islam, S.A. Khan, J. Kim, Discriminant feature distribution analysis-based hybrid feature selection for online bearing fault diagnosis in induction motors, J. Sens. 2016 (2016) 1–16. [5] X. An, D. Jiang, Bearing fault diagnosis of wind turbine based on intrinsic time-scale decomposition frequency spectrum, Proc. Inst. Mech. Eng. Part O: J. Risk Reliab. 228 (6) (2014) 558–566. [6] L. Deng, R. Zhao, Fault feature extraction of a rotor system based on local mean decomposition and Teager energy kurtosis, J. Mech. Sci. Technol. 28 (4) (2014) 1161–1169. [7] M. Kang, M.R. Islam, J. Kim, J. Kim, A hybrid feature selection scheme for reducing diagnostic performance deterioration caused by outliers in data-driven diagnostics, IEEE Trans. Ind. Electron. 63 (5) (2016) 3299–3310. [8] S. Yin, X. Li, H. Gao, O. Kaynak, Data-based techniques focused on modern industry: an overview, IEEE Trans. Ind. Electron. 62 (1) (2015) 657–667. [9] S. Yin, G. Wang, H. Gao, Data-driven process monitoring based on modified orthogonal projections to latent structures, IEEE Trans. Control Syst. Technol. 24 (4) (2016) 1480–1487. [10] Y. Lei, Z. He, Y. Zi, Q. Hu, Fault diagnosis of rotating machinery based on multiple ANFIS combination with GAs, Mech. Syst. Signal Process. 21 (5) (2007) 2280–2294. [11] Z. Feng, M. Liang, F. Chu, Recent advances in time-frequency analysis methods for machinery fault diagnosis: A review with application examples, Mech. Syst. Signal Process. 38 (1) (2013) 165–205. [12] M. Han, J.
Pan, A fault diagnosis method combined with LMD, sample entropy and energy ratio for roller bearings, Measurement 76 (2015) 7–19. [13] R. Yan, R.X. Gao, X. Chen, Wavelets for fault diagnosis of rotary machines: a review with applications, Signal Process. 96 (2014) 1–15. [14] M.G. Frei, I. Osorio, Intrinsic time-scale decomposition: time-frequency-energy analysis and real-time filtering of non-stationary signals, Proc. R. Soc. A 463 (2078) (2007) 321–342. [15] L. Bo, C. Peng, Fault diagnosis of rolling element bearing using more robust spectral kurtosis and intrinsic time-scale decomposition, J. Vib. Control 22 (12) (2016) 2921–2937. [16] R.J. Martis, U.R. Acharya, J.H. Tan, A. Petznick, Application of intrinsic time-scale decomposition (ITD) to EEG signals for automated seizure prediction, Int. J. Neural Syst. 23 (5) (2013) 1557–1565. [17] Z. Feng, X. Lin, M.J. Zuo, Joint amplitude and frequency demodulation analysis based on intrinsic time-scale decomposition for planetary gearbox fault diagnosis, Mech. Syst. Signal Process. 72-73 (2016) 223–240. [18] L. Duan, L. Zhang, J. Yue, Fault diagnosis method of gearbox based on intrinsic time-scale decomposition and fuzzy clustering, J. China Univ. Pet. 37 (4) (2014) 133–139. [19] S. Wu, P. Wu, C. Wu, J. Ding, Bearing fault diagnosis based on multiscale permutation entropy and support vector machine, Entropy 14 (12) (2012) 1343–1356. [20] K. Zhu, X. Song, D. Xue, A roller bearing fault diagnosis method based on hierarchical entropy and support vector machine with particle swarm optimization algorithm, Measurement 47 (2014) 669–675. [21] M. Van, H.J. Kang, Two-stage feature selection for bearing fault diagnosis based on dual-tree complex wavelet transform and empirical mode decomposition, Proc. Inst. Mech. Eng. C-J. MEC 230 (2) (2016) 291–302. [22] Y. Yang, Y. Liao, G. Meng, J. Lee, A hybrid feature selection scheme for unsupervised learning and its application in bearing fault diagnosis, Expert Syst. Appl. 
38 (9) (2011) 11311–11320. [23] K. Kira, L.A. Rendell, The feature selection problem: traditional methods and a new algorithm, in: Proceedings of the Tenth National Conference on Artificial Intelligence, San Jose, California, AAAI Press, 1992, pp. 129–134. [24] M.R. Ikonja, I. Kononenko, Theoretical and empirical analysis of ReliefF and RReliefF, Mach. Learning 53 (1) (2003) 23–69. [25] C. Huang, J. Dun, A distributed PSO–SVM hybrid system with feature selection and parameter optimization, Appl. Soft Comput. 8 (4) (2008) 1381–1391. [26] V. Cherkassky, The nature of statistical learning theory, IEEE Trans. Neural Netw. 8 (6) (1997) 1564. [27] A. Widodo, B. Yang, Support vector machine in machine condition monitoring and fault diagnosis, Mech. Syst. Signal Process. 21 (6) (2007) 2560–2574. [28] X. Li, A.N. Zheng, X. Zhang, C. Li, Rolling element bearing fault detection using support vector machine with improved ant colony optimization, Measurement 46 (8) (2013) 2726–2734. [29] J. Huang, X. Hu, F. Yang, Support vector machine with genetic algorithm for machinery fault diagnosis of high voltage circuit breaker, Measurement 44 (6) (2011) 1018–1027. [30] J. Kennedy, R. Eberhart, Particle swarm optimization, in: IEEE International Conference on Neural Networks, Perth, Australia, IEEE Service Center, 1995, pp. 1942–1948. [31] X. Zhang, W. Chen, B. Wang, X. Chen, Intelligent fault diagnosis of rotating machinery using support vector machine with ant colony algorithm for synchronous feature selection and parameter optimization, Neurocomputing 167 (2015) 260–279. [32] L. Lu, J. Yan, C.W. de Silva, Dominant feature selection for the fault diagnosis of rotary machines using modified genetic algorithm and empirical mode decomposition, J. Sound Vib. 344 (2015) 464–483.
[33] K. Kira, L.A. Rendell, A practical approach to feature selection, in: Proceedings of the Ninth International Workshop on Machine Learning, Aberdeen, Scotland, United Kingdom, Morgan Kaufmann Publishers Inc., 1992, pp. 249–256.
[34] I. Kononenko, Estimating attributes: analysis and extensions of RELIEF, in: Proceedings of the European Conference on Machine Learning, Catania, Italy, Springer-Verlag New York, Inc., 1994, pp. 171–182.
[35] J. Kennedy, R.C. Eberhart, A discrete binary version of the particle swarm algorithm, in: Systems, Man, and Cybernetics, 1997. Computational Cybernetics and Simulation, 1997, pp. 4104–4108.
[36] C. Huang, C. Wang, A GA-based feature selection and parameters optimization for support vector machines, Expert Syst. Appl. 31 (2) (2006) 231–240.
[37] C.W. Hsu, C.J. Lin, A comparison of methods for multiclass support vector machines, IEEE Trans. Neural Netw. 13 (4) (2002) 415–425.
[38] Case Western Reserve University, Bearing data center. http://csegroups.case.edu/bearingdatacenter/home, 2016.
[39] Y. Li, M. Xu, Y. Wei, W. Huang, Rotating machine fault diagnosis based on intrinsic characteristic-scale decomposition, Mech. Mach. Theory 94 (2015) 9–27.
[40] S. Lin, S. Chen, W. Wu, C. Chen, Parameter determination and feature selection for back-propagation network by particle swarm optimization, Knowl. Inf. Syst. 21 (2) (2009) 249–266.
[41] C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (3) (2011) 389–396.
[42] H.T. Yin, J.Q. Qiao, P. Fu, X.Y. Xia, Face feature selection with binary particle swarm optimization and support vector machine, J. Inf. Hiding Multimed. Signal Process. 5 (4) (2014) 731–739.
[43] M. Van, H. Kang, Bearing-fault diagnosis using non-local means algorithm and empirical mode decomposition-based feature extraction and two-stage feature selection, IET Sci. Meas. Technol. 9 (6) (2015) 671–680.
[44] J. Li, Q. Zhang, K. Wang, J. Wang, Optimal dissolved gas ratios selected by genetic algorithm for power transformer fault diagnosis based on support vector machine, IEEE Trans. Dielectr. Electr. Insul. 23 (2) (2016) 1198–1206.

Xiaolong Zhang is a Ph.D. candidate at Tongji University, Shanghai, China. He received his bachelor's degree from Wuhan University of Technology in 2012. His research interests are machinery condition monitoring and intelligent fault diagnosis.
Qing Zhang is a professor and Ph.D. supervisor at Tongji University, Shanghai, China. He received his Ph.D. degree from Wuhan University of Technology in 1999. His current research interests include dynamic analysis and condition monitoring of electromechanical transmission systems.
Miao Chen is a Ph.D. candidate at Tongji University, Shanghai, China. His research includes system dynamic modeling and vibration analysis.
Yuantao Sun is a lecturer and master’s supervisor at Tongji University, Shanghai, China. He received his Ph.D. degree from Wuhan University of Technology in 2008. His current research includes reliability analysis of electromechanical systems and fatigue analysis of structures.
Xianrong Qin is a professor and Ph.D. supervisor at Tongji University, Shanghai, China. She received her Ph.D. degree from Nanjing University of Aeronautics and Astronautics in 2002. Her research interests include finite element modeling, vibration signal processing, and dynamic monitoring and detection of structures. Currently, she is a visiting scholar at the University of Notre Dame, Indiana, USA.
Heng Li is a Ph.D. candidate at Tongji University, Shanghai, China. His research includes system pattern recognition and machinery fault diagnosis.