Multi-fault diagnosis for rotating machinery based on orthogonal supervised linear local tangent space alignment and least square support vector machine

Zuqiang Su^b, Baoping Tang^a,*, Ziran Liu^a, Yi Qin^b

^a School of Mechanical & Electrical Engineering, Henan University of Technology, Zhengzhou 450007, P.R. China
^b The State Key Laboratory of Mechanical Transmission, Chongqing University, Chongqing 400030, P.R. China
Abstract: In order to improve the accuracy of fault diagnosis, this article proposes a multi-fault diagnosis method for rotating machinery based on orthogonal supervised linear local tangent space alignment (OSLLTSA) and least square support vector machine (LS-SVM). First, the collected vibration signals are decomposed by empirical mode decomposition (EMD), and a high-dimensional feature set is constructed by extracting statistical features, autoregressive (AR) coefficients and instantaneous amplitude Shannon entropy from those intrinsic mode functions (IMFs) that contain the most fault information. Then, the high-dimensional feature set is inputted into OSLLTSA for dimension reduction to yield more sensitive low-dimensional fault features, which not only combines the intrinsic structure information and the class label information of the dataset but also improves the discrimination of the low-dimensional fault features. Further, the low-dimensional fault features are inputted into LS-SVM to recognize machinery faults, with the LS-SVM parameters selected by enhanced particle swarm optimization (EPSO). Finally, the performance of the proposed method is verified through a fault diagnosis case on a rolling element bearing, and the results confirm the improved accuracy of fault diagnosis.
Keywords: vibration signal; dimension reduction; fault diagnosis; supervised manifold learning; least square support vector machine; particle swarm optimization
* Corresponding author. Tel.: +86-13658319901; fax: +86-023-65415883.
E-mail addresses: [email protected] (B. Tang), [email protected] (Z. Su).
1. Introduction
Rotating machinery is an important component in industrial applications and is widely used in many crucial areas such as electric power generation, mining and industrial production. With the development of society and technology, rotating machinery is becoming larger, more complicated and more precise. Additionally, it often operates continuously for extended periods under hostile conditions. As a result, any failure may lead to a fatal breakdown, which brings heavy economic losses and even casualties [1]. Consequently, it is very important to develop effective fault diagnosis techniques for rotating machinery to avoid accidents and increase machine reliability.
Essentially, fault diagnosis of rotating machinery is a problem of pattern recognition, in which fault feature extraction and intelligent fault recognition are the two most important aspects [2]. Therefore, the accuracy of fault diagnosis can be improved by developing more effective feature extraction and classification methods. Vibration signals contain abundant information about machinery running states and are commonly used in fault diagnosis of rotating machinery [3]. However, the raw vibration signals should be preprocessed before feature extraction, because they are usually non-stationary and non-linear and contain complicated components [4]. Empirical mode decomposition (EMD) [5] is an advanced self-adaptive time-frequency domain signal processing method based on the local characteristic time scale of a signal. It can decompose a multi-component signal into a set of intrinsic mode functions (IMFs) covering different frequency bands, and the fault features extracted from the IMFs are more distinct than those extracted from the raw vibration signals. However, conventional fault features such as statistical features [6], autoregressive model coefficients [7] and Shannon entropy [8] can each characterize the properties of faults from only one specific aspect. In view of this, a high-dimensional feature set comprising all three kinds of features is constructed in this study to obtain more fault information and better characterize the properties of faults. Unfortunately, the high-dimensional fault feature set inevitably contains disturbing and even noise features, and mutual coupling between features also leads to excessive redundant information. Therefore, it is necessary to further extract significantly independent features with low dimension, high sensitivity and good clustering from the high-dimensional fault feature set by an appropriate dimension reduction method.
Dimension reduction methods can be roughly divided into two categories: linear and nonlinear ones. Typical linear dimension reduction methods include principal component analysis (PCA) [9], independent component analysis (ICA) [10], linear discriminant analysis (LDA) [11] and so on. These linear methods are widely used and easy to implement, but they are only applicable to linearly structured datasets with Gaussian distribution and cannot reveal the intrinsic structure information of datasets. In order to overcome these disadvantages, some nonlinear dimension reduction methods, known as manifold learning methods, have recently been proposed, including ISOMAP [12], locally linear embedding [13], local tangent space alignment [14], etc.
These nonlinear dimension reduction methods project a nonlinearly structured high-dimensional dataset into a lower dimensional feature space by preserving local structures in small neighborhoods, and thus successfully acquire the intrinsic features of the high-dimensional dataset. Among these methods, local tangent space alignment has some remarkable advantages such as easy implementation, low computational cost and strong robustness against noise. However, it can only perform dimension reduction on the training dataset and cannot deal with newly arriving samples. To address this problem, linear local tangent space alignment (LLTSA) was proposed as the linearization of local tangent space alignment [15]. LLTSA inherits the ability of local tangent space alignment to reveal the intrinsic structure information of a dataset and provides an explicit mapping from the high-dimensional input space to the low-dimensional feature space. For example, Li et al. [16] have already applied LLTSA to machinery fault diagnosis. Nevertheless, LLTSA is an unsupervised learning method, which neglects the supervisory role of the class label information of training samples in classification. This drawback makes the dimension reduction blind and leaves fault decoupling incomplete. Although some improvements of local tangent space alignment have been proposed (for example, Zhang et al. [17] proposed an
improved local tangent space alignment to deal with sparse and non-uniformly distributed datasets; Wang [18] proposed a modified local tangent space alignment to handle outliers; Zhan et al. [19] proposed a robust local tangent space alignment to improve noise resistance), these variants mainly focus on the problem of noise or outliers and neglect the supervisory role of the class label information of training samples in classification. Hence, the linearization of these methods is not yet optimal for machinery fault diagnosis. Zhang et al. [20] also proposed a semi-supervised local tangent space alignment, but the employed label information was the low-dimensional representation rather than the class type of the training samples; that is to say, the objective of this method is coordinate propagation rather than classification. Although some other improved methods, such as supervised linear local tangent space alignment [21-23] and orthogonal discriminant linear local tangent space alignment [24], are supervised variants of LLTSA, they still have the following disadvantages: supervised linear local tangent space alignment only introduces class label information in the neighborhood selection phase but keeps local structure information extraction in the neighborhood unchanged; orthogonal discriminant linear local tangent space alignment only increases the interclass dissimilarity but does not enhance the intraclass compactness. Therefore, class label information is not fully utilized to improve the discriminating power of dimension reduction.
In this study, a novel supervised manifold learning method called orthogonal supervised linear local tangent space alignment (OSLLTSA) is proposed for machinery fault diagnosis. The proposed OSLLTSA improves LLTSA in three aspects: (1) class label information is employed to guide the construction of local neighborhoods, thus enhancing the intraclass compactness; (2) a novel strategy for local structure information extraction is introduced to increase the interclass dissimilarity; (3) an orthogonal iteration algorithm [25] is applied to compute the optimal orthogonal projection matrix, which eliminates the statistical correlations between basis vectors. Through the first two improvements, the intrinsic structure information and the class label information of the dataset are successfully combined, so that the clustering effect and sensitivity of the extracted low-dimensional fault features are improved. Additionally, since the datasets are orthogonally projected to the low-dimensional feature space, redundant information within the high-dimensional fault feature set can be well removed and more intrinsic structure information is preserved. Thus, the discrimination of the obtained low-dimensional fault features can be further improved.
After fault feature extraction, an effective classification method is necessary to accurately recognize faults. However, commonly used classification methods such as the k-nearest neighbor classifier (KNNC) [26] and artificial neural networks (ANN) [27] require sufficient training samples for model training, and ANN even suffers from over-fitting and local optimal solutions owing to the empirical risk minimization principle. By contrast, support vector machine (SVM) [28], based on the structural risk minimization principle, can minimize an upper bound on the expected risk, so that it can avoid the problems of over-fitting and local optimal solutions.
Moreover, SVM implements classification using a separating hyper-plane determined by a few samples called support vectors, and thus has great generalization capability for small-sample-size problems. Least square support vector machine (LS-SVM) [29] is an improvement of SVM, which greatly simplifies the training process by transforming the quadratic programming problem in SVM into a linear problem. LS-SVM has already been successfully employed to diagnose machinery faults. For example, Xu et al. [30] applied LS-SVM to diagnose bearing faults; Zhou et al. [31] used LS-SVM to diagnose vibration faults of centrifugal pumps; Zhang et al. [32] proposed a new method based on LS-SVM for roller bearing safety region estimation and state identification. However, the performance of LS-SVM heavily depends on the selection of its parameters, whereas a theoretical basis for selecting appropriate parameters for LS-SVM is still lacking. Particle swarm optimization (PSO) [33] is a global optimization method that simulates the social behavior of bird flocks foraging, and it is simple to implement and easy to compute. However, PSO suffers from being trapped in local minima. Worse still, the convergence rate of PSO is very slow in practical applications. To overcome these disadvantages, several improvements of PSO have been proposed. Among them, He et al. [34] proposed a new PSO method (NPSO) and pointed out that the sub-optimal experience of the particle swarm should be introduced into the optimization process to overcome prematurity; He et al. [35] proposed a modified PSO and pointed out that the
parameters of PSO should be adjusted along with the evolution in order to accelerate the convergence of the optimization process; Ding et al. [36] proposed a new PSO variant based on a local stochastic search strategy (LSSPSO) and pointed out that selecting the optimal local position for each particle could improve the performance of the optimization process; Zhai et al. [30] proposed an improved PSO (IPSO) and pointed out that it is the searching velocity, rather than the number of iterations, that reflects the real convergence state of the particle swarm. However, each of these methods focuses on only one aspect of the problem and ignores the possibility of combining their advantages to achieve better performance. Inspired by the research results above, this study proposes a novel parameter optimization method called enhanced PSO (EPSO) to select the parameters of LS-SVM. In the proposed EPSO, the searching velocity is redefined to overcome prematurity and to indicate the convergence state of the particle swarm; a local searching ability is introduced to locate the optimal position of each particle; and the parameters of EPSO are adjusted along with the evolution iterations. Thus, the proposed EPSO integrates all the advantages of the improved PSO variants above. Consequently, the optimal parameters of LS-SVM selected by EPSO can greatly improve the accuracy of LS-SVM. To sum up, a new multi-fault diagnosis method for rotating machinery is proposed based on OSLLTSA and LS-SVM, which improves the accuracy and effectiveness of fault diagnosis of rotating machinery.
The rest of this paper is organized as follows. In Section 2, the extraction of the high-dimensional fault feature set is introduced. In Section 3, LLTSA is briefly reviewed and the proposed OSLLTSA is described in detail. In Section 4, the algorithm and implementation procedure of the proposed EPSO for parameter selection of LS-SVM are provided. Further, a novel fault diagnosis strategy for rotating machinery based on OSLLTSA and LS-SVM is proposed in Section 5, and a fault diagnosis test on a rolling element bearing is performed in Section 6 to verify the effectiveness of the proposed method. Finally, conclusions are drawn in Section 7.
2. High-dimensional fault feature set
Fault feature extraction is the key procedure of rotating machinery fault diagnosis, which aims to extract characteristics that represent the properties of machinery faults. In this section, a high-dimensional fault feature set is extracted from vibration signals to characterize the properties of machinery faults.
2.1. Preprocessing by EMD
When faults or abnormal running states occur, the measured vibration signals are usually non-stationary and non-linear, and their components are very complicated. Therefore, it is very difficult to extract accurate fault features directly from the raw vibration signals. The wavelet transform [37][38][39] is a commonly used time-frequency domain signal analysis method, but its results depend on the selected wavelet basis; many studies on the selection of the wavelet basis have been published, such as [40][41]. In this study, EMD [5], which is also a time-frequency domain signal processing method, is selected for preprocessing before feature extraction, because EMD requires no basis function and is entirely driven by the local characteristic time scale of the signal. EMD is thus a self-adaptive signal processing method applicable to non-stationary and non-linear vibration signals. EMD decomposes a multi-component vibration signal $x(t), (t = 1,\dots,T)$ into a set of IMFs and a residual as follows:
$$x(t) = \sum_{i=1}^{L} c_i(t) + r(t) \qquad (1)$$
where $T$ is the length of the signal, $c_i(t), (i = 1,\dots,L)$ are the IMFs and $r(t)$ is the residual that represents the central tendency of $x(t)$. These IMFs contain different frequency bands and present the natural oscillatory modes embedded in the vibration signals. The features extracted from the IMFs are more distinct than those extracted from the raw vibration signals, and thus more exact fault information can be acquired for fault recognition. In this study, the first $n$ IMFs, whose cumulative energy covers more than 90% of the total energy of the original signal, are selected for fault feature extraction, and the energy of an IMF can be computed by $\gamma_i = \int_{-\infty}^{+\infty} c_i(t)^2\,dt$.
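For illustration, this preprocessing step can be sketched as follows. The PyEMD package (distributed as "EMD-signal") and the synthetic test signal are assumptions for the sketch, not part of the paper's experiment; the discrete sum of squares stands in for the energy integral $\gamma_i$.

```python
import numpy as np
from PyEMD import EMD  # third-party package "EMD-signal"; any EMD implementation works

def select_imfs_by_energy(x, energy_ratio=0.90):
    """Decompose x by EMD (Eq. (1)) and keep the first n IMFs whose
    cumulative energy exceeds `energy_ratio` of the total signal energy."""
    imfs = EMD().emd(x)                    # rows are c_1(t), ..., c_L(t); the last row acts as trend/residue
    energies = np.sum(imfs ** 2, axis=1)   # discrete stand-in for gamma_i = integral of c_i(t)^2
    cum_ratio = np.cumsum(energies) / np.sum(x ** 2)
    n = int(np.searchsorted(cum_ratio, energy_ratio)) + 1
    return imfs[:n]

# Example on a synthetic two-tone signal, 8192 points at 10 kHz as in Section 6
t = np.arange(8192) / 10_000
x = np.sin(2 * np.pi * 35 * t) + 0.5 * np.sin(2 * np.pi * 430 * t)
print(select_imfs_by_energy(x).shape)
```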
2.2. Fault feature extraction
(1) Statistical fault features
When faults occur in rotating machinery, both the time-domain and frequency-domain distributions of the vibration signals change [42][43]. In this study, 8 statistical features $F_i = [f_1,\dots,f_8]$, including 6 time-domain features and 2 frequency-domain features, are extracted from each IMF; their definitions are shown in Table 1, where $s_i(k), (k = 1,\dots,K)$ is the spectrum of $c_i(t)$ and $f_{ik}$ is the frequency of the $k$-th spectrum line. Among these features, $f_1$-$f_3$ reflect the vibration amplitude and energy in the time domain; $f_4$-$f_6$ represent the distribution of the time series in the time domain; $f_7$ indicates the vibration energy in the frequency domain; and $f_8$ describes the convergence of the spectrum power.
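The sketch below illustrates how such a per-IMF feature vector could be computed. Because Table 1 is not reproduced here, the eight definitions (RMS, peak, crest factor, standard deviation, skewness, kurtosis, spectral energy and frequency centroid) are common stand-ins, not necessarily the authors' exact set.

```python
import numpy as np

def statistical_features(c, fs):
    """Eight statistical features per IMF in the spirit of Table 1
    (hypothetical stand-ins for the exact definitions)."""
    s = np.abs(np.fft.rfft(c))                    # spectrum s_i(k)
    f = np.fft.rfftfreq(len(c), d=1.0 / fs)       # frequency f_ik of the k-th line
    mu, sd = c.mean(), c.std()
    f1 = np.sqrt(np.mean(c ** 2))                 # RMS: amplitude/energy in time domain
    f2 = np.max(np.abs(c))                        # peak value
    f3 = f2 / f1                                  # crest factor
    f4 = sd                                       # dispersion of the time series
    f5 = np.mean((c - mu) ** 3) / sd ** 3         # skewness
    f6 = np.mean((c - mu) ** 4) / sd ** 4         # kurtosis
    f7 = np.sum(s ** 2)                           # vibration energy in frequency domain
    f8 = np.sum(f * s) / np.sum(s)                # frequency centroid (spectrum convergence)
    return np.array([f1, f2, f3, f4, f5, f6, f7, f8])
```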
Table 1 is here
(2) AR model coefficients
The AR model, which is built on time differences and vibration amplitudes, is a time series analysis method. It captures the characteristics of dynamic systems, and its parameters are sensitive to condition changes, especially shock characteristics. Therefore, the AR model coefficients contain some important fault information. The AR model of each IMF can be written as follows:
$$c_i(t) = \sum_{j=1}^{m} \varphi_{ij} c_i(t-j) + e_i(t) \qquad (2)$$
where $\varphi_{ij}, (j = 1,\dots,m)$ are the coefficients of the order-$m$ AR model and $e_i(t)$ is the residual error. The parameters $\varphi_{ij}, (j = 1,\dots,m)$ reflect the inherent properties of the vibration system of the rotating machinery, so $A_i = [\varphi_{i1},\dots,\varphi_{im}]$ is extracted as a part of the high-dimensional fault feature set. More detailed information about the derivation and calculation of the AR model coefficients can be found in Ref. [7].
(3) Instantaneous amplitude Shannon entropy
The instantaneous amplitude Shannon entropy is a commonly used information entropy, which can serve as a criterion for judging the uncertainty of a signal. The information quantity in a signal is directly related to its uncertainty; specifically, a signal yields a larger Shannon entropy value if it is well-regulated. Under the normal running state, vibration signals are randomly distributed over the whole frequency range and the information in the signal is uncertain, so the Shannon entropy value is small. However, when faults occur, the certainty of the signal in a specific frequency band increases, so the Shannon entropy value in that frequency band becomes relatively large. Therefore, the Shannon entropy of the IMFs reflects the quantity and distribution of the information in the vibration signals well enough to characterize the properties of faults. To extract the Shannon entropy, the instantaneous amplitude of each IMF is obtained first:
$$a_i(t) = \sqrt{c_i(t)^2 + \hat{c}_i(t)^2} \qquad (3)$$
where $\hat{c}_i(t) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{c_i(\tau)}{t-\tau}\,d\tau$ is the Hilbert transform of the IMF. Then, the Shannon entropy of the instantaneous amplitude is defined as:
$$S_i = \sum_{t=1}^{N} a_i^2(t)\log\left(a_i^2(t)\right) \qquad (4)$$
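The remaining two feature groups can be sketched as follows. The AR fit uses statsmodels as one possible estimator of Eq. (2). For Eq. (4), the instantaneous amplitudes are normalized to a probability-like distribution and a conventional minus sign is applied; this is one plausible reading of the printed formula, which shows neither, rather than the authors' verbatim definition.

```python
import numpy as np
from scipy.signal import hilbert
from statsmodels.tsa.ar_model import AutoReg  # one of several ways to fit Eq. (2)

def ar_coefficients(c, m=4):
    """Fit the order-m AR model of Eq. (2) to one IMF; returns [phi_i1, ..., phi_im]."""
    return AutoReg(c, lags=m, trend="n").fit().params

def amplitude_shannon_entropy(c):
    """Instantaneous amplitude via the Hilbert transform (Eq. (3)),
    then a Shannon-entropy-style sum in the spirit of Eq. (4)."""
    a2 = np.abs(hilbert(c)) ** 2          # a_i(t)^2 = c_i(t)^2 + c_hat_i(t)^2
    p = a2 / np.sum(a2)                   # assumption: normalize so the sum behaves like an entropy
    return -np.sum(p * np.log(p))
```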
It can be deduced from the above analysis that each kind of features contains some important fault information. However, a single kind of features can only characterize the properties of faults from a specific aspect, which cannot achieve satisfying fault diagnosis results. In contrast, the high-dimensional fault feature set $H = [F_i\ A_i\ S_i], (i = 1,\dots,n)$, comprised of the above three kinds of features, can more comprehensively reflect the properties of faults from multiple aspects, and thus provides more fault information for fault recognition.
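Assuming the helper functions from the sketches above, assembling $H$ for one vibration sample could look as follows; with 5 IMFs and an order-4 AR model this yields the 5 × 13 = 65 features used in Section 6.

```python
import numpy as np

def feature_vector(x, fs, m=4):
    """Assemble H = [F_i A_i S_i], i = 1..n, for one vibration sample:
    8 statistical features + m AR coefficients + 1 entropy per selected IMF."""
    imfs = select_imfs_by_energy(x)       # from the EMD sketch above
    feats = [np.concatenate((statistical_features(c, fs),
                             ar_coefficients(c, m),
                             [amplitude_shannon_entropy(c)]))
             for c in imfs]
    return np.concatenate(feats)          # e.g. 5 IMFs x 13 features = 65 features
```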
3. Orthogonal supervised linear local tangent space alignment (OSLLTSA) for dimension reduction
In this section, a novel supervised manifold learning method called OSLLTSA is proposed for dimension reduction of the high-dimensional fault feature set. As a problem of pattern recognition, the objective of dimension reduction in fault diagnosis mainly contains two aspects: (1) removing the disturbing and redundant information within the high-dimensional fault feature set; (2) increasing the separability of fault samples, namely making different-class samples far from each other and same-class samples close to each other. Based on these two points, the proposed OSLLTSA introduces class label information into the dimension reduction process to combine intrinsic structure information and class label information. Notably, OSLLTSA requires the basis vectors of the projection matrix to be orthogonal so that redundant information can be well removed.
3.1. Brief review of linear local tangent space alignment (LLTSA)
Suppose a high-dimensional dataset $X = \{x_i \in \mathbb{R}^D, i = 1,\dots,N\}$ is sampled from an underlying manifold that belongs to a lower dimensional feature space $\mathbb{R}^d$, where $D$ and $d$ ($d \ll D$) are respectively the dimensions of the high-dimensional input space and the lower dimensional feature space. The dimension reduction problem of LLTSA then boils down to finding a projection matrix $A$ that maps the high-dimensional dataset $X$ into a low-dimensional dataset $Y = \{y_i \in \mathbb{R}^d, i = 1,\dots,N\}$ as follows:
$$Y = A^T X H_N \qquad (5)$$
where $H_N = I - ee^T/N$ is the centering matrix, $I$ is the identity matrix and $e$ is a column vector of all ones. The basic assumption of LLTSA is that the local structure information in the neighborhood of each sample can be represented by its local tangent space. Thus, the local tangent spaces of all samples can be realigned in the global low-dimensional feature space to construct the projection matrix from the high-dimensional input space to the low-dimensional feature space. In the LLTSA algorithm, the neighborhood $X_i = [x_{i1}, x_{i2},\dots,x_{ik}]$ of sample point $x_i, (i = 1,\dots,N)$ is first constructed according to the Euclidean distances between samples, where $k$ is the number of nearest neighbors. Then, a local transformation matrix $Q_i$ is required to map the neighborhood $X_i$ to the local low-dimensional tangent space. Thus, the local structure of the neighborhood of sample point $x_i$ can be approximated as:
$$\arg\min_{\bar{x},\Theta,Q} \sum_{j=1}^{k}\left\|x_{ij} - (\bar{x} + Q_i\theta_{ij})\right\|_2^2 = \arg\min_{\Theta,Q}\left\|X_i H_k - Q_i\Theta_i\right\|_2^2 \qquad (6)$$
where $H_k = I - ee^T/k$ is the centering matrix, $\theta_{ij}$ is the local low-dimensional representation of the nearest neighbor $x_{ij}$ and $\Theta_i = [\theta_{i1},\dots,\theta_{ik}]$ is the local low-dimensional representation of $X_i$. Obviously, the optimal $\bar{x}$ is given by the mean of all $x_{ij}$, and the optimal transformation matrix $Q_i$ is given by the $d$ eigenvectors of the covariance matrix $(X_i H_k)(X_i H_k)^T$ corresponding to its $d$ largest eigenvalues. Thus, $\Theta_i$ can be computed as $\Theta_i = Q_i^T X_i H_k$. Hence, LLTSA actually performs a local PCA on $X_i$ to approximate the local structure in the neighborhood. After local structure extraction, the obtained local tangent spaces of all samples are realigned in the global low-dimensional feature space to get the global low-dimensional representation $Y$ of $X$. Denote a selection matrix by $S = [S_1,\dots,S_N]$, among which $S_i, (i = 1,\dots,N)$ is a 0-1 selection matrix, so that $Y_i = YS_i$, in which $Y_i = [y_{i1},\dots,y_{ik}]$ is the global low-dimensional representation of $X_i$. The objective function of this step can be transformed into a minimization problem as follows:
$$\arg\min_Y \sum_{i=1}^{N}\|E_i\|_2^2 = \arg\min_Y \sum_{i=1}^{N}\|Y_i H_k - L_i\Theta_i\|_2^2 = \arg\min_Y \mathrm{tr}\left(YSWW^TS^TY^T\right) = \arg\min_Y \|YSW\|_2^2 \qquad (7)$$
In the above equation, $L_i$ is a global transformation matrix whose optimum is given by $L_i = Y_i H_k\Theta_i^{\dagger}$, where $\Theta_i^{\dagger}$ is the Moore-Penrose generalized inverse of $\Theta_i$; $W = \mathrm{diag}(W_1,\dots,W_N)$ and $W_i$ is given by $W_i = H_k(I - \Theta_i^{\dagger}\Theta_i)$. Taking the objective of LLTSA in Eq. (5) into consideration, Eq. (7) can be transformed into the following form:
$$\arg\min_{A^T X H_N X^T A = I_d} \mathrm{tr}\left(A^T X H_N B H_N X^T A\right) \qquad (8)$$
where $B = SWW^TS^T$ and the constraint $A^T X H_N X^T A = I_d$ is used to uniquely determine $Y$. Finally, the solution of Eq. (8) can be converted into a generalized eigenvalue problem as follows:
$$X H_N B H_N X^T\alpha = \lambda X H_N X^T\alpha \qquad (9)$$
Evidently, the transformation matrix $A$ is given by the eigenvectors $\alpha_1,\alpha_2,\dots,\alpha_d$ corresponding to the $d$ smallest eigenvalues $\lambda_1 \le \lambda_2 \le \dots \le \lambda_d$ of Eq. (9); that is, $A = (\alpha_1,\alpha_2,\dots,\alpha_d)$.
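As a numerical illustration, Eq. (9) can be solved as below, assuming the alignment matrix $B$ has already been assembled from the local tangent spaces; the small ridge term added to $XH_NX^T$ is our addition for numerical stability, not part of the algorithm.

```python
import numpy as np
from scipy.linalg import eigh

def lltsa_projection(X, B, d):
    """Solve X H_N B H_N X^T a = lambda X H_N X^T a (Eq. (9)) and return
    A = (a_1, ..., a_d) for the d smallest eigenvalues.
    X is D x N; B is the N x N alignment matrix B = S W W^T S^T."""
    D, N = X.shape
    HN = np.eye(N) - np.ones((N, N)) / N      # centering matrix H_N
    XH = X @ HN
    left = XH @ B @ HN @ X.T                  # X H_N B H_N X^T (symmetric)
    right = XH @ X.T + 1e-9 * np.eye(D)       # X H_N X^T, regularized
    w, V = eigh(left, right)                  # generalized problem, eigenvalues ascending
    return V[:, :d]                           # embedding: Y = A.T @ X @ HN (Eq. (5))
```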
3.2. The algorithm of OSLLTSA
From the implementation of LLTSA in Section 3.1, it is obvious that LLTSA is an unsupervised learning method; that is, LLTSA only takes intrinsic structure information into consideration and does not make use of class label information. Worse still, the basis vectors of its projection matrix are non-orthogonal. These disadvantages restrict the potential application of LLTSA to fault diagnosis. In view of this, a supervised development of LLTSA called OSLLTSA is proposed for machinery fault diagnosis. The proposed OSLLTSA takes both intrinsic structure information and class label information into consideration, and it also requires the basis vectors to be orthogonal. Clearly, the dimension reduction effect of LLTSA heavily depends on the extraction of local structure information; therefore, in order to improve the discrimination capacity of LLTSA, different-class samples should be well separated in the neighborhood. As shown in Section 3.1, the implementation of LLTSA mainly involves three steps: (1) constructing the local neighborhoods of the samples; (2) extracting the local structure information in each neighborhood; (3) computing the projection matrix from the high-dimensional input space to the low-dimensional feature space. According to the above analysis, the most direct and effective way of improving the discrimination capacity of LLTSA is to introduce class label information into the first two steps.
(1) Local neighborhood selection guided by class label information
The nonlinear dimension reduction ability of manifold learning, including LLTSA, is mainly acquired by preserving local structures in small neighborhoods when projecting a high-dimensional dataset to a low-dimensional feature space. Therefore, the selection of the neighborhood of each sample greatly influences the dimension reduction results. Traditional neighborhood selection in LLTSA is directly based on the Euclidean distances between samples, neglecting the supervisory role of the class label information of the samples in classification. Worse still, the Euclidean distance cannot characterize the local structure when the neighborhood contains samples from multiple classes. In the case of fault diagnosis, the neighborhood should mainly consist of samples from the same class as the center sample so as to reduce the intraclass dissimilarity. To this end, the distances between samples should be redefined.
For a sample point $(x_i, l_i), (i = 1,\dots,N)$, where $l_i \in \{L_j, j = 1,\dots,C\}$ is the class label of $x_i$, the distance between samples is redefined as:
$$\Delta_{ij} = \begin{cases} \sqrt{e^{d_{ij}^2/\beta}} - \eta, & l_i \neq l_j \\ \sqrt{1 - e^{-d_{ij}^2/\beta}}, & l_i = l_j \end{cases} \qquad (10)$$
where $d_{ij}$ is the original Euclidean distance between the two sample points $x_i$ and $x_j$, and the regulation parameter $\beta$ is set to the average Euclidean distance between all pairs of samples to prevent $\Delta_{ij}$ from increasing too fast when $d_{ij}$ is relatively large. The parameter $\eta \in (0,1)$ gives different-class samples a certain chance to have a smaller dissimilarity than same-class samples, in order to preserve the relationship between classes. The advantages of the redefined distance can be summarized as follows: (i) The range of the redefined distances within a class is $[0,1]$ while that between classes is $[1-\eta, \infty)$.
Therefore, the redefined distances can be controlled within certain ranges no matter how strong the noise is. (ii) With the redefinition of distance, distances within a class are compressed while distances between classes are stretched. Therefore, when constructing the local neighborhood, samples with the same class label as the center sample are more likely to be selected as nearest neighbors; that is, the local structure in the neighborhood is mainly determined by the samples from the same class as the center sample. Consequently, the intraclass dissimilarity can be reduced.
(2) New method of local structure information extraction
Local structure information extraction is the key step of LLTSA, and good dimension reduction results can be achieved if the local structure information is well extracted. For fault diagnosis, different-class samples should be well separated in small neighborhoods in order to increase the interclass dissimilarity.
Denote the squared-distance matrix of the neighborhood of $x_i, (i = 1,\dots,N)$ by $D_i = \left[d_{lj}^2\right]_{l,j=1}^{k}$, where $d_{lj}$ is the Euclidean distance between $x_{il}$ and $x_{ij}$. Then, the equation $(X_i H_k)^T(X_i H_k) = H_k D_i H_k$ can be easily derived. Therefore, performing singular value decomposition on $X_i$ is actually equivalent to computing the eigen-decomposition of $H_k D_i H_k$. That is to say, local structure information extraction by multidimensional scaling (MDS) is equivalent to that by PCA. The only difference between the two methods is that the orthogonal projection by MDS keeps the distances between samples unchanged [44]. However, Euclidean distances cannot characterize the local structure when the neighborhood contains multiple classes. Meanwhile, as shown in Eq. (10), the redefined distances within a class are compressed while those between classes are stretched, namely same-class samples are made close to each other while different-class samples are made far from each other. Thus, the redefined distances between samples can be used to construct the squared-distance matrix of the neighborhood as $\Omega_i = \left[\Delta_{lj}^2\right]_{l,j=1}^{k}$, where $\Delta_{lj}$ is the redefined distance between $x_{il}$ and $x_{ij}$. To get the local low-dimensional representation of $X_i$, MDS is applied to $\Omega_i$:
$$H_k\Omega_i H_k = U_i\,\mathrm{diag}(\lambda_1,\dots,\lambda_k)\,U_i^T \qquad (11)$$
The local low-dimensional representation of $X_i$ is given by $\Theta_i = \mathrm{diag}\left(\lambda_1^{1/2},\dots,\lambda_d^{1/2}\right)V_i^T$, where $V_i$ is comprised of the first $d$ columns of $U_i$. Fig. 1 gives the local information extraction results obtained by PCA and by MDS with redefined distances. As can be seen, different-class samples are well separated in the local neighborhood when MDS is used to extract the local structure information, but are still mixed with each other when PCA is used. Appropriate separation of different-class samples in the local neighborhood increases the interclass dissimilarity. Accordingly, Eqs. (7)-(9) can be established using the obtained local low-dimensional representations of the samples.
Figure 1 is here
3.2.2. Computation of orthogonal basis vectors for OSLLTSA
Solving Eq. (9) is equivalent to finding the eigenvectors of $(XH_N X^T)^{-1} XH_N BH_N X^T$ associated with the $d$ smallest eigenvalues. However, these eigenvectors are non-orthogonal in most cases due to the asymmetry of $(XH_N X^T)^{-1} XH_N BH_N X^T$. As a result, redundant information may still exist after dimension reduction and the metric structure will be distorted, which may restrict the locality preserving and discriminating power of OSLLTSA. In this study, an iterative algorithm is introduced to compute the orthogonal basis vectors of the projection matrix [24]. The algorithm is as follows. Firstly, taking the objective function of Eq. (8) into consideration, the locality preserving function is defined as:
$$f(\alpha) = \frac{\alpha^T XH_N BH_N X^T\alpha}{\alpha^T XH_N X^T\alpha} \qquad (12)$$
Then, the solution of the eigenvectors in Eq. (9) can be transformed into the following minimization problem:
$$\alpha_m = \arg\min_{\alpha_m\in\mathbb{R}^D} \frac{\alpha_m^T XH_N BH_N X^T\alpha_m}{\alpha_m^T XH_N X^T\alpha_m}, \quad \text{s.t. } \alpha_m^T\alpha_1 = 0,\ \dots,\ \alpha_m^T\alpha_{m-1} = 0 \qquad (13)$$
where $m = 2,\dots,d$. It is obvious that the optimal $\alpha_1$ is given by the eigenvector of Eq. (9) associated with the smallest eigenvalue. In order to solve the constrained problem in Eq. (13), Lagrange multipliers are introduced to include all the constraints:
$$L^{(m)} = \alpha_m^T XH_N BH_N X^T\alpha_m - \lambda\left(\alpha_m^T XH_N X^T\alpha_m - 1\right) - \mu_1\alpha_m^T\alpha_1 - \dots - \mu_{m-1}\alpha_m^T\alpha_{m-1} \qquad (14)$$
The optimal solution of Eq. (13) must satisfy that the partial derivative of $L^{(m)}$ with respect to $\alpha_m$ is equal to zero:
$$\frac{\partial L^{(m)}}{\partial\alpha_m} = 2XH_N BH_N X^T\alpha_m - 2\lambda XH_N X^T\alpha_m - \mu_1\alpha_1 - \dots - \mu_{m-1}\alpha_{m-1} = 0 \qquad (15)$$
Multiplying the left side of Eq. (15) by $\alpha_m^T$, the solution for $\lambda$ is obtained as follows:
$$\lambda = \frac{\alpha_m^T XH_N BH_N X^T\alpha_m}{\alpha_m^T XH_N X^T\alpha_m} \qquad (16)$$
Comparing Eq. (16) with Eqs. (12) and (13), it is easy to see that $\lambda$ is the minimum of Eq. (12). Multiplying the left side of Eq. (15) by $\alpha_1^T(XH_N X^T)^{-1},\dots,\alpha_{m-1}^T(XH_N X^T)^{-1}$ respectively, we obtain $m-1$ equations as follows:
$$2\alpha_t^T\left(XH_N X^T\right)^{-1} XH_N BH_N X^T\alpha_m = \sum_{i=1}^{m-1}\mu_i\,\alpha_t^T\left(XH_N X^T\right)^{-1}\alpha_i \qquad (17)$$
where $t = 1,\dots,m-1$. For brevity, we define three simplified notations as follows:
$$\mu^{(m-1)} = [\mu_1,\dots,\mu_{m-1}]^T,\quad T^{(m-1)} = [\alpha_1,\dots,\alpha_{m-1}],\quad A^{(m-1)} = \left(T^{(m-1)}\right)^T\left(XH_N X^T\right)^{-1}T^{(m-1)} \qquad (18)$$
Substituting Eq. (18) into Eq. (17), the solution for $\mu^{(m-1)}$ is obtained as follows:
$$\mu^{(m-1)} = 2\left(A^{(m-1)}\right)^{-1}\left(T^{(m-1)}\right)^T\left(XH_N X^T\right)^{-1}XH_N BH_N X^T\alpha_m \qquad (19)$$
Multiplying the left side of Eq. (15) by $\left(XH_N X^T\right)^{-1}$ and substituting Eq. (19) into the result, Eq. (15) can be further transformed into an eigenvalue problem as follows:
$$\left\{I - \left(XH_N X^T\right)^{-1}T^{(m-1)}\left(A^{(m-1)}\right)^{-1}\left(T^{(m-1)}\right)^T\right\}\left(XH_N X^T\right)^{-1}XH_N BH_N X^T\alpha_m = \lambda\alpha_m \qquad (20)$$
According to Eq. (16), $\lambda$ is the minimum of Eq. (12), and thus the optimal $\alpha_m$ is given by the eigenvector of Eq. (20) associated with the smallest eigenvalue. By repeating Eqs. (13)-(20) $d-1$ times, we obtain the $d$ basis vectors of the projection matrix.
To sum up, the proposed OSLLTSA is a new supervised variant of LLTSA, which makes full use of the supervisory role of class label information to improve the discriminating power of dimension reduction. Moreover, the basis vectors of the projection matrix are orthogonal, so that redundant information can be well removed without distorting the metric structure. The implementation of OSLLTSA mainly includes the following steps:
(1) Select the $k$ nearest neighbors $x_{ij}, (j = 1,\dots,k)$ of each sample $x_i, (i = 1,\dots,N)$ to construct its neighborhood $X_i = \{x_{ij} \in \mathbb{R}^D, j = 1,\dots,k\}$ according to the redefined distances between samples.
(2) Compute the squared-distance matrix $\Omega_i$ of each $X_i, (i = 1,\dots,N)$ and perform MDS on $\Omega_i$ to get the local low-dimensional representation $\Theta_i$ of $x_i$.
(3) Establish the objective function in Eq. (9) with the $\Theta_i$ and then iteratively solve for the orthogonal basis vectors of the projection matrix from the high-dimensional input space to the low-dimensional feature space, as sketched below.
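Step (3) can be illustrated with the deflation loop below, which follows Eqs. (13)-(20) under the assumption that $L = XH_NBH_NX^T$ and $R = XH_NX^T$ have already been formed; the ridge term and the dense eigensolver are implementation conveniences, not part of the algorithm's statement.

```python
import numpy as np

def orthogonal_basis(L, R, d, eps=1e-9):
    """Iteratively compute d orthogonal basis vectors per Eqs. (13)-(20).
    alpha_1 is the smallest eigenvector of inv(R) L; each later alpha_m is
    the smallest eigenvector of the deflated matrix M of Eq. (20)."""
    D = L.shape[0]
    Rinv = np.linalg.inv(R + eps * np.eye(D))
    core = Rinv @ L                               # (X H_N X^T)^-1 X H_N B H_N X^T

    def smallest_eigvec(M):
        w, V = np.linalg.eig(M)                   # M is not symmetric in general
        return np.real(V[:, np.argmin(np.real(w))])

    alphas = [smallest_eigvec(core)]
    for _ in range(1, d):
        T = np.column_stack(alphas)               # T^(m-1)
        A = T.T @ Rinv @ T                        # A^(m-1) of Eq. (18)
        M = (np.eye(D) - Rinv @ T @ np.linalg.inv(A) @ T.T) @ core  # Eq. (20)
        alphas.append(smallest_eigvec(M))
    return np.column_stack(alphas)                # columns are the orthogonal basis vectors
```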
4. Least square support vector machine (LS-SVM) for fault classification
After feature extraction, an effective classification method is needed to recognize machinery faults. In this study, LS-SVM is employed as the fault recognition method, and a new optimization method called EPSO is proposed to select the optimal parameters of LS-SVM.
4.1. The algorithm of least square support vector machine (LS-SVM)
SVM is a machine learning method for classification and pattern recognition. It is based on the structural risk minimization principle, so it can avoid the problems of over-fitting and local optimal solutions. Moreover, it implements classification using a separating hyper-plane determined by a few samples called support vectors, so it is applicable to small-sample-size problems. To make SVM more practical in engineering applications, LS-SVM was proposed, which transforms the quadratic programming problem into a linear problem, greatly simplifying the training process of SVM and reducing computation time and cost.
Given a training sample set $\{(y_i \in \mathbb{R}^d, l_i \in \mathbb{R}), i = 1,\dots,N\}$, where $y_i$ is the $i$-th training sample and $l_i \in \{-1,+1\}$ is the class label of $y_i$, the objective of LS-SVM is to find an optimal separating hyper-plane that maximizes the margin between the separating hyper-plane and the closest samples of the training sample set. The objective function of LS-SVM is given as follows:
$$\min_{w,b,e} J(w,e) = \frac{1}{2}\|w\|^2 + \frac{\tau}{2}\sum_{i=1}^{N}e_i^2, \quad \text{s.t. } l_i\left[w^T\phi(y_i) + b\right] = 1 - e_i,\; i = 1,\dots,N \qquad (21)$$
where $\tau$ is an error penalty factor, $w$ is the normal vector, $b \in \mathbb{R}$ is the bias that determines the position of the separating hyper-plane, $\phi(\cdot)$ is the nonlinear mapping function and $e_i$ is the slack factor of $y_i$. Then, Eq. (21) is transformed into a Lagrange form to get the optimal solution:
$$L(w,b,e,\xi) = J(w,e) - \sum_{i=1}^{N}\xi_i\left\{w^T\phi(y_i) + b + e_i - l_i\right\} \qquad (22)$$
where $\xi_i, (i = 1,\dots,N)$ are the Lagrange multipliers, and only the support vectors get non-zero $\xi_i$. The optimal solution of Eq. (21) must satisfy that the partial derivatives of Eq. (22) with respect to $w$, $b$, $e$ and $\xi$ are equal to zero. Finally, according to the Karush-Kuhn-Tucker conditions, the solution of Eq. (21) is transformed into a linear system as follows:
$$\begin{bmatrix} 0 & -Z^T \\ Z & HH^T + \tau^{-1}I \end{bmatrix}\begin{bmatrix} b \\ \xi \end{bmatrix} = \begin{bmatrix} 0 \\ e \end{bmatrix} \qquad (23)$$
where $H = \left[\phi(y_1)^T l_1,\dots,\phi(y_N)^T l_N\right]$, $Z = [l_1,\dots,l_N]^T$ and $\xi = [\xi_1,\dots,\xi_N]^T$; here $e$ denotes a column vector of all ones.
Then, a kernel function satisfying Mercer's condition is introduced to implement the dot product in the high-dimensional feature space: $K(y_i, y) = \phi(y_i)^T\phi(y)$. Thus, the classification function of LS-SVM is obtained as follows:
$$f(y) = \mathrm{sgn}\left[\sum_{i=1}^{N}\xi_i K(y_i, y) + b\right] \qquad (24)$$
It is worth noting that the introduction of the kernel function is the key of LS-SVM. The kernel function directly provides the dot product of two vectors in the high-dimensional feature space even though the nonlinear mapping function $\phi(\cdot)$ is unknown. In this study, the most commonly used radial basis function $K(y_i, y_j) = \exp\left(-\|y_i - y_j\|^2/\sigma^2\right)$ is adopted as the kernel function. Then, there are two parameters of LS-SVM to be selected, $\tau$ and $\sigma^2$, and the performance of LS-SVM heavily depends on the values of these two parameters.
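A compact sketch of binary LS-SVM training through the linear system of Eq. (23) follows. The label factor $l_i$ is kept explicit in the decision function here, whereas Eq. (24) absorbs it into $\xi_i$; extending to the four bearing states of Section 6 would need a one-vs-one or one-vs-rest scheme, which the paper does not detail.

```python
import numpy as np

def rbf(A, B, sigma2):
    """K(y_i, y_j) = exp(-||y_i - y_j||^2 / sigma^2)."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / sigma2)

def lssvm_train(Y, l, tau, sigma2):
    """Solve the KKT system of Eq. (23); Y is N x d, l has entries in {-1, +1}."""
    l = np.asarray(l, dtype=float)
    N = len(l)
    Omega = (l[:, None] * l[None, :]) * rbf(Y, Y, sigma2)   # l_i l_j K(y_i, y_j) = (HH^T)_ij
    M = np.vstack((np.concatenate(([0.0], -l)),
                   np.hstack((l[:, None], Omega + np.eye(N) / tau))))
    sol = np.linalg.solve(M, np.concatenate(([0.0], np.ones(N))))
    return sol[0], sol[1:]                                  # bias b, multipliers xi

def lssvm_predict(Ynew, Y, l, b, xi, sigma2):
    """Decision function in the spirit of Eq. (24), with the label kept explicit."""
    return np.sign(rbf(Ynew, Y, sigma2) @ (xi * np.asarray(l, float)) + b)
```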
4.2. Enhanced particle swarm optimization (EPSO) method for LS-SVM
PSO is a global optimization method that simulates the social behavior of bird flocks foraging. Suppose an optimization problem in a $J$-dimensional space where each particle represents a potential solution; the optimization procedure of PSO can then be expressed as follows:
$$v_{ij}(t+1) = \omega\, v_{ij}(t) + c_1 r_1\left(gp_j(t) - \chi_{ij}(t)\right) + c_2 r_2\left(lp_{ij}(t) - \chi_{ij}(t)\right) \qquad (25)$$
$$\chi_{ij}(t+1) = \chi_{ij}(t) + \nu_{ij}(t+1), \quad i = 1,\dots,M;\; j = 1,\dots,J \qquad (26)$$
where $\chi_i = (\chi_{i1},\dots,\chi_{iJ}), (i = 1,\dots,M)$ is the position of the $i$-th particle, $M$ is the number of particles, $\nu_i = (\nu_{i1},\dots,\nu_{iJ})$ and $lp_i = (lp_{i1},\dots,lp_{iJ})$ respectively represent the searching velocity and the best position of the $i$-th particle, $gp = (gp_1,\dots,gp_J)$ is the best position of all particles so far, $\omega$ is the inertia factor that controls the impact of the previous velocity on the current iteration, $c_1$ and $c_2$ are acceleration coefficients that determine the influence of the cognitive and social components, and $r_1$ and $r_2$ are two random numbers within the range $[0,1]$.
PSO is simple to implement and easy to compute. However, it often suffers from being trapped in local minima. What is worse, the convergence rate of PSO is always very slow in practical applications. To further improve the performance of PSO, a novel variant of PSO called EPSO is proposed in this study. The proposed EPSO modifies PSO in three aspects:
(1) Redefinition of the searching velocity. In the standard PSO, the position update of each particle relies on updating the searching velocity according to the global and local optimal positions of the particle swarm. However, in actual applications the optimal solution depends not only on the best experience, but also on the sub-optimal experience [33]. That is to say, more information will be utilized if the sub-best position is considered when updating the searching velocity. Aiming at this problem, this study considers both the best and sub-best positions to redefine the searching velocity as follows:
$$\nu_{ij}(t+1) = \omega\, v_{ij}(t) + c_1 r_1\left[\lambda_1\left(gp_j(t) - \chi_{ij}(t)\right) + (1-\lambda_1)\left(gp_j^s(t) - \chi_{ij}(t)\right)\right] + c_2 r_2\left[\lambda_2\left(lp_{ij}(t) - \chi_{ij}(t)\right) + (1-\lambda_2)\left(lp_{ij}^s(t) - \chi_{ij}(t)\right)\right] \qquad (27)$$
where $lp_i^s$ is the sub-best position of the $i$-th particle, $gp^s$ is the sub-best position of all particles so far, and $\lambda_1, \lambda_2 \in [0,1]$ are two parameters balancing the weights of the best and sub-best positions.
(2) Optimized selection method for particle positions. From Eqs. (26) and (27), it can be seen that the current position, the best position and the sub-best position of the particles directly determine the searching velocity, whereas the searching velocity is a crucial input for computing the position of the next iteration. Meanwhile, each particle should be superior in its local searching space first and then in the global searching space. Therefore, selecting an accurate position for each particle can improve the performance of PSO. However, the position obtained from Eq. (26) may not be the best position in its nearby area. In this study, the position with the best fitness in the nearby area is selected as the position of a particle: $k'$ positions are randomly created in the nearby area of the position $\chi_i(t+1)$ computed from Eq. (26), where the range of the nearby area is determined by $\left|\chi_{ij} - \chi_{ij}(t+1)\right| \le \phi_j, (j = 1,\dots,J)$:
$$\chi_{ij}^r = \chi_{ij}(t+1) + \phi_j\varphi_j, \quad r = 1,\dots,k' \qquad (28)$$
where $\phi_j$ is the radius of the nearby area and $\varphi_j$ is randomly distributed in the range $[-1,1]$. The best of the obtained $k'+1$ positions (including $\chi_i(t+1)$) is selected as the position of the $i$-th particle. The selection of the best position is given by $\chi_i(t+1) = \arg\min\left\{fit(\chi_i^1),\dots,fit(\chi_i^{k'}), fit(\chi_i(t+1))\right\}$, where $fit(\cdot)$ is the fitness function.
(3) New value assignment model for $\omega$, $c_1$ and $c_2$. In the standard PSO, the values of $\omega$, $c_1$ and $c_2$ are fixed, which limits the performance of PSO in practical applications. In the early stage of the optimization process, a larger value should be assigned to $\omega$ in order to improve the global searching ability. At the same time, each particle has to search for the best solution mainly based on its own experience, due to the lack of social knowledge; therefore, the value of $c_1$ should be relatively large while that of $c_2$ should be relatively small. On the contrary, at the end of the optimization process, a smaller $\omega$ can improve the convergence rate, and the importance of social knowledge should be enhanced to ensure the global optimum [34]. Moreover, because the searching velocity decreases along with the objective function [30], it can reflect the real convergence state of the particle swarm. Thus, the modified value assignment model for $\omega$, $c_1$ and $c_2$ is given as follows:
$$\omega(t+1) = \frac{\bar{\nu}(t) - \nu_{\max}}{\nu_{\max}}\left(\omega_{\max} - \omega_{\min}\right) + \omega_{\max} \qquad (29)$$
$$c_1(t+1) = \frac{\nu_{\max} - \bar{\nu}(t)}{\nu_{\max}}\left(c_{\max} - c_{\min}\right) + c_{\min} \qquad (30)$$
$$c_2(t+1) = \frac{\bar{\nu}(t) - \nu_{\max}}{\nu_{\max}}\left(c_{\max} - c_{\min}\right) + c_{\max} \qquad (31)$$
where $\bar{\nu}(t) = \frac{1}{M}\sum_{i=1}^{M}\nu_i$ is the average searching velocity, $\nu_{\max}$ is the maximum searching velocity during the optimization process, $[\omega_{\min}, \omega_{\max}]$ is the range of $\omega$ and $[c_{\min}, c_{\max}]$ is the range of $c_1$ and $c_2$.
Optimization of the LS-SVM parameters by the proposed EPSO mainly includes the following four steps:
(1) Randomly produce $M$ particles and initialize the values of the fixed parameters in Eqs. (27)-(31).
(2) Update the position and searching velocity of each particle as follows: (i) update the searching velocity and position of each particle according to Eqs. (27) and (26), respectively; (ii) randomly produce $k'$ positions according to Eq. (28) in the nearby area of the position obtained from Eq. (26); (iii) evaluate the fitness of the $k'+1$ positions obtained from Eqs. (28) and (26), and select the best one as the position of the particle; (iv) update $lp_i$, $lp_i^s, (i = 1,\dots,M)$ and $gp$, $gp^s$; (v) update the values of $\omega$, $c_1$ and $c_2$ according to Eqs. (29)-(31).
(3) Repeat (2) until the stopping criterion is fulfilled.
(4) Establish the fault recognition model with the optimized LS-SVM.
Fig. 2 shows the flow chart of the above computation procedure.
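The four steps can be condensed into the sketch below. The fitness function (e.g. LS-SVM cross-validation error over $(\tau, \sigma^2)$), the scalar search radius and all constants are illustrative guesses, and the sub-best bookkeeping is simplified to one previous best per particle.

```python
import numpy as np

def epso(fitness, bounds, M=20, iters=50, k_prime=5,
         w_rng=(0.4, 0.9), c_rng=(0.5, 2.5), lam1=0.7, lam2=0.7, phi=0.5):
    """Minimal EPSO sketch following steps (1)-(4); `fitness` is minimized,
    `bounds` is a (J, 2) array of search limits."""
    lo, hi = bounds[:, 0], bounds[:, 1]
    J = len(lo)
    X = lo + np.random.rand(M, J) * (hi - lo)            # positions chi_i
    V = np.zeros((M, J))
    fit = np.array([fitness(x) for x in X])
    lp, lp_fit, lps = X.copy(), fit.copy(), X.copy()     # best and sub-best positions
    order = np.argsort(lp_fit)
    gp, gps = lp[order[0]].copy(), lp[order[1]].copy()   # global best and sub-best
    w, c1, c2 = w_rng[1], c_rng[1], c_rng[0]
    v_max = 1e-12
    for _ in range(iters):
        r1, r2 = np.random.rand(2)
        V = (w * V
             + c1 * r1 * (lam1 * (gp - X) + (1 - lam1) * (gps - X))   # Eq. (27)
             + c2 * r2 * (lam2 * (lp - X) + (1 - lam2) * (lps - X)))
        X = np.clip(X + V, lo, hi)                       # Eq. (26)
        for i in range(M):                               # local search around X[i], Eq. (28)
            cand = np.clip(np.vstack((X[i], X[i] + phi * (2 * np.random.rand(k_prime, J) - 1))), lo, hi)
            cf = np.array([fitness(c) for c in cand])
            X[i], fit[i] = cand[np.argmin(cf)], cf.min()
            if fit[i] < lp_fit[i]:                       # keep best and previous best (sub-best)
                lps[i], lp[i], lp_fit[i] = lp[i].copy(), X[i].copy(), fit[i]
        order = np.argsort(lp_fit)
        gp, gps = lp[order[0]].copy(), lp[order[1]].copy()
        v_bar = np.mean(np.abs(V)); v_max = max(v_max, v_bar)
        w = (v_bar - v_max) / v_max * (w_rng[1] - w_rng[0]) + w_rng[1]   # Eq. (29)
        c1 = (v_max - v_bar) / v_max * (c_rng[1] - c_rng[0]) + c_rng[0]  # Eq. (30)
        c2 = (v_bar - v_max) / v_max * (c_rng[1] - c_rng[0]) + c_rng[1]  # Eq. (31)
    return gp, lp_fit[order[0]]

# e.g. best, err = epso(cv_error, np.array([[1e-3, 50.0], [1e-3, 50.0]]))
# where cv_error is a user-supplied LS-SVM cross-validation error over (tau, sigma2)
```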
Figure 2 is here
5. Multi-fault diagnosis model for rotating machinery
Based on OSLLTSA and LS-SVM, a novel multi-fault diagnosis method for rotating machinery is proposed in this section to improve the accuracy of fault diagnosis. Fig. 3 shows the flow chart of the proposed method, which mainly includes four parts: vibration signal preprocessing, feature extraction, dimension reduction and intelligent fault recognition.
(1) The collected vibration signals are always non-stationary and non-linear, and their components are complicated. Therefore, they are first preprocessed by EMD, which decomposes them into a set of IMFs. Each of these IMFs contains a different frequency band and represents an oscillatory mode embedded in the vibration signals. Then, the first n IMFs are selected for fault feature extraction.
(2) A high-dimensional fault feature set is constructed to comprehensively characterize the properties of faults. Statistical features provide the time-domain and frequency-domain statistical distribution of the vibration signals; AR model coefficients show the characteristics of the vibration system; and the instantaneous amplitude Shannon entropy reflects the information distribution of the vibration signals. Each kind of feature reflects the faults from one specific aspect, so the high-dimensional fault feature set comprised of these features contains abundant fault information for fault recognition.
(3) Although the high-dimensional fault feature set contains abundant fault information, disturbing features and excessive redundant information are also inevitable in it, which decreases the accuracy of fault diagnosis. In order to extract optimal sensitive low-dimensional fault feature vectors, the high-dimensional fault feature set is inputted into OSLLTSA for dimension reduction. OSLLTSA not only combines intrinsic structure information and class label information, but also ensures that the basis vectors of the projection matrix are orthogonal and statistically uncorrelated.
(4) After feature extraction and dimension reduction, LS-SVM is applied as the final fault recognition method, and EPSO is proposed as the parameter optimization method to select the optimal parameters of LS-SVM. Thus, LS-SVM avoids over-fitting and local optimal solutions, while ensuring its best performance.
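Purely as orientation, the four parts map onto the earlier sketches roughly as follows; `osslltsa_fit` and `epso_select_lssvm_params` are hypothetical names standing for the procedures of Sections 3 and 4, and `train_signals`/`train_labels` are assumed data.

```python
import numpy as np

# Hypothetical glue code for the Fig. 3 pipeline, reusing the earlier sketches.
H_train = np.stack([feature_vector(x, fs=10_000) for x in train_signals])  # parts (1)-(2)
A = osslltsa_fit(H_train, train_labels, d=3)                # part (3): D x d projection
Y_train = (H_train - H_train.mean(axis=0)) @ A              # low-dimensional fault features
tau, sigma2 = epso_select_lssvm_params(Y_train, train_labels)   # part (4): EPSO selection
b, xi = lssvm_train(Y_train, train_labels, tau, sigma2)         # optimized LS-SVM model
```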
Figure 3 is here
6. Experimental results and analysis
6.1. Test rig setup and signal acquisition
In this section, a fault diagnosis experiment on a rolling element bearing is performed to verify the effectiveness of the proposed fault diagnosis method. Fig. 4 displays the test rig, which is mainly comprised of a driving motor, a coupling, a shaft, rotors and a loading mechanism. The test rig is driven by the motor, and its rotating speed is adjustable from 0 to 6000 rpm (100 Hz). The specification of the bearings used in this experiment is LM11749. All bearings were assembled in advance, so more detailed information about the bearing faults cannot be provided. The parameters of the experiment are listed in Table 2.
Figure 4 is here
Table 2 is here
In this experiment, four kinds of bearing running states are simulated: (I) normal state; (II) roller-element fault; (III) inner-race fault; (IV) outer-race fault. Vibration signals are collected by an accelerometer mounted on the bearing housing at a driving speed of 2100 rpm (35 Hz), and the sampling frequency is set to 10 kHz. Fig. 5 shows the time waveforms and spectra of the collected vibration signals. For each kind of running state, 100 samples are collected, among which 40 samples are used as training samples and the rest as testing samples. Thus, 400 samples in total are collected for this experiment. Each sample contains 8192 points (almost 0.8 s), which covers more than one rotating period of the shaft.
Figure 5 is here
Figure 6 is here
6.2. Experimental results and analysis
The collected vibration signals are first decomposed into several IMFs by EMD, as shown in Fig. 6. Then, the first 5 IMFs, whose cumulative energy covers more than 90% of the total energy of the original signal, are selected for fault feature extraction. Here, the order of the AR model is set to 4, so 13 features are extracted from each IMF; that is to say, the dimension $D$ of the high-dimensional fault feature set is 65, namely 65 features are extracted for each sample. In order to extract the significant low-dimensional fault features and remove disturbing and redundant information, the high-dimensional fault feature set is inputted into OSLLTSA for dimension reduction. In the OSLLTSA algorithm, if the number of nearest neighbors $k$ is too small, OSLLTSA cannot well discover the intrinsic structure information of the high-dimensional fault samples; if it is too large, OSLLTSA loses its ability of nonlinear dimension reduction. Here, $k$ is set to 15 through several repeated experiments. For local structure information extraction in the neighborhood of each fault sample, the dimension $d_i, (i = 1,\dots,N)$ of the local tangent space is determined by specifying $\sum_{j=1}^{d_i}\lambda_j \big/ \sum_{j=1}^{k}\lambda_j \ge 0.95$, and the largest $d_i$ is set as the overall dimension $d$ of the low-dimensional feature space. Here, the overall dimension $d$ is set to 3. Fig. 7 gives the dimension reduction result, where only 30 testing samples are shown for ease of observation.
Figure 7 is here
After feature extraction, the low-dimensional fault samples are inputted into the optimized LS-SVM for fault recognition, where the optimal parameters $\tau$ and $\sigma^2$ are selected by the proposed EPSO. Here, the searching range of $\tau$ and $\sigma^2$ is set to $(0, 50]$, the number of particles is set to 20 and the number of evolutionary generations is set to 50.
Table 3 gives the fault diagnosis results. As can be seen, the average fault diagnosis accuracy of the proposed method can reach 98.75%.
Table 3 is here
Four experiments are performed to evaluate the effectiveness of the proposed method, covering the effectiveness of the high-dimensional fault feature set, the discriminating power of OSLLTSA, the effect of dimension reduction and the fault recognition accuracy of the optimized LS-SVM:
(1) Firstly, fault diagnosis with different feature sets is performed, and Table 4 gives the experimental results. As can be seen, the fault diagnosis accuracy with the high-dimensional fault feature set is far higher than that with any single kind of features. Fault diagnosis is a problem of pattern recognition, so more fault information is conducive to higher diagnosis accuracy. As analyzed in Section 2, the features extracted from vibration signals contain important fault information, whereas a single kind of features can only reflect the properties of faults from one specific aspect. By contrast, the high-dimensional fault feature set integrates the fault information of each kind of features, which more comprehensively characterizes the properties of faults. As a result, the high-dimensional fault feature set provides more fault information for fault recognition.
Table 4 is here
(2) The effectiveness of the proposed OSLLTSA is compared with that of other dimension reduction methods, including PCA, LLTSA, LDA, supervised linear local tangent space alignment (SLLTSA), orthogonal discriminant linear local tangent space alignment (ODLLTSA) and supervised locality preserving projections (SLPP) [45]. Here, the parameters of these methods are set to be the same. Fig. 8 shows the dimension reduction results of these methods and Table 5 gives the fault diagnosis results. As a classical linear dimension reduction method, PCA is only applicable to linearly structured datasets with Gaussian distribution, and it cannot reveal the intrinsic structure information of the high-dimensional fault samples. As a result, much useful information is lost, and the fault diagnosis accuracy is only 79.18%. LLTSA can reveal the intrinsic structure of the fault samples, but it is unsupervised and neglects the supervisory role of class label information in fault recognition. What is worse, its projection matrix is not orthogonal, namely the basis vectors are statistically correlated. Consequently, redundant information remains and fault decoupling is incomplete; the fault diagnosis accuracy with LLTSA is 91.25%. As for LDA, it can make use of class label information, but does not take intrinsic structure information into account. As a result, LDA still cannot well discriminate the four running states, and the fault diagnosis accuracy with LDA is 83.75%. SLLTSA and ODLLTSA are supervised manifold learning methods, which take both intrinsic structure information and class label information into consideration. As shown in Fig. 8, the dimension reduction results of these two methods are better than that of LLTSA, and the fault diagnosis accuracies with SLLTSA and ODLLTSA are respectively 95.83% and 94.17%, better than those of the above methods. However, SLLTSA only introduces class label information in the neighborhood selection phase but keeps the local structure information extraction in the neighborhood unchanged, which limits its discriminating power; ODLLTSA only focuses on increasing the interclass dissimilarity but does not enhance the intraclass compactness, which prevents further improvement of the clustering effect and sensitivity of the extracted low-dimensional features. SLPP is also a supervised manifold learning method. Fig. 8 shows that SLPP performs better than LLTSA and the classical linear dimension reduction methods but worse than SLLTSA and ODLLTSA, with a fault diagnosis accuracy of 92.92%. The main reason is that the nonlinear manifold unfolding ability of locality preserving projections (LPP) is weaker than that of LLTSA, because the local tangent space employed by LLTSA represents the local structure of a dataset better than the local graph constructed by LPP [15]. The proposed OSLLTSA introduces class label information into both neighborhood selection and local structure information extraction; that is to say, OSLLTSA tries to separate faults within the local neighborhood of each sample. Therefore, the discriminating power of OSLLTSA is stronger than those of both SLLTSA and ODLLTSA. Moreover, the basis vectors of its projection matrix are orthogonal and statistically uncorrelated, so that redundant information can be well removed, which further improves the discrimination of the obtained low-dimensional fault features. Finally, the fault diagnosis accuracy with OSLLTSA reaches 98.75%.
Figure 8 is here
Table 5 is here
(3) The effect of dimension reduction is illustrated in this experiment, and Table 6 shows the fault diagnosis results with dimension reduction by OSLLTSA versus those without dimension reduction. The high-dimensional fault feature set extracted from vibration signals always contains disturbing and even noise features, and mutual coupling between features also leads to redundant information. Through dimension reduction by OSLLTSA, the redundant and noise information is removed while the intrinsic structure information is preserved. Thus, the sensitivity and clustering effect of the obtained low-dimensional fault features are improved, so that more reliable and clearer fault information can be extracted for fault recognition, thus improving the fault diagnosis accuracy.
Table 6 is here
(4) The fault recognition ability of LS-SVM optimized by EPSO is compared with those of KNNC (with the number of nearest neighbors set to 10), ANN and LS-SVM with randomly selected parameters; Table 7 shows the fault diagnosis results. As a statistical classification method, KNNC requires sufficient training samples for model training, and the distance between samples used by KNNC cannot effectively indicate the relevance between samples, so the fault diagnosis result with KNNC is not satisfactory. Similar to KNNC, ANN also requires a large number of fault samples and, in addition, suffers from over-fitting and local-optimum problems, so ANN cannot deliver a satisfying diagnosis result either. LS-SVM is based on the structural risk minimization principle and generalizes well on small-sample problems; however, its performance heavily depends on its parameters. As shown in Table 7, the fault recognition capacity of LS-SVM with parameters selected by EPSO is stronger than that with randomly selected parameters. Fig. 9 gives the optimization results of EPSO and several other improved variants of PSO. As can be seen, the convergence of standard PSO is very slow, and the obtained solution is not globally optimal. Compared with PSO, IPSO largely improves the convergence rate because the parameters c1, c2 and ω are adjusted during the evolution; moreover, they are adjusted according to the searching velocity rather than the iteration count, which better reflects the actual convergence state of the particle swarm. NPSO and LSSPSO obtain better solutions than PSO while also improving the convergence rate; that is, introducing more information by using both the optimal and sub-optimal experience to update the searching velocity improves the performance of PSO. Further, LSSPSO obtains an even better solution than NPSO, which shows that selecting the best position in the nearby area of each particle helps to overcome local minima. The proposed EPSO is a novel variant of PSO that integrates all of these advantages: it adds a local search step to the optimization process, redefines the searching velocity to involve more evaluation information, and updates the parameters c1, c2 and ω according to the searching velocity. As a result, EPSO clearly outperforms the other PSO variants, with a more stable optimization result and a faster convergence rate.
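The ingredients attributed to EPSO above — velocity updates guided by both best and sub-best experience, a local stochastic search around each particle, and velocity-dependent parameter adaptation — can be sketched as follows. This is a minimal illustrative Python implementation under assumed update rules and constants (for brevity only ω is adapted here, and the adaptation formula and neighborhood radius are assumptions), not the paper's exact algorithm; in practice the fitness f would be an LS-SVM cross-validation error rate.

```python
import numpy as np

def epso_minimize(f, lb, ub, n_particles=20, n_iter=50, k_local=5, seed=0):
    """Minimal EPSO-style sketch: best + sub-best guidance, a local
    stochastic search around each particle, and velocity-dependent
    adaptation of omega. Update rules and constants are illustrative
    assumptions, not the paper's exact formulation."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    x = rng.uniform(lb, ub, size=(n_particles, dim))
    v = np.zeros_like(x)
    fx = np.apply_along_axis(f, 1, x)
    pbest, pbest_f = x.copy(), fx.copy()
    order = np.argsort(fx)
    gbest, gbest2 = x[order[0]].copy(), x[order[1]].copy()  # global best / sub-best
    gbest_f = fx[order[0]]
    c1, c2 = 2.0, 1.5                      # initial values, as given in Fig. 9
    for _ in range(n_iter):
        # Assumed adaptation rule: inertia follows the mean searching velocity
        # rather than the iteration count.
        omega = 0.4 + 0.5 * np.tanh(np.mean(np.abs(v)))
        r1, r2, r3 = rng.random((3, n_particles, dim))
        v = (omega * v
             + c1 * r1 * (pbest - x)
             + c2 * r2 * (gbest - x)
             + 0.5 * c2 * r3 * (gbest2 - x))   # sub-best adds extra guidance
        x = np.clip(x + v, lb, ub)
        for i in range(n_particles):
            # Local stochastic search: sample k_local points near the particle
            # and keep the best candidate as its actual position.
            cand = np.clip(x[i] + 0.05 * (ub - lb)
                           * rng.standard_normal((k_local, dim)), lb, ub)
            cand_f = np.apply_along_axis(f, 1, cand)
            j = int(np.argmin(cand_f))
            xi_f = f(x[i])
            if cand_f[j] < xi_f:
                x[i], xi_f = cand[j], cand_f[j]
            if xi_f < pbest_f[i]:
                pbest[i], pbest_f[i] = x[i].copy(), xi_f
        k = int(np.argmin(pbest_f))
        if pbest_f[k] < gbest_f:
            gbest2 = gbest.copy()
            gbest, gbest_f = pbest[k].copy(), pbest_f[k]
    return gbest, gbest_f

# Hypothetical use: minimize a placeholder for the LS-SVM cross-validation
# error over (gamma, sigma2) within assumed search bounds.
best, err = epso_minimize(lambda p: float(np.sum((np.log(p) - 1.0) ** 2)),
                          lb=np.array([0.01, 0.01]), ub=np.array([100.0, 100.0]))
```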
Table 7 is here
Figure 9 is here
The effectiveness of the proposed fault diagnosis method is demonstrated by the above experiments. The experimental results show that the high-dimensional hybrid feature set provides more fault information for fault diagnosis, as it integrates the complementary information carried by each single kind of feature. Moreover, the dimension reduction effect of the proposed OSLLTSA is better than that of the other dimension reduction methods, as OSLLTSA takes both intrinsic structure information and class label information into account. The experimental results also show that the fault recognition ability of LS-SVM is better than those of KNNC and ANN, and that the proposed EPSO can select the optimal parameters for LS-SVM.
7. Conclusions
This study proposed a multi-fault diagnosis method for rotating machinery based on OSLLTSA and optimized LS-SVM. According to the experimental results, the conclusions are summarized as follows:
(1) The high-dimensional fault feature set extracted from the vibration signals characterizes the properties of machinery faults well, providing sufficient fault information for fault recognition.
(2) The proposed OSLLTSA combines intrinsic structure information with class label information, and the basis vectors of its projection matrix are orthogonal and statistically uncorrelated. As a result, dimension reduction by OSLLTSA removes interfering and redundant information well and yields more sensitive low-dimensional fault features.
(3) LS-SVM has good generalization capability on small-sample problems and is suitable for the task of fault diagnosis. Moreover, the proposed EPSO can select the optimal parameters for LS-SVM to achieve the best fault recognition performance.
(4) The proposed fault diagnosis method integrates the advantages of OSLLTSA in dimension reduction and of LS-SVM in fault recognition, so the fault diagnosis accuracy is effectively improved. The fault diagnosis experiment on a rolling element bearing demonstrated the effectiveness of the proposed method.
Acknowledgments
This research is supported by the National Natural Science Foundation of China (Nos. 51275546, 51375514), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20130191130001), and the Fundamental Research Funds for the State Key Laboratory of Mechanical Transmission at Chongqing University (Project No. SKLMT-ZZKT-2012 MS 09). Finally, the authors are very grateful to the anonymous reviewers for their helpful comments and constructive suggestions.
References
[1] Y.G. Lei, M.J. Zuo, Z.J. He, et al., A multidimensional hybrid intelligent method for gear fault diagnosis, Expert Systems with Applications 37 (2010) 1419-1430.
[2] Z.W. Liu, H.R. Cao, X.F. Chen, et al., Multi-fault classification based on wavelet SVM with PSO algorithm to analyze vibration signals from rolling element bearings, Neurocomputing 99 (2013) 399-410.
[3] J. Zarei, M.A. Tajeddini, H.R. Karimi, Vibration analysis for bearing fault detection and classification using an intelligent filter, Mechatronics 24 (2014) 151-157.
[4] R. Ricci, P. Pennacchi, Diagnosis of gear faults based on EMD and automatic selection of intrinsic mode functions, Mechanical Systems and Signal Processing 25 (2011) 821-838.
[5] N.E. Huang, Z. Shen, S.R. Long, et al., The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis, Proceedings of the Royal Society of London Series A: Mathematical, Physical and Engineering Sciences 454 (1971) (1998) 903-995.
[6] Y.G. Lei, M.J. Zuo, Gear crack level identification based on weighted K nearest neighbor classification, Mechanical Systems and Signal Processing 23 (2009) 1535-1547.
[7] G.F. Wang, C. Liu, Y.H. Cui, Clustering diagnosis of rolling element bearing fault based on integrated Autoregressive/Autoregressive Conditional Heteroscedasticity model, Journal of Sound and Vibration 331 (2012) 4379-4387.
[8] H.H. Bafroui, A. Ohadi, Application of wavelet energy and Shannon entropy for feature extraction in gearbox fault detection under varying speed conditions, Neurocomputing 133 (2014) 437-446.
[9] I.T. Jolliffe, Principal Component Analysis, Springer, New York, 1986.
[10] A. Hyvarinen, E. Oja, Independent component analysis: algorithms and applications, Neural Networks 13 (4-5) (2000) 411-430.
[11] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711-720.
[12] J.B. Tenenbaum, V. de Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319-2323.
[13] S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323-2326.
[14] Z.Y. Zhang, H.Y. Zha, Principal manifolds and nonlinear dimensionality reduction via tangent space alignment, SIAM Journal on Scientific Computing 26 (2004) 313-338.
[15] T.H. Zhang, J. Yang, D.L. Zhao, et al., Linear local tangent space alignment and application to face recognition, Neurocomputing 70 (2007) 1547-1553.
[16] F. Li, B.P. Tang, R.A. Yang, Rotating machine fault diagnosis using dimension reduction with linear local tangent space alignment, Measurement 46 (2013) 2525-2539.
[17] P. Zhang, H. Qiao, B. Zhang, An improved local tangent space alignment method for manifold learning, Pattern Recognition Letters 32 (2011) 181-189.
[18] J. Wang, Improved local tangent space alignment using various dimensional local coordinates, Neurocomputing 71 (2008) 3575-3581.
[19] Y.B. Zhan, J.P. Yin, Robust local tangent space alignment via iterative weighted PCA, Neurocomputing 74 (2011) 1985-1993.
[20] Z.Y. Zhang, H.Y. Zha, M. Zhang, Spectral methods for semi-supervised manifold learning, in: Proceedings of CVPR, 2008.
[21] G.B. Wang, X.J. Li, Z.C. He, et al., Fault diagnosis method based on supervised manifold learning and SVM, Advanced Materials Research 216 (2011) 223-227.
[22] H.Y. Li, W.B. Chen, I.F. Shen, Supervised learning for classification, in: L. Wang, Y. Jin (Eds.), FSKD 2005, LNAI 3614, 2005, pp. 49-57.
[23] G.B. Wang, X.Q. Zhao, Y.H. He, Fault diagnosis method based on supervised incremental local tangent space alignment and SVM, Applied Mechanics and Materials 34-35 (2010) 1233.
[24] Y.Z. Li, D.Y. Luo, S.Q. Liu, Orthogonal discriminant linear local tangent space alignment for face recognition, Neurocomputing 72 (2009) 1319-1323.
[25] D. Cai, X.F. He, J.W. Han, et al., Orthogonal Laplacianfaces for face recognition, IEEE Transactions on Image Processing 15 (11) (2006) 3608-3614.
[26] X.P. Yu, X.G. Yu, Novel text classification based on K-nearest neighbor, in: Proceedings of the Sixth International Conference on Machine Learning and Cybernetics, Hong Kong, China, 2007, pp. 3425-3430.
[27] A. Azadeh, M. Saberi, A. Kazem, et al., A flexible algorithm for fault diagnosis in a centrifugal pump with corrupted data and noise based on ANN and support vector machine with hyper-parameters optimization, Applied Soft Computing 13 (3) (2013) 1478-1486.
[28] Y. Yang, D.J. Yu, J.S. Cheng, A fault diagnosis approach for roller bearing based on IMF envelope spectrum and SVM, Measurement 40 (2007) 943-950.
[29] C.M. Vong, P.K. Wong, Engine ignition signal diagnosis with wavelet packet transform and multi-class least squares support vector machines, Expert Systems with Applications 38 (2011) 8563-8570.
[30] H.B. Xu, G.H. Chen, An intelligent fault identification method of rolling bearing based on LSSVM optimized by improved PSO, Mechanical Systems and Signal Processing 35 (2013) 167-175.
[31] Y.L. Zhou, P. Zhao, Vibration fault diagnosis method of centrifugal pump based on EMD complexity feature and least square support vector machine, Energy Procedia 17 (2012) 939-945.
[32] Y. Zhang, Y. Qin, Z.Y. Xing, et al., Roller bearing safety region estimation and state identification based on LMD-PCA-LSSVM, Measurement 46 (2013) 1315-1324.
[33] Q. Wu, Car assembly line fault diagnosis based on robust wavelet SVC and PSO, Expert Systems with Applications 37 (2010) 5423-5429.
[34] G. He, N.J. Huang, A new particle swarm optimization algorithm with an application, Applied Mathematics and Computation 232 (2014) 521-528.
[35] G. He, N.J. Huang, A modified particle swarm optimization algorithm with applications, Applied Mathematics and Computation 219 (2013) 1053-1060.
[36] J.L. Ding, J. Liu, K.R. Chowdhury, et al., A particle swarm optimization using local stochastic search and enhancing diversity for continuous optimization, Neurocomputing 137 (2014) 261-267.
[37] G.G. Cai, X.F. Chen, Z.J. He, Sparsity-enabled signal decomposition using tunable Q-factor wavelet transform for fault feature extraction of gearbox, Mechanical Systems and Signal Processing 41 (2013) 34-53.
[38] R.Q. Yan, R. Gao, An efficient approach to machine health evaluation based on harmonic wavelet packet transform, Robotics and Computer-Integrated Manufacturing 21 (2005) 291-301.
[39] K. Feng, Z.N. Jiang, W. He, et al., Rolling element bearing fault detection based on optimal antisymmetric real Laplace wavelet, Measurement 44 (2011) 1582-1591.
[40] R.Q. Yan, R. Gao, Base wavelet selection for bearing vibration signal analysis, International Journal of Wavelets, Multiresolution and Information Processing 7 (4) (2009) 411-426.
[41] R.Q. Yan, R. Gao, X.F. Chen, Wavelets for fault diagnosis of rotary machines: a review with applications, Signal Processing 96 (2014) 1-15.
[42] Z.J. Shen, X.F. Chen, X.L. Zhang, et al., A novel intelligent gear fault diagnosis model based on EMD and multi-class TSVM, Measurement 45 (2012) 30-40.
[43] Y.G. Lei, M.J. Zuo, Gear crack level identification based on weighted K nearest neighbor classification algorithm, Mechanical Systems and Signal Processing 23 (2009) 1535-1547.
[44] J. Wang, W.X. Jiang, J. Gou, Extended local tangent space alignment for classification, Neurocomputing 77 (2012) 261-266.
[45] W.K. Wong, H.T. Zhao, Supervised optimal locality preserving projection, Pattern Recognition 45 (2012) 186-197.
Highlights
(1) A multi-fault diagnosis model is proposed based on OSLLTSA and optimized LS-SVM.
(2) OSLLTSA is proposed for feature compression of the high-dimensional feature set.
(3) LS-SVM is employed as the fault identification method.
(4) EPSO is proposed for parameter selection of LS-SVM.
The biography of the first author Zuqiang Su received his B.Sc. degree from Hefei University of Technology, Hefei, China, in 2010. Now he is a Ph.D. candidate at Chongqing University, Chongqing, China. His main research interests are mechanical signal processing, machinery condition monitoring and fault diagnosis, and mechanical equipment security service and life prediction.
The biography of the second (corresponding) author Baoping Tang received his M.Sc. degree in 1996 and Ph.D. degree in 2003, both from Chongqing University, Chongqing, China. Now he is a professor and Ph.D. supervisor at Chongqing University, Chongqing, China. His main research interests include wireless sensor networks, mechanical and electrical equipment security service and life prediction, and measurement technology and instruments.
The biography of the third author Ziran Liu received his B.Sc. degree in 1983 from Xi'an Jiaotong University and his M.Sc. degree in 1988 from Chongqing University. Now he is a professor at Henan University of Technology. His main research interest is equipment condition monitoring and fault diagnosis.
The biography of the fourth author Yi Qin received the B.S. and Ph.D. degrees in mechatronic engineering from Chongqing University, Chongqing, China, in 2004 and 2008, respectively. Since 2009, he has been with the State Key Laboratory of Mechanical Transmission at Chongqing University, Chongqing, China, where he is currently an Associate Professor. His current research interests are in the areas of signal processing, mechanical fault diagnosis and measurement methods.
Fig. 1. Local structure information extraction effect of PCA and MDS: (a) original data; (b) local structure extracted by PCA; (c) local structure extracted by MDS with redefined distance
Fig. 2. Implementation procedure of EPSO for parameter optimization
Fig. 3. Fault diagnosis model for rotating machinery based on OSLLTSA and LS-SVM
Fig. 4. The roller bearing fault experiment rig
Fig. 5. The time waveforms and spectra of the collected vibration signals: (a) normal state; (b) rolling-element fault; (c) inner-race fault; (d) outer-race fault
Fig. 6. The time waveforms and spectra of the IMFs in the four running states: (a) normal state; (b) rolling-element fault; (c) inner-race fault; (d) outer-race fault
Fig. 7. Dimension reduction result obtained by OSLLTSA
Fig. 8. Dimension reduction results of several methods: (a) PCA; (b) LLTSA; (c) LDA; (d) SLLTSA; (e) ODLLTSA; (f) SLPP
Fig. 9. Comparison between the proposed EPSO and other improved variants of PSO
Figure 1. Local structure information extraction effect of PCA and MDS: (a) original data; (b) local structure extracted by PCA; (c) local structure extracted by MDS with redefined distance. [figure]
Figure 2. Implementation procedure of EPSO for parameter optimization. [flowchart] Start → randomly produce M particles and initialize the parameters → select each particle's local best/sub-best position and the global best/sub-best position → iterate: update the searching velocity and position of the particles; randomly produce k' positions in each particle's nearby area and select the best one as the particle's actual position; update the local best/sub-best and global best/sub-best positions; update the values of ω, c1 and c2 → on stopping, pass the optimized parameters to the LS-SVM fault diagnosis model, test the fault samples and output the fault diagnosis results → End
Figure3
Original vibration signals Training sample set
Testing sample set
EMD decomposition Hybrid feature set
Statistical features
AR coefficients
Shannon entropy
Feature compression by OSLLTSA
LS-SVM optimized by EPSO
Fault identification results
Figure 4. The roller bearing fault experiment rig. [photo] Labeled components: driving motor, coupling, roller bearing, accelerometers, data acquisition, data analysis
Figure 5. Time waveforms (Acceleration/g vs. Time/s) and spectra (vs. Frequency/Hz) of the collected vibration signals: (a) normal state; (b) rolling-element fault; (c) inner-race fault; (d) outer-race fault. [figure]
Figure 6. Time waveforms (Acceleration/g vs. Time/s) and spectra (vs. Frequency/Hz) of IMF1-IMF5 in the four running states: (a) normal state; (b) rolling-element fault; (c) inner-race fault; (d) outer-race fault. [figure]
Figure 7. Dimension reduction result obtained by OSLLTSA (axes: Component 1-3; legend: inner-race fault, normal state, outer-race fault, rolling-element fault, testing samples). [figure]
Figure 8. Dimension reduction results of several methods (axes: Component 1-3; legend: inner-race fault, normal state, outer-race fault, rolling-element fault, testing samples): (a) PCA; (b) LLTSA; (c) LDA; (d) SLLTSA; (e) ODLLTSA; (f) SLPP. [figure]
Figure 9. Comparison between the proposed EPSO and other improved variants of PSO (error rate (%) vs. evolutional generation, 0-50; curves: EPSO, IPSO, NPSO, PSO, LSSPSO; initial parameters: c1 = 2, c2 = 1.5, 50 evolutional generations, population of 20 particles). [figure]
Table 1. Definitions of the extracted statistical time-domain and frequency-domain features, where $c_i(t)$ ($t = 1, \dots, T$) is the $i$-th IMF and $s_i(k)$ ($k = 1, \dots, K$) is its spectrum
Mean: $f_1 = \frac{1}{T}\sum_{t=1}^{T} c_i(t)$
Peak: $f_2 = \max_t \left| c_i(t) \right|$
Root mean square: $f_3 = \sqrt{\frac{1}{T}\sum_{t=1}^{T} c_i^2(t)}$
Standard deviation: $f_4 = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T} \left( c_i(t)-f_1 \right)^2}$
Crest factor: $f_5 = f_2 \Big/ \sqrt{\frac{1}{T}\sum_{t=1}^{T} c_i^2(t)}$
Kurtosis: $f_6 = \frac{\sum_{t=1}^{T} \left( c_i(t)-f_1 \right)^4}{(T-1) f_4^4}$
Mean frequency: $f_7 = \frac{1}{K}\sum_{k=1}^{K} s_i(k)$
Standard deviation frequency: $f_8 = \sqrt{\frac{1}{K-1}\sum_{k=1}^{K} \left( s_i(k)-f_7 \right)^2}$
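As a quick illustration of these definitions, the Python sketch below computes the eight features for a single IMF; the synthetic test signal and the use of the FFT amplitude spectrum for $s_i(k)$ are assumptions for demonstration only.

```python
import numpy as np

def table1_features(c):
    """Compute the eight Table 1 features for one IMF c(t); s(k) is taken
    here as the FFT amplitude spectrum (an assumption for illustration)."""
    T = len(c)
    f1 = np.mean(c)                                   # mean
    f2 = np.max(np.abs(c))                            # peak
    f3 = np.sqrt(np.mean(c ** 2))                     # root mean square
    f4 = np.sqrt(np.sum((c - f1) ** 2) / (T - 1))     # standard deviation
    f5 = f2 / f3                                      # crest factor
    f6 = np.sum((c - f1) ** 4) / ((T - 1) * f4 ** 4)  # kurtosis
    s = np.abs(np.fft.rfft(c))                        # amplitude spectrum s(k)
    K = len(s)
    f7 = np.mean(s)                                   # mean frequency
    f8 = np.sqrt(np.sum((s - f7) ** 2) / (K - 1))     # std. deviation frequency
    return np.array([f1, f2, f3, f4, f5, f6, f7, f8])

# Hypothetical IMF: a noisy 35 Hz tone sampled at 10 kHz, matching the
# experiment's sampling frequency.
t = np.arange(8192) / 10_000
imf = np.sin(2 * np.pi * 35 * t) + 0.1 * np.random.default_rng(1).standard_normal(t.size)
print(table1_features(imf))
```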
Table 2. The main parameters of the roller bearing fault experiment
Driving speed: 2100 rpm
Bearing type: LM11749
Accelerometer: PCB352C03
Data acquisition card: NI9234
Data acquisition system: DP/INV306U
Sampling frequency: 10 kHz
Table 3. Fault diagnosis result of the proposed method based on OSLLTSA and optimized LS-SVM (accuracy in %)
Inner-race fault: 98.33; normal state: 100; outer-race fault: 100; rolling-element fault: 96.67; average accuracy: 98.75
Table 4. Fault diagnosis results with different feature sets (accuracy in %)
Feature extraction method   Dimension d   Inner-race fault   Normal state   Outer-race fault   Rolling-element fault   Average
Statistical features        40            91.67              100            95                 90                      94.17
AR model coefficients       20            93.33              100            96.67              90                      95
Shannon entropy             5             90                 100            95                 88.33                   93.33
Hybrid feature set          65            98.33              100            100                96.67                   98.75
Table 5. Fault diagnosis results with feature compression by different dimension reduction methods (accuracy in %)
Feature compression method   Inner-race fault   Normal state   Outer-race fault   Rolling-element fault   Average
PCA                          76.67              85             81.67              73.33                   79.18
LLTSA                        90                 100            93.33              81.67                   91.25
LDA                          80                 91.67          85                 78.33                   83.75
SLLTSA                       93.33              100            100                90                      95.83
ODLLTSA                      91.67              100            96.67              88.33                   94.17
SLPP                         90                 100            96.67              85                      92.92
OSLLTSA                      98.33              100            100                96.67                   98.75
Table 6. Fault diagnosis with versus without dimension reduction by OSLLTSA (accuracy in %)
Dimension reduction?   Inner-race fault   Normal state   Outer-race fault   Rolling-element fault   Average
Yes                    98.33              100            100                96.67                   98.75
No                     61.67              78.33          76.67              55                      67.92
Table 7. Fault diagnosis results with several classification methods (accuracy in %)
Fault identification method   Inner-race fault   Normal state   Outer-race fault   Rolling-element fault   Average
KNNC                          95                 98.33          100                93.33                   96.65
ANN                           96.67              100            96.67              95                      97.09
LS-SVM                        93.33              100            98.33              90                      95.42
Optimized LS-SVM              98.33              100            100                96.67                   98.75
The photo of the first author
The photo of the second (corresponding) author
The photo of the third author
The photo of the fourth author