Weak fault diagnosis of rotating machinery based on feature reduction with Supervised Orthogonal Local Fisher Discriminant Analysis


Feng Li a,*, Jiaxu Wang b, Minking K. Chyu c, Baoping Tang d

a School of Manufacturing Science and Engineering, Sichuan University, Chengdu 610065, China
b School of Aeronautics and Astronautics, Sichuan University, Chengdu 610065, China
c Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA
d The State Key Laboratory of Mechanical Transmission, Chongqing University, Chongqing 400030, China

Article history: Received 15 December 2014; Received in revised form 5 March 2015; Accepted 18 May 2015; Available online 28 May 2015. Communicated by Shen Yin.

Abstract

A new weak fault diagnosis method based on feature reduction with Supervised Orthogonal Local Fisher Discriminant Analysis (SOLFDA) is proposed. In this method, the Shannon mutual information (SMI) between all samples and the training samples is combined into SMI feature sets that represent the mutual dependence of samples and serve as incipient fault features. Then, SOLFDA is proposed to compress the high-dimensional SMI fault feature sets of the testing and training samples into low-dimensional eigenvectors with clearer clustering. Finally, the Optimized Evidence-Theoretic k-Nearest Neighbor Classifier (OET-KNNC) is introduced to perform weak failure recognition on the low-dimensional eigenvectors. Under the supervision of class labels, SOLFDA achieves good discrimination by maximizing the between-manifold divergence and minimizing the within-manifold divergence. Meanwhile, an orthogonality constraint imposed on SOLFDA makes the output sparse features statistically uncorrelated. Therefore, the SMI feature set combined with SOLFDA can extract the essential but weak fault features of rotating machinery effectively, compared with popular signal processing techniques and unsupervised dimension reduction methods. A weak fault diagnosis example on deep groove ball bearings demonstrates the advantage of the proposed method.

Keywords: Shannon mutual information (SMI); Supervised Orthogonal Local Fisher Discriminant Analysis (SOLFDA); Dimension reduction; Optimized Evidence-Theoretic k-Nearest Neighbor Classifier (OET-KNNC); Weak fault diagnosis; Rotating machinery

1. Introduction

Rotating elements, such as bearings and gears, are widely used in machinery equipment. Mechanical faults occurring in bearings and gears often lead to fatal breakdowns of machinery, and such failures can be catastrophic, resulting in costly downtime. Therefore, it is important to accurately diagnose the existence of weak faults, i.e., faults at an early stage, in rotating elements [1]. Early fault features consist of transient signals that occur approximately periodically at a characteristic frequency. However, in most cases these signals are very weak, as they can be buried in strong background noise with a widespread frequency band and in the interference of the rotor rotating frequency and its harmonics [2]. Besides, severe signal attenuation occurs between the weak fault source and the sensor collecting the fault signal if the sensor is placed far from the fault location [3]. Therefore, the difficulty of weak fault diagnosis lies in how to extract or identify the weak fault features (i.e., transient components) under strong background noise, rotating-frequency interference and signal attenuation.


* Corresponding author. Tel.: +86 18382385401. E-mail address: [email protected] (F. Li).


Researchers have long searched for effective methods to detect weak faults of rotating elements. Numerous signal processing techniques have been proposed to extract early fault features, among which the Autoregressive (AR) model [4], Fast Fourier Transform (FFT), Wigner–Ville Distribution (WVD) [5], Short Time Fourier Transform (STFT), Hilbert–Huang Transform (HHT) [6] and Wavelet Transform (WT) [7] are the most popular time-frequency analysis methods. However, the AR model is only applicable to signals with certainty, periodicity and energy aggregation, which are quite different from the uncertain, nonlinear and nonstationary weak fault signals of rotating machinery. Similarly, FFT, the classical spectral analysis method, is incapable of detecting the nonlinear and nonstationary characteristics of weak faults under low signal-to-noise ratios. WVD usually generates cross-terms when it is used to analyze multi-component signals. STFT is restricted by the fixed size of its time-frequency window and thus cannot process multi-scale fault signals at an early stage. HHT is influenced by the end effects and redundant intrinsic mode functions associated with Empirical Mode Decomposition (EMD) [6], in which large swings ultimately propagate inward and corrupt the entire data span. The wavelet method employs a fixed decomposition scale to analyze a signal without considering its characteristics. In summary, due to their intrinsic deficiencies, all these signal processing approaches are inappropriate for feature mining of weak faults.

During the past decades, model-based fault detection methods have been a remarkable research topic. However, their application to weak fault diagnosis of rotating machinery is unrealistic because of the sophisticated modeling procedure, which requires prior physical and mathematical knowledge, regardless of their advantage in handling process dynamics [8,9]. Because of their simple form and low design effort compared with model-based techniques, multivariate statistical methods within a data-driven framework, e.g. Principal Component Analysis (PCA), Partial Least Squares (PLS) and Improved Partial Least Squares (IPLS), are widely used for typical fault monitoring of machinery systems [8–11]. However, weak fault diagnosis is generally a dynamic process because of the nonlinearity, non-stationarity and uncertainty of weak fault signals, so such multivariate statistical approaches are unsuitable for recognizing weak faults of rotating machinery due to their basic assumption of stationary and ideal signal conditions [8]. Recently, in view of the pros and cons of model-based and data-driven approaches, model-data integrated approaches, including subspace identification methods (SIMs), iterative feedback tuning (IFT) and virtual reference feedback tuning (VRFT), have been proposed and have aroused interest from both academic and industrial points of view [9]. However, the current model-data integrated approaches still have limitations in dealing with weak fault data that exhibit non-Gaussian disturbances and strong nonlinearity [8,9].

Weak fault diagnosis methods based on pattern recognition technology have the merits of powerful knowledge inference and error correction capability, and they have attracted increasing research attention. In the existing weak fault diagnosis theories based on pattern recognition, the collected signals are first analyzed by one of the aforementioned signal processing approaches. Then, a secondary filtration approach aided by manual analysis, such as envelope spectrum analysis or Hilbert demodulation, is used to extract authentic fault features. After that, the extracted fault components are converted into eigenvectors as the feature representation of some incipient fault pattern via a sensitive index (e.g., correlation dimension, distance evaluation factors, information entropy, etc.). Finally, the eigenvectors are fed into pattern recognition algorithms to identify the incipient fault [12,13]. Obviously, these theories are somewhat immature for two reasons. Firstly, to extract weak fault features they usually adopt the aforementioned signal processing approaches with their inherent drawbacks, which makes them unable to comprehensively mine nonlinear, weak and strongly coupled fault features. Secondly, the current weak fault diagnosis rules based on pattern recognition rely on manual work to optimize the weak fault features [14]. In other words, the feature extraction quality and recognition accuracy are mainly determined by the professional knowledge and field experience of engineers, so it is quite difficult to realize high-precision weak fault diagnosis.
In order to overcome the defects of the existing approaches based on pattern recognition, a novel weak fault diagnosis method based on feature reduction with Supervised Orthogonal Local Fisher Discriminant Analysis (SOLFDA) is proposed in this paper. Firstly, the Shannon mutual information (SMI) between all samples and the training samples is combined into SMI feature sets that represent the mutual dependence of samples and can be regarded as incipient fault features. Secondly, with the proposed SOLFDA, the high-dimensional SMI feature sets of the testing and training samples are reduced to low-dimensional eigenvectors with better discrimination. Finally, the sparse eigenvectors are entered into the Optimized Evidence-Theoretic k-Nearest Neighbor Classifier (OET-KNNC) for weak fault recognition. SOLFDA maximizes the between-manifold divergence and minimizes the within-manifold divergence under the supervision of class labels. In addition, the features extracted by SOLFDA are made statistically uncorrelated by imposing an orthogonality constraint on the basis vector computation. Therefore, in contrast to manual feature refining and other unsupervised dimension reduction methods, SOLFDA extracts the essential but weak fault information more effectively and, at the same time, compresses the high-dimensional SMI feature set automatically. In a word, dimension reduction with SOLFDA enables high-precision weak fault recognition and can be applied to early fault diagnosis of bearings, axles or rotors, gears, aeroengine turbines, wind turbine blades and so on.

The remainder of this paper is organized as follows. Section 2 introduces the basic theory of SMI. In Section 3, the SOLFDA algorithm is derived in detail. Then, the OET-KNNC theory is discussed in Section 4. In Section 5, a weak fault diagnosis experiment on deep groove ball bearings is performed to verify the proposed method, and the experimental results are analyzed. Finally, Section 6 concludes the paper.

2. Shannon mutual information (SMI)

One of the principal issues in weak fault diagnosis based on pattern recognition is weak feature extraction, i.e., selecting the most relevant variable set as the weak fault feature of a testing sample. However, as mentioned above, the common signal processing approaches are inapplicable to weak fault feature extraction. In order to accurately measure the relevance between testing samples and training samples for the learning task, this paper proposes to measure the dependence of the former on the latter. SMI is an important dependence measure that can capture both linear and nonlinear relations [15]. In fact, SMI has recently been used as a feature selector to extract the most relevant variables in the fields of graphic identification and text categorization. Moreover, SMI can be computed efficiently even under limited sample sizes. The theoretical analysis of SMI is as follows.

The entropy $H(\mathbf{x})$ of a random vector (i.e., variable) $\mathbf{x}$, sometimes written as $H(P(\mathbf{x}))$, is a function of the probability distribution $P(\mathbf{x})$, since $H(\mathbf{x})$ depends only on $P(\mathbf{x})$ rather than on the actual values of $\mathbf{x}$. The Shannon entropy measures the uncertainty of $\mathbf{x}$ and thus quantifies the difficulty of predicting the variable. It can be written as an expectation value

$$H(\mathbf{x}) = -E[\log P(\mathbf{x})] = -\sum_{x} p(x)\log p(x) \qquad (1)$$

where $p(x) = P(X = x)$ ($x \in \mathbf{x}$) denotes the probability distribution function of the variable $\mathbf{x}$. Thus, the Shannon entropy can be considered as the average amount of information in $\mathbf{x}$; in other words, it is the uncertainty removed once the actual feature of $\mathbf{x}$ is revealed.

Based on the Shannon entropy, the mutual information $I(\mathbf{x};\mathbf{y})$ represents the reduction in uncertainty about the variable $\mathbf{y}$ when the variable $\mathbf{x}$ is known:

$$I(\mathbf{x};\mathbf{y}) = H(\mathbf{x}) + H(\mathbf{y}) - H(\mathbf{x},\mathbf{y}) \qquad (2)$$

Notably, $I(\mathbf{x};\mathbf{y})$ is also the KL divergence of the product of the marginal probability distributions $P(\mathbf{x})$ and $P(\mathbf{y})$ from the joint probability distribution $P(\mathbf{x},\mathbf{y})$ [15]:

$$I(\mathbf{x};\mathbf{y}) = D_{KL}\big(P(\mathbf{x},\mathbf{y}) \,\|\, P(\mathbf{x})\cdot P(\mathbf{y})\big) = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)\cdot p(y)} \qquad (3)$$

where $p(x,y) = P(X = x, Y = y)$ ($x \in \mathbf{x}$, $y \in \mathbf{y}$).

Similarly, $I(\mathbf{y};\mathbf{x})$ represents the uncertainty of $\mathbf{x}$ if $\mathbf{y}$ is available, and it can further be proved that $I(\mathbf{x};\mathbf{y}) = I(\mathbf{y};\mathbf{x})$. To sum up, the mutual information $I$ can serve as a measure of the mutual dependence of the two variables $\mathbf{x}$ and $\mathbf{y}$. A short sketch of how such an estimate can be computed follows.
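To make Eqs. (1)–(3) concrete, the following minimal Python sketch estimates the SMI of two sampled signals with a simple histogram approximation of the joint distribution. The function name shannon_mi and the bin count are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def shannon_mi(x, y, bins=32):
    """Histogram estimate of the Shannon mutual information I(x; y) of two
    1-D signals, using the KL-divergence form of Eq. (3)."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()          # joint distribution p(x, y)
    px = pxy.sum(axis=1)                 # marginal p(x)
    py = pxy.sum(axis=0)                 # marginal p(y)
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))
```

Because only histogram counts are involved, such an estimate remains tractable under the limited sample sizes typical of weak fault data.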

3. Supervised Orthogonal Local Fisher Discriminant Analysis (SOLFDA)

Although SMI is expected to measure the relevance between testing samples and training samples, the dimension of the SMI feature set grows with the training sample size, since the dimension of the SMI feature set equals the number of training samples. Dimension expansion means that the probability of outliers increases accordingly. The singular and nonlinear components mixed into the high-dimensional SMI feature set inevitably harm weak fault feature extraction, the so-called "dimension disaster" [16]. In order to avoid this and keep the proposed weak fault diagnosis method robust, it is essential to further extract sparse and essential eigenvectors with low dimension, good clustering and high sensitivity from the high-dimensional SMI feature set, which may contain outliers, by an appropriate feature reduction method. Classic feature reduction approaches include Independent Component Analysis (ICA), Kernel Independent Component Analysis (KICA) [17], Multi-Dimensional Scaling (MDS) [18], etc. However, these approaches are only valid for linear and Gaussian data, and are not applicable to weak fault signals with nonlinear data structure and non-Gaussian distribution [19]. Recently, a new theory of nonlinear feature reduction called manifold learning has become a research focus. It aims to project complex high-dimensional data into a low-dimensional topological space while preserving the local neighborhood structure, so as to discover the intrinsic features of nonlinear high-dimensional data [20,21]. Typical manifold learning methods include Linear Discriminant Analysis (LDA) [22], Orthogonal Neighborhood Preserving Embedding (ONPE) [23], Local Fisher Discriminant Analysis (LFDA) [24], etc. However, these methods consider local structure and discriminant information separately rather than combining them. Besides, the basis vectors output by LDA, ONPE and LFDA are statistically correlated, so the features they extract still contain redundant information that distorts the feature distribution [16]. Based on LFDA, we propose a novel manifold learning algorithm called Supervised Orthogonal Local Fisher Discriminant Analysis (SOLFDA) for feature reduction of the high-dimensional SMI feature set.

3.1. Problem description of SOLFDA

Given $n = n_1 + n_2$ data points $X = \{\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_{n_1}, \mathbf{x}_{n_1+1}, \mathbf{x}_{n_1+2}, \cdots, \mathbf{x}_{n_1+n_2}\} \in \Re^{m \times (n_1+n_2)}$ sampled from an underlying manifold $M$, where $n_1$ denotes the training sample size, $n_2$ the testing sample size, $n$ the total number of fault samples and $m$ the original sample dimension. The goal of feature reduction is to map $X \in \Re^m$ to $Y \in \Re^d$ using the sample information, where $d$ is the reduced dimension and $d \ll m$. In other words, feature reduction replaces the high-dimensional data $X$ by a matrix of the form

$$Y = V^T X \qquad (4)$$

where $V \in \Re^{m \times d}$. Hence, each original vector $\mathbf{x}_i$ is replaced by $\mathbf{y}_i = V^T\mathbf{x}_i$, a member of the $d$-dimensional mapped space $\Re^d$. If $V$ is an available unitary matrix, then $Y = [\mathbf{y}_1, \mathbf{y}_2, \cdots, \mathbf{y}_{n_1}, \mathbf{y}_{n_1+1}, \mathbf{y}_{n_1+2}, \cdots, \mathbf{y}_{n_1+n_2}]$ denotes the orthogonal projection of $X$ into the embedded space $M$.

3.2. Algorithm of SOLFDA

In order to build the local geometric structure of $M$, a nearest neighbor graph $G$ should be constructed. For each sample vector $\mathbf{x}_i$, we search its $k$ nearest neighbors and set a boundary between $\mathbf{x}_i$ and its neighbors. Let $N(\mathbf{x}_i) = \{\mathbf{x}_i^1, \mathbf{x}_i^2, \cdots, \mathbf{x}_i^k\}$ be the set of $k$ nearest neighbors; then a weight matrix of $G$ can be defined as [24]

$$A_{i,j} = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{\sigma^2}\right) \qquad (5)$$

However, the heat kernel of Eq. (5) produces a non-sparse affinity matrix $A$, which is computationally disadvantageous compared with a sparse one. A sparse affinity matrix can be obtained by assigning positive weights only to neighboring points, and it is difficult to select an ideal heat-kernel parameter $\sigma$. To solve this problem, the weight matrix of $G$ can be redefined as

$$A'_{i,j} = \begin{cases} 1 & \text{if } \mathbf{x}_i \in N(\mathbf{x}_j) \text{ or } \mathbf{x}_j \in N(\mathbf{x}_i) \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

The nearest neighbor graph $G$ with affinity matrix $A'$ characterizes the local geometric structure of the data manifold, but it fails to capture the discriminant information in the data. To obtain both, we construct two graphs: the between-manifold graph $G_b$ and the within-manifold graph $G_w$. For each data point $\mathbf{x}_i$, the neighbor set $N(\mathbf{x}_i)$ is naturally divided into two subsets, $N_b(\mathbf{x}_i)$ and $N_w(\mathbf{x}_i)$: $N_b(\mathbf{x}_i)$ contains only the neighbors with different class labels, and $N_w(\mathbf{x}_i)$ only the neighbors with the same label as $\mathbf{x}_i$. Obviously, $N_b(\mathbf{x}_i) \cup N_w(\mathbf{x}_i) = N(\mathbf{x}_i)$ and $N_b(\mathbf{x}_i) \cap N_w(\mathbf{x}_i) = \emptyset$. Let $W'_{lb}$ be the affinity matrix of $G_b$ and $W'_{lw}$ that of $G_w$, defined as

$$W'_{lb,ij} = \begin{cases} A'_{i,j}\,(1/n - 1/n_{\ell_i}) & \text{if } \ell_i = \ell_j \\ 1/n & \text{if } \ell_i \neq \ell_j \end{cases} \qquad (7)$$

$$W'_{lw,ij} = \begin{cases} A'_{i,j}/n_{\ell_i} & \text{if } \ell_i = \ell_j \\ 0 & \text{if } \ell_i \neq \ell_j \end{cases} \qquad (8)$$

In Eqs. (7) and (8), $\ell_i \in \{1,\cdots,c\}$ and $\ell_j \in \{1,\cdots,c\}$ are the class labels of the training samples $\mathbf{x}_i$ and $\mathbf{x}_j$ ($i = 1,2,\cdots,n_1$, $j = 1,2,\cdots,n_1$), $n = n_1 + n_2$, $n_1 = \sum_{\ell_i=1}^{c} n_{\ell_i}$, and $c$ is the number of classes. Obviously, when two samples share the same class label, we can safely assume that they lie on the same manifold, so the within-manifold weight in $W'_{lw}$ should be relatively large. In a word, from the construction of the weight matrices $W'_{lb}$ and $W'_{lw}$, SOLFDA is a supervised learning algorithm that performs between-manifold separation and within-manifold clustering under the guidance of the class labels of the training samples. A minimal code sketch of this graph construction is given below.
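The following sketch implements Eqs. (6)–(8), assuming samples stored column-wise and labels available for all samples being weighted (in the paper, only the $n_1$ training samples carry labels). The function name supervised_weights and the plain double loop are illustrative choices, not the authors' implementation.

```python
import numpy as np

def supervised_weights(X, labels, k):
    """Binary kNN affinity A' (Eq. (6)) and the supervised between-manifold
    and within-manifold weight matrices W'_lb, W'_lw (Eqs. (7)-(8)).
    X is m x n with one sample per column; labels has length n."""
    n = X.shape[1]
    D = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)  # pairwise distances
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:    # k nearest neighbors of x_i
            A[i, j] = A[j, i] = 1.0            # x_i in N(x_j) or x_j in N(x_i)
    counts = {c: int(np.sum(labels == c)) for c in np.unique(labels)}
    W_lb, W_lw = np.zeros((n, n)), np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                n_li = counts[labels[i]]
                W_lb[i, j] = A[i, j] * (1.0 / n - 1.0 / n_li)  # Eq. (7), same class
                W_lw[i, j] = A[i, j] / n_li                    # Eq. (8), same class
            else:
                W_lb[i, j] = 1.0 / n                           # Eq. (7), different class
    return A, W_lb, W_lw
```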

The objective of SOLFDA is to maximize the between-manifold divergence of each data point in $N_b$ from its $k$ nearest inter-class neighbors and to minimize the within-manifold divergence of each data point in $N_w$ from its $k$ nearest intra-class neighbors. In order to choose a good projection for the original data $X$, we optimize the following two objective functions under appropriate constraints:

$$\max \sum_{ij} \|\mathbf{y}_i - \mathbf{y}_j\|^2\, W'_{lb,ij} \qquad (9)$$

$$\min \sum_{ij} \|\mathbf{y}_i - \mathbf{y}_j\|^2\, W'_{lw,ij} \qquad (10)$$

The objective function (9) for the between-manifold graph $G_b$ incurs a severe penalty if neighbors $\mathbf{x}_i$ and $\mathbf{x}_j$ are projected close to each other in the low-dimensional embedded space while they actually belong to different classes. By algebraic deduction, objective function (9) can be simplified as

$$\frac{1}{2}\sum_{ij} \|\mathbf{y}_i - \mathbf{y}_j\|^2\, W'_{lb,ij} = \frac{1}{2}\sum_{ij} (V^T\mathbf{x}_i - V^T\mathbf{x}_j)^T(V^T\mathbf{x}_i - V^T\mathbf{x}_j)\, W'_{lb,ij} = \mathrm{trace}\Big(V^T\Big(\sum_{ij} W'_{lb,ij}(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T\Big)V\Big) = \mathrm{trace}(V^T S'_{lb} V) \qquad (11)$$

where $S'_{lb}$ denotes the between-manifold divergence matrix

$$S'_{lb} = \sum_{ij} W'_{lb,ij}(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T \qquad (12)$$

In contrast, the objective function (10) for the within-manifold graph $G_w$ incurs a heavy penalty if neighbors $\mathbf{x}_i$ and $\mathbf{x}_j$ are projected far away from each other in the low-dimensional embedded space while they actually belong to the same class. Similarly, objective function (10) can be simplified by algebraic deduction:

$$\frac{1}{2}\sum_{ij} \|\mathbf{y}_i - \mathbf{y}_j\|^2\, W'_{lw,ij} = \mathrm{trace}\Big(V^T\Big(\sum_{ij} W'_{lw,ij}(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T\Big)V\Big) = \mathrm{trace}(V^T S'_{lw} V) \qquad (13)$$

where $S'_{lw}$ denotes the within-manifold divergence matrix

$$S'_{lw} = \sum_{ij} W'_{lw,ij}(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T \qquad (14)$$

Thus, the objective function (10) can be rewritten as

$$\min_{V}\; V^T S'_{lw} V \qquad (15)$$

Equivalently,

$$\max_{V}\; -V^T S'_{lw} V \qquad (16)$$

Moreover, the objective function (9) can be expressed as

$$\max_{V}\; V^T S'_{lb} V \qquad (17)$$

By optimizing Eqs. (16) and (17) simultaneously, we can derive the projection matrix $V$, formulated as

$$\arg\max_{V}\; \mathrm{tr}\big[V^T (S'_{lb} - S'_{lw}) V\big] \qquad (18)$$

Any two different lower-dimensional embeddings $\mathbf{y}_i$ and $\mathbf{y}_j$ ($j \neq i$) of the extracted feature set $Y = V^T X$ are statistically uncorrelated if $V$ maximizes the objective function under the conjugated orthogonality constraints [16]

$$\mathbf{v}_i^T S_t \mathbf{v}_j = 0, \quad j \neq i \qquad (19)$$

so the orthogonality constraint of Eq. (19) can be imposed on the lower-dimensional feature set $Y$. Then $\mathbf{v}_i$ can be normalized to satisfy

$$\mathbf{v}_i^T S_t \mathbf{v}_i = 1 \qquad (20)$$

The total scatter matrix $S_t$ is

$$S_t = \frac{1}{n}\sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^T = \frac{1}{n} X\Big(I - \frac{1}{n}\mathbf{e}\mathbf{e}^T\Big)X^T \qquad (21)$$

where $\boldsymbol{\mu} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i$, $I$ is an identity matrix and $\mathbf{e} = (1,\cdots,1)^T$. We can summarize Eqs. (19) and (20) as

$$V^T S_t V = I \qquad (22)$$

Thus, according to Eqs. (18) and (22), SOLFDA can be formulated as a constrained optimization problem:

$$\arg\max_{V}\; \mathrm{tr}\big[V^T (S'_{lb} - S'_{lw}) V\big] \quad \text{s.t.}\; V^T S_t V = I \qquad (23)$$

Finally, Lagrange multipliers are applied to Eq. (23) and the derivative with respect to $V$ is set to zero. The projection vector $\mathbf{v}$ that maximizes Eq. (23) is obtained from the maximum eigenvalue solution of the generalized eigenvalue problem

$$(S'_{lb} - S'_{lw})\,\mathbf{v} = \lambda S_t \mathbf{v} \qquad (24)$$

We assume that the column vectors $\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_d$ are the solutions of Eq. (24), ranked according to their eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$. The projection is then given by

$$\mathbf{x}_i \mapsto \mathbf{y}_i = V^T \mathbf{x}_i \qquad (25)$$

where $V = [\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_d]$.

On the whole, the advantages of SOLFDA are reflected in two aspects. From the perspective of discrimination ability, SOLFDA maximizes the between-manifold divergence and minimizes the within-manifold divergence under the supervision of class labels (see Eqs. (7) and (8)), which means it can optimize the local and global manifold structure with the class information of the samples. On the other hand, SOLFDA imposes an orthogonality constraint on the original LFDA to make the extracted features statistically uncorrelated (see Eq. (19)), so the reduced features have minimum redundancy. Owing to these two advantages, SOLFDA has better classification and dimension reduction characteristics than existing manifold learning methods.

Based on the above preparation, the SOLFDA algorithm can be summarized as follows:

Step 1: Project the original sample set $X_{ORG}$ into a PCA subspace to reduce the noise in $X_{ORG}$. The de-noised $X_{ORG}$ is the $X$ used throughout Section 3, and the PCA transformation matrix is denoted by $\Phi_{PCA}$.
Step 2: Construct the nearest neighbor graph $G$ with affinity matrix $A'$.
Step 3: Construct the between-manifold graph $G_b$ and the within-manifold graph $G_w$.
Step 4: Compute the between-manifold weight matrix $W'_{lb}$ and the within-manifold weight matrix $W'_{lw}$ as given in Eqs. (7) and (8).
Step 5: Calculate the local between-manifold divergence matrix $S'_{lb}$ and the local within-manifold divergence matrix $S'_{lw}$ from Eqs. (12) and (14), as well as the total scatter matrix $S_t$ from Eq. (21).
Step 6: Compute the eigenvectors associated with the largest eigenvalues of $(S'_{lb} - S'_{lw})\mathbf{v} = \lambda S_t \mathbf{v}$ and build the optimal projection matrix $V$.
Step 7: The sparse coordinates of the $n$ samples in the $d$-dimensional embedded space are given by the column vectors of $Y = V'^T X_{ORG}$, where $V' = \Phi_{PCA} V$.

A compact sketch of Steps 5–7 is given below.
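Steps 5 and 6 reduce to one symmetric generalized eigenproblem, which the sketch below solves directly. This is a minimal illustration, not the authors' code: the PCA pre-projection of Step 1 is omitted, and the small ridge added to $S_t$ is only a numerical convenience.

```python
import numpy as np
from scipy.linalg import eigh

def solfda(X, W_lb, W_lw, d):
    """Steps 5-6: divergence matrices (Eqs. (12), (14)), total scatter
    (Eq. (21)) and the generalized eigenproblem of Eq. (24). X is m x n."""
    m, n = X.shape

    def divergence(W):
        # sum_ij W_ij (x_i - x_j)(x_i - x_j)^T = 2 X (D - W) X^T for symmetric W
        Dw = np.diag(W.sum(axis=1))
        return 2.0 * X @ (Dw - W) @ X.T

    S_lb, S_lw = divergence(W_lb), divergence(W_lw)
    Xc = X - X.mean(axis=1, keepdims=True)
    S_t = Xc @ Xc.T / n + 1e-8 * np.eye(m)   # ridge keeps S_t positive definite
    # eigh solves A v = lambda S_t v with S_t-orthonormal eigenvectors,
    # so the constraint V^T S_t V = I of Eq. (22) holds automatically
    vals, vecs = eigh(S_lb - S_lw, S_t)
    V = vecs[:, np.argsort(vals)[::-1][:d]]  # d largest eigenvalues (Eq. (24))
    return V.T @ X, V                        # Y = V^T X (Eq. (25)) and V
```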


4. Optimized Evidence-Theoretic k-Nearest Neighbor Classifier (OET-KNNC)

The last procedure in weak fault diagnosis methods based on pattern recognition technology is to feed the low-dimensional eigenvectors obtained by dimension reduction into a knowledge inference model for weak fault identification. Classic knowledge inference models (i.e., terminal classifiers) mainly include Fuzzy C-Means Clustering (FCM) [25], Fuzzy Positivistic C-Means Clustering (FPCM) [26], the Continuous Hidden Markov Model (CHMM) [27,28], neural networks, the Support Vector Machine (SVM) [29], the k-Nearest Neighbor Classifier (KNNC) [30], etc. These methods have been applied to fault diagnosis of various machinery systems. For example, FPCM has been developed for fault detection and isolation of vehicle suspension systems [26], and CHMM has been used for fault monitoring of centrifugal pumps [28]. However, FCM and FPCM are unsupervised classifiers that cannot fully utilize the class label information of training samples, and their classification and clustering accuracy is not high enough. CHMM lacks algorithms for learning imprecise-probabilistic models from incomplete data, so its inference power is limited for small training samples [27]. Neural networks often fall into local minima, insufficient training or overtraining. The existing kernel functions of SVM are unlikely to produce a group of complete orthogonal bases by translation, and at times fail to approximate the classification function in square-integrable space [14]. As for KNNC, it implicitly assumes that the k nearest neighbors of a sample lie in a region of relatively small volume, but it offers no effective solution when one or more of the distances between the sample point and its closest neighbors become very large outside regions of high density [31,32]. In a word, these theoretical defects restrict the application of FCM, FPCM, CHMM, neural networks, SVM and KNNC to rotating machinery weak fault diagnosis.

Recently, a pattern classification method called the Evidence-Theoretic k-Nearest Neighbor Classifier (ET-KNNC) has attracted our attention. The method is based on Dempster–Shafer evidence theory. It considers each of the k nearest neighbors of a testing sample as an evidence item that supports certain hypotheses about the class membership of the testing sample. According to this evidence, basic belief values are assigned to each subset of the class set. These values are obtained for each neighbor of the testing sample and then aggregated using the Dempster combination rule. Given a finite set of losses corresponding to each class, class membership decisions can be made using the generalized Bayes decision theory [32,33]. Based on the same information, this approach has been experimentally verified to yield higher recognition ratios than the aforementioned methods in many situations. However, the classification accuracy of ET-KNNC still fluctuates with the neighbor number k. In this paper, ET-KNNC with parameter optimization [33] is introduced to minimize the adverse effects of an inappropriate choice of k on classification accuracy.

4.1. Evidence-theoretic k-nearest neighbor classifier (ET-KNNC)

The problem of classifying testing samples into $M$ categories is considered, and the set of categories is denoted by $\Omega = \{\omega_1, \cdots, \omega_M\}$. We assume that a training set consists of the available information $T = \{(\mathbf{x}^{(1)}, \omega^{(1)}), \cdots, (\mathbf{x}^{(N)}, \omega^{(N)})\}$ with $N$ $n$-dimensional samples $\mathbf{x}^{(i)}$, $i = 1,\cdots,N$, whose class labels $\omega^{(i)}$, $i = 1,\cdots,N$, take values in $\Omega$. Assume that the similarity between samples can be correctly measured by a distance function $d(\cdot,\cdot)$, and let the vector $\mathbf{x}$ be a testing sample to be classified using the information in $T$.

Each data pair $(\mathbf{x}^{(i)}, \omega^{(i)})$ then constitutes a distinct evidence item regarding the class membership of $\mathbf{x}$. When $\mathbf{x}$ is "close" to $\mathbf{x}^{(i)}$ according to the distance metric $d$, it is reasonable to believe that both samples share the same class. Conversely, if $\mathbf{x}$ is "far" from $\mathbf{x}^{(i)}$, the consideration of $\mathbf{x}^{(i)}$ may lead to almost complete ignorance of the class of $\mathbf{x}$. Therefore, we can assume that this evidence item induces a basic belief assignment (BBA) $m(\cdot\,|\,\mathbf{x}^{(i)})$ over $\Omega$ as follows:

$$m\big(\{\omega_q\}\,\big|\,\mathbf{x}^{(i)}\big) = \alpha\,\phi_q(d^{(i)}) \qquad (26)$$

$$m\big(\Omega\,\big|\,\mathbf{x}^{(i)}\big) = 1 - \alpha\,\phi_q(d^{(i)}) \qquad (27)$$

$$m\big(A\,\big|\,\mathbf{x}^{(i)}\big) = 0, \quad \forall A \in 2^{\Omega}\setminus\{\Omega, \{\omega_q\}\} \qquad (28)$$

where $d^{(i)} = d(\mathbf{x}, \mathbf{x}^{(i)})$, $\omega_q$ is the class of $\mathbf{x}^{(i)}$ ($\omega^{(i)} = \omega_q$), $\alpha$ is a parameter satisfying $0 < \alpha < 1$, and $\phi_q$ is a decreasing function satisfying $\phi_q(0) = 1$ and $\lim_{d\to\infty}\phi_q(d) = 0$. Note that $m(\cdot\,|\,\mathbf{x}^{(i)})$ reduces to the vacuous belief function ($m(\Omega\,|\,\mathbf{x}^{(i)}) = 1$), representing total ignorance, as the distance between $\mathbf{x}$ and $\mathbf{x}^{(i)}$ tends to infinity. If $d$ denotes the Euclidean distance, a reasonable choice for $\phi_q$, given in Ref. [33], is

$$\phi_q(d) = \exp(-\gamma_q d^2) \qquad (29)$$

where $\gamma_q$ is a positive parameter corresponding to class $\omega_q$.

The training samples far away from $\mathbf{x}$ offer little information. Therefore, it is sufficient to combine the BBAs of the k nearest neighbors of $\mathbf{x}$ using the Dempster combination rule to generate a resulting BBA $m$ (i.e., to synthesize the final belief values regarding the class of $\mathbf{x}$):

$$m = m(\cdot\,|\,\mathbf{x}^{(i_1)}) \oplus \cdots \oplus m(\cdot\,|\,\mathbf{x}^{(i_k)}) \qquad (30)$$

where $I_k = \{i_1,\cdots,i_k\}$ contains the indexes of the k nearest neighbors of $\mathbf{x}$ in $T$. According to this definition, $m$ can be expressed as [32]

$$m(\{\omega_q\}) = \frac{1}{K}\Big(1 - \prod_{i\in I_{k,q}}\{1 - \alpha\phi_q(d^{(i)})\}\Big)\cdot\prod_{r\neq q}\,\prod_{i\in I_{k,r}}\{1 - \alpha\phi_r(d^{(i)})\}, \quad \forall q\in\{1,\cdots,M\} \qquad (31)$$

$$m(\Omega) = \frac{1}{K}\prod_{r=1}^{M}\,\prod_{i\in I_{k,r}}\{1 - \alpha\phi_r(d^{(i)})\} \qquad (32)$$

where $I_{k,q}$ is the subset of $I_k$ corresponding to the neighbors of $\mathbf{x}$ belonging to class $\omega_q$, and $K$ is a normalization factor:

$$K = \sum_{q=1}^{M}\Big(1 - \prod_{i\in I_{k,q}}\{1 - \alpha\phi_q(d^{(i)})\}\Big)\prod_{r\neq q}\,\prod_{i\in I_{k,r}}\{1 - \alpha\phi_r(d^{(i)})\} + \prod_{r=1}^{M}\,\prod_{i\in I_{k,r}}\{1 - \alpha\phi_r(d^{(i)})\} \qquad (33)$$

Consequently, the pignistic probability distribution proposed in Ref. [34] is given by

$$\mathrm{BetP}(\{\omega_q\}) = \sum_{A\subseteq\Omega\,|\,\omega_q\in A}\frac{m(A)}{|A|} = m(\{\omega_q\}) + \frac{m(\Omega)}{M} \qquad (34)$$

for $q = 1,\cdots,M$. Based on this evidential corpus, a decision can be made concerning the assignment of $\mathbf{x}$ to class $\omega_q$, denoted by an action $\alpha_q$. Assuming that the loss for a correct classification equals zero and the loss for a wrong classification equals one, the expected loss under the pignistic distribution is

$$R_{bet}(\alpha_q\,|\,\mathbf{x}) = 1 - \mathrm{BetP}(\{\omega_q\}) \qquad (35)$$
After the particular form of $m$ is defined, the strategy of minimizing $R_{bet}$ leads to the following decision: the testing sample is assigned to the unique class with the maximum belief value.

4.2. Parameter optimization

In the above discussion of ET-KNNC, the question of choosing the parameters $\alpha$ and $\boldsymbol{\gamma} = (\gamma_1,\cdots,\gamma_q)^T$ in Eqs. (26) and (29) was left open. The value of $\alpha$ proves in practice not to be critical, whereas the adjustment of $\boldsymbol{\gamma} = (\gamma_1,\cdots,\gamma_q)^T$ is experimentally found to have a great impact on classification accuracy. Furthermore, the classification efficiency can be improved if these parameters are determined by optimizing a performance criterion.

We consider a training sample $\mathbf{x}^{(\ell)}$ belonging to class $\omega_q$. Its class membership can be represented by an indicator vector $\mathbf{t}^{(\ell)} = (t_1^{(\ell)},\cdots,t_M^{(\ell)})^T$, whose components are defined by $t_j^{(\ell)} = 1$ if $j = q$ and $t_j^{(\ell)} = 0$ otherwise. Using the information of the k nearest neighbors of $\mathbf{x}^{(\ell)}$ in the training sample set, a "leave-one-out" BBA $m^{(\ell)}$ can be obtained, which characterizes the belief regarding the class of $\mathbf{x}^{(\ell)}$ when the sample is classified using the other training samples. On the basis of $m^{(\ell)}$, we can compute the output vector $\mathbf{p}^{(\ell)} = (\mathrm{BetP}^{(\ell)}(\{\omega_1\}),\cdots,\mathrm{BetP}^{(\ell)}(\{\omega_M\}))^T$ of pignistic probabilities, where $\mathrm{BetP}^{(\ell)}$ is the pignistic probability distribution relative to $m^{(\ell)}$. Ideally, the vector $\mathbf{p}^{(\ell)}$ should be as close to the vector $\mathbf{t}^{(\ell)}$ as possible. Here, closeness is defined by the squared error $E(\mathbf{x}^{(\ell)})$:

$$E(\mathbf{x}^{(\ell)}) = (\mathbf{p}^{(\ell)} - \mathbf{t}^{(\ell)})^T(\mathbf{p}^{(\ell)} - \mathbf{t}^{(\ell)}) = \sum_{q=1}^{M}(p_q^{(\ell)} - t_q^{(\ell)})^2 \qquad (36)$$

The final mean squared error over the entire training sample set $T$ of size $N$ is

$$E = \frac{1}{N}\sum_{\ell=1}^{N} E(\mathbf{x}^{(\ell)}) \qquad (37)$$

The function $E$ can be regarded as a cost function for adjusting the parameter vector $\boldsymbol{\gamma}$. The gradient of $E(\mathbf{x}^{(\ell)})$ with respect to $\boldsymbol{\gamma}$ can be calculated, so that the parameters $\gamma_q$ can be determined iteratively by the gradient search procedure of Ref. [33]. This refinement of ET-KNNC is demonstrated experimentally below to yield a significant improvement in classification accuracy. The resulting classification rule, OET-KNNC, shows superior performance on the rotating machinery weak fault diagnosis problem relative to neural networks, SVM, KNNC and ET-KNNC. Moreover, the performance achieved by OET-KNNC is relatively insensitive to the choice of k. A compact sketch of this classification and tuning procedure follows.
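As a concrete reading of Eqs. (26)–(37), the sketch below combines the neighbors' simple BBAs with Dempster's rule, converts them to pignistic probabilities, and tunes $\gamma$ by minimizing the leave-one-out error of Eq. (37). The function names are assumptions, and a general-purpose derivative-free optimizer stands in for the gradient search of Ref. [33].

```python
import numpy as np
from scipy.optimize import minimize

def bba_combine(dists, classes, gamma, alpha=0.95, M=4):
    """Eqs. (26)-(33): combine the simple BBAs of the k nearest neighbors of
    one testing sample with Dempster's rule. `dists` holds the k neighbor
    distances, `classes` their labels (0..M-1)."""
    s = alpha * np.exp(-gamma[classes] * dists ** 2)   # alpha*phi_q(d), Eq. (29)
    prod_r = np.ones(M)                                # per-class products of (1 - s)
    for c, si in zip(classes, s):
        prod_r[c] *= 1.0 - si
    all_prod = prod_r.prod()
    m = (1.0 - prod_r) * all_prod / prod_r             # unnormalized m({w_q}), Eq. (31)
    K = m.sum() + all_prod                             # normalization K, Eq. (33)
    return m / K, all_prod / K                         # m({w_q}) and m(Omega), Eq. (32)

def pignistic(m, m_omega):
    """Eq. (34): BetP({w_q}) = m({w_q}) + m(Omega)/M; argmax gives the decision
    minimizing the expected loss of Eq. (35)."""
    return m + m_omega / len(m)

def fit_gamma(X, y, k, alpha=0.95, M=4):
    """Section 4.2 sketch: choose gamma by minimizing the leave-one-out mean
    squared pignistic error of Eq. (37). X is N x m with rows as samples."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    def loo_error(log_gamma):
        gamma, err = np.exp(log_gamma), 0.0            # keep gamma_q > 0
        for l in range(len(y)):
            nb = np.argsort(D[l])[1:k + 1]             # leave x^(l) out
            m, mo = bba_combine(D[l, nb], y[nb], gamma, alpha, M)
            t = np.eye(M)[y[l]]                        # indicator vector t^(l)
            err += np.sum((pignistic(m, mo) - t) ** 2) # Eq. (36)
        return err / len(y)                            # Eq. (37)

    res = minimize(loo_error, x0=np.zeros(M), method="Nelder-Mead")
    return np.exp(res.x)
```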

5. Application and discussion

5.1. Weak fault diagnosis for bearing weak fault data from a test-bed

The validity of the proposed method was proved by the weak fault diagnosis of deep groove ball bearings. Model 6205-2RS deep groove ball bearings with 52 mm outer diameter, 25 mm inner diameter and 15 mm thickness were employed, and the complete set of experimental equipment is shown in Fig. 1. The bearing test-bed consists of an electric motor, an electric control device, a rotating torque loader and revolving axes. The electric motor drives the input axis of the test-bed at a rotational speed of 1770 r/min, and the output axis of the test-bed drives the load. Three tiny grooves were machined on three 6205-2RS bearings to simulate three weak bearing cracks, as shown in Table 1. For the normal state and each fault state, 50 samples were acquired with piezoelectric accelerometers, a charge amplifier and a signal acquisition device. The sampling frequency was 48 kHz and the time length of each sample was 0.1 s. We randomly selected just $n_{\ell_i} = 20$ of the 50 acquired samples as training samples, and the remaining ones were all used as testing samples. Fig. 2 shows the time-domain waveforms and frequency-domain amplitude spectrums of testing samples in the normal state and with a tiny outer race crack, a tiny inner race crack or a tiny ball crack. Obviously, the signals of the three weak faults are mainly composed of background noise, just like the normal-state signals, so the weak harmonic features (i.e., certainty, periodicity and energy aggregation) hidden in the strong noise can hardly be identified by classical signal processing methods, as the spectrum analysis results in Fig. 2 show. Further, 4096 continuous data points were taken from each sample as the time series for weak fault recognition of the testing samples; the period range (or frequency band) of each weak fault should be entirely covered by the length of the time series while excessive computation is avoided.

Fig. 1. Experiment equipment of weak bearing fault diagnosis.

Table 1
Three types of weak fault patterns on deep groove ball bearings.

No. | Failure pattern | Machining position of groove | Groove diameter (mm) | Groove depth (mm)
1 | Tiny outer race crack | Outer race of the 1st bearing | 0.089 | 0.140
2 | Tiny inner race crack | Inner race of the 2nd bearing | 0.089 | 0.140
3 | Tiny ball crack | Ball of the 3rd bearing | 0.089 | 0.140


Fig. 2. Time-domain waveforms and frequency-domain amplitude spectrums of testing signals: (a) time-domain waveform of normal state; (b) amplitude spectrums of normal state; (c) time-domain waveform of tiny outer race crack; (d) amplitude spectrums of tiny outer race crack; (e) time-domain waveform of tiny inner race crack; (f) amplitude spectrums of tiny inner race crack; (g) time-domain waveform of tiny ball crack; and (h) amplitude spectrums of tiny ball crack.

Fig. 3. Implementation of the weak fault diagnosis method based on feature reduction with SOLFDA (SMI + SOLFDA + OET-KNNC): (1) construct the SMI feature sets of the collected training signals and the real-time testing signals; (2) compress the high-dimensional SMI feature sets into low-dimensional eigenvectors with SOLFDA; (3) make the classification decision with OET-KNNC; (4) output the recognized weak fault patterns.

The complete implementation process of weak fault diagnosis in Fig. 3 can be described in detail as follows:

Step (1): The SMI between each testing sample and all training samples was computed to form an 80-dimensional ($c \times n_{\ell_i}$; here $c = 4$, $n_{\ell_i} = 20$) normalized fault feature set for each testing sample. Meanwhile, the SMI between each training sample and all training samples was computed to form an 80-dimensional normalized fault feature set for each training sample. Thus, the weak fault feature set of each testing or training sample consists of 80 normalized SMI values. The detailed construction principle of the SMI-based weak fault feature sets is shown in Fig. 4, and a code sketch is given below.
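A minimal sketch of Step (1), reusing the shannon_mi sketch from Section 2; reading "normalized" as scaling each feature set to unit norm is an assumption, not a detail stated above.

```python
import numpy as np

def smi_feature_set(sample, train_samples):
    """SMI between one sample and every training sample (Fig. 4), giving a
    (c x n_li)-dimensional feature set -- 80-dimensional here (c = 4 classes,
    n_li = 20 training samples per class)."""
    feats = np.array([shannon_mi(sample, tr) for tr in train_samples])
    return feats / np.linalg.norm(feats)   # normalization (assumed unit-norm)
```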

Fig. 4. Construction principle of SMI-based weak fault feature sets of (a) testing samples and (b) training samples.

Step (2–1): The 80-dimensional weak fault feature sets of the testing and training samples were input into SOLFDA to deduce the projection matrix $V'$, i.e., to train SOLFDA. The parameter $k$ of SOLFDA was set to $k = d$. An effective way to estimate the reduced dimension $d$ ($1 \leq d < 80$) of SOLFDA (i.e., the dimension of OET-KNNC's input eigenvectors) is described as follows [35]:

Firstly, an eigen-analysis of every local covariance matrix $Q_i$, with elements $Q_i(s,t) = (\mathbf{x}_i - \mathbf{x}_s)^T(\mathbf{x}_i - \mathbf{x}_t)$, was performed. Then, a $d_i$ was found for each feature set $\mathbf{x}_i \in X = [\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n]$ by requiring $\sum_{j=1}^{d_i}\lambda_j / \sum_{j=1}^{m}\lambda_j \geq 0.95$, where $\lambda_j$ is the $j$-th largest non-zero eigenvalue of $Q_i$ and $m = 80$ is the dimension of $\mathbf{x}_i$. Finally, majority voting over all feature sets was used to select the maximal $d_i$ as the overall $d$; a sketch of this procedure is given below.
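A sketch of this estimate under the stated rule; interpreting the "majority voting for the maximal $d_i$" as simply taking the maximum of the per-sample estimates is an assumption.

```python
import numpy as np

def estimate_d(X, k, energy=0.95):
    """Reduced-dimension estimate of Ref. [35]: eigen-analyze each local
    covariance Q_i over the k nearest neighbors of x_i, keep the smallest
    d_i reaching 95% cumulative eigenvalue energy, then take the largest
    d_i over all feature sets. X is n x m with rows as samples."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    ds = []
    for i in range(X.shape[0]):
        nb = X[np.argsort(D[i])[1:k + 1]] - X[i]   # neighbors centered at x_i
        Q = nb @ nb.T                              # Q_i(s,t) = (x_i-x_s)^T (x_i-x_t)
        lam = np.sort(np.linalg.eigvalsh(Q))[::-1]
        lam = lam[lam > 1e-12]                     # non-zero eigenvalues
        ratio = np.cumsum(lam) / lam.sum()
        ds.append(int(np.searchsorted(ratio, energy) + 1))
    return max(ds)
```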

Step (2–2): The 80-dimensional feature sets were compressed into d-dimensional nonlinear eigenvectors by the trained SOLFDA. Some of the d-dimensional eigenvectors are shown in Table 2.

Step (3): The d-dimensional eigenvectors of the testing and training samples were entered into OET-KNNC for the classification decision. The neighbor number $k$ and parameter $\alpha$ of OET-KNNC were set to $k = 14$ and $\alpha = 0.95$, respectively. The relations between the expected outputs of OET-KNNC and the weak fault patterns (including the normal state) were set as follows: 1 ($P_1$) normal state, 2 ($P_2$) tiny outer race crack, 3 ($P_3$) tiny inner race crack and 4 ($P_4$) tiny ball crack, where $P_i$ ($i = 1,2,3,4$) represents the weak fault patterns of the testing samples.

Step (4): The weak fault diagnosis results of the testing samples were finally output by OET-KNNC. Table 3 presents some of the analysis results. In Table 3, the recognition accuracy rate $\eta_i$ is defined as

$$\eta_i = \frac{N^{P_i}_{correct}}{N^{P_i}_{total}} \times 100\% \qquad (38)$$

where $N^{P_i}_{correct}$ is the number of testing samples that are correctly recognized and $N^{P_i}_{total}$ is the total number of testing samples. Moreover, the average recognition accuracy rate $\eta$ can be expressed as

$$\eta = \frac{1}{4}\sum_{i=1}^{4}\eta_i \qquad (39)$$

Table 3 indicates that the proposed weak fault diagnosis method can accurately discriminate among the normal state and the three weak faults.

More significantly, the total time that the MATLAB program of the proposed method consumes to identify the weak fault patterns is merely 65.381144 s. For reference, the computation time was measured in the following computer configuration: 4 GB RAM, 3 GHz Intel CPU and MATLAB 2009. An end-to-end sketch of the pipeline just described is given below.
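The hypothetical sketch below chains the earlier helper sketches (smi_feature_set, supervised_weights, solfda, fit_gamma, bba_combine, pignistic) into the Fig. 3 pipeline; every name and default here is illustrative rather than the authors' code.

```python
import numpy as np

def diagnose(train_sigs, y_train, test_sigs, d, k_nn=14, alpha=0.95, M=4):
    """Steps (1)-(4) of Fig. 3, glued from the sketches above."""
    # Step (1): 80-dimensional normalized SMI feature sets
    F_tr = np.array([smi_feature_set(s, train_sigs) for s in train_sigs])
    F_te = np.array([smi_feature_set(s, train_sigs) for s in test_sigs])
    # Step (2): SOLFDA trained on the labeled features (columns = samples, k = d)
    _, W_lb, W_lw = supervised_weights(F_tr.T, y_train, d)
    Y_tr, V = solfda(F_tr.T, W_lb, W_lw, d)
    Y_te = V.T @ F_te.T
    # Step (3): OET-KNNC - optimize gamma, then classify each testing sample
    gamma = fit_gamma(Y_tr.T, y_train, k_nn, alpha, M)
    preds = []
    for z in Y_te.T:
        dists = np.linalg.norm(Y_tr.T - z, axis=1)
        nb = np.argsort(dists)[:k_nn]
        m, mo = bba_combine(dists[nb], y_train[nb], gamma, alpha, M)
        preds.append(int(pignistic(m, mo).argmax()))   # Step (4): decision
    return np.array(preds)
```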

6. Discussion

In this section, four comparison analyses were performed to verify the advantages of the proposed weak fault diagnosis method based on feature reduction with SOLFDA.

Firstly, the diagnosis accuracy of the SMI-based weak fault feature set with SOLFDA dimension reduction (SMI + SOLFDA) was compared with that of Autoregressive (AR) model coefficients and of the instantaneous amplitude Shannon entropy of the intrinsic mode functions (IMFs) from Empirical Mode Decomposition (EMD), where the AR model order was set to na = 8 [4] and the minimum number of IMFs per sample was set to Nmin = 9 [14]. The results are compared in Table 4. The diagnosis accuracy rates ηi (i = 1,2,3,4) and their mean value η obtained by the SMI feature set with SOLFDA dimension reduction are all higher than those obtained by the AR model coefficients and the IMF instantaneous amplitude Shannon entropy. Therefore, the SMI feature set with SOLFDA dimension reduction has better data mining performance than the classical feature extraction approaches based on signal processing techniques.

Secondly, the feature reduction effect of SOLFDA was compared with that of four commonly used manifold learning methods, namely Local Fisher Discriminant Analysis (LFDA), Locality Preserving Projection (LPP), Orthogonal Neighborhood Preserving Embedding (ONPE) and Linear Discriminant Analysis (LDA), and with a popular multivariate statistical approach, PCA. The neighbor numbers of LFDA, LPP, ONPE, LDA and PCA were set to k = d, like that of SOLFDA. Table 5 summarizes the diagnosis accuracy (i.e., classification effect) obtained by feature reduction with the six methods.


Table 2
d-dimensional eigenvectors achieved by feature reduction with SOLFDA (estimate of d: d = 15).

Fault pattern | Sample No. | d-dimensional eigenvector y achieved by dimension reduction with SOLFDA
Normal state | 15 | (0.0612 0.0248 0.0121 0.0236 0.0096 0.0208 0.0169 0.0119 0.0070 0.0111 0.0006 0.0073 0.0063 0.0227 0.0087)
Normal state | 16 | (0.0571 0.0179 0.0124 0.0163 0.0311 0.0125 0.0063 0.0031 0.0014 0.0145 0.0188 0.0009 0.0218 0.0035 0.0145)
Normal state | 17 | (0.0567 0.0157 0.0221 0.0003 0.0239 0.0130 0.0101 0.0088 0.0135 0.0263 0.0070 0.0013 0.0081 0.0084 0.0164)
Normal state | 18 | (0.0558 0.0155 0.0006 0.0117 0.0131 0.0063 0.0063 0.0141 0.0057 0.0163 0.0357 0.0046 0.0001 0.0021 0.0048)
Tiny outer race crack | 15 | (0.1192 0.0194 0.0335 0.0656 0.0413 0.1668 0.0851 0.0764 0.1671 0.0649 0.0132 0.0091 0.0814 0.0486 0.0142)
Tiny outer race crack | 16 | (0.0963 0.0617 0.0546 0.0778 0.0045 0.0898 0.0318 0.1220 0.0290 0.0008 0.0021 0.0130 0.0076 0.0643 0.0392)
Tiny outer race crack | 17 | (0.0298 0.0257 0.1087 0.0776 0.0120 0.0364 0.0021 0.0081 0.0848 0.1080 0.0393 0.0394 0.0226 0.0304 0.0574)
Tiny outer race crack | 18 | (0.0385 0.0065 0.0812 0.0755 0.0897 0.0977 0.1260 0.0273 0.0627 0.0119 0.0648 0.0335 0.0726 0.1629 0.0155)
Tiny inner race crack | 15 | (0.0849 0.0228 0.0118 0.0088 0.0174 0.0009 0.0055 0.0043 0.0368 0.0413 0.0177 0.0487 0.0462 0.0238 0.0546)
Tiny inner race crack | 16 | (0.0452 0.0280 0.0019 0.0373 0.0013 0.0266 0.0593 0.0321 0.0291 0.0356 0.0441 0.0022 0.0220 0.0273 0.0258)
Tiny inner race crack | 17 | (0.0786 0.0292 0.0154 0.0258 0.0259 0.0251 0.0347 0.0755 0.0434 0.0546 0.0053 0.0665 0.0543 0.0196 0.0191)
Tiny inner race crack | 18 | (0.0651 0.0420 0.0026 0.0210 0.0322 0.0146 0.0031 0.0375 0.0188 0.0545 0.0180 0.0259 0.0181 0.0259 0.0041)
Tiny ball crack | 15 | (0.0242 0.0223 0.0096 0.0136 0.0158 0.0304 0.0125 0.0115 0.0311 0.0139 0.0069 0.0131 0.0247 0.0124 0.0335)
Tiny ball crack | 16 | (0.0288 0.0864 0.0804 0.0198 0.0646 0.0234 0.0125 0.0075 0.0141 0.0027 0.0139 0.0011 0.0174 0.0185 0.0214)
Tiny ball crack | 17 | (0.0020 0.0828 0.0375 0.0092 0.0379 0.0007 0.0115 0.0194 0.0335 0.0314 0.0195 0.0067 0.0055 0.0196 0.0418)
Tiny ball crack | 18 | (0.0191 0.1018 0.0686 0.0242 0.0481 0.0095 0.0225 0.0024 0.0136 0.0221 0.0103 0.0013 0.0013 0.0272 0.0389)

Table 3
Partial masses of basic belief assignment and partial diagnosis results of weak faults obtained by OET-KNNC (α = 0.95, k = 14; the first 4 columns of m contain the probability values m({ωq}) given to each class, and the last column contains m(Ω)).

Fault pattern | Sample No. | Optimized γq | Basic belief assignment m | Actual output c | Expected output c′ | Correct? | ηi | η
Normal state | 15 | γ1 = 22.6670 | (0.8347 0 0.1632 0 0.0021) | 1 | 1 | Yes | η1 = 86.67% | 90.83%
Normal state | 16 | | (0.9909 0 0.0086 0 0.0004) | 1 | 1 | Yes | |
Normal state | 17 | | (0.9765 0 0.0229 0 0.0006) | 1 | 1 | Yes | |
Normal state | 18 | | (0.9970 0 0.0019 0 0.0011) | 1 | 1 | Yes | |
Tiny outer race crack | 15 | γ2 = 2.8203 | (0 0.9987 0 0 0.0013) | 2 | 2 | Yes | η2 = 93.33% |
Tiny outer race crack | 16 | | (0 0.7375 0.0047 0.0249 0.2329) | 2 | 2 | Yes | |
Tiny outer race crack | 17 | | (0 0.4818 0.0036 0.1686 0.3461) | 2 | 2 | Yes | |
Tiny outer race crack | 18 | | (0 0.8988 0 0.0012 0.0999) | 2 | 2 | Yes | |
Tiny inner race crack | 15 | γ3 = 15.8900 | (0.0194 0 0.8832 0 0.0975) | 3 | 3 | Yes | η3 = 83.33% |
Tiny inner race crack | 16 | | (0.0131 0 0.8056 0.1158 0.0655) | 3 | 3 | Yes | |
Tiny inner race crack | 17 | | (0.0003 0.8418 0.1066 0 0.0514) | 2 | 3 | No | |
Tiny inner race crack | 18 | | (0.0390 0 0.7471 0.1713 0.0426) | 3 | 3 | Yes | |
Tiny ball crack | 15 | γ4 = 7.8445 | (0.0346 0 0.0789 0.8817 0.0047) | 4 | 4 | Yes | η4 = 100.00% |
Tiny ball crack | 16 | | (0.0003 0 0.0031 0.9653 0.0313) | 4 | 4 | Yes | |
Tiny ball crack | 17 | | (0.0000 0 0.0001 0.9992 0.0007) | 4 | 4 | Yes | |
Tiny ball crack | 18 | | (0 0 0.0035 0.9798 0.0167) | 4 | 4 | Yes | |

Table 4
Comparison results of weak fault diagnosis accuracy obtained by three feature extraction methods.

Feature extraction method | η1, normal state (%) | η2, tiny outer race crack (%) | η3, tiny inner race crack (%) | η4, tiny ball crack (%) | Average η (%)
SMI + SOLFDA | 86.67 | 93.33 | 83.33 | 100.00 | 90.83
AR model coefficients | 30.00 | 100.00 | 66.67 | 30.00 | 56.67
Instantaneous amplitude Shannon entropies of IMFs from EMD | 50.00 | 100.00 | 46.67 | 50.00 | 61.67

Table 5
Comparison results of weak fault diagnosis accuracy achieved by six feature reduction methods (estimate of d: d = 15).

Algorithm | η1, normal state (%) | η2, tiny outer race crack (%) | η3, tiny inner race crack (%) | η4, tiny ball crack (%) | Average η (%)
SOLFDA | 86.67 | 93.33 | 83.33 | 100.00 | 90.83
LFDA | 73.33 | 80.00 | 73.33 | 93.33 | 80.00
LPP | 56.67 | 63.33 | 30.00 | 26.67 | 44.17
ONPE | 70.00 | 83.33 | 63.33 | 100.00 | 79.17
LDA | 13.33 | 3.33 | 86.67 | 20.00 | 30.83
PCA | 40.00 | 93.33 | 60.00 | 36.67 | 57.50

Table 6
Comparison results of weak fault diagnosis accuracy among seven pattern recognition methods.

Method | η1, normal state (%) | η2, tiny outer race crack (%) | η3, tiny inner race crack (%) | η4, tiny ball crack (%) | Average η (%)
OET-KNNC | 86.67 | 93.33 | 83.33 | 100.00 | 90.83
ET-KNNC | 76.67 | 100.00 | 76.67 | 93.33 | 86.67
KNNC | 73.33 | 80.00 | 66.67 | 86.67 | 76.67
MWSVM | 50.00 | 76.67 | 83.33 | 73.33 | 70.83
RBF-SVM | 0 | 46.67 | 80.00 | 80.00 | 51.67
FCM | 0 | 43.33 | 73.33 | 66.67 | 45.83
CHMM | 30.00 | 96.67 | 56.67 | 66.67 | 62.50

As is known, LFDA only uses local structure information to represent the local geometry of the essential manifold structure, but ignores the supervisory role of the class information of the training samples in classification. Worse still, the basis vectors output by LFDA are partly correlated with one another, so the features it extracts contain redundancy that distorts the feature distribution. Consequently, the classification performance of LFDA is clearly limited: as Table 5 shows, the recognition accuracy rates η1 and η3 obtained by dimension reduction with LFDA are only 73.33%. Similarly, LPP and ONPE only take local structure into consideration and do not exploit class information, and the basis vectors of LPP are also statistically correlated. Hence, when LPP is used for dimension reduction, the 15-dimensional eigenvectors of the weak fault samples mix with one another, and the fault diagnosis accuracy rates η1, η2, η3 and η4 are only 56.67%, 63.33%, 30.00% and 26.67%, respectively. As regards ONPE, its fault recognition accuracy rates η1 and η3 are not high enough either. As a linear feature reduction method, LDA only considers the global Euclidean measure of the data, without normalization processing, and therefore cannot effectively recognize the normal state and the weak fault patterns. Moreover, PCA fails to accurately identify the normal state and the three weak faults because of its poor dynamic processing performance under non-Gaussian, nonlinear and nonstationary signal conditions. By contrast, after dimension reduction of the 80-dimensional SMI feature sets with SOLFDA, both the clustering performance and the discrimination of the 15-dimensional eigenvectors are greatly improved for the samples in the normal state and with weak fault patterns, and the recognition accuracy rates ηi (i = 1,2,3,4) increase notably. To sum up, SOLFDA, endowed with a supervised learning mechanism and a decorrelation constraint, has superior classification performance compared with LFDA, LPP, ONPE, LDA and PCA.

Thirdly, the pattern recognition precision of OET-KNNC was compared with that of ET-KNNC, KNNC, the Morlet wavelet support vector machine (MWSVM) [36], the radial basis kernel function support vector machine (RBF-SVM) [37], FCM and CHMM. The neighbor numbers of ET-KNNC and KNNC were set to k = 14, the same as that of OET-KNNC. In MWSVM, the parameters were set as follows: penalty factor γ = 1, kernel parameters c = 1 and ω0 = 1.75 [36]. In RBF-SVM, the penalty factor was set to γ = 1 and the RBF kernel parameter to σ = 10 [14,30,37]. For FCM, the parameters were set as: iteration number t = 200 and fuzzy weighting index m = 6. The parameters of CHMM were set as: number of hidden states K = 4, maximum number of iterations nmax = 20 and convergence error (i.e., termination tolerance) e = 1 × 10⁻⁴. Table 6 presents the pattern recognition results of OET-KNNC, ET-KNNC, KNNC, MWSVM, RBF-SVM, FCM and CHMM after the 80-dimensional SMI feature sets are reduced by SOLFDA. From Table 6, it can clearly be seen that the weak fault recognition results of ET-KNNC, KNNC, MWSVM, RBF-SVM, FCM and CHMM are not satisfactory. For example, the recognition accuracy rates of the normal state (η1) and the tiny outer race crack (η2) obtained by FCM are only 0 (< 50%) and 43.33% (< 50%), respectively. The recognition accuracy rate η1 of the normal state obtained by CHMM is also just


Fig. 5. Training iteration curve of four CHMMs corresponding to four weak faults in one testing sample.

Fig. 6. Average recognition accuracy rate as a function of k for OET-KNNC, ET-KNNC and KNNC.

Fig. 7. Time-domain waveforms and frequency-domain amplitude spectrums of testing signals: (a) time-domain waveform of outer race crack; (b) amplitude spectrums of outer race crack; (c) time-domain waveform of inner race crack; (d) amplitude spectrums of inner race crack; (e) time-domain waveform of ball crack; and (f) amplitude spectrums of ball crack.


30.00% (< 50%), although CHMM has a rapid convergence speed (see Fig. 5). In contrast, the identification accuracy of OET-KNNC is much higher than that of the other six methods. That is to say, OET-KNNC, which determines near-optimal or optimal parameter values γ = (γ1,⋯,γq)ᵀ from the reduced eigenvectors by minimizing the mean squared error function in Eq. (37), can significantly improve the classification accuracy compared with ET-KNNC and the other methods. In general, the most distinctive feature of OET-KNNC is its robustness with regard to the neighbor number under consideration. As Fig. 6 shows, the evidence-theoretic rule with optimized γq always performs as well as or better than ET-KNNC and KNNC. This can significantly increase the average recognition accuracy rate, reaching 90.83% when we consider the best results for 1 ≤ k ≤ 30. By optimizing γq, OET-KNNC learns to discard the neighbors far away from the testing sample. This property is of great importance because it saves users from exhaustively optimizing the value of k.

The application of the proposed method to the weak fault diagnosis of 6205-2RS deep groove ball bearings confirmed the good characteristics of the comprehensive combination of the SMI feature set, SOLFDA and OET-KNNC. Their complementary advantages achieved high diagnosis precision for the normal state and the weak faults. Furthermore, this example verified the vital role of SOLFDA in dimension reduction and pattern classification.

6.1. Fault diagnosis of bearing fault dataset

The proposed method is applied to the bearing fault dataset obtained from the Bearing Data Center Website of Case Western Reserve University [38]. The outer race, inner race and ball of a deep groove ball bearing (SKF6205-2RS JEM) have a single-point crack fault created by electro-discharge machining. The crack size is 0.007″ (0.178 mm) in diameter and 0.011″ (0.279 mm) in depth for the outer race, inner race and ball. The shaft rotational frequency fr is 29.53 Hz (1772 rpm). Vibration data were collected using accelerometers attached to the housing with magnetic bases. The sampling frequency was 12 kHz. For each crack fault, 14 samples were used and each sample has 8192 data points. From the 14 samples, we randomly selected just $n_{\ell_i} = 3$ as training samples, and the remaining 11 were all used as testing samples. Fig. 7 shows the time-domain waveforms and amplitude spectrums of the testing samples. After the training and testing samples were put into the proposed method for fault diagnosis, we obtained the d-dimensional eigenvectors of the testing samples by dimension reduction with SOLFDA and the final fault diagnosis results of the testing samples from OET-KNNC, as shown in Tables 7 and 8, respectively.

Table 7
d-dimensional eigenvectors achieved by fault feature reduction with SOLFDA (estimate of d: d = 5).

Fault pattern | Sample No. | d-dimensional eigenvector y achieved by dimension reduction with SOLFDA
Outer race crack | 4 | (0.0271 0.4455 0.0183 0.0438 0.0232)
Outer race crack | 5 | (0.0124 0.4675 0.0208 0.0197 0.0116)
Outer race crack | 6 | (0.0019 0.3908 0.0147 0.1190 0.0366)
Outer race crack | 7 | (0.0090 0.3906 0.0974 0.1478 0.0530)
Inner race crack | 4 | (0.2005 0.1902 0.0050 0.0464 0.0152)
Inner race crack | 5 | (0.1551 0.2843 0.0868 0.0055 0.0118)
Inner race crack | 6 | (0.2698 0.2957 0.0293 0.0147 0.0250)
Inner race crack | 7 | (0.3143 0.2740 0.0025 0.0382 0.0155)
Ball crack | 4 | (0.1989 0.1350 0.1923 0.0571 0.1172)
Ball crack | 5 | (0.2648 0.2110 0.0151 0.0202 0.1398)
Ball crack | 6 | (0.1667 0.1378 0.0471 0.0023 0.0853)
Ball crack | 7 | (0.1210 0.1388 0.0096 0.0766 0.0341)

Table 8
Partial masses of basic belief assignment and partial fault diagnosis results obtained by OET-KNNC (α = 0.95, k = 4; the first 3 entries of m contain the probability values m({ωq}) given to each class, and the last entry contains m(Ω)).

Fault pattern    | Sample No. | Optimized γq  | Basic belief assignment m  | Actual output c | Expected output c' | Correct? | Recognition accuracy rate ηi
-----------------|------------|---------------|----------------------------|-----------------|--------------------|----------|------------------------------
Outer race crack | 4          | γ1 = 3.8985   | (0.8874 0.0000 0 0.1126)   | 1               | 1                  | Yes      | η1 = 100.00%
                 | 5          |               | (0.8953 0.0000 0 0.1047)   | 1               | 1                  | Yes      |
                 | 6          |               | (0.7812 0.0000 0 0.2187)   | 1               | 1                  | Yes      |
                 | 7          |               | (0.8687 0.0000 0 0.1313)   | 1               | 1                  | Yes      |
Inner race crack | 4          | γ2 = 12.7212  | (0.0714 0.3986 0 0.5301)   | 2               | 2                  | Yes      | η2 = 90.91%
                 | 5          |               | (0.2879 0.1404 0 0.5717)   | 1               | 2                  | No       |
                 | 6          |               | (0.0325 0.8481 0 0.1195)   | 2               | 2                  | Yes      |
                 | 7          |               | (0 0.7963 0.0901 0.1136)   | 2               | 2                  | Yes      |
Ball crack       | 4          | γ3 = 2.6997   | (0 0.0001 0.4646 0.5353)   | 3               | 3                  | Yes      | η3 = 81.82%
                 | 5          |               | (0 0.1725 0.2960 0.5315)   | 3               | 3                  | Yes      |
                 | 6          |               | (0 0.0229 0.2710 0.7060)   | 3               | 3                  | Yes      |
                 | 7          |               | (0.0819 0.0077 0 0.9104)   | 1               | 3                  | No       |

Average recognition accuracy rate η = 90.91%.
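The basic belief assignments in Table 8 can be read directly as classification decisions. A minimal worked example, assuming the output class is simply the one carrying the largest singleton mass while m(Ω) expresses residual ignorance:

```latex
% Inner race crack, sample No. 5 (the misclassified sample in that class):
m(\{\omega_1\}) = 0.2879,\quad m(\{\omega_2\}) = 0.1404,\quad
m(\{\omega_3\}) = 0,\quad m(\Omega) = 0.5717,
\qquad \hat{c} = \arg\max_{q} m(\{\omega_q\}) = 1 \neq c' = 2 .
```

The large mass m(Ω) = 0.5717 flags this sample as only weakly supported by the evidence, consistent with it being the single error that holds η2 at 90.91%.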


Table 9
Comparison results of fault diagnosis accuracy obtained by three feature extraction methods.

Feature extraction method                                                                                      | Estimate of d | η1, outer race (%) | η2, inner race (%) | η3, ball (%) | Average η (%)
---------------------------------------------------------------------------------------------------------------|---------------|--------------------|--------------------|--------------|--------------
SMI + SOLFDA (neighbor number of SOLFDA k = d)                                                                  | d = 5         | 100.00             | 90.91              | 81.82        | 90.91
AR model coefficients (AR model order na = 8)                                                                   | d = 2         | 100.00             | 72.73              | 63.64        | 78.79
Instantaneous amplitude Shannon entropies of intrinsic mode functions deduced from EMD (minimum number of IMFs Nmin = 9) | d = 2 | 100.00             | 72.73              | 72.73        | 81.82

Table 10
Comparison results of fault diagnosis accuracy achieved by six feature reduction methods (estimate of d: d = 5; neighbor number k = d for each algorithm).

Algorithm | η1, outer race (%) | η2, inner race (%) | η3, ball (%) | Average η (%)
----------|--------------------|--------------------|--------------|--------------
SOLFDA    | 100.00             | 90.91              | 81.82        | 90.91
LFDA      | 100.00             | 81.82              | 72.73        | 84.85
LPP       | 45.45              | 100.00             | 72.73        | 72.73
ONPE      | 81.82              | 100.00             | 72.73        | 84.85
LDA       | 18.18              | 90.91              | 9.09         | 39.39
PCA       | 81.82              | 63.64              | 90.91        | 78.79

Table 11
Comparison results of fault diagnosis accuracy among seven pattern recognition methods.

Method                                                                                                 | η1, outer race (%) | η2, inner race (%) | η3, ball (%) | Average η (%)
--------------------------------------------------------------------------------------------------------|--------------------|--------------------|--------------|--------------
OET-KNNC (neighbor number k = 4)                                                                        | 100.00             | 90.91              | 81.82        | 90.91
ET-KNNC (neighbor number k = 4)                                                                         | 100.00             | 81.82              | 72.73        | 84.85
KNNC (neighbor number k = 4)                                                                            | 100.00             | 100.00             | 9.09         | 69.70
MWSVM (penalty factor γ = 1, kernel parameters c = 1 and ω0 = 1.75)                                     | 63.64              | 90.91              | 36.36        | 63.64
RBF-SVM (penalty factor γ = 1, kernel parameter σ = 10)                                                 | 27.27              | 81.82              | 100.00       | 69.70
FCM (iteration times t = 200, fuzzy weighted index m = 6)                                               | 100.00             | 63.64              | 63.64        | 75.76
CHMM (number of states K = 3, maximum number of iterations nmax = 20, convergence error e = 1×10⁻⁴)     | 0                  | 0                  | 0            | 0

The total time that the proposed method takes to recognize the three faults is only 9.701232 s. Table 9 compares the diagnosis accuracy of the SMI-based fault feature set with SOLFDA dimension reduction (SMI + SOLFDA) against two other feature extraction methods: AR model coefficients and instantaneous amplitude Shannon entropies of IMFs. In Table 10, the feature reduction effect of SOLFDA is compared with that of LFDA, LPP, ONPE, LDA and PCA. The recognition accuracies of OET-KNNC, ET-KNNC, KNNC, MWSVM, RBF-SVM, FCM and CHMM are compared in Table 11. All these comparison results demonstrate the specific advantages of the proposed weak fault diagnosis method (SMI + SOLFDA + OET-KNNC) over the traditional methods.

7. Concluding remarks

A novel weak fault diagnosis method for rotating machinery has been proposed based on dimension reduction with Supervised Orthogonal Local Fisher Discriminant Analysis (SOLFDA), and the following conclusions can be drawn from our analysis and discussion:

(1) Shannon mutual information (SMI) can accurately measure the relevance between testing samples and training samples, so that a data set consisting of normalized SMI values can be constructed to represent the dependence of the former on the latter. It is therefore well suited to representing the weak fault features of testing samples.

(2) SOLFDA maximizes the between-manifold divergence and minimizes the within-manifold divergence under the supervision of class labels, which yields better discrimination ability. In addition, an orthogonality constraint is imposed to make the output eigenvectors statistically uncorrelated. SOLFDA can therefore extract essential but weak fault features effectively when compressing the high-dimensional SMI feature set.

(3) OET-KNNC minimizes the mean squared error function to determine near-optimal or optimal parameter values. This refinement substantially improves the classification accuracy and robustness of the original ET-KNNC.

(4) By combining the advantages of SMI feature sets in weak fault feature construction, SOLFDA in weak fault information compression, and OET-KNNC in classification decision, the proposed method successfully improves the weak fault diagnosis precision for rotating machinery, including deep groove ball bearings.
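To make conclusion (2) concrete, the sketch below shows one generic way such a projection can be computed: local scatter matrices weighted by a same-class heat-kernel affinity, a generalized eigenproblem for the discriminative directions, and orthogonalization of the resulting basis. This is a schematic reconstruction under those assumptions, not the authors' exact SOLFDA formulation, whose weighting scheme and orthogonalization procedure are defined in the earlier sections:

```python
import numpy as np
from scipy.linalg import eigh, qr

def lfda_like_projection(X, y, d=5, knn=5):
    """Generic local Fisher-style reduction with an orthogonal output basis."""
    n, D = X.shape
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    # Heat-kernel affinity restricted to same-class k-nearest neighbors.
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(dist[i])[1:knn + 1]:
            if y[i] == y[j]:
                A[i, j] = A[j, i] = np.exp(-dist[i, j] ** 2)
    Sw = np.zeros((D, D))
    Sb = np.zeros((D, D))
    for i in range(n):
        for j in range(n):
            diff = np.outer(X[i] - X[j], X[i] - X[j])
            if y[i] == y[j]:
                Sw += A[i, j] * diff            # pull same-class neighbors together
            else:
                Sb += diff                      # push different classes apart
    # Discriminative directions: maximize w'Sb w subject to w'Sw w = 1.
    evals, evecs = eigh(Sb, Sw + 1e-6 * np.eye(D))
    W = evecs[:, np.argsort(evals)[::-1][:d]]   # top-d generalized eigenvectors
    Q, _ = qr(W, mode='economic')               # enforce an orthogonal basis
    return Q                                    # project a feature set with X @ Q
```

Projecting the SMI feature set with X @ Q would then yield low-dimensional, mutually orthogonal eigenvectors analogous to those listed in Table 7.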

Acknowledgments

This research was financially supported by the National Natural Science Foundation of China (No. 51305283), the State Scholarship Fund of China (No. 201406245021) and the Research Fund for the Doctoral Program of Higher Education of China (No. 20120181130012). The authors would also like to acknowledge the insightful and valuable comments from the anonymous reviewers.

References

[1] H.F. Tang, J. Chen, G.M. Dong, Sparse representation based latent components analysis for machinery weak fault detection, Mech. Syst. Signal Process. 46 (2) (2014) 373–388.
[2] Y. Wang, G.H. Xu, L. Liang, K.S. Jiang, Detection of weak transient signals based on wavelet packet transform and manifold learning for rolling element bearing fault diagnosis, Mech. Syst. Signal Process. 54–55 (2015) 259–276.
[3] H.C. Wang, J. Chen, G.M. Dong, Feature extraction of rolling bearing's early weak fault based on EEMD and tunable Q-factor wavelet transform, Mech. Syst. Signal Process. 48 (1–2) (2014) 103–119.
[4] J.P. Burg, Maximum Entropy Spectral Analysis, Ph.D. Dissertation, Department of Geophysics, Stanford University, California, 1975.
[5] B.P. Tang, W.Y. Liu, T. Song, Wind turbine fault diagnosis based on Morlet wavelet transformation and Wigner–Ville distribution, Renew. Energy 35 (12) (2010) 2862–2866.
[6] J.H. Yan, L. Lu, Improved Hilbert–Huang transform based weak signal detection methodology and its application on incipient fault diagnosis and ECG signal analysis, Signal Process. 98 (2014) 74–87.
[7] Y. Qin, B.P. Tang, J.X. Wang, Higher-density dyadic wavelet transform and its application, Mech. Syst. Signal Process. 24 (3) (2010) 823–834.
[8] S. Yin, S.X. Ding, X.C. Xie, H. Luo, A review on basic data-driven approaches for industrial process monitoring, IEEE Trans. Ind. Electron. 61 (11) (2014) 6418–6428.
[9] S. Yin, X.W. Li, H.J. Gao, O. Kaynak, Data-based techniques focused on modern industry: an overview, IEEE Trans. Ind. Electron. 62 (1) (2015) 657–667.
[10] S. Yin, S.X. Ding, A. Haghani, H.Y. Hao, P. Zhang, A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process, J. Process Control 22 (9) (2012) 1567–1581.
[11] S. Yin, X.P. Zhu, O. Kaynak, Improved PLS focused on key-performance-indicator-related fault diagnosis, IEEE Trans. Ind. Electron. 62 (3) (2015) 1651–1658.

[12] Y.G. Lei, Z.J. He, Y.Y. Zi, Q. Hu, Fault diagnosis of rotating machinery based on multiple ANFIS combination with GAs, Mech. Syst. Signal Process. 21 (5) (2007) 2280–2294.
[13] P. Boskoski, D. Juricic, Fault detection of mechanical drives under variable operating conditions based on wavelet packet Renyi entropy signatures, Mech. Syst. Signal Process. 31 (2012) 369–381.
[14] F. Li, B.P. Tang, R.S. Yang, Rotating machine fault diagnosis using dimension reduction with linear local tangent space alignment, Measurement 46 (8) (2013) 2525–2539.
[15] C.O. Sakar, O. Kursun, A method for combining mutual information and canonical correlation analysis: predictive mutual information and its use in feature selection, Expert Syst. Appl. 39 (3) (2012) 3333–3344.
[16] H. Huang, J.M. Liu, H.L. Feng, T.D. He, Ear recognition based on uncorrelated local Fisher discriminant analysis, Neurocomputing 74 (17) (2011) 3103–3113.
[17] A. Hyvarinen, E. Oja, Independent component analysis: algorithms and applications, Neural Netw. 13 (4–5) (2000) 411–430.
[18] T.F. Cox, M.A. Cox, Multi-Dimensional Scaling, Chapman & Hall, London, 1994.
[19] W.K. Wong, H.T. Zhao, Supervised optimal locality preserving projection, Pattern Recognit. 45 (1) (2012) 186–197.
[20] M. Li, J.W. Xu, J.H. Yang, D.B. Yang, D.D. Wang, Multiple manifolds analysis and its application to fault diagnosis, Mech. Syst. Signal Process. 23 (8) (2009) 2500–2509.
[21] H. Huang, J.W. Li, J.M. Liu, Enhanced semi-supervised local Fisher discriminant analysis for face recognition, Future Gener. Comput. Syst. 28 (1) (2012) 244–253.
[22] M. Vlachos, C. Domeniconi, D. Gunopulos, Non-linear dimensionality reduction techniques for classification and visualization, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, Edmonton, NY, 2002, pp. 645–651.
[23] X.M. Liu, J.W. Yin, Z.L. Feng, J.X. Dong, L. Wang, Orthogonal neighborhood preserving embedding for face recognition, in: Proceedings of the IEEE International Conference on Image Processing, ICIP 2007, New York, USA, 2007, pp. 133–136.
[24] M. Sugiyama, Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis, J. Mach. Learn. Res. 8 (2007) 1027–1061.
[25] C. Xu, P.L. Zhang, G.Q. Ren, J.P. Fu, Engine wear fault diagnosis based on improved semi-supervised fuzzy C-means clustering, J. Mech. Eng. 47 (17) (2011) 55–60 (in Chinese).
[26] S. Yin, Z.H. Huang, Performance monitoring for vehicle suspension system via fuzzy positivistic C-means clustering based on accelerometer measurements, IEEE/ASME Trans. Mechatron. (2014) 1–8.
[27] A. Antonucci, R.D. Rosa, A. Giusti, F. Cuzzolin, Robust classification of multivariate time series by imprecise hidden Markov models, Int. J. Approx. Reason. 56 (2015) 249–263.
[28] Y.L. Zhou, C.X. Liu, P. Zhao, B. Sun, W.P. Hong, Fault diagnosis methods for centrifugal pump based on autoregressive and continuous hidden Markov model, Proc. CSEE 28 (20) (2008) 88–93 (in Chinese).
[29] J.G. Park, K.J. Kim, Design of a visual perception model with edge-adaptive Gabor filter and support vector machine for traffic sign detection, Expert Syst. Appl. 40 (9) (2013) 3679–3687.
[30] F. Li, J.X. Wang, B.P. Tang, D.Q. Tian, Life grade recognition method based on supervised uncorrelated orthogonal locality preserving projection and K-nearest neighbor classifier, Neurocomputing 138 (2014) 271–282.
[31] S.B. Tan, An effective refinement strategy for KNN text classifier, Expert Syst. Appl. 30 (2) (2006) 290–298.
[32] T. Denoeux, A k-nearest neighbor classification rule based on Dempster–Shafer theory, IEEE Trans. Syst. Man Cybern. 25 (5) (1995) 804–813.
[33] L.M. Zouhal, T. Denoeux, An evidence-theoretic k-NN rule with parameter optimization, IEEE Trans. Syst. Man Cybern. 28 (2) (1998) 263–271.
[34] P. Smets, The combination of evidence in the transferable belief model, IEEE Trans. Pattern Anal. Mach. Intell. 12 (5) (1990) 447–458.
[35] D. de Ridder, R.P.W. Duin, Locally linear embedding for classification, Technical Report PH-2002-01, Delft University of Technology, Delft, The Netherlands, 2002.
[36] B.P. Tang, F. Li, Y. Qin, Fault diagnosis model based on feature compression with orthogonal locality preserving projection, Chin. J. Mech. Eng. 24 (5) (2011) 897–904.
[37] L. Deng, X.L. Hu, F. Li, B.P. Tang, Support vector machines-based method for restraining end effects of B-spline empirical mode decomposition, J. Vibr. Measur. Diagn. 31 (3) (2011) 344–347+398 (in Chinese).
[38] 〈http://csegroups.case.edu/bearingdatacenter/pages/welcome-case-western-reserve-university-bearing-data-center-website〉.

Feng Li received his M.Sc. and Ph.D. from Chongqing University in 2008 and 2011, respectively. He is now a lecturer in the School of Manufacturing Science and Engineering at Sichuan University. His research interests are equipment condition monitoring and fault diagnosis.

Jiaxu Wang is now a professor and doctoral supervisor in the School of Aeronautics and Astronautics at Sichuan University. His main research interests cover machinery tribology and reliability design, mechanical and electrical transmission, and intelligent control.

Minking K. Chyu is presently the Leighton and Mary Orr Chair Professor and the inaugural Associate Dean for International Initiatives at the Swanson School of Engineering, as well as the inaugural Dean of the Sichuan University-Pittsburgh Institute (SCUPI) in China. His primary research area lies in thermo-fluid issues relating to power and propulsion systems, material processing, and micro/nano-system technology.

Baoping Tang received his M.Sc. and Ph.D. from Chongqing University in 1996 and 2003, respectively. He is now a professor and doctoral supervisor at Chongqing University. His main research interests cover equipment condition monitoring and fault diagnosis, virtual instruments and wireless sensor networks.