Expert Systems with Applications 38 (2011) 12721–12729
A recognition and novelty detection approach based on Curvelet transform, nonlinear PCA and SVM with application to indicator diagram diagnosis

Kun Feng, Zhinong Jiang *, Wei He, Bo Ma

Diagnosis and Self-recovery Engineering Research Center, Beijing University of Chemical Technology, Beijing 100029, China

* Corresponding author. E-mail: [email protected] (Z. Jiang).
Keywords: 2D-Curvelet transform; Nonlinear PCA; SVM; Novelty detection; Indicator diagram recognition; Reciprocating compressor
Abstract

The indicator diagram plays an important role in the health monitoring and fault diagnosis of reciprocating compressors: different shapes of the indicator diagram indicate different faults. A proper feature extraction and pattern recognition method for indicator diagrams is therefore essential for practical use. In this paper, a novel approach is presented to handle both multi-class indicator diagram recognition and novelty detection. When samples of multiple fault classes are available, the approach performs multi-class fault recognition; otherwise, novelty detection is performed. In this approach, the discrete 2D-Curvelet transform is adopted to extract representative features of the indicator diagram, nonlinear PCA is employed to reduce dimensionality for multi-class recognition, and PCA is used for novelty detection. Finally, multi-class and one-class support vector machines (SVMs) are used as the classifier and the novelty detector, respectively. Experimental results show that the performance of the proposed approach is better than that of the traditional wavelet-based approach.

Crown Copyright © 2011 Published by Elsevier Ltd. All rights reserved.
1. Introduction

The reciprocating compressor is a common machine in the modern chemical industry and an important link in the production chain of the process industry. Faults of a reciprocating compressor can lead to serious accidents, such as explosions and casualties. Hence, health monitoring and fault diagnosis of reciprocating compressors are crucial for reliability and safety in production. The indicator diagram is usually used in the monitoring and fault diagnosis of reciprocating compressors. However, recognition of indicator diagrams requires sophisticated knowledge and is very complicated for end users. In addition, the subjective judgment of the end user can lead to false alarms during diagnosis. Therefore, it is significant to develop an effective approach for automatic indicator diagram novelty detection, recognition and diagnosis.

From the viewpoint of image processing, each indicator diagram is a digital image, i.e. a 2-dimensional (2D) matrix formed by one planar closed curve. Geometrically speaking, the main differences between the curves are contained in the multiple scales and multiple directions of the closed curves (see Section 2.1). Furthermore, the fault indicator diagrams bend towards different directions (see Fig. 2). The first important task is therefore to transform the matrix into clear features which are sensitive to these aforementioned differences.
In the field of image processing, multi-resolution methods such as the short-time Fourier transform and the wavelet transform have been used to handle feature extraction problems (Diekmann et al., 2001; Imaeda et al., 2004; Murray, Gorse, & Thornton, 2002; Xiang et al., 2007). Curvelet transform-based approaches have generated increasing interest in the image processing community over the past years (Candes, Demanet, Donoho, & Ying, 2007; Candes & Donoho, 2004; Donoho & Duncan, 2000; Jean-Luc, Candes, & Donoho, 2002; Li, Yang, & Jiao, 2010; Ma & Plonka, submitted for publication; Moghadas Nejad & Zakeri, 2011). Compared with the short-time Fourier transform and the wavelet transform, the 2D-Curvelet transform is not only a multi-scale but also a multi-direction transform, in which more directions are considered and a more precise representation of the image is achieved than with the traditional 2D-wavelet transform (Candes et al., 2007; Donoho & Duncan, 2000; Ma & Plonka, submitted for publication). Moreover, since the Curvelet transform uses angled polar wedge windows in the frequency domain to resolve directional features, it is useful and effective for representing line-like edges (Candes et al., 2007; Jean-Luc et al., 2002; Ma, 2007). Therefore, the 2D-Curvelet transform is taken as the feature extraction tool for this first task.

However, the result of the 2D-Curvelet transform is a high dimensional feature vector, which, like any other high dimensional feature, would cause the so-called curse of dimensionality (Bishop, 2006; Theodoridis & Koutroumbas, 2003). For this reason, an appropriate dimensionality reduction approach should be used. In the multi-class case, with samples of multiple classes acquired, nonlinear principal component analysis (NLPCA) is adopted to reduce the dimensionality and overcome the class conjunctions
simultaneously (Cho, 2007; Duin, Loog, & Haeb-Umbach, 2000; Widodo & Yang, 2007). In the one-class case, PCA is adopted instead, because only normal samples are available.

The support vector machine (SVM) has become popular for solving problems in classification, regression, and novelty detection (Baccarini, Rocha e Silva, de Menezes, & Caminhas, 2011; Scholkopf, Williamson, Smola, Shawe-Taylor, & Platt, 2000). Compared with traditional classifiers, such as neural networks, the SVM has some better properties. An important one is that the determination of the model parameters corresponds to a convex optimization problem, so any local solution is also a global optimum (Bishop, 2006). Therefore, the multi-class SVM and the one-class SVM are employed in this paper as the multi-class classifier and the novelty detector, respectively.

In a nutshell, our approach adopts the 2D-Curvelet transform (Bishop, 2006; Donoho & Duncan, 2000; Theodoridis & Koutroumbas, 2003) and nonlinear PCA (Bishop, 2006) to extract the features, a multi-class SVM (Franc, 2005; Schölkopf, Smola, & Muller, 1998; Scholkopf et al., 2000) to classify the multi-class indicator diagrams, and a one-class SVM to detect novelty (Franc, 2005; Scholkopf et al., 2000). Experimental investigations are conducted to verify the feasibility and advantages of this approach. A pressure sensor and a tachometer are mounted to measure the indicator diagram, which is converted to a digital image through a PC-based data acquisition system. With the measured digital indicator diagrams, the approach is verified and compared with the traditional wavelet-based approach.
2. Indicator diagram feature extraction based on 2D-Curvelet transform and nonlinear PCA

2.1. Indicator diagram and its properties

From the geometrical viewpoint, the indicator diagram of a reciprocating compressor is a closed 2D curve (Huagen, Ziwen, & Pengcheng, 2004). Its X-coordinate is the volume V swept by the trajectory of the compressor piston, and its Y-coordinate is the pressure P inside the cylinder. The indicator diagram is therefore also called the P–V graph (Wu, Peng, Xing, & Shu, 2004). Fig. 1 shows an example of the indicator diagram. When different mechanical faults occur, indicator diagrams show different shapes due to the changed dynamic properties of the reciprocating compressor. Fig. 2 illustrates the corresponding indicator diagrams of five typical faults (see Section 3).
From Fig. 2, we can roughly see that the main differences between the faults are contained in the multiple scales and multiple directions. For example, the Fault 1 curve and the normal curve are roughly the same except for a small difference in the top part of the curves. In contrast, the Fault 2 curve differs from the normal one in the right and bottom lines. The Fault 3 curve differs from the normal one in the position and shape of the upper right corner, while Fault 4 differs from the normal one in the position and shape of the lower left corner. It is also clear that the curves of different fault indicator diagrams bend towards different directions; for example, Fault 1 bends up, whereas Fault 4 bends left and down. In a word, the differences among all these indicator diagrams are subtle, directional and located at different positions. These differences are the key to recognition and novelty detection. For this reason, the extracted features should be sensitive to the multi-scale and multi-directional differences of the shapes. As the Curvelet transform uses angled polar wedges or angled trapezoid windows in the frequency domain to extract directional features, it is very efficient in representing line-like edges (Candes & Donoho, 2004; Donoho & Duncan, 2000; Ma, Antoniadis, & Le Dimet, 2006; Starck, Donoho, & Candes, 2003). As a result, it is well suited to the feature extraction of indicator diagrams.

2.2. 2D-Curvelet transform

In 1999, Candes and Donoho introduced the Curvelet transform, a multi-scale representation suited for objects which are smooth away from discontinuities across curves (Donoho & Duncan, 2000). Unlike the wavelet transform, it has directional parameters, and the Curvelet pyramid contains elements with a very high degree of directional specificity. The transform was designed to represent edges and other singularities along curves much more efficiently than conventional transforms such as the short-time Fourier transform and the wavelet transform (Candes & Donoho, 2004; Donoho & Duncan, 2000). All of these properties are very stimulating and have already led to a wide range of interesting applications. In 2005, a new mathematical architecture, the Fast Discrete Curvelet Transform (FDCvT), which suggests innovative algorithmic strategies and improves upon earlier implementations, was proposed by Candes and Demanet (Candes, Donoho, Demanet, & Ying, xxxx; Starck et al., 2003). Two different algorithms were developed to implement the FDCvT. The FDCvT is therefore adopted here, and a short review is given below.

2.2.1. Continuous-time Curvelet transform (Candes et al., 2007)

Definitions. Mother Curvelet: $\varphi_j(x) = \mathrm{IFT}\{U_j(\omega)\}$, where the frequency window $U_j(\omega)$ is defined in the Fourier domain by
$$U_j(r,\theta) = 2^{-3j/4}\, W(2^{-j} r)\, V\!\left(\frac{2^{\lfloor j/2 \rfloor}\,\theta}{2\pi}\right) \qquad (1)$$

where $\lfloor j/2 \rfloor$ is the integer part of $j/2$. Meanwhile, $W(r)$ (radial window) and $V(t)$ (angular window) are window functions. These are smooth, nonnegative, and real-valued, with $W(r)$ taking positive real arguments and supported on $r \in (1/2, 2)$ and $V(t)$ taking real arguments and supported on $t \in [-1, 1]$. These windows always obey the admissibility conditions:

$$\sum_{j=-\infty}^{\infty} W^2(2^{j} r) = 1, \qquad r \in (3/4, 3/2) \qquad (2)$$

$$\sum_{l=-\infty}^{\infty} V^2(t - l) = 1, \qquad t \in (-1/2, 1/2) \qquad (3)$$

Fig. 1. An example of the indicator diagram (P/MPa versus V/cm³).
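As a point of clarification drawn from the curvelet literature cited above (not stated explicitly in this paper), the window of Eq. (1) occupies a polar wedge obeying the parabolic scaling relation, so that in the spatial domain each curvelet has an elongated effective support with

$$\text{length} \approx 2^{-j/2}, \qquad \text{width} \approx 2^{-j}, \qquad \text{i.e.} \quad \text{width} \approx \text{length}^2,$$

which is why curvelets represent curved, line-like edges so efficiently.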
Fig. 2. Five typical faults' indicator diagrams smoothed by the Savitzky–Golay filter (see Section 3.1). Each of the five panels compares the normal curve with one of Fault 1–Fault 5 (P/MPa versus V/cm³).
The equispaced sequence of rotation angles: $\theta_l = 2\pi \cdot 2^{-\lfloor j/2 \rfloor} \cdot l$, with $l = 0, 1, 2, \ldots$ The sequence of translation parameters: $k = (k_1, k_2) \in \mathbb{Z}^2$. Curvelets, as functions of $x = (x_1, x_2)$ at scale $2^{-j}$, orientation $\theta_l$ and position $x_k^{(j,l)} = R_{\theta_l}^{-1}(k_1 \cdot 2^{-j},\, k_2 \cdot 2^{-j/2})$, are defined as

$$\varphi_{j,l,k}(x) = \varphi_j\!\left(R_{\theta_l}\,(x - x_k^{(j,l)})\right) \qquad (4)$$

where $R_\theta$ is the rotation by $\theta$ radians. The Curvelet coefficients are defined as

$$c(j,l,k) = \langle f, \varphi_{j,l,k} \rangle = \int_{\mathbb{R}^2} f(x)\, \overline{\varphi_{j,l,k}(x)}\, dx \qquad (5)$$

where $f \in L^2(\mathbb{R}^2)$ and $\langle \cdot, \cdot \rangle$ denotes the inner product. This inner product can be expressed as an integral over the frequency plane:

$$c(j,l,k) = \frac{1}{(2\pi)^2} \int \hat{f}(\omega)\, \overline{\hat{\varphi}_{j,l,k}(\omega)}\, d\omega = \frac{1}{(2\pi)^2} \int \hat{f}(\omega)\, U_j(R_{\theta_l}\omega)\, e^{\,i\langle x_k^{(j,l)},\, \omega\rangle}\, d\omega \qquad (6)$$
2.2.2. Discrete Curvelet transform

The discrete transform is linear and takes as input Cartesian arrays of the form $f[t_1, t_2]$, $0 \le t_1, t_2 < n$, providing the discrete Curvelet coefficients (DCvCs) obtained by the analog-to-digital conversion of Eq. (5):

$$C^D(j,l,k) = \sum_{0 \le t_1, t_2 < n} f[t_1, t_2]\, \overline{\varphi^D_{j,l,k}[t_1, t_2]} \qquad (7)$$

where each $\varphi^D_{j,l,k}$ is a digital Curvelet waveform (D stands for "digital") which is implicitly defined by the fast algorithms. For the discrete Curvelet transform, the basic digital tiling and a typical wedge are illustrated in Fig. 3 (Candes et al., 2007). The FDCvT by unequispaced FFTs (USFFT) can be expressed as

$$C^D(j,l,k) = \sum_{n_1, n_2 \in P_j} \hat{f}[n_1,\, n_2 - n_1 \tan\theta_l]\, \tilde{U}_j[n_1, n_2]\, e^{\,i 2\pi (k_1 n_1 / L_{1,j} + k_2 n_2 / L_{2,j})} \qquad (8)$$

Fig. 3. The basic digital tiling. The shaded region represents one typical wedge (Candes et al., 2007).
2.2.3. An example of an indicator diagram's FDCvT

In this work, every indicator diagram is a 256 × 256 gray-level image, to which a 5-level FDCvT is applied. The numbers of wedges at the five scales are 1, 32, 32, 64, and 1. We extract one feature from each wedge; hence there are 130 (1 + 32 + 32 + 64 + 1) wedge energy values, which are the features of each indicator diagram. The DCvCs are shown in Fig. 4 and all the features extracted from the DCvCs are shown in Fig. 5.

Fig. 4. An example of the log of the DCvCs of an indicator diagram.
2.2.4. Features extracted by FDCvT

Since the DCvCs represent the indicator diagram at different scales and in different wedges, the energy of the DCvCs in each wedge can be adopted as a feature:

$$E_{j,l} = \sum_{k} \left| C^D(j,l,k) \right|^2 \qquad (9)$$

The flow chart of the FDCvT-based feature extraction is shown in Fig. 6.

Fig. 6. Flow chart of the FDCvT-based feature extraction: 2D digital indicator diagram (256 × 256) → 2D Curvelet transform (5 levels; wedges per scale: 1, 32, 32, 64, 1) → calculate the energy of all wedges → 130 Curvelet-based features.
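A minimal sketch of this wedge-energy feature extraction is given below. It assumes a curvelet front end such as the CurveLab wrappers, represented here by a hypothetical function fdct2(img, nscales) that returns the DCvCs as a nested list coeffs[scale][wedge] of 2D complex arrays; that function name and signature are assumptions, not part of the paper or of any specific library.

import numpy as np

def wedge_energies(coeffs):
    """Compute Eq. (9): the energy of the DCvCs in every wedge.

    coeffs -- nested list where coeffs[j][l] is the 2D array of curvelet
              coefficients C^D(j, l, .) for scale j and wedge l.
    Returns a 1D feature vector (130 values for the 1/32/32/64/1 layout).
    """
    return np.array([np.sum(np.abs(c) ** 2)   # E_{j,l} = sum_k |C^D(j,l,k)|^2
                     for scale in coeffs
                     for c in scale])

# Hypothetical usage with an assumed curvelet wrapper:
#   coeffs = fdct2(indicator_image, nscales=5)   # 256x256 gray-level image
#   features = wedge_energies(coeffs)            # 130-dimensional feature vector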
2.3. Dimensionality reduction and nonlinear PCA (NLPCA)

A major problem associated with pattern recognition is the so-called curse of dimensionality (Bishop, 2006). There are two reasons why the dimension of the feature vector cannot be too large: first, the computational complexity becomes too high; second, an increase of the dimension ultimately causes a decrease of the performance (van der Heijden, Duin, De Ridder, & Tax, 2004). Two different approaches exist for reducing the dimensionality of the feature space. One is to discard certain elements of the feature vector and keep the most representative ones; this type of reduction is feature selection (Theodoridis & Koutroumbas, 2003). The other, called feature extraction, converts the original feature vector into a new one of much lower dimension by a special transform. In this paper, the second approach is adopted, and an effective feature extraction method, namely nonlinear PCA (NLPCA), is used.

PCA is a powerful technique for extracting structure from possibly high dimensional data sets. However, traditional linear PCA, which seeks a linear solution to the feature extraction problem, is not optimal with respect to class separability, because multi-class classification problems often suffer from serious class conjunctions (Duin et al., 2000).
Fig. 5. All the features of the example indicator diagram (wedge energy versus feature number).
In this case, linear PCA does not work well and is therefore of little practical value for the multi-class classification problem faced in this paper. In Duin et al. (2000), NLPCA is proposed to avoid the class conjunction; its principle is as follows. In nonlinear PCA, a weighted between-class scatter matrix $S_B$ is defined for a c-class classification problem:
$$S_B = \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} p_i\, p_j\, w(d_{ij})\, (m_i - m_j)(m_i - m_j)^T \qquad (10)$$
where $p_i$ and $p_j$ are the prior probabilities of classes i and j, respectively; $m_i$ and $m_j$ are the class mean vectors; $d_{ij}$ is the distance between the means of classes i and j; and $w(d)$ is a weighting function based on the error function erf(·):
$$w(d) = \frac{1}{2 d^{2}}\, \mathrm{erf}\!\left(\frac{d}{2\sqrt{2}}\right), \qquad \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt \qquad (11)$$
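To make Eqs. (10) and (11) concrete, the following sketch computes the weighted between-class scatter matrix from class means and priors. It is an illustrative implementation under the assumption that d_ij is the Euclidean distance between class means in the pre-whitened space (see Section 3.2); the variable names are ours, not the paper's.

import numpy as np
from scipy.special import erf

def weight(d):
    # Eq. (11): w(d) = erf(d / (2*sqrt(2))) / (2*d**2)
    return erf(d / (2.0 * np.sqrt(2.0))) / (2.0 * d ** 2)

def between_class_scatter(means, priors):
    """Eq. (10): weighted between-class scatter matrix S_B.

    means  -- array of shape (c, N), one class mean per row
    priors -- array of shape (c,), prior probabilities p_i
    """
    c, N = means.shape
    S_B = np.zeros((N, N))
    for i in range(c - 1):
        for j in range(i + 1, c):
            diff = (means[i] - means[j]).reshape(-1, 1)   # column vector m_i - m_j
            d = np.linalg.norm(diff)                      # assumed distance d_ij
            S_B += priors[i] * priors[j] * weight(d) * (diff @ diff.T)
    return S_B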
The eigenvectors corresponding to the largest eigenvalues of $S_B$ are used to obtain the linear map. Let $\{\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_N\}$ be the set of orthonormal eigenvectors with associated eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$ of the symmetric matrix $S_B$. Without loss of generality, let $\lambda_1 > \lambda_2 > \cdots > \lambda_N$; then

$$S_B\, \hat{v}_i = \lambda_i\, \hat{v}_i, \qquad i = 1, 2, \ldots, N \qquad (12)$$

Assuming that we want to extract n principal components from N features, the linear map for our feature extractor is

$$Y_{(n \times 1)} = W_{(n \times N)}\, X_{(N \times 1)}, \qquad W = [\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_n]^T \qquad (13)$$

where X is the vector in the original high dimensional feature space and Y is the vector in the low dimensional feature subspace. In this paper, X is the vector whose elements are the 130 Curvelet-based features. From the above discussion we can draw a helpful conclusion: although this feature extraction approach is based on nonlinear PCA, the final mapping from the Curvelet features to the low dimensional feature space is linear. Furthermore, this linear mapping is obtained by solving a standard matrix eigenvalue problem (Schölkopf et al., 1998), which can be done efficiently (Heath, 2002). Hence, NLPCA is computationally efficient and easy to implement.
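Continuing the earlier sketch of S_B, the following lines show how the linear map of Eqs. (12) and (13) could be obtained and applied; the choice of numpy.linalg.eigh for the symmetric eigenproblem and the variable names are our assumptions.

import numpy as np

def nlpca_map(S_B, n_components=3):
    """Eqs. (12)-(13): the eigenvectors of S_B with the largest eigenvalues
    form the rows of the projection matrix W (n x N)."""
    eigvals, eigvecs = np.linalg.eigh(S_B)    # ascending eigenvalues; S_B symmetric
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    W = eigvecs[:, order[:n_components]].T    # W = [v_1, ..., v_n]^T
    return W

# Hypothetical usage: project a 130-dimensional feature vector x to 3D,
#   y = W @ x    # Eq. (13) / Eq. (15)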
3. Experimental investigation

To verify the proposed approach, experimental investigations are carried out. First, five typical faults of a reciprocating compressor are introduced: discharge valve or pipe blockage, suction valve or pipe blockage, suction valve leakage, discharge valve leakage, and combined discharge and suction valve leakage. Then, 700 samples (digital indicator diagrams) are collected under all the faulty and normal working conditions. The FDCvT and nonlinear PCA are adopted to extract the needed features (we compare 2 and 3 features here) from the 700 indicator diagrams. Five hundred of these samples are selected to train the radial basis function (RBF) SVM classifier and detector, and the other 200 samples are used to test the classification and detection performance. Finally, a comparison is made between the proposed approach and the traditional wavelet transform-based approach.

3.1. Indicator diagram data collection and preprocessing

In this work, we monitor a reciprocating compressor through two measurement points. One measurement point is the key-phase, whose signal is collected by an eddy-current transducer. This key-phase signal works as a trigger for the second channel and is used to calculate the volume V swept by the piston (the X-axis of the indicator diagram). The other measurement point measures the real-time pressure (the Y-axis of the indicator diagram). Because of the fluctuation of pressure and the intrinsic noise of the physical channel, the crude indicator diagram is somewhat rough. To reduce the effects of random noise, we first smooth the curve of the indicator diagram with a moving average filter before the feature extraction. In this preprocessing phase, the Savitzky–Golay filter (Luo, Ying, He, & Bai, 2005; Press, Teukolsky, Vetterling, & Flannery, 1990) is adopted (Fig. 7). Then, we transform the indicator diagrams into 256 × 256 digital gray-level images. After that, the FDCvT is used to extract the DCvCs, yielding a 130-dimensional feature vector.

Fig. 7. The crude and smoothed curve by the Savitzky–Golay filter (P/MPa versus V/cm³).
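A minimal preprocessing sketch is shown below, using SciPy's Savitzky–Golay implementation; the window length and polynomial order are illustrative assumptions, since the paper does not report the filter parameters.

import numpy as np
from scipy.signal import savgol_filter

def smooth_indicator_curve(pressure, window_length=31, polyorder=3):
    """Smooth the crude pressure trace of an indicator diagram (cf. Fig. 7).

    pressure -- 1D array of pressure samples over one crank revolution.
    The window length and polynomial order are illustrative, not the
    values used in the paper.
    """
    return savgol_filter(pressure, window_length=window_length,
                         polyorder=polyorder)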
3.2. Features extracted by Curvelet transform and nonlinear PCA

According to the above discussion, using the high dimensional features directly for recognition would lead to high computational complexity because of the curse of dimensionality. Therefore, we employ NLPCA to reduce the dimension of the feature space from 130 to 3. Let each class i, $1 \le i \le K$ (K = 6 in this experiment), be characterized by its mean $m_i$, covariance $S_i$ and a priori probability $p_i$ (here, $p_i = 1/6$ for each i). The feature extraction procedure based on nonlinear PCA is as follows.

First, a pre-whitening step transforms the average within-class scatter matrix $S_w = \sum_{i=1}^{K} p_i S_i$ into the identity matrix. After this step, a new sample set with identity $S_w$ is obtained.

Second, we formulate the matrix $S_B$ and obtain the linear mapping W by computing the eigenvalues of $S_B$ and choosing the first three principal ones $\lambda_j$, $j = 1, 2, 3$, with the corresponding eigenvectors $P_j$, $j = 1, 2, 3$:

$$W = [P_1, P_2, P_3]^T \qquad (14)$$

Finally, the 3D feature vector is obtained by a simple mapping or projection operation:
$$Y_{(3 \times 1)} = W_{(3 \times 130)}\, X_{(130 \times 1)} \qquad (15)$$

where W acts as a projection matrix. The feature extraction of the 6 × 500 training vectors (each of dimension 130) is then accomplished by repeating the projection of Eq. (15) 3000 times, and the testing samples are obtained by mapping the testing vectors with the same projection matrix W of Eq. (15). The nonlinear PCA results are shown in Figs. 8 and 9. For comparison, we also give the results of traditional (linear) PCA. The implementation of this method is similar to that of the nonlinear PCA-based method; the only difference is that the mapping W is obtained from the eigenvectors of the covariance matrix of the samples. As illustrated by Figs. 10 and 11, slight class conjunction occurs between the Fault 2 class and the normal one. Indeed, class separability is not explicitly considered by linear PCA. Nonetheless, the overall discrimination effectiveness is still good as a result of the application of the Curvelet transform. Furthermore, as a simple algorithm, PCA does not rely on samples of different classes. This advantage is crucial when fault samples are not available, e.g. for novelty detection.

Fig. 8. Feature extraction result of indicator diagram training samples (based on FDCvT & NLPCA).

Fig. 9. Feature extraction result of indicator diagram testing samples (based on FDCvT & NLPCA).

Fig. 10. Feature extraction result of indicator diagram training samples (based on FDCvT & PCA).

Fig. 11. Feature extraction result of indicator diagram testing samples (based on FDCvT & PCA).

3.3. Classification and novelty detection results by SVM

In recent decades, the SVM has become an increasingly popular tool for machine learning tasks involving classification, regression, and novelty detection. It exhibits good generalization performance on many real-world datasets and is theoretically well established (Campbell, 2000). In this paper, the multi-class SVM (Franc, 2005) is adopted to classify indicator diagrams of different faults, while the one-class SVM is used for novelty detection. A typical scheme of multi-class SVM classification is to use a multi-class formulation of the SVM and learn all the parameters of the discriminant functions at once (Franc, 2005). In contrast, the one-class SVM finds a sphere with minimal radius and a center that contains most of the normal data; novel test points are those which lie outside the boundary of the sphere. We employ the approaches proposed by Franc (2005) to accomplish these two tasks. Here, the radial basis function is adopted as the kernel function. In practice, the two tasks have different implementation procedures, as demonstrated in Fig. 12.
Fig. 12. Flow charts of classification and novelty detection: (a) multiple faults classification — normal and all-fault indicator diagrams for training, plus new testing indicator diagrams → preprocessing → FDCvT & nonlinear PCA → train the multi-class SVM → classification result for the new testing samples; (b) novelty detection — normal indicator diagrams for training, plus new testing indicator diagrams → preprocessing → FDCvT & PCA → train the one-class SVM → novelty detection result for the new testing samples.
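The paper uses the toolbox of Franc (2005) for both SVMs; as an illustration only, the sketch below reproduces the two branches of Fig. 12 with a different, widely available implementation (scikit-learn), so the library choice and the hyperparameter values are our assumptions, not the authors' settings.

import numpy as np
from sklearn.svm import SVC, OneClassSVM

# (a) Multiple faults classification: RBF multi-class SVM on 3D NLPCA features.
def train_multiclass(features_3d, labels):
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")   # illustrative hyperparameters
    clf.fit(features_3d, labels)
    return clf

# (b) Novelty detection: RBF one-class SVM trained on normal samples only,
#     using PCA features; predictions of -1 mark outliers (novel diagrams).
def train_novelty_detector(normal_features_3d):
    det = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")   # illustrative nu
    det.fit(normal_features_3d)
    return det

# Hypothetical usage:
#   clf = train_multiclass(Y_train, y_train);  y_pred = clf.predict(Y_test)
#   det = train_novelty_detector(Y_normal);    flags = det.predict(Y_test)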
The experimental results of multiple faults classification and novelty detection are listed in Tables 1 and 2. It can be seen that the recognition accuracy with NLPCA features is higher than with PCA features for almost all faults. When three features generated by PCA are used for novelty detection, 4 out of 260 normal samples are detected as outliers, while 2 out of 260 Fault 2 samples are detected as normal; this configuration therefore shows very high accuracy. By contrast, when only two features are used, 8 out of 260 normal samples are detected as outliers, and 286 out of 1300 fault samples (150 of Fault 2, 126 of Fault 4, and 10 of Fault 5) are detected as normal.
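As a quick consistency check (our arithmetic, not stated in the paper), these counts match the percentages reported in Table 2:

$$\frac{260-4}{260} = 98.46\%, \qquad \frac{260-2}{260} = 99.23\%, \qquad \frac{260-8}{260} = 96.92\%, \qquad \frac{260-150}{260} \approx 42.3\%.$$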
3.4. Comparison with wavelet-based features

As demonstrated in the above section, the Curvelet transform-based features are very effective for the recognition and novelty detection of indicator diagrams. To further demonstrate the superiority of the Curvelet transform-based features, we also compare them with some traditional features used for shape recognition, such as wavelet-based features (Diekmann et al., 2001; Imaeda et al., 2004; Murray et al., 2002). For a parallel comparison with the 5-level FDCvT-based features, a 5-level wavelet decomposition is adopted here. There are four kinds of coefficients at each level: approximation coefficients, horizontal detail coefficients, vertical detail coefficients and diagonal detail coefficients. We calculate the energy of each kind of coefficients in the same way as the wedge energies of Eq. (9). The flow chart of the wavelet-based feature extraction is shown in Fig. 13, yielding 20 wavelet-based features. As with the FDCvT-based features, these features are used for classification and novelty detection through NLPCA/PCA and the multi-class SVM/one-class SVM. The corresponding experimental results are given in Tables 3 and 4. As demonstrated by Tables 1–4, the overall performance of the FDCvT-based features is higher than that of the wavelet-based features, and the accuracy of the wavelet-based features drops sharply under some conditions.
Fig. 13. Flow chart of the wavelet-based feature extraction: 2D digital indicator diagram (256 × 256) → 2D wavelet transform (5 levels) → 5 levels × 4 coefficient vectors → calculate the energy of every coefficient vector → 20 wavelet-based features.
That is to say, the FDCvT-based features are more representative than the wavelet-based features.
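A minimal sketch of this wavelet-based baseline is given below, using PyWavelets; the wavelet family ('db4') is an illustrative assumption, since the paper does not specify which mother wavelet was used.

import numpy as np
import pywt

def wavelet_energy_features(image, wavelet="db4", levels=5):
    """20 wavelet-based features: at each of the 5 levels, the energies of the
    approximation, horizontal, vertical and diagonal coefficients (cf. Fig. 13).
    The mother wavelet is an illustrative choice."""
    features = []
    approx = np.asarray(image, dtype=float)
    for _ in range(levels):
        approx, (ch, cv, cd) = pywt.dwt2(approx, wavelet)
        for band in (approx, ch, cv, cd):
            features.append(np.sum(band ** 2))   # energy, analogous to Eq. (9)
    return np.array(features)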
Table 1
Accuracy ratio of multiple faults classification (FDCvT-based features).

Class                          3 features extracted by PCA (%)    3 features extracted by NLPCA (%)
All faults (260 × 5 samples)   98.25                              99.79
Fault 1 (260 samples)          93.62                              100
Fault 2 (260 samples)          97.65                              99.33
Fault 3 (260 samples)          100                                100
Fault 4 (260 samples)          100                                99.66
Fault 5 (260 samples)          100                                100

Table 2
Accuracy ratio of novelty detection (FDCvT-based features, features extracted by PCA).

Class                          2 features (%)    3 features (%)
Normal (260 samples)           96.92             98.46
Fault 1 (260 samples)          100               100
Fault 2 (260 samples)          42.30             99.23
Fault 3 (260 samples)          100               100
Fault 4 (260 samples)          51.53             100
Fault 5 (260 samples)          96.15             100

Table 3
Accuracy ratio of multiple faults classification (wavelet-based features).

Class                          3 features extracted by PCA (%)    3 features extracted by NLPCA (%)
All faults (260 × 5 samples)   87.58                              98.99
Fault 1 (260 samples)          96.64                              93.96
Fault 2 (260 samples)          85.57                              100
Fault 3 (260 samples)          97.99                              99.33
Fault 4 (260 samples)          79.19                              100
Fault 5 (260 samples)          100                                99.66

Table 4
Accuracy ratio of novelty detection (wavelet-based features, features extracted by PCA).

Class                          2 features (%)    3 features (%)
Normal (260 samples)           75.38             74.23
Fault 1 (260 samples)          100               100
Fault 2 (260 samples)          100               100
Fault 3 (260 samples)          100               100
Fault 4 (260 samples)          39.23             58.85
Fault 5 (260 samples)          99.62             100

4. Practical considerations

The purpose of this paper is to provide an effective approach to the real-world engineering problem of indicator diagram recognition. Hence, it is worthwhile to take the following practical situations into account.

1. Samples of three or more fault classes are available. As the above discussion shows, at least three classes of fault samples are necessary for applying NLPCA to multi-class indicator diagram recognition. Therefore, if samples of three or more fault classes are available, the NLPCA-based multi-class indicator diagram recognition can be implemented.

2. No fault samples are available. Samples of fault indicator diagrams are not always available in practice. In this case, novelty detection based on PCA can be carried out. After that, a professional engineer can check the flagged indicator diagrams, confirm the fault and store them as new fault samples.

3. The overall framework. Considering the practical problems discussed above, we suggest the overall framework illustrated in Fig. 14: start → are fault samples available? → if yes, multiple faults classification; if no, novelty detection, after which the fault is confirmed artificially and the fault indicator diagrams are stored as samples.

Fig. 14. The overall framework.
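As an illustration of this framework (our sketch of the dispatch logic of Fig. 14, built on the hypothetical trained models from the earlier sketches), the top-level decision could look like this:

def diagnose(test_features, fault_samples_available, multiclass_clf=None,
             novelty_det=None):
    """Dispatch logic of Fig. 14: multi-class recognition when fault samples
    exist, otherwise PCA + one-class SVM novelty detection. The model objects
    are the (hypothetical) trained classifiers from the earlier sketches."""
    if fault_samples_available:
        return ("classification", multiclass_clf.predict(test_features))
    flags = novelty_det.predict(test_features)   # -1 marks novel diagrams
    # Novel diagrams are passed to an engineer for confirmation and then
    # stored as new fault samples (a manual step in the paper's framework).
    return ("novelty_detection", flags)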
5. Conclusions

In this paper, we propose a novel approach for indicator diagram recognition and novelty detection. The discrete 2D-Curvelet transform is employed to transform an indicator diagram into a high-dimensional feature space. Then, the so-called nonlinear PCA is adopted to map the high-dimensional features to a 3-dimensional feature space in the multi-class recognition situation, while PCA is used for dimensionality reduction in the one-class situation. Finally, a multi-class SVM and a one-class SVM are taken as the classifier and the novelty detector, respectively. As to the computational efficiency of the proposed approach, PCA and SVM have a low computational burden, and the final mapping of NLPCA is linear and can be realized by solving a standard eigenvalue problem, even though the class separability is enhanced by nonlinear weighting. As a result, the proposed approach is fast enough to be used in real-time applications. Experimental investigations are carried out to verify the approach, and satisfactory results are obtained. Furthermore, we compare the FDCvT-based feature generation approach with the traditional wavelet-based approach; the results show that the proposed approach is more accurate. Finally, some practical considerations about the proposed approach are discussed and an overall implementation framework is suggested for engineers.

Acknowledgements

This research is supported by the National Natural Science Foundation of China under Grant No. 50635010. The authors also thank Prof. Emmanuel Candes of Applied and Computational Mathematics, California Institute of Technology, and Assistant Prof. Laurent Demanet of the Department of Mathematics, MIT, for providing the CurveLab software and the referred figure (Fig. 3).
References Baccarini, L. M. R., Rocha e Silva, V. V., de Menezes, B. R., & Caminhas, W. M. (2011). SVM practical industrial application for mechanical faults diagnostic. Expert Systems with Applications, 38, 6980–6984. Bishop, C. (2006). Pattern recognition and machine learning. New York: Springer. Campbell, C. (2000). Algorithmic approaches to training support vector machines: A survey. In Proceedings of ESANN2000 (pp. 27–36). Candes, E., Demanet, L., Donoho, D., & Ying, L. (2007). Fast discrete curvelet transforms. Multiscale Modeling and Simulation, 5, 861–899. Candes, E., & Donoho, D. (2004). New tight frames of curvelets and optimal representations of objects with C2 singularities. Communications on Pure and Applied Mathematics, 56, 219–266. Candes, E., Donoho, D., Demanet, L., & Ying, L. (xxxx). Curvelab: Fast discrete curvelet transform. URL:
. Cho, H.-W. (2007). Nonlinear feature extraction and classification of multivariate data in kernel feature space. Expert Systems with Applications, 32, 534–542. Diekmann, F., Heinlein, P., Drexl, J., Grebe, S., Gössler, A., Schneider, W., et al. (2001). Visualization of microcalcifications by full-field digital mammography using a wavelet algorithm. International Congress Series, 1230, 526–530. Donoho, D., & Duncan, M. (2000). Digital curvelet transform: Strategy, implementation and experiments. Dept. of Statistics, Stanford University. Duin, R., Loog, M., & Haeb-Umbach, R. (2000). Multi-class linear feature extraction by nonlinear PCA. In IEEE proceedings of the 15th international conference on pattern recognition, Barcelona, Spain (Vol. 15, pp. 398–401). Franc, V. (2005). Optimization algorithms for kernel methods. Centre for Machine Perception, Czech Technical University. Heath, M. (2002). Scientific computing. McGraw-Hill. Huagen, W., Ziwen, X., & Pengcheng, S. (2004). Theoretical and experimental study on indicator diagram of twin screw refrigeration compressor. International Journal of Refrigeration, 27, 331–338. Imaeda, S., Kobashi, S., Kitamura, Y. T., Kondo, K., Hata, Y., & Yanagida, T. (2004). Wavelet-based hemodynamic analyzing method in event-related fMRI with statistical processing. International Congress Series, 1270, 138–141. Jean-Luc, S., Candes, E. J., & Donoho, D. L. (2002). The curvelet transform for image denoising. IEEE Transactions on Image Processing, 11, 670–684. Li, Y., Yang, Q., & Jiao, R. (2010). Image compression scheme based on curvelet transform and support vector machine. Expert Systems with Applications, 37, 3063–3069.
Luo, J., Ying, K., He, P., & Bai, J. (2005). Properties of Savitzky–Golay digital differentiators. Digital Signal Processing, 15, 122–136. Ma, J. (2007). Curvelets for surface characterization. Applied Physics Letters, 90, 054–109. Ma, J., Antoniadis, A., & Le Dimet, F. (2006). Curvelet-based snake for multiscale detection and tracking of geophysical fluids. IEEE Transactions on Geoscience and Remote Sensing, 44, 3626–3638. Ma, J., & Plonka, G. (submitted for publication). A review of curvelets and recent applications. IEEE Signal Processing Magazine. Moghadas Nejad, F., & Zakeri, H. (2011). A comparison of multi-resolution methods for detection and isolation of pavement distress. Expert Systems with Applications, 38, 2857–2872. Murray, K. B., Gorse, D., & Thornton, J. M. (2002). Wavelet transforms for the characterization and detection of repeating motifs. Journal of Molecular Biology, 316, 341–363. Press, W., Teukolsky, S., Vetterling, W., & Flannery, B. (1990). Savitzky–Golay smoothing filters. Computers in Physics, 4, 669–672. Schölkopf, B., Smola, A., & Muller, K. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, 1299–1319. Scholkopf, B., Williamson, R., Smola, A., Shawe-Taylor, J., & Platt, J. (2000). Support vector method for novelty detection. Advances in Neural Information Processing Systems, 12, 582–588. Starck, J., Donoho, D., & Candes, E. (2003). Astronomical image representation by the curvelet transform. Astronomy and Astrophysics, 398, 785–800. Theodoridis, S., & Koutroumbas, K. (2003). Pattern recognition. New York: Academic Press. van der Heijden, F., Duin, R., De Ridder, D., & Tax, D. (2004). Classification parameter estimation and state estimation: An engineering approach using MATLAB. John Wiley & Sons Inc. Widodo, A., & Yang, B.-S. (2007). Application of nonlinear feature extraction and support vector machines for fault diagnosis of induction motors. Expert Systems with Applications, 33, 241–250. Wu, H., Peng, X., Xing, Z., & Shu, P. (2004). Experimental study on P–V indicator diagrams of twin-screw refrigeration compressor with economizer. Applied Thermal Engineering, 24, 1491–1500. Xiang, J., Xiao, Z., Wang, Y., Feng, Y., Qiao, H., Sun, B., et al. (2007). Detection of subtle structural abnormality in tuberous sclerosis using MEG guided post-image processing. International Congress Series, 1300, 693–696.