
A recognition and novelty detection approach based on Curvelet transform, nonlinear PCA and SVM with application to indicator diagram diagnosis

Kun Feng, Zhinong Jiang*, Wei He, Bo Ma
Diagnosis and Self-recovery Engineering Research Center, Beijing University of Chemical Technology, Beijing 100029, China
* Corresponding author. Tel.: +86 10 64431325 601. E-mail address: [email protected] (Z. Jiang).


Keywords: 2D-Curvelet transform; Nonlinear PCA; SVM; Novelty detection; Indicator diagram recognition; Reciprocating compressor

Abstract

The indicator diagram plays an important role in the health monitoring and fault diagnosis of reciprocating compressors, and different shapes of the indicator diagram indicate different faults. A proper feature extraction and pattern recognition method for indicator diagrams is therefore significant for practical use. In this paper, a novel approach is presented to handle both multi-class indicator diagram recognition and novelty detection. When samples of multiple fault classes are available, the approach performs multi-class fault recognition; otherwise, novelty detection is performed. In this approach, the discrete 2D-Curvelet transform is adopted to extract representative features of the indicator diagram, nonlinear PCA is employed to reduce dimensionality for multi-class recognition, and PCA is used for novelty detection. Finally, multi-class and one-class support vector machines (SVMs) are used as the classifier and the novelty detector, respectively. Experimental results show that the performance of the proposed approach is better than that of the traditional wavelet-based approach.

1. Introduction

The reciprocating compressor is a common machine in the modern chemical industry and an important link in the production chain of the process industry. Faults of a reciprocating compressor can lead to serious accidents, such as explosions and casualties. Hence, health monitoring and fault diagnosis of reciprocating compressors are crucial for reliability and safety in production. The indicator diagram is widely used in the monitoring and fault diagnosis of reciprocating compressors. However, recognition of indicator diagrams requires sophisticated knowledge and is very complicated for end users. In addition, the subjective judgment of the end user can lead to false alarms during diagnosis. Therefore, it is significant to develop an effective approach for automatic indicator diagram novelty detection, recognition and diagnosis. From the viewpoint of image processing, each indicator diagram is a digital image, i.e. a 2-dimensional (2D) matrix formed by one planar closed curve. Geometrically speaking, the main differences between different curves are contained in multiple scales and multiple directions of the closed curves (see 2.1). Furthermore, the fault indicator diagrams bend towards different directions (see Fig. 2). The first important task is therefore to transform the matrix in order to obtain clear features which are sensitive to these aforementioned differences.


In the field of image processing, multi-resolution methods such as the short-time Fourier transform and the wavelet transform have been used to handle feature extraction problems (Diekmann et al., 2001; Imaeda et al., 2004; Murray, Gorse, & Thornton, 2002; Xiang et al., 2007). Curvelet transform-based approaches have also generated increasing interest in the image processing community over the past years (Candes, Demanet, Donoho, & Ying, 2007; Candes & Donoho, 2004; Donoho & Duncan, 2000; Jean-Luc, Candes, & Donoho, 2002; Li, Yang, & Jiao, 2010; Ma & Plonka, submitted for publication; Moghadas Nejad & Zakeri, 2011). Compared with the short-time Fourier transform and the wavelet transform, the 2D-Curvelet transform is not only a multi-scale but also a multi-directional transform: more directions are considered, so a more precise representation of the image is achieved than with the traditional 2D wavelet transform (Candes et al., 2007; Donoho & Duncan, 2000; Ma & Plonka, submitted for publication). Moreover, because the Curvelet transform uses angled polar wedge windows in the frequency domain to resolve directional features, it is effective for representing line-like edges (Candes et al., 2007; Jean-Luc et al., 2002; Ma, 2007). Therefore, the 2D-Curvelet transform is taken as the feature extraction tool for our first task in this paper. However, the result of the 2D-Curvelet transform is a high-dimensional feature vector, which would cause the so-called curse of dimensionality like any other high-dimensional feature (Bishop, 2006; Theodoridis & Koutroumbas, 2003). For this reason, an appropriate dimensionality reduction approach should be used.


In the multi-class case, with multi-class samples acquired, nonlinear principal component analysis (NLPCA) is adopted to reduce the dimensionality and overcome class conjunction simultaneously (Cho, 2007; Duin, Loog, & Haeb-Umbach, 2000; Widodo & Yang, 2007). In the one-class case, PCA is adopted instead because only normal samples are available. The support vector machine (SVM) has become popular for solving problems in classification, regression and novelty detection (Baccarini, Rocha e Silva, de Menezes, & Caminhas, 2011; Scholkopf, Williamson, Smola, Shawe-Taylor, & Platt, 2000). Compared with traditional classifiers such as neural networks, SVM has some better properties; an important one is that the determination of the model parameters corresponds to a convex optimization problem, so any local solution is also a global optimum (Bishop, 2006). Therefore, multi-class SVM and one-class SVM are employed as the multi-class classifier and the novelty detector, respectively, in this paper. In a nutshell, our approach adopts the 2D-Curvelet transform (Bishop, 2006; Donoho & Duncan, 2000; Theodoridis & Koutroumbas, 2003) and nonlinear PCA (Bishop, 2006) to extract features, multi-class SVM (Franc, 2005; Schölkopf, Smola, & Muller, 1998; Scholkopf et al., 2000) to classify multi-class indicator diagrams, and one-class SVM to detect novelty (Franc, 2005; Scholkopf et al., 2000). Experimental investigations are conducted to verify the feasibility and advantages of this approach. A pressure sensor and a tachometer are mounted to measure the indicator diagram, which is converted to a digital image through a PC-based data acquisition system. With the measured digital indicator diagrams, the approach is verified and compared with the traditional wavelet-based approach.

2. Indicator diagram feature extraction based on 2D-Curvelet transform and nonlinear PCA

2.1. Indicator diagram and its properties

From the geometrical viewpoint, the indicator diagram of a reciprocating compressor is a closed 2D curve (Huagen, Ziwen, & Pengcheng, 2004). Its X-coordinate is the volume V swept by the piston of the compressor, and its Y-coordinate is the pressure P inside the cylinder, so the indicator diagram is also called the P–V graph (Wu, Peng, Xing, & Shu, 2004). Fig. 1 shows an example of an indicator diagram. When different mechanical faults occur, the indicator diagram shows different shapes owing to the change in the dynamic behaviour of the reciprocating compressor. Fig. 2 illustrates the corresponding indicator diagrams of five typical faults (see Section 3).

From Fig. 2, we can roughly see that the main differences between the faults lie in multiple scales and multiple directions. For example, the Fault 1 curve and the normal curve are roughly the same except for a small difference at the top of the curves. In contrast, the Fault 2 curve differs from the normal one along the right and bottom lines. The Fault 3 curve differs from the normal one in the position and shape of the upper right corner, while Fault 4 differs in the position and shape of the lower left corner. It is clear that the curves of different fault indicator diagrams bend towards different directions; for example, Fault 1 bends upwards, whereas Fault 4 bends to the left and downwards. In a word, the differences among these indicator diagrams are subtle, directional and located in different positions; they are the key to recognition and novelty detection. For this reason, the extracted features should be sensitive to the multi-scale and multi-directional differences of the shapes. As the Curvelet transform uses angled polar wedge or angled trapezoid windows in the frequency domain to extract directional features, it is very efficient in representing line-like edges (Candes & Donoho, 2004; Donoho & Duncan, 2000; Ma, Antoniadis, & Le Dimet, 2006; Starck, Donoho, & Candes, 2003). As a result, it is well suited to feature extraction from indicator diagrams.

2.2. 2D-Curvelet transform

In 1999, Candès and Donoho introduced the Curvelet transform, a multi-scale representation suited for objects which are smooth away from discontinuities across curves (Donoho & Duncan, 2000). Unlike the wavelet transform, it has directional parameters, and the Curvelet pyramid contains elements with a very high degree of directional specificity. The transform was designed to represent edges and other singularities along curves much more efficiently than conventional transforms such as the short-time Fourier transform and the wavelet transform (Candes & Donoho, 2004; Donoho & Duncan, 2000). These properties have already led to a wide range of interesting applications. In 2005, a new mathematical architecture, the Fast Discrete Curvelet Transform (FDCvT), which suggests innovative algorithmic strategies and improves upon earlier implementations, was proposed by Candès and Demanet (Candes, Donoho, Demanet, & Ying, xxxx; Starck et al., 2003). Two different algorithms were developed to implement the FDCvT. The FDCvT is adopted here, and a short review is given below.

2.2.1. Continuous-time Curvelet transform (Candes et al., 2007)

Definitions. Mother Curvelet: $\varphi_j(x)$ is defined via its Fourier transform, $\hat{\varphi}_j(\omega) = U_j(\omega)$, i.e. $\varphi_j = \mathrm{IFT}\{U_j(\omega)\}$. Frequency window: $U_j(\omega)$ is the frequency window defined in the Fourier domain by

$$U_j(r, \theta) = 2^{-3j/4}\, W(2^{-j} r)\, V\!\left(\frac{2^{\lfloor j/2 \rfloor}\,\theta}{2\pi}\right) \qquad (1)$$

where $\lfloor j/2 \rfloor$ is the integer part of $j/2$. Meanwhile, $W(r)$ (radial window) and $V(t)$ (angular window) are window functions. These are smooth, nonnegative and real-valued, with $W$ taking positive real arguments and supported on $r \in (1/2, 2)$, and $V$ taking real arguments and supported on $t \in [-1, 1]$. These windows always obey the admissibility conditions:

$$\sum_{j=-\infty}^{\infty} W^2(2^j r) = 1, \quad r \in (3/4, 3/2) \qquad (2)$$

$$\sum_{l=-\infty}^{\infty} V^2(t - l) = 1, \quad t \in (-1/2, 1/2) \qquad (3)$$

Fig. 1. An example of the indicator diagram (pressure P in MPa versus volume V in cm³).


Fig. 2. Five typical faults’ indicator diagrams smoothed by Savitzky–Golay Filter (see 3.1).

The equispaced sequence of rotation angles: $\theta_l = 2\pi \cdot 2^{-\lfloor j/2 \rfloor} \cdot l$, with $l = 0, 1, 2, \ldots$ such that $0 \le \theta_l < 2\pi$. The sequence of translation parameters: $k = (k_1, k_2) \in \mathbb{Z}^2$. The Curvelet, as a function of $x = (x_1, x_2)$, at scale $2^{-j}$, orientation $\theta_l$ and position $x_k^{(j,l)} = R_{\theta_l}^{-1}(k_1 \cdot 2^{-j},\; k_2 \cdot 2^{-j/2})$, is defined as

$$\varphi_{j,l,k}(x) = \varphi_j\!\left(R_{\theta_l}\,(x - x_k^{(j,l)})\right) \qquad (4)$$

where $R_\theta$ is the rotation by $\theta$ radians. The Curvelet coefficients are defined as

$$c(j, l, k) = \langle f, \varphi_{j,l,k} \rangle = \int_{\mathbb{R}^2} f(x)\, \overline{\varphi_{j,l,k}(x)}\, dx \qquad (5)$$

where $f \in L^2(\mathbb{R}^2)$ and $\langle \cdot , \cdot \rangle$ denotes the inner product. This inner product can also be expressed as an integral over the frequency plane:

$$c(j, l, k) = \frac{1}{(2\pi)^2} \int \hat{f}(\omega)\, \overline{\hat{\varphi}_{j,l,k}(\omega)}\, d\omega = \frac{1}{(2\pi)^2} \int \hat{f}(\omega)\, U_j(R_{\theta_l}\omega)\, e^{i \langle x_k^{(j,l)},\, \omega \rangle}\, d\omega \qquad (6)$$

2.2.2. Discrete Curvelet transform

The discrete transform is linear and takes as input Cartesian arrays of the form $f[t_1, t_2]$, $0 \le t_1, t_2 < n$; it provides the discrete Curvelet coefficients (DCvCs) obtained by the analog-to-digital conversion of Eq. (5):

$$C^D(j, l, k) = \sum_{0 \le t_1, t_2 < n} f[t_1, t_2]\; \overline{\varphi^{D}_{j,l,k}[t_1, t_2]} \qquad (7)$$

where each $\varphi^{D}_{j,l,k}$ is a digital Curvelet waveform (D stands for "digital") implicitly defined by the fast algorithms. For the discrete Curvelet transform, the basic digital tiling and a typical wedge are illustrated in Fig. 3 (Candes et al., 2007).

Fig. 3. The basic digital tiling. The shaded region represents one such typical wedge (Candes et al., 2007).

The FDCvT by unequispaced FFTs (USFFT) can be expressed as

$$C^D(j, l, k) = \sum_{(n_1, n_2) \in P_j} \hat{f}[n_1,\, n_2 - n_1 \tan\theta_l]\; \tilde{U}_j[n_1, n_2]\; e^{i 2\pi (k_1 n_1 / L_{1,j} + k_2 n_2 / L_{2,j})} \qquad (8)$$

2.2.3. An example of an indicator diagram's FDCvT

In this work, every indicator diagram is a 256 × 256 gray image, on which a 5-level FDCvT is applied. The numbers of wedges at the 5 scales are 1, 32, 32, 64 and 1, and we extract one feature from each wedge. Hence, there are 130 (1 + 32 + 32 + 64 + 1) wedge energy values, which constitute the features of each indicator diagram.

Fig. 4. An example of the log of DCvCs of an indicator diagram.

The DCvCs are shown in Fig. 4, and all the features extracted from the DCvCs are shown in Fig. 5.

2.2.4. Features extracted by FDCvT

Since the DCvCs represent the indicator diagram at different scales and in different wedges, the energy of the DCvCs in each wedge can be adopted as a feature:

$$E_{j,l} = \sum_k \left| C^D(j, l, k) \right|^2 \qquad (9)$$

The flow chart of the FDCvT-based feature extraction is shown in Fig. 6.
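To make Eq. (9) concrete, the following minimal sketch (ours, not from the paper) computes the wedge energies, assuming the discrete Curvelet coefficients are already available as a nested list coeffs[scale][wedge] of 2D complex arrays, the layout produced by CurveLab-style FDCvT implementations; the fdct2 call in the usage comment is a hypothetical placeholder for whatever FDCvT routine is used.

```python
import numpy as np

def wedge_energies(coeffs):
    """Eq. (9): energy E_{j,l} = sum_k |C^D(j,l,k)|^2 of every wedge.

    `coeffs` is assumed to be a nested list: coeffs[j][l] is the 2D array of
    discrete Curvelet coefficients at scale j and angular wedge l.  Returns a
    flat 1D feature vector (130 values for the 5-level, 1/32/32/64/1-wedge
    layout used in the paper).
    """
    return np.array([np.sum(np.abs(c) ** 2)
                     for scale in coeffs
                     for c in scale])

# Hypothetical usage with a 256 x 256 gray-level indicator diagram `img`:
# coeffs = fdct2(img, nscales=5)        # any FDCvT implementation with this layout
# features = wedge_energies(coeffs)     # -> shape (130,)
```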

2.3. Dimensionality reduction and nonlinear PCA (NLPCA)

A major problem associated with pattern recognition is the so-called curse of dimensionality (Bishop, 2006). There are two reasons why the dimension of the feature vector cannot be too large: first, the computational complexity becomes too high; second, an increase in dimension ultimately causes a decrease in performance (van der Heijden, Duin, De Ridder, & Tax, 2004). Two different approaches exist for reducing the dimensionality of the feature space. One is to discard certain elements of the feature vector and keep the most representative ones; this type of reduction is feature selection (Theodoridis & Koutroumbas, 2003). The other is feature extraction, in which the original feature vector is converted into a new one of much lower dimension by a special transform. In this paper the second approach is adopted, and an effective feature extraction method, namely nonlinear PCA (NLPCA), is used. PCA is a powerful technique for extracting structure from possibly high-dimensional data sets. However, traditional linear PCA, which finds a linear solution to the feature extraction problem, is not optimal with respect to class separability, because multi-class classification problems often suffer from serious class conjunction (Duin et al., 2000).

Fig. 5. All the features of the example indicator diagram (wedge energy versus feature number).

In this case, linear PCA does not work well and has little practical value for the multi-class classification problem faced in this paper. In Duin et al. (2000), NLPCA is proposed to avoid class conjunction; its principle is discussed as follows. In nonlinear PCA, a weighted between-class scatter matrix $S_B$ is defined for a c-class classification problem:

$$S_B = \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} p_i\, p_j\, w(d_{ij})\, (m_i - m_j)(m_i - m_j)^T \qquad (10)$$

where $p_i$ and $p_j$ are the prior probabilities of classes i and j, respectively; $m_i$ denotes the class mean, represented as a row vector; $d_{ij}$ is the distance between the means of classes i and j; and $w(d)$ is a weighting function based on the error function erf(·):

$$w(d) = \frac{1}{2 d^2}\, \mathrm{erf}\!\left(\frac{d}{2\sqrt{2}}\right), \qquad \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt \qquad (11)$$
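For intuition, the weighting function of Eq. (11) can be evaluated directly with SciPy; the snippet below is a small illustration (the sample distances are arbitrary values of our choosing), showing that nearby class pairs receive large weights while well-separated pairs are strongly down-weighted.

```python
import numpy as np
from scipy.special import erf

def pairwise_weight(d):
    """Eq. (11): w(d) = erf(d / (2*sqrt(2))) / (2*d**2)."""
    return erf(d / (2.0 * np.sqrt(2.0))) / (2.0 * d ** 2)

# Close class pairs get large weights; well-separated pairs are down-weighted.
for dist in (0.5, 1.0, 2.0, 4.0, 8.0):      # arbitrary example distances
    print(f"d = {dist:4.1f}  ->  w(d) = {pairwise_weight(dist):.4f}")
```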


The eigenvectors corresponding to the largest eigenvalues of $S_B$ are used to obtain the linear map. Let $\{v_1, v_2, \ldots, v_N\}$ be the set of orthonormal eigenvectors of the symmetric matrix $S_B$ with associated eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$. Without loss of generality, let $\lambda_1 > \lambda_2 > \cdots > \lambda_N$; then

$$S_B\, v_i = \lambda_i\, v_i, \quad i = 1, 2, \ldots, N \qquad (12)$$

Assuming that we want to extract n principal components from the N features, the linear map for our feature extractor is

$$Y_{(n \times 1)} = W_{(n \times N)} \cdot X_{(N \times 1)}, \qquad W = [v_1, v_2, \ldots, v_n]^T \qquad (13)$$

where X is the vector in the original high-dimensional feature space and Y is the vector in the low-dimensional feature subspace. In this paper, X is the vector whose elements are the 130 DCvC energy features. From the above discussion we obtain a helpful result: although this feature extraction approach is called nonlinear PCA, the final mapping from the DCvCs to the low-dimensional feature space is linear. Furthermore, this linear mapping is obtained by solving a standard matrix eigenvalue problem (Schölkopf et al., 1998), which can be done efficiently (Heath, 2002). Hence, NLPCA is computationally efficient and easy to implement.

3. Experimental investigation

To verify the proposed approach, experimental investigations are carried out. First, five typical faults of a reciprocating compressor are introduced: discharge valve or pipe blockage, suction valve or pipe blockage, suction valve leakage, discharge valve leakage, and combined discharge and suction valve leakage. Then, 700 samples (digital indicator diagrams) are collected under all the faulty and normal working conditions. The FDCvT and nonlinear PCA are adopted to extract the needed features (we compare 2 and 3 features here) from the 700 indicator diagrams. Five hundred of the samples are selected to train the radial basis function (RBF) SVM classifier and detector, and the other 200 samples are used to test the performance of classification and detection. Finally, a comparison is made between the proposed approach and the traditional wavelet transform-based approach.

3.1. Indicator diagram data collection and preprocessing

In this work, we monitor a reciprocating compressor through two measurement points. One is the key-phase, whose signal is collected by an eddy-current transducer; it works as a trigger for the second channel and is used to calculate the volume V swept by the piston (the X-axis of the indicator diagram). The other measurement point measures the real-time pressure (the Y-axis of the indicator diagram). Because of pressure fluctuation and the intrinsic noise of the physical channel, the crude indicator diagram is somewhat rough. To reduce the effect of random noise, we first smooth the curve of the indicator diagram before feature extraction; in this preprocessing phase, the Savitzky–Golay filter (Luo, Ying, He, & Bai, 2005; Press, Teukolsky, Vetterling, & Flannery, 1990) is adopted (Fig. 7). Then, we transform the indicator diagrams into 256 × 256 digital gray-level images. After that, the FDCvT is used to extract the DCvCs, from which the 130-dimensional feature vector is computed.

Fig. 6. Flow chart of the FDCvT-based feature extraction: 2D digital indicator diagram (256 × 256) → 2D Curvelet transform (5 levels; 1, 32, 32, 64 and 1 wedges at scales 1–5) → calculate the energy of all wedges → 130 Curvelet-based features.
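As an illustration of the preprocessing of Section 3.1, the sketch below smooths a sampled P–V curve with a Savitzky–Golay filter and rasterizes it into a 256 × 256 gray-level image. The filter window length, polynomial order and the per-diagram min–max rasterization scheme are illustrative assumptions of ours; the paper does not specify these values.

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess_indicator_diagram(volume, pressure, size=256,
                                 window=21, polyorder=3):
    """Smooth the P-V curve and rasterize it into a size x size gray image.

    volume, pressure : 1-D arrays sampled over one compressor cycle.
    window, polyorder : Savitzky-Golay parameters (illustrative values).
    """
    p_smooth = savgol_filter(pressure, window_length=window, polyorder=polyorder)

    # Normalize both axes to [0, 1] and map onto pixel coordinates.
    v = (volume - volume.min()) / (np.ptp(volume) + 1e-12)
    p = (p_smooth - p_smooth.min()) / (np.ptp(p_smooth) + 1e-12)
    cols = np.clip((v * (size - 1)).round().astype(int), 0, size - 1)
    rows = np.clip(((1.0 - p) * (size - 1)).round().astype(int), 0, size - 1)

    image = np.zeros((size, size), dtype=np.float32)
    image[rows, cols] = 1.0          # draw the closed P-V curve as bright pixels
    return p_smooth, image
```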

Fig. 7. The crude and smoothed curve by the Savitzky–Golay filter (pressure P in MPa versus volume V in cm³).

3.2. Features extracted by Curvelet transform and nonlinear PCA

According to the above discussion, using the high-dimensional features directly for recognition would lead to high computational complexity because of the curse of dimensionality. Therefore, we employ NLPCA to reduce the dimension of the feature space from 130 to 3. Let each class i, $1 \le i \le K$ (K = 6 in this experiment), be characterized by its mean $m_i$, covariance $S_i$ and a priori probability $p_i$ (here $p_i = 1/6$ for each i). The feature extraction procedure based on nonlinear PCA is as follows. First, a pre-whitening step transforms the average within-class scatter matrix $S_w = \sum_{i=1}^{K} p_i S_i$ into the identity matrix; after this step, a new sample set with identity $S_w$ is obtained. Second, we form the matrix $S_B$ and obtain the linear mapping W by computing the eigenvalues of $S_B$ and choosing the first three principal ones $\lambda_j$, j = 1, 2, 3, with the corresponding eigenvectors $P_j$, j = 1, 2, 3:

$$W = [P_1, P_2, P_3]^T \qquad (14)$$

Finally, the 3D feature vector is obtained by a simple mapping or projection operation:

$$Y_{(3 \times 1)} = W_{(3 \times 130)} \cdot X_{(130 \times 1)} \qquad (15)$$

where W acts as the projection matrix. The feature extraction of the 6 × 500 training vectors (each of dimension 130) is accomplished by repeating the projection of Eq. (15) 3000 times, and the testing samples are obtained by mapping the testing vectors with the same projection matrix W. The nonlinear PCA results are shown in Figs. 8 and 9. For comparison, we also give the result of traditional (linear) PCA. The implementation of this method is similar to the nonlinear PCA-based method; the only difference is that the mapping W is obtained from the eigenvectors of the covariance matrix of the samples. As illustrated by Figs. 10 and 11, slight class conjunction occurs between the Fault 2 class and the normal one; indeed, class separability is not explicitly considered by linear PCA. Nonetheless, the overall discrimination is still good as a result of the application of the Curvelet transform. Furthermore, as a simple algorithm, PCA does not rely on samples of different classes; this advantage is crucial when fault samples are not available, e.g. for novelty detection.
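The NLPCA feature extraction of Eqs. (10)–(15) — pre-whitening by the average within-class scatter, building the weighted between-class scatter, and projecting onto its three leading eigenvectors — can be sketched as follows. This is a minimal NumPy/SciPy illustration under our own naming, assuming X holds the 130-D Curvelet energy features and y the class labels; it is not the authors' code.

```python
import numpy as np
from scipy.special import erf

def nlpca_projection(X, y, n_components=3):
    """Weighted pairwise-Fisher ("nonlinear PCA") projection, Eqs. (10)-(15).

    X : (n_samples, n_features) array of Curvelet energy features (130-D here).
    y : (n_samples,) integer class labels (normal plus the fault classes).
    Returns the projection matrix W (n_components x n_features) acting on the
    pre-whitened data, and the projected features Y (n_samples, n_components).
    """
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    d = X.shape[1]

    # Average within-class scatter S_w = sum_i p_i S_i, then pre-whitening.
    Sw = np.zeros((d, d))
    for c, p in zip(classes, priors):
        Sw += p * np.cov(X[y == c].T, bias=True)
    evals, evecs = np.linalg.eigh(Sw)
    evals = np.maximum(evals, 1e-12)               # guard against singular S_w
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Xw = (X - X.mean(axis=0)) @ whiten             # identity within-class scatter

    # Weighted between-class scatter, Eq. (10), with w(d) from Eq. (11).
    means = np.array([Xw[y == c].mean(axis=0) for c in classes])
    Sb = np.zeros((d, d))
    for i in range(len(classes) - 1):
        for j in range(i + 1, len(classes)):
            diff = (means[i] - means[j])[:, None]
            dist = np.linalg.norm(diff)
            w = erf(dist / (2.0 * np.sqrt(2.0))) / (2.0 * dist ** 2)
            Sb += priors[i] * priors[j] * w * (diff @ diff.T)

    # Eqs. (12)-(14): the leading eigenvectors of S_B give the linear map W.
    evals_b, evecs_b = np.linalg.eigh(Sb)
    order = np.argsort(evals_b)[::-1][:n_components]
    W = evecs_b[:, order].T                        # (n_components, n_features)

    # Eq. (15): project every (whitened) sample to the low-dimensional space.
    Y = Xw @ W.T
    return W, Y
```

Because the mapping is linear once S_B is fixed, the same W (together with the whitening transform) can be applied to the test samples, exactly as done with Eq. (15).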

Fig. 8. Feature extraction result of indicator diagram training samples (based on FDCvT & NLPCA).

Fig. 9. Feature extraction result of indicator diagram testing samples (based on FDCvT & NLPCA).

Fig. 10. Feature extraction result of indicator diagram training samples (based on FDCvT & PCA).

Fig. 11. Feature extraction result of indicator diagram testing samples (based on FDCvT & PCA).

3.3. Classification and novelty detection results by SVM


In recent decades, SVM has become an increasingly popular tool for machine learning tasks involving classification, regression and novelty detection. It exhibits good generalization performance on many real-world datasets and is well established theoretically (Campbell, 2000). In this paper, multi-class SVM (Franc, 2005) is adopted to classify the indicator diagrams of different faults, while one-class SVM is used for novelty detection. A typical scheme for multi-class SVM classification is to use a multi-class formulation and learn all the parameters of the discriminant functions at once (Franc, 2005). By contrast, one-class SVM finds a sphere of minimal radius that contains most of the normal data: novel test points are those which lie outside the boundary of the sphere. We employ the approaches proposed by Franc (2005) to accomplish these two tasks, with the radial basis function as the kernel in both cases. In practice, the two tasks have different implementation procedures, as demonstrated in Fig. 12.
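The paper uses the multi-class and one-class SVM solvers of Franc (2005); as an accessible stand-in, the sketch below uses scikit-learn's SVC (which handles the multi-class case via one-vs-one decomposition rather than a single joint formulation) and OneClassSVM, both with an RBF kernel. The hyperparameter values and the variable names in the usage comments are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM

def train_fault_classifier(features, labels):
    """Multi-class RBF-SVM on the 3-D NLPCA features (one label per class)."""
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")   # C is an illustrative choice
    clf.fit(features, labels)
    return clf

def train_novelty_detector(normal_features, nu=0.05):
    """One-class RBF-SVM trained on PCA features of normal samples only."""
    det = OneClassSVM(kernel="rbf", nu=nu, gamma="scale")
    det.fit(normal_features)
    return det

# Hypothetical usage with projected feature arrays:
# clf = train_fault_classifier(Y_train, y_train)
# fault_id = clf.predict(Y_test)
# det = train_novelty_detector(Y_train_normal)
# is_novel = det.predict(Y_test) == -1             # -1 marks novelties
```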


Fig. 12. Flow charts of classification and novelty detection: (a) multiple faults classification (normal and all-faults diagrams for training → preprocessing → FDCvT & nonlinear PCA → train the multi-class SVM → classify new testing samples); (b) novelty detection (normal diagrams only for training → preprocessing → FDCvT & PCA → train the one-class SVM → detect novelty in new testing samples).

The experimental results of multiple-fault classification and novelty detection are listed in Tables 1 and 2. The recognition accuracy of NLPCA is higher than that of PCA for almost all faults. When three features generated by PCA are used for novelty detection, 4 out of 260 normal samples are detected as outliers, while 2 out of 260 Fault 2 samples are detected as normal; this approach therefore shows very high accuracy. By contrast, when only two features are used, 8 out of 260 normal samples are detected as outliers, and 286 out of 1300 fault samples (150 of Fault 2, 126 of Fault 4 and 10 of Fault 5) are detected as normal.

3.4. Comparison with wavelet-based features

As demonstrated in the above section, the Curvelet transform-based features are very effective for the recognition and novelty detection of indicator diagrams. To further demonstrate their superiority, we also compare them with traditional features used for shape recognition, such as wavelet-based features (Diekmann et al., 2001; Imaeda et al., 2004; Murray et al., 2002). For a parallel comparison with the 5-level FDCvT-based features, a 5-level wavelet decomposition is adopted here. There are four kinds of coefficients at each level: approximation coefficients, horizontal detail coefficients, vertical detail coefficients and diagonal detail coefficients. We calculate the energy of each kind of coefficients in the same way as Eq. (9). The flow chart of the wavelet-based feature extraction is shown in Fig. 13; it yields 20 wavelet-based features. As with the FDCvT-based features, these features are used for classification and novelty detection through NLPCA/PCA and multi-class SVM/one-class SVM. The corresponding experimental results are given in Tables 3 and 4. As demonstrated by Tables 1–4, the overall performance of the FDCvT-based features is better than that of the wavelet-based features.
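A minimal sketch of this wavelet-based baseline (5 levels × 4 sub-bands → 20 energies) using PyWavelets is given below; the wavelet family ('db4') and the level-by-level handling of the approximation band are our assumptions, since the paper does not specify them.

```python
import numpy as np
import pywt

def wavelet_energy_features(image, wavelet="db4", levels=5):
    """20 wavelet-based features: energy of the approximation band plus the
    horizontal/vertical/diagonal detail bands at each of 5 decomposition levels.

    Note: a single pywt.wavedec2 call returns only one final approximation band
    (1 + 5*3 = 16 sub-bands); to obtain 4 energies per level as in Fig. 13 we
    decompose level by level with dwt2 and keep each intermediate approximation.
    """
    features = []
    approx = np.asarray(image, dtype=float)
    for _ in range(levels):
        approx, (h, v, d) = pywt.dwt2(approx, wavelet)
        features.extend(np.sum(band ** 2) for band in (approx, h, v, d))
    return np.array(features)        # shape (20,) for levels=5

# Hypothetical usage with a 256 x 256 gray-level indicator diagram `img`:
# feats = wavelet_energy_features(img)   # -> 20-D wavelet feature vector
```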

Fig. 13. Flow chart of the wavelet-based feature extraction: 2D digital indicator diagram (256 × 256) → 2D wavelet transform (5 levels) → 5 levels × 4 coefficient sub-bands → calculate the energy of every sub-band → 20 wavelet-based features.

The accuracy of the wavelet-based features is sharply lower under some conditions. That is to say, the FDCvT-based features are more representative than the wavelet-based features.


Table 1
Accuracy ratio of multiple faults classification (FDCvT-based features).

                           3 features extracted by PCA (%)    3 features extracted by NLPCA (%)
All faults (260 × 5)       98.25                              99.79
Fault 1 (260 samples)      93.62                              100
Fault 2 (260 samples)      97.65                              99.33
Fault 3 (260 samples)      100                                100
Fault 4 (260 samples)      100                                99.66
Fault 5 (260 samples)      100                                100


Table 2
Accuracy ratio of novelty detection (FDCvT-based features); features extracted by PCA.

                           2 features (%)    3 features (%)
Normal (260 samples)       96.92             98.46
Fault 1 (260 samples)      100               100
Fault 2 (260 samples)      42.30             99.23
Fault 3 (260 samples)      100               100
Fault 4 (260 samples)      51.53             100
Fault 5 (260 samples)      96.15             100


Table 3
Accuracy ratio of multiple faults classification (wavelet-based features).

                           3 features extracted by PCA (%)    3 features extracted by NLPCA (%)
All faults (260 × 5)       87.58                              98.99
Fault 1 (260 samples)      96.64                              93.96
Fault 2 (260 samples)      85.57                              100
Fault 3 (260 samples)      97.99                              99.33
Fault 4 (260 samples)      79.19                              100
Fault 5 (260 samples)      100                                99.66

Fig. 14. The overall framework: start → are fault samples available? If yes, perform multiple faults classification; if no, perform novelty detection, confirm the fault manually, and store the fault indicator diagrams as new samples.


Table 4
Accuracy ratio of novelty detection (wavelet-based features); features extracted by PCA.

                           2 features (%)    3 features (%)
Normal (260 samples)       75.38             74.23
Fault 1 (260 samples)      100               100
Fault 2 (260 samples)      100               100
Fault 3 (260 samples)      100               100
Fault 4 (260 samples)      39.23             58.85
Fault 5 (260 samples)      99.62             100

4. Practical considerations

The purpose of this paper is to provide an effective approach to the real-world engineering problem of indicator diagram recognition, so it is worth taking some practical situations into account.

1. Samples of three or more fault classes are available. As the above discussion shows, at least three classes of fault samples are necessary for the application of NLPCA to multi-class indicator diagram recognition. Therefore, if samples of three or more fault classes are available, the NLPCA-based multi-class recognition can be implemented.

2. No fault samples are available. Samples of fault indicator diagrams are not always available in practice. In this case, novelty detection based on PCA can be performed. Afterwards, a professional engineer can check the detected fault indicator diagrams, confirm the fault, and store them as new fault samples.

3. The overall framework. Considering the practical situations discussed above, we suggest the overall framework illustrated by Fig. 14.

5. Conclusions

In this paper, we propose a novel approach for indicator diagram recognition and novelty detection. The discrete 2D-Curvelet transform is employed to transform an indicator diagram into a high-dimensional feature space. Then, the so-called nonlinear PCA is adopted to map the high-dimensional features into a 3-dimensional feature space in the multi-class recognition situation, while PCA is used for dimensionality reduction in the one-class situation. Finally, multi-class SVM and one-class SVM are taken as the classifier and the novelty detector, respectively. Regarding computational efficiency, PCA and SVM have a low computational burden and, although class separability is enhanced by nonlinear weighting, the final mapping of NLPCA is linear and can be realized by solving a standard eigenvalue problem; as a result, the proposed approach is fast enough for real-time applications. Experimental investigations are carried out to verify the approach, and satisfactory results are obtained. Furthermore, we compare the FDCvT-based feature generation approach with the traditional wavelet-based approach; the results show that the proposed approach is more accurate. At the end of the paper, some practical considerations about the proposed approach are discussed and an overall implementation framework is suggested for engineers.

Acknowledgements

This research is supported by the National Natural Science Foundation of China under Grant No. 50635010. The authors also thank Prof. Emmanuel Candès of Applied and Computational Mathematics, California Institute of Technology, and Assistant Prof. Laurent Demanet of the Department of Mathematics, MIT, for providing the CurveLab software and the referenced figure (Fig. 3).


References

Baccarini, L. M. R., Rocha e Silva, V. V., de Menezes, B. R., & Caminhas, W. M. (2011). SVM practical industrial application for mechanical faults diagnostic. Expert Systems with Applications, 38, 6980–6984.
Bishop, C. (2006). Pattern recognition and machine learning. New York: Springer.
Campbell, C. (2000). Algorithmic approaches to training support vector machines: A survey. In Proceedings of ESANN2000 (pp. 27–36).
Candes, E., Demanet, L., Donoho, D., & Ying, L. (2007). Fast discrete curvelet transforms. Multiscale Modeling and Simulation, 5, 861–899.
Candes, E., & Donoho, D. (2004). New tight frames of curvelets and optimal representations of objects with C2 singularities. Communications on Pure and Applied Mathematics, 56, 219–266.
Candes, E., Donoho, D., Demanet, L., & Ying, L. (xxxx). Curvelab: Fast discrete curvelet transform. URL: .
Cho, H.-W. (2007). Nonlinear feature extraction and classification of multivariate data in kernel feature space. Expert Systems with Applications, 32, 534–542.
Diekmann, F., Heinlein, P., Drexl, J., Grebe, S., Gössler, A., Schneider, W., et al. (2001). Visualization of microcalcifications by full-field digital mammography using a wavelet algorithm. International Congress Series, 1230, 526–530.
Donoho, D., & Duncan, M. (2000). Digital curvelet transform: Strategy, implementation and experiments. Dept. of Statistics, Stanford University.
Duin, R., Loog, M., & Haeb-Umbach, R. (2000). Multi-class linear feature extraction by nonlinear PCA. In IEEE proceedings of the 15th international conference on pattern recognition, Barcelona, Spain (Vol. 15, pp. 398–401).
Franc, V. (2005). Optimization algorithms for kernel methods. Centre for Machine Perception, Czech Technical University.
Heath, M. (2002). Scientific computing. McGraw-Hill.
Huagen, W., Ziwen, X., & Pengcheng, S. (2004). Theoretical and experimental study on indicator diagram of twin screw refrigeration compressor. International Journal of Refrigeration, 27, 331–338.
Imaeda, S., Kobashi, S., Kitamura, Y. T., Kondo, K., Hata, Y., & Yanagida, T. (2004). Wavelet-based hemodynamic analyzing method in event-related fMRI with statistical processing. International Congress Series, 1270, 138–141.
Jean-Luc, S., Candes, E. J., & Donoho, D. L. (2002). The curvelet transform for image denoising. IEEE Transactions on Image Processing, 11, 670–684.
Li, Y., Yang, Q., & Jiao, R. (2010). Image compression scheme based on curvelet transform and support vector machine. Expert Systems with Applications, 37, 3063–3069.


Luo, J., Ying, K., He, P., & Bai, J. (2005). Properties of Savitzky–Golay digital differentiators. Digital Signal Processing, 15, 122–136.
Ma, J. (2007). Curvelets for surface characterization. Applied Physics Letters, 90, 054109.
Ma, J., Antoniadis, A., & Le Dimet, F. (2006). Curvelet-based snake for multiscale detection and tracking of geophysical fluids. IEEE Transactions on Geoscience and Remote Sensing, 44, 3626–3638.
Ma, J., & Plonka, G. (submitted for publication). A review of curvelets and recent applications. IEEE Signal Processing Magazine.
Moghadas Nejad, F., & Zakeri, H. (2011). A comparison of multi-resolution methods for detection and isolation of pavement distress. Expert Systems with Applications, 38, 2857–2872.
Murray, K. B., Gorse, D., & Thornton, J. M. (2002). Wavelet transforms for the characterization and detection of repeating motifs. Journal of Molecular Biology, 316, 341–363.
Press, W., Teukolsky, S., Vetterling, W., & Flannery, B. (1990). Savitzky–Golay smoothing filters. Computers in Physics, 4, 669–672.
Schölkopf, B., Smola, A., & Muller, K. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, 1299–1319.
Scholkopf, B., Williamson, R., Smola, A., Shawe-Taylor, J., & Platt, J. (2000). Support vector method for novelty detection. Advances in Neural Information Processing Systems, 12, 582–588.
Starck, J., Donoho, D., & Candes, E. (2003). Astronomical image representation by the curvelet transform. Astronomy and Astrophysics, 398, 785–800.
Theodoridis, S., & Koutroumbas, K. (2003). Pattern recognition. New York: Academic Press.
van der Heijden, F., Duin, R., De Ridder, D., & Tax, D. (2004). Classification, parameter estimation and state estimation: An engineering approach using MATLAB. John Wiley & Sons Inc.
Widodo, A., & Yang, B.-S. (2007). Application of nonlinear feature extraction and support vector machines for fault diagnosis of induction motors. Expert Systems with Applications, 33, 241–250.
Wu, H., Peng, X., Xing, Z., & Shu, P. (2004). Experimental study on P–V indicator diagrams of twin-screw refrigeration compressor with economizer. Applied Thermal Engineering, 24, 1491–1500.
Xiang, J., Xiao, Z., Wang, Y., Feng, Y., Qiao, H., Sun, B., et al. (2007). Detection of subtle structural abnormality in tuberous sclerosis using MEG guided post-image processing. International Congress Series, 1300, 693–696.