Complex system fault diagnosis based on a fuzzy robust wavelet support vector classifier and an adaptive Gaussian particle swarm optimization


Information Sciences 180 (2010) 4514–4528


Qi Wu a,*, Rob Law b

a Jiangsu Key Laboratory for Design and Manufacture of Micro-Nano Biomedical Instruments, Southeast University, Nanjing 211189, China
b School of Hotel and Tourism Management, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

ARTICLE INFO

Article history: Received 8 July 2009. Received in revised form 25 July 2010. Accepted 1 August 2010.

Keywords: Fault diagnosis; FW v-SVC; Wavelet kernel function; Adaptive mutation; Gaussian mutation; Particle swarm optimization

ABSTRACT

This paper proposes a robust loss function that penalizes hybrid noise (i.e., Gaussian noise, singularity points, and larger magnitude noise) in a complex fuzzy fault-diagnosis system. A mapping relationship between fuzzy numbers and crisp real numbers that allows a fuzzy sample set to be transformed into a crisp real sample set is also presented. Furthermore, the paper proposes a novel fuzzy robust wavelet support vector classifier (FRWSVC) based on a wavelet base function and develops an adaptive Gaussian particle swarm optimization (AGPSO) algorithm to seek the optimal unknown parameter of the FRWSVC. The results of experiments that apply the hybrid diagnosis model based on the FRWSVC and the AGPSO algorithm to fault diagnosis demonstrate that it is both feasible and effective. Tests comparing the method proposed in this paper against other fuzzy support vector classifier (FSVC) machines show that it outperforms them.

© 2010 Elsevier Inc. All rights reserved.

1. Introduction

Support vector machines (SVMs), which comprise a novel set of machine learning techniques, have recently drawn considerable attention in the fields of pattern classification [1,3,20,50,51] and regression forecasting [2,6,11,41,47,48]. They were first introduced by Vapnik and his colleagues [37] as a classification method grounded in statistical learning theory. SVMs are derived from linear classifiers and can be used to solve the problem of classifying new examples into one of two categories; the approach was later extended to nonlinear problems, in which the goal is to find the optimal large-margin hyper-plane that separates a set of samples. SVMs represent an approximate implementation of the structural risk minimization (SRM) principle, an alternative to the empirical risk minimization (ERM) method of statistical learning theory. Unlike traditional neural networks [25], SVMs are based on SRM and therefore avoid problems such as overfitting, the curse of dimensionality, and convergence to local minima [37]. Algorithms of this type generalize well from small sample sets, and SVMs have been successfully used for machine learning with large and high-dimensional data sets [1,3,8,20,24,39,40,49]. These attractive properties underline the promise of the SVM technique, because SVM generalization does not depend on the complete set of training data but only on a subset thereof, namely the so-called support vectors. SVMs have now been applied in many fields, such as fault diagnosis [27,39,40,46], classification [1,3,20,50,51], image retrieval [24,41], handwritten digit recognition [14,44], feature selection [22], regression estimation [2,6,11,41,47,48], biomedical data classification [10], network anomaly detection [30], color image watermark extraction [33], and a number of other applications [15,29,45,47].

* Corresponding author. Tel.: +86 25 51166581; fax: +86 25 511665260. E-mail addresses: [email protected] (Q. Wu), [email protected] (R. Law).
0020-0255/$ - see front matter © 2010 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2010.08.006


In many real fault-diagnosis applications, some observed input data cannot be measured precisely and are often described in imprecise language or ambiguous metrics, which leads to fuzzy and uncertain decision-making results. Traditional support vector machine methods cannot cope with fuzzy information, whereas fuzzy logic is well known to be a powerful tool for dealing with fuzzy and uncertain problems. A number of scholars have explored fuzzy support vector machines (FSVMs) [4,5,8,17,50,51] that use fuzzy membership to score input variables and facilitate crisp decision-making. However, these published FSVMs are not suited to input and output variables that are themselves fuzzy numbers. To resolve this problem, we transform fuzzy numbers into crisp real numbers according to a proposed mapping relationship. This not only ensures that all input samples are crisp, but also transforms the output variables of SVMs into fuzzy numbers to facilitate more appropriate subjective decision-making.

Complex fuzzy diagnosis systems inevitably contain some level of noise. Given that standard support vector classifier machines with an ε-insensitive loss function are poor at penalizing hybrid noise (i.e., Gaussian noise, singularity points, and larger magnitude noise), it is necessary to develop a novel loss function. By analyzing noise features, we propose a robust loss function composed of a Gaussian loss function, a Laplace loss function, and an ε-insensitive loss function. The proposed function is effective in reducing the influence of hybrid noise on the generalization performance of the algorithm. Furthermore, a fuzzy robust support vector classifier machine (FSVC) is established on the basis of the fuzzy mapping and robust loss function proposed here. When resolving the decision-making function of the FSVC model, it is difficult to derive b precisely. To overcome this disadvantage, the influence of parameter b is taken into account in the confidence interval of the FSVC model, and a modified FSVC model is established in which parameter b does not appear in the decision-making function.

To make use of wavelet decomposition, this paper proposes an allowable support vector kernel function called the wavelet kernel function, and shows that such a kernel exists. On the basis of wavelet analysis and the conditions of the support vector kernel function, the Morlet and Mexican hat wavelet functions can be transformed into the kernel function of the fuzzy support vector classifier (FSVC) machine. Some scholars have previously explored the wavelet ε-support vector classifier or regression machine [39,42], and some studies have shown that v-SVC performs better than ε-SVC [42]. Therefore, this paper focuses on modeling the modified fuzzy v-support vector classifier machine with a wavelet kernel function (FW v-SVC). By integrating the robust loss function, the wavelet kernel function, and the mapping relationship, and by adjusting parameter b, we propose a new version of the fuzzy wavelet support vector classifier machine referred to as the fuzzy robust W v-SVC (FRW v-SVC). The input and output of FRW v-SVC are fuzzy numbers. The main contributions of the proposed FRW v-SVC are as follows:

(1) FRW v-SVC has the ability to penalize hybrid noise (i.e., Gaussian noise, singularity points, and larger magnitude noise).
(2) FRW v-SVC is suited to fuzzy diagnosis cases such as those described in (1) above.
(3) FRW v-SVC has the ability to handle fuzzy input and output cases.
(4) Parameter b is removed from the decision-making function through a Hilbert space transformation.

Based on the FRW v-SVC model, this paper proposes an intelligent diagnosis method for complex fuzzy diagnosis systems. The rest of the paper is organized as follows. Section 2 introduces the modeling process for FRW v-SVC. The merits and characteristics of the FRW v-SVC model are described in Section 3. Section 4 proposes an adaptive Gaussian mutation particle swarm optimization (AGPSO) algorithm. Section 5 designs two algorithms to solve the intelligent diagnosis problem. Four applications based on the FRW v-SVC model and the AGPSO algorithm are discussed in Section 6. Section 7 presents conclusions.

2. Fuzzy robust wavelet support vector classifier

2.1. Robust loss function

Suppose that we have the sample set $T = \{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in R^d$ and $y_i \in R$. With the objective of mitigating learning risk, the ideal learning algorithm is designed to minimize the following empirical risk function [37]:

$$R_{emp}(w) = \int \left| y_i - f(x_i, w) \right| dp(x_i, y_i), \qquad (1)$$

where $p(x_i, y_i)$ is the probability distribution of $(x_i, y_i)$. Due to lack of information on $p(x_i, y_i)$, it is common to use the empirical risk function $R_{emp}(w)$ estimated from the training sample set. This loss function plays an important role in modeling for applications of support vector techniques. To avoid overfitting and enhance generalization, a capacity control term [34] is often given as follows:

$$R(w) = R_{emp}(w) + \frac{\lambda}{2}\|w\|^2 = \frac{1}{l}\sum_{i=1}^{l} c\left(x_i, y_i, f(x_i, w)\right) + \frac{\lambda}{2}\|w\|^2, \qquad (2)$$


where $c$ denotes a loss function determining how the estimation error based on the training samples will be penalized, and $\lambda > 0$ is a regularization constant. The last term controls the flatness of the function; in this context, controlling flatness requires the achievement of a small $\|w\|^2$, where $w \in R^d$, which has a significant effect on the generalization capability of the algorithm. Thus, one might consider what loss function should be used. Ideally, this function should have a simple structure, to avoid difficult optimization problems, and should be suitable for the data. Meanwhile, noise may affect the samples even after denoising. A set of training samples is generated by a function plus additive noise as follows:

$$y_i = f(x_i) + \xi_i. \qquad (3)$$

The likelihood of an estimate $F_f = \{(x_i, f(x_i, w)) \mid i = 1, 2, \ldots, l\}$ based on the sample set $F$ is

$$P(F_f \mid F) = \prod_{i=1}^{l} P\left(f(x_i, w) \mid (x_i, y_i)\right) = \prod_{i=1}^{l} P\left(f(x_i, w) \mid y_i\right) = \prod_{i=1}^{l} p\left(y_i - f(x_i, w)\right) = \prod_{i=1}^{l} p(\xi_i), \qquad (4)$$

where $p(\xi_i)$ is the noise density. The appropriate loss function maximizes this likelihood, which is equivalent to maximizing $\log P(F_f \mid F)$, where

$$\log P(F_f \mid F) = \sum_{i=1}^{l} \log p\left(y_i - f(x_i, w)\right). \qquad (5)$$

Thus, the appropriate loss function is

$$c\left(x_i, y_i, f(x_i, w)\right) = -\log p\left(y_i - f(x_i, w)\right) = -\log p(\xi_i). \qquad (6)$$

The abnormal condition can be detected by comparison with the similarity measure defined by

$$\log P(F_f \mid F) = \sum_{i=1}^{l} \log p(\xi_i) = -\sum_{i=1}^{l} c\left(x_i, y_i, f(x_i, w)\right). \qquad (7)$$

The importance of a new similarity measure is twofold. First, it presents a principle that enables the construction of a loss function. Once the noise density of the system is defined, its related loss function can be obtained according to (6). In real-world applications, the standard Gaussian density model N(0, 1) is commonly used to describe noise. Hence, the Gaussian density model and its loss function are employed here as outlined in (8) and (9):

$$p\left(y_i - f(x_i, w)\right) = p(\xi_i) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}\xi_i^2\right), \qquad (8)$$

$$c\left(x_i, y_i, f(x_i, w)\right) = \frac{1}{2}\left(y_i - f(x_i, w)\right)^2 = \frac{1}{2}\xi_i^2. \qquad (9)$$

Second, the result provides a theoretical standard by which to determine whether two signals are generated under the same condition. Under normal conditions, the process demonstrates unique characteristics that are reflected in its related signals; signal patterns in abnormal conditions exhibit deviations that are referred to here as noise. Prediction values emerge when the unknown input vectors are substituted into regression models. For a specific noise density model or its related loss function, the likelihood that the unknown input vector will estimate the function, i.e., the similarity measure, can be calculated according to (7). As a result, it is possible to implement condition monitoring by comparing the similarity of an unknown signal with the likelihood distribution of normal ones.

However, for the standard W v-SVC [36], it is difficult to address the hybrid noise of a sample set. To overcome this shortcoming of the ε-insensitive loss, a new hybrid function composed of a Gaussian function, a Laplace function, and an ε-insensitive loss function is constructed to serve as the loss function of v-SVM; it is referred to as the robust loss function. The robust loss function can be defined as

$$L(\xi) = \begin{cases} 0, & |\xi| \le \varepsilon, \\ \frac{1}{2}\left(|\xi| - \varepsilon\right)^2, & \varepsilon < |\xi| \le e_\mu, \\ \mu\left(|\xi| - \varepsilon\right) - \frac{1}{2}\mu^2, & |\xi| > e_\mu, \end{cases} \qquad (10)$$

where $e_\mu = \varepsilon + \mu$, with $\varepsilon \ge 0$ and $\mu \ge 0$. The middle part of the robust loss function is an error quadratic term used to inhibit (or penalize) noise with a Gaussian distribution. The linear part of the robust loss function is used to inhibit (penalize) singularity points and larger magnitude noise in time series. The robust loss function thus integrates the advantages of the Gaussian loss function, the Laplace loss function, and the ε-insensitive loss function, and makes the support vector machine more robust and generalizable.
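As a concrete illustration, the following minimal Python sketch (our own, not part of the original paper; the function name robust_loss and the default values of ε and μ are assumptions) evaluates the piecewise loss (10):

```python
import numpy as np

def robust_loss(xi, eps=0.1, mu=0.5):
    """Robust loss (10): zero inside the eps-tube, quadratic (Gaussian-like)
    for moderate residuals, linear (Laplace-like) beyond e_mu = eps + mu."""
    xi = np.abs(np.asarray(xi, dtype=float))
    e_mu = eps + mu                          # join point of quadratic and linear parts
    return np.where(xi <= eps, 0.0,
                    np.where(xi <= e_mu,
                             0.5 * (xi - eps) ** 2,              # penalizes Gaussian noise
                             mu * (xi - eps) - 0.5 * mu ** 2))   # penalizes outliers

# The pieces join continuously at |xi| = eps and at |xi| = e_mu:
print(robust_loss([0.05, 0.3, 2.0]))   # -> [0.    0.02  0.825]
```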

2.2. The conditions of the kernel function of the wavelet support vector

The support vector's kernel function can be described not only as a dot product, such as $K(x, x') = K(\langle x \cdot x' \rangle)$, but also as a horizontal floating function, such as $K(x, x') = K(x - x')$. In practice, any function that satisfies Mercer's condition is an allowable support vector kernel function.


Lemma 1 ([23]). The symmetric function $K(x, x')$ is a kernel function of the SVM if and only if, for every function $g \ne 0$ satisfying $\int_{R^d} g^2(x)\,dx < \infty$, the following condition is satisfied:

$$\iint K(x, x')\, g(x)\, g(x')\, dx\, dx' \ge 0. \qquad (11)$$

This lemma provides a simple method for building a kernel function, but it is difficult to decompose a horizontal floating function into two such factor functions, so we state the condition for the horizontal floating kernel function directly.

Lemma 2 ([32]). The horizontal floating function $K(x - x')$ is an allowable support vector kernel function if and only if the Fourier transform of $K(x)$ satisfies the following condition:

$$F[K](\omega) = (2\pi)^{-n/2} \int_{R^n} \exp\left(-j(\omega \cdot x)\right) K(x)\, dx \ge 0. \qquad (12)$$

If the wavelet function $\psi(x)$ satisfies the conditions $\psi(x) \in L^2(R) \cap L^1(R)$ and $\hat{\psi}(0) = 0$, where $\hat{\psi}$ is the Fourier transform of $\psi(x)$, then the wavelet function group can be defined as

$$\psi_{a,m}(x) = a^{-\frac{1}{2}}\, \psi\!\left(\frac{x - m}{a}\right), \qquad (13)$$

where $a$ is the scaling parameter, $m$ is the horizontal floating (translation) coefficient, and $\psi(x)$ is called the mother wavelet. The parameters (namely, translation $m \in R$ and dilation $a > 0$) may be continuous or discrete. For a function $f(x) \in L^2(R)$, the wavelet transform of $f(x)$ can be defined as

$$W(a, m) = a^{-\frac{1}{2}} \int_{-\infty}^{+\infty} f(x)\, \psi^*\!\left(\frac{x - m}{a}\right) dx, \qquad (14)$$

where $\psi^*(x)$ stands for the complex conjugate of $\psi(x)$. The wavelet transform $W(a, m)$ can be considered a function of translation $m$ for each scale $a$. Eq. (14) implies that wavelet analysis is a time-frequency or time-scale form of analysis. Unlike the short-time Fourier transform (STFT), the wavelet transform can be used for multi-scale analysis of a signal through dilation and translation and is therefore effective in extracting the time-frequency features of a signal. The wavelet transform is also reversible, making it possible to reconstruct the original signal. A classical inversion formula for $f(x)$ is

$$f(x) = C_\psi^{-1} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} W(a, m)\, \psi_{a,m}(x)\, \frac{da\, dm}{a^2}, \qquad (15)$$

where

$$C_\psi = \int_{-\infty}^{\infty} \frac{\left|\hat{\psi}(\omega)\right|^2}{|\omega|}\, d\omega < \infty \qquad (16)$$

and

$$\hat{\psi}(\omega) = \int \psi(x) \exp(-j\omega x)\, dx. \qquad (17)$$

For Eq. (15), $C_\psi$ is a constant determined by $\psi(x)$. Wavelet decomposition theory states that the function $f(x)$ can be approximated by a linear combination of wavelet function groups. If $\psi(x)$ is a one-dimensional wavelet function, tensor product theory gives the multidimensional wavelet function as

$$\psi_d(x) = \prod_{i=1}^{d} \psi(x_i). \qquad (18)$$

We can build the horizontal floating kernel function as follows:

$$K(x, x') = \prod_{i=1}^{d} \psi\!\left(\frac{x_i - x_i'}{a_i}\right), \qquad (19)$$

where $a_i$ is the scaling parameter of the wavelet, with $a_i > 0$. Few wavelet kernel functions can be exhibited in terms of existing functions, because a wavelet kernel function must satisfy the conditions of Lemma 2. We now discuss the Morlet wavelet kernel function, one wavelet kernel function that does exist, and prove that it satisfies the conditions for an allowable support vector kernel function. The Morlet wavelet function is defined as follows:

$$\psi(x) = \cos(\omega_0 x) \exp\!\left(-\frac{x^2}{2}\right). \qquad (20)$$


Theorem 1. The Morlet wavelet kernel function, defined as

$$K(x, x') = \prod_{i=1}^{n} \cos\!\left(\omega_0\, \frac{x_i - x_i'}{a}\right) \exp\!\left(-\frac{\|x_i - x_i'\|^2}{2a^2}\right), \qquad (21)$$

is an allowable support vector kernel function.

Proof. See Appendix A. □

If we use the wavelet kernel function as the support vector's kernel function, the decision-making function of the standard W v-SVC is defined as

$$f(x) = \mathrm{sgn}\!\left(\sum_{i=1}^{l} y_i a_i \prod_{j=1}^{n} \psi\!\left(\frac{x^j - x_i^j}{a}\right) + b\right). \qquad (22)$$
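To make the Morlet kernel (21) concrete, here is a minimal Python sketch (our own illustration, not the authors' code; the function names and the choice ω₀ = 1.75 are assumptions, while the default dilation a = 0.4 matches the value reported as optimal in Example 1 below):

```python
import numpy as np

def morlet_kernel(x, x2, a=0.4, w0=1.75):
    """Morlet wavelet kernel (21): a product over input dimensions of
    cos(w0 * d/a) * exp(-d^2 / (2 a^2)), where d = x_i - x2_i."""
    d = np.asarray(x, dtype=float) - np.asarray(x2, dtype=float)
    return np.prod(np.cos(w0 * d / a) * np.exp(-d ** 2 / (2.0 * a ** 2)))

def gram_matrix(X, a=0.4, w0=1.75):
    """Kernel (Gram) matrix K[i, j] = K(X[i], X[j]) for a sample matrix X."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = morlet_kernel(X[i], X[j], a, w0)
    return K

X = np.array([[0.66, 0.0154], [0.98, 0.0305], [0.12, 0.0744]])
print(gram_matrix(X))  # symmetric, with ones on the diagonal since K(x, x) = 1
```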

Please see Refs. [13,21] regarding wavelet theory.

2.3. Fuzzy robust wavelet support vector classifier

The structural risk minimization (SRM) rule is approximately accurate under the conditions of probability. A function f(x) satisfying the SRM rule is a kind of SVC machine, and the equipollent functions of f(x) have many representations, of which the standard SVC is only one. Because it is difficult to solve for the coefficient b when solving the standard v-SVC, and because errors may occur in the process of identifying parameter b, a hyper-plane with the representation $\bar f(\bar x) = \mathrm{sgn}(\bar w \cdot \bar x)$ satisfying the SRM rule is constructed by incorporating b into the confidence interval of the standard SVC machine. The coefficient b then does not appear in the final decision-making formulation $\bar f(\bar x) = \mathrm{sgn}(\bar w \cdot \bar x)$.

Consider the Hilbert space of augmented vectors $\bar x = (x^T, \eta)^T$, in which the inner product between $\bar x_1 = (x_1^T, \eta)^T$ and $\bar x_2 = (x_2^T, \eta)^T$ is defined as $(\bar x_1 \cdot \bar x_2) = (x_1 \cdot x_2) + \eta^2$. Letting $\bar w = \left(w^T, \frac{b}{\eta}\right)^T$, with $\eta \ne 0$, we then have

$$\bar f(\bar x) = \mathrm{sgn}(\bar w \cdot \bar x) = \mathrm{sgn}\!\left(w \cdot x + \frac{b}{\eta}\, \eta\right) = \mathrm{sgn}(w \cdot x + b) = f(x). \qquad (23)$$
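A small numerical check of the augmentation trick in Eq. (23) (our own sketch, not from the paper; the value η = 1 and the test vectors are arbitrary) shows that the augmented, bias-free hyper-plane reproduces the original decision values:

```python
import numpy as np

w, b, eta = np.array([0.7, -1.2]), 0.35, 1.0

# Absorb the bias b into one extra coordinate of weight b/eta.
w_bar = np.append(w, b / eta)

for x in (np.array([0.2, 0.1]), np.array([-0.5, 0.9])):
    x_bar = np.append(x, eta)                   # x_bar = (x^T, eta)^T
    assert np.sign(w_bar @ x_bar) == np.sign(w @ x + b)
print("sgn(w_bar . x_bar) == sgn(w . x + b) for all test points")
```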

It is obvious from Eq. (23) that the hyper-plane given by $\bar f(\bar x)$ is equivalent to the hyper-plane given by f(x). Therefore, the hyper-plane with $\bar f(\bar x)$ also satisfies the structural risk minimum (SRM) rule under the conditions of probability.

Because the input and output decision-making variables are fuzzy numbers, they are difficult for standard wavelet support vector classifier machines to handle. To remedy this disadvantage, Definition 1 establishes a mapping relationship between fuzzy numbers and crisp real numbers. By this mapping relationship, triangular fuzzy numbers are transformed into crisp real numbers that are used as the input data for a wavelet support vector machine, which leads to straightforward computation. To illustrate the triangular fuzzy number, the λ-cuts of two triangular fuzzy numbers are shown in Fig. 1, where $A = (r_A - \Delta r_A, r_A, r_A + \Delta r_A)$ and $B = (r_B - \Delta r_B, r_B, r_B + \Delta r_B)$ are two triangular fuzzy numbers.

Definition 1 (The mapping relationship transforming fuzzy numbers into crisp real numbers [43]). For the fuzzy sample set $T = \{(x_i, \bar y_i)\}_{i=1}^{l}$, where $x_i \in R^d$ is a vector of crisp real numbers and $\bar y_i$ is a triangular fuzzy number that represents a fuzzy class, the membership of a sample point belonging to the positive class (+1) or the negative class (-1) is $d^+$ or $d^-$, respectively, with $d^+, d^- \in [0.5, 1]$. For convenient representation, $d \in [-1, -0.5] \cup [0.5, 1]$ is introduced so that $d = d^+$ for the positive class and $d = -d^-$ for the negative class. The triangular fuzzy feature of $\bar y_i$ can then be defined by the following special triangular fuzzy number:

Fig. 1. The λ-cuts of two triangular fuzzy numbers.

$$\bar y_i = (r_1, r_2, r_3) = \begin{cases} \left(\dfrac{2d^2 + d - 2}{d},\; 2d - 1,\; \dfrac{2d^2 - 3d + 2}{d}\right), & 0.5 \le d \le 1, \\[1ex] \left(\dfrac{2d^2 + 3d + 2}{d},\; 2d + 1,\; \dfrac{2d^2 - d - 2}{d}\right), & -1 \le d \le -0.5, \end{cases} \qquad (24)$$

where $r_1$ is the left boundary, $r_2$ is the center, and $r_3$ is the right boundary. These variables satisfy the relationship $2r_2 = r_1 + r_3$.

Fuzzy positive class. For a sample point, the membership of its positive class is 0.9 and the membership of its negative class is 0.1. The fuzzy feature of the sample point can be described as the triangular fuzzy number $\bar a = (0.58, 0.8, 1.02)$, where 0.8 is the center of $\bar a$.

Fuzzy negative class. For a sample point, the membership of its negative class is 0.8 and the membership of its positive class is 0.2. The fuzzy feature of the sample point can be described as the triangular fuzzy number $\bar b = (-1.1, -0.6, -0.1)$, where $-0.6$ is the center of $\bar b$.

Mediacy. For a sample point, the membership of its positive class is 0.5 and the membership of its negative class is 0.5. The fuzzy feature of the sample point can be described as the triangular fuzzy number $\bar c = (-2, 0, 2)$, where 0 is the center of $\bar c$.
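The following minimal Python sketch (our own illustration, not the authors' code; the function name fuzzy_class_number is hypothetical) implements the mapping (24) and reproduces the three examples above:

```python
def fuzzy_class_number(d):
    """Map a signed class membership d in [-1, -0.5] U [0.5, 1] to the
    triangular fuzzy number (r1, r2, r3) of Eq. (24)."""
    if 0.5 <= d <= 1:        # fuzzy positive class
        return ((2*d*d + d - 2) / d, 2*d - 1, (2*d*d - 3*d + 2) / d)
    if -1 <= d <= -0.5:      # fuzzy negative class
        return ((2*d*d + 3*d + 2) / d, 2*d + 1, (2*d*d - d - 2) / d)
    raise ValueError("d must lie in [-1, -0.5] or [0.5, 1]")

print(fuzzy_class_number(0.9))   # (0.577..., 0.8, 1.022...) ~ (0.58, 0.8, 1.02)
print(fuzzy_class_number(-0.8))  # (-1.1, -0.6, -0.1)
print(fuzzy_class_number(0.5))   # (-2.0, 0.0, 2.0), the mediacy case
```

Note that each returned triple satisfies $2r_2 = r_1 + r_3$ by construction.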

For the fuzzy sample set $T = \{(x_i, \bar y_i)\}_{i=1}^{l}$, where $x_i \in R^d$ is a vector of crisp real numbers and $\bar y_i$ is a triangular fuzzy number satisfying Eq. (24), the membership of any triangular fuzzy number can be represented by either $d^+$ or $d^-$. Using the robust loss function proposed above, the mapping transformation proposed in Definition 1, and the wavelet kernel function also proposed above, a fuzzy robust W v-SVC (FRW v-SVC) with the following mathematical formulation is proposed:

$$\begin{aligned} \min_{w, \xi, \rho, b} \quad & s(w, \xi, \rho, b) = \frac{1}{2}\left(\|w\|^2 + b^2\right) - v\rho + \frac{1}{l}\left(\sum_{i=1}^{l} \xi_i + \frac{1}{2}\sum_{i=1}^{l} \xi_i^2\right) \\ \text{s.t.} \quad & y_i\left(w \cdot x_i + b\right) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad \rho \ge 0, \end{aligned} \qquad (25)$$

where $v \in (0, 1]$ is an adjustable parameter, $\rho$ is a variable of the optimization problem that controls the size of the margin $2\rho / \|w\|$ between the two classes, and the $\xi_i$ are slack variables. The constant $\mu \ge 0$ controls the degree of penalty for singularity points and larger magnitude noise in time series. In problem (25), the input and output of the FRW v-SVC model are triangular fuzzy numbers, whereas the other parameters are crisp. Moreover, the FRW v-SVC model, which is robust, has the ability to penalize hybrid noise and provide better generalization capability. Problem (25) is a quadratic programming (QP) problem. Using Lagrangian multipliers, a Lagrangian function can be defined as follows:

$$L(w, b, a, \beta, \xi, \rho, \delta) = \frac{1}{2}\left(\|w\|^2 + b^2\right) - v\rho + \frac{1}{l}\left(\sum_{i=1}^{l} \xi_i + \frac{1}{2}\sum_{i=1}^{l} \xi_i^2\right) - \sum_{i=1}^{l}\left(a_i\left(y_i(w \cdot x_i + b) - \rho + \xi_i\right) + \beta_i \xi_i\right) - \delta\rho, \qquad (26)$$

where $a = (a_1, \ldots, a_l)^T \in R_+^l$, $\beta = (\beta_1, \ldots, \beta_l)^T \in R_+^l$, and $\delta > 0$ are Lagrangian multipliers. Differentiating the Lagrangian function (26) with respect to $w$, $b$, $\rho$, and $\xi$, we have

$$\begin{cases} \nabla_w L(w, b, \rho, \xi) = 0 \;\Rightarrow\; w = \sum_{i=1}^{l} a_i y_i x_i, \\ \nabla_\rho L(w, b, \rho, \xi) = 0 \;\Rightarrow\; v - \sum_{i=1}^{l} a_i + \delta = 0, \\ \nabla_b L(w, b, \rho, \xi) = 0 \;\Rightarrow\; b = \sum_{i=1}^{l} a_i y_i, \\ \nabla_\xi L(w, b, \rho, \xi) = 0 \;\Rightarrow\; a_i + \beta_i = \frac{1}{l} + \frac{1}{l}\xi_i. \end{cases} \qquad (27)$$

By substituting Eq. (27) into Eq. (26), we can obtain the corresponding dual form of problem (25) as follows:

$$\begin{aligned} \max_{a} \quad & W(a) = -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} y_i y_j a_i a_j \left(K(x_i, x_j) + 1\right) - \frac{l}{2}\sum_{i=1}^{l} a_i^2 \\ \text{s.t.} \quad & 0 \le a_i \le \frac{1}{l}, \qquad \sum_{i=1}^{l} a_i \ge v. \end{aligned} \qquad (28)$$

By selecting appropriate $v$ and $K(x, x')$, we can construct and solve the optimal problem (28) according to QP methods to obtain the optimal solution $a^* = \left(a_1^*, \ldots, a_l^*\right)^T$. The decision-making function of the FRW v-SVC model can be represented as follows:


$$f(x) = \mathrm{sgn}\!\left(\sum_{i=1}^{l} y_i a_i^* \left(K(x_i, x) + 1\right)\right). \qquad (29)$$
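As an illustration of how (28) and (29) fit together, the sketch below (our own, not the authors' MATLAB implementation; it relies on scipy.optimize.minimize with the SLSQP solver, and it reuses the hypothetical morlet_kernel and gram_matrix helpers sketched earlier) solves the dual for a small sample set and evaluates the decision function:

```python
import numpy as np
from scipy.optimize import minimize

def train_frw_vsvc(X, y, v=0.9, a=0.4):
    """Solve dual (28): maximize W(a) s.t. 0 <= a_i <= 1/l and sum(a_i) >= v."""
    y = np.asarray(y, dtype=float)
    l = len(y)
    K = gram_matrix(X, a) + 1.0            # (K(x_i, x_j) + 1), as in (28)
    H = (y[:, None] * y[None, :]) * K      # y_i y_j (K(x_i, x_j) + 1)

    def neg_W(alpha):                      # minimize -W(alpha)
        return 0.5 * alpha @ H @ alpha + 0.5 * l * (alpha @ alpha)

    cons = ({"type": "ineq", "fun": lambda alpha: alpha.sum() - v},)
    res = minimize(neg_W, np.full(l, v / l),          # feasible start: sum = v
                   bounds=[(0.0, 1.0 / l)] * l, constraints=cons)
    return res.x

def decide(alpha, X, y, x_new, a=0.4):
    """Decision function (29); note that no bias term b appears."""
    s = sum(alpha[i] * y[i] * (morlet_kernel(X[i], x_new, a) + 1.0)
            for i in range(len(y)))
    return np.sign(s)
```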

It is obvious that the FRW v-SVC model includes one constraint condition fewer than the standard FW v-SVC. The concise dual form (28) and the absence of parameter b from the decision-making function (29) reduce the complexity of the support vector classifier machine.

3. Merits and characteristics of the FRW v-SVC model

3.1. Merits of the FRW v-SVC model

Typical classifier problems can be represented by one of two classes: the positive class (+1) or the negative class (-1). Clearly, the assignment of a training point to the positive class (+1) versus the negative class (-1) is determined by historical data, experience, and expert knowledge in the field. Inadequate historical data, or experience and knowledge deficiencies among experts, may generate only fuzzy judgments; for example, there may be an 80% probability that a sample point belongs to the positive class and a 20% probability that it belongs to the negative class. This type of fuzzy judgment must be represented by a fuzzy number rather than a crisp real number. Standard wavelet support vector classifier machines have difficulty handling such decision-making variables because their inputs and outputs are fuzzy numbers. To remedy this problem, this paper establishes a mapping relation between fuzzy numbers and crisp real numbers. This mapping relation transforms a triangular fuzzy number into a crisp real number, which is easily computed as the input of a wavelet support vector machine. The left and right parts of a triangular fuzzy number represent the uncertain parts of the expert knowledge domain.

For an ordinary fuzzy SVC, all fuzzy information is transformed into a crisp real number or membership, and the diagnosis analysis is based on a sample set of crisp real numbers. In contrast, this paper suggests a novel fuzzy wavelet v-SVC. The major innovation of the proposed method is that using triangular fuzzy numbers to describe input and output allows for a more effective description of fault systems involving uncertainties. In addition, the parameters of the proposed model are determined via a PSO-based search. In comparison with the standard FW v-SVC model, the proposed model requires one fewer constraint condition. Uncertain information is also taken into full account in establishing the novel FW v-SVC, as is appropriate for complex nonlinear fuzzy fault-system classification problems involving uncertain influencing factors.

3.2. Characteristics of FRW v-SVC

The following theorem characterizes the parameter v in the FRW v-SVC model.


Definition 2. If a support vector machine can classify the training sample set $T = \{(x_i, y_i)\}_{i=1}^{l}$, a sample lying outside the ε-tube is known as an erroneous sample.

Theorem 2. Consider a set of data points $(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)$ independently and randomly generated by an unknown function, where $x_i$ is a column vector of attributes, $y_i$ is a scalar representing the dependent variable, and $l$ denotes the number of data points in the training set. The v-support vector classifier machine is used for the classifier analysis. Let parameter $\rho^* > 0$. Suppose that q represents the number of erroneous samples; then $v \ge q/l$, i.e., v is an upper bound on the proportion of erroneous samples to total samples. Suppose that p represents the number of support vectors; then $v \le p/l$, i.e., v is a lower bound on the proportion of support vectors to total samples.

Proof. See Appendix B. □

4. Adaptive Gaussian particle swarm optimization (AGPSO)

Determining the unknown parameters of FRW v-SVC is a complicated process. In practice, it resembles a multivariable optimization problem in a continuous space, and identifying an appropriate combination of parameters enhances how closely the model approximates the original series. Therefore, an intelligent algorithm is needed to obtain the optimal parameters of the FRW v-SVC model; an appropriate parameter combination corresponds to a high level of generalization of FRW v-SVC. The PSO algorithm introduced by Kennedy and Eberhart [12] is considered an excellent technique for solving combinatorial optimization problems [7,9,12,31,34,35,38], and its robustness and efficacy in solving function optimization problems have been demonstrated in the literature [16,26,36].

Adaptive mutation is a highly efficient operator under the conditions of a real-number code, and solution quality is closely connected to the mutation operator. We propose an adaptive mutation operator that regulates the inertia weight of the velocity. At the same time, a Gaussian mutation operator is used to correct the direction error of the particle velocity.


By incorporating both adaptive mutation and Gaussian mutation into the previous velocity of the particle, a novel particle swarm optimization algorithm, called the adaptive Gaussian particle swarm optimization (AGPSO) algorithm, is proposed as follows:

$$v_{im}^{k+1} = (1 - \lambda)\, w_{im}^k v_{im}^k + \lambda N\!\left(0, \sigma_i^k\right) + c_1 r_1\left(p_{im} - x_{im}^k\right) + c_2 r_2\left(p_{gm} - x_{im}^k\right), \qquad (30)$$

$$x_{im}^{k+1} = x_{im}^k + v_{im}^{k+1}, \qquad (31)$$

$$w_{im}^k = \beta\left(1 - f\!\left(x_i^k\right)\big/f\!\left(x_g^k\right)\right) + (1 - \beta)\, w_{im}^0 \exp\!\left(-\alpha k^2\right), \qquad (32)$$

$$\sigma_i^{k+1} = \sigma_i^k \exp\!\left(N_i(0, \Delta\sigma)\right), \qquad (33)$$

where $\Delta\sigma$ is the standard error of the normal distribution, $\beta$ is the adaptive coefficient, $\lambda$ is an increment coefficient, $\alpha$ is the coefficient controlling particle velocity attenuation, $f(x_i^k)$ is the fitness of the ith particle in the kth iteration, and $f(x_g^k)$ is the optimal fitness of the particle swarm over the first k iterations. The inertia weight w regulates the trade-off between the global and local exploration abilities of the swarm: a large inertia weight facilitates global exploration, whereas a small one tends to facilitate local exploration. A suitable value of w balances the two and consequently reduces the number of iterations required to locate the optimum solution. The adaptive mutation operator, which depends on the iteration counter k and the fitness function $f(x^k)$, is therefore adopted to handle the inertia weight w, as described by Eq. (32). The first term on the right of Eq. (32) denotes mutation based on the fitness function value: a particle with greater fitness mutates within a smaller scope, whereas one with lesser fitness mutates within a larger scope. The second term on the right of Eq. (32) represents mutation based on the iteration counter k: for small k, the initial inertia weight $w_{im}^0$ mutates over a larger scope and the particles search for the local optimal value in a large space; as the algorithm continues to run (larger k), the particles mutate and search within a smaller scope, gradually approaching the global optimal value.

The Gaussian mutation operator of Eq. (33) corrects the direction error of the particle velocity. In this mutation strategy, the individual consists of the velocity vector $v = (v_1, v_2, \ldots, v_n)$ and a perturbation vector $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$; the perturbation vector mutates its own value according to Eq. (33) in each iteration and serves as a control vector for the velocity vector $v^{k+1}$.
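A minimal sketch of one AGPSO velocity-and-position update per Eqs. (30)-(33) follows (our own NumPy illustration; the function name agpso_step and the array layout, one row per particle, are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def agpso_step(x, v, w0, sigma, p_best, g_best, f, k,
               lam=0.1, beta=0.8, alpha=2.0, c1=2.0, c2=2.0, d_sigma=0.5):
    """One AGPSO iteration: adaptive inertia (32), Gaussian-mutated velocity (30),
    position update (31), and perturbation self-adaptation (33)."""
    fx = np.array([f(xi) for xi in x])      # fitness of each particle
    fg = f(g_best)                          # best fitness found so far
    # Eq. (32): adaptive inertia weight per particle (w0 is the initial weight).
    w = beta * (1.0 - fx / fg)[:, None] + (1.0 - beta) * w0 * np.exp(-alpha * k ** 2)
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    gauss = rng.normal(0.0, sigma[:, None], size=v.shape)   # N(0, sigma_i^k)
    # Eq. (30): blend inertia-weighted velocity with the Gaussian mutation term.
    v_new = (1 - lam) * w * v + lam * gauss \
            + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    x_new = x + v_new                                       # Eq. (31)
    sigma_new = sigma * np.exp(rng.normal(0.0, d_sigma, size=sigma.shape))  # Eq. (33)
    return x_new, v_new, sigma_new
```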
The information flow between particles appears to be the reason particles cluster: diversity declines rapidly, the clustering leads to low diversity with fitness stagnation, and the PSO algorithm then has great difficulty escaping local optima. Overall, the adaptive and Gaussian mutation operators can restore the diversity lost from the population and improve the global search capacity of the algorithm.

5. Intelligent diagnosis method based on the FRW v-SVC and AGPSO

The proposed mixture-of-experts model is an intelligent diagnosis system that can handle the hybrid noise and non-stationarity of fault-pattern series and is effective in constructing the nonlinear relation in high-dimensional space. The steps of the AGPSO algorithm proposed above are as follows.

Algorithm 1
Step (1) Data preparation: the training and test sets are represented as Tr and Te, respectively.
Step (2) Particle initialization and setting of PSO parameters: generate initial particles; set the PSO parameters, including the number of particles (n), the particle dimension (m), the maximum number of iterations (kmax), the error limit for the fitness function, the velocity limit (Vmax), the initial inertia weight for particle velocity (w0), the Gaussian distribution (N(0, Δσ)), the initial perturbation momentum (σi0), the coefficient controlling particle velocity attenuation (α), the adaptive coefficient (β), and the increment coefficient (λ). Set the iterative variable such that k = 0.


Step (3) Update the iterative variable: k = k + 1.
Step (4) Compute the fitness function value of each particle. Take the current particle as the individual extremum point of every particle, and use the particle with the minimal fitness value as the global extremum point.
Step (5) Termination-condition check: if the termination criteria (that is, a predefined maximum number of iterations or the error accuracy of the fitness function) are met, go to Step (8); otherwise, go to the next step.
Step (6) Adopt the adaptive mutation operator described by Eq. (32) and the Gaussian mutation operator described by Eq. (33) to manipulate particle velocity.
Step (7) Update the particle positions: use Eqs. (30) and (31) to form new particle swarms; go to Step (3).
Step (8) End the training procedure: output the optimal particle.

Based on the FRW v-SVC model, we can summarize the fault-diagnosis algorithm as follows; a driver sketch follows the listing.

Algorithm 2
Step (1) Initialize the original data using normalization and fuzzification, and form Tr and Te.
Step (2) Create the diagnosis pattern series with a wavelet transform on different scales, and select the wavelet function and scale scope a_i that best match the original series.
Step (3) Compute the wavelet kernel function using Eq. (19) and construct the QP problem (25) of the FRW v-SVC model.
Step (4) Use Algorithm 1 to obtain the optimal parameter combination vector (v, a). Solve the optimization problem (28) and obtain the Lagrangian parameter a.
Step (5) For a new diagnosis task, extract the fault characteristics and form a set of input variables x.
Step (6) Compute the diagnosis result f(x) using Eq. (29).
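The following Python sketch (our own composition, not the paper's MATLAB implementation) shows how the pieces of Algorithm 2 could be wired together, reusing the hypothetical helpers fuzzy_class_number, train_frw_vsvc, and decide from the earlier sketches:

```python
import numpy as np

def diagnose(X_train, d_train, X_test, v=0.9, a=0.4):
    """Algorithm 2, Steps (1)-(6), in miniature: fuzzify the labels, train the
    FRW v-SVC with the (v, a) pair returned by the AGPSO search, and classify."""
    # Step (1): map each signed membership d to its triangular fuzzy number (24);
    # the crisp center r2 carries the class sign (mediacy, d = 0.5, maps to 0
    # and would need a tie-break rule).
    centers = np.array([fuzzy_class_number(d)[1] for d in d_train])
    y = np.sign(centers)
    # Steps (3)-(4): wavelet-kernel QP, solved via the dual (28).
    alpha = train_frw_vsvc(X_train, y, v=v, a=a)
    # Steps (5)-(6): decision function (29) for each new pattern.
    return np.array([decide(alpha, X_train, y, x, a=a) for x in X_test])
```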
6. Applications

Four examples are presented to illustrate the hybrid diagnosis model based on the FRW v-SVC model and the AGPSO algorithm. We used MATLAB 7.1 to implement the algorithm, and the experiments were carried out on a 1.80 GHz Core(TM)2 CPU personal computer with 1.0 GB of memory running Microsoft Windows XP Professional. To confirm the validity of AGPSO, we selected the published CLPSO [19] and DMPSO [18] methods to optimize the unknown parameters of the FRW v-SVC model. The performance of these algorithms under the same running generation is compared in Table 1. According to the comparison, AGPSO used less time and provided greater searching efficiency. The adaptive mutation operator resulted in better exploration of the search space, and the Gaussian mutation operator adjusted the direction of the particles and improved the searching efficiency of the algorithm. Therefore, the better performance of the proposed AGPSO algorithm may be attributed to all of its features, i.e., the adaptive mutation operator, the Gaussian mutation operator, and the adaptive control parameters.

Example 1. This experiment examined fault diagnosis on a car assembly line. The fault pattern of a car assembly line represents a type of complex fuzzy system used in the manufacturing sector, the diagnosis of which is usually driven by a large number of uncertain factors; some factors, including linguistic information, are compiled in the factor list shown in Table 2. In terms of numerical information, oscillating signals from machines B (1st gear), C (2nd gear), E (transmission shaft), G (driving motor 1), and H (driving motor 2) on the car assembly line were considered as inputs. In terms of the linguistic information types A (assistant worker number), D (product demand), and F (delay from the last machining step), the influencing weights obtained using fuzzy logic were taken as input pattern data. The detailed process was as follows. The influencing factor set U and evaluation set V were first set up. Each factor was given a different weight obtained through an analytic hierarchy process or analysis of its degree of influence on the car assembly line. The weight set M, which is a fuzzy subset of U, was thereby obtained. Thus, the degree of influence that each factor was calculated to have on the car assembly line was taken as numerical information and used directly as input data for FRW v-SVC. In this experiment, the data set shown in Table 3 came from Nanjing Automobile Company of Jiangsu province in China.

Table 1
Performance comparison among AGPSO and two other published PSOs.

Algorithm   Running time   Fitness value (Maximum)   Fitness value (Minimum)
AGPSO       2753 s         0.000592859               0.00015042
CLPSO       2428 s         0.000650339               0.00016030
DMPSO       2942 s         0.000630339               0.00016012

Table 2
Linguistic fault pattern for the car assembly line.

Influencing characteristics               Unit           Expression              Weight
Assistant worker number (A)               Dimensionless  Linguistic information  0.65
Product demand (D)                        Dimensionless  Linguistic information  0.8
Delay from the last machining step (F)    Dimensionless  Linguistic information  0.7

Table 3
Sample set of fault diagnoses.

No.  A     B       C       D     E       F     G       H       Membership
1    0.66  0.0154  0.0021  0.29  0.0603  0.12  0.0289  0.0501  0.87 (+)
2    0.98  0.0305  0.0508  0.49  0.0485  0.35  0.0663  0.0311  0.65 (+)
3    0.12  0.0744  0.0159  0.69  0.0618  0.15  0.0939  0.0068  0.74 (-)
4    0.49  0.0015  0.0928  0.65  0.0973  0.92  0.0208  0.0086  0.79 (-)
5    0.82  0.0008  0.0731  0.98  0.0493  0.25  0.0181  0.0013  0.58 (-)
6    0.65  0.0708  0.0548  0.55  0.0711  0.76  0.0692  0.0621  0.55 (-)
7    0.34  0.0901  0.0656  0.44  0.0009  0.66  0.0608  0.0065  0.85 (-)
8    0.97  0.0089  0.0324  0.19  0.0623  0.33  0.0453  0.0185  0.65 (+)
9    0.11  0.0387  0.0049  0.63  0.0505  0.38  0.0877  0.0214  0.91 (+)
...
56   0.68  0.0756  0.0074  0.83  0.0614  0.86  0.0171  0.0079  0.71 (-)
57   0.94  0.0992  0.0007  0.67  0.0295  0.73  0.0699  0.0084  0.76 (+)
58   0.59  0.0275  0.0022  0.94  0.0561  0.28  0.0751  0.0016  0.52 (-)
59   0.29  0.0784  0.0016  0.91  0.0955  0.51  0.0234  0.0537  0.67 (+)
60   0.53  0.0548  0.0056  0.58  0.0287  0.45  0.0466  0.0936  0.75 (+)

Table 4
The diagnosis results for the last 10 pattern points.

No.  Actual membership  Actual pattern  F v-SVC  Predicting pattern  FW v-SVC  Predicting pattern  FRW v-SVC  Predicting pattern
1    0.51               1               0.4936   1                   0.5075    1                   0.5136     1
2    0.77               1               0.7832   1                   0.7736    1                   0.7753     1
3    0.82               1               0.8328   1                   0.8235    1                   0.8217     1
4    0.63               1               0.6419   1                   0.6401    1                   0.6323     1
5    0.57               1               0.4900   1                   0.4987    1                   0.5800     1
6    0.71               1               0.8044   1                   0.7363    1                   0.7223     1
7    0.76               1               0.7577   1                   0.7561    1                   0.7607     1
8    0.52               1               0.4837   1                   0.4984    1                   0.5357     1
9    0.67               1               0.6812   1                   0.6788    1                   0.6735     1
10   0.75               1               0.7340   1                   0.7476    1                   0.7468     1

The initial parameters of the intelligent diagnosis system were set as follows: inertia weight w0 = 0.9, positive acceleration constants c1 = c2 = 2, standard error of the normal distribution Δσ = 0.5, adaptive coefficient β = 0.8, increment coefficient λ = 0.1, fitness accuracy of the normalized samples 0.0005, and coefficient controlling particle velocity attenuation α = 2. The wavelet transform of the original pattern series on the different scales was obtained by following Steps 1 and 2 of Algorithm 2. The wavelet functions employed were the Morlet, Haar, Mexican hat, and Gaussian wavelets. Of these, the Mexican hat wavelet transform was the best, given its ability to decompose the original pattern series on a scale ranging from 0.3 to 2; the Mexican hat wavelet was therefore taken as the kernel function of the FRW v-SVC model. The following two parameter ranges were also determined: v ∈ [0, 1] and a ∈ [0.3, 2]. The optimal combination of parameters was obtained by Algorithm 1, with v = 0.9 and a = 0.4.

To check the diagnostic capability of the FRW v-SVC model, the two prior models (FW v-SVC and F v-SVC) were trained on the same pattern series. The last 10 diagnosis results for the tested samples under each model are shown in Table 4. In the F v-SVC model, the greatest diagnosis error was 13.3% (No. 6) and the smallest was 0.3% (No. 7). In the FW v-SVC model, the greatest diagnosis error was 12.5% (No. 5) and the smallest was 0.32% (No. 10). In the FRW v-SVC model, the greatest diagnosis error was 0.6% (No. 2) and the smallest was 0.09% (No. 7). The diagnostic capability of the FRW v-SVC model was compared with that of the other diagnostic approaches; the results are shown in Table 5. The classifier indices adopted included a one-class-error ratio, in which the negative class is included in the positive class, and a two-class-error ratio, in which the positive class is included in the negative class. It is clear that


the classifier indices for FRW v-SVC are better than those for the fuzzy v-support vector classifier (F v-SVC) and the fuzzy wavelet v-support vector classifier (FW v-SVC). Because of the production environment, some errors inevitably occurred in the data gathering and estimation process. Thus, the above diagnosis results are satisfactory. The results of applying the FRW v-SVC model to a car assembly line indicate the feasibility and effectiveness of this diagnostic approach. Example 2. In this experiment, the proposed model was applied to a data set based on the fault patterns of an injection machine. A data set of 73 patterns was split into a training set of 60 patterns (from Nos. 1–60) and a testing set of 13 patterns (from Nos. 61–73). The influencing factors are as follows: (A) workshop temperature (linguistic information), (B) driving wheel (numerical information), (C) screw transmission (numerical information), (D) concentration of hydraulic pressure oil

Table 5
Precision analysis of fault diagnosis.

Model       OCER (%)   TCER (%)   ER (%)
F v-SVC     20         10         20
FW v-SVC    10         10         10
FRW v-SVC   0          0          0

Note: OCER: one-class-error ratio; TCER: two-class-error ratio; ER: error ratio.

Table 6
Sample set of fault diagnoses.

No.  A    B       C       D   E       F   Membership
1    0.9  0.0222  0.0037  1   0.0579  4   0.52 (+1)
2    0.7  0.0614  0.0221  1   0.0329  14  0.52 (+1)
3    0.9  0.0721  0.0329  1   0.0132  9   0.52 (+1)
4    0.7  0.0534  0.0134  1   0.0099  7   0.52 (+1)
5    0.5  0.0292  0.0228  2   0.0389  2   0.52 (+1)
6    0.7  0.0858  0.0496  2   0.0028  3   0.52 (+1)
...
70   0.7  0.0567  0.0049  1   0.0988  9   0.52 (+1)
71   0.3  0.0983  0.0553  8   0.0153  6   0.46 (-1)
72   0.3  0.0344  0.0948  10  0.0468  4   0.39 (+1)
73   0.5  0.0625  0.0861  1   0.0451  7   0.28 (+1)

Table 7
The diagnostic results for the last 13 pattern points.

No.  Actual membership  Actual pattern  F v-SVC  Predicting pattern  FW v-SVC  Predicting pattern  FRW v-SVC  Predicting pattern
1    0.53               1               0.45     1                   0.49      1                   0.54       1
2    0.57               1               0.52     1                   0.57      1                   0.57       1
3    0.45               1               0.59     1                   0.49      1                   0.48       1
4    0.67               1               0.75     1                   0.67      1                   0.67       1
5    0.38               1               0.37     1                   0.38      1                   0.38       1
6    0.64               1               0.63     1                   0.64      1                   0.64       1
7    0.57               1               0.55     1                   0.57      1                   0.49       1
8    0.27               1               0.27     1                   0.27      1                   0.27       1
9    0.18               1               0.18     1                   0.18      1                   0.18       1
10   0.52               1               0.51     1                   0.52      1                   0.52       1
11   0.46               1               0.51     1                   0.50      1                   0.48       1
12   0.39               1               0.39     1                   0.39      1                   0.39       1
13   0.28               1               0.27     1                   0.28      1                   0.28       1

Table 8
Precision analysis of fault diagnosis.

Model       OCER (%)   TCER (%)   ER (%)
F v-SVC     15.4       7.6        23
FW v-SVC    7.6        7.6        15.4
FRW v-SVC   0          7.6        7.6


(linguistic information), (E) transmission wheel (numerical information), and (F) water content of original material (linguistic information). In this experiment, the data set shown in Table 6 was sourced from an injection machine used by a typical company in Jiangsu province. Table 7 shows the last 13 diagnoses for the injection machine, and Table 8 provides the error analysis of the diagnosis results. It was found that AGPSO performed better than PSO for the SVC model in diagnosing faults. The OCER, TCER, and ER indices provided by FRW v-SVC were also better than those provided by FW v-SVC and F v-SVC.

Example 3. In this experiment, the proposed model was implemented on a high-dimensional data set [45]. Information on the data set is as follows. The donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan, was used to demonstrate the RFMTC (recency, frequency, monetary value, time since first purchase, and churn probability) marketing model, which is a modified version of the RFM (recency, frequency, and monetary value) model. The Center dispatches its blood transfusion service bus to a university in Hsin-Chu City about once every three months to gather blood donations. To build an RFMTC model, we selected 748 donors at random from the donor database. The data held on each of these 748 donors included R (recency; the number of months since the last donation), F (frequency; the total number of donations), M (monetary; the total blood donated in c.c.), T (time; the number of months since the first donation), and a binary variable representing whether the donor donated blood in March 2007 (1 indicates a donation; 0 indicates no donation). Every data point in this benchmark data set is crisp, so there was no need for a fuzzy mapping transformation of the decision-making variables. To facilitate ready identification of the classifier results, the "0" value in the standard benchmark data set was changed to "-1". The last 10 diagnosis results are shown in Table 9, and the classifier precision analysis is reported in Table 10. Obviously, the OCER, TCER, and ER indices provided by FRW v-SVC performed better than those provided by FW v-SVC and F v-SVC.

Example 4. In this experiment, the proposed model was applied to a high-dimensional data set [28].

Table 9
The diagnostic results for the last 10 pattern points.

No.  Actual membership  Actual pattern  F v-SVC  Predicting pattern  FW v-SVC  Predicting pattern  FRW v-SVC  Predicting pattern
1    0.51               1               0.4936   1                   0.5075    1                   0.5136     1
2    0.77               1               0.7832   1                   0.7736    1                   0.7753     1
3    0.82               1               0.8328   1                   0.8235    1                   0.8217     1
4    0.63               1               0.6419   1                   0.6401    1                   0.6323     1
5    0.57               1               0.4900   1                   0.4987    1                   0.5800     1
6    0.71               1               0.8044   1                   0.7363    1                   0.7223     1
7    0.76               1               0.7577   1                   0.7561    1                   0.7607     1
8    0.52               1               0.4837   1                   0.4984    1                   0.5357     1
9    0.67               1               0.6812   1                   0.6788    1                   0.6735     1
10   0.75               1               0.7340   1                   0.7476    1                   0.7468     1

Table 10
Precision analysis of fault diagnosis.

Model       OCER (%)   TCER (%)   ER (%)
F v-SVC     20         10         20
FW v-SVC    10         10         10
FRW v-SVC   0          0          0

Table 11
Training and testing sample set (attributes 1-18, followed by the class).

1   2   3   4    5   6  7    8   9   10   11   12   13   14  15  16  17   18   Class
91  39  77  153  59  8  139  48  18  139  159  289  123  62  8   17  201  209  Van (+1)
96  50  94  215  67  9  187  35  22  158  214  525  214  67  8   6   193  201  Saab (-1)
88  35  60  143  59  7  128  52  18  129  147  246  109  62  1   6   202  209  Van (+1)

93  46  85  169  66  9  151  44  19  147  169  339  179  67  0   4   195  204  Van (+1)
93  51  90  209  69  8  183  36  22  156  211  506  230  70  6   1   189  196  Saab (-1)
96  40  78  170  58  7  174  38  21  139  197  455  160  68  3   29  191  200  Saab (-1)
85  36  51  115  56  5  119  57  17  124  139  207  127  81  13  5   181  184  Van (+1)

Table 12
Precision analysis of fault diagnosis.

Model       OCER (%)   TCER (%)   ER (%)
F v-SVC     27         9          36
FW v-SVC    18         9          27
FRW v-SVC   9          9          18

Information on the data set is as follows. The objective in this example was to classify a given silhouette as one of four types of vehicle, using a set of features extracted from the silhouette; the vehicle could be viewed from many different angles. The benchmark data set had 18 dimensions. To apply the proposed model to this data set, all decision-making variables marked "SAAB" or "VAN" were selected as the training and testing data sets. The new data set is described in Table 11. It is obvious that every data point in the benchmark data set was crisp. To facilitate ready identification of the classifier results, the items marked "VAN" and "SAAB" in the standard benchmark data set [28] were marked "+1" and "-1", respectively.

The classifier precision analysis of the testing set (i.e., the last 10 patterns) is shown in Table 12. Clearly, the OCER, TCER, and ER indices provided by the FRW v-SVC model performed better than those provided by FW v-SVC and F v-SVC.

For both patterns away from the mean value of the input variable $y_i$ and patterns near the mean value of $y_i$, the FRW v-SVC model provided the highest degree of diagnostic precision, showing that the Laplace loss function can penalize larger magnitude noise and singularity points. Tables 5, 8, 10 and 12 demonstrate that diagnostic patterns near the mean value of $y_i$ were handled with the best precision by FRW v-SVC, indicating that the Gaussian loss function can penalize white noise (i.e., Gaussian, normally distributed noise). It is clear that a robust loss function comprising a Laplace loss function, a Gaussian loss function, and an ε-insensitive loss function can penalize hybrid noise and heighten the diagnostic precision of a complex fault system. In constructing the FRW v-SVC model, one factor taken into consideration is how to penalize noise in $(-\infty, -e_\mu) \cup [-e_\mu, -\varepsilon) \cup [-\varepsilon, 0) \cup (0, \varepsilon] \cup (\varepsilon, e_\mu] \cup (e_\mu, +\infty)$. This will lead to further improvement in the anti-noise capability of the FRW v-SVC approach.

7. Conclusion

This paper combines wavelet theory, fuzzy theory, SVC, a wavelet kernel function, and a robust loss function to develop a novel wavelet support vector classifier machine, namely the fuzzy robust wavelet v-support vector classifier machine (FRW v-SVC), to handle fuzzy uncertain information and penalize hybrid noise; the proposed AGPSO algorithm is used to select its parameters. It is difficult to address Gaussian noise in fault-diagnosis data using a standard support vector machine with an ε-insensitive loss function, whereas a support vector machine with a Gaussian loss function can effectively inhibit (or penalize) Gaussian noise and has good generalization properties. For singularity points and larger magnitude noise, however, a Gaussian loss function lacks a strong penalizing capability in comparison with a Laplace loss function. Thus, a robust loss function integrating a Laplace loss function, a Gaussian loss function, and an ε-insensitive loss function combines the traits of each and penalizes hybrid noise in the sample set of a support vector machine.

The fault patterns of complex manufacturing systems are influenced by many factors; some are described by numerical information and others by linguistic information. For such fault patterns, it is difficult to set up an accurate nonlinear fault-diagnosis model using traditional modeling methods.
The results of experiments applying the proposed FRW v-SVC model to fault diagnosis show that this model can penalize several types of hybrid noise and can increase classifier precision. Moreover, the FRW v-SVC model has a concise dual form, and the absence of parameter b from the FRW v-SVC decision-making function, which is represented by sample data and a wavelet kernel function, reduces the complexity of the model.

Acknowledgements

This research was partly supported by the National Natural Science Foundation of China under Grant 60904043, a research grant funded by the Hong Kong Polytechnic University, the China Postdoctoral Science Foundation (20090451152), the Jiangsu Planned Projects for Postdoctoral Research Funds (0901023C), and the Southeast University Planned Projects for Postdoctoral Research Funds.

Appendix A. Proof of Theorem 1

According to Lemma 2, we need only prove that

F½xðxÞ ¼ ð2pÞn=2 where KðxÞ ¼

Qn



xi i¼1 w a

Z

¼

Rn

expðjðx  xÞÞKðxÞdx P 0;

Qn

i¼1

cos

w0 xi a

expððkxi k2 =2a2 ÞÞ, and j denotes the imaginary number unit. We have

ð34Þ

Q. Wu, R. Law / Information Sciences 180 (2010) 4514–4528

Z Rn

!!  x kxi k2 i dx cos w0 exp  a 2a2 Rh i¼1 !   n Z 1 Y expðjw0 xi =aÞ þ expðjw0 xi =aÞ kxi k2 dxi ¼ exp expðjxi xi Þ 2 2a2 1 i¼1   !   !! Z n Y 1 1 kxi k2 w0 j kxi k2 w0 j ¼ exp   jxi a xi þ exp   jxi a x þ  2 1 a a 2 2 i¼1 ! !! pffiffiffiffiffiffiffi 2 2 n Y jaj 2p ðw0  xi aÞ ðw0 þ xi aÞ ¼ þ exp  : exp  2 2 2 i¼1

expðjxxÞKðxÞdx ¼

Z

expðjxxÞ

4527

n Y

ð35Þ

Substituting Eq. (35) into Eq. (34), we obtain Eq. (36).

F½XðxÞ ¼

 n  Y jaj 2

i¼1

exp 

ðw0  xi aÞ2 2

! þ exp 

ðw0 þ xi aÞ2 2

!! ;

ð36Þ

where a – 0. Note also that

F½xðxÞ P 0:

ð37Þ

Appendix B. Proof of Theorem 2 Proof. Suppose that q represents the number of erroneous samples. If q* > 0, the Lagrangian multipliers of the second constraint of Eq. (28) are equal to 0, viz., d = 0. The inequality represented by the second constraint of Eq. (28) is transformed into an equation. When ni > 0 corresponding to the erroneous sample point is (xi, yi), then ai ¼ 1l . Thus, we have



l X

q l

ai P :

i¼1

ð38Þ

Therefore, parameter v is the upper bound of the proportion of erroneous samples to total samples. Suppose that p represents P the number of support vectors. Substituting 0 < ai 6 1/l into v 6 li¼1 ai , we find that

v6

p X i¼1

ai ¼

l X i¼1

1 l

p l

ai 6  p ¼ :

ð39Þ

Eq. (39) shows that the proportion (p/l) of support vectors (p) to total samples (l) is not less than v; i.e., the parameter v is the lower bound of the proportion of support vectors to total samples. Therefore, when the number of sample points is close to infinity, with l ? 1, v is close to both the proportion of support vectors to total samples and the proportion of erroneous samples to total samples. It is clear that parameter v(0 6 v 6 1) can be used to control the magnitude of support vectors and erroneous samples, which is the theoretical basis of the selection of v. The generalizability of v-SVC is evaluated by the magnitude of the parameter v. From this perspective, the generalizability of the fuzzy wavelet v-support vector classifier machine (FW v-SVC) exceeds that of the fuzzy wavelet C-support vector classifier machine (W C-SVC). h

References

[1] M. Bicego, M.A.T. Figueiredo, Soft clustering using weighted one-class support vector machines, Pattern Recognition 42 (1) (2009) 27–32.
[2] G. Bloch, F. Lauer, G. Colin, Y. Chamaillard, Support vector regression from simulation data and few experimental samples, Information Sciences 178 (20) (2008) 3813–3827.
[3] X.B. Cao, Y.W. Xu, D. Chen, H. Qiao, Associated evolution of a support vector machine-based classifier for pedestrian detection, Information Sciences 179 (8) (2009) 1070–1077.
[4] Y. Chen, J.Z. Wang, Support vector learning for fuzzy rule-based classification systems, IEEE Transactions on Fuzzy Systems 11 (6) (2003) 716–728.
[5] J.H. Chiang, P.Y. Hao, A new kernel-based fuzzy clustering approach: support vector clustering with cell growing, IEEE Transactions on Fuzzy Systems 11 (4) (2003) 518–527.
[6] C.C. Chuang, Extended support vector interval regression networks for interval input–output data, Information Sciences 178 (3) (2008) 871–891.
[7] W. Du, B. Li, Multi-strategy ensemble particle swarm optimization for dynamic optimization, Information Sciences 178 (15) (2008) 3096–3109.
[8] A. Çelikyılmaz, I.B. Türksen, Fuzzy functions with support vector machines, Information Sciences 177 (23) (2007) 5163–5177.
[9] J.F. Chang, P. Shi, Using investment satisfaction capability index based particle swarm optimization to construct a stock portfolio, Information Sciences, in press, doi:10.1016/j.ins.2010.05.008.
[10] B. Jin, Y.C. Tang, Y.Q. Zhang, Support vector machines with genetic fuzzy feature transformation for biomedical data classification, Information Sciences 177 (2) (2007) 476–489.
[11] Jayadeva, R. Khemchandani, S. Chandra, Regularized least squares support vector regression for the simultaneous learning of a function and its derivatives, Information Sciences 178 (17) (2008) 3402–3414.
[12] J. Kennedy, R. Eberhart, Particle swarm optimization, in: Proceedings of the IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.
[13] S.G. Krantz, Wavelet: Mathematics and Application, CRC, Boca Raton, FL, 1994.
[14] F. Lauer, C.Y. Suen, G. Bloch, A trainable feature extractor for handwritten digit recognition, Pattern Recognition 40 (6) (2007) 1816–1824.
[15] C.A.M. Lima, A.L.V. Coelho, F.J.V. Zuben, Hybridizing mixtures of experts with support vector machines: investigation into nonlinear dynamic systems identification, Information Sciences 177 (10) (2007) 2049–2074.
[16] X. Li, J. Wang, A steganographic method based upon JPEG and particle swarm optimization algorithm, Information Sciences 177 (15) (2007) 3099–3109.
[17] C.F. Lin, S.D. Wang, Fuzzy support vector machines, IEEE Transactions on Neural Networks 13 (2) (2002) 464–471.
[18] J.J. Liang, P.N. Suganthan, Dynamic multi-swarm particle swarm optimizer, in: Proceedings of Swarm Intelligence Symposium, 2005, pp. 124–129.
[19] J.J. Liang, A.K. Qin, P.N. Suganthan, S. Baskar, Comprehensive learning particle swarm optimizer for global optimization of multimodal functions, IEEE Transactions on Evolutionary Computation 10 (2006) 281–296.
[20] P. Lingras, C. Butz, Rough set based 1-v-1 and 1-v-r approaches to support vector machine multi-classification, Information Sciences 177 (18) (2007) 3782–3798.
[21] G.Z. Liu, S.L. Di, Wavelet Analysis and Application, Xidian University Press, Xi'an, China, 1992.
[22] S. Maldonado, R. Weber, A wrapper method for feature selection using support vector machines, Information Sciences 179 (13) (2009) 2208–2217.
[23] J. Mercer, Functions of positive and negative type and their connection with the theory of integral equations, Philosophical Transactions of the Royal Society of London A 209 (1909) 415–446.
[24] R. Min, H.D. Cheng, Effective image retrieval using dominant color descriptor and fuzzy support vector machine, Pattern Recognition 42 (1) (2009) 147–157.
[25] W. Pedrycz, H.S. Park, S.K. Oh, A granular-oriented development of functional radial basis function neural networks, Neurocomputing 72 (1–3) (2008) 420–435.
[26] W. Pedrycz, B.J. Park, N.J. Pizzi, Identifying core sets of discriminatory features using particle swarm optimization, Expert Systems with Applications 36 (3) (2009) 4610–4616.
[27] J. Ramírez, J.M. Górriz, D. Salas-Gonzalez, A. Romero, M. López, I. Álvarez, M. Gómez-Río, Computer-aided diagnosis of Alzheimer's type dementia combining support vector machines and discriminant set of features, Information Sciences, in press, doi:10.1016/j.ins.2009.05.012.
[28] J.P. Siebert, Vehicle recognition using rule based methods, Turing Institute Research Memorandum TIRM-87-018, 1987.
[29] Q. Shen, W.M. Shi, W. Kong, B.X. Ye, A combination of modified particle swarm optimization algorithm and support vector machine for gene selection and tumor classification, Talanta 71 (4) (2007) 1679–1683.
[30] T. Shon, J. Moon, A hybrid machine learning approach to network anomaly detection, Information Sciences 177 (18) (2007) 3799–3821.
[31] Y. Shi, H. Liu, L. Gao, G. Zhang, Cellular particle swarm optimization, Information Sciences, in press, doi:10.1016/j.ins.2010.05.025.
[32] A. Smola, B. Scholkopf, The connection between regularization operators and support vector kernels, Neural Networks 11 (1998) 637–649.
[33] H.H. Tsai, D.W. Sun, Color image watermark extraction based on support vector machines, Information Sciences 177 (2) (2007) 550–569.
[34] P.K. Tripathi, S. Bandyopadhyay, S.K. Pal, Multi-objective particle swarm optimization with time variant inertia and acceleration coefficients, Information Sciences 177 (22) (2007) 5033–5049.
[35] A. Unler, A. Murat, R.B. Chinnam, mr2PSO: a maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification, Information Sciences, in press, doi:10.1016/j.ins.2010.05.037.
[36] F. van den Bergh, A.P. Engelbrecht, A study of particle swarm optimization particle trajectories, Information Sciences 176 (8) (2006) 937–971.
[37] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[38] Y. Wang, Y. Yang, Particle swarm optimization with preference order ranking for multi-objective optimization, Information Sciences 179 (12) (2009) 1944–1959.
[39] A. Widodo, B.S. Yang, Wavelet support vector machine for induction machine fault diagnosis based on transient current signal, Expert Systems with Applications 35 (1–2) (2008) 307–316.
[40] A. Widodo, B.S. Yang, Application of nonlinear feature extraction and support vector machines for fault diagnosis of induction motors, Expert Systems with Applications 33 (1) (2007) 241–250.
[41] W.T. Wong, F.Y. Shih, J. Liu, Shape-based image retrieval using support vector machines, Fourier descriptors and self-organizing maps, Information Sciences 177 (8) (2007) 1878–1891.
[42] Q. Wu, The forecasting model based on wavelet ν-support vector machine, Expert Systems with Applications 36 (4) (2009) 7604–7610.
[43] Q. Wu, J. Liu, F.L. Xiong, X.J. Liu, The fuzzy wavelet classifier machine with penalizing hybrid noises from complex diagnosis system, Acta Automatica Sinica 35 (6) (2009) 773–779 (in Chinese).
[44] L. Yang, C.Y. Suen, T.D. Bui, P. Zhang, Discrimination of similar handwritten numerals based on invariant curvature features, Pattern Recognition 38 (7) (2005) 947–963.
[45] I.C. Yeh, K.J. Yang, T.M. Ting, Knowledge discovery on RFM model using Bernoulli sequence, Expert Systems with Applications 36 (3) (2009) 5866–5871.
[46] W. Yu, X. Li, On-line fuzzy modeling via clustering and support vector machines, Information Sciences 178 (22) (2008) 4264–4279.
[47] S.F. Yuan, F.L. Chu, Fault diagnostics based on particle swarm optimization and support vector machines, Mechanical Systems and Signal Processing 21 (4) (2007) 1787–1798.
[48] J. Zhang, Y. Wang, A rough margin based support vector machine, Information Sciences 178 (9) (2008) 2204–2214.
[49] Y. Zhao, J. Sun, Recursive reduced least squares support vector regression, Pattern Recognition 42 (5) (2009) 837–842.
[50] S.M. Zhou, J.Q. Gan, F. Sepulveda, Classifying mental tasks based on features of higher-order statistics from EEG signals in brain–computer interface, Information Sciences 178 (6) (2008) 1629–1640.
[51] S.M. Zhou, R.I. John, X.Y. Wang, J.M. Garibaldi, Compact fuzzy rules induction and feature extraction using SVM with particle swarms for breast cancer treatments, in: Proceedings of 2008 IEEE Congress on Evolutionary Computation (CEC), 2008, pp. 1469–1475.