
Application of the PSO–SVM model for recognition of control chart patterns

Vahid Ranaee*, Ata Ebrahimzadeh, Reza Ghaderi
Faculty of Electrical and Computer Engineering, Babol University of Technology, Babol, Iran
* Corresponding author. E-mail address: [email protected] (V. Ranaee).

ISA Transactions 49 (2010) 577–586. doi:10.1016/j.isatra.2010.06.005

Article history: Received 23 February 2010; received in revised form 14 May 2010; accepted 27 June 2010; available online 20 July 2010.

Keywords: Support vector machines; Classification; Feature selection; Particle swarm optimization; Control chart pattern recognition

Abstract: Control chart patterns are important statistical process control tools for determining whether a process is running in its intended mode or in the presence of unnatural patterns. Accurate recognition of control chart patterns is essential for efficient system monitoring to maintain high-quality products. This paper introduces a novel hybrid intelligent system that includes three main modules: a feature extraction module, a classifier module, and an optimization module. In the feature extraction module, a set combining shape features and statistical features is proposed as an efficient characterization of the patterns. In the classifier module, a multi-class support vector machine (SVM)-based classifier is proposed. For the optimization module, a particle swarm optimization algorithm is proposed to improve the generalization performance of the recognizer. In this module, the SVM classifier design is optimized by searching for the best values of the parameters that tune its discriminant function (kernel parameter selection) and, upstream, by looking for the best subset of features that feed the classifier. Simulation results show that the proposed algorithm has very high recognition accuracy. This high efficiency is achieved with only a few features, selected using the particle swarm optimizer. © 2010 ISA. Published by Elsevier Ltd. All rights reserved.

0. Introduction

Control charts are widely used in modern industrial and service organizations. In recent years, various kinds of control chart have been developed for different quality attributes and control targets. Monitoring process fluctuation with control charts was first proposed by Shewhart in 1924. Process fluctuation involves abnormal changes due to assignable causes and normal changes due to unassignable causes. Automatically recognizing control chart patterns (CCPs) is therefore essential for identifying process fluctuation effectively. CCPs can exhibit six common types of pattern: normal (NOR), cyclic (CYC), increasing trend (IT), decreasing trend (DT), upward shift (US), and downward shift (DS). Fig. 1 shows these six types of pattern [1].

Fig. 1. Control chart patterns. (a) Normal pattern, (b) cyclic pattern, (c) increasing trend, (d) decreasing trend, (e) upward shift, (f) downward shift.

Several research papers have summarized and categorized the patterns that commonly appear in CCPs. Most of the existing CCP recognition schemes in the literature use normalized original data as the input vector to the recognizer [2–5]. Such data representations normally produce large classifier structures and are not very effective or efficient for complicated recognition problems. A smaller classifier can be trained faster and is generally more effective and efficient.




For this reason, feature-based control chart pattern recognition methods were proposed [6–10] to solve these issues, and their results showed that the use of features made the shape of a pattern explicit, and that a feature-based neural network approach reduced the topological complexity of the network and the corresponding training time. According to the above description, quick and accurate recognition of control chart patterns should ideally be achieved, especially in real manufacturing processes.

Very limited work has been reported on the use of features extracted from CCPs as the input vectors. Hassan [6] introduced feature-based control chart pattern recognition with six statistical features: mean, variance, skewness, mean-square value, autocorrelation, and cusum. The scheme was aimed at improving the performance of the pattern recognizer by presenting a smaller input vector (features). Al-Assaf [10] investigated the use of multi-resolution wavelet analysis (MRWA) with artificial neural networks (ANNs) to recognize CCPs. Pham and Wani [7] introduced feature-based control chart pattern recognition with nine geometric features: slope, number of mean crossings, number of least-square line crossings, cyclic membership, average slope of the line segments, slope difference, and three different measures of area; this scheme, too, was aimed at improving recognizer performance by presenting a smaller input vector. Chen [9] presented a hybrid approach integrating the wavelet method and neural networks for on-line recognition of concurrent CCPs: the concurrent CCPs are first preprocessed by a wavelet transform, which decomposes them into different levels or patterns, and the corresponding features are then fed into back-propagation ANN classifiers for pattern recognition.



Gauri and Chakraborty [8] also presented a set of the seven most useful features, selected from a large number of potentially useful features using a CART-based systematic approach. Based on these selected features, the eight most commonly observed CCPs are recognized using heuristic and ANN techniques.

As mentioned above, artificial neural networks (ANNs) have been widely applied for detecting CCPs. However, ANNs suffer from several weaknesses, such as the need for a large amount of training data, over-fitting, slow convergence, and easy relapse into local extrema [11]. These weaknesses limit the practicability of ANNs. Support vector machines (SVMs), based on statistical learning theory, are gaining applications in the area of pattern recognition because of their excellent generalization capability [12], and they have recently attracted increasing attention with remarkable results [12]. The main difference between ANNs and SVMs is the principle of risk minimization. An ANN implements empirical risk minimization to minimize the error on the training data, whereas an SVM implements structural risk minimization in place of empirical risk minimization, which gives it excellent generalization ability even when the sample is small.

The largest problems encountered in setting up the SVM model are how to select the kernel function and its parameter values. The parameters that should be optimized include the penalty parameter C and the kernel function parameters, such as the value of gamma (γ) for the radial basis function (RBF) kernel. Another SVM classifier problem is selecting the features. With a small and appropriate feature subset, the rationale for the classification decision can be understood more easily. In general, the purpose of feature selection is to find the subset of features that gives the best detection and recognition performance with the least computational effort. Therefore, both suitable feature subset selection and model parameter setting play an important role in the classification performance [13]. In this study, particle swarm optimization (PSO) is chosen as the optimization technique to optimize the input feature subset selection and the SVM parameter settings simultaneously, thereby improving the SVM performance.

This paper is organized as follows. Section 1 describes the efficient features. Section 2 describes the concepts needed, including the basic SVM and PSO concepts. Section 3 describes the PSO–SVM hybrid system. Section 4 describes the experimental design, including the data description and experiment settings. Evaluation results are given in Section 5. Finally, Section 6 concludes the paper with a discussion.

1. Efficient features

Features represent the format of the CCPs. Different types of CCP have different properties, so finding suitable features to identify them is a difficult task. In the signal recognition area, choosing good features not only enables the classifier to distinguish the CCPs better, but also helps reduce the complexity of the classifier. In this paper, the feature extraction module uses a set of features that captures both shape and statistical information about the CCPs. We now briefly describe these features.

1.1. Shape features

The shape features used by the CCP recognizer in this study are chosen so that they facilitate quick and accurate recognition of CCPs. The six types of CCP considered in this work have different forms, which can be characterized by a number of shape features. In [14], the authors introduced nine shape features for discrimination of the CCPs. In this paper, based on trial and error, eight of these features are considered. They have been chosen such that, together with the proposed statistical features, they can recognize the patterns quickly and accurately. These features are as follows (an implementation sketch follows the list).

(1) S: the slope of the least-square line representing the pattern. The magnitude of S for normal and cyclic patterns is approximately zero, while that for trend and shift patterns is greater than zero. S is therefore a good candidate for differentiating normal and cyclic patterns from trend and shift patterns.

(2) NC1: the number of mean crossings, i.e. the crossings of the pattern with the mean line. NC1 is small for shift and trend patterns and highest for normal patterns. For cyclic patterns, the number of crossings is intermediate between those for normal patterns and those for shift or trend patterns. This feature differentiates normal patterns from cyclic patterns; it also differentiates normal and cyclic patterns from trend and shift patterns.

(3) NC2: the number of least-square line crossings. NC2 is highest for normal and trend patterns and lowest for shift and cyclic patterns. Thus it can be used to separate normal and trend patterns from the others.

(4) AS: the average slope of the line segments. In addition to the least-square line, which approximates the complete pattern, each pattern also has two line segments, which fit the data starting from either end of the pattern. The average slope of the line segments for a trend pattern will be higher than for normal, cyclic, and shift patterns. This feature therefore differentiates trend patterns from the other patterns.

(5) SD: the slope difference between the least-square line and the line segments representing a pattern. The SD value is obtained by subtracting the average slope of the two line segments from the slope of the least-square line. For normal, cyclic, and trend patterns, the slopes of the least-square line and the line segments will be similar. Thus, SD will have a high value for a shift pattern and small values for normal, cyclic, and trend patterns. This feature therefore differentiates a shift pattern from the other patterns.

(6) APML: the area between the pattern and the mean line. APML is lowest for a normal pattern. Thus, this feature differentiates between normal and other patterns.

(7) APSL: the area between the pattern and its least-square line. Cyclic and shift patterns have a higher APSL value than normal and trend patterns, so APSL can be used to differentiate cyclic and shift patterns from normal and trend patterns.

(8) ASS: the area between the least-square line and the line segments. The value of this feature is approximately zero for a trend pattern and higher for a shift pattern. This feature thus differentiates trend patterns from shift patterns.
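To make these definitions concrete, here is a minimal sketch (our own illustration, not the authors' code) of how a subset of the shape features, together with the skewness and kurtosis statistics defined in Section 1.2 below, could be computed for one pattern window; the AS, SD, and ASS features would additionally require fitting the two half-window line segments.

```python
import numpy as np

def extract_features(p):
    """Compute a subset of the described features for one pattern p
    (a 1-D array of sampled process values)."""
    n = len(p)
    t = np.arange(n)

    # S: slope of the least-square line fitted to the whole pattern.
    S, intercept = np.polyfit(t, p, 1)
    ls_line = S * t + intercept

    mean = p.mean()
    # NC1 / NC2: crossings of the mean line / least-square line.
    nc1 = int(np.sum(np.diff(np.sign(p - mean)) != 0))
    nc2 = int(np.sum(np.diff(np.sign(p - ls_line)) != 0))

    # APML / APSL: areas between the pattern and the two lines.
    apml = float(np.sum(np.abs(p - mean)))
    apsl = float(np.sum(np.abs(p - ls_line)))

    # Skewness and kurtosis, following Eqs. (1)-(4) of Section 1.2.
    std = np.sqrt(np.mean((p - mean) ** 2))
    skew = np.sum((p - mean) ** 3) / (n * std ** 3)
    kurt = np.sum((p - mean) ** 4) / (n * std ** 4)

    return np.array([S, nc1, nc2, apml, apsl, skew, kurt])
```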

1.2. Statistical features

Common statistical features include the mean, standard deviation, skewness, kurtosis, and autocorrelation. In this paper we use skewness and kurtosis. Skewness measures the degree of asymmetry of the distribution of a pattern, and kurtosis measures the relative peakedness or flatness of its distribution [6]. Their mathematical forms are, respectively, shown below:

\mathrm{mean} = \frac{\sum_{t=1}^{n} x_t}{n}, \qquad (1)

\mathrm{std} = \sqrt{\frac{\sum_{t=1}^{n}(x_t - \mathrm{mean})^2}{n}}, \qquad (2)

\mathrm{skew} = \frac{\sum_{t=1}^{n}(x_t - \mathrm{mean})^3}{n\,\mathrm{std}^3}, \qquad (3)

\mathrm{kurt} = \frac{\sum_{t=1}^{n}(x_t - \mathrm{mean})^4}{n\,\mathrm{std}^4}, \qquad (4)

where $x_t$ represents the input (reference) vector and $n$ is the total length of the observation window. The box plots and the feature space plots of the twelve features for the different classes, generated by this process, are presented in Fig. 2. As can be seen, the patterns of each class are located close to each other and are relatively well separated from the other classes within the feature space.

2. Needed concepts

2.1. Support vector machine (SVM)

We have proposed a multi-class SVM-based classifier (MCSVM). SVMs were introduced on the foundation of statistical learning theory. Since the mid-1990s, SVM algorithms have emerged together with the greater availability of computing power, paving the way for numerous practical applications. The basic SVM deals with two-class problems, but it can be extended to multi-class classification [15].

2.1.1. Binary SVM (BSVM)

An SVM performs classification tasks by constructing optimal separating hyperplanes (OSHs). An OSH maximizes the margin between the two nearest data points belonging to two separate classes. Suppose that the training set $(x_i, y_i)$, $i = 1, 2, \ldots, l$, $x \in \mathbb{R}^d$, $y \in \{-1, +1\}$, can be separated by the hyperplane $w^T x + b = 0$, where $w$ is the weight vector and $b$ is the bias. If this hyperplane maximizes the margin, then the following inequality is valid for all input data:

y_i(w^T x_i + b) \ge 1 \quad \text{for all } x_i,\ i = 1, 2, \ldots, l. \qquad (5)

The margin of the hyperplane is $2/\|w\|$. Thus, the problem is to maximize the margin by minimizing $\|w\|$ subject to (5). This is a convex quadratic programming (QP) problem, and Lagrange multipliers ($a_i$, $i = 1, 2, \ldots, l$; $a_i \ge 0$) are used to solve it:

L_p = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{l} a_i\left[y_i(w^T x_i + b) - 1\right]. \qquad (6)

After minimizing $L_p$ with respect to $w$ and $b$, the optimal weights are given by

w^* = \sum_{i=1}^{l} a_i y_i x_i. \qquad (7)

The dual of the problem is given by

L_d = \sum_{i=1}^{l} a_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} a_i a_j y_i y_j x_i^T x_j. \qquad (8)

To find the OSH, $L_d$ must be maximized under the constraint $\sum_{i=1}^{l} a_i y_i = 0$. The Lagrange multipliers are non-zero ($a_i > 0$) only when $y_i(w^T x_i + b) = 1$; those training points for which the equality in (5) holds are called support vectors (SVs). The optimal bias is given by

b^* = y_i - w^{*T} x_i \qquad (9)

for any support vector $x_i$. The optimal decision function (ODF) is then given by

f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} y_i a_i^* x^T x_i + b^*\right), \qquad (10)

where the $a_i^*$ are the optimal Lagrange multipliers. For input data with a high noise level, an SVM using soft margins can be expressed with the introduction of the non-negative slack variables $\xi_i$, $i = 1, 2, \ldots, l$:

y_i(w^T x_i + b) \ge 1 - \xi_i \quad \text{for } i = 1, 2, \ldots, l. \qquad (11)

To obtain the OSH, $\Phi = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i^k$ should be minimized subject to (11), where $C$ is the penalty parameter, which controls the trade-off between the complexity of the decision function and the number of misclassified training examples. In nonlinearly separable cases, the SVM maps the training points nonlinearly to a high-dimensional feature space using a kernel function $K(x_i, x_j)$, where linear separation may be possible. There are several types of kernel function.

Linear:

K(x_i, x_j) = x_i \cdot x_j. \qquad (12)

Fig. 2. Box plots of the twelve features for the different classes (1 = NOR, 2 = CYC, 3 = IT, 4 = DT, 5 = US, 6 = DS). For more information see the text.

Gaussian radial basis function (GRBF):

K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{\sigma^2}\right) \qquad (13)

K(x_i, x_j) = \exp\left(-\gamma\|x_i - x_j\|^2\right). \qquad (14)

Polynomial:

K(x_i, x_j) = (x_i \cdot x_j + 1)^d, \qquad (15)

where $\gamma$ and $d$ are the parameters of the kernel functions. After a kernel function is selected, the QP problem becomes

L_d = \sum_{i=1}^{l} a_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} a_i a_j y_i y_j K(x_i, x_j). \qquad (16)

The $a_i^*$ are derived by

a_i^* = \arg\max L_d \qquad (17)

subject to

0 \le a_i \le C,\ i = 1, 2, \ldots, l; \qquad \sum_{i=1}^{l} a_i y_i = 0. \qquad (18)

After training, the decision function becomes

f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} y_i a_i^* K(x, x_i) + b^*\right). \qquad (19)

The performance of an SVM can be controlled through the term C and the kernel parameter, which are called hyperparameters. These parameters influence the number of support vectors and the margin of the SVM.
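For illustration only, the following sketch shows how the hyperparameters C and γ enter a concrete RBF-kernel SVM; the use of scikit-learn and the toy data are our assumptions, not anything specified in the paper.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy two-class data standing in for CCP feature vectors.
X = rng.normal(size=(200, 3)) + np.repeat([[0.0] * 3, [2.0] * 3], 100, axis=0)
y = np.repeat([0, 1], 100)

# kernel='rbf' selects the Gaussian kernel of Eq. (14); gamma is the
# kernel parameter and C the penalty parameter of the soft margin.
clf = SVC(kernel='rbf', C=100.0, gamma=0.01)
clf.fit(X[::2], y[::2])                    # train on half of the samples
print('test accuracy:', clf.score(X[1::2], y[1::2]))
```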


2.1.2. Multi-class SVM-based classifier (MCSVM)

There are two widely used methods for extending binary SVMs to multi-class problems [16]. One is the one-against-all (OAA) method. For a P-class pattern recognition problem, P independent SVMs are constructed, each trained to separate one class of samples from all the others. When the system is tested after all the SVMs are trained, a sample is input to all of them; if the sample belongs to class P1, ideally only the SVM trained to separate class P1 from the others will respond positively. The other is the one-against-one (OAO) method. For a P-class problem, P(P − 1)/2 SVMs are constructed, each trained to separate one class from another. The decision for a test sample is based on the voting result of these SVMs. The structure of this classifier is shown in Fig. 3.

Fig. 3. Multi-class SVM-based classifier.

2.2. Particle swarm optimization

This section provides a brief introduction to basic PSO concepts. PSO is a stochastic optimization technique originally developed by Kennedy and Eberhart [17]; it originated from simulations of the behavior of a flock of birds, a school of fish, or the social behavior of a group of people. Instead of using evolutionary operators to manipulate the individuals, as in other evolutionary computational algorithms, each individual flies through the search space with a velocity which is dynamically adjusted according to its own flying experience and its companions' flying experience. Each individual is considered a volume-less particle (a point) in the N-dimensional search space. At time step t, the ith particle is represented as $X_i(t) = (x_{i1}(t), x_{i2}(t), \ldots, x_{iN}(t))$. The set of positions of m particles in the multidimensional space is identified as $X = \{X_1, \ldots, X_j, \ldots, X_l, \ldots, X_m\}$. The best previous position (the position giving the best fitness value) of the ith particle is recorded and represented as $P_i(t) = (p_{i1}, p_{i2}, \ldots, p_{iN})$. The index of the best particle among all the particles in the population (global model) is represented by the symbol g. The index of the best particle among all the particles in a defined topological neighborhood (local model) is represented by the subscript l. The velocity of particle i at time step t is represented as $V_i(t) = (v_{i1}(t), v_{i2}(t), \ldots, v_{iN}(t))$. The particle variables are manipulated according to the following equations (global model [17]):

v_{in}(t) = w \times v_{in}(t-1) + c_1 \times \mathrm{rand}_1() \times (p_{in} - x_{in}(t-1)) + c_2 \times \mathrm{rand}_2() \times (p_{gn} - x_{in}(t-1)) \qquad (20)

x_{in}(t) = x_{in}(t-1) + v_{in}(t), \qquad (21)

where n is the dimension (1 ≤ n ≤ N), $c_1$ and $c_2$ are positive constants, $\mathrm{rand}_1()$ and $\mathrm{rand}_2()$ are two random functions in the range [0, 1], and w is the inertia weight. The inertia weight decreases linearly [18] according to the following equation:

w_t = w_{\max} - \frac{w_{\max} - w_{\min}}{t_{\max}} \times t. \qquad (22)
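The update rules (20)-(22) translate directly into code. Below is a minimal, self-contained sketch of the global model; the bounds, swarm size, and test function are placeholder choices of ours (the coefficient values actually used in the paper are listed later in Table 1).

```python
import numpy as np

rng = np.random.default_rng(1)

def pso_maximize(fitness, dim, m=20, t_max=100, c1=2.0, c2=2.0,
                 w_max=0.9, w_min=0.4, v_max=8.0):
    """Global-model PSO maximizing `fitness`, following Eqs. (20)-(22)."""
    x = rng.uniform(0.0, 1.0, (m, dim))            # particle positions
    v = np.zeros((m, dim))                         # particle velocities
    pbest = x.copy()
    pbest_fit = np.array([fitness(p) for p in x])
    g = int(np.argmax(pbest_fit))                  # global best index

    for t in range(1, t_max + 1):
        w = w_max - (w_max - w_min) / t_max * t    # Eq. (22)
        r1 = rng.random((m, dim))
        r2 = rng.random((m, dim))
        v = (w * v + c1 * r1 * (pbest - x)         # Eq. (20)
             + c2 * r2 * (pbest[g] - x))
        v = np.clip(v, -v_max, v_max)
        x = x + v                                  # Eq. (21)
        fit = np.array([fitness(p) for p in x])
        better = fit > pbest_fit
        pbest[better] = x[better]
        pbest_fit[better] = fit[better]
        g = int(np.argmax(pbest_fit))
    return pbest[g], pbest_fit[g]

# Example: maximize a simple concave function over two dimensions.
best_pos, best_fit = pso_maximize(lambda p: -np.sum((p - 0.3) ** 2), dim=2)
```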

For the neighborhood (lbest) model, the only change is to substitute $p_{ln}$, the neighborhood best, for $p_{gn}$ in the velocity equation. Equation (20) in the global model calculates a particle's new velocity according to its previous velocity and the distances of its current position from its own best experience (pbest) and the group's best experience (gbest). The local model calculation is identical, except that the neighborhood's best experience is used instead of the group's best experience. Particle swarm optimization has been used both across a wide range of applications and for applications focused on a specific requirement. Its attractiveness over many other optimization algorithms lies in its relative simplicity, because only a few parameters need to be adjusted.

3. Classification modeling by the combination of PSO and a support vector machine (PSO–SVM)

In this research, a nonlinear SVM based on the popular Gaussian kernel (SVM–RBF) has been studied. The largest problems encountered in setting up the SVM model are how to select the kernel function and its parameter values. The parameters that should be optimized include the penalty parameter C and the kernel function parameters, such as the value of gamma (γ) for the radial basis function (RBF) kernel. Another SVM classifier problem is selecting the features. With a small and appropriate feature subset, the rationale for the classification decision can be understood more easily. Therefore, both suitable feature subset selection and model parameter setting play an important role in the classification performance [13]. Fig. 5 shows the flowchart of the PSO-based parameter determination and feature selection approach for the SVM classifier.

Particle representation. A particle comprises three parts: the input features, C, and γ (when the RBF kernel is selected). Fig. 4 illustrates the representation of a particle with dimension n + 2, where n is the total number of input features (variables) of a data set. The value of each of the n feature variables ranges between 0 and 1. If the value of a variable is less than or equal to 0.5, its corresponding feature is not chosen; if it is greater than 0.5, its corresponding feature is chosen (see Fig. 4).

Fig. 4. Particle representation.

Fitness function. The fitness function, which must be designed before searching for the optimal values of both the SVM parameters and the feature selection, is used to evaluate the quality of every particle. It is based on the classification accuracy of the SVM classifier:

\text{Fitness} = \frac{y_t}{y_t + y_f} \times 100, \qquad (23)

where yt and yf denote the number of true and false classifications, respectively.
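As a sketch of how one particle could be decoded and scored (the mapping of the last two entries onto the C and γ ranges of Section 4.2 is our assumption; the paper does not give this detail):

```python
import numpy as np
from sklearn.svm import SVC

def particle_fitness(particle, X_train, y_train, X_test, y_test):
    """Decode a particle of length n + 2 (Fig. 4) and return Eq. (23)."""
    n = X_train.shape[1]
    mask = particle[:n] > 0.5                # a feature is chosen iff > 0.5
    if not mask.any():                       # guard: keep at least one feature
        return 0.0
    # Assumed mapping of the last two entries (each in [0, 1]) onto the
    # search ranges of Section 4.2: C in [1, 1000], gamma in (0, 1].
    C = 1.0 + particle[n] * 999.0
    gamma = max(particle[n + 1], 1e-4)

    clf = SVC(kernel='rbf', C=C, gamma=gamma)
    clf.fit(X_train[:, mask], y_train)
    y_pred = clf.predict(X_test[:, mask])
    y_t = int(np.sum(y_pred == y_test))      # true classifications
    y_f = int(np.sum(y_pred != y_test))      # false classifications
    return 100.0 * y_t / (y_t + y_f)         # Eq. (23)
```

Such a function can then be handed, for example, to the PSO sketch of Section 2.2 with dim = n + 2.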


Fig. 5. The architecture of the proposed PSO-based parameter determination and feature selection approach for the SVM classifier.

4. Experiment design

4.1. Data description

Our experiments were conducted on control charts synthetically generated by the process in Alcock and Manolopoulos [19]. The data set has six different classes of control charts, with 100 instances of each class: normal (NOR), cyclic (CYC), increasing trend (IT), decreasing trend (DT), upward shift (US), and downward shift (DS). The data are taken from [20]. The following equations were used to create the data points for the various patterns.

(i) Normal patterns:
P(t) = \eta + r(t) \times \sigma. \qquad (24)

(ii) Cyclic patterns:
P(t) = \eta + r(t) \times \sigma + a \times \sin\left(\frac{2\pi t}{T}\right). \qquad (25)

(iii) Increasing trend patterns:
P(t) = \eta + r(t) \times \sigma + g \times t. \qquad (26)

(iv) Decreasing trend patterns:
P(t) = \eta + r(t) \times \sigma - g \times t. \qquad (27)

(v) Upward shift patterns:
P(t) = \eta + r(t) \times \sigma + b \times s. \qquad (28)

(vi) Downward shift patterns:
P(t) = \eta + r(t) \times \sigma - b \times s. \qquad (29)

Here η is the nominal mean value of the process variable under observation (set to 80), σ is the standard deviation of the process variable (set to 5), a is the amplitude of cyclic variations in a cyclic pattern (set to 15 or less), g is the gradient of an increasing or decreasing trend pattern (set in the range 0.2 to 0.5), b indicates the shift position in an upward or downward shift pattern (b = 0 before the shift and b = 1 at the shift and thereafter), s is the magnitude of the shift (set between 7.5 and 20), r(·) is a function that generates random numbers normally distributed between −3 and 3, t is the discrete time at which the monitored process variable is sampled (set within the range 0 to 59), T is the period of a cycle (set between 4 and 12 sampling intervals), and P(t) is the value of the sampled data point at time t.

4.2. Experiment settings

In the experiments, we considered a nonlinear SVM based on the popular Gaussian kernel (SVM–RBF). The related parameters C and γ for this kernel were varied in the arbitrarily fixed ranges [1, 1000] and [0, 1], respectively, so as to cover high and low regularization of the classification model, and fat as well as thin kernels. In addition, for comparison purposes, we implemented in the first experiment the SVM classifier with two other kernels, namely the linear and polynomial kernels, leading to two other SVM classifiers, termed SVM-linear and SVM-poly, respectively. The degree d of the polynomial kernel was varied in the range [2, 5] in order to span polynomials with low and high flexibility. In the particle swarm, there are several coefficients whose values can be adjusted to produce a better rate of convergence; Table 1 shows the coefficient values used in the PSO algorithm.

Table 1. Coefficient values in the PSO algorithm.

Swarm size:                       20
Acceleration constant:            3
Maximum velocity:                 8
Maximum number of iterations:     100
Size of the local neighborhood:   2
Constants c1 = c2:                2
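Below is a sketch of the generators (24)-(29) under the parameter settings of Section 4.1; where the paper leaves a choice open (the exact distribution of the amplitude a and the shift position), the values below are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def r(n):
    """Noise normally distributed and limited to [-3, 3]."""
    return np.clip(rng.standard_normal(n), -3.0, 3.0)

def generate_pattern(kind, n=60, eta=80.0, sigma=5.0):
    """Generate one control chart pattern following Eqs. (24)-(29)."""
    t = np.arange(n)
    base = eta + r(n) * sigma                          # Eq. (24): normal
    if kind == 'NOR':
        return base
    if kind == 'CYC':                                  # Eq. (25)
        a = rng.uniform(10.0, 15.0)                    # amplitude (<= 15)
        T = rng.uniform(4.0, 12.0)                     # cycle period
        return base + a * np.sin(2.0 * np.pi * t / T)
    if kind in ('IT', 'DT'):                           # Eqs. (26)-(27)
        g = rng.uniform(0.2, 0.5)                      # trend gradient
        return base + (g if kind == 'IT' else -g) * t
    if kind in ('US', 'DS'):                           # Eqs. (28)-(29)
        s = rng.uniform(7.5, 20.0)                     # shift magnitude
        b = (t >= rng.integers(10, 50)).astype(float)  # b: 0 before, 1 after
        return base + (s if kind == 'US' else -s) * b
    raise ValueError(kind)

# 100 instances of each of the six classes, as in Section 4.1.
data = {k: np.vstack([generate_pattern(k) for _ in range(100)])
        for k in ('NOR', 'CYC', 'IT', 'DT', 'US', 'DS')}
```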

5. Results

In this section, the performance of the proposed recognizer is evaluated. 600 patterns, 100 of each type, were generated as described above. About 20% of the samples are used as the training set of the SVM classifier, and the rest are used for testing. Based on extensive simulations, the SVM with the GRBF kernel gives better results than the other kernels, so it was adopted as the kernel function. Several experiments were performed to verify the effectiveness of the proposed method.

5.1. Experiment 1: classification in the whole original hyper-dimensional feature space

In the first experiment, we applied the SVM classifier directly on the entire original hyper-dimensional feature space, which is made up of 12 features. The performances of the one-against-one (OAO) and the one-against-all (OAA) methods using the SVM classifier with different kernels are compared in Table 2.

Table 2. Performance comparison of the one-against-one (OAO) and one-against-all (OAA) methods with different kernels. Each block gives the recognition accuracy and the confusion matrix (%); rows are the true classes, columns the assigned classes.

SVM-linear (OAO), recognition accuracy 89.17%:

       NOR     CYC     IT      DT      US      DS
NOR    100     0       0       0       0       0
CYC    0       100     0       0       0       0
IT     3.75    0       95      0       1.25    0
DT     1.25    0       0       91.25   0       7.5
US     2.5     0       1.25    0       96.25   0
DS     0       0       1.25    8.75    0       90

SVM-linear (OAA), recognition accuracy 90.42%:

       NOR     CYC     IT      DT      US      DS
NOR    98.75   1.25    0       0       0       0
CYC    11.25   88.75   0       0       0       0
IT     0       0       98.75   0       1.25    0
DT     1.25    0       0       95      0       5
US     0       1.25    18.75   0       80      0
DS     0       0       0       18.75   0       81.25

SVM-poly (OAO), recognition accuracy 93.75%:

       NOR     CYC     IT      DT      US      DS
NOR    100     0       0       0       0       0
CYC    0       100     0       0       0       0
IT     3.75    0       98.75   0       1.25    0
DT     1.25    0       0       95      0       3.75
US     0       7.5     1.25    0       91.25   0
DS     0       15      0       7.5     0       77.5

SVM-poly (OAA), recognition accuracy 94.17%:

       NOR     CYC     IT      DT      US      DS
NOR    100     0       0       0       0       0
CYC    0       100     0       0       0       0
IT     3.75    0       95      0       1.25    0
DT     1.25    0       0       90      0       8.75
US     2.5     0       2.5     0       86.25   8.75
DS     0       0       1.25    5       0       93.75

SVM-rbf (OAO), recognition accuracy 95.42%:

       NOR     CYC     IT      DT      US      DS
NOR    100     0       0       0       0       0
CYC    0       100     0       0       0       0
IT     3.75    0       95      0       1.25    0
DT     1.25    0       0       91.25   0       7.5
US     2.5     0       1.25    0       96.25   0
DS     0       0       1.25    8.75    0       90

SVM-rbf (OAA), recognition accuracy 97.08%:

       NOR     CYC     IT      DT      US      DS
NOR    100     0       0       0       0       0
CYC    2.5     97.5    0       0       0       0
IT     0       0       98.75   0       1.25    0
DT     0       0       0       100     0       0
US     0       0       3.75    0       96.25   0
DS     0       0       0       10      0       90

PSO–SVM (OAO), recognition accuracy 99.17%:

       NOR     CYC     IT      DT      US      DS
NOR    100     0       0       0       0       0
CYC    0       100     0       0       0       0
IT     0       0       98.75   0       1.25    0
DT     0       0       0       100     0       0
US     0       0       1.25    0       98.75   0
DS     0       0       0       2.5     0       97.5

PSO–SVM (OAA), recognition accuracy 99.58%:

       NOR     CYC     IT      DT      US      DS
NOR    100     0       0       0       0       0
CYC    0       100     0       0       0       0
IT     0       0       98.75   0       1.25    0
DT     0       0       0       100     0       0
US     0       0       0       0       100     0
DS     0       0       0       1.25    0       98.75

We chose the SVM classifier parameter values that maximize this prediction accuracy. As reported in Table 2, the percentage recognition accuracies of the OAO and OAA methods achieved with the SVM classifier based on the Gaussian kernel (SVM–RBF) on the test set were equal to 95.42% and 97.08%, respectively. These results were better than those achieved by SVM-linear and SVM-poly: the percentage recognition accuracies of the OAO (and OAA) methods were equal to 89.17% (90.42%) for the SVM-linear classifier and 93.75% (94.17%) for the SVM-poly classifier. As can be seen from Table 2, separating the increasing trend (IT) and upward shift (US) patterns, as well as the decreasing trend (DT) and downward shift (DS) patterns, is very difficult because of the similarities between them; these are also the most overlapped classes in Fig. 2 (Section 1.2). In addition, this experiment provides reference classification accuracies against which to quantify the capability of the proposed PSO–SVM classification system to further improve these results.

As already stated, the features play a vital role in the classification of the control chart patterns. In order to investigate the effectiveness of the selected features, we have also used features that were introduced in other references; Table 3 shows this comparison. The other simulation setups are the same. The results imply that the proposed features represent control chart patterns effectively.

Table 3. Comparison of the proposed features with features introduced in other references.

Ref.   Features                                                       Total accuracy (%)
[21]   Correlation between the input and various reference vectors    93.94
[22]   Statistical correlation coefficients                           95.19
[14]   Shape features                                                 96.28
       Proposed features (SVM–RBF, OAA)                               97.08
       Proposed method (PSO–SVM, OAA)                                 99.58

5.2. Experiment 2: performance evaluation with optimization in different runs

In this experiment, we trained the SVM classifier based on the Gaussian kernel, which proved in the previous experiment to be the most appropriate kernel for control chart pattern classification; then, to evaluate the performance of the proposed algorithm (PSO–SVM), ten different runs were performed. PSO finds the combination of SVM classifier parameters and features that maximizes the fitness function. Detailed information about the selected features and the optimal values of the SVM classifier parameters (i.e., the γ and C parameters) obtained by the proposed algorithm (PSO–SVM) is shown in Table 4. The proposed algorithm successfully finds the global optimum within just 100 iterations. It can be seen that the sixth, eighth, and ninth features produced the best accuracy of 99.58%. This result was repeated in multiple runs of the program, and it shows that these features have very good discrimination ability for our classes. Fig. 6 supports this claim: there, the six classes are represented in a three-dimensional distribution so that the amount of overlap between the classes can be assessed visually. As is seen, the classes are separated by three features.

Table 4. Selected features and the optimal values of the SVM classifier parameters for different runs.

Run    Features               Size   Correct feature   γ         C        Best fitness
#1     [6, 8, 9]              3      Yes               0.0078    429.7    99.58
#2     [6, 8, 9]              3      Yes               0.0103    327.3    99.58
#3     [6, 8, 9]              3      Yes               0.0107    334.4    99.58
#4     [1, 4, 8, 9]           5      No                0.0109    301.3    99.17
#5     [6, 8, 9]              3      Yes               0.0092    388      99.58
#6     [6, 8, 9]              3      Yes               0.0111    304.5    99.58
#7     [1, 4, 5, 6, 9, 10]    6      No                0.0069    485.7    96.46
#8     [1, 4, 5, 9, 11, 12]   6      No                0.0098    338.5    96.88
#9     [6, 8, 9]              3      Yes               0.0118    289.6    99.58
#10    [6, 8, 9]              3      Yes               0.0069    463.5    99.58
Mean                                 0.7               0.00946   366.25   98.95

Fig. 6. Feature space plots of the sixth, eighth, and ninth features for different classes.

Values of the C and γ parameters of the SVM classifier in four different runs of the program with 100 iterations are presented in Fig. 7. In each run, PSO first generates random values for C and γ (as seen in the figure) and then searches for values that produce better fitness. Usually, after 40 iterations, the algorithm converges to the best values of the C and γ parameters of the SVM classifier.

Fig. 7. (a) Evaluated parameter (γ) of the GRBF kernel function of the SVM classifier obtained by PSO for different runs and (b) evaluated parameter (C) of the SVM classifier obtained by PSO for different runs.

Fig. 8 shows the typical increase of the fitness (classification accuracy) of the best particle of the swarm obtained from the PSO–SVM recognition system for four different runs. As indicated in Fig. 8, the fitness curves gradually improved from iteration 0 to 100, and they exhibited no significant improvement after iteration 70 in any of the four runs. The optimal stopping iteration giving the highest validation accuracy was around iteration 50–70 for the four runs. Also, since the value of the fitness depends on the features selected at each iteration, the fitness in the four runs changes even after the 40th iteration (when the values of the C and γ parameters have converged to their final values).

Fig. 8. The best fitness evolution of the fitness function for different runs.

5.3. Experiment 3: sensitivity to the acceleration parameters

As mentioned, selection of the PSO parameters is an important issue. We analyzed the sensitivity of the PSO–SVM recognition system with respect to the two acceleration constants c1 and c2, which control the behavior, and thus the goodness, of the PSO search process. We varied c1 and c2 in the range [1, 2]; Table 5 shows the results. The recognition accuracy was little affected by the variation of these parameters: it fluctuated only between 99.58% for c1 = c2 = 2 and 98.33% for c1 = c2 = 1.2. Even when nonstandard parameter values are adopted, the achieved accuracies remain above those yielded by the reference classifiers.

Table 5. Recognition accuracy of the optimized recognizer for acceleration constants tuned in the range [1, 2].

Constants c1 = c2    Recognition accuracy (%)
1                    99.17
1.2                  98.33
1.4                  98.85
1.6                  99.37
1.8                  99.25
2                    99.58

5.4. Experiment 4: comparing performances of the classification techniques

To investigate the capability of the proposed classifier, its performance has been compared with that of other classifiers, as indicated in Table 6. In this respect, probabilistic neural networks (PNNs) [23], radial basis function neural networks (RBFNNs) [24], and multilayer perceptron (MLP) neural networks with different training algorithms, such as the back-propagation (BP) learning algorithm [25] and the resilient propagation (RP) learning algorithm [26], are considered. These classifiers comprise parameters which must be readjusted for any new classification task, and these parameters tune each classifier to fit the task at hand. In most cases, there is no classical method for obtaining their values, so they are specified experimentally through trial and error. It can be seen from Table 6 that the proposed method has better recognition accuracy than the other classifiers.

Table 6. Comparison of the performance of the proposed classifier (PSO–SVM) with that of other classifiers.

Classifier                  Parameters                  Recognition accuracy (%)
PNN                         Spread = 52                 93.37
RBFNN                       Spread = 85, goal = 0.01    94.58
MLP (BP)                    Hidden neurons = 20         94.16
MLP (RP)                    Hidden neurons = 26         93.95
PSO–SVM (proposed method)   C = 429.7, γ = 0.0078       99.58

6. Conclusion and discussion

Control chart patterns (CCPs) are important statistical process control tools for determining whether a process is running in its intended mode or in the presence of unnatural patterns. This study presented methods for improving SVM performance in two aspects: feature selection and parameter optimization. The new method proposed in this paper is the combination of a support vector machine and particle swarm optimization (PSO–SVM); the modified PSO is applied to jointly optimize the feature selection and the SVM kernel parameters. We evaluated the proposed model on a data set and compared it with other models. The simulation results indicate that the PSO–SVM method can correctly select the discriminating input features and also achieve high classification accuracy (99.58%). This high efficiency is achieved with only three features, which were selected using the particle swarm optimizer. In order to show the importance of the proposed technique, we compared it with the performance of some previous works. The authors of [27] attained about 94.30% accuracy with 1500 samples. The authors of [3] attained a recognition accuracy (RA) of about 95.58% with 60 000 samples. The authors of [4,28] achieved an average RA lower than 94%. In [29], the proposed method reached an RA of about 93.73%. Also, in [2] the proposed method achieved an RA of about 97.46%. The method proposed in this paper shows an RA higher than 99%.

References

[1] Montgomery DC. Introduction to statistical quality control. 5th ed. Hoboken (NJ): John Wiley; 2005.
[2] Le Q, Goal X, Teng L, Zhu M. A new ANN model and its application in pattern recognition of control charts. In: Proc. IEEE WCICA; 2008. p. 1807–11.
[3] Cheng Z, Ma Y. A research about pattern recognition of control chart using probability neural network. In: Proc. ISECS; 2008. p. 140–5.
[4] Guh RS, Zorriassatine F, Tannock JD. On-line control chart pattern detection and discrimination—a neural network approach. Artificial Intelligence in Engineering 1999;13:413–25.
[5] Pham DT, Oztemel E. Control chart pattern recognition using learning vector quantization networks. International Journal of Production Research 1994;32:721–9.
[6] Hassan A, Nabi Baksh MS, Shaharoun AM, Jamaluddin H. Improved SPC chart pattern recognition using statistical features. International Journal of Production Research 2003;41(7):1587–603.
[7] Pham DT, Wani MA. Feature-based control chart pattern recognition. International Journal of Production Research 1997;35(7):1875–90.
[8] Gauri SK, Chakraborty S. Recognition of control chart patterns using improved selection of features. Computers and Industrial Engineering 2009;56:1577–88.
[9] Chen Z, Lu S, Lam S. A hybrid system for SPC concurrent pattern recognition. Advanced Engineering Informatics 2007;21:303–10.
[10] Al-Assaf Y. Recognition of control chart patterns using multi-resolution wavelets and neural networks. Computers and Industrial Engineering 2004;47:17–29.
[11] Mohamed EA, Abdelaziz AY, Mostafa AS. A neural network-based scheme for fault diagnosis of power transformers. Electric Power Systems Research 2005;75(1):29–39.
[12] Vapnik V. The nature of statistical learning theory. New York: Springer-Verlag; 1995.
[13] Huang CL, Wang CJ. A GA-based attribute selection and parameter optimization for support vector machines. Expert Systems with Applications 2006;31(2):231–40.
[14] Wani MA, Rashid S. Parallel algorithm for control chart pattern recognition. In: Proc. IEEE fourth international conference on machine learning and applications; 2005.
[15] Cortes C, Vapnik V. Support-vector networks. Machine Learning 1995;20:273–97.
[16] Burges C. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 1998;2:121–67.
[17] Eberhart RC, Kennedy J. A new optimizer using particle swarm theory. In: Proc. sixth international symposium on micro machine and human science; 1995. p. 39–43.
[18] Shi YH, Eberhart RC. Empirical study of particle swarm optimization. In: Proceedings of the congress on evolutionary computation; 1999. p. 1945–50.
[19] Alcock RJ, Manolopoulos Y. Time-series similarity queries employing a feature-based approach. In: 7th Hellenic conference on informatics; 1999.
[20] Hettich S, Bay S. The UCI KDD archive [http://kdd.ics.uci.edu].
[21] Al-Ghanim AM, Ludeman LC. Automated unnatural pattern recognition on control charts using correlation analysis techniques. Computers & Industrial Engineering 1997;32(3):679–90.
[22] Yang JH, Yang MS. A control chart pattern recognition scheme using a statistical correlation coefficient method. Computers and Industrial Engineering 2005;48:205–21.
[23] Specht DF. Probabilistic neural networks. Neural Networks 1990;3:109–18.
[24] Luan F, Zhang XY, Zhang HX, Zhang RS, Liu MC, Hu ZD, et al. QSPR study of permeability coefficients through low-density polyethylene based on radial basis function neural networks and the heuristic method. Computational Materials Science 2006;37(4):454–61.
[25] Rumelhart DE, McClelland JL. Parallel distributed processing: explorations in the microstructure of cognition. Cambridge (MA): MIT Press; 1986.
[26] Riedmiller M, Braun H. A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: Proceedings of the IEEE international conference on neural networks; 1993.
[27] Pham DT, Oztemel E. Control chart pattern recognition using neural networks. Journal of Systems Engineering 1994;2:256–62.
[28] Guh RS. Robustness of the neural network based control chart pattern recognition system to non-normality. International Journal of Quality & Reliability Management 2002;19:97–112.
[29] Sagiroglu S, Besdok E, Erler M. Control chart pattern recognition using artificial neural networks. Turkish Journal of Electrical Engineering 2000;8:137–47.