A self-adaptive embedded chaotic particle swarm optimization for parameters selection of Wv-SVM


Expert Systems with Applications 38 (2011) 184–192


Qi Wu *

Jiangsu Key Laboratory for Design and Manufacture of Micro-Nano Biomedical Instruments, Southeast University, Nanjing 211189, China
Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing, Jiangsu 210096, China


Keywords: Particle swarm optimization; Self-adaptive and normal gauss mutation; Chaotic mapping; Wv-SVM

Abstract

Particle swarm optimization (PSO) is a population-based swarm intelligence algorithm driven by the simulation of a social psychological metaphor rather than by the survival of the fittest individual. Based on chaotic system theory, this paper proposes a new PSO method that uses chaotic mappings for the parameter adaptation of the wavelet v-support vector machine (Wv-SVM). Since chaotic mappings enjoy certainty, ergodicity and the stochastic property, the proposed PSO introduces chaos via logistic mapping sequences, which increases its convergence rate and resulting precision. The simulation results show that the parameter selection problem of the Wv-SVM model can be solved with high search efficiency and solution accuracy by the proposed PSO method.

1. Introduction

Particle swarm optimization (PSO) is an evolutionary computation technique developed by Kennedy and Eberhart (1995), inspired by the social behavior of bird flocking and fish schooling. Similar to the genetic algorithm (GA), PSO is a population-based optimization tool built on the metaphor of social interaction and communication. The original PSO is distinctly different from other evolutionary methods in that it does not use filtering operations (such as crossover and mutation), and the members of the entire population are maintained throughout the search procedure, so that information is socially shared among individuals to direct the search towards the best position in the search space. PSO also converges quickly, with strong global searching ability at the beginning of a run and local searching near the end of a run. However, its fine-tuning of solution quality is sometimes slow, and when solving problems with many local optima, PSO is more likely to become trapped in a local optimum at the end of the run. The convergence properties of PSO are strongly related to its stochastic nature, since PSO uses random sequences for its particles during a run. In particular, it can be shown that when different random sequences are used during the PSO search, the final results may be very close but not equal, and different numbers of iterations may be required to reach the same optimal values.


However, there are no analytical results that guarantee an improvement of the performance indexes of PSO depending on a modified setting or the choice of a particular generator, as in evolutionary algorithms.

Chaos is a bounded, unstable dynamic behavior that exhibits sensitive dependence on initial conditions and includes infinitely many unstable periodic motions in nonlinear systems. Although it appears stochastic, it occurs in deterministic nonlinear systems under deterministic conditions. Many chaotic mappings in the literature possess certainty, ergodicity and the stochastic property (Chen, Zhang, & Bi, 2008; Chuanwena & Bompard, 2005; Pareek, Patidar, & Sud, 2006; Yang, Yang, Yin, & Li, 2008). Recently, chaotic sequences have been adopted in place of random sequences, with interesting and quite good results in many applications (Chuanwena & Bompard, 2005; Yang et al., 2008), and they have been used to improve the performance of evolutionary algorithms (Chuanwena & Bompard, 2005; Yang et al., 2008). The choice of chaotic sequences is justified theoretically by their unpredictability, i.e., by their spread-spectrum characteristic, non-periodic, complex temporal behavior and ergodic properties. The published literature, however, says little about PSO algorithms that combine chaotic mapping, self-adaptive mutation and normal gauss mutation operators.

Based on chaotic system theory, this paper proposes an embedded chaotic particle swarm optimization (ECPSO) that uses chaotic mapping to optimize the parameter adaptation of the wavelet v-support vector machine (Wv-SVM); its superiority over traditional PSO is verified by numerical simulation. The proposed method consists of two PSOs with self-adaptive and normal gauss mutation operators (ANPSO), residing in a father process and a child process, respectively. The local optimal particle is obtained from the child process, while the global optimal particle is obtained from the father process.


The child chaotic colony in the child process consists of sequences generated by a chaotic mapping. The local optimal particle obtained from the child process substitutes the original particle in the father process wherever a random-based choice has to be made.

The rest of this paper is organized as follows. Section 2 introduces the standard PSO. The proposed scheme is presented in Section 3. Section 4 describes the wavelet v-support vector machine. Section 5 gives the experimental simulation and results. Conclusions are drawn at the end.

2. The standard particle swarm optimization

Similar to other evolutionary computation techniques, PSO uses a set of particles representing potential solutions of the problem under consideration. The swarm consists of n particles. Each particle has a position X_i = {x_i1, x_i2, ..., x_id, ..., x_im} and a velocity V_i = {v_i1, v_i2, ..., v_id, ..., v_im}, where i = 1, 2, ..., n and d = 1, 2, ..., m, and moves through an m-dimensional search space. According to the global variant of PSO, each particle moves towards its best previous position and towards the best particle g of the swarm. Let us denote the best previously visited position of the ith particle (the one giving its best fitness value) by P_i = {p_i1, p_i2, ..., p_id, ..., p_im}, and the best previously visited position of the swarm by p_g = {p_g1, p_g2, ..., p_gd, ..., p_gm}. The change of position of each particle from one iteration to the next is computed from the distance between the current position and the particle's previous best position, and the distance between the current position and the best position of the swarm. The velocity and position of each particle are then updated by the following equations:

$$v_{id}^{k+1} = w v_{id}^{k} + c_1 r_1 \left(p_{id} - x_{id}^{k}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{k}\right) \qquad (1)$$

$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1} \qquad (2)$$

where w is called the inertia weight and is employed to control the impact of the previous history of velocities on the current one. The parameter w thus regulates the trade-off between the global and local exploration abilities of the swarm: a large inertia weight facilitates global exploration, while a small one tends to facilitate local exploration. A suitable value of w usually balances global and local exploration abilities and consequently reduces the number of iterations required to locate the optimum solution. Here k denotes the iteration number, c_1 is the cognition learning factor, c_2 is the social learning factor, and r_1 and r_2 are random numbers uniformly distributed in [0, 1]. Thus, the particle flies through potential solutions towards P_i and p_g in a navigated way, while still exploring new areas by the stochastic mechanism so as to escape from local optima. Since there is no actual mechanism for controlling the velocity of a particle, it is necessary to impose a maximum value V_max on it: if the velocity exceeds this threshold, it is set equal to V_max, which controls the maximum travel distance at each iteration so that a particle does not fly past good solutions. The PSO is terminated after a maximal number of generations, or when the best particle position of the entire swarm cannot be improved further after a sufficiently large number of generations. PSO has shown its robustness and efficacy in solving function optimization problems in real-number spaces.
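For concreteness, here is a minimal vectorized sketch of one iteration of Eqs. (1) and (2); the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def pso_step(x, v, p_best, g_best, w=0.9, c1=2.0, c2=2.0, v_max=1.0):
    """One iteration of the standard PSO update, Eqs. (1)-(2).
    x, v: (n, m) arrays of particle positions and velocities;
    p_best: (n, m) personal best positions; g_best: (m,) swarm best."""
    n, m = x.shape
    r1, r2 = np.random.rand(n, m), np.random.rand(n, m)
    v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (1)
    v = np.clip(v, -v_max, v_max)   # enforce the velocity limit V_max
    x = x + v                       # Eq. (2)
    return x, v
```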

3. The proposed particle swarm optimization

3.1. The PSO with self-adaptive and normal gauss mutation operators (ANPSO)

One of the major drawbacks of PSO is its premature convergence, especially when handling problems with many local optima. To address this disadvantage of the standard PSO, a self-adaptive mutation operator (Krusienski, 2006; Yamaguchi, 2007) is proposed to regulate the inertia weight of the velocity by means of the fitness value of the objective function and the iteration variable; a normal gauss mutation operator (Hansen, 1996) is used at the same time to correct the direction of the particle velocity. The adaptive mutation is a highly efficient operator under real-number coding, and the solution quality is tightly related to the mutation operator. The premature-convergence problem is addressed by incorporating adaptive mutation and normal mutation into the previous velocity of the particle. Thus, the PSO with self-adaptive and normal gauss mutation operators (ANPSO) updates the velocity and position of each particle by the following equations:

$$v_{id}^{k+1} = (1 - \lambda)\, w_{id}^{k} v_{id}^{k} + \lambda N\!\left(0, \sigma_i^{k}\right) + c_1 r_1 \left(p_{id} - x_{id}^{k}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{k}\right) \qquad (3)$$

$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1} \qquad (4)$$

$$w_{id}^{k} = \beta \left(1 - f\!\left(x_i^{k}\right) / f\!\left(x_m^{k}\right)\right) + (1 - \beta)\, w_{id}^{0} \exp\!\left(-\alpha k^{2}\right) \qquad (5)$$

$$\sigma_i^{k+1} = \sigma_i^{k} \exp\!\left(N_i(0, Mr)\right) \qquad (6)$$

where Mr is the standard error of the normal gauss distribution, β is the self-adaptive coefficient, λ is the increment coefficient, and α is the coefficient controlling the particle velocity attenuation. The first term of Eq. (3) is the self-adaptive mutation of the velocity inertia weight, based on the iteration variable and the fitness function value: by Eq. (5), particles with larger fitness mutate within a smaller scope, while those with smaller fitness mutate within a larger scope. The second term of Eq. (3) represents the normal gauss mutation based on the iteration variable: particles mutate over a large scope at small iteration numbers, searching for local optimal values, and over a small scope at large iteration numbers, searching a small space and gradually reaching the global optimal value. The correction of the particle velocity by the normal gauss mutation operator is expressed in Eqs. (3) and (6). In this mutation strategy, the proposed velocity vector $v^{k+1} = (v_1^{k+1}, v_2^{k+1}, \ldots, v_m^{k+1})$ is built from the last generation's velocity vector $v^{k} = (v_1^{k}, v_2^{k}, \ldots, v_m^{k})$ and the gauss perturbation vector $\sigma^{k} = (\sigma_1^{k}, \sigma_2^{k}, \ldots, \sigma_m^{k})$; the perturbation vector mutates itself by Eq. (6) at each iteration and acts as a controlling vector of the velocity vector. Together, the self-adaptive and normal mutation operators restore the diversity loss of the population and improve the global search capacity of the algorithm.
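A sketch of one ANPSO iteration implementing Eqs. (3)–(6) might look as follows. It assumes that f(x_m^k) in Eq. (5) denotes the largest (worst) fitness in the current swarm, which the paper does not state explicitly; all names are illustrative.

```python
import numpy as np

def anpso_step(x, v, p_best, g_best, fit, k, w0, sigma,
               beta=0.8, lam=0.1, alpha=2.0, Mr=0.5, c1=2.0, c2=2.0):
    """One ANPSO iteration, Eqs. (3)-(6). fit: (n,) fitness values;
    k: iteration counter; w0: (n, m) initial inertia weights;
    sigma: (n,) gauss perturbation magnitudes."""
    n, m = x.shape
    f_max = fit.max()  # assumed reading of f(x_m^k): worst fitness in swarm
    # Eq. (5): self-adaptive inertia weight, per particle
    w = beta * (1.0 - fit / f_max)[:, None] \
        + (1.0 - beta) * w0 * np.exp(-alpha * k**2)
    # Eq. (6): mutate the gauss perturbation magnitude
    sigma = sigma * np.exp(np.random.normal(0.0, Mr, size=n))
    r1, r2 = np.random.rand(n, m), np.random.rand(n, m)
    gauss = np.random.normal(0.0, sigma[:, None], size=(n, m))  # N(0, sigma_i^k)
    # Eq. (3): velocity update with both mutation operators
    v = (1 - lam) * w * v + lam * gauss \
        + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    x = x + v                                                   # Eq. (4)
    return x, v, sigma
```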

3.2. The embedded chaotic PSO (ECPSO)

Generating random sequences with a long period and good uniformity is very important for simulating complex phenomena, sampling, numerical analysis, decision making and especially heuristic optimization; the quality of such sequences determines the storage and computation time needed to achieve a desired accuracy, and a sequence that is "random" enough for one application may not be random enough for another. Chaos is a deterministic, random-like process found in nonlinear dynamical systems; it is non-periodic, non-converging and bounded, with a very sensitive dependence on its initial condition and parameters. The nature of chaos is apparently random and unpredictable, yet it also possesses an element of regularity. Mathematically, chaos is the randomness of a simple deterministic dynamical system, and chaotic systems may be considered sources of randomness. The simplest chaotic mapping, brought to the attention of scientists by May (1976) for its appearance in the nonlinear dynamics of biological populations with chaotic behavior, is the logistic mapping:

$$X_{n+1} = \mu X_n (1 - X_n) \qquad (7)$$


In Eq. (7), X_n is the nth chaotic number, where n denotes the iteration number. Obviously, X_n ∈ (0, 1) provided that the initial X_0 ∈ (0, 1) and X_0 ∉ {0.0, 0.25, 0.5, 0.75, 1.0}; μ = 4 has been used in the experiments. Chaotic sequences are easy and fast to generate and store: there is no need to store long sequences, since merely a few functions (the chaotic mappings) and a few parameters (the initial conditions) suffice even for very long sequences. In addition, an enormous number of different sequences can be generated simply by changing the initial condition. Moreover, these sequences are deterministic and reproducible.

On the basis of the aforementioned chaotic system theory, the embedded chaotic particle swarm optimization (ECPSO) is proposed in this paper. The proposed ECPSO consists of two ANPSOs residing in a father process and a child process, respectively. The local optimal particle is obtained from the child process, while the global optimal particle is obtained from the father process. The child chaotic colony of each particle in the child process consists of sequences generated by the chaotic mapping. The local optimal particle obtained from the child process substitutes the original particle in the father process wherever a random-based choice has to be made. In this way, the algorithm is intended to improve global convergence and to avoid sticking at a local solution; in traditional PSO, by contrast, the particles are merely random and cannot entirely ensure the ergodicity of the search in phase space.

3.3. The procedure of ECPSO

The proposed ECPSO consists of a father process and a child process, each running an ANPSO. The child process obtains the optimal particle of the child chaotic colony of the current particle by means of Algorithm 2, then sends this local optimal chaotic particle into the father process, where it substitutes the current particle. The child process is carried out n times (n being the swarm size), producing n optimal chaotic particles that replace the original random particles of the father process. The global optimal particle is then given by the father process by means of Algorithm 1. The ECPSO proceeds as follows:

Algorithm 1. [Embedded chaotic particle swarm optimization (father process)]

Step 1: Data preparation: the training, validation and test sets are denoted Tr, Va and Te, respectively.
Step 2: Particle initialization and ECPSO parameter setting: generate the initial particles. Set the ANPSO parameters, including the number of particles (n), particle dimension (m), maximal number of iterations (k_max), error limit of the fitness function (obj_lit), velocity limit (V_max), initial inertia weight (w0), normal gauss distribution (N(0, Mr)), perturbation momentum (σ_i^0), velocity attenuation control coefficient (α), self-adaptive coefficient (β) and increment coefficient (λ). Set the iteration variable k = 0 and perform the training process of steps 3–8.
Step 3: Run Algorithm 2 n times to produce n particles with chaotic characteristics and substitute them for the initial n particles; the new particle swarm consists of these n chaotic particles.
Step 4: Set k = k + 1.
Step 5: Compute the fitness value of each particle of the new swarm from step 3. Take each particle's current position as its individual extremum point, and the particle with the minimal fitness value as the global extremum point.

Step 6: Stopping check: if a stopping criterion (the predefined maximum number of iterations or the error accuracy of the fitness function) is met, go to step 9; otherwise, go to the next step.
Step 7: Apply the self-adaptive mutation operator of Eq. (5) and the normal gauss mutation operator of Eq. (6) to manipulate the particle velocities.
Step 8: Update the particle positions by Eqs. (3) and (4) to form the new swarm, and go to step 3.
Step 9: End the training procedure and output the optimal particle.
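To make the control flow concrete, the following compressed sketch is one reading of Algorithm 1, not the author's code. It reuses the anpso_step routine sketched in Section 3.1 and a chaotic_child_colony helper sketched after Algorithm 2 below; the parameter defaults mirror Section 5.

```python
import numpy as np

def ecpso(fitness_fn, n=100, m=6, k_max=100, obj_lit=0.005):
    """Compressed sketch of Algorithm 1 (father process)."""
    x = np.random.rand(n, m)                 # Step 2: initial particles
    v = np.zeros((n, m))
    sigma = np.full(n, 0.5)                  # perturbation momentum sigma_i^0
    w0 = np.full((n, m), 0.9)                # initial inertia weights
    p_best = x.copy()
    p_fit = np.array([fitness_fn(p) for p in x])
    for k in range(1, k_max + 1):            # Step 4
        # Step 3: replace every particle by its child-process optimum
        x = np.array([chaotic_child_colony(p, fitness_fn) for p in x])
        fit = np.array([fitness_fn(p) for p in x])   # Step 5
        improved = fit < p_fit
        p_best[improved], p_fit[improved] = x[improved], fit[improved]
        g_best = p_best[p_fit.argmin()]
        if p_fit.min() < obj_lit:            # Step 6: stopping check
            break
        # Steps 7-8: mutate velocities (Eqs. (5)-(6)), move (Eqs. (3)-(4))
        x, v, sigma = anpso_step(x, v, p_best, g_best, fit, k, w0, sigma)
    return p_best[p_fit.argmin()]            # Step 9: optimal particle
```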

Algorithm 2. [Embedded chaotic particle swarm optimization (child process)]

Step 1: Child chaotic particle initialization: normalize the initial velocity and position of the original particle from the father process into the scope [0, 1]. Use the normalized velocity and position as the X_0 of the chaotic mapping of Eq. (7), generate sequences by the logistic mapping of Eq. (7), and form the child chaotic colony of the current particle.
Step 2: Parameter setting: set the particle parameters of the child chaotic colony, including the number of particles (n), particle dimension (m), maximal number of iterations (sub_k_max), error limit of the fitness function (sub_obj_lit), velocity limit (sub_V_max), initial inertia weight (w0), normal gauss distribution (N(0, Mr)), perturbation momentum (σ_i^0), velocity attenuation control coefficient (α), self-adaptive coefficient (β) and increment coefficient (λ). Set the iteration variable sub_k = 0 and perform the training process of steps 3–8.
Step 3: Set sub_k = sub_k + 1.
Step 4: Compute the fitness value of each particle of the child chaotic colony. Take each particle's current position as its individual extremum point, and the particle with the minimal fitness value as the global extremum point.
Step 5: Stopping check: if a stopping criterion (the predefined maximum number of iterations or the error accuracy of the fitness function) is met, go to step 8; otherwise, go to the next step.
Step 6: Apply the self-adaptive mutation operator of Eq. (5) and the normal gauss mutation operator of Eq. (6) to manipulate the particle velocities.
Step 7: Update the particle positions by Eqs. (3) and (4) to form the new colony, and go to step 3.
Step 8: End the training procedure, restore the optimal particle to the original data scope, and output the restored optimal particle.
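Step 1 of Algorithm 2 is the distinctive part. The sketch below builds a child chaotic colony from one father-process particle and returns its fittest member; the normalization bounds and colony size are assumptions, and the inner ANPSO refinement (steps 2–8) is elided for brevity.

```python
import numpy as np

def chaotic_child_colony(particle, fitness, colony_size=20, mu=4.0):
    """Sketch of Algorithm 2, Step 1: derive a child chaotic colony from one
    particle via the logistic map, Eq. (7), and return the fittest member.
    `particle` is a position vector; `fitness` maps a position to a scalar."""
    lo, hi = particle.min(), particle.max() + 1e-12   # assumed bounds
    x = (particle - lo) / (hi - lo)           # normalize into [0, 1]
    x = np.clip(x, 1e-3, 1.0 - 1e-3)          # avoid the fixed points of Eq. (7)
    colony = []
    for _ in range(colony_size):
        x = mu * x * (1.0 - x)                # logistic mapping, Eq. (7)
        colony.append(lo + (hi - lo) * x)     # restore to the original scope
    # The full algorithm refines this colony by ANPSO for sub_k_max steps;
    # here we simply return the fittest chaotic member.
    return min(colony, key=fitness)
```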

4. Wavelet v-support vector machine (Wv-SVM)

The wavelet support vector machine (WSVM) (Vapnik, 1995; Widodo & Yang, 2008; Zhang, Zhou, & Jiao, 2004) is one of the most important achievements of statistical learning theory. The key idea of WSVM is that it first maps the samples into a high-dimensional feature space by a wavelet kernel function, then finds the support vectors and their corresponding coefficients by solving a quadratic programming problem based on the structural risk minimization rule and the dual principle, and finally uses these support vectors and coefficients to construct the classifier (Shih & Liu, 2006). The classifier constructed in this way not only separates the samples but also maximizes the margin between the two classes.


4.1. The conditions of wavelet support vector's kernel function

Let us consider a set of data points (x_1, y_1), (x_2, y_2), ..., (x_l, y_l), which are independently and randomly generated from an unknown function. Specifically, x_i is a column vector of attributes, y_i is a scalar representing the dependent variable, and l denotes the number of data points in the training set. A support vector kernel function can take not only the dot-product form, K(x, x') = K(x · x'), but also the horizontal floating (translation-invariant) form, K(x, x') = K(x − x'). In fact, any function that satisfies Mercer's condition is an allowable support vector kernel function.

Theorem 1. The symmetric function K(x, x') is a kernel function of SVM if and only if, for every function g ≠ 0 satisfying $\int_{R^d} g^2(\xi)\, d\xi < \infty$, the following condition holds:

$$\iint K(x, x')\, g(x)\, g(x')\, dx\, dx' \ge 0 \qquad (8)$$

This theorem provides a simple method to build kernel functions. A horizontal floating function can hardly be divided into two identical factors, so for this class the condition is stated separately.

Theorem 2. The horizontal floating function K(x − x') is an allowable support vector kernel function if and only if the Fourier transform of K(x) satisfies

$$F[K](\omega) = (2\pi)^{-d/2} \int_{R^d} \exp(-j(\omega \cdot x))\, K(x)\, dx \ge 0 \qquad (9)$$

4.2. Wavelet kernel function

If the wavelet function w(x) satisfies the conditions w(x) ∈ L²(R) ∩ L¹(R) and $\hat{w}(0) = 0$, where $\hat{w}(\omega)$ is the Fourier transform of w(x), the wavelet function group can be defined as

$$w_{a,m}(x) = |a|^{-1/2}\, w\!\left(\frac{x - m}{a}\right) \qquad (10)$$

where a is the so-called scaling parameter, m is the horizontal floating coefficient, and w(x) is called the "mother wavelet". The translation parameter m ∈ R and the dilation a > 0 may be continuous or discrete. For a function f(x) ∈ L²(R), the wavelet transform of f(x) can be defined as

$$W(a, m) = |a|^{-1/2} \int_{-\infty}^{+\infty} f(x)\, w^{*}\!\left(\frac{x - m}{a}\right) dx \qquad (11)$$

where w*(x) stands for the complex conjugate of w(x). The wavelet transform W(a, m) can be considered as a function of the translation m at each scale a. Eq. (11) indicates that wavelet analysis is a time-frequency, or time-scale, analysis. Unlike the short-time Fourier transform (STFT), the wavelet transform performs multi-scale analysis of a signal through dilation and translation, so it can extract time-frequency features of a signal effectively. The wavelet transform is also reversible, which makes it possible to reconstruct the original signal; a classical inversion formula for f(x) is

$$f(x) = C_w^{-1} \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} W(a, m)\, w_{a,m}(x)\, \frac{da\, dm}{a^{2}} \qquad (12)$$

where

$$C_w = \int \frac{|\hat{w}(\omega)|^{2}}{|\omega|}\, d\omega < \infty \qquad (13)$$

$$\hat{w}(\omega) = \int w(x) \exp(-j\omega x)\, dx \qquad (14)$$

For the above Eq. (13), C_w is a constant with respect to w(x). The idea of wavelet decomposition is to approximate the function f(x) by a linear combination of the wavelet function group. If w(x) is a one-dimensional wavelet function, then by tensor theory the multidimensional wavelet function can be defined as

$$w_d(x) = \prod_{i=1}^{d} w(x_i) \qquad (15)$$

and we can build the horizontal floating kernel function

$$K(x, x') = \prod_{i=1}^{d} w\!\left(\frac{x_i - x'_i}{a_i}\right) \qquad (16)$$

where a_i > 0 is the scaling parameter of the wavelet. Because a wavelet kernel function must satisfy the conditions of Theorem 2, few wavelet kernels can be expressed in terms of existing functions. We now give an existent wavelet kernel function, the Morlet wavelet kernel, and prove that it satisfies the condition of an allowable support vector kernel. The Morlet wavelet function is defined as

$$w(x) = \cos(\omega_0 x)\, e^{-x^{2}/2} \qquad (17)$$

Theorem 3. The Morlet wavelet kernel function

$$K(x, x') = \prod_{i=1}^{d} \cos\!\left(\omega_0 \frac{x_i - x'_i}{a}\right) \exp\!\left(-\frac{\|x_i - x'_i\|^{2}}{2a^{2}}\right) \qquad (18)$$

is an allowable support vector kernel function.

Proof. According to Theorem 2, we only need to prove that

$$F[K](\omega) = (2\pi)^{-d/2} \int_{R^d} \exp(-j(\omega \cdot x))\, K(x)\, dx \ge 0 \qquad (19)$$

where $K(x) = \prod_{i=1}^{d} w(x_i / a) = \prod_{i=1}^{d} \cos(\omega_0 x_i / a)\, e^{-\|x_i\|^{2} / 2a^{2}}$ and j denotes the imaginary unit. We have

$$\int_{R^d} \exp(-j\omega x)\, K(x)\, dx = \prod_{i=1}^{d} \frac{|a|\sqrt{2\pi}}{2} \left(\exp\!\left(-\frac{(\omega_0 - \omega_i a)^{2}}{2}\right) + \exp\!\left(-\frac{(\omega_0 + \omega_i a)^{2}}{2}\right)\right) \qquad (20)$$

Substituting (20) into (19), we obtain

$$F[K](\omega) = \prod_{i=1}^{d} \frac{|a|}{2} \left(\exp\!\left(-\frac{(\omega_0 - \omega_i a)^{2}}{2}\right) + \exp\!\left(-\frac{(\omega_0 + \omega_i a)^{2}}{2}\right)\right) \qquad (21)$$

and, since a ≠ 0,

$$F[K](\omega) \ge 0 \qquad (22)$$

For wavelet analysis and theory, see Krantz (1994), Liu and Di (1992) and Zhang et al. (2004). □

If we use the wavelet kernel function as the support vector kernel function, the estimation function of Wv-SVM is defined as

$$f(x) = \sum_{i=1}^{l} \left(\alpha_i - \alpha_i^{*}\right) \prod_{j=1}^{d} w\!\left(\frac{x^{j} - x_i^{j}}{a}\right) + b \qquad (23)$$

4.3. Wavelet v-support vector machine (Wv-SVM)

Combining the wavelet kernel function with the v-support vector machine, we can build a new SVM learning algorithm: the v-support vector machine with wavelet kernel function (Wv-SVM). For a set of data points (x_1, y_1), (x_2, y_2), ..., (x_l, y_l), the Wv-SVM can be formulated as

$$\min_{w,\, \xi^{(*)},\, \varepsilon,\, b}\ s\!\left(w, \xi^{(*)}, \varepsilon\right) = \frac{1}{2}\|w\|^{2} + C \left(\nu \varepsilon + \frac{1}{l} \sum_{i=1}^{l} \left(\xi_i + \xi_i^{*}\right)\right) \qquad (24)$$

subject to

$$\left(\left(w^{T} \cdot x_i\right) + b\right) - y_i \le \varepsilon + \xi_i \qquad (25)$$

$$y_i - \left(\left(w^{T} \cdot x_i\right) + b\right) \le \varepsilon + \xi_i^{*} \qquad (26)$$

$$\xi_i^{(*)} \ge 0,\quad \varepsilon \ge 0,\quad b \in R \qquad (27)$$

where w and x_i are column vectors with d dimensions, C > 0 is a penalty factor, $\xi_i^{(*)}$ (i = 1, ..., l) are slack variables, and ν ∈ (0, 1] is an adjustable regularization parameter. Problem (24) is a quadratic programming (QP) problem.
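Before turning to the dual problem, here is a minimal sketch of the Morlet wavelet kernel of Eq. (18); the centre frequency value ω_0 = 5 is a common choice for the Morlet wavelet, not a value fixed by the paper.

```python
import numpy as np

def morlet_kernel(x, x_prime, a=1.0, w0=5.0):
    """Morlet wavelet kernel of Eq. (18) between two d-dimensional vectors.
    a > 0 is the wavelet scaling parameter; w0 is the centre frequency."""
    u = (np.asarray(x, float) - np.asarray(x_prime, float)) / a
    return float(np.prod(np.cos(w0 * u) * np.exp(-u**2 / 2.0)))
```

By means of the Wolfe dual principle, the wavelet kernel function technique and the Karush–Kuhn–Tucker (KKT) conditions, we obtain the dual problem (28) of the original optimization problem (24).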

Fig. 1. The ECPSO optimizes the parameters of Wv-SVM.

Table 1
Influencing factors of product sale forecasting.

Product characteristics       Unit            Expression               Weight
Brand famous degree (BF)      Dimensionless   Linguistic information   0.9
Performance parameter (PP)    Dimensionless   Linguistic information   0.8
Form beauty (FB)              Dimensionless   Linguistic information   0.8
Sales experience (SE)         Dimensionless   Linguistic information   0.5
Oil price (OP)                Dimensionless   Linguistic information   0.8
Dweller deposit (DD)          Dimensionless   Linguistic information   0.4


$$\max_{\alpha, \alpha^{*}}\ W\!\left(\alpha, \alpha^{*}\right) = -\frac{1}{2} \sum_{i,j=1}^{l} \left(\alpha_i^{*} - \alpha_i\right)\left(\alpha_j^{*} - \alpha_j\right) K\!\left(x_i - x_j\right) + \sum_{i=1}^{l} \left(\alpha_i^{*} - \alpha_i\right) y_i \qquad (28)$$

$$\text{s.t.}\quad 0 \le \alpha_i,\, \alpha_i^{*} \le \frac{C}{l} \qquad (29)$$

$$\sum_{i=1}^{l} \left(\alpha_i + \alpha_i^{*}\right) \le C \cdot \nu \qquad (30)$$

Select the appropriate parameters C and ν, and select as the kernel of the Wv-SVM model a wavelet kernel function whose wavelet transform matches the original series well over some range of scales. The Wv-SVM output function is then

$$f(x) = \sum_{i=1}^{l} \left(\alpha_i - \alpha_i^{*}\right) \prod_{j=1}^{d} w\!\left(\frac{x^{j} - x_i^{j}}{a}\right) + b \qquad (31)$$

where w(x) is the wavelet transform (mother wavelet) function and a > 0 is the scaling parameter of the wavelet.

Fig. 2. Mexican hat wavelet transform of the sales time series over different scales.

Fig. 3. Morlet wavelet transform of the sales time series over different scales.

4.4. The intelligence forecasting system

The confirmation of the unknown parameters of the Wv-SVM is a complicated process; in fact, it is a multivariable optimization problem in a continuous space. An appropriate parameter combination can enhance the approximating degree of the proposed model, so it is necessary to select an intelligent algorithm to obtain the optimal parameters. The parameters of Wv-SVM have a great effect on its generalization performance: an appropriate parameter combination corresponds to a high generalization performance of the Wv-SVM. PSO is considered an excellent technique for solving such combinatorial optimization problems, and the proposed ECPSO is used here to determine the parameters of Wv-SVM. The intelligent system shown in Fig. 1, based on ECPSO and the Wv-SVM model, can evaluate the performance of ECPSO by forecasting time series. Different Wv-SVMs in different Hilbert spaces are adopted to forecast the product sale time series, and for each particular region only the most adequate Wv-SVM with the optimal parameters is used for the final forecasting. To evaluate the forecasting capacity of the intelligent system, the fitness function of ECPSO is designed as follows:

$$\text{fitness} = \frac{1}{l} \sum_{i=1}^{l} \left(\frac{y_i - \hat{y}_i}{y_i}\right)^{2} \qquad (32)$$

where l is the size of the selected sample, $\hat{y}_i$ denotes the forecast value of the selected sample, and y_i is the original datum of the selected sample.
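In code, Eq. (32) is a one-line relative squared error (a sketch, assuming the original data y_i are nonzero):

```python
import numpy as np

def fitness(y_true, y_pred):
    """Relative squared forecasting error of Eq. (32)."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    return float(np.mean(((y_true - y_pred) / y_true) ** 2))
```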

5. Experiment

Wavelet support vector machines have been applied to supply chain demand forecasting (Carbonneau, Laframbois, & Vahidov, 2008). To analyze the performance of the proposed ECPSO algorithm, the forecasting of a car sale series by means of the intelligent system based on ECPSO and Wv-SVM is studied. To assess the performance of ECPSO, the standard PSO and ANPSO are also adopted to optimize the parameters of Wv-SVM. To evaluate the forecasting capacity of the intelligent system, evaluation indexes such as the mean absolute error (MAE), mean absolute percentage error (MAPE) and mean square error (MSE) are applied to the forecasting results of ECPSOWv-SVM, ANPSOWv-SVM and PSOWv-SVM. The car is a consumption product influenced by macroeconomics in the manufacturing system, and its sales are usually driven by many uncertain factors. Factors with large influencing weights are gathered into a factor list, shown in Table 1; the six influencing factors are expressed as linguistic information. In our experiments, car sale series are selected from the past sale records of a typical company. The detailed characteristic data and sale series of these cars compose the corresponding training and testing sample sets.

Fig. 4. Gaussian wavelet transform of the sales time series over different scales.

Fig. 5. The car sales forecasting result by means of the ECPSOWv-SVM model.


During the car sale series forecasting, six influencing factors, viz., brand famous degree (BF), performance parameter (PP), form beauty (FB), sales experience (SE), oil price (OP) and dweller deposit (DD), are taken into account. All the linguistic information of the gathered influencing factors is processed by fuzzy comprehensive evaluation to form numerical information. The proposed ECPSO has been implemented in the Matlab 7.1 programming language, and the experiments are run on a 1.80 GHz Core(TM) 2 CPU personal computer (PC) with 1.0 GB of memory under Microsoft Windows XP Professional. The initial father-process parameters of ECPSO are given as follows: number of particles n = 100; particle dimension m = 6; inertia weight w0 = 0.9; positive acceleration constants c1 = c2 = 2; maximal iterative number k_max = 100; standard error of the normal gauss distribution Mr = 0.5; self-adaptive coefficient β = 0.8; increment coefficient λ = 0.1; fitness accuracy of the normalized samples equal to 0.005; velocity attenuation control coefficient α = 2. The initial child-process parameters of ECPSO are given as follows: inertia weight w0 = 0.9; positive acceleration constants c1 = c2 = 2; maximal iterative number sub_k_max = 100; standard error of the normal gauss distribution Mr = 0.5; self-adaptive coefficient β = 0.8; increment coefficient λ = 0.1; fitness accuracy of the normalized samples equal to 0.005; velocity attenuation control coefficient α = 2. The Morlet, Mexican hat and Gaussian wavelets are selected to analyze the sale series on different scales, as shown in Figs. 2–4. Among these, the Morlet wavelet transform fits the original sale series best over the scale range from 0 to 4. Therefore, the Morlet wavelet is chosen as the kernel function of the Wv-SVM model, and the three parameter ranges are determined as follows:

$$\nu \in [0, 1], \quad a \in (0, 4], \quad C \in \left[\frac{\max(x_{i,j}) - \min(x_{i,j})}{l} \times 10^{-3},\ \frac{\max(x_{i,j}) - \min(x_{i,j})}{l} \times 10^{3}\right].$$

The optimal combinational parameters obtained by the ECPSO algorithm are C = 638.3252, ν = 0.9759 and a = 0.0715. Fig. 5 illustrates the sale series forecasting result given by ECPSOWv-SVM. To analyze the parameter searching capacity of ECPSO, the ANPSO and the standard PSO are also used to optimize the parameters of Wv-SVM by training on the original sale series; the forecasting results of each model for the latest 12 months are shown in Table 2. The comparison among ECPSO, ANPSO and PSO in optimizing the parameters of the same model (Wv-SVM) is shown in Table 3, which gives the error-index distribution of the three models.

Table 3
Error statistics of the three forecasting models.

Model           MAE      MAPE     MSE
PSOWv-SVM       227.58   0.1517   286480
ANPSOWv-SVM     198.83   0.1119   299740
ECPSOWv-SVM     42.42    0.0561   2520

The MAE, MAPE and MSE of ECPSOWv-SVM are better than those of PSOWv-SVM and ANPSOWv-SVM. It is obvious that the self-adaptive and normal mutation operators improve the global search ability of the particle swarm optimization algorithm, and that the chaotic mapping enlarges the diversity of the particle colony and the ergodicity of the search space. The ECPSO is thus well suited to the parameter selection of Wv-SVM: the experimental results show that the forecasting precision obtained with ECPSO is higher than with ANPSO and PSO under the same conditions.
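For reference, the three error indexes of Table 3 can be computed as follows (a sketch; the paper does not spell out its exact formulas):

```python
import numpy as np

def error_indexes(y_true, y_pred):
    """MAE, MAPE and MSE as used in Table 3."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    e = y_true - y_pred
    return {"MAE": float(np.mean(np.abs(e))),
            "MAPE": float(np.mean(np.abs(e) / np.abs(y_true))),
            "MSE": float(np.mean(e ** 2))}
```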

6. Conclusion

In this paper, a new version of PSO, named ECPSO, is proposed to optimize the parameters of the wavelet v-support vector machine. The performance of ECPSOWv-SVM is evaluated by forecasting car sales data, and the simulation results demonstrate that ECPSO is effective in dealing with high dimensionality, nonlinearity and finite samples. Moreover, ECPSO is shown to be suitable for seeking the optimal parameters of Wv-SVM. ECPSO introduces chaotic mappings with ergodicity, irregularity and the stochastic property into ANPSO to improve global convergence by escaping from local solutions; the use of chaotic sequences in ECPSO helps it escape from local minima more easily than the traditional PSO and ANPSO. In our experiments, fixed values were adopted for the self-adaptive coefficients (β, λ), the control parameter Mr of the normal mutation and the velocity attenuation control parameter α. How to choose appropriate values for these coefficients is not addressed in this paper, and the study of the velocity behavior under different settings of these parameters is a meaningful problem for future research.

Acknowledgements

This research was partly supported by the National Natural Science Foundation of China under Grant 60904043, a research grant funded by the Hong Kong Polytechnic University, the China Postdoctoral Science Foundation (20090451152), the Jiangsu Planned Projects for Postdoctoral Research Funds (0901023C) and the Southeast University Planned Projects for Postdoctoral Research Funds.

Table 2
Comparison of the forecasting results from the three models over the latest 12 months.

Month   Real value   PSOWv-SVM   ANPSOWv-SVM   ECPSOWv-SVM
1       1892         1912        1853          1838
2       868          998         937           922
3       1704         1750        1689          1675
4       836          938         876           862
5       1352         1433        1332          1358
6       972          1057        996           981
7       1382         1439        1336          1422
8       447          619         558           543
9       1470         1477        1416          1422
10      567          692         630           616
11      1267         1357        1296          1281
12      573          733         671           657

References

Carbonneau, R., Laframbois, K., & Vahidov, R. (2008). Application of machine learning techniques for supply chain demand forecasting. European Journal of Operational Research, 184, 1140–1154.
Chen, Z. Y., Zhang, X. F., & Bi, Q. S. (2008). Bifurcations and chaos of coupled electrical circuits. Nonlinear Analysis: Real World Applications, 9, 1158–1168.
Chuanwena, J., & Bompard, E. (2005). A hybrid method of chaotic particle swarm optimization and linear interior for reactive power optimization. Mathematics and Computers in Simulation, 68, 57–65.
Hansen, N. (1996). Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation. In Proceedings of the IEEE conference on evolutionary computation, Nagoya (pp. 312–317).
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In IEEE international conference on neural networks, Perth, Australia (pp. 1942–1948).
Krantz, S. G. (Ed.). (1994). Wavelet: Mathematics and application. Boca Raton, FL: CRC.


Krusienski, D. J. (2006). A modified particle swarm optimization algorithm for adaptive filtering. In IEEE international symposium on circuits and systems, Kos, Greece (pp. 137–140).
Liu, G. Z., & Di, S. L. (1992). Wavelet analysis and application. Xi'an, China: Xidian University Press.
May, R. (1976). Simple mathematical models with very complicated dynamics. Nature, 261, 459–467.
Pareek, N. K., Patidar, V., & Sud, K. K. (2006). Image encryption using chaotic logistic map. Image and Vision Computing, 24, 926–934.
Shih, P., & Liu, C. J. (2006). Face detection using discriminating feature analysis and support vector machine. Pattern Recognition, 39, 260–276.
Vapnik, V. (1995). The nature of statistical learning. New York: Springer.

Widodo, A., & Yang, B. S. (2008). Wavelet support vector machine for induction machine fault diagnosis based on transient current signal. Expert Systems with Applications, 35, 307–316.
Yamaguchi, T. (2007). Adaptive particle swarm optimization – self-coordinating mechanism with updating information. In IEEE international conference on systems, man and cybernetics, Taipei, Taiwan (Vol. 3, pp. 2303–2308).
Yang, X. H., Yang, Z. F., Yin, X. A., & Li, J. Q. (2008). Chaos gray-coded genetic algorithm and its application for pollution source identifications in convection–diffusion equation. Communications in Nonlinear Science and Numerical Simulation, 13, 1676–1688.
Zhang, L., Zhou, W. D., & Jiao, L. C. (2004). Wavelet support vector machine. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, 34, 34–39.