
Regression application based on fuzzy ν-support vector machine in symmetric triangular fuzzy space

Qi Wu
School of Mechanical Engineering, Southeast University, Nanjing, Jiangsu 210096, China
Key Laboratory of Measurement and Control of CSE (School of Automation, Southeast University), Ministry of Education, Nanjing, Jiangsu 210096, China

Expert Systems with Applications 37 (2010) 2808–2814


Keywords: Fuzzy ν-support vector machine; Triangular fuzzy number; Particle swarm optimization; Sale forecasts

Abstract: This paper presents a new version of the fuzzy support vector machine to forecast multi-dimensional time series. Since many forecasting problems involve finite samples and uncertain data, the input variables are first described as real numbers by fuzzy comprehensive evaluation. To represent the fuzzy degree of these input variables, the symmetric triangular fuzzy technique is adopted. Then, by combining fuzzy theory with the ν-support vector machine, the fuzzy ν-support vector machine (Fν-SVM) on the triangular fuzzy space is proposed. Particle swarm optimization is employed to seek the optimal parameters of the Fν-SVM. The results of an application to sale forecasts confirm the feasibility and validity of the Fν-SVM model. Compared with the traditional model, the Fν-SVM method requires fewer samples and has better forecasting precision.

1. Introduction

The sale forecast (estimation) is a complex dynamic process, and the duration of this process is affected by many factors, most of which have random, fuzzy, and uncertain characteristics. There is a nonlinear mapping relationship between these factors and the sale series, and it is difficult to describe this relationship with definite mathematical models. Traditionally, approximate sale quantities have been determined by qualitative analysis in companies. With the development of computing and estimation techniques, some quantitative estimation methods based on traditional regression analysis have been put forward. However, these modeling processes have weaknesses, such as prior assumptions about the properties of the model and simplification of parameters. The practicability and accuracy of such models are therefore limited, and there is a demand for more accurate and general methods. On the other hand, intelligent techniques, such as the neural network (Chang, Wang, & Liu, 2007; Khashei, Hejazi, & Bijari, 2008; Kuo & Xue, 1999; Yu & Zhang, 2005), fuzzy logic (Arciniegas & Arciniegas Rueda, 2008), and the expert system (Popescu et al., in press), have developed quickly and found a great many successful applications. Unlike the statistical models, the fuzzy neural network (FNN) is a data-driven and nonparametric weak model; the FNN thus performs well in time series forecasting (estimation) when the sample data are sufficient (Chang et al., 2007; Khashei et al., 2008; Kuo & Xue, 1999; Yu & Zhang, 2005).


Nevertheless, the available pre-existing sale data in companies are often limited, and under this condition the approximation ability and generalization performance of the FNN are poor. To overcome this disadvantage, a new approach should be explored.

Recently, a novel machine-learning technique called the support vector machine (SVM) has drawn much attention in the fields of pattern classification and regression estimation. The SVM was first introduced by Vapnik and his colleagues in 1995 (Vapnik, 1995). It is an approximate implementation of the structural risk minimization (SRM) principle of statistical-learning theory, rather than of the empirical risk minimization (ERM) method. The SRM principle is based on the fact that the generalization error is bounded by the sum of the empirical error and a confidence-interval term depending on the Vapnik–Chervonenkis (VC) dimension (Vapnik, 1995); by minimizing this bound, good generalization performance can be achieved. Compared with traditional neural networks, the SVM obtains a unique, globally optimal solution and avoids the curse of dimensionality. These attractive properties make the SVM a promising technique. The SVM was initially designed to solve pattern recognition problems (Chen & Hsieh, 2006; Chen, Li, Harrison, & Zhang, 2008; Chiu & Chen, 2009; Chen et al., 2008b; Jayadeva, Khemchandani, & Chandra, 2004; Juang et al., in press; Li et al., 2008; Min & Cheng, 2009; Shieh & Yang, 2008; Tang & Qu, 2008a; Tsujinishi & Abe, 2003; Yang, Jin, & Chuang, 2006). Recently, with the introduction of Vapnik's ε-insensitive loss function, the SVM has been extended to function approximation and regression estimation problems (Bao et al., 2005; Dong, Yang, & Wu, 2007; Wu, 2009; Wu, Yan, & Yang, 2008a, 2008b).
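For reference, one standard form of this VC bound for classification (quoted from the statistical-learning literature, with h the VC dimension, l the sample size, and 1 − η the confidence; it is not an equation from this paper) is

R(f) \le R_{\mathrm{emp}}(f) + \sqrt{\frac{h\left(\ln(2l/h) + 1\right) - \ln(\eta/4)}{l}}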


In many real applications, the observed input data cannot be measured precisely and are usually described in linguistic levels or ambiguous metrics. The traditional support vector regression (SVR) method, however, cannot cope with such qualitative information. It is well known that fuzzy logic is a powerful tool for dealing with fuzzy and uncertain data, and some scholars have explored the fuzzy support vector machine (FSVM). Errors inevitably exist in the sample data of a multi-dimensional SVM because of uncertain factors during the gathering of the sample data. Some standard SVM studies do not consider this potential error, so the final forecasting results may not approach the real values closely. This paper therefore places an error interval around each original sample point and takes these errors into account in the optimization problem of the SVM, yielding a novel fuzzy SVM. To represent the errors, symmetric triangular fuzzy numbers are adopted. Moreover, the published literature has paid little attention to FSVMs in the symmetric triangular fuzzy number space, so this paper focuses on the establishment of a novel fuzzy support vector machine in the triangular fuzzy number space.

In the SVM approach, the parameter ε controls the sparseness of the solution in an indirect way, but it is difficult to come up with a reasonable value of ε without prior information about the accuracy of the output values. Schölkopf, Smola, Williamson, and Bartlett (2000) modified the original ε-SVM and introduced the ν-SVM, where a new parameter ν controls the number of support vectors and the number of points that lie outside the ε-insensitive tube; the value of ε in the ν-SVM is then traded off against model complexity and slack variables via the constant ν. In this paper, we put forward a new FSVM, called the Fν-SVM. Based on the Fν-SVM, a forecasting method for multi-dimensional time series is proposed.

The rest of this paper is organized as follows. The Fν-SVM is described in Section 2. In Section 3, PSO is introduced to seek the parameters of the Fν-SVM. In Section 4, a forecasting method based on the Fν-SVM is proposed. Section 5 gives an application to multi-dimensional sale series forecasting, where the Fν-SVM is also compared with ARMA and ν-SVM. Section 6 draws some conclusions.

2. Fuzzy ν-support vector machine (Fν-SVM)

2.1. Triangular fuzzy theory

Suppose M ∈ T(R) is a triangular fuzzy number (TFN) in the triangular fuzzy space, whose membership function is represented as

\mu_M(x) = \begin{cases} \dfrac{x - a_M}{r_M - a_M}, & a_M \le x < r_M \\ 1, & x = r_M \\ \dfrac{x - b_M}{r_M - b_M}, & r_M \le x < b_M \end{cases}   (1)

where a_M ≤ r_M < b_M; a_M, r_M, b_M ∈ R; and x ∈ R. We then have the formulation M = (a_M, r_M, b_M), in which r_M is the center, a_M is the left boundary, and b_M is the right boundary. Because the standard triangular fuzzy number is difficult to use for the input variables of an SVM, the following extended version is considered.

Definition 1 (Extended triangular fuzzy number, ETFN). M = (r_M, \Delta r_M, \overline{\Delta r}_M) is an extended triangular fuzzy number, in which r_M ∈ R is the center, \Delta r_M = r_M - a_M is the left spread, and \overline{\Delta r}_M = b_M - r_M is the right spread.

Let A = (r_A, \Delta r_A, \overline{\Delta r}_A) and B = (r_B, \Delta r_B, \overline{\Delta r}_B) be two ETFNs, whose λ-cuts are shown in Fig. 1. In the space T(R) of all ETFNs, linear operations are defined by the extension principle:

A + B = (r_A + r_B, \max(\Delta r_A, \Delta r_B), \max(\overline{\Delta r}_A, \overline{\Delta r}_B)),
kA = (k r_A, \Delta r_A, \overline{\Delta r}_A) \ \text{if} \ k \ge 0, \qquad kA = (k r_A, \overline{\Delta r}_A, \Delta r_A) \ \text{if} \ k < 0,
A - B = (r_A - r_B, \max(\Delta r_A, \overline{\Delta r}_B), \max(\overline{\Delta r}_A, \Delta r_B)).

The λ-cut of A is denoted A_\lambda = [\underline{A}(\lambda), \overline{A}(\lambda)] for λ ∈ [0, 1], where \underline{A}(\lambda) and \overline{A}(\lambda) are the two boundaries of the cut, as shown in Fig. 1.

[Fig. 1. The λ-cuts of two triangular fuzzy numbers.]

The λ-cut of a fuzzy number is always a closed and bounded interval. Through the Hausdorff distance of real numbers, we can define a metric on T(R) as

D(A, B) = \sup_{\lambda} \max\{ |\underline{A}(\lambda) - \underline{B}(\lambda)|, \ |\overline{A}(\lambda) - \overline{B}(\lambda)| \}   (2)

where A_\lambda = [\underline{A}(\lambda), \overline{A}(\lambda)] and B_\lambda = [\underline{B}(\lambda), \overline{B}(\lambda)] are the λ-cuts of the two fuzzy numbers.

Theorem 1. In T(R), the Hausdorff metric can be obtained as follows:

D(A, B) = \max\{ |(r_A - \Delta r_A) - (r_B - \Delta r_B)|, \ |r_A - r_B|, \ |(r_A + \overline{\Delta r}_A) - (r_B + \overline{\Delta r}_B)| \}   (3)

Proof. The lower boundary of the λ-cut of A satisfies

\lambda = \frac{\underline{A}(\lambda) - (r_A - \Delta r_A)}{\Delta r_A}   (4)

Then \underline{A}(\lambda) = r_A + (\lambda - 1)\Delta r_A. In the same way, we obtain \overline{A}(\lambda) = r_A + (1 - \lambda)\overline{\Delta r}_A, \underline{B}(\lambda) = r_B + (\lambda - 1)\Delta r_B, and \overline{B}(\lambda) = r_B + (1 - \lambda)\overline{\Delta r}_B. According to the definition in Eq. (2),

D(A, B) = \sup_{\lambda} \max\{ |(r_A - r_B) + (\lambda - 1)(\Delta r_A - \Delta r_B)|, \ |(r_A - r_B) + (1 - \lambda)(\overline{\Delta r}_A - \overline{\Delta r}_B)| \}
        = \max\left\{ \sup_{\lambda} |(r_A - r_B) + (\lambda - 1)(\Delta r_A - \Delta r_B)|, \ \sup_{\lambda} |(r_A - r_B) + (1 - \lambda)(\overline{\Delta r}_A - \overline{\Delta r}_B)| \right\}   (5)

For the given triangular fuzzy numbers A and B, (r_A - r_B) + (\lambda - 1)(\Delta r_A - \Delta r_B) and (r_A - r_B) + (1 - \lambda)(\overline{\Delta r}_A - \overline{\Delta r}_B) are two linear functions of λ. As λ ∈ [0, 1], the following formulas must hold:

\sup_{\lambda} |(r_A - r_B) + (\lambda - 1)(\Delta r_A - \Delta r_B)| = \max\{ |(r_A - \Delta r_A) - (r_B - \Delta r_B)|, \ |r_A - r_B| \}   (6)

\sup_{\lambda} |(r_A - r_B) + (1 - \lambda)(\overline{\Delta r}_A - \overline{\Delta r}_B)| = \max\{ |(r_A + \overline{\Delta r}_A) - (r_B + \overline{\Delta r}_B)|, \ |r_A - r_B| \}   (7)

Substituting (6) and (7) into (5), we obtain Eq. (3). This completes the proof of Theorem 1. □

Deduction 1. If A and B are two symmetric triangular fuzzy numbers in T(R), written A = (r_A, \Delta r_A) and B = (r_B, \Delta r_B), then the Hausdorff metric of A and B can be written as

D(A, B) = \max\{ |(r_A - \Delta r_A) - (r_B - \Delta r_B)|, \ |(r_A + \Delta r_A) - (r_B + \Delta r_B)| \}   (8)


Proof. For symmetric triangular fuzzy numbers A and B, we have \overline{\Delta r}_A = \Delta r_A and \overline{\Delta r}_B = \Delta r_B. From Theorem 1, the following must hold:

D(A, B) = \max\{ |(r_A - \Delta r_A) - (r_B - \Delta r_B)|, \ |r_A - r_B|, \ |(r_A + \Delta r_A) - (r_B + \Delta r_B)| \}
        = \max\{ |(r_A - r_B) - (\Delta r_A - \Delta r_B)|, \ |r_A - r_B|, \ |(r_A - r_B) + (\Delta r_A - \Delta r_B)| \}
        = \max\{ |(r_A - r_B) - (\Delta r_A - \Delta r_B)|, \ |(r_A - r_B) + (\Delta r_A - \Delta r_B)| \}
        = \max\{ |(r_A - \Delta r_A) - (r_B - \Delta r_B)|, \ |(r_A + \Delta r_A) - (r_B + \Delta r_B)| \}   (9)

This completes the proof of Deduction 1. □
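As a concrete illustration (our own sketch, not code from the paper), the following minimal Python class implements symmetric ETFNs and the metric of Eq. (8); the name SymTFN and the max-based spread arithmetic mirror the extension-principle operations defined above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SymTFN:
    r: float    # center r_M
    dr: float   # common left/right spread of the symmetric ETFN

    def __add__(self, other):
        # extension-principle sum: centers add, spreads combine by max
        return SymTFN(self.r + other.r, max(self.dr, other.dr))

    def __sub__(self, other):
        # extension-principle difference: centers subtract, spreads combine by max
        return SymTFN(self.r - other.r, max(self.dr, other.dr))

    def scale(self, k: float):
        # scalar multiple: only the center is scaled (spreads are symmetric)
        return SymTFN(k * self.r, self.dr)

def hausdorff(a: SymTFN, b: SymTFN) -> float:
    """Eq. (8): max distance between the endpoints of the 0-cuts."""
    return max(abs((a.r - a.dr) - (b.r - b.dr)),
               abs((a.r + a.dr) - (b.r + b.dr)))

# D(A, B) for A = (1.0, 0.2), B = (1.5, 0.1): max(|0.8 - 1.4|, |1.2 - 1.6|) = 0.6
print(hausdorff(SymTFN(1.0, 0.2), SymTFN(1.5, 0.1)))
```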

2.2. Fν-SVM

Consider a set of training samples \{(\bar{x}_i, \bar{y}_i)\}_{i=1}^{l}, where \bar{x}_i ∈ R^d and \bar{y}_i ∈ R. Owing to the difficulty of gathering sample data, the values of the ith sample point are not accurate. To represent the fuzziness of the sample data \{(\bar{x}_i, \bar{y}_i)\}_{i=1}^{l}, symmetric ETFNs are used to score the fuzzy degree of each data point. Suppose a set of fuzzy training samples \{(x_i, y_i)\}_{i=1}^{l}, where x_i ∈ T(R)^d, y_i ∈ T(R), and T(R)^d is the set of d-dimensional vectors of TFNs. By means of the symmetric ETFN technique, each data point is described as x_i = (r_{x_i}, \Delta r_{x_i}) and y_i = (r_{y_i}, \Delta r_{y_i}), with \Delta r_{x_i} = \overline{\Delta r}_{x_i} and \Delta r_{y_i} = \overline{\Delta r}_{y_i}, where r_{x_i} = \bar{x}_i and r_{y_i} = \bar{y}_i. On this basis, a novel fuzzy ν-support vector machine (Fν-SVM) is proposed as follows; applications of the Fν-SVM and the ν-SVM to the same sample set are given in Section 5.

We consider the approximation function f(x) = w · φ(x) + b, b ∈ R, where w = (w_1, w_2, ..., w_n), w_i ∈ R, and w · φ(x) denotes the inner product of w and φ(x). In T(R), f(x) can be written as

f(x) = \left( w \cdot \varphi(r_x) + b, \ \rho(\Delta r_x) \right), \quad w, r_x, \Delta r_x \in R^n, \ b \in R   (10)

where ρ(Δr_x) = |w| · Δr_x. In light of this idea, the regression coefficients of the Fν-SVM in T(R) can be estimated from the following constrained optimization problem:

\min_{w, b, \varepsilon, \xi^{(*)}} \ \frac{1}{2}\|w\|^2 + C\left( \nu\varepsilon + \frac{1}{l}\sum_{k=1}^{2}\sum_{i=1}^{l} (\xi_{ki} + \xi^*_{ki}) \right)

\text{s.t.} \quad (y_i + \Delta r_{y_i}) - (w \cdot \varphi(r_{x_i}) + b + \rho(\Delta r_{x_i})) \le \varepsilon + \xi_{1i},
\qquad (w \cdot \varphi(r_{x_i}) + b + \rho(\Delta r_{x_i})) - (y_i + \Delta r_{y_i}) \le \varepsilon + \xi^*_{1i},
\qquad (y_i - \Delta r_{y_i}) - (w \cdot \varphi(r_{x_i}) + b - \rho(\Delta r_{x_i})) \le \varepsilon + \xi_{2i},
\qquad (w \cdot \varphi(r_{x_i}) + b - \rho(\Delta r_{x_i})) - (y_i - \Delta r_{y_i}) \le \varepsilon + \xi^*_{2i},
\qquad \xi_{ki}, \xi^*_{ki} \ge 0 \ (k = 1, 2), \qquad \varepsilon \ge 0   (11)

where C > 0 is a penalty factor, \xi^{(*)}_{ki} (k = 1, 2; i = 1, ..., l) are slack variables, and ν ∈ (0, 1] is an adjustable regularization parameter. Problem (11) is a quadratic-programming (QP) problem, whose ε-insensitive tube and structure are shown in Figs. 2 and 3.

[Fig. 2. The ε-insensitive tube of the Fν-SVM.]

[Fig. 3. The architecture of the Fν-SVM.]

By introducing Lagrangian multipliers, a Lagrangian function can be defined as follows:

L(w, b, \varepsilon, \xi^{(*)}, \alpha^{(*)}, \beta, \eta^{(*)}) = \frac{1}{2}\|w\|^2 + C\left( \nu\varepsilon + \frac{1}{l}\sum_{k=1}^{2}\sum_{i=1}^{l} (\xi_{ki} + \xi^*_{ki}) \right) - \sum_{k=1}^{2}\sum_{i=1}^{l} (\eta_{ki}\xi_{ki} + \eta^*_{ki}\xi^*_{ki}) - \beta\varepsilon
\quad + \sum_{i=1}^{l} \alpha_{1i}\left[ (y_i + \Delta r_{y_i}) - (w \cdot \varphi(r_{x_i}) + b + \rho(\Delta r_{x_i})) - \varepsilon - \xi_{1i} \right]
\quad + \sum_{i=1}^{l} \alpha^*_{1i}\left[ (w \cdot \varphi(r_{x_i}) + b + \rho(\Delta r_{x_i})) - (y_i + \Delta r_{y_i}) - \varepsilon - \xi^*_{1i} \right]
\quad + \sum_{i=1}^{l} \alpha_{2i}\left[ (y_i - \Delta r_{y_i}) - (w \cdot \varphi(r_{x_i}) + b - \rho(\Delta r_{x_i})) - \varepsilon - \xi_{2i} \right]
\quad + \sum_{i=1}^{l} \alpha^*_{2i}\left[ (w \cdot \varphi(r_{x_i}) + b - \rho(\Delta r_{x_i})) - (y_i - \Delta r_{y_i}) - \varepsilon - \xi^*_{2i} \right]   (12)

where \alpha^{(*)}_{ki}, \beta, \eta^{(*)}_{ki} \ge 0 (k = 1, 2; i = 1, ..., l) are Lagrangian multipliers. Differentiating the Lagrangian function (12) with respect to w, b, ε, and \xi^{(*)}_{ki}, we have

\frac{\partial L}{\partial w} = 0 \ \Rightarrow \ w = \sum_{k=1}^{2}\sum_{i=1}^{l} (\alpha_{ki} - \alpha^*_{ki})\,\varphi(r_{x_i})   (13)

\frac{\partial L}{\partial b} = 0 \ \Rightarrow \ \sum_{k=1}^{2}\sum_{i=1}^{l} (\alpha_{ki} - \alpha^*_{ki}) = 0   (14)

\frac{\partial L}{\partial \varepsilon} = 0 \ \Rightarrow \ \beta = C\nu - \sum_{k=1}^{2}\sum_{i=1}^{l} (\alpha_{ki} + \alpha^*_{ki})   (15)

\frac{\partial L}{\partial \xi^{(*)}_{ki}} = 0 \ \Rightarrow \ \eta^{(*)}_{ki} = C/l - \alpha^{(*)}_{ki}   (16)


By substituting (13)–(16) into (12), we obtain the corresponding dual form of problem (11) as follows:

\max_{\alpha, \alpha^*} \ -\frac{1}{2}\|w\|^2 + \sum_{i=1}^{l} (y_i + \Delta r_{y_i} - \rho(\Delta r_{x_i}))(\alpha_{1i} - \alpha^*_{1i}) + \sum_{i=1}^{l} (y_i - \Delta r_{y_i} + \rho(\Delta r_{x_i}))(\alpha_{2i} - \alpha^*_{2i})

\text{s.t.} \quad \sum_{k=1}^{2}\sum_{i=1}^{l} (\alpha_{ki} - \alpha^*_{ki}) = 0, \qquad \alpha^{(*)}_{ki} \in [0, C/l], \qquad \sum_{k=1}^{2}\sum_{i=1}^{l} (\alpha_{ki} + \alpha^*_{ki}) \le C\nu   (17)

where \|w\|^2 = \sum_{i,j=1}^{l} (\alpha_{1i} - \alpha^*_{1i} + \alpha_{2i} - \alpha^*_{2i})(\alpha_{1j} - \alpha^*_{1j} + \alpha_{2j} - \alpha^*_{2j})(r_{x_i} \cdot r_{x_j}).

For l training samples, the QP problem (17) consists of 4l variables, one linear equality constraint, and 8l + 1 bound constraints. The size of the QP problem (17) is therefore directly proportional to the number of training samples and independent of the input dimensionality of the Fν-SVM.

The Lagrangian multipliers \alpha^{(*)}_{ki} can be determined by solving the above QP problem. Based on the Karush–Kuhn–Tucker (KKT) conditions, we have

\alpha_{1i}\left( y_i + \Delta r_{y_i} - w \cdot \varphi(r_{x_i}) - b - \rho(\Delta r_{x_i}) - \varepsilon - \xi_{1i} \right) = 0
\alpha^*_{1i}\left( w \cdot \varphi(r_{x_i}) + b + \rho(\Delta r_{x_i}) - y_i - \Delta r_{y_i} - \varepsilon - \xi^*_{1i} \right) = 0
\alpha_{2i}\left( y_i - \Delta r_{y_i} - w \cdot \varphi(r_{x_i}) - b + \rho(\Delta r_{x_i}) - \varepsilon - \xi_{2i} \right) = 0
\alpha^*_{2i}\left( w \cdot \varphi(r_{x_i}) + b - \rho(\Delta r_{x_i}) - y_i + \Delta r_{y_i} - \varepsilon - \xi^*_{2i} \right) = 0
\left( C/l - \alpha^{(*)}_{ki} \right)\xi^{(*)}_{ki} = 0
\left( C\nu - \sum_{k=1}^{2}\sum_{i=1}^{l} (\alpha_{ki} + \alpha^*_{ki}) \right)\varepsilon = 0   (18)

Theorem 2. In T(R), the Lagrangian multipliers \alpha^{(*)}_{ki} satisfy the equality \alpha_{ki}\alpha^*_{ki} = 0, k = 1, 2.

Proof. Suppose there exist Lagrangian multipliers \alpha_{1i} and \alpha^*_{1i} that satisfy \alpha_{1i}\alpha^*_{1i} \ne 0; then \alpha_{1i} \ne 0 and \alpha^*_{1i} \ne 0 must hold. According to the first and second equations of (18), we have

r_{y_i} + \Delta r_{y_i} - w \cdot \varphi(r_{x_i}) - b - \rho(\Delta r_{x_i}) - \varepsilon - \xi_{1i} = 0
w \cdot \varphi(r_{x_i}) + b + \rho(\Delta r_{x_i}) - r_{y_i} - \Delta r_{y_i} - \varepsilon - \xi^*_{1i} = 0   (19)

Adding the two equations yields 2\varepsilon + \xi_{1i} + \xi^*_{1i} = 0, resulting in \varepsilon = \xi_{1i} = \xi^*_{1i} = 0, which conflicts with the conditions \varepsilon, \xi_{1i}, \xi^*_{1i} \ge 0. Therefore \alpha_{1i}\alpha^*_{1i} = 0 holds, and in the same way \alpha_{2i}\alpha^*_{2i} = 0 also holds. This completes the proof of Theorem 2. □

The parameter b can then be obtained from Eq. (20):

" 2 X l X 1 b¼ ðaki  aki ÞKðrxi ; rxm Þ r ym þ r yj  2 k¼1 i¼1 !# 2 X l X  ðaki  aki ÞKðr xi ; r xj Þ ; þ

ð20Þ

k¼1 i¼1

where akm ; akj 2 ð0; C=lÞ. Thus, the regression function (22) of Fv-SVM can be determined by

f ðxÞ ¼

! 2 X l X ðaki  aki Þðr xi  r x Þ þ b; qðDrx Þ k¼1 i¼1

3. Particle swarm optimization (PSO) It is difficult to seek the optimal unknown parameters of SVMs, and particle swarm optimization (PSO) Kenedy, 1995 is generally adopted (Lin, Ying, Chen, & Lee, 2008; Wu, 2009; Wu et al., 2008b). Fv-SVM involves three main parameters: the constant parameter of kernel function K, the control constant v, and the penalty factor C. Many literatures show that PSO has good optimization performance (Lin et al., 2008; Wu, 2009; Wu et al., 2008b). This paper focuses on the establishment of Fv-SVM, PSO is used only to optimize the parameters of Fv-SVM. Therefore, the standard PSO is adopted in our experiment. Similarly to evolutionary computation techniques, PSO uses a set of particles to represent potential solutions to the problem under consideration. The swarm consists of m particles. Each particle has a position X i ¼ fxi1 ; xi2 ; . . . ; xij ; . . . xim g, a velocity V i ¼ fv i1 ; v i2 ; . . . ; v ij ; . . . ; v im g, where i ¼ 1; 2; . . . ; n; j ¼ 1; 2; . . . ; m, and moves through a m-dimensional search space. According to the global variant of the PSO algorithm, each particle moves towards its best previous position and towards the best particle pg in the swarm. Let us denote the best previously visited position of the ith particle that gives the best fitness value as pi = {pi1, pi2, . . ., pij, . . ., pim}, and the best previously visited position of the swarm that gives best fitness as pg = {pg1, pg2, . . ., pgj, . . ., pgm}. The change of position of each particle from one iteration to another can be computed according to the distance between the current position and its previous best position and the distance between the current position and the best position of swarm. Then, the updating of velocity and particle position can be obtained by using the following equations:

v kþ1 ¼ wv kij þ c1 r 1 ij xijkþ1 ¼ xkij þ v kþ1 ij

k¼1 i¼1

ð21Þ
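The following is a minimal, self-contained sketch (not the author's code) of solving the dual QP (17) numerically. It uses SciPy's general-purpose SLSQP solver rather than a specialized QP routine, adopts the Gaussian kernel of Eq. (24) for the inner products, and treats the terms ρ(Δr_{x_i}) as fixed constants supplied by the caller; all function and variable names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_gram(X, sigma):
    """Gram matrix of the Gaussian kernel (Eq. (24)) over sample centers X (l x d)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def solve_fnu_svm_dual(rx, ry, dry, rho, C, nu, sigma):
    """Solve dual (17); variable layout z = [a1 | a1* | a2 | a2*], each block of length l.
    rx: (l, d) input centers; ry, dry, rho: (l,) output centers, spreads, rho(dr_x)."""
    l = len(ry)
    G = gaussian_gram(rx, sigma)
    # linear coefficients of (17): (y + dy - rho) on block a1, its negative on a1*, etc.
    c = np.concatenate([ry + dry - rho, -(ry + dry - rho),
                        ry - dry + rho, -(ry - dry + rho)])

    def s_of(z):  # s_i = a1_i - a1*_i + a2_i - a2*_i, the expansion coefficients of w
        return z[:l] - z[l:2*l] + z[2*l:3*l] - z[3*l:]

    def neg_dual(z):  # minimize the negative of the dual objective
        s = s_of(z)
        return 0.5 * s @ G @ s - c @ z

    cons = [
        {"type": "eq",   "fun": lambda z: np.sum(s_of(z))},   # sum (a - a*) = 0
        {"type": "ineq", "fun": lambda z: C * nu - z.sum()},  # sum (a + a*) <= C*nu
    ]
    res = minimize(neg_dual, np.zeros(4 * l), method="SLSQP",
                   bounds=[(0.0, C / l)] * (4 * l), constraints=cons)
    return res.x

def predict_center(rx, s, b, x_new, sigma):
    """Center of f(x) in Eq. (21), with the kernel standing in for the inner product."""
    k = np.exp(-((rx - x_new) ** 2).sum(axis=-1) / (2.0 * sigma ** 2))
    return float(s @ k + b)
```

For realistic sample sizes a dedicated QP solver (for instance an SMO-style decomposition) would be preferable to SLSQP, but the construction of the objective and constraints is the same.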


3. Particle swarm optimization (PSO)

It is difficult to seek the optimal unknown parameters of SVMs, and particle swarm optimization (PSO) (Kennedy & Eberhart, 1995) is generally adopted for this purpose (Lin, Ying, Chen, & Lee, 2008; Wu, 2009; Wu et al., 2008b). The Fν-SVM involves three main parameters: the constant of the kernel function K, the control constant ν, and the penalty factor C. Many studies show that PSO has good optimization performance (Lin et al., 2008; Wu, 2009; Wu et al., 2008b). Since this paper focuses on the establishment of the Fν-SVM, PSO is used only to optimize the parameters of the Fν-SVM, and the standard PSO is adopted in our experiments.

Similarly to other evolutionary computation techniques, PSO uses a set of particles to represent potential solutions of the problem under consideration. The swarm consists of n particles. Each particle has a position X_i = \{x_{i1}, x_{i2}, ..., x_{ij}, ..., x_{im}\} and a velocity V_i = \{v_{i1}, v_{i2}, ..., v_{ij}, ..., v_{im}\}, where i = 1, 2, ..., n and j = 1, 2, ..., m, and moves through an m-dimensional search space. According to the global variant of the PSO algorithm, each particle moves towards its best previous position and towards the best particle p_g in the swarm. Denote the best previously visited position of the ith particle (the one giving its best fitness value) by p_i = \{p_{i1}, p_{i2}, ..., p_{ij}, ..., p_{im}\}, and the best previously visited position of the whole swarm by p_g = \{p_{g1}, p_{g2}, ..., p_{gj}, ..., p_{gm}\}. The change of position of each particle from one iteration to the next is computed from the distance between the current position and its previous best position and the distance between the current position and the best position of the swarm. The velocity and position of each particle are then updated by

v_{ij}^{k+1} = w v_{ij}^{k} + c_1 r_1 (p_{ij} - x_{ij}^{k}) + c_2 r_2 (p_{gj} - x_{ij}^{k})   (22)

x_{ij}^{k+1} = x_{ij}^{k} + v_{ij}^{k+1}   (23)

where w is the inertia weight, employed to control the impact of the previous history of velocities on the current one; it regulates the trade-off between the global and local exploration abilities of the swarm. A large inertia weight facilitates global exploration, while a small one tends to facilitate local exploration; a suitable value usually balances the two and consequently reduces the number of iterations required to locate the optimum. Here k denotes the iteration number, c_1 is the cognition learning factor, c_2 is the social learning factor, and r_1 and r_2 are random numbers uniformly distributed in [0, 1]. The particle thus flies through potential solutions towards p_i and p_g in a navigated way, while the stochastic mechanism still lets it explore new areas and escape from local optima. Since there is no intrinsic mechanism for controlling the velocity of a particle, a maximum value V_max is imposed: if the velocity exceeds this threshold, it is set equal to V_max, which limits the maximum travel distance at each iteration so that a particle does not fly past good solutions. The PSO algorithm terminates after a maximal number of generations, or when the best particle position of the entire swarm cannot be improved further after a sufficiently large number of generations. PSO has shown robustness and efficacy in solving function-value optimization problems in real number spaces. In our experiment, the position of a particle is given as X = (C, ν, σ).
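A minimal sketch of this standard global-best PSO (our own illustrative code implementing the update rules of Eqs. (22) and (23)) is:

```python
import numpy as np

def pso(fitness, dim, n_particles=50, k_max=100, w=0.9, c1=2.0, c2=2.0,
        bounds=(0.0, 1.0), v_max=0.2, seed=0):
    """Global-best PSO; fitness maps a position vector to a scalar (lower is better)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))      # positions X_i
    v = np.zeros((n_particles, dim))                 # velocities V_i
    p = x.copy()                                     # personal bests p_i
    p_fit = np.array([fitness(xi) for xi in x])
    g = p[p_fit.argmin()].copy()                     # global best p_g
    for k in range(k_max):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)   # Eq. (22)
        v = np.clip(v, -v_max, v_max)                # velocity limit V_max
        x = np.clip(x + v, lo, hi)                   # Eq. (23), kept inside the box
        f = np.array([fitness(xi) for xi in x])
        better = f < p_fit                           # update personal bests
        p[better], p_fit[better] = x[better], f[better]
        g = p[p_fit.argmin()].copy()                 # update the global best
    return g, p_fit.min()
```

In the experiments below, the fitness would be the MSE of the Fν-SVM on the normalized samples, and the position vector would be (C, ν, σ).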

4. Regression estimation method based on Fν-SVM and PSO

The particle swarm optimization procedure is described in steps as follows:

Algorithm 1

(Step 1) Data preparation: the training, validation, and test sets are represented as Tr, Va, and Te, respectively.
(Step 2) Particle initialization and PSO parameter setting: generate the initial particles and set the PSO parameters, including the number of particles (n), the particle dimension (m), the maximal number of iterations (k_max), the error limit of the fitness function, the velocity limit (V_max), and the inertia weight for the particle velocity (w_0). Set the iteration counter k = 0 and perform the training process of Steps 3–7.
(Step 3) Set k = k + 1 and compute the fitness function value of each particle.
(Step 4) Take the current particle as the individual extremum point of each particle, and take the particle with the minimal fitness value as the global extremum point.
(Step 5) Stopping-condition check: if a stopping criterion (the predefined maximum number of iterations or the error accuracy of the fitness function) is met, go to Step 7; otherwise, go to the next step.
(Step 6) Update the particle positions by Eqs. (22) and (23) to form the new particle swarm, then go to Step 3.
(Step 7) End the training procedure and output the optimal particle.

On the basis of the Fν-SVM model, we can summarize a forecasting algorithm, shown in Fig. 4, as follows.

Algorithm 2

(Step 1) Initialize the original data by normalization and fuzzification, then form the training patterns.
(Step 2) Select the kernel function K, the control constant ν, and the penalty factor C. Construct the QP problem (11) of the Fν-SVM.
(Step 3) Solve the optimization problem and obtain the parameters \alpha^{(*)}_{ki}. Compute the regression coefficient b by Eq. (20).
(Step 4) For a new design task, extract the product characteristics and form the set of input variables x; then compute the forecasting result ŷ by Eq. (21).

[Fig. 4. The PSO optimizes the parameters of the Fν-SVM.]
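As a toy illustration of how Algorithms 1 and 2 fit together, the following sketch reuses the pso, solve_fnu_svm_dual, and predict_center functions defined above on synthetic data; the rescaling of the PSO coordinates and the crude mean-residual stand-in for Eq. (20) are our simplifications, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
rx = rng.random((30, 3))                  # normalized input centers
ry = np.sin(rx.sum(axis=1))               # toy output centers
dry = 0.1 * np.abs(ry)                    # toy output spreads
rho = np.zeros(30)                        # rho(dr_x) fixed at 0 for this toy run
tr, va = slice(0, 20), slice(20, 30)      # training / validation split

def fitness(p):
    # map PSO coordinates in [0, 1] to usable parameter ranges
    C, nu, sigma = 100.0 * p[0] + 0.1, 0.99 * p[1] + 0.01, p[2] + 0.1
    z = solve_fnu_svm_dual(rx[tr], ry[tr], dry[tr], rho[tr], C, nu, sigma)
    lt = 20
    s = z[:lt] - z[lt:2*lt] + z[2*lt:3*lt] - z[3*lt:]
    # crude stand-in for Eq. (20): bias chosen to zero the mean training residual
    b = float(np.mean(ry[tr] - [predict_center(rx[tr], s, 0.0, x, sigma)
                                for x in rx[tr]]))
    pred = [predict_center(rx[tr], s, b, x, sigma) for x in rx[va]]
    return float(np.mean((ry[va] - np.array(pred)) ** 2))  # validation MSE

best, best_mse = pso(fitness, dim=3, n_particles=6, k_max=5)
C, nu, sigma = 100.0 * best[0] + 0.1, 0.99 * best[1] + 0.01, best[2] + 0.1
print("selected (C, nu, sigma):", (C, nu, sigma), "validation MSE:", best_mse)
```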

Many practical applications suggest that radial basis functions tend to perform well under general smoothness assumptions, so they should be considered especially when no additional knowledge of the data is available. In this paper, the Gaussian radial basis function is used as the kernel function of the Fν-SVM:

K(r_{x_i}, r_{x_j}) = \exp\left( -\frac{\|r_{x_i} - r_{x_j}\|^2}{2\sigma^2} \right)   (24)

5. Application

To illustrate the forecasting method, a car sale series forecast with multiple factors is studied. Factors with large influencing weights were gathered to develop a factor list, as shown in Table 1. In our experiments, the car demand series are selected from the past demand records of a typical company, and the detailed factor data and sale series of these cars compose the corresponding training and testing sample sets. During the forecasting of the car sale series, six influencing factors are taken into account: brand famous degree (BF), performance parameter (PP), form beauty (FB), sales experience (SE), oil price (OP), and dweller deposit (DD). All linguistic information about the influencing factors is processed by fuzzy comprehensive evaluation into numerical information. The sample set is shown in Table 2.

Several criteria, namely the mean absolute error (MAE), mean square error (MSE), and mean absolute percentage error (MAPE), are adopted to evaluate the performance of the Fν-SVM method; the MSE index is used as the fitness function of PSO in this paper.


Table 1. Influencing factors of car sale forecasts.

Product characteristic        Unit            Expression               Weight
Brand famous degree (BF)      Dimensionless   Linguistic information   0.9
Performance parameter (PP)    Dimensionless   Linguistic information   0.8
Form beauty (FB)              Dimensionless   Linguistic information   0.8
Sales experience (SE)         Dimensionless   Linguistic information   0.5
Dweller deposit (DD)          Dimensionless   Numerical information    0.8
Oil price (OP)                Dimensionless   Linguistic information   0.4

Table 2. Learning and testing data.

No.   PP    FB    SE    OP   BF     DD     Desired output
1     0.4   0.4   0.4   4    3.1    9.3    1355
2     0.8   0.5   0.9   4    0.56   18.2   1305
3     0.6   0.3   0.1   4    1.5    21.1   1131
4     0.1   0.2   0.1   1    0.5    25     1852
5     0.3   0.6   0.9   1    2.1    23.7   1121
6     0.3   0.5   0.5   1    7.1    5.1    1143
7     0.5   0.8   0.2   1    0.5    2.8    1719
8     0.5   0.2   0.3   2    8.07   2.9    1366
9     0.8   0.4   0.4   1    0.45   8.2    1632
10    0.9   0.6   0.2   1    0.3    6.8    1393
...   ...   ...   ...   ...  ...    ...    ...
58    0.7   0.3   0.9   8    2      3.7    1699
59    0.6   0.6   0.9   10   5      2.3    1537
60    0.3   0.8   0.3   12   7.9    11.6   1886

The error criteria are defined as

\mathrm{MAE} = \frac{1}{l}\sum_{i=1}^{l} |y_i - \hat{y}_i|   (25)

\mathrm{MSE} = \frac{1}{l}\sum_{i=1}^{l} (y_i - \hat{y}_i)^2   (26)

\mathrm{MAPE} = \frac{1}{l}\sum_{i=1}^{l} \frac{|\hat{y}_i - y_i|}{y_i}   (27)

where \hat{y}_i is the forecasting result for the data point y_i.
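A direct transcription of these criteria (our own helper functions, not from the paper):

```python
import numpy as np

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))        # Eq. (25)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))         # Eq. (26)

def mape(y, y_hat):
    return float(np.mean(np.abs(y_hat - y) / y))    # Eq. (27)

# usage example with the real values and Fnu-SVM forecasts of Table 3:
y = np.array([1886, 1537, 1699, 1129, 1287, 1420,
              1429, 1231, 1967, 1912, 1350, 1818], float)
y_hat = np.array([1874, 1542, 1708, 1143, 1277, 1444,
                  1441, 1287, 1969, 1898, 1374, 1815], float)
print(mae(y, y_hat), mse(y, y_hat), mape(y, y_hat))
```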

Suppose the number of variables is n, with n = n_1 + n_2, where n_1 and n_2 respectively denote the number of fuzzy linguistic variables and the number of crisp numerical variables. The linguistic variables are evaluated on several description levels, and a real number between 0 and 1 is assigned to each description level. Distinct numerical variables have different dimensions and should first be normalized. The following normalization is adopted:

\bar{x}_{di} = \frac{ x_{di} - \min\{x_{di}\}_{i=1}^{l} }{ \max\{x_{di}\}_{i=1}^{l} - \min\{x_{di}\}_{i=1}^{l} }, \qquad d = 1, 2, \ldots, n_2   (28)

where l is the number of samples, and x_{di} and \bar{x}_{di} denote the original value and the normalized value, respectively.

In fact, all the numerical variables in (1)–(27) are normalized values, although they are not marked with bars. Fuzzification is used to process the linguistic variables and the normalized numerical variables: the centers of the corresponding triangular fuzzy numbers are assigned the normalized values, and the spreads of those fuzzy numbers can be determined by evaluation, or by taking some function of the observed values, such as \Delta r_{x_i} = h \cdot r_{x_i}, where h is a coefficient for fuzzification.
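A small sketch of this preprocessing (illustrative names; the min-max scaling is Eq. (28), and the spread rule is Δr_x = h · r_x with the paper's h = 0.1):

```python
import numpy as np

def normalize(X):
    """Eq. (28): scale each numerical column of X (l x n2) into [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def fuzzify(R, h=0.1):
    """Return symmetric-TFN (centers, spreads) with spreads dr = h * r."""
    return R, h * R

# a few numerical rows taken from Table 2 (OP, BF, DD columns):
X = np.array([[4, 3.1, 9.3], [4, 0.56, 18.2],
              [4, 1.5, 21.1], [1, 0.5, 25.0]], float)
centers, spreads = fuzzify(normalize(X))
```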

The experiments were made on a 1.80 GHz Core(TM)2 CPU personal computer with 1.0 GB memory under Microsoft Windows XP Professional. The initial parameters of PSO are given as follows: number of particles n = 50; particle dimension m = 6; inertia weight w = 0.9; positive acceleration constants c_1 = c_2 = 2; maximal iterative number k_max = 100; fitness accuracy of the normalized samples equal to 0.0002; and fuzzification coefficient h = 0.1. The optimal combinational parameters obtained by PSO are C = 958.03, ν = 0.98, and σ = 0.56. Fig. 5 shows the sale series forecasting results given by PSO and the Fν-SVM.

[Fig. 5. The sale forecasting results based on the Fν-SVM.]

To analyze the forecasting capacity of the hybrid model based on PSO and the Fν-SVM, the combination of PSO and ν-SVM and the autoregressive moving average (ARMA) model are selected to deal with the same time series. The initial PSO conditions applied to the Fν-SVM and the ν-SVM are the same. The results are shown in Table 3.

Table 3. Comparison of forecasting results from the three forecasting models.

The latest     Real     Forecasting value
12 months      value    ARMA    ν-SVM   Fν-SVM
1              1886     1478    1855    1874
2              1537     1465    1543    1542
3              1699     1497    1689    1708
4              1129     1498    1094    1143
5              1287     1492    1256    1277
6              1420     1493    1425    1444
7              1429     1511    1422    1441
8              1231     1468    1268    1287
9              1967     1438    1950    1969
10             1912     1489    1879    1898
11             1350     1449    1325    1374
12             1818     1450    1796    1815


Table 4. Error statistics of the three models.

Model     MAE       MAPE      MSE
ARMA      253.08    0.160     8763.6
ν-SVM     22.42     0.0152    626.08
Fν-SVM    17.92     0.0126    552.25

The indexes MAE, MAPE, and MSE used to evaluate the forecasting capacity of the three models are shown in Table 4. To represent the error trend well, the latest 12 months of forecasting results are used to analyze the forecasting performance of the above models. The forecasting accuracy of the support vector machines (Fν-SVM and ν-SVM) clearly surpasses that of ARMA, and the MAE, MAPE, and MSE values of the Fν-SVM are better than those of both ARMA and ν-SVM. The Fν-SVM evidently has good generalization performance and is appropriate for cases with finite samples. Compared with the ν-SVM, the forecasting precision of the Fν-SVM is improved because a fuzzification measure is adopted on each sample point of the sample set.

6. Conclusions

In this paper, a new version of the FSVM, named the Fν-SVM, is proposed to forecast time series by combining fuzzy theory with the ν-SVM. The performance of the Fν-SVM is evaluated on a multi-dimensional time series, and the simulation results demonstrate that the Fν-SVM is effective in dealing with uncertain data and finite samples. Moreover, the parameter-choosing algorithm presented here proves suitable for seeking the optimal parameters of the Fν-SVM. Compared with ARMA, the Fν-SVM has several attractive properties: strong learning capability for small samples, good generalization performance, insensitivity to noise and outliers, and steerable approximation parameters. Compared with the ν-SVM, the Fν-SVM can score the microcosmic uncertainty characteristic of the sample data; the error space of each data point is represented by symmetric triangular fuzzy numbers, which is similar to the size-tolerance concept in mechanical engineering. In our experiments, some fixed coefficients, such as the fuzzification coefficient h, are adopted; how to choose an appropriate coefficient is not addressed in this paper and remains a meaningful problem for future research.

Acknowledgements

This research is supported by the National Natural Science Foundation of China under Grant 60904043, the China Postdoctoral Science Foundation (20090451152), and the Jiangsu Planned Projects for Postdoctoral Research Funds (0901023C).

References

Arciniegas, A. I., & Arciniegas Rueda, I. E. (2008). Forecasting short-term power prices in the Ontario Electricity Market (OEM) with a fuzzy logic based inference system. Utilities Policy, 16(1), 39–48.

Bao, Y. K., Liu, Z. T., Guo, L., & Wang, W. (2005). Forecasting stock composite index by fuzzy support vector machines regression. In Proceedings of the 2005 international conference on machine learning and cybernetics (pp. 3535–3540).
Chang, P. C., Wang, Y. W. C., & Liu, H. (2007). The development of a weighted evolving fuzzy neural network for PCB sales forecasting. Expert Systems with Applications, 32(1), 86–96.
Chen, H., Wang, J., & Yan, X. (2008). A fuzzy support vector machine with weighted margin for flight delay early warning. In Fifth international conference on fuzzy systems and knowledge discovery (pp. 331–335).
Chen, R. C., & Hsieh, C. H. (2006). Web page classification based on a support vector machine using a weighted vote schema. Expert Systems with Applications, 31(2), 427–435.
Chen, X., Li, Y., Harrison, R., & Zhang, Y. Q. (2008). Type-2 fuzzy logic-based classifier fusion for support vector machines. Applied Soft Computing, 8(3), 1222–1231.
Chiu, D. Y., & Chen, P. J. (2009). Dynamically exploring internal mechanism of stock market by fuzzy-based support vector machines with high dimension input space and genetic algorithm. Expert Systems with Applications, 36(2), 1240–1248.
Dong, H., Yang, S., & Wu, D. (2007). Intelligent prediction method for small-batch producing quality based on fuzzy least square SVM. Systems Engineering-Theory and Practice, 27(3), 98–104.
Jayadeva, Khemchandani, R., & Chandra, S. (2004). Fast and robust learning through fuzzy linear proximal support vector machines. Neurocomputing, 61, 401–411.
Juang, C. F., Sun, W. K., & Chen, G. C. (in press). Object detection by color histogram-based fuzzy classifier with support vector learning. Neurocomputing.
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of the IEEE international conference on neural networks (pp. 1942–1948).
Khashei, M., Hejazi, S. R., & Bijari, M. (2008). A new hybrid artificial neural networks and fuzzy regression model for time series forecasting. Fuzzy Sets and Systems, 159(7), 769–786.
Kuo, R. J., & Xue, K. C. (1999). Fuzzy neural networks with application to sales forecasting. Fuzzy Sets and Systems, 108(2), 123–143.
Li, J., Huang, S., He, R., & Qian, K. (2008). Image classification based on fuzzy support vector machine. In International symposium on computational intelligence and design (pp. 68–71).
Lin, S. W., Ying, K. C., Chen, S. C., & Lee, Z. J. (2008). Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Systems with Applications, 35(4), 1817–1824.
Min, R., & Cheng, H. D. (2009). Effective image retrieval using dominant color descriptor and fuzzy support vector machine. Pattern Recognition, 42(1), 147–157.
Popescu, C. A., Wong, Y. S., & Lee, B. H. K. (in press). An expert system for predicting nonlinear aeroelastic behavior of an airfoil. Journal of Sound and Vibration.
Schölkopf, B., Smola, A. J., Williamson, R. C., & Bartlett, P. L. (2000). New support vector algorithms. Neural Computation, 12(5), 1207–1245.
Shieh, M. D., & Yang, C. C. (2008). Classification model for product form design using fuzzy support vector machines. Computers and Industrial Engineering, 55(1), 150–164.
Tang, H., & Qu, L. S. (2008). Fuzzy support vector machine with a new fuzzy membership function for pattern classification. In International conference on machine learning and cybernetics (pp. 768–773).
Tsujinishi, D., & Abe, S. (2003). Fuzzy least squares support vector machines for multiclass problems. Neural Networks, 16(5–6), 785–792.
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.
Wu, Q., Yan, H. S., & Yang, H. B. (2008a). A hybrid forecasting model based on chaotic mapping and improved support vector machine. In The ninth international conference for young computer scientists (pp. 2701–2706).
Wu, Q., Yan, H. S., & Yang, H. B. (2008b). A forecasting model based support vector machine and particle swarm optimization. In The 2008 workshop on power electronics and intelligent transportation system (pp. 218–222).
Wu, Q. (2009). The forecasting model based on wavelet ν-support vector machine. Expert Systems with Applications, 36(4), 7604–7610.
Yang, C. H., Jin, L. C., & Chuang, L. Y. (2006). Fuzzy support vector machines for adaptive Morse code recognition. Medical Engineering and Physics, 28(9), 925–931.
Yu, L., & Zhang, Y. Q. (2005). Evolutionary fuzzy neural networks for hybrid financial prediction. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 35(2), 244–249.