Journal of Systems Engineering and Electronics, Vol. 19, No. 1, 2008, pp. 191–197

Parameter selection of support vector machine for function approximation based on chaos optimization∗

Yuan Xiaofang & Wang Yaonan

Coll. of Electrical & Information Engineering, Hunan Univ., Changsha 410082, P. R. China

(Received November 13, 2006)

Abstract: The support vector machine (SVM) is a novel machine learning method with the ability to approximate nonlinear functions with arbitrary accuracy. Proper parameter settings are crucial for the SVM learning results and generalization ability, yet there is no systematic, general method for parameter selection. In this article, SVM parameter selection for function approximation is regarded as a compound optimization problem, and a mutative scale chaos optimization algorithm is employed to search for optimal parameter values. The chaos optimization algorithm is an effective method for global optimization, and the mutative scale chaos algorithm improves the search efficiency and accuracy. Several simulation examples show the sensitivity of the SVM parameters and demonstrate the superiority of the proposed method for nonlinear function approximation.

Keywords: learning systems, support vector machines (SVM), approximation theory, parameter selection, optimization.

1. Introduction

Artificial neural networks (ANN) have the ability to approximate nonlinear functions with arbitrary accuracy, which has been validated since the late 1980s [1-2]. Nevertheless, the structure and type of an ANN are selected by experience, and ANN training is based on the empirical risk minimization (ERM) [3] principle, which aims only at minimizing training errors. Therefore, ANN suffers from disadvantages such as overfitting, local optima, and poor generalization ability. Support vector machines (SVM) [3,4] are new machine learning methods derived from statistical learning theory. Since the late 1990s, SVM has shown growing popularity and has been successfully applied to many areas, ranging from handwritten digit recognition and speaker identification to function approximation and time series forecasting [5-7]. Established on the structural risk minimization (SRM) [3] principle, SVM has some distinct advantages over ANN: global optimality, suitability for small sample sizes, good generalization ability, and resistance to the overfitting problem [3,6,7].

It is well known that the SVM generalization performance depends on a good setting of the hyperparameters C and ε and of the kernel parameters [8], and at present there is no systematic, general method for parameter selection. Reference [9] summarizes the existing practical approaches to parameter setting and gives practical recommendations for setting C and ε directly from the training data and the estimated noise level. However, all these approaches (including Ref. [9]) are based on a priori knowledge, user expertise, or experimental trial, and hence they cannot guarantee that the parameter values are globally optimal. In this article, the authors propose a novel approach to SVM parameter selection based on chaos optimization. In this implementation, parameter selection is regarded as a compound optimization problem, and mutative scale chaos optimization is proposed to select suitable parameter values. The chaos optimization algorithm is an efficient and convenient way to perform global optimization [10]; to improve the search efficiency and accuracy, mutative scale chaos optimization, which reduces the search ranges during the search process, is employed. The practical validity of the proposed SVM parameter selection approach is illustrated using several nonlinear functions. Simulations demonstrate that SVM whose parameters are selected based on chaos optimization performs better than ANN and than SVM with other parameter selection techniques.

* This project was supported by the National Natural Science Foundation of China (60775047, 60402024).

2. SVM approximation and parameters

2.1 SVM approximation

To introduce the subject, the authors begin by outlining SVM for function approximation. Let the given training data set be $D = \{(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)\}$, where $x_i \in \mathbf{R}^d$ is an input vector, $y_i \in \mathbf{R}$ is its corresponding desired output, and $n$ is the number of training data. In SVM, the original input space is mapped by a nonlinear mapping $x \rightarrow g(x)$ into a high-dimensional space called the feature space. Let $f(x)$ be the output of SVM corresponding to the input vector $x$; in the feature space a linear function is constructed

$$f(x) = w^{\mathrm{T}} g(x) + b \qquad (1)$$

where $w$ is a coefficient vector and $b$ is a threshold. SVM learning is obtained by minimizing the empirical risk on the training data, where the $\varepsilon$-insensitive loss function is used. The loss function is defined as

$$L^{\varepsilon}(x, y, f) = |y - f(x)|_{\varepsilon} = \max(0, |y - f(x)| - \varepsilon) \qquad (2)$$

where $\varepsilon$ is a positive parameter that tolerates approximation errors smaller than $\varepsilon$. The empirical risk is

$$R_{\mathrm{emp}}(w) = \frac{1}{n} \sum_{i=1}^{n} L^{\varepsilon}(y_i - f(x_i)) \qquad (3)$$

Besides the $\varepsilon$-insensitive loss, SVM reduces the model complexity by minimizing $\|w\|^2$. This is described with slack variables $\xi_i$ and $\hat{\xi}_i$, which measure the training data $x_i$ whose deviations exceed the constant $\varepsilon$. The SVM approximation is then obtained from the following optimization problem

$$\min \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \hat{\xi}_i) \qquad (4)$$

$$\mathrm{s.t.} \quad y_i - f(x_i) \leqslant \varepsilon + \xi_i, \quad f(x_i) - y_i \leqslant \varepsilon + \hat{\xi}_i, \quad \xi_i, \hat{\xi}_i \geqslant 0 \qquad (5)$$

where $C$ is a positive constant to be regulated. By the Lagrange multiplier method [3], the minimization of formula (4) leads to the following dual optimization

$$\max \ \sum_{i=1}^{n} y_i (\hat{\alpha}_i - \alpha_i) - \varepsilon \sum_{i=1}^{n} (\hat{\alpha}_i + \alpha_i) - \frac{1}{2} \sum_{i,j=1}^{n} (\hat{\alpha}_i - \alpha_i)(\hat{\alpha}_j - \alpha_j) K(x_i, x_j) \qquad (6)$$

$$\mathrm{s.t.} \quad \sum_{i=1}^{n} (\hat{\alpha}_i - \alpha_i) = 0, \quad C \geqslant \hat{\alpha}_i, \alpha_i \geqslant 0 \qquad (7)$$

where $\hat{\alpha}_i$ and $\alpha_i$ are Lagrange multipliers, and the kernel $K(x_i, x_j)$ is a symmetric function equivalent to the dot product in the feature space, namely

$$K(x_i, x_j) = g(x_i)^{\mathrm{T}} g(x_j) \qquad (8)$$

Here the Gaussian function is used as the kernel

$$K(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2) \qquad (9)$$
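As a concrete illustration (not part of the original paper), here is a minimal Python sketch of the Gaussian kernel of Eq. (9); the correspondence to the `gamma` parameterization used by libraries such as scikit-learn is an added note that matters when reproducing the experiments:

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    """Gaussian kernel of Eq. (9): K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d = np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

# scikit-learn's RBF kernel is exp(-gamma * ||x - y||^2), so the paper's
# sigma corresponds to gamma = 1 / (2 * sigma^2).
def sigma_to_gamma(sigma):
    return 1.0 / (2.0 * sigma ** 2)
```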

Some other kernels are given below: the polynomial kernel $K(x, y) = (x \cdot y + 1)^d$ and the hyperbolic tangent kernel $K(x, y) = \tanh(c_1 (x \cdot y) + c_2)$. By substituting $\beta_i = \hat{\alpha}_i - \alpha_i$ and using the relation $\hat{\alpha}_i \alpha_i = 0$, the optimization of formulae (6)-(7) is rewritten as

$$\max \ \sum_{i=1}^{n} y_i \beta_i - \varepsilon \sum_{i=1}^{n} |\beta_i| - \frac{1}{2} \sum_{i,j=1}^{n} \beta_i \beta_j K(x_i, x_j) \qquad (10)$$

$$\mathrm{s.t.} \quad \sum_{i=1}^{n} \beta_i = 0, \quad -C \leqslant \beta_i \leqslant C \qquad (11)$$

The learning results for the training data set $D$ can be derived from Eqs. (10) and (11). Note that only some of the coefficients $\beta_i$ are nonzero, and the corresponding vectors $x_i$ are called support vectors (SV); that is, the vectors $x_i$ whose coefficients $\hat{\alpha}_i - \alpha_i$ are nonzero are SV. The approximation function is then represented by the Lagrange multipliers, namely

$$f(x) = \sum_{i=1}^{P} (\hat{\alpha}_i - \alpha_i) K(x_i, x) + b \qquad (12)$$

where $P$ is the number of SV, so $f(x)$ is computed only from the SV. Furthermore, the constant $b$ can be determined as well.

2.2 SVM parameters

The quality of SVM models strongly depends on the proper setting of parameters, and the SVM approximation performance is sensitive to them. For the Gaussian kernel, the parameters to be regulated include the hyperparameters $C$, $\varepsilon$, and the kernel parameter $\sigma$. Parameter selection is further complicated by the fact that the SVM model complexity (and hence the generalization performance) depends on all three parameters. Indeed, the values of $C$, $\sigma$, and $\varepsilon$ relate to the actual function model and are not fixed across data sets. The three parameters affect the model complexity in different ways. Parameter $C$ determines the trade-off between the model complexity and the degree to which deviations larger than $\varepsilon$ are tolerated. Parameter $\varepsilon$ controls the width of the $\varepsilon$-insensitive zone and affects the number of SV in the optimization problem. Kernel parameter $\sigma$ determines the kernel width and relates to the input range of the training data set.
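To make the roles of the three parameters concrete, here is a short sketch assuming scikit-learn's SVR implementation; the toy data are hypothetical stand-ins, and `gamma` encodes the paper's σ as noted after Eq. (9):

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical 1-D regression data, only to show where C, epsilon, sigma enter.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 6.0, size=(100, 1))
y = np.sin(X).ravel()

C, sigma, eps = 5.0, 3.6, 0.05
svr = SVR(kernel='rbf', C=C, epsilon=eps, gamma=1.0 / (2.0 * sigma ** 2))
svr.fit(X, y)
print(len(svr.support_))   # epsilon widens/narrows the tube, changing the SV count
```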

3. Parameter optimization based on mutative scale chaos algorithm

Chaos has the properties of ergodicity, stochasticity, and "regularity", and the chaotic trajectories of chaotic maps are useful for probing wide ranges of the search domain without being trapped in local optima. The chaos optimization algorithm is an efficient and convenient method for global optimization, and here the mutative scale chaos algorithm is applied to find optimal $C$, $\sigma$, and $\varepsilon$. During the search process, the search ranges are mutatively scaled; that is, as the iterations increase, the search ranges decrease, which improves the search efficiency and accuracy. Here parameter selection is regarded as a compound optimization of the parameters, and the complexity of SVM need not be considered. The iterative chaotic map with infinite collapses (ICMIC) [11] is chosen as the chaos model

$$x_{n+1} = \sin(a / x_n), \quad n = 1, 2, \cdots, \quad x_0 \neq 0 \qquad (13)$$

Assuming $a = 2$, then $x_{n+1} = \sin(2 / x_n)$. For SVM approximation, the objective of parameter selection is to minimize the deviations between the training data outputs and the SVM outputs, and here the performance criterion is the mean square error (MSE)

$$MSE = \sqrt{\sum_{k=1}^{p} [y_k - f(x_k, w)]^2 / p} \qquad (14)$$
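A short sketch of the ICMIC map (13) and the criterion (14) in Python; note that (14) is a root-mean-square quantity even though the paper calls it MSE:

```python
import numpy as np

def icmic(x, a=2.0):
    """One ICMIC step, Eq. (13): x_{n+1} = sin(a / x_n), x_n != 0."""
    return np.sin(a / x)

def mse_criterion(y, f):
    """Performance criterion of Eq. (14): sqrt(sum (y_k - f_k)^2 / p)."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    return np.sqrt(np.mean((y - f) ** 2))

x = 0.37                       # any nonzero start value
trajectory = []
for _ in range(100):
    x = icmic(x)
    trajectory.append(x)       # values wander ergodically through [-1, 1]
```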


where $p$ is the number of training data, $y_k$ are the training data outputs, and $f(x_k, w)$ are the SVM outputs. The objective of the chaos algorithm is then to search for the optimal parameters $C$, $\sigma$, and $\varepsilon$ that minimize the MSE

$$\begin{cases} \min f(x_1, \cdots, x_i) = \min MSE \\ a_i \leqslant x_i \leqslant b_i, \quad i = 1, 2, 3 \end{cases} \qquad (15)$$

Here $i = 3$, corresponding to the parameters $C$, $\sigma$, and $\varepsilon$, and $[a_i, b_i]$ is the search range. As there is no systematic method for different data sets, general search ranges are determined largely by expertise, typically $C \in [0.5, 100]$, $\sigma \in [0.1, 100]$, and $\varepsilon \in [0.01, 0.2]$. For particular training data samples, the search ranges can be changed. The basic steps of the mutative scale chaos optimization algorithm are as follows.

Step 1  Set the initial parameters: $K$ is the iteration counter, starting from 1. Choose random initial values $x_{i,0}$ and set $x_i^* = x_i(0)$, $f_i^* = f_i(0)$. In addition, create a termination criterion: the maximal number of iterations $N$ or the maximal acceptable mean square error $MSE_d$.

Step 2  Run the chaos iteration of Eq. (13): $x_{i,n+1} = \sin(2 / x_{i,n})$, $i = 1, 2, 3$.

Step 3  Convert the search ranges

$$x'_{i,n+1} = a_i + (b_i - a_i) |x_{i,n+1}| \qquad (16)$$

so that the search ranges of the chaos variables are changed from $[-1, 1]$ to $[a_i, b_i]$.

Step 4  Replacement decision-making. Set $x_i(K) = x'_{i,n+1}$ and compute $f_i(K)$. If $f_i(K) \leqslant f_i^*$, then $f_i^* = f_i(K)$ and $x_i^* = x_i(K)$; else ($f_i(K) > f_i^*$), $f_i^*$ and $x_i^*$ are retained.

Step 5  If $K > N$ or $f_i^* \leqslant MSE_d$, the iteration is stopped. If $K < N$ and $f_i^* > MSE_d$, set $K = K + 1$, continue the iteration, and modify the search ranges

$$a'_i = x_i^* - \varphi (b_i - a_i), \quad b'_i = x_i^* + \varphi (b_i - a_i) \qquad (17)$$

where $\varphi$ represents the mutative scale factor, a decreasing function given by

$$\varphi = \begin{cases} 1 - \left( \dfrac{K - m}{K} \right)^2, & K > m \\ 1, & K \leqslant m \end{cases} \qquad (18)$$

If $a'_i < a_i$, then set $a'_i = a_i$; if $b'_i > b_i$, then set $b'_i = b_i$. Go back to Step 2 for the next iteration.

Hence $x_i^*$ ($i = 1, 2, 3$) are the optimal values of the parameters $C$, $\sigma$, and $\varepsilon$ derived from the mutative scale chaos optimization. The chaos optimization algorithm is globally optimal, and the derived parameters minimize the MSE on a data set. As SVM generalizes well from training data to test data, the derived parameters are also adaptive and minimize the MSE.
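The five steps translate into a compact loop. The Python sketch below follows the reconstruction above; the threshold m in Eq. (18), the termination defaults, and the clamping of the shrunken ranges to the initial [a_i, b_i] are assumptions made explicit in the comments, not values given in the paper:

```python
import numpy as np

def chaos_optimize(objective, bounds, n_iter=500, m=50, mse_d=1e-4, seed=1):
    """Mutative scale chaos optimization, Steps 1-5 (a sketch; m and mse_d
    are assumed defaults)."""
    a0 = np.array([b[0] for b in bounds], float)   # initial lower bounds a_i
    b0 = np.array([b[1] for b in bounds], float)   # initial upper bounds b_i
    a, b = a0.copy(), b0.copy()
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.1, 0.9, size=len(bounds))    # Step 1: nonzero chaos carriers
    best_x, best_f = None, np.inf
    for K in range(1, n_iter + 1):
        x = np.sin(2.0 / x)                        # Step 2: ICMIC iteration, Eq. (13)
        cand = a + (b - a) * np.abs(x)             # Step 3: map [-1, 1] to [a_i, b_i], Eq. (16)
        f = objective(cand)                        # Step 4: replacement decision
        if f <= best_f:
            best_f, best_x = f, cand.copy()
        if best_f <= mse_d:                        # Step 5: termination test
            break
        phi = 1.0 - ((K - m) / K) ** 2 if K > m else 1.0   # scale factor, Eq. (18)
        width = b - a
        a = np.maximum(best_x - phi * width, a0)   # Eq. (17), clamped to [a_i, b_i]
        b = np.minimum(best_x + phi * width, b0)
    return best_x, best_f
```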


4. Simulation study

4.1 SVM sensitivity to the parameters C, σ, and ε

Choose a nonlinear function as an illustration: the Hermite function $y = 1.1(1 - x + 2x^2) e^{-x^2/2}$, where $x \in [0, 6]$. Over this range of $x$, 100 pairs of training data $(x_i, y_i)$ are randomly selected. In this section, three experiments demonstrate the influence of the parameter values on the SVM performance. In each experiment, two of the parameters $C$, $\sigma$, and $\varepsilon$ are fixed and the remaining one is varied. For different values of $C$, $\sigma$, and $\varepsilon$, the SVM approximation results differ. The MSE in formula (14) measures the deviations between $f(x_k, w)$ and $y_k$. Table 1 and Fig. 1(a)-(c) show the MSE of the SVM approximation. It can be concluded that the SVM performance is sensitive to the parameter values and varies with them.

Table 1  Contrastive MSE at various parameter values

No.   Expt. 1 (σ=3.6, ε=0.05)    Expt. 2 (C=5, ε=0.05)      Expt. 3 (C=5, σ=3.6)
      (Fig. 1(a))                (Fig. 1(b))                (Fig. 1(c))
      C        MSE               σ        MSE               ε        MSE
1     0.1      0.2013            0.2      0.4035            0.002    0.0092
2     0.5      0.1767            0.5      0.2507            0.005    0.0094
3     1        0.1525            1        0.1824            0.01     0.0092
4     2        0.1494            2.5      0.0091            0.02     0.0093
5     5        0.1348            4        0.1057            0.05     0.0090
6     10       0.1310            6        0.1743            0.075    0.0093
7     20       0.1293            8        0.2790            0.1      0.0091
8     50       0.1292            10       0.3116            0.2      0.0092

Fig. 1  Contrastive MSE at various parameter values

The MSE values in Table 1 and Fig. 1 are rather large because the parameters C, σ, and ε are here selected by experience, without optimization, merely to show the sensitivity. In the search procedure proper, parameter selection is therefore regarded as a compound optimization problem and the three parameters are searched simultaneously, as in Section 3, based on the chaos optimization algorithm, which can reach the optimal values.
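A sketch of the Expt. 1 sweep under the same scikit-learn assumptions as before; the exact figures of Table 1 will not be reproduced, since the random data draw and the solver differ from the paper's:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 6.0, size=(100, 1))                   # 100 training pairs on [0, 6]
y = (1.1 * (1.0 - X + 2.0 * X ** 2) * np.exp(-X ** 2 / 2.0)).ravel()   # Hermite function

def svm_mse(C, sigma, eps):
    """Fit SVR and return the criterion of Eq. (14) on the training data."""
    svr = SVR(kernel='rbf', C=C, epsilon=eps, gamma=1.0 / (2.0 * sigma ** 2))
    pred = svr.fit(X, y).predict(X)
    return np.sqrt(mean_squared_error(y, pred))

# Expt. 1: sigma = 3.6 and epsilon = 0.05 fixed, C swept as in Table 1.
for C in [0.1, 0.5, 1, 2, 5, 10, 20, 50]:
    print(C, svm_mse(C=C, sigma=3.6, eps=0.05))
```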

4.2 SVM approximation of a nonlinear function

Here the authors again consider the Hermite function with $x \in [0, 6]$; 100 pairs of training data $(x_i, y_i)$ and 40 pairs of test data are randomly selected. The mutative scale chaos optimization is applied to optimize the parameters $C$, $\sigma$, and $\varepsilon$, with minimization of the MSE as the optimization objective. The maximum number of iterations is set to 500, and the search ranges for these training data are $C \in [0.5, 15]$, $\sigma \in [0.5, 5]$, and $\varepsilon \in [0.01, 0.2]$. The initial values of $C$, $\sigma$, and $\varepsilon$ are randomly selected. As the iterations increase, the MSE decreases greatly, as shown in Fig. 2, and the parameter values become more appropriate. Fig. 2 compares the MSE of two different chaos optimization algorithms: curve 1 denotes the mutative scale chaos optimization algorithm and curve 2 the conventional chaos optimization algorithm. When the iterations reach N = 500, the MSE of the two algorithms has decreased to 0.013 and 0.020, respectively. It can be observed that the mutative scale chaos optimization algorithm has a faster convergence speed and preferable results. The optimal values derived from the optimization algorithm are C = 6.4, σ = 3.9, and ε = 0.026.

Fig. 2  MSE during the chaos optimization procedure

Now the approximation performance of three different approaches is tested: (1) ANN approximation (an RBF neural network); (2) SVM(1) approximation (parameters selected as in Ref. [9]); and (3) SVM(2) approximation (parameters selected by our approach). Table 2 and Figs. 3-5 show the test results; Figs. 3-5 illustrate the test performance of ANN, SVM(1), and SVM(2) respectively. In Figs. 3(a), 4(a), and 5(a), the solid line shows the 40 pairs of test data, whereas the dotted line shows the approximate outputs of the three approaches; Figs. 3(b), 4(b), and 5(b) illustrate the actual errors. From these simulations, it can be observed that SVM whose parameters are selected based on chaos optimization has a better approximation performance than the other two approaches.

Table 2  Approximation results

Method    MSE     Max. positive error   Max. negative error   Test curve   Test error curve
ANN       0.064   0.197                 −0.084                Fig. 3(a)    Fig. 3(b)
SVM(1)    0.052   0.171                 −0.061                Fig. 4(a)    Fig. 4(b)
SVM(2)    0.037   0.078                 −0.043                Fig. 5(a)    Fig. 5(b)

Fig. 3  Test results of ANN approximation

Fig. 4  Test results of SVM(1) approximation

Fig. 5  Test results of SVM(2) approximation
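Combining the earlier sketches, the Section 4.2 search can be expressed as follows, reusing the hypothetical `chaos_optimize`, `X`, and `y` defined above; the paper's optima C = 6.4, σ = 3.9, ε = 0.026 will not be reproduced exactly, for the same reasons as before:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

def objective(params):
    """MSE of Eq. (14) for a candidate (C, sigma, epsilon) triple."""
    C, sigma, eps = params
    svr = SVR(kernel='rbf', C=C, epsilon=eps, gamma=1.0 / (2.0 * sigma ** 2))
    pred = svr.fit(X, y).predict(X)
    return np.sqrt(mean_squared_error(y, pred))

bounds = [(0.5, 15.0), (0.5, 5.0), (0.01, 0.2)]   # search ranges of Section 4.2
best_params, best_mse = chaos_optimize(objective, bounds, n_iter=500)
print(best_params, best_mse)
```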

4.3 Two-dimensional function approximation

Now a two-dimensional nonlinear function is considered, defined as

$$z = \frac{1 + \sin xy}{4 + \sin 2\pi x + \sin \pi y}, \quad x \in [0, 2], \ y \in [0, 2]$$

The parameter values are selected based on chaos optimization, and the search ranges for these training data are $C \in [0.5, 25]$, $\sigma \in [0.25, 10]$, and $\varepsilon \in [0.01, 0.2]$. After 600 iterations, the MSE is 0.0051, and Fig. 6 illustrates the actual model and the SVM approximation model. The simulations demonstrate that for a two-dimensional nonlinear function, SVM whose parameters are selected based on chaos optimization also has a good approximation performance.

Fig. 6  Actual model and SVM approximation model
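For the two-dimensional case, the only change on the code side is that the inputs become two columns; a hypothetical sampling sketch:

```python
import numpy as np

# Sample the two-dimensional function above on [0, 2] x [0, 2] (hypothetical draw).
rng = np.random.default_rng(0)
XY = rng.uniform(0.0, 2.0, size=(200, 2))
z = (1.0 + np.sin(XY[:, 0] * XY[:, 1])) / \
    (4.0 + np.sin(2.0 * np.pi * XY[:, 0]) + np.sin(np.pi * XY[:, 1]))
# The same objective/chaos_optimize pair as in Section 4.2 applies, with
# bounds [(0.5, 25.0), (0.25, 10.0), (0.01, 0.2)] per the search ranges above.
```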

5. Conclusions

Good parameter settings are crucial for the SVM learning results and generalization ability, and the authors consider parameter selection as a compound optimization problem. This article proposes a mutative scale chaos optimization algorithm to search for the optimal parameter values. Various examples are simulated to demonstrate the superiority of the proposed approach. The simulation results demonstrate that SVM whose parameters are selected using the authors' approach has a better approximation performance than ANN or than SVM whose parameters are chosen using other techniques. For other applications or types of SVM, the parameters can also be selected by using chaos optimization algorithms.

References

[1] Funahashi K. On the approximate realization of continuous mappings by neural networks. Neural Networks, 1989, 2(3): 183–192.
[2] Hornik K, Stinchombe M, White H. Multilayer feedforward networks are universal approximators. Neural Networks, 1989, 2(5): 359–366.
[3] Vapnik V. The nature of statistical learning theory. New York: Springer-Verlag, 1995.
[4] Vapnik V. An overview of statistical learning theory. IEEE Trans. on Neural Networks, 1999, 10(5): 988–999.
[5] Byun H, Lee S W. A survey on pattern recognition applications of support vector machines. International Journal of Pattern Recognition and Artificial Intelligence, 2003, 17(3): 459–486.
[6] Chan W C, Chan C W, Cheung K C, et al. On the modelling of nonlinear dynamic system using support vector neural networks. Engineering Applications of Artificial Intelligence, 2001, 14(2): 105–113.
[7] Chuang C C, Su S F, Jeng J T. Robust support vector regression networks for function approximation with outliers. IEEE Trans. on Neural Networks, 2002, 13(6): 1322–1330.
[8] Wang W J, Xu Z B, Lu W Z. Determination of the spread parameter in the Gaussian kernel for classification and regression. Neurocomputing, 2003, 55(3-4): 643–663.
[9] Cherkassky V, Ma Y Q. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, 2004, 17(1): 113–126.
[10] Li B, Jiang W S. Chaos optimization method and its application. Control Theory and Applications, 1997, 14(4): 613–615.
[11] He D, He C, Jiang L G. Chaotic characteristics of a one-dimensional iterative map with infinite collapses. IEEE Trans. on Circuits and Systems-I: Fundamental Theory and Applications, 2001, 48(7): 900–906.

Yuan Xiaofang was born in 1979. He received the B.S. and M.S. degrees from Hunan University in 2001 and 2006 respectively. He is now a Ph.D. candidate at Hunan University. His research interests include intelligent control, neural networks, and machine learning. E-mail: [email protected]

Wang Yaonan was born in 1957. He received his Ph.D. degree from Hunan University in 1994. He is a professor and doctoral supervisor at Hunan University. His research interests include control theory and applications, neural networks, pattern recognition, and intelligent image processing.