Expert Systems with Applications 37 (2010) 959–967
Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa
TAIFEX and KOSPI 200 forecasting based on two-factors high-order fuzzy time series and particle swarm optimization

Jin-Il Park a, Dae-Jong Lee a, Chang-Kyu Song b, Myung-Geun Chun a,*

a Department of Electrical and Computer Engineering, Chungbuk National University, Gaeshin-dong, 12, Cheongju 361-763, Republic of Korea
b CBNU BK21 Chungbuk Information Technology Center, Chungbuk National University, Gaeshin-dong, 12, Cheongju 361-763, Republic of Korea
Article info

Keywords: Fuzzy time series; Two-factors high-order fuzzy logical relationships; Particle swarm optimization; TAIFEX; KOSPI 200

Abstract

Fuzzy time series forecasting methods provide a powerful framework for coping with vague or ambiguous problems, and they have been widely used in real applications. However, the forecasting accuracy of these methods usually depends on the universe of discourse and the length of intervals. We therefore present a new forecasting method that uses two-factors high-order fuzzy time series and particle swarm optimization (PSO) to increase the forecasting accuracy. To show the effectiveness of the proposed method, we applied it to forecasting the Taiwan futures exchange (TAIFEX) and the Korea composite stock price index (KOSPI) 200. The results show better forecasting accuracy than previous methods. © 2009 Elsevier Ltd. All rights reserved.
1. Introduction

Song and Chissom (1994a, 1994b) presented the fuzzy time series model based on fuzzy set theory (Zadeh, 1965) to forecast the historical enrollments of the University of Alabama. It has some drawbacks, such as a large computation time when the fuzzy rule matrix is large and a lack of persuasiveness in determining the universe of discourse and the length of intervals. Chen (1996, 2002) presented a simple fuzzy composition method and high-order fuzzy time series to overcome these problems and improve the forecasting accuracy rate of the fuzzy time series. Lee, Wang, Chen, and Leu (2004, 2006) presented a two-factors high-order fuzzy time series for temperature prediction and the Taiwan futures exchange (TAIFEX) forecasting. It is noted that the forecasting accuracy rates of these methods mainly depend on their universe of discourse and the length of intervals.

In recent years, some methods have been proposed to remedy these drawbacks in the fuzzy time series. Huarng (2001a, 2001b) presented two methods, based on distribution and on average, for choosing effective lengths of intervals, and then added a heuristic function to get better forecasting results. Lee and Chen (2004) presented a method for temperature prediction using genetic algorithms and fuzzy time series. Lee, Wang, and Chen (2008) also presented a method for temperature prediction and TAIFEX forecasting based on high-order fuzzy logical relationships and genetic simulated annealing techniques (Gen & Cheng, 1997; Goldberg, 1998; Goldberg, Korb, & Deb, 1989; Holland, 1975; Kirkpatric, Gelatt, & Vecchi, 1983; Lin & Chen, 2004).

* Corresponding author. Tel.: +82 43 261 2388; fax: +82 43 268 2386. E-mail address: [email protected] (M.-G. Chun).
0957-4174/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2009.05.081
On the other hand, the particle swarm optimization (PSO) technique was developed by Kennedy and Eberhart (1995). PSO mimics the social behavior of a flock of migrating birds trying to reach an unknown destination. Emad, Tarek, and Donald (2005) presented a comparison among five recent evolutionary-based optimization algorithms: genetic algorithms, memetic algorithms, particle swarm, ant-colony systems, and shuffled frog leaping. The comparative results show that the PSO method generally performed better than the other algorithms in terms of success rate and solution quality. Some comparative studies on real problems have also reported that PSO based results perform better than those based on GA (Gaing, 2004; Panda & Padhy, 2007).

Motivated by these previous research works, we present a new method using PSO and two-factors high-order fuzzy time series to increase the forecasting accuracy. We have applied the proposed method to the Taiwan futures exchange (TAIFEX) forecasting and also to the Korea composite stock price index (KOSPI) 200 forecasting. Here, the proposed method shows better forecasting accuracy than previous methods.

The rest of the paper is organized as follows. In Section 2, we present a brief overview of fuzzy time series and particle swarm optimization. In Section 3, we present a particle swarm optimization based two-factors high-order fuzzy time series, and some experimental results are given in Section 4. Finally, some concluding remarks are given in Section 5.

2. Related works

2.1. Fuzzy time series

Song and Chissom (1993a, 1993b, 1994a, 1994b) presented the fuzzy time series based on the fuzzy set and fuzzy boundary. It provides a powerful framework to cope with vague or ambiguous problems and can express linguistic values and human subjective judgments of natural language. The universe of discourse $U$ can be represented as $U = \{u_1, u_2, \ldots, u_n\}$. A fuzzy set $A_i$ defined in the universe of discourse $U$ can be represented as $A_i = f_{A_i}(u_1)/u_1 + f_{A_i}(u_2)/u_2 + \cdots + f_{A_i}(u_n)/u_n$. Here, $f_{A_i}$ is the membership function of the fuzzy set $A_i$, $f_{A_i}: U \to [0, 1]$, $u_k$ belongs to the fuzzy set $A_i$, and $f_{A_i}(u_k)$ denotes the degree of membership of $u_k$ in the fuzzy set $A_i$.

The fuzzy time series can be summarized as follows (Song & Chissom, 1993a, 1993b, 1994a, 1994b). A fuzzy time series $F(t)$ is defined by a collection of $f_i(t)$. If there exists a fuzzy logical relation $R(t-1, t)$ such that $F(t) = F(t-1) \circ R(t-1, t)$, then $F(t-1) \to F(t)$, where '$\circ$' is the max–min composition operator. If a fuzzy logical relation is defined by $A_i \to A_j$, where $F(t-1) = A_i$ and $F(t) = A_j$, then $A_i$ is called the 'left-hand side' and $A_j$ the 'right-hand side' of the fuzzy logical relation. Fuzzy logical relationship groups can be formed by the left-hand side of the fuzzy logical relation:
$$A_i \to A_{j1}, A_{j2}, \ldots$$

Suppose $F(t)$ is caused by $F(t-1)$ only, and $F(t) = F(t-1) \circ R(t-1, t)$. For any $t$, if $R(t-1, t)$ is independent of $t$, then $F(t)$ is called a time-invariant fuzzy time series; otherwise it is called a time-variant fuzzy time series.

2.2. Particle swarm optimization

The PSO method is a member of the wide category of swarm intelligence methods for solving optimization problems. Since PSO is based on a simple concept, the social behavior of particles in the swarm, it has been applied with success to complex real world problems. Each particle in PSO flies through the search space with an adaptable velocity that is dynamically modified according to its own flying experience and the flying experience of the other particles. Further, each particle has a memory and hence is capable of remembering the best position in the search space it has ever visited. In PSO, the best position that each particle remembers from its own flying experience is called pbest, and the overall best among all the particles in the population is called gbest. The features of the searching procedure of the PSO can be summarized as follows (Panda & Padhy, 2007).
Initial positions of pbest are different. However, using the different directions of pbest and gbest, all particles gradually get close to the global optimum. The method is usually applied to discrete problems using grids for the XY position and its velocity. However, the modified value of the particle position can be continuous, so the method can also be applied to continuous problems. There are no inconsistencies in the searching procedure even if continuous and discrete state variables are used with continuous axes and grids for XY positions and velocities. Namely, the method can be applied naturally and easily to mixed integer non-linear optimization problems with continuous and discrete state variables.

The new velocity and position of each particle can be calculated using the current velocity and the distances from the current position to pbest and gbest, as shown in the following equations:

$$v_i^{(t+1)} = w \cdot v_i^{(t)} + c_1 r_1 \left(\mathrm{pbest}_i - x_i^{(t)}\right) + c_2 r_2 \left(\mathrm{gbest} - x_i^{(t)}\right) \tag{1}$$

$$x_i^{(t+1)} = x_i^{(t)} + v_i^{(t+1)} \tag{2}$$

where $v_i^{(t)}$ and $x_i^{(t)}$ are the current velocity and position of the ith particle at iteration $t$, and $w$ is the inertia weight factor. The parameters $c_1$ and $c_2$ are called the cognitive and social acceleration factors, respectively. $r_1$ and $r_2$ are two independent random numbers uniformly distributed in the range [0, 1]. Usually the inertia weight $w(t)$ at the tth iteration is set according to the following equation:

$$w(t) = w_{max} - t \cdot (w_{max} - w_{min}) / iter_{max} \tag{3}$$

where $w_{max}$ is the initial weight, $w_{min}$ is the final weight, and $iter_{max}$ is the maximum iteration number. Applying a large inertia weight at the start of the algorithm and letting it decay to a small value during the PSO execution makes the algorithm search globally at the beginning of the search and locally at the end of the execution. In the PSO, each particle moves in the search space with a velocity according to its own previous best solution and its group's previous best solution. The parameters $c_1$ and $c_2$ determine the relative pull of pbest and gbest, and the random numbers $r_1$ and $r_2$ vary these pulls stochastically. In the above equations, superscripts denote the iteration number. Fig. 1 shows the velocity and position updates of a particle for a two-dimensional parameter space, and the overall computational flow chart of the PSO algorithm is given in Fig. 2.

Fig. 1. Description of velocity and position updates in particle swarm optimization for two-dimensional parameter space.

    Initialize;
        Generate random population of N solutions (particles);
        Initialize the value of the weight factor w;
    Do
        For each particle i ∈ N;
            Calculate objective value (i);
        For each particle;
            Set pbest as the best position of particle i;
            If objective value (i) is better than pbest;
                pbest(i) = objective value (i);
        End;
        Set gbest as the best objective value of all particles;
        For each particle;
            Calculate particle velocity according to Eq. (1);
            Update particle position according to Eq. (2);
        End;
        Update the value of the weight factor w according to Eq. (3);
    Until
        Check if termination condition is true.

Fig. 2. Particle swarm optimization algorithm.
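The update rules of Eqs. (1)–(3) and the flow of Fig. 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `pso_minimize`, the zero initial velocities, the clamping of positions to the search bounds, the per-dimension random numbers, and the default acceleration factors `c1 = c2 = 1.5` are all assumptions made for this example, and the sphere function merely stands in for the forecasting objective.

```python
import random


def pso_minimize(objective, dim, lower, upper, swarm_size=30, iter_max=200,
                 c1=1.5, c2=1.5, w_max=0.9, w_min=0.4, seed=0):
    """Minimise `objective` over [lower, upper]^dim with a basic PSO (illustrative)."""
    rng = random.Random(seed)
    # Random initial positions; zero initial velocities are a simplification.
    x = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(swarm_size)]
    v = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in x]
    pbest_val = [float("inf")] * swarm_size
    gbest, gbest_val = x[0][:], float("inf")

    for t in range(iter_max):
        # Evaluate every particle and refresh its personal best (pbest).
        for i in range(swarm_size):
            val = objective(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
        # gbest is the best of all personal bests.
        best = min(range(swarm_size), key=lambda i: pbest_val[i])
        gbest, gbest_val = pbest[best][:], pbest_val[best]

        # Eq. (3): linearly decreasing inertia weight.
        w = w_max - t * (w_max - w_min) / iter_max
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (1): velocity update.
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                # Eq. (2): position update, clamped to the bounds (assumption).
                x[i][d] = min(max(x[i][d] + v[i][d], lower), upper)
    return gbest, gbest_val


# Toy usage: minimise the sphere function in two dimensions.
best_position, best_value = pso_minimize(lambda p: sum(q * q for q in p),
                                         dim=2, lower=-5.0, upper=5.0)
print(best_position, best_value)
```

The personal and global bests are refreshed for the whole swarm before any particle moves, which mirrors the order of the loops in Fig. 2.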
3. A particle swarm optimization based two-factors high-order fuzzy time series

In this section, we present a new forecasting method using particle swarm optimization and two-factors high-order fuzzy time series. Let $F(t)$ be a fuzzy time series. If $F(t)$ is caused by $F(t-1), F(t-2), \ldots$, and $F(t-n)$, then the fuzzy logical relationship is represented by

$$F(t-n), \ldots, F(t-2), F(t-1) \to F(t) \tag{4}$$

and it is called a one-factor nth-order fuzzy time series forecasting model. If $F(t)$ is caused by $(F_1(t-1), F_2(t-1)), (F_1(t-2), F_2(t-2)), \ldots$, and $(F_1(t-n), F_2(t-n))$, then this fuzzy logical relationship is represented by

$$(F_1(t-n), F_2(t-n)), \ldots, (F_1(t-2), F_2(t-2)), (F_1(t-1), F_2(t-1)) \to F(t) \tag{5}$$

and it is called the two-factors nth-order fuzzy time series forecasting model, where $F_1(t)$ and $F_2(t)$ are called the main-factor and the second-factor, respectively. Based on the two-factors high-order fuzzy time series adopted from Lee et al. (2008), the proposed forecasting algorithm can be described as follows:

Step 1: Define the universe of discourse $U$ of the main-factor, $U = [D_{min} - D_1, D_{max} + D_2]$, where $D_{min}$ and $D_{max}$ are the minimum value and the maximum value of the known historical data, respectively, and $D_1$ and $D_2$ are two proper positive real numbers to divide the universe of discourse $U$ into equal length intervals. Also define the universe of discourse $V$ of the second-factor, $V = [E_{min} - E_1, E_{max} + E_2]$, where $E_{min}$ and $E_{max}$ are the minimum value and the maximum value of the known historical data, respectively, and $E_1$ and $E_2$ are two selected positive numbers to divide the universe of discourse $V$ into equal length intervals. Initialize the PSO parameters (weight factor, acceleration factors and maximum iteration) with the predefined values and generate the initial random velocity and position of all particles. Divide each particle into two parts, consisting of $n-1$ 'X' values and $m-1$ 'Y' values, where each X corresponds to an interval of the main-factor and each Y corresponds to an interval of the second-factor. The format of each particle is represented as follows:

$$(x_{i1}, x_{i2}, \ldots, x_{i(n-1)}, y_{i1}, y_{i2}, \ldots, y_{i(m-1)})$$

where $x_{i1} \le x_{i2} \le \cdots \le x_{i(n-1)}$ and $y_{i1} \ge y_{i2} \ge \cdots \ge y_{i(m-1)}$, for $i = 1, 2, \ldots, c$; $c$ is the swarm size. Here, the universe of discourse $U$ of the main-factor is partitioned into $n$ intervals $u_1, u_2, \ldots, u_n$, where $u_1 = [U_{min}, x_{i1}]$, $u_2 = [x_{i1}, x_{i2}], \ldots$, and $u_n = [x_{i(n-1)}, U_{max}]$. Also, the universe of discourse $V$ of the second-factor is partitioned into $m$ intervals $v_1, v_2, \ldots, v_m$, where $v_1 = [y_{i1}, V_{max}]$, $v_2 = [y_{i2}, y_{i1}], \ldots$, and $v_m = [V_{min}, y_{i(m-1)}]$.

Step 2: Encode the values from the position of all particles into the particles in the PSO. Here, the two factors are encoded in a single particle as shown in Fig. 3.

Fig. 3. The encoding format of the particles in the PSO.

Step 3: Define the linguistic terms $A_i$, for $i = 1, 2, \ldots, n$, which are represented by fuzzy sets of the main-factor:

$$\begin{aligned}
A_1 &= 1/u_1 + 0.5/u_2 + 0/u_3 + 0/u_4 + 0/u_5 + \cdots + 0/u_{n-2} + 0/u_{n-1} + 0/u_n \\
A_2 &= 0.5/u_1 + 1/u_2 + 0.5/u_3 + 0/u_4 + 0/u_5 + \cdots + 0/u_{n-2} + 0/u_{n-1} + 0/u_n \\
A_3 &= 0/u_1 + 0.5/u_2 + 1/u_3 + 0.5/u_4 + 0/u_5 + \cdots + 0/u_{n-2} + 0/u_{n-1} + 0/u_n \\
&\;\;\vdots \\
A_n &= 0/u_1 + 0/u_2 + 0/u_3 + 0/u_4 + 0/u_5 + \cdots + 0/u_{n-2} + 0.5/u_{n-1} + 1/u_n
\end{aligned} \tag{6}$$

where $A_1, A_2, \ldots, A_n$ are linguistic terms describing the values of the main-factor, which are divided into the intervals $u_1, u_2, \ldots, u_n$. Also define the linguistic terms $B_j$, for $j = 1, 2, \ldots, m$, which are represented by fuzzy sets of the second-factor as follows:

$$\begin{aligned}
B_1 &= 1/v_1 + 0.5/v_2 + 0/v_3 + \cdots + 0/v_{m-2} + 0/v_{m-1} + 0/v_m \\
B_2 &= 0.5/v_1 + 1/v_2 + 0.5/v_3 + \cdots + 0/v_{m-2} + 0/v_{m-1} + 0/v_m \\
B_3 &= 0/v_1 + 0.5/v_2 + 1/v_3 + \cdots + 0/v_{m-2} + 0/v_{m-1} + 0/v_m \\
&\;\;\vdots \\
B_m &= 0/v_1 + 0/v_2 + 0/v_3 + \cdots + 0/v_{m-2} + 0.5/v_{m-1} + 1/v_m
\end{aligned} \tag{7}$$

where $B_1, B_2, \ldots, B_m$ are linguistic terms describing the values of the second-factor, which are divided into the intervals $v_1, v_2, \ldots, v_m$.

Step 4: Fuzzify the historical data as follows. Find the interval $u_i$, where $1 \le i \le n$, to which the value of the main-factor belongs.

Case (1) If the value of the main-factor belongs to $u_1$, then it is fuzzified into $1/A_1 + 0.5/A_2$, denoted by $X_1$.
Case (2) If the value of the main-factor belongs to $u_i$, where $2 \le i \le n-1$, then it is fuzzified into $0.5/A_{i-1} + 1/A_i + 0.5/A_{i+1}$, denoted by $X_i$.
Case (3) If the value of the main-factor belongs to $u_n$, then it is fuzzified into $0.5/A_{n-1} + 1/A_n$, denoted by $X_n$.

Find the interval $v_j$, where $1 \le j \le m$, to which the value of the second-factor belongs.

Case (1) If the value of the second-factor belongs to $v_1$, then it is fuzzified into $1/B_1 + 0.5/B_2$, denoted by $Y_1$.
Case (2) If the value of the second-factor belongs to $v_j$, where $2 \le j \le m-1$, then it is fuzzified into $0.5/B_{j-1} + 1/B_j + 0.5/B_{j+1}$, denoted by $Y_j$.
Case (3) If the value of the second-factor belongs to $v_m$, then it is fuzzified into $0.5/B_{m-1} + 1/B_m$, denoted by $Y_m$.
Step 5: Get the two-factors nth-order fuzzy logical relationships based on the fuzzified main-factor and the fuzzified second-factor from the fuzzified historical data obtained in Step 4. If the fuzzified historical data of the main-factor of day $i$ is $X_i$, then construct the two-factors kth-order fuzzy logical relationships '$((X_{i-k}, Y_{i-k}), \ldots, (X_{i-2}, Y_{i-2}), (X_{i-1}, Y_{i-1})) \to X_i$' from day $i-k$ to day $i$, where $2 \le k \le n$ and $X_{i-k}, \ldots, X_{i-2}, X_{i-1}$ denote the fuzzified values of the main-factor of days $i-k, \ldots, i-2, i-1$, respectively; $Y_{i-k}, \ldots, Y_{i-2}, Y_{i-1}$ denote the fuzzified values of the second-factor of days $i-k, \ldots, i-2, i-1$, respectively. Then, divide the derived fuzzy logical relationships into fuzzy logical relationship groups based on the current states of the fuzzy logical relationships.

Step 6: Calculate the forecasted values for each individual based on the following principles.

(1) If the two-factors kth-order fuzzified historical data before day $i$ are $(X_{i-k}, Y_{i-k}), (X_{i-(k-1)}, Y_{i-(k-1)}), \ldots$, and $(X_{i-1}, Y_{i-1})$, where $k \ge 2$; $X_{i-k}, X_{i-(k-1)}, \ldots, X_{i-1}$ and $X_j$ are fuzzified values represented by fuzzy sets of the main-factor fuzzy time series; $Y_{i-k}, Y_{i-(k-1)}, \ldots, Y_{i-1}$ are fuzzified values represented by fuzzy sets of the second-factor fuzzy time series, and there is the following fuzzy logical relationship in the kth-order fuzzy logical relationship groups:

$$(X_{i-k}, Y_{i-k}), (X_{i-(k-1)}, Y_{i-(k-1)}), \ldots, (X_{i-1}, Y_{i-1}) \to X_j$$

then the forecasted value $t_j$ of day $i$ is calculated as follows:

$$t_j = \begin{cases}
\dfrac{m_1 + 0.5\,m_2}{1 + 0.5}, & \text{if } j = 1 \\[2ex]
\dfrac{0.5\,m_{j-1} + m_j + 0.5\,m_{j+1}}{0.5 + 1 + 0.5}, & \text{if } 2 \le j \le n-1 \\[2ex]
\dfrac{0.5\,m_{n-1} + m_n}{0.5 + 1}, & \text{if } j = n
\end{cases} \tag{8}$$

where $m_{j-1}$, $m_j$ and $m_{j+1}$ are the midpoints of the intervals $u_{j-1}$, $u_j$ and $u_{j+1}$, respectively.

(2) If the two-factors kth-order fuzzified historical data before day $i$ are $(X_{i-k}, Y_{i-k}), (X_{i-(k-1)}, Y_{i-(k-1)}), \ldots$, and $(X_{i-1}, Y_{i-1})$, where $k \ge 2$; $X_{i-k}, X_{i-(k-1)}, \ldots, X_{i-1}, X_{j1}, X_{j2}, \ldots$, and $X_{jp}$ are fuzzified values represented by fuzzy sets of the main-factor fuzzy time series; $Y_{i-k}, Y_{i-(k-1)}, \ldots, Y_{i-1}$ are fuzzified values represented by fuzzy sets of the second-factor fuzzy time series, and the fuzzy logical relationships in the kth-order fuzzy logical relationship group are as follows:

$$\begin{aligned}
(X_{i-k}, Y_{i-k}), (X_{i-(k-1)}, Y_{i-(k-1)}), \ldots, (X_{i-1}, Y_{i-1}) &\to X_{j1} \\
(X_{i-k}, Y_{i-k}), (X_{i-(k-1)}, Y_{i-(k-1)}), \ldots, (X_{i-1}, Y_{i-1}) &\to X_{j2} \\
&\;\;\vdots \\
(X_{i-k}, Y_{i-k}), (X_{i-(k-1)}, Y_{i-(k-1)}), \ldots, (X_{i-1}, Y_{i-1}) &\to X_{jp}
\end{aligned} \tag{9}$$

and the numbers of $X_{j1}, X_{j2}, \ldots$, and $X_{jp}$ appearing in the fuzzy logical relationship group are $n_{j1}, n_{j2}, \ldots$, and $n_{jp}$, respectively, then the forecasted value of day $i$ is calculated as follows:

$$\frac{n_{j1} \times t_{j1} + n_{j2} \times t_{j2} + \cdots + n_{jp} \times t_{jp}}{n_{j1} + n_{j2} + \cdots + n_{jp}} \tag{10}$$

where the values of $t_{j1}, t_{j2}, \ldots$, and $t_{jp}$ are each calculated by Eq. (8).

(3) If the two-factors kth-order fuzzified historical data before day $i$ are $(X_{i-k}, Y_{i-k}), (X_{i-(k-1)}, Y_{i-(k-1)}), \ldots$, and $(X_{i-1}, Y_{i-1})$, where $k \ge 2$; $X_{i-k}, X_{i-(k-1)}, \ldots$, and $X_{i-1}$ are fuzzified values represented by fuzzy sets of the main-factor fuzzy time series; $Y_{i-k}, Y_{i-(k-1)}, \ldots, Y_{i-1}$ are fuzzified values represented by fuzzy sets of the second-factor fuzzy time series, and there is the following fuzzy logical relationship in the kth-order fuzzy logical relationship groups, in which the right-hand side is an unknown value '#':

$$(X_{i-k}, Y_{i-k}), (X_{i-(k-1)}, Y_{i-(k-1)}), \ldots, (X_{i-1}, Y_{i-1}) \to \# \tag{11}$$

then the forecasted value of day $i$ is calculated as follows:

$$\frac{1 \times t_{i-k} + 2 \times t_{i-(k-1)} + \cdots + k \times t_{i-1}}{1 + 2 + \cdots + k} \tag{12}$$

where the values of $t_{i-k}, t_{i-(k-1)}, \ldots$, and $t_{i-1}$ are each calculated by Eq. (8).

Here, we use the mean square error (MSE) shown in Eq. (13) as the objective value of each particle in the particle swarm optimization.

$$MSE = \frac{\sum_{i=1}^{n} (\text{Forecasted value of } i - \text{Actual value of } i)^2}{n} \tag{13}$$

Step 7: Calculate the pbest based on the objective values for each particle. If the obtained objective value is better than the previous value, then the pbest is updated with the new value for each particle. Otherwise, the pbest keeps the previous best value. Then set the gbest of all particles by choosing the best pbest of all particles. If the new objective value of gbest is better than the previous value, then the gbest is updated with the new value; otherwise it keeps the previous value.

Step 8: For each particle, calculate the particle velocity according to Eq. (1), which is based on the gbest and the pbest values, and then update the particle position according to Eq. (2). This step leads each particle to a more promising solution for the optimal tuning of the two-factors nth-order fuzzy time series.

Step 9: Update the value of the weight factor $w$ according to Eq. (3).

Step 10: Stop if the stop criterion of the PSO, the maximum number of iterations, is satisfied. Otherwise, go to Step 2.

Table 1
Historical data of the TAIFEX and the TAIEX from August 3, 1998 to September 30, 1998 (Huarng, 2001b).

| Date | TAIFEX index | TAIEX index |
|---|---|---|
| 8/3/1998 | 7552 | 7599 |
| 8/4/1998 | 7560 | 7593 |
| 8/5/1998 | 7487 | 7500 |
| 8/6/1998 | 7462 | 7472 |
| 8/7/1998 | 7515 | 7530 |
| 8/10/1998 | 7365 | 7372 |
| 8/11/1998 | 7360 | 7384 |
| 8/12/1998 | 7330 | 7352 |
| 8/13/1998 | 7291 | 7363 |
| 8/14/1998 | 7320 | 7348 |
| 8/15/1998 | 7300 | 7372 |
| 8/17/1998 | 7219 | 7274 |
| 8/18/1998 | 7220 | 7182 |
| 8/19/1998 | 7285 | 7293 |
| 8/20/1998 | 7274 | 7271 |
| 8/21/1998 | 7225 | 7213 |
| 8/24/1998 | 6955 | 6958 |
| 8/25/1998 | 6949 | 6908 |
| 8/26/1998 | 6790 | 6814 |
| 8/27/1998 | 6835 | 6813 |
| 8/28/1998 | 6695 | 6724 |
| 8/29/1998 | 6728 | 6736 |
| 8/31/1998 | 6566 | 6550 |
| 9/1/1998 | 6409 | 6335 |
| 9/2/1998 | 6430 | 6472 |
| 9/3/1998 | 6200 | 6251 |
| 9/4/1998 | 6403.2 | 6463 |
| 9/5/1998 | 6697.5 | 6756 |
| 9/7/1998 | 6722.3 | 6801 |
| 9/8/1998 | 6859.4 | 6942 |
| 9/9/1998 | 6769.6 | 6895 |
| 9/10/1998 | 6709.75 | 6804 |
| 9/11/1998 | 6726.5 | 6842 |
| 9/14/1998 | 6774.55 | 6860 |
| 9/15/1998 | 6762 | 6858 |
| 9/16/1998 | 6952.75 | 6973 |
| 9/17/1998 | 6906 | 7001 |
| 9/18/1998 | 6842 | 6962 |
| 9/19/1998 | 7039 | 7150 |
| 9/21/1998 | 6861 | 7029 |
| 9/22/1998 | 6926 | 7034 |
| 9/23/1998 | 6852 | 6962 |
| 9/24/1998 | 6890 | 6980 |
| 9/25/1998 | 6871 | 6980 |
| 9/28/1998 | 6840 | 6911 |
| 9/29/1998 | 6806 | 6885 |
| 9/30/1998 | 6787 | 6834 |

Fig. 5. Convergence of best objective values for GA and PSO based forecasting methods (seventh-order case).

Fig. 6. Convergence of average objective values for GA and PSO based forecasting methods (seventh-order case).

Table 2
A comparison of mean square errors for different orders of each method.

| Algorithms | First-order | Second-order | Third-order | Fourth-order | Fifth-order | Sixth-order | Seventh-order | Eighth-order |
|---|---|---|---|---|---|---|---|---|
| GA based forecasting (Lee et al., 2008) | 799.19 | 193.88 | 208.79 | 142.26 | 143.31 | 147.14 | 105.02 | 124.45 |
| PSO based forecasting | 577.45 | 115.17 | 121.55 | 101.67 | 70.47 | 72.38 | 55.96 | 61.71 |

Table 3
The partition of the universe of discourse for both the main-factor and the second-factor of the best particle (seventh-order case).

| Algorithms | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GA based forecasting | 6300 | 6372 | 6455 | 6679 | 6722 | 6730 | 6833 | 6846 | 6876 | 6920 | 6971 | 7107 | 7211 | 7231 | 7369 |
| PSO based forecasting | 6200 | 6414 | 6566 | 6701 | 6726 | 6769 | 6794 | 6848 | 6880 | 6916 | 6952 | 7039 | 7221 | 7288 | 7325 |

| Algorithms | y1 | y2 | y3 | y4 | y5 | y6 | y7 | y8 | y9 | y10 | y11 | y12 | y13 | y14 | y15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GA based forecasting | 7466 | 7170 | 7111 | 6967 | 6939 | 6865 | 6814 | 6785 | 6742 | 6723 | 6676 | 6641 | 6571 | 6536 | 6488 |
| PSO based forecasting | 7481 | 7301 | 7297 | 7004 | 6996 | 6978 | 6929 | 6902 | 6588 | 6532 | 6412 | 6408 | 6398 | 6361 | 6304 |

The overall computational flow of the proposed PSO based two-factors high-order fuzzy time series forecasting is described in Fig. 4.

    Initialize;
        Define the universe of discourse (U, V);
        Generate random population of N solutions (particles);
        Initialize the value of the weight factor w;
    Do
        For each particle i ∈ N;
            Construct the two-factors nth-order fuzzy time series of particle i;
            Calculate objective value (i) according to Eq. (13);
            Set pbest as the best position of particle i;
            If objective value (i) is better than pbest;
                pbest(i) = objective value (i);
        End;
        Set gbest as the best objective value of all particles;
        For each particle;
            Calculate particle velocity according to Eq. (1);
            Update particle position according to Eq. (2);
        End;
        Update the value of the weight factor w according to Eq. (3);
    Until
        Check if termination condition is true.

Fig. 4. PSO based two-factors high-order fuzzy time series forecasting algorithm.
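As a rough illustration of how one particle could be scored, the following Python sketch strings together the fuzzification of Step 4, the relationship groups of Step 5, the forecasts of Eqs. (8), (10) and (12), and the MSE of Eq. (13). It is a simplified reading of the procedure, not the authors' code: all function names are hypothetical, each fuzzified value is represented only by the index of its peak interval, the second-factor intervals are indexed in ascending order (a relabelling of $v_1, \ldots, v_m$), and the toy series and cut points are invented for the example.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def interval_index(value, cuts):
    """0-based index of the interval holding `value`; `cuts` are sorted inner boundaries."""
    return bisect_right(cuts, value)


def midpoints(lo, hi, cuts):
    """Midpoints m_1..m_n of the intervals defined by [lo, cuts..., hi]."""
    edges = [lo, *cuts, hi]
    return [(a + b) / 2 for a, b in zip(edges, edges[1:])]


def defuzzify(j, mids):
    """Eq. (8) for the fuzzy set whose peak lies in interval j (0-based)."""
    n = len(mids)
    if j == 0:
        return (mids[0] + 0.5 * mids[1]) / 1.5
    if j == n - 1:
        return (0.5 * mids[n - 2] + mids[n - 1]) / 1.5
    return (0.5 * mids[j - 1] + mids[j] + 0.5 * mids[j + 1]) / 2.0


def forecast_unknown(prev_states, mids):
    """Eq. (12): weights 1..k applied to t_{i-k}, ..., t_{i-1}."""
    k = len(prev_states)
    weighted = sum((r + 1) * defuzzify(j, mids) for r, j in enumerate(prev_states))
    return weighted / (k * (k + 1) / 2)


def forecast(main, second, x_cuts, y_cuts, u_lo, u_hi, k):
    """In-sample kth-order forecasts for days k, k+1, ... of the main factor."""
    xs = [interval_index(a, x_cuts) for a in main]      # Step 4, main-factor
    ys = [interval_index(b, y_cuts) for b in second]    # Step 4, second-factor
    mids = midpoints(u_lo, u_hi, x_cuts)
    groups = defaultdict(Counter)                       # Step 5: group by left-hand side
    for i in range(k, len(main)):
        groups[tuple(zip(xs[i - k:i], ys[i - k:i]))][xs[i]] += 1
    preds = []
    for i in range(k, len(main)):
        rhs = groups.get(tuple(zip(xs[i - k:i], ys[i - k:i])))
        if rhs:   # Eqs. (8) and (10): count-weighted average over the group
            total = sum(rhs.values())
            preds.append(sum(c * defuzzify(j, mids) for j, c in rhs.items()) / total)
        else:     # Eq. (12): unknown right-hand side '#'
            preds.append(forecast_unknown(xs[i - k:i], mids))
    return preds


def mse(pred, actual):
    """Eq. (13)."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred)


# Toy usage with invented data: three intervals on [5, 35], second-order model.
toy_main = [10, 20, 30, 20, 10, 20, 30]
toy_second = [10, 20, 30, 20, 10, 20, 30]
toy_preds = forecast(toy_main, toy_second, [15, 25], [15, 25], 5, 35, k=2)
print(toy_preds, mse(toy_preds, toy_main[2:]))
```

In a PSO run, `x_cuts` and `y_cuts` would be decoded from a particle's position, and the value returned by `mse` would serve as that particle's objective value.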
For reference, the initial parameters of the PSO and GA algorithms are presented in Table 5.

Table 5
Parameters used for PSO and GA based forecasting.

| PSO parameters | GA parameters |
|---|---|
| Swarm size: 30 | Population size: 30 |
| Maximum number of iterations: 500 | Maximum number of iterations: 500 |
| $c_1 = 0.2$, $c_2 = 2$ | Type of selection: grade selection |
| $w_{min}, w_{max} = 0.4, 0.9$ | Type of crossover: simple crossover [0.8] |
| | Type of mutation: dynamic mutation [0.05] |

Universe of discourse: main-factor $[U_{min} = 6100, U_{max} = 7700]$, second-factor $[V_{min} = 6100, V_{max} = 7700]$.

Table 4
A comparison of the forecasting values of the TAIFEX and the mean square errors for different forecasting methods. Huarng (2001a) is the two-variable heuristic method and Huarng (2001b) the three-variable heuristic method; Lee et al. (2008) is the GA based method (seventh-order; annealing constant $\alpha = 0.9$); the proposed method is shown for the seventh-order case.

| Date | Actual TAIFEX index | Chen's method (Chen, 1996) | Huarng's method (Huarng, 2001a) | Huarng's method (Huarng, 2001b) | Lee et al.'s method (Lee et al., 2006) | Lee et al.'s method (Lee et al., 2008) | Proposed forecasting method |
|---|---|---|---|---|---|---|---|
| 8/3/1998 | 7552 | – | – | – | – | – | – |
| 8/4/1998 | 7560 | 7450 | 7450 | 7450 | – | – | – |
| 8/5/1998 | 7487 | 7450 | 7450 | 7450 | – | – | – |
| 8/6/1998 | 7462 | 7500 | 7450 | 7500 | 7450 | – | – |
| 8/7/1998 | 7515 | 7500 | 7500 | 7500 | 7550 | – | – |
| 8/10/1998 | 7365 | 7450 | 7450 | 7450 | 7350 | – | – |
| 8/11/1998 | 7360 | 7300 | 7350 | 7300 | 7350 | – | – |
| 8/12/1998 | 7330 | 7300 | 7300 | 7300 | 7350 | 7329 | 7325 |
| 8/13/1998 | 7291 | 7300 | 7350 | 7300 | 7250 | 7289.5 | 7287.5 |
| 8/14/1998 | 7320 | 7183.33 | 7100 | 7188.33 | 7350 | 7329 | 7325 |
| 8/15/1998 | 7300 | 7300 | 7350 | 7300 | 7350 | 7289.5 | 7287.5 |
| 8/17/1998 | 7219 | 7300 | 7300 | 7300 | 7250 | 7215 | 7221.3 |
| 8/18/1998 | 7220 | 7183.33 | 7100 | 7100 | 7250 | 7215 | 7221.3 |
| 8/19/1998 | 7285 | 7183.33 | 7300 | 7300 | 7250 | 7289.5 | 7287.5 |
| 8/20/1998 | 7274 | 7183.33 | 7100 | 7188.33 | 7250 | 7289.5 | 7287.5 |
| 8/21/1998 | 7225 | 7183.33 | 7100 | 7100 | 7250 | 7215 | 7221.3 |
| 8/24/1998 | 6955 | 7183.33 | 7100 | 7100 | 6950 | 6949.5 | 6952.2 |
| 8/25/1998 | 6949 | 6850 | 6850 | 6850 | 6950 | 6949.5 | 6952.2 |
| 8/26/1998 | 6790 | 6850 | 6850 | 6850 | 6750 | 6796 | 6794.3 |
| 8/27/1998 | 6835 | 6775 | 6650 | 6775 | 6850 | 6848 | 6848.2 |
| 8/28/1998 | 6695 | 6850 | 6750 | 6750 | 6650 | 6698.5 | 6700.8 |
| 8/29/1998 | 6728 | 6750 | 6750 | 6750 | 6750 | 6726 | 6725.6 |
| 8/31/1998 | 6566 | 6775 | 6650 | 6650 | 6550 | 6569.5 | 6566 |
| 9/1/1998 | 6409 | 6450 | 6450 | 6450 | 6450 | 6417 | 6414.1 |
| 9/2/1998 | 6430 | 6450 | 6550 | 6550 | 6450 | 6417 | 6414.1 |
| 9/3/1998 | 6200 | 6450 | 6350 | 6350 | 6250 | 6205 | 6200 |
| 9/4/1998 | 6403.2 | 6450 | 6450 | 6450 | 6450 | 6417 | 6414.1 |
| 9/5/1998 | 6697.5 | 6450 | 6550 | 6550 | 6650 | 6698.5 | 6700.8 |
| 9/7/1998 | 6722.3 | 6750 | 6750 | 6750 | 6750 | 6726 | 6725.6 |
| 9/8/1998 | 6859.4 | 6775 | 6850 | 6850 | 6850 | 6848 | 6848.2 |
| 9/9/1998 | 6769.6 | 6850 | 6750 | 6750 | 6750 | 6763 | 6768.7 |
| 9/10/1998 | 6709.75 | 6775 | 6650 | 6650 | 6750 | 6726 | 6700.8 |
| 9/11/1998 | 6726.5 | 6775 | 6850 | 6775 | 6750 | 6726 | 6725.6 |
| 9/14/1998 | 6774.55 | 6775 | 6850 | 6775 | 6817 | 6763 | 6768.7 |
| 9/15/1998 | 6762 | 6775 | 6650 | 6775 | 6817 | 6763 | 6768.7 |
| 9/16/1998 | 6952.75 | 6775 | 6850 | 6850 | 6817 | 6949.5 | 6952.2 |
| 9/17/1998 | 6906 | 6850 | 6950 | 6850 | 6950 | 6904.5 | 6916 |
| 9/18/1998 | 6842 | 6850 | 6850 | 6850 | 6850 | 6848 | 6848.2 |
| 9/19/1998 | 7039 | 6850 | 6950 | 6850 | 7050 | 7064 | 7039 |
| 9/21/1998 | 6861 | 6850 | 6850 | 6850 | 6850 | 6848 | 6848.2 |
| 9/22/1998 | 6926 | 6850 | 6850 | 6850 | 6950 | 6904.5 | 6916 |
| 9/23/1998 | 6852 | 6850 | 6850 | 6850 | 6850 | 6848 | 6848.2 |
| 9/24/1998 | 6890 | 6850 | 6950 | 6850 | 6850 | 6904.5 | 6880.5 |
| 9/25/1998 | 6871 | 6850 | 6850 | 6850 | 6850 | 6848 | 6880.5 |
| 9/28/1998 | 6840 | 6850 | 6750 | 6750 | 6850 | 6848 | 6848.2 |
| 9/29/1998 | 6806 | 6850 | 6750 | 6850 | 6850 | 6796 | 6794.3 |
| 9/30/1998 | 6787 | 6850 | 6750 | 6750 | 6750 | 6796 | 6794.3 |
| MSE | | 9668.94 | 7856.5 | 5437.58 | 1364.56 | 105.02 | 55.96 |
4. Experimental results

We use two stock market datasets to compare the proposed method with the most recently studied GA based two-factors high-order fuzzy time series (Lee et al., 2008). In the first experiment, we apply the proposed method to the TAIFEX from August 3, 1998 to September 30, 1998, where the TAIFEX is the main-factor and the TAIEX (Taiwan stock exchange capitalization weighted stock index) is the second-factor. In the second experiment, we forecast the Korea composite stock price index (KOSPI) 200, where the futures price is taken as the main-factor and the underlying price as the second-factor. To compare the performance of the two methods, each experiment is executed fifty times for each fixed population size and number of iterations, and we present the best-performing solution of each experiment.

4.1. Forecasting for TAIFEX

Table 1 shows the historical data of the TAIFEX and the TAIEX from August 3, 1998 to September 30, 1998. The following figures show the performance of the TAIFEX forecasting. The convergence of the best objective values and the average objective values for the GA and PSO based two-factors seventh-order fuzzy time series is depicted in Figs. 5 and 6, respectively. These results show that the proposed method has better prediction accuracy in terms of mean square error. The simulation results for each order of each model are presented in Table 2, where the proposed method shows better forecasting accuracy than the GA based forecasting method for all orders. It is noted that the seventh-order case shows the best performance. The partitions of the universe of discourse of both the main-factor and the second-factor that give the best forecasting are shown in Table 3, where the universes of discourse of the main-factor and the second-factor are each divided into 16 intervals, as in Lee et al. (2008). Table 4 compares the forecasted results of the TAIFEX and the mean square errors of different forecasting methods. As shown in this table, the proposed method gives better forecasting results than the previous forecasting methods.

4.2. Forecasting for KOSPI 200
Table 6
Historical data of the KOSPI 200 from March 3, 2008 to April 30, 2008.

| Date | Futures price | Underlying price |
|---|---|---|
| 3/3/2008 | 213.05 | 211.73 |
| 3/4/2008 | 213.65 | 212.14 |
| 3/5/2008 | 213.85 | 212.14 |
| 3/6/2008 | 217.4 | 214.92 |
| 3/7/2008 | 211.5 | 210.57 |
| 3/10/2008 | 208.05 | 206.29 |
| 3/11/2008 | 208.9 | 208.06 |
| 3/12/2008 | 212.8 | 210.49 |
| 3/13/2008 | 206 | 204.42 |
| 3/14/2008 | 206 | 202.63 |
| 3/17/2008 | 200.95 | 199.68 |
| 3/18/2008 | 202.95 | 201.79 |
| 3/19/2008 | 208.3 | 206.48 |
| 3/20/2008 | 207.35 | 206.62 |
| 3/21/2008 | 211.55 | 209.69 |
| 3/24/2008 | 212.8 | 211 |
| 3/25/2008 | 214.9 | 213.73 |
| 3/26/2008 | 216.05 | 214.41 |
| 3/27/2008 | 214.8 | 213.85 |
| 3/28/2008 | 219.1 | 217.22 |
| 3/31/2008 | 217.6 | 217.65 |
| 4/1/2008 | 219.1 | 217.81 |
| 4/2/2008 | 225.85 | 223.76 |
| 4/3/2008 | 228 | 226.99 |
| 4/4/2008 | 229.45 | 226.95 |
| 4/7/2008 | 229.75 | 227.79 |
| 4/8/2008 | 227.15 | 225.1 |
| 4/10/2008 | 228.4 | 226.44 |
| 4/11/2008 | 229.9 | 228.8 |
| 4/14/2008 | 225.7 | 224.54 |
| 4/15/2008 | 224.6 | 223.51 |
| 4/16/2008 | 227.3 | 225.36 |
| 4/17/2008 | 228.85 | 226.98 |
| 4/18/2008 | 228.9 | 227.25 |
| 4/21/2008 | 232.5 | 231.11 |
| 4/22/2008 | 230.9 | 229.22 |
| 4/23/2008 | 232.35 | 230.56 |
| 4/24/2008 | 232.7 | 230.77 |
| 4/25/2008 | 236.55 | 234.56 |
| 4/28/2008 | 236.4 | 234.79 |
| 4/29/2008 | 234.75 | 233.24 |
| 4/30/2008 | 237 | 235 |
The Korea composite stock price index (KOSPI) 200 consists of 200 dominant items selected from all stocks listed on the Korea exchange (KRX) stock market. We consider the daily stock price index from March 3, 2008 to April 30, 2008, which is shown in Table 6. Here, the futures price is taken as the main-factor and the underlying price as the second-factor. The historical data of the KOSPI 200 were taken from: http://eng.krx.co.kr/index.html. The following figures show the performance of the KOSPI 200 forecasting. The convergence of the best objective values and of the average objective values for the GA and PSO based two-factors eighth-order fuzzy time series is depicted in Figs. 7 and 8, respectively. More detailed results for forecasting the KOSPI 200 are shown in Table 7. The partitions of the universe of discourse of the main-factor and the second-factor that give the best forecasting are shown in Table 8, where the universe of discourse of each factor is divided into 16 intervals. Table 9 compares the forecasted values of the KOSPI 200 and the mean square errors of each method. The proposed method has a smaller mean square error than the GA based forecasting method.

Fig. 7. Convergence of best objective values for GA and PSO based forecasting methods (eighth-order case).
Fig. 8. Convergence of average objective values for GA and PSO based forecasting methods (eighth-order case).
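The mean square errors reported in Table 9 can be checked directly from the tabulated values. The following Python sketch is not part of the original paper. It assumes that the MSE is the average squared difference between the forecasted and the actual futures price, taken over the 34 trading days that have an eighth-order forecast (3/13/2008 to 4/30/2008). The names `actual`, `ga_forecast`, `pso_forecast` and `mean_square_error` are illustrative.

```python
# Values copied from Table 9 for 3/13/2008 to 4/30/2008.
# The first eight trading days have no eighth-order forecast and are omitted.
actual = [
    206, 206, 200.95, 202.95, 208.3, 207.35, 211.55, 212.8, 214.9, 216.05,
    214.8, 219.1, 217.6, 219.1, 225.85, 228, 229.45, 229.75, 227.15, 228.4,
    229.9, 225.7, 224.6, 227.3, 228.85, 228.9, 232.5, 230.9, 232.35, 232.7,
    236.55, 236.4, 234.75, 237,
]
ga_forecast = [
    205.88, 205.88, 200.94, 203.2, 207.95, 207.95, 212.07, 212.07, 215.3, 215.3,
    215.3, 218.59, 218.59, 218.59, 225.38, 227.48, 229.99, 229.99, 227.48, 228.73,
    229.99, 225.38, 225.38, 227.48, 228.73, 228.73, 232.59, 229.99, 232.59, 232.59,
    237.38, 237.38, 234.51, 237.38,
]
pso_forecast = [
    206, 206, 200.95, 202.95, 207.83, 207.83, 212.18, 212.18, 215.25, 215.25,
    215.25, 218.60, 218.60, 218.60, 225.38, 228.54, 229.70, 229.70, 227.22, 228.54,
    229.70, 225.38, 225.38, 227.22, 228.54, 228.54, 232.52, 230.90, 232.52, 232.52,
    236.65, 236.65, 234.75, 236.65,
]


def mean_square_error(forecast, observed):
    """Average squared forecast error (assumed definition, see the text above)."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed series must have the same length")
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)


ga_mse = mean_square_error(ga_forecast, actual)
pso_mse = mean_square_error(pso_forecast, actual)
print(f"GA based forecasting  MSE: {ga_mse:.2f}")
print(f"PSO based forecasting MSE: {pso_mse:.2f}")
```

Under this assumption the two results round to 0.25 for the GA based method and 0.16 for the PSO based method, in line with the last row of Table 9.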
Table 7
A comparison of mean square errors for different orders of each method.

Algorithms               First-order  Second-order  Third-order  Fourth-order  Fifth-order  Sixth-order  Seventh-order  Eighth-order
GA based forecasting     1.80         0.36          0.26         0.26          0.27         0.26         0.26           0.25
PSO based forecasting    0.71         0.21          0.17         0.24          0.17         0.18         0.17           0.16

Table 8
The partition of the universe of discourse for the main-factor and the second-factor of the best particle (eighth-order case).

Algorithms               x1   x2   x3   x4   x5   x6   x7   x8   x9   x10  x11  x12  x13  x14  x15
GA based forecasting     199  204  207  208  210  214  217  219  221  224  227  229  231  234  239
PSO based forecasting    201  203  206  208  212  215  219  225  227  229  230  231  233  235  237

Algorithms               y1   y2   y3   y4   y5   y6   y7   y8   y9   y10  y11  y12  y13  y14  y15
GA based forecasting     237  232  226  221  216  211  208  202  196  195  192  190  189  188  187
PSO based forecasting    233  223  221  220  219  218  216  214  210  206  206  199  197  196  186

Table 9
A comparison of the forecasting values of the KOSPI 200 and the mean square errors of each method.

Date         Actual futures price    GA based forecasting     PSO based forecasting
                                     (eighth-order case)      (eighth-order case)
3/3/2008     213.05                  –                        –
3/4/2008     213.65                  –                        –
3/5/2008     213.85                  –                        –
3/6/2008     217.4                   –                        –
3/7/2008     211.5                   –                        –
3/10/2008    208.05                  –                        –
3/11/2008    208.9                   –                        –
3/12/2008    212.8                   –                        –
3/13/2008    206                     205.88                   206
3/14/2008    206                     205.88                   206
3/17/2008    200.95                  200.94                   200.95
3/18/2008    202.95                  203.2                    202.95
3/19/2008    208.3                   207.95                   207.83
3/20/2008    207.35                  207.95                   207.83
3/21/2008    211.55                  212.07                   212.18
3/24/2008    212.8                   212.07                   212.18
3/25/2008    214.9                   215.3                    215.25
3/26/2008    216.05                  215.3                    215.25
3/27/2008    214.8                   215.3                    215.25
3/28/2008    219.1                   218.59                   218.60
3/31/2008    217.6                   218.59                   218.60
4/1/2008     219.1                   218.59                   218.60
4/2/2008     225.85                  225.38                   225.38
4/3/2008     228                     227.48                   228.54
4/4/2008     229.45                  229.99                   229.70
4/7/2008     229.75                  229.99                   229.70
4/8/2008     227.15                  227.48                   227.22
4/10/2008    228.4                   228.73                   228.54
4/11/2008    229.9                   229.99                   229.70
4/14/2008    225.7                   225.38                   225.38
4/15/2008    224.6                   225.38                   225.38
4/16/2008    227.3                   227.48                   227.22
4/17/2008    228.85                  228.73                   228.54
4/18/2008    228.9                   228.73                   228.54
4/21/2008    232.5                   232.59                   232.52
4/22/2008    230.9                   229.99                   230.90
4/23/2008    232.35                  232.59                   232.52
4/24/2008    232.7                   232.59                   232.52
4/25/2008    236.55                  237.38                   236.65
4/28/2008    236.4                   237.38                   236.65
4/29/2008    234.75                  234.51                   234.75
4/30/2008    237                     237.38                   236.65
MSE                                  0.25                     0.16

For reference, the initial parameters of the PSO and GA algorithms are listed in Table 10.

Table 10
Parameters used for PSO and GA based forecasting.

PSO parameters
  Swarm size: 30
  Maximum number of iterations: 500
  c1 = 0.2, c2 = 2
  w_min, w_max = 0.4, 0.9

GA parameters
  Population size: 30
  Maximum number of iterations: 500
  Type of selection: grade selection
  Type of crossover: simple crossover [0.8]
  Type of mutation: dynamic mutation [0.05]

Universe of discourse: main-factor [U_min = 185, U_max = 240], second-factor [V_min = 185, V_max = 240]

5. Concluding remarks
The stock market is by nature a highly volatile time series, and it is difficult to express its underlying relationships as a mathematical model. For such real-world problems, fuzzy time series has shown good performance. This paper presented a new forecasting method based on PSO and two-factors high-order fuzzy time series. After applying the proposed method to the real-world datasets of TAIFEX and KOSPI 200, we found that our approach shows better forecasting accuracy than previous ones. In this work, we forecasted the future price of a stock based solely on the trends of the two factors; adding further factors may yield better forecasting accuracy if they are correlated with the stock price. Multi-factor forecasting based on the described scheme is therefore under consideration for further study.

References
Chen, S. M. (1996). Forecasting enrollments based on fuzzy time series. Fuzzy Sets and Systems, 81(3), 311–319.
Chen, S. M. (2002). Forecasting enrollments based on high-order fuzzy time series. Cybernetics and Systems, 33(1), 1–16.
Emad, E., Tarek, H., & Donald, G. (2005). Comparison among five evolutionary-based optimization algorithms. Advanced Engineering Informatics, 19, 43–53.
Gaing, Z. L. (2004). A particle swarm optimization approach for optimum design of PID controller in AVR system. IEEE Transactions on Energy Conversion, 19(2), 384–391.
Gen, M., & Cheng, R. (1997). Genetic algorithms and engineering design. New York: John Wiley & Sons.
Goldberg, D. E. (1998). Genetic algorithms in search optimization and machine learning. Massachusetts: Addison-Wesley.
Goldberg, D. E., Korb, B., & Deb, K. (1989). Messy genetic algorithms: Motivation, analysis, and first results. Complex Systems, 3(5), 493–530.
Holland, J. H. (1975). Adaptation in natural and artificial systems. Massachusetts: MIT Press.
Huarng, K. (2001a). Effective lengths of intervals to improve forecasting in fuzzy time series. Fuzzy Sets and Systems, 123(3), 387–394.
Huarng, K. (2001b). Heuristic models of fuzzy time series for forecasting. Fuzzy Sets and Systems, 123(3), 369–386.
Kennedy, J., & Eberhart, R. C. (1995). Particle swarm optimization. In Proceedings of IEEE international conference on neural networks (Vol. 4, pp. 1942–1948).
Kirkpatrick, S., Gelatt, C. D., Jr., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671–680.
Lee, L. W., & Chen, S. M. (2004). Temperature prediction using genetic algorithms and fuzzy time series. In Proceedings of the 2004 international conference on information management, Miaoli, Taiwan, Republic of China (pp. 299–306).
Lee, L. W., Wang, L. H., Chen, S. M., & Leu, Y. H. (2004). A new method for handling forecasting problems based on two-factors high-order fuzzy time series. In Proceedings of the 2004 ninth conference on artificial intelligence and applications, Taipei, Taiwan, Republic of China.
Lee, L. W., Wang, L. H., & Chen, S. M. (2008). Temperature prediction and TAIFEX forecasting based on high-order fuzzy logical relationships and genetic simulated annealing techniques. Expert Systems with Applications, 34, 328–336.
Lee, L. W., Wang, L. H., Chen, S. M., & Leu, Y. H. (2006). Handling forecasting problems based on two-factors high-order fuzzy time series. IEEE Transactions on Fuzzy Systems, 14(3), 468–477.
Lin, C. H., & Chen, S. M. (2004). A new method for multiple DNA sequence alignment based on genetic simulated annealing algorithms. In Proceedings of the 2004 international conference on information management, Miaoli, Taiwan, Republic of China.
Panda, S., & Padhy, N. P. (2007). Comparison of particle swarm optimization and genetic algorithm for FACTS-based controller design. Applied Soft Computing, ASOC-417, 1–10.
Song, Q., & Chissom, B. S. (1993a). Fuzzy time series and its models. Fuzzy Sets and Systems, 54(3), 269–277.
Song, Q., & Chissom, B. S. (1993b). Forecasting enrollments with fuzzy time series – Part I. Fuzzy Sets and Systems, 54(1), 1–9.
Song, Q., & Chissom, B. S. (1994a). Some properties of defuzzification neural networks. Fuzzy Sets and Systems, 61(1), 83–89.
Song, Q., & Chissom, B. S. (1994b). Forecasting enrollments with fuzzy time series – Part II. Fuzzy Sets and Systems, 62(1), 1–8.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–353.