Expert Systems with Applications 37 (2010) 4261–4265
Time series forecasting by a seasonal support vector regression model

Ping-Feng Pai a,*, Kuo-Ping Lin b, Chi-Shen Lin c, Ping-Teng Chang c

a Department of Information Management, National Chi Nan University, University Rd., Puli, Nantou 545, Taiwan
b Department of Information Management, Lunghwa University of Science and Technology, Taoyuan 333, Taiwan
c Department of Industrial Engineering and Enterprise Information, Box 985, Tunghai University, Taichung 407, Taiwan
Keywords: Seasonal time series; Forecast; Support vector regression; Seasonal autoregressive integrated moving average
Abstract

The support vector regression (SVR) model is a novel forecasting approach that has been successfully used to solve time series problems. However, the application of SVR models to seasonal time series forecasting has not been widely investigated. This study develops a seasonal support vector regression (SSVR) model to forecast seasonal time series data. Seasonal factors and trends are utilized in the SSVR model to perform forecasts. Furthermore, a hybrid genetic algorithm and tabu search (GA/TS) algorithm is applied to select the three parameters of the SSVR model. In this study, two other forecasting models, the seasonal autoregressive integrated moving average (SARIMA) and SVR, are employed to forecast the same data sets. Empirical results indicate that the SSVR model outperforms both the SVR and SARIMA models in terms of forecasting accuracy. Thus, the SSVR model is an effective method for seasonal time series forecasting.

© 2009 Elsevier Ltd. All rights reserved.
1. Introduction

A seasonal time series is a sequence of seasonal data points recorded sequentially in time. Over the past several decades, many works have been devoted to developing and improving seasonal time series forecasting models. The SARIMA model (Box & Jenkins, 1976) is one of the most popular approaches in seasonal time series forecasting. The SARIMA model has been successfully utilized in many fields of forecasting, such as the soil dryness index (Li, Campbell, Haswell, Sneeuwjagt, & Venables, 2003), tourism demand (Goh & Law, 2002; Huang & Min, 2002) and municipal solid waste management (Navarro-Esbrí, Diamadopoulos, & Ginestar, 2002). However, seasonal time series forecasting is a complex and nonlinear problem. The artificial neural network (ANN) model is an alternative for forecasting seasonal data patterns (Zhang & Qi, 2005). Some literature (Nam & Schaefer, 1995; Pai & Hong, 2005; Tang & Fishwick, 1993; Williams, 1997) indicated that ANN models can obtain desirable results in seasonal and trend forecasting. With the introduction of Vapnik's ε-insensitive loss function, SVR (Vapnik, Golowich, & Smola, 1996) has been extended to cope with forecasting problems and has provided many promising results in customer demand (Levis & Papageorgiou, 2005), finance (Huang, Nakamori, & Wang, 2005; Kim, 2003; Tay & Cao, 2002), intermittent demand
* Corresponding author. Tel.: +886 49 2915205.
E-mail addresses: [email protected] (P.-F. Pai), [email protected] (K.-P. Lin), [email protected] (C.-S. Lin), [email protected] (P.-T. Chang).
0957-4174/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2009.11.076
(Hua & Zhang, 2006), tourism demand (Pai & Hong, 2005), air quality (Lu & Wang, 2005), wind speed (Mohandes, Halawani, Rehman, & Hussain, 2004), plant control systems (Xi, Poo, & Chou, 2007), rainfall (Hong & Pai, 2007), electricity market prices (Gao, Bompard, Napoli, & Cheng, 2007) and flood control (Yu, Chen, & Chang, 2006). However, applications of SVR models to seasonal time series data have not been widely studied. Therefore, this study develops an SSVR model that exploits the unique strengths of both decomposition techniques and the SVR model for seasonal time series forecasting problems. The rest of this paper is organized as follows. The SSVR model is introduced in Section 2. In Section 3, two numerical examples are used to demonstrate the performance of the different forecasting models. Finally, conclusions are drawn in Section 4.
2. Forecasting models

2.1. Support vector regression

The support vector regression technique is based on the structural risk minimization principle. Rather than minimizing empirical errors, SVR aims to minimize an upper bound of the generalization error. A regression function is generated by applying a set of high-dimensional linear functions. Given a set of data (x_i, A_i), i = 1, …, N (where x_i is the input vector, A_i is the actual value, and N is the total number of data patterns), the regression function is expressed as follows:

G = wφ(x_i) + b    (1)

where φ(x_i) denotes the feature of the inputs, and w and b are coefficients. The coefficients (w and b) are calculated by minimizing the regularized risk function shown in Eq. (2):

P(G) = C (1/N) Σ_{i=1}^{N} L_ε(A_i, G_i) + (1/2)‖w‖²    (2)

where

L_ε(A_i, G_i) = { 0, if |A_i − G_i| ≤ ε; |A_i − G_i| − ε, otherwise }    (3)

and C and ε are user-defined parameters. The parameter ε is the allowed difference between actual values and values calculated from the regression function. This difference can be treated as a tube around the regression function, and points outside the tube are viewed as training errors. In Eq. (2), L_ε(A_i, G_i) is called the ε-insensitive loss function; the loss equals zero if the forecasted value is within the ε-tube. Furthermore, the second term of Eq. (2), (1/2)‖w‖², measures the flatness of the function. Thus, C is a parameter determining the trade-off between the empirical risk and the model flatness. Two positive slack variables, ξ_i and ξ_i*, representing the distance from actual values to the corresponding boundary values of the ε-tube, are then introduced. These two slack variables equal zero when the data points fall within the ε-tube. Eq. (2) is then reformulated into the following constrained form:

Min (1/2)‖w‖² + C Σ_{i=1}^{N} (ξ_i + ξ_i*)    (4)

subject to

A_i − wφ(x_i) − b ≤ ε + ξ_i,  i = 1, 2, …, N
wφ(x_i) + b − A_i ≤ ε + ξ_i*,  i = 1, 2, …, N
ξ_i, ξ_i* ≥ 0,  i = 1, 2, …, N

This constrained optimization problem is solved by the following primal Lagrangian form:

Min L = (1/2)‖w‖² + C Σ_{i=1}^{N} (ξ_i + ξ_i*) − Σ_{i=1}^{N} β_i*[wφ(x_i) + b − A_i + ε + ξ_i*] − Σ_{i=1}^{N} β_i[A_i − wφ(x_i) − b + ε + ξ_i] − Σ_{i=1}^{N} (α_i ξ_i + α_i* ξ_i*)    (5)

Eq. (5) is minimized with respect to the primal variables w, b, ξ and ξ*, and maximized with respect to the non-negative Lagrangian multipliers α_i, α_i*, β_i and β_i*. Finally, applying the Karush–Kuhn–Tucker conditions to Eq. (4) yields the dual Lagrangian form in Eq. (6):

Max Σ_{i=1}^{N} A_i(β_i − β_i*) − ε Σ_{i=1}^{N} (β_i + β_i*) − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} (β_i − β_i*)(β_j − β_j*) K(x_i, x_j)    (6)

subject to

Σ_{i=1}^{N} (β_i − β_i*) = 0
0 ≤ β_i ≤ C,  i = 1, 2, …, N
0 ≤ β_i* ≤ C,  i = 1, 2, …, N

The Lagrange multipliers in Eq. (6) satisfy the equality β_i β_i* = 0. Once the Lagrange multipliers β_i and β_i* are determined, the optimal weight vector of the regression hyperplane is represented by Eq. (7):

w* = Σ_{i=1}^{N} (β_i − β_i*) φ(x_i)    (7)

Thus, the regression function is expressed by:

G(x, β, β*) = Σ_{i=1}^{N} (β_i − β_i*) K(x, x_i) + b    (8)

K(x_i, x_j) is a kernel function whose value equals the inner product of the two vectors x_i and x_j in the feature space, that is, K(x_i, x_j) = φ(x_i) · φ(x_j). Any function that satisfies Mercer's condition (Mercer, 1909) can serve as the kernel function. This study employs the Gaussian function.
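The pieces above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it shows the ε-insensitive loss of Eq. (3), a Gaussian kernel (the common parameterization exp(−‖x − y‖²/(2σ²)) is assumed, since the paper does not state one), and the dual-form predictor of Eq. (8) for given multiplier differences β_i − β_i* and bias b.

```python
import math

def eps_insensitive_loss(a, g, eps):
    """Eq. (3): zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(a - g) - eps)

def gaussian_kernel(x, y, sigma):
    """Assumed Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svr_predict(x, train_x, beta_diff, b, sigma):
    """Eq. (8): G(x) = sum_i (beta_i - beta_i*) K(x, x_i) + b."""
    return b + sum(d * gaussian_kernel(x, xi, sigma)
                   for d, xi in zip(beta_diff, train_x))
```

In a full implementation the `beta_diff` values would come from solving the dual problem of Eq. (6); here they are inputs so that the prediction formula itself is visible.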
2.2. GA/TS algorithms for parameter selection of SVR models

In this study, the GA/TS algorithm (Glover, Kelly, & Laguna, 1995) with binary coding was employed to determine the parameters of SSVR models. The procedure of GA/TS is as follows.

Step 1 (Initialization): Randomly establish an initial population of chromosomes. The three parameters, σ, C and ε, are expressed in a binary format and represented by a chromosome.

Step 2 (Evaluating fitness): Evaluate the fitness of each chromosome. In this study, the negative MAPE was used as the fitness function:

Fitness = −MAPE(%) = −(100/M) Σ_{t=1}^{M} |g_t − z_t| / g_t    (9)

where M is the number of forecasting periods, g_t is the actual value at period t, and z_t is the forecasting value at period t.

Step 3 (Selection): Based on the fitness values, chromosomes with higher fitness are more likely to yield offspring in the next generation. The roulette wheel selection principle (Holland, 1975) is applied to select chromosomes for reproduction.

Step 4 (Crossover and mutation): Create new offspring by performing crossover and mutation operations. Mutations are performed randomly by flipping a "1" bit to "0" or a "0" bit to "1". The single-point-crossover principle was employed: segments of paired chromosomes between two determined breakpoints are exchanged. The probabilities of crossover and mutation were set at 0.5 and 0.1, respectively.

Step 5 (Perform TS on each chromosome): Evaluate neighbor chromosomes and adjust the tabu list, whose size is 20 in this study. The chromosome with the smallest MAPE value that has not yet been recorded in the tabu list is placed in the tabu list, which is operated with a first-in-first-out policy. If the best neighbor chromosome is the same as one of the chromosomes in the tabu list, the next set of neighbor chromosomes is generated from the best neighbor chromosome of the current iteration and their fitness values are calculated.

Step 6 (Current chromosome selection by TS): If the best neighbor chromosome is better than the current chromosome, the current chromosome is replaced by the best neighbor chromosome; otherwise, the current chromosome is kept.

Step 7 (Next generation): Form a population for the next generation. The size of the population was set to 50.

Step 8 (Stop criterion): If the number of epochs reaches a given limit, the best chromosome is presented as the solution; otherwise, go back to Step 2. The number of epochs was set to 1000.
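The steps above can be sketched as a small GA with a tabu list. This is a simplified illustration under stated assumptions: the parameter bounds, 8-bit genes, and the stand-in error surface `validation_mape` are all hypothetical (a real run would train an SVR and evaluate Eq. (9) on the validation set), while the crossover/mutation probabilities (0.5 and 0.1) and the FIFO tabu list of size 20 follow the paper.

```python
import random
from collections import deque

random.seed(42)
BITS = 8  # assumed bits per parameter in the binary coding
BOUNDS = [(0.1, 10.0), (1.0, 5000.0), (0.001, 10.0)]  # hypothetical (sigma, C, eps) ranges

def decode(chrom):
    """Step 1: map a 24-bit chromosome to real-valued (sigma, C, eps)."""
    out = []
    for i, (lo, hi) in enumerate(BOUNDS):
        gene = chrom[i * BITS:(i + 1) * BITS]
        out.append(lo + (hi - lo) * int(gene, 2) / (2 ** BITS - 1))
    return tuple(out)

def validation_mape(params):
    """Stand-in error surface; a real run trains an SVR and applies Eq. (9)."""
    sigma, c, eps = params
    return abs(sigma - 2.5) + abs(c - 2500.0) / 1000.0 + eps

def ga_ts(pop_size=20, generations=40):
    pop = ["".join(random.choice("01") for _ in range(3 * BITS))
           for _ in range(pop_size)]
    tabu = deque(maxlen=20)  # Step 5: FIFO tabu list of size 20
    best, best_err = pop[0], validation_mape(decode(pop[0]))
    for _ in range(generations):
        # Step 2: fitness = -MAPE, so higher fitness means lower error
        fits = [-validation_mape(decode(c)) for c in pop]
        for c, f in zip(pop, fits):
            if -f < best_err and c not in tabu:  # Steps 5-6: tabu-filtered update
                best, best_err = c, -f
                tabu.append(c)
        # Step 3: roulette-wheel selection (fitness shifted to be positive)
        shift = min(fits)
        weights = [f - shift + 1e-9 for f in fits]
        pop = random.choices(pop, weights=weights, k=pop_size)
        # Step 4: single-point crossover (p=0.5) and bit-flip mutation (p=0.1)
        nxt = []
        for i in range(0, pop_size, 2):
            a, b = pop[i], pop[i + 1]
            if random.random() < 0.5:
                cut = random.randrange(1, 3 * BITS)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            nxt += ["".join(bit if random.random() >= 0.1 else "10"[int(bit)]
                            for bit in c) for c in (a, b)]
        pop = nxt  # Step 7; Step 8 stops after the given number of generations
    return decode(best), best_err
```

A run at the paper's scale would use a population of 50 and 1000 epochs; the smaller defaults here just keep the sketch fast.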
2.3. The proposed SSVR model with GA/TS algorithms

Fig. 1 shows the flowchart of the proposed SSVR model. The seasonal time series data are decomposed into trend and seasonal components. The SVR model is then employed to forecast the deseasonalized data, and the GA/TS algorithm is performed to determine the parameters of the SVR model. To minimize the forecasting error of the SVR model, the negative values of MAPE are treated as the objective values of the GA/TS algorithm; thus, each iteration generates a smaller MAPE value. The parameter search procedure is conducted until the stop criterion is reached. The three parameters resulting in the smallest validation error are then used to develop an appropriate SVR model. Finalized forecasting values are obtained by multiplying the predicted values by the seasonal factors. Finally, after comparison with the testing data, the testing errors are obtained.
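The decompose–forecast–recompose loop can be sketched as follows. The paper does not spell out its decomposition formulas, so the multiplicative seasonal indices below (each period's mean over the grand mean) are an assumed, common choice, not necessarily the authors' exact procedure.

```python
def seasonal_factors(series, s=12):
    """Assumed multiplicative indices: mean of each season / grand mean."""
    grand_mean = sum(series) / len(series)
    return [(sum(series[k::s]) / len(series[k::s])) / grand_mean
            for k in range(s)]

def deseasonalize(series, factors):
    """Divide out the seasonal factor of each period before fitting the SVR."""
    s = len(factors)
    return [x / factors[i % s] for i, x in enumerate(series)]

def reseasonalize(forecasts, factors, start_period):
    """Final SSVR step: multiply predicted values by the seasonal factors."""
    s = len(factors)
    return [f * factors[(start_period + i) % s] for i, f in enumerate(forecasts)]
```

The SVR of Section 2.1, tuned by the GA/TS of Section 2.2, would be applied to the output of `deseasonalize`, and its forecasts passed through `reseasonalize`.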
2.4. SARIMA

The SARIMA model (Box & Jenkins, 1976) is a popular tool for forecasting time series data with a seasonal pattern. The SARIMA(p, d, q)(P, D, Q)_S process generates a time series {X_t, t = 1, 2, …, N} with mean μ satisfying

φ(B)Φ(B^S)(1 − B)^d (1 − B^S)^D (X_t − μ) = θ(B)Θ(B^S) a_t    (10)

where p, d, q, P, D and Q are non-negative integers; S is the seasonal length; φ(B) = 1 − φ_1 B − φ_2 B² − … − φ_p B^p represents a regular autoregressive operator of order p; Φ(B^S) = 1 − Φ_1 B^S − Φ_2 B^{2S} − … − Φ_P B^{PS} is a seasonal autoregressive operator of order P; θ(B) = 1 − θ_1 B − θ_2 B² − … − θ_q B^q denotes a regular moving average operator of order q; and Θ(B^S) = 1 − Θ_1 B^S − Θ_2 B^{2S} − … − Θ_Q B^{QS} expresses a seasonal moving average operator of order Q. Additionally, B indicates the backward shift operator; d denotes the number of regular differences; D represents the number of seasonal differences; and a_t is the residual at time t. When fitting a SARIMA model to data, the first task is to estimate values of d and D, the orders of differencing needed to make the series stationary and to remove most of the seasonality. Suitable values of p, P, q and Q can then be evaluated from the autocorrelation function and partial autocorrelation function of the differenced series. The parameter selection of the SARIMA model includes the following four iterative steps:

(a) Identify a tentative SARIMA model.
(b) Estimate the parameters of the tentative model.
(c) Evaluate the adequacy of the tentative model.
(d) If an appropriate model is obtained, apply it for forecasting; otherwise, go back to step (a).
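The (1 − B)^d(1 − B^S)^D part of Eq. (10) is just repeated differencing, which can be sketched directly. This covers only the differencing step; the ARMA operators would then be fitted on the differenced series.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag): lag=1 gives regular, lag=S seasonal differencing."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

def sarima_difference(series, d=1, D=1, s=12):
    """The (1 - B)^d (1 - B^S)^D transform used to make the series stationary."""
    out = list(series)
    for _ in range(d):
        out = difference(out, 1)
    for _ in range(D):
        out = difference(out, s)
    return out
```

A series with a linear trend plus a fixed seasonal cycle is reduced to a constant (here zero) by one regular and one seasonal difference, which is why d = D = 1 is a common starting point for trended seasonal data.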
3. Numerical examples and experimental results

Two seasonal time series examples (Makridakis, Wheelwright, & Hyndman, 1998) are used to demonstrate the performance of the SSVR model. The first example is the industry sales for printing and writing paper. Fig. 2a shows the monthly industry sales for printing and writing paper from January 1963 to December 1972; a seasonal pattern with an increasing trend can be observed. The second example, illustrated in Fig. 2b, is the monthly shipments of a company that manufactures pollution equipment; it shows a weak seasonal pattern. In addition, MAPE and the root mean square error (RMSE) are used to measure the forecasting accuracy of the three models. Eq. (11) shows the expression of RMSE:

RMSE = [ (1/M) Σ_{t=1}^{M} (g_t − z_t)² ]^{1/2}    (11)

where M is the number of forecasting periods, g_t is the actual value at period t, and z_t is the forecasting value at period t. For both SSVR and SVR models, the experimental data are divided into training, validation and testing data sets. Table 1 depicts the experimental data employed in this work for the SSVR and SVR models. The training data set was used to determine the
Fig. 1. A flowchart of the SSVR model with GA/TS algorithms.
forecasting model; the validation data set was used to prevent over-fitting of the forecasting models; and the testing data set was employed to investigate the performance of the different forecasting models. For both examples, the training and validation data sets are used to design the SARIMA models. To compare forecasting accuracy, the same testing data set is examined for all three forecasting models. Tables 2 and 3 show the forecasting performances and preferred parameters of the three models for the two examples, respectively. The SSVR models outperformed the SVR and SARIMA models in terms of forecasting accuracy for both examples. Furthermore, the SSVR model improves the forecasting accuracy more in example one than in example two. The cause is
Fig. 2. Illustrations of experimental data of two numerical examples: (a) industry sales of printing and writing paper from January 1963 through December 1972 (example one); (b) monthly shipments of pollution equipment from January 1986 through October 1996 (example two).
Table 1
Simulation data sets for two examples.

Data set           Example     Periods                       Number of data
Training data      Example 1   January 1963–December 1970    96
                   Example 2   January 1986–December 1993    96
Validation data    Example 1   January 1971–December 1971    12
                   Example 2   January 1994–December 1994    12
Testing data       Example 1   January 1972–December 1972    12
                   Example 2   January 1995–December 1995    12
Table 2
Forecasting performances and parameters of three models (example one).

Forecasting model   Parameters                                      MAPE (%)   RMSE
SARIMA              (p, d, q)(P, D, Q)_S = (2, 1, 1)(0, 1, 2)_12    22.57      210.94
SVR                 (σ, C, ε) = (2.5, 2549.53, 0.023)               19.75      175.54
SSVR                (σ, C, ε) = (6.947, 994.83, 9.59)               5.47       50.96
Table 3
Forecasting performances and parameters of three models (example two).

Forecasting model   Parameters                                      MAPE (%)   RMSE
SARIMA              (p, d, q)(P, D, Q)_S = (2, 1, 1)(0, 1, 2)_12    24.171     1123.05
SVR                 (σ, C, ε) = (0.708, 3836.64, 5.982)             22.12      1046.715
SSVR                (σ, C, ε) = (0.143, 2602.5, 5.25)               19.389     881.451
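The two accuracy measures behind Tables 2 and 3, MAPE of Eq. (9) and RMSE of Eq. (11), can be sketched as:

```python
import math

def mape(actual, forecast):
    """Eq. (9): MAPE(%) = (100/M) * sum(|g_t - z_t| / g_t)."""
    m = len(actual)
    return (100.0 / m) * sum(abs(g - z) / g for g, z in zip(actual, forecast))

def rmse(actual, forecast):
    """Eq. (11): RMSE = ((1/M) * sum((g_t - z_t)^2))^0.5."""
    m = len(actual)
    return math.sqrt(sum((g - z) ** 2 for g, z in zip(actual, forecast)) / m)
```

Note that MAPE is scale-free while RMSE carries the units of the series, which is why the RMSE values of example two (shipments in the thousands) are much larger than those of example one even at comparable MAPE levels.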
Fig. 3. Illustration of actual values and forecasting values of three models (example one).
Fig. 4. Illustration of actual values and forecasting values of three models (example two).
that example two shows only a weak seasonal pattern. Therefore, the proposed SSVR model is particularly effective for data showing strong seasonal patterns. Figs. 3 and 4 make point-to-point comparisons of the actual and predicted values of examples one and two, respectively. As shown in Figs. 3 and 4, the SVR and SARIMA models cannot efficiently capture the trend of the data, whereas the SSVR model is able to follow the data trend and make accurate forecasts.

4. Conclusions

SVR models have been successfully used in time series forecasting problems, but they have not been widely explored for seasonal time series prediction. This study proposed a hybrid SSVR model for forecasting seasonal time series data, with satisfactory experimental results. The superior performance of the SSVR model can be ascribed to two causes. First, the decomposition method enhances the ability of SSVR models to capture seasonal nonlinear data patterns. Second, the GA/TS algorithm provides suitable values of the three parameters for the SSVR model. For future work, forecasting other types of time series data by an SVR-related model is a challenging issue for study. Another future research direction is the use of data preprocessing techniques to further improve the forecasting accuracy of the SSVR model on seasonal time series data.

Acknowledgment

This research was conducted with the support of the National Science Council (96-2628-E-260-001-MY3).

References

Box, G., & Jenkins, G. (1976). Time series analysis: Forecasting and control. San Francisco: Holden-Day.
Gao, C., Bompard, E., Napoli, R., & Cheng, H. (2007). Price forecast in the competitive electricity market by support vector machine. Physica A, 382, 98–113.
Glover, F., Kelly, J. P., & Laguna, M. (1995). Genetic algorithms and tabu search: Hybrids for optimization. Computers and Operations Research, 22, 111–134.
Goh, C., & Law, R. (2002). Modeling and forecasting tourism demand for arrivals with stochastic nonstationary seasonality and intervention. Tourism Management, 23, 499–510.
Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press.
Hong, W. C., & Pai, P. F. (2007). Potential assessment of the support vector regression technique in rainfall forecasting. Water Resources Management, 21, 495–513.
Hua, Z. S., & Zhang, B. (2006). A hybrid support vector machines and logistic regression approach for forecasting intermittent demand of spare parts. Applied Mathematics and Computation, 181, 1035–1048.
Huang, J.-H., & Min, J. C.-H. (2002). Earthquake devastation and recovery in tourism: The Taiwan case. Tourism Management, 23, 145–154.
Huang, W., Nakamori, Y., & Wang, S. Y. (2005). Forecasting stock market movement direction with support vector machine. Computers and Operations Research, 32, 2513–2522.
Kim, K. J. (2003). Financial time series forecasting using support vector machines. Neurocomputing, 55, 307–319.
Levis, A. A., & Papageorgiou, L. G. (2005). Customer demand forecasting via support vector regression analysis. Chemical Engineering Research and Design, 83, 1009–1018.
Li, T., Campbell, E. P., Haswell, D., Sneeuwjagt, R. J., & Venables, W. N. (2003). Statistical forecasting of soil dryness index in the southwest of Western Australia. Forest Ecology and Management, 183, 147–157.
Lu, W. Z., & Wang, W. J. (2005). Potential assessment of the "support vector machine" method in forecasting ambient air pollutant trends. Chemosphere, 59, 693–701.
Makridakis, S., Wheelwright, S. C., & Hyndman, R. J. (1998). Forecasting: Methods and applications. New York: Wiley.
Mercer, J. (1909). Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society A, 209, 415–446.
Mohandes, M. A., Halawani, T. O., Rehman, S., & Hussain, A. A. (2004). Support vector machines for wind speed prediction. Renewable Energy, 29, 939–947.
Nam, K., & Schaefer, T. (1995). Forecasting international airline passenger traffic using neural networks. Logistics and Transportation Review, 31, 239–251.
Navarro-Esbrí, J., Diamadopoulos, E., & Ginestar, D. (2002). Time series analysis and forecasting techniques for municipal solid waste management. Resources, Conservation and Recycling, 35, 201–214.
Pai, P. F., & Hong, W. C. (2005). An improved neural network model in forecasting tourist arrivals. Annals of Tourism Research, 32, 1138–1141.
Tang, Z., & Fishwick, P. A. (1993). Feedforward neural nets as models for time series forecasting. ORSA Journal on Computing, 5, 374–385.
Tay, F. E. H., & Cao, L. (2002). Modified support vector machines in financial time series forecasting. Neurocomputing, 48, 847–861.
Vapnik, V., Golowich, S., & Smola, A. (1996). Support vector method for function approximation, regression estimation, and signal processing. Advances in Neural Information Processing Systems, 9, 281–287.
Williams, P. M. (1997). Modelling seasonality and trends in daily rainfall data. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems (Vol. 10, pp. 985–991).
Xi, X. C., Poo, A. N., & Chou, S. K. (2007). Support vector regression model predictive control on a HVAC plant. Control Engineering Practice, 15, 897–908.
Yu, P. S., Chen, S. T., & Chang, I. F. (2006). Support vector regression for real-time flood stage forecasting. Journal of Hydrology, 328, 704–716.
Zhang, G., & Qi, M. (2005). Neural network forecasting for seasonal and trend time series. European Journal of Operational Research, 160, 501–514.