Applied Mathematics and Computation 217 (2011) 6733–6747
Hybrid evolutionary algorithms in a SVR traffic flow forecasting model

Wei-Chiang Hong a,*, Yucheng Dong b, Feifeng Zheng c, Shih Yung Wei d

a Department of Information Management, Oriental Institute of Technology, 58 Sec. 2, Sichuan Rd., Panchiao, Taipei 220, Taiwan, ROC
b Department of Organization and Management, School of Management, Xi'an Jiaotong University, Xi'an 710049, PR China
c Department of Management Science, School of Management, Xi'an Jiaotong University, Xi'an 710049, PR China
d Department & Graduate Institute of Finance, National Yunlin University of Science & Technology, 123 Sec. 3, University Rd., Douliou, Yunlin 640, Taiwan, ROC
Keywords: Traffic flow forecasting; Support vector regression (SVR); Hybrid genetic algorithm-simulated annealing algorithm (GA-SA); Hybrid evolutionary algorithms; SARIMA; Back-propagation neural network (BPNN); Holt–Winters (HW); Seasonal Holt–Winters (SHW)
Abstract

Accurate urban traffic flow forecasting is critical to the development and implementation of intelligent transportation systems, and it has therefore been one of the most important issues in research on road traffic congestion. Because urban traffic flow exhibits complex nonlinear data patterns, many kinds of traffic flow forecasting techniques have been proposed in the literature, and it is difficult to conclude in general that one forecasting technique is superior to the others. Recently, the support vector regression (SVR) model has been widely used to solve nonlinear regression and time series problems. This investigation presents an SVR traffic flow forecasting model that employs a hybrid genetic algorithm-simulated annealing algorithm (GA-SA) to determine a suitable combination of its parameters. Additionally, a numerical example of traffic flow data from northern Taiwan is used to elucidate the forecasting performance of the proposed SVRGA-SA model. The forecasting results indicate that the proposed model yields more accurate forecasts than the seasonal autoregressive integrated moving average (SARIMA), back-propagation neural network (BPNN), Holt–Winters (HW) and seasonal Holt–Winters (SHW) models. Therefore, the SVRGA-SA model is a promising alternative for forecasting traffic flow.

© 2011 Elsevier Inc. All rights reserved.
1. Introduction

Providing sufficient and necessary capacity on inter-urban motorway networks is an essential component of traffic control and information systems management, particularly during daily peak periods. Traffic management authorities need better synchronized traffic signals, and drivers need more effective route selections. Since even slightly inaccurate capacity forecasts lead to congestion with huge social costs in terms of travel time, fuel consumption and environmental pollution, accurate forecasting of traffic flow during peak periods has attracted considerable interest in the literature. Various forecasting approaches have been applied to forecast the traffic flow of inter-urban motorway networks. These approaches can be classified according to the type of data, forecast horizon, and potential end-use [1], and include Kalman state space filtering models [2–5] and system identification models [6]. Traffic flow data are spatial time series, collected at specific locations at constant intervals of time. The studies mentioned above have indicated that the problem of forecasting inter-urban motorway traffic flow is multi-dimensional, involving relationships among measurements made at different times and geographical sites. In addition, these methods have difficulty coping with observation noise and missing values during modeling. Therefore, Danech-Pajouh and Aron [7] employed a layered statistical approach with a mathematical clustering technique to group the traffic flow data and a separately tuned linear regression
* Corresponding author. E-mail address: [email protected] (W.-C. Hong).
doi:10.1016/j.amc.2011.01.073
model for each cluster. Their experimental results revealed that the proposed model was superior to a competing forecasting approach, the autoregressive integrated moving average (ARIMA) model. Motivated by multi-dimensional pattern recognition requirements, such as intervals of time, geographical sites, and the relationships between the dependent variable and the independent variables, non-parametric regression models [8–10] have also been employed successfully to forecast motorway traffic flow. Furthermore, the ARIMA model, initially developed by Box and Jenkins [11], is one of the most popular alternatives in traffic flow forecasting [10,12–15]. For example, Kamarianakis and Prastacos [13] successfully applied the ARIMA model with space and time factors to forecast space–time stationary traffic flow. However, the critical limitation of the ARIMA model is its natural tendency to concentrate on the mean values of the past series data, which leaves it unable to capture the rapidly varying process underlying inter-urban traffic flow [16]. Recently, as an extension of the ARIMA model, Williams [17] applied the seasonal ARIMA (SARIMA) model to traffic flow forecasting. The model handles peak/non-peak flow periods through seasonal differencing, and the reported results show that it significantly outperforms a heuristic forecast generation method in terms of forecasting accuracy. However, detecting the required outliers and estimating the parameters of a SARIMA model is quite time consuming. These findings encourage the authors to employ the SARIMA model as a benchmark model in this study. The process underlying inter-urban traffic flow is complicated and difficult to capture with a single linear statistical algorithm. Being able to approximate any degree of complexity without prior knowledge of the problem, artificial neural networks (ANNs) have received much attention and have been considered as alternative traffic flow forecasting models [14,18–23]. An ANN emulates the processing of the human neurological system to determine relevant vehicle counts and temporal characteristics from historical traffic flow patterns, especially for nonlinear and dynamic evolutions; ANNs are therefore widely applied in traffic flow forecasting. Recently, Yin et al. [24] developed a fuzzy-neural model (FNM) to predict traffic flow in an urban street network. The FNM contains two modules: a gate network (GN) and an expert network (EN). The GN classifies the input data using a fuzzy approach, and the EN identifies the input–output relationship using neural network approaches. The empirical results show that the FNM provides more accurate forecasts than the BPNN model. Based on a proper representation of traffic flow data with temporal and spatial characteristics, Vlahogianni et al. [22] employed a genetic-algorithm-based, multilayered structural optimization strategy to determine the appropriate neural network structure. Their results show that the capabilities of a simple static neural network, with genetically optimized step size, momentum and number of hidden units, are very satisfactory when modeling both univariate and multivariate traffic data.
Even though ANN forecasting models can approximate any function, particularly nonlinear functions, their limitations lie not only in the difficulty of explaining the operation of the so-called black box (such as how to determine a suitable network structure), but also in the non-convexity of the network training problem, i.e., it is hard to find the global optimum of the training error. Proposed by Vapnik [25], support vector machines (SVMs) are one of the most significant developments in overcoming these shortcomings of ANNs. Rather than implementing the empirical risk minimization (ERM) principle to minimize the training error, SVMs apply the structural risk minimization (SRM) principle to minimize an upper bound on the generalization error. SVMs can theoretically guarantee the global optimum, instead of becoming trapped in a local optimum as ANNs can. Moreover, the solution of a nonlinear problem in the original lower-dimensional input space can be found as a linear solution in a higher-dimensional feature space. For a more detailed introduction to the mechanisms of SVMs, please refer to Cortes and Vapnik [26] and Vapnik et al. [27], among others. SVMs have found wide application in pattern recognition, bio-informatics, and other artificial intelligence applications. In particular, with the introduction of Vapnik's ε-insensitive loss function, SVMs have been extended to solve nonlinear regression estimation problems [28,29]; this extension is known as support vector regression (SVR). SVR has been successfully employed to solve forecasting problems in many fields, such as financial time series (stock index and exchange rate) forecasting [30–36], engineering and software (production values and reliability) forecasting [37], atmospheric science forecasting [38–41], electric load forecasting [42,43], and so on. SVR has also been successfully applied to forecast tourist arrivals [44,45]. The empirical results indicate that the selection of the three parameters (C, ε, and σ) of an SVR model significantly influences forecasting accuracy. Numerous publications have given recommendations on the appropriate setting of SVR parameters [46]; however, those approaches do not simultaneously consider the interactions among the three parameters. Since there is no general consensus and many contradictory opinions exist, evolutionary algorithms are employed here to determine appropriate parameter values. Genetic algorithms (GAs) are auto-adaptive stochastic search techniques [47] based on the Darwinian survival-of-the-fittest principle; they generate new individuals with selection, crossover and mutation operators. GAs start with a coding of the parameter set of any type of objective function, so they are able to solve problems that traditional algorithms cannot easily solve. In Pai and Hong [37,44], SVR with GAs is superior to other competitive forecasting models (ARIMA and ANN). GAs can preserve a few of the best-fitted members of the whole population for the next generation during the search; however, after some generations GAs may converge prematurely to a local optimum while searching for suitable parameters of an SVR model. Simulated annealing (SA) is a stochastic general search tool that mimics the annealing process in material physics [48]. When the energy of the original system state is greater than that of the newly generated state, the new state is automatically accepted.
In contrast, the new state is accepted according to the Metropolis criterion with a probability function. The performance of SA depends on the cooling schedule. SA thus has a mechanism for escaping from local minima and reaching the global minimum [49]. In Pai and Hong [43,37], SVR with SA is also superior to other competitive forecasting models (Weibull
Bayes, ARIMA, and GRNN). However, SA costs more computation time; to ensure the efficiency of SA, a proper temperature cooling rate (stopping criterion) must be chosen. To overcome these drawbacks of GAs and SA, an effective approach is needed that avoids being misled to a local optimum and searches the objective function efficiently. The hybrid genetic algorithm-simulated annealing (GA-SA) algorithm is a novel attempt to deal with these challenges. GA-SA first employs the ability of the SA algorithm to escape from local minima and approach the global minimum, and second applies the mutation process of GAs to improve the search over the range of parameter values. The hybrid algorithm has been applied to system design [50], system and network optimization [51,52], query processing in information retrieval systems [53], continuous-time production planning [54,55], and the electrical power districting problem [56]. However, the GA-SA has seldom been applied to determining the parameters of an SVR model. The investigation presented in this paper is motivated by a desire to address both the premature convergence of GAs to local optima and the efficiency concerns of SA mentioned above when determining the three free parameters of an SVR traffic flow forecasting model. Therefore, the GA-SA algorithm is embedded in the SVR model, denoted SVRGA-SA, to forecast inter-urban motorway traffic flow in Panchiao city of Taipei County, Taiwan. The rest of this paper is organized as follows. Section 2 presents the models used for forecast performance comparison and the SVR model. Section 3 introduces the proposed SVRGA-SA forecasting model. Section 4 illustrates a numerical example that reveals the forecasting performance of the proposed models. Conclusions are drawn in Section 5.

2. Forecasting methodology

In this investigation, four classes of models, the seasonal ARIMA (SARIMA), back-propagation neural network (BPNN), Holt–Winters and seasonal Holt–Winters (HW and SHW) models, and the SVRGA-SA model, are compared on traffic flow forecasting performance.

2.1. Models for comparing forecast performance

2.1.1. The seasonal autoregressive integrated moving average (SARIMA) model

Proposed by Box and Jenkins [11], the seasonal ARIMA process has been one of the most popular approaches in time series forecasting, particularly for series with a strong seasonal component. The SARIMA process is often referred to as the SARIMA(p, d, q)(P, D, Q)_S model. As in the ARIMA model, the forecast values are assumed to be a linear combination of past values and past errors. A time series {X_t} is a SARIMA process with seasonal period length S if d and D are nonnegative integers and if the differenced series W_t = (1 - B)^d (1 - B^S)^D X_t is a stationary autoregressive moving average process. In symbolic terms, the model can be written as
\phi_p(B) \Phi_P(B^S) W_t = \theta_q(B) \Theta_Q(B^S) \varepsilon_t, \quad t = 1, 2, \ldots, N,   (1)
where N is the number of observations up to time t; B is the backshift operator defined by B^a W_t = W_{t-a}; \phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p is the regular (non-seasonal) autoregressive operator of order p; \Phi_P(B^S) = 1 - \Phi_1 B^S - \cdots - \Phi_P B^{PS} is the seasonal autoregressive operator of order P; \theta_q(B) = 1 - \theta_1 B - \cdots - \theta_q B^q is the regular moving average operator of order q; \Theta_Q(B^S) = 1 - \Theta_1 B^S - \cdots - \Theta_Q B^{QS} is the seasonal moving average operator of order Q; and \varepsilon_t is identically and independently normally distributed with mean zero, variance \sigma^2, and cov(\varepsilon_t, \varepsilon_{t-k}) = 0 for all k \neq 0. In the definition above, the parameters p and q represent the autoregressive and moving average orders, respectively; P and Q represent the autoregressive and moving average orders at the model's seasonal period length S, respectively; and d and D represent the orders of ordinary and seasonal differencing, respectively. When fitting a SARIMA model to data, the first task is to estimate d and D, the orders of differencing needed to make the series stationary and to remove most of the seasonality. The values of p, P, q and Q are then estimated from the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series. The remaining model parameters may be estimated by suitable iterative procedures.
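As an illustration of this identification-and-estimation workflow, the following minimal sketch fits a SARIMA model with the statsmodels library. The file name and the (p, d, q)(P, D, Q)_S orders are placeholders: in practice the orders are chosen by differencing the series and inspecting the ACF/PACF, not prescribed in advance.

```python
# Sketch: fitting a SARIMA(p, d, q)(P, D, Q)_S model with statsmodels.
# The orders below are illustrative placeholders only.
import numpy as np
import statsmodels.api as sm

hourly_flow = np.loadtxt("traffic_flow.csv")  # hypothetical EOP series

model = sm.tsa.SARIMAX(
    hourly_flow,
    order=(1, 0, 1),              # (p, d, q): regular AR/diff/MA orders
    seasonal_order=(0, 1, 1, 5),  # (P, D, Q, S): seasonal orders, period 5
)
fit = model.fit(disp=False)
print(fit.summary())
print(fit.forecast(steps=5))      # five one-step-ahead forecasts
```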
2.1.2. Back-propagation neural network (BPNN) model

The multi-layer back-propagation neural network (BPNN) is one of the most widely used neural network models. Consider the simplest BPNN architecture (Fig. 1), comprising three layers: an input layer (x), an output layer (o), and a hidden layer (h). The computational procedure of this network is described below:

o_i = f\left( \sum_j g_{ij} x_{ij} \right),   (2)
where o_i denotes the output of node i, f(\cdot) represents the activation function, g_{ij} is the connection weight between node i and node j in the lower layer (instantiated below as v_{ji} and w_{kj}), and x_{ij} denotes the input signal from node j in the lower layer. The BPNN algorithm attempts to improve network performance by reducing the total error through gradient-based weight changes. The BPNN algorithm minimizes the sum of squared errors, which can be calculated by
E = \frac{1}{2} \sum_{p=1}^{P} \sum_{j=1}^{K} (d_{pj} - o_{pj})^2,   (3)
Fig. 1. A back-propagation network.
where E denotes the squared error, K is the number of output layer neurons, P is the number of training data patterns, d_{pj} denotes the actual (desired) output, and o_{pj} represents the network output. The BPNN algorithm proceeds as follows. Let \Delta v_{ji} denote the weight change for any hidden layer neuron and \Delta w_{kj} that for any output layer neuron:
\Delta v_{ji} = -\eta \frac{\partial E}{\partial v_{ji}}, \quad i = 1, \ldots, I, \; j = 1, \ldots, J-1,   (4)

\Delta w_{kj} = -\eta \frac{\partial E}{\partial w_{kj}}, \quad j = 1, \ldots, J-1, \; k = 1, \ldots, K,   (5)
where \eta represents the learning rate parameter. Notably, the Jth node in Fig. 1 is the bias neuron without weight. Eqs. (6) and (7) express the signal s_j to each hidden layer neuron and the signal u_k to each neuron in the output layer:
s_j = \sum_{i=1}^{I} v_{ji} x_i,   (6)

u_k = \sum_{j=1}^{J-1} w_{kj} y_j.   (7)
The error signal terms for the jth hidden neuron, \delta_{yj}, and for the kth output neuron, \delta_{ok}, are defined in Eqs. (8) and (9), respectively:

\delta_{yj} = -\frac{\partial E}{\partial s_j},   (8)

\delta_{ok} = -\frac{\partial E}{\partial u_k}.   (9)
Applying the chain rule, the gradients of the cost function with respect to the weights v_{ji} and w_{kj} are

\frac{\partial E}{\partial v_{ji}} = \frac{\partial E}{\partial s_j} \frac{\partial s_j}{\partial v_{ji}},   (10)

\frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial u_k} \frac{\partial u_k}{\partial w_{kj}},   (11)
and
\frac{\partial s_j}{\partial v_{ji}} = \frac{\partial (v_{j1} x_1 + v_{j2} x_2 + \cdots + v_{ji} x_i + \cdots + v_{jI} x_I)}{\partial v_{ji}} = x_i,   (12)

\frac{\partial u_k}{\partial w_{kj}} = \frac{\partial (w_{k1} y_1 + w_{k2} y_2 + \cdots + w_{kj} y_j + \cdots + w_{kJ} y_J)}{\partial w_{kj}} = y_j;   (13)
combining Eqs. (8), (10) and (12), and Eqs. (9), (11) and (13), yields

\frac{\partial E}{\partial v_{ji}} = -\delta_{yj} x_i,   (14)

\frac{\partial E}{\partial w_{kj}} = -\delta_{ok} y_j,   (15)
and the weight changes from Eqs. (4) and (5) can now be written as

\Delta v_{ji} = -\eta \frac{\partial E}{\partial v_{ji}} = \eta \delta_{yj} x_i,   (16)

\Delta w_{kj} = -\eta \frac{\partial E}{\partial w_{kj}} = \eta \delta_{ok} y_j.   (17)
Furthermore, Eqs. (8) and (9) can be evaluated as follows:

\delta_{ok} = -\frac{\partial E}{\partial u_k} = -\frac{\partial E}{\partial o_k} \frac{\partial o_k}{\partial u_k} = (d_k - o_k) f'(u_k),   (18)

\delta_{yj} = -\frac{\partial E}{\partial s_j} = -\frac{\partial E}{\partial y_j} \frac{\partial y_j}{\partial s_j} = \left\{ \sum_{k=1}^{K} \delta_{ok} w_{kj} \right\} f_j'(u_j).   (19)

The weights v_{ji} and w_{kj} are then updated as follows:
w_{kj} = w_{kj} + \Delta w_{kj} = w_{kj} + \eta \delta_{ok} y_j,   (20)

v_{ji} = v_{ji} + \Delta v_{ji} = v_{ji} + \eta f_j'(u_j) x_i \sum_{k=1}^{K} \delta_{ok} w_{kj}.   (21)
The constant \eta is specified at the start of the training cycle and determines the training speed and stability of the network. The most common activation functions are squashing sigmoid functions, such as the logistic and hyperbolic tangent functions.
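A minimal NumPy sketch of the update rules in Eqs. (2)-(21) is given below, assuming a logistic activation in both layers and online (per-pattern) updates; the layer sizes, learning rate and random seed are illustrative, and the bias neuron of Fig. 1 is omitted for brevity.

```python
# Minimal sketch of one BPNN training pass per Eqs. (2)-(21).
# Logistic activation f(z) = 1/(1+exp(-z)), so f'(z) = f(z)(1-f(z)).
import numpy as np

rng = np.random.default_rng(0)
I, H, K = 5, 3, 1            # input, hidden, output sizes (illustrative)
eta = 0.05                   # learning rate

V = rng.normal(scale=0.1, size=(H, I))   # hidden weights v_ji
W = rng.normal(scale=0.1, size=(K, H))   # output weights w_kj

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_pattern(x, d):
    """One online update for input x of shape (I,), desired output d of shape (K,)."""
    global V, W
    s = V @ x                                  # Eq. (6): hidden net input
    y = sigmoid(s)                             # hidden activation
    u = W @ y                                  # Eq. (7): output net input
    o = sigmoid(u)                             # network output
    delta_o = (d - o) * o * (1 - o)            # Eq. (18)
    delta_y = (W.T @ delta_o) * y * (1 - y)    # Eq. (19)
    W += eta * np.outer(delta_o, y)            # Eqs. (17) and (20)
    V += eta * np.outer(delta_y, x)            # Eqs. (16) and (21)
    return 0.5 * np.sum((d - o) ** 2)          # Eq. (3) for one pattern
```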
2.1.3. Holt–Winters (HW) and seasonal Holt–Winters (SHW) models

The last comparison model employed is the Holt–Winters (HW) method [57,58], an extension of the exponentially weighted moving average procedure. The exponentially weighted moving average approach forecasts future values from past observations, placing more weight on recent observations. The Holt–Winters method smoothes the level and trend values separately with two smoothing coefficients (with values between 0 and 1) and incorporates an explicit linear trend in the forecast. Holt–Winters linear exponential smoothing is given by

s_t = \alpha a_t + (1 - \alpha)(s_{t-1} + b_{t-1}),   (22)

b_t = \beta (s_t - s_{t-1}) + (1 - \beta) b_{t-1},   (23)

f_{t+i} = s_t + i b_t,   (24)
where a_t is the actual value at time t; s_t is the smoothed estimate at time t; b_t is the trend value at time t; \alpha is the level smoothing coefficient; and \beta is the trend smoothing coefficient. Eq. (22) smoothes the actual value recursively by weighting the current level by \alpha, and then adjusts s_t directly for the trend of the previous period, b_{t-1}, by adding it to the last smoothed value, s_{t-1}. This helps to eliminate lag and brings s_t close to the approximate base of the current data value. Eq. (23) updates the trend, which is expressed as the difference between the last two smoothed values; it modifies the trend by smoothing the difference of the last period (s_t - s_{t-1}) with \beta and adding the previous trend estimate multiplied by (1 - \beta). Eq. (24) is used to forecast ahead: the trend b_t is multiplied by the number of periods ahead to be forecast, i, and added to the base value s_t. The forecast error e_t is defined as the actual value minus the forecast (fitted) value for time period t, that is:
e_t = a_t - f_t.   (25)
The forecast error is assumed to be an independent random variable with zero mean and constant variance. The values of the smoothing coefficients \alpha and \beta are chosen to minimize the forecast NRMSE.
To account for the seasonal effect, the seasonal Holt–Winters linear exponential smoothing (SHW) approach is also employed. The Holt–Winters method can be extended to accommodate additive seasonality, where the magnitude of the seasonal effect does not change with the level of the series, or multiplicative seasonality, where the amplitude of the seasonal pattern changes over time. The multiplicative SHW forecast is as follows:
s_t = \alpha \frac{a_t}{I_{t-L}} + (1 - \alpha)(s_{t-1} + b_{t-1}),   (26)

b_t = \beta (s_t - s_{t-1}) + (1 - \beta) b_{t-1},   (27)

I_t = \gamma \frac{a_t}{s_t} + (1 - \gamma) I_{t-L},   (28)

f_{t+i} = (s_t + i b_t) I_{t-L+i},   (29)
where L is the length of the seasonal cycle; I is the seasonal adjustment factor; and \gamma is the seasonal adjustment coefficient. Eq. (26) differs slightly from Eq. (22) in that the first term is divided by the seasonal number I_{t-L}; this deseasonalizes a_t (eliminates its seasonal fluctuations). Eq. (28) computes a seasonal index as the ratio of the current value of the series, a_t, to the smoothed value of the series, s_t. If a_t is larger than s_t, the ratio is greater than 1; otherwise, it is less than 1. To smooth out the randomness of a_t, Eq. (28) weights the newly computed seasonal factor by \gamma and the most recent seasonal number for the same season by (1 - \gamma).
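The following sketch implements the multiplicative SHW recursions of Eqs. (26)-(29). The simple initialization (first-cycle averages for the level, trend and seasonal factors) is an assumption made for the sketch; the paper itself chooses the coefficients by minimizing the forecast NRMSE.

```python
# Sketch of multiplicative seasonal Holt-Winters smoothing, Eqs. (26)-(29).
# Initialization from the first two seasonal cycles is an assumption.
import numpy as np

def shw_forecast(a, L, alpha, beta, gamma, horizon=1):
    a = np.asarray(a, dtype=float)            # needs len(a) >= 2 * L
    s = a[:L].mean()                          # initial level
    b = (a[L:2 * L].mean() - a[:L].mean()) / L  # initial trend
    I = list(a[:L] / s)                       # initial seasonal factors
    for t in range(L, len(a)):
        s_prev = s
        s = alpha * a[t] / I[t - L] + (1 - alpha) * (s + b)      # Eq. (26)
        b = beta * (s - s_prev) + (1 - beta) * b                 # Eq. (27)
        I.append(gamma * a[t] / s + (1 - gamma) * I[t - L])      # Eq. (28)
    # Eq. (29): forecast i = 1..horizon steps past the last observation.
    t_last = len(a) - 1
    return [(s + i * b) * I[t_last - L + i] for i in range(1, horizon + 1)]
```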
2.2. The support vector regression (SVR) model

2.2.1. Structural risk minimization

As mentioned above, traditional artificial intelligence approaches tend to be based on minimizing the training error over the training set, i.e., empirical risk minimization (ERM). However, ERM does not guarantee good generalization to novel test data. To separate the classes with a surface (hyperplane) that maximizes the margin between the training data, SVMs employ the SRM principle, which aims to minimize a bound on the generalization error rather than the mean squared error over the training data set. SRM provides a well-defined quantitative measure of the capacity of a learned function to capture the true structure of the data distribution and to generalize to unknown test data. The Vapnik–Chervonenkis (VC) dimension [31] has been used as such a capacity measure; by selecting a function and minimizing its empirical error on a training data set, SRM guarantees a minimal bound on the test data set. Given a training data set of N elements {(x_i, y_i), i = 1, 2, \ldots, N}, where x_i is the ith element in n-dimensional space, i.e., x_i = [x_i^1, \ldots, x_i^n] \in R^n, and y_i \in \{-1, +1\} is the label of x_i, define a deterministic function f: x \to \{-1, +1\} for a given input x and adjustable weights w (w \in R^n), with the data drawn according to the same but unknown probability distribution P(x, y). The weights w are adjusted during the training stage. Since the underlying probability distribution P(x, y) is unknown, the upper bound on the probability of classification errors on the test data set (the expected error of f), R(f), cannot be minimized directly. It is, however, feasible to estimate an approximation of R(f), the empirical risk, denoted R_{emp}(f), that is close to the optimal one based on the training data pairs (x, y). According to the SRM principle [25,26], R(f) and R_{emp}(f) satisfy

R(f) \le R_{emp}(f) + \varepsilon_1(N, h, \eta, R_{emp}),   (30)

R_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} |y_i - f(x_i)|_{\text{loss function}},   (31)

\varepsilon_1(N, h, \eta, R_{emp}) = 2 \varepsilon_0^2(N, h, \eta) \left( 1 + \sqrt{1 + \frac{R_{emp}(f)}{\varepsilon_0^2(N, h, \eta)}} \right),   (32)

\varepsilon_0(N, h, \eta) = \sqrt{\frac{h \left( \ln \frac{2N}{h} + 1 \right) - \ln \frac{\eta}{4}}{N}}.   (33)
Eq. (30) holds with probability 1 - \eta for 0 \le \eta \le 1. \varepsilon_0(N, h, \eta) is the so-called VC confidence interval, whose value depends on the number of training data N, the VC dimension h, and the confidence level \eta. For a small empirical risk R_{emp}(f), close to 0, Eq. (30) approximately reduces to R_{emp}(f) + 4\varepsilon_0^2(N, h, \eta); in contrast, for a large empirical risk close to 1, Eq. (30) approximately reduces to R_{emp}(f) + \varepsilon_0(N, h, \eta) [59]. Thus, there are two strategies for minimizing the upper bound R(f). The first is to keep the VC confidence \varepsilon_0(N, h, \eta) fixed and minimize the empirical risk; most ANN models employ this strategy. However, it does not perform well, because dealing with R_{emp}(f) alone cannot guarantee a reduction of the VC confidence. The second is to fix the empirical risk to a small value and minimize the VC confidence; this is the SRM principle. Although SVMs implement this principle, their training algorithm, which aims to minimize the VC dimension, is still based on a hierarchy that depends on the data [25,60].
2.2.2. Support vector regression

SVMs were originally used for classification, but their principles extend easily to regression and time series prediction. The basic ideas of SVMs for the regression case are introduced here. A nonlinear mapping \varphi(\cdot): R^n \to R^{n_h} is defined to map the input data (training data set) \{(x_i, y_i)\}_{i=1}^{N} into a so-called high dimensional feature space, R^{n_h}, which may have infinite dimensions (Fig. 2(a) and (b)). In the high dimensional feature space there theoretically exists a linear function f that formulates the nonlinear relationship between the input and output data. Such a linear function, the SVR function, is given by Eq. (34):
f(x) = w^T \varphi(x) + b,   (34)
where f(x) denotes the forecast values, and the coefficients w (w \in R^{n_h}) and b (b \in R) are adjustable. As mentioned above, the SVM method aims to minimize the empirical risk,
R_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} H_\varepsilon(y_i, w^T \varphi(x_i) + b),   (35)
where H_\varepsilon(y, f(x)) is the \varepsilon-insensitive loss function (the thick line in Fig. 2(c)), defined as in Eq. (36):
H_\varepsilon(y, f(x)) = \begin{cases} |f(x) - y| - \varepsilon, & \text{if } |f(x) - y| \ge \varepsilon, \\ 0, & \text{otherwise}. \end{cases}   (36)
In addition, H_\varepsilon(y, f(x)) is employed to find an optimal hyperplane in the high dimensional feature space (Fig. 2(b)) that maximizes the distance separating the training data into two subsets. Thus, SVR focuses on finding the optimal hyperplane and minimizing the training error between the training data and the \varepsilon-insensitive loss function. The SVR then minimizes the overall error:
\min_{w, b, \xi^*, \xi} R_\varepsilon(w, \xi^*, \xi) = \frac{1}{2} w^T w + C \sum_{i=1}^{N} (\xi_i^* + \xi_i)   (37)
with the constraints
y_i - w^T \varphi(x_i) - b \le \varepsilon + \xi_i^*, \quad i = 1, 2, \ldots, N,
-y_i + w^T \varphi(x_i) + b \le \varepsilon + \xi_i, \quad i = 1, 2, \ldots, N,
\xi_i^* \ge 0, \quad i = 1, 2, \ldots, N,
\xi_i \ge 0, \quad i = 1, 2, \ldots, N.
The first term of Eq. (37), which embodies the concept of maximizing the distance between the two separated sets of training data, is used to regularize the weight sizes, to penalize large weights, and to maintain the flatness of the regression function. The second term penalizes training errors of f(x) with respect to y by using the \varepsilon-insensitive loss function. C is a parameter that trades off these two terms. Training errors above +\varepsilon are denoted \xi_i^*, whereas training errors below -\varepsilon are denoted \xi_i (Fig. 2(b)). After solving this quadratic optimization problem with inequality constraints, the parameter vector w of Eq. (34) is obtained as
w = \sum_{i=1}^{N} (\beta_i^* - \beta_i) \varphi(x_i),   (38)
where \beta_i^*, \beta_i are the Lagrange multipliers obtained by solving a quadratic program. Finally, the SVR regression function is obtained in the dual space as Eq. (39):
Fig. 2. Transformation process illustration of a SVR model: (a) input space, (b) feature space with the hyperplane and slack variables \xi_i^*, \xi_i, (c) the \varepsilon-insensitive loss function.
f(x) = \sum_{i=1}^{N} (\beta_i^* - \beta_i) K(x_i, x) + b,   (39)
where K(x_i, x_j) is the kernel function, whose value equals the inner product of the two vectors x_i and x_j in the feature space, i.e., K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j). Any function that satisfies Mercer's condition [25] can be used as the kernel function. Several types of kernel function exist; the most widely used are the Gaussian RBF with width \sigma, K(x_i, x_j) = \exp(-0.5 \|x_i - x_j\|^2 / \sigma^2), and the polynomial kernel of order d with constants a_1 and a_2, K(x_i, x_j) = (a_1 x_i \cdot x_j + a_2)^d. If the value of \sigma is very large, the RBF kernel approximates a linear kernel (a polynomial of order 1). It remains hard to determine the best type of kernel function for a specific data pattern [61,62]. However, the Gaussian RBF kernel is not only easy to implement but also capable of nonlinearly mapping the training data into an infinite-dimensional space, so it is well suited to nonlinear relationship problems. Therefore, the Gaussian RBF kernel function is specified in this study. The selection of the three parameters \sigma, \varepsilon and C of an SVR model influences forecasting accuracy. However, structured methods for efficiently selecting these parameters are lacking. The GA-SA algorithm is therefore used in the proposed SVR model to optimize the parameter selection.
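A minimal sketch of such an \varepsilon-SVR with Gaussian RBF kernel, in the form of Eqs. (34)-(39), is shown below using scikit-learn. Note that scikit-learn parameterizes the RBF as exp(-gamma ||x - x'||^2), so the paper's width \sigma maps to gamma = 1/(2\sigma^2); the parameter values and the toy series are placeholders for what the GA-SA search would supply.

```python
# Sketch: epsilon-SVR with Gaussian RBF kernel (Eqs. (34)-(39)).
# gamma = 1 / (2 * sigma**2) converts the paper's sigma to sklearn's gamma.
import numpy as np
from sklearn.svm import SVR

def fit_svr(X_train, y_train, sigma, C, epsilon):
    model = SVR(kernel="rbf", gamma=1.0 / (2.0 * sigma**2),
                C=C, epsilon=epsilon)
    return model.fit(X_train, y_train)

# Toy usage: predict the next value from the previous 5 (lagged window).
flow = np.sin(np.linspace(0, 20, 120)) + 2.0     # stand-in for EOP data
X = np.array([flow[i:i + 5] for i in range(len(flow) - 5)])
y = flow[5:]
model = fit_svr(X[:90], y[:90], sigma=1.0, C=100.0, epsilon=0.1)
pred = model.predict(X[90:])
```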
3. Hybrid genetic algorithm-simulated annealing algorithm (GA-SA)

To overcome the drawbacks of GAs and SA, this study proposes a hybrid GA-SA algorithm that exploits the ability of SA to escape from local minima and approach the global minimum, and additionally uses the mutation process of GAs to improve the search over the range of parameter values. To avoid excessive computation time, only the best individual of the GA population is delivered to SA for further improvement. The proposed GA-SA algorithm consists of a GA part and an SA part. The GA evaluates the initial population and operates on it with three basic genetic operators to produce a new population; in each GA generation, the best individual is delivered to SA for further processing. After all SA processing is finished, the modified individual is sent back to the GA for the next generation. These iterations continue until the termination condition of the algorithm is reached. The GA-SA procedure is described below, and the flowchart is shown in Fig. 3.

The procedure of the GA part is as follows:

Step 1: Initialization. Randomly construct the initial population of chromosomes. The three parameters C, \sigma, and \varepsilon of the SVR model in the ith generation are encoded in binary format and represented by a chromosome composed of "genes" of binary numbers (Fig. 4). Each chromosome has three genes, representing the three parameters; each gene has 40 bits, so a chromosome contains 120 bits. More bits in a gene correspond to a finer partition of the search space.

Step 2: Evaluating fitness. Evaluate the fitness (forecasting error) of each chromosome. In this paper, the negative normalized root mean square error (NRMSE) is used as the fitness function. The NRMSE is given by Eq. (40):

\text{NRMSE} = \sqrt{\sum_{i=1}^{N} (a_i - f_i)^2 \Big/ \sum_{i=1}^{N} a_i^2},   (40)
where a_i and f_i represent the actual and forecast values, and N is the number of forecasting periods.

Step 3: Selection operation. Based on the fitness values, chromosomes with higher fitness are more likely to yield offspring in the next generation. The roulette wheel selection principle [47] is applied to choose chromosomes for reproduction.

Step 4: Crossover operation and mutation operation. Mutations are performed by randomly converting a "1" bit into a "0" bit or a "0" bit into a "1" bit. In the crossover operation, chromosomes are paired randomly; the single-point-crossover principle is employed here, and segments of paired chromosomes between the determined break-points are swapped. For simplicity, suppose a gene has four bits, so that a chromosome contains 12 bits (Fig. 5). Before crossover, the values of the three parameters in parent #1 are 1.5, 1.25 and 0.34375, and in parent #2 they are 0.625, 8.75 and 0.15625. After crossover, the three values in offspring #1 are 1.625, 3.75 and 0.40625, and in offspring #2 they are 0.5, 6.25 and 0.09375. Finally, the three crossed-over parameters are decoded back into decimal format.

Step 5: Stopping condition. If the number of generations equals a given scale, the best chromosome is presented as the solution; otherwise, go to Step 1 of the SA part.

In the GA-SA process, the GA delivers its best individual to SA for further processing; after this individual is improved, SA sends it back to the GA for the next generation. These iterations continue until the termination condition of the algorithm is reached. A sketch of the GA part is given below.
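The following minimal sketch is one reading of Steps 1-5 under stated assumptions: the decoding ranges, crossover rate and mutation rate are illustrative, and `evaluate_nrmse` is a hypothetical callback that trains an SVR with the decoded (C, \sigma, \varepsilon) and returns the validation NRMSE of Eq. (40).

```python
# Sketch of the GA part: binary-encoded (C, sigma, epsilon), negative-NRMSE
# fitness, roulette-wheel selection, single-point crossover, bit mutation.
import random

BITS = 40                       # bits per gene (three genes per chromosome)
RANGES = [(1, 10000), (0.01, 2.0), (0.0, 1.0)]  # assumed C, sigma, epsilon ranges

def decode(chrom):
    """Map a 120-bit chromosome (list of 0/1) to decimal (C, sigma, epsilon)."""
    vals = []
    for g, (lo, hi) in enumerate(RANGES):
        bits = chrom[g * BITS:(g + 1) * BITS]
        frac = int("".join(map(str, bits)), 2) / (2**BITS - 1)
        vals.append(lo + frac * (hi - lo))
    return vals

def nrmse(actual, forecast):                   # Eq. (40)
    return (sum((a - f)**2 for a, f in zip(actual, forecast))
            / sum(a**2 for a in actual)) ** 0.5

def roulette(pop, fits):
    shifted = [f - min(fits) + 1e-9 for f in fits]   # make weights positive
    return random.choices(pop, weights=shifted, k=len(pop))

def ga_generation(pop, evaluate_nrmse, p_cross=0.8, p_mut=0.01):
    fits = [-evaluate_nrmse(*decode(c)) for c in pop]  # fitness = -NRMSE
    nxt = roulette(pop, fits)
    for i in range(0, len(nxt) - 1, 2):                # single-point crossover
        if random.random() < p_cross:
            pt = random.randrange(1, 3 * BITS)
            nxt[i], nxt[i + 1] = (nxt[i][:pt] + nxt[i + 1][pt:],
                                  nxt[i + 1][:pt] + nxt[i][pt:])
    nxt = [[b ^ 1 if random.random() < p_mut else b for b in c] for c in nxt]
    best = pop[fits.index(max(fits))]                  # delivered to the SA part
    return nxt, best
```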
Fig. 3. The GA-SA algorithm flowchart (GA loop: initialization, fitness evaluation, selection, crossover, mutation; the best individual is passed to the SA loop: random move, Metropolis criterion test, equilibrium check, temperature reduction, frozen check).
Fig. 4. Binary encoding of a chromosome.
The procedure of the SA part is as follows:

Step 1: Generate initial current state. Receive the values of the three parameters from the GA. The forecasting error, the NRMSE of Eq. (40), is defined as the system state E; the initial state E_0 is thereby obtained.
Fig. 5. A simplified example of parameter representation (single-point crossover of two 12-bit parent chromosomes producing two offspring).
Step 2: Provisional state. Make a random move to change the existing system state to a provisional state, generating another set of three positive parameters.

Step 3: Metropolis criterion test. The Metropolis criterion, Eq. (41), is employed to determine the acceptance or rejection of the provisional state [63]:
\begin{cases} \text{accept the provisional state,} & \text{if } E(s_{new}) > E(s_{old}) \text{ and } p < P(\text{accept } s_{new}), \; 0 \le p \le 1, \\ \text{accept the provisional state,} & \text{if } E(s_{new}) \le E(s_{old}), \\ \text{reject the provisional state,} & \text{otherwise,} \end{cases}   (41)
where p is a random number used to determine acceptance of the provisional state, and P(accept s_{new}), the probability of accepting the new state, is given by the probability function P(\text{accept } s_{new}) = \exp\left(\frac{E(s_{old}) - E(s_{new})}{kT}\right), where T is the thermal equilibrium temperature and k is the Boltzmann constant. If the provisional state is accepted, it is set as the current state.

Step 4: Incumbent solutions. If the provisional state is not accepted, return to Step 2. Furthermore, if the current state is not superior to the system state, repeat Steps 2 and 3 until the current state is superior to the system state, and then set the current state as the new system state. Previous studies [48] indicated that the maximum number of loops (N_{sa}) should be 100d, where d denotes the problem dimension, to avoid infinitely repeated loops. In this investigation, three parameters (\sigma, C, and \varepsilon) determine the system state, so N_{sa} is set to 300.

Step 5: Temperature reduction. After the new system state is obtained, reduce the temperature. The new temperature is obtained from Eq. (42):
\text{new temperature} = (\text{current temperature}) \times \rho, \quad 0 < \rho < 1.   (42)
The cooling coefficient \rho is set to 0.9 in this study [64]. If the pre-determined temperature is reached, the algorithm stops and the latest state is an approximate optimal solution; otherwise, go to Step 2. A sketch of the SA part appears below.
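The following minimal sketch implements the Metropolis acceptance of Eq. (41) and the geometric cooling of Eq. (42). The initial temperature, the frozen threshold and the neighborhood move are assumptions of the sketch; `evaluate_nrmse` again stands in for training an SVR with the candidate parameters and scoring the validation data.

```python
# Sketch of the SA part: Metropolis acceptance (Eq. 41), geometric cooling
# (Eq. 42) with rho = 0.9, and at most N_sa = 300 moves per temperature level.
import math
import random

def sa_improve(params, evaluate_nrmse, t0=100.0, t_frozen=0.01,
               rho=0.9, n_sa=300):
    current = list(params)                    # [C, sigma, epsilon] from the GA
    e_cur = evaluate_nrmse(*current)          # system state (NRMSE)
    temp = t0
    while temp > t_frozen:                    # "frozen" stopping criterion
        for _ in range(n_sa):                 # bounded number of loops (100d)
            candidate = [max(1e-6, v * random.uniform(0.8, 1.2))
                         for v in current]    # random move, parameters kept > 0
            e_new = evaluate_nrmse(*candidate)
            if (e_new <= e_cur or
                    random.random() < math.exp((e_cur - e_new) / temp)):
                current, e_cur = candidate, e_new   # Metropolis acceptance
        temp *= rho                           # Eq. (42)
    return current, e_cur                     # sent back to the GA
```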
4. A numerical example and experimental results

The traffic flow data sets originate from three Civil Motorway detector sites. The Civil Motorway is the busiest inter-urban motorway network in Panchiao city, the capital of Taipei County, Taiwan. The major site was located at the center of Panchiao city, where the flow intersects an urban local street system; it provides one-way traffic volumes for each hour on weekdays. Therefore, one-way flow data for peak traffic are employed in this investigation, covering the morning peak period (from 6:00 to 10:00) and the evening peak period (from 16:00 to 20:00). The data were collected from February 2005 to March 2005. During the observation period, the numbers of traffic flow data available for the morning and evening peak periods are 50 and 90 h, respectively. For convenience, the traffic flow data are converted to equivalents of passengers (EOP); both peak periods show seasonality in the traffic data. In addition, the traffic flow data are divided into three parts: training data, validation data and testing data. For the morning peak period, the training, validation and testing data sets are 30, 10 and 10 h, respectively. For the evening peak period, the experimental data are arranged as training data (60 h), validation data (15 h) and testing data (15 h). The accuracy of the forecasting models is measured by Eq. (40).

4.1. Parameter determination of the comparison forecasting models

The parameter selection of forecasting models is important for obtaining good forecasting performance. For the SARIMA model, the parameters are determined by taking the first-order regular difference and the first seasonal difference to remove the non-stationary and seasonal characteristics. Using statistical packages, with no autocorrelated residuals and approximately white noise residuals, the most suitable models for the morning/evening peak periods are SARIMA(1, 0, 1)(0, 1, 1)_5 without a constant term and SARIMA(1, 0, 1)(1, 1, 1)_5 with a constant term, respectively. The fitted SARIMA models are presented as Eqs. (43) and (44), respectively:
(1 - 0.5167B)(1 - B^5) X_t = (1 + 0.3306B)(1 - 0.9359B^5) \varepsilon_t,   (43)

(1 - 0.5918B)(1 - B^5) X_t = 2.305 + (1 - 0.9003B^5) \varepsilon_t.   (44)
For the Holt–Winters method, using the Minitab 14 statistical software, the \alpha and \beta values are determined as 0.654 and 0.041 for the morning peak period, and as 0.76 and 0.367 for the evening peak period. Similarly, the appropriate parameters (L, \alpha, \beta, and \gamma) of the SHW model are determined as 5, 0.15, 0.72, and 0.73 for the morning peak period, and as 5, 0.48, 0.04, and 0.13 for the evening peak period. For the BPNN model, Matlab 6.5 is employed to implement the forecasting procedure. The number of nodes in the hidden layer is used as a validation parameter of the BPNN model; the most suitable number of hidden nodes is three.
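As an illustration of this coefficient-fitting step, the sketch below uses statsmodels in place of the Minitab package employed in the paper; the additive-trend / multiplicative-seasonal configuration mirrors Eqs. (22)-(29), the file name is a placeholder, and the `params` key names vary slightly across statsmodels versions.

```python
# Sketch: estimating HW/SHW smoothing coefficients by maximum-likelihood
# fitting, as a stand-in for the Minitab procedure used in the paper.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

flow = np.loadtxt("morning_peak.csv")   # hypothetical EOP series

hw = ExponentialSmoothing(flow, trend="add").fit()          # HW, Eqs. (22)-(24)
shw = ExponentialSmoothing(flow, trend="add", seasonal="mul",
                           seasonal_periods=5).fit()        # SHW, Eqs. (26)-(29)

print(hw.params["smoothing_level"], hw.params["smoothing_trend"])
print(shw.params["smoothing_seasonal"])
print(shw.forecast(5))                  # forecasts for the next five hours
```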
Table 1. Forecasting results and associated parameters of the SVRGA models.

Morning peak period:

No. of fed-in data | \sigma | C | \varepsilon | NRMSE of testing
5 | 0.7286 | 2149 | 0.8249 | 0.3965
10 | 0.7138 | 1199 | 0.1460 | 0.3464
15 | 0.7561 | 2037 | 0.9813 | 0.2705
20 | 0.3933 | 8549 | 0.3155 | 0.3518
25* | 0.9599 | 6850 | 0.4339 | 0.2631

Evening peak period:

No. of fed-in data | \sigma | C | \varepsilon | NRMSE of testing
25 | 0.2583 | 8838 | 0.4464 | 0.1257
30 | 0.0103 | 1352 | 0.0632 | 0.1115
35* | 0.3359 | 9756 | 0.2076 | 0.1010
40 | 0.0287 | 1363 | 0.6152 | 0.1177
45 | 0.0327 | 1312 | 0.1607 | 0.1028
50 | 0.1270 | 8954 | 0.2965 | 0.1226
55 | 0.1104 | 9540 | 0.4314 | 0.1361

* The values in this row are determined as the final values of the three parameters of the SVR model.
Table 2. Forecasting results and associated parameters of the SVRGA-SA models.

Morning peak period:

No. of fed-in data | \sigma | C | \varepsilon | NRMSE of testing
5 | 0.5089 | 1013 | 0.2290 | 0.3265
10 | 1.0479 | 1421 | 0.5395 | 0.3109
15* | 1.0502 | 2522 | 0.6872 | 0.2602
20 | 0.6858 | 2142 | 0.4724 | 0.2754
25 | 0.9668 | 4518 | 0.3377 | 0.2634

Evening peak period:

No. of fed-in data | \sigma | C | \varepsilon | NRMSE of testing
25 | 0.7472 | 10,215 | 0.8739 | 0.1112
30 | 0.6498 | 10,948 | 0.9494 | 0.1090
35* | 0.6717 | 9792 | 0.7517 | 0.1004
40 | 0.1605 | 8774 | 0.2110 | 0.1207
45 | 0.1372 | 10,791 | 0.6252 | 0.1401
50 | 0.1338 | 9560 | 0.2565 | 0.1212
55 | 0.1903 | 9359 | 0.5196 | 0.1017

* The values in this row are determined as the final values of the three parameters of the SVR model.
Table 3. Morning peak period traffic flow forecasting results (unit: EOP).

Peak periods a | Actual | SARIMA(1,0,1)(0,1,1)_5 | BPNN | HW (\alpha = 0.654, \beta = 0.041) | SHW (L = 5, \alpha = 0.15, \beta = 0.72, \gamma = 0.73) | SVRGA | SVRGA-SA
031106 | 1317.5 | 1363.87 | 1361.3 | 2239.5 | 2243.1 | 2190.9 | 2148.3
031107 | 2522.0 | 2440.1 | 2573.9 | 1270.0 | 1624.5 | 2027.4 | 2010.5
031108 | 2342.0 | 2593.9 | 2677.6 | 2364.5 | 2222.3 | 2140.5 | 2013.6
031109 | 2072.0 | 2422.1 | 2351.2 | 2483.0 | 2314.6 | 2313.7 | 2364.2
031110 | 1841.5 | 2459.9 | 2256.9 | 2188.5 | 2163.4 | 2053.4 | 2065.3
031206 | 995.5 | 1578.3 | 1387.8 | 2091.5 | 1951.8 | 1980.7 | 1820.8
031207 | 1457.0 | 2569.9 | 2656.8 | 1277.0 | 1299.6 | 1704.1 | 1663.5
031208 | 1899.0 | 2690.4 | 2586.4 | 2507.0 | 1380.0 | 1548.3 | 1441.3
031209 | 1870.5 | 2505.4 | 2329.1 | 2627.0 | 1710.8 | 1521.0 | 1419.8
031210 | 2151.5 | 2538.7 | 2180.2 | 2264.5 | 1810.9 | 1881.1 | 1813.5
NRMSE | | 0.3039 | 0.2706 | 0.3682 | 0.2972 | 0.2631 | 0.2602

a "031106" denotes 6 o'clock on 11 March 2005, and so on.
4.2. SVRGA-SA traffic forecasting model

For the SVRGA-SA model, a rolling-based forecasting procedure is conducted in the training stage, and a one-hour-ahead forecasting policy is adopted in the validation and testing stages. Several data-rolling configurations are considered for forecasting the traffic flow in the next hour: different numbers of past traffic flow values in the time series are fed into the SVRGA-SA model to forecast the traffic flow in the next validation period. While improvements in the training error occur, the three kernel parameters \sigma, C, and \varepsilon of the SVRGA-SA model, adjusted by the GA-SA algorithm, are used to calculate the validation error; the adjusted parameters with minimum validation error are then selected as the most appropriate parameters. Table 1 indicates that the SVRGA (SVR with genetic algorithm) models perform best when 25 and 35 fed-in data are used for the morning and evening traffic forecasts, respectively. Table 2 indicates that the SVRGA-SA (SVR with the hybrid GA-SA algorithm) models perform best when 15 and 35 fed-in data are used for the morning and evening traffic forecasts, respectively. The rolling procedure is sketched below.
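The following sketch is one reading of the rolling one-hour-ahead procedure: the previous `n_fed` hourly observations form the input window for the next hour's forecast, and the window rolls forward through the test period. The function and parameter names are assumptions; `model` stands in for the SVR of Section 2.2 tuned by GA-SA.

```python
# Sketch of the rolling one-hour-ahead forecasting procedure.
import numpy as np

def rolling_one_step(series, n_fed, model, n_test):
    """Refit on all data before each test hour, then forecast one hour ahead.

    series: 1-D numpy array of hourly EOP values.
    n_fed:  number of fed-in data (the rolling window length).
    """
    preds = []
    for t in range(len(series) - n_test, len(series)):
        # Windows of length n_fed that end strictly before hour t.
        X = np.array([series[i:i + n_fed] for i in range(t - n_fed)])
        y = series[n_fed:t]                    # next-hour targets
        model.fit(X, y)
        preds.append(model.predict(series[t - n_fed:t].reshape(1, -1))[0])
    return np.array(preds)
```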
Table 4. Evening peak period traffic flow forecasting results (unit: EOP).

Peak periods a | Actual | SARIMA(1,0,1)(1,1,1)_5 | BPNN | HW (\alpha = 0.76, \beta = 0.367) | SHW (L = 5, \alpha = 0.48, \beta = 0.04, \gamma = 0.13) | SVRGA | SVRGA-SA
031016 | 2310.5 | 2573.84 | 2844.5 | 2270.5 | 1999.5 | 2251.6 | 2209.8
031017 | 2618.0 | 2821.57 | 2110.5 | 2259.8 | 2260.0 | 2364.8 | 2321.2
031018 | 2562.0 | 3107.01 | 2620.0 | 2590.9 | 2475.5 | 2423.4 | 2364.4
031019 | 2451.5 | 3103.66 | 2853.5 | 2619.7 | 2433.2 | 2598.9 | 2569.9
031020 | 2216.5 | 3011.80 | 2873.0 | 2495.7 | 2312.3 | 2361.3 | 2423.7
031116 | 2175.5 | 2611.58 | 2624.5 | 2209.5 | 2004.6 | 2101.2 | 2369.9
031117 | 2577.0 | 2859.31 | 1999.0 | 2100.2 | 2254.4 | 2269.6 | 2153.1
031118 | 2879.5 | 3144.75 | 2191.5 | 2512.1 | 2444.4 | 2156.5 | 2431.8
031119 | 2693.0 | 3141.40 | 2500.5 | 2943.3 | 2404.0 | 2792.6 | 2643.9
031120 | 2640.0 | 3049.54 | 2386.0 | 2835.2 | 2281.8 | 2693.9 | 2729.3
031216 | 2146.5 | 2649.32 | 2203.5 | 2714.6 | 2000.5 | 2542.7 | 2618.8
031217 | 2544.5 | 2897.05 | 2116.0 | 2152.1 | 2251.7 | 2379.6 | 2596.6
031218 | 2873.0 | 3182.49 | 2522.5 | 2429.0 | 2438.7 | 2688.5 | 2534.4
031219 | 2567.5 | 3179.13 | 2620.0 | 2869.0 | 2383.7 | 2602.0 | 2623.9
031220 | 2660.5 | 3087.28 | 2512.0 | 2658.3 | 2268.2 | 2626.5 | 2579.3
NRMSE | | 0.1821 | 0.1636 | 0.1232 | 0.1142 | 0.1010 | 0.1004

a "031016" denotes 16 o'clock on 10 March 2005, and so on.
Fig. 6. The forecast morning peak traffic flow (EOP) of the SARIMA, BPNN, HW, SHW, SVRGA and SVRGA-SA models. ("031106" denotes 6 o'clock on 11 March 2005, and so on.)
Fig. 7. The forecast evening peak traffic flow (EOP) of the SARIMA, BPNN, HW, SHW, SVRGA and SVRGA-SA models. ("031016" denotes 16 o'clock on 10 March 2005, and so on.)
4.3. Forecasting results and discussion

The well-trained models, SARIMA, BPNN, HW, SHW, SVRGA, and SVRGA-SA, are applied to forecast the traffic flow during the morning and evening peak periods. Tables 3 and 4 show the actual values and the forecasts obtained from the various models for the morning and evening peaks, respectively. The NRMSE values are calculated for each peak period to compare the proposed model fairly with the alternatives. The proposed SVRGA-SA model has smaller NRMSE values than the SARIMA, BPNN, HW, SHW, and SVRGA models in capturing the hourly traffic flow patterns. In addition, the GA-SA algorithm avoids getting trapped in local minima better than GAs do, so the SVRGA-SA model outperforms the SVRGA model. For example, in the morning peak case of Table 2, the GA-SA algorithm shifts the local solution of the SVRGA model obtained with the 25 fed-in-data rolling type, (\sigma, C, \varepsilon) = (0.9599, 6850, 0.4339), with a local optimal forecasting error of NRMSE = 0.2631, to a better solution obtained with the 15 fed-in-data rolling type, (\sigma, C, \varepsilon) = (1.0502, 2522, 0.6872), with a more appropriate local optimal forecasting error of NRMSE = 0.2602. This again reveals that the GA-SA algorithm is more suitable than GAs for adjusting the parameters of the SVR model to improve forecasting accuracy; a similar capability of GA-SA is also observed in the evening peak case. Figs. 6 and 7 illustrate the forecast traffic flow of the SARIMA, BPNN, HW, SHW, SVRGA, and SVRGA-SA models.

5. Conclusions

Accurate traffic forecasts are crucial for an inter-urban traffic control system, particularly for avoiding congestion and increasing the efficiency of limited traffic resources during peak periods. The historical traffic data of Panchiao city in northern Taiwan show a seasonally fluctuating trend that occurs in many inter-urban traffic systems; over-prediction or under-prediction of traffic flow therefore influences the transportation capability of an inter-urban system. In this investigation, the SVRGA-SA model is proposed to forecast inter-urban traffic flow, and the feasibility of the GA-SA algorithm for parameter determination, integrated into the SVR model to improve forecasting accuracy, is evaluated. Many forecasting methodologies have been proposed to deal with the seasonality of traffic flow; however, most models are time consuming in verifying suitable time-phase divisions, particularly when the sample size is large. The SVRGA-SA model provides a convenient and valid alternative for traffic flow forecasting: it directly uses historical observations from traffic control systems and then determines suitable parameters with efficient optimization algorithms. Tables 3 and 4 illustrate that the SVRGA-SA model gives better results than the other forecasting models; in particular, the SVRGA-SA model avoids being trapped in a local optimum as the SVRGA model is.
The superior performance of the SVRGA-SA model arises because: (1) its nonlinear mapping capability captures the data patterns of traffic flow more easily than SARIMA models can; (2) the SVR model applies structural risk minimization rather than minimizing the training errors, and this minimization of an upper bound on the generalization error provides better generalization performance than SARIMA models achieve; and (3) the parameter selection of an SVR model heavily influences its forecasting performance, so improper selection of the three parameters will lead to either over-fitting
or under-fitting of the SVR model. The GA-SA algorithm employs the superiority of the SA algorithm in escaping from local minima toward the global minimum, and then applies the search capability of the GA mutation process to determine a proper parameter combination. The favorable results obtained in this work reveal that the proposed model is a valid alternative for traffic flow forecasting. In the future, other novel hybrid evolutionary algorithms should be applied to obtain more appropriate parameter combinations and thereby achieve more satisfactory traffic flow forecasting accuracy as traffic congestion issues grow. Other socio-economic factors and meteorological control variables during peak periods, such as driving speed limits, important social events, the percentage of heavy vehicles, bottleneck service levels, and waiting times at intersection traffic signals, can also be included in the traffic forecasting model. In addition, although the proposed SVRGA-SA model is itself a hybrid forecasting model, other advanced optimization algorithms for parameter selection can be applied to the SVR model to satisfy the requirements of real-time traffic control systems. The goal of the authors is to show that combinations of novel techniques perform at least as well as pure techniques.

Acknowledgments

This research was conducted with the support of the National Science Council, Taiwan, ROC (NSC 99-2410-H-161-001), and the support of the National Natural Science Foundation of China (No. 70801048).

References

[1] M.S. Dougherty, Investigation of Network Performance Prediction Literature Review (Technical Note 394), Institute for Transport Studies, University of Leeds, UK, 1996.
[2] I. Okutani, Y.J. Stephanedes, Dynamic prediction of traffic volume through Kalman filtering theory, Transportation Research Part B 18 (1984) 1–11.
[3] A. Stathopoulos, G.M. Karlaftis, A multivariate state space approach for urban traffic flow modeling and prediction, Transportation Research Part C 11 (2003) 121–135.
[4] B. van Arem, M. van der Vlist, J. de Ruuiter, M. Muste, S. Smulders, Travel time estimation in the GERDIEN project, in: Proceedings of the Second DRIVE II Workshop on Short-Term Traffic Forecasting, Delft, 1994.
[5] J. Whittaker, S. Garside, K. Lindveld, Tracking and predicting a network traffic process, in: Proceedings of the Second DRIVE II Workshop on Short-Term Traffic Forecasting, Delft, 1994.
[6] P.C. Vythoulkas, Alternative approaches to short term forecasting for use in driver information systems, in: C.F. Daganzo (Ed.), Transportation and Traffic Theory, Elsevier, Amsterdam, 1993.
[7] M. Danech-Pajouh, M. Aron, ATHENA: a method for short-term inter-urban motorway traffic forecasting, Recherche Transports Sécurité 6 (1991) 11–16.
[8] R.L. Eubank, Spline Smoothing and Nonparametric Regression, Marcel Dekker Inc., New York, 1988.
[9] B.L. Smart, M.J. Demetsky, Traffic flow forecasting: comparison of modeling approaches, Journal of Transportation Engineering 123 (1997) 261–266.
[10] B.L. Smith, B.M. Williams, R.K. Oswald, Comparison of parametric and nonparametric models for traffic flow forecasting, Transportation Research Part C 10 (2002) 303–321.
[11] G.E.P. Box, G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, 1976.
[12] M.M. Hamed, H.R. Al-Masaeid, Z.M. Bani-Said, Short-term prediction of traffic volume in urban arterials, ASCE Journal of Transportation Engineering 121 (1995) 249–254.
[13] Y. Kamarianakis, P. Prastacos, Space-time modeling of traffic flow, Computers & Geosciences 31 (2005) 119–133.
[14] H.R. Kirby, S.M. Watson, M.S. Dougherty, Should we use neural networks or statistical models for short-term motorway traffic forecasting?, International Journal of Forecasting 13 (1997) 43–50.
[15] B.M. Williams, Multivariate vehicular traffic flow prediction: an evaluation of ARIMAX modeling, Transportation Research Record 1776 (2001) 194–200.
[16] P.J. Angeline, G.M. Saunders, J.B. Pollack, An evolutionary algorithm that constructs recurrent neural networks, IEEE Transactions on Neural Networks 5 (1994) 54–65.
[17] B.M. Williams, Modeling and Forecasting Vehicular Traffic Flow as a Seasonal Stochastic Time Series Process, Ph.D. Dissertation, Department of Civil Engineering, University of Virginia, Charlottesville, Virginia, 1999.
[18] H. Chen, S. Grant-Muller, L. Mussone, F. Montgomery, A study of hybrid neural network approaches and the effects of missing data on traffic forecasting, Neural Computing and Applications 10 (2001) 277–286.
[19] M.S. Dougherty, M.R. Cobbett, Short-term inter-urban traffic forecasts using neural networks, International Journal of Forecasting 13 (1997) 21–31.
[20] L. Florio, L. Mussone, Neural network models for classification and forecasting of freeway traffic flow stability, Control Engineering Practice 4 (1996) 153–164.
[21] C. Ledoux, An urban traffic flow model integrating neural networks, Transportation Research Part C 5 (1997) 287–300.
[22] E.I. Vlahogianni, M.G. Karlaftis, J.C. Golias, Optimized and meta-optimized neural networks for short-term traffic flow prediction: a genetic approach, Transportation Research Part C 13 (2005) 211–234.
[23] G. Zhang, B.E. Patuwo, M.Y. Hu, Forecasting with artificial neural networks: the state of art, International Journal of Forecasting 14 (1998) 35–62.
[24] H. Yin, S.C. Wong, J. Xu, C.K. Wong, Urban traffic flow prediction using a fuzzy-neural approach, Transportation Research Part C 10 (2002) 85–98.
[25] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
[26] C. Cortes, V. Vapnik, Support vector networks, Machine Learning 20 (1995) 273–297.
[27] V. Vapnik, S. Golowich, A. Smola, Support vector machine for function approximation, regression estimation, and signal processing, Advances in Neural Information Processing Systems 9 (1996) 281–287.
[28] H. Drucker, C.J.C. Burges, L. Kaufman, A. Smola, V.N. Vapnik, Support vector regression machines, Advances in Neural Information Processing Systems 9 (1997) 155–161.
[29] C.A. Martin, S.F. Witt, Forecasting tourism demand: a comparison of accuracy of several quantitative methods, International Journal of Forecasting 5 (1989) 7–19.
[30] L. Cao, Support vector machines experts for time series forecasting, Neurocomputing 51 (2003) 321–339.
[31] L. Cao, Q. Gu, Dynamic support vector machines for non-stationary time series forecasting, Intelligent Data Analysis 6 (2002) 67–83.
[32] W. Huang, Y. Nakamori, S.Y. Wang, Forecasting stock market movement direction with support vector machine, Computers & Operations Research 32 (2005) 2513–2522.
[33] P.F. Pai, C.S. Lin, Using support vector machines in forecasting production values of machinery industry in Taiwan, International Journal of Advanced Manufacturing Technology 27 (2005) 205–210.
[34] P.F. Pai, C.S. Lin, A hybrid ARIMA and support vector machines model in stock price forecasting, Omega 33 (2005) 497–505.
[35] F.E.H. Tay, L. Cao, Application of support vector machines in financial time series forecasting, Omega 29 (2001) 309–317.
[36] F.E.H. Tay, L. Cao, Modified support vector machines in financial time series forecasting, Neurocomputing 48 (2002) 847–861.
[37] P.F. Pai, W.C. Hong, Software reliability forecasting by support vector machines with simulated annealing algorithms, Journal of Systems and Software 79 (2006) 747–755.
[38] W.C. Hong, P.F. Pai, Potential assessment of the support vector regression technique in rainfall forecasting, Water Resources Management 21 (2007) 495–513.
[39] M.A. Mohandes, T.O. Halawani, S. Rehman, A.A. Hussain, Support vector machines for wind speed prediction, Renewable Energy 29 (2004) 939–947.
[40] P.F. Pai, W.C. Hong, A recurrent support vector regression model in rainfall forecasting, Hydrological Processes 21 (2007) 819–827.
[41] W. Wang, Z. Xu, J.W. Lu, Three improved neural network models for air quality forecasting, Engineering Computations 20 (2003) 192–210.
[42] P.F. Pai, W.C. Hong, Forecasting regional electric load based on recurrent support vector machines with genetic algorithms, Electric Power Systems Research 74 (2005) 417–425.
[43] P.F. Pai, W.C. Hong, Support vector machines with simulated annealing algorithms in electricity load forecasting, Energy Conversion and Management 46 (2005) 2626–2669.
[44] P.F. Pai, W.C. Hong, An improved neural network model in forecasting arrivals, Annals of Tourism Research 32 (2005) 1138–1141.
[45] P.F. Pai, W.C. Hong, P.T. Chang, C.T. Chen, The application of support vector machines to forecast tourist arrivals in Barbados: an empirical study, International Journal of Management 23 (2006) 375–385.
[46] V. Cherkassky, Y. Ma, Practical selection of SVM parameters and noise estimation for SVM regression, Neural Networks 17 (2004) 113–126.
[47] J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.
[48] S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi, Optimization by simulated annealing, Science 220 (1983) 671–680.
[49] J. Lee, G.E. Johnson, Optimal tolerance allotment using a genetic algorithm and truncated Monte Carlo simulation, Computer Aided Design 25 (1983) 601–611.
[50] H.J. Shieh, R.C. Peralta, Optimal in situ bioremediation design by hybrid genetic algorithm-simulated annealing, Journal of Water Resources Planning and Management 131 (2005) 67–78.
[51] S.G. Ponnambalam, M.M. Reddy, A GA-SA multiobjective hybrid search algorithm for integrating lot sizing and sequencing in flow-line scheduling, International Journal of Advanced Manufacturing Technology 21 (2003) 126–137.
[52] F. Zhao, X. Zeng, Simulated annealing—genetic algorithm for transit network optimization, Journal of Computing in Civil Engineering 20 (2006) 57–68.
[53] O. Cordón, F. Moya, C. Zarco, A new evolutionary algorithm combining simulated annealing and genetic programming for relevance feedback in fuzzy information retrieval systems, Soft Computing 6 (2002) 308–319.
[54] K. Ganesh, M. Punniyamoorthy, Optimization of continuous-time production planning using hybrid genetic algorithms-simulated annealing, International Journal of Advanced Manufacturing Technology 26 (2005) 148–154.
[55] Z.G. Wang, Y.S. Wong, M. Rahman, Optimisation of multi-pass milling using genetic algorithm and genetic simulated annealing, International Journal of Advanced Manufacturing Technology 24 (2004) 727–732.
[56] P.K. Bergey, C.T. Ragsdale, M. Hoskote, A simulated annealing genetic algorithm for the electrical power districting problem, Annals of Operations Research 121 (2003) 33–55.
[57] C.C. Holt, Forecasting Seasonal and Trends by Exponentially Weighted Averages, Carnegie Institute of Technology, Pittsburgh, PA, 1957.
[58] P.R. Winters, Forecasting sales by exponentially weighted moving averages, Management Science 6 (1960) 324–342.
[59] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, Upper Saddle River, NJ, 1999.
[60] J. Shawe-Taylor, P.L. Bartlett, R.C. Williamson, M. Anthony, Structural risk minimization over data-dependent hierarchies, IEEE Transactions on Information Theory 44 (1998) 1926–1940.
[61] S. Amari, S. Wu, Improving support vector machine classifiers by modifying kernel functions, Neural Networks 12 (1999) 783–789.
[62] K. Vojislav, Learning and Soft Computing—Support Vector Machines, Neural Networks and Fuzzy Logic Models, The MIT Press, Massachusetts, 2001.
[63] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, Equations of state calculations by fast computing machines, Journal of Chemical Physics 21 (1953) 1087–1092.
[64] A. Dekkers, E.H.L. Aarts, Global optimization and simulated annealing, Mathematical Programming 50 (1991) 367–393.