Applied Mathematics and Computation 217 (2011) 6733–6747
Hybrid evolutionary algorithms in a SVR traffic flow forecasting model

Wei-Chiang Hong a,*, Yucheng Dong b, Feifeng Zheng c, Shih Yung Wei d

a Department of Information Management, Oriental Institute of Technology, 58 Sec. 2, Sichuan Rd., Panchiao, Taipei 220, Taiwan, ROC
b Department of Organization and Management, School of Management, Xi'an Jiaotong University, Xi'an 710049, PR China
c Department of Management Science, School of Management, Xi'an Jiaotong University, Xi'an 710049, PR China
d Department & Graduate Institute of Finance, National Yunlin University of Science & Technology, 123 Sec. 3, University Rd., Douliou, Yunlin 640, Taiwan, ROC
Keywords: Traffic flow forecasting; Support vector regression (SVR); Hybrid genetic algorithm-simulated annealing algorithm (GA-SA); Hybrid evolutionary algorithms; SARIMA; Back-propagation neural network (BPNN); Holt–Winters (HW); Seasonal Holt–Winters (SHW)
Abstract

Accurate urban traffic flow forecasting is critical to the development and implementation of intelligent transportation systems, and it has therefore been one of the most important issues in research on road traffic congestion. Because urban traffic flow exhibits complex nonlinear data patterns, many kinds of traffic flow forecasting techniques have been proposed in the literature, and it is difficult to conclude in general that one forecasting technique is superior to the others. Recently, the support vector regression (SVR) model has been widely used to solve nonlinear regression and time series problems. This investigation presents an SVR traffic flow forecasting model that employs a hybrid genetic algorithm-simulated annealing algorithm (GA-SA) to determine a suitable combination of its parameters. Additionally, a numerical example of traffic flow data from northern Taiwan is used to elucidate the forecasting performance of the proposed SVRGA-SA model. The forecasting results indicate that the proposed model yields more accurate forecasts than the seasonal autoregressive integrated moving average (SARIMA), back-propagation neural network (BPNN), Holt–Winters (HW) and seasonal Holt–Winters (SHW) models. Therefore, the SVRGA-SA model is a promising alternative for forecasting traffic flow.

© 2011 Elsevier Inc. All rights reserved.
1. Introduction

Providing sufficient and necessary capacity on inter-urban motorway networks is an essential component of traffic control and information systems management, particularly during daily peak periods. Traffic management authorities need better synchronized traffic signals, and drivers need more effective route selections. Since even slightly inaccurate capacity forecasts lead to congestion with huge social costs in terms of travel time, fuel consumption and environmental pollution, accurate forecasting of traffic flow during peak periods has attracted considerable interest in the literature. Various forecasting approaches have been applied to forecast the traffic flow of inter-urban motorway networks. These approaches can be classified according to the type of data, forecast horizon, and potential end-use [1], and include Kalman state space filtering models [2–5] and system identification models [6]. Traffic flow data are spatial time series, collected at specific locations at constant intervals of time. The studies mentioned above have indicated that the problem of forecasting inter-urban motorway traffic flow is multi-dimensional, involving relationships among measurements made at different times and geographical sites. In addition, these methods have difficulty coping with observation noise and missing values during modeling. Therefore, Danech-Pajouh and Aron [7] employed a layered statistical approach with a mathematical clustering technique to group the traffic flow data and a separately tuned linear regression
* Corresponding author. E-mail address: [email protected] (W.-C. Hong).
doi:10.1016/j.amc.2011.01.073
model for each cluster. Their experimental results revealed that the proposed model was superior to a competing forecasting approach, the autoregressive integrated moving average (ARIMA) model. Motivated by multi-dimensional pattern recognition requirements, such as intervals of time, geographical sites, and the relationships between the dependent variable and the independent variables, non-parametric regression models [8–10] have also been employed successfully to forecast motorway traffic flow. Furthermore, the ARIMA model, initially developed by Box and Jenkins [11], is one of the most popular alternatives in traffic flow forecasting [10,12–15]. For example, Kamarianakis and Prastacos [13] successfully applied the ARIMA model with space and time factors to forecast space–time stationary traffic flow. However, the critical limitation of the ARIMA model is its natural tendency to concentrate on the mean values of the past series data, which leaves it unable to capture the rapidly varying process underlying inter-urban traffic flow [16]. Recently, as an extension of the ARIMA model, Williams [17] applied the seasonal ARIMA (SARIMA) model to traffic flow forecasting. The model handles peak/non-peak flow periods through seasonal differencing, and the reported results show that it significantly outperforms a heuristic forecast generation method in terms of forecasting accuracy. However, detecting the required outliers and estimating the parameters of a SARIMA model is quite time consuming. These findings encourage the authors to employ the SARIMA model as a benchmark model in this study. The process underlying inter-urban traffic flow is complicated and difficult to capture with a single linear statistical algorithm. Being able to approximate any degree of complexity without prior knowledge of the problem, artificial neural networks (ANNs) have received much attention and have been considered as alternative traffic flow forecasting models [14,18–23]. An ANN emulates the processing of the human neurological system to determine relevant vehicle counts and temporal characteristics from historical traffic flow patterns, especially for nonlinear and dynamic evolutions; ANNs are therefore widely applied in traffic flow forecasting. Recently, Yin et al. [24] developed a fuzzy-neural model (FNM) to predict traffic flow in an urban street network. The FNM contains two modules: a gate network (GN) and an expert network (EN). The GN classifies the input data using a fuzzy approach, and the EN identifies the input–output relationship using neural network approaches. The empirical results show that the FNM provides more accurate forecasts than the BPNN model. Based on a proper representation of traffic flow data with temporal and spatial characteristics, Vlahogianni et al. [22] employed a genetic-algorithm-based, multilayered structural optimization strategy to determine the appropriate neural network structure. Their results show that the capabilities of a simple static neural network, with genetically optimized step size, momentum and number of hidden units, are very satisfactory when modeling both univariate and multivariate traffic data.
Even though ANN forecasting models can approximate any function, particularly nonlinear functions, their limitations lie not only in the difficulty of explaining the operation of the so-called black box (such as how to determine a suitable network structure), but also in the non-convexity of the network training problem, i.e., it is hard to find the global optimum of the training error. Proposed by Vapnik [25], support vector machines (SVMs) are one of the most significant developments in overcoming these shortcomings of ANNs. Rather than implementing the empirical risk minimization (ERM) principle to minimize the training error, SVMs apply the structural risk minimization (SRM) principle to minimize an upper bound on the generalization error. SVMs can theoretically guarantee the global optimum, instead of becoming trapped in a local optimum as ANNs can. Moreover, the solution of a nonlinear problem in the original lower-dimensional input space can be found as a linear solution in a higher-dimensional feature space. For a more detailed introduction to the mechanisms of SVMs, please refer to Cortes and Vapnik [26] and Vapnik et al. [27], among others. SVMs have found wide application in pattern recognition, bio-informatics, and other artificial intelligence applications. In particular, with the introduction of Vapnik's ε-insensitive loss function, SVMs have been extended to solve nonlinear regression estimation problems [28,29]; this extension is known as support vector regression (SVR). SVR has been successfully employed to solve forecasting problems in many fields, such as financial time series (stock index and exchange rate) forecasting [30–36], engineering and software (production values and reliability) forecasting [37], atmospheric science forecasting [38–41], electric load forecasting [42,43], and so on. SVR has also been successfully applied to forecast tourist arrivals [44,45]. The empirical results indicate that the selection of the three parameters (C, ε, and σ) of an SVR model significantly influences forecasting accuracy. Numerous publications have given recommendations on the appropriate setting of SVR parameters [46]; however, those approaches do not simultaneously consider the interactions among the three parameters. Since there is no general consensus and many contradictory opinions exist, evolutionary algorithms are employed here to determine appropriate parameter values. Genetic algorithms (GAs) are auto-adaptive stochastic search techniques [47] based on the Darwinian survival-of-the-fittest principle; they generate new individuals with selection, crossover and mutation operators. GAs start with a coding of the parameter set of any type of objective function, so they are able to solve problems that traditional algorithms cannot easily solve. In Pai and Hong [37,44], SVR with GAs is superior to other competitive forecasting models (ARIMA and ANN). GAs can preserve a few of the best-fitted members of the whole population for the next generation during the search; however, after some generations GAs may converge prematurely to a local optimum while searching for suitable parameters of an SVR model. Simulated annealing (SA) is a stochastic general search tool that mimics the annealing process in material physics [48]. When the energy of the original system state is greater than that of the newly generated state, the new state is automatically accepted.
In contrast, the new state is accepted according to the Metropolis criterion with a probability function. The performance of SA depends on the cooling schedule. SA thus has a mechanism for escaping from local minima and reaching the global minimum [49]. In Pai and Hong [43,37], SVR with SA is also superior to other competitive forecasting models (Weibull
Bayes, ARIMA, and GRNN). However, SA costs more computation time; to ensure the efficiency of SA, a proper temperature cooling rate (stopping criterion) must be chosen. To overcome these drawbacks of GAs and SA, an effective approach is needed that avoids being misled to a local optimum and searches the objective function efficiently. The hybrid genetic algorithm-simulated annealing (GA-SA) algorithm is a novel attempt to deal with these challenges. GA-SA first employs the ability of the SA algorithm to escape from local minima and approach the global minimum, and second applies the mutation process of GAs to improve the search over the range of parameter values. The hybrid algorithm has been applied to system design [50], system and network optimization [51,52], query processing in information retrieval systems [53], continuous-time production planning [54,55], and the electrical power districting problem [56]. However, the GA-SA has seldom been applied to determining the parameters of an SVR model. The investigation presented in this paper is motivated by a desire to address both the premature convergence of GAs to local optima and the efficiency concerns of SA mentioned above when determining the three free parameters of an SVR traffic flow forecasting model. Therefore, the GA-SA algorithm is embedded in the SVR model, denoted SVRGA-SA, to forecast inter-urban motorway traffic flow in Panchiao city of Taipei County, Taiwan. The rest of this paper is organized as follows. Section 2 presents the models used for forecast performance comparison and the SVR model. Section 3 introduces the proposed SVRGA-SA forecasting model. Section 4 illustrates a numerical example that reveals the forecasting performance of the proposed models. Conclusions are drawn in Section 5.

2. Forecasting methodology

In this investigation, four classes of models, the seasonal ARIMA (SARIMA), back-propagation neural network (BPNN), Holt–Winters and seasonal Holt–Winters (HW and SHW) models, and the SVRGA-SA model, are compared on traffic flow forecasting performance.

2.1. Models for comparing forecast performance

2.1.1. The seasonal autoregressive integrated moving average (SARIMA) model

Proposed by Box and Jenkins [11], the seasonal ARIMA process has been one of the most popular approaches in time series forecasting, particularly for series with a strong seasonal component. The SARIMA process is often referred to as the SARIMA(p, d, q)(P, D, Q)_S model. As in the ARIMA model, the forecast values are assumed to be a linear combination of past values and past errors. A time series {X_t} is a SARIMA process with seasonal period length S if d and D are nonnegative integers and if the differenced series W_t = (1 - B)^d (1 - B^S)^D X_t is a stationary autoregressive moving average process. In symbolic terms, the model can be written as
\phi_p(B) \Phi_P(B^S) W_t = \theta_q(B) \Theta_Q(B^S) \varepsilon_t, \quad t = 1, 2, \ldots, N,   (1)
where N is the number of observations up to time t; B is the backshift operator defined by B^a W_t = W_{t-a}; \phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p is the regular (non-seasonal) autoregressive operator of order p; \Phi_P(B^S) = 1 - \Phi_1 B^S - \cdots - \Phi_P B^{PS} is the seasonal autoregressive operator of order P; \theta_q(B) = 1 - \theta_1 B - \cdots - \theta_q B^q is the regular moving average operator of order q; \Theta_Q(B^S) = 1 - \Theta_1 B^S - \cdots - \Theta_Q B^{QS} is the seasonal moving average operator of order Q; and \varepsilon_t is identically and independently normally distributed with mean zero, variance \sigma^2, and cov(\varepsilon_t, \varepsilon_{t-k}) = 0 for all k \neq 0. In the definition above, the parameters p and q represent the autoregressive and moving average orders, respectively; P and Q represent the autoregressive and moving average orders at the model's seasonal period length S, respectively; and d and D represent the orders of ordinary and seasonal differencing, respectively. When fitting a SARIMA model to data, the first task is to estimate d and D, the orders of differencing needed to make the series stationary and to remove most of the seasonality. The values of p, P, q and Q are then estimated from the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series. The remaining model parameters may be estimated by suitable iterative procedures.
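As an illustration of this identification-and-estimation workflow, the following minimal sketch fits a SARIMA model with the statsmodels library. The file name and the (p, d, q)(P, D, Q)_S orders are placeholders: in practice the orders are chosen by differencing the series and inspecting the ACF/PACF, not prescribed in advance.

```python
# Sketch: fitting a SARIMA(p, d, q)(P, D, Q)_S model with statsmodels.
# The orders below are illustrative placeholders only.
import numpy as np
import statsmodels.api as sm

hourly_flow = np.loadtxt("traffic_flow.csv")  # hypothetical EOP series

model = sm.tsa.SARIMAX(
    hourly_flow,
    order=(1, 0, 1),              # (p, d, q): regular AR/diff/MA orders
    seasonal_order=(0, 1, 1, 5),  # (P, D, Q, S): seasonal orders, period 5
)
fit = model.fit(disp=False)
print(fit.summary())
print(fit.forecast(steps=5))      # five one-step-ahead forecasts
```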
2.1.2. Back-propagation neural network (BPNN) model

The multi-layer back-propagation neural network (BPNN) is one of the most widely used neural network models. Consider the simplest BPNN architecture (Fig. 1), comprising three layers: an input layer (x), an output layer (o), and a hidden layer (h). The computational procedure of this network is described below:

o_i = f\left( \sum_j g_{ij} x_{ij} \right),   (2)
where o_i denotes the output of node i, f(\cdot) represents the activation function, g_{ij} is the connection weight between node i and node j in the lower layer (instantiated below as v_{ji} and w_{kj}), and x_{ij} denotes the input signal from node j in the lower layer. The BPNN algorithm attempts to improve network performance by reducing the total error through gradient-based weight changes. The BPNN algorithm minimizes the sum of squared errors, which can be calculated by
E = \frac{1}{2} \sum_{p=1}^{P} \sum_{j=1}^{K} (d_{pj} - o_{pj})^2,   (3)
Fig. 1. A back-propagation network.
where E denotes the squared error, K is the number of output layer neurons, P is the number of training data patterns, d_{pj} denotes the actual (desired) output, and o_{pj} represents the network output. The BPNN algorithm proceeds as follows. Let \Delta v_{ji} denote the weight change for any hidden layer neuron and \Delta w_{kj} that for any output layer neuron:
\Delta v_{ji} = -\eta \frac{\partial E}{\partial v_{ji}}, \quad i = 1, \ldots, I, \; j = 1, \ldots, J-1,   (4)

\Delta w_{kj} = -\eta \frac{\partial E}{\partial w_{kj}}, \quad j = 1, \ldots, J-1, \; k = 1, \ldots, K,   (5)
where \eta represents the learning rate parameter. Notably, the Jth node in Fig. 1 is the bias neuron without weight. Eqs. (6) and (7) express the signal s_j to each hidden layer neuron and the signal u_k to each neuron in the output layer:
s_j = \sum_{i=1}^{I} v_{ji} x_i,   (6)

u_k = \sum_{j=1}^{J-1} w_{kj} y_j.   (7)
The error signal terms for the jth hidden neuron, \delta_{yj}, and for the kth output neuron, \delta_{ok}, are defined in Eqs. (8) and (9), respectively:

\delta_{yj} = -\frac{\partial E}{\partial s_j},   (8)

\delta_{ok} = -\frac{\partial E}{\partial u_k}.   (9)
Applying the chain rule, the gradients of the cost function with respect to the weights v_{ji} and w_{kj} are

\frac{\partial E}{\partial v_{ji}} = \frac{\partial E}{\partial s_j} \frac{\partial s_j}{\partial v_{ji}},   (10)

\frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial u_k} \frac{\partial u_k}{\partial w_{kj}},   (11)
and
\frac{\partial s_j}{\partial v_{ji}} = \frac{\partial (v_{j1} x_1 + v_{j2} x_2 + \cdots + v_{ji} x_i + \cdots + v_{jI} x_I)}{\partial v_{ji}} = x_i,   (12)

\frac{\partial u_k}{\partial w_{kj}} = \frac{\partial (w_{k1} y_1 + w_{k2} y_2 + \cdots + w_{kj} y_j + \cdots + w_{kJ} y_J)}{\partial w_{kj}} = y_j;   (13)
combining Eqs. (8), (10) and (12), and Eqs. (9), (11) and (13), yields

\frac{\partial E}{\partial v_{ji}} = -\delta_{yj} x_i,   (14)

\frac{\partial E}{\partial w_{kj}} = -\delta_{ok} y_j,   (15)
and the weight changes from Eqs. (4) and (5) can now be written as

\Delta v_{ji} = -\eta \frac{\partial E}{\partial v_{ji}} = \eta \delta_{yj} x_i,   (16)

\Delta w_{kj} = -\eta \frac{\partial E}{\partial w_{kj}} = \eta \delta_{ok} y_j.   (17)
Furthermore, Eqs. (8) and (9) can be evaluated as follows:

\delta_{ok} = -\frac{\partial E}{\partial u_k} = -\frac{\partial E}{\partial o_k} \frac{\partial o_k}{\partial u_k} = (d_k - o_k) f'(u_k),   (18)

\delta_{yj} = -\frac{\partial E}{\partial s_j} = -\frac{\partial E}{\partial y_j} \frac{\partial y_j}{\partial s_j} = \left\{ \sum_{k=1}^{K} \delta_{ok} w_{kj} \right\} f_j'(u_j).   (19)

The weights v_{ji} and w_{kj} are then updated as follows:
w_{kj} = w_{kj} + \Delta w_{kj} = w_{kj} + \eta \delta_{ok} y_j,   (20)

v_{ji} = v_{ji} + \Delta v_{ji} = v_{ji} + \eta f_j'(u_j) x_i \sum_{k=1}^{K} \delta_{ok} w_{kj}.   (21)
The constant \eta is specified at the start of the training cycle and determines the training speed and stability of the network. The most common activation functions are squashing sigmoid functions, such as the logistic and hyperbolic tangent functions.
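A minimal NumPy sketch of the update rules in Eqs. (2)-(21) is given below, assuming a logistic activation in both layers and online (per-pattern) updates; the layer sizes, learning rate and random seed are illustrative, and the bias neuron of Fig. 1 is omitted for brevity.

```python
# Minimal sketch of one BPNN training pass per Eqs. (2)-(21).
# Logistic activation f(z) = 1/(1+exp(-z)), so f'(z) = f(z)(1-f(z)).
import numpy as np

rng = np.random.default_rng(0)
I, H, K = 5, 3, 1            # input, hidden, output sizes (illustrative)
eta = 0.05                   # learning rate

V = rng.normal(scale=0.1, size=(H, I))   # hidden weights v_ji
W = rng.normal(scale=0.1, size=(K, H))   # output weights w_kj

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_pattern(x, d):
    """One online update for input x of shape (I,), desired output d of shape (K,)."""
    global V, W
    s = V @ x                                  # Eq. (6): hidden net input
    y = sigmoid(s)                             # hidden activation
    u = W @ y                                  # Eq. (7): output net input
    o = sigmoid(u)                             # network output
    delta_o = (d - o) * o * (1 - o)            # Eq. (18)
    delta_y = (W.T @ delta_o) * y * (1 - y)    # Eq. (19)
    W += eta * np.outer(delta_o, y)            # Eqs. (17) and (20)
    V += eta * np.outer(delta_y, x)            # Eqs. (16) and (21)
    return 0.5 * np.sum((d - o) ** 2)          # Eq. (3) for one pattern
```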
2.1.3. Holt–Winters (HW) and seasonal Holt–Winters (SHW) models

The last comparison model employed is the Holt–Winters (HW) method [57,58], an extension of the exponentially weighted moving average procedure. The exponentially weighted moving average approach forecasts future values from past observations, placing more weight on recent observations. The Holt–Winters method smoothes the level and trend values separately with two smoothing coefficients (with values between 0 and 1) and incorporates an explicit linear trend in the forecast. Holt–Winters linear exponential smoothing is given by

s_t = \alpha a_t + (1 - \alpha)(s_{t-1} + b_{t-1}),   (22)

b_t = \beta (s_t - s_{t-1}) + (1 - \beta) b_{t-1},   (23)

f_{t+i} = s_t + i b_t,   (24)
where a_t is the actual value at time t; s_t is the smoothed estimate at time t; b_t is the trend value at time t; \alpha is the level smoothing coefficient; and \beta is the trend smoothing coefficient. Eq. (22) smoothes the actual value recursively by weighting the current level by \alpha, and then adjusts s_t directly for the trend of the previous period, b_{t-1}, by adding it to the last smoothed value, s_{t-1}. This helps to eliminate lag and brings s_t close to the approximate base of the current data value. Eq. (23) updates the trend, which is expressed as the difference between the last two smoothed values; it modifies the trend by smoothing the difference of the last period (s_t - s_{t-1}) with \beta and adding the previous trend estimate multiplied by (1 - \beta). Eq. (24) is used to forecast ahead: the trend b_t is multiplied by the number of periods ahead to be forecast, i, and added to the base value s_t. The forecast error e_t is defined as the actual value minus the forecast (fitted) value for time period t, that is:
e_t = a_t - f_t.   (25)
The forecast error is assumed to be an independent random variable with zero mean and constant variance. The values of the smoothing coefficients \alpha and \beta are chosen to minimize the forecast NRMSE.
To account for the seasonal effect, the seasonal Holt–Winters linear exponential smoothing (SHW) approach is also employed. The Holt–Winters method can be extended to accommodate additive seasonality, where the magnitude of the seasonal effect does not change with the level of the series, or multiplicative seasonality, where the amplitude of the seasonal pattern changes over time. The multiplicative SHW forecast is as follows:
s_t = \alpha \frac{a_t}{I_{t-L}} + (1 - \alpha)(s_{t-1} + b_{t-1}),   (26)

b_t = \beta (s_t - s_{t-1}) + (1 - \beta) b_{t-1},   (27)

I_t = \gamma \frac{a_t}{s_t} + (1 - \gamma) I_{t-L},   (28)

f_{t+i} = (s_t + i b_t) I_{t-L+i},   (29)
where L is the length of the seasonal cycle; I is the seasonal adjustment factor; and \gamma is the seasonal adjustment coefficient. Eq. (26) differs slightly from Eq. (22) in that the first term is divided by the seasonal number I_{t-L}; this deseasonalizes a_t (eliminates its seasonal fluctuations). Eq. (28) computes a seasonal index as the ratio of the current value of the series, a_t, to the smoothed value of the series, s_t. If a_t is larger than s_t, the ratio is greater than 1; otherwise, it is less than 1. To smooth out the randomness of a_t, Eq. (28) weights the newly computed seasonal factor by \gamma and the most recent seasonal number for the same season by (1 - \gamma).
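The following sketch implements the multiplicative SHW recursions of Eqs. (26)-(29). The simple initialization (first-cycle averages for the level, trend and seasonal factors) is an assumption made for the sketch; the paper itself chooses the coefficients by minimizing the forecast NRMSE.

```python
# Sketch of multiplicative seasonal Holt-Winters smoothing, Eqs. (26)-(29).
# Initialization from the first two seasonal cycles is an assumption.
import numpy as np

def shw_forecast(a, L, alpha, beta, gamma, horizon=1):
    a = np.asarray(a, dtype=float)            # needs len(a) >= 2 * L
    s = a[:L].mean()                          # initial level
    b = (a[L:2 * L].mean() - a[:L].mean()) / L  # initial trend
    I = list(a[:L] / s)                       # initial seasonal factors
    for t in range(L, len(a)):
        s_prev = s
        s = alpha * a[t] / I[t - L] + (1 - alpha) * (s + b)      # Eq. (26)
        b = beta * (s - s_prev) + (1 - beta) * b                 # Eq. (27)
        I.append(gamma * a[t] / s + (1 - gamma) * I[t - L])      # Eq. (28)
    # Eq. (29): forecast i = 1..horizon steps past the last observation.
    t_last = len(a) - 1
    return [(s + i * b) * I[t_last - L + i] for i in range(1, horizon + 1)]
```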
2.2. The support vector regression (SVR) model

2.2.1. Structural risk minimization

As mentioned above, traditional artificial intelligence approaches tend to be based on minimizing the training error over the training set, i.e., empirical risk minimization (ERM). However, ERM does not guarantee good generalization to novel test data. To separate the classes with a surface (hyperplane) that maximizes the margin between the training data, SVMs employ the SRM principle, which aims to minimize a bound on the generalization error rather than the mean squared error over the training data set. SRM provides a well-defined quantitative measure of the capacity of a learned function to capture the true structure of the data distribution and to generalize to unknown test data. The Vapnik–Chervonenkis (VC) dimension [31] has been used as such a capacity measure; by selecting a function and minimizing its empirical error on a training data set, SRM guarantees a minimal bound on the test data set. Given a training data set of N elements {(x_i, y_i), i = 1, 2, \ldots, N}, where x_i is the ith element in n-dimensional space, i.e., x_i = [x_i^1, \ldots, x_i^n] \in R^n, and y_i \in \{-1, +1\} is the label of x_i, define a deterministic function f: x \to \{-1, +1\} for a given input x and adjustable weights w (w \in R^n), with the data drawn according to the same but unknown probability distribution P(x, y). The weights w are adjusted during the training stage. Since the underlying probability distribution P(x, y) is unknown, the upper bound on the probability of classification errors on the test data set (the expected error of f), R(f), cannot be minimized directly. It is, however, feasible to estimate an approximation of R(f), the empirical risk, denoted R_{emp}(f), that is close to the optimal one based on the training data pairs (x, y). According to the SRM principle [25,26], R(f) and R_{emp}(f) satisfy

R(f) \le R_{emp}(f) + \varepsilon_1(N, h, \eta, R_{emp}),   (30)

R_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} |y_i - f(x_i)|_{\text{loss function}},   (31)

\varepsilon_1(N, h, \eta, R_{emp}) = 2 \varepsilon_0^2(N, h, \eta) \left( 1 + \sqrt{1 + \frac{R_{emp}(f)}{\varepsilon_0^2(N, h, \eta)}} \right),   (32)

\varepsilon_0(N, h, \eta) = \sqrt{\frac{h \left( \ln \frac{2N}{h} + 1 \right) - \ln \frac{\eta}{4}}{N}}.   (33)
Eq. (30) holds with probability 1 - \eta for 0 \le \eta \le 1. \varepsilon_0(N, h, \eta) is the so-called VC confidence interval, whose value depends on the number of training data N, the VC dimension h, and the confidence level \eta. For a small empirical risk R_{emp}(f), close to 0, Eq. (30) approximately reduces to R_{emp}(f) + 4\varepsilon_0^2(N, h, \eta); in contrast, for a large empirical risk close to 1, Eq. (30) approximately reduces to R_{emp}(f) + \varepsilon_0(N, h, \eta) [59]. Thus, there are two strategies for minimizing the upper bound R(f). The first is to keep the VC confidence \varepsilon_0(N, h, \eta) fixed and minimize the empirical risk; most ANN models employ this strategy. However, it does not perform well, because dealing with R_{emp}(f) alone cannot guarantee a reduction of the VC confidence. The second is to fix the empirical risk to a small value and minimize the VC confidence; this is the SRM principle. Although SVMs implement this principle, their training algorithm, which aims to minimize the VC dimension, is still based on a hierarchy that depends on the data [25,60].
2.2.2. Support vector regression

SVMs were originally used for classification, but their principles extend easily to regression and time series prediction. The basic ideas of SVMs for the regression case are introduced here. A nonlinear mapping \varphi(\cdot): R^n \to R^{n_h} is defined to map the input data (training data set) \{(x_i, y_i)\}_{i=1}^{N} into a so-called high dimensional feature space, R^{n_h}, which may have infinite dimensions (Fig. 2(a) and (b)). In the high dimensional feature space there theoretically exists a linear function f that formulates the nonlinear relationship between the input and output data. Such a linear function, the SVR function, is given by Eq. (34):
f(x) = w^T \varphi(x) + b,   (34)
where f(x) denotes the forecast values, and the coefficients w (w \in R^{n_h}) and b (b \in R) are adjustable. As mentioned above, the SVM method aims to minimize the empirical risk,
R_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} H_\varepsilon(y_i, w^T \varphi(x_i) + b),   (35)
where H_\varepsilon(y, f(x)) is the \varepsilon-insensitive loss function (the thick line in Fig. 2(c)), defined as in Eq. (36):
H_\varepsilon(y, f(x)) = \begin{cases} |f(x) - y| - \varepsilon, & \text{if } |f(x) - y| \ge \varepsilon, \\ 0, & \text{otherwise}. \end{cases}   (36)
In addition, H_\varepsilon(y, f(x)) is employed to find an optimal hyperplane in the high dimensional feature space (Fig. 2(b)) that maximizes the distance separating the training data into two subsets. Thus, SVR focuses on finding the optimal hyperplane and minimizing the training error between the training data and the \varepsilon-insensitive loss function. The SVR then minimizes the overall error:
\min_{w, b, \xi^*, \xi} R_\varepsilon(w, \xi^*, \xi) = \frac{1}{2} w^T w + C \sum_{i=1}^{N} (\xi_i^* + \xi_i)   (37)
with the constraints
y_i - w^T \varphi(x_i) - b \le \varepsilon + \xi_i^*, \quad i = 1, 2, \ldots, N,
-y_i + w^T \varphi(x_i) + b \le \varepsilon + \xi_i, \quad i = 1, 2, \ldots, N,
\xi_i^* \ge 0, \quad i = 1, 2, \ldots, N,
\xi_i \ge 0, \quad i = 1, 2, \ldots, N.
The first term of Eq. (37), which embodies the concept of maximizing the distance between the two separated sets of training data, is used to regularize the weight sizes, to penalize large weights, and to maintain the flatness of the regression function. The second term penalizes training errors of f(x) with respect to y by using the \varepsilon-insensitive loss function. C is a parameter that trades off these two terms. Training errors above +\varepsilon are denoted \xi_i^*, whereas training errors below -\varepsilon are denoted \xi_i (Fig. 2(b)). After solving this quadratic optimization problem with inequality constraints, the parameter vector w of Eq. (34) is obtained as
w = \sum_{i=1}^{N} (\beta_i^* - \beta_i) \varphi(x_i),   (38)
where \beta_i^*, \beta_i are the Lagrange multipliers obtained by solving a quadratic program. Finally, the SVR regression function is obtained in the dual space as Eq. (39):
Fig. 2. Transformation process illustration of a SVR model: (a) input space, (b) feature space with the hyperplane and slack variables \xi_i^*, \xi_i, (c) the \varepsilon-insensitive loss function.
f(x) = \sum_{i=1}^{N} (\beta_i^* - \beta_i) K(x_i, x) + b,   (39)
where K(x_i, x_j) is the kernel function, whose value equals the inner product of the two vectors x_i and x_j in the feature space, i.e., K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j). Any function that satisfies Mercer's condition [25] can be used as the kernel function. Several types of kernel function exist; the most widely used are the Gaussian RBF with width \sigma, K(x_i, x_j) = \exp(-0.5 \|x_i - x_j\|^2 / \sigma^2), and the polynomial kernel of order d with constants a_1 and a_2, K(x_i, x_j) = (a_1 x_i \cdot x_j + a_2)^d. If the value of \sigma is very large, the RBF kernel approximates a linear kernel (a polynomial of order 1). It remains hard to determine the best type of kernel function for a specific data pattern [61,62]. However, the Gaussian RBF kernel is not only easy to implement but also capable of nonlinearly mapping the training data into an infinite-dimensional space, so it is well suited to nonlinear relationship problems. Therefore, the Gaussian RBF kernel function is specified in this study. The selection of the three parameters \sigma, \varepsilon and C of an SVR model influences forecasting accuracy. However, structured methods for efficiently selecting these parameters are lacking. The GA-SA algorithm is therefore used in the proposed SVR model to optimize the parameter selection.
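A minimal sketch of such an \varepsilon-SVR with Gaussian RBF kernel, in the form of Eqs. (34)-(39), is shown below using scikit-learn. Note that scikit-learn parameterizes the RBF as exp(-gamma ||x - x'||^2), so the paper's width \sigma maps to gamma = 1/(2\sigma^2); the parameter values and the toy series are placeholders for what the GA-SA search would supply.

```python
# Sketch: epsilon-SVR with Gaussian RBF kernel (Eqs. (34)-(39)).
# gamma = 1 / (2 * sigma**2) converts the paper's sigma to sklearn's gamma.
import numpy as np
from sklearn.svm import SVR

def fit_svr(X_train, y_train, sigma, C, epsilon):
    model = SVR(kernel="rbf", gamma=1.0 / (2.0 * sigma**2),
                C=C, epsilon=epsilon)
    return model.fit(X_train, y_train)

# Toy usage: predict the next value from the previous 5 (lagged window).
flow = np.sin(np.linspace(0, 20, 120)) + 2.0     # stand-in for EOP data
X = np.array([flow[i:i + 5] for i in range(len(flow) - 5)])
y = flow[5:]
model = fit_svr(X[:90], y[:90], sigma=1.0, C=100.0, epsilon=0.1)
pred = model.predict(X[90:])
```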
3. Hybrid genetic algorithm-simulated annealing algorithm (GA-SA)

To overcome the drawbacks of GAs and SA, this study proposes a hybrid GA-SA algorithm that exploits the ability of SA to escape from local minima and approach the global minimum, and additionally uses the mutation process of GAs to improve the search over the range of parameter values. To avoid excessive computation time, only the best individual of the GA population is delivered to SA for further improvement. The proposed GA-SA algorithm consists of a GA part and an SA part. The GA evaluates the initial population and operates on it with three basic genetic operators to produce a new population; in each GA generation, the best individual is delivered to SA for further processing. After all SA processing is finished, the modified individual is sent back to the GA for the next generation. These iterations continue until the termination condition of the algorithm is reached. The GA-SA procedure is described below, and the flowchart is shown in Fig. 3.

The procedure of the GA part is as follows:

Step 1: Initialization. Randomly construct the initial population of chromosomes. The three parameters C, \sigma, and \varepsilon of the SVR model in the ith generation are encoded in binary format and represented by a chromosome composed of "genes" of binary numbers (Fig. 4). Each chromosome has three genes, representing the three parameters; each gene has 40 bits, so a chromosome contains 120 bits. More bits in a gene correspond to a finer partition of the search space.

Step 2: Evaluating fitness. Evaluate the fitness (forecasting error) of each chromosome. In this paper, the negative normalized root mean square error (NRMSE) is used as the fitness function. The NRMSE is given by Eq. (40):

\text{NRMSE} = \sqrt{\sum_{i=1}^{N} (a_i - f_i)^2 \Big/ \sum_{i=1}^{N} a_i^2},   (40)
where a_i and f_i represent the actual and forecast values, and N is the number of forecasting periods.

Step 3: Selection operation. Based on the fitness values, chromosomes with higher fitness are more likely to yield offspring in the next generation. The roulette wheel selection principle [47] is applied to choose chromosomes for reproduction.

Step 4: Crossover operation and mutation operation. Mutations are performed by randomly converting a "1" bit into a "0" bit or a "0" bit into a "1" bit. In the crossover operation, chromosomes are paired randomly; the single-point-crossover principle is employed here, and segments of paired chromosomes between the determined break-points are swapped. For simplicity, suppose a gene has four bits, so that a chromosome contains 12 bits (Fig. 5). Before crossover, the values of the three parameters in parent #1 are 1.5, 1.25 and 0.34375, and in parent #2 they are 0.625, 8.75 and 0.15625. After crossover, the three values in offspring #1 are 1.625, 3.75 and 0.40625, and in offspring #2 they are 0.5, 6.25 and 0.09375. Finally, the three crossed-over parameters are decoded back into decimal format.

Step 5: Stopping condition. If the number of generations equals a given scale, the best chromosome is presented as the solution; otherwise, go to Step 1 of the SA part.

In the GA-SA process, the GA delivers its best individual to SA for further processing; after this individual is improved, SA sends it back to the GA for the next generation. These iterations continue until the termination condition of the algorithm is reached. A sketch of the GA part is given below.
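The following minimal sketch is one reading of Steps 1-5 under stated assumptions: the decoding ranges, crossover rate and mutation rate are illustrative, and `evaluate_nrmse` is a hypothetical callback that trains an SVR with the decoded (C, \sigma, \varepsilon) and returns the validation NRMSE of Eq. (40).

```python
# Sketch of the GA part: binary-encoded (C, sigma, epsilon), negative-NRMSE
# fitness, roulette-wheel selection, single-point crossover, bit mutation.
import random

BITS = 40                       # bits per gene (three genes per chromosome)
RANGES = [(1, 10000), (0.01, 2.0), (0.0, 1.0)]  # assumed C, sigma, epsilon ranges

def decode(chrom):
    """Map a 120-bit chromosome (list of 0/1) to decimal (C, sigma, epsilon)."""
    vals = []
    for g, (lo, hi) in enumerate(RANGES):
        bits = chrom[g * BITS:(g + 1) * BITS]
        frac = int("".join(map(str, bits)), 2) / (2**BITS - 1)
        vals.append(lo + frac * (hi - lo))
    return vals

def nrmse(actual, forecast):                   # Eq. (40)
    return (sum((a - f)**2 for a, f in zip(actual, forecast))
            / sum(a**2 for a in actual)) ** 0.5

def roulette(pop, fits):
    shifted = [f - min(fits) + 1e-9 for f in fits]   # make weights positive
    return random.choices(pop, weights=shifted, k=len(pop))

def ga_generation(pop, evaluate_nrmse, p_cross=0.8, p_mut=0.01):
    fits = [-evaluate_nrmse(*decode(c)) for c in pop]  # fitness = -NRMSE
    nxt = roulette(pop, fits)
    for i in range(0, len(nxt) - 1, 2):                # single-point crossover
        if random.random() < p_cross:
            pt = random.randrange(1, 3 * BITS)
            nxt[i], nxt[i + 1] = (nxt[i][:pt] + nxt[i + 1][pt:],
                                  nxt[i + 1][:pt] + nxt[i][pt:])
    nxt = [[b ^ 1 if random.random() < p_mut else b for b in c] for c in nxt]
    best = pop[fits.index(max(fits))]                  # delivered to the SA part
    return nxt, best
```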
Fig. 3. The GA-SA algorithm flowchart (GA loop: initialization, fitness evaluation, selection, crossover, mutation; the best individual is passed to the SA loop: random move, Metropolis criterion test, equilibrium check, temperature reduction, frozen check).
Fig. 4. Binary encoding of a chromosome.
The procedure of the SA part is as follows:

Step 1: Generate initial current state. Receive the values of the three parameters from the GA. The forecasting error, the NRMSE of Eq. (40), is defined as the system state E; the initial state E_0 is thereby obtained.
Fig. 5. A simplified example of parameter representation (single-point crossover of two 12-bit parent chromosomes producing two offspring).
Step 2: Provisional state. Make a random move to change the existing system state to a provisional state, generating another set of three positive parameters.

Step 3: Metropolis criterion test. The Metropolis criterion, Eq. (41), is employed to determine the acceptance or rejection of the provisional state [63]:
\begin{cases} \text{accept the provisional state,} & \text{if } E(s_{new}) > E(s_{old}) \text{ and } p < P(\text{accept } s_{new}), \; 0 \le p \le 1, \\ \text{accept the provisional state,} & \text{if } E(s_{new}) \le E(s_{old}), \\ \text{reject the provisional state,} & \text{otherwise,} \end{cases}   (41)
where p is a random number used to determine acceptance of the provisional state, and P(accept s_{new}), the probability of accepting the new state, is given by the probability function P(\text{accept } s_{new}) = \exp\left(\frac{E(s_{old}) - E(s_{new})}{kT}\right), where T is the thermal equilibrium temperature and k is the Boltzmann constant. If the provisional state is accepted, it is set as the current state.

Step 4: Incumbent solutions. If the provisional state is not accepted, return to Step 2. Furthermore, if the current state is not superior to the system state, repeat Steps 2 and 3 until the current state is superior to the system state, and then set the current state as the new system state. Previous studies [48] indicated that the maximum number of loops (N_{sa}) should be 100d, where d denotes the problem dimension, to avoid infinitely repeated loops. In this investigation, three parameters (\sigma, C, and \varepsilon) determine the system state, so N_{sa} is set to 300.

Step 5: Temperature reduction. After the new system state is obtained, reduce the temperature. The new temperature is obtained from Eq. (42):
\text{new temperature} = (\text{current temperature}) \times \rho, \quad 0 < \rho < 1.   (42)
The cooling coefficient \rho is set to 0.9 in this study [64]. If the pre-determined temperature is reached, the algorithm stops and the latest state is an approximate optimal solution; otherwise, go to Step 2. A sketch of the SA part appears below.
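The following minimal sketch implements the Metropolis acceptance of Eq. (41) and the geometric cooling of Eq. (42). The initial temperature, the frozen threshold and the neighborhood move are assumptions of the sketch; `evaluate_nrmse` again stands in for training an SVR with the candidate parameters and scoring the validation data.

```python
# Sketch of the SA part: Metropolis acceptance (Eq. 41), geometric cooling
# (Eq. 42) with rho = 0.9, and at most N_sa = 300 moves per temperature level.
import math
import random

def sa_improve(params, evaluate_nrmse, t0=100.0, t_frozen=0.01,
               rho=0.9, n_sa=300):
    current = list(params)                    # [C, sigma, epsilon] from the GA
    e_cur = evaluate_nrmse(*current)          # system state (NRMSE)
    temp = t0
    while temp > t_frozen:                    # "frozen" stopping criterion
        for _ in range(n_sa):                 # bounded number of loops (100d)
            candidate = [max(1e-6, v * random.uniform(0.8, 1.2))
                         for v in current]    # random move, parameters kept > 0
            e_new = evaluate_nrmse(*candidate)
            if (e_new <= e_cur or
                    random.random() < math.exp((e_cur - e_new) / temp)):
                current, e_cur = candidate, e_new   # Metropolis acceptance
        temp *= rho                           # Eq. (42)
    return current, e_cur                     # sent back to the GA
```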
4. A numerical example and experimental results

The traffic flow data sets originate from three Civil Motorway detector sites. The Civil Motorway is the busiest inter-urban motorway network in Panchiao city, the capital of Taipei County, Taiwan. The major site was located at the center of Panchiao city, where the flow intersects an urban local street system; it provides one-way traffic volumes for each hour on weekdays. Therefore, one-way flow data for peak traffic are employed in this investigation, covering the morning peak period (from 6:00 to 10:00) and the evening peak period (from 16:00 to 20:00). The data were collected from February 2005 to March 2005. During the observation period, the numbers of traffic flow data available for the morning and evening peak periods are 50 and 90 h, respectively. For convenience, the traffic flow data are converted to equivalents of passengers (EOP); both peak periods show seasonality in the traffic data. In addition, the traffic flow data are divided into three parts: training data, validation data and testing data. For the morning peak period, the training, validation and testing data sets are 30, 10 and 10 h, respectively. For the evening peak period, the experimental data are arranged as training data (60 h), validation data (15 h) and testing data (15 h). The accuracy of the forecasting models is measured by Eq. (40).

4.1. Parameter determination of the comparison forecasting models

The parameter selection of forecasting models is important for obtaining good forecasting performance. For the SARIMA model, the parameters are determined by taking the first-order regular difference and the first seasonal difference to remove the non-stationary and seasonal characteristics. Using statistical packages, with no autocorrelated residuals and approximately white noise residuals, the most suitable models for the morning/evening peak periods are SARIMA(1, 0, 1)(0, 1, 1)_5 without a constant term and SARIMA(1, 0, 1)(1, 1, 1)_5 with a constant term, respectively. The fitted SARIMA models are presented as Eqs. (43) and (44), respectively:
(1 - 0.5167B)(1 - B^5) X_t = (1 + 0.3306B)(1 - 0.9359B^5) \varepsilon_t,   (43)

(1 - 0.5918B)(1 - B^5) X_t = 2.305 + (1 - 0.9003B^5) \varepsilon_t.   (44)
For the Holt–Winters method, using the Minitab 14 statistical software, the \alpha and \beta values are determined as 0.654 and 0.041 for the morning peak period, and as 0.76 and 0.367 for the evening peak period. Similarly, the appropriate parameters (L, \alpha, \beta, and \gamma) of the SHW model are determined as 5, 0.15, 0.72, and 0.73 for the morning peak period, and as 5, 0.48, 0.04, and 0.13 for the evening peak period. For the BPNN model, Matlab 6.5 is employed to implement the forecasting procedure. The number of nodes in the hidden layer is used as a validation parameter of the BPNN model; the most suitable number of hidden nodes is three.
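As an illustration of this coefficient-fitting step, the sketch below uses statsmodels in place of the Minitab package employed in the paper; the additive-trend / multiplicative-seasonal configuration mirrors Eqs. (22)-(29), the file name is a placeholder, and the `params` key names vary slightly across statsmodels versions.

```python
# Sketch: estimating HW/SHW smoothing coefficients by maximum-likelihood
# fitting, as a stand-in for the Minitab procedure used in the paper.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

flow = np.loadtxt("morning_peak.csv")   # hypothetical EOP series

hw = ExponentialSmoothing(flow, trend="add").fit()          # HW, Eqs. (22)-(24)
shw = ExponentialSmoothing(flow, trend="add", seasonal="mul",
                           seasonal_periods=5).fit()        # SHW, Eqs. (26)-(29)

print(hw.params["smoothing_level"], hw.params["smoothing_trend"])
print(shw.params["smoothing_seasonal"])
print(shw.forecast(5))                  # forecasts for the next five hours
```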
Table 1. Forecasting results and associated parameters of the SVRGA models.

Morning peak period:

No. of fed-in data | \sigma | C | \varepsilon | NRMSE of testing
5 | 0.7286 | 2149 | 0.8249 | 0.3965
10 | 0.7138 | 1199 | 0.1460 | 0.3464
15 | 0.7561 | 2037 | 0.9813 | 0.2705
20 | 0.3933 | 8549 | 0.3155 | 0.3518
25* | 0.9599 | 6850 | 0.4339 | 0.2631

Evening peak period:

No. of fed-in data | \sigma | C | \varepsilon | NRMSE of testing
25 | 0.2583 | 8838 | 0.4464 | 0.1257
30 | 0.0103 | 1352 | 0.0632 | 0.1115
35* | 0.3359 | 9756 | 0.2076 | 0.1010
40 | 0.0287 | 1363 | 0.6152 | 0.1177
45 | 0.0327 | 1312 | 0.1607 | 0.1028
50 | 0.1270 | 8954 | 0.2965 | 0.1226
55 | 0.1104 | 9540 | 0.4314 | 0.1361

* The values in this row are determined as the final values of the three parameters of the SVR model.
Table 2. Forecasting results and associated parameters of the SVRGA-SA models.

Morning peak period:

No. of fed-in data | \sigma | C | \varepsilon | NRMSE of testing
5 | 0.5089 | 1013 | 0.2290 | 0.3265
10 | 1.0479 | 1421 | 0.5395 | 0.3109
15* | 1.0502 | 2522 | 0.6872 | 0.2602
20 | 0.6858 | 2142 | 0.4724 | 0.2754
25 | 0.9668 | 4518 | 0.3377 | 0.2634

Evening peak period:

No. of fed-in data | \sigma | C | \varepsilon | NRMSE of testing
25 | 0.7472 | 10,215 | 0.8739 | 0.1112
30 | 0.6498 | 10,948 | 0.9494 | 0.1090
35* | 0.6717 | 9792 | 0.7517 | 0.1004
40 | 0.1605 | 8774 | 0.2110 | 0.1207
45 | 0.1372 | 10,791 | 0.6252 | 0.1401
50 | 0.1338 | 9560 | 0.2565 | 0.1212
55 | 0.1903 | 9359 | 0.5196 | 0.1017

* The values in this row are determined as the final values of the three parameters of the SVR model.
Table 3. Morning peak period traffic flow forecasting results (unit: EOP).

Peak periods a | Actual | SARIMA(1,0,1)(0,1,1)_5 | BPNN | HW (\alpha = 0.654, \beta = 0.041) | SHW (L = 5, \alpha = 0.15, \beta = 0.72, \gamma = 0.73) | SVRGA | SVRGA-SA
031106 | 1317.5 | 1363.87 | 1361.3 | 2239.5 | 2243.1 | 2190.9 | 2148.3
031107 | 2522.0 | 2440.1 | 2573.9 | 1270.0 | 1624.5 | 2027.4 | 2010.5
031108 | 2342.0 | 2593.9 | 2677.6 | 2364.5 | 2222.3 | 2140.5 | 2013.6
031109 | 2072.0 | 2422.1 | 2351.2 | 2483.0 | 2314.6 | 2313.7 | 2364.2
031110 | 1841.5 | 2459.9 | 2256.9 | 2188.5 | 2163.4 | 2053.4 | 2065.3
031206 | 995.5 | 1578.3 | 1387.8 | 2091.5 | 1951.8 | 1980.7 | 1820.8
031207 | 1457.0 | 2569.9 | 2656.8 | 1277.0 | 1299.6 | 1704.1 | 1663.5
031208 | 1899.0 | 2690.4 | 2586.4 | 2507.0 | 1380.0 | 1548.3 | 1441.3
031209 | 1870.5 | 2505.4 | 2329.1 | 2627.0 | 1710.8 | 1521.0 | 1419.8
031210 | 2151.5 | 2538.7 | 2180.2 | 2264.5 | 1810.9 | 1881.1 | 1813.5
NRMSE | | 0.3039 | 0.2706 | 0.3682 | 0.2972 | 0.2631 | 0.2602

a "031106" denotes 6 o'clock on 11 March 2005, and so on.
4.2. SVRGA-SA traffic forecasting model

For the SVRGA-SA model, a rolling-based forecasting procedure is conducted in the training stage, and a one-hour-ahead forecasting policy is adopted in the validation and testing stages. Several data-rolling configurations are considered for forecasting the traffic flow in the next hour: different numbers of past traffic flow values in the time series are fed into the SVRGA-SA model to forecast the traffic flow in the next validation period. While improvements in the training error occur, the three kernel parameters \sigma, C, and \varepsilon of the SVRGA-SA model, adjusted by the GA-SA algorithm, are used to calculate the validation error; the adjusted parameters with minimum validation error are then selected as the most appropriate parameters. Table 1 indicates that the SVRGA (SVR with genetic algorithm) models perform best when 25 and 35 fed-in data are used for the morning and evening traffic forecasts, respectively. Table 2 indicates that the SVRGA-SA (SVR with the hybrid GA-SA algorithm) models perform best when 15 and 35 fed-in data are used for the morning and evening traffic forecasts, respectively. The rolling procedure is sketched below.
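The following sketch is one reading of the rolling one-hour-ahead procedure: the previous `n_fed` hourly observations form the input window for the next hour's forecast, and the window rolls forward through the test period. The function and parameter names are assumptions; `model` stands in for the SVR of Section 2.2 tuned by GA-SA.

```python
# Sketch of the rolling one-hour-ahead forecasting procedure.
import numpy as np

def rolling_one_step(series, n_fed, model, n_test):
    """Refit on all data before each test hour, then forecast one hour ahead.

    series: 1-D numpy array of hourly EOP values.
    n_fed:  number of fed-in data (the rolling window length).
    """
    preds = []
    for t in range(len(series) - n_test, len(series)):
        # Windows of length n_fed that end strictly before hour t.
        X = np.array([series[i:i + n_fed] for i in range(t - n_fed)])
        y = series[n_fed:t]                    # next-hour targets
        model.fit(X, y)
        preds.append(model.predict(series[t - n_fed:t].reshape(1, -1))[0])
    return np.array(preds)
```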
Table 4. Evening peak period traffic flow forecasting results (unit: EOP).

Peak periods a | Actual | SARIMA(1,0,1)(1,1,1)_5 | BPNN | HW (\alpha = 0.76, \beta = 0.367) | SHW (L = 5, \alpha = 0.48, \beta = 0.04, \gamma = 0.13) | SVRGA | SVRGA-SA
031016 | 2310.5 | 2573.84 | 2844.5 | 2270.5 | 1999.5 | 2251.6 | 2209.8
031017 | 2618.0 | 2821.57 | 2110.5 | 2259.8 | 2260.0 | 2364.8 | 2321.2
031018 | 2562.0 | 3107.01 | 2620.0 | 2590.9 | 2475.5 | 2423.4 | 2364.4
031019 | 2451.5 | 3103.66 | 2853.5 | 2619.7 | 2433.2 | 2598.9 | 2569.9
031020 | 2216.5 | 3011.80 | 2873.0 | 2495.7 | 2312.3 | 2361.3 | 2423.7
031116 | 2175.5 | 2611.58 | 2624.5 | 2209.5 | 2004.6 | 2101.2 | 2369.9
031117 | 2577.0 | 2859.31 | 1999.0 | 2100.2 | 2254.4 | 2269.6 | 2153.1
031118 | 2879.5 | 3144.75 | 2191.5 | 2512.1 | 2444.4 | 2156.5 | 2431.8
031119 | 2693.0 | 3141.40 | 2500.5 | 2943.3 | 2404.0 | 2792.6 | 2643.9
031120 | 2640.0 | 3049.54 | 2386.0 | 2835.2 | 2281.8 | 2693.9 | 2729.3
031216 | 2146.5 | 2649.32 | 2203.5 | 2714.6 | 2000.5 | 2542.7 | 2618.8
031217 | 2544.5 | 2897.05 | 2116.0 | 2152.1 | 2251.7 | 2379.6 | 2596.6
031218 | 2873.0 | 3182.49 | 2522.5 | 2429.0 | 2438.7 | 2688.5 | 2534.4
031219 | 2567.5 | 3179.13 | 2620.0 | 2869.0 | 2383.7 | 2602.0 | 2623.9
031220 | 2660.5 | 3087.28 | 2512.0 | 2658.3 | 2268.2 | 2626.5 | 2579.3
NRMSE | | 0.1821 | 0.1636 | 0.1232 | 0.1142 | 0.1010 | 0.1004

a "031016" denotes 16 o'clock on 10 March 2005, and so on.
Fig. 6. The forecast morning peak traffic flow (EOP) of the SARIMA, BPNN, HW, SHW, SVRGA and SVRGA-SA models. ("031106" denotes 6 o'clock on 11 March 2005, and so on.)
Fig. 7. The forecast evening peak traffic flow (EOP) of the SARIMA, BPNN, HW, SHW, SVRGA and SVRGA-SA models. ("031016" denotes 16 o'clock on 10 March 2005, and so on.)
4.3. Forecasting results and discussion

The well-trained models, SARIMA, BPNN, HW, SHW, SVRGA, and SVRGA-SA, are applied to forecast the traffic flow during the morning and evening peak periods. Tables 3 and 4 show the actual values and the forecasts obtained from the various models for the morning and evening peaks, respectively. The NRMSE values are calculated for each peak period to compare the proposed model fairly with the alternatives. The proposed SVRGA-SA model has smaller NRMSE values than the SARIMA, BPNN, HW, SHW, and SVRGA models in capturing the hourly traffic flow patterns. In addition, the GA-SA algorithm avoids getting trapped in local minima better than GAs do, so the SVRGA-SA model outperforms the SVRGA model. For example, in the morning peak case of Table 2, the GA-SA algorithm shifts the local solution of the SVRGA model obtained with the 25 fed-in-data rolling type, (\sigma, C, \varepsilon) = (0.9599, 6850, 0.4339), with a local optimal forecasting error of NRMSE = 0.2631, to a better solution obtained with the 15 fed-in-data rolling type, (\sigma, C, \varepsilon) = (1.0502, 2522, 0.6872), with a more appropriate local optimal forecasting error of NRMSE = 0.2602. This again reveals that the GA-SA algorithm is more suitable than GAs for adjusting the parameters of the SVR model to improve forecasting accuracy; a similar capability of GA-SA is also observed in the evening peak case. Figs. 6 and 7 illustrate the forecast traffic flow of the SARIMA, BPNN, HW, SHW, SVRGA, and SVRGA-SA models.

5. Conclusions

Accurate traffic forecasts are crucial for an inter-urban traffic control system, particularly for avoiding congestion and increasing the efficiency of limited traffic resources during peak periods. The historical traffic data of Panchiao city in northern Taiwan show a seasonally fluctuating trend that occurs in many inter-urban traffic systems; over-prediction or under-prediction of traffic flow therefore influences the transportation capability of an inter-urban system. In this investigation, the SVRGA-SA model is proposed to forecast inter-urban traffic flow, and the feasibility of the GA-SA algorithm for parameter determination, integrated into the SVR model to improve forecasting accuracy, is evaluated. Many forecasting methodologies have been proposed to deal with the seasonality of traffic flow; however, most models are time consuming in verifying suitable time-phase divisions, particularly when the sample size is large. The SVRGA-SA model provides a convenient and valid alternative for traffic flow forecasting: it directly uses historical observations from traffic control systems and then determines suitable parameters with efficient optimization algorithms. Tables 3 and 4 illustrate that the SVRGA-SA model gives better results than the other forecasting models; in particular, the SVRGA-SA model avoids being trapped in a local optimum as the SVRGA model is.
The superior performance of the SVRGA-SA model arises because: (1) its nonlinear mapping capability captures the data patterns of traffic flow more easily than SARIMA models can; (2) the SVR model applies structural risk minimization rather than minimizing the training errors, and this minimization of an upper bound on the generalization error provides better generalization performance than SARIMA models achieve; and (3) the parameter selection of an SVR model heavily influences its forecasting performance, so improper selection of the three parameters will lead to either over-fitting
or under-fitting of the SVR model. The GA-SA algorithm employs the superiority of the SA algorithm in escaping from local minima toward the global minimum, and then applies the search capability of the GA mutation process to determine a proper parameter combination. The favorable results obtained in this work reveal that the proposed model is a valid alternative for traffic flow forecasting. In the future, other novel hybrid evolutionary algorithms should be applied to obtain more appropriate parameter combinations and thereby achieve more satisfactory traffic flow forecasting accuracy as traffic congestion issues grow. Other socio-economic factors and meteorological control variables during peak periods, such as driving speed limits, important social events, the percentage of heavy vehicles, bottleneck service levels, and waiting times at intersection traffic signals, can also be included in the traffic forecasting model. In addition, although the proposed SVRGA-SA model is itself a hybrid forecasting model, other advanced optimization algorithms for parameter selection can be applied to the SVR model to satisfy the requirements of real-time traffic control systems. The goal of the authors is to show that combinations of novel techniques perform at least as well as pure techniques.

Acknowledgments

This research was conducted with the support of the National Science Council, Taiwan, ROC (NSC 99-2410-H-161-001), and the support of the National Natural Science Foundation of China (No. 70801048).

References

[1] M.S. Dougherty, Investigation of Network Performance Prediction Literature Review (Technical Note 394), Institute for Transport Studies, University of Leeds, UK, 1996.
[2] I. Okutani, Y.J. Stephanedes, Dynamic prediction of traffic volume through Kalman filtering theory, Transportation Research Part B 18 (1984) 1–11.
[3] A. Stathopoulos, G.M. Karlaftis, A multivariate state space approach for urban traffic flow modeling and prediction, Transportation Research Part C 11 (2003) 121–135.
[4] B. van Arem, M. van der Vlist, J. de Ruuiter, M. Muste, S. Smulders, Travel time estimation in the GERDIEN project, in: Proceedings of the Second DRIVE II Workshop on Short-Term Traffic Forecasting, Delft, 1994.
[5] J. Whittaker, S. Garside, K. Lindveld, Tracking and predicting a network traffic process, in: Proceedings of the Second DRIVE II Workshop on Short-Term Traffic Forecasting, Delft, 1994.
[6] P.C. Vythoulkas, Alternative approaches to short term forecasting for use in driver information systems, in: C.F. Daganzo (Ed.), Transportation and Traffic Theory, Elsevier, Amsterdam, 1993.
[7] M. Danech-Pajouh, M. Aron, ATHENA: a method for short-term inter-urban motorway traffic forecasting, Recherche Transports Sécurité 6 (1991) 11–16.
[8] R.L. Eubank, Spline Smoothing and Nonparametric Regression, Marcel Dekker Inc., New York, 1988.
[9] B.L. Smart, M.J. Demetsky, Traffic flow forecasting: comparison of modeling approaches, Journal of Transportation Engineering 123 (1997) 261–266.
[10] B.L. Smith, B.M. Williams, R.K. Oswald, Comparison of parametric and nonparametric models for traffic flow forecasting, Transportation Research Part C 10 (2002) 303–321.
[11] G.E.P. Box, G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, 1976.
[12] M.M. Hamed, H.R. Al-Masaeid, Z.M. Bani-Said, Short-term prediction of traffic volume in urban arterials, ASCE Journal of Transportation Engineering 121 (1995) 249–254.
[13] Y. Kamarianakis, P. Prastacos, Space-time modeling of traffic flow, Computers & Geosciences 31 (2005) 119–133.
[14] H.R. Kirby, S.M. Watson, M.S. Dougherty, Should we use neural networks or statistical models for short-term motorway traffic forecasting?, International Journal of Forecasting 13 (1997) 43–50.
[15] B.M. Williams, Multivariate vehicular traffic flow prediction: an evaluation of ARIMAX modeling, Transportation Research Record 1776 (2001) 194–200.
[16] P.J. Angeline, G.M. Saunders, J.B. Pollack, An evolutionary algorithm that constructs recurrent neural networks, IEEE Transactions on Neural Networks 5 (1994) 54–65.
[17] B.M. Williams, Modeling and Forecasting Vehicular Traffic Flow as a Seasonal Stochastic Time Series Process, Ph.D. Dissertation, Department of Civil Engineering, University of Virginia, Charlottesville, Virginia, 1999.
[18] H. Chen, S. Grant-Muller, L. Mussone, F. Montgomery, A study of hybrid neural network approaches and the effects of missing data on traffic forecasting, Neural Computing and Applications 10 (2001) 277–286.
[19] M.S. Dougherty, M.R. Cobbett, Short-term inter-urban traffic forecasts using neural networks, International Journal of Forecasting 13 (1997) 21–31.
[20] L. Florio, L. Mussone, Neural network models for classification and forecasting of freeway traffic flow stability, Control Engineering Practice 4 (1996) 153–164.
[21] C. Ledoux, An urban traffic flow model integrating neural networks, Transportation Research Part C 5 (1997) 287–300.
[22] E.I. Vlahogianni, M.G. Karlaftis, J.C. Golias, Optimized and meta-optimized neural networks for short-term traffic flow prediction: a genetic approach, Transportation Research Part C 13 (2005) 211–234.
[23] G. Zhang, B.E. Patuwo, M.Y. Hu, Forecasting with artificial neural networks: the state of art, International Journal of Forecasting 14 (1998) 35–62.
[24] H. Yin, S.C. Wong, J. Xu, C.K. Wong, Urban traffic flow prediction using a fuzzy-neural approach, Transportation Research Part C 10 (2002) 85–98.
[25] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
[26] C. Cortes, V. Vapnik, Support vector networks, Machine Learning 20 (1995) 273–297.
[27] V. Vapnik, S. Golowich, A. Smola, Support vector machine for function approximation, regression estimation, and signal processing, Advances in Neural Information Processing Systems 9 (1996) 281–287.
[28] H. Drucker, C.J.C. Burges, L. Kaufman, A. Smola, V.N. Vapnik, Support vector regression machines, Advances in Neural Information Processing Systems 9 (1997) 155–161.
[29] C.A. Martin, S.F. Witt, Forecasting tourism demand: a comparison of accuracy of several quantitative methods, International Journal of Forecasting 5 (1989) 7–19.
[30] L. Cao, Support vector machines experts for time series forecasting, Neurocomputing 51 (2003) 321–339.
[31] L. Cao, Q. Gu, Dynamic support vector machines for non-stationary time series forecasting, Intelligent Data Analysis 6 (2002) 67–83.
[32] W. Huang, Y. Nakamori, S.Y. Wang, Forecasting stock market movement direction with support vector machine, Computers & Operations Research 32 (2005) 2513–2522.
[33] P.F. Pai, C.S. Lin, Using support vector machines in forecasting production values of machinery industry in Taiwan, International Journal of Advanced Manufacturing Technology 27 (2005) 205–210.
[34] P.F. Pai, C.S. Lin, A hybrid ARIMA and support vector machines model in stock price forecasting, Omega 33 (2005) 497–505.
[35] F.E.H. Tay, L. Cao, Application of support vector machines in financial time series forecasting, Omega 29 (2001) 309–317.
[36] F.E.H. Tay, L. Cao, Modified support vector machines in financial time series forecasting, Neurocomputing 48 (2002) 847–861.
[37] P.F. Pai, W.C. Hong, Software reliability forecasting by support vector machines with simulated annealing algorithms, Journal of Systems and Software 79 (2006) 747–755.
[38] W.C. Hong, P.F. Pai, Potential assessment of the support vector regression technique in rainfall forecasting, Water Resources Management 21 (2007) 495–513.
[39] M.A. Mohandes, T.O. Halawani, S. Rehman, A.A. Hussain, Support vector machines for wind speed prediction, Renewable Energy 29 (2004) 939–947.
[40] P.F. Pai, W.C. Hong, A recurrent support vector regression model in rainfall forecasting, Hydrological Processes 21 (2007) 819–827.
[41] W. Wang, Z. Xu, J.W. Lu, Three improved neural network models for air quality forecasting, Engineering Computations 20 (2003) 192–210.
[42] P.F. Pai, W.C. Hong, Forecasting regional electric load based on recurrent support vector machines with genetic algorithms, Electric Power Systems Research 74 (2005) 417–425.
[43] P.F. Pai, W.C. Hong, Support vector machines with simulated annealing algorithms in electricity load forecasting, Energy Conversion and Management 46 (2005) 2626–2669.
[44] P.F. Pai, W.C. Hong, An improved neural network model in forecasting arrivals, Annals of Tourism Research 32 (2005) 1138–1141.
[45] P.F. Pai, W.C. Hong, P.T. Chang, C.T. Chen, The application of support vector machines to forecast tourist arrivals in Barbados: an empirical study, International Journal of Management 23 (2006) 375–385.
[46] V. Cherkassky, Y. Ma, Practical selection of SVM parameters and noise estimation for SVM regression, Neural Networks 17 (2004) 113–126.
[47] J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.
[48] S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi, Optimization by simulated annealing, Science 220 (1983) 671–680.
[49] J. Lee, G.E. Johnson, Optimal tolerance allotment using a genetic algorithm and truncated Monte Carlo simulation, Computer Aided Design 25 (1983) 601–611.
[50] H.J. Shieh, R.C. Peralta, Optimal in situ bioremediation design by hybrid genetic algorithm-simulated annealing, Journal of Water Resources Planning and Management 131 (2005) 67–78.
[51] S.G. Ponnambalam, M.M. Reddy, A GA-SA multiobjective hybrid search algorithm for integrating lot sizing and sequencing in flow-line scheduling, International Journal of Advanced Manufacturing Technology 21 (2003) 126–137.
[52] F. Zhao, X. Zeng, Simulated annealing—genetic algorithm for transit network optimization, Journal of Computing in Civil Engineering 20 (2006) 57–68.
[53] O. Cordón, F. Moya, C. Zarco, A new evolutionary algorithm combining simulated annealing and genetic programming for relevance feedback in fuzzy information retrieval systems, Soft Computing 6 (2002) 308–319.
[54] K. Ganesh, M. Punniyamoorthy, Optimization of continuous-time production planning using hybrid genetic algorithms-simulated annealing, International Journal of Advanced Manufacturing Technology 26 (2005) 148–154.
[55] Z.G. Wang, Y.S. Wong, M. Rahman, Optimisation of multi-pass milling using genetic algorithm and genetic simulated annealing, International Journal of Advanced Manufacturing Technology 24 (2004) 727–732.
[56] P.K. Bergey, C.T. Ragsdale, M. Hoskote, A simulated annealing genetic algorithm for the electrical power districting problem, Annals of Operations Research 121 (2003) 33–55.
[57] C.C. Holt, Forecasting Seasonal and Trends by Exponentially Weighted Averages, Carnegie Institute of Technology, Pittsburgh, PA, 1957.
[58] P.R. Winters, Forecasting sales by exponentially weighted moving averages, Management Science 6 (1960) 324–342.
[59] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, Upper Saddle River, NJ, 1999.
[60] J. Shawe-Taylor, P.L. Bartlett, R.C. Williamson, M. Anthony, Structural risk minimization over data-dependent hierarchies, IEEE Transactions on Information Theory 44 (1998) 1926–1940.
[61] S. Amari, S. Wu, Improving support vector machine classifiers by modifying kernel functions, Neural Networks 12 (1999) 783–789.
[62] K. Vojislav, Learning and Soft Computing—Support Vector Machines, Neural Networks and Fuzzy Logic Models, The MIT Press, Massachusetts, 2001.
[63] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, Equations of state calculations by fast computing machines, Journal of Chemical Physics 21 (1953) 1087–1092.
[64] A. Dekkers, E.H.L. Aarts, Global optimization and simulated annealing, Mathematical Programming 50 (1991) 367–393.