Daily long-term traffic flow forecasting based on a deep neural network



Expert Systems With Applications 121 (2019) 304–312

Contents lists available at ScienceDirect

Expert Systems With Applications journal homepage: www.elsevier.com/locate/eswa

Licheng Qu a,b, Wei Li a, Wenjing Li c, Dongfang Ma c,∗, Yinhai Wang d

a School of Electronic and Information Engineering, Xi'an Jiaotong University, Xianning Road, No. 28, Xi'an 710049, China
b School of Information Engineering, Chang'an University, Middle-Section of South 2nd Ring Road, Xi'an 710064, China
c Institute of Marine Information Science and Technology, Zhejiang University, Yuhangtang Road, No. 866, Hangzhou 310058, China
d Department of Civil & Environmental Engineering, University of Washington, More 201, Seattle 98105, USA

Article info

Article history: Received 26 June 2018; Accepted 18 December 2018; Available online 21 December 2018
Keywords: Daily long-term traffic flow; Forecasting; Deep neural network; Contextual factor; Batch training

Abstract
Daily traffic flow forecasting is critical in advanced traffic management and can improve the efficiency of fixed-time signal control. This paper presents a method for predicting traffic over an entire day using a deep neural network trained on historical traffic flow data and contextual factor data. The main idea is that traffic flow within a short time period is strongly correlated with the starting and ending time points of the period, together with a number of other contextual factors, such as day of week, weather, and season. Therefore, the relationship between the traffic flow values within a given time interval and a combination of contextual factors can be mined from historical data. First, a predictor was trained using a multi-layer supervised learning algorithm to mine the potential relationship between traffic flow data and a combination of key contextual factors. Second, a batch training method was proposed to reduce training time. Finally, a Seattle-based case study shows that, overall, the proposed method outperforms the conventional traffic prediction method in terms of prediction accuracy.
© 2018 Elsevier Ltd. All rights reserved.

1. Introduction
Traffic flow forecasting is important for both government agencies and individual travelers (Zhang & Xie, 2007). Local government agencies use traffic flow forecasting to predict potential congestion and other events (Haijema & Hendrix, 2014) in order to guide suitable traffic interventions, such as altering traffic signal timing or closing certain roads. Moreover, real-time traffic information is now available on navigation apps such as Waze and Google Maps, allowing individual travelers to plan their trips in advance and adjust their routes at any moment (Lee, Tseng, & Tsai, 2009).

Traffic flow forecasting can be classified into long-term and short-term forecasting. Short-term traffic forecasting has been a critical component of most Intelligent Transportation Systems since the early 1980s (Bezuglov & Comert, 2016; Vlahogianni, Karlaftis, & Golias, 2014). Predictions of the near future, from a few seconds to possibly a few hours ahead, are based on current and past traffic information and can be used in applications such as adaptive traffic signal control and route guidance. Long-term predictions target one or more days in the future and are used by traffic management agencies to plan appropriate traffic management and avoid potential congestion, thus ensuring operational efficiency (García-Ródenas & Verastegui-Rayo, 2013; Haijema & Hendrix, 2014).

This paper focuses on forecasting traffic flow one whole day into the future, which is referred to as daily traffic flow forecasting. Daily forecasting is particularly crucial for fixed-time signal control strategies: traffic flow on the target day must first be forecast, the whole day is then split into several intervals based on the forecast, and, finally, appropriate signal timing is planned for each interval.

There has been a considerable amount of research on short-term traffic flow forecasting (Van Lint & Van Hinsbergen, 2012; Vlahogianni et al., 2014). The two major families of forecasting methods are classical statistical methods and machine learning methods (Cetin & Comert, 2006; Cools, Moons, & Wets, 2009; Fowe & Chan, 2013; Kerner, Klenov, Hermanns, & Schreckenberg, 2013; Treiber & Kesting, 2012; Wang & Papageorgiou, 2005; Yuan, Van Lint, Wilson, van Wageningen-Kessels, & Hoogendoorn, 2012). In contrast, long-term daily traffic flow forecasting has attracted less attention. Comparisons of existing traffic flow forecasting methods have previously been presented in the literature (Smith, Williams, & Oswald, 2002; Stathopoulos & Karlaftis, 2003; Vlahogianni et al., 2004; Chen, Wang, Li, Hu, & Zhang, 2012; Lippi, Bertini, & Frasconi, 2013). Machine learning methods, such as artificial neural networks (ANNs) and hybrid methods, are often reported to outperform classical statistical methods, such as historical averaging and smoothing techniques. This paper addresses long-term traffic flow forecasting and presents a daily prediction method based on machine learning that uses a deep neural network (DNN).

The remainder of this paper is organized as follows: Section 2 summarizes the existing literature; Section 3 establishes the proposed forecasting model; Section 4 evaluates the new method using real-world traffic flow data; and Section 5 presents the main contributions of the paper and a summary of the new findings.

∗ Corresponding author.
https://doi.org/10.1016/j.eswa.2018.12.031
0957-4174/© 2018 Elsevier Ltd. All rights reserved.

2. Literature review
The simplest method of forecasting daily traffic flow is to take all the historical data, divide the day into time intervals, and average the values within each interval; the historical average is then regarded as the forecast for that interval. Over long periods of historical data, traffic flow is typically characterized by non-reproducible patterns that vary from day to day. However, traffic flow trajectories look similar on typical workdays, with morning and afternoon peaks during commuter periods and a lull in traffic after midnight (Wang & Shi, 2013). Although some days are similar, significant differences between daily patterns can also exist and will negatively affect traffic flow forecasting.

Selecting appropriate historical data can help improve prediction accuracy, and several studies based on this idea have been published. Methods based on historical data selection are called prediction-after-classification methods (Habtemichael & Cetin, 2016) and consist of three steps: 1) classify the historical data into several groups so that the daily traffic flow within each group is similar; 2) select an appropriate group of historical data for the target day; and 3) produce the forecast using the data from that group. For example, Hosseini, Moshiri, Rahimi-Kian, and Nadjar Araabi (2014) collected historical data from all the workdays of the previous 6-month period (128 days) and used it to predict the daily traffic flow on the following Friday, assuming that traffic flow on the target Friday would be similar to that on all workdays of the previous 6 months. Hou and Li (2016) divided historical data into two groups according to their degree of similarity and then used wavelet analysis to split the daily traffic flow series into a basic series and a deviation series. For a single target day, the historical group is first selected according to whether the target day is a workday, and the average of the basic series across the group is taken as the basic forecasting result. The deviation of the target day is then predicted using common time series forecasting methods, such as the autoregressive integrated moving average (ARIMA) algorithm (Cools et al., 2009; Smith et al., 2002) or the generalized regression neural network (GRNN) algorithm (Hou & Li, 2016).

Differences among daily traffic flow patterns are not caused only by workdays and weekends; they are also strongly correlated with other contextual factors, such as day of week, weather, holidays, season, and major events. The National Cooperative Highway Research Program (NCHRP) Report 765 (Horowitz, Creasey, Pendyala, & Chen, 2014) identified a number of important factors associated with traffic demand and divided daily historical traffic flow data into 24 groups (3 × 2 × 2 × 2 = 24 combinations) according to four criteria: 1) weekday, Saturday, or Sunday; 2) summer day or non-summer day; 3) rainy day or not; and 4) holiday or not. The historical data group can then be selected based on the attributes of the target day, and the mean of the group can be used as the traffic flow forecast.
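The group-mean forecast described for NCHRP Report 765 can be sketched as follows. This is a minimal illustration under assumptions: the function names, the tuple encoding of a day's attributes, and the list-of-counts data layout are hypothetical, and the rules for labelling a day as summer, rainy, or a holiday are not specified here.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

# The four criteria as summarized in the text (encoding is an assumption).
DAY_TYPES = ("weekday", "saturday", "sunday")


def group_key(day_type, is_summer, is_rainy, is_holiday):
    """Map a day's attributes to one of the 3 x 2 x 2 x 2 = 24 groups."""
    if day_type not in DAY_TYPES:
        raise ValueError(f"unknown day type: {day_type!r}")
    return (day_type, bool(is_summer), bool(is_rainy), bool(is_holiday))


ALL_GROUPS = [
    group_key(*combo)
    for combo in product(DAY_TYPES, (False, True), (False, True), (False, True))
]


def cm_forecast(history, target_attrs):
    """Forecast a target day as the interval-wise mean of its historical group.

    history: list of (attrs, flows) pairs; attrs is a tuple accepted by
    group_key and flows holds one traffic count per time interval.
    """
    groups = defaultdict(list)
    for attrs, flows in history:
        groups[group_key(*attrs)].append(flows)
    days = groups.get(group_key(*target_attrs))
    if not days:
        raise LookupError("no historical days fall into the target day's group")
    # Average each interval across all days in the group.
    return [mean(values) for values in zip(*days)]
```

For example, two historical weekdays with counts [100, 300, 200] and [120, 320, 180] in the same group give the forecast [110, 310, 190] for a new day of that group.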


In addition to these simple classification schemes, daily flow patterns can also be recognized using unsupervised learning algorithms, including k-means clustering with various distance metrics, such as Euclidean distance (Xia, Huang, & Guo, 2012) and spectral distance (García-Ródenas, López-García, & Sánchez-Rico, 2017). Other methods for pattern classification include hierarchical clustering analysis (Weijermars & Van Berkum, 2005), wavelet analysis (Jiang & Adeli, 2004), support vector machines (SVM) (Castro-Neto, Jeong, Jeong, & Han, 2009; Wang & Shi, 2013), k-nearest neighbors (Habtemichael & Cetin, 2016; Lin, Li, & Sadek, 2013; Zhang, Liu, Yang, Wei, & Dong, 2013; Zheng, Lee, & Shi, 2006), and neural networks (Celikoglu et al., 2013; Tsai, Lee, & Wei, 2009). Despite these advanced methods for classifying traffic flow patterns, there is no clear method for selecting the appropriate group of data. Furthermore, existing studies on daily forecasting, such as those of Hosseini et al. (2014) and Hou and Li (2016), are based on statistical analyses, and research on mining traffic flow data for this purpose is lacking.

The traffic flow within a given time period, for example 6:00–6:10, usually varies within a small range on different days, and these differences are caused by contextual factors. By treating the time of day (TOD), i.e., the starting and ending time points of the period, as an additional contextual factor, the relationship between the traffic flow value and a particular combination of contextual factors can be mined using a machine learning algorithm to obtain an appropriate predictor. By inputting the combination of contextual factors for every interval of the target day into the predictor, traffic flow data for the whole day can be predicted (described in further detail in Section 3).

3. Methodology
In general, traffic flow data is collected at a given time interval, e.g., 5 min, 15 min, or 1 h.
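To make the idea of treating the time of day as a contextual factor concrete, the sketch below builds the predictor input for one target day: one row per collection interval, holding the interval's start and end time points plus the day's other contextual factors, so a day with an interval of Δ minutes yields 1440/Δ rows. The factor set, the integer encodings, and the names DayContext and target_day_rows are assumptions for illustration; the paper does not specify its feature encoding here, and inputs would typically also be scaled before being fed to a network.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 1440


@dataclass(frozen=True)
class DayContext:
    """Hypothetical day-level contextual factors (encodings are assumptions)."""
    day_of_week: int  # e.g., 0 = Monday ... 6 = Sunday
    weather: int      # e.g., 0 = dry, 1 = rain
    season: int       # e.g., 0 = winter ... 3 = autumn


def target_day_rows(ctx, interval_min):
    """One input row per interval: [start, end, day_of_week, weather, season].

    start and end are the interval's time points in minutes after midnight.
    """
    if interval_min <= 0 or MINUTES_PER_DAY % interval_min != 0:
        raise ValueError("interval must be a positive divisor of 1440 minutes")
    return [
        [start, start + interval_min, ctx.day_of_week, ctx.weather, ctx.season]
        for start in range(0, MINUTES_PER_DAY, interval_min)
    ]
```

At a 5-min interval this produces 288 rows, the first being [0, 5, ...] and the last [1435, 1440, ...]; feeding all of them to a trained predictor at once yields a forecast for the whole day.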
In this paper, the aim was to predict daily traffic flow by mining the correspondence between the traffic flow data and a combination of contextual factors. The framework of the forecasting method is illustrated in Fig. 1. The DNN, a deep learning architecture, allows computational models to be composed of multiple processing layers in order to learn representations of data with multiple levels of abstraction (LeCun, Bengio, & Hinton, 2015). The DNN is frequently recommended for mining potential relationships within massive multidimensional historical datasets (Hinton, Osindero, & Teh, 2006). However, the DNN training process usually takes a long time, so optimization of the process is necessary. Therefore, the proposed method considers two aspects: prediction logic with a deep neural network, and batch training and predictor generation.

Fig. 1. Framework of the forecasting method.

3.1. Prediction logic with deep neural network
For feature extraction and transformation, the DNN uses a cascade of multiple layers with nonlinear processing units. Each successive layer uses the output of the previous layer as input, and each layer learns a distinct set of patterns from the output of the previous layer. Factors are aggregated and recombined layer by layer; therefore, more factors advancing through the network help recognize complex patterns. The DNN is fully connected and feedforward. Models of the DNN can be viewed as simple mathematical models defining a function θ: Z → Ŷ, or a distribution over Z, or over both Z and Ŷ. Here, Z is the set of contextual factors and Ŷ is the future traffic volume to be predicted. Assuming z ∈ Z is the daily vector of contextual factors, the network function f(z) can be defined as a composition of other functions g_i(z), and the function of each layer can in turn be decomposed into further functions. Based on this definition of the neural network function, a network structure can be conveniently designed with arrows depicting the dependencies between functions. A widely used choice is the nonlinear weighted sum:

$$ f(z) = K\left(\sum_{i} w_i \, g_i(z)\right) \tag{1} $$

where w_i is the weight parameter of g_i; g_i refers to a specific function, which forms the set g = (g_1, g_2, …, g_i, …); and K is a predefined function, such as the sigmoid, softmax, or rectifier function, commonly referred to as the activation function. An important characteristic of the activation function is that it provides a smooth transition as input values change. In this paper, the tanh function, K(x) = (e^x − e^(−x))/(e^x + e^(−x)), is used for K; it is non-linear and compresses its output into the range between −1 and 1. The architecture of the DNN model is presented in Fig. 2, with several input nodes (z_i ∈ Z) and only one output node, the prediction ŷ ∈ Ŷ, such that

$$ \hat{y} = z \cdot W + b \tag{2} $$

where z is the row vector of contextual input factors, i.e., the daily vector of contextual factors, z ∈ Z = ℝ^d; W is the weight parameter; and b is the bias. If W and b are given different meanings and ranges, the formula can also be used to represent the whole neural network, every layer, or even every neuron in the network. In this way, a set of units computes a weighted sum of the inputs from the previous layer and passes the result through a non-linear function. After the contextual factor vector is fed into the network, the internal state (activation) of the neurons and layers changes according to the input, and the prediction is produced from the factors and the activation function. The network is constructed by connecting the output of certain neurons to the input of other neurons, thus forming a directed, weighted graph. The weights, as well as the functions that compute the activation, can be modified by a process called learning, which is governed by a learning rule (Chandra & Sharma, 2016).

Fig. 2. Architecture of deep neural network prediction model.

Multiple nonlinear regression is used at every node of a neural network. For each node of a single layer, the inputs are a recombination of the nodes of the previous layer. Analogously, for nodes in the subsequent layer, the inputs are also a mixture of the nodes in different proportions according to their coefficients. The combination of inputs in different layers is very important in a DNN, as it significantly reduces error. Before training the forecasting model and determining the parameters W and b, the loss function L must first be defined. The loss function is an important concept in machine learning: it measures how far a particular solution is from the optimal solution of the problem to be solved. Here it is expressed as

$$ L(y, \hat{y}) = \frac{1}{2}\lVert y - \hat{y} \rVert^2 = \frac{1}{2}\lVert y - f(z) \rVert^2 \tag{3} $$

where y is the actual value, ŷ is the predicted value, and z is the vector of contextual factors of a training sample. Several algorithms, for example stochastic gradient descent or improved variants of it, can be used to train and update the forecasting model. Backpropagation is a method for calculating the gradient of the loss function (the cost associated with a given state) with respect to the weights of a DNN. The computational cost of the backward pass (BP) is essentially that of the forward pass (FP). Forward and backward passes are repeated until a sufficient performance level is reached (Schmidhuber, 2015). The weights can then be updated by stochastic gradient descent using the following equation:

$$ w_{ij}(t+1) = w_{ij}(t) - \eta \, \frac{\partial L}{\partial w_{ij}} + \xi(t) \tag{4} $$

where η is the learning rate and ξ(t) is a stochastic term. The choice of loss function depends on a number of factors, including the learning type and the activation function. The same mechanism can be used to update the parameter b.

3.2. Batch training and predictor generation
Learning is difficult in densely connected neural networks because there are many hidden layers, and inferring the conditional distribution of the hidden activities given a data vector is difficult. Accurate regression results can be obtained by training on samples one by one; however, this results in lengthy training times, particularly for a DNN with a large number of parameters (Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov, 2014). Hinton et al. (2006) presented a fast, greedy algorithm that learns deep, directed belief networks one layer at a time, under the assumption that different layers of the model have the same number of units. This simple case is unsuitable for general neural networks. In this paper, Hinton's algorithm is extended by removing this assumption, yielding a new algorithm that trains, layer by layer, models whose layers have different numbers of units.

Online training and batch training are two essential training schemes for neural networks. Theoretical analysis of the two schemes shows that the former has several advantages over the latter with respect to the absolute value of the expected difference. However, if the variance of the per-instance gradient remains constant, online training does not converge to the optimal weight with respect to the expected squared difference (Nakama, 2009). It has been proven that the convergence rate does not decrease with increasing mini-batch size (Li, Zhang, Chen, & Smola, 2014). Based on the characteristics of the loss function, batch training was used to train the model. The descent direction was determined by the whole data set, which better represents the sample population and thus points more accurately toward the extremum. The batched matrix form of the proposed model can be expressed as

$$ \hat{Y} = Z \cdot W + B \tag{5} $$

where Z is a T × M matrix in which each row vector z_t represents a training sample, T is the number of training samples (the batch size), and M is the number of features of each input sample; W is an M × N matrix; B is a T × N matrix; and N is the number of hidden neurons in the next layer, or the number of output nodes in the output layer. The formula can be rewritten as

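$$
\begin{bmatrix} \hat{y}_{11} & \cdots & \hat{y}_{1n} \\ \vdots & \ddots & \vdots \\ \hat{y}_{t1} & \cdots & \hat{y}_{tn} \end{bmatrix}
=
\begin{bmatrix} z_{11} & \cdots & z_{1m} \\ \vdots & \ddots & \vdots \\ z_{t1} & \cdots & z_{tm} \end{bmatrix}
\cdot
\begin{bmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & \ddots & \vdots \\ w_{m1} & \cdots & w_{mn} \end{bmatrix}
+
\begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{t1} & \cdots & b_{tn} \end{bmatrix}
\tag{6}
$$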

where the last row and column indices satisfy t = T, m = M, and n = N. In this way, stochastic gradient descent and backpropagation algorithms can be used to train the DNN. Moreover, batch training can speed up the training process and help the model generalize. The training process is described in Algorithm 1.

After training, a fine-tuned DNN predictor was obtained to predict future traffic volumes. The model has only one output node, as shown in Fig. 2, and the traffic volume at any given TOD can be read from the output node after the combination of contextual factors for that TOD is input into the predictor. The fine-tuned DNN works well but is slightly inconvenient, since only one traffic volume can be predicted at a time. Using the same matrix operation as in batch training, a factor matrix Z can instead be fed into the model, and a matrix Ŷ of predicted traffic volumes is then output by the final layer. Here, each row z_i of Z represents one prediction sample and is used to predict traffic within a particular future time period. Each z_i consists of several factors z_ij, which are the primary contextual factors influencing traffic volume. Because the output layer of the predictor has only one neuron (N = 1), the batch prediction formula can be simplified as

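$$
\begin{bmatrix} \hat{y}_{1} \\ \vdots \\ \hat{y}_{t} \end{bmatrix}
=
\begin{bmatrix} z_{11} & \cdots & z_{1m} \\ \vdots & \ddots & \vdots \\ z_{t1} & \cdots & z_{tm} \end{bmatrix}
\cdot
\begin{bmatrix} w_{1} \\ \vdots \\ w_{m} \end{bmatrix}
+
\begin{bmatrix} b_{1} \\ \vdots \\ b_{t} \end{bmatrix}
\tag{7}
$$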

where t is the number of samples (the batch size) and m is the number of features of each input sample. Thus, when a batch of contextual factors is input into the model, a batch of predicted traffic volumes is obtained.

4. Experimental
To evaluate the performance of the proposed method, experiments were performed using real data. In addition, the proposed method was compared with a frequently used method in two dimensions: time and space.

4.1. Data preparation and description
A real-world traffic dataset obtained from the Digital Roadway Interactive Visualization and Evaluation Network (DRIVENET, http://uwdrive.net/STARLab) was used in the experiment; it includes traffic flow, occupancy, and point speed data for the highways of Seattle, Washington (Ma, Wu, & Wang, 2011). The traffic flow data can be downloaded at several intervals (5 min, 10 min, 15 min, 1 h, …). In this study, data from multiple loop detectors on the I-5 freeway were downloaded for different time intervals and for mileposts ranging from 180 to 190, covering a total of sixteen detectors, as shown in Fig. 3. Data from these sixteen loop detectors between February 1, 2015 and March 31, 2016 formed the training/test dataset for the majority of the case study. The data of the last month, March 1 to 31, 2016, was selected as the forecasting target, and the remaining data was treated as historical data used to train the forecasting model.

Some abnormalities were found in the dataset; they can be attributed to issues with the loop detectors or to transmission interruptions between the local controller and the data center. Abnormal data negatively influence the forecasting models and should be filtered out before training and forecasting. Since traffic volumes differ between peak and off-peak hours, abnormal data were handled differently for two time periods: 5:00 AM–6:00 PM and the rest of the day. For 5:00 AM–6:00 PM, if the traffic volume within a time interval was more than 60% above or below that of the previous or next interval, it was considered abnormal; for the rest of the day, a threshold of 80% was used. Abnormal data points were removed and replaced with values derived from the previous time interval: for 5:00 AM–6:00 PM, the traffic volume of the previous interval was multiplied by a random value between 0.8 and 1.1; for the rest of the day, it was multiplied by a random value between 0.9 and 1.3.
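The abnormal-data rules above can be sketched as follows. This is an illustration under stated assumptions: "more than 60% (80%) above or below the previous or next interval" is read as a relative deviation from either neighbor, an abnormal value is rebuilt from the already-cleaned previous interval, the first interval of a day is left unchanged because it has no predecessor, and peak membership is decided by each interval's start time. The names clean_day and is_abnormal are hypothetical. Under the literal "previous or next" reading, a normal interval right before a spike can also be flagged; it then simply receives a value close to its predecessor.

```python
import random

PEAK_START, PEAK_END = 5 * 60, 18 * 60  # 5:00 AM to 6:00 PM, minutes after midnight


def is_abnormal(cur, neighbors, threshold):
    """True if cur deviates from any available neighbor by more than `threshold`."""
    return any(
        ref is not None and ref > 0 and abs(cur - ref) / ref > threshold
        for ref in neighbors
    )


def clean_day(flows, interval_min, rng=None):
    """Return a copy of one day's counts with abnormal values replaced.

    flows: traffic counts for consecutive intervals starting at midnight.
    """
    rng = rng if rng is not None else random.Random()
    cleaned = list(flows)
    # Interval 0 has no previous interval to rebuild from, so it is kept as-is.
    for i in range(1, len(cleaned)):
        peak = PEAK_START <= i * interval_min < PEAK_END
        threshold = 0.6 if peak else 0.8
        low, high = (0.8, 1.1) if peak else (0.9, 1.3)
        prev = cleaned[i - 1]  # already cleaned
        nxt = cleaned[i + 1] if i + 1 < len(cleaned) else None  # still raw
        if is_abnormal(cleaned[i], (prev, nxt), threshold):
            cleaned[i] = round(prev * rng.uniform(low, high))
    return cleaned
```

Passing a seeded random.Random makes the replacement values reproducible, which is convenient when the cleaned dataset has to be regenerated for several experiments.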
Another criterion was applied to one special case: if no traffic flow data existed for more than one hour in a day, the whole day was removed from the dataset.

4.2. Forecasting based on the proposed method
4.2.1. Deep neural network training
It is difficult to determine a suitable number of hidden layers and hidden nodes for a DNN. In general, the number of hidden nodes is often considered to be related to the number of input nodes, at approximately 1 to 3 times the number of input nodes. Accordingly, the greedy layer-wise unsupervised learning algorithm was used to optimize the number of hidden nodes in each layer and to determine the number of hidden layers of the whole DNN. After repeated learning and verification, a six-layer DNN was constructed, with [15, 18, 22, 9, 5] hidden neurons in its successive hidden layers. The learning rate was set to 0.01, and the batch size was set to the number of data points in one day. An example of training and evaluation, based on data from detector 1 at a 5-min time interval, is presented in Fig. 4. In the first few iterations, the loss of the neural network decreases sharply; it then flattens and eventually converges.

The predictor was obtained both with and without the batch training method on the same machine, configured with a 64-bit quad-core 3.2 GHz AMD Ryzen 5 1400 processor and 8 GB of main memory. The average batch training time for the sixteen detectors across five different time intervals was approximately 12 min, whereas with online training the average time was approximately 2460 min. Batch training therefore saves a significant amount of time, reducing training time by 99.51% ((2460 − 12)/2460 ≈ 0.9951).

Fig. 3. Locations of detectors in Seattle, Washington.
Fig. 4. Training and evaluation of deep neural network.
Fig. 5. Forecasting results and APE on March 15 at milepost 180.66.

4.2.2. Forecasting results using the proposed method
The forecasting results contain P data points for a whole day (P = 1440/Δ, where Δ is the time interval in minutes). For each data point, the absolute percentage error (APE) was used to evaluate the forecasting accuracy, expressed as

$$ \mathrm{APE}(t) = \left| \frac{y(t) - \hat{y}(t)}{y(t)} \right| \tag{8} $$

where y(t) is the true traffic flow in the tth interval, ŷ(t) is the forecast traffic flow in the tth interval, and APE(t) denotes the absolute percentage error of the traffic flow forecast in the tth interval. For the time series of forecasting results over the whole day, the mean absolute percentage error (MAPE) was used to assess the results, expressed as

$$ \mathrm{MAPE} = \frac{1}{T} \sum_{t=1}^{T} \lvert \mathrm{APE}(t) \rvert \tag{9} $$

where T is the total number of intervals in the whole day (i.e., T = P). Taking March 15 and detector 1 as an example, the APE across the whole day is shown in Fig. 5. Only two data points have a high APE, approaching 30%, and most of the APE values are less than 15%. The average APE across the whole day is approximately 6.136%.

4.2.3. Robustness analysis of the proposed method
Before the proposed method is used in actual applications, its robustness must be assessed. Robustness was evaluated by considering different time intervals, days, and detectors. The forecasting results of the proposed algorithm for detector 1 across the 31 target days at the 5-min time interval are shown in Fig. 6.

Fig. 6. APEs at different days with an interval of 5 min.

The predictions are accurate across the whole month, and high APEs mainly occurred between 23:00 and 6:00 the following day. The reason is that traffic flow from 22:00 to 6:00 the next day is low, so the relative fluctuations at the same data points across different days are high. For example, if the number of vehicles counted by one detector between 00:00 and 00:05 is 5 on one day and 10 on the next, the relative fluctuation is 100%, and the accuracy of the prediction model will be correspondingly low. Thus, high APEs during low-traffic-flow periods are to be expected. In addition, no special measures are required during low-traffic-flow periods, so high prediction errors are acceptable there.
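Eqs. (8) and (9), and the low-volume effect described above, can be checked with a small sketch. It assumes errors are reported as percentages (matching values such as 6.136%) and that zero true counts are rejected, since the text does not say how such intervals are handled; the function names are illustrative.

```python
def ape(actual, forecast):
    """Absolute percentage error of one interval, Eq. (8), in percent."""
    if actual == 0:
        raise ZeroDivisionError("APE is undefined when the true flow is zero")
    return abs(actual - forecast) / actual * 100.0


def mape(actuals, forecasts):
    """Mean absolute percentage error over all intervals of a day, Eq. (9)."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need two equal-length, non-empty series")
    return sum(ape(y, y_hat) for y, y_hat in zip(actuals, forecasts)) / len(actuals)
```

The same absolute miss of a few vehicles weighs very differently by time of day: a night-time interval with 5 true vehicles forecast as 10 gives an APE of 100%, whereas a peak interval with 500 true vehicles forecast as 510 gives 2%, so the two together average to a MAPE of 51%.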


Table 1
MAPEs (%) for different time intervals.

Date        5 min   10 min   15 min   30 min   60 min
March 1     10.18    9.21    10.40     7.08     6.41
March 2     13.04   14.95    16.50     8.00     8.31
March 3      8.93   10.34     8.59    10.80     9.37
March 4     13.11   11.49    12.17    10.33     9.71
March 5     10.73   10.44    13.04    11.68    11.73
March 6     11.48   11.61    12.95     9.62     9.59
March 7     11.24   11.81    10.13     8.27     8.81
March 8      7.46    8.75     6.64     4.98     9.00
March 9      8.01    6.57     5.09     5.53     8.27
March 10     6.80    6.16     5.42     6.29     6.89
March 11     5.85    5.41     5.39     5.52     6.08
March 12     6.22    5.21     4.84     4.32     4.69
March 13     5.70    5.14     4.70     3.79     4.14
March 14     5.10    3.84     3.57     3.43     3.16
March 15     4.34    3.69     3.55     3.53     2.90
March 16     5.34    5.49     4.45     4.31     3.03
March 17     4.27    3.65     3.34     3.43     2.84
March 18     3.87    2.62     2.72     2.65     2.47
March 19     5.30    5.29     4.86     5.24     4.01
March 20     9.81    7.57     7.30     4.85     4.29
March 21     6.31    5.34     5.00     4.49     3.71
March 22     6.16    5.16     5.07     4.29     3.64
March 23     4.96    4.06     4.15     3.78     3.45
March 24     5.05    4.41     4.46     3.62     2.87
March 25     5.54    5.11     4.87     4.39     4.37
March 26     5.15    5.10     5.20     4.29     4.29
March 27     4.95    6.69     6.41     3.24     4.44
March 28     5.76    5.44     4.46     3.29     4.03
March 29     6.50    8.95     6.56     5.22     3.97
March 30     5.88   11.52     7.80     4.80     6.03
March 31     8.59    9.55    12.37     5.53     7.75
Average      7.15    7.12     6.84     5.50     5.62

Algorithm 1 Batch training.
1) Prepare the historical data.
2) Split the historical data into the training set, validation set, and test set.
3) Initialize the weight matrices W and B.
4) Randomly extract T samples from the training set.
5) Feed the batch of samples into the DNN as input.
6) Calculate the difference between the predicted output and the real traffic volume.
7) Calculate the descent gradient and fine-tune the weight matrices W and B.
8) Return to Step 4 until the desired accuracy or number of epochs is reached.
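A minimal, dependency-free sketch of Algorithm 1 follows. It assumes fully connected tanh hidden layers, a single linear output node as in Eq. (2), the batch-averaged form of the squared loss in Eq. (3), and plain gradient descent that steps against the gradient without the stochastic term. Every forward pass uses the batched affine form Z·W + B of Eqs. (5) and (6), and predict() applies the same form to a whole factor matrix as in Eq. (7). The names BatchDNN and train are hypothetical; the paper's greedy layer-wise construction of the architecture and its validation split (step 2) are not reproduced.

```python
import math
import random


def matmul(a, b):
    """Plain-list matrix product: (T x M) . (M x N) -> (T x N)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


class BatchDNN:
    """Fully connected network: tanh hidden layers, one linear output node."""

    def __init__(self, layer_sizes, seed=0):
        self.rng = random.Random(seed)
        self.weights, self.biases = [], []
        # Step 3: initialize the weight matrices W and bias vectors B.
        for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
            scale = 1.0 / math.sqrt(fan_in)
            self.weights.append(
                [[self.rng.uniform(-scale, scale) for _ in range(fan_out)] for _ in range(fan_in)]
            )
            self.biases.append([0.0] * fan_out)

    def _forward(self, z):
        """Activations of every layer for a batch Z (one row per sample)."""
        acts = [z]
        last = len(self.weights) - 1
        for k, (w, b) in enumerate(zip(self.weights, self.biases)):
            # Batched affine map Z.W + B: the bias row is added to every sample.
            pre = [[v + bj for v, bj in zip(row, b)] for row in matmul(acts[-1], w)]
            acts.append(pre if k == last else [[math.tanh(v) for v in row] for row in pre])
        return acts

    def predict(self, z):
        """Batch prediction: one traffic volume per row of the factor matrix Z."""
        return [row[0] for row in self._forward(z)[-1]]

    def train_step(self, z, y, lr):
        """Steps 5-7: forward, error, gradient update. Returns the pre-update loss."""
        acts = self._forward(z)
        t = len(z)
        out = [row[0] for row in acts[-1]]
        loss = sum(0.5 * (o - target) ** 2 for o, target in zip(out, y)) / t
        # Gradient of the batch-mean loss with respect to the linear output.
        delta = [[(o - target) / t] for o, target in zip(out, y)]
        for k in range(len(self.weights) - 1, -1, -1):
            grad_w = matmul(transpose(acts[k]), delta)
            grad_b = [sum(col) for col in zip(*delta)]
            if k > 0:
                # Backpropagate through W_k (before updating it); tanh' = 1 - tanh^2.
                back = matmul(delta, transpose(self.weights[k]))
                delta = [[g * (1.0 - a * a) for g, a in zip(gr, ar)] for gr, ar in zip(back, acts[k])]
            for i, row in enumerate(self.weights[k]):
                for j in range(len(row)):
                    row[j] -= lr * grad_w[i][j]
            for j in range(len(self.biases[k])):
                self.biases[k][j] -= lr * grad_b[j]
        return loss


def train(model, samples, targets, batch_size, lr=0.01, max_steps=1000, target_loss=0.0):
    """Steps 4-8: draw T samples, update, and repeat until a stop criterion."""
    history = []
    for _ in range(max_steps):
        idx = model.rng.sample(range(len(samples)), batch_size)
        loss = model.train_step([samples[i] for i in idx], [targets[i] for i in idx], lr)
        history.append(loss)
        if loss <= target_loss:
            break
    return history
```

As a usage example, the configuration reported in Section 4.2.1 would plausibly map to BatchDNN([M, 15, 18, 22, 9, 5, 1]) with lr=0.01 and batch_size equal to the number of intervals in one day (288 at 5 min). The list-based loops are for readability only; a NumPy or deep-learning framework implementation would be far faster.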

Fig. 7. APE of different detectors for a 1-h interval on March 22.

The time interval was varied from 5 min to 10 min, 15 min, 30 min, and 1 h. Table 1 presents the MAPE values for different time intervals from March 1 to 31, 2016, using data from detector 1. MAPE values are generally smaller for larger time intervals. This is expected: aggregating traffic flow over a longer interval smooths out short-term fluctuations, which makes forecasting relatively easier. However, a larger interval also yields fewer data points, so less historical information is available for forecasting. Therefore, MAPE values may not always be smaller for larger time intervals. The previous evaluations focused on data from loop detector 1 on the Seattle highway; however, the method was also applied to data from fifteen additional detectors on the same highway. Fig. 7 presents the APE distribution for a 1-h time interval on March 22, and the same conclusions can be drawn: the proposed method is robust across different detectors, and high APE values appear during low-traffic-flow periods.

4.2.4. Comparison to alternative methods

To further evaluate the benefits of the proposed method (denoted DL), its forecasting results were compared with those of a frequently used conventional method based on NCHRP Report 765 (denoted CM). The CM divides the historical days of the whole year into 24 groups and forecasts a target day using the mean traffic flow of the group to which that day belongs. Fig. 8 compares the forecasting results of the proposed method and the CM for detector 1 on March 22; the forecasts of the proposed method are closer to the true traffic flow data.

Fig. 8. Comparison of forecasting results based on different prediction methods (180.17 milepost on March 22).

The MAPE values of the forecasts produced by the DL method and the CM are summarized as box plots in Fig. 9, with one sub-graph per time interval in each column. The five sub-graphs in the first column show, for each of the sixteen detectors, the distribution of MAPEs across the 31 days of March 2016. The five sub-graphs in the second column show, for each day of March 2016, the distribution of MAPEs across the sixteen detectors. The MAPEs of the proposed method across different days are slightly lower than the corresponding values of the CM, and the same holds across detectors. Thus, the new method offers a stable advantage in forecasting accuracy. However, several inconsistent points can be observed in the ten sub-graphs of the first two columns of Fig. 9. For example, from March 3 to 5 for the 60-min time interval (the last sub-graph of the second column), the DL method is slightly inferior to the CM. The main reason may simply be the high volatility associated with low-traffic-flow periods: no forecasting method is robust during low-demand periods, and in such periods the CM may outperform the DL method.

A summary of the MAPEs for different time intervals and spatial scales from 6:00 to 22:00 (i.e., after omitting low-traffic periods) is presented in columns 3 and 4 of Fig. 9. Over the whole month of March, the average performance of the proposed method is better than that of the CM. Furthermore, averaged across the sixteen detectors, the DL method also outperforms the CM on every day except March 13, when its MAPEs are higher than those of the CM under all five conditions. This seemingly anomalous result arises because March 13 was the first day of daylight saving time in 2016, and a significant number of travelers, especially non-commuters, did not shift their departure times in step with the clock change. As a result, the traffic flow on this day differed significantly from historical days with a similar combination of contextual factors, so predictions based on the learned relationship between traffic flow and contextual factors produced large errors. For special cases like this, DNN-based methods perform much worse than the CM, which simply predicts the result based on average values within one group.

Fig. 9. Box plots of MAPEs based on different forecasting methods with various spatial and time scales.

The final analysis compares box plots of the MAPEs for different time intervals obtained with the DL method and the CM, as shown in Fig. 10. Taking detector 1 as an example, similar conclusions can be drawn: the DL method performs better than the CM, especially during the period 6:00–22:00.

Fig. 10. Box plots of MAPEs based on different forecasting methods for different time intervals: (a) 00:00–24:00; (b) 06:00–22:00.

5. Conclusions

In this paper, a new prediction method for daily long-term traffic flow was presented, based on mining the relationship between contextual factors and traffic flow data using a deep neural network. By combining the contextual factors with different times of day (TODs) throughout the day, daily long-term traffic flow can be predicted. Training deep neural networks on massive historical data is typically extremely time-consuming; therefore, a batch training method was developed to address this issue. In addition, a case study in Seattle was presented using traffic data collected by loop detectors on the I-5 freeway. The results of the proposed method are robust in both the temporal and spatial contexts, and its predictions were superior to those of the CM, particularly during high-demand periods (e.g., 6:00–22:00). The findings presented in this paper are useful for improving traffic management in cities, including the performance of fixed-time traffic signal control strategies, such as time-of-day plans, and other traffic management systems. Special events, such as accidents and construction activities, can significantly influence traffic flow but were not considered in the proposed method. In the future, we plan to extend the method by collecting additional data on special events to provide more contextual factors and historical data.
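The evaluation in Section 4.2 rests on a few simple computations: aggregating detector counts to coarser time intervals, computing APE and MAPE (optionally restricted to a window such as 6:00–22:00), and the CM baseline, which forecasts a day as the interval-wise mean of historical days in the same group. The Python sketch below illustrates these steps under stated assumptions; it is not the authors' code and does not implement the DL predictor. All function names are hypothetical, and the 24-group key (12 months × weekday/weekend) is an assumed grouping chosen only for illustration, since the grouping rule of NCHRP Report 765 is not given here.

```python
from datetime import date
from statistics import mean


def aggregate(counts, factor):
    """Sum consecutive fine-grained counts into coarser bins.

    For example, factor=12 turns 288 five-minute counts into 24 hourly counts.
    """
    if factor < 1 or len(counts) % factor:
        raise ValueError("series length must be a positive multiple of factor")
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


def ape(actual, predicted):
    """Absolute percentage error (%) per interval; undefined when observed flow is 0."""
    if len(actual) != len(predicted):
        raise ValueError("series lengths differ")
    if any(a <= 0 for a in actual):
        raise ValueError("APE requires strictly positive observed flow")
    return [abs(a - p) / a * 100.0 for a, p in zip(actual, predicted)]


def mape(actual, predicted, start=0, end=None):
    """Mean APE over intervals [start, end), e.g. start=6, end=22 for hourly data."""
    return mean(ape(actual[start:end], predicted[start:end]))


def group_key(day):
    """HYPOTHETICAL 24-group scheme: (month, is_weekend) gives 12 x 2 = 24 groups.

    This is an assumption for illustration, not the NCHRP Report 765 grouping.
    """
    return (day.month, day.weekday() >= 5)


def group_mean_forecast(history, target_day):
    """CM-style baseline: interval-wise mean of historical days in the target's group.

    history maps datetime.date -> list of per-interval flows (all of equal length).
    """
    members = [flows for day, flows in history.items()
               if group_key(day) == group_key(target_day)]
    if not members:
        raise ValueError("no historical day falls in the target day's group")
    return [mean(values) for values in zip(*members)]


if __name__ == "__main__":
    # Toy data (not from the paper): three intervals per day.
    history = {
        date(2016, 3, 1): [100, 200, 300],  # Tuesday
        date(2016, 3, 8): [120, 220, 280],  # Tuesday
        date(2016, 3, 5): [50, 60, 70],     # Saturday: a different group
    }
    forecast = group_mean_forecast(history, date(2016, 3, 22))
    observed = [105, 230, 270]
    print(forecast, mape(observed, forecast))
```

Rejecting zero observed flow in `ape` makes explicit why APE is fragile in low-flow periods: as the observed flow approaches zero, even a small absolute error produces a very large percentage error, which is consistent with the high APE values reported for low-demand periods.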
Acknowledgements

This research is supported by the National Natural Science Foundation of China (61773337 and 61773338), Zhejiang Provincial Natural Science Foundation (LY17F030009), Fundamental Research Funds for the Central Universities (2018QNA4050), Zhejiang Province Key Research and Development Plan (2018C01007), and National Key Research and Development Program (2016YFE0108000).

References

Bezuglov, A., & Comert, G. (2016). Short-term freeway traffic parameter prediction: Application of grey system theory models. Expert Systems with Applications, 62, 284–292.

Castro-Neto, M., Jeong, Y. S., Jeong, M. K., & Han, L. D. (2009). Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions. Expert Systems with Applications, 36(3), 6164–6173.
Celikoglu, H. B. (2013). An approach to dynamic classification of traffic flow patterns. Computer-Aided Civil and Infrastructure Engineering, 28(4), 273–288.
Cetin, M., & Comert, G. (2006). Short-term traffic flow prediction with regime switching models. Transportation Research Record: Journal of the Transportation Research Board, (1965), 23–31.
Chandra, B., & Sharma, R. K. (2016). Deep learning with adaptive learning rate using Laplacian score. Expert Systems with Applications, 63, 1–7.
Chen, C., Wang, Y., Li, L., Hu, J., & Zhang, Z. (2012). The retrieval of intra-day trend and its influence on traffic prediction. Transportation Research Part C: Emerging Technologies, 22, 103–118.
Cools, M., Moons, E., & Wets, G. (2009). Investigating the variability in daily traffic counts through use of ARIMAX and SARIMAX models: Assessing the effect of holidays on two site locations. Transportation Research Record: Journal of the Transportation Research Board, (2136), 57–66.
Fowe, A. J., & Chan, Y. (2013). A microstate spatial-inference model for network-traffic estimation. Transportation Research Part C: Emerging Technologies, 36, 245–260.
García-Ródenas, R., & Verastegui-Rayo, D. (2013). Adjustment of the link travel-time functions in traffic equilibrium assignment models. Transportmetrica A: Transport Science, 9(9), 798–824.
García-Ródenas, R., López-García, M. L., & Sánchez-Rico, M. T. (2017). An approach to dynamical classification of daily traffic patterns. Computer-Aided Civil and Infrastructure Engineering, 32(3), 191–212.
Habtemichael, F. G., & Cetin, M. (2016). Short-term traffic flow rate forecasting based on identifying similar traffic patterns. Transportation Research Part C: Emerging Technologies, 66, 61–78.
Haijema, R., & Hendrix, E. M. (2014). Traffic responsive control of intersections with predicted arrival times: A Markovian approach. Computer-Aided Civil and Infrastructure Engineering, 29(2), 123–139.
Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
Horowitz, A., Creasey, T., Pendyala, R., & Chen, M. (2014). Analytical travel forecasting approaches for project-level planning and design (No. Project 08-83).
Hosseini, S. H., Moshiri, B., Rahimi-Kian, A., & Nadjar Araabi, B. (2014). Traffic flow prediction using MI algorithm and considering noisy and data loss conditions: An application to Minnesota traffic flow prediction. PROMET-Traffic&Transportation, 26(5), 393–403.
Hou, Z., & Li, X. (2016). Repeatability and similarity of freeway traffic flow and long-term prediction under big data. IEEE Transactions on Intelligent Transportation Systems, 17(6), 1786–1796.
Jiang, X., & Adeli, H. (2004). Wavelet packet-autocorrelation function method for traffic flow pattern analysis. Computer-Aided Civil and Infrastructure Engineering, 19(5), 324–337.
Kerner, B. S., Klenov, S. L., Hermanns, G., & Schreckenberg, M. (2013). Effect of driver over-acceleration on traffic breakdown in three-phase cellular automaton traffic flow models. Physica A: Statistical Mechanics and its Applications, 392(18), 4083–4105.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Lee, W. H., Tseng, S. S., & Tsai, S. H. (2009). A knowledge based real-time travel time prediction system for urban network. Expert Systems with Applications, 36(3), 4239–4247.
Li, M., Zhang, T., Chen, Y., & Smola, A. J. (2014). Efficient mini-batch training for stochastic optimization. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 661–670). ACM.
Lin, L., Li, Y., & Sadek, A. (2013). A k nearest neighbor based local linear wavelet neural network model for on-line short-term traffic volume prediction. Procedia-Social and Behavioral Sciences, 96, 2066–2077.


Lippi, M., Bertini, M., & Frasconi, P. (2013). Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning. IEEE Transactions on Intelligent Transportation Systems, 14(2), 871–882.
Ma, X., Wu, Y. J., & Wang, Y. (2011). DRIVE net: E-science transportation platform for data sharing, visualization, modeling, and analysis. Transportation Research Record: Journal of the Transportation Research Board, (2215), 37–49.
Nakama, T. (2009). Theoretical analysis of batch and on-line training for gradient descent learning in neural networks. Neurocomputing, 73(1–3), 151–159.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
Smith, B. L., Williams, B. M., & Oswald, R. K. (2002). Comparison of parametric and nonparametric models for traffic flow forecasting. Transportation Research Part C: Emerging Technologies, 10(4), 303–321.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929–1958.
Stathopoulos, A., & Karlaftis, M. G. (2003). A multivariate state space approach for urban traffic flow modeling and prediction. Transportation Research Part C: Emerging Technologies, 11(2), 121–135.
Treiber, M., & Kesting, A. (2012). Validation of traffic flow models with respect to the spatiotemporal evolution of congested traffic patterns. Transportation Research Part C: Emerging Technologies, 21(1), 31–41.
Tsai, T. H., Lee, C. K., & Wei, C. H. (2009). Neural network based temporal feature models for short-term railway passenger demand forecasting. Expert Systems with Applications, 36(2), 3728–3736.
Van Lint, J. W. C., & Van Hinsbergen, C. P. I. J. (2012). Short term traffic and travel time prediction models. Artificial Intelligence Applications to Critical Transportation Issues, 22(1), 22–41.

Vlahogianni, E. I., Golias, J. C., & Karlaftis, M. G. (2004). Short-term traffic forecasting: Overview of objectives and methods. Transport Reviews, 24(5), 533–557.
Vlahogianni, E. I., Karlaftis, M. G., & Golias, J. C. (2014). Short-term traffic forecasting: Where we are and where we're going. Transportation Research Part C: Emerging Technologies, 43, 3–19.
Wang, J., & Shi, Q. (2013). Short-term traffic speed forecasting hybrid model based on chaos–wavelet analysis-support vector machine theory. Transportation Research Part C: Emerging Technologies, 27, 219–232.
Wang, Y., & Papageorgiou, M. (2005). Real-time freeway traffic state estimation based on extended Kalman filter: A general approach. Transportation Research Part B: Methodological, 39(2), 141–167.
Weijermars, W., & Van Berkum, E. (2005). Analyzing highway flow patterns using cluster analysis. In Intelligent Transportation Systems, 2005. Proceedings. 2005 IEEE, September (pp. 308–313). IEEE.
Xia, J., Huang, W., & Guo, J. (2012). A clustering approach to online freeway traffic state identification using ITS data. KSCE Journal of Civil Engineering, 16(3), 426–432.
Yuan, Y., Van Lint, J. W. C., Wilson, R. E., van Wageningen-Kessels, F., & Hoogendoorn, S. P. (2012). Real-time Lagrangian traffic state estimator for freeways. IEEE Transactions on Intelligent Transportation Systems, 13(1), 59–70.
Zhang, L., Liu, Q., Yang, W., Wei, N., & Dong, D. (2013). An improved k-nearest neighbor model for short-term traffic flow prediction. Procedia-Social and Behavioral Sciences, 96, 653–662.
Zhang, Y., & Xie, Y. (2007). Forecasting of short-term freeway volume with v-support vector machines. Transportation Research Record, 2024(1), 92–99.
Zheng, W., Lee, D. H., & Shi, Q. (2006). Short-term freeway traffic flow prediction: Bayesian combined neural network approach. Journal of Transportation Engineering, 132(2), 114–121.
