Daily long-term traffic flow forecasting based on a deep neural network

Expert Systems With Applications 121 (2019) 304–312

Contents lists available at ScienceDirect

Expert Systems With Applications journal homepage: www.elsevier.com/locate/eswa

Licheng Qu a,b, Wei Li a, Wenjing Li c, Dongfang Ma c,∗, Yinhai Wang d

a School of Electronic and Information Engineering, Xi'an Jiaotong University, Xianning Road, No. 28, Xi'an 710049, China
b School of Information Engineering, Chang'an University, Middle-Section of South 2nd Ring Road, Xi'an 710064, China
c Institute of Marine Information Science and Technology, Zhejiang University, Yuhangtang Road, No. 866, Hangzhou 310058, China
d Department of Civil & Environmental Engineering, University of Washington, More 201, Seattle 98105, USA

Article info

Article history: Received 26 June 2018; Accepted 18 December 2018; Available online 21 December 2018

Keywords: Daily long-term traffic flow; Forecasting; Deep neural network; Contextual factor; Batch training

Abstract

Daily traffic flow forecasting is critical in advanced traffic management and can improve the efficiency of fixed-time signal control. This paper presents a traffic prediction method for one whole day using a deep neural network based on historical traffic flow data and contextual factor data. The main idea is that traffic flow within a short time period is strongly correlated with the starting and ending time points of that period together with a number of other contextual factors, such as day of week, weather, and season. Therefore, the relationship between the traffic flow values within a given time interval and a combination of contextual factors can be mined from historical data. First, a predictor was trained using a multi-layer supervised learning algorithm to mine the potential relationship between traffic flow data and a combination of key contextual factors. Second, a batch training method was proposed to reduce training times. Finally, a Seattle-based case study shows that, overall, the proposed method outperforms the conventional traffic prediction method in terms of prediction accuracy. © 2018 Elsevier Ltd. All rights reserved.

1. Introduction

Traffic flow forecasting is important for both government agencies and individual travelers (Zhang & Xie, 2007). Local government agencies use traffic flow forecasting to predict potential congestion and other events (Haijema & Hendrix, 2014) in order to guide suitable traffic interventions, such as altering traffic signal timing or closing certain roads. Moreover, real-time traffic information is now available on navigation apps such as Waze and Google Maps and allows individual travelers to plan their trips in advance and adjust their routes at any particular moment (Lee, Tseng, & Tsai, 2009).

Traffic flow forecasting can be classified into long-term and short-term forecasting. Short-term traffic forecasting has been a critical component of most Intelligent Transportation Systems since the early 1980s (Bezuglov & Comert, 2016; Vlahogianni, Karlaftis, & Golias, 2014). Predictions of the near future, from a few seconds to possibly a few hours ahead, based on current and past traffic information, can be used in various applications such as adaptive traffic signal control and route guidance. Long-term predictions target one or more days into the future and are used by traffic management agencies to achieve appropriate traffic management and avoid potential traffic congestion, thus ensuring operational efficiency (García-Ródenas & Verastegui-Rayo, 2013; Haijema & Hendrix, 2014). This paper focuses on forecasting traffic flow data one whole day into the future, which is referred to as daily traffic flow forecasting. Daily traffic flow forecasting is particularly crucial for fixed-time signal control strategies: traffic flow on the target day must first be forecast, the whole day is then split into several intervals based on the forecast, and finally appropriate signal timing is planned for each interval.

There has been a considerable amount of research on short-term traffic flow forecasting (Van Lint & Van Hinsbergen, 2012; Vlahogianni et al., 2014). Two major types of forecasting are classical statistical methods and machine learning methods (Cetin & Comert, 2006; Cools, Moons, & Wets, 2009; Fowe & Chan, 2013; Kerner, Klenov, Hermanns, & Schreckenberg, 2013; Treiber & Kesting, 2012; Wang & Papageorgiou, 2005; Yuan, Van Lint, Wilson, van Wageningen-Kessels & Hoogendoorn, 2012). However, long-term daily traffic flow forecasting has attracted less attention. Comparisons of existing traffic flow forecasting methods have previously been presented in the literature (Smith, Williams, & Oswald, 2002; Stathopoulos & Karlaftis, 2003;

∗ Corresponding author. E-mail addresses: [email protected] (L. Qu), [email protected] (W. Li), [email protected] (W. Li), [email protected] (D. Ma), [email protected] (Y. Wang). https://doi.org/10.1016/j.eswa.2018.12.031 0957-4174/© 2018 Elsevier Ltd. All rights reserved.


Vlahogianni et al., 2004; Chen, Wang, Li, Hu, & Zhang, 2012; Lippi, Bertini, & Frasconi, 2013). Machine learning methods such as artificial neural networks (ANNs) and hybrid methods are often reported to outperform classical statistical methods, such as historical averaging and smoothing techniques. In this paper, a daily long-term traffic flow prediction method based on machine learning is presented that uses a deep neural network (DNN).

The remainder of this paper is organized as follows: Section 2 summarizes the existing literature; the proposed forecasting model is then established in Section 3; in Section 4, the new method is evaluated based on real-world traffic flow data; finally, the main contributions of the paper and a summary of new findings are presented in Section 5.

2. Literature review

The simplest method of forecasting daily traffic flow is to take all the historical data, divide the day into time intervals, and average the values across each interval. In this case, the historical average is regarded as the forecasting result for each time interval. Over a long history, traffic flow is typically characterized by non-reproducible patterns that vary from day to day; however, traffic flow trajectories look similar on typical workdays: morning and afternoon peaks during commuter periods and a lull in traffic after midnight (Wang & Shi, 2013). Although some days are similar, significant differences between daily patterns can also exist and will negatively affect traffic flow forecasting. Selecting appropriate historical data can therefore help improve prediction accuracy, and some works based on this idea have previously been published.

Methods based on historical data selection are called prediction-after-classification methods (Habtemichael & Cetin, 2016) and consist of three steps: 1) classify the historical data into several groups such that the daily traffic flow data within each group are similar; 2) select the appropriate group of historical data for the target day; and 3) produce the forecast using the data of that group. For example, Hosseini, Moshiri, Rahimi-Kian, and Nadjar Araabi (2014) collected historical data from all the workdays of a previous 6-month period (128 days) and used it to predict the daily traffic flow on the following Friday, assuming that traffic flow on the target Friday would be similar to that of all workdays of the previous 6 months. Hou and Li (2016) divided historical data into two groups according to their degree of similarity, then, using wavelet analysis, split the daily traffic flow series into a basic series and a deviation series.
For a single target day, historical data should first be selected based on whether the target day is a workday or not, and the average of the basic series across the whole group is taken as the basic forecasting result. Then, the deviation of the target day is predicted using common time series forecasting methods, such as the autoregressive integrated moving average (ARIMA) algorithm (Cools et al., 2009; Smith et al., 2002) or the generalized regression neural network (GRNN) algorithm (Hou & Li, 2016).

Differences among daily traffic flow patterns are not caused only by the workday/weekend distinction, but are also strongly correlated with other contextual factors, such as day of week, weather, holidays, season, and big events. The National Cooperative Highway Research Program (NCHRP) Report 765 (Horowitz, Creasey, Pendyala, & Chen, 2014) identified a number of important factors associated with traffic demand and divided daily historical traffic flow data into 24 groups according to the following four criteria: 1) weekday, Saturday, or Sunday; 2) summer day or non-summer day; 3) rainy day or not; 4) holiday or not. The historical data group can then be selected based on the attributes of the target day, and the mean of the group can be used to forecast a traffic flow value.
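The NCHRP-style grouping and group-mean forecast described above can be sketched as follows. The day-attribute field names and data layout here are illustrative assumptions, not taken from the report itself; only the 3 × 2 × 2 × 2 = 24-group classification and the group-mean forecast follow the text.

```python
from collections import defaultdict
from statistics import mean

def group_key(day):
    # day: dict of contextual attributes (field names are illustrative only)
    return (day["day_type"],          # "weekday", "saturday", or "sunday"
            day["is_summer"], day["is_rainy"], day["is_holiday"])

def build_group_means(history):
    """history: list of (day_attributes, daily_flow_series) pairs.
    Returns the per-interval mean flow profile of each of the 24 groups."""
    groups = defaultdict(list)
    for day, flows in history:
        groups[group_key(day)].append(flows)
    return {key: [mean(vals) for vals in zip(*series)]
            for key, series in groups.items()}

def forecast(day, group_means):
    # Forecast the target day as the mean daily profile of its group.
    return group_means[group_key(day)]
```

A target day is forecast by looking up its group and returning that group's average daily profile, interval by interval.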


In addition to the simple methods used for daily traffic flow classification, flow patterns can also be recognized using unsupervised learning algorithms, including k-means clustering with various distance metrics, such as Euclidean distance (Xia, Huang, & Guo, 2012) and spectral distance (García-Ródenas, López-García, & Sánchez-Rico, 2017). Other methods for pattern classification include hierarchical clustering analysis (Weijermars & Van Berkum, 2005), wavelet analysis (Jiang & Adeli, 2004), support vector machines (SVM) (Castro-Neto, Jeong, Jeong, & Han, 2009; Wang & Shi, 2013), k-nearest neighbors (Habtemichael & Cetin, 2016; Lin, Li, & Sadek, 2013; Zhang, Liu, Yang, Wei & Dong, 2013; Zheng, Lee, & Shi, 2006), and neural networks (Celikoglu et al., 2013; Tsai, Lee, & Wei, 2009). Despite these advanced methods for classifying traffic flow patterns, no clear method for selecting the appropriate group data exists. Furthermore, existing studies on daily forecasting are based on statistical analyses, such as those of Hosseini et al. (2014) and Hou and Li (2016), and research on the mining of traffic flow data is lacking.

The traffic flow within a given time period, for example 6:00–6:10, usually varies within a small range across different days, and the differences are caused by contextual factors. By allowing time-of-day, i.e., the starting and ending time points, to act as an additional contextual factor, the relationship between the traffic flow value and a particular combination of contextual factors can be mined using a machine learning algorithm to obtain an appropriate predictor. By inputting all possible combinations of contextual factors on the target day into the predictor, traffic flow data for the whole day can be predicted (described in further detail in Section 3).

3. Methodology

In general, traffic flow data is collected at a given time interval, e.g., 5 min, 15 min, or 1 h.
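Mining this relationship requires encoding each (time interval, contextual factors) combination as a numeric input vector. The sketch below shows one such encoding; the paper does not specify its exact scheme, so the factor set and one-hot layout here are assumptions.

```python
def encode_sample(interval_index, intervals_per_day, day_of_week,
                  is_rainy, is_holiday, season):
    """Encode one (time interval, contextual factors) combination as a
    numeric feature vector z. Factor set and encoding are illustrative."""
    z = []
    # time of day: start of the interval, scaled to [0, 1)
    z.append(interval_index / intervals_per_day)
    # day of week as a one-hot vector (Monday = 0 ... Sunday = 6)
    z += [1.0 if d == day_of_week else 0.0 for d in range(7)]
    z.append(1.0 if is_rainy else 0.0)
    z.append(1.0 if is_holiday else 0.0)
    # season as a one-hot vector (0 = spring ... 3 = winter)
    z += [1.0 if s == season else 0.0 for s in range(4)]
    return z
```

Each historical observation then becomes a training pair (z, y), where y is the traffic volume recorded in that interval.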
In this paper, the aim was to predict daily traffic flow by mining the correspondence between the traffic flow data and a combination of contextual factors. The framework of the forecasting method is illustrated in Fig. 1. The DNN, a deep learning architecture, allows computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction (LeCun, Bengio, & Hinton, 2015). The DNN is frequently recommended for mining potential relationships within massive multidimensional historical datasets (Hinton, Osindero, & Teh, 2006). However, the DNN training process usually takes a long time, so optimization of the process is necessary. Therefore, the proposed method considers two aspects: prediction logic with a deep neural network, and batch training for generating a predictor.

3.1. Prediction logic with a deep neural network

For feature extraction and transformation, the DNN uses a cascade of multiple layers with nonlinear processing units. Each successive layer uses the output of the previous layer as input, and training is based on a distinct set of patterns derived from the output of the previous layer. Factors are aggregated and recombined layer by layer; therefore, more factors advancing through the neural net are helpful in recognizing complex patterns. The DNN is fully connected and feedforward. DNN models can be viewed as simple mathematical models defining a function f : Z → Ŷ, a distribution over Z, or both Z and Ŷ. Here, Z is the set of contextual factors and Ŷ is the future traffic volume to be predicted. Assuming z is the daily vector of contextual factors, z ∈ Z, the neural network function f(z) can be defined as a composition of functions g(z), and the function of each layer can in turn be decomposed into other functions. Based on the definition



Fig. 1. Framework of the forecasting method.

of the neural network function, a network structure can be conveniently designed with arrows depicting the dependencies between functions. The nonlinear weighted sum is a widely used function:

f(z) = K( Σ_i w_i g_i(z) )   (1)

where w_i is the weight parameter of g_i; g_i refers to a specific function, which forms the set g = (g_1, g_2, …, g_i, …); and K is a predefined function, such as the sigmoid, soft-max, or rectifier function, commonly referred to as the activation function. An important characteristic of the activation function is that it provides a smooth transition as input values change. In this paper, the tanh function, (exp(z) − exp(−z))/(exp(z) + exp(−z)), is used for K; it generates a non-linear value and compresses it between −1 and 1.

Fig. 2. Architecture of the deep neural network prediction model.

The architecture of the DNN model is presented in Fig. 2, with a number of input nodes (z_i ∈ Z) and only one output node, the predictor, which can be defined as ŷ ∈ Ŷ, such that

ŷ = z · W + b   (2)

where z is the contextual input factor row vector, i.e., the daily vector of contextual factors, z ∈ R^d = Z; W is the weight parameter; and b is the bias. If W and b are given different meanings and ranges, the formula can also represent the whole neural network, every layer, or even every neuron in the network. In this way, a set of units computes a weighted sum of the inputs from the previous layer and passes the result through a non-linear function. After feeding the contextual factor vector into the network, the internal state (activation) of the neurons and layers changes according to the input, and the prediction is produced based on the factors and the activation function. The network is constructed by connecting the output of one neuron to the inputs of other neurons, thus forming a directed, weighted graph. The weights, as well as the functions that compute the activations, can be modified by a process called learning, which is governed by a learning rule (Chandra & Sharma, 2016).

Multiple nonlinear regression is used at every node of a neural network. For each node of a single layer, the inputs are the recombination of the nodes of the previous layer. Analogously, for nodes in the subsequent layer, the inputs are also a mixture of the nodes in different proportions according to their coefficients. The combination of inputs in different layers is very important in a DNN, as it significantly reduces error.

Before training the forecasting model and determining the parameters W and b, the loss function L should first be defined. The loss function is an important concept in machine learning, as it measures how far a particular solution is from the optimal solution of the problem to be solved. The function can be expressed as

L(y, ŷ) = ½ ‖y − ŷ‖² = ½ ‖y − f(z)‖²   (3)

where y is the actual value, ŷ is the predicted value, and z is a vector of contextual factors from the input training set. Several algorithms, for example stochastic gradient descent or improved variants, can be used to train and update the forecasting model. Backpropagation is used to calculate the gradient of the loss function (the cost associated with a given state) with respect to the weights of the DNN. The computational cost of the backward pass (BP) is essentially that of the forward pass (FP). Forward and backward passes are re-iterated until a sufficient performance level is reached (Schmidhuber, 2015). The weight updates of backpropagation can be done via stochastic gradient descent using the following equation:

w_ij(t + 1) = w_ij(t) + η ∂L/∂w_ij + ξ(t)   (4)

where η is the learning rate and ξ(t) is a stochastic term. The choice of loss function depends on a number of factors, including the learning type and the activation function. The same mechanism can be used to update the bias parameter b.

3.2. Batch training and predictor generation

Learning is a difficult task in densely-connected neural networks, as there are many hidden layers and inferring the conditional distribution of hidden activities given a data vector is difficult. Accurate regression results can be obtained by training samples one by one; however, this results in lengthy training times, particularly for a DNN with a large number of parameters (Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov, 2014). Hinton et al. (2006) presented a fast, greedy algorithm for transforming representations that assumes different layers of the model have the same number of units, so that deep, directed belief networks can be learned one layer at a time. This simple case is unsuitable for general neural networks. In this paper, Hinton's algorithm is extended by removing that assumption, giving a new algorithm that trains models with different numbers of units layer by layer.

Online training and batch training are two essential training schemes for neural networks. Theoretical analysis of the two schemes shows that the former has several advantages over the latter with respect to the absolute value of the expected difference. However, if the variance of the per-instance gradient remains constant, online training does not result in convergence to the


optimal weight with respect to the expected squared difference (Nakama, 2009). It has also been proven that the convergence rate does not decrease with increasing mini-batch size (Li, Zhang, Chen, & Smola, 2014). Based on the characteristics of the loss function, batch training was used to train the model. The descent direction is determined by the whole data set, which better represents the sample population and thus achieves greater accuracy in the direction of the extremum. The batched matrix form of the proposed model can be expressed as

Ŷ = Z · W + B   (5)

where Z is a T × M matrix in which each row vector z_t represents a training sample, T is the number of training samples (the batch size), and M is the number of features of each input sample; W is an M × N matrix; and B is a T × N matrix, where N is the number of neurons in the next hidden layer or the number of output nodes in the output layer. The formula can be rewritten as



⎡ ŷ₁₁ ⋯ ŷ₁ₙ ⎤   ⎡ z₁₁ ⋯ z₁ₘ ⎤ ⎡ w₁₁ ⋯ w₁ₙ ⎤   ⎡ b₁₁ ⋯ b₁ₙ ⎤
⎢  ⋮  ⋱  ⋮  ⎥ = ⎢  ⋮  ⋱  ⋮  ⎥·⎢  ⋮  ⋱  ⋮  ⎥ + ⎢  ⋮  ⋱  ⋮  ⎥   (6)
⎣ ŷₜ₁ ⋯ ŷₜₙ ⎦   ⎣ zₜ₁ ⋯ zₜₘ ⎦ ⎣ wₘ₁ ⋯ wₘₙ ⎦   ⎣ bₜ₁ ⋯ bₜₙ ⎦

where t = 1, …, T; m = 1, …, M; and n = 1, …, N. In this way, stochastic gradient descent and backpropagation can be used to train the DNN. Moreover, batch training speeds up the training process and helps generalize the model. The training process is described in Algorithm 1.

Based on this training, a fine-tuned DNN and predictor are obtained to predict future traffic volumes. The model has only one output node, as shown in Fig. 2, and the traffic volume at any given time of day (TOD) can be obtained from the output node after inputting the combination of contextual factors for that TOD into the predictor. The fine-tuned DNN works well but is slightly inconvenient, since only one traffic volume can be predicted at a time. Using the same matrix operation as in batch training, a factor matrix Z can be fed into the model and a predicted traffic volume matrix Ŷ is then output at the final layer. Here, Z is the factor matrix and z_i ∈ Z represents a prediction sample, which is used to predict traffic within a particular future time period. Each z_i consists of several factors z_ij, which are the primary contextual factors influencing traffic volume. Because the output layer of the predictor has only one neuron (N = 1), the batch prediction formula can be simplified as

⎡ ŷ₁ ⎤   ⎡ z₁₁ ⋯ z₁ₘ ⎤ ⎡ w₁ ⎤   ⎡ b₁ ⎤
⎢ ⋮  ⎥ = ⎢  ⋮  ⋱  ⋮  ⎥·⎢ ⋮  ⎥ + ⎢ ⋮  ⎥   (7)
⎣ ŷₜ ⎦   ⎣ zₜ₁ ⋯ zₜₘ ⎦ ⎣ wₘ ⎦   ⎣ bₜ ⎦

where t is the number of samples in the batch and m is the number of features of each input sample. Now, when a batch of contextual factors is input into the model, a batch of predicted traffic volumes is obtained.

4. Experiments

To evaluate the performance of the proposed method, experiments were performed using real data. In addition, the proposed method was compared to a frequently-used method in two dimensions, time and space.

4.1. Data preparation and description

A real-world traffic dataset obtained from the Digital Roadway Interactive Visualization and Evaluation Network (DRIVENET,


http://uwdrive.net/STARLab) was used in the experiment; it includes traffic flow, occupancy, and point speed data for the highways of Seattle, Washington (Ma, Wu, & Wang, 2011). The traffic flow data can be downloaded at various intervals (5 min, 10 min, 15 min, 1 h, …). In this study, data from multiple loop detectors on the I5 freeway were downloaded for different time intervals, covering mileposts ranging from 180 to 190, which contain a total of sixteen detectors, as shown in Fig. 3. The case study was based mainly on data from the sixteen loop detectors from February 1, 2015 to March 31, 2016 as the training/test dataset. The last month, March 1–31, 2016, was selected as the forecasting target, and the remaining data were treated as historical data used to train the forecasting model.

Some abnormalities were found in the dataset; these can be attributed to issues with the loop detectors or transmission interruptions between the local controller and the data center. Abnormal data will have a negative influence on the forecasting models and should be filtered out before training and forecasting. Since traffic volumes differ between peak and off-peak hours, abnormal data were adjusted differently for two time periods: 5:00 AM–6:00 PM and the rest of the day. For 5:00 AM–6:00 PM, if the traffic data within a certain time interval was 60% more or less than that of the previous or next time interval, it was considered abnormal. For the rest of the day, 80% was used as the threshold. Abnormal data points were removed and replaced with adjusted values based on the adjacent data point of the previous time interval: for 5:00 AM–6:00 PM, a random value between 0.8 and 1.1 was selected and multiplied with the traffic data of the previous time interval; for the rest of the day, a random value between 0.9 and 1.3 was used.
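The screening and replacement rules described above can be sketched as follows. The thresholds (60%/80%) and replacement factors (0.8–1.1 and 0.9–1.3) follow the text; the function signature and data layout are illustrative assumptions, as the paper does not give its implementation.

```python
import random

def clean_day(flows, minutes_per_interval=5, seed=None):
    """Replace abnormal points in one day's flow series.
    flows[i] covers [i*dt, (i+1)*dt) minutes after midnight."""
    rnd = random.Random(seed)
    cleaned = list(flows)
    for i in range(1, len(cleaned) - 1):
        start_min = i * minutes_per_interval
        peak = 5 * 60 <= start_min < 18 * 60          # 5:00 AM - 6:00 PM
        thr = 0.6 if peak else 0.8                    # 60% / 80% thresholds

        def abnormal(v, ref):
            # value deviates from the reference by more than the threshold
            return ref > 0 and abs(v - ref) / ref > thr

        prev, nxt = cleaned[i - 1], flows[i + 1]
        if abnormal(cleaned[i], prev) or abnormal(cleaned[i], nxt):
            # replace with a randomly scaled copy of the previous interval
            factor = rnd.uniform(0.8, 1.1) if peak else rnd.uniform(0.9, 1.3)
            cleaned[i] = cleaned[i - 1] * factor
    return cleaned
```

The day-level rule in the next paragraph (dropping a day with more than an hour of missing data) would be applied before this point-level repair.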
Another criterion was used for one special case: if no traffic flow data existed for more than one hour in a day, the whole day was removed from the dataset.

4.2. Forecasting based on the proposed method

4.2.1. Deep neural network training

It is difficult to determine a suitable number of hidden layers and hidden nodes for a DNN. In general, the number of hidden nodes is considered to be related to the number of input nodes, approximately 1 to 3 times that number. Therefore, the greedy layer-wise unsupervised learning algorithm was used to optimize the number of hidden nodes in each layer and to determine the number of hidden layers of the whole DNN. After repeated learning and verification, a six-layer DNN was constructed, and the numbers of hidden neurons in the layers were [15, 18, 22, 9, 5]. The learning rate was set to 0.01 and the batch size was set to the number of data points associated with each day. An example of training and evaluation is presented in Fig. 4, based on data from detector 1 with a 5-min time interval. In the first few iterations, the loss of the neural network decreases sharply, then flattens, and eventually converges.

The predictor was trained both with and without the batch training method on the same machine, configured with a 64-bit quad-core 3.2 GHz AMD Ryzen 5 1400 processor and 8 GB of main memory. The average batch training time for the sixteen detectors across five different time intervals was approximately 12 min. With online training, however, the average time was approximately 2460 min. Batch training thus saves a significant amount of time; here, training time was reduced by 99.51%.

4.2.2. Forecasting results using the proposed method

The forecasting results contain P data points for a whole day (P = 1440/Δ, where Δ is the time interval in minutes). For each data point, the absolute percentage error (APE) was used to



Fig. 3. Locations of detectors in Seattle, Washington.

Fig. 4. Training and evaluation of the deep neural network.

Fig. 5. Forecasting results and APE on March 15 at milepost 180.66.

evaluate the forecasting accuracy, expressed as

APE(t) = ( y(t) − ŷ(t) ) / y(t)   (8)

where y(t) is the true value of traffic flow in the t-th interval, ŷ(t) is the forecasted value of traffic flow in the t-th interval, and APE(t) denotes the absolute percentage error for traffic flow in the t-th interval. For the time series forecasting results of the whole day, the mean absolute percentage error (MAPE) was used to assess the results, expressed as

MAPE = (1/T) Σ_{t=1}^{T} |APE(t)|   (9)

where T is the total number of intervals in the whole day. Taking March 15 and detector 1 as an example, the APE across the whole day is shown in Fig. 5. It can be seen that just two data points have a high APE, approaching 30%; most of the APE values are less than 15%. The average APE value across the whole day is approximately 6.136%.

4.2.3. Robustness analysis of the proposed method

Before using the proposed method in actual applications, it is necessary to assess its robustness. The robustness of the method was evaluated by considering different time intervals, days, and detectors. The forecasting results using the proposed algorithm for detector 1 across the 31 target days with a 5-min time interval are shown in

Fig. 6. APEs at different days with an interval of 5 min.

Fig. 6. It can be observed that the predictions are accurate across the whole month and that high APEs mainly occur between 23:00 and 6:00 the following day. The reason is that traffic flow from 22:00 to 6:00 the next day is low, and the relative fluctuations at the same data points are high across different days. For example, the number of vehicles collected by one detector between 00:00–00:05 was 5 on one day and 10 the next; the relative fluctuation is therefore 100%, and accordingly the accuracy of the prediction model will be low. Thus, high APEs during low-traffic-flow periods are reasonable. In addition, no special measures are required for the low-traffic-flow period, and high prediction errors are acceptable for these periods.
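The error metrics of Eqs. (8) and (9) are straightforward to compute; a minimal sketch:

```python
def ape(y, y_hat):
    """Eq. (8): percentage error for one interval (signed; the absolute
    value is applied in the MAPE of Eq. (9))."""
    return (y - y_hat) / y

def mape(y_series, y_hat_series):
    """Eq. (9): mean absolute percentage error over the T intervals."""
    return sum(abs(ape(y, yh))
               for y, yh in zip(y_series, y_hat_series)) / len(y_series)
```

The low-flow example above also follows directly: a true count of 5 forecast as 10 gives an APE of |(5 − 10)/5| = 100%, which is why night-time intervals dominate the error.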



Table 1. MAPEs (%) for different time intervals.

Date       5 min   10 min  15 min  30 min  60 min
March 1    10.18    9.21   10.40    7.08    6.41
March 2    13.04   14.95   16.50    8.00    8.31
March 3     8.93   10.34    8.59   10.80    9.37
March 4    13.11   11.49   12.17   10.33    9.71
March 5    10.73   10.44   13.04   11.68   11.73
March 6    11.48   11.61   12.95    9.62    9.59
March 7    11.24   11.81   10.13    8.27    8.81
March 8     7.46    8.75    6.64    4.98    9.00
March 9     8.01    6.57    5.09    5.53    8.27
March 10    6.80    6.16    5.42    6.29    6.89
March 11    5.85    5.41    5.39    5.52    6.08
March 12    6.22    5.21    4.84    4.32    4.69
March 13    5.70    5.14    4.70    3.79    4.14
March 14    5.10    3.84    3.57    3.43    3.16
March 15    4.34    3.69    3.55    3.53    2.90
March 16    5.34    5.49    4.45    4.31    3.03
March 17    4.27    3.65    3.34    3.43    2.84
March 18    3.87    2.62    2.72    2.65    2.47
March 19    5.30    5.29    4.86    5.24    4.01
March 20    9.81    7.57    7.30    4.85    4.29
March 21    6.31    5.34    5.00    4.49    3.71
March 22    6.16    5.16    5.07    4.29    3.64
March 23    4.96    4.06    4.15    3.78    3.45
March 24    5.05    4.41    4.46    3.62    2.87
March 25    5.54    5.11    4.87    4.39    4.37
March 26    5.15    5.10    5.20    4.29    4.29
March 27    4.95    6.69    6.41    3.24    4.44
March 28    5.76    5.44    4.46    3.29    4.03
March 29    6.50    8.95    6.56    5.22    3.97
March 30    5.88   11.52    7.80    4.80    6.03
March 31    8.59    9.55   12.37    5.53    7.75
Average     7.15    7.12    6.84    5.50    5.62

Algorithm 1. Batch training.
1) Prepare the historical data.
2) Split the historical data into training, validation, and test sets.
3) Initialize the weight matrices W and B.
4) Randomly extract T samples from the training set.
5) Feed the batch samples into the DNN as input.
6) Calculate the difference between the predicted output and the real traffic volume.
7) Calculate the descending gradient and fine-tune the weight matrices W and B.
8) Return to step 4 until the desired accuracy or epoch count is reached.
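Steps 3–8 of Algorithm 1 can be sketched in NumPy as follows. This is a minimal single-hidden-layer version with illustrative sizes and hyperparameters (not the paper's six-layer [15, 18, 22, 9, 5] configuration, which would follow the same pattern), and it assumes the data preparation and splitting of steps 1–2 have already been done. Note the update uses the descent sign, i.e., the weights move against the gradient of loss (3), and the stochastic term ξ(t) of Eq. (4) is omitted.

```python
import numpy as np

def batch_train(Z_train, Y_train, hidden=8, batch_size=32,
                eta=0.01, epochs=200, seed=0):
    """Sketch of Algorithm 1 for one hidden layer.
    Z_train: (num_samples, M) factor matrix; Y_train: (num_samples, 1)."""
    rng = np.random.default_rng(seed)
    M = Z_train.shape[1]
    # Step 3: initialize the weight matrices
    W1 = rng.normal(0, 0.1, (M, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        # Step 4: randomly extract T samples from the training set
        idx = rng.choice(len(Z_train), size=min(batch_size, len(Z_train)),
                         replace=False)
        Z, Y = Z_train[idx], Y_train[idx]
        # Step 5: feed the batch into the network, Eq. (5), tanh activation
        A1 = np.tanh(Z @ W1 + b1)
        Y_hat = A1 @ W2 + b2
        # Step 6: difference between prediction and real volume
        err = (Y_hat - Y) / len(idx)
        # Step 7: descending gradient; fine-tune the weights by backpropagation
        dW2 = A1.T @ err; db2 = err.sum(0)
        dA1 = (err @ W2.T) * (1 - A1 ** 2)   # tanh'(x) = 1 - tanh(x)^2
        dW1 = Z.T @ dA1; db1 = dA1.sum(0)
        W1 -= eta * dW1; b1 -= eta * db1
        W2 -= eta * dW2; b2 -= eta * db2
    # the trained predictor applies the batched form of Eq. (7):
    # a whole factor matrix in, a whole day's predicted volumes out
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2
```

A fixed epoch budget stands in for the accuracy-or-epoch stopping rule of step 8; a validation-set check would slot into the loop in the same place.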

Fig. 7. APE of different detectors for a 1-h interval on March 22.

The time interval was varied from 5 min to 10 min, 15 min, 30 min, and 1 h. The MAPE values for the different time intervals from March 1 to 31, 2016, using data from detector 1, are presented in Table 1. It can be observed that the MAPE values are generally smaller for larger time intervals, which is expected, since forecasting becomes relatively easier. However, with less input data there is less historical information for forecasting, so MAPE values may not always be smaller for larger time intervals.

The previous evaluations focused on data from loop detector 1 on the Seattle highway; however, the method was also applied using data from the fifteen additional detectors on the same highway. The APE distribution for a 1-h time interval on March 22 is presented in Fig. 7. The same conclusions can be drawn: the proposed method is robust across different detectors, and high APE values appear during low-traffic-flow periods.

4.2.4. Comparison to alternative methods

To further evaluate the benefits of the proposed method (denoted DL), its forecasting results were compared to those of a frequently-used conventional method based on the NCHRP Report 765 (denoted CM). The CM divides the historical days of the whole year into 24 groups, and the means of these groups are used to forecast target days.

Fig. 8. Comparison of forecasting results based on different prediction methods (milepost 180.17 on March 22).

The forecasting results of detector 1 on March 22 using the proposed method are compared to the CM in Fig. 8. It can be observed that the forecasting results of the proposed method are closer to the true traffic flow data. The MAPE values of the forecasting results using the DL and CM methods are summarized as box plots in Fig. 9. The five sub-graphs in the first column show differences between the MAPEs of the sixteen detectors across the 31 days in March 2016. Another five sub-graphs in the second column present differences between the MAPEs of each day in March 2016 across the sixteen detectors. It can be observed that the MAPEs of the proposed method across different days are slightly lower than the corresponding values of the CM. The same conclusion can be deduced for different detectors. Thus, the new method offers a stable advantage, particularly in terms of forecasting accuracy. However, several inconsistent points can be observed in ten of the sub-graphs of the first two columns of Fig. 9. For example, from March 3–5 for the 60-min time interval (the last sub-graph of the second column), the DL method


Fig. 9. Box plots of MAPEs based on different forecasting methods with various spatial and time scales.

is slightly inferior to the CM. The main reason may simply be the high degree of volatility associated with low-traffic-flow periods; no forecasting method can be expected to be robust during such low-demand periods, so the DL method cannot be guaranteed to outperform the CM there. A summary of the MAPEs for different time intervals and spatial scales from 6:00 to 22:00 (i.e., after omitting the low-traffic periods) is presented in columns 3 and 4 of Fig. 9. As these sub-graphs show, the average performance of the proposed method over the whole month of March is better than that of the CM. Furthermore, the average performance across the sixteen detectors on each day is also higher for the DL method compared to

the CM, with the exception of March 13. On this particular day, the MAPEs of the DL method are higher than those of the CM for all five conditions. The reason for this counterintuitive result is that March 13 was the first day of daylight saving time in 2016: a significant number of travelers, especially non-commuters, did not adjust their departure times immediately, so the traffic flow data on this day differ significantly from those of historical days with a similar combination of contextual factors. Predictions based on the relationship between traffic flow data and contextual factors therefore produce large errors. For special cases like this, DNN-based methods perform much worse than the CM, which


Fig. 10. Box plots of MAPEs based on different forecasting methods for different time intervals: (a) 00:00–24:00; (b) 06:00–22:00.

simply predicts the result as the average value within one group.

The final analysis assessed box plots of the MAPEs for different time intervals based on the DL method and the CM, as shown in Fig. 10. Taking detector 1 as an example, similar conclusions can be drawn: the DL method performs better than the CM, especially during the period between 06:00 and 22:00.

5. Conclusions

In this paper, a new prediction method for daily long-term traffic flow was presented, based on mining the relationship between contextual factors and traffic flow data using a deep neural network. By combining the contextual factors with the different TODs throughout the day, daily long-term traffic flow can be predicted. Because the training process for deep neural networks on massive historical data is typically extremely time-consuming, a batch training method was developed to address this issue. A case study based on Seattle was presented using traffic data collected by loop detectors on the I-5 freeway. The results of the proposed method are robust in both the temporal and the spatial context, and its predictions were superior to the CM, particularly during high-demand periods (e.g., 06:00–22:00).

The findings presented in this paper are useful for improving traffic management in cities, including the performance of fixed-time traffic signal control strategies such as time-of-day control and other traffic management systems. Special events, such as accidents and construction activities, can significantly influence traffic flow but were not considered in the proposed method. In the future, we plan to expand the method by collecting additional data on special events to provide more contextual factors and historical data.
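As a reproducibility aid, the CM baseline and the MAPE metric used in the comparison above can be sketched in a few lines of Python. The grouping of historical days by month and weekday/weekend (12 × 2 = 24 groups) is only one plausible reading of the NCHRP Report 765 scheme, and the sample flow profiles are hypothetical.

```python
from collections import defaultdict
from datetime import date

import numpy as np


def mape(actual, predicted, eps=1e-9):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs(actual - predicted) / np.maximum(np.abs(actual), eps))


def cm_forecast(history, target_day):
    """Conventional-method baseline: forecast a target day with the mean
    daily profile of all historical days in the same group.

    history maps datetime.date -> 1-D sequence of flow values for that day.
    Grouping by (month, weekday-vs-weekend) is an assumption; the paper only
    states that the year is divided into 24 groups.
    """
    def group(d):
        return (d.month, d.weekday() >= 5)

    groups = defaultdict(list)
    for day, profile in history.items():
        groups[group(day)].append(np.asarray(profile, dtype=float))
    return np.mean(groups[group(target_day)], axis=0)


# Hypothetical history: two March weekdays with three flow readings each.
history = {
    date(2015, 3, 10): [100, 120, 140],
    date(2015, 3, 11): [110, 130, 150],
}
forecast = cm_forecast(history, date(2016, 3, 22))  # March 22, 2016 is a weekday
print(forecast)                                     # mean weekday profile for March
print(mape([105, 125, 145], forecast))              # error vs. observed flows
```

Because the CM averages entire groups, it is insensitive to day-to-day volatility, which is consistent with its relatively stable behavior during low-traffic periods and on anomalous days such as the daylight-saving transition noted above.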
Acknowledgements

This research is supported by the National Natural Science Foundation of China (61773337 and 61773338), the Zhejiang Provincial Natural Science Foundation (LY17F030009), the Fundamental Research Funds for the Central Universities (2018QNA4050), the Zhejiang Province Key Research and Development Plan (2018C01007), and the National Key Research and Development Program (2016YFE0108000).

References

Bezuglov, A., & Comert, G. (2016). Short-term freeway traffic parameter prediction: Application of grey system theory models. Expert Systems with Applications, 62, 284–292.

Castro-Neto, M., Jeong, Y. S., Jeong, M. K., & Han, L. D. (2009). Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions. Expert Systems with Applications, 36(3), 6164–6173.
Celikoglu, H. B. (2013). An approach to dynamic classification of traffic flow patterns. Computer-Aided Civil and Infrastructure Engineering, 28(4), 273–288.
Cetin, M., & Comert, G. (2006). Short-term traffic flow prediction with regime switching models. Transportation Research Record: Journal of the Transportation Research Board, 1965, 23–31.
Chandra, B., & Sharma, R. K. (2016). Deep learning with adaptive learning rate using Laplacian score. Expert Systems with Applications, 63, 1–7.
Chen, C., Wang, Y., Li, L., Hu, J., & Zhang, Z. (2012). The retrieval of intra-day trend and its influence on traffic prediction. Transportation Research Part C: Emerging Technologies, 22, 103–118.
Cools, M., Moons, E., & Wets, G. (2009). Investigating the variability in daily traffic counts through use of ARIMAX and SARIMAX models: Assessing the effect of holidays on two site locations. Transportation Research Record: Journal of the Transportation Research Board, 2136, 57–66.
Fowe, A. J., & Chan, Y. (2013). A microstate spatial-inference model for network traffic estimation. Transportation Research Part C: Emerging Technologies, 36, 245–260.
García-Ródenas, R., & Verastegui-Rayo, D. (2013). Adjustment of the link travel-time functions in traffic equilibrium assignment models. Transportmetrica A: Transport Science, 9(9), 798–824.
García-Ródenas, R., López-García, M. L., & Sánchez-Rico, M. T. (2017). An approach to dynamical classification of daily traffic patterns. Computer-Aided Civil and Infrastructure Engineering, 32(3), 191–212.
Habtemichael, F. G., & Cetin, M. (2016). Short-term traffic flow rate forecasting based on identifying similar traffic patterns. Transportation Research Part C: Emerging Technologies, 66, 61–78.
Haijema, R., & Hendrix, E. M. (2014). Traffic responsive control of intersections with predicted arrival times: A Markovian approach. Computer-Aided Civil and Infrastructure Engineering, 29(2), 123–139.
Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
Horowitz, A., Creasey, T., Pendyala, R., & Chen, M. (2014). Analytical travel forecasting approaches for project-level planning and design (NCHRP Project 08-83).
Hosseini, S. H., Moshiri, B., Rahimi-Kian, A., & Nadjar Araabi, B. (2014). Traffic flow prediction using MI algorithm and considering noisy and data loss conditions: An application to Minnesota traffic flow prediction. PROMET-Traffic & Transportation, 26(5), 393–403.
Hou, Z., & Li, X. (2016). Repeatability and similarity of freeway traffic flow and long-term prediction under big data. IEEE Transactions on Intelligent Transportation Systems, 17(6), 1786–1796.
Jiang, X., & Adeli, H. (2004). Wavelet packet-autocorrelation function method for traffic flow pattern analysis. Computer-Aided Civil and Infrastructure Engineering, 19(5), 324–337.
Kerner, B. S., Klenov, S. L., Hermanns, G., & Schreckenberg, M. (2013). Effect of driver over-acceleration on traffic breakdown in three-phase cellular automaton traffic flow models. Physica A: Statistical Mechanics and its Applications, 392(18), 4083–4105.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Lee, W. H., Tseng, S. S., & Tsai, S. H. (2009). A knowledge based real-time travel time prediction system for urban network. Expert Systems with Applications, 36(3), 4239–4247.
Li, M., Zhang, T., Chen, Y., & Smola, A. J. (2014). Efficient mini-batch training for stochastic optimization. In Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 661–670). ACM.
Lin, L., Li, Y., & Sadek, A. (2013). A k nearest neighbor based local linear wavelet neural network model for on-line short-term traffic volume prediction. Procedia-Social and Behavioral Sciences, 96, 2066–2077.


Lippi, M., Bertini, M., & Frasconi, P. (2013). Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning. IEEE Transactions on Intelligent Transportation Systems, 14(2), 871–882.
Ma, X., Wu, Y. J., & Wang, Y. (2011). DRIVE Net: E-science transportation platform for data sharing, visualization, modeling, and analysis. Transportation Research Record: Journal of the Transportation Research Board, 2215, 37–49.
Nakama, T. (2009). Theoretical analysis of batch and on-line training for gradient descent learning in neural networks. Neurocomputing, 73(1–3), 151–159.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
Smith, B. L., Williams, B. M., & Oswald, R. K. (2002). Comparison of parametric and nonparametric models for traffic flow forecasting. Transportation Research Part C: Emerging Technologies, 10(4), 303–321.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929–1958.
Stathopoulos, A., & Karlaftis, M. G. (2003). A multivariate state space approach for urban traffic flow modeling and prediction. Transportation Research Part C: Emerging Technologies, 11(2), 121–135.
Treiber, M., & Kesting, A. (2012). Validation of traffic flow models with respect to the spatiotemporal evolution of congested traffic patterns. Transportation Research Part C: Emerging Technologies, 21(1), 31–41.
Tsai, T. H., Lee, C. K., & Wei, C. H. (2009). Neural network based temporal feature models for short-term railway passenger demand forecasting. Expert Systems with Applications, 36(2), 3728–3736.
Van Lint, J. W. C., & Van Hinsbergen, C. P. I. J. (2012). Short term traffic and travel time prediction models. Artificial Intelligence Applications to Critical Transportation Issues, 22(1), 22–41.

Vlahogianni, E. I., Golias, J. C., & Karlaftis, M. G. (2004). Short-term traffic forecasting: Overview of objectives and methods. Transport Reviews, 24(5), 533–557.
Vlahogianni, E. I., Karlaftis, M. G., & Golias, J. C. (2014). Short-term traffic forecasting: Where we are and where we're going. Transportation Research Part C: Emerging Technologies, 43, 3–19.
Wang, J., & Shi, Q. (2013). Short-term traffic speed forecasting hybrid model based on chaos–wavelet analysis–support vector machine theory. Transportation Research Part C: Emerging Technologies, 27, 219–232.
Wang, Y., & Papageorgiou, M. (2005). Real-time freeway traffic state estimation based on extended Kalman filter: A general approach. Transportation Research Part B: Methodological, 39(2), 141–167.
Weijermars, W., & Van Berkum, E. (2005). Analyzing highway flow patterns using cluster analysis. In Proceedings of the 2005 IEEE Intelligent Transportation Systems Conference (pp. 308–313). IEEE.
Xia, J., Huang, W., & Guo, J. (2012). A clustering approach to online freeway traffic state identification using ITS data. KSCE Journal of Civil Engineering, 16(3), 426–432.
Yuan, Y., Van Lint, J. W. C., Wilson, R. E., van Wageningen-Kessels, F., & Hoogendoorn, S. P. (2012). Real-time Lagrangian traffic state estimator for freeways. IEEE Transactions on Intelligent Transportation Systems, 13(1), 59–70.
Zhang, L., Liu, Q., Yang, W., Wei, N., & Dong, D. (2013). An improved k-nearest neighbor model for short-term traffic flow prediction. Procedia-Social and Behavioral Sciences, 96, 653–662.
Zhang, Y., & Xie, Y. (2007). Forecasting of short-term freeway volume with v-support vector machines. Transportation Research Record, 2024(1), 92–99.
Zheng, W., Lee, D. H., & Shi, Q. (2006). Short-term freeway traffic flow prediction: Bayesian combined neural network approach. Journal of Transportation Engineering, 132(2), 114–121.