Reservoir-based surrogate modeling of dynamic user equilibrium

Reservoir-based surrogate modeling of dynamic user equilibrium

Transportation Research Part C xxx (xxxx) xxx–xxx Contents lists available at ScienceDirect Transportation Research Part C journal homepage: www.els...

4MB Sizes 0 Downloads 30 Views

Transportation Research Part C xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Transportation Research Part C journal homepage: www.elsevier.com/locate/trc

Reservoir-based surrogate modeling of dynamic user equilibrium☆ Qian Gea,b, , Daisuke Fukudaa, Ke Hanc, Wenjing Songd ⁎

a

Department of Civil and Environmental Engineering, Tokyo Institute of Technology, Tokyo 152-8552, Japan School of Transportation and Logistics, National Engineering Laboratory of Integrated Transportation Big Data Application Technology, National United Engineering Laboratory of Integrated and Intelligent Transportation, Southwest Jiaotong University, Chengdu 611756, China c Department of Civil and Environmental Engineering, Imperial College London, London SW7 2BU, UK d Department of Industrial and Manufacturing Engineering, Pennsylvania State University, PA 16802, USA b

ARTICLE INFO

ABSTRACT

Keywords: Dynamic traffic assignment Surrogate model Macroscopic fundamental diagram Kernel trick

This paper proposes a new dynamic user equilibrium (DUE) traffic assignment model using reservoir-based network reduction techniques and surrogate dynamic network loading models. A traffic network is decomposed into a reservoir structure, and the DUE problem is formulated as a variational inequality, with an embedded surrogate model for the path delay operator to describe traffic dynamics at the reservoir level. The surrogate model is further enhanced by the reproducing kernel Hilbert space and adaptive sampling to reduce approximation error and improve computational efficiency. To solve the proposed surrogate-based DUE problem on reduced networks, we develop a customized algorithm that integrates the kernel trick with the generalized projection framework. A pre-computation scheme is proposed, which calculates and stores the of kernel matrices and vectors, could further reduce the computational burden. Numerical experiments of the proposed methods show significant reduction of the computational times, by up to 90% , while maintaining low approximation errors (MAPE below 6% ), when compared to the exact models and solution methods.

1. Introduction Dynamic traffic assignment (DTA) is a widely studied class of problems in traffic research with numerous applications in traffic management, network design, and intelligent transportation systems. One key challenge in DTA research is the modeling of highly complex network dynamics and the computational burden associated therein. The computation of dynamic user equilibrium (DUE), which is the primary form of DTA study, typically involves iterations between two main modeling components: (1) an assignment module to update the path flows towards the dynamic extension of Wardrop’s first principle (Wardrop, 1952); and (2) a dynamic network loading (DNL) module to evaluate the path travel times/costs with given path flows. The map from the set of path flows to the set of path travel times/costs is termed the path delay operator (Friesz, 2010; Song et al., 2017). The DNL module describes the dynamic evolution of traffic flows on links and nodes in a network. The most commonly employed DNL modules in the literature are macroscopic/mesoscopic traffic flow models. In a typical macroscopic traffic flow model, the link dynamics are usually characterized by a differential algebraic equation or partial differential equation (PDE); the latter is derived from the conservation law (Lighthill and Whitham, 1955). Meanwhile, the vehicle flows at the nodes are conveniently described by

☆ This paper has been accepted for a poster presentation at the 23rd International Symposium on Transportation and Traffic Theory (ISTTT23) July 24–26, 2019 in Lausanne, CH. ⁎ Corresponding author at: School of Transportation and Logistics, Southwest Jiaotong University, Chengdu 611756, China. E-mail address: [email protected] (Q. Ge).

https://doi.org/10.1016/j.trc.2019.10.010 Received 1 December 2018; Received in revised form 2 September 2019; Accepted 20 October 2019 0968-090X/ © 2019 Elsevier Ltd. All rights reserved.

Please cite this article as: Qian Ge, et al., Transportation Research Part C, https://doi.org/10.1016/j.trc.2019.10.010

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Nomenclature

m na nd np nli

number of path segments in a reservoir flow of path segments speed of path segments density path segments reservoir flow speed of the reference line segment density of the reference line segment function of vehicle accumulation and average reservoir speed V ( (t , x )) function of average density and average reservoir speed f rspt flow of path p for the OD pair r s departing at time t rspt delay of path p for the OD pair r s departing at time t rspt surrogate delay of path p for the OD pair r s departing at time t Ti assigment interval

na fi (t , xi ) i (t , x i ) i (t , x i ) f (t , x ) (t , x ) (t , x ) V (N (t ))

nt f c nc nk r , s, p, t rspt rspt c

(·) (·) M

number of measurement intervals number of path segments in a reservoir number of OD pairs number of paths number of vehicles entering a path segment i during measurement interval tl number of assignment intervals path flow vector path delay vector number of samples for training the surrogate model dimensions of the kernel function origin, destination, path, time parameter of regression model path cost of c ’s sample path delay operator surrogate path delay operator assignment map OD travel demand

the supply–demand relationship. In general, the DNL models with physical queues are neither continuous nor analytical (Szeto, 2003; Han et al., 2016; Han and Friesz, 2017), with the exception of several probabilistic models (Osorio et al., 2011; Osorio and Flötteröd, 2014), and their computational time grows super-linearly with the size of the network. These adverse properties pose obstacles to the formulation and solution of the DUE problem. Aiming to address this challenge, this paper introduces two strategies: one is a macroscopic fundamental diagram (MFD)-enabled decomposition technique to scale down the size of the network, while the other one is surrogate modeling to efficiently perform the DNL module. Since the seminal work of Daganzo (2007), there has been a large body of literature on MFD. As demonstrated by Daganzo (2007), a large and complex network can be viewed as a multi-reservoir system if decomposed into a collection of reservoirs. A reservoir is the spatial aggregation of a homogeneous subnetwork, which is clustered by the designed properties of links and demand profiles. Instead of modeling vehicular movements through links and nodes, we may instead describe the vehicle dynamics at the reservoir level. This effectively reduces the size of the network, thereby improving the efficiency of computing path delays. When solving a typical DUE problem, the DNL is performed repeatedly until the path flows converge. In addition to simplifying the network dynamics through a reservoir approach, this paper proposes to further reduce the computational cost by reducing the need to execute the DNL module in its full scale. This is done by devising an approximation model as a surrogate for the DNL using statistical/machine learning with sampled inputs and outputs. The surrogate model, also called the meta-model or response surface model, is an analytical and inexpensive-to-evaluate approximation of the computationally intensive true response (Osorio and Bierlaire, 2013). Typical meta-modeling techniques include polynomial interpolation, support vector regression, Kriging, and neural networks (Gosavi, 2015). Frequently employed for solving simulation-based optimization in transportation applications, surrogate modeling was recently extended to treat analytical dynamic equilibrium problems. In particular, Song et al. (2017) proposed a Kriging-based surrogate modeling approach for DNL with considerable computational savings while maintaining satisfactory approximation accuracy. This paper further extends their work by employing surrogate modeling for solving DUE problems. As previously mentioned, the modeling and computational complexity of the model is reduced by (1) a MFD-based reservoir approach, and (2) a surrogate modeling approach for DNL. The surrogate modeling is enhanced by the reproducing kernel Hilbert space (RKHS) and adaptive sampling to reduce the prediction error. To solve the proposed surrogate-based DUE problem, we develop a customized algorithm that integrates the kernel trick into the generalized projection framework. Our numerical tests show that the proposed method leads to a significant reduction in the computational burden by up to 90% while yielding an acceptable approximation gap of no more than 3%1 with respect to the original DUE model. The remainder of this paper is organized as follows. Section 2 reviews relevant literature on DTA and existing studies closely related to the proposed strategies for simplifying network dynamics. The surrogate-based DUE problem, which is named the reservoirbased surrogate model of dynamic user equilibrium (RSM-DUE) problem, is formulated in detail in Section 3. Section 4 describes a novel customized projection algorithm for solving the RSM-DUE problem. Section 5 reports the results from several numerical experiments. Finally, conclusions and discussions are provided in Section 6. 2. Literature review This section reviews previous studies with respect to the DNL module, the formulation of DUE problem, and the application of 1

In some tests, the approximation gap could even be 1%. 2

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

surrogate models. Because of the abundance of literature in these topics, we limit the scope of this review to the underlying assumptions of DNL/DUE and the resulting formulations. The detailed mathematical form, theoretical analysis on the existence of solutions, and solution methods are excluded here, although we may use the findings of these studies in the body of this manuscript. 2.1. Macroscopic traffic flow models It is unsurprising that macroscopic traffic flow models are chosen for studies that require the repeated use of DNL; they require fewer parameters as inputs and are more tractable and easier to calibrate than microscopic traffic flow models (Nie et al., 2008). This section reviews macroscopic traffic flow models with a focus on the MFD. Friesz et al. (2013) distinguished macroscopic traffic flow models into delay function models and exit-flow models by their method of describing travel delay and travel flow. Delay function models estimate the delay directly from exogenous data (Friesz et al., 1993; Wu et al., 1998; Xu et al., 1999; Zhu and Marcotte, 2000; Carey et al., 2003). In contrast, exit-flow models track the flow dynamics and calibrate the delay based on the propagation of traffic in the network. Models in this class includes the point queue model (Vickrey, 1969; Han et al., 2013,), exit flow function model (Merchant and Nemhauser, 1978; Nie, 2011), queueing theoretical model (Osorio et al., 2011), and hydrodynamics models (Lighthill and Whitham, 1955; Richards, 1956). Various solution methods has been developed for the hydrodynamics traffic model, including the cell transmission model (CTM) (Daganzo, 1994, 1995), link transmission model (Yperman, 2007; Himpe et al., 2016; Jin, 2015; Han and Friesz, 2017), and variational method (Daganzo, 2005a,b; Claudel and Bayen, 2010a,b). Nevertheless, all these models are formulated for the link dynamics. There also exist a few studies that aim to extend the link-based models to path-based models, e.g., Ukkusuri et al. (2012). The calibration of CTM relies on the fundamental diagram of speed, density, and flow of each link. A spatial aggregate of links shares similar characteristics in terms of vehicle accumulation, trip completion rate, and average speed (Daganzo, 2007; Geroliminis and Daganzo, 2008; Leclercq et al., 2015). Yildirimoglu and Geroliminis (2014, 2015) extended simulation-based traffic assignment by applying the MFD as the DNL model. Laval et al. (2017) demonstrated that only vehicle accumulation values are required to characterize the traffic dynamics using a network MFD. Leclercq et al. (2015) noted the similarity between MFD and the exit flow function model and proposed a hydrodynamic traffic flow model for path segments in a reservoir instead of single links. The hydrodynamic model is depicted by a system of PDEs. Ge and Fukuda (2019) revealed that the PDE system could be reformulated as a single PDE when introducing an auxiliary reference link and developed a mesh-link iterative numerical scheme based on this finding. 2.2. Dynamic user equilibrium traffic assignment DUE is one type of DTA in which “the effective unit travel delay, including early and late arrival penalties, of travel for the same purpose is identical for all utilized path and departure time pairs” (Friesz, 2010). It is a strict definition of an equilibrium state with regard to departure time, path choice, and path delay evaluation. Therefore, existing models could be distinguished by the continuity of departure time choice, the route choice across different time horizons, and the evaluation of path delay given the link delay at different observation times. First, the DUE problem can be formulated as a continuous problem or discrete problem considering the continuity of departure time choice. Friesz et al. (1989, 1993), Friesz and Mookherjee (2006), Ban et al. (2012) developed continuous time DUE formulations in which the time horizon for departure is continuous. In contrast, many more studies view the departure time as different discrete assignment horizons, e.g., Wie et al. (1995), Lo and Szeto (2002), Huang and Lam (2002), Szeto and Lo (2004). Second, in DUE models involving multiple assignment horizons, the models can be grouped into simultaneous route and departure time choice models (Mahmassani and Herman, 1984; Ran et al., 1996; Nie and Zhang, 2007; Han et al., 2013) and a pure route choice model (Friesz et al., 1993, 2001, 2013) if the equilibrium condition is satisfied across different time horizons. Moreover, because of the change of traffic conditions across different times, the path/link delay can be calibrated at the departure time or at the time when the trips are finished. The former type of model is called a reactive model (Kuwahara and Akamatsu, 1997; Akamatsu, 2001) while the latter type is referred to as a predictive model (Kuwahara and Akamatsu, 1993; Nie and Zhang, 2007). Existing DUE models also differ in their mathematical formulation tools, underlying DNL modules, treatment of stochasticity (Wei et al., 2014), and solution techniques. An early but not yet outdated review on the DUE problem is Peeta and Ziliaskopoulos (2001). 2.3. Surrogate models with applications in DUE/DNL The application of surrogate models in transportation is rather new. Most of the existing studies focus on network optimization problems in which DTA/DNL are treated as low-level models. Among them, a couple of studies use a surrogate model for simulationbased DTA solvers (Ciuffo et al., 2013; Chen et al., 2013; He et al., 2016). However, the simulation-based DTA solvers may sacrifice mathematical rigor and optimality when reaching convergence (Song et al., 2017). Instead, (Osorio and Bierlaire, 2013; Osorio and Chong, 2015; Chong and Osorio, 2017) integrate an analytical traffic model with a polynomial fitting model to obtain a surrogate model of DNL directly. To the authors’ best knowledge, Song et al. (2017) is the first attempt to use a surrogate model for DUE. Despite the wide variety of surrogate models in the literature, an ideal model for the path delay operator must (1) have a closed form representation of the map from inputs to outputs; (2) be easy to train and evaluate; and (3) be flexible to different degrees of smoothness (Song et al., 2017). Hence, many preceding studies adopt the Kriging as a surrogate model (Chen et al., 2013; Song et al., 2017). Osorio and Bierlaire (2013) proposed a joint model consisting of a polynomial basis function and physical model. The polynomial basis function is further enhanced with a regularization term and weights. Though low-order polynomial functions are sufficient for many applications, high-order polynomial functions may be required for training the model. However, due to the 3

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Fig. 1. Multiple entrance–exit pairs of a reservoir.

complexity of the path delay operator, fitting high-order polynomial functions require large number of variables and is constrained by the curse of dimensionality. The motivation of this study is to find an alternative surrogate model that can use a high-order polynomial function to approximate the path delay operator while maintaining a relatively low computational cost. Meanwhile, we also wish to derive an analytical solution for the surrogate model for the computation of the DUE problem. Weighted kernel ridge regression (WKRR) (Haworth et al., 2014) is the choice that meets our requirements. First, it has a closed form solution. The loss function minimization problem can be analytically solved, and only a handful of parameters that specify the kernel need to be trained. Second, classical regression models such as linear regression or ridge regression may fail to capture the on-line property of travel demand. If there are drastic shifts in travel demand, the equilibrated travel pattern will be changed accordingly. In contrast, weights and adaptive samples avoid this problem by emphasizing the importance of datasets that have lower error variance. Acceptable approximation may still be able to be generated when the distances of the new data and most training datasets are large. Moreover, by virtue of the regularization term, we can avoid the ill-posedness of problem when training regression models. Finally, a variety of kernels are available, which enables us to fit the surrogate model better. Note that the weighting and regularization schemes are also adopted in Osorio and Bierlaire (2013). In addition, Osorio and Bierlaire (2013) introduced a simple physical model of DNL in the loss function to maintain a reasonable approximation for DNL model when training samples are insufficient. The kernel trick is widely used in regression models in the machine learning community, e.g., in support vector machines, Gaussian processes, principal components analysis (PCA), ridge regression, spectral clustering, and linear adaptive filters (Murphy, 2012). Although many practitioners prefer the radial basis function (RBF) kernel because of its non-parametric characteristics, the performance of each kernel depends on the training dataset. Each kernel has been discovered to be suitable for some cases (Chen et al., 2013; Alaoui and Mahoney, 2015). Besides the novelty of the surrogate model, this manuscript also contributes to the solution framework of the DUE problem. Using the surrogate model of DNL, the DUE problem can be solved in a data-driven manner. New observations can be readily plugged into the dataset for further training and prediction. 3. Methodology This section starts by investigating the two strategies for building a tractable and analytical DNL model, which are building blocks of the RSM-DUE problem. The first strategy is the reservoir-based traffic model built upon Leclercq et al. (2015) and Ge and Fukuda (2019). This traffic model is used as the underlying DNL for training the surrogate model. The second strategy is to use WKRR as a surrogate of the DNL, which is a version of kernel ridge regression Murphy (2012) enhanced by assigning weights to training samples. With these two strategies, an analytical path delay operator is derived and RSM-DUE is formally presented. The following notations are used throughout this article. 3.1. Reservoir-based traffic model The traffic model of interest here is based on the conservation law for reservoirs. The link-based hydrodynamics traffic flow model, commonly referred to as the Lighthill-Whitham-Richards (LWR) model, is described by the following partial differential equation: t

(t , x ) +

xf

(1)

( (t , x )) = 0

where (t , x ) denotes the local vehicle density and f ( (t , x )) is the flow of a link. Leclercq et al. (2015) noted that a reservoir has multiple entrance-exit pairs, as shown in Fig. 1. Thus, the flow dynamics can be described for each path segment connecting an entrance-exit pair. Moreover, the traffic dynamics within a reservoir can be approximated by the macroscopic relationship between average speed and vehicle accumulation. For each path segment, the following PDE is given: t i (t ,

x i) +

x [ i (t ,

x i ) V (N (t ))] = 0,

(2)

i = 1, 2, …, na

where i (t , x i ) denotes the local vehicle density on path segment i and N (t ) is the vehicle accumulation. The traffic dynamics in a reservoir are governed by the PDE system for each path segment. Leclercq et al. (2015) propose a novel idea for the decomposition of a large and complex network, based on which the traffic dynamics in a network can be tracked in terms of path segments instead of links. This has inspired the decomposition scheme of this paper. 4

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Ge and Fukuda (2019) state that the previous formulation (2) could be simplified by introducing an auxiliary reference link. The traffic dynamics can now be described by a single PDE of the flow conservation in the reference link: t

(t , x ) +

1 na

x [f

(t , x )] = 0

(3)

where (t , x ) denotes the average density of the path segment and f (t , x ) = na fi (t , xi ) = i (t , x i ) V ( (t , x )) is the flow of the reservoir. More theoretical descriptions of this PDE and its relationship with Leclercq et al. (2015) may be found in Ge and Fukuda (2019). In addition to the reformulation of the PDE system, Ge and Fukuda (2019) also study the design of parameters for the path segment2, and develop a mesh-cell iterative update numerical scheme. They also extend the single-reservoir traffic model to a multireservoir system considering the demand and supply at the connections of the reservoirs. The numerical scheme for the reservoir-based traffic model of Ge and Fukuda (2019) is summarized here. Algorithm 1. Reservoir-based traffic model

Now, we show that the reservoir-based traffic model can serve as a path delay operator. Given path flow f rspt across each assignment interval, path travel cost rspt is estimated from the accumulation departure and arrival curves for each link. These curves record the number of arrivals/departures at each simulation step t . The time dependent travel time on a path rspt can be tracked from the start path segment to the end path segment sequentially. First, we may reverse the axes of the departure/arrival curve to obtain the departure/arrival time (Di (N )/ Ai (N ) ) for each vehicle using path segment i, as shown in Fig. 2. The difference in the two curves is the travel time of each vehicle. i (N )

= Ai (N )

Di (N ),

(4)

N

Then, we may group by the departure time of vehicles t1, t2, …, tl…,tm . Without loss of generality, we use the simulation resolution as both the measurement and assignment intervals across this manuscript. However, they may be defined differently in practice. The time for the group is expressed as follows: i (N )

i (t

l) =

1 nli

nli

i (k ),

k such that D (k )

tl

(5)

k=1

where f i (tl ) denotes the number of vehicles that enter path segment i during measurement interval tl . Therefore, the travel time of a vehicle departing at t [tl 1, tl] whose path consists of path segments 1, 2, …, i…is 1 (t ) + 2 [t + 1 (t )] + …+ i [t + …+ i 1 (t )] + …. For path flow f rspt , we can estimate path cost rspt using the aforementioned procedure. The map from (f): f is a finite-dimensional delay operator (Song et al., 2017). Remark 1. The computational time of the DNL module could benefit from the introduction of path segments, as shown in Ge and Fukuda (2019). Meanwhile, the number of paths in the network could also be reduced because link-based paths are replaced by path segment-based ones. Thus, we may regard the reservoir-based DNL model as an alternative to DNL for reducing the variables and drawing samples. 3.2. Surrogate modeling of dynamic network loading (DNL) We briefly reviewed surrogate modeling for DUE in Sections 1 and 2. This section introduces the theoretical basis of the WKRR and the surrogate modeling of the DNL with the WKRR. As discussed previously, the WKRR with adaptive sampling is a tailored version of Osorio and Bierlaire (2013) and Haworth et al. (2014). Regression, weights, and regularization Suppose we have a training set of samples of the path flow and path costs (f1, 1), (f2, 2) , …, (fnc, nc ), where fc, c are vectors in np× nt . Then, for each path cost, the most naive approach is to use a linear 2 Ge and Fukuda (2019) applied the minimum concurrent flow problem (MCFP) to calculate the boundary flux for each path segment. In our experiments, we found that the combination of maximum flow problem and MCFP will lead to better approximation.

5

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Fig. 2. Departure and arrival curves for path segment i (superscript i is dropped for clarity).

Table 1 Comparison of surrogate models. Model

Loss function

LR

nc c= 1 nc c= 1

RR WRR

(

rspt T f

(

rsptT f

c

c

nc rsptT f c c = 1 wc (

WKRR

Same as above

Predictor rspt 2 c ) rspt 2 c )

+

rspt 2 c )

1 2

||

rspt ||2

+

1 2

||

rspt ||2

rspt

=

rsptT (FFT ) 1Ff

rspt

=

rsptT (FFT

rspt

=

rsptT (WFFT

=

rsptT (WK

rspt

+ I) 1Ff + I) 1WFf

+ I) 1Wk (f , ·)

rspt

= rspt T f to fit the model. Let the matrix of all samples be F = [f1 f2…fnc]T . The objective function of the linear regression function is to minimize the loss function of [LR], as shown in Table 1. However, when the sampling size for training the model grows large, the super-collinearity problem may occur and the matrix FT F becomes singular. The loss function may be penalized by a regularization term to avoid this problem. Thus, we have the ridge regression problem ([RR] in Table 1). In addition, the importance of each sample can be adjusted by its distance with respect to the new test data point. We use the inverse distance function in the Euclidean metric to 1 attach a weight wc = || f f || to each item of training data. This weighting parameter gives more importance to local points (Osorio c 2 and Bierlaire, 2013). The weighted ridge regression (WRR) problem is able to capture the weighting parameters, as shown for [WRR] in Table 1, where W is a nc × nc square matrix in which the non-diagonal elements are zero and diagonal elements Wc, c = wc . Note that the predictors in the LR, RR, and WRR models only rely on the inner products of samples. This conclusion enables us to integrate high-order information into the model using the kernel trick. The WKRR model is the resulting predictor. where k (f1, f1) k (f1, f2) … … k ( f , f ) k ( f , f ) … … is the kernel matrix. Further, k (f, ·) = [k (f1, f), …, k (f c, f)…] is the the vector of inner product of 2 1 2 2 K= … … k (fc , f c) … … … … … training samples and the test data point. n denote some domain. Let k: F × F (f), (f ) and be a map in that k (f, f ) = Kernel method. Let F (f) = [ 1 (f), 2 (f), …]T be a vector of functions that transforms the input space to a higher dimensional space. Then, k (f, f ) is called {k (f, f )} is positive definite the inner-product kernel or simply kernel. A kernel is called a Mercer kernel when the kernel matrix K b b b 0, f , f F holds for all (·) such that a 2 (f)df . Examples of (Haykin, 2009). In this kernel, a a (f) (f ) k (f , f )dfdf n T T d Mercer kernels in R are the (a) linear kernel: k (f, f ) = f f , (b) polynomial kernel: k (f, f ) = (f f + r ) , r 0 , and (c) radial basis || f f ||

function kernel: k (f, f ) = e 2 2 , > 0 . np × nt . Define H to be the inner Let k (f, ·) denote any real-valued function generated by the kernel operation with f F N , i , i = 1, …, N } . If the inner product space spanned by finite combinations of f F , i.e., H = { i i k (fi, ·): fi F , N product space H is complete, it will be a Hilbert space. N M Suppose two functions are picked from space H and can be represented by f (·) = 1 ai k (fi, ·), g (·) = 1 bj k (f j, ·) , where ai and bj are expansion coefficients and both fi, f j F for each i , j . This representation may not be unique, i.e., f (·) is independent of the choice of kernel. N M N M The bilinear form of functions f (·) and g (·) reads as follows: f , g = 1 1 ai k (fi, f j) bj = 1 ai g (fi) = 1 bi f (fi) Hence, N

N

f , k (f, ·) = 1 ai k (f, fi) = 1 ai k (fi, f) = f (f) . This property is called the reproducing property of Mercer kernels. When the inner product space H is complete, the space where the reproducing kernel is defined becomes an RKHS. The most interesting conclusion of RKHS related to our study is the representer theorem (Schölkopf et al., 2001). 6

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Theorem 1 (Representer Theorem). Any function defined in RKHS can be represented as a linear combination of Mercer kernel functions. The representer theorem supports the application of the kernel trick in the WRR problem. Applying the kernel trick, initial variables can be mapped to a high-dimensional and implicit feature space. However, the evaluation of the predictor does not impose much extra computational burden because one only needs to compute the inner products of the initial variables instead. Introducing the kernel trick to the WRR problem yields the formula presented in Table 1. The kernel trick leads to an online algorithm for the training and prediction of the surrogate model. This algorithm could be integrated with the solution method of the DUE problem, which is described in Section 4. Training. The feature space of the path flows is constrained by the polyhedron of

f rspt

R+:

f rspt = v rst ,

r , s, t

(6)

p

Because of the constraints of the feasible region in the polyhedron, we propose a sampling method based on Latin hypercube sampling (LHS) (Fang et al., 2005; Song et al., 2017). This method starts by drawing samples following the LHS and then maps each sample to the polyhedron, as shown in Algorithm 2. Algorithm 2. LHS with projection

Remark 2. Although Algorithm 2 samples uniformly from the set [0, 1], the projection step will violate the uniformity. Fortunately, it does not affect the training and prediction model thanks to the weighting scheme in the loss function. The importance of rarely sampled data is further strengthened when the test point is in the neighborhood of the data. Dimensionality reduction. The proposed [WKRR] model uses all path flows to predict each path cost. However, over-fitting may occur if we use the whole set of features for each path cost. For example, the travel time of an earlier-departing vehicle may not be affected by the later-departing ones on a particular path. Using all the path flows to fit the path cost on the path cost of earlierdeparting vehicles will introduce error for future prediction, especially when the left term of the [WKRR] objective function is illposed. Hence, it is necessary to reduce the dimension of the feature vector. We use the method of Song et al. (2017) for feature selection. When training the WKRR predictor for the path cost rspt , the kernel operation becomes k (f, ·) = (t , t ) f, · , where (t , t ) 0 t t is an indicator function defined as (t , t ) = . Nonetheless, this method may neglect the impact of the later-departing 1 otherwise rspt . At a minimum, this method is one way to balance the over-fitting problem induced by path flows of other paths on path cost dimensionality and mathematical complexity, which is discussed in depth in Song et al. (2017). When solving the DUE problem, this dimensionality reduction method is effective because of the self-routing of vehicles during each assignment interval. However, it may cause serious error when solving the DSO problem. Thus, all features may be used for that case.

{

3.3. Surrogate-based DUE problem The DUE model has a number of variants, which differ in the level of discretization and underlying behavioral assumption. As a demonstration of the proposed methodology, and without loss of generality, we consider route-choice DUE problems. This means that the departure rate between each O-D pair is pre-determined, and the assignment mainly concerns route choices. Under this assumption, the DUE is achieved when the path travel times within the same O-D pair and associated with the same departure time are equal, and no more than the travel times that would have been experienced on any unused path. For this type of DUE, we consider the following variational inequality representation, which has been widely adopted in the DUE literature (Friesz et al., 1993; Wie et al., 1995; Ran et al., 1996; Akamatsu, 2001; Nie and Zhang, 2007). A primitive formulation of the DUE problem (with route choice) is

(f r

s

: {f rspt

p

rspt

)[f rspt

f

rspt

]

0,

f rspt

t

R+:

f rspt = v rst

r , s, t }

p rspt

rspt

) is the corresponding path cost, where f is the path flow between O-D (r , s ) that departs at t under the DUE condition, (f where (·) is viewed as the path delay operator, and v rst is the prescribed OD flow between (r , s ) at t. Alternatively, the matrix form for this formulation reads as follows: 7

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

[VI

DUE]

Find f

such that

( f ), f

f

0,

f

R(np × nt ) +: Mf = v }

: {f

(7)

The surrogate-based DUE model replaces the simulation-based path delay operator could be any predictor in Table 1 by the assumption of the loss function.

[VI

Find f

SADUE]

suchthat

( f ), f

f

0,

(·) by the surrogate estimator

(·)

f

R(np × nt ) +: Mf = v}

: {f

(·) . This

(8)

Theorem 2. The [VI-SADUE] problem is equivalent to the fixed point problem:

f = P (f

(9)

( f ))

where P is projection operation and

is any real number.

Proof. See Theorem 1.3 in Nagurney (1998). □ Theorem 3. The [VI-SADUE] problem admits at least one solution. Proof. Per Brouwer’s fixed point theorem (Theorem 2.1.18 of Facchinei and Pang, 2007), it suffices to show is convex, closed, and ( f )) is continuous. We observe that compact, and the projection function P ( f is a simplex, therefore closed, convex and compact. Moreover, given the continuity of the projection operation P , it remains to show that ( f ) is continuous, which immediately follows as this function is a linear combination of basis functions determined by the kernel operation via the representer theorem. Therefore, [VI-SADUE] has at least one solution. □ Remark 3. The continuity of the exact path delay operators considering a physical queue, however, is still an open problem. Some recent discoveries may refer to Han et al. (2016), Han and Friesz (2017). It is concluded that the continuity of delay operator depends on the link model as well as the initial and boundary conditions. However, the spillbacks and link interactions in general network may affect the continuity of the exact path delay operator. Even if the delay operator is continuous, [VI-DUE] and [VI-SADUE] may have different solutions because Theorem 3 only guarantees the existence, not the uniqueness, of the DUE problem. In the experiment, we show that [V-DUE] and [VI-SA-DUE] lead to different solutions but both satisfy the DUE condition. Theorem 4. The [VI-SADUE] model admits to a unique solution if the map

( f ) is strongly monotonic.

Proof. This is a direct consequence of Theorem 1.8 in Nagurney (1998). □ Remark 4. This theorem implies that if the surrogate model of the path cost is strongly monotonic, then the [VI-SADUE] admits to a unique solution. However, the assumption on monotonicity is not realistic in general (Iryo, 2011, 2013). When the path delay operator is believed to be strongly monotonic, we could impose additional constraints on the objective function of the [LR], [RR], and [WRR] problems such that w > 0 . The isotonic regression (Luss et al., 2012) is another alternative for the surrogate model. As pointed out by Song et al. (2017), the monotonicity of the exact path delay operator is an unlikely property given the complexity of the DNL and the non-analytical nature of the delay operator. However, the generalized monotonicity of the proposed surrogate DNL model is much easier to characterize thanks to its analytical representation and regularity (e.g. continuous differentiability). This will be further analyzed in a future study. Two-stage model. Although the solution is guaranteed, the primitive model has an inconsistency problem, i.e., for the solution of [VI-SADUE], the estimated path cost by [WKRR] and the calculated path cost by the DNL module may have a substantial discrepancy. That is due to the under-fitting problem of the surrogate model at the solution. Thus, a two-stage model is proposed based on discussion on Osorio and Bierlaire (2013)’s framework. The two-stage model maintains two goals when solving the DUE problem: (1) to train an acceptable surrogate model with respect to the path flow pattern under the DUE condition, and (2) to find the solution of the DUE assignment problem given a trained surrogate model. Thus, the upper level model is used to minimize the loss function for the given path flow. Because kernel operation is implicit in most cases, an explicit objective function for the WKRR problem could not be obtained. We use the objective function of WRR instead, noting that krspt s could be parameterized or non-parameterized according to the kernel we adopt. In contrast, the lower-level model is to calculate the [VI-SADUE] problem given a trained surrogate model. The overall formulation of the two-stage model reads as follows: nk

Upper level

argmin rspt k

Lower level

Find f : {f

wkrspt (

rsptT fc k

rspt 2 p )

+

c=1

such that

k ( f ),

f

f

1 || 2

rspt 2 k ||

r , s, p, t

(10)

f

0,

R(np × nt ) +: Mf = v }

(11)

In this model, the subscript k denotes the change of samples and parameters in each iteration. Remark 5. The proposed two-stage model is also called infill sampling or adaptive sampling in the surrogate model literature. 8

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Fig. 3. Framework of the two stage model.

Adaptive sampling is a variety of strategies to improve the fitting of the surrogate model with wise selection of the new sampling points. The strategy is straightforward in our model. Because we want to minimize the gap between the surrogate model and DNL for the equilibrated path flow, it is obvious that the solution of the VI problem in the k-th iteration can serve as a warm-up start for the (k + 1)-th iteration. 4. Solution algorithm of the RSM-DUE problem The solution of the RSM-DUE problem is illustrated in Fig. 3. We start by sampling in the feasible region of path flows constrained by the OD demand, and then generate a series of path costs using the reservoir-based DNL module. These data are used to train a primal WKRR model with a polynomial kernel and produce a parameterized predictor. The upper-level model is the iterative procedure for solving the lower-level model, whose principal operator is being constantly updated from the lower level with a new sample. By virtue of the kernel trick, updating the surrogate model does not require re-calculating all the coefficients, which significantly improves the computational efficiency of the proposed method. In summary, the bi-level problem can be solved via the main loop in Algorithm 3. Algorithm 3. Iterative update algorithm

The lower-level model solves the DUE problem. By the equivalence of the VI problem and fixed pointed problem stated in Theorem 2, we develop a customized extragradient algorithm, which is shown in Algorithm 4, using the fixed point iteration scheme. Algorithm 4. Customized extragradient algorithm for solving DUE

9

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Fig. 4. The test network.

The customized extragradient algorithm begins with a warm-up start of initial path flow f0 using the solution of VI problem in previous iteration, i.e., fk in the main loop of Algorithm 3. A vector of white noise k is added fk to avoid the super-collinearity problem of the kernel matrix. Note that the white noise must converge to the zero vector as the difference between fk + 1 and fk diminishes. Then, a try-and-trail process is conducted to determine a step size, which should be neither too large nor too small for convergence. Readers interested in this process may refer to Khobotov (1987), He and Liao (2002). With the step size and the profitable direction, we can draw a tentative point fl + 1 of the path flow and evaluate the path cost using the surrogate model. This step requires a projection operation on the Euclidean distance. An extra projection step is performed to obtain the fl + 1. This algorithm terminates when the gap between fl + 1 and fl is acceptable. Remark 6. The size of path flow vector f is quite large and Algorithm 4 requires a repeated solution of the quadratic programming problem to perform the projection. Fortunately, the cost of the quadratic programming problem can be reduced because the feasible region and objective function are both decomposable with respect to assignment interval and OD pairs. Moreover, this algorithm does not need to calculate the kernel matrix at each iteration because the kernel matrix in the (k + 1)-th iteration can be viewed as Kk k (·,fk + 1) . This property reduces the number of kernel operations. k (·,fk + 1)T k (fk + 1, fk + 1) Pre-computation of kernel matrices.. When training a kernel-based model, pre-computation of the kernel operators could avoid repeated on-the-fly computation of kernel matrices and kernel vectors. This trick is widely used in machine learning toolkits, for example, scikit-learn (Pedregosa et al., 2011) and LIBSVM (Chang and Lin, 2011). We may observe that the kernel matrices and kernel vectors for paths that belong to the same assignment interval are the same. Therefore, the computation time could be reduced by large if we compute the kernel matrices and vectors for each assignment interval in advance and save them in memory. We then fetch them when use it. We note that the proposed scheme does not exert any extra computational time because the pre-computation scheme is enforced within the solution algorithm. The details of the algorithm is shown in Algorithm 5, where the highlighted lines show the improvements. Algorithm 5. Revised iterative update algorithm

10

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

5. Numerical experiments The aim of this section is to test the performance of the RSM-DUE in terms of computational efficiency and approximation accuracy relative to the exact DUE solution. For fair comparison, all the DUE solvers were developed using the same programming language and tested under the same environment. We implement two DNL models, namely the cell transmission model and the reservoir-based DNL model as benchmarks. Both are capable of capturing physical queuing and spillback. We develop a Python library based on the surrogate model and solution algorithms. The Python library depends on pyDOE for sampling and scipy . optimize for solving the quadratic programming problem. The workflow of our Python library follows the solution framework in Fig. 3. All experiments were performed on a Windows system PC with 2.80 GHz CPU and 8 GB RAM. 5.1. The Nguyen network with 25 assignment intervals We use the network shown in Fig. 4, which is also used in Ge and Fukuda (2019). It is a customized version of the Nguyen network (Nguyen, 1984; Nie and Zhang, 2007). The test network has 13 nodes and 19 links. Links were manually clustered into three reservoirs based on their design properties; see Fig. 4(a). More systematic methods such as graph partitioning and clustering may be adopted for large networks (Ji and Geroliminis, 2012). For simplicity, we assume that links in the same reservoir follow the same fundamental diagram and the free flow speed in Reservoirs 1, 2, and 3 are 60, 45, and 40 mph, respectively. The boundary flow of each link is 1,800. Therefore, it is possible that a bottleneck will appear around nodes 6, 9, or 11. The path segments of each reservoir form the reduced network. The MFD of each reservoir is fitted by nonparametric regression. The boundary flow of each path segment is calculated by a combination of maximum concurrent flow problem and maximum flow problem. We consider four OD pairs; namely 1–2, 1–3, 4–2, and 4–3. There are 14 feasible paths in the reduced network and 30 feasible paths in the original network. 25 assignment intervals are considered in these experiments. The travel demands of OD pairs 1–2 and 1–3 start at 900 vehicles per hour (vph) in the first 10 intervals, increase to 4,000 vph in the next five intervals, and then decrease to 900 vph in the remaining time. The OD travel demands of 4–2 and 4–3 are 60% of those in 1–2 and 1–3. The following scenarios are considered in the experiment.

• Scenario I: DUE on the original network. The path delay operator is evaluated via the CTM-based DNL. The dimension of the path flow vector is 750 (= 25 × 30 ). This scenario serves as a benchmark for the first strategy (reservoir-based modeling). • Scenario II: DUE on the original network with a surrogate delay operator without adaptive sampling. The dimension of the • • •

path flow vector is 750. We draw three initial datasets with respectively (a) 50, (b) 100, and (c) 200 sample points for training the surrogate model. Scenario III: DUE on the original network with a surrogate delay operator with adaptive sampling. The dimension of the path flow vector is 750. We use the same initial datasets as in Scenario II to train the surrogate model. Scenario IV: DUE on the reduced network with a reservoir-based DNL. The dimension of the path flow vector is 350 (= 25 × 14 ). This serves as the benchmark of the second strategy (surrogate modeling). Scenario V: DUE on the reduced network with a surrogate delay operator without adaptive sampling. The dimension of the path flow vector is 350. We draw three initial datasets with respectively (a) 50, (b) 100, and (c) 200 sample points to train the

Table 2 Summary of tests for the Nyugen network with 25 assignment intervals (AS: adaptive sampling; BM: benchmark). Run time is the sum of training and testing times. Scenarios

# of samples

I

RMSE (OD cost)

MAPE(%) (OD cost)

RMSE (path cost)

MAPE(%) (path cost)

# AS

Run time (s)

BM

BM

BM (II,III)

BM (II,III)



2158

(a) II (b) (c)

50 100 200

2.992 2.674 2.514

0.739 0.664 0.627

2.875 2.801 2.666

0.716 0.729 0.682

– – –

159 349 1277

(a) III(b) (c)

50 100 200

1.839 2.025 0.928

0.486 0.492 0.172

2.698 2.410 1.195

0.674 0.600 0.220

2 2 1

278 614 1965

2.969

0.739

BM(V,VI)

BM(V,VI)



1420

IV (a) V (b) (c)

50 100 200

2.943 2.129 2.627

0.735 0.551 0.707

4.135 3.444 3.205

0.834 0.788 0.716

– – –

97 162 743

(a) VI(b) (c)

50 100 200

2.616 2.129 2.957

0.708 0.551 0.839

2.889 3.444 2.741

0.734 0.788 0.643

4 0 3

363 162 2218

11

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.



surrogate model. Scenario VI: DUE on the reduced network with the surrogate delay operator with adaptive sampling. The dimension of the path flow vector is 350. We use the same datasets as in Scenario V to train the surrogate model.

As noted in Section 3.3, because of the non-monotonicity of the path delay operator, a DUE problem may have multiple equilibria; nevertheless, their path costs should be the same. For this reason, we focus on the path delays instead of path flow patterns in the following presentation. In the numerical tests, we set the termination criterion of Algorithm 4 to be

||fk + 1

fk ||

(12)

0.1,

and the termination criteria of Algorithms 3 and 5 to be

|| ( f )

( f )||

|| ( f )||

0.01,

(13)

which means that the gap between model-generated path delays and path delays at the exact equilibrium is less than 1%. We use the root mean square error (RMSE) and mean average percentage error (MAPE) to measure the approximation accuracy of the surrogate models, which are defined as n

( i (f ) RMSE =

i ( f ))

i=1

n

2

,

MAPE =

1 n

n i=1

i (f )

i (f ) i (f )

,

where n is the number of observations, are used to measure the absolute or relative errors. In our case the quantity of interest is either the vector of cost per OD (OD cost) or cost per path (path cost). The results for each scenario are summarized in Table 2. It is seen that the proposed models can reduce the computational time significantly and yield good approximation accuracy compared to the benchmark case. After finite runs of Algorithms 3 and 4, all three tests in Scenarios III and VI reached convergence. It is observed that, for an error threshold of 1%, the surrogate-based network loading is sufficient to approximate the OD/path costs for all tests. In the tests II(a), V(a) and V(b), the computational time was reduced by 90%. However, the computational burden of scenario III(c) and VI(c) are quite significant and comparable to that of the benchmarks. Measures to fix this problem will be discussed later in this section. We show the convergence of Algorithms 3 and 4 in Fig. 5. Since Scenarios II(a,b,c) and V(a,b,c) do not require adaptive sampling and only Algorithm 4 is adopted, we pick the benchmark tests and II(c) and V(c) to show the convergence of the projection algorithm. The convergence associated with these tests are shown in Fig. 5(a), from which we observe oscillations in the gap, followed by descending trend indicating convergence to equilibria. The oscillations are caused by the choice of small initial step size in the projection algorithm. Scenarios III(a,b,c) and VI(a,b,c) require the adaptive sampling procedure, thus we pick these scenarios to show the convergence of Algorithm 3. The convergence of approximation gaps in these tests is illustrated in Fig. 5(b). In Figs. 6 and 7, we illustrate the equilibrated path flows and delays of OD pair 1–2 in Scenarios I, II(c), III(c) (Fig. 6) and and IV, I (c), VI(c) (Fig. 7). The other ODs are omitted here for brevity. It is observed that the DUE condition is satisfied for all these scenarios. That is, all utilized paths for OD-pair 1–2 have the same, minimum travel cost within each assignment interval. The path costs in all these figures are calculated by an extra run of the exact DNL model with the path flow pattern obtained from the surrogate-based DUE problem. We observe that the path flows calculated by the surrogate model and the path costs evaluated by the exact DNL may agree

Fig. 5. The equilibrium gap of Scenario I-III and approximation gap of Scenario IV. 12

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Fig. 6. DUE solutions for OD pair 1–2. (a) Scenario I path delays; (b) Scenario II(c) path delays; (c) Scenario III(c) path delays; (d) Scenario I flows; (e) Scenario II(c) path flow; (f) Scenario III(c) flows.

more if the adaptive sampling is applied, such as in Fig. 6(c) and (f), as opposed to Fig. 6(b) and (e). When solving the two-stage surrogate-based DUE problem, we notice that a larger initial dataset facilitates the quick convergence in general, as shown in Fig. 5(b) and Table 2. When there are only a handful of sample points available, the approximation gap could be very large because the projection algorithm attempts to evaluate the path cost with the training samples when searching for the next path flow. Therefore, more adaptive sampling steps are required to generate an acceptable approximation of the path delay operator for smaller initial datasets. Nonetheless, there are exceptions. For example, samples drawn from the local neighborhood of the DUE solution. However, the effect of the initial sample size on the approximation gap is not as significant as the convergence speed. Take Scenarios V(a)-(c) for example. As shown in Table 2, the surrogate model trained with 100 samples achieves smaller approximation gap than that of 200 samples. This result is not surprising because the WKRR model serves as a local approximation of the path delay. The initial training data are scattered in the feasible region; thus, the fineness of the trained model is sensitive to the selection of samples. Instead, the adaptive sampling scheme is more effective in improving the approximation gap. In the two-stage model, an extra data point is drawn at the equilibrium and the surrogate model is re-trained with the new data. Hence, the approximation gap can be 13

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Fig. 7. DUE solutions for OD pair 1–2. (a) Scenario IV path delays; (b) Scenario V(c) path delays; (c) Scenario VI(c) path delays; (d) Scenario IV path flows; (e) Scenario V(c) path flows; (f) Scenario VI(c) path flows.

reduced for the next run of the DUE problem. This phenomenon could be observed by comparing the results of Scenarios II and III or Scenarios V and VI. Fig. 5(b) depicts the change of approximation gap of the surrogate model as the adaptive sampling procedure carries forward. The approximation gaps drop from over 3% to less than 1% with this scheme in the three tests of Scenario III or VI. However, the adaptive sampling scheme is realized by solving the VIP and evaluating an extra datum, which can be computationally expensive. Therefore, experiments should be carefully designed when determining the initial sample size. 5.2. The Nyugen network with 100 assignment intervals This section presents the results for the customized Nyugen network with 100 assignment intervals (4 times of the small problem presented in the previous section). This experiment helps to explore the possibility of applying the two proposed schemes to largescale problems. Due to the growth of the number of paths with the size of network, several heuristics have been used to reduce the dimension of variables in past studies, for example, the k-minimum cost heuristics (Mahmassani, 2001) and the column generation heuristics (Nie and Zhang, 2007). We filtered the number of paths by the top-k for experiments in this section. Initially, there are 3000 paths in the 14

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Table 3 Summary of tests on the Nyugen network (AS: adaptive sampling; BM: benchmark). Scenarios

# of samples

DUE-CTM

RMSE (OD cost)

MAPE(%) (OD cost)

RMSE (path cost)

MAPE(%) (path cost)

# AS

Run time (s)

BM

BM

BM (CTM)

BM (CTM)



2030

DUE-CTM-WKRR DUE-CTM-WKRR DUE-CTM-WKRR

50 100 150

5.07 3.22 3.17

1.68 0.89 0.84

6.65 5.49 4.49

1.96 1.68 1.53

– – –

347 704 1460

DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS

50 100 150

2.32 2.87 1.88

0.57 0.64 0.42

3.98 4.76 3.99

1.41 1.58 1.40

20 11 8

4514 4523 9535

DUE-MFD



5.56

1.43

BM(MFD)

BM(MFD)



1450

DUE-MFD-WKRR DUE-MFD-WKRR DUE-MFD-WKRR

50 100 150

5.99 5.97 5.25

1.47 1.57 1.50

13.73 11.30 9.82

4.02 3.25 2.84

– – –

339 700 1871

DUE-MFD-WKRR-AS DUE-MFD-WKRR-AS DUE-MFD-WKRR-AS

50 100 150

6.17 6.13 6.13

1.64 1.69 1.79

9.63 9.49 9.72

2.86 1.58 2.88

9 7 8

1822 2915 13,171

physical network and 1400 paths in the reduced network. Based on our prior knowledge of the path choice of peak hours in the first experiment, we use k = 3 in this experiment. This results in 1200 variables in the benchmark and the surrogate models. We find that this filtering heuristic is useful to reduce computation time. Detailed discussion on the effect of the top-k path heuristics will be presented later. The results of the first experiment are reported in Table 3, which shows similar trends in relation to the proposed schemes and benchmarks. However, the proposed schemes may not always be effective in all tests. We observe that the computational times of the surrogate model could be longer than the benchmarks in some extreme cases, such as DUE-MFD-WKRR-AS 3 and DUE-CTM-WKRR-AS 4 with 150 initial training points. This issue occurs because we generated the same number of paths for each experiment but the repeated computation of kernel matrices and vectors could not be reduces. In other words, the CTM-based surrogate model and the MFD-based model have the same dimensions of variables. The dimension of kernel matrices and vectors of both surrogate models are also the same. However, the reservoir-based DUE model may occasionally requires more iterations to get the low-level model to converge.5 Hence, higher computation time is required for the reservoir-based surrogate model in some tests. In practice, smaller path set may be used for the MFD-based scheme because of the aggregation of the network. Meanwhile, we may also reduce the repeated computation of the kernel matrices using Algorithm 5. 5.3. The Sioux-Falls network The third experiment was conducted on the Sioux-Falls network, whose properties are detailed in Ge and Fukuda (2019). We consider 15 OD pairs and 19 assignment intervals. We filtered the path set by top 3 for each OD pair, which results in 855 variables in the physical network and 779 variables in the reduced network. Travel demands change with time and have been set as given. Multiple surrogate models and kernels were tested in this experiment. We only report the results of the CTM-based surrogate model with 100 initial samples. For simplicity and fairness, both the polynomial kernel and RBF kernel have one free parameter, whose ranges are [1, 10] and [1, 10, 000], respectively. For convenience, we only test 10 candidate values from these ranges in each case. They are drawn in fixed scale for the polynomial kernel and in log scale for the RBF kernel, respectively. The results are reported in Table 4, which suggests that all surrogate models can serve as acceptable approximation to the original model. The WKRR model with dimension reduction has the best performance for this experiment in terms of RMSE and MAPE with respect to the benchmark model. Meanwhile, dimension reduction in the features could save considerable computational time without affecting the calculated results. In addition, the RBF kernel is not computationally economic in our problem setting considering its extremely long running time. Besides, it performs poorly compared with the polynomial kernel. We may observe that the computation time grows drastically with the increased number of initial samples. Meanwhile, most surrogate models could not improve the computational time. Clearly, the naive implementation of the surrogate model is not applicable to larger-scale problems. The excessive computational times of the surrogate models are due to (1) the repeated computation of the kernel matrices and (2) strict stop criteria for the approximation gap. These issues will be addressed in the following sections. 3

DUE problem with data drawn from reservoir-based simulation model as training samples and WKRR as the surrogate model DUE problem with data drawn from CTM as training samples and WKRR as the surrogate model 5 Though fine initial samples may reduce the number of iteration of both the upper and lower level of models, the number of iterations to reach the convergence of both models could not be guaranteed. 4

15

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Table 4 Summary of tests for the Sioux-Falls network (AS: adaptive sampling; BM: benchmark; DR: dimensionality reduction). Scenario

Kernel

Features (use DR?)

# of samples

RMSE (OD cost)

MAPE(%) (OD cost)

RMSE (path cost)

MAPE(%) (path cost)

# AS

Run time (s)

DUE-CTM







BM

BM

BM

BM

DUR-CTM-RR DUE-CTM-WRR DUE-CTM-KRR DUE-CTM-WKRR

Linear Linear Poly Poly

Y Y Y Y

100 100 100 100

1.26 1.30 1.28 1.24

0.67 0.66 0.67 0.65

6.83 6.83 6.83 6.83

3.27 3.26 3.27 3.26

9 12 9 11

1287 1589 1797 2118

DUE-CTM-RR DUE-CTM-WRR DUE-CTM-KRR DUE-CTM-WKRR

Linear Linear Poly Poly

N N N N

100 100 100 100

1.26 1.29 1.24 1.30

0.67 0.68 0.65 0.68

6.83 6.83 6.83 6.83

3.26 3.27 3.26 3.27

7 10 4 5

959 3443 2515 3543

DUE-CTM-KRR DUE-CTM-WKRR

rbf rbf

Y Y

100 100

24.53 7.25

1.65 4.39

8.92 6.49

5.39 4.07

>20 >20

35302 36885

DUE-MFD







19.30

12.59







2028

2683

Table 5 Test results on the Nyugen network with the pre-computation algorithm, and time saving compared to the cases without pre-computation. Scenario

# of samples

DUE-CTM

RMSE (OD cost)

MAPE(%) (OD cost)

RMSE (path cost)

MAPE(%) (path cost)

# AS

Run time (s)

Time saving

BM

BM

BM (CTM)

BM (CTM)



2030



DUE-CTM-WKRR DUE-CTM-WKRR DUE-CTM-WKRR

50 100 150

5.33 3.26 2.96

1.72 0.88 0.79

10.19 9.62 9.98

2.73 2.45 2.48

– – –

314 554 589

9.5% 21.3% 59.7%

DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS

50 100 150

1.85 2.50 1.91

0.52 0.64 0.43

9.37 8.71 8.69

2.18 2.19 2.18

6 3 3

961 764 861

78.7% 83.1% 91.0%

DUE-MFD

~

7.59

2.36

BM (MFD)

BM (MFD)



1450



DUE-MFD-WKRR DUE-MFD-WKRR DUE-MFD-WKRR

50 100 150

6.90 6.77 6.23

1.94 1.71 1.49

12.36 12.21 11.50

3.51 3.39 3.21

– – –

277 410 564

18.4% 41.5% 69.9%

DUE-MFD-WKRR-AS DUE-MFD-WKRR-AS DUE-MFD-WKRR-AS

50 100 150

5.48 5.04 6.30

1.37 1.26 1.54

11.56 10.61 12.32

3.13 2.99 3.29

7 9 7

767 1077 1338

57.9% 63.1% 89.8%

5.4. Boosting computational efficiency In this section, we address the potential issue of high computational time when the proposed models and algorithms are applied to large-scale problems. This is investigated through the pre-computation scheme, selection of top-k paths, and termination criteria. Pre-computation scheme. To address the issue of computational burden for large-scale problems, we implement the proposed precomputation scheme based on Algorithms 4 and 5 using the same test settings as the previous sections. The results are shown in Table 5. Comparing with the results in Table 3, we find that the computational times for almost all scenarios are reduced substantially while maintaining the same level of RMSE/MAPE. The amount of saving increases with the number of initial samples, and can reach up to 78.7% (50 samples), 83.1% (100 samples) and 91.0% (150 samples). Nevertheless, the effectiveness of the pre-computation scheme may be limited with adaptive sampling (AS), as the pre-computation of kernel matrices cannot be done without knowing the samples in advance, which in the AS case are obtained on-the-fly. We note this as a limitation of the pre-computation scheme. For potential solutions, the need for AS is highly related to the required approximation precision (in our case 1%), and can be substantially reduced by relaxing such requirements. For example, all the test scenarios achieve the approximation precision of 3% in the second iteration, as shown in Table 7. Thus, with limited instances of AS, the benefits of pre-computation can be amplified. We also tested the pre-computation scheme on the Sioux Falls network with results presented in Table 6. In this experiment, we used a different training sample than in Table 4, where the sampled path flows are more sparsely distributed in the feasible region of OD demands. We expect that surrogate models using local weights would outperform the ones without weighting scheme. The results 16

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Table 6 Test results on the Sioux Falls network with the pre-computation algorithm, and time saving compared to the cases without pre-computation. Scenarios

Kernel

Features (use DR?)

# of samples

RMSE (OD cost)

MAPE(%) (OD cost)

RMSE (path cost)

MAPE(%) (path cost)

DUE-CTM







BM

BM

BM

BM

DUE-CTM-RR DUE-CTM-WRR DUE-CTM-KRR DUE-CTM-WKRR

Linear Linear Poly Poly

Y Y Y Y

100 100 100 100

>100 1.46 >100 1.47

>100 0.76 >100 0.76

>100 8.08 >100 8.08

>100 4.62 >100 4.62

DUE-CTM-RR DUE-CTM-WRR DUE-CTM-KRR DUE-CTM-WKRR

Linear Linear Poly Poly

N N N N

100 100 100 100

>100 1.42 >100 1.42

>100 0.71 >100 0.71

>100 8.02 >100 8.02

>100 5.31 >100 5.31

# AS

Run time (s)

Time saving

2683



21 3 21 3

1002 401 1022 399

22.1% 74.8% 43.1% 81.2%

21 3 21 3

1256 398 1652 395

−31.0% 88.4% 34.3% 88.9%

Table 7 Model performances on the Nyugen network (with adaptive sampling and pre-computation) with different sample sizes and k value. Scenario

# of samples

top-k

# of variable

Run time (s)

# of AS to converge

Run time (s) (per AS)

DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS

50 100 150

5 5 5

2000 2000 2000

1043 1640 1750

6 7 5

174 234 350

DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS

50 100 150

4 4 4

1600 1600 1600

667 917 1226

4 4 4

167 229 307

DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS

50 100 150

3 3 3

1200 1200 1200

809 764 861

6 3 3

135 255 287

DUE-MFD-WKRR-AS DUE-MFD-WKRR-AS DUE-MFD-WKRR-AS

50 100 150

3 3 3

1200 1200 1200

766 1105 1338

7 9 7

109 123 191

match our expectation well. We observe that the [RR] and [KRR] models fail to regress the surface of path costs in this experiment (with very large RMSEs and MAPEs). In contrast, schemes that considers local weights such as [WRR] and [WKRR] achieves much smaller and reasonable RMSE and MAPE. The amount of computational time saving through the pre-computation scheme can reach up to 88.4% (linear kernel) and 88.9% (polynomial kernel). Overall, Tables 5,6 suggest that, compared with the benchmark cases (DUE-CTM), the proposed models with pre-computation scheme can achieve a computational saving of up to 84.5% (Nyugen) and 85.3% (Sioux Falls). This means that the proposed methods have a significant potential to increase the computational efficiency of DUE solution procedure on large-scale networks. Top-k paths. To elaborate the effect of top-k paths on the computational efficiency, we consider different choices of k in Table 7, which shows the average computational time of surrogate-based DUE model per adaptive sampling procedure (i.e. an iteration of the upper-level and lower level model). It is verified that the selection of k determines the number of variables and hence affects the computational times of the lower-level model. As shown in Table 7, the computational times decrease along with the number of variables in most tests. However, minor exception exists, due to the number of iterations to converge in the lower-level model being affected by the generated path set. Overall, the run time is affected by the value of k, but the trend is insignificant according to our test results. Stopping criteria. Another issue that affects the usage of the surrogate model is to determine a reasonable stop criterion for the approximation gap. We may use the results in Table 8 to show the effect of different thresholds on the number of adaptive sampling and hence the computational time. If the threshold is relaxed to be 2%, we see that all six tests converge after no more than 4 AS iterations. Fig. 8 further shows the trend of the approximation gap with 9 adaptive sampling procedures in the case DUE-MFD-WKRRAS (100 samples). It is seen that only 2 AS iterations are needed to reach an approximation gap within 2%, and 7 more to reach a gap of 1%. Since adaptive sampling requires excessive computation time, the trade-off between computational feasibility and accuracy of approximation is critical in larger-scale problems. Similarly, the lower-level DUE problem is also solved with a termination threshold, as seen in Eq. (12), which affects the solution times of the lower-level problem. By relaxing this threshold, the computational times decrease accordingly, as shown in Table 9. 17

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Table 8 Approximation gap after different numbers of adaptive sampling procedures. ’-’ means that the algorithm has converged (with a threshold of 1%). Scenarios

# of samples

AS #1

AS #2

AS #3

AS #4

DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS

50 100 150

2.61% 1.98% 1.38%

2.61% 2.15% 2.65%

1.41% 0.73% 0.77%

1.58% – –

DUE-MFD-WKRR-AS DUE-MFD-WKRR-AS DUE-MFD-WKRR-AS

50 100 150

1.98% 4.52% 4.55%

3.86% 1.99% 2.93%

2.17% 3.74% 2.07%

1.88% 2.51% 1.59%

Fig. 8. Convergence of the adaptive sampling procedure (DUE-MFD-WKRR-AS, 100 initial samples). Table 9 Computational times of the lower-level DUE solver with different termination thresholds . Scenarios

# of samples

DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS DUE-CTM-WKRR-AS

50 100 150

348 s 687 s 770 s

285 s 477 s 687 s

267 s 433 s 637 s

DUE-MFD-WKRR-AS DUE-MFD-WKRR-AS DUE-MFD-WKRR-AS

50 100 150

347 s 467 s 678 s

281 s 431 s 655 s

259 s 421 s 606 s

= 0.1

= 0.2

= 0.3

6. Discussion and conclusions This study proposed a surrogate-based solution framework for the dynamic user equilibrium problem. We introduce the weighted ridge regression (WRR) and weighted kernel ridge regression (WKRR) problems to the analytical approximation of the path delay operator and then apply the kernel trick to train the model. Based on the surrogate model, we formulate the DUE as a variational inequality problem. To maintain the fitness of the surrogate model while reaching the equilibrium path flow, we propose a two-stage model, which considers both objectives simultaneously. We further prove the existence of the DUE solution with the surrogate model and explored the benefits of the proposed method. A customized iterative update algorithm is developed to solve the model, followed by a pre-computation scheme that significantly boost the computational efficiency. We tested various scenarios to evaluate the proposed models and algorithms. The results show that the proposed methods can achieve a computational saving of over 80% compared with the exact DUEs, while maintaining a low level of approximation error (with MAPE below 6%). 18

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

It is concluded from the numerical experiments that the WKRR could serve as a surrogate for the path delay operator because the DUE condition could be attained with small approximation gap. The MAPEs of path delays under equilibrium are acceptable in all tested cases (below 6%). While the computational time depends on the network property and demand profile, the computational efficiency of the proposed methods is consistent across all the test scenarios. In particular, the naive algorithm (Algorithm 3) without adaptive sampling could save the computational time by up to 83% (Nyugen network). However, the naive algorithm calculates the kernel matrices on the fly and could be time-consuming. We avoid the need to repeatedly calculate these matrices through a precomputation scheme that stores all these matrices. This allows us to save the computational time especially when there are too many assignment intervals. We observed significant saving in the computation time when applying the pre-computation scheme, by up to 84.5% (Nyugen) and 85.3% (Soux Falls). Another issue related to the computational time is the approximation threshold used in adaptive sampling. The choice of such a threshold has a significant impact on the required number of AS. For example, all the test scenarios achieve the approximation precision of 2% within 4 iterations, and 3% within 2. Therefore, this suggests a practical way to flexibly trade approximation accuracy for computational efficiency in surrogate modeling. Furthermore, the lower-level DUE problem is also solved with a termination threshold, as seen in Eq. (12), which affects the solution times of the lower-level problem. By relaxing this threshold, the computational times decrease accordingly, as shown in Table 9. The performance of the initially trained WKRR is sensitive to the experiment design. In this numerical experiment, we used the LHS sampling with projection to draw initial training samples from the feasible region. Although the data were uniformly drawn under the LHS scheme, the projection operation would break the uniformity, as shown in Fang et al. (2005). We introduced the weighting scheme and adaptive sampling to reduce the sensitivity of initial sample. There are still unresolved issues when applying the surrogate model to DUE problems. A fundamental problem worth noting is the continuity and monotonicity of the path delay operator. Szeto (2003) and Han et al. (2016) suggest that discontinuity may occur due to queue spillback. Though our model could approximate the path delay by high-order features, the discontinuity would inevitably affect the approximation gap and even the existence of the DUE solution. Better understanding of the continuity of exact path delay could help us make wiser choices on whether to adopt a surrogate model. Meanwhile, the monotonicity of the exact path delay operator is not guaranteed. Luckily, the surrogate model does not depend on the monotonicity of the physical system. It could provide a local approximation to the delay operator at the test data. When there is an extra requirement that the path delay operator should be monotonic in some cases, we could easily verify whether the WKRR is applicable to such a situation. The following research is underway to extend the surrogate model to wider applications of the DTA problem: (1) integrating the surrogate model with the DSO problem; (2) applying the surrogate model to inverse problems such as the network design problem; and (3) incorporating the surrogate model with real-world data. Acknowledgement This study was supported by the Committee on Advanced Road Technology (CART), Ministry of Land, Infrastructure, Transport, and Tourism, Japan, # 28-1 and # 30-3. References Akamatsu, T., 2001. An efficient algorithm for dynamic traffic equilibrium assignment with queues. Transp. Sci. 35, 389–404. Alaoui, A., Mahoney, M.W., 2015. Fast randomized kernel ridge regression with statistical guarantees. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 28. Curran Associates Inc, pp. 775–783. Ban, X.J., Pang, J.S., Liu, H.X., Ma, R., 2012. Modeling and solving continuous-time instantaneous dynamic user equilibria: A differential complementarity systems approach. Transp. Res. Part B: Methodological 46, 389–408. Carey, M., Ge, Y., McCartney, M., 2003. A whole-link travel-time model with desirable properties. Transp. Sci. 37, 83–96. Chang, C.C., Lin, C.J., 2011. Libsvm: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2, 27. Chen, X.M., Zhang, L., He, X., Xiong, C., Li, Z., 2013. Surrogate-Based optimization of expensive-to-evaluate objective for optimal highway toll charges in transportation network. Comput-Aided Civil Infrastruct. Eng. 29, 359–381. Chong, L., Osorio, C., 2017. A simulation-based optimization algorithm for dynamic large-scale urban transportation problems. Transp. Sci. 52, 637–656. Ciuffo, B., Casas, J., Montanino, M., Perarnau, J., Punzo, V., 2013. Gaussian process metamodels for sensitivity analysis of traffic simulation models: case study of aimsun mesoscopic model. Transp. Res. Rec. 2390, 87–98. Claudel, C.G., Bayen, A.M., 2010a. Lax–hopf based incorporation of internal boundary conditions into hamilton–jacobi equation. Part i: Theory. IEEE Trans. Autom. Control 55, 1142–1157. Claudel, C.G., Bayen, A.M., 2010b. Lax–hopf based incorporation of internal boundary conditions into hamilton-jacobi equation. Part ii: Computational methods. IEEE Trans. Autom. Control 55, 1158–1174. Daganzo, C.F., 1994. The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory. Transp. Res. Part B: Methodological 28, 269–287. Daganzo, C.F., 1995. The cell transmission model, Part II: Network traffic. Transp. Res. Part B: Methodological 29, 79–93. Daganzo, C.F., 2005a. A variational formulation of kinematic waves: basic theory and complex boundary conditions. Transp. Res. Part B: Methodological 39, 187–196. Daganzo, C.F., 2005b. A variational formulation of kinematic waves: Solution methods. Transp. Res. Part B: Methodological 39, 934–950. Daganzo, C.F., 2007. Urban gridlock: Macroscopic modeling and mitigation approaches. Transp. Res. Part B: Methodological 41, 49–62. Facchinei, F., Pang, J.S., 2007. Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer Science & Business Media. Fang, K.T., Li, R., Sudjianto, A., 2005. Design and Modeling for Computer Experiments. CRC Press. Friesz, T.L., 2010. Dynamic optimization and differential games, vol. 135 Springer Science & Business Media. Friesz, T.L., Bernstein, D., Smith, T.E., Tobin, R.L., Wie, B.W., 1993. A variational inequality formulation of the dynamic network user equilibrium problem. Oper. Res. 41, 179–191. Friesz, T.L., Bernstein, D., Suo, Z., Tobin, R.L., 2001. Dynamic network user equilibrium with state-dependent time lags. Networks Spatial Econ. 1, 319–347. Friesz, T.L., Han, K., Neto, P.A., Meimand, A., Yao, T., 2013. Dynamic user equilibrium based on a hydrodynamic model. Transp. Res. Part B: Methodological 47, 102–126. Friesz, T.L., Luque, J., Tobin, R.L., Wie, B.W., 1989. Dynamic network traffic assignment considered as a continuous time optimal control problem. Oper. Res. 37, 893–901. Friesz, T.L., Mookherjee, R., 2006. Solving the dynamic network user equilibrium problem with state-dependent time shifts. Transp. Res. Part B: Methodological 40, 207–229.

19

Transportation Research Part C xxx (xxxx) xxx–xxx

Q. Ge, et al.

Ge, Q., Fukuda, D., 2019. A macroscopic dynamic network loading model for multiple-reservoir system. Transp. Res. Part B: Methodological 126, 502–527. https://doi.org/10. 1016/j.trb.2018.06.008. Geroliminis, N., Daganzo, C.F., 2008. Existence of urban-scale macroscopic fundamental diagrams: some experimental findings. Transp. Res. Part B: Methodological 42, 759–770. Gosavi, A., 2015. Simulation-Based Optimization, 2nd Edition. Springer. Han, K., Friesz, T.L., 2017. Continuity of the effective delay operator for networks based on the link delay model. Networks Spatial Econ. 1–16. Han, K., Friesz, T.L., Yao, T., 2013. A partial differential equation formulation of Vickrey’s bottleneck model, Part I: Methodology and theoretical analysis. Transp. Res. Part B: Methodological 49, 55–74. Han, K., Friesz, T.L., Yao, T., 2013. A partial differential equation formulation of Vickrey’s bottleneck model, Part II: Numerical analysis and computation. Transp. Res. Part B: Methodological 49, 75–93. Han, K., Friesz, T.L., Yao, T., 2013. Existence of simultaneous route and departure choice dynamic user equilibrium. Transp. Res. Part B: Methodological 53, 17–30. Han, K., Piccoli, B., Friesz, T.L., 2016. Continuity of the path delay operator for dynamic network loading with spillback. Transp. Res. Part B: Methodological 92, 211–233. Haworth, J., Shawe-Taylor, J., Cheng, T., Wang, J., 2014. Local online kernel ridge regression for forecasting of urban travel times. Transp. Res. Part C: Emerg. Technol. 46, 151–178. Haykin, S., 2009. Neural Networks and Learning Machines, vol. 3 Pearson Upper Saddle River, NJ, USA. He, B., Liao, L.Z., 2002. Improvements of some projection methods for monotone nonlinear variational inequalities. J. Optim. Theory Appl. 112, 111–128. He, X., Chen, X., Xiong, C., Zhu, Z., Zhang, L., 2016. Optimal time-varying pricing for toll roads under multiple objectives: a simulation-based optimization approach. Transp. Sci. 51, 412–426. Himpe, W., Corthout, R., Tamp re, M.J.C., 2016. An efficient iterative link transmission model. Transp. Res. Part B: Methodological 92, 170–190. Huang, H.J., Lam, W.H., 2002. Modeling and solving the dynamic user equilibrium route and departure time choice problem in network with queues. Transp. Res. Part B: Methodological 36, 253–273. Iryo, T., 2011. Multiple equilibria in a dynamic traffic network. Transp. Res. Part B: Methodological 45, 867–879. Iryo, T., 2013. Properties of dynamic user equilibrium solution: existence, uniqueness, stability, and robust solution methodology. Transportmetrica B: Transp. Dyn. 1, 52–67. Ji, Y., Geroliminis, N., 2012. On the spatial partitioning of urban transportation networks. Transp. Res. Part B: Methodological 46, 1639–1656. Jin, W.L., 2015. Continuous formulations and analytical properties of the link transmission model. Transp. Res. Part B: Methodological 74, 88–103. Khobotov, E.N., 1987. Modification of the extra-gradient method for solving variational inequalities and certain optimization problems. USSR Comput. Math. Math. Phys. 27, 120–127. Kuwahara, M., Akamatsu, T., 1993. Dynamic equilibrium assignment with queues for a one-to-many OD pattern. In: Transportation and Traffic Theory (the 12th ISTTT), pp. 185–204. Kuwahara, M., Akamatsu, T., 1997. Decomposition of the reactive dynamic assignments with queues for a many-to-many origin-destination pattern. Transp. Res. Part B: Methodological 31, 1–10. Laval, J.A., Leclercq, L., Chiabaut, L., 2017. Minimal parameter formulations dynamic traffic assignment using the macroscopic fundamental diagram: Freeway vs city streets user equilibrium revisited. In: Hani Mahmassani, Y.M.N., Smilowitz, K. (Eds.), 22nd International Symposium on Transportation and Traffic Theory (ISTTT22). vol. 23, pp. 517–530. Leclercq, L., Parzani, C., Knoop, V.L., Amourette, J., Hoogendoorn, S.P., 2015. Macroscopic traffic dynamics with heterogeneous route patterns. Transp. Res. Part C: Emerg. Technol. 59, 292–307. Lighthill, M.J., Whitham, G.B., 1955. On kinematic waves. ii. A theory of traffic flow on long crowded roads. Proc. R. Soc. Lond. A 229, 317–345. Lo, H.K., Szeto, W.Y., 2002. A cell-based variational inequality formulation of the dynamic user optimal assignment problem. Transp. Res. Part B: Methodological 36, 421–443. Luss, R., Rosset, S., Shahar, M., et al., 2012. Efficient regularized isotonic regression with application to gene–gene interaction search. Ann. Appl. Stat. 6, 253–283. Mahmassani, H., Herman, R., 1984. Dynamic user equilibrium departure time and route choice on idealized traffic arterials. Transp. Sci. 18, 362–384. Mahmassani, H.S., 2001. Dynamic network traffic assignment and simulation methodology for advanced system management applications. Networks Spatial Econ. 1, 267–292. Merchant, D.K., Nemhauser, G.L., 1978. A model and an algorithm for the dynamic traffic assignment problems. Transp. Sci. 12, 183–199. Murphy, K.P., 2012. Machine Learning: A Probabilistic Perspective. MIT press. Nagurney, A., 1998. Network Economics: A Variational Inequality Approach, vol. 10 Springer Science & Business Media. Nguyen, S., 1984. Estimating origin destination matrices from observed flows. In: Florian, M.A. (Ed.), Transportation Planning Models, Proceedings of a course given at the International Center for Transportation Studies (ICTS). Nie, Y.M., 2011. A cell-based Merchant-Nemhauser model for the system optimum dynamic traffic assignment problem. Transp. Res. Part B: Methodological 45, 329–342. Nie, Y.M., Ma, J., Zhang, H.M., 2008. A polymorphic dynamic network loading model. Comput.-Aided Civil Infrastruct. Eng. 23, 86–103. Nie, Y.M., Zhang, H.M., 2007. Solving the dynamic user optimal assignment problem considering queue spillback. Networks Spatial Econ. 10, 49–71. Osorio, C., Bierlaire, M., 2013. A simulation-based optimization framework for urban transportation problems. Oper. Res. 61, 1333–1345. Osorio, C., Chong, L., 2015. A computationally efficient simulation-based optimization algorithm for large-scale urban transportation problems. Transp. Sci. 49, 623–636. Osorio, C., Flötteröd, G., 2014. Capturing dependency among link boundaries in a stochastic dynamic network loading model. Transp. Sci. 49, 420–431. Osorio, C., Flötteröd, G., Bierlaire, M., 2011. Dynamic network loading: a stochastic differentiable model that derives link state distributions. Transp. Res. Part B: Methodological 45, 1410–1423. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al., 2011. Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830. Peeta, S., Ziliaskopoulos, A.K., 2001. Foundations of dynamic traffic assignment: the past, the present and the future. Networks Spatial Econ. 1, 233–265. Ran, B., Hall, R.W., Boyce, D.E., 1996. A link-based variational inequality model for dynamic departure time/route choice. Transp. Res. Part B: Methodological 30, 31–46. Richards, P.I., 1956. Shock waves on the highway. Oper. Res. 4, 42–51. Schölkopf, B., Herbrich, R., Smola, A.J., 2001. A generalized representer theorem. In: International Conference on Computational Learning Theory. Springer, pp. 416–426. Song, W., Han, K., Wang, Y., Friesz, T.L., del Castillo, E., 2017. Statistical metamodeling of dynamic network loading. Transp. Res. Part B: Methodological. Szeto, W.Y., 2003. Dynamic Traffic Assignment: Formulations, Properties, and Extensions. The Hong Kong University of Science and Technology. Szeto, W.Y., Lo, H.K., 2004. A cell-based simultaneous route and departure time choice model with elastic demand. Transp. Res. Part B: Methodological 38, 593–612. Ukkusuri, S.V., Han, L., Doan, K., 2012. Dynamic user equilibrium with a path based cell transmission model for general traffic networks. Transp. Res. Part B: Methodological 46, 1657–1684. Vickrey, W.S., 1969. Congestion theory and transport investment. Am. Econ. Rev. Wardrop, J.G., 1952. Some theoretical aspects of road traffic research. In: Inst Civil Engineers Proc London/UK/. Wei, C., Asakura, Y., Iryo, T., 2014. Formulating the within-day dynamic stochastic traffic assignment problem from a bayesian perspective. Transp. Res. Part B: Methodological 59, 45–57. Wie, B.W., Tobin, R.L., Friesz, T.L., Bernstein, D., 1995. A discrete time, nested cost operator approach to the dynamic network user equilibrium problem. Transp. Sci. 29, 79–92. Wu, J.H., Chen, Y., Florian, M., 1998. The continuous dynamic network loading problem: a mathematical formulation and solution method. Transp. Res. Part B: Methodological 32, 173–187. Xu, Y., Wu, J.H., Florian, M., Marcotte, P., Zhu, D., 1999. Advances in the continuous dynamic network loading problem. Transp. Sci. 33, 341–353. Yildirimoglu, M., Geroliminis, N., 2014. Approximating dynamic equilibrium conditions with macroscopic fundamental diagrams. Transp. Res. Part B: Methodological 70, 186–200. Yildirimoglu, M., Ramezani, M., Geroliminis, N., 2015. Equilibrium analysis and route guidance in large-scale networks with MFD dynamics. Transp. Res. Part C 404–420. Yperman, I., 2007. The link transmission model for dynamic network loading. Zhu, D., Marcotte, P., 2000. On the existence of solutions to the dynamic user equilibrium problem. Transp. Sci. 34, 402–414.

20