Updating origin–destination matrices with aggregated data of GPS traces

Updating origin–destination matrices with aggregated data of GPS traces

Transportation Research Part C 69 (2016) 291–312 Contents lists available at ScienceDirect Transportation Research Part C journal homepage: www.else...

4MB Sizes 0 Downloads 67 Views

Transportation Research Part C 69 (2016) 291–312

Contents lists available at ScienceDirect

Transportation Research Part C journal homepage: www.elsevier.com/locate/trc

Updating origin–destination matrices with aggregated data of GPS traces Qian Ge ⇑, Daisuke Fukuda Department of Civil Engineering, Tokyo Institute of Technology, 2-12-1 O-okayama, Meguro-ku 152-8552, Tokyo, Japan

a r t i c l e

i n f o

Article history: Received 17 August 2015 Received in revised form 30 May 2016 Accepted 2 June 2016

Keywords: Coarse-grained travel data Mobile phone data Travel survey Origin–destination trip matrix estimation Maximum entropy

a b s t r a c t The practice of estimating origin–destination (OD) demand usually requires large-scale travel surveys. To reduce the cost and time spent on surveys, individual trajectory data obtained from mobile devices has been used as an alternative dataset since the last two decades for OD estimation but also constrained in practice in some countries. To estimate OD matrices while protecting privacy, this study uses aggregated data of mobile phone traces to estimate work-related trips. The proposed approach is a sequential updater based on the maximum entropy principle. Trip production and attraction are firstly calculated by a non-linear programming problem followed by a matrix fitting problem to distribute trips to each OD pair. Numerical study shows that updated values are much closer to the synthesize real values than the referred ones. The case study in Tokyo further demonstrates that the proposed updating approach can track the change of travel pattern. Ó 2016 Published by Elsevier Ltd.

1. Introduction 1.1. Background Travel demand, represented in the form of origin–destination (OD) matrices, is indispensable while developing traffic management and control schemes. In the past several decades, new methodologies for estimating OD matrices have been developed along with the availability of travel data. State-of-the-practice in travel demand estimation usually derives OD matrices through data obtained from large-scale travel surveys, while later development in this field has used a variety of new data sources to reduce the cost of data collection and make timely estimation. Since the 1950s, the four-step model has become mainstream for transportation planning (e.g. Weiner, 1992). OD matrices are generated from trip generation and distribution models using surveys on travel behavior. Later on, the relationship between route choice and OD travel demand was observed by researchers, and OD estimation became an inverse problem of the assignment matrix and traffic counts on transport links. Optimization and statistical models were formulated with static/dynamic route assignment assumptions. Though the objective function is to minimize the difference between the estimated and target OD matrices as well as the assigned link flows and observations, the underlying assumptions on route assignment and the divergence between estimated values and observations can be different in models. These problems lead to discrepancies between estimation results.

⇑ Corresponding author. E-mail addresses: [email protected] (Q. Ge), [email protected] (D. Fukuda). http://dx.doi.org/10.1016/j.trc.2016.06.002 0968-090X/Ó 2016 Published by Elsevier Ltd.

292

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

In practice, household travel surveys are conducted in many countries. For example, the Person Trip survey is conducted every 10 years in large metropolitan areas of Japan. Usually, the sampling rate of travel survey tends to be quite low. Thus, sampling error cannot be neglected (Santos et al., 2011). Besides, there also exist non-sampling errors that bring considerable mistake in records (Groves, 2006). People tend to filter certain types of trips for reasons such as forgetfulness, soft non-response and rounding up time in surveys (Axhausen, 2014). Meanwhile, the dataset soon becomes outdated because a considerable amount of time is necessary for coding and pre-processing data. The interval between two consecutive surveys is extremely long, thus it is quite difficult to derive reliable estimates of travel demand during intermediate years. The pervasive mobile device has become an alternative for travel surveys, especially for the past two decades. It can satisfy the requirement of detailed personal location data to know individual’s travel behavior. There exist quite a few papers on developing tracking method (Asakura and Hato, 2004), field test (Murakami and Wagner, 1999) and using GPS data for helping travel survey (Wargelin et al., 2012). Data collection schemes have been meticulously designed for these datasets. Precision and spatial–temporal coverage of these fine-grained datasets are also very desirable but with considerable cost. GPS data can collect multiple day trips, handle short trips which is always missing in self-reported surveys, has good data quality on travel start and end times, total trip times and destination locations. But is still not perfect due to satellite signal loss and the error induced by GPS system especially when the frequency of recording is low (Murakami and Wagner, 1999). Furthermore, individual location data has also aroused public concern about the invasion of privacy. Therefore, data in coarse granularity, or coarse-grained datasets become promising alternative data sources for OD estimation. In summary, although the availability of data on travel demand has improved along with technological development in the past several decades, cost and privacy issues impose progressively more severe constraints. There is still a great gap between practice and art. In addition, individual location data are not available in most countries because of privacy and cost issues. This study is dedicated to developing a calibration method for OD matrices using low cost mobile phone datasets that do not invade privacy, i.e., aggregated mobile phone data. 1.2. Literature review The problem of OD travel demand estimation has been studied extensively in the past several decades. Considerable number of studies in this line focus on the inverse problem of traffic assignment and OD demand on using link counts. These models usually estimate OD matrices by two sub-models: determining OD trip flows (or path flow) by the upper level model, and determining the most probable route choice of travelers given the traffic counts by the lower level model. These two problems are solved iteratively (e.g. Yang et al., 1992) or relaxed to be a decoupled form (e.g. Nie et al., 2005). Studies on OD estimation problem have developed from estimating OD trips in a static scenario to a dynamics scenario within a day, and then further to incorporating different data sources to obtain filtered calibration of the OD matrices (e.g. Zhou and Mahmassani, 2007). Effective methods are also developed to reduce computation load (Cipriani et al., 2011). For the master problem of OD estimation, a variety of assumptions are made to measure the divergence between the estimated OD demand and the reference one. Maximum entropy (information minimization) principle is firstly used to distribute OD demand by the trip attraction and trip production (Wilson, 1971). Later, constraints on the traffic counts and assignment map are added to this modeling framework (e.g. Van Zuylen and Willumsen, 1980) enabling it to deal with OD estimation in a more modern structure of mathematical programming with equilibrium constraint (MPEC) (Yang et al., 1992). Recently, the maximum entropy method is applied to subnetworks to estimate elastic travel demand (Xie et al., 2011). Constrained Generalized Least Square (CGLS) are also common assumptions in estimating OD demand (Hendrickson and McNeil, 1984; Bell, 1991). The objective function of CGLS is flexible and thus can be used adapted to new datasets, for instance, Zhou and Mahmassani estimate OD demand with automatic vehicle identification data (AVI) by this structure. The CGLS can be further relaxed to one-level estimators (Nie et al., 2005). Bayesian statistical model is a nature choice when studying the OD estimation problem from statistical perspective. One may treat the link count (Maher, 1983), link choice (Lo et al., 1996), routing choice (Parry and Hazelton, 2012) or other observations of traffic flow such as plating scanning as random variables (Castillo et al., 2013) to form a statistical model for OD estimation. The diversity of underlying assumptions in these models will also admit different estimations for one problem. Advances in data collection methods, especially ubiquitous data have helped to relieve the problems. Automated vehicle identification (Zhou and Mahmassani, 2006), probe vehicle (Cao et al., 2013), billing data (White and Wells, 2002) and GPS trajectory data (e.g. Pan et al., 2006) are used for estimating OD demand through either simulation or field tests to name a few. Despite the relative accuracy and soundness of these fine-grained data sets, the installation and maintenance cost of GPS devices and license plate readers can be expensive. These are the driving motivation for using passive data. Coarse-grained data are datasets that have coarse granularity. These data sets are usually collected over long intervals and have low precision for positioning. One source of coarse-grained data is passive data, i.e., datasets from surveys or records that are not intentionally designed for travel behavior studies. A pioneer study of coarse-grained mobile phone data is by Gonzalez et al. (2008), who investigated the mobility pattern of people using call dial record (CDR) data. In Gonzalez et al.’s study, the dataset records each person’s location when he/she sends a message, makes a phone call or connects to the Internet, all activities that use a local cell tower. Similar dataset has also been used to estimate OD matrices based on the fact that mobile phone moving along a route always tends to change the base stations near one’s position (Caceres et al., 2007). Using mobile phone data, several rules of thumb are established to infer each individual’s location from one’s

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

293

activities by clustering (Calabrese et al., 2011, 2013). The origin and destination of a trip can be firstly inferred by comparing the dwelling time and expected travel time between two data entries (Wang et al. (2013)). Similar rule can be used to understand the mode of each trip when the threshold of the mean travel time between an OD pair is known (Zhang (2012)). Tower-to-tower OD matrix can then be converted to node-to-node OD matrix (Iqbal et al., 2014). Järv et al. (2014) uses multiple day observations of CDR data to understand people’s temporal activity pattern. Although CDR data seems to be promising for estimating OD demand, several issues have become obstacles in using this dataset in practice. People may not always use their phone when they start a trip or arrive at their destination. Thus, the rule for deriving an OD pair may not always hold. Furthermore, the frequency of mobile phone usage may be significantly different across individuals and hence the estimated OD matrices would possibly overestimate the travel patterns of active phone users. Aggregate mobile phone data, on the other hand, usually records the number of people who are currently located in each activity location (e.g. at the workplace, at home or midway) of some defined area. The data are then aggregated to either cells or meshes and can be accessed at low cost and high frequency. The number of trips will not be filtered and individuals privacy can be protected to some extent. From reviews of the literature, we may deduce that aggregated mobile phone data provide a promising alternative for estimating travel demand, particularly OD matrices. As far as the authors know, a few studies have used aggregated mobile phone data for estimating travel demand. For example, Caceres et al. (2013) has shown that it is possible to adopt a bi-level estimator (e.g. Yang et al., 1992) to derive the OD matrix if viewing each aggregated area as a combination of links. Because an area for aggregating mobile phone data usually consists of several links, the number of observations will be far smaller than the variables estimated. The identification problem can be more severe. Moreover, pre-processing is still necessary to distinguish the number of vehicular trips from individuals data for the sub-model of route assignment. Fig. 1 summarizes the history of different data sources used for OD estimation. In Table 1, we compare the advantages and disadvantages of different data sources. To deal with the problems of previous types of dataset, we used the current aggregate mobile phone data to update OD matrices that were originally extracted from a standard travel survey. 1.3. Research objective This study is motivated by existing issues on household travel surveys and mobile phone data. We consider the OD estimation problem within the classical travel demand forecasting framework and develop a method for estimating OD matrices of work-related trips using coarse-grained aggregated mobile phone data. Though this dataset seems to be able to capture the dynamic travel demand among zones, this paper focuses on estimating static OD demand. Since the inter-zonal flux can only be known with a dynamic network loading module. Due to the limitation of traffic flow information, propagation of vehicles and pedestrians in traffic network is unknown. Therefore, the dynamic assignment map is not available and considerable mistake will be produced if the time-of-day dynamics of travel pattern is neglected in the estimation procedure. Thus, we concentrate on developing an updating method for static OD demand. We focus on work-related trips of individuals due to the limitation of dataset. In Tokyo, work-related trips are the main contributors to travel demand on weekdays. Commuters are the main users of railway, especially during peak hours (MLIT, 2015). Besides, understanding the change of work-related travel demand is important to know the urban development. These are main reasons for focusing on calibrating the OD demand of work related trips in this study. This study estimates the OD demand by two sub-models. The submodel 1 estimates time-of-day trip production and attraction of each zone with reference values from traffic survey. Though it seems like a dynamic model because it estimates time-of-day trip production and attraction, it is a static model since we do not consider the dynamic assignment in network. The outputs of this submodel are aggregated and then used as input for the submodel 2 in which trips are distributed to origins and destinations. As discussed in the previous section, there are various methods and assumptions available for modeling the OD estimation problem. In this article, the objective of submodel 1 is to minimize the divergence of estimated trip attraction and trip production of each zone and the reference values from travel survey. Maximum entropy, GLS and statistical methods are common options. The objective of submodel 2 is to distribute trips to OD pair. Gravity model, maximum entropy and other spatial interaction models are choices of this subproblem. In submodel 1, we deal with a problem that how to distribute time-of-day trip production and attraction to each hour with referred historical value given the mobile phone data but no information about the propagation of people and vehicle. In submodel 2, the distribution of OD demand given the trip production and attraction is considered. To make the trip distribution in the time–space regime consistent, we have to develop submodel 1 and 2 based on the same underlying assumptions. Among all referred models for OD estimation, maximum entropy is the only spatial interaction model that satisfies this requirement. In summary, this study demonstrates the possibility of a novel method to update work-related OD ftrip matrices with aggregated mobile phone data and we present the following specific objectives:  Discover the relationship between aggregated population data obtained from mobile phones and relevant trip production/attraction, and then calibrate the updated values of trip production/attraction combining the recent mobile phone data with historical travel survey data;

294

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

Fig. 1. Advances in data sources for OD estimation.

 Propose a novel approach that calculates trip flows of each OD pair using two sequential sub-models based on the maximum entropy principle and then analyze the computational complexity of the proposed method;  Verify the performance of proposed methodology through a small numerical example and then conduct a large-scale case study in Tokyo. 1.4. Organization of the paper The remainder of this paper is organized as follows. Section 2 will briefly depict the data collection scheme and properties of the dataset. The maximum-entropy principle-based estimation method and solution algorithm will be described in Section 3 in detail. Results of the case study in Tokyo will be demonstrated in Section 4. Conclusions and remarks on this study are summarized in the last section. 2. Data 2.1. OD matrices from a person trip survey Because this study is data-driven, the research settings will affect the methods and results. As we described in the previous section, two data sets were adopted in this study, i.e., an historical large-scale travel survey in which the initial or the prior values of the OD matrices can be elicited, and coarse-grained mobile phone data. In this study, the special wards

Table 1 Comparison of different data sources. Data collection techniques

Cost

Accuracy

Sampling size

Frequency

Privacy issues

Household survey Fixed traffic sensor Mobile traffic sensor GPS-equipped mobile phone (individual) Call, dial record (individual) GPS-equipped mobile phone (aggregated)

High High High High Low Low

High Medium High Medium Medium Medium

Small Large Small By case Large Large

Low High Low High High High

Yes No Yes Yes Yes No

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

295

of the Tokyo Metropolis were selected as the study area, consisting of 23 municipalities, which is the most populous region 2

in Tokyo. There are over 9 million people residing in an area of about 622 km . The Person-Trip (PT) survey has been conducted by the committee of transportation planning of the Tokyo Metropolitan Area every 10 years since the 1970s. It covers the municipalities and cities of Tokyo as well as several neighboring prefectures such as Kanagawa, Chiba, Saitama and Ibaraki. In the latest 2008 PT survey, the effective response rate was 24% and the sampling rate was 2% (340,000 questionnaires out of 16 million residents). The questionnaire for individuals included questions such as basic personal and household attributes and details of all their trips on a weekday. Individuals trips were collected through the home-based survey questionnaire to obtain the information such as mode, purpose and time-of-day. The most recent large-scale urban travel survey in Tokyo is the 5th PT Survey in October of 2008 conducted by the council of urban transportation planning for the Tokyo Metropolitan Area. After the survey results were scaled up, the aggregated OD table of work trips for the base year 2008 were provided by the council. The OD matrices obtained from this survey became the source of reference values for the sub-models that will be discussed in the next section. PT survey is one of the most important large-scale travel surveys in Japan. But it has shortcomings as we discussed in Section 1, i.e., sampling error and non-sampling error, long interval and huge expense. Besides, the response rate of survey is also decreasing sharply in recent years. Thus, out-to-date data should be incorporated with historical travel surveys to satisfy the requirement of OD demand in the intermediate years. 2.2. Aggregated mobile phone data (congestion statistics data) The aggregated mobile phone data mainly used for our analysis is called ‘‘Congestion Statistics Data”. The data were constantly collected by the Zenrin Corporation Inc., one of the major geographical information and map service providers in Japan. The prime use of the Congestion Statistics Data is to infer and monitor the population change in areas of Japan with spatially and temporally higher resolution. The sampling size of this dataset is about 500,000–700,000 all around Japan, among which 30% of sampled individuals dwelling in the metropolitan area of Tokyo. In the raw data, individual’s trace is recorded about every 5 min. The location of person is derived from the raw data by registered information of each person. To protect privacy of individuals, the data provider aggregated the raw data hourly for geographical areas (meshes) for several groups of people. A mesh is a grid cell with an area of 500  500 m2 . For each zone, there are two streams of trip flows contributing to the temporal change in the population of the zone, i.e., the accumulated trips of arrival and departure, respectively. However, the trip production and attraction cannot be directly estimated from the number of trips that arrive at or leave the zone unless the OD can be identified. In Japan, telecom operators hold the location of home address and the office of each phone user in their registration information. With the registration information and one’s GPS traces, A person’s location can be specified into home, office or other places as illustrated in Fig. 2. t0  t 4 are the timing of data recording and the white dots in the figures recorded locations. The blue lines are the time–space trajectories for the individual. The locations of office and home of mobile phone users are pre-registered, so that they can be specified for each individual (as in (b)). It may not be possible to capture other travel patterns (as in (a)). Because the dataset was collected from a sample of GPS-equipped mobile phone users, it had to be scaled up. The scaling rate for mesh in a zone was:



1 psg

ð1Þ

where R is the scaling rate, p is the penetration rate of smart phones, s is the market share of the mobile network operator, and g is the sampling rate. In fact, this scaling step is conducted by the provider of dataset due to commercial reasons. This scheme may introduces sampling error to the dataset. But, several facts help to relieve this problem. (1) The penetration rate of smart phone is high in Japan. (2) The coverage of smart phone conforms with the employment rate to some extent. Another problem of this dataset is GPS signal loss. This problem can cause underestimation of short time business trip due the trip chain of people’s travel pattern. But it doesn’t affect the estimated result by large. By the 2008 PT survey, over 90% of workforce in Tokyo follow a ‘Home–Work–Home’ travel pattern. However, it should be considered when applying similar method to other areas where peoples’ travel patterns are different. For occasions in which high-resolution estimation is required, the frequency of recording should be increase and GSM could be incorporated with GPS to generate more reliable dataset. We selected an normal day (Wednesday October 30th, 2012) to conduct case study, after carefully checking there were no extraordinary events or incidents that might affect the regular travel pattern, and this day was regarded as a typical weekday and counterpart to the day in Autumn when the 2008 PT survey was conducted. Individual mobile phone data were pre-processed and aggregated before being used, as shown in Fig. 3. As stated in Section 1.3, people were grouped into the following three categories for each mesh and then the total number of people hourly was recorded.  People at home during the time of recording (at-home group);  People in their workplace during the time of recording (at-work group); and  People neither at home nor in their workplace at the time of recording (floating group).

296

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

Fig. 2. Two travel patterns in relation to the observations using the coarse-grained mobile phone data.

Fig. 3. Procedure of data pre-processing of the Congestion Statistics.

We mainly used the data of the second group for our analysis. The interval for data aggregation was 1 h. Each item in the aggregated mobile phone dataset was characterized by maximum and minimum values of longitude and latitude in the mesh, a time stamp to indicate the time slice when the data were recorded and the aggregated number of persons (i.e. populations) in each group for each mesh (Table 2). Fig. 5a and b shows the distribution of the mesh population of the at-work group in the Congestion Statistics Data (see Table 2) To calibrate the model in Section 3, we needed zonal mobile phone data as input. However, the provided data of the Congestion Statistics were mesh-based. As a prerequisite, geo-processing procedure was necessary to convert data to zone-level statistics. In this study, it was assumed that people were uniformly distributed within each mesh. Geographic Information System (GIS) tools were adopted to complete this procedure. Thus, the zonal population was given by:

Pm ¼

X Pc c2c

Sc

 Sm\c ;

8m 2 M;

ð2Þ

where m is a zone, c is a mesh, M is the set of zones, C is the set of meshes, Pm is the population of zone m; P c is the population of mesh c; Sc is the area of mesh c, and Sm\c is the area that zone m intersects mesh c. This conversion scheme is valid when the following condition are satisfied: (a) the distribution of road network and job opportunities is continuous in urban network, (b) users preference to mobile phone company is unrelated to location, (c) the size of each mesh is small enough. The first condition always holds in mega-cities and is a fundamental assumption of continuum network modeling in transportation. Different assumptions on the distribution of people can be made based on the geographic feature of study area. The third condition is also valid in reality. Since the size of each mesh is just 500 ⁄ 500 m2, there will be over 40 meshes even for the smallest zone (for instance, Chiyoda has an area of 10.21 km2). The second condition holds in our case since NTT docomo is the largest mobile operator in Japan. But this condition is not always true. We should take the choice of mobile operator and distribution of people into consideration. When social-demographic characteristics of mobile phone users are available, a discrete choice model may be applied to estimate the relationship between the choice of mobile operator and individual’s personal trait, such as age, income and dwelling region. The estimated parameter can then be used to calibrate the market share of each mobile operator in each mesh. However, this procedure is too complicated and not realistic in reality. A much simper way is to gather from multiple mobile operators directly.

297

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312 Table 2 Sample format of congestion statistics data. Mesh ID

Min. latitude

Min. longitude

Max. latitude

Max. longitude

Time slice

At-home

At-work

Floating

533925934 533925743 533925841 533925843 533925941 533925744

128085000 128025000 128040000 128055000 128070000 128025000

502807500 502830000 502830000 502830000 502830000 502852500

128100000 128040000 128055000 128070000 128085000 128040000

502830000 502852500 502852500 502852500 502852500 502875000

00–01 00–01 00–01 00–01 00–01 00–01

2190 982 401 1160 208 611

0 0 0 193 0 209

2124 1434 0 3283 5021 4055

Table 3 Notations. M

Set of zones in the study area

T Dm Q tm rQ tm Atm et A

Number of time slices during a day (=24) Number of residence in zone m Number of working people in zone m at time slice t t1 Change in the number of working people in zone m at time slice t; rQ tm ¼ Q tm  Q m Number of people arriving at zone m for work purposes at time slice t Reference value for the number of people arriving at zone m for work purposes at time slice t Number of people leaving their workplaces m at time slice t Reference value of the number of people leaving their workplaces m at time slice t Trip flows of work related trips from zone m at time slice t Reference value for the trip flows of work related trips from zone m at time slice t P Trip production of work related trips of zone m; pm ¼ Tt¼1 P tm P P Trip attraction of work related trips of zone m; am ¼ Tt¼1 Atm or am ¼ Tt Ltm Trip flow from zone i to zone j for work purpose Reference value for the trip flow from zone i to zone j for work purpose Number of transfer trips in zone m in time t using the railway Reference value for number of transfer trips in zone m at time slice t Capacity of the railway in zone m. Mode share by the railway of trip productions in zone m Mode share by the railway of trip attractions in zone m Average number of trips made by each working person in zone m Ratio of transfer trips to the sum of trip outflow and production in zone m A positive arbitrary value that measures the importance of trip attraction and production in zone m

m

Ltm e L tm P tm et P m pm am xij ~ xij N tm et N m Cm

am bm

cm dm

lm

Comparison of the zone and mesh maps of Tokyo is shown in Fig. 4. It is observed that three inner zones (Chiyoda, Chuo and Minato) intersect with tens of meshes and other large zones can intersect with more meshes. After finishing the geo-processing procedure described above, we could see the population distribution of working people in each zone. The number of people in their workplaces in each ward is shown in Fig. 6a and b. Specifically, the temporal (hourly) variations of the working population in Chiyoda, Minato and Chuo wards are shown in Fig. 7a, b and c, respectively. We observed that though the exact number may vary across each zone, the variation in people’s pattern of staying at their workplace would be similar. 3. Methodology 3.1. Overview of the methodology To estimate the work related travel demands, we first determine how the aggregated mobile phone dataset related to trips. There are two streams of trips contributing to the change in each area, i.e., (a) inflows (b) outflows. We were able to establish the constraints of trip production and inflow/outflow of trip attraction from the mobile phone data and PT survey. Therefore, we could estimate the trip production and attraction for each zone. Sub-model 1 addresses this problem. With the prior OD matrices and estimated trip production/attraction from Sub-model 1 as input, we could further derive the travel demand (i.e. updated OD matrix) using Sub-model 2. The overall idea is illustrated in Fig. 8. 3.2. Notation The following notations (see Table 3) were used for formulating the problem. t t t et ; e et Atm ; Ltm ; Ptm are decision variables in submodel 1, A m L m ; P m are aggregated from PT survey, Dm ; Q m ; rQ m are calculated t e from the mobile phone data, a; b; c are estimated from PT survey, C m is from a railways, N is an intermediate variable, m

and can be estimated from a survey on the railway. N tm are estimated by past surveys on railway use. xij are decision variables in the submodel 2. ~ xij are from the PT survey, am ; pm are aggregate from the output of submodel 1.

298

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

(a) meshes

(b) wards(zones) Fig. 4. Map of the 23 special wards of Tokyo by (a) mesh and (b) ward.

3.3. Sub-model 1: Estimation of trip production and attraction Consider a zone m and assume that we were interested in the inflows and outflows of trip attraction in each time interval t, denoted as {Atm }({Ltm }). Similar to Wilson (1970) and Van Zuylen and Willumsen (1980), we found the probability that the pattern of trip flows is proportional to the number of ways that trips can be arranged to obtain this pattern. Using combinatorial theory, the total number of ways is the multiplication of all possible combinations:

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

Fig. 5. Distribution of working people in each mesh at 4 a.m. (a) and 1 p.m. (b).

299

300

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

Fig. 6. Distribution of working people in each ward at 4 a.m. (a) and 1 p.m. (b).

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

301

(a)

(b)

(c) Fig. 7. Population change of at-work group in Chuo (a), Chiyoda (b) and Minato (c).

 Wa ¼

am A1m

 

am  A1m A2m

!  ... 

am  A1m  A2m  . . .  AT1 m ATm

!

am ! ¼ QT t t¼1 Am !

ð3Þ

Therefore, we could find the most probable pattern by maximizing W a . Taking the logarithm and approximating it using P Stirling’s formula, the objective function Eq. (3) becomes  Tt Atm ðlog Atm  1Þ because am is a constant in this formula and we can omit it in the objective function. The previous procedures can be applied to both arrival and departure of trip attraction or production. Furthermore, the total of inflows and outflows of trip attraction (production) should be equal for each

302

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

Fig. 8. Overview of the methodology.

     t t Atm log Am  1 þ Ltm log Lmt  1 if we consider both eA tm eL m e t and e the arrival and departure trips, and incorporate the prior information: A L tm . The most probable pattern of trip m   t PT t Pm production takes the form of t¼1 P m log et  1 . Considering the availability of datasets, only the outflow direction of Pm produced trips in zone m can be constrained. Thus we establish an objective function with respect to both trip production and attraction for each zone m as shown in the following: [Problem 1]

day. The most probable pattern of trip attractions then become

PT

t¼1

( ! !) ! T T   X X Atm Ltm P tm t t t t t t minimize W 1 fAm ; Lm ; P m gt¼1;...;T ¼ Am log  1 þ Lm log 1 þ lm P m log 1 et e et fAtm ;Ltm ;P tm gt¼1;...;T L tm P A t¼1 t¼1 m m

ð4Þ

subject to:

Atm  Ltm ¼ rQ tm ;

t ¼ 1; 2; . . . ; T

ð1 þ dm ÞðLtm  am þ Ptm  bm Þ 6 C m ; T X Ptm 6 cm Dm t¼1 Atm P 0; Ltm P 0; Ptm P 0;

ð5aÞ t ¼ 1; 2; . . . ; T

ð5bÞ ð5cÞ

t ¼ 1; 2; . . . ; T

ð5dÞ

t ¼ 1; 2; . . . ; T

ð5eÞ

t ¼ 1; 2; . . . ; T

ð5fÞ

This set of constraints Eqs. (5a)–(5f) is derived from the aggregated mobile phone data and the PT survey and can be interpreted as follows: 1. Eq. (5a) represent the population change constraints. There is a conservation of trip flow exits and enters the mesh. The difference of trips go out and arrive in a mesh during a time unit should equal to the change of people accumulated in this Rt s R R s t t1 mesh. For each zone and mesh, the following conservation law holds, s ðf out  f in Þds ¼ x y ðkxy  kxy Þdxdy where f in and

f out are flow in and out and kxy is the density in location ðx; yÞ. Discretizing the previous equation by one hour and applying

it to work-related trips, we have F tin  F tout ¼ Q t  Q t1 ¼ rQ t , where F tin ¼ Atm is the trip flow arrive at the zone, F tout ¼ Ltm is the trip flow go out the zone. They are decision variables. Q t and Q t1 are number of working persons in the zone in time t and t  1, which can be estimated by GIS tools from the dataset by Eq. (2). rQ t can be viewed as control variable in this formulation since Q t and Q t1 are calculated by PostGIS. 2. Inequalities (5b) are the constraints on the capacity of railway in zone m. This will be explained in more detail later. 3. Inequality (5c) stands for the trip production constraint of zone m. The total work trip from zone m should be less than the trips conducted by the people in zone m. 4. Inequalities (5d)–(5f) are the non-negativity constraints for inflow and outflow of trip attraction and trip production, respectively. We now derive the inequalities (5b). Travelers related to zone m during a time interval t consist of those who arrive at, leave from or pass through that zone, that is, inflow, outflow, or transfer trips respectively. In a congested transport network,

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

303

the distribution of trip flow is usually dealt with as a mode choice and transit/traffic assignment problem. However, extensive data requirements and high computation loads make this discipline impractical for the problem considered in this paper. For simplification, we therefore propose an alternative constraint on railway capacity noting that work trips in Tokyo are highly dependent on railway-based trips. We assume that for each zone the capacity of all stations is the upper bound of the sum of trip production, trip attraction (outflow) and transfer trips. The following constraint must then be satisfied:

Ltm  bm þ P tm  am þ Ntm 6 C m ; 8m 2 M

ð6Þ

Parameters such as am ; bm can be estimated using a mode choice model, or are straightforwardly computed with historical travel survey data. N tm can be estimated by a transit assignment model or be approximated by a static assignment map. In this study, we empirically calculate am s, bm s and N tm from the PT survey. Meanwhile, dm can be estimated from past surveys:

dm ¼

et N m ; e t  am e t L m  bm þ P m

8m 2 M

ð7Þ

to obtain the inequalities (5b). We obtained the arrival (inflow) and departure (outflow) of working trips destined to each zone and trips originating from each zone at each time interval by solving Problem 1. The total inflow (or outflow) will be the work-based trip attractions of zone m and the total time-of-day trip production will be the work-based trip production of zone m. 3.4. Properties of sub-model 1 We validate the previous formulation through a numerical case as in Table 4. There are 24 measurement intervals in a normal day. Three rush hours are on the morning and three are on the afternoon. True values of trip production and attraction are generated firstly by Gumbel distribution with location parameter of 50 and scaling parameter of 1000. Then the population change is get from the real value. White noises follow normal distribution Nð0; 100Þ are then added to get the reference values. The mode share of railway is fixed in this numerical case to be 1 for simplicity. The capacity of railway is constrained to be 2000 in rush hours, 500 in the night and 1000 in other time period. The constraint on total trip production is limited to 12,000. Fixing the dispersion parameter lm to 1, we may compare the difference between real values, reference values and estimated values as in Fig. 9. Intuitively, we may find the estimated values are closer to real value. Then we have done a bunch of test on the dispersion parameter lm in the interval ½0; 5Þ. The RMSE (root-mean-square error) and MAPE (mean absolute percentage error) of estimated values and true values are summarized in Fig. 10 (the red lines in each figure is the RMSE/MAPE of reference and true values). The choice of lm would affect estimation results and we can view it as a resource allocation problem. The capacity of each zone was allocated to the trips of individuals who leave their workplaces in this zone and the trips from their residence in this zone to other zones. Now we illustrate two extreme cases: (a) lm ¼ 0 meaning that the trip production side was not bounded in the optimization problem, trips to this zone would have the priority on the capacity; (b) lm ¼ 1 meaning that trips from local residence would have priority of transportation modes. But this two extreme case won’t admit to optimal RMSE. To take the numerical case for example, optimal choice of the dispersion parameter should be some value around 1.1. In the following real case study in Tokyo, we will use 1 in the submodel 1 since it can quickly produce a suboptimal RMSE. Another practical concern is that the total trip productions and attractions of all zones may not match in the above problem. In practice of the 4-step model, usually the trip attraction side is scaled by the trip production side since trip production is believed to have higher accuracy (de Dios Ortuzar and Willumsen, 2011). In this case, we match the total trip production and attraction in a opposite fashion since the trip attraction side is directly collected by the mobile phone data while trip production side is estimated by a weak constraint on the total railway capacity of each ward. 3.5. Some simple forms of Problem 1 We also found several simplified variants of Problem 1. Except for inequalities (5c), all other constraints were considered for each time interval t in the simplified problem. For this, we obtained the following: [Problem 2]

minimize W 2 ðAtm ; Ltm ; P tm Þ ¼ Atm log t Am ;Ltm ;P tm

! ! ! Atm Lt Pt  1 þ Ltm log m  1 þ lm Ptm log m  1 et e et L tm P A m m

ð8Þ

subject to the constraints Eqs. (5a), (5b), (5d), (5e) and (5f) for the time slice t. Now there are only three variables and five constraints for this simplified problem.

304

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

Table 4 Basic settings of numerical case. Time of day

Target trip production

Target trip departure

Target trip arrival

Population change

Capacity

Trip production (real)

Trip departure (real)

Trip arrival (real)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

3 276 294 663 1890 2234 1859 1583 1306 381 675 53 122 79 200 704 105 130 388 454 348 3 175 3

6 8 2 2 214 309 276 527 785 202 514 520 327 381 637 1451 1914 1902 1914 1749 833 544 256 5

7 7 522 967 602 1883 1654 1439 998 938 936 759 658 876 567 617 588 469 411 156 266 2 2 6

1 1 449 557 686 1702 1558 1021 814 757 551 448 198 106 32 725 1422 1689 1630 1540 755 597 458 1

500 500 500 500 2000 2000 2000 2000 2000 1000 1000 1000 1000 1000 1000 2000 2000 2000 2000 2000 1000 1000 1000 1000

2 345 404 380 1782 1817 1732 1680 1092 321 288 254 242 99 316 350 121 63 118 336 174 2 2 2

2 3 3 118 180 181 237 281 284 302 318 359 554 579 583 1245 1813 1936 1877 1661 805 632 460 2

3 2 452 675 866 1883 1795 1302 1098 1059 869 807 752 685 551 520 391 247 247 121 50 35 2 3

The further simplified setting is to find the most probable arrival trips and departure trips in each time interval t and for each zone m. In other words, we merely consider the trip attraction constraint in this problem and we then have: [Problem 3]

minimize t Am ;Ltm

W 3 ðAtm ; Ltm Þ

¼

Atm

! ! Atm Ltm t log  1 þ Lm log 1 et e L tm A m

ð9Þ

constrained by Eq. (5a). Problems 1 and 2 can be solved by the sequential quadratic programming (SQP) method and Problem 3 can be solved by Appendix A. With Problem 2 and 3, we can test if there are abnormal data in the aggregate dataset. For instance, if there is negative solution to Problem 2 or 3, we can find the measurement in this time interval is an out-lier and try to fix it. Problems 2 and 3 can also help to customize decomposition methods for Problem 1 when the time intervals are too many (one possible way is shown in Appendix B). 3.6. Sub-model 2: OD matrix fitting Running the model in the previous subsections (Sub-model 1), we obtain estimates of trip production and attraction of P work trips for each zone. We can then generate the trip production and attraction of each zone in the day by pm ¼ Tt1 Ptm PT t PT t and am ¼ t1 Am ¼ t1 Lm . Then, the OD matrices can be updated using spatial interaction models such as the gravity model and entropy maximization model (Sub-model 1). To be consistent with the assumptions in the previous subsections, we have: [Problem 4]

minimize W 4 ðxÞ ¼ x

 X X  xij xij log  1 ~xij i2M j2M

ð10Þ

following Wilson (1970), which is constrained by:

X xmj ¼ pm ;

m2M

ð11aÞ

m2M

ð11bÞ

j2M

X xim ¼ am ; i2M

where k is a zone, xkj the trip flows from zone k to zone j and x denotes the set of OD trips for all possible OD pairs in M. It is possible to solve this problem on the basis of the duality theory (e.g. Boyd and Vandenberghe, 2004). Using the xij , where ki andkj are Lagrangian multipliers. Lagrangian multiplication method, we derive the optimal solution that xij ¼ eki þkj ~

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

305

Fig. 9. The true value, reference value and estimated value of trip production (a), departure (b) and arrival (c).

Substituting xij back into the constraints, we elicit a dual problem for ks. The dual problem can be solved by a variety of algorithms, such as generalized iterative scaling (GIS), improved iterative scaling (IIS), or gradient-based algorithms. When the observation of path cost of automobile and transit are available, we can make a model that able to incorporate modal split, trip distribution and OD estimation. In this OD estimation model, path flow estimation and trip distribution are the objectives and modal split are constraints. Traffic and transit assignment map (e.g. Teng and Liu (2015)) can be incorporated in the objective function. But due to the limitation of datasets such as link cost and transit fare, link count and number of transit users, we consider the OD estimation problem as a fitting problem here.

306

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

Fig. 10. The RMSE (a) and MAPE (b) of estimated and real values.

3.7. Solution of sub-model 2 Each zone (m) corresponds to an independent problem in Sub-model 1 and there are then 3T variables in each problem. Because the objective function is convex, the rate of convergence of the SQP algorithm will be super-linear. That means, the submodel 1 for each zone can be solved very quickly. We described several algorithms for solving the problem of Sub-model 2 in the last subsection. We now compare their efficiency. The time complexity of Newton’s method is OðM 2 Þ so it is a poor choice for large scale problems. A much more efficient and convenient alternative is Iterative Proportional Fitting (IPF) noting that IPF can converge to the maximum entropy (Lozano, 2006). The time complexity of IPF is OðMÞ. For large-scale problems, the computational load could be further reduced by iterative scaling method such as GIS and IIS, because IPF scales estimated values up by each clique alternatively, whereas the latter two methods update estimates using all cliques at the same time in each iteration. However, the total number of iterations may be different. Hence, IPF is still a preferable choice in practice because of its simplicity and effectiveness. 4. A case study We conducted a case study for the metropolitan area of Tokyo. There are 23 wards (districts) in Tokyo Metropolitan Prefecture, each of which matches a big zone in the PT survey. In this case study, we used these 23 zones as the unit for estimation. First, the mesh population data was converted to zonal data by Quantum GIS and PostGIS, which are open source GIS tools to visualize or manage a spatial database. With the mesh population data for 2012 obtained from the Congestion Statistics and the prior OD table from the PT survey in 2008 as inputs, we obtained the updated OD matrices for the year 2012 by running the models and the algorithms described in the previous section. 4.1. Results of sub-model 1 The reference values for trip production and attraction (inflow, outflow) were derived from the 2008 PT survey. Constraints on the population change were estimated from the 2012 aggregated mobile phone data. Parameters in the capacity constraints Eq. (5c) such as mode share, and capacity of each mode were calibrated in advance from the 2008 PT survey. The capacity of railway in each zone (C m ) was obtained from the Tokyo Metropolitan Area Public Transport Survey 2010 (MLIT, 2015) conducted by the Ministry of Land, Infrastructure, Transport and Tourism. In this case study, we further assumed the weights of each variable were the same in Sub-model 1, that is, we set lm ¼ 1; 8m 2 M. Figs. 11–13 show the changes in trip inflow and outflow within a day in the three wards of Tokyo central (i.e. Chiyoda, Chuo and Minato wards). It was observed that hourly inflows to each zone in 2008 and 2012 were mostly similar whereas outflows deviated considerably for some time periods. Several reasons may explain this phenomenon. (1) People’s schedules after work are usually more flexible than before work, which can be the result of day-to-day variability. (2) Measurement error of the PT survey. Because the sample size is limited and peoples’ responses to a daily travel patterns are not always correct, errors in the PT survey cannot be ignored. The mobile phone data might be more reliable in terms of the aggregation error. (3) There is another possible reason, that is, the promotion of downtown development since the late 2000s in Tokyo with some deregulation for city planning. In accordance with Urban Regeneration Special Act of 2001, eight districts and approximately 2500 hectares were designated in Tokyo central. In these special districts, Japanese City Planning Law permits

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

307

Fig. 11. Comparison of hourly trip inflow (a) and outflow (b) of Chiyoda Ward.

Fig. 12. Comparison of hourly trip inflow (a) and outflow (b) of Chuo Ward.

Fig. 13. Comparison of hourly trip inflow (a) and outflow (b) of Minato Ward.

exceptionally relaxed site use, the increase in Floor Area Ratio (FAR) enabled the developers to build higher buildings not only for business workplaces but also for residences in downtown Tokyo. In fact, many high-story buildings were built between late 2000s to and the present time. We believe that this recent policy changes in urban development in Tokyo might have affected change in the inflow/outflow pattern in many wards. Fig. 14a and b shows the comparison of the estimated daily trip attraction and production in 2012 (vertical axis) and their reference values in 2008 (horizontal axis) for all 23 wards. We calculated the cosine similarity (Singhal, 2001) and Pearson’s coefficient to measure the similarity of trip attraction and production in 2008 and 2012. The total trip production and attraction had not changed significantly in each zone. This finding was consistent with the general trend of travel demand in Tokyo. According to a report on the series of past PT surveys in Tokyo,1 the total number of trips had increased from the year 1

http://www.mlit.go.jp/common/000057539.pdf, retrieved December 06,2015.

308

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

(b)

(a)

Fig. 14. Changes in trip production (a) and attraction (b) for 23 wards in Tokyo.

(a)

(b) Fig. 15. Change of OD trips in number (a) and proportion (b).

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

309

Fig. 16. Change in daily trip flows (in numbers) for 23  23 OD pairs between 2008 and 2012.

Fig. 17. Change in daily trip flows (in proportions) for 23  23 OD pairs between 2008 and 2012.

1998 (the 4th PT survey) to 2008 (the 5th PT survey), which also happened in work-related trips. The increase was mainly attributed to senior citizens. Another phenomenon observed from the figures is that the trip attraction has increased in most wards, but decreased in Chiyoda and Minato. This result seemed contradictory because recent land use regulations should have a positive impact on the employment in the central wards. It is possible to attribute this inconsistency to error in PT survey. In

310

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

Table 5 Change of trips in several OD pair. OD pair

Trip flow (2008)

Trip flow (2012)

Difference (%)

Chiyoda–Chiyoda Minato–Minato Koto–Koto Kita–Kita Shinjuku–Shinjuku Sumida–Sumida

153,996 175,583 100,877 50,929 122,986 61,108

134,018 157,625 115,499 62,087 120,702 64,317

12.97 10.23 14.49 21.90 1.86 5.25

fact, several reports support this opinion. A recent report2 shows that the PT survey was inconsistent with several other travel surveys in Japan, for example, the survey on the travel pattern of railway users. This report also argued that trip production and attraction were over-estimated for Chiyoda and Minato in the PT survey (about 18% and 16%, respectively). After taking these errors into account, we found that trip attractions in Chiyoda and Minato were not abnormal. 4.2. Results of sub-model 2 Using the estimates of trip generation for each zone in 2012 shown in Section 4.1 as inputs, the fitting model for Sub-model 2 was able to generate an updated OD trip matrix. The calibration results revealed that the total work-related daily trips had changed approximately from 5.885 M to 6.143 M (one-way trips) during the timespan of the PT survey in 2008 and mobile phone data collection in 2012. A comparison of the trip flows for each OD pair between 2008 and 2012 is shown in Fig. 15a and b, in numbers and proportions, respectively. Further we visualized the changes in trip flow in Figs. 16 and 17, in numbers and proportions, respectively. These figures illustrate that trip flows have changed most dramatically for within-zone trips especially in Chiyoda, Chuo and Minato. Thus, trip flows to these three wards, especially with-zone trips deviated from the solid line. The OD pairs for which trip flows had changed significantly are listed in Table 5. We found several interesting outcomes from this result. (1) Work trips increased with the population in Tokyo. As we have depicted before, work-related trips were increasing from 1998 to 2008 according to the two latest waves of PT surveys. The increase in work-related trips is a consequence of population concentration in the downtown area of Tokyo. Though many mega-cities in Japan suffer from a decrease in population, people are still willing to live in Tokyo. The population of Tokyo keeps growing.3 (2) The change in OD trip flow was most distinct for within-zone (i.e. internal) trips. Within-zone trips were usually the main contributor to the trip flow of each zone. In the model, we found that the larger numbers had greater weights. Therefore, the changes in trip flow were most considerable for these OD pairs (as in Fig. 16). The proportional changes of withinzone trips were less notable (as in Fig. 17). (3) The OD trips were relatively stable in most OD pairs. As in Figs. 16 and 17, there were less than 8,000 changes in trip flow in most OD pairs which was less than 8%. Several examples are shown in Table 5. 5. Conclusions and future work Large-scale transportation surveys are time-consuming and costly for estimating the travel demands of work related trips. This was one of the driving reasons in this study for employing mobile-based data collectors. However, trip flow estimation methods with individual location data were also constrained by concerns about the invasion of privacy. This was the motivation in this study for new estimation methods with aggregated mobile datasets because it could preserve the advantageous aspects of mobile phone data while eliminating unfavorable aspects. This study proposed an approach to estimating trip flows with aggregated mobile phone datasets and low-cost surveys. This consists of two sequential entropy-maximization principle based sub-models. One is the trip attraction–production estimation model with inequity constraints on population change and rail capacity, and the other is an entropy maximization model on trip flow. A numerical example then showed this method performed well for synthetic data. This was the basis for the empirical study in 23 wards of Tokyo. The results of the case study showed that overall work related trips increased slightly in Tokyo between 2008 and 2012. It is consistent with our intuitions and the findings in previous studies. However, this study is also constrained by the fact that only work-related trips can be derived from the dataset. Another limitation of this study is that modal split and route assignment problem are not integrated with the model. We have discussed how to formulate this problem when datasets such as path costs and link flow are available in Section 3.6. If we can have sufficient data, we will validate the proposed idea in the future. Acknowledgments This study was supported by Grants-in-Aid for Scientific Research from the Japan Society for the Promotion of Science (B) #20114853 and (B)#25289161. We are thankful to the editor and anonymous reviewers for improving the details of this manuscript. We are also grateful to Dr. Ma Jiangshan for his discussions and valuable comments to this study. 2 3

http://www.mlit.go.jp/common/001001262.pdf, retrieved on July 23, 2015. http://www.soumu.go.jp/johotsusintokei/whitepaper/ja/h24/html/nc112130.html, retrieved on July 23, 2015.

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

311

Appendix A. Solution of Problem 3     t t The Lagrangian function of this problem is Atm log Am  1 þ Ltm log Lm  1 þ kðAtm  Ltm  rQ tm Þ. The first order eA tm eL m condition of the Lagrangian function generates analytical solutions for Atm and Ltm as:

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 et e t ðrQ tm Þ þ 4 A m Lm ; ¼ 2 t et e 2Am Lm qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ; Ltm ¼ 2 t et e t L rQ m þ ðrQ tm Þ þ 4 A m m

Atm

rQ tm þ

m2M

ðA1Þ

m2M

ðA2Þ

Appendix B. A decomposition method for solving submodel 1 In line with Boyd et al. (2007) we can solve Problem 1 using the Primal Decomposition Method. There are several ways to decompose this problem based on Problems 1 and 2 and we show one in the following. We use the following denotations:

! ! Atm Ltm t  log  1 þ Lm log 1 et e L tm A m ! Pt t f 2 ðPtm Þ  P tm log m  1 et P

t f i ðAtm ; Ltm Þ

Atm

m

g t1 ðAtm ; Ltm Þ  Atm  Ltm  rQ tm g t2 ðLtm ; Ptm Þ  am Ltm þ bm P tm 

Cm 1 þ dm

T X Ptm  cm Dm

g 3 ðPtm Þ 

t¼1

If we let

lm ¼ 1, the Lagrangian of the original problem (Problem 1) becomes:

LðAtm ; Ltm ; P tm Þ 

T T T T X X X X t t f1 þ f2 þ kt g t1 þ ht g t2 þ /g 3 t¼1

t¼1

t¼1

ðA3Þ

t¼1

where kt ; ht ðt ¼ 1; . . . ; TÞ and / are KKT multipliers. We may decompose the constraint g t2 ðLtm ; P tm Þ into g t2a and g t2b (g t2 ¼ g t2a þ g t2b ) by introducing h as:

g t2a ðLtm Þ  am Ltm 

Cm t h 1 þ dm

g t2b ðPtm Þ  bm Ptm þ h

t

The original Lagrangian then becomes the sum of the smaller problems. Similarly, by defining 1

Lt1 ðAtm ; Ltm Þ  f t þ kt g t1 þ ht1 g t2a L2 ðPtm Þ 

T X

t

f2 þ

t

we have L ¼

PT

T X ht2 g t2b þ /g 3 t¼1

t¼1 L1

þ L2 . The problem can then be solved by the following algorithm: t;0

 Step 1: Choose an initial h0 .  Step 2: Iterate: t;k t;k t;k – Solve the Lagrangian L1 s to obtain the optimal At;k m ; Lm ; k1 and h1 using Appendix B (T problems) k – Solve the Lagrangian L2 to obtain the optimal Ptm ; ht;k 2 ; / using Appendix A (one problem with T þ 1 bound constraints and one linear constraint) t;k

t;k1

t;k k  Step 3: Update h ¼ h  ak ðht;k 2  h1 Þ, where a is an appropriate step size such that there are feasible points in each iteration. k :¼ k þ 1 and then return to Step 2.

312

Q. Ge, D. Fukuda / Transportation Research Part C 69 (2016) 291–312

References Asakura, Y., Hato, E., 2004. Tracking survey for individual travel behaviour using mobile communication instruments. Transp. Res. Part C: Emerg. Technol. 12 (3). Axhausen, K., 2014. Travel surveys: a researcher’s view. In: The 10th International Conference on Transport Survey Methods. Bell, M.G., 1991. The estimation of origin–destination matrices by constrained generalised least squares. Transp. Res. Part B: Methodol. 25 (1). Boyd, S., Vandenberghe, L., 2004. Convex Optimization. Cambridge University Press. Boyd, S., Xiao, L., Mutapcic, A., Mattingley, J., 2007. Notes on Decomposition Methods. Caceres, N., Romero, L.M., Benitez, F.G., 2013. Inferring origin–destination trip matrices from aggregate volumes on groups of links: a case study using volumes inferred from mobile phone data. J. Adv. Transp. 47 (7). Caceres, N., Wideberg, J., Benitez, F., 2007. Deriving origin destination data from a mobile phone network. Intell. Transp. Syst. IET 1 (1). Calabrese, F., Di Lorenzo, G., Liu, L., Ratti, C., 2011. Estimating origin–destination flows using mobile phone location data. IEEE Pervasive Comput. 10 (4). Calabrese, F., Diao, M., Di Lorenzo, G., Ferreira, J., Ratti, C., 2013. Understanding individual mobility patterns from urban sensing data: a mobile phone trace example. Transp. Res. Part C: Emerg. Technol. 26. Cao, P., Miwa, T., Yamamoto, T., Morikawa, T., 2013. Bilevel generalized least squares estimation of dynamic origin–destination matrix for urban network with probe vehicle data. Transp. Res. Rec.: J. Transp. Res. Board 2333. Castillo, E., Jiménez, P., Menéndez, J.M., Nogal, M., 2013. A Bayesian method for estimating traffic flows based on plate scanning. Transportation 40 (1). Cipriani, E., Florian, M., Mahut, M., Nigro, M., 2011. A gradient approximation approach for adjusting temporal origin–destination matrices. Transp. Res. Part C: Emerg. Technol. 19 (2). de Dios Ortuzar, J., Willumsen, L.G., 2011. Modelling Transport. John Wiley & Sons. Gonzalez, M.C., Hidalgo, C.A., Barabasi, A.-L., 2008. Understanding individual human mobility patterns. Nature 453 (7196). Groves, R.M., 2006. Nonresponse rates and nonresponse bias in household surveys. Public Opin. Q. 70 (5). Hendrickson, C., McNeil, S., 1984. Estimation of origin–destination matrices with constrained regression. Transp. Res. Rec. (976) Iqbal, M.S., Choudhury, C.F., Wang, P., Gonzlez, M.C., 2014. Development of origin–destination matrices using mobile phone call data. Transp. Res. Part C: Emerg. Technol. 40. Järv, O., Ahas, R., Witlox, F., 2014. Understanding monthly variability in human activity spaces: a twelve-month study using mobile phone call detail records. Transp. Res. Part C: Emerg. Technol. 38. Lo, H., Zhang, N., Lam, W.H., 1996. Estimation of an origin–destination matrix with random link choice proportions: a statistical approach. Transp. Res. Part B: Methodol. 30 (4). Lozano, J.A., 2006. Towards a New Evolutionary Computation: Advances on Estimation of Distribution Algorithms, vol. 192. Springer. Maher, M., 1983. Inferences on trip matrices from observations on link volumes: a Bayesian statistical approach. Transp. Res. Part B: Methodol. 17 (6). MLIT, 2015. Traffic Cencus of Mega-cities in Japan by Ministry of Land, Infrastructure, Transport and Tourism (in Japanese). Murakami, E., Wagner, D.P., 1999. Can using global positioning system (GPS) improve trip reporting? Transp. Res. Part C: Emerg. Technol. 7 (2). Nie, Y., Zhang, H., Recker, W., 2005. Inferring origin–destination trip matrices with a decoupled GLS path flow estimator. Transp. Res. Part B: Methodol. 39 (6). Pan, C., Lu, J., Di, S., Ran, B., 2006. Cellular-based data-extracting method for trip distribution. Transp. Res. Rec.: J. Transp. Res. Board 1945 (1). Parry, K., Hazelton, M.L., 2012. Estimation of origin–destination matrices from link counts and sporadic routing data. Transp. Res. Part B: Methodol. 46 (1). Santos, A., McGuckin, N., Nakamoto, H.Y., Gray, D., Liss, S., 2011. Summary of Travel Trends: 2009 National Household Travel Survey. . Singhal, A., 2001. Modern information retrieval: a brief overview. IEEE Data Eng. Bull. 24 (4), 35–43. Teng, J., Liu, W.-R., 2015. Development of a behavior-based passenger flow assignment model for urban rail transit in section interruption circumstance. Urban Rail Transit 1 (1). Van Zuylen, H.J., Willumsen, L.G., 1980. The most likely trip matrix estimated from traffic counts. Transp. Res. Part B: Methodol. 14 (3). Wang, M.-H., Schrock, S.D., Vander Broek, N., Mulinazzi, T., 2013. Estimating dynamic origin–destination data and travel demand using cell phone network data. Int. J. Intell. Transp. Syst. Res. 11 (2). Wargelin, L., Stopher, P., Minser, J., Tierney, K., Rhindress, M., O’Connor, S., 2012. GPS-based household interview survey for the Cincinnati, Ohio Region Tech. Rep.. Department of Transportation, Ohio. Weiner, E., 1992. History of urban transportation planning. In: Gray, G., Hoel, L. (Eds.), Public Transportation. Prentice Hall, Englewood Cliffs New Jersey. White, J., Wells, I., 2002. Extracting origin destination information from mobile phone data. In: The Eleventh International Conference on Road Transport Information and Control. Wilson, A.G., 1970. The use of the concept of entropy in system modelling. Oper. Res. Q. 21 (2). Wilson, A.G., 1971. A family of spatial interaction models, and associated developments. Environ. Plan. 3 (1). Xie, C., Kockelman, K.M., Waller, S.T., 2011. A maximum entropy-least squares estimator for elastic origin–destination trip matrix estimation. Transp. Res. Part B: Methodol. 45 (9). Yang, H., Sasaki, T., Iida, Y., Asakura, Y., 1992. Estimation of origin–destination matrices from link traffic counts on congested networks. Transp. Res. Part B: Methodol. 26 (6). Zhang, Y., 2012. Travel Demand Modeling Based on Cellular Probe Data. Zhou, X., Mahmassani, H.S., 2006. Dynamic origin–destination demand estimation using automatic vehicle identification data. IEEE Trans. Intell. Transp. Syst. 7 (1). Zhou, X., Mahmassani, H.S., 2007. A structural state space model for real-time traffic origin–destination demand estimation and prediction in a day-to-day learning framework. Transp. Res. Part B: Methodol. 41 (8).