Insights of human-transportation system interactions inferred from public transit operational data


Available online at www.sciencedirect.com

ScienceDirect

Transportation Research Procedia 32 (2018) 24–33

www.elsevier.com/locate/procedia

International Steering Committee for Transport Survey Conferences

Frédéric Roulland a,*, Jos Rozen a, John C. Handley b

a. Naver Labs Europe, 6 chemin de Maupertuis, 38240 Meylan, France

b. Goergen Institute for Data Science, University of Rochester, Rochester, NY 14627, USA

Abstract

Public transport operations data, and in particular fare collection data, can be used to reconstruct and analyse mobility patterns. So far, various methods have been proposed and studied in some specific contexts. This paper proposes a general framework for looking at all the core elements of mobility in the various possible operational settings of public transport. It also describes some novel methods for trip alignment, travel origin and destination detection, and vehicle load estimation. Two use cases illustrate the use of the framework and validate the efficiency of the proposed reconstruction methods.

© 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
Peer-review under responsibility of the International Steering Committee for Transport Survey Conferences (ISCTSC).

Keywords: Automatic fare collection data; public transit; mobility analysis

1. Introduction

The growing availability of operational data of all kinds generated by sensors and intelligent transportation systems, together with the development of big data technologies, offers new ways to understand human interactions with transportation services. Instead of relying heavily on surveys and manual collection of data samples, analysis and planning can leverage the massive amount of data available in a continuous and automated way, with a coverage that is close to the overall population. From these data we can understand the two connected dimensions of interest when studying the usage of transportation services: the demand for transportation and the quality of service provided to this demand.

* Corresponding author. Tel.: +33 476 614 148.
E-mail address: [email protected]

2352-1465 © 2018 The Authors. Published by Elsevier Ltd.
10.1016/j.trpro.2018.10.003


In public transport, origin/destination is one aspect of demand that has been increasingly studied using operational data, in particular fare collection data (Trépanier et al., 2007) (Munizaga & Palma, 2012) (Alsger et al., 2016). The major problem associated with these data is that many networks implement “tap-on”-only systems. Additional spatio-temporal inferences (Zou, Yao, Zhao, Wei, & Ren, 2016) or, where available, access to demographic information (Chu et al., 2009) enable some degree of automation of travel behaviour surveys. Another aspect of demand is route choice within the network for a given origin/destination. Here again, ticket validations occurring at transfer nodes between service routes have been used to model such route choice for simulation or optimization purposes (Sun & Xu, 2012) (Roulland et al., 2015) (Michel & Chidlovskii, 2016) (Tavassoli et al., 2017). Finally, passenger load per vehicle per trip is a crucial metric for many transit agencies. Although such load can be measured with automatic passenger counters (APC), these systems are expensive and not widely deployed. The same reconstruction of passenger travels from fare collection data could be used to estimate this load. However, fare collection validators are often not connected to the actual trip schedules followed by the vehicles. In this paper we present methods in which vehicle trips reconstructed from observed ticket validations are aligned with the most likely corresponding trip in the schedule. An important aspect of quality of service is reliability, which can be derived directly from Automatic Vehicle Location (AVL) systems or, again, from fare collection data (Sun & Xu, 2012) (Zhou et al., 2017). However, it is well understood that reliability is related to demand, and poor on-time performance is the major source of service complaints.
Roulland et al. (2017) proposed a method to calculate and visualize the subjective impact of time lost waiting, using a combination of fleet management data and passenger trajectories reconstructed from fare collection data. In this paper we focus on the reconstruction of vehicle and traveller activities that is at the core of the analytics applications listed above. We believe our effort is the first attempt to consider all the operational settings of fare collection that can coexist within a single public transportation network. Our work also considers the additional data that can be used in conjunction with fare collection data to improve the accuracy of the reconstruction. Whenever possible, we align each passenger's trajectory with a specific vehicle in operation; this provides richer information when exploring the microscopic aspects of demand and quality of service. This global approach involves specific novel methods for some of the reconstruction steps, which we describe in section 3. In section 4, we present two use cases in which we validated the quality of our reconstructed data by comparing it with manually collected data. We conclude by recalling the main achievements and limitations of our work and by giving a perspective on generalizing what has been done for public transport to a global, multimodal analysis of mobility patterns.

2. A generic approach for public transportation activity reconstruction

In the introduction we highlighted the various applications that can benefit from a reconstruction of each traveller's trajectory. However, the reconstruction tasks and their outcomes vary considerably depending on the settings of the transportation operations and on the data available. Table 1 proposes a classification that lists the most representative settings.
We consider as a first dimension the type of intelligent transport systems available as data sources: Automatic Fare Collection (AFC), Schedules (Sched), Automatic Vehicle Location (AVL), and Automated Passenger Counters (APC). We then differentiate the fare collection paradigms implemented by the transport service: either check-in only (CI) or check-in check-out (CICO). Fare validations can happen either on the platform or in the vehicle. In the latter case, it is important to distinguish systems that can locate the validation events from those that only provide a time stamp. From the combination of these parameters we have identified 22 possible scenarios in which some level of reconstruction is possible. Note that this is not an exhaustive list but rather a framework covering the most typical settings, which can be used as a reference for any particular project. Our table matches the various scenarios with all the attributes of travel activity that can be reconstructed. These attributes describe two perspectives. The passenger's travel perspective is viewed as a sequence of legs, each with a boarding stop location, an alighting stop location and an identification of the service or vehicle used for the trip. A last property defines whether the trip was going to the final destination of the travel or to a hub for making a connection. The vehicle or service perspective is a set of stop times for which we consider the number of travellers boarding (#in), the number of travellers alighting (#out), and the arrival and departure times.

From this classification we can describe whether a scenario allows the reconstruction of a given travel property, the level of uncertainty inherent in such a reconstruction, and the type of inference processing that is required. We describe the level of uncertainty on a qualitative scale, on which 0 means that the property can be obtained directly from the collected data. Each time we have to make an inference using a heuristic or a statistical model, in order to approximate the property's value or to reduce the dimensions of the problem, we increment the scale. This scale captures only the uncertainty associated with the reconstruction method; it should not be confused with the additional uncertainty that may come from errors produced by the data collection systems themselves. The colour coding of the cells describes the type of inference(s) being applied. We have identified 6 different types:
• Stop alignment (Red): Locating a fare validation is required whenever a validation happens in a vehicle that cannot locate the event. It can be done using a combination of the vehicle identification and the time stamps, and is more or less certain depending on the availability of a real or theoretical schedule of the vehicle.
• Passenger Alighting (Green): It is required for any check-in only AFC system and is well described in the field (Trépanier, Tranchant, & Chapleau, 2007). We distinguish, however, the uncertainty depending on whether the alighting can be inferred on a specific route and direction (in-vehicle validations) or has to be matched with all possible routes and directions crossing the stop (platform validations).
• Origin Destination detection (Light blue): Whether a passenger's alighting is the destination of the travel rather than a connection is usually inferred from temporal criteria (Munizaga & Palma, 2012). We propose to improve this method by reasoning about the possible routes and natural connections on the network (see section 3.1).
• Trip assignment and alignment (Gold): In order to reconstruct vehicle load, we want to assign each person's travel leg to one vehicle service trip. The uncertainty depends on whether fare collection is done in a vehicle, where we only have to choose among the trips of that vehicle, or on the platform, which requires considering all services fitting the spatio-temporal window of the travel. For a coherent and complete view of the actual operations, these service trips need to be aligned with the ones listed by AVL or schedules whenever available (see section 3.2).
• Occurrence sizing (Yellow): This consists of extrapolating the actual counts of boarding and alighting events from the ones that could be reconstructed from fare collection data. The first level of uncertainty is associated with boarding events for which no alighting could be inferred because there was no further validation. In that case a heuristic is required to distribute them on the network so that every passenger has alighted somewhere. The second level of uncertainty is associated with all the passengers who may travel without using the fare collection systems (fraud or special passes). This case can be better managed if there is at least some partial passenger counting on the network that can be used to model the additional load to be considered.
• Arrival/Departure window (Dark blue): This consists of determining the arrival and departure times of the vehicle from the temporal traces of validations happening during this vehicle's stop.
The uncertainty and the approach depend on the workflow at the vehicle stop (passengers validating and boarding on one side, the vehicle arriving and departing on the other). For simplicity, we have put the same uncertainty in our table.

We have implemented the set of methods corresponding to the six categories of inference described above as part of a larger software framework that fully automates the reconstruction of the transport activity. The execution of these algorithms is the third macroscopic step of our reconstruction process:
1. Data collection: data are collected from the various sources and systems in operation.
2. Normalization and cleaning: data are normalized to the same format and cleaned of all detected erroneous entries.
3. Reconciliation and enrichment: the algorithms described above are executed as needed.
4. Optimization: data are structured in an online analytical processing fashion for efficient access at query time.
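The Passenger Alighting inference listed above is commonly implemented with a trip-chaining heuristic: the alighting stop of a leg is taken to be the stop downstream of the boarding stop, on the boarded route and direction, that is closest to the traveller's next boarding stop, and the last leg of the day chains back to the first boarding. The sketch below only illustrates that idea and is not necessarily the exact variant used in this paper; the data layout (`Leg`, ordered stop lists per route and direction, planar coordinates in metres) and the 1000 m walking threshold are hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # planar stop coordinates in metres (assumption)


@dataclass(frozen=True)
class Leg:
    route: str          # route and direction boarded (hypothetical identifier)
    boarding_stop: str  # stop where the validation happened


def infer_alighting(
    legs: List[Leg],
    route_stops: Dict[str, List[str]],  # ordered stops of each route/direction
    coords: Dict[str, Point],
    max_walk_m: float = 1000.0,         # hypothetical walking threshold
) -> List[Optional[str]]:
    """Trip-chaining sketch: alight at the downstream stop closest to the next boarding."""
    if len(legs) < 2:
        # A single validation in the day gives nothing to chain with.
        return [None] * len(legs)
    alightings: List[Optional[str]] = []
    for i, leg in enumerate(legs):
        # Next boarding stop; the last leg of the day chains back to the first one.
        target = coords[legs[(i + 1) % len(legs)].boarding_stop]
        stops = route_stops.get(leg.route, [])
        if leg.boarding_stop not in stops:
            alightings.append(None)
            continue
        downstream = stops[stops.index(leg.boarding_stop) + 1:]
        if not downstream:
            alightings.append(None)
            continue
        best = min(downstream, key=lambda s: dist(coords[s], target))
        # Leave the leg unresolved if the implied walk is implausibly long.
        alightings.append(best if dist(coords[best], target) <= max_walk_m else None)
    return alightings


# Outbound on R1 from A, return on the reverse direction R1r from D.
coords = {"A": (0, 0), "B": (1000, 0), "C": (2000, 0), "D": (3000, 0)}
route_stops = {"R1": ["A", "B", "C", "D"], "R1r": ["D", "C", "B", "A"]}
print(infer_alighting([Leg("R1", "A"), Leg("R1r", "D")], route_stops, coords))
# ['D', 'A']
```

For platform validations (check-in on the platform), the same search would have to run over every route and direction crossing the boarding stop, which is the extra uncertainty the table assigns to those scenarios.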


The whole process is orchestrated through a scenario defined to fit all the settings identified in the transportation network under consideration.

Table 1: Classification of reconstruction scenarios

Rows: available data sources (AFC only; AFC + Sched; AFC + AVL; AFC + AVL + APC) crossed with the fare collection settings (CICO located vehicle; CI located vehicle; CICO platform; CI platform; CICO vehicle; CI vehicle), 22 scenarios in total.
Columns: Passenger travel (boarding stop, alighting stop, destination, service trip) and Vehicle stop time (# in, # out, arrival time, departure time).
Cells: level of uncertainty of the reconstruction (0 = obtained directly from the collected data), colour-coded by type of inference.

Such a level of automation and adaptability to the various sites has been achieved thanks to the logical decomposition of the problem presented in this section. In the following sections we present details and results achieved in specific settings using this framework.


3. Proposed reconstruction inference methods

3.1. Origin and destination detection

Origin and destination detection can be seen as the segmentation of the daily sequence of transport services used by a person into a set of travels, each motivated by a particular activity. Traditionally, this is done using a time threshold on either the maximum duration of a travel or the maximum duration of a transfer. However, because the threshold is usually chosen large enough to encompass long travels and transfers, some people can perform several short travels serving different goals within it. This can be illustrated by analyzing the route chosen by the traveler in Figure 1. If we considered only the time dimension, we would say that this traveler boarded at 1, transferred at 2, transferred again at 3 and alighted at destination 4. Although this was done within a limited time, this route with 2 transfers is a long detour compared with the direct line between 1 and 4. We propose to consider that such patterns are motivated by one or more intermediate goals along the route, and therefore to split this travel into several travels.

Figure 1: Example of a multiple goal travel

Prior work (Nassir et al., 2015) considers additional criteria on the spatial dimension or on the network structure in addition to time. Our approach follows this intuition of distinguishing 'coherent' itineraries, whose transfers are imposed by the structure of the network, from those whose transfers are motivated by a traveler goal. Rather than using predefined static criteria, which are not robust to all network topologies, we propose to use a trip planner that inherently considers these dimensions. For each observed route sequence generated from a temporal split and with more than one leg:
• We call our trip planner (Ulloa et al., 2016) between the origin and the destination and collect one or more alternative itineraries.
• If the route sequence chosen by the traveler is not among the trip planner results, we split the travel:
○ If the travel had only one transfer, we turn this transfer into the destination of a first travel and the origin of a second one.
○ If the travel had more transfers, we apply the approach recursively to two subsets of the trip:
– from the original departure up to the last transfer (1 to 3 in Figure 1);
– from the before-last transfer to the original destination (2 to 4 in Figure 1).
Through this recursive process, we assess whether each transfer node of the initial travel is a 'coherent' one. However, it may happen that no split is performed during the resolution of the two sub-problems. This would be the case in Figure 1, because each sub-part of the travel follows a 'coherent' itinerary. In this case, we have to split the initial travel arbitrarily at one of the two transfer locations. Note, however, that additional criteria could be considered to arbitrate between the two stops.
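The recursive split described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the trip planner of Ulloa et al. (2016) is replaced by a hypothetical `planner(origin, destination)` callable returning candidate route sequences, legs are assumed to already carry inferred alighting stops, the splits proposed by the two overlapping sub-problems are simply merged, and the arbitrary fallback split is placed at the last transfer.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Leg:
    route: str   # service route used for this leg
    board: str   # boarding stop
    alight: str  # (inferred) alighting stop


# Hypothetical trip planner interface: candidate route sequences between two stops.
Planner = Callable[[str, str], List[Tuple[str, ...]]]


def is_coherent(legs: Sequence[Leg], planner: Planner) -> bool:
    """A multi-leg travel is 'coherent' if the planner proposes its route sequence."""
    observed = tuple(leg.route for leg in legs)
    return observed in planner(legs[0].board, legs[-1].alight)


def split_points(legs: Sequence[Leg], planner: Planner) -> Set[int]:
    """Indices k of the transfers (between legs[k] and legs[k + 1]) where to split."""
    if len(legs) < 2 or is_coherent(legs, planner):
        return set()
    if len(legs) == 2:
        return {0}  # the single transfer becomes a destination and a new origin
    # Origin to last transfer, and before-last transfer to destination.
    left = split_points(legs[:-1], planner)
    right = {k + 1 for k in split_points(legs[1:], planner)}
    points = left | right
    if not points:
        # Both sub-travels are coherent: split arbitrarily (here, at the last transfer).
        points = {len(legs) - 2}
    return points


def segment(legs: Sequence[Leg], planner: Planner) -> List[List[Leg]]:
    """Split one temporally grouped sequence of legs into goal-based travels."""
    travels: List[List[Leg]] = []
    start = 0
    for k in sorted(split_points(legs, planner)):
        travels.append(list(legs[start:k + 1]))
        start = k + 1
    travels.append(list(legs[start:]))
    return travels


# Figure 1-like example: 1 -> 2 -> 3 -> 4 with lines A, B, C, while line D is direct.
PLANNER_TABLE = {("1", "4"): [("D",)], ("1", "3"): [("A", "B")], ("2", "4"): [("B", "C")]}


def toy_planner(origin: str, destination: str) -> List[Tuple[str, ...]]:
    return PLANNER_TABLE.get((origin, destination), [])


legs = [Leg("A", "1", "2"), Leg("B", "2", "3"), Leg("C", "3", "4")]
print([[leg.route for leg in travel] for travel in segment(legs, toy_planner)])
# [['A', 'B'], ['C']]
```

Because both sub-travels of the example are coherent, the sketch falls back to the arbitrary split, mirroring the Figure 1 discussion; a different tie-break rule would only change the body of the `if not points` branch.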


This method relies heavily on the ability of our trip planner to capture objective criteria for choosing a route sequence for a given origin and destination. We have illustrated our demonstration by assuming that a traveler minimizes trip distance and number of transfers when selecting a route sequence, because these criteria are visually easy to represent and seem plausible. However, we acknowledge that, in theory, some itineraries may be considered as a set of travels with different goals although they are just the result of a traveler's choice based on qualitative criteria other than the ones used in our trip planner.

3.2. Service trip assignment and alignment

Service trip alignment corresponds to the need to match a sequence of vehicle stop times, observed or evidenced by boarding validations, with a description of a service trip in the schedule. It is a complex problem when done with a theoretical schedule, because the observed stop times are only partial observations of the sequence of trips performed by a vehicle, and because operations may deviate substantially from the schedule (late/early trips, trip cancellations, trip additions). Marey’s diagram in Figure 2 illustrates the problem.

Figure 2: Marey’s diagram illustrating the trip alignment problem. Diamond-shaped points are validations and oblique lines are theoretical trips for the same spatio-temporal window.

The diamond-shaped points represent observed validations of a specific vehicle, whereas the oblique lines represent the closest theoretical trips described for this sequence of stops and time window. Although most of the points can most likely be associated with the theoretical stop times of the first or second trip, the orange point could reasonably be considered as belonging to the first trip (assuming some delay) or to the second one (assuming an early arrival). We propose to address the problem in six steps:
• Validations are routed and directed individually. For this step, we consider validations at the traveler's level and apply the same heuristic as for alighting inference. In this case, however, we do not try to predict the alighting stop but only the route and direction taken by the traveler. At the end of this step, only a subset of all validations has been given a route and direction.
• Observed vehicle stop times are compiled. For each vehicle, we group together all validations that happened at the same stop within a small time window; we usually take 30 min as a parameter. We can attribute a route and direction to each created stop time if at least one of its validations was routed and directed in the previous step and if there is no contradiction between the routes and directions of the labelled validations in the same stop time.
• A direction is attributed to all stop times. For each vehicle, we order all stop times chronologically and identify missing directions by comparing the sequence of a stop time and its previous and following stop times with the topology of the route. If the next stop time is reachable from the current stop time's direction, and if the observed duration between the two is less than the theoretical duration it would take to go to the end of the trip and come back to the second stop time in the reverse direction, then we consider that the next stop time keeps the same direction as the current one. Otherwise, we assign the reverse direction to the next stop time.
• Observed trips are created. Once all stop times have a direction, we split the sequence of the vehicle's stop times into trips by simply detecting changes of direction.
• Observed trips are aligned with theoretical trips. As a distance, we use the sum of the temporal distances between observed and theoretical stop times divided by the number of observed stop times. In a first round, for each observed trip, we look for the closest theoretical trip according to this distance, and we align the observed trips that do not conflict with another observed trip over the same closest theoretical trip. In a second round, we assign the remaining trips through a global minimization of the distances. We discard, however, trips that do not have a theoretical match below a distance of 300 seconds.
• Theoretical stop times of aligned trips that have no observed validation, but are potential candidates for alighting events, are interpolated. This is done by linearly interpolating the delays with respect to the theoretical schedule between two observed stop times.
At the end of this process, we have a set of aligned vehicle trips comprising all the theoretical stop times, with observed delays and numbers of validations at some stop times and interpolated delays for the others.

4. Use cases and experimental validation

4.1. Origin destination analysis on Lima’s Bus Rapid Transit (BRT)

We have studied the reconstruction of passengers’ travels to support origin destination analysis on the BRT lines of Lima (Peru).
This system is about 30 km long and is composed of 40 stops and three routes that cross the city from north to south. The settings of this network correspond to the fourth scenario in Table 1: buses circulate in dedicated lanes with no predefined schedule. We only had access to the fare collection data, which come from check-in-only validators on the platform of each stop. Our reconstruction process is therefore quite limited: the passenger's alighting is inferred using the state-of-the-art method, and the origins and destinations of travels are derived with the method of section 3.1. We validated our results using a survey carried out on 27 September 2015 covering 50 745 travels. For each travel, the survey collected the boarding stop at origin, the alighting stop at destination and the traveler's identifier, which allowed a one-to-one matching with the travels reconstructed from fare collection data. The reconstruction used the 50 704 validations whose ticket identifier was present in the survey, out of the 95 554 collected that day. From these we obtained 49 534 raw reconstructed travels, and 50 432 after running our goal-based travel segmentation method, to be compared with the 50 745 travels of the survey. We initially obtained 97.4% of trips with the correct alighting stop at destination, which is a very high score compared with previous studies in the literature. Our interpretation is that this very linear network topology does not allow complex usage behaviours and perfectly fits the assumptions inherent in the passenger alighting inference heuristic. In this first phase we had not used our goal-based travel segmentation, only a temporal split of 60 min. When applying our goal-based segmentation, we reached 99.4% correct destinations, which validates the benefit of this approach even in a rather simple topology.

4.2. Vehicle load estimation in Nancy’s buses

We have studied the reconstruction of passenger travels and vehicle stop times to estimate the vehicle load on each segment of service trips in the public transport network of Nancy (France). This network is composed of 2 tramway lines and 36 bus routes serving 1032 stops, and carries about 100 000 unique passenger trips every weekday.


The settings of this network correspond to the eighth scenario in Table 1. All vehicles are equipped with check-in-only validators that can localize the stop at which each validation happens. Every service trip follows a publicly available schedule, which we used in our reconstruction. In addition, a few stops are equipped with platform validators, and for this small subset of data (about 10%) we are in the much more complex tenth scenario of Table 1. Our reconstruction uses three types of inference to estimate the vehicle load: we first associate each validation with one service trip in the schedule, then infer an alighting for each boarding among the stops served by this trip, and finally estimate the number of people alighting and boarding at each vehicle stop.
• The alighting inference uses the same method as in the Lima use case, except that the possible alighting stops are limited to those served by the vehicle on this trip.
• The alignment of each boarding event to one vehicle trip uses the method described in section 3.2.
• For the number of people boarding and alighting, we do not consider people who board without validating, and we take the numbers of boarding and alighting events reconstructed from validations as estimates of the actual numbers.
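The trip alignment step of section 3.2, on which the second bullet above relies, can be sketched under simplifying assumptions: trips are hypothetical dictionaries mapping stop identifiers to times in seconds, the distance is the mean absolute difference between observed and scheduled times at the observed stops (infinite when the scheduled trip does not serve one of them), and the second-round global minimization is an exhaustive search, which only suits a handful of conflicting trips. For realistic instances an assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be the usual replacement. Observed trips whose closest scheduled trip is farther than the 300-second threshold are discarded up front.

```python
from itertools import permutations
from math import inf
from typing import Dict, List, Optional, Tuple

StopTimes = Dict[str, float]  # stop id -> time in seconds since midnight
MAX_DIST_S = 300.0            # threshold quoted in section 3.2


def trip_distance(observed: StopTimes, scheduled: StopTimes) -> float:
    """Sum of |observed - scheduled| over observed stop times, divided by their number."""
    if not observed or any(stop not in scheduled for stop in observed):
        return inf
    return sum(abs(t - scheduled[s]) for s, t in observed.items()) / len(observed)


def align_trips(observed: Dict[str, StopTimes],
                schedule: Dict[str, StopTimes]) -> Dict[str, str]:
    """Two-round alignment of observed trips with scheduled trips (small instances)."""
    if not observed or not schedule:
        return {}
    d = {(o, t): trip_distance(obs, sched)
         for o, obs in observed.items() for t, sched in schedule.items()}

    # Round 1: each observed trip claims its closest scheduled trip; trips with
    # no scheduled trip within MAX_DIST_S are discarded straight away.
    claims: Dict[str, List[str]] = {}
    for o in observed:
        closest = min(schedule, key=lambda t: d[o, t])
        if d[o, closest] <= MAX_DIST_S:
            claims.setdefault(closest, []).append(o)
    aligned = {group[0]: t for t, group in claims.items() if len(group) == 1}
    remaining = [o for group in claims.values() if len(group) > 1 for o in group]

    # Round 2: exhaustive global minimization for the conflicting trips.
    # None stands for "left unaligned" and costs MAX_DIST_S.
    taken = set(aligned.values())
    free: List[Optional[str]] = [t for t in schedule if t not in taken]
    free += [None] * len(remaining)
    best: Tuple[Optional[str], ...] = ()
    best_cost = inf
    for perm in permutations(free, len(remaining)):
        cost = sum(MAX_DIST_S if t is None else d[o, t] for o, t in zip(remaining, perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    aligned.update({o: t for o, t in zip(remaining, best) if t is not None})

    # Keep only alignments within the threshold.
    return {o: t for o, t in aligned.items() if d[o, t] <= MAX_DIST_S}


# Two scheduled trips 5 min apart; both observed trips are closest to T1 (conflict).
schedule = {"T1": {"S1": 25200, "S2": 25500}, "T2": {"S1": 25500, "S2": 25800}}
observed = {"O1": {"S1": 25250}, "O2": {"S1": 25340}}
print(align_trips(observed, schedule))  # {'O1': 'T1', 'O2': 'T2'}
```

In the example, O2 is slightly closer to T1 (140 s) than to T2 (160 s), but the global minimization gives T1 to O1 (50 s) and T2 to O2, which is the kind of ambiguity the orange point of Figure 2 illustrates.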

Figure 3: Absolute error distribution of Nancy’s study

We validated our results using a survey carried out on weekdays between 10 and 16 January 2017 on 96 service trips distributed over three representative routes of the network. We were able to compare the reconstructed and surveyed load after each stop time of the 96 trips, and obtained a mean absolute error (MAE) of 4.63 passengers. The distribution of this error in Figure 3 shows that it is very often below 4 passengers but can increase significantly in outlier scenarios caused by abnormal behavior. Figure 4 gives a qualitative view of the agreement between the reconstructed load and the reference load; for such a trip, the reconstruction can act as a reasonable proxy for understanding the load pattern of a service trip. It is difficult to determine precisely from these data the impact of each level of uncertainty on the final error; however, we know that the 10% of data validated on the platform had a negative impact on the quality of the reconstruction, and that this setting should be considered a relatively complex one (see Table 1).

[Plot: vehicle load (0 to 50) against time of day (6:56 to 7:26)]

Figure 4: Example of reconstructed vehicle load for one of the studied trips (survey: red curve, reconstruction: blue curve)

5. Conclusion

Fare collection system data are an important source of information for understanding mobility patterns and travellers' interactions with transportation services. In this paper we have proposed a theoretical framework that characterizes what can be reconstructed from such data, with what level of uncertainty, and for which applications. This characterization is broken down according to the various settings of the system in the field and the availability of additional data. We also presented two use cases corresponding to two different applications: 1) origin-destination analysis and 2) vehicle load estimation. These use cases illustrate the practical application of our theoretical framework. They also validate the applicability of the reconstruction methods used in these scenarios and measure their performance. Our framework theoretically qualifies the inherent level of uncertainty of the reconstruction according to the settings. Beyond this, the accuracy actually obtained is strongly influenced by certain aspects of the studied network, such as:
• The complexity and topology of the network
• The number of people not using the automatic fare collection system
• The mobility behaviours of the travellers, such as willingness to walk when transferring, complexity of daily activities and asymmetric use of transportation over the day.
These aspects cannot be determined in advance, but the very high accuracy obtained in Lima compared with the literature (Munizaga et al. 2014) and the variability of the error in the vehicle load estimated from fare collection data in Nancy illustrate their impact on the results. In the future we would like to implement inference algorithms relying on stochastic models that can provide a confidence interval for each estimated property, so that analysts can incorporate this uncertainty into their work and conclusions. We believe this is a prerequisite for a generalized adoption of such methods in the field of mobility analytics. Another important limitation of the approach is that it assumes a single integrated fare collection system is used for all transportation services in the studied area. This is not always the case, in particular if we want to understand mobility patterns across all types of mobility services, including on-demand services such as parking lots, car-pooling, car-sharing, bike-sharing systems and the like. A fragmented view of mobility is, however, very limiting for analysis, especially of origins and destinations. In future work, we should consider how to generalize the approach, developed mainly in the context of public transport usage, to a more multimodal context using emerging sources of data.

Acknowledgements

We would like to thank the teams of Conduent, Inc. and Grand Nancy for their collaboration in collecting the data used in the two presented cases.

References

Alsger, A., Assemi, B., Mesbah, M., & Ferreira, L. (2016). Validating and improving public transport origin–destination estimation algorithm using smart card fare data. Transportation Research Part C: Emerging Technologies, Volume 68, 490-506.
Chu, K., Chapleau, R., & Trépanier, M. (2009). Driver-assisted bus interview: Passive transit travel survey with smart card automatic fare collection system and applications. Transportation Research Record, Volume 2105, 1-10.
Michel, S., & Chidlovskii, B. (2016). Stochastic optimization of public transport schedules to reduce transfer waiting times. IEEE Second International Smart Cities Conference, 12-15 September 2016. Trento, Italy.
Munizaga, M. A., & Palma, C. (2012). Estimation of a disaggregate multimodal public transport Origin–Destination matrix from passive smartcard data from Santiago, Chile. Transportation Research Part C: Emerging Technologies, Volume 24, 9-18.
Munizaga, M., Devillaine, F., Navarrete, C., & Silva, D. (2014). Validating travel behavior estimated from smartcard data. Transportation Research Part C: Emerging Technologies, Volume 44, 70-79.
Nassir, N., Hickman, M., & Ma, Z.-L. (2015). Activity detection and transfer identification for public transit fare card data. Transportation, 683-705.
Roulland, F., Ulloa, L., & Handley, J. (2017). Measuring perceived impact of schedule deviation in public transport. TRB 96th Annual Meeting. Washington DC, United States: Transportation Research Board.
Roulland, F., De Souza, C., Ulloa, L., Mondragon, A., Niemaz, M., & Ciriza, V. (2015). Towards data-driven simulations in urban mobility analytics. 14th ITS Asia Pacific Forum 2015, April 27th. Nanjing.
Sun, Y., & Xu, R. (2012). Rail transit travel time reliability and estimation of passenger route choice behavior analysis using automatic fare collection data. Transportation Research Record, Volume 2275(2), 58-67.
Tavassoli, A., Mesbah, M., & Hickman, M. (2017). Quantifying error in transit assignment using smart card data in a large-scale multimodal transit network. TRB 96th Annual Meeting. Washington DC, United States: Transportation Research Board.
Trépanier, M., Tranchant, N., & Chapleau, R. (2007). Individual trip destination estimation in a transit smart card automated fare collection system. Journal of Intelligent Transportation Systems, Volume 11, 1-14.
Ulloa, L., Lehoux-Lebacque, V., & Roulland, F. (2016). Trip planning within a multimodal urban mobility. ITS European Congress, June 6-9, 2016. Glasgow.
Zhou, Y., Yao, L., Chen, Y., Gong, Y., & Lai, J. (2017). Bus arrival time calculation model based on smart card data. Transportation Research Part C: Emerging Technologies, Volume 74, 81-96.
Zou, Q., Yao, X., Zhao, P., Wei, H., & Ren, H. (2016). Detecting home location and trip purposes for cardholders by mining smart card transaction data in Beijing subway. Transportation, 1-26.