Modelling passenger waiting time using large-scale automatic fare collection data: An Australian case study


Transportation Research Part F 58 (2018) 500–510

Contents lists available at ScienceDirect

Transportation Research Part F journal homepage: www.elsevier.com/locate/trf

Ahmad Tavassoli a,⇑, Mahmoud Mesbah a,b, Ameneh Shobeirinejad c

a School of Civil Engineering, The University of Queensland, Brisbane, Australia
b Department of Civil and Environmental Engineering, Amirkabir University, Tehran, Iran
c School of Information Communication Technology, Griffith University, Brisbane, Australia

Article info

Article history: Received 7 October 2017; received in revised form 5 May 2018; accepted 25 June 2018

Abstract

Passenger waiting time at transit stops is an important component of overall travel time and is perceived to be less desirable than in-vehicle travel time or access time. Therefore, an accurate model to estimate waiting time is necessary to better plan for transit and to improve patronage. The majority of previous studies on waiting time have either made very limiting assumptions on the arrival distribution of passengers or lacked a large-scale and high-quality dataset. The smartcard fare collection system in South-East Queensland, Australia, has provided very large-scale and highly accurate data on passenger boarding and alighting times and locations. In this research, all 130,000 daily rail passengers in all 145 stations of a network are considered. First, a methodology is developed to match each individual passenger with the most likely rail service he/she boarded. Then, a hazard-based duration modelling approach is adapted to model passenger waiting time as a function of a variety of factors that influence waiting time. Log-logistic accelerated failure time (AFT) models are inferred to be appropriate among the models tested. The results indicate that: (a) the waiting time can be predicted accurately at various confidence levels; (b) the waiting time at all network stations can be predicted with a single model; and (c) a wide range of influencing parameters are statistically significant in the model, which can be categorized into temporal, infrastructure and operation, demographic, and trip characteristics parameters. The results of this study can be used for demand estimation, operational analysis, transit scheduling, and network design through an understanding of the effects of influential variables on waiting time.

© 2018 Elsevier Ltd. All rights reserved.

1. Introduction

Increasing the share of transit is the objective of many transport authorities as a means to reduce traffic congestion and emissions and to increase the efficiency of transport systems. To this end, a deep understanding of the components of this mode is essential. Considering a transit journey, waiting time is perceived as more onerous than in-vehicle travel time and access time (Wardman, 2004; Ceder, 2016). Waiting time has been studied extensively in the literature and is used in many transit models as a key variable affecting travel time, travel time reliability, and use of transit from a passenger's perspective (Cats and Gkioulou, 2014). The waiting time models in the literature mainly identify two extremes for the spectrum of passengers.

⇑ Corresponding author at: School of Civil Engineering, Faculty of Engineering, Architecture and Information Technology, The University of Queensland, St Lucia, Brisbane, QLD 4072, Australia. E-mail address: [email protected] (A. Tavassoli).
https://doi.org/10.1016/j.trf.2018.06.037
1369-8478/© 2018 Elsevier Ltd. All rights reserved.


When the headways are long, the majority of passengers use the schedule to minimize their waiting time; however, depending on the service reliability, they leave a margin to arrive at the station/stop before the scheduled service time. When the headways are short, most passengers arrive randomly as they can catch a service at any time. At any given station/stop there is a mixture of both groups of passengers, which makes the prediction of waiting time challenging. The literature is reviewed in greater detail in the next section.

Despite the importance of waiting time, a lack of large-scale and high-quality data has limited the ability of previous models to estimate waiting time accurately. The theoretical models reviewed in the next section were developed under a few specific assumptions, and the empirical models were calibrated with small or incomplete datasets. With the availability of smartcard fare payment data, a new horizon is now open for developing robust models validated by high-quality data on a large scale.

A unique dataset from TransLink, the transit authority in South-East Queensland (SEQ), Australia, is used in this study. The smartcard fare payment system (called GoCard) requires passengers of all types to 'touch on' when entering a station or boarding a transit vehicle as well as 'touch off' when exiting a station or alighting from a transit vehicle. Because of the strong fare incentives, close to 85% of transit passengers use a GoCard. In this study, all transactions of all rail passengers across the network during a complete day are included for analysis. This is equal to approximately 130,000 passengers in 145 stations during one day.

This study aims to model the passenger waiting time using the above large-scale database. After testing various theoretical distributions, a hazard-based duration model is proposed to estimate the waiting time. The prediction power of the model is tested by empirical observations.
Furthermore, the effect of a wide range of independent variables is tested in the model.

2. Literature review

Waiting time has been identified as a significant parameter since the 1970s. Huddart's (1973) study, based on data collected in London, UK, showed that passengers arrive randomly when the headway is short and target a specific service when the headways are long. Ceder and Marguier (1985) derived waiting time distributions under varying assumptions, such as no queuing to board a transit vehicle and a constant rate of passenger arrivals.

In addition to headway, waiting time is also affected by a range of other variables. Many studies have investigated the effect of reliability on passenger waiting time (Turnquist, 1978; Bowman and Turnquist, 1981; Currie and Csikos, 2007; Luethi et al., 2007). Frumin and Zhao (2012) focused on stops with heterogeneous services (more than one service pattern). The present study considers this factor, as the GoCard data source is rich enough to identify the destination; the study can therefore include the services connecting the origin and destination in headway calculations. The effect of weather on waiting time has been investigated by Lam and Morrall (1982) and Mishalani, McCord, and Wirtz (2006). Cats and Gkioulou (2014) examined the effect of information and reliability on waiting time.

In terms of the scale of data used, smaller-scale studies have usually been undertaken on bus and light rail systems, using interviews or direct observation (Salek and Machemehl, 1999; Zahir, Matsui, & Fujita, 2000; Fan and Machemehl, 2009; Iseki and Taylor, 2010; Psarros, Kepaptsoglou, & Karlaftis, 2011; Bouzir et al., 2014). Larger-scale studies such as those in Brisbane (Tavassoli, Mesbah, & Hickman, 2017), London (Frumin and Zhao, 2012) and Melbourne (Csikos and Currie, 2008) have used a heavy rail data source.
Note that in the former study only a part of the network (London Overground) was included, and in the latter the focus was on the arrival distribution as a time series rather than the waiting time itself.

A hazard-based duration modelling approach is suitable for dealing with duration data that are positive and can be censored and time-varying (Bhat and Pinjar, 2008). This hazard-based approach is common in many disciplines including biomedical science, social sciences, and engineering (Hensher and Mannering, 1994). In the transport field, this method has been applied over the last two decades in modelling some time-related events including safety, traffic studies, vehicle ownership, and activity-based models. Examples include time between planning and execution of an activity (Bhat and Pinjar, 2008), residential location (Rashidi, Auld, & Mohammadian, 2012), duration of shopping activity (Bhat, 1996), length of traffic delay (Mannering, Kim, Barfield, & Ng, 1994), incident duration (Tavassoli Hojati, Ferreira, Washington, Charles, & Shobeirinejad, 2014), the analysis of urban travel time (Anastasopoulos et al., 2012) and congestion duration for rail passenger flow (Shi et al., 2016).

Many of the previous studies on waiting time estimation cannot be generalized to other cases because: (1) the sample size is small, up to several hundred passengers; (2) the waiting time data are incomplete or of poor quality; and (3) although a few proposed models have rather high efficiency, their applicability to other locations and times has not been verified for consistency. This means some of the variables could be significant in one location but not significant in another. Therefore, an analysis of passenger waiting time on a large scale is carried out here to validate the factors arising in the literature, to identify other potential factors that might influence the waiting time, and to better understand patronage behavior in the Australian context.

3. Model development

Since waiting time varies from one passenger to another, a probabilistic method can be utilized to model waiting time. The results of probabilistic methods provide useful insights for selecting an appropriate strategy in transit operations and planning. In this regard, the hazard-based duration concept in probabilistic methods, which is well suited for analyzing time-related data, is employed to model waiting time. A hazard model is based on survival analysis, which is a collection of statistical procedures for data analysis in which the outcome variable of interest is the time until an event occurs (or ends). Waiting time can be defined as the time from when a passenger arrives at a station and validates a smart fare card (GoCard) until boarding the first available service. This length of time is a continuous random variable T with a cumulative distribution function F(t) and probability density function f(t). F(t) is also known as the failure function and gives the probability that the waiting time ends (the passenger boards) before some specified time t. Conversely, the survival function, S(t), is the probability of the duration being greater than some specific time t.

$$F(t) = \Pr(T \le t) = 1 - \Pr(T > t) = 1 - S(t) \tag{1}$$

The hazard function h(t) gives the instantaneous potential per unit time for the event to occur, given that the individual has survived up to time t (Washington, Karlaftis, & Mannering, 2011).

$$h(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{S(t)} = \lim_{\Delta t \to 0} \frac{\Pr(t + \Delta t \ge T \ge t \mid T \ge t)}{\Delta t} \tag{2}$$
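As a quick numerical illustration of Eqs. (1) and (2), the sketch below checks that the ratio f(t)/S(t) agrees with the limit definition of the hazard. It assumes a log-logistic survival function purely for convenience; the parameter values (`lam`, `p`) and the function names are illustrative and are not taken from the study.

```python
import math


def survival(t, lam, p):
    """Assumed log-logistic survival function S(t) = 1 / (1 + (lam*t)**p)."""
    return 1.0 / (1.0 + (lam * t) ** p)


def density(t, lam, p):
    """Density f(t) = -dS/dt of the assumed log-logistic distribution."""
    return lam * p * (lam * t) ** (p - 1) / (1.0 + (lam * t) ** p) ** 2


def hazard(t, lam, p):
    """Hazard as the ratio in Eq. (2): h(t) = f(t) / S(t)."""
    return density(t, lam, p) / survival(t, lam, p)


def hazard_limit(t, lam, p, dt=1e-6):
    """Finite-difference version of the limit in Eq. (2):
    Pr(t <= T <= t + dt | T >= t) / dt for a small dt."""
    s_t = survival(t, lam, p)
    return (s_t - survival(t + dt, lam, p)) / (s_t * dt)


if __name__ == "__main__":
    lam, p = 0.2, 1.7  # illustrative values only
    for t in (1.0, 5.0, 15.0):
        print(t, hazard(t, lam, p), hazard_limit(t, lam, p))
```

The two columns printed for each t should agree to several decimal places, which is all Eq. (2) asserts.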

The proportional hazard (PH) model and the accelerated failure time (AFT) model are the two alternative parametric approaches that can incorporate the effect of external covariates on the hazard function (Rashidi and Mohammadian, 2011; Greene, 2012b). With fully parametric models, a variety of distributional alternatives for the hazard function, namely exponential, Weibull, log-logistic and lognormal, can be interpreted as AFT. In comparison, only exponential and Weibull distributions can accommodate PH assumptions (Jenkins, 2005). The key assumption for an AFT model is that survival time accelerates (or decelerates) by a constant factor when comparing different levels of covariates, whereas the key assumption of a PH model is that hazard ratios are constant over time. Nevertheless, the AFT assumption allows for the estimation of an acceleration factor, which can describe the direct effect of an exposure on survival time (Kleinbaum and Klein, 2012). Therefore, in this study, the AFT model is utilized to model passenger waiting time. The AFT model assumes a linear relationship between the log of survival time T and the vector of explanatory variables X. In this model, the effect of external covariates on survival time is direct and accelerates or decelerates the time to transaction.

$$\ln(T) = \beta X + \varepsilon \tag{3}$$

where $\beta$ is a vector of estimated coefficients and $\varepsilon$ is an error term. A general formulation for AFT can be written:

$$h(t; X) = \psi\, h_0(\psi t) \tag{4}$$

where $h_0(\cdot)$ is the baseline hazard function, $\psi = \exp(-(\beta'_0 + \beta'_1 x_1 + \dots + \beta'_n x_n)) = \exp(-(\beta' X))$, and n is the number of explanatory variables. In estimating Eq. (4) with fully parametric models, three distributional alternatives were considered for the hazard function, namely Weibull, lognormal and log-logistic, which are tested to find the best fit to the data. The functional forms of the hazard function for each model can be derived using the general function and each distribution model with the positive location and scale parameters. These models are fitted using the maximum likelihood method, where the distributions and corresponding models are given by Greene (2012a) and Washington et al. (2011):
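The time-scaling idea behind Eqs. (3) and (4) can be illustrated with a small sketch. It assumes a unit-scale log-logistic baseline and a hypothetical coefficient vector `beta` with an intercept and one binary covariate; all names and numbers are placeholders rather than estimates from the study.

```python
import math


def psi(beta, x):
    """Scaling factor psi = exp(-(beta . x)), as in Eq. (4)."""
    return math.exp(-sum(b * xi for b, xi in zip(beta, x)))


def baseline_survival(t, p):
    """Assumed baseline: log-logistic survival with unit scale."""
    return 1.0 / (1.0 + t ** p)


def aft_survival(t, beta, x, p):
    """AFT survival S(t | x) = S0(psi * t): covariates only rescale time."""
    return baseline_survival(psi(beta, x) * t, p)


def aft_median(beta, x):
    """Median of S(t | x). The unit-scale baseline has median 1,
    so the median under covariates is 1 / psi = exp(beta . x)."""
    return 1.0 / psi(beta, x)


if __name__ == "__main__":
    beta = [2.0, -0.5]      # hypothetical: intercept, binary covariate
    off, on = [1, 0], [1, 1]
    ratio = aft_median(beta, on) / aft_median(beta, off)
    print("median ratio:", ratio, "exp(beta_1):", math.exp(beta[1]))
```

Switching the binary covariate from 0 to 1 multiplies every quantile of the duration by exp(beta_1), which is the acceleration factor referred to in the text.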

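Weibull distribution:

$$h(t) = \lambda p (\lambda t)^{p-1} \tag{5}$$

Lognormal distribution:

$$h(t) = \frac{\phi(p\,\mathrm{Log}(\lambda t))}{\Phi[-p\,\mathrm{Log}(\lambda t)]} \tag{6}$$

Log-logistic distribution:

$$h(t) = \frac{\lambda p (\lambda t)^{p-1}}{1 + (\lambda t)^{p}} \tag{7}$$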

where λ and p are two parameters, known as the location parameter and scale parameter, respectively; φ(·) is the standard normal probability density function and Φ(·) is the standard normal cumulative distribution function. In this study, Log is the natural (Napierian) logarithm. The results of the modelling can be interpreted according to the value of p for each model. This approach is based on the assumption that the effect of any individual explanatory variable is the same for each observation; therefore, parameters are treated as constant across observations. However, passenger waiting time might not be homogeneous across observations, and an improperly specified model may lead to erroneous inferences; the following approach is therefore implemented to address the influence of unobserved heterogeneity. This approach assumes that heterogeneity follows a gamma distribution over the population with mean 1 and variance θ and incorporates it into the Weibull model, as described by Washington et al. (2011).

Weibull with gamma heterogeneity distribution:

$$h(t) = \frac{\lambda p (\lambda t)^{p-1}}{1 + \theta (\lambda t)^{p}} \tag{8}$$
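The hazard functions in Eqs. (5) to (8) are simple enough to write out directly. The sketch below is an illustrative transcription, not the LIMDEP implementation used in the study; the lognormal form takes the sign convention phi(z)/Phi(-z) with z = p*Log(lam*t), which is an assumption about how Eq. (6) is to be read, and the parameter values in the example are arbitrary.

```python
import math


def weibull_hazard(t, lam, p):
    """Eq. (5): h(t) = lam * p * (lam*t)**(p-1)."""
    return lam * p * (lam * t) ** (p - 1)


def loglogistic_hazard(t, lam, p):
    """Eq. (7): Weibull numerator divided by 1 + (lam*t)**p."""
    return lam * p * (lam * t) ** (p - 1) / (1.0 + (lam * t) ** p)


def weibull_gamma_hazard(t, lam, p, theta):
    """Eq. (8): Weibull hazard with gamma heterogeneity of variance theta."""
    return lam * p * (lam * t) ** (p - 1) / (1.0 + theta * (lam * t) ** p)


def _std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_hazard(t, lam, p):
    """Eq. (6), read as phi(z) / Phi(-z) with z = p * ln(lam*t)
    (assumed sign convention; Phi(-z) plays the role of the survival function)."""
    z = p * math.log(lam * t)
    return _std_normal_pdf(z) / _std_normal_cdf(-z)


if __name__ == "__main__":
    lam, p = 0.2, 1.7  # arbitrary illustrative values
    for t in (1.0, 5.0, 20.0):
        print(t, weibull_hazard(t, lam, p), loglogistic_hazard(t, lam, p))
```

Two limiting cases are worth noting: with theta = 0 the heterogeneity model of Eq. (8) collapses to the Weibull hazard of Eq. (5), and with theta = 1 it has the same functional form as the log-logistic hazard of Eq. (7).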


In this study, likelihood ratio statistics as described in Washington et al. (2011) were calculated and compared to select the best fitting model. The highest level of significance for this statistic indicates better goodness of fit. Statistical analyses were performed using LIMDEP (Version 10, Econometric Software Inc., NY, USA).

4. Data description

Smartcard technologies not only bring a variety of benefits for both travellers and service providers but also facilitate access to high-quality and comprehensive information on the travel behavior of passengers, as well as the performance of a transit system. To apply the proposed modelling approach in this paper, a unique dataset from TransLink, the transit authority in SEQ, Australia, has been used. GoCard is TransLink's electronic ticket that lets passengers travel seamlessly in the SEQ network. Fig. 1 shows the map of the middle part of this network. The Central Business District of Brisbane, the capital of Queensland, is shown in the middle. The SEQ rail network extends to the Sunshine Coast in the north and to the Gold Coast in the south.

Typically, passengers are required to both touch on and touch off at the time of boarding and alighting. Therefore, the system records a variety of trip information, namely Card ID, stop location and associated time for both boarding and alighting, route name (station name for rail), direction and some other information, such as the journey ID used for fare calculation. For the rail system, touch-on is made at stations. Therefore, trip information including the route name and direction is not available in trip transactions. However, this information is required to calculate passenger waiting time for each trip.
In addition, the GoCard dataset provides the location of boarding and alighting stops in terms of names and codes that are not consistent with the SEQ transit network data provided by the Queensland Government (Queensland Government, 2013) in the General Transit Feed Specification (GTFS) (Google, 2013) format. Also, there are no coordinates associated with the GoCard dataset for boarding and alighting stops. To overcome this limitation, a stop matching heuristic, as described in Nassir, Hickman, and Ma (2015), was applied to match stop names and codes between the GoCard data and the GTFS of the SEQ network. To address the former limitation (the missing route and direction), a methodology was put forward to estimate the route and the direction for each rail trip using the GoCard and GTFS datasets. GTFS has a variable named "trip ID" for each service that is unique and carries all information regarding the route, direction, served stops and their sequence, as well as the timetable of arrival and departure times at each station.

Rail passengers are not required to touch off and touch on when changing routes and platforms. Therefore, a transfer might be made during a rail trip without a GoCard record. Considering the network topology, these trips can be made with a maximum of one transfer. The rail transit network was divided into fourteen segments; within each segment, any trip can be made between two stations without any transfer. On this basis, seven stations were identified as possible transfer stations between different segments across the rail transit network, as shown in Fig. 1. Based on the segments in which boarding and alighting stops were located, the trips with a transfer within the rail mode were identified and the related transfer station was allocated to each trip.

Fig. 1. Rail transit network of South-East Queensland, Australia.

For trips without a transfer, a set of n possible trip IDs from boarding stop to alighting stop was extracted and denoted by $TID_R$. Also, $B_{act}$ and $A_{act}$ were defined as the actual arrival time at the boarding station and the actual departure time from the alighting station according to GoCard, respectively. In the same way, $B^{R}_{sch}$ and $A^{R}_{sch}$ were defined as the scheduled departure time at the boarding station and the scheduled arrival time at the alighting station, respectively, based on the GTFS timetable for each $TID_R$ ($R \in \{1, \dots, n\}$). The best trip ID ($TID_{select}$) for each trip was obtained by minimizing the sum of absolute differences between actual time and scheduled time of both boarding and alighting across the possible $TID_R$, as shown in Eq. (9).

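$$TID_{select} \overset{def}{=} \exists!\, TID_R \big|_{R=1,\dots,n} : \operatorname{Min}\left( \left| B^{R}_{sch} - B_{act} \right| + \left| A^{R}_{sch} - A_{act} \right| \right) \tag{9}$$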
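A minimal sketch of the matching rule in Eq. (9) is given below. The candidate services, their identifiers and all times are invented for illustration; times are held as seconds after midnight, and ties are resolved by taking the first candidate, which is an assumption since Eq. (9) presumes a unique minimum.

```python
def hms(hours, minutes, seconds=0):
    """Convert a clock time to seconds after midnight."""
    return hours * 3600 + minutes * 60 + seconds


def select_trip(candidates, b_act, a_act):
    """Return the trip ID minimising |B_sch - B_act| + |A_sch - A_act|.

    candidates: iterable of (trip_id, b_sch, a_sch), where b_sch is the
    scheduled departure at the boarding station and a_sch the scheduled
    arrival at the alighting station.
    """
    def cost(candidate):
        _, b_sch, a_sch = candidate
        return abs(b_sch - b_act) + abs(a_sch - a_act)

    return min(candidates, key=cost)[0]


# Hypothetical services running between the two stations of one trip.
CANDIDATES = [
    ("T1", hms(7, 45), hms(8, 15)),
    ("T2", hms(8, 0), hms(8, 30)),
    ("T3", hms(8, 15), hms(8, 45)),
]
B_ACT = hms(7, 58, 10)   # hypothetical touch-on time
A_ACT = hms(8, 31, 40)   # hypothetical touch-off time

if __name__ == "__main__":
    print(select_trip(CANDIDATES, B_ACT, A_ACT))
```

Once a service is selected, the waiting time for the trip follows as the gap between the touch-on time and the scheduled departure of that service.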

For trips with a transfer, all possible trip IDs for both trip legs (from a boarding station to a transfer station and from a transfer station to an alighting station) were selected and denoted by $TID_{R_1}$ and $TID_{R_2}$, respectively. Then, the best trip ID for both trip legs was estimated using Eq. (10). In this equation, $\exists!$ denotes the existence of exactly one TID. The sum of absolute differences between actual time and scheduled time of both boarding and alighting is minimized. It should be noted that in this equation boarding and alighting refer to different trip legs. Moreover, the arrival time at a transfer station in the first trip leg ($AT^{R_1}_{sch}$) has to be earlier than the departure time of the second trip leg ($BT^{R_2}_{sch}$).

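$$\begin{cases} TID^{1}_{select} \\ TID^{2}_{select} \end{cases} \overset{def}{=} \exists!\, TID_{R_1,R_2} \Big|_{\substack{R_1=1,\dots,n \\ R_2=1,\dots,n'}} : \operatorname{Min}\left( \left| B^{R_1}_{sch} - B_{act} \right| + \left| A^{R_2}_{sch} - A_{act} \right| \right) \ \&\ \left( BT^{R_2}_{sch} - AT^{R_1}_{sch} \right) > 0 \tag{10}$$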
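The two-leg case of Eq. (10) can be sketched in the same spirit: every pair of first-leg and second-leg services is scored, and pairs whose connection at the transfer station is infeasible are skipped. The services and times below are hypothetical, and the exhaustive pairwise search is a simplification chosen for clarity rather than the procedure reported by the authors.

```python
from itertools import product


def hm(hours, minutes):
    """Convert a clock time to minutes after midnight."""
    return hours * 60 + minutes


def select_transfer_trips(first_legs, second_legs, b_act, a_act):
    """Pick the (first-leg, second-leg) pair minimising
    |B_sch - B_act| + |A_sch - A_act| subject to BT_sch - AT_sch > 0.

    first_legs:  (trip_id, b_sch, at_sch)  departure at boarding station,
                                           arrival at transfer station
    second_legs: (trip_id, bt_sch, a_sch)  departure from transfer station,
                                           arrival at alighting station
    Returns None when no feasible pair exists.
    """
    best = None
    for (id1, b_sch, at_sch), (id2, bt_sch, a_sch) in product(first_legs, second_legs):
        if bt_sch - at_sch <= 0:
            continue  # second leg leaves before the first leg arrives
        cost = abs(b_sch - b_act) + abs(a_sch - a_act)
        if best is None or cost < best[0]:
            best = (cost, id1, id2)
    return None if best is None else (best[1], best[2])


# Hypothetical timetable around one transfer station.
FIRST_LEGS = [("F1", hm(7, 40), hm(8, 0)), ("F2", hm(8, 0), hm(8, 20))]
SECOND_LEGS = [("S1", hm(8, 10), hm(8, 35)), ("S2", hm(8, 25), hm(8, 50))]

if __name__ == "__main__":
    print(select_transfer_trips(FIRST_LEGS, SECOND_LEGS, hm(7, 57), hm(8, 37)))
```

In this toy timetable the pair (F2, S1) has the smallest time mismatch but is rejected because S1 departs before F2 reaches the transfer station, so the feasibility constraint changes the outcome.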

Once trip IDs are identified, trip information including route name, direction and timetable are associated with each trip. On one weekday, GoCard data over the SEQ transit network was analyzed for this research, comprising 428,831 trips after the general data cleaning process. The rail mode had a share of about 30% of total trips with 130,010 transactions. The proposed methodology in this section was applied to each trip in the dataset. Trip ID was identified for each rail trip along with route, direction, boarding and alighting time. The results were used to estimate passenger waiting time for each trip. For rail trips that required transfer, waiting time was estimated for the first leg of the trip as the second leg had a transfer time interval before boarding. The results indicated that about 5% of data were excluded due to the data cleaning. The resulting dataset contained nearly 124,670 trips from 145 stations made by more than 73,070 unique passengers across the SEQ network as shown in Fig. 1. In addition, only 4% of rail trips required transfer within the rail mode. Also, rail trips were the first leg of passenger trips in nearly 88.5% of trips and were the second leg of the trip in about 10.5% of trips. The share of the AM-peak was nearly 33% and close to the PM-peak share of 34%. Off-peak consisted of 26% of trips and only 7% of these occurred after the PM peak (noted as other time). Table 1 shows the statistical attributes of passenger waiting time for the rail mode. Potential independent variables were identified from the literature and are shown in Table 2. Statistical analyses derive useful information regarding variations, tendencies and behavior of the data. These analyses are crucial in elucidating the variables which are important for the prediction process and significantly influence the passenger waiting time. 
In this regard, analysis of variance (ANOVA) was performed to test the statistical significance of each of the explanatory variables for passenger waiting time. Correlation matrices were calculated using R 3.4.2 (R Core Team, 2017) and visualized using the R "corrplot" package (Wei and Simko, 2017). The correlation plot in Fig. 2 shows the sign of the correlations by colour, with red and blue illustrating negative and positive correlations, respectively. In addition, the strength of correlations is symbolized by the size of the circles. There was a highly positive correlation (97%) between the daily boarding and alighting in each station (Bo_Al) and AM peak boarding and alighting in each station (Bo_Al_AM). The former variable (Al_AM) had a highly positive correlation (99%) with the AM peak alighting in each station (Al_AM). A highly positive correlation (98%) was found between population and employment. The remaining variables had correlations, either negative or positive, of up to 60%.

Furthermore, statistical significance tests were conducted for each independent variable against passenger waiting time, in order to identify potentially significant variables. Then, a set of potential explanatory variables as well as various combinations were tested to identify potential interactions between variables. Akaike's Information Criterion (AIC) was used to compare models, including the null model with a constant term only. A decrease in the AIC value reveals the importance of a set of variables in a model to explain variation across passenger waiting times. A stepwise procedure was employed to select the significant variables for passenger waiting time, as described by Collett (2003).
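The AIC comparison described above can be sketched as a simple forward selection loop. This is a simplified stand-in for the stepwise procedure of Collett (2003), not a reproduction of it: the `fit` callback, the toy log-likelihood gains and the variable named "Noise" are all hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def forward_select(candidates, fit):
    """Greedy forward selection: repeatedly add the variable that lowers
    AIC the most, and stop when no addition lowers it.

    fit(variables) must return (log_likelihood, n_params) for a model
    containing exactly those variables; here it is a hypothetical callback.
    """
    selected = []
    best_aic = aic(*fit(selected))  # null model with a constant only
    while True:
        trials = [(aic(*fit(selected + [v])), v)
                  for v in candidates if v not in selected]
        if not trials:
            break
        trial_aic, variable = min(trials)
        if trial_aic >= best_aic:
            break
        selected.append(variable)
        best_aic = trial_aic
    return selected, best_aic


# Toy stand-in for a fitted model: each variable adds a fixed gain in
# log-likelihood (made-up numbers, for illustration only).
TOY_GAINS = {"AM Peak": 120.0, "Headway": 40.0, "Noise": 0.3}


def toy_fit(variables):
    log_likelihood = -1000.0 + sum(TOY_GAINS[v] for v in variables)
    return log_likelihood, 1 + len(variables)


if __name__ == "__main__":
    print(forward_select(list(TOY_GAINS), toy_fit))
```

In the toy example the variable with a negligible gain is left out, because the penalty of two AIC units per parameter outweighs its contribution to the log-likelihood.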

Table 1
Summary statistics of passenger waiting time (minutes).

                         Number of trips  Mean  Median  SD   Min   Max   COV*  Skewness  Kurtosis
Passenger waiting time   124,670          6.2   4.9     5.2  0.02  29.8  0.8   1.38      5.01

* Coefficient of variation.

Table 2
Potential independent variables and their descriptions.

Variable description                                          Variable name  Value

Temporal characteristics
AM Peak period (3:00–8:30)                                    AM Peak        1 = Yes; 0 = No
Off Peak period (8:30–15:30)                                  Off Peak       1 = Yes; 0 = No
PM peak period (15:30–19:00)                                  PM peak        1 = Yes; 0 = No
Other time period (19:00–3:00)                                Other time     1 = Yes; 0 = No

Trip characteristics
If rail trip has transfer within a rail mode                  Transfer       1 = Yes; 0 = No
If trip is the first leg of the whole trip                    IS_1STlg       1 = Yes; 0 = No
If alternative route available for trip                       Alt_route      1 = Yes; 0 = No
If boarding or alighting were in five stations close to CBD   IS_CBDs        1 = Yes; 0 = No

Infrastructure and operation characteristics
Distance from CBD (1)                                         Dist2CBD       km
Number of platforms in each station                           No_platform    1 to 8
Headway of the route used in the trip                         Headway        Minute
Daily boarding and alighting in each station                  Bo_Al          Percentage
AM peak boarding and alighting in each station                Bo_Al_AM       Percentage
AM peak boarding in each station                              Bo_AM          Percentage
AM peak alighting in each station                             Al_AM          Percentage

Demographic characteristics
Population                                                    Population     Person
Employment                                                    Employment     Person
Employment to population ratio                                Emp2pop        Ratio

(1) CBD = central business district.

Fig. 2. Correlation between the possible potential independent variables.


5. Modelling results

As described in the model development section, four parametric distribution models (log-logistic, lognormal and Weibull models with fixed parameters, and a Weibull model with gamma heterogeneity and fixed parameters) were fitted to the passenger waiting time data. Table 3 shows the estimation results including parameter estimates based on t-statistics for each variable, log-likelihood values and likelihood ratio statistics. Parameters of the model are estimated using a standard maximum likelihood method. The best fitting model was selected based on a likelihood ratio test. The results, shown in Table 3, indicate that a log-logistic AFT model is the best fitting model according to the likelihood ratio statistics.

In addition, a graphical goodness-of-fit method was carried out to demonstrate the appropriateness of the log-logistic assumption. If the survival time follows a log-logistic distribution, then the plot of the survival odds, $\mathrm{Log}[(1 - \hat{S}(t))/\hat{S}(t)]$, against Log(t) should be linear, where $\hat{S}(t)$ is the Kaplan–Meier survival estimate (Kleinbaum and Klein, 2012). As shown in Fig. 3, the graph reveals a nearly linear trend. Therefore, the log-logistic AFT model provides a reasonable fit for passenger waiting time.

A positive sign of the parameter estimate for a variable indicates an increase in the waiting time and a decrease in the hazard function. In addition, unobserved heterogeneity is not found to be an important issue for the passenger waiting time dataset. All variables are found to be significant at a 95% confidence level.
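The graphical check described above can be reproduced on synthetic data. The sketch below simulates log-logistic durations, computes the log odds of the empirical survival estimate against Log(t), and fits a straight line by ordinary least squares. It assumes no censoring (so the Kaplan–Meier estimate reduces to the empirical survival function) and uses the plotting position (i - 0.5)/n to avoid odds of zero or infinity; both are simplifying assumptions, and the parameter values are arbitrary.

```python
import math
import random


def simulate_loglogistic(n, lam, p, seed=0):
    """Draw n log-logistic durations by inverting F(t) = u."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # skip u == 0, which would give a zero duration
            draws.append((u / (1.0 - u)) ** (1.0 / p) / lam)
    return draws


def log_odds_points(times):
    """Return (Log t, Log[(1 - S)/S]) pairs using the empirical survival
    estimate with plotting position (i - 0.5)/n (uncensored data assumed)."""
    ordered = sorted(times)
    n = len(ordered)
    points = []
    for i, t in enumerate(ordered, start=1):
        f_hat = (i - 0.5) / n  # estimate of 1 - S(t)
        points.append((math.log(t), math.log(f_hat / (1.0 - f_hat))))
    return points


def ols_line(points):
    """Ordinary least squares slope and intercept for y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    lam, p = 0.2, 1.7  # arbitrary illustrative values
    slope, intercept = ols_line(log_odds_points(simulate_loglogistic(5000, lam, p)))
    print("slope:", slope, "intercept:", intercept)
```

For log-logistic data the log odds equal p*Log(t) + p*Log(lam), so the fitted slope estimates p and the intercept estimates p*Log(lam); a clearly curved plot would argue against the log-logistic assumption.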

Table 3
Summary of survival AFT model estimation results for passenger waiting time.

Variable                     Log-logistic    Weibull         Lognormal       Weibull with gamma heterogeneity
Constant                     2.048 (.000)    2.486 (.000)    1.892 (.000)    2.453 (.000)
AM Peak                      .487 (.000)     .454 (.000)     .489 (.000)     .460 (.000)
PM peak                      .109 (.000)     .146 (.000)     .097 (.000)     .144 (.000)
Other time                   .117 (.000)     .084 (.000)     .136 (.000)     .084 (.000)
Emp2pop                      .498 (.000)     .547 (.000)     .508 (.000)     .548 (.000)
Dist2CBD                     .007 (.000)     .004 (.000)     .007 (.000)     .004 (.000)
No_platform                  .019 (.000)     .017 (.000)     .008 (.000)     .001 (.000)
Headway                      .0009 (.000)    .0009 (.000)    .0009 (.000)    .0009 (.000)
Bo_Al                        .008 (.000)     .005 (.000)     .009 (.000)     .005 (.000)
Transfer                     .172 (.000)     .211 (.000)     .155 (.000)     .207 (.000)
IS_1STlg                     .143 (.000)     .151 (.000)     .148 (.000)     .152 (.000)
Alt_route                    .046 (.003)     .053 (.000)     .059 (.000)     .052 (.000)
Sigma (σ)                    .577 (.000)     .814 (.000)     1.084 (.000)    .807 (.000)
Theta (θ)                    –               –               –               .006 (.010)

P                            1.734           1.228           .922            1.228
LL(0) (1)                    209765          188792          187014          188792
LL(β) (2)                    182068          172352          187004          172417
Sample size                  124,670         124,670         124,670         124,670
Number of covariates         13              13              13              13
Likelihood ratio statistics  55,394          32,880          20              32,750

* Parameter estimation followed by p-value in parentheses.
** Dependent variable is log of passenger waiting time in minutes.
(1) Initial log-likelihood.
(2) Log-likelihood at convergence.
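The likelihood ratio statistics in Table 3 can be recomputed from the two log-likelihood rows. In the sketch below the log-likelihoods are entered as negative numbers; this is an assumption, since the table shows magnitudes only, but it is the reading under which the model at convergence fits better than the constant-only model and the reported positive statistics are recovered.

```python
def likelihood_ratio(ll_null, ll_model):
    """Likelihood ratio statistic: -2 * [LL(0) - LL(beta)]."""
    return -2.0 * (ll_null - ll_model)


# (LL(0), LL(beta)) per model, magnitudes from Table 3 with an assumed
# negative sign.
TABLE3_LOGLIK = {
    "Log-logistic": (-209765, -182068),
    "Weibull": (-188792, -172352),
    "Lognormal": (-187014, -187004),
    "Weibull with gamma heterogeneity": (-188792, -172417),
}

if __name__ == "__main__":
    for model, (ll_null, ll_model) in TABLE3_LOGLIK.items():
        print(model, likelihood_ratio(ll_null, ll_model))
```

The log-logistic model yields the largest statistic of the four, which is the basis on which it is preferred in the text.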

Fig. 3. Graphical evaluation of the log-logistic assumption for passenger waiting time (survival odds plotted against Log(t); fitted line y = 1.6478x - 9.2194, R² = 0.9618). t denotes the waiting time in seconds.


The log-logistic distribution relaxes the monotonicity of the hazard function. In the case of p > 1, the log-logistic hazard function increases from zero until it reaches a maximum at an inflection point, $t = (p - 1)^{1/p}/\lambda$, then decreases over time to approach zero. In the case of p < 1, the hazard decreases monotonically from infinity; and if p = 1, the hazard decreases monotonically in the duration from parameter λ (Washington et al., 2011). The results suggest that the value of p is greater than one for the log-logistic distribution. On this basis, the hazard function increases from zero until it reaches a maximum at the inflection point, calculated to be 3.73 min, and then decreases over time to approach zero. In other words, waiting time is likely to end soon after passing this maximum point, as shown in Fig. 4.

The lower coefficient for the AM peak indicates that rail passengers follow the schedule and show up at the station closer to the departure time compared to the other time periods. During the PM peak, however, rail passengers have a more relaxed schedule for coming to the station but still follow the schedule. The positive coefficient for the other time period shows that rail passengers do not follow the schedule and do not have a restricted plan for their journey outside of the peak periods.

To gain further insight into the effects of explanatory variables on the AFT model, the exponents of the coefficients can be used in an intuitive way to interpret the results. The coefficient exponents translate to a percent change in waiting time as a function of explanatory variables, changing from zero to one for binary variables and by one unit for continuous variables (Jenkins, 2005). For instance, the exponent of the estimated coefficient of 'AM Peak' is exp(-0.487) = 0.61, indicating that the waiting time during AM peak trips was 1 - 0.61 = 0.39, or 39%, shorter compared to the 'baseline' waiting time. During the PM peak, waiting time decreased by 10%. However, trips between 7:00 PM and 3:00 AM resulted in an increased waiting time of 12%. Moreover, for each kilometre of distance from the CBD the waiting time of a trip was 0.7% longer. This effect is likely to reflect other operational and infrastructural factors on waiting time. Similarly, for a unit increase in the total passenger activity (Bo_Al), waiting time increased by 0.8%. Although headway is supposed to be an important factor for passenger waiting time, the impact of this variable is not as large as that of other variables, since some of its effect is already captured in the peak period variables.

A higher number of platforms indicates a larger station size and results in a decrease in waiting time. This behavior can be interpreted as passengers having more options at a large station: if a passenger misses a service, the next service will arrive shortly. Therefore, passengers show up at a large station closer to the departure time of services compared to a smaller station, in which alternative options to reach the destination would be limited. For one unit increase in the ratio of employment to population (Emp2pop), waiting time decreased significantly by 39%. This may reflect passenger behavior on the journey to work when using the rail mode. As anticipated, the availability of an alternative route for a trip increased waiting time by 5%. Trips where a transfer was required or where the rail mode was the first leg of the trip decreased waiting time by 16% and 13%, respectively. Table 4 shows the impact of each significant variable on passenger waiting time. According to Table 4, the three top parameters in order of importance are proportion of employees, time of the day (morning peak), and having a transfer (within rail mode).

The duration of waiting time was determined according to the different confidence levels. Consequently, different scenarios were evaluated based on the different probabilities, as depicted in Fig. 5. For example, half of the passengers in SEQ wait less than 5 min and 90% of them experience a waiting time of 16 min or less.

Fig. 4. Hazard function of log-logistic AFT models for passenger waiting time (axes: hazard rate against passengers' waiting time in minutes, 0–90 min; the value 3.7 is annotated on the plot).
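The shape described for Fig. 4 can be reproduced numerically. The following Python sketch evaluates the log-logistic hazard, h(t) = λp(λt)^(p−1) / (1 + (λt)^p), and the time of its maximum, t = (p − 1)^(1/p)/λ. The function names and the values of λ and p are illustrative assumptions and are not the estimates obtained in this study.

```python
import math


def loglogistic_hazard(t, lam, p):
    """Log-logistic hazard h(t) = lam*p*(lam*t)**(p-1) / (1 + (lam*t)**p), t > 0."""
    return lam * p * (lam * t) ** (p - 1) / (1.0 + (lam * t) ** p)


def hazard_peak_time(lam, p):
    """Time at which the hazard reaches its maximum (exists only for p > 1)."""
    if p <= 1:
        raise ValueError("the hazard has an interior maximum only when p > 1")
    return (p - 1) ** (1.0 / p) / lam


# Hypothetical parameters chosen for illustration, NOT the paper's estimates.
LAM, P = 0.25, 2.0

t_star = hazard_peak_time(LAM, P)
peak_hazard = loglogistic_hazard(t_star, LAM, P)

# The hazard is lower slightly before and slightly after the peak time.
before = loglogistic_hazard(t_star - 0.1, LAM, P)
after = loglogistic_hazard(t_star + 0.1, LAM, P)
```

With estimated values of λ and p substituted for the placeholders, `hazard_peak_time` would return the time at which the hazard in Fig. 4 is highest.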

Table 4 Resulting percentage changes in passenger waiting time with respect to changing* each variable.

| Variable | Changes (%) |
| --- | --- |
| AM peak | 38.58 |
| PM peak | 10.37 |
| Other time | 12.36 |
| Emp2pop | 39.22 |
| Dist2CBD | 0.67 |
| No_platform | 1.94 |
| Headway | 0.09 |
| Bo_Al | 0.75 |
| Transfer | 15.77 |
| IS_1STlg | 13.36 |
| Alt_route | 4.49 |

* One unit for continuous variables and zero to one for binary variables.
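The percentages in Table 4 follow from exponentiating the AFT coefficients. As a minimal sketch, assuming the AM peak coefficient of −0.487 quoted in the text and a model without interaction terms, the conversion and the combination of two effects can be written as follows. The function names are hypothetical, and the negative signs applied to the AM peak and Transfer values are taken from the directions described in the text.

```python
import math


def percent_change(beta):
    """Percent change in waiting time for a one-unit increase in a continuous
    covariate (or a 0 -> 1 switch of a binary one) with AFT coefficient beta."""
    return 100.0 * (math.exp(beta) - 1.0)


def combined_percent_change(*percent_changes):
    """Combine several percent changes, assuming their effects on waiting time
    multiply, which holds for an AFT model without interaction terms."""
    ratio = 1.0
    for pct in percent_changes:
        ratio *= 1.0 + pct / 100.0
    return 100.0 * (ratio - 1.0)


# AM peak coefficient as quoted in the text (about -38.6 %).
am_peak = percent_change(-0.487)

# AM peak trip that also involves a transfer, using the Table 4 magnitudes
# with the signs described in the text (both reduce waiting time).
am_peak_with_transfer = combined_percent_change(-38.58, -15.77)
```

Because the effects are multiplicative, the combined reduction is smaller than the sum of the two individual percentages.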

Fig. 5. The typical survival function of log-logistic AFT models.
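Waiting times at a given probability level, such as the 50% and 90% values read from Fig. 5, are quantiles of the survival function. The sketch below assumes the standard log-logistic form S(t) = 1 / (1 + (λt)^p) and inverts it; the function names and the parameter values are hypothetical and were chosen only to give round numbers, so the outputs are not the results of this study.

```python
def loglogistic_survival(t, lam, p):
    """Probability that the waiting time exceeds t: S(t) = 1 / (1 + (lam*t)**p)."""
    return 1.0 / (1.0 + (lam * t) ** p)


def waiting_time_quantile(q, lam, p):
    """Waiting time t such that a share q of passengers wait t or less,
    obtained by solving S(t) = 1 - q for t."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return (q / (1.0 - q)) ** (1.0 / p) / lam


# Hypothetical parameters chosen for illustration, NOT the paper's estimates.
LAM, P = 0.2, 2.0

median_wait = waiting_time_quantile(0.5, LAM, P)  # equals 1 / LAM
p90_wait = waiting_time_quantile(0.9, LAM, P)
```

The median of a log-logistic distribution is 1/λ regardless of p, so p only controls how far the upper quantiles spread beyond the median.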

6. Summary and conclusions
In this study, parametric accelerated failure time (AFT) survival models were adapted to estimate passenger waiting time. These models include log-logistic, lognormal and Weibull, which consider fixed parameter assumptions, as well as Weibull with gamma heterogeneity. This facilitated not only investigating the factors affecting waiting time but also finding the best shape for the hazard rate. A total of 124,670 rail trips were used as the study dataset. These trips were recorded in one day by the automatic fare collection system in SEQ, Australia. The area includes all 145 rail stations in Brisbane, the Gold Coast, and the Sunshine Coast. Nineteen variables of four types were selected as possible independent variables: temporal characteristics, infrastructure and operation attributes, trip characteristics, and demographics. Then, by performing the analyses, factors affecting the passenger waiting time were identified. Log-logistic AFT models were inferred to be the best fit for passenger waiting time. The analyses described in this paper indicated that a total of 11 variables significantly affected the waiting time: three were related to temporal characteristics, four were related to infrastructure and operation attributes, three were related to trip characteristics and one was related to demographic characteristics. Once an AFT model is established, the survival function based on explanatory variables can be used to make waiting time predictions. Moreover, the duration of waiting time can be determined according to the different confidence levels. Consequently, different scenarios were evaluated based on different probabilities. For example, half of the passengers in SEQ wait less than 5 min and 90% of them experience a waiting time of 16 min or less.
Predicting waiting time under different conditions can help to make the best use of the limited resources associated with operations and planning. It was found that the three most significant parameters, in order of importance, are the proportion of employees in the boarding zone, time of the day (morning peak), and having a transfer (within rail mode). The higher the waiting time, the longer the passengers have to stay at the stations. Consequently, appropriate facilities and capacities are required to maintain the quality of the service. In this regard, the findings of this study are useful for planners and operators through an improved understanding of the variables that influence passenger waiting time and in devising strategies to reduce waiting time. In addition, the developed model can be used directly to predict waiting time, provided that the predictor variables are available.
Further research needs to be conducted to investigate the temporal stability of passenger waiting time as well as to verify the results of this study using several days of data. In addition, it is recommended that the effects of trip purpose and land use variables on passenger waiting time be examined. Furthermore, performing the analysis using the actual arrival time of the services (the GTFS online) is suggested for future work, to investigate the level of error involved when calculating the waiting time based on the scheduled arrival time.
Acknowledgments
The authors would like to acknowledge TransLink for providing the AFC data for this research. We would also like to thank Prof Mark Hickman, Chair of the Transport Academic Partnership between DTMR and The University of Queensland, for his support and advice on the research highlighted in this paper.
References
Anastasopoulos, P. C., Haddock, J. E., Karlaftis, M. G., & Mannering, F. L. (2012). An analysis of urban travel times: A random parameters hazard-based approach. TRB 91st annual meeting compendium of papers. Washington, DC: Transportation Research Board.
Bhat, C. R. (1996). A generalized multiple durations proportional hazard model with an application to activity behavior during the evening work-to-home commute. Transportation Research, Part B (Methodological), 30B(6), 465–480.
Bhat, C. R., & Pinjari, A. R. (2008). Duration modeling. In D. A. Hensher & K. J. Button (Eds.), Handbook of transport modelling (2nd ed.). Amsterdam, Oxford: Elsevier.
Bouzir, A., Souissi, B., & Benammou, S. (2014). Modeling the passengers’ waiting times at multimodal stations. International conference on logistics and operations management.
Bowman, L. A., & Turnquist, M. A. (1981). Service frequency, schedule reliability and passenger wait times at transit stops. Transportation Research Part A: General, 15(6), 465–471.
Cats, O., & Gkioulou, Z. (2014). Modeling the impacts of public transport reliability and travel information on passengers’ waiting-time uncertainty. EURO Journal on Transportation and Logistics, 1–24.
Ceder, A. (2016). Public transit planning and operation: Modeling, practice and behavior (2nd ed.). CRC Press.
Ceder, A., & Marguier, P. H. (1985). Passenger waiting time at transit stops. Traffic Engineering & Control, 26(6), 327–329.
Collett, D. (2003). Modelling survival data in medical research. Boca Raton, Florida: Chapman & Hall/CRC.
Csikos, D., & Currie, G. (2008). Investigating consistency in transit passenger arrivals: Insights from longitudinal automated fare collection data. Transportation Research Record: Journal of the Transportation Research Board, 2042, 12–19.
Currie, G., & Csikos, D. R. (2007). The impacts of transit reliability on wait time: Insights from automated fare collection system data. Transportation research board 86th annual meeting.
Fan, W., & Machemehl, R. (2009). Do transit users just wait for buses or wait with strategies? Some numerical results that transit planners should see. Transportation Research Record: Journal of the Transportation Research Board, 2111, 169–176.
Frumin, M., & Zhao, J. (2012). Analyzing passenger incidence behavior in heterogeneous transit services using smartcard data and schedule-based assignment. Transportation Research Record: Journal of the Transportation Research Board, 2274, 52–60.
Google (2013). General transit feed specification reference [Online]. Available: (accessed 20/02/2014).
Greene, W. (2012a). Limdep, version 10, economic modelling guide. Plainview, NY, USA: Econometric Software, Inc.
Greene, W. H. (2012b). Econometric analysis. Boston: Prentice Hall.
Hensher, D. A., & Mannering, F. L. (1994). Hazard-based duration models and their application to transport analysis. Transport Reviews, 14(1), 63–82.
Huddart, K. W. (1973). Bus priority in greater London. Bus bunching and regularity of service. Traffic Engineering & Control, 14, 592–594.
Iseki, H., & Taylor, B. D. (2010). Style versus service? An analysis of user perceptions of transit stops and stations. Journal of Public Transportation, 13(3), 2.
Jenkins, S. P. (2005). Survival Analysis. Colchester, UK: Institute for Social Science and Economic Research, University of Essex (unpublished manuscript).
Kleinbaum, D. G., & Klein, M. (2012). Survival analysis: A self-learning text. New York, NY: Springer.
Lam, W., & Morrall, J. (1982). Bus passenger walking distances and waiting times: A summer-winter comparison. Transportation Quarterly, 36(3), 407–421.
Luethi, M., Weidmann, U. A., & Nash, A. (2007). Passenger arrival rates at public transport stations. Transportation research board 86th annual meeting.
Mannering, F., Kim, S.-G., Barfield, W., & Ng, L. (1994). Statistical analysis of commuters’ route, mode, and departure time flexibility. Transportation Research Part C, 2(1), 35–47.
Mishalani, R. G., McCord, M. M., & Wirtz, J. (2006). Passenger wait time perceptions at bus stops: Empirical results and impact on evaluating real-time bus arrival information. Journal of Public Transportation, 9(2), 5.
Nassir, N., Hickman, M., & Ma, Z.-L. (2015). Activity detection and transfer identification for public transit fare card data. Transportation, 42(4), 683–705.
Psarros, I., Kepaptsoglou, K., & Karlaftis, M. G. (2011). An empirical investigation of passenger wait time perceptions using hazard-based duration models. Journal of Public Transportation, 14(3), 6.
Queensland Government (2013). General Transit Feed Specification (GTFS): South East Queensland [Online]. Available: (accessed 20/02/2014).
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available: (accessed 20/03/2018).
Rashidi, T., & Mohammadian, A. (2011). Parametric hazard functions. Transportation Research Record: Journal of the Transportation Research Board, 2230, 48–57.
Rashidi, T. H., Auld, J., & Mohammadian, A. (2012). A behavioral housing search model: Two-stage hazard-based and multinomial logit approach to choice-set formation and location selection. Transportation Research Part A: Policy and Practice, 46(7), 1097–1107.
Salek, M.-D., & Machemehl, R. B. (1999). Characterizing bus transit passenger wait times. College Station, TX: Southwest Region University Transportation Center.

Shi, Z., Zhang, N., & Zhang, Y. (2016). A hazard-based model for estimation of congestion duration in urban rail transit considering loss minimization. TRB 95th Annual Meeting Compendium of Papers. Washington, D.C.: Transportation Research Board.
Tavassoli, A., Mesbah, M., & Hickman, M. (2017). Application of smart card data in validating a large-scale multi-modal transit assignment model. Public Transport. https://doi.org/10.1007/s12469-017-0171-1.
Tavassoli Hojati, A., Ferreira, L., Washington, S., Charles, P., & Shobeirinejad, A. (2014). Modelling total duration of traffic incidents including incident detection and recovery time. Accident Analysis & Prevention, 71, 296–305.
Turnquist, M. A. (1978). A model for investigating the effects of service frequency and reliability on bus passenger waiting times.
Wardman, M. (2004). Public transport values of time. Transport Policy, 11(4), 363–377.
Washington, S., Karlaftis, M. G., & Mannering, F. L. (2011). Statistical and econometric methods for transportation data analysis. Boca Raton, FL: CRC Press.
Wei, T., & Simko, V. (2017). R Package ‘Corrplot’: visualization of a correlation matrix. Version 0.84. Available: (accessed 20/03/2018).
Zahir, U. M., Matsui, H., & Fujita, M. (2000). Investigate the effects of bus and passenger arrival patterns and service frequency on passenger waiting time and transit performance of Dhaka metropolitan area. International conference on urban transport and the environment for the 21st century. Cambridge, UK: Cambridge University.