Available online at www.sciencedirect.com
ScienceDirect
Transportation Research Procedia 32 (2018) 279–290 www.elsevier.com/locate/procedia
International Steering Committee for Transport Survey Conferences
Evaluating the biases and sample size implications of multi-day GPS-enabled household travel surveys
Gregory D. Erhardt a,*, Louis Rizzo b
a University of Kentucky, 261 Oliver H. Raymond Building, Lexington, Kentucky 40506, USA
b Westat, 1600 Research Boulevard, Rockville, Maryland 20850, USA
Abstract

This research explores biases and sample size implications of a multi-day GPS-enabled household travel survey. The bias tests show significant differences both across survey days and across the data collection method, indicating the weakness of the GPS-only approach in use in a subsample of this survey. The research goes on to examine the sample size implications of collecting additional survey days. It finds that the three-day Northeast Ohio sample is equivalent to a single-day sample between 26% and 64% larger. The framework provides a viable means of correcting the repeated measurement problem, and should be repeated using an unbiased survey.

© 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Peer-review under responsibility of the International Steering Committee for Transport Survey Conferences (ISCTSC).

Keywords: Multi-Day Surveys, GPS-Enabled Surveys, Sample Size, Travel Demand Model
1. Introduction

1.1 Motivation and Necessary Conditions

There has long been an interest in conducting travel surveys in which the same respondent provides information for multiple travel days. There are three important motivations for this interest. First, there is a potential to reduce survey costs by sampling fewer households. This savings may occur because a large portion of the survey cost is
* Corresponding author. Tel.: +1-859-323-4856. E-mail address: [email protected]
2352-1465 © 2018 The Authors. Published by Elsevier Ltd. 10.1016/j.trpro.2018.10.051
involved in recruiting respondents, so the cost of adding a second day of data collection is expected to be less than the cost of recruiting a second household. Second, by better distinguishing between-person versus within-person variance, multi-day surveys could provide more reliable model parameter estimates under certain conditions (Pas 1987), similar to the way in which conducting multiple experiments per respondent can be beneficial for stated preference surveys (Cherchi and de Dios Ortúzar 2008; Rose et al. 2009). The third motivation is that multi-day surveys allow for the development of more sophisticated models that capture the day-to-day dynamics of travel (Kang and Scott 2009; Bhat, Srinivasan, and Axhausen 2005; Dill and Broach 2014; Xu and Guensler 2015).

In order for multi-day surveys to be valuable for these purposes, two conditions should be met. First, each day of data collection should provide an unbiased representation of travel on that day. This was not the case in multi-day diary-based household travel surveys, where it was common to observe lower trip rates on the second day of the survey than on the first, likely due to survey fatigue (Meurs, van Wissen, and Visser 1989; Golob and Meurs 1986; Pendyala and Pas 2000). The new generation of GPS-enabled surveys offers the potential to overcome this problem by reducing the respondent burden (Wolf et al. 2014). The second condition is that subsequent days of data collection should provide additional information. If someone’s travel behavior is identical from day to day, there is limited value to collecting additional days’ worth of data for that same traveler.

1.2 Multiple Uses of Travel Surveys

In answering questions about sample size, it is critical to consider the ways in which the data will be used. Often, sample sizes are determined based on the maximum allowable error around a certain measure or set of measures.
While such an approach may be appropriate for a public opinion survey, where the goal is to understand, for example, the percent of people who would vote for a particular candidate, it greatly oversimplifies the ways in which travel survey data are used in practice.

In addition to providing basic observations of travel behavior, a key use of Household Travel Survey (HTS) data is to develop travel demand models, which are used to forecast the behavioral response to transportation infrastructure and service changes. In model development, these data play a role both in model estimation and in model calibration. For model estimation, the data are used to statistically estimate model coefficients for key travel choices, such as the sensitivity of mode choice to travel time and cost. Aggregate tabulations of the weighted and expanded data are also used to calibrate the outputs of the implemented travel model. Sample size will affect both the coefficient estimates and the weighted data tabulations. Therefore, it is important to analyze the data in the way they will ultimately be used. The practical challenge in doing this is both in providing sufficient detail as they relate to the different uses, and in determining how to value those detailed analyses in an effort to determine “the bottom line”. The Southeast Florida Transportation Council (2014) provides discussion of these issues and how they may affect sample size decisions.

1.3 The Repeated Measurement Problem

The standard method for estimating travel models is to treat each record as an independent observation. When the observations are not independent, as is clearly the case when collecting multiple days of travel information from the same individual, the “repeated measurements problem” is encountered (Cirillo, Daly, and Lindveld 1998; Ortuzar and Willumsen 2011; Stopher et al. 2008).
The basic problem is that there is less information in the repeated measurements than in the same number of independent measurements. The correlation across days is likely to vary with the type of travel behavior. For example, most people go to work in the same place every day, and are likely to take the same mode to work, but their non-work activities may vary to a greater degree. Models estimated from data with the repeated measurement problem will have unbiased parameter estimates but flawed variance estimates (Pas 1986). The key to understanding the value of multi-day survey data is to measure the relative amount of intra-person day-to-day variability versus the inter-person variability (Pas 1987).

1.4 Research Contribution
This research explores whether a multi-day GPS-enabled household travel survey meets the conditions necessary for a multi-day survey to be valuable (lack of bias, additional days add information). It does so using data from the 2012 Northeast Ohio Regional Travel Survey. The analysis expands upon previous work to consider multiple uses of travel surveys, examining the effect of sample size on both model estimation and model calibration. It does this by estimating models for a set of key travel choices, and generating tabular model calibration summaries for those same travel choices. The study employs a statistically rigorous method for handling the repeated measurement problem, by using the jackknife as an unbiased estimator of the true variance and by measuring design effects and correlation coefficients to understand the relative value of adding survey days versus adding households. Collectively, this study both contributes empirical evidence to our understanding of the potential value of multi-day surveys, and develops a detailed method by which future surveys can be analyzed to further add to that evidence.

The remainder of this paper is structured as follows. Section 2 describes the data source and overall approach. Section 3 reports the results of bias checks. Section 4 examines the sample size implications of multi-day surveys, using the design effects approach. The paper finishes with conclusions and recommendations for future work.

2. Data and General Approach

2.1. Data

The 2012 Northeast Ohio Regional Travel Survey (Wilhelm et al. 2014) was selected as the test data set for this study.
This survey in the Cleveland region was selected because: 1) travel data was collected for three to four consecutive days, allowing for multi-day analysis beyond day two, 2) the survey featured a unique design in which separate GPS-only versus GPS-with-prompted-recall samples were collected, allowing for a comparison of these approaches, and 3) it includes a larger sample of households compared to an available GPS travel survey in Cincinnati.

The goal of the travel survey was to collect socioeconomic, demographic and travel information for households in the five-county Cleveland region, with an aggregate population of roughly 850,000 households. The survey used a dual sampling frame that included an address-based sample frame and a listed residential landline telephone sample frame. It included a stratified sample segmented by geography and by demographic controls and was weighted to align with controls from the American Community Survey and the 2010 Census.

The survey was completed using a three-stage process. Household characteristics and person characteristics on each household member were collected during the recruitment interview. During the recruitment interview, households were assigned travel days, during which household members between age 13 and 75 were asked to carry GPS loggers. Household members younger than 13 or older than 75 were asked to use logs to record their travel. For a portion of households, as described below, there was a retrieval survey that collected detailed travel characteristics after the GPS loggers were retrieved.

The data set has a total sample size H=4,545 completed households. The sample is broken into three distinct partitions based on the method of data retrieval, as shown in Table 1.

Table 1: Observations in Each Data Retrieval Partition

GPS Flag                                                       Days of Complete Travel Information   Households   Percent of Households
Prompted Recall Participants                                   1                                     1,312        28.90%
GPS Only Participants - No Retrieval Interview                 3-4                                   2,780        61.20%
Travel Logs Only - Not GPS Eligible - No Retrieval Interview   1                                     453          10.00%
Total                                                                                                4,545        100.00%
For households in the prompted recall sample, participants were asked to use a GPS device to track their travel for three or four days, depending on the day of the week. Participants who began the survey on a Friday used the devices for four days to capture at least two weekdays of travel. A prompted recall survey was then used where households were asked to confirm trips made on day 1, and to collect information such as trip purpose and mode. To save
processing cost, the GPS-with-prompted-recall data was only fully processed for Day 1 as part of the original survey work. The result is that the GPS data beyond Day 1 are not available to this study.

For households in the GPS-only sample, the same GPS methodology was used, but no prompted recall survey was completed. Oliveira and Gupta (2013) describe the process by which the following information was imputed for the GPS-only observations: address, shared travel, travel mode, tolling, parking activity, and trip purpose. A full set of imputed travel information is available for all survey days.

The travel-log-only sample consists of households where all members are older than age 75, with data collected via travel log instead of GPS device. These households are not considered in the remainder of this analysis. Households where any member is 75 or younger were randomly assigned to either the GPS-with-recall or GPS-only samples.

2.2. Key Travel Choices

When HTS data are used to develop travel models or analyze travel behavior, it is rarely a single travel choice being considered, but rather a series of linked choices. The key factor in the analysis is the relative day-to-day versus person-to-person variance. A long-term choice, such as auto ownership, will rarely change between travel days for the same person or household. Therefore, when it comes to analyzing auto ownership, having three days of data is no better than having one day of data. In contrast, tour generation for non-workers can reasonably be expected to vary quite a bit between travel days for the same individual, so there could be significant value in having multiple days of data. In addition, different models typically use different levels and dimensions of segmentation.
To illustrate the range of effects, this research focuses on four key travel choices, described as follows:

• Auto ownership – Low or no day-to-day variation, but the number and spatial location of zero-auto households is a major driver of downstream travel choices.
• Tour generation – Separate models are developed for workers and non-workers, with the expectation that non-workers will have more schedule flexibility and a higher level of day-to-day variation. Demographic and socioeconomic variables are of particular importance, providing the potential to highlight sample size issues for some segments of the population.
• Destination choice – The inclusion of destination provides a means for understanding the variation both in distance and in location. Separate models are developed for work tours and for social/recreation tours to highlight the differences across purposes.
• Mode choice – The choice of modes is central to the modelling process in many locations, allowing issues related to this key travel choice to be highlighted. Separate models are developed for work tours and for social/recreation tours.

We recognize that surveys are used both for model calibration and model estimation. For each key travel choice, a set of tabular data is generated from the survey in a form that could be used for model calibration, and a logit model is estimated from the survey data to provide the basis for understanding the effects on model coefficients.

2.3. File Segmentation

The analysis focused specifically on the GPS-only sample, because that is the sample that collected multiple days of travel information, and was limited to weekday travel. The data were segmented into three files: a file with just the first monitored weekday of travel for each household, a file with the first two weekdays of travel, and a file with all monitored weekdays (up to 3). This segmentation does two things.
First, it allows us to calculate the design effects, which measure the information provided by the additional travel days. Second, it allows us to report both the data summaries and model estimation results for each of the three files. This provides a means of seeing the degree to which the three sample sizes affect those results.

2.4. Jackknife Variance Estimation

The introduction to this report discussed the repeated measurement problem. In such a situation the mean values and parameter estimates will be unbiased, but the variances are not. This can lead to flawed t-statistics or tests of
statistical significance. The jackknife can be used as an unbiased estimator of the true variance (Cirillo, Daly, and Lindveld 1998), as is done here, to avoid these problems. The basic approach is that the model estimation (or the calculation of mean values) is repeated multiple times, each time using a different set of weights that excludes a subset of records. The jackknife variance estimate is then calculated as a weighted combination of each individual estimation. The details of the weighting scheme used here are documented separately (Rizzo and Erhardt 2016). Basic references for the jackknife variance estimator include Wolter (2007), Chapter 4 and Valliant, Dever, and Kreuter (2013).

3. Bias Checks

3.1. Methodology

The survey was tested for bias in two dimensions:

1. The GPS-with-recall sample vs. Day 1 of the GPS-only sample. Because households were randomly assigned to each sample, there should not be a real difference between them.
2. Day 1 vs. subsequent days of the GPS-only sample. A major goal of using GPS loggers is to eliminate this bias between collection days.

In all cases, bias was tested for a subset of measures focusing on tour generation, tour length and mode shares. T-tests were used, with the variances estimated using the jackknife.

3.2. Selected Results

Detailed results of the bias tests are available in technical appendices (Rizzo and Erhardt 2016), but are summarized here. The evaluation of the GPS-with-recall vs. GPS-only samples revealed:

• There was a limited and marginally significant difference in trips per person between the two GPS strata. There were 4.03 mean trips per person per day for the GPS-only data, and 4.23 mean trips per person per day for the GPS-with-prompted-recall data.
• There were significant differences in trips per person between the two GPS strata for most of the trip-purpose domains. In some cases (home-based shopping, home-based work, home-based school), the GPS-with-prompted-recall stratum had the greater mean. In some cases (non-home-based, home-based social/recreation), the GPS-only stratum had the greater mean.
• Similar results were obtained when the data were analyzed as the average number of tours per person, segmented by person type. The total number of tours per person was lower, to varying degrees, in the GPS-only sample stratum, and the number of tours by purpose differed.

The evaluation across survey days revealed:

• There were in fact differences between Day 1 and Days 2 and 3. There was considerably less difference between Days 2 and 3. There seems to be a dropoff in data collection between Day 1 and the other collection days. The greatest differences between Day 1 and Days 2 and 3 were in mean trips per person, both overall and separately by trip purpose.
• There were significant differences in tour mean values across days, and these varied in a complicated way among tour types for the various person types.
• Mean trip length is not significantly different across days, but mean trip duration is significantly different.

These differences are further highlighted by two tables of important travel outcomes, stratified by the sample and day. Table 2 shows the percent of workers choosing each daily tour pattern considered in the tour generation model. Each row shows a pattern of tours made by the worker during the travel day, with the letter codes indicating the purpose of each tour, as described at the bottom of the table. For example, W indicates that the traveler made a single work tour, SH indicates that the traveler made a single shopping tour, and W-SH indicates that the traveler made two tours, one for work and one for shopping. H indicates that the traveler stayed home all day. In the GPS-with-recall sample, about 75% of workers have a tour pattern that involves making a single work tour (regardless of the number of intermediate stops). In Day 1 of the GPS-only sample, this number is less than 25%, and in Days 2 and 3 it drops to 15%.
Such a low percentage of workers going to work is a prima facie violation of our expectations
for reasonable travel behavior. A combination of factors may be contributing to this large discrepancy. If the traveler neglects to carry the GPS device, the day would be erroneously recorded as the traveler staying at home all day in the GPS-only sample, whereas in the prompted recall sample the recall interview would identify and correct this. It also appears that work tours in the GPS-only sample are being erroneously recorded as tours for other purposes. This may be due to a limitation of the data processing method that is unable to identify the workplace, and/or people working in locations other than their regular workplace.

Table 2: Percent of Workers Choosing Each Daily Tour Pattern

                       GPS with Recall   GPS-Only Sample
Alt  Label*            Day 1             Day 1     Day 2     Day 3/4   All
1    H                 11.3%             23.8%     44.6%     46.7%     37.3%
2    SH                0.7%              5.6%      4.8%      4.5%      5.0%
3    SR                0.1%              10.5%     8.6%      8.8%      9.4%
4    O                 0.4%              7.4%      7.0%      7.7%      7.3%
5    SH-SR             0.0%              2.6%      1.7%      1.6%      2.0%
6    SH-O              0.1%              1.8%      1.7%      1.5%      1.7%
7    SR-O              0.0%              5.4%      4.1%      4.2%      4.6%
8    SH-SR-O           0.0%              1.3%      0.8%      0.9%      1.0%
9    W                 74.9%             24.4%     15.9%     15.2%     19.0%
10   W-SH              1.9%              1.5%      0.9%      0.9%      1.1%
11   W-SR              0.5%              5.6%      3.1%      2.9%      4.0%
12   W-O               3.0%              3.1%      2.0%      1.8%      2.3%
13   W-SH-SR           0.0%              0.3%      0.0%      0.1%      0.1%
14   W-SH-O            0.2%              0.2%      0.1%      0.1%      0.1%
15   W-SR-O            0.1%              0.7%      0.5%      0.4%      0.5%
16   W-SH-SR-O         0.0%              0.0%      0.0%      0.0%      0.0%
17   W-WB              6.5%              3.9%      2.9%      1.9%      3.0%
18   W-WB-SH           0.1%              0.2%      0.3%      0.0%      0.2%
19   W-WB-SR           0.2%              0.9%      0.8%      0.5%      0.8%
20   W-WB-O            0.0%              0.6%      0.1%      0.3%      0.4%
21   W-WB-SH-SR        0.1%              0.1%      0.0%      0.0%      0.0%
22   W-WB-SH-O         0.0%              0.0%      0.0%      0.0%      0.0%
23   W-WB-SR-O         0.0%              0.2%      0.0%      0.0%      0.1%
24   W-WB-SH-SR-O      0.0%              0.0%      0.0%      0.0%      0.0%
     Total             100.0%            100.0%    100.0%    100.0%    100.0%

* H=Stay Home, W=Work Tour, SH=Shop Tour, SR=Social/Recreational Tour, O=Other Tour
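As a rough cross-check of the discrepancy described above, the rows of Table 2 can be re-tabulated to give the share of workers making at least one work tour under each sample and day. The sketch below is not part of the original analysis: the percentages are copied from Table 2 (the "All" column is omitted), and the names COLUMNS, TABLE_2 and share_with_work_tour are illustrative assumptions. Because the published values are rounded to one decimal place, the sums are only approximate.

```python
# Percent of workers by daily tour pattern, copied from Table 2.
# Column order (an assumption of this sketch): see COLUMNS below.
COLUMNS = ("GPS-with-recall Day 1", "GPS-only Day 1",
           "GPS-only Day 2", "GPS-only Day 3/4")

TABLE_2 = {
    "H": (11.3, 23.8, 44.6, 46.7),
    "SH": (0.7, 5.6, 4.8, 4.5),
    "SR": (0.1, 10.5, 8.6, 8.8),
    "O": (0.4, 7.4, 7.0, 7.7),
    "SH-SR": (0.0, 2.6, 1.7, 1.6),
    "SH-O": (0.1, 1.8, 1.7, 1.5),
    "SR-O": (0.0, 5.4, 4.1, 4.2),
    "SH-SR-O": (0.0, 1.3, 0.8, 0.9),
    "W": (74.9, 24.4, 15.9, 15.2),
    "W-SH": (1.9, 1.5, 0.9, 0.9),
    "W-SR": (0.5, 5.6, 3.1, 2.9),
    "W-O": (3.0, 3.1, 2.0, 1.8),
    "W-SH-SR": (0.0, 0.3, 0.0, 0.1),
    "W-SH-O": (0.2, 0.2, 0.1, 0.1),
    "W-SR-O": (0.1, 0.7, 0.5, 0.4),
    "W-SH-SR-O": (0.0, 0.0, 0.0, 0.0),
    "W-WB": (6.5, 3.9, 2.9, 1.9),
    "W-WB-SH": (0.1, 0.2, 0.3, 0.0),
    "W-WB-SR": (0.2, 0.9, 0.8, 0.5),
    "W-WB-O": (0.0, 0.6, 0.1, 0.3),
    "W-WB-SH-SR": (0.1, 0.1, 0.0, 0.0),
    "W-WB-SH-O": (0.0, 0.0, 0.0, 0.0),
    "W-WB-SR-O": (0.0, 0.2, 0.0, 0.0),
    "W-WB-SH-SR-O": (0.0, 0.0, 0.0, 0.0),
}


def share_with_work_tour(table=TABLE_2):
    """Sum the percentages of all patterns whose first tour code is W."""
    totals = [0.0] * len(COLUMNS)
    for label, row in table.items():
        if label.split("-")[0] == "W":
            for i, pct in enumerate(row):
                totals[i] += pct
    return dict(zip(COLUMNS, (round(t, 1) for t in totals)))


if __name__ == "__main__":
    for column, pct in share_with_work_tour().items():
        print(f"{column}: {pct:.1f}% of workers make at least one work tour")
```

Under this tabulation, roughly 88% of workers in the GPS-with-recall sample make at least one work tour, compared with about 42% on Day 1 and about a quarter on later days of the GPS-only sample, which is consistent with the discrepancy discussed in the text.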
Table 3 shows the percent of work tours with each primary mode. While any differences across survey days are not obvious, the GPS-only sample has much higher drive-alone mode shares, and much lower shared-ride mode shares, than the GPS-with-recall sample. Note that for auto tours the definition of primary tour mode is based on the highest-occupancy trip during the tour. The GPS-only sample also has lower walk and local bus mode shares.

The differences observed here could be the result of two major factors. First, they could be the result of imperfect participant adherence to the survey protocol. Specifically, the GPS loggers need to be carried along and recharged on a daily basis. It may be easy for participants to leave the loggers at home, especially on later survey days.
Table 3: Percent of Work Tours with Each Primary Mode

                                     GPS with Recall   GPS-Only Sample
Alt  Label                           Day 1             Day 1     Day 2     Day 3/4   All
1    Walk                            7.2%              3.1%      2.8%      2.1%      2.8%
2    Bike                            1.1%              1.1%      0.8%      0.9%      1.0%
3    Drive-Alone                     43.8%             79.8%     81.7%     80.1%     80.3%
4    Shared Ride 2                   24.8%             9.6%      9.1%      11.1%     9.8%
5    Shared Ride 3+                  17.3%             0.8%      0.3%      0.9%      0.7%
6    Walk to Local Bus               4.5%              3.8%      3.5%      3.0%      3.5%
7    Walk to Express Bus             0.2%              0.0%      0.2%      0.2%      0.1%
8    Walk to Rail                    0.1%              0.2%      0.3%      0.0%      0.2%
10   Park-and-Ride to Express Bus    0.3%              0.9%      0.2%      1.2%      0.8%
11   Park-and-Ride to Rail           0.6%              0.5%      1.0%      0.4%      0.6%
13   Kiss-and-Ride to Express Bus    0.1%              0.2%      0.0%      0.0%      0.1%
14   Kiss-and-Ride to Rail           0.0%              0.1%      0.0%      0.2%      0.1%
     Total                           100.0%            100.0%    100.0%    100.0%    100.0%
Second, differences could result from limitations of the imputation process. Shared ride trips are identified based on overlapping GPS traces for different household members. However, it is clearly difficult to impute the number of people in a vehicle, especially if some household members do not bring their devices with them. It may also be difficult to detect the mode when buses and cars operate in mixed traffic. The tour purpose can theoretically be imputed from the land use surrounding the GPS points at the tour destination, in combination with known work and school locations. In practice, this imputation process may be harder, given the much lower rate of work tours and higher rate of non-work tours observed in the GPS-only sample.

These data show the limitations of the GPS-only approach, as used in the Cleveland HTS. Such an approach would have serious implications if the GPS-only sample were used to develop a travel model, and it is a caveat on the remaining results presented in this paper too. The implications are less severe here because the GPS-only sample is less different across days than it is different from the GPS-with-recall sample, but the magnitude of the effect on the remaining results remains unknown.

4. Sample Size Implications

4.1. Methodology

To understand whether the potential cost savings from multi-day surveys are realized, it is necessary to consider the value of the additional data versus its cost. Stopher et al. (2008) provide a framework for the evaluation of variance for multi-day surveys in the GPS era, and Pas (1986) develops an explicit cost model for comparing single-day and multi-day studies. The end result is that the optimal number of survey days is determined by two factors: a cost ratio and a design effect. The cost ratio is the ratio of the marginal cost of surveying a household for T days to the cost of surveying a household for one day.
The design effect is a measure of the additional information gained by surveying additional days versus surveying additional households. It is measured as the ratio of the true variance to the variance assuming a simple random sample, with each day’s worth of information assumed to be fully independent. A design effect larger than 1 indicates positive correlation between days within households, reducing the information contributed from the extra monitored days beyond the first day. In deriving the design effects, we start from the ratio of the variance in each multi-day file to the variance in the single-day file. The variance ratios for the two-day and full files, as compared to the one-day file, are computed as:
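$$vr\left(\bar{y}^{(2)}\right) = \frac{v_J\left(\bar{y}^{(2)}\right)}{v_J\left(\bar{y}^{(1)}\right)} \qquad (1)$$

$$vr\left(\bar{y}^{(3)}\right) = \frac{v_J\left(\bar{y}^{(3)}\right)}{v_J\left(\bar{y}^{(1)}\right)} \qquad (2)$$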
where $v_J\left(\bar{y}^{(T)}\right)$ is the jackknife variance estimator from the one-day, two-day or full file, with T=1, 2 or 3 respectively. In terms of design effects, $\bar{y}^{(2)}$ has twice as many days, so we would expect the variance to be half as much if each extra day per household is providing as much new information as the first day. Thus the design effects are defined to compare the variance of $\bar{y}^{(2)}$ to half that of $\bar{y}^{(1)}$, and similarly for $\bar{y}^{(3)}$:

$$deff\left(\bar{y}^{(2)}\right) = \frac{v_J\left(\bar{y}^{(2)}\right)}{v_J\left(\bar{y}^{(1)}\right) / 2} \qquad (3)$$

$$deff\left(\bar{y}^{(3)}\right) = \frac{v_J\left(\bar{y}^{(3)}\right)}{v_J\left(\bar{y}^{(1)}\right) / T_{3+}} \qquad (4)$$
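The calculation behind equations (1) through (4) can be sketched on synthetic data. The example below is a minimal illustration and not the authors' implementation: it uses an unweighted mean, a simple delete-a-group jackknife in which whole households are dropped together, and a simulated panel in which a household effect with variance share `a` induces day-to-day correlation. The actual replicate weighting scheme is documented separately (Rizzo and Erhardt 2016), and all function names and parameters here are assumptions made for illustration.

```python
import numpy as np


def jackknife_variance(values, household_ids, n_groups=200, seed=0):
    """Delete-a-group jackknife variance of the unweighted mean of `values`.

    Households (not person-days) are assigned to replicate groups, so all
    days from one household are dropped together in each replicate.
    """
    rng = np.random.default_rng(seed)
    households, inverse = np.unique(household_ids, return_inverse=True)
    # Same seed and same household set give the same grouping for every file.
    group_of_household = rng.permutation(len(households)) % n_groups
    groups = group_of_household[inverse]
    full_estimate = values.mean()
    replicates = np.array(
        [values[groups != g].mean() for g in range(n_groups)]
    )
    return (n_groups - 1) / n_groups * np.sum((replicates - full_estimate) ** 2)


def simulate_panel(n_households=4000, n_days=3, a=0.5, seed=1):
    """Synthetic household-by-day outcomes with within-household correlation a."""
    rng = np.random.default_rng(seed)
    household_effect = rng.normal(0.0, np.sqrt(a), size=(n_households, 1))
    day_noise = rng.normal(0.0, np.sqrt(1.0 - a), size=(n_households, n_days))
    return household_effect + day_noise


def variance_ratio_and_deff(panel, n_days, n_groups=200):
    """Variance ratio and design effect of the n_days file vs. the one-day file."""
    ids = np.arange(panel.shape[0])
    v_one = jackknife_variance(panel[:, 0], ids, n_groups)
    v_multi = jackknife_variance(
        panel[:, :n_days].ravel(), np.repeat(ids, n_days), n_groups
    )
    return v_multi / v_one, v_multi / (v_one / n_days)


if __name__ == "__main__":
    for a in (0.0, 0.5, 1.0):
        vr, deff = variance_ratio_and_deff(simulate_panel(a=a), n_days=2)
        print(f"a={a:.1f}: variance ratio={vr:.2f}, design effect={deff:.2f}")
```

In this sketch every synthetic household has the same number of days, so the divisor in the design effect is simply the number of days in the file. When the days are perfectly correlated (a=1), the two-day file carries no more information than the one-day file, so the variance ratio is 1 and the design effect is 2.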
$T_{3+}$ is the mean number of monitored weekdays per sampled household for the full file. As there are three randomly selected travel days with $T=3$ (Monday, Tuesday, and Wednesday as the start days) and two randomly selected travel days with $T=2$ (Thursday and Friday as the start days), we in general set $T_{3+} = 2.6$. Following Pas (1986), we can decompose the variance ratios and design effects as follows, with T equal to 2 for the two-day file and equal to $T_{3+}=2.6$ for the full file:

$$vr\left(\bar{y}^{(T)}\right) = \frac{1 + a \cdot (T - 1)}{T} \qquad (5)$$

$$deff\left(\bar{y}^{(T)}\right) = 1 + a \cdot (T - 1) \qquad (6)$$
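A short numerical sketch may help make equations (5) and (6) concrete. It reproduces the 2.6-day average under the assumption that each weekday is equally likely to be the start day, inverts equation (6) to recover a from a design effect, and converts a into an equivalent single-day sample size factor. That last step assumes variance is inversely proportional to the number of households, so the factor is 1/vr; this is an interpretation added here, not a formula stated in the text, and all function names are illustrative.

```python
def mean_days_full_file(days_by_start_day):
    """Average monitored weekdays, assuming equally likely start days."""
    return sum(days_by_start_day.values()) / len(days_by_start_day)


def design_effect(a, n_days):
    """Equation (6): deff = 1 + a * (T - 1)."""
    return 1.0 + a * (n_days - 1.0)


def variance_ratio(a, n_days):
    """Equation (5): vr = (1 + a * (T - 1)) / T."""
    return design_effect(a, n_days) / n_days


def within_person_a(deff, n_days):
    """Invert equation (6) to recover a from an estimated design effect."""
    return (deff - 1.0) / (n_days - 1.0)


def equivalent_single_day_factor(a, n_days):
    """Size of a single-day sample with the same variance, relative to the
    multi-day sample's household count (assumes variance ~ 1 / households)."""
    return 1.0 / variance_ratio(a, n_days)


# Monday-Wednesday starts yield three weekdays; Thursday-Friday starts yield two.
T_FULL = mean_days_full_file({"Mon": 3, "Tue": 3, "Wed": 3, "Thu": 2, "Fri": 2})

if __name__ == "__main__":
    print(f"Mean monitored weekdays in the full file: {T_FULL:.1f}")
    for a in (0.0, 0.25, 0.5, 0.75, 1.0):
        factor = equivalent_single_day_factor(a, T_FULL)
        print(f"a={a:.2f}: deff={design_effect(a, T_FULL):.2f}, "
              f"equivalent single-day sample factor={factor:.2f}")
```

The two limiting cases match the interpretation given in the text: with a=0 the full file is worth as much as a single-day sample with 2.6 times as many households, and with a=1 the extra days add nothing, so the factor is 1.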
Suppose the design was simple random sampling under a simple multivariate normal (MVN) population model. In this case, $a$ in the design effect formula would be the correlation coefficient. In our case, the design and estimator are more complicated: $\bar{y}^{(1)}$ itself has design effects from household-level clustering and weighting effects, which $\bar{y}^{(2)}$ and $\bar{y}^{(3)}$ also share. In our case then $a$ is interpreted as something analogous to, but not completely equivalent to, a bivariate normal correlation coefficient. It is this estimated within-person a value that is reported in the results section of this report, on a percentage basis. It is a measure of how consistent person-days are within persons as measured by the jackknife variance estimates.

The interpretation of positive values is that if a behavior happens on one day, it is more likely to happen on the next. This is true of full-time workers going to work, for example. The interpretation of negative values is that if that type of behavior happens on one day, it is less likely to happen on the next. This may be true of shopping tours, because many people may be able to fulfil their weekly shopping needs with a trip on a single day. A +100% a value would be indicative of perfect positive correlation across travel days, indicating that the extra days provided no new information, such as would be expected from auto ownership. A 0% a value would be indicative of no correlation across travel days, such that sampling additional days adds as much value as sampling additional households.

4.2. Summary of Results

As discussed in section 1.2, the survey is used both to estimate choice models from the disaggregate records (labeled below with “Model”), and to produce tabular summaries of aggregate measures (labeled below with “Table”). For example, in mode choice the output includes time and cost coefficients for the mode choice utility equations, as well as a summary of the total trips by mode and market segment.
The design effects and correlation coefficients are specific to each estimated parameter or tabulated value within a key travel choice. There are hundreds of individual values (as is typical for the number of coefficients in a travel demand model), and the results show a wide range of values. The question therefore arises as to how the different correlation coefficients should be combined to make a judgment about the survey’s sample size. We begin by reporting the median and the 75th percentile correlation coefficient for each key travel choice, as shown in Table 4. For readers interested in examining the correlation coefficients for individual model parameters they deem to be of particular importance, those detailed results are available elsewhere (Rizzo and Erhardt 2016).

In Table 4, the importance column shows the authors’ judgment of the relative importance of each component for a travel model as a whole. The median a values and 75th percentile a values are generally weighted medians or weighted 75th percentiles by total tours. The means for the tabular values and for the model estimation values are weighted means of the corresponding median and 75th percentile a values, using the importance weights given. The grand means are the simple means of the mean for tabular values and the mean for model estimation values.

The highest correlation coefficient is for auto ownership, with a median value of 100%, indicating perfect correlation. This result is expected, and is a verification of the calculations. The lowest positive correlation coefficient is for a tabulation of the total number of tours by person type and tour purpose, with a median value of only 4.4%.

A few points are of note in examining these results. First, as can be observed in the summary values for model estimation, there are generally higher intra-person day-to-day correlations for work travel than for non-work travel. This is what we would expect. The work activity imposes both a constraint and a level of consistency on travel behavior that persists across days, while non-work travel is significantly more variable, presumably because the activities that drive it are more variable.
Table 4: Summary of Within-Person Correlation Coefficients (a)

| Category of estimate | Median a value | 75th Percentile a value | Importance |
| --- | --- | --- | --- |
| Table: auto ownership by county | 100.0% | 100.0% | 10% |
| Table: total tours by person type | 17.6% | 25.6% | 15% |
| Table: total tours by person type/tour purpose | 4.4% | 7.6% | 15% |
| Table: trip distance by trip purpose | 48.2% | 58.4% | 10% |
| Table: trip duration by trip purpose | 45.1% | 49.7% | 10% |
| Table: percentage trips county to county | 54.2% | 74.5% | 10% |
| Table: mode choice by auto sufficiency (0 autos, autos < workers, autos >= workers) | -7.0% | -4.5% | 30% |
| Mean for tabular values | 26.0% | 31.9% | 100% |
| Model: auto ownership | 100.0% | 100.0% | 10% |
| Model: worker tour generation | 29.1% | 47.0% | 15% |
| Model: non-worker tour generation | 18.0% | 27.3% | 15% |
| Model: work tour destination choice | 84.4% | 90.5% | 15% |
| Model: social/recreation tour destination choice | 23.4% | 35.4% | 15% |
| Model: work tour mode choice | 74.1% | 89.1% | 15% |
| Model: social/recreation tour mode choice | 50.4% | 70.1% | 15% |
| Mean for model estimation values | 51.9% | 63.9% | 100% |
| Grand mean | 39.0% | 47.9% | |
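The summary rows of Table 4 can be recomputed from the individual rows and their importance weights. The following is a minimal Python sketch of that arithmetic, not the authors' code: the dictionary names and the `weighted_mean` helper are illustrative assumptions, and the inputs are the rounded percentages printed in the table, so the results may differ from the published means in the last decimal place.

```python
# Rows of Table 4 as (median a, 75th percentile a, importance weight).
# a values are in percent; weights are fractions that sum to 1 within each group.
TABULAR = {
    "auto ownership by county": (100.0, 100.0, 0.10),
    "total tours by person type": (17.6, 25.6, 0.15),
    "total tours by person type/tour purpose": (4.4, 7.6, 0.15),
    "trip distance by trip purpose": (48.2, 58.4, 0.10),
    "trip duration by trip purpose": (45.1, 49.7, 0.10),
    "percentage trips county to county": (54.2, 74.5, 0.10),
    "mode choice by auto sufficiency": (-7.0, -4.5, 0.30),
}

MODEL = {
    "auto ownership": (100.0, 100.0, 0.10),
    "worker tour generation": (29.1, 47.0, 0.15),
    "non-worker tour generation": (18.0, 27.3, 0.15),
    "work tour destination choice": (84.4, 90.5, 0.15),
    "social/recreation tour destination choice": (23.4, 35.4, 0.15),
    "work tour mode choice": (74.1, 89.1, 0.15),
    "social/recreation tour mode choice": (50.4, 70.1, 0.15),
}


def weighted_mean(rows, column):
    """Importance-weighted mean of one column (0 = median, 1 = 75th percentile)."""
    total_weight = sum(row[2] for row in rows.values())
    return sum(row[column] * row[2] for row in rows.values()) / total_weight


def grand_mean(column):
    """Simple mean of the tabular and model-estimation weighted means."""
    return (weighted_mean(TABULAR, column) + weighted_mean(MODEL, column)) / 2.0


if __name__ == "__main__":
    for label, column in (("median", 0), ("75th percentile", 1)):
        print(label,
              round(weighted_mean(TABULAR, column), 1),
              round(weighted_mean(MODEL, column), 1),
              round(grand_mean(column), 1))
```

Because the mode choice by auto sufficiency row carries a 30% weight and a negative a value, it pulls the tabular mean down noticeably, which is relevant to the caution expressed about that row in the discussion.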
Second, it appears that the model estimation values may have higher intra-person day-to-day correlations than the tabular values. This is not definitive, as it is influenced by the negative values for the mode choice by auto sufficiency table (auto sufficiency is a categorical variable defined as: 0 autos, autos < workers, autos >= workers), which should be viewed with skepticism due to the low number of observations for many modes and potential bias in the GPS imputation of modes. However, the same pattern can be observed for tour generation, where the tour generation estimation results have higher a values than the tour generation tabulations. While this would mean that there is less value to collecting additional travel days for model estimation, it may reflect positively on the predictive ability of the models.

Third, tour generation generally has lower correlation coefficients than the other key travel choices, reflecting more day-to-day intra-person variability. This highlights the importance of considering a range of key travel choices, and means that studies that make recommendations focused solely on matters of trip or tour generation should be viewed with skepticism.

4.3. Equivalent Sample Size

The variance ratios can be used to calculate the equivalency between the sample size of a multi-day survey and its single-day counterpart. This is done using the formula:

$$ S_G = S_H \cdot \frac{1 + a(T - 1)}{T} \qquad (7) $$
where $S_G$ is the new (reduced) sample size, $S_H$ is the sample size for a one-day survey, $T$ is the survey length in days (in this case 3), and $a$ is the correlation coefficient. In this equation, the sample size is in households. If a were zero, indicating no correlation across days, a three-day survey with a sample of 333 households would be equivalent to a one-day survey with a sample of 1000 households. Applying this formula, with the correlation coefficients taken from Table 4 (in combination with the importance weights shown in Table 4), we can calculate the equivalent sample sizes shown in Table 5.

Table 5: Equivalent Sample Size for 2,780 GPS-Only Households in Cleveland HTS

| Category of estimate | Median a value | Equiv. Sample Size for Median a | 75th Percentile a value | Equiv. Sample Size for 75th Percentile a |
| --- | --- | --- | --- | --- |
| Mean for tabular values | 26.0% | 5,487 | 31.9% | 5,092 |
| Mean for model estimation values | 51.9% | 4,092 | 63.9% | 3,661 |
| Grand mean | 39.0% | 4,685 | 47.9% | 4,259 |
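The entries in Table 5 follow from inverting equation (7): the 2,780 three-day households are divided by the variance ratio (1 + a(T - 1))/T to obtain the equivalent number of one-day households. The sketch below illustrates that calculation in Python; the function names are assumptions introduced here for illustration and do not come from the original analysis.

```python
def variance_ratio(a, days):
    """Ratio in equation (7): (1 + a(T - 1)) / T, with a given as a fraction."""
    return (1.0 + a * (days - 1)) / days


def multi_day_households(one_day_households, a, days):
    """Equation (7): households needed in a T-day survey to match a one-day survey."""
    return one_day_households * variance_ratio(a, days)


def equivalent_one_day_households(multi_day_sample, a, days):
    """Equation (7) inverted: one-day sample size equivalent to a T-day sample."""
    return multi_day_sample / variance_ratio(a, days)


if __name__ == "__main__":
    # Rounded a values from Table 4, applied to the 2,780 GPS-only households.
    for label, a in (("tabular, median", 0.260),
                     ("model estimation, median", 0.519),
                     ("grand mean, median", 0.390)):
        print(label, round(equivalent_one_day_households(2780, a, 3)))
```

With a equal to 1 the ratio is 1 and the extra days add nothing; with a equal to 0 the ratio is 1/T and each additional day is worth as much as an additional household.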
The equivalent sample sizes are higher for the tabular values than for the model estimation values because the a values are lower. Thus, additional survey days add more precision to the tabular values used for model calibration than to model estimation results. There is a level of judgment required in selecting among these, depending on which uses are more highly valued, and how conservative one wants to be. In the best case, conducting a 3-day survey is equivalent to nearly doubling the sample size when compared to households surveyed for a single day. In the most conservative case, it is equivalent to increasing the sample size by only 32%.

4.4. Optimal Number of Survey Days

Pas (1986) presents a framework for estimating the optimal number of survey days based on the within-person correlation coefficient (a), and the ratio of the cost of collecting each additional day (q) to the cost of recruiting a household and collecting the first day’s worth of data (p). Table 6 shows the optimal number of survey days calculated for a range of correlation coefficients and cost ratios. The correlation coefficients reflect a reasonable range, based on the values found in Table 4. Stopher et al. (2008) report cost ratios for a selection of surveys, ranging from 0.063 to 0.124. The values shown in the table reasonably bound those.
Table 6: Optimal Number of Survey Days for Different Cost Ratios and Within-Person Correlation Coefficients

Columns give the ratio of the cost of collecting each additional day to the cost of recruiting an additional household (q/p).

| Correlation a | q/p = 0.05 | q/p = 0.075 | q/p = 0.1 | q/p = 0.15 | q/p = 0.2 |
| --- | --- | --- | --- | --- | --- |
| 25.0% | 8 | 6 | 5,6 | 4,5 | 4 |
| 37.5% | 6 | 5 | 4 | 3 | 3 |
| 50.0% | 4,5 | 4 | 3 | 3 | 2 |
| 62.5% | 3,4 | 3 | 3 | 2 | 2 |
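The trade-off behind Table 6 can be illustrated with a small search over candidate survey lengths. The sketch below is written under stated assumptions and is not Pas's (1986) exact formulation: it assumes a per-household cost of p + q(T - 1), a variance proportional to (1 + a(T - 1))/T per household as in equation (7), and a fixed total budget, so that the best T minimizes the product of the two. The function names and the `max_days` cap are illustrative. Because the cost model is assumed, individual results need not match every cell of Table 6.

```python
def relative_variance_at_fixed_budget(days, a, cost_ratio):
    """Variance of an estimate, up to a constant, for a fixed survey budget.

    Assumes cost per household = p + q(T - 1), expressed in units of p so that
    cost_ratio = q / p, and variance per household = (1 + a(T - 1)) / T.
    """
    cost_per_household = 1.0 + cost_ratio * (days - 1)
    variance_per_household = (1.0 + a * (days - 1)) / days
    # Households affordable is proportional to 1 / cost_per_household,
    # so total variance is proportional to the product below.
    return cost_per_household * variance_per_household


def optimal_days(a, cost_ratio, max_days=14):
    """Integer survey length (1..max_days) that minimizes variance for the budget."""
    return min(range(1, max_days + 1),
               key=lambda days: relative_variance_at_fixed_budget(days, a, cost_ratio))


if __name__ == "__main__":
    for a in (0.25, 0.375, 0.50, 0.625):
        row = [optimal_days(a, ratio) for ratio in (0.05, 0.075, 0.1, 0.15, 0.2)]
        print(f"a = {a:.3f}: {row}")
```

Under these assumptions, the optimum shortens as either the correlation or the cost ratio rises, which is the same pattern shown in Table 6.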
Assuming one wants to be reasonably conservative, it appears reasonable to conduct surveys with 2, 3 or 4 days of data collection per household. This evidence would argue against very long surveys with a week or more of data collection, unless the cost factor is very low or the intended purpose is very specific. Furthermore, regardless of the number of days, it is still necessary to ensure that enough households are sampled to meet the needs of models with little or no day-to-day variation.

5. Conclusions and Future Work

This research contributes to our understanding of the value of multi-day household travel surveys. It does so using empirical evidence from the 2012 Northeast Ohio Regional Travel Survey. It considers the two criteria that a multi-day survey should meet: that each day of data collection should provide an unbiased representation of travel on that day, and that subsequent days of data collection should provide additional information.

The first criterion is evaluated by testing for potential bias, both across survey days and across the data collection method. This specific survey fails on both counts. The analysis shows that within the GPS-only sample, there are significantly fewer trips and significantly less non-auto travel on the second and third weekday travel days, relative to the first. In addition, there are significant differences between the GPS-only and the GPS-with-prompted-recall samples. This investigation suggests that the GPS-only approach needs further improvement before it can be reliably used as a stand-alone method of data collection.

The additional information that weekdays 2 and 3 of data collection provide is evaluated relative to the additional information that could be obtained by collecting additional households with a single day of data collection. This is done by calculating design effects and correlation coefficients for each model coefficient or tabular value for several key travel choices.
The correlation coefficient, or within-person a value, measures the correlation of intra-person day-to-day travel, with values of 100% indicating perfect correlation and 0% indicating no correlation across travel days. The results show average median and 75th percentile values of 26% and 32% for tabular values and of 52% and 64% for estimated model coefficients. These values imply that the multi-day Northeast Ohio sample is equivalent to a single-day sample between 26% and 64% larger.

The correlation coefficients can be used in combination with the ratio of the cost of collecting additional days of travel data to the cost of recruiting new households (and collecting the first day of travel data) to calculate the optimal number of survey days. This evidence suggests that a survey 2, 3 or 4 days in length may be optimal, depending on the cost ratio and how conservative one wants to be.

The empirical evidence provided on sample size is limited by the biases found in the GPS-only sample. Therefore, it would be prudent to repeat this study using a survey that does not contain these biases. Fortunately, this study has developed a detailed method to analyze future surveys, and shown that method to be viable. Several features should be highlighted. The jackknife variance estimator appears to effectively manage the repeated measurement problem. The calculation of design effects and intra-person day-to-day correlation coefficients provides the necessary information to understand which travel choices benefit more from additional days, and which benefit less. Finally, it is important to conduct the analysis for multiple uses of travel surveys, because the results vary substantially for different choices and between tabular values used for model calibration and coefficients in model estimation.
Acknowledgments

This research was funded through the National Cooperative Highway Research Program (NCHRP) on Project 0836C, Task 123: Survey Sample Size and Weighting. The full project report and associated appendices have been published by the Transportation Research Board as Research Results Digest 400 (Rizzo and Erhardt 2016). The work was completed by the authors as representatives of RAND and Westat. We thank Larry Goldstein and the NCHRP project panel for their valuable input, and the Ohio Department of Transportation and the Northeast Ohio Areawide Coordinating Agency (NOACA) for providing data.

References

Bhat, C.R., Sivaramakrishnan S., Axhausen, K., 2005. An Analysis of Multiple Interepisode Durations Using a Unifying Multivariate Hazard Model. Transportation Research Part B: Methodological 39 (9): 797–823.

Cherchi, E., Ortúzar, J. de D., 2008. Empirical Identification in the Mixed Logit Model: Analysing the Effect of Data Richness. Networks and Spatial Economics 8 (2–3): 109–24.

Cirillo, C., Daly, A., Lindveld, K., 1998. Eliminating Bias Due to the Repeated Measurements Problem in Stated Preference Data. In Operations Research and Decision Aid Methodologies in Traffic and Transportation Management, edited by Martine Labbé, Gilbert Laporte, Katalin Tanczos, and Philippe Toint, 286–301. NATO ASI Series 166. Springer Berlin Heidelberg.

Dill, J., Broach, J., 2014. Travel to Common Destinations: An Exploration Using Multiday GPS Data. Transportation Research Record: Journal of the Transportation Research Board, no. 2413: pp. 84–91.

Golob, T.T., Meurs, H., 1986. Biases in Response over Time in a Seven-Day Travel Diary. Transportation 13 (2): 163–81.

Kang, H., Scott, D.M., 2009. Modeling Day-to-Day Dynamics in Individuals’ Activity Time Use Considering Intra-Household Interactions. In The 88th Annual Meeting of the Transportation Research Board. Washington, D.C.

Meurs, H., van Wissen, L., Visser, J., 1989. Measurement Biases in Panel Data. Transportation (Netherlands) 16 (2): 175–94.

Oliveira, M.S., Gupta, S., 2013. NOACA HTS Data Imputation Process. Technical Memorandum. Westat and Parsons Brinckerhoff.

Ortúzar, J. de D., Willumsen, L.G., 2011. Modelling Transport, 4th ed. John Wiley & Sons.

Parsons Brinckerhoff Inc., Westat, Dunbar Transportation Consulting, 2014. Activity-Based Modeling Framework Final Project Report. Prepared for North Central Texas Council of Governments, Arlington, Texas.

Pas, E.I., 1986. Multiday Samples, Parameter Estimation Precision, and Data Collection Costs for Least Squares Regression Trip-Generation Models. Environment and Planning A 18 (1): 73–87.

Pas, E.I., 1987. Intrapersonal Variability and Model Goodness-of-Fit. Transportation Research Part A: General 21 (6): 431–38.

Pendyala, R.M., Pas, E.I., 2000. Multi-Day and Multi-Period Data for Travel Demand Analysis and Modeling. In Transportation Research Circular E-C008: Transportation Surveys: Raising the Standard. Transportation Research Board, Washington, D.C.

Rizzo, L., Erhardt, G.D., 2016. Sample Size Implications of Multi-Day GPS-Enabled Household Travel Surveys. Research Results Digest 400. Transportation Research Board, Washington, D.C.

Rose, J.M., Hess, S., Bliemer, M.C.J., Daly, A., 2009. The Impact of Varying the Number of Repeated Choice Observations on the Mixed Multinomial Logit Model. In European Transport Conference, Leeuwenhorst, The Netherlands.

Southeast Florida Transportation Council, 2014. 2015 Southeast Florida Household Travel Survey: White Paper. Modeling Subcommittee, Regional Transportation Technical Advisory Committee. http://www.fsutmsonline.net/images/uploads/mtffiles/Southeast_Florida_Household_Travel_Survey_0205_2014.pdf.

Stopher, P., Kockelman, K., Greaves, S., Clifford, E., 2008. Reducing Burden and Sample Sizes in Multiday Household Travel Surveys. Transportation Research Record: Journal of the Transportation Research Board, no. 2064: pp. 12–18.

Valliant, R., Dever, J.A., Kreuter, F., 2013. Practical Tools for Designing and Weighting Survey Samples. Springer, New York, NY.

Wilhelm, J., Wolf, J., Kang, E., Taylor, D., 2014. The Cleveland GPS Household Travel Survey: Survey Design, Imputation of Trip Characteristics, and Secondary Uses of the Data. In Transportation Research Board Annual Meeting. Washington, D.C.

Wolf, J., Bachman, W., Oliveira, M., Auld, J., Mohammadian, A., Vovsha, P., 2014. Applying GPS Data to Understand Travel Behavior. Volume I. National Cooperative Highway Research Program NCHRP Report 775. Transportation Research Board, Washington, D.C.

Wolter, K., 2007. Introduction to Variance Estimation. 2nd edition. Springer, New York, NY.

Xu, Y., Guensler, R., 2015. Capturing Personal Modality Styles Using Multiday GPS Data. Presented at the TMIP Webinar: Using Multiday GPS Data, July 16, 2015.