Transpn Res. Vol. 15A, No. 6, pp. 473-485, 1981. Printed in Great Britain.
A STRATIFICATION ANALYSIS OF TASTE VARIATIONS IN WORK-TRIP MODE CHOICE
RYUICHI KITAMURA†
Department of Civil Engineering, University of California, Davis, CA 95616, U.S.A.
(Received 16 November 1979; in revised form 20 December 1980)
Abstract-This study develops and applies a heuristic stratification procedure to the exploration of variation in tastes in tripmakers’ choice of travel mode. The stratification procedure systematically identifies a set of socioeconomic subgroups that effectively accounts for the variation. Empirical results indicate that a limited number of socioeconomic variables are associated with the taste variation, that the entire sample can be stratified into a few socioeconomic subgroups with distinctive tastes, and that choice models can be relevantly specified for respective subgroups using exclusively level-of-service variables of travel modes as the model’s independent variables.
1. INTRODUCTION
One aspect of discrete travel-choice behavior to which relatively little attention has been directed is the variation in tastes across tripmakers. This paper develops an empirical stratification procedure that explicitly accounts for the variation in developing a set of internally homogeneous strata. The result of the study indicates that the variation can be explained by stratifying the tripmakers into a few groups defined by a limited number of systematically selected socioeconomic variables.

The application of disaggregate choice models to transportation demand analysis typically assumes that all choice makers have identical tastes toward (or valuations of) the attributes that characterize the alternatives involved in the choice. That assumption is frequently found in the use of constant-coefficient models where the model’s coefficients represent the valuations of respective attributes entered into the model as independent variables. An intuitive review of travel behavior, however, may cast doubt on this assumption of homogeneity. Many empirical results in the last ten years suggest heterogeneity in tastes across tripmakers or groups of tripmakers (e.g. Stopher, 1969; Constantino, Dobson and Canty, 1974; Lerman and Ben-Akiva, 1976; Burns and Golob, 1976; Nicolaidis, Wachs and Golob, 1977). Besides its fundamental importance in various transportation market analyses (e.g. identification of target groups or needs groups for transportation services), accounting for this variation in tastes plays a crucial role in the relevant application of choice models (see, e.g. McFadden et al., 1977).

Little is known about the nature of taste variations. There are, however, at least three approaches to the problem in discrete travel-choice analysis.‡

†The author was a graduate student at the University of Michigan, Ann Arbor, when this research was performed.
‡Yet another approach is attitudinal segmentation. Attitudinal segmentation has been a successful field in transportation analysis where homogeneous perceptual segments are developed by applying psychometric and multivariate analysis techniques (e.g. Golob et al., 1972; Dobson and Kehoe, 1974; Nicolaidis and Sheth, 1976; Stopher, 1977; Dobson and Tischer, 1978; Tardiff, 1979). These attitudinal segmentations involve clustering of individuals in a multidimensional perception space obtained from attitudinal survey data. However, these segmentation studies have been little concerned with application of these segments to the prediction of behavior employing choice models (an example is found in Nicolaidis et al., 1977).

The first is the use of random-coefficient models (Hausman and Wise, 1978; Lerman and Manski, 1981). This approach employs probit models together with the assumption that the model coefficients are multinormally-distributed random variables, being independent of other factors. The introduction of the randomness is quite appealing. At the same time, however, one might expect, and wish to explore, possible associations between tastes and some attributes of the tripmaker. The approach does not directly apply to such exploration.

The second is the model-specification approach where the model coefficients are specified as appropriate functions of the variables that explain the variation. In other words, the structure of taste variations is explicitly formulated as a function of relevant variables. For example, a pre-BART work-trip mode choice model estimated by Train (1978) employs a travel cost variable divided by the wage rate of the tripmaker. The specification thus assumes that the relative weight of monetary travel cost in mode choice varies as a reciprocal function of the wage rate. Train and McFadden (1978) further discussed the way the wage rate and other socioeconomic variables enter the model on the basis of the goods-leisure tradeoff scheme of microeconomics.

The third approach is stratification. This approach assumes the existence of a set of distinctive strata which can be regarded as internally homogeneous in tastes. Operationally, the approach is less explicit about the structure of taste variations in that, unlike the specification approach, the stratification does not involve the functional specification of the model coefficients.
Examples include the use of life-cycle and occupation strata in joint-choice modeling of household car-ownership and work-trip mode choice (Lerman and Ben-Akiva, 1976); stratification based on the number of household members of driving age (Burns and Golob, 1976); and stratification based on perceived mode availability
(Recker and Golob, 1976). These studies reported important differences in the estimated coefficients of choice models across the strata, and concluded that the approach is effective. Development of these stratifications, however, was based largely on the behavioral intuitions of the researchers, and not on comparative analysis of possible alternative stratifications with respect to their effectiveness and validity in choice-model applications.

The model-specification and stratification approaches are equivalent to some extent (see next section) and share similar difficulties in practical applications. In the model-specification approach, one must first identify those variables that are associated with the variation, then, using them, specify a functional model form that accounts for the variation. The type of variables required may vary depending on the attribute of the alternative. For example, the taste toward walking time may be accounted for mainly by the age of the tripmaker, while those toward other time elements may be more closely associated with income, etc. Available theories may have rather limited capabilities in directing the model specification effort. On the other hand, a serious difficulty may arise because of the possible extensiveness of the potential variables and their interactions that may influence tripmakers’ tastes.

When discrete approximation, and thus aggregation error, is acceptable, the stratification approach is less difficult in that it requires only the identification of appropriate variables and the definition of the strata such that the homogeneity condition is satisfied. Once a set of homogeneous strata is obtained, a constant-coefficient model can be relevantly applied to explain the travel behavior of each stratum. Such a model requires only those variables that represent the attributes of alternatives. One can then legitimately omit other variables that would otherwise have to be included to account for the taste variation.
This alleviates the cumbersome model specification problem under taste variations. Given a number of factors that may be associated with the taste variation, and the lack of theoretical bases to establish an a priori preference for some specific stratification, however, there could be numerous ways of stratifying tripmakers. Practical difficulties involved in evaluating all these potential stratifications lead to the development of a systematic stratification procedure. Evaluation of the tastes of tripmakers and use of the information in developing strata are important elements of the procedure that will make possible effective reduction of internal variations in tastes.

This study develops and applies a heuristic stratification procedure to generate sample subgroups such that constant-coefficient choice models can be relevantly specified on respective subgroups while the variation in tastes is sufficiently accounted for by the structure of the subgroups. The procedure employs a modified version of Automatic Interaction Detection (AID: Sonquist, Baker and Morgan, 1971) and evolves simultaneously with choice-model estimation, thus evaluating the tastes of groups of tripmakers. Stratification bases (variables that define the strata) are systematically selected such that within-group variations in tastes can be effectively reduced with a small number of subgroups. Socioeconomic variables are used as stratification bases considering their predictability and association with the variation in tastes.

This stratification procedure is applied to short-term work-trip mode choice using the Pre-BART Travel Behavior Data Set. The result indicates that a limited number of socioeconomic variables is associated with the variation in tastes, that the entire sample can be stratified into a small number of socioeconomic subgroups with mutually distinctive tastes, and that choice models can be relevantly specified on these subgroups with level-of-service characteristics of travel modes used exclusively as the model’s independent variables. Comparison of the present results with previous model specifications on the same data set reveals certain discrepancies which require further exploration.

The stratification procedure is an effective mechanism in statistically exploring the taste variation and developing homogeneous strata. It is based on the AID algorithm, which was developed to simulate a researcher examining a large data set and exploring it for structure. It is not intended to supersede theoretical model specification, but rather to serve as a tool which, by systematically searching for a statistical stratification, can offer empirical evidence for the development of behavioral theory of taste variations in travel choice.

2. STRATIFICATION PROCEDURE
The stratification problem of this study is formulated as follows: Given a set of mode-choice observations, level-of-service measurements, and socioeconomic attributes of tripmakers, and also given the minimum number of observations to form a subgroup, develop a set of exclusive and exhaustive socioeconomic sample subgroups such that the coefficient vectors of the mode-choice models defined for the respective subgroups become as different as possible from each other. The independent variables of the choice models are level-of-service variables of respective travel modes. This subgrouping at the same time minimizes variations in coefficient values within the subgroups.

As an approach to this problem, this study employs the basic structure of the AID analysis. AID is a mechanism for systematic development of a set of sample subgroups such that the within-group variance of the dependent variable is effectively reduced for each subgroup. Categorical independent variables (called “predictors”) are used to define these subgroups. AID functions as a sequential selector of appropriate predictors to develop such a set of subgroups. The categories of each predictor provide one or more ways of splitting into two parts each subgroup that exists in an intermediate stage of the analysis. AID enumerates the candidate splits generated by all predictors, then selects that split which minimizes the sum of within-group variances. The AID analysis evolves by repeating this split selection procedure until no new splitting becomes possible without violating prespecified conditions on the minimum subgroup size and variance reduction after the split.
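The split-selection step of the basic AID analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`within_ss`, `candidate_splits`, `best_split`) and the record layout are hypothetical, within-group variation is measured by the sum of squared deviations, and every two-way grouping of a predictor's categories is enumerated, which may differ from the options of the original AID program.

```python
from itertools import combinations


def within_ss(values):
    """Sum of squared deviations from the group mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def candidate_splits(categories):
    """Yield every division of a set of categories into two non-empty parts."""
    cats = sorted(categories)
    first, rest = cats[0], cats[1:]
    # Keeping the first category on the left side avoids mirror-image duplicates.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = {first, *extra}
            yield left, set(cats) - left


def best_split(records, predictors, y_key, min_size=1):
    """Return (within-group SS, predictor, left categories) of the best split, or None."""
    best = None
    for p in predictors:
        cats = {rec[p] for rec in records}
        if len(cats) < 2:
            continue  # a predictor with one category cannot split the group
        for left, right in candidate_splits(cats):
            y_left = [rec[y_key] for rec in records if rec[p] in left]
            y_right = [rec[y_key] for rec in records if rec[p] in right]
            if len(y_left) < min_size or len(y_right) < min_size:
                continue  # minimum subgroup size condition
            total = within_ss(y_left) + within_ss(y_right)
            if best is None or total < best[0]:
                best = (total, p, left)
    return best


# Hypothetical toy data: y = 1 if auto was chosen, 0 if bus was chosen.
records = [
    {"cars": 0, "sex": "F", "y": 0.0}, {"cars": 0, "sex": "M", "y": 0.0},
    {"cars": 1, "sex": "F", "y": 1.0}, {"cars": 1, "sex": "M", "y": 1.0},
    {"cars": 2, "sex": "F", "y": 1.0}, {"cars": 2, "sex": "M", "y": 1.0},
]
print(best_split(records, ["cars", "sex"], "y"))
```

In this toy sample the split separating the no-car records from the rest leaves no within-group variation, so it is selected over the split by sex. Repeating the same selection on each resulting subgroup gives the sequential structure of the analysis.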
Applications of this AID analysis to transportation behavior can be found in Hensher (1976) and Constantino, Dobson and Canty (1974). The latter applied socioeconomic variables as the predictors (or as candidate stratification bases). Those AID analyses, however, sought homogeneity in the dependent variable values themselves, and did not necessarily develop strata of internal homogeneity in tastes.

The present study applies the basic AID procedure in a different way. One option of the AID analysis is to select the best split on the basis of the analysis of covariance about the two regression lines respectively estimated for the two subgroups. The criterion in this selection is the deviation of the observed dependent-variable values from the regression lines. Note that this version of AID does not necessarily produce subgroups of similar dependent-variable values.

The stratification procedure of this study applies this AID structure together with choice models. The level-of-service variables of travel modes are the independent variables of this choice model, while the socioeconomic attributes are incorporated in the analysis as predictors that define subgroups. For reasons discussed later, this study employs the binary logit model,

$$\Pr(\text{mode 1 is chosen by tripmaker } i) = \frac{e^{\theta' X_{1i}}}{e^{\theta' X_{1i}} + e^{\theta' X_{2i}}},$$

where $X_{ji}$ is the vector representing the level-of-service characteristics of mode $j$ measured for tripmaker $i$, and $\theta$ is the coefficient vector. It is desired that the value of $\theta$ becomes most distinctive across the subgroups and least variant within each subgroup.

Figure 1 illustrates the development of the stratification procedure. Following enumeration of all possible dichotomous splits, choice models are estimated for the two subgroups from each split. The estimation result is then used to evaluate the distinctiveness in the coefficient values between the two subgroups. The following discussion develops a statistic that can be used for this purpose.

Fig. 1. Stratification procedure.

Consider two subgroups from a parent subgroup. Let $D_n$ be a binary socioeconomic variable that defines these two subgroups, and let

$$D_n = \begin{cases} 0, & \text{if tripmaker } n \text{ belongs to subgroup 1,} \\ 1, & \text{if tripmaker } n \text{ belongs to subgroup 2,} \end{cases} \qquad n = 1, \ldots, N.$$

Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be the respective maximum-likelihood estimates of choice-model coefficient vectors for the first and second subgroups. It is desired to develop a statistic that can be used to infer the distinctiveness between $\theta_1$ and $\theta_2$, the true population values (it is assumed here that such population vectors exist). The two choice models estimated for the respective subgroups can be integrated into a single model defined over the parent group. Such a model has a coefficient vector,

$$\theta = \alpha + D_n \beta,$$

where $\alpha$ and $\beta$ are also vectors. Let $\hat{\theta} = \hat{\alpha} + D_n \hat{\beta}$, where $\hat{\alpha}$ and $\hat{\beta}$ are the maximum-likelihood estimates of $\alpha$ and $\beta$. Setting $D_n = 0$ and $D_n = 1$ in turn, the maximum-likelihood estimates $\hat{\theta}_1$ and $\hat{\theta}_2$ are related to $\hat{\alpha}$ and $\hat{\beta}$ as

$$\hat{\theta}_1 = \hat{\alpha}, \qquad \hat{\theta}_2 = \hat{\alpha} + \hat{\beta}.$$

Then, the difference between $\hat{\theta}_1$ and $\hat{\theta}_2$ is represented by $\hat{\beta}$. Thus the distinctiveness between $\theta_1$ and $\theta_2$ can be inferred by testing whether $\beta = 0$. The following statistic can be used for this test,

$$d = -2\left[ L(\tilde{\theta}) - \left\{ L_1(\hat{\theta}_1) + L_2(\hat{\theta}_2) \right\} \right],$$

where $L(\tilde{\theta})$ is the log-likelihood value of an unstratified model (i.e. with the restriction that $\beta = 0$) evaluated for the parent group, and $L_i(\cdot)$ is the log-likelihood evaluated for subgroup $i$. The statistic is asymptotically chi-square distributed with $k$ degrees of freedom, where $k$ is the number of elements in vector $\beta$.

The d-statistic is used in the stratification procedure to select, among alternative splits, the one that provides the maximum distinctiveness. The above equation shows that the split with the largest distinctiveness (largest value of $d$) at the same time has the largest sum of log-likelihood values of the two choice models. Thus, this maximum-distinctiveness criterion also implies maximization of the model’s goodness of fit, or minimization of the deviations between observed choice and predicted choice probabilities evaluated in terms of the likelihood value. The stratification develops, repeating this splitting procedure for new subgroups.

The above discussion also illustrates, for a simplified case, the equivalence of the model-specification and stratification approaches, i.e. estimating a single model for the entire sample with an additional variable introduced into the model to represent taste variations, and estimating one model for each of the subgroups where the variation in the model-coefficient values represents the taste variation across the subgroups. These equivalent approaches offer two alternatives to obtaining a travel choice model, or a system of models, that can relevantly reflect taste variations. When model development can be directed by a theory, one can generate alternative model forms with appropriate use of socioeconomic and other variables, then select the most appropriate one on the basis of some criteria. On the other hand, there may be no theory available for hypothesizing the structure of taste variation and formulating a model, or the size of the model and the number of alternative formulations to be tested may become too large to be practically handled, for example, because of possible interaction effects of the contributing factors. This is the case where the systematic stratification approach can be extremely useful. Note that the stratification procedure may be viewed as a systematic model-specification procedure since the selection of split-base predictors is equivalent to specification of the structure of the coefficient vector using dummy variables and their interaction terms. The usefulness of this version of AID lies in its automatic determination of this coefficient-vector structure such that its variation can be effectively explained by a systematically selected set of socioeconomic predictors.

Before this stratification procedure is applied, however, it is appropriate to discuss its underlying assumptions and resulting limitations. Representation of the variation in tastes by means of stratification assumes that the coefficient values have discrete distributions across the sample.
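The likelihood-ratio test used to judge a candidate split (twice the gain in log-likelihood obtained by estimating separate choice models for the two subgroups instead of one model for the parent group) can be sketched as follows. This is a minimal illustration, not the program used in the study: it assumes a binary logit with a constant and a single level-of-service difference `x`, fits it with a hand-written Newton iteration, and all names (`fit_logit`, `d_statistic`, `chi2_sf_2dof`) and data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(theta, xs, ys):
    """Binary logit log-likelihood with utility difference a + b*x."""
    a, b = theta
    ll = 0.0
    for x, y in zip(xs, ys):
        z = a + b * x
        # log(1 + e^z) written so that large |z| does not overflow
        ll += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return ll


def fit_logit(xs, ys, iterations=25):
    """Maximum-likelihood estimate of (a, b) by Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient with respect to a
            g1 += (y - p) * x    # gradient with respect to b
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break  # singular information matrix: stop updating
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def d_statistic(xs1, ys1, xs2, ys2):
    """Return (d, theta1, theta2) for a split of the parent group into two subgroups."""
    theta1 = fit_logit(xs1, ys1)
    theta2 = fit_logit(xs2, ys2)
    xs, ys = list(xs1) + list(xs2), list(ys1) + list(ys2)
    pooled = fit_logit(xs, ys)  # unstratified model (beta = 0)
    d = -2.0 * (log_likelihood(pooled, xs, ys)
                - (log_likelihood(theta1, xs1, ys1) + log_likelihood(theta2, xs2, ys2)))
    return d, theta1, theta2


def chi2_sf_2dof(d):
    """Upper-tail chi-square probability; this closed form holds only for k = 2."""
    return math.exp(-d / 2.0)


# Hypothetical toy data in which the two subgroups respond to x in opposite ways.
xs = [-2, -1, 0, 1, 2, -2, -1, 0, 1, 2]
ys_sub1 = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
ys_sub2 = [1 - y for y in ys_sub1]
d, theta1, theta2 = d_statistic(xs, ys_sub1, xs, ys_sub2)
print(d, chi2_sf_2dof(d))
```

Here both the constant and the slope are allowed to differ between subgroups, so k = 2 and the tail probability has a simple closed form; with the six level-of-service variables and a constant used later in the paper, a general chi-square routine or table would be needed instead.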
When the true coefficient value is a continuous transformation of one or more continuous socioeconomic variables, the stratification is an approximation unless the variance of these variables vanishes within the subgroup. Aggregation error due to categorization is inevitable. Another question that arises is how the split selection procedure behaves under this structure of taste variations. The dichotomous splitting could introduce a large magnitude of errors in evaluating the effects of continuous socioeconomic attributes on tastes. Although this error can be reduced by applying polychotomous splits, the refinement inevitably increases sample size requirements. Detailed discussion of this issue, especially in relation to the properties of the coefficient estimators under these conditions, is beyond the scope of this paper. It merely notes the importance of examining the final subgroups and choice models for their validity when continuous socioeconomic variables are included in the analysis as predictors. This study applies residual analysis as a means of this validation.

†The statistical analysis of this study employed MIDAS (Michigan Interactive Data Analysis System) packages. A MIDAS program was developed to conduct the stratification by effectively combining its package programs. The computer used was Amdahl Model 470-V6, an equivalent of IBM 360/70.
‡This is not to say that linear regression can provide appropriate coefficient estimates of the choice model. The empirical result here using two data sets indicated that, only as far as the identification of split predictors is concerned, the linear regression yields results similar to those of the maximum-likelihood estimation.

3. COMPUTATIONAL ISSUES
Because the stratification procedure is a sequential and systematic examination of a large number of alternative model forms, its computational requirements can be large, especially when many predictors are involved in the analysis. Thus, in spite of its desirable statistical properties, use of the maximum-likelihood estimation of the logit model may not be practical. If less time-consuming estimation methods can generate the same set of subgroups as the maximum-likelihood estimation, their appropriate use can reduce the cost of the stratification procedure.

To examine this, preliminary stratification analysis was conducted using the Shirley Highway Corridor Data File, provided by the Urban Mass Transportation Administration, U.S. Department of Transportation. The estimation methods examined include: Berkson’s “minimum logit chi-square” estimation (1953) using the two-stage method proposed by Cox (1970); linear regression where the dependent variable is a 0-1 dummy variable representing the binary choice; and the maximum-likelihood estimation. The linear regression can be viewed as a special case of Berkson’s method where the cell size is one (Berkson’s method uses sample cells of similar independent-variable values to estimate the choice probability). The properties of this linear approach are discussed by Domencich and McFadden (1975).

The results indicated that the three stratifications are practically identical, classifying each individual into the same subgroup in most cases. The computational requirements were:† 87 sec for the maximum-likelihood estimation; 50 sec for Berkson’s estimation (about 85% of this was spent in developing sample cells and variable transformations for weighted least squares); and about 5 sec for the entire stratification using linear regression. The advantage of linear regression is obvious in terms of time requirements, while it provides a stratification practically identical to that of the other two methods.
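The use of linear regression on a 0-1 choice variable as an inexpensive screen for candidate splits can be illustrated as below. The sketch assumes a single level-of-service variable and ranks each candidate split by the reduction in the residual sum of squares obtained when separate regression lines are fitted to the two subgroups; the function names (`ols_rss`, `screen_splits`), the predictor names, and the data are hypothetical, and the ranking rule is a simplified stand-in for the covariance-analysis criterion of the AID program.

```python
def ols_rss(xs, ys):
    """Residual sum of squares of y regressed on x with an intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def screen_splits(xs, ys, dummies, keep=1):
    """Rank candidate splits by RSS reduction; return the `keep` best (name, reduction) pairs."""
    pooled = ols_rss(xs, ys)
    scores = []
    for name, d in dummies.items():
        g1 = [(x, y) for x, y, di in zip(xs, ys, d) if di == 0]
        g2 = [(x, y) for x, y, di in zip(xs, ys, d) if di == 1]
        if len(g1) < 2 or len(g2) < 2:
            continue  # too few observations to fit a line
        rss = ols_rss(*zip(*g1)) + ols_rss(*zip(*g2))
        scores.append((name, pooled - rss))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:keep]


# Hypothetical toy data: y = 1 if auto was chosen, x is a level-of-service difference.
xs = [-2, -1, 0, 1, 2] * 4
ys_first = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
ys = ys_first + [1 - y for y in ys_first]
dummies = {"sex": [0, 1] * 10, "cars": [0] * 10 + [1] * 10}
print(screen_splits(xs, ys, dummies, keep=1))
```

Only the splits surviving such a screen would then be re-estimated by logit maximum likelihood, which is where most of the computing time is spent.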
These results prompted a decision to use linear regression to evaluate a large number of candidate splits, and apply the maximum-likelihood estimation after the number of candidate splits is narrowed down. Use of the linear regression was later justified further by a supplemental residual analysis conducted in the first stratification analysis, described in the next section. It was found that the predictors selected as split bases by the linear regression also had large correlations with the residual. This indicates the consistency between the linear regression and logit maximum-likelihood estimation in evaluating these predictors’ effects on taste variations.‡

4. EMPIRICAL RESULTS
The stratification analysis used a Pre-BART Travel Behavior Data Set provided by the Institute of Transportation Studies, University of California at Berkeley, and explored taste variations in work-trip mode choice conditioned on long-term decisions such as car ownership and residential and job locations. One of the advantages of this data set is detailed disaggregate measurements of mode attributes, an extremely desirable feature for correct estimation of tastes. Table 1 lists the level-of-service variables that were used as the independent variables of the choice model. Another advantage of this data set is its wide range of socioeconomic variables, which is essential for systematic development of homogeneous strata. The variables used in this study are listed in Table 2, together with their categories in the stratification analysis. This study uses the 771 observations with complete information that were identified and used by Train (1978).

The original data set includes four alternative travel modes for each tripmaker. In order to apply linear regression as a screening mechanism in the split selection procedure, however, the number of alternatives must be reduced to two. Further, the limited sample size and the lower choice frequencies of some alternatives precluded application of the four-alternative logit model in the stratification procedure. This reduction in the number of alternatives does not impair the consistency of the maximum-likelihood estimator, although the estimation becomes less efficient (Manski, 1973). The rule employed in reducing the alternative size is summarized in Table 3.

Table 1. Level-of-service variables used in the analysis

VARIABLE* (Unit)                                ABBREVIATION
Travel Cost Difference¹ (¢)                     DCOST
Auto On-Vehicle Time Difference² (1/10 min.)    AONVEH
Bus On-Vehicle Time (1/10 min.)                 BONVEH
Bus-Trip Walking Time (1/10 min.)               BUSWALK
Bus-Transfer Waiting Time (1/10 min.)           XFERWAIT
Headway of the First Bus³ (1/10 min.)           HEADWAY

*: All measurements are for round trip.
1: Differences are taken as (Auto) - (Bus).
2: “Auto On-Vehicle Time Difference” is defined as (Auto on-vehicle time of auto trip) - (Auto access time of bus trip with auto access). The second term is zero for bus trip with walk access.
3: With ceiling of 16 minutes.
Table 2. Socioeconomic variables used in the analysis

VARIABLE (Unit)                                               ABBREVIATION   CATEGORIES
Number of Persons in Household (person)                       #PERSONS       1; 2; 3; 4 or more
Number of Workers in Household (person)                       #WORKERS       1; 2; 3 or more
Number of Drivers in Household (person)                       #DRIVERS       0; 1; 2; 3 or more
Number of Cars Available (car)                                #CARS          0; 1; 2; 3 or more
Number of Cars per Driver¹ (car/person)                       CARPDRVR       0; 0⁺-0.5; 0.5-0.75⁻; 0.75-1.0⁻; 1.0
Number of Nondrivers (person)                                 #DEPNDNT       0; 1 or more
Family Income² ($/year)                                       INCOME         less than 7,500; 7,500-10,500⁻; 10,500-15,000⁻; 15,000-30,000⁻; 30,000 or more
Post-Tax Wage Rate³ (¢/(min/10))                              WAGERATE       (Used only in residual analysis)
How Long Lived in Bay Area (month)                            YRS.BAY        less than 12; 12-24⁻; 24-60⁻; 60-120⁻; 120 or more
Age of Respondent (year)                                      AGE            less than 25; 25-35⁻; 35-50⁻; 50 or more
Employment Density at Destination Zone (worker/square mile)   EMPDNSTY       less than 100; 100-200⁻; 200-350⁻; 350 or more

1: With ceiling of 1.0.
2: Pre-tax value adjusted for inflation between 1972 and 1974.
3: Constructed from pre-tax family income and number of workers. For detail, see Dunguay, et al. (1976).
Table 2 (Concluded)

CATEGORICAL VARIABLES
VARIABLE                          ABBREVIATION   CATEGORIES
Driving Status                    DRIVE          Drive (1); Not Drive (0)
Home Ownership                    HOME OWN.      Owns (1); Rents (0)
Housing Structure*                STRUCTRE       Single; Multiple
Sex                               SEX            Male (0); Female (1)
Marital Status                    MARITAL        Married; Separated, Divorced or Widower/Widow; Never Married
Respondent’s Relation to Head*    REL. HEAD      Head of House; Spouse; Child of Head; Others
Occupation of Respondent*         OCCUPATN       Professional; Sales; Others
Residential Location*             RESIDENS       Central Cities (1); Others (0)

*: Categories developed in this study.
( ): 0-1 code assigned to the binary category.

Table 3. The binary choice set in the stratification analysis

[Table body not recoverable from the source. Column headings: ALTERNATIVES IN THE BINARY CHOICE SET; Alternative 1 (Observed Choice); Alternative 2 (Other Alternative). The entries pair Auto Drive Alone, Ride Sharing, Bus with Walk Access and Bus with Auto Access, separately for Auto Owners and Non-Auto Owners; the choice between Bus with Walk Access and Bus with Auto Access is conditioned on Δt and a threshold of 2 min.*]

*Δt: Round trip access time difference; (Walk access time) - (Auto access time).

The first stratification was conducted with the objective of obtaining the widest variety of socioeconomic variables that are related to the variation in tastes. The key question here is whether the variation can be explained by a number of socioeconomic variables that is practically small. Thus this stratification is a preliminary explorative analysis to assess the feasibility of constructing a small number of subgroups with appropriate internal homogeneity. Therefore this stratification did not impose any restriction on the minimum subgroup size (except the sample size requirement for model estimation) or on the variety of socioeconomic variables used as the predictors, and exclusively used the linear-regression model. This made it possible to examine a large number of socioeconomic variables at small cost. The liberal subgroup size requirement, together with the use of linear regression, also made it possible to identify the socioeconomic attributes associated with small segments in the data set that have potentially extreme tastes and behavior. On the other hand, the use of linear regression and the possible small subgroup size could yield a stratification that did not exactly reflect the tastes in mode choice. At this stage, however, this potential problem was considered less crucial to the objective. The
stratification was terminated when no further split reduced the within-group variance to below a prespecified level (the ratio of the total variance due to regression to the square sum of residuals was compared against the F-value at the 95% level) or when the subgroup size became less than 10% of the total sample size. The result is summarized in Fig. 2.

The dominant effects of variables associated with car availability were found from the results. The predictors that defined important splits were: number of cars available to the household; number of cars per driver; number of persons in the household; and number of workers in the household. The number of persons was found to be an important factor for the single-car households, while the number of workers was a more important factor for the multi-car households. The importance of car availability found here supports previous segmentations based on tripmakers’ subjective judgments on the mode availability (Recker and Golob, 1976). The results further suggest that such a set of easily available objective measurements of car availability may be applied to construct tripmaker segments of distinct tastes. Other predictors appearing in the stratification results were: duration of residence in the Bay Area, age, sex,
occupation and marital status. These predictors, together with the car-availability factors, repeatedly appeared as split bases and developed the stratification. In spite of the wide variety of socioeconomic predictors used in this stratification, the number of predictors that actually appeared in the result is rather limited. This suggests that the variation in tastes can be sufficiently explained by a small number of socioeconomic attributes. Furthermore, most of the split-base variables in Fig. 2 can be found in the census data, indicating that the population sizes and geographical distributions of these subgroups can be estimated using census information.

This stratification also indicated that there are several subgroups of small sizes that were separated out as subgroups with extreme tastes, e.g. single-car households with three or more household members. Among the predictors listed above, marital status was associated only with such small subgroups. As the subgroup size becomes smaller, it becomes more likely that the apparent distinctiveness of the group is governed by random chance in the sample realizations. It is also possible that a subgroup of extreme choice behavior is obtained. For example, a subgroup of the 77 households owning three or more cars and living in the Bay Area more than five years did not make any bus trips. This small sample size and the extreme choice behavior of these subgroups did not allow their further analysis.

Fig. 2. Stratification result using 18 socioeconomic predictors.

Although the above stratification was quite useful in exploring the factors related to the taste variation, practical purposes call for the entire sample to be classified into a relatively small number of subgroups, and for the stratification basis to consist of variables whose population distributions or future values can be easily forecast. Considering these, another stratification was conducted by constraining the subgroup size to be not less than 10% of the sample (due mainly to the sample size requirement for logit maximum-likelihood estimation). The following socioeconomic predictors were used in the stratification: number of cars, number of persons, number of drivers, existence of nondrivers (mainly children) in the household; and age, sex, marital status (married or not), and occupation of the tripmaker. Those predictors that are redundant, or that appeared only in minor splits of the previous stratification, were excluded from this analysis. The no-car subgroup was excluded here because the distinctiveness of this subgroup is obvious from the previous stratification. This stratification used the linear-regression model for screening and then applied the logit maximum-likelihood estimation to the selected candidate splits.
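The overall loop of the constrained stratification (choose the best admissible split, stop when no split is distinctive enough or when a subgroup would fall below the minimum size) can be sketched as a short recursion. This is an illustrative outline only: the function `stratify`, the use of ordered thresholds for every predictor, the stand-in scoring function `ss_reduction` (a sum-of-squares reduction rather than the d-statistic from logit estimation), and the toy records are all assumptions made for the example.

```python
def stratify(records, predictors, score_split, min_size, threshold, label="all"):
    """Recursively split `records`; return a list of (label, records) final subgroups."""
    best = None
    for p in predictors:
        for value in sorted({r[p] for r in records}):
            left = [r for r in records if r[p] <= value]
            right = [r for r in records if r[p] > value]
            if len(left) < min_size or len(right) < min_size:
                continue  # split would violate the minimum subgroup size
            score = score_split(records, left, right)
            if best is None or score > best[0]:
                best = (score, p, value, left, right)
    if best is None or best[0] < threshold:
        return [(label, records)]  # no admissible, sufficiently distinctive split
    _, p, value, left, right = best
    return (stratify(left, predictors, score_split, min_size, threshold,
                     f"{label} & {p}<={value}")
            + stratify(right, predictors, score_split, min_size, threshold,
                       f"{label} & {p}>{value}"))


def ss(rows):
    """Sum of squared deviations of y within a group."""
    mean = sum(r["y"] for r in rows) / len(rows)
    return sum((r["y"] - mean) ** 2 for r in rows)


def ss_reduction(parent, left, right):
    """Stand-in split score; a d-statistic from choice-model estimation could be used instead."""
    return ss(parent) - ss(left) - ss(right)


# Hypothetical sample of 40 tripmakers; the minimum subgroup size is 10% of the sample.
sample = ([{"cars": 1, "age": 30, "y": 0.0}] * 10 + [{"cars": 1, "age": 40, "y": 0.0}] * 10
          + [{"cars": 2, "age": 30, "y": 1.0}] * 10 + [{"cars": 2, "age": 40, "y": 0.6}] * 10)
strata = stratify(sample, ["cars", "age"], ss_reduction, min_size=4, threshold=0.5)
for name, rows in strata:
    print(name, len(rows))
```

With these toy records the recursion first separates single-car from multi-car cases and then divides only the multi-car group by age, because the single-car group shows no further distinction above the threshold.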
The stratification was terminated with four final subgroups defined by three socioeconomic bases: single-car single-person households; single-car multi-person households; multi-car households with tripmaker’s age below 35; and multi-car households with tripmaker’s age 35 or older. The two single-car subgroups are constructed exclusively from car availability, while the two multi-car subgroups are distinguished in terms of age, presumably selected as the best single indicator of the life cycle. Split analysis of these four subgroups showed that no further split substantially improved the likelihood value while satisfying the subgroup size requirement (the d-statistic was compared against the chi-square value at the 95% level).

Table 4. Stratum choice models estimated for four socioeconomic sample subgroups

| | {#CARS=1, #PERSONS=1} | {#CARS=1, #PERSONS≥2} | {#CARS≥2, AGE<35} | {#CARS≥2, AGE≥35} |
|---|---|---|---|---|
| N | 106 | 201 | 174 | 231 |
| $-2L(\hat{\theta})$ | 47.4 | 166.3 | 70.5 | 72.0 |
| $-2L(C)$ | 82.7 | 250.0 | 128.2 | 116.3 |
| $-2[L(C)-L(\hat{\theta})]$ | 35.4 | 83.6 | 57.7 | 44.3 |
| $\eta$ | 0.800 | 0.661 | 0.817 | 0.856 |
| Estimated coefficients: | | | | |
| DCOST | 0.0159 (3.20) | 0.0131 (5.23) | 0.0192 (3.68) | 0.00934 (2.68) |
| AONVEH | 0.00643 (1.48) | -0.000161 (-0.10) | -0.00255 (-0.90) | 0.00261 (1.10) |
| BONVEH | 0.00353 (1.03) | 0.000375 (0.30) | -0.000581 (-0.30) | 0.00188 (1.03) |
| BUSWALK | 0.0173 (1.99) | 0.00324 (0.20) | 0.00281 (0.80) | 0.00621 (1.41) |
| XFERWAIT | 0.00735 (1.03) | 0.0108 (3.05) | 0.00348 (0.65) | 0.00272 (0.70) |
| HEADWAY | -0.00160 (-0.15) | 0.00776 (6.54) | 0.0177 (2.25) | 0.00925 (0.98) |
| CONST. | -0.715 (-0.38) | 0.954 (1.13) | 0.387 (0.35) | -0.845 (-0.53) |

( ): t-statistic.
$L(C)$: Log-likelihood value with the constraint that all coefficients except the constant term are 0.
$L(\hat{\theta})$: Maximum log-likelihood value with no constraints on the coefficient values.
$-2[L(C)-L(\hat{\theta})]$: Chi-square distributed with 6 degrees of freedom.
$\eta$: The statistic $\eta$ represents the fraction explained by the model (DuMouchel, 1976) and is computed as $\eta = e^{L(\hat{\theta})/N}$. This may be interpreted as the average probability of correct prediction of observed behavior by the model.

The mutual distinctiveness of these four subgroups can be seen in the logit coefficient estimates presented in Table 4, together with goodness-of-fit statistics. The monetary cost of travel (DCOST) is the most significant variable for all the subgroups, and shows certain differences in the coefficient values across the subgroups ranging from 0.00934 to 0.0192. The small value of the multi-car age-not-less-than-35 subgroup is noted. There is no single variable other than this cost variable for which all the subgroups have significant coefficient values. Rather, each subgroup is sensitive to a particular set of level-of-service attributes (similar results were previously found by Constantino et al., 1974). The single-car single-person subgroup is especially sensitive to the walking time involved in bus trips (BUSWALK); the single-car multi-person subgroup to the waiting time for bus trips (XFERWAIT and HEADWAY); the first multi-car subgroup (age under 35) also has a significant coefficient for bus waiting time (HEADWAY); while the other multi-car subgroup (age not less than 35) does not have any significant coefficient other than the cost variable. Accordingly the coefficient values show large variations across the subgroups. Also notable are the very minor roles of the on-vehicle-time variables (AONVEH and BONVEH) in these models. (Note that these models include many variables that are not significant, with occasional theoretically unsupportable negative signs. These variables would be excluded from the model in normal model specification efforts. However, since comparison of coefficient vectors across strata is an important element of this stratification analysis, all variables are left in the model even when their coefficients are insignificant.)

Fig. 3. Distinctiveness among stratum choice models: constrained stratification.

Table 5. Correlations between socioeconomic variables and residuals of choice models for four socioeconomic sample subgroups

| Socioeconomic variable | {#CARS=1, #PERSONS=1} | {#CARS=1, #PERSONS≥2} | {#CARS≥2, AGE<35} | {#CARS≥2, AGE≥35} |
|---|---|---|---|---|
| Subgroup size | 106 | 201 | 174 | 231 |
| CARPDRVR | | -0.106 | -0.037 | -0.088 |
| DRIVE | -0.295** | -0.196** | -0.377** | -0.314** |
| #CARS | | | -0.099 | -0.091 |
| #DRIVERS | | 0.0?0 | -0.044 | -0.052 |
| STRUCTRE | 0.179* | 0.033 | 0.096 | 0.080 |
| YRS.BAY | -0.071 | -0.112 | -0.198** | 0.062 |
| INCOME | -0.042 | -0.11?* | 0.008 | -0.016 |
| SEX | 0.111 | 0.118* | 0.052 | 0.019 |
| WAGERATE | -0.076 | -0.032 | -0.053 | -0.070 |
| RESIDENS | 0.078 | 0.027 | 0.024 | 0.106 |
| EMPDNSTY | -0.029 | 0.054 | 0.102 | -0.016 |
| AGE | -0.148 | -0.038 | -0.016 | 0.031 |
| OWN.HOME | 0.002 | 0.078 | -0.037 | -0.055 |
| #PERSONS | | 0.136* | 0.083 | -0.022 |
| #WORKERS | | 0.081 | 0.128* | -0.009 |
| OCCUPATN | 0.125 (0.82) | 0.182** (3.39) | 0.079 (0.53) | 0.063 (0.46) |
| MARITAL | 0.292** (3.12) | 0.070 (0.32) | 0.127 (0.93) | 0.268** (8.82) |
| REL.HEAD | | 0.070 (0.49) | 0.011 (0.01) | 0.054 (0.34) |
| HEAD01† | | 0.036 | 0.089 | -0.035 |
| D.O.F. | 97 | 192 | 165 | 222 |
| $\rho_c$ (α = .10) | 0.162 | 0.117 | 0.125 | 0.109 |
| $\rho_c$ (α = .05) | 0.192 | 0.139 | 0.149 | 0.129 |

*: Significant at the 90 percent level.
**: Significant at the 95 percent level.
$\rho_c$: Critical level of the correlation coefficient.
( ): F-statistic.
†: HEAD01 is a binary version of REL.HEAD, representing whether the tripmaker is the head of household or not.

Figure 3 presents the overall distinctiveness of these coefficient vectors as evaluated with the d-statistic. Although these subgroups are developed by evaluating the distinctiveness only between a pair of subgroups generated from the same parent subgroup, the resulting four subgroups show large mutual distinctiveness. The d-statistic values are relatively small, however, for two combinations of subgroups: the single-car single-person subgroup versus the two multi-car subgroups. It is not appropriate to statistically infer the population distinctiveness on the basis of the d-statistic presented here (for discussions on the statistical test in the AID analysis, see Morgan and Andrews, 1973; Kass, 1975; Scott and Knott, 1976). Judging from the substantial and important differences among the estimated coefficient values, however, it would be appropriate to conclude
that these four subgroups exhibit mutually distinctive tastes of travel-mode attributes.

The validity of the models defined for respective subgroups is tested by examining whether there exist socioeconomic variables that are correlated with the model’s residual and thus should be incorporated into the model. The residual is defined as [0-1 dummy variable representing observed choice; 1 if bus is chosen] - [the probability of bus choice predicted by the model]. If the residual is not correlated with any variables not included in the model, then the model specification can be validated. The weighted residual was computed for respective models according to McFadden (1974), and correlations with socioeconomic variables were estimated. Table 5 presents the results. Negative correlations imply that the sample probability that the car is chosen increases with the socioeconomic-variable values. The correlations with multiple-category variables were estimated by linear regression, and regression coefficients of correlation are presented in the table.

Although the stratification is constrained in terms of both the subgroup size and the variety of socioeconomic predictors, the residual correlation decreases rapidly and the final four subgroups have no significant correlations except with a few socioeconomic variables. This result is extremely encouraging since it indicates that relevant choice models can be specified at a satisfactory level with a small number of subgroups, with a restricted number of base variables, and with the level-of-service variables as the model’s only independent variables. The table shows high correlations between the residual and driving status. Examination of the sample, however, showed that there are only a few nondrivers in this sample of car-owners. Therefore, the nondrivers could not constitute a subgroup although their distinctiveness emerged in this residual analysis. Thus, this high correlation can be ignored in judging the model’s general validity. Also misleading is the significant correlation with marital status. The remaining correlations can be eliminated either by applying a few more splits or by introducing the correlated socioeconomic variables into the choice models. Although the sample size does not warrant further splits, the results strongly support the stratification analysis and indicate the feasibility of developing a practical set of distinctive subgroups.

Table 6. Mode-attribute coefficients of Train’s model, L.O.S. model for the entire sample, and L.O.S. model for the car-owner subgroup

| Variable | Train’s model¹ | Binary model with 6 L.O.S. variables {entire sample} | Binary model with 6 L.O.S. variables {#CARS≥1} |
|---|---|---|---|
| N | 771 | 771 | 712 |
| DCOST | 0.0284² (4.31) | 0.00352 (7.29) | 0.00708 (8.08) |
| AONVEH | 0.0644 (5.65) | 0.0126 (2.24) | 0.00427 (0.07) |
| BONVEH | 0.0259 (2.94) | 0.00848 (1.97) | 0.00609 (1.?0) |
| BUSWALK | 0.0689 (5.28) | 0.0247 (3.72) | 0.0178 (2.68) |
| XFERWAIT | 0.0538 (2.30) | 0.0359 (3.33) | 0.0343 (2.95) |
| HEADWAY | 0.0318 (3.18) | 0.0645 (3.74) | 0.0474 (2.61) |
| TRANSFER | 0.105 (0.776) | - | - |

( ): t-statistic.
TRANSFER: Number of transfers.
¹ Train’s model (Train, 1978, p. 169) was estimated with 4 alternatives, and with several socioeconomic variables. The table shows only the coefficients for the 7 L.O.S. variables.
² The cost variable is divided by the wage rate in Train’s model. The value presented is the coefficient for COST/WAGERATE.

It is of some interest to compare the results of this stratification study with those of previous model estimations on the same data set (Train, 1976, 1978). The coefficients of Train’s model (Train, 1978), presented in the first column of Table 6, differ substantially from those of the four subgroup choice models presented in Table 4. The following discussion explains this. First, compare the models estimated for the entire sample (first and second columns). Note that Train’s model is estimated with additional socioeconomic variables, while the model of the present study (denoted hereafter by “L.O.S. model”) is constructed exclusively of the six level-of-service attributes. The obvious difference in the coefficient values found in the table is not surprising in light of the substantial difference between the two specifications. In general, the L.O.S. model has smaller coefficient values than those of Train, but their values are relatively large for the excess-time variables and relatively small for the on-vehicle-time variables. Notable is the large coefficient of the headway variable (HEADWAY) in the L.O.S. model. When the same L.O.S. model is estimated for the subgroup of 712 car-owners, i.e. when the car-ownership effect is introduced by means of stratification, a drastic reduction is observed for the car-on-vehicle-time coefficient (AONVEH), both in its absolute value and in its significance. Some reduction in the significance of the bus-on-vehicle time (BONVEH) is also noted. Other coefficient values stay stable, but are in somewhat closer agreement with Train’s in their relative values. This is the origin of the insignificance of the on-vehicle-time variables found in Table 4. One of the conceivable reasons for this is the correlation between tastes and level-of-service attributes (often noted as “ecological correlation”; additional results can be found in Kitamura, 1978). The result here raises some important methodological questions in the
evaluation of taste variations, which clearly require further research.

Table 7 compares the socioeconomic variables that appear in the respective analyses. All socioeconomic variables in Train’s model are linear additive, except the wage rate, but his specification does not assume that every variable has a linear effect. The respective approaches account for the car-availability effect by different sets of variables: Train’s model by the number of cars per driver and the number of drivers, both as linear additive terms; the stratification by cars per driver, number of cars, number of persons and number of workers. The discrepancy here is rather trivial. For the rest of the variables, however, the two approaches are in complete disagreement. The table indicates that the variables that appear in the stratification have significant effects only on certain subgroups of the sample (except for DRIVE), an expected result since the AID procedure is particularly sensitive to interaction effects, which ordinary model specification effort may not necessarily capture. Most of the variables found significant in Train’s single-equation system, on the other hand, played little role in the stratification, both in the unconstrained stratification and the residual analysis on the four subgroups. An immediate inference is that the subgroups have become relatively homogeneous in terms of these socioeconomic variables because of the correlative structure of the data set. Such variables as income show this tendency. The effects of these variables are then accounted for, in an approximate manner, by the variations in the level-of-service coefficient values in the stratification. Nevertheless, it is rather surprising that those variables introduced as proxies of unobserved variables (e.g. employment density as a proxy for parking cost and time) did not play any role in the stratification.

The above discussion is rather inconclusive if the desired output is the methodology of discrete-choice analysis under taste variations. The stratification analysis may be too sensitive to the correlative structure of the data set, casting doubt on its generality. Furthermore, the procedure as presented in this study is purely statistical, allowing no intervention of behavioral inferences or intuitions. On the other hand, the model specification approach appears to be insensitive to interactive effects of socioeconomic variables and vulnerable to ecological fallacy. Although no immediate conclusion or explanation for the above discrepancy is available, the result here is indicative of further research needs on the correlative structure (or simultaneity) among tastes, level-of-service attributes of travel alternatives, and socioeconomic attributes of tripmakers.

Table 7. Socioeconomic variables in analyses of pre-BART work-trip mode choice

| Variable | Train¹ | Unconstrained stratification² | Residual analysis for four subgroups³ |
|---|---|---|---|
| CARPDRVR | X | X | |
| DRIVE | | X | 1, 2, 3, 4 |
| #CARS | | X | X |
| #DRIVERS | X | | |
| STRUCTURE | X+ | | 1 |
| YRS.BAY | | X | 3 |
| INCOME | X | | 2 |
| SEX | | X | 2 |
| WAGERATE | X | | |
| RESIDENCE | X | | |
| EMPDNSTY | X | | |
| AGE | | | X |
| OWN.HOME | | | |
| #PERSONS | | X | X, 2 |
| #WORKERS | | X | 3 |
| OCCUPATN | | X | 2 |
| MARITAL | | X | 1, 4 |
| HEAD01† | X | | |

¹ “X” indicates that the variable was used as a linear additive term (except for WAGERATE) of the four-alternative logit model.
² “X” indicates that the variable appeared as a split base of the stratification.
³ The number indicates the subgroup number for which a variable had a significant correlation (at the 90% level) with the residual. “1” is for the first subgroup of Table 5, “2” for the second subgroup, etc. “X” indicates the stratification base.
+: Found significant in Train, 1976.
†: Defined in Table 5.

5. CONCLUSION

Before the findings of this stratification analysis are discussed, it is appropriate to note certain limitations of the study that may prevent immediate generalization of the results. First, the appropriateness of the dichotomous splitting procedure is not confirmed for the case where tastes vary as continuous transformations of socioeconomic attributes. The result here may have underevaluated the effects of continuous socioeconomic attributes on the taste variation. Second, the results are strictly limited by the sample size. Many potential splits could not be evaluated in the stratification process simply because one or both of the subgroups generated were too small to obtain reliable coefficient estimates. In such cases, the distinctiveness statistic, even if it were estimated, would have underrepresented the population distinctiveness that might have existed. This may be the difficulty that stratification analysis in general inevitably encounters because of the sample size requirement that exists in any statistical method. Finally, the results here are valid only for work-trip mode-choice behavior conditioned on long-term decisions, e.g. car ownership and residential and job locations. Immediate generalization to other transportation behavior is, of course, not warranted.

Nevertheless, some of the findings have certain implications for stratification and discrete-choice analysis of transportation behavior. The stratification using linear regression showed that only a limited number of socioeconomic attributes are associated with the taste variations even when those that distinguish small-size subgroups are included. Further, among these attributes, the dominant factor is the car-availability variable. This suggests that the variation in tastes, given the long-term decision, may have a relatively simple structure.

The stratification identified four socioeconomic subgroups (excluding no-car households) defined in terms of predictable and easily available variables. These subgroups can be practically applied to prediction of demand in the market. It was also found that these subgroups have mutually distinctive tastes and have particular sets of mode attributes to which they are respectively sensitive. Differentiating the coefficient values among the sample subgroups is crucially important in applying choice models.

The residual analysis of the choice models defined on these four subgroups confirmed their validity up to a few socioeconomic variables that should be, and easily can be, incorporated into the analysis, either as bases for further splits or as independent variables of the model. The validity of these models demonstrates the effectiveness and feasibility of the stratification approach.

By contrasting the result with previous model estimation results, this study has also indicated the need for further exploration of the structure of taste variations in travel choice. Variation in tastes does exist in people’s travel behavior. The conventional approach of introducing socioeconomic-attribute variables as linear additive terms of the choice model may not always be appropriate for accounting for this variation. The undifferentiated coefficient values of alternatives’ attributes could be another source of bias in aggregate prediction of travel demand. This study indicates the importance and practicality of the stratification approach for relevant applications of choice models under taste variation.
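The residual check that supports the validity of the subgroup models can be sketched as follows. This is an illustration under stated assumptions, not the procedure as implemented in the study: the scaling of the residual by sqrt(p(1 - p)) is an assumed form of the weighting (the text only says that it follows McFadden, 1974), the function names are hypothetical, and the eight observations are toy data. The critical level is supplied by the caller; the value 0.139 is borrowed from Table 5 purely as an example threshold, although the toy sample is far smaller than any subgroup in the paper.

```python
import math


def weighted_residuals(chose_bus, p_bus):
    """Observed bus dummy minus predicted bus probability, scaled per observation.

    The scaling by sqrt(p * (1 - p)) is an assumed form of the weighting;
    the paper only states that the weighting follows McFadden (1974).
    """
    return [(y - p) / math.sqrt(p * (1.0 - p)) for y, p in zip(chose_bus, p_bus)]


def pearson_r(x, y):
    """Plain product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flags_misspecification(chose_bus, p_bus, socio_var, critical_r):
    """Return (r, flagged); flagged is True when |r| exceeds the critical level."""
    r = pearson_r(weighted_residuals(chose_bus, p_bus), socio_var)
    return r, abs(r) > critical_r


# Toy data (hypothetical): eight tripmakers, bus probabilities predicted by
# some fitted stratum model, and a 0-1 driving-status variable.
chose_bus = [1, 0, 0, 1, 0, 0, 1, 0]
p_bus = [0.6, 0.2, 0.3, 0.5, 0.1, 0.4, 0.7, 0.2]
drive = [0, 1, 1, 0, 1, 1, 0, 1]

r, flagged = flags_misspecification(chose_bus, p_bus, drive, critical_r=0.139)
print(round(r, 3), flagged)
```

In this toy case the correlation is strongly negative, which matches the reading given for Table 5: drivers choose the car more often than the level-of-service model alone predicts, so driving status would be a candidate for a further split or an added variable.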
REFERENCES

Berkson J. (1953) A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. JASA 48, 565-599.
Burns L. D. and Golob T. F. (1976) The role of accessibility in basic transportation choice behavior. Transpn 5, 175-198.
Constantino D. P., Dobson R. and Canty E. T. (1974) An Investigation of Modal Choice for Dual Mode Transit, People Mover and Personal Rapid Transit Systems. Paper presented at the Int. Conf. on Dual Mode Transportation, Washington, D.C.
Cox D. R. (1970) The Analysis of Binary Data. Methuen, London.
Dobson R. and Kehoe J. F. (1974) Disaggregate behavioral views of transportation attributes. Transpn Res. Rec. No. 527, 1-15.
Dobson R. and Tischer M. L. (1978) A perceptual market segmentation technique for transportation analysis. Transpn Res. Rec. No. 673, 145-152.
Domencich T. A. and McFadden D. (1975) Urban Travel Demand, A Behavioral Analysis. North-Holland, Amsterdam.
DuMouchel W. H. (1976) On the analogy between linear and log-linear regression. Techn. Rep. No. 67, Department of Statistics, The University of Michigan, Ann Arbor, Michigan.
Duguay G., Fernandez L., Reid F., Winston C. and Woroch G. (1976) The urban travel demand forecasting project pre-BART socioeconomic codebook. Working Paper No. 7605, Urban Travel Demand Forecasting Project, Institute of Transportation Studies, University of California, Berkeley, California.
Golob T. F., Canty E. T., Gustafson R. L. and Vitt J. E. (1972) An analysis of consumer preferences to a public transportation system. Transpn Res. 6, 81-102.
Hausman J. A. and Wise D. A. (1978) A conditional probit model for qualitative choice: discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica 46, 403-426.
Hensher D. A. (1976) Market segmentation as a mechanism in allowing for variability of traveller behavior. Transpn 5, 257-284.
Kass G. V. (1975) Significance testing in automatic interaction detection (A.I.D.). Appl. Statist. 24, 178-189.
Kitamura R. (1978) Urban Travel Demand Forecasting by Stratified Choice Models. Unpublished Ph.D. dissertation, Department of Civil Engineering, The University of Michigan, Ann Arbor, Michigan.
Lerman S. R. and Ben-Akiva M. E. (1976) Disaggregate behavioral model of automobile ownership. Transpn Res. Rec. No. 569, 34-55.
Lerman S. R. and Manski C. F. (1981) On the use of simulated frequencies to approximate choice probabilities. In Structural Analysis of Discrete Data: With Econometric Applications (Edited by C. F. Manski and D. McFadden). MIT Press, Cambridge, Mass (forthcoming).
Manski C. F. (1973) The analysis of qualitative choice. Unpublished Ph.D. dissertation, Department of Economics, Massachusetts Institute of Technology, Cambridge, Mass.
Morgan J. N. and Andrews F. M. (1973) A comment on Einhorn’s ‘Alchemy in the behavioral sciences.’ Public Opinion Quart. 37, Spring, 127-129.
McFadden D. (1974) Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics (Edited by P. Zarembka), pp. 105-142. Academic Press, New York.
McFadden D., Tye W. B. and Train K. (1977) An application of diagnostic tests for the independence from irrelevant alternatives property of the multinomial logit model. Transpn Res. Rec. No. 637, 39-46.
Nicolaidis G. C. and Sheth J. N. (1976) An Application of Market Segmentation in Urban Transportation Planning. General Motors Research Laboratories, Warren, Michigan, Publication No. GMR-2139.
Nicolaidis G. C., Wachs M. and Golob T. F. (1977) Evaluation of alternative segmentations for transportation planning. Transpn Res. Rec. No. 649, 23-31.
Recker W. W. and Golob T. F. (1976) An attitudinal mode choice model. Transpn Res. 10, 299-310.
Scott A. J. and Knott M. (1976) An approximate test for use with AID. Appl. Statist. 25, 103-106.
Sonquist J. A., Baker L. and Morgan J. N. (1971) Searching for Structure. Institute for Social Research, The University of Michigan, Ann Arbor, Michigan.
Stopher P. R. (1969) A probability model of travel mode choice for the work journey. Highway Res. Rec. No. 283, 57-65.
Stopher P. R. (1977) The development of market segments of destination choice. Transpn Res. Rec. No. 649, 14-22.
Tardiff T. J. (1979) Attitudinal market segmentation for transit design, marketing, and policy analysis. Transpn Res. Rec. No. 735, 1-7.
Train K. (1976) Work Trip Mode Split Models: An Empirical Exploration of Estimate Sensitivity to Model and Data Specification. Working Paper No. 7602, Institute of Transportation Studies, University of California, Berkeley, California.
Train K. (1978) A validation test of a disaggregate mode choice model. Transpn Res. 12, 167-174.
Train K. and McFadden D. (1978) The goods/leisure tradeoff and disaggregate work trip mode choice models. Transpn Res. 12, 349-353.