Semiparametric estimation of a work-trip mode choice model

Journal of Econometrics 58 (1993) 49-70. North-Holland

Joel L. Horowitz*

University of Iowa, Iowa City, IA 52242, USA

A binary response model of choice between automobile and transit for travel to work is estimated using fixed- and random-coefficients probit specifications, a semiparametric single-index specification, and a semiparametric specification that permits arbitrary heteroskedasticity of unknown form. The estimation methods are maximum likelihood for the probit models, semiparametric quasi-maximum-likelihood for the single-index model, and maximum score and smoothed maximum score for the arbitrarily heteroskedastic model. Specification tests result in rejection of the fixed-coefficients probit and single-index models but not of the random-coefficients probit model and the semiparametric model with arbitrary heteroskedasticity.

Correspondence to: Joel L. Horowitz, Department of Economics, University of Iowa, Iowa City, IA 52242, USA.

*I thank Herman Bierens, Wolfgang Härdle, Tony Lancaster, Charles Manski, Whitney Newey, Geert Ridder, and an anonymous referee for comments on an earlier draft of this paper, and Charles Manski and Scott Thompson for providing me with their computer program for maximum score estimation. The research reported in this paper was supported in part by NSF grant no. SES-8922460.

0304-4076/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved

1. Introduction

Binary response models are widely used in economics to investigate phenomena such as labor force participation, union membership, travel behavior, and migration. The most frequently used form of binary response model is

$$y = \begin{cases} 1 & \text{if } \beta'x + u > 0, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$

where $y$ is the indicator of the response, $x$ is a $q \times 1$ vector of explanatory variables, and $\beta$ is a $q \times 1$ vector of parameters. In applications, $\beta$ must be estimated from observations of $(y, x)$. If the distribution of $u$ conditional on $x$ is known up to a finite set of parameters, $\beta$ and any parameters of the distribution of $u$ can be estimated by maximum likelihood. See, for example, McFadden (1974). Often it is assumed that $u$ has either the normal or the logistic distribution independent of $x$. Eq. (1) yields the binary probit model if $u$ is normally distributed and the binary logit model if $u$ is logistically distributed.

In practice, there is rarely any justification other than convenience for assuming that the distribution of $u$ belongs to a known parametric family. Misspecification of the distribution causes maximum likelihood parameter estimators to be inconsistent, and the resulting predictions of choice behavior can be highly erroneous. This has motivated the development of so-called semiparametric methods that enable $\beta$ to be estimated consistently without specifying the distribution of $u$.

Two classes of semiparametric binary response models are especially important for applications. One consists of single-index models. I will call the other 'arbitrarily heteroskedastic models' for reasons that are explained below. The single-index form of eq. (1) is

$$P(y = 1 \mid x) = F(\beta'x), \tag{2}$$

where $F$ is an unknown function (not necessarily a distribution function) whose range is contained in [0, 1]. $N^{1/2}$-consistent, asymptotically normal estimators of $\beta$ in single-index models have been developed by Ichimura (1988), Klein and Spady (1989), and Powell et al. (1989). The estimator of Klein and Spady (1989) achieves the asymptotic efficiency bound of Chamberlain (1986) and Cosslett (1987) if $F$ is a continuous distribution function and certain other regularity conditions are satisfied.

The other important class of semiparametric models consists of eq. (1) with the auxiliary assumptions that median$(u \mid x) = 0$ and that the distribution of $u$ satisfies certain regularity conditions.¹ In other respects the distribution of $u$ is assumed to be unknown. Manski (1975, 1985) and Horowitz (1992) give estimators of $\beta$ in this model. Kim and Pollard (1990) show that Manski's estimator converges in probability at the rate $N^{-1/3}$. The centered, normalized estimator has a complicated, nonnormal asymptotic distribution. Under smoothness conditions that are slightly stronger than those of Manski, my estimator converges in probability at a rate that is at least as fast as $N^{-2/5}$ and can be arbitrarily close to $N^{-1/2}$ depending on the details of the smoothness conditions. The centered, normalized estimator is asymptotically normally distributed with a covariance matrix that can be estimated consistently.

¹ The assumption median$(u \mid x) = 0$ can be replaced with quantile$(u \mid x) = 0$ for any quantile between 0 and 1. This extension will not be used here.

The two classes of semiparametric models are not substitutes for one another. One reason for this is that they are nonnested. The second class permits $u$ to have virtually arbitrary heteroskedasticity of unknown form provided that the centering assumption median$(u \mid x) = 0$ is satisfied. This is why the second class is called arbitrarily heteroskedastic here. Single-index models (the first class) make no centering assumption, but they restrict any heteroskedasticity of $u$ to forms that are consistent with the single-index specification (2).

The two classes of models also differ in their ability to predict $y$. In single-index models, $F(\beta'x)$ can be estimated by nonparametric regression at continuity points of $F$ in the support of $\beta'x$. In arbitrarily heteroskedastic models with median$(u \mid x) = 0$, $P(y = 1 \mid x) = 0.5$ if $\beta'x = 0$ and $P(y = 1 \mid \beta'x) - 0.5$ has the same sign as $\beta'x$ if $\beta'x \ne 0$. $P(y = 1 \mid x)$ can be estimated by nonparametric regression of $y$ on $x$ at continuity points of $P(y = 1 \mid \cdot)$ for which $\beta'x \ne 0$, but the estimates will be very imprecise if $x$ is multi-dimensional.

Semiparametric methods for binary response modeling have not yet been used much in applications, and it is not known to what extent they yield results in practice that are substantially different from those of familiar parametric models such as probit and logit. Newey et al. (1990) reported that probit and single-index semiparametric estimates of the parameters of a model of labor force participation were similar. Das (1991) estimated logit and arbitrarily heteroskedastic semiparametric models of the decision whether to idle a cement kiln. An informal examination of the estimation results suggested that the logit model may have been misspecified, but no formal tests were carried out.
This paper reports the results of estimating several parametric and semiparametric models of the choice between automobile and transit for the work trip.² The objectives are to determine whether (1) it is possible to distinguish empirically between the parametric and semiparametric models, (2) it is possible to distinguish empirically between single-index and arbitrarily heteroskedastic semiparametric models, (3) the semiparametric models provide better descriptions of the data than do the parametric ones, and (4) the semiparametric models yield inferences and predictions that are substantially different from those of the parametric models. It turns out that a parametric model that assumes homoskedasticity of $u$ and a semiparametric single-index model are rejected by specification tests. A parametric model based on a random-coefficients specification and a semiparametric model that permits arbitrary heteroskedasticity of $u$ are not rejected. These two models yield different inferences concerning the effects on mode choice of in-vehicle and out-of-vehicle travel time and moderately different predictions of choice.

Section 2 describes the models. Section 3 describes the data and estimation results, and concluding comments are given in section 4.

² Many work-trip mode choice models divide the automobile mode into drive alone and carpool. The choice among the three modes drive alone, carpool, and transit is modeled. I have not done this here since semiparametric methods for multinomial response are even less well-developed and understood than methods for binary response.

Table 1
Variables of the models.

Variable     Definition
INTERCEPT    Intercept term equal to 1; not used in the single-index model where it is not identified
AUTOS        Number of cars owned by the traveler's household
DOVTT        Transit out-of-vehicle travel time minus automobile out-of-vehicle travel time (minutes)
DIVTT        Transit in-vehicle travel time minus automobile in-vehicle travel time (minutes)
DCOST        Transit fare minus automobile travel cost ($)

2. The models

The estimated models have the form (1). The dependent variable $y$ equals 1 if automobile is chosen and 0 if transit is chosen. The explanatory variables $x$ are listed in table 1. As in all disaggregate mode choice models, they represent transportation service quality and characteristics of the traveler. Of course, other sets of explanatory variables are possible. For example, it is not unusual for mode choice models to include the traveler's income as a component of $x$ and to use DOVTT/(travel distance) in place of DOVTT and DCOST/(traveler's income) in place of DCOST. In preliminary experiments with parametric models, I found that using DOVTT/(travel distance) instead of DOVTT or DCOST/(traveler's income) in place of DCOST produced statistically significant decreases in goodness-of-fit and that adding income to $x$ did not yield a statistically significant improvement in fit.³ Therefore, I have not used these variables in the models reported here.

³ In a probit model that includes the variables listed in table 1 plus DOVTT/(travel distance), DCOST/(traveler's income), and income, the coefficients of the latter three variables are collectively nonsignificant in a White (1982) robust Wald test (p > 0.40), whereas the coefficients of DOVTT and DCOST are collectively significant (p < 0.01). Similar results are obtained when individual variables are tested.

2.1. Parametric models

The parametric models are fixed- and random-coefficients probit models. The fixed-coefficients probit model has the specification

$$P(y = 1 \mid x) = \Phi(\beta'x/\sigma), \tag{3}$$

where $\Phi$ is the cumulative normal distribution function and $\sigma$ is the standard deviation of $u$. Scale normalization is achieved by setting $\beta_{DCOST} = 1$, where $\beta_{DCOST}$ is the $\beta$ component corresponding to DCOST. In the random-coefficients probit model, the coefficients of $x$ are assumed to be independently distributed as $N(\beta, \Sigma)$, where $\Sigma$ is a diagonal matrix. Thus, the model is

$$P(y = 1 \mid x) = \Phi[\beta'x / V(x)^{1/2}], \tag{4}$$

where

$$V(x) = x'\Sigma x. \tag{5}$$
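As an illustration, the choice probabilities in eqs. (3)-(5) can be sketched in Python. This is a minimal sketch, not the estimation code used for the paper; the function names and all numerical values of $x$, $\beta$, $\sigma$, and the diagonal of $\Sigma$ below are hypothetical and chosen only to show the mechanics.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fixed_probit_prob(x, beta, sigma):
    """Eq. (3): P(y = 1 | x) = Phi(beta'x / sigma)."""
    index = sum(b * xi for b, xi in zip(beta, x))
    return norm_cdf(index / sigma)

def random_probit_prob(x, beta, sigma_diag):
    """Eqs. (4)-(5): P(y = 1 | x) = Phi(beta'x / V(x)^(1/2)),
    with V(x) = x' Sigma x and sigma_diag holding the diagonal of Sigma."""
    index = sum(b * xi for b, xi in zip(beta, x))
    v = sum(s * xi * xi for s, xi in zip(sigma_diag, x))
    return norm_cdf(index / math.sqrt(v))

# Hypothetical traveler, ordered as (INTERCEPT, AUTOS, DOVTT, DIVTT, DCOST).
x = [1.0, 1.0, 10.0, 5.0, 0.5]
beta = [-0.5, 1.0, 0.03, 0.005, 1.0]          # illustrative values only
sigma_diag = [1.0, 0.5, 0.001, 0.0001, 0.2]   # illustrative variances only

p_fixed = fixed_probit_prob(x, beta, sigma=1.0)
p_random = random_probit_prob(x, beta, sigma_diag)
```

In this sketch the denominator of the random-coefficients probability varies with $x$, which is how the specification generates heteroskedasticity; if only the first diagonal element of $\Sigma$ is nonzero and the first component of $x$ is 1, eq. (4) collapses to eq. (3).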

Scale normalization is achieved by setting $\beta_{DCOST} = 1$. Random coefficient variation arises in a mode choice model if the nonintercept coefficients of $x$ depend on unobserved attributes of the traveler. Evidence of random coefficient variation in travel demand models has been given by Fischer and Nagin (1981) and Hausman and Wise (1978). Both probit models were estimated by maximum likelihood.

2.2. The single-index model

The single-index model is given by (2) with the scale normalization $\beta_{DCOST} = 1$. The intercept component of $\beta$ is subsumed in $F$ and, therefore, is not identified. The estimator of $\beta$ is obtained by maximizing the quasi-log-likelihood function

$$\log L_N(b) = N^{-1} \sum_{n=1}^{N} \{ y_n \log F_N(b'x_n) + (1 - y_n) \log[1 - F_N(b'x_n)] \}, \tag{6}$$

where $F_N(\cdot)$ is a nonparametric estimate of $F(\cdot)$. Klein and Spady (1989) give the asymptotic theory of the quasi-maximum-likelihood estimator (QMLE) and methods for estimating asymptotic standard errors. $N^{1/2}$ times the centered estimator of $\beta$ is asymptotically normal.⁴

⁴ The asymptotic theory of the quasi-maximum-likelihood estimator requires trimming observations for which $\beta'x$ is close to the boundary of its support. Monte Carlo results of Klein and Spady (1989) indicate that trimming has little effect on the estimates, although it greatly increases the amount of computation required. As a result, I have not carried out trimming. Formally, this amounts to assuming that the support of $\beta'x$ is larger than that observed in the data.

As in Klein and Spady (1989), $F_N$ is calculated from nonparametric kernel estimates of the density of $b'x$ conditional on $y$. Specifically, set $P_N = N^{-1} \sum_{n=1}^{N} y_n$. $P_N$ is the sample proportion of travelers who choose auto. Then for any real $v$,

$$F_N(v) = \frac{P_N g_N(v \mid y = 1)}{P_N g_N(v \mid y = 1) + (1 - P_N) g_N(v \mid y = 0)}, \tag{7}$$

where $g_N(\cdot \mid y)$ is a kernel estimate of $g(\cdot \mid y)$, the conditional density of $b'x$. This estimate is given by

$$g_N(v \mid y = 1) = (N P_N h_N)^{-1} \sum_{n=1}^{N} y_n K[(v - b'x_n)/h_N], \tag{8}$$

$$g_N(v \mid y = 0) = [N (1 - P_N) h_N]^{-1} \sum_{n=1}^{N} (1 - y_n) K[(v - b'x_n)/h_N], \tag{9}$$

where $K$ is the kernel function and $\{h_N\}$ is a sequence of bandwidths satisfying $N h_N^6 \to \infty$ and $N h_N^8 \to 0$ as $N \to \infty$. I have used the following kernel:

$$K(z) = (21/64)[1 - 5(z/5)^2 + 7(z/5)^4 - 3(z/5)^6] \, 1(|z| \le 5). \tag{10}$$
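The pieces defined in eqs. (6)-(10) can be put together in a short Python sketch. It is meant only as an illustration under stated assumptions, not as the code used for the paper: the function names are hypothetical, no leave-one-out or trimming is applied, and the two guards marked as assumptions in the comments (a fallback when the denominator of eq. (7) is not positive, and clipping of $F_N$ before taking logarithms) are ad hoc additions needed because a fourth-order kernel takes negative values.

```python
import math

def kernel4(z):
    """Kernel of eq. (10), supported on |z| <= 5."""
    if abs(z) > 5.0:
        return 0.0
    u2 = (z / 5.0) ** 2
    return (21.0 / 64.0) * (1.0 - 5.0 * u2 + 7.0 * u2 ** 2 - 3.0 * u2 ** 3)

def kernel_moment(i, steps=20000):
    """Midpoint-rule approximation of the integral of z^i K(z) over [-5, 5]."""
    dz = 10.0 / steps
    total = 0.0
    for k in range(steps):
        z = -5.0 + (k + 0.5) * dz
        total += (z ** i) * kernel4(z)
    return total * dz

def cond_density(v, index, y, h, outcome):
    """Eqs. (8)-(9): kernel density estimate of b'x given y = outcome."""
    group = [vn for vn, yn in zip(index, y) if yn == outcome]
    return sum(kernel4((v - vn) / h) for vn in group) / (len(group) * h)

def f_n(v, index, y, h):
    """Eq. (7): estimate of P(y = 1 | b'x = v)."""
    p = sum(y) / len(y)  # P_N, the sample share choosing automobile
    num = p * cond_density(v, index, y, h, 1)
    den = num + (1.0 - p) * cond_density(v, index, y, h, 0)
    # Assumption: fall back to P_N if the denominator is not positive.
    return num / den if den > 0.0 else p

def quasi_loglik(b, X, y, h, eps=1e-10):
    """Eq. (6): quasi-log-likelihood at the trial value b."""
    index = [sum(bj * xj for bj, xj in zip(b, xn)) for xn in X]
    total = 0.0
    for vn, yn in zip(index, y):
        # Assumption: clip F_N into (0, 1) so that both logarithms exist.
        f = min(max(f_n(vn, index, y, h), eps), 1.0 - eps)
        total += yn * math.log(f) + (1 - yn) * math.log(1.0 - f)
    return total / len(y)
```

Calling `kernel_moment(i)` for `i` from 0 to 4 gives a quick numerical check of the kernel's order. Note also that, algebraically, eq. (7) with (8)-(9) reduces to the ratio of $\sum_n y_n K[(v - b'x_n)/h_N]$ to $\sum_n K[(v - b'x_n)/h_N]$, so `f_n` is a kernel regression of $y$ on $b'x$.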

It is easily verified that $\int_{-5}^{5} z^i K(z)\,dz = 0$ if $1 \le i \le 3$ but not if $i = 4$, so $K$ is a fourth-order kernel.

There is no obvious optimality criterion for selecting $h_N$. The asymptotic distribution of the QMLE is independent of the bandwidth, and the asymptotically optimal bandwidth for estimating $g(\cdot \mid y)$ using a fourth-order kernel converges at the rate $N^{-1/9}$, which is too slow for the QMLE. Given this situation, there is no point in using an elaborate bandwidth selection procedure. I used a preliminary estimate of $\beta$ to compute plug-in estimates, $h^*_{Ny}$, of the asymptotically optimal bandwidths for estimation of $g(\cdot \mid y)$ ($y$ = 0 or 1) with the kernel (10).⁵ I obtained bandwidths $h_{Ny}$ that converge at the rate $N^{-1/7}$ by setting $h_{Ny} = N^{-2/63} h^*_{Ny}$. This procedure yielded $h_{Ny} \approx 0.5$ for both values of $y$. Examination of the graphs of the estimates $g_N(\cdot \mid y)$ indicated that 0.5 is a reasonable bandwidth. The graphs are fairly smooth and do not exhibit the flattening that occurs when the bandwidth is too large. Accordingly, the single-index estimation results reported in this paper are based on $h_N = 0.5$.

⁵ The plug-in method provides an estimate of the asymptotically optimal bandwidth based on an initial bandwidth that is chosen by the analyst.

2.3. Models with median$(u \mid x) = 0$ and arbitrary heteroskedasticity

These models are based on (1) with $\beta_{DCOST} = 1$. I estimated $\beta$ using Manski's (1975, 1985) maximum score estimator and my smoothed maximum score estimator [Horowitz (1992)]. Maximum score estimation consists of selecting the estimator to maximize

$$S_N^*(b) = N^{-1} \sum_{n=1}^{N} [2 \cdot 1(y_n = 1) - 1] \, 1(b'x_n \ge 0). \tag{11}$$

Manski (1975, 1985) and Kim and Pollard (1990) give the asymptotic theory of this estimator. The estimator converges in probability at the rate $N^{-1/3}$. Its asymptotic distribution is very complicated and not useful for making inferences in applications. Manski and Thompson (1986) suggest using the bootstrap to estimate standard errors and give Monte Carlo evidence on its performance. There has been no theoretical investigation of the properties of the bootstrap in maximum score estimation.

The version of the smoothed maximum score estimator used here maximizes

$$S_N(b) = N^{-1} \sum_{n=1}^{N} [2 \cdot 1(y_n = 1) - 1] \, \Phi(b'x_n/h_N), \tag{12}$$

where $\Phi$ is the cumulative normal distribution function and $\{h_N\}$ is a sequence of bandwidths that converge to 0 at the rate $N^{-1/5}$. Horowitz (1992) gives the asymptotic theory of the estimator, methods for removing its asymptotic bias and for estimating asymptotic standard errors, and a plug-in method for selecting the bandwidth. $N^{2/5}$ times the centered, bias-corrected estimator obtained from (12) is asymptotically normal.

$S_N^*$ was maximized using the algorithm of Manski and Thompson (1986). The objective function of the smoothed estimator, $S_N$, has many local maxima and requires a global optimization algorithm. I obtained satisfactory results by carrying out 300 iterations of the simulated annealing algorithm of Szu and Hartley (1987), followed by as many Newton-Raphson iterations as were needed to obtain convergence. Simulated annealing yields a value of $b$ that is sufficiently near the global maximum of $S_N$ to enable the maximum to be found by Newton-Raphson.
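The two objective functions in eqs. (11) and (12) are simple to evaluate, and a small Python sketch may make the relation between them concrete. The function names and the toy data are hypothetical; the sketch only evaluates the objectives at a given $b$ and does not reproduce the optimization (the Manski and Thompson algorithm, or simulated annealing followed by Newton-Raphson) or the bias correction.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linear_index(b, xn):
    """Inner product b'x_n."""
    return sum(bj * xj for bj, xj in zip(b, xn))

def max_score(b, X, y):
    """Eq. (11): maximum score objective (a step function of b)."""
    total = 0.0
    for xn, yn in zip(X, y):
        sign = 2 * int(yn == 1) - 1                 # +1 if y_n = 1, else -1
        total += sign * (1 if linear_index(b, xn) >= 0 else 0)
    return total / len(y)

def smoothed_max_score(b, X, y, h):
    """Eq. (12): the indicator 1(b'x >= 0) is replaced by Phi(b'x / h)."""
    total = 0.0
    for xn, yn in zip(X, y):
        sign = 2 * int(yn == 1) - 1
        total += sign * norm_cdf(linear_index(b, xn) / h)
    return total / len(y)

# Toy data (hypothetical): an intercept and one regressor.
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -3.0]]
y = [1, 0, 1, 0]
step_value = max_score([0.0, 1.0], X, y)
smooth_value = smoothed_max_score([0.0, 1.0], X, y, h=0.15)
```

As the bandwidth shrinks, $\Phi(b'x_n/h_N)$ approaches the indicator $1(b'x_n \ge 0)$ at every point with $b'x_n \ne 0$, so the smoothed objective approaches the unsmoothed one while remaining differentiable in $b$ for any fixed positive bandwidth.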

3. Data and estimation results

3.1. Data

The data consist of 842 observations sampled randomly from the Washington, DC, area transportation study. The study was carried out by the Washington Metropolitan Area Council of Governments and included a home-interview survey of travel by individuals. Its results include records of daily travel by individuals and information on transportation system performance (i.e., travel times and costs). Each record in the estimation data set contains information for a single work trip, including the chosen mode and the values of the variables $x$ listed in table 1. Eighty-four percent of the trips in the estimation data set are by automobile and 16% are by transit.

Table 2
Parameter estimates (standard errors).

Model                  INTERCEPT          CARS              DOVTT             DIVTT             DCOST   log L_N (a)
Fixed-coeff. probit    -0.6278 (0.2323)   1.2802 (0.2566)   0.0338 (0.0146)   0.0056 (0.0040)   1.0     -230.16
Random-coeff. probit   -1.0934 (0.3411)   1.9737 (0.4743)   0.0496 (0.0215)   0.0051 (0.0046)   1.0     -220.19
Single-index                              2.8448 (0.2693)   0.0719 (0.0171)   0.0047 (0.0045)   1.0     -221.58
Smoothed max. score    -1.5761 (0.1684)   2.2418 (0.1790)   0.0269 (0.0074)   0.0143 (0.0020)   1.0
Max. score             -1.6466 (0.1374)   2.2520 (0.1480)   0.0411 (0.0294)   0.0110 (0.0106)   1.0

(a) Log-likelihood for the parametric models and quasi-log-likelihood for the single-index model.

3.2. Estimation results

Table 2 shows the estimates of $\beta$ for the parametric and semiparametric models as well as estimated asymptotic standard errors. The smoothed maximum score estimates are bias-corrected, and the coefficient of DCOST is 1.0 in all of the models by scale normalization. The standard errors of all but the unsmoothed maximum score estimates are based on formulas derived from asymptotic theory. The standard errors of the coefficients of the probit models were obtained with White's (1982) specification-robust method and are asymptotically correct even if the models are misspecified. The standard errors of the single-index and smoothed maximum score estimates assume that the estimated models are correctly specified. The standard errors of the unsmoothed maximum score estimates were computed using the bootstrap procedure of Manski and Thompson (1986). The bandwidth for the smoothed maximum score estimator was set at 0.15 by using the plug-in method [Horowitz (1992)].⁶

⁶ The estimated asymptotically optimal bandwidth was between 0.14 and 0.16 for a wide range of initial bandwidths.

The estimates of $\beta$ differ among models by more than a factor of 2. In this sense, the different estimation methods yield very different results. Section 3.3 presents evidence indicating that the differences are not simply artifacts of random sampling errors. Implications of the differences for predicting choice are discussed in section 3.4.

The estimated standard errors from the random-coefficients probit model are larger than those from the fixed-coefficients model. This is to be expected since the random-coefficients model has more estimated parameters than the fixed-coefficients one. The estimated standard errors from the single-index model are between those of the two probit models. However, little significance can be attributed to this relation since, as will be discussed in section 3.3, the fixed-coefficients probit and single-index models are likely misspecified.

All of the estimated standard errors of the smoothed maximum score coefficient estimates and two of the standard errors of the unsmoothed maximum score estimates are much smaller than those obtained from the parametric and single-index models. This is inconsistent with asymptotic theory, since the smoothed and unsmoothed maximum score estimators converge in probability more slowly than the other estimators. Thus, it appears that asymptotic theory in the case of the smoothed maximum score estimator and the bootstrap in the case of the unsmoothed maximum score estimator do not give accurate estimates of standard errors in this application. This is not surprising. Monte Carlo results reported by Horowitz (1992) indicated that very large samples are needed to make estimated asymptotic standard errors reasonable approximations of true standard errors in smoothed maximum score estimation. Monte Carlo results reported by Manski and Thompson (1986) indicated that the accuracy of bootstrap standard errors for the unsmoothed maximum score estimator depends on the form of the heteroskedasticity of $u$.
Despite the inaccuracy of the estimated standard errors, it may be possible to obtain good estimates of confidence intervals for the smoothed maximum score estimates by using the bootstrap. Beran (1988) and Hall (1986) have shown that the bootstrap distribution of an asymptotically pivotal test statistic (e.g., a t-statistic) based on an $N^{1/2}$-consistent estimator coincides through order $N^{-1/2}$ with the Edgeworth expansion of the exact finite-sample distribution. Thus, the bootstrap yields more accurate critical values for such a statistic than does first-order asymptotic theory. These critical values can be used to form confidence intervals whose coverage probabilities are closer to nominal values than are those based on first-order asymptotics [Hall (1986)]. At present, there is no proof that similar results hold for the smoothed maximum score estimator, which is not $N^{1/2}$-consistent. However, I have done some preliminary calculations suggesting that the bootstrap distribution of the smoothed maximum score t-statistic coincides with an Edgeworth expansion of the exact distribution through order $N^{-2/5}$ in the case of the version of the estimator used here. In addition, Monte Carlo evidence [Horowitz (1992)] suggests that critical values based on the bootstrap are much more accurate in finite samples than are those obtained from first-order asymptotic theory. Accordingly, I have used the bootstrap to estimate 90% confidence intervals for the smoothed maximum score parameter estimates. The bootstrapping procedure is:

(1) Generate a bootstrap sample of size 842 by sampling the estimation data set randomly with replacement.

(2) Using this sample, compute the bias-corrected smoothed maximum score estimate of $\beta$ and the t-statistic for testing the hypothesis $\beta_i = b_{iN}$, where $\beta_i$ is the $i$th component of $\beta$ and $b_{iN}$ is the bias-corrected smoothed maximum score estimate of $\beta_i$ obtained from the original estimation data set (the estimate shown in table 2).

(3) Estimate the 0.10 critical value of the t-statistic from the empirical distribution of this statistic obtained through repetition of steps 1 and 2. Estimate a 90% confidence interval for the $i$th component of $\beta$ as $b_{iN} \pm c_i s_i$, where $c_i$ is the bootstrap estimate of the critical value of the t-statistic for $\beta_i$ and $s_i$ is the estimated asymptotic standard error of $b_{iN}$.

The estimated 0.10 critical values based on 500 bootstrap samples are in the range 4.18-4.55, whereas the asymptotic critical value is 1.645.⁷ Table 3 shows the half-widths of the resulting 90% confidence intervals. The half-widths of asymptotic 90% confidence intervals based on the parametric, single-index, and unsmoothed maximum score estimates also are shown. The half-widths for the parametric and single-index models are 1.645 times the estimated standard errors shown in table 2. The half-widths for the unsmoothed maximum score estimates were obtained from the Manski and Thompson (1986) bootstrap procedure. Since this procedure is not based on an asymptotically pivotal statistic, there is no reason to expect it to yield results that are more accurate than first-order asymptotic theory. However, because the asymptotic distribution of the maximum score estimator is intractable, standard methods cannot be used to obtain confidence intervals with this estimator. The bootstrap may provide a method for obtaining confidence intervals.

There is, of course, no way of knowing whether the coverage probabilities of the bootstrap-based 90% confidence intervals are close to 90%.
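The three-step bootstrap procedure described above can be sketched generically in Python. This is an illustration under stated assumptions rather than the procedure's actual implementation: `estimate` is a hypothetical user-supplied function returning a point estimate and its standard error, the sample mean is used as a stand-in for the bias-corrected smoothed maximum score estimator, the data are simulated, and the critical value is taken from the absolute value of the t-statistic (an assumption that matches the symmetric interval $b_{iN} \pm c_i s_i$).

```python
import math
import random

def bootstrap_t_interval(data, estimate, level=0.90, reps=500, seed=0):
    """Symmetric bootstrap-t interval; estimate(sample) returns (b, s)."""
    rng = random.Random(seed)
    b_hat, s_hat = estimate(data)          # estimate from the original data
    n = len(data)
    t_stats = []
    for _ in range(reps):
        # Step 1: resample the data with replacement.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        # Step 2: t-statistic for the hypothesis that the parameter is b_hat.
        b_star, s_star = estimate(sample)
        t_stats.append(abs(b_star - b_hat) / s_star)
    # Step 3: empirical critical value, then the interval b_hat +/- c * s_hat.
    t_stats.sort()
    c = t_stats[min(reps - 1, math.ceil(level * reps) - 1)]
    return b_hat - c * s_hat, b_hat + c * s_hat, c

def mean_and_se(sample):
    """Stand-in estimator (hypothetical): sample mean and its standard error."""
    n = len(sample)
    m = sum(sample) / n
    var = sum((v - m) ** 2 for v in sample) / (n - 1)
    return m, math.sqrt(var / n)

# Simulated data, for illustration only.
gen = random.Random(1)
data = [gen.gauss(0.0, 1.0) for _ in range(200)]
lower, upper, crit = bootstrap_t_interval(data, mean_and_se)
```

The design point is that the resampled statistic is studentized with the standard error computed in each bootstrap sample, so the critical value `crit` replaces 1.645 while the original standard error still sets the scale of the interval.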
However, the half-widths shown in table 3 seem reasonable. Except for the intercept obtained from the unsmoothed maximum score estimator, the bootstrap-based intervals are wider than those based on the fixed-coefficients probit and single-index models and comparable to or wider than those based on the random-coefficients probit model. In addition, except for the intercept term, the bootstrap confidence intervals based on the smoothed maximum score estimator are narrower than those based on the unsmoothed estimator.

⁷ The number of bootstrap samples was limited by the very long computing time required for bootstrapping.

Table 3
Half-widths of 90% confidence intervals for estimated parameters. Based on first-order asymptotic formulas for the parametric and single-index estimators and on the bootstrap for the maximum score and smoothed maximum score estimators.

Model                  INTERCEPT   CARS     DOVTT    DIVTT
Fixed-coeff. probit    0.3822      0.4222   0.0239   0.0065
Random-coeff. probit   0.5611      0.7802   0.0354   0.0076
Single-index                       0.4325   0.0286   0.0074
Smoothed max. score    0.7664      0.7488   0.0310   0.0087
Max. score             0.1178      2.1062   0.0486   0.0176

The smoothed maximum score estimator yields inferences that are different from those obtained from the parametric and single-index estimators. Based on the confidence intervals in table 3, the coefficient of DIVTT is statistically significantly different from 0 at the 0.10 level according to the smoothed maximum score estimation results but not according to the parametric and single-index results. By contrast, the coefficient of DOVTT is significant at the 0.10 level according to the parametric and single-index results but not according to the smoothed maximum score results. Neither coefficient is significant at the 0.10 level according to the unsmoothed maximum score estimates.

3.3. Specification tests

The specification tests have two objectives. One is to determine whether the distributional assumptions made by the fixed- and random-coefficients probit models are consistent with the estimation data. The other is to investigate whether the data enable one to discriminate between the single-index and arbitrarily heteroskedastic semiparametric models.

The fixed-coefficients probit model can be tested against the random-coefficients model, which nests it, with a likelihood ratio, Wald, or Lagrangian multiplier test of the hypothesis that the variances of the nonintercept coefficients are 0. These tests all reject the hypothesis of fixed coefficients (p < 0.01). Thus, the fixed-coefficients probit model is misspecified.⁸

⁸ The hypothesis that the variances of the nonintercept coefficients are zero is on the boundary of the parameter set of the random-coefficients probit model. This problem can be dealt with by truncating the distribution of $x$ so that it has bounded support and expanding the parameter set to include all values of the 'variance' parameters that are in a small neighborhood of 0.

It is convenient to test the random-coefficients probit model against a nonparametric alternative. To do this, let $\hat\beta_N$ and $\hat\Sigma_N$ denote the maximum likelihood estimates of $\beta$ and $\Sigma$ in the random-coefficients probit model. Let $V_N(x) = x'\hat\Sigma_N x$. If the random-coefficients probit model is a correct description of the data-generation process, a nonparametric regression of $y$ on $\hat\beta_N'x / V_N(x)^{1/2}$ yields a consistent estimate of the cumulative normal distribution function, $\Phi(\cdot)$. The statistical significance of differences between the nonparametric regression curve and $\Phi(\cdot)$ can be assessed by constructing a uniform confidence band for the nonparametric regression function. The random-coefficients probit model is rejected at the $1 - \alpha$ level if $\Phi(\cdot)$ is not contained in the $1 - \alpha$ uniform confidence band.

The method for constructing the band is given in Proposition 1, which is a modified form of Theorem 4.3.1 of Härdle (1990). The main modification consists of replacing $\beta$ and $\Sigma$ with their maximum likelihood estimates. Heuristically, this can be done because the rate of convergence in probability of maximum likelihood estimators is faster than the rate of convergence of the nonparametric regression function. The proof of Proposition 1 is in the appendix.

Proposition 1. Let $F_N$ denote the kernel regression of $y$ on $\hat\beta_N'x / V_N(x)^{1/2}$. Let the kernel, $K$, be a probability density that is symmetrical about 0, has bounded support, and whose first derivative has bounded variation. Let the bandwidth be $a_N = h_N^{\delta}$, where $1 < \delta < 5/3$ and $h_N \propto N^{-1/5}$. Let $g(\cdot)$ denote the probability density function of $\beta'x / V(x)^{1/2}$. Let $g_N$ denote the kernel estimate of $g$ based on $\hat\beta_N'x / V_N(x)^{1/2}$, kernel $K$, and bandwidth $a_N$. Let $S$ be a closed interval of the real line on which $g$ is strictly positive. Assume that $P(y = 1 \mid x) = \Phi[\beta'x / V(x)^{1/2}]$ and that $g$ is twice differentiable. Then for any real $z$,

$$\lim_{N \to \infty} P\left\{ (0.4\delta \log N)^{1/2} \left[ (N a_N / c)^{1/2} \sup_{v \in S} [g_N(v)/\sigma_N^2(v)]^{1/2} \, |F_N(v) - \Phi(v)| - d_N \right] < z \right\} = \exp[-2\exp(-z)],$$

where

$$\sigma_N^2(v) = F_N(v)[1 - F_N(v)],$$

$$d_N = (0.4\delta \log N)^{1/2} + (0.4\delta \log N)^{-1/2} \log[c^*/(2\pi^2)]^{1/2},$$

$$c = \int_{-\infty}^{\infty} K(u)^2 \, du,$$

$$c^* = (2c)^{-1} \int_{-\infty}^{\infty} K'(u)^2 \, du.$$

[Figure 1: curves labeled 'Conf. Band' and 'Normal Distr.'; horizontal axis B'X/SQRT(V), from -3 to 3.]

Fig. 1. P(CAR) from the random-coefficients probit model and nonparametric regression.

Fig. 1 shows a plot of the kernel regression of $y$ on $\hat\beta_N'x / V_N(x)^{1/2}$, the uniform 95% confidence band for the regression function, and $\Phi(\cdot)$. The kernel function is the normal density.⁹ The value of $h_N$ was set at 0.30 using least-squares cross-validation, and $\delta = 4/3$. The estimated regression function and $\Phi(\cdot)$ are close to one another, and $\Phi(\cdot)$ is well within the confidence band. Thus, the hypothesis that the random-coefficients probit model is correctly specified cannot be rejected.¹⁰

⁹ The normal density can be modified to have bounded support, as is required by Proposition 1, by replacing its tails with cubic splines that go smoothly to 0 for some finite $|u|$. This has no effect on the results of the test if the truncation point is made sufficiently large.

¹⁰ The test based on nonparametric regression also can be applied to the fixed-coefficients probit model, and that model is rejected at the 0.05 level. $\Phi(\cdot)$ is not entirely inside the uniform 95% confidence band obtained from the nonparametric regression of $y$ on $\hat\beta_N'x$ when $\hat\beta_N$ is the fixed-coefficients probit estimate of $\beta$.

A formal specification test for the single-index model is not available. An informal test can be carried out by dividing the support of $x$ into cells and comparing the within-cell average predicted probabilities of choosing automobile according to the single-index model with the observed fractions of travelers choosing automobile. That is, for each cell compute

$$Q_{pred} = \sum_{n \in cell} F_N(\hat b_N'x_n)/N_{cell} \quad \text{and} \quad Q_{obs} = \sum_{n \in cell} y_n / N_{cell},$$

where $\hat b_N$ is the single-index estimate of $\beta$, $F_N$ is given by (7)-(9) with $b$ replaced by $\hat b_N$, and $N_{cell}$ is the number of observations in the cell. If the single-index model is correctly specified, $Q_{obs}$ and $Q_{pred}$ will be close, and a graph of $Q_{obs}$ against $Q_{pred}$ will show scatter around a straight line through the origin with unit slope.

I defined cells according to the predicted probability of choosing automobile obtained from the random-coefficients probit model. The cells correspond to predicted probit probabilities in the intervals [0, 0.1), [0.1, 0.2), ..., [0.9, 1.0]. Fig. 2 shows the resulting graph with a 45° line superimposed. Fig. 3 shows a similar graph for the random-coefficients probit model. The fit is much worse in fig. 2 than in fig. 3. Without a formal test, one cannot exclude the possibility that the poor fit of the single-index model is caused by random sampling errors. Nonetheless, it is clear that the random-coefficients probit model provides a more accurate description of the data than does the single-index model.

[Figure 2: horizontal axis Predicted Probability, from 0 to 1.]

Fig. 2. Observed and predicted probabilities for the single-index model.

[Figure 3: horizontal axis Predicted Probability, from 0 to 1.]

Fig. 3. Observed and predicted probabilities for the random-coefficients probit model.

Now consider the semiparametric model with arbitrary heteroskedasticity. According to this model $P(y = 1 \mid \beta'x = 0) = 0.5$. The model can be tested by

using nonparametric regression to estimate P(y = 1 Ip’x = 0) and using the result to test the hypothesis that this probability is 0.5. The following proposition, which is proved in the appendix, provides the basis for the test: Proposition 2. Let bN denote the estimate of /?, and let Fg( *) denote a symmetrical second-order kernel 1 < 6 < 513 and hN cc N-l”. Under as N+ co, (Nhs,/ V,)“* [F;(O) where

bias-corrected smoothed maximum score the kernel regression of y on bhx using K(s). Let the bandwidth be hi, where assumptions l-11 of Horowitz (1992) and

- 0.5-JA

N(0, I),

(13)

m v, = 0.25g,(O) - l

gn(0) = (Nh;)-’

K(z)* dz,

(14)

2 K(b;vx,/h;). II=1

(15)

s -m
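As a rough numerical illustration, the quantities in (13)-(15) can be computed directly once the index values $b_N'x_n$ are available. The sketch below is not the program used for the paper. It assumes a standard normal kernel, for which $\int K(z)^2 dz = 1/(2\sqrt{\pi})$, and the function name `median_restriction_test` and its arguments are hypothetical. The argument `bandwidth` stands for $h_N^\delta$, that is, the bandwidth already raised to the power $\delta$.

```python
import math
from statistics import NormalDist

import numpy as np

# Integral of K(z)^2 over the real line for the standard normal kernel.
NORMAL_KERNEL_ROUGHNESS = 1.0 / (2.0 * math.sqrt(math.pi))


def normal_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return np.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)


def median_restriction_test(index, y, bandwidth, alpha=0.05):
    """Test H0: P(y = 1 | index = 0) = 0.5 along the lines of (13)-(15).

    index     : values of b_N'x_n (assumed to be computed elsewhere)
    y         : binary responses (0 or 1)
    bandwidth : h_N**delta
    Returns (statistic, reject), where reject is True if |statistic|
    exceeds the 1 - alpha/2 standard normal quantile.
    """
    index = np.asarray(index, dtype=float)
    y = np.asarray(y, dtype=float)
    n = index.size
    # K((0 - b'x_n)/h) equals K(b'x_n/h) because K is symmetric.
    weights = normal_kernel(index / bandwidth)
    f0 = float(np.sum(weights * y) / np.sum(weights))   # kernel regression at 0
    g0 = float(np.sum(weights) / (n * bandwidth))        # density estimate, eq. (15)
    v_n = 0.25 * NORMAL_KERNEL_ROUGHNESS / g0            # variance estimate, eq. (14)
    statistic = math.sqrt(n * bandwidth / v_n) * (f0 - 0.5)  # eq. (13)
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return statistic, abs(statistic) > critical
```

A call such as `median_restriction_test(index, y, bandwidth=0.25 ** 1.2, alpha=0.10)` would correspond to one choice of $\delta$ between 1 and 5/3; the value 1.2 is purely illustrative.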


It follows from Proposition 2 that the hypothesis $P(y = 1 \mid \beta'x = 0) = 0.5$ can be tested at the $\alpha$ level by comparing the test statistic $|(N h_N^\delta)^{1/2} [F_N^\delta(0) - 0.5]| / V_N^{1/2}$ with the $1 - \alpha/2$ quantile of the standard normal distribution. To implement the test, I carried out the nonparametric regression using the normal density as the kernel. The bandwidth $h_N$ was set at 0.25 using least-squares cross-validation. The hypothesis $P(y = 1 \mid \beta'x = 0) = 0.5$ was not rejected at the 0.10 level with any of several $\delta$ values between 1 and 5/3. Thus, according to this test, the arbitrarily heteroskedastic model is consistent with the data. This result agrees with the result of testing the random-coefficients probit model, which is nested in the arbitrarily heteroskedastic model.¹¹
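The informal cell comparison used earlier in this section for figs. 2 and 3 can also be sketched in a few lines. This is only an illustration under stated assumptions, not the program used for the paper: the function name `cell_comparison` and its arguments are hypothetical, and the sketch assumes that the cell-defining probabilities (here, those of the random-coefficients probit model) and the predicted probabilities of the model being checked have already been computed.

```python
import numpy as np


def cell_comparison(p_cell, p_model, y, n_cells=10):
    """Compare observed and predicted choice proportions by cell.

    p_cell  : probabilities used to assign observations to cells
    p_model : predicted probabilities of the model being checked
    y       : binary responses (1 = automobile chosen)
    Returns a list of (cell, Q_obs, Q_pred, N_cell) for non-empty cells.
    """
    p_cell = np.asarray(p_cell, dtype=float)
    p_model = np.asarray(p_model, dtype=float)
    y = np.asarray(y, dtype=float)
    # Cell k covers [k/n_cells, (k+1)/n_cells); the last cell also includes 1.0.
    cell = np.minimum((p_cell * n_cells).astype(int), n_cells - 1)
    rows = []
    for k in range(n_cells):
        in_cell = cell == k
        n_in_cell = int(in_cell.sum())
        if n_in_cell == 0:
            continue  # no observations fall in this cell
        q_obs = float(y[in_cell].mean())         # observed share choosing auto
        q_pred = float(p_model[in_cell].mean())  # average predicted probability
        rows.append((k, q_obs, q_pred, n_in_cell))
    return rows
```

Plotting `Q_obs` against `Q_pred` from the returned rows, with a 45° line superimposed, gives a graph of the kind described for figs. 2 and 3.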

3.4. Predictions

One of the main uses of mode choice models is to predict the effects on modal market shares of changes in the values of transportation service quality variables. Therefore, it is interesting to compare the predictions obtained from the models discussed here. Since the estimated arbitrarily heteroskedastic model provides predictions of shares only when $b_N'x = 0$, where $b_N$ is the estimate of $\beta$, I have restricted the comparison to cases in which this condition holds. Table 4 shows a variety of $x$ values for which $b_N'x = 0$ when $b_N$ is the bias-corrected smoothed maximum score estimate of $\beta$. Also shown are the predicted probabilities that automobile is chosen obtained from the estimated fixed- and random-coefficients probit models, the single-index model, and the arbitrarily heteroskedastic model.

The differences among the predicted probabilities are large. Depending on the models and the values of the explanatory variables, the fractional differences between pairs of predicted probabilities can exceed 50%. The fractional differences between the probabilities predicted by the random-coefficients probit and arbitrarily heteroskedastic models are 2-34%, depending on the explanatory variables. The predicted probabilities of the two models are fairly close when CARS = 1 but differ greatly in several cases with CARS = 0. The differences are presumably due to random sampling errors, since the random-coefficients probit and arbitrarily heteroskedastic models are nested and neither was rejected by the specification tests.

¹¹ Charles Manski has pointed out that if $P(y = 1 \mid \beta'x)$ is a continuous function of $\beta'x$, another test of the arbitrarily heteroskedastic model can be obtained by using a nonparametric regression of $y$ on $b_N'x$ and uniform confidence bands for the regression function to test whether $P(y = 1 \mid \beta'x) - 0.5$ has the same sign as $\beta'x$. This test accepts the arbitrarily heteroskedastic model.

Table 4
Predicted choice probabilities.

Values of explanatory variables      Predicted probability of auto
DOVTT    CARS    DIVTT    DCOST      Fixed-coeff.   Random-coeff.   Single-   Arbitrarily
                                     probit         probit          index     heteroskedastic
20.00    1.00    30.00    1.63       0.47           0.55            0.77      0.50
10.00    1.00    30.00    1.36       0.47           0.52            0.68      0.50
 5.00    1.00    30.00    1.23       0.47           0.51            0.63      0.50
20.00    1.00    10.00    1.35       0.55           0.58            0.78      0.50
10.00    1.00    10.00    1.08       0.55           0.56            0.72      0.50
 5.00    1.00    10.00    0.94       0.55           0.55            0.68      0.50
 5.00    1.00     5.00    0.87       0.57           0.56            0.69      0.50
20.00    0.00    30.00    0.61       0.65           0.67            0.65      0.50
10.00    0.00    30.00    0.88       0.65           0.59            0.51      0.50
 5.00    0.00    30.00    1.01       0.65           0.56            0.40      0.50
20.00    0.00    10.00    0.89       0.72           0.67            0.69      0.50
10.00    0.00    10.00    1.16       0.72           0.60            0.58      0.50
 5.00    0.00    10.00    1.30       0.72           0.58            0.50      0.50
 5.00    0.00     5.00    1.37       0.73           0.58            0.52      0.50
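The fractional differences quoted in section 3.4 can be reproduced from the Table 4 entries. The short sketch below is illustrative only: it assumes that a fractional difference is measured relative to the arbitrarily heteroskedastic model's prediction of 0.50, which is consistent with the 2-34% range reported in the text, and the helper name `fractional_difference` is hypothetical.

```python
import numpy as np

# Predicted probabilities of choosing automobile, copied from Table 4.
rc_probit = np.array([0.55, 0.52, 0.51, 0.58, 0.56, 0.55, 0.56,
                      0.67, 0.59, 0.56, 0.67, 0.60, 0.58, 0.58])
single_index = np.array([0.77, 0.68, 0.63, 0.78, 0.72, 0.68, 0.69,
                         0.65, 0.51, 0.40, 0.69, 0.58, 0.50, 0.52])
arb_het = np.full(14, 0.50)


def fractional_difference(p, p_base):
    """Absolute difference between two predictions, relative to p_base (an assumption)."""
    return np.abs(p - p_base) / p_base


fd_rc = fractional_difference(rc_probit, arb_het)
fd_si = fractional_difference(single_index, arb_het)

# Random-coefficients probit versus arbitrarily heteroskedastic model.
print(f"{100 * fd_rc.min():.0f}%-{100 * fd_rc.max():.0f}%")  # prints 2%-34%
# Largest difference between the single-index and heteroskedastic models.
print(f"{100 * fd_si.max():.0f}%")  # prints 56%
```

The second figure illustrates the statement that fractional differences between some pairs of models can exceed 50%.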

4. Conclusions

The results presented in this paper show that there are applications in which it is possible to distinguish empirically between parametric and semiparametric binary response models and between different semiparametric specifications. In the setting described here, the estimation results indicate that neither a fixed-coefficients probit model nor a semiparametric single-index model adequately represents the heteroskedasticity of $u$ in eq. (1). A random-coefficients probit model and a semiparametric model that permits arbitrary heteroskedasticity of unknown form are not rejected by specification tests. These results show that distributional assumptions can be important in applied binary response modeling. Parametric and semiparametric specifications that make overly restrictive assumptions about heteroskedasticity can be rejected in specification tests. This finding reinforces and extends the results obtained by Fischer and Nagin (1981) and Hausman and Wise (1978) with parametric models. The results presented here also show that semiparametric methods are workable tools for applied binary response analysis. They are computable and falsifiable, and they yield estimates that have reasonable precision with samples of practical size.

Appendix: Proofs of propositions

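The following lemma is used in the proof of Proposition 1.

Lemma 1. Let the assumptions and notation of Proposition 1 hold. Let $F_N^0$ denote the kernel regression of $y$ on $\beta'x / V(x)^{1/2}$ based on kernel $K$ and bandwidth $\omega_N = h_N^\delta$, where $h_N \propto N^{-1/5}$ and $1 < \delta < 5/3$. Let $g_N^0$ denote the kernel estimate of $g$ based on $\beta'x / V(x)^{1/2}$, kernel $K$, and bandwidth $\omega_N$. Then

$$\operatorname*{plim}_{N \to \infty} \, \sup_{v \in S} \, (N\omega_N)^{1/2} \, |F_N(v) - F_N^0(v)| = 0, \tag{A.1}$$

$$\operatorname*{plim}_{N \to \infty} \, \sup_{v \in S} \, (N\omega_N)^{1/2} \, |g_N(v) - g_N^0(v)| = 0. \tag{A.2}$$

Proof. It suffices to prove that (A.2) holds and that

$$\operatorname*{plim}_{N \to \infty} \, \sup_{v \in S} \, (N\omega_N)^{-1/2} \sum_{n=1}^{N} y_n \left( K\{\omega_N^{-1}[v - \beta_N'x_n / V_N(x_n)^{1/2}]\} - K\{\omega_N^{-1}[v - \beta'x_n / V(x_n)^{1/2}]\} \right) = 0. \tag{A.3}$$

Only (A.2) will be proved here. The proof of (A.3) is similar.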


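Let $\theta = (\beta, \Sigma)$ and $\tau = (\beta^*, \Sigma^*)$, where $\beta^*$ is any column vector with the same dimension as $\beta$ and $\Sigma^*$ is any positive-semidefinite, diagonal matrix with the same dimensions as $\Sigma$. Let $\psi(x, \theta) = \beta'x / (x'\Sigma x)^{1/2}$ and $\psi(x, \tau) = \beta^{*\prime}x / (x'\Sigma^* x)^{1/2}$. Define

$$\Delta_N(v, \theta, \tau) = (N\omega_N)^{-1/2} \sum_{n=1}^{N} \left( K\{\omega_N^{-1}[v - \psi(x_n, \tau)]\} - K\{\omega_N^{-1}[v - \psi(x_n, \theta)]\} \right).$$

Define $\tau_N = (\beta_N, \hat{\Sigma}_N)$. It is necessary to prove that $\Delta_N(v, \theta, \tau_N) \xrightarrow{p} 0$ as $N \to \infty$ uniformly over $v \in S$. To do this, let $P(\cdot)$ and $P_N(\cdot)$, respectively, denote the cumulative and empirical distribution functions of $x$. Write

$$\Delta_N(v, \theta, \tau) = \Delta_{N1}(v, \theta, \tau) + \Delta_{N2}(v, \theta, \tau),$$

where

$$\Delta_{N1}(v, \theta, \tau) = (N/\omega_N)^{1/2} \int_{-\infty}^{\infty} \left( K\{\omega_N^{-1}[v - \psi(x, \tau)]\} - K\{\omega_N^{-1}[v - \psi(x, \theta)]\} \right) d[P_N(x) - P(x)],$$

$$\Delta_{N2}(v, \theta, \tau) = (N/\omega_N)^{1/2} \int_{-\infty}^{\infty} \left( K\{\omega_N^{-1}[v - \psi(x, \tau)]\} - K\{\omega_N^{-1}[v - \psi(x, \theta)]\} \right) dP(x).$$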

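By the mean value theorem

$$\Delta_{N1}(v, \theta, \tau) = (N/\omega_N^3)^{1/2} \, (\theta - \tau)' \int_{-\infty}^{\infty} K'\{\omega_N^{-1}[v - \psi(x, \theta^*)]\} \, [\partial\psi(x, \theta^*)/\partial\theta^*] \, d[P_N(x) - P(x)],$$

where $\theta^*$ is between $\theta$ and $\tau$. Define $K'(\zeta/\omega) = 0$ for all $\zeta$ if $\omega = 0$. It is not difficult to show that the family of functions $K'\{\omega^{-1}[v - \psi(x, \tau)]\} \times [\partial\psi(x, \tau)/\partial\tau]$, considered as functions of $x$ indexed by $v$, $\tau$, and $\omega$ ($\omega \geq 0$), satisfies the assumptions of Theorem 7.21 of Pollard (1984). It follows from this theorem that $\Delta_{N1}(v, \theta, \tau) = O_p(\|\tau - \theta\| / \omega_N^{3/2})$ uniformly over $v \in S$. Therefore, $\Delta_{N1}(v, \theta, \tau_N) = O_p[(N\omega_N^3)^{-1/2}]$ uniformly over $v \in S$.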
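As a brief worked step, added here for illustration, the last bound vanishes precisely because of the restriction on $\delta$ in Lemma 1. Using only $\omega_N = h_N^\delta$ and $h_N \propto N^{-1/5}$ from the statement of the lemma:

```latex
\omega_N \propto N^{-\delta/5}
\quad\Longrightarrow\quad
N\omega_N^{3} \propto N^{\,1 - 3\delta/5} \to \infty
\iff \delta < 5/3,
\qquad\text{so}\qquad
(N\omega_N^{3})^{-1/2} \to 0 .
```

Hence the $O_p[(N\omega_N^3)^{-1/2}]$ term converges in probability to 0 uniformly over the set $S$ whenever $\delta < 5/3$.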


Now consider $\Delta_{N2}(v, \theta, \tau)$. Let $G(\cdot, \theta)$ and $G(\cdot, \tau)$, respectively, denote the cumulative distribution functions of $\psi(x, \theta)$ and $\psi(x, \tau)$. Then

$$\Delta_{N2}(v, \theta, \tau) = (N/\omega_N)^{1/2} \int_{-\infty}^{\infty} K[\omega_N^{-1}(v - \zeta)] \, d[G(\zeta, \tau) - G(\zeta, \theta)]. \tag{A.4}$$

Integration by parts on the right-hand side of (A.4) yields

$$\Delta_{N2}(v, \theta, \tau) = (N/\omega_N^3)^{1/2} \int_{-\infty}^{\infty} [G(\zeta, \tau) - G(\zeta, \theta)] \, K'[\omega_N^{-1}(v - \zeta)] \, d\zeta. \tag{A.5}$$

A change of variables on the right-hand side of (A.5) and symmetry of $K$ yield

$$\Delta_{N2}(v, \theta, \tau) = (N/\omega_N)^{1/2} \int_{0}^{\infty} \left\{ [G(v + \omega_N\zeta, \theta) - G(v - \omega_N\zeta, \theta)] - [G(v + \omega_N\zeta, \tau) - G(v - \omega_N\zeta, \tau)] \right\} K'(\zeta) \, d\zeta. \tag{A.6}$$

It follows from differentiability of $G$ that $\Delta_{N2}(v, \theta, \tau) = O[(N\omega_N)^{1/2} \|\tau - \theta\|]$ uniformly over $v \in S$. Therefore, $\Delta_{N2}(v, \theta, \tau_N) = O_p(\omega_N^{1/2})$ uniformly over $v \in S$. Q.E.D.

Proof of Proposition 1. If the nonparametric regression were based on $\beta'x / (x'\Sigma x)^{1/2}$ instead of $\beta_N'x / (x'\hat{\Sigma}_N x)^{1/2}$, the proposition would follow immediately from Theorem 4.3.1 of Härdle (1990). By Lemma 1, the result does not change if $\beta$ and $\Sigma$ are replaced by $\beta_N$ and $\hat{\Sigma}_N$. Q.E.D.

Proof of Proposition 2. Given any vector $b$ with the same dimension as $x$, let $F_N^\delta(\cdot, b)$ denote the kernel regression of $y$ on $b'x$ using a symmetrical order 2 kernel $K(\cdot)$. Let the bandwidth be as stated in Proposition 2. Under the hypothesis that $P(y = 1 \mid \beta'x = 0) = 0.5$, a Taylor series expansion of $F_N^\delta(0, b_N)$ about $b_N = \beta$ yields

$$(N h_N^\delta)^{1/2} [F_N^\delta(0, b_N) - 0.5] = (N h_N^\delta)^{1/2} [F_N^\delta(0, \beta) - 0.5] + [\partial F_N^\delta(0, \beta^*)/\partial b]' \, (N h_N^\delta)^{1/2} (b_N - \beta),$$

where $\beta^*$ is between $\beta$ and $b_N$. It is not difficult to show that $\partial F_N^\delta(0, b)/\partial b$ is bounded in probability as $N \to \infty$ uniformly over $b$ in a neighborhood of $\beta$. In addition, $(N h_N^\delta)^{1/2} (b_N - \beta)$ converges in probability to 0 by Theorem 2 of Horowitz (1992). Therefore,

$$(N h_N^\delta)^{1/2} [F_N^\delta(0, b_N) - 0.5] = (N h_N^\delta)^{1/2} [F_N^\delta(0, \beta) - 0.5] + o_p(1).$$

By Theorem 2.2.1 of Bierens (1987), $(N h_N^\delta)^{1/2} [F_N^\delta(0, \beta) - 0.5]$ is asymptotically normally distributed with mean 0 and variance

$$V_F = 0.25 \, g(0)^{-1} \int_{-\infty}^{\infty} K(z)^2 \, dz,$$

where $g$ is the density of $\beta'x$.¹² The proposition follows from the fact that $V_N$ is a consistent estimate of $V_F$. Q.E.D.

¹² Bierens (1987) assumes that $g$ is twice continuously differentiable, but for the bandwidth sequence used here his proof also holds if $g$ is only once continuously differentiable as assumed in Horowitz (1992).

References

Beran, Rudolph, 1988, Prepivoting test statistics: A bootstrap view of asymptotic refinements, Journal of the American Statistical Association 83, 687-697.
Bierens, Herman J., 1987, Kernel estimators of regression functions, in: Truman F. Bewley, ed., Advances in econometrics: Fifth world congress, Vol. 1 (Cambridge University Press, New York, NY) 99-144.
Chamberlain, Gary, 1986, Asymptotic efficiency in semiparametric models with censoring, Journal of Econometrics 32, 189-218.
Cosslett, Stephen R., 1987, Efficiency bounds for distribution-free estimators of the binary choice and censored regression models, Econometrica 55, 559-585.
Das, Sanghamitra, 1991, A semiparametric structural analysis of the idling of cement kilns, Journal of Econometrics 50, 235-256.
Fischer, Gregory W. and Daniel Nagin, 1981, Random versus fixed coefficient quantal choice models, in: Charles F. Manski and Daniel McFadden, eds., Structural analysis of discrete data with econometric applications (MIT Press, Cambridge, MA) 273-304.
Hall, Peter, 1986, On the bootstrap and confidence intervals, Annals of Statistics 14, 1431-1452.
Härdle, Wolfgang, 1990, Applied nonparametric regression (Cambridge University Press, New York, NY).
Hausman, Jerry A. and David A. Wise, 1978, A conditional probit model for qualitative choice: Discrete decisions recognizing interdependence and heterogeneous preferences, Econometrica 46, 403-426.
Horowitz, Joel L., 1992, A smoothed maximum score estimator for the binary response model, Econometrica 60, 505-531.
Ichimura, Hidehiko, 1988, Semiparametric least squares estimation of single index models, Manuscript (Department of Economics, University of Minnesota, Minneapolis, MN).
Kim, Jeankyung and David Pollard, 1990, Cube root asymptotics, Annals of Statistics 18, 191-219.
Klein, Roger L. and Richard H. Spady, 1989, An efficient semiparametric estimator for discrete choice models, Manuscript (Bell Communications Research, Morristown, NJ).
Manski, Charles F., 1975, Maximum score estimation of the stochastic utility model of choice, Journal of Econometrics 3, 205-228.
Manski, Charles F., 1985, Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator, Journal of Econometrics 27, 313-334.
Manski, Charles F. and T. Scott Thompson, 1986, Operational characteristics of maximum score estimation, Journal of Econometrics 32, 65-108.
McFadden, Daniel, 1974, Conditional logit analysis of qualitative choice behavior, in: Paul Zarembka, ed., Frontiers in econometrics (Academic Press, New York, NY) 237-253.
Newey, Whitney K., James L. Powell, and James R. Walker, 1990, Semiparametric estimation of selection models: Some empirical results, American Economic Review: Papers and Proceedings 80, 324-328.
Pollard, David, 1984, Convergence of stochastic processes (Springer-Verlag, New York, NY).
Powell, James L., James H. Stock, and Thomas M. Stoker, 1989, Semiparametric estimation of index coefficients, Econometrica 57, 474-523.
Szu, Harold and Ralph Hartley, 1987, Fast simulated annealing, Physics Letters A 122, 157-162.
White, Halbert, 1982, Maximum likelihood estimation of misspecified models, Econometrica 50, 1-25.