The Journal of Choice Modelling ∎ (∎∎∎∎) ∎∎∎–∎∎∎
Contents lists available at ScienceDirect
The Journal of Choice Modelling journal homepage: www.elsevier.com/locate/jocm
Variance-component-based nested logit specifications: Improved formulation, and practical microsimulation of random disturbance terms David S. Bunch a,n, David M. Rocke b a b
Graduate School Management, University of California, Davis, USA Division of Biostatistics and Department of Biomedical Engineering, University of California, Davis, USA
a r t i c l e i n f o
abstract
Article history: Received 4 October 2015 Received in revised form 2 March 2016 Accepted 5 April 2016
The initial motivation leading to the results in this paper was a problem most choice modeling researchers may have not considered: how to simulate random disturbance terms from nested logit (NL) models. We develop an approach using results from Cardell (1997), who proved the existence of a probability distribution (C (λ )) that could be used to formulate NL models based on statistically independent variance components. These components can be interpreted as unobserved preference heterogeneity for the choice ‘dimensions’ used to define NL tree structures. Simulation aside, we consider this formulation to have other practical advantages for empirical work, but it does not appear to have penetrated the literature (possibly due to notational obstacles). We use notation from Daly (2001) to implement an equivalent representation, which also establishes mathematical equivalence between Cardell (1997) and other important results in the NL literature. & 2016 Elsevier Ltd. All rights reserved.
Keywords: Discrete choice modeling Nested logit Random utility maximization Monte Carlo simulation Random disturbance terms
1. Introduction To provide context, consider a simple discrete choice example (mode choice for commute trip) expressed in the random utility maximization (RUM) framework:
Uj = V *j + εj = αj + Xj β + εj , j ∈ {drive alone , carpool , bus , subway} .
(1)
where αj ’s are alternative-specific constants (ASCs), Xj is a 1 × p vector of explanatory variables, β is a p-vector of parameters, and εj ’s are random disturbance terms.1 If the εj ’s are iid Gumbel (type I extreme value) with scale μ , then the probability
(
)
that alternative i is chosen Prob { Ui ≥ Uj, ∀ j ∈ C} is given by the multinomial logit (MNL) model
Pi =
n
e
μVi* *
∑j ∈ C e μV j
(2)
Corresponding author.
1 The vector Xj can be a function of consumer-specific characteristics in addition to attributes of choices alternatives, an aspect suppressed in this notation. The “fixed” utility V *j includes a star for notational reasons arising later in the paper.
http://dx.doi.org/10.1016/j.jocm.2016.04.001 1755-5345/& 2016 Elsevier Ltd. All rights reserved.
Please cite this article as: Bunch, D.S., Rocke, D.M., Variance-component-based nested logit specifications: Improved formulation, and practical microsimulation of random disturbance terms. The Journal of Choice Modelling (2016), http: //dx.doi.org/10.1016/j.jocm.2016.04.001i
D.S. Bunch, D.M. Rocke / Journal of Choice Modelling ∎ (∎∎∎∎) ∎∎∎–∎∎∎
2
where an important early reference is McFadden (1973).2 The MNL's well-known limitations (e.g., independence from irrelevant alternatives, unrealistic substitution patterns across alternatives, etc.) are due to the iid assumption, and many of the methodological advances in discrete choice modeling can be viewed as efforts to relax this assumption. These generally fall into two categories: simulation-based models (e.g., mixed MNL, a.k.a., mixed logit, multinomial probit), and generalized extreme value (GEV)-based models. At the risk of oversimplifying: the first approach allows more flexibility for implementing behavioral concepts (e.g., individual-level taste variation, especially in repeated measures/panel data), but requires simulation methodologies for estimation and analysis, whereas the second approach yields closed-form expressions for models with non-independent correlation structure (which avoids simulation), but at the cost of more complexity. Either approach has its own practical challenges, e.g., identification issues, and the need for specialized software. For additional background, see Train (2009). This paper focuses on nested logit (NL) models (from the GEV family), and was initially motivated by an unusual need arising from a research project on energy systems models: simulating random disturbance terms from a pre-existing, complex NL model3. NL modeling needs are almost universally limited to computing choice probabilities (for model estimation and/or prediction) which can be done using closed-form expressions, so this particular problem has received scant attention in the literature (see discussion in Section 5). The approach we developed uses results from Cardell (1997) [“Cardell”], who (i) shows how the total disturbance term in (1) can be decomposed into independent variance components, and (ii) establishes the existence and properties of probability distributions for these components. This high-level description suggests an obvious solution: simulate the total disturbance term by generating a random draw for each component, and then add them together. Although this may sound straightforward, there are a number of technical challenges, all of which are arguably a result of the model's mathematical complexity. For example, although fundamental theoretical results for NL were (independently) established by Williams (1977), and Daly and Zachary (1978), and subsequently generalized by McFadden (1978, 1981) to the GEV family, confusion over a variety of issues eventually led to controversies in the literature in the late 1990s and early 2000s. Carrasco and Juan De Dios Ortúzar (2002) provides a review, and addresses these in a rigorous and concise way. They demonstrate how two approaches with alternative interpretations (“Williams/Daly-Zachary” and “McFadden”) lead to mathematically equivalent NL formulations, and review important implementation issues that require researchers to make choices among specification and normalization options that can have important implications for the model's properties. Their review is framed in the context of two-level NL models, using a widely used form of notation. In this paper we use a more general form from Daly (2001) [“Daly”], for reasons to be discussed. However, because their treatment is general, we frequently refer to “Carrasco-Ortúzar” to avoid unnecessary replication of technical detail. We combine Daly's notation with Cardell's results to derive expressions that support a practical solution to the simulation problem. They also establish mathematical equivalence among various model forms and derivations in the literature (which is necessary for them to be truly useful). In this regard, we view this as a natural extension of Carrasco-Ortúzar, with additional practical advantages for empirical work. Section 2 establishes required NL notation and definitions. Section 3 summarizes relevant results from Cardell, and provides expressions using Daly notation that establish key equivalences. Section 4 describes generation of random variates for variance components, and Section 5 concludes with comments on the approach, and related results in the literature.
2. Nested logit notation and normalization First, recall the mode choice problem defined in (1). This problem appears frequently in the literature, where one possible NL tree structure is in Fig. 1. It depicts a hierarchy based on an upper-level “choice dimension” defined by nominal features (Auto versus Transit) that are shared by subsets of lower-level alternatives. Such multi-dimensional structure is common in transport applications (e.g., destination-mode choice, or destination-route choice), but, more generally, tree structures provide a useful way to represent perceived “similarity” or “substitutability” among competing alternatives, and how they might vary as a function of such shared choice dimensions. In the context of random utility maximization, this can also be interpreted in terms of the degree of correlation among the random disturbance terms in (1). As discussed in Carrasco-Ortúzar (and also here), there are multiple pathways for deriving NL models based on alternative conceptual interpretations that yield mathematically equivalent formulations. As previously noted, we use notation from Daly (2001). Let c denote any node in the tree, which includes elemental alternatives (that can also be denoted by e) and a root node (root). The tree function t(c) identifies the (unique) parent (precursor) of c. The set of c and its ancestors is A(c) ¼ {c, t(c), t(c(c)),…, t(c) ¼k | t(k) ¼ root}. Daly distinguishes between normalized and non-normalized forms. We are using the normalized form, where each nonelemental node c (a composite node that subsumes a subset of elementary alternatives) is assigned a structural parameter μc . Models are recursively defined using functions V that represent the ‘attractiveness’ of nodes, defined for elementary and 2
The CDF takes the form Fε (y ) = exp ( −e−μy ) Specifically, we developed an approach for incorporating heterogeneous consumer preferences for alternative fuel vehicles within a linear programming-based energy systems model. This required generating realizations of random utilities from a NL model that included 40 choice alternatives and up to four “levels” in the tree: for details, see Bunch et al. (2015). 3
Please cite this article as: Bunch, D.S., Rocke, D.M., Variance-component-based nested logit specifications: Improved formulation, and practical microsimulation of random disturbance terms. The Journal of Choice Modelling (2016), http: //dx.doi.org/10.1016/j.jocm.2016.04.001i
D.S. Bunch, D.M. Rocke / Journal of Choice Modelling ∎ (∎∎∎∎) ∎∎∎–∎∎∎
3
Fig. 1. Two-Level Mode Choice.
composite nodes, respectively, by:
Ve = μt ( e) X e β and Vc =
μt ( c ) μc
log
∑
Va
t ( a)= c
(3)
where it is required that 0 < μt ( c ) ≤ μc for all c. The overall attractiveness of the choice set is given as:
Vroot =
1 log ∑ Va μ root t ( a)= root
(4)
Using this notation, the probability of choosing any node c is given by
log pc =
⎧ ⎫ ⎪ ⎪ ⎨ Va − log ∑ exp Vb ⎬ ⎪ ⎪ a ∈ A (c ) ⎩ t ( b) = t ( a) ⎭
∑
(5)
When elemental alternatives are linked directly to the root node (with no composite nodes), this notation yields the MNL model in (2) with μV *j = Ve and μ = μroot . More generally, μroot represents the (common) scale of the random disturbance terms in (1). In this approach, all mean-related utility terms are associated with elemental nodes. The form of .. in (3) is defined using a non-alternative-specific β , and is consistent with Carrasco-Ortúzar (eq. 19), and Cardell (Eqs. 2 and 3), respectively. Although this form does not explicitly include alternative-specific constants as in (1), they (as well as alternative-specific attributes) can be accommodated by suitable definition of Xe . It is readily confirmed that for two-level NL models (as in, e.g., Fig. 1), (3)–(5) yield expressions equivalent to, e.g., Carrasco-Ortúzar (Eq. 9). However, the formulation (3)–(5) is non-identified: multiplying all μ ’s by a factor leaves the model unchanged if β is divided by the same factor. Identification is achieved by assigning a fixed value to any one of the μ ’s. For two-level NL, setting μroot = 1 corresponds to Carrasco-Ortúzar (eq. 17), which they also denote “upper normalization” (sec. 3.1.1), yielding a model equivalent to Carrasco-Ortúzar (eq. 22, and 15). Moreover, they denote this the UMNL (utility maximizing nested logit) specification (sec. 3.2), emphasizing its equivalence to NL models of McFadden. For completeness, we note that these are also generally equivalent to “random utility model 2” (RU2) of Hensher and Greene (2002), under suitable parameter definitions and restrictions. One advantage of this notation is demonstrated in Fig. 2, which extends (1) by adding another mode (walk), yielding an asymmetric tree with an additional “level.” Fig. 2a corresponds to the example in Cardell (Fig. 1), which is readily accommodated using the Daly tree function. Fig. 2b is included to show that notation relying on level-related indices can require creation of “dummy nodes” for some alternatives, and also some parameter restrictions to ensure a properly defined model —see Carrasco-Ortúzar (Section 3.6.2). Finally, we note that there are potential parameter identification issues if alternativespecific constants are not chosen correctly—see Carrasco-Ortúzar (Section 3.6.1) and Bierlaire et al. (1997).
3. Correlation, variance components, and the C (λ ) distribution For both the MNL and NL models, the marginal distributions of all εj ’s are Gumbel, with the same variance. The main difference is that, although they are iid in MNL, in NL they are correlated. For example, consider the elemental alternatives bus and subway in Fig. 1. Random utilities are represented by
Ubus = Xbus β + εbus Usubway = Xsubway β + εsubway
(6)
where εbus and εsubway are presumed to be (positively) correlated (under multiple possible interpretations). As discussed in Please cite this article as: Bunch, D.S., Rocke, D.M., Variance-component-based nested logit specifications: Improved formulation, and practical microsimulation of random disturbance terms. The Journal of Choice Modelling (2016), http: //dx.doi.org/10.1016/j.jocm.2016.04.001i
D.S. Bunch, D.M. Rocke / Journal of Choice Modelling ∎ (∎∎∎∎) ∎∎∎–∎∎∎
4
Fig. 2. (a) Node-based mode choice model and (b) 3-Level NL with dummy nodes.
Carrasco-Ortúzar, McFadden's approach defines conditions that yield a multivariate extreme value distribution with correlated disturbances. In contrast, Williams (and others) recognized that correlation can be viewed as arising from a decomposition of the ε ’s into two variance components:
* εbus = νtransit + νbus * εsubway = νtransit + νsubway
(7)
* * , and νsubway where νtransit , νbus are independently distributed, and correlation occurs due to the shared component (νtransit ). * * and νsubway are Gumbel distributed (with a common Carrasco-Ortúzar (Section 2.1) describes Williams’ assumptions: νbus scale factor), and νtransit is distributed (with some variance) so that the ε ’s resulting from (7) have the desired Gumbel properties. However, aside from its variance, nothing else about νtransit ’s distribution appears to have been known; moreover, Carrasco-Ortúzar (page 199) notes “such a distribution may not exist”. Cardell fills this gap by establishing the existence and properties of a distribution denoted C (λ ), defined for 0 ≤ λ ≤ 1, where λ determines the mean, shape, and scale of the distribution. For λ = 0, C(0) is standard Gumbel (a.k.a., type I extreme value, with location ¼0 and scale ¼1).4 For λ = 1, C(1) is a degenerate distribution, i.e., ν ~C (1) ⇒ ν ≡ 0. For two independent variates ν1 and ν2, where ν1~C (0) and ν2 ~C (λ ) for 0 < λ < 1, he shows that ν2 + λν1~C (0). Using highly complex notation, Cardell (Thm. 3.1) provides an NL model formulation based on recursive combination of variance components and the assumption of random utility maximization. Although Cardell is widely cited, we have not his formulation explicitly used in the literature. An equivalent recursive combination of variance components can be implemented more effectively using Daly notation. Starting with any tree structure, a variance component νc ~C (λ c ) can be assigned to each node, where λ e ≡ 0 and λroot ≡ 1, and 0 < λ c < 1 for all composite nodes. By defining the factor
ξc =
∏
λt (b),
b ∈ A (c )
(8)
random utilities for the elemental alternatives are given by:
Ue = X e β + εe ,
(9)
where 4
Cardell notes that it is possible to generalize C (λ ) to include a scale parameter. However, for our purposes we proceed using the scale ¼ 1 assumption.
Please cite this article as: Bunch, D.S., Rocke, D.M., Variance-component-based nested logit specifications: Improved formulation, and practical microsimulation of random disturbance terms. The Journal of Choice Modelling (2016), http: //dx.doi.org/10.1016/j.jocm.2016.04.001i
D.S. Bunch, D.M. Rocke / Journal of Choice Modelling ∎ (∎∎∎∎) ∎∎∎–∎∎∎
εe =
∑
5
ξa νa. (10)
a ∈ A (e )
For example, this yields the following random disturbance terms for {bus, carpool, walk} in Fig. 2a:
εbus
= νvehicle + λvehicle νtransit + λvehicle λtransit νbus * * = νvehicle + νtransit + νbus
εcarpool = νvehicle + λvehicle νauto + λvehicle λ auto νcarpool εwalk
= νvehicle + = νwalk
* νauto
+
* νcarpool (11)
where the expressions with starred terms have been included to emphasize the nature of the variance components. Using the definition μt (c ) = ξc−1, it is straightforward to show equivalence between Cardell's NL formulation and the normalized Daly formulation, (and therefore other models, as discussed in Section 2). In particular, note that μt (c ) /μc = λ c ⇒ μt (c ) = λ c μc , so the restriction 0 < λ c < 1 ensures that the 0 < μt ( c ) ≤ μc requirement is automatically satisfied. A more detailed exploration of expressions for elasticities, correlation among error terms, etc., would clearly be possible, but is beyond the scope of this paper. The implications of this approach are discussed in the next two sections.
4. Practical microsimulation of NL disturbance terms Because the νc ’s are independent and (conveniently) univariate, an obvious procedure for simulating ε ’s is to generate νc ~C (λ c ) values and apply Eq. (10). Unfortunately, Cardell reports that the CDF for C (λ ) (for λ > 0) does not have a closedform expression, so simple inversion procedures cannot be used.5 However, he derives the characteristic function for C (λ ) in an Appendix, providing an opening to a solution. There is a long tradition in statistics of developing generalized families of continuous, univariate distributions that can be used to produce flexible approximations—see, e.g., Ord (1972). The MATLAB Statistics Toolbox includes a function (pearsrnd) that generates random variates from the best approximation it can construct using the Pearson system, based on user-provided information on the first four moments of a distribution. These can be derived using the characteristic function, which we did for C (λ ). Denote the desired quantities by: mean(λ ), variance(λ ), skewness(λ ), and kurtosis(λ ).6 First, note that C(0) is standard Gumbel (location¼0, and scale¼1). For this distribution, mean(0) ¼ γ ≈ 0.5772 (Euler's constant), and variance(0) ¼ π 2/6. For C (λ ), mean(λ ) ¼ mean(0)*(1 − λ ) = γ (1 − λ ), and variance(λ ) ¼ π2 variance(0)*(1 − λ2) = 6 (1 − λ2). Skewness and kurtosis are more complex, and most easily expressed using cumulants. Let th κn (λ ) denote the n cumulant of C (λ ). It can be shown that κn (λ ) = (1 − λn) κn (0), where .. gives C(0) cumulants for n 4 1 (and ζ (n) is the Riemann zeta function). These are consistent with the mean and variance expressions above (e.g., κ1 (0) = γ and ζ (2) = π 2/6). The coefficients of skewness and kurtosis (used by pearsrnd) are given by:
(1 − λ 3) κ 3 (0) (1 − λ 3) 12 6 ζ (3) = ⎡⎣ variance (λ ) ⎤⎦3/2 ( 1 − λ2)3/2 π 3
(12)
κ 4 (λ ) (1 − λ 4 ) κ 4 (0) (1 − λ 4 ) 12 1 + λ2 12 +3= +3= = + 3. 2 2 ⎡⎣ variance (λ ) ⎤⎦2 5 1 − λ2 5 κ22 (λ ) (1 − λ )
(13)
skewness (λ ) =
κ 3 (λ ) κ23/2 (λ )
=
and
kurtosis (λ ) =
Recall that C (λ ) becomes degenerate as λ → 1. In the expressions above, both the mean and variance approach zero, and skewness and kurtosis approach infinity (albeit very slowly) as λ → 1. Because C (λ ) is a univariate distribution with a single parameter, it was practical to perform numerical testing to confirm that using pearsrnd with these expressions worked well over the full range of λ′s . Moreover, choice probabilities computed using simulated variates for the large NL model in Bunch et al. (2015) closely matched those from close-form expressions.
5. Discussion As noted in the introduction, the problem of simulating disturbance terms from NL models has received limited attention in the literature. We could locate only two recent references: Garrow et al. (2010), and Ye (2011). Both were aware of Cardell and the existence of , but viewed the lack of a closed-form CDF as precluding an approach like ours. Garrow et al. (2010) remark on the lack of solutions, and report (based on communication with other researchers) that a common approach is 5 6
Note that the CDF for standard Gumbel (C(0)) has a closed form and is easy to simulate. For background on characteristic functions, moments, and cumulants used to derive the results discussed next, see Stuart and Ord (2010).
Please cite this article as: Bunch, D.S., Rocke, D.M., Variance-component-based nested logit specifications: Improved formulation, and practical microsimulation of random disturbance terms. The Journal of Choice Modelling (2016), http: //dx.doi.org/10.1016/j.jocm.2016.04.001i
D.S. Bunch, D.M. Rocke / Journal of Choice Modelling ∎ (∎∎∎∎) ∎∎∎–∎∎∎
6
“approximating the Gumbel distribution with normals.” They explore this using a Monte Carlo study for a two-level NL model, concluding that it produces biased estimates for simulated data sets. They also explore methods that rely on derivations using two different bivariate Gumbel distributions, with mixed results. Similarly, Ye (2011) does a Monte Carlo study for two- and three-level NL models by simulating C (λ ) variates using expressions relying on bivariate Gumbels and application of copulas. Our approach is much simpler, particularly for the larger and more complex model we were using, and worked well in our numerical tests. Note that we used conveniently available software for the Pearson family, but that better methods might be identified by investigating other general distribution families with, e.g., more compatible mathematical structure. Beyond providing a solution to the simulation problem, we view our results as having other advantages for empirical work. For example, researchers seeking to develop and test alternative tree structures could find it helpful to frame their problem from the perspective of heterogeneous consumer preferences for the qualitative factors associated with composite nodes. Another potential benefit is the further clarification of equivalence among various results in the literature. Our discussion (although informal) provides specific links through Carrasco-Ortúzar. In addition, it is important to note that the Daly NL notation is a special case of more general notation in Daly and Bierlaire (2006) for the GEV family, which opens the possibility of extension to more general models (e.g., cross-nested logit). Finally, it is perhaps worth a reminder that, although the NL model can be a very useful tool for empirical work, this is only true when the data are generally compatible with the underlying modeling assumptions. For example, the model assumes that all random disturbances have the same scale/variance, and (under the perspective offered here), the only type of preference heterogeneity being specifically addressed is associated with the nominal levels of “qualitative attributes,” and, moreover, only in a way that adheres to a specific hierarchical structure. Estimating an NL model using a data set “incompatible” with the assumed tree structure would result in (attempted) violations of the prescribed restrictions on the structural parameters. In the context a variance component formulation, this would suggest, e.g., values of λ outside the interval [0, 1]. However, the technical details in Cardell confirm that this would be mathematically problematic. The main question to be answered is whether or not there are any tree structures that are compatible with the data. If not, a modeling framework more flexible/general than NL should be considered. For related, in-depth discussion, see Carrasco-Ortúzar (sections 3.5 and 3.6.3).
References Bierlaire, Michel, Lotan, Tsippy, Toint, Philipe, 1997. On the overspecification of multinomial and nested logit models due to alternative specific constants. Transp. Sci. 31 (4), 363–371. Bunch, David S., Ramea, Kalai, Yeh, Sonia, Yang, Christopher, 2015. Incorporating behavioral effects from vehicle choice models into bottom-up energy sector models, Institute of Transportation StudiesUniversity of California, Davis Research Report UCD-ITS-RR-15-13. Carrasco, Antonio, Juan De Dios Ortúzar, 2002. A Review and Assessment of the Nested Logit Model. Transp. Rev. 22, 197–218. Cardell, N. Scott, 1997. Variance components structures for the extreme-value and logistic distributions with application to models of heterogeneity. Econ. Theory 13, 185–213. Daly, Andrew, Zachary, Stan, 1978. Improved Multiple Choice Models. In: Hensher, D.A., Dalvi, M.Q. (Eds.), Determinants of Travel Choice, Saxon House, Westmead. Daly, Andrew, 2001. Alternative Tree Logit Models: Comments on a Paper of Koppelman and Wen. Transp. Res. Part B 35, 717–724. Daly, Andrew, Bierlaire, Michel, 2006. A general and operational representation of generalized extreme value models. Transp. Res. Part B 40, 285–305. Garrow, Laurie A., Bodea, Tudor D., Lee, Misuk, 2010. Generation of synthetic datasets for discrete choice analysis. Transportation 37, 183–202. Hensher, David, Greene, William H., 2002. Specification and estimation of the nested logit model: alternative normalizations. Transp. Res. Part B 36, 1–17. McFadden, Daniel, 1973. Conditional Logit Modeling of Qualitative Choice Behavior. In: Zarembka, P. (Ed.), Frontiers of Econometrics, Academic, New York, pp. 105–142. McFadden, Daniel, 1978. Modelling the Choice of Residential Location. In: Karlquist, A., et al. (Eds.), , Spatial Interaction Theory and Planning Models, North Holland: Amsterdam, pp. 75–96. McFadden, Daniel, 1981. Econometric Models of Probabilistic Choice. In: Manski, C.F., McFadden, D. (Eds.), Structural Analysis of Discrete Data with Econometric Applications, MIT Press, Cambridge, pp. 198–272. Ord, J.K., 1972. Families of Frequency Distributions. Griffin, London. Stuart, Alan, Ord, Keith, 2010. Kendall’s Advanced Theory of Statistics, Sixth EditionDistribution Theory, vol. 1. John Wiley, New York. Train, Kenneth, 2009. Discrete Choice Methods with Simulation, 2nd edition. Cambridge University Press, Cambridge. Williams, Huw C.W.L., 1977. On the formation of travel demand models and economic evaluation measures of user benefit. Environ Plan 9A, 285–344. Ye, Xin (2011). Investigation of Underlying Distributional Assumptions in Nested Logit Model Using Copula-Based Simulation and Numerical Approximation. Transportation Research Record: Journal of the Transportation Research Board, No. 2254, Transportation Research Board of the National Academies, Washington, D.C., pp. 36-43.
Please cite this article as: Bunch, D.S., Rocke, D.M., Variance-component-based nested logit specifications: Improved formulation, and practical microsimulation of random disturbance terms. The Journal of Choice Modelling (2016), http: //dx.doi.org/10.1016/j.jocm.2016.04.001i