A Bayesian sample selection model based on normal mixture to investigate household car ownership and usage behavior

Travel Behaviour and Society 20 (2020) 36–50

Contents lists available at ScienceDirect

Travel Behaviour and Society journal homepage: www.elsevier.com/locate/tbs

Na Wu a,b,⁎, Xiang (Ben) Song c, Ronghan Yao d, Qian Yu a,b, Chunyan Tang e, Shengchuan Zhao d

a Key Laboratory of Transport Industry of Management, Control and Cycle Repair Technology for Traffic Network Facilities in Ecological Security Barrier Area (Chang'an University), Xi'an City, Shaanxi Province 710064, PR China
b School of Highway, Chang'an University, Xi'an City, Shaanxi Province 710064, PR China
c Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
d School of Transportation & Logistics, Faculty of Infrastructure Engineering, Dalian University of Technology, Dalian City, Liaoning Province 116023, PR China
e School of Transportation Engineering, Dalian Maritime University, Dalian City, Liaoning Province 116026, PR China

ARTICLE INFO

Keywords: Sample selection; Normal mixture; Bayesian Markov Chain Monte Carlo; Copula model; Car ownership and usage

ABSTRACT

Selection bias is an important issue in analyzing household car ownership and usage behavior. If it is not properly accounted for in the modeling process, estimates will be biased. In this paper, we use a Bayesian sample selection model, which accounts for selection bias, to investigate household car ownership and usage behavior. By employing a mixture of normals, the newly established model relaxes the bivariate normal assumption of the traditional sample selection model and can capture a flexible coupling relationship between car ownership and usage behavior. Moreover, the model does not require specifying any marginal distribution. Three cross-validation experiments using simulated data suggest that the new model is effective in recovering the parameters' true values and in capturing the actual error distribution. To guard against overfitting, several tests are proposed to determine the most likely number of normal components. After testing, the new model with 3 components has stronger explanatory power in analyzing the interdependence between household car ownership and usage behavior, in terms of both goodness of fit and generalization ability. By comparison, estimates from the traditional normal model are seriously biased in magnitude, significance level, and even sign. Lastly, to assess the efficiency of the Bayesian normal mixture model, the performance of the Copula model is evaluated. The results indicate that the normal mixture model with two components already captures the general pattern of the error distribution well, and its goodness of fit is substantially improved compared with the traditional normal model and the Copula model.

⁎ Corresponding author at: Key Laboratory of Transport Industry of Management, Control and Cycle Repair Technology for Traffic Network Facilities in Ecological Security Barrier Area (Chang'an University), Xi'an City, Shaanxi Province 710064, PR China.
E-mail addresses: [email protected] (N. Wu), [email protected] (X.B. Song), [email protected] (R. Yao), [email protected] (Q. Yu), [email protected] (C. Tang), [email protected] (S. Zhao).

https://doi.org/10.1016/j.tbs.2020.02.006
Received 13 July 2019; Received in revised form 8 February 2020; Accepted 11 February 2020
2214-367X/© 2020 Hong Kong Society for Transportation Studies. Published by Elsevier Ltd. All rights reserved.

1. Introduction

The rising costs of pollution, energy, and traffic congestion caused by private car ownership and usage impede further improvement of the economy, living standards, individual mobility, and residents' happiness. Rather than providing additional road capacity, it is increasingly recognized that travel demand needs to be regulated to make society less car dependent. Shifting travel behavior from car dependency to lower-carbon transportation modes through policy interventions is an effective and sustainable way to do so (Chu, 2015; Liu and Cirillo, 2016; Meurs et al., 2013; Shekarchian et al., 2017; Yao and Wang, 2018). However, this objective cannot be achieved without knowing the decision-making mechanism of household car ownership and usage behavior (Hao et al., 2016; Liu and Cirillo, 2016). Therefore, many scholars have devoted themselves to the integrated modeling of car ownership and usage behavior (Bhat and Guo, 2007; Bhat et al., 2009; Ding and Cao, 2019; Liu et al., 2014; Musti and Kockelman, 2011; Senbil et al., 2009; Spissu et al., 2009; Yang et al., 2017; Yin and Sun, 2018).

Car ownership and usage behavior is jointly determined by many factors, including land use pattern, the existing transportation system, household travel need, and relevant policy instruments.

Land use determines the spatial distribution of activity locations. It was found that car ownership and usage levels declined in high-density areas with mixed land uses (Cirillo and Liu, 2013; Soltani, 2017). Moreover, job-housing separation and school-housing separation can result in more car trips (Liu et al., 2018). In a newly built area, a well-designed land use pattern can restrain private car ownership and usage. However, in many well-built urban areas, land use cannot be changed substantially.

Transportation systems, including infrastructures, facilities, and operational strategies, shape travel behavior from the supply side. Public transit accessibility is one of the most important factors in transportation systems. Without attractive alternatives, it is almost impossible to steer residents away from private cars. "Transit habit" (Smart and Klein, 2018) is found to be critical in this process. Having lived in a neighborhood with high-quality public transit continues to influence travel behavior even if one moves to a neighborhood with worse public transit services (Smart and Klein, 2018). However, households' gathering behavior near a public transit station and its relationship with private car ownership and usage behavior are rarely investigated.

Household travel need is determined by demographic structure (Bhat et al., 2009; Soder and Peer, 2018), such as the number of children, elderly family members, and workers. It varies across countries with different demographic, social, and cultural contexts. Generational differences can also contribute to the variations (Etminani-Ghasrodashti et al., 2018; Lavieri et al., 2017; Luke, 2018; Tao et al., 2019; Tosa et al., 2018). Education plays a role in shaping future travel behavior (Zhang et al., 2016). The influence of education, however, is different for households without a car, with one car, and with multiple cars (De Borger et al., 2016; Tezcan et al., 2011). China has been experiencing a demographic transition in past decades due to the increase of the elderly population, the "one-child-only" policy, and the recent implementation of the "two-children-allowed" policy. The great change in demographic structure will inevitably influence household travel behavior.
However, the influences of household structure, such as the number of children and elderly members, on car ownership and usage behavior are rarely investigated. Moreover, financial stress differs significantly between households that are homeowners and those that are renters, and the situation is intensified by ever-rising housing prices in China. The influence of home ownership on car ownership and usage behavior is rarely quantified as well.

Selection bias is an important issue in analyzing household car ownership and usage behavior. It is introduced when the sample obtained is not representative of the population of interest, since vehicle mileages are observed only for those who own cars. If selection bias is not properly accounted for in the modeling process, estimates will be biased. The results then cannot be used to explain causal or correlation effects (Kroesen, 2019; Mokhtarian and Cao, 2008), which is the key step in policy analysis. Therefore, special methods (Alemi et al., 2019; Mannering and Hensher, 1987; Mannering and Winston, 1985; Mannering, 1986) should be employed to handle the selection bias problem. The sample selection model is one of these approaches.

The type 2 Tobit model (Amemiya, 1985) is the most widely used sample selection model, and it includes two equations. The first is the selection equation, whose selection rule is binary. The second is the outcome equation. The two unobserved error terms of the selection and outcome equations are correlated. Heckman (1974, 1979) assumed that the two unobserved error terms follow a bivariate normal distribution for simplicity. However, if the assumption is not accurate, the actual relationship cannot be correctly reflected. Lee (1982, 1983) relaxed the normal assumption about the marginals by transforming the non-normal stochastic components of the model into normal random variables. Lee's approach, however, requires estimating additional parameters characterizing the marginals and still imposes bivariate normal coupling. Bivariate normal coupling implies that the dependence between the two error terms is linear and symmetric (Bhat and Eluru, 2009).

Two ways are commonly employed so far to avoid the bivariate normal coupling assumption (Bhat and Eluru, 2009). One is to use semiparametric or non-parametric methods (Ahn and Powell, 1993; Cosslett, 1991; Lee, 1994). However, they are computationally challenging and difficult to implement in empirical applications. The other is to use the Copula-based approach (Bhat and Eluru, 2009; Eluru et al., 2010; Sener and Bhat, 2011; Spissu et al., 2009; Tosa et al., 2018; Zou and Zhang, 2016). The Copula approach with given marginal distributions can describe non-linear and asymmetric dependence structures between two random variables (Golshani et al., 2018a; Golshani et al., 2018b; Sun et al., 2017; Zarabi et al., 2019). While the Copula approach is flexible, the marginal distributions need to be specified, as mentioned above. It is difficult to find a well-specified and appropriate marginal distribution, since the observed data need not belong to any pre-specified parametric distribution family. Moreover, to find an appropriate dependence structure, multiple combinations of marginal distributions and copulas should be tested, which is time-consuming.

A mixture of normals is an effective approach for providing flexible dependence structures, such as multiple modes, fatter tails, and skewed distributions (Castillo et al., 2014). Importantly, the approach is not confined to any pre-specified dependence structure. It assumes that the unobserved error terms follow a mixture of multiple normal distributions. Each normal distribution can be regarded as a pile of gravel. With enough piles of gravel, a hill of any shape can be imitated (Rossi and Allenby, 2005). Similarly, with enough normal components, we can imitate the actual and flexible distribution of the error terms. In addition, we do not need to specify marginals. Using a mixture of normals, Van Hasselt (2011) proposed a semiparametric model to capture flexible coupling relationships. With the semiparametric approach, Van Hasselt's model can determine the number of normal components automatically from the data. However, semiparametric or nonparametric procedures can be computationally challenging, so their use in empirical applications remains scarce. Moreover, the chosen model may suffer from overfitting, which is crucial in forecasting. Van Hasselt's paper did not describe how to examine the overfitting problem.

In this study, employing the Bayesian sample selection model based on a mixture of normals, we evaluate the estimate bias resulting from the bivariate normal assumption in the traditional sample selection model, using a household car ownership and usage dataset. On the theoretical side, two aims are pursued. One is to analyze the performance of the sample selection model with a relaxed bivariate normal assumption in interpreting actual data, and to develop procedures that enable practitioners to test for overfitting. The other is to find a model with as few normal components as possible that still captures the flexible coupling relationship. If a model with two normal components, for example, can greatly improve the goodness of fit compared with the traditional model, it can be widely used in practice, avoiding the computational burden and the examination of overfitting. On the practical side, the study contributes to investigating households' gathering behavior near a public transit station and its relationship with car ownership and usage behavior. Moreover, the influence of household structure in the new demographic context in China is quantified, such as the influences of the number of children, elderly household members, and home ownership. These findings can be helpful in designing policy interventions to make society less car dependent.

The rest of this paper is organized as follows. The next section provides a theoretical introduction to the model. We first present the model framework, including the model structure and the Bayesian Markov Chain Monte Carlo (MCMC) estimation procedures. Section 3 provides a simulation example to illustrate the model performance. In Section 4, we apply the new model to explore the influence of household structure and living environment on private car ownership and usage behavior. It includes data description, model evaluation, analysis of estimation results, and comparison with the Copula model. Section 5 concludes the paper by highlighting the findings and summarizing implications.

2. Model framework

2.1. Model structure

In this study, we only investigate whether a household owns a
private car and the annual vehicle miles travelled (AVMT) by all private cars in the household. Thus, the sample selection model we focus on is a type 2 Tobit model, in which the selection rule is binary. If the latent variable crosses the threshold, the outcome variable is observed. Otherwise, the outcome variable equals zero. Therefore, the sample selection model can be written as follows.

$$s_n^{*} = x_{n1}'\beta_1 + \varepsilon_{n1} \tag{1}$$

$$s_n = I(s_n^{*} > 0) \tag{2}$$

$$y_n = \begin{cases} x_{n2}'\beta_2 + \varepsilon_{n2} & \text{if } s_n = 1, \\ 0 & \text{if } s_n = 0. \end{cases} \tag{3}$$

$s_n^{*}$: latent variable deciding whether the outcome is observed for household $n = 1, 2, \ldots, N$;
$s_n$: manifest variable for $s_n^{*}$; if $s_n^{*} > 0$, $s_n = 1$, otherwise $s_n = 0$;
$y_n$: outcome variable for household $n$; if $s_n = 1$, $y_n$ has a positive value, otherwise $y_n = 0$;
$x_{n1}$: explanatory variable vector in the selection equation;
$x_{n2}$: explanatory variable vector in the outcome equation;
$x_n$: define $x_n = (x_{n1}, x_{n2})$;
$\beta$: parameters to be estimated, with $\beta = (\beta_1, \beta_2)$;
$\varepsilon_{n1}$: unobserved error term in the selection equation;
$\varepsilon_{n2}$: unobserved error term in the outcome equation;
$I(\cdot)$: indicator function.

We assume that interdependence exists between the selection equation and the outcome equation. Moreover, $\mathrm{var}(\varepsilon_{n1})$ is set to 1 for the purpose of identification. Therefore, the error structure is specified as Eq. (4).

$$\begin{pmatrix} \varepsilon_{n1} \\ \varepsilon_{n2} \end{pmatrix} \overset{iid}{\sim} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Sigma \right), \quad \Sigma = \begin{pmatrix} 1 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}, \quad n = 1, 2, \ldots, N \tag{4}$$

where $\sigma_{12}$ is the covariance and $\sigma_2^2$ is the variance of $\varepsilon_{n2}$. This error structure requires attention. Since one element is already normalized, we cannot draw $\Sigma$ directly in the Gibbs sampling process. Therefore, we need to draw the elements separately. Following Koop and Poirier (1997), the error structure is specified as

$$\varepsilon_{n2} = \sigma_{12}\,\varepsilon_{n1} + \eta_n, \quad \eta_n \sim N(0, v^2) \tag{5}$$

where $v^2$ is the conditional variance of $\varepsilon_{n2}$, given $\varepsilon_{n1}$. Then,

$$\sigma_2^2 = v^2 + \sigma_{12}^2 \tag{6}$$

Thus, $\Sigma$ can be expressed as follows,

$$\Sigma = \begin{pmatrix} 1 & \sigma_{12} \\ \sigma_{12} & v^2 + \sigma_{12}^2 \end{pmatrix} \tag{7}$$

So far, the model described above is the traditional sample selection model with the bivariate normal assumption for the two error terms. The new model assumes that the error terms follow a mixture distribution with $K$ normal components.

$$\varepsilon_n \mid x_n \sim \sum_{k=1}^{K} pvec_k\, N(\mu_k, \Sigma_k), \quad pvec_k \ge 0, \quad \sum_{k=1}^{K} pvec_k = 1 \tag{8}$$

where
$\varepsilon_n$: $\varepsilon_n = (\varepsilon_{n1}, \varepsilon_{n2})'$;
$\mu_k$: mean vector of component $k$, including $\mu_{k1}$ and $\mu_{k2}$;
$\Sigma_k$: covariance matrix of component $k$, $\Sigma_k = \begin{pmatrix} 1 & \sigma_{12k} \\ \sigma_{12k} & v_k^2 + \sigma_{12k}^2 \end{pmatrix}$;
$pvec$: mixing probability vector, $pvec = (pvec_1, \ldots, pvec_K)$.

In order to allow for multiple modes and asymmetry of the error distribution, $\mu_k$ is not restricted to be zero (Van Hasselt, 2011). Therefore, $\beta$ in our model does not include intercepts. Eq. (8) is a general expression of the normal mixture model. In the Gibbs sampling process, we need to determine which component household $n$ belongs to in each iteration. Therefore, an indicator variable $ind$, whose dimension is $N \times 1$, is introduced. Then for each household $n$, we have,

$$\varepsilon_n \mid ind_n \sim N(\mu_{ind_n}, \Sigma_{ind_n}) \tag{9}$$

$$ind_n \sim \mathrm{Multinomial}(pvec) \tag{10}$$

where Multinomial refers to the multinomial distribution. In the normal mixture model, the joint probability density function of $\varepsilon$ can be expressed as follows.

$$p(\varepsilon) = \sum_{k=1}^{K} pvec_k\, \phi(\varepsilon \mid \mu_k, \Sigma_k) \tag{11}$$

where $\phi(\cdot \mid \mu_k, \Sigma_k)$ is the normal density of component $k$ and $\varepsilon$ is an $N \times 2$ matrix with elements $\{\varepsilon_n\}$. The first and second moments of the normal mixture model can be expressed as follows (Rossi and Allenby, 2005).

$$E(\varepsilon \mid pvec, \{\mu_k\}) = \bar{\mu} = \sum_{k=1}^{K} pvec_k\, \mu_k \tag{12}$$

$$\mathrm{Var}(\varepsilon \mid pvec, \{\mu_k\}, \{\Sigma_k\}) = \sum_{k=1}^{K} pvec_k\, \Sigma_k + \sum_{k=1}^{K} pvec_k\, (\mu_k - \bar{\mu})(\mu_k - \bar{\mu})' \tag{13}$$

The label switching problem occurs when the distribution parameters of each component cannot be uniquely identified. Specifically, more than one combination of the $K$ sets of parameters can produce the same $p(\varepsilon)$ in Eq. (11). Taking 2 components as an example (Van Hasselt, 2011), assume component 1 follows $N(\mu_1, \sigma_1^2)$ and component 2 follows $N(\mu_2, \sigma_2^2)$. According to Eq. (11), the joint probability density function can be computed by the following equation.

$$p(\varepsilon) = pvec\, \phi(\varepsilon \mid \mu_1, \sigma_1^2) + (1 - pvec)\, \phi(\varepsilon \mid \mu_2, \sigma_2^2) \tag{14}$$

If we relabel component 1 as component 2 and component 2 as component 1, $p(\varepsilon)$ does not change. Label switching, however, is only a problem when researchers investigate population heterogeneity and attempt to attach some meaning to a specific normal component. Since the fitted bivariate density is all we care about in this study, the label-switching issue is irrelevant (Rossi and Allenby, 2005). In this study, the mixture of normals is only regarded as an approach to approximate the flexible error distribution, and we do not investigate the meaning of each component.

2.2. Bayesian MCMC

Classical Maximum Likelihood Estimation (MLE) infers the fixed and unknown parameters of a population from a set of samples. It relies heavily on asymptotic approximations. By contrast, Bayesian MCMC estimation produces exact finite-sample inferences and avoids the direct evaluation of multiple integrals (Brownstone and Fang, 2014; Fang, 2008). It has the advantage of replacing the optimization problem of MLE with an averaging problem. Owing to these advantages, Bayesian inference has become increasingly popular (Daziano, 2013; Daziano and Achtnicht, 2014; Ding, 2014; Xiong et al., 2018). Therefore, the Bayesian MCMC estimation approach is employed in this study.
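To make the mixture error structure of Section 2.1 concrete, the following is a minimal Python sketch (not the authors' code) of the component covariance in Eq. (7), the mixture density in Eq. (11), and the mixture moments in Eqs. (12)-(13). The function names are hypothetical, and the numerical values are the two-component settings quoted in the simulation design of Section 3.1. The last lines illustrate why label switching is harmless for the fitted density: swapping the component labels leaves $p(\varepsilon)$ unchanged.

```python
import numpy as np

def component_cov(sigma12, v2):
    """Covariance of one component as in Eq. (7), with var(eps1) fixed at 1."""
    return np.array([[1.0, sigma12], [sigma12, v2 + sigma12 ** 2]])

def mixture_moments(pvec, mus, covs):
    """Mean and covariance of a normal mixture, following Eqs. (12)-(13)."""
    pvec = np.asarray(pvec, dtype=float)
    mus = np.asarray(mus, dtype=float)       # shape (K, 2)
    covs = np.asarray(covs, dtype=float)     # shape (K, 2, 2)
    mu_bar = pvec @ mus
    dev = mus - mu_bar
    within = np.einsum("k,kij->ij", pvec, covs)
    between = np.einsum("k,ki,kj->ij", pvec, dev, dev)
    return mu_bar, within + between

def mixture_pdf(eps, pvec, mus, covs):
    """Joint density p(eps) of Eq. (11), evaluated at one 2-vector eps."""
    total = 0.0
    for p, mu, cov in zip(pvec, mus, covs):
        diff = np.asarray(eps, dtype=float) - mu
        quad = diff @ np.linalg.solve(cov, diff)
        norm = 2.0 * np.pi * np.sqrt(np.linalg.det(cov))
        total += p * np.exp(-0.5 * quad) / norm
    return total

# Two-component example using the values of the simulation design (Section 3.1).
pvec = [0.3, 0.7]
mus = np.array([[0.0, -2.1], [0.0, 0.9]])
covs = np.array([component_cov(0.85, 1.0 - 0.85 ** 2)] * 2)
mu_bar, cov_bar = mixture_moments(pvec, mus, covs)

# Swapping the component labels leaves the density unchanged (label switching).
p_original = mixture_pdf([0.2, -0.5], pvec, mus, covs)
p_swapped = mixture_pdf([0.2, -0.5], pvec[::-1], mus[::-1], covs[::-1])
```

Parameterizing each component by $(\sigma_{12k}, v_k^2)$ rather than by a free covariance matrix keeps the identification restriction $\mathrm{var}(\varepsilon_{n1}) = 1$ satisfied by construction.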

2.2.1. Prior and likelihood

Conjugate priors are employed and they are expressed as follows.

$$\beta_1 \sim N(b_1, A_{\beta_1}^{-1}) \tag{15}$$

$$\beta_2 \sim N(b_2, A_{\beta_2}^{-1}) \tag{16}$$

$$pvec \sim \mathrm{Dirichlet}(\alpha) \tag{17}$$

Moreover, for each component $k$ ($k = 1, \ldots, K$), we have,

$$\mu_k \sim N(b_{3k}, A_{\mu_k}^{-1}) \tag{18}$$

$$\sigma_{12k} \sim N(b_{4k}, A_{\sigma_{12k}}^{-1}) \tag{19}$$

$$v_k^2 \sim IG(c_k, d_k) \tag{20}$$

where
$b_1$: mean vector in the prior distribution of $\beta_1$;
$b_2$: mean vector in the prior distribution of $\beta_2$;
$b_{3k}$: mean vector in the prior distribution of $\mu_k$;
$b_{4k}$: mean value in the prior distribution of $\sigma_{12k}$;
$A_{\beta_1}^{-1}$: covariance matrix in the prior distribution of $\beta_1$; $A_{\beta_1}$ is called the precision matrix;
$A_{\beta_2}^{-1}$: covariance matrix in the prior distribution of $\beta_2$;
$A_{\mu_k}^{-1}$: covariance matrix in the prior distribution of $\mu_k$;
$A_{\sigma_{12k}}^{-1}$: variance in the prior distribution of $\sigma_{12k}$;
$\alpha$: vector of parameters in the Dirichlet distribution, $\alpha = (\alpha_1, \ldots, \alpha_K)$;
$c_k$, $d_k$: parameters of the inverted gamma distribution for component $k$.

We need to determine the indicator variable ($ind$) before writing the likelihood function. When it is known, each household $n$ corresponds to a specific component $ind_n$, which is one of the $K$ components. To write the likelihood function, we need to consider two distinct cases. Define $N_1 = \{n: s_n = 1\}$ and $N_0 = \{n: s_n = 0\}$. When $s_n = 0$, the outcome variable does not contribute to the likelihood. Therefore, the probability is given by the following equation. For $n \in N_0$,

$$\Pr(s_n = 0 \mid \beta_1, \mu_{ind_n 1}) = 1 - \Phi(x_{n1}'\beta_1 + \mu_{ind_n 1}) \tag{21}$$

where $\mu_{ind_n 1}$ is the mean of the error in the selection equation and $\mu_{ind_n 2}$ is the mean of the error in the outcome equation. For $n \in N_1$,

$$\Pr(y_n, s_n = 1 \mid \beta, \mu_{ind_n}, \Sigma_{ind_n}) = (\sigma_{12 ind_n}^2 + v_{ind_n}^2)^{-1/2}\, \phi\left( \frac{y_n - x_{n2}'\beta_2 - \mu_{ind_n 2}}{\sqrt{\sigma_{12 ind_n}^2 + v_{ind_n}^2}} \right) \times \Phi\left( \sqrt{1 + \sigma_{12 ind_n}^2 / v_{ind_n}^2}\; (x_{n1}'\beta_1 + \mu_{ind_n 1}) + \frac{\sigma_{12 ind_n}\, (y_n - x_{n2}'\beta_2 - \mu_{ind_n 2})}{v_{ind_n} \sqrt{\sigma_{12 ind_n}^2 + v_{ind_n}^2}} \right) \tag{22}$$

where $\Phi(\cdot)$ and $\phi(\cdot)$ denote the standard normal distribution function and density. Then the likelihood is given by

$$f(y, s \mid \beta, \{\mu_k\}, \{\Sigma_k\}) = \prod_{n \in N_0} \Pr(s_n = 0 \mid \beta_1, \mu_{ind_n 1}) \times \prod_{n \in N_1} \Pr(y_n, s_n = 1 \mid \beta, \mu_{ind_n}, \Sigma_{ind_n}) \tag{23}$$

Combining Eq. (21), Eq. (22), and Eq. (23), we can obtain the likelihood function.

2.2.2. Gibbs sampler

After $ind_n$ is determined for each household $n$, the Data Augmentation strategy is employed to construct the Gibbs sampler. Data Augmentation was originally used to handle missing data problems. The core idea is that missing data are also unobserved and, therefore, can be regarded as part of the "parameters" for Bayesians. It is then natural to compute the joint posterior distribution of the missing data and the "parameters". In the end, only those "parameters" that we are interested in are analyzed. Since it was first proposed by Tanner and Wong (1987), Data Augmentation has been widely used in many fields. In the normal mixture sample selection model (abbreviated as the normal mixture model) developed above, the indicator variable $ind$ and the latent variable $s^{*}$ are the unknown data that we need to introduce. The Gibbs sampler for the normal mixture model can be expressed as the following conditional distributions. For simplicity, conditioning on $x$ is omitted throughout this paper.

$$ind \mid [pvec, \{\mu_k, \Sigma_k\}, \beta, s^{*}, y] \tag{24}$$

$$pvec \mid ind \tag{25}$$

$$s_n^{*} \mid [ind_n, \beta, \{\mu_k, \Sigma_k\}, y_n] \tag{26}$$

$$\beta_1 \mid [ind, \{\mu_k, \Sigma_k\}, y, s^{*}, \beta_2] \tag{27}$$

$$\beta_2 \mid [ind, \{\mu_k, \Sigma_k\}, y, s^{*}, \beta_1] \tag{28}$$

After drawing $ind$, observations are divided into $K$ components. Denote $H_k = \{n: ind_n = k\}$. Each set of parameters $\{\mu_k, \Sigma_k\}$ can also be obtained by estimating another sample selection model, which we discuss in the appendix. For $n \in H_k$, the conditional distributions of $\{\mu_k, \Sigma_k\} \mid [ind, \varepsilon]$ can be expressed as follows.

$$\{\mu_k, \Sigma_k\} \mid [ind, \varepsilon], \quad k = 1, \ldots, K \tag{29}$$

$$\mu_k \mid [\varepsilon_{k1}, \varepsilon_{k2}, \Sigma_k] \tag{30}$$

$$\sigma_{12k} \mid [v_k^2, \mu_k, \varepsilon_{k1}, \varepsilon_{k2}] \tag{31}$$

$$v_k^2 \mid [\sigma_{12k}, \mu_k, \varepsilon_{k1}, \varepsilon_{k2}] \tag{32}$$

Details for each step in the Gibbs sampler are described in the appendix.

3. Simulation

In this section, we estimate a normal mixture model using simulated data and examine whether the model can recover the parameters' true values. Moreover, to infer the most likely number of mixture components, various tests are evaluated.

3.1. Data generation

We employ the same model as Van Hasselt (2011) to generate the data. It is given by

$$s_n^{*} = 2 - x_{n11} + x_{n12} + \varepsilon_{n1} \tag{33}$$

$$y_n = 1 + 0.5\,x_{n21} - 0.5\,x_{n22} + \varepsilon_{n2} \tag{34}$$

where $x_{n11} \sim N(0, 3)$; $x_{n12} \sim U(-3, 3)$; $x_{n21} \sim N(0, 3)$; $x_{n22} = x_{n12}$; and $\varepsilon_n$ follows a mixture distribution with two components. $\varepsilon_n$ is assumed to follow the following distribution:

$$\varepsilon_n \sim \gamma\, N(\mu_1, \Sigma) + (1 - \gamma)\, N(\mu_2, \Sigma), \quad \Sigma = \begin{pmatrix} 1 & 0.85 \\ 0.85 & 1 \end{pmatrix} \tag{35}$$

where $\gamma$ is the mixing probability and $\gamma = 0.3$; $\mu_1 = (0, -2.1)'$; $\mu_2 = (0, 0.9)'$.

According to Eq. (12), we can compute the expectation of $\varepsilon$.

$$E(\varepsilon \mid \gamma, \mu_1, \mu_2) = \bar{\mu} = 0.3 \times (0, -2.1)' + 0.7 \times (0, 0.9)' = (0, 0)' \tag{36}$$

The second moment loses much of its meaning and interpretability when $\varepsilon$ departs from an elliptically symmetric distribution. Thus, it is not computed in this study. Instead, we directly examine the fitted joint distribution of $\varepsilon$. A sample of size $N = 2000$ is generated.
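The data-generating process of Section 3.1 can be sketched in a few lines of Python. This is an illustrative reading rather than the authors' script: the function name and the seed are hypothetical, the "3" in $N(0, 3)$ is assumed to be a variance, and the coefficient signs are taken from the true values listed in Table 1 (2.0, −1.0, 1.0 for the selection equation and 1.0, 0.5, −0.5 for the outcome equation).

```python
import numpy as np

def simulate_selection_data(n=2000, seed=0):
    """Draw one data set from the two-component design of Section 3.1."""
    rng = np.random.default_rng(seed)

    # Assumption: N(0, 3) is read as variance 3, i.e. standard deviation sqrt(3).
    x11 = rng.normal(0.0, np.sqrt(3.0), n)
    x12 = rng.uniform(-3.0, 3.0, n)
    x21 = rng.normal(0.0, np.sqrt(3.0), n)
    x22 = x12                                   # shared regressor

    # Mixture errors: component 0 with probability gamma, component 1 otherwise.
    gamma = 0.3
    mus = np.array([[0.0, -2.1], [0.0, 0.9]])
    cov = np.array([[1.0, 0.85], [0.85, 1.0]])
    comp = (rng.random(n) >= gamma).astype(int)
    eps = mus[comp] + rng.multivariate_normal(np.zeros(2), cov, n)

    # Selection and outcome equations; the outcome is recorded only if s = 1.
    s_star = 2.0 - x11 + x12 + eps[:, 0]
    s = (s_star > 0).astype(int)
    y = np.where(s == 1, 1.0 + 0.5 * x21 - 0.5 * x22 + eps[:, 1], 0.0)

    return {"x1": np.column_stack([x11, x12]),
            "x2": np.column_stack([x21, x22]),
            "s": s, "y": y, "eps": eps, "comp": comp}

data = simulate_selection_data(n=2000, seed=1)
```

Because both component means have a zero first element and the mixture mean of the second element is zero, the constants 2 and 1 act as the intercepts of the two equations, which is how they appear in Table 1.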

3.2. Cross validation

3.2.1. Estimation results

In order to fully evaluate the performance of the normal mixture model with different numbers of components (k = 1, 2, 3, 4, 5), the cross-validation technique is adopted in this section. Three experiments are performed. In each experiment, the 2000 observations are randomly divided into two parts of 75% and 25%, respectively. The 75% part is used to calibrate the model, while the remaining 25% is used for the hold-out sample test. As a result, the calibration sample sizes for the three experiments are 1498, 1453, and 1483, respectively. Note that the model becomes more complex as the number of components (k) increases. The priors are specified as: $\beta_1 \sim N(0, 100I)$, $\beta_2 \sim N(0, 100I)$, $\mu_k \sim N(0, 100I)$, $\sigma_{12k} \sim N(0, 100)$, $pvec \sim \mathrm{Dirichlet}(5, \ldots, 5)$, and $v_k^2 \sim IG(10, 0.1)$. The Bayesian procedure is run for 20,000 iterations. The first 10,000 draws are considered burn-in and discarded. The remaining 10,000 draws from the posterior are used to compute the estimates and standard errors.

Table 1 lists the estimation results for the three experiments. The value in parentheses after each parameter name is its true value. Values in the first line of each model are the posterior means, and values in parentheses are standard deviations. In the normal mixture model, the standard deviation of the intercept loses its meaning since the intercept is computed. Thus, it is not listed in the table. The five models perform differently. Therefore, the top priority next is to choose a sound model among them.

3.2.2. Model evaluation

In this part, four tests are used to evaluate the performance of each model, and finally the model with the most likely number of components is selected. They are discussed separately below.

3.2.2.1. Estimates robustness test. Observing the same parameter across the 5 models in the three experiments, the estimates of the parameters in the selection equation vary greatly as the number of components increases. Therefore, we restrict our attention to these estimates. Fig. 1 describes the variation of the means and their corresponding standard deviations for the parameters in the selection equation. It can be clearly seen that the estimates are remarkably close to the parameters' true values when k = 1 and 2. When k ≥ 3, the estimates move farther away from their true values. Moreover, the standard deviations of $\hat{\beta}_{12}$ and $\hat{\beta}_{13}$ are very stable when k = 1 and 2, but they rise greatly and fluctuate as k increases when k ≥ 3, indicating that the estimates are no longer robust. Therefore, the models with k = 1 and 2 are relatively superior. Note that the true values of the parameters are unknown in empirical applications. The standard deviation of the estimate, however, can be an indicator for evaluating model performance. When it keeps rising and becomes unstable, model performance is deteriorating.

3.2.2.2. Goodness of fit test. In this study, goodness of fit is measured by the log likelihood value (logLik) and the log marginal density (logMarDen). logMarDen is calculated by the Newton-Raftery approach (1994). Fig. 2 illustrates the variation of logLik and logMarDen with k in the three experiments. According to Fig. 2, the goodness of fit of the normal mixture model with k = 2 is greatly improved compared with the model with k = 1. Moreover, the results from the three experiments are extremely close. When k ≥ 3, goodness of fit stays at the same level as the model with k = 2 (as indicated especially by logMarDen), and the fluctuation among the three experiments increases. When evaluating models, both goodness of fit and model complexity should be considered, since the likelihood can always be improved by adding parameters, which results in overfitting. Overfitting means that the model describes the training data too closely, including random error or noise, thereby ignoring the underlying relationship. An overfitted model has lost its generalization ability and therefore has poor predictive power. Taking goodness of fit and model complexity into account, we believe the model with k = 2 performs best.

Table 1. Estimation Results of Normal Mixture Model (simulation).

Experiment  Model  β11 (2.0)   β12 (−1.0)       β13 (1.0)       β21 (1.0)   β22 (0.5)       β23 (−0.5)       ACPP
1           k=1    1.996 (–)   −1.030 (0.063)   1.042 (0.063)   0.968 (–)   0.467 (0.031)   −0.479 (0.035)   95%
1           k=2    1.951 (–)   −1.021 (0.062)   1.013 (0.064)   1.009 (–)   0.465 (0.019)   −0.540 (0.022)   95%
1           k=3    2.285 (–)   −1.195 (0.149)   1.189 (0.150)   1.030 (–)   0.464 (0.020)   −0.547 (0.022)   95%
1           k=4    2.841 (–)   −1.479 (0.235)   1.470 (0.235)   1.046 (–)   0.464 (0.020)   −0.553 (0.023)   96%
1           k=5    2.742 (–)   −1.431 (0.199)   1.419 (0.196)   1.053 (–)   0.463 (0.019)   −0.556 (0.023)   96%
2           k=1    2.012 (–)   −1.060 (0.063)   1.065 (0.064)   0.959 (–)   0.499 (0.031)   −0.505 (0.035)   95%
2           k=2    2.010 (–)   −1.049 (0.061)   1.053 (0.062)   0.971 (–)   0.497 (0.021)   −0.529 (0.023)   94%
2           k=3    2.262 (–)   −1.174 (0.107)   1.174 (0.107)   0.995 (–)   0.498 (0.021)   −0.536 (0.024)   95%
2           k=4    2.631 (–)   −1.491 (0.358)   1.490 (0.357)   1.008 (–)   0.496 (0.021)   −0.542 (0.024)   93%
2           k=5    2.916 (–)   −1.638 (0.355)   1.638 (0.364)   1.011 (–)   0.496 (0.022)   −0.542 (0.024)   94%
3           k=1    2.156 (–)   −1.129 (0.068)   1.109 (0.068)   0.970 (–)   0.479 (0.031)   −0.499 (0.036)   94%
3           k=2    2.133 (–)   −1.127 (0.066)   1.097 (0.066)   0.991 (–)   0.476 (0.021)   −0.534 (0.023)   93%
3           k=3    2.617 (–)   −1.417 (0.199)   1.382 (0.194)   1.021 (–)   0.475 (0.021)   −0.543 (0.023)   93%
3           k=4    3.200 (–)   −1.651 (0.272)   1.610 (0.267)   1.034 (–)   0.475 (0.021)   −0.549 (0.023)   93%
3           k=5    3.309 (–)   −1.971 (0.264)   1.916 (0.257)   1.038 (–)   0.475 (0.021)   −0.553 (0.024)   94%

ACPP: average correct prediction probability.
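The posterior means and standard deviations reported in Table 1 are summaries of the draws kept after burn-in, computed on a randomly selected calibration sample. The Python sketch below illustrates both steps under stated assumptions: the function names are hypothetical, and the split assigns each observation to the calibration part independently with probability 0.75, which is one reading consistent with calibration sizes (1498, 1453, 1483) that are close to but not exactly 1500. The source does not state how the split was drawn.

```python
import numpy as np

def summarize_draws(draws, burn_in):
    """Posterior mean and standard deviation after discarding burn-in draws."""
    kept = np.asarray(draws, dtype=float)[burn_in:]
    return kept.mean(axis=0), kept.std(axis=0, ddof=1)

def split_calibration_holdout(n, calib_share=0.75, seed=0):
    """Assumed split rule: independent assignment with probability calib_share."""
    rng = np.random.default_rng(seed)
    in_calib = rng.random(n) < calib_share
    return np.flatnonzero(in_calib), np.flatnonzero(~in_calib)

# Toy chain: 10 burn-in draws far from the target, then three retained draws.
chain = np.concatenate([np.full(10, 100.0), np.array([1.0, 2.0, 3.0])])
post_mean, post_sd = summarize_draws(chain, burn_in=10)

calib_idx, holdout_idx = split_calibration_holdout(2000, seed=3)
```

With the settings of Section 3.2.1, `draws` would hold 20,000 rows (one per iteration) and `burn_in` would be 10,000.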

[Figure 1 panels, each plotted against k = 1, ..., 5 for experiments 1, 2, and 3: (1a) β̂11, (1b) β̂12, and (1c) β̂13, each with the true value marked; (1d) standard deviation of β̂12; (1e) standard deviation of β̂13.]

Fig. 1. Variation of Means and Standard Deviations for Parameters in Selection Equation.
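For the log marginal density used in the goodness of fit test (Section 3.2.2.2), a minimal Python sketch is given below. It assumes that the Newton-Raftery approach refers to the basic harmonic-mean estimator, in which the marginal likelihood is approximated by the harmonic mean of the likelihood values evaluated at the retained posterior draws; the source does not state which variant was used. The function name is hypothetical, and the computation is carried out on the log scale to avoid overflow.

```python
import numpy as np

def log_marginal_density_nr(loglik_draws):
    """Harmonic-mean estimate of the log marginal density.

    loglik_draws holds the log likelihood evaluated at each retained
    posterior draw. The estimate is -log(mean(exp(-loglik))), computed
    with a log-sum-exp shift for numerical stability.
    """
    neg = -np.asarray(loglik_draws, dtype=float)
    shift = neg.max()
    log_mean_inverse = shift + np.log(np.mean(np.exp(neg - shift)))
    return -log_mean_inverse

# Toy check: likelihood values 1 and 2 have harmonic mean 4/3.
toy_value = log_marginal_density_nr(np.log([1.0, 2.0]))
```

The harmonic-mean estimator is simple because it reuses the MCMC output, but it is known to be sensitive to draws with very low likelihood, which is one reason to read it together with logLik and the other tests.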

points in the training data suggested by the enlarged horizontal coordinate. Now the model has the risk of overfitting. When k = 4 and k = 5, joint distributions of exhibit the similar pattern, but incorporate more low density points. Therefore, the two models have higher risk of overfitting. To summarize, the model with k = 1 cannot capture the overall pattern of actual distribution while the model with k = 2 can. When k = 3, 4, 5, the general pattern models captured keeps similar with k = 2 , but they has a risk of overfitting. Thus, the model with k = 2 has a stronger generalization ability than others. This is a very rough but safe inference. In empirical applications, since distribution of can be very complex, overfitting phenomenon can be detected by contours with many inflection points. 3.2.2.4. Hold-out sample test. We use remaining 25% data to test the predictive power of normal mixture models. Based on the analysis above, parameters’ estimates in outcome equation are quite robust

-1600

-1600

-1800

-1800

-2000

-2000

logMarDen_NR

logLik.

3.2.2.3. Image inspection. Based on Eq. (11), the fitted joint distributions of for different normal mixture models can be obtained. By directly inspecting the fitted joint distribution, we can roughly screen out the overfitted model and pick up the simplest model, which can capture general pattern of the unobserved error distribution. Taking experiment 1 as an example, joint distribution of is depicted in Fig. 3. Colors in the figure represent probability density. Green represents the lowest density and white represents the highest density. In the process changing from green to white, the color experiences yellow and red, indicating density increase. Contours in the figure confine the distribution range and points on the same contour have the same density value. When k = 1, it is an elliptical and symmetric distribution, which is what we expect from a bivariate normal distribution. When k = 2 , the distribution witnesses a great change from one mode to two modes. The general pattern captured by the model with k = 3 is the same as the one with k = 2 , but it tries to contain some low density data

Fig. 2. Variation of LogLik and LogMarDen (simulation).
[Two panels, logLik and logMarDen_NR, each plotted against the number of components k = 1, ..., 5 for experiments 1, 2, and 3.]


Fig. 3. Fitted Joint Distribution of Error Terms (simulation).

while those from the selection equation differ greatly. Thus, we focus only on the predictive power for the selection variable $s_n$, which is measured by the average correct prediction probability (ACPP). The performance of the different models in the three experiments is listed in the last column of Table 1. The predictive powers of the different models are remarkably close, which suggests that the hold-out sample test in this simulation example is not a valid test for inferring the most likely number of components. However, the lowest accuracy is 93%, which indicates that the normal mixture sample selection model based on Bayesian MCMC and a Dirichlet prior has good predictive power.
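As an illustration of the hold-out measure, the following Python sketch computes an ACPP value. It assumes that ACPP is the mean probability the model assigns to the outcome actually observed; that definition, the function name, and the toy numbers are assumptions made here for illustration and are not taken from the paper.

```python
from typing import Sequence


def average_correct_prediction_probability(
    observed: Sequence[int], prob_own: Sequence[float]
) -> float:
    """Mean probability that the model assigns to the outcome actually observed.

    observed -- 0/1 selection outcomes s_n in the hold-out sample
    prob_own -- predicted Pr(s_n = 1) for the same observations (hypothetical input)
    """
    if len(observed) != len(prob_own) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for s, p in zip(observed, prob_own):
        # Probability given to the realised outcome: p if s = 1, 1 - p if s = 0.
        total += p if s == 1 else 1.0 - p
    return total / len(observed)


# Toy hold-out sample (made-up numbers, not from the paper).
s_holdout = [1, 0, 1, 1]
p_holdout = [0.9, 0.2, 0.7, 0.6]
acpp = average_correct_prediction_probability(s_holdout, p_holdout)
```

Under this definition a model that always assigns probability 0.5 scores 0.5, so values close to 1 indicate that the predicted ownership probabilities agree with the observed outcomes.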

4. Dalian case study

4.1. Overview

The normal mixture model developed above is used to explore the influence of household structure and living environment on private car ownership and usage behavior. The data used in this case study come from a Revealed Preference (RP) survey conducted in Dalian, China, in 2015 using stratified random sampling. The questionnaire collected information about household car ownership and the annual miles traveled by each car in the household. Household demographic structure, living environment, and car attributes, including displacement, purchasing cost, and annual maintenance cost, were collected as well. The total valid sample size is 813, of which 60.1% of households own at least one car. The dependent variables in the selection and outcome equations are ownership (whether the household owns at least one car) and the annual AVMT associated with the household, respectively. Here, AVMT refers to the total annual miles travelled by all cars in the household. Displacement, purchasing cost, and maintenance cost are aggregated over all cars in the household in the same way. Since we do not clearly know which variable influences only ownership rather than usage, the exclusion restriction is not imposed in this case study. Hence, vector $x_{n1}$ includes household structure and living environment variables. In addition to the variables in $x_{n1}$, $x_{n2}$ also contains car attribute variables.

Table 2 lists sample statistics by car ownership level. Numbers denote counts, and proportions are given in parentheses. We define the household group with one or more cars as the positive group and the other group as the negative group. From Table 2, the average number of employed persons in the positive group is 2.027, which is greater than 1.855 in the negative group. The percentage of households with elderly members in the positive group is 14.928%, higher than 8.333% in the negative group. The proportions of households with children are higher in the positive group than in the negative group, which suggests that having children has a positive influence on car ownership. In the positive group, the proportion of households that are home owners is 89.162%, considerably higher than 51.235% in the negative group. Due to privacy concerns, people tend to report their income conservatively, so income data can be biased downwards. Expense data are more reliable, and expense can be a good indicator of the actual income level. Therefore, we collected the average monthly expense of households. The majority of households in the negative group are in the lowest monthly expenditure category, which is within 5000 CNY. In the positive group, 34.765% of households live in the CBD district, compared with 35.802% in the negative group, so the difference between the two groups with respect to the CBD district is small. Walking time to the nearest public transit station is selected as an indicator of public transit accessibility. According to Table 2, most households live within 10 min of the nearest public transit station regardless of car ownership.

For households in the positive group, the average cost per household spent on buying private cars is 244,000 CNY, and the average cost per household per year spent on car maintenance is 14,860 CNY. Average engine displacement in the


Table 2 Statistics of the Sample.

| Variables | Car ownership = 1 | Car ownership = 0 |
|---|---|---|
| Number of households | 489 (60.148%) | 324 (39.852%) |
| Household structure variables | | |
| Employer | Mean: 2.027, St. Dev.: 0.622 | Mean: 1.855, St. Dev.: 0.787 |
| Living with elderly people: 0 | 416 (85.072%) | 297 (91.667%) |
| Living with elderly people: 1 | 73 (14.928%) | 27 (8.333%) |
| Children: 0 | 171 (34.969%) | 222 (68.519%) |
| Children: 1 | 284 (58.078%) | 92 (28.395%) |
| Children: 2 or more | 34 (6.953%) | 10 (3.086%) |
| Home ownership: 0 | 53 (10.838%) | 158 (48.765%) |
| Home ownership: 1 | 436 (89.162%) | 166 (51.235%) |
| Monthly expense (1,000 CNY): Within 5 | 140 (28.630%) | 207 (63.889%) |
| Monthly expense (1,000 CNY): 5–10 | 259 (52.965%) | 102 (31.482%) |
| Monthly expense (1,000 CNY): 10–20 | 72 (14.724%) | 13 (4.012%) |
| Monthly expense (1,000 CNY): Above 20 | 18 (3.681%) | 2 (0.617%) |
| Living environment variables | | |
| CBD district: Yes | 170 (34.765%) | 116 (35.802%) |
| CBD district: No | 319 (65.235%) | 208 (64.198%) |
| Public transport accessibility variables (WT: walk time to the nearest public transport station) | | |
| WT0: Within 5 min | 292 (59.714%) | 190 (58.642%) |
| WT1: 5–10 min | 148 (30.266%) | 86 (26.543%) |
| WT2: 10–20 min | 32 (6.544%) | 26 (8.025%) |
| WT3: Above 20 min | 17 (3.476%) | 22 (6.790%) |
| Car attributes variables | | |
| AVMT (1,000 km) | Mean: 17.060, St. Dev.: 13.862 | – |
| Displacement (litre) | Mean: 2.275, St. Dev.: 1.152 | – |
| Purchasing cost (100,000 CNY) | Mean: 2.440, St. Dev.: 2.412 | – |
| Annual maintenance cost (10,000 CNY) | Mean: 1.486, St. Dev.: 1.749 | – |

CNY: China Yuan.

positive group is 2.275 L. Moreover, the average AVMT is 17,060 km. Fig. 4 shows the distribution of AVMT, which is left-skewed with a heavy tail.

4.2. Model evaluation

The normal model and normal mixture models with different numbers of components are estimated. For the normal mixture model, the number of components ranges from one to six. Priors are specified the same as those in the simulation example. Since empirical data are usually more complicated than simulated data, the Bayesian procedure is run for 30,000 iterations. The first 20,000 draws are discarded as burn-in, and the remaining 10,000 draws are used to compute the estimates and their standard errors. First, a suitable model with sound goodness of fit and generalization ability is chosen from the six models. Then the estimation results of this model are compared with those from the normal model. Here, we employ the goodness of fit test and image inspection to infer the most likely number of components.
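The burn-in scheme described above can be sketched in a few lines of Python. The chain below is synthetic (independent normal draws standing in for one parameter's Gibbs output), and the function name is hypothetical; only the iteration counts come from the text.

```python
import random
import statistics

N_ITER = 30_000   # total Gibbs iterations reported in the text
N_BURN = 20_000   # draws discarded as burn-in


def summarize_chain(draws, n_burn):
    """Return (posterior mean, posterior std. dev.) from the retained draws."""
    kept = draws[n_burn:]
    return statistics.fmean(kept), statistics.stdev(kept)


# Stand-in for one parameter's chain: synthetic draws, not model output.
rng = random.Random(0)
chain = [rng.gauss(0.5, 0.1) for _ in range(N_ITER)]
post_mean, post_sd = summarize_chain(chain, N_BURN)
n_kept = len(chain[N_BURN:])
```

The retained 10,000 draws give the point estimate (their mean) and its standard error (their standard deviation), which is how the Mean and St. Dev. columns of a results table would be filled under this scheme.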

Fig. 4. The Distribution of AVMT for Households in the Positive Group.

Fig. 5. Variation of LogLik and LogMarDen (empirically).
[Two panels, logLik and logMarDen_NR, each plotted against the number of components k = 1, ..., 6 for Markov chains 1, 2, and 3.]

4.2.1. Goodness of fit test. To obtain a general result, three Markov chains are run. The log likelihood value and log marginal density for the different models in the three chains are shown in Fig. 5. Goodness of fit improves gradually from k = 1 to k = 3, and the results from the three Markov chains are extremely consistent. When k ≥ 4, the estimation results are no longer robust, although the goodness of fit still improves. Considering the overfitting issue discussed above, we conclude that the model with k = 3 is better in terms of both goodness of fit and generalization ability. This may be a relatively conservative choice, but it produces robust estimates.

4.2.2. Image inspection. Fig. 6 shows the fitted joint distribution of the error terms for normal mixture models with different k, which represents the interdependence of car ownership and usage. As discussed above, when k = 1, the distribution is a standard elliptical and symmetric one, but it describes the unobserved errors too roughly. When k = 2, the pattern changes greatly. When k = 3, the pattern undergoes minor adjustments relative to that of the model with k = 2. When k ≥ 4, the kernel pattern does not change much, but these models try to explain more data with very low densities. Moreover, the contours now have many inflection points, so the generalization ability of these models is not strong. Thus, the mixture model with k = 3 is more suitable and reliable.
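Image inspection relies on evaluating the fitted mixture density over a grid of error values. A minimal Python sketch of that evaluation follows. It assumes each component has covariance [[1, σ12k], [σ12k, σ12k² + v_k²]], as in Eq. (A22), so the joint density factors into a marginal for the first error and a conditional for the second. The weights and component values are made up for illustration and are not the paper's estimates.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_density(e1, e2, weights, components):
    """Density of a bivariate normal mixture at (e1, e2).

    Each component is (mu1, mu2, sigma12, v2) with covariance
    [[1, sigma12], [sigma12, sigma12**2 + v2]], so that
    e1 ~ N(mu1, 1) and e2 | e1 ~ N(mu2 + sigma12 * (e1 - mu1), v2).
    """
    total = 0.0
    for w, (mu1, mu2, s12, v2) in zip(weights, components):
        total += w * normal_pdf(e1, mu1, 1.0) * normal_pdf(
            e2, mu2 + s12 * (e1 - mu1), v2
        )
    return total


# Hypothetical two-component mixture (illustrative values only).
weights = [0.6, 0.4]
components = [(-1.0, -2.0, 0.5, 1.0), (1.5, 3.0, -0.3, 2.0)]

# Evaluate on a coarse grid; a plotting library could draw contours from this.
step = 0.25
grid = [x * step for x in range(-40, 41)]          # -10.0 ... 10.0
density = [[mixture_density(a, b, weights, components) for b in grid] for a in grid]
approx_mass = sum(sum(row) for row in density) * step * step
```

Checking that the grid sum is close to one is a quick sanity test that the grid covers the region where the fitted density is non-negligible before contours are drawn.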

Fig. 6. Fitted Joint Distribution of Error Terms (empirically).

4.3. Results analysis

Estimation results of the normal model and the normal mixture model with k = 3 are summarized in Table 3. The number “1” after a variable in the first column represents variables in the selection equation, while the number “2”

represents variables in the outcome equation. The log likelihood value and log marginal density are listed at the bottom of the table. Compared with the normal model, the fit of the normal mixture model is substantially better: the log likelihood value improves from −2285.441 to −1974.410, and the log marginal density improves from −2294.667 to −2079.753. The normal mixture model with three components clearly explains the data much better than the normal model. In terms of parameter estimates, the two models differ greatly, and the greatest disparity occurs in the car attributes. Estimation results from the normal model suggest that purchasing cost is negatively correlated with car usage, although the influence is not statistically significant. The normal mixture model, however, indicates that the relationship is positive and significant. The influence of annual maintenance cost also differs. According to the normal model, it is negatively and significantly correlated with car usage, whereas the normal mixture model indicates that it is positive but not significant. In the data, the correlation coefficients of car purchasing cost and annual maintenance cost with car usage are 0.422 and 0.451, respectively. Moreover, it is consistent with common experience that annual maintenance cost should be higher if the car is used more often. Hence, the results suggested by the normal mixture model are more reasonable.

Estimates of the household structure variables also differ greatly. Although the presence of elderly people exerts little influence on car ownership, it increases AVMT, with a statistically significant coefficient in the normal mixture model. The variable representing households with 2 or more children in the selection equation is significant in the normal mixture model, and its positive influence on private car ownership is bigger than that of 1 child. Because of the “one-child-only” policy implemented for decades in China and the close relationship between parents and children, living with elderly parents is quite common in China. Moreover, parents in China place great emphasis on the growth of their children, which revolves around health care, entertainment, and education. Children are one of the critical elements in determining the household travel behavior pattern, which is supported by the estimation results. To deal with issues caused by an aging society, the “two-children-allowed” policy has been implemented in China since 2015. Under this scenario, traffic congestion caused by cars is likely to intensify, since more households are going to have two children. Therefore, decarbonizing the travel behavior related to elderly people and children could contribute to achieving a sustainable future transportation system.

Moreover, home ownership and monthly expense are positively correlated with car ownership and usage. Due to rapid urbanization, more households are eager to have their own homes in the cities. Home ownership in some regions of China is even a prerequisite for getting married. Therefore, the mental stress of home owners and renters differs markedly, and the situation is intensified by ever-rising housing prices. As a result, home ownership does have a direct influence on households’ travel behavior, which is strongly reflected in the estimation results. Car ownership and usage behavior is also constrained by household monthly expenses, which are an indicator of household income levels.

The CBD district has a statistically significant and negative influence on private car ownership and usage in the normal mixture model. This estimation result makes sense. First, good public transit accessibility in the CBD district provides an attractive travel alternative to private cars. Second, the higher cost of car ownership and usage impedes further growth of private cars in the CBD district. The direction of influence of public transport accessibility revealed by the two models is the same, while the significance levels differ slightly. WT1 is positive but insignificant. WT2 and WT3 are negative and statistically significant. The estimation results suggest that if the household lives within a 10 min walk of the nearest transit station, the model cannot distinguish whether the household owns a car. Most Chinese households prefer to live near public transit systems regardless of car ownership. If the walking time is greater than 10 min, the household is more likely not to own a car. The estimation results are consistent with the pattern exhibited by the sample data.

In terms of the magnitude of the estimates, positive estimates in the selection equation from the normal model are downward biased, while negative estimates are upward biased. The situation in the outcome equation is the opposite: apart from the variables related to car attributes, positive estimates are upward biased while negative estimates are downward biased. Based on the analysis above, when the distribution of the unobserved error differs greatly from the normal distribution, estimates from the normal model are substantially biased. By comparison, the normal mixture model performs much better than the normal model in terms of both overall goodness of fit and interpretability of the estimates.

Table 3 Results of the Normal Model and the Normal Mixture Model (k = 3).

| Variables | The normal model: Mean | The normal model: St. Dev. | The normal mixture model (k = 3): Mean | The normal mixture model (k = 3): St. Dev. |
|---|---|---|---|---|
| Constant1 | −0.839*** | 0.158 | −1.283 | 1.227 |
| Household structure variable1 | | | | |
| Employer1 | 0.042 | 0.062 | 0.115 | 0.081 |
| Living with elderly people1 | 0.065 | 0.133 | 0.024 | 0.135 |
| 1 child1 | 0.306*** | 0.093 | 0.462*** | 0.149 |
| 2 or more children1 | 0.259 | 0.202 | 0.506* | 0.270 |
| Home ownership1 | 0.675*** | 0.111 | 1.112*** | 0.146 |
| Monthly expense1 | 0.533*** | 0.057 | 0.676*** | 0.088 |
| Living environment variable1 | | | | |
| CBD district1 | −0.002 | 0.079 | −0.217** | 0.086 |
| Public transport accessibility variables1 | | | | |
| WT1: 5–10 min | −0.003 | 0.095 | 0.161 | 0.105 |
| WT2: 11–20 min | −0.325** | 0.151 | −0.438** | 0.207 |
| WT3: Above 21 min | −0.528*** | 0.185 | −0.661*** | 0.212 |
| Constant2 | −13.688*** | 2.667 | −2.358 | 12.447 |
| Household structure variable2 | | | | |
| Employer2 | −0.077 | 0.971 | −0.520 | 0.486 |
| Living with elderly people2 | 3.002 | 1.895 | 1.887* | 1.046 |
| 1 child2 | 4.659*** | 1.418 | 1.681** | 0.733 |
| 2 or more children2 | 5.981** | 2.913 | 3.336** | 1.510 |
| Home ownership2 | 10.777*** | 1.777 | 7.293*** | 1.054 |
| Monthly expense2 | 5.807*** | 0.870 | 2.039*** | 0.515 |
| Living environment variable2 | | | | |
| CBD district2 | −1.997 | 1.239 | −1.943*** | 0.631 |
| Public transport accessibility variables2 | | | | |
| WT1: 5–10 min | 0.918 | 1.455 | 0.507 | 0.671 |
| WT2: 11–20 min | −4.904** | 2.356 | −2.163* | 1.284 |
| WT3: Above 21 min | −9.920*** | 2.896 | −5.802*** | 1.470 |
| Private car attributes2 | | | | |
| Displacement2 | 2.135*** | 0.733 | 2.020*** | 0.565 |
| Purchasing cost2 | −0.058 | 0.406 | 0.744** | 0.315 |
| Annual maintenance cost2 | −0.595** | 0.284 | 0.124 | 0.236 |
| logLik | −2285.441 | | −1974.410 | |
| logMarDen_NR | −2294.667 | | −2079.753 | |

4.4. Comparison with the Copula model

For comparison, Copula models are estimated. In terms of dependence structures (copulas), we try Clayton, Joe, Gumbel, FGM, and AMH. For the selection equation, the Logit and Probit models are the two most commonly used. In the Logit model, the error of the discrete variable follows an independent and identically distributed (IID) type-1 extreme value distribution, while it follows a normal distribution in the Probit model. Due to the Independence from Irrelevant Alternatives (IIA) restriction exhibited by the Logit model, the Probit model is employed in this study. A normal marginal distribution is thus assumed for the discrete variable. For the marginal distribution of the outcome variable, we use the gamma distribution, since it can describe different distribution patterns of a nonnegative continuous variable.

In order to pick the most appropriate copula, normal-normal marginal distributions are assumed first. The normal-gamma marginal distribution combination is then evaluated under the selected copula. Table 4 lists the log likelihood values for different Copula models with normal-normal marginal distributions. The model with the Gumbel copula performs best (−2268.546). The improvement in goodness of fit, however, is almost negligible compared with the traditional normal model (−2271.273). Using the Gumbel copula, we estimate the Copula model with the Normal-GAmma marginal distribution combination, abbreviated as the N-GA-G Copula model. The goodness of fit of the N-GA-G Copula model is listed in Table 5, together with those of the traditional normal model and the normal mixture model (k = 2). The log likelihood reported for each MCMC chain of the normal mixture model (k = 2) is the average value. The log likelihood of the N-GA-G Copula model is −2178.021, an improvement of 4.106% over the traditional normal model. The improvement is obvious, indicating that the influence of marginal distributions is bigger than that of the coupling relationship (copulas). The log likelihood of the normal mixture model (k = 2) is −2087.047, an improvement of 8.111% over the traditional normal model. Moreover, the performance of the normal mixture model (k = 2) is rather stable across the three Markov chains. The comparison suggests the superiority of the normal mixture model (k = 2) over the Copula model. Since no assumption is made on marginal distributions, the advantages of the normal mixture model (k = 2) over the Copula model are obvious. The comparison suggests the high efficiency of the Bayesian approach based on a mixture of normals in capturing the flexible interdependency of discrete and continuous decisions.

Table 4 Goodness of Fit for Copula Models with Normal Marginal Distributions.

| Copula | logLik. |
|---|---|
| Normal | −2271.273 |
| Clayton | −2271.329 |
| Joe | −2314.402 |
| Gumbel | −2268.546 |
| FGM | −2317.111 |
| AMH | −2296.462 |

Note: The log likelihood value for the normal model listed in Table 3 is −2285.441 rather than −2271.273. The small difference results from the different estimation methods (Bayesian Markov Chain Monte Carlo estimation and maximum likelihood estimation).

Table 5 Goodness of Fit for Three Models.

| The normal model: logLik. | The Copula model (N-GA-G): logLik. | The normal mixture model (k = 2): MCMC chain | The normal mixture model (k = 2): logLik. |
|---|---|---|---|
| – | – | 1 | −2088.365 |
| – | – | 2 | −2084.238 |
| – | – | 3 | −2088.537 |
| −2271.273 | −2178.021 (4.106%) | Mean | −2087.047 (8.111%) |

N-GA-G: Normal marginal distribution for the selection variable, Gamma marginal distribution for the outcome variable, and the dependence structure is captured by the Gumbel copula.

5. Conclusion and implication

In this paper, a normal mixture sample selection model based on Bayesian MCMC and a Dirichlet prior is developed in the parametric framework to relax the normal coupling assumption in the traditional sample selection model. After detailed theoretical derivation, a simulation example is conducted to justify the model. Simulation results suggest that the normal mixture model can not only recover the true parameter values but also greatly improve the goodness of fit. Generally, if a model incorporates more parameters, its goodness of fit is better. However, this may induce overfitting. Various tests are therefore proposed to identify it, including the estimates robustness test, the goodness of fit test, image inspection, and the hold-out sample test.

We use the new model to explore the influence of household structure and living environment on household car ownership and usage behavior in Dalian, China. For comparison, both the normal model and the normal mixture model are estimated. The normal mixture model with two components already has the ability to capture the flexible coupling relationship between car ownership and usage behavior. However, the model fit improves greatly when the number of components is three, and the results are very robust across three Markov chains. When the number of components is four or greater, the results are not robust and the fitted error distribution is not representative of the population. Therefore, the normal mixture model with 3 components performs better in terms of goodness of fit and generalization ability. Comparing its estimates with those from the normal model reveals great disparity: the significance level, magnitude, and even the sign of estimates from the normal model are seriously biased.

Estimation results from the normal mixture model suggest that households with children have a higher likelihood of owning a private car and tend to use the car more, and the influence of 2 or more children is bigger than that of 1 child. Although the presence of elderly people does not have a significantly positive influence on private car ownership, it does increase private car usage, with a statistically significant coefficient. Households with higher monthly expenses and their own apartments or houses tend to own a private car and use it more. In terms of the influence of local public transport accessibility, estimation results from this case study suggest that households prefer to live near a public transit station regardless of car ownership level. Moreover, households living in the CBD district are restrained from owning and using private cars. Although the CBD district variable can capture the influence of some land use patterns on household car ownership and usage behavior, such investigation is still limited. This is one limitation of this case study. In future research, the influence of more land use variables should be investigated further.

The findings above have the following policy implications. Households with a higher consumption level or heavy assets, such as an apartment, are more dependent on private cars. Thus, the effect of monetary stimuli is rather limited. Instead, attention should be paid to the travel behavior of elderly people and children in these households. Due to the implementation of the “2-children-allowed” policy in China, private car ownership and usage will be intensified. Therefore, the development of flexible and reliable public travel alternatives for children and elderly people is particularly important. However, the improvement of local public transport is not enough; network-level design of public transport is required.

Lastly, special attention should be paid to the efficiency of the normal mixture model. Both the simulation example and the empirical case study indicate that the normal mixture model with two components already has strong power in capturing the general pattern of the error distribution. Its goodness of fit is impressively improved compared with the normal model. Moreover, the efficiency is justified again by comparison with the Copula model. Testing different copulas and combinations of marginal distributions, the influence of marginal distributions is suggested to be more important than that of the coupling relationships. The goodness of fit of the Copula model with appropriate marginal distributions and coupling relationship is improved compared with that of the traditional normal model. The improvement, however, is small compared with that of the normal mixture model with two components, which imposes no assumption on marginal distributions. Therefore, when the multiple tests put forward above fail to identify the overfitting problem, or when the overfitting problem is crucial in policy analysis, the normal mixture model with two components is a better option than the normal model and the Copula model. Even though it may not be the best model, it can explain the data better.
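The percentage improvements reported in Table 5 can be reproduced from the log likelihood values. The short Python sketch below assumes the improvement is measured as the change in log likelihood relative to the absolute value for the normal model, a reading that matches the reported 4.106% and 8.111%; the function name is hypothetical.

```python
def relative_improvement(baseline_loglik, model_loglik):
    """Percentage gain in log likelihood relative to |baseline|."""
    return 100.0 * (model_loglik - baseline_loglik) / abs(baseline_loglik)


normal_ll = -2271.273                                 # normal model (Table 5)
copula_ll = -2178.021                                 # N-GA-G Copula model (Table 5)
mixture_chains = [-2088.365, -2084.238, -2088.537]    # k = 2, three MCMC chains

mixture_ll = sum(mixture_chains) / len(mixture_chains)
copula_gain = relative_improvement(normal_ll, copula_ll)
mixture_gain = relative_improvement(normal_ll, mixture_ll)
```

With these inputs the chain average rounds to −2087.047 and the two gains round to 4.106% and 8.111%, in agreement with the table.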

CRediT authorship contribution statement

Na Wu: Conceptualization, Methodology, Software, Data curation, Writing - original draft, Funding acquisition. Xiang (Ben) Song: Methodology, Writing - review & editing. Ronghan Yao: Resources, Project administration. Qian Yu: Formal analysis, Writing - review & editing. Chunyan Tang: Investigation, Validation. Shengchuan Zhao: Supervision, Project administration.

Acknowledgement

The authors wish to thank the anonymous reviewers for their valuable suggestions and comments. This work is jointly supported by the Natural Science Basic Research Plan in Shaanxi Province of China (2019JQ212) and the Fundamental Research Funds for the Central Universities, CHD (Grant No. 300102219101).

Appendix

Details for each step in the Gibbs sampler are described as follows.

1) Draw $ind \mid [pvec, \{\mu_k, \Sigma_k\}, \beta, s, y]$.

For each observation $n$, draw $ind_n$ from a multinomial distribution whose probability vector $\pi_n$ is computed based on the likelihood ratio, with $pvec_k$ as the prior probability of membership for each component.

$$ind_n \sim \mathrm{Multinomial}(\pi_n), \quad \pi_n = (\pi_{n1}, \ldots, \pi_{nK}) \tag{A1}$$

For $n \in N_0$ and $n \in N_1$, the expressions for the likelihood ratios are different. Thus, $\pi_{nk}$, for $k = 1, \ldots, K$, is given by distinct equations.

For $n \in N_0$,

$$\pi_{nk} = \frac{pvec_k \Pr(s_n = 0 \mid \beta_1, \mu_{k1})}{\sum_{j=1}^{K} pvec_j \Pr(s_n = 0 \mid \beta_1, \mu_{j1})} \tag{A2}$$

For $n \in N_1$,

$$\pi_{nk} = \frac{pvec_k \Pr(y_n, s_n = 1 \mid \beta, \mu_k, \Sigma_k)}{\sum_{j=1}^{K} pvec_j \Pr(y_n, s_n = 1 \mid \beta, \mu_j, \Sigma_j)} \tag{A3}$$

Substituting $\Pr(s_n = 0 \mid \beta_1, \mu_{k1})$ and $\Pr(y_n, s_n = 1 \mid \beta, \mu_k, \Sigma_k)$ with Eqs. (21) and (22), $\pi_{nk}$ can be computed.
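Step 1 amounts to normalising the products of mixing weights and component likelihoods and then sampling a label. A minimal Python sketch for one observation is given below; the likelihood values are placeholders standing in for the quantities in Eqs. (21) and (22), and the function name and inputs are hypothetical.

```python
import random


def draw_indicator(pvec, likelihoods, rng):
    """Draw one component label from the posterior membership probabilities.

    pvec        -- current mixing weights pvec_k
    likelihoods -- likelihood of observation n under each component k
                   (Pr(s_n = 0 | ...) for n in N0, Pr(y_n, s_n = 1 | ...) for n in N1)
    Returns (label in 0..K-1, normalised probability vector).
    """
    unnormalised = [p * l for p, l in zip(pvec, likelihoods)]
    total = sum(unnormalised)
    probs = [u / total for u in unnormalised]
    label = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return label, probs


rng = random.Random(1)
# Hypothetical weights and likelihood values for a single observation.
label, probs = draw_indicator([0.5, 0.3, 0.2], [0.02, 0.10, 0.01], rng)
```

In a full sampler this draw would be repeated for every observation in each Gibbs iteration, with the likelihood function chosen according to whether the observation belongs to $N_0$ or $N_1$.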

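2) Draw $pvec$ conditional on $ind$: $pvec \mid ind$.

The posterior of $pvec$ is also a Dirichlet distribution:

$$pvec \sim \mathrm{Dirichlet}(\tilde{\alpha}) \tag{A4}$$

The parameter vector $\tilde{\alpha}$ can be expressed as

$$\tilde{\alpha}_k = h_k + \alpha_k \tag{A5}$$

$$h_k = \sum_{n=1}^{N} I(ind_n = k) \tag{A6}$$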
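Step 2 is a standard conjugate update: count the observations currently assigned to each component and draw from the resulting Dirichlet distribution. The Python sketch below uses the fact that a Dirichlet draw can be obtained by normalising independent gamma draws; labels are coded 0 to K−1 here, and the prior values and function name are assumptions for illustration.

```python
import random


def draw_pvec(ind, alpha, rng):
    """Draw mixing weights from Dirichlet(alpha_k + h_k).

    ind   -- component labels (0..K-1) currently assigned to the observations
    alpha -- Dirichlet prior parameters, one per component
    """
    counts = [0] * len(alpha)
    for label in ind:
        counts[label] += 1                      # h_k = number of n with ind_n = k
    post = [a + h for a, h in zip(alpha, counts)]
    # A Dirichlet draw is a vector of independent Gamma(shape, 1) draws, normalised.
    gammas = [rng.gammavariate(shape, 1.0) for shape in post]
    total = sum(gammas)
    return [g / total for g in gammas], post


rng = random.Random(2)
pvec, post = draw_pvec([0, 0, 1, 2, 0, 1], alpha=[1.0, 1.0, 1.0], rng=rng)
```

Here three of the six observations sit in the first component, so its posterior parameter is 1 + 3 = 4, and the drawn weights sum to one by construction.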

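3) Draw $s_n$ for $n = 1, \ldots, N$ conditional on other parameters: $s_n \mid [ind_n, \beta, \{\mu_k, \Sigma_k\}, y_n]$.

Latent variable $s_n$ follows a truncated normal distribution. The conditional mean and variance are obtained according to Poirier (1995). Similarly, we need to consider two different cases.

$$s_n \sim TN_{(-\infty, 0)}\left(x_{n1}'\beta_1 + \mu_{ind_n,1},\ 1\right), \quad n \in N_0 \tag{A7}$$

$$s_n \sim TN_{(0, +\infty)}\left(x_{n1}'\beta_1 + \mu_{ind_n,1} + \frac{\sigma_{12,ind_n}}{\sigma_{12,ind_n}^2 + v_{ind_n}^2}\left(y_n - x_{n2}'\beta_2 - \mu_{ind_n,2}\right),\ \frac{v_{ind_n}^2}{\sigma_{12,ind_n}^2 + v_{ind_n}^2}\right), \quad n \in N_1 \tag{A8}$$

where $\dfrac{v_{ind_n}^2}{\sigma_{12,ind_n}^2 + v_{ind_n}^2} = \operatorname{var}(\varepsilon_{ind_n,1} \mid \varepsilon_{ind_n,2})$, $n \in N_1$. For simplicity, denote

$$c_{ind_n}^2 = \frac{v_{ind_n}^2}{\sigma_{12,ind_n}^2 + v_{ind_n}^2} \tag{A9}$$

4) Draw $\beta_1$ conditional on other parameters: $\beta_1 \mid [ind, \{\mu_k, \Sigma_k\}, y, s, \beta_2]$.

The posterior of $\beta_1$ is a normal distribution specified as follows.

$$\beta_1 \sim N(\bar{b}_1, \bar{B}_1) \tag{A10}$$

where

$$\bar{B}_1 = \left(A_1 + \sum_{n \in N_0} x_{n1} x_{n1}' + \sum_{n \in N_1} c_{ind_n}^{-2} x_{n1} x_{n1}'\right)^{-1} \tag{A11}$$

$$\bar{b}_1 = \bar{B}_1 \left(A_1 b_1 + \sum_{n \in N_0} x_{n1} x_{n1}' \hat{\beta}_1(0) + \sum_{n \in N_1} c_{ind_n}^{-2} x_{n1} x_{n1}' \hat{\beta}_1(1)\right) \tag{A12}$$

and $\hat{\beta}_1(0)$, $\hat{\beta}_1(1)$ are two least squares estimators derived according to Van Hasselt (2011).

$$\hat{\beta}_1(0) = \left(\sum_{n \in N_0} x_{n1} x_{n1}'\right)^{-1} \sum_{n \in N_0} x_{n1} \left(s_n - \mu_{ind_n,1}\right) \tag{A13}$$

$$\hat{\beta}_1(1) = \left(\sum_{n \in N_1} c_{ind_n}^{-2} x_{n1} x_{n1}'\right)^{-1} \sum_{n \in N_1} c_{ind_n}^{-2} x_{n1} \left(s_n - \mu_{ind_n,1} - \frac{\sigma_{12,ind_n}}{\sigma_{12,ind_n}^2 + v_{ind_n}^2}\left(\varepsilon_{n2} - \mu_{ind_n,2}\right)\right) \tag{A14}$$

5) Draw $\beta_2$ conditional on other parameters: $\beta_2 \mid [ind, \{\mu_k, \Sigma_k\}, y, s, \beta_1]$.

The posterior of $\beta_2$ is a normal distribution specified as:

$$\beta_2 \sim N(\bar{b}_2, \bar{B}_2) \tag{A15}$$

where

$$\bar{B}_2 = \left(A_2 + \sum_{n \in N_1} v_{ind_n}^{-2} x_{n2} x_{n2}'\right)^{-1} \tag{A16}$$

$$\bar{b}_2 = \bar{B}_2 \left(A_2 b_2 + \sum_{n \in N_1} v_{ind_n}^{-2} x_{n2} x_{n2}' \hat{\beta}_2\right) \tag{A17}$$

$$\hat{\beta}_2 = \left(\sum_{n \in N_1} v_{ind_n}^{-2} x_{n2} x_{n2}'\right)^{-1} \sum_{n \in N_1} v_{ind_n}^{-2} x_{n2} \left(y_n - \mu_{ind_n,2} - \sigma_{12,ind_n}\left(\varepsilon_{n1} - \mu_{ind_n,1}\right)\right) \tag{A18}$$

6) Draw $\{\mu_k, \sigma_{12k}, v_k^2\}$ for each component $k$.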

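For $H_k = \{n, ind_n = k\}$, we have

$$\varepsilon_{nk1} = \mu_{k1} + \eta_{nk1} \tag{A19}$$

$$\varepsilon_{nk2} = \begin{cases} \mu_{k2} + \eta_{nk2} & \text{if } \eta_{nk1} > -\mu_{k1} - x_{n1}'\beta_1 \\ 0 & \text{otherwise} \end{cases} \tag{A20}$$

and

$$\eta_{nk2} = \sigma_{12k}\, \eta_{nk1} + \xi_{nk}, \quad \xi_{nk} \sim N(0, v_k^2) \tag{A21}$$

Then

$$\begin{pmatrix} \eta_{nk1} \\ \eta_{nk2} \end{pmatrix} \overset{iid}{\sim} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \sigma_{12k} \\ \sigma_{12k} & \sigma_{12k}^2 + v_k^2 \end{pmatrix}\right) \tag{A22}$$

This is another sample selection model. $\varepsilon_{nk1}$ and $\varepsilon_{nk2}$ correspond to $s_n$ and $y_n$ in Eqs. (1) and (3). $x_{n1}$ and $x_{n2}$ in Eqs. (1) and (3) are ones in Eqs. (A19) and (A20). $\mu_{k1}$ and $\mu_{k2}$ correspond to $\beta_1$ and $\beta_2$ in Eqs. (1) and (3). $\sigma_{12k}$ and $v_k^2$ correspond to $\sigma_{12}$ and $v^2$ in Eq. (7). Therefore, repeating steps 1 to 5, we can draw $\{\mu_k, \sigma_{12k}, v_k^2\}$ from their posteriors for each component $k$. Define $H_{1k}$ as the observations in component $k$ with $\varepsilon_{nk2} \neq 0$, and $H_{0k}$ as the observations in component $k$ with $\varepsilon_{nk2} = 0$. The numbers of observations in $H_{1k}$ and $H_{0k}$ are $h_{1k}$ and $h_{0k}$, respectively, so that $h_k = h_{1k} + h_{0k}$. With conjugate priors for $\{\mu_k, \sigma_{12k}, v_k^2\}$ specified as in Eqs. (18) to (20), the Gibbs sampler for component $k$ ($k = 1, \ldots, K$) can be constructed as follows.

(a) Draw $\mu_{k1}$: $\mu_{k1} \mid [\varepsilon_{k1}, \varepsilon_{k2}, \mu_{k2}, \sigma_{12k}, v_k^2]$.

Its posterior is a normal distribution specified as:

$$\mu_{k1} \sim N\left(\bar{b}_{\mu_{k1}}, \bar{B}_{\mu_{k1}}\right) \tag{A23}$$

where

$$\bar{B}_{\mu_{k1}} = \left(A_{\mu_{k1}} + h_{0k} + \left[\frac{v_k^2}{\sigma_{12k}^2 + v_k^2}\right]^{-1} h_{1k}\right)^{-1} \tag{A24}$$

$$\bar{b}_{\mu_{k1}} = \bar{B}_{\mu_{k1}} \left(A_{\mu_{k1}} b_{3k1} + h_{0k}\, \hat{\mu}_{k1}(0) + \left[\frac{v_k^2}{\sigma_{12k}^2 + v_k^2}\right]^{-1} h_{1k}\, \hat{\mu}_{k1}(1)\right) \tag{A25}$$

and $\hat{\mu}_{k1}(0)$, $\hat{\mu}_{k1}(1)$ are two least squares estimators.

$$\hat{\mu}_{k1}(0) = h_{0k}^{-1} \sum_{n \in H_{0k}} \varepsilon_{nk1} \tag{A26}$$

$$\hat{\mu}_{k1}(1) = h_{1k}^{-1} \sum_{n \in H_{1k}} \left(\varepsilon_{nk1} - \frac{\sigma_{12k}}{\sigma_{12k}^2 + v_k^2}\left(\varepsilon_{nk2} - \mu_{k2}\right)\right) \tag{A27}$$

(b) Draw $(\mu_{k2}, \sigma_{12k}) \mid [\varepsilon_{k2}, \varepsilon_{k1}, \mu_{k1}, v_k^2]$.

Note that only observations $n \in H_{1k}$ are informative about $\mu_{k2}$, $\sigma_{12k}$, and $v_k^2$. Thus, inference here is based only on these observations. For $n \in H_{1k}$, we have

$$\varepsilon_{nk2} = \mu_{k2} + \sigma_{12k}\left(\varepsilon_{nk1} - \mu_{k1}\right) + \xi_{nk} \tag{A28}$$

where $\xi_{nk} \sim N(0, v_k^2)$. Eq. (A28) is a standard linear regression. Define $M_k$ as an $h_{1k} \times 2$ matrix with row elements $\{1, \varepsilon_{nk1} - \mu_{k1}\}_{n \in H_{1k}}$, and $E_{k2}$ as an $h_{1k} \times 1$ vector with elements $\{\varepsilon_{nk2}\}_{n \in H_{1k}}$, and

$$d_0 = \begin{pmatrix} b_{3k2} \\ b_{4k} \end{pmatrix}, \quad A_{D_0}^{-1} = \begin{pmatrix} A_{\mu_{k2}}^{-1} & 0 \\ 0 & A_{\sigma_{12k}}^{-1} \end{pmatrix} \tag{A29}$$

Then

$$(\mu_{k2}, \sigma_{12k}) \sim N(\bar{d}_k, \bar{D}_k) \tag{A30}$$

where

$$\bar{D}_k = \left(M_k' M_k\, v_k^{-2} + A_{D_0}\right)^{-1} \tag{A31}$$

$$\bar{d}_k = \bar{D}_k \left[M_k' E_{k2}\, v_k^{-2} + A_{D_0} d_0\right] \tag{A32}$$

(c) Draw $v_k^2 \mid \sigma_{12k}, \mu_k, \varepsilon_{k1}, \varepsilon_{k2}$.

For $n \in H_{1k}$, we have

$$v_k^2 \sim IG\left(\frac{h_{1k}}{2} + c_k,\ \left(d_k^{-1} + \frac{1}{2}\, \xi_k' \xi_k\right)^{-1}\right) \tag{A33}$$

where $\xi_k$ is an $h_{1k} \times 1$ vector with elements $\{\xi_{nk}\}$, and $\xi_{nk}$ can be expressed as follows.

$$\xi_{nk} = \varepsilon_{nk2} - \mu_{k2} - \sigma_{12k}\, \eta_{nk1} \tag{A34}$$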
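The variance draw in step (c) can be sketched in Python as follows. The sketch assumes the convention that $x \sim IG(a, b)$ means $1/x$ follows a gamma distribution with shape $a$ and scale $b$, which is consistent with the form of the second argument in Eq. (A33) but is an assumption here, as are the function name, the prior values, and the toy residuals.

```python
import random


def draw_v2(resid, c_prior, d_prior, rng):
    """Draw v_k^2 from IG(h/2 + c, 1 / (1/d + 0.5 * sum(resid^2))).

    Convention assumed here: x ~ IG(a, b) means 1/x ~ Gamma(shape=a, scale=b).
    resid -- residuals xi_nk for the h observations in H_1k
    """
    shape = 0.5 * len(resid) + c_prior
    scale = 1.0 / (1.0 / d_prior + 0.5 * sum(r * r for r in resid))
    precision = rng.gammavariate(shape, scale)   # draw of 1 / v_k^2
    return 1.0 / precision, shape, scale


rng = random.Random(3)
# Toy residuals and prior values (illustrative only).
v2, shape, scale = draw_v2([1.0, -1.0, 2.0, 0.0], c_prior=2.0, d_prior=0.5, rng=rng)
```

With four residuals whose squares sum to 6, the posterior shape is 4/2 + 2 = 4 and the scale is 1/(2 + 3) = 0.2; larger residuals shrink the scale of the precision draw and therefore push the drawn variance upward.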

