Regional Science and Urban Economics 45 (2014) 1–21
Contents lists available at ScienceDirect
Regional Science and Urban Economics journal homepage: www.elsevier.com/locate/regec
Spatial autoregressive models with unknown heteroskedasticity: A comparison of Bayesian and robust GMM approach☆ Osman Doğan ⁎, Süleyman Taşpınar Program in Economics, The Graduate School and University Center, The City University of New York, New York, United States
a r t i c l e
i n f o
Article history: Received 29 August 2013 Received in revised form 9 December 2013 Accepted 20 December 2013 Available online 4 January 2014 JEL classification: C13 C21 C31 Keywords: Spatial autoregressive models Unknown heteroskedasticity Robustness GMM Asymptotics MLE Markov Chain Monte Carlo (MCMC)
a b s t r a c t Most of the estimators suggested for the estimation of spatial autoregressive models are generally inconsistent in the presence of an unknown form of heteroskedasticity in the disturbance term. The estimators formulated from the generalized method of moments (GMM) and the Bayesian Markov Chain Monte Carlo (MCMC) frameworks can be robust to unknown forms of heteroskedasticity. In this study, the finite sample properties of the robust GMM estimator are compared with the estimators based on the Bayesian MCMC approach for the spatial autoregressive models with heteroskedasticity of an unknown form. A Monte Carlo simulation study provides evaluation of the performance of the heteroskedasticity robust estimators. Our results indicate that the MLE and the Bayesian estimators impose relatively greater bias on the spatial autoregressive parameter when there is negative spatial dependence in the model. In terms of finite sample efficiency, the Bayesian estimators perform better than the robust GMM estimator. In addition, two empirical applications are provided to evaluate relative performance of heteroskedasticity robust estimators. © 2013 Elsevier B.V. All rights reserved.
1. Introduction Most of the estimation methods suggested in the literature are valid under the assumption that the disturbance terms of spatial models are independently and identically distributed (i.i.d.) or i.i.d. normal. However, in many regression applications, heteroskedasticity may well be present. For example, cross-sectional units usually differ in size and some other characteristics, which in turn implies that the disturbance terms in the regression analyses across these cross-sectional units may be heteroskedastic. It may also be present in a random coefficient model, where the parameters of the model are random around fixed values. In this case, heteroskedasticity depends on the exogenous variables of the spatial models. Moreover, in regression analysis, many dependent variables are constructed by data aggregation. In such a case, heteroskedasticity arises from the process of averaging with different
☆ We would like to thank Wim Vijverberg for the insightful and instructive comments on this study. We are also grateful to James P. LeSage for his helpful comments for the improvement of this study. This research was supported, in part, by a grant of computer time from the City University of New York High Performance Computing Center under NSF Grants CNS-0855217 and CNS-0958379. ⁎ Corresponding author at: The Graduate Center, Economics 365 Fifth Avenue NY, NY 10016-4309, United States. Tel.: +1 212 817 8255; fax: +1 212 817 1514. E-mail addresses:
[email protected] (O. Doğan),
[email protected] (S. Taşpınar). 0166-0462/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.regsciurbeco.2013.12.003
numbers of observations when the data is getting aggregated (Griffiths, 2007; Lee, 2010).1 In the present study, we evaluate the performance of various heteroskedasticity robust estimators suggested in the literature for spatial autoregressive models. To this end, we conduct a Monte Carlo study and provide two empirical illustrations to show how these estimators perform in applied research. In the presence of heteroskedastic disturbances, the ML and GMM estimators are generally inconsistent. The ML estimator is inconsistent if heteroskedasticity is not incorporated into estimation, because the likelihood function is not maximized at the true parameter values.2 The GMM estimators are also inconsistent since the moment functions are often designed under the assumption that disturbances are i.i.d. (Kelejian and Prucha, 1998, 1999; Liu et al., 2010). Hence, the orthogonality conditions for the moment functions may not be satisfied. To handle unknown forms of heteroskedasticity, Kelejian and Prucha (2010) extend their two-step GMM estimation approach by modifying the moment functions for a spatial model that has a spatial lag in the 1 For some recent empirical studies, see Lin and Lee (2010) and Doğan and Taşpınar (2013b). 2 For many spatial model specifications, the ML estimation has been the most widely used technique and has often been the only technique that is implemented. The ML approach is well treated in Anselin (1988), Lee (2004) and LeSage and Pace (2009).
2
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
dependent variable and the disturbance term (for short SARAR(1,1)). Badinger and Egger (2011) extend the robust estimation approach in Kelejian and Prucha (2010) to the case of SARAR(p,q) specification. Lin and Lee (2010) suggest a one-step robust GMM estimator for the model with only spatial dependence in the dependent variable (for short SARAR(1,0)).3 In this approach, the parameters of a spatial model are simultaneously estimated by a GMM estimator formulated from the combination of a set of linear and quadratic moment functions (Lee, 2007a, 2007b; Lee and Liu, 2010; Liu et al., 2010). The moment functions considered for both two-step and one-step GMM estimators are motivated by the reduced form of the spatial models. For example, the reduced form for the case of SARAR(1,0) indicates that the endogenous spatial lag of the dependent variable is a function of a stochastic and a non-stochastic variable. The linear moment functions are formulated from the non-stochastic part and the quadratic moment functions are formulated for the stochastic part. The two-step GMM approach of Kelejian and Prucha is motivated by computational simplicity as the ML estimation involves significant computational burden for the large samples. Despite the computational advantage, the two step GMM estimator is inefficient relative to the one-step GMM estimators suggested in Lee (2007a, 2007b), Liu et al. (2010) and Lee and Liu (2010). An alternative to ML and GMM/IV estimation methods is the Bayesian estimation method, which has been receiving attention in recent years. Bayesian data analysis is distinctly different from the classical (or frequentist) analysis in its treatment of parameters of the model. In Bayesian econometrics the parameter vector is a random variable, and Bayesian analysts formulate probabilistic statements about the parameters before observing any data. These ex-ante probabilities are called priors and usually take the form of a probability distribution with known moments. This notion of the prior probabilities (or subjective probabilities) is totally absent in classical econometrics where all estimation and inference are based on observed data. In both approaches, the likelihood function has the same functional form and reflects the relation between data and the parameters of interest. In Bayesian approach, the likelihood function is combined with prior functions via Bayes' rule to construct the posterior probability distribution of parameters. The posterior distribution function of the parameter vector contains the information necessary for the estimation and inference. The Bayesian approach requires the evaluation of higher dimensional integrals to obtain posterior expectations, marginal likelihoods, and predictive densities. The application of the Bayesian methods to the estimation of spatial models follows the path of the progress that has been made in the context of Bayesian computation techniques. Hepple (1995b) identifies four phases for the development of Bayesian computation techniques. In the first phase, Bayesian studies involved problems that can be characterized with well known probability distributions such that the characteristics of posterior distributions such as means and covariances can be analytically derived. In the second phase, Bayesian studies focused on techniques through which problems involving multidimensional probability distributions can be reduced to univariate or bivariate integrations. In this phase, numerical techniques for the univariate and bivariate integration were used. In the third phase, Bayesian analysts worked on efficient procedures for higher order integrations. Gauss–Hermite, importance sampling and Monte Carlo integration techniques were used to tackle complicated and high-dimensional problems. In the fourth phase, Markov Chain Monte Carlo (MCMC) simulation techniques were introduced that make the computation of higher dimension problems feasible. The advent of the MCMC approach represents a shift in thinking, where the focus on the question of analytical moment calculation is replaced with a more general question of sampling issues from the posterior distributions (Albert and Chib, 1993; Casella and George, 1992; Chib, 2001). In the Gibbs sampling version of the MCMC approach,
3
For a robust 2SLS estimator of the SARAR(1,0) specification, see Anselin (2006).
the joint posterior distribution is decomposed into conditional posterior distributions through which random draws (or a simulated sample) can be obtained. Inferences such as posterior mean and posterior covariance matrix can be estimated from the simulated sample obtained from the conditional posterior densities. The development in Bayesian computation techniques provides a wide range of tools that can be applied to the estimation of spatial models. The early literature on the Bayesian perspective on spatial models uses combinations of tools developed during the period from the first to the third phase. For example, Hepple (1979) analytically derives the joint posterior and marginal posterior distributions of parameters for a spatial model containing a spatial lag in the disturbance term. The posterior moments are calculated through numerical univariate and bivariate integration techniques. Anselin (1982, 1988) considers the Bayesian approach for pure spatial autoregressive and spatial error models. Diffuse priors for parameters of models are suggested and marginal posterior distributions of parameters are analytically derived. The posterior mean of the autoregressive parameter for a pure spatial autoregressive model in Anselin (1982) is estimated with univariate numerical integration. A small Monte Carlo simulation study in Anselin (1982) demonstrates that the Bayesian estimator performs as well as the ML estimator for larger values of the autoregressive parameter and larger samples.4 Hepple (1995a, 1995b) develops Bayesian analyses for major spatial specifications including the SARAR(1,0) model, SARAR(0,1) (or the SEM) model and spatial moving average models (for short SARMA(0,1)). In each case, the joint posterior distributions of the parameters are stated from which the marginal posterior of the spatial autoregressive parameters is analytically derived. The analytical derivation for the marginal posterior distribution of the parameters of the exogenous variables is not available as the spatial autoregressive parameters can not be analytically integrated out from the joint posterior distribution. However, the dimension of joint posterior can be reduced to two dimensions so that bivariate numerical integration techniques can be used for the estimation of the marginal posterior moments. As a result, the estimate of the spatial autoregressive parameters can be obtained through univariate numerical integration, and the estimates of the parameters of the exogenous variables can be obtained through bivariate numerical integration over a grid of pairs of values.5 Hepple (2002, 2003) provides further analytical simplifications such that the moments of the marginal posterior distributions of the exogenous variables in spatial models with only one autoregressive parameter can be obtained through univariate numerical integration. However, for the case of SARAR(1,1) and SARMA(1,1) where there are two spatial autoregressive parameters the calculation of these moments again requires bivariate integration over the parameter space of autoregressive parameters. The recent studies use the MCMC approach to estimate spatial models. This approach is more appropriate for cases where marginal posterior distributions are difficult to simplify analytically and to integrate numerically. The MCMC approach is introduced for most types of spatial models in LeSage (1997) and LeSage and Pace (2009). Kakamu and Wago (2008) compare finite sample properties of the Bayesian estimators based on the MCMC approach with that of the ML estimator for the static panel spatial autoregressive model. The Monte Carlo simulation results in Kakamu and Wago (2008) show that the Bayesian estimator is virtually as efficient as the ML estimator. In spatial models, the boundaries of the parameter space for spatial autoregressive parameters are known, which facilitate the
4 To the best of our knowledge, Anselin (1982) is the first study, where small sample properties of the ML estimator are compared with the frequentist properties of the Bayesian estimator in the context of spatial models. 5 Note that as the dimension of spatial parameter increases, the dimension of numerical integration rises. For example, for the case of SARAR(1,1) the marginal posterior of autoregressive parameters is two dimensional.
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
selection of the proper uninformative priors for these parameters. Thus, in most studies, uniform priors over the parameter space are assigned to spatial autoregressive parameters (as in LeSage and Pace, 2009; Parent and LeSage, 2008; LeSage, 1997; Kakamu and Wago, 2008). Oliveira and Song (2008) consider two versions of Jeffreys' prior, called independence Jeffreys and Jeffreys-rule priors for the spatial autoregressive parameter for the case of SARAR(0,1). Jeffreys priors are uninformative improper priors that are constructed from the information matrix. The appealing aspect of these priors is that they are robust to reparametrization of the model. For the case of SARAR(0,1), Oliveira and Song (2008) state the marginal posterior density of autoregressive parameter (up to a constant of proportionality) and use the adaptive rejection Metropolis sampling algorithm to simulate random draws. In a Monte Carlo study, they show that the Bayesian inferences based on the independence Jeffreys prior are slightly better than the inferences obtained based on uniform and Jeffreys-rule priors. The Bayesian estimators mentioned above are designed for the cases where spatial models have homoskedastic disturbances. The treatment of the unknown form of heteroskedasticity in the Bayesian approach starts by assuming a structure for the unknown covariance matrix of disturbances. First, it is assumed that the covariance matrix can be decomposed into a constant component and a component that varies over observations. Second, a prior density function is assigned to each component so that the joint posterior density of the parameters can be constructed. The most important assumption is that the component that varies over observations is assumed to be i.i.d. with a common prior density function. In this approach, a prior density function for the parameter of the common density function is also specified. In the literature, the priors determined in this way are known as hierarchical priors. This feature of the Bayesian approach allows flexible models through which any distribution for the disturbance term can be approximated with a high degree of accuracy (Koop, 2003). Once the prior density functions are determined, the joint posterior density function can be easily constructed, from which a full set of conditional posterior density functions can be obtained to form an MCMC sampling scheme. The random draws (or the simulated sample) are then used to calculate the relevant moments for the estimation. LeSage (1997) is the first study in which the above outlined hierarchical Bayesian approach of unknown heteroskedasticity is introduced for the spatial autoregressive models. Following Geweke (1993); LeSage (1997) assigns an i.i.d. Chi-square distribution as the prior for the elements of the component of the covariance matrix that is assumed to vary over observations. The sampling for the spatial autoregressive parameter is carried out through the ratio of uniform sampling algorithm. LeSage and Pace (2009) use a Metropolis Hasting algorithm based on a tuned random walk procedure to sample from the conditional posterior density function of autoregressive parameters to form the Metropolis within Gibbs sampling procedure for the whole model. Parent and LeSage (2008) consider the hierarchical Bayesian approach for the estimation of a conditional autoregressive (CAR) spatial model, in which the spatial structured random component of the model is assumed to have an autoregressive process. It is of interest to investigate the performance of the aforementioned heteroskedasticity-robust GMM and Bayesian estimators through a Monte Carlo study and applied applications. In this study, we compare the small sample properties of these estimators through a Monte Carlo study.6 In this respect, we make two contributions to the literature. First, this study is the first to compare the mentioned approaches of the unknown heteroskedasticity.7 For the case of SARAR(1,0), the robust 6
To the best of our knowledge, there is no such study in the literature. Anselin (1988, p.254) states promising directions of research and he writes “An assessment of the relative merits of alternative paradigms for the analysis of spatial data, such as parametric analysis vs. nonparametrics, classical probability vs. Bayesian approaches, and exploratory vs. explanatory analysis” as the first direction. The present study is in this first direction as it compares classical probability and Bayesian approaches of the estimation. 7
3
estimation methods from both perspectives are outlined. Second, we provide two empirical examples to shed light on the effect of heteroskedasticity on the parameter estimates. In the following, Section 2 provides model assumptions that are required for the formal study of the autoregressive spatial models. Sections 3 and 4 build the robust GMM approach and the Bayesian hierarchical approach for the case of SARAR(1,0). Section 5 contains the Monte Carlo experiments we use to provide finite sample properties of the estimators. In Section 6, we provide two empirical illustrations. Section 7 concludes. 2. Model specification and assumptions In this study, the following first order SARAR(1,0) specification is considered: Y n ¼ λ0 W n Y n þ X n β0 þ εn ;
ð2:1Þ
where Yn is the n × 1 vector of dependent variable, Xn is the n × k matrix of non-stochastic exogenous variables, Wn is the n × n spatial weight matrix of known constants with zero diagonal elements, and εn is the n × 1 vector of disturbances (or innovations). The variable WnYn is known as the spatial lag of the dependent variable. The spatial effect parameter λ0 is known as the spatial autoregressive parameter. As spatial data are characterized with triangular arrays, the variables in Eq. (2.1) have subscript n.8 Let Θ be the parameter space of the model. In order to distinguish the true parameter vector from other possible values in Θ, the model is stated with the true parameter vector θ0 = (λ0,β0′)′. For the notational simplicity, we denote Sn(λ) = (In–λWn) and 1 Gn(λ) = WnS− n (λ). Also, at the true parameter value λ 0 , we denote S n (λ0 ) = S n and Gn (λ0 ) = Gn . Let Diag(∙) be the operator that creates a diagonal matrix from the diagonal elements of the input matrix. Finally, let ∑n = Diag(σ2n1,…,σ2nn) be n × n diagonal covariance matrix, where σ2ni = E(ε2ni), for i = 1,…,n. Next, assumptions that are required for the asymptotic properties of estimators are elaborated and then their interpretations are considered for the case of SARAR(1,0). Assumption 1. The elements εni of the disturbance term εn are distributed independently with mean zero and variance σ2ni, and E|εin|v b ∞ for some v N 4 for all n and i. This assumption allows independent and heteroskedastic disturbances. The elements of the disturbance term have moments higher than the fourth moment. The existence of moment condition is required for the application of the central limit theorem for the quadratic form given in Kelejian and Prucha (2010) for the GMM estimator. In addition, the variance of quadratic form in εn exists and is finite when the first four moments are finite.9 Finally, Lyapunov's inequality guarantees that the moment less than ν is also uniformly bounded for all n and i. Assumption 2. The spatial weight matrix Wn is uniformly bounded in 1 1 absolute value in row and column sums. Moreover, S− and S− n n (λ) exist and are uniformly bounded in absolute value in row and column sums for all values of λ in a compact parameter space. The uniform boundedness of the terms in Assumptions 1 and 2 is motivated to control spatial autocorrelations in the model at a tractable level (Kelejian and Prucha, 1998).10 Assumption 2 also implies that the model in Eq. (2.1) represents an equilibrium relation for the dependent variable. By this assumption, the reduced form of the model 8
See Kelejian and Prucha (2010). For the variance of a quadratic form in εn, see Lemma 2(3). 10 For a definition and some properties of uniform boundedness see Kelejian and Prucha (2010). 9
4
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
−1 becomes feasible as Yn = S−1 n Xnβ0 + Sn εn. Finally, the statement of Assumption 2 is assumed to hold at the true and arbitrary autoregressive parameter vector. The uniform boundedness of S−1 n (λ) is required for the ML estimator, not for the GMM estimator (Liu et al., 2010). In the literature, the parameter space for spatial autoregressive parameter λ0 is restricted to the interval (−1,1), when spatial weight matrices are row normalized.11 The following assumptions are the usual regularity conditions required for the GMM estimator. Throughout this study, the vector of moment functions considered for the GMM estimator is in the form of g(θ0) = (εn′P1nεn,…,εn′Pmnεn, εn′Qn)′. The moment functions involving n × n constant matrices Pjn for j = 1,…,m are known as quadratic moment functions. The last moment function Qn′εn is the linear moment function, where Qn is an n × k⁎ with full column rank and with k⁎ ≥ k + 1. The matrices Pjns and Qn are chosen in such way that orthogonality conditions of population moment functions are not violated. Let P 2n be the class of n × n constant matrices with zero diagonal elements. The quadratic moment functions involving matrices from this class satisfy the orthogonality conditions when disturbance terms satisfy Assumption 1. Assumption 4 states regularity conditions for these matrices and the last assumption characterizes the parameter space.
Assumption 3. The regressors matrix Xn is an n × k matrix consisting of uniformly bounded constant elements. It has full column rank of k. Moreover, limn→∞ 1n X ′n X n exists and is nonsingular. Assumption 4. Elements of IV matrix Q n are uniformly bounded. Pjn ∈ P 2n for j = 1,…,m is uniformly bounded in absolute value in row and column sums. Assumption 5. The parameter space Θ is a compact subset of ℝ and θ0 ∈ Int(Θ).
In this section, the robust GMM estimator for the case of SARAR(1,0) is reviewed.12 For the case of SARAR(1,0), the reduced form of the −1 model is Yn = S−1 n Xnβ0 + Sn εn. Then, the spatial lag of the dependent 1 −1 variable can be expressed as W n Yn = W n S − n Xnβ0 + WnSn εn = G n X n β 0 + G n ε n . This result indicates that the endogenous variable WnYn in a SARAR(1,0) model is a function of a non-stochastic term GnXnβ0 and a stochastic term Gnεn. This decomposition motivates the formation of the moment functions for the GMM estimation. The nonstochastic part is instrumented by Qn = (GnXnβ0,Xn), which forms the population moment function Qn′εn (Lee, 2003, 2007a). The stochastic part is instrumented by Pjnεn, for j = 1,…,m, where Pjn ∈ P 2n. In this case, the moment function is in the form of εn′Pjnεn (Lee, 2007b; Lee and Liu, 2010; Lin and Lee, 2010; Liu et al., 2010). Under Assumption 1, the orthogonality condition of moment functions hold as shown below: ′ ′ E Q n εn ¼ Q n Eðεn Þ ¼ 0ðkþ1Þ1 ; ′ ′ ′ E εn P jn εn ¼ E tr εn P jn εn ¼ E tr P jn εn εn ′ ¼ tr P jn Σn ¼ 0; ¼ tr P jn E ε n εn where Σn = Diag(σ2n1,…,σ2nn) is the n × n covariance matrix of disturbances. The moment function based on Qn is linear and the one based There are some other formulations for the parameter space of autoregressive parameter, for details see Kelejian and Prucha (2007, 2010), LeSage and Pace (2009) and Elhorst et al. (2012). 12 Lin and Lee (2010) suggest a one-step robust GMM estimator for the case of SARAR(1,0). Doğan and Taşpınar (2013a) extend the one-step robust GMM approach of Lin and Lee (2010) for the case of SARAR(1,1). For a two-step robust GMM estimator for the case of SARAR(1,1), see Kelejian and Prucha (2010). Badinger and Egger (2011) extend the two-step robust estimation of Kelejian and Prucha (2010) to higher order spatial autoregressive models, i.e., SARAR(p,q).
! ′ εn ðθÞðGn −Diag ðGn ÞÞεn ðθÞ ; ′ ðGn X n β0 ; X n Þ εn ðθÞ
g n ðθÞ ¼
ð3:1Þ
where εn(θ) = Sn(λ)Yn − Xnβ. Let Ωn = E[g n(θ0)g n′(θ0)] and Γ n ¼ ∂gn ðθ0 Þ −E . Straightforward calculations show that13 ′ ∂θ Ωn ¼
0 X ′ P P n Σn þ Σn P n @ tr v n
Γn ¼
0 ′ @ tr Σn P n þ P n Gn
0ðkþ1Þ1
′
Q n Gn X n β0
1 01ðkþ1Þ A ; Q ′n Σn Q n
ð3:2Þ
1 01k A ; ′ Q nXn
ð3:3Þ
where Σn = Diag(σ2n1,…,σ2nn) is again the n × n covariance matrix of disturbances. Then, the robust GMM estimator is defined by ^θ ¼ n
^ λ n ^ β
n
k+1
3. Robust GMM estimation of spatial autoregressive models
11
on Pjn is quadratic in εn. For the selection of the quadratic moment matrices Pjn s, the asymptotic efficiency of the GMM estimator is considered. Lee (2007a) shows that when disturbance terms are i.i.d., the best selection of Pjn ∈ P2n is P∗n = (Gn − Diag(Gn)). Lin and Lee (2010) suggest that this selection can also be used for the case where disturbance terms satisfy Assumption 1. Consider the following set of moment functions
′ ^ −1 g ðθÞ; ¼ argminθ∈Θ g n ðθÞΩ n n
ð3:4Þ
^ is a consistent estimate of Ω n . An estimate of Ω n where Ω n ′ ^ ¼ requires estimates of tr Σn P n P n Σn þ Σn P n and Qn′ΣnQn. Let Σ n 2 2 ^ ^ ^ Diag εn1 …; εnn , where εni s are residuals based on a consistent estimator of θ0. Among others, a consistent estimator can be a 2SLS estimator based on the instrument matrix (W 2n X n , W n X n , X n ). Under our stated assumptions, it can be shown that the components of ^ . Ω can be consistently estimated by simply replacing Σ with Σ n
n
n
The fact that quadratic moment matrices are uniformly bounded in absolute value in row and column sums and the fact that distur1
bances terms have uniformly bounded fourth moments ensure that n ^ þΣ ^ P P ′ Σ ^ P −1 tr Σ P P ′ Σ þ Σ P ¼ o tr Σ ð 1 Þ. The asn n n n n n n n n n n n p n
1 ′^ 1 ′ ymptotic argument for the remaining component Q n Σ n Q n − Q n Σn n n Q n ¼ op ð1Þ is in line with the argument given by White (1980).14 Under Assumptions 1–5, Lin and Lee (2010) show that the robust GMM estimator defined in Eq. (3.4) is consistent and asymptotically normally distributed, namely15 −1 pffiffiffi 1 ′ −1 d n ^θn −θ0 Þ → N 0; lim Γ n Ωn Γ n : n→∞ n
An estimate of the asymptotic covariance matrix of
ð3:5Þ pffiffiffi^ n θn −θ0 is
needed to make asymptotically valid inferences and to construct asymptotically correct confidence regions. Lin and Lee (2010) show 13
Lemma 2 in Appendix A can be used to derive Ωn matrices in this section. The same argument also holds for Γn. That is a consistent estimate of Γn can be obtained ^ . by simply replacing Σn with Σ n 15 The identification of the parameters in a GMM framework requires limn→∞ 1n Eðg n ðθ0 ÞÞ ¼ 0 (Newey and McFadden, 1994). Lin and Lee (2010) investigate this condition and state the identification conditions for the parameters. Here, we simply assume that the parameters are identified. 14
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
that a consistent estimator is
1 ^′ ^ −1 ^ −1 Γ n Ωn Γ n , which is constructed from n
a consistent estimator of θ0. Note that Pn∗ and Q n also involve unknown parameters. In practice, an initial consistent estimator can be used to construct these matrices. For example, an initial estimator can be a GMM estimator based on the moment functions involving P+ ′Wn − Diag(Wn′Wn) and Q + n = Wn n = (W2nXn, WnXn, Xn).16 4. Bayesian MCMC estimation of spatial autoregressive models The model in Eq. (2.1) under Assumption 1 has an unknown form of heteroskedasticity. For Bayesian estimation, we assume that Σn = σ02Vn, where Vn = Diag(v1,…,vn) is a diagonal n × n matrix with σ2 the diagonal elements vi ¼ ni2 for i = 1,…,n. This assumption σ0 indicates that the heteroskedasticity specification has two components: (i) a constant component σ02, and (ii) a component vi that varies over observations. The constant component is an arbitrary term that facilitates the Bayesian analysis.17 In this section, we review the Bayesian MCMC approach for the spatial autoregressive model. Let v = (v1,…,vn)′ be an n × 1 vector. Then, the likelihood function of this model can be written as − n 2 2 −1=2 ∏ vi L Y n jθ; σ ; ν ¼ ð2πÞ σ jSn ðλÞj i¼1 ( ) −1 1 ′ 2 exp − ðSn ðλÞY n −X n βÞ σ V n ðSn ðλÞY n −X n βÞ : 2 −n 2
1 1 r ~ G(m,k), (iv) λ∼U − ; , where U(∙) denotes uniform distribuτ τ tion function, and τ is the spectral radius of Wn.18 All prior distribution functions are assumed to be independent. The r prior distribution of is considered by Geweke (1993). A non-spatial vi linear regression model with covariance matrix σ20Vn, where vis are independently and identically distributed, corresponds to a linear regression model with disturbances having a scale mixture of normal distribution (Geweke, 1993; Chib, 2001). In particular, Geweke (1993) shows that the posterior distribution function of the parameter of a non-spatial linear regression model with Y ni ∼ N(X ni′β 0 , σ 20 v i ), r where ∼ i.i.d. χ2(r) for i = 1,…,n is proportional to the posterior vi distribution function of a non-spatial linear regression model with Yni ∼ t(Xni′β0, σ02, r), where t(∙) denotes the univariate Student-t distribution with mean Xni′β0, variance parameter σ20, and degrees-offreedom parameter r.19 The prior density functions are given in the following equations. −k=2
pðβÞ ¼ ð2πÞ
It will be convenient to write the exponent of the above likelihood function in terms of generalized least squares (GLS) quantities. To this end, for a given value of λ, the GLS estimator from the regression −1 e ¼ X ′ V −1 X n X ′n V −1 Sn(λ)Yn = Xnβ + εn is β n n n n Sn ðλÞY n . Let SSE ¼ ′
−1 e e be the weighted residual σ 2V n Sn ðλÞY n −X n β Sn ðλÞY n −X n β n n sum of squares. Then, the exponent of the likelihood function can be written as −1 1 ′ 2 exp − ðSn ðλÞY n −X n βÞ σ V n ðSn ðλÞY n −X n βÞ 2 ( !) 1 e ′ X ′ σ 2 V −1 X e SSE þ β−β β− β ¼ exp − : n n n n n 2
jT j
−1=2
n o ′ −1 exp ðβ−cÞ T ðβ−cÞ ;
a b 2 −ðaþ1Þ b 2 σ p σ ¼ exp − 2 ; Γ ðaÞ σ
ð4:4Þ ð4:5Þ
pðν Þ ¼
r nr=2 h r i−n n r −ðrþ2Þ=2 Γ ∏ vi exp − ; 2 2 2vi i¼1
ð4:6Þ
pðr Þ ¼
n ro 1 1 m−1 r exp − ; for r N 0; and m; k N 0; km Γ ðmÞ k
ð4:7Þ
n 2
ð4:1Þ
5
8 <τ ; pðλÞ ¼ 2 : 0
1 1 if λ∈ − ; τ τ otherwise:
ð4:8Þ
Note that p(v) is the joint prior distribution function of vi for i = 1,…, n.20 In the formulation of prior density functions, Γ(∙) denotes Gamma a − 1 −μ function defined by Γ(a) = ∫ ∞ e dμ. The uniform prior distribu0 μ tion for the autoregressive parameter in Eq. (4.8) indicates that all values 1 1 in the interval − ; are equally probable, where τ is the spectral radius τ τ
ð4:2Þ
of Wn. Parent and LeSage (2008) introduce an alternative prior distribution function for λ based on Beta function. In this formulation ⋆
d−1
d−1
1 ð1 þ λÞ ð1−λÞ Betaðd; dÞ 22d−1 d−1 d−1 Γ ðdÞΓ ðdÞ ð1 þ λÞ ð1−λÞ ¼ ; 2d−1 Γ ð2dÞ 2
p ðλÞ ¼
Thus, the likelihood function in Eq. (4.1) can be written in terms of GLS quantities in the following way: −n n −n 2 2 −1=2 LðY n jθ; σ ; νÞ ¼ ð2π Þ 2 σ 2 ∏ vi jSn ðλÞj i¼1 ( !) 1 e e ′ X ′ σ 2 V −1 X : β−β exp − SSE þ β−β n n n n n 2
ð4:3Þ Following Geweke (1993) and LeSage and Pace (2009), we assume the following prior distributions for the parameters of SARAR(1,0): (i) β ~ N(c,T), (ii) σ 2 ~ IG(a,b), where IG(a,b) is the inverse gamma density function with the shape parameter a and r the scale parameter b, (iii) ∼ i.i.d. χ2(r), for i = 1,…,n, where the vi degrees of freedom parameter r has gamma distribution, namely 16 Note that P+ n along with Wn is considered by Kelejian and Prucha (2010) for a two-step GMM estimation method for a SARAR(1,1) model. 17 For the case of non-spatial models, see Koop (2003) and Griffiths (2007).
ð4:9Þ
where Beta(d,d) = ∫ t0 μ d − 1 (1 − μ) d − 1 dμ is the standard Beta function. Parent and LeSage (2008) show that when d is close to unity and λ ∈ (− 1,1), p⋆(λ) produces uninformative prior over the interval (−1,1) such that the end-points get zero prior weight, which is consistent with the parameter space of λ.21 18 Note that the parameter space of autoregressive parameter is ð−1τ; 1τÞ. For a proof, see Kelejian and Prucha (2007). 19 Thus, t(Xni′β0, σ20, r) can be treated as a mixture of normal and chi-square distributions. For the multivariate version, see Liu and Rubin (1995). 20 The density of vi a can be obtained from that of r/vi. To this end, for any r, let X ¼ vri with density function PX(∙). Define Y = r/X = v i, then the derived density function
. The straightforward calculation shows that pðvi Þ ¼ of Y is given by pY ðyÞ ¼ pX ðr=yÞ dX dY n o − r=2 −1 ½2r ½Γ ðr=2Þ vi exp −2vr . Let G(∙) denote the gamma distribution function. Note ðrþ2Þ 2
i
2
that 1/v i ~ χ (r)/r. Since χ 2 (r)/r = G(r/2,2/r), we have 1/v i ~ G(r/2,2/r) which implies that v i ~ IG(r/2,r/2). 21 Note that, the parameter space of autoregressive parameters is (–1,1), when the weight matrices are row normalized. If weight matrices are not row normalized, then (In–λ0Wn) can be singular for certain values from the interval (–1,1).
6
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
Let p(β,σ 2 ,v,r,λ|Y n ) be the joint posterior density function of parameters. Bayes' Theorem indicates that the joint posterior density is proportional to the product of the likelihood function and prior densities. Under the assumption that prior distributions of parameters are independent, the joint posterior density can be written as 2 2 2 p β; σ ; ν; r; λjY n ∝LðY n jθ; σ ; v1 ; …; vn ÞpðβÞp σ pðνÞpðr ÞpðλÞ − n 2 −1=2 ∝ð2πÞ σ ∏ vi jSn ðλÞj i¼1 1 e e ′ X ′ σ 2 V −1 X β−β exp − SSE þ β−β n n n n n 2 n o −ðaþ1Þ b −1=2 ′ −1 2 jTj exp ðβ−cÞ T ðβ−cÞ σ exp − 2 σ r nr=2 h r i−n n n ro r −ðrþ2Þ=2 m−1 pðλÞ: r Γ ∏ vi exp − exp − 2 2 2v k i¼1 i −n 2
n 2
ð4:10Þ
The MCMC approach requires a full set of conditional posterior density functions of parameters to form an algorithm through which random draws (or simulated sample) can be obtained. The simulated sample obtained from the conditional posterior densities is a Markov chain with the property that its limiting distribution is the joint posterior distribution (Albert and Chib, 1993; Casella and George, 1992; Chib, 2001; Tierney, 1994). Conditional on the other parameters of the model, the conditional posterior density of β can be obtained by collecting all terms in Eq. (4.10) that are not multiplicatively separable from components that include β. Let p(β|σ2,v,r,λ,Yn) be the conditional posterior density function of β. Then, 1 2 e e ′ X ′ σ 2 V −1 X β−β pðβ jσ ; ν; r; λ; Y n Þ∝exp − SSE þ β−β n n n n n 2 ′ −1 þðβ−cÞ T ðβ−cÞ :
ð4:11Þ By completion of square (see Lemma 1 in Appendix), the last two terms of the exponent in Eq. (4.11) can be written as
e ′ X ′ σ 2 V −1 X e þ ðβ−cÞ′ T −1 ðβ−cÞ β−β β−β n n n n n −1 ⋆ ′ ′ 2 −1 ⋆ Xn σ V n Xn þ T β−β ¼ β−β −1 −1 −1 ′ 2 e −c ; e −c −1 X σ V X þ T β þ β n n n n n
ð4:12Þ
2 −1 −1 2 −1 ) (Xn′V−1 c). Using where β⋆ = (Xn′V−1 n Xn + σ T n Sn(λ)Yn + σ T Eq. (4.12), the conditional posterior density of β can be written as
−1 1 2 ⋆ ′ ′ 2 −1 ⋆ ; β−β pðβjσ ; ν; r; λ; Y n Þ∝exp − β−β Xn σ V n Xn þ T 2
Now, we return to joint posterior density in Eq. (4.10) to determine the conditional posterior distribution of σ2. By collecting all terms in Eq. (4.10) that are not multiplicatively separable from the components that include σ2, the conditional posterior density of σ2 can be obtained as follows: ( ) −
1 2 2 e Þ′ X ′ σ 2 V −1 X Þ β−β e p σ β; ν; r; λ; Y n Þ∝ σ exp − SSE þ β−β n n n n n 2 −ðaþ1Þ b 2 exp − 2 σ σ ( ) − −ðaþ1Þ 1 ðS ðλÞY n −X n β Þ′ V −1 2 n ðSn ðλÞY n −X n β Þ þ 2b exp − 2 n ∝ σ : 2 σ n 2
n 2
ð4:15Þ The comparison of the above kernel with that of an inverse gamma density function indicates that the conditional posterior distribution of σ2 is an inverse gamma distribution with the shape parameter a⋆ ¼ n2 þ a ⋆
and with the scale parameter b ¼ Thus, 2 ⋆ ⋆ σ jβ; ν; r; λ; Y n ∼IG a ; b :
−1 −1 2 ⋆ ′ 2 −1 Xn þ T βjσ ; ν; r; λ; Y n ∼N β ; X n σ V n :
ð4:14Þ
ð4:16Þ
In the same way, the conditional posterior distribution of v can be obtained by ignoring all terms in the joint posterior density function that are not related to v. Thus, n −1 1 2 −1=2 ′ 2 p νjβ; σ ; r; λ; Y n ∝∏ vi exp − ðSn ðλÞY n −X n βÞ σ V n ðSn ðλÞY n −X n βÞ 2 i¼1 n
−ðrþ2Þ=2
∏ vi i¼1
ð4:17Þ
r exp − : 2vi
For notational simplification, let en = Sn(λ)Yn–Xnβ. Then ðSn ðλÞY n −
−1
−1 e2 n ðSn ðλÞY n −X n βÞ ¼ e′n σ 2 V n en ¼ ∑ j¼1 2nj , where enj X n βÞ′ σ 2 V n σ vj
is the jth element of en. Then, Eq. (4.17) can be written as 8 9 2
The above result shows that the conditional posterior density of each vi can be written as
ð4:13Þ where the terms that are multiplicatively separable from β are absorbed into the constant of proportionality. The kernel of p(β|σ2, ν, r, λ, Yn) in Eq. (4.14) is proportional to the kernel of a multivariate normal density with mean β⋆ and covariance (Xn′(σ2Vn )− 1Xn + T− 1)− 1. Thus,
ðSn ðλÞY n −X n β Þ′ V −1 n ðSn ðλÞY n −X n β Þ þ 2b . 2
pðvi jβ; σ
2
9 8 <− σ −2 e2ni þ r = ; for i ¼ 1; …; n: ; 2vi
−ðrþ3Þ=2 ; r; λ; Y n Þ∝vi exp :
ð4:19Þ
The functional form in the right hand side of Eq. (4.19) does not correspond to any known form of density function. Geweke (1993, p.S26) suggests a method through which the conditional posterior distribution of v i can be determined. Let Ψ ¼
−2 2
σ eni þ r . vi
Then, the
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
result in Eq. (4.20) implies that the density function of Ψ is proportional to Ψ(r − 1)/2exp{− Ψ/2}, which implies22 σ −2 e2ni þ r
2 2
β; σ ; r; λ; Y n ∼χ ðr þ 1Þ; for i ¼ 1; :::; n: 2vi
ð4:20Þ
Now, we return to conditional posterior distribution of the hyperparameter r. By collecting all terms that are not multiplicatively separable from the components that include r from joint posterior density yields, r nr=2 h r i−n n r ðk þ 2vi Þ 2 m−1 −r=2 pðr jβ; σ ; ν; λ; Y n Þ∝ r Γ ∏ vi exp − : 2 2 2kvi i¼1 ð4:21Þ The above density does not correspond to a standard distribution, therefore a Metropolis Hasting type algorithm can be employed to obtain random draws. Finally, the conditional posterior distribution of λ takes the form: −1 1 2 ′ 2 pðλjβ; σ ; ν; r; Y n ∝jSn ðλÞjexp − ðSn ðλÞY n −X n βÞ σ V n ðSn ðλÞY n −X n βÞ : 2
ð4:22Þ The uniform prior density for λ, which is absorbed in the proportionality constant is used for the result in Eq. (4.22). For the case where prior takes the form suggested by Parent and LeSage (2007), the conditional posterior distribution of λ takes the following form: −1 1 2 ′ 2 pðλjβ; σ ; ν; r; Y n Þ∝jSn ðλÞjexp − ðSn ðλÞY n −X n β Þ σ V n ðSn ðλÞY n −X n β Þ 2 1 ð1 þ λÞd−1 ð1−λÞd−1 : Betaðd; dÞ 22d−1
ð4:23Þ
Both conditional posterior densities in Eqs. (4.22) and (4.23) are not in the form of a known density. Like hyper-parameter r, sampling for λ can be accomplished through the Metropolis–Hasting approach.23 4.1. Bayesian MCMC computation In Bayesian analysis, the joint posterior distribution of parameters embodies all information about the parameters. A cost (loss) function is specified as a criterion for determining an optimal point estimate. Let ζ = (λ,β′,σ 2 ,ν,r)′ be the parameter vector. One of the most popular function is loss ′ the weighted squared error function, C ζ^ ; ζ ¼ ζ^ −ζ Q ζ^ −ζ , where Q is a positive definite matrix. Then, the Bayesian point estimate is defined by h i ζ^ ¼ argminζ^ Eζ jY n C ζ^ ; ζ Z ′ ζ^ −ζ Q ζ^ −ζ pðζ jY n Þdζ ; ¼ argminζ^
ð4:24Þ
Θ
where p(ζ|Y n ) is the posterior density of ζ. The solution of above minimization problem yields ζ^ ¼ Eðζ jY n Þ ¼ ∫Θ ζpðζ jY n Þdζ , which is the mean of the posterior distribution of ζ (Koop et al., 2007, p.34). The closed form solution for ζ^ from the joint posterior 22
This result can be seen through the derived density function of Ψ. Since dv ¼ −ð Ψ Þ, dΨ n o −2 2
σ e þr Þ ðr−1Þ=2 ðσ e þrÞ 1 Ψ exp − σ e þ r ∝Ψ exp −Ψ=2 f g. The ni 2 Ψ ðΨÞ ðσ e þrÞ
then pðΨÞ∝ð
i
−ðrþ3Þ=2 −2 2 ni −ðrþ3Þ=2
−2 2 ni
−2 2 ni 2
kernel of p(Ψ) implies Ψ ~ χ2(r + 1). 23 The Metropolis-Hasting algorithm is described in Section 4.1.
σ −2 e2ni þr 2
7
density functions in Eq. (4.10) can not be obtained. In the MCMC approach, a Markov chain is constructed from the conditional posterior distributions so that a large number of a simulated sample of parameters can be obtained. The mean of the simulated sample is then used as an estimate of E(ζ|Y n ) (Casella and George, 1992; Chib, 2001). The previous section shows that the joint posterior densities of spatial autoregressive models can be decomposed into conditional posterior density functions of parameters through which random draws can be obtained. When conditional posterior distributions are known, the Gibbs sampling method can be used to draw from conditional distributions. As the conditional posterior distributions of β, σ2 and vi are in the form of known distributions, random draws can be easily obtained through the Gibbs sampling method for these parameters. Unfortunately, the conditional posterior distributions of the autoregressive parameter and hyper-parameter r do not correspond to known distributions. The sampling for these parameters can be accomplished through a Metropolis–Hasting algorithm. As a result, the sampling scheme for spatial autoregressive models consists of the combination of the Gibbs sampling and the Metropolis–Hasting algorithm. This whole procedure of sampling is called Metropolis within Gibbs sampling (Geweke and Keane, 2001). The Metropolis–Hasting sampling method requires a proposal distribution (or candidate generating distribution) from which the candidate values for the autoregressive parameters can be generated. LeSage and Pace (2009) and Kakamu and Wago (2008) use a normal distribution as the proposal distribution with a tuned random-walk procedure.24 According to this method, the candidate values for λnew are generated by
λ
new
¼λ
old
þ c ϕ;
where
ϕ∼Nð0; 1Þ:
ð4:25Þ
The constant parameter c in Eq. (4.25) is called tuning parameter, which ensures that the sampler moves over the entire conditional posterior distributions of the autoregressive parameters. The increment random variable denoted by ϕ in Eq. (4.25) is symmetric so that the acceptance probability value for the candidate value λnew takes the following form: ( )
p λnew β; σ 2 ; ν; r; Y n Þ new old ¼ min 1; old ; p λ ;λ p λ β; σ 2 ; ν; r; Y n Þ
The new candidate λnew is accepted with a probability of p(λnew,λold). The tuning parameter or the spread of the candidate generating density affects the behavior of the chain in at least two ways: (i) it affects the acceptance rate of new candidate values through acceptance probability, (ii) it also affects the region where the new candidate values are sampled. To illustrate the role of the tuning parameter, consider a chain that has converged so that the new candidates are sampled around the mode of the marginal posterior densities of autoregressive parameters. If the tuning parameter is chosen to be large to allow the new generated candidates to be far from the current value (or mode), then the new candidates will have low probability of being accepted. In that case, the chain may get stuck at the current value (the candidates will never be accepted). Reducing the tuning parameter will correct this problem. On the other hand, a small tuning parameter value will reduce the chance of getting candidates that are far away from the current value. In this case, it will take longer time for the chain to traverse the support of the density, as a result, low probability regions of the density support will be under-sampled. These two issues raise the question of 24
Chib (2001) calls this method as random walk M–H chain.
8
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
Disturbance terms 10 5 0 −5 −10 −15
0
20
40
60
80
100
80
100
Group variances 15
10
5
0
0
20
40
60
a) Group variance sand disturbance terms of true data generating process Estimates of variance components: ν
Estimates of variance components: ν
i
15
15
r=5
r=5
r=6
r=10
r=7
r=50
10
10
5
5
0
0
20
40
60
80
i
100
b) Posterior mean of variance components ( )
0
0
20
40
60
80
100
c) Posterior mean of variance components ( )
Fig. 1. The effect of hyper-parameter r on the estimates of variance components vis. (a) Group variances and disturbance terms of true data generating process. (b) Posterior mean of variance components (vi). (c) Posterior mean of variance components (vi).
how to choose the optimal tuning parameter. For our case, the literature has suggested a tuning parameter that can result in approximately 0.5 acceptance rate (Chib, 2001; Chib and Greenberg, 1995). In practice, the tuning parameter c is adjusted in such way that the acceptance rate falls between 40% and 60%.25 LeSage and Pace (2009) consider a more efficient sampling procedure known as the Griddy Gibbs sampler for the autoregressive parameter. This procedure involves a numerical integration technique to calculate the normalizing constant and the related cumulative distribution function (CDF) of the autoregressive parameter. Then, the random draws are taken from the CDF by an inversion approach in each pass
through the sampler. In the Monte Carlo study, we compare the performance of this approach with the M–H random walk procedure. Finally, the elicitation of the parameters of the prior distributions for β and σ2 is required. The prior distributions for β can be made relatively diffuse by assigning a large prior variance. This can be accomplished by setting c = 0 and T = Ik × 10 5 . The conditional posterior distribution of σ2 is an inverse Gamma distribution. The analytical results provided for the mean of this distribution indicate that a relatively diffuse prior for σ 2 can be obtained by assigning values of a = 0 and b = 0.26
25 In practice, c can be set 0.5 at the initial step. LeSage and Pace (2009) suggest that c′ = c/1.1 for the case where the acceptance rate falls below 40%, and c′ = 1.1 for the case where the acceptance rate rises above 60%. This can be accomplished through a while loop in the algorithm.
26 For the case of SARAR(1,0), we show that σ2|β, ν, r, λ, Yn ∼ IG(a⋆ b⋆), which is given in
Eq. (4.17). Then, the conditional posterior mean E σ 2 jβ; ν; r; λ; Y n ¼ a b−1. The choice of a = 0 and b = 0 eliminates the effect of the prior distribution on the conditional posterior mean. ⋆
⋆
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
For illustration, a single pass through the Metropolis within Gibbs sampling scheme consists of the following steps: 1. Let ζ(0) = (λ(0),β′(0),σ2(0),ν(0),r(0))′ be the initial parameter values. 2. Update β: Draw β(1) from p(β|λ(0),σ2(0),v(0),r(0),Yn). 3. Update σ2: Draw σ2(1) from p(σ2|λ(0),β(1),r(0),Yn). σ −2 e2ni þ r ð0Þ ð1Þ from p jλ ; β ; σ 2ð1Þ ; r ð0Þ ; Y n for 4. Update v: Draw v(1) i vi ; σ 2ð1Þ ; r ð0Þ ; Y n Þ for i ¼ 1; …; n. new (0) 5. Update λ: Calculate λ = λ +(c × ϕ, ) p λnew jβ ð1Þ ; σ 2ð1Þ ; νð1Þ ; r ð0Þ ; Y n , (a) Calculate p λnew ; λð0Þ ¼ min 1; 0 ð1Þ p λ jβ ; σ 2ð1Þ ; νð1Þ ; r ð0Þ ; Y n (b) Draw U from Uniform(0,1), (1) new new (0) (c) Set λ = λ if p(λ ,λ ) N U, otherwise λ(1) = λ(0). The sequence {ζ (0) ,ζ (1) ,…,ζ (M) } obtained from the Metropolis within Gibbs sampling form a Markov chain whose probability density function converges to the joint posterior density p(ζ|Yn) as M goes to infinity (Chib, 2001; Tierney, 1994).27 To discard the effect of the initial value ζ(0) on the chain, iterates from some initial periods are usually excluded from the chain (burn-in repetitions). Let h be a function defined on the parameter space of the model. Under a suitable law of large numbers for Markov chains, it can be (j) shown that M− 1 ∑M j = 1 h(ζ ) → ∫ Θ h(ζ)p(ζ|Y n )dζ as M → ∞. Note that the convergence is established in terms of the simulated sample size M and not in terms of the data sample size n (Chib, 2001; Geweke and Keane, 2001; Hoogerheide et al., 2009). Thus, posterior moments such as mean and second moment can be easily estimated by taking h(ζ) = ζ and h(ζ) = ζζ′, respectively. The hyper-parameter r is not updated in the above illustrated Metropolis within Gibbs sampling scheme.28 The sampling from the conditional posterior density of the hyper-parameter r can be carried out with an argument similar to the one described above for λ. The hyper-parameter r plays a central role in generating heteroskedasticity robust estimates. In the preceding section, the prior of vi is assumed to be IG(r/2,r/2). Then, the prior mean and variance of vi are respectively given as
r r−2
and
2r 2 . r 3 −8r 2 þ 20r−16
The prior variance approaches to zero
and the prior mean approaches to 1 as r goes to infinity. This observation implies that vi converges in probability to its mean 1 as r → ∞. Thus, higher values of r are associated with a homoskedastic assumption and lower values imply a heteroskedastic assumption.29 In our Monte Carlo simulation we set r = 5 as the data generating process is assumed to be heteroskedastic. The hyper-parameter affects estimates of variance components v i s through the conditional posterior distribution given in Eq. (4.20). Fig. 1 illustrates the effect of this parameter on the posterior mean of v for a sample size of n = 100 for the case of a heteroskedastic SARAR(1,0) specification. For this illustration, the spatial weight matrix Wn is based on a small group interaction scenario, where group sizes are random draws from the interval (3,20).30 Fig. 1(a) shows true group variances and the resulting disturbance terms. The other
27 Theorem 1 of Tierney (1994) and Theorem 2 of Chib (2001) give conditions under which the convergence holds. Theorems 4–5 in Geweke (1993, p.S28–S29) show that these conditions hold for a non-spatial linear regression model. Therefore, it can be shown that these conditions also hold for linear spatial models. 28 Following LeSage and Pace (2009), we assume that this parameter is known. In the Monte Carlo design, we set r = 5. 29 Geweke (1993) shows that in the case where uninformative priors are assumed, the second moment of joint posterior density exists only if r N 4. For our case, the uninformative priors (uniform distributions) are assumed for the autoregressive parameters, therefore the lowest value that r can take is bigger than 4. LeSage and Pace (2009) suggest a value of r between 2 and 7 as an optimal choice for the estimation of heteroskedastic models. 30 The block diagonal weight matrix is row normalized. Members of a group have the same variance. For each group, if the group size is greater than 10, we set the variance equal to the group size; otherwise we set the variance to the square of the inverse of the group size. This setup is similar to the one in Lin and Lee (2010).
9
plots in Fig. 1 picture the effects of various r values on the estimates of variance components v i . Fig. 1(b) indicates that the estimates are similar for r = 5,6,7. On the other hand, Fig. 1(c) shows that the r-values bigger than 10 give estimates close to unity, suggesting a homoskedastic assumption. This example indicates that a hyperparameter value near 5 is an optimal choice for the estimation of heteroskedastic models. 5. Monte Carlo experiments In the previous sections, we provided analytical results for the estimators. The conditional posterior distribution of the parameters indicates that the Bayesian estimators resemble to the generalized least square (GLS) estimators such that the effect of outliers is downweighted in the estimation. In the robust GMM approach, the sample moment functions are weighted with the inverse of their covariances so that they are made as close as possible to zero. Despite these analytical results, it is difficult to make any general statement about the performance of estimators in finite samples. Hence, to evaluate the finite sample properties of estimators, we resort to a Monte Carlo simulation. 5.1. Design In this section, the small sample properties of the ML estimator, the robust GMM estimator and the Bayesian estimator are compared through Monte Carlo simulation experiments. The model is specified as Y n ¼ λ0 W n Y n þ X n β0 þ εn :
ð5:1Þ
There are two regressors and no intercept term such that Xn = [xn,1, xn,2] and β0 = (β10,β20)′ = (1,1)′, where xn,1 and xn,2 are n × 1 vectors of variables. We use the date set described in Kelley Pace and Barry (1997) on the votes cast in the 1980 presidential election across 3107 U.S. counties. We use the normalized version of the income per-capita and the homeownership from this data set for xn,1 and xn,2, respectively. These variables are normalized by subtracting the corresponding sample averages, and then dividing by the sample standard deviations. The first n observations of these variables are used in the Monte Carlo experiment. The spatial autoregressive parameter λ0 takes values from the set K = (−0.9,−0.6,−0.3,0,0.3,0.6,0.9) in order to allow for weak and strong spatial interactions. The row normalized spatial weight matrices are based on the interaction scenario described in Arraiz et al. (2010) and Drukker et al. (2013). In this interaction scenario, units are distributed across four quadrants of a space in a such way that the proportions of number of units in the northeast quadrant to the total number of units are allowed to vary by design. It is interesting to see the results for high density and low density cases. In a high density case, the units in the northeast portion are closer to each other. As stated in Arraiz et al. (2010), the location specification of units in this interaction scenario reflects the distribution of the U.S. states over space, where the northeastern portion contains relatively more states. The location of each unit across the space is determined by the xy-coordinates over a square grid. Let m and m be two integers. Then, the units in the northeast quadrant of the space have discrete coordinates satisfying m þ 1 ≤x ≤m and m þ 1≤y≤m, with an increment value of 0.5. For the other quadrants, the location coordinates are integers satisfying 1 ≤ x ≤ m, 1≤y≤m, and 1≤x ≤m, 1 ≤ y ≤ m. The distance dij between any two units i and j, located at (x1,y1) and (x2,y2) is measured by the Euclidean distance given by h i 2 2 1=2 dij ¼ ðx1 −x2 Þ þ ðy1 −y2 Þ :
10
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
Then, the (i,j) th element of the weight matrix is defined by wij ¼
is generally inconsistent. The log-likelihood function under the assumption that the disturbance terms are i.i.d. normal(0, σ 20 ) is given by
1 if 0 ≤dij ≤1; 0 otherwise:
n n 2 1 ′ 2 ln Ln θ; σ ¼ − lnð2πÞ− ln σ þ lnjSn ðλÞj− 2 εn ðθÞεn ðθÞ; ð5:4Þ 2 2 2σ
We generate heteroskedasticity by using the number of neighbors a unit has. Let hi be the number of neighbors for unit i. Then, the ith element εin of the innovation vector is generated in the following way ε ni ¼ σ ni ξni ; σ ni ¼ c Xn
n
hi
j¼1
h j =n;
ð5:2Þ
′ −1 E ε′n S−1 n Sn ε n 2 R ¼ 1− ′ ′ −1′ −1 β′0 X ′n S−1 S−1 n X n β 0 þ E ε n Sn Sn ε n −1′ −1 tr Sn Sn Σn : ¼ 1− ′ −1′ −1 β′0 X ′n S−1 S−1 n X n β 0 þ tr Sn Sn Σn
ð5:3Þ
5.2. Simulation results The simulation results are presented in Appendix B. In each table, the empirical mean (mean), the bias (Bias), the empirical standard error (SD) and the root mean square error (RMSE) of parameter estimates are presented near to each other for easy comparisons. Before presenting the simulation results, a couple of points for the MLE should be considered. Theoretically, the MLE of λ 0 For details, see Arraiz et al. (2010). For the RGMME, we use two initial estimators: (i) the2SLS estimator formulated from the instrument matrix (W 2n X n , W n Xn , X n ), and (ii) the initial GMM estimator based on Qn = (W2nXn, WnXn, Xn) and Pn = (Wn′Wn − Diag(Wn′Wn)). Both initial estimators produce similar results, therefore we only report the results based on the initial GMM estimator.
ð5:5Þ
The MLE of λ0 is the extremum estimator obtained from the maximization of the concentrated log-likelihood function. The necessary condi^ n requires plimn→∞ ∂lnLn ðλÞ ¼ 0. Lin and Lee tion for the consistency of λ ∂λ
(2010) show that this condition is not satisfied in the presence of heteroskedastic disturbances. More specifically, the condition is
cov Gn;ii ; σ 2ni σ2
þ op ð1Þ ,
∂lnLn ðλÞ ∂λ
¼
where G n,ii s are diagonal elements of G n and
^ n depends σ ¼ Hence, the inconsistency of the MLE λ on the covariance between the diagonal elements of Gn and the variance terms σ2nis. For certain spatial weight matrices, the diagonal elements of G n are the same, implying a zero covariance with the individual variances. For a circular world weight matrix with equal diagonal elements that relate each unit to the units in front and in back, there is no variation in the diagonal element of G n . In this case, the necessary condition is not violated, even if the disturbance terms are heteroskedastic. ^ n contaminates the MLE of β 0 , The inconsistency of λ ^ n . The MLE of β0 has an asymptotic bias since it is a function of λ −1 ^n X′ Xn X ′ Gn X n β0 . 33 For our Monte Carlo setup, there of λ0 −λ n 2
The Monte Carlo experiments are based on the combinations ðc; m; mÞ ¼ fð10; 5; 15Þ; ð10; 7; 21Þ; ð10; 14; 20Þ; ð10; 20; 28Þg, where the first two combinations yield high density cases whereas the last two combinations yield the low density cases in the northeastern portion.31 These combinations produce the corresponding sample sizes of n = {486,974,485,945}. In all experiments, the following estimators are considered: (1) the Gaussian maximum likelihood estimator (MLE), (2) the robust generalized method of moment estimator (RGMME) of (3.7),32 (3) the Bayesian MCMC estimator (BE1), where the Griddy Gibbs sampling method is used for λ, and (4) the Bayesian MCMC estimator (BE2), where the random-walk Metropolis Hasting algorithm is used for λ. For each specification, the Monte Carlo experiment is based on 1000 repetitions. For the BEs, we run the MCMC algorithm 6000 times and discard the first 1000 iterations.
32
^ ðλÞ. Concentrating out β and σ2 from the log-likelihood yields Y n −X n β n n n 2 ^ n ðλÞ þ lnjSn ðλÞj: ln Ln ðλÞ ¼ − lnð2π Þ− ln σ 2 2
where σni is the standard error for the i-th observation and ξni's are i.i.d. Normal(0,1). Here, the noise variance in a location is a function of a ratio of neighbors of that location to the average number of neighbors in the model. The idea is that having relatively larger number of neighbors in a neighborhood implies a larger noise variance. The constant c adjusts the signal–noise ratio of the model. For example, c = 10 yields an approximate signal–noise ratio of 0.55. We calculate the signal–noise ratio by the R2 statistic suggested in Pace et al. (2012):
31
where εn(θ) = Sn(λ)Yn–Xnβ. The solution of the first order conditions ^ ðλÞ ¼ X ′ X −1 X ′ S ðλÞY for β and σ2 yields the ML estimators β n n n n n n 2 1 ′ ^ ^ ^ ^ n ðλÞ ¼ ε n λ; βn ðλÞ εn λ; βn ðλÞ , where εn λ; βn ðλÞ ¼ Sn ðλÞ and σ
n ∑i¼1 σ 2ni .
1 n
is variation in the diagonal elements of G n , so the MLE of λ 0 and β0 is expected to produce inconsistent estimates. Table 3 shows the simulation results for ðc; m; m; nÞ ¼ ð10; 5; 15; 486Þ. The MLE imposes substantial amount of bias on λ0 for the case where there is negative strong spatial dependence relative to the case of strong positive spatial dependence. The same pattern shows itself for the BE estimates. The RGMME imposes relatively smaller bias on λ0 in all cases. In cases where there is a positive spatial dependence, the MLE and BEs underestimate λ0, while the RGMME overestimates. In terms of finite sample efficiency measured by RMSE, the BEs of λ0 are more efficient. Table 3 also reports the estimation results for β10 and β20. Theoretically, the bias reported for λ0 also contaminates the estimates of β10 and β20 for the case of the MLE (Lin and Lee, 2010). The results for β10 and β20 are consistent with this theoretical prediction. The MLE of β10 and β20 has more than 5% bias in many cases. In terms of finite sample efficiency and bias, the BEs are performing better than the RGMME. Table 4 reports the estimation results for ðc; m; m; nÞ ¼ ð10; 7; 21; 974Þ. The MLE results repeat the same pattern for λ0, β10 and β20. Again, the BEs impose substantial amount of bias on λ0 when there is a strong negative spatial dependence. In other cases, both BE1 and BE2 impose less than 5% bias on λ0. When there is positive spatial dependence, the MLE and the BEs underestimate λ0. The RGMME imposes almost no 33
For an asymptotic argument, see Doğan and Taşpınar (2013a).
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
bias on λ0 in all cases and is slightly more efficient. The finite sample efficiency is higher for all estimators relative to Table 3. For the case of β10 and β20, the estimators report similar results and the BEs are relatively more efficient. Table 5 reports the estimation results for ðc; m; m; nÞ ¼ ð10 ; 14; 20; 485Þ. Again, the MLE results repeat the same pattern for λ0, β10 and β20. That is, the MLE imposes relatively greater bias on λ0 when there is a strong negative spatial dependence. The BEs of λ0 impose less than 5% bias in many cases and are relatively more efficient. For the case of β10 and β20, the BEs are performing better than the RGMME in terms of bias and efficiency in most cases. Finally, Table 6 reports the estimation results for ðc; m; m; nÞ ¼ ð10; 20; 28; 945Þ. Again, the MLE results repeat the same pattern for λ0, β10 and β20. The RGMME imposes almost no bias on λ0, and it is as efficient as the BEs. Again, both BE1 and BE2 report almost identical results and they impose less than 5% bias on λ0 in many cases. For the case of β10 and β20, although the RGMME imposes relatively smaller bias, it is generally inefficient relative to the BEs. Overall, the MLE imposes substantial amount of bias on λ0 when there is strong negative spatial dependence in the model. The amount of bias does not decrease as the sample size increases. This pattern suggests inconsistency for the MLE. In the case of BEs, the amount of bias is generally greater relative to the RGMME. In terms of efficiency, the BEs are generally more efficient and the BE 1 performs better than the BE2. In spatial models, the marginal effect calculations for the exogenous variables require a different analysis. For the model in Eq. (5.1), the marginal effects are given by ∂Y n −1 ¼ ðI n −λ0 W n Þ In βk0 ; ∂x′n;k
for k ¼ 1; 2:
ð5:6Þ
The marginal effects take the form of n × n matrix (In–λ0Wn)−1Inβk0. LeSage and Pace (2009) suggest scalar summary measures for the marginal effects. They label the average of the diagonal elements as the direct effects and the average of either the row sums or the column sums of the non-diagonal elements as the indirect effects. The total effect is the sum of the direct and indirect effects. Tables 7 and 8 of Appendix B report the simulation results for the total effects.34 The general pattern from the estimates reported in these tables is that all estimators impose relatively larger bias on the total effect estimates when there is strong positive spatial dependence. Especially, the amount of bias is substantial when λ0 = 0.9. As the sample size increases, the amount of bias decreases more in the case of RGMME. In terms of bias, the RGMME and the BE1 perform better than the MLE, and the RGMME is slightly better than the BE 1. In terms of the finite sample RMSE, the BEs and the MLE perform better than the RGMME and the BE1 is relatively more efficient. In general, the Monte Carlo results show that the RGMME is relatively inefficient when the sample size is 486 and when there exists positive spatial dependence. The efficiency of the RGMME depends on the selection of the moment functions for the estimation. As stated in Section 3, the best selection of the quadratic moment matrices is not available for the case where disturbance terms satisfy Assumption 1. Hence, the best selection of the linear and quadratic moment functions in the presence of heteroskedastic disturbances would provide further efficiency improvement for the RGMME. 35 Another related issue is about the calculation of the measure of dispersion of the total effects. In practice, measures of dispersion of the total effects are needed to draw inferences 34 The results based on BE1 are reported. The total effects results of BE2 are similar and are available upon request. 35 The issue of selection of the moment functions in the heteroskedastic case is a potential topic to be explored in the future studies.
11
regarding the point estimates. LeSage and Pace (2009) suggest a simulation method for the calculation of the dispersions of total effects, where the parameter estimates and asymptotic covariances of estimators are used. Hence, in this simulation method, the tradeoffs between bias and precisions of estimators may determine the quality of the inference regarding the marginal effect of explanatory variables.36
6. Empirical illustrations 6.1. Housing price model In the first illustrative application, the Harrison and Rubinfeld (1978) data set described in Gilley and Kelley Pace (1996) is used to estimate a hedonic housing-price model for a sample of 506 observations of census tracts in the Boston Standard Metropolitan Area in 1970. This data set has been used by many authors to illustrate various econometric techniques, see for example LeSage (1999) and Baltagi (2008). In this application, the median house price is explained by thirteen variables that are described more fully in Table 9 of Appendix C: per-capita crime rate by town, proportion of residential land zoned for lots over 25,000 ft2, proportion of non-retail business acres per town, Charles River dummy variable, nitric oxides concentration parts per 10 million, average number of rooms per dwelling, proportion of owner-occupied units built prior to 1940, weighted distance to five Boston employment centers, index of accessibility to radial highways, full-value property tax rate, pupil–teacher ratio by town, proportion of blacks per town, and proportion of lower class per town.37 We construct a contiguity weight matrix following LeSage (1999, p.115).38 The estimation results are given in Table 1. First of all, the null hypothesis of no heteroskedasticity is rejected as indicated by the Breusch–Pagan LM tests and its Koenker version.39 The presence of heteroskedasticity can also be tested from the Bayesian perspective by investigating the posterior mean of variance components. Recall that the Bayesian estimators are based on the assumption that the covariance matrix Σn can be decomposed as σ2Vn, where Vn = Diag(v1,…,vn). Thus, the estimates of vis that deviate substantially from unity indicate the presence of heteroskedasticity. The posterior mean of variance components vis are presented in Fig. 2. The estimates of the variance components indicate that the constant variance assumption may not hold for this application. The parameter estimates in Table 1 are mostly in agreement except for the pollution (denoted by nox) and house age variables. The Bayesian estimator reports a negative insignificant estimate for the pollution variable.40 Both MLE and RGMME report negative significant estimates for the pollution variable, which represents important policy implications for this variable. In the case of house age variable, the Bayesian estimate is negative and significant, while both MLE and RGMME report insignificant positive estimates. 36
This issue is pointed out by a referee, and we left it for future studies. The dependent variable is in log form. Following LeSage (1999), we scale each variable by subtracting the mean and dividing with the standard deviation. 38 We use the ‘xy2cont’ function of Spatial Econometric Toolbox to create the weight matrix, which returns three weight matrices: W1, W2, and W3. Following LeSage (1999, p.115), we use W2 for this application. 39 The LM test is specified based on the square of the crime, rooms, tax rate, and pupil/teacher variables. We examined the relationship between the residuals and the explanatory variables: The scatter plots indicate that the most likely candidates are these variables. 40 LeSage (1999, p.117) also reports an insignificant estimate for this variable. In the case of Charles River dummy variable, spatial model specifications have insignificant estimates. LeSage (1999) states that this locational dummy variable is unnecessary, since the spatial models account explicitly the spatial nature of the data. 37
12
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
Posterior mean of variances
25
SARAR(1,0) Heteroscedastic variance
20
15
10
5
0
0
50
100
150
200
250
300
350
400
450
500
Observations Fig. 2. Estimates of vi: Housing Price Model.
Table 1 Hedonic Housing-Price. Parameter estimates Log(hprice) Crime Zoning Industry Charlesr Nox Rooms House age Distance Access Tax rate Pupil/teacher Black pop Lower class λ BP LM BP Koenker Jarque-Bera R σ
2 ̅
Total effects
ML
RGMM
BE
ML
RGMM
BE
−0.1648 [−6.8730] 0.0808 [3.0149] 0.0443 [1.2565] 0.0167 [0.8969] −0.1288 [−3.4138] 0.1609 [6.5518] 0.0186 [0.5971] −0.2149 [−6.0965] 0.2719 [5.6216] −0.2209 [−4.1614] −0.1015 [−4.0559] 0.0774 [3.7669] −0.3361 [−10.1178] 0.4549 [12.5379] 142.5982 [0.0000] 45.1843 [0.0000] 475.8595 [0.0010] 0.7709 0.1575
−0.1294 [−3.1908] 0.0805 [3.3951] 0.0468 [1.6834] −0.0384 [−1.4151] −0.1292 [−2.9714] 0.1329 [3.2520] 0.0272 [0.8053] −0.2041 [−5.3051] 0.2208 [4.0229] −0.2255 [−4.5818] −0.1201 [−4.8498] 0.0699 [2.6762] −0.3393 [−5.3235] 0.4373 [6.4776] –
−0.1154 [−4.6229] 0.0507 [2.4450] 0.0360 [1.3428] 0.0011 [0.0743] −0.0542 [−1.7469] 0.2974 [11.7856] −0.0598 [−2.2384] −0.1695 [−5.9347] 0.1647 [3.8799] −0.2045 [−4.7507] −0.0811 [−4.2052] 0.1073 [5.8033] −0.2188 [−6.4348] 0.4126 [12.3915] –
−0.2993 [−6.8828] 0.1471 [2.9246] 0.0862 [1.3406] 0.0304 [0.8799] −0.2361 [−3.5177] 0.2958 [5.9919] 0.0318 [0.5471] −0.3938 [−5.7026] 0.4983 [5.4681] −0.4088 [−4.1635] −0.1871 [−4.4142] 0.1437 [3.7259] −0.6179 [−10.9477] –
−0.2312 [−3.4600] 0.1438 [3.1578] 0.0849 [1.7181] −0.0707 [−1.3033] −0.2289 [−3.3085] 0.2394 [2.9171] 0.0507 [0.8341] −0.3613 [−5.2180] 0.3927 [4.0784] −0.4033 [−4.0740] −0.2142 [−5.4084] 0.1251 [2.7106] −0.6050 [−6.8317] –
−0.1972 [−4.4388] 0.0866 [2.4175] 0.0614 [1.3385] 0.0019 [0.0753] −0.0929 [−1.7354] 0.5079 [9.8774] −0.1024 [−2.2083] −0.2897 [−5.5538] 0.2815 [3.7533] −0.3494 [−4.5295] −0.1385 [−4.0704] 0.1832 [5.5015] −0.3736 [−6.0629] –
–
–
–
–
–
–
–
–
–
–
–
–
–
0.8313 –
0.7475 0.2525
– –
– –
– –
Note: The brackets contain t-statistics for the parameter estimates and p-values for the test statistics.
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
The estimate of the spatial autoregressive parameter λ is 0.4549 for the MLE, 0.4373 for the RGMME and 0.4126 for the BE. For similar sample sizes, our Monte Carlo results in Tables 3 and 5 show that both the MLE and BE underestimate this parameter, while the RGMME imposes almost no bias. The estimation results for λ from this empirical application are not consistent with the Monte Carlo results. The last three columns of Table 1 report the total effect estimates of the exogenous variables. Again the important difference is observed for the total effect of the pollution and house age variables. These differences imply important policy implications toward the effect of these variables on house prices. The t-statistics reported for the RGMM total effect estimates are relatively smaller in most cases, which are consistent with the Monte Carlo results of Tables 7 and 8 of Appendix B, where RMSE of the total effect estimates from the RGMME are relatively greater. The question that how to choose between estimators for this application should be considered along the specification issues for the housing price model.41 LeSage (1999) provides evidences that the SARAR(1,0) specification may not be the correct specification for this application. In the presence of mis-specifications, the performance of estimators can change as shown in the next application.
6.2. Dual gravity model This second empirical illustration checks for the robustness of some of the results presented in Behrens et al. (2012), which estimates the so-called gravity equation system using techniques borrowed from spatial econometrics. For the analysis bilateral trade flows, the gravity equation system yields convenient regression specifications. Although theoretical ground is simple and powerful, estimation is not straightforward owing to the problem of how to account for the interdependence among trade flows. If one were to implicitly assume that trade flows between any two trading partners are independent from the rest of the world, then including ad hoc regional remoteness indices or origin and destination specific fixed effects into the specification could provide a way to control for interdependence. However, the independence assumption between trade flows lacks theoretical foundation. Behrens et al. (2012) make the observation that structural estimation of gravity equation systems depends on the correct treatment of the unobservable price indices. Their idea is to utilize, instead, some other observed factor that has one-to-one correspondence with unobserved price indices. To do so, they derive a gravity equation from the quantity-based version of the constant elasticity of substitution (CES) model by making the observation that price indices are implicit functions of trade flows. Conveniently, the solution yields an econometric specification that properly accounts for interdependence between trade flows. Estimation can be carried out with spatial econometrics methods which explicitly impose certain structures on spatial dependence over cross sectional units via spatial
weight matrices. Moreover, the structural model yields the spatial dependence structure, which allows skipping extensive sensitivity analysis. The model consists of n regions, each of which is populated with workers who have inelastic labor supply. Region i is endowed with Li unit of labor, which is the only production factor. Let Z ij ≡
The specification for the spatial weight matrix should also be considered. We also investigated the performance of estimators for this application through a Monte Carlo study in which we use the data set of this application. We consider two data generating processes: (i) the true data generating process is characterized by the parameter estimates reported by the RGMME, (ii) the true data generating process is based on the parameter estimates reported by the BE. The heteroskedastic disturbances are generated by using posterior mean of vi. The simulation results for both data generating processes are inconsistent with the results reported by the RGMME and BE in Table 1. This result may indicate the presence of a model mis-specification for this application. The results are available upon request.
F ij Y iY j
be
the GDP-standardized trade flow from region i to region j, where F ij is the value of trade flows from i to j, and Y i and Y j stand for GDP level in i and j, respectively. The CES monopolistic competition model with free entry in Behrens et al. (2012) yields the following expression for the trade flows from region i to j log Z ij ¼ −σlog ðLÞ−ðσ −1Þ ! n n X X Lk Lk log τ ij − log τkj −σ log ðwi Þ−ðσ −1Þ log Z kj ; L L k¼1 k¼1 ð6:1Þ where L i is the population of region i, L ≡ ∑ nk = 1 L k is the total world population, τ ij is the iceberg trade costs of shipping one unit of any variety between region i and j, and σ N 1 denotes the constant elasticity of substitution between any two product varieties. The last term in Eq. (6.1) shows that trade flow between region i and j depends on all the trade flows from other regions k to region j. Since varieties are substitutes, higher flows from other regions k to region j reduce flows from region i to region j. Lower relative trade barriers measured by the deviation of bilateral trade barriers from the population weighted average, which are denoted by n L log τ ij −∑k¼1 k log τkj Þ, increase trade flows from i to j. Also, L
lower wages in the origin region i improve the competitiveness of region i's firms in market j and thus increase the trade flows from i to j. In matrix form Eq. (6.1) can be written as
ℤ ¼ −σ lnðLÞln2 −ðσ −1Þ In2 −W τ−σ w−ðσ−1ÞWℤ;
ð6:2Þ
where ℤ denotes the n2 × 1 vector of log of GDP-standardized trade flows stacked, l is the n2 × 1 vector of ones, τ is the n2 × 1 vector of the log trade costs, w is the n2 × 1 vector of log origin wages, In2 is the n2 × n2 identity matrix, and W = (S × D(L)) ⊗ In is the n2 × n2 spatial weight matrix. S is the n × n matrix whose elements are equal to 1 and D(L) denotes the n × n diagonal matrix with kth diagonal term of LL .42 Behrens et al. (2012) assume that the iceberg trade costs of shipping one unit of any variety between region i and j relate to distance and γ border effects by τij = dijexp(ξbij), where the distance between regions i and j is denoted by dij, and bij is a dummy variable taking the value 1 if the trade flows cross an international border and zero otherwise. Taking the logarithm and rewriting it in matrix form yields τ = γd + ξb, where d is the n2 × 1 vector with elements log(dij) and b is the n2 × 1 vector of dummy variables indicating cross-border flows. Then, e τ¼ e þ ξb e . Substituting back into ðIn2 −WÞτ ¼ ðIn2 −WÞðγd þ ξbÞ ¼ γd Eq. (6.2) yields k
e þ β w þ θb e þ λWℤ; ℤ ¼ β 0 ln 2 þ β 1 d 2
41
13
ð6:3Þ
where β0 ≡ −σln(L) b 0 is the constant term, β1 ≡ −(σ–1)γ b 0 is the distance coefficient of deviations from population weighted average 42 Note that the diagonal elements of W are not zero. The spatial weight matrices are characterized with zero diagonal elements. A zero diagonal weight matrix can be obtained by removing diagonal elements from W. Specifically, define Wdiag = D(L) ⊗ In. Then, subtracting (σ–1)diagℤ from both sides of (6.2) yields the zero diagonal weight matrix Wd = (W–Wdiag). However, this operation produces a heterogeneous coefficient model. In our application, we only consider the homogeneous coefficient model so that W is assumed to have zerodiagonal elements.
14
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
Table 2 SARAR(1,0) estimation results. log(Zij)(US − US flows × 1.3)
log(Zij)
Intercept log(wi) e dij e bij λ BP LM test
ML
RGMM
BE 1
ML
RGMM
BE1
−13.8985 [−32.0439] 1.1617 [−6.4775] −1.2550 [−36.6686] −1.0458 [−15.9843] 0.0299 [1.0677] 315.8548 [0.0000]
−14.2546 [−17.2922] 1.0889 [−5.2064] −1.2445 [−34.7267] −1.0631 [−13.3942] 0.0203 [0.5915] –
−13.2967 [−23.2082] 1.1122 [−7.0701] −1.2296 [−39.7895] −1.1767 [−19.2912] 0.0676 [3.2101] –
−14.0349 [−19.9543] 0.9393 [−5.2942] −1.2519 [−35.9740] −1.1493 [−17.3638] 0.0439 [1.5774] 286.0448 [0.0000]
−15.6591 [−18.8065] 0.8603 [−4.1377] −1.2336 −35.2560 −1.1606 [−15.0120] 0.0317 [0.9245] –
−13.5307 [−23.7556] −0.8627 [−5.5056] −1.2243 [−40.0829] −1.2944 [−21.7863] 0.0794 [3.8114] –
6.3691 [5.8951] 2.5237 [43.4416] 0.3962 [6.8206] 1.2717 [65.8322] 1.1277 [149.5478] 0.8868 [117.6011] 2.8459 [12.9498]
6.5670 [4.9702] 2.5626 [36.4753] 0.3902 [5.5544] 1.2767 [54.1974] 1.1299 [122.5164] 0.8850 [97.1922] 2.8955 [10.9439]
8.0300 [5.8121] 2.8337 [52.4833] 0.3529 [6.5359] 1.3105 [70.6978] 1.1448 [163.3460] 0.8735 [124.6458] 3.2439 [12.6654]
7.6498 [5.4576] 2.7658 [47.2068] 0.3616 [6.1710] 1.3023 [65.1785] 1.1412 [150.0577] 0.8763 [115.2290] 3.1563 [11.9126]
7.8032 [4.6372] 2.7934 [40.8218] 0.3580 [5.2314] 1.3056 [55.7965] 1.1426 [128.6453] 0.8752 [98.5322] 3.1919 [10.1149]
9.8882 [5.4914] 3.1446 [60.5362] 0.3180 [6.1220] 1.3464 [72.4465] 1.1603 [169.9878] 0.8618 [126.2545] 3.6487 [11.8836]
Border Effects CA (total) CA–CA (intra) CA–US (inter) US (total) US–US (intra) US–CA (inter) Average
Note: The brackets contain t-statistics for the parameter estimates and p-values for the test statistics. The BP LM test stands for the Breusch–Pagan Lagrange multiplier test of heteroskedasticity.
distances, β2 ≡ −σ b 0 is the coefficient of wage at the origin region, θ ≡ −(σ–1)ξ b 0 denotes the coefficient of border effects, and the spatial autoregressive parameter λ ≡ −(σ–1) b 0 captures the strength of the interdependence across trade flows, i.e., a measure of spatial competition. Adding a disturbance term to Eq. (6.3), we obtain the following SARAR(1,0) regression specification ℤ ¼ λWℤ þ Xγ þ ϵ;
ð6:4Þ
e w; b e , γ = (β ,β ,β ,θ)′, and ϵ is the n 2 × 1 where X ¼ ln2 ; d; 0 1 2 column vector of independently distributed disturbance term. 43 6.2.1. Data and the results The data set contains information on bilateral trade flows between 30 U.S. states and 10 Canadian provinces.44 The aggregate manufacturing exports F ij , internal absorption F ii , and regional GDPs Yi are measured in millions of U.S. dollars for the year 1993. The bilateral distances between the states and provinces using the great circle formula applied to the state and province capitals' geographic coordinates: between any two regions i and j
dij ¼ ℝ0 arccos cos ϱi −ϱ j cosðψi Þcos ψ j þ sinðψi Þsin ψ j ; ð6:5Þ 43 Behrens et al. (2012) makes the observation that the multilateral resistance variable and the border effects in any region may depend on a spatially weighted average of resistance variables and border effects in all the other regions. Hence, they conjecture that error terms may exhibit some form spatial dependence as well and consider two ways of introducing the γ error terms: (i) as noise in the measurement of trade costs, i.e., τij = dij exp(ξbij + ϵij), where ϵij is an i.i.d. normal error term; and (ii) as noise in the measurement of the trade flows, i.e., Zijreal ≡ Zijobsexp(ϵij). But, the authors assume a first-order spatial moving average process for the error process in an ad hoc manner and specify: u = ϵ–ρWdϵ, where W d ≡ (W–D(L)⊗In ) and ρ is the spatial moving average parameter. Then, Eq. (6.3) and the specification for u correspond to the SARMA(1,1) model. 44 Available at Robert C. Feenstra's data page: http://cid.econ.ucdavis.edu/.
where R0 is the Earth's radius, and ϱi and ψi denote region i's longi 1 surf i 1=2 to measure tude and latitude, respectively. We use dii ¼ 3
π
internal trade costs, where surfi refers to region i's surface. 45 The regional wages w i are average hourly manufacturing wages for the year 1988 at the state and province level. The regional and provincial population shares are computed from 1988 population figures. 46 The reason for choosing 5-year lagged values of regional population and wage values is to control for potential endogeneity. Behrens et al. (2012) successfully replicate the results in Anderson and van Wincoop (2003) and Feenstra (2004) and show that there remains a significant amount of cross-sectional correlation in the OLS residuals. Therefore, the OLS estimator is inefficient and may be biased. Their OLS estimates confirm that treating trade flows as independent yields puzzling large values for the border coefficient.47 Table 2 presents the ML, the RGMM, and the Bayesian (BE1) estimates for the SARAR(1,0) model. The columns 1–4 indicate that the ML results are similar with the results presented in Behrens et al. (2012). The Breusch–Pagan LM statistic provides strong evidence for heteroskedasticity in the ML residuals, suggesting inconsistency for the MLE. Fig. 3 shows the plots of the posterior mean of vis for this application. The substantial deviation of the posterior mean of vi from the unity also indicates the presence of heteroskedasticity in this application. The estimated parameter values for the spatial autoregressive parameter λ from all estimators are positive, which is opposite of the
45 Behrens et al. (2012) also employ two other measures of internal trade costs:
1=2 (i) 14min j dij , and (ii) 23ðsurf . π Þ 46 See Behrens et al., 2012, Appendix B) for further details. 47 McCallum (1995) finds that the Canadian provinces seem to trade 21.5 to 27 times more with themselves than with US states of equal size and distance. i
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
Posterior mean of variance components 16
15
Posterior mean of variance components 18
SARAR(1,0)
SARAR(1,0)
16
14
14
12
12 10 10 8 8 6 6 4
4
2 0
2
0
200
400
600 800 1000 1200 1400 1600 Observations
a) Posterior mean of
0
0
200
400
600 800 1000 1200 1400 1600 Observations
b) Posterior mean of
: US-US flows×1.3
Fig. 3. Posterior mean of variance components vis: dual gravity model. (a) Posterior mean of vi. (b) Posterior mean of vi: US–US flows × 1:3.
predicted sign. The RGMM and ML estimates of the spatial autoregressive parameter are insignificant, whereas the BE estimates are significant. The results are incompatible with the qualitative prediction of the model. Behrens et al. 2012 state that this anomaly stems from the specification adopted for the disturbance terms. There is no assumed process for the disturbance terms in a SARAR(1,0) specification. Behrens et al. (2012) show that when a SARAR(1,1) or a SARMA(1,1) specification is adopted, the estimate of λ turns to be negative and significant as predicted by the underlying theory. The disturbance process is specified as ϵ = ρWϵ + ζ for a SARAR(1,1) model and ϵ = ρWζ + ζ for a SARMA(1,1) model. Here, ζ is n2 × 1 vector of innovations and ρ is another dependence parameter. In both specifications, the diagonal elements of the covariance matrix of ϵ are not the same, suggesting heteroskedasticity. Hence, when the true data generating process is characterized by a SARAR(1,1) or a SARMA(1,1) model, the regression disturbance terms, i.e., the elements of ϵ, are heteroskedastic even if the elements of innovation vector ζ are homoskedastic. Our results in Table 2 show that estimating the misspecified model SARAR(1,0) instead of SARAR(1,1) or SARMA(1,1), the heteroskedasticity robust estimators do not yield the predicted sign for λ. Hence, the correct specification of the disturbance process plays a vital role for the estimation of the autoregressive parameter for this application. Turning to the explanatory variables, we see that all estimators capture the predicted signs. There does not seem to be a significant difference between the estimated coefficients. In addition, the estimated coefficients are almost identical to the ones reported for a SARAR(1,1) and a SARMA(1,1) model in Behrens et al. (2012). This result indicates that the coefficients of explanatory variables remain fairly stable across specifications. The bottom lines of Table 2 report the estimated border effects, which are defined as the ratio of trade flows in a world with border to that in a borderless world. Let Zij be the GDP standardized trade flows in a world with border, and Z ij be the GDP standardized trade flows in a borderless world. From Eq. (6.1), the logarithm of the border effects
Z log Bij ¼ log ij can be written as Z ij
Z ij log Bij ¼ log Z ij
!
n X Lk b ¼ θ bij − L kj k¼1
! þλ
n X L
k
k¼1
L
log
Z kj Z kj
! :
ð6:6Þ
The reduced form of the above equation can be written in vector form as follows
−1 B ¼ θ In2 −λW b;
ð6:7Þ
! Z where B is the n2 × 1 vector stacking log ij , and b is the n 2 × 1 Z ij n L vector with ijth element bij −∑k¼1 k bkj Þ . Under the assumption L
−1 that I n2 −λW exists, the above equation can be written as B ¼ ∞ l l θ∑l¼0 λ W b. The special structure of W implies that Wl b ¼ 0, for l = 1,2,…. (Behrens et al., 2012). Then, the log of border effects simplifies as
n X Lk b Þ¼ log Bij ¼ θðbij − L kj k¼1
f
X L −θ k∈US k if i ∈CA; j∈CA L X L θ k∈US k ifi∈CA; j∈US L X Lk θ k∈CA ifi∈US; j∈CA L X L −θ k∈CA k ifi∈US; j∈US L
ðCA−CAÞ; ðCA−USÞ; ðUS−CAÞ; ðUS−USÞ:
ð6:8Þ Thus, the border effects are decomposed into two components: the trade-boosting intra-national effect denoted by CA–CA and US–US, and the trade reducing international effect denoted by CA–US and US–CA. The trade boosting intra-national effect is defined as the increase in the intra-national trade due to the presence of the borders. The rationale is that the international borders protect domestic firms against the competition of foreign firms and thus create an advantage for the domestic firm. The trade reducing international effect is defined as the effect of the international border on trade flows across countries. The volume of trade flows will be higher in a borderless world than in a world with borders. Finally, the logarithm of total border effects is obtained by combining the trade boosting and reducing effects, which L are given by −2ξλ∑k∈US k for the Canadian provinces and −2ξλ ∑k∈CA
Lk L
L
for the US states.
16
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
The estimated border effects for Canadian provinces and US states in Table 2 are similar to those reported by Behrens et al. (2012) and they range from 6.3 to 9.8 for Canadian provinces and cluster around 1.3 for US states. 48 The total border effect for the Canadian provinces is estimated as 6.5670 by the RGMME in the second column. The same column shows that the total border effect of U.S. is 1.2767. As a robustness check, columns 4–6 of Table 2 replicates the results when all US state–state trade flows are inflated by a factor of 1.3, because the construction of data set involves scaling of shipping data, which may understate trade flows between US states while making the Canadian intra border effects seem significantly larger than that of US states.49 The estimated parameter coefficients are similar to the ones in columns 1–3. The estimated values of elasticity of substitution however are less than unity contrary to the model's assumption.
Rubinfeld (1978) data set described in Gilley and Kelley Pace (1996) is used to estimate a hedonic housing-price model for a sample of 506 observations of census tracts in the Boston Standard Metropolitan Area in 1970. The estimation results show that the parameter estimates for the pollution and housing age variables do not agree, which in turn imply important policy implication toward these variables. In the second application, we try to see the effect of ignoring potential heteroskedasticity in a gravity model of bilateral trade flows between Canadian provinces and US states. We replicate some of the results in Behrens et al. (2012). Although the structural model dictates negative cross-sectional correlation among trade flows, the heteroskedasticity robust RGMM and Bayesian estimators do not able to capture it. This result may indicate that the correct specification of the process for the disturbance terms is vital for the estimation of the autoregressive parameter in spatial models.
7. Conclusion
Appendix A. Some useful lemmas
This study examines the effect of unknown forms of heteroskedasticity on spatial models that have a spatial autoregressive process in the dependent variable. The GMM and Bayesian methodologies have the advantage that the estimator formulated from these estimation frameworks can be robust to an unknown form of heteroskedasticity. In the GMM framework, the unknown heteroskedasticity is incorporated into estimation through the formulation of appropriate moment functions. The quadratic moment functions considered for the estimation are designed in a such way that the orthogonality condition of the moment functions is not violated in the presence of heteroskedastic disturbance terms. In the Bayesian framework, the structure of the heteroskedasticity is assumed to have two components: (i) a constant component and (ii) a component that varies across observations. The important assumption of the Bayesian approach is that the component that varies over observations is assumed to be i.i.d. to facilitate the estimation. The mean of the resulting conditional posterior distribution of the parameters of the exogenous variables indicates that the Bayesian estimator mimics the generalized least squares estimator, where the aberrant observations are automatically down-weighted in the estimation. To compare finite sample properties of the heteroskedastiticy robust estimators, a Monte Carlo simulation is designed for the case of SARAR(1,0). The Monte Carlo simulation results show that the MLE of the autoregressive parameter imposes significant bias and the bias amount does not decrease as the sample size increases. The RGMME imposes trivial biases in all cases. The Bayesian estimator is efficient relative to the RGMME, but it imposes relatively larger bias on the autoregressive parameter when there is strong negative spatial dependence. The Monte Carlo results for the estimates of the total effects of exogenous variables show that the RGMME is slightly better than the Bayesian estimator in terms of bias. In terms of finite sample efficiency, the Bayesian estimator is more efficient. In the final section, we present two empirical illustrations to show how the heteroskedasticity robust estimators perform in applied research. In the first illustrative application, the Harrison and
Lemma 1. Completion of Square Let A and B be positive definite n × n matrices. Define C−1 = A−1 + B−1 and D = A + B. Let x, α and γ be n × 1 vectors. Then, there exists an n × 1 vector μ such that ′ −1
ðx−α Þ A
′ −1
ðx−α Þ þ ðx−γÞ B
−1
ðα−γÞ;
where μ = C(A−1α + B−1γ). Proof. See Abadir and Magnus (2005, Exercise 8.15, p. 216).
□
Lemma 2. Let An, Bn and Cn be n × n matrices with ijth elements respectively denoted by an,ij, bn,ij and cn,ij. Assume that An and Bn have zero diagonal elements, and Cn has uniformly bounded row and column sums in absolute value. Assume that εn satisfies Assumption 1 with covariance matrix denoted by Σn = Diag{σ2n1,…,σ2nn}. Then, (1)
n X n X ′ ′ 2 2 E ε n An εn εn Bn ε n ¼ an;ij bn;ij þ bn;ji σ ni σ nj i¼1 j¼1
′ ¼ tr Σn An Bn Σn þ Σn Bn 2 2 where Σn ¼ Diag σ n1 ; …; σ nn : (2) h i 2 4 4 cn;ii E ε ni −3σ ni
n X
2
Eðε n C n εÞ ¼
i¼1
2 n n X n X X 2 2 2 cn;ii σ ni Þ þ cn;ij cn;ij þ cn;ji σ ni σ nj þ i¼1
¼
n X
2 cn;ii
i¼1 j¼1
h i 4 4 2 E εni −3σ ni þ tr ðΣn C n Þ
i¼1
′ þ tr Σn C n C n Σn þ Σn C n Σn C n ; (3)
¼ Behrens et al. 2012 report that the border effects cluster respectively around 14.5 and 1.5 for Canadian provinces and US states when spatial dependence structure is ignored. 49 When constructing the data set, Anderson and van Wincoop (2003) employs a scaling scheme to US state–state trade flows taken from the Commodity Flow Survey in order to make it more comparable to the Canada–US flows.
′
ðx−μ Þ þ ðα−γÞ D
ðA:1Þ
varðε n C n ε Þ ¼
48
′ −1
ðx−γÞ ¼ ðx−μ Þ C
n X
n X n h i X 2 4 4 2 2 cn;ii E ε ni −3σ ni þ cn;ij cn;ij þ cn;ji σ ni σ nj
i¼1
i¼1 j¼1
n X
h i 2 4 4 ′ cn;ii E ε ni −3σ ni þ tr Σn C n C n Σn þ Σn C n Σn C n :
i¼1
Proof. See Lin and Lee (2010).
□
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
17
Appendix B. Monte Carlo results
Table 3 c ¼ 10; m ¼ 5; m ¼ 15; N ¼ 486. λ
−0.9 −0.6 −0.3 0.0 0.3 0.6 0.9
−0.9 −0.6 −0.3 0.0 0.3 0.6 0.9
−0.9 −0.6 −0.3 0.0 0.3 0.6 0.9
(Mean)[Bias](Std.)[RMSE]
(Mean)[Bias](Std.)[RMSE]
(Mean)[Bias](Std.)[RMSE]
(Mean)[Bias](Std.)[RMSE]
λML
λRGMM
λBE1
λBE2
(−0.7203)[0.1797](0.0551)[0.1879] (−0.4613)[0.1387](0.0751)[0.1578] (−0.2395)[0.0605](0.0838)[0.1033] (−0.0072)[−0.0072](0.0896)[0.0899] (0.2479)[−0.0521](0.0825)[0.0976] (0.5372)[−0.0628](0.0704)[0.0944] (0.8674)[−0.0326](0.0358)[0.0484]
(−0.9110)[−0.0110](0.1090)[0.1096] (−0.5912)[0.0088](0.1025)[0.1028] (−0.2982)[0.0018](0.1112)[0.1112] (−0.0031)[−0.0031](0.1086)[0.1086] (0.2913)[−0.0087](0.1017)[0.1021] (0.6005)[0.0005](0.1180)[0.1180] (0.9125)[0.0125](0.1052)[0.1059]
(−0.7589)[0.1411](0.0523)[0.1505] (−0.4804)[0.1196](0.0714)[0.1393] (−0.2477)[0.0523](0.0827)[0.0978] (−0.0079)[−0.0079](0.0884)[0.0887] (0.2501)[−0.0499](0.0806)[0.0948] (0.5424)[−0.0576](0.0688)[0.0897] (0.8699)[−0.0301](0.0336)[0.0451]
(−0.7557)[0.1443](0.0520)[0.1534] (−0.4784)[0.1216](0.0710)[0.1408] (−0.2463)[0.0537](0.0827)[0.0986] (−0.0064)[−0.0064](0.0884)[0.0886] (0.2514)[−0.0486](0.0806)[0.0941] (0.5437)[−0.0563](0.0688)[0.0889] (0.8709)[−0.0291](0.0336)[0.0444]
β10,ML
β10,RGMM
β 10;BE1
β10;BE2
(0.9261)[−0.0739](1.7065)[1.7081] (0.9745)[−0.0255](1.6667)[1.6669] (1.0712)[0.0712](1.6753)[1.6769] (1.0596)[0.0596](1.7130)[1.7141] (0.9477)[−0.0523](1.7354)[1.7362] (0.9920)[−0.0080](1.7060)[1.7061] (1.0623)[0.0623](1.7871)[1.7882]
(1.1290)[0.1290](1.7011)[1.7060] (1.0447)[0.0447](1.6786)[1.6792] (1.0948)[0.0948](1.6830)[1.6856] (1.0677)[0.0677](1.7169)[1.7183] (0.9509)[−0.0491](1.7225)[1.7232] (1.0264)[0.0264](1.6966)[1.6968] (1.1268)[0.1268](1.7835)[1.7880]
(0.9006)[−0.0994](1.5768)[1.5800] (0.9757)[−0.0243](1.5551)[1.5553] (1.0632)[0.0632](1.5654)[1.5667] (1.0704)[0.0704](1.6005)[1.6021] (0.9514)[−0.0486](1.6123)[1.6130] (0.9923)[−0.0077](1.5789)[1.5790] (1.0401)[0.0401](1.6837)[1.6842]
(0.8942)[−0.1058](1.5804)[1.5839] (0.9740)[−0.0260](1.5545)[1.5547] (1.0637)[0.0637](1.5659)[1.5672] (1.0723)[0.0723](1.6013)[1.6029] (0.9497)[−0.0503](1.6113)[1.6121] (0.9935)[−0.0065](1.5772)[1.5772] (1.0431)[0.0431](1.6822)[1.6827]
β20,ML
β20,RGMM
β 20;BE1
β20;BE2
(1.0875)[0.0875](1.7203)[1.7225] (1.0352)[0.0352](1.6866)[1.6869] (0.9308)[−0.0692](1.6607)[1.6621] (0.9321)[−0.0679](1.7374)[1.7387] (1.0569)[0.0569](1.7367)[1.7377] (1.0315)[0.0315](1.7241)[1.7244] (0.9725)[−0.0275](1.8035)[1.8037]
(0.8462)[−0.1538](1.7133)[1.7202] (0.9537)[−0.0463](1.7018)[1.7024] (0.9034)[−0.0966](1.6671)[1.6699] (0.9199)[−0.0801](1.7400)[1.7419] (1.0412)[0.0412](1.7237)[1.7242] (0.9730)[−0.0270](1.7139)[1.7141] (0.8681)[−0.1319](1.8027)[1.8076]
(1.1158)[0.1158](1.5641)[1.5684] (1.0343)[0.0343](1.5396)[1.5399] (0.9421)[−0.0579](1.5188)[1.5199] (0.9249)[−0.0751](1.5812)[1.5829] (1.0572)[0.0572](1.5793)[1.5803] (1.0295)[0.0295](1.5605)[1.5608] (0.9973)[−0.0027](1.6606)[1.6606]
(1.1232)[0.1232](1.5687)[1.5735] (1.0362)[0.0362](1.5386)[1.5390] (0.9412)[−0.0588](1.5185)[1.5197] (0.9229)[−0.0771](1.5819)[1.5838] (1.0583)[0.0583](1.5784)[1.5795] (1.0278)[0.0278](1.5584)[1.5586] (0.9932)[−0.0068](1.6591)[1.6591]
Table 4 c ¼ 10; m ¼ 7; m ¼ 21; N ¼ 974. λ
−0.9 −0.6 −0.3 0.0 0.3 0.6 0.9
−0.9 −0.6 −0.3 0.0 0.3 0.6 0.9
−0.9 −0.6 −0.3 0.0 0.3 0.6 0.9
(Mean)[Bias](Std.)[RMSE]
(Mean)[Bias](Std.)[RMSE]
(Mean)[Bias](Std.)[RMSE]
(Mean)[Bias](Std.)[RMSE]
λML
λRGMM
λBE1
λBE2
(−0.7196)[0.1804](0.0407)[0.1850] (−0.4653)[0.1347](0.0553)[0.1456] (−0.2398)[0.0602](0.0614)[0.0859] (−0.0000)[−0.0000](0.0652)[0.0652] (0.2604)[−0.0396](0.0589)[0.0710] (0.5461)[−0.0539](0.0475)[0.0718] (0.8721)[−0.0279](0.0248)[0.0373]
(−0.9032)[−0.0032](0.0675)[0.0676] (−0.5985)[0.0015](0.0743)[0.0743] (−0.3002)[−0.0002](0.0808)[0.0808] (0.0034)[0.0034](0.0778)[0.0779] (0.3016)[0.0016](0.0661)[0.0661] (0.5991)[−0.0009](0.0483)[0.0483] (0.9050)[0.0050](0.0591)[0.0593]
(−0.7616)[0.1384](0.0377)[0.1434] (−0.4864)[0.1136](0.0523)[0.1251] (−0.2506)[0.0494](0.0605)[0.0782] (−0.0003)[−0.0003](0.0638)[0.0638] (0.2630)[−0.0370](0.0587)[0.0694] (0.5515)[−0.0485](0.0463)[0.0671] (0.8746)[−0.0254](0.0237)[0.0348]
(−0.7598)[0.1402](0.0377)[0.1452] (−0.4849)[0.1151](0.0522)[0.1264] (−0.2492)[0.0508](0.0607)[0.0792] (0.0011)[0.0011](0.0639)[0.0639] (0.2644)[−0.0356](0.0588)[0.0687] (0.5530)[−0.0470](0.0464)[0.0661] (0.8760)[−0.0240](0.0237)[0.0337]
β10,ML
β10,RGMM
β 10;BE1
β 10;BE2
(1.0211)[0.0211](1.6046)[1.6047] (1.0065)[0.0065](1.6063)[1.6063] (1.0701)[0.0701](1.6015)[1.6030] (1.0649)[0.0649](1.6261)[1.6274] (0.9337)[−0.0663](1.6115)[1.6128] (0.9652)[−0.0348](1.6350)[1.6353] (0.9027)[−0.0973](1.6747)[1.6775]
(1.0610)[0.0610](1.6076)[1.6087] (1.0030)[0.0030](1.6123)[1.6123] (1.0658)[0.0658](1.6139)[1.6153] (1.0731)[0.0731](1.6267)[1.6284] (0.9535)[−0.0465](1.5974)[1.5981] (1.0018)[0.0018](1.6136)[1.6136] (0.9567)[−0.0433](1.6404)[1.6409]
(0.9887)[−0.0113](1.5532)[1.5532] (0.9683)[−0.0317](1.5478)[1.5481] (1.0779)[0.0779](1.5654)[1.5673] (1.0809)[0.0809](1.5482)[1.5503] (0.9377)[−0.0623](1.5471)[1.5483] (0.9817)[−0.0183](1.5528)[1.5529] (0.8988)[−0.1012](1.6249)[1.6281]
(0.9894)[−0.0106](1.5513)[1.5513] (0.9680)[−0.0320](1.5495)[1.5499] (1.0786)[0.0786](1.5674)[1.5694] (1.0809)[0.0809](1.5501)[1.5522] (0.9398)[−0.0602](1.5444)[1.5456] (0.9814)[−0.0186](1.5545)[1.5546] (0.9012)[−0.0988](1.6229)[1.6259]
β20,ML
β20,RGMM
β 20;BE1
β 20;BE2
(1.0287)[0.0287](1.6837)[1.6839] (1.0047)[0.0047](1.6904)[1.6904] (0.9319)[−0.0681](1.6776)[1.6789] (0.9348)[−0.0652](1.7068)[1.7081] (1.0642)[0.0642](1.6934)[1.6946] (1.0585)[0.0585](1.7133)[1.7143] (1.1256)[0.1256](1.7475)[1.7520]
(0.9457)[−0.0543](1.6895)[1.6903] (0.9909)[−0.0091](1.6999)[1.6999] (0.9311)[−0.0689](1.6916)[1.6930] (0.9217)[−0.0783](1.7087)[1.7105] (1.0330)[0.0330](1.6789)[1.6793] (0.9997)[−0.0003](1.6908)[1.6908] (1.0394)[0.0394](1.7094)[1.7099]
(1.0572)[0.0572](1.6210)[1.6220] (1.0455)[0.0455](1.6202)[1.6209] (0.9224)[−0.0776](1.6304)[1.6323] (0.9188)[−0.0812](1.6124)[1.6144] (1.0577)[0.0577](1.6188)[1.6199] (1.0392)[0.0392](1.6154)[1.6159] (1.1259)[0.1259](1.6853)[1.6900]
(1.0568)[0.0568](1.6193)[1.6203] (1.0459)[0.0459](1.6219)[1.6226] (0.9216)[−0.0784](1.6325)[1.6344] (0.9192)[−0.0808](1.6146)[1.6166] (1.0554)[0.0554](1.6158)[1.6168] (1.0389)[0.0389](1.6169)[1.6174] (1.1217)[0.1217](1.6828)[1.6872]
18
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
Table 5 c ¼ 10; m ¼ 14; m ¼ 20; N ¼ 485. λ
−0.9 −0.6 −0.3 0.0 0.3 0.6 0.9
−0.9 −0.6 −0.3 0.0 0.3 0.6 0.9
−0.9 −0.6 −0.3 0.0 0.3 0.6 0.9
(Mean)[Bias](Std.)[RMSE]
(Mean)[Bias](Std.)[RMSE]
(Mean)[Bias](Std.)[RMSE]
(Mean)[Bias](Std.)[RMSE]
λML
λRGMM
λBE1
λBE2
(−0.7979)[0.1021](0.0358)[0.1082] (−0.4761)[0.1239](0.0540)[0.1352] (−0.2326)[0.0674](0.0697)[0.0969] (−0.0031)[−0.0031](0.0741)[0.0742] (0.2378)[−0.0622](0.0780)[0.0998] (0.5254)[−0.0746](0.0663)[0.0998] (0.8604)[−0.0396](0.0361)[0.0536]
(−0.9074)[−0.0074](0.0646)[0.0650] (−0.5963)[0.0037](0.0653)[0.0654] (−0.2982)[0.0018](0.0906)[0.0906] (−0.0020)[−0.0020](0.0921)[0.0921] (0.2921)[−0.0079](0.0891)[0.0894] (0.5954)[−0.0046](0.0647)[0.0649] (0.8961)[−0.0039](0.0345)[0.0348]
(−0.8521)[0.0479](0.0274)[0.0552] (−0.5360)[0.0640](0.0480)[0.0799] (−0.2642)[0.0358](0.0625)[0.0720] (−0.0051)[−0.0051](0.0668)[0.0670] (0.2623)[−0.0377](0.0663)[0.0762] (0.5564)[−0.0436](0.0526)[0.0683] (0.8746)[−0.0254](0.0267)[0.0368]
(−0.8503)[0.0497](0.0274)[0.0567] (−0.5346)[0.0654](0.0481)[0.0812] (−0.2628)[0.0372](0.0625)[0.0728] (−0.0036)[−0.0036](0.0670)[0.0671] (0.2639)[−0.0361](0.0664)[0.0755] (0.5580)[−0.0420](0.0526)[0.0673] (0.8759)[−0.0241](0.0267)[0.0359]
β10,ML
β10,RGMM
β 10;BE1
β10;BE2
(1.2551)[0.2551](1.9547)[1.9713] (1.0615)[0.0615](1.8573)[1.8583] (0.9810)[−0.0190](1.8731)[1.8731] (1.0196)[0.0196](1.7893)[1.7894] (1.0065)[0.0065](1.9023)[1.9023] (1.0144)[0.0144](1.9185)[1.9185] (0.9293)[−0.0707](1.9317)[1.9330]
(1.0779)[0.0779](1.9049)[1.9065] (0.9797)[−0.0203](1.8598)[1.8599] (0.9801)[−0.0199](1.8646)[1.8647] (1.0574)[0.0574](1.7860)[1.7869] (1.0605)[0.0605](1.8507)[1.8517] (1.0679)[0.0679](1.8391)[1.8404] (0.9907)[−0.0093](1.8532)[1.8532]
(1.1025)[0.1025](1.6337)[1.6369] (1.0142)[0.0142](1.5712)[1.5713] (0.9874)[−0.0126](1.5379)[1.5380] (1.0121)[0.0121](1.4868)[1.4869] (1.0109)[0.0109](1.5385)[1.5385] (1.0228)[0.0228](1.5285)[1.5286] (0.9569)[−0.0431](1.5931)[1.5936]
(1.1071)[0.1071](1.6336)[1.6371] (1.0140)[0.0140](1.5726)[1.5727] (0.9858)[−0.0142](1.5400)[1.5401] (1.0119)[0.0119](1.4876)[1.4876] (1.0106)[0.0106](1.5382)[1.5382] (1.0220)[0.0220](1.5265)[1.5267] (0.9586)[−0.0414](1.5892)[1.5898]
β20,ML
β20,RGMM
β 20;BE1
β20;BE2
(0.8054)[−0.1946](2.0958)[2.1048] (0.9638)[−0.0362](1.9981)[1.9984] (1.0238)[0.0238](2.0178)[2.0180] (0.9696)[−0.0304](1.9381)[1.9383] (1.0087)[0.0087](2.0577)[2.0577] (1.0320)[0.0320](2.0703)[2.0706] (1.1401)[0.1401](2.0854)[2.0901]
(0.9142)[−0.0858](2.0485)[2.0503] (1.0154)[0.0154](2.0047)[2.0047] (1.0169)[0.0169](2.0113)[2.0113] (0.9249)[−0.0751](1.9356)[1.9371] (0.9291)[−0.0709](2.0037)[2.0049] (0.9267)[−0.0733](1.9850)[1.9864] (1.0059)[0.0059](2.0041)[2.0041]
(0.9299)[−0.0701](1.7327)[1.7341] (0.9977)[−0.0023](1.6740)[1.6740] (1.0123)[0.0123](1.6365)[1.6366] (0.9782)[−0.0218](1.5848)[1.5849] (0.9952)[−0.0048](1.6455)[1.6455] (1.0049)[0.0049](1.6295)[1.6295] (1.0850)[0.0850](1.6963)[1.6984]
(0.9262)[−0.0738](1.7330)[1.7346] (0.9982)[−0.0018](1.6750)[1.6750] (1.0140)[0.0140](1.6384)[1.6385] (0.9786)[−0.0214](1.5856)[1.5857] (0.9948)[−0.0052](1.6449)[1.6449] (1.0049)[0.0049](1.6274)[1.6274] (1.0811)[0.0811](1.6921)[1.6941]
Table 6 c ¼ 10; m ¼ 20; m ¼ 28; N ¼ 945. λ
(Mean)[Bias](Std.)[RMSE]
(Mean)[Bias](Std.)[RMSE]
(Mean)[Bias](Std.)[RMSE]
(Mean)[Bias](Std.)[RMSE]
λML
λRGMM
λBE1
λBE2
(−0.7963)[0.1037](0.0275)[0.1073] (−0.4706)[0.1294](0.0424)[0.1362] (−0.2329)[0.0671](0.0489)[0.0830] (0.0015)[0.0015](0.0587)[0.0587] (0.2421)[−0.0579](0.0591)[0.0827] (0.5269)[−0.0731](0.0538)[0.0907] (0.8602)[−0.0398](0.0276)[0.0484]
(−0.9006)[−0.0006](0.0371)[0.0371] (−0.5961)[0.0039](0.0497)[0.0498] (−0.3030)[−0.0030](0.0646)[0.0647] (0.0022)[0.0022](0.0722)[0.0722] (0.2964)[−0.0036](0.0667)[0.0668] (0.6013)[0.0013](0.0527)[0.0527] (0.8982)[−0.0018](0.0250)[0.0250]
(−0.8542)[0.0458](0.0202)[0.0500] (−0.5348)[0.0652](0.0364)[0.0747] (−0.2671)[0.0329](0.0435)[0.0545] (0.0010)[0.0010](0.0511)[0.0511] (0.2675)[−0.0325](0.0489)[0.0587] (0.5602)[−0.0398](0.0420)[0.0579] (0.8772)[−0.0228](0.0195)[0.0300]
(−0.8528)[0.0472](0.0201)[0.0513] (−0.5334)[0.0666](0.0365)[0.0760] (−0.2656)[0.0344](0.0434)[0.0554] (0.0025)[0.0025](0.0512)[0.0512] (0.2689)[−0.0311](0.0489)[0.0580] (0.5616)[−0.0384](0.0420)[0.0569] (0.8787)[−0.0213](0.0195)[0.0289]
β10,ML
β10,RGMM
β 10;BE1
β 10;BE2
−0.9 −0.6 −0.3 0.0 0.3 0.6 0.9
(1.1639)[0.1639](1.3388)[1.3488] (1.0914)[0.0914](1.3231)[1.3263] (1.1016)[0.1016](1.3052)[1.3091] (0.9676)[−0.0324](1.2531)[1.2535] (1.0165)[0.0165](1.2885)[1.2886] (0.9443)[−0.0557](1.2591)[1.2603] (0.9342)[−0.0658](1.3342)[1.3358]
(1.0210)[0.0210](1.2981)[1.2983] (1.0228)[0.0228](1.3082)[1.3084] (1.0753)[0.0753](1.3066)[1.3088] (0.9726)[−0.0274](1.2533)[1.2536] (1.0348)[0.0348](1.2742)[1.2747] (0.9747)[−0.0253](1.2198)[1.2200] (1.0016)[0.0016](1.2815)[1.2815]
(1.0792)[0.0792](1.2288)[1.2313] (1.0501)[0.0501](1.2298)[1.2308] (1.0902)[0.0902](1.2222)[1.2255] (0.9700)[−0.0300](1.2016)[1.2020] (1.0103)[0.0103](1.1888)[1.1888] (0.9769)[−0.0231](1.1702)[1.1704] (0.9722)[−0.0278](1.2024)[1.2027]
(1.0812)[0.0812](1.2293)[1.2320] (1.0501)[0.0501](1.2283)[1.2293] (1.0919)[0.0919](1.2214)[1.2249] (0.9708)[−0.0292](1.2024)[1.2028] (1.0089)[0.0089](1.1881)[1.1881] (0.9769)[−0.0231](1.1702)[1.1704] (0.9718)[−0.0282](1.2024)[1.2028]
β20,ML
β20,RGMM
β 20;BE1
β 20;BE2
−0.9 −0.6 −0.3 0.0 0.3 0.6 0.9
(0.9104)[−0.0896](1.4123)[1.4152] (0.9490)[−0.0510](1.4096)[1.4106] (0.9017)[−0.0983](1.3883)[1.3918] (1.0299)[0.0299](1.3247)[1.3251] (0.9876)[−0.0124](1.3666)[1.3667] (1.0860)[0.0860](1.3324)[1.3351] (1.1057)[0.1057](1.4140)[1.4180]
(0.9605)[−0.0395](1.3741)[1.3747] (0.9742)[−0.0258](1.3957)[1.3960] (0.9169)[−0.0831](1.3910)[1.3935] (1.0210)[0.0210](1.3250)[1.3252] (0.9528)[−0.0472](1.3510)[1.3518] (1.0149)[0.0149](1.2895)[1.2896] (0.9816)[−0.0184](1.3558)[1.3559]
(0.9466)[−0.0534](1.2838)[1.2849] (0.9702)[−0.0298](1.2991)[1.2994] (0.9087)[−0.0913](1.2894)[1.2926] (1.0258)[0.0258](1.2601)[1.2604] (0.9887)[−0.0113](1.2478)[1.2478] (1.0348)[0.0348](1.2278)[1.2283] (1.0441)[0.0441](1.2568)[1.2576]
(0.9457)[−0.0543](1.2840)[1.2852] (0.9706)[−0.0294](1.2974)[1.2977] (0.9070)[−0.0930](1.2887)[1.2921] (1.0248)[0.0248](1.2612)[1.2614] (0.9899)[−0.0101](1.2474)[1.2474] (1.0340)[0.0340](1.2278)[1.2282] (1.0427)[0.0427](1.2569)[1.2576]
−0.9 −0.6 −0.3 0.0 0.3 0.6 0.9
Table 7 Total effects: xn,1. MLE m ¼ 5; m ¼ 15; N ¼ 486 −0.9 (0.4487)[−0.0776](1.0363)[1.0392] −0.6 (0.7363)[0.1113](1.1686)[1.1738] −0.3 (0.8202)[0.0510](1.4148)[1.4157]
RGMME
BE1
(0.5128)[−0.0135](0.9256)[0.9257] (0.7153)[0.0903](1.0786)[1.0824] (0.8014)[0.0322](1.3711)[1.3715]
(0.4484)[−0.0779](0.9593)[0.9624] (0.7276)[0.1026](1.0744)[1.0792] (0.8060)[0.0368](1.2885)[1.2890]
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
19
Table 7 (continued) MLE
RGMME
BE1
(0.8703)[−0.1297](1.7656)[1.7703] (1.3902)[−0.0384](2.3117)[2.3120] (2.2905)[−0.2095](3.8475)[3.8532] (7.3677)[−2.6323](14.645)[14.880]
(0.8642)[−0.1358](1.8425)[1.8475] (1.4769)[0.0483](2.5512)[2.5516] (2.6775)[0.1775](4.4527)[4.4563] (9.0038)[−0.9962](24.232)[24.252]
(0.8827)[−0.1173](1.6362)[1.6404] (1.4106)[−0.0180](2.1430)[2.1431] (2.2950)[−0.2050](3.6186)[3.6245] (7.9447)[−2.0553](13.801)[13.953]
m ¼ 7; m ¼ 21; N ¼ 974 −0.9 (0.5479)[0.0216](0.9239)[0.9241] −0.6 (0.6859)[0.0609](1.0811)[1.0828] −0.3 (0.8668)[0.0976](1.3092)[1.3128] 0.0 (1.0005)[0.0005](1.6295)[1.6295] 0.3 (1.3622)[−0.0663](2.2496)[2.2506] 0.6 (2.2510)[−0.2490](3.6965)[3.7049] 0.9 (8.1163)[−1.8837](13.551)[13.682]
(0.5134)[−0.0129](0.8382)[0.8383] (0.6235)[−0.0015](0.9967)[0.9967] (0.8265)[0.0573](1.2611)[1.2624] (1.0103)[0.0103](1.6371)[1.6371] (1.4704)[0.0418](2.3708)[2.3712] (2.6373)[0.1373](4.2110)[4.2132] (10.384)[0.3836](17.349)[17.353]
(0.5080)[−0.0184](0.8720)[0.8722] (0.6850)[0.0600](1.0362)[1.0379] (0.8461)[0.0768](1.2411)[1.2434] (1.0148)[0.0148](1.5526)[1.5526] (1.3653)[−0.0633](2.1583)[2.1592] (2.2459)[−0.2541](3.5782)[3.5872] (8.1433)[−1.8567](13.022)[13.154]
m ¼ 14; m ¼ 20; N ¼ 485 −0.9 (0.6943)[0.1679](1.0477)[1.0611] −0.6 (0.7055)[0.0805](1.2695)[1.2720] −0.3 (0.8508)[0.0815](1.5437)[1.5459] 0.0 (0.9627)[−0.0373](1.8300)[1.8304] 0.3 (1.3411)[−0.0875](2.5335)[2.5350] 0.6 (2.1366)[−0.3634](4.2218)[4.2374] 0.9 (7.5849)[−2.4151](16.024)[16.205]
(0.5551)[0.0288](0.9621)[0.9625] (0.6059)[−0.0191](1.1785)[1.1787] (0.8055)[0.0363](1.4844)[1.4848] (1.0029)[0.0029](1.8371)[1.8371] (1.5446)[0.1160](2.7145)[2.7170] (2.7278)[0.2278](4.8890)[4.8943] (10.984)[0.9836](20.976)[20.999]
(0.6045)[0.0782](0.8733)[0.8768] (0.6607)[0.0357](0.9952)[0.9958] (0.7591)[−0.0101](1.2443)[1.2443] (1.0133)[0.0133](1.4842)[1.4842] (1.3902)[−0.0384](2.1090)[2.1093] (2.2631)[−0.2369](3.5772)[3.5851] (8.3183)[−1.6817](14.328)[14.427]
m ¼ 20; m ¼ 28; N ¼ 945 −0.9 (0.6389)[0.1126](0.7251)[0.7338] −0.6 (0.7261)[0.1011](0.8875)[0.8932] −0.3 (0.8621)[0.0928](0.9955)[0.9998] 0.0 (1.0264)[0.0264](1.2393)[1.2396] 0.3 (1.4071)[−0.0215](1.7296)[1.7297] 0.6 (2.0415)[−0.4585](2.7426)[2.7806] 0.9 (6.8801)[−3.1199](9.9827)[10.458]
(0.5345)[0.0081](0.6601)[0.6601] (0.6232)[−0.0018](0.8093)[0.8093] (0.8014)[0.0321](0.9465)[0.9470] (1.0317)[0.0317](1.2483)[1.2487] (1.5683)[0.1397](1.8599)[1.8651] (2.5226)[0.0226](3.1775)[3.1776] (10.117)[0.1173](13.173)[13.173]
(0.5817)[0.0554](0.6393)[0.6417] (0.6763)[0.0513](0.7986)[0.8003] (0.8192)[0.0500](0.9215)[0.9229] (0.9975)[−0.0025](1.1756)[1.1756] (1.4363)[0.0077](1.6417)[1.6417] (2.2410)[−0.2590](2.7008)[2.7132] (7.8362)[−2.1638](10.039)[10.269]
0.0 0.3 0.6 0.9
(Mean)[Bias](Std.)[RMSE].
Table 8 Total effects: xn,2. RGMME
BE1
m ¼ 5; m ¼ 15; N ¼ 486 −0.9 (0.7405)[0.2142](1.0495)[1.0711] −0.6 (0.6426)[0.0176](1.1832)[1.1833] −0.3 (0.7893)[0.0200](1.4192)[1.4194] 0.0 (1.1352)[0.1352](1.7735)[1.7786] 0.3 (1.3130)[−0.1155](2.3382)[2.3410] 0.6 (2.1650)[−0.3350](3.8321)[3.8467] 0.9 (8.8208)[−1.1792](14.870)[14.917]
MLE
(0.5400)[0.0137](0.9383)[0.9384] (0.5452)[−0.0798](1.0945)[1.0974] (0.7363)[−0.0330](1.3769)[1.3773] (1.1425)[0.1425](1.7978)[1.8034] (1.3641)[−0.0644](2.5104)[2.5112] (2.2367)[−0.2633](4.8739)[4.8810] (8.983)[−1.0173](21.194)[21.219]
(0.7143)[0.1880](0.9527)[0.9711] (0.6336)[0.0086](1.0625)[1.0625] (0.7930)[0.0238](1.2671)[1.2673] (1.1187)[0.1187](1.6122)[1.6166] (1.3039)[−0.1247](2.1258)[2.1295] (2.2058)[−0.2942](3.5488)[3.5610] (8.4111)[−1.5889](13.663)[13.755]
m ¼ 7; m ¼ 21; N ¼ 974 −0.9 (0.6375)[0.1112](0.9676)[0.9740] −0.6 (0.6904)[0.0654](1.1364)[1.1383] −0.3 (0.7488)[−0.0205](1.3616)[1.3618] 0.0 (0.9979)[−0.0021](1.7208)[1.7208] 0.3 (1.3588)[−0.0697](2.3514)[2.3525] 0.6 (2.2487)[−0.2513](3.8327)[3.8409] 0.9 (7.9484)[−2.0516](14.125)[14.273]
(0.5371)[0.0108](0.8787)[0.8788] (0.6286)[0.0036](1.0510)[1.0510] (0.7127)[−0.0566](1.3102)[1.3114] (0.9924)[−0.0076](1.7300)[1.7301] (1.4002)[−0.0283](2.4762)[2.4764] (2.4129)[−0.0871](4.3412)[4.3421] (9.1202)[−0.8798](17.435)[17.457]
(0.6472)[0.1209](0.9089)[0.9169] (0.6710)[0.0460](1.0818)[1.0828] (0.7570)[−0.0122](1.2826)[1.2826] (0.9843)[−0.0157](1.6311)[1.6312] (1.3683)[−0.0603](2.2376)[2.2384] (2.3032)[−0.1968](3.6830)[3.6883] (8.2548)[−1.7452](13.476)[13.588]
m ¼ 14; m ¼ 20; N ¼ 485 −0.9 (0.4541)[−0.0722](1.1198)[1.1221] −0.6 (0.6735)[0.0485](1.3686)[1.3694] −0.3 (0.7766)[0.0074](1.6621)[1.6621] 0.0 (1.0455)[0.0455](1.9743)[1.9749] 0.3 (1.3500)[−0.0785](2.7266)[2.7277] 0.6 (2.2405)[−0.2595](4.5546)[4.5620] 0.9 (8.3963)[−1.6037](17.399)[17.473]
(0.4928)[−0.0335](1.0320)[1.0326] (0.6521)[0.0271](1.2722)[1.2725] (0.7352)[−0.0340](1.5977)[1.5981] (1.0062)[0.0062](1.9803)[1.9803] (1.3245)[−0.1041](2.8995)[2.9014] (2.3083)[−0.1917](5.1855)[5.1890] (9.4221)[−0.5779](22.578)[22.585]
(0.4934)[−0.0329](0.9253)[0.9259] (0.6550)[0.0300](1.0590)[1.0594] (0.8281)[0.0588](1.3254)[1.3268] (0.9899)[−0.0101](1.5811)[1.5812] (1.3690)[−0.0596](2.2282)[2.2290] (2.3677)[−0.1323](3.7946)[3.7969] (8.8920)[−1.1080](15.193)[15.233]
m ¼ 20; m ¼ 28; N ¼ 945 −0.9 (0.5204)[−0.0059](0.7638)[0.7638] −0.6 (0.6675)[0.0425](0.9379)[0.9389] −0.3 (0.7681)[−0.0011](1.0509)[1.0509] 0.0 (0.9787)[−0.0213](1.3102)[1.3104] 0.3 (1.2626)[−0.1659](1.8237)[1.8312] 0.6 (2.2583)[−0.2417](2.9123)[2.9223] 0.9 (8.2405)[−1.7595](10.521)[10.667]
(0.5129)[−0.0134](0.6969)[0.6970] (0.6339)[0.0089](0.8553)[0.8554] (0.7348)[−0.0345](0.9988)[0.9994] (0.9722)[−0.0278](1.3183)[1.3186] (1.2965)[−0.1320](1.9505)[1.9550] (2.4962)[−0.0038](3.3585)[3.3585] (10.063)[0.0634](13.813)[13.813]
(0.5136)[−0.0127](0.6664)[0.6665] (0.6437)[0.0187](0.8319)[0.8321] (0.7654)[−0.0038](0.9627)[0.9627] (1.0084)[0.0084](1.2270)[1.2271] (1.3216)[−0.1069](1.7183)[1.7216] (2.3318)[−0.1682](2.8320)[2.8370] (8.9324)[−1.0676](10.441)[10.495]
(Mean)[Bias](Std.)[RMSE].
20
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21
Appendix C. Variable definitions and summary statistics This section contains variable definitions and descriptive statistics for the data sets used in the empirical illustrations. Table 9 Hedonic-housing price equation, N = 506. Variables
Definition
Min
Max
Mean
Std
Crime Zoning Industry Charles river Nox Rooms House age Distance Access Tax rate Pupil/teacher Black pop. Lower class House price
Per capita crime by town Proportion of residential land zoned for lots over 25,000 ft2 Proportion of non-retail business acres per town Charles River dummy variable 1 if tract bounds river Nitric oxides concentration parts per 10 million Average number of rooms per dwelling Proportion of owner-occupied units built prior to 1940 Weighted distances to five Boston employment centers Index of accessibility to radial highways Full-value property tax rate per $10,000 Pupil–teacher ratio by town 1000(Bk–0.63)2 where Bk is the proportion of blacks by town Lower status of the population Median value of owner occupied house price in $1000
0.0063 0 0.4600 0 0.3850 3.5610 2.9000 1.1296 1.0000 187.0000 12.6000 0.3200 1.7300 5.0000
88.9762 100.0000 27.7400 1 0.8710 8.7800 100 12.1265 24.00 711.0000 22.0000 396.9000 37.9700 50.0000
3.6135 11.3636 11.1368 0.0692 0.5547 6.2846 68.5749 3.7950 9.5494 408.2372 18.4555 356.6740 12.6531 22.5328
8.6015 23.3225 6.8604 0.2540 0.1159 0.7026 28.1489 2.1057 8.7073 168.5371 2.1649 91.2949 7.1411 9.1971
Source: This data set is available in the Kelley Pace's Spatial Statistics Toolbox.
References Abadir, Karim M., Magnus, Jan R., 2005. Matrix algebra. Econometric Exercises.Cambridge University Press, New York9780521822893. Albert, James H., Chib, Siddhartha, 1993. Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88.422, 669–679 (ISSN 01621459). Anderson, James E., van Wincoop, Eric, 2003. Gravity with gravitas: a solution to the border puzzle. Am. Econ. Rev. 93.1, 170–192. Anselin, Luc, 1982. A note on small sample properties of estimators in a first-order spatial autoregressive model. Environ. Plan. A 14.8, 1023–1030. Anselin, Luc, 1988. Spatial Econometrics: Methods and Models. Springer, New York. Anselin, Luc, 2006. In: Mills, T.C., Patterson, K. (Eds.), Spatial Econometrics, vol. I. Palgrave Macmillan, Basingstoke. Arraiz, Irani, Drukker, David M., Kelejian, Harry H., Prucha, Ingmar R., 2010. A spatial Clifford-type model with heteroskedastic innovations: small and large sample results. J. Reg. Sci. 50.2, 592–614. Badinger, Harald, Egger, Peter, 2011. Estimation of higher-order spatial autoregressive cross-section models with heteroskedastic disturbances. Pap. Reg. Sci. 90.1, 213–235 (ISSN 1435–5957). Baltagi, Badi H., 2008. Econometric Analysis of Panel Data. John Wiley & Sons. Behrens, Kristian, Ertur, Cem, Koch, Wilfried, 2012. ‘Dual’ gravity: using spatial econometrics to control for multilateral resistance. J. Appl. Econ. 27.5, 773–794. Casella, George, George, Edward I., 1992. Explaining the Gibbs sampler. Am. Stat. 46.3, 167–174. Chib, Siddhartha, 2001. Chapter 57 Markov chain Monte Carlo methods: computation and inference. In: Heckman, J.J., Leamer, E. (Eds.), Handbook of Econometrics. Handbook of Econometrics, vol. 5. Elsevier, pp. 3569–3649. Chib, Siddhartha, Greenberg, Edward, 1995. Understanding the Metropolis–Hastings algorithm. Am. Stat. 49.4, 327–335. Doğan, Osman, Taşpınar, Süleyman, 2013a. GMM estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Working Paper. City University of New York Graduate Center. http://dx.doi.org/10.2139/ssrn.2227163. Doğan, Osman, Taşpınar, Süleyman, 2013b. GMM estimation of spatial autoregressive models with moving average disturbances. Reg. Sci. Urban Econ. 43.6, 903–926. Drukker, David M., Egger, Peter, Prucha, Ingmar R., 2013. On two-step estimation of a spatial autoregressive model with autoregressive disturbances and endogenous regressors. Econ. Rev. 32.5-6, 686–733. Elhorst, J. Paul, Lacombe, Donald J., Piras, Gianfranco, 2012. On model specification and parameter space definitions in higher order spatial econometric models. Reg. Sci. Urban Econ. 42.1-2, 211–220. Feenstra, Robert C., 2004. Advanced International Trade. Princeton University Press, Princeton, NJ. Geweke, John, 1993. Bayesian treatment of the independent student-t linear model. J. Appl. Econ. 8, S19–S40. Geweke, John, Keane, Michael, 2001. Chapter 56 computationally intensive methods for integration in econometrics. In: Heckman, J.J., Leamer, E. (Eds.), Handbook of econometrics. Handbook of Econometrics, vol. 5. Elsevier, pp. 3463–3568. Gilley, Otis W., Kelley Pace, R., 1996. On the Harrison and Rubinfeld data. J. Environ. Econ. Manag. 31.3, 403–405. Griffiths, William E., 2007. Heteroskedasticity. In: Baltagi, Badi H. (Ed.), A Companion to Theoretical Econometrics. Blackwell Publishing Ltd, pp. 82–100. Harrison, David, Rubinfeld, Daniel L., 1978. Hedonic housing prices and the demand for clean air. J. Environ. Econ. Manag. 5.1, 81–102. Hepple, Leslie W., 1979. Bayesian analysis of the linear model with spatial dependence. In: Bartels, Cornelis P.A., Ketellapper, Ronald H. (Eds.), Exploratory and Explanatory Statistical Analysis of Spatial Data. Martinus Nijhoff Publishing, Boston, Massachusetts. Hepple, Leslie W., 1995a. Bayesian techniques in spatial and network econometrics: 1. model comparison and posterior odds. Environ. Plan. 27.3, 247–469.
Hepple, Leslie W., 1995b. Bayesian techniques in spatial and network econometrics: 2. computational methods and algorithms. Environ. Plan. 27.4, 615–644. Hepple, Leslie W., 2002. Analytical simplifications for marginal distributions in Bayesian spatial econometrics. School of Geographical Sciences, University of Bristol, Working Papers Series. University of Bristol, School of Geographical Sciences. Hepple, Leslie W., 2003. Bayesian and maximum likelihood estimation of the linear model with spatial moving average disturbances. School of Geographical Sciences, University of Bristol, Working Papers Series. University of Bristol, School of Geographical Sciences. Hoogerheide, Lennart F., van Dijk, Herman K., van Oest, Rutger D., 2009. Simulation based Bayesian econometric inference: principles and some recent computational advances. In: Belsley, David A., John Kontoghiorghes, Erricos (Eds.), Handbook of Computational Econometrics. John Wiley & Sons, Ltd, Chichester, UK, pp. 215–281. Kakamu, Kazuhiko, Wago, Hajime, 2008. Small-sample properties of panel spatial autoregressive models: comparison of the Bayesian and maximum likelihood methods. Spat. Econ. Anal. 3.3, 305–319. Kelejian, Harry H., Prucha, Ingmar R., 1998. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J. Real Estate Financ. Econ. 17.1, 1899–1926. Kelejian, Harry H., Prucha, Ingmar R., 1999. A generalized moments estimator for the autoregressive parameter in a spatial model. Int. Econ. Rev. 40.2, 509–533. Kelejian, Harry H., Prucha, Ingmar R., 2007. Specification and Estimation of Spatial Autoregressive Models with Autoregressive and Heteroskedastic Disturbances. Department of Economics, University of Maryland (July). Kelejian, Harry H., Prucha, Ingmar R., 2010. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J. Econ. 157, 53–67. Kelley Pace, R., Barry, Ronald, 1997. Quick computation of spatial autoregressive estimators. Geogr. Anal. 290 (3), 232–247. Koop, Gary, 2003. Bayesian Econometrics. John Wiley and Sons Ltd., West Sussex, England. Koop, Gary, Poirier, Dale J., Tobias, Justin L., 2007. Bayesian Econometric Methods. Cambridge University Press, New York, USA. Lee, Lung-fei, 2003. Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econ. Rev. 22.4, 307–335. Lee, Lung-fei, 2004. Asymptotic distributions of quasi-maximum likelihood estimators of spatial econometrics models. Econometrica 72, 1899–1926. Lee, Lung-fei, 2007a. GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J. Econ. 137, 489–514. Lee, Lung-fei, 2007b. The method of elimination and substitution in the GMM estimation of mixed regressive, spatial autoregressive models. J. Econ. 140, 155–189. Lee, Myoung-Jae, 2010. Micro-econometrics: Methods of Moments and Limited Dependent Variables. Springer, New York, USA. Lee, Lung-fei, Liu, Xiaodong, 2010. Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances. Econ. Theory 26.01, 187–230. LeSage, James P., 1997. Bayesian estimation of spatial autoregressive models. Int. Reg. Sci. Rev. 20.1-2, 113–129. LeSage, James P., 1999. Spatial Econometrics. Regional Research Institute, West Virginia University. LeSage, James P., Pace, Robert K., 2009. Introduction to Spatial Econometrics Statistics: A Series of Textbooks and Monographs. Chapman and Hall/CRC, London. Lin, Xu, Lee, Lung-fei, 2010. GMM estimation of spatial autoregressive models with unknown heteroskedasticity. J. Econ. 157.1, 34–52. Liu, Chuanhai, Rubin, Donald B., 1995. Ml estimation of the t distribution using EM and its extensions, ECM and ECME. Stat. Sin. 5, 19–39.
O. Doğan, S. Taşpınar / Regional Science and Urban Economics 45 (2014) 1–21 Liu, Xiaodong, Lee, Lung-fei, Bollinger, Christopher R., 2010. An efficient GMM estimator of spatial autoregressive models. J. Econ. 159.2, 303–319. McCallum, John, 1995. National borders matter: Canada–U.S. regional trade patterns. Am. Econ. Rev. 85.3, 615–623. Newey, Whitney K., McFadden, Daniel, 1994. Large sample estimation and hypothesis testing. Handb. Econ. IV, 2111–2245. Oliveira, Victor De, Song, Joon Jin, 2008. Bayesian analysis of simultaneous autoregressive models. English Sankhyā: The Indian Journal of Statistics, Series B (2008), 700 (2), pp. 323–350. Pace, Robert K., LeSage, James P., Zhu, Shuang, 2012. Spatial dependence in regressors and its effect on performance of likelihood-based and instrumental variable estimators. In: Terrell, Daniel Millimet Dek (Ed.), 30th Anniversary
21
Edition. Advances in Econometrics, vol. 30. Emerald Group Publishing Limited, pp. 257–295. Parent, Olivier, LeSage, James P., 2007. Bayesian model averaging for spatial econometric models. University of Cincinnati, Economics Working Papers Series 2007–02. University of Cincinnati, Department of Economics. Parent, Olivier, LeSage, James P., 2008. Using the variance structure of the conditional autoregressive spatial specification to model knowledge spillovers. J. Appl. Econ. 23.2, 235–256. Tierney, Luke, 1994. Markov chains for exploring posterior distributions. Ann. Stat. 22.4, 1701–1728. White, Halbert G., 1980. A heteroskedasticity-consistent covariance matrix estimator a direct test for heteroskedasticity. Econometrica 48, 817–838.