Bayesian estimation and model selection of threshold spatial Durbin model


Journal Pre-proof

Yanli Zhu, Xiaoyi Han, Ying Chen

PII: S0165-1765(20)30009-4
DOI: https://doi.org/10.1016/j.econlet.2020.108956
Reference: ECOLET 108956

To appear in: Economics Letters

Received date: 1 November 2019
Revised date: 3 January 2020
Accepted date: 9 January 2020

Please cite this article as: Y. Zhu, X. Han and Y. Chen, Bayesian estimation and model selection of threshold spatial Durbin model. Economics Letters (2020), doi: https://doi.org/10.1016/j.econlet.2020.108956.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2020 Published by Elsevier B.V.

Highlights

• We consider a threshold spatial Durbin model that allows for threshold effects in both endogenous and exogenous spatial interactions among cross-sectional units.
• We develop a computationally tractable Markov Chain Monte Carlo (MCMC) algorithm to estimate the model.
• We propose a nested model selection procedure to test for spatial threshold effects, based upon the Bayes factor computed from the Savage-Dickey Density Ratio in Verdinelli and Wasserman (1995).
• We also discuss alternative tests for spatial threshold effect, such as the Bayesian t-test and the Bayesian 95% confidence interval.

Title: Bayesian estimation and model selection of threshold spatial Durbin model

Authors: Yanli Zhu, Xiaoyi Han, Ying Chen

Email address: [email protected] (Yanli Zhu), [email protected] (Xiaoyi Han), [email protected] (Ying Chen)

Author affiliations:
Business School, Institute of Industrial Economics, and Jiangsu Provincial Collaborative Innovation Center of World Water Valley and Water Ecological Civilization, Hohai University, Nanjing, 211100 China (Yanli Zhu)
Wang Yanan Institute for Studies in Economics (WISE), Department of Public Economics, MOE Key Laboratory of Econometrics, and Fujian Key Laboratory of Statistical Science, Xiamen University, Xiamen, 361005 China (Xiaoyi Han)
Department of Mathematics, Risk Management Institute, National University of Singapore, 119076 Singapore (Ying Chen)

Abstract: We consider a threshold spatial Durbin model that allows for threshold effects in both endogenous and exogenous spatial interactions among cross-sectional units. We develop a computationally tractable Markov Chain Monte Carlo (MCMC) algorithm to estimate the model. We also propose a nested model selection procedure to test for spatial threshold effects, based upon the Bayes factor computed from the Savage-Dickey Density Ratio in Verdinelli and Wasserman (1995). Simulation studies suggest that the Bayesian estimator is more precise than the spatial 2SLS (S2SLS) estimator in Deng (2018). The model selection procedure works well when the sample size increases and the difference between spatial parameters enlarges.

Key words: Threshold Spatial Durbin Model; Bayesian Estimation; Bayes Factor; Savage-Dickey Density Ratio.
JEL classification: C2, C5, C11


1. Introduction


In past decades, the threshold model (Tong, 1978) and the spatial econometric model (Anselin, 1988) have both received growing attention in various areas of economics. Recently, Deng (2018) attempted to connect the two literatures by proposing a threshold spatial autoregressive (TSAR) model with varying spatial parameters for different regimes. In her model, the slopes of all exogenous regressors remain the same across subsamples. This paper further generalizes the TSAR model in Deng (2018) by considering a threshold spatial Durbin (TSD) model, which allows for heterogeneous slope coefficients for both spatial lags and all exogenous regressors. In particular, the TSD model can capture heterogeneous influence from the spatial Durbin term. We believe this extension is valuable because the effect of the spatial Durbin term, which is labeled the “exogenous effect” or “contextual effect” in the social interaction literature (Lee et al., 2010), has proven to be important in some empirical studies of exogenous gender peer effects (i.e., a higher proportion of female peers) at school (Lavy and Schlosser, 2011; Gong et al., 2019). Some researchers have begun to look at heterogeneous contextual effects with respect to students’ personality (Hsieh and van Kippersluis, 2018). Hence, it would be appealing to incorporate the spatial Durbin term into the TSAR model and study possible nonlinearities from it.

Acknowledgements: We are grateful to the Co-editor and two anonymous referees for instructive comments. Zhu gratefully acknowledges the financial support of the National Nature Science Foundation of China (No. 71703030) and the Fundamental Research Funds for the Central Universities (No. 2018B20314). Han gratefully acknowledges the financial support of the National Nature Science Foundation of China (No. 71501163 and No. 71973113). Chen gratefully acknowledges the financial support of Singapore Ministry of Education Academic Research Fund Tier 1 at National University of Singapore.

Corresponding author: Xiaoyi Han.

This paper is related to two distinct, thriving lines of literature on threshold models and spatial econometric models. On the one hand, the threshold model achieves sample splitting by comparing an observed threshold variable with an unknown threshold, and is capable of capturing possible nonlinearities in empirical studies. The estimation and inference of the threshold model have been investigated in a sequence of studies,[1] including the conventional profile least squares approach in Hansen (2000), the 2SLS approach in Caner and Hansen (2004) and the Bayesian approach in Geweke and Terui (1993), Chen and Lee (1995) and Wu and Chen (2007). However, those studies do not explore estimation and testing issues under spatial dependence. On the other hand, the spatial econometric model can be applied to settings where the outcome of a cross-sectional unit is influenced by those of its neighbors. Various estimation methods for spatial models have appeared in the literature, such as the MLE method in Lee (2004), the IV and GMM methods in Kelejian and Prucha (1998) and Lee (2007) and the Bayesian MCMC method in LeSage and Pace (2009). Those studies typically assume homogeneous spatial interaction.[2] The TSAR model in Deng (2018) is more flexible and can capture heterogeneous spatial effects under different circumstances. Deng (2018) suggests a spatial 2SLS (S2SLS) method for estimation, but she does not study testing issues for spatial threshold effects.

Our methodological contributions over the above-mentioned literature are twofold. First, we develop a Bayesian Markov chain Monte Carlo (MCMC) algorithm to estimate the TSD model. The Bayesian estimator is expected to be more efficient than the S2SLS method in Deng (2018) because it is likelihood-based. Furthermore, classical maximum likelihood estimation of the TSD model might be more difficult than that of a standard SAR model, because one needs to perform a constrained optimization over multiple spatial parameters as well as the threshold parameter. Thus, we propose a powerful and practical Bayesian estimation strategy, in which constraints on the spatial and threshold parameters can be conveniently imposed through the acceptance-rejection step of the Metropolis-Hastings (MH) algorithm. The computational tractability and efficiency of our Bayesian estimator are supported by simulation results. Second, we study testing issues for spatial threshold effects, which tend to be ignored by the previous literature, under the Bayesian framework. We propose a nested model selection procedure to formulate the test based on the Savage-Dickey density ratio (SDDR) in Verdinelli and Wasserman (1995). The posterior draws from the MCMC sampler enable us to evaluate the Bayes factor and conduct the test in a straightforward manner.

The remainder of the paper proceeds as follows. Section 2 lays out the model. Section 3 describes the MCMC algorithm. Section 4 discusses the test for spatial threshold effects. Section 5 provides Monte Carlo simulation results. Section 6 concludes. Technical details are collected in the supplementary materials.

[1] A review regarding the development of theories and applications of threshold models can be found in Hansen (2011).
[2] There are some studies using high-order spatial autoregressive (SAR) models to investigate heterogeneous spatial interaction effects, such as Yu et al. (2016) and Hsieh and van Kippersluis (2018). But few of them exploit the spatial heterogeneity with respect to an unknown threshold.

2. Model Specification

Consider the following two-regime TSD model

$$
y_i =
\begin{cases}
\lambda_1 \sum_{j=1}^{n} w_{ij} y_j + x_i' \theta_1 + \Big( \sum_{j=1}^{n} w_{ij} x_j' \Big) \phi_1 + u_i, & q_i \le \gamma, \\
\lambda_2 \sum_{j=1}^{n} w_{ij} y_j + x_i' \theta_2 + \Big( \sum_{j=1}^{n} w_{ij} x_j' \Big) \phi_2 + u_i, & q_i > \gamma,
\end{cases}
\qquad i = 1, 2, \ldots, n, \tag{2.1}
$$

where the subscript $i$ denotes cross-sectional units, which can be countries, cities, individuals and so on. $y_i$ is the scalar dependent variable of unit $i$. $w_{ij}$ is the $(i, j)$th element of the exogenous spatial weights matrix $W_n$, which may or may not be row-normalized. $x_i$ is a $k \times 1$ vector of exogenous regressors. $\theta_j$ and $\phi_j$ (where $j = 1, 2$ indexes the regime) are, respectively, the $k \times 1$ slope coefficient vectors of $x_i$ and of the spatial Durbin term $\sum_{l=1}^{n} w_{il} x_l'$. $u_i$ is the i.i.d. normally distributed disturbance with mean 0 and variance $\sigma^2$. $\lambda_j$ is the scalar spatial parameter that captures the heterogeneous spatial interaction effect in the $j$th regime. $\phi_j$ captures the corresponding exogenous interaction effect (or contextual effect) in regime $j$. $q_i$ is the exogenous continuous threshold variable, with $\gamma$ being the threshold parameter. For instance, for social interaction in classrooms with random seat assignment (Lu and Anderson, 2015), students surrounded by more female neighbors might face different peer effects in academic performance than students with fewer female neighbors. Here the peer group of $i$ could be the whole class excluding $i$, and $q_i$ could be the proportion of females among $i$’s nearby 4 to 5 deskmates. Define

$$
D_\gamma =
\begin{pmatrix}
I(q_1 \le \gamma) & & & \\
 & I(q_2 \le \gamma) & & \\
 & & \ddots & \\
 & & & I(q_n \le \gamma)
\end{pmatrix},
$$

and $\bar{D}_\gamma = I_n - D_\gamma$, where $I(\cdot)$ is the indicator function. Also denote $Y_n = (y_1, y_2, \ldots, y_n)'$, $X_n = (x_1, x_2, \ldots, x_n)'$, $U_n = (u_1, u_2, \ldots, u_n)'$, $\tilde{X}_n = (D_\gamma X_n, D_\gamma W_n X_n, \bar{D}_\gamma X_n, \bar{D}_\gamma W_n X_n)$ and $\beta = (\theta_1', \phi_1', \theta_2', \phi_2')'$. In matrix form, (2.1) can be written as

$$
Y_n = \lambda_1 D_\gamma W_n Y_n + \lambda_2 \bar{D}_\gamma W_n Y_n + \tilde{X}_n \beta + U_n. \tag{2.2}
$$

Compared to the TSAR model in Deng (2018), the TSD model in (2.2) not only has heterogeneous spatial interaction effects $\lambda_1$ and $\lambda_2$ in the two regimes, but also allows for heterogeneous coefficients $\theta_1$, $\theta_2$, $\phi_1$ and $\phi_2$ on the exogenous regressors as well as the spatial Durbin term. Hence, it is a further generalization of the TSAR model in Deng (2018).

Let $S_n(\lambda, \gamma) = I_n - \lambda_1 D_\gamma W_n - \lambda_2 \bar{D}_\gamma W_n$ with $\lambda = (\lambda_1, \lambda_2)'$. If $S_n(\lambda, \gamma)$ is invertible, the reduced form for $Y_n$ can be represented by

$$
Y_n = S_n^{-1}(\lambda, \gamma) \tilde{X}_n \beta + S_n^{-1}(\lambda, \gamma) U_n. \tag{2.3}
$$
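To make the matrix notation concrete, the following is a minimal Python sketch that builds $D_\gamma$, $\bar{D}_\gamma$, $\tilde{X}_n$ and $S_n(\lambda, \gamma)$ and draws $Y_n$ from the reduced form (2.3). The function names, the circular toy weights matrix and the parameter values in the usage lines are illustrative assumptions, not part of the paper.

```python
import numpy as np

def build_regime_matrices(q, gamma):
    """Return D_gamma = diag(I(q_i <= gamma)) and its complement I_n - D_gamma."""
    d = (np.asarray(q) <= gamma).astype(float)
    D = np.diag(d)
    return D, np.eye(len(d)) - D

def build_x_tilde(X, W, D, D_bar):
    """Stack (D X, D W X, D_bar X, D_bar W X), matching beta = (theta1', phi1', theta2', phi2')'."""
    WX = W @ X
    return np.hstack([D @ X, D @ WX, D_bar @ X, D_bar @ WX])

def simulate_tsd(X, W, q, gamma, lam1, lam2, beta, sigma, rng):
    """Draw Y_n from the reduced form (2.3): Y = S^{-1} (X_tilde beta + U)."""
    n = X.shape[0]
    D, D_bar = build_regime_matrices(q, gamma)
    S = np.eye(n) - lam1 * (D @ W) - lam2 * (D_bar @ W)
    X_tilde = build_x_tilde(X, W, D, D_bar)
    U = sigma * rng.standard_normal(n)
    # Solve S Y = X_tilde beta + U instead of forming the inverse explicitly.
    Y = np.linalg.solve(S, X_tilde @ beta + U)
    return Y, S, X_tilde, U

# Toy usage (hypothetical values): each unit has two neighbours on a circle.
rng = np.random.default_rng(0)
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
X = rng.standard_normal((n, 1))          # k = 1 regressor
q = rng.normal(2.0, 1.0, size=n)         # threshold variable
beta = np.array([0.4, 0.2, 0.8, 0.6])    # (theta1, phi1, theta2, phi2)
Y, S, X_tilde, U = simulate_tsd(X, W, q, 2.0, 0.2, 0.6, beta, 1.0, rng)
```

Solving the linear system rather than inverting $S_n$ is a numerical convenience; the simulated $Y_n$ satisfies the structural form (2.2) by construction.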

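According to Horn and Johnson (1985), a sufficient stability condition that ensures $S_n(\lambda, \gamma)$ is invertible is $\|\lambda_1 D_\gamma W_n + \lambda_2 \bar{D}_\gamma W_n\| < 1$, where $\|\cdot\|$ denotes any matrix norm. This condition might be computationally demanding when $n$ is large. To simplify computation, we consider a more restrictive stability condition on $\lambda$ and $\gamma$ with the maximum row sum norm $\|\cdot\|_\infty$. Note that $D_\gamma + \bar{D}_\gamma = I_n$, so each row of $\lambda_1 D_\gamma W_n + \lambda_2 \bar{D}_\gamma W_n$ comes from exactly one of the two terms. Thus we have

$$
\begin{aligned}
\|\lambda_1 D_\gamma W_n + \lambda_2 \bar{D}_\gamma W_n\|_\infty
&\le \max \left\{ \|\lambda_1 D_\gamma W_n\|_\infty, \|\lambda_2 \bar{D}_\gamma W_n\|_\infty \right\} \\
&\le \max \left\{ |\lambda_1| \times \|D_\gamma\|_\infty \times \|W_n\|_\infty, |\lambda_2| \times \|\bar{D}_\gamma\|_\infty \times \|W_n\|_\infty \right\} \\
&\le \max \left\{ |\lambda_1| \times \|W_n\|_\infty, |\lambda_2| \times \|W_n\|_\infty \right\} = \max \{ |\lambda_1|, |\lambda_2| \} \times \|W_n\|_\infty.
\end{aligned}
$$

As long as $\max \{ |\lambda_1|, |\lambda_2| \} \times \|W_n\|_\infty < 1$, $S_n(\lambda, \gamma)$ will be invertible. Since the norm $\|W_n\|_\infty$ only needs to be evaluated once before running the MCMC sampler, the computational burden is greatly reduced. If $W_n$ is row-normalized, the condition reduces to $\max \{ |\lambda_1|, |\lambda_2| \} < 1$.

3. MCMC Estimation

Let $\Theta = (\lambda_1, \lambda_2, \gamma, \beta', \sigma^2)'$ be the parameter vector of the model in (2.2). The likelihood function[3] is

$$
f(Y_n \mid \Theta) \propto (\sigma^2)^{-n/2} \times |S_n(\lambda, \gamma)| \times \exp \left\{ - \frac{[S_n(\lambda, \gamma) Y_n - \tilde{X}_n \beta]' [S_n(\lambda, \gamma) Y_n - \tilde{X}_n \beta]}{2 \sigma^2} \right\}. \tag{3.1}
$$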
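As an illustration of how (3.1) and the simplified stability condition might be evaluated together, the sketch below returns the log-likelihood up to an additive constant and returns minus infinity when $\max\{|\lambda_1|, |\lambda_2|\} \times \|W_n\|_\infty \ge 1$. The function name `tsd_loglik` and the optional `w_norm` argument (a precomputed $\|W_n\|_\infty$) are assumptions made for this example.

```python
import numpy as np

def tsd_loglik(Y, X, W, q, lam1, lam2, gamma, beta, sigma2, w_norm=None):
    """Log of (3.1) up to an additive constant; -inf if the stability bound fails."""
    n = len(Y)
    if w_norm is None:
        # Maximum row sum norm; in a sampler this would be computed once and reused.
        w_norm = np.abs(W).sum(axis=1).max()
    if max(abs(lam1), abs(lam2)) * w_norm >= 1.0:
        return -np.inf
    d = (np.asarray(q) <= gamma).astype(float)
    # D_gamma W scales row i of W by d_i, so row i gets coefficient lam1 or lam2.
    lam_row = lam1 * d + lam2 * (1.0 - d)
    S = np.eye(n) - lam_row[:, None] * W
    WX = W @ X
    dc = d[:, None]
    X_tilde = np.hstack([dc * X, dc * WX, (1.0 - dc) * X, (1.0 - dc) * WX])
    # slogdet avoids overflow or underflow in the determinant for large n.
    _, logdet = np.linalg.slogdet(S)
    e = S @ Y - X_tilde @ beta
    return -0.5 * n * np.log(sigma2) + logdet - 0.5 * (e @ e) / sigma2
```

Working with the log-determinant rather than the determinant itself is a standard numerical precaution and is a design choice of this sketch, not something prescribed by the text.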

Assume the following priors for $\Theta$:

$$
\lambda_j \sim U(-1, 1),\ j = 1, 2; \qquad \gamma \sim U(\underline{\gamma}, \overline{\gamma}); \qquad \beta \sim N_{4k}(\beta_O, B_O); \qquad \sigma^2 \sim IG\left( \frac{a}{2}, \frac{b}{2} \right). \tag{3.2}
$$

Specifically, the priors of $\lambda_1$ and $\lambda_2$ are uniform distributions. As we begin with row-normalized $W_n$ in the following simulation study, we follow LeSage and Pace (2009) to set the upper and lower bounds of $\lambda_1$ and $\lambda_2$ to be, respectively, 1 and -1 for convenience.[4] We follow Geweke and Terui (1993) and Chen and Lee (1995) to assume a uniform prior over $(\underline{\gamma}, \overline{\gamma})$ for the threshold parameter $\gamma$. For some $\eta > 0$, $\underline{\gamma}$ and $\overline{\gamma}$ are usually set, respectively, as the lower and the upper $\eta\%$ percentile of the distinct values among the sorted observations on the threshold variable $Q_n = (q_1, q_2, \ldots, q_n)$. This prior restriction on $\gamma$ would also be imposed in the sampling steps to ensure the model is not fitting outliers in different regimes (Koop and Potter, 1999). The prior of $\beta$ is a $4k$-dimensional multivariate normal distribution with mean $\beta_O$ and variance-covariance matrix $B_O$. The prior of $\sigma^2$ is an inverse gamma distribution with shape parameter $a/2$ and scale parameter $b/2$. Coupled with the priors, the posterior distribution of $\Theta$ is given by

$$
p(\Theta \mid Y_n) \propto \pi(\lambda_1) \times \pi(\lambda_2) \times \pi(\gamma) \times \pi(\beta) \times \pi(\sigma^2) \times f(Y_n \mid \Theta).
$$

[3] For notational convenience, the exogenous variables $X_n$, the exogenous threshold variables $Q_n = (q_1, q_2, \ldots, q_n)$ as well as the exogenous spatial weights matrix $W_n$ are suppressed from the conditioning set of the likelihood function.
[4] As will be detailed below, the sampling step for $\lambda_1$ and $\lambda_2$ is a Metropolis-Hastings step. So their priors do not play a specific role (i.e., not conjugate priors). The stability condition in Section 2 on $\lambda_1$ and $\lambda_2$ would be imposed through the acceptance-rejection step in the MCMC sampler, not through the priors.

With conjugate priors, $\beta$ and $\sigma^2$ can be sampled directly from their conditional posterior distributions, which are, respectively, a multivariate normal and an inverse gamma distribution, through Gibbs sampling steps. However, an MH step is needed to sample the spatial parameters and the threshold parameter in $\delta = (\lambda_1, \lambda_2, \gamma)$, since their conditional posterior distribution does not take a known form. The whole MCMC sampler can be realized by the following steps:[5]

Step 1: Sample $\delta = (\lambda_1, \lambda_2, \gamma)$ from $p(\delta \mid Y_n, \beta, \sigma^2)$ using an MH step;

Step 2: Sample $\beta$ from $p(\beta \mid Y_n, \delta, \sigma^2)$ using a Gibbs step;

Step 3: Sample $\sigma^2$ from $p(\sigma^2 \mid Y_n, \delta, \beta)$ using a Gibbs step.
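The three steps above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the random-walk normal proposal for $\delta$, the step sizes, the starting values and all function names are choices made for this example (the paper defers its sampler details to the supplement). The Gibbs updates use the standard conjugate results implied by (3.1) and (3.2), and the default hyperparameters mirror those reported for the simulation study.

```python
import numpy as np

def design(X, W, q, lam1, lam2, gamma):
    """Return S_n(lambda, gamma) and X_tilde for a given threshold."""
    d = (np.asarray(q) <= gamma).astype(float)[:, None]
    S = np.eye(len(q)) - (lam1 * d + lam2 * (1.0 - d)) * W
    WX = W @ X
    return S, np.hstack([d * X, d * WX, (1.0 - d) * X, (1.0 - d) * WX])

def log_kernel_delta(Y, S, Xt, beta, sigma2):
    """Log conditional posterior kernel of delta (uniform priors are constant)."""
    e = S @ Y - Xt @ beta
    return np.linalg.slogdet(S)[1] - 0.5 * (e @ e) / sigma2

def tsd_mcmc(Y, X, W, q, n_iter=2000, step=(0.05, 0.05, 0.1),
             beta0=None, B0=None, a=2.0, b=2.0, eta=10.0, seed=0):
    """MH-within-Gibbs sketch; returns draws of (lam1, lam2, gamma, beta, sigma2)."""
    rng = np.random.default_rng(seed)
    n, p = len(Y), 4 * X.shape[1]
    beta0 = np.zeros(p) if beta0 is None else beta0
    B0_inv = np.linalg.inv(100.0 * np.eye(p) if B0 is None else B0)
    # Prior support of gamma: eta% and (100 - eta)% percentiles of distinct q values.
    g_lo, g_hi = np.percentile(np.unique(q), [eta, 100.0 - eta])
    w_norm = np.abs(W).sum(axis=1).max()      # evaluated once, reused below
    delta = np.array([0.0, 0.0, 0.5 * (g_lo + g_hi)])
    beta, sigma2 = np.zeros(p), 1.0
    draws = np.empty((n_iter, 3 + p + 1))
    for t in range(n_iter):
        # Step 1: random-walk MH for delta (assumed proposal, symmetric).
        S, Xt = design(X, W, q, *delta)
        prop = delta + np.asarray(step) * rng.standard_normal(3)
        lam_max = np.max(np.abs(prop[:2]))
        inside = lam_max < 1.0 and lam_max * w_norm < 1.0 and g_lo <= prop[2] <= g_hi
        if inside:  # proposals outside the constraints are rejected outright
            S_p, Xt_p = design(X, W, q, *prop)
            log_ratio = (log_kernel_delta(Y, S_p, Xt_p, beta, sigma2)
                         - log_kernel_delta(Y, S, Xt, beta, sigma2))
            if np.log(rng.uniform()) < log_ratio:
                delta, S, Xt = prop, S_p, Xt_p
        # Step 2: Gibbs draw of beta from its multivariate normal conditional.
        SY = S @ Y
        B_post = np.linalg.inv(B0_inv + Xt.T @ Xt / sigma2)
        b_post = B_post @ (B0_inv @ beta0 + Xt.T @ SY / sigma2)
        beta = b_post + np.linalg.cholesky(B_post) @ rng.standard_normal(p)
        # Step 3: Gibbs draw of sigma2 from IG((a + n)/2, (b + e'e)/2).
        e = SY - Xt @ beta
        sigma2 = 0.5 * (b + e @ e) / rng.gamma(0.5 * (a + n))
        draws[t] = np.concatenate([delta, beta, [sigma2]])
    return draws
```

In practice one would tune the proposal scales and discard burn-in draws before computing posterior means; updating $\lambda_1$, $\lambda_2$ and $\gamma$ in separate MH blocks is an equally plausible variant.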

4. Testing for Spatial Threshold Effect

4.1. Testing via Bayes factor

Before adopting the TSD model, it is necessary to test for spatial threshold effects, i.e., whether $\lambda_1$ and $\lambda_2$ are indeed different from each other.[6] For instance, when studying adolescents’ behavioral interaction in classrooms, one might wonder whether students with a higher proportion of female neighbors face larger peer effects than students with a lower proportion of female neighbors. In the Bayesian framework, this test can be formulated as a nested model selection problem between (2.2) and a nested competing model. Let $\lambda_2 = \lambda_1 + \psi$. Since $D_\gamma + \bar{D}_\gamma = I_n$, (2.2) can be reparameterized as

$$
Y_n = \lambda_1 W_n Y_n + \psi \bar{D}_\gamma W_n Y_n + \tilde{X}_n \beta + U_n. \tag{4.1}
$$

With $\psi = 0$, there is no spatial threshold effect and (4.1) reduces to a model with homogeneous spatial effect, namely,

$$
Y_n = \lambda_1 W_n Y_n + \tilde{X}_n \beta + U_n. \tag{4.2}
$$

[5] A detailed discussion of the MCMC sampler is provided in Supplements A.
[6] In this paper we focus on testing for the endogenous spatial threshold effects, i.e., whether $\lambda_1 = \lambda_2$ or not. The test on the threshold slope coefficients, i.e., whether $\theta_1 = \theta_2$ or $\phi_1 = \phi_2$, can be formulated in a similar manner.

So the hypothesis testing problem $H_0: \psi = 0$ against $H_1: \psi \ne 0$ is equivalent to a nested model selection problem between (4.1) and (4.2) under the Bayesian framework.

In the Bayesian paradigm, initiated by Zellner (1971), the Bayes factor is a popular criterion for model selection among nested or non-nested competing models (Kass and Raftery, 1995). Denote the unrestricted model in (4.1) as $M_1$ and the restricted competing model in (4.2) as $M_2$. Let $\Theta_2 = (\lambda_1, \gamma, \beta, \sigma^2)$ and $\Theta_1 = (\Theta_2, \psi)$ be, respectively, the parameter vectors of $M_2$ and $M_1$. Let $P(M_1)$ and $P(M_2)$ be the corresponding prior probabilities of the two models, and $\pi(\Theta_1)$ and $\pi(\Theta_2)$ be the prior densities for the parameters. The posterior odds, which is the product of the prior odds and the ratio of the marginal likelihoods of the two models, is:

$$
\underbrace{\frac{P(M_2 \mid Y_n)}{P(M_1 \mid Y_n)}}_{\text{Posterior odds}} = \underbrace{\frac{P(M_2)}{P(M_1)}}_{\text{Prior odds}} \times \underbrace{\frac{f_2(Y_n \mid M_2)}{f_1(Y_n \mid M_1)}}_{\text{Bayes factor}},
$$

where $f_i(Y_n \mid M_i) = \int f_i(Y_n \mid \Theta_i, M_i) \pi(\Theta_i) d\Theta_i$, $i = 1, 2$. Usually the same prior probabilities are assumed for competing models. Thus, one only needs to evaluate the Bayes factor, which is just the ratio of the two models’ marginal likelihoods. The model with a larger marginal likelihood is more likely to be the model that generates the data. However, with the threshold parameter $\gamma$ in the spatial weights, it might not be easy to evaluate the marginal likelihoods of the models in (4.1) and (4.2), because the univariate numerical integration procedure in LeSage and Pace (2009) is not directly applicable. On the other hand, the Savage-Dickey density ratio (SDDR) in Verdinelli and Wasserman (1995) can be employed to ease the computation of the Bayes factor for nested model selection. Below we provide a simple procedure to evaluate the Bayes factor of (4.2) over (4.1) using the SDDR.

4.2. Evaluating the Bayes factor through SDDR

Let $\pi(\cdot)$ be the prior under $H_1$ and $\pi_0(\cdot)$ be the prior under $H_0$. The Bayes factor of $M_2$ over $M_1$ is

$$
BF_{21} = \frac{f_2(Y_n \mid M_2)}{f_1(Y_n \mid M_1)} = \frac{\int f(Y_n \mid \psi = 0, \Theta_2) \pi_0(\Theta_2) d\Theta_2}{\int f(Y_n \mid \psi, \Theta_2) \pi(\psi, \Theta_2) d\psi d\Theta_2}.
$$

Let $C_1 = \int f(Y_n \mid \psi, \Theta_2) \pi(\psi, \Theta_2) d\psi d\Theta_2$. Following Verdinelli and Wasserman (1995), we have

$$
\begin{aligned}
BF_{21} &= p(\psi = 0 \mid Y_n) \times \frac{\int f(Y_n \mid \psi = 0, \Theta_2) \pi_0(\Theta_2) d\Theta_2}{p(\psi = 0 \mid Y_n) \times C_1} \\
&= p(\psi = 0 \mid Y_n) \times \int \frac{f(Y_n \mid \psi = 0, \Theta_2) \pi_0(\Theta_2) p(\Theta_2 \mid Y_n, \psi = 0)}{p(\psi = 0, \Theta_2 \mid Y_n) \times C_1} d\Theta_2 \\
&= \frac{p(\psi = 0 \mid Y_n)}{\pi(\psi = 0)} \times \int \frac{\pi_0(\Theta_2)}{\pi(\Theta_2 \mid \psi = 0)} p(\Theta_2 \mid Y_n, \psi = 0) d\Theta_2,
\end{aligned} \tag{4.3}
$$

where the last line follows because $p(\psi = 0, \Theta_2 \mid Y_n) \times C_1 = f(Y_n \mid \psi = 0, \Theta_2) \times \pi(\psi = 0, \Theta_2)$. Also note that $\pi_0(\Theta_2) = \pi(\Theta_2 \mid \psi = 0)$ because all priors are independent, so the integral in (4.3) equals one. Therefore, (4.3) can be further simplified to

$$
BF_{21} = \frac{p(\psi = 0 \mid Y_n)}{\pi(\psi = 0)}, \tag{4.4}
$$

which is labeled the Savage-Dickey density ratio (SDDR) in Verdinelli and Wasserman (1995). If we assume a uniform prior for $\psi$, the denominator of (4.4) is just a constant. Hence, it remains to evaluate the marginal posterior density $p(\psi = 0 \mid Y_n)$. Denote $\delta_{-\psi} = (\lambda_1, \gamma)$ and $\Theta_2 = (\delta_{-\psi}, \beta, \sigma^2)$. We first modify the MCMC sampler in Section 3 to use two MH steps to sample $\delta_{-\psi}$ from $p(\delta_{-\psi} \mid Y_n, \psi, \beta, \sigma^2)$ and $\psi$ from $p(\psi \mid Y_n, \delta_{-\psi}, \beta, \sigma^2)$. Then, according to Chib and Jeliazkov (2001), $p(\psi = 0 \mid Y_n)$ can be approximated by[7]

$$
p(\psi = 0 \mid Y_n) = \frac{E_u \left( Pr(\psi, 0 \mid Y_n, \Theta_2) g(0 \mid \psi) \right)}{E_r \left( Pr(0, \psi \mid Y_n, \Theta_2) \right)} \approx \frac{L^{-1} \sum_{l=1}^{L} Pr(\psi^{(l)}, 0 \mid Y_n, \Theta_2^{(l)}) g(0 \mid \psi^{(l)})}{J^{-1} \sum_{j=1}^{J} Pr(0, \psi^{(j)} \mid Y_n, \Theta_2^{(j)})}, \tag{4.5}
$$

where $Pr(\psi, 0 \mid Y_n, \Theta_2)$ is the acceptance probability of moving from $\psi$ to 0 and $g(0 \mid \psi)$ is the corresponding proposal density. The expectation $E_u$ is taken with respect to the posterior density of the unrestricted model $p(\Theta_1 \mid Y_n)$, and the $\psi^{(l)}$’s and $\Theta_2^{(l)}$’s are posterior draws from it, whereas the expectation $E_r$ is taken with respect to the density $g(\psi \mid 0) \times p(\Theta_2 \mid Y_n, \psi = 0)$, and the $\psi^{(j)}$’s and $\Theta_2^{(j)}$’s are MCMC draws from it. Thus, to calculate $p(\psi = 0 \mid Y_n)$, in addition to the MCMC draws from the unrestricted model in (4.1), one merely needs to obtain the reduced MCMC draws from $g(\psi \mid 0) \times p(\Theta_2 \mid Y_n, \psi = 0)$.

4.3. Alternative tests based on the MCMC samples

The previous subsections formulate the test for spatial threshold effect as a nested model selection problem between the restricted and the TSD models, via the Bayes factor. Alternatively, one may use the Bayesian t-test in the recent Bayesian mathematical statistics literature (Gönen et al., 2005, 2019; Rouder et al., 2009; Wang and Liu, 2016; Gronau et al., 2019) to conduct a more direct test for threshold effect, based upon the MCMC samples of $\lambda_1$ and $\lambda_2$.[8] Specifically, if we follow Gönen et al. (2005) to treat the MCMC samples of $\lambda_1$ and $\lambda_2$ as two groups of independent samples, the test of whether $\psi = \lambda_2 - \lambda_1$ equals zero can be formulated as a t-test for the difference of the two group means. Gönen et al. (2005) have shown that, under some prior assumptions on the common grand mean and standard deviation of the two groups, as well as the standardized threshold effect size, the analytical expression of the Bayes factor for testing $H_0: \psi = 0$ against $H_1: \psi \ne 0$ can be derived. They label the Bayes factor the “Bayesian t-test statistic” because its expression involves the two-sample t-statistic for testing the difference of the two group means. In addition to the Bayes factor in Subsections 4.1 and 4.2, we utilize the MCMC samples of $\lambda_1$ and $\lambda_2$ to conduct the Bayesian t-test in Gönen et al. (2005) to test for spatial threshold effect. Meanwhile, as the posterior draws of $\lambda_1$ and $\lambda_2$ might not be independent, we also follow Chen et al. (2010) to employ the Bayesian 95% confidence interval to make inferences on $\psi = \lambda_2 - \lambda_1$. The detailed discussions about these alternative tests and the corresponding simulation results are provided in Supplements C.2 and C.3.

[7] The detailed derivation of (4.5) can be found in Supplements B.
[8] We thank one referee for suggesting the Bayesian t-test as a more direct alternative to the Bayes factor in Subsection 4.1.

5. Simulation study

In this section, we first evaluate the performance of the MCMC sampler for the TSD model using simulated data sets, and compare it with the S2SLS method in Deng (2018).[9] We then utilize the SDDR to test whether $\psi = \lambda_2 - \lambda_1 = 0$. We consider the TSD model in (4.1) with $x_i \sim N(0, 1)$, $q_i \sim N(2, 1)$, $u_i \sim N(0, 1)$ and sample size $n \in \{100, 250, 500\}$. We generate the spatial weights matrix $W_n$ as follows: first generate an $n \times n$ zero-diagonal, non-row-normalized matrix $W_n^u = [w_{ij}^u]$, where $w_{ij}^u = 1$ if $|i - j| \le 3$ and $i \ne j$, and $w_{ij}^u = 0$ otherwise, and then row-normalize $W_n^u$ to obtain $W_n$. The data are simulated with $\gamma = 2$, $\lambda_1 = 0.2$, $\theta_1 = 0.4$, $\phi_1 = 0.2$, $\theta_2 = 0.8$, $\phi_2 = 0.6$ and different values of $\lambda_2$. In particular, $\lambda_2 \in \{0.4, 0.6, 0.8\}$ for estimation and $\lambda_2 \in \{0.2, 0.4, 0.6, 0.8\}$ for model selection. The number of repetitions $R$ is 1,000 for all experiments. The length of the Markov chain in each repetition is 10,000, with the first 2,000 draws discarded as burn-in samples.[10]
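The banded weights matrix described in this design can be built in a few lines. The following Python sketch is illustrative only; the function name `band_weights` and its `bandwidth` argument are assumptions introduced for the example.

```python
import numpy as np

def band_weights(n, bandwidth=3):
    """Row-normalized W_n with w_ij^u = 1 if 0 < |i - j| <= bandwidth, else 0."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    W_u = ((dist <= bandwidth) & (dist > 0)).astype(float)
    # Every unit has at least one neighbour for n >= 2, so row sums are positive.
    return W_u / W_u.sum(axis=1, keepdims=True)

W = band_weights(100)
# After row normalization the maximum row sum norm equals 1, so the
# stability condition of Section 2 reduces to max(|lambda_1|, |lambda_2|) < 1.
w_norm = np.abs(W).sum(axis=1).max()
```

Units near the two ends of the ordering have fewer than six neighbours, so their nonzero weights are larger than the interior value of 1/6.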

For model estimation, we use the posterior mean of the MCMC draws as the point estimate for each parameter and compute the mean absolute error (MAE) and root mean squared error (RMSE) of the parameter estimates across 1,000 repetitions. Table 1 reports the MAE and RMSE of the threshold and slope parameter estimates under different settings. For comparison, we calculate the following relative MAE and RMSE:

$$
\text{Relative MAE} = \frac{MAE^{S2SLS}}{MAE^{MCMC}}; \qquad \text{Relative RMSE} = \frac{RMSE^{S2SLS}}{RMSE^{MCMC}}.
$$

A relative MAE or RMSE greater than (less than) 1 indicates that the S2SLS approach performs worse (better) than the MCMC approach. When $n$ increases and the difference between spatial parameters ($\lambda_2 - \lambda_1$) enlarges, the relative MAE and RMSE of the parameter estimates of the S2SLS method tend to decrease in most cases, but remain much larger than 1. This suggests that the MCMC estimator is more precise than the S2SLS estimator. To give readers some sense of how long the Bayesian estimation would take,[11] we run the MCMC algorithm on the simulated data set for the TSD model, on a computer with 64G memory and an Intel Core i9-9900K processor, and calculate the average computational time across 1,000 repetitions. The results are provided in Table 2. We find that, with a sample of 500, it takes about 22 minutes to finish the whole MCMC estimation procedure.
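For readers who want to reproduce the comparison metric, a minimal Python sketch of the relative MAE and RMSE is given below. The function names are assumptions, and the inputs are meant to be arrays of point estimates of one parameter across repetitions.

```python
import numpy as np

def mae(estimates, truth):
    """Mean absolute error of point estimates across repetitions."""
    return np.mean(np.abs(np.asarray(estimates) - truth))

def rmse(estimates, truth):
    """Root mean squared error of point estimates across repetitions."""
    return np.sqrt(np.mean((np.asarray(estimates) - truth) ** 2))

def relative_errors(est_s2sls, est_mcmc, truth):
    """Ratios above 1 indicate that S2SLS is less precise than MCMC."""
    return {
        "relative_mae": mae(est_s2sls, truth) / mae(est_mcmc, truth),
        "relative_rmse": rmse(est_s2sls, truth) / rmse(est_mcmc, truth),
    }
```

For example, with purely hypothetical estimates `[0.4, 0.8]` from S2SLS and `[0.5, 0.7]` from MCMC for a true value of 0.6, both ratios equal 2.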

For model selection, we first evaluate $BF_{21}$, i.e., the Bayes factor of the restricted model over the TSD model, and compute $\log_{10} BF_{12} = -\log_{10} BF_{21}$.[12] We then follow the guidelines in Jeffreys (1961) to compare $\log_{10} BF_{12}$ with a cut-off $\alpha$, where $\alpha \in \{0, 1/2, 1, 3/2, 2\}$. If $\log_{10} BF_{12}$ exceeds $\alpha$, the TSD model ($M_1$) is preferred to the restricted model ($M_2$).[13] We calculate the percentage of repetitions in which the TSD model is selected as the better model, for different values of $\alpha$. For each $\alpha$, we consider different values of $\psi$ by varying $\lambda_2$ while holding other parameters fixed as in estimation. Table 3 provides the percentages of the TSD model being selected as the better model under different settings. For cases with $\psi = 0$ (bolded in Table 3), the true model is the restricted model, and the percentages of the TSD model being selected as the better model are quite low, indicating that the model selection procedure tends to choose the true model. For cases with $\psi \ne 0$, the true model is the TSD model. We find that for a given $\alpha$, the percentage of the TSD model being selected increases when $n$ increases and the difference between $\lambda_1$ and $\lambda_2$ ($\psi = \lambda_2 - \lambda_1$) becomes larger. In addition, as $\alpha$ decreases, the criterion to support the TSD model becomes less conservative, so the percentage of the TSD model being selected increases. Particularly, when $\alpha = 0$ and $\psi = 0.6$, the percentage of the TSD model being selected is nearly 100%, providing strong evidence that the model selection procedure is reliable.

[9] We choose $Z_\gamma = (D_\gamma X_n, \bar{D}_\gamma X_n, D_\gamma W_n X_n, \bar{D}_\gamma W_n X_n, W_n D_\gamma W_n X_n, W_n \bar{D}_\gamma W_n X_n)$ as the instruments for $W_n Y_n$. The detailed S2SLS estimation procedure can be found in Deng (2018).
[10] We set the hyperparameters of the prior distributions in (3.2) as $(\beta_O, B_O) = (0, 100 \times I_4)$, $(a, b) = (2, 2)$, and $\underline{\gamma}$ and $\overline{\gamma}$ to be, respectively, the 10% and 90% percentile of the distinct values of the $q_i$’s.
[11] We thank one referee for suggesting the need of reporting the computational time of MCMC estimation.
[12] For model selection, we assume a uniform prior for $\psi = \lambda_2 - \lambda_1$, i.e., $\psi \sim U(-2, 2)$.
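The decision rule just described can be sketched in Python as follows, assuming an estimate of the posterior ordinate $p(\psi = 0 \mid Y_n)$ is already available (for instance from (4.5)). The function names and the numeric ordinate in the usage lines are hypothetical; the $U(-2, 2)$ prior and the cut-offs are those stated in the text.

```python
import math

def log10_bf12_from_sddr(post_density_at_zero, prior_lower=-2.0, prior_upper=2.0):
    """log10 BF12 from the SDDR (4.4) under a uniform prior for psi."""
    prior_density_at_zero = 1.0 / (prior_upper - prior_lower)
    bf21 = post_density_at_zero / prior_density_at_zero
    return -math.log10(bf21)

def prefers_tsd(log10_bf12, alpha):
    """The TSD model (M1) is preferred when log10 BF12 exceeds the cut-off."""
    return log10_bf12 > alpha

# Hypothetical posterior ordinate at psi = 0, for illustration only.
value = log10_bf12_from_sddr(0.01)
decisions = [prefers_tsd(value, alpha) for alpha in (0.0, 0.5, 1.0, 1.5, 2.0)]
```

With a posterior ordinate of 0.01 and prior density 0.25, $BF_{21} = 0.04$, so $\log_{10} BF_{12}$ is about 1.4 and the TSD model would be selected for the three smallest cut-offs only.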

6. Conclusion

In this paper we consider a TSD model that allows for threshold spatial effects for both spatial lags and spatial Durbin terms in different regimes, where regimes are defined by an exogenous threshold variable. We propose a computationally tractable Bayesian MCMC algorithm to estimate the model. We also study a nested model selection procedure to test for spatial threshold effects, based upon the SDDR in Verdinelli and Wasserman (1995), under the Bayesian framework. Monte Carlo experiments show that our Bayesian estimator is more efficient than the S2SLS estimator in Deng (2018). The model selection procedure performs well when the sample size increases and the difference between spatial parameters enlarges.

[13] According to Jeffreys (1961), if $\log_{10} BF_{12} > \alpha$ with $\alpha = 0$, we have “weak evidence” for the TSD model over the restricted model. With $\alpha = 2$, the same relation implies we have “decisive” support for the TSD model.

References

Anselin, L., 1988. Spatial Econometrics: Methods and Models. Boston: Kluwer Academic Publishers.
Caner, M., Hansen, B., 2004. Instrumental variable estimation of a threshold model. Econometric Theory 20, 813–843.
Chen, C., Lee, J., 1995. Bayesian inference of threshold autoregressive models. Journal of Time Series Analysis 16, 483–492.
Chen, R., Guo, R., Lin, M., 2010. Self-selectivity in firm’s decision to withdraw IPO: Bayesian inference for hazard models of bankruptcy with feedback. Journal of the American Statistical Association 492, 1297–1309.
Chib, S., Jeliazkov, I., 2001. Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association 96, 270–281.
Deng, Y., 2018. Estimation for the spatial autoregressive threshold model. Economics Letters 171, 172–175.
Geweke, J., Terui, N., 1993. Bayesian threshold autoregressive models for nonlinear time series. Journal of Time Series Analysis 14, 441–454.

Gönen, M., Johnson, W., Lu, Y., Westfall, P., 2005. The Bayesian two-sample t-test. The American Statistician 59, 252–257.
Gönen, M., Johnson, W., Lu, Y., Westfall, P., 2019. Comparing objective and subjective Bayes factors for the two-sample comparison: The classification theorem in action. The American Statistician 73, 22–31.
Gong, J., Lu, Y., Song, H., 2019. Gender peer effects on students’ academic and noncognitive outcomes: Evidence and mechanisms. Journal of Human Resources Forthcoming.
Gronau, Q., Ly, A., Wagenmakers, E., 2019. Informed Bayesian t-tests. The American Statistician DOI: 10.1080/00031305.2018.1562983.

Hansen, B., 2000. Sample splitting and threshold estimation. Econometrica 68, 575–603.

Hansen, B., 2011. Threshold autoregression in economics. Statistics and Its Interface 4, 123–127.

Horn, R., Johnson, C., 1985. Matrix Analysis. New York: Cambridge University Press.

Hsieh, C., van Kippersluis, H., 2018. Smoking initiation: Peer and personality. Quantitative Economics 9, 825–863.

Jeffreys, H., 1961. Theory of Probability, 3rd edn. Oxford: Clarendon Press.

Kass, R., Raftery, A., 1995. Bayes factors. Journal of the American Statistical Association 90(430), 773–795.

Kelejian, H., Prucha, I., 1998. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics 17, 99–121.

Koop, G., Potter, S., 1999. Dynamic asymmetries in U.S. unemployment. Journal of Business & Economic Statistics 17, 298–312.

Lavy, V., Schlosser, A., 2011. Mechanisms and impacts of gender peer effects at school. American Economic Journal: Applied Economics 3, 1–33.

Lee, L., 2004. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72, 1899–1925.

Lee, L., 2007. GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. Journal of Econometrics 137, 489–514.

Lee, L., Liu, X., Lin, X., 2010. Specification and estimation of social interaction models with network structures. The Econometrics Journal 13, 145–176.

LeSage, J., Pace, R., 2009. Introduction to Spatial Econometrics. Boca Raton, FL: CRC Press.

Lu, F.W., Anderson, M., 2015. Peer effects in microenvironments: The benefits of homogeneous classroom groups. Journal of Labor Economics 33, 91–122.

Rouder, J., Speckman, P., Sun, D., Morey, R., Iverson, G., 2009. Bayesian t-tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review 16, 225–237.

Tong, H., 1978. On a threshold model, in: Pattern Recognition and Signal Processing. Amsterdam: Sijthoff and Noordhoff.

Verdinelli, I., Wasserman, L., 1995. Computing Bayes factors using a generalization of the Savage-Dickey density ratio. Journal of the American Statistical Association 90, 614–618.

Wang, M., Liu, G., 2016. A simple two-sample Bayesian t-test for hypothesis testing. The American Statistician 70, 195–201.

Wu, S., Chen, R., 2007. Threshold variable determination and threshold variable driven switching autoregressive models. Statistica Sinica 17, 241–264.

Yu, J., Zhou, L., Zhu, G., 2016. Strategic interaction in political competition: Evidence from spatial effects across Chinese cities. Regional Science and Urban Economics 57, 23–37.

Zellner, A., 1971. An Introduction to Bayesian Inference in Econometrics. New York: J. Wiley and Sons, Inc.

Table 1: MAE and RMSE of the threshold and slope parameter estimates (qi ∼ N(2, 1), rep = 1000).

                 MCMC: MAE          S2SLS: Relative MAE          MCMC: RMSE         S2SLS: Relative RMSE
Parameter    n=100   n=250   n=500   n=100   n=250   n=500   n=100   n=250   n=500   n=100   n=250   n=500

λ1 = 0.2     0.181   0.128   0.088   8.087   6.100   5.053   0.233   0.170   0.113  10.820   9.188   5.462
λ2 = 0.4     0.190   0.124   0.091   8.787   7.330   5.300   0.242   0.161   0.116  12.195  16.262   6.019
γ = 2        0.265   0.182   0.102   1.785   1.270   1.076   0.353   0.263   0.173   1.776   1.402   1.162
θ1 = 0.4     0.148   0.084   0.057   1.590   1.423   1.306   0.194   0.108   0.073   1.998   1.726   1.342
δ1 = 0.2     0.409   0.241   0.150   3.534   3.188   2.994   0.544   0.309   0.189   4.463   4.425   3.172
θ2 = 0.8     0.152   0.084   0.059   1.934   1.683   1.458   0.197   0.108   0.075   2.440   2.316   1.486
δ2 = 0.6     0.396   0.242   0.159   4.250   3.544   3.159   0.529   0.306   0.203   4.957   5.051   3.246

λ1 = 0.2     0.176   0.114   0.079   8.327   4.735   4.115   0.227   0.146   0.101  24.365   5.436   4.382
λ2 = 0.6     0.197   0.110   0.074   8.557   5.503   4.675   0.259   0.141   0.094  37.627   7.823   4.913
γ = 2        0.236   0.121   0.062   1.713   1.536   1.275   0.320   0.191   0.114   1.737   1.636   1.247
θ1 = 0.4     0.146   0.077   0.054   1.623   1.333   1.236   0.209   0.100   0.069   2.468   1.359   1.236
δ1 = 0.2     0.386   0.228   0.154   3.505   2.691   2.447   0.510   0.289   0.194   6.744   2.925   2.557
θ2 = 0.8     0.137   0.082   0.057   2.015   1.654   1.381   0.180   0.103   0.071   3.204   1.957   1.403
δ2 = 0.6     0.403   0.227   0.153   3.501   3.189   2.764   0.534   0.289   0.196   4.822   3.735   2.758

λ1 = 0.2     0.155   0.096   0.066   4.462   3.786   3.456   0.200   0.122   0.084   6.145   4.369   3.698
λ2 = 0.8     0.158   0.079   0.057   6.371   5.904   4.723   0.220   0.108   0.074  11.211   6.253   4.926
γ = 2        0.175   0.062   0.029   1.936   2.402   2.202   0.255   0.096   0.045   1.926   2.736   2.790
θ1 = 0.4     0.135   0.076   0.053   1.332   1.297   1.200   0.174   0.097   0.067   1.724   1.315   1.192
δ1 = 0.2     0.379   0.218   0.151   2.270   2.182   2.017   0.485   0.276   0.189   2.698   2.310   2.096
θ2 = 0.8     0.146   0.078   0.055   1.888   1.661   1.427   0.191   0.099   0.068   2.862   1.817   1.445
δ2 = 0.6     0.391   0.210   0.148   3.290   3.081   2.749   0.511   0.274   0.188   5.401   3.145   2.749

Note: The relative MAE and RMSE are calculated as follows: $\text{Relative MAE} = \mathrm{MAE}_{\mathrm{S2SLS}} / \mathrm{MAE}_{\mathrm{MCMC}}$; $\text{Relative RMSE} = \mathrm{RMSE}_{\mathrm{S2SLS}} / \mathrm{RMSE}_{\mathrm{MCMC}}$. Relative MAE or RMSE greater than (less than) 1 indicates that the S2SLS approach performs worse (better) than the MCMC approach. Each experiment is replicated 1,000 times.


Table 2: The average computational time for the MCMC estimation procedure (Unit: minutes)

Parameters settings    n = 100    n = 250    n = 500
λ2 = 0.4                0.1947     2.5372    22.8487
λ2 = 0.6                0.1958     2.4416    22.8726
λ2 = 0.8                0.1973     2.5171    22.5937

Note: Each experiment is replicated 1,000 times. We run the MCMC algorithm on the simulated data set for the TSD model, on a computer with 64 GB of memory and an Intel Core i9-9900K processor.

Table 3: Percentages of the threshold spatial Durbin (TSD) model to be selected as the better model

Criterion for Bayes factor   Parameters settings               n = 100   n = 250   n = 500

α = 2                        λ1 = 0.2, λ2 = 0.2 (ψ = 0)           0.0%      0.0%      0.1%
                             λ1 = 0.2, λ2 = 0.4 (ψ = 0.2)         0.0%      0.1%      0.6%
                             λ1 = 0.2, λ2 = 0.6 (ψ = 0.4)         0.4%      3.6%     18.1%
                             λ1 = 0.2, λ2 = 0.8 (ψ = 0.6)         6.3%     36.3%     85.0%

α = 3/2                      λ1 = 0.2, λ2 = 0.2 (ψ = 0)           0.1%      0.1%      0.2%
                             λ1 = 0.2, λ2 = 0.4 (ψ = 0.2)         0.0%      0.6%      2.6%
                             λ1 = 0.2, λ2 = 0.6 (ψ = 0.4)         1.1%      6.3%     25.5%
                             λ1 = 0.2, λ2 = 0.8 (ψ = 0.6)         9.2%     45.9%     89.3%

α = 1                        λ1 = 0.2, λ2 = 0.2 (ψ = 0)           0.1%      0.3%      0.4%
                             λ1 = 0.2, λ2 = 0.4 (ψ = 0.2)         0.7%      1.9%      4.1%
                             λ1 = 0.2, λ2 = 0.6 (ψ = 0.4)         2.7%     10.8%     35.0%
                             λ1 = 0.2, λ2 = 0.8 (ψ = 0.6)        16.5%     57.8%     94.0%

α = 1/2                      λ1 = 0.2, λ2 = 0.2 (ψ = 0)           1.5%      1.2%      0.8%
                             λ1 = 0.2, λ2 = 0.4 (ψ = 0.2)         2.3%      4.5%      8.6%
                             λ1 = 0.2, λ2 = 0.6 (ψ = 0.4)         8.8%     20.7%     51.1%
                             λ1 = 0.2, λ2 = 0.8 (ψ = 0.6)        28.0%     71.8%     97.4%

α = 0                        λ1 = 0.2, λ2 = 0.2 (ψ = 0)           7.4%      6.3%      4.1%
                             λ1 = 0.2, λ2 = 0.4 (ψ = 0.2)        11.1%     10.9%     17.1%
                             λ1 = 0.2, λ2 = 0.6 (ψ = 0.4)        22.6%     38.5%     66.9%
                             λ1 = 0.2, λ2 = 0.8 (ψ = 0.6)        50.8%     85.6%     99.3%

Note: Each experiment is replicated 1,000 times. For cases with ψ = 0, the true model is the restricted model. For cases with ψ ≠ 0, the true model is the TSD model. If log10 BF12 exceeds α, the TSD model (M1) is preferred to the restricted model (M2). The size of α reflects the degree of support for the TSD model.
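The selection rule in the note to Table 3 can be illustrated with a short Python sketch: a replication counts as selecting the TSD model when log10 BF12 exceeds α, and each table entry is the share of replications for which that happens. The Bayes factor values and the function names (`prefers_tsd`, `selection_rate`) below are hypothetical and only illustrate the rule; they are not output from the paper's experiments.

```python
import math


def prefers_tsd(bf12, alpha):
    """True when log10(BF12) exceeds alpha, so the TSD model (M1) is preferred."""
    return math.log10(bf12) > alpha


def selection_rate(bayes_factors, alpha):
    """Percentage of replications in which the TSD model is selected."""
    selected = sum(prefers_tsd(bf, alpha) for bf in bayes_factors)
    return 100.0 * selected / len(bayes_factors)


# Hypothetical Bayes factors BF12 from five replications (must be positive)
bfs = [0.5, 2.0, 15.0, 150.0, 4000.0]

# The five thresholds used in Table 3: alpha = 0, 1/2, 1, 3/2, 2
rates = {alpha: selection_rate(bfs, alpha) for alpha in (0, 0.5, 1, 1.5, 2)}
for alpha, rate in rates.items():
    print(f"alpha = {alpha}: TSD selected in {rate:.1f}% of replications")
```

A larger α demands stronger evidence before the TSD model is chosen, which is why the selection percentages in each column of Table 3 fall as α rises.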
