Spatial dynamic panel data models with random effects

Spatial dynamic panel data models with random effects

Regional Science and Urban Economics 42 (2012) 727–738 Contents lists available at SciVerse ScienceDirect Regional Science and Urban Economics journ...

NAN Sizes 0 Downloads 129 Views

Regional Science and Urban Economics 42 (2012) 727–738

Contents lists available at SciVerse ScienceDirect

Regional Science and Urban Economics journal homepage: www.elsevier.com/locate/regec

Spatial dynamic panel data models with random effects☆ Olivier Parent a,⁎, James P. LeSage b a b

Department of Economics, University of Cincinnati, Cincinnati, OH 45221, United States Urban and Regional Economics, Department of Finance and Economics, Texas State University‐San Marcos, San Marcos, TX 78666, United States

a r t i c l e

i n f o

Article history: Received 12 December 2011 Received in revised form 23 April 2012 Accepted 27 April 2012 Available online 3 May 2012 JEL classification: C11 C21 C23 O40

a b s t r a c t We develop a general space–time filter applied to panel data models in order to control for heterogeneity as well as both time and spatial dependence. Treatment of initial period observations is analyzed when the number of time periods is small. A second issue relates to a restriction implied by the filter specification on the space– time cross-product term that can greatly simplify interpretation of model estimates as well as the estimation procedure. An applied illustration of the method is provided using a Solow growth model. The application shows that the theoretical restriction implied for the cross-product term in our space–time filter specification is consistent with this particular dynamic space–time panel data set. © 2012 Elsevier B.V. All rights reserved.

Keywords: Spatial correlation Dynamic panels Bayesian estimations

1. Introduction Dynamic panel data models that accommodate spatial dependence have attracted attention recently in the panel data literature. Spatial econometrics allows for interaction of cross-sectional units in space whereas dynamic models evaluate how observations are related over time. The development of new theoretical and empirical models in social science to analyze social networks (Bramoullé et al., 2009), habit and interaction effects (Korniotis, 2010), externalities and spillover effects (Mohl and Hagen, 2010; Lottmann, 2012), consider spatiotemporal interaction involving the dependent variable. A variety of models that control for various types of correlation across locations have been explored (see Lee and Yu, 2010, for an excellent review). Su and Yang (2007) consider a dynamic model for spatial dependence in the error terms, whereas Yu et al. (2008) introduce a dynamic panel approach that treats spatial dependence in the dependent variable vector. Yu et al. (2008) implement a specification that allows for spatial and time dependence as well as component mixing of space and time dependence that can be interpreted as spatial diffusion that takes place over time. We extend this approach introducing a space–time filter

☆ We would like to thank the Editor and two anonymous referees for their constructive comments. Financial support from the Charles Phelps Taft Research Center is gratefully acknowledged. ⁎ Corresponding author. E-mail addresses: [email protected] (O. Parent), [email protected] (J.P. LeSage). 0166-0462/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.regsciurbeco.2012.04.008

that we apply to the dependent variable producing a specification that subsumes many of the space–time panel data model specifications that have appeared in the literature (e.g., Elhorst, 2004) as special cases. The filter specification implies a constraint on the mixing term that reflects spatial diffusion or space–time covariance. We show that imposing this constraint results in a type of separability of space and time dependence that simplifies the estimation procedure as well as interpretation of the marginal effects. Moreover we note that omitting this constraint could lead to an overidentification problem. The constraint allows for a number of simplifications in the Bayesian Markov Chain Monte Carlo (MCMC) estimation scheme we set forth. In our applied illustration we find the theoretically implied constraint to be consistent with the sample data, suggesting there are likely other cases where the constraint and associated simplifications might be useful in practice. In fact, Keller and Shiue (2007) analyzing interregional trade patterns in China emphasize the importance of including this component mixing space and time, and their estimates appear to represent another empirical example where the restriction on the space–time interaction term is consistent with the sample data. With large number of cross sectional observations (N) and time periods (T), Yu et al. (2008) have developed and studied estimation procedures that treat the data generating process as conditional on the initial period cross-sectional observations. In the context of models that treat spatial dependence in the disturbances rather than the dependent variable, treatment of the initial period observations has been found to be important (Su and Yang, 2007). We explore this issue in the context of our space–time filter specification that models

728

O. Parent, J.P. LeSage / Regional Science and Urban Economics 42 (2012) 727–738

space–time dependence in the dependent variable. We set forth MCMC approaches that treat the initial period observations as exogenous or conditional on the initial period cross-sectional observations as in the case of Yu et al. (2008), as well as an approach for treating these observations as endogenous. The latter draws on work by Bhargava and Sargan (1983) that estimates or predicts initial period observations which we adapt to our MCMC estimation scheme. A Monte Carlo study shows that when the initial cross-sectional observations are endogenous, but incorrectly treated as exogenous, estimates and inferences can be biased, a finding similar to that of Su and Yang (2007) for the case where dependence is modeled in the disturbances. An applied illustration based on a Solow growth model for 48 US states over the period 1973–1997 is used to demonstrate a situation where the theoretical restriction on the space–time cross product term implied by the filter specification is consistent with the sample data. Moreover, this application demonstrates the importance of treating the initial period observation correctly. Using a shorter time span, estimation results reveal an improvement when the first cross section is treated as endogenous and modeled using an optimal predictor robust to spatial dependence. We focus only on panel data models with random effects. Analyzing a dynamic panel data model with spatial dependence in the error term, Su and Yang (2007) found that estimates based on the MLE method for both fixed and random effects are consistent and asymptotically normally distributed. For data involving short panels, they show that MLE estimates are not consistent only if the initial period observation is misspecified. For the case of a spatial dynamic panel data model with fixed effects, Yu et al. (2008) show that the MLE method is biased, and Hsiao (2003, Chapter 4) notes that if individual effects are random and exogenous, estimates based on random effects are more efficient. As shown by Lee and Yu (2010), the direct estimation method without a proper transformation that eliminates the fixed effects will not yield consistent estimates for the variance parameter when T is small. A similar conclusion was found by Yu and Lee (2010) in analyzing spatial panel data models with fixed effects. Thus, they proposed a new transformation that eliminates fixed effects to obtain consistent estimates. However, this transformation reduces the time dimension of the sample which poses problems for cases where T is small. This provides another motivation for use of random effects rather than a transformation in conjunction with fixed effects. 1 In Section 2, we set forth the space–time filter applied to the dependent variable vector in a panel data setting, and discuss issues pertaining to treatment of the initial period observations. In Section 3 we describe Bayesian MCMC estimation, and results from a Monte Carlo experiment are presented in Section 4. Section 5 provides an applied illustration with a space– time dynamic Solow growth model and Conclusions are in Section 6. 2. A dynamic spatial lag model with random effects We introduce serial and spatial dependence following Anselin (2001). The general model is based on a dynamic panel data model with random effects including a spatial autoregressive process governing the dependent variable at the contemporaneous time t as well as the previous period t − 1 for t = 1,…, T: yt ¼ ϕyt−1 þ ρWyt þ θWyt−1 þ ιN α þ xt β þ ηt ηt ¼ μ þ ε t ;

ð1Þ

where yt = (y1t, …, yNt)′ is the N × 1 vector of observation for the tth time period, α is the intercept, ιN is an N × 1 column vector of ones, xt denotes the N × k matrix of non-stochastic regressors and ηt represents the 1 Using Gaussian linear mixed models, the Bayesian literature has proposed a variety of hierarchical specifications to model subject-specific random effects (see Chib, 2008). Here, we rely on a simple specification of these random effects assuming they are uncorrelated with the disturbances.

summation of two unobserved normally distributed random components: μ is an N × 1 column vector of random effects with μi ~ N(0, σμ2), and disturbance term εt is assumed to be independent and identically distributed with zero mean and variance σε2IN. We make the traditional assumption that μ is uncorrelated with εt for identification purposes. W is a known N × N spatial weight matrix whose diagonal elements are zero. This matrix defines the dependence between cross sectional (spatial) observational units. We will also assume that W is transformed such that it has a spectral radius of one. The strength of the spatial dependence is measured by the parameter ρ, the first order time dependence reflected in the scalar parameter ϕ, and θ represents the component mixing space and time dependence. The later plays a key role in defining space–time covariance. We will show that imposing a specific constraint on this component leads to separable time and spatial effects which simplify interpretation of the estimation results, as well as a simplification in the variance–covariance matrix for the model disturbances, which simplifies the Markov Chain Monte Carlo estimation scheme. Estimating the full variance–covariance matrix can be computationally intensive and its complexity greatly increases when modeling the initial period observations. If the time dimension is large, ignoring information conveyed by the initial observations should have little effect on the estimates and will greatly simplify the computational complexity of estimation. In the exogenous case (or conditional specification), it is assumed that y0 is given. When the time span is smaller, the impact of the first observation can be important. As highlighted by Bhargava and Sargan (1983), in the endogenous case specification of the first observation y0 can be based on an optimal predictor. In order to address computational issues, there has been some effort to simplify the covariance matrix. Baltagi et al. (2007) develop a model that allows for spatial dependence in the disturbances εt = (ε1t, …, εNt)′, as well as first-order time dependence. This model can be rewritten using the space–time filter we propose here applied only to the disturbances (see Parent and LeSage, 2011, for a complete discussion). In this study, we set forth an extension to the dynamic spatial panel data model. In order to understand how the time dependence can be separated from spatial dependence, we start with a time dependence filter based on the Prais–Winsten transformation defined as: 0

1 ψ 0 … 0 B −ϕ 1 … 0C C: C¼B @ ⋮ ⋱ ⋱ ⋮A 0 … −ϕ 1

ð2Þ

Specification of ψ, the (1, 1) element in C depends on whether the first period is modeled (which we refer to as the endogenous case) or assumed to be known (which we label the exogenous case). Assuming the process is stationary, the (1, 1) element ψ of the (T + 1) × (T+ 1) matrix C is given by: ψ¼

qffiffiffiffiffiffiffiffiffiffiffiffiffi 1−ϕ2 ;jϕjb1:

ð3Þ

The spatial dependence filter is defined as a nonsingular matrix B = (IN − ρW). The space–time filter we propose corresponds to the Kronecker product of the matrices C and B: C⊗B ¼ INðTþ1Þ −ρI Tþ1 ⊗W−ϕL⊗I N þ ðρ  ϕÞL⊗W;

ð4Þ

where L is a (T+ 1)× (T +1) matrix representing the time-lag operator.2 This filter implies a restriction that the parameter associated with spatial 2



Note that if the process is stationary, the (1, 1) element of L is equal to r ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  1−ϕ2 =ϕ:

1−

O. Parent, J.P. LeSage / Regional Science and Urban Economics 42 (2012) 727–738

effects from the previous period (L ⊗ W) is equal to −ρ×ϕ. Parent and LeSage (2011) show that applying this space–time filter to the error terms greatly simplifies the estimation procedure for a model where space–time dependence is modeled in the disturbance structure. Unfortunately, the same simplifications do not apply in the model specification considered here, where we apply the filter to the dependent variable vector producing a dynamic model context. In this context, the first period cross section depends on pre-sample values, which requires that these values be approximated (using an optimal predictor) in the endogenous case. Nonetheless, we show that this filter specification can still be used to simplify the computational complexity of the MCMC estimation procedure. Ignoring for now issues pertaining to prediction of the first period cross-section values, applying the filter to the dependent variable results in a model specification: ðC⊗BÞy ¼ ιNðTþ1Þ α þ xβ þ η   ~ η e N 0; Ω

ð5Þ

729

given which leads to a filter specification that is equivalent to C~ ⊗B, where C~ is the T × (T + 1) matrix defined in Eq. (7). 0

1 −ϕ 1 … 0 C~ ¼ @ ⋮ ⋱ ⋱ ⋮A 0 … −ϕ 1

ð7Þ

  We set y =(y′0, …, y′T)′, X = (x′1, …, x′T)′ and e ¼ C~ ⊗B y− Xβ−ιNT α. The log-likelihood function of the complete sample size of T for the panel data model defined in Eq. (1) is given by: ln LT ðξÞ ¼ −

N X NT 1 1 ′ −1 lnð2πÞ− lnjΩj þ T ln½ð1−ρϖi Þ− e Ω e; 2 2 2 i¼1

where ξ = (β′, α, σε2, σμ2, ϕ, ρ). Since the random effects are integrated out, the NT × NT variance– covariance matrix for η is equivalent to: 2

2

Ω ¼ σ μ ðJ T ⊗I N Þ þ σ ε ½IT ⊗IN ; where y =(y′0,…,y′T)′ is a N×(T+ 1) column vector, x =(x′0,…,x′T)′ is a N (T + 1) ×k matrix and   ~ ¼ σ 2 J ⊗I þ σ 2 I Ω μ Tþ1 ε NðTþ1Þ ; N

ð6Þ

with JT + 1 =ιT + 1ι′T + 1. The case of spatial dynamic panel models estimation is considerably more complex than dynamic panel data models as is interpretation of the resulting model estimates. These facts might motivate use of the restriction. If there is empirical evidence that the restriction is consistent with the sample data, then one would benefit from a simpler estimation as well as interpretation. The interpretation of the direct and indirect effects can be extremely complex for the general case, a point made in (Parent and LeSage, 2010) and detailed in Debarsy et al. (2012). Since the restriction allows spatial and temporal dynamic effects to be disentangled, interpretation of the estimation results is greatly simplified as are analysis of the short-run and long-run dynamic effects or impulse responses. A number of studies have treated the parameter ρ × ϕ associated with the cross-product term in different ways. For example, Elhorst (2004, 2010) and Devereux et al. (2007) refer to this type of model specification as the “time–space simultaneous” model initially proposed by Anselin (1988). They both neglect the effect of the space–time interaction term and rely on a specification equivalent to imposing the restriction that θ = −ρ × ϕ = 0. Anselin (1988) proposed a related “time–space dynamic model” specification explored by Yu et al. (2008) who relax the implied constraint θ = −ρ × ϕ and estimate an unrestricted parameter θ. As we will see, the filter constraint θ = −ρ × ϕ could be relevant in many applied situations and it can be used to greatly simplify the MCMC estimation procedures we set forth. One advantage of the Bayesian MCMC estimation scheme we propose for this specification is that it does not require integration over the random effects that appear in the likelihood. However integration over these parameters can reduce serial dependence in the MCMC samples of parameters drawn (see Chib and Carlin, 1999). Since the Bayesian estimation procedure allows us to separate the initial cross sectional observations from the remaining time periods, we will see that integration over these effects parameters can be done at almost no additional cost.

ð8Þ

ð9Þ

Using the decomposition proposed by Wansbeek and Kapteyn (1982), we replace JT by its idempotent counterpart J T ¼ J T =T, and it is easy to verify that this can be rewritten as:    

2 2  2  Ω ¼ Tσ μ þ σ ε J T ⊗IN þ σ ε IT −J T ⊗IN ;

ð10Þ

and the resulting precision matrix becomes −1

Ω

¼

   

1 J ⊗I þ 1 I −J ⊗I : T N T T N Tσ 2μ þ σ 2ε σ 2ε

ð11Þ

Magnus (1982) shows that the determinant of the variance– covariance matrix is equivalent to |Ω| = (Tσμ2 + σε2) N(σε2) (T − 1)N. One focus of our work here is to test the practical relevance of the parameter restriction implied by the space–time filter. When estimating the model, we can test the restriction θ=−ϕ×ρ by estimating an unrestricted parameter θ. In this most general case where the restriction is not imposed a priori, the space–time filter components are not separable, requiring a more complicated treatment of the initial period cross-sectional observations. However, for the exogenous case where there is no need to deal with the initial period observations, this is not an issue of concern. To see this, consider the new filter P shown in Eq. (12) for the exogenous case. 0 B P¼B @

−ðϕI N þ θW Þ 0

B ⋱

0 ⋱

1 C C A

ð12Þ

−ðϕIN þ θW Þ B

We can rewrite the model Eq. (1) as: e = (Py − Xβ − ιNTα), where the log-likelihood function of the complete sample size of T is given by: ln LT ðυÞ ¼ −

N X NT 1 1 ′ −1 lnð2πÞ− lnjΩj þ T ln½ð1−ρϖi Þ− e Ω e; 2 2 2 i¼1

ð13Þ

where υ = (β′, α, σε2, σμ2, ϕ, ρ, θ)′. This specification where the first period is not modeled allows us to use conventional matrix expressions and decompositions from the panel data literature that reduce the dimensionality of matrices requiring manipulation during estimation.

2.1. The exogenous first-period specification 2.2. The endogenous first-period specification If the time dimension is large, ignoring information conveyed by the first cross section should have little effect on the estimates and will greatly simplify the computational complexity of estimation. In the exogenous case (or conditional specification), it is assumed that y0 is

Here we consider cases where the cross-sectional spatial units are observed only for a few time periods. Properties of the dynamic relationship that arises in these panel data situations has been extensively

730

O. Parent, J.P. LeSage / Regional Science and Urban Economics 42 (2012) 727–738

discussed by Su and Yang (2007). We adapt the estimation procedure used by Su and Yang (2007) to the case of MCMC and incorporate the approach set forth in Bhargava and Sargan (1983) for treatment of the initial period observations. Endogenous (or unconditional) treatment of the first time period cross-section typically assumes that the withinsample observations and pre-sample values were generated by the same process. The recent space–time panel data modeling literature draws heavily on the assumption that we are dealing with a stationary process. We note that Yu et al. (2007) propose an estimation procedure for non-stationary data for the case where T is large, allowing them to ignore issues related to treatment of the first period observations. When T is assumed fixed, these issues are likely important, so we set forth an approach that incorporates a prediction for the first period cross-section that exploits spatial dependence between locations as part of the MCMC estimation procedure. Unlike the exogenous case, imposing the constraint θ = −ρ × ϕ can make a difference in the computational complexity of estimation here. The spatial dynamic panel data model described by Eq. (5) for t = 0,…, T can be expressed as: Byt ¼ ϕ

sþ1

þ

Byt−ðsþ1Þ þ

þ∞ X

þ∞ X

−1

s

ϕ xt−s β þ ð1−ϕÞ

ϕ εt−s ;

ð14Þ

where B = IN − ρW. Assuming the process started in the distant past, the stationary assumptions |ϕ| b 1 and |ρ| b 1 imply that the first right-hand side expression approaches zero as s approaches infinity. In addition, the first period observations can be expressed as: þ∞ þ∞ X X s −1 s ϕ x−s β þ ð1−ϕÞ ðιN α þ μ Þ þ ϕ ε−s −1

By0 ¼ y~ 0 þ ð1−ϕÞ

þ∞ X s μþ ϕ ε−s :

s¼0

ð15Þ

s¼0

Since ∑ s+=∞0x− sβ is not observed, var–cov (y0) is unknown and we need to predict y~ 0 : A number of studies have proposed different consistent estimators for panels involving large N over a short period of time. We follow Bhargava and Sargan (1983) who suggest that an optimal predictor of y~ 0 , which under the stationary assumption can be approximated by: y~ 0 ¼ Eðy~ 0 jxÞ þ ξ

ð16Þ



¼ ιN π0 þ x π þ ξ;

ð17Þ

where π0 is an intercept coefficient, x⁎ = (x0,…, xT) is the N × (T + 1) k matrix of explanatory variables, π = (π′1, …, π′T + 1)′ is a k (T + 1) vector of unknown parameters and ξ ~ N (0, σξ2IN). 3 We can approximate the initial period observation y0 using: −1

y0 ¼ B

ιN π 0 þ B

−1 

−1

x πþB

ξ0 :

It is of interest to note that we can exploit spatial dependence between observations to produce predictions that “borrow strength” from information contained in neighboring locations. Another way to view this is that the matrix B acts as a spatial smoother. Different predictors could have been used to approximate the likelihood (see Hsiao, 2003 Chapter 4,

3 As explained in Hsiao et al. (2002), the stationary assumption can be relaxed for the first period if we assume the process started in periods that are not too distant from the initial time period observation.

þ∞ X

s

ϕ ε−s :

ð18Þ

The variance and covariance of this first cross-section is equivalent to:   h i   2 2 2 −1 2 2 −1 w11 ¼ E ξ0 ξ ′0 ¼ σ ξ IN þ σ ε 1−ϕ I N þ σ μ ð1−ϕÞ IN

ð19Þ

  ′ 2 −1 ′ w12 ¼ E ξ0 η ¼ σ μ ð1−ϕÞ ι T ⊗I N :

ð20Þ

The joint distribution of y = (y0, y1, …, yT)′ is derived from Eqs. (4) and (15). This suggests that Ω ★, the N(T + 1) × N(T + 1) variance–  ′ covariance matrix of η~ ¼ ξ0 ; η′ Þ equals:  ¼

ðιN α þ μ Þ

s¼0

μþ

s¼0



s

s¼0

−1

ξ0 ¼ ξ þ ð1 þ ϕÞ

Ω

s¼0

By0 ¼

for a detailed discussion). As described in Eq. (18), the error term ξ0 is the sum of three components: the prediction error ξ associated with y~ 0 , the individual effects μ and the cumulative shocks before time zero. Thus,

w11 w21

w12 Ω

 ð21Þ

where Ω, w11 and w21 = w′12, are defined in Eqs. (9), (19) and (20), respectively. Using the predictor for the first cross section, the panel data model defined in Eq. in matrix form as in Eq. (22), h (1) can be rewritten h ~ ¼ π ′ β ′ ′ : ~ ¼ ι′N π0 ι′NT α ′ , β where: α ~ þ η~ ~ þ X~ β Py ¼ α

ð22Þ

The error term η~ in Eq. (22) follows a normal distribution with zero mean and variance–covariance matrix Ω ★ defined in Eq. (21), and 0

x0 B0 X~ ¼ B @ ⋮ 0

… xT … 0 ⋱ ⋮ … 0

1 0 x1 C C: ⋮ A xT

ð23Þ

The N(T + 1) × N(T + 1) matrix P = C ⊗ B corresponds to the space–time filter described in Eq. (4). Note that the initial value ψ for the (T + 1) × (T + 1) matrix C defined in Eq. (2) is equal to one. The var–cov (y) = (C− 1 ⊗ B− 1)Ω⋆(C − 1 ⊗ B − 1)′ for this model, so the matrix inverse for Ω⋆ can be solved using a partitioned inverse expression (see Su and Yang, 2007). Nonetheless, when N and T are large, this procedure will still be computationally expensive. In the next section, we show that Bayesian MCMC estimation does not require integration over the random effects μ because of the separation that arises between the first period cross-section and remaining periods when the implied restriction θ = − ρ × ϕ holds. This can greatly simplify the estimation procedure. By setting Y = (y′1, …, y′T)′ and Y− 1 = (y′0, …, y′T − 1)′, the unconditional log likelihood with integrated random effects has the following form: N ð T þ 1Þ 1 ⋆ logð2πÞ− log Ω 2 2 N X 1 ⋆′ ⋆−1 ⋆ þ ðT þ 1Þ ln½ð1−ρϖi Þ− u Ω u ; 2 i¼1

log LðξÞ ¼ −

ð24Þ

where ξ= (β′, α, σε2, σξ2,π0,π′,σμ2, ϕ, ρ)′ and u⋆ =(y′0B′− (ιNπ0 +x⋆π)′,u′)′ with u= (IT ⊗ B)Y− ϕ(IT ⊗B)Y− 1 − Xβ − ιNTα. We now consider the case where the assumption θ = − ϕ × ρ is relaxed, which alters the space–time filter. In this case, the inability to separate the space and time dimensions along with the requirement that we estimate the additional parameter θ increases the complexity of the estimation procedure.

O. Parent, J.P. LeSage / Regional Science and Urban Economics 42 (2012) 727–738

As in the previous endogenous case where the restriction holds, we have to predict the unobservable components of the initial period. We begin by substituting the spatial dynamic panel data model described by Eq. (1) leading to observations yt that equal: þ∞     X −1 s −1 s Ayt−ðsþ1Þ þ AB xt−s β Byt ¼ AB

731

The joint distribution of Y = (y′0, y′1, …, y′T)′ is derived from Eqs.  (1) ′ and (28), and the full variance–covariance matrix Ω ⋆ for η~ ¼ ξ0 ; η′ Þ has the following form:





Ω ¼

w11 w21

 w12 ; Ω

ð31Þ

s¼0

þ∞    −1  X −1 −1 s ðιN α þ μ Þ þ AB εt−s ; þ IN − AB

ð25Þ

s¼0

where A = (ϕIN + θW) and B = IN − ρW. Again, assuming the process began in the distant past, the stationary assumption |AB − 1| b 1 implies the first expression on the right-hand of Eq. (25) approaches zero as s approaches infinity. Using the Eigenvalue decomposition, the stationary assumption will ϕþθϖi be satisfied if 1−ρϖ b1, where ϖi, i = 1, …, n represent the Eigenvalues i of the spatial weight matrix W. Thus the stationary conditions are equivalent to: if ϕ þ ðρ þ θÞϖmax b1 ϕ þ ðρ þ θÞϖmin b1 if ϕ−ðρ−θÞϖmax > −1 if ϕ−ðρ−θÞϖmin > −1 if

ρ þ θ≥0 ρ þ θb0 ρ−θ≥0 ρ−θb0:

ð26Þ

We note that imposing the constraint θ = − ρ × ϕ, leads to a ϕþθϖi situation where 1−ρϖ b1 is equivalent to |ϕ| b 1. In addition, if the i restriction implied by the space–time filter holds, then θ must be negative since the spatial dependence parameter ρ and the time dependence parameter ϕ are almost always positive in practice. As noted by Parent and LeSage (2010), a number of studies have implemented specifications equivalent to our space–time filter applied to the disturbance structure. In these models, the restriction θ = − ρ×ϕ has been implicitly imposed (e.g. Baltagi et al., 2007; Kelejian et al., 2012) leading to stationarity in situations where |ϕ|+ |ρ| > 1 with −1 −1 |ϕ|b 1 and ϖmin b ρb ϖmax . Again, assuming the process is stationary, our first period observation is equal to: þ∞  X

By0 ¼



−1 s

AB

þ∞     X −1 −1 −1 s x−s β þ IN −AB ðιN α þ μ Þ þ AB ε−s

s¼0

þ∞   −1  X −1 s By0 ¼ y~ 0 þ I N −AB−1 μþ AB ε−s :

s¼0

s¼0

ð27Þ Using the same predictor y~ 0 defined in Eq. (17), the initial observation y0 can be approximated by −1

y0 ¼ B

ιN π 0 þ B

−1 ⋆

−1

x πþB

ξ0 ;

ð28Þ

where þ∞     X −1 −1 −1 s ξ0 ¼ ξ þ I N −AB μþ AB ε−s : s¼0

Thus, h    i−1 h  i−1 2 2 −1 −1 2 −1 −1 AB ′ ′ IN −AB w11¼ Eðξ0 ξ0 Þ ¼ σ ξ IN þ σ ε I N − AB þ σ μ IN −AB

ð29Þ     ′ 2 −1 −1 : w12 ¼ E ξ0 η ¼ σ μ ι′Tþ1 ⊗ IN −AB

ð30Þ

where Ω, w11 and w21 = w12, are defined in Eqs. (9), (29) and (30), respectively. Using the matrix form, the panel data model is equivalent to: ~ þ η~ ~ þ X~ β P~ y ¼ α 0

B B −ðϕIN þ θW Þ P~ ¼ B @ ⋱ 0

ð32Þ

0

1

C C; A ⋱ −ðϕIN þ θW Þ B

ð33Þ

~ X~ are identical to the previous specification shown in ~ , β, where α Eq. (22). The variance–covariance matrix Ω⋆ for the disturbance term η~ is defined in Eq. (31). Inversion of the new filter P~ is more difficult due to introduction of the new parameter θ that captures the space–time diffusion effect. The variance–covariance of the model is now given by var–cov (y) = P− 1Ω⋆P− 1′. The unconditional log likelihood with integrated random effects is given by: N ðT þ 1Þ 1 ⋆ logð2πÞ− log Ω 2 2 N X 1 ⋆′ ⋆−1 ⋆ ln½ð1−ρϖi Þ− u Ω u ; þ ðT þ 1Þ 2 i¼1

log Lðζ Þ ¼ −

ð34Þ

where ζ = (β′, α, σε2, σμ2, σξ2, π0, π′, ϕ, θ, ρ)′ and u ⋆ = (y′0B′ − (ιNπ0 + x ⋆π)′, u′)′, with u = (IT ⊗ B)Y − [ϕINT + θ(IT ⊗ W)]Y− 1 − Xβ − ιNTα, with Y = (y′1, …, y′T)′ and Y− 1 = (y′0, …, y′T − 1)′. By way of summary, inversion of the disturbance covariance matrices Eqs. (21) and (31) is no longer straightforward. The need to introduce an additional parameter into the variance–covariance expression for the first period predictor, rules out the simple decomposition involving idempotent symmetric matrices as in Eq. (10). However, as we will see in the next section, the Bayesian MCMC estimation procedure still allows us to model and estimate the first period cross-section separately, which leads to some computational simplification. 3. Estimation method Most studies of space–time panel data models have relied on Quasi-Maximum Likelihood (QML) estimation. The most difficult step in QML involves maximizing the concentrated log-likelihood function which contains the parameters ρ, ϕ and θ (see Su and Yang, 2007; Yu et al., 2008). The MCMC method can handle the problem of local optima, that often arise in space–time modeling, whereas QML or conventional likelihood procedures may provide misleading inference in the face of local optima (LeSage and Pace, 2007). The ability of Bayesian MCMC methods to directly sample from the posterior avoids these problems. It also simplifies the task of estimating the parameters ρ, ϕ and θ and allows for simple tests of the implied restriction θ = − ρ × ϕ. A fully hierarchical Bayesian approach to estimation requires specification of the prior distributions for the parameters of interest. Because this model will contain only proper prior distributions, the joint posterior distribution is proper. Bayesian inference is based on the joint posterior distribution of the parameters given the data. We focus first on the most complex case

732

O. Parent, J.P. LeSage / Regional Science and Urban Economics 42 (2012) 727–738

which we label the unconditional dynamic spatial autoregressive (UDSAR) model. Due to the hierarchical structure of Eq. (1), its unnormalized form is easily derived as:     2 2 2 p β; α; σ ε ; ψ0 ; σ μ ; ϕ; ρ; θ y ∝ f y γ; μ; σ ε ; ϕ; θ; ρ; ψ0       2 2 2 p μ σ μ p σ μ pðα Þpðβ Þp σ ε pðθ; ρ; ϕÞpðψ0 Þ;

ð35Þ

where Y = (y′0, …, y′T)′ and γ = (α, β′), ψ0 = (σξ2, π0, π′) are the parameters of interest for the prediction of y~ 0 and the likelihood function is defined by f(y|.) and prior distributions by p(.). Direct evaluation of the joint posterior distribution involves multidimensional numerical integration and is not computationally feasible. We use MCMC sampling methods which involve generating sequential samples from the complete set of conditional posterior distributions. We suppose that the prior distributions for the parameters (γ, σε2, σμ2) are all a priori independent. Concerning the parameters γ = (α, β′), we estimate separately the intercept term α and the parameters β assuming the following priors:   −1 α ∼ N α0 ; Mα

ð36Þ

  −1 β ∼ N β0 ; Mβ :

ð37Þ

The scheme for sampling of the posterior distributions is developed in Appendix A.

We conduct a Monte Carlo experiment to evaluate the performance of our sampling method. First, we investigate the consequences of treating the initial observations as exogenous when they are truly endogenous. In treating y0 as endogenous, we assume that the data generating process for y0 conditional on the exogenous variable has achieved stationarity. As noted by Hsiao (2003, Chapter 4), a stochastic specification of this first cross section is relevant only when T is small. In this situation, misspecification of y0 can exert a large bias in the dynamic panel data model due to time dependence modeled in the dependent variable. The data generating process is shown in Eq. (38), yt ¼ ρWyt þ ϕyt−1 þ θWyt−1 þ ιN α þ xt β þ ηt ηt ¼ μ þ ε t ;

ð38Þ

where the exogenous explanatory variable is generated from the following process: xt ¼ φxt−1 þ ζ t

ð39Þ

with, ζt ∼ N(0, σζ2In). As previously indicated, the constraint θ = − ϕ × ρ has the advantage of separating time and spatial dependence which greatly simplifies analysis of the signal-tonoise ratio. As in Kiviet (1995), the long-run effect is set to unity in all experiments, so that β = (1 − (ϕ + θ)/(1 − ρ)) − 1 = (1 − ϕ) − 1 if θ = − ϕ × ρ, and the DGP can be rewritten as: yit −ρ

0 wij yjt ¼ ϕ@yi;t−1 −ρ

j¼1

n X

1 −1

wij yj;t−1 A þ α þ ð1−φLÞ

ζ it β

j¼1

þ μ i þ εit ;

varðy~ it −εit jμ Þ σ 2ε

ð41Þ

¼ varðy~ it jμ Þ−1;

ð42Þ

ψ1 ¼

with y~ it ¼ yit −ρ∑j¼1 wij yjt and σε2 = 1. We set ψ1 constant across experiments. It is interesting to note that unlike the Spatial Error Model where spatial dependence is only in the error term, the strength of the spatial dependence does not influence the signal-to-noise ratio since both the explanatory variables and the error term follow the same spatial process. Increasing or decreasing the value of ρ would have the same influence on the signal and the noise. Thus, the variance varðy~ t jμ Þ varies across designs, with the aim being to keep the signal-to-noise ratio constant over changes in ϕ, so that the explanatory power of the model does not change. In particular, we set ψ1 = 12. We normalize σε2 = 1 and we keep the total signal-to-noise ratio fixed by modifying var(yit|μ) accordingly through changes in σζ2. Conditional on μ and setting σε2 = 1, the signal-to-noise ratio is equivalent to: n

ψ1 ¼

ϕ2 ð1 þ ϕφÞ 2 2   þ β σζ : 1−ϕ2 1−ϕ2 1−φ2 ð1−φϕÞ

ð43Þ

In line with the simulation design of Bun and Kiviet (2006), we choose σμ2 such that the ratio of the impacts on var(yit) of the two variance components μi and εit is ψ22. This implies that,

4. Monte Carlo results

n X

with the MATLAB Delaunay routine to produce a symmetric contiguity weight matrix that is then row-normalized (see Chapter 4 in LeSage and Pace, 2009). We define the signal-tonoise as

ð40Þ

where L is the traditional time-lag operator. The spatial weight matrix W was generated using random points in conjunction

2

2

σ μ ¼ ψ2



 1−ϕ : 1þϕ

ð44Þ

Thus, σμ2 is fixed through ψ2, σζ2 through ψ1 and σε2 = 1. We choose ψ2 = 1 and φ = 0.2. To explore small sample properties, we used T = 5 and N = 50, and considered a reciprocal misspecification experiment regarding treatment of the initial period observations. This involved examining estimates produced when assuming the correct and incorrect specifications regarding treatment of the first period observations as endogenous versus exogenous. In the case where y0 was endogenous, we assumed that the pre-sample iterations were generated by the same process as the within-sample values. This was implemented by generating a pre-sample of 1000 iterations and taking the last T periods as our sample. For each simulation case, we used 20,000 iterations after a burn-in period of 10,000 to calculate posterior means and standard deviations along with 0.05 and 0.95 credible intervals reported in Table 1. When the cross-section of the first period is endogenous but treated incorrectly as exogenous, the estimates can be biased, especially in cases where ρ or ϕ are large. When the spatial dependence parameter is weak, we obtain results similar to those reported by Hsiao (2003, Chapter 4), who found that estimation of the intercept was biased in the face of high levels of time dependence. Our results also reveal that incorrect treatment of the initial period observations can lead to biased estimates when T is small. Note that in this case estimation of the parameter θ is redundant. Parent and LeSage (2011) show in a similar setting where the space– time dependence is only in the error terms that convergence of the algorithm is faster when omitting estimation of θ. In fact, ignoring the restriction θ = − ϕ × ρ does not impact the posterior means, but increases dispersion as well as correlation across the draws. These results will be demonstrated in our empirical example.

O. Parent, J.P. LeSage / Regional Science and Urban Economics 42 (2012) 727–738

733

Table 1 Monte Carlo simulations for T = 5 and N = 50. Parameter

ρ ϕ θ σε2 σμ2 β α ρ ϕ θ σε2 σμ2 β α

True value

Mean

0.9 0.9 − 0.81 1 0.0526 10 10 0.2 0.2 − 0.05 1 0.6667 1.25 1.25

0.8981 0.8999 − 0.8098 1.0008 0.0577 10.0391 10.0292 0.1912 0.2063 − 0.0476 0.9964 0.6876 1.2722 1.2595

s.d.

0.05

0.95

Endogenous model 0.0010 0.0003 0.0007 0.0151 0.0061 0.0113 0.1197 0.0059 0.0030 0.0064 0.0162 0.0285 0.0035 0.0315

Mean

s.d.

0.05

0.95

0.0003 0.0003 0.0001 0.0147 0.0073 0.0106 0.1979 0.0057 0.0029 0.0059 0.0154 0.0363 0.0038 0.0297

0.9031 0.9028 − 0.8104 0.9304 0.0493 10.0912 9.4384 0.1544 0.1747 − 0.0287 0.8866 0.8627 1.2281 1.2417

0.9039 0.9037 − 0.8099 0.9788 0.0732 10.1262 10.0947 0.1731 0.1844 − 0.0091 0.9375 0.9825 1.2406 1.3394

Exogenous model 0.8966 0.8995 − 0.8108 0.9762 0.0480 10.0205 9.8386 0.1813 0.2014 − 0.0584 0.9702 0.6420 1.2665 1.2071

0.8996 0.9005 − 0.8085 1.0259 0.0681 10.0578 10.2126 0.2009 0.2112 − 0.0371 1.0236 0.7359 1.2781 1.3105

0.9035 0.9031 − 0.8102 0.9547 0.0608 10.1088 9.7648 0.1637 0.1795 − 0.0190 0.9118 0.9214 1.2343 1.2910

Note: The last four columns correspond to estimates based on treating y0 exogenously, whereas the first four columns are estimates derived by treating y0 endogenously. The first period y0 is endogenously determined in the data generating process.

Table 2 Monte Carlo simulations for T = 5 and N = 50. Parameter

ρ ϕ θ σε2 σμ2 β α

True value

Mean

0.9 0.9 − 0.85 1 0.0526 2 2

0.9005 0.9001 − 0.8512 0.9555 0.0575 2.0081 1.9728

s.d.

0.05

0.95

Endogenous model 0.0016 0.0123 0.0194 0.1068 0.0520 0.0718 0.0342

Mean

s.d.

0.05

0.95

0.0141 0.0016 0.0036 0.0155 0.0054 0.0104 0.0755

0.9009 0.8980 − 0.8408 0.9733 0.0356 2.1194 2.4529

0.9479 0.9032 − 0.8287 1.0246 0.0534 2.1536 2.7015

Exogenous model 0.8980 0.8772 − 0.8832 0.7953 0.0107 1.8888 1.9338

0.9032 0.9188 − 0.8190 1.1452 0.1581 2.1254 2.0464

0.9239 0.9005 − 0.8354 0.9986 0.0441 2.1364 2.5720

Note: The last four columns correspond to estimates based on treating y0 exogenously, whereas the first four columns are estimates derived by treating y0 endogenously. The first period y0 is endogenously determined in the data generating process.

Relaxing the constraint θ = − ϕ × ρ gives a different signal-tonoise ratio. In order to compare this case with the previous results, we set the parameter θ relatively close to the constraint. As explained earlier, we expect the parameter θ to be close to − ϕ × ρ in a number of applied settings. Estimation results presented in Table 2 again show the importance of properly modeling the initial period observations. A third experiment explored the impact of sample size on the estimation and inference outcomes. Results from this experiment are presented in Table 3. Because of computational complexity as well as sample data constraints, many panel data studies rely on a small number of time periods (see Baltagi et al., 2007). For the case of QML estimation, Yu et al. (2008), show that estimates are sensitive to the number of periods. To explore large sample properties, we set T = 50 and N = 200. For both QML and MCMC estimation methods the

precision of estimates is of course related to the sample size. This can be seen in Table 3 results, where we see average standard deviations based on T = 50 that are one-tenth the size of those from Tables 1 and 2, where T = 5 and N = 50 in both cases. We also observe that the impact of treating initial period observations as exogenous versus endogenous has little impact in this case, as we would expect. Finally, as noted by Blanchard and Matyas (1996), departure of the error components from normality is an important issue in economics and especially with small samples. To check for robustness to nonnormal disturbances, we generated two experiments. The first using a student − t distribution (t (10)) for the disturbance terms εi, and the second generating random effects μi from a χ 2 distribution with 4 degrees of freedom. Table 4 shows that departure from normality in the error terms has little effect on the estimates, leading us to conclude that the proposed method is robust to non-normality.

Table 3 Monte Carlo simulations for T = 50 and N = 200. Parameter

ρ ϕ θ σε2 σμ2 β α

True value

Mean

0.9 0.9 − 0.85 1 0.0526 2 2

0.9022 0.9000 − 0.8498 0.9816 0.0564 1.9957 2.0021

s.d.

0.05

0.95

Endogenous model 0.0009 0.0007 0.0011 0.0045 0.0026 0.0032 0.0263

Mean

s.d.

0.05

0.95

0.0009 0.0005 0.0010 0.0048 0.0028 0.0034 0.0243

0.9005 0.9004 − 0.8557 1.0090 0.0565 1.9830 1.9539

0.9038 0.9021 − 0.8523 1.0247 0.0658 1.9941 2.0340

Exogenous model 0.9008 0.8989 − 0.8516 0.9742 0.0523 1.9905 1.9616

0.9037 0.9011 − 0.8480 0.9890 0.0607 2.0010 2.0477

0.9020 0.9013 − 0.8538 1.0168 0.0611 1.9886 1.9948

Note: The last four columns correspond to estimates based on treating y0 exogenously, whereas the first four columns are estimates derived by treating y0 endogenously. The first period y0 is endogenously determined in the data generating process.

734

O. Parent, J.P. LeSage / Regional Science and Urban Economics 42 (2012) 727–738

Table 4 Monte Carlo simulations for T = 5 and N = 50. Parameter

ρ ϕ θ σε2 σμ2 β α

True value

Mean

0.9 0.9 − 0.85 1 0.0526 2 2

0.9177 0.9094 − 0.8678 1.2158 – 2.0253 2.0404

s.d.

0.05

0.95

Mean

s.d.

0.05

0.95

0.0055 0.0046 0.0065 – 0.0162 0.0226 0.2532

0.8958 0.8959 − 0.8602 – 0.0329 1.9417 1.5769

0.9144 0.9106 − 0.8408 – 0.0852 2.0154 2.4259

2 ε ~ χ(4)

μ ~ t (10) 0.0054 0.0229 0.0229 0.1576 – 0.0861 0.3569

0.9086 0.8681 − 0.9041 0.9789 – 1.8853 1.3225

0.9262 0.9458 − 0.8265 1.4948 – 2.1666 2.4611

0.9053 0.9029 − 0.8504 – 0.0556 1.9785 2.0250

Note: for both specifications, estimates are derived by treating y0 endogenously. The first period y0 is endogenously determined in the data generating process.

5. Space–time analysis of regional spillovers In this empirical illustration, we model and estimate the presence of regional externalities in a growth regression equation. We rely on a simple model of growth based on the Solow model with physical and human capital. The illustration represents an accounting exercise that decomposes output growth into that attributable to growth in capital and labor. This empirical analysis shows that implementing spatial and time dependence convey important information related to the role played by spillovers to neighboring states in economic growth rates. We consider a Cobb–Douglas production function that uses physical capital Kt, human capital Ht, and raw labor Lt for each period t = 0, …, T: β

β

β

Y t ¼ At K t 1 H t 2 Lt 3 ; where At is a Hicks-neutral productivity term that grows at a constant rate g, such that At = A0 exp(gt). Under the neoclassical assumption of constant returns to scale, we have β3 = 1 − β2 − β1. The production structure in logarithmic terms (see Garofalo and Yamarik, 2002) is equivalent to: lnðY t Þ ¼ lnðA0 Þ þ gt þ β1 lnðK t Þ þ β2 lnðLt Þ þ β3 lnðH t Þ:

ð45Þ

Serious problems might arise when estimating Eq. (45) if the output per worker does not have the specific linear trend gt specified in the model. To ensure stationarity and allow for a possible deterministic or stochastic trend, we first differentiate Eq. (45). Differentiation of the logged production function results in the key identity from growth accounting shown in Eq. (46). Y_ t K_ L_ H_ ¼ g þ β1 t þ β 2 t þ β 3 t Yt Kt Lt Ht

ð46Þ

statewide data from 1973 to 1997. The REIS database reports full time and part time private employment as the total number of workers. To measure the stock of human capital Ht, we use a proxy based on FullTime-Equivalent (FTE) four-year enrollment per college-age population based on the data from the National Bureau of Economic Research (NBER) that extracts CPS Merged Outgoing Rotation Groups (see Fortin, 2006, for further detail on the enrollment rates based on four-year institutions of higher education). We assume that states with higher enrollment rates have skilled workers that embody a higher level of human capital. Focusing on regional economies, we assume that integrated areas may benefit from direct externalities, including for instance common markets for final goods and access to similar types of physical and human capital. A number of studies have presented theoretical motivations for the role of production inputs from nearby regions in regional output. With similar local conditions, transfers of goods or technology will tend to be relatively higher between regions that are located near each other. Focusing on European regions, Ciccone (2002) observed a large fraction of growth in total factor productivity spreading to neighboring regions. The applied illustration attempts to measure the extent to which output growth rates from neighboring US states influence those of neighboring state growth rates over both space and time. The empirical specification of the growth Eq. (46) that includes regional spillovers is described with the following expression: g y ðt Þ ¼ ϕg y ðt−1Þ þ ρWg y ðt Þ þ θWg y ðt−1Þ þ ιN α þ β1 g k ðt Þ þ β2 g h ðt Þ þ β3 g l ðt Þ þ μ þ vðt Þ;

where gx(t) represents the growth rate of the variable x at time t. Table 5 Estimation results for the general specification (T = 24, N = 48). Parameters

The growth rate of output is determined by the growth rates of raw labor and (human and physical) capital. We measure the influence of labor and capital growth rates on the growth rate of output for the period 1973–1997 across 48 states.4 We estimate the panel data model based on Eq. (46). The measure of output Yt is Gross State Product (GSP) in manufacturing from the Bureau of Economic Analysis (BEA). Extension of the estimates beyond 1997 is not possible due to the change from SIC industry classifications to the North American Industrial Classification System (NAICS) beginning in 1997. The physical capital Kt is measured using an estimate of the net private capital stock created by Garofalo and Yamarik (2002) based on regional personal income. They allocate capital in the manufacturing sector according to each state's proportion of personal income in each sector at the national level.5 The measurement of labor Lt comes from the U.S. Department of Labor's Bureau of Labor Statistics (BLS) and includes 4 5

The states of Alaska, Hawaii and the District of Columbia are excluded. A revised and extended series through 2001 is available on the author's website.

ð47Þ

s.d.

Lower 0.05

Upper 0.95

Part A: Endogenous specification Constant 0.0083 β1 0.4264 β2 0.4280 β3 0.0597 ρ 0.5915 ϕ 0.0970 θ − 0.0578 σμ− 2 0.06 × 10− 4 σε− 2 0.58 × 10− 3

Posterior mean

0.0020 0.0373 0.0437 0.0240 0.0225 0.0265 0.0317 0.03 × 10− 4 0.03 × 10− 3

0.0050 0.3650 0.3561 0.0203 0.5536 0.0537 − 0.1101 0.02 × 10− 4 0.54 × 10− 3

0.0116 0.4874 0.4996 0.0992 0.6284 0.1406 − 0.0060 0.11 × 10− 4 0.62 × 10− 3

Part B: Exogenous specification Constant 0.0083 β1 0.4264 β2 0.4280 β3 0.0597 ρ 0.5915 ϕ 0.0970 θ − 0.0578 −2 σμ 0.06 × 10− 4 σε− 2 0.58 × 10− 3

0.0020 0.0373 0.0437 0.0240 0.0225 0.0265 0.0317 0.03 × 10− 4 0.03 × 10− 3

0.0050 0.3650 0.3561 0.0203 0.5536 0.0537 − 0.1101 0.02 × 10− 4 0.54 × 10− 3

0.0116 0.4874 0.4996 0.0992 0.6284 0.1406 − 0.0060 0.12 × 10− 4 0.62 × 10− 3

O. Parent, J.P. LeSage / Regional Science and Urban Economics 42 (2012) 727–738

735

Table 6 Estimation results for the general specification (T = 3, N = 48). Parameters

s.d.

Lower 0.05

Part A: Endogenous specification Constant 0.0344 β1 0.4595 β2 0.4344 β3 0.0082 ρ 0.3452 ϕ − 0.0359 θ − 0.2864 σμ− 2 0.06 × 10− 3 σε− 2 0.28 × 10− 3

Posterior mean

Upper 0.95

0.0096 0.1247 0.1911 0.0771 0.0786 0.0696 0.1628 0.03 × 10− 3 0.05 × 10− 3

0.0147 0.2591 0.1190 − 0.1211 0.2212 − 0.1470 − 0.4857 0.03 × 10− 3 0.20 × 10− 3

0.0478 0.6715 0.7540 0.1358 0.4777 0.0801 0.0453 0.11 × 10− 3 0.37 × 10− 3

Part B: Exogenous specification Constant 0.0324 β1 0.4551 β2 0.3780 β3 0.0104 ρ 0.3705 ϕ − 0.0042 θ − 0.2817 σμ− 2 0.08 × 10− 3 −2 σε 0.26 × 10− 3

0.0096 0.1231 0.1951 0.0753 0.0879 0.0805 0.1392 0.04 × 10− 3 0.05 × 10− 3

0.0164 0.2526 0.0444 − 0.1108 0.2251 − 0.1261 − 0.5010 0.03 × 10− 3 0.19 × 10− 3

0.0472 0.6577 0.6860 0.1352 0.5066 0.1357 − 0.0510 0.15 × 10− 3 0.34 × 10− 3

Externalities generated by one region are allowed to influence neighboring regions within the same (annual) time period (the spatial effect), the same region in subsequent periods (the time effect), as well as neighboring regions in subsequent periods (the space–time diffusion effect). This space–time dynamic allows us to compare the relative importance of contemporaneous spatial dependence with time dependence and spatial interaction from the previous periods. The spatial weight matrix W is defined as a row-normalized binary contiguity matrix. Estimation results for T =24, N=48 are presented in Table 5, based on a sample of 50,000 draws collected after a burn-in period of 10,000 draws. Part A of the table corresponds to estimates based on the endogenous treatment of the initial period vector y0, whereas Part B presents estimates based on the exogenous specification. In the following discussion of the parameter estimates we relied on 5 and 95 percentage points of the highest posterior density intervals (HPDI) to draw inferences regarding whether the posterior means were different from zero. Focusing on the spatial dependence parameter ρ, the estimation results indicate there is significant positive spatial dependence in the dependent variable. This indicates that neighboring states exhibit similar growth rates in economic activity. There is also a significant positive time dependence effect indicated by the estimated value for the time dependence parameter ϕ. Note that both conditional and unconditional specifications produce similar estimation results, which seems consistent with our Monte Carlo results for the case where T is large. Since the sum of the parameters ϕ + ρ + θ is less than unity and ϕ − (ρ − θ) is more than negative one, the process is stationary. The strength of spatial dependence is greater than that of time dependence for all specifications. This result confirms the possibility of localized spillovers across space and time. The results show that the assumption of constant returns to scale is not consistent with the sample data. The marginal effects for the three types of inputs are positive and significant, but sum to β1 +β2 + β3 =0.92, consistent with less than constant returns to scale. As an illustration of the importance of treating the initial period observation correctly, we produced estimates using a shorter time span, shown in Table 6. These results reveal an improvement using the endogenous specification for the first cross-section. For this case based on only three years (T = 3), estimates for the elasticities are much closer to results from the full sample size of T = 24 when the first cross section is treated as endogenous and modeled using an optimal predictor robust to spatial dependence. Whereas the elasticities sum to 0.902 in the endogenous case, they sum to 0.844 for the exogenous case appearing to exhibit some downward bias.

Fig. 1. θ vs. − ϕ × ρ (space–time filter constraint).

Turning to the restriction implied by the space–time filter θ = − ρ × ϕ, estimation results presented in Table 5 reveal that this restriction is consistent with the data for both specifications. This is also confirmed in Fig. 1 where kernel density estimates of the posterior distributions are presented from the model where the first period cross-section is modeled as endogenous. In situations such as this where the sample data are consistent with the restriction implied by the space–time filter, the estimation procedure can be greatly simplified by replacing the parameter θ with − ρ × ϕ. This allows us to estimate the following regression: g y ðt Þ ¼ ϕg y ðt−1Þ þ ρWg y ðt Þ−ϕρWgy ðt−1Þ þ ιN α þ β1 g k ðt Þ þ β2 g h ðt Þ þ β3 g l ðt Þ þ μ þ vðt Þ:

ð48Þ

As already noted, the estimation procedure for this model is simpler to implement and requires less time. Focusing the entire sample with T = 24, we obtain similar results as shown in Table 7. 6 The first part of the table corresponds to estimates based on treating y0 endogenously, whereas the second part of the table corresponds to estimates treating y0 exogenously. Again, the method of treatment for the first cross section does not alter the estimation results or inferences when T is large. 6. Conclusion In this paper we consider a general specification for a spatial autoregressive dynamic panel model that contains random effects. This specification encompasses a number of different models covered in the literature. We discuss the importance of the restriction θ = − ρ × ϕ implied by the space–time filter that has been widely used (see Baltagi et al., 2007). We show how this restriction can greatly simplify the computational task of estimating these models, when it is consistent with the sample data. In an applied illustration, we demonstrated that the restriction implied by the space–time filter specification was consistent with our sample data. An interesting issue to explore would be whether imposing this restriction in cases where the sample data are not fully consistent with the restriction results in substantial harm to the resulting estimates and inferences. Since one could view the cross-product term arising from the 6 These estimates were based on an MCMC sample of 50,000 iterations collected after a burn-in period of 10,000 draws. Without the restriction MCMC simulations were executed in 14,064 s for the endogenous case and 7703 s for the exogenous case. Implementing the restriction reduced the processing time to 9285 and 6559 s, respectively.

736

O. Parent, J.P. LeSage / Regional Science and Urban Economics 42 (2012) 727–738

Table 7 Estimation results for the (restricted) space–time filter specification (T = 24, N = 48). Parameters

Posterior mean

Part A: Exogenous specification Constant 0.0080 β1 0.4221 β2 0.4353 β3 0.0615 ρ 0.5938 ϕ 0.0969 0.17 × 10− 4 σμ− 2 σε− 2 0.58 × 10− 3 Part B: Endogenous specification Constant 0.0079 β1 0.4234 0.4375 β2 β3 0.0636 ρ 0.5958 ϕ 0.0938 σμ− 2 0.18 × 10− 4 −2 σε 0.58 × 10− 3

s.d.

Lower 0.05

Upper 0.95

0.0017 0.0372 0.0431 0.0236 0.0208 0.0255 0.06 × 10− 4 0.02 × 10− 3

0.0053 0.3610 0.3640 0.0217 0.5586 0.0553 0.10 × 10− 4 0.54 × 10− 3

0.0109 0.4833 0.5065 0.1006 0.6280 0.1406 0.29 × 10− 4 0.62 × 10− 3

0.0018 0.0371 0.0438 0.0243 0.0218 0.0271 0.05 × 10− 4 0.03 × 10− 3

0.0048 0.3632 0.3666 0.0228 0.5618 0.0498 0.11 × 10− 4 0.54 × 10− 3

0.0109 0.4851 0.5107 0.1040 0.6309 0.1375 0.28 × 10− 4 0.62 × 10− 3

space–time filter specification as a second order effect, it may be that the computational gains are great enough to justify some inaccuracy in the parameter associated with the space–time covariance term in the model. In addition, we considered treatment of the initial period crosssectional observations as endogenous versus exogenous, which is an important issue when the time span of the panel data is small. As highlighted by several seminal papers (Bhargava and Sargan, 1983; Hsiao et al., 2002), the impact of the initial observation can be important when we assume that the underlying process generating the data started in the distant past. Our proposed method for endogenous treatment of the initial period observations extends the approach of Bhargava and Sargan (1983) to exploit spatial dependence in the cross-sectional observations. This allows us to produce first-period predicted values that borrow strength from neighboring observations. Monte Carlo experiments showed that this issue is important when T is small. Conditioning on the first period cross-section in these cases can produce biased estimates. Future research might involve exploring a more complex time dependence specification as well as a more general specification for space–time panel data consistent with work by Chib (2008).

distribution given the first period cross-section allows us to eliminate the full variance–covariance matrix defined in Eq. (31). This implies that the posterior distribution of β conditioned on (σε2, ϕ, ρ, θ, α) is given by   n 1  o ′ −1 ~ Y −Xβ−ιNT α p β y; σ 2ε ; σ 2μ ; ϕ; ρ; θ; α ∝ exp − Y~ −Xβ−ιNT αÞ Ω 2 o n 1 ′ exp − β−β0 Þ M β ðβ−β 0 Þ 2 o   n 1 ′ ′ −1 β−β 1 Þ X Ω X þ M β ðβ−β 1 Þ ∝ exp − 2 Y~ ¼ Y−ϕY −ρðI ⊗W ÞY−θðI ⊗W ÞY h

−1

T

i−1 h

T

−1

^ þM β β1 ¼ X Ω X þ Mβ X Ω Xβ β 0  −1   ′ −1 ′ −1 ^¼ XΩ X Y~ −ιNT α : XΩ β ′

−1



−1

i

That is p(β|y, σε2, σμ2, ϕ, ρ, θ, α) ∼ N(β1, [X′Ω − 1X + Mβ] − 1). Simulating the intercept α is also straightforward. We adopt the same procedure marginalizing over μ the posterior distribution of α. Therefore p(α|y, β, σε2, σμ2, ϕ, ρ, θ) ∼ N(α1, [ι′NTΩ− 1ιNT + Mα]− 1), where h i−1 h i−1 h i ^ ¼ ι′NT Ω−1 ιNT ^ þ Mα α 0 and α α 1 ¼ ι′NT Ω−1 ιNT þ M α ι′NT Ω−1 ιNT α   ι′NT Ω−1 Y~ −Xβ : To estimate the random effects, we use a proper prior and assume that, for i = 1, …, N, μi ∼ N(0, σμ2) with μi and μj is independent for i ≠ j. A hierarchical structure of the prior arises because we treat σμ2 as an unknown parameter which requires its own prior. From the variance–covariance matrix Ω ⋆ defined in Eq. (31), we can rewrite the variance–covariance conditional on μ as ~ ¼ Ω



~ 11 w 0

0 σ 2ε INT

 ;

ð50Þ

~ 11 equals: where w   i ′ −1 ~ 11 ¼ V ðξ0 Þ ¼ σ 2ξ I N þ σ 2ε IN − AB−1 AB−1 Þ w :

ð51Þ

Based on the optimal predictor for the first cross-section given by Eq. (17), we define the error terms as the N(T + 1) × 1 vector u⋆ = ((By0)′ − (ιNπ0 + x ⋆π)′, u′)′, where u = Y − ρ(IT ⊗ W)Y − [IT ⊗ (ϕIN + θW)]Y− 1 − Xβ − ιNTα. We obtain the following posterior distribution  h i−1    ~ −1 Z~ þ σ −2 I N , where p μ y; β; α; σ 2ε ; σ 2μ ; θ; ϕ; ψ0 ∼N m1 ; Z~ ′ Ω μ

Appendix A. Bayesian updating schemes

h i−1 h i ~ −1 Z~ þ σ −2 IN ~ −1 u⋆ , ψ0 = (σξ, π0, π′), Ω ~ is given by Z~ ′ Ω m1 ¼ Z~ ′ Ω μ

First, we note that having the posterior distribution for the parameters associated with the explanatory variables β conditional on the random effects μ is not desirable, since these parameters tend to be highly correlated which can create problems with mixing. Chib and Carlin (1999) suggest first sampling β marginalized over μ and then sampling μ conditioned on β. Given the fact that Y = (y1, …, yT)′ and Y− 1 = (y0, …, yT − 1)′, we can integrate the likelihood function defined by Eq. (34) over μ to obtain:

Eq. (50) and Z~ is based on the concatenation between the first crosssection (Eq. (27)) and the following periods [(IN − AB− 1)− 1′ Z′]′ where Z = ιT ⊗ IN. For the hierarchical structure of the random term, we specify a Gamma distribution for the precision parameter, σμ− 2 ∼ G(v1/2, S1/2). Thus, the posterior distribution for σμ− 2 corresponds to

Y y0 ¼ ϕY −1 þ ρðIT ⊗W ÞY þ θðIT ⊗W ÞY −1 þ ιNT α þ Xβ þ ε; ε ∼ Nð0; ΩÞ; y0 ¼ B−1 ιN π 0 þ B−1 x⋆ π þ B−1 ξ0 ; B

−1

ð49Þ

ξ0 ∼ Nð0; w11 Þ

where Ω and w11 are defined in Eqs. (9) and (29), respectively, and the unconditional likelihood function in Eq. (34) is equivalent to f(Y|y0) f(y0). Note that the Bayesian updating scheme greatly simplifies the likelihood function. Integrating over μ and using the conditional

( ) ( ) −2 −2      σμ σμ ′ −2 −2 N=2 −2 v1 =2−1 p σ μ jy; μ Þ ∝ σ μ exp − exp − μ μ σμ S1 2 2 ( ) −2    σμ  ′ −2 ðv1 þN Þ=2−1 exp − ∝ σμ μ μ þ S1 ; 2

that is p(σμ− 2|y, μ) ∼ G([v1 + N]/2, [μ′μ + S1]/2). Focusing on the space–time parameters, we adopt joint uniform priors for ϕ, ρ and θ over the stationary interval defined in Eq. (26). In fact, by relaxing the assumption θ = − ρ × ϕ, a uniformly distributed joint prior distribution does not produce vague marginal priors. Different approaches have been proposed (Sun and Berger, 1998) to define priors constrained on a parameter space. Since we are concerned

O. Parent, J.P. LeSage / Regional Science and Urban Economics 42 (2012) 727–738

with the parameter vector (θ, ρ, ϕ), then a prior can be constructed that takes the form p(ρ, ϕ, θ) = p(ρ)p(ϕ|ρ)p(θ|ρ, ϕ). Let ϒ = (ϕ, ρ, θ). Assuming that the largest Eigenvalue of W, ϖmax = 1 and the smallest Eigenvalue ϖmin = −1, we can now define the following conditional prior p(ϕ|ρ, θ) ∼ U(−1 + |ρ − θ|,1 − |ρ + θ|) based on the stationary interval defined in Eq. (26).7 Then focusing only on the parameters θ and ρ it is easy to derive the conditional prior p(θ|ρ) ∼ U(−1 + |ρ|,1 − |ρ|). The last prior is therefore p(ρ) ∼ U(−1,1). Note that the joint prior is a uniform distribution and equal to 1/2 over the parameter space define by stationary interval Eq. (26). Thus, each posterior distribution is given by Eq. (52), where l = 1, 2, 3.   −1 1 ′ ~ −1 ~ 2 ′ Tþ1 ϒl ϒ−l ; σ ε ; μ; γ; ψ0 ; α; β ; y∝ Ω e ð52Þ jIN −ρW j exp − e Ω 2 This is not reducible to a standard distribution, so we adopt a Metropolis–Hastings step during the MCMC sampling procedures that relies on a random walk proposal with normally distributed increments, ϒl, new = ϒl, old + γlN(0, 1). The acceptance probability is calculated as the ratio of Eq. (52) evaluated at the old and new candidate draws. The proposal tuning parameter was systematically incremented or decremented when the acceptance rate moved below 0.40 or above 0.60, which resulted in an acceptance rate close to 0.50 after a burn-in period. We also rely on a Metropolis–Hastings step to generate σε2, because the variance–covariance of the first cross-section precludes finding the exact posterior distribution. A Gamma distribution is introduced as a prior for σε− 2 ∼ G(v0/2, S0/2). Thus the posterior is proportional to



−2 ′ σ ε y; γ; μ; θ; ϕ; θ; ψ0 ; α; β ; ∝







−2 v0 =2−1 ~ −1 σε Ω exp

(

) ~ −1 e e′ Ω S0 − − 2 ; 2 2σ ε

ð53Þ where e=(u′0,u′)′ with u0 =(By0 −x⋆π−ιNπ0 −(IN −AB− 1)− 1μ) and ~ is u=Y−ρ(IT ⊗W)Y−[IT ⊗(ϕIN +θW)]Y− 1 −Xβ−ιNTα−Zμ and Ω defined in Eq. (50). The Metropolis–Hastings procedure is based on a random walk proposal with normally distributed increments, σε,2 new = σε,2 old +γεN(0,1) and the acceptance probability is calculated as the ratio of Eq. (53) evaluated at the old and new candidate draws. Relating to the first period, parameters π0 and π are generated the same way we draw α and β. Assuming a Normal prior for π ∼ N(b1, T1− 1), the posterior distribution corresponds to  ′      ⋆ p π y0 ; σ 2ε ; σ 2μ ; ϕ; θ; ρ; π0 ; σ 2ξ ∼N b~ 1 ; T~ 1 , where T~ 1 ¼ x⋆ w−1 11 x þ T 1 h ′ i x⋆ w11 ðBy0 −π0 ιN Þ þ T 1 b1 , where w11 is defined in and b~ 1 ¼ T~ −1 1 Eq. (29). We use a normal prior for π0 ∼N(b0,T0− 1), where b0 =((IN − −1 −1 AB ) ιN)1α.8 If we assume that the precision parameter T0 is very large, we place more weight on  the long term multiplier b0.  The posterior   distribution is equivalent to p π0 y0 ; σ 2ε ; σ 2μ ; ϕ; θ; ρ; π; σ 2ξ ∼N b~ 0 ; T~ 0   ⋆ ~ ~ −1 where T~ 0 ¼ ι′N w−1 11 ιN þ T 0 and b 0 ¼ T 0 ½ι′N w11 ðBy0 −x π Þ þ T 0 b0 : We use an additional Metropolis–Hastings step in order to generate σξ2. A Gamma distribution is introduced as a prior for σξ− 2 ∼ G(v2/2, S2/2). Thus the posterior is proportional to   h    i−1 −1 −2 2 −2 v2 =2−1 2 −2 −1 −1 AB ′ σ ξ y0 ; σ ε ; ϕ; ρ; θ; π; π 0 ; μ ∝ σ ξ j σ ξ IN þ σ ε IN − AB ( ) ð54Þ −1 ′ ~ u u w S exp − 0 11 0 − 22 ; 2 2σ ξ 7 As explained in Kelejian and Prucha (2010), we can set ϖmax = 1 and ϖmin = − 1 if we work with the normalized weights matrix W/τn, where τn denotes the spectral radius of W; i.e., τn = max{|ϖ1|, …, |ϖn|} and ϖ1, …, ϖn denote the Eigenvalues of W. 8 Since W is row-normalized, all the coefficients of (IN − AB− 1)− 1ιN are identical and we just describe the first one.

737

~ 11 is defined in where u0 =By0 −x⋆π− ιNπ0 − (IN − AB− 1)− 1μ and w Eq. (51). The Metropolis–Hastings procedures rely again on a random walk 2 2 proposal with normally distributed increments, σξnew =σξold + γξN(0, 1). The acceptance probability is calculated as the ratio of Eq. (54) evaluated at the old and new candidate draws. For each of these Metropolis–Hastings steps, we need to satisfy the stationarity assumption   i−1 2 σ I þ σ −2 I − AB−1 AB−1 Þ′ b1: N ε ξ N

ð55Þ

The estimation method for a UDSAR model imposing the filter constraint θ = − ρ × ϕ is straightforward using the same updating scheme. References Anselin, L., 1988. Spatial Econometrics, Methods and Models. Kluwer Academic, Boston. Anselin, L., 2001. Spatial econometrics. In: Badi H., Baltagi (Ed.), A Companion to Theoretical Econometrics. Blackwell Publishers Lte, Massachusetts. Baltagi, B.H., Song, S.H., Jung, B.C., Koh, W., 2007. Testing for serial correlation, spatial autocorrelation and random effects using panel data. Journal of Econometrics 140, 5–51. Bhargava, A., Sargan, J.D., 1983. Estimating dynamic random effects models from panel data covering short time periods. Econometrica 51, 1635–1659. Blanchard, P., Matyas, L., 1996. Robustness of tests for error components models to non-normality. Economics Letters 51, 161–167. Bramoullé, Y., Djebbari, H., Fortin, B., 2009. Identification of peer effects through social networks. Journal of Econometrics 150 (1), 41–55. Bun, M.J.G., Kiviet, J.F., 2006. The effects of dynamic feedbacks on LS and MM estimator accuracy in panel data models. Journal of Econometrics 132 (2), 409–444. Chib, S., 2008. Panel data modeling and inference: a Bayesian primer, In: Matyas, L., Sevestre, P. (Eds.), The Econometrics of Panel Data, Third edition. Springer-Verlag, Berlin Heidelberg, pp. 479–515. Chib, S., Carlin, B.P., 1999. On MCMC sampling in hierarchical longitudinal models. Statistics and Computing 9, 17–26. Ciccone, A., 2002. Agglomeration effects in Europe. European Economic Review 46, 213–227. Debarsy, N., Ertur, C., LeSage, J.P., 2012. Interpreting dynamic space–time panel data models. Statistical Methodology 9, 158–171. Devereux, M.P., Lockwood, B., Redoano, M., 2007. Horizontal and vertical indirect tax competition: theory and some evidence from the USA. Journal of Public Economics 91, 451–479. Elhorst, P., 2004. Serial and spatial error dependence in space–time models. In: Getis, A., Mur, J., Zoller, H.G. (Eds.), Spatial Econometrics and Spatial Statistics. Palgrave-MacMillan, pp. 176–193. Elhorst, J.P., 2010. Dynamic panels with endogenous interaction effects when T is small. Regional Science and Urban Economics 40, 272–282. Fortin, N.M., 2006. Higher education policies and the college premium: cross-state evidence from the 1990s. American Economic Review 96, 959–987. Garofalo, G., Yamarik, S., 2002. Regional convergence: evidence from a new state-bystate capital stock series. The Review of Economics and Statistics 84 (2), 316–323. Hsiao, C., 2003. Analysis of Panel Data. Cambridge University Press, Cambridge. Hsiao, C., Pesaran, M.H., Tahmiscioglu, A.K., 2002. Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods. Journal of Econometrics 109, 107–150. Kelejian, H., Prucha, I.R., 2010. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Journal of Econometrics 157, 53–67. Kelejian, H., Tavlas, G.S., Petroulas, P., 2012. In the neighborhood: the trade effects of the Euro in a spatial framework. Regional Science and Urban Economics 42, 314–322. Keller, W., Shiue, C.H., 2007. The origin of spatial interaction. Journal of Econometrics 140, 304–332. Kiviet, J.F., 1995. On bias, inconsistency, and efficiency of various estimators in dynamic panel data models. Journal of Econometrics 68, 53–78. Korniotis, G.M., 2010. Estimating panel models with internal and external habit formation. Journal of Business and Economic Statistics 28, 145–158. Lee, L.F., Yu, J., 2010. Some recent developments in spatial panel data models. Regional Science and Urban Economics 40, 255–271. LeSage, J., Pace, R.K., 2007. A matrix exponential spatial specification. Journal of Econometrics 140, 190–214. LeSage, J., Pace, R.K., 2009. An Introduction to Spatial Econometrics. CRC Press, Taylor & Francis Group, Boca Raton, London, New York. Lottmann, F., 2012. Spatial dependencies in German matching functions. Regional Science and Urban Economics 42, 27–41. Magnus, J.R., 1982. Multivariate error component analysis of linear and nonlinear regression models by maximum likelihood. Journal of Econometrics 19, 239–285.

738

O. Parent, J.P. LeSage / Regional Science and Urban Economics 42 (2012) 727–738

Mohl, P., Hagen, T., 2010. Do EU structural funds promote regional growth? New evidence from various panel data approaches. Regional Science and Urban Economics 40, 353–365. Parent, O., LeSage, J.P., 2010. A spatial dynamic panel model with random effects applied to commuting times. Transportation Research — Part B 44, 633–645. Parent, O., LeSage, J.P., 2011. A space–time filter for panel data models containing random effects. Computational Statistics and Data Analysis 55, 475–490. Su, L., Yang, Z., 2007. QML Estimation of Dynamic Panel Data Models with Spatial Errors. Singapore Management University manuscript. Sun, D., Berger, J.O., 1998. Reference priors with partial information. Biometrika 85, 55–71.

Wansbeek, T.J., Kapteyn, A., 1982. A simple way to obtain the spectral decomposition of variance components models for balanced data. Communications in Statistics Part A — Theory Methods 11, 2105–2112. Yu, J., Lee, J.F., 2010. Efficient GMM estimation of spatial dynamic panel data models. Working Paper. Yu, J., de Jong, R., Lee, L.F., 2007. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large: a nonstationary case. Working Paper. Yu, J., de Jong, R., Lee, L.F., 2008. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. Journal of Econometrics 146, 118–134.