Economics Letters 101 (2008) 282–284
A spatial Hausman test
R. Kelley Pace a,⁎,1, James P. LeSage b
a LREC Endowed Chair of Real Estate, Department of Finance, E.J. Ourso College of Business Administration, Louisiana State University, Baton Rouge, LA 70803-6308, United States
b Fields Endowed Chair in Urban and Regional Economics, McCoy College of Business Administration, Department of Finance and Economics, Texas State University - San Marcos, San Marcos, Texas 78666, United States
Article info
Article history: Received 20 October 2007; received in revised form 2 September 2008; accepted 16 September 2008; available online 23 September 2008.

Abstract
Often, authors report materially different OLS and spatial error model estimates. However, under the null of correct specification, these estimates should be similar. We propose a spatial Hausman test and conduct a Monte Carlo experiment to examine its performance. © 2008 Elsevier B.V. All rights reserved.

Keywords: Spatial autoregression; Specification test; Spatial econometrics; SAR; SEM
JEL classification: C11; C13
1. Introduction

Both ordinary least squares (OLS) and the spatial error model (SEM) have been widely applied to spatial data. Often, authors provide both sets of estimates along with standard errors, allowing a pairwise comparison. Such comparisons reveal cases where OLS and SEM estimates are quite similar (Pace, 1997; Cohen and Coughlin, 2006), indeterminate cases where differences exist but are not obviously significant (Neill et al., 2007; Theebe, 2004), and cases where differences appear to be statistically significant. For example, Brasington (2007), in a study on the willingness to pay for public schools, found OLS and SEM coefficients with different signs on variables representing educational attainment and owner-occupied housing. In a study on retail location, Lee and Pace (2005) report an OLS estimate relating store size to sales that was negative and significant, while the SEM estimate was positive and significant. In fact, in his seminal paper on spatial regression models, Ord (1975) reports OLS and SEM estimates from a univariate model (with intercept) where the slope coefficient differs by two standard errors.
⁎ Corresponding author. Tel.: +1 225 578 6256 (OFF); fax: +1 225 578 9065. E-mail addresses: [email protected] (R. Kelley Pace), [email protected] (J.P. LeSage).
1 The authors would like to thank David Brasington, Dek Terrell, Donald Lacombe and Jennifer Zhu for their valuable comments. In addition, the authors would like to acknowledge support from NSF SES-0729259, 0729264 as well as the Louisiana and Texas Sea Grant Programs.
0165-1765/$ – see front matter. doi:10.1016/j.econlet.2008.09.003
Under the SEM assumptions, both OLS and SEM regression parameter estimates should be unbiased (Anselin, 1988, p. 59). This suggests that significant differences in regression parameter estimates will arise only from misspecification. We formalize this result with a spatial Hausman specification test for significant differences between OLS and SEM estimates. In a Monte Carlo experiment, we show that the spatial Hausman test has the correct size.

2. Spatial Hausman test

The linear model where the disturbances are independent and identically distributed (iid) represents a simple data generating process that we label the iid DGP, shown in (1). The n by 1 vector y represents the regressand, the matrix X contains n observations on k exogenous explanatory variables, β is a k by 1 vector of regression parameters, and ε is an n by 1 vector of iid disturbances.

$$y = X\beta + \varepsilon. \tag{1}$$
The canonical estimator for the iid DGP is β̂ = (X′X)⁻¹X′y, or OLS. The iid error model has been widely used with spatial data samples where the observations represent points or regions located in space. As an alternative, assume the disturbances follow a spatial autoregressive process, labeled the spatial error DGP in (2),

$$y = X\beta + (I - \rho W)^{-1}\varepsilon \tag{2}$$
where ρ is a scalar parameter that governs spatial dependence, and W is an n by n spatial weight matrix with zeros on the diagonal and nonnegative off-diagonal elements. In W, any two observations i and j are neighbors if W_ij > 0 (i ≠ j), and the elements of W are fixed prior to estimation. Powers of W have a similar interpretation: a positive element ij in W² means that j is a second-order neighbor of observation i (a neighbor to a neighbor). Similar interpretations apply to higher-order powers of W. Assuming the inverse exists, (I − ρW)⁻¹ = I + ρW + ρ²W² + …, and the variance–covariance matrix Ω of the disturbances equals σ²(I − ρW)⁻¹(I − ρW′)⁻¹. Therefore, the elements of the weight matrix directly affect corresponding elements in the variance–covariance matrix. Note that the spatial error DGP (2) nests the iid DGP (1) as a special case when ρ = 0. The canonical estimator for the spatial error DGP, associated with the spatial error model (SEM), appears in Eq. (3).

$$\tilde{\beta} = \left[ X'(I - \tilde{\rho} W)'(I - \tilde{\rho} W) X \right]^{-1} X'(I - \tilde{\rho} W)'(I - \tilde{\rho} W) y. \tag{3}$$
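As a rough illustration of Eqs. (2) and (3), the following Python sketch simulates one draw from the spatial error DGP and computes both the OLS estimate and the estimator in Eq. (3). Everything specific here is an assumption made for the example and not taken from the paper: the weight matrix (points on a line, row-normalized, built by the hypothetical helper `path_weights`), the sample size, the parameter values, and the use of the true ρ in place of an estimate ρ̃. It also checks the series expansion of (I − ρW)⁻¹ numerically.

```python
import numpy as np

def path_weights(n):
    """Hypothetical W: each point on a line neighbors the points beside it."""
    W = np.zeros((n, n))
    i = np.arange(n - 1)
    W[i, i + 1] = 1.0          # right-hand neighbor
    W[i + 1, i] = 1.0          # left-hand neighbor
    return W / W.sum(axis=1, keepdims=True)   # row-normalize; diagonal stays zero

def simulate_sem(X, beta, rho, sigma2, W, rng):
    """One draw of y = X beta + (I - rho W)^{-1} eps, as in Eq. (2)."""
    n = X.shape[0]
    eps = rng.normal(scale=np.sqrt(sigma2), size=n)
    return X @ beta + np.linalg.solve(np.eye(n) - rho * W, eps)

def sem_estimator(X, y, rho, W):
    """Eq. (3) for a given rho: least squares on the filtered data."""
    A = np.eye(len(y)) - rho * W
    Xs, ys = A @ X, A @ y
    return np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

rng = np.random.default_rng(0)
n, rho, sigma2 = 100, 0.6, 0.2            # illustrative values only
W = path_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 0.5, -0.5])

# Truncated series I + rho W + rho^2 W^2 + ... versus the exact inverse
A_inv = np.linalg.inv(np.eye(n) - rho * W)
term, series = np.eye(n), np.zeros((n, n))
for _ in range(80):
    series += term
    term = term @ (rho * W)
series_error = np.max(np.abs(series - A_inv))

y = simulate_sem(X, beta, rho, sigma2, W, rng)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # OLS
beta_sem = sem_estimator(X, y, rho, W)         # Eq. (3) with rho treated as known
```

In practice ρ is unknown and would be replaced by an estimate; the sketch plugs in the true value only to keep the example short.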
Due to the transformation of the variables implied by the SEM, these models require estimation by maximum likelihood or some other technique to avoid biased estimation of the spatial dependence parameter, ρ (Ord, 1975, p. 121). Assuming the spatial error DGP, consistency of maximum likelihood (Mardia and Marshall, 1984; Lee, 2004) should lead to an estimate ρ̃ close to the true value ρ in large samples. A consequence is that under the spatial error DGP, the maximum likelihood estimates β̃ approach the true β as n becomes large. Also, under the assumed spatial error DGP, estimates β̂ from the OLS model are unbiased (Anselin, 1988, p. 59). Unbiasedness guarantees the weaker property of consistency for OLS estimates.

$$\hat{\beta} = (X'X)^{-1} X' \left[ X\beta + (I - \rho W)^{-1} \varepsilon \right] \tag{4}$$

$$\hat{\beta} = \beta + (X'X)^{-1} X' (I - \rho W)^{-1} \varepsilon \tag{5}$$
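The maximum likelihood step mentioned above can be sketched with a simple grid search over ρ. The paper does not say how its likelihood was maximized, so the approach below is an assumption: it uses the standard concentrated SEM log-likelihood, ln|I − ρW| − (n/2) ln(σ̃²(ρ)), where σ̃²(ρ) is the mean squared residual of the filtered regression. The weight matrix, sample size, grid, and function names (`path_weights`, `concentrated_loglik`, `fit_sem_grid`) are all hypothetical.

```python
import numpy as np

def path_weights(n):
    """Hypothetical row-normalized W for points on a line."""
    W = np.zeros((n, n))
    i = np.arange(n - 1)
    W[i, i + 1] = 1.0
    W[i + 1, i] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def concentrated_loglik(rho, X, y, W, eigvals):
    """Concentrated SEM log-likelihood (up to a constant) at a given rho."""
    n = len(y)
    A = np.eye(n) - rho * W
    Xs, ys = A @ X, A @ y
    beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]   # Eq. (3) at this rho
    resid = ys - Xs @ beta
    sigma2 = resid @ resid / n                      # ML variance estimate
    logdet = np.sum(np.log(1.0 - rho * eigvals))    # ln|I - rho W|
    return logdet - 0.5 * n * np.log(sigma2), beta, sigma2

def fit_sem_grid(X, y, W, grid):
    """Pick the rho on the grid with the highest concentrated likelihood."""
    eigvals = np.linalg.eigvals(W).real   # real for this row-normalized W
    best = None
    for rho in grid:
        ll, beta, sigma2 = concentrated_loglik(rho, X, y, W, eigvals)
        if best is None or ll > best[0]:
            best = (ll, float(rho), beta, sigma2)
    return best[1], best[2], best[3]

rng = np.random.default_rng(1)
n, rho_true, sigma2_true = 300, 0.6, 0.2   # illustrative values only
W = path_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 0.5, -0.5])
eps = rng.normal(scale=np.sqrt(sigma2_true), size=n)
y = X @ beta_true + np.linalg.solve(np.eye(n) - rho_true * W, eps)

rho_hat, beta_hat, sigma2_hat = fit_sem_grid(X, y, W, np.linspace(-0.95, 0.95, 191))
```

A grid search is used here only because it is easy to read; a scalar optimizer over the same objective would be the more usual choice for large samples.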
Table 1
Empirical versus theoretical sizes and spatial dependence ρ

          Theoretical size
ρ         0.01      0.05      0.10      0.25      0.50
0.30      0.0093    0.0499    0.0984    0.2472    0.5023
0.60      0.0098    0.0521    0.0964    0.2498    0.5067
0.90      0.0130    0.0533    0.1049    0.2576    0.5038
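The entries in Table 1 are rejection frequencies: the share of simulated test statistics that exceed the Chi-squared critical value at each nominal level. The sketch below shows that calculation on a much smaller scale and under several simplifying assumptions that differ from the paper's experiment: ρ is treated as known instead of being estimated by maximum likelihood, the weight matrix is a hypothetical row-normalized line graph, only the two slope coefficients are tested (so the reference distribution has 2 degrees of freedom, for which the critical value at level α is −2 ln α), and the sample size and number of trials are small.

```python
import numpy as np

def path_weights(n):
    """Hypothetical row-normalized W for points on a line."""
    W = np.zeros((n, n))
    i = np.arange(n - 1)
    W[i, i + 1] = 1.0
    W[i + 1, i] = 1.0
    return W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(42)
n, rho, sigma2, trials = 100, 0.6, 0.2, 500   # illustrative values only
W = path_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 0.5, -0.5])
slopes = [1, 2]                                # coefficients under test

A = np.eye(n) - rho * W
A_inv = np.linalg.inv(A)
XtX_inv = np.linalg.inv(X.T @ X)
Xs = A @ X

# Variance matrices without the sigma^2 factor, which is estimated per trial
S_o = XtX_inv @ X.T @ A_inv @ A_inv.T @ X @ XtX_inv   # OLS under the spatial error DGP
S_s = np.linalg.inv(Xs.T @ Xs)                        # SEM
D_inv = np.linalg.inv((S_o - S_s)[np.ix_(slopes, slopes)])

T = np.empty(trials)
for t in range(trials):
    eps = rng.normal(scale=np.sqrt(sigma2), size=n)
    y = X @ beta + A_inv @ eps                 # spatial error DGP
    b_ols = XtX_inv @ X.T @ y
    ys = A @ y
    b_sem = S_s @ Xs.T @ ys
    resid = ys - Xs @ b_sem
    s2 = resid @ resid / n                     # variance estimate from SEM residuals
    d = (b_ols - b_sem)[slopes]
    T[t] = d @ D_inv @ d / s2                  # Hausman statistic on the slopes

levels = np.array([0.01, 0.05, 0.10, 0.25, 0.50])
crit = -2.0 * np.log(levels)                   # chi-squared(2) critical values
sizes = np.array([(T > c).mean() for c in crit])
```

The intercept is left out of the test in this sketch because, for weight matrices that are nearly doubly stochastic, the OLS and SEM intercept variances are almost identical and the difference matrix becomes close to singular.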
The estimated variance–covariance matrix Ṽ_s in Eq. (8) has a well-known form (Anselin, 1988).

$$\tilde{V}_s = \tilde{\sigma}^2 \left[ X'(I - \tilde{\rho} W)'(I - \tilde{\rho} W) X \right]^{-1}. \tag{8}$$
However, the usual OLS variance–covariance matrix σ²(X′X)⁻¹ is inconsistent under the null of a spatial error DGP (Cordy and Griffith, 1993, p. 1167–1168). Nonetheless, deriving a consistent estimator of the OLS variance–covariance matrix under a spatial error DGP is straightforward (Cordy and Griffith, 1993, p. 1167). Given the DGP with a known value of ρ, the variance of the OLS estimates appears in Eq. (10), and its feasible counterpart based on estimated parameters appears in Eq. (11).

$$\hat{\beta} - E(\hat{\beta}) = (X'X)^{-1} X' (I - \rho W)^{-1} \varepsilon \tag{9}$$

$$V_o = \sigma^2 (X'X)^{-1} X' (I - \rho W)^{-1} (I - \rho W')^{-1} X (X'X)^{-1} \tag{10}$$

$$\tilde{V}_o = \tilde{\sigma}^2 (X'X)^{-1} X' (I - \tilde{\rho} W)^{-1} (I - \tilde{\rho} W')^{-1} X (X'X)^{-1}. \tag{11}$$
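To make the difference between the usual OLS variance and Eq. (11) concrete, the following sketch computes both, along with the SEM variance of Eq. (8), for given values of ρ̃ and σ̃². The weight matrix, design matrix, and parameter values are assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def path_weights(n):
    """Hypothetical row-normalized W for points on a line."""
    W = np.zeros((n, n))
    i = np.arange(n - 1)
    W[i, i + 1] = 1.0
    W[i + 1, i] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def ols_variance_spatial(X, W, rho, sigma2):
    """Eq. (11): OLS variance when the disturbances follow the spatial error DGP."""
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    A_inv = np.linalg.inv(np.eye(n) - rho * W)
    omega = A_inv @ A_inv.T        # (I - rho W)^{-1} (I - rho W')^{-1}
    return sigma2 * XtX_inv @ X.T @ omega @ X @ XtX_inv

def sem_variance(X, W, rho, sigma2):
    """Eq. (8): variance of the SEM estimator."""
    A = np.eye(X.shape[0]) - rho * W
    Xs = A @ X
    return sigma2 * np.linalg.inv(Xs.T @ Xs)

rng = np.random.default_rng(2)
n, rho_tilde, sigma2_tilde = 150, 0.6, 0.2   # stand-ins for ML estimates
W = path_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

V_o = ols_variance_spatial(X, W, rho_tilde, sigma2_tilde)
V_s = sem_variance(X, W, rho_tilde, sigma2_tilde)
V_naive = sigma2_tilde * np.linalg.inv(X.T @ X)   # usual OLS formula

# With rho = 0 the spatial formula collapses to the usual OLS variance
V_o_iid = ols_variance_spatial(X, W, 0.0, sigma2_tilde)

# V_o - V_s should be positive semidefinite (the SEM estimator is efficient)
diff = V_o - V_s
min_eig = np.linalg.eigvalsh((diff + diff.T) / 2).min()
```

When the same ρ̃ and σ̃² enter both formulas, the difference V_o − V_s is positive semidefinite by the usual generalized least squares efficiency argument, which is what keeps the quadratic form in a Hausman-type statistic nonnegative.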
Under the assumed null of the spatial error process, the maximum likelihood estimate σ̃², based on the variance of the residuals from the SEM, provides a consistent estimate of σ². The maximum likelihood estimate ρ̃ provides a consistent estimate of ρ. These estimates permit a feasible calculation of the variance of the least squares estimates as in (11). Along with β̂, β̃, and Ṽ_s, this completes all the necessary ingredients for computing T̃.

3. Monte Carlo results
$$E(\hat{\beta}) = \beta. \tag{6}$$
Although OLS and SEM estimators under the spatial error DGP should yield estimates that approach β for large n, the literature contains examples where the estimates do not appear similar. A Hausman test (Hausman, 1978) can be used whenever, under the null hypothesis, there are two consistent estimators differing in efficiency and, under the alternative hypothesis of misspecification, the two estimators yield divergent results. We propose such a test for statistical differences between the OLS and SEM estimates. Given the theoretical results set forth above, a significant difference between the two sets of estimates suggests misspecification. Let δ̃ = β̂ − β̃ represent the difference between OLS and SEM estimates of the model parameters. Under the null hypothesis of the spatial error DGP, the Hausman test statistic T̃ has the simple form shown in Eq. (7),

$$\tilde{T} = \tilde{\delta}' \left( \tilde{V}_o - \tilde{V}_s \right)^{-1} \tilde{\delta} \tag{7}$$

where Ṽ_o is a consistent estimate of the variance–covariance matrix associated with β̂ from OLS (under the null of the spatial error DGP), and Ṽ_s is a consistent estimate of the variance–covariance matrix associated with β̃ from the spatial error model. The statistic T̃ follows a Chi-squared distribution with degrees of freedom equal to the number of regression parameters under test. See Davidson and MacKinnon (1993, p. 389–395) for a clear exposition of Hausman tests.
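Once the two estimates and their variance matrices are available, Eq. (7) is a single quadratic form. The sketch below is a minimal illustration, not the authors' code: the function name `spatial_hausman` and the optional `idx` argument for testing a subset of coefficients are assumptions, and the numerical inputs are made-up values chosen so the arithmetic is easy to follow.

```python
import math
import numpy as np

def spatial_hausman(beta_ols, beta_sem, V_o, V_s, idx=None):
    """Eq. (7): T = delta' (V_o - V_s)^{-1} delta, optionally on a subset idx."""
    delta = np.asarray(beta_ols, dtype=float) - np.asarray(beta_sem, dtype=float)
    D = np.asarray(V_o, dtype=float) - np.asarray(V_s, dtype=float)
    if idx is not None:
        delta = delta[idx]
        D = D[np.ix_(idx, idx)]
    T = float(delta @ np.linalg.solve(D, delta))
    return T, delta.size           # statistic and degrees of freedom

# Made-up inputs for illustration only (not results from the paper)
beta_ols = [1.0, 0.6, -0.7]
beta_sem = [1.0, 0.5, -0.5]
V_o = np.diag([0.05, 0.03, 0.05])
V_s = np.diag([0.04, 0.02, 0.03])

# Test the two slope coefficients: delta = (0.1, -0.2), V_o - V_s = diag(0.01, 0.02)
T, df = spatial_hausman(beta_ols, beta_sem, V_o, V_s, idx=[1, 2])

# For 2 degrees of freedom the chi-squared upper tail is exp(-T / 2)
p_value = math.exp(-T / 2.0)
```

With these inputs T is 0.1²/0.01 + 0.2²/0.02 = 3 on 2 degrees of freedom. For other degrees of freedom the p-value can be obtained from a Chi-squared survival function such as `scipy.stats.chi2.sf(T, df)` if SciPy is available.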
To obtain some idea of the performance of the spatial Hausman test under controlled conditions, we simulated a spatial error process with 3000 observations using five explanatory variables (including a constant term) and a setting of σ² = 0.2. Three cases were considered based on ρ = 0.3, 0.6, and 0.9, corresponding to low, medium, and high levels of spatial dependence. For each case, we simulated 10,000 separate trials of y, estimated the SEM via maximum likelihood and the iid model via OLS, and calculated the spatial Hausman test statistic T̃. The empirical size was determined by the proportion of T̃ values from the 10,000 trials that exceeded the Chi-squared critical values at levels of 0.01, 0.05, 0.10, 0.25, and 0.50 for 5 degrees of freedom. As shown in Table 1, the empirical sizes conformed closely to the theoretical sizes.

4. Conclusion

Most testing relies on choosing models with better fit. However, the magnitudes of the regression parameter estimates themselves have value, particularly in cases like the one examined here where theoretical results suggest that OLS and SEM should produce similar estimates in large samples. For many fundamentally spatial problems, such as those involving real estate data, SEM will almost always yield a significantly higher likelihood than OLS. For a given set of variables, a divergence between the coefficient estimates from SEM and OLS suggests that neither is yielding regression parameter estimates matching the underlying parameters in the DGP. This calls into question the use of either OLS or SEM for that set of variables.
What should one do if a particular specification fails the spatial Hausman test? Since both estimators are unbiased under the null hypothesis, a rejection implies a failure of the orthogonality condition, i.e., the stochastic disturbance is correlated with one or more right-hand-side variables. Since the differences between OLS and the SEM arise in the presence of spatial dependence, enriching the spatial specification for the explanatory variables may reduce the correlation between the stochastic disturbance and the included variables. Often a more general model with separate spatial lags of both the explanatory variables and the dependent variable (which nests the SEM) may be appropriate, as it fits better and has a richer spatial interpretation. However, if theory suggests an error model, adding spatially lagged explanatory variables can reduce the Hausman test statistic below its critical value. Naturally, the results will depend upon the transformations of the explanatory and dependent variables as well as the actual explanatory variables chosen by the investigator.

The spatial Hausman test developed here could be easily extended to other models of spatial disturbances such as conditional autoregression, moving average, geostatistical, and the matrix exponential spatial specification (Cordy and Griffith, 1993; Dubin, 1988; LeSage and Pace, 2007).

References

Anselin, Luc, 1988. Spatial Econometrics: Methods and Models. Kluwer Academic Publishers, Dordrecht.
Brasington, David M., 2007. Private schools and the willingness to pay for public schooling. Education Finance and Policy 2, 152–174.
Cohen, Jeffrey P., Coughlin, Cletus C., 2006. Spatial Hedonic models of airport noise, proximity, and housing prices. Working paper 2006-026B, Federal Reserve Bank of St. Louis.
Cordy, Clifford B., Griffith, Daniel A., 1993. Efficiency of least squares estimators in the presence of spatial autocorrelation. Communications in Statistics – Simulation and Computation 22, 1161–1179.
Davidson, Russell, MacKinnon, James, 1993. Estimation and Inference in Econometrics. Oxford University Press, New York.
Dubin, Robin, 1988. Estimation of regression coefficients in the presence of spatially autocorrelated error terms. Review of Economics and Statistics 70, 466–474.
Hausman, J.A., 1978. Specification tests in econometrics. Econometrica 46, 1251–1272.
Lee, L.-F., 2004. Asymptotic distributions of quasi-maximum likelihood estimators for spatial econometric models. Econometrica 72, 1899–1926.
Lee, Ming Long, Pace, R. Kelley, 2005. Spatial distribution of retail sales. Journal of Real Estate Finance and Economics 31, 53–69.
LeSage, James P., Pace, R. Kelley, 2007. A matrix exponential spatial specification. Journal of Econometrics 140, 190–214.
Mardia, K.V., Marshall, R.J., 1984. Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71, 135–146.
Neill, Helen R., Hassenzahl, David M., Assane, Djecto D., 2007. Estimating the effect of air quality: spatial versus traditional hedonic price models. Southern Economic Journal 73, 1088–1111.
Ord, Keith, 1975. Estimation methods for models of spatial interaction. Journal of the American Statistical Association 70, 120–126.
Pace, R. Kelley, 1997. Performing large-scale spatial autoregressions. Economics Letters 54, 283–291.
Theebe, Marcel A.J., 2004. Planes, trains, and automobiles: the impact of traffic noise on house prices. Journal of Real Estate Finance and Economics 28, 209–234.