Regional Science and Urban Economics xxx (2018) 1–11
Contents lists available at ScienceDirect
Regional Science and Urban Economics journal homepage: www.elsevier.com/locate/regec
Theoretical foundations for spatial econometric research☆
Xingbai Xu a,*, Lung-fei Lee b
a Wang Yanan Institute for Studies in Economics (WISE), Department of Statistics, School of Economics, MOE Key Lab of Econometrics and Fujian Key Lab of Statistics, Xiamen University, Xiamen, 361005, China
b Department of Economics, The Ohio State University, Columbus, OH, USA
ARTICLE INFO
JEL: C13, C21, C24, C57
Keywords: Spatial autoregressive model; GMM; MLE; Linear-quadratic form; Martingale CLT; Spatial near-epoch dependence
ABSTRACT
This paper reviews the development of large sample theories for spatial econometric models. These theories form an important part of the statistical foundations for spatial econometrics. Another important component is the theoretical economic foundation for spatial econometric model specifications. We discuss how spatial econometric models can be regarded as the Nash equilibrium of some complete information games. Moran’s I test for spatial dependence is based on a statistic with a linear-quadratic form. Scores of the ML and moments for 2SLS and GMM are also in linear-quadratic form. A statistic with a linear-quadratic form can be characterized as a sum of martingale differences, so the central limit theorem for martingale difference arrays is crucial for asymptotic distributions of such statistics. For linear spatial models, statistics of linear-quadratic form are the basis of spatial econometrics. For nonlinear spatial models, near-epoch dependent random fields play a crucial role. We summarize some important properties of near-epoch dependent random fields and illustrate how they are used in studying nonlinear spatial models such as spatial Tobit and spatial binary choice models.
1. Introduction
Spatial econometrics studies estimation, testing, and inference for econometric models in which spatial correlation exists among economic data across spatial units, which can be individuals, cities, counties, etc. Several books, e.g., Cliff and Ord (1973), Anselin (1988) and LeSage and Pace (2009), summarize some important developments in economic and statistical theories, computation, estimation, and empirical applications. Various estimation approaches have been developed for spatial econometric models. Kelejian and Prucha (1998) study the two-stage least squares (2SLS) estimation of spatial autoregressive (SAR) models with possibly spatially autoregressive disturbances. Kelejian and Prucha (1999) investigate the generalized method of moments (GMM) for a spatial error (SE) model. Lee (2004) establishes the consistency and asymptotic normality of the quasi-maximum likelihood estimator (QMLE) of SAR models. Lee (2007) discovers a type of quadratic moment conditions for SAR models to obtain a best GMM estimator.
Spatial panel data models are also a field with many empirical applications; see, e.g., Lee and Yu (2010) and Qu et al. (2017). In recent years, large sample theories for nonlinear spatial econometrics have attracted interest. Linear spatial models are not suitable for some empirical applications; discrete choice and limited dependent variables are two obvious cases. Linear models usually have a closed form expression in terms of exogenous regressors and disturbances. For nonlinear models, that will not be the case. As a result, large sample properties of estimators are expected to be more challenging to investigate. In this regard, recent developments on weak laws of large numbers (WLLN) and central limit theorems (CLT) for nonlinear spatial econometrics are important. Jenish and Prucha (2009, 2012) provide some limiting laws for mixing and near-epoch dependent (NED) random fields. Based on the spatial NED theory, Xu and Lee (2015b, 2018) examine asymptotic theories of estimators for spatial Tobit models, Qu and Lee (2015) investigate a SAR model with endogenous spatial
☆ We gratefully acknowledge the financial support of the Fundamental Research Funds for the Central Universities (ZK1038) to Xiamen University, and the Chinese Natural Science Fund (No. 71703135). We also thank an anonymous referee for valuable comments. * Corresponding author.
E-mail address:
[email protected] (X. Xu). https://doi.org/10.1016/j.regsciurbeco.2018.04.002 Received 27 September 2017; Received in revised form 29 March 2018; Accepted 5 April 2018 Available online XXX 0166-0462/© 2018 Elsevier B.V. All rights reserved.
Please cite this article in press as: Xu, X., Lee, L.-f., Theoretical foundations for spatial econometric research, Regional Science and Urban Economics (2018), https://doi.org/10.1016/j.regsciurbeco.2018.04.002
weight matrix, and Xu and Lee (2016) study a binary choice SAR model.¹ This paper reports foundations and techniques for analyzing large sample properties of estimators for linear and nonlinear spatial econometric models. In Section 2, we discuss relevant economic foundations on games to derive linear SAR models, and review large sample theories for linear spatial econometric models. Model stability requires uniform boundedness of spatial weight matrices in norms. Statistics of linear-quadratic form characterize various estimation methods for linear SAR models. The martingale CLT provides an essential tool for statistical inference. In Section 3, we extend the game foundations of linear SAR models to nonlinear ones. For nonlinear spatial econometric models, we discuss some spatial weak dependence concepts and their properties, especially NED random fields. We point out some other tools relevant for analyzing nonlinear spatial models. In Section 4, we illustrate the use of NED random fields to investigate spatial Tobit and binary choice models and their estimation problems. Conclusions are drawn in Section 5.
¹ Beyond classical statistical approaches, LeSage and Pace (2009) and Greene (2011, 2013) discuss the Bayesian MCMC estimation approach for nonlinear spatial econometrics.
2. Theoretical foundations for linear spatial econometric models
2.1. Some popular linear spatial econometric models
Among spatial econometric models, linear ones are the most useful, where linearity means that dependent variables are affine functions of disturbances in a model. There are several popular linear spatial econometric models. A SAR model describes the interaction of endogenous variables of spatial units:
$$ y_{i,n} = \lambda_0 \sum_{j=1}^{n} w_{ij,n} y_{j,n} + x_{i,n}'\beta_0 + \epsilon_{i,n}, \tag{1} $$
where $y_{i,n}$ is the dependent variable, $x_{i,n}$ is an exogenous variable (column) vector, $W_n \equiv (w_{ij,n})$ is a specified spatial weight matrix, the $\epsilon_{i,n}$'s are disturbance terms that are usually assumed to be independent or even independent and identically distributed (iid), and $\lambda_0$ and $\beta_0$ are true coefficients. We can write Eq. (1) in a matrix form:
$$ Y_n = \lambda_0 W_n Y_n + X_n\beta_0 + \epsilon_n, \tag{2} $$
where $Y_n \equiv (y_{1,n}, \ldots, y_{n,n})'$ and $X_n \equiv (x_{1,n}, \ldots, x_{n,n})'$. A SE model assumes that spatial interactions take place among disturbance terms:
$$ Y_n = X_n\beta_0 + \epsilon_n, \qquad \epsilon_n = \lambda_0 W_n \epsilon_n + u_n, \tag{3} $$
where the $u_{i,n}$'s are set to be independent or iid with zero mean and finite variances. A SAR model with SAR disturbances (SARAR model) combines a SAR model and a SE model:
$$ Y_n = \lambda_0 W_n Y_n + X_n\beta_0 + \epsilon_n, \qquad \epsilon_n = \lambda_0 M_n \epsilon_n + u_n, \tag{4} $$
where the two spatial weight matrices $W_n$ and $M_n$ can be the same or different.
2.2. Game foundations for SAR models
A SAR model can be regarded as a model of the Nash equilibrium of a static complete information game with linear-quadratic utilities. Suppose that there are $n$ individuals, and they choose their actions (e.g., efforts) to maximize their utilities. Let the action for individual $i$ be $y_{i,n}$, and its cost equal $y_{i,n}^2/2$. Suppose individual $i$'s benefit from his action is proportional to his action, and it depends on his characteristics and others' actions: $y_{i,n}\left(\lambda_0 \sum_{j=1}^{n} w_{ij,n} y_{j,n} + x_{i,n}'\beta_0 + \epsilon_{i,n}\right)$, which can be substitute or complementary depending on the sign of $\lambda_0$. Then his utility is
$$ u_{i,n}(y_{i,n}) = y_{i,n}\left(\lambda_0 \sum_{j=1}^{n} w_{ij,n} y_{j,n} + x_{i,n}'\beta_0 + \epsilon_{i,n}\right) - \frac{y_{i,n}^2}{2}, \tag{5} $$
where $x_{i,n}$ and $\epsilon_{i,n}$ are known to all individuals. The optimal action for $i$ will be characterized by Eq. (1). When Eq. (1) has a solution, the solution is a Nash equilibrium of this game. Depending on applications of this model, there are possibly other theoretical justifications. For social interactions, one may have a private and social utility:
$$ u_{i,n}(y_{i,n}) = y_{i,n}\left(x_{i,n}'\beta_0 + \epsilon_{i,n}\right) - \frac{1}{2}\left(y_{i,n} - \lambda_0 \sum_{j=1}^{n} w_{ij,n} y_{j,n}\right)^2, $$
where the first component represents private utility associated with an action $y_{i,n}$ and the second component captures a conformity effect with friends (see Brock and Durlauf, 2001).
2.3. Linear spatial models and stability conditions
We will use a SAR model as an example to discuss some widely adopted structures in spatial econometrics. As one would like to exclude self-influence, the diagonal elements of $W_n$ are specified to be zero. For the model to be stable in the sense that the $y_{i,n}$'s do not have unbounded variances as $n$ becomes large, we need to add some restrictions on $W_n$. As in Kelejian and Prucha (1998, 1999), $W_n$ is often set to be uniformly bounded in both the row sum norm ($\infty$-norm) and column sum norm (1-norm), i.e.,
$$ \sup_n \|W_n\|_\infty \equiv \sup_{i,n} \sum_{j=1}^{n} |w_{ij,n}| < \infty \quad \text{and} \quad \sup_n \|W_n\|_1 \equiv \sup_{j,n} \sum_{i=1}^{n} |w_{ij,n}| < \infty. $$
Sometimes, $W_n$ is assumed to be row-normalized: $\sum_{j=1}^{n} |w_{ij,n}| = 1$ for every nonzero row of $W_n$. In this case, $\|W_n\|_\infty \equiv 1$. One may interpret the influence on $y_{i,n}$ under interactions as the average of neighboring units' activities. In this case, the uniform boundedness is solely imposed on the column sums of $W_n$. In general, uniform boundedness is imposed on both rows and columns of $W_n$. The uniform boundedness in a norm for the sequence of $W_n$, however, is not sufficient for a SAR process to be stable. An example is a unit root process $y_i = y_{i-1} + \epsilon_i$, $i = 2, \ldots, n$, with $y_1 = \epsilon_1$. The implied $W_n$ is uniformly bounded in both row and column sum norms, but $y_i = \sum_{j=1}^{i} \epsilon_j$ is not a stable process. To rule out this situation, Kelejian and Prucha (1998, 1999) assume the additional condition that $S_n^{-1}(\lambda_0) = (I_n - \lambda_0 W_n)^{-1}$ is uniformly bounded in both row and column sum norms. A sufficient condition for model stability is that $\|\lambda_0 W_n\| < 1$ for some norm $\|\cdot\|$. With the additional condition, by the Neumann expansion,
$$ S_n(\lambda_0)^{-1} \equiv (I_n - \lambda_0 W_n)^{-1} = \sum_{l=0}^{\infty} (\lambda_0 W_n)^l. $$
Then, Eq. (2) implies
$$ Y_n = (X_n\beta_0 + \epsilon_n) + \lambda_0 W_n (X_n\beta_0 + \epsilon_n) + \lambda_0^2 W_n^2 (X_n\beta_0 + \epsilon_n) + \cdots. \tag{6} $$
The second term on the right hand side (RHS) of Eq. (6) can be regarded as the impact from the first-order contiguous neighbors, and the third term is the impact from the second-order contiguous neighbors (neighbors' neighbors), etc. $\|\lambda_0 W_n\| < 1$ implies that the impact of the $m$-th order contiguous neighbors decreases exponentially as $m$ increases. This model is thus stable. With uniform boundedness $\|W_n\|_\infty = 1$, the additional condition $\|\lambda_0 W_n\|_\infty < 1$ gives rise to $|\lambda_0| < 1$. This setting is similar to that for a stationary AR(1) process for which we restrict the value of $\lambda_0$. As an extension, for a more general model $Y_n = W_n(\lambda_0) Y_n + X_n\beta_0 + \epsilon_n$, where unknown parameters are involved in spatial weights, a stability condition for the system is to assume $\|W_n(\lambda_0)\| < 1$ for some norm $\|\cdot\|$ for all $n$. Such a generalized case includes the high order SAR model, $Y_n = (\lambda_{10} W_{n1} + \cdots + \lambda_{k0} W_{nk}) Y_n + X_n\beta + \epsilon_n$, where $W_{n1}, \ldots, W_{nk}$ are $k > 1$ distinct spatial weight matrices. In the linear SAR model of the first order described above, in addition to the condition that $\{W_n\}$ are uniformly bounded in both row and column sum norms, $\{S_n^{-1}(\lambda_0)\}$ are also assumed to be uniformly bounded in both row and column sum norms. It is of interest to note that, in the MESS model originated in LeSage and Pace (2007), $e^{\alpha W_n} Y_n = X_n\beta + \epsilon_n$, where the matrix exponential is $e^{\alpha W_n} = \sum_{i=0}^{\infty} \frac{\alpha^i W_n^i}{i!}$, only the condition that $\{W_n\}$ are uniformly bounded in both row and column sum norms is needed. Indeed, the sequence of matrix exponentials $\{e^{\alpha W_n}\}$ is bounded in some norm $\|\cdot\|$ when $\{\|W_n\|\}$ is bounded. This is sufficient because a well-defined matrix exponential is always invertible, with its inverse being $e^{-\alpha W_n}$. As a result, no restrictions on the parameter space of $\alpha$ need to be imposed. The MESS model is proposed as a substitute for the SAR model with stability built in, as one can see from its reduced form.
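The stability discussion in Section 2.3 can be checked numerically. The following Python sketch is illustrative only: the ring-shaped neighbor structure, the value of `lam`, and all function names are assumptions introduced here, not taken from the paper. It compares a truncated Neumann series with the exact inverse of $I_n - \lambda W_n$ for a row-normalized $W_n$, and then reproduces the unit root counterexample, where $W_n$ has bounded row and column sums but the row sum norm of $(I_n - W_n)^{-1}$ grows with $n$.

```python
import numpy as np


def row_normalize(adj):
    """Divide each nonzero row by its row sum; zero rows are left as zeros."""
    row_sums = adj.sum(axis=1, keepdims=True)
    safe = np.where(row_sums == 0, 1.0, row_sums)
    return adj / safe


def row_sum_norm(a):
    """Maximum absolute row sum (the infinity-norm used in the text)."""
    return np.abs(a).sum(axis=1).max()


def col_sum_norm(a):
    """Maximum absolute column sum (the 1-norm used in the text)."""
    return np.abs(a).sum(axis=0).max()


def neumann_inverse(lam, w, terms=200):
    """Truncated Neumann series sum_{l=0}^{terms-1} (lam * w)^l."""
    n = w.shape[0]
    total = np.eye(n)
    power = np.eye(n)
    for _ in range(terms - 1):
        power = power @ (lam * w)
        total += power
    return total


def unit_root_inverse_norm(n):
    """Row sum norm of (I - W)^{-1} for the unit root process y_i = y_{i-1} + e_i."""
    w_unit_root = np.eye(n, k=-1)  # ones on the first subdiagonal, lambda = 1
    return row_sum_norm(np.linalg.inv(np.eye(n) - w_unit_root))


# Hypothetical example: units on a ring, each with two neighbors.
n = 50
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i - 1) % n] = 1.0
    adj[i, (i + 1) % n] = 1.0
W = row_normalize(adj)
lam = 0.6  # satisfies |lam| * ||W||_inf < 1

exact = np.linalg.inv(np.eye(n) - lam * W)
approx = neumann_inverse(lam, W)

print("row sum norm of W:", row_sum_norm(W))
print("column sum norm of W:", col_sum_norm(W))
print("max |exact - Neumann|:", np.max(np.abs(exact - approx)))
print("row sum norm of the inverse:", row_sum_norm(exact))
print("unit root case, n = 10 and n = 40:",
      unit_root_inverse_norm(10), unit_root_inverse_norm(40))
```

For a row-normalized $W_n$ with nonnegative weights and $0 < \lambda < 1$, every row of the inverse sums to $1/(1-\lambda)$, so its row sum norm does not depend on $n$. In the unit root case the inverse is a lower triangular matrix of ones, so the same norm equals $n$.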
2.4. Estimation and statistical inference: linear-quadratic form and martingale CLT
For estimation and inference, statistics of linear-quadratic form characterize testing, such as Moran's I test of spatial correlation, and estimation, such as 2SLS, ML, GMM, and generalized empirical likelihood (GEL). For testing, Moran's I test is widely used to test for the existence of spatial correlation. In a linear regression setting, the Moran's I statistic is based on checking whether $\epsilon_{i,n}$ is correlated with $w_{i\cdot,n}\epsilon_n$, which represents the outcomes of the neighbors of $i$, where $w_{i\cdot,n}$ is the $i$th row of $W_n$. The statistic
$$ \frac{1}{\sqrt{n}}\, \epsilon_n' W_n \epsilon_n $$
involves a quadratic form. In addition to Moran's test, quadratic forms are also useful for various LM test statistics, e.g., LM tests for spatial correlations in Anselin and Bera (1998) for linear SAR models, and for spatial Tobit models in Qu and Lee (2012). For estimation, consider the QMLE of a SAR model as an illustration. Assume that the $\epsilon_{i,n}$'s are iid $(0, \sigma^2)$ in Eq. (1). The quasi-log-likelihood function of the SAR model is
$$ \ln L_n(\theta) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\,[S_n(\lambda)Y_n - X_n\beta]'[S_n(\lambda)Y_n - X_n\beta] + \ln|I_n - \lambda W_n|, \tag{7} $$
where $\theta \equiv (\lambda, \beta', \sigma^2)'$ and $S_n(\lambda) \equiv I_n - \lambda W_n$. Denote $\epsilon(\theta) \equiv S_n(\lambda)Y_n - X_n\beta$. The derivatives of $\ln L_n(\theta)$ are
$$ \frac{\partial \ln L_n(\theta)}{\partial \beta} = \frac{1}{\sigma^2}\, X_n'\epsilon(\theta), $$
$$ \frac{\partial \ln L_n(\theta)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\, \epsilon(\theta)'\epsilon(\theta), $$
$$ \frac{\partial \ln L_n(\theta)}{\partial \lambda} = \frac{1}{\sigma^2}\,[W_n S_n(\lambda)^{-1} X_n\beta]'\epsilon(\theta) + \frac{1}{\sigma^2}\, \epsilon(\theta)'[W_n S_n(\lambda)^{-1}]'\epsilon(\theta) - \mathrm{tr}[S_n(\lambda)^{-1} W_n]. $$
All three derivatives are linear-quadratic forms of $\epsilon(\theta)$. Linear-quadratic forms of random variables play a fundamental role in establishing the large sample properties of estimators for linear spatial econometric models. Thus, Kelejian and Prucha (2001) derive a CLT for linear-quadratic forms of independent random variables that is suitable for studying spatial econometric models (Theorem 1 below). In addition to QMLE of SAR models, linear-quadratic forms are also indispensable when we study 2SLS estimation (Kelejian and Prucha, 1998) and GMM estimation (Kelejian and Prucha, 1999; Lee, 2007), the estimation of SARAR panel data models (Yu et al., 2008; Lee and Yu, 2010; Kuersteiner and Prucha, 2015), and various test statistics for spatial panel data models in Baltagi and Liu (2008) and Baltagi et al. (2007).
Theorem 1. [Kelejian and Prucha, 2001] Suppose the elements of $\epsilon_n = (\epsilon_{1,n}, \ldots, \epsilon_{n,n})'$ are independent and satisfy $E\epsilon_{i,n} = 0$. $A_n = (a_{ij,n})$ is a nonstochastic symmetric matrix with uniformly bounded column sum norm. $b_n = (b_{1,n}, \ldots, b_{n,n})'$ is a nonstochastic vector satisfying $\sup_n \frac{1}{n}\sum_{i=1}^{n} |b_{i,n}|^{2+\eta_1} < \infty$ for some $\eta_1 > 0$. Suppose $\sup_{i,n} E|\epsilon_{i,n}|^{4+\eta_2} < \infty$ for some $\eta_2 > 0$ holds. Denote
$$ Q_n \equiv \epsilon_n' A_n \epsilon_n + b_n'\epsilon_n \tag{8} $$
and $\sigma_{Q_n} \equiv [\mathrm{var}(Q_n)]^{1/2}$. If $n^{-1}\sigma_{Q_n}^2 \geq c$ for some $c > 0$ for all $n$, then
$$ \frac{Q_n - EQ_n}{\sigma_{Q_n}} \xrightarrow{d} N(0, 1). $$
Theorem A.1 in Kelejian and Prucha (2010) generalizes Theorem 1 to a multivariate form. $m$ different linear-quadratic forms are defined as in Eq. (8), i.e., $Q_{r,n} = \epsilon_n' A_{r,n} \epsilon_n + b_{r,n}'\epsilon_n$, $r = 1, \ldots, m$. Let $V_n = [Q_{1,n}, \ldots, Q_{m,n}]'$ and $\Sigma_n = \mathrm{var}(V_n)$. Under similar conditions as those in Theorem 1,
$$ \Sigma_n^{-1/2}(V_n - EV_n) \xrightarrow{d} N(0, I_m). $$
With this theorem, Kelejian and Prucha (2010) show the consistency and asymptotic normality of a GMM estimator for a SARAR model with unknown heteroskedastic innovations.
The proofs of Theorem 1 and Theorem A.1 in Kelejian and Prucha (2010) rely on a CLT for martingale difference arrays (MDA). A general linear-quadratic form of independent random variables can first be written as a single summation:
$$ Q_n - EQ_n \equiv \epsilon_n' A_n \epsilon_n + b_n'\epsilon_n - \sigma^2\,\mathrm{tr}(A_n) = \sum_{i=1}^{n} b_{i,n}\epsilon_{i,n} + \sum_{i=1}^{n} a_{ii,n}\epsilon_{i,n}^2 + 2\sum_{i=1}^{n}\sum_{j=1}^{i-1} a_{ij,n}\epsilon_{i,n}\epsilon_{j,n} - \sigma^2\sum_{i=1}^{n} a_{ii,n} \equiv \sum_{i=1}^{n} Z_{i,n}, $$
where $Z_{i,n} \equiv b_{i,n}\epsilon_{i,n} + a_{ii,n}\epsilon_{i,n}^2 + 2\sum_{j=1}^{i-1} a_{ij,n}\epsilon_{i,n}\epsilon_{j,n} - \sigma^2 a_{ii,n}$. Denote the sigma field $\mathcal{F}_{i,n} \equiv \sigma(\epsilon_{1,n}, \ldots, \epsilon_{i,n})$. Since $E\epsilon_{i,n} = 0$, $E\epsilon_{i,n}^2 = \sigma^2$, and the $\epsilon_{i,n}$'s are independent, $E(Z_{i,n}|\mathcal{F}_{i-1,n}) = 0$. Hence, $\{Z_{i,n}\}$ is a MDA. The MDA CLT for statistics of linear-quadratic forms provides an important tool for linear spatial models.
This MDA CLT is useful for univariate linear spatial models. It can be extended to models with multivariate SAR equations, and to panel data with both cross section and time dimensions. A multivariate MDA CLT is in Yang and Lee (2017). Yang and Lee (2017) consider that, for each spatial unit, there are $m$ disturbances with zero mean, which can be correlated, but are independent across different spatial units. They are denoted $\epsilon_{n1}, \ldots, \epsilon_{nm}$ and each of them is an $n$-dimensional column vector. Put them in an $n \times m$ matrix $[\epsilon_{n1}, \ldots, \epsilon_{nm}]$. Then each row of this matrix represents the $m$ disturbances of an individual that can be correlated, but the different rows are independent. The linear-quadratic statistic has the form
$$ Q_n = \sum_{k=1}^{m} b_{nk}'\epsilon_{nk} + \sum_{k=1}^{m}\sum_{l=1}^{m} [\epsilon_{nk}' A_{n,kl}\, \epsilon_{nl} - E(\epsilon_{nk}' A_{n,kl}\, \epsilon_{nl})], $$
which extends the univariate case with multiple constant vectors $b$'s and quadratic matrices $A$'s under regularity conditions similar to the preceding ones. With the average variances $\frac{1}{n}\sigma_{Q_n}^2$ bounded away from zero,
$$ Q_n/\sigma_{Q_n} \xrightarrow{d} N(0, 1). $$
With this CLT, Yang and Lee (2017) study the QML estimation of general multivariate and simultaneous SAR models and establish asymptotic distributions of the QMLEs.
The MDA CLT has been extended in Yu et al. (2008) and Kuersteiner and Prucha (2013) to study spatial panel data models. To incorporate time lags in the linear-quadratic form, denote $Q_{nT} = \sum_{t=1}^{T}(U_{n,t-1}'\epsilon_{nt} + b_{nt}'\epsilon_{nt} + \epsilon_{nt}' A_n \epsilon_{nt})$, where $\epsilon_{nt} = (\epsilon_{1t,n}, \ldots, \epsilon_{nt,n})'$ with $t$ representing a time period in the panel and the $\epsilon_{it,n}$'s are independent across spatial units and time. This $Q_{nT}$ has a time lag component in $U_{n,t-1}$, where $U_{n,t} = \sum_{h=1}^{\infty} P_{nh}\,\epsilon_{n,t+1-h}$ with $P_{nh} = B_n P_n^h$ such that $B_n$ and $\sum_{h=1}^{\infty} |P_n^h|$, where $|P_n|$ is the matrix with each entry replaced by its absolute value, are uniformly bounded in both row and column sum norms. This form can be rewritten as $Q_{nT} = \sum_{t=1}^{T}\sum_{i=1}^{n} Z_{it,n}$, where
$$ Z_{it,n} = (u_{i,t-1} + b_{it,n})\,\epsilon_{it,n} + a_{ii,n}\epsilon_{it,n}^2 + 2\left(\sum_{j=1}^{i-1} a_{ij,n}\epsilon_{jt,n}\right)\epsilon_{it,n}. $$
Define the adapted $\sigma$-field $\mathcal{F}_{it,n} = \sigma(\epsilon_{j\tau,n}, 1 \leq j \leq n, 1 \leq \tau \leq t-1;\ \epsilon_{1t,n}, \ldots, \epsilon_{it,n})$, ordered by arranging units from 1 to $n$ at each time first and then moving forward in time. Then $\{Z_{it,n} - E(Z_{it,n})\}$ is a MDA based on $\{\mathcal{F}_{it,n}\}$. This CLT can be applied to dynamic panel data models with additive individual and time fixed effects (Yu et al., 2008) or fixed time factors (Shi and Lee, 2017). A further extension of a similar MDA CLT can be applied to a spatial time dynamic model with endogenous time varying spatial weight matrices (Qu et al., 2017).
A new MDA CLT is derived in Kuersteiner and Prucha (2013). Based on this MDA CLT, Kuersteiner and Prucha (2013) establish a general CLT for panel data models with large $n$ and fixed $T$, which allows for cross sectional dependence that might stem from spatial interactions and/or from common random shocks. This CLT also allows regressors to be sequentially exogenous, rather than strictly exogenous. Kuersteiner and Prucha (2013) present some examples to illustrate how to use this CLT to establish asymptotic distributions of some GMM and ML estimators. Kuersteiner and Prucha (2015) utilize the MDA CLT in Kuersteiner and Prucha (2013) to examine GMM estimators of a type of dynamic spatial panel data model that allows for endogenous spatial weight matrices and time-varying interactive effects.
Asymptotic distributions of Moran's I test, and of estimators of SAR and spatial panel models, can be studied via the MDA CLT. In addition, the MDA representation of linear-quadratic statistics has a robustness property for formulations of tests and estimation of linear SAR models against unknown heteroskedastic variances. As an illustration, consider the Moran's I test for spatial correlation in the SAR process $Y_n = \lambda W_n Y_n + \epsilon_n$, where the $\epsilon_{i,n}$'s are independent but with unknown heteroskedastic variances $\sigma_{i,n}^2$'s. To implement the Moran test statistic $Y_n' W_n Y_n$, a proper asymptotic normal $N(0, 1)$ or $\chi^2(1)$ test needs to take into account the unknown heteroskedasticity. The Moran test statistic can be represented as $Y_n' W_n Y_n = \sum_{i=1}^{n} Z_{i,n}$, where $Z_{i,n} = y_{i,n}\sum_{j=1}^{i-1}(w_{ij,n} + w_{ji,n})\,y_{j,n}$. Under the null $H_0$: $\lambda_0 = 0$, $Z_{i,n} = \epsilon_{i,n}\sum_{j=1}^{i-1}(w_{ij,n} + w_{ji,n})\,\epsilon_{j,n}$ is a MDA. By the uncorrelatedness of the $Z_{i,n}$'s, $\mathrm{var}(Y_n' W_n Y_n) = \sum_{i=1}^{n} E(Z_{i,n}^2)$. It follows that the Moran test can be formulated as $\left(\sum_{i=1}^{n} Z_{i,n}\right)'\left(\sum_{i=1}^{n} Z_{i,n} Z_{i,n}'\right)^{-1}\sum_{i=1}^{n} Z_{i,n}$, which is asymptotically $\chi^2(1)$ under the null of no spatial dependence. This formulation of the test statistic is robust against unknown heteroskedasticity, and can simply be evaluated by a regression package as the explained sum of squares of the regression of the constant 1 on $Z_{i,n}$, which is an outer-product-gradient formulation (Jin and Lee, 2017a).
In the statistics and econometrics literature, in addition to QML and GMM approaches, another fruitful estimation and testing approach is known as the GEL approach. The GEL approach is mainly designed for cross sectional independent samples, with a few studies on time series models. However, with the martingale difference representation of linear-quadratic moments, one can adopt the GEL approach as an alternative to GMM estimation for linear SAR models (Jin and Lee, 2017b), so the GEL approach can be formally used for analyzing spatial data. For an illustration, consider again the Moran test of spatial dependence for the preceding SAR process $Y_n = \lambda W_n Y_n + \epsilon_n$. The Moran test can be formulated as a ratio test statistic in the GEL framework with $2\left[\max_\alpha \sum_{i=1}^{n}\rho(\alpha Z_{i,n}) - n\rho(0)\right]$, where $\rho(\cdot)$ is a twice continuously differentiable concave function. The choice $\rho(v) = \ln(1 - v)$ for $v < 1$ is known as the empirical likelihood, and $\rho(v) = -\frac{1}{2}(v + 1)^2$ corresponds to the continuous updating GMM. Under the null $H_0$ of no spatial correlation, $Y_n = \epsilon_n$ and $Z_{i,n}$ is a MDA. In this case, the ratio test statistic converges in distribution to $\chi^2(1)$. This ratio test statistic is valid even if the $\epsilon_{i,n}$'s are heteroskedastic, because heteroskedasticity of the MDA has been internalized in the GEL formulation.
3. Nonlinear spatial econometric models
3.1. Utility beyond linear-quadratic functions and limited dependent variables
The linear SAR can be justified as a perfect information game with individual linear-quadratic utility and cost functions. However, it is quite possible that utility functions and/or cost functions might not have the form in Eq. (5). As an illustration, consider a Cournot oligopoly game with $n$ firms. Let $a_i$ be a possible output of firm $i$ with a cost $c_i(a_i)$, and $F\left(\sum_{j=1}^{n} a_j\right)$ be an inverse demand function. A profit function of firm $i$ is $\pi_i(a) = F\left(\sum_{j=1}^{n} a_j\right) a_i - c_i(a_i)$. If the inverse demand is linear, i.e., $F(x) = \alpha - \beta x$, and $c_i(a_i)$ is a linear-quadratic cost, then $\pi_i(a) = \left(\alpha - \beta\sum_{j=1}^{n} a_j\right) a_i - c_i(a_i) = -\beta\sum_{j=1, j\neq i}^{n} a_j a_i - [c_i(a_i) + \beta a_i^2 - \alpha a_i]$. With the profit function for each firm being linear and quadratic in its output, these specifications give a SAR model (as in a Nash equilibrium setting). However, if the inverse demand or the cost function is not linear, the derived model will be nonlinear. See, e.g., Huang (1983) for some well-known utility functions for inverse demand functions. A more general utility function beyond Eq. (5) can be
$$ u_{i,n}(y_{i,n}) = F\left(y_{i,n},\ \sum_{j=1}^{n} w_{ij,n} y_{j,n},\ x_{i,n}'\beta_0 + \epsilon_{i,n}\right) - c(y_{i,n}), \tag{9} $$
where $F(\cdot)$ is a monotonic function in $y_{i,n}$ and $c(y_{i,n})$ is a convex cost function. Restrictions on the action set can also lead to nonlinear spatial models. Xu and Lee (2015a) study a special case of Eq. (9):
$$ u_{i,n}(y_{i,n}) = y_{i,n}\, F\left(\lambda_0\sum_{j=1}^{n} w_{ij,n} y_{j,n} + x_{i,n}'\beta_0 + \epsilon_{i,n}\right) - \frac{y_{i,n}^2}{2}. $$
The Nash equilibrium for this model is
$$ y_{i,n} = F\left(\lambda_0\sum_{j=1}^{n} w_{ij,n} y_{j,n} + x_{i,n}'\beta_0 + \epsilon_{i,n}\right). \tag{10} $$
The $F(\cdot)$ function can be specified to be non-negative if $y_{i,n}$ takes on only non-negative values. If $y_{i,n}$ is a positive share whose values are within $(0, 1)$, then $F(\cdot)$ can be a cumulative distribution function (CDF). If $F(\cdot)$ has a known functional form, the implied Eq. (10) is nonlinear with endogenous regressor $\sum_{j=1}^{n} w_{ij,n} y_{j,n}$ and non-separable disturbance $\epsilon_{i,n}$. If the functional form $F(\cdot)$ is unknown, it will be a semiparametric SAR model.
In some cases, outcomes can be limited dependent variables, such as discrete or censored variables. For example, the time a student studies per week, the amount of tobacco a student smokes per week, and some specific tax rates determined by local governments, can be censored variables with censoring at zero, i.e., Tobit variables. In these cases, even if utility functions are of linear-quadratic form, because there are binding non-negativity constraints on outcomes, the Nash equilibrium will not be a linear SAR model. The Nash equilibrium will be characterized by
$$ y_{i,n} = \max\left(0,\ \lambda_0\sum_{j=1}^{n} w_{ij,n} y_{j,n} + x_{i,n}'\beta_0 + \epsilon_{i,n}\right). \tag{11} $$
This is a simultaneous SAR Tobit model.
In some other cases, the choice set is a finite set. A common example is a binary choice model. For example, a student smokes or not, or a local government embraces a policy or not. For binary choice games, denote the choice set to be $\{0, 1\}$. When an individual chooses 0, his utility is normalized to be zero; for alternative 1, his utility is
$$ u_{i,n}(y_{i,n} = 1) = \lambda_0\sum_{j=1}^{n} w_{ij,n} y_{j,n} + x_{i,n}'\beta_0 + \epsilon_{i,n}. $$
In this case, a Nash equilibrium is characterized by
$$ y_{i,n} = 1\left(\lambda_0\sum_{j=1}^{n} w_{ij,n} y_{j,n} + x_{i,n}'\beta_0 + \epsilon_{i,n} > 0\right). \tag{12} $$
This is a SAR binary choice model. Recently, there has been emerging interest in qualitative and limited dependent variables in spatial econometrics. Several theoretical and empirical papers are collected in a book edited by Baltagi et al. (2016). In addition to discrete dependent variables, there are spatial models on count and survival variables.
3.2. Theoretical foundations for nonlinear spatial econometric models
For nonlinear spatial econometric models, a common theme is that while dependent variables of a model will be spatially correlated, a dependent variable cannot be expressed as a linear function of disturbances. Because a spatial model has its economic foundation as a perfect information game, one has to investigate whether a model is well specified so that the corresponding nonlinear reaction functions have a solution. If not, then such a model would be irrelevant for generating the samples that an investigator has observed. If solutions exist, the next issue is to investigate whether the model has a unique Nash equilibrium or multiple ones. Different estimation methods might be designed for estimation of a model with a single equilibrium as compared to those with multiple ones. A common feature of spatial nonlinear models is that an equilibrium is an implicit function of disturbances. Statistics or statistical functions used for estimation are stochastic spatial processes. In order to understand asymptotic properties of estimators, proper LLNs and CLTs (beyond the martingale CLT) will be needed. For analysis of consistency, a uniform LLN is an additional required tool. In the following, some recently developed LLNs and CLTs for random fields, which are useful for nonlinear spatial econometric models, will first be reviewed. In a subsequent section, specific tools used for analyzing estimators for both the spatial Tobit and SAR binary choice models will be presented.
3.3. Weakly dependent random fields
For nonlinear spatial econometric models, techniques to establish asymptotic properties of estimators are usually different from those for linear models, as the generated dependent variable $y_{i,n}$ is not an affine function of disturbances in a nonlinear model. For nonlinear time series models, some weak dependence concepts, such as ergodicity, mixing, and NED, are needed for asymptotic analysis of estimators. Analogously, weak spatial dependence is often necessary to examine large sample properties for nonlinear spatial econometric models. In spatial econometrics, spatial units are usually irregularly located in some finite dimensional Euclidean space, say $\mathbb{R}^d$. A spatial process is also called a random field, which generalizes a stochastic process from a time domain to a space domain. In statistics and mathematics, there is considerable research on random fields. Some books, e.g., Doukhan (1994) and Bradley (2007), summarize some progress in the study of random fields. However, most results, especially WLLNs and CLTs, are developed for strictly stationary random fields on $\mathbb{Z}^d$, the points of the $d$-dimensional Euclidean space with integer coordinates. Strictly stationary random fields are useful for computer science, physics, and some other fields. However, as pointed out in Jenish and Prucha (2009), in most empirical applications in economics, individuals are heterogeneous, the spatial correlation might vary with locations, and individuals are not located on integer lattices. Finally, random variables in many important spatial processes, such as the SAR process, are also indexed by their sample sizes. As a consequence, most WLLNs and CLTs for random fields in statistics may not be useful in spatial econometrics. Jenish and Prucha (2009) extend previous works to develop new WLLNs and CLTs that are suitable for economic applications. The $\alpha$-mixing or $\phi$-mixing spatial processes in Jenish and Prucha (2009) are allowed to be nonstationary and located at unevenly spaced locations, and the sample is allowed to be a triangular array. See Definition 1 below. All of our following discussion is based on Assumption 1. Assumption 1 implies that as the sample size $n$ increases, the sample region $D_n$ will expand. Such increasing-domain asymptotics are appropriate for many regional economics problems.
Assumption 1. Individual spatial units are located in a region $D_n \subset D \subset \mathbb{R}^d$, where the cardinality of $D_n$ satisfies $|D_n| = n$. For any $i \neq j \in D_n$, their distance $d_{ij} \geq 1$.
Definition 1. (Jenish and Prucha, 2009). Let $X \equiv \{X_{i,n};\ i \in D_n \subseteq \mathbb{R}^d,\ n \in \mathbb{N}\}$ be a triangular array of random variables on a probability space $(\Omega, \mathcal{F}, P)$. For $\mathcal{A}$ and $\mathcal{B}$, two sub-$\sigma$-algebras of $\mathcal{F}$, the $\alpha$-mixing coefficient and $\phi$-mixing coefficient between $\mathcal{A}$ and $\mathcal{B}$ are defined as
$$ \alpha(\mathcal{A}, \mathcal{B}) \equiv \sup(|\Pr(A \cap B) - \Pr(A)\Pr(B)|,\ A \in \mathcal{A},\ B \in \mathcal{B}), $$
$$ \phi(\mathcal{A}, \mathcal{B}) \equiv \sup(|\Pr(A|B) - \Pr(A)|,\ A \in \mathcal{A},\ B \in \mathcal{B},\ \Pr(B) > 0). $$
The $\alpha$-mixing coefficient and $\phi$-mixing coefficient for $X$ are defined as
$$ \alpha_{k,l,n}(r) \equiv \sup\{\alpha(\sigma(X_{i,n};\ i \in U),\ \sigma(X_{j,n};\ j \in V)) : U, V \subseteq D_n,\ |U| \leq k,\ |V| \leq l,\ d(U, V) \geq r\}, $$
$$ \phi_{k,l,n}(r) \equiv \sup\{\phi(\sigma(X_{i,n};\ i \in U),\ \sigma(X_{j,n};\ j \in V)) : U, V \subseteq D_n,\ |U| \leq k,\ |V| \leq l,\ d(U, V) \geq r\}, $$
where $|U|$ is the cardinality of $U$, and $d(U, V)$ is the distance between $U$ and $V$. Denote $\alpha_{k,l}(r) \equiv \sup_n \alpha_{k,l,n}(r)$ and $\phi_{k,l}(r) \equiv \sup_n \phi_{k,l,n}(r)$. $X$ is said to be $\alpha$-mixing ($\phi$-mixing) iff for any $k, l \in \mathbb{N}$, $\lim_{r\to\infty}\alpha_{k,l}(r) = 0$ ($\lim_{r\to\infty}\phi_{k,l}(r) = 0$).
The $\alpha$-mixing and $\phi$-mixing spatial processes defined above satisfy some WLLNs and CLTs. In time series, MA($\infty$) processes are $\alpha$-mixing under some conditions (Davidson, 1994), and many nonlinear Markov processes are stationary $\beta$-mixing² processes (Chen and Shen, 1998). But Andrews (1984) also constructed a stationary AR(1) process that is not $\alpha$-mixing.
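To make the $\alpha$-mixing coefficient between two sub-$\sigma$-algebras concrete, the following Python sketch evaluates the supremum in its definition by brute force for the $\sigma$-algebras generated by two finite-valued random variables. The function names and the representation of the joint distribution as a probability matrix are illustrative assumptions, not part of Jenish and Prucha (2009), and the enumeration over all events is only feasible for small supports.

```python
from itertools import chain, combinations

import numpy as np


def nonempty_subsets(k):
    """All nonempty subsets of {0, ..., k-1}, each returned as a tuple of indices."""
    return chain.from_iterable(combinations(range(k), r) for r in range(1, k + 1))


def alpha_coefficient(joint):
    """Brute-force alpha(sigma(X), sigma(Y)) for finite-valued X and Y.

    joint[x, y] is Pr(X = x, Y = y). Events in sigma(X) and sigma(Y) are
    unions of value sets, so the supremum is a maximum over index subsets.
    Empty events contribute a gap of zero and are skipped.
    """
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)  # marginal distribution of X
    py = joint.sum(axis=0)  # marginal distribution of Y
    best = 0.0
    for a in nonempty_subsets(joint.shape[0]):
        a = list(a)
        for b in nonempty_subsets(joint.shape[1]):
            b = list(b)
            prob_ab = joint[np.ix_(a, b)].sum()
            gap = abs(prob_ab - px[a].sum() * py[b].sum())
            best = max(best, gap)
    return best


# Hypothetical examples: an independent pair and a perfectly dependent pair.
independent = np.outer([0.3, 0.7], [0.4, 0.6])
dependent = np.array([[0.5, 0.0], [0.0, 0.5]])
print("alpha, independent pair:", alpha_coefficient(independent))
print("alpha, perfectly dependent pair:", alpha_coefficient(dependent))
```

For an independent pair the coefficient is zero up to rounding error, while for two identical fair coin flips it reaches 1/4, the largest value the $\alpha$-mixing coefficient can take.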
So far, we do not know whether a random field generated by a SAR model is a mixing random field or not. Hence, we need a concept that holds at least for SAR models. From Gallant and White (1988), an ARMA process of finite order with zeros lying outside the unit circle is a NED process. Jenish and Prucha (2012) generalize the definition of a NED process from time series to random fields. It turns out that NED random fields are convenient to explore SAR models (see the next section for more discussion). We summarize several properties of random fields in Propositions 1–3. Proposition 1 implies that the covariance between Zi,n and Zj,n decreases as the distance between i and j increases. Thus, the spatial NED property
describes a weak spatial dependence. From Propositions 2 and 3, under some regularity conditions, a spatial NED random field satisfies a WLLN and a CLT when the base field is a proper spatial mixing field. These three propositions are useful tools when we study large sample properties of nonlinear spatial econometric models.

2 The $\beta$-mixing coefficient between two sub-$\sigma$-algebras $\mathcal{A}$ and $\mathcal{B}$ is defined as $\beta(\mathcal{A},\mathcal{B}) \equiv E \sup\{|\Pr(B|\mathcal{A}) - \Pr(B)| : B \in \mathcal{B}\}$. The $\beta$-mixing coefficients for a random field can be defined similarly to the $\alpha$-mixing and $\phi$-mixing coefficients.

Definition 2. (Jenish and Prucha, 2012) Let $Z = \{Z_{i,n}, i \in D_n, n \ge 1\}$ and $\upsilon = \{\upsilon_{i,n}, i \in D_n, n \ge 1\}$ be two random fields. Suppose that, for some $p \ge 1$, $\|Z\|_{L^p} \equiv \sup_{i \in D_n, n} (E|Z_{i,n}|^p)^{1/p} < \infty$. Denote by $\mathcal{F}_{i,n}(s) \equiv \sigma\{\upsilon_{j,n} : d_{ij} \le s\}$ the $\sigma$-field generated by the random variables $\upsilon_{j,n}$ located within distance $s$ of $i$. $Z$ is said to be $L_p$-near-epoch dependent on $\upsilon$ if $\|Z_{i,n} - E(Z_{i,n}|\mathcal{F}_{i,n}(s))\|_{L^p} \le d_{i,n}\psi(s)$ for some array of finite positive constants $d = \{d_{i,n}, i \in D_n, n \ge 1\}$ and for some sequence $\psi(s) \ge 0$ with $\lim_{s\to\infty}\psi(s) = 0$. The $d_{i,n}$'s are called NED scaling factors. The $\psi(s)$, called the NED coefficients, can be taken to be non-increasing without loss of generality. The NED random field is uniform iff $\sup_n \sup_{i \in D_n} d_{i,n} < \infty$, and it is called geometric iff $\psi(s) = O(\rho^s)$ for some $0 < \rho < 1$.

Proposition 1. (Jenish and Prucha, 2012) Let $\{Z_{i,n}\}$ be uniformly $L_2$-NED on a random field $\{\epsilon_{i,n}\}$ with $\alpha$-mixing coefficients $\alpha_{u,v}(r) = (u+v)^{\tau}\hat\alpha(r)$ for some constant $\tau \ge 0$. Suppose that (1) the NED coefficients of $\{Z_{i,n}\}$ satisfy $\sum_{r=1}^{\infty} r^{d-1}\psi(r) < \infty$ and $\|Z\|_{L^{2+\delta}} < \infty$ for some $\delta > 0$, and (2) $\sum_{r=1}^{\infty} r^{d(\tau_*+1)}\hat\alpha^{\delta/(2+\delta)}(r) < \infty$, where $\tau_* \equiv \delta\tau/(2+\delta)$. Then

$$|\mathrm{cov}(Z_{i,n}, Z_{j,n})| \le C_1\left[\hat\alpha\left(\frac{d_{ij}}{3}\right) + \psi\left(\frac{d_{ij}}{3}\right)\right], \quad \text{and} \quad \mathrm{var}\left(\sum_{i \in D_n} Z_{i,n}\right) \le C_2 n$$

for some constants $C_1$ and $C_2$ that do not depend on $i$, $j$, $n$.

Proposition 2. (Jenish and Prucha, 2012) Let $\{Z_{i,n}\}$ be a uniformly $L_1$-NED random field on a random field $\{\epsilon_{i,n}\}$ with $\alpha$-mixing coefficients $\alpha_{u,v}(r) = \varphi(u,v)\hat\alpha(r)$. Suppose that (1) $\|Z\|_{L^p} < \infty$ for some $p > 1$, (2) the NED coefficients of $\{Z_{i,n}\}$ satisfy $\sum_{r=1}^{\infty} r^{d-1}\psi(r) < \infty$, and (3) $\sum_{r=1}^{\infty} r^{d-1}\hat\alpha(r) < \infty$. Then

$$\frac{1}{n}\sum_{i \in D_n}(Z_{i,n} - EZ_{i,n}) \xrightarrow{L_1} 0.$$

Proposition 3. (Jenish and Prucha, 2012) Let $\{Z_{i,n}\}$ be uniformly $L_2$-NED on a random field $\{\epsilon_{i,n}\}$ with $\alpha$-mixing coefficients $\alpha_{u,v}(r) = (u+v)^{\tau}\hat\alpha(r)$ for some constant $\tau \ge 0$. Suppose that (1) for some $\delta > 0$, $\sum_{r=1}^{\infty} r^{d(\tau_*+1)}\hat\alpha^{\delta/(2+\delta)}(r) < \infty$, where $\tau_* \equiv \delta\tau/(2+\delta)$; (2) $\lim_{k\to\infty}\sup_{i \in D_n, n} E[|Z_{i,n}|^p 1(|Z_{i,n}| > k)] = 0$; (3) the NED coefficients of $\{Z_{i,n}\}$ satisfy $\sum_{r=1}^{\infty} r^{d-1}\psi(r) < \infty$; and (4) $\inf_n \sigma_n^2/n > 0$, where $\sigma_n = \mathrm{std}\left(\sum_{i \in D_n} Z_{i,n}\right)$. Then

$$\frac{\sum_{i \in D_n}(Z_{i,n} - EZ_{i,n})}{\sigma_n} \xrightarrow{d} N(0,1).$$

The notion of near-epoch dependence originated in the time series literature. The generalization to spatial NED by Jenish and Prucha (2012) may seem to emphasize spatial aspects rather than time, but that is not the case: the spatial notion of an NED process can accommodate both space and time dependence. To accommodate the time dimension, one may regard the location of a unit in Definition 2 as a location in both space and time. The $\mathbb{R}^d$ space then has a time dimension $\mathbb{R}$ and a space of dimension $(d-1)$ with $d \ge 2$. For a discrete time spatial process, $n$ spatial units interact in space and through discrete time. With $T$ time periods, each unit $i$ occupies $T$ time-space locations, so there are $nT$ nodes in total in the space-time $\mathbb{R}^d$. Each $i$ at a given time might interact with its past nodes, which captures time series interactions. With NED on both space and time, the corresponding LLN and CLT can be useful for studies of dynamic spatial panel data models. An example is Qu et al. (2017).

4. Applications of L2-NED random fields

For several SAR models, dependent variables can be established as a uniform $L_2$-NED random field. Consequently, for SAR models, the concept of spatial NED is more convenient to use than spatial mixing. That is why a series of nonlinear spatial econometric models have been studied using the spatial NED concept of Jenish and Prucha (2012). Xu and Lee (2015a) study a SAR model with a nonlinear transformation of the dependent variable; Qu and Lee (2015) investigate a SAR model with an endogenous spatial weight matrix; Xu and Lee (2015b) examine the large sample properties of the MLE of a spatial Tobit model; Xu and Lee (2016) explore a binary choice spatial model; and Qu et al. (2017) study dynamic spatial panel data models with endogenous time varying spatial weight matrices. The following proposition provides a useful tool for establishing that a dependent variable generated by a structural spatial model is spatial NED, provided one can establish a Lipschitz relation between the dependent variable and the disturbances in the model.

Proposition 4. (Jenish and Prucha, 2012) Let $\epsilon = \{\epsilon_{i,n}, i \in D_n, n \ge 1\}$ be a random field such that $\|\epsilon\|_{L^2} \equiv \sup_{n, i \in D_n}(E\epsilon_{i,n}^2)^{1/2} < \infty$. Suppose $\{Z_{i,n}\}$ is generated from a Lipschitz function of $\epsilon$: $Z_{i,n} = H_{i,n}((\epsilon_{j,n})_{j \in D_n})$, and

$$|H_{i,n}(e) - H_{i,n}(e')| \le \sum_{j \in D_n} w_{ij,n}|e_j - e'_j| \qquad (13)$$

with $w_{ij,n} \ge 0$. Suppose $\psi(s) \equiv \sup_{n, i \in D_n}\sum_{j \in D_n : d_{ij} > s} w_{ij,n}$ satisfies

$$\lim_{s\to\infty}\psi(s) = 0. \qquad (14)$$

Then $Z = \{Z_{i,n}, i \in D_n, n \ge 1\}$ is well-defined, and $L_2$-NED on $\epsilon$ with NED coefficients $\|\epsilon\|_{L^2}\psi(s)$.

However, in contrast to spatial mixing random fields, spatial NED is not necessarily preserved under an arbitrary measurable transformation. As in time series, spatial NED is preserved under Lipschitz transformations, as shown in Propositions 2 and 3 in Jenish and Prucha (2012). But some statistics in a model might not be derived from a Lipschitz transformation, so in some models one has to go beyond the Lipschitz transformation. This complication must be addressed before spatial NED can be used in the asymptotic analysis of estimators for a nonlinear spatial model: the relevant statistics need to be established as spatial NED random fields. In the following, we will discuss a SAR Tobit model and a SAR binary choice model to see how the NED property can be derived from the structure of a model and employed to obtain the asymptotic distribution of estimators for those nonlinear spatial models. In some estimation and testing problems, the sample observations are instead assumed to form a spatial NED random field generated by an unknown underlying model, and the NED random field is regarded as the basic statistical process for analysis. An example in the econometric literature is Jenish (2012), which estimates the nonparametric regression function $g(x) = E(y_{i,n}|x_{i,n} = x)$ for $x \in \mathbb{R}^p$. Another example is the test of equality of two log-likelihood functions, both of which might be misspecified while the true model is unknown (Liu and Lee, 2017). The interest is to test whether the two misspecified models provide similar approximations to the unknown model or one is closer to the true model than the other (Vuong, 1989).
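To make the Lipschitz condition of Proposition 4 concrete, the following minimal sketch computes the coefficient $\psi(s)$ of Eq. (14) for a linear SAR model, assuming that $Y_n = (I_n - \lambda W_n)^{-1}(X_n\beta + \epsilon_n)$, so that the Lipschitz weights can be taken as the absolute entries of $(I_n - \lambda W_n)^{-1}$. The line layout, the nearest-neighbour weights, and the helper names (`chain_weights`, `neumann_inverse`, `ned_coefficient`) are illustrative assumptions and do not come from the paper.

```python
def chain_weights(n):
    """Row-normalised nearest-neighbour weights for n units on a line (assumed layout)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in neighbours:
            W[i][j] = 1.0 / len(neighbours)
    return W


def matmul(A, B):
    """Plain dense matrix product for small square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def neumann_inverse(W, lam, terms=60):
    """Truncated Neumann series sum_{l=0}^{terms-1} lam^l W^l, approximating (I - lam W)^{-1}."""
    n = len(W)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # W^0
    total = [row[:] for row in power]
    for l in range(1, terms):
        power = matmul(power, W)
        for i in range(n):
            for j in range(n):
                total[i][j] += (lam ** l) * power[i][j]
    return total


def ned_coefficient(A, s):
    """psi(s) = max_i sum_{j: |i - j| > s} |a_ij|, using the distance d_ij = |i - j|."""
    n = len(A)
    return max(sum(abs(A[i][j]) for j in range(n) if abs(i - j) > s)
               for i in range(n))


n, lam = 25, 0.5
A = neumann_inverse(chain_weights(n), lam)
psi = [ned_coefficient(A, s) for s in range(12)]
for s, value in enumerate(psi):
    print(s, value)
```

In this assumed setting the entries of $W_n^l$ vanish beyond distance $l$ and each row of $W_n^l$ sums to at most one, so $\psi(s)$ is bounded by $\lambda^{s+1}/(1-\lambda)$, which is the geometric decay that condition (14) requires.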
4.1. MLE of a SAR Tobit model

When the dependent variable $y_{i,n}$ is censored, we need to consider Tobit type models. For example, Donfouet et al. (2012) use a SAR Tobit model to examine community-based health insurance. Since many families in the sample have not bought the insurance, many of the dependent variables are equal to zero. LeSage and Pace (2009) point out that a SAR Tobit model might be useful in modeling origin-destination flows, because usually many elements of a flow matrix equal zero. A SAR Tobit model is of the form in Eq. (11), or in matrix form, $Y_n = \max(0, \lambda_0 W_n Y_n + X_n\beta_0 + \epsilon_n)$. Since Eq. (11) is a system of nonlinear equations, before studying its estimation, we first need to understand whether the system has a solution. If it did not, it would not be a model of interest, as there would be no chance that the observed sample $Y_n$ would be generated from it. Notice that a Lipschitz coefficient of $\max(0, x)$ is 1. Assume that $\zeta \equiv \lambda_m \sup_n \|W_n\|_\infty < 1$, where $\Lambda = [-\lambda_m, \lambda_m]$ is the parameter space of $\lambda$. Then by the contraction mapping theorem, Eq. (11) has a unique solution. Define $F(Y_n) = \max(0, \lambda_0 W_n Y_n + X_n\beta_0 + \epsilon_n)$. The system of nonlinear equations is simply $Y_n = F(Y_n)$, and hence any solution corresponds to a fixed point. The vector-valued mapping $F$ is a contraction mapping because it satisfies the inequality $\|F(Y_1) - F(Y_2)\|_\infty \le |\lambda_0| \cdot \|W_n\|_\infty \cdot \|Y_1 - Y_2\|_\infty$. The contraction mapping is a convenient way to see the existence and uniqueness of a Nash equilibrium. But there are other mathematical methods to investigate such a possibility. For the Tobit model here, another method can follow some results in the mathematical programming literature (see, e.g., Amemiya (1974) for model coherency of a multivariate Tobit model). The condition $\zeta < 1$ will also be useful to obtain finite moments for $y_{i,n}$ when $\{\epsilon_{i,n}\}$ satisfies some moment conditions. So $\zeta < 1$ also provides the stability of the system. When the $\epsilon_{i,n}$'s are iid and follow $N(0, \sigma^2)$, Qu and Lee (2012) derive the log-likelihood function of this model, which is

$$\ln L_n(\theta) = \sum_{i=1}^{n} 1(y_{i,n} = 0)\ln\Phi(z_{i,n}(\theta)) - \frac{1}{2}\ln(2\pi\sigma^2)\sum_{i=1}^{n} 1(y_{i,n} > 0) + \ln|I_{2,n} - \lambda W_{22,n}| - \frac{1}{2}\sum_{i=1}^{n} 1(y_{i,n} > 0)z_{i,n}^2(\theta), \qquad (15)$$

where $z_{i,n}(\theta) \equiv (y_{i,n} - \lambda\sum_{j=1}^{n} w_{ij,n}y_{j,n} - x_{i,n}\beta)/\sigma$, $\Phi(\cdot)$ is the CDF of the standard normal distribution, $W_{22,n}$ is the principal submatrix of $W_n$ corresponding to the strictly positive $y_{i,n}$'s, and $I_{2,n}$ is the identity matrix with the same dimension as $W_{22,n}$. Based on this likelihood function, Xu and Lee (2015b) show that the MLE of parameters in this model is consistent and asymptotically normally distributed using the theory of spatial NED random fields. First, it is important to show that the observed $y_{i,n}$ is a NED random field, because relevant statistics are functions of $y_{i,n}$. Proposition 4 provides a useful tool. Xu and Lee (2015b) show that $\{y_{i,n}\}$ is an $L_2$-NED random field using Proposition 4. They prove that Eq. (13) holds due to $\zeta < 1$. To verify the condition Eq. (14), Xu and Lee (2015b) show that it suffices for at least one of the following two conditions to hold:

Assumption 2. (1) There is a constant distance $d_0$ such that when $d_{ij} > d_0$, $w_{ij,n} = 0$. (2) (i) For every $n$, the number of columns, $w_{\cdot j,n}$, of $W_n$ with $|\lambda_0|\sum_{i=1}^{n}|w_{ij,n}| > \zeta$, is $\le N$, where $N$ is a constant integer that does not depend on $n$; (ii) there exists an $\alpha > d$ and a constant $C_0$ such that $|w_{ij,n}| \le C_0/d(i,j)^{\alpha}$.

Under condition (1), $\{y_{i,n}\}_{i=1}^{n}$ is uniformly and geometrically $L_2$-NED on $\{x_{i,n}, \epsilon_{i,n}\}_{i=1}^{n}$: $\|y_{i,n} - E[y_{i,n}|\mathcal{F}_{i,n}(s)]\|_{L^2} \le C(\zeta^{1/d_0})^s$ for some constant $C$, where $\mathcal{F}_{i,n}(s) \equiv \sigma(\{x_{j,n}, \epsilon_{j,n} : d_{ij} \le s\})$; under condition (2), $\|y_{i,n} - E[y_{i,n}|\mathcal{F}_{i,n}(s)]\|_{L^2} \le C/s^{\alpha-d}$ for some constant $C$.

To establish consistency of the MLE, a key step is to verify uniform convergence of the sample average log-likelihood function to its mean, i.e., $\sup_{\theta \in \Theta}\frac{1}{n}|\ln L_n(\theta) - E\ln L_n(\theta)| \xrightarrow{p} 0$. An important ingredient is to have point-wise convergence first. For that purpose, a tool is the WLLN for NED random fields (Proposition 2). To apply Proposition 2, we must show that $\{z_{i,n}^2(\theta)\}_{i=1}^{n}$, $\{\ln\Phi(z_{i,n}(\theta))\}_{i=1}^{n}$ and $\{1(y_{i,n} > 0)\}_{i=1}^{n}$ are NED random fields, and write $\ln|I_{2,n} - \lambda W_{22,n}|$ in a summation form, where terms in the summation are also NED. Notice that $\frac{d}{dx}\ln\Phi(x) \sim (-x)$ as $x \to -\infty$ and $\lim_{x\to+\infty}\frac{d}{dx}\ln\Phi(x) = 0$. Thus, $|\ln\Phi(x_1) - \ln\Phi(x_2)| \le C_1(|x_1| + |x_2| + 1)|x_1 - x_2|$ for some constant $C_1$. The following lemma, which generalizes the Lipschitz transformation, implies that $\{z_{i,n}^2(\theta)\}_{i=1}^{n}$ and $\{\ln\Phi(z_{i,n}(\theta))\}_{i=1}^{n}$ are also uniformly $L_2$-NED. Lemma 1 is also useful for several terms in the first and second order derivatives of the log-likelihood function, as they satisfy its conditions.

Lemma 1. (Xu and Lee, 2015b) Suppose $G(x)$: Domain$(\subset \mathbb{R}) \to \mathbb{R}$ satisfies $|G(x_1) - G(x_2)| \le C_1(|x_1|^a + |x_2|^a + 1)|x_1 - x_2|$ for some integer $a \ge 1$. If $\{u_{i,n}\}_{i=1}^{n}$ is a random field with $\|u_{i,n} - E[u_{i,n}|\mathcal{F}_{i,n}(s)]\|_2 \le C_2\psi(s)$ for all $i$ and $n$, and $\sup_{i,n}\|u_{i,n}\|_p < \infty$ for some $p > 2a + 2$, then $\|G(u_{i,n}) - E[G(u_{i,n})|\mathcal{F}_{i,n}(s)]\|_2 \le C\psi(s)^{(p-2a-2)/(2p-2a-2)}$.

However, Lemma 1 is not applicable to $\{1(y_{i,n} > 0)\}_{i=1}^{n}$, as $1(y > 0)$ is not a continuous function. Notice that $1(y_{i,n} > 0) \equiv 1(y_{i,n}^* > 0)$, where $y_{i,n}^* = \lambda_0\sum_{j=1}^{n} w_{ij,n}y_{j,n} + x_{i,n}'\beta_0 + \epsilon_{i,n}$. Xu and Lee (2015b) prove the following result:

Proposition 5. If $\|y_{i,n}^* - E[y_{i,n}^*|\mathcal{F}_{i,n}(s)]\|_{L^2} \le \psi(s)$ and $\sup_{i,n}\sup_t f_{i,n}(t) < \infty$, where $f_{i,n}(\cdot)$ is the probability density function (PDF) of $y_{i,n}^*$, then $\|1(y_{i,n}^* > 0) - E[1(y_{i,n}^* > 0)|\mathcal{F}_{i,n}(s)]\|_{L^2} \le C\psi(s)^{1/3}$ for some constant $C$.

Xu and Lee (2015b) verify that $\sup_{i,n}\sup_t f_{i,n}(t) < \infty$ with the normal distribution of disturbances. In an extension, Xu and Lee (2018) show by a convolution argument that it holds in general without normality. Hence, under Assumption 2, $\{1(y_{i,n} > 0)\}_{i=1}^{n}$ is also a uniformly $L_2$-NED random field.

The remaining challenge is to analyze $\ln|I_{2,n} - \lambda W_{22,n}|$, which is stochastic rather than deterministic. It needs to be expressed as a summation over observations so that the WLLN and CLT can be applied to investigate its stochastic convergence. Denote $G_n(Y_n) \equiv \mathrm{diag}(1(y_{1,n} > 0), \ldots, 1(y_{n,n} > 0))$ and $\tilde{W}_n = G_n(Y_n)W_nG_n(Y_n)$. By Taylor's formula,

$$\ln|I_{2,n} - \lambda W_{22,n}| = -\sum_{l=1}^{\infty}\frac{\lambda^l}{l}\mathrm{tr}(W_{22,n}^l) = -\sum_{l=1}^{\infty}\frac{\lambda^l}{l}\mathrm{tr}\left[G_n(Y_n)W_nG_n(Y_n)\right]^l = -\sum_{i=1}^{n}\sum_{l=1}^{\infty}\frac{\lambda^l}{l}\left\{\left[G_n(Y_n)W_nG_n(Y_n)\right]^l\right\}_{ii}. \qquad (16)$$

Notice that

$$\left\{\left[G_n(Y_n)W_nG_n(Y_n)\right]^l\right\}_{ii} = \sum_{j_1}\cdots\sum_{j_{l-1}} w_{ij_1,n}w_{j_1j_2,n}\cdots w_{j_{l-1}i,n}\,1(y_{i,n} > 0, y_{j_1,n} > 0, \ldots, y_{j_{l-1},n} > 0).$$
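The fixed-point argument for the SAR Tobit system and the series expansion of the log-determinant in Eq. (16) can both be checked numerically on a small example. The sketch below is an illustration under assumed inputs, not the authors' code: the ring-shaped weights matrix, the value $\lambda = 0.4$, the zero regression part, and all function names are hypothetical choices made only to satisfy $\zeta < 1$.

```python
import math
import random


def ring_weights(n):
    """Row-normalised weights on a ring: each unit is linked to its two neighbours."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i - 1) % n] = 0.5
        W[i][(i + 1) % n] = 0.5
    return W


def solve_sar_tobit(W, lam, xb, eps, tol=1e-12, max_iter=10000):
    """Iterate Y <- max(0, lam*W*Y + X*beta + eps) until the sup-norm change is below tol."""
    n = len(xb)
    y = [0.0] * n
    for _ in range(max_iter):
        y_new = [max(0.0, lam * sum(W[i][j] * y[j] for j in range(n)) + xb[i] + eps[i])
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(y_new, y)) < tol:
            return y_new
        y = y_new
    raise RuntimeError("fixed-point iteration did not converge")


def log_abs_det(M):
    """log|det M| by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return float("-inf")
        A[c], A[p] = A[p], A[c]
        total += math.log(abs(A[c][c]))
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return total


def log_det_series(W22, lam, terms=200):
    """-sum_{l=1}^{terms} lam^l tr(W22^l) / l, a truncated version of the expansion in Eq. (16)."""
    m = len(W22)
    power = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    total = 0.0
    for l in range(1, terms + 1):
        power = [[sum(power[i][k] * W22[k][j] for k in range(m)) for j in range(m)]
                 for i in range(m)]
        total -= lam ** l * sum(power[i][i] for i in range(m)) / l
    return total


rng = random.Random(0)
n, lam = 12, 0.4                      # zeta = lam * ||W||_inf = 0.4 < 1
W = ring_weights(n)
xb = [0.0] * n                        # regression part set to zero for simplicity
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = solve_sar_tobit(W, lam, xb, eps)

positive = [i for i in range(n) if y[i] > 0]          # uncensored units
W22 = [[W[i][j] for j in positive] for i in positive]
I_minus = [[(1.0 if a == b else 0.0) - lam * W22[a][b] for b in range(len(positive))]
           for a in range(len(positive))]
exact = log_abs_det(I_minus)
approx = log_det_series(W22, lam)
print(exact, approx)
```

Because the map is a contraction with modulus at most $\zeta$, the iteration converges from any starting vector; the comparison of `exact` and `approx` illustrates why truncating the series at a large $K$ loses little.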
Denote $j_0 \equiv i$ for convenience. Under Assumption 2(1), for any given positive integer $m$,

$$\begin{aligned}
&\left\|\left\{\left[\lambda G_n(Y_n)W_nG_n(Y_n)\right]^l\right\}_{ii} - E\left[\left\{\left[\lambda G_n(Y_n)W_nG_n(Y_n)\right]^l\right\}_{ii} \,\middle|\, \mathcal{F}_{i,n}(md_0)\right]\right\|_{L^2} \\
&\quad\le |\lambda^l|\sum_{j_1}\cdots\sum_{j_{l-1}} w_{ij_1,n}w_{j_1j_2,n}\cdots w_{j_{l-1}i,n}\left\|\prod_{k=0}^{l-1}1(y_{j_k} > 0) - E\left[\prod_{k=0}^{l-1}1(y_{j_k} > 0) \,\middle|\, \mathcal{F}_{i,n}(md_0)\right]\right\|_{L^2} \\
&\quad\le |\lambda^l|\sum_{j_1}\cdots\sum_{j_{l-1}} w_{ij_1,n}\cdots w_{j_{l-1}i,n}\left\|\prod_{k=0}^{l-1}1(y_{j_k} > 0) - \prod_{k=0}^{l-1}E\left[1(y_{j_k} > 0) \,\middle|\, \mathcal{F}_{i,n}(md_0)\right]\right\|_{L^2} \\
&\quad\le |\lambda^l|\sum_{j_1}\cdots\sum_{j_{l-1}} w_{ij_1,n}\cdots w_{j_{l-1}i,n}\left\|\sum_{k=0}^{l-1}\left|1(y_{j_k} > 0) - E\left[1(y_{j_k} > 0) \,\middle|\, \mathcal{F}_{i,n}(md_0)\right]\right|\right\|_{L^2} \\
&\quad\le |\lambda^l|\sum_{j_1}\cdots\sum_{j_{l-1}} w_{ij_1,n}\cdots w_{j_{l-1}i,n}\sum_{k=0}^{l-1}\left\|1(y_{j_k} > 0) - E\left[1(y_{j_k} > 0) \,\middle|\, \mathcal{F}_{i,n}(md_0)\right]\right\|_{L^2},
\end{aligned} \qquad (17)$$

where the first and last inequalities are by Minkowski's inequality, the second inequality holds because the conditional expectation is the best predictor under the $L_2$ norm, and the third inequality is by the following inequality: $|\prod_{k=0}^{l-1}a_k - \prod_{k=0}^{l-1}b_k| \le \sum_{k=0}^{l-1}|a_k - b_k|$ when all $a_k$'s and $b_k$'s are within $[-1, 1]$. As $\{1(y_{j_k} > 0)\}$ is uniformly and geometrically $L_2$-NED by Proposition 5, Xu and Lee (2015b) prove that $\{[\lambda G_n(Y_n)W_nG_n(Y_n)]^l\}_{ii}$ is also uniformly and geometrically $L_2$-NED, and the NED coefficient does not depend on $l$. Since $|\{[\lambda G_n(Y_n)W_nG_n(Y_n)]^l\}_{ii}| \le \lambda_m^{-1}\zeta^l$, it is negligible when $l$ is large. Thus, $\sum_{l=1}^{\infty}\frac{\lambda^l}{l}\{[G_n(Y_n)W_nG_n(Y_n)]^l\}_{ii} \approx \sum_{l=1}^{K}\frac{\lambda^l}{l}\{[G_n(Y_n)W_nG_n(Y_n)]^l\}_{ii}$ for a large $K$, and the convergence in probability of the latter is then a direct result of the WLLN of NED random fields. Under Assumption 2(2), the calculation is more complicated, but the idea is similar. It is not difficult to show that $\{\frac{1}{n}\ln L_n(\theta)\}$ is stochastically equicontinuous. Then the uniform convergence in probability of $\frac{1}{n}\ln L_n(\theta)$, i.e., $\sup_{\theta \in \Theta}\frac{1}{n}|\ln L_n(\theta) - E\ln L_n(\theta)| \xrightarrow{p} 0$, follows by Theorem 1 in Andrews (1992). These ingredients are sufficient to guarantee consistency of the MLE (see, e.g., Amemiya, 1985).

As for the asymptotic distribution of the MLE, we need to show that the score of the log-likelihood function follows the CLT of a spatial NED process. By Taylor's formula, $\frac{d}{d\lambda}\ln|I_{2,n} - \lambda W_{22,n}| = -\mathrm{tr}[(I_{2,n} - \lambda W_{22,n})^{-1}W_{22,n}] = -\mathrm{tr}[(I_n - \lambda\tilde{W}_n)^{-1}\tilde{W}_n] = \sum_{i=1}^{n} r_{i,n}(\lambda)$, where $r_{i,n}(\lambda) \equiv -\sum_{l=0}^{\infty}\lambda^l(\tilde{W}_n^{l+1})_{ii}$. Thus, $\partial\ln L_n(\theta)/\partial\theta = \sum_{i=1}^{n} q_{i,n}(\theta)$, where

$$q_{i,n}(\theta) = \begin{pmatrix}
-1(y_{i,n} = 0)\dfrac{\phi(z_{i,n}(\theta))w_{i\cdot,n}Y_n}{\sigma\Phi(z_{i,n}(\theta))} + 1(y_{i,n} > 0)\dfrac{z_{i,n}(\theta)w_{i\cdot,n}Y_n}{\sigma} + r_{i,n}(\lambda) \\[2ex]
-1(y_{i,n} = 0)\dfrac{\phi(z_{i,n}(\theta))x_{i,n}'}{\sigma\Phi(z_{i,n}(\theta))} + 1(y_{i,n} > 0)\dfrac{z_{i,n}(\theta)x_{i,n}'}{\sigma} \\[2ex]
1(y_{i,n} = 0)\dfrac{\phi(z_{i,n}(\theta))(\lambda w_{i\cdot,n}Y_n + x_{i,n}\beta)}{\Phi(z_{i,n}(\theta))\sigma^2} - 1(y_{i,n} > 0)\dfrac{1}{\sigma} + 1(y_{i,n} > 0)\dfrac{z_{i,n}^2(\theta)}{\sigma}
\end{pmatrix}, \qquad (18)$$

where $w_{i\cdot,n}$ is the $i$th row of $W_n$. Next, we can verify that $\{q_{i,n}(\theta_0)\}$ satisfies the conditions of the CLT of $L_2$-NED random fields under some proper regularity conditions. The asymptotic distribution of the MLE can then follow from the conventional Taylor expansion of the first order conditions (see, e.g., Amemiya, 1985).

4.2. Sieve MLE of a SAR Tobit model

For the SAR Tobit model studied in the previous section, simulations show that the MLE works well when the disturbance terms are iid normally distributed. But the estimator can be sensitive to distributional misspecification. The MLE might have some biases, which might be relatively small for unimodally distributed disturbances but can be substantial for bimodal distributions. Xu and Lee (2018) relax the normal distributional assumption to the case where the $\epsilon_{i,N}$'s$^3$ are iid but with a distribution of unknown form. Given a strictly increasing CDF $G(\cdot|\theta_g)$, where $\theta_g$ can be some location or scale parameters, any CDF $F(\cdot)$ can be written as $F(x) = H(G(x|\theta_g))$, where $H(\cdot)$ is a CDF on $[0, 1]$. For example, $G(x|\theta_g)$ can be the CDF of the logistic distribution, $G(x|\sigma) = \frac{1}{e^{-x/\sigma} + 1}$, or of the normal distribution, $G(x|\mu,\sigma) = \Phi(\frac{x-\mu}{\sigma})$. Then approximating $F(\cdot)$ is equivalent to approximating $H(\cdot)$. Denote $h(u) \equiv H'(u)$. Since $\int_0^1[\sqrt{h(u)}]^2du = 1$, $\sqrt{h(u)} \in L^2[0,1]$. Consequently, $\sqrt{h(u)}$ can be approximated by some basis functions of the Hilbert space $L^2[0,1]$. Following Bierens (2014), Xu and Lee (2018) use cosine functions as basis functions, as they provide closed forms for both $h(u)$ and $H(u)$:

$$h(u|\delta) = (1 - \epsilon_0)\left(\delta_0 + \sum_{k=1}^{\infty}\delta_k\sqrt{2}\cos k\pi u\right)^2 + \epsilon_0, \qquad (19)$$

$$H(u|\delta) = \int_0^u h(v|\delta)dv = u + (1 - \epsilon_0)\left\{2\sqrt{2}\delta_0\sum_{k=1}^{\infty}\delta_k\frac{\sin(k\pi u)}{k\pi} + \sum_{k=1}^{\infty}\delta_k^2\frac{\sin(2k\pi u)}{2k\pi} + 2\sum_{k=2}^{\infty}\sum_{m=1}^{k-1}\delta_k\delta_m\left[\frac{\sin((k+m)\pi u)}{(k+m)\pi} + \frac{\sin((k-m)\pi u)}{(k-m)\pi}\right]\right\}, \qquad (20)$$

where $\sum_{k=0}^{\infty}\delta_k^2 = 1$. To make theoretical analysis convenient, a very small constant $\epsilon_0$ is added to $h(u)$ so that $h(u)$ is bounded away from zero. As $\epsilon_0$ is so small, it is negligible in practical computation. When the sample size is $N$, the functions $\{1, \sqrt{2}\cos k\pi u, k = 1, 2, \ldots, K\}$, where $\lim_{N\to\infty}K = \infty$, are used to approximate $h(u)$, i.e., $\delta_k = 0$ for all $k > K$ in Eq. (19). The transformation $G(u|\theta_g)$ serves two purposes: it avoids some technical issues, and it is the leading term of the expansion, so the expansion targets $G(u|\theta_g)$ as a prior distribution. Then we can write down the log-likelihood function, similar to Eq. (15), and obtain the sieve MLE.

3 Following Xu and Lee (2018), we use N to denote the sample size in this sub-section.
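The closed form in Eq. (20) can be checked against direct numerical integration of the sieve density in Eq. (19). The following sketch does this for a truncated series with $K = 3$; the coefficient values, the choice $\epsilon_0 = 10^{-6}$, and the function names are illustrative assumptions rather than values used by Xu and Lee (2018).

```python
import math


def normalise(delta):
    """Rescale the coefficients so that sum_k delta_k^2 = 1, as required below Eq. (20)."""
    norm = math.sqrt(sum(d * d for d in delta))
    return [d / norm for d in delta]


def sieve_density(u, delta, eps0=1e-6):
    """h(u | delta) in Eq. (19), truncated at K = len(delta) - 1."""
    root = delta[0] + sum(delta[k] * math.sqrt(2.0) * math.cos(k * math.pi * u)
                          for k in range(1, len(delta)))
    return (1.0 - eps0) * root * root + eps0


def sieve_cdf(u, delta, eps0=1e-6):
    """Closed-form H(u | delta) in Eq. (20), truncated at K = len(delta) - 1."""
    K = len(delta) - 1
    pi = math.pi
    cross0 = 2.0 * math.sqrt(2.0) * delta[0] * sum(
        delta[k] * math.sin(k * pi * u) / (k * pi) for k in range(1, K + 1))
    squares = sum(delta[k] ** 2 * math.sin(2 * k * pi * u) / (2 * k * pi)
                  for k in range(1, K + 1))
    cross = 2.0 * sum(
        delta[k] * delta[m] * (math.sin((k + m) * pi * u) / ((k + m) * pi)
                               + math.sin((k - m) * pi * u) / ((k - m) * pi))
        for k in range(2, K + 1) for m in range(1, k))
    return u + (1.0 - eps0) * (cross0 + squares + cross)


def trapezoid(f, a, b, steps=20000):
    """Composite trapezoid rule, used only as an independent numerical check."""
    width = (b - a) / steps
    inner = sum(f(a + i * width) for i in range(1, steps))
    return width * (0.5 * f(a) + inner + 0.5 * f(b))


delta = normalise([1.0, 0.5, -0.3, 0.2])   # hypothetical sieve coefficients, K = 3
u0 = 0.37
closed_form = sieve_cdf(u0, delta)
numeric = trapezoid(lambda v: sieve_density(v, delta), 0.0, u0)
print(closed_form, numeric)
```

The normalisation step matters: the leading term $u$ in Eq. (20) arises only because the squared coefficients sum to one, which is also what makes $H(1|\delta) = 1$.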
For a sieve MLE to be consistent, the series approximation needs to get closer to the true unknown distribution as the sample size $N$ increases. So the number of parameters will increase as $N$ increases, even though the number of included parameters cannot be too large relative to $N$. Technically, for uniform convergence of the sample average of the log-likelihood function with increasing dimensional parameter spaces, exponential inequalities are usually needed (White and Wooldridge, 1991; Fenton and Gallant, 1996; Chen and Shen, 1998). For independent variables, time series and random fields on $\mathbb{Z}^d$, there are various exponential inequalities (see, e.g., Saulis and Statulevicius, 1991; White and Wooldridge, 1991; Doukhan, 1994; Doukhan and Louhichi, 1999; Merlevède et al., 2011). Delyon (2009) establishes an exponential inequality for mixing random fields on $\mathbb{R}^d$. To accommodate spatial NED random fields, some general exponential inequalities for weakly dependent random fields are proved in Xu and Lee (2018) (Propositions 6 and 7 below). With these exponential inequalities, the consistency of the sieve estimator can then be established.

Proposition 6. (Exponential inequality for bounded random fields, Xu and Lee, 2018) Under Assumption 1, let $\epsilon_N = \{\epsilon_{i,N}, i \in D_N\}$ be an $\alpha$-mixing random field with $\alpha$-mixing coefficients $\alpha(u,v,s) \le (u+v)^{\tau}\exp(-a_\epsilon s^{t_\epsilon})$ for some $\tau > 0$, $a_\epsilon > 0$ and $t_\epsilon \in (0,1]$. Let $\{X_{i,N} : i \in D_N\}$ be an $L_2$-NED random field on $\epsilon_N$ such that $\sup_i\|X_{i,N}\|_\infty \le M < \infty$, $EX_{i,N} = 0$ for all $i$ and $N$, and $\|X_{i,N} - E[X_{i,N}|\mathcal{F}_{i,N}(s)]\|_{L^2} \le C_x\exp(-a_x s^{t_x})$ for some $C_x > 0$, $a_x > 0$ and $t_x \in (0,1]$. Denote $C_{\tau Mx} \equiv \sup_{q \in \mathbb{N}}(\frac{qC_x}{M} + 4q^{\tau})e^{-q}$, $a_{x\epsilon} \equiv \min(a_x3^{-t_x}, a_\epsilon3^{-t_\epsilon})$ and $t = \min(t_\epsilon, t_x)$. Then for some constant $C_a$ that depends only on $d$, $a_{x\epsilon}$ and $t$,

$$\Pr(|S_N| \ge N\epsilon) \le \frac{\exp(5 + 3d/t)}{4\sqrt{\pi}} \times \exp\left\{-\left[\frac{N\epsilon^2t^{2d}(1 + d/t)^{2+2d/t}(eMC_da_{x\epsilon}^{-d/t}d^d)^{-2}}{16\max[1, C_a + \frac{1}{t}C_d^{-1}C_{\tau Mx}a_{x\epsilon}^{(d-1)/t}\exp(2a_{x\epsilon})]}\right]^{1/(2+2d/t)}\right\}.$$

Proposition 7. (Exponential inequality for unbounded random fields, Xu and Lee, 2018) Under Assumption 1, let $\epsilon_N = \{\epsilon_{i,N}, i \in D_N\}$ be an $\alpha$-mixing random field with $\alpha$-mixing coefficients $\alpha(u,v,s) \le (u+v)^{\tau}\exp(-a_\epsilon s^{t_\epsilon})$ for some constants $\tau$, $a_\epsilon > 0$ and $t_\epsilon \in (0,1]$. Let $\{X_{i,N} : i \in D_N\}$ be an $L_2$-NED random field on $\epsilon_N$ satisfying (1) $EX_{i,N} \equiv 0$, (2) $C_{EB} = \sup_{i,N}E\exp(\gamma|X_{i,N}|^{\alpha}) < \infty$ for some $\alpha > 0$ and $\gamma > 0$, and (3) $\|X_{i,N} - E[X_{i,N}|\mathcal{F}_{i,N}(s)]\|_{L^2} \le C_x\exp(-a_x s^{t_x})$ for some constants $C_x$, $a_x > 0$ and $t_x \in (0,1]$. Denote $t = \min(t_\epsilon, t_x)$ and $a_{x\epsilon} \equiv \min(a_x3^{-t_x}, a_\epsilon3^{-t_\epsilon})$. Then, for some constant $C_a > 0$ that depends only on $d$, $a_{x\epsilon}$ and $t$, and some constant $C_{\tau x} > 1$ that depends only on $\tau$ and $C_x$,

$$\Pr(|S_N| \ge N\epsilon) \le \left[\frac{4C_{EB}^{1/q}}{\epsilon} + \frac{\exp(5 + 3d/t)}{4\sqrt{\pi}}\right] \cdot \exp\left\{-\left[\frac{N\epsilon^2t^{2d}(1 + d/t)^{2+2d/t}(eC_da_{x\epsilon}^{-d/t}d^d)^{-2}(\gamma/q)^{2/\alpha}}{16\max[1, C_a + \frac{1}{t}C_d^{-1}C_{\tau x}a_{x\epsilon}^{(d-1)/t}\exp(2a_{x\epsilon})]}\right]^{\alpha/[(2d+2)\alpha+2]}\right\},$$

where $p > 0$ and $q > 0$ satisfy $p^{-1} + q^{-1} = 1$.

Xu and Lee (2018) follow an idea in Bierens (2014) to establish the asymptotic normality of $\hat\lambda$ and $\hat\beta$. First, combine the first order conditions of the log-likelihood function maximization into one random function. Second, derive the asymptotic normality of the random function by a functional central limit theorem. Last, extract the asymptotic distribution of the structural parameters by projection. It turns out that we can estimate the standard errors of $\hat\lambda$ and $\hat\beta$ as if the model were parametric by working with the information matrix, even though the whole parameter space is increasing in dimension.

Simulations in Xu and Lee (2018) show that the above sieve MLE has a satisfactory finite sample performance even under a bimodal distribution of disturbances. In the simulations, the disturbance terms follow a mixed normal distribution (half probability $N(8/\sqrt{17}, 4/17)$, half probability $N(-8/\sqrt{17}, 4/17)$), whose PDF has two peaks. Biases of parametric MLEs based on the normal distribution are large. But for the sieve MLE, no matter whether the number of sieves is fixed or determined by AIC or BIC, and no matter whether we use the logistic CDF or the normal CDF as the transformation function, both the biases and the standard errors of the sieve MLEs are much smaller than those of the parametric MLEs. Another natural question is how much efficiency the sieve MLE loses relative to the normal parametric MLE when the disturbance terms are truly normally distributed. It turns out that the loss of efficiency of the sieve MLE is small, especially when the number of sieves is determined by BIC. As the sample size increases from 200 to 1000, the probability of choosing the true model by AIC increases from 68.8% to 75.2%. And the probability is as high as 98%–99.2% when we use BIC.

4.3. SAR binary choice model

The SAR binary choice model in Eq. (12) can be regarded as a model for the Nash equilibrium of a binary choice game. Eq. (12) is a nonlinear equation system, which involves multivariate probabilities of discrete decisions in estimation. Thus, for computationally tractable estimation, one has to employ simulation approaches (McFadden, 1989). Before estimating this model, we need to consider whether its solution, i.e., a pure strategy Nash equilibrium, exists and is unique. In fact, without more conditions, there might be no solutions for Eq. (12) (see Jia (2008) for an example). That is why Xu and Lee (2016) assume that the network peer effect is non-negative, i.e., $\lambda \ge 0$. Then such a game is strategically complementary and is called a supermodular game (Milgrom and Roberts, 1990). From Milgrom and Roberts (1990), by Tarski's Fixed Point Theorem, for any complete information static supermodular game, there must be at least one pure strategy Nash equilibrium, but there can also be multiple equilibria in such a game (Tamer, 2003). The presence of multiple equilibria raises an issue of equilibrium selection. In the econometric literature on games with multiple equilibria, one approach is to specify an equilibrium selection rule such that the observed sample corresponds to the selected equilibrium (see, e.g., Bajari et al., 2010); alternatively, one might use set estimation methods to estimate parameters, as point identification might not be possible and a likelihood function might not be constructed (see, e.g., Chernozhukov et al., 2007). For our model, the set of all Nash equilibria is a complete sublattice in $\{0,1\}^n$. Any complete sublattice contains a supremum and an infimum. In addition, the supremum (the maximum Nash equilibrium) is both Pareto optimal and a strongly coalition-proof equilibrium (SCPE) (Milgrom and Roberts, 1995).

In a Nash equilibrium, no individual will deviate from his strategy given all the remaining players' strategies; in an SCPE, no subset of players will deviate from their strategies simultaneously given the remaining players' strategies. Thus, SCPE is a stronger equilibrium concept than Nash equilibrium. When players can discuss their strategies freely, some of them might deviate from their strategies in a Nash equilibrium simultaneously to obtain mutual benefit. In this case, Nash equilibrium is not a sufficiently strong concept. Because of the above characteristics and the uniqueness of the maximum Nash equilibrium (Milgrom and Roberts, 1995) of model (12), Xu and Lee (2016) assume that it is selected in the model.

Denote the SCPE solution of $y_{i,n} = 1(\lambda\sum_{j=1}^{n}w_{ij,n}y_{j,n} + x_{i,n}'\beta + \epsilon_{i,n} > 0)$ by $y_{i,n}(\theta)$, where $\theta = (\lambda, \beta')'$. Thus, $y_{i,n} = y_{i,n}(\theta_0)$. Suppose that $\epsilon_{i,n} \sim$ i.i.d. $N(0,1)$. Even when $n$ is not large, the likelihood function of this model is difficult to simulate. Thus, Xu and Lee (2016) consider a simulated GMM to estimate this model. Various moment equations can be constructed from the conditional moment equation $E[\Pr(y_{i,n} = 1|X_n, \theta_0) - y_{i,n}|X_n] = 0$, e.g., $E\{[\Pr(y_{i,n} = 1|X_n, \theta_0) - y_{i,n}]x_{i,n}\} = 0$ and $E\{[\Pr(y_{i,n} = 1|X_n, \theta_0) - y_{i,n}](w_{i\cdot,n}X_n)'\} = 0$. Denote $q_{i,n} = (x_{i,n}', w_{i\cdot,n}X_n)'$ and the moment $g_{i,n}(\theta) = [\Pr(y_{i,n} = 1|X_n, \theta) - y_{i,n}]q_{i,n}$. The probability $\Pr(y_{i,n} = 1|X_n, \theta)$ is hard to calculate analytically, but can be evaluated by stochastic simulation. The algorithm is as follows: (1) Generate $R$ random draws $\epsilon_n^{(r)} \sim N(0, I_n)$. (2)
Calculate $y_{i,n}(\epsilon_n^{(r)}, X_n, \theta)$. This step can be calculated using the iteration $Y_n^{(t+1)} = 1(\lambda W_nY_n^{(t)} + X_n\beta + \epsilon_n^{(r)} > 0)$, where $Y_n^{(0)} = (1, \ldots, 1)'$, the $n \times 1$ vector of ones (Jia, 2008). The initial value $Y_n^{(0)} = (1, \ldots, 1)'$ is important, as it guarantees that the iteration will converge to an SCPE. Usually it takes fewer than 10 iterations in this step, and thus it is very fast. (3) $\widehat{\Pr}(y_{i,n} = 1|X_n, \theta) = \frac{1}{R}\sum_{r=1}^{R}y_{i,n}(\epsilon_n^{(r)}, X_n, \theta)$ provides a simulated frequency estimate of the choice probability of interest. Then, a simulated moment can be based on

$$\hat{g}_{i,n}(\theta) = [\widehat{\Pr}(y_{i,n} = 1|X_n, \theta) - y_{i,n}]q_{i,n}.$$

By selecting a GMM weighting matrix $\Omega_n(\theta)$, we can obtain a simulated GMM estimator $\hat\theta_n$:

$$\hat\theta_n = \arg\min_{\theta \in \Theta}\left[\frac{1}{n}\sum_{i=1}^{n}\hat{g}_{i,n}(\theta)\right]'\Omega_n(\theta)\left[\frac{1}{n}\sum_{i=1}^{n}\hat{g}_{i,n}(\theta)\right].$$

matrix. Spatial links are network links. In order for the resulting spatial process across space to be stable, the underlying spatial network must be sparse or weakly linked when units are far apart. For linear spatial econometric models, properties of linear-quadratic forms play an important role in deriving large sample properties of estimators and test statistics. For the estimation of a nonlinear spatial model, mixing random fields and NED random fields are essential statistical tools to derive asymptotic properties, such as consistency and asymptotic distributions of parameter estimators. We study how NED properties of relevant statistics in a Tobit model and a discrete choice model are established, and how the WLLN and CLT of spatial NED random fields can be utilized in investigating asymptotic properties of estimators of these models.
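Steps (1)–(3) of the simulation algorithm in Section 4.3 can be sketched as follows. This is a minimal illustration under assumed inputs (a ring-shaped weights matrix, arbitrary values for $X_n\beta$, and hypothetical function names), not the implementation of Xu and Lee (2016); it only shows the iteration from the vector of ones and the frequency estimate of the choice probabilities.

```python
import random


def ring_weights(n):
    """Row-normalised, non-negative weights on a ring (assumed network for illustration)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i - 1) % n] = 0.5
        W[i][(i + 1) % n] = 0.5
    return W


def max_equilibrium(W, lam, xb, eps, max_iter=1000):
    """Iterate Y <- 1(lam*W*Y + X*beta + eps > 0) starting from the vector of ones.

    With lam >= 0 and non-negative weights the iterates can only decrease,
    so the loop stops at a fixed point after at most n updates.
    """
    n = len(xb)
    y = [1] * n
    for _ in range(max_iter):
        y_next = [1 if lam * sum(W[i][j] * y[j] for j in range(n)) + xb[i] + eps[i] > 0 else 0
                  for i in range(n)]
        if y_next == y:
            return y
        y = y_next
    raise RuntimeError("iteration did not converge")


def simulated_choice_prob(W, lam, xb, R, seed=0):
    """Frequency estimate (1/R) * sum_r y_i(eps^(r)) of Pr(y_i = 1 | X, theta)."""
    rng = random.Random(seed)
    n = len(xb)
    counts = [0] * n
    for _ in range(R):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]   # step (1): draw disturbances
        y = max_equilibrium(W, lam, xb, eps)            # step (2): solve for the equilibrium
        for i in range(n):
            counts[i] += y[i]
    return [c / R for c in counts]                      # step (3): average over draws


n = 10
W = ring_weights(n)
xb = [-0.4 + 0.1 * (i % 5) for i in range(n)]           # hypothetical values of x_i' beta
p0 = simulated_choice_prob(W, 0.0, xb, R=500)           # no peer effect
p1 = simulated_choice_prob(W, 0.6, xb, R=500)           # positive peer effect, same draws
print(p0)
print(p1)
```

Using the same seed for both calls keeps the simulated draws fixed across parameter values, which is the usual device for making a simulated objective function well behaved in $\theta$; with non-negative weights, the simulated probabilities are then weakly increasing in $\lambda$.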
References
Amemiya, T., 1974. Multivariate regression and simultaneous equation models when the dependent variables are truncated normal. Econometrica 42, 999–1012.
Amemiya, T., 1985. Advanced Econometrics. Harvard University Press, Cambridge, Mass.
Andrews, D.W., 1984. Non-strong mixing autoregressive processes. J. Appl. Probab. 21, 930–934.
Andrews, D.W., 1992. Generic uniform convergence. Econom. Theor. 8, 241–257.
Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer Academic Publishers, Boston.
Anselin, L., Bera, A., 1998. Spatial dependence in linear regression models with an introduction to spatial econometrics. In: Ullah, A., Giles, D. (Eds.), Handbook of Applied Economic Statistics. Marcel Dekker, New York.
Bajari, P., Hong, H., Ryan, S.P., 2010. Identification and estimation of a discrete game of complete information. Econometrica 78, 1529–1568.
Baltagi, B.H., LeSage, J., Pace, R., 2016. Spatial Econometrics: Qualitative and Limited Dependent Variables. Advances in Econometrics 37. Emerald Publishing Limited, Bingley, United Kingdom.
Baltagi, B.H., Liu, L., 2008. Testing for random effects and spatial lag dependence in panel data models. Stat. Probab. Lett. 78, 3304–3306.
Baltagi, B.H., Song, S.H., Jung, B.C., Koh, W., 2007. Testing for serial correlation, spatial autocorrelation and random effects using panel data. J. Econom. 140, 5–51.
Bierens, H.J., 2014. Consistency and asymptotic normality of sieve ML estimators under low-level conditions. Econom. Theor. 30, 1021–1076.
Bradley, R., 2007. Introduction to Strong Mixing Conditions, vol. 3. Heber City, UT.
Brock, W.A., Durlauf, S.N., 2001. Discrete choice with social interactions. Rev. Econ. Stud. 68, 235–260.
Chen, X., Shen, X., 1998. Sieve extremum estimates for weakly dependent data. Econometrica 66, 289–314.
Chernozhukov, V., Hong, H., Tamer, E., 2007. Estimation and confidence regions for parameter sets in econometric models. Econometrica 75, 1243–1284.
Cliff, A., Ord, J., 1973. Spatial Autocorrelation. Pion Ltd., London.
Delyon, B., 2009. Exponential inequalities for sums of weakly dependent variables. Electron. J. Probab. 14, 752–779.
Davidson, J., 1994. Stochastic Limit Theory: an Introduction for Econometricians. Oxford University Press.
Donfouet, H.P.P., Jeanty, P.W., Malin, É., 2012. Accounting for spatial interactions in the demand for community-based health insurance: a Bayesian spatial Tobit analysis. In: 11th International Workshop Spatial Econometrics and Statistics.
Doukhan, P., 1994. Mixing: Properties and Examples. Springer, New York.
Doukhan, P., Louhichi, S., 1999. A new weak dependence condition and applications to moment inequalities. Stoch. Process. Appl. 84, 313–342.
Fenton, V., Gallant, R., 1996. Convergence rates of SNP density estimators. Econometrica 64, 719–727.
Gallant, A.R., White, H., 1988. A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell, New York.
Greene, W., 2011. Spatial Discrete Choice Models. Spatial Econometrics Advanced Institute, University of Rome.
Greene, W., 2013. Nonlinear Models with Spatial Data. Spatial Econometrics Association, Washington D.C.
Huang, K.S., 1983. The family of inverse demand systems. Eur. Econ. Rev. 23, 329–337.
Jenish, N., 2012. Nonparametric spatial regression under near-epoch dependence. J. Econom. 167, 224–239.
Jenish, N., Prucha, I.R., 2009. Central limit theorems and uniform laws of large numbers for arrays of random fields. J. Econom. 150, 86–98.
Jenish, N., Prucha, I.R., 2012. On spatial processes and asymptotic inference under near-epoch dependence. J. Econom. 170, 178–190.
Jia, P., 2008. What happens when Wal-Mart comes to town: an empirical analysis of the discount retailing industry. Econometrica 76, 1263–1316.
Jin, F., Lee, L., 2017a. Outer-product-of-gradients Tests for Spatial Autoregressive Models. Working paper, forthcoming in Regional Science and Urban Economics.
Jin, F., Lee, L., 2017b. GEL Estimation and Tests of Spatial Autoregressive Models. Working paper.
Kelejian, H.H., Prucha, I.R., 1998. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J. Real Estate Fin. Econ. 17, 99–121.
Kelejian, H.H., Prucha, I.R., 1999. A generalized moments estimator for the autoregressive parameter in a spatial model. Int. Econ. Rev. 40, 509–533.
where the parameter space $\Theta$ is assumed to be compact. The consistency of $\hat\theta_n$ requires that the spatial network not be too dense and that the interaction between individuals not be too strong. Because $\Theta$ is compact,
$$\sup_{i,n,\lambda,\beta,x_{i,n}} \Pr\big(-\lambda \|W_n\|_\infty - x_{i,n}\beta \le \epsilon_{i,n} < -x_{i,n}\beta \mid x_{i,n}\big) = \delta$$
for some constant $\delta \in (0, 1)$. As $\delta$ depends on the largest possible $\lambda$, it is a quantity related to the strength of direct interaction.
Let us define some graph terminology before discussing the complexity of the network. A path $j_k \to j_{k-1} \to \cdots \to j_0$ is defined to satisfy two conditions: (1) any two individuals involved are different, and (2) $w_{j_{p-1} j_p, n} \neq 0$ for all $1 \le p \le k$. We call $k$ the length of the path $j_k \to j_{k-1} \to \cdots \to j_0$. For $j \neq i$, define the geodesic distance between $j$ and $i$, denoted $d_{ij}$, to be the length of the shortest path from $j$ to $i$. An equivalent definition is $d_{ij} \equiv \inf\{1 \le k \in \mathbb{N} : (W_n^k)_{ij} \neq 0\}$ when $i \neq j$. For a set $A$, $|A|$ denotes its cardinality. The key assumption for the NED of $\{y_{i,n}(\theta)\}$ is that there is an $m_0 \in \mathbb{N}$ such that $\delta\, l_p < 1$, where
$$l_p \equiv \sup_{m \ge m_0} \sup_{i,n} \big|\{\text{path } j_m \to j_{m-1} \to \cdots \to j_1 \to i : d_{i j_m} = m\}\big|^{1/m}.$$
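The graph quantities just defined can be illustrated numerically. The following Python sketch computes the geodesic distances $d_{ij}$ by breadth-first search on the nonzero pattern of a weight matrix and evaluates a finite-sample analogue of $l_p$ for a single given $W_n$. The function names (`geodesic_counts`, `path_growth_rate`, `ring_weights`) and the ring network are hypothetical choices made for illustration only and are not taken from Xu and Lee (2016). The supremum over $n$ in the definition cannot be computed from one matrix, so the sketch only takes the maximum over $i$ and over the distances $m \ge m_0$ actually observed. It relies on the fact that a path of length $m$ ending at a unit whose geodesic distance is $m$ is a shortest path, so counting shortest paths is sufficient.

```python
from collections import deque

import numpy as np


def geodesic_counts(W, i):
    """Breadth-first search from unit i over the links w_{ab} != 0.

    Returns (dist, npaths): dist[j] is the geodesic distance d_ij
    (-1 if j cannot be reached) and npaths[j] is the number of
    shortest paths j -> ... -> i.
    """
    n = W.shape[0]
    dist = [-1] * n
    npaths = [0] * n
    dist[i], npaths[i] = 0, 1
    queue = deque([i])
    while queue:
        a = queue.popleft()
        # Units b with w_{ab} != 0 are the ones that directly affect a.
        for b in np.flatnonzero(W[a]):
            b = int(b)
            if dist[b] == -1:          # first time b is reached
                dist[b] = dist[a] + 1
                queue.append(b)
            if dist[b] == dist[a] + 1:  # a lies on a shortest path to b
                npaths[b] += npaths[a]
    return dist, npaths


def path_growth_rate(W, m0=1):
    """Finite-sample analogue of l_p for one weight matrix W (m0 >= 1):
    max over i and m >= m0 of (number of geodesic paths of length m
    ending at i) ** (1/m)."""
    n = W.shape[0]
    best = 0.0
    for i in range(n):
        dist, npaths = geodesic_counts(W, i)
        totals = {}
        for j in range(n):
            if dist[j] >= m0:
                totals[dist[j]] = totals.get(dist[j], 0) + npaths[j]
        for m, count in totals.items():
            best = max(best, count ** (1.0 / m))
    return best


def ring_weights(n):
    """Hypothetical example: each unit is linked to its two ring neighbours."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W


if __name__ == "__main__":
    W = ring_weights(6)
    print(geodesic_counts(W, 0))
    print(path_growth_rate(W, m0=1))
```

On the six-unit ring, every unit has exactly two geodesic paths of each length 1, 2, and 3 ending at it, so the analogue equals 2 when $m_0 = 1$. In this toy network the condition $\delta\, l_p < 1$ would therefore call for $\delta < 1/2$.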
With this condition, it can be shown that, given $x_{j,n}$ and $\epsilon_{j,n}$ for all the $j$'s near $i$, the value of $y_{i,n}$ can be determined with a large probability. As a corollary, $y_{i,n}(\theta)$ is NED on $\{x_{i,n}, \epsilon_{i,n}\}$. Using the NED property of $y_{i,n}(\theta)$ together with the empirical process theory for uniform convergence in Pakes and Pollard (1989), Xu and Lee (2016) show that $\hat\theta_n$ is consistent.
Next, we discuss the asymptotic distribution of $\hat\theta_n$. Notice that $\widehat{\Pr}(y_{i,n} = 1 \mid X_n, \theta) = \frac{1}{R}\sum_{r=1}^{R} y_{i,n}(\epsilon_n^{(r)}, X_n, \theta)$ is a frequency function, so it is not a continuous function of $\theta$. As a result, the GMM objective function is discontinuous in $\theta$, and the usual approach of using a Taylor expansion to derive the asymptotic distribution of an estimator does not work for a simulated GMM estimator. Following Pakes and Pollard (1989), a crucial condition is the stochastic equicontinuity of the empirical process of the moment equation: $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} [g_{i,n}(\theta_1) - g_{i,n}(\theta_2)]$. Notice that $\{g_{i,n}(\theta)\}$ is both spatially correlated and heterogeneous. Xu and Lee (2016) establish the stochastic equicontinuity of the empirical process of spatial $L_2$-NED random fields using the method of brackets. Then the asymptotic distribution of $\hat\theta_n$ is established. However, since the asymptotic variance is hard to estimate, Xu and Lee (2016) suggest constructing confidence intervals by bootstrap.

5. Conclusion
This paper reviews some theoretical issues and certain foundations for some spatial econometric models. We discuss the game theoretical background on economic agents’ decisions underlying spatial econometric models. Spatial links between a spatial unit and its physically or economically neighboring units are described by a spatial weight
Qu, X., Lee, L., 2012. LM tests for spatial correlation in spatial models with limited dependent variables. Reg. Sci. Urban Econ. 42, 430–445.
Qu, X., Lee, L., 2015. Estimating a spatial autoregressive model with an endogenous spatial weight matrix. J. Econom. 184, 209–232.
Qu, X., Lee, L., Yu, J., 2017. QML estimation of spatial dynamic panel data models with endogenous time varying spatial weights matrices. J. Econom. 197, 173–201.
Saulis, L., Statulevicius, V.A., 1991. Limit Theorems for Large Deviations. Kluwer Academic Publishers, Norwell, MA.
Shi, W., Lee, L., 2017. Spatial dynamic panel data models with interactive fixed effects. J. Econom. 197, 323–347.
Tamer, E., 2003. Incomplete simultaneous discrete response model with multiple equilibria. Rev. Econ. Stud. 70, 147–165.
Vuong, Q., 1989. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57, 307–333.
White, H., Wooldridge, J., 1991. Some results on sieve estimation with dependent observations. In: Barnett, W., Powell, J., Tauchen, G. (Eds.), Non-parametric and Semi-parametric Methods in Econometrics and Statistics. Cambridge University Press, Cambridge, pp. 459–493.
Xu, X., Lee, L., 2015a. A spatial autoregressive model with a nonlinear transformation of the dependent variable. J. Econom. 186, 1–18.
Xu, X., Lee, L., 2015b. Maximum likelihood estimation of a spatial autoregressive Tobit model. J. Econom. 188, 264–280.
Xu, X., Lee, L., 2016. Estimation of a Binary Choice Game with Network Links. Working paper.
Xu, X., Lee, L., 2018. Sieve maximum likelihood estimation of the spatial autoregressive Tobit model. J. Econom. 203, 96–112.
Yang, K., Lee, L., 2017. Identification and QML estimation of multivariate and simultaneous equations spatial autoregressive models. J. Econom. 196, 196–214.
Yu, J., de Jong, R., Lee, L.F., 2008. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. J. Econom. 146, 118–134.
Kelejian, H.H., Prucha, I.R., 2001. On the asymptotic distribution of the Moran I test statistic with applications. J. Econom. 104, 219–257.
Kelejian, H.H., Prucha, I.R., 2010. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J. Econom. 157, 53–67.
Kuersteiner, G.M., Prucha, I.R., 2013. Limit theory for panel data models with cross sectional dependence and sequential exogeneity. J. Econom. 174, 107–126.
Kuersteiner, G.M., Prucha, I.R., 2015. Dynamic Spatial Panel Models: Networks, Common Shocks, and Sequential Exogeneity. Working paper. University of Maryland.
LeSage, J., Pace, R.K., 2007. A matrix exponential spatial specification. J. Econom. 140, 190–214.
LeSage, J., Pace, R.K., 2009. Introduction to Spatial Econometrics. CRC Press, Boca Raton.
Lee, L., 2004. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72, 1899–1925.
Lee, L., 2007. GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J. Econom. 137, 489–514.
Lee, L., Yu, J., 2010. Some recent developments in spatial panel data models. Reg. Sci. Urban Econ. 40, 255–271.
Liu, T., Lee, L., 2017. A Likelihood Ratio Test for Spatial Model Selection. Working paper.
McFadden, D., 1989. A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica 57, 995–1026.
Merlevède, F., Peligrad, M., Rio, E., 2011. A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probab. Theor. Relat. Field 151, 435–474.
Milgrom, P., Roberts, J., 1990. Rationalizability, learning, and equilibrium in games with strategic complementarities. Econometrica 58, 1255–1277.
Milgrom, P., Roberts, J., 1995. Strongly Coalition-proof Equilibria in Games with Strategic Complementarities. Working paper. Stanford University.
Pakes, A., Pollard, D., 1989. Simulation and the asymptotics of optimization estimators. Econometrica 57, 1027–1057.