Efficient GMM estimation of spatial dynamic panel data models with fixed effects

Efficient GMM estimation of spatial dynamic panel data models with fixed effects

Journal of Econometrics 180 (2014) 174–197 Contents lists available at ScienceDirect Journal of Econometrics journal homepage: www.elsevier.com/loca...

1MB Sizes 0 Downloads 82 Views

Journal of Econometrics 180 (2014) 174–197

Contents lists available at ScienceDirect

Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom

Efficient GMM estimation of spatial dynamic panel data models with fixed effects✩ Lung-fei Lee a , Jihai Yu b,∗ a

Department of Economics, Ohio State University, United States

b

Guanghua School of Management, Peking University, China

article

info

Article history: Received 31 August 2011 Received in revised form 1 July 2013 Accepted 8 March 2014 Available online 29 March 2014 JEL classification: C13 C23 R15 Keywords: Spatial autoregression Dynamic panels Fixed effects Generalized method of moment Many moments

abstract In this paper we derive the asymptotic properties of GMM estimators for the spatial dynamic panel data model with fixed effects when n is large, and T can be large, but small relative to n. The GMM estimation methods are designed with the fixed individual and time effects eliminated from the model, and are computationally tractable even under circumstances where the ML approach would be either infeasible or computationally complicated. The ML approach would be infeasible if the spatial weights matrix is not row-normalized while the time effects are eliminated, and would be computationally intractable if there are multiple spatial weights matrices in the model; also, consistency of the MLE would require T to be large and not small relative to n if the fixed effects are jointly estimated with other parameters of interest. The GMM approach can overcome all these difficulties. We use exogenous and predetermined variables as instruments for linear moments, along with several levels of their neighboring variables and additional quadratic moments. We stack up the data and construct the best linear and quadratic moment conditions. An alternative approach is to use separate moment conditions √ for each period, which gives rise to many moments estimation. We show that these GMM estimators are nT consistent, asymptotically normal, and can be relatively efficient. We compare these approaches on their finite sample performance by Monte Carlo. © 2014 Elsevier B.V. All rights reserved.

1. Introduction Recently, there is a growing literature on spatial panel and dynamic panel models. By including spatial effects into static or dynamic panel models, we can take into account the cross section dependence from contemporaneous or lagged cross section interactions. Kapoor et al. (2007) extend the method of moments estimation to a spatial panel model with error components.

✩ We would like to thank participants of seminars at Fudan University, Shanghai Jiao Tong University, University of Cincinnati, Singapore Management University, 10th World Congress of the Econometric Society (Shanghai, 2010), Shanghai Econometrics Workshop at Shanghai University of Finance and Economics (2011), two anonymous referees and the co-editor, Professor Cheng Hsiao, of this journal for helpful comments. Yu acknowledges funding from the National Science Foundation of China (project no. 71171005, 71322105) and support from the Center for Statistical Science of Peking University. ∗ Corresponding author. Tel.: +86 10 62760702. E-mail addresses: [email protected] (L.-f. Lee), [email protected] (J. Yu).

http://dx.doi.org/10.1016/j.jeconom.2014.03.003 0304-4076/© 2014 Elsevier B.V. All rights reserved.

Baltagi et al. (2007) consider the testing of spatial and serial dependence in an extended error components model, where serial correlation on each spatial unit over time and spatial dependence across spatial units are in the disturbances. Su and Yang (2007) study a dynamic panel data model with spatial error and random effects. These panel models specify spatial correlations by including spatially correlated disturbances and have emphasized on error components. In the fixed effects setting, Korniotis (2008) estimates a time-space recursive model, where individual time lag and spatial time lag are present, by the least square dummy variable (LSDV) regression approach. Yu et al. (2008, 2012) and Yu and Lee (2010) study the quasi maximum likelihood (QML) estimation for, respectively, the stable, spatial cointegration, and unit root spatial dynamic panel data (SDPD) models, where individual time lag, spatial time lag and contemporaneous spatial lag are all included. For the stable SDPD model with fixed effects, the asymptotics of the QML estimation in Yu et al. (2008) is developed under T → ∞ where T cannot be too small relative to n. In empirical applications, we might have data sets where n is large while T is relatively

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

small. Under this circumstance, in the literature of dynamic panels without spatial interactions, the maximum likelihood estimator (MLE) of the autoregressive coefficient of a linear dynamic panel model, which is also known as the within estimator, is biased and inconsistent when n tends to infinity but T remains finite (Nickell, 1981; Hsiao et al., 2002). This bias is due to the incidental parameter problem in Neyman and Scott (1948). By taking time differences to eliminate individual fixed effects in the dynamic equation, the estimation method of instrumental variables (IV) is popular (see Anderson and Hsiao, 1981; Arellano and Bond, 1991; Arellano and Bover, 1995; Blundell and Bond, 1998; Bun and Kiviet, 2006, etc.). This motivates our study of generalized method of moments (GMM) estimation of the SDPD model in order to cover the scenario that both n and T can be large, but T is small relative to n. The case of a finite T will also be considered.1 In this paper, we investigate the GMM estimation of an SDPD model with possibly high order spatial lags. The inclusion of high order spatial lags can allow spatial dependence from different interactions characteristics such as geographical contiguity and economic interaction.2 Compared to QML estimation, the GMM estimation has the following merits for the SDPD model: (1) GMM has a computational advantage over MLE, because GMM does not need to compute the determinant of the Jacobian matrix in the likelihood function for a spatial model, which is especially inconvenient for MLE when n is large or the model has high order spatial lags3 , 4 ; (2) some GMM methods can be applied to a short SDPD model and is free of asymptotic bias, while ML estimation of the SDPD model requires a large T and a bias correction procedure is needed to eliminate the asymptotic bias. For a finite T case, an initial specification for the first time period observations would also be needed in order to formulate a likelihood function.5 (3) With carefully designed moment conditions, the GMM estimate can be more efficient than the QML estimate when the true distribution of the disturbances are not normal and has a nonzero degree of excess kurtosis; (4) GMM is also applicable for the SDPD

1 The reason for focusing on the asymptotic with T → ∞ instead of a finite T is that, in this framework, we have the best IV or best GMM estimation with proper designs of IVs and moment conditions. This might not be possible for a fixed effects model when T is assumed to be finite. 2 In addition, a high order spatial lag model can be regarded as a general case of the first order spatial lag model with spatial disturbances. To see this, for a cross sectional SAR model Yn = λ0 Wn Yn + Xn β0 + Un where Un = ρ0 Mn Un + Vn , with premultiplication of (In − ρ0 Mn ), we have Yn = (λ0 Wn + ρ0 Mn − λ0 ρ0 Mn Wn )Ynt + (In − ρ0 Mn )Xn β0 + Vn after re-arrangement. This is a high order spatial lags model with spatial weights matrices Wn , Mn , and Mn Wn and constrained coefficients. 3 For a first order SAR model where the spatial weights matrix is diagonalizable, the determinant of the Jacobian term can be computed by its eigenvalues (see Ord, 1975). If the spatial weights matrix is not diagonalizable or we have some higher order spatial lags, the Ord device might not be applicable. 4 We note that, to construct the best instruments for the GMM in Section 3.1.2, we

p

need to inverse the n × n matrix Sn (λ) = In − j=1 λj Wnj in (4) (the matrix inversion is also involved in obtaining the information matrix of ML estimation). This will cause a computation burden if n is large. However, unlike the computation of the determinant in ML estimation that is repeated in the parameter search, the matrix inverse computation needs to be obtained only once given a consistent estimate of parameter vector so that the computation burden is less severe (we can use power series expansion to compute the matrix inverse if necessary). 5 Elhorst (2010) has developed an ML estimation using the initial value approximation in Bhargava and Sargan (1983), which does not have much bias from their Monte Carlo results. Due to the multiple dimension search in the nonlinear variance matrix function, the ML estimation in Elhorst (2010) is computationally complicated; also, it has a larger bias than the GMME. In Yu et al. (2008), the consistency of the ML estimator is derived under large n and large T . The MLE can have satisfactory finite sample results after the bias correction from the Monte Carlo simulation. Both Yu et al. (2008) and Elhorst (2010) work well under a first order SDPD model.

175

model with time effects and non-row-normalized spatial weights matrices.6 , 7 Compared to dynamic panel data models where serial correlation occursin the time dimension, the SDPD model may have correlation in the time dimension as well as spatial correlation across units. In one approach, we stack up the data and use moment conditions where the IVs have a fixed column dimension for all the periods. In another, we can use separate moment conditions for each time period, which result in many moments. Those many moments not only come from time lags, but are also designed for spatial lags. We focus on the design of estimation methods that can have some asymptotic efficient properties. Normalized asymptotic distributions of IV estimators with a finite number of moments are properly centered at the true parameter vector. In the many moment approach, normalized asymptotic distributions of IV estimates might not be properly centered or an IV estimator might not be consistent due to the many IV moments (but not directly due to the fixed effects). In contrast to the asymptotics in Yu et al. (2008) where there are ratio conditions on how T and n go to infinity in order that ML estimates can be consistent or their normalized asymptotic distributions are properly centered, such ratio conditions may no longer be needed in the proposed GMM estimation with a finite number of moments in the present paper. In the many IVs estimation method, the ratio condition concerns about the number of IVs or moments relative to the total sample size nT , but not directly the ratio of T and n. However, if the total number of IVs is essentially a function of T , then n and T ratio conditions would appear; but in that case, the ratio condition requires that T shall be small relative to n. Thus, the many IVs approach is complementary to the QML approach. In other words, the proposed estimation methods can be applied to some scenarios where the T is small relative to n, while the QML method might not be so, in theory.8 The paper is organized as follows. Section 2 introduces the model and discusses moment conditions. Section 3 investigates the consistency and asymptotic distribution of various GMM estimators, and we discuss the asymptotic efficiency of the proposed estimators. Monte Carlo results for various estimators are provided in Section 4. Section 5 concludes the paper and summarizes the contributions. Some lemmas and proofs are collected in the Appendices.9

6 It is possible to eliminate the time effects by taking cross sectional difference, but the resulting equation would not have an SAR representation and, therefore, one cannot set up a likelihood function for estimation. The MLE will have an additional incidental parameter problem if time effects need to be estimated in addition to the individual effects. 7 Bell and Bockstael (2000) argue that, based on some underlying economic story, it is not necessary to always row-normalize the spatial weights matrix. In some cases, row-standardizing changes the total impact of neighbors across observations, although it does not change the relative dependence among all neighbors of any given observation. They use the real estate problem to argue that row-standardizing will attach too much weight to the neighbors of remote houses. In social interaction and network literatures, when the social interaction is specified as an SAR model, the measure of centrality in Bonacich (1987) comes out naturally in the reduced form equation. When the indegrees (the sums of each row) of the sociomatrix have a non-zero variation, so does the Bonacich centrality measure, which helps to identify the various interaction effects. Therefore, in empirical applications, sometimes a spatial weights matrix without row-normalization would be appropriate. For estimation procedure in spatial econometrics, Kelejian and Prucha (2010) consider implications on the parameter space of the SAR model when the spatial weights matrix is not row-normalized. 8 However, for the case with multiple spatial weights matrices, when T is not really small, in order to accommodate spatial expansions, the IVs might be too many in order to be practical. This finite sample issue is presented in the Monte Carlo section. 9 Proofs for lemmas and more Monte Carlo results are provided in a supplement file available on request (see Appendix E).

176

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

2. The model and moment conditions

∗ The time effects αt0 can be further eliminated11 in the following 12 equation :

2.1. The model

∗ Jn Ynt =

p 

p 

1) λj0 Jn Wnj Ynt∗ + γ0 Jn Yn(∗,− ,t − 1 +

j =1

The model under consideration is an SDPD model with both individual and time effects Ynt =

p 

λj0 Wnj Ynt + γ0 Yn,t −1 +

p 

j=1

ρj0 Wnj Yn,t −1 + Xnt β0

j =1

+ cn0 + αt0 ln + Vnt ,

t = 1, 2, . . . , T ,

(1)

where Ynt = (y1t , y2t , . . . , ynt )′ and Vnt = (v1t , v2t , . . . , vnt )′ are n × 1 column vectors, and vit ’s are i.i.d. across i and t with zero mean and variance σ02 . The Wnj is an n × n spatial weights matrix for j = 1, . . . , p, which is nonstochastic and generates the dependence of yit ’s across spatial units. If p ≥ 2, (1) is a high order SAR structure. The Wnj ’s may or may not be row-normalized. Xnt is an n × kx matrix of nonstochastic regressors, cn0 = (c1,0 , . . . , cn,0 )′ is an n × 1 column vector of individual effects, and αt0 is a scalar time effect. The initial values in Yn0 are assumed to be observable. We impose the normalization l′n cn0 = 0 where ln is an n × 1 vector of ones to avoid the un-identification of ci,0 and αt0 because ci,0 + αt0 = (ci,0 + η) + (αt0 − η) for an arbitrary η. To avoid the incidental parameter problem in estimating (1), the individual effect cn0 and time effect αt0 shall be eliminated. Let [FT ,T −1 , √1 lT ] be the orthonormal matrix of the eigenvectors T

of JT = (IT − T1 lT l′T ), where FT ,T −1 is the T × (T − 1) eigenvectors matrix corresponding to the eigenvalues of one and lT is the T -dimensional vector of ones. The n × T matrix of dependent variables [Yn1 , Yn2 , . . . , YnT ] can be transformed into the n × (T − ∗ ∗ 1) matrix [Yn1 , Yn2 , . . . , Yn∗,T −1 ] = [Yn1 , Yn2 , . . . , YnT ]FT ,T −1 ; and, (∗,−1)

also, [Yn0

(∗,−1) 1) , Yn1 , . . . , Yn(∗,− ,T −2 ] = [Yn0 , Yn1 , . . . , Yn,T −1 ]FT ,T −1 . (∗,−1)

It is important to note that Yn,t −1 and Yn∗,t −1 are not equal. ∗ ∗ The Vnt and Xnt are defined similarly for t = 1, . . . , T − 1. ′ As lT FT ,T −1 = 0, it follows that [cn0 , . . . , cn0 ]FT ,T −1 = 0 and individual effects are eliminated by the orthonormal transformation. Among these orthonormal transformations, the forward orthogonal difference (FOD) transformation (also known as the Helmert transformation) is found to be convenient.10 After the FOD transformation FT ,T −1 to eliminate the individual effects, (1) becomes ∗ Ynt =

p 

1) λj0 Wnj Ynt∗ + γ0 Yn(∗,− ,t − 1 +

j =1

p 

1) ρj0 Wnj Yn(∗,− ,t − 1

j =1

+ Xnt β0 + αt0 ln + Vnt , ∗





1

t = 1, . . . , T − 1,

(2) (∗,−1)

1 ∗ = where Vnt = ( T −T −t +t 1 ) 2 [Vnt − T − h=t +1 Vnh ] and Yn,t −1 t  1 T −1 T −t 1 ( T −t +1 ) 2 [Yn,t −1 − T −t h=t Ynh ] depend on current and future ∗ ∗ , α20 , . . . , αT∗−1,0 ] = variables, but not on the past ones, and [α10 [α10 , α20 , . . . , αT 0 ]FT ,T −1 can be considered as transformed time ′ ′ ′ ∗′ ) and , . . . , VnT effects. As (Vn1 , . . . , Vn∗′,T −1 )′ = (FT′ ,T −1 ⊗ In )(Vn1 ′ ∗ ∗ FT ,T −1 FT ,T −1 = IT −1 , the vit ’s are uncorrelated where vit is the ith ∗ element of Vnt .

T

10 Anderson and Hsiao (1981) propose to use the first difference to eliminate the individual effects, where lagged values of the dependent variable can be used as IVs, and the resulting disturbances have serial correlation. Instead of first difference, Arellano and Bover (1995) use the FOD for the data transformation, where the resulting disturbances are still uncorrelated if they are originally i.i.d.

1) ρj0 Jn Wnj Yn(∗,− ,t − 1

j =1

+ Jn Xnt β0 + Jn Vnt , ∗



t = 1, . . . , T − 1,

(3)

1 l l′ n n n

where Jn = In − because Jn ln = 0. Estimation of (3) by the ML method has two issues. First, when Wnj is not row-normalized, (3) ∗ 13 would not have a well-defined SAR structure for Jn Ynt . Second, in the presence of time lags variables, the regressors in (3) are correlated with the disturbances after data transformation by FT ,T −1 . For these reasons, a likelihood function could not be formed directly from (3). We propose GMM estimation of (3), which does not require an ∗ SAR form for Jn Ynt and it can be free of asymptotic bias as will be shown. For asymptotic analysis of GMM estimates of (3), the reduced form of (1) is needed. Denote λ = (λ1 , . . . , λp )′ and ρ = (ρ1 , . . . , ρp )′ . At true parameter values, let λ0 = (λ10 , . . . , λp0 )′ and ρ0 = (ρ10 , . . . , ρp0 )′ . For the reduced form from (2), by p denoting Sn (λ) = In − j=1 λj Wnj , Sn ≡ Sn (λ0 ) and An =

Sn−1 (γ0 In +

p

j =1

ρj0 Wnj ), we have

(∗,−1) An Yn,t −1

∗ ∗ + Sn−1 (Xnt∗ β0 + αt0 ln + Vnt ). ∗ For each spatial lag Wnj Ynt for j = 1, . . . , p, by defining Gnj ≡ ∗

Ynt =

Wnj Sn−1 , we have

∗ ∗ ∗ Wnj Ynt = Gnj (Znt∗ δ0 + αt0 ln ) + Gnj Vnt ,

where δ0 = (γ0 , ρ0 , β0 ) and ′

′ ′

predetermined regressors in

(4)

(∗,−1) (∗,−1) ∗ Znt = [Yn,t −1 , Wn Yn,t −1 , Xnt ] is the (∗,−1) (∗,−1) (2) with Wn Yn,t −1 = (Wn1 Yn,t −1 , ∗

1) . . . , Wnp Yn(∗,− ,t −1 ). In general, for a vector (or matrix) bn with n rows,

we denote Wns bn with s being an integer as a matrix consisting sp s s of vectors (or matrices) Wn11 Wn22 · · · Wnp bn where s1 , s2 , · · · , sp are nonnegative integers such that s1 + s2 + · · · + sp = s. 2 For example, Wn2 bn = [Wn1 bn , . . . , Wn1 Wnp bn , Wn2 Wn1 bn , . . . , 2 Wn2 Wnp bn , . . . , Wnp Wn1 bn , . . . , Wnp bn ]. 2.2. Moment conditions (∗,−1)

For the estimation Eq. (3), we note that Yn,t −1 is correlated with ∗ Vnt . For this reason, IVs are needed for

(∗,−1) Yn,t −1

(∗,−1)

and Wnk Yn,t −1 for

11 Even if we can estimate the time effects when n is large under the MLE, we will have the incidental parameter problem with a large T and the asymptotic distribution of the estimators might not be properly centered if T does not increase faster than n. See Lee and Yu (2010) for details under a first order SDPD model (Theorems 3 and 6), and also Hahn and Kuersteiner (2002) and Alvarez and Arellano (2003) under a dynamic panel data setting. 12 The ML estimation of (2) will incur incidental parameter problem as time effects need to be estimated when T is large and Wnj ’s are not row-normalized. When T is small, we need to specify the initial observation process to write down the likelihood function. For these reasons, we focus on the GMM estimation of the following (3). Alvarez and Arellano (2003) investigate a random effects ML estimator where there is no exogenous variable. They have derived the likelihood function of the random component model as a product of two likelihood functions where one is for the between equation. The likelihood for the fixed effect model is a ratio of the two likelihoods. When exogenous variables are present and correlated with the individual effects, it is not obvious on how to derive the likelihood function. Instead, Elhorst (2010) studies an ML approach by an initial period approximation similar to Bhargava and Sargan (1983). 13 When W is row-normalized, we have J W = J W J as W l = l and nj

n

nj

n

nj n

nj n

n

∗ Jn ln = 0, which implies that (3) may have Jn Ynt as a dependent variable in an SAR form. For the SDPD model with row-normalized weights matrix, Lee and Yu (2010) has formulated a partial likelihood approach for the estimation with the time effects removed but not the individual effects. The partial likelihood approach estimates both the common parameters of interest as well as the individual effects.

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

∗ each t (and also for Wnj Ynt ). In addition to all strictly exogenous variables Xns for s = 1, . . . , T − 1, the time lag variables (∗,−1) Yn0 , . . . , Yn,t −1 can also be used to construct IVs for Yn,t −1 as in the literature of dynamic panel data models (Alvarez and Arellano, 2003, etc.). Correspondingly, we may use Wnk Xns for s = 1, . . . , T − (∗,−1) 1 and Wnk Yns for s = 0, . . . , t − 1 as IVs for Wnk Yn,t −1 . Therefore, for the estimation of (3), we need to find IVs for the regressors (∗,−1) (∗,−1) Jn [Wn Ynt , Yn,t −1 , Wn Yn,t −1 ]. ∗

(5)

For notational purpose, we denote It −1 as the σ -algebra spanned by (Yn0 , . . . , Yn,t −1 ), conditional on (Xn1 , . . . , XnT , cn0 , α10 , . . . , αT 0 ). The best theoretical IV for (5) shall be its conditional mean conditional on It −1 . We might use the estimated conditional mean (so that we have a finite number of moments in Section 3.1), or use levels of neighboring variables (spatial power series expansion) of predetermined and exogenous variables to approximate the best IV (so that we have many moments in Section 3.2). When the spatial weights matrices are row-normalized, the time effects will not influence the conditional mean of (5) (see Lee and Yu, 2010). However, when Wnj ’s are not row-normalized, the conditional mean of (5) will be influenced by the time effects.14 For the linear moments, we can stack up the data and construct moment conditions. An IV matrix for (5) can take the form Jn Qnt where Qnt has a fixed column dimension q greater than or equal to kx + 2p + 1. For example, Qnt could be

[Yn,t −1 , Wn Yn,t −1 , Wn2 Yn,t −1 , Xnt∗ , Wn Xnt∗ , Wn2 Xnt∗ ].

Q′n,T −1 Jn,T −1 V∗n,T −1 (θ ) ′ where Qn,T −1 = (Qn1 , . . . , Qn′,T −1 )′ and Jn,T −1 = IT −1 ⊗ Jn . Here, ∗ even though we have the transformed time effect αt∗ ln in Vnt (θ ), it will be eliminated in the moment conditions by Jn . In addition to the linear moments, due to the spatial correlation in the DGP, quadratic moments can capture correlations and may increase the ∗ efficiency of estimates.15 The vector Pnl Vnt can be uncorrelated ∗ with Jn Vnt in (3) for an n × n nonstochastic matrix Pnl satisfying ∗ the property tr(Pnl Jn ) = 0, while it may correlate with Gnj Vnt in (4). Denote Pnl,T −1 = IT −1 ⊗ Pnl where Pnl satisfies tr(Pnl Jn ) = 0. The quadratic moments are ∗ V∗′ n,T −1 (θ )Jn,T −1 Pnl,T −1 Jn,T −1 Vn,T −1 (θ )

moments, the moment conditions would be



for l = 1, 2, . . . , m

so that we can have m such quadratic moments. For analytical tractability, we assume that Pnl is uniformly bounded in both row and column sums in absolute value (for short, UB).16 These settings provide general frameworks in which one may discuss the best designs of Qnt and Pnl,T −1 . For the approach with finite number of

∗ V∗′ n,T −1 (θ )Jn,T −1 Pn1,T −1 Jn,T −1 Vn,T −1 (θ )



  ..   . . ∗ V∗′  (θ ) J P J V (θ ) n,T −1 nm,T −1 n,T −1 n,T −1 n,T −1 Q′n,T −1 Jn,T −1 V∗n,T −1 (θ )

gnT (θ ) = 

(7)

For the many moment approach, denoting hnt = (Yn0 , . . . , Yn,t −1 , Xn1 , . . . , XnT , ln ), we can use the IV matrix Jn Hnt with Hnt = (hnt , Wn hnt , . . . , Wnpn hnt ).

(8)

The column dimension of hnt is pt = kx T + t + 1 and that of Hnt can be pt ·(1 + p + p2 +· · ·+ ppn ) where pn is the order of spatial power series expansion. Here, in addition to the spatial power series expansion of the lagged and exogenous variables, we also include the spatial power series expansion of the n×1 vector ln . When the Wnj ’s are not row-normalized, the spatial power series expansion of ln as products with Wn and its high orders will approximate the time effects component Gnj ln in the conditional mean of the spatial lag in (4). On the other hand, if Wnj ’s are row-normalized, neither ln nor its product with Wn would matter because Wnj ln = ln and Jn ln = 0. Combined with the quadratic moments, for the many moment approach, we have



∗ V∗′ n,T −1 (θ )Jn,T −1 Pn1,T −1 Jn,T −1 Vn,T −1 (θ )



  ..   . , ∗ V∗′  (θ ) J P J V (θ ) n,T −1 nm,T −1 n,T −1 n,T −1 n ,T − 1 Diag(Hn1 , . . . , Hn,T −1 )′ Jn,T −1 V∗n,T −1 (θ )

gnT (θ ) = 

(6)

∗′ Let V∗n,T −1 (θ ) = (Vn1 (θ ), . . . , Vn∗′,T −1 (θ ))′ where Vnt∗ (θ ) = Snt (λ) ∗ ∗ ∗ Ynt −Z nt δ − αt ln with θ = (λ, δ ′ )′ and δ = (γ , ρ, β ′ )′ . IV estimation corresponds to the linear moments

177

(9)

where Diag(Hn1 , . . . , Hn,T −1 ) is a block diagonal matrix with diagonal blocks Hnt ’s. For (7), the column dimension of Qnt is fixed and is the same for all t. For (9), the column dimension of Hnt might increase in t. The latter approach requires careful analysis due to the many moments issue when T → ∞. 3. Asymptotic properties of GMME In the following, Section 3.1 derives the consistency and asymptotic distribution of GMM estimators when we use a finite number of moment conditions where T can be finite or large. Under the framework of T being large, optimal moment conditions can be designed. Section 3.2 derives the asymptotic properties of GMM estimators when we use many moment conditions. For our analysis of the asymptotic properties of estimators, we make the following assumptions. Assumption 1. Wnj is a nonstochastic spatial weights matrix with zero diagonals for j = 1, . . . , p. Assumption 2. The disturbances {vit }, i = 1, 2, . . . , n and t = 1, 2, . . . , T , are i.i.d. across i and t with zero mean, variance σ02 and E |vit |4+η < ∞ for some η > 0. Assumption 3. Sn (λ) is invertible for all λ ∈ Λ, where the parameter space Λ is compact and λ0 is in the interior of Λ.

14 Even though time effects in (2) can be eliminated with the J premultiplication n so that the estimation equation (3) is free of time effects for its regressors, the time ∗ effects component in the DGP of Wnj Ynt cannot be eliminated if Wnj is not rownormalized as seen from (4). 15 The use of quadratic moments is motivated by the likelihood function of the SAR model under normality disturbances (Lee, 2007), as well as the Moran test statistic (Moran, 1950). 16 We say a (sequence of n × n) matrix P is uniformly bounded in row and n

column sums if supn≥1 ∥Pn ∥∞ < ∞ and supn≥1 ∥Pn ∥1 < ∞, where ∥Pn ∥∞ ≡  n   n  sup1≤i≤n j=1 pij,n  is the row sum norm and ∥Pn ∥1 = sup1≤j≤n i=1 pij,n  is the column sum norm.

Assumption 4. Wnj is UB for j = 1, . . . , p and ∥ 1. Also, Sn−1 (λ) is UB, uniformly in λ ∈ Λ.

p

j =1

λj0 Wnj ∥∞ <

Assumption 5. Xnt , αt0 and cn0 are nonstochastic with supn,T

1 nT 2+η

t =1 i=1 |xit ,l |  < ∞ for l = 1, . . . , k, t =1 |αt0 | < ∞ and supn 1n ni=1 |cni |2+η < ∞ for some η > 0, where xit ,l is the (i, l) element of Xnt and ci0 is the ith element of cn0 . Also, 

T

n

limn→∞ n(T1−1)

2+η

T −1 t =1

supT T1

T

∗′ ∗ Xnt Jn Xnt exists and is nonsingular.

178

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

h∗

h −1 Assumption 6. Yn0 = h=0 An Sn (cn0 + Xn,−h β0 + α−h,0 ln ∗ + Vn,−h ), where h could be finite or infinite.

Assumption 7.

∞

h =0

abs(Ahn ) is UB where [abs(An )]ij = An,ij .





The zero diagonal assumption on Wnj helps the interpretation of the spatial effect as self-influence shall be excluded in practice. Assumption 2 provides regularity assumptions for vit . Assumption 3 guarantees that the model is an equilibrium one. Also, the compactness of the parameter space  is a condition for p theoretical analysis.17 In Assumption 4, when ∥ j=1 λj0 Wnj ∥∞ < 1, Sn−1 can be expanded as an infinite series in terms of Wnj . In many empirical applications of spatial issues, each of the rows of Wnj sums to 1, which ensures that all the weights are between 0 and 1. The uniform boundedness assumption in Assumption 4 is originated by Kelejian and Prucha (1998, 2001) and also used in Lee (2004, 2007). That Wnj and Sn−1 (λ) are UB is a condition that limits the spatial correlation to a manageable degree. From Lee and Liu  space  for λ could be the one satisfying p (2010), the parameter   −1 .18 When exogenous variables j=1 |λj | < (maxj=1,...,p Wnj ∞ ) Xnt are included in the model, an empirical moment of a higher than second order restriction is imposed as in Assumption 5. So is for the individual effects and time effects. The second empirical moment restrictions are useful for some sample statistics to be bounded in our asymptotic analysis. The higher than the second moment restrictions are used in a central limit theorem for a linear and quadratic form in Kelejian and Prucha (2001). The remaining ∗ part of Assumption 5 points out that the regressors of Xnt are asymptotically linearly independent. Assumption 6 specifies the initial condition so that the process may start from a finite or infinite past. The h∗ can be arbitrary to capture the origin of the process from the past and it does not need to be specified in the estimation or asymptotic analysis. Assumption 7 combines the absolute summability condition and the UB condition of the power series of An , which is essential for the analysis in this paper, because it limits the dependence over time series and across spatial units.19 3.1. Asymptotic properties of GMME with finite moments 3.1.1. Consistency and asymptotic distribution of GMME For the moment conditions in (7), the identification requires that plimn→∞ n(T1−1) gnT (θ ) = 0 should have a unique solution θ0 .

∗ ∗ From the linear moments, because Ynt = Sn−1 (Znt∗ δ0 + αt0 ln +

17 The invertible S makes sure that we have an equilibrium system. To construct n the best instruments in Section 3.1.2, invertibility of estimated Sn is also needed. For the parameter space, compactness is relatively stronger than the boundedness assumption for convenience. 18 Therefore, if one takes this as the parameter space of interest, for the case

of p = 3, the true values of the λ′j s are not allowed to be all 0.5 or all −0.5 when Wnj ’s are row-normalized. Elhorst et al. (2012) discussed the parameter space of the two order spatial lag model for stationarity. They find that the rectangle with vertices (− |w 1 | , − |w 1 | ), (− |w 1 | , w 1 ), ( w 1 , − |w 1 | ) and ( w

1,min

2,min

1,min

2,max

1,max

2,min

, w2,1max ) would be too broad, while the circle with |λ1 | + |λ2 | <   (maxj=1,...,p Wnj ∞ )−1 would be too restrictive. They argue that, according to 1 1,max

Hepple (1995), the exact boundaries for the curves connecting the four coordinates can only be determined by a numerical search. In Elhorst et al. (2012), they have developed a procedure to decide the parameter space of λj ’s of a cross sectional SAR model under the ‘‘stationary’’ condition, where the parameter space of λj ’s will be decided by the largest and smallest eigenvalues of each spatial weights matrix in a complicated way. In the spatial dynamic panel data model, the λj ’s, γ and ρj ’s shall then satisfy the absolute summability condition in Assumption 7. Therefore, the parameter space for the high order spatial dynamic panel data model is complicated. 19 In this paper, we focus only on the stable dynamic model setting, but not unit root or related issues.

∗ Vnt ) and Jn ln = 0, we have Jn Vnt∗ (θ ) = Jn [Sn (λ)Ynt∗ −Z ∗nt δ] = ∗ ∗ ∗ Jn [Sn (λ)Sn−1 (Znt δ0 +αt0 ln )− Znt δ + Sn (λ)Sn−1 Vnt∗ ] where Sn (λ)Sn−1 = p ∗′ In − j=1 (λj − λj0 )Gnj . Denote Z∗n,T −1 = (Zn1 , . . . , Zn∗′,T −1 )′ , ∗ ∗ L∗nj,t = Gnj (Znt δ0 + αt0 ln ),

and

L∗nt = [L∗n1,t , L∗n2,t , . . . , L∗np,t ]

∗′ ′ L∗n,T −1 = [L∗′ n1 , . . . , Ln,T −1 ] .

(10)

We have Q′n,T −1 Jn,T −1 V∗n,T −1 (θ ) = Q′n,T −1 Jn,T −1 [Z∗n,T −1 (δ0 − δ) + 1 ∗ L∗n,T −1 (λ0 −λ)+Sn,T −1 (λ)S− n,T −1 Vn,T −1 ] where Sn,T −1 = IT −1 ⊗Sn . As plimn→∞ n(T1−1)

∗ Qnt′ Jn Sn (λ)Sn−1 Vnt = 0 uniformly20 in θ ∈ Θ

 T −1 t =1

from Lemma 1(iv), the unique solution of plimn→∞ n(T1−1) gnT (θ ) = 0 at θ0 requires that 1 Q′n,T −1 Jn,T −1 [L∗n,T −1 , Z∗n,T −1 ] plimn→∞ n(T − 1)

× ((λ0 − λ)′ , (δ0 − δ)′ )′ = 0 should have a unique solution θ0 . That plimn→∞ n(T1−1) Q′n,T −1 Jn,T −1 [L∗n,T −1 , Z∗n,T −1 ] has the full column rank kz + p, where kz = p + kx + 1, is a sufficient condition. Because Z∗n,T −1 consists of time and spatial time lags, this condition will, in general, be satisfied as long as δ0 ̸= 0 because time and spatial time lags can be used to construct relevant IVs. Assumption 8. The n × q IV matrix Qnt is predetermined such that E(Qnt |It −1 ) = Qnt , its column dimension is fixed for all n and t. Let Cnt be an n × 1 column vector from Qnt . The E[|Cnt ,i |2+η ] for some η > 0 is bounded uniformly in all i = 1, . . . , n and all n and t. Additionally, plimn→∞ n(T1−1) Q′n,T −1 Jn,T −1 Qn,T −1 is of full rank q and plimn→∞ n(T1−1) Q′n,T −1 Jn,T −1 [L∗n,T −1 , Z∗n,T −1 ] is of full rank kz + p.

As in Hansen’s GMM setting (1982), one considers a linear transformation of the moment conditions, anT gnT (θ ), where anT is a matrix with its number of rows greater than or equal to (kz + 1) and anT is assumed to converge in probability to a constant full rank matrix a0 . For the optimal GMM (OGMM) estimation, we need the variance matrix of the moment conditions. Let vecD (A) be the column vector formed by diagonal elements of any square matrix A; vec(A) the column vector formed by stacking the columns of A; and As = A + A′ . For the variance matrix of the moment conditions in (7), let ωnm,T = [vecD (Jn,T −1 Pn1,T −1 Jn,T −1 ), . . . , vecD (Jn,T −1 Pnm,T −1 Jn,T −1 )] and

∆mn,T = [vec(Jn,T −1 P′n1,T −1 Jn,T −1 ), . . . , vec(Jn,T −1 P′nm,T −1 Jn,T −1 )]′

× [vec(Jn,T −1 Psn1,T −1 Jn,T −1 ), . . . , vec(Jn,T −1 Psnm,T −1 Jn,T −1 )]. Denote µ4 as the fourth moment of vit . The variance matrix of the moments in (7) can be approximated by21  1

 n(T − 1) ΣnT = σ04 

∆nm,T

0q×m

+

1 n(T − 1)



1

1



0m×q ′

σ02 n(T − 1)

Qn,T −1 Jn,T −1 Qn,T −1

′ (µ4 − 3σ04 )ωnm ,T ωnm,T

0q×m

∗ 0q×q

 

 .

(11)

T −1 ′ 1 20 Because S (λ)S −1 = I + (λ − λ)G , the plim ∗ n n 0 n n→∞ n(T −1) n t =1 Qnt Bn Vnt = 0 in Lemma 1 implies uniform convergence in θ ∈ Θ for those terms because the parameter space of λ is bounded and λ appears linearly or in a quadratic form in those relevant terms. 21 Here, Σ is not exactly the variance matrix of the moment conditions nT

but a consistent estimate. While the elements involving the quadratic moment conditions take the expectation form, the elements involving the linear moment conditions take the regular form without expectations or probability limit (because the Qnt ’s are functions of predetermined variables). By Lemma 1(i), the exact σ2

0 variance matrix of the linear moment is n(T − 1)

σ02 Q′ J Q n(T −1) n,T −1 n,T −1 n,T −1

T −1 t =1

E (Qnt′ Jn Qnt ), which has the

same limit as in ΣnT . The reason we use such a ΣnT is due to its simplicity. As we see from the Monte Carlo results, the estimation of standard deviation of estimates using ΣnT in (11) is close to the empirical standard deviation.

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

179

When vit is normally distributed, the second component of ΣnT will −1 be zero because µ4 − 3σ04 = 0.22 For the optimal GMM, ΣnT is used ′ in place of anT anT . Also, as shown in Appendix C.1, we have

The efficiency of the OGMME θˆo,nT relative to the 2SLSE is apparent due  to the additional quadratic moments. For the 2SLS, anT =

∂ gnT (θˆnT ) = DnT + RnT + Op n(T − 1) ∂θ ′

quadratic moments. For optimal GMM, we have

1



1

 ,



nT

0m×m 0q×m

anT

where DnT is O(1) specified in (32) and RnT is O T1 specified in (33). Denote DnT = DnT + RnT . Thus, when T is large, the component RnT will disappear and DnT will be reduced to DnT asymptotically. Throughout this paper, we assume a large number of spatial units n, while the time period T could be either large or small. The particular interest in this paper is for the case that T can be large, but small relative to n, as the estimation of such a case has not been explicitly covered in the spatial panel literature. Theorem 1 provides the consistency and asymptotic distributions of GMM estimates. The results are valid with either a finite T or T → ∞. For the OGMM estimation, we assume that Qn,T −1 does not contain redundant IV’s so that ΣnT is invertible.

 

Theorem 1. Suppose we use the moment conditions in (7). Assume that a0 plimn→∞ n(T1−1) gnT (θ ) = 0 has a unique root at θ0 in Θ .

Under Assumptions 1–8, as n → ∞, the GMME θˆnT derived from ′ minθ∈Θ gnT (θ )a′nT anT gnT (θ ) is consistent and



 d ˆ n(θnT − θ0 ) → N 0, plimn→∞

1



T −1

′ ′ DnT anT anT DnT

× DnT anT anT ΣnT anT anT DnT DnT anT anT DnT ′









−1



−1



d

n(θˆo,nT − θ0 ) → N

0, plimn→∞

1

.

−1 ′ (DnT ΣnT DnT )−1 . (12)

−1 −1 ˆ nT Suppose that Σ − ΣnT = op (1), then the feasible OGMME 23

−1 ′ ˆ nT derived from minθ ∈Θ gnT (θ )Σ gnT (θ ) has the same asymptotic distribution as (12).



d

When T is large, (12) will become n(T − 1)(θˆo,nT − θ0 ) → −1 N (0, plimn,T →∞ (D′nT ΣnT DnT )−1 ) where the DnT is reduced to DnT . The OGMME can be compared with the 2SLSE applied to (3) using IV matrix Qn,T −1 :

 −1 θˆ2sl,nT = (Wn,T −1 Y∗n,T −1 , Z∗n,T −1 )′ MJQ ,nT (Wn,T −1 Y∗n,T −1 , Z∗n,T −1 )   × (Wn,T −1 Y∗n,T −1 , Z∗n,T −1 )′ MJQ ,nT Y∗n,T −1 , (13) where Wn,T −1 Y∗n,T −1 = [Wn1,T −1 Y∗n,T −1 , . . . , Wnp,T −1 Y∗n,T −1 ] with Wnj,T −1 = IT −1 ⊗ Wnj and MJQ ,nT = Jn,T −1 Qn,T −1 (Q′n,T −1 Jn,T −1 Qn,T −1 )−1 Q′n,T −1 Jn,T −1 . The θˆ2sl,nT is consistent and asymptotically normal with the limiting variance matrix

σ02 plimn→∞



1 n(T − 1)

−1 (L∗n,T −1 , Z∗n,T −1 )′ MJQ ,nT (L∗n,T −1 , Z∗n,T −1 ) .

22 Due to the deviation from the time average to eliminate individual effects, the third moment µ3 is irrelevant in the variance matrix in (11) even though vit is not normally distributed. Therefore, the variance matrix ΣnT of the moment conditions (7) is block diagonal. See Lemma 2 and its proof in Appendix B. 23 The optimal weighting matrix involves the true parameter σ 2 and µ , which can 0

4

ˆ ∆Ynt − be consistently estimated using the initial GMM θˆnT . Denote ∆rˆnt = Sn (λ) ∗ ˆ Ynt∗ − Znt∗ δˆnT . The σ02 can = Sn (λ) ∆Znt δˆnT where ∆ is the first difference and rˆnt ∗ be estimated from rˆnt by σˆ 2 = Sˆ4 2n(T −1)

1 n(T −1)

T −1 t =1

∗′ ˆ ∗ rˆnt Jn rnt and the µ4 can be estimated

− 3σˆ 4 where Sˆ4 = i=1 t =2 [Jn ∆rˆnt ]i from ∆rˆnt by µ ˆ4 = supplement file (see Appendix E) for the estimation of µ4 in detail. n T



0q×m

 0m×q

(σ02 Q′n,T −1 Jn,T −1 Qn,T −1 )−1/2

.

3.1.2. The best linear and quadratic moment conditions under large T As the quadratic and linear moment conditions of V∗n,T −1 do not interact with each other (see Lemma 2), ΣnT in (11) is block diagonal. When T is finite, the choice of the best linear moments might not be obvious. When T is large, the RnT component in DnT = DnT + RnT will disappear and the best linear moments would be tractable.24 Especially, under the large T setting, we can compare the asymptotic variance matrix of the GMM with the ML estimates which are derived by using a direct estimation approach with the individual effects being estimated.25 Thus, in this section, we will investigate the best linear and quadratic moments for GMM estimation when T is large. Under large T , the precision matrix of the OGMME from Theorem 1 is 1

σ02 n(T − 1)



T −1

=

 −1/2 4 4 ′  σ0 ∆nm,T + (µ4 − 3σ0 )ωnm,T ωnm,T

1 −1 D′nT ΣnT DnT =

Also, the optimal GMM estimator (OGMME) θˆo,nT derived from −1 ′ minθ∈Θ gnT (θ )ΣnT gnT (θ ) has



0m×q

(σ02 Q′n,T −1 Jn,T −1 Qn,T −1 )−1/2 , which gives zero weight to the

4

. See the

× (L∗n,T −1 , Z∗n,T −1 )′ MJQ ,nT (L∗n,T −1 , Z∗n,T −1 )   1 1 +O + T n(T − 1)   − 1 µ4 − 3σ04 ′ C ′ Cmp,nT ω ω + ∆ mn,T mp,nT nm,T nm,T × σ04  0kz ×p

 0p×kz   0kz ×kz

,

(14) where the O( ) term is related to RnT in DnT . As is derived in Appendix D, the best quadratic moment matrix is 1 T

∗ P∗nj,T −1 = IT −1 ⊗ Pnj , ∗ where Pnj = (Gnj − n b∗n = ( n− )2 ( 2

for j = 1, . . . , p

tr(Gnj Jn )

1

n−1

η −3 n + 42 n−2



(15)



Jn ) + b∗n diag(Jn Gnj Jn ) − 1

n n−2

) and η4 =

µ4 , σ04

tr(Gnj Jn ) n



In with

diag(A) denotes the

diagonal matrix formed by diagonal elements of a square matrix A. ∗ is the best within the class of matrices such that tr(Pnj Jn ) = Thus, Pnj 0. With such a set of p best quadratic moments, the derived GMME is efficient relative to any finite number of quadratic moments used for estimation. When Vnt is normally distributed so that η4 = 3, we have b∗n = 0 and the best quadratic matrix is reduced to tr(G Jn )

IT −1 ⊗ (Gnj − n−nj1 Jn ) for j = 1, . . . , p. For linear moments, the best IV matrix is to choose the con∗ ditional mean26 E(Wn Ynt , Znt∗ |It −1 ), where the main component (∗,−1)

is E(Yn,t −1 |It −1 ). While this ideal IV matrix might not be directly available, one may design an approximated sequence for it.

24 From the detailed analysis involved, the individual effects provide information about the best IV for the model estimation. When T becomes large, those individual effects can be consistently estimated and the best IV can be based on such information. However, with finite T , the individual effects cannot be consistently estimated, so the best IVs might not be available. 25 Such an ML estimation method has been considered in detail in Lee and Yu (2010). 26 That E (W Y ∗ , Z ∗ |I ) is the best IV can be seen from the asymptotic variance n nt t −1 nt component of a GMM estimator due to the IVs in Theorem 2.

180

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

 T −1 1 For that purpose, define X˜ n,tT = T − S −1 h=t ΦT −h Xnh , V˜ n,tT = t n   T −1 T −1 1 1 S −1 h=t ΦT −h Vnh , α˜ tT = T − S −1 h=t ΦT −h αh0 and Ψt = T −t n t n  1 A Φ j−1 T −t h 2 cTt (In − nT −Tt−t ) where Φj = h=0 An and cTt = ( T −t +1 ) . As is derived in Lemma 4, the best IV matrix can be obtained from Hnt in (∗,−1) Yn,t −1

 = Hnt + Ψt (In − An ) Sn

−1 −1

1

t −1 

t − 1 s=1

 Vns − cTt V˜ n,tT , (16)

Assumption 9. The probability limit of 6nT ,22 = Z∗n,T −1 )′ Jn,T −1 (L∗n,T −1 , Z∗n,T −1 ) is nonsingular.

from (15) for j = 1, . . . , p.27 Under Assumptions 1–9, as both n and T tend to infinity, the feasible best GMME (BGMME) θˆb,nT derived −1 −1 −1 ′ ˆ nT ˆ nT from minθ∈Θ gnT (θ )Σ gnT (θ ), where Σ − ΣnT = op (1), has d

1 n(T − 1)(θˆb,nT − θ0 ) → N 0, 6− where b



 Hnt = Ψt Yn,t −1 − (In − An )−1 Sn

×

1



t − 1 s=1

n,T →∞

 (Sn Yns − Zns δ0 − αs0 ln )

− cTt X˜ n,tT β0 − cTt α˜ tT ln .

+ (17)

(∗,−1)

Thus, the best theoretically IV Jn E(Yn,t −1 |It −1 ) can be approximated by predetermined variables up to the period t − 1 and exogenous variables up to the period T − 1 via Jn Hnt . Even though  −1 Ψt (In − An )−1 Sn−1 t −1 1 st = 1 Vns in (16) is in It −1 but cannot be observed, it can be ignored. Indeed, it can be small as long as t is far from the initial period. Thus, the approximation can be accurate for those t’s far away from the initial period t = 0. Hence, we may use (∗,−1) Jn Hnt as a desirable IV for Jn Yn,t −1 . For t = 1, we may simply take

Hn1 = Ψ1 Yn0 − cT 1 X˜ n,1T β0 − cT 1 α˜ 1T ln . For these IVs with t’s close to the initial period t = 0, the approximations yield valid IVs but might not be adequate. However, as T is large, the segment with early observations is short relative to the later segment of observations; asymptotically, these IVs are adequate. Therefore, the best ∗ ∗ IV for Jn Znt may be taken as Jn Knt where Knt ≡ (Hnt , Wn Hnt , Xnt ). ∗ ∗ Also, from (4), the best IV for Jn Wnj Ynt is Jn Gnj (Knt δ0 + αt0 ln ). This ∗ suggests that we may use Jn Qnt as an IV matrix for Jn (Wn Ynt , Znt∗ ) where ∗ Qnt = (Gn (Knt δ0 + αt0 ln ), Knt ),

(18)

∗ ∗ and Gn (Knt δ0 + αt0 ln ) = [Gn1 (Knt δ0 + αt0 ln ), . . . , Gnp (Knt δ0 + ∗ αt0 ln )]. For feasible IVs, there are two sources of unobservables, namely, cn0 and [αt0 , . . . , αT −1,0 ] in (17), while other unknown parameters can be estimated consistently from Theorem 1. After Qnt is premultiplied by Jn if Wnj ’s are row-normalized, the time effects ∗ components α˜ tT ln in (17) and αt0 ln in (18) will be eliminated as they are proportional to ln ; however, they may not be eliminated when Wnj ’s are not row-normalized. Anyhow, as n is large, the time effects for each period can be estimated consistently. Denote ˆ nT )Ynt − Znt δˆnT as the estimate for rnt = cn0 + αt0 ln (= rˆnt = Sn (λ Sn Ynt − Znt δ0 − Vnt ). With l′n cn0 = 0 imposed, the estimate for αt0 is

αˆ t =

1 ′ ln rˆnt n

∗ for t = 1, . . . , T . It follows that α˜ tT ln in (17) and αt0 ln in (18) can be estimated by using the estimated αˆ s for s = t , . . . , T − 1. By plugging them back to (17) and (18) along with the estimated IVs, the feasible version for (18) is

ˆ nt = (Gˆ n (K ˆ nt δˆ + αˆ t∗ ln ), K ˆ nt ) Q

1

6b = lim  n(T − 1)

t −1

(19)

ˆ nt , δˆ and αˆ t∗ are feasible counterparts constructed with where Gˆ n , K initial consistent estimates as described.

(L∗n,T −1 ,

Theorem 2. Suppose we use the moment conditions in (7) where ˆ nt in (19) and Pˆ ∗nj,T −1 is estimated Qnt takes the special form Q



where

1 n(T −1)

1

σ02



Cpb,nT

0kz ×p



 0p×kz  0kz ×kz

plimn,T →∞ 6nT ,22 ,

(20)

with Cpb,nT tr(G′n1,T −1 Jn,T −1 P∗n1s ,T −1 Jn,T −1 )

  = 

··· .. . ···

.. .

tr(G′n1,T −1 Jn,T −1 P∗nps ,T −1 Jn,T −1 )

tr(G′np,T −1 Jn,T −1 P∗n1s ,T −1 Jn,T −1 )

.. .

tr(G′np,T −1 Jn,T −1 P∗nps ,T −1 Jn,T −1 )

   

and Gnj,T −1 = (IT −1 ⊗ Gnj ). For the QML estimator (QMLE) in Yu et al. (2008), it has an O(1/T ) bias, which can be eliminated but requires the condition 3

that Tn → ∞.28 From Theorem 2, the BGMME does not have a bias term with such an order. Under normality of Vnt , the BGMME and MLE have the same asymptotic variance. However, when Vnt is not normally distributed, the BGMME with the best IV and best quadratic moment matrix in (15) can be more efficient than the QMLE. This efficiency comes from the quadratic moment in the GMM estimation which incorporates kurtosis of the disturbances.29 We note that when T is finite, the proposed BGMME is still consistent and asymptotic normal. However, its limiting variance matrix is not the inverse of 6b , but the inverse of Σ c ,T + 6r ,T where



1

6c ,T ≡ lim  n(T − 1) n→∞

Cpb,nT

0kz ×p

+

1

σ02

plimn→∞

 0p×kz  0kz ×kz

1 n( T − 1 )

× MJ Q,nT (L∗n,T −1 , Z∗n,T −1 ),

(L∗n,T −1 , Z∗n,T −1 )′ (21)

2 27 The P nj,T −1 involves the true parameter λ0 , σ0 and µ4 , where λ0 can be estimated from Theorem 1 and the moments parameters σ02 and µ4 can be consistently estimated as explained in footnote 23. 28 The SDPD model considered in Yu et al. (2008) has one spatial lag, but the argument for the bias order of the QMLE will be applicable to the general case with multiple spatial lags. 29 The IV moments use first moment property of the disturbances to set up an

estimation framework, and quadratic moments use the second moment property. Thus, the ‘‘best’’ GMM is referred to the best choice of linear and quadratic moments. Compared with QMLE, some linear combinations of the first and second moments of GMME characterize the first order conditions of QMLE. Such linear combinations give rise to the efficient GMM estimator under the normal disturbances, but they are not the best combinations under other distributions rather than normal. Other GMM estimators based on additional moments higher than the first two might be possible to provide more efficient estimate than the BGMME currently established in our paper. For the SAR model for cross section data, this has been investigated in Liu et al. (2010).

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

with MJ Q,nT = Jn,T −1 Qn,T −1 (Q′n,T −1 Jn,T −1 Qn,T −1 )−1 Q′n,T −1 Jn,T −1 , and Σ r ,T is the component associated with RnT so that −1 −1 −1 6r ,T = lim R′nT ΣnT RnT + R′nT ΣnT DnT + D′nT ΣnT RnT





the GMM estimate, i.e., a larger number of moments might increase the bias of an IV estimator but decrease its variance.31 The 2SLS estimate of (3) with many moments is

n→∞



=



∗ −1

1 bmp [Cpp ] bmp lim ′ ∗ −1 T 2 n→∞ bmz [Cpp ] bmp

+ +

1 T 1 T

bmp [C∗pp ]−1 Cpb,nT lim ′ ∗ −1 b n→∞ bmz [Cpp ] Cp,nT  b′ ∗ −1 Cp,nT [Cpp ] bmp lim n→∞ 0kz ×p

where C∗pp =



µ4 −3σ04 σ04





∗ −1

bmp [Cpp ] bmz b′mz [C∗pp ]−1 bmz 0p×kz 0kz ×kz



θˆ2sl,nT

  T −1  ∗ ∗ ′ ∗ × (Wn Ynt , Znt ) Mnt Ynt ,

Cpb,′nT [C∗pp ]−1 bmz 0kz ×kz

′ ′ where Mnt = Jn Hnt (Hnt Jn Hnt )+ Hnt Jn . When T and pn are finite, the

′ ωnm ,T ωnm,T + ∆mn,T is related to the variance

IV in (6) and quadratic matrices IT −1 ⊗ (Wnj − n−1 Jn ) and IT −1 ⊗ n −1

Jn ) for j = 1, 2.

Step 2: By using θ˜ , obtain the best linear IV from (19) and the best quadratic matrices from (15). Then implement the GMM estimation. 3.2. Asymptotic properties of GMME with many moments When T is finite, the IVs from all the available time lags variables may improve, in principle, the asymptotic efficiency of the estimators. When T is moderate or large, however, the many moment issue will appear. In the literature on IV and GMM estimation with many moment conditions, e.g., in nonlinear simultaneous equations models or conditional moments restrictions models, many moments decrease the variances of the IV or GMM estimates, but increase their biases (see Bekker, 1994; Donald and Newey, 2001; Chao and Swanson, 2005; Han and Phillips, 2006, etc.). In a simple dynamic panel data model with fixed effects, when T is moderately large, but small relative to n, Alvarez and Arellano (2003) study the many IV estimation and its asymptotic properties. In this section, we use the moment condition in (9) where the dimension of Hnt might increase with t (and also increase with pn , where pn is the order of spatial power series expansion of Gnj ). We investigate the asymptotic properties of the 2SLS and GMM estimators for this approach. 3.2.1. Consistency, asymptotic normality and efficiency of 2SLSE For the many moment approach, we can use the IV matrix Hnt = (hnt , Wn hnt , . . . , Wnpn hnt )

(23)

t =1



trWnj

2) tr (Wnj

 −1 T −1  = (Wn Ynt∗ , Znt∗ )′ Mnt (Wn Ynt∗ , Znt∗ ) t =1



matrix of the best quadratic moments, and bmp and bmz are defined from (33). Hence, for the GMM estimation using finite number of moments, we can follow the following two steps. Step 1: Obtain an initial consistent estimate θ˜ with the linear

(Wnj2 −

181

(22)

motivated by (17),30 where, if Wnj ’s are not row-normalized, hnt = (Yn0 , . . . , Yn,t −1 , Xn1 , . . . , XnT , ln ) with its column dimension pt = kx T + t + 1; otherwise, ln shall be dropped from hnt if Wnj ’s are row-normalized. The pn (respectively, pt ) needs to increase as n (respectively, t and T ) increases in order to provide adequate approximation to the best theoretical IV. Therefore, the dimension of Hnt is Kt = pt · (1 + p + p2 + · · · + ppn ). The choice of many moments might have a trade-off between the bias and variance of

30 There are some technical difficulties in the presence of many IVs which involve estimated parameters in the literature, which is also true for our model. Hence, it is desirable to avoid it by using IVs which do not involve estimated parameters.

θˆ2sl,nT is consistent and asymptotically normal with the limiting variance matrix

 σ plimn→∞ 2 0

 −1 T −1  ∗ ∗ ′ ∗ ∗ (Lnt , Znt ) Mnt (Lnt , Znt ) .

1

n(T − 1) t =1

When T or pn is large, we need to consider the many moment issue as those many moments will introduce bias to the estimates. In the following, we will focus on the asymptotic analysis of the many moment case where both T and pn are large.32 Assumption 10. Both T and pn → ∞ as n → ∞. Assumption 10 specifies that, many moments in Hnt come out not only from the spatial power series expansion (pn → ∞) but also from the inclusion of lagged values (T → ∞). As we use a finite number of quadratic moment conditions in the SDPD model, we pay special attention to the linear moments. The additional quadratic moment conditions will not complicate the asymptotic analysis as the two sets of moments do not interact with each other. ∗ The Jn fnt where fnt = E(Wn Ynt , Znt∗ |It −1 ) is the best IV for ∗ Jn (Wn Ynt , Znt∗ ). From (4) and (16), we have (Wn Ynt∗ , Znt∗ ) = fnt + unt where ∗ ∗ ∗ fnt = [Gn (E(Znt |It −1 )δ0 + αt0 ln ), E(Znt |It −1 )],

(24)

unt = [Gn (φnt δ0 + Vnt ), φnt ],

(25)



(∗,−1)

(∗,−1)

∗ with E(Znt |It −1 ) = [E(Yn,t −1 |It −1 ), Wn E(Yn,t −1 |It −1 ), Xnt∗ ] and

φnt = −cTt [V˜ n,tT , Wn V˜ n,tT , 0n×kx ]. From (16) and (17), for t ≥ 2, the fnt can be approximated  t −1 t −1 1 1

by the variables: Yn,t −1 , t −1 s=1 Yns , t −1 s=1 Zns , the exogenous variables afterwards (Xnt , . . . , XnT ), the n × 1 vector ln , and their products with spatial power series expansions.33 As the elements in Hnt contain spatial series of hnt , the many moments via (22) come out from both spatial and time dimensions. The fnt can be well approximated by some linear combination of Hnt when t is far from the initial period.

31 For a simple dynamic panel data without exogenous variables and time effects, we have Kt = pt = t so that the total number of IVs is equal to T (T − 1)/2. With exogenous variables and spatial lags, the total number of IVs would be much larger. 32 When only T is large but p is not large, the asymptotic variance matrix of the n

θˆ2sl,nT is not clear. On the other hand, when T is finite, we do not need to use spatial expansion for Gnj ’s. With an initial consistent estimate, one can estimate Gnj ’s and construct a finite number of IVs with them. However, we might not use estimated Gnj ’s due to technical difficulty when T becomes large and there are many moments due to the time dimension. With both T and pn being large, we can follow the asymptotic setting in Donald and Newey (2001) where the many moments would approximate the conditional mean of endogenous regressors. 33 For t = 1, f can be approximated by the spatial power series expansion of Y , n1

Xn1 , . . . , Xn,T −1 and ln .

n0

182

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197 1 ˆ (iii) let θˆ2sl ,nT = θ2sl,nT −

Thus, the 2SLSE in (23) can be written as



n(T − 1)(θˆ2sl,nT − θ0 )  −1 T −1  1 ′ = (fnt + unt ) Mnt (fnt + unt ) n( T − 1 ) t = 1   T −1  1 × √ (fnt + unt )′ Mnt Vnt∗ . (26) n(T − 1) t =1 T −1 ′ From Lemma 5, plimn,T →∞ n(T1−1) t =1 fnt fnt = plimn,T →∞ Σ nT ,22

is the probability limit of the first component inside the inverse in ∗ (26). As unt and Vnt are correlated, the second component in (26) has a non-zero mean. Denote

σ2

T −1 

0 b1,λj = √ [tr(Mnt Gnj )], n(T − 1) t =1

 σ02 1 tr b2,λj = − √ n(T − 1) t =1 T + 1 − t   ×

′−1

Mnt CnTt Sn Gnj

γ 0 In +

p 

 Wnj ρj0

,

j =1

b2,γ = − √

T −1  1 σ02 ′ tr(Mnt CnTt Sn′−1 ) n(T − 1) t =1 T + 1 − t

 σ02 1 ′ b2,ρk = − √ tr(Mnt CnTt Sn′−1 Wnk ) n( T − 1 ) t = 1 T + 1 − t T −1

T −t

h=1

hAhn−1 .

Theorem 3. Suppose we use many linear moments in (22) and n, T → ∞. Under Assumptions 1–7, 9 and 10 and

T −1

K t =1 t n(T −1)

the 2SLS θˆ2sl,nT in (26) is consistent and  ˆ ]−1 · (ϕ1 + ϕ2 ) n(T − 1)(θˆ2sl,nT − θ0 ) − [H   T −1 √ Kt    d  t =1 1 + Op  √  → N (0, σ02 plimn,T →∞ 6− nT ,22 ),  n(T − 1) 

→ 0,

(27)

(Wn Ynt∗ , Znt∗ )′ Mnt (Wn Ynt∗ , Znt∗ ),   T −1 K t    t =1  ϕ1 = (b1,λ1 , . . . , b1,λp , 01×(kx +p+1) )′ = Op  √ ,  n(T − 1)  ˆ = 1 where H n(T −1)

 T −1 t =1

From Theorem 3, we see that the 2SLS estimate might not be T −1



T −1

t ϕ n(T − 1), √n(T1−1) is of the order n(tT=−1 1) . The ϕ2 is caused ∗ ∗ by the correlation of Znt and Vnt after the data transformation ϕ to eliminate individual effects, and √n(T2−1) is of the order n(TK−1)

by

(i) if

T −1

Kt

√ t =1 → n(T −1) −1 plimn,T →∞ 6nT ,22 ); 

(ii) if



T −1 K t =1 t

√ 0, then

→ 0. For this requirement to hold, it is implicit that T has

3.2.2. Consistency, asymptotic distribution and efficiency of GMME To increase the efficiency of estimates, quadratic moment conditions can be included as those in Section 3.1. Thus, the moment conditions are (9) where Hnt takes the form in (22). Similar to Section 3.1, the variance matrix of these moment conditions can be approximated by 1

+ d

n(T − 1)(θˆ2sl,nT − θ0 ) → N (0, σ02

1 · ϕ → ( , σ02 plimn,T →∞ 6− nT ,22 );

∆nm,T

0(Σ T −1 K )×m t =1 t

→ c where c is a positive finite constant and √ → 0 as T → ∞, then n(T − 1)(θˆ2sl,nT − θ0 ) −

n(T −1) max{Kt :t =1,...,T −1} T −1 K t =1 t d −1 H N 0 1

K

to be small relative to n. For an AR(1) dynamic panel data as discussed in Alvarez and Arellano (2003, p. 1222), contrary to structural equation setting where too many IVs cause bias and produce undesirable closeness to OLS estimates, the large number of IVs is associated with a larger value of T and the closeness to OLS estimate is desirable because the endogeneity bias tends to zero when T tends to infinity. For a spatial dynamic panel data, the endogeneity bias has the two components ϕ1 and ϕ2 . The component ϕ2 will behave similarly to the bias in the GMM estimates in Alvarez and Arellano (2003), where a larger T will cause ϕ2 smaller. However, the component ϕ1 comes from the endogeneity of spatial lag terms, will behave similarly to the conventional structural equation setting and a larger T does not cause ϕ1 to be smaller. Instead, a larger T gives rise to more IVs so that 2SLSE of λ0 is closer to OLSE of λ0 , which implies that ϕ1 is larger when T is larger.

 n(T − 1) ΣnT = σ04 

ϕ2 = (b2,λ1 , . . . , b2,λp , b2,γ , b2,ρ1 , . . . , b2,ρp , 01×kx )′   T −1  1 Kt = Op √ . n(T − 1) t =1 (T + 1 − t )(T − t ) Consequently, 

Kt

consistent if we have too many moments such that n(tT=−1 1) is not small. Here, the bias ϕ1 in the asymptotic expansion is caused by the endogeneity of the spatial lag, where, after being rescaled



and

[ˆ]

d

1 θ0 ) → N (0, σ02 plimn,T →∞ 6− nT ,22 ).

T −1 Kt √ t =1 n(T −1)

and

1 with CnTt = T − t

be a bias corrected

estimate, where ϕˆ 1 is estimated ϕ1 with θˆ2sl,nT . Then, under the T −1 √ √ Kt 1 setting in (i), or (ii) and √tn=(1T −1) → 0, n(T − 1)(θˆ2sl ,nT −

where K = max{Kt : t = 1, . . . , T − 1}. Thus, the dominant asymptotic bias of the estimate is caused by the endogeneity of the spatial lag term rather than the dynamic lag term. However, after the bias correction, the dominating bias ϕ1 can be eliminated. Comparing the asymptotic distribution of the bias corrected IV estimate in Theorem 3 to that of the IV component with a finite number of moments in Theorem 2, we see that they have the same asymptotic distribution and, thus, both can asymptotically attain the best IV estimate. The asymptotic efficiency of the many IV estimate, however, requires ratio conditions, in particular, that

T −1



ˆ −1 ϕˆ 1 √ 1 H n(T −1)

1 n(T − 1)





0m×(Σ T −1 K ) t =1 t 1

1



σ02 n(T − 1)

′ (µ4 − 3σ04 )ωnm ,T ωnm,T

0(Σ T −1 K )×m t =1 t

Hn,T −1 Jn,T −1 Hn,T −1

∗ 0(Σ T −1 K )×(Σ T −1 K ) t =1 t t =1 t



 

, (28)

where Hn,T −1 = Diag(Hn1 , . . . , Hn,T −1 ) is the block diagonal matrix with Hnt in the tth diagonal block. Because ΣnT is block diagonal from (28) with ΣnT = Diag(ΣnT ,1 , ΣnT ,2 ), the −1 ′ optimal GMM has the objective function gnT (θ )ΣnT gnT (θ ) = −1 −1 ′ ′ gnT ,1 (θ )ΣnT ,1 gnT ,1 (θ ) + gnT ,2 (θ )ΣnT ,2 gnT ,2 (θ ) with gnT (θ ) = ′ ′ ′ (gnT ,1 (θ ), gnT ,2 (θ )) so that gnT ,1 (θ ) is the quadratic moment in (9),

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

gnT ,2 (θ ) is the linear moment in (9). The BGMME with many IVs can have the same asymptotic distribution as that in Theorem 2 when T is large. Theorem 4. Suppose we use many moment conditions in (9) with ˆ ∗n,T −1 estimated from (15). When n, T → ∞, Hnt in (22) and P

under Assumptions 1–7 and 9 and 10, the feasible BGMME θˆb,nT is  T −1 K t =1 t n(T −1)

consistent under

→ 0, and



n(T − 1)(θˆb,nT − θ0 ) − [σ02 Σ b ]−1 · (ϕ1 + ϕ2 )



√

as the IV matrix for t = 1, . . . , T ; the GMME uses addition-

Kt    t =1  d 1 + Op  √  → N (0, 6− b ),  n(T − 1) 

tr(W J )

where 6b is in (20). 2 ˆ Σ b ]−1 ϕˆ 1 , under the setting in TheLet θˆb1,nT = θˆb,nT − √n(1T −1) [σˆ nT orem 3(iii), the bias corrected BGMME θˆb1,nT has d

θ0 ) → N 0, 6b 



n(T − 1)(θˆb1,nT −

 −1

.

Hence, for the GMM estimation under the separate approach, we can follow the following two steps. Step 1: Obtain an initial consistent estimate θ˜ with the linear IV in (8). Step 2: By using θ˜ , obtain the best quadratic IV from (15). Then implement the GMM estimation. 4. Monte Carlo We run simulations to investigate the performance of 2SLSEs and GMMEs in Sections 3.1 and 3.2 under different values of n, T . We also compare them with those of the QMLE in Yu et al. (2008). We first investigate the case with one spatial lag, then followed by the two spatial lags. Some further investigations on relevant aspects are also covered. 4.1. One spatial lag case Samples are generated from Ynt = λ0 Wn Ynt + γ0 Yn,t −1 + ρ0 Wn Yn,t −1 + Xnt β0

+ cn0 + αt0 ln + Vnt ,

t = 1, 2, . . . , T ,

= (0.2, 0.2, −0.2, 1), θ0b = (0.2, 0.8, −0.2, 1) where θ0 = (λ0 , γ0 , ρ0 , β0′ )′ . Hence, γ0 takes the values from 0.2 to 0.8 and other parameters are held constant.34 The Xnt , cn0 , αt0 and using θ

is a rook matrix with row-normalization.36 We use T = 5, 20, and n = 100, 900. For each set of generated sample observations, we calculate the GMM estimator θˆnT and evaluate the bias θˆnT −θ0 . We 1000 1 ˆ do this 1000 times to get the empirical bias 1000 i=1 (θnT − θ0 )i . With different values of θ0 for each n and T , finite sample properties of these estimators are summarized in Tables 1–3. For each case, we report the bias, empirical standard deviation (SD), theoretical SD (T-SD)37 and empirical root mean square error (RMSE). Table 1 corresponds to finite number of moment conditions in (7), where 2SLSE uses

[Yn,t −1 , Wn Yn,t −1 , Wn2 Yn,t −1 , Xnt∗ , Wn Xnt∗ , Wn2 Xnt∗ ]



T −1

183

tr(W 2 J )

ally IT −1 ⊗ (Wn − n−n1n Jn ) and IT −1 ⊗ (Wn2 − n−n1 n Jn ) for ˜ nt in (19) as the IV maquadratic moments. The BGMME uses Q trix in linear moments and the quadratic moments are from (15), with initial estimates for (15) and (19) obtained from the GMME. Tables 2 and 3 are the 2SLSE, BGMME with many moments, and MLE. For 2SLSE and BGMM with many moments, the IV matrices are Yn0 , . . . , Yn,t −1 , Xn1 , . . . , XnT and their two spatial lags; for the BGMM with many moments, the quadratic moments are from (15) with initial estimates from 2SLSE with many moments. All the GMMEs are optimum ones as inverses of their variance matrices are used for weighting. The MLE is obtained from a partial likelihood constructed after the elimination of time effects under the situation that Wn is row-normalized. Table 2 is before bias correction and Table 3 is after bias correction. From Table 1 for the 2SLSE and GMME, biases are small for all the estimates. For both 2SLSE and GMME, as T increases, SDs decrease; as γ0 increases, biases and SDs for the estimate of γ0 increase. The GMME of λ0 has a smaller SD than does the 2SLSE of λ0 such that SDs can be reduced by 40% on average; but for other estimates, the reduction in SD is slight. The BGMMEs also have small biases. When n and T increase or γ0 decreases, SDs will be smaller. The BGMMEs have smaller SDs than do GMMEs except for item (5).38 From Table 2, the 2SLSE and BGMME with many IVs have some biases for the estimate of λ0 , γ0 , and ρ0 when T is small. The biases for γ0 are smaller when n and T are larger or γ0 is smaller, while those of λ0 will be larger when T is larger.39 Also, SDs for those estimates are smaller when T is larger. The QMLE also has some bias when T is small especially for γ0 . From Table 3, after the bias correction, the biases of 2SLSE are smaller. The SDs are smaller for the estimate of λ0 when T is large, but those for other estimates are ambiguous.40 Compared with the 2SLSE in Table 1, the 2SLSE

a 0

Vnt are generated from independent standard normal distributions. Throughout the simulation results, the initial observations are generated with a similar pattern.35 The spatial weights matrix Wn

36 We use the rook matrix based on an r board (so that n = r 2 ). The rook matrix represents a square tessellation with a connectivity of four for the inner fields on the chessboard and two and three for the corner and border fields, respectively. Most empirically observed regional structures in spatial econometrics are made up of regions with connectivity close to the range of the rook tessellation. 37 In constructing the T-SD for 2SLSE and GMME, we need to estimate L∗ in (10). nt

34 In a supplement file available upon request (see Appendix E), we also run the simulation where λ0 changes from 0.2 to 0.8 and other parameters are held constant. The results are similar. 35 We generated the spatial panel data with 20 + T periods where the starting

value is from N (0, In ), and then take the last T periods as our sample. By doing so, the initial value in the estimation is close to the steady state. Throughout the simulation results, the initial observations are generated with a similar pattern. Alternatively, we can use the first period simulated data as the initial observation in the estimation sample (so that the process is away from its steady state); the simulation results are similar so that the initial condition does not affect the performances of estimates in the current simulation study. These simulation results are provided in the supplement file available upon request (see Appendix E). In addition to different initial values in the data, for the supplement file, we have also run simulations where (a) the exogenous variables are kept fixed in each DGP or (b) the exogenous variables are correlated with each other. The basic results are similar.

∗ As it involves estimation of time effects for each period, we use Wn Ynt in place of L∗nt , which is practical and also provides a consistent estimate of the limiting variance. As seen from the Monte Carlo results, the T-SD is close to the empirical SD. 38 This occurs mainly for the SD of the λ estimate when T is small and γ is large. 0

We see also in this situation the Hessian estimation of SDs is inaccurate. When T is larger or with a smaller γ0 , such inaccuracy does not appear. 39 As is explained after Theorem 3, the many IVs cause the 2SLSE close to the OLSE.

For γ , its associated endogeneity bias of OLSE tends to zero when T is large. For λ, its endogeneity bias of OLSE is similar to the conventional structural equation setting and will not decrease when T increases; however, a larger T produces more IVs so that 2SLSE is closer to OLSE, which implies that the bias of 2SLSE for λ0 is larger when T is larger. 40 Unlike the QMLE in Lee and Yu (2010), the bias correction for the 2SLSEs and

GMMEs is applied to the estimation of λ0 only, but not for γ0 , ρ0 and β0 . Due to the presence of spatial interactions, the bias correction for these low order biases of γ0 , ρ0 and β0 (compared to the bias of λ0 ) is hard to track.

184

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

Table 1 2SLS, GMME and BGMME using a finite number of IVs. 2SLSE n (1)

(2)

(3)

(4)

(5)

(6)

(7)

(8)

100

100

900

900

100

100

900

900

θ0

T 5

20

5

20

5

20

5

20

θ0a

θ0a

θ0a

θ0a

θ0b

θ0b

θ0b

θ0b

GMME

λ

γ

Bias SD T-SD RMSE

−0.0001

−0.0019

ρ

0.1026 0.1010 0.1026

Bias SD T-SD RMSE

β

λ

−0.0010

0.0644 0.0647 0.0644

0.0018 0.1231 0.1194 0.1231

−0.0007

−0.0010

0.0448 0.0439 0.0448

0.0240 0.0239 0.0240

Bias SD T-SD RMSE

0.0008 0.0322 0.0335 0.0322

−0.0012

Bias SD T-SD RMSE

BGMME

γ

ρ

−0.0047

0.0547 0.0530 0.0547

0.0010 0.0651 0.0560 0.0651

−0.0002

−0.0011

0.0454 0.0455 0.0454

0.0232 0.0231 0.0232

0.0014 0.0398 0.0394 0.0398

−0.0003

0.0211 0.0214 0.0211

−0.0008

−0.0002

0.0149 0.0147 0.0149

0.0079 0.0080 0.0079

Bias SD T-SD RMSE

−0.0039

−0.0366

0.2319 0.2355 0.2319

Bias SD T-SD RMSE

β

λ

−0.0018

0.0631 0.0643 0.0633

0.0023 0.1231 0.1173 0.1231

−0.0001

−0.0011

−0.0003

−0.0012

0.0255 0.0254 0.0255

0.0240 0.0239 0.0240

0.0450 0.0453 0.0450

0.0232 0.023 0.0232

0.0001 0.0207 0.0188 0.0207

−0.0016 0.0210 0.0213 0.0210

0.0017 0.0396 0.0390 0.0397

−0.0004

0.0178 0.0175 0.0178

−0.0001

−0.0001

−0.0005

−0.0002

0.0151 0.0152 0.0151

0.0079 0.0077 0.0079

0.0086 0.0085 0.0086

0.0079 0.0080 0.0079

0.0086 0.3516 0.3647 0.3517

−0.0156 0.1296 0.1255 0.1305

0.0042 0.1242 0.0625 0.1243

−0.0500

0.2781 0.2674 0.2805

−0.0027

−0.0038

−0.0056

−0.0018

0.0489 0.0478 0.0490

0.0486 0.0447 0.0488

0.0838 0.0804 0.0839

0.0255 0.0251 0.0256

Bias SD T-SD RMSE

0.0022 0.0637 0.0626 0.0637

−0.0051

0.0032 0.1007 0.0968 0.1008

−0.0021

0.0731 0.0711 0.0733

Bias SD T-SD RMSE

−0.0010

−0.0002

0.0159 0.0156 0.0159

0.0142 0.0142 0.0142

γ

ρ

0.0016 0.0623 0.0559 0.0624

−0.0052

0.0003 0.0252 0.0254 0.0252

−0.0004

0.0002 0.0193 0.0187 0.0193

−0.0008

0.0177 0.0175 0.0177

−0.0002

−0.0001

−0.0005

−0.0001

0.0150 0.0152 0.0150

0.0079 0.0077 0.0079

0.0086 0.0085 0.0086

0.0114 0.1970 0.2216 0.1973

−0.0203

0.1978 0.2069 0.2040

−0.0008

−0.0044

−0.0039

−0.0020

0.0301 0.0260 0.0301

0.0459 0.0441 0.0461

0.0725 0.0785 0.0726

0.0253 0.0249 0.0253

0.0019 0.0532 0.0212 0.0533

−0.0095 0.0694 0.0673 0.0700

0.0063 0.0896 0.0733 0.0898

−0.0039

0.0348 0.034 0.0348

−0.0002

−0.0001

−0.0007

−0.0003

0.0249 0.0259 0.0249

0.0085 0.0082 0.0085

0.0100 0.0087 0.0100

0.0141 0.0141 0.0141

β 0.0013 0.1033 0.1019 0.1034

−0.0013

0.0007 0.0364 0.0360 0.0364

−0.0010

0.0011 0.0324 0.0337 0.0324

−0.0001

0.0000 0.0121 0.0121 0.0121

−0.0001

0.0063 0.0062 0.0063

0.0005 0.1860 0.0600 0.1860

−0.0183

−0.0041

−0.0070

0.1448 0.1235 0.1459

0.2115 0.1804 0.2115

0.0797 0.0728 0.0800

0.0005 0.0278 0.0257 0.0279

−0.0029

0.0017 0.0508 0.0524 0.0509

−0.0013

0.0011 0.0302 0.0199 0.0302

−0.0053 0.0342 0.0325 0.0346

0.0023 0.0478 0.0505 0.0478

−0.0020

0.0333 0.0322 0.0336

−0.0003

−0.0001

−0.0005

0.0230 0.0257 0.0230

0.0084 0.0082 0.0084

0.0091 0.0086 0.0091

0.0001 0.0085 0.0083 0.0085

0.0001 0.0146 0.0164 0.0146

0.0000 0.0080 0.0078 0.0080

0.0540 0.0525 0.0540

0.0969 0.0994 0.0990

0.0532 0.0549 0.0535 0.0180 0.0187 0.0180 0.0187 0.0182 0.0187

0.0269 0.0262 0.0270

0.0535 0.0518 0.0535 0.0231 0.0230 0.0231 0.0178 0.0172 0.0178 0.0078 0.0077 0.0078

0.0235 0.0236 0.0235 0.0227 0.0213 0.0228

Note: 1. θ0a = (0.2, 0.2, −0.2, 1) and θ0b = (0.2, 0.8, −0.2, 1). The disturbance in DGP is from the standard normal distribution. ∗ 2. For 2SLSE and GMME, the IV matrix is [Yn,t −1 , W n Yn,t −1 , W 2n Yn,t −1 , X ∗nt , W n Xnt , Wn2 Xnt∗ ]. 3. For GMME, the quadratic matrices are IT −1 ⊗ (Wn −

tr(Wn Jn ) Jn n−1

) and IT −1 ⊗ (Wn2 −

tr(Wn2 Jn ) Jn n− 1

).

For BGMME, the IV matrix is Qˆ nt in (19) and the quadratic matrix is estimated from (15).

Table 2 2SLS and BGMM using many moments, and MLE: before bias correction. 2SLSE before bias correction n (1)

(2)

(3)

(4)

(5)

(6)

100

100

900

900

100

100

θ0

T 5

20

5

20

5

20

θ0a

θ0a

θ0a

θ0a

θ0b

θ0b

λ

γ

ρ

Bias SD T-SD RMSE

0.0507 0.0948 0.0834 0.1075

−0.0459

Bias SD T-SD RMSE

0.0769 0.0371 0.0312 0.0854

−0.0282

Bias SD T-SD RMSE

0.0105 0.0306 0.0320 0.0323

−0.0058

Bias SD T-SD RMSE

GMME before bias correction

β

λ

γ

0.0214 0.0821 0.0805 0.0848

−0.0130

0.0243 0.0673 0.0523 0.0716

−0.0471

0.0033 0.0313 0.0313 0.0314

−0.0060

0.0391 0.0324 0.0219 0.0508

−0.0297

0.0033 0.0279 0.0293 0.0281

−0.0019

0.0040 0.0195 0.0185 0.0199

−0.0062

0.0157 0.0157 0.0168

0.0311 0.0142 0.0135 0.0342

−0.0034

−0.0024

−0.0019

0.0115 0.0114 0.0118

0.0079 0.0077 0.0082

0.0115 0.0093 0.0082 0.0148

−0.0042

0.0060 0.0059 0.0069

Bias SD T-SD RMSE

0.0518 0.0998 0.0885 0.1125

−0.1072

0.0025 0.0935 0.0908 0.0935

−0.0451

0.0253 0.0717 0.0534 0.0760

−0.1078

0.0528 0.0513 0.1195

Bias SD

0.0793 0.0379

−0.0550

−0.0399

−0.0151

0.0129

0.0346

0.0234

0.0404 0.0328

0.0436 0.0441 0.0633 0.0160 0.0163 0.0324

0.0533 0.0507 0.0549 0.0232 0.0229 0.0240 0.0176 0.0171 0.0177

0.0559 0.0528 0.0718

ρ

MLE before bias correction

β

λ

γ

ρ

β

0.0235 0.0818 0.0802 0.0851

−0.0120

0.0083 0.0580 0.0498 0.0586

−0.1297

0.0480 0.0691 0.0622 0.0841

−0.0300

0.0072 0.0314 0.0312 0.0322

−0.0036

0.0016 0.0256 0.0247 0.0256

−0.0307

0.0110 0.0312 0.0305 0.0331

−0.0026

0.0043 0.0275 0.0293 0.0279

−0.0016

0.0073 0.0195 0.0167 0.0208

−0.1284

0.0473 0.0220 0.0208 0.0522

−0.0302

0.0006 0.0115 0.0114 0.0115

−0.0009

0.0008 0.0091 0.0083 0.0091

−0.0306

0.0105 0.0105 0.0102 0.0148

−0.0020

0.0180 0.0851 0.0879 0.0870

−0.0443

0.0069 0.0592 0.0509 0.0596

−0.2379

0.0494 0.0712 0.0629 0.0867

−0.0964

0.0524 0.0512 0.1199

−0.0553

−0.0135

−0.0127 0.0234

0.0119 0.0293

−0.0114

0.0326

0.0025 0.0261

−0.0539

0.0128

0.0432 0.0440 0.0639 0.0159 0.0163 0.0337 0.0157 0.0157 0.0168 0.0060 0.0059 0.0073

0.0529 0.0505 0.0543 0.0231 0.0229 0.0234 0.0176 0.0171 0.0176 0.0079 0.0077 0.0079 0.0555 0.0526 0.0710

0.0351 0.0327 0.1344 0.0157 0.0158 0.0345 0.0123 0.0109 0.1290 0.0054 0.0052 0.0311 0.0389 0.0324 0.2410 0.0122

0.0527 0.0448 0.0607 0.0231 0.0225 0.0233 0.0170 0.0149 0.0347 0.0079 0.0075 0.0081 0.0533 0.0446 0.1102 0.0233

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

185

Table 2 (continued) 2SLSE before bias correction n

(7)

900

(8)

θ0

T

900

5

20

θ0b

θ0b

λ

γ

ρ

GMME before bias correction

β

λ

γ

MLE before bias correction

ρ

β

λ

γ

ρ

β

T-SD RMSE

0.0314 0.0879

0.0124 0.0565

0.0316 0.0529

0.0229 0.0279

0.0220 0.0520

0.0124 0.0567

0.0277 0.0353

0.0229 0.0266

0.0249 0.0262

0.0116 0.0553

0.0282 0.0316

0.0225 0.0259

Bias SD T-SD RMSE

0.0116 0.0330 0.0347 0.0350

−0.0142

−0.0005

−0.0061

0.0526 0.0236 0.0210 0.0576

−0.0961

0.0200 0.0197 0.0247

0.0068 0.0207 0.0170 0.0218

−0.2349

0.0195 0.0185 0.0205

0.0042 0.0301 0.0347 0.0304

−0.0059

0.0356 0.0359 0.0356

0.0053 0.0230 0.0190 0.0236

−0.0144

0.0201 0.0197 0.0246

Bias SD T-SD RMSE

0.0317 0.0143 0.0136 0.0348

−0.0103

−0.0199

−0.0037

−0.0047

−0.0027

0.0054 0.0053 0.0118

0.0113 0.0114 0.0122

0.0079 0.0077 0.0083

0.0121 0.0098 0.0094 0.0156

−0.0106

0.0080 0.0077 0.0088

0.0017 0.0090 0.0083 0.0091

−0.0529

0.0140 0.0136 0.0243

0.0120 0.0095 0.0083 0.0153

−0.0105

0.0054 0.0053 0.0117

0.0195 0.0185 0.0203

0.0135 0.0108 0.2353 0.0041 0.0038 0.0530

0.0174 0.0149 0.0976 0.0079 0.0075 0.0132

Note: 1. θ0a = (0.2, 0.2, −0.2, 1) and θ0b = (0.2, 0.8, −0.2, 1). The disturbance in DGP is from the standard normal distribution. 2. The IVs are Yn0 , . . ., Yn,t −1 , Xn1 , . . ., XnT and their first two spatial lags for the period t. 3. For GMME, the quadratic matrix is estimated from (15). For MLE, we use Lee and Yu (2010).

Table 3 2SLS and BGMM using many moments, and MLE: after bias correction. 2SLSE after bias correction n (1)

(2)

(3)

(4)

100

100

900

900

θ0

T 5

20

5

20

θ0a

θ0a

θ0a

θ0a

(5)

100

5

θ0b

(6)

100

20

θ0b

(7)

(8)

900

900

5

20

θ0b

θ0b

GMME after bias correction

λ

γ

ρ

β

Bias SD T-SD RMSE

−0.0136

−0.0472

−0.0096

0.0693 0.0688 0.0706

0.0434 0.0367 0.0641

0.0166 0.0824 0.0700 0.0841

Bias SD T-SD RMSE

−0.0443

−0.0329

0.0202 0.0309 0.0487

0.0158 0.0162 0.0365

Bias SD T-SD RMSE

0.0006 0.0293 0.0232 0.0293

−0.0060

Bias SD T-SD RMSE Bias SD T-SD RMSE

−0.0043

−0.0048

0.0123 0.0104 0.0131 −0.0202 0.0693 0.0720 0.0722

0.0060 0.0054 0.0076 −0.1063 0.0528 0.0368 0.1187

Bias SD T-SD RMSE

−0.0450

−0.0559

0.0204 0.0311 0.0494

0.0128 0.0119 0.0574

Bias SD T-SD RMSE

0.0001 0.0313 0.0247 0.0313

−0.0139

Bias SD T-SD RMSE

−0.0046

−0.0105

0.0123 0.0105 0.0132

0.0054 0.0040 0.0118

0.0157 0.0124 0.0168

0.0200 0.0126 0.0244

λ

MLE after bias correction

γ

ρ

β

λ

γ

−0.0475

0.0219 0.0819 0.0803 0.0848

−0.0109

0.0056 0.0587 0.0504 0.0590

−0.0159

0.0528 0.0504 0.0537

0.0023 0.0604 0.0525 0.0604

0.0161 0.0315 0.0312 0.0354

−0.0006

−0.0112

−0.0316 0.0159 0.0164 0.0354

0.0014 0.0255 0.0249 0.0256

−0.0010

0.0252 0.0221 0.0276

0.0125 0.0314 0.0313 0.0338

−0.0014

0.0231 0.0230 0.0231

0.0029 0.0279 0.0236 0.0281

−0.0013

0.0008 0.0193 0.0185 0.0193

−0.0062

0.0041 0.0275 0.0293 0.0279

−0.0014

0.0046 0.0196 0.0170 0.0201

−0.0141

0.0176 0.0170 0.0177

0.0017 0.0116 0.0105 0.0117 0.0245 0.0913 0.0730 0.0945

−0.0001

−0.0006

−0.0047 0.0060 0.0059 0.0076 −0.1075 0.0524 0.0512 0.1196

0.0006 0.0091 0.0083 0.0091 0.0035 0.0630 0.0505 0.0631

−0.0009

0.0089 0.0082 0.0089 0.0025 0.0638 0.0537 0.0639

0.0020 0.0115 0.0114 0.0117 0.0250 0.0851 0.0880 0.0887

−0.0003

0.0079 0.0077 0.0079 −0.0414 0.0555 0.0507 0.0692

0.0449 0.0278 0.0313 0.0528

−0.0096

−0.0106

−0.0556

0.0255 0.0222 0.0276

0.0128 0.0124 0.0571

0.0213 0.0296 0.0278 0.0365

−0.0105

0.0234 0.0230 0.0253

0.0030 0.0355 0.0251 0.0356

−0.0055

0.0020 0.0227 0.0190 0.0227

−0.0143

0.0053 0.0302 0.0347 0.0306

−0.0057

0.0195 0.0174 0.0203

0.0050 0.0132 0.0105 0.0141

−0.0019

−0.0002

−0.0106

0.0090 0.0083 0.0090

0.0054 0.0053 0.0119

0.0036 0.0111 0.0114 0.0117

−0.0020

0.0079 0.0077 0.0082

0.0432 0.0440 0.0642

0.0157 0.0157 0.0169

0.0200 0.0197 0.0246

0.0529 0.0506 0.0540 0.0231 0.0230 0.0232 0.0176 0.0171 0.0176 0.0078 0.0077 0.0078 −0.0432 0.0555 0.0527 0.0703 0.0234 0.0230 0.0256 0.0195 0.0185 0.0203 0.0079 0.0077 0.0082

ρ

β 0.0065 0.0779 0.0680 0.0782

−0.0028

0.0009 0.0321 0.0313 0.0322

−0.0008

0.0050 0.0246 0.0228 0.0251

−0.0034

0.0003 0.0108 0.0105 0.0108 −0.0097 0.1034 0.0720 0.1038

−0.0001

0.0056 0.0054 0.0056 0.0019 0.0592 0.0355 0.0592

0.0019 0.0261 0.0249 0.0262

−0.0016

−0.0009

−0.0009

0.0133 0.0119 0.0134

0.0314 0.0294 0.0314

0.0232 0.0230 0.0232

0.0036 0.0215 0.0170 0.0218

0.0034 0.0201 0.0118 0.0204

−0.0067 0.0329 0.0241 0.0336

0.0014 0.0199 0.0163 0.0199

0.0010 0.0090 0.0083 0.0091

−0.0007

−0.0006

−0.0002

0.0045 0.0039 0.0046

0.0103 0.0098 0.0104

0.0079 0.0077 0.0079

0.0397 0.0358 0.0428 0.0162 0.0162 0.0162 0.0139 0.0119 0.0198

0.0528 0.0491 0.0529 0.0231 0.0230 0.0231 0.0174 0.0164 0.0178 0.0078 0.0077 0.0078 0.0024 0.0588 0.0488 0.0589

Note: 1. θ0a = (0.2, 0.2, −0.2, 1) and θ0b = (0.2, 0.8, −0.2, 1). The disturbance in DGP is from the standard normal distribution. 2. The IVs are Yn0 , . . ., Yn,t −1 , Xn1 , . . ., XnT and their first two spatial lags for the period t. 3. For GMME, the quadratic matrix is estimated from (15). For MLE, we use Lee and Yu (2010).

with many IVs (before or after bias correction) has larger biases, smaller SDs; for RMSEs, the 2SLSE of λ0 with many IVs has larger RMSE for the item (2) with n = 100 and T = 20, but smaller for other situations. For the BGMME, it has a similar performance as 2SLSE, but its SDs of the estimates of λ0 are smaller for most cases. Compared with the BGMME in Table 1, the BGMME with many moments has a larger RMSE for the estimate of γ0 when T = 20 and a smaller RMSE for the estimate of ρ0 for all cases; for the estimate of λ0 , the BGMM with many moments has a smaller RMSE when T = 5 or γ0 = 0.8, but a larger RMSE otherwise. Comparing the QMLE with the BGMME in Table 1 and the bias corrected BGMME

with many moments, except that QMLEs have larger biases than those of the GMMEs, QMLE is slightly better overall, especially when γ0 is large.41

41 We also investigate the finite sample performance of the estimators under settings of (i) a negative λ0 ; (ii) a denser Wn of queen matrix and (iii) heterogeneous xit across i. We find that the performance of estimators is unchanged under a negative λ0 or heterogeneous xit . However, when the spatial weights matrix Wn becomes denser, the 2SLS and GMM estimators have a larger bias and SD with many moments. For the 2SLS and GMM estimators with finite moments, the SD becomes larger for the estimator of λ0 and ρ0 but not for γ0 and β0 ; the bias for all estimators is not much changed.

186

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

Table 4 Two spatial lags: row-normalized spatial weights matrices. n, T (1)

(2)

(3)

(4)

(5)

(6)

(7)

(8)

λ1

100, 10

Bias SD T-SD RMSE

900, 10

Bias SD T-SD RMSE

100, 10

Bias SD T-SD RMSE

900, 10

Bias SD T-SD RMSE

100, 10

Bias SD T-SD RMSE

900, 10

Bias SD T-SD RMSE

100, 10

Bias SD T-SD RMSE

900, 10

λ2

2SLSE −0.0005 0.0628 0.0621 0.0628

γ

ρ1

ρ2

β

λ1

0.0020 0.0422 0.0434 0.0422

−0.0014

0.0012 0.0645 0.0654 0.0645

0.0008 0.0474 0.0467 0.0474

−0.0020

0.0385 0.0380 0.0386

0.0001 0.0140 0.0145 0.0140

0.0002 0.0126 0.0128 0.0126

0.0006 0.0233 0.0221 0.0233

0.0002 0.0152 0.0158 0.0152

−0.0006

−0.0002

−0.0022 0.0307 0.0311 0.0308

0.0009 0.0542 0.0550 0.0542

0.0020 0.0390 0.0390 0.0391

−0.0019

0.0261 0.0256 0.0261

0.0000 −0.0001 0.0122 0.0087 0.0120 0.0085 0.0122 0.0087 2SLSE-many 0.0738 0.0700 0.0528 0.0345 0.0433 0.0303 0.0907 0.0780

0.0000 0.0105 0.0104 0.0105

0.0006 0.0188 0.0185 0.0188

0.0000 0.0132 0.0131 0.0132

−0.0006

−0.0572

0.0244 0.0427 0.0438 0.0491

0.0247 0.0304 0.0311 0.0392

−0.0179

0.0051 0.0167 0.0166 0.0174

0.0036 0.0116 0.0118 0.0121

−0.0060

0.0233 0.0431 0.0438 0.0490

0.0233 0.0307 0.0311 0.0385

−0.0118

0.0062 0.0166 0.0167 0.0177

0.0046 0.0116 0.0118 0.0125

−0.0035

0.0001 0.0207 0.0208 0.0207 BGMME 0.0004 0.0378 0.0358 0.0378

0.0283 0.0195 0.0189 0.0344 GMME-many 0.0404 0.0475 0.0306 0.0624

Bias SD T-SD RMSE

0.0112 0.0132 0.0116 0.0173

0.0237 0.0239 0.0619

0.0260 0.0133 0.0133 0.0292

−0.0087

0.0387 0.0324 0.0217 0.0505

−0.0600

0.0104 0.0092 0.0082 0.0139

−0.0105

0.0093 0.0092 0.0127

0.0236 0.0239 0.0645 0.0092 0.0092 0.0140

0.0342 0.0342 0.0343 0.0113 0.0114 0.0113

λ2

γ

ρ1

ρ2

β

0.0026 0.0648 0.0649 0.0649

0.0014 0.0470 0.0464 0.0470

−0.0022

0.0384 0.0378 0.0385 0.0000 0.0125 0.0128 0.0125

0.0007 0.0231 0.0221 0.0231

0.0003 0.0151 0.0157 0.0151

−0.0007

0.0206 0.0433 0.0441 0.0480

0.0201 0.0310 0.0313 0.0369

−0.0022

−0.0024 −0.0113 0.0170 0.0122 0.0093 0.0146 0.0101 0.0080 0.0173 0.0124 0.0146 Bias-corrected GMME-many −0.0125 −0.0078 −0.0645 0.0373 0.0269 0.0234 0.0311 0.0221 0.0240 0.0393 0.0280 0.0686

0.0048 0.0168 0.0148 0.0174

0.0033 0.0117 0.0104 0.0121

−0.0015

0.0217 0.0433 0.0441 0.0484

0.0213 0.0308 0.0313 0.0375

−0.0052

−0.0114

0.0061 0.0167 0.0167 0.0177

0.0045 0.0116 0.0118 0.0125

−0.0019

GMME 0.0008 0.0385 0.0360 0.0385

0.0003 0.0260 0.0257 0.0260

−0.0027

0.0000 0.0125 0.0121 0.0125

0.0000 0.0087 0.0085 0.0087

0.0339 0.0337 0.0339

0.0341 0.0339 0.0342 0.0113 0.0113 0.0113

.

0.0112 0.0112 0.0112

0.0342 0.0333 0.0386 0.0113 0.0112 0.0128

0.0338 0.0333 0.0358 0.0112 0.0112 0.0117

Bias-corrected 2SLSE-many −0.0553 −0.0377 −0.0677 0.0294 0.0230 0.0233 0.0436 0.0305 0.0240 0.0626 0.0442 0.0716

−0.0031

0.0005 0.0126 0.0116 0.0126

0.0003 0.0089 0.0083 0.0089

0.0092 0.0092 0.0147

0.0336 0.0338 0.0337 0.0112 0.0112 0.0113

0.0337 0.0334 0.0341 0.0112 0.0112 0.0113

Note: We use θ0a = (0.2, 0.2, 0.2, −0.2, −0.2, 1). The disturbance in DGP is from the standard normal distribution.

4.2. Two spatial lags case Following are Monte Carlo results with high order spatial lags (p = 2). We use a rook matrix Wn1 and an income-based matrix Wn2 . We generate the income randomly by a uniform distribution from (0, 1) with a multiplicative factor depending on sample size n,42 and the weights w2,ij for i ̸= j is constructed from (d −1d )2 i

j

where di is the generated income.43 In the following, Table 4 is the result with row-normalized Wn1 and Wn2 while Table 5 is the result with the non-row-normalized Wn2 . The true parameter vector is θ0 = (λ10 , λ20 , γ0 , ρ10 , ρ20 , β0 ) = (0.2, 0.2, 0.2, −0.2, −0.2, 1). We use [Wn , Wn2 ] for spatial power series expansion in constructing IVs. From Table 4, we see that the basic results are similar to the case of one spatial lag with a finite number of moments. Due to inclusion of more spatial lags, the biases of the estimates for spatial lag coefficients are larger for the separate moment approach. This

42 If we do not row-normalize W , the income is generated from a uniform n2 distribution from (0, 1) multiplied by 0.75 · n2 for n = 100 and multiplied by 3.2 · n2 for n = 900. The purpose of doing so is to control the magnitude of the eigenvalues of Wn2 so that the magnitude of spatial lags will not cause the precision problem in computation. 43 For this two spatial weights matrices case where W is not row-normalized, n2

is so because for the high order spatial lag case, the number of IVs is larger. With a larger n, the magnitude of the biases decreases. From Table 5, with the non-row-normalized Wn2 , the corresponding estimated λ20 has a large standard deviation compared to the rownormalized Wn2 as expected due to the fact that the row sums of weights for the former have smaller magnitude. Other estimates perform similarly to the case with row-normalized Wn2 . Therefore, implications for the cases with or without row-normalization are similar overall.44 For the case with two spatial lags, instead of [Wn , Wn2 ] in constructing IVs in the main paper, we also use [Wn , Wn2 , . . . , Wn5 ] in constructing IVs in our simulation. Under such a setting, the number of IVs would be large when T is not small. For example, for a row-normalized weights matrices case, when kx = 1 and p = 2 as in the DGP, the number of IVs at time period t is (kx + t )(1 + p + p2 +· · ·+ p5 ) = 31(t + 1), which is not small relative to n. Table 6 is the result for different estimation methods using row-normalized weights matrices, while Table 7 is the result with non-row-normalized Wn2 . From Tables 6 and 7, we see that the basic results are similar to the case of one spatial lag with a finite number of moments. Compared to Tables 4 and 5 using [Wn , Wn2 ] in constructing IVs, the biases are larger on average as we use a larger number of IVs from spatial expansion. Thus, in the high order spatial lag case, a relatively smaller number of IVs from the spatial power series expansion is recommended.

the maximum and minimum eigenvalues of Wn1 and Wn2 are [−0.9972, 1] and [−0.9303, 0.9303], respectively. By searching over (−∞, ∞) for xj as a reparameterization via λj =

(− |wj,1min | ,

1 wj,max

).

1 1 wj,max − |wj,min | exp(−xj )

1+exp(−xj )

, we equivalently search λj over

44 We also investigate a different parameter values case and the results are similar. Details are in a supplement file available upon request (see Appendix E).

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

187

Table 5 Two spatial lags: Wn2 is not row-normalized. n, T (1)

(2)

(3)

(4)

(5)

(6)

(7)

(8)

100, 10

900, 10

100, 10

900, 10

100, 10

900, 10

100, 10

900, 10

λ1 Bias SD T-SD RMSE Bias SD T-SD RMSE Bias SD T-SD RMSE Bias SD T-SD RMSE Bias SD T-SD RMSE

2SLSE 0.0000 0.0639 0.0642 0.0639

λ2

γ

ρ1

β

λ1

0.0179 0.2040 0.2055 0.2048

−0.0018

0.0015 0.0679 0.0700 0.0679

0.0135 0.1441 0.1494 0.1447

−0.0014

0.0382 0.0375 0.0382

0.0120 0.1946 0.1968 0.1949

0.0002 0.0125 0.0125 0.0125

0.0006 0.0248 0.0237 0.0248

0.0045 0.1424 0.1438 0.1424

−0.0006

−0.0106

−0.0018 0.0305 0.0306 0.0305

0.0004 0.0572 0.0581 0.0572

0.0078 0.1502 0.1327 0.1504

−0.0014

0.1215 0.1037 0.1219

0.0001 −0.0216 0.0125 0.1094 0.0124 0.0989 0.0125 0.1115 2SLSE-many 0.0797 0.0321 0.0558 0.1282 0.0450 0.1178 0.0973 0.1321

0.0000 0.0104 0.0102 0.0104

0.0007 0.0198 0.0195 0.0198

0.002 0.1341 0.1267 0.1342

−0.0006

−0.0608

0.0187 0.0442 0.0455 0.0480

0.0280 0.1174 0.1177 0.1207

−0.0117

0.0025 0.0176 0.0175 0.0177

0.0134 0.109 0.1131 0.1098

−0.0033

0.0202 0.0444 0.0455 0.0488

0.0271 0.1173 0.1162 0.1204

−0.0096

0.0050 0.0174 0.0175 0.0181

0.0134 0.1098 0.1117 0.1106

−0.0024

0.0003 0.0215 0.0216 0.0215 BGMME 0.0010 0.0390 0.0369 0.0390

0.0238 0.0239 0.0653

−0.0091

Bias SD T-SD RMSE

0.0297 0.0315 0.0207 0.1206 0.0198 0.1128 0.0362 0.1247 GMME-many 0.0420 0.0123 0.0495 0.1272 0.0317 0.0928 0.0649 0.1277

Bias SD T-SD RMSE

0.0115 0.0136 0.0120 0.0178

−0.0099

Bias SD T-SD RMSE

ρ2

0.0118 0.1211 0.0887 0.1217

0.0093 0.0092 0.0131

−0.0621 0.0237 0.0239 0.0665 0.0093 0.0092 0.0136

0.0341 0.0340 0.0341 0.0112 0.0113 0.0112

λ2

γ

ρ1

ρ2

β

0.0024 0.0678 0.0695 0.0678

0.0116 0.1487 0.1396 0.1492

−0.0017

0.0383 0.0373 0.0384 0.0001 0.0124 0.0125 0.0124

0.0008 0.0246 0.0236 0.0246

0.0050 0.1403 0.1341 0.1404

−0.0006

0.0236 0.0446 0.0456 0.0504

0.0206 0.1171 0.1184 0.1189

−0.0054

−0.0308 −0.0102 0.0179 0.0895 0.0093 0.0151 0.1129 0.0080 0.0181 0.0947 0.0138 Bias-corrected GMME-many −0.0136 −0.0213 −0.0640 0.0382 0.1065 0.0236 0.0320 0.0939 0.0239 0.0406 0.1086 0.0682

0.0045 0.0176 0.0154 0.0182

0.0051 0.1115 0.1133 0.1116

−0.0016

0.0222 0.0445 0.0456 0.0498

0.0231 0.1167 0.1165 0.1189

−0.0070

−0.0219

−0.0102 0.0093 0.0092 0.0138

0.0057 0.0174 0.0175 0.0184

0.0090 0.1109 0.1118 0.1113

−0.0018

0.1017 0.0895 0.1041

ρ1

ρ2

β

0.0062 0.0559 0.0553 0.0562

0.0066 0.0397 0.0400 0.0402

−0.0041

0.0011 0.0223 0.0215 0.0223

0.0007 0.0147 0.0154 0.0147

−0.0009

0.0200 0.0432 0.0442 0.0476

0.0195 0.0308 0.0314 0.0364

−0.0008

0.0212 0.0144

0.0199 0.0105

−0.0022

GMME 0.0011 0.0392 0.0369 0.0392

0.0134 0.1699 0.1183 0.1704

−0.0028

0.0001 0.0128 0.0124 0.0128

0.0129 0.1583 0.1134 0.1588

0.0339 0.0336 0.0339

0.0341 0.0338 0.0341 0.0112 0.0112 0.0112

.

0.0111 0.0112 0.0112

0.0341 0.0333 0.0360 0.0112 0.0112 0.0117

0.0339 0.0333 0.0353 0.0112 0.0112 0.0114

Bias-corrected 2SLSE-many −0.0305 −0.0653 0.0299 0.0940 0.0235 0.0450 0.1184 0.0238 0.0610 0.0988 0.0694

−0.0532

−0.0026

0.0006 0.0130 0.0120 0.0130

0.0338 0.0335 0.0342 0.0112 0.0112 0.0113

0.0339 0.0334 0.0346 0.0111 0.0112 0.01135

Note: We use θ0a = (0.2, 0.2, 0.2, −0.2, −0.2, 1). The disturbance in DGP is from the standard normal distribution.

Table 6 Two spatial lags: row-normalized spatial weights matrices, more IVs. n, T (1)

(2)

(3)

(4)

(5)

(6)

100, 10

900, 10

100, 10

900, 10

100, 10

900, 10

λ1 2SLSE 0.0223 0.0592 0.0561 0.0632

λ2

γ

ρ1

ρ2

β

λ1

λ2

γ

GMME 0.0100 0.0413 0.0346 0.0425

0.0112 0.0276 0.0248 0.0298

−0.0086

0.0014 0.0126 0.0120 0.0127

0.0013 0.0088 0.0085 0.0089

−0.0004

0.0278 0.0396 0.0397 0.0484

−0.0061

0.0044 0.0558 0.0554 0.0560

0.0050 0.0399 0.0401 0.0402

−0.0058

0.0331 0.0327 0.0337

0.0035 0.0139 0.0143 0.0143

0.0000 0.0124 0.0125 0.0124

0.0009 0.0224 0.0215 0.0224

0.0005 0.0147 0.0154 0.0148

−0.0012

−0.0003

−0.0019 0.0306 0.0310 0.0307

0.0008 0.0541 0.0549 0.0541

0.0018 0.0389 0.0389 0.0390

−0.0019

0.0261 0.0256 0.0261

0.0000 0.0105 0.0104 0.0105

0.0006 0.0188 0.0185 0.0188

0.0000 0.0132 0.0131 0.0132

−0.0006

Bias SD T-SD RMSE

0.0000 −0.0001 0.0122 0.0087 0.0120 0.0085 0.0122 0.0087 2SLSE-many 0.0747 0.0683 0.0532 0.0343 0.0430 0.0301 0.0917 0.0764

−0.0567

0.0238 0.0423 0.0436 0.0486

0.0243 0.0301 0.0309 0.0387

−0.0177

Bias SD

0.0875 0.0172

0.0246 0.0142

0.0236 0.0103

−0.0206

−0.0457

−0.0350

−0.0662

0.0113

0.0097

0.0077

0.0080

Bias SD T-SD RMSE Bias SD T-SD RMSE Bias SD T-SD RMSE Bias SD T-SD RMSE

0.0040 0.0206 0.0205 0.0210 BGMME 0.0003 0.0378 0.0358 0.0378

0.0797 0.0117

0.0235 0.0237 0.0613

−0.0549 0.0081

0.0341 0.0339 0.0346 0.0113 0.0114 0.0114

0.0327 0.0326 0.0338 0.0123 0.0125 0.0123

0.0339 0.0337 0.0339

0.0339 0.0337 0.0342 0.0113 0.0113 0.0113

.

0.0112 0.0112 0.0112

0.0342 0.0333 0.0385

Bias-corrected 2SLSE-many −0.0717 −0.0448 −0.0681 0.0287 0.0227 0.0231 0.0437 0.0305 0.0241 0.0773 0.0503 0.0720

0.0336 0.0338 0.0336 0.0111

(continued on next page)

188

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

Table 6 (continued) n, T

λ1 T-SD RMSE

(7)

(8)

100, 10

900, 10

λ2

Bias SD T-SD RMSE

0.0145 0.0101 0.0892 0.0806 GMME-many 0.0412 0.0381 0.0481 0.0324 0.0305 0.0216 0.0633 0.0500

Bias SD T-SD RMSE

0.0472 0.0154 0.0102 0.0496

0.0440 0.0108 0.0072 0.0453

γ

ρ1

ρ2

0.0080 0.0555

0.0147 0.0284

0.0104 0.0257

0.0111 0.0235

−0.0594

0.0228 0.0428 0.0436 0.0485

0.0230 0.0303 0.0309 0.0381

−0.0117

0.0230 0.0143 0.0147 0.0271

0.0220 0.0104 0.0104 0.0243

0.0234 0.0236 0.0639

−0.0581 0.0081 0.0080 0.0587

β

λ1

ρ1

ρ2

0.0146 0.0102 0.0080 0.0467 0.0359 0.0666 Bias-corrected GMME-many −0.0196 −0.0112 −0.0644 0.0370 0.0267 0.0232 0.0310 0.0221 0.0238 0.0419 0.0290 0.0684

0.0148 0.0257

0.0105 0.0225

0.0113 0.0113

0.0212 0.0430 0.0438 0.0479

0.0210 0.0305 0.0311 0.0370

−0.0045

−0.0119

−0.0066

−0.0046

−0.0628

0.0122 0.0104 0.0139

0.0090 0.0073 0.0101

0.0080 0.0080 0.0633

0.0216 0.0143 0.0148 0.0259

0.0204 0.0105 0.0105 0.0230

−0.0042

0.0112 0.0111 0.0163

ρ1

ρ2

β

0.0338 0.0333 0.0358

λ2

γ

β

0.0337 0.0335 0.0340 0.0112 0.0112 0.0119

Note: We use θ0a = (0.2, 0.2, 0.2, −0.2, −0.2, 1). The disturbance in DGP is from the standard normal distribution. Table 7 Two spatial lags: Wn2 is not row-normalized, more IVs.

(1)

(2)

(3)

(4)

(5)

(6)

(7)

(8)

n, T

λ1

100, 10

Bias

2SLSE 0.0271

0.0351

−0.0126

0.0055

SD T-SD RMSE

0.0600 0.0582 0.0658

0.1282 0.1206 0.1329

0.0342 0.0333 0.0364

0.0578 0.0601 0.0580

Bias SD T-SD RMSE

0.0320 0.1206 0.1131 0.1248

−0.0013

0.0014 0.0244 0.0232 0.0245

Bias SD

0.0046 0.0214 0.0213 0.0219 BGMME 0.0007 0.0387

−0.0104

−0.0016

0.1112

0.0304

T-SD RMSE

0.0368 0.0387

0.1030 0.1117

Bias SD T-SD RMSE

0.0001 −0.0195 0.0125 0.1077 0.0124 0.0983 0.0125 0.1095 2SLSE-many 0.0802 0.0321 0.0558 0.1281 0.0448 0.1178 0.0977 0.1321

900, 10

100, 10

900, 10

100, 10

900, 10

100, 10

900, 10

Bias SD T-SD RMSE

λ2

γ

ρ1

β

λ1

0.0139

−0.0037

GMME 0.0118

0.0148

−0.0139

0.0082

0.0148

−0.0034

0.1178 0.1196 0.1186

0.0340 0.0337 0.0342

0.0422 0.0356 0.0439

0.1291 0.0942 0.1299

0.0340 0.0333 0.0368

0.0578 0.0600 0.0583

0.1191 0.1178 0.1201

0.0339 0.0336 0.0341

0.0108 0.1091 0.1134 0.1097

−0.0010

0.0017 0.0129 0.0123 0.0130

0.0139 0.1218 0.0888 0.1226

−0.0015

0.0020 0.0242 0.0232 0.0243

0.0117 0.1111 0.1119 0.1117

−0.0009

0.0002 0.0568

0.0076 0.1374

−0.0013

0.0305 0.0304

0.0578 0.0568

0.1309 0.1376

0.0336 0.0339

0.0000 0.0104 0.0102 0.0104

0.0007 0.0197 0.0195 0.0198

0.0016 0.1309 0.1253 0.1309

−0.0006

−0.0603

0.0184 0.0439 0.0453 0.0476

0.0279 0.1173 0.1177 0.1206

−0.0116

0.0207 0.0153 0.0157 0.0257

0.0287 0.1095 0.1127 0.1132

−0.0129

0.0199 0.0442 0.0453 0.0484

0.0272 0.1176 0.1162 0.1207

−0.0095

0.0218 0.0153 0.0158 0.0266

0.0268 0.1109 0.1112 0.1141

−0.0097

−0.0070

−0.0221

−0.0646

0.0112 0.0111 0.0148

0.0127 0.0109 0.0145

0.1028 0.0894 0.1052

0.0083 0.0083 0.0652

0.0123 0.0124 0.0124

0.0235 0.0237 0.0647

−0.0616

Bias SD T-SD RMSE

0.0881 0.0297 0.0187 0.1213 0.0157 0.1123 0.0901 0.1249 GMME-many 0.0426 0.0122 0.0497 0.1274 0.0316 0.0928 0.0654 0.1279

Bias SD T-SD RMSE

0.0436 0.0158 0.0108 0.0464

−0.0630

Bias SD T-SD RMSE

0.0110 0.1219 0.0883 0.1224

0.0083 0.0083 0.0622

−0.0616 0.0234 0.0237 0.0659 0.0083 0.0083 0.0635

ρ2

0.0112 0.0113 0.0113

λ2

γ

0.0123 0.0124 0.0123

0.0339

0.0112 0.0112 0.0112

. .

0.0111 0.0112 0.0111

0.0341 0.0333 0.0360 0.0113 0.0111 0.0172

0.0339 0.0333 0.0352

Bias-corrected 2SLSE-many −0.0331 −0.0651 0.0293 0.0943 0.0232 0.0450 0.1185 0.0238 0.0689 0.0999 0.0691

−0.0623

−0.0404

−0.0322 −0.0658 0.0107 0.0908 0.0083 0.0151 0.1129 0.0079 0.0418 0.0964 0.0663 Bias-corrected GMME-many −0.0179 −0.0228 −0.0637 0.0380 0.1067 0.0233 0.0319 0.0939 0.0237 0.0420 0.1091 0.0678

0.0238 0.0444 0.0456 0.0504

0.0201 0.1172 0.1185 0.1190

−0.0049

0.0251 0.0153 0.0153 0.0294

0.0198 0.1117 0.1132 0.1134

−0.0062

0.0222 0.0443 0.0454 0.0496

0.0231 0.1170 0.1165 0.1193

−0.0067

0.0236 0.0153 0.0158 0.0281

0.0221 0.1117 0.1115 0.1139

−0.0071

0.0338 0.0335 0.0341 0.0111 0.0112 0.0127

0.0339 0.0334 0.0345 0.0111 0.0111 0.0132

Note: We use θ0a = (0.2, 0.2, 0.2, −0.2, −0.2, 1). The disturbance in DGP is from the standard normal distribution.

4.3. Some further investigations Table 8 studies the effect of best quadratic moment when the disturbances are not normally distributed. The DGP has the disturbance from the demeaned gamma(2, 1), where its kurtosis is 6. We compare the QMLE, BGMME with a finite number of moments, and bias corrected GMME with many IVs and best quadratic moment. Compared with the QMLE, the performances of the BGMME is only slightly better for λ0 when T = 20, while those of GMME with many moments are only slightly better for λ0 when n = 900.45

45 Under the rook matrix specification in our Monte Carlo design, the diagonal elements of Gn has small variation so that diag(Jn Gnj Jn ) −

tr(Gnj Jn ) n

In in (15) has small

We also investigate the effect of number of IVs from both spatial power series expansion and time lags on the bias, SD and RMSE of estimates. As the number of moments is relevant for linear IVs, we investigate the 2SLSE with many IVs before and after bias correction.46 The spatial power series expansion orders, pn , range from 1 to 10, and the time lags to include, k, also range from 1 to 10. For each time period, the actual number of time lags to include

tr(G Jn )

∗ values. This makes the Pnj not so different from Gnj − n−nj1 Jn and might explain the slight improvement by using the kurtosis in the quadratic moment. 46 The purpose of this simulation is to investigate the finite sample properties of

2SLSE with different number of IVs. Theoretical investigation of the optimal choice of IVs is complicated with the simultaneous spatial and time lags and is not covered in the current paper.

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

189

Table 8 Improvement by kurtosis. n = 100, T = 5

λ

γ

n = 900, T = 5

n = 100, T = 20

ρ

β

λ

γ

ρ

β

λ

γ

ρ

β

0.9923 0.9099 0.9521 1.0453 1.0898 −0.0025 0.0708 0.0687 0.0708

0.2051 0.1779 0.1915 0.2209 0.2376 0.0064 0.0225 0.0183 0.0234

0.1741 0.1537 0.1631 0.1849 0.1960 −0.0259 0.0162 0.0137 0.0305

−0.1912 −0.2275 −0.2109 −0.1695 −0.1515

0.9933 0.9645 0.9767 1.0100 1.0255 −0.0061 0.0244 0.0229 0.0251

0.2009 0.1640 0.1819 0.2200 0.2393 0.0008 0.0287 0.0272 0.0287

0.1980 0.1735 0.1872 0.2118 0.2240 −0.0010 0.0192 0.0187 0.0192

−0.1980 −0.2468 −0.2246 −0.1730 −0.1496

1.0008 0.9611 0.9804 1.0219 1.0418 0.0011 0.0316 0.0326 0.0316

0.9982 0.9119 0.9564 1.0520 1.0972 0.0016 0.0722 0.0730 0.0722

0.2007 0.1693 0.1856 0.2147 0.2294 0.0003 0.0228 0.0204 0.0228

0.1979 0.1687 0.1829 0.2155 0.2298 −0.0010 0.0239 0.0235 0.0239

−0.2007 −0.2528 −0.2270 −0.1691 −0.1436

0.9995 0.9693 0.9823 1.0153 1.0332 0.0000 0.0247 0.0242 0.0247

0.2000 0.1625 0.1818 0.2181 0.2365 −0.0005 0.0281 0.0278 0.0281

0.2012 0.1710 0.1857 0.2164 0.2312 0.0008 0.0227 0.0225 0.0227

−0.1984 −0.2554 −0.2290 −0.1698 −0.1421

0.9839 0.8978 0.9412 1.0355 1.0825 −0.0118 0.0710 0.0709 0.0719

0.2007 0.1713 0.1862 0.2158 0.2293 0.0010 0.0222 0.0201 0.0222

0.1906 0.1676 0.1783 0.2049 0.2189 −0.0082 0.0196 0.0187 0.0213

−0.1976 −0.2375 −0.2199 −0.1755 −0.1503

0.9974 0.968 0.9809 1.0141 1.0302 −0.0017 0.0245 0.0240 0.0246

0.1858 0.1495 0.1674 0.2038 0.2221 −0.0144 0.0276 0.0237 0.0311

0.1587 0.1340 0.1468 0.1717 0.1830 −0.0411 0.0189 0.0189 0.0452

−0.1839 −0.2301 −0.2097 −0.1594 −0.1362

MLE (after bias correction) Median 10%Q 25%Q 75%Q 90%Q Bias SD T-SD RMSE

0.2066 0.1297 0.1659 0.2514 0.2849 0.0077 0.0606 0.0545 0.0611

0.1717 0.1098 0.1380 0.2051 0.2349 −0.0282 0.0487 0.0409 0.0563

−0.1850 −0.2961 −0.2433 −0.1281 −0.0713

0.2058 0.1141 0.1599 0.2469 0.2881 0.0025 0.0678 0.0608 0.0678

0.1921 0.1002 0.1475 0.2407 0.2819 −0.0065 0.0717 0.0711 0.0720

−0.1905 −0.3568 −0.2791 −0.1034 −0.0157

0.0145 0.0867 0.0779 0.0879

0.0098 0.0297 0.0261 0.0313

0.0018 0.0376 0.0361 0.0376

BGMM Median 10%Q 25%Q 75%Q 90%Q Bias SD T-SD RMSE

0.0097 0.1320 0.1316 0.1323

0.0004 0.0425 0.0434 0.0425

0.0010 0.0434 0.0433 0.0434

1.0004 0.9608 0.9799 1.0210 1.0424 0.0009 0.0316 0.0325 0.0316

GMM with many IVs and best quadratic moments Median 10%Q 25%Q 75%Q 90%Q Bias SD T-SD RMSE

0.2058 0.1211 0.1608 0.2509 0.2884 0.0053 0.0649 0.0565 0.0651

0.1343 0.0646 0.1007 0.1684 0.1986 −0.0659 0.0521 0.0515 0.0840

−0.165 −0.2812 −0.2267 −0.1034 −0.0394 0.0359 0.0924 0.0936 0.0991

0.0037 0.0338 0.0350 0.0340

0.0163 0.0366 0.0361 0.0401

0.9999 0.9608 0.9798 1.0206 1.0413 0.0003 0.0317 0.0325 0.0317

Note: 1. We use θ0a = (0.2, 0.2, −0.2, 1). 2. The disturbance in DGP is from demeaned gamma (2, 1) with kurtosis being 6. 3. The GMME with many IVs and best quadratic moments is after bias correction. 4. For MLE, we use Lee and Yu (2010).

is kt = min(t , k). Thus, the number of IVs for each time period is Kt = pt · (1 + p + · · · + ppn ) where pt = kx T + kt . We study the single spatial lag case, i.e., p = 1, so that Kt = pt · pn . With n = 225, T = 10, and one exogenous variable (kx = 1), the number of IVs Kt ranges from 11 to 200. We run the simulation with 1000 repetitions. Table 9 is the plot of bias, SD and RMSE of 2SLSE before bias correction, and Table 10 is the plot after bias correction. From Table 9, before bias correction, when either k or pn increases, the bias increases and SD decreases for all the estimates. For RMSE, we see that a smaller number of IVs is preferred. From Table 10, after bias correction, we similarly have increasing bias and decreasing SD when either k or pn increases; however, the RMSE of λ is lowest at pn = 3 in the space dimension. Compared to Table 9, the RMSE of estimates for λ0 is smaller. We have the following summary observations: (1) the bias correction effectively reduces the RMSE of 2SLSE of λ0 for different combinations of pn and k; (2) the RMSE of 2SLSE of λ0 is minimized at pn = 3 in the space dimension, but is insensitive to the number of time lags in the IVs; (3) for other estimates, either they are insensitive to the number of IVs or the number of IV can be small in terms of RMSE. 5. Conclusion This paper proposes the GMM estimation of the spatial dynamic panel data model with fixed effects when n is large and T can be relatively small. We can stack up the data and construct a finite number of moment conditions, where we derive the best linear and quadratic moment conditions. Alternatively, we can use separate moment conditions for each time period, with which the number of IVs may increase as the time period increases. We show that

√ these estimators are nT consistent, asymptotically normal, and have efficient properties. In a simple dynamic panel data model with fixed effects, the OLS (least squares with dummy variables; within) estimate has O(1/T ) bias due to the correlation of predetermined variables and resulting disturbances after the elimination of fixed effects. The IV estimation approach avoids such a problem when a finite number of IVs is used as those IVs are uncorrelated with the disturbances. However, when the number of IVs increases without bound as the sample size increases, the correlation of the predetermined variables and disturbances is restored to some extent (determined by the number of IVs). In the SDPD model with fixed effects, and time and spatial time lags but no contemporaneous spatial effect, the OLS estimate has a similar O(1/T ) bias (Korniotis, 2008). For the SDPD model with the additional contemporaneous spatial lag, an additional O(1) bias for the OLS estimate would occur due to the simultaneity of that spatial lag variable. The simultaneity of the spatial lag can be handled in the QML approach as in Yu et al. (2008); but, the bias order O(1/T ) remains for the QMLE. On the contrary, IV estimates would not have such an order of bias when the number of IVs is finite. However, when the number of IVs increases (to infinity), the bias for the SDPD model will also be restored, and the bias for the estimate of the spatial effect would be dominant. A bias correction procedure can eliminate this dominating bias. Therefore, under the situation that T is small relative to n, we can have consistent estimates with properly centered asymptotic normal distribution. In addition to linear moments constructed from the time lags, spatial time lags, and exogenous variables, we also utilize quadratic moments to increase the efficiency of the estimates.

190

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

Table 9 Bias, SD and RMSE as pn and k increase from 1 to 10, before bias correction.

Note: 1. From top to bottom is bias, SD and RMSE. From left to right is λ, γ , ρ and β . 2. The biases for γ and β in the first row are negative. 3. The disturbance in DGP is from the standard normal distribution. Table 10 Bias, SD and RMSE as pn and k increase from 1 to 10, after bias correction.

Note: 1. From top to bottom is bias, SD and RMSE. From left to right is λ, γ , ρ and β . 2. The biases for λ, γ and β in the first row are negative. 3. The disturbance in DGP is from the standard normal distribution.

These quadratic moments are implied by the spatial effect in the SDPD model, which do not appear in the dynamic panel data models. This is a distinct feature of our GMM approach as compared with IV approaches for the estimation of spatial dynamic

panel models. We propose an optimal quadratic moment condition that is free of distributional assumption for the disturbances. The best GMM estimates from a finite number of moment conditions have the same asymptotic distribution of the MLE

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

when the disturbances are normal. Compared to MLE of the SDPD models, the GMM estimate is computationally simpler, and can be applied to higher spatial order models, or finite T case, or nonrow-normalized weights matrices that the MLE cannot easily deal with. Additionally, when the distribution is not normal, the best GMM estimate in the current paper can be more efficient relative to the QMLE as the kurtosis of the disturbances is used for the best quadratic moment.

191

(iii) under Assumption 6, 1

(∗,−1)′

n( T − 1 )

Yn,T −1 (IT −1 ⊗ B′n )V∗n,T −1 1

−E

(∗,−1)′

n(T − 1)

Yn,T −1 (IT −1 ⊗ B′n )V∗n,T −1 = Op





1



nT

σ02

(∗,−1)′ where E n(T1−1) Yn,T −1 (IT −1 ⊗ B′n )V∗n,T −1 = n(T −1) tr Bn





)Ahn−1 Sn−1 is O

  T −1 h=1

(1 −

1

Appendix A. Notations

h T

The following list summarizes some frequently used notations in the paper: p Sn (λ) = In − j=1 λj Wnj for any possible λj .

(iv) under Assumption 8, for the IV matrix Qnt , plimn→∞ n(T1−1) T −1 ′ ∗ t =1 Qnt Bn Vnt = 0.   ∗′ (v) E[(Vnt Bn Vnt∗ )2 ] = (µ4 − 3σ04 )cTt4 1 + (T −1t )3 vec′D (Bn )vecD

 = In − pj=1 λj0 Wnj , Gnj = Wnj−1 Sn and An = Sn−1 (γ0 In + j=1 ρj0 Wnj ). λ = (λ1 , . . . , λp )′ and ρ = (ρ1 , . . . , ρp )′ . T −1 1 (∗,−1) T −t 1 ∗ ∗ Yn,t −1 = ( T − ) 2 [Yn,t −1 − T − h=t Ynh ]; Xnt and Vnt are t +1 t S

p n

defined similarly. For any matrix bn with n rows, Wn bn ≡ [Wn1 bn , . . . , Wnp bn ] and Gn bn ≡ [Gn1 bn , . . . , Gnp bn ].

1) (∗,−1) [Yn(∗,− ,t −1 , Wn Yn,t −1 ,

Znt = [Yn,t −1 , Wn Yn,t −1 , Xnt β0 ] and Znt = ∗ Xnt ]. θ = (λ, δ ′ )′ with δ = (γ , ρ, β ′ ). kx is the column dimension of Xnt and kz = kx + p + 1 is the column dimension of Znt . hnt = [Yn0 , . . . , Yn,t −1 , Xn1 , . . . , XnT , ln ]. Qnt is the IV matrix for Section 3.1. p Hnt = (hnt , Wn hnt , . . . , Wn n hnt ) is the IV matrix for Section 3.2. ′ ′ Mnt = Jn Hnt (Hnt Jn Hnt )+ Hnt Jn . vecD (A) is the column vector formed by diagonal elements of any square matrix A. vec(A) is the column vector formed by stacking the columns of A. diag(A) is the diagonal matrix formed by diagonal elements of A. As = A′ + A for any square matrix A. Pnl,T −1 = IT −1 ⊗ Pnl for l = 1, 2, . . . , m and Jn,T −1 = IT −1 ⊗ Jn . ωnm,T = [vecD (Jn,T −1 Pn1,T −1 Jn,T −1 ), . . . , vecD (Jn,T −1 Pnm,T −1 Jn,T −1 )]. ∗ ∗ L∗nj,t = Gnj (Znt δ0 + αt0 ln ), and L∗nt = [L∗n1,t , L∗n2,t , . . . , L∗np,t ]. ∗ ∗′ ∗′ ∗′ ′ Zn,T −1 = (Zn1 , . . . , Zn,T −1 )′ and L∗n,T −1 = [L∗′ n1 , . . . , Ln,T −1 ] . ∗

1

j−1

A Φ

T −t ) 2 , Φj = h=0 Ahn and Ψt = cTt (In − nT −Tt−t ). cTt = ( T − t +1  T −1  T −1 1 1 X˜ n,tT = T −t Sn−1 h=t ΦT −h Xnh , V˜ n,tT = T − S −1 h=t ΦT −h Vnh t n

and α˜ tT = T −t Sn h=t ΦT −h αh0 .  t −1 Hnt = Ψt [Yn,t −1 − (In − An )−1 Sn t −1 1 s=1 (Sn Yns − Zns δ0 − αs0 ln )] − cTt X˜ n,tT β0 − cTt α˜ tT ln . ∗ ∗ Knt ≡ (Hnt , Wn Hnt , Xnt ) and Qnt = (Gn (Knt δ0 + αt0 ln ), Knt ). 1

−1

 T −1

(∗,−1) (∗,−1) ∗ E(Znt |It −1 ) = [E(Yn,t −1 |It −1 ), Wn E(Yn,t −1 |It −1 ), Xnt ]. ∗ ∗ 1 ∗ ∗ ′ 6nT ,22 = n(T −1) (Ln,T −1 , Zn,T −1 ) Jn,T −1 (Ln,T −1 , Zn,T −1 ). ∗ ∗ ∗ fnt = [Gn (E(Znt |It −1 )δ0 + αt0 ln ), E(Znt |It −1 )]. φnt = −cTt [V˜ n,tT , Wn V˜ n,tT , 0n×kx ] and unt = [Gn (φnt + Vnt∗ ), φnt ]. ∗

Lemma 1. Under Assumption 2, for any n × n nonstochastic UB matrices Bn , (i) E(Vnt∗ Bn Vns∗′ |It −1 ) = 0 for t ̸= s; 1 V∗′ n(T −1) n,T −1

(IT −1 ⊗ Bn )V∗n,T −1 = 1n σ02 trBn + Op

;

(Bn ) + σ04 [tr2 (Bn ) + tr(Bn Bsn )].

Lemma 2. Under Assumption 2 with Pnj,T −1 = IT −1 ⊗ Pnj , the ∗ ∗′ ∗ covariance of V∗′ n,T −1 Pnj,T −1 Vn,T −1 and Vn,T −1 Pnl,T −1 Vn,T −1 is

σ04 tr(Pnj,T −1 Psnl,T −1 ) + (µ4 − 3σ04 )vec′D (Pnj,T −1 )vecD (Pn,T −1,l ) ∗ ′ ∗ and that of V∗′ n,T −1 Pnj,T −1 Vn,T −1 and Qn,T −1 Vn,T −1 is zero for j = 1, . . . , m.

Let Cnt be an n × 1 column vector from the IV matrix Qnt in ′ ∗ ∗′ ∗ Assumption 8. Denote snt = Cnt Vnt + Vnt Bn Vnt −σ02 trBn and σs2,nT =

Var(

 T −1



(

4 0 tr

t =1

snt ) = E(σ02

Bn Bsn

T −1 t =1

)).

′ Cnt Cnt + T (µ4 − 3σ04 )



√1 nT



;

n

i=1

b2n,ii +

Lemma 3. Under Assumptions 2, 7 and 8, if { n(T1−1) σs2,nT } is bounded  T −1 s t =1 nt

away from zero,

σs,nT

d

→ N (0, 1).

1

 j −1

T −t ) 2 , Φj = Lemma 4. Denote cTt = ( T − t +1 An ΦT −t cTt (In − T −t ). We have

(∗,−1)

h=0

Ahn and Ψt =

(∗,−1)

Yn,t −1 = E(Yn,t −1 |It −1 ) − cTt V˜ n,tT and (∗,−1) Yn,t −1

 = Hnt + Ψt (In − An ) Sn

t −1 

1

−1 −1

t − 1 s=1

 Vns − cTt V˜ n,tT

,

(∗,−1)

w where E(Yn,t −1 |It −1 ) = Ψt Ynw,t −1 − cTt X˜ n,tT β0 − cTt α˜ tT ln , Ynt =

1 Ynt − (In − An )−1 Sn−1 cn0 , V˜ n,tT = T − S −1 t n in (17).

T −1 h =t

ΦT −h Vnh and Hnt is

For Hnt in (17), (∗,−1)

E(Yn,t −1 |It −1 ) = Hnt + Wnt , (∗,−1) Yn,t −1

and

(29)

= Hnt + Wnt − cTt V˜ n,tT ,

where Wn1 An )−1 Sn−1 t −1 1

= −Ψ (I − An )−1 Sn−1 cn0 and Wnt = Ψt (In − t −1 1 n s=1 Vns for t ≥ 2.

Lemma 5. Under Assumptions 1–7, when n, T nonstochastic square matrix Bn ,

Appendix B. Some lemmas

(ii)

T

1

T −1 

n( T − 1 ) t = 1

 Hnt Bn Hnt = E

T −1 

1



n(T − 1) t =1

 + Op

1



nT



→ ∞, for any 



Hnt Bn Hnt

192

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

with E( n(T1−1) T −1 

1

n(T − 1) t =1 T −1 

1

n(T − 1) t =1

T −1 t =1

′ ′ ′ Proof47 for Lemma 2. Denote VnT = (Vn1 , . . . , VnT ) . As V∗n,T −1 = ′ (FT ,T −1 ⊗ In )VnT and Pnj,T −1 = IT −1 ⊗ Pnj , we have Cov(V∗′ n,T −1 ∗ ′ ′ Pnj,T −1 V∗n,T −1 , V∗′ n,T −1 Pnl,T −1 Vn,T −1 ) = Cov(VnT (JT ⊗ Pnj )VnT , VnT

H′nt Bn Hnt ) = O(1). Also,

W′nt Bn Wnt = op (1)

and

(JT ⊗ Pnl )VnT ) = σ04 tr((JT ⊗ Pnj )(JT ⊗ Pnls )) + (µ4 − 3σ04 ) vec′D (JT ⊗ Pnj )vecD (JT ⊗ Pnl ), by using the variance formulae of

H′nt Bn Wnt = op (1).

(Wnt − cTt V˜ n,tT )′ Bn (Wnt − cTt V˜ n,tT ) = op (1) Similarly, n(T1−1) T −1 t′ =1 1 ˜ and n(T −1) t =1 Hnt Bn (Wnt − cTt Vn,tT ) = op (1). T −1

′ ′ Let Mnt = Jn Hnt (Hnt Jn Hnt )+ Hnt Jn so that Mnt is an n × n idempotent matrix with rank Kt .

Lemma 6. For any UB n × n square matrices B1n and B2n , (i) tr(Mnt B1n B′1n Mnt ) ≤ cKt , where c is a finite constant (for all n and t); (ii) |tr(B1n Mnt )| and |tr(B1n Mnt B2n )| are less than cKt for some c > 0; √ (iii) |tr(Mnt B1n Mns B2n )| are less than c Kt Ks for some c > 0.

quadratic form of i.i.d. disturbances. For the first component, we s have tr((JT ⊗ Pnj )(JT ⊗ Pnl )) = (T − 1)tr(Pnj Pnls ) = tr(Pnj,T −1 Psnl,T −1 ). For the second component, we have vec′D (JT ⊗ Pnj )vecD (JT ⊗ Pnl ) = (T − 1)vec′D (Pnj )vecD (Pnl ) = vec′D (Pnj,T −1 )vecD (Pnl,T −1 ). Thus, the covariance matrix is ∗ ∗′ ∗ Cov(V∗′ n,T −1 Pnj,T −1 Vn,T −1 , Vn,T −1 Pnl,T −1 Vn,T −1 )

= σ04 tr(Pnj,T −1 Psnl,T −1 ) + (µ4 − 3σ04 )vec′D (Pnj,T −1 )vecD (Pnl,T −1 ). T −1 ∗′ ∗ ′ ∗ For Cov(V∗′ n,T −1 Pnj,T −1 Vn,T −1 , Qn,T −1 Vn,T −1 ) = E[( t =1 Vnt Pn,j  T −1 ∗ ′ ∗ Vnt )( t =1 Qnt Vnt )], we have ∗′ ∗ E(Vnt Pn,j Vnt )(Qnt′ Vnt∗ )

 =

Lemma 7. Under Assumptions 1–6, for any nonstochastic UB matrix Bn ,   (i) E( Tt =−11 Vnt∗′ Bn Mnt Vnt∗ ) = σ02 Tt =−11 E[tr(Bn Mnt )] = O

(

T −1 t =1

(ii)

K ), and t T −1 ∗′  T −1 √ 2 ∗ Kt ). t =1 (Vnt Bn Mnt Vnt − σ0 tr(Bn Mnt )) = Op ( t =1

Lemma 8. Let ηnt = −cTt V˜ n,tT . Under Assumptions 1–6, for any nonstochastic UB matrix Bn ,

 E

T −1 

 ηnt Bn Mnt Vnt ′



σ02 ′ ′ E[tr(Mnt CnTt Sn−1 B′n )] T +1−t   T −1  Kt

=O

t =1

T −1 ′ ∗ t =1 (ηnt Bn Mnt Vnt +    T −1 Kt

and

′ σ02 ′ tr(Mnt CnTt Sn−1 B′n )) T +1−t

 ×

Vnt −



T 

1

T − t h=t +1

 = =

2 cTs cTt E

Lemma 9. Under Assumptions 1–7, suppose we choose Hnt from (22).  T −1 For each t, there exists a matrix π t such that n(T1−1) t =1 (fnt − Hnt ·

Vns −



Op

T +1 −t    T −1 ′ T −1 Kt Also, E( t =1 ηnt Bn1 Mnt Bn2 ηnt ) = O for any nont =1 T −t +1 T −1 ′ stochastic UB matrices Bn1 and Bn2 , and (ηnt Bn1 MntBn2 ηnt −  √ t =1 T −1 Kt ′ E(ηnt Bn1 Mnt Bn2 ηnt |It −1 )) = Op . t =1 T −t +1

πt )′ (fnt − Hnt · πt ) → 0 as n, T → ∞. Lemma 10 is about magnitudes of certain orders in the 2SLS estimate with many IVs in (26). Denote K = max{K1 , . . . , KT −1 }, T −1 ′ ef (K ) = n(T1−1) t =1 fnt (In − Mnt )fnt and ∆K = tr(ef (K )). Lemma 10. Under Assumptions 1–7 and n, T → ∞, (i) ∆K = op (1); √ 1 n(T −1)

(iii)

1 n(T −1)

 T −1 t =1

T −1 t =1

fnt′ Mnt Bn V˜ n,tT = Op





√ 1 n(T −1)



and n(T1−1)

×

Vns −

Vnt −

Qnt

T 

1

T − t h=t +1

 Vnh

=

2 cTs cTt E



 ×



T − s h=t

Vnh

Vnt −

T 

1

T − t h=t +1

 Vnh

Pn,j

 ′

Qnt

Vnt −



1

2 = cTs cTt µ3 EQnt′ vecD (Pn,j )

( T − s) 2

1

T 

T − t h=t +1



 Vnh



T −t

(T − s)2 (T − t )

= 0,

′ because EVng Pn,j Vnh Qnt′ Vnp = 0 for g , h < t and p ≥ t, E[Vnt − 1 T −t

T

h =t +1

T

Vnh ](

h=t

Vnh ) = 0 and E(Qnt |It −1 ) = Qnt . For s < t,

we have ∗′ ∗ E(Vnt Pn,j Vnt )(Qns′ Vns∗ ) 2 cTt cTs E

 T −1 t =1

 Qnt

 Vnh

P n ,j

′

T − s h =t T 

Vnh



Vnh

T 

1

1

′



T − s h=s+1



=



√ 1 ; n(T −1)  T −1 1 ′ (iv) (unt Mnt unt − E(u′nt Mnt unt |It −1 )) T −1 √n(T −1) t =1 1 T −1 ′ Kt ) where n(T −1) t =1 E(unt Mnt unt |It −1 ) tT=−11 t =1 Kt ).

Vnh

T − s h=s+1 T 

1



∗ fnt′ (In − Mnt )Vnt = Op ((E∆K )1/2 );

∗ fnt′ Mnt Bn Vnt = Op

 ′

T 

1

p

(ii)

Pn,j

  1 = cTt3 µ3 EQnt′ vecD (Pn,j ) 1 − (T − t )2   1 ′ = µ3 EQnt vecD (Pn,j ) · cTt 1 − . T −t

1 , where CnTt = T − (I + 2An +· · ·+(T − t )AnT −1−t ). t n

t =1

Vnh

T − t h=t +1

∗′ ∗ E(Vns Pn,j Vns )(Qnt′ Vnt∗ )

,

(T + 1 − t )(T − t )

Vnt −

′

T 

1

For s < t, we have

=−

t =1

3 E cTt

 ×

Vnt −

Vnt − 1

1

T 

T − t h =t +1 T 

T − t h=t +1

′ Vnh

 Vnh

Pn,j

 ′

Qns

Vns −

1

T 

T − s h=s+1

 Vnh

= Op ( n(T1−1)

= O( n(T1−1)

47 Proofs for all the lemmas are in a supplement file available upon request (see Appendix E).

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

 = cTt2 cTs E

Vnt −

 ×

Vnt −

′

T 

1

T − t h=t +1

1

Vnh

T − t h=t +1

= d∗′ n,T −1 (θ )

P n ,j

 × Qns′ −

Vnh

T 

1

T − s h =t

 qn,T −1 (θ ) =

∗′



Vnt Pn,j Vnt

 T −1 

′ (λj0 −

λj )Gnj,T −1 V∗n,T −1

j =1



 ′

m 



(l)

anT Jn,T −1 Pnl,T −1 Jn,T −1

Qnt Vnt

V∗n,T −1

×

t =1



t =1 T −1

T −2  T −1   s =1 t >s





T −t

t =1



µ3 EQns vecD (Pn,j ) · cTs 1 − ′

∗′ ∗ E (Vnt Pn,j Vnt )(Qns′ Vns∗ )



1

µ3 EQnt′ vecD (Pn,j ) · cTt 1 −

T −2  s=1

1 T −s

 = 0.

it is sufficient to prove that n(T1−1) ln,T −1 (θ ) and n(T1−1) qn,T −1 (θ ) converge uniformly to some well defined limits. By Lemma 1, 1 l (θ ) will converge to 0 and n(T1−1) qn,T −1 (θ ) will n(T −1) n,T −1 converge to its mean. The uniform convergence and identification uniqueness imply the consistency of the estimator. For the asymptotic distribution, by the Taylor expansion, we have



∗ ′ ∗ Therefore, Cov(V∗′ n,T −1 Pnj,T −1 Vn,T −1 , Qn,T −1 Vn,T −1 (θ )) = 0.

n(T − 1)(θˆnT − θ0 )



′ ˆ ∂ gnT (θnT )/∂θ ′ ∂ gnT (θ¯nT )/∂θ ′ =− anT anT n(T − 1) n(T − 1)



Appendix C. Proofs for theorems

× C.1. Proof for Theorem 1 We first derive the uniform convergence of n(T1−1) anT gnT (θ ). Combined with the identification in Assumption 8, the consistency of GMM estimator θˆnT will follow. (m) (Q ) (1) Let anT = (anT , . . . , anT , anT ) be a ka × (m + q) matrix. Then, 1 n(T − 1)

anT gnT (θ ) =

1 n(T − 1)

 ×

m 

V∗′ n,T −1 (θ )



(l)

anT Jn,T −1 Pnl,T −1 Jn,T −1

V∗n,T −1 (θ )

1 n(T − 1)

where θ¯nT lies between θˆnT and θ0 and with ∂ gnT (θ )

∂λ′

(Q )

p

where, by expansion, V∗n,T −1 (θ ) = d∗n,T −1 (θ )+(In(T −1) + j=1 (λj0 − λj )Gnj,T −1 )V∗n,T −1 with d∗n,T −1 (θ ) = L∗n,T −1 (λ0 −λ)+ Z∗n,T −1 (δ0 −δ). For 1 (Q ) a Q′n,T −1 Jn,T −1 V∗n,T −1 (θ ) n(T − 1) nT 1 (Q ) = a Q′n,T −1 Jn,T −1 d∗n,T −1 (θ ) n(T − 1) nT 1 (Q ) + a Q′n,T −1 Jn,T −1 (In(T −1) n(T − 1) nT p  j =1

 ··· ..  .   · · · ···

Q′n,T −1 Jn,T −1 Wnj,T −1 Y∗n,T −1

s ∗ V∗′ n,T −1 (θ )Jn,T −1 Pn1,T −1 Jn,T −1 Zn,T −1



  .. ∂ gnT (θ )   . = −  ∗′ . ′ Vn,T −1 (θ )Jn,T −1 Psnm,T −1 Jn,T −1 Z∗n,T −1  ∂δ Q′n,T −1 Jn,T −1 Z∗n,T −1

s ∗ For V∗′ n,T −1 (θ )Jn,T −1 Pnl,T −1 Jn,T −1 Zn,T −1 in

∂ gnT (θ ) , ∂δ ′

zero mean. From Lemma 1(iii), we have E n(T1−1)   

=

σ02

n(T −1)T

bl γ =

σ02 n

n

Y˜n′ ,t −1 B′n V˜ nt

(T − h)Ahn−1 Sn−1 Bn . By denoting  T −1  s h−1 −1 (T − h)An Sn Jn Pnl Jn ,

T −1

tr

h =1

 tr

σ02

it has a non-

T t =1

1

T − 1 h=1

 tr

1

 T −1  h−1 −1 s (T − h)An Sn Jn Pnl Jn Wnk ,

T − 1 h=1

and

the second term is op (1) uniformly in θ ∈ Θ from Lemma 1 (iv). Because V∗′ n,T −1 (θ )

nT (θ ) ∂ gnT (θ ) = ( ∂ g∂λ ) ′ , ∂δ ′

and

blρk =

(λj0 − λj )Gnj,T −1 )V∗n,T −1 ,

∂ gnT (θ ) ∂θ ′

(Wnj,T −1 Y∗n,T −1 )′ Jn,T −1 Psn1,T −1 Jn,T −1 V∗n,T −1 (θ ) .. . ∗ ′ (Wnj,T −1 Yn,T −1 ) Jn,T −1 Psnm,T −1 Jn,T −1 V∗n,T −1 (θ )

···  .  . = − . · · · ···

 anT Q′n,T −1 Jn,T −1 V∗n,T −1 (θ ),

 −1

′ ˆ gnT (θ0 ) ∂ gnT (θnT )/∂θ ′ anT anT √ , n(T − 1) n(T − 1)

l =1

+

 p  ∗ + (λj0 − λj )Gnj,T −1 Vn,T −1 , j =1

∗′ ∗ E (Vnt Pn,j Vnt )(Qnt′ Vnt∗ ) +







T −1  

+

+

p 

l=1

t =1



V∗n,T −1

×

 T −1 

=

d∗n,T −1 (θ )

 (l) ∗ where ln,T −1 (θ ) = d∗′ (θ )( m n , T − 1 l=1 anT Jn,T −1 Pnl,T −1 Jn,T −1 )(Vn,T −1 + p ∗ j=1 (λj0 − λj )Gnj,T −1 Vn,T −1 ) and

Vnh

Hence,

=

(l)

anT Jn,T −1 Pnl,T −1 Jn,T −1

+ ln,T −1 (θ ) + qn,T −1 (θ )



  1 T −t = cTt2 cTs µ3 EQns′ vecD (Pn,j ) − − T −s ( T − t ) 2 ( T − s)   1 = µ3 EQns′ vecD (Pn,j ) · −cTs . T −s

E

m 

193



l =1



T 



 m  l =1

(l)

anT Jn,T −1 Pnl,T −1 Jn,T −1

 V∗n,T −1 (θ )

blδ = [blγ , blρ1 , . . . , blρk , 0kx ×1 ],

(30)

we have 1 n( T − 1 )

s ∗ Z∗′ n,T −1 Jn,T −1 Pnl,T −1 Jn,T −1 Vn,T −1

1 = b′lδ + Op T



1



nT

 ,

194

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

for l = 1, . . . , m. For (Wnj,T −1 Y∗n,T −1 )′ Jn,T −1 Psn1,T −1 Jn,T −1 V∗n,T −1 (θ ) ∂ g (θ )

∗ ∗ ∗ nT in ∂λ ′ , as Wnj,T −1 Yn,T −1 = Lnj,T −1 +Gnj,T −1 Vn,T −1 from (4) where ∗ ∗′ ∗′ ′ Lnj,T −1 = [Lnj,1 , . . . , Lnj,T −1 ] , we have

1 n(T − 1)

V∗n,T −1 G′nj,T −1 Jn,T −1 Psnl,T −1 Jn,T −1 V∗n,T −1

1

=

n(T − 1)

tr(



Gnj,T −1 Jn,T −1 Psnl,T −1 Jn,T −1

) + Op

( 

n(T − 1)

s ∗ L∗′ nj,T −1 Jn,T −1 Pnl,T −1 Jn,T −1 Vn,T −1 =

1



T

σ04

b ′ −1 ωnm ,T ωnm,T + ∆mn,T ) Cmp,nT in (14) is maximized at Cp,nT

1 Q′n,T −1 Jn,T −1 (L∗n,T −1 , Z∗n,T −1 ) n( T − 1 ) 1 = plimn,T →∞ Q′n,T −1 Jn,T −1 Qn,T −1 . n(T − 1)

nT

plimn,T →∞

 blλj + Op

µ4 −3σ04

by choosing P∗nl,T −1 in (15). For linear moments, from Lemma 1(iv),



1

by Lemma 1(ii) and 1

proceed to the asymptotic equivalence of the best GMME using ˆ ∗nj,T −1 from (15) and Q ˆ nt in (19) with that using the estimated P exact ones. ′ For quadratic moments, from Appendix D, n(T1−1) Cmp ,nT



1



nT

Thus,

by Lemma 1(iii) where blλj =

σ

2 0

n



 γ0 In +

tr Gnj

p 



plimn,T →∞



ρj0 Wnj blγ

(31)

≤ plimn,T →∞

j =1

for l = 1, . . . , m and j = 1, . . . , p. Thus, by denoting DnT = −

σ02 Cmp,nT



1

0m×kz Q′n,T −1 Jn,T −1 Z∗n,T −1

Q′n,T −1 Jn,T −1 L∗n,T −1

n( T − 1 )

,

plimn,T →∞

··· .. . ··· ···

1λ1

RnT =

1  ..  .

T bmλ 1 0q×1

b1λp

b1δ

.. .

..  1 b mp . ≡  T 0q×p b

bmλp 0q×1

bmδ 0q×kz





, (33)

1

··· .. . ···

.. .

 = 

tr(G′n1,T −1 Jn,T −1 Psnm,T −1 Jn,T −1 )

tr(G′np,T −1 Jn,T −1 Psn1,T −1 Jn,T −1 )

.. .

tr(G′np,T −1 Jn,T −1 Psnm,T −1 Jn,T −1 )

   

∂ gnT (θˆnT ) 1 n(T −1) ∂θ ′

= DnT + RnT + op (1) where DnT defined in (32) is defined in (33) is O(1/T ). By denoting DnT = DnT +

RnT , we have

′ (θˆ )/∂θ ∂ gnT ∂ g (θ¯ )/∂θ ′ nT a′nT anT nTn(TnT n(T −1) −1)

′ ′ = DnT anT anT DnT +

d

op (1). Also, √n(1T −1) anT gnT (θ0 ) → N (0, plimn→∞ anT ΣnT a′nT ) from Lemma 3. Hence, d

n(θˆnT − θ0 ) → N



0, plimn→∞

1 T −1

′ ′ DnT anT anT DnT



′ ′ × DnT anT anT ΣnT a′nT anT DnT DnT a′nT anT DnT



Q′n,T −1 Jn,T −1 Qn,T −1 .

∗ ∗ V∗′ n,T −1 (θ )Jn,T −1 Pn1,T −1 Jn,T −1 Vn,T −1 (θ )





..

 

and the identification and uniform convergence of the GMM objective function can be obtained similar to the proof in ˆ nt and Pˆ ∗nj,T −1 , the feasible Appendix C.1. When we use estimated Q moment conditions are



and Gnj,T −1 = (IT −1 ⊗ Gnj ), with θˆnT − θ0 = op (1), we have O(1) and RnT

n(T − 1)

 

Cmp,nT tr(G′n1,T −1 Jn,T −1 Psn1,T −1 Jn,T −1 )

(L∗n,T −1 , Z∗n,T −1 )′ Jn,T −1 (L∗n,T −1 , Z∗n,T −1 )

. gnT (θ ) =   ∗ ∗ V∗′  n,T −1 (θ )Jn,T −1 Pnp,T −1 Jn,T −1 Vn,T −1 (θ ) ′ ∗ Qn,T −1 Jn,T −1 Vn,T −1 (θ )

0q×kz

where



Q′n,T −1 Jn,T −1 Qn,T −1 .

When we use the best P∗nj,T −1 in (15) and Qnt in (18), the infeasible moment conditions are

 

n(T − 1)

n( T − 1 ) 1

= plimn,T →∞

and

b

(L∗n,T −1 , Z∗n,T −1 )′ MJQ ,nT (L∗n,T −1 , Z∗n,T −1 )

Therefore, the best IV is Qn,T −1 . By Lemma 5,

 (32)



1

n( T − 1 ) 1

 −1



 −1

.

−1 For the optimum GMM, ΣnT is used as a′nT anT , and its efficiency relative to the ones with anT follows from the generalized Cauchy–Schwarz inequality. ˆ nT so that Σ ˆ nT = ΣnT + op (1), we When ΣnT is replaced by Σ will have the same asymptotic distribution by similar arguments for proposition 2 in Lee (2007). 

C.2. Proof for Theorem 2 −1 ′ For the variance matrix (DnT ΣnT DnT )−1 in (12) of the OGMME, we can investigate the best linear and quadratic moments when T is large, where DnT is reduced to DnT asymptotically. We shall first show that P∗nj,T −1 in (15) is the best quadratic moment matrix, and Qnt in (18) is the best linear IV matrix when T is large. We then

∗ ˆ∗ V∗′ n,T −1 (θ )Jn,T −1 Pn1,T −1 Jn,T −1 Vn,T −1 (θ )



  V∗′

gˆnT (θ ) = 

n,T −1

.. .

(θ )Jn,T −1 Pˆ ∗np,T −1 Jn,T −1 V∗n,T −1 ˆ ′n,T −1 Jn,T −1 V∗n,T −1 (θ ) Q

  . (θ )

First, ∥An (θˆ )∥∞ −∥An ∥∞ = op (1) under ∥θˆ −θ0 ∥ = op (1). Because

 h ˆ h −1 ˆ ∥An ∥∞ < 1, ∞ −(1− h=1 (∥An (θ )∥∞ −∥An ∥∞ ) = (1−∥An (θ )∥∞ ) ∞ −1 h ˆ h ∥An ∥∞ ) = op (1). Hence, h=0 ∥An (θ ) − An ∥∞ = op (1) and the T −1 s−1 h 1 h ˆ elements of T − s=t h=1 (An (θ )−An )Xn,s−h are op (1) uniformly. t Similarly, as ∥θˆ − θ0 ∥ = op (1), the elements of estimated time effects αˆ t = 1n l′n rˆnt are consistently estimated. So are the α˜ tT ln in ∗ ln in (18) using the estimated αˆ s for s = t , . . . , T − 1. (17) and αt0 ˆ Sn−1 ∥∞ = op (1) and ∥Gnj (λ)− ˆ Therefore, combined with ∥Sn−1 (λ)− ˆ ˆ nt − Hnt = ∥θ − θ0 ∥ · BHnt for some BHnt , of Gnj ∥∞ = op (1), H which its elements are bounded uniformly in n and t in probability. ˆ ∗nj,T −1 = op (1) · BP Similarly, P∗nj,T −1 − P for some BPnj,T −1 which nj,T −1 also has a block diagonal pattern similar to P∗nj,T −1 with its diagonal matrices being UB in probability. Thus, 1 n(T − 1)

gˆnT (θ ) =

1 n(T − 1)

gnT (θ ) +

1 n(T − 1)

∗ ∗ ˆ∗ V∗′ n,T −1 (θ )Jn,T −1 (Pn1,T −1 − Pn1,T −1 )Jn,T −1 Vn,T −1 (θ )





 ..  . . ∗ ∗ ∗  ˆ n,T −1 (θ )Jn,T −1 (Pnp,T −1 − Pnp,T −1 )Jn,T −1 Vn,T −1 (θ ) ′ ∗ ˜ n,T −1 − Qn,T −1 ) Jn,T −1 Vn,T −1 (θ ) (Q

  × V∗′

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

ˆ nt − Hnt = ∥ θˆ − θ 0 ∥ · BHnt , the 1 (Q ˆ n,T −1 − Qn,T −1 )′ As H n(T −1) 1 ∗ ′ ˆ n,T −1 − Qn,T −1 ) Jn,T −1 d∗n,T −1 (θ ) + Jn,T −1 Vn,T −1 (θ ) = n(T −1) (Q p

(λj0 − λj )Gnj,T −1 ) ˆ will be op (1) uniformly in θ because ∥ θ − θ 0 ∥ = op (1). ∗ ∗ ˆ∗ Similarly, n(T1−1) V∗′ n,T −1 (θ )Jn,T −1 (Pnj,T −1 − Pnj,T −1 )Jn,T −1 Vn,T −1 (θ ) is 1 op (1) uniformly in θ . Thus, the identification of n(T −1) gnT (θ ) implies the identification of n(T1−1) gˆnT (θ ) and the uniform convergence of n(T1−1) anT gnT (θ ) will imply the uniform convergence of 1 a gˆ (θ ). Hence, the consistency of the estimates using the n(T −1) nT nT 1 n(T −1)

ˆ n,T −1 − Qn,T −1 )′ Jn,T −1 (In(T −1) + (Q

j=1

and T2h,2 = √

V∗n,T −1

feasible moments follows. With  1 σ02 Cpb,nT DnT = − ′ ∗ n(T − 1) Qn,T −1 Jn,T −1 Ln,T −1

0m×kz Q′n,T −1 Jn,T −1 Z∗n,T −1

 ,

∂ gˆ (θˆ )

we have n(T1−1) nT∂θ ′ nT = DnT + op (1) under large T . Also, as    ˆ nt − Hnt =  H θˆ − θ 0  · BHnt and P∗nj,T −1 − Pˆ ∗nj,T −1 = op (1) · BPnj,T −1 , we have (see Lee, 2007 for details) 1



n( T − 1 )

(˜gnT (θ0 ) − gnT (θ0 )) 

∗ ∗ ˆ∗ V∗′ n,T −1 Jn,T −1 (Pn1,T −1 − Pn1,T −1 )Jn,T −1 Vn,T −1



 ..  .  ∗ ∗ ∗  ˆ n,T −1 Jn,T −1 (Pnp,T −1 − Pnp,T −1 )Jn,T −1 Vn,T −1 ′ ∗ ˆ n,T −1 − Qn,T −1 ) Jn,T −1 Vn,T −1 (Q

1

= op (1).

C.3. Proof for Theorem 3

By Lemma 7, T2h,1 = ϕ1 + (T2h,1 − ϕ1 ), where ϕ1 = Op is the conditional mean of T2h,1 and (T2h,1 − ϕ1 ) = which is not larger than the order Op

 T −1

hˆ = h +

and

ϕ

i=1



t =1

is UB in probability,

 T −1 t =1

T −1  [tr(Mnt Gnj )] and Tt =−11 [tr(Mnt G2nj (λ¯ nT ))]  t =1

Kt , we have

1 bˆ 1,λj − b1,λj = √ n( T − 1 )



T −1 T −1   σˆ [tr(Mnt Gnj (λˆ 2sls,nT ))] − σ02 [tr(Mnt Gnj )]

(



2 nT

t =1







T −1

2

T −1





Kt  Kt     1   t =1    t =1 = Op max  √ √  ,  .  n(T − 1)  n(T − 1)  n(T − 1)  



=

(



= −

 T −1

ˆ 2sls,nT ) − Gnj = G2nj (λ¯ nT )(λˆ j,2sls,nT − λj0 ) where G2nj (λ¯ nT ) With Gnj (λ



+ ˆ



=



= −



=

. By Lemma 8, T2h,2 =

 T −1

t =1

(34)

T −1 ′ T −1 ′ 1 1 H t =1 fnt fnt , Z1 t =1 fnt In n(T −1) n(T −1) T −1 ′ T −1 ′ 1 1 Mnt fnt , Z2H f M u u t =1 nt nt nt t =1 nt Mnt fnt and n(T −1) n(T −1) T −1 ′ 1 H √ 1 Z3 t =1 unt Mnt unt . For h, we have h n(T −1) n(T −1) T −1 ′ T −1 ′ h h ∗ ∗ √ 1 f I M V and T n nt nt nt 2 t =1 fnt Vnt , T1 t = 1 n(T −1) T − 1 ∗ ′ √ 1 t =1 unt Mnt Vnt . n(T −1) For the terms in H, we have H Op 1 from Lemma 5. Z1H Op √1 Op E ∆K op 1 as K from Lemma 10(i); Z2H nT T −1 K t H t =1 from Lemma 10(iii), and Z3 Op from Lemma 10(iv). nT

)









endogeneity component T2h,1 dominates T2h,2 . This implies that, for the bias of the estimates due to many moments, the dominant term is caused by the spatial endogeneity component. ˆ and hˆ with limn,T →∞ H = Combining the expansions of H plimn,T →∞ Σ nT ,22 from Lemma 5, (27) follows.

i=1

We have H =

Kt

p

t =1

.

T −1

√ t =1 n(T −1)  √  T −1 Kt Op √tn=(1T −1) ,

→ 0. Therefore, in T2h , the spatial see that ϕ2 → 0 when T −K 1 1 Kt

× Tih

TK n



 ϕ2 + (T2h,2 − ϕ2 ) where ϕ2 = Op √1nT t =1 (T +1−Ktt)(T −t ) and     (T2h,2 − ϕ2 ) = Op √n(1T −1) Tt =−11 T +K1t−t . Hence, hˆ = h +     √  Kt T −1 T −1 Kt t =1 T +1−t t = 1 √ ϕ1 + ϕ2 + e1 Op √n(T −1) + Op + op (1). We n(T −1)

(θˆ2sl,nT − θ0 ) = [Hˆ ]−1 × hˆ where

2 





of order O

′ ˆ = 1 For (26), denote H t =1 (fnt + unt ) Mnt (fnt + unt ) and n(T −1) √  T − 1 1 ′ ∗ hˆ = √n(T −1) n(T − 1) t =1 (fnt + unt ) Mnt Vnt . For the 2SLSE,

ˆ =H+ H

n(T − 1) t =1

(Sn (λˆ 2sl,nT )Ynt∗ − Znt∗ δˆ2sl,nT )′ Jn (Sn (λˆ 2sl,nT )    T −1 K ∗ Ynt − Znt∗ δˆ2sl,nT ). As θˆ2sl,nT − θ0 = Op max n(tT=−1 1)t , √n(1T −1)    T −1 K 2 from (27), we have σˆ nT − σ02 = Op max n(tT=−1 1)t , √n(1T −1) .

Thus, the GMME from the feasible moments have the same asymptotic distribution as the infeasible ones. 

ZiH

T −1  (Gn φnt δ0 , φnt )′ Mnt Vnt∗ .

1

2 Let σˆ nT = n(T1−1)

  = √  n(T − 1) V∗′

3 

195



)

=

Thus, with ϕ2 = Op



√K nT



, we have



ˆ

( (

)) =

( )

( )

=

=

→∞





= 



 



1   1 n(T − 1)(θˆ2sl ,nT − θ0 ) + Op max  √   n(T − 1)





=

ˆ = H + Op Therefore, H



T −1 K t =1 t

nT



 + op (1).

ˆ h will be asymptotically normally distributed For the terms in h, d

T −1 ′ 2 ∗ by Lemma 3 as √n(1T −1) t =1 fnt Vnt → N (0, σ0 plimn→∞ Σ nT ,22 ). √ h For the residual terms, T1 = Op ( E(∆K )) = op (1) from Lemma 10(ii); T2h has two components from (25) which are



T −1 

1 T2h,1 = √ (Gn Vnt∗ , 0n×1 , 0n×p , 0n×kx )′ Mnt Vnt∗ , n(T − 1) t =1

T −1



2

T −1



T −1

√



Kt  Kt Kt    K  t =1   t =1 t =1 × √ ,√ ,√  ,   n(T − 1)  n(T − 1) n( T − 1 )   nT d

1 → N (0, σ02 plimn,T →∞ 6− nT ,22 ).

T −1

T −1 √

K Kt 1 → 0 and √tn=(1T −1)t → 0, θˆ2sl Under √nt(=T1−1) → c, T −K 1 ,nT is K t t =1 asymptotically centered normal. 

196

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

C.4. Proof for Theorem 4 T −1

Kt

We will first prove the consistency of the GMME under n(tT=−1 1) → 0, then establish its asymptotic normality. Subsequently, we analyze its bias corrected version. For the identification, Assumption 9 provides the sufficient rank condition. Based on the ideal IVs, λ0 and δ0 can be identified. For the many IVs approach, as linear combinations of the many IVs converge to the ideal IVs in the limit from Lemma 9, λ0 and δ0 can thus be identified from the many IV’s −1 ′ conditions. For the uniform convergence of gnT (θ )ΣnT gnT (θ ) in −1 ′ θ , as analysis of the part gnT ,1 (θ )ΣnT ,1 gnT ,1 (θ ) is the same as that

−1 in Appendix C.1, we analyze the remaining gnT ,2 (θ )ΣnT ,2 gnT ,2 (θ ) = T −1 ∗′ ∗ ∗ ∗ −1 ∗ t =1 Vnt (θ )Mnt Vnt (θ ). Because Jn Vnt (θ ) = Jn dnt (θ )+Jn Sn (λ)Sn Vnt ∗ ∗ ∗ ′

with dnt (θ ) = Lnt (λ0 − λ) + Znt (δ0 − δ), we have T −1 

1

n(T − 1) t =1

=

T −1 

1

n(T − 1) t =1

+

+



Vnt Sn Sn (λ)Mnt Sn (λ)Sn Vnt

n(T − 1) t =1 T −1 

2

H′n,T −1 Jn,T −1 V∗n,T −1

we have 1 −1 gnT (θ0 ) D′nT ΣnT = −√ √ n(T − 1) n(T − 1)



n(T − 1) t =1

∗′ ′−1 ′

−1



∗ d∗′ nt (θ )Mnt dnt (θ )

dnt (θ )Mnt Sn (λ)Sn Vnt . ∗′

−1

From Lemma 7, we have n(T1−1) 

 T −1 t =1



∗′ ′−1 ′ ∗ Vnt Sn Sn (λ)Mnt Sn (λ)Sn−1 Vnt

T −1  p K → 0 under n(tT=−1 1)t → 0. From Lemma 8, we have n(T1−1) Tt =−11 cTt ∗ p V˜ n′ ,tT Bn Mnt Sn (λ)Sn−1 Vnt → 0 under n(TK−T 1) → 0 for any UB matrix T −1 ′ −1 Bn . Also, from Lemma 10(iii), we have n(T1−1) t =1 fnt Mnt Sn (λ)Sn   ∗ Vnt = Op √n(1T −1) . Thus, by (Wn Ynt∗ , Znt∗ ) = fnt + unt , T −1 T −1 ∗′ K 2 −1 ∗ p t =1 t t =1 dnt (θ )Mnt Sn (λ)Sn Vnt → 0 under n(T −1) → 0. Also, n(T −1)

T −1 

n(T − 1) t =1

Znt Mnt Vnt

t =1

where C∗pp =

µ4 −3σ04

−1 D′nT ΣnT DnT =

σ04

′ ωnm ,T ωnm,T + ∆mn,T and

1



n(T − 1)

+

1

 −1 b σ04 Cpb,′nT C∗pp Cp,nT

tion 9, we have

0kz ×p

1

σ n(T − 1) 2 0

1 n(T −1)

 T −1

t =1 T −1

p

Vnt (θ )Mnt Vnt (θ ) → plimn,T →∞ 6nT ,22 ∗′



Kt

n(T − 1)(θˆb,nT − θ0 )  −1 ′ ˆ ∂ gnT (θb,nT )/∂θ −1 ∂ gnT (θ¯nT )/∂θ ′ =− ΣnT n(T − 1) n(T − 1)

t =1

(35)

−1 −1 D′nT ΣnT DnT ) with plimn,T →∞ D′nT ΣnT DnT =

 limn,T →∞

1 Cb n(T − 1) p,nT 0kz ×p

0p×kz 0kz ×kz

 +

1

σ02

plimn,T →∞ Σ nT ,22 = 6b .

ˆ ∗nj,T −1 and Σ ˆ nT , the result holds When we use an estimated P similar to the proof in Appendix C.2.  Appendix D. Best quadratic moments



s = σ04 tr[(IT −1 ⊗ Jn Pn1 Jn )(IT −1 ⊗ Jn Pn2 Jn )]

+ (µ4 − 3σ04 )vec′D (IT −1 ⊗ Jn Pn1 Jn )vecD (IT −1 ⊗ Jn Pn2 Jn ) s = (T − 1)[σ04 tr(Jn Pn1 Jn Pn2 Jn )

n( T − 1 )

Hn,T −1 Jn,T −1 Wn,T −1 Y∗n,T −1

d

Cov(V∗n,T −1 (IT −1 ⊗ Jn Pn1 Jn )V∗n,T −1 · V∗n,T −1 (IT −1 ⊗ Jn Pn2 Jn )V∗n,T −1 )

1

σ02 Cpb,nT

(θ )

−1 √nT 0 → N (0, plimn,T →∞ over here and we obtain D′nT ΣnT n(T −1)





 (Wn Ynt∗ , Znt∗ )′ Mnt (Wn Ynt∗ , Znt∗ ).

This section derives the best quadratic moment matrix (15). For ′ the covariance of V∗n,T −1 (IT −1 ⊗ Jn Pnl Jn )V∗n,T −1 where tr(Pnl Jn ) = 0, from Lemma 2,

∂ g ′ (θˆb,nT )/∂θ −1 gnT (θ0 ) × nT ΣnT √ , n( T − 1 ) n(T − 1) where θ¯nT lies between θˆb,nT and θ0 . By denoting

×



T −1

g

uniformly in θ under n(tT=−1 1) → 0. Therefore, by combining the identification uniqueness and uniform convergence, we obtain the consistency of GMME. As is derived, the best quadratic moment is to use P∗nj,T −1 in (15) for j = 1, . . . , p. From the Taylor expansion,



0p×kz 0kz ×kz

ˆ The second components of (37) and (38) correspond to hˆ and H of Appendix C.3. Thus, the analysis in Theorem 3 can be carried

ˆ has the limit equal to plimn,T →∞ 6nT ,22 in Assumpwhere H

DnT = −

(37)

(38) ∗ ′ ′ d∗′ nt (θ )Mnt dnt (θ ) = ((λ0 − λ) , (δ0 − δ) )

× Hˆ ((λ0 − λ)′ , (δ0 − δ)′ )′



 ∗′  Vn,T −1 Jn,T −1 P∗n1,T −1 Jn,T −1 V∗n,T −1  −1   .. C∗pp   .  ∗′ ∗ ∗  Vn,T −1 Jn,T −1 Pnp,T −1 Jn,T −1 Vn,T −1  

  T −1 ∗ ′ ∗ ( W Y ) M V n1 nt nt nt    t =1    ..     . 1  T −1  −√   , ∗ ′ ∗ n(T − 1)σ02   (Wnp Ynt ) Mnt Vnt   t =1   T −1    ∗′  ∗

as

1



0kz ×1

T −1 

1

Vn,T −1 Jn,T −1 Pn1,T −1 Jn,T −1 Vn,T −1

  .. gnT (θ0 ) 1   . = √ √  ∗′ , n(T − 1) n(T − 1) Vn,T −1 Jn,T −1 P∗np,T −1 Jn,T −1 V∗n,T −1 

 2 b σ C ×  0 p,nT 

Vnt (θ )Mnt Vnt (θ ) ∗′

∂ gnT (θˆnT ) 1 n(T −1) ∂θ ′

= DnT + op (1) when T is large by ˆ Lemma 1 and θb,nT − θ0 = op (1). Hence, (35) can be rewritten −1 ′ −1 gnT (θ0 )  √ −1 DnT ΣnT √n(T −1) + as n(T − 1)(θˆb,nT − θ0 ) = − D′nT ΣnT DnT op (1). By using ΣnT in (28), DnT in (36), and  ∗′  ∗ ∗ we have

+ (µ4 − 3σ04 )vec′D (Jn Pn1 Jn )vecD (Jn Pn2 Jn )]. 0m×kz H′n,T −1 Jn,T −1 Z∗n,T −1



,

(36)

By using Lemmas 11–14, the best quadratic matrix, which takes ∗ into account η4 , is Pnj in (15) for j = 1, . . . , p. This is shown

L.-f. Lee, J. Yu / Journal of Econometrics 180 (2014) 174–197

as follows. We choose Pnl for l ′ Cmp ,nT (

µ4 −3σ04 σ04

1, . . . , m to maximize

=

′ −1 ωnm ,T ωnm,T + ∆mn,T ) Cmp,nT in (14). By using

Lemma 13 for

µ4 −3σ04 σ04

′ ωnm ,T ωnm,T + ∆mn,T and using Lemma 14(i)

∗ for Cmp,nT , we can use the Cauchy–Schwarz inequality to obtain Pnj in (15) by Lemma 14(ii).

Lemma 11. Suppose tr(Pn Jn ) = 0, then diag[Jn diag(Jn Pn Jn )Jn ] = n −2 diag(Jn Pn Jn ). n Lemma 12. Suppose tr(Pn1 Jn ) = 0 and tr(Pn2 Jn ) = 0, then s s s tr(Jn Pn1 Jn · Jn Pn2 Jn ) = vec′ [Jn Pn1 Jn − Jn diag(Jn Pn1 Jn )Jn ]

·vec[Jn Pn2 Jn − Jn diag(Jn Pn2 Jn )Jn ]   n+2 +2 vec′D (Jn Pn1 Jn )vecD (Jn Pn2 Jn ). n

Lemma 13. There exists a scalar αn such that s tr(Jn Pn1 Jn · Jn Pn2 Jn ) + (η4 − 3)vec′D (Jn Pn1 Jn )vecD (Jn Pn2 Jn )

=

1

s s vec′ [Jn Pn1 Jn + (αn − 1)Jn diag(Jn Pn1 Jn )Jn ] 2  s s ·vec[Jn Pn2 Jn + (αn − 1)Jn diag(Jn Pn2 Jn )Jn ] ,

2 )α 2 + 4n α = where αn solves the quadratic equation ( n− n for the unknown α .

η4 −3 2

+ n+n 2

Lemma 14. (i) There exists a diagonal matrix An with tr(An ) = 0 such that



tr(Jn Pns Jn Gn Jn ) = tr [Jn Pns Jn + (αn − 1)Jn diag(Jn Pns Jn )Jn ]

   tr(Gn Jn ) In + An Jn , ·Jn Gn − n−1 n(1−α )

where An = 2+(n−2n)α [diag(Jn Gn Jn ) − n

(ii) Let Pn = (Gn − ∗

tr(Gn Jn ) J n −1 n

)+

tr(Gn Jn ) In n

tr(Gn Jn ) In n



n n−2

2

]. 

1

η −3 n + 42 n−2



1



n n−2

[diag(Jn Gn Jn ) − ], which has tr(Pn Jn ) = 0. Then, Jn Pn Jn + ∗ (αn − 1)Jn diag(Jn Pn Jn )Jn = Jn (Gn − tr(nG−n1Jn ) Jn + An )Jn . ∗



Appendix E. Supplementary data Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.jeconom.2014.03.003. References Alvarez, J., Arellano, M., 2003. The time series and cross-section asymptotics of dynamic panel data estimators. Econometrica 71, 1121–1159. Anderson, T.W., Hsiao, C., 1981. Estimation of dynamic models with error components. J. Amer. Statist. Assoc. 76, 598–606. Arellano, M., Bond, O., 1991. Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Rev. Econom. Stud. 58, 277–297.

197

Arellano, M., Bover, O., 1995. Another look at the instrumental-variable estimation of error-components models. J. Econometrics 68, 29–51. Baltagi, B., Song, S.H., Jung, B.C., Koh, W., 2007. Testing for serial correlation, spatial autocorrelation and random effects using panel data. J. Econometrics 140, 5–51. Bekker, P.A., 1994. Alternative approximations to the distributions of instrumental variable estimators. Econometrica 62, 657–681. Bell, K.P., Bockstael, N.E., 2000. Applying the generalized-moments estimation approach to spatial problems involving microlevel data. Rev. Econom. Statist. 82, 72–82. Bhargava, A., Sargan, J.D., 1983. Estimating dynamic random effects models from panel data covering short time periods. Econometrica 51, 1635–1659. Blundell, R., Bond, S., 1998. Initial conditions and moment restrictions in dynamic panel data models. J. Econometrics 87, 115–143. Bonacich, P., 1987. Power and centrality: a family of measures. Amer. J. Sociol. 92, 1170–1182. Bun, M.J.G., Kiviet, J.F., 2006. The effects of dynamic feedbacks and LS and MM estimator accuracy in panel data models. J. Econometrics 132, 409–444. Chao, J.C., Swanson, N.R., 2005. Consistent estimation with a large number of weak instruments. Econometrica 73, 1673–1692. Donald, C.G., Newey, W.K., 2001. Choosing the number of instruments. Econometrica 69, 1161–1191. Elhorst, J.P., 2010. Dynamic panels with endogenous interaction effects when T is small. Reg. Sci. Urban Econom. 40, 272–282. Elhorst, J.P., Lacombe, D.J., Piras, G., 2012. On model specification and parameter space definitions in higher order spatial econometrics models. Reg. Sci. Urban Econom. 42, 211–220. Han, C., Phillips, P.C.B., 2006. GMM with many moment conditions. Econometrica 74, 147–192. Hahn, J., Kuersteiner, G., 2002. Asymptotically unbiased inference for a dynamic panel model with fixed effects when both n and T are Large. Econometrica 70, 1639–1657. Hansen, L.P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054. Hepple, L., 1995. Bayesian techniques in spatial and network econometrics: 2. Computational methods and algorithms. Environ. Planning A 27, 615–644. Hsiao, C., Pesaran, H., Tahmiscioglu, K., 2002. Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods. J. Econometrics 109, 107–150. Kapoor, M., Kelejian, H.H., Prucha, I.R., 2007. Panel data models with spatially correlated error components. J. Econometrics 140, 97–130. Kelejian, H.H., Prucha, I.R., 1998. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbance. J. Real Estate Finance Econom. 17 (1), 99–121. Kelejian, H.H., Prucha, I.R., 2001. On the asymptotic distribution of the Moran I test statistic with applications. J. Econometrics 104, 219–257. Kelejian, H.H., Prucha, I.R., 2010. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J. Econometrics 157, 53–67. Korniotis, G.M., 2008. Estimating panel models with internal and external habit formation. J. Bus. Econom. Statist. 28, 145–158. Lee, L.F., 2007. GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J. Econometrics 137, 489–514. Lee, L.F., 2004. Asymptotic distributions of quasi-maximum likelihood estimators for spatial econometric models. Econometrica 72, 1899–1925. Lee, L.F., Liu, X., 2010. Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances. Econometric Theory 26, 187–230. Lee, L.F., Yu, J., 2010. A spatial dynamic panel data model with both time and individual fixed effects. Econometric Theory 26, 564–597. Liu, X., Lee, L., Bollinger, C., 2010. An efficient GMM estimator of spatial autoregressive models. J. Econometrics 159, 303–319. Moran, P., 1950. Notes on continuous stochastic phenomena. Biometrika 37, 17–23. Neyman, J., Scott, E., 1948. Consistent estimates based on partially consistent observations. Econometrica 16, 1–32. Nickell, S.J., 1981. Biases in dynamic models with fixed effects. Econometrica 49, 1417–1426. Ord, J.K., 1975. Estimation methods for models of spatial interaction. J. Amer. Statist. Assoc. 70, 120–297. Su, L., Yang, Z., 2007. QML estimation of dynamic panel data models with spatial errors. Manuscript, Beijing University and Singapore Management University. Yu, J., de Jong, R., Lee, L.F., 2008. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. J. Econometrics 146, 118–134. Yu, J., de Jong, R., Lee, L.F., 2012. Estimation for spatial dynamic panel data with fixed effects: the case of spatial cointegration. J. Econometrics 167, 16–37. Yu, J., Lee, L.F., 2010. Estimation of unit root spatial dynamic data models. Econometric. Theory 26, 1332–1362.