Journal of Econometrics (2017), http://dx.doi.org/10.1016/j.jeconom.2017.02.001
Received 3 December 2015; revised 30 January 2017; accepted 4 February 2017.
Please cite this article as: Li, K., Fixed-effects dynamic spatial panel data models and impulse response analysis. Journal of Econometrics (2017).
Fixed-effects dynamic spatial panel data models and impulse response analysis∗ Kunpeng Li International School of Economics and Management Capital University of Economics and Business Beijing, China February 5, 2017
Abstract

Real data often have complicated correlations over cross section and time, and such correlations are of particular interest in empirical studies. This paper considers using high order spatial lags and high order time lags to model complicated correlations over cross section and time. We propose to estimate the model by the quasi maximum likelihood (QML) method. We establish the asymptotic theory of the quasi maximum likelihood estimator (QMLE), including consistency and the limiting distribution, under a large-N and large-T setup, where N denotes the number of individuals and T the number of time periods. We investigate the problem of estimating impulse response functions and the associated (1 − α) confidence intervals. Average direct, indirect and total impacts are defined in the same spirit as LeSage and Pace (2009) under the dynamic spatial panel data setup, and the estimation and inferential theory for the three impacts are studied. The model selection issue is also considered. Monte Carlo simulations confirm our theoretical results and show that the QMLE after bias correction has good finite sample performance.

Key Words: Dynamic spatial models; Panel data models; Quasi maximum likelihood estimation; Impulse response analysis; Confidence intervals; Model selection. JEL: C31; C33.
∗ The author would like to thank the co-editor Jianqing Fan, an associate editor and four anonymous referees for their critical and constructive comments, which greatly improved the quality of this paper. The author is deeply indebted to Professor Qi Li for his constant encouragement, stimulating comments and valuable support of this research. Financial support from NSFC grants No. 71571122 and No. 71201031 is gratefully acknowledged. All errors are mine.
1 Introduction

Real data often have complicated correlations over cross section and time, and econometricians and statisticians have developed a large body of tools to deal with these correlations. In the econometrics/statistics literature, correlations over time are typically dealt with by autoregressive or moving average models, among others (e.g., Brockwell and Davis (1991), Fuller (1996)). Correlations over cross section are typically captured by spatial models or factor models (e.g., Anselin (1988), Gupta and Robinson (2015), Bai and Li (2015, 2016), Fan et al. (2011)). This paper considers using a dynamic spatial autoregressive panel data model to deal with correlations over cross section and time. We allow multiple spatial lags and multiple time lags to accommodate the possibly complicated correlation structure of real data.

Spatial econometrics is one of the most active fields in econometrics and has received considerable attention over the last three decades. Early work on spatial econometrics dates back to Cliff and Ord (1973), who first proposed spatial autoregressive (SAR) models. Due to the presence of the endogenous spatial term, the ordinary least squares (OLS) method is inapplicable, and various methods have been proposed to address this issue. Kelejian and Prucha (1998) use the instrumental variable (IV) method to estimate SAR models; Kelejian and Prucha (1999) extend the IV analysis to the generalized method of moments (GMM) framework. Besides GMM, the quasi maximum likelihood (QML) method is also popular in the spatial econometrics literature; Lee (2004) gives a rigorous analysis of the asymptotic properties of quasi maximum likelihood estimators (QMLE). With the availability of panel data, recent studies pay much attention to spatial panel data models. Baltagi et al. (2007) consider testing various combinations of spatial correlation, serial correlation and random effects in a static panel data setup. Kapoor et al.
(2007) use the GMM method to estimate panel data models with spatially correlated error components. Yu et al. (2008) suggest using QML methods to estimate dynamic spatial panel models. Lee and Yu (2014) apply GMM estimation methods to dynamic spatial panel models with multiple spatial lags. Lee and Yu (2010) consider spatial autoregressive panel models with spatial autoregressive disturbances.

The current study is also related to the large literature on estimation and inference for dynamic panel data models. Models with heterogeneous time-invariant intercepts (fixed effects) suffer from the so-called "incidental parameters problem" (Neyman and Scott (1948)), which is a primary concern in studies of dynamic panel models. Under the spherical error assumption, it can be shown that the QMLE is equivalent to the within-group estimator, so we use the QMLE and the within-group estimator interchangeably. Under the large-N and fixed-T setup, where N denotes the number of individuals and T the number of time periods, Nickell (1981) shows that the within-group estimator is inconsistent; see also Hsiao (1986) and Kiviet (1995). A well-adopted remedy, proposed by Anderson and Hsiao (1981), is to difference the data over time to eliminate the fixed effects and use the two-periods-lagged dependent variable as an instrument to estimate the model. Arellano and Bond (1991) observe that all dependent variables lagged two or more periods are valid instruments and extend Anderson and Hsiao's idea to a GMM framework. Blundell and Bond (1998) propose a system GMM method which includes the moment conditions of both levels and first differences. All these studies deliver consistent estimation under fixed T at the cost of assuming temporal uncorrelatedness of the errors. When T is large or moderately large, the number of moment conditions increases dramatically, making the GMM method suffer from the so-called "many moments bias" problem. In addition, the increasing computational burden of the GMM method as T grows makes it unattractive to practitioners.

Under the large-N and large-T setup, the within-group estimator regains its appealing properties, such as consistency, computational simplicity and insensitivity to temporal correlations of the errors. Hahn and Kuersteiner (2002) show that the within-group estimator has an O(1/T) bias under large N and large T; after bias correction, the corrected estimator achieves the efficiency bound under the normality assumption on the errors. Alvarez and Arellano (2003) investigate the asymptotic properties of the within-group estimator and the GMM estimator under large N and large T.

In this paper, we use the QML method to estimate a dynamic spatial panel data model under the large-N and large-T setup¬. Since the model considered here can be regarded as an extension of a usual dynamic panel data model, the incidental parameters problem is inherited and the QMLE has a non-negligible bias. Following Hahn and Kuersteiner (2002), we conduct bias correction on the QMLE to make it properly centered. An alternative approach is to use a GMM method. Although the GMM method has several desirable features under fixed T (Lee and Yu (2014)), its advantages under large T are not obvious. In fact, according to the simulation results in Lee and Yu (2014), the performance of the QMLE after bias correction is comparable or even superior to that of GMM estimators even when T is not large.

This paper contributes to the spatial econometrics literature in several dimensions. First, we extend the model with one spatial lag and one time lag in Yu et al. (2008) and the model with multiple spatial lags and one time lag in Lee and Yu (2014) to a general model with multiple spatial lags and multiple time lags. Inclusion of multiple time lags is important for several reasons. It makes the model flexible enough to capture complicated temporal correlations in real data.
It can also generate impulse response functions of different shapes (instead of the simple monotone exponential-decay shape that arises with one time lag) and offers opportunities to identify interesting economic phenomena from the data. Second, we consider the inference problem for impulse response functions of a dynamic spatial panel model. Impulse response analysis is one of the primary tools in macroeconomic studies, and impulse response analysis in the spatial econometric setup has been considered in a number of studies, e.g., Beenstock and Felsenstein (2007), Brady (2011), Holly et al. (2011), Lee and Yu (2015). However, all these studies focus on the applied aspect; to our knowledge, the asymptotic properties of estimated impulse response functions have not been studied yet. This paper intends to fill this gap. Third, we consider the model selection issue concomitant with dynamic spatial panel models. We show that if a penalty function satisfies some regularity conditions, the information criterion proposed in this paper consistently estimates the underlying model. This enables one to remove irrelevant endogenous, predetermined or exogenous regressors from a model and so reduce the number of parameters being estimated. We also show that, under some general conditions, the proposed information criterion is a consistent estimator of the Kullback discrepancy for the examined model; if no available model is correctly specified, we can still use the proposed information criterion to choose the best one, i.e., the one with the minimum Kullback-Leibler distance to the true model.

The rest of the paper is organized as follows. Section 2 gives a detailed description of the dynamic spatial panel data model considered in this paper. Section 3 presents a quasi likelihood function and defines the QMLE. Section 4 lists the assumptions needed for the theoretical analysis. Section 5 presents asymptotic properties of the QMLE, including consistency and the limiting distribution; estimation of the limiting variance is also discussed. Section 6 considers estimation of impulse response functions and the associated (1 − α) confidence intervals. Section 7 considers the model selection problem concomitant with the original model. Section 8 conducts Monte Carlo simulations to examine the validity of the theoretical results and to investigate the finite sample properties of the QMLE as well as of the estimated impulse response functions and confidence intervals. Section 9 concludes. Technical proofs are presented in Appendix A, with more detailed derivations in several supplementary appendices.

Throughout the paper, $a \vee b$ and $a \wedge b$ denote the maximum and minimum of $a$ and $b$, respectively. For any $N \times N$ matrix $M$, $\|M\| = \sqrt{\operatorname{tr}(M'M)}$ is the Frobenius norm, $\|M\|_1 = \max_{1\le j\le N}\sum_{i=1}^{N}|m_{ij}|$ the column sum norm, and $\|M\|_\infty = \max_{1\le i\le N}\sum_{j=1}^{N}|m_{ij}|$ the row sum norm, where $m_{ij}$ is the $(i,j)$th element of $M$.

¬ The large-N and large-T setup is implicitly required throughout this paper. On the one hand, the asymptotics of the impulse response functions in Section 6 depend critically on large T; on the other hand, spatial econometrics inherently requires N to be large.
2 Model description

Consider the following dynamic spatial panel data model:

$$Y_t = \mu + \sum_{m=1}^{p} \varpi_m W_m Y_t + \sum_{n=1}^{q} \rho_n Y_{t-n} + \sum_{m=1}^{p}\sum_{n=1}^{q} \gamma_{mn} W_m Y_{t-n} + X_t \beta + e_t, \qquad (2.1)$$
where $Y_t = (Y_{1t}, Y_{2t}, \ldots, Y_{Nt})'$ and $e_t = (e_{1t}, e_{2t}, \ldots, e_{Nt})'$ are both $N$-dimensional column vectors, $Y_{t-n}$ is the lagged dependent variable defined similarly to $Y_t$, $X_t$ is an $N \times k$ data matrix of $k$-dimensional exogenous explanatory variables, $\mu$ is an $N \times 1$ unit-dependent time-invariant intercept, and $W_1, W_2, \ldots, W_p$ are $N \times N$ exogenous spatial weights matrices whose properties will be prescribed below. Here we assume that $e_{it}$ is independent and identically distributed (i.i.d.) over $i$ and $t$ (a detailed assumption on $e_{it}$ is given in Assumption A below).

In model (2.1), we allow multiple spatial lags, multiple time lags and mixed spatial-time lags. These considerations make model (2.1) flexible enough to accommodate complicated correlations over cross section and time in data. The spatial weights matrices $W_m$, in a precise sense, should be written as $W_{Nm}$ since they specify spatial correlations across $N$ units; as $N$ changes, the spatial weights matrices change accordingly. However, we drop the $N$ subscript for notational simplicity.

High order spatial models have many potential applications. As a showcase example, suppose that we have $N$ spatial units which can be classified into $p$ groups, and suppose further that spatial spillover effects only exist within groups. This is the case in some applications; for example, one has peer effects on one's classmates, but little effect on students in other classes. Let $\tilde{W}_m$ summarize the spatial interaction structure of group $m$ ($m = 1, \ldots, p$). If we use $W = \operatorname{diag}(\tilde{W}_1, \tilde{W}_2, \ldots, \tilde{W}_p)$ as the spatial weights matrix to fit the data, we implicitly assume that the spatial spillover effects are the same across groups. In contrast, if we use the high order spatial model with $W_m = \operatorname{diag}(0, 0, \ldots, \tilde{W}_m, \ldots, 0)$ to fit the data, we allow group-dependent strengths of spatial spillover effects. This feature can be of empirical relevance, especially when the spatial spillover effects are the target of interest. High order spatial models have been applied in a variety of setups; see Lacombe (2004) and McMillen et al. (2007), among others. A direct application of the current model can be found in Gao, Li and Yang (2017), which investigates the determinants of home prices in 68 Chinese cities.
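The block-diagonal, group-specific construction described above is easy to assemble in code. The sketch below is our own illustration (the function name `group_weights` is not from the paper); it pads each group's internal interaction matrix $\tilde{W}_m$ with zero blocks so that the $m$th spatial lag only connects units inside group $m$:

```python
import numpy as np

def group_weights(group_blocks):
    """Given per-group interaction matrices W~_1, ..., W~_p, return the list of
    high order spatial weights W_m = diag(0, ..., W~_m, ..., 0), so that the
    m-th spatial lag only links units within group m."""
    sizes = [B.shape[0] for B in group_blocks]
    N = sum(sizes)
    offsets = np.cumsum([0] + sizes)
    W_list = []
    for m, B in enumerate(group_blocks):
        W = np.zeros((N, N))
        a, b = offsets[m], offsets[m + 1]
        W[a:b, a:b] = B          # place W~_m in the m-th diagonal block
        W_list.append(W)
    return W_list
```

Fitting model (2.1) with these $W_m$ then attaches a separate coefficient $\varpi_m$ to each group, which is exactly the group-dependent spillover strength discussed above.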
Throughout the paper, symbols with an asterisk denote underlying true values. Define

$$\mathcal{D}(\varpi) = I_N - \sum_{m=1}^{p} \varpi_m W_m, \qquad R_n(\Phi) = \rho_n I_N + \sum_{m=1}^{p} \gamma_{mn} W_m \qquad (2.2)$$

for $n = 1, 2, \ldots, q$, where $\Phi = (\rho', \operatorname{vec}(\gamma)', \beta')'$, $\rho = (\rho_1, \rho_2, \ldots, \rho_q)'$ and $\gamma = (\gamma_{mn})_{p \times q}$ is the $p \times q$ coefficient matrix. Similarly, define

$$\mathcal{D}^* = I_N - \sum_{m=1}^{p} \varpi_m^* W_m, \qquad R_n^* = \rho_n^* I_N + \sum_{m=1}^{p} \gamma_{mn}^* W_m. \qquad (2.3)$$
Then the underlying model is equivalent to

$$\mathcal{D}^* Y_t = \mu^* + R_1^* Y_{t-1} + R_2^* Y_{t-2} + \cdots + R_q^* Y_{t-q} + X_t \beta^* + e_t.$$

Let $L$ be the lag operator. The preceding display can be written in terms of $L$ as

$$(\mathcal{D}^* - R_1^* L - \cdots - R_q^* L^q)\, Y_t = \mu^* + X_t \beta^* + e_t. \qquad (2.4)$$

Let $R^*(L) = \mathcal{D}^* - R_1^* L - \cdots - R_q^* L^q$. If $R^*(L)$ is invertible®, then

$$Y_t = R^*(L)^{-1}\mu^* + R^*(L)^{-1} X_t \beta^* + R^*(L)^{-1} e_t = \mathbb{D}^{*-1}\mu^* + \sum_{v=0}^{\infty} B_v^* X_{t-v}\beta^* + \sum_{v=0}^{\infty} B_v^* e_{t-v} \qquad (2.5)$$

with $\mathbb{D}^* = \mathcal{D}^* - R_1^* - \cdots - R_q^*$. The matrix $B_\tau^*$ is recursively determined by

$$B_\tau^* = \mathcal{D}^{*-1} \sum_{v=1}^{q} R_v^* B_{\tau-v}^* \qquad (2.6)$$

with $B_0^* = \mathcal{D}^{*-1}$ and $B_\tau^* = 0$ if $\tau < 0$. The moving average expression (2.5) is particularly convenient for the subsequent analysis, especially for the derivation of the asymptotic representation. It is also important for impulse response analysis.
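The data generating process (2.4) can be simulated period by period by solving the simultaneous spatial equation at each $t$. The sketch below is our own illustration (function name and parameter values are ours, not the paper's); it draws i.i.d. normal errors as in Assumption A and discards a burn-in so the stochastic part is approximately stationary:

```python
import numpy as np

def simulate_dspd(D, R_list, mu, Xbeta, sigma, T, burn=200, seed=0):
    """Simulate Y_t from D* Y_t = mu* + R_1* Y_{t-1} + ... + R_q* Y_{t-q}
    + X_t beta* + e_t (equation (2.4)).

    D      : N x N matrix D* = I_N - sum_m varpi_m* W_m
    R_list : [R_1*, ..., R_q*], each N x N
    Xbeta  : (T + burn) x N array holding the exogenous part X_t beta*
    """
    rng = np.random.default_rng(seed)
    N, q = D.shape[0], len(R_list)
    D_inv = np.linalg.inv(D)
    Y = np.zeros((T + burn, N))
    for t in range(q, T + burn):
        rhs = mu + Xbeta[t] + rng.normal(0.0, sigma, size=N)
        for n, Rn in enumerate(R_list, start=1):
            rhs = rhs + Rn @ Y[t - n]
        Y[t] = D_inv @ rhs       # solve the simultaneous spatial equation
    return Y[burn:]              # drop the burn-in periods
```

The test values below satisfy the sufficient stationarity condition (4.3) discussed in Section 4.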
3 Likelihood function

We first introduce some notation to simplify the presentation. Let $r = pq + q + k$ be the dimension of $\Phi$, where $\Phi$ is defined below (2.2). Define an $N \times r$ matrix $\mathbb{X}_t$ as

$$\mathbb{X}_t = \big[\,Y_{t-1}, \ldots, Y_{t-q},\ W_1 Y_{t-1}, \ldots, W_p Y_{t-1}, \ldots, W_1 Y_{t-q}, \ldots, W_p Y_{t-q},\ X_t\,\big].$$

Then model (2.1) can be written more compactly as

$$Y_t = \mu + \sum_{m=1}^{p} \varpi_m W_m Y_t + \mathbb{X}_t \Phi + e_t.$$

® A sufficient condition for invertibility of $R^*(L)$ is that the roots of the characteristic function $|\mathcal{D}^* - R_1^* x - \cdots - R_{q-1}^* x^{q-1} - R_q^* x^q| = 0$ are all outside the unit circle.

Let $\theta = (\varpi', \Phi', \sigma^2)'$ with $\varpi = (\varpi_1, \varpi_2, \ldots, \varpi_p)'$. By assuming that $e_{it}$ is independent and normally distributed over $i$ and $t$ with zero mean and variance $\sigma^2$, the log likelihood function is given by

$$L^*(\theta, \mu) = -\frac{1}{2}\ln\sigma^2 - \frac{1}{2N\bar{T}\sigma^2}\sum_{t=\bar{q}}^{T} Z_t(\mu, \varpi, \Phi)'\, Z_t(\mu, \varpi, \Phi) + \frac{1}{N}\ln|\mathcal{D}(\varpi)|, \qquad (3.1)$$

where $\bar{T} = T - q$, $\bar{q} = q + 1$ and

$$Z_t(\mu, \varpi, \Phi) = \mathcal{D}(\varpi) Y_t - \mu - \mathbb{X}_t\Phi, \qquad \mathcal{D}(\varpi) = I_N - \varpi_1 W_1 - \varpi_2 W_2 - \cdots - \varpi_p W_p. \qquad (3.2)$$

Given $\varpi$, $\Phi$ and $\sigma^2$, it is easy to see that the log likelihood function is maximized over $\mu$ at

$$\mu = \mathcal{D}(\varpi)\Big[\frac{1}{\bar{T}}\sum_{t=\bar{q}}^{T} Y_t\Big] - \Big[\frac{1}{\bar{T}}\sum_{t=\bar{q}}^{T} \mathbb{X}_t\Big]\Phi.$$

Substituting the preceding formula into (3.1), we can concentrate out the nuisance parameter $\mu$. The resulting concentrated log likelihood function is

$$L(\theta) = -\frac{1}{2}\ln\sigma^2 - \frac{1}{2N\bar{T}\sigma^2}\sum_{t=\bar{q}}^{T} \dot{Z}_t(\varpi, \Phi)'\,\dot{Z}_t(\varpi, \Phi) + \frac{1}{N}\ln|\mathcal{D}(\varpi)| \qquad (3.3)$$

with

$$\dot{Z}_t(\varpi, \Phi) = \mathcal{D}(\varpi)\Big[Y_t - \frac{1}{\bar{T}}\sum_{s=\bar{q}}^{T} Y_s\Big] - \Big[\mathbb{X}_t - \frac{1}{\bar{T}}\sum_{s=\bar{q}}^{T} \mathbb{X}_s\Big]\Phi = \mathcal{D}(\varpi)\dot{Y}_t - \dot{\mathbb{X}}_t\Phi, \qquad (3.4)$$

where $\dot{Y}_t$ and $\dot{\mathbb{X}}_t$ are implicitly defined in (3.4). Throughout the paper, we use $\dot{\upsilon}_t$ to denote $\upsilon_t - \bar{T}^{-1}\sum_{s=\bar{q}}^{T}\upsilon_s$ for any vector or matrix $\upsilon_t$.

Implementing the QML estimation is an important issue in practice, and Appendix F gives some discussion of it. We point out that the parameters $\Phi$ and $\sigma^2$ can be concentrated out, so the optimization only needs to be carried out over $\varpi$. To avoid repeated calculation of some known matrices during the iterations, we rewrite the concentrated likelihood function to separate the free parameters $\varpi$ from the data-dependent matrices. We also discuss a method to avoid the calculation of $|\mathcal{D}(\varpi)|$ when $N$ is extremely large.
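To make the concentration step concrete, the sketch below evaluates the concentrated log likelihood (3.3) at a candidate $\varpi$, with $\Phi$ and $\sigma^2$ concentrated out by least squares. This is a minimal illustration under our own naming, not the paper's actual implementation (which Appendix F optimizes further):

```python
import numpy as np

def concentrated_loglik(varpi, W_list, Y_dot, X_dot_list):
    """Evaluate the concentrated quasi log likelihood (3.3) at varpi.

    Y_dot      : T x N array of within-demeaned observations Y_dot_t
    X_dot_list : length-T list of N x r demeaned regressor matrices X_dot_t

    For fixed varpi, Phi is the least squares coefficient of D(varpi) Y_dot_t
    on X_dot_t and sigma^2 the mean squared residual, so a numerical search
    only has to run over the p elements of varpi.
    """
    N = Y_dot.shape[1]
    D = np.eye(N) - sum(v * W for v, W in zip(varpi, W_list))
    lhs = (Y_dot @ D.T).reshape(-1)     # stacked D(varpi) Y_dot_t
    rhs = np.vstack(X_dot_list)         # stacked X_dot_t, (T*N) x r
    Phi, *_ = np.linalg.lstsq(rhs, lhs, rcond=None)
    resid = lhs - rhs @ Phi
    sigma2 = resid @ resid / resid.size
    # log-determinant of D(varpi); assumed positive on the parameter space
    _, logdet = np.linalg.slogdet(D)
    loglik = -0.5 * np.log(sigma2) - 0.5 + logdet / N
    return loglik, Phi, sigma2
```

In practice one would maximize this function over $\varpi$ with a standard numerical optimizer (or a coarse grid when $p = 1$).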
4 Assumptions

For the subsequent asymptotic analysis, we make the following assumptions. Hereafter we use $C$ to denote a generic constant.

Assumption A: The errors $e_{it}$ ($i = 1, 2, \ldots, N$; $t = \ldots, -1, 0, 1, \ldots, T$) are independently and identically distributed with mean zero and variance $\sigma^{*2} > 0$. In addition, we assume that $E(|e_{it}|^{4+c}) < \infty$ for all $i$ and $t$ for some $c > 0$.

Assumption B: Let $\mathcal{D}(\varpi) = I_N - \sum_{m=1}^{p}\varpi_m W_m$. We assume that $\mathcal{D}(\varpi)$ is invertible for all $\varpi \in \mathcal{R}_\varpi$, where $\mathcal{R}_\varpi$ is a compact subset of $\mathbb{R}^p$ and $\varpi^*$ is an interior point of $\mathcal{R}_\varpi$.

Assumption C: Each $W_m$ ($m = 1, 2, \ldots, p$) is an exogenous spatial weights matrix whose diagonal elements are all zeros. In addition, $W_m$ is bounded by some constant $C$ for all $N$ under the $\|\cdot\|_1$ and $\|\cdot\|_\infty$ norms. Moreover, $\mathcal{D}(\varpi)^{-1}$ is bounded by some constant $C$ for all $N$ under the $\|\cdot\|_1$ and $\|\cdot\|_\infty$ norms, uniformly on $\mathcal{R}_\varpi$.

Assumption D: The underlying true values $\theta^* = (\varpi^{*\prime}, \Phi^{*\prime}, \sigma^{*2})'$ are in a compact set. In addition, $\varpi$ is estimated in the compact set $\mathcal{R}_\varpi$.

Assumption E: The roots of the characteristic function $|\mathcal{D}^* - R_1^* x - \cdots - R_{q-1}^* x^{q-1} - R_q^* x^q| = 0$ are all outside the unit circle. In addition, the sums $\sum_{v=0}^{\infty}\|B_v^*\|_1$ and $\sum_{v=0}^{\infty}\|B_v^*\|_\infty$ are bounded by some constant $C$, where $B_v^*$ is defined in (2.6).

Assumption F: The elements of $X_t$ are nonrandom and bounded for $t = \ldots, -1, 0, 1, \ldots$. In addition, the matrix $\Xi = \lim_{N,T\to\infty}\frac{1}{N\bar{T}}\sum_{t=\bar{q}}^{T}E(\tilde{\mathbb{X}}_t'\tilde{\mathbb{X}}_t)$ is strictly positive definite with the largest eigenvalue bounded by some constant $C$, where

$$\tilde{\mathbb{X}}_t = \big[\,\tilde{Y}_{t-1}, \ldots, \tilde{Y}_{t-q},\ W_1\tilde{Y}_{t-1}, \ldots, W_p\tilde{Y}_{t-1}, \ldots, W_1\tilde{Y}_{t-q}, \ldots, W_p\tilde{Y}_{t-q},\ \dot{X}_t\,\big] \qquad (4.1)$$

with $\tilde{Y}_t = \sum_{v=0}^{\infty} B_v^*\dot{X}_{t-v}\beta^* + \sum_{v=0}^{\infty} B_v^* e_{t-v}$.

Assumption G: One of the following assumptions holds:

G.1 For all $\varpi \in \mathcal{R}_\varpi$ and $\sigma^2 > 0$ with $(\varpi, \sigma^2) \neq (\varpi^*, \sigma^{*2})$,
$$\liminf_{N\to\infty}\ \Big\{\frac{1}{2N}\operatorname{tr}\Big[\frac{\sigma^{*2}}{\sigma^{2}}\,F(\varpi)'F(\varpi)\Big] - \frac{1}{2N}\ln\Big|\frac{\sigma^{*2}}{\sigma^{2}}\,F(\varpi)'F(\varpi)\Big| - \frac{1}{2}\Big\} > 0,$$
where $F(\varpi) = \mathcal{D}(\varpi)\mathcal{D}^{*-1}$.

G.2 The matrix
$$\lim_{N,T\to\infty}\frac{1}{N\bar{T}}\Big\{\sum_{t=\bar{q}}^{T}E(V_t'V_t) - \sum_{t=\bar{q}}^{T}E(V_t'\tilde{\mathbb{X}}_t)\Big[\sum_{t=\bar{q}}^{T}E(\tilde{\mathbb{X}}_t'\tilde{\mathbb{X}}_t)\Big]^{-1}\sum_{t=\bar{q}}^{T}E(\tilde{\mathbb{X}}_t'V_t)\Big\}$$
is positive definite, where $V_t = [\,G_1^*\tilde{\mathbb{X}}_t\Phi^*,\ G_2^*\tilde{\mathbb{X}}_t\Phi^*,\ \ldots,\ G_p^*\tilde{\mathbb{X}}_t\Phi^*\,]$ with $G_m^* = W_m\mathcal{D}^{*-1}$ for $m = 1, 2, \ldots, p$.
Assumption A requires that the disturbances are drawn from a random sample. Similar assumptions appear in a number of studies on QML estimation of spatial models; see Lee (2004), Yu et al. (2008) and Lee and Yu (2010). Assumption B imposes invertibility of $\mathcal{D}(\varpi)$. Invertibility of $\mathcal{D}(\varpi^*)$ is necessary for the model to be well defined. Once $\mathcal{D}(\varpi^*)$ is invertible, we can always find a neighborhood of $\varpi^*$ in which $\mathcal{D}(\varpi)$ is invertible, since $|\mathcal{D}(\varpi)|$ is a continuous function of $\varpi$. This invertibility, however, is a local property; we assume that it can be extended to a global property, which gives rise to Assumption B¯. Assumption C imposes restrictions on the spatial weights matrices. This assumption is standard in the spatial econometrics literature, e.g., Kelejian and Prucha (1998, 1999) and Lee (2004). The exogeneity assumption on the spatial weights matrices simplifies the theoretical analysis, but it also rules out some interesting applications; see Qu and Lee (2015) for a recent development allowing for endogenous spatial weights matrices.

Assumption D assumes that $\varpi$ is estimated in a compact set. It can be shown that the likelihood function eventually reduces to a nonlinear function of $\varpi$; once $\varpi$ is given, the model is a linear regression model in the other parameters. When dealing with a nonlinear objective function, the assumption of a compact parameter space is usually needed; see, for example, Jennrich (1969). This is the reason that $\varpi$ is estimated in a compact subset while the other parameters are free of such a restriction. Newey and McFadden (1994) discuss cases in which the compact parameter space assumption can be dropped; a sufficient condition is that the objective function is globally concave. However, verifying concavity is not an easy task except in some special cases.

Assumption E imposes restrictions on the coefficients of the moving average representation (2.5) and corresponds to Assumption 6 in Yu et al. (2008). This assumption can be viewed as an extension of the absolute summability condition in the time series literature. Under this assumption, the stochastic parts of $Y_{t-n}$ and $W_m Y_{t-n}$ ($m = 1, \ldots, p$; $n = 0, \ldots, q$) are stationary, an essential property for the final limiting distribution of the QMLE. An implication of Assumption E is that

$$(\mathcal{D}^* - R_1^* - R_2^* - \cdots - R_q^*)^{-1} - \sum_{v=0}^{\infty} B_v^* = 0. \qquad (4.2)$$

To see this, notice that for any sufficiently large $d$,

$$(\mathcal{D}^* - R_1^* - R_2^* - \cdots - R_q^*)\sum_{v=0}^{d} B_v^* = I_N - z_d$$

with $z_d = (R_1^* B_d^* + \cdots + R_q^* B_{d-q+1}^*) + (R_2^* B_d^* + \cdots + R_q^* B_{d-q+2}^*) + \cdots + R_q^* B_d^*$. However, $z_d$ is bounded by

$$\|z_d\|_1 \le \big(\|R_1^*\|_1 + \cdots + \|R_q^*\|_1\big)\sum_{v=d-q+1}^{\infty}\|B_v^*\|_1.$$

Letting $d \to \infty$, the second factor converges to 0 by Assumption E, while the first factor is bounded by Assumption D. Thus $\|z_d\|_1 \to 0$ for any $N$, which implies (4.2) by first letting $d \to \infty$ and then $N \to \infty$. A sufficient condition to guarantee Assumption E is

$$\Big[\sum_{m=1}^{p}|\varpi_m^*| + \sum_{n=1}^{q}|\rho_n^*| + \sum_{m=1}^{p}\sum_{n=1}^{q}|\gamma_{mn}^*|\Big]\max_{1\le m\le p}\big(\|W_m\|_1 \vee \|W_m\|_\infty \vee 1\big) < 1. \qquad (4.3)$$

¯ To make Assumption B hold, a sufficient condition is that the positive real eigenvalues of $\sum_{m=1}^{p}\varpi_m W_m$ lie in $[0, 1)$. This condition has been imposed in a variety of ways by different studies under the model with one spatial lag; see Kelejian and Prucha (1999) and LeSage and Pace (2009), among others. Along this line, Elhorst et al. (2012) make an extension to the model with two spatial lags. They show that the widely-used condition $[\max_{1\le m\le p}\|W_m\|_\infty]\cdot\sum_{m=1}^{p}|\varpi_m| < 1$ is too narrow to define the parameter space. Although these results are interesting, imposing such a condition to guarantee invertibility of $I_N - \sum_{m=1}^{p}\varpi_m W_m$ seems too restrictive; in fact, as long as the eigenvalues of $\sum_{m=1}^{p}\varpi_m W_m$ are not equal to 1, the model is well defined.
Appendix D gives a proof of the above assertion. Note that if the parameter space is specified to satisfy condition (4.3), we have

$$\Big\|\sum_{m=1}^{p}\varpi_m W_m\Big\|_1 \vee \Big\|\sum_{m=1}^{p}\varpi_m W_m\Big\|_\infty \le \sum_{m=1}^{p}|\varpi_m|\big(\|W_m\|_1 \vee \|W_m\|_\infty\big) < 1.$$

Given this, Assumption B holds by the fact that $I - A$ is invertible if $\|A\|_1 < 1$ or $\|A\|_\infty < 1$.

We note that, for any $N \times N$ matrix $M$, the conditions $\|M\|_1 < \infty$ and $\|M\|_\infty < \infty$ are only restrictive when $N \to \infty$; for finite $N$ they are always satisfied, so they impose nothing for finite $N$. The same issue also arises in verifying Assumption E with finite $N$. Unless we know the rule by which the $W_m$ evolve as $N$ grows, we cannot formulate model (2.1) with infinite $N$. If we can only formulate model (2.1) with finite $N$, which is the typical case in applications, Assumption E is no more than the restriction that $B_\tau^*$ is element-by-element absolutely summable over $\tau$. Note that for any $N \times N$ matrix $M$ we have $(\|M\|_1 \vee \|M\|_\infty) \le \sqrt{N}\,\|M\|_2$, where $\|M\|_2$ denotes the square root of the largest eigenvalue of $M'M$ (see Problem 4.67 in Seber (2008)). Given this result, it suffices to check the condition $\sum_{v=0}^{\infty}\|B_v^*\|_2 < \infty$ under finite $N$. If the roots of the characteristic function $|\mathcal{D}^* - R_1^* x - \cdots - R_{q-1}^* x^{q-1} - R_q^* x^q| = 0$ are all outside the unit circle, we have $\sum_{v=0}^{\infty}\|B_v^*\|_2 < \infty$ (see Hamilton (1994)). Since $\mathcal{D}^*$ and $R_n^*$ ($n = 1, \ldots, q$) are unknown, we adopt the routine of vector autoregressive (VAR) analysis: first estimate $\mathcal{D}^*$ and $R_n^*$, then check whether the roots of the estimated characteristic function are all outside the unit circle.

Assumption F requires that $\tilde{\mathbb{X}}_t$ be of full column rank. This condition is imposed for the identification of $\Phi$. An undesirable feature of Assumption F is that restrictions are partly imposed on the predetermined regressors $W_m Y_{t-n}$, which are in fact determined within the model. In Appendix E, we calculate the explicit expression of $\frac{1}{N\bar{T}}\sum_{t=\bar{q}}^{T}E(\tilde{\mathbb{X}}_t'\tilde{\mathbb{X}}_t)$ in terms of the underlying true parameters and the exogenous regressors $X_t$; the assumption can therefore alternatively be made on this expression.
However, we prefer the assumption on the term $\frac{1}{N\bar{T}}\sum_{t=\bar{q}}^{T}E(\tilde{\mathbb{X}}_t'\tilde{\mathbb{X}}_t)$ since it is simple and conforms with the usual identification condition in regression models.

Assumption G is an identification condition for $\varpi$ and $\sigma^2$. Consider the spatial model

$$Y_t = \mu^* + \sum_{m=1}^{p}\varpi_m^* W_m Y_t + \mathbb{X}_t\Phi^* + e_t,$$

which can be written as $Y_t = \mathcal{D}(\varpi^*)^{-1}\mu^* + \mathcal{D}(\varpi^*)^{-1}\mathbb{X}_t\Phi^* + \mathcal{D}(\varpi^*)^{-1}e_t$. The three terms on the right-hand side of the above equation all contain $\varpi^*$. Loosely speaking, we may use the second term to identify $\varpi^*$, which corresponds to Assumption G.2; or alternatively we may use the variance of the third term to identify $\varpi^*$, which corresponds to Assumption G.1. The first term is irrelevant for the identification of $\varpi$ since $\mu^*$ is a free parameter. Notice that when we use Assumption G.2 to identify $\varpi$, we implicitly assume that $\Phi^* \neq 0_{r\times 1}$, otherwise the second term disappears. For this reason, Assumption G.2 is a local identification condition, since it depends on other underlying parameters; Assumption G.1 has no such dependence and is therefore a global identification condition. When the third term is used to identify $\varpi$, the error $e_{it}$ cannot have cross-sectional correlations, otherwise the third term would have the same problem as the first one and $\varpi$ could not be identified. Our assumption on $e_t$ (Assumption A) precludes this problem.
5 Asymptotic properties of the QMLE

We first define the QMLE. Let $\Theta = \mathcal{R}_\varpi \times \mathbb{R}^r \times \mathbb{R}_{\sigma^2}$ be the parameter space, where $r$ is the dimension of $\Phi$. The parameter space given here is due to Assumption D and the associated discussion in Section 4. The QMLE is then defined as

$$\hat{\theta} = \operatorname*{argmax}_{\theta\in\Theta}\ L(\theta), \qquad (5.1)$$

where $L(\theta)$ is given in (3.3) with $\theta = (\varpi', \Phi', \sigma^2)'$. In practice, it may be difficult to obtain the exact maximizer of $L(\theta)$. We can also use the condition (recall $\bar{T} = T - q$)

$$L(\hat{\theta}) \ge \sup_{\theta\in\Theta} L(\theta) - o_p\big[\max(N^{-1}\bar{T}^{-1},\ \bar{T}^{-2})\big] \qquad (5.2)$$

to define the QMLE. The following proposition shows that the QMLE is consistent.

Proposition 5.1: Let $\hat{\theta}$ be defined by (5.1) or (5.2). Under Assumptions A-G, as $N, T \to \infty$, we have $\hat{\theta} \xrightarrow{\,p\,} \theta^*$.

To state the limiting distribution of the QMLE, we introduce some additional notation. Let $\phi$ be the $p$-dimensional vector defined as

$$\phi = \frac{1}{N}\big[\operatorname{tr}(W_1\mathbb{D}^{*-1}),\ \operatorname{tr}(W_2\mathbb{D}^{*-1}),\ \ldots,\ \operatorname{tr}(W_p\mathbb{D}^{*-1})\big]',$$

where $\mathbb{D}^* = \mathcal{D}^* - R_1^* - \cdots - R_q^*$. Let $\iota_q$ be a $q$-dimensional column vector with all its elements being 1. Further define a $(p + q + pq + k + 1)$-dimensional column vector $\Delta_N$ as

$$\Delta_N = \Big[\underbrace{\phi'}_{1\times p},\ \underbrace{N^{-1}\operatorname{tr}(\mathbb{D}^{*-1})\,\iota_q'}_{1\times q},\ \underbrace{(\iota_q\otimes\phi)'}_{1\times pq},\ \underbrace{0}_{1\times k},\ \frac{1}{2\sigma^{*2}}\Big]',$$
where $\otimes$ denotes the Kronecker product. Furthermore, define two matrices $\Omega_{0,NT}$ and $\Omega_{1,N}$ as

$$\Omega_{0,NT} = \frac{1}{N\bar{T}\sigma^{*2}}\begin{bmatrix} \sum_{t=\bar{q}}^{T}E(\tilde{Z}_t'\tilde{Z}_t) & \sum_{t=\bar{q}}^{T}E(\tilde{Z}_t'\tilde{\mathbb{X}}_t) & 0_{p\times 1}\\[4pt] \sum_{t=\bar{q}}^{T}E(\tilde{\mathbb{X}}_t'\tilde{Z}_t) & \sum_{t=\bar{q}}^{T}E(\tilde{\mathbb{X}}_t'\tilde{\mathbb{X}}_t) & 0_{r\times 1}\\[4pt] 0_{1\times p} & 0_{1\times r} & N\bar{T}/(2\sigma^{*2})\end{bmatrix} + \frac{1}{N}\begin{bmatrix} \operatorname{tr}[G_1^*G_1^*] & \cdots & \operatorname{tr}[G_1^*G_p^*] & 0_{1\times r} & \operatorname{tr}(G_1^*)/\sigma^{*2}\\ \vdots & \ddots & \vdots & \vdots & \vdots\\ \operatorname{tr}[G_p^*G_1^*] & \cdots & \operatorname{tr}[G_p^*G_p^*] & 0_{1\times r} & \operatorname{tr}(G_p^*)/\sigma^{*2}\\ 0_{r\times 1} & \cdots & 0_{r\times 1} & 0_{r\times r} & 0_{r\times 1}\\ \operatorname{tr}(G_1^*)/\sigma^{*2} & \cdots & \operatorname{tr}(G_p^*)/\sigma^{*2} & 0_{1\times r} & 0\end{bmatrix} \qquad (5.3)$$

and

$$\Omega_{1,N} = \frac{\kappa_4^* - 3\sigma^{*4}}{N\sigma^{*4}}\begin{bmatrix} \operatorname{tr}(G_1^*\circ G_1^*) & \cdots & \operatorname{tr}(G_1^*\circ G_p^*) & 0_{1\times r} & \operatorname{tr}(G_1^*)/(2\sigma^{*2})\\ \vdots & \ddots & \vdots & \vdots & \vdots\\ \operatorname{tr}(G_p^*\circ G_1^*) & \cdots & \operatorname{tr}(G_p^*\circ G_p^*) & 0_{1\times r} & \operatorname{tr}(G_p^*)/(2\sigma^{*2})\\ 0_{r\times 1} & \cdots & 0_{r\times 1} & 0_{r\times r} & 0_{r\times 1}\\ \operatorname{tr}(G_1^*)/(2\sigma^{*2}) & \cdots & \operatorname{tr}(G_p^*)/(2\sigma^{*2}) & 0_{1\times r} & N(4\sigma^{*4})^{-1}\end{bmatrix}, \qquad (5.4)$$

where $\kappa_4^* = E(e_{it}^4)$, $G_m^* = W_m\mathcal{D}^{*-1}$, $\tilde{Z}_t = (W_1\tilde{Y}_t, W_2\tilde{Y}_t, \ldots, W_p\tilde{Y}_t)$, $\tilde{\mathbb{X}}_t$ is defined in Assumption F and $r = pq + q + k$ is the dimension of $\Phi$. In addition, "$\circ$" denotes the Hadamard product. We have the following theorem on the limiting distribution of the QMLE.
Theorem 5.1: Under Assumptions A-G, as $N, T \to \infty$ with $N/T^3 \to 0$, we have

$$\sqrt{N\bar{T}}\,\Big(\hat{\theta} - \theta^* + \frac{1}{\bar{T}}\,\Omega_0^{-1}\Delta\Big) \xrightarrow{\,d\,} N\big(0,\ \Omega_0^{-1}(\Omega_0 + \Omega_1)\Omega_0^{-1}\big),$$

where $\Delta = \lim_{N\to\infty}\Delta_N$, $\Omega_0 = \lim_{N,T\to\infty}\Omega_{0,NT}$ and $\Omega_1 = \lim_{N\to\infty}\Omega_{1,N}$.
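Theorem 5.1 suggests the bias-corrected estimator $\hat{\theta} + \bar{T}^{-1}\hat{\Omega}_0^{-1}\hat{\Delta}$ with plug-in estimates. The fragment below assembles the plug-in $\hat{\Delta}_N$ from the definition of $\Delta_N$ above; it is a sketch under our own naming (the full plug-in construction for $\Omega_0$ is given in the paper's Appendix E and is omitted here):

```python
import numpy as np

def delta_N(W_list, D_hat, R_hats, sigma2_hat, k):
    """Plug-in estimate of the bias direction Delta_N: the stacked vector
    [phi', N^{-1} tr(D^{-1}) iota_q', (iota_q kron phi)', 0_k, 1/(2 sigma^2)]'
    with the long-run matrix D = D_hat - R_hat_1 - ... - R_hat_q and
    phi = N^{-1} [tr(W_1 D^{-1}), ..., tr(W_p D^{-1})]'."""
    N = D_hat.shape[0]
    q = len(R_hats)
    D_lr_inv = np.linalg.inv(D_hat - sum(R_hats))   # long-run inverse
    phi = np.array([np.trace(W @ D_lr_inv) for W in W_list]) / N
    return np.concatenate([phi,
                           np.full(q, np.trace(D_lr_inv) / N),
                           np.tile(phi, q),         # (iota_q kron phi)'
                           np.zeros(k),
                           [1.0 / (2.0 * sigma2_hat)]])
```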
We can use a plug-in method to estimate the bias and the limiting variance in Theorem 5.1; details are given in Appendix E.

Remark 5.1: Consider the case $p = 1$ and $q = 1$, in which model (2.1) reduces to the one considered in Yu et al. (2008). Although the two studies use different expressions, the asymptotic results in Theorem 5.1 are consistent with those in Yu et al. (2008). First note that $\Omega_{1,N}$ is equivalent to $\Omega_{0,NT}$ in Yu et al. (2008)°. Furthermore, by $W\dot{Y}_t = G^*\dot{\mathbb{X}}_t\Phi^* + G^*\dot{e}_t$, we can show

$$\frac{1}{N\bar{T}\sigma^{*2}}\sum_{t=\bar{q}}^{T}E(\tilde{Y}_t'W'W\tilde{Y}_t) = \frac{1}{N\bar{T}\sigma^{*2}}\sum_{t=\bar{q}}^{T}E(\dot{Y}_t'W'W\dot{Y}_t) + o(1) = \frac{1}{N\bar{T}\sigma^{*2}}\sum_{t=\bar{q}}^{T}E(\Phi^{*\prime}\dot{\mathbb{X}}_t'G^{*\prime}G^*\dot{\mathbb{X}}_t\Phi^*) + \frac{1}{N}\operatorname{tr}(G^{*\prime}G^*) + o(1)$$

and

$$\frac{1}{N\bar{T}\sigma^{*2}}\sum_{t=\bar{q}}^{T}E(\tilde{Y}_t'W'\tilde{\mathbb{X}}_t) = \frac{1}{N\bar{T}\sigma^{*2}}\sum_{t=\bar{q}}^{T}E(\dot{Y}_t'W'\dot{\mathbb{X}}_t) + o(1) = \frac{1}{N\bar{T}\sigma^{*2}}\sum_{t=\bar{q}}^{T}E(\Phi^{*\prime}\dot{\mathbb{X}}_t'G^{*\prime}\dot{\mathbb{X}}_t) + o(1).$$

Using the above results, it is easy to see that $\Omega_{0,NT}$ in Theorem 5.1 is equivalent to $\Sigma_{0,nT}$ in Yu et al. (2008). As regards the bias formula, by $(I - A)^{-1} = I + A + A^2 + \cdots$ for any matrix $A$ such that $\|A\|_1 < 1$ or $\|A\|_\infty < 1$, we can simplify the bias formula $\varphi_n(\theta_0)$ in Yu et al. (2008) to $\Delta_N$ in Theorem 5.1. Given these results, we see that, as expected, the asymptotic results in the two studies are equivalent.

° There is a difference in the ordering of the parameters in the two studies, which causes a slight difference in the locations of elements.
6 Impulse response analysis and the associated asymptotics The moving average representation Yt = D∗−1 µ∗ +
∞
∞
v =0
v =0
∑ Bv∗ Xt−v β∗ + ∑ Bv∗ et−v
(6.1)
is of particular practical interest. Suppose that there is a one-unit increase in the $j$th spatial unit's innovation at some time. The $j$th column of $B_0^{*}$ identifies the immediate changes of all $N$ dependent variables. The $j$th column of $B_1^{*}$ identifies the changes of the $N$ dependent variables after one period, and so on. If one records these changes period by period, the dynamic responses of all $N$ variables due to a one-unit increase of the $j$th spatial unit's innovation are obtained. Such exercises are known as impulse response analysis in the vector autoregressive (VAR) literature and are widely used in macroeconomic policy evaluations. In this section, we consider the estimation and inferential theory of the coefficients in the moving average representation (6.1). According to (6.1), the consequence of a one-unit increase in the $j$th spatial unit's innovation at time $s$ for the value of the $i$th dependent variable at time $s+\tau$, holding all other innovations at all dates constant, is
$$
\frac{\partial Y_{i,s+\tau}}{\partial e_{j,s}} = B^{*}_{ij,\tau}, \qquad (6.2)
$$
where $B^{*}_{ij,\tau}$ is the $(i,j)$th element of $B^{*}_{\tau}$. Notice that the above expression does not depend on $s$ since the stochastic part of $Y_t$ is stationary. A consistent estimator of $B^{*}_{ij,\tau}$ is $\hat{B}_{ij,\tau}$, where $\hat{B}_{ij,\tau}$ is the $(i,j)$th element of $\hat{B}_{\tau}$, which is computed recursively by
$$
\hat{B}_{\tau} = \hat{D}^{-1}\sum_{v=1}^{q}\hat{R}_v \hat{B}_{\tau-v}, \qquad (6.3)
$$
with $\hat{B}_0 = \hat{D}^{-1}$ and $\hat{B}_{\tau} = 0$ if $\tau < 0$, where
$$
\hat{D} = I_N - \sum_{m=1}^{p}\hat{\varpi}_m W_m, \qquad
\hat{R}_n = \hat{\rho}_n I_N + \sum_{m=1}^{p}\hat{\gamma}_{mn} W_m, \qquad (6.4)
$$
for $n = 1,\ldots,q$. Here $\hat{\varpi}_m$, $\hat{\rho}_n$ and $\hat{\gamma}_{mn}$ are the QMLEs. Besides the analysis on disturbance terms, we can also evaluate the effects of changes in the covariate $X$ on $Y$. More specifically, according to (6.1), the response of the $i$th dependent variable at time $s+\tau$ to a one-unit increase of the $l$th regressor of the $j$th spatial unit at time $s$ is
$$
\frac{\partial Y_{i,s+\tau}}{\partial X_{jl,s}} = B^{*}_{ij,\tau}\beta^{*}_l. \qquad (6.5)
$$
Again the above result does not depend on $s$. Similarly, we can use $\hat{B}_{ij,\tau}\hat{\beta}_l$ to consistently estimate $B^{*}_{ij,\tau}\beta^{*}_l$.

To investigate the asymptotic properties of the above estimators, we introduce more notation. Let $\psi^{\varpi}_{\tau,m}$ be a sequence of $N\times N$ matrices, which are recursively defined by
$$
\psi^{\varpi}_{\tau,m} = D^{*-1} W_m B^{*}_{\tau} + D^{*-1}\sum_{v=1}^{q} R^{*}_v \psi^{\varpi}_{\tau-v,m}, \qquad (6.6)
$$
where $\psi^{\varpi}_{0,m} = D^{*-1} W_m D^{*-1}$ and $\psi^{\varpi}_{\tau,m} = 0$ if $\tau < 0$ for every $m = 1,\ldots,p$. Similarly, let $\psi^{\rho}_{\tau,n}$ be a sequence of $N\times N$ matrices, which are recursively defined by
$$
\psi^{\rho}_{\tau,n} = D^{*-1} B^{*}_{\tau-n} + D^{*-1}\sum_{v=1}^{q} R^{*}_v \psi^{\rho}_{\tau-v,n}, \qquad (6.7)
$$
with $\psi^{\rho}_{\tau,n} = 0$ if $\tau \le 0$ for every $n = 1,\ldots,q$ and $B^{*}_{\tau} = 0$ for $\tau < 0$. And define $\psi^{\gamma}_{\tau,mn}$ recursively by
$$
\psi^{\gamma}_{\tau,mn} = D^{*-1} W_m B^{*}_{\tau-n} + D^{*-1}\sum_{v=1}^{q} R^{*}_v \psi^{\gamma}_{\tau-v,mn}, \qquad (6.8)
$$
where $\psi^{\gamma}_{\tau,mn} = 0$ if $\tau \le 0$ for every $m = 1,\ldots,p$ and $n = 1,\ldots,q$, and $B^{*}_{\tau} = 0$ if $\tau < 0$. Let $\psi^{ij,\varpi}_{\tau,m}$ be the $(i,j)$th element of $\psi^{\varpi}_{\tau,m}$ and let $\psi^{ij,\rho}_{\tau,n}$ be defined similarly. In addition, let $\psi^{ij,\gamma}_{\tau} = (\psi^{ij,\gamma}_{\tau,mn})_{p\times q}$ be the $p\times q$ matrix made up of $\psi^{ij,\gamma}_{\tau,mn}$, and define the $(p+q+pq+k+1)$-dimensional column vector $\vartheta_{ij,\tau}$ by
$$
\vartheta_{ij,\tau} = \big[\psi^{ij,\varpi}_{\tau,1},\ldots,\psi^{ij,\varpi}_{\tau,p},\ \psi^{ij,\rho}_{\tau,1},\ldots,\psi^{ij,\rho}_{\tau,q},\ \mathrm{vec}(\psi^{ij,\gamma}_{\tau})',\ 0_{1\times(k+1)}\big]'. \qquad (6.9)
$$
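The recursion (6.3)-(6.4) is straightforward to implement once the QMLEs are in hand. The following is a minimal sketch (function and variable names are ours, not the paper's) that computes $\hat{B}_0,\ldots,\hat{B}_{\tau_{\max}}$ from plug-in estimates of $\hat{D}$ and $\hat{R}_1,\ldots,\hat{R}_q$; the $\psi$ sequences in (6.6)-(6.8) can be generated by the same recursive pattern with a different initial term.

```python
import numpy as np

def impulse_matrices(D_hat, R_hats, tau_max):
    """Recursively compute B_hat_0, ..., B_hat_{tau_max} as in (6.3):
    B_hat_0 = D_hat^{-1};  B_hat_tau = D_hat^{-1} sum_{v=1}^{q} R_hat_v B_hat_{tau-v},
    with B_hat_tau = 0 for tau < 0."""
    D_inv = np.linalg.inv(D_hat)
    B = [D_inv]
    for tau in range(1, tau_max + 1):
        acc = np.zeros_like(D_hat, dtype=float)
        for v, R_v in enumerate(R_hats, start=1):
            if tau - v >= 0:          # terms with tau - v < 0 vanish
                acc += R_v @ B[tau - v]
        B.append(D_inv @ acc)
    return B
```

With $\hat{D} = I_N$ and a single $\hat{R}_1 = \rho I_N$ (a pure time-AR special case), the recursion reproduces the familiar geometric impulse responses $\rho^{\tau} I_N$.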
Now we state asymptotic results on $\hat{B}_{ij,\tau}$ and $\hat{B}_{ij,\tau}\hat{\beta}_l$ in the following theorem.

Theorem 6.1 Under Assumptions A-G, when $N, T \to \infty$, $N/T^3 \to 0$, we have
$$
\sqrt{N\bar{T}}\Big(\hat{B}_{ij,\tau} - B^{*}_{ij,\tau} + \frac{1}{\bar{T}}\vartheta_{ij,\tau}'\Omega_0^{-1}\Delta\Big) \xrightarrow{d} N\big(0,\ \vartheta_{ij,\tau}'\Omega_0^{-1}(\Omega_0+\Omega_1)\Omega_0^{-1}\vartheta_{ij,\tau}\big),
$$
and
$$
\sqrt{N\bar{T}}\Big(\hat{B}_{ij,\tau}\hat{\beta}_l - B^{*}_{ij,\tau}\beta^{*}_l + \frac{1}{\bar{T}}v_{ij,l\tau}'\Omega_0^{-1}\Delta\Big) \xrightarrow{d} N\big(0,\ v_{ij,l\tau}'\Omega_0^{-1}(\Omega_0+\Omega_1)\Omega_0^{-1}v_{ij,l\tau}\big),
$$
where $v_{ij,l\tau} = \beta^{*}_l \vartheta_{ij,\tau} + B^{*}_{ij,\tau}\nu_l$. Here $\nu_l$ is a $(p+q+pq+k+1)$-dimensional vector with its $(p+q+pq+l)$th element being 1 and all other elements being 0.

Remark 6.1 Given the result in Theorem 6.1, the $(1-\alpha)$-confidence interval for $B^{*}_{ij,\tau}$ for $\tau = 0, 1, \ldots$, is given by
$$
\Big[\hat{B}_{ij,\tau} + \frac{1}{\bar{T}}\hat{\vartheta}_{ij,\tau}'\hat{\Omega}_0^{-1}\hat{\Delta} - z_{\alpha/2}\sqrt{\tfrac{1}{N\bar{T}}\hat{\vartheta}_{ij,\tau}'\hat{\Omega}_0^{-1}(\hat{\Omega}_0+\hat{\Omega}_1)\hat{\Omega}_0^{-1}\hat{\vartheta}_{ij,\tau}},\ \hat{B}_{ij,\tau} + \frac{1}{\bar{T}}\hat{\vartheta}_{ij,\tau}'\hat{\Omega}_0^{-1}\hat{\Delta} + z_{\alpha/2}\sqrt{\tfrac{1}{N\bar{T}}\hat{\vartheta}_{ij,\tau}'\hat{\Omega}_0^{-1}(\hat{\Omega}_0+\hat{\Omega}_1)\hat{\Omega}_0^{-1}\hat{\vartheta}_{ij,\tau}}\Big].
$$
Similarly, the $(1-\alpha)$-confidence interval for $B^{*}_{ij,\tau}\beta^{*}_l$ is
$$
\Big[\hat{B}_{ij,\tau}\hat{\beta}_l + \frac{1}{\bar{T}}\hat{v}_{ij,l\tau}'\hat{\Omega}_0^{-1}\hat{\Delta} - z_{\alpha/2}\sqrt{\tfrac{1}{N\bar{T}}\hat{v}_{ij,l\tau}'\hat{\Omega}_0^{-1}(\hat{\Omega}_0+\hat{\Omega}_1)\hat{\Omega}_0^{-1}\hat{v}_{ij,l\tau}},\ \hat{B}_{ij,\tau}\hat{\beta}_l + \frac{1}{\bar{T}}\hat{v}_{ij,l\tau}'\hat{\Omega}_0^{-1}\hat{\Delta} + z_{\alpha/2}\sqrt{\tfrac{1}{N\bar{T}}\hat{v}_{ij,l\tau}'\hat{\Omega}_0^{-1}(\hat{\Omega}_0+\hat{\Omega}_1)\hat{\Omega}_0^{-1}\hat{v}_{ij,l\tau}}\Big],
$$
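The interval in Remark 6.1 is assembled mechanically from plug-in quantities: a bias-adjusted center plus/minus $z_{\alpha/2}$ times the plug-in standard error. A minimal sketch, assuming the estimated arrays (here given illustrative names) are already available:

```python
import numpy as np

def impulse_ci(b_hat, v_hat, Omega0, Omega1, Delta, N, T_bar, z=1.96):
    """Confidence interval in the shape of Remark 6.1:
    center = b_hat + (1/T_bar) v' Omega0^{-1} Delta,
    half-width = z * sqrt( v' Omega0^{-1}(Omega0+Omega1)Omega0^{-1} v / (N T_bar) )."""
    Oinv = np.linalg.inv(Omega0)
    center = b_hat + (v_hat @ Oinv @ Delta) / T_bar
    avar = v_hat @ Oinv @ (Omega0 + Omega1) @ Oinv @ v_hat
    half = z * np.sqrt(avar / (N * T_bar))
    return center - half, center + half
```

The same function serves both intervals of Remark 6.1: pass $\hat{\vartheta}_{ij,\tau}$ for $B^{*}_{ij,\tau}$, or $\hat{v}_{ij,l\tau}$ (with $\hat{B}_{ij,\tau}\hat{\beta}_l$ as the point estimate) for $B^{*}_{ij,\tau}\beta^{*}_l$.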
where $z_{\alpha/2}$ is the critical value such that $P(|N(0,1)| > z_{\alpha/2}) = \alpha$. The symbols $\hat{\Omega}_0$, $\hat{\Omega}_1$ and $\hat{\Delta}$ are the respective estimators of $\Omega_0$, $\Omega_1$ and $\Delta$, which are given in Appendix E. $\hat{\vartheta}_{ij,\tau}$ and $\hat{v}_{ij,l\tau}$ are the estimators of $\vartheta_{ij,\tau}$ and $v_{ij,l\tau}$; the two estimators can be calculated recursively according to (6.6), (6.7) and (6.8) by replacing the underlying parameters with their QMLEs.

LeSage and Pace (2009) define the average direct impact (ADI), average indirect impact (AII) and average total impact (ATI) in a spatial autoregressive model; for the concrete definitions, we refer to Section 2.7 of LeSage and Pace (2009). Following the spirit of LeSage and Pace (2009), we define the three impacts in our dynamic spatial panel data model. More specifically, the three impacts due to the $l$th exogenous regressor after $\tau$ periods are defined as
$$
\mathrm{ADI}_{\tau,l} = \frac{1}{N}\mathrm{tr}(B^{*}_{\tau})\beta^{*}_l, \qquad
\mathrm{AII}_{\tau,l} = \frac{1}{N}\big[\iota_N' B^{*}_{\tau}\iota_N - \mathrm{tr}(B^{*}_{\tau})\big]\beta^{*}_l, \qquad
\mathrm{ATI}_{\tau,l} = \frac{1}{N}\iota_N' B^{*}_{\tau}\iota_N \beta^{*}_l, \qquad (6.10)
$$
where $\iota_N$ is an $N$-dimensional vector with all its elements being 1.± The three impacts can be consistently estimated by
$$
\widehat{\mathrm{ADI}}_{\tau,l} = \frac{1}{N}\mathrm{tr}(\hat{B}_{\tau})\hat{\beta}_l, \qquad
\widehat{\mathrm{AII}}_{\tau,l} = \frac{1}{N}\big[\iota_N'\hat{B}_{\tau}\iota_N - \mathrm{tr}(\hat{B}_{\tau})\big]\hat{\beta}_l, \qquad
\widehat{\mathrm{ATI}}_{\tau,l} = \frac{1}{N}\iota_N'\hat{B}_{\tau}\iota_N\hat{\beta}_l. \qquad (6.11)
$$
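Given $\hat{B}_{\tau}$ and $\hat{\beta}_l$, the estimators in (6.11) reduce to a trace and a grand sum of matrix elements. A short sketch (the helper name is ours):

```python
import numpy as np

def average_impacts(B_tau, beta_l):
    """ADI, AII and ATI after tau periods, following (6.10)-(6.11).
    ADI averages the diagonal responses; ATI averages all responses;
    AII is their difference."""
    N = B_tau.shape[0]
    adi = np.trace(B_tau) / N * beta_l
    ati = B_tau.sum() / N * beta_l   # iota' B iota is the sum of all elements
    return adi, ati - adi, ati
```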
To present the asymptotic results for these three estimated impacts, we introduce the following notation. Let $v^{d}_{\tau}$, $v^{i}_{\tau}$ and $v^{t}_{\tau}$ be defined as
$$
v^{d}_{\tau} = \frac{1}{N}\big[\mathrm{tr}(\psi^{\varpi}_{\tau,1}),\ldots,\mathrm{tr}(\psi^{\varpi}_{\tau,p}),\ \mathrm{tr}(\psi^{\rho}_{\tau,1}),\ldots,\mathrm{tr}(\psi^{\rho}_{\tau,q}),\ \mathrm{tr}(\psi^{\gamma}_{\tau,11}),\ldots,\mathrm{tr}(\psi^{\gamma}_{\tau,pq}),\ 0_{1\times(k+1)}\big]',
$$
$$
v^{t}_{\tau} = \frac{1}{N}\big[\iota_N'\psi^{\varpi}_{\tau,1}\iota_N,\ldots,\iota_N'\psi^{\varpi}_{\tau,p}\iota_N,\ \iota_N'\psi^{\rho}_{\tau,1}\iota_N,\ldots,\iota_N'\psi^{\rho}_{\tau,q}\iota_N,\ \iota_N'\psi^{\gamma}_{\tau,11}\iota_N,\ldots,\iota_N'\psi^{\gamma}_{\tau,pq}\iota_N,\ 0_{1\times(k+1)}\big]',
$$
and $v^{i}_{\tau} = v^{t}_{\tau} - v^{d}_{\tau}$. Furthermore, we define
$$
v^{d}_{\tau,l} = \beta^{*}_l v^{d}_{\tau} + \frac{1}{N}\mathrm{tr}(B^{*}_{\tau})\nu_l, \qquad
v^{t}_{\tau,l} = \beta^{*}_l v^{t}_{\tau} + \frac{1}{N}\iota_N' B^{*}_{\tau}\iota_N \nu_l, \qquad
v^{i}_{\tau,l} = \beta^{*}_l v^{i}_{\tau} + \frac{1}{N}\big[\iota_N' B^{*}_{\tau}\iota_N - \mathrm{tr}(B^{*}_{\tau})\big]\nu_l.
$$
The asymptotic results of the three estimated impacts can be obtained immediately from Theorem 6.1 and are given in the following corollary.

Corollary 6.1 Under Assumptions A-G, when $N, T \to \infty$, $N/T^3 \to 0$, we have
$$
\sqrt{N\bar{T}}\Big(\widehat{\mathrm{ADI}}_{\tau,l} - \mathrm{ADI}_{\tau,l} + \frac{1}{\bar{T}}v^{d\prime}_{\tau,l}\Omega_0^{-1}\Delta\Big) \xrightarrow{d} N\big(0,\ \bar{v}^{d\prime}_{\tau,l}\Omega_0^{-1}(\Omega_0+\Omega_1)\Omega_0^{-1}\bar{v}^{d}_{\tau,l}\big),
$$
$$
\sqrt{N\bar{T}}\Big(\widehat{\mathrm{AII}}_{\tau,l} - \mathrm{AII}_{\tau,l} + \frac{1}{\bar{T}}v^{i\prime}_{\tau,l}\Omega_0^{-1}\Delta\Big) \xrightarrow{d} N\big(0,\ \bar{v}^{i\prime}_{\tau,l}\Omega_0^{-1}(\Omega_0+\Omega_1)\Omega_0^{-1}\bar{v}^{i}_{\tau,l}\big),
$$
and
$$
\sqrt{N\bar{T}}\Big(\widehat{\mathrm{ATI}}_{\tau,l} - \mathrm{ATI}_{\tau,l} + \frac{1}{\bar{T}}v^{t\prime}_{\tau,l}\Omega_0^{-1}\Delta\Big) \xrightarrow{d} N\big(0,\ \bar{v}^{t\prime}_{\tau,l}\Omega_0^{-1}(\Omega_0+\Omega_1)\Omega_0^{-1}\bar{v}^{t}_{\tau,l}\big),
$$
where $\bar{v}^{a}_{\tau,l} = \lim_{N\to\infty} v^{a}_{\tau,l}$ with $a = d, i$ and $t$.

± We note that $\frac{1}{N}\iota_N' B^{*}_{\tau}\iota_N$ is well-defined under large $N$. To see this, by Assumption E, we have $\|B^{*}_{\tau}\|_\infty \le C$ for all $\tau$. By the definition of $\|\cdot\|_\infty$, this condition is equivalent to $\max_{1\le i\le N}\sum_{j=1}^{N}|B^{*}_{ij,\tau}| \le C$. Then $|\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} B^{*}_{ij,\tau}| \le \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}|B^{*}_{ij,\tau}| \le \frac{1}{N}\sum_{i=1}^{N} C = C$.
Remark 6.2 In the same spirit, we can also define the ADI, AII and ATI due to the innovations. The three new impacts have the same forms as their counterparts in (6.10) except that $\beta^{*}_l$ should be removed from the expressions (so the subscript $l$ needs to be removed too). The estimators of these three impacts are implicitly given in (6.11) by deleting $\hat{\beta}_l$ from the expressions. The asymptotic results for the new estimated impacts are almost the same as those in Corollary 6.1 except that the symbols $v^{d}_{\tau,l}$, $v^{i}_{\tau,l}$ and $v^{t}_{\tau,l}$ should be replaced with $v^{d}_{\tau}$, $v^{i}_{\tau}$ and $v^{t}_{\tau}$, respectively.

Another routine exercise is to calculate the accumulative changes of the $i$th dependent variable responding to a one-unit increase of the $j$th spatial unit's innovation over an infinite horizon, i.e., $\sum_{\tau=0}^{\infty} B^{*}_{ij,\tau}$. This value represents a long-run effect of an increase in innovation. We can use a plug-in method to consistently estimate it by $\sum_{\tau=0}^{\infty}\hat{B}_{ij,\tau}$. Apart from this method, we have another easier way. According to (4.2), we have $\sum_{\tau=0}^{\infty} B^{*}_{ij,\tau} = [\mathcal{D}^{*-1}]_{ij}$, where $\mathcal{D}^{*} = D^{*} - R^{*}_1 - \cdots - R^{*}_q$ and $[A]_{ij}$ denotes the $(i,j)$th element of $A$. So an alternative estimator for $\sum_{\tau=0}^{\infty} B^{*}_{ij,\tau}$ is $[\hat{\mathcal{D}}^{-1}]_{ij}$, where $\hat{\mathcal{D}} = \hat{D} - \hat{R}_1 - \cdots - \hat{R}_q$ with $\hat{D}$ and $\hat{R}_n$ ($n = 1,\ldots,q$) defined in (6.4). Likewise, we can compute the accumulative effect due to a one-unit change of the explanatory variables as well. According to (6.5), the accumulative effect of a change in the $l$th regressor of unit $j$ on unit $i$ is $\sum_{\tau=0}^{\infty} B^{*}_{ij,\tau}\beta^{*}_l$, which can be consistently estimated by $[\hat{\mathcal{D}}^{-1}]_{ij}\hat{\beta}_l$. For ease of exposition, we use $\hat{d}_{ij}$ to denote $[\hat{\mathcal{D}}^{-1}]_{ij}$.

To state the limiting distribution of $\hat{d}_{ij}$ and $\hat{d}_{ij}\hat{\beta}_l$, we introduce the following notation. Let $\eta_{ij}$ be a $p$-dimensional vector whose $m$th ($m = 1,2,\ldots,p$) element is the $(i,j)$th element of $\mathcal{D}^{*-1} W_m \mathcal{D}^{*-1}$. Let the $(p+q+pq+k+1)$-dimensional vector $\xi_{ij}$ be
$$
\xi_{ij} = \big[\eta_{ij}',\ \ddot{d}_{ij}\iota_q',\ (\iota_q\otimes\eta_{ij})',\ 0_{1\times(k+1)}\big]',
$$
where $\ddot{d}_{ij}$ is the $(i,j)$th element of $\mathcal{D}^{*-2}$. Now we have the following theorem for $\hat{d}_{ij}$ and $\hat{d}_{ij}\hat{\beta}_l$.
Theorem 6.2 Under Assumptions A-G, when $N, T \to \infty$, $N/T^3 \to 0$, we have
$$
\sqrt{N\bar{T}}\Big(\hat{d}_{ij} - \sum_{\tau=0}^{\infty} B^{*}_{ij,\tau} + \frac{1}{\bar{T}}\xi_{ij}'\Omega_0^{-1}\Delta\Big) \xrightarrow{d} N\big(0,\ \xi_{ij}'\Omega_0^{-1}(\Omega_0+\Omega_1)\Omega_0^{-1}\xi_{ij}\big),
$$
and
$$
\sqrt{N\bar{T}}\Big(\hat{d}_{ij}\hat{\beta}_l - \sum_{\tau=0}^{\infty} B^{*}_{ij,\tau}\beta^{*}_l + \frac{1}{\bar{T}}\pi_{ij,l}'\Omega_0^{-1}\Delta\Big) \xrightarrow{d} N\big(0,\ \pi_{ij,l}'\Omega_0^{-1}(\Omega_0+\Omega_1)\Omega_0^{-1}\pi_{ij,l}\big),
$$
where $\pi_{ij,l} = \beta^{*}_l \xi_{ij} + \sum_{\tau=0}^{\infty} B^{*}_{ij,\tau}\nu_l$ with $\nu_l$ defined in Theorem 6.1.
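The closed-form long-run effect discussed above avoids truncating the infinite sum: $\sum_{\tau=0}^{\infty}\hat{B}_{ij,\tau}$ is obtained directly as an element of $\hat{\mathcal{D}}^{-1} = (\hat{D} - \hat{R}_1 - \cdots - \hat{R}_q)^{-1}$. A minimal sketch (names are ours):

```python
import numpy as np

def long_run_matrix(D_hat, R_hats):
    """Long-run effect matrix: sum over tau of B_hat_tau equals
    (D_hat - R_hat_1 - ... - R_hat_q)^{-1}; d_hat_{ij} is its (i,j)th element."""
    M = D_hat.astype(float).copy()
    for R_v in R_hats:
        M = M - R_v
    return np.linalg.inv(M)
```

For instance, with $\hat{D} = I_N$ and $\hat{R}_1 = 0.5\, I_N$ the long-run matrix is $2 I_N$, matching the geometric sum $\sum_{\tau} 0.5^{\tau}$.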
Remark 6.3 Likewise, by Theorem 6.2, the $(1-\alpha)$-confidence interval for $\sum_{\tau=0}^{\infty} B^{*}_{ij,\tau}$ is
$$
\Big[\hat{d}_{ij} + \frac{1}{\bar{T}}\hat{\xi}_{ij}'\hat{\Omega}_0^{-1}\hat{\Delta} - z_{\alpha/2}\sqrt{\tfrac{1}{N\bar{T}}\hat{\xi}_{ij}'\hat{\Omega}_0^{-1}(\hat{\Omega}_0+\hat{\Omega}_1)\hat{\Omega}_0^{-1}\hat{\xi}_{ij}},\ \hat{d}_{ij} + \frac{1}{\bar{T}}\hat{\xi}_{ij}'\hat{\Omega}_0^{-1}\hat{\Delta} + z_{\alpha/2}\sqrt{\tfrac{1}{N\bar{T}}\hat{\xi}_{ij}'\hat{\Omega}_0^{-1}(\hat{\Omega}_0+\hat{\Omega}_1)\hat{\Omega}_0^{-1}\hat{\xi}_{ij}}\Big],
$$
and the $(1-\alpha)$-confidence interval for $\sum_{\tau=0}^{\infty} B^{*}_{ij,\tau}\beta^{*}_l$ is
$$
\Big[\hat{d}_{ij}\hat{\beta}_l + \frac{1}{\bar{T}}\hat{\pi}_{ij,l}'\hat{\Omega}_0^{-1}\hat{\Delta} - z_{\alpha/2}\sqrt{\tfrac{1}{N\bar{T}}\hat{\pi}_{ij,l}'\hat{\Omega}_0^{-1}(\hat{\Omega}_0+\hat{\Omega}_1)\hat{\Omega}_0^{-1}\hat{\pi}_{ij,l}},\ \hat{d}_{ij}\hat{\beta}_l + \frac{1}{\bar{T}}\hat{\pi}_{ij,l}'\hat{\Omega}_0^{-1}\hat{\Delta} + z_{\alpha/2}\sqrt{\tfrac{1}{N\bar{T}}\hat{\pi}_{ij,l}'\hat{\Omega}_0^{-1}(\hat{\Omega}_0+\hat{\Omega}_1)\hat{\Omega}_0^{-1}\hat{\pi}_{ij,l}}\Big],
$$
where $z_{\alpha/2}$ is defined in Remark 6.1. Again, $\hat{\xi}_{ij}$ and $\hat{\pi}_{ij,l}$ can be obtained by a plug-in method.

As in (6.10), we can define the temporal accumulations of the average direct impact, average indirect impact and average total impact, which we denote by the abbreviations AADI, AAII and AATI. More specifically, the three accumulation impacts due to the $l$th regressor are defined as
$$
\mathrm{AADI}_l = \frac{1}{N}\sum_{\tau=0}^{\infty}\mathrm{tr}(B^{*}_{\tau})\beta^{*}_l, \qquad
\mathrm{AAII}_l = \frac{1}{N}\sum_{\tau=0}^{\infty}\big[\iota_N' B^{*}_{\tau}\iota_N - \mathrm{tr}(B^{*}_{\tau})\big]\beta^{*}_l, \qquad
\mathrm{AATI}_l = \frac{1}{N}\sum_{\tau=0}^{\infty}\iota_N' B^{*}_{\tau}\iota_N\beta^{*}_l. \qquad (6.12)
$$
As discussed above, we can estimate them by
$$
\widehat{\mathrm{AADI}}_l = \frac{1}{N}\mathrm{tr}(\hat{\mathcal{D}}^{-1})\hat{\beta}_l, \qquad
\widehat{\mathrm{AAII}}_l = \frac{1}{N}\big[\iota_N'\hat{\mathcal{D}}^{-1}\iota_N - \mathrm{tr}(\hat{\mathcal{D}}^{-1})\big]\hat{\beta}_l, \qquad
\widehat{\mathrm{AATI}}_l = \frac{1}{N}\iota_N'\hat{\mathcal{D}}^{-1}\iota_N\hat{\beta}_l.
$$
To present the asymptotic results, we introduce $e^{d} = \frac{1}{N}\sum_{i=1}^{N}\xi_{ii}$, $e^{t} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}\xi_{ij}$ and $e^{i} = e^{t} - e^{d}$. Furthermore, we define
$$
e^{d}_l = \beta^{*}_l e^{d} + \frac{1}{N}\mathrm{tr}(\mathcal{D}^{*-1})\nu_l, \qquad
e^{t}_l = \beta^{*}_l e^{t} + \frac{1}{N}\iota_N'\mathcal{D}^{*-1}\iota_N\nu_l, \qquad
e^{i}_l = e^{t}_l - e^{d}_l.
$$
Then we have the following corollary on the three estimated accumulation impacts.

Corollary 6.2 Under Assumptions A-G, when $N, T \to \infty$, $N/T^3 \to 0$, we have
$$
\sqrt{N\bar{T}}\Big(\widehat{\mathrm{AADI}}_l - \mathrm{AADI}_l + \frac{1}{\bar{T}} e^{d\prime}_l\Omega_0^{-1}\Delta\Big) \xrightarrow{d} N\big(0,\ \bar{e}^{d\prime}_l\Omega_0^{-1}(\Omega_0+\Omega_1)\Omega_0^{-1}\bar{e}^{d}_l\big),
$$
$$
\sqrt{N\bar{T}}\Big(\widehat{\mathrm{AAII}}_l - \mathrm{AAII}_l + \frac{1}{\bar{T}} e^{i\prime}_l\Omega_0^{-1}\Delta\Big) \xrightarrow{d} N\big(0,\ \bar{e}^{i\prime}_l\Omega_0^{-1}(\Omega_0+\Omega_1)\Omega_0^{-1}\bar{e}^{i}_l\big),
$$
and
$$
\sqrt{N\bar{T}}\Big(\widehat{\mathrm{AATI}}_l - \mathrm{AATI}_l + \frac{1}{\bar{T}} e^{t\prime}_l\Omega_0^{-1}\Delta\Big) \xrightarrow{d} N\big(0,\ \bar{e}^{t\prime}_l\Omega_0^{-1}(\Omega_0+\Omega_1)\Omega_0^{-1}\bar{e}^{t}_l\big),
$$
where $\bar{e}^{a}_l = \lim_{N\to\infty} e^{a}_l$ with $a = d, i$ and $t$.
Remark 6.4 Likewise, we can define the AADI, AAII and AATI due to the innovations. These three new accumulation impacts can be obtained by deleting $\beta^{*}_l$ from the three expressions in (6.12). The asymptotic results for the new estimated impacts are almost the same as those in Corollary 6.2 except that we need to use $e^{d}$, $e^{i}$ and $e^{t}$ to replace $e^{d}_l$, $e^{i}_l$ and $e^{t}_l$, respectively.
7 The model selection problem
A concomitant issue arising from model (2.1) is that it may involve many regressors. For example, if $p = 2$ (two spatial weights matrices) and $q = 3$ (three time lags), the model has eleven regressors even in the absence of the exogenous regressors $X$. In practice, it is likely that some of the coefficients are zero. To achieve a parsimonious model and gain more degrees of freedom, one may want to find all the irrelevant regressors and remove them from the model. This is a classical model selection issue, and we will show that the underlying model can be consistently estimated by minimizing an information criterion if the penalty functions satisfy some regularity conditions.

For ease of exposition, we introduce some notation. Let $\mathcal{M}^{*}$ be the regressor set, which consists of all regressors in the underlying true model. For example, if the underlying true model is
$$
Y_t = \mu + \varpi_1 W_1 Y_t + \varpi_2 W_2 Y_t + \rho_1 Y_{t-1} + \gamma W_2 Y_{t-1} + X_{t1}\beta_1 + X_{t2}\beta_2 + e_t, \qquad (7.1)
$$
$\mathcal{M}^{*}$ is then defined as
$$
\mathcal{M}^{*} = \big\{W_1 Y,\ W_2 Y,\ Y_{-1},\ W_2 Y_{-1},\ X_1,\ X_2\big\}.
$$
Let $\mathcal{M}$ denote the set of regressors in an arbitrary candidate model. The information criterion (IC) for $\mathcal{M}$ is defined as
$$
\mathrm{IC}(\mathcal{M}) = -2 L_{\mathcal{M}}(\hat{\theta}_{\mathcal{M}}) + |\mathcal{M}|\lambda(N,\bar{T}),
$$
where $\lambda$ is a generic penalty function (depending on the sample size), $L_{\mathcal{M}}(\cdot)$ is the quasi (log) likelihood function under model $\mathcal{M}$, $\hat{\theta}_{\mathcal{M}}$ is the QMLE for $\theta$ under model $\mathcal{M}$, and $|\mathcal{M}|$ denotes the number of elements in $\mathcal{M}$, which is six for model (7.1). Notice that the above criterion can alternatively be written as
$$
\mathrm{IC}(\mathcal{M}) = \ln\hat{\sigma}^2_{\mathcal{M}} - \frac{2}{N}\ln\big|I_N - \hat{\varpi}_{1,\mathcal{M}} W_1 - \cdots - \hat{\varpi}_{p,\mathcal{M}} W_p\big| + |\mathcal{M}|\lambda(N,\bar{T}).
$$
The following theorem shows that the information criterion method can consistently estimate the model if some regularity conditions on the penalty function are satisfied.

Theorem 7.1 For every model $\mathcal{M}$ such that $\mathcal{M}^{*}\cap\mathcal{M} = \mathcal{M}^{*}$ and $\mathcal{M}^{*}\ne\mathcal{M}$, suppose that Assumptions A-G hold for model $\mathcal{M}$. Then, if $\min(N\bar{T},\bar{T}^2)\lambda(N,\bar{T})\to\infty$ as $N, T\to\infty$, we have
$$
P\big\{\mathrm{IC}(\mathcal{M}) > \mathrm{IC}(\mathcal{M}^{*})\big\} \to 1.
$$
For every model $\mathcal{M}$ such that $\mathcal{M}^{*}\cap\mathcal{M}\ne\mathcal{M}^{*}$, suppose that Assumptions A-G hold for model $\mathcal{M}^{*}\cup\mathcal{M}$. Then, if $\lambda(N,\bar{T})\to 0$ as $N, T\to\infty$, we have
$$
P\big\{\mathrm{IC}(\mathcal{M}) > \mathrm{IC}(\mathcal{M}^{*})\big\} \to 1.
$$
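The two penalty conditions of Theorem 7.1 — $\min(N\bar{T},\bar{T}^2)\lambda(N,\bar{T})\to\infty$ and $\lambda(N,\bar{T})\to 0$ — are satisfied, for example, by the BIC- and HQC-type penalties introduced in Remark 7.1 below. A minimal sketch (function names are ours, not the paper's):

```python
import numpy as np

def bic_penalty(N, T_bar):
    # lambda(N, T_bar) = ln(min(N*T_bar, T_bar^2)) / min(N*T_bar, T_bar^2)
    m = min(N * T_bar, T_bar ** 2)
    return np.log(m) / m

def hqc_penalty(N, T_bar):
    # lambda(N, T_bar) = 2 ln ln(min(N*T_bar, T_bar^2)) / min(N*T_bar, T_bar^2)
    m = min(N * T_bar, T_bar ** 2)
    return 2.0 * np.log(np.log(m)) / m

def ic(neg2_loglik, model_size, penalty):
    # IC(M) = -2 L_M(theta_hat) + |M| * lambda(N, T_bar)
    return neg2_loglik + model_size * penalty
```

Both penalties vanish as the sample grows, while $m\,\lambda$ diverges (as $\ln m$ or $2\ln\ln m$), so either choice consistently selects the true model by Theorem 7.1.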
Theorem 7.1 requires that the underlying parameters are identified in overfitted models. This condition, albeit strong, is widely assumed in the literature (Fan and Peng (2004), Huang et al. (2008), and Wang et al. (2009), etc.).

Remark 7.1 According to Theorem 7.1, one has large room to choose penalty functions. For example, one may choose
$$
\lambda(N,\bar{T}) = \frac{\ln(\min(N\bar{T},\bar{T}^2))}{\min(N\bar{T},\bar{T}^2)}, \qquad (7.2)
$$
then $\mathrm{IC}(\mathcal{M})$ is the Bayesian information criterion (BIC). Alternatively, one can choose
$$
\lambda(N,\bar{T}) = 2\,\frac{\ln\ln(\min(N\bar{T},\bar{T}^2))}{\min(N\bar{T},\bar{T}^2)},
$$
then $\mathrm{IC}(\mathcal{M})$ is the Hannan-Quinn information criterion (HQC). Both BIC and HQC can consistently estimate the underlying model.

Remark 7.2 Theorem 7.1 is derived under the assumption that the true model is among the alternatives. It is likely in practice that all the alternatives are misspecified. The sources of misspecification are diverse: for example, misspecification of the serial correlations and cross sectional heteroskedasticity of the errors, or of the spatial weights matrices, among others. If all the alternatives are misspecified, the model selection issue amounts to choosing the model that is closest to the true one, where closeness is defined in terms of the Kullback-Leibler distance. In Appendix D, we show that, under a very general setup, the information criterion value $\mathrm{IC}(\mathcal{M})$ is a consistent estimator of the Kullback discrepancy for model $\mathcal{M}$. So we can still use the proposed information criterion to determine the best model even if no model is correctly specified.²

Once an information criterion method indicates that the original model is overfitted and selects a more parsimonious model, we call this information-criterion-selected model the "condensed model". The QML estimation method should then be applied to the condensed model. A subsequent issue is the limiting distribution of the QMLE for this condensed model. As will be seen, it is closely related to the result in Theorem 5.1. We call models like (2.1), which contain all spatial-temporal lags, balanced models, and denote them by $\mathcal{M}_B(p,q)$, where $p$ is the spatial lag order and $q$ the time lag order. The balanced model is called the smallest if it contains the underlying true model as a subset

² The Kullback-Leibler distance, or more precisely the Kullback-Leibler information criterion (KLIC), is defined as $\mathrm{KLIC} = E_0[\ln g(z)] - E_0[\ln f(z;\theta^{*})]$, where $z$, $g(\cdot)$ and $f(\cdot)$ denote the observed data, the true model and the approximating model, respectively. In addition, $E_0$ denotes the expectation under $g(\cdot)$ and $\theta^{*}$ is the pseudo true value that maximizes $E_0[\ln f(z;\theta)]$. The Kullback discrepancy is defined as $\mathrm{KD} = -2E_0[\ln f(z;\theta^{*})]$. To prove that $\mathrm{IC}(\mathcal{M})$ is a consistent estimator of $-2E_0[\ln f(z;\theta^{*})]$, we need to show that (a) $\theta^{*} = \operatorname{argmax}_{\theta} E_0[\ln f(z;\theta)]$ is well defined in terms of existence and uniqueness; and (b) $\sup_{\theta\in\Theta}|N^{-1}\sum_{i=1}^{N}\ln f(z_i;\theta) - E_0[\ln f(z;\theta)]| \xrightarrow{p} 0$ for some compact parameter space $\Theta$.
with minimum $p$ and $q$ values. The asymptotic properties of the balanced model are given in Theorem 5.1. Let $S$ be a selection matrix, which selects all the relevant regressors from the smallest balanced model. For example, for model (7.1), the smallest balanced model is
$$
\mathcal{M}_B(2,1) = \big\{W_1 Y,\ W_2 Y,\ Y_{-1},\ W_1 Y_{-1},\ W_2 Y_{-1},\ X_1,\ X_2\big\}.
$$
The selection matrix $S$ should select $W_1 Y$, $W_2 Y$, $Y_{-1}$, $W_2 Y_{-1}$, $X_1$, $X_2$ from $\mathcal{M}_B(2,1)$. So it is a $6\times 7$ matrix, which is equal to
$$
S = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.
$$
Let $S_e$ be defined as $S_e = \mathrm{diag}(S, 1)$. Here we consider an extended selection matrix $S_e$ instead of $S$ since the estimated parameters include $\sigma^2$, which is not subject to the model selection issue. The asymptotic properties of the QMLE for the condensed model can be easily delivered through the extended selection matrix $S_e$. Detailed results are given in Appendix E.
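Constructing $S$ and $S_e$ is a purely mechanical step; the sketch below (0-based column indexing is our convention) builds the matrices for model (7.1):

```python
import numpy as np

def selection_matrix(keep, n_cols):
    """Row r of S has a single 1 in column keep[r], picking out the
    r-th retained regressor of the smallest balanced model."""
    S = np.zeros((len(keep), n_cols))
    S[np.arange(len(keep)), keep] = 1.0
    return S

# Model (7.1): keep W1*Y, W2*Y, Y_{-1}, W2*Y_{-1}, X1, X2; drop W1*Y_{-1} (column 3)
S = selection_matrix([0, 1, 2, 4, 5, 6], 7)

# Extended matrix S_e = diag(S, 1); the extra unit entry keeps sigma^2
S_e = np.zeros((7, 8))
S_e[:6, :7] = S
S_e[6, 7] = 1.0
```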
8 Monte Carlo simulations

We conduct Monte Carlo simulations to investigate the finite sample properties of the QMLE. The data are generated according to
$$
Y_t = \mu + \varpi_1 W_1 Y_t + \varpi_2 W_2 Y_t + \rho Y_{t-1} + \gamma W_1 Y_{t-2} + X_t\beta + e_t \qquad (8.1)
$$
with $\theta = (\varpi_1, \varpi_2, \rho, \gamma, \beta, \sigma^2) = (0.3, 0.2, 0.1, 0.2, 1, 0.25)$. The spatial weights matrices used in the simulations are "$q$ ahead and $q$ behind" spatial weights matrices as in Kelejian and Prucha (1999), which are obtained as follows: all the units are arranged in a circle, and each unit is affected only by the $q$ units immediately before it and immediately after it, with equal weight. Following Kelejian and Prucha (1999), we normalize the spatial weights matrix by letting the sum of each row equal 1. In our simulations, $W_1$ is "1 ahead and 1 behind" and $W_2$ is "3 ahead and 3 behind". The exogenous regressor $x_{it}$ and the intercept $\mu_i$ are both drawn independently from $N(0,1)$. The disturbance $e_{it}$ is drawn from a normalized $\chi^2(3)$, i.e., $[\chi^2(3)-3]/\sqrt{6}$. Once $x_{it}$ and $e_{it}$ are generated, we calculate $Y_t$ by
$$
Y_t = (I_N - \varpi_1 W_1 - \varpi_2 W_2)^{-1}(\mu + \rho Y_{t-1} + \gamma W_1 Y_{t-2} + X_t\beta + e_t),
$$
where $Y_t = 0$ if $t \le 0$. To eliminate the effect of initial values, we generate $T + 500$ periods of data and discard the first 500 periods.
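The data generating process just described can be sketched as follows (function names are ours; the circular "ahead and behind" construction and the $\chi^2(3)$ normalization follow the text):

```python
import numpy as np

def ahead_behind(N, k):
    """Row-normalized 'k ahead and k behind' circular weights matrix."""
    W = np.zeros((N, N))
    for i in range(N):
        for d in range(1, k + 1):
            W[i, (i + d) % N] = 1.0
            W[i, (i - d) % N] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def simulate_dgp(N, T, burn=500, seed=0):
    """Sketch of DGP (8.1) with (0.3, 0.2, 0.1, 0.2, 1) as in the text."""
    rng = np.random.default_rng(seed)
    W1, W2 = ahead_behind(N, 1), ahead_behind(N, 3)
    A = np.linalg.inv(np.eye(N) - 0.3 * W1 - 0.2 * W2)
    mu = rng.standard_normal(N)
    Y = [np.zeros(N), np.zeros(N)]           # Y_t = 0 for t <= 0
    for _ in range(T + burn):
        x = rng.standard_normal(N)
        e = (rng.chisquare(3, N) - 3.0) / np.sqrt(6.0)  # normalized chi^2(3)
        Y.append(A @ (mu + 0.1 * Y[-1] + 0.2 * (W1 @ Y[-2]) + x + e))
    return np.array(Y[2 + burn:])            # discard the burn-in periods
```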
8.1 Model selection

We first investigate the performance of the information criteria on the model selection issue. We generate a "2 ahead and 2 behind" matrix (denoted by $W_3$) as an irrelevant spatial weights matrix for (8.1), so the largest spatial-lag value is three. In addition, the largest time-lag value is set to three. Table 1 presents the percentage of times that the Bayesian information criterion (7.2) correctly identifies the underlying model.³ The result is obtained from 5000 repetitions. From Table 1, we see that the Bayesian information criterion performs well. Even under the sample size $N = 50$ and $T = 50$, the percentage of correctly identifying the true model is over 97%.

Table 1: The performance of the information criterion

          N = 50    N = 75    N = 100
T = 50    97.24%    99.48%    99.92%
T = 75    97.58%    98.54%    99.56%
T = 100   98.24%    98.52%    99.29%
8.2 Performance of the QMLE

We next investigate the performance of the QMLE. For simplicity, we assume that the underlying model is known in this and the next subsections. Although it would be more informative to consider the performance of the QMLE jointly with the model selection issue, we note that the underlying regressors are possibly not chosen by the IC method, which makes the performance evaluation difficult and complicated.

Tables 2-5 present the biases, root mean square errors (RMSE) and sample-size adjusted RMSE (SRMSE) of the QMLE under the combinations of $N = 50, 75, 100$ and $T = 50, 75, 100$, where the sample-size adjusted RMSE $= \sqrt{NT}\times\mathrm{RMSE}$. Tables 6 and 7 give the empirical sizes of the t-statistics at the 5% nominal level. All the results are obtained from 5000 repetitions.

First, we consider Tables 2 and 3, which give the performance of the QMLEs before bias correction. From these two tables, we see that the QMLEs are consistent: as the sample size becomes larger, the biases and RMSEs decrease stably. In addition, we see that the biases for $\rho$, $\gamma$ and $\sigma^2$ are relatively large, which may cause problems for statistical inference. For example, when $N = 100$ and $T = 50$, the bias and the RMSE for $\rho$ are $-0.0043$ and $0.0069$, respectively. The ratio is $-0.623$, implying that the t-statistic carries an additional $-0.623$ term. Given that the critical value is 1.96 at the 5% significance level, inference based on t-statistics would have a severe size distortion problem.

³ We also consider the HQC information criterion and find that its performance is inferior to that of the Bayesian information criterion.

Table 2: The performance of the QMLE for ϖ1, ϖ2 and ρ before bias correction
  N    T       ϖ1                         ϖ2                         ρ
           Bias     RMSE    SRMSE    Bias     RMSE    SRMSE    Bias     RMSE    SRMSE
  50   50  -0.0004  0.0148  0.7400  -0.0010  0.0197  0.9850  -0.0042  0.0088  0.4400
  75   50  -0.0008  0.0121  0.7410   0.0001  0.0161  0.9859  -0.0041  0.0077  0.4715
 100   50  -0.0010  0.0105  0.7425   0.0002  0.0141  0.9970  -0.0043  0.0069  0.4879
  50   75  -0.0006  0.0118  0.7226   0.0000  0.0157  0.9614  -0.0027  0.0066  0.4042
  75   75  -0.0007  0.0098  0.7350   0.0001  0.0126  0.9450  -0.0028  0.0058  0.4350
 100   75  -0.0003  0.0085  0.7361   0.0001  0.0113  0.9786  -0.0027  0.0052  0.4503
  50  100  -0.0004  0.0103  0.7283   0.0001  0.0135  0.9546  -0.0021  0.0058  0.4101
  75  100  -0.0003  0.0083  0.7188   0.0001  0.0109  0.9440  -0.0020  0.0048  0.4157
 100  100  -0.0003  0.0073  0.7300   0.0001  0.0096  0.9600  -0.0020  0.0043  0.4300
Table 3: The performance of the QMLE for γ, β and σ² before bias correction

  N    T       γ                          β                          σ²
           Bias     RMSE    SRMSE    Bias     RMSE    SRMSE    Bias     RMSE    SRMSE
  50   50  -0.0022  0.0101  0.5050  -0.0002  0.0107  0.5350  -0.0059  0.0138  0.6900
  75   50  -0.0022  0.0084  0.5144  -0.0001  0.0086  0.5266  -0.0057  0.0116  0.7104
 100   50  -0.0020  0.0073  0.5162  -0.0000  0.0075  0.5303  -0.0054  0.0104  0.7354
  50   75  -0.0015  0.0082  0.5021   0.0002  0.0085  0.5205  -0.0038  0.0108  0.6614
  75   75  -0.0013  0.0067  0.5025  -0.0000  0.0070  0.5250  -0.0036  0.0090  0.6750
 100   75  -0.0016  0.0060  0.5196  -0.0002  0.0060  0.5196  -0.0037  0.0080  0.6928
  50  100  -0.0009  0.0069  0.4879   0.0000  0.0073  0.5162  -0.0027  0.0091  0.6435
  75  100  -0.0009  0.0057  0.4936  -0.0001  0.0060  0.5196  -0.0027  0.0077  0.6668
 100  100  -0.0010  0.0049  0.4900   0.0001  0.0052  0.5200  -0.0027  0.0066  0.6600
Table 4: The performance of the QMLE for ϖ1, ϖ2 and ρ after bias correction

  N    T       ϖ1                         ϖ2                         ρ
           Bias     RMSE    SRMSE    Bias     RMSE    SRMSE    Bias     RMSE    SRMSE
  50   50   0.0003  0.0148  0.7400  -0.0013  0.0197  0.9850  -0.0000  0.0077  0.3850
  75   50  -0.0001  0.0121  0.7410  -0.0003  0.0161  0.9859   0.0000  0.0065  0.3980
 100   50  -0.0002  0.0105  0.7425  -0.0001  0.0141  0.9970  -0.0001  0.0055  0.3889
  50   75  -0.0002  0.0118  0.7226  -0.0002  0.0157  0.9614   0.0000  0.0061  0.3735
  75   75  -0.0002  0.0097  0.7275  -0.0001  0.0126  0.9450  -0.0001  0.0051  0.3825
 100   75   0.0002  0.0085  0.7361  -0.0001  0.0113  0.9786   0.0000  0.0045  0.3897
  50  100  -0.0001  0.0103  0.7283  -0.0001  0.0135  0.9546  -0.0001  0.0054  0.3818
  75  100   0.0001  0.0083  0.7188  -0.0001  0.0109  0.9440  -0.0000  0.0043  0.3724
 100  100   0.0000  0.0073  0.7300  -0.0001  0.0096  0.9600  -0.0000  0.0038  0.3800
Table 5: The performance of the QMLE for γ, β and σ² after bias correction

  N    T       γ                          β                          σ²
           Bias     RMSE    SRMSE    Bias     RMSE    SRMSE    Bias     RMSE    SRMSE
  50   50  -0.0002  0.0099  0.4950  -0.0002  0.0107  0.5350  -0.0009  0.0128  0.6400
  75   50  -0.0002  0.0082  0.5021  -0.0001  0.0086  0.5266  -0.0007  0.0104  0.6369
 100   50   0.0001  0.0070  0.4950   0.0000  0.0075  0.5303  -0.0004  0.0091  0.6435
  50   75  -0.0002  0.0081  0.4960   0.0002  0.0085  0.5205  -0.0004  0.0102  0.6246
  75   75   0.0000  0.0066  0.4950  -0.0000  0.0070  0.5250  -0.0003  0.0083  0.6225
 100   75  -0.0002  0.0058  0.5023  -0.0002  0.0060  0.5196  -0.0003  0.0072  0.6235
  50  100   0.0001  0.0069  0.4879   0.0000  0.0073  0.5162  -0.0002  0.0088  0.6223
  75  100   0.0001  0.0056  0.4850  -0.0001  0.0060  0.5196  -0.0002  0.0072  0.6235
 100  100  -0.0001  0.0048  0.4800   0.0001  0.0052  0.5200  -0.0002  0.0061  0.6100
Tables 4 and 5 present the performance of the QMLEs after bias correction. It is seen that the biases are effectively removed and the QMLEs after bias correction are nearly centered around zero. In addition, we see that the sample-size adjusted RMSEs in all the combinations of $N$ and $T$ are almost the same. Taking $\gamma$ as an illustration, the SRMSEs for the sample sizes $(50, 100)$, $(75, 100)$ and $(100, 100)$ are 0.4879, 0.4850 and 0.4800, respectively. This result confirms our theoretical result that the QMLE after bias correction is $\sqrt{NT}$-consistent, as asserted in Theorem 5.1.

Table 6: The empirical size of the t-test (nominal 5%) before bias correction

  N    T     ϖ1      ϖ2      ρ       γ       β       σ²
  50   50   5.28%   5.50%   8.48%   5.48%   5.96%  11.70%
  75   50   5.32%   5.58%  10.52%   5.88%   5.16%  12.32%
 100   50   4.76%   5.68%  12.14%   5.68%   5.50%  13.40%
  50   75   4.90%   5.30%   6.94%   5.84%   5.28%   8.74%
  75   75   4.92%   5.16%   8.22%   5.94%   5.06%   9.38%
 100   75   5.38%   5.14%  10.36%   6.52%   5.26%  10.06%
  50  100   5.18%   4.94%   6.90%   5.28%   4.70%   7.24%
  75  100   4.96%   4.82%   7.92%   5.32%   4.78%   8.04%
 100  100   5.15%   5.00%   8.56%   5.52%   5.14%   8.26%
Table 7: The empirical size of the t-test (nominal 5%) after bias correction

  N    T     ϖ1      ϖ2      ρ       γ       β       σ²
  50   50   5.20%   5.56%   5.40%   5.18%   5.98%   7.30%
  75   50   5.18%   5.44%   5.84%   5.30%   5.10%   7.04%
 100   50   4.82%   5.46%   5.26%   5.10%   5.40%   7.06%
  50   75   4.90%   5.20%   4.62%   5.30%   5.28%   6.28%
  75   75   4.86%   5.08%   4.80%   5.90%   5.06%   6.36%
 100   75   5.36%   5.16%   5.68%   5.66%   5.22%   5.42%
  50  100   5.20%   4.94%   5.34%   5.02%   4.72%   5.26%
  75  100   5.16%   4.90%   5.14%   5.10%   4.78%   5.84%
 100  100   5.20%   5.04%   5.20%   4.86%   5.12%   5.02%
Tables 6 and 7 further give the empirical sizes of the t-test at the nominal 5% significance level. They confirm the previous assertion that inference based on the QMLE has a size distortion problem if no bias correction is conducted. We also see that the size distortion problem is much more severe in samples where $N/T$ is larger. For instance, when $N = 100$ and $T = 50$, the actual significance levels for $\rho$ and $\sigma^2$ under the nominal 5% level are 12.14% and 13.40%, respectively. This result is consistent with our theory that the bias is of order $\sqrt{N/T}$. Furthermore, Table 7 shows that the size distortion problem is much alleviated after bias correction.

8.3 Performance of the estimated impulse response function

Finally, we investigate the finite sample performance of the estimated impulse response functions. To evaluate the finite sample performance, in each repetition and each period,
Figure 1: The shape of the impulse response function

we calculate the impulse response value based on the true parameters and 95% confidence intervals (i.e., using 1.96 as the critical value) based on the estimated parameters (QMLE). If the true impulse response value falls in the estimated confidence interval, we count 1, otherwise 0. We then calculate the ratio of the total count to the total number of repetitions. If the estimators perform well, the ratio should be close to 0.95. The number of repetitions is set to 5000 as well. Throughout this subsection, the response value under examination is that of the second dependent variable to the first spatial unit's innovation shock. We note that in our data setup all the units have the same status, and the response of one variable to the shocks of another is the same as the other way around, so the response values depend only on the relative distance between the two spatial units; that is, the response of the $(i+1)$th dependent variable to the $i$th spatial unit's shock is the same as the response value under examination (i.e., $i = 1$). Consequently, the examined response can be interpreted generally as the response to a unit's immediate neighbor's innovation shock.

Figure 1 depicts the shapes of the estimated impulse response functions with sample size $N = 100$ and $T = 100$. The upper-left subfigure plots the estimated impulse response functions due to a one-unit change in the error over 5000 repetitions. The upper-right subfigure gives the sample mean of the estimated impulse response function and the associated 95% confidence intervals based on the same 5000 repetitions. The lower-left subfigure plots the estimated impulse response functions due to a one-unit change in the exogenous regressor $X$, and the lower-right subfigure is the corresponding sample mean and 95% confidence intervals.
From Figure 1, we see that the impulse response function achieves its maximum value instantly when the innovation shock occurs, has a sharp decrease at period one, attains its second maximum value at period two, and decreases step by step thereafter. Although the depicted impulse response function is generated using simulated data, it implies that allowing multiple time lags can deliver rich shapes of impulse response functions, which are useful for real data analysis.

Tables 8 and 9 present the performance of the estimated impulse response functions in terms of the ratio of the total count falling inside the 95% confidence intervals to the total number of simulations. From these two tables, we see that all the ratios fall into the interval [0.93, 0.97]. Except for sample size $N = 50$ and $T = 50$, nearly all the ratios fall into the interval [0.94, 0.96]. When the sample size is moderately large, nearly all the ratios fall into the interval [0.945, 0.955]. These results indicate that the estimated impulse response functions perform well in finite samples.

Table 8: Performance of the impulse response function due to a one-unit change in error
  N    T      0       1       2       3       4       5       6       7       8       9
  50   50  0.9416  0.9428  0.9422  0.9406  0.9426  0.9362  0.9396  0.9350  0.9376  0.9318
  75   50  0.9474  0.9500  0.9510  0.9458  0.9490  0.9454  0.9466  0.9466  0.9444  0.9452
 100   50  0.9476  0.9460  0.9444  0.9462  0.9472  0.9442  0.9468  0.9450  0.9468  0.9432
  50   75  0.9442  0.9442  0.9458  0.9452  0.9452  0.9392  0.9436  0.9382  0.9400  0.9392
  75   75  0.9452  0.9478  0.9516  0.9466  0.9502  0.9498  0.9506  0.9468  0.9488  0.9458
 100   75  0.9486  0.9526  0.9428  0.9498  0.9470  0.9488  0.9472  0.9488  0.9470  0.9478
  50  100  0.9502  0.9454  0.9490  0.9476  0.9472  0.9476  0.9452  0.9426  0.9442  0.9420
  75  100  0.9502  0.9504  0.9510  0.9446  0.9480  0.9484  0.9466  0.9486  0.9458  0.9474
 100  100  0.9512  0.9462  0.9450  0.9478  0.9448  0.9446  0.9448  0.9430  0.9458  0.9454
Table 9: Performance of the impulse response function due to a one-unit change in X

  N    T      0       1       2       3       4       5       6       7       8       9
  50   50  0.9430  0.9468  0.9474  0.9492  0.9462  0.9464  0.9460  0.9450  0.9450  0.9388
  75   50  0.9530  0.9438  0.9464  0.9438  0.9436  0.9418  0.9452  0.9408  0.9412  0.9386
 100   50  0.9476  0.9482  0.9462  0.9470  0.9480  0.9454  0.9470  0.9456  0.9450  0.9440
  50   75  0.9488  0.9504  0.9464  0.9434  0.9442  0.9422  0.9430  0.9406  0.9418  0.9400
  75   75  0.9388  0.9458  0.9484  0.9508  0.9490  0.9478  0.9476  0.9464  0.9452  0.9434
 100   75  0.9558  0.9454  0.9474  0.9446  0.9462  0.9460  0.9452  0.9480  0.9448  0.9454
  50  100  0.9492  0.9506  0.9482  0.9464  0.9470  0.9452  0.9444  0.9446  0.9438  0.9438
  75  100  0.9458  0.9490  0.9454  0.9466  0.9426  0.9424  0.9442  0.9406  0.9406  0.9400
 100  100  0.9508  0.9484  0.9530  0.9516  0.9518  0.9506  0.9482  0.9472  0.9478  0.9452
9 Concluding remarks

This paper considers using a dynamic spatial panel data model with multiple spatial lags and multiple time lags to capture complicated correlations over cross section and time in data. We use the QML method to estimate the model and investigate the asymptotic properties of the QMLE under the large-$N$ and large-$T$ setup. The QMLE is shown to be $\min(\sqrt{NT}, T)$-consistent. After conducting a bias correction to remove the $O(\frac{1}{T})$ order bias from the estimator, the QMLE is shown to have a $\sqrt{NT}$ convergence rate under $N/T^3 \to 0$ and to possess an asymptotic normal distribution with the limiting variance explicitly given in the main text.

We also consider the problem of estimating the impulse response functions associated with the model. We derive the limiting distribution of the estimated impulse response functions. Given the limiting distribution, confidence intervals can be easily constructed. The model selection issue of removing irrelevant regressors is also considered. We show that if the penalty functions satisfy some regularity conditions, information criterion methods can consistently identify the underlying model. Monte Carlo simulations are conducted to examine the finite sample performance of the QMLE and of the estimated impulse response functions. Overall, the simulation results confirm our theoretical results and show that the QMLE after bias correction has good finite sample performance.

Recently, Bai (2013) considered a factor analytical approach to estimating dynamic panel data models. In Bai's paper, fixed effects are treated as a special case of interactive effects with one observed factor. Bai uses a factor-based QML method to estimate the model (for a related approach, see also Bai and Li (2014)). Bai's analysis indicates that the factor-analytical estimator is free of bias under large $N$ and large $T$. Norkute (2014) conducts Monte Carlo simulations that confirm Bai's theoretical results and show that the factor-analytical estimator has better finite sample performance than the bias-corrected QMLE of Hahn and Kuersteiner (2002). It is therefore interesting to use the factor analytical approach in spatial dynamic panel data models to deal with the fixed effects. We will investigate this approach in future work.
Appendix A: Proof of consistency
In this section, we give a detailed consistency proof. The following lemma is useful for subsequent analyses.

Lemma A.1 Let $W$ and $V$ be two square matrices such that $\|W\|_1 \vee \|W\|_\infty < \infty$ and $\|V\|_1 \vee \|V\|_\infty < \infty$; then we have $\|WV\|_1 \vee \|WV\|_\infty < \infty$. Given these results, we have
(a) $\max_{1\le m,n\le p} (\|W_m W_n\|_1 \vee \|W_m W_n\|_\infty) < \infty$,

(a$'$) $\max_{1\le m,n\le p} (\|W_m' W_n\|_1 \vee \|W_m' W_n\|_\infty) < \infty$,

(b) $\max_{1\le m\le p} (\|G_m^*\|_1 \vee \|G_m^*\|_\infty) < \infty$,

(c) $\max_{1\le m,n\le p} (\|G_m^{*\prime} G_n^*\|_1 \vee \|G_m^{*\prime} G_n^*\|_\infty) < \infty$,

(d) $\max_{1\le m,n\le p} (\|W_m' G_m^{*\prime} W_n\|_1 \vee \|W_m' G_m^{*\prime} W_n\|_\infty) < \infty$,

(e) $\max_{1\le m\le p} (\|W_m G_m^*\|_1 \vee \|W_m G_m^*\|_\infty) < \infty$,

where $G_m^* = W_m D^{*-1}$ and $D^* = I_N - \sum_{m=1}^p \varrho_m^* W_m$.
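Lemma A.1 rests on the fact that the maximum column-sum norm $\|\cdot\|_1$ and the maximum row-sum norm $\|\cdot\|_\infty$ are submultiplicative, so products such as $W_m W_n$ inherit bounded norms. A quick numerical sanity check of this norm inequality (illustrative only):

```python
import numpy as np

# Submultiplicativity of the induced 1-norm (max absolute column sum)
# and infinity-norm (max absolute row sum), the fact behind Lemma A.1.
rng = np.random.default_rng(0)

def norm1(A):    # maximum absolute column sum
    return np.abs(A).sum(axis=0).max()

def norminf(A):  # maximum absolute row sum
    return np.abs(A).sum(axis=1).max()

W = rng.standard_normal((20, 20))
V = rng.standard_normal((20, 20))

ok1 = norm1(W @ V) <= norm1(W) * norm1(V)
okinf = norminf(W @ V) <= norminf(W) * norminf(V)
print(ok1, okinf)
```

The inequality holds exactly for any induced matrix norm, which is why boundedness of $\|W_m\|_1 \vee \|W_m\|_\infty$ carries over to all the products listed in the lemma.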
Proof of Lemma A.1. See Lee (2004).

Lemma A.2 Under Assumptions A-G, we have

(a$'$) $\displaystyle\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot Y_{t-m}' W \dot Y_{t-n} = \frac{1}{N\bar T}\sum_{t=\bar q}^T E(\widetilde Y_{t-m}' W \widetilde Y_{t-n}) + o_p(1)$, for $m, n = 0, 1, \ldots, q$;

(b$'$) $\displaystyle\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot Y_{t-m}' W \dot X_t = \frac{1}{N\bar T}\sum_{t=\bar q}^T E(\widetilde Y_{t-m}' W \dot X_t) + o_p(1)$, for $m = 0, 1, \ldots, q$;

where $W$ is any $N \times N$ nonrandom matrix such that $\|W\|_1 \vee \|W\|_\infty \le C$ for some constant $C$. Given the above two results, we have

(a) $\displaystyle\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot{\mathbb X}_t' \dot{\mathbb X}_t - \frac{1}{N\bar T}\sum_{t=\bar q}^T E(\widetilde{\mathbb X}_t' \widetilde{\mathbb X}_t) = o_p(1)$;

(b) $\displaystyle\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot{\mathbb X}_t' G_m^{*\prime} \dot{\mathbb X}_t - \frac{1}{N\bar T}\sum_{t=\bar q}^T E(\widetilde{\mathbb X}_t' G_m^{*\prime} \widetilde{\mathbb X}_t) = o_p(1)$, for $m = 1, \ldots, p$;

(c) $\displaystyle\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot{\mathbb X}_t' G_m^{*\prime} G_n^* \dot{\mathbb X}_t - \frac{1}{N\bar T}\sum_{t=\bar q}^T E(\widetilde{\mathbb X}_t' G_m^{*\prime} G_n^* \widetilde{\mathbb X}_t) = o_p(1)$, for $m, n = 1, \ldots, p$;

(d) $\displaystyle E\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime} (\dot{\mathbb X}_t - \widetilde{\mathbb X}_t)' e_t\Big]^2 = O(\bar T^{-2})$;

(e) $\displaystyle E\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime} (\dot{\mathbb X}_t - \widetilde{\mathbb X}_t)' G_m^{*\prime} G_n^* e_t\Big]^2 = O(\bar T^{-2})$, for $m, n = 1, \ldots, p$;

where $\dot Y_t$ and $\dot{\mathbb X}_t$ are defined in (3.4), and $\widetilde Y_t$ and $\widetilde{\mathbb X}_t$ are defined in Assumption F.
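A minimal numerical illustration of the kind of law of large numbers asserted in (a$'$), in the simplest special case with no spatial terms, one time lag and $W = I_N$: for the stationary AR(1) panel $Y_t = \phi Y_{t-1} + e_t$ one has $\frac{1}{N}E(\widetilde Y_t'\widetilde Y_t) = \sigma^2/(1-\phi^2)$, so the sample average should settle near that value. All parameter values are illustrative:

```python
import numpy as np

# Monte Carlo check of the LLN in the scalar special case W = I_N:
# Y_t = phi*Y_{t-1} + e_t, so E(Y_t'Y_t)/N = sigma^2 / (1 - phi^2).
rng = np.random.default_rng(1)
N, T, phi, sigma = 200, 500, 0.5, 1.0

Y = np.zeros(N)
acc = 0.0
for t in range(T + 100):          # 100-period burn-in toward stationarity
    Y = phi * Y + sigma * rng.standard_normal(N)
    if t >= 100:
        acc += Y @ Y / N
sample_moment = acc / T
target = sigma**2 / (1 - phi**2)  # = 4/3 here
print(sample_moment, target)
```

The $O_p(N^{-1/2}\bar T^{-1/2})$ rate in (A.2) below says this gap shrinks in both panel dimensions, which is what drives the large-$N$, large-$T$ asymptotics.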
Proof of Lemma A.2. Consider (a$'$). We first show that, for every $m, n = 0, 1, \ldots, q$,
\[
\frac{1}{N}\Big(\sum_{v=0}^\infty B_v^* \bar e_{-m-v}\Big)' W \Big(\sum_{u=0}^\infty B_u^* \bar e_{-u-n}\Big) = O_p(\bar T^{-1}), \tag{A.1}
\]
and
\[
\frac{1}{N\bar T}\sum_{t=\bar q}^T \widetilde Y_{t-m}' W \widetilde Y_{t-n} - \frac{1}{N\bar T}\sum_{t=\bar q}^T E(\widetilde Y_{t-m}' W \widetilde Y_{t-n}) = O_p(N^{-1/2}\bar T^{-1/2}). \tag{A.2}
\]
By (A.1) and (A.2), result (a$'$) follows, as will be shown below.

Consider (A.1). By the definition of $\bar e_{-n}$, the left hand side of (A.1) is equal to
\[
\frac{1}{N\bar T^2}\sum_{u,v=0}^\infty \sum_{t,s=\bar q}^T e_{s-m-v}' B_v^{*\prime} W B_u^* e_{t-u-n}. \tag{A.3}
\]
So we have
\[
E\Big[\frac{1}{N}\Big(\sum_{v=0}^\infty B_v^* \bar e_{-m-v}\Big)' W \sum_{u=0}^\infty B_u^* \bar e_{-u-n}\Big]^2 = \frac{1}{N^2\bar T^4}\sum_{u,v,u',v'=0}^\infty\ \sum_{t,s,t',s'=\bar q}^T E\big(e_{s-v-m}' B_v^{*\prime} W B_u^* e_{t-u-n}\, e_{s'-v'-m}' B_{v'}^{*\prime} W B_{u'}^* e_{t'-u'-n}\big). \tag{A.4}
\]
The term $E(e_{s-m-v}' B_v^{*\prime} W B_u^* e_{t-u-n}\, e_{s'-v'-m}' B_{v'}^{*\prime} W B_{u'}^* e_{t'-u'-n})$ is 0 except in three cases: (i) $s-v-m = t-u-n$ and $s'-v'-m = t'-u'-n$; (ii) $s-v = s'-v'$ and $t-u = t'-u'$; (iii) $s-v-m = t'-u'-n$ and $t-u-n = s'-v'-m$. All three cases include the special case $s-v-m = t-u-n = s'-v'-m = t'-u'-n$; if we sum over the three cases, we therefore have to subtract twice this special case. Given this analysis, equation (A.4) is equal to
\[
\Big\{\frac{\sigma^{*2}}{N\bar T^2}\sum_{u,v=0}^\infty [(\bar T - |(v+m)-(u+n)|)\vee 0]\,\mathrm{tr}(B_v^{*\prime} W B_u^*)\Big\}^2 \tag{A.5}
\]
\[
+ \frac{\sigma^{*4}}{N^2\bar T^4}\sum_{u,v,u',v'=0}^\infty [(\bar T - |v-v'|)\vee 0][(\bar T - |u-u'|)\vee 0]\,\mathrm{tr}(B_{u'}^{*\prime} W' B_{v'}^* B_v^{*\prime} W B_u^*)
\]
\[
+ \frac{\sigma^{*4}}{N^2\bar T^4}\sum_{u,v,u',v'=0}^\infty [(\bar T - |v+m-u'-n|)\vee 0][(\bar T - |u+n-v'-m|)\vee 0]\,\mathrm{tr}(B_v^{*\prime} W B_u^* B_{v'}^{*\prime} W B_{u'}^*)
\]
\[
+ \frac{\kappa_4^* - 3\sigma^{*4}}{N^2\bar T^4}\sum_{u,v,u',v'=0}^\infty \big\{[\bar T - \bar\delta(u,v,u',v') + \underline\delta(u,v,u',v')]\vee 0\big\}\,\mathrm{tr}\big[(B_v^{*\prime} W B_u^*)\circ(B_{v'}^{*\prime} W B_{u'}^*)\big],
\]
where $\bar\delta(u,v,u',v') = (v+m)\vee(u+n)\vee(v'+m)\vee(u'+n)$, $\underline\delta(u,v,u',v') = (v+m)\wedge(u+n)\wedge(v'+m)\wedge(u'+n)$ and $\kappa_4^* = E(e_{it}^4)$.

Consider the first expression of (A.5). By $\|W\|_1 \vee \|W\|_\infty \le C$, we have $|w_{jl}| \le C$ for all $j, l$. Thus,
\[
\frac{\sigma^{*2}}{N\bar T^2}\sum_{u,v=0}^\infty [(\bar T - |(v+m)-(u+n)|)\vee 0]\,|\mathrm{tr}(B_v^{*\prime} W B_u^*)| \le \frac{\sigma^{*2}}{N\bar T}\sum_{u,v=0}^\infty \sum_{i,j,l=1}^N |B_{ji,v}^*|\cdot|w_{jl}|\cdot|B_{li,u}^*|
\]
\[
\le C\,\frac{\sigma^{*2}}{N\bar T}\sum_{i=1}^N\Big[\sum_{v=0}^\infty\sum_{j=1}^N |B_{ji,v}^*|\Big]\Big[\sum_{u=0}^\infty\sum_{l=1}^N |B_{li,u}^*|\Big].
\]
By Assumption E, $\sum_{v=0}^\infty \|B_v^*\|_1 \le C$, which implies $\sum_{u=0}^\infty\sum_{l=1}^N |B_{li,u}^*| \le C$ for all $i$; the same bound applies to the other bracket. So we have
\[
\frac{\sigma^{*2}}{N\bar T^2}\sum_{u,v=0}^\infty [(\bar T - |(v+m)-(u+n)|)\vee 0]\,\mathrm{tr}(B_v^{*\prime} W B_u^*) = O\Big(\frac{1}{\bar T}\Big),
\]
which implies that the first expression of (A.5) is $O(\frac{1}{\bar T^2})$.

Consider the second expression of (A.5). Noticing that $|w_{l'j'}| \le C$, the second expression of (A.5) is bounded by
\[
\frac{\sigma^{*4}}{N^2\bar T^2}\sum_{u,v,u',v'=0}^\infty |\mathrm{tr}(B_{u'}^{*\prime} W' B_{v'}^* B_v^{*\prime} W B_u^*)| \le \frac{\sigma^{*4}}{N^2\bar T^2}\sum_{u,v,u',v'=0}^\infty \sum_{i,j,j',l,l',d=1}^N |B_{ji,u'}^*\, w_{lj}\, B_{ld,v'}^*\, B_{l'd,v}^*\, w_{l'j'}\, B_{j'i,u}^*|.
\]
Bounding $|w_{l'j'}| \le C$ and then successively summing out the indices along the chain, namely $(u,j')$ by Assumption E, $(v,l')$ by Assumption E, $(v',l)$ by Assumption E after using $\sum_j |w_{lj}| \le \|W\|_1 \le C$, and finally $(u',j)$ by Assumption E, only the single free index $i$ remains, summed over $N$ terms. The second expression of (A.5) is therefore bounded by $C^6\sigma^{*4}/(N\bar T^2)$, i.e., it is $O(\frac{1}{N\bar T^2})$. The third expression is also $O(\frac{1}{N\bar T^2})$, which can be proved similarly to the second one.

Consider the last expression, which is bounded by
\[
\frac{\kappa_4^* - 3\sigma^{*4}}{N^2\bar T^3}\sum_{u,v,u',v'=0}^\infty \sum_{i,j,j',l,l'=1}^N |B_{ji,v}^*\, w_{jl}\, B_{li,u}^*\, B_{j'i,v'}^*\, w_{j'l'}\, B_{l'i,u'}^*| \le C^2\,\frac{\kappa_4^* - 3\sigma^{*4}}{N^2\bar T^3}\sum_{u,v,u',v'=0}^\infty \sum_{i,j,j',l,l'=1}^N |B_{ji,v}^*\, B_{li,u}^*\, B_{j'i,v'}^*\, B_{l'i,u'}^*|
\]
\[
= C^2\,\frac{\kappa_4^* - 3\sigma^{*4}}{N^2\bar T^3}\sum_{i=1}^N \Big[\sum_{v=0}^\infty\sum_{j=1}^N |B_{ji,v}^*|\Big]\Big[\sum_{u=0}^\infty\sum_{l=1}^N |B_{li,u}^*|\Big]\Big[\sum_{v'=0}^\infty\sum_{j'=1}^N |B_{j'i,v'}^*|\Big]\Big[\sum_{u'=0}^\infty\sum_{l'=1}^N |B_{l'i,u'}^*|\Big] \le C^6\,\frac{\kappa_4^* - 3\sigma^{*4}}{N\bar T^3}.
\]
So the last expression is $O(\frac{1}{N\bar T^3})$. Summarizing the above results, we have
\[
E\Big[\frac{1}{N}\Big(\sum_{v=0}^\infty B_v^* \bar e_{-m-v}\Big)' W \sum_{u=0}^\infty B_u^* \bar e_{-u-n}\Big]^2 = O\Big(\frac{1}{\bar T^2}\Big),
\]
which implies (A.1).

We proceed to consider (A.2). By the definition of $\widetilde Y_t$, it suffices to prove that
\[
\frac{1}{N\bar T}\sum_{t=\bar q}^T \Big(\sum_{v=0}^\infty B_v^* \dot X_{t-m-v}\beta^*\Big)' W \Big(\sum_{u=0}^\infty B_u^* e_{t-n-u}\Big) = O_p\Big(\frac{1}{\sqrt{N\bar T}}\Big) \tag{A.6}
\]
and
\[
\frac{1}{N\bar T}\sum_{t=\bar q}^T \Big(\sum_{v=0}^\infty B_v^* e_{t-m-v}\Big)' W \Big(\sum_{u=0}^\infty B_u^* e_{t-n-u}\Big) - \frac{1}{N\bar T}\sum_{t=\bar q}^T E\Big[\Big(\sum_{v=0}^\infty B_v^* e_{t-m-v}\Big)' W \Big(\sum_{u=0}^\infty B_u^* e_{t-n-u}\Big)\Big] = O_p\Big(\frac{1}{\sqrt{N\bar T}}\Big). \tag{A.7}
\]
Let $\mathcal X_{t-m} = W' \sum_{v=0}^\infty B_v^* \dot X_{t-m-v}\beta^*$. By Assumptions E, F and $\|W\|_1 \vee \|W\|_\infty \le C$, it is easy to see that all the elements of $\mathcal X_{t-m}$ are nonrandom and bounded in absolute value by some constant $C$. Let $\mathcal X_{i,t-m}$ be the $i$th element of $\mathcal X_{t-m}$. Some computations show that
\[
E\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \mathcal X_{t-m}' \sum_{u=0}^\infty B_u^* e_{t-u-n}\Big]^2 = \frac{\sigma^{*2}}{N^2\bar T^2}\sum_{v=0}^\infty\sum_{u=0}^\infty \sum_{t=\bar q+|u-v|}^T \mathcal X_{t-m}' B_u^* B_v^{*\prime} \mathcal X_{t+v-u-m}
\]
\[
\le \frac{\sigma^{*2}}{N^2\bar T^2}\sum_{v=0}^\infty\sum_{u=0}^\infty \sum_{t=\bar q+|u-v|}^T \sum_{i,j,l=1}^N |\mathcal X_{i,t-m}\, B_{ij,u}^*\, B_{lj,v}^*\, \mathcal X_{l,t+v-u-m}| \le C^2\,\frac{\sigma^{*2}}{N^2\bar T}\sum_{v=0}^\infty\sum_{u=0}^\infty \sum_{i,j,l=1}^N |B_{ij,u}^*|\cdot|B_{lj,v}^*|
\]
\[
\le C^2\,\frac{\sigma^{*2}}{N^2\bar T}\sum_{j=1}^N \Big[\sum_{u=0}^\infty\sum_{i=1}^N |B_{ij,u}^*|\Big]\Big[\sum_{v=0}^\infty\sum_{l=1}^N |B_{lj,v}^*|\Big] \le C^4\,\frac{\sigma^{*2}}{N\bar T}.
\]
Given the above result, we have (A.6).

Next, we consider (A.7). Let $\Upsilon_{vu} = B_v^{*\prime} W B_u^*$. Now the left hand side of (A.7) is equal to
\[
\frac{1}{N\bar T}\sum_{t=\bar q}^T\sum_{u,v=0}^\infty e_{t-m-v}' \Upsilon_{vu} e_{t-n-u} - \frac{1}{N\bar T}\sum_{t=\bar q}^T\sum_{u,v=0}^\infty E(e_{t-m-v}' \Upsilon_{vu} e_{t-n-u}).
\]
The variance of the above expression is equal to
\[
\frac{\sigma^{*4}}{N^2\bar T^2}\sum_{u,v,w=0}^\infty [(\bar T - |w-v|)\vee 0]\,\mathrm{tr}(\Upsilon_{vu}\Upsilon_{w,w+u-v}')\,1(w+u-v\ge 0) \tag{A.8}
\]
\[
+ \frac{\sigma^{*4}}{N^2\bar T^2}\sum_{u,v,w=0}^\infty [(\bar T - |n+u-m-v|)\vee 0]\,\mathrm{tr}(\Upsilon_{vu}\Upsilon_{w,2m-2n+w+v-u})\,1(2m-2n+w+v-u\ge 0)
\]
\[
+ \frac{\kappa_4^* - 3\sigma^{*4}}{N^2\bar T^2}\sum_{v,w=0}^\infty [(T-q-|v-w|)\vee 0]\,\mathrm{tr}(\Upsilon_{v,m+v-n}\circ\Upsilon_{w,w+m-n})\,1(m+v-n\ge 0,\; m+w-n\ge 0).
\]
For ease of exposition, let $B_v^* = 0$ if $v < 0$. Then the first expression in (A.8) is bounded by
\[
\frac{\sigma^{*4}}{N^2\bar T}\sum_{u,v,w=0}^\infty |\mathrm{tr}(B_{w+u-v}^{*\prime} W' B_w^* B_v^{*\prime} W B_u^*)| \le \frac{\sigma^{*4}}{N^2\bar T}\sum_{u,v,w=0}^\infty \sum_{i,j,l,b,d,f=1}^N |B_{ji,w+u-v}^*\, w_{lj}\, B_{lb,w}^*\, B_{db,v}^*\, w_{df}\, B_{fi,u}^*|.
\]
Summing out $j$ via $\sum_j |B_{ji,w+u-v}^*\, w_{lj}| \le C\|B_{w+u-v}^*\|_1 \le C^2$, then $(u,i)$ and $f$ via Assumption E and $\sum_f |w_{df}| \le \|W\|_\infty \le C$, the remaining sum is
\[
C\,\frac{\sigma^{*4}}{N^2\bar T}\sum_{v,w=0}^\infty \sum_{l,b,d=1}^N |B_{lb,w}^*\, B_{db,v}^*| \le C\,\frac{\sigma^{*4}}{N^2\bar T}\sum_{b=1}^N\Big[\sum_{w=0}^\infty\sum_{l=1}^N |B_{lb,w}^*|\Big]\Big[\sum_{v=0}^\infty\sum_{d=1}^N |B_{db,v}^*|\Big] \le C^5\,\frac{\sigma^{*4}}{N\bar T},
\]
up to the constants absorbed above. So the first expression of (A.8) is $O(\frac{1}{N\bar T})$. The second expression is also $O(\frac{1}{N\bar T})$, which can be proved similarly to the first one. Consider the third expression, which is bounded by
\[
\frac{\kappa_4^* - 3\sigma^{*4}}{N^2\bar T}\sum_{v,w=0}^\infty \big|\mathrm{tr}\big[(B_v^{*\prime} W B_{m+v-n}^*)\circ(B_w^{*\prime} W B_{w+m-n}^*)\big]\big| \le \frac{\kappa_4^* - 3\sigma^{*4}}{N^2\bar T}\sum_{v,w=0}^\infty \sum_{i,j,j'=1}^N |B_{ji,v}^*\, B_{j'i,w}^*| \Big[\sum_{l=1}^N |w_{jl} B_{li,m+v-n}^*|\Big]\Big[\sum_{l'=1}^N |w_{j'l'} B_{l'i,w+m-n}^*|\Big]
\]
\[
\le C^2\,\frac{\kappa_4^* - 3\sigma^{*4}}{N^2\bar T}\sum_{i=1}^N\Big[\sum_{v=0}^\infty\sum_{j=1}^N |B_{ji,v}^*|\Big]\Big[\sum_{w=0}^\infty\sum_{j'=1}^N |B_{j'i,w}^*|\Big] \le C^4\,\frac{\kappa_4^* - 3\sigma^{*4}}{N\bar T}.
\]
Hence, the last expression of (A.8) is $O(\frac{1}{N\bar T})$. Summarizing the three terms of (A.8), we have
\[
E\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T\sum_{u,v=0}^\infty e_{t-m-v}' \Upsilon_{vu} e_{t-n-u} - \frac{1}{N\bar T}\sum_{t=\bar q}^T\sum_{u,v=0}^\infty E(e_{t-m-v}' \Upsilon_{vu} e_{t-n-u})\Big]^2 = O\Big(\frac{1}{N\bar T}\Big),
\]
which implies (A.7). Given (A.6) and (A.7), we obtain (A.2).

Now we use (A.1) and (A.2) to prove (a$'$). The left hand side of (a$'$) can be written as
\[
\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot Y_{t-m}' W \dot Y_{t-n} = \frac{1}{N\bar T}\sum_{t=\bar q}^T (\dot Y_{t-m} - \widetilde Y_{t-m})' W (\dot Y_{t-n} - \widetilde Y_{t-n}) + \frac{1}{N\bar T}\sum_{t=\bar q}^T \widetilde Y_{t-m}' W \widetilde Y_{t-n} \tag{A.9}
\]
\[
+ \frac{1}{N\bar T}\sum_{t=\bar q}^T (\dot Y_{t-m} - \widetilde Y_{t-m})' W \widetilde Y_{t-n} + \frac{1}{N\bar T}\sum_{t=\bar q}^T \widetilde Y_{t-m}' W (\dot Y_{t-n} - \widetilde Y_{t-n}).
\]
By the definitions of $\dot Y_{t-m}$ and $\widetilde Y_{t-m}$, the first term on the right hand side of (A.9) is
\[
\frac{1}{N}\Big(\sum_{v=0}^\infty B_v^* \bar e_{-m-v}\Big)' W \Big(\sum_{u=0}^\infty B_u^* \bar e_{-u-n}\Big),
\]
which is $o_p(1)$ by (A.1). The second term is equal to $\frac{1}{N\bar T}\sum_{t=\bar q}^T E(\widetilde Y_{t-m}' W \widetilde Y_{t-n}) + o_p(1)$ by (A.2). The third term equals
\[
\frac{1}{N\bar T}\sum_{t=\bar q}^T \Big(\sum_{v=0}^\infty B_v^* \bar e_{-m-v}\Big)' W \widetilde Y_{t-n},
\]
which is bounded in norm by
\[
\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \Big\|\sum_{v=0}^\infty B_v^* \bar e_{-m-v}\Big\|^2\Big]^{1/2} \Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \|W\widetilde Y_{t-n}\|^2\Big]^{1/2}. \tag{A.10}
\]
Consider the expression in the first square bracket of (A.10), which is equal to
\[
\frac{1}{N\bar T}\sum_{t=\bar q}^T \mathrm{tr}\Big[\Big(\sum_{v=0}^\infty B_v^* \bar e_{-m-v}\Big)' \sum_{v=0}^\infty B_v^* \bar e_{-m-v}\Big].
\]
Invoking (A.1) with $W = I_N$ and $n = m$, we have that the above expression is $O_p(T^{-1})$. Proceed to the expression in the second square bracket of (A.10), which equals
\[
\mathrm{tr}\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \widetilde Y_{t-n}' W' W \widetilde Y_{t-n}\Big] = \mathrm{tr}\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T E(\widetilde Y_{t-n}' W' W \widetilde Y_{t-n})\Big] + o_p(1),
\]
where the equality is due to $\|W'W\|_1 \vee \|W'W\|_\infty \le C$ (by Lemma A.1) and (A.2). Given this result, the expression in (A.10) is $O_p(T^{-1/2})$, and this implies that the third term of (A.9) is $O_p(T^{-1/2})$. The last term in (A.9) is also $O_p(T^{-1/2})$, which can be proved similarly to the third one. Given all the results, we have (a$'$).

Consider (b$'$). We will show that, for any $m = 0, 1, \ldots, q$,
\[
\frac{1}{N\bar T}\sum_{t=\bar q}^T (\dot Y_{t-m} - \widetilde Y_{t-m})' W \dot X_t = O_p(N^{-1/2}T^{-1/2}) \tag{A.11}
\]
and
\[
\frac{1}{N\bar T}\sum_{t=\bar q}^T \widetilde Y_{t-m}' W \dot X_t = \frac{1}{N\bar T}\sum_{t=\bar q}^T E(\widetilde Y_{t-m}' W \dot X_t) + O_p(N^{-1/2}T^{-1/2}). \tag{A.12}
\]
First, we consider (A.11). By the definitions of $\dot Y_t$ and $\widetilde Y_t$, the left hand side of (A.11) is
\[
-\frac{1}{N\bar T}\sum_{s=\bar q}^T \sum_{v=0}^\infty e_{s-m-v}' B_v^{*\prime} W \bar X,
\]
where $\bar X = \frac{1}{\bar T}\sum_{t=\bar q}^T \dot X_t$. Let $\bar X_z$ be the $z$th column of $\bar X$, where $z = 1, \ldots, k$. It suffices to consider
\[
\frac{1}{N\bar T}\sum_{s=\bar q}^T \sum_{v=0}^\infty e_{s-m-v}' B_v^{*\prime} W \bar X_z.
\]
Consider $E\big[\frac{1}{N\bar T}\sum_{s=\bar q}^T \sum_{v=0}^\infty e_{s-m-v}' B_v^{*\prime} W \bar X_z\big]^2$, which is equal to
\[
\frac{1}{N^2\bar T^2}\sum_{s,t=\bar q}^T\sum_{u,v=0}^\infty E(\bar X_z' W' B_v^* e_{s-m-v}\, e_{t-m-u}' B_u^{*\prime} W \bar X_z) = \frac{\sigma^{*2}}{N^2\bar T^2}\sum_{u,v=0}^\infty [(\bar T - |v-u|)\vee 0]\, \bar X_z' W' B_v^* B_u^{*\prime} W \bar X_z
\]
\[
\le \frac{\sigma^{*2}}{N^2\bar T}\sum_{u,v=0}^\infty \sum_{i,j,l,b,d=1}^N |\bar X_{iz}\, w_{ji}\, B_{jl,v}^*\, B_{bl,u}^*\, w_{bd}\, \bar X_{dz}| \le C^2\,\frac{\sigma^{*2}}{N^2\bar T}\sum_{u,v=0}^\infty \sum_{j,l,b=1}^N |B_{jl,v}^*\, B_{bl,u}^*|
\]
\[
\le C^2\,\frac{\sigma^{*2}}{N^2\bar T}\sum_{l=1}^N\Big[\sum_{v=0}^\infty\sum_{j=1}^N |B_{jl,v}^*|\Big]\Big[\sum_{u=0}^\infty\sum_{b=1}^N |B_{bl,u}^*|\Big] \le C^4\,\frac{\sigma^{*2}}{N\bar T},
\]
where the boundedness of the elements of $\bar X$ and $\sum_i |w_{ji}| \le C$, $\sum_d |w_{bd}| \le C$ have been used. This implies (A.11).

Consider (A.12). By the definition of $\widetilde Y_t$, we only need to show
\[
\frac{1}{N\bar T}\sum_{t=\bar q}^T \Big(\sum_{v=0}^\infty B_v^* e_{t-m-v}\Big)' W \dot X_t = O_p(N^{-1/2}T^{-1/2}).
\]
By letting $W\dot X_t = \mathcal X_t$, the proof of the above result is almost the same as that of (A.6); details are therefore omitted here. Given (A.11) and (A.12), we have (b$'$). This completes the proof of the first part of Lemma A.2.

For the second part, consider (a). By definition,
\[
\dot{\mathbb X}_t = \big[\dot Y_{t-1}, \ldots, \dot Y_{t-q},\; W_1\dot Y_{t-1}, \ldots, W_p\dot Y_{t-1},\; \ldots,\; W_1\dot Y_{t-q}, \ldots, W_p\dot Y_{t-q},\; \dot X_t\big].
\]
Let $W_0 = I_N$. Then each element of $\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t$ is either $\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot Y_{t-n}' W_m' W_{m'} \dot Y_{t-n'}$ or $\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot Y_{t-n}' W_m' \dot X_t$ for $m, m' = 0, \ldots, p$ and $n, n' = 1, \ldots, q$. Notice that $\|W_m' W_{m'}\|_1 \vee \|W_m' W_{m'}\|_\infty \le C$ for all $m, m' = 0, \ldots, p$ by Lemma A.1(a$'$). By results (a$'$) and (b$'$) of this lemma, we immediately obtain (a). Noticing that $\|W_m' G_m^{*\prime} W_{m'}\|_1 \vee \|W_m' G_m^{*\prime} W_{m'}\|_\infty$ and $\|W_m' G_m^{*\prime}\|_1 \vee \|W_m' G_m^{*\prime}\|_\infty$ are bounded for all $m, m' = 0, \ldots, p$ by Lemma A.1(d) and (e), the proof of (b) is similar to that of (a). By the same arguments, the proof of (c) is similar to that of (b).

Consider (d). By definition,
\[
\dot{\mathbb X}_t - \widetilde{\mathbb X}_t = -\Big[\sum_{v=0}^\infty B_v^*\bar e_{-v-1}, \ldots, \sum_{v=0}^\infty B_v^*\bar e_{-v-q},\; W_1\sum_{v=0}^\infty B_v^*\bar e_{-v-1}, \ldots, W_p\sum_{v=0}^\infty B_v^*\bar e_{-v-1},\; \ldots, \tag{A.13}
\]
\[
W_1\sum_{v=0}^\infty B_v^*\bar e_{-v-q}, \ldots, W_p\sum_{v=0}^\infty B_v^*\bar e_{-v-q},\; 0_{N\times k}\Big],
\]
where $\bar e_{-v} = \frac{1}{\bar T}\sum_{t=\bar q}^T e_{t-v}$. Given (A.13), we have
\[
\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}(\dot{\mathbb X}_t - \widetilde{\mathbb X}_t)' e_t = -\frac{1}{N}\phi_1^*\sum_{v=0}^\infty \bar e_{-v-1}' B_v^{*\prime}\bar e - \cdots - \frac{1}{N}\phi_q^*\sum_{v=0}^\infty \bar e_{-v-q}' B_v^{*\prime}\bar e - \cdots - \frac{1}{N}\phi_l^*\sum_{v=0}^\infty \bar e_{-v-q}' B_v^{*\prime} W_p'\bar e,
\]
with $l = q + pq$, where $\phi_l^*$ is the $l$th element of $\Phi^*$. By the Cauchy-Schwarz inequality
$(a_1 + \cdots + a_l)^2 \le l(a_1^2 + \cdots + a_l^2)$, we have
\[
\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}(\dot{\mathbb X}_t - \widetilde{\mathbb X}_t)' e_t\Big]^2 \le l\Big\{\Big[\frac{1}{N}\phi_1^*\sum_{v=0}^\infty \bar e_{-v-1}' B_v^{*\prime}\bar e\Big]^2 + \cdots + \Big[\frac{1}{N}\phi_q^*\sum_{v=0}^\infty \bar e_{-v-q}' B_v^{*\prime}\bar e\Big]^2 + \cdots + \Big[\frac{1}{N}\phi_l^*\sum_{v=0}^\infty \bar e_{-v-q}' B_v^{*\prime} W_p'\bar e\Big]^2\Big\}.
\]
Given the above result, to prove (d) it suffices to prove
\[
E\Big[\frac{1}{N}\sum_{v=0}^\infty \bar e_{-v-n}' B_v^{*\prime} W_m'\bar e\Big]^2 = O(\bar T^{-2}) \tag{A.14}
\]
for every $m = 0, 1, \ldots, p$ and $n = 1, \ldots, q$, where $W_0 = I_N$. The left hand side is equal to
\[
\frac{1}{N^2\bar T^4}\sum_{u,v=0}^\infty \sum_{s,t,s',t'=\bar q}^T E(e_{s-v-n}' B_v^{*\prime} W_m' e_t\, e_{s'-u-n}' B_u^{*\prime} W_m' e_{t'}).
\]
The term $E(e_{s-v-n}' B_v^{*\prime} W_m' e_t\, e_{s'-u-n}' B_u^{*\prime} W_m' e_{t'})$ is 0 except in the cases: (i) $s-v-n = t$ and $s'-u-n = t'$; (ii) $s-v-n = s'-u-n$ and $t = t'$; (iii) $s-v-n = t'$ and $t = s'-u-n$. All three cases include the special case $s-v-n = t = s'-u-n = t'$, so we need to subtract twice this special case. Some tedious calculation shows that it equals
\[
\sigma^{*4}\frac{1}{N^2\bar T^4}\sum_{u=0}^\infty\sum_{v=0}^\infty [(\bar T-(v+n))\vee 0][(\bar T-(u+n))\vee 0]\,\mathrm{tr}(W_m B_v^*)\,\mathrm{tr}(W_m B_u^*) \tag{A.15}
\]
\[
+ \sigma^{*4}\frac{1}{N^2\bar T^4}\sum_{u=0}^\infty\sum_{v=0}^\infty [(\bar T-(v+n))\vee 0][(\bar T-(u+n))\vee 0]\,\mathrm{tr}(W_m B_v^* W_m B_u^*)
\]
\[
+ \sigma^{*4}\frac{1}{N^2\bar T^3}\sum_{u=0}^\infty\sum_{v=0}^\infty [(\bar T-|u-v|)\vee 0]\,\mathrm{tr}(W_m B_v^* B_u^{*\prime} W_m') + (\kappa_4^* - 3\sigma^{*4})\frac{1}{N^2\bar T^4}\sum_{u=0}^\infty\sum_{v=0}^\infty [(\bar T-|u-v|)\vee 0]\,\mathrm{tr}\big[(W_m B_v^*)\circ(W_m B_u^*)\big],
\]
where $\kappa_4^* = E(e_{it}^4)$. The first expression of (A.15) is equal to
\[
\sigma^{*4}\Big[\frac{1}{N\bar T^2}\sum_{v=0}^\infty [(\bar T-(v+n))\vee 0]\,\mathrm{tr}(W_m B_v^*)\Big]^2 \le \sigma^{*4}\Big[\frac{1}{N\bar T}\sum_{v=0}^\infty |\mathrm{tr}(W_m B_v^*)|\Big]^2.
\]
Notice that $|w_{ij,m}| \le C$ for some constant $C$; therefore
\[
\frac{1}{N\bar T}\sum_{v=0}^\infty |\mathrm{tr}(W_m B_v^*)| \le \frac{1}{N\bar T}\sum_{v=0}^\infty\sum_{i=1}^N\sum_{j=1}^N |w_{ij,m}|\cdot|B_{ji,v}^*| \le C\,\frac{1}{N\bar T}\sum_{i=1}^N\sum_{v=0}^\infty\sum_{j=1}^N |B_{ji,v}^*| \le C\,\frac{1}{\bar T}\sum_{v=0}^\infty \|B_v^*\|_1 = O\Big(\frac{1}{\bar T}\Big).
\]
Hence, the first expression is $O(\bar T^{-2})$. The second expression is bounded by
\[
\sigma^{*4}\frac{1}{N^2\bar T^2}\sum_{u=0}^\infty\sum_{v=0}^\infty |\mathrm{tr}(W_m B_v^* W_m B_u^*)| \le \sigma^{*4}\frac{1}{N^2\bar T^2}\sum_{u=0}^\infty\sum_{v=0}^\infty\sum_{i,j,k,l=1}^N |w_{ij,m}\, B_{jk,v}^*\, w_{kl,m}\, B_{li,u}^*|.
\]
Bounding $|w_{ij,m}| \le C$ and summing out $(u,i)$ via $\sum_u\|B_u^*\|_\infty \le C$, then $l$ via $\sum_l |w_{kl,m}| \le \|W_m\|_\infty \le C$, and finally $(v,j)$ via $\sum_v\|B_v^*\|_1 \le C$, only the free index $k$ remains, so the second expression is $O(\frac{1}{N\bar T^2})$.
The third expression is $O(\frac{1}{N\bar T^2})$ and the fourth is $O(\frac{1}{N\bar T^3})$, which can be proved similarly to the third and fourth expressions of (A.5). Thus, (A.15) is $O(\frac{1}{\bar T^2})$. Hence, we obtain (d). The proof of (e) is similar to that of (d); the details are therefore omitted. This completes the proof of Lemma A.2.

Lemma A.3 Under Assumptions A-G, we have

(a) $\displaystyle\sup_{\theta\in\Theta}\Big|\frac{1}{N\bar T\sigma^2}\sum_{t=\bar q}^T \Phi^{*\prime}\dot{\mathbb X}_t' F(\varrho)' F(\varrho)\dot e_t\Big| = o_p(1)$,

(b) $\displaystyle\sup_{\theta\in\Theta}\Big|\frac{1}{N\bar T\sigma^2}\sum_{t=\bar q}^T \dot e_t' F(\varrho)' F(\varrho)\dot e_t - \frac{1}{N}\mathrm{tr}\Big[\frac{\sigma^{*2}}{\sigma^2}F(\varrho)' F(\varrho)\Big]\Big| = o_p(1)$,

(c) $\displaystyle\sup_{\theta\in\Theta}\Big\|\frac{1}{N\bar T\sigma^2}\sum_{t=\bar q}^T \dot e_t' F(\varrho)'\dot{\mathbb X}_t\Big\| = o_p(1)$,

(d) $\displaystyle\sup_{\theta\in\Theta}\Big\|\frac{1}{N\bar T\sigma^2}\sum_{t=\bar q}^T \Phi^{*\prime}\dot{\mathbb X}_t' F(\varrho)'\dot{\mathbb X}_t\Big\| = O_p(1)$,

where $\theta = (\varrho', \Phi', \sigma^2)'$, $\Theta = \mathcal R_\varrho \times \mathbb R^r \times \mathcal R_{\sigma^2}$ and
\[
F(\varrho) = D(\varrho)D(\varrho^*)^{-1} = I_N - \sum_{m=1}^p (\varrho_m - \varrho_m^*) G_m^*. \tag{A.16}
\]
Proof of Lemma A.3. Consider (a). By (A.16), the left hand side of (a) is bounded by
\[
\sup_{\theta\in\Theta}\frac{1}{\sigma^2}\Big|\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}\dot{\mathbb X}_t' e_t\Big| + \sup_{\theta\in\Theta}\frac{1}{\sigma^2}\sum_{m=1}^p\sum_{n=1}^p |\varrho_m-\varrho_m^*|\cdot|\varrho_n-\varrho_n^*|\cdot\Big|\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}\dot{\mathbb X}_t' G_m^{*\prime}G_n^* e_t\Big| \tag{A.17}
\]
\[
+ \sup_{\theta\in\Theta}\frac{1}{\sigma^2}\sum_{m=1}^p |\varrho_m-\varrho_m^*|\cdot\Big|\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}\dot{\mathbb X}_t' G_m^{*\prime} e_t\Big| + \sup_{\theta\in\Theta}\frac{1}{\sigma^2}\sum_{m=1}^p |\varrho_m-\varrho_m^*|\cdot\Big|\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}\dot{\mathbb X}_t' G_m^* e_t\Big|.
\]
Consider the first term of (A.17). We have
\[
E\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}\dot{\mathbb X}_t' e_t\Big]^2 \le 2E\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}\widetilde{\mathbb X}_t' e_t\Big]^2 + 2E\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}(\dot{\mathbb X}_t-\widetilde{\mathbb X}_t)' e_t\Big]^2. \tag{A.18}
\]
By Assumption A and the definition of $\widetilde{\mathbb X}_t$, we have
\[
E\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}\widetilde{\mathbb X}_t' e_t\Big]^2 = \frac{\sigma^{*2}}{N^2\bar T^2}\sum_{t=\bar q}^T \Phi^{*\prime}E(\widetilde{\mathbb X}_t'\widetilde{\mathbb X}_t)\Phi^* = O(N^{-1}\bar T^{-1}),
\]
where the last equality is due to Assumption F. So the first term of (A.18) is $O_p(\frac{1}{N\bar T})$. The second term of (A.18) is $O_p(\frac{1}{\bar T^2})$ by Lemma A.2(d). Given these results, together with $\sigma^2$ bounded away from zero by Assumption D, we have that the first term of (A.17) is $O_p(\frac{1}{\sqrt{N\bar T}}) + O_p(\frac{1}{\bar T})$.

Consider the second term of (A.17). For any $m, n = 1, \ldots, p$,
\[
E\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}\dot{\mathbb X}_t' G_m^{*\prime}G_n^* e_t\Big]^2 \le 2E\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}\widetilde{\mathbb X}_t' G_m^{*\prime}G_n^* e_t\Big]^2 + 2E\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}(\dot{\mathbb X}_t-\widetilde{\mathbb X}_t)' G_m^{*\prime}G_n^* e_t\Big]^2. \tag{A.19}
\]
By Lemma A.1(b), $\max_{1\le m\le p}(\|G_m^*\|_1 \vee \|G_m^*\|_\infty)$ is bounded. Let $C = \max_{1\le m\le p}(\|G_m^*\|_1 \vee \|G_m^*\|_\infty)^2$. Then it follows that $C\cdot I_N - G_m^{*\prime}G_n^* G_n^{*\prime}G_m^*$ is positive semi-definite for every $m, n = 1, \ldots, p$. Thus,
\[
E\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}\widetilde{\mathbb X}_t' G_m^{*\prime}G_n^* e_t\Big]^2 = \frac{\sigma^{*2}}{N^2\bar T^2}\sum_{t=\bar q}^T E(\Phi^{*\prime}\widetilde{\mathbb X}_t' G_m^{*\prime}G_n^* G_n^{*\prime}G_m^* \widetilde{\mathbb X}_t\Phi^*) \le C\,\frac{\sigma^{*2}}{N^2\bar T^2}\sum_{t=\bar q}^T \Phi^{*\prime}E(\widetilde{\mathbb X}_t'\widetilde{\mathbb X}_t)\Phi^* = O(N^{-1}\bar T^{-1}),
\]
which implies that $\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}\widetilde{\mathbb X}_t' G_m^{*\prime}G_n^* e_t = O_p(N^{-1/2}\bar T^{-1/2})$. The second term of (A.19) is $O_p(\bar T^{-2})$, which is shown in Lemma A.2(e). This result, together with the boundedness of $|\varrho_m - \varrho_m^*| < C$ and of $\sigma^{-2}$ by Assumption D, shows that the second term of (A.17) is $O_p(\frac{1}{\sqrt{N\bar T}}) + O_p(\frac{1}{\bar T})$. The third and fourth terms of (A.17) are both $O_p(\frac{1}{\sqrt{N\bar T}}) + O_p(\frac{1}{\bar T})$, which can be proved similarly to the second term. Thus, (a) follows.

Consider (b). The left hand side of (b) is bounded by
\[
\sup_{\theta\in\Theta}\frac{1}{\sigma^2}\Big|\frac{1}{N\bar T}\sum_{t=\bar q}^T \big(e_t' F(\varrho)' F(\varrho)e_t - \mathrm{tr}[\sigma^{*2}F(\varrho)' F(\varrho)]\big)\Big| + \sup_{\theta\in\Theta}\frac{1}{\sigma^2}\frac{1}{N}\big|\bar e' F(\varrho)' F(\varrho)\bar e\big|. \tag{A.20}
\]
sup
1 1 σ2 N T¯
T
p
1
∑ (et0 et − Nσ∗2 ) + 2 sup σ2 ∑
Notice that for any nonrandom matrix A such that k Ak1 ∨ k Ak∞ ≤ C, we have E
h 1 N T¯
T
∑ [et0 Aet − tr(σ∗2 A)]
t=q¯
i2
i κ4∗ − 3σ∗4 1 ∗4 h 0 2 tr ( A ◦ A ) σ tr ( A A ) + tr ( A ) + σ ∗4 N 2 T¯
=
= O( N −1 T¯ −1 ).
Using the above result, we have that 1 N T¯
T
1 N T¯
∑ (et0 et − Nσ∗2 ),
t=q¯
T
∑ [et0 Gm∗ et − tr(σ∗2 Gm∗ )] and
t=q¯
1 N T¯
T
∑ [et0 Gm∗0 Gn∗ et − tr(σ∗2 Gm∗0 Gn∗ )]
t=q¯
∗ k ∨ k G ∗ k and k G ∗0 G ∗ k ∨ k G ∗0 G ∗ k are bounded are all O p ( N −1/2 T¯ −1/2 ) because k Gm 1 m ∞ m n 1 m n ∞ by Lemma A.1(b) and (c). These results, together with the boundedness of $m and σ−2 by Assumption D, give that all the three expressions in (A.21) are O p ( N −1/2 T¯ −1/2 ). Thus, the first term of (A.20) is O p ( N −1/2 T¯ −1/2 ). Using the same argument following (A.19), we can find a constant C such that C · IN − F ($)0 F ($) ≥ 0 uniformly on Θ. The second term of (A.20) is therefore bounded by C sup σ12 N1 e¯0 e¯ = O p ( T¯ −1 ). Given these results, we θ ∈Θ
have (b). The proof of result (c) is almost the same as that of the third term of (A.17). Consider (d). By (A.16), the left hand side of (d) is bounded by sup θ ∈Θ
1 1 σ2 N T¯
T
1 |$m − $∗m | · ¯ NT m =1 p
1
∑ Φ∗0 X˙ t0 X˙ t + sup σ2 ∑
t=q¯
θ ∈Θ
T
∑ Φ∗0 X˙ t0 Gm∗0 X˙ t .
t=q¯
∗0 X˙ only involve Notice that both the expressions N1T¯ ∑tT=q¯ Φ∗0 X˙ t0 X˙ t and N1T¯ ∑tT=q¯ Φ∗0 X˙ t0 Gm t the underlying parameters and the observations, thus they are apparently both O p (1). Given this result, together with the boundedness of $m and σ−2 , we have (d).
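The algebraic identity (A.16), $F(\varrho) = D(\varrho)D(\varrho^*)^{-1} = I_N - \sum_m (\varrho_m - \varrho_m^*)G_m^*$, which Lemma A.3 and the proof below lean on, is easy to verify numerically. The weight matrices and parameter values in this sketch are hypothetical:

```python
import numpy as np

# Numerical check of (A.16): with D(rho) = I - sum_m rho_m W_m and
# G*_m = W_m D(rho*)^{-1}, one has
# D(rho) D(rho*)^{-1} = I - sum_m (rho_m - rho*_m) G*_m.
rng = np.random.default_rng(2)
N, p = 6, 2
Ws = [rng.uniform(0, 1, (N, N)) for _ in range(p)]
for W in Ws:
    np.fill_diagonal(W, 0.0)
    W /= W.sum(axis=1, keepdims=True)   # row-normalize

def D(rho):
    return np.eye(N) - sum(r * W for r, W in zip(rho, Ws))

rho_star, rho = [0.2, 0.1], [0.25, 0.05]
Dinv = np.linalg.inv(D(rho_star))
G = [W @ Dinv for W in Ws]
F_direct = D(rho) @ Dinv
F_expanded = np.eye(N) - sum((r - rs) * g
                             for r, rs, g in zip(rho, rho_star, G))
print(np.allclose(F_direct, F_expanded))
```

The identity follows from writing $D(\varrho) = D^* - \sum_m (\varrho_m - \varrho_m^*)W_m$ and multiplying through by $D^{*-1}$ on the right.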
Proof of Proposition 5.1. We only consider the QMLE defined by (5.1) because the proof under (5.2) is similar. (A general argument to deal with this kind of approximate maximizer can be found in the proof of Theorem 14.4 in Kosorok (2007).) Consider the following centered (quasi) log likelihood function
\[
L(\theta) = -\frac12\ln\sigma^2 - \frac{1}{2N\bar T\sigma^2}\sum_{t=\bar q}^T \dot Z_t(\varrho,\Phi)'\dot Z_t(\varrho,\Phi) + \frac1N\ln|D(\varrho)| + \frac12\ln\sigma^{*2} + \frac12 - \frac1N\ln|D(\varrho^*)| \tag{A.22}
\]
with $\bar T = T - q$, $\bar q = q + 1$, $D(\varrho) = I_N - \varrho_1 W_1 - \cdots - \varrho_p W_p$ and
\[
\dot Z_t(\varrho,\Phi) = D(\varrho)\dot Y_t - \dot{\mathbb X}_t\Phi, \tag{A.23}
\]
where $\dot Y_t$ and $\dot{\mathbb X}_t$ are implicitly defined in (3.4). The above log likelihood function differs from the original one by a constant and is more convenient for our subsequent analysis. Given $\varrho$ and $\sigma^2$, the likelihood function (A.22) is maximized with respect to $\Phi$ at
\[
\Phi(\varrho) = \Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t\Big]^{-1}\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t' D(\varrho)\dot Y_t\Big]. \tag{A.24}
\]
Using this formula, (A.23) can now be written as
\[
\dot Z_t(\varrho,\Phi) = D(\varrho)\dot Y_t - \dot{\mathbb X}_t\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t\Big]^{-1}\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t' D(\varrho)\dot Y_t\Big]. \tag{A.25}
\]
Substituting this into the likelihood function (A.22), we have
\[
L(\theta) = -\frac12\ln\sigma^2 + \frac1N\ln|D(\varrho)| + \frac12\ln\sigma^{*2} + \frac12 - \frac1N\ln|D(\varrho^*)|
\]
\[
- \frac{1}{2N\bar T\sigma^2}\Big\{\sum_{t=\bar q}^T \dot Y_t' D(\varrho)' D(\varrho)\dot Y_t - \Big[\sum_{t=\bar q}^T \dot Y_t' D(\varrho)'\dot{\mathbb X}_t\Big]\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t\Big]^{-1}\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t' D(\varrho)\dot Y_t\Big]\Big\}.
\]
Notice that
\[
D(\varrho)\dot Y_t = D(\varrho)D(\varrho^*)^{-1}(\dot{\mathbb X}_t\Phi^* + \dot e_t) = F(\varrho)\dot{\mathbb X}_t\Phi^* + F(\varrho)\dot e_t
\]
with $F(\varrho) = D(\varrho)D(\varrho^*)^{-1} = D(\varrho)D^{*-1}$. Substituting this into the likelihood function, we have $L(\theta) = L_1(\theta) + L_2(\theta)$, where
\[
L_1(\theta) = -\frac12\ln\frac{\sigma^2}{\sigma^{*2}} + \frac1N\ln|F(\varrho)| + \frac12 - \frac{1}{2N}\mathrm{tr}\Big[\frac{\sigma^{*2}}{\sigma^2}F(\varrho)' F(\varrho)\Big]
\]
\[
- \frac{1}{2N\bar T\sigma^2}\Phi^{*\prime}\Big\{\sum_{t=\bar q}^T \dot{\mathbb X}_t' F(\varrho)' F(\varrho)\dot{\mathbb X}_t - \Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t' F(\varrho)'\dot{\mathbb X}_t\Big]\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t\Big]^{-1}\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t' F(\varrho)\dot{\mathbb X}_t\Big]\Big\}\Phi^*
\]
and
\[
L_2(\theta) = -\Big\{\frac{1}{2N\bar T\sigma^2}\sum_{t=\bar q}^T \dot e_t' F(\varrho)' F(\varrho)\dot e_t - \frac{1}{2N}\mathrm{tr}\Big[\frac{\sigma^{*2}}{\sigma^2}F(\varrho)' F(\varrho)\Big]\Big\} - \frac{1}{\sigma^2}\Big\{\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}\dot{\mathbb X}_t' F(\varrho)' F(\varrho)\dot e_t\Big\}
\]
\[
+ \frac{1}{\sigma^2}\Big\{\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \Phi^{*\prime}\dot{\mathbb X}_t' F(\varrho)'\dot{\mathbb X}_t\Big]\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t\Big]^{-1}\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot{\mathbb X}_t' F(\varrho)\dot e_t\Big]\Big\}
\]
\[
+ \frac{1}{2\sigma^2}\Big\{\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot e_t' F(\varrho)'\dot{\mathbb X}_t\Big]\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t\Big]^{-1}\Big[\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot{\mathbb X}_t' F(\varrho)\dot e_t\Big]\Big\}.
\]
The sum of the first four terms of $L_1(\theta)$ is equivalent to
\[
\frac{1}{2N}\ln\Big|\frac{\sigma^{*2}}{\sigma^2}F(\varrho)' F(\varrho)\Big| - \frac{1}{2N}\mathrm{tr}\Big[\frac{\sigma^{*2}}{\sigma^2}F(\varrho)' F(\varrho)\Big] + \frac12.
\]
So we can rewrite the function $L_1(\theta)$ as
\[
L_1(\theta) = \frac{1}{2N}\ln\Big|\frac{\sigma^{*2}}{\sigma^2}F(\varrho)' F(\varrho)\Big| - \frac{1}{2N}\mathrm{tr}\Big[\frac{\sigma^{*2}}{\sigma^2}F(\varrho)' F(\varrho)\Big] + \frac12
\]
\[
- \frac{1}{2N\bar T\sigma^2}\Phi^{*\prime}\Big\{\sum_{t=\bar q}^T \dot{\mathbb X}_t' F(\varrho)' F(\varrho)\dot{\mathbb X}_t - \Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t' F(\varrho)'\dot{\mathbb X}_t\Big]\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t\Big]^{-1}\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t' F(\varrho)\dot{\mathbb X}_t\Big]\Big\}\Phi^*.
\]
The two expressions in $L_1(\theta)$ are both non-positive for all $\theta \in \Theta$. To see this, it is easy to verify that the second expression is non-positive. For the first expression, let $\lambda_i$ ($i = 1, 2, \ldots, N$) be the eigenvalues of the matrix $\sigma^{*2}F(\varrho)' F(\varrho)/\sigma^2$. Since this matrix is symmetric, all the $\lambda_i$ are real. Now the first expression in $L_1(\theta)$ is equivalent to
\[
\frac{1}{2N}\sum_{i=1}^N (\ln\lambda_i - \lambda_i + 1),
\]
which is non-positive since the function $f(x) = \ln x - x + 1$ achieves its maximum value 0 at $x = 1$.

Now consider the objective function. Since $\hat\theta$ maximizes the likelihood function, we have $L_1(\hat\theta) + L_2(\hat\theta) \ge L_1(\theta^*) + L_2(\theta^*)$, or equivalently $L_1(\hat\theta) - L_1(\theta^*) \ge L_2(\theta^*) - L_2(\hat\theta)$. But the results in Lemma A.3 imply $L_2(\theta) = o_p(1)$ uniformly on $\Theta$. Given this result, we have $L_1(\hat\theta) - L_1(\theta^*) \ge -2\sup_{\theta\in\Theta}|L_2(\theta)| = -|o_p(1)|$. By the definition of $L_1(\theta)$, it is easy to verify that $L_1(\theta^*) = 0$. So we have $L_1(\hat\theta) = o_p(1)$. But the two expressions in $L_1(\hat\theta)$ are both non-positive, as shown above. Given this result, we have
\[
\frac{1}{2N}\ln\Big|\frac{\sigma^{*2}}{\hat\sigma^2}F(\hat\varrho)' F(\hat\varrho)\Big| - \frac{1}{2N}\mathrm{tr}\Big[\frac{\sigma^{*2}}{\hat\sigma^2}F(\hat\varrho)' F(\hat\varrho)\Big] + \frac12 = o_p(1) \tag{A.26}
\]
and
\[
\frac{1}{2N\bar T\hat\sigma^2}\Phi^{*\prime}\Big\{\sum_{t=\bar q}^T \dot{\mathbb X}_t' F(\hat\varrho)' F(\hat\varrho)\dot{\mathbb X}_t - \Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t' F(\hat\varrho)'\dot{\mathbb X}_t\Big]\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t\Big]^{-1}\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t' F(\hat\varrho)\dot{\mathbb X}_t\Big]\Big\}\Phi^* = o_p(1). \tag{A.27}
\]
Notice that
\[
F(\hat\varrho) = D(\hat\varrho)D(\varrho^*)^{-1} = \Big[I_N - \sum_{m=1}^p \hat\varrho_m W_m\Big]D(\varrho^*)^{-1} = I_N - \sum_{m=1}^p (\hat\varrho_m - \varrho_m^*) W_m D^{*-1}.
\]
Let $G_m^* = W_m D^{*-1}$. So we can alternatively write the above result as
\[
F(\hat\varrho) = I_N - \sum_{m=1}^p (\hat\varrho_m - \varrho_m^*) G_m^*. \tag{A.28}
\]
Given (A.28), together with Lemma A.2(b) and (c), we can rewrite (A.27) as
\[
(\hat\varrho - \varrho^*)'\,\frac{1}{N\bar T}\Big[\sum_{t=\bar q}^T E(V_t' V_t) - \Big(\sum_{t=\bar q}^T E(V_t'\widetilde{\mathbb X}_t)\Big)\Big(\sum_{t=\bar q}^T E(\widetilde{\mathbb X}_t'\widetilde{\mathbb X}_t)\Big)^{-1}\Big(\sum_{t=\bar q}^T E(\widetilde{\mathbb X}_t' V_t)\Big)\Big](\hat\varrho - \varrho^*) = o_p(1), \tag{A.29}
\]
where $V_t = (G_1^*\widetilde{\mathbb X}_t\Phi^*, G_2^*\widetilde{\mathbb X}_t\Phi^*, \ldots, G_p^*\widetilde{\mathbb X}_t\Phi^*)$.

Now we show the consistency of $\hat\varrho$ by the local or global identification conditions. By (A.26), together with Assumption G.1, we immediately obtain $\hat\varrho \xrightarrow{p} \varrho^*$. Alternatively, by (A.27), together with Assumption G.2, we also obtain $\hat\varrho \xrightarrow{p} \varrho^*$. Once $\hat\varrho \xrightarrow{p} \varrho^*$ is obtained, using (A.26) we have $\hat\sigma^2 \xrightarrow{p} \sigma^{*2}$. In addition, equation (A.24) implies that we can estimate $\Phi$ by
\[
\hat\Phi = \Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t\Big]^{-1}\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t' D(\hat\varrho)\dot Y_t\Big] = \Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t\Big]^{-1}\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t' F(\hat\varrho)(\dot{\mathbb X}_t\Phi^* + \dot e_t)\Big]. \tag{A.30}
\]
Substituting (A.28) into (A.30), we have
\[
\hat\Phi - \Phi^* = -\sum_{m=1}^p (\hat\varrho_m - \varrho_m^*)\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t\Big]^{-1}\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t' G_m^*\dot{\mathbb X}_t\Big]\Phi^* \tag{A.31}
\]
\[
+ \Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t\Big]^{-1}\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot e_t\Big] - \sum_{m=1}^p (\hat\varrho_m - \varrho_m^*)\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t\Big]^{-1}\Big[\sum_{t=\bar q}^T \dot{\mathbb X}_t' G_m^*\dot e_t\Big].
\]
Consider the first term of (A.31). Given that $\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot{\mathbb X}_t = \frac{1}{N\bar T}\sum_{t=\bar q}^T E(\widetilde{\mathbb X}_t'\widetilde{\mathbb X}_t) + o_p(1)$ by Lemma A.2(a), as well as $\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot{\mathbb X}_t' G_m^*\dot{\mathbb X}_t = \frac{1}{N\bar T}\sum_{t=\bar q}^T E(\widetilde{\mathbb X}_t' G_m^*\widetilde{\mathbb X}_t) + o_p(1)$ by Lemma A.2(b), together with $\hat\varrho - \varrho^* = o_p(1)$, we have that the first term is $o_p(1)$. By $\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot{\mathbb X}_t'\dot e_t = O_p(N^{-1/2}\bar T^{-1/2})$, the second term is $o_p(1)$. Also notice that $\frac{1}{N\bar T}\sum_{t=\bar q}^T \dot{\mathbb X}_t' G_m^*\dot e_t = O_p(N^{-1/2}\bar T^{-1/2})$; together with the consistency of $\hat\varrho$, we have that the third term is also $o_p(1)$. Given the above analysis, we have $\hat\Phi \xrightarrow{p} \Phi^*$. Thus, $\hat\theta \xrightarrow{p} \theta^*$. This completes the consistency proof.
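The concentration step (A.24) used in the proof above is just pooled least squares of $D(\varrho)\dot Y_t$ on the stacked regressors $\dot{\mathbb X}_t$. A small sketch with synthetic placeholder data (not the paper's estimation code; all names and dimensions are illustrative):

```python
import numpy as np

# Sketch of the concentration step (A.24): for fixed rho, the maximizer
# over Phi is pooled least squares of D(rho) Ydot_t on Xdot_t, stacked
# over t. Data here are synthetic placeholders.
rng = np.random.default_rng(3)
N, T, k = 8, 40, 3

W = rng.uniform(0, 1, (N, N)); np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
rho = 0.3
Dr = np.eye(N) - rho * W

X = rng.standard_normal((T, N, k))       # regressor matrices, t = 1..T
Y = rng.standard_normal((T, N))          # outcome vectors

# Phi(rho) = [sum_t X_t' X_t]^{-1} [sum_t X_t' D(rho) Y_t]
XtX = sum(X[t].T @ X[t] for t in range(T))
XtDY = sum(X[t].T @ (Dr @ Y[t]) for t in range(T))
phi_hat = np.linalg.solve(XtX, XtDY)

# Equivalent: flatten into one big least-squares regression.
Xs = X.reshape(T * N, k)
ys = (Y @ Dr.T).reshape(T * N)
phi_ls, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
print(np.allclose(phi_hat, phi_ls))
```

Profiling $\Phi$ out this way is what reduces the numerical optimization to the low-dimensional search over $(\varrho, \sigma^2)$.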
References
Alvarez, J., and Arellano, M. (2003). The time series and cross-section asymptotics of dynamic panel data estimators. Econometrica, 71(4), 1121-1159.
Anderson, T. W., and Hsiao, C. (1981). Estimation of dynamic models with error components. Journal of the American Statistical Association, 76(375), 598-606.
Anselin, L. (1988). Spatial Econometrics: Methods and Models (Vol. 4). Springer Science & Business Media.
Arellano, M., and Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The Review of Economic Studies, 58(2), 277-297.
Bai, J. (2013). Fixed-effects dynamic panel models, a factor analytical method. Econometrica, 81(1), 285-314.
Bai, J., and Li, K. (2014). Theory and methods of panel data models with interactive effects. The Annals of Statistics, 42(1), 142-170.
Bai, J., and Li, K. (2015). Dynamic spatial panel data models with common shocks. Unpublished manuscript.
Bai, J., and Li, K. (2016). Maximum likelihood estimation and inference for approximate factor models of high dimension. Review of Economics and Statistics, 98(2), 298-309.
Baltagi, B., Song, S. H., Jung, B. C., and Koh, W. (2007). Testing for serial correlation, spatial autocorrelation and random effects using panel data. Journal of Econometrics, 140, 5-51.
Beenstock, M., and Felsenstein, D. (2007). Spatial vector autoregressions. Spatial Economic Analysis, 2(2), 167-196.
Blundell, R., and Bond, S. (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics, 87(1), 115-143.
Brady, R. R. (2011). Measuring the diffusion of housing prices across space and over time. Journal of Applied Econometrics, 26(2), 213-231.
Brockwell, P. J., and Davis, R. A. (1991). Time Series: Theory and Methods (2nd ed.). Springer Series in Statistics. Berlin and New York: Springer.
Elhorst, J. P., Lacombe, D. J., and Piras, G. (2012). On model specification and parameter space definitions in higher order spatial econometric models. Regional Science and Urban Economics, 42, 211-220.
Fan, J., Liao, Y., and Mincheva, M. (2011). High dimensional covariance matrix estimation in approximate factor models. The Annals of Statistics, 39(6), 3320-3356.
Fan, J., and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics, 32(3), 928-961.
Fuller, W. A. (1996). Introduction to Statistical Time Series (Vol. 230). John Wiley & Sons.
Gao, Y., Li, K., and Yang, X. (2017). Spill-over effects, qualities of healthcare and education, and home prices. Manuscript in progress.
Gupta, A., and Robinson, P. M. (2015). Inference on higher-order spatial autoregressive models with increasingly many parameters. Journal of Econometrics, 186(1), 19-31.
Hahn, J., and Kuersteiner, G. (2002). Asymptotically unbiased inference for a dynamic panel model with fixed effects when both n and T are large. Econometrica, 70(4), 1639-1657.
Hall, P., and Heyde, C. C. (1980). Martingale Limit Theory and Its Applications. Academic Press.
Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.
Holly, S., Pesaran, M. H., and Yamagata, T. (2011). The spatial and temporal diffusion of house prices in the UK. Journal of Urban Economics, 69(1), 2-23.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Huang, J., Ma, S., and Zhang, C. H. (2008). Adaptive Lasso for sparse high-dimensional regression models. Statistica Sinica, 18(4), 1603.
Jennrich, R. I. (1969). Asymptotic properties of non-linear least squares estimators. The Annals of Mathematical Statistics, 633-643.
Kapoor, M., Kelejian, H. H., and Prucha, I. R. (2007). Panel data models with spatially correlated error components. Journal of Econometrics, 140(1), 97-130.
Kelejian, H. H., and Prucha, I. R. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. The Journal of Real Estate Finance and Economics, 17(1), 99-121.
Kelejian, H. H., and Prucha, I. R. (1999). A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review, 40(2), 509-533.
Kiviet, J. F. (1995). On bias, inconsistency, and efficiency of various estimators in dynamic panel data models. Journal of Econometrics, 68(1), 53-78.
Kosorok, M. R. (2007). Introduction to Empirical Processes and Semiparametric Inference. Springer Science & Business Media.
Lacombe, D. J. (2004). Does econometric methodology matter? An analysis of public policy using spatial econometric techniques. Geographical Analysis, 36(2), 105-118.
Lee, L. F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica, 72(6), 1899-1925.
Lee, L. F., and Yu, J. (2010). Estimation of spatial autoregressive panel data models with fixed effects. Journal of Econometrics, 154(2), 165-185.
Lee, L. F., and Yu, J. (2014). Efficient GMM estimation of spatial dynamic panel data models with fixed effects. Journal of Econometrics, 180(2), 174-197.
Lee, L. F., and Yu, J.
(2015). Estimation of fixed effects panel regression models with separable and nonseparable space-time filters. Journal of Econometrics, 184(1), 174-192.
LeSage, J. P., and Pace, R. K. (2009). Introduction to Spatial Econometrics (Statistics, Textbooks and Monographs). CRC Press.
McMillen, D. P., Singell, L. D., and Waddell, G. R. (2007). Spatial competition and the price of college. Economic Inquiry, 45(4), 817-833.
Newey, W. K., and McFadden, D. (1994). Large sample estimation and hypothesis testing. Handbook of Econometrics, 4, 2111-2245.
Neyman, J., and Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16(1), 1-32.
Nickell, S. (1981). Biases in dynamic models with fixed effects. Econometrica, 49(6), 1417-1426.
Norkute, M. (2014). A Monte Carlo study of a factor analytical method for fixed-effects dynamic panel models. Economics Letters, 123(3), 348-351.
Qu, X., and Lee, L. F. (2015). Estimating a spatial autoregressive model with an endogenous spatial weight matrix. Journal of Econometrics, 184(2), 209-232.
Seber, G. A. (2008). A Matrix Handbook for Statisticians (Vol. 15). John Wiley & Sons.
Wang, H., Li, B., and Leng, C. (2009). Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3), 671-683.
Yu, J., de Jong, R., and Lee, L. F. (2008). Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. Journal of Econometrics, 146(1), 118-134.