Spatial dynamic panel data models with interactive fixed effects

Wei Shi, Lung-fei Lee

To appear in: Journal of Econometrics
PII: S0304-4076(16)30231-7
DOI: http://dx.doi.org/10.1016/j.jeconom.2016.12.001
Reference: ECONOM 4334
Received date: 31 December 2014. Revised date: 23 December 2016. Accepted date: 23 December 2016.

Please cite this article as: Shi, W., Lee, L.-f., Spatial dynamic panel data models with interactive fixed effects. Journal of Econometrics (2017), http://dx.doi.org/10.1016/j.jeconom.2016.12.001.

Spatial Dynamic Panel Data Models with Interactive Fixed Effects

Wei Shi (a), Lung-fei Lee (b,∗)

(a) Institute for Economic and Social Research, Jinan University, Guangzhou, Guangdong 510632, China.
(b) Department of Economics, The Ohio State University, Columbus, Ohio 43210, United States.

This version: 12/12/2016. First version: 12/29/2014. Second version: 11/26/2015.

Abstract

This paper studies the estimation of a dynamic spatial panel data model with interactive individual and time effects with large n and T. The model has a rich spatial structure, including contemporaneous spatial interaction and spatial heterogeneity. Dynamic features include an individual time lag and spatial diffusion. The interactive effects capture heterogeneous impacts of time effects on cross sectional units. The interactive effects are treated as parameters, so as to allow correlations between the interactive effects and the regressors. We consider quasi-maximum likelihood estimation, show consistency of the estimator, and characterize its asymptotic distribution. Monte Carlo experiments show that the estimator performs well and that the proposed bias correction is effective. We illustrate the empirical relevance of the model by applying it to examine the effects of house price dynamics on reverse mortgage origination rates in the US.

Keywords: Spatial panel, dynamics, multiplicative individual and time effects
JEL classification: C13, C23, C51

We would like to thank Stephanie Moulton for sharing the HECM loan data, and participants at seminars at the Ohio State University, the 2014 China Meeting of the Econometric Society at Xiamen University, the 2014 Shanghai Econometrics Workshop at Shanghai University of Finance and Economics, the 2015 MEA Annual Meeting, New York Camp Econometrics X, and the 11th World Congress of the Econometric Society for many valuable comments. We appreciate receiving valuable comments and suggestions from referees, an associate editor and a coeditor of this journal. The usual disclaimer applies.
∗ Corresponding author. Email addresses: [email protected] (Wei Shi), [email protected] (Lung-fei Lee)

1. Introduction

Spatial interaction is present in many economic problems. When a state determines its tax rate, it takes into account not only its domestic constituents, but also what neighboring states might do (e.g., Han and Lee (2016)). Fluctuations in one industrial sector can spill over to other sectors if the sectors are "close" in terms of using similar production technology, using similar inputs, etc. (e.g., Conley and Dupor (2003)). Early contributions to spatial econometrics include Cliff and Ord (1973) and Anselin (1988). Kelejian and Prucha (2010) examine GMM estimation of spatial models. Lee (2004a) establishes asymptotic properties of the quasi-maximum likelihood (QML) estimator of a spatial autoregressive (SAR) model. A spatial panel can take into account dynamics and control for unobserved heterogeneity. Dynamic spatial panel data models with fixed individual and/or time effects, where spatial effects appear as lags in time and in space, have been studied in Yu et al. (2008) and Lee and Yu (2010b). Su and Yang (2015) examine QML estimation of dynamic panel data models with spatial effects in the errors. Lee and Yu (2013) is a recent survey of spatial panel data models.

On the other hand, individual units can be differently affected by common factors. The factor loading quantifies the magnitude of the effect of a time varying factor on an individual unit. In the example of industrial sectors above, common factors like the interest rate, demographic trends, etc., can simultaneously affect various industrial sectors, but the magnitudes of the impact can differ across sectors. A few possibly unobserved common factors may drive much of the essential comovement between sectors even when those sectors are far from each other according to economic distance measures. Essentially, a factor induces a time fixed effect which may affect individuals differently. Bai (2009) labels this an interactive effect.
In recent years, much progress has been made in the estimation and inference of panel data models with interactive effects. When interactive effects are viewed as fixed parameters, the model can be estimated by a nonlinear least squares (NLS) method involving principal components. Bai (2009) systematically studies asymptotic properties of the NLS estimator. The estimation is iterative: in each iteration, slope parameters are estimated given the factors, and then factors are estimated by principal components given the estimated slope parameters. Moon and Weidner (2015b) show that, under additional assumptions, the limiting distribution of the NLS estimator does not depend on the number of factors assumed in the estimation, as long as it does not fall below the true number of factors. They analyze asymptotic properties of the NLS estimator by the perturbation method of Kato (1995), where the objective function is expanded using its approximated gradient vector and Hessian matrix. Assuming that factors and factor loadings are random and factors enter the regressors linearly, Pesaran (2006) proposes the common correlated effects (CCE) estimator. His idea is that variations due to factors can be captured by cross sectional averages of the dependent and explanatory variables. Ahn et al. (2013) propose a GMM estimator for the fixed T case. The interactive effects can be eliminated by a transformation involving the factors, and then GMM can be applied.

Spatial interaction and common factors are two specifications explored in the literature where individuals' activities and outcomes are not independently distributed. In this paper, an individual can be influenced by its neighbors' actions or outcomes, which is modeled by the SAR specification. Individuals are also exposed to unobserved common factors, which are modeled as interactive effects following Bai (2009). By treating the interactive effects as parameters to be estimated, this approach allows flexible correlation between the interactive effects and the regressors. When the data generating process involves spatial spillovers and interactive effects, both should be taken into account for estimators to be consistent. This paper jointly models spatial interactions and interactive effects in a panel with large n and T. We consider spatial interaction in the dependent variable where the degree of spatial correlation is of interest, in which case the CCE estimator is not directly applicable. The spatial panel model under consideration is a general dynamic spatial panel data model where spatial effects can appear both in the form of lags and errors. In addition to contemporaneous spatial interaction, time lagged dependent variables, diffusion, and spatially correlated and heterogeneous disturbances are included to allow a rich specification of the state dependence and to guard against spurious spatial correlation. We do not impose specific structures on how the interactive effects affect the regressors. The interactive effects are treated as nuisance parameters and are concentrated out in the estimation. Moon and Weidner (2015b) show how to derive the approximated gradient vector and Hessian matrix of the concentrated NLS objective function of a regression panel.
In a spatial panel setting, the log of the sum of squared residuals from the regression panel is a component of the likelihood function, and we adapt their approach to that component. We provide conditions for identification and show that the QML estimation method works well. The estimator is shown to be consistent and asymptotically normal. Asymptotic biases of order 1/√(nT) exist due to incidental parameters, and a bias correction method is proposed.

The existing literature on factor models (Bai (2003), Forni et al. (2004), Stock and Watson (2002)) ignores spatial interactions. The literature on spatial interactions (Lee and Yu (2010a), Lee and Yu (2013)) has mostly not considered unobserved interactive effects. To the best of our knowledge, only a few papers jointly model spatial correlation and interactive effects. Pesaran and Tosetti (2011) model different forms of error correlation, including spatial error correlation and common factors, and show that the CCE estimator continues to work well. A recent work by Bai and Li (2015) has a related model specification, but their model and estimation method are different from ours. Their specification has a different feature on heterogeneous variances. Our model has richer dynamics, which include spatial time lags (diffusion),

contemporaneous spatial lags, spatial errors and individual time lags, while Bai and Li (2015) consider them one at a time. The usefulness of the richer dynamic structure of our model has been discussed in Lee and Yu (2015) for dynamic panel models with additive effects. The objective function in Bai and Li (2015) features a large number of incidental parameters due to factor loadings and heteroskedastic variances over one dimension (the cross section). We show that the factor loadings can be further concentrated out, so that the objective function only needs to be maximized with respect to the finite dimensional parameter vector of interest if homoskedasticity is assumed.

The paper is organized as follows. Section 2 presents the model and discusses assumptions and identification. We prove consistency and derive the limiting distribution of the QML estimator in Section 3. We then illustrate the empirical relevance of the theory by demonstrating the estimator's good finite sample performance and applying the model to analyze the effect of house price dynamics on reverse mortgage origination rates in the U.S. Section 6 concludes. Proofs of the main results are collected in the appendix. A supplementary file available online contains detailed proofs of relevant useful lemmas.

We use the following essential notation. For a vector η, ‖η‖1 = ∑k |ηk| and ‖η‖2 = (∑k |ηk|²)^(1/2). Let µi(M) denote the i-th largest eigenvalue of a symmetric n × n matrix M, with eigenvalues listed in decreasing order so that µn(M) ≤ µn−1(M) ≤ ··· ≤ µ1(M). For a real matrix A, its spectral norm is ‖A‖2 = √(µ1(A′A)); ‖A‖1 = max_{1≤j≤n} ∑_{i=1}^{m} |Aij| is its maximum column sum norm; ‖A‖∞ = max_{1≤i≤m} ∑_{j=1}^{n} |Aij| is its maximum row sum norm; and ‖A‖F = √(tr(AA′)) is its Frobenius norm. In stands for the n × n identity matrix. Denote the projection matrices PA = A(A′A)^(−1)A′ and MA = I − PA. In cases where A might not have full rank, we use (A′A)† to denote the Moore-Penrose generalized inverse of A′A. For a real number x, ⌈x⌉ is the smallest integer greater than or equal to x. "wpa 1" stands for "with probability approaching 1".
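These norms and projectors are straightforward to compute; a minimal NumPy sketch (the matrix A here is an arbitrary illustrative example, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Spectral norm: square root of the largest eigenvalue of A'A.
spec = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))
assert np.isclose(spec, np.linalg.norm(A, 2))

# Maximum column sum norm ||A||_1 and maximum row sum norm ||A||_inf.
col_sum = np.max(np.abs(A).sum(axis=0))
row_sum = np.max(np.abs(A).sum(axis=1))
assert np.isclose(col_sum, np.linalg.norm(A, 1))
assert np.isclose(row_sum, np.linalg.norm(A, np.inf))

# Frobenius norm: sqrt(tr(AA')).
frob = np.sqrt(np.trace(A @ A.T))
assert np.isclose(frob, np.linalg.norm(A, 'fro'))

# Projectors P_A and M_A = I - P_A; M_A annihilates the columns of A.
P_A = A @ np.linalg.pinv(A.T @ A) @ A.T  # pinv covers the rank-deficient case
M_A = np.eye(5) - P_A
assert np.allclose(M_A @ A, 0)
```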

2. The Spatial Dynamic Panel Data (SDPD) Model with Interactive Effects

2.1. The Model

There are n individual units and T time periods. The SDPD model has the following specification:

Ynt = λWnYnt + γYn,t−1 + ρWnYn,t−1 + Xntβ + Γnft + Unt,  with Unt = αW̃nUnt + εnt,   (1)

where Ynt is an n-dimensional column vector of observed dependent variables and Xnt is an n × (K − 2) matrix of exogenous regressors, so that the total number of variables in Yn,t−1, WnYn,t−1 and Xnt is K. The model accommodates two types of cross sectional dependence, namely local dependence and global (strong) dependence. Individual units are impacted by potentially time varying unknown common factors ft, which capture global (strong) dependence. The effects of the factors can be heterogeneous across the cross sectional units, as described by the factor loading parameter matrix Γn. For example, in an earnings regression where Ynt is the wage rate, each row of Γn may correspond to a vector of an individual's skills and ft is the skill premium, which may be time varying. The number of unobserved factors is assumed to be a fixed constant r that is much smaller than n and T.¹ The n × r matrix of factor loadings Γn and the T × r factors FT = (f1, f2, ···, fT)′ are not observed and are treated as parameters. The fixed effects approach is flexible and allows unknown correlation between the common factor components and the regressors.

The n × n spatial weights matrices Wn and W̃n in Eq. (1) are used to model spatial dependence. The term λWnYnt describes the contemporaneous spatial interactions. There are also dynamics in model (1): γYn,t−1 captures the pure dynamic effect, and ρWnYn,t−1 is a spatial time lag of interactions, which captures diffusion (Lee and Yu (2013)).² The idiosyncratic error Unt, with the elements εit being i.i.d. (0, σ²), also possesses a spatial structure W̃n, which may or may not be the same as Wn. The specification in (1) is general and encompasses many models of empirical interest. In particular, additive fixed individual and time effects are a special case of the factor structure, as can be seen in the following:

Ynt = λWnYnt + γYn,t−1 + ρWnYn,t−1 + Xntβ + ζ⃗n + ℓnξt + εnt,   (2)

where ζ⃗n = (ζ1, ζ2, ···, ζn)′ are individual effects, and ξt are time effects with ℓn = (1, 1, ···, 1)′. Eq. (2) is a special case of Eq. (1) with Γn = (ζ⃗n, ℓn) and FT = (ℓT, ξ⃗T), where ξ⃗T = (ξ1, ξ2, ···, ξT)′ and ℓT is the T × 1 vector of ones.
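To fix ideas, model (1) can be simulated directly from its reduced form. The sketch below is a hypothetical example: a circular row-normalized weights matrix, one factor (r0 = 1), W̃n = Wn, and all parameter values are illustrative, none taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, lam, gam, rho, alpha, beta = 30, 50, 0.2, 0.3, 0.1, 0.2, 1.0

# Hypothetical row-normalized spatial weights matrix (two circular neighbors).
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

S_inv = np.linalg.inv(np.eye(n) - lam * W)    # S_n^{-1}
R_inv = np.linalg.inv(np.eye(n) - alpha * W)  # R_n^{-1}; here W~_n = W_n

Gam = rng.standard_normal(n)                  # factor loadings (r0 = 1)
f = rng.standard_normal(T)                    # unobserved factors
X = rng.standard_normal((T, n))               # one exogenous regressor

Y = np.zeros((T + 1, n))                      # Y[0] is the initial condition
for t in range(1, T + 1):
    U = R_inv @ rng.standard_normal(n)        # U_nt = (I - alpha*W)^{-1} eps_nt
    # Y_nt = S_n^{-1}(gamma*Y_{t-1} + rho*W*Y_{t-1} + X*beta + Gam*f_t + U_nt)
    Y[t] = S_inv @ ((gam * np.eye(n) + rho * W) @ Y[t - 1]
                    + X[t - 1] * beta + Gam * f[t - 1] + U)
```

The chosen parameter values satisfy |λ| + |γ| + |ρ| < 1, so the simulated process is stable.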

Define An = Sn^(−1)(γIn + ρWn), where Sn = In − λWn. From Eq. (1),

Ynt = AnYn,t−1 + Sn^(−1)(Xntβ + Γnft + Unt).

Continuous substitution gives Yn,t = ∑_{h=0}^{∞} An^h Sn^(−1)(Xn,t−hβ + Γnft−h + Un,t−h), assuming that the series converges. With ‖An‖2 < 1, {An^h}_{h=0}^{∞} is absolutely summable and the initial condition Yn,0 does not affect the asymptotic analysis when T → ∞. Lee and Yu (2013) discuss the parameter space of γ, ρ and λ and regularity conditions that guarantee ‖An‖2 < 1. Let ϖni denote an eigenvalue of Wn and dni the corresponding eigenvalue of An; we then have dni = (γ + ρϖni)/(1 − λϖni). If the spatial weights matrix Wn is row normalized from a symmetric matrix (as in Ord (1975)),³ all eigenvalues of Wn are real. Furthermore, if tr(Wn) = 0, the condition 1/ϖn,min < λ < 1 = ϖn,max implies that In − λWn is invertible, where ϖn,min and ϖn,max are, respectively, the smallest and largest eigenvalues of Wn. Stationarity further requires |dni| < 1 for all i. Lee and Yu (2013) show that with a row normalized Wn, the parameter space for ‖An‖2 < 1 can be characterized as a region enclosed by four linear hyperplanes:

Rs = {(λ, γ, ρ) : γ + (ρ − λ)ϖn,min > −1, γ + ρ + λ < 1, γ + ρ − λ > −1, γ + (λ + ρ)ϖn,min < 1}.

Elements in Rs already imply that −1 < γ < 1 and 1/ϖn,min < λ < 1. Because |ϖn,i| ≤ 1, a sufficient condition for ‖An‖2 < 1 is |λ| + |γ| + |ρ| < 1.

¹ In many empirical studies, the number of factors is much smaller than the dimension of the dataset. For example, in Stock and Watson (2002), 6 factors are used to model 215 macroeconomic time series.
² In general, the spatial weights matrices for the contemporaneous spatial interactions and for the diffusion can be different, and it is straightforward to extend this paper's analysis to such cases: the QMLE is still consistent and asymptotically normal under the assumptions that will be introduced in Section 2.3. Assuming identical spatial weights simplifies the notation.

2.2. Estimation Method

The parameters of model (1) are θ = (δ′, λ, α)′ with δ = (γ, ρ, β′)′, together with σ², Γn and FT. It is convenient to collect the predetermined variables and exogenous regressors in the n × K matrix Znt = (Yn,t−1, WnYn,t−1, Xnt), where K = k + 2, with k being the number of exogenous regressors in Xnt. Denote Sn(λ) = In − λWn and Rn(α) = In − αW̃n. The sample averaged quasi-log likelihood function is

QnT(θ, σ², Γn, FT) = −(1/2) log 2π − (1/2) log σ² + (1/n) log |Sn(λ)Rn(α)|
  − (1/(2σ²nT)) ∑_{t=1}^{T} (Sn(λ)Ynt − Zntδ − Γnft)′ Rn(α)′Rn(α) (Sn(λ)Ynt − Zntδ − Γnft).   (3)

Concentrating out σ² from the objective function (3) and dropping the overall constant term for simplicity,

QnT(θ, Γn, FT) = (1/n) log |Sn(λ)Rn(α)| − (1/2) log( (1/(nT)) ∑_{t=1}^{T} (Sn(λ)Ynt − Zntδ − Γnft)′ Rn(α)′Rn(α) (Sn(λ)Ynt − Zntδ − Γnft) )

is a concentrated sample averaged log likelihood function of θ, Γn and FT. In view of the unobserved nature of the common factor component, we shall make minimal assumptions on its structure. The number of factors is set at r in the estimation, which does not necessarily equal the true number of factors r0. Later sections will show that consistency requires r ≥ r0, while the results on the limiting distribution require r = r0. Because no restriction is imposed on Γn in our estimation method, and Rn(α) is assumed invertible for α in its parameter space, optimizing with respect to Γn ∈ R^(n×r) is equivalent to optimizing with respect to the transformed Γ̃n = Rn(α)Γn. The objective function can be equivalently written as

QnT(θ, Γ̃n, FT) = (1/n) log |Sn(λ)Rn(α)| − (1/2) log( (1/(nT)) ∑_{t=1}^{T} (Rn(α)(Sn(λ)Ynt − Zntδ) − Γ̃nft)′ (Rn(α)(Sn(λ)Ynt − Zntδ) − Γ̃nft) ).   (4)

³ This paper does not require Wn to be row normalized; see Assumption R2. Although it is convenient to work with a row normalized spatial weights matrix, in some applications row normalization is not appropriate. For example, in the analysis of social interactions where the effect of the network structure (for example, centrality) is of interest, or where there are individuals who influence others but are not influenced by others, row normalization might not be appropriate; see Liu and Lee (2010).
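The factor term in (4) can be concentrated out by principal components: minimizing over Γ̃n and FT is a best rank-r approximation problem (Eckart-Young), so the minimized sum of squares equals the sum of the n − r smallest eigenvalues of the residual outer product matrix, which is the identity used below. A numerical check of this identity (dimensions hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, r = 20, 15, 2
H = rng.standard_normal((n, T))   # stand-in for the n x T residual matrix

# Closed form: sum of the eigenvalues of HH' beyond the r largest.
mu = np.sort(np.linalg.eigvalsh(H @ H.T))[::-1]   # decreasing order
closed_form = mu[r:].sum()

# Direct minimization: the best rank-r approximation Gamma*F' is the
# truncated SVD of H (Eckart-Young theorem).
U, s, Vt = np.linalg.svd(H, full_matrices=False)
H_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
direct = np.trace((H - H_r) @ (H - H_r).T)

assert np.isclose(closed_form, direct)
```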

As the sample expands, the number of parameters in the factors and their loadings also increases. Because the parameter of interest is θ and it is easier to optimize with respect to a finite dimensional vector, we concentrate out the factors and their loadings using principal component theory: for an n × T matrix HnT,

min_{FT ∈ R^(T×r), Γ̃n ∈ R^(n×r)} tr[(HnT − Γ̃nFT′)(HnT − Γ̃nFT′)′] = min_{FT ∈ R^(T×r)} tr(HnT MFT HnT′) = ∑_{i=r+1}^{n} µi(HnT HnT′),

where for the second equality, the j-th column of F̂T is the eigenvector corresponding to the j-th largest eigenvalue of HnT′HnT. Because Γ̃nFT′ cannot be separately identified from Γ̃nMM^(−1)FT′ for an invertible matrix M, the identification conditions that FT′FT = Ir and Γ̃n′Γ̃n be diagonal have been imposed. The estimated factors and factor loadings will be rotations of their true values. However, the asymptotic distribution of our QML estimator is unaffected by these normalizations because the spaces spanned by the factors and their loadings do not change. The concentrated log likelihood is

QnT(θ) = max_{Γ̃n ∈ R^(n×r), FT ∈ R^(T×r)} QnT(θ, Γ̃n, FT) = (1/n) log |Sn(λ)Rn(α)| − (1/2) log LnT(θ),   (5)

with LnT(θ) = (1/(nT)) ∑_{i=r+1}^{n} µi( Rn(α)(Sn(λ)Y − ∑_{k=1}^{K} Zkδk)(Sn(λ)Y − ∑_{k=1}^{K} Zkδk)′Rn(α)′ ). The QML estimator is θ̂nT = arg max_{θ∈Θ} QnT(θ). The estimate for Γ̃n can be obtained as the eigenvectors associated with the first r largest eigenvalues of Rn(α)(Sn(λ)Y − ∑_{k=1}^{K} Zkδk)(Sn(λ)Y − ∑_{k=1}^{K} Zkδk)′Rn(α)′. By switching n and T, the estimate for FT can be similarly obtained. Note that the estimated Γ̃n and FT are not unique, as Γ̃nHH^(−1)FT′ is observationally equivalent to Γ̃nFT′ for any invertible r × r matrix H. However, the column spaces of Γ̃n and FT are invariant to H; hence the projectors MΓ̃n and MFT are uniquely determined.

2.3. Assumptions

The true values of θ, Γ̃n and FT are denoted by θ0, Γ̃n0 and FT0. Note that the dimensions of Γ̃n and FT are n × r and T × r, which may not equal the dimensions of Γ̃n0 and FT0, which are n × r0 and T × r0 respectively. Denote ε = (εn1, εn2, ···, εnT), an n × T matrix.

Assumption E

1. The disturbances εit are independently distributed across i and over t with Eεit = 0 and Eεit² = σ0² > 0, and have uniformly bounded moments E|εit|^(4+η) for some η > 0.
2. The disturbances in ε are distributed independently of the regressors Xk, the factors FT0 and the loadings Γn0.

The disturbances of the model have a spatial structure. From Eq. (1), U = Rn(α0)^(−1)ε. Its spatial heterogeneity is captured by W̃n and the coefficient α. In panels with factors, many estimation methods allow the idiosyncratic errors to be cross sectionally correlated and heteroskedastic in an unknown form, up to a degree (Bai (2009), Pesaran (2006)). However, when a spatial autoregressive model is estimated by QML assuming homoskedastic errors, the QMLE is generally inconsistent if the errors are in fact heteroskedastic but this is ignored, as shown by Lin and Lee (2010) in the cross sectional setup. In the panel setup, Bai and Li (2015) allow heteroskedastic disturbances along the cross section that are invariant over time by treating the individual variances as additional parameters.⁴ In our current setting, consistency of the QMLE requires the stronger homoskedasticity assumption on ε. Latala (2005) shows that, under Assumption E, ‖ε‖2 = OP(√(max(n, T))).

To simplify notation, define the n × n matrices Sn = Sn(λ0), Gn = WnSn^(−1), Rn = Rn(α0), G̃n = W̃nRn^(−1), and the n × T matrices ZK+1 = ∑_{k=1}^{K} δ0k GnZk = (GnZn1δ0, ···, GnZnTδ0), Y = (Yn1, ···, YnT) and Y−1 = (Yn0, ···, Yn,T−1). In the case of nonnormal disturbances, denote µ^(3) = Eεit³ and µ^(4) = Eεit⁴.

Assumption R

1. The parameter θ0 is in the interior of Θ, where Θ is a compact subset of R^(K+1). We use Θχ to denote the parameter space for parameter χ, χ = λ, α, etc.
2. The spatial weights matrices Wn and W̃n are non-stochastic. Wn, Sn^(−1), W̃n and Rn^(−1) are uniformly bounded in absolute value in both row and column sums (UB). Sn(λ) and Rn(α) are invertible for any λ ∈ Θλ and α ∈ Θα. Furthermore, lim inf_{n,T→∞} inf_{λ∈Θλ} |Sn(λ)| > 0 and lim inf_{n,T→∞} inf_{α∈Θα} |Rn(α)| > 0.
3. The elements of Xnt have uniformly bounded 4-th moments.
4. The number of factors is a constant r0, and the elements of Γn0 and FT0 have uniformly bounded 4-th moments.
5. ∑_{h=1}^{∞} abs(An^h) is UB, where [abs(An)]ij = |An,ij|. In addition, there exist a constant b < 1 and n0 such that ‖An‖2 ≤ b for all n ≥ n0.

⁴ Consistent estimation methods remain to be studied if the variances depend on individual explanatory variables or vary across both units and time, where one may adjust the score vector from the principal component approach following the strategy in Liu and Yang (2015) for the cross sectional case. This is an interesting research topic and is beyond the scope of this paper.

Assumption R1 is standard. The sum ∑_{ℓ=0}^{∞} (λWn)^ℓ converges if ‖λWn‖ < 1 for some norm ‖·‖.⁵ In this situation, Sn(λ) is invertible and Sn(λ)^(−1) = ∑_{ℓ=0}^{∞} (λWn)^ℓ is its Neumann series. Therefore a sufficient condition for the invertibility of Sn(λ) is |λ| < 1/‖Wn‖. Similar properties hold for Rn(α).
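The Neumann series expansion can be illustrated numerically. In the sketch below, Wn is a hypothetical row-normalized matrix with ‖Wn‖∞ = 1, so any |λ| < 1 guarantees convergence:

```python
import numpy as np

n, lam = 20, 0.3
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5  # row normalized: ||W||_inf = 1

# Truncated Neumann series sum_{l=0}^{L-1} (lam*W)^l approximates S_n(lam)^{-1}.
S_inv = np.linalg.inv(np.eye(n) - lam * W)
approx = np.zeros((n, n))
term = np.eye(n)
for _ in range(200):
    approx += term
    term = term @ (lam * W)   # next power (lam*W)^l

assert np.allclose(approx, S_inv)
```

With ‖λWn‖∞ = 0.3, the truncation error shrinks geometrically, so 200 terms are far more than enough here.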

There are several limit concepts, such as sequential, pathwise, and simultaneous limits. In this paper, the asymptotics consider both n and T going to infinity, and the simultaneous limit is used. A simultaneous limit does not restrict n and T to grow along a fixed path. In deriving the asymptotic distribution of the QML estimator, proportionality of n and T is assumed.

2.4. Identification

The following Assumptions ID1 and ID2 are used to show that θ0, Γn0FT0′ and the number of factors can be uniquely recovered from the distribution of the data on Y and Z. Their sample counterparts are the subsequent Assumptions NC1 and NC2. Assumptions ID1 and NC1 require Z1, ···, ZK+1 to be linearly independent, while Assumptions ID2 and NC2 relax this but impose more restrictions on the variance structure. The linear independence conditions can fail if GnZntδ0 is linearly dependent on Znt for all t = 1, ···, T. This can happen in a pure SAR model with no regressors, in which case δ0 = 0.

Assumption ID1

1. Let z = (vec(Z1), ···, vec(ZK+1)), which is an nT × (K + 1) matrix of regressors. The (K + 1) × (K + 1) matrix E[z′(MFT0 ⊗ Rn(α)′MΓ̃nRn(α))z] is positive definite for any α ∈ Θα and Γ̃n ∈ R^(n×r) with some r ≥ r0, where r0 is the true number of factors, MFT0 = IT − FT0(FT0′FT0)†FT0′ and MΓ̃n = In − Γ̃n(Γ̃n′Γ̃n)†Γ̃n′.
2. For any α ≠ α0, Rn(α)′Rn(α) is linearly independent of Rn′Rn.

Assumption ID1(2) implies that (1/n) tr((Rn^(−1))′Rn(α)′Rn(α)Rn^(−1)) − |(Rn^(−1))′Rn(α)′Rn(α)Rn^(−1)|^(1/n) > 0, by the inequality of arithmetic and geometric means. A sufficient condition for ID1(2) is that In, W̃n + W̃n′ and W̃n′W̃n are linearly independent.⁶ Assumption ID1 generalizes those in Lee and Yu (2016) on the identification of spatial panel models with additive individual and time effects.
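For a given W̃n, the linear independence of In, W̃n + W̃n′ and W̃n′W̃n can be checked by vectorizing the three matrices and computing the rank of the stacked columns; a sketch with a hypothetical directed weights matrix:

```python
import numpy as np

n = 10
W = np.zeros((n, n))
for i in range(n):
    # Hypothetical directed, row-normalized weights (two forward neighbors).
    W[i, (i + 1) % n] = 0.7
    W[i, (i + 2) % n] = 0.3

# Stack vec(I_n), vec(W + W') and vec(W'W) as columns; full column rank
# means no nontrivial linear combination of the three matrices vanishes.
M = np.column_stack([np.eye(n).ravel(),
                     (W + W.T).ravel(),
                     (W.T @ W).ravel()])
assert np.linalg.matrix_rank(M) == 3
```

Note that for some weight matrices the check fails by design: if W were a permutation matrix, W′W = In and the rank would drop to 2.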

Assumption ID2

1. Let z = (vec(Z1), ···, vec(ZK)), which is nT × K. The K × K matrix E[z′(MFT0 ⊗ Rn(α)′MΓ̃nRn(α))z] is positive definite for any α ∈ Θα and Γ̃n ∈ R^(n×r) with some r ≥ r0, where r0 is the true number of factors, MFT0 = IT − FT0(FT0′FT0)†FT0′ and MΓ̃n = In − Γ̃n(Γ̃n′Γ̃n)†Γ̃n′.
2. For any λ ∈ Θλ and α ∈ Θα, if λ ≠ λ0 or α ≠ α0, then Sn(λ)′Rn(α)′Rn(α)Sn(λ) is linearly independent of Sn′Rn′RnSn.⁷

⁵ This condition does not depend on the type of norm used, since all norms on R^(n×n) are equivalent and therefore convergence of a series does not depend on a specific type of norm.
⁶ Let c1 and c2 be two scalars. For α ≠ α0, c1Rn(α)′Rn(α) + c2Rn′Rn = (c1 + c2)In − (c1α + c2α0)(W̃n + W̃n′) + (c1α² + c2α0²)W̃n′W̃n = 0 only if c1 = c2 = 0, because In, W̃n + W̃n′ and W̃n′W̃n are assumed to be linearly independent. Therefore for any α ≠ α0, Rn(α)′Rn(α) is linearly independent of Rn′Rn. Notice that W̃n can be symmetric.

Assumption ID2(2) is equivalent to: for any λ ∈ Θλ and α ∈ Θα with λ ≠ λ0 or α ≠ α0,

(1/n) tr((Rn^(−1))′(Sn^(−1))′Sn(λ)′Rn(α)′Rn(α)Sn(λ)Sn^(−1)Rn^(−1)) − |(Rn^(−1))′(Sn^(−1))′Sn(λ)′Rn(α)′Rn(α)Sn(λ)Sn^(−1)Rn^(−1)|^(1/n) > 0.

Assumption ID requires that the regressors not be linearly dependent. For parameter identification, we need the concentrated expected objective function to be uniquely maximized at the truth. We assume that the number of factors used in the concentrated expected objective function is not smaller than the true number of factors. Given that the number of latent factors is small in many empirical applications, it is reasonable to assume that an upper bound on the number of factors is known. The estimated Γn and FT need not have full column rank, and the true number of factors, r0, can be recovered from the rank of ΓnFT′, as the following proposition shows.

Proposition 1. Under Assumptions E, R and ID1 (or ID2), θ0, Γn0FT0′ and r0 are identified.

The proof is in Appendix B. Assumptions NC below are sample counterparts of Assumptions ID. They are specifically needed for the consistency of the proposed estimator. They can be slightly weakened, but would then involve the unobserved factors, as in Assumption A of Bai (2009).

Assumption NC1

1. There exists a positive constant b such that min_{η∈BK+1, α∈Θα} ∑_{i=2r+1}^{n} µi((1/(nT))Rn(α)(η · Z)(η · Z)′Rn(α)′) ≥ b > 0 wpa 1 as n, T → ∞, where BK+1 is the unit ball of the (K + 1)-dimensional Euclidean space; η is a (K + 1) × 1 nonzero vector with ‖η‖2 = √(η′η) = 1; and η · Z ≡ ∑_{k=1}^{K+1} ηkZk is a linear combination of the n × T matrices Zk.
2. For any α ∈ Θα with α ≠ α0, lim inf_{n,T→∞} [(1/n) tr((Rn^(−1))′Rn(α)′Rn(α)Rn^(−1)) − |(Rn^(−1))′Rn(α)′Rn(α)Rn^(−1)|^(1/n)] > 0.

Assumption NC1(1) requires no perfect collinearity between regressors and sufficient variations for each regressor.

Notice that this excludes constant regressors, because they are constant along i or t, and

 sufficient condition is that the following 9 matrices are linearly independent, In , Wn + Wn0 , W˜ n + W˜ n0 , Wn0 W˜ n + W˜ n0 +  W˜ n + W˜ n0 Wn , W˜ n0W˜ n , Wn0Wn , W˜ n0W˜ nWn + Wn0W˜ n0W˜ n , Wn0 W˜ n + W˜ n0 Wn and Wn0W˜ n0W˜ nWn , for the case that W˜ n 6= Wn . In the event that W˜ n = Wn , Assumption ID2(2) can only give local identification for λ0 and α0 in the sense that (λ0 , α0 ) can not be distinguished from (α0 , λ0 ). The latter situation is similar to the identification issue of a pure spatial autoregressive with spatial error process Yn = λWnYn +Un with Un = αWnUn + εn . 7A

9

∑ni=2r+1 µi

1 nT Rn (α) (η

 · Z) (η · Z)0 Rn (α)0 is 0 for them. Such regressors will be absorbed into the com-

mon factor component and examples include those that do not vary over time, e.g., gender, race, and those

that are common across individuals, e.g., common time effects. An analogy is in an additive fixed effects model, the time invariant or individual invariant regressors would be captured respectively in the individual and time dummies. To estimate the effect of the constant regressors, one may impose additional identifying assumptions (see Bai and Ng (2008)) to separate the time factors from the factor loadings, and then estimate the effects of the constant regressors on time factors or the factor loadings. In the context of additive fixed effects panel model, Hausman and Taylor (1981) propose a similar two-step method to estimate the effect of time-invariant characteristics, and instrumental variables can be used if the constant regressors are endogenous. Alternatively, the constant regressors can be included in the model if one is willing to assume that they are not collinear after projecting out the unobserved true factors and loadings, see Moon and Weidner (2015a) for the linear panel model. However the consistency proof would require new assumptions on the relationship between the constant regressors and the unobserved factors and loadings, and new arguments will also be needed. Because these assumptions involve the structure of the unobserved factors and therefore may be difficult to verify in empirical studies, and the asymptotic distribution of our estimator will still have similar forms, these are not pursued in this paper. It is desirable to understand more about the condition in NC1(1) in terms of regressors implied by the SAR model. Define n × (K + 1) matrices Znt = (Znt,1 , · · · , Znt,K , Gn Znt δ0 ) = (Znt , Gn Znt δ0 ) for t = 1, · · · , T ; and the overall n × (K + 1)T matrix ZnT = [Zn1 , · · · , ZnT ]. We have Rn (α)(η · Z) =

K+1

∑ ηk Rn (α)Zk = Rn (α)

k=1

! h i h i ∑ ηk Zn1,k Zn2,k · · · ZnT,k + ηK+1 Gn Zn1 δ0 Gn Zn2 δ0 · · · Gn ZnT δ0 K

k=1

K

K

=Rn (α)[ ∑ Zn1,k ηk + (Gn Zn1 δ0 )ηK+1 , · · · , ∑ ZnT,k ηk + (Gn ZnT δ0 )ηK+1 ] k=1

k=1

h i =Rn (α) Zn1 η Zn2 η · · · ZnT η = Rn (α)ZnT (IT ⊗ η),

where η = (η_1, · · · , η_{K+1})′. So the NC1(1) condition concerns the smallest n − 2r eigenvalues of the n × n matrix

(1/nT) R_n(α)(η · Z)(η · Z)′ R_n(α)′ = (1/nT) R_n(α) Z̃_nT (I_T ⊗ η)(I_T ⊗ η′) Z̃_nT′ R_n(α)′

for each η ∈ B_{K+1} and α ∈ Θ_α. Because these matrices are nonnegative definite, their eigenvalues are nonnegative, but some can be zero. If there were an α ∈ Θ_α and η ∈ B_{K+1} for which the n − 2r smallest eigenvalues of (1/nT) R_n(α)(η · Z)(η · Z)′ R_n(α)′ were all zero, then the NC1 assumption would not be satisfied. So we need some sufficient conditions to rule out such cases.
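As a quick sanity check on the identity above, the following numerical sketch (ours, with toy dimensions, not code from the paper) verifies that Σ_k η_k Z_k stacks into Z̃_nT (I_T ⊗ η):

```python
import numpy as np

# Check the identity sum_k eta_k Z_k = Z_nT (I_T kron eta) on random matrices,
# where the t-th column of (eta . Z) is Z_nt @ eta and Z_nT = [Z_n1, ..., Z_nT].
rng = np.random.default_rng(0)
n, T, K1 = 4, 3, 5                                    # K1 = K + 1 regressor columns
Zt = [rng.standard_normal((n, K1)) for _ in range(T)]  # Z_nt, t = 1..T
eta = rng.standard_normal(K1)

# Left-hand side: eta . Z, an n x T matrix whose t-th column is Z_nt @ eta
lhs = np.column_stack([Zt[t] @ eta for t in range(T)])

# Right-hand side: Z_nT (I_T kron eta), with Z_nT an n x (K1*T) block matrix
ZnT = np.hstack(Zt)
rhs = ZnT @ np.kron(np.eye(T), eta.reshape(-1, 1))

assert np.allclose(lhs, rhs)
```

The Kronecker product I_T ⊗ η places the vector η in the t-th block of the t-th column, so multiplying by the stacked matrix picks out exactly Z_nt η per period.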


Proposition 2. If Σ_{i=2r+1+KT}^{n} μ_i((1/nT) Z̃_nT Z̃_nT′) > 0, i.e., the sum of the smallest n − 2r − KT eigenvalues of (1/nT) Z̃_nT Z̃_nT′ is positive, and μ_n(R_n(α)′ R_n(α)) > 0 for all α ∈ Θ_α, with probability approaching 1 as n, T → ∞, then Assumption NC1(1) is satisfied.

The proof is in Appendix B. In order for Proposition 2 to hold, it is necessary that Z̃_nT have rank at least as large as KT + 2r + 1, which in turn requires (2r + 1 + KT) ≤ min{n, (K + 1)T} because Z̃_nT is an n × (K + 1)T matrix. As the problem under consideration has both n and T tending to infinity and r is finite, the latter requires n/T > K for large enough n and T.

The above analysis can be generalized to the case where G_n Z_nt δ_0 is linearly dependent on Z_nt for all t = 1, · · · , T, such that G_n Z_nt δ_0 = Z_nt C for a constant vector C. As pointed out preceding Assumption ID1, this can happen in a pure SAR model. In this case, let η* = (η_1, · · · , η_K)′. Here,

R_n(α)(η · Z) = R_n(α) [Z_n1 η* + (G_n Z_n1 δ_0) η_{K+1}, · · · , Z_nT η* + (G_n Z_nT δ_0) η_{K+1}] = R_n(α) [Z_n1, · · · , Z_nT] (I_T ⊗ η̃),

where η̃ = η* + η_{K+1} C. The previous result can now be applied to the n × KT matrix Z*_nT = [Z_n1, · · · , Z_nT]. In such a case, we have an alternative set of conditions that guarantee consistency, as follows.

Assumption NC2

1. Suppose that G_n Z_nt δ_0 = Z_nt C for a constant vector C for all t = 1, · · · , T. There exists a positive constant b such that min_{η∈B_K, α∈Θ_α} Σ_{i=2r+1}^{n} μ_i( (1/nT) R_n(α)(η · Z)(η · Z)′ R_n(α)′ ) ≥ b > 0 wpa 1 as n, T → ∞, where B_K is the unit ball of the K-dimensional Euclidean space; η is a K × 1 nonzero vector with ‖η‖_2 = √(η′η) = 1; and η · Z ≡ Σ_{k=1}^{K} η_k Z_k.

2. For any λ ∈ Θ_λ and α ∈ Θ_α, if λ ≠ λ_0 or α ≠ α_0,

lim inf_{n,T→∞} { (1/n) tr( C_n(λ, α)′ C_n(λ, α) ) − | C_n(λ, α)′ C_n(λ, α) |^{1/n} } > 0,

where C_n(λ, α) = R_n(α) S_n(λ) S_n^{−1} R_n^{−1}; the condition requires that the eigenvalues of C_n(λ, α)′ C_n(λ, α) not be asymptotically all equal.

When G_n Z_nt δ_0 is linearly dependent on Z_nt for all t, NC1(1) will not be satisfied. The additional condition (2) of NC2 on the variance structure will make up for it, as will be clear in Proposition 3 below.

3. Asymptotic Theory

3.1. Consistency

The standard argument for consistency of an extremum estimator consists of showing that, for any τ > 0, lim sup_{n,T→∞} max_{θ∈Θ̄(θ_0,τ)} ( E Q_nT(θ) − E Q_nT(θ_0) ) < 0, where Θ̄(θ_0, τ) is the complement of an open neighborhood of θ_0 in Θ with radius τ (i.e., identification uniqueness); and that Q_nT(θ) − E Q_nT(θ) converges to zero uniformly on the parameter space Θ.


In many situations, the objective function with a finite number of parameters contains averages, and a law of large numbers (LLN) follows under regularity assumptions (e.g., Amemiya (1985)). If a smoothness condition holds, a uniform LLN would also follow (Andrews (1987)). In our model, the concentrated likelihood function (Eq. (5)) involves a sum of certain eigenvalues of a random matrix and is not in the direct form of sample averages, nor is it necessarily smooth. Furthermore, the number of parameters in Γ_n and F_T increases to infinity as n (and T) tends to infinity. It turns out that, for the consistency proof, it is relatively easier to work with the objective function without concentrating out Γ_n and F_T. The idea is to find an upper bound and a lower bound of the objective function, and then to show that the former, for any θ outside the τ-neighborhood of θ_0 for any τ > 0, is strictly less than the latter at θ_0 as n and T increase. Since we are maximizing the objective function, the upper bound at θ̂_nT must be no smaller than the lower bound at θ_0, which implies that the distance between θ̂_nT and θ_0 collapses to 0 as n and T increase. Lemma 1 of Wu (1981) demonstrates this idea formally. Using this method, Moon and Weidner (2015b) show consistency of an NLS estimator for a regression panel model with interactive effects. We adapt these arguments to the spatial panel setting.

Proposition 3. Under Assumptions NC1 (or NC2), E and R, and assuming that the number of factors is not underestimated, i.e., r ≥ r_0, we have ‖θ̂_nT − θ_0‖_1 = o_P(1). Furthermore, under Assumptions NC1, E, R, and r ≥ r_0, ‖δ̂_nT − δ_0‖_2 = O_P(c_nT^{−1/2}) and |λ̂_nT − λ_0| = O_P(c_nT^{−1/2}), where c_nT = min(n, T).

The proof is in Appendix B. Note that we do not need to know the exact number of factors for the consistency result. It is enough that the true number of factors is less than or equal to the number of factors assumed in estimation. Intuitively, if the number of factors used is not fewer than the true number of factors, the variation due to the factors can be accounted for. This feature has been observed in Moon and Weidner (2015b) for the panel regression model. This property turns out to hold also for spatial panels. A step in the consistency proof is based on the following inequality (Eq. B.9):

Q_nT(θ_0) = (1/n) log|S_n| − (1/2) log( min_{Γ_n ∈ R^{n×r}, F_T ∈ R^{T×r}} (1/nT) Σ_{t=1}^{T} (Γ_{n0} f_{t0} + U_nt − Γ_n f_t)′ R_n′ R_n (Γ_{n0} f_{t0} + U_nt − Γ_n f_t) )
  ≥ (1/n) log|S_n| − (1/2) log( (1/nT) Σ_{t=1}^{T} Σ_{i=1}^{n} ε_it² ),

which, together with an upper bound of Q_nT(θ̂_nT), justifies the condition of Eq. (B.10). This inequality trivially holds if the true number of factors is at most r, but might not hold if the number of factors used in the estimation is smaller than the true number of factors.
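The role of the smallest eigenvalues in the concentrated objective can be illustrated numerically. Concentrating out Γ_n and F_T is a best rank-r approximation problem, so by the Eckart–Young theorem the minimized residual sum of squares equals the sum of all but the largest r eigenvalues of the residual outer product (a sketch of ours with toy dimensions, not the paper's code):

```python
import numpy as np

# D stands in for the residual matrix R_n(alpha)(S_n(lambda)Y - sum_k Z_k delta_k).
rng = np.random.default_rng(1)
n, T, r = 30, 20, 2
D = rng.standard_normal((n, T))

# Direct minimization over rank-r Gamma F' via SVD truncation (Eckart-Young)
U, s, Vt = np.linalg.svd(D, full_matrices=False)
best_rank_r = (U[:, :r] * s[:r]) @ Vt[:r, :]
rss = np.sum((D - best_rank_r) ** 2)

# Equals the sum of the smallest n - r eigenvalues of D D'
eigvals = np.sort(np.linalg.eigvalsh(D @ D.T))   # ascending order
assert np.allclose(rss, eigvals[: n - r].sum())
assert np.allclose(rss, np.sum(s[r:] ** 2))      # same thing via singular values
```

This is exactly the eigenvalue-sum form that makes the concentrated likelihood non-smooth and not a plain sample average.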


3.2. Limiting Distribution

In this section, we derive the limiting distribution of the QML estimator θ̂_nT. Recall that Γ̃_n = R_n Γ_n. Because F_T and Γ̃_n are concentrated out in estimation, only the true factor and loading will be needed via the analysis of the first and second order derivatives of the concentrated objective function. Therefore, as no confusion will arise, in subsequent sections F_T and Γ̃_n refer to the true factor and loading, with the subscript '0' omitted for simplicity. For the limiting distribution of θ̂_nT, additional assumptions on the limiting behaviors of Γ̃_n and F_T are needed.

Assumption SF The number of factors, r_0, is constant and known. plim_{n,T→∞} (1/n) Γ̃_n′ Γ̃_n and plim_{n,T→∞} (1/T) F_T′ F_T exist and are positive definite.

The above assumption implies that for large enough n and T, all the eigenvalues of (1/n) Γ̃_n′ Γ̃_n and (1/T) F_T′ F_T are bounded away from zero and bounded from above. For deriving the limiting distribution, the number of factors needs to be exact in order for the asymptotic analysis to be tractable.⁸

Define d²_min(Γ̃_n, F_T) = (1/nT) μ_{r_0}(Γ̃_n F_T′ F_T Γ̃_n′) and d²_max(Γ̃_n, F_T) = (1/nT) μ_1(Γ̃_n F_T′ F_T Γ̃_n′). Notice that (1/nT) Γ̃_n F_T′ F_T Γ̃_n′ has at most r_0 positive eigenvalues. As a consequence of Assumption SF, plim_{n,T→∞} d²_min(Γ̃_n, F_T) > 0 and plim_{n,T→∞} d²_max(Γ̃_n, F_T) < ∞.⁹ The total variation in Y is (1/nT) tr(YY′), and its component (1/nT) tr(Γ̃_n F_T′ F_T Γ̃_n′) is due to common factors. Assumption SF guarantees that each of the r_0 factors has a nontrivial contribution towards (1/nT) tr(Γ̃_n F_T′ F_T Γ̃_n′). A similar assumption is in Bai (2003, 2009). Moon and Weidner (2015b) label this the "strong factor assumption".

In deriving the limiting distribution of θ̂_nT, we need to expand L_nT(θ) around θ_0, where L_nT(θ) = (1/nT) Σ_{i=r_0+1}^{n} μ_i( R_n(α)(S_n(λ)Y − Σ_{k=1}^{K} Z_k δ_k)(S_n(λ)Y − Σ_{k=1}^{K} Z_k δ_k)′ R_n(α)′ ). The perturbation theory of linear operators is used. The technical details of the perturbation are in Appendix C and the supplementary file.

We now provide the limiting distribution of θ̂_nT. The detailed proofs are in Appendix D. Let C_nT denote the sigma algebra generated by X_n1, · · · , X_nT, Γ̃_n and F_T. Define the n × T matrices Z̄_k = E(Z_k | C_nT) and 𝒵_k ≡ M_{Γ̃_n} R_n Z̄_k M_{F_T} + M_{Γ̃_n} R_n (Z_k − Z̄_k), k = 1, · · · , K + 1, and the n × T matrix of lagged disturbances ε̃_h = (ε_{n,1−h}, · · · , ε_{n,T−h}), h ≥ 1, where we drop the subscripts n and T for those matrices for simplicity. Using the reduced form of the dynamic equation in (1), we have

Z_1 − Z̄_1 = Σ_{h=1}^{∞} A_n^{h−1} S_n^{−1} R_n^{−1} ε̃_h,   Z_2 − Z̄_2 = W_n Σ_{h=1}^{∞} A_n^{h−1} S_n^{−1} R_n^{−1} ε̃_h,
Z_k − Z̄_k = 0 for k = 3, · · · , K,
Z_{K+1} − Z̄_{K+1} = (γ_0 G_n + ρ_0 G_n W_n) Σ_{h=1}^{∞} A_n^{h−1} S_n^{−1} R_n^{−1} ε̃_h.    (6)

⁸ Moon and Weidner (2015b) show that, with additional assumptions on the regressors and the error distribution, the additional term does not change the limiting distribution of the estimator. However, those additional assumptions are rather strong. In spatial models, relevant assumptions remain to be seen.
⁹ This is so because the n × n matrix (1/nT) Γ̃_n F_T′ F_T Γ̃_n′ and the r_0 × r_0 matrix (1/n) Γ̃_n′ Γ̃_n (1/T) F_T′ F_T have the same nonzero eigenvalues, counting multiplicity. For large n and T, d²_max(Γ̃_n, F_T) = μ_1((1/n) Γ̃_n′ Γ̃_n (1/T) F_T′ F_T) ≤ μ_1((1/n) Γ̃_n′ Γ̃_n) μ_1((1/T) F_T′ F_T) < ∞, and d²_min(Γ̃_n, F_T) = μ_{r_0}((1/n) Γ̃_n′ Γ̃_n (1/T) F_T′ F_T) ≥ μ_{r_0}((1/n) Γ̃_n′ Γ̃_n) μ_{r_0}((1/T) F_T′ F_T) > 0. See Theorem 8.12 (2) in Zhang (2011), which shows that for Hermitian and positive semidefinite n × n matrices A and B, μ_i(A) μ_n(B) ≤ μ_i(AB) ≤ μ_i(A) μ_1(B).
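The eigenvalue comparisons used in footnote 9 can be checked numerically; the following sketch (ours, toy dimensions) verifies the shared nonzero spectrum and the sandwich bounds:

```python
import numpy as np

# (1/nT) Gamma F'F Gamma' (n x n, rank r0) and (1/n)Gamma'Gamma (1/T)F'F
# (r0 x r0) share their nonzero eigenvalues; mu_i(AB) is bounded by
# mu_i(A) mu_1(B) and mu_i(A) mu_n(B) for Hermitian psd A, B (Zhang (2011)).
rng = np.random.default_rng(3)
n, T, r0 = 25, 15, 2
Gamma = rng.standard_normal((n, r0))
F = rng.standard_normal((T, r0))

big = Gamma @ F.T @ F @ Gamma.T / (n * T)        # n x n, at most r0 positive eigenvalues
A = Gamma.T @ Gamma / n                          # r0 x r0, psd
B = F.T @ F / T                                  # r0 x r0, psd

big_nonzero = np.sort(np.linalg.eigvalsh(big))[-r0:]   # top r0 eigenvalues
small_eigs = np.sort(np.linalg.eigvals(A @ B).real)    # A @ B need not be symmetric
assert np.allclose(big_nonzero, small_eigs)

# d_max^2 <= mu_1(A) mu_1(B) and d_min^2 >= mu_r0(A) mu_r0(B)
muA = np.sort(np.linalg.eigvalsh(A))
muB = np.sort(np.linalg.eigvalsh(B))
assert small_eigs[-1] <= muA[-1] * muB[-1] + 1e-10
assert small_eigs[0] >= muA[0] * muB[0] - 1e-10
```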

Theorems 1 and 3 characterize the asymptotic distribution, asymptotic bias and asymptotic variance of θ̂_nT.

Theorem 1. Assume that n/T → κ² > 0 and Assumptions NC1 (or NC2), E, R and SF hold. Then

√(nT)(θ̂_nT − θ_0) − σ_0^{−2} D_nT^{−1} ϕ_nT = σ_0^{−2} D_nT^{−1} (1/√(nT)) ν_nT + o_P(1),    (7)

where D_nT is defined in Eq. (9) and is assumed to be positive definite, ϕ_nT = (ϕ_{nT,γ}, ϕ_{nT,ρ}, 0_{1×(K−2)}, ϕ_{nT,λ}, ϕ_{nT,α})′ with

ϕ_{nT,γ} = −(σ_0²/√(nT)) Σ_{h=1}^{T} tr(J_0 P_{F_T} J_h′) tr(A_n^{h−1} S_n^{−1}),
ϕ_{nT,ρ} = −(σ_0²/√(nT)) Σ_{h=1}^{T} tr(J_0 P_{F_T} J_h′) tr(W_n A_n^{h−1} S_n^{−1}),
ϕ_{nT,λ} = −(σ_0²/√(nT)) Σ_{h=1}^{T} tr(J_0 P_{F_T} J_h′) tr((γ_0 G_n + ρ_0 G_n W_n) A_n^{h−1} S_n^{−1}) + √(T/n) σ_0² (r_0/n) ( tr(G_n) − tr(P_{Γ̃_n} R_n G_n R_n^{−1}) ),
ϕ_{nT,α} = √(T/n) σ_0² (r_0/n) ( tr(G̃_n) − tr(P_{Γ̃_n} G̃_n) ),

J_h = (0_{T×(T−h)}, I_T, 0_{T×h}), and

ν_nT = ( tr(𝒵_1 ε′), · · · , tr(𝒵_K ε′), tr(𝒵_{K+1} ε′) + tr(R_n G_n R_n^{−1} εε′) − (1/n) tr(εε′) tr(G_n), tr(G̃_n εε′) − (1/n) tr(εε′) tr(G̃_n) )′.

To derive the joint distribution of ν_nT, the Cramér–Wold device can be used. Let c ∈ R^{K+2}. Then

c′ν_nT = tr( Σ_{k=1}^{K+1} c_k 𝒵_k ε′ ) + c_{K+1} [ tr(R_n G_n R_n^{−1} εε′) − (1/n) tr(G_n) tr(εε′) ] + c_{K+2} [ tr(G̃_n εε′) − (1/n) tr(G̃_n) tr(εε′) ]
  = b_nT^{c}′ vec(ε) + ω_nT^{c}′ vec(ε) + vec(ε)′ A_nT^{c} vec(ε),

where b_nT^{c} = Σ_{k=1}^{K+1} c_k vec(M_{Γ̃_n} R_n Z̄_k M_{F_T}); ω_nT^{c} = vec( Σ_{h=1}^{∞} P_nh^{c} ε̃_h ) with P_nh^{c} = B_n^{c} A_n^{h−1} S_n^{−1} R_n^{−1} and B_n^{c} = M_{Γ̃_n} R_n (c_1 I_n + c_2 W_n + c_{K+1}(γ_0 G_n + ρ_0 G_n W_n)); and A_nT^{c} = (1/2)(A_nT^{c1} + A_nT^{c1}′) is an nT × nT symmetric matrix with A_nT^{c1} = c_{K+1} H_{K+1} + c_{K+2} H_{K+2}, H_{K+1} = I_T ⊗ (R_n G_n R_n^{−1} − (1/n) tr(G_n) I_n), and H_{K+2} = I_T ⊗ (G̃_n − (1/n) tr(G̃_n) I_n).

Under Assumptions R and SF, the elements of b_nT^{c} have uniformly bounded fourth moments; A_nT^{c}, B_n^{c}, Σ_{h=1}^{∞} abs(A_n^{h−1}), S_n^{−1} and R_n^{−1} are UB. Together with Assumption E, the CLT for martingale difference arrays of linear-quadratic forms (Kelejian and Prucha (2001) and Yu et al. (2008), Lemma 13) is applicable.

Theorem 2. Under Assumptions E, R and SF, E(c′ν_nT) = σ_0² tr(A_nT^{c}) = 0, and

var(c′ν_nT) = T σ_0⁴ E tr( Σ_{h=1}^{∞} P_nh^{c}′ P_nh^{c} ) + σ_0² E(b_nT^{c}′ b_nT^{c}) + 2σ_0⁴ tr((A_nT^{c})²) + 2μ^{(3)} E Σ_{i=1}^{nT} [b_nT^{c}]_i [A_nT^{c}]_ii + (μ^{(4)} − 3σ_0⁴) Σ_{i=1}^{nT} [A_nT^{c}]_ii²
  = nT σ_0⁴ c′ (D_nT + Σ_nT + o_P(1)) c,    (8)

where the (K + 2) × (K + 2) matrix D_nT is

D_nT = (1/(nT σ_0²)) X_nT′ X_nT + Ψ_n,    (9)

where X_nT = (X_1, · · · , X_{K+1}, 0) with X_k = vec(𝒵_k), k = 1, · · · , K + 1, and Ψ_n is the (K + 2) × (K + 2) matrix whose entries are all zero except for the lower-right 2 × 2 block ( ψ_{K+1,K+1}, ψ_{K+1,K+2}; ψ_{K+1,K+2}, ψ_{K+2,K+2} ), with

ψ_{K+1,K+1} = (1/n) tr( R_n G_n R_n^{−1} (R_n G_n R_n^{−1})′ ) + (1/n) tr(G_n²) − (2/n²) tr(G_n)²,
ψ_{K+1,K+2} = (1/n) tr(G_n G̃_n) + (1/n) tr( R_n G_n R_n^{−1} G̃_n′ ) − (2/n²) tr(G_n) tr(G̃_n),
ψ_{K+2,K+2} = (1/n) tr(G̃_n G̃_n′) + (1/n) tr(G̃_n²) − (2/n²) tr(G̃_n)²; and furthermore

Σ_nT = (μ^{(3)}/σ_0⁴) M_A + ((μ^{(4)} − 3σ_0⁴)/σ_0⁴) M_B,    (10)

where M_A is symmetric with a zero upper-left K × K block, entries Σ_nT,A^{k,K+1} and Σ_nT,A^{k,K+2} (k = 1, · · · , K) in its last two columns, and lower-right 2 × 2 block ( 2Σ_nT,A^{K+1,K+1}, Σ_nT,A^{K+1,K+2}; Σ_nT,A^{K+1,K+2}, 0 ); and M_B is zero except for its lower-right 2 × 2 block ( Σ_nT,B^{K+1,K+1}, Σ_nT,B^{K+1,K+2}; Σ_nT,B^{K+1,K+2}, Σ_nT,B^{K+2,K+2} ). Here Σ_nT,A^{k1,k2} = (1/nT) Σ_{i=1}^{nT} [vec(M_{Γ̃_n} R_n Z̄_{k1} M_{F_T})]_i H_{k2,ii} for k_1 = 1, · · · , K + 1 and k_2 = K + 1 or K + 2; Σ_nT,B^{K+1,K+1} = (1/nT) Σ_{i=1}^{nT} (H_{K+1,ii})²; Σ_nT,B^{K+2,K+2} = (1/nT) Σ_{i=1}^{nT} (H_{K+2,ii})²; and Σ_nT,B^{K+1,K+2} = (1/nT) Σ_{i=1}^{nT} H_{K+1,ii} H_{K+2,ii}.
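The linear-quadratic variance formula underlying Eq. (8) can be checked by simulation. The following Monte Carlo sketch (ours; it ignores the predetermined-regressor term ω and uses centered exponential errors so that σ_0² = 1, μ^{(3)} = 2, μ^{(4)} = 9):

```python
import numpy as np

# For iid errors with variance s2, third/fourth central moments mu3/mu4, and
# symmetric A:
#   var(b'e + e'Ae) = s2 b'b + 2 s2^2 tr(A^2)
#                     + 2 mu3 sum_i b_i A_ii + (mu4 - 3 s2^2) sum_i A_ii^2.
rng = np.random.default_rng(6)
m = 4
b = rng.standard_normal(m)
A = rng.standard_normal((m, m))
A = (A + A.T) / 2                                 # symmetrize

s2, mu3, mu4 = 1.0, 2.0, 9.0                      # centered exponential(1) moments
eps = rng.exponential(1.0, size=(500_000, m)) - 1.0

q = eps @ b + np.einsum('ti,ij,tj->t', eps, A, eps)
theory = (s2 * b @ b + 2 * s2**2 * np.trace(A @ A)
          + 2 * mu3 * (b * np.diag(A)).sum()
          + (mu4 - 3 * s2**2) * (np.diag(A) ** 2).sum())
assert abs(q.var() - theory) < 0.1 * theory       # Monte Carlo agreement
```

The μ^{(3)} term couples the linear and quadratic parts through the diagonal of A, which is exactly how Σ_nT enters the sandwich variance for non-normal disturbances.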

Theorem 3. Assume that n/T → κ² > 0; D = plim_{n,T→∞} D_nT is positive definite; Σ = plim_{n,T→∞} Σ_nT; ϕ = plim_{n,T→∞} ϕ_nT; and suppose that Assumptions NC1 (or NC2), E, R and SF hold. Then √(nT)(θ̂_nT − θ_0) − σ_0^{−2} D^{−1} ϕ →ᵈ N(0, D^{−1}(D + Σ)D^{−1}).

Theorem 3 shows that the limiting distribution of θ̂_nT may not be centered at θ_0, due to the asymptotic bias term σ_0^{−2} D^{−1} ϕ. For a regression panel with factors, Moon and Weidner (2015a) show that the leading order biases are due to the correlation of the predetermined regressors with the disturbances, and to heteroskedastic disturbances. In our model, the biases arise from the predetermined regressors and from the interaction between the spatial effects and the factor loadings. As the time factors and loadings contain an infinite number of parameters when the sample sizes n and T go to infinity, the asymptotic bias must be due to the incidental parameter problem. In the next subsection, a bias corrected estimator will be proposed. The variance matrix in the limiting distribution has a sandwich form to accommodate possibly non-normal disturbances. When the disturbances in the model are normally distributed, Σ = 0 and the limiting variance reduces to the single matrix D^{−1}.

3.3. Bias Correction

This section proposes a method to correct the bias ϕ in the limiting distribution. The bias depends on θ, P_{Γ̃_n}, P_{F_T}, σ_0² and D^{−1}. The parameter θ can be estimated by θ̂_nT. The projectors P_{Γ̃_n} and P_{F_T} can be estimated as follows. From Eq. (5), let B̂_{Γ̃_n} denote the n × r_0 matrix of the eigenvectors associated with the largest r_0 eigenvalues of (1/nT) R_n(α̂)(S_n(λ̂)Y − Σ_{k=1}^{K} Z_k δ̂_k)(S_n(λ̂)Y − Σ_{k=1}^{K} Z_k δ̂_k)′ R_n(α̂)′, where the subscripts nT of the parameter estimates are dropped for simplicity; then P̂_{Γ̃_n} = B̂_{Γ̃_n} B̂_{Γ̃_n}′ and M̂_{Γ̃_n} = I_n − P̂_{Γ̃_n}. Interchanging the roles of n and T, the factor projectors P̂_{F_T} and M̂_{F_T} can be similarly estimated: let B̂_{F_T} denote the T × r_0 matrix of the eigenvectors associated with the largest r_0 eigenvalues of (1/nT)(S_n(λ̂)Y − Σ_{k=1}^{K} Z_k δ̂_k)′ R_n(α̂)′ R_n(α̂)(S_n(λ̂)Y − Σ_{k=1}^{K} Z_k δ̂_k), and set P̂_{F_T} = B̂_{F_T} B̂_{F_T}′ and M̂_{F_T} = I_T − P̂_{F_T}. Lemma 11 in Appendix C shows that under the assumptions of Theorem 3, ‖M̂_{Γ̃_n} − M_{Γ̃_n}‖_2 = ‖P̂_{Γ̃_n} − P_{Γ̃_n}‖_2 = O_P(1/√n) and ‖M̂_{F_T} − M_{F_T}‖_2 = ‖P̂_{F_T} − P_{F_T}‖_2 = O_P(1/√n).

The variance σ_0² can be estimated by

σ̂² = L_nT(θ̂_nT) = (1/nT) Σ_{i=r_0+1}^{n} μ_i( R_n(α̂)(S_n(λ̂)Y − Σ_{k=1}^{K} Z_k δ̂_k)(S_n(λ̂)Y − Σ_{k=1}^{K} Z_k δ̂_k)′ R_n(α̂)′ ).    (11)

Finally, D̂_nT is an estimate of D_nT (see Eq. (9)) with the estimated θ̂_nT, P̂_{Γ̃_n}, P̂_{F_T} and σ̂², and hence D̂_nT also estimates D. The bias ϕ can then be estimated by plugging in these estimated elements. The following theorem shows that the bias corrected estimator θ̂_nT^{c} is asymptotically normal and centered at θ_0.

Theorem 4. Assume that n/T → κ² > 0; D = plim_{n,T→∞} D_nT is positive definite; Σ = plim_{n,T→∞} Σ_nT; and suppose that Assumptions NC1 (or NC2), E, R and SF hold. The bias corrected estimator is θ̂_nT^{c} = θ̂_nT − σ̂^{−2} D̂_nT^{−1} (1/√(nT)) ϕ̂_nT. Then √(nT)(θ̂_nT^{c} − θ_0) →ᵈ N(0, D^{−1}(D + Σ)D^{−1}).

Section 4 investigates the finite sample performance of the QML estimators in a Monte Carlo study.

3.4. The number of factors

As long as the number of factors specified is not fewer than the true number of factors, the QML estimator is consistent. However, the limiting distribution is derived under the premise that the number of factors is correctly specified. Although Moon and Weidner (2015b) show that, under certain conditions, the limiting distribution of the NLS estimator for a regression panel is invariant to the inclusion of redundant factors, its finite sample performance may suffer, as Lu and Su (2016) emphasize. If factors are interpreted as omitted variables, their detection is a first step in trying to measure them. Several factor number selection methods have been proposed in the literature, including Bai and Ng (2002)'s PC and IC criteria, Onatski (2010)'s edge distribution estimator, and Ahn and Horenstein (2013)'s eigenvalue ratio tests. In our model, consistent determination of the number of factors cannot be separated from consistent estimation of the regression coefficients. Our strategy is to first consistently estimate the regression coefficients and then determine the number of factors from the residuals.

Suppose that it is known that the number of factors does not exceed r_max, which is typical and consistent with the assumption that the number of factors is finite and small relative to the sample size. Denote by θ̆_nT = (δ̆_nT′, λ̆_nT, ᾰ_nT)′ the QML estimator with the number of factors set at r_max. The residuals are

D̆_nT = S_n(λ̆_nT)Y − Σ_{k=1}^{K} Z_k δ̆_nT,k = Γ_n F_T′ + Ĕ_nT,    (12)

where Ĕ_nT = R_n^{−1} ε + Σ_{k=1}^{K} Z_k (δ_{0k} − δ̆_nT,k) + (Z_{K+1} + G_n Γ_n F_T′ + G_n R_n^{−1} ε)(λ_0 − λ̆_nT). With appropriate



assumptions, Proposition 3 establishes that δ˘nT − δ0 = OP cnT2 and λ˘ nT − λ0 = OP cnT2 , and it 2

√ √  follows that E˘nT 2 = OP max n, T . We now demonstrate how the factor number can be consistently

determined using criteria similar to those in Bai and Ng (2002).¹⁰ Consider the following criteria for r = 1, · · · , r_max:

PC(r) = V(r) + r · g(n, T),    IC(r) = log(V(r)) + r · g(n, T),

where V(r) = (1/nT) Σ_{i=r+1}^{n} μ_i( (S_n(λ̆_nT)Y − Σ_{k=1}^{K} Z_k δ̆_nT,k)(S_n(λ̆_nT)Y − Σ_{k=1}^{K} Z_k δ̆_nT,k)′ ) and g(n, T) is a positive penalty function of n and T. The following theorem provides conditions on g(n, T) that ensure consistent determination of the factor number.

Theorem 5. Assume that r_0 ≤ r_max, that Assumptions NC1, E, R, SF hold, and that (1) g(n, T) → 0 and (2) c_nT · g(n, T) → ∞, where c_nT = min(n, T). Then lim_{n,T→∞} Pr(r̂ = r_0) = 1 for r̂ = arg min_{0≤r≤r_max} PC(r) or r̂ = arg min_{0≤r≤r_max} IC(r).

The proof can be found in Appendix D. In order for the number of factors to be determined consistently, the penalty term needs to vanish at an appropriate rate so that both underestimation and overestimation of the number of factors can be ruled out. Note that the case with no factors (r_0 = 0) is covered. In Bai and Ng (2002), the following six criteria are proposed:

PC₁(r) = V(r) + r · L_nT(θ̆_nT) ((n + T)/(nT)) log(nT/(n + T)),
PC₂(r) = V(r) + r · L_nT(θ̆_nT) ((n + T)/(nT)) log(c_nT),
PC₃(r) = V(r) + r · L_nT(θ̆_nT) c_nT^{−1} log(c_nT),
IC₁(r) = log V(r) + r · ((n + T)/(nT)) log(nT/(n + T)),
IC₂(r) = log V(r) + r · ((n + T)/(nT)) log(c_nT),
IC₃(r) = log V(r) + r · c_nT^{−1} log(c_nT).    (13)

¹⁰ Note that this does not preclude the use of the methods of Onatski (2010) or Ahn and Horenstein (2013). Rather, it is for ease of exposition, as the composite error Ĕ_nT here has a complicated structure which does not directly satisfy their error assumptions.

While all of the criteria satisfy the conditions in Theorem 5, the penalty g(n, T) differs across them and they may have different finite sample performances. Other criteria that satisfy the conditions can also be formulated. We test the finite sample performances of the criteria above using Monte Carlo simulations.

4. Monte Carlo simulations

4.1. Design

We study the finite sample performance of the QML estimator and the accuracy of the factor number selection in samples of different sizes, with different degrees of spatial interaction, and with different ratios between the variances of the idiosyncratic error and the factors. We also illustrate finite sample biases due to misspecification when the DGP has either spatial or interactive effects but they are ignored. The DGP is

Y_nt = λ_0 W_n Y_nt + γ_0 Y_{n,t−1} + ρ_0 W_n Y_{n,t−1} + X_nt β_0 + Γ_{n0} f_{t0} + U_nt,   with U_nt = α_0 W_n U_nt + ε_nt.

The dependent variable is affected by 2 unobserved factors F_{T0} = (f_{10}, f_{20}, · · · , f_{T0})′, which is T × 2, and their n × 2 loadings matrix Γ_{n0} = (γ_{10}, γ_{20}, · · · , γ_{n0})′. X_nt = (X_nt,1, X_nt,2) is an n × 2 matrix of two regressors, generated according to X_{nt,1i} = 0.25(γ_{i0}′ f_{t0} + (γ_{i0}′ f_{t0})²) + ℓ′γ_{i0} + ℓ′f_{t0} + η_{it,1} and X_{nt,2i} = η_{it,2}, where ℓ = (1, 1)′. The elements of γ_{i0}, f_{t0}, η_{it,1} and η_{it,2} are generated from independent standard normal variables. We see that X_{nt,1i} is correlated with the common component γ_{i0}′ f_{t0}, with its square, and with the factors and loadings separately. X_{nt,2i} is not affected by the factors. The spatial weights matrix W_n is generated from a rook matrix. Individual units are arranged row by row on a √n × √n chessboard, where neighbors are defined as those who share a common border. Units in the interior of the chessboard have 4 neighbors, and units on the border and in the corners have 3 and 2 neighbors, respectively. This design of the spatial weights matrix is motivated by the observation that regions in most observed regional structures have similar connectivity to units in the rook matrix. Define the n × n matrix M̃_n such that M̃_{n,ij} = 1 if and only if individuals i and j are neighbors on the chessboard, and M̃_{n,ij} = 0 otherwise. Then the spatial weights matrix W_n is defined as the row-normalized M̃_n. The ε_it's are generated from independent normal(0, ϑ) distributions. Elements of the common factor components and the idiosyncratic errors have the same variances when ϑ = 1, and the latter have more variation when ϑ > 1. For each Monte Carlo experiment, X_nt, Γ_{n0} and F_{T0} are generated according to the above specification for T + 1000 time periods and the last T periods are taken as our sample.
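The simulation design above can be sketched in code; the following is our illustration (function names are ours) of the row-normalized rook weights matrix and one simulated panel, with a burn-in as described in the text:

```python
import numpy as np

def rook_weights(side):
    # Row-normalized rook contiguity on a side x side chessboard:
    # interior units have 4 neighbors, edges 3, corners 2.
    n = side * side
    M = np.zeros((n, n))
    for i in range(side):
        for j in range(side):
            k = i * side + j
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < side and 0 <= jj < side:
                    M[k, ii * side + jj] = 1.0
    return M / M.sum(axis=1, keepdims=True)

def simulate_panel(side=7, T=25, theta=(0.3, 0.3, 0.3, 0.3, 1.0, 1.0),
                   vartheta=1.0, burn=1000, seed=0):
    lam, gam, rho, alpha, b1, b2 = theta          # (lambda, gamma, rho, alpha, beta1, beta2)
    n = side * side
    W = rook_weights(side)
    rng = np.random.default_rng(seed)
    Gamma = rng.standard_normal((n, 2))           # loadings, r0 = 2 factors
    S_inv = np.linalg.inv(np.eye(n) - lam * W)
    R_inv = np.linalg.inv(np.eye(n) - alpha * W)
    Y = np.zeros(n)
    out = []
    for t in range(burn + T):
        f = rng.standard_normal(2)                # f_t
        eta = rng.standard_normal((n, 2))
        X1 = 0.25 * (Gamma @ f + (Gamma @ f) ** 2) + Gamma.sum(1) + f.sum() + eta[:, 0]
        X2 = eta[:, 1]
        U = R_inv @ (np.sqrt(vartheta) * rng.standard_normal(n))
        Y = S_inv @ (gam * Y + rho * W @ Y + b1 * X1 + b2 * X2 + Gamma @ f + U)
        if t >= burn:
            out.append(Y)
    return np.column_stack(out), W

Y_panel, W = simulate_panel()
```

The dynamics are stable for the baseline parameters since γ + ρ = 0.6 and λ = 0.3 keep the spectral radius of the transition matrix below one.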

4.2. Monte Carlo results

1000 Monte Carlo replications are carried out for each design. The numerical maximization routine starts at multiple values, because the objective function is not concave and multiple local maxima might exist. The baseline specification is θ_0^a = (0.3, 0.3, 0.3, 0.3, 1, 1) for θ = (λ, γ, ρ, α, β_1, β_2). We then consider the cases with negative spatial interaction (θ_0^b = (−0.3, 0.3, 0.3, 0.3, 1, 1)), no spatial correlation in the disturbances (θ_0^c = (0.3, 0.3, 0.3, 0, 1, 1)), and no contemporaneous spatial interaction (θ_0^d = (0, 0.3, 0.3, 0.3, 1, 1)). We use combinations of n = 25, 49, 81 and T = 25, 49, 81. The Monte Carlo designs cover panels with small and moderate n and T. The θ̂_nT rows in Tables 1 and 2 report the Monte Carlo results for the QML estimators in specifications θ_0^a and θ_0^b respectively. The magnitude of the biases generally decreases as n and T increase. The coverage probability (CP) is calculated using the asymptotic variance covariance matrix, and the nominal coverage probability is set at 95%. The estimates for α have noticeable biases in finite samples, and as a result, their CPs are well below 95%. The CPs for other parameter estimates are also generally below 95%, and therefore hypothesis tests will over-reject. The Monte Carlo results of the bias corrected estimators are reported in the θ̂_nT^c rows of the same tables. The biases have been reduced significantly, especially for α. The CPs also improve, which indicates more reliable statistical inference based on the bias corrected estimator. Due to limited space, Monte Carlo results for θ_0^c and θ_0^d are reported in the supplementary file.

Moon and Weidner (2015b) show that in a regression panel, the limiting distribution of the QML estimator does not change when the number of factors is overspecified. Such a result is not available for the spatial panel considered in this paper, but consistency still holds, as argued above. In Table 3, we report the performance of the bias corrected estimators when the number of factors is over-specified by 1, i.e., r = 3 while the true r_0 = 2. Although the estimates are still consistent, α has noticeable bias in small samples that is not removed by the bias correction procedure. Tables S.5 and S.6 in the supplementary file report additional Monte Carlo results with more redundant factors. As the number of redundant factors increases, the CP deteriorates. Note that the biases and CPs improve in large samples (e.g., n = 81 and T = 81), and this is consistent with the results of Moon and Weidner (2015b). Therefore, for valid inference in small samples, it is important that the correct number of factors is chosen. The estimators are less sensitive to redundant factors in large samples.

We now check the finite sample performances of the factor number selection criteria of Subsection 3.4. We report the percentages of underestimation (r̂ < r_0), correct estimation (r̂ = r_0), and overestimation (r̂ > r_0) in 1000 simulations for different ratios of the variance of the error to the variance of the factor components, i.e., ϑ = 1, 4, 9. In addition to the criteria in Eq. (13), for which we have proved consistency, we also report

Table 1: Performance of the QML and Bias Corrected Estimators, θ0a

| n  | T  |        |      | λ̂        | γ̂        | ρ̂        | α̂        | β̂₁       | β̂₂       |
|----|----|--------|------|----------|----------|----------|----------|----------|----------|
| 25 | 25 | θ̂_nT  | Bias | -0.00023 | -0.00135 | 0.00007  | 0.01910  | 0.00173  | -0.00137 |
|    |    |        | CP   | 0.921    | 0.920    | 0.909    | 0.869    | 0.929    | 0.919    |
|    |    | θ̂ᶜ_nT | Bias | -0.00037 | -0.00124 | 0.00010  | 0.00040  | 0.00177  | -0.00136 |
|    |    |        | CP   | 0.921    | 0.928    | 0.910    | 0.905    | 0.931    | 0.919    |
| 25 | 49 | θ̂_nT  | Bias | 0.00026  | -0.00076 | -0.00049 | 0.02024  | -0.00024 | -0.00038 |
|    |    |        | CP   | 0.921    | 0.927    | 0.936    | 0.859    | 0.918    | 0.919    |
|    |    | θ̂ᶜ_nT | Bias | 0.00019  | -0.00075 | -0.00045 | 0.00159  | -0.00023 | -0.00039 |
|    |    |        | CP   | 0.920    | 0.929    | 0.936    | 0.919    | 0.919    | 0.920    |
| 25 | 81 | θ̂_nT  | Bias | 0.00074  | -0.00047 | -0.00039 | 0.02142  | 0.00057  | -0.00093 |
|    |    |        | CP   | 0.933    | 0.931    | 0.924    | 0.830    | 0.932    | 0.933    |
|    |    | θ̂ᶜ_nT | Bias | 0.00067  | -0.00046 | -0.00034 | 0.00199  | 0.00057  | -0.00094 |
|    |    |        | CP   | 0.929    | 0.934    | 0.923    | 0.934    | 0.933    | 0.933    |
| 49 | 25 | θ̂_nT  | Bias | -0.00078 | -0.00126 | 0.00103  | 0.00940  | -0.00020 | -0.00080 |
|    |    |        | CP   | 0.934    | 0.918    | 0.936    | 0.907    | 0.920    | 0.940    |
|    |    | θ̂ᶜ_nT | Bias | -0.00082 | -0.00114 | 0.00097  | -0.00026 | -0.00016 | -0.00078 |
|    |    |        | CP   | 0.928    | 0.931    | 0.936    | 0.925    | 0.922    | 0.941    |
| 49 | 49 | θ̂_nT  | Bias | 0.00015  | -0.00022 | -0.00017 | 0.01268  | -0.00054 | 0.00017  |
|    |    |        | CP   | 0.946    | 0.938    | 0.942    | 0.890    | 0.943    | 0.938    |
|    |    | θ̂ᶜ_nT | Bias | 0.00015  | -0.00023 | -0.00017 | 0.00258  | -0.00053 | 0.00018  |
|    |    |        | CP   | 0.942    | 0.949    | 0.944    | 0.936    | 0.945    | 0.938    |
| 49 | 81 | θ̂_nT  | Bias | -0.00037 | -0.00000 | 0.00001  | 0.01080  | 0.00070  | 0.00087  |
|    |    |        | CP   | 0.944    | 0.934    | 0.944    | 0.901    | 0.939    | 0.937    |
|    |    | θ̂ᶜ_nT | Bias | -0.00038 | -0.00001 | 0.00003  | 0.00043  | 0.00070  | 0.00087  |
|    |    |        | CP   | 0.944    | 0.937    | 0.946    | 0.950    | 0.938    | 0.937    |
| 81 | 25 | θ̂_nT  | Bias | -0.00031 | 0.00014  | -0.00038 | 0.00526  | 0.00043  | -0.00072 |
|    |    |        | CP   | 0.932    | 0.937    | 0.933    | 0.914    | 0.919    | 0.944    |
|    |    | θ̂ᶜ_nT | Bias | -0.00029 | 0.00015  | -0.00042 | -0.00077 | 0.00048  | -0.00070 |
|    |    |        | CP   | 0.934    | 0.944    | 0.937    | 0.921    | 0.921    | 0.944    |
| 81 | 49 | θ̂_nT  | Bias | 0.00013  | 0.00019  | -0.00074 | 0.00488  | 0.00017  | 0.00058  |
|    |    |        | CP   | 0.944    | 0.935    | 0.955    | 0.937    | 0.929    | 0.927    |
|    |    | θ̂ᶜ_nT | Bias | 0.00015  | 0.00016  | -0.00073 | -0.00141 | 0.0001   | 0.00058  |
|    |    |        | CP   | 0.944    | 0.944    | 0.955    | 0.937    | 0.929    | 0.927    |
| 81 | 81 | θ̂_nT  | Bias | -0.00037 | -0.00017 | 0.00028  | 0.00692  | 0.00025  | -0.00006 |
|    |    |        | CP   | 0.926    | 0.945    | 0.924    | 0.905    | 0.941    | 0.957    |
|    |    | θ̂ᶜ_nT | Bias | -0.00038 | -0.00015 | 0.00027  | 0.00065  | 0.00025  | -0.00006 |
|    |    |        | CP   | 0.926    | 0.947    | 0.925    | 0.938    | 0.942    | 0.957    |

The true parameters are θ_0^a = (0.3, 0.3, 0.3, 0.3, 1, 1) with θ = (λ, γ, ρ, α, β_1, β_2), and ϑ = 1. θ̂_nT is the QML estimator and θ̂ᶜ_nT is the bias corrected estimator from Theorem 4.

Table 2: Performance of the QML and Bias Corrected Estimators, θ0b

| n  | T  |        |      | λ̂        | γ̂        | ρ̂        | α̂        | β̂₁       | β̂₂       |
|----|----|--------|------|----------|----------|----------|----------|----------|----------|
| 25 | 25 | θ̂_nT  | Bias | 0.00238  | -0.00173 | -0.00118 | 0.02102  | 0.00194  | -0.00087 |
|    |    |        | CP   | 0.915    | 0.920    | 0.923    | 0.857    | 0.929    | 0.921    |
|    |    | θ̂ᶜ_nT | Bias | 0.00134  | -0.00152 | -0.00089 | 0.00276  | 0.00181  | -0.00113 |
|    |    |        | CP   | 0.917    | 0.925    | 0.921    | 0.905    | 0.928    | 0.920    |
| 25 | 49 | θ̂_nT  | Bias | 0.00159  | -0.00139 | -0.00189 | 0.02106  | -0.00003 | -0.00001 |
|    |    |        | CP   | 0.928    | 0.922    | 0.919    | 0.861    | 0.911    | 0.917    |
|    |    | θ̂ᶜ_nT | Bias | 0.00074  | -0.00130 | -0.00168 | 0.00294  | -0.00018 | -0.00024 |
|    |    |        | CP   | 0.934    | 0.930    | 0.920    | 0.928    | 0.913    | 0.919    |
| 25 | 81 | θ̂_nT  | Bias | 0.00253  | -0.00038 | -0.00042 | 0.02106  | 0.00101  | -0.00032 |
|    |    |        | CP   | 0.921    | 0.922    | 0.917    | 0.837    | 0.933    | 0.926    |
|    |    | θ̂ᶜ_nT | Bias | 0.00150  | -0.00026 | -0.00013 | 0.00234  | 0.00083  | -0.00060 |
|    |    |        | CP   | 0.928    | 0.929    | 0.915    | 0.935    | 0.932    | 0.929    |
| 49 | 25 | θ̂_nT  | Bias | 0.00108  | -0.00141 | -0.00039 | 0.00967  | -0.00025 | -0.00055 |
|    |    |        | CP   | 0.917    | 0.920    | 0.943    | 0.912    | 0.924    | 0.937    |
|    |    | θ̂ᶜ_nT | Bias | 0.00064  | -0.00125 | -0.00026 | 0.00028  | -0.00029 | -0.00065 |
|    |    |        | CP   | 0.919    | 0.930    | 0.942    | 0.925    | 0.926    | 0.937    |
| 49 | 49 | θ̂_nT  | Bias | 0.00133  | -0.00030 | -0.00057 | 0.01255  | -0.00037 | 0.00050  |
|    |    |        | CP   | 0.944    | 0.931    | 0.929    | 0.896    | 0.950    | 0.934    |
|    |    | θ̂ᶜ_nT | Bias | 0.00083  | -0.00026 | -0.00044 | 0.00285  | -0.00045 | 0.00037  |
|    |    |        | CP   | 0.945    | 0.934    | 0.930    | 0.935    | 0.951    | 0.933    |
| 49 | 81 | θ̂_nT  | Bias | 0.00036  | -0.00020 | -0.00037 | 0.01064  | 0.00066  | 0.00100  |
|    |    |        | CP   | 0.944    | 0.936    | 0.951    | 0.902    | 0.945    | 0.946    |
|    |    | θ̂ᶜ_nT | Bias | -0.00011 | -0.00015 | -0.00024 | 0.00065  | 0.00058  | 0.00087  |
|    |    |        | CP   | 0.944    | 0.941    | 0.952    | 0.950    | 0.945    | 0.944    |
| 81 | 25 | θ̂_nT  | Bias | 0.00020  | -0.00024 | -0.00109 | 0.00587  | 0.00036  | -0.00069 |
|    |    |        | CP   | 0.949    | 0.928    | 0.938    | 0.910    | 0.928    | 0.943    |
|    |    | θ̂ᶜ_nT | Bias | -0.00001 | -0.00020 | -0.00105 | 0.00003  | 0.00038  | -0.00072 |
|    |    |        | CP   | 0.949    | 0.933    | 0.938    | 0.920    | 0.930    | 0.944    |
| 81 | 49 | θ̂_nT  | Bias | 0.00081  | 0.00002  | -0.00057 | 0.00482  | 0.00017  | 0.00078  |
|    |    |        | CP   | 0.934    | 0.945    | 0.952    | 0.920    | 0.931    | 0.929    |
|    |    | θ̂ᶜ_nT | Bias | 0.00057  | 0.00002  | -0.00051 | -0.00124 | 0.00014  | 0.00072  |
|    |    |        | CP   | 0.938    | 0.950    | 0.954    | 0.945    | 0.930    | 0.930    |
| 81 | 81 | θ̂_nT  | Bias | 0.00011  | -0.00024 | 0.00003  | 0.00680  | 0.00019  | -0.00000 |
|    |    |        | CP   | 0.921    | 0.944    | 0.929    | 0.910    | 0.936    | 0.954    |
|    |    | θ̂ᶜ_nT | Bias | -0.00015 | -0.00020 | 0.00011  | 0.00075  | 0.00014  | -0.00007 |
|    |    |        | CP   | 0.921    | 0.944    | 0.929    | 0.938    | 0.937    | 0.953    |

The true parameters are θ_0^b = (−0.3, 0.3, 0.3, 0.3, 1, 1) with θ = (λ, γ, ρ, α, β_1, β_2), and ϑ = 1. θ̂_nT is the QML estimator and θ̂ᶜ_nT is the bias corrected estimator from Theorem 4.

Table 3: Performance of the Bias Corrected Estimator When the Number of Factors is Overspecified by 1

| θ_0  | n  | T  |      | λ̂        | γ̂        | ρ̂        | α̂        | β̂₁       | β̂₂       |
|------|----|----|------|----------|----------|----------|----------|----------|----------|
| θ_0^a | 25 | 25 | Bias | -0.00001 | -0.00178 | 0.00040  | 0.00327  | 0.00221  | -0.00208 |
|      |    |    | CP   | 0.875    | 0.882    | 0.875    | 0.829    | 0.894    | 0.874    |
|      | 25 | 49 | Bias | 0.00054  | -0.00088 | -0.00068 | 0.00597  | -0.00027 | 0.00013  |
|      |    |    | CP   | 0.896    | 0.890    | 0.903    | 0.872    | 0.879    | 0.891    |
|      | 25 | 81 | Bias | 0.00062  | -0.00051 | -0.00031 | 0.00692  | 0.00061  | -0.00073 |
|      |    |    | CP   | 0.904    | 0.914    | 0.907    | 0.863    | 0.912    | 0.907    |
|      | 49 | 25 | Bias | -0.00025 | -0.00125 | 0.00043  | -0.00111 | -0.00030 | -0.00115 |
|      |    |    | CP   | 0.891    | 0.905    | 0.892    | 0.880    | 0.900    | 0.902    |
|      | 49 | 49 | Bias | 0.00015  | -0.00038 | -0.00006 | 0.00425  | -0.00089 | 0.00033  |
|      |    |    | CP   | 0.930    | 0.917    | 0.911    | 0.902    | 0.929    | 0.916    |
|      | 49 | 81 | Bias | -0.00026 | -0.00001 | -0.00009 | 0.00165  | 0.00073  | 0.00077  |
|      |    |    | CP   | 0.928    | 0.915    | 0.935    | 0.919    | 0.924    | 0.933    |
|      | 81 | 25 | Bias | -0.00025 | 0.00000  | -0.00043 | -0.00082 | 0.00030  | -0.00096 |
|      |    |    | CP   | 0.909    | 0.916    | 0.903    | 0.885    | 0.902    | 0.928    |
|      | 81 | 49 | Bias | 0.00020  | 0.00018  | -0.00081 | -0.00115 | 0.00022  | 0.00067  |
|      |    |    | CP   | 0.936    | 0.929    | 0.941    | 0.935    | 0.919    | 0.918    |
|      | 81 | 81 | Bias | -0.00036 | -0.00019 | 0.00028  | 0.00081  | 0.00025  | -0.00018 |
|      |    |    | CP   | 0.919    | 0.937    | 0.908    | 0.914    | 0.929    | 0.947    |
| θ_0^b | 25 | 25 | Bias | 0.00322  | -0.00240 | -0.00164 | 0.00509  | 0.00270  | -0.00157 |
|      |    |    | CP   | 0.869    | 0.898    | 0.882    | 0.829    | 0.882    | 0.889    |
|      | 25 | 49 | Bias | 0.00167  | -0.00156 | -0.00218 | 0.00738  | -0.00004 | 0.00037  |
|      |    |    | CP   | 0.901    | 0.896    | 0.898    | 0.879    | 0.880    | 0.892    |
|      | 25 | 81 | Bias | 0.00212  | -0.00041 | -0.00031 | 0.00643  | 0.00093  | -0.00028 |
|      |    |    | CP   | 0.905    | 0.916    | 0.897    | 0.855    | 0.913    | 0.903    |
|      | 49 | 25 | Bias | 0.00138  | -0.00143 | -0.00064 | -0.00027 | -0.00026 | -0.00097 |
|      |    |    | CP   | 0.902    | 0.904    | 0.913    | 0.889    | 0.898    | 0.911    |
|      | 49 | 49 | Bias | 0.00091  | -0.00042 | -0.00042 | 0.00458  | -0.00079 | 0.00054  |
|      |    |    | CP   | 0.934    | 0.922    | 0.924    | 0.909    | 0.932    | 0.918    |
|      | 49 | 81 | Bias | 0.00011  | -0.00019 | -0.00041 | 0.00190  | 0.00064  | 0.00080  |
|      |    |    | CP   | 0.937    | 0.926    | 0.934    | 0.918    | 0.925    | 0.936    |
|      | 81 | 25 | Bias | 0.00005  | -0.00035 | -0.00112 | 0.00004  | 0.00017  | -0.00101 |
|      |    |    | CP   | 0.917    | 0.909    | 0.913    | 0.889    | 0.903    | 0.924    |
|      | 81 | 49 | Bias | 0.00051  | 0.00005  | -0.00058 | -0.00080 | 0.00017  | 0.00081  |
|      |    |    | CP   | 0.922    | 0.927    | 0.934    | 0.935    | 0.919    | 0.918    |
|      | 81 | 81 | Bias | -0.00014 | -0.00023 | 0.00010  | 0.00093  | 0.00015  | -0.00019 |
|      |    |    | CP   | 0.919    | 0.933    | 0.923    | 0.917    | 0.931    | 0.951    |

θ_0^a = (0.3, 0.3, 0.3, 0.3, 1, 1), θ_0^b = (−0.3, 0.3, 0.3, 0.3, 1, 1), and θ = (λ, γ, ρ, α, β_1, β_2). ϑ = 1. The DGP is the same as described in the text. The true number of factors is 2 and the estimation assumes 3 factors. θ̂ is the bias corrected QML estimator assuming 3 factors.

the accuracies of the eigenvalue ratio (ER) and eigenvalue growth ratio (GR) criteria of Ahn and Horenstein (2013), and of the edge distribution (ED) criterion of Onatski (2010). Because the ER, GR and ED criteria impose structures on the idiosyncratic errors and the residuals from our spatial panel regression have complicated forms, their theoretical justification remains to be studied. Figure 1 shows that the performances of the IC₁ and IC₂ criteria are generally comparable to those of ER, GR and ED. The selection accuracies of IC₁ and IC₂ approach 100% with moderate sample sizes, for example n = 49 and T = 25. PC₁, PC₂ and IC₃ perform less well in smaller samples, and PC₃ tends to overestimate the factor number. With higher ϑ, factors become relatively weaker, and we find that, as a result, the selection errors increase in Figures 2 and 3. Underestimation becomes more frequent when factors become weaker. However, the selection accuracy quickly improves as the sample size increases, and close to 100% accuracy is achieved in the sample with n = 100 and T = 100 for IC₁. Based on the simulations, IC₁ performs better in finite samples than the other criteria in Eq. (13).

For misspecification issues, Tables S.3 and S.4 in the supplementary file show that the estimators might not be consistent, as the biases are substantial, if factors or spatial effects exist but are ignored in the estimation. Such biases are rather severe for the estimates ρ̂ and α̂.

5. Empirical application: spatial spillovers in mortgage originations

Our empirical application is motivated by Haurin et al. (2016), which analyzes the effect of house prices on state-level origination rates of the Home Equity Conversion Mortgage (HECM), but does not consider spatial spillovers. The HECM is the predominant type of reverse mortgage, which enables senior homeowners to withdraw their home equity without a home sale or monthly payments.
HECMs are insured and regulated by the federal government, although the private market originates the loans. The insurance is provided by the Federal Housing Administration through the mutual mortgage insurance fund, which guarantees that the borrower has access to the loan fund in the future even if the lender is no longer in business, and that the lender is fully repaid when the loan terminates even if the house value is less than the loan balance. The borrower pays a mortgage insurance premium both at loan closing and monthly over the lifetime of the loan. Haurin et al. (2016) find that states with volatile past house prices and current house price levels above long-term norms have higher origination rates. This is consistent with the hypothesis that households use HECMs to insure against house price declines, and therefore the mortgage insurance should take this behavioral response to house price dynamics into account, as the insurance fund faces higher claim risk when the insured HECMs concentrate disproportionately in areas that are more likely to see house price declines. Observing that the origination rates exhibit spatial clustering, it is of interest to quantify the spatial

Figure 1: Percentages of rˆ < r0 , rˆ = r0 and rˆ > r0 , ϑ = 1.

True parameter values: θ0a = (0.3, 0.3, 0.3, 0.3, 1, 1), where θ = (λ , γ, ρ, α, β1 , β2 ). ϑ = 1. True number of factors: 2. Initial estimates assume 8 factors.


Figure 2: Percentages of rˆ < r0 , rˆ = r0 and rˆ > r0 , ϑ = 4.

True parameter values: θ0a = (0.3, 0.3, 0.3, 0.3, 1, 1), where θ = (λ , γ, ρ, α, β1 , β2 ). ϑ = 4. True number of factors: 2. Initial estimates assume 8 factors.


Figure 3: Percentages of rˆ < r0 , rˆ = r0 and rˆ > r0 , ϑ = 9.

True parameter values: θ0a = (0.3, 0.3, 0.3, 0.3, 1, 1), where θ = (λ , γ, ρ, α, β1 , β2 ). ϑ = 9. True number of factors: 2. Initial estimates assume 8 factors.
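For reference, the ER and GR statistics of Ahn and Horenstein (2013), whose selection accuracy Figures 1–3 report, can be sketched as follows. This is a minimal illustration, not the code used in the simulations; the function name and the stylized factor-model data in the usage example are placeholder assumptions.

```python
import numpy as np

def er_gr_factor_number(X, kmax):
    """Eigenvalue ratio (ER) and growth ratio (GR) factor-number
    estimators of Ahn and Horenstein (2013) for an n x T panel X."""
    n, T = X.shape
    m = min(n, T)
    # nonzero eigenvalues of XX'/(nT) (equivalently X'X/(nT)), descending
    S = X @ X.T if n <= T else X.T @ X
    mu = np.sort(np.linalg.eigvalsh(S / (n * T)))[::-1][:m]
    # ER(k) = mu_k / mu_{k+1}
    er = mu[:kmax] / mu[1:kmax + 1]
    # GR(k) = log(1 + mu*_k) / log(1 + mu*_{k+1}),
    # with mu*_k = mu_k / sum_{j > k} mu_j
    tails = np.cumsum(mu[::-1])[::-1]            # tails[i] = mu_i + mu_{i+1} + ...
    mustar = mu / np.append(tails[1:], np.finfo(float).tiny)
    gr = np.log1p(mustar[:kmax]) / np.log1p(mustar[1:kmax + 1])
    return int(np.argmax(er) + 1), int(np.argmax(gr) + 1)
```

With strong factors both statistics peak at the true factor number; as the factors weaken (higher ϑ in the figures), the peak at the true value becomes less pronounced, which is the underestimation pattern seen above.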

Figure 4: Average HECM Origination Rates by US Regions


Figure 5: Average House Price Deviations and Volatility by US Regions

spillover effect. If spatial effects are present, the HECM activity in a state can be affected by developments in the neighboring states. In addition to the contemporaneous spatial lag and spatial error, a spatial time lag, or diffusion, is also included, which can arise from partial adjustment (see Tao and Yu (2012) for a theoretical setup that can give rise to our empirical specification). Our data cover 51 states and 52 quarters from 2001 to 2013. Let $y_{it}$ denote the HECM origination rate, defined as the number of newly originated HECM loans in state $i$ at quarter $t$ as a percentage of the senior population (age 65 and over) of state $i$ from the 2010 census. The $n \times n$ spatial weights matrix is $W_n$, with $W_{n,ij} = 1$ if states $i$ and $j$ share a common border and $W_{n,ij} = 0$ otherwise. House price dynamic variables are constructed using the Federal Housing Finance Agency's quarterly all-transactions house price indexes (HPI), deflated by the CPI, and include deviations from the previous 9-year averages (hpi_dev), standard deviations of house price changes in the previous 9 years (hpi_v), and the interaction between the two. Figures 4 and 5 show the averages of these variables by U.S. region over our sample period. It is likely that the origination rates are affected by some macroeconomic factors, which are captured by the time factors $f_t$. It is also likely that macroeconomic factors have different impacts in different states, as captured by state-specific factor loadings. Note that the factor loadings may be spatially correlated, capturing residual spatial effects not directly modeled. The interactive effects include additive individual and time effects as special cases, hence state time-invariant variables and time variables invariant across states are not included. The model consists of
$$y_{it} = \lambda \sum_{j=1}^{n} W_{n,ij}\, y_{jt} + \gamma\, y_{i,t-1} + \rho \sum_{j=1}^{n} W_{n,ij}\, y_{j,t-1} + \mathrm{hpi\_dev}_{it}\,\beta_1 + \mathrm{hpi\_v}_{it}\,\beta_2 + (\mathrm{hpi\_dev}_{it} \times \mathrm{hpi\_v}_{it})\,\beta_3 + \Gamma_{n,i}'\, f_t + u_{it},$$
$$u_{it} = \alpha \sum_{j=1}^{n} W_{n,ij}\, u_{jt} + \varepsilon_{it}.$$

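The estimating equations above can be simulated directly, which is useful for checking an implementation. The sketch below is purely illustrative: the chain-contiguity weights matrix, parameter values, single regressor and one-factor structure are placeholder assumptions, not the empirical design.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, r = 30, 60, 1
lam, gam, rho, alpha = -0.05, 0.7, 0.05, 0.12     # illustrative values only

# binary contiguity weights on a chain of "states", row-normalized
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W /= W.sum(axis=1, keepdims=True)

S_inv = np.linalg.inv(np.eye(n) - lam * W)        # S_n(lambda)^{-1}
R_inv = np.linalg.inv(np.eye(n) - alpha * W)      # R_n(alpha)^{-1}

Gamma = rng.standard_normal((n, r))               # state-specific loadings
beta, x = 1.0, rng.standard_normal((n, T))        # one regressor for brevity

Y = np.zeros((n, T))
y_prev = np.zeros(n)
for t in range(T):
    f_t = rng.standard_normal(r)                  # common time factor
    u_t = R_inv @ rng.standard_normal(n)          # spatially correlated error
    # solve the simultaneous spatial equation: y_t = S_n^{-1}(...)
    Y[:, t] = S_inv @ (gam * y_prev + rho * (W @ y_prev)
                       + beta * x[:, t] + Gamma @ f_t + u_t)
    y_prev = Y[:, t]
```

Each period solves the simultaneity in the contemporaneous spatial lag by premultiplying with $S_n(\lambda)^{-1}$, while the spatial-error structure enters through $R_n(\alpha)^{-1}$.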
The preliminary estimates use 8 factors. Based on the factor number selection criteria, we estimate 1 unobserved factor in $y_{it}$. Table 4 reports the estimation results with one unobserved factor.

Table 4: Estimation of State-Level Origination Rates

                                   Coefficient    Estimate        SD
  Contemporaneous Spatial Effect   λ              −0.05527***     0.01426
  Own Time Lag                     γ               0.68981***     0.01263
  Spatial Diffusion                ρ               0.05405***     0.00998
  Spatial Effect in Disturbances   α               0.12180***     0.00808
  House Price Deviation            β1             −0.00025        0.00055
  House Price Volatility           β2              0.01346***     0.00153
  Deviation × Volatility           β3              0.03203***     0.00847

  The sample size is n = 51 and T = 52. Bias-corrected estimates are reported. σ̂ = 0.0063. *** p < 0.01, ** p < 0.05, * p < 0.1.

The results reveal interesting spatial patterns. Higher HECM activity in a state negatively influences neighboring states (λ). This is consistent with the hypothesis that lenders shift resources towards states with higher activity, resulting in lower activity in the neighboring states. On the other hand, the spatial error coefficient (α) has a positive sign, reflecting spatially correlated demand and supply effects. The own time lag (γ) has a positive effect, capturing serially correlated effects. Note that λ and the diffusion coefficient (ρ) have opposite signs, which is consistent with partial adjustment. Because it takes time for loan originations to complete, the observed HECM origination rate in a quarter may reflect a partial adjustment from the origination rate in the previous quarter towards some target level. Tao and Yu (2012) show that this implies the nonlinear restriction ρ + γλ = 0, so that λ and ρ have opposite signs when γ is positive. Intuitively, given the origination rate in the current quarter, a higher origination rate in the last quarter means a downward adjustment. If the adjustment is only partial, the target rate should be lower than the observed rate. With negative spillovers (λ < 0), this implies a positive correction (ρ > 0). In addition, the results show that states with high past house price volatility and current house prices above long-term averages have higher origination rates, consistent with the findings of Haurin et al. (2016).

6. Conclusion

Dynamic spatial panels with interactive effects are of practical interest. Outcomes of spatial units are correlated due to spatial interactions and common factors. The model under consideration has a rich spatial structure, which includes contemporaneous spatial interaction, spatial diffusion and spatial disturbances.
Unobserved interactive effects of individual loadings and time factors account for additional cross-sectional dependence and may be correlated with observed regressors. This paper shows that the QML method provides consistent and asymptotically normal estimators. There are asymptotic biases arising from the predetermined regressors and the interaction between the spatial effects and the factor loadings, but bias correction is possible. The Monte Carlo study shows that the proposed bias correction is effective in reducing bias. Consistency of the estimators may still hold when the number of factors is overspecified. There are various criteria which can determine the number of factors in a sample. An application of the model to reverse mortgage originations reveals interesting spatial patterns.

7. References

Ahn, S. C., Horenstein, A. R., 2013. Eigenvalue ratio test for the number of factors. Econometrica 81 (3), 1203–1227.

Ahn, S. C., Lee, Y. H., Schmidt, P., 2013. Panel data models with multiple time-varying individual effects. Journal of Econometrics 174, 1–14.

Amemiya, T., 1985. Advanced Econometrics. Harvard University Press.

Andrews, D. W. K., 1987. Consistency in nonlinear econometric models: A generic uniform law of large numbers. Econometrica 55 (6), 1465–1471.

Anselin, L., 1988. Spatial Econometrics: Methods and Models. Springer.

Bai, J., 2003. Inferential theory for factor models of large dimensions. Econometrica 71 (1), 135–171.

Bai, J., 2009. Panel data models with interactive fixed effects. Econometrica 77 (4), 1229–1279.

Bai, J., Li, K., 2015. Dynamic spatial panel data models with common shocks. Manuscript, Columbia University.

Bai, J., Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica 70 (1), 191–221.

Bai, J., Ng, S., 2008. Large dimensional factor analysis. Foundations and Trends in Econometrics 3 (2), 89–163.

Bernstein, D., 2009. Matrix Mathematics: Theory, Facts, and Formulas. Princeton University Press, Princeton, NJ.

Cliff, A. D., Ord, J. K., 1973. Spatial Autocorrelation. Pion, London.


Conley, T. G., Dupor, B., 2003. A spatial analysis of sectoral complementarity. Journal of Political Economy 111, 311–352.

Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2004. The generalized dynamic factor model: consistency and rates. Journal of Econometrics 119, 231–255.

Han, X., Lee, L.-F., 2016. Bayesian analysis of spatial panel autoregressive models with time-varying endogenous spatial weight matrices, common factors, and random coefficients. Journal of Business & Economic Statistics 34, 642–660.

Haurin, D., Ma, C., Moulton, S., Schmeiser, M., Seligman, J., Shi, W., 2016. Spatial variation in reverse mortgage usage: House price dynamics and consumer selection. Journal of Real Estate Finance and Economics 53, 392–417.

Hausman, J. A., Taylor, W. E., 1981. Panel data and unobservable individual effects. Econometrica 49, 1377–1398.

Kato, T., 1995. Perturbation Theory for Linear Operators. Springer-Verlag.

Kelejian, H. H., Prucha, I. R., 2001. On the asymptotic distribution of the Moran I test statistic with applications. Journal of Econometrics 104, 219–257.

Kelejian, H. H., Prucha, I. R., 2010. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Journal of Econometrics 157, 53–67.

Latala, R., 2005. Some estimates of norms of random matrices. Proceedings of the American Mathematical Society 133 (5), 1273–1282.

Lee, L.-F., 2004a. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72 (6), 1899–1925.

Lee, L.-F., 2004b. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models: online appendix. URL http://www.econ.ohio-state.edu/lee/wp/sar-qml-r-appen-04feb.pdf

Lee, L.-F., Yu, J., 2010a. Some recent developments in spatial panel data models. Regional Science and Urban Economics 40, 255–271.


Lee, L.-F., Yu, J., 2010b. A spatial dynamic panel data model with both time and individual fixed effects. Econometric Theory 26, 564–597.

Lee, L.-F., Yu, J., 2013. Spatial Panel Data Models. The Oxford Handbooks: Panel Data. Oxford University Press, Oxford, England.

Lee, L.-F., Yu, J., 2015. Estimation of fixed effects panel regression models with separable and nonseparable space-time filters. Journal of Econometrics 184, 174–192.

Lee, L.-F., Yu, J., 2016. Identification of spatial Durbin panel models. Journal of Applied Econometrics 31, 133–162.

Lin, X., Lee, L.-F., 2010. GMM estimation of spatial autoregressive models with unknown heteroskedasticity. Journal of Econometrics 157, 34–52.

Liu, S. F., Yang, Z., 2015. Modified QML estimation of spatial autoregressive models with unknown heteroskedasticity and nonnormality. Regional Science and Urban Economics 52, 50–70.

Liu, X., Lee, L.-F., 2010. GMM estimation of social interaction models with centrality. Journal of Econometrics 159, 99–115.

Lu, X., Su, L., 2016. Shrinkage estimation of dynamic panel data models with interactive fixed effects. Journal of Econometrics 190, 148–175.

Moon, H. R., Weidner, M., 2015a. Dynamic linear panel regression models with interactive fixed effects. Forthcoming, Econometric Theory.

Moon, H. R., Weidner, M., 2015b. Linear regression for panel with unknown number of factors as interactive fixed effects. Econometrica 83 (4), 1543–1579.

Onatski, A., 2010. Determining the number of factors from empirical distribution of eigenvalues. Review of Economics and Statistics 92, 1004–1016.

Ord, K., 1975. Estimation methods for models of spatial interaction. Journal of the American Statistical Association 70, 120–126.

Pesaran, M. H., 2006. Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74, 967–1012.


Pesaran, M. H., Tosetti, E., 2011. Large panels with common factors and spatial correlation. Journal of Econometrics 161, 182–202.

Stock, J. H., Watson, M. W., 2002. Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics 20 (2), 147–162.

Su, L., Yang, Z., 2015. QML estimation of dynamic panel data models with spatial errors. Journal of Econometrics 185, 230–258.

Tao, J., Yu, J., 2012. The spatial time lag in panel data models. Economics Letters 117, 544–547.

Wu, C.-F., 1981. Asymptotic theory of nonlinear least squares estimation. The Annals of Statistics 9 (3), 501–513.

Yu, J., De Jong, R., Lee, L.-F., 2008. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. Journal of Econometrics 146, 118–134.

Zhang, F., 2011. Matrix Theory: Basic Results and Techniques. Springer.


Appendix A. Some Matrix Algebra and Useful Lemmas

This section lists some results in matrix algebra used frequently throughout the paper. All matrices are real. Lemma 1 on the uniform boundedness of some matrices is in Lee (2004b).

Lemma 1. Suppose that $\|W_n\|$ and $\|S_n^{-1}\|$ are uniformly bounded in some matrix norm $\|\cdot\|$, where $S_n = I_n - \lambda_0 W_n$. Then $S_n(\lambda)^{-1}$ is uniformly bounded for $|\lambda - \lambda_0| < \frac{1}{c^2}$, where $S_n(\lambda) = I_n - \lambda W_n$ and $c = \max\left\{\limsup_n \|W_n\|,\ \limsup_n \|S_n^{-1}\|\right\}$.

The following results on matrix norms are standard; see Bernstein (2009). Let $\mu_i(M)$ denote the $i$-th

largest eigenvalue of a symmetric matrix $M$, so that $\mu_n(M) \le \mu_{n-1}(M) \le \cdots \le \mu_1(M)$.

Lemma 2. Let $A$ be an $n \times m$ matrix, $B$ an $m \times p$ matrix, and $C$ and $D$ $n \times n$ matrices. Then
1. $\|A\|_2 \le \|A\|_F \le \|A\|_2\, \mathrm{rank}(A)^{1/2}$; and $\|A\|_2^2 \le \|A\|_1 \|A\|_\infty$;
2. $\|AB\|_F \le \|A\|_F \|B\|_2 \le \|A\|_F \|B\|_F$; and $|\mathrm{tr}(AB)| \le \|A\|_F \|B\|_F$ if $n = p$;
3. $|\mathrm{tr}(C)| \le \mathrm{rank}(C)\, \|C\|$, where $\|\cdot\|$ is an induced matrix norm;
4. $\mu_i(C)\mu_1(D) \ge \mu_i(CD) \ge \mu_i(C)\mu_n(D)$, where $C$ and $D$ are $n \times n$ symmetric positive semidefinite matrices.

We have the following results on matrix norms for the $n \times T$ dimensional explanatory variables, of which the proofs

are in the supplementary file.
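As a quick numerical sanity check of the inequalities in Lemma 2 (illustrative only, not part of the formal argument; the matrix sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 5
A = rng.standard_normal((n, m))
B = rng.standard_normal((m, n))

spec = lambda M: np.linalg.norm(M, 2)        # spectral norm ||.||_2
fro  = lambda M: np.linalg.norm(M, 'fro')    # Frobenius norm ||.||_F

# Lemma 2(1): ||A||_2 <= ||A||_F <= ||A||_2 rank(A)^{1/2}; ||A||_2^2 <= ||A||_1 ||A||_inf
assert spec(A) <= fro(A) + 1e-9
assert fro(A) <= spec(A) * np.sqrt(np.linalg.matrix_rank(A)) + 1e-9
assert spec(A) ** 2 <= np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf) + 1e-9

# Lemma 2(2): ||AB||_F <= ||A||_F ||B||_2 and |tr(AB)| <= ||A||_F ||B||_F
assert fro(A @ B) <= fro(A) * spec(B) + 1e-9
assert abs(np.trace(A @ B)) <= fro(A) * fro(B) + 1e-9

# Lemma 2(4): for symmetric PSD C, D: mu_i(C) mu_n(D) <= mu_i(CD) <= mu_i(C) mu_1(D)
C = A @ A.T
D = rng.standard_normal((n, n)); D = D @ D.T
muC = np.sort(np.linalg.eigvalsh(C))[::-1]
muCD = np.sort(np.linalg.eigvals(C @ D).real)[::-1]   # eigenvalues of CD are real, >= 0
dmax, dmin = np.linalg.eigvalsh(D).max(), np.linalg.eigvalsh(D).min()
assert np.all(muCD <= muC * dmax + 1e-9) and np.all(muCD >= muC * dmin - 1e-9)
```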

Lemma 3. Under Assumptions E and R, $\|Z_k\|_2 = O_P\!\left(\sqrt{nT}\right)$, $\|Z_k\|_F = O_P\!\left(\sqrt{nT}\right)$, and $\mathrm{tr}\!\left((A_nZ_k)'\varepsilon\right) = O_P\!\left(\sqrt{nT}\right)$ for $k = 1, \cdots, K+1$, where $A_n$ is an $n \times n$ nonstochastic UB matrix.

Lemma 4. Define $\bar{Z}_k = E(Z_k \mid \mathcal{C}_{nT})$, where the conditioning set $\mathcal{C}_{nT}$ is the sigma algebra generated by $X_{n1}, \cdots, X_{nT}$, $\tilde{\Gamma}_n$ and $F_T$. Under Assumptions E, R and SF, we have
(1) $\left\|P_{\tilde{\Gamma}_n,F_T}\right\|_2 = O_P\!\left(\frac{1}{\sqrt{nT}}\right)$, where $P_{\tilde{\Gamma}_n,F_T} = \tilde{\Gamma}_n(\tilde{\Gamma}_n'\tilde{\Gamma}_n)^{-1}(F_T'F_T)^{-1}F_T'$; and $\left\|P_{\tilde{\Gamma}_n}\varepsilon P_{F_T}\right\|_2 = O_P(1)$;
(2) for $k = 1, \cdots, K+1$, $\left\|Z_k - \bar{Z}_k\right\|_2 = O_P\!\left(\max\!\left(\sqrt{n}, \sqrt{T}\right)\right)$ and $\left\|Z_k\varepsilon'P_{\tilde{\Gamma}_n}\right\|_2 = O_P\!\left(\max(n,T)\right)$.

Lemma 5. Under Assumptions E, R and SF, $\frac{1}{\sqrt{nT}}\left[\mathrm{tr}\!\left(Q_n(Z_k - \bar{Z}_k)P_{F_T}\varepsilon'\right) - E\!\left(\mathrm{tr}\!\left(Q_n(Z_k - \bar{Z}_k)P_{F_T}\varepsilon'\right) \mid \mathcal{C}_{nT}\right)\right] = O_P\!\left(\frac{1}{\sqrt{T}}\right)$ for all $k = 1, \cdots, K+1$, where $Q_n$ is any $n \times n$ UB matrix.

Lemma 6. Under Assumptions E, R and SF, we have
(1) $\frac{1}{\sqrt{nT}}\left[\mathrm{tr}\!\left(P_{\tilde{\Gamma}_n}\varepsilon\varepsilon'\right) - \sigma_0^2 T r_0\right] = O_P\!\left(\frac{1}{\sqrt{n}}\right)$;
(2) $\frac{1}{\sqrt{nT}}\left[\mathrm{tr}\!\left(P_{\tilde{\Gamma}_n}R_nG_nR_n^{-1}\varepsilon\varepsilon'\right) - \sigma_0^2 T\,\mathrm{tr}\!\left(P_{\tilde{\Gamma}_n}R_nG_nR_n^{-1}\right)\right] = O_P\!\left(\frac{1}{\sqrt{n}}\right)$; $\frac{1}{\sqrt{nT}}\left[\mathrm{tr}\!\left(R_nG_nR_n^{-1}P_{\tilde{\Gamma}_n}\varepsilon\varepsilon'\right) - \sigma_0^2 T\,\mathrm{tr}\!\left(P_{\tilde{\Gamma}_n}R_nG_nR_n^{-1}\right)\right] = O_P\!\left(\frac{1}{\sqrt{n}}\right)$; and $\frac{1}{\sqrt{nT}}\left[\mathrm{tr}\!\left(P_{\tilde{\Gamma}_n}R_nG_nR_n^{-1}P_{\tilde{\Gamma}_n}\varepsilon\varepsilon'\right) - \sigma_0^2 T\,\mathrm{tr}\!\left(P_{\tilde{\Gamma}_n}R_nG_nR_n^{-1}\right)\right] = O_P\!\left(\frac{1}{\sqrt{n}}\right)$;
(3) $\frac{1}{\sqrt{nT}}\left[\mathrm{tr}\!\left(P_{\tilde{\Gamma}_n}\tilde{G}_n\varepsilon\varepsilon'\right) - \sigma_0^2 T\,\mathrm{tr}\!\left(P_{\tilde{\Gamma}_n}\tilde{G}_n\right)\right] = O_P\!\left(\frac{1}{\sqrt{n}}\right)$; $\frac{1}{\sqrt{nT}}\left[\mathrm{tr}\!\left(\tilde{G}_nP_{\tilde{\Gamma}_n}\varepsilon\varepsilon'\right) - \sigma_0^2 T\,\mathrm{tr}\!\left(P_{\tilde{\Gamma}_n}\tilde{G}_n\right)\right] = O_P\!\left(\frac{1}{\sqrt{n}}\right)$; and $\frac{1}{\sqrt{nT}}\left[\mathrm{tr}\!\left(P_{\tilde{\Gamma}_n}\tilde{G}_nP_{\tilde{\Gamma}_n}\varepsilon\varepsilon'\right) - \sigma_0^2 T\,\mathrm{tr}\!\left(P_{\tilde{\Gamma}_n}\tilde{G}_n\right)\right] = O_P\!\left(\frac{1}{\sqrt{n}}\right)$.
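The centering term in Lemma 6(1) uses the fact that, for i.i.d. errors and a rank-$r_0$ projection matrix $P$, $E[\mathrm{tr}(P\varepsilon\varepsilon')] = \sigma_0^2 T r_0$. A small simulation check (illustrative only; the loading matrix below is an arbitrary stand-in for $\tilde{\Gamma}_n$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, r, sigma2 = 200, 200, 3, 1.0

G = rng.standard_normal((n, r))                 # stand-in loading matrix
P = G @ np.linalg.solve(G.T @ G, G.T)           # rank-r projection onto col(G)

eps = np.sqrt(sigma2) * rng.standard_normal((n, T))
stat = np.trace(P @ eps @ eps.T)                # tr(P eps eps')

# E[tr(P eps eps')] = sigma^2 * T * r; one draw concentrates around that mean
print(stat / (sigma2 * T * r))                  # close to 1 for large n, T
```

The ratio printed above fluctuates around 1 with standard deviation of order $1/\sqrt{Tr}$, matching the concentration rate asserted in the lemma.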

Appendix B. Identification and Consistency: Proofs of Propositions 1–3

Proof of Proposition 1. Dropping constant terms, the expected objective function is equivalent to
$$Q_{nT}(\theta,\tilde{\Gamma}_n,F_T) = \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\left(\frac{1}{nT}E\,\mathrm{tr}\left[\left(R_n(\alpha)\left(S_n(\lambda)Y - \sum_{k=1}^{K}Z_k\delta_k\right) - \tilde{\Gamma}_nF_T'\right)\left(R_n(\alpha)\left(S_n(\lambda)Y - \sum_{k=1}^{K}Z_k\delta_k\right) - \tilde{\Gamma}_nF_T'\right)'\right]\right),$$
with $\tilde{\Gamma}_n = R_n(\alpha)\Gamma_n$ and $\tilde{\Gamma}_n \in \mathbb{R}^{n\times r}$, because no restriction is imposed on $\Gamma_n$ and $R_n(\alpha)$ is invertible for $\alpha \in \Theta_\alpha$. Denote $\tilde{\Gamma}_{n0} = R_n\Gamma_{n0}$. Thus $Q_{nT}(\theta_0,\tilde{\Gamma}_{n0},F_{T0}) = \frac{1}{n}\log|S_nR_n| - \frac{1}{2}\log\sigma_0^2$. Substituting in the DGP of $Y$,
$$R_n(\alpha)\left(S_n(\lambda)Y - \sum_{k=1}^{K}Z_k\delta_k\right) - \tilde{\Gamma}_nF_T' = R_n(\alpha)\left(S_n(\lambda)S_n^{-1}\Gamma_{n0}F_{T0}' + S_n(\lambda)S_n^{-1}U + \sum_{k=1}^{K+1}Z_k(\theta_{k0}-\theta_k)\right) - \tilde{\Gamma}_nF_T',$$
where for simplicity $\theta_k = \delta_k$ for $k = 1,\cdots,K$, $\theta_{K+1} = \lambda$, and $Z_{K+1} = G_n\sum_{k=1}^{K}Z_k\theta_{k0}$. Under Assumption E, as the idiosyncratic disturbances are independent of the interactive effects and the regressors,
$$\begin{aligned}
Q_{nT}(\theta,\tilde{\Gamma}_n,F_T) \le{}& \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\Bigg(\frac{\sigma_0^2}{n}\mathrm{tr}\left(R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'\right) \\
&+ \frac{1}{nT}E\,\mathrm{tr}\left[M_{\tilde{\Gamma}_n}R_n(\alpha)\left(S_n(\lambda)S_n^{-1}\Gamma_{n0}F_{T0}' + \sum_{k=1}^{K+1}Z_k(\theta_{k0}-\theta_k)\right)\left(S_n(\lambda)S_n^{-1}\Gamma_{n0}F_{T0}' + \sum_{k=1}^{K+1}Z_k(\theta_{k0}-\theta_k)\right)'R_n(\alpha)'M_{\tilde{\Gamma}_n}\right]\Bigg) \\
\le{}& \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\Bigg(\frac{\sigma_0^2}{n}\mathrm{tr}\left(R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'\right) \\
&+ \frac{1}{nT}E\,\mathrm{tr}\left[M_{\tilde{\Gamma}_n}R_n(\alpha)\left(\sum_{k=1}^{K+1}Z_k(\theta_{k0}-\theta_k)\right)M_{F_{T0}}\left(\sum_{k=1}^{K+1}Z_k(\theta_{k0}-\theta_k)\right)'R_n(\alpha)'M_{\tilde{\Gamma}_n}\right]\Bigg) \\
={}& \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\Bigg(\frac{\sigma_0^2}{n}\mathrm{tr}\left(R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'\right) \\
&+ \frac{1}{nT}\sum_{k_1,k_2=1}^{K+1}E\left[\mathrm{vec}(Z_{k_1})'\left(M_{F_{T0}} \otimes R_n(\alpha)'M_{\tilde{\Gamma}_n}R_n(\alpha)\right)\mathrm{vec}(Z_{k_2})\right](\theta_{k_10}-\theta_{k_1})(\theta_{k_20}-\theta_{k_2})\Bigg).
\end{aligned}$$

• Case 1: Assumption ID1 holds.


If $\delta \ne \delta_0$ or $\lambda \ne \lambda_0$, because $E\left[z'\left(M_{F_{T0}} \otimes R_n(\alpha)'M_{\tilde{\Gamma}_n}R_n(\alpha)\right)z\right]$ with $z = \left(\mathrm{vec}(Z_1), \cdots, \mathrm{vec}(Z_{K+1})\right)$ is positive definite,
$$\begin{aligned}
Q_{nT}(\theta,\tilde{\Gamma}_n,F_T) &< \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\left(\frac{\sigma_0^2}{n}\mathrm{tr}\left(R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'\right)\right) \\
&\le \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\left(\sigma_0^2\left|R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'\right|^{1/n}\right) \\
&= -\frac{1}{2}\log\sigma_0^2 + \frac{1}{n}\log|S_nR_n| = Q_{nT}(\theta_0,\tilde{\Gamma}_{n0},F_{T0}).
\end{aligned}$$
Therefore $\delta_0$ and $\lambda_0$ are identified. At $\delta = \delta_0$ and $\lambda = \lambda_0$,
$$Q_{nT}(\theta,\tilde{\Gamma}_n,F_T) \le \frac{1}{n}\log|S_n| + \frac{1}{n}\log|R_n(\alpha)| - \frac{1}{2}\log\left(\sigma_0^2\left|R_n(\alpha)R_n^{-1}R_n^{-1\prime}R_n(\alpha)'\right|^{1/n}\right) = Q_{nT}(\theta_0,\tilde{\Gamma}_{n0},F_{T0}).$$
If $\alpha \ne \alpha_0$, this inequality is strict by Assumption ID1(2). Therefore $\alpha_0$ is identified.

• Case 2: Assumption ID2 holds.

Because $\tilde{z}'\left(M_{F_{T0}} \otimes R_n(\alpha)'M_{\tilde{\Gamma}_n}R_n(\alpha)\right)\tilde{z}$ with $\tilde{z} = \left(\mathrm{vec}(Z_1), \cdots, \mathrm{vec}(Z_{K+1})\right)$ is positive semi-definite,
$$Q_{nT}(\theta,\tilde{\Gamma}_n,F_T) \le \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\left(\sigma_0^2\left|R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'\right|^{1/n}\right) = -\frac{1}{2}\log\sigma_0^2 + \frac{1}{n}\log|S_nR_n| = Q_{nT}(\theta_0,\tilde{\Gamma}_{n0},F_{T0}).$$
$\lambda_0$ and $\alpha_0$ are identified, because this inequality holds strictly for $\lambda \ne \lambda_0$ or $\alpha \ne \alpha_0$ under Assumption ID2(2). With $\lambda = \lambda_0$, $\delta$ is then identified, because this inequality is also strict for $\delta \ne \delta_0$, due to $E\left[z'\left(M_{F_{T0}} \otimes R_n(\alpha)'M_{\tilde{\Gamma}_n}R_n(\alpha)\right)z\right]$ with $z = \left(\mathrm{vec}(Z_1), \cdots, \mathrm{vec}(Z_K)\right)$ being positive definite under Assumption ID2(1).

Therefore $\delta_0$, $\lambda_0$ and $\alpha_0$ are identified under either Assumption ID1 or ID2. With $\delta = \delta_0$, $\lambda = \lambda_0$, and $\alpha = \alpha_0$,
$$Q_{nT}(\theta_0,\tilde{\Gamma}_n,F_T) = \frac{1}{n}\log|S_nR_n| - \frac{1}{2}\log\left(\sigma_0^2 + \frac{1}{nT}\mathrm{tr}\left[\left(\tilde{\Gamma}_{n0}F_{T0}' - \tilde{\Gamma}_nF_T'\right)\left(\tilde{\Gamma}_{n0}F_{T0}' - \tilde{\Gamma}_nF_T'\right)'\right]\right),$$
which is strictly less than $Q_{nT}(\theta_0,\tilde{\Gamma}_{n0},F_{T0})$ unless $\tilde{\Gamma}_nF_T' = \tilde{\Gamma}_{n0}F_{T0}'$. As a result, the number of factors $r_0$ is identified from the rank of $\tilde{\Gamma}_{n0}F_{T0}' = R_n\Gamma_{n0}F_{T0}'$.

Proof of Proposition 2. Instead of considering the positive eigenvalues of $\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'$, one may consider the relevant eigenvalues of $\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)$, because these two matrices have the same nonzero eigenvalues, counting multiplicity (see Theorem 2.8 in Zhang (2011)). There are $T$ eigenvalues of $\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z) = \frac{1}{nT}(I_T\otimes\eta')Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}(I_T\otimes\eta)$. Because $\eta \in \mathcal{B}_{K+1}$, $(I_T\otimes\eta)'(I_T\otimes\eta) = \|\eta\|_2^2 I_T = I_T$. By the Poincaré interlacing theorem (see Zhang (2011), Theorem 8.10 and p. 271), $\mu_{i+KT}\left(\frac{1}{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}\right) \le \mu_i\left(\frac{1}{nT}(I_T\otimes\eta')Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}(I_T\otimes\eta)\right)$ for $i = 1,\cdots,T$. Because $\frac{1}{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}$ is of dimension $(T+KT)\times(T+KT)$, its $T$ smallest eigenvalues (including multiplicity) are less than or equal to the $T$ corresponding eigenvalues of $\frac{1}{nT}(I_T\otimes\eta')Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}(I_T\otimes\eta)$ uniformly in $\eta \in \mathcal{B}_{K+1}$. A way to see this is the following:

1. For each $\eta$, let $n_\eta$ be the number of positive eigenvalues of $\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'$. Suppose $n_\eta \ge 2r+1$; then $\sum_{i=2r+1}^{n}\mu_i\left(\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'\right) = \sum_{i=2r+1}^{n_\eta}\mu_i\left(\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'\right)$.

2. There are $n_\eta$ positive eigenvalues of $\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)$, and $\mu_i\left(\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)\right) = \mu_i\left(\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'\right)$ for $i = 1,\cdots,n_\eta$. The remaining $T - n_\eta$ eigenvalues of $\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)$ are zero. Note that it is necessary that $n_\eta \le \min\{n,T\}$.

3. Because $\mu_{i+KT}\left(\frac{1}{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}\right) \le \mu_i\left(\frac{1}{nT}(I_T\otimes\eta')Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}(I_T\otimes\eta)\right) = \mu_i\left(\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)\right)$ for $i = 2r+1,\cdots,n_\eta$, it follows that
$$\sum_{i=2r+1+KT}^{T+KT}\mu_i\left(\frac{1}{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}\right) \le \sum_{i=2r+1}^{T}\mu_i\left(\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)\right) = \sum_{i=2r+1}^{n_\eta}\mu_i\left(\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)\right).$$

Because $\frac{1}{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}$ and $\frac{1}{nT}Z_{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)$ have the same positive eigenvalues, counting multiplicity,
$$\sum_{i=2r+1+KT}^{T+KT}\mu_i\left(\frac{1}{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}\right) = \sum_{i=2r+1+KT}^{n}\mu_i\left(\frac{1}{nT}Z_{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)\right) \ge \sum_{i=2r+1+KT}^{n}\mu_i\left(\frac{1}{nT}Z_{nT}Z_{nT}'\right)\mu_n\left(R_n(\alpha)'R_n(\alpha)\right),$$
where the last inequality is due to Lemma 2(4). Therefore, a set of sufficient conditions for Assumption NC1(1) is $\sum_{i=2r+1+KT}^{n}\mu_i\left(\frac{1}{nT}Z_{nT}Z_{nT}'\right) > 0$ and $\mu_n\left(R_n(\alpha)'R_n(\alpha)\right) > 0$ wpa 1 as $n,T\to\infty$.
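The Poincaré (Cauchy) interlacing step used in the proof above states that, for a symmetric $n \times n$ matrix $A$ and a matrix $V$ with $k$ orthonormal columns, $\mu_{i+(n-k)}(A) \le \mu_i(V'AV) \le \mu_i(A)$. A minimal numerical illustration (not part of the proof; the matrices are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 12, 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                                   # symmetric matrix
V, _ = np.linalg.qr(rng.standard_normal((n, k)))    # orthonormal columns

mu_A = np.sort(np.linalg.eigvalsh(A))[::-1]         # eigenvalues, descending
mu_C = np.sort(np.linalg.eigvalsh(V.T @ A @ V))[::-1]

# Poincare interlacing: mu_{i+(n-k)}(A) <= mu_i(V'AV) <= mu_i(A)
for i in range(k):
    assert mu_A[i + n - k] - 1e-9 <= mu_C[i] <= mu_A[i] + 1e-9
```

In the proof, $V = (I_T \otimes \eta)$ plays the role of the orthonormal-column matrix, which is why the smallest eigenvalues of the compressed matrix are bounded below uniformly in $\eta$.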

Proof of Proposition 3. Maximizing Eq. (4) with respect to $\Gamma_n$ and $F_T$ is equivalent to
$$\begin{aligned}
&\min_{\Gamma_n\in\mathbb{R}^{n\times r},\,F_T\in\mathbb{R}^{T\times r}} \frac{1}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)S_n^{-1}(Z_{nt}\delta_0+U_{nt}) - Z_{nt}\delta + S_n(\lambda)S_n^{-1}\Gamma_{n0}f_{t0} - \Gamma_nf_t\right)'R_n(\alpha)'R_n(\alpha)\left(S_n(\lambda)S_n^{-1}(Z_{nt}\delta_0+U_{nt}) - Z_{nt}\delta + S_n(\lambda)S_n^{-1}\Gamma_{n0}f_{t0} - \Gamma_nf_t\right) \\
&\ge \min_{\tilde{\Gamma}_n\in\mathbb{R}^{n\times 2r},\,\tilde{F}_T\in\mathbb{R}^{T\times 2r}} \frac{1}{nT}\sum_{t=1}^{T}\left[R_n(\alpha)\left(S_n(\lambda)S_n^{-1}(Z_{nt}\delta_0+U_{nt}) - Z_{nt}\delta\right) - \tilde{\Gamma}_n\tilde{f}_t\right]'\left[R_n(\alpha)\left(S_n(\lambda)S_n^{-1}(Z_{nt}\delta_0+U_{nt}) - Z_{nt}\delta\right) - \tilde{\Gamma}_n\tilde{f}_t\right] \\
&= \min_{\tilde{\Gamma}_n\in\mathbb{R}^{n\times 2r}} \frac{1}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)S_n^{-1}(Z_{nt}\delta_0+U_{nt}) - Z_{nt}\delta\right)'R_n(\alpha)'M_{\tilde{\Gamma}_n}R_n(\alpha)\left(S_n(\lambda)S_n^{-1}(Z_{nt}\delta_0+U_{nt}) - Z_{nt}\delta\right) \quad \text{(B.1)} \\
&\ge \min_{\tilde{\Gamma}_n\in\mathbb{R}^{n\times 2r}} \mathrm{tr}\left(\frac{1}{nT}\sum_{t=1}^{T}R_n(\alpha)\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0 - Z_{nt}\delta\right)\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0 - Z_{nt}\delta\right)'R_n(\alpha)'M_{\tilde{\Gamma}_n}\right) \\
&\quad + \frac{2}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0 - Z_{nt}\delta\right)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}U_{nt} + \frac{1}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)S_n^{-1}U_{nt}\right)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}U_{nt} \\
&\quad - \max_{\tilde{\Gamma}_n\in\mathbb{R}^{n\times 2r}}\mathrm{tr}\left(\frac{2}{nT}\sum_{t=1}^{T}R_n(\alpha)S_n(\lambda)S_n^{-1}U_{nt}\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0 - Z_{nt}\delta\right)'R_n(\alpha)'P_{\tilde{\Gamma}_n}\right) \\
&\quad - \max_{\tilde{\Gamma}_n\in\mathbb{R}^{n\times 2r}}\mathrm{tr}\left(\frac{1}{nT}\sum_{t=1}^{T}R_n(\alpha)S_n(\lambda)S_n^{-1}U_{nt}\left(S_n(\lambda)S_n^{-1}U_{nt}\right)'R_n(\alpha)'P_{\tilde{\Gamma}_n}\right). \quad \text{(B.2)}
\end{aligned}$$

The first inequality above follows because the value of the minimization problem can be no less than in the case where we were also able to optimally choose $S_n(\lambda)S_n^{-1}\Gamma_{n0}$ and $F_{T0}$. Eq. (B.1) is obtained by concentrating out the factor $\tilde{f}_t$. Because there is no restriction on $\tilde{\Gamma}_n \in \mathbb{R}^{n\times 2r}$, optimization with respect to $R_n(\alpha)\tilde{\Gamma}_n$ is equivalent to optimization with respect to $\tilde{\Gamma}_n$ in Eq. (B.1). Now we examine the terms in Eq. (B.2) one by one. Denote $\eta = \left(\delta_0' - \delta',\ \lambda_0 - \lambda\right)'$. Then
$$\min_{\tilde{\Gamma}_n\in\mathbb{R}^{n\times 2r}}\mathrm{tr}\left(\frac{1}{nT}\sum_{t=1}^{T}R_n(\alpha)\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0 - Z_{nt}\delta\right)\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0 - Z_{nt}\delta\right)'R_n(\alpha)'M_{\tilde{\Gamma}_n}\right) = \sum_{i=2r+1}^{n}\mu_i\left(\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'\right) \ge b\|\eta\|_1^2, \quad \text{(B.3)}$$
for some constant $b > 0$ wpa 1 as $n,T\to\infty$, by Assumption NC1. When $G_nZ_{nt}\delta_0 = Z_{nt}C$ for a constant vector $C$, Assumption NC2 can be used in (B.3). In this case, $S_n(\lambda)S_n^{-1}Z_{nt}\delta_0 - Z_{nt}\delta = Z_{nt}(\delta_0-\delta) + Z_{nt}C(\lambda_0-\lambda) = Z_{nt}\eta^*$, where $\eta^* = \delta_0 - \delta + C(\lambda_0-\lambda)$. Denote $\eta^*\cdot Z^* = \sum_{k=1}^{K}\eta_k^*Z_k$; by Assumption NC2,
$$\min_{\tilde{\Gamma}_n\in\mathbb{R}^{n\times 2r}}\mathrm{tr}\left(\frac{1}{nT}\sum_{t=1}^{T}R_n(\alpha)\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0 - Z_{nt}\delta\right)\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0 - Z_{nt}\delta\right)'R_n(\alpha)'M_{\tilde{\Gamma}_n}\right) = \sum_{i=2r+1}^{n}\mu_i\left(\frac{1}{nT}R_n(\alpha)(\eta^*\cdot Z^*)(\eta^*\cdot Z^*)'R_n(\alpha)'\right) \ge b\|\eta^*\|_1^2,$$
for some constant $b \ge 0$ wpa 1 as $n,T\to\infty$. Consider the next term,
$$\frac{2}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0 - Z_{nt}\delta\right)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}U_{nt} = \frac{2}{nT}\sum_{k=1}^{K+1}\eta_k\,\mathrm{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)Z_k\varepsilon'\right) = O_P\left(\|\eta\|_1c_{nT}^{-1/2}\right), \quad \text{(B.4)}$$
by Lemma 3. For the next term, by using a law of large numbers for quadratic forms (Lemma 9 in Yu et al. (2008)),
$$\frac{1}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)S_n^{-1}U_{nt}\right)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}U_{nt} = \frac{\sigma_0^2}{n}\mathrm{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right) + O_P\left(c_{nT}^{-1}\right). \quad \text{(B.5)}$$
From Lemma 2(3), for an $n\times n$ matrix $A$, $|\mathrm{tr}(A)| \le \mathrm{rank}(A)\|A\|$, where $\|\cdot\|$ is an induced matrix norm, so
$$\max_{\tilde{\Gamma}_n\in\mathbb{R}^{n\times 2r}}\mathrm{tr}\left(\frac{2}{nT}\sum_{t=1}^{T}R_n(\alpha)S_n(\lambda)S_n^{-1}U_{nt}\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0 - Z_{nt}\delta\right)'R_n(\alpha)'P_{\tilde{\Gamma}_n}\right) \le \frac{4r}{nT}\sum_{k=1}^{K+1}|\eta_k|\left\|R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right\|_2\left\|\varepsilon Z_k'\right\|_2 = O_P\left(c_{nT}^{-1/2}\|\eta\|_1\right). \quad \text{(B.6)}$$
The last equality is due to the sub-multiplicative property of the $\|\cdot\|_2$ matrix norm and the fact that, for a matrix $A$, $\|A\|_2^2 \le \|A\|_1\|A\|_\infty$; furthermore, Assumptions E and R guarantee that $\|\varepsilon\|_2 = O_P\!\left(\sqrt{\max(n,T)}\right)$ and $\|Z_k\|_2 = O_P\!\left(\sqrt{nT}\right)$. Similarly,
$$\max_{\tilde{\Gamma}_n\in\mathbb{R}^{n\times 2r}}\mathrm{tr}\left(\frac{1}{nT}\sum_{t=1}^{T}R_n(\alpha)S_n(\lambda)S_n^{-1}U_{nt}\left(S_n(\lambda)S_n^{-1}U_{nt}\right)'R_n(\alpha)'P_{\tilde{\Gamma}_n}\right) = O_P\left(c_{nT}^{-1}\right). \quad \text{(B.7)}$$

Under Assumption NC1, by substituting Eqs. (B.3)–(B.7) into Eq. (B.2),
$$Q_{nT}(\theta) \le \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\left(\frac{\sigma_0^2}{n}\mathrm{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right) + b\|\eta\|_1^2 + O_P\left(\|\eta\|_1c_{nT}^{-1/2}\right) + O_P\left(c_{nT}^{-1}\right)\right) \equiv \tilde{Q}_{nT}(\theta). \quad \text{(B.8)}$$

At $\theta = \theta_0$,
$$\begin{aligned}
Q_{nT}(\theta_0) &= \frac{1}{n}\log|S_nR_n| - \frac{1}{2}\log\left(\min_{\Gamma_n\in\mathbb{R}^{n\times r},\,F_T\in\mathbb{R}^{T\times r}}\frac{1}{nT}\sum_{t=1}^{T}\left(\Gamma_{n0}f_{t0} + U_{nt} - \Gamma_nf_t\right)'R_n'R_n\left(\Gamma_{n0}f_{t0} + U_{nt} - \Gamma_nf_t\right)\right) \\
&\ge \frac{1}{n}\log|S_nR_n| - \frac{1}{2}\log\left(\frac{1}{nT}\sum_{t=1}^{T}\varepsilon_{nt}'\varepsilon_{nt}\right) = \frac{1}{n}\log|S_nR_n| - \frac{1}{2}\log\sigma_0^2 + O_P\left(c_{nT}^{-1}\right) \equiv \underset{\sim}{Q}_{nT}(\theta_0). \quad \text{(B.9)}
\end{aligned}$$

Lemma 1 in Wu (1981) provides a criterion for the consistency of $\hat{\theta}_{nT} = \arg\max_{\theta\in\Theta}Q_{nT}(\theta)$. To show that $\hat{\theta}_{nT} - \theta_0 \overset{p}{\to} 0$, it is sufficient to show that for all $\tau > 0$, $\liminf_{n,T\to\infty}P\left(\inf_{\theta\in\Theta,\,\|\theta-\theta_0\|_1\ge\tau}\left(Q_{nT}(\theta_0) - Q_{nT}(\theta)\right) > 0\right) = 1$.

From Eqs. (B.8) and (B.9),
$$\begin{aligned}
Q_{nT}(\theta_0) - Q_{nT}(\theta) &\ge \underset{\sim}{Q}_{nT}(\theta_0) - \tilde{Q}_{nT}(\theta) \\
&= \frac{1}{2}\log\left(b\|\eta\|_1^2 + O_P\left(\|\eta\|_1c_{nT}^{-1/2}\right) + O_P\left(c_{nT}^{-1}\right) + \frac{\sigma_0^2}{n}\mathrm{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right)\right) \\
&\quad - \frac{1}{2}\log\left(\sigma_0^2\left|R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right|^{1/n}\right) + O_P\left(c_{nT}^{-1}\right) \\
&\ge \frac{1}{2}\log\left(b\|\eta\|_1^2 + O_P\left(\|\eta\|_1c_{nT}^{-1/2}\right) + O_P\left(c_{nT}^{-1}\right) + \frac{\sigma_0^2}{n}\mathrm{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right)\right) \\
&\quad - \frac{1}{2}\log\left(\frac{\sigma_0^2}{n}\mathrm{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right)\right) + O_P\left(c_{nT}^{-1}\right), \quad \text{(B.10)}
\end{aligned}$$
because $\frac{1}{n}\mathrm{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right) \ge \left|R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right|^{1/n}$ by the inequality of arithmetic and geometric means. Under Assumption NC1, when $\|\delta - \delta_0\|_1 \ge \tau$ or $|\lambda - \lambda_0| \ge \tau$, $b > 0$, and $\liminf_{n,T\to\infty}P\left(\inf_{\theta\in\Theta,\,\|\theta-\theta_0\|_1\ge\tau}\left(Q_{nT}(\theta_0) - Q_{nT}(\theta)\right) > 0\right) = 1$ holds. In addition, (B.10) is strict when $\alpha \ne \alpha_0$ even if $\lambda = \lambda_0$, as guaranteed by Assumption NC1(2). In either case, consistency follows from Lemma 1 of Wu (1981).

When Assumption NC2 holds instead of NC1, the inequality in (B.10) is strict when $\alpha \ne \alpha_0$ or $\lambda \ne \lambda_0$ by Assumption NC2(2), and even with $\lambda = \lambda_0$, $b\|\eta\|_1^2 > 0$ when $\|\delta - \delta_0\|_1 \ge \tau$. Therefore consistency follows from $\liminf_{n,T\to\infty}P\left(\inf_{\theta\in\Theta,\,\|\theta-\theta_0\|_1\ge\tau}\left(Q_{nT}(\theta_0) - Q_{nT}(\theta)\right) > 0\right) = 1$.

Now consider the QML estimator $\hat{\theta}_{nT} = \left(\hat{\delta}_{nT}',\ \hat{\lambda}_{nT},\ \hat{\alpha}_{nT}\right)'$. By definition, $Q_{nT}(\hat{\theta}_{nT}) - Q_{nT}(\theta_0) \ge 0$. From Eq. (B.10), it follows that $b\|\hat{\eta}_{nT}\|_1^2 + O_P\left(\|\hat{\eta}_{nT}\|_1c_{nT}^{-1/2}\right) + O_P\left(c_{nT}^{-1}\right) \le 0$ with $\hat{\eta}_{nT} = \left(\delta_0' - \hat{\delta}_{nT}',\ \lambda_0 - \hat{\lambda}_{nT}\right)'$. Completing the square, $\left(b^{1/2}\|\hat{\eta}_{nT}\|_1 + a_{nT}\right)^2 + O_P\left(c_{nT}^{-1}\right) \le 0$ for some $a_{nT} = O_P\left(c_{nT}^{-1/2}\right)$, and therefore $\|\hat{\eta}_{nT}\|_1 = O_P\left(c_{nT}^{-1/2}\right)$.

Appendix C. Perturbation Theory and Series Expansion of the Concentrated Log Likelihood Function

Kato (1995) gives a systematic presentation of perturbation theory. Moon and Weidner (2015b) apply perturbation theory to a regression panel with common factors. Here we show specifically how to expand $L_{nT}(\theta_0)$ itself and $L_{nT}(\theta)$ around $\theta_0$. Firstly,

Rn (α) Sn (λ )Y − ∑ Zk δk k=1

= Rn + (α0 − α)W˜ n



K



!

0 −1 Γn FT0 + R−1 n ε + ∑ Zk (δ0k − δk ) + ZK+1 + Gn Γn FT + Gn Rn ε (λ0 − λ ) k=1

39

K+2

K+2

k=0

k1 =0,k2 =0

∑ ξkVk + ∑

=Γ˜ n FT0 +

where Γ˜ n = Rn Γn , ξ0 =

ξk1 ξk2 Vk1 k2 ,

kεk √ 2, nT

(C.1)

ξk = δ0k − δk for k = 1, · · · , K, ξK+1 = λ0 − λ , ξK+2 = α0 − α, V0 =

√ nT ε kεk2 ,

˜ 0 ˜ −1 ˜ 0 Vk = Rn Zk for k = 1, · · · , K, VK+1 = Rn (ZK+1 + Gn R−1 n Γn FT ), and VK+2 = Wn Rn Γn FT ; and also, Vk1 k2 = Rn Gn R−1 n

√ nT ε kεk2

√ nT ε kεk2

for k1 = 0, k2 = K + 1; Vk1 k2 = G˜ n

for k1 = 0, k2 = K + 2; Vk1 k2 = W˜ n Zk1 for k1 =

−1 ˜ 0 1, · · · , K, k2 = K + 2; Vk1 k2 = W˜ n (ZK+1 + Gn R−1 n Γn FT + Gn Rn ε) for k1 = K + 1, k2 = K + 2; and Vk1 k2 = 0

otherwise. All the Vk and Vk1 k2 above are n × T matrices. In Moon and Weidner (2015b), for the NLS estimator of a factor panel regression, only the similar term Γn FT0 + ∑Kk=0 εkVk appears. For our model, ξK+1 ,

VK+1 , ξK+2 , VK+2 and Vk1 k2 are extra terms due to the contemporaneous spatial lag and spatial disturbances. In addition, our objective function (Eq. (5)) has the Jacobian term log |Sn (λ )Rn (α)|.

For our spatial model, the part LnT (θ ) of the objective function without the Jacobian term is

LnT (ξ ) =

n



µi

i=r0 +1

1 + nT

1 (0) 1 T + nT nT

K+2 K+2 K+2

∑ ∑ ∑

k1 =0 k2 =0 k3 =0

K+2

∑ ξk Tk

k1 =0

(1)

1

(3) ξk1 ξk2 ξk3 Tk1 k2 k3

1

+

1 nT

1 + nT

K+2 K+2

∑ ∑ ξk ξk Tk k

k1 =0 k2 =0

(2)

1

2

K+2 K+2 K+2 K+2

∑ ∑ ∑ ∑

k1 =0 k2 =0 k3 =0 k4 =0

1 2

(4) ξk1 ξk2 ξk3 ξk4 Tk1 k2 k3 k4

!

,

(C.2)

(1) (2) (3) where T (0) = Γ˜ n FT0 FT Γ˜ 0n , Tk1 = Vk1 FT Γ˜ 0n + Γ˜ n FT0 Vk01 , Tk1 k2 = Vk1 k2 FT Γ˜ 0n + Γ˜ n FT0 Vk01 k2 +Vk1 Vk02 , Tk1 k2 k3 = Vk1 k2 Vk03 + (4)

Vk3 Vk01 k2 , and Tk1 k2 k3 k4 = Vk1 k2 Vk03 k4 , with k j = 0, · · · , K + 2 and j = 1, 2, 3, 4. In our model, T (0) is the unper-

turbed operator and it has exactly n − r0 zero eigenvalues and the rest r0 eigenvalues are strictly positive.

Therefore, if there is no perturbation, i.e., ξk = 0 for k = 0, · · · , K + 2, LnT (ξ = 0) = 0. The objective is to

expand LnT (ξ ) around LnT (ξ = 0) = 0 in terms of ξ , T (1) , T (2) , T (3) and T (4) using the formula in Kato (1995). Define  bnT

(1)

T K+2 1 

k , = max  ∑ |ξk1 | 1 2

nT dmax (Γ˜ n , FT ) k1 =0 2

(2) ! 12   12

T 1

k1 k2 , ∑ |ξk1 | |ξk2 |

nT

2 (Γ ˜ n , FT ) dmax k1 ,k2 =0 K+2

2

(3) ! 13   13

T 1

k1 k2 k3 , ∑ |ξk1 | |ξk2 | |ξk3 |

nT

2 (Γ ˜ n , FT ) dmax k1 ,k2 ,k3 =0 K+2

2 

! 14 

(4) 1 

K+2 4 1

Tk k k k  . ∑ |ξk1 | |ξk2 | |ξk3 | |ξk4 |

1nT2 3 4

2 (Γ ˜ d , F ) max n T k1 ,k2 ,k3 ,k4 =0

(C.3)

2







(1)

(2)

(3)

(4) 2 (Γ ˜ n , FT ) Because Tk1 = OP (nT ), Tk1 k2 = OP (nT ), Tk1 k2 k3 = OP (nT ), Tk1 k2 k3 k4 = OP (nT ), and dmax 2

2

2

2

is converging to some positive constant, bnT = OP (kξ k1 ). The following lemma provides an expansion of

LnT (ξ ) as a power series in ξ . The expansion is valid if ξ is small such that the condition on bnT in Lemma 40

7 below holds. We list the expansion below for its use, but the detailed derivation of this lemma and bound on the remainder can be found in the supplementary file. 2 (Γ ˜ n ,FT ) dmin 2 (Γ ˜ n ,FT ) − bnT > 0, then LnT (ξ ) 16dmax   K+2 K+2 K+2 ∞ 1 ˜ (g) ξ · · · ξ · · · ξ tr T ∑ ∑ ∑ ∑ k k k g=1 g 1 2 kg =0 k1 =0 k2 =0 k1 k2 ···kg , where nT

Lemma 7. Under Assumption SF and assume that series expansion LnT (ξ ) = (g) T˜k1 k2 ···kg = −

g



(−1) p p=d 4g e



(v )

(v )

v1 +v2 +···+v p =g m1 +···+m p+1 =p−1 ν j =1,··· ,4, m j =0,1,···

has a convergent

1 S(m1 ) Tk1 ··· S(m2 ) · · · S(m p ) T···kpg S(m p+1 )

with S(0) = −MΓ˜ n , S(m) = S0m for m ≥ 1 where S0 = Γ˜ n (Γ˜ 0n Γ˜ n )−1 (FT0 FT )−1 (Γ˜ 0n Γ˜ n )−1 Γ˜ 0n .13 For g ≥ 5, 1 nT

  16r d 2 (Γ˜ , F )d 2 (Γ˜ , F )  16d 2 (Γ˜ , F )b g n T min n T 0 (g) max n T nT ˜ . ∑ ξk1 · · · ξkg tr Tk1 k2 ···kg ≤ 16d 2 max 2 2 (Γ ˜ ˜ n , FT ) ˜ dmin max (Γn , FT ) − dmin (Γn , FT ) k1 ,··· ,kg =0 K+2

When the series expansion is truncated at an order G ≥ 4,   1 G K+2 K+2 K+2 (g) LnT (ξ ) − ∑ ∑ ∑ · · · ∑ ξk1 ξk2 · · · ξkg tr T˜k1 k2 ···kg nT g=1 k1 =0 k2 =0 kg =0   2 (Γ 2 (Γ ˜ n , FT )d 2 (Γ˜ n , FT ) ˜ n , FT )bnT G+1 16r0 dmax 16dmax min  = OP (kξ kG+1 ). ≤  1 2 (Γ ˜ n ,FT )bnT 2 (Γ ˜ 16d 2 max d , F ) 2 16dmax (Γ˜ n , FT ) − dmin (Γ˜ n , FT ) 1 − d 2 (Γ˜ ,F ) min n T min

n

T

The condition on bnT can be satisfied wpa 1 as n, T → ∞, because bnT = OP(‖ξ‖_1), |ξ0| = OP(1/√min(n,T)), and ‖θ̂ − θ0‖_1 = oP(1) by Proposition 3. At θ0, LnT(θ0) has the perturbation ξ0 = ‖ε‖_2/√nT while the other ξk, k = 1,⋯,K+2, are zero. LnT(θ0) has an expansion in terms of ξ0:

LnT(θ0) = (1/nT) Σ_{i=r0+1}^{n} μi( (Γ̃n FT′ + ε)(Γ̃n FT′ + ε)′ )
= (1/nT) [ tr(MΓ̃n ε MFT ε′) − (2/nT) tr(MΓ̃n ε MFT ε′ PΓ̃n,FT ε′) ] + JnT + OP( (‖ε‖_2/√nT)^5 )   (C.4)
= σ0² + OP( min(n,T)^{−1} ),   (C.5)

where in Eq. (C.4) the series expansion is truncated at order 4, with

JnT = − (1/nT) tr( S^(0) T^(2)_{00} S^(1) T^(2)_{00} S^(0) )
+ (1/nT) tr( S^(0) T^(2)_{00} S^(1) T^(1)_0 S^(1) T^(1)_0 S^(0) ) + (1/nT) tr( S^(1) T^(1)_0 S^(0) T^(2)_{00} S^(0) T^(1)_0 S^(1) ) + (1/nT) tr( S^(0) T^(2)_{00} S^(0) T^(1)_0 S^(2) T^(1)_0 S^(0) )

13 Notice that v1 + ⋯ + vp = g. When v = 1, T^(v)_k has a single subscript k; when v = 2, T^(v)_{kk′} has two subscripts, k and k′. For example, when g = 4 and, in a particular summand, v1 = 1, v2 = 2, v3 = 1, a typical term in the summand is S^(m1) T^(1)_{k1} S^(m2) T^(2)_{k2 k3} S^(m3) T^(1)_{k4} S^(m4).

+ (1/nT) tr( S^(0) T^(1)_0 S^(2) T^(1)_0 S^(0) T^(2)_{00} S^(0) ) + (1/nT) tr( S^(0) T^(1)_0 S^(1) T^(1)_0 S^(1) T^(2)_{00} S^(0) ) + (1/nT) tr( S^(0) T^(1)_0 S^(1) T^(2)_{00} S^(1) T^(1)_0 S^(0) )
− (1/nT) tr( S^(0) T^(1)_0 S^(2) T^(1)_0 S^(0) T^(1)_0 S^(1) T^(1)_0 S^(0) ) − (1/nT) tr( S^(0) T^(1)_0 S^(1) T^(1)_0 S^(0) T^(1)_0 S^(2) T^(1)_0 S^(0) )
− (1/nT) tr( S^(1) T^(1)_0 S^(0) T^(1)_0 S^(1) T^(1)_0 S^(0) T^(1)_0 S^(1) ) − (1/nT) tr( S^(0) T^(1)_0 S^(1) T^(1)_0 S^(1) T^(1)_0 S^(1) T^(1)_0 S^(0) ).   (C.6)

It follows that 1/LnT(θ0) = 1/σ0² + OP( min(n,T)^{−1} ).
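Eq. (C.5) says that, at the true parameters, the concentrated objective — the scaled sum of the n − r0 smallest eigenvalues of (Γ̃n FT′ + ε)(Γ̃n FT′ + ε)′ — is close to σ0². A quick numerical illustration; all dimensions and values below are illustrative choices, not the paper's design:

```python
# Numerical check of Eq. (C.5): (1/nT) * sum of the (n - r0) smallest
# eigenvalues of (Gamma F' + eps)(Gamma F' + eps)' is close to sigma0^2.
import numpy as np

rng = np.random.default_rng(0)
n, T, r0, sigma0 = 200, 200, 2, 1.0

Gamma = rng.standard_normal((n, r0))          # factor loadings (illustrative)
F = rng.standard_normal((T, r0))              # factors (illustrative)
eps = sigma0 * rng.standard_normal((n, T))    # idiosyncratic errors

D = Gamma @ F.T + eps
s = np.linalg.svd(D, compute_uv=False)        # singular values, descending
L = np.sum(s[r0:] ** 2) / (n * T)             # discard the r0 factor directions

# L is close to sigma0^2 = 1 (slightly below, since the top r0 singular
# directions also absorb roughly (n + T) r0 noise degrees of freedom).
print(L)
```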

For the derivation of the asymptotic distribution of the QMLE, it is sufficient to consider the series expansion of LnT(θ) truncated at order G = 4. Define the following (K+2) × 1 vectors:

C^(1) = (1/√nT) ( tr(MΓ̃n Rn Z1 MFT ε′), ⋯, tr(MΓ̃n Rn ZK MFT ε′), tr(MΓ̃n Rn ZK+1 MFT ε′), 0 )′;   (C.7)

C^(2), whose k-th entry, for k = 1,⋯,K+1, is

− (1/√nT) tr(MΓ̃n ε MFT ε′ PΓ̃n,FT Zk′ Rn′) − (1/√nT) tr(MΓ̃n ε MFT Zk′ Rn′ PΓ̃n,FT ε′) − (1/√nT) tr(MΓ̃n Rn Zk MFT ε′ PΓ̃n,FT ε′),

and whose (K+2)-th entry is 0;   (C.8)

C^(3) = ( 0, ⋯, 0, (1/√nT) tr(MΓ̃n Rn Gn Rn^{−1} MΓ̃n ε MFT ε′), (1/√nT) tr(MΓ̃n G̃n MΓ̃n ε MFT ε′) )′,   (C.9)

where PΓ̃n,FT = Γ̃n (Γ̃n′Γ̃n)^{−1} (FT′FT)^{−1} FT′. Define the following (K+2) × (K+2) matrices: C1 has (j,k)-th entry (1/nT) tr(MΓ̃n Rn Zj MFT Zk′ Rn′) for j,k = 1,⋯,K+1, with zeros in its last row and column; C2 is zero except for its lower-right 2 × 2 block,

( (σ0²/n) tr(Rn Gn Rn^{−1} Rn′^{−1} Gn′ Rn′) + (σ0²/n) tr(Gn²)      (σ0²/n) tr(Gn′ G̃n) + (σ0²/n) tr(Rn Gn Rn^{−1} G̃n)
  (σ0²/n) tr(Gn′ G̃n) + (σ0²/n) tr(Rn Gn Rn^{−1} G̃n)               (σ0²/n) tr(G̃n′ G̃n) + (σ0²/n) tr(G̃n²) );

and C = C1 + C2.   (C.10)

Lemma 8. Assume that n/T → κ² > 0 and Assumptions NC1 (or NC2), E, R and SF hold. Then, with θ = (δ′, λ, α)′ in a small neighborhood of θ0,

LnT(θ) = LnT(θ0) − (2/√nT) (θ − θ0)′ ( C^(1) + C^(2) + C^(3) ) + (θ − θ0)′ C (θ − θ0) + L^rem_nT(θ),

where C^(1), C^(2), C^(3) and C are defined in Eqs. (C.7)–(C.10), and the remainder term is L^rem_nT(θ) = OP(‖θ − θ0‖_1³) + OP(‖θ − θ0‖_1² (nT)^{−1/4}) + OP(‖θ − θ0‖_1 (nT)^{−3/4}) + OP((nT)^{−5/4}).
Lemma 8 is an application of Lemma 7 by rearrangement.

Lemma 9. Under Assumptions E, R and SF, and assuming that n/T → κ² > 0, we have C = OP(1); also, for k = 1,⋯,K,K+1,

C^(1)_k = (1/√nT) tr(𝒵k ε′) − (1/√nT) tr( MΓ̃n Rn (Zk − Z̄k) PFT ε′ ) = OP(1),   C^(2)_k = oP(1),

and

C^(3) = ( 0, ⋯, 0, (1/√nT) tr(MΓ̃n Rn Gn Rn^{−1} MΓ̃n εε′) + OP(1), (1/√nT) tr(MΓ̃n G̃n MΓ̃n εε′) + OP(1) )′
      = ( 0, ⋯, 0, √(T/n) tr(Gn) σ0² + OP(1), √(T/n) tr(G̃n) σ0² + OP(1) )′.

Lemma 10. Assume that n/T → κ² > 0 and Assumptions E, R and SF hold. Then D̃nT − DnT = oP(1), where DnT and D̃nT are respectively defined in Eq. (9) and Eq. (D.9).







Lemma 11. Under the assumptions of Theorem 3, ‖MΓ̂̃n − MΓ̃n‖_2 = ‖PΓ̂̃n − PΓ̃n‖_2 = OP(1/√n) and ‖MF̂T − MFT‖_2 = ‖PF̂T − PFT‖_2 = OP(1/√T), where MΓ̂̃n and MF̂T are defined in Section 3.3.
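Lemma 11 bounds the effect of estimating the loadings on the associated projection matrices. A small numerical sketch of this kind of perturbation bound; the perturbed loading matrix below is a hypothetical O(1/√n)-per-entry contamination, not the paper's QMLE:

```python
# If the estimated loading matrix differs from Gamma by a small perturbation,
# the projectors P (and hence M = I - P, since M_hat - M = -(P_hat - P))
# differ by a small amount in spectral norm.
import numpy as np

def proj(A):
    """Orthogonal projector onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

rng = np.random.default_rng(1)
n, r0 = 400, 2
Gamma = rng.standard_normal((n, r0))
Gamma_hat = Gamma + rng.standard_normal((n, r0)) / np.sqrt(n)  # hypothetical estimate

gap = np.linalg.norm(proj(Gamma_hat) - proj(Gamma), 2)  # spectral norm
print(gap)  # small, and shrinks further as n grows
```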

Appendix D. Asymptotic Distributions: Proof of Theorems 1-5

Proof of Theorem 1.

Lemma 8 shows that, for θ close to θ0, LnT(θ) = LnT(θ0) − (2/√nT)(θ − θ0)′(C^(1) + C^(2) + C^(3)) + (θ − θ0)′C(θ − θ0) + L^rem_nT(θ), where L^rem_nT(θ) = OP(‖θ − θ0‖_1³) + OP(‖θ − θ0‖_1²(nT)^{−1/4}) + OP(‖θ − θ0‖_1(nT)^{−3/4}) + OP((nT)^{−5/4}). In the following, we provide concise expressions for the terms in the expansion of QnT(θ̃nT) around θ0, where θ̃nT satisfies ‖θ̃nT − θ0‖_1 = oP(1).

Before proceeding further, let us examine the (K+2) × 1 vectors C^(1), C^(2), C^(3) and the (K+2) × (K+2) matrix C more closely. Notice that the (K+1)-th and (K+2)-th entries of C^(3) satisfy C^(3)_{K+1} = OP(√nT) and C^(3)_{K+2} = OP(√nT), which are of a higher stochastic order than C^(1) and C^(2). However, as Theorem 1 shows, the higher order parts of C^(3) are canceled by terms from the log Jacobian determinant in the log likelihood function.

Recall that 𝒵k = MΓ̃n Rn Z̄k MFT + MΓ̃n Rn (Zk − Z̄k) for k = 1,⋯,K+1, and notice that 𝒵k,it is independent of εit. The concentrated likelihood function is

QnT(θ̃nT) = (1/n) log| Sn(λ̃nT) Rn(α̃nT) | − (1/2) log LnT(θ̃nT)
= (1/n) log| Sn(λ̃nT) Rn(α̃nT) |
  − (1/2) log( LnT(θ0) − (2/√nT)(θ̃nT − θ0)′(C^(1) + C^(2) + C^(3)) + (θ̃nT − θ0)′C(θ̃nT − θ0) + L^rem_nT(θ̃nT) )
= QnT(θ0) − (1/n) tr(Gn)(λ̃nT − λ0) − (1/2n) tr(Gn²)(λ̃nT − λ0)² + O(|λ̃nT − λ0|³)
  − (1/n) tr(G̃n)(α̃nT − α0) − (1/2n) tr(G̃n²)(α̃nT − α0)² + O(|α̃nT − α0|³)
  − (1/2) log(1 + x),   (D.1)

with x = − (2/√nT)(θ̃nT − θ0)′(C^(1) + C^(2) + C^(3))/LnT(θ0) + (θ̃nT − θ0)′[C/LnT(θ0)](θ̃nT − θ0) + L^rem_nT(θ̃nT)/LnT(θ0), where the Taylor expansions of log|Sn(λ̃nT)| and log|Rn(α̃nT)| are used. From Lemma 9 and by using log(1 + x) = x − x²/2 + O(x³), with

x = − (2/(nT LnT(θ0))) [ tr(MΓ̃n Rn Gn Rn^{−1} MΓ̃n εε′)(λ̃nT − λ0) + tr(MΓ̃n G̃n MΓ̃n εε′)(α̃nT − α0) ]
  + OP(‖θ̃nT − θ0‖_1 (nT)^{−1/2}) + OP(‖θ̃nT − θ0‖_1²) + OP((nT)^{−5/4}),

we obtain

QnT(θ̃nT) = QnT(θ0) − (1/n) tr(Gn)(λ̃nT − λ0) − (1/2n) tr(Gn²)(λ̃nT − λ0)² − (1/n) tr(G̃n)(α̃nT − α0) − (1/2n) tr(G̃n²)(α̃nT − α0)²
  + (1/√nT)(θ̃nT − θ0)′(C^(1) + C^(2) + C^(3))/LnT(θ0) − (1/2)(θ̃nT − θ0)′[C/LnT(θ0)](θ̃nT − θ0)
  + [tr(MΓ̃n Rn Gn Rn^{−1} MΓ̃n εε′)]² (λ̃nT − λ0)² / ((nT)² LnT(θ0)²) + [tr(MΓ̃n G̃n MΓ̃n εε′)]² (α̃nT − α0)² / ((nT)² LnT(θ0)²)
  + 2 (λ̃nT − λ0)(α̃nT − α0) tr(MΓ̃n Rn Gn Rn^{−1} MΓ̃n εε′) tr(MΓ̃n G̃n MΓ̃n εε′) / ((nT)² LnT(θ0)²)
  − L^rem_nT(θ̃nT)/(2 LnT(θ0)) + OP(‖θ̃nT − θ0‖_1³) + OP(‖θ̃nT − θ0‖_1² (nT)^{−1/2}) + OP(‖θ̃nT − θ0‖_1 (nT)^{−3/2}) + OP((nT)^{−5/2})
= QnT(θ0) + (1/√nT)(θ̃nT − θ0)′[ (C^(1) + C^(2) + C^(3))/LnT(θ0) − D^(1) ] − (1/2)(θ̃nT − θ0)′ DnT (θ̃nT − θ0) + Q^rem_nT(θ̃nT),   (D.2)

where D^(1) = ( 0, ⋯, 0, √(T/n) tr(Gn), √(T/n) tr(G̃n) )′ is a (K+2) × 1 vector and the (K+2) × (K+2) matrix DnT is the one in Eq. (9). The remainder term is

Q^rem_nT(θ̃nT) = OP( (nT)^{−5/4} (1 + √nT ‖θ̃nT − θ0‖_1)² ) + OP( ‖θ̃nT − θ0‖_1³ )   (D.3)
= oP( (nT)^{−1} (1 + √nT ‖θ̃nT − θ0‖_1)² ).   (D.4)

In Eq. (D.3), OP(‖θ̃nT − θ0‖_1³) = oP(‖θ̃nT − θ0‖_1²) because ‖θ̃nT − θ0‖_1 = oP(1). Lemma 9 shows that (C^(1) + C^(2) + C^(3))/LnT(θ0) − D^(1) = OP(1). Define γ = (1/√nT) DnT^{−1} [ (C^(1) + C^(2) + C^(3))/LnT(θ0) − D^(1) ], which is OP(1/√nT). The rest of the proof is similar to Corollary 4.3 of Moon and Weidner (2015b). Completing the squares in Eq. (D.2), QnT(θ̃nT) = QnT(θ0) − (1/2)(θ̃nT − θ0 − γ)′ DnT (θ̃nT − θ0 − γ) + (1/2) γ′ DnT γ + Q^rem_nT(θ̃nT).
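The Taylor expansion of the log Jacobian used in (D.1) — log|Sn(λ)| around λ0, with Sn(λ) = In − λWn and Gn = Wn Sn(λ0)^{−1} — is easy to verify numerically. The row-normalized weights matrix below is an arbitrary illustrative choice, not the paper's application:

```python
# Check: log|S(lam)| - log|S(lam0)| =
#   -tr(G)(lam - lam0) - (1/2) tr(G^2)(lam - lam0)^2 + O(|lam - lam0|^3),
# with S(lam) = I - lam*W and G = W S(lam0)^{-1}.
import numpy as np

rng = np.random.default_rng(2)
n = 30
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)   # row-normalize (illustrative W)

lam0, dlam = 0.4, 1e-3
S0 = np.eye(n) - lam0 * W
G = W @ np.linalg.inv(S0)

exact = (np.linalg.slogdet(np.eye(n) - (lam0 + dlam) * W)[1]
         - np.linalg.slogdet(S0)[1])
approx = -np.trace(G) * dlam - 0.5 * np.trace(G @ G) * dlam ** 2
print(abs(exact - approx))  # cubic in dlam, hence tiny
```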

Consider the following two cases: θ̃nT = θ̂ = arg max_θ QnT(θ), and θ̃nT = θ0 + γ. Notice that both θ̂ and θ0 + γ satisfy ‖θ̃nT − θ0‖_1 = oP(1). As QnT(θ̂nT) = QnT(θ0) − (1/2)(θ̂nT − θ0 − γ)′ DnT (θ̂nT − θ0 − γ) + (1/2) γ′ DnT γ + Q^rem_nT(θ̂nT), QnT(θ0 + γ) = QnT(θ0) + (1/2) γ′ DnT γ + Q^rem_nT(θ0 + γ), and QnT(θ̂nT) ≥ QnT(θ0 + γ), it follows that (θ̂nT − θ0 − γ)′ DnT (θ̂nT − θ0 − γ) ≤ 2Q^rem_nT(θ̂nT) − 2Q^rem_nT(θ0 + γ). Using Eq. (D.4), Q^rem_nT(θ̂nT) − Q^rem_nT(θ0 + γ) ≤ oP( (nT)^{−1} [ (1 + √nT ‖θ̂nT − θ0‖_1)² + (1 + √nT ‖γ‖_1)² ] ). Because DnT is assumed to be positive definite,

√nT ‖θ̂nT − θ0 − γ‖_1 ≤ oP( 1 + √nT ‖θ̂nT − θ0‖_1 ) + oP( 1 + √nT ‖γ‖_1 )
= oP( 1 + √nT ‖θ̂nT − θ0 − γ‖_1 ) + oP( 1 + √nT ‖γ‖_1 ).

Because γ = OP(1/√nT), oP(1 + √nT ‖γ‖_1) = oP(1); therefore √nT ‖θ̂nT − θ0 − γ‖_1 = oP(1) and √nT(θ̂nT − θ0 − γ) = oP(1), which implies √nT(θ̂nT − θ0) = DnT^{−1}[ (C^(1) + C^(2) + C^(3))/LnT(θ0) − D^(1) ] + oP(1). Because LnT(θ0) = σ0² + OP(n^{−1}),

√nT(θ̂nT − θ0) = DnT^{−1}[ (C^(1) + C^(2) + C^(3))/LnT(θ0) − D^(1) ] + oP(1)
= (LnT(θ0) DnT)^{−1}( C^(1) + C^(2) + C^(3) − D^(1) LnT(θ0) ) + oP(1)
= (LnT(θ0) DnT)^{−1}( C^(1) + C^(2) + C^(3) − D^(1) σ0² ) + (LnT(θ0) DnT)^{−1} D^(1) ( σ0² − LnT(θ0) ) + oP(1)
= (σ0² DnT)^{−1}( C^(1) + C^(2) + C^(3) − D^(1) σ0² )
  + (σ0² DnT)^{−1} D^(1) ( σ0² − (1/nT) tr(εε′) + (1/nT) tr(PΓ̃n εε′) + (1/nT) tr(ε PFT ε′) + OP(n^{−3/2}) ) + oP(1)



= (σ0² DnT)^{−1} ( (1/√nT) Σ_{i=1}^{n} Σ_{t=1}^{T} 𝒵1,it εit − (1/√nT) tr( MΓ̃n Rn (Z1 − Z̄1) PFT ε′ ),
   (1/√nT) Σ_{i=1}^{n} Σ_{t=1}^{T} 𝒵2,it εit − (1/√nT) tr( MΓ̃n Rn (Z2 − Z̄2) PFT ε′ ),
   ⋯,
   (1/√nT) Σ_{i=1}^{n} Σ_{t=1}^{T} 𝒵K,it εit,
   (1/√nT) Σ_{i=1}^{n} Σ_{t=1}^{T} 𝒵K+1,it εit − (1/√nT) tr( MΓ̃n Rn (ZK+1 − Z̄K+1) PFT ε′ ),
   0 )′
+ (σ0² DnT)^{−1} ( 0, ⋯, 0,
   (1/√nT) tr(MΓ̃n Rn Gn Rn^{−1} MΓ̃n εε′) − (1/√nT) tr(MΓ̃n Rn Gn Rn^{−1} MΓ̃n ε PFT ε′) − √(T/n) tr(Gn) σ0²,
   (1/√nT) tr(MΓ̃n G̃n MΓ̃n εε′) − (1/√nT) tr(MΓ̃n G̃n MΓ̃n ε PFT ε′) − √(T/n) tr(G̃n) σ0² )′
+ (σ0² DnT)^{−1} ( 0, ⋯, 0,
   (1/n) tr(Gn) ( √nT σ0² − (1/√nT) tr(εε′) ) + (1/n) tr(Gn) (1/√nT) tr(PΓ̃n εε′) + (1/n) tr(Gn) (1/√nT) tr(ε PFT ε′),
   (1/n) tr(G̃n) ( √nT σ0² − (1/√nT) tr(εε′) ) + (1/n) tr(G̃n) (1/√nT) tr(PΓ̃n εε′) + (1/n) tr(G̃n) (1/√nT) tr(ε PFT ε′) )′
+ oP(1),   (D.5)

where the results of Lemma 9 are used. Eq. (D.5) can be further simplified. Notice that

(1/n) tr(Gn) tr(ε PFT ε′) − tr( MΓ̃n Rn Gn Rn^{−1} MΓ̃n ε PFT ε′ ) = vec(ε)′ M vec(ε),

where M = PFT ⊗ (1/2)[ (2/n) tr(Gn) In − MΓ̃n Rn Gn Rn^{−1} MΓ̃n − MΓ̃n Rn′^{−1} Gn′ Rn′ MΓ̃n ]. M is symmetric, tr(M) = O(1), and tr(M²) = O(n). We have E[ (vec(ε)′ M vec(ε))² ] = ( μ^(4) − 3σ0⁴ ) Σ_{i=1}^{nT} [M]_ii² + 2σ0⁴ tr(M²) + σ0⁴ tr(M)² = O(n). Therefore (1/√nT)[ (1/n) tr(Gn) tr(ε PFT ε′) − tr(MΓ̃n Rn Gn Rn^{−1} MΓ̃n ε PFT ε′) ] = OP(1/√T). Similarly, (1/√nT)[ (1/n) tr(G̃n) tr(ε PFT ε′) − tr(MΓ̃n G̃n MΓ̃n ε PFT ε′) ] = OP(1/√T). Putting these two terms into the remainder, Eq. (D.5) becomes

√nT( θ̂nT − θ0 − (1/√nT) σ0² DnT^{−1} ΔnT )
= (σ0² DnT)^{−1} ( (1/√nT) Σ_{i=1}^{n} Σ_{t=1}^{T} 𝒵1,it εit, ⋯, (1/√nT) Σ_{i=1}^{n} Σ_{t=1}^{T} 𝒵K+1,it εit, 0 )′
+ (σ0² DnT)^{−1} ( 0, ⋯, 0,
   (1/√nT) tr(Rn Gn Rn^{−1} εε′) − (1/√nT) tr(εε′) (1/n) tr(Gn),
   (1/√nT) tr(G̃n εε′) − (1/√nT) tr(εε′) (1/n) tr(G̃n) )′ + oP(1),   (D.6)

where

ΔnT = (1/√nT) ( −tr( MΓ̃n Rn (Z1 − Z̄1) PFT ε′ ), −tr( MΓ̃n Rn (Z2 − Z̄2) PFT ε′ ), 0, ⋯, 0, −tr( MΓ̃n Rn (ZK+1 − Z̄K+1) PFT ε′ ), 0 )′   (D.7)

+ (1/√nT) ( 0, ⋯, 0,
   −tr(PΓ̃n Rn Gn Rn^{−1} εε′) − tr(Rn Gn Rn^{−1} PΓ̃n εε′) + tr(PΓ̃n Rn Gn Rn^{−1} PΓ̃n εε′) + (1/n) tr(Gn) tr(PΓ̃n εε′),
   −tr(PΓ̃n G̃n εε′) − tr(G̃n PΓ̃n εε′) + tr(PΓ̃n G̃n PΓ̃n εε′) + (1/n) tr(G̃n) tr(PΓ̃n εε′) )′.   (D.8)

From (D.6) it is clear that the limiting distribution is not centered: it has the bias (1/√nT) σ0² DnT^{−1} ΔnT. The bias arises from the predetermined regressors (D.7) and from the interactions between the spatial effects and the factor loadings (D.8). The bias terms would disappear if all regressors were strictly exogenous and no spatial effects were present.
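The predetermined-regressor component (D.7) is the panel analogue of the classic Nickell bias: after the within-type transformation, the transformed lagged outcome is correlated with the transformed error. A minimal non-spatial illustration with a pure AR(1) panel and individual effects; all values are illustrative:

```python
# Within (fixed-effects) estimation of a dynamic panel y_it = g*y_{i,t-1} + a_i + e_it.
# The estimate is biased downward, with bias of order -(1 + g)/T for fixed T.
import numpy as np

rng = np.random.default_rng(4)
n, T, gamma = 5000, 10, 0.5
alpha = rng.standard_normal(n)            # individual effects

y = np.zeros((n, T + 1))
for t in range(T):
    y[:, t + 1] = gamma * y[:, t] + alpha + rng.standard_normal(n)

Y, Ylag = y[:, 1:], y[:, :-1]
Yd = Y - Y.mean(axis=1, keepdims=True)    # within transformation
Yld = Ylag - Ylag.mean(axis=1, keepdims=True)
gamma_hat = np.sum(Yld * Yd) / np.sum(Yld ** 2)
print(gamma_hat)  # noticeably below 0.5
```

In the spatial model of the paper, this mechanism is compounded by the spatial-lag and factor terms, which is why the correction uses the full vector ΔnT rather than a scalar adjustment.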

Applying Lemmas 5 and 6, ΔnT = ϕnT + OP(1/√n), where

ϕnT = ( −(σ0²/√nT) Σ_{h=1}^{T} tr(J0 PFT Jh′) tr( An^{h−1} Sn^{−1} ),
   −(σ0²/√nT) Σ_{h=1}^{T} tr(J0 PFT Jh′) tr( Wn An^{h−1} Sn^{−1} ),
   0, ⋯, 0,
   −(σ0²/√nT) Σ_{h=1}^{T} tr(J0 PFT Jh′) tr( (γ0 Gn + ρ0 Gn Wn) An^{h−1} Sn^{−1} ) + √(T/n) σ0² ( (r0/n) tr(Gn) − tr(PΓ̃n Rn Gn Rn^{−1}) ),
   √(T/n) σ0² ( (r0/n) tr(G̃n) − tr(PΓ̃n G̃n) ) )′,

Jh = ( 0_{T×(T−h)}, I_{T×T}, 0_{T×h} ) for h = 0,⋯,T−1, and I_{T×T} is the T × T identity matrix. It follows that

√nT(θ̂nT − θ0) − σ0² DnT^{−1} ϕnT
= (σ0² DnT)^{−1} ( (1/√nT) Σ_{i=1}^{n} Σ_{t=1}^{T} 𝒵1,it εit, ⋯, (1/√nT) Σ_{i=1}^{n} Σ_{t=1}^{T} 𝒵K+1,it εit, 0 )′
+ (σ0² DnT)^{−1} ( 0, ⋯, 0,
   (1/√nT) tr(Rn Gn Rn^{−1} εε′) − (1/√nT) tr(εε′) (1/n) tr(Gn),
   (1/√nT) tr(G̃n εε′) − (1/√nT) tr(εε′) (1/n) tr(G̃n) )′ + oP(1).
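The simplification above (and the variance calculation in the proof of Theorem 2 below) rests on the standard second-moment formula for quadratic forms in iid errors. A Monte Carlo check under a deliberately non-normal error; the matrix and dimensions are illustrative:

```python
# For iid mean-zero errors with variance sigma^2 and fourth moment mu4,
# and symmetric A:  Var(eps' A eps) = (mu4 - 3 sigma^4) sum_i A_ii^2 + 2 sigma^4 tr(A^2).
import numpy as np

rng = np.random.default_rng(3)
m = 6
A = rng.standard_normal((m, m))
A = (A + A.T) / 2                        # symmetrize

sigma2 = 1.0
mu4 = 15.0                               # fourth moment of (chi2_1 - 1)/sqrt(2)
theory = (mu4 - 3 * sigma2 ** 2) * np.sum(np.diag(A) ** 2) \
         + 2 * sigma2 ** 2 * np.trace(A @ A)

reps = 400_000
eps = (rng.chisquare(1, size=(reps, m)) - 1) / np.sqrt(2)  # variance 1, mu4 = 15
q = np.einsum('ri,ij,rj->r', eps, A, eps)
print(np.var(q) / theory)  # close to 1
```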



and AcnT is a nonstochastic symmetric matrix. From Yu et al. (2008), p.128, Ec0 νnT = σ02 tr (AcnT ) = 0, !   nT ∞ nT   0 4 c 0 c var c νnT = T σ0 Etr ∑ Pnh Pnh + σ02 EbcnT 0 bcnT + 2σ04 tr AcnT 2 + 2µ (3) E ∑ [bcnT ]i [AcnT ]ii + µ (4) − 3σ04 ∑ [AcnT ]2ii . i=1

h=1

Explicitly,

1 0 nT var (c νnT )



D˜ nT

1 nT tr

i=1

= σ04 c0 (D¯ nT + ΣnT + o(1)) c, where D¯ nT = p limn,T →∞ D˜ nT , and

MΓ˜ n Rn Z¯ 1 MFT Z¯ 10 R0n .. .



···

   1    = 2  1 tr MΓ˜ Rn Z¯ K MFT Z¯ 10 R0n ··· nT n σ0   1  tr MΓ˜ Rn Z¯ K+1 MFT Z¯ 10 R0n · · · n  nT 0 ···



1 nT tr

MΓ˜ n Rn Z¯ 1 MFT Z¯ K0 R0n .. .

1 nT tr

MΓ˜ n Rn Z¯ K MFT Z¯ K0 R0n

1 nT tr



MΓ˜ n Rn Z¯ K+1 MFT Z¯ K0 R0n 0

48

 0  ..  .    1 0 0 ¯ ¯ 0 nT tr MΓ˜ n Rn ZK MFT ZK+1 Rn    1 0 0 ¯ ¯  tr M R Z M Z R 0 Γ˜ n n K+1 FT K+1 n nT  0 0 1 nT tr



0 MΓ˜ n Rn Z¯ 1 MFT Z¯ K+1 R0n .. .





φ1,2  φ1,1   φ1,2 φ2,2    0 0   . . +  .   0 0    φ1,K+1 φ2,K+1  0 0

0 ···

φ1,K+1

0 ··· .. .. . .

0 .. .

0

0

0 ···

0

φ2,K+1

0

0 · · · φK+1,K+1 + ψK+1,K+1 0 ···

ψK+1,K+2



   0    0   .. , .    0    ψK+1,K+2   ψK+2,K+2

(D.9)

and φ1,1 = (1/n) tr(Rn Pn Rn′), φ1,2 = (1/n) tr(Rn Pn Wn′ Rn′), φ2,2 = (1/n) tr(Rn Wn Pn Wn′ Rn′), φ1,K+1 = (1/n) tr( Rn Pn (γ0 Gn′ + ρ0 Wn′ Gn′) Rn′ ), φ2,K+1 = (1/n) tr( Rn Wn Pn (γ0 Gn′ + ρ0 Wn′ Gn′) Rn′ ), and φK+1,K+1 = (1/n) tr( Rn (γ0 Gn + ρ0 Gn Wn) Pn (γ0 Gn′ + ρ0 Wn′ Gn′) Rn′ ), where Pn = Σ_{h=1}^{∞} An^{h−1} Sn^{−1} Rn^{−1} Rn′^{−1} Sn′^{−1} (An′)^{h−1}; and ψK+1,K+1 = (1/n) tr( Rn Gn Rn^{−1} Rn′^{−1} Gn′ Rn′ ) + (1/n) tr(Gn²) − 2( (1/n) tr(Gn) )², ψK+1,K+2 = (1/n) tr(Gn′ G̃n) + (1/n) tr(Rn Gn Rn^{−1} G̃n) − 2( (1/n) tr(Gn) )( (1/n) tr(G̃n) ), and ψK+2,K+2 = (1/n) tr(G̃n′ G̃n) + (1/n) tr(G̃n²) − 2( (1/n) tr(G̃n) )².

ΣnT is defined in Eq. (10).

Lemma 10 shows that DnT = D̃nT + oP(1); therefore (1/nT) var(c′νnT) = σ0⁴ c′ ( DnT + ΣnT + oP(1) ) c.

Proof of Theorem 3.

The theorem follows from Theorem 2 by applying the CLT for martingale difference arrays.

Proof of Theorem 4.

We have the following useful results.

 

(1) σ0² can be estimated by LnT(θ̂nT). This is because, from Lemma 8 and ‖θ̂nT − θ0‖_1 = OP(1/√nT), LnT(θ̂nT) = LnT(θ0) + OP(1/√nT); and from Eq. (C.5), LnT(θ0) = σ0² + OP(1/√nT).

(2) ‖R̂n − Rn‖_2 = OP(1/√nT), because R̂n = Rn + (α0 − α̂) W̃n and ‖θ̂nT − θ0‖_1 = OP(1/√nT).

(3) By the mean value theorem, G̃̂n − G̃n = W̃n Rn(ᾱ)^{−1} W̃n Rn(ᾱ)^{−1} (α̂ − α0) for some ᾱ between α0 and α̂. Because ‖W̃n‖_2 and ‖Rn^{−1}‖_2 are uniformly bounded, ‖Rn(α)^{−1}‖_2 is uniformly bounded in a neighborhood of α0 by Lemma 1. Hence ‖G̃̂n − G̃n‖_2 = OP(1/√nT). Similarly, ‖Ĝn − Gn‖_2 = OP(1/√nT) and ‖R̂n^{−1} − Rn^{−1}‖_2 = OP(1/√nT).

(4) ‖ẐK+1 − ZK+1‖_2 = OP(1), because ẐK+1 − ZK+1 = Σ_{k=1}^{K} δ̂k Ĝn Zk − Σ_{k=1}^{K} δ0k Gn Zk = Σ_{k=1}^{K} δ̂k (Ĝn − Gn) Zk + Σ_{k=1}^{K} (δ̂k − δ0k) Gn Zk, ‖δ̂ − δ0‖_1 = OP(1/√nT), and, from (3), ‖Ĝn − Gn‖_2 = OP(1/√nT).

To prove Theorem 4, it is sufficient to show that ϕ̂nT − ϕnT →p 0, which immediately follows from (1) to (4) above. For example,

| tr(PΓ̂̃n G̃̂n) − tr(PΓ̃n G̃n) | ≤ | tr( (PΓ̂̃n − PΓ̃n) G̃̂n ) | + | tr( PΓ̃n (G̃̂n − G̃n) ) |
≤ 2 r0 ‖PΓ̂̃n − PΓ̃n‖_2 ‖G̃̂n‖_2 + r0 ‖PΓ̃n‖_2 ‖G̃̂n − G̃n‖_2 = OP(1/√n),

where ‖PΓ̂̃n − PΓ̃n‖_2 = OP(1/√n) by Lemma 11 and ‖G̃̂n − G̃n‖_2 = OP(1/√nT) from (3). Furthermore, D̂nT = DnT + oP(1) because, for example, (1/nT)[ tr( MΓ̂̃n R̂n Zk MF̂T ẐK+1′ R̂n′ ) − tr( MΓ̃n Rn Zk MFT ZK+1′ Rn′ ) ] = oP(1) for k = 1,⋯,K by (1) to (4) above; (1/n)[ tr(G̃̂n) − tr(G̃n) ] = OP(1/√nT); (1/n)[ tr(Ĝn) − tr(Gn) ] = OP(1/√nT); and tr( PΓ̂̃n R̂n Ĝn R̂n^{−1} ) − tr( PΓ̃n Rn Gn Rn^{−1} ) = OP(1/√n). Finally, D̂nT^{−1} = DnT^{−1} + oP(1).

Proof of Theorem 5.

Our objective is to show that lim_{n,T→∞} Pr( PC(r) < PC(r0) ) = 0 for r ≠ r0 and r ≤ rmax. The consistency of the IC(r) criterion then follows by arguments similar to those for Corollary 1 in Bai and Ng (2002). Denote by σi(A) the i-th largest singular value of a matrix A, so that σ1(A) = ‖A‖_2 and σi(A) = √( μi(AA′) ). The following singular value inequality is useful (see Theorem 8.13 in Zhang (2011)):

σi(A) − σ1(B) ≤ σi(A + B) ≤ σi(A) + σ1(B),   (D.10)

where A and B are m × n matrices.
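A quick numerical sanity check of (D.10), which is Weyl's inequality for singular values; the matrix sizes are arbitrary:

```python
# Verify sigma_i(A) - sigma_1(B) <= sigma_i(A + B) <= sigma_i(A) + sigma_1(B)
# for all i, on random rectangular matrices.
import numpy as np

rng = np.random.default_rng(5)
m, k = 40, 25
A = rng.standard_normal((m, k))
B = rng.standard_normal((m, k))

sA = np.linalg.svd(A, compute_uv=False)       # descending
sB = np.linalg.svd(B, compute_uv=False)
sAB = np.linalg.svd(A + B, compute_uv=False)

assert np.all(sAB <= sA + sB[0] + 1e-10)      # upper bound, all i
assert np.all(sAB >= sA - sB[0] - 1e-10)      # lower bound, all i
print("inequality (D.10) holds for all i")
```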

First suppose r > r0 and r ≤ rmax. PC(r) < PC(r0) is equivalent to

V(r) + r · g(n,T) < V(r0) + r0 · g(n,T) ⟺ V(r0) − V(r) > (r − r0) g(n,T) ⟺ (cnT/nT) Σ_{i=r0+1}^{r} μi( D̆nT D̆nT′ ) > cnT (r − r0) g(n,T),

where, recall, D̆nT = Γn FT′ + ĔnT; V(r0) − V(r) = (1/nT) Σ_{i=r0+1}^{r} μi( D̆nT D̆nT′ ) follows from the definition of V(r) and Eq. (12); and cnT = min(n,T). From Proposition 3 and Assumption R, the composite error ĔnT satisfies ‖ĔnT‖_2 = OP( max(√n, √T) ). From Eq. (D.10), for i > r0, and noting that σi(Γn FT′) = 0 because its rank is r0,

σi(D̆nT) ≤ σi(Γn FT′) + σ1(ĔnT) = σ1(ĔnT) = ‖ĔnT‖_2 = OP( max(√n, √T) ).

Therefore (cnT/nT) Σ_{i=r0+1}^{r} μi( D̆nT D̆nT′ ) = OP(1). Because cnT g(n,T) → ∞, lim_{n,T→∞} Pr( PC(r) < PC(r0) ) = 0.

Next consider the case r < r0. PC(r) < PC(r0) is equivalent to

V(r) − V(r0) < (r0 − r) g(n,T) ⟺ (1/nT) Σ_{i=r+1}^{r0} μi( D̆nT D̆nT′ ) < (r0 − r) g(n,T).

From Eq. (D.10),

(1/√nT) σi(D̆nT) ≥ (1/√nT) σi(Γn FT′) − (1/√nT) σ1(ĔnT) = (1/√nT) σi(Γn FT′) + OP( cnT^{−1/2} ).

By Assumption SF, plim_{n,T→∞} (1/nT) μr( Γn FT′ FT Γn′ ) > 0 for r ≤ r0. Together with g(n,T) → 0, this implies lim_{n,T→∞} Pr( PC(r) < PC(r0) ) = 0 for r < r0.
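The selection logic of Theorem 5 can be sketched numerically: V(r) drops sharply until r reaches r0 and only slowly thereafter, so any penalty with g(n,T) → 0 and cnT · g(n,T) → ∞ picks out r0. The penalty below is one Bai-Ng-style choice, used for illustration only:

```python
# Minimize PC(r) = V(r) + r * g(n, T) over r = 0, ..., rmax, where V(r) is
# the scaled sum of the eigenvalues discarded after keeping r factors.
import numpy as np

rng = np.random.default_rng(6)
n, T, r0, rmax = 100, 120, 2, 8
D = rng.standard_normal((n, r0)) @ rng.standard_normal((r0, T)) \
    + rng.standard_normal((n, T))            # r0 strong factors + noise

mu = np.linalg.svd(D, compute_uv=False) ** 2 / (n * T)   # descending
V = np.array([mu[r:].sum() for r in range(rmax + 1)])
g = (n + T) / (n * T) * np.log(n * T / (n + T))          # illustrative penalty
PC = V + np.arange(rmax + 1) * g

r_hat = int(np.argmin(PC))
print(r_hat)  # recovers the true number of factors, 2
```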