Infinite-dimensional VARs and factor models



Journal of Econometrics 163 (2011) 4–22

Contents lists available at ScienceDirect

Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom

Infinite-dimensional VARs and factor models✩
Alexander Chudik a,b, M. Hashem Pesaran b,c,d,∗

a European Central Bank, Kaiserstrasse 29, 60311 Frankfurt am Main, Germany
b Centre for International Macroeconomics and Finance, University of Cambridge, Austin Robinson Building, Sidgwick Avenue, Cambridge, CB3 9DD, UK
c Faculty of Economics, Cambridge University, Austin Robinson Building, Sidgwick Avenue, Cambridge, CB3 9DD, UK
d University of Southern California, College of Letters, Arts and Sciences, University Park Campus, Kaprielian Hall 300, KAP M/C 0253, Los Angeles, CA, USA

Article info

Article history: Available online 11 November 2010
JEL classification: C10, C33, C51
Keywords: Large N and T panels; Weak and strong cross-section dependence; VARs; Spatial models; Factor models

Abstract

This paper proposes a novel approach for dealing with the ‘curse of dimensionality’ in the case of infinite-dimensional vector autoregressive (IVAR) models. It is assumed that each unit or variable in the IVAR is related to a small number of neighbors and a large number of non-neighbors. The neighborhood effects are fixed and do not change with the number of units (N), but the coefficients of non-neighboring units are restricted to vanish in the limit as N tends to infinity. Problems of estimation and inference in a stationary IVAR model with an unknown number of unobserved common factors are investigated. A cross-section augmented least-squares (CALS) estimator is proposed and its asymptotic distribution is derived. Satisfactory small-sample properties are documented by Monte Carlo experiments. An empirical illustration shows the statistical significance of dynamic spillover effects in modeling US real house prices across neighboring states. © 2010 Elsevier B.V. All rights reserved.

1. Introduction

Vector autoregressive (VAR) models provide a flexible framework for the analysis of complex dynamics and interactions that exist across economic variables or units. Traditional VARs assume that the number of such variables, N, is fixed and the time dimension, T, tends to infinity. But since the number of parameters to be estimated grows at a quadratic rate with N, in practice the empirical applications of VARs often involve only a handful of variables. The objective of this paper is to consider VARs where both N and T are large. In this case, the parameters of the VAR model can no longer be consistently estimated unless suitable restrictions are imposed to overcome the dimensionality problem.

Two different approaches have been suggested in the literature to deal with this ‘curse of dimensionality’: (i) shrinkage of the parameter space, and (ii) shrinkage of the data. The spatial and/or spatiotemporal literature shrinks the parameter space by using a priori given spatial weight matrices that restrict the nature of the links across the units. Alternatively, prior probability distributions are imposed on the parameters of the VAR, such as the ‘Minnesota’ priors proposed by Doan et al. (1984). This class of models is known as Bayesian VARs (BVARs).1 The second approach is to shrink the data, along the lines of index models. Geweke (1977) and Sargent and Sims (1977) introduced dynamic factor models, which have more recently been generalized to allow for weak cross-section dependence by Forni and Lippi (2001) and Forni et al. (2000, 2004). Empirical evidence suggests that only a few dynamic factors are needed to explain the co-movements of macroeconomic variables.2 This has led to the development of factor-augmented VAR (FAVAR) models by Bernanke et al. (2005) and Stock and Watson (2005), among others.

Applied researchers are often forced to impose arbitrary restrictions on the coefficients that link the variables of a given cross-section unit to the current and lagged values of the remaining units, mostly because they realize that without such restrictions the model is not estimable. This paper proposes a novel way to deal with the curse of dimensionality by shrinking part of the parameter space in the limit as the number of variables (N) tends to infinity. An important example would be a VAR model in which each unit is related to a small number of neighbors and a large number of non-neighbors. The neighbors could be individual units or, more generally, linear combinations of units (spatial averages). The neighborhood effects are fixed and do not change with N, but the coefficients corresponding to the remaining non-neighbor units are small, of order O(N⁻¹). Such neighborhood and non-neighborhood effects could be motivated by theoretical economic considerations, or could arise due to the mis-specification of spatial weights. Although under this set-up each of the non-neighboring coefficients is small, the sum of their absolute values in general does not tend to zero, and the aggregate spatiotemporal non-neighborhood effects could be large. This paper shows that, under weak cross-section dependence, the spillover effects from non-neighboring units are neither particularly important, nor estimable.3 But the coefficients associated with the neighboring units can be consistently estimated by simply ignoring the non-neighborhood effects, which are of second-order importance in N. On the other hand, if the units are cross-sectionally strongly dependent, then the spillover effects from non-neighbors are in general important, and ignoring such effects can lead to inconsistent estimates.

✩ We are grateful to Elisa Tosetti, Jean-Pierre Urbain, and three anonymous referees for helpful comments and constructive suggestions. This version has also benefited from comments by the seminar participants at the University of California, San Diego, University of Southern California, Columbia University, University of Leicester, European University Institute, McGill University, Princeton University, University of Pennsylvania, and Stanford University, as well as by the conference participants in the Nordic Econometric Meeting held at Lund University, and in the conference on Factor Structures for Panels and Multivariate Time Series Data held at Maastricht University. We would also like to acknowledge Takashi Yamagata for carrying out the computation of the results reported in Section 6.
∗ Corresponding author at: Faculty of Economics, Cambridge University, Austin Robinson Building, Sidgwick Avenue, Cambridge, CB3 9DD, UK. URL: http://www.econ.cam.ac.uk/faculty/pesaran/ (M.H. Pesaran).

1 Other types of prior have also been considered in the literature. See, for example, Del Negro and Schorfheide (2004) for a recent reference. In most applications, BVARs have been applied to relatively small systems (e.g. Leeper et al. (1996) considered 13- and 18-variable BVARs; a few exceptions include Giacomini and White (2006) and De Mol et al. (2008)), with the focus being mainly on forecasting. Bayesian VARs are known to produce better forecasts than unrestricted VARs or structural models. See Litterman (1986) and Canova (1995) for further references.

0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.11.002
Another model of interest arises when, in addition to the neighborhood effects, there is also a fixed number of dominant units that have non-negligible effects on all other units. In this case the limiting outcome is shown to be a dynamic factor model.4 Accordingly, this paper provides a link between the data and parameter shrinkage approaches to mitigating the curse of dimensionality: by imposing limiting restrictions on some of the parameters of the VAR, we effectively end up with data shrinkage. To distinguish high-dimensional VAR models from the standard specifications, we refer to the former as infinite-dimensional VARs, or IVARs for short. The paper also establishes the conditions under which the global VAR (GVAR) approach proposed by Pesaran et al. (2004) is applicable.5 In particular, an IVAR featuring all macroeconomic variables can be arbitrarily well approximated by a set of finite-dimensional small-scale models that can be consistently estimated separately, in the spirit of the GVAR.

A second contribution of this paper is the development of appropriate econometric techniques for estimation and inference in stationary IVAR models with an unknown number of unobserved common factors. This extends the analysis of Pesaran (2006) to dynamic models where all variables are determined endogenously. A simple cross-section augmented least-squares estimator (CALS for short) is proposed and its asymptotic distribution derived. Small-sample properties of the proposed estimator are investigated through Monte Carlo experiments. As an illustration of the proposed approach, we consider an extension of the empirical analysis of real house prices across the 49 US states conducted recently by Holly et al. (2010), and show statistically significant dynamic spillover effects of real house prices across neighboring states.

The remainder of the paper is organized as follows. Section 2 introduces the IVAR model. Section 3 investigates cross-section dependence in IVAR models. Section 4 focuses on the estimation of a stationary IVAR model. Section 5 discusses the results of the Monte Carlo (MC) experiments, and Section 6 presents the empirical results. The final section offers some concluding remarks. Proofs are provided in the Appendix.

We give a brief word on notation. $|\lambda_1(A)| \ge |\lambda_2(A)| \ge \cdots \ge |\lambda_n(A)|$ are the eigenvalues of $A \in \mathbb{M}^{n\times n}$, where $\mathbb{M}^{n\times n}$ is the space of real-valued $n\times n$ matrices. $\|A\|_1 \equiv \max_{1\le j\le n}\sum_{i=1}^{n}|a_{ij}|$ denotes the maximum absolute column sum matrix norm of $A$, and $\|A\|_\infty \equiv \max_{1\le i\le n}\sum_{j=1}^{n}|a_{ij}|$ is the maximum absolute row sum matrix norm of $A$. $\|A\| = \sqrt{\varrho(A'A)}$ is the spectral norm of $A$, and $\varrho(A) \equiv \max_{1\le i\le n}\{|\lambda_i(A)|\}$ is the spectral radius of $A$.6 All vectors are column vectors, and the $i$th row of $A$ is denoted by $a_i'$. $a_n = O(b_n)$ denotes that the deterministic sequence $\{a_n\}$ is at most of order $b_n$. $x_n = O_p(y_n)$ states that the random variable $x_n$ is at most of order $y_n$ in probability. $\mathbb{N}$ is the set of natural numbers, and $\mathbb{Z}$ is the set of integers. We use $K$ and $\epsilon$ to denote positive fixed constants that do not vary with $N$ or $T$. Convergence in distribution and convergence in probability are denoted by $\xrightarrow{d}$ and $\xrightarrow{p}$, respectively. The symbol $\xrightarrow{q.m.}$ represents convergence in quadratic mean. $(N,T)\xrightarrow{j}\infty$ denotes joint asymptotics in $N$ and $T$, with $N$ and $T\to\infty$ in no particular order.

2. Infinite-dimensional vector autoregressive models

Suppose we have T time series observations on N cross-section units indexed by i ∈ S(N) ≡ {1, ..., N} ⊆ ℕ. Individual units could be households, firms, regions, or countries. Both dimensions, N and T, are assumed to be large. For each point in time, t, and for each N ∈ ℕ, the N cross-section observations are collected in the N × 1 vector x(N),t = (x(N),1t, ..., x(N),Nt)′, and it is assumed that x(N),t follows the VAR(1) model:

$$x_{(N),t} = \Phi_{(N)}\,x_{(N),t-1} + u_{(N),t}, \tag{1}$$

$$u_{(N),t} = R_{(N)}\,\varepsilon_{(N),t}. \tag{2}$$

Φ(N) and R(N) are N × N coefficient matrices that capture the dynamic and contemporaneous dependences across the N units, and ε(N),t = (ε1t, ε2t, ..., εNt)′ is an N × 1 vector of white noise errors with mean 0 and covariance matrix I_N. VAR models have been extensively studied when N is small and fixed, and T is large and unbounded. This framework, however, is not appropriate for many empirical applications of interest. This paper aims to fill this gap by analyzing VAR models where both N and T are large. The sequence of models (1) and (2) with dim(x(N),t) = N → ∞ will be referred to as the infinite-dimensional VAR model, or IVAR for short. The extension of the IVAR(1) to IVAR(p), where p is fixed, is relatively straightforward, and will not be attempted in this paper.

The analysis of dependence over time is simplified by the facts that the ordering of observations along the time dimension (t = 1, 2, ..., T) is immutable and that the arrival of new observations cannot change past realizations; namely, bygones are bygones. As a consequence, for any given N, i, and j, the cross-time covariance function, cov(x(N),it, x(N),j,t−ℓ), does not change with T and depends only on ℓ if the time series processes are covariance stationary. However, since it cannot be assumed that an immutable ordering necessarily exists with respect to the cross-section dimension, the addition of new cross-section units to an existing set can potentially alter the pair-wise cross-section covariances of all the units. For instance, in models of oligopoly, where firms strategically interact with each other, new entries can change the relationship between the existing firms. Similarly, the introduction of a new asset in the market can change the correlation of returns on the existing assets. In what follows, to simplify the notation, the explicit dependence of xt and ut and the related parameter matrices on N will be suppressed, with (1)–(2) written as

$$x_t = \Phi x_{t-1} + u_t, \tag{3}$$

and

$$u_t = R\varepsilon_t. \tag{4}$$

2 Stock and Watson (1999, 2002) and Giannone et al. (2005) conclude that only a few, perhaps two, factors explain much of the predictable variations, while Bai and Ng (2007) estimate four factors and Stock and Watson (2005) estimate as many as seven factors.
3 Concepts of strong and weak cross-section dependence, introduced in Chudik et al. (2010), will be applied to VAR models.
4 The case of IVAR models with a dominant unit is studied in Pesaran and Chudik (2010).
5 The GVAR model has been used to analyse credit risk in Pesaran et al. (2006, 2007). An extended and updated version of the GVAR by Dées et al. (2007), which treats the Euro area as a single economic area, was used by Pesaran et al. (2007) to evaluate UK entry into the Euro. Global dominance of the US economy in a GVAR model is considered in Chudik (2008). Further developments of a global modeling approach are provided in Pesaran and Smith (2006). Garratt et al. (2006) provide a textbook treatment of GVAR.
6 Note that, if x is a vector, then ‖x‖ = √ϱ(x′x) = √(x′x) corresponds to the Euclidean length of vector x.

Clearly, it is not possible to estimate all the N² elements of the matrix Φ when both N and T are large. Only a small (fixed) number of unknown coefficients can be estimated per equation, and some restrictions on Φ must be imposed. In order to deal with the dimensionality problem, we assume that, for a given i ∈ ℕ, it is possible to classify cross-section units a priori into ‘neighbors’ and ‘non-neighbors’. No restrictions are imposed on neighbors, but the non-neighbors are assumed to have only negligible effects on xit that vanish at a suitable rate with N. The number of neighbors of unit i, collected in the index set 𝒩i, is assumed to be small (fixed). Neighbors of unit i can have non-negligible effects that do not vanish even if N → ∞. A similar classification is followed in the spatial econometrics literature, where the non-neighborhood effects are set to zero for all N and the non-zero neighborhood effects are often assumed to be homogeneous across i. In this sense, our analysis can also be seen as an extension of spatial econometric models. Subject to the above classification, the equation for unit i can be written as

$$x_{it} = \underbrace{\sum_{j\in\mathcal{N}_i}\phi_{ij}x_{j,t-1}}_{\text{Neighbors}} + \underbrace{\sum_{j\in\mathcal{N}_i^c}\phi_{ij}x_{j,t-1}}_{\text{Non-neighbors}} + u_{it}. \tag{5}$$

The coefficients of the neighboring units, $\{\phi_{ij}\}_{j\in\mathcal{N}_i}$, are the parameters of interest, and they do not vary with N. The remaining coefficients, $\{\phi_{ij}\}_{j\in\mathcal{N}_i^c}$, tend to zero for each i as N → ∞, where $\mathcal{N}_i^c \equiv \{1,\ldots,N\}\setminus\mathcal{N}_i$ is the index set of non-neighbors. Note that the non-neighbors are unordered. More specifically,

$$|\phi_{ij}| \le \frac{K}{N} \quad \text{for any } N\in\mathbb{N} \text{ and any } j\in\mathcal{N}_i^c. \tag{6}$$
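Summing (6) over the non-neighbors makes the contrast between individual and aggregate effects explicit. Writing $|\mathcal N_i|$ for the fixed number of neighbors of unit i, a short derivation that uses only (6) gives

$$\sum_{j=1}^{N}|\phi_{ij}| = \sum_{j\in\mathcal N_i}|\phi_{ij}| + \sum_{j\in\mathcal N_i^c}|\phi_{ij}| \le \sum_{j\in\mathcal N_i}|\phi_{ij}| + \big(N-|\mathcal N_i|\big)\frac{K}{N} < \sum_{j\in\mathcal N_i}|\phi_{ij}| + K,$$

$$\sum_{j\in\mathcal N_i^c}\phi_{ij}^2 \le \big(N-|\mathcal N_i|\big)\frac{K^2}{N^2} < \frac{K^2}{N} \to 0.$$

The first bound does not depend on N, while the second vanishes: the non-neighbor coefficients are individually negligible and even their sum of squares tends to zero, yet nothing forces their sum of absolute values to shrink.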

Individually, the coefficients of non-neighbors are asymptotically negligible, but, as we argue below, it is not clear whether the same applies to their aggregate effect on the ith unit, namely $\sum_{j\in\mathcal N_i^c}\phi_{ij}x_{j,t-1}$.

The bounds in (6) ensure that $\lim_{N\to\infty}\sum_{j=1}^{N}|\phi_{ij}| < K$. We refer to this as the ‘cross-section absolute summability condition’, which is distinct from the absolute summability condition used in the time series literature, where the same idea is applied to the coefficients of current and past innovations. A similar constraint is used in Lasso and Ridge regression shrinkage methods. The Lasso estimation procedure applied to (3) involves minimizing $\sum_{t=1}^{T}u_{it}^2$ for each i subject to $\sum_{j=1}^{N}|\phi_{ij}| \le K$. Under Ridge regression, the minimization is carried out subject to the weaker constraint $\sum_{j=1}^{N}\phi_{ij}^2 \le K$.7 In applications of shrinkage methods, it is necessary that K be specified a priori, but no knowledge of the ordering of the units along the cross-section dimension is needed. In our approach, we do not need to specify the value of K. The sum of the coefficients of the non-neighboring units, $\sum_{j\in\mathcal N_i^c}\phi_{ij}$, does not necessarily tend to zero as N → ∞, which implies that the non-neighbors can have a large aggregate spatiotemporal impact on unit i as N → ∞.

The question that we address is whether it is possible to estimate the neighborhood coefficients $\{\phi_{ij}\}_{j\in\mathcal N_i}$ without imposing further restrictions. As it turns out, the answer depends on the stochastic behavior of $\sum_{j\in\mathcal N_i^c}\phi_{ij}x_{j,t-1}$, which in turn depends on the strength of cross-section dependence in {xit}. If {xit} is weakly cross-sectionally dependent, then $\sum_{j\in\mathcal N_i^c}\phi_{ij}x_{j,t-1}\xrightarrow{q.m.}0$, and the spillover effects from non-neighboring units are neither particularly important nor estimable. But the coefficients associated with the neighboring units can be consistently estimated by simply ignoring the non-neighborhood effects, which are of second-order importance in N. If, on the other hand, {xit} is strongly cross-sectionally dependent, then $\lim_{N\to\infty}\operatorname{Var}\big(\sum_{j\in\mathcal N_i^c}\phi_{ij}x_{j,t-1}\big)$ is not necessarily zero, and the spillover effects from non-neighbors are in general Op(1) and important. Therefore, ignoring the non-neighborhood effects can lead to inconsistent estimates. The concepts of weak and strong cross-section dependence were introduced in Chudik et al. (2010), and these concepts are applied to the IVAR model in the next section.

Our approach to dealing with the curse of dimensionality can be motivated with a couple of examples. One important example is provided by the Arbitrage Pricing Theory (APT), originally developed by Ross (1976). Under approximate pricing, the conditional mean returns of N risky assets, µt, are modeled in terms of a fixed number (k) of factor risk premia, λt, and an N × 1 vector of pricing errors, vt, namely µt = Bλt + vt, where B is an N × k matrix of factor loadings. In the absence of arbitrage opportunities, we must have vt = 0 when N is fixed, or v′t vt = Op(1) as N → ∞ (see Huberman (1982) and Ingersoll (1984)). It is clear that any pair-wise dependence of pricing errors must vanish as N → ∞; otherwise there would be unbounded profitable opportunities. Another example relates to a multi-country DSGE model discussed in Chudik (2008). The country interactions need not be symmetric. Nevertheless, as long as the foreign trade weights are granular, the equilibrium solution of such a multi-country DSGE model has a similar structure to the basic IVAR model set out in this paper. Neighbors in this set-up could, for example, be identified in terms of trade shares. For instance, the US would be Canada’s neighbor, considering that 80% of Canada’s trade is with the US, although using the same metric Canada might not qualify as a neighbor of the US. In some cases, the strict division of individual units into neighbors and non-neighbors might be considered too restrictive.
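The weak versus strong dependence contrast can be made concrete with a small calculation. The sketch below computes, in closed form rather than by simulation, the variance of the non-neighbor aggregate $\sum_{j\in\mathcal N_i^c}\phi_{ij}x_{j,t-1}$ when every non-neighbor coefficient equals K/N, so that (6) holds while the coefficients sum to roughly K. The data-generating choices are illustrative assumptions, not taken from the paper: a symmetric two-neighbor Φ with φii = 0.5 and φi,i±1 = 0.2, R = I, K = 0.5, and, for the strongly dependent case, a single unit-variance factor with all loadings equal to one (the helper names are hypothetical).

```python
import numpy as np


def two_neighbor_phi(N, own=0.5, nb=0.2):
    """Symmetric tridiagonal Phi of a two-neighbor IVAR (illustrative values)."""
    return (np.diag(np.full(N, own))
            + np.diag(np.full(N - 1, nb), 1)
            + np.diag(np.full(N - 1, nb), -1))


def nonneighbor_variance(N, K=0.5, sigma_f2=1.0):
    """Var of sum_{j not in N_i} phi_ij x_{j,t-1} with phi_ij = K/N, for a middle unit i.

    Returns (weak, strong): x_t = y_t (weakly dependent) and x_t = gamma f_t + y_t
    with one factor and unit loadings (strongly dependent).
    """
    Phi = two_neighbor_phi(N)
    # y_t = Phi y_{t-1} + eps_t with R = I. Phi is symmetric with spectral radius < 1,
    # so Var(y_t) = sum_l Phi^(2l) = (I - Phi^2)^(-1).
    Sigma_y = np.linalg.inv(np.eye(N) - Phi @ Phi)
    i = N // 2
    phi_b = np.full(N, K / N)
    phi_b[[i - 1, i, i + 1]] = 0.0               # drop unit i and its two neighbors
    weak = phi_b @ Sigma_y @ phi_b
    gamma = np.ones(N)                             # hypothetical factor loadings
    strong = weak + sigma_f2 * (phi_b @ gamma) ** 2   # factor independent of y_t
    return weak, strong


for N in (50, 100, 200, 400):
    w, s = nonneighbor_variance(N)
    print(f"N={N:4d}  weak={w:.5f}  strong={s:.5f}")
```

Under these assumptions the weakly dependent variance shrinks roughly like 1/N, whereas adding the factor keeps the variance near K² = 0.25, which is the sense in which the non-neighbor spillovers stay Op(1) under strong dependence.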
7 See Section 3.4.3 of Hastie et al. (2001) for a detailed description of the Lasso and Ridge regression shrinkage methods.

In the assumption below, we consider a slightly more general set-up in which the neighborhood effects are characterized in terms of ‘local’ averages defined by S′i xt, where Si is a known spatial or neighborhood weight matrix.

Assumption 1. Let 𝒦 ⊆ ℕ be a non-empty index set. For any i ∈ 𝒦, row i of the coefficient matrix Φ, denoted by φ′i, can be divided as

$$\phi_i' = \phi_{ai}' + \phi_{bi}', \tag{7}$$

where

$$\|\phi_{bi}\|_\infty = \max_{j\in\{1,\ldots,N\}}|\phi_{bij}| < \frac{K}{N}, \tag{8}$$

$$\phi_{ai} = S_i\delta_i, \tag{9}$$

‖δi‖ < K, δi is an hi × 1 vector containing the unknown coefficients to be estimated for unit i, which do not change with N, hi < K is fixed and generally small, and Si is a known N × hi ‘spatial’ weight matrix such that ‖Si‖₁ < K.

Assuming 𝒦 ≡ ℕ and stacking (7)–(9) for i = 1, 2, ..., N, we have

$$\Phi = \Phi_a + \Phi_b = DS + \Phi_b, \tag{10}$$

where Φa = (φa1, φa2, ..., φaN)′, Φb = (φb1, φb2, ..., φbN)′,

$$\underset{N\times h}{D} = \begin{pmatrix}\delta_1' & 0 & \cdots & 0\\ 0 & \delta_2' & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \delta_N'\end{pmatrix}, \tag{11}$$

$h = \sum_{i=1}^{N}h_i$, and S is a known h × N matrix defined by S = (S1, S2, ..., SN)′. Note also that by assumption the individual elements of Φb are uniformly O(N⁻¹).

Example 1. An example of Φa is given by

$$\Phi_a = \begin{pmatrix}
\phi_{11} & \phi_{12} & 0 & 0 & \cdots & 0 & 0 & 0\\
\phi_{21} & \phi_{22} & \phi_{23} & 0 & \cdots & 0 & 0 & 0\\
0 & \phi_{32} & \phi_{33} & \phi_{34} & \cdots & 0 & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
0 & 0 & 0 & 0 & \cdots & \phi_{N-1,N-2} & \phi_{N-1,N-1} & \phi_{N-1,N}\\
0 & 0 & 0 & 0 & \cdots & 0 & \phi_{N,N-1} & \phi_{NN}
\end{pmatrix}, \tag{12}$$

where the non-zero elements are fixed coefficients that do not change with N. This represents a bilateral spatial representation in which each unit, except for the first and the last units, has one left and one right neighbor. In contrast, the individual elements of Φb are of order O(N⁻¹); in particular, |φbij| < K/N for any N ∈ ℕ and any i, j ∈ {1, ..., N}. The equation for unit i ∈ {2, ..., N − 1} can be written as

$$x_{it} = \phi_{i,i-1}x_{i-1,t-1} + \phi_{ii}x_{i,t-1} + \phi_{i,i+1}x_{i+1,t-1} + \phi_{bi}'x_{t-1} + u_{it}. \tag{13}$$

Section 3 shows that, under weak cross-section dependence of the errors {uit}, $\phi_{bi}'x_{t-1}\xrightarrow{q.m.}0$, while Section 4 considers the problem of estimating the individual-specific parameters {φi,i−1, φii, φi,i+1}. We refer to this model as a two-neighbor IVAR model, which we use later for illustrative purposes as well as in the Monte Carlo experiments.

Example 2. As a simple example, consider the model

$$x_t = \rho_x S_x x_{t-1} + u_t, \tag{14}$$

$$u_t = \rho_u S_u u_t + \varepsilon_t, \tag{15}$$

where ρx and ρu are unknown scalar coefficients, and Sx and Su are N × N known spatial weight matrices. This model can be obtained from (1)–(2) by setting

$$R = (I - \rho_u S_u)^{-1}, \quad S = S_x, \quad \delta_i = \rho_x \ \text{for } i\in\{1,\ldots,N\}, \quad \text{and} \quad \Phi_b = 0,$$

so that hi = 1 for all i, D = ρx I_N, and Φa = DS = ρx Sx.

3. Cross-sectional dependence in stationary IVAR models

This section investigates the correlation pattern of {xit} over time, t, and across the cross-section units, i. We follow Chudik et al. (2010) and define the covariance stationary process {xit} to be cross-sectionally weakly dependent (CWD) if, for all weight vectors, w = (w1, ..., wN)′, satisfying the ‘granularity’ conditions

$$\|w\| = O\big(N^{-1/2}\big), \tag{16}$$

$$\frac{|w_j|}{\|w\|} = O\big(N^{-1/2}\big) \ \text{for any } j, \tag{17}$$

we have

$$\lim_{N\to\infty}\operatorname{Var}(w'x_t) = 0 \quad \text{for all } t. \tag{18}$$

{xit} is said to be cross-sectionally strongly dependent (CSD) if there exists a sequence of weight vectors, w, satisfying (16)–(17) and a constant K such that

$$\lim_{N\to\infty}\operatorname{Var}(w'x_t) \ge K > 0.$$

The necessary condition for covariance stationarity for fixed N is that all eigenvalues of Φ lie inside the unit circle. For a fixed N, and assuming that maxi |λi(Φ)| < 1, the Euclidean norm of Φ^ℓ, defined by [Tr(Φ^ℓ Φ^ℓ′)]^{1/2}, tends to zero exponentially in ℓ, and the process $x_t = \sum_{\ell=0}^{\infty}\Phi^\ell u_{t-\ell}$ will be absolutely summable. However, note that, as N → ∞, Var(xit) need not be bounded in N even if maxi |λi(Φ)| < 1. For example, consider the IVAR(1) model with

$$\Phi = \begin{pmatrix}
\varphi & 0 & \cdots & 0 & 0\\
\psi & \varphi & \cdots & 0 & 0\\
\vdots & \ddots & \ddots & \vdots & \vdots\\
0 & \cdots & \psi & \varphi & 0\\
0 & \cdots & 0 & \psi & \varphi
\end{pmatrix},$$

and assume that Var(uit) is uniformly bounded away from zero as N → ∞. It is clear that all the eigenvalues of Φ are inside the unit circle if and only if |ϕ| < 1, regardless of the value of the neighborhood coefficient, ψ. Yet it is easily seen that the variance of xNt increases in N if ψ² + ϕ² ≥ 1. Therefore, a stronger condition than stationarity for each N is required to prevent the variance of xit from exploding as N → ∞. A set of sufficient conditions that ensure the existence of the variance of xit, even as N → ∞, is set out in the following assumptions.

Assumption 2. The elements of the double-index process {εit, i ∈ ℕ, t ∈ ℤ} are independently distributed random variables with zero means and unit variances on the probability space (Ω, ℱ, P).

Assumption 3 (CWD Errors). Matrix R has bounded row and column matrix norms, that is, ‖R‖∞ < K and ‖R‖₁ < K.

Assumption 4 (Stationarity and Bounded Variances). There exists a real ϵ, in the range 0 < ϵ < 1, such that8

$$\|\Phi\| \le 1-\epsilon. \tag{19}$$
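The bidiagonal counterexample and the role of condition (19) can be checked numerically. The sketch below is illustrative only: it assumes R = I (so that Σ = I), takes ϕ = 0.5 with either ψ = 0.9 (so that ψ² + ϕ² ≥ 1) or ψ = 0.4 (so that ‖Φ‖ ≤ 0.9 and (19) holds with ϵ = 0.1), and computes Var(xt) = Σℓ Φ^ℓ Σ Φ^ℓ′ with a doubling recursion for the discrete Lyapunov equation instead of simulating data. The function names are hypothetical.

```python
import numpy as np


def bidiagonal_phi(N, varphi, psi):
    """Lower bidiagonal Phi: varphi on the diagonal, psi on the first subdiagonal."""
    return np.diag(np.full(N, varphi)) + np.diag(np.full(N - 1, psi), -1)


def stationary_cov(Phi, Sigma_u, n_doublings=13):
    """Partial sum of Phi^l Sigma_u Phi^l' over l = 0, ..., 2**n_doublings - 1.

    Each pass doubles the number of lags covered: V <- V + A V A', A <- A A.
    """
    V, A = Sigma_u.copy(), Phi.copy()
    for _ in range(n_doublings):
        V = V + A @ V @ A.T
        A = A @ A
    return V


def var_last_unit(N, varphi, psi):
    """Return Var(x_Nt) and the spectral norm of Phi, assuming R = I."""
    Phi = bidiagonal_phi(N, varphi, psi)
    V = stationary_cov(Phi, np.eye(N))
    return V[-1, -1], np.linalg.norm(Phi, 2)


for varphi, psi in [(0.5, 0.9), (0.5, 0.4)]:
    for N in (5, 10, 20, 40):
        v, norm = var_last_unit(N, varphi, psi)
        print(f"varphi={varphi}, psi={psi}, N={N:2d}: Var(x_Nt)={v:.4g}, ||Phi||={norm:.3f}")
```

In both cases the spectral radius of Φ is 0.5, yet with ψ = 0.9 the variance of the last unit grows by many orders of magnitude as N doubles, while with ψ = 0.4 it settles at a value below 1/(1 − 0.9²), the bound implied by (19).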

Remark 1. Assumptions 2 and 3 imply that {uit} is CWD since, for any weight vector, w, satisfying (16), we have Var(w′ut) ≤ ‖w‖²‖R‖₁‖R‖∞ → 0 as N → ∞. For future reference, define the covariance matrix Σ = Var(ut) = RR′ and denote the ith diagonal element of Σ by σ²ii = Var(uit). Note also that ‖Σ‖ ≤ ‖R‖₁‖R‖∞ < K, a condition which, as shown in Pesaran and Tosetti (2010), includes all commonly used processes in the spatial literature, such as the spatial autoregressive and spatial error component models pioneered by Whittle (1954) and further developed by Cliff and Ord (1973), Anselin (1988), and Kelejian and Robinson (1995).

8 Our assumptions concerning the coefficient matrix Φ can be relaxed so long as they hold for all N ≥ N0 (where N0 is a fixed constant that does not depend on N). But in order to keep the notation and exposition simple, we simply state that Assumptions 1 and 4 hold for any value of N.

Remark 2. It is not necessary that proximity be measured in terms of physical space. Other measures such as economic (Conley, 1999; Pesaran et al., 2004) or social distance (Conley and Topa, 2002) could also be employed. All these are examples of dependence across nodes in physical (real) or logical (virtual) networks. In the case of the IVAR model, defined by (3) and (4), such contemporaneous dependence can be modeled through the N × N network topology matrix R.9,10

Remark 3. The IVAR model, when combined with ut = Rεt, yields an infinite-dimensional spatiotemporal model. The model can also be viewed more generally as a ‘dynamic network’, with R and Φ capturing the static and dynamic forms of interconnections that might exist in the network.

Remark 4 (Eigenvalues of Φ). Assumption 4 implies that the polynomial Φ(L) = I − ΦL is invertible (for any N ∈ ℕ) and

$$\varrho(\Phi) \le 1-\epsilon, \tag{20}$$

which is a sufficient condition for covariance stationarity. Assumption 4 also ensures that Var(xit) < K < ∞.

Proposition 1. Consider model (1), and suppose that Assumptions 2–4 hold. Then, for any arbitrary sequence of fixed weights w satisfying condition (16), and for any t ∈ ℤ,

$$\lim_{N\to\infty}\operatorname{Var}(x_{wt}) = 0, \tag{21}$$

where xwt = w′xt.
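Proposition 1, together with the bound in Remark 1, can be illustrated with the spatial model of Example 2. The sketch assumes, for illustration only, that Sx = Su is a row-standardized ‘circular’ two-neighbor matrix (unit i puts weight 1/2 on units i − 1 and i + 1, modulo N), ρx = 0.6, ρu = 0.4, and equal weights w = τ/N, which satisfy (16)–(17). Because τ is then an eigenvector of S with eigenvalue one, the exact answer is Var(w′xt) = N⁻¹/[(1 − ρx²)(1 − ρu)²], so N·Var(w′xt) should stay constant while Var(w′xt) itself vanishes, as (21) requires. The helper names are hypothetical.

```python
import numpy as np


def circular_two_neighbor(N):
    """Row-standardized weights: unit i puts 1/2 on units i-1 and i+1 (mod N); needs N >= 3."""
    S = np.zeros((N, N))
    idx = np.arange(N)
    S[idx, (idx - 1) % N] = 0.5
    S[idx, (idx + 1) % N] = 0.5
    return S


def stationary_cov(Phi, Sigma_u, n_doublings=12):
    """Partial sum of Phi^l Sigma_u Phi^l' over l = 0, ..., 2**n_doublings - 1."""
    V, A = Sigma_u.copy(), Phi.copy()
    for _ in range(n_doublings):
        V = V + A @ V @ A.T
        A = A @ A
    return V


def example2_average(N, rho_x=0.6, rho_u=0.4):
    S = circular_two_neighbor(N)
    Phi = rho_x * S                                   # Phi = rho_x S_x, Phi_b = 0
    R = np.linalg.inv(np.eye(N) - rho_u * S)          # R = (I - rho_u S_u)^(-1)
    Sigma_u = R @ R.T
    V = stationary_cov(Phi, Sigma_u)                  # Var(x_t)
    w = np.full(N, 1.0 / N)                           # granular weights
    var_xw = w @ V @ w                                # Var(w' x_t)
    var_uw = w @ Sigma_u @ w                          # Var(w' u_t)
    bound = (w @ w) * np.linalg.norm(R, 1) * np.linalg.norm(R, np.inf)  # Remark 1
    return var_xw, var_uw, bound


for N in (20, 80, 320):
    vx, vu, b = example2_average(N)
    print(f"N={N:3d}  Var(w'x)={vx:.5f}  N*Var(w'x)={N * vx:.4f}  Var(w'u)={vu:.5f} <= {b:.5f}")
```

With these values N·Var(w′xt) should print as about 4.3403 for every N. In this particular design the Remark 1 bound holds with equality, because R has non-negative entries and equal row and column sums of 1/(1 − ρu).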

Assumptions 2–4 are thus sufficient conditions for {xit} to be cross-sectionally weakly dependent. Proposition 1 has several interesting implications. Suppose that we can impose the limiting restrictions given by Assumption 1.

Corollary 1. Consider model (1), and suppose that Assumptions 1–4 hold. Then, for any i ∈ 𝒦,

$$\lim_{N\to\infty}\operatorname{Var}\big(x_{it} - \phi_{ai}'x_{t-1} - u_{it}\big) = 0. \tag{22}$$

Remark 5. It is also possible to establish (22) under the following conditions:

$$\|\phi_{bi}\| = O\big(N^{-1/2}\big), \tag{23}$$

$$\|\Sigma\| = O\big(N^{1-\epsilon}\big), \tag{24}$$

which are less restrictive than condition (8) and Assumption 3 on the boundedness of the column and row norms of matrix R. These stronger conditions are, however, needed for establishing the asymptotic properties of the CALS estimator proposed below in Section 4.

3.1. IVAR models with strong cross-sectional dependence

The IVAR model can generate observations with strong cross-section dependence if the boundedness assumptions on the column and row norms of R and Φ are relaxed. The analysis of this case is beyond the scope of the present paper, and is considered in Pesaran and Chudik (2010). But even if the boundedness assumptions on R and Φ are maintained, it is still possible for xit to show strong cross-section dependence if the IVAR model is augmented with common factors. The basic IVAR model, (3), can be augmented with exogenously specified common factors in a number of different ways. Here we consider two important possibilities. First, a finite number of common factors can be added to the vector of error terms defined by (4). This is equivalent to assuming that a finite number of the columns (or linear combinations of the columns) of R have unbounded norms. This compounding of the spatial (weak) cross-section dependence with the strong factor dependence complicates the analysis unduly, and will not be pursued here. A more attractive alternative would be to assume that

$$\Phi(L)(x_t - \alpha - \Gamma f_t) = u_t, \quad \text{for } t = 1, 2, \ldots, T, \tag{25}$$

where Φ(L) = I − ΦL, α = (α1, ..., αN)′ is an N × 1 vector of fixed effects, ft is an m × 1 vector of unobserved common factors (m is fixed but otherwise unknown), Γ = (γ1, γ2, ..., γN)′ is the N × m matrix of factor loadings, and, as before, ut = Rεt. Under this specification, the strong cross-section dependence of xit due to the factors is explicitly separated from other sources of cross-section dependence as embodied in Φ and R.

9 A network topology is usually represented by graphs whose nodes are identified with the cross-section units, with the pair-wise relations captured by the arcs in the graph.
10 It is also possible to allow for time variations in the network matrix, R, to capture changes in the network structure over time. However, this will not be pursued here.

4. Estimation of a factor-augmented stationary IVAR model

We now consider the problem of estimation and inference in the case of the factor-augmented IVAR model as set out in (25), as both N and T tend to infinity. We focus on the parameters of the ith equation and assume that φ′i (the ith row of matrix Φ) can be decomposed as in Assumption 1; see (7)–(9). As an important example, we consider the two-neighbor IVAR model defined in Example 1, where the parameters of interest are given by the elements of the ith row of matrix Φa given by (12). In what follows, we set ξit = S′i xt, where Si is defined by (9), and note that it reduces to (xi−1,t, xit, xi+1,t)′ in the case of the two-neighbor IVAR model. We suppose that the following assumptions hold.

Assumption 5 (Available Observations). Observations x0, x1, ..., xT are available, with the starting values $x_0 = \sum_{\ell=0}^{\infty}\Phi^\ell R\varepsilon_{-\ell} + \alpha + \Gamma f_0$.

Assumption 6 (Common Factors). The unobserved common factors, f1t, f2t, ..., fmt, are covariance stationary and follow the general linear processes

$$f_{st} = \psi_s(L)\varepsilon_{fst}, \quad \text{for } s = 1, 2, \ldots, m, \tag{26}$$

where $\psi_s(L) = \sum_{\ell=0}^{\infty}\psi_{s\ell}L^\ell$ with absolutely summable coefficients that do not vary with N, and the factor innovations, εfst, are independently distributed over time with zero means and a constant variance, σ²εfs, that does not vary with N. The εfst are also distributed independently of the idiosyncratic errors, εit′, for any i ∈ ℕ, any t, t′ ∈ T, and any s ∈ {1, ..., m}. E(ft f′t) exists and is a positive definite matrix.

Assumption 7 (Existence of Fourth-Order Moments). There exists a positive real constant $K$ such that $E(\varepsilon_{fst}^{4})<K$ and $E(\varepsilon_{it}^{4})<K$ for any $s\in\{1,\ldots,m\}$, any $t\in\mathcal{T}$, and any $i\in\mathbb{N}$.

Assumption 8 (Bounded Factor Loadings and Fixed Effects). For any $i\in\mathbb{N}$, $\boldsymbol{\gamma}_i$ and $\alpha_i$ do not change with $N$, $\|\boldsymbol{\gamma}_i\|<K$, and $|\alpha_i|<K$.

We follow Pesaran (2006) and introduce the following vector of cross-section averages, $\mathbf{x}_{Wt}=\mathbf{W}'\mathbf{x}_t$, where $\mathbf{W}=(\mathbf{w}_1,\mathbf{w}_2,\ldots,\mathbf{w}_N)'$ and $\{\mathbf{w}_j\}_{j=1}^{N}$ are $m_w\times1$ vectors. Subscripts denoting the number of groups are again omitted where not necessary, to keep the notation simple. Matrix $\mathbf{W}$ does not correspond to any spatial weight matrix; it is an arbitrary matrix of pre-determined weights satisfying the following granularity conditions:
$$\|\mathbf{W}\|=O(N^{-1/2}),\tag{27}$$
$$\frac{\|\mathbf{w}_j\|}{\|\mathbf{W}\|}=O(N^{-1/2})\quad\text{for any } j.\tag{28}$$
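As a minimal numerical sketch of conditions (27)–(28), assume $m_w=1$ and a hypothetical covariance matrix $\boldsymbol{\Omega}$ for $\mathbf{x}_t$ with unit variances and correlation 0.4 between adjacent units only (an illustrative choice, not taken from the paper). Equal weights $w_j=1/N$ satisfy both granularity conditions, and $Var(\mathbf{w}'\mathbf{x}_t)=\mathbf{w}'\boldsymbol{\Omega}\mathbf{w}\le\|\boldsymbol{\Omega}\|\,\mathbf{w}'\mathbf{w}$ (the bound used in the proof of Proposition 1) then shrinks at rate $N^{-1}$. The hypothetical "dominant" weights, which put half the mass on unit 1, violate (28) and keep the variance bounded away from zero.

```python
import math

def tridiagonal_cov(n, rho=0.4):
    """Hypothetical weakly dependent covariance: 1 on the diagonal, rho between neighbours."""
    return [[1.0 if i == j else (rho if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]

def quad_form(w, omega):
    """Return w' Omega w, i.e. Var(w' x_t) when Var(x_t) = Omega."""
    n = len(w)
    return sum(w[i] * omega[i][j] * w[j] for i in range(n) for j in range(n))

def granularity(w):
    """For m_w = 1: return (||W||, max_j ||w_j|| / ||W||), cf. conditions (27)-(28)."""
    norm_w = math.sqrt(sum(x * x for x in w))
    return norm_w, max(abs(x) for x in w) / norm_w

def equal_weights(n):
    return [1.0 / n] * n

def dominant_weights(n):
    # Hypothetical non-granular weights: half the mass on unit 1, the rest spread evenly.
    return [0.5] + [0.5 / (n - 1)] * (n - 1)

for n in (10, 100, 400):
    omega = tridiagonal_cov(n)
    for label, w in (("equal", equal_weights(n)), ("dominant", dominant_weights(n))):
        norm_w, ratio = granularity(w)
        print(f"N={n:4d} {label:8s} ||W||={norm_w:.4f} "
              f"max||w_j||/||W||={ratio:.4f} Var(w'x)={quad_form(w, omega):.5f}")
```

For equal weights both ratios fall like $N^{-1/2}$; for the dominant vector the ratio in (28) stays near one, which is exactly what the granularity condition rules out.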

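Multiplying (25) by the inverse of the polynomial $\boldsymbol{\Phi}(L)$ and then by $\mathbf{W}'$ yields
$$\mathbf{x}_{Wt}=\boldsymbol{\alpha}_W+\boldsymbol{\Gamma}_W\mathbf{f}_t+\boldsymbol{\upsilon}_{Wt},\tag{29}$$
where $\boldsymbol{\alpha}_W=\mathbf{W}'\boldsymbol{\alpha}$, $\boldsymbol{\Gamma}_W=\mathbf{W}'\boldsymbol{\Gamma}$, $\boldsymbol{\upsilon}_{Wt}=\mathbf{W}'\boldsymbol{\upsilon}_t$, and
$$\boldsymbol{\upsilon}_t=\sum_{\ell=0}^{\infty}\boldsymbol{\Phi}^{\ell}\mathbf{u}_{t-\ell}.\tag{30}$$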
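To see why $\mathbf{x}_{Wt}$ in (29) carries information on $\mathbf{f}_t$, the following simulation sketch adopts simplifying assumptions of its own: $m=m_w=1$, simple-average weights, $\boldsymbol{\alpha}=\mathbf{0}$, $\boldsymbol{\Phi}=\mathbf{0}$ (so that $\boldsymbol{\upsilon}_t=\mathbf{u}_t$), IID $N(0,1)$ errors, and hypothetical loadings drawn from $U(0.5,1.5)$. Dividing $\bar x_t$ by the average loading $\bar\gamma=\boldsymbol{\Gamma}_W$ recovers $f_t$ up to the averaged error $\bar u_t/\bar\gamma$, whose root mean square falls roughly like $N^{-1/2}$.

```python
import math
import random

def factor_recovery_rmse(n, t_len=200, seed=0):
    """RMSE of f_hat_t = xbar_t / gamma_bar - f_t under the simplified DGP
    x_it = gamma_i * f_t + u_it (alpha = 0, Phi = 0, m = m_w = 1, w_j = 1/N)."""
    rng = random.Random(seed)
    gammas = [rng.uniform(0.5, 1.5) for _ in range(n)]  # hypothetical loadings
    gamma_bar = sum(gammas) / n                          # Gamma_W for simple averages
    sq_err = 0.0
    for _ in range(t_len):
        f = rng.gauss(0.0, 1.0)
        xbar = sum(g * f + rng.gauss(0.0, 1.0) for g in gammas) / n
        sq_err += (xbar / gamma_bar - f) ** 2
    return math.sqrt(sq_err / t_len)

for n in (20, 200, 2000):
    print(n, round(factor_recovery_rmse(n), 4))
```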

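Under Assumptions 2–3, $\{\mathbf{u}_t\}$ is weakly cross-sectionally dependent, and
$$\|Var(\boldsymbol{\upsilon}_{Wt})\|=\Big\|\mathbf{W}'\sum_{\ell=0}^{\infty}\boldsymbol{\Phi}^{\ell}\boldsymbol{\Sigma}\boldsymbol{\Phi}'^{\ell}\mathbf{W}\Big\|\le\|\mathbf{W}\|^{2}\,\|\boldsymbol{\Sigma}\|\sum_{\ell=0}^{\infty}\|\boldsymbol{\Phi}^{\ell}\|^{2}=O(N^{-1}),\tag{31}$$
where $\|\mathbf{W}\|^{2}=O(N^{-1})$ by condition (27), $\|\boldsymbol{\Sigma}\|=O(1)$ by Assumption 3 (see Remark 1), and $\sum_{\ell=0}^{\infty}\|\boldsymbol{\Phi}^{\ell}\|^{2}\le\sum_{\ell=0}^{\infty}\|\boldsymbol{\Phi}\|^{2\ell}=O(1)$ under Assumption 4. This implies that $\boldsymbol{\upsilon}_{Wt}=O_p(N^{-1/2})$, and the unobserved common factors can be approximated as
$$(\boldsymbol{\Gamma}_W'\boldsymbol{\Gamma}_W)^{-1}\boldsymbol{\Gamma}_W'(\mathbf{x}_{Wt}-\boldsymbol{\alpha}_W)=\mathbf{f}_t+O_p(N^{-1/2}),\tag{32}$$
provided that the matrix $\boldsymbol{\Gamma}_W'\boldsymbol{\Gamma}_W$ is non-singular. Hence the full column rank of $\boldsymbol{\Gamma}_W$ is important for the estimation of unit-specific coefficients. Pesaran (2006) shows, however, that the full column rank condition is not necessary if the object of interest is the cross-section mean of the parameters, $E(\boldsymbol{\delta}_i)$, as opposed to the unit-specific parameters, $\boldsymbol{\delta}_i$, which are the focus of the current paper.

Using (25), the equation for unit $i\in\mathcal{K}$ can be written as
$$x_{it}-\alpha_i-\boldsymbol{\gamma}_i'\mathbf{f}_t=\boldsymbol{\delta}_i'\mathbf{S}_i'(\mathbf{x}_{t-1}-\boldsymbol{\alpha}-\boldsymbol{\Gamma}\mathbf{f}_{t-1})+\zeta_{i,t-1}+u_{it},\tag{33}$$
where
$$\zeta_{it}=\boldsymbol{\varphi}_{ib}'\boldsymbol{\upsilon}_t=O_p(N^{-1/2}),\tag{34}$$
since, by Assumption 1, $\boldsymbol{\varphi}_{ib}$ satisfies condition (27). It follows from (29) that
$$\boldsymbol{\gamma}_i'\mathbf{f}_t-\boldsymbol{\varphi}_{ia}'\boldsymbol{\Gamma}\mathbf{f}_{t-1}=\mathbf{b}_{i1}'\mathbf{x}_{Wt}+\mathbf{b}_{i2}'\mathbf{x}_{W,t-1}-(\mathbf{b}_{i1}+\mathbf{b}_{i2})'\boldsymbol{\alpha}_W-\mathbf{b}_{i1}'\boldsymbol{\upsilon}_{Wt}-\mathbf{b}_{i2}'\boldsymbol{\upsilon}_{W,t-1},\tag{35}$$
where $\mathbf{b}_{i1}'=\boldsymbol{\gamma}_i'(\boldsymbol{\Gamma}_W'\boldsymbol{\Gamma}_W)^{-1}\boldsymbol{\Gamma}_W'$ and $\mathbf{b}_{i2}'=-\boldsymbol{\delta}_i'\mathbf{S}_i'\boldsymbol{\Gamma}(\boldsymbol{\Gamma}_W'\boldsymbol{\Gamma}_W)^{-1}\boldsymbol{\Gamma}_W'$. Substituting (35) into (33) yields
$$x_{it}=\boldsymbol{\delta}_i'\mathbf{S}_i'\mathbf{x}_{t-1}+\mathbf{b}_{i1}'\mathbf{x}_{Wt}+\mathbf{b}_{i2}'\mathbf{x}_{W,t-1}+c_i+u_{it}+q_{it},\tag{36}$$
where $c_i=\alpha_i-\boldsymbol{\varphi}_{ia}'\boldsymbol{\alpha}-(\mathbf{b}_{i1}+\mathbf{b}_{i2})'\boldsymbol{\alpha}_W$, and
$$q_{it}=\zeta_{i,t-1}-\mathbf{b}_{i1}'\boldsymbol{\upsilon}_{Wt}-\mathbf{b}_{i2}'\boldsymbol{\upsilon}_{W,t-1}=O_p(N^{-1/2}).\tag{37}$$

Consider now the following auxiliary regression based on (36):
$$x_{it}=\mathbf{g}_{it}'\boldsymbol{\pi}_i+\epsilon_{it},\tag{38}$$
where $\epsilon_{it}=u_{it}+q_{it}$, $\boldsymbol{\pi}_i=(\boldsymbol{\delta}_i',\mathbf{b}_{i1}',\mathbf{b}_{i2}',c_i)'$ is the $k_i\times1$ vector of coefficients associated with the regressors $\mathbf{g}_{it}=(\boldsymbol{\xi}_{i,t-1}',\mathbf{x}_{Wt}',\mathbf{x}_{W,t-1}',1)'$, and $k_i=h_i+2m_w+1$. The parameters of interest, $\boldsymbol{\delta}_i$, can now be estimated using the cross-section augmented regression defined by (38). We refer to such an estimator of $\boldsymbol{\delta}_i$ as the cross-section augmented least-squares estimator (or CALS for short), and denote it by $\hat{\boldsymbol{\delta}}_{i,CALS}$. We have
$$\hat{\boldsymbol{\pi}}_i=\begin{pmatrix}\hat{\boldsymbol{\delta}}_{i,CALS}\\ \hat{\mathbf{b}}_{i1}\\ \hat{\mathbf{b}}_{i2}\\ \hat c_i\end{pmatrix}=\Big(\sum_{t=1}^{T}\mathbf{g}_{it}\mathbf{g}_{it}'\Big)^{-1}\sum_{t=1}^{T}\mathbf{g}_{it}x_{it}.\tag{39}$$
Also, using the partitioned regression formula,
$$\hat{\boldsymbol{\delta}}_{i,CALS}=(\mathbf{Z}_i'\mathbf{M}_H\mathbf{Z}_i)^{-1}\mathbf{Z}_i'\mathbf{M}_H\mathbf{x}_{i\circ},\tag{40}$$
where
$$\mathbf{M}_H=\mathbf{I}_T-\mathbf{H}(\mathbf{H}'\mathbf{H})^{+}\mathbf{H}',\qquad\mathbf{H}=[\mathbf{X}_W,\mathbf{X}_W(-1),\boldsymbol{\tau}],\tag{41}$$
$$\mathbf{Z}_i=[\boldsymbol{\xi}_{i1}(-1),\boldsymbol{\xi}_{i2}(-1),\ldots,\boldsymbol{\xi}_{ih_i}(-1)],\tag{42}$$
$$\boldsymbol{\xi}_{ir}(-1)=(\xi_{ir0},\ldots,\xi_{ir,T-1})'\quad\text{for } r\in\{1,\ldots,h_i\},\tag{43}$$
$\boldsymbol{\tau}$ is a $T\times1$ vector of ones, $\mathbf{X}_W=(\mathbf{x}_{W1\circ},\ldots,\mathbf{x}_{Wm_w\circ})$, $\mathbf{X}_W(-1)=[\mathbf{x}_{W1}(-1),\ldots,\mathbf{x}_{Wm_w}(-1)]$, $\mathbf{x}_{Ws\circ}=(x_{Ws1},\ldots,x_{WsT})'$, $\mathbf{x}_{Ws}(-1)=(x_{Ws0},\ldots,x_{Ws,T-1})'$, for $s\in\{1,\ldots,m_w\}$, and $\mathbf{x}_{i\circ}=(x_{i1},\ldots,x_{iT})'$. For future reference, also let $\mathbf{v}_{it}=\mathbf{S}_i'\boldsymbol{\upsilon}_t=\boldsymbol{\xi}_{it}-\mathbf{S}_i'\boldsymbol{\Gamma}\mathbf{f}_t-\mathbf{S}_i'\boldsymbol{\alpha}$, $\mathbf{Q}=[\mathbf{F},\mathbf{F}(-1),\boldsymbol{\tau}]$, and
$$\underset{(2m+1)\times(2m_w+1)}{\mathbf{A}}=\begin{pmatrix}\boldsymbol{\Gamma}_W' & \mathbf{0}_{m\times m_w} & \mathbf{0}\\ \mathbf{0}_{m\times m_w} & \boldsymbol{\Gamma}_W' & \mathbf{0}\\ \boldsymbol{\alpha}_W' & \boldsymbol{\alpha}_W' & 1\end{pmatrix},\tag{44}$$
where $\mathbf{F}=(\mathbf{f}_{1\circ},\ldots,\mathbf{f}_{m\circ})$, $\mathbf{F}(-1)=[\mathbf{f}_1(-1),\ldots,\mathbf{f}_m(-1)]$, $\mathbf{f}_{r\circ}=(f_{r1},\ldots,f_{rT})'$, and $\mathbf{f}_r(-1)=(f_{r0},\ldots,f_{r,T-1})'$ for $r\in\{1,\ldots,m\}$.

First, we consider the asymptotic properties of $\hat{\boldsymbol{\pi}}_i$ (and $\hat{\boldsymbol{\delta}}_{i,CALS}$) as $(N,T)\overset{j}{\to}\infty$, in the case where the number of unobserved common factors equals the dimension of $\mathbf{x}_{Wt}$ ($m=m_w$), and make the following additional assumption.

Assumption 9 (Identification of $\boldsymbol{\pi}_i$). There exist $T_0$ and $N_0$ such that, for all $T\ge T_0$, $N\ge N_0$, and for any $i\in\mathcal{K}$, $\big(T^{-1}\sum_{t=1}^{T}\mathbf{g}_{it}\mathbf{g}_{it}'\big)^{-1}$ exists, $\mathbf{C}_{(N),i}=E(\mathbf{g}_{it}\mathbf{g}_{it}')$ is positive definite, and $\|\mathbf{C}_{(N),i}^{-1}\|<K$.

Remark 6. Assumption 9 implies that $\boldsymbol{\Gamma}_W$ is a square, full rank matrix and, therefore, that the number of unobserved common factors is equal to the number of columns of the weight matrix, $\mathbf{W}$ ($m=m_w$). In cases where $m<m_w$, full augmentation of individual models by (cross-section) averages is not necessary.

Theorem 1. Let $\mathbf{x}_t$ be generated by model (25), let Assumptions 1–9 hold, and let $\mathbf{W}$ be any arbitrary (pre-determined) matrix of weights satisfying conditions (27)–(28). Then, for any $i\in\mathcal{K}$ and as $(N,T)\overset{j}{\to}\infty$, $\hat{\boldsymbol{\pi}}_i$ defined in Eq. (39) has the following properties.

(a) $\hat{\boldsymbol{\pi}}_i-\boldsymbol{\pi}_i\xrightarrow{p}\mathbf{0}$.

(b) If, in addition, $T/N\to\varkappa$, with $0\le\varkappa<\infty$, then
$$\frac{\sqrt{T}}{\sigma_{(N),ii}}\,\mathbf{C}_{(N),i}^{1/2}(\hat{\boldsymbol{\pi}}_i-\boldsymbol{\pi}_i)\xrightarrow{D}N(\mathbf{0},\mathbf{I}_{k_i}),\tag{45}$$
where $\sigma_{(N),ii}^{2}=Var(u_{it})=E(\mathbf{e}_i'\mathbf{R}\mathbf{R}'\mathbf{e}_i)$, and $\mathbf{C}_{(N),i}^{1/2}$ is the square root of the positive definite matrix $\mathbf{C}_{(N),i}=E(\mathbf{g}_{it}\mathbf{g}_{it}')$.

(c) Also, $\mathbf{C}_{(N),i}-\hat{\mathbf{C}}_{(N),i}\xrightarrow{p}\mathbf{0}$ and $\sigma_{(N),ii}-\hat\sigma_{(N),ii}\xrightarrow{p}0$, where
$$\hat{\mathbf{C}}_{(N),i}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{g}_{it}\mathbf{g}_{it}',\qquad\hat\sigma_{(N),ii}^{2}=\frac{1}{T}\sum_{t=1}^{T}\hat u_{it}^{2},\tag{46}$$
and $\hat u_{it}=x_{it}-\mathbf{g}_{it}'\hat{\boldsymbol{\pi}}_i$.

Remark 7. Suppose that, in addition to the assumptions of Theorem 1, the limits of $\mathbf{C}_{(N),i}^{-1}$ and $\sigma_{(N),ii}^{2}$, as $N\to\infty$, exist and are given by $\mathbf{C}_{(\infty),i}^{-1}$ and $\sigma_{(\infty),ii}^{2}$, respectively.¹¹ Then (45) yields
$$\sqrt{T}(\hat{\boldsymbol{\pi}}_i-\boldsymbol{\pi}_i)\xrightarrow{D}N\big(\mathbf{0},\sigma_{(\infty),ii}^{2}\,\mathbf{C}_{(\infty),i}^{-1}\big).\tag{47}$$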
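The following is a minimal pure-Python sketch of (38)–(40) under illustrative assumptions: one neighbour regressor ($h_i=1$), $m_w=1$ (a single cross-section average), and synthetic noise-free data, so that the true coefficients are recovered exactly. It computes $\hat{\boldsymbol\pi}_i$ by least squares on $\mathbf{g}_{it}=(\xi_{i,t-1},\bar x_t,\bar x_{t-1},1)'$ as in (39). It then recomputes $\hat\delta_{i,CALS}$ by the partitioned-regression (Frisch–Waugh–Lovell) route of (40), residualizing on $\mathbf{H}=[\bar{\mathbf x},\bar{\mathbf x}(-1),\boldsymbol\tau]$. The function names are hypothetical. $\mathbf{H}'\mathbf{H}$ is assumed non-singular here, so an ordinary inverse stands in for the generalized inverse in (41).

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """Least-squares coefficients of y on the regressor rows (normal equations)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def residuals(rows, y):
    """Residuals from regressing y on rows, i.e. M_H y in the notation of (41)."""
    beta = ols(rows, y)
    return [yt - sum(b * v for b, v in zip(beta, r)) for r, yt in zip(rows, y)]

# Synthetic (hypothetical) data for one unit.
rng = random.Random(1)
T = 60
xbar = [rng.gauss(0, 1) for _ in range(T + 1)]   # xbar_0, ..., xbar_T
xi_lag = [rng.gauss(0, 1) for _ in range(T)]     # xi_{i,t-1} for t = 1..T
delta, b1, b2, c = 0.5, 0.8, -0.3, 0.2           # true pi_i = (delta, b1, b2, c)
g = [[xi_lag[t], xbar[t + 1], xbar[t], 1.0] for t in range(T)]   # g_it as in (38)
x = [delta * r[0] + b1 * r[1] + b2 * r[2] + c for r in g]        # noise-free x_it

pi_hat = ols(g, x)                                # eq. (39)
H = [r[1:] for r in g]                            # H = [xbar, xbar(-1), tau]
z_res = residuals(H, xi_lag)                      # M_H Z_i
x_res = residuals(H, x)                           # M_H x_i
# eq. (40) with a single column in Z_i: (z'M z)^{-1} z'M x, using M idempotent
delta_fwl = sum(a * b for a, b in zip(z_res, x_res)) / sum(a * a for a in z_res)
print([round(v, 6) for v in pi_hat], round(delta_fwl, 6))
```

Adding noise to `x` would make `pi_hat` deviate from the true values, with the sampling behaviour described in Theorem 1. The agreement between the two routes to $\hat\delta_{i,CALS}$ is an algebraic identity and holds regardless.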

Consider now the case where the number of unobserved common factors is unknown, but it is known that $m_w\ge m$. Since the auxiliary regression (38) is possibly augmented by a larger number of cross-section averages than there are unobserved common factors, we face a potential problem of multicollinearity (as $N\to\infty$). This does not, however, affect the estimation of $\boldsymbol{\delta}_i$, so long as the space spanned by the unobserved common factors (including a constant) and the space spanned by the vector $(1,\mathbf{x}_{Wt}')'$ are the same as $N\to\infty$. This is the case when $\boldsymbol{\Gamma}_W$ has full column rank. For this more general case we replace Assumption 9 with the following, and suppress the subscript $N$ to simplify the notation.

Assumption 10 (Identification of $\boldsymbol{\delta}_i$). There exist $T_0$ and $N_0$ such that, for all $T\ge T_0$, $N\ge N_0$, and for any $i\in\mathcal{K}$, $(T^{-1}\mathbf{Z}_i'\mathbf{M}_H\mathbf{Z}_i)^{-1}$ exists, $\boldsymbol{\Gamma}_W$ is a full column rank matrix, $\boldsymbol{\Omega}_{vi}=E(\mathbf{v}_{it}\mathbf{v}_{it}')=\sum_{\ell=0}^{\infty}\mathbf{S}_i'\boldsymbol{\Phi}^{\ell}\mathbf{R}\mathbf{R}'\boldsymbol{\Phi}'^{\ell}\mathbf{S}_i$ is positive definite, and $\|\boldsymbol{\Omega}_{vi}^{-1}\|=O(1)$.

Theorem 2. Let $\mathbf{x}_t$ be generated by model (25), let Assumptions 1–8 and 10 hold, and let $\mathbf{W}$ be any arbitrary (pre-determined) matrix of weights satisfying conditions (27)–(28) and Assumption 10. Then, for any $i\in\mathcal{K}$, and if in addition $(N,T)\overset{j}{\to}\infty$ such that $T/N\to\varkappa$, with $0\le\varkappa<\infty$, the asymptotic distribution of $\hat{\boldsymbol{\delta}}_{i,CALS}$ defined by (40) is given by
$$\frac{\sqrt{T}}{\sigma_{ii}}\,\boldsymbol{\Omega}_{vi}^{1/2}(\hat{\boldsymbol{\delta}}_{i,CALS}-\boldsymbol{\delta}_i)\xrightarrow{D}N(\mathbf{0},\mathbf{I}_{h_i}),\tag{48}$$
where $\sigma_{ii}^{2}=Var(u_{it})$, $\boldsymbol{\Omega}_{vi}=E(\mathbf{v}_{it}\mathbf{v}_{it}')$ and $\mathbf{v}_{it}=\mathbf{S}_i'\boldsymbol{\upsilon}_t=\sum_{\ell=0}^{\infty}\mathbf{S}_i'\boldsymbol{\Phi}^{\ell}\mathbf{u}_{t-\ell}$.

Remark 8. As before, we also have
$$\sqrt{T}(\hat{\boldsymbol{\delta}}_{i,CALS}-\boldsymbol{\delta}_i)\xrightarrow{D}N\big(\mathbf{0},\sigma_{(\infty),ii}^{2}\,\boldsymbol{\Omega}_{v(\infty),i}^{-1}\big),$$
where $\boldsymbol{\Omega}_{v(\infty),i}=\lim_{N\to\infty}\boldsymbol{\Omega}_{vi}$ and $\sigma_{(\infty),ii}^{2}=\lim_{N\to\infty}\sigma_{ii}^{2}$, assuming the limits exist.

11 A sufficient condition for $\lim_{N\to\infty}\mathbf{C}_{(N),i}$ to exist is the existence of the following limits (together with Assumptions 1–8): $\lim_{N\to\infty}\mathbf{S}_i'\boldsymbol{\alpha}$, $\lim_{N\to\infty}\mathbf{S}_i'\boldsymbol{\Gamma}$, $\lim_{N\to\infty}\mathbf{W}'\boldsymbol{\Gamma}$, $\lim_{N\to\infty}\mathbf{W}'\boldsymbol{\alpha}$, and $\lim_{N\to\infty}\sum_{\ell=0}^{\infty}\mathbf{S}_i'\boldsymbol{\Phi}^{\ell}\mathbf{R}\mathbf{R}'\boldsymbol{\Phi}'^{\ell}\mathbf{S}_i$.

5. Monte Carlo (MC) experiments

In this section, we report some evidence on the small-sample properties of the CALS estimator in the presence of unobserved common factors and weak error cross-section dependence, and compare the results with those from standard least-squares estimators. The objectives of the experiments are twofold. First, we investigate how well the CALS estimator performs in the presence of unobserved common factors. Second, we examine the extent to which cross-section augmentation affects the small-sample properties of the estimator when the cross-section dependence is weak, and cross-section augmentation is therefore asymptotically unnecessary.

5.1. Monte Carlo (MC) design

The focus of our analysis will be on the estimation of the individual-specific parameters in an IVAR model that also allows for other interdependences that are of order $O(N^{-1})$. The data-generating process (DGP) used is given by
$$\mathbf{x}_t-\boldsymbol{\gamma}f_t=\boldsymbol{\Phi}(\mathbf{x}_{t-1}-\boldsymbol{\gamma}f_{t-1})+\mathbf{u}_t,\tag{49}$$
where $f_t$ is the only unobserved common factor considered ($m=1$), and $\boldsymbol{\gamma}=(\gamma_1,\ldots,\gamma_N)'$ is the $N\times1$ vector of factor loadings. We consider two sets of factor loadings to distinguish the cases of weak and strong cross-section dependence. Under the former we set $\boldsymbol{\gamma}=\mathbf{0}$, and under the latter we generate the factor loadings $\gamma_i$, for $i=1,2,\ldots,N$, from a stationary spatial process in order to show that our estimators are invariant to possible cross-section dependence in the factor loadings. Accordingly, the factor loadings are generated by the following bilateral spatial autoregressive (SAR) process:
$$\gamma_i-\mu_\gamma=\frac{a_\gamma}{2}(\gamma_{i-1}+\gamma_{i+1})-a_\gamma\mu_\gamma+\eta_{\gamma i},\qquad0<a_\gamma<1,\tag{50}$$
where $\eta_{\gamma i}\sim IIDN(0,\sigma_{\eta\gamma}^{2})$. As established by Whittle (1954), the unilateral SAR(2) scheme
$$\gamma_i=\psi_{\gamma1}\gamma_{i-1}+\psi_{\gamma2}\gamma_{i-2}+\eta_{\gamma i},\tag{51}$$
with $\psi_{\gamma1}=\alpha_\gamma+\beta_\gamma$, $\psi_{\gamma2}=-\alpha_\gamma\beta_\gamma$, $\alpha_\gamma=\big(1-\sqrt{1-a_\gamma^{2}}\big)/a_\gamma$, and $\beta_\gamma^{-1}=\big(1+\sqrt{1-a_\gamma^{2}}\big)/a_\gamma$, generates the same autocorrelations as the bilateral SAR(1) scheme (50). The factor loadings are generated using the unilateral scheme (51) with 50 burn-in data points ($i=-49,\ldots,0$) and the initialization $\gamma_{-51}=\gamma_{-50}=0$. We set $a_\gamma=0.4$ and $\mu_\gamma=1$, and choose $\sigma_{\eta\gamma}^{2}=(1+\psi_{\gamma2})\big[(1-\psi_{\gamma2})^{2}-\psi_{\gamma1}^{2}\big]/(1-\psi_{\gamma2})$, so that $Var(\gamma_i)=1$. The common factor is generated according to the AR(1) process $f_t=\rho_f f_{t-1}+\eta_{ft}$, $\eta_{ft}\sim IIDN(0,1-\rho_f^{2})$, with $\rho_f=0.9$.

In line with the theoretical analysis, the autoregressive parameters are decomposed as $\boldsymbol{\Phi}=\boldsymbol{\Phi}_a+\boldsymbol{\Phi}_b$, where $\boldsymbol{\Phi}_a$ captures own and neighborhood effects as in
$$\boldsymbol{\Phi}_a=\begin{pmatrix}
\phi_1 & \psi_1 & 0 & \cdots & \cdots & 0\\
\psi_2 & \phi_2 & \psi_2 & \ddots & & \vdots\\
0 & \psi_3 & \phi_3 & \psi_3 & \ddots & \vdots\\
\vdots & \ddots & \psi_4 & \phi_4 & \ddots & 0\\
\vdots & & \ddots & \ddots & \ddots & \psi_{N-1}\\
0 & \cdots & \cdots & 0 & \psi_N & \phi_N
\end{pmatrix},$$
and the remaining elements of $\boldsymbol{\Phi}$, defined by $\boldsymbol{\Phi}_b$, are generated as
$$\varphi_{bij}=\begin{cases}\lambda_i\,\omega_{ij}, & j\notin\{i-1,i,i+1\},\\ 0, & j\in\{i-1,i,i+1\},\end{cases}\tag{52}$$
where $\lambda_i\sim IIDU(-0.1,0.2)$ and $\omega_{ij}=\varsigma_{ij}/\sum_{j=1}^{N}\varsigma_{ij}$, with $\varsigma_{ij}\sim IIDU(0,1)$. This ensures that $\varphi_{bij}=O_p(N^{-1})$ and $\lim_{N\to\infty}E(\varphi_{bij})=0$, for all $i$ and $j$. With $\boldsymbol{\Phi}_a$ as specified above, each unit $i$, except the first and the last, has two neighbors: the 'left' neighbor $i-1$ and the 'right' neighbor $i+1$. The DGP for the $i$th unit can now be written as
$$x_{1t}=\phi_1x_{1,t-1}+\psi_1x_{2,t-1}+\boldsymbol{\varphi}_{b1}'\mathbf{x}_{t-1}+\gamma_1f_t-(\boldsymbol{\varphi}_1'\boldsymbol{\gamma})f_{t-1}+u_{1t},$$
$$x_{it}=\phi_ix_{i,t-1}+\psi_i(x_{i-1,t-1}+x_{i+1,t-1})+\boldsymbol{\varphi}_{bi}'\mathbf{x}_{t-1}+\gamma_if_t-(\boldsymbol{\varphi}_i'\boldsymbol{\gamma})f_{t-1}+u_{it},\qquad i\in\{2,\ldots,N-1\},$$
$$x_{Nt}=\phi_Nx_{N,t-1}+\psi_Nx_{N-1,t-1}+\boldsymbol{\varphi}_{b,N}'\mathbf{x}_{t-1}+\gamma_Nf_t-(\boldsymbol{\varphi}_N'\boldsymbol{\gamma})f_{t-1}+u_{Nt},$$
where $\boldsymbol{\varphi}_i'$ denotes the $i$th row of $\boldsymbol{\Phi}$.

To ensure that the DGP is stationary, we generate $\phi_i\sim IIDU(0.4,0.6)$ and $\psi_i\sim IIDU(-0.1,0.1)$ for $i\ne2$. We focus on the equation for unit $i=2$ in all experiments, and set $\phi_2=0.5$ and $\psi_2=0.1$. This yields $\|\boldsymbol{\Phi}\|_\infty\le0.9$, which, together with $|\rho_f|<1$, ensures that the DGP is stationary and that the variance of $x_{it}$ is bounded in $N$. The cross-section averages, $\mathbf{x}_{Wt}$, are constructed as simple averages, $\bar x_t=N^{-1}\sum_{j=1}^{N}x_{jt}$. The $N$-dimensional vector of error terms, $\mathbf{u}_t$, is generated using the following SAR model:
$$u_{1t}=a_uu_{2t}+\varepsilon_{1t},$$
$$u_{it}=\frac{a_u}{2}(u_{i-1,t}+u_{i+1,t})+\varepsilon_{it},\qquad i\in\{2,\ldots,N-1\},$$
$$u_{Nt}=a_uu_{N-1,t}+\varepsilon_{Nt},$$
for $t=1,2,\ldots,T$. We set $a_u=0.2$, which ensures that the errors are cross-sectionally weakly dependent, and generate $\varepsilon_{it}$, the $i$th element of $\boldsymbol{\varepsilon}_t$, as $IIDN(0,\sigma_\varepsilon^{2})$. We set $\sigma_\varepsilon^{2}=N/tr(\mathbf{R}_u\mathbf{R}_u')$ so that on average $Var(u_{it})=1$, where $\mathbf{R}_u=(\mathbf{I}-a_u\mathbf{S})^{-1}$, and the spatial weight matrix $\mathbf{S}$ is
$$\mathbf{S}=\begin{pmatrix}
0 & 1 & 0 & \cdots & 0\\
\tfrac12 & 0 & \tfrac12 & \ddots & \vdots\\
0 & \tfrac12 & 0 & \ddots & 0\\
\vdots & \ddots & \ddots & \ddots & \tfrac12\\
0 & \cdots & 0 & 1 & 0
\end{pmatrix}.\tag{53}$$
In order to minimize the effects of the initial values, the first 50 observations are dropped. We consider $N\in\{25,50,75,100,200\}$ and $T\in\{25,50,75,100,200\}$. For each $N$, all parameters were set at the beginning of the experiments, and 2000 replications were carried out by generating new innovations $\varepsilon_{it}$, $\eta_{ft}$, and $\eta_{\gamma i}$. The focus of the experiments is to evaluate the small-sample properties of the CALS estimators of the own coefficient $\phi_2=0.5$ and the neighboring coefficient $\psi_2=0.1$, in the case of the second cross-section unit.¹² The cross-section augmented regression for estimating $(\phi_2,\psi_2)$ is given by
$$x_{2t}=c_2+\psi_2(x_{1,t-1}+x_{3,t-1})+\phi_2x_{2,t-1}+\delta_{2,0}\bar x_t+\delta_{2,1}\bar x_{t-1}+\epsilon_{2t}.\tag{54}$$
We also report results for the least-squares (LS) estimator computed from the above regression but without augmentation by cross-section averages. The corresponding CALS and non-augmented LS estimators are denoted by $\hat\phi_{2,CALS}$ and $\hat\phi_{2,LS}$ (own coefficient), or $\hat\psi_{2,CALS}$ and $\hat\psi_{2,LS}$ (neighboring coefficient), respectively. To summarize, we carry out two different sets of experiments, one without the unobserved common factor ($\boldsymbol{\gamma}=\mathbf{0}$), and the other with the unobserved common factor ($\boldsymbol{\gamma}\ne\mathbf{0}$). There are many sources of interdependence between individual units: spatial dependence of the innovations $\{u_{it}\}$, spatiotemporal interactions due to the coefficient matrices $\boldsymbol{\Phi}_a$ and $\boldsymbol{\Phi}_b$, and finally, in the case where $\boldsymbol{\gamma}\ne\mathbf{0}$, cross-section dependence arising via the unobserved common factor, $f_t$, and the cross-sectionally dependent factor loadings, $\gamma_i$.

12 Similar results are also obtained for other cross-section units.

5.2. Monte Carlo results

Tables 1 and 2 give the bias (×100) and root mean square error (RMSE, ×100) of the CALS and LS estimators, as well as the size and power of tests based on them at the 5% nominal level. The results for the estimated own coefficient, $\hat\phi_{2,CALS}$ and $\hat\phi_{2,LS}$, are reported in Table 1. The top panel of this table presents the results for the experiments with an unobserved common factor ($\boldsymbol{\gamma}\ne\mathbf{0}$). In this case, $\{x_{it}\}$ is CSD, and the standard LS estimator without augmentation by cross-section averages is not consistent. The bias of $\hat\phi_{2,LS}$ is indeed quite substantial for all values of $N$, even at $T=200$, and the tests based on $\hat\phi_{2,LS}$ are grossly oversized. CALS, on the other hand, performs well for $T\ge100$ and all values of $N$. For smaller values of $T$, there is a negative bias, and the test based on $\hat\phi_{2,CALS}$ is slightly oversized. This is the familiar time series bias: even in the absence of any cross-section dependence, the LS estimator of the autoregressive coefficient is biased downward (when $\phi_2>0$) in small-$T$ samples. Moving on to the experiments without a common factor (the bottom half of the table), we observe that the LS estimator only slightly outperforms the CALS estimator. In the absence of common factors, $\{x_{it}\}$ is weakly cross-sectionally dependent, and augmentation with cross-section averages is therefore (asymptotically) innocuous; the distortions coming from cross-section augmentation are in this case very small. Note that the LS estimator is not efficient because the residuals are cross-sectionally dependent. Augmentation by cross-section averages helps to reduce part of this dependence. Nevertheless, $\hat\phi_{2,CALS}$ does not achieve a lower RMSE than $\hat\phi_{2,LS}$.

The estimation results for the neighboring coefficient, $\psi_2$, are presented in Table 2. These are qualitatively similar to those reported in Table 1. Cross-section augmentation is clearly needed, and is very helpful, when common factors are present. But in the absence of such common effects, weak cross-section dependence, whether through the dynamics or the error processes, does not pose any difficulty for the LS and CALS estimators so long as $N$ is sufficiently large. Finally, and not surprisingly, the estimates are subject to the small-$T$ bias irrespective of the size of $N$ or the degree of cross-section dependence. Fig. 1 plots the power of t-tests based on the CALS estimators of the own coefficient, $\hat\phi_{2,CALS}$ (left chart), and the neighboring coefficient, $\hat\psi_{2,CALS}$ (right chart), for $N=200$ and two different values of $T\in\{100,200\}$. These charts provide a graphical representation of the results reported in Tables 1–2, and also suggest a significant improvement in power as $T$ increases, for a number of different alternatives.
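Parts of the design in Section 5.1 can be sketched in pure Python. The code below computes the Whittle coefficients $\psi_{\gamma1},\psi_{\gamma2}$ and the innovation variance $\sigma^2_{\eta\gamma}$ for $a_\gamma=0.4$, and checks that the implied AR(2) variance equals one. It then draws factor loadings from (51) with the stated burn-in and zero initial values, and builds $\boldsymbol{\Phi}=\boldsymbol{\Phi}_a+\boldsymbol{\Phi}_b$ following (52). The function names are hypothetical. The text does not spell out how $\mu_\gamma$ enters (51), so adding it to the zero-mean draws is an assumption of this sketch. Algebraically, $\beta_\gamma=a_\gamma/(1+\sqrt{1-a_\gamma^2})=(1-\sqrt{1-a_\gamma^2})/a_\gamma=\alpha_\gamma$, so the two Whittle roots coincide.

```python
import math
import random

def whittle_sar2(a):
    """Coefficients of the unilateral SAR(2) scheme (51) for the bilateral parameter a,
    plus the innovation variance that gives Var(gamma_i) = 1."""
    s = math.sqrt(1.0 - a * a)
    alpha = (1.0 - s) / a
    beta = a / (1.0 + s)                      # from beta^{-1} = (1 + s) / a
    psi1, psi2 = alpha + beta, -alpha * beta
    sigma2 = (1.0 + psi2) * ((1.0 - psi2) ** 2 - psi1 ** 2) / (1.0 - psi2)
    return psi1, psi2, sigma2

def ar2_variance(psi1, psi2, sigma2):
    """Stationary variance of an AR(2) process with innovation variance sigma2."""
    return (1.0 - psi2) * sigma2 / ((1.0 + psi2) * ((1.0 - psi2) ** 2 - psi1 ** 2))

def factor_loadings(n, a=0.4, mu=1.0, burn_in=50, rng=None):
    """gamma_1..gamma_N from (51): 50 burn-in draws started at gamma_{-51} = gamma_{-50} = 0.
    Adding mu to the zero-mean SAR(2) draws is an assumption of this sketch."""
    if rng is None:
        rng = random.Random(0)
    psi1, psi2, sigma2 = whittle_sar2(a)
    sd = math.sqrt(sigma2)
    lag2, lag1 = 0.0, 0.0
    draws = []
    for k in range(burn_in + n):
        g = psi1 * lag1 + psi2 * lag2 + rng.gauss(0.0, sd)
        lag2, lag1 = lag1, g
        if k >= burn_in:                      # discard the burn-in points
            draws.append(mu + g)
    return draws

def build_phi(n, rng):
    """Phi = Phi_a + Phi_b as in Section 5.1; Python index 1 is the paper's unit i = 2."""
    phi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 1:
            own, nb = 0.5, 0.1
        else:
            own, nb = rng.uniform(0.4, 0.6), rng.uniform(-0.1, 0.1)
        phi[i][i] = own                       # Phi_a: own effect
        if i > 0:
            phi[i][i - 1] = nb                # Phi_a: left neighbour
        if i < n - 1:
            phi[i][i + 1] = nb                # Phi_a: right neighbour
        lam = rng.uniform(-0.1, 0.2)          # lambda_i
        vs = [rng.uniform(0.0, 1.0) for _ in range(n)]   # varsigma_ij
        total = sum(vs)
        for j in range(n):
            if abs(i - j) > 1:                # Phi_b is zero on the neighbour band
                phi[i][j] += lam * vs[j] / total         # lambda_i * omega_ij, eq. (52)
    return phi

psi1, psi2, sigma2 = whittle_sar2(0.4)
print(psi1, psi2, ar2_variance(psi1, psi2, sigma2))
phi = build_phi(25, random.Random(42))
print(sum(abs(v) for v in phi[1]))            # absolute row sum for unit i = 2
```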

6. An empirical illustration: a spatiotemporal model of house prices in the US

In a recent study, Holly et al. (2010), hereafter HPY, consider the relation between real house prices, $p_{it}$, and real per capita personal disposable income, $y_{it}$ (both in logs), in a panel of 49 US states over 29 years (1975–2003), where $i=1,2,\ldots,49$ and $t=1,2,\ldots,T$. Controlling for heterogeneity and cross-section dependence, they show that $p_{it}$ and $y_{it}$ are cointegrated with coefficients $(1,-1)$, and provide estimates of the following panel error correction model:
$$\Delta p_{it}=c_i+\omega_i(p_{i,t-1}-y_{i,t-1})+\delta_{1i}\Delta p_{i,t-1}+\delta_{2i}\Delta y_{it}+\upsilon_{it}.\tag{55}$$
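The regressors in (55) are simple transformations of the log levels. The sketch below builds them for a single state. It is illustrative only: the numbers are made up (not HPY data) and the function name is hypothetical. It produces the dependent variable $\Delta p_{it}$ and the rows $(1,\,p_{i,t-1}-y_{i,t-1},\,\Delta p_{i,t-1},\,\Delta y_{it})$. Two observations are lost at the start because $\Delta p_{i,t-1}$ needs two lagged levels.

```python
def ecm_rows(p, y):
    """Build (dependent, regressors) for eq. (55) from one state's log series.

    p, y: log real house prices and log real per capita income for t = 0..T.
    Returns Δp_t and rows (1, p_{t-1} - y_{t-1}, Δp_{t-1}, Δy_t) for t = 2..T.
    """
    dp = [p[t] - p[t - 1] for t in range(1, len(p))]   # dp[k] = Δp_{k+1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]   # dy[k] = Δy_{k+1}
    dep, rows = [], []
    for t in range(2, len(p)):
        dep.append(dp[t - 1])                          # Δp_t
        rows.append((1.0, p[t - 1] - y[t - 1], dp[t - 2], dy[t - 1]))
    return dep, rows

# Hypothetical log series for one state.
p = [4.60, 4.62, 4.67, 4.70, 4.69, 4.73]
y = [3.00, 3.02, 3.05, 3.06, 3.07, 3.10]
dep, rows = ecm_rows(p, y)
print(len(dep), rows[0])
```

For the CCE versions, each row would additionally carry the cross-section averages of these variables at the same date.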

Table 1
MC results for the own coefficient ϕ2.
Layout: rows are N = 25, 50, 75, 100, 200; column groups are Bias (×100), Root mean square error (×100), Size (5% level, H0: ϕ2 = 0.50) and Power (5% level, H1: ϕ2 = 0.60), each reported for T = 25, 50, 75, 100, 200.

Experiments with non-zero factor loadings
LS estimator not augmented with cross-section averages, ϕ̂2,LS

−4.00 −2.90 −3.72 −3.30 −3.31

25 50 75 100 200

4.79 7.31 4.90 7.14 4.32 7.03 4.44 7.30 4.17 6.55

CALS estimator ϕ̂2,CALS

25 50 75 100 200

−17.83 −7.51 −18.89 −8.99 −20.63 −9.79 −20.32 −10.04 −20.30 −9.40

−5.02 −6.16 −6.25 −6.34 −6.27

8.60 9.29 8.89 8.43 8.85

11.04 10.89 11.30 11.46 10.94

−3.10 −3.83 −4.36 −4.81 −4.74

−0.94 −1.93 −2.21 −2.12 −1.94

24.73 18.82 17.91 24.13 19.16 17.81 24.67 18.92 17.65 24.96 19.02 17.78 24.46 18.67 17.23

17.58 17.61 13.40 20.75 28.80 33.45 43.70 15.75 17.64 17.53 11.60 22.75 28.70 34.45 45.45 13.15 17.75 17.81 13.80 21.10 27.90 34.95 45.45 15.80 17.61 18.03 13.75 21.40 28.10 33.15 45.75 15.35 17.50 17.51 12.00 19.90 26.50 33.20 44.05 14.60

20.30 20.60 20.25 20.85 18.70

25.25 29.45 41.80 25.75 28.50 41.45 25.00 30.15 41.75 25.00 29.90 42.35 23.75 28.80 40.95

27.88 28.88 29.70 29.93 30.05

9.86 10.02 10.44 10.64 10.43

6.38 6.72 6.77 6.76 6.53

12.75 12.00 14.40 14.45 13.65

8.95 9.50 9.70 9.35 9.90

8.50 8.30 7.55 8.50 8.05

6.45 6.75 7.45 8.15 7.65

5.65 5.55 6.25 6.25 5.30

23.25 23.60 25.05 25.45 26.40

23.25 25.75 26.45 27.65 25.30

26.80 30.00 29.70 29.85 29.00

29.40 29.90 32.70 34.30 33.70

41.55 46.50 48.30 48.05 46.25

6.44 6.65 6.67 6.62 6.66

8.75 8.70 8.60 9.00 8.15

6.85 7.95 7.20 7.85 6.70

5.80 6.25 6.80 5.70 6.75

5.75 6.30 6.25 6.40 5.75

5.45 5.95 5.05 5.85 5.35

17.45 18.30 18.35 17.50 17.00

18.80 19.25 20.55 21.65 19.90

23.05 23.10 22.40 22.80 23.65

26.25 26.55 27.20 26.35 27.50

41.10 45.05 44.05 43.35 42.60

6.25 9.35 6.54 10.25 6.63 9.95 6.61 10.50 6.77 10.10

7.30 8.65 7.35 8.20 7.70

6.10 7.05 7.00 5.55 7.80

5.20 6.40 7.10 6.20 6.05

4.90 5.80 5.25 5.50 5.95

18.65 19.80 19.70 19.50 18.95

18.90 20.40 22.10 24.15 21.70

23.00 23.70 22.70 24.05 24.55

24.45 26.65 27.25 27.35 28.05

36.80 44.70 42.60 42.35 42.85

15.78 16.93 17.10 17.07 17.07

12.14 12.62 12.64 12.63 12.62

Experiments with zero factor loadings
LS estimator not augmented with cross-section averages, ϕ̂2,LS
25 50 75 100 200

−12.90 −13.38 −13.65 −12.68 −12.50

−6.15 −6.55 −6.72 −6.90 −6.54

−4.07 −4.44 −4.04 −4.32 −4.23

−3.06 −3.17 −3.13 −3.17 −3.25

−1.45 −1.67 −1.68 −1.55 −1.75

24.45 24.57 24.63 24.36 23.84

14.81 15.13 15.30 15.56 15.18

11.48 11.52 11.69 11.54 11.79

9.65 9.77 9.80 9.68 9.76

−2.48 −3.14 −3.30 −3.42 −3.72

−0.69 −1.47 −1.58 −1.54 −1.89

25.88 26.59 27.25 26.78 26.50

15.21 15.63 16.01 16.31 16.13

11.49 11.74 11.92 11.88 12.27

9.46 9.90 10.00 9.88 10.03

CALS estimator ϕ̂2,CALS
25 50 75 100 200

−14.25 −15.22 −15.78 −15.03 −15.03

−6.26 −7.03 −7.41 −7.77 −7.59

−3.82 −4.63 −4.31 −4.80 −4.93

Notes: $\phi_2=0.5$, $\psi_2=0.1$, $a_\gamma=0.4$, $a_u=0.2$, and $Var(\gamma_i)=1$. The DGP is given by the two-neighbor IVAR model (49), where the equation for unit $i\in\{2,\ldots,N-1\}$ is $x_{it}=\phi_ix_{i,t-1}+\psi_i(x_{i-1,t-1}+x_{i+1,t-1})+\boldsymbol{\varphi}_{bi}'\mathbf{x}_{t-1}+\gamma_if_t-\boldsymbol{\varphi}_i'\boldsymbol{\gamma}f_{t-1}+u_{it}$. The CALS estimators of the own coefficient $\phi_2$ and the neighboring coefficient $\psi_2$ are computed using the following auxiliary regression: $x_{2t}=c_2+\psi_2(x_{1,t-1}+x_{3,t-1})+\phi_2x_{2,t-1}+\delta_{2,0}\bar x_t+\delta_{2,1}\bar x_{t-1}+\epsilon_{2t}$. The estimators $\hat\phi_{2,LS}$ and $\hat\psi_{2,LS}$ are computed from the auxiliary regressions not augmented with cross-section averages. The unobserved common factor $f_t$ is generated as a stationary AR(1) process, and the factor loadings and innovations $\{u_{it}\}$ are generated according to stationary spatial autoregressive processes. Please refer to Section 5 for a detailed description of the Monte Carlo design.
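The small-$T$ bias visible in the tables can be reproduced in isolation. The sketch below is a simplified stand-in for the paper's DGP: a univariate AR(1) with coefficient 0.5 (the value of $\phi_2$), estimated by least squares with an intercept. It uses 2000 replications and 50 discarded start-up observations, matching the counts in Section 5.1. The mean bias is negative and shrinks roughly in proportion to $1/T$; the classical approximation $-(1+3\phi)/T$ for this case gives about $-0.10$ at $T=25$.

```python
import random

def ls_ar1(x):
    """LS slope from regressing x_t on (1, x_{t-1})."""
    y, z = x[1:], x[:-1]
    my, mz = sum(y) / len(y), sum(z) / len(z)
    szz = sum((a - mz) ** 2 for a in z)
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    return szy / szz

def mean_bias(t_len, phi=0.5, reps=2000, burn=50, seed=123):
    """Monte Carlo mean of (phi_hat - phi) for an AR(1) with N(0, 1) innovations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x, path = 0.0, []
        for s in range(burn + t_len + 1):
            x = phi * x + rng.gauss(0.0, 1.0)
            if s >= burn:                      # keep t_len + 1 points after burn-in
                path.append(x)
        total += ls_ar1(path) - phi
    return total / reps

for t_len in (25, 200):
    print(t_len, round(100 * mean_bias(t_len), 2))   # bias x 100, as in Tables 1-2
```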

Table 2
MC results for the neighboring coefficient ψ2.
Layout: rows are N = 25, 50, 75, 100, 200; column groups are Bias (×100), Root mean square error (×100), Size (5% level, H0: ψ2 = 0.10) and Power (5% level, H1: ψ2 = 0.20), each reported for T = 25, 50, 75, 100, 200.

Experiments with non-zero factor loadings

LS estimator not augmented with cross-section averages, ψ̂2,LS
25 50 75 100 200

9.28 7.28 6.95 7.06 8.49 7.39 7.38 6.27 8.91 7.33 7.23 7.18 8.28 7.84 7.16 6.80 8.83 7.81 7.17 6.99

CALS estimator ψ̂2,CALS

6.71 6.73 6.25 6.68 6.58

21.77 21.45 21.60 21.53 21.98

16.57 16.44 16.86 16.78 16.86

15.30 15.22 15.45 15.03 15.34

14.41 14.17 14.55 14.34 14.18

12.99 13.01 13.13 13.45 12.75

18.30 18.10 18.75 17.20 18.10

28.55 25.15 26.75 27.15 28.10

35.90 34.65 37.15 35.75 36.30

43.55 41.05 42.15 44.10 42.30

58.40 59.55 59.20 59.30 58.05

16.85 15.45 17.00 15.80 16.90

25.30 23.80 24.95 24.55 25.65

31.20 29.60 30.05 29.10 30.75

33.00 34.10 33.00 33.75 33.05

41.95 43.35 45.20 44.85 42.30

25 50 75 100 200

2.47 1.77 2.46 1.54 1.96

0.78 0.76 0.69 0.44 0.34

17.30 17.44 17.41 17.70 17.58

10.92 10.38 10.32 10.42 10.49

8.36 8.06 7.96 8.26 8.25

7.23 6.89 6.78 7.18 6.70

4.93 4.73 4.74 4.67 4.43

8.90 8.90 8.60 10.35 8.30

7.70 6.65 6.10 7.10 6.00

6.95 6.05 5.90 6.60 5.70

6.90 5.70 5.20 6.55 5.70

6.60 6.10 6.10 5.50 4.95

11.65 12.30 11.30 12.70 12.00

17.05 15.90 16.20 15.95 16.35

22.10 22.50 21.60 24.40 22.95

30.55 30.70 30.70 30.25 29.70

52.85 54.00 53.05 54.90 56.45

1.41 1.29 1.27 1.70 1.32

1.39 1.28 0.93 0.86 1.01

1.01 0.50 0.55 0.75 0.71

Experiments with zero factor loadings

LS estimator not augmented with cross-section averages, ψ̂2,LS
25 50 75 100 200

1.04 2.33 2.04 1.60 1.25

0.90 0.88 1.11 1.37 0.65

0.38 0.66 0.77 0.61 0.93

0.48 0.28 0.74 0.48 0.37

0.23 0.47 0.22 0.24 0.09

15.73 16.38 16.17 16.10 16.14

9.98 10.43 10.10 10.20 10.22

7.88 8.17 8.28 8.18 7.92

6.86 6.62 6.81 6.89 6.75

4.74 4.64 4.64 4.72 4.67

6.60 8.05 7.50 8.40 7.15

6.05 7.05 5.95 6.30 5.90

5.60 6.35 6.60 6.90 5.45

5.65 5.35 5.85 6.25 5.50

5.20 4.70 4.55 5.70 5.35

12.40 12.20 11.70 12.40 11.75

16.20 17.80 17.00 15.75 18.55

25.30 24.10 24.50 25.40 23.45

31.50 31.60 29.45 30.55 30.95

56.05 55.55 56.65 56.35 57.30

0.36 0.53 0.25 0.25 0.07

16.65 17.64 17.25 16.97 17.17

10.30 10.81 10.40 10.38 10.62

8.19 8.32 8.48 8.31 8.07

6.97 6.80 6.90 6.94 6.87

4.73 4.68 4.69 4.79 4.68

6.60 9.30 8.55 8.35 7.70

6.60 6.75 6.25 6.00 6.40

6.45 6.35 6.60 6.50 5.10

5.20 5.80 5.95 6.10 5.75

5.15 5.30 4.85 5.75 5.05

11.65 11.15 11.80 12.05 10.75

14.90 17.45 16.50 14.90 16.80

24.60 23.75 23.75 24.05 22.70

29.70 31.25 28.45 29.90 29.85

53.50 54.20 56.35 55.75 56.70

CALS estimator ψ̂2,CALS
25 50 75 100 200

1.12 2.63 2.08 1.74 1.30

0.94 0.88 1.27 1.39 0.71

0.48 0.77 0.80 0.61 0.92

See the notes to Table 1.

0.63 0.39 0.76 0.55 0.40
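The power curves in Fig. 1 are Monte Carlo rejection frequencies. As a rough analytical companion only, suppose the t-statistic is normal with unit variance. Then the power of a two-sided 5% test of $H_0:\theta=\theta_0$ at a true value $\theta_1$ is $\Pr(Z>1.96-d)+\Pr(Z>1.96+d)$, where $Z\sim N(0,1)$ and $d=(\theta_1-\theta_0)/se$. The standard errors below are hypothetical values chosen to mimic halving the estimator's variance when $T$ doubles; they are not taken from Tables 1–2.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)   # two-sided 5% critical value, about 1.96

def approx_power(theta1, theta0, se):
    """Normal-approximation power of the two-sided 5% t-test of H0: theta = theta0."""
    d = (theta1 - theta0) / se
    return STD_NORMAL.cdf(d - Z_CRIT) + STD_NORMAL.cdf(-d - Z_CRIT)

# Hypothetical standard errors for T = 100 and T = 200 (se proportional to 1/sqrt(T)).
for se in (0.08, 0.08 / 2 ** 0.5):
    curve = [round(approx_power(0.5 + h, 0.5, se), 3) for h in (-0.2, -0.1, 0.0, 0.1, 0.2)]
    print(round(se, 4), curve)
```

At $\theta_1=\theta_0$ the function returns the nominal size, 0.05, and the curve rises faster when the standard error is smaller, which is the qualitative pattern the Monte Carlo curves display as $T$ grows.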

Fig. 1. Power curves for the CALS t-tests of the own coefficient, ϕ2 (left chart), and the neighboring coefficient, ψ2 (right chart), in the case of experiments with γ ≠ 0 and N = 200.

Table 3
Alternative average estimates of the error correction models for house prices across 49 US states over the period 1975–2003.
Dependent variable: Δp_it. Columns (1)–(3): Holly et al. (2010) regressions without dynamic spatial effects. Columns (4)–(6): regressions augmented with dynamic spatial effects.

                          (1) MG    (2) CCEMG  (3) CCEP   (4) MG    (5) CCEMG  (6) CCEP
p_i,t−1 − y_i,t−1         −0.105    −0.183     −0.171     −0.095    −0.154     −0.152
Δp_i,t−1                   0.524     0.449      0.518      0.296     0.188      0.272
Δy_it                      0.500     0.277      0.227      0.497     0.284      0.201
Δp^s_i,t−1                 –         –          –          0.331     0.350      0.431

(0.008)

(0.030)

(0.040)

(0.016)

(0.038)

(0.015)

(0.065)

(0.059)

(0.063)

(0.009)

(0.060)

(0.040)

(0.066)

(0.018)

(0.049)

(0.059)

(0.085)

(0.018)

(0.082) (0.088)

(0.105)

R̄²                         0.54      0.70       0.66       0.60      0.79       0.72
Average cross-correlation coefficient (ρ̂)
                           0.284    −0.005     −0.016      0.267    −0.012     −0.016

Notes: MG, CCEMG, and CCEP stand, respectively, for the mean group, the common correlated effects mean group, and the common correlated effects pooled estimators defined in Pesaran (2006). Augmentation by simple cross-section averages, $\Delta\bar p_t=\sum_{i=1}^{49}\Delta p_{it}/49$, $\Delta\bar y_t=\sum_{i=1}^{49}\Delta y_{it}/49$, and $\bar p_{t-1}-\bar y_{t-1}=\sum_{i=1}^{49}(p_{i,t-1}-y_{i,t-1})/49$, is used to deal with the possible effects of strong cross-section dependence. Standard errors are in parentheses. $\hat\rho$ denotes the average pair-wise correlation of the residuals from the cross-section augmented regressions across the 49 US states.
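The MG columns in Table 3 average state-by-state estimates. The sketch below assumes the unit-specific estimates and residual series are already available; the inputs are hypothetical. The MG estimate is their simple average. A common nonparametric standard error is $\big[\sum_{i}(\hat b_i-\bar b)^2/(N(N-1))\big]^{1/2}$. $\hat\rho$, as described in the notes, is the average of the $N(N-1)/2$ pair-wise correlations of the residual series.

```python
import math

def mean_group(estimates):
    """MG estimate and its nonparametric standard error from unit-specific estimates."""
    n = len(estimates)
    mg = sum(estimates) / n
    var = sum((b - mg) ** 2 for b in estimates) / (n * (n - 1))
    return mg, math.sqrt(var)

def corr(a, b):
    """Sample correlation of two equal-length series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def avg_pairwise_corr(residuals):
    """Average of the N(N-1)/2 pair-wise correlations of residual series (rho-hat)."""
    n = len(residuals)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(corr(residuals[i], residuals[j]) for i, j in pairs) / len(pairs)

# Hypothetical per-state error-correction coefficients and residual series.
omega_hat = [-0.08, -0.12, -0.10, -0.11, -0.09]
resid = [[0.1, -0.2, 0.05, 0.0], [0.2, -0.1, 0.0, -0.1], [-0.1, 0.1, 0.2, -0.2]]
print(mean_group(omega_hat), avg_pairwise_corr(resid))
```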

To take account of unobserved common factors, HPY augmented (55) with simple cross-section averages, $\Delta\bar p_t=\sum_{i=1}^{49}\Delta p_{it}/49$, $\Delta\bar y_t=\sum_{i=1}^{49}\Delta y_{it}/49$, and $\bar p_{t-1}-\bar y_{t-1}=\sum_{i=1}^{49}(p_{i,t-1}-y_{i,t-1})/49$, and obtained common correlated effects mean group and pooled estimates (denoted CCEMG and CCEP) of $\{\omega_i,\delta_{1i},\delta_{2i}\}$, which we reproduce in the left panel of Table 3. HPY then showed that the residuals from these regressions, $\hat\upsilon_{it}$, display a significant degree of spatial dependence. Here we exploit the theoretical results of the present paper and consider the possibility that dynamic neighborhood effects are partly responsible for the residual spatial dependence reported by HPY. To this end, we considered an extended version of (55) in which the lagged spatial variable $\Delta p^{s}_{i,t-1}=\sum_{j=1}^{N}s_{ij}\Delta p_{j,t-1}$ is also included amongst the regressors, with $s_{ij}$ being the $(i,j)$th element of a spatial weight matrix, $\mathbf{S}$, namely
$$\Delta p_{it}=c_i+\omega_i(p_{i,t-1}-y_{i,t-1})+\delta_{1i}\Delta p_{i,t-1}+\psi_i\Delta p^{s}_{i,t-1}+\delta_{2i}\Delta y_{it}+\upsilon_{it}.\tag{56}$$

Here we consider a simple contiguity matrix with $s_{ij}=1$ when states $i$ and $j$ share a border and zero otherwise, and with $s_{ii}=0$. Possible strong cross-section dependence is again controlled for by augmenting the extended regression equation with $\Delta\bar p_t$, $\Delta\bar y_t$, and $\bar p_{t-1}-\bar y_{t-1}$. Estimation results are reported in the right panel of Table 3. The dynamic spatial effects are found to be highly significant, irrespective of the estimation method, and raise the $\bar R^2$ of the price equation by 6–9 percentage points. The dynamics of past price changes are now distributed between own and neighborhood effects, giving rise to much richer dynamics and spillover effects. It is also interesting that the inclusion of the spatiotemporal variable $\Delta p^{s}_{i,t-1}$ in the model has had little impact on the estimates of the coefficient of the real income variable, $\delta_{2i}$.
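The spatial regressor in (56) is a matrix-vector product. The sketch below uses a hypothetical map of four states in a line (A borders B, B borders C, C borders D) rather than the actual 49-state contiguity matrix. With $s_{ij}=1$ for every neighbour, $\Delta p^{s}_{i,t-1}$ is the sum, not the average, of the neighbours' lagged price changes. The simple cross-section average used for augmentation is computed alongside.

```python
def contiguity_matrix(states, borders):
    """s_ij = 1 if states i and j share a border, 0 otherwise, with s_ii = 0."""
    index = {s: k for k, s in enumerate(states)}
    n = len(states)
    s_mat = [[0] * n for _ in range(n)]
    for a, b in borders:
        i, j = index[a], index[b]
        s_mat[i][j] = s_mat[j][i] = 1
    return s_mat

def spatial_lag(s_mat, dp_lag):
    """Δp^s_{i,t-1} = sum_j s_ij Δp_{j,t-1} for every state i."""
    return [sum(s_ij * v for s_ij, v in zip(row, dp_lag)) for row in s_mat]

def cross_section_average(values):
    """Simple cross-section average, e.g. the augmentation term Δp-bar."""
    return sum(values) / len(values)

# Hypothetical map: states A-B-C-D in a line.
states = ["A", "B", "C", "D"]
S = contiguity_matrix(states, [("A", "B"), ("B", "C"), ("C", "D")])
dp_lag = [0.02, -0.01, 0.03, 0.00]        # Δp_{j,t-1}, made-up values
print(spatial_lag(S, dp_lag))
print(cross_section_average(dp_lag))
```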

7. Concluding remarks

This paper has proposed restrictions on the coefficients of infinite-dimensional VAR (IVAR) models that are binding only in the limit as the number of cross-section units (or variables in the VAR) tends to infinity, in order to circumvent the curse of dimensionality. The proposed framework relates to various approaches considered in the literature. For example, when modeling individual households or firms, aggregate variables, such as market returns or regional/national income, are treated as exogenous. This is intuitive, as the impact of a firm or household on the aggregate economy is small, of the order $O(N^{-1})$. This paper formalizes this idea in a spatiotemporal context. The paper establishes that, in the absence of common factors and when the degree of cross-section dependence is weak, equations for individual units decouple as $N\to\infty$ and can be consistently estimated by running separate regressions. In the presence of observed and/or unobserved common factors, individual-specific VAR models can still be estimated separately if they are conditioned on the common factors. Unobserved common factors can be approximated by cross-sectional averages, following the idea originally introduced by Pesaran (2006). This paper shows that the global VAR approach of Pesaran et al. (2004) can be motivated as an approximation to an IVAR model featuring all the macroeconomic variables. The asymptotic distribution of the cross-section augmented least-squares (CALS) estimator of the parameters of the unit-specific equations in the IVAR model is established both when the number of unobserved common factors is known and when it is unknown but fixed. The small-sample properties of the proposed CALS estimator were investigated through Monte Carlo simulations, and an empirical illustration shows the statistical significance of dynamic spillover effects in modeling US real house prices across neighboring states. Topics for future research include estimation and inference in the case of IVAR models with dominant individual units, analysis of large dynamic networks with and without dominant nodes, and an examination of the relationships between IVAR and dynamic factor models.

∞ FNt . Let {{cNt }∞ t =−∞ }N =1 be a two-dimensional array of constants, and set cNt = 1/TN for all t ∈ Z and N ∈ N. Note that

[  E

E

κNt cNt

]2  | F N ,t − n

∞ −

=

ℓ=mnp

≤ ςn , where mnp = max{n, p} and

ςn = sup ‖θ‖ ‖Σ ‖ ‖Φ‖ 2

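Appendix. Lemmas and proofs

Proof of Proposition 1. For any $N \in \mathbb{N}$, the variance of $x_t$ is $\Omega = \mathrm{Var}(x_t) = \sum_{\ell=0}^{\infty}\Phi^{\ell}\Sigma\Phi'^{\ell}$, and, under Assumptions 2–4, $\|\Omega\| \le \|\Sigma\|\sum_{\ell=0}^{\infty}\|\Phi\|^{2\ell} < K$. Hence, for any arbitrary nonrandom vector of weights satisfying the granularity condition (16), $\|\mathrm{Var}(w'x_t)\| = \|w'\Omega w\| \le \varrho(\Omega)(w'w)$, where $\varrho(\Omega) = \|\Omega\| < K$, and $w'w = O(N^{-1})$ by condition (16). Therefore, $\lim_{N\to\infty}\|\mathrm{Var}(w'x_t)\| = 0$. ∎

Proof of Corollary 1. Assumption 1 implies that, for any $i \in \mathcal{K}$, vector $\phi_{bi}$ satisfies condition (16). It follows from Proposition 1 that

$$\lim_{N\to\infty}\mathrm{Var}(\phi_{bi}'x_t) = 0 \quad \text{for } i \in \mathcal{K}. \tag{57}$$

Also, (1) implies that

$$x_{it} - \phi_{ai}'x_{t-1} - u_{it} = \phi_{bi}'x_{t-1}, \quad \text{for any } i \in \mathcal{K} \text{ and any } N \ge i. \tag{58}$$

Taking the variance of (58) and using (57) now yields (22). ∎

Lemma 1. Suppose that Assumptions 2–4 hold. Then, for any $p, q \in \{0, 1\}$, and for any sequences of nonrandom vectors $\theta$ and $\varphi$ such that $\|\theta\| = O(1)$ and $\|\varphi\|_1 = O(1)$, as $(N,T) \xrightarrow{j} \infty$, we have

$$\frac{1}{T}\sum_{t=1}^{T}\theta'\upsilon_{t-p} \xrightarrow{p} 0, \tag{59}$$

and

$$\frac{1}{T}\sum_{t=1}^{T}\left[\theta'\upsilon_{t-p}\,\varphi'\upsilon_{t-q} - E(\theta'\upsilon_{t-p}\,\varphi'\upsilon_{t-q})\right] \xrightarrow{p} 0, \tag{60}$$

where the process $\upsilon_t$ is defined by (30). Furthermore, if $\|\theta\| = O(N^{-1/2})$, then

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T}\theta'\upsilon_{t-p} \xrightarrow{p} 0, \tag{61}$$

and

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T}\left[\theta'\upsilon_{t-p}\,\varphi'\upsilon_{t-q} - E(\theta'\upsilon_{t-p}\,\varphi'\upsilon_{t-q})\right] \xrightarrow{p} 0. \tag{62}$$

Proof. Let $T_N = T(N)$ be any nondecreasing integer-valued function of $N$ such that $\lim_{N\to\infty}T_N = \infty$. Consider the following two-dimensional array, $\{\{\kappa_{Nt}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=1}^{\infty}$, defined by

$$\kappa_{Nt} = \frac{1}{T_N}\theta'\upsilon_{t-p},$$

where the subscript $N$ is used to emphasize the number of cross-section units,13 $\{\mathcal{F}_{Nt}\}$ denotes the array of σ-fields that is increasing in $t$ for each $N$, and $\kappa_{Nt}$ is measurable with respect to $\mathcal{F}_{Nt}$. Set $c_{Nt} = 1/T_N$ for all $t \in \mathbb{Z}$ and $N \in \mathbb{N}$, and let $m_{np} = \max\{n, p\}$. Then14

$$E\left\{\left[E\left(\frac{\kappa_{Nt}}{c_{Nt}}\,\middle|\,\mathcal{F}_{N,t-n}\right)\right]^2\right\} = \sum_{\ell=m_{np}}^{\infty}\theta'\Phi^{\ell-p}\Sigma\Phi'^{\ell-p}\theta \le \varsigma_n, \qquad \varsigma_n = \sup_{N\in\mathbb{N}}\left\{\|\theta\|^2\|\Sigma\|\,\|\Phi\|^{2(m_{np}-p)}\sum_{\ell=0}^{\infty}\|\Phi\|^{2\ell}\right\}. \tag{63}$$

Under Assumptions 2–4, $\varsigma_n$ has the following properties:

$$\varsigma_0 < K, \quad \text{and} \quad \varsigma_n \to 0 \text{ as } n \to \infty. \tag{64}$$

By Liapunov's inequality, $E|E(\kappa_{Nt} \mid \mathcal{F}_{N,t-n})| \le \sqrt{E\{[E(\kappa_{Nt} \mid \mathcal{F}_{N,t-n})]^2\}}$ (Theorem 9.23 of Davidson (1994)). It follows that the two-dimensional array $\{\{\kappa_{Nt}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=1}^{\infty}$ is an $L_1$-mixingale with respect to the constant array $\{c_{Nt}\}$. Eqs. (63) and (64) establish that array $\{\kappa_{Nt}/c_{Nt}\}$ is uniformly bounded in the $L_2$-norm. This implies uniform integrability.15 Note that

$$\lim_{N\to\infty}\sum_{t=1}^{T_N}c_{Nt} = \lim_{N\to\infty}\sum_{t=1}^{T_N}\frac{1}{T_N} = 1 < \infty, \tag{65}$$

and

$$\lim_{N\to\infty}\sum_{t=1}^{T_N}c_{Nt}^2 = \lim_{N\to\infty}\sum_{t=1}^{T_N}\frac{1}{T_N^2} = 0. \tag{66}$$

Therefore, array $\{\{\kappa_{Nt}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=1}^{\infty}$ satisfies the conditions of a mixingale weak law,16 which implies that $\sum_{t=1}^{T_N}\kappa_{Nt} \xrightarrow{L_1} 0$, i.e.,

$$\frac{1}{T}\sum_{t=1}^{T}\theta'\upsilon_{t-p} \xrightarrow{L_1} 0,$$

as $(N,T) \xrightarrow{j} \infty$ at any rate. Convergence in the $L_1$-norm implies convergence in probability. This completes the proof of result (59). Under the condition $\|\theta\| = O(N^{-1/2})$, result (61) follows from result (59) by noting that $\|\sqrt{N}\theta\| = O(1)$.

Result (60) is established in a similar fashion. Consider the following two-dimensional array, $\{\{\kappa_{Nt}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=1}^{\infty}$, defined by17

$$\kappa_{Nt} = \frac{1}{T_N}\theta'\upsilon_{t-p}\,\varphi'\upsilon_{t-q} - \frac{1}{T_N}E(\theta'\upsilon_{t-p}\,\varphi'\upsilon_{t-q}),$$

where, as before, $T_N = T(N)$ is any nondecreasing integer-valued function of $N$ such that $\lim_{N\to\infty}T_N = \infty$. Set $c_{Nt} = 1/T_N$ for all $t \in \mathbb{Z}$ and $N \in \mathbb{N}$. Note that, with $m_{nq} = \max\{n, q\}$,

$$\begin{aligned} E\left(\frac{\kappa_{Nt}}{c_{Nt}}\,\middle|\,\mathcal{F}_{N,t-n}\right) &= E\left[\left(\sum_{s=p}^{\infty}\theta'\Phi^{s-p}u_{t-s}\right)\left(\sum_{\ell=q}^{\infty}\varphi'\Phi^{\ell-q}u_{t-\ell}\right)\,\middle|\,\mathcal{F}_{N,t-n}\right] - E(\theta'\upsilon_{t-p}\,\varphi'\upsilon_{t-q})\\ &= \sum_{s=m_{np}}^{\infty}\sum_{\ell=m_{nq}}^{\infty}\left[\theta'\Phi^{s-p}u_{t-s}\,\varphi'\Phi^{\ell-q}u_{t-\ell} - E(\theta'\Phi^{s-p}u_{t-s}\,\varphi'\Phi^{\ell-q}u_{t-\ell})\right]. \end{aligned}$$

13 Note that vectors $\upsilon_t$ and $\theta$ change with $N$ as well, but the subscript $N$ is omitted here to keep the notation simple.

14 We use the submultiplicative property of matrix norms ($\|AB\| \le \|A\|\,\|B\|$ for any matrices $A$, $B$ such that $AB$ is well defined) and the fact that the spectral matrix norm is self-adjoint (i.e., $\|A'\| = \|A\|$). Note also that Assumption 4 implies that $\sum_{\ell=0}^{\infty}\|\Phi\|^{2\ell} = O(1)$.

15 A sufficient condition for uniform integrability is uniform boundedness in the $L_{1+\epsilon}$-norm for some $\epsilon > 0$.

16 See Theorem 19.11 of Davidson (1994).

17 As before, $\{\mathcal{F}_{Nt}\}$ denotes the array of σ-fields that is increasing in $t$ for each $N$, and $\kappa_{Nt}$ is measurable with respect to $\mathcal{F}_{Nt}$.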
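The variance bound in the proof of Proposition 1 can be checked numerically in its simplest special case. The sketch below is illustrative only. It assumes Φ = φI_N and Σ = I_N (hypothetical choices, not made in the paper), so that Ω = I_N/(1 − φ²). For the equal weights w = τ/N this gives Var(w′x_t) = w′Ωw = 1/[N(1 − φ²)]. The helper names `simulate_average` and `variance_of_average` are also hypothetical. The code simulates N independent AR(1) units and compares the sample variance of their cross-section average with this closed form.

```python
import random
import statistics


def simulate_average(n_units, n_periods, phi, seed=0, burn_in=200):
    """Equal-weighted average w'x_t of n_units independent AR(1) processes.

    Assumed special case (illustration only): Phi = phi * I and Sigma = I,
    so x_it = phi * x_{i,t-1} + u_it with u_it ~ N(0, 1).
    """
    rng = random.Random(seed)
    x = [0.0] * n_units
    averages = []
    for t in range(burn_in + n_periods):
        x = [phi * xi + rng.gauss(0.0, 1.0) for xi in x]
        if t >= burn_in:  # drop start-up values so x_t is close to stationarity
            averages.append(sum(x) / n_units)
    return averages


def variance_of_average(n_units, phi):
    """Closed form w' Omega w with w = tau / N and Omega = I / (1 - phi^2)."""
    return 1.0 / (n_units * (1.0 - phi ** 2))


if __name__ == "__main__":
    phi = 0.5
    for n in (1, 10, 100):
        sample = simulate_average(n, 4000, phi, seed=n)
        print(n, round(statistics.pvariance(sample), 5), round(variance_of_average(n, phi), 5))
```

With φ = 0.5 the closed form is 1/(0.75N). The simulated variances should therefore shrink roughly tenfold each time N is multiplied by ten, in line with w′w = O(N^{-1}).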

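Let $\theta_s' = \theta'\Phi^s$ and $\varphi_\ell' = \varphi'\Phi^\ell$; then

$$E\left\{\left[E\left(\frac{\kappa_{Nt}}{c_{Nt}}\,\middle|\,\mathcal{F}_{N,t-n}\right)\right]^2\right\} = \sum_{s=m_{np}}^{\infty}\sum_{\ell=m_{nq}}^{\infty}\sum_{j=m_{np}}^{\infty}\sum_{d=m_{nq}}^{\infty}E(\theta_{s-p}'u_{t-s}\,\varphi_{\ell-q}'u_{t-\ell}\,\theta_{j-p}'u_{t-j}\,\varphi_{d-q}'u_{t-d}) - \left[\sum_{s=m_{np}}^{\infty}\sum_{\ell=m_{nq}}^{\infty}E(\theta_{s-p}'u_{t-s}\,\varphi_{\ell-q}'u_{t-\ell})\right]^2. \tag{67}$$

Using the independence of $u_t$ and $u_{t'}$ for any $t \ne t'$ (Assumption 2), we have

$$\left|\sum_{s=m_{np}}^{\infty}\sum_{\ell=m_{nq}}^{\infty}E(\theta_{s-p}'u_{t-s}\,\varphi_{\ell-q}'u_{t-\ell})\right| = \left|\sum_{\ell=\max\{p,q,n\}}^{\infty}\theta'\Phi^{\ell-p}\Sigma\Phi'^{\ell-q}\varphi\right| \le \varsigma_{a,n},$$

where

$$\varsigma_{a,n} = \sup_{N\in\mathbb{N}}\left\{\|\theta\|\,\|\varphi\|\,\|\Sigma\|\,\|\Phi\|^{\chi_1(p,n,q)}\sum_{\ell=0}^{\infty}\|\Phi\|^{2\ell}\right\},$$

and $\chi_1(p,n,q) = \max\{0, q-p, n-p\} + \max\{0, p-q, n-q\}$. Here $\|\Sigma\| = O(1)$ by Assumptions 2 and 3, $\sum_{\ell=0}^{\infty}\|\Phi\|^{2\ell} = O(1)$ by Assumption 4, $\|\theta\| = O(1)$, and $\|\varphi\| \le \|\varphi\|_1 = O(1)$. Hence $\varsigma_{a,n}$ has the following properties:

$$\varsigma_{a,0} < K_a, \quad \text{and} \quad \varsigma_{a,n} \to 0 \text{ as } n \to \infty. \tag{68}$$

Similarly, since by Assumption 2 $u_t$ and $u_{t'}$ are independently distributed for any $t \ne t'$, the first term on the right-hand side of Eq. (67) is bounded by $\varsigma_{b,n}$:18

$$\varsigma_{b,n} = \sup_{N\in\mathbb{N}}\left\{\|B\|\,\|\theta\|^2\|\varphi\|^2\sum_{\ell=\max\{p,q,n\}}^{\infty}\|\Phi\|^{2(\ell-p)+2(\ell-q)} + 2\varsigma_{a,n}^2 + \|\theta\|^2\|\Sigma\|^2\|\varphi\|^2\|\Phi\|^{2\chi_2(p,n,q)}\left(\sum_{\ell=0}^{\infty}\|\Phi\|^{2\ell}\right)^2\right\},$$

where $\chi_2(p,n,q) = \max\{0, n-p\} + \max\{n-q, 0\}$, $B$ is an $N \times N$ matrix with the $(i,j)$th element given by $\|\Psi_{ij}\|$, and $\Psi_{ij}$ is an $N \times N$ matrix of fourth moments with its $(n,s)$th element given by $E(u_{it}u_{jt}u_{nt}u_{st})$. It follows from Assumptions 2–4 that $\varsigma_{b,n}$ has the following properties:19

$$\varsigma_{b,0} < K_b, \quad \text{and} \quad \varsigma_{b,n} \to 0 \text{ as } n \to \infty. \tag{69}$$

$E\{[E(\kappa_{Nt}/c_{Nt} \mid \mathcal{F}_{N,t-n})]^2\}$ is therefore bounded by $\varsigma_n = \varsigma_{a,n} + \varsigma_{b,n}$. Eqs. (68) and (69) establish

$$\varsigma_0 < K, \quad \varsigma_n \to 0 \text{ as } n \to \infty. \tag{70}$$

By Liapunov's inequality, $E|E(\kappa_{Nt} \mid \mathcal{F}_{N,t-n})| \le \sqrt{E\{[E(\kappa_{Nt} \mid \mathcal{F}_{N,t-n})]^2\}}$ (Theorem 9.23 of Davidson (1994)). It follows that the two-dimensional array $\{\{\kappa_{Nt}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=1}^{\infty}$ is an $L_1$-mixingale with respect to a constant array $\{c_{Nt}\}$. Furthermore, (70) establishes that array $\{\kappa_{Nt}/c_{Nt}\}$ is uniformly bounded in the $L_2$-norm. This implies uniform integrability. Since Eqs. (65) and (66) also hold, array $\{\{\kappa_{Nt}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=1}^{\infty}$ satisfies the conditions of a mixingale weak law,20 which implies that $\sum_{t=1}^{T_N}\kappa_{Nt} \xrightarrow{L_1} 0$, i.e.,

$$\frac{1}{T}\sum_{t=1}^{T}\left[\theta'\upsilon_{t-p}\,\varphi'\upsilon_{t-q} - E(\theta'\upsilon_{t-p}\,\varphi'\upsilon_{t-q})\right] \xrightarrow{L_1} 0,$$

as $(N,T) \xrightarrow{j} \infty$. Convergence in the $L_1$-norm implies convergence in probability. This completes the proof of result (60). Under $\|\theta\| = O(N^{-1/2})$, result (62) follows from result (60) by noting that $\|\sqrt{N}\theta\| = O(1)$. ∎

Lemma 2. Suppose that $x_t$ is generated by model (25), and that Assumptions 2–8 hold. Then, as $(N,T) \xrightarrow{j} \infty$, for any $p, q \in \{0,1\}$, and for any sequences of nonrandom vectors $\theta$ and $\varphi$ with growing dimension $N \times 1$ such that $\|\theta\|_1 = O(1)$ and $\|\varphi\|_1 = O(1)$, we have

$$\frac{1}{T}\sum_{t=1}^{T}\theta'x_{t-p} - E(\theta'x_{t-p}) \xrightarrow{p} 0, \tag{71}$$

and

$$\frac{1}{T}\sum_{t=1}^{T}\theta'x_{t-p}\,\varphi'x_{t-q} - E(\theta'x_{t-p}\,\varphi'x_{t-q}) \xrightarrow{p} 0. \tag{72}$$

Furthermore, for $\|\theta\| = O(1)$ and $\|\varphi\|_1 = O(1)$, we have

$$\frac{1}{T}\sum_{t=1}^{T}\theta'\upsilon_{t-p}\,\varphi'\Gamma f_{t-q} \xrightarrow{p} 0, \tag{73}$$

where $\upsilon_t$ is defined in Eq. (30).

Proof. Let $T_N = T(N)$ be any nondecreasing integer-valued function of $N$ such that $\lim_{N\to\infty}T_N = \infty$. Consider the following two-dimensional array, $\{\{\kappa_{Nt}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=1}^{\infty}$, defined by

$$\kappa_{Nt} = \frac{1}{T_N}\theta'\upsilon_{t-p}\,\varphi'\Gamma f_{t-q},$$

where $\{\mathcal{F}_{Nt}\}$ denotes the array of σ-fields that is increasing in $t$ for each $N$, and $\kappa_{Nt}$ is measurable with respect to $\mathcal{F}_{Nt}$. Let $\{\{c_{Nt}\}_{t=-\infty}^{\infty}\}_{N=1}^{\infty}$ be a two-dimensional array of constants, and set $c_{Nt} = 1/T_N$ for all $t \in \mathbb{Z}$ and $N \in \mathbb{N}$. Using the submultiplicative property of the matrix norm, and the independence of $f_t$ and $\upsilon_{t'}$ for any $t, t' \in \mathbb{Z}$, we have

$$E\left\{\left[E\left(\frac{\kappa_{Nt}}{c_{Nt}}\,\middle|\,\mathcal{F}_{N,t-n}\right)\right]^2\right\} \le \varsigma_n,$$

where

$$\varsigma_n = \sup_{N\in\mathbb{N}}\left\{\|\theta\|^2\|\Sigma\|\,\|\Phi\|^{2\max\{0,n-p\}}\sum_{\ell=0}^{\infty}\|\Phi\|^{2\ell} \times E\{[E(\varphi'\Gamma f_{t-q} \mid \mathcal{F}_{N,t-n})]^2\}\right\}.$$

Here $\|\theta\| = O(1)$, $\|\Phi\| \le 1 - \epsilon$ by Assumption 4, and $\|\Sigma\| \le \sqrt{\|\Sigma\|_1\|\Sigma\|_\infty} = O(1)$ by Assumption 3. Furthermore, since $f_{t-q}$ is covariance stationary and $\|\varphi'\Gamma\Gamma'\varphi\| = O(1)$ (by condition $\|\varphi\|_1 = O(1)$ and Assumption 8), we have $E\{[E(\varphi'\Gamma f_{t-q} \mid \mathcal{F}_{N,t-n})]^2\} = O(1)$. It follows that $\varsigma_n$ has the following properties:

$$\varsigma_0 < K \quad \text{and} \quad \varsigma_n \to 0 \text{ as } n \to \infty.$$

18 $E(\theta_{s-p}'u_{t-s}\,\varphi_{\ell-q}'u_{t-\ell}\,\theta_{j-p}'u_{t-j}\,\varphi_{d-q}'u_{t-d})$ is nonzero only if one of the following four cases holds: (i) $s = \ell = j = d$, (ii) $s = \ell$, $\ell \ne j$, and $j = d$, (iii) $s = j$, $j \ne \ell$, and $\ell = d$, or (iv) $s = d$, $d \ne \ell$, and $\ell = j$.

19 Matrix $B$ is symmetric by construction. Therefore $\|B\| \le \sqrt{\|B\|_1\|B\|_\infty} = \|B\|_\infty$, where $\|B\|_\infty = \max_{n\in\{1,\ldots,N\}}\sum_{s=1}^{N}\|\Psi_{ns}\| \le \|RR'\|_\infty^2 \le \|R\|_\infty^2\|R\|_1^2 < K$.

20 See Theorem 19.11 of Davidson (1994).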
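The bounds ς_{a,n} in the proof of Lemma 1 vanish as n grows. This happens because the exponent χ₁(p, n, q) increases linearly in n while ‖Φ‖ < 1. The sketch below simply evaluates these formulas. For illustration only, it treats the product ‖θ‖‖ϕ‖‖Σ‖ as a single bounded constant `scale` and uses Σ_{ℓ≥0}‖Φ‖^{2ℓ} = 1/(1 − ‖Φ‖²). The function names are hypothetical.

```python
def chi1(p, n, q):
    """chi_1(p, n, q) = max{0, q - p, n - p} + max{0, p - q, n - q}."""
    return max(0, q - p, n - p) + max(0, p - q, n - q)


def chi2(p, n, q):
    """chi_2(p, n, q) = max{0, n - p} + max{n - q, 0}."""
    return max(0, n - p) + max(n - q, 0)


def varsigma_a(n, p, q, phi_norm, scale=1.0):
    """Evaluate scale * ||Phi||^chi1(p, n, q) * sum_l ||Phi||^(2l).

    `scale` is a hypothetical stand-in for ||theta|| ||phi|| ||Sigma||,
    assumed bounded in N; the geometric series equals 1 / (1 - ||Phi||^2).
    """
    if not 0.0 <= phi_norm < 1.0:
        raise ValueError("Assumption 4 requires ||Phi|| < 1")
    return scale * phi_norm ** chi1(p, n, q) / (1.0 - phi_norm ** 2)
```

For example, take p = 1, q = 0 and ‖Φ‖ = 0.8. The sequence `varsigma_a(n, 1, 0, 0.8)` for n = 0, 1, 2, ... is then non-increasing and tends to zero geometrically, which is the property recorded in (68).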

Array $\{\kappa_{Nt}/c_{Nt}\}$ is thus uniformly bounded in the $L_2$-norm. This proves the uniform integrability of array $\{\kappa_{Nt}/c_{Nt}\}$. Furthermore, using Liapunov's inequality, the two-dimensional array $\{\{\kappa_{Nt}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=1}^{\infty}$ is an $L_1$-mixingale with respect to the constant array $\{c_{Nt}\}$. Noting that Eqs. (65) and (66) hold, it follows that the array $\{\kappa_{Nt}, \mathcal{F}_{Nt}\}$ satisfies the conditions of a mixingale weak law,21 which implies that $\sum_{t=1}^{T_N}\kappa_{Nt} \xrightarrow{L_1} 0$. Convergence in the $L_1$-norm implies convergence in probability. This completes the proof of result (73).

Assumption 8 implies that the sequence $\theta'\alpha$ (as well as $\varphi'\alpha$) is deterministic and bounded. The vector of endogenous variables $x_t$ can be written as

$$x_t = \alpha + \Gamma f_t + \upsilon_t.$$

Process $f_t$ is independent of $\upsilon_t$. Suppose that $(N,T) \xrightarrow{j} \infty$. Processes $\{\theta'\upsilon_{t-p}\}$ and $\{\theta'\upsilon_{t-p}\,\varphi'\upsilon_{t-q}\}$ are ergodic in mean by Lemma 1, since $\|\theta\| \le \|\theta\|_1 = O(1)$. Furthermore,

$$\frac{1}{T}\sum_{t=1}^{T}\theta'\Gamma f_t - \theta'\Gamma E(f_t) \xrightarrow{p} 0,$$

and

$$\frac{1}{T}\sum_{t=1}^{T}\theta'\Gamma f_t\,\varphi'\Gamma f_{t-q} - \theta'\Gamma E(f_tf_{t-q}')\Gamma'\varphi \xrightarrow{p} 0,$$

since $f_t$ is a covariance stationary $m \times 1$ process with absolutely summable autocovariances ($f_t$ is ergodic in mean as well as in variance), and $\|\theta'\Gamma\Gamma'\varphi\| = O(1)$ and $\|(\theta'\Gamma\Gamma'\varphi)^2\| = O(1)$ by Assumption 8, condition $\|\theta\|_1 = O(1)$, and condition $\|\varphi\|_1 = O(1)$. The sum of a bounded deterministic process and independent processes that are ergodic in mean is a process that is ergodic in mean as well. This completes the proof. ∎

Lemma 3. Let $x_t$ be generated by model (25), let Assumptions 1–8 hold, and let $(N,T) \xrightarrow{j} \infty$. Then, for any $p, q \in \{0,1\}$, for any sequence of nonrandom weight matrices, $W$, of growing dimension $N \times m_w$ satisfying conditions (27)–(28), and for any $i \in \mathcal{K}$,

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T}W'\upsilon_{t-p} \xrightarrow{p} 0, \tag{74}$$

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T}W'\upsilon_{t-p}\,x_{W,t-q}' \xrightarrow{p} 0, \tag{75}$$

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T}W'\upsilon_{t-p}\,x_{i,t-q} \xrightarrow{p} 0, \tag{76}$$

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T}g_{it}q_{it} \xrightarrow{p} 0, \tag{77}$$

where the process $\upsilon_t$ is defined in (30), vector $g_{it} = (1, \xi_{i,t-1}', x_{Wt}', x_{W,t-1}')'$, and $q_{it}$ is defined in Eq. (37).

Proof. Let $\mathring{w}_r$ for $r \in \{1, \ldots, m_w\}$ denote the $r$th column vector of matrix $W$. Noting that $\|\sqrt{N}\mathring{w}_r\| = O(1)$ by granularity condition (27), the result

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T}\mathring{w}_r'\upsilon_{t-p} \xrightarrow{p} 0 \tag{78}$$

follows directly from Lemma 1, result (61). This completes the proof of result (74). Let $\varphi$ be any sequence of nonrandom $N \times 1$ vectors of growing dimension such that $\|\varphi\|_1 = O(1)$. We have

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T}\mathring{w}_r'\upsilon_{t-p}\,\varphi'x_{t-q} = \frac{\sqrt{N}}{T}\sum_{t=1}^{T}\mathring{w}_r'\upsilon_{t-p}\,\varphi'(\alpha + \Gamma f_{t-q} + \upsilon_{t-q}). \tag{79}$$

Since $\|\sqrt{N}\mathring{w}_r\| = O(1)$ for any $r \in \{1, \ldots, m_w\}$ by condition (27), we can use Lemma 1, result (62), which implies that

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T}\left[\mathring{w}_r'\upsilon_{t-p}\,\varphi'\upsilon_{t-q} - E(\mathring{w}_r'\upsilon_{t-p}\,\varphi'\upsilon_{t-q})\right] \xrightarrow{p} 0. \tag{80}$$

The sequence $\{\varphi'\alpha\}$ is deterministic and bounded in $N$, and therefore it follows from Lemma 1, result (61), that

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T}\mathring{w}_r'\upsilon_{t-p}\,\varphi'\alpha \xrightarrow{p} 0. \tag{81}$$

Similarly, Lemma 2, Eq. (73), implies that

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T}\mathring{w}_r'\upsilon_{t-p}\,\varphi'\Gamma f_{t-q} \xrightarrow{p} 0. \tag{82}$$

Results (80)–(82) establish that

$$\frac{\sqrt{N}}{T}\sum_{t=1}^{T}\mathring{w}_r'\upsilon_{t-p}\,\varphi'x_{t-q} \xrightarrow{p} 0. \tag{83}$$

Result (75) follows from Eq. (83) by setting $\varphi = \mathring{w}_l$ for any $l \in \{1, \ldots, m_w\}$. Result (76) follows from Eq. (83) by setting $\varphi = e_i$, where $e_i$ is an $N \times 1$ selection vector for the $i$th element. Finally, result (77) directly follows from results (74)–(76). This completes the proof. ∎

Lemma 4. Let $x_t$ be generated by model (25), let Assumptions 1–8 hold, and let $(N,T) \xrightarrow{j} \infty$. Then, for any sequence of nonrandom matrices, $W$, of growing dimension $N \times m_w$ satisfying conditions (27)–(28), and for any $i \in \mathcal{K}$,

$$\frac{1}{T}\sum_{t=1}^{T}g_{it}g_{it}' - C_i \xrightarrow{p} 0, \tag{84}$$

where matrix $C_i = E(g_{it}g_{it}')$ and vector $g_{it} = (\xi_{i,t-1}', x_{Wt}', x_{W,t-1}', 1)'$.

Proof. Result (84) directly follows from Lemmas 1–3. ∎

Lemma 5. Let $x_t$ be generated by model (25), let Assumptions 2–8 hold, and let $(N,T) \xrightarrow{j} \infty$. Then, for any sequence of nonrandom weight matrices, $W$, of growing dimension $N \times m_w$ satisfying conditions (27)–(28), and for any fixed $p \ge 0$,

$$\frac{1}{T}\sum_{t=1}^{T}W'\upsilon_{t-p}u_{it} \xrightarrow{p} 0, \tag{85}$$

where the process $\upsilon_t$ is defined in (30). If, in addition, $T/N \to \varkappa$, with $0 \le \varkappa < \infty$,

$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}W'\upsilon_{t-p}u_{it} \xrightarrow{p} 0. \tag{86}$$

21 See Theorem 19.11 of Davidson (1994).
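The decomposition x_t = α + Γf_t + υ_t used in the proof of Lemma 2 is what lets cross-section averages stand in for the unobserved factors. Lemma 3 adds the second ingredient: W′υ_t vanishes for granular W. The sketch below simulates a hypothetical one-factor special case. Its assumptions are a diagonal Φ with common value φ, a scalar AR(1) factor, and intercepts and loadings drawn uniformly; all parameter values are illustrative only. The code reports the sample correlation between the equal-weighted average x̄_t and f_t, which should approach one as N grows.

```python
import random


def simulate_panel(n_units, n_periods, phi=0.5, rho=0.5, seed=0, burn_in=200):
    """Simulate x_it = alpha_i + gamma_i f_t + v_it (hypothetical one-factor case).

    f_t = rho f_{t-1} + e_t and v_it = phi v_{i,t-1} + u_it, with e_t, u_it ~ N(0, 1).
    Returns the factor path and the equal-weighted cross-section averages.
    """
    rng = random.Random(seed)
    alpha = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    gamma = [rng.uniform(0.5, 1.5) for _ in range(n_units)]
    f, v = 0.0, [0.0] * n_units
    factor, averages = [], []
    for t in range(burn_in + n_periods):
        f = rho * f + rng.gauss(0.0, 1.0)
        v = [phi * vi + rng.gauss(0.0, 1.0) for vi in v]
        if t >= burn_in:  # keep only draws after the start-up period
            x = [a + g * f + vi for a, g, vi in zip(alpha, gamma, v)]
            factor.append(f)
            averages.append(sum(x) / n_units)
    return factor, averages


def correlation(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5
```

For instance, `correlation(*simulate_panel(n, 500))` for n = 2, 20, 500 should increase towards one. The idiosyncratic part of x̄_t has variance of order 1/N, while the factor part does not shrink.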

Proof. Let $T_N = T(N)$ be any nondecreasing integer-valued function of $N$ such that $\lim_{N\to\infty}T_N = \infty$ and $\lim_{N\to\infty}T_N/N = \varkappa < \infty$, where $\varkappa \ge 0$ need not be nonzero. Define

$$\kappa_{Nit} = \frac{1}{\sqrt{T_N}}\left\{W'\upsilon_{t-p}u_{it} - E(W'\upsilon_{t-p}u_{it})\right\}, \tag{87}$$

where the subscript $N$ is used to emphasize the number of cross-section units.22 Let $\{\mathcal{F}_{Nt}\}$ denote the array of σ-fields that is increasing in $t$ for each $N$ and let $\kappa_{Nit}$ be measurable with respect to $\mathcal{F}_{Nt}$. First it is established that, for any fixed $i \in \mathbb{N}$, the vector array $\{\{\kappa_{Nit}/c_{Nt}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=i}^{\infty}$ is uniformly integrable, where $c_{Nt} = 1/\sqrt{NT_N}$. For $p > 0$, we can write

$$E\left(\frac{\kappa_{Nit}\kappa_{Nit}'}{c_{Nt}^2}\right) = N \cdot E\left[\left(\sum_{\ell=0}^{\infty}W'\Phi^{\ell}u_{t-\ell-p}u_{it}\right)\left(\sum_{\ell=0}^{\infty}W'\Phi^{\ell}u_{t-\ell-p}u_{it}\right)'\right] = N\sigma_{ii}^2\sum_{\ell=0}^{\infty}W'\Phi^{\ell}\Sigma\Phi'^{\ell}W,$$

and hence

$$\left\|E\left(\frac{\kappa_{Nit}\kappa_{Nit}'}{c_{Nt}^2}\right)\right\| \le N\sigma_{ii}^2\|W\|^2\|\Sigma\|\sum_{\ell=0}^{\infty}\|\Phi^{\ell}\|^2 = O(1),$$

where $\|W\|^2 = O(N^{-1})$ by condition (27), $\|\Sigma\| = O(1)$ by Assumptions 2 and 3, and $\sum_{\ell=0}^{\infty}\|\Phi^{\ell}\|^2 = O(1)$ by Assumption 4. For $p = 0$, we have

$$\left\|E\left(\frac{\kappa_{Nit}\kappa_{Nit}'}{c_{Nt}^2}\right)\right\| = N\left\|\mathrm{Var}\left(W'u_tu_{it} + \sum_{\ell=1}^{\infty}W'\Phi^{\ell}u_{t-\ell}u_{it}\right)\right\| \le N\left[\|W\|^2\|\Psi_{ii}\| + \sigma_{ii}^2\|W\|^2\|\Sigma\|\sum_{\ell=1}^{\infty}\|\Phi^{\ell}\|^2 + O(N^{-1})\right] = O(1),$$

where, as before, $\Psi_{ii}$ is an $N \times N$ symmetric matrix with the element $(n,s)$ equal to $E(u_{it}u_{it}u_{nt}u_{st})$. Therefore, for $p \ge 0$, the two-dimensional vector array $\{\kappa_{Nit}/c_{Nt}\}$ is uniformly bounded in the $L_2$-norm. This proves the uniform integrability of $\{\kappa_{Nit}/c_{Nt}\}$. Also, since $E(W'\upsilon_{t-p}u_{it}) = O(N^{-1})$,

$$E|E(\kappa_{Nit} \mid \mathcal{F}_{N,t-n})| = \begin{cases} 0 & \text{for any } n > 0 \text{ and any fixed } p \ge 0,\\ \tau_{m_w}c_{Nt}O(1) & \text{for } n = 0 \text{ and any fixed } p \ge 0, \end{cases} \tag{88}$$

and $\{\{\kappa_{Nit}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=i}^{\infty}$ is an $L_1$-mixingale with respect to the constant array $\{c_{Nt}\}$.23 Note that

$$\lim_{N\to\infty}\sum_{t=1}^{T_N}c_{Nt} = \lim_{N\to\infty}\sum_{t=1}^{T_N}\frac{1}{\sqrt{NT_N}} = \lim_{N\to\infty}\sqrt{\frac{T_N}{N}} = \sqrt{\varkappa} < \infty,$$

and

$$\lim_{N\to\infty}\sum_{t=1}^{T_N}c_{Nt}^2 = \lim_{N\to\infty}\sum_{t=1}^{T_N}\frac{1}{T_NN} = \lim_{N\to\infty}\frac{1}{N} = 0.$$

Therefore, for each fixed $i \in \mathbb{N}$, each of the $m_w$ two-dimensional arrays given by the elements of vector array $\{\{\kappa_{Nit}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=i}^{\infty}$ satisfies the conditions of a mixingale weak law,24 which implies that

$$\sum_{t=1}^{T_N}\kappa_{Nit} = \frac{1}{\sqrt{T_N}}\sum_{t=1}^{T_N}W'\upsilon_{t-p}u_{it} - \sqrt{T_N}E(W'\upsilon_{t-p}u_{it}) \xrightarrow{L_1} 0.$$

But

$$\left\|\sqrt{T_N}E(W'\upsilon_{t-p}u_{it})\right\|_1 \le \sqrt{T_N}\left\|E(W'u_tu_{it})\right\|_1 = \sqrt{T_N}\,O\left(\frac{1}{N}\right) \to 0,$$

since $\lim_{N\to\infty}T_N/N = \varkappa < \infty$. Convergence in the $L_1$-norm implies convergence in probability. This completes the proof of result (86).

Result (85) is established in a very similar fashion. Define the new vector array $q_{Nit} = T_N^{-1/2}\kappa_{Nit}$, where $\kappa_{Nit}$ is the array defined in (87) and $i \in \mathbb{N}$ is fixed. Let $T_N = T(N)$ be any nondecreasing integer-valued function of $N$ such that $\lim_{N\to\infty}T_N = \infty$. Notice that, for any fixed $i \in \mathbb{N}$, the vector array $\{\{\sqrt{T_N}q_{Nit}/c_{Nt}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=i}^{\infty}$ is uniformly integrable because $\{\{\kappa_{Nit}/c_{Nt}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=i}^{\infty}$ is uniformly integrable. Furthermore, $\{\{q_{Nit}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=i}^{\infty}$ is an $L_1$-mixingale with respect to the constant array $\{T_N^{-1/2}c_{Nt}\}$, since $\{\{\kappa_{Nit}, \mathcal{F}_{Nt}\}_{t=-\infty}^{\infty}\}_{N=i}^{\infty}$ is an $L_1$-mixingale with respect to the constant array $\{c_{Nt}\}$. Note that

$$\lim_{N\to\infty}\sum_{t=1}^{T_N}\frac{c_{Nt}}{\sqrt{T_N}} = \lim_{N\to\infty}\sum_{t=1}^{T_N}\frac{1}{T_N\sqrt{N}} = \lim_{N\to\infty}\frac{1}{\sqrt{N}} = 0, \quad \text{and} \quad \lim_{N\to\infty}\sum_{t=1}^{T_N}\frac{c_{Nt}^2}{T_N} = \lim_{N\to\infty}\sum_{t=1}^{T_N}\frac{1}{T_N^2N} = 0.$$

Therefore, for any fixed $i \in \mathbb{N}$, a mixingale weak law25 implies that

$$\sum_{t=1}^{T_N}q_{Nit} \xrightarrow{L_1} 0 \quad \text{as } N \to \infty. \tag{89}$$

It follows that

$$\frac{1}{T}\sum_{t=1}^{T}W'\upsilon_{t-p}u_{it} \xrightarrow{L_1} 0,$$

as $(N,T) \xrightarrow{j} \infty$ at any rate. Convergence in the $L_1$-norm implies convergence in probability. This completes the proof of result (85). ∎

Lemma 6. Let $x_t$ be generated by model (25), let Assumptions 1–8 hold and let $(N,T) \xrightarrow{j} \infty$ such that $T/N \to \varkappa$, with $0 \le \varkappa < \infty$. Then, for any sequence of nonrandom matrices of weights $W$ of growing dimension $N \times m_w$ satisfying conditions (27)–(28), and for any $i \in \mathcal{K}$, we have

22 Note that $W$ and $\upsilon_{t-p}$ change with $N$, but, as before, we omit the subscript $N$ here to keep the notation simple.

23 The last equality in Eq. (88) takes advantage of Liapunov's inequality. $\tau_{m_w}$ is an $m_w \times 1$ vector of ones.

24 See Theorem 19.11 of Davidson (1994).

25 See Theorem 19.11 of Davidson (1994).

(a) under Assumption 9,

$$\frac{1}{\sigma_{ii}\sqrt{T}}\tilde{C}_i^{-1/2}\sum_{t=1}^{T}\tilde{g}_{it}u_{it} \xrightarrow{D} N(0, I_{k_i}), \tag{90}$$

where $\tilde{C}_i = E(\tilde{g}_{it}\tilde{g}_{it}')$ and $\tilde{g}_{it} = (\xi_{i,t-1}', f_t'\Gamma_W, f_{t-1}'\Gamma_W, 1)'$, and

(b) under Assumption 10,

$$\frac{1}{\sigma_{ii}\sqrt{T}}\Omega_{vi}^{-1/2}\sum_{t=1}^{T}v_{i,t-1}u_{it} \xrightarrow{D} N(0, I_{h_i}), \tag{91}$$

where matrix $\Omega_{vi} = E(v_{it}v_{it}')$ and vector $v_{it} = S_i'\sum_{\ell=0}^{\infty}\Phi^{\ell}u_{t-\ell}$.

Proof. Let $a$ be any $k_i \times 1$ vector such that $\|a\| = 1$, and define

$$\kappa_{Nt} = \frac{1}{\sqrt{T_N}\,\sigma_{ii}}a'\tilde{C}_i^{-1/2}\tilde{g}_{it}u_{it},$$

where $T_N = T(N)$ is any nondecreasing integer-valued function of $N$ such that $\lim_{N\to\infty}T_N = \infty$ and $\lim_{N\to\infty}T_N/N = \varkappa$, where $0 \le \varkappa < \infty$. Array $\{\kappa_{Nt}, \mathcal{F}_{Nt}\}$ is a stationary martingale difference array.26 Lemmas 1 and 2 imply that $a'\tilde{C}_i^{-1/2}\tilde{g}_{it}$ is ergodic in variance; in particular,

$$\frac{1}{T_N}\sum_{t=1}^{T_N}a'\tilde{C}_i^{-1/2}\tilde{g}_{it}\tilde{g}_{it}'\tilde{C}_i^{-1/2}a \xrightarrow{p} 1.$$

Furthermore, $\tilde{g}_{it}$ and $u_{it}$ are independent, and the fourth moments of $u_{it}$ are finite. Therefore, $a'\tilde{C}_i^{-1/2}\tilde{g}_{it}u_{it}$ is ergodic in variance, and

$$\sum_{t=1}^{T_N}\kappa_{Nt}^2 \xrightarrow{p} 1. \tag{92}$$

Furthermore, $E(\sigma_{ii}^{-1}a'\tilde{C}_i^{-1/2}\tilde{g}_{it}u_{it})^4 = O(1)$, and therefore

$$\lim_{N\to\infty}\sum_{t=1}^{T_N}E(\kappa_{Nt}^4) = 0.$$

Using Liapunov's theorem (Theorem 23.11 of Davidson (1994)), the Lindeberg condition27 holds, which in turn implies that

$$\max_{1\le t\le T_N}|\kappa_{Nt}| \xrightarrow{p} 0 \quad \text{as } N \to \infty. \tag{93}$$

Results (92), (93) and the martingale difference array central limit theorem (Theorem 24.3 of Davidson (1994)) establish that

$$\sum_{t=1}^{T_N}\kappa_{Nt} = \frac{1}{\sqrt{T_N}\,\sigma_{ii}}\sum_{t=1}^{T_N}a'\tilde{C}_i^{-1/2}\tilde{g}_{it}u_{it} \xrightarrow{D} N(0,1). \tag{94}$$

Since Eq. (94) holds for any $k_i \times 1$ vector $a$ such that $\|a\| = 1$, result (90) directly follows from Eq. (94) and Theorem 25.6 of Davidson (1994). Result (91) can be established in the same way as result (90), but this time we set $\kappa_{Nt} = T_N^{-1/2}\sigma_{ii}^{-1}a'\Omega_{vi}^{-1/2}v_{i,t-1}u_{it}$, where $a$ is any $h_i \times 1$ vector such that $\|a\| = 1$. ∎

Lemma 7. Let $x_t$ be generated by model (25), and suppose that Assumptions 1–8 hold, and that $(N,T) \xrightarrow{j} \infty$. Then, for any arbitrary matrix of weights, $W$, satisfying conditions (27)–(28), for any $p, q \in \{0,1\}$, and for any $i \in \mathcal{K}$,

$$\frac{1}{T}\sum_{t=1}^{T}\upsilon_{W,t-p} = o_p\left(\frac{1}{\sqrt{N}}\right), \tag{95}$$

$$\frac{1}{T}\sum_{t=1}^{T}\upsilon_{W,t-p}f_{t-q}' = o_p\left(\frac{1}{\sqrt{N}}\right), \tag{96}$$

$$\frac{1}{T}\sum_{t=1}^{T}\upsilon_{i,t-p}\upsilon_{W,t-q}' = o_p\left(\frac{1}{\sqrt{N}}\right), \tag{97}$$

$$\frac{1}{T}\sum_{t=1}^{T}\upsilon_{W,t-p}\upsilon_{W,t-q}' = o_p\left(\frac{1}{\sqrt{N}}\right), \tag{98}$$

$$\frac{\Upsilon_i'Q}{T} = o_p(1), \tag{99}$$

$$\frac{H'Q}{T} = A'\frac{Q'Q}{T} + o_p\left(\frac{1}{\sqrt{N}}\right), \tag{100}$$

$$\frac{Z_i'H}{T} = \frac{Z_i'Q}{T}A + o_p\left(\frac{1}{\sqrt{N}}\right), \tag{101}$$

$$\frac{H'H}{T} = A'\frac{Q'Q}{T}A + o_p\left(\frac{1}{\sqrt{N}}\right), \tag{102}$$

$$\frac{H'u_{i\circ}}{T} = A'\frac{Q'u_{i\circ}}{T} + o_p\left(\frac{1}{\sqrt{N}}\right), \tag{103}$$

where

$$\underset{T\times h_i}{\Upsilon_i} = (v_{i0}, v_{i1}, \ldots, v_{i,T-1})', \tag{104}$$

$v_{it} = S_i'\sum_{\ell=0}^{\infty}\Phi^{\ell}u_{t-\ell}$, $H$ and $Z_i$ are defined by (41) and (42), respectively, and $Q$, $F$ and $A$ are defined in (43)–(44).

Proof. Result (95) follows directly from Eq. (61) of Lemma 1, since the spectral norm of any column vector of the matrix $W$ is $O(N^{-1/2})$. Result (96) follows from result (95) by noting that $f_t$ is independently distributed of $\upsilon_{W,t}$, and all elements of the variance matrix of $f_t$ are finite. Furthermore, since (by Lemma 1) $T^{-1}\sum_{t=1}^{T}v_{it} \xrightarrow{p} 0$, Eq. (99) follows. Results (97) and (98) follow directly from Eq. (62) of Lemma 1 by noting that

$$\sqrt{N}E(\upsilon_{i,t-p}\upsilon_{W,t-q}') = O\left(\frac{1}{\sqrt{N}}\right) \tag{105}$$

as well as28

$$\sqrt{N}E(\upsilon_{W,t-p}\upsilon_{W,t-q}') = O\left(\frac{1}{\sqrt{N}}\right). \tag{106}$$

In order to prove Eqs. (100)–(103), first note that row $t$ of the matrix $H - QA$ is $(0, \upsilon_{Wt}', \upsilon_{W,t-1}')$. Using results (95)–(98), we have

$$\frac{(H - QA)'Q}{T} = \frac{1}{T}\sum_{t=1}^{T}\begin{pmatrix}0\\ \upsilon_{Wt}\\ \upsilon_{W,t-1}\end{pmatrix}(1, f_t', f_{t-1}') = o_p\left(\frac{1}{\sqrt{N}}\right), \tag{107}$$

$$\frac{Z_i'(H - QA)}{T} = \frac{1}{T}\sum_{t=1}^{T}\xi_{i,t-1}(0, \upsilon_{Wt}', \upsilon_{W,t-1}') = o_p\left(\frac{1}{\sqrt{N}}\right), \tag{108}$$

26 As before, $\{\mathcal{F}_{Nt}\}$ denotes the array of σ-fields that is increasing in $t$ for each $N$, and $\kappa_{Nt}$ is measurable with respect to $\mathcal{F}_{Nt}$.

27 See Condition 23.17 of Davidson (1994).

28 Results (105) and (106) are straightforward to establish by taking the row norm and by noting that the granularity conditions (27)–(28) imply that $\|W\|_\infty = O(N^{-1})$.

$$\frac{H'(H - QA)}{T} = \frac{1}{T}\sum_{t=1}^{T}\begin{pmatrix}1\\ x_{Wt}\\ x_{W,t-1}\end{pmatrix}(0, \upsilon_{Wt}', \upsilon_{W,t-1}') = o_p\left(\frac{1}{\sqrt{N}}\right). \tag{109}$$

Eqs. (107)–(108) establish results (100) and (101). Note that

$$\frac{H'H}{T} = \frac{H'(H - QA)}{T} + \frac{H'(QA)}{T} = \frac{H'(H - QA)}{T} + \frac{(H - QA)'Q}{T}A + A'\frac{Q'Q}{T}A = A'\frac{Q'Q}{T}A + o_p\left(\frac{1}{\sqrt{N}}\right),$$

where the last equality uses Eqs. (107) and (109). This completes the proof of result (102). Eq. (89) (see proof of Lemma 5) implies that

$$\frac{1}{T}\sum_{t=1}^{T}\left[\upsilon_{W,t-p}u_{it} - E(\upsilon_{W,t-p}u_{it})\right] \xrightarrow{p} 0,$$

as $(N,T) \xrightarrow{j} \infty$ at any rate. Result (103) follows by noting that $\sqrt{N}E(\upsilon_{W,t-p}u_{it}) = O(N^{-1/2})$. This completes the proof. ∎

Lemma 8. Let $x_t$ be generated by model (25), and suppose that Assumptions 1–8 and 10 hold, and that $(N,T) \xrightarrow{j} \infty$. Then, for any $i \in \mathcal{K}$, and for any arbitrary matrix of weights, $W$, satisfying conditions (27)–(28) and Assumption 10, we have

$$\frac{Q'Q}{T} \xrightarrow{p} \Omega_Q, \quad \Omega_Q \text{ is nonsingular}, \tag{110}$$

and

$$\frac{\Upsilon_i'\Upsilon_i}{T} - \Omega_{vi} \xrightarrow{p} 0, \tag{111}$$

where

$$\Omega_Q = \begin{pmatrix}1 & 0 & 0\\ 0 & \Gamma_f(0) & \Gamma_f(1)\\ 0 & \Gamma_f(1)' & \Gamma_f(0)\end{pmatrix},$$

$\Gamma_f(\ell) = E(f_tf_{t-\ell}')$, $\Omega_{vi} = E(v_{it}v_{it}')$, matrix $Q$ is defined in Eq. (43), and matrix $\Upsilon_i = (v_{i0}, v_{i1}, \ldots, v_{i,T-1})'$.

Proof. Assumption 6 implies that matrix $\Omega_Q$ is nonsingular. Result (110) directly follows from the ergodicity properties of the covariance stationary time-series process $f_t$. Consider now asymptotics $(N,T) \xrightarrow{j} \infty$ at any rate. Lemma 1 implies that the $h_i \times 1$ vector $v_{it} = S_i'\upsilon_t$ is ergodic in variance; in particular, $T^{-1}\sum_{t=1}^{T}S_i'\upsilon_t\upsilon_t'S_i - E(S_i'\upsilon_t\upsilon_t'S_i) \xrightarrow{p} 0$.29 This completes the proof. ∎

Lemma 9. Let $x_t$ be generated by model (25), and suppose that Assumptions 1–8 and 10 hold, and that $(N,T) \xrightarrow{j} \infty$. Then, for any $i \in \mathcal{K}$, and for any arbitrary matrix of weights $W$ satisfying conditions (27)–(28) and Assumption 10, we have

$$\frac{Z_i'M_HZ_i}{T} = \frac{Z_i'M_QZ_i}{T} + o_p\left(\frac{1}{\sqrt{N}}\right), \tag{112}$$

$$\frac{Z_i'M_QZ_i}{T} - \Omega_{vi} \xrightarrow{p} 0, \tag{113}$$

$$\frac{Z_i'M_HQ}{T} = o_p\left(\frac{1}{\sqrt{N}}\right), \tag{114}$$

$$\frac{Z_i'M_Hu_{i\circ}}{T} = \frac{\Upsilon_i'M_Qu_{i\circ}}{T} + o_p\left(\frac{1}{\sqrt{N}}\right), \tag{115}$$

where $\Omega_{vi}$ is defined in Assumption 10, $M_H$ and $Z_i$ are defined in (41) and (42), respectively, $Q$ and $F$ are defined by (43), and $\Upsilon_i = (v_{i0}, v_{i1}, \ldots, v_{i,T-1})'$.

Proof. We have

$$\frac{Z_i'M_HZ_i}{T} = \frac{Z_i'Z_i}{T} - \frac{Z_i'H}{T}\left(\frac{H'H}{T}\right)^{+}\frac{H'Z_i}{T}. \tag{116}$$

Results (101)–(102) of Lemma 7 imply that

$$\frac{Z_i'H}{T}\left(\frac{H'H}{T}\right)^{+}\frac{H'Z_i}{T} = \frac{Z_i'Q}{T}A\left[A'\frac{Q'Q}{T}A\right]^{+}A'\frac{Q'Z_i}{T} + o_p\left(\frac{1}{\sqrt{N}}\right). \tag{117}$$

Using the definition of the Moore–Penrose inverse, it follows that

$$A'\frac{Q'Q}{T}A\left[A'\frac{Q'Q}{T}A\right]^{+}A'\frac{Q'Q}{T}A = A'\frac{Q'Q}{T}A. \tag{118}$$

Multiply Eq. (118) by $\left(\frac{Q'Q}{T}\right)^{-1}(AA')^{-1}A$ from the left and by $A'(AA')^{-1}\left(\frac{Q'Q}{T}\right)^{-1}$ from the right to obtain30

$$A\left[A'\frac{Q'Q}{T}A\right]^{+}A' = \left(\frac{Q'Q}{T}\right)^{-1}. \tag{119}$$

Eqs. (119) and (117) imply that

$$\frac{Z_i'H}{T}\left(\frac{H'H}{T}\right)^{+}\frac{H'Z_i}{T} = \frac{Z_i'Q}{T}\left(\frac{Q'Q}{T}\right)^{-1}\frac{Q'Z_i}{T} + o_p\left(\frac{1}{\sqrt{N}}\right). \tag{120}$$

Result (112) follows from Eqs. (120) and (116). Using (25), we have

$$Z_i = \tau\alpha'S_i + F(-1)\Gamma'S_i + \Upsilon_i. \tag{121}$$

Since $Q = [\tau, F, F(-1)]$, it follows that

$$\frac{Z_i'M_QZ_i}{T} = \frac{\Upsilon_i'M_Q\Upsilon_i}{T} = \frac{\Upsilon_i'\Upsilon_i}{T} - \frac{\Upsilon_i'Q}{T}\left(\frac{Q'Q}{T}\right)^{-1}\frac{Q'\Upsilon_i}{T}. \tag{122}$$

Using Eqs. (99), (110) and (111), result (113) follows directly from (122). Results (100)–(102) of Lemma 7 imply that

$$\frac{Z_i'H}{T}\left(\frac{H'H}{T}\right)^{+}\frac{H'Q}{T} = \frac{Z_i'Q}{T}A\left[A'\frac{Q'Q}{T}A\right]^{+}A'\frac{Q'Q}{T} + o_p\left(\frac{1}{\sqrt{N}}\right). \tag{123}$$

Substituting Eq. (119), it follows that

$$\frac{Z_i'H}{T}\left(\frac{H'H}{T}\right)^{+}\frac{H'Q}{T} = \frac{Z_i'Q}{T}\left(\frac{Q'Q}{T}\right)^{-1}\frac{Q'Q}{T} + o_p\left(\frac{1}{\sqrt{N}}\right). \tag{124}$$

29 $\|S_i\|_1 = O(1)$ by Assumption 1.

30 Note that $\mathrm{plim}_{T\to\infty}T^{-1}Q'Q$ is nonsingular by Lemma 8, result (110). $AA'$ is nonsingular, since matrix $A$ has full row rank by Assumption 10.
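Identity (119) can also be checked directly, without passing through (118), by writing the Moore–Penrose inverse down explicitly. The derivation below is a supplementary sketch and not part of the original proof. It abbreviates $M = Q'Q/T$ and assumes, as in footnote 30, that $M$ is nonsingular and that $A$ has full row rank, so that $AA'$ is invertible. It uses only the four Penrose conditions.

```latex
\begin{aligned}
&B \equiv A' M A, \qquad G \equiv A'(AA')^{-1} M^{-1} (AA')^{-1} A, \qquad P \equiv A'(AA')^{-1} A = P' = P^{2},\\
&BG = A' M A A'(AA')^{-1} M^{-1} (AA')^{-1} A = A'(AA')^{-1} A = P,\\
&GB = A'(AA')^{-1} M^{-1} (AA')^{-1} A A' M A = A'(AA')^{-1} A = P,\\
&BGB = P A' M A = A' M A = B \quad (\text{since } PA' = A'), \qquad GBG = P G = G,\\
&\Longrightarrow\; G = B^{+}, \qquad A B^{+} A' = AA'(AA')^{-1} M^{-1} (AA')^{-1} AA' = M^{-1} = \Big(\frac{Q'Q}{T}\Big)^{-1}.
\end{aligned}
```

Because $BG$ and $GB$ both equal the symmetric projection $P$, all four Penrose conditions hold, and $G$ is the unique Moore–Penrose inverse of $B$.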

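Eq. (124) implies that

$$\frac{Z_i'M_HQ}{T} = \frac{Z_i'M_QQ}{T} + o_p\left(\frac{1}{\sqrt{N}}\right) = o_p\left(\frac{1}{\sqrt{N}}\right).$$

This completes the proof of result (114). Results (101)–(103) of Lemma 7 imply that

$$\frac{Z_i'H}{T}\left(\frac{H'H}{T}\right)^{+}\frac{H'u_{i\circ}}{T} = \frac{Z_i'Q}{T}A\left[A'\frac{Q'Q}{T}A\right]^{+}A'\frac{Q'u_{i\circ}}{T} + o_p\left(\frac{1}{\sqrt{N}}\right).$$

Substituting Eq. (119), it follows that

$$\frac{Z_i'H}{T}\left(\frac{H'H}{T}\right)^{+}\frac{H'u_{i\circ}}{T} = \frac{Z_i'Q}{T}\left(\frac{Q'Q}{T}\right)^{-1}\frac{Q'u_{i\circ}}{T} + o_p\left(\frac{1}{\sqrt{N}}\right). \tag{125}$$

Noting that $M_Q(\tau\alpha'S_i + F(-1)\Gamma'S_i) = 0$ since $Q = [\tau, F, F(-1)]$, Eqs. (125) and (121) imply that

$$\frac{Z_i'M_Hu_{i\circ}}{T} = \frac{Z_i'M_Qu_{i\circ}}{T} + o_p\left(\frac{1}{\sqrt{N}}\right) = \frac{\Upsilon_i'M_Qu_{i\circ}}{T} + o_p\left(\frac{1}{\sqrt{N}}\right).$$

This completes the proof. ∎

Lemma 10. Let $x_t$ be generated by model (25), and suppose that Assumptions 1–8 and 10 hold, and that $(N,T) \xrightarrow{j} \infty$. Then, for any $i \in \mathcal{K}$, and for any arbitrary matrix of weights, $W$, satisfying conditions (27)–(28) and Assumption 10, we have

$$\frac{Z_i'M_H\zeta_i(-1)}{T} = o_p\left(\frac{1}{\sqrt{N}}\right), \tag{126}$$

$$\frac{Z_i'M_Hu_{i\circ}}{\sqrt{T}} = \frac{\Upsilon_i'u_{i\circ}}{\sqrt{T}} + o_p\left(\sqrt{\frac{T}{N}}\right) + o_p(1), \tag{127}$$

where matrices $M_H$ and $Z_i$ are defined in (41) and (42), respectively, $\Upsilon_i = (v_{i0}, v_{i1}, \ldots, v_{i,T-1})'$, and vector $\zeta_i(-1) = (\zeta_{i,0}, \ldots, \zeta_{i,T-1})'$.

Proof. We have

$$\frac{Z_i'\zeta_i(-1)}{T} = \frac{1}{T}\sum_{t=1}^{T}\xi_{i,t-1}\,\phi_{ib}'\sum_{\ell=0}^{\infty}\Phi^{\ell}u_{t-\ell-1}, \qquad \frac{H'\zeta_i(-1)}{T} = \frac{1}{T}\sum_{t=1}^{T}\begin{pmatrix}1\\ x_{Wt}\\ x_{W,t-1}\end{pmatrix}\phi_{ib}'\sum_{\ell=0}^{\infty}\Phi^{\ell}u_{t-\ell-1}.$$

$\|\phi_{ib}\|_\infty = O(N^{-1})$ by Assumption 1; therefore, result (126) directly follows from Eqs. (108) and (109). Furthermore,

$$\frac{\Upsilon_i'M_Qu_{i\circ}}{\sqrt{T}} = \frac{\Upsilon_i'u_{i\circ}}{\sqrt{T}} - \frac{\Upsilon_i'Q}{T}\left(\frac{Q'Q}{T}\right)^{-1}\frac{Q'u_{i\circ}}{\sqrt{T}} = \frac{\Upsilon_i'u_{i\circ}}{\sqrt{T}} + o_p(1), \tag{128}$$

where $T^{-1/2}Q'u_{i\circ} = O_p(1)$, $\mathrm{plim}_{T\to\infty}Q'Q/T$ is nonsingular by Lemma 8, and $\Upsilon_i'Q/T = o_p(1)$ by Lemma 7, Eq. (99). Substituting (128) into Eq. (115), multiplied by $\sqrt{T}$, implies result (127). This completes the proof. ∎

Proof of Theorem 1. (a) Substituting for $x_{it}$ in Eq. (39) yields

$$\hat{\pi}_i - \pi_i = \left(\frac{1}{T}\sum_{t=1}^{T}g_{it}g_{it}'\right)^{-1}\left(\frac{1}{T}\sum_{t=1}^{T}g_{it}q_{it} + \frac{1}{T}\sum_{t=1}^{T}g_{it}u_{it}\right). \tag{129}$$

With $(N,T) \xrightarrow{j} \infty$ in any order, Lemma 5 yields31

$$\frac{1}{T}\sum_{t=1}^{T}g_{it}u_{it} \xrightarrow{p} 0. \tag{130}$$

Also, using Lemmas 3 and 4, we have

$$\frac{1}{T}\sum_{t=1}^{T}g_{it}q_{it} \xrightarrow{p} 0, \tag{131}$$

and

$$\frac{1}{T}\sum_{t=1}^{T}g_{it}g_{it}' - C_{(N),i} \xrightarrow{p} 0, \tag{132}$$

respectively. Assumption 9 postulates that the matrix $C_{(N),i}$ is invertible and that $\|C_{(N),i}^{-1}\|$ is bounded in $N$. It follows from Eq. (132) that

$$\left(\frac{1}{T}\sum_{t=1}^{T}g_{it}g_{it}'\right)^{-1} - C_{(N),i}^{-1} \xrightarrow{p} 0. \tag{133}$$

Result $\hat{\pi}_i - \pi_i \xrightarrow{p} 0$ directly follows from Eqs. (130), (131) and (133).

(b) Multiplying Eq. (129) by $\sqrt{T}$ yields

$$\sqrt{T}(\hat{\pi}_i - \pi_i) = \left(\frac{1}{T}\sum_{t=1}^{T}g_{it}g_{it}'\right)^{-1}\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}g_{it}q_{it} + \frac{1}{\sqrt{T}}\sum_{t=1}^{T}g_{it}u_{it}\right). \tag{134}$$

With $(N,T) \xrightarrow{j} \infty$ such that $T/N \to \varkappa < \infty$, Lemma 3 can be used to show that

$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}g_{it}q_{it} \xrightarrow{p} 0. \tag{135}$$

Since $\|C_{(N),i}^{-1}\| = O(1)$, Eqs. (133) and (135) now yield

$$\left(\frac{1}{T}\sum_{t=1}^{T}g_{it}g_{it}'\right)^{-1}\frac{1}{\sqrt{T}}\sum_{t=1}^{T}g_{it}q_{it} \xrightarrow{p} 0. \tag{136}$$

Lemma 5 establishes that

$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\upsilon_{W,t-p}u_{it} \xrightarrow{p} 0 \quad \text{for } p \in \{0, 1\}. \tag{137}$$

It follows from Eq. (137) that

$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}(g_{it} - \tilde{g}_{it})u_{it} \xrightarrow{p} 0, \tag{138}$$

31 $T^{-1}\sum_{t=1}^{T}x_{j,t-1}u_{it} \xrightarrow{p} 0$ since $x_{jt}$ is ergodic in mean by Lemma 2 and $u_{it}$ is independent of $x_{j,t-1}$ for any $N \in \mathbb{N}$ and any $j \in \{1, \ldots, N\}$. Furthermore, using similar arguments, $T^{-1}\sum_{t=1}^{T}f_tu_{it} \xrightarrow{p} 0$.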

where $\tilde{g}_{it} = (\xi_{i,t-1}', f_t'\Gamma_W, f_{t-1}'\Gamma_W, 1)'$. Lemma 6 establishes that

$$\frac{1}{\sigma_{(N),ii}\sqrt{T}}\tilde{C}_{(N),i}^{-1/2}\sum_{t=1}^{T}\tilde{g}_{it}u_{it} \xrightarrow{D} N(0, I_{k_i}). \tag{139}$$

Eqs. (133), (136), (138) and (139) imply result (45).

(c) Lemma 4 establishes that $T^{-1}\sum_{t=1}^{T}g_{it}g_{it}' - C_{(N),i} \xrightarrow{p} 0$. The estimated residuals from auxiliary regression (38) are equal to $\hat{u}_{it} = u_{it} - g_{it}'(\hat{\pi}_i - \pi_i)$, which implies that

$$\frac{1}{T}\sum_{t=1}^{T}\hat{u}_{it}^2 = \frac{1}{T}\sum_{t=1}^{T}u_{it}^2 - 2(\hat{\pi}_i - \pi_i)'\frac{1}{T}\sum_{t=1}^{T}g_{it}u_{it} + (\hat{\pi}_i - \pi_i)'\left(\frac{1}{T}\sum_{t=1}^{T}g_{it}g_{it}'\right)(\hat{\pi}_i - \pi_i), \tag{140}$$

where $T^{-1}\sum_{t=1}^{T}u_{it}^2 - \sigma_{(N),ii}^2 \xrightarrow{p} 0$, $\hat{\pi}_i - \pi_i \xrightarrow{p} 0$ is established in part (a) of this proof, $T^{-1}\sum_{t=1}^{T}g_{it}g_{it}' - C_{(N),i} \xrightarrow{p} 0$ is established in Lemma 4, and $T^{-1}\sum_{t=1}^{T}g_{it}u_{it} \xrightarrow{p} 0$ is established in Eq. (130). This completes the proof. ∎
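To make the estimator behind Theorem 1 concrete, the sketch below runs a cross-section augmented least-squares regression. It regresses x_it on an intercept, x_{i,t−1} and the simple averages x̄_t and x̄_{t−1}. It also computes the naive OLS estimate without the averages and the residual variance (1/T)Σû²_it of part (c).

Several choices here are illustrative assumptions and do not come from the paper:

- weights W = τ/N;
- the unit's own lag standing in for ξ_{i,t−1} (no neighbours);
- a one-factor design x_it = γ_i f_t + v_it with v_it = φv_{i,t−1} + u_it and an AR(1) factor;
- all parameter values and helper names.

This is a special case for intuition, not the paper's Monte Carlo design.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients and the mean squared residual (1/T) sum of u_hat^2."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yt - sum(b * ri for b, ri in zip(beta, r)) for r, yt in zip(rows, y)]
    return beta, sum(e * e for e in resid) / len(y)


def simulate(n_units, n_periods, phi=0.5, rho=0.9, seed=0, burn_in=200):
    """Hypothetical design: x_it = gamma_i f_t + v_it, v_it = phi v_{i,t-1} + u_it."""
    rng = random.Random(seed)
    gamma = [rng.uniform(0.5, 1.5) for _ in range(n_units)]
    f, v, panel = 0.0, [0.0] * n_units, []
    for t in range(burn_in + n_periods):
        f = rho * f + rng.gauss(0.0, 1.0)  # persistent common factor
        v = [phi * vi + rng.gauss(0.0, 1.0) for vi in v]
        if t >= burn_in:
            panel.append([g * f + vi for g, vi in zip(gamma, v)])
    return panel


def cals_own_lag(panel, i=0):
    """CALS: regress x_it on (1, x_{i,t-1}, xbar_t, xbar_{t-1}); also naive OLS."""
    xbar = [sum(row) / len(row) for row in panel]
    y = [panel[t][i] for t in range(1, len(panel))]
    g = [[1.0, panel[t - 1][i], xbar[t], xbar[t - 1]] for t in range(1, len(panel))]
    naive = [[1.0, panel[t - 1][i]] for t in range(1, len(panel))]
    (b_cals, s2_cals), (b_naive, _) = ols(g, y), ols(naive, y)
    return b_cals[1], b_naive[1], s2_cals
```

Under these assumed settings, `cals_own_lag(simulate(200, 1000))` should return an own-lag estimate near φ = 0.5 and a residual variance near one. The naive estimate should be pulled towards the persistence of the omitted factor.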



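Proof of Theorem 2. Vector $x_{i\circ}$ can be written, using system (25), as

$$x_{i\circ} = \tau(\alpha_i - \delta_i'S_i'\alpha) + Z_i\delta_i + F\gamma_i - F(-1)\Gamma'S_i\delta_i + \zeta_i(-1) + u_{i\circ}, \tag{141}$$

where $\zeta_i(-1) = (\zeta_{i0}, \ldots, \zeta_{i,T-1})'$. Substituting Eq. (141) into the partitioned least-squares formula (40) and noting that, by Lemma 9,

$$\frac{Z_i'M_HQ}{\sqrt{T}} = o_p\left(\sqrt{\frac{T}{N}}\right), \tag{142}$$

it follows that

$$\sqrt{T}(\hat{\delta}_i - \delta_i) = \left(\frac{Z_i'M_HZ_i}{T}\right)^{-1}\frac{Z_i'M_H[u_{i\circ} + \zeta_i(-1)]}{\sqrt{T}} + o_p\left(\sqrt{\frac{T}{N}}\right). \tag{143}$$

Lemma 9 also establishes that

$$\frac{Z_i'M_HZ_i}{T} - \Omega_{vi} \xrightarrow{p} 0, \quad \text{as } (N,T) \xrightarrow{j} \infty \text{ at any rate}, \tag{144}$$

where $\Omega_{vi} = E(v_{it}v_{it}')$ is nonsingular by Assumption 10. Consider now asymptotics $(N,T) \xrightarrow{j} \infty$ such that $T/N \to \varkappa < \infty$. Lemma 10 establishes that

$$\frac{Z_i'M_H\zeta_i(-1)}{\sqrt{T}} \xrightarrow{p} 0, \tag{145}$$

and

$$\frac{Z_i'M_Hu_{i\circ}}{\sqrt{T}} = \frac{\Upsilon_i'u_{i\circ}}{\sqrt{T}} + o_p\left(\sqrt{\frac{T}{N}}\right) + o_p(1), \tag{146}$$

where $\Upsilon_i = (v_{i0}, \ldots, v_{i,T-1})'$. Also, from Lemma 6,

$$\frac{1}{\sigma_{ii}\sqrt{T}}\Omega_{vi}^{-1/2}\sum_{t=1}^{T}v_{i,t-1}u_{it} \xrightarrow{D} N(0, I_{h_i}). \tag{147}$$

The desired result (48) now follows from (143)–(147). ∎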
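The step from (141) to (143) can be spelled out as follows. This supplementary sketch makes two assumptions that are not restated in this appendix:

- the partitioned least-squares formula (40) has the standard Frisch–Waugh–Lovell form $\hat\delta_i = (Z_i'M_HZ_i)^{-1}Z_i'M_Hx_{i\circ}$;
- the coefficient vector $b_i$ defined below is bounded in $N$.

Collecting the deterministic and factor terms of (141) into $Qb_i$ uses only $Q = [\tau, F, F(-1)]$.

```latex
\begin{aligned}
x_{i\circ} &= Z_i\delta_i + Q b_i + \zeta_i(-1) + u_{i\circ}, \qquad
b_i = \big(\alpha_i - \delta_i' S_i'\alpha,\; \gamma_i',\; -\delta_i' S_i'\Gamma\big)',\\
\sqrt{T}\,(\hat\delta_i - \delta_i) &= \Big(\frac{Z_i' M_H Z_i}{T}\Big)^{-1}
\Big[\frac{Z_i' M_H Q}{\sqrt{T}}\, b_i + \frac{Z_i' M_H [u_{i\circ} + \zeta_i(-1)]}{\sqrt{T}}\Big],\\
\frac{Z_i' M_H Q}{\sqrt{T}}\, b_i &= o_p\big(\sqrt{T/N}\big) \ \text{ by (142)}, \qquad
\Big(\frac{Z_i' M_H Z_i}{T}\Big)^{-1} = O_p(1) \ \text{ by (144)},
\end{aligned}
```

Together these two facts turn the first bracketed term into the $o_p(\sqrt{T/N})$ remainder in (143).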

References

Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer Academic Publishers, Dordrecht, The Netherlands.

Bai, J., Ng, S., 2007. Determining the number of primitive shocks in factor models. Journal of Business and Economic Statistics 25, 52–60.

Bernanke, B.S., Boivin, J., Eliasz, P., 2005. Measuring the effects of monetary policy: a factor-augmented vector autoregressive (FAVAR) approach. Quarterly Journal of Economics 120, 387–422.

Canova, F., 1995. Vector autoregressive models: specification, estimation, inference, and forecasting. In: Pesaran, M., Wickens, M. (Eds.), Handbook of Applied Econometrics: Macroeconomics. Basil Blackwell, Oxford (Chapter 2).

Chudik, A., 2008. Global Macroeconomic Modelling. Ph.D. Thesis. Trinity College, University of Cambridge.

Chudik, A., Pesaran, M.H., Tosetti, E., 2010. Weak and strong cross section dependence and estimation of large panels. ECB Working Paper No. 1100, October 2009, revised April 2010.

Cliff, A., Ord, J.K., 1973. Spatial Autocorrelation. Pion, London.

Conley, T.G., 1999. GMM estimation with cross sectional dependence. Journal of Econometrics 92, 1–45.

Conley, T.G., Topa, G., 2002. Socio-economic distance and spatial patterns in unemployment. Journal of Applied Econometrics 17, 303–327.

Davidson, J., 1994. Stochastic Limit Theory. Oxford University Press.

De Mol, C., Giannone, D., Reichlin, L., 2008. Forecasting using a large number of predictors: is Bayesian shrinkage a valid alternative to principal components? Journal of Econometrics 146, 318–328.

Del Negro, M., Schorfheide, F., 2004. Priors from general equilibrium models for VARs. International Economic Review 45, 643–673.

Dées, S., di Mauro, F., Pesaran, M.H., Smith, L.V., 2007. Exploring the international linkages of the Euro area: a global VAR analysis. Journal of Applied Econometrics 22, 1–38.

Doan, T., Litterman, R., Sims, C., 1984. Forecasting and conditional projections using realistic prior distributions. Econometric Reviews 3, 1–100.

Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2000. The generalized dynamic factor model: identification and estimation. Review of Economics and Statistics 82, 540–554.

Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2004. The generalized dynamic factor model: consistency and rates. Journal of Econometrics 119, 231–255.

Forni, M., Lippi, M., 2001. The generalized dynamic factor model: representation theory. Econometric Theory 17, 1113–1141.

Garratt, A., Lee, K., Pesaran, M.H., Shin, Y., 2006. Global and National Macroeconometric Modelling: A Long Run Structural Approach. Oxford University Press.

Geweke, J., 1977. The dynamic factor analysis of economic time series. In: Aigner, D., Goldberger, A. (Eds.), Latent Variables in Socio-Economic Models. North-Holland, Amsterdam.

Giacomini, R., White, H., 2006. Tests of conditional predictive ability. Econometrica 74, 1545–1578.

Giannone, D., Reichlin, L., Sala, L., 2005. Monetary policy in real time. In: Gertler, M., Rogoff, K. (Eds.), NBER Macroeconomics Annual 2004, Vol. 19. MIT Press, pp. 161–200.

Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical Learning. In: Springer Series in Statistics.

Holly, S., Pesaran, M.H., Yamagata, T., 2010. A spatio-temporal model of house prices in the US. Journal of Econometrics 158, 160–173.

Huberman, G., 1982. A simple approach to the arbitrage pricing theory. Journal of Economic Theory 28, 183–191.

Ingersoll, J., 1984. Some results in the theory of arbitrage pricing. Journal of Finance 39, 1021–1039.

Kelejian, H.H., Robinson, D.P., 1995. Spatial correlation: a suggested alternative to the autoregressive model. In: Anselin, L., Florax, R. (Eds.), New Directions in Spatial Econometrics. Springer-Verlag, Berlin, pp. 75–95.

Leeper, E.M., Sims, C.A., Zha, T., 1996. What does monetary policy do? Brookings Papers on Economic Activity 2, 1–63.

Litterman, R., 1986. Forecasting with Bayesian vector autoregressions: five years of experience. Journal of Business and Economic Statistics 4, 25–38.

Pesaran, M.H., 2006. Estimation and inference in large heterogeneous panels with multifactor error structure. Econometrica 74, 967–1012.

Pesaran, M.H., Chudik, A., 2010. Econometric Analysis of High Dimensional VARs Featuring a Dominant Unit. ECB Working Paper No. 1194, May 2010.

Pesaran, M.H., Schuermann, T., Treutler, B.J., 2007. Global business cycles and credit risk. In: Carey, M., Stulz, R. (Eds.), The Risks of Financial Institutions. University of Chicago Press (Chapter 9).

Pesaran, M.H., Schuermann, T., Treutler, B.J., Weiner, S.M., 2006. Macroeconomic dynamics and credit risk: a global perspective. Journal of Money, Credit and Banking 38, 1211–1262.

Pesaran, M.H., Schuermann, T., Weiner, S.M., 2004. Modelling regional interdependencies using a global error-correcting macroeconometric model. Journal of Business and Economic Statistics 22, 129–162.

Pesaran, M.H., Smith, R., 2006. Macroeconometric modelling with a global perspective. The Manchester School (Supplement), 24–49.

Pesaran, M.H., Smith, L.V., Smith, R.P., 2007. What if the UK or Sweden had joined the Euro in 1999? An empirical evaluation using a global VAR. International Journal of Finance and Economics 12, 55–87.

Pesaran, M.H., Tosetti, E., 2010. Large Panels with Common Factors and Spatial Correlation. CESifo Working Paper No. 2103, September 2007, revised May 2010.

Ross, S., 1976. The arbitrage theory of capital asset pricing. Journal of Economic Theory 13, 341–360.

Sargent, T.J., Sims, C.A., 1977. Business cycle modeling without pretending to have too much a-priori economic theory. In: Sims, C. (Ed.), New Methods in Business Cycle Research. Federal Reserve Bank of Minneapolis, Minneapolis.

Stock, J.H., Watson, M.W., 1999. Forecasting inflation. Journal of Monetary Economics 44, 293–335.

Stock, J.H., Watson, M.W., 2002. Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics 20, 147–162.

Stock, J.H., Watson, M.W., 2005. Implications of dynamic factor models for VAR analysis. NBER Working Paper No. 11467.

Whittle, P., 1954. On stationary processes on the plane. Biometrika 41, 434–449.