Accepted Manuscript

On the role of the rank condition in CCE estimation of factor-augmented panel regressions

Hande Karabiyik, Simon Reese, Joakim Westerlund

PII: S0304-4076(16)30200-7
DOI: http://dx.doi.org/10.1016/j.jeconom.2016.10.006
Reference: ECONOM 4320
To appear in: Journal of Econometrics
Received date: 29 November 2015
Revised date: 17 September 2016
Accepted date: 27 October 2016

Please cite this article as: Karabiyik, H., Reese, S., Westerlund, J., On the role of the rank condition in CCE estimation of factor-augmented panel regressions. Journal of Econometrics (2016), http://dx.doi.org/10.1016/j.jeconom.2016.10.006

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ON THE ROLE OF THE RANK CONDITION IN CCE ESTIMATION OF FACTOR-AUGMENTED PANEL REGRESSIONS∗

Hande Karabiyik, VU University Amsterdam
Simon Reese, Lund University
Joakim Westerlund†, Lund University and Centre for Financial Econometrics, Deakin University

October 28, 2016

Abstract

A popular approach to factor-augmented panel regressions is the common correlated effects (CCE) estimator of Pesaran (Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74, 967–1012, 2006). This paper points to a problem with the CCE approach that appears in the empirically relevant case when the number of factors is strictly less than the number of observables used in their estimation. Specifically, the use of too many observables causes the second moment matrix of the estimated factors to become asymptotically singular, an issue that has not yet been appropriately accounted for. The purpose of the present paper is to fill this gap in the literature.
JEL Classification: C12; C13; C33; C36. Keywords: Factor-augmented panel regression; CCE estimation; Moore–Penrose inverse.
∗ Previous versions of the paper were presented at a seminar at Maastricht University, at the 2016 UvA-Econometrics Panel Data Workshop, at the 9th International Conference on Computational and Financial Econometrics, and at the IAAE 2016 Annual Conference. The authors would like to thank seminar and workshop participants, and in particular Jianqing Fan (Co-Editor), Arturas Juodis, Maurice Bun, Joerg Breitung, Mehdi Hosseinkouchack, one Associate Editor, and two anonymous referees for many valuable comments and suggestions. Thank you also to the Knut and Alice Wallenberg Foundation for financial support through a Wallenberg Academy Fellowship, and the Jan Wallander and Tom Hedelius Foundation for financial support under research grant number P2014–0112:1.
† Corresponding author: Department of Economics, Lund University, Box 7082, 220 07 Lund, Sweden. Telephone: +46 46 222 8997. Fax: +46 46 222 4613. E-mail address: [email protected].

1 Introduction

Consider the scalar $y_{i,t}$ and the $k \times 1$ vector $x_{i,t}$, where $i = 1, \ldots, N$ and $t = 1, \ldots, T$ index the cross-sectional and time series dimensions, respectively. Except for some simplifications that are irrelevant for the purpose of the paper, such as the absence of deterministic terms, the model is the same as in Pesaran (2006), and is given by

$$y_i = X_i \beta + e_i, \tag{1}$$

$$e_i = F \gamma_i + \varepsilon_i, \tag{2}$$

$$X_i = F \Gamma_i + V_i, \tag{3}$$
where $y_i = (y_{i,1}, \ldots, y_{i,T})'$ and $X_i = (x_{i,1}, \ldots, x_{i,T})'$ are $T \times 1$ and $T \times k$, respectively, $\beta$ is a $k \times 1$ vector of coefficients, $F = (f_1, \ldots, f_T)'$ is a $T \times m$ matrix of unobserved common factors with $\gamma_i$ and $\Gamma_i$ being the associated $m \times 1$ and $m \times k$ factor loadings, and $\varepsilon_i = (\varepsilon_{i,1}, \ldots, \varepsilon_{i,T})'$ and $V_i = (v_{i,1}, \ldots, v_{i,T})'$ are $T \times 1$ and $T \times k$ matrices, respectively, of idiosyncratic errors. Except for the requirements that $\varepsilon_{i,t}$ and $v_{i,t}$ are serially uncorrelated and homoskedastic, and that the slope coefficients are the same for all $i = 1, \ldots, N$, Assumption 1 is the same as Assumptions 1–4 in Pesaran (2006).

Assumption 1.
(i) $\varepsilon_{i,t}$ is independently and identically distributed (iid) across both $i$ and $t$ with $E(\varepsilon_{i,t}) = 0$, $E(\varepsilon_{i,t}^2) = \sigma^2$ and $E(\varepsilon_{i,t}^4) < \infty$;
(ii) $v_{i,t}$ is iid across both $i$ and $t$ with $E(v_{i,t}) = 0_{k \times 1}$, $E(v_{i,t} v_{i,t}') = \Sigma$ positive definite and $E(\|v_{i,t}\|^4) < \infty$, where $\|A\| = \sqrt{\mathrm{tr}(A'A)}$ is the Frobenius norm of the matrix $A$;
(iii) $f_t$ is covariance stationary such that $E(\|f_t\|^4) < \infty$ and $E(f_t f_t') = \Sigma_f$ is positive definite;
(iv) $\gamma_i$ and $\Gamma_i$ are iid across $i$, independent of $\varepsilon_{j,t}$, $v_{j,t}$ and $f_t$ for all $i$ and $j$, have fixed means $\gamma$ and $\Gamma$, respectively, and finite variances;
(v) $\varepsilon_{i,t}$, $v_{j,s}$ and $f_\ell$ are mutually independent for all $i$, $j$, $t$, $s$ and $\ell$.

Because of the way that $F$ enters both (2) and (3), the estimation of $\beta$ is nontrivial. However, by combining (1)–(3), we have

$$Z_i = F C_i + U_i, \tag{4}$$
where $Z_i = (y_i, X_i) = (z_{i,1}, \ldots, z_{i,T})'$ is $T \times (k+1)$, $z_{i,t} = (y_{i,t}, x_{i,t}')'$ is $(k+1) \times 1$, $C_i = (\Gamma_i \beta + \gamma_i, \Gamma_i)$ is $m \times (k+1)$, and $U_i = (u_{i,1}, \ldots, u_{i,T})' = (V_i \beta + \varepsilon_i, V_i)$ is $T \times (k+1)$. Thus, (1)–(3) can be rewritten equivalently as a static factor model for $Z_i$, which means that $F$ can be estimated using existing approaches for such models. In the common correlated effects (CCE) approach of Pesaran (2006), the estimator of $F$ is particularly simple, and is given by

$$\hat F = \bar Z = F \bar C + \bar U, \tag{5}$$

where $\bar A = N^{-1} \sum_{i=1}^N A_i$ for any $A_i$. It is important to note here that under Assumption 1, $\bar U \overset{p}{\to} 0_{T \times (k+1)}$ as $N \to \infty$, where $\overset{p}{\to}$ signifies convergence in probability. This means that $\hat F$ is consistent for the space spanned by $F$. The pooled CCE estimator of $\beta$ is the conventional pooled OLS estimator with $\hat F$ in place of $F$:

$$\hat b_P = \left( \sum_{i=1}^N X_i' M_{\hat F} X_i \right)^{-1} \sum_{i=1}^N X_i' M_{\hat F} y_i, \tag{6}$$

where $M_A = I_T - P_A = I_T - A (A'A)^+ A'$ for any $T$-rowed matrix $A$ with $(A'A)^+$ being the Moore–Penrose (MP) inverse of $A'A$. Because of its simplicity and generality, the CCE approach has attracted considerable attention, so much so that there is by now a separate CCE branch of the literature. This literature makes extensive use of the asymptotic distribution of $\sqrt{NT}(\hat b_P - \beta)$, which has been shown to be normal under a wide variety of circumstances (see, for example, Chudik et al., 2011; Kapetanios et al., 2010; Pesaran et al., 2013; Westerlund and Urbain, 2015).
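As a rough illustration of (5) and (6), the following Python sketch computes the cross-section averages, forms $M_{\hat F}$ with the MP inverse, and evaluates the pooled estimator on simulated data. The Gaussian data-generating process, the function names (`annihilator`, `pooled_cce`, `simulate_panel`) and all parameter values are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def annihilator(A):
    """Return M_A = I_T - A (A'A)^+ A', with (A'A)^+ the Moore-Penrose inverse."""
    T = A.shape[0]
    return np.eye(T) - A @ np.linalg.pinv(A.T @ A) @ A.T


def pooled_cce(y, X):
    """Pooled CCE estimate of beta as in (6).

    y has shape (N, T) and X has shape (N, T, k).
    """
    Z = np.concatenate([y[:, :, None], X], axis=2)  # Z_i = (y_i, X_i)
    F_hat = Z.mean(axis=0)                          # cross-section averages, T x (k+1)
    M = annihilator(F_hat)
    XMX = np.einsum("itk,ts,isl->kl", X, M, X)      # sum_i X_i' M X_i
    XMy = np.einsum("itk,ts,is->k", X, M, y)        # sum_i X_i' M y_i
    return np.linalg.solve(XMX, XMy)


def simulate_panel(N, T, beta, m, rng):
    """Hypothetical Gaussian design for (1)-(3); an assumption of this sketch."""
    k = beta.shape[0]
    F = rng.standard_normal((T, m))                 # common factors
    gamma = 1.0 + rng.standard_normal((N, m))       # loadings in e_i
    Gamma = 1.0 + rng.standard_normal((N, m, k))    # loadings in X_i
    V = rng.standard_normal((N, T, k))
    eps = rng.standard_normal((N, T))
    X = np.einsum("tm,imk->itk", F, Gamma) + V      # X_i = F Gamma_i + V_i
    y = X @ beta + gamma @ F.T + eps                # y_i = X_i beta + F gamma_i + eps_i
    return y, X


rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0])
y, X = simulate_panel(N=200, T=50, beta=beta, m=1, rng=rng)
b_hat = pooled_cce(y, X)
print(b_hat)
```

The design uses $m = 1 < k + 1 = 3$, which is the case the paper is concerned with; the estimator itself is still computable because the MP inverse always exists.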
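The current paper is about the way in which the asymptotic distribution of $\sqrt{NT}(\hat b_P - \beta)$ has been established. A critical first step in all existing proofs is to show that the effect of estimating $F$ is negligible, such that $F$ can be treated as known in the rest of the proof. This is done by showing that $(NT)^{-1} \sum_{i=1}^N X_i'(M_{F\bar C} - M_{\hat F})X_i$, $(NT)^{-1/2} \sum_{i=1}^N X_i'(M_{F\bar C} - M_{\hat F})\varepsilon_i$ and $(NT)^{-1/2} \sum_{i=1}^N X_i'(M_{F\bar C} - M_{\hat F})F\gamma_i$ are negligible, such that

$$\begin{aligned} \sqrt{NT}(\hat b_P - \beta) &= \left( \frac{1}{NT} \sum_{i=1}^N X_i' M_{\hat F} X_i \right)^{-1} \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{\hat F} (F\gamma_i + \varepsilon_i) \\ &= \left( \frac{1}{NT} \sum_{i=1}^N X_i' M_{F\bar C} X_i \right)^{-1} \frac{1}{\sqrt{NT}} \sum_{i=1}^N X_i' M_{F\bar C} \varepsilon_i + o_p(1). \end{aligned} \tag{7}$$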
Let us therefore consider $M_{F\bar C} - M_{\hat F}$, which can be expanded in the following way:

$$\begin{aligned} M_{F\bar C} - M_{\hat F} &= \bar U(\hat F'\hat F)^+ \bar U' + \bar U(\hat F'\hat F)^+ \bar C'F' + F\bar C(\hat F'\hat F)^+ \bar U' \\ &\quad + F\bar C[(\hat F'\hat F)^+ - (\bar C'F'F\bar C)^+]\bar C'F'. \end{aligned} \tag{8}$$

Pesaran (2006) shows that $T^{-1}\hat F'\hat F - T^{-1}\bar C'F'F\bar C \overset{p}{\to} 0_{(k+1)\times(k+1)}$ as $N, T \to \infty$, which is taken to imply that the same result holds for the difference of the inverses. This reasoning has been
used in numerous other studies (see, for example, Chudik et al., 2011; Kapetanios et al., 2010; Pesaran et al., 2013; Westerlund and Urbain, 2015). However, this is not always true. To see this, suppose that $A_n - A_0 \overset{p}{\to} 0_{r \times r}$ as $n \to \infty$ for a real $r \times r$ matrix $A_0$ and matrix sequence $A_n$. Let $r_n = \mathrm{rank}(A_n)$ and $r_0 = \mathrm{rank}(A_0)$. Then, according to Andrews (1987), $A_n^+ - A_0^+ \overset{p}{\to} 0_{r \times r}$ if and only if

$$r_n - r_0 \overset{a.s.}{\to} 0 \tag{9}$$

as $n \to \infty$, where $\overset{a.s.}{\to}$ signifies almost sure convergence. Therefore, only if $\mathrm{rank}(T^{-1}\hat F'\hat F) - \mathrm{rank}(T^{-1}\bar C'F'F\bar C) \overset{a.s.}{\to} 0$ as $N, T \to \infty$ does $T^{-1}\hat F'\hat F - T^{-1}\bar C'F'F\bar C \overset{p}{\to} 0_{(k+1)\times(k+1)}$ imply $(T^{-1}\hat F'\hat F)^+ - (T^{-1}\bar C'F'F\bar C)^+ \overset{p}{\to} 0_{(k+1)\times(k+1)}$. While this fact has not gone unnoticed in the CCE literature, it has been thought that provided that

$$\mathrm{rank}(\bar C) = m \le k + 1, \tag{10}$$

the rank of $T^{-1}\hat F'\hat F$ should converge to the rank of $T^{-1}\bar C'F'F\bar C$, and that this should in turn ensure that $(T^{-1}\hat F'\hat F)^+ - (T^{-1}\bar C'F'F\bar C)^+ \overset{p}{\to} 0_{(k+1)\times(k+1)}$ (see Chudik et al., 2011; Kapetanios et al., 2010; Pesaran, 2006; Pesaran et al., 2013). However, the rank is a discrete function and for $m < k + 1$ the probability that a rank change occurs before $T^{-1}\hat F'\hat F$ reaches its asymptotic limit is zero. That is, $P[\mathrm{rank}(T^{-1}\hat F'\hat F) = \mathrm{rank}(T^{-1}\bar C'F'F\bar C)] \to 0$. The only way to ensure that $\mathrm{rank}(T^{-1}\hat F'\hat F) - \mathrm{rank}(T^{-1}\bar C'F'F\bar C) \overset{a.s.}{\to} 0$ is therefore to assume that $m = k + 1$.
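A two-dimensional example makes the discontinuity of the MP inverse concrete. The example is ours and is not taken from the paper; it only uses the definitions of $A_0$, $A_n$, $r_0$ and $r_n$ given above.

```latex
A_0 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
A_n = \begin{bmatrix} 1 & 0 \\ 0 & n^{-1} \end{bmatrix}, \qquad
\|A_n - A_0\| = n^{-1} \to 0, \qquad r_n = 2 > r_0 = 1 \text{ for all } n,

A_0^+ = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
A_n^+ = \begin{bmatrix} 1 & 0 \\ 0 & n \end{bmatrix}, \qquad
\|A_n^+ - A_0^+\| = n \to \infty .
```

Although $A_n$ converges to $A_0$, the rank never settles at $r_0$, so (9) fails and the MP inverses drift apart at exactly the rate at which the perturbation vanishes.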
The purpose of the present paper is in part to make the above discussion of the consequences of improper use of the MP inverse a little more precise, in part to propose a solution. This requires new tools. Section 2 therefore provides some general theory for perturbed matrices. This is a natural starting point, because $T^{-1}\hat F'\hat F - T^{-1}\bar C'F'F\bar C$ may be regarded as a negligible perturbation of $T^{-1}\bar C'F'F\bar C$. In Section 3 the new theory is applied to the problem at hand. The results show that unless (10) is satisfied with equality, $(T^{-1}\hat F'\hat F)^+$ does not converge to $(T^{-1}\bar C'F'F\bar C)^+$, but that $(T^{-1}\hat F'\hat F)^+ - (T^{-1}\bar C'F'F\bar C)^+$ actually diverges. The method described in the last paragraph is therefore not suitable for evaluating $M_{F\bar C} - M_{\hat F}$, and hence also not suitable for studying the asymptotic distribution of $\sqrt{NT}(\hat b_P - \beta)$. An alternative method is therefore needed. This is the subject of Section 4. Section 5 concludes. All proofs are provided in the supplemental material, which also contains some useful illustrations.
2 Some matrix perturbation theory

Consider again the $r \times r$ matrices $A_0$ and $A_n$. The following assumption is made:

$$A_n = A_0 + E_n,$$

where $E_n$ is a perturbation of $A_0$. Most of the existing matrix perturbation theory is based on the assumption that $E_n$ and $A_0$ are non-random, that $n$ is fixed, and that $\|A_0^+ E_n\| < 1$, or even $\|A_0^+\| \|E_n\| < 1$ (see, for example, Stewart, 1977; Wedin, 1973). The more general conditions that we will be working under are given in Assumption 2.

Assumption 2.
(i) $E(\|A_0\|) < \infty$;
(ii) $\|E_n\| \overset{p}{\to} 0$ as $n \to \infty$.

Replacing $A_n$, $A_0$ and $E_n$ with $T^{-1}\hat F'\hat F$, $T^{-1}\bar C'F'F\bar C$ and $T^{-1}\hat F'\hat F - T^{-1}\bar C'F'F\bar C$, respectively, it is clear that Assumption 2 provides a natural starting point for our analysis. The questions are: What are the conditions under which $A_n^+ - A_0^+ \overset{p}{\to} 0_{r \times r}$, and what are the consequences if those conditions are not met? We begin by noting that since $\|A_n - A_0\| = \|E_n\| = o_p(1)$, we have that $P(r_n \ge r_0) \to 1$ as $n \to \infty$ (see, for example, Andrews, 1987, Note 1).

Theorem 1. Suppose that Assumption 2 is met, and that $r_n - r_0 \overset{a.s.}{\to} 0$ as $n \to \infty$. Then,

$$\|A_n^+ - A_0^+\| = O_p(\|E_n\|) = o_p(1).$$
Theorem 1 is similar to Theorem 2 of Andrews (1987). Unfortunately, the results of Andrews (1987) cannot be used to study the consequences of a violation of $r_n - r_0 \overset{a.s.}{\to} 0$. Theorem 2 provides the missing piece.

Theorem 2. Suppose that Assumption 2 is met, and that $P(r_n > r_0) \to 1$ as $n \to \infty$. Then,

$$\|A_n^+ - A_0^+\| = O_p(\|E_n\|^{-1}).$$

Theorem 2 implies that if $A_n$ is “near” $A_0$, but $r_n > r_0$, then its MP inverse can be much larger than, and completely different from, $A_0^+$, and the smaller $E_n$ is, the worse the problem can be. Hence, if we want $A_n^+$ to be well-behaved in the sense that $\|A_n^+ - A_0^+\| = o_p(1)$, a necessary and sufficient condition is given by (9), that is, $r_n - r_0 \overset{a.s.}{\to} 0$. Unfortunately, this condition is not easily verified.
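The two rates in Theorems 1 and 2 can be checked numerically on a small deterministic example. The sketch below is an illustration under our own assumptions: the matrices, the tolerance and the helper names `mp_inverse` and `frob` are hypothetical and do not come from the paper. It perturbs a rank-2 matrix once inside its own column space (rank preserved) and once by a multiple of the identity (rank increased).

```python
import numpy as np


def mp_inverse(A, tol=1e-10):
    """Moore-Penrose inverse of a symmetric matrix via its eigendecomposition.

    Eigenvalues below tol times the largest absolute eigenvalue are set to zero.
    """
    w, V = np.linalg.eigh(A)
    keep = np.abs(w) > tol * np.abs(w).max()
    return (V[:, keep] / w[keep]) @ V[:, keep].T


def frob(A):
    """Frobenius norm, the norm used in Assumption 1."""
    return np.linalg.norm(A)


rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))    # random orthogonal basis
A0 = Q @ np.diag([2.0, 1.0, 0.0]) @ Q.T             # r_0 = 2

for eps in (1e-1, 1e-2, 1e-3):
    E_same = eps * Q @ np.diag([1.0, 1.0, 0.0]) @ Q.T   # keeps the rank at 2
    E_up = eps * np.eye(3)                              # raises the rank to 3
    gap_same = frob(mp_inverse(A0 + E_same) - mp_inverse(A0))
    gap_up = frob(mp_inverse(A0 + E_up) - mp_inverse(A0))
    # Theorem 1: gap / ||E|| stays bounded; Theorem 2: gap * ||E|| stays bounded.
    print(eps, gap_same / frob(E_same), gap_up * frob(E_up))
```

In the rank-preserving case the gap shrinks in proportion to the perturbation, whereas in the rank-increasing case it grows like the inverse of the perturbation, in line with the two theorems.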
Chudik et al. (2011), Kapetanios et al. (2010), and Pesaran et al. (2013) all recognize the importance of (9). They claim that if $E_n \overset{p}{\to} 0_{r \times r}$ and $\mathrm{rank}(A_0) = r_0$, then $r_n - r_0 \overset{a.s.}{\to} 0$, which is not correct, since $E_n \overset{p}{\to} 0_{r \times r}$ only implies $P(r_n \ge r_0) \to 1$, and not $P(r_n = r_0) \to 1$ (see Andrews, 1987, Note 1). A key result in this regard is that if $A_n$ is a continuous random matrix of full rank, then

$$r_n - r \overset{a.s.}{\to} 0 \tag{11}$$

as $n \to \infty$ (see, for example, Feng and Zhang, 2007). This means that for (9) to hold, we require

$$r_n = r_0 = r \tag{12}$$

for all $n$. An important consequence of Theorem 1 is that under (9),

$$\|A_n^+\| = O_p(\|A_0^+\|) = O_p(1). \tag{13}$$

If, on the other hand, (9) is not met, then, by Theorem 2,

$$\|A_n^+\| = O_p(\|E_n\|^{-1}). \tag{14}$$

3 Implications for CCE
We now apply the theory developed in Section 2 to the CCE problem described in Section 1. We begin by noting that $E(\|T^{-1}\bar C'F'F\bar C\|) < \infty$, and by (36) in Pesaran (2006), we also have

$$\|T^{-1}\hat F'\hat F - T^{-1}\bar C'F'F\bar C\| = O_p(N^{-1}) + O_p((NT)^{-1/2}),$$

showing that Assumption 2 is applicable with $A_n$, $A_0$ and $E_n$ replaced by $T^{-1}\hat F'\hat F$, $T^{-1}\bar C'F'F\bar C$ and $(T^{-1}\hat F'\hat F - T^{-1}\bar C'F'F\bar C)$, respectively. As for the rank of $T^{-1}\hat F'\hat F$, in analogy to (11),

$$\mathrm{rank}(T^{-1}\hat F'\hat F) \overset{a.s.}{\to} k + 1 \tag{15}$$

as $N, T \to \infty$. Moreover, by Assumption 1 (iii), $\mathrm{rank}(T^{-1}F'F) = m$ for all $T$, including $T \to \infty$. The rank of $T^{-1}\bar C'F'F\bar C$ is therefore determined by the rank of $\bar C$. Hence, in view of (10),

$$\mathrm{rank}(T^{-1}\bar C'F'F\bar C) = m, \tag{16}$$

for all $N$ and $T$, including $T, N \to \infty$.
Suppose now that $m = k + 1$. Since in this case $\mathrm{rank}(T^{-1}\hat F'\hat F) \overset{a.s.}{\to} \mathrm{rank}(T^{-1}\bar C'F'F\bar C)$, by Theorem 1, we have the following:

$$\|(T^{-1}\hat F'\hat F)^+ - (T^{-1}\bar C'F'F\bar C)^+\| = O_p(N^{-1}) + O_p((NT)^{-1/2}), \tag{17}$$

$$\|(T^{-1}\hat F'\hat F)^+\| = O_p(1). \tag{18}$$

If, however, $m < k + 1$, then (15) and (16) imply that the rank of $T^{-1}\hat F'\hat F$ will not converge to that of $T^{-1}\bar C'F'F\bar C$, and so, by Theorem 2,

$$\|(T^{-1}\hat F'\hat F)^+ - (T^{-1}\bar C'F'F\bar C)^+\| = O_p(N) + O_p(\sqrt{NT}). \tag{19}$$

Further use of (14) shows that $\|(T^{-1}\hat F'\hat F)^+\|$ is of the same order.
As an illustration of the implications of the above findings, let us consider $T^{-1}X_i'(M_{F\bar C} - M_{\hat F})X_i$. From Lemma 3 of Pesaran (2006), we have $\|T^{-1}X_i'\bar U\| = O_p(N^{-1}) + O_p((NT)^{-1/2})$. In view of this, $\|\bar C\| = O_p(1)$, $\|T^{-1}F'X_i\| = O_p(1)$ and (8), we have

$$\begin{aligned} \|T^{-1}X_i'(M_{F\bar C} - M_{\hat F})X_i\| &\le \|T^{-1}X_i'\bar U\| \|(T^{-1}\hat F'\hat F)^+\| \|T^{-1}\bar U'X_i\| + 2\|T^{-1}X_i'\bar U\| \|(T^{-1}\hat F'\hat F)^+\| \|\bar C'\| \|T^{-1}F'X_i\| \\ &\quad + \|T^{-1}X_i'F\| \|\bar C\|^2 \|(T^{-1}\hat F'\hat F)^+ - (T^{-1}\bar C'F'F\bar C)^+\| \|T^{-1}F'X_i\| \\ &= [O_p(N^{-1}) + O_p((NT)^{-1/2})]\, O_p(\|(T^{-1}\hat F'\hat F)^+\|) \\ &\quad + O_p(\|(T^{-1}\hat F'\hat F)^+ - (T^{-1}\bar C'F'F\bar C)^+\|), \end{aligned} \tag{20}$$

which is $O_p(N^{-1}) + O_p((NT)^{-1/2})$ if $m = k + 1$ and $O_p(N) + O_p(\sqrt{NT})$ if $m < k + 1$, where the latter contradicts the results reported by Chudik et al. (2011), Kapetanios et al. (2010), Pesaran et al. (2013), and Westerlund and Urbain (2015). Similar results are obtained when the same steps are applied to $\|(NT)^{-1/2} \sum_{i=1}^N X_i'(M_{F\bar C} - M_{\hat F})\varepsilon_i\|$ and $\|(NT)^{-1/2} \sum_{i=1}^N X_i'(M_{F\bar C} - M_{\hat F})F\gamma_i\|$. This means that if $m < k + 1$ the predominating method of proof fails to show that the effect of the estimation of $F$ is negligible.
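The contrast between (18) and the divergence under $m < k + 1$ can be seen in a small simulation. The sketch below is only illustrative: it draws $\bar U$ from iid standard normal errors and uses loadings with mean `np.eye(m, k + 1)` so that (10) holds, which simplifies the structure of $C_i$ and $U_i$ in (4). The function names and all parameter values are our assumptions.

```python
import numpy as np


def cce_factor_moment(N, T, m, k, rng):
    """Simulate hat F = F bar C + bar U and return T^{-1} hat F' hat F.

    Hypothetical design: Gaussian factors and errors, loadings with mean
    np.eye(m, k + 1) so that rank(bar C) = m, as required by (10).
    """
    F = rng.standard_normal((T, m))
    C = np.eye(m, k + 1) + 0.5 * rng.standard_normal((N, m, k + 1))
    U = rng.standard_normal((N, T, k + 1))
    F_hat = F @ C.mean(axis=0) + U.mean(axis=0)
    return F_hat.T @ F_hat / T


def pinv_norm(N, T, m, k, seed=0):
    """Frobenius norm of the MP inverse of T^{-1} hat F' hat F."""
    rng = np.random.default_rng(seed)
    S = cce_factor_moment(N, T, m, k, rng)
    return np.linalg.norm(np.linalg.pinv(S))


T, k = 100, 2
for N in (50, 500, 5000):
    deficient = pinv_norm(N, T, m=1, k=k)   # m < k + 1
    full = pinv_norm(N, T, m=3, k=k)        # m = k + 1
    print(N, round(deficient, 1), round(full, 2))
```

With $m = k + 1$ the norm stays of order one as $N$ grows, while with $m < k + 1$ it grows roughly in proportion to $N$ for fixed $T$ in this design.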
4 An alternative method of proof

The above results imply that many of the statements in the literature are actually yet to be proven. In what follows, we therefore propose a new method of proof that is appropriate in general, provided that (10) is met. The idea is the same as when analyzing sample second moment matrices where the elements are of different orders of magnitude (see, for example, Chang and Phillips, 1995), that is, we normalize to ensure convergence to a positive definite matrix. Let us therefore assume without loss of generality that $\bar C = [\bar C_m, \bar C_{-m}]$, where $\bar C_m$ is an $m \times m$ full rank matrix and $\bar C_{-m}$ is $m \times (k + 1 - m)$. By similarly partitioning $\bar U = [\bar U_m, \bar U_{-m}]$, we obtain

$$\hat F = [F\bar C_m, F\bar C_{-m}] + [\bar U_m, \bar U_{-m}]. \tag{21}$$

Define

$$B = [B_m, B_{-m}] = \begin{bmatrix} \bar C_m^{-1} & -\bar C_m^{-1}\bar C_{-m} \\ 0_{(k+1-m)\times m} & I_{k+1-m} \end{bmatrix},$$

which is of full rank under (10). Post-multiplying (21) by $B$ yields

$$\hat F B = F\bar C B + \bar U B = [F, 0_{T \times (k+1-m)}] + [\bar U_m \bar C_m^{-1}, \bar U_{-m} - \bar U_m \bar C_m^{-1}\bar C_{-m}].$$
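The post-multiplication step can be verified directly from the definition of $B$. The short derivation below is added for convenience and uses only quantities already defined in the text.

```latex
\bar C B
= [\bar C_m,\ \bar C_{-m}]
  \begin{bmatrix} \bar C_m^{-1} & -\bar C_m^{-1}\bar C_{-m} \\ 0_{(k+1-m)\times m} & I_{k+1-m} \end{bmatrix}
= [\bar C_m \bar C_m^{-1},\ -\bar C_m \bar C_m^{-1}\bar C_{-m} + \bar C_{-m}]
= [I_m,\ 0_{m\times(k+1-m)}],

\bar U B
= [\bar U_m,\ \bar U_{-m}]
  \begin{bmatrix} \bar C_m^{-1} & -\bar C_m^{-1}\bar C_{-m} \\ 0_{(k+1-m)\times m} & I_{k+1-m} \end{bmatrix}
= [\bar U_m \bar C_m^{-1},\ \bar U_{-m} - \bar U_m \bar C_m^{-1}\bar C_{-m}].
```

Hence $F\bar C B = [F, 0_{T\times(k+1-m)}]$: the rotation isolates the factors in the first $m$ columns and leaves only averaged idiosyncratic noise in the remaining $k + 1 - m$ columns.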
We now look for a conformable normalization matrix $D_N$ such that $T^{-1}D_N'B'\hat F'\hat F B D_N$ converges to a positive definite matrix. Here we make use of Lemma 2 of Pesaran (2006), which states that $\|T^{-1}\bar U'\bar U\| = O_p(N^{-1})$. Hence, while the upper left $m \times m$ block of $T^{-1}B'\hat F'\hat F B$ converges to $\Sigma_f$, the lower right $(k + 1 - m) \times (k + 1 - m)$ block is $O_p(N^{-1})$. This means that the required normalization matrix is given by $D_N = \mathrm{diag}(I_m, \sqrt{N} I_{k+1-m})$. Hence, letting $F^0 = [F, 0_{T \times (k+1-m)}]$ and $\bar U^0 = \bar U B D_N = [\bar U^0_m, \bar U^0_{-m}] = [\bar U_m \bar C_m^{-1}, \sqrt{N}(\bar U_{-m} - \bar U_m \bar C_m^{-1}\bar C_{-m})]$, the resulting normalized version of $\hat F B$ is given by

$$\hat F^0 = \hat F B D_N = F^0 + \bar U^0. \tag{22}$$

One of the key insights behind the approach used here is that since $B D_N$ is nonsingular, $\hat F$ and $\hat F^0$ span the same column space, so that $M_{\hat F} = M_{\hat F^0}$. Consider $[T^{-1}(\hat F^0)'\hat F^0]^+$. From the definition of $\hat F^0$,

$$T^{-1}(\hat F^0)'\hat F^0 = T^{-1}(F^0)'F^0 + T^{-1}(F^0)'\bar U^0 + T^{-1}(\bar U^0)'F^0 + T^{-1}(\bar U^0)'\bar U^0, \tag{23}$$

where

$$T^{-1}(F^0)'F^0 = \begin{bmatrix} T^{-1}F'F & 0_{m \times (k+1-m)} \\ 0_{(k+1-m) \times m} & 0_{(k+1-m) \times (k+1-m)} \end{bmatrix}. \tag{24}$$
By (A.11) in Lemma 2 of Pesaran (2006), $\|T^{-1}F'\bar U_m\|$ and $\|T^{-1}F'\bar U_{-m}\|$ are both $O_p((NT)^{-1/2})$. This implies

$$\|T^{-1}(F^0)'\bar U^0\| = \left\| \begin{bmatrix} T^{-1}F'\bar U_m \bar C_m^{-1} & \sqrt{N}\, T^{-1}(F'\bar U_{-m} - F'\bar U_m \bar C_m^{-1}\bar C_{-m}) \\ 0_{(k+1-m) \times m} & 0_{(k+1-m) \times (k+1-m)} \end{bmatrix} \right\| = O_p(T^{-1/2}). \tag{25}$$

Also, by using the decomposition of $\bar U^0$ into $\bar U^0_m$ and $\bar U^0_{-m}$,

$$T^{-1}(\bar U^0)'\bar U^0 = T^{-1} \begin{bmatrix} (\bar U^0_m)'\bar U^0_m & (\bar U^0_m)'\bar U^0_{-m} \\ (\bar U^0_{-m})'\bar U^0_m & (\bar U^0_{-m})'\bar U^0_{-m} \end{bmatrix}.$$

By (A.10) in Lemma 2 of Pesaran (2006), we have $\|T^{-1}\bar U_m'\bar U_m\| = O_p(N^{-1})$, and by further use of Assumption 1 (iv), $\|\bar C_m\|$ and $\|\bar C_m^{-1}\|$ are $O_p(1)$. It follows that

$$\|T^{-1}(\bar U^0_m)'\bar U^0_m\| \le N^{-1} \|\bar C_m^{-1}\|^2 \|N T^{-1}\bar U_m'\bar U_m\| = O_p(N^{-1}), \tag{26}$$
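which in turn implies

$$\|T^{-1}(\bar U^0_m)'\bar U^0_{-m}\| \le N^{-1/2} \|\bar C_m^{-1}\| \|N T^{-1}\bar U_m'\bar U_{-m}\| + N^{-1/2} \|\bar C_m^{-1}\|^2 \|N T^{-1}\bar U_m'\bar U_m\| \|\bar C_{-m}\| = O_p(N^{-1/2}). \tag{27}$$

It remains to evaluate $T^{-1}(\bar U^0_{-m})'\bar U^0_{-m}$. Use of $\bar U^0_{-m} = \sqrt{N}\, \bar U[-\bar C_{-m}'(\bar C_m^{-1})', I_{k+1-m}]' = \sqrt{N}\, \bar U B_{-m}$ leads to

$$T^{-1}(\bar U^0_{-m})'\bar U^0_{-m} = B_{-m}' N T^{-1}\bar U'\bar U B_{-m}. \tag{28}$$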
Let $\Sigma_u = E(u_{i,t} u_{i,t}')$. A straightforward calculation reveals that

$$N T^{-1}\bar U'\bar U = \frac{1}{NT} \sum_{i=1}^N \sum_{j=1}^N \sum_{t=1}^T u_{i,t} u_{j,t}' = \Sigma_u + O_p(T^{-1/2}), \tag{29}$$

which in turn implies

$$T^{-1}(\bar U^0_{-m})'\bar U^0_{-m} = \Sigma_{u^0_{-m}} + O_p(T^{-1/2}) \tag{30}$$

where $\Sigma_{u^0_{-m}} = B_{-m}'\Sigma_u B_{-m}$, a $(k + 1 - m) \times (k + 1 - m)$ matrix. It is important to note that this matrix is positive definite, which implies that $T^{-1}(\bar U^0_{-m})'\bar U^0_{-m}$ is positive definite, too. Hence, $\mathrm{rank}[T^{-1}(\bar U^0_{-m})'\bar U^0_{-m}] = k + 1 - m$. By substituting (24)–(28) into (23), we obtain the following:

$$T^{-1}(\hat F^0)'\hat F^0 = \Sigma_{f^0} + O_p(N^{-1/2}) + O_p(T^{-1/2}), \tag{31}$$

where

$$\Sigma_{f^0} = \begin{bmatrix} T^{-1}F'F & 0_{m \times (k+1-m)} \\ 0_{(k+1-m) \times m} & T^{-1}(\bar U^0_{-m})'\bar U^0_{-m} \end{bmatrix}.$$
Consider the rank of this matrix. Note how $T^{-1}F'F = \Sigma_f + O_p(T^{-1/2})$ and from (30) we also know that $T^{-1}(\bar U^0_{-m})'\bar U^0_{-m} = \Sigma_{u^0_{-m}} + O_p(T^{-1/2})$. We have already shown that $\mathrm{rank}(T^{-1}F'F) \overset{a.s.}{\to} m$ as $T \to \infty$, which, together with the positive definiteness of $T^{-1}(\bar U^0_{-m})'\bar U^0_{-m}$, in turn implies $\mathrm{rank}(\Sigma_{f^0}) \overset{a.s.}{\to} k + 1$. But we also have $\mathrm{rank}[T^{-1}(\hat F^0)'\hat F^0] \overset{a.s.}{\to} k + 1$ as $N, T \to \infty$, and so we obtain

$$\mathrm{rank}[T^{-1}(\hat F^0)'\hat F^0] \overset{a.s.}{\to} \mathrm{rank}(\Sigma_{f^0}). \tag{32}$$

By using this result, (31) and Theorem 1, we obtain the following key result:

$$[T^{-1}(\hat F^0)'\hat F^0]^+ = \Sigma_{f^0}^+ + O_p(N^{-1/2}) + O_p(T^{-1/2}). \tag{33}$$

An important implication of (13) and (33) is that

$$\|[T^{-1}(\hat F^0)'\hat F^0]^+\| = O_p(1). \tag{34}$$

Hence, in contrast to the divergence result obtained for $\|(T^{-1}\hat F'\hat F)^+ - (T^{-1}\bar C'F'F\bar C)^+\|$ and $\|(T^{-1}\hat F'\hat F)^+\|$, $\|[T^{-1}(\hat F^0)'\hat F^0]^+ - \Sigma_{f^0}^+\|$ and $\|[T^{-1}(\hat F^0)'\hat F^0]^+\|$ do behave “nicely”. From this point on the analysis is similar to the analyses of Chudik et al. (2011), Kapetanios et al. (2010), Pesaran et al. (2013), and Westerlund and Urbain (2015). The key difference is that here we make use of the fact that $M_{\hat F} = M_{\hat F^0}$, which allows us to proceed in roughly the same way as when using $\hat F$ and assuming that $m = k + 1$. We therefore put the details in the supplemental material, and just provide here the final result.
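The two facts that drive this argument, $M_{\hat F} = M_{\hat F^0}$ and the boundedness in (34), can be checked numerically. The following sketch is illustrative only: the Gaussian design, the function names and the parameter values are our assumptions, and $\bar C$ is available here only because the data are simulated, since $B$ and $D_N$ are proof devices rather than feasible transformations.

```python
import numpy as np


def annihilator(A):
    """M_A = I_T - A (A'A)^+ A' with the Moore-Penrose inverse."""
    return np.eye(A.shape[0]) - A @ np.linalg.pinv(A.T @ A) @ A.T


def normalized_factors(N, T, m, k, seed=0):
    """Return (hat F, hat F^0 = hat F B D_N) for a hypothetical Gaussian design."""
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((T, m))
    C_bar = (np.eye(m, k + 1) + 0.5 * rng.standard_normal((N, m, k + 1))).mean(axis=0)
    U_bar = rng.standard_normal((N, T, k + 1)).mean(axis=0)
    F_hat = F @ C_bar + U_bar
    Cm_inv = np.linalg.inv(C_bar[:, :m])
    B = np.block([
        [Cm_inv, -Cm_inv @ C_bar[:, m:]],
        [np.zeros((k + 1 - m, m)), np.eye(k + 1 - m)],
    ])
    D_N = np.diag(np.r_[np.ones(m), np.sqrt(N) * np.ones(k + 1 - m)])
    return F_hat, F_hat @ B @ D_N


T = 100
for N in (50, 500, 5000):
    F_hat, F0_hat = normalized_factors(N, T=T, m=1, k=2)
    same_M = np.allclose(annihilator(F_hat), annihilator(F0_hat), atol=1e-8)
    raw = np.linalg.norm(np.linalg.pinv(F_hat.T @ F_hat / T))
    normalized = np.linalg.norm(np.linalg.pinv(F0_hat.T @ F0_hat / T))
    print(N, same_M, round(raw, 1), round(normalized, 2))
```

The projection is unchanged by the rotation and rescaling, while the norm of the MP inverse stays of order one after normalization and grows with $N$ without it.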
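Theorem 3. Under Assumption 1 and condition (10), as $N, T \to \infty$ with $T/N \to \tau < \infty$,

$$\sqrt{NT}(\hat b_P - \beta) \overset{d}{\to} N(0_{k \times 1}, \sigma^2 \Sigma^{-1}) + \sqrt{\tau}\, \Sigma^{-1}(b - d),$$

where $b = b_2 - b_1 - b_3$ and $d = d_2 - d_1$ with

$$\begin{aligned} d_1 &= \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N (\Sigma[\beta, I_k] - \Gamma_i'(\bar C')^+ \Sigma_u) P_{-m} \Sigma_u \bar C^+ \gamma_i, \\ d_2 &= \lim_{N \to \infty} (\Sigma[\beta, I_k] - \bar\Gamma'(\bar C')^+ \Sigma_u) P_{-m} \sigma^2 [1, 0_{1 \times k}]', \\ b_1 &= \lim_{N \to \infty} \Sigma[\beta, I_k] \bar C^+ \bar\gamma, \\ b_2 &= \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N \Gamma_i'(\bar C')^+ \Sigma_u \bar C^+ \gamma_i, \\ b_3 &= \lim_{N \to \infty} \bar\Gamma'(\bar C')^+ \sigma^2 [1, 0_{1 \times k}]', \end{aligned}$$

and $P_{-m} = B_{-m}(B_{-m}'\Sigma_u B_{-m})^+ B_{-m}'$.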
As far as we are aware, Theorem 3 is the first to establish the asymptotic distribution of $\sqrt{NT}(\hat b_P - \beta)$ while properly accounting for the problematic $m < k + 1$ case. Interestingly, while far from obvious from the results of Section 3, the degeneracy that occurs when $m < k + 1$ does not interfere with asymptotic normality. In this sense, Theorem 3 is consistent with the previous literature (see, for example, Pesaran, 2006; Chudik et al., 2011; Kapetanios et al., 2010; Pesaran et al., 2013). We also see that the estimator is biased, which is in agreement with the results of Westerlund and Urbain (2015, Theorem 1). However, the bias expression given in Theorem 3 is not the same as the one reported in that paper. The difference is $d$. This term depends on $P_{-m}$, capturing the effect of the degenerate regressors, and is only present under $m < k + 1$. In fact, it is not difficult to show that if $m = k + 1$, such that (10) is satisfied with equality, then

$$\sqrt{NT}(\hat b_P - \beta) \overset{d}{\to} N(0_{k \times 1}, \sigma^2 \Sigma^{-1}) + \sqrt{\tau}\, \Sigma^{-1} b. \tag{35}$$

If, in addition, $T/N \to 0$, then

$$\sqrt{NT}(\hat b_P - \beta) \overset{d}{\to} N(0_{k \times 1}, \sigma^2 \Sigma^{-1}),$$

which under Assumption 1 is the same as in Theorem 4 of Pesaran (2006). Hence, while important under $T/N \to \tau < \infty$, failure to account for the degenerate regressors does not have an effect under the more restrictive assumption that $T/N \to 0$.
5 Conclusion
The CCE approach of Pesaran (2006) has attracted considerable interest in the literature on factor-augmented panel regressions. In the present paper we point to a problem with the CCE approach that seems to have gone largely unnoticed in this literature. The problem occurs in the empirically relevant case when m < k + 1. Specifically, the use of too many observables causes the second moment matrix of the estimated factors to become asymptotically singular, which in turn invalidates some of the arguments commonly used to establish asymptotic theory. Hence, the bulk of existing theories is actually yet to be proven. A new method of proof is therefore proposed that is shown to alleviate the singularity problem, leading to a straightforward asymptotic analysis.
References

Andrews, D. W. K. (1987). Asymptotic Results for Generalized Wald Tests. Econometric Theory 3, 348–358.
Chang, Y., and P. C. B. Phillips (1995). Time Series Regression with Mixtures of Integrated Processes. Econometric Theory 10, 1033–1094.
Chudik, A., and M. H. Pesaran (2011). Econometric Analysis of High Dimensional VARs Featuring a Dominant Unit. Econometric Reviews 32, 592–649.
Chudik, A., M. H. Pesaran, and E. Tosetti (2011). Weak and Strong Cross Section Dependence and Estimation of Large Panels. Econometrics Journal 14, C45–C90.
Feng, X., and Z. Zhang (2007). The Rank of a Random Matrix. Applied Mathematics and Computation 185, 689–694.
Kapetanios, G., M. H. Pesaran, and T. Yamagata (2011). Panels with Non-Stationary Multifactor Error Structures. Journal of Econometrics 160, 326–348.
Pesaran, M. H. (2006). Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error Structure. Econometrica 74, 967–1012.
Pesaran, M. H. (2007). A Simple Panel Unit Root Test in the Presence of Cross Section Dependence. Journal of Applied Econometrics 22, 265–312.
Pesaran, M. H., L. V. Smith, and T. Yamagata (2013). Panel Unit Root Tests in the Presence of a Multifactor Error Structure. Journal of Econometrics 175, 94–115.
Stewart, G. W. (1977). On the Perturbation of Pseudo-Inverses, Projections and Linear Least Squares Problems. SIAM Review 19, 634–662.
Wedin, P.-Å. (1973). Perturbation Theory for Pseudo-Inverses. BIT 13, 217–232.
Westerlund, J., and J.-P. Urbain (2015). Cross-Sectional Averages versus Principal Components. Journal of Econometrics 185, 372–377.