Determining the number of factors when the number of factors can increase with sample size

Determining the number of factors when the number of factors can increase with sample size

Accepted Manuscript Determining the number of factors when the number of factors can increase with sample size Hongjun Li, Qi Li, Yutang Shi PII: DOI:...

394KB Sizes 2 Downloads 166 Views

Accepted Manuscript Determining the number of factors when the number of factors can increase with sample size Hongjun Li, Qi Li, Yutang Shi PII: DOI: Reference:

S0304-4076(16)30202-0 http://dx.doi.org/10.1016/j.jeconom.2016.06.003 ECONOM 4322

To appear in:

Journal of Econometrics

Received date: 17 December 2013 Revised date: 31 May 2016 Accepted date: 5 June 2016 Please cite this article as: Li, H., Li, Q., Shi, Y., Determining the number of factors when the number of factors can increase with sample size. Journal of Econometrics (2016), http://dx.doi.org/10.1016/j.jeconom.2016.06.003 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Determining the Number of Factors When the Number of Factors Can Increase with Sample Size Hongjun Li International School of Economics and Management, Capital University of Economics and Business, Beijing, 100070, PR China Qi Li Department of Economics, Texas A&M University and International School of Economics and Management, Capital University of Economics and Business, Beijing, 100070, PR China Yutang Shi∗ School of Finance, Nankai University, Tianjin, 300350, PR China and Wells Fargo & Company, USA November 10, 2016

Abstract Correctly specifying the number of factors (r) is a fundamental issue for the application of factor models. In this paper we develop an econometric method to estimate the number of factors in factor models of large dimensions where the number of factors is allowed to increase as the two dimensions, cross-section size (N) and time period (T) increase. Using similar information criteria as proposed by Bai and Ng (2002), we show that the number of factors can be consistently estimated using the criteria. We propose a new procedure that avoids over estimating the number of factors while allowing for one to search for possible number of factors over a wide range of positive integers so that it also avoids underestimation of the number of factors. We conduct Monte-Carlo simulation to investigate the finite sample properties of the proposed approach. JEL classication: C2, C3, C5, G1 Keywords: Principal Components, Factor Analysis, Increasing Number of Factors, Information Criteria.

∗ We would like to thank two anonymous referees, an associate editor and seminar participants at Prince-

ton University, University of Southern California, University of Kansas, Xiamen University and Texas A&M University for helpful comments that greatly improved our paper. Hongjun Li’s research is partially supported by China National Science Foundation, project # 71601130. The corresponding author: Yutang Shi: shij- [email protected] (emails for Qi Li: [email protected] and for Hongjun Li: [email protected])

1 Introduction Factor models have been widely used in economic analyses such as forecasting economic variables, estimating variance-covariance matrices with high dimensional data, and estimating average treatment effects, among others. In practice a few common factors may capture the variations of a large number of economic variables. In the finance literature, the arbitrage pricing theory (APT) of Ross (1976) assumes that a small number of factors can be used to explain a large number of asset returns. Stock and Watson (1998) and Stock and Watson (1999) consider forecasting inflation with diffusion indices (“factors”) constructed from a large number of macroeconomic series. Gregory and Head (1999) and Forni et al. (2000) find that cross country variations have common components. Fan et al. (2011), and Fan et al. (2013) use factor models to estimate high dimensional variancecovariance matrices. Factor models can also be used to evaluate the impacts of various policies (e.g., Hsiao et al. (2012)). By assuming that the cross-sectional correlations for all the units are attributed to the presence of some (unobserved) common factors, Hsiao et al. (2012), Ching et al. (2012), Bai et al. (2014), Ouyang and Peng (2015) and Li and Bell (2017) use panel data methods to construct the counterfactuals and to measure average treatment effects of some policy interventions based on factor models. A fundamental issue of factor models is the correct specification of the number of factors, r. When the number of factors is fixed, Bai and Ng (2002), Onatski (2009), Ahn and Horenstein (2013), among others, have developed various approaches to consistently estimate the number of factors. But many empirical findings suggest that the number of factors may increase as the dimensions of the data N increases, or T increases. For many empirical analyses, the estimated number of factors ranges from one to more than ten, see Ludvigson and Ng (2009), Giannone and Sala (2005) and Forni and Gambetti (2010). This suggests that the number of factors may depend on sample sizes. One reason that the number of factors may increase with sample size is structural break, new factors may emerge after economic environments change. Using Bai and Ng’s information criteria, Ludvigson and Ng (2007) find that the factor structure of their financial dataset comprising of 172 (N = 172) series quarterly financial indicators spanning the first quarter of 1960 through the fourth quarter of 2002 (T = 172) can be well described by 8 (r = 8) common factors. Jurado et al. (2015) update monthly version of the 147 financial time series used 1

in Ludvigson and Ng (2007) and combine them with an updated version of 132 monthly macroeconomic series used in Ludvigson and Ng (2010). They find that 12 (r = 12) common factors can capture the variations of this new dataset with 279 series (N = 279) spanning the period 1959:01-2011:12 (T = 636). Hence, finding in Jurado et al. (2015) supports the argument that the number of factors may increase as sample size increases. Assuming that the number of factors r is fixed, there are many papers in the literature analyzing the problem of determining the number of factors. Some of them not only fix the number of factors, but also impose restrictions on the dimensions N and T, such as Lewbel (1991), Donald (1997), Cragg and Donald (1997), Connor and Korajzcyk (1993), Forni and Reichlin (1998) and Stock and Watson (1998). Imposing no restriction on the relation between N and T except that both N and T are assumed to be large, Bai and Ng (2002) treat the determination of the number of factors as a model selection problem, they propose some criteria and show that the number of factors can be consistently estimated by minimizing the proposed criteria. To improve the finite sample performances of Bai and Ng’s criteria, Hallin and Liska (2007), and Alessi et al. (2010) propose some modifications to Bai and Ng’s procedure by introducing a tuning multiplicative constant in the penalty objective function. Onatski (2009) develops a test of the null of k0 factors against the alternative that the number of factors r satisfies k0 < r ≤ k1 for some finite positive

integer k1 . Onatski also describes the asymptotic distribution of the test statistic with critical values tabulated. Onatski (2010) suggests to determine the number of factors from empirical distribution of eigenvalues of sample covariance matrix. Ahn and Horenstein (2013) exploit the fact that the r largest eigenvalues of the variance matrix of N response variables grow unboundedly as N increases, while the other eigenvalues remain bounded to estimate the number of factors. All of the above mention works consider the case of a fixed number of factors (r is fixed). The main difference between our paper and the existing work is that we consider the problem of determining the number of factors in a factor model where the number of factors is allowed to increase as N or T increases. Specifically, this paper is designed to provide an approach which enables one to estimate the number of factors consistently when the number of factors is allowed to increase as N, T → ∞. We extend the method of Bai and Ng (2002) to penalize the number of fac-

tors with a penalty function which is determined by the sample sizes, N and T, as well as the maximum possible number of factors allowed in the estimation. As the factors are 2

unobserved, the estimation procedure takes two steps. First, assuming the number of factors to be an arbitrary number 1 ≤ k ≤ k max , we estimate the factors ( Fbk ) using the principal components method, where k max = k max,N,T is the maximum number for pos-

sible number of factors, which is assumed to be greater or equal to the true number of

factors, whose value is determined by N and T and it increases as N, T increases. Second, we select the number of factors kˆ by minimizing a criterion modified from Bai and Ng (2002), which is a function of k and the estimated factors ( Fbk ). This criterion depends on the usual trade-off between good fit and parsimony. We show that this method produces

a consistent estimator of the number of factors r. However, simulation results show that the selected number of factor kˆ can be sensitive to the choice of k max and it tends to choose

a kˆ that is larger than r when k max is large. We propose using a new (‘mode’ based) selection procedure to overcome this problem so that the selected kˆ is not sensitive to different k max values used in practice. The rest of this paper is organized as follows. Section 2 sets up the model and presents the assumptions associated with the model. Section 3 presents the estimating procedures and the theoretical properties of the proposed estimators. Section 4 reports simulation experiments to examine the finite sample performances of our proposed method when r increases with N or T. Concluding remarks are given in Section 5. All the proofs are given in the Appendix.

2

Factor Models

We consider the problem of determining the number of factors (r) in a static approximate factor model, allowing r = r N,T → ∞, as N → ∞, or T → ∞, or both N, T → ∞, but with a slower rate than min{ N, T }, i.e., max{r/N, r/T } → 0, as N, T → ∞.

Let Xit denote the response variable for unit i at time t, for i = 1, . . . , N, and t = 1, . . . , T.

Our model is of the following form

0 1 Xit = √ λ0i Ft0 + eit , (1) r where Ft0 is an r × 1 vector of common factors, λ0i is the r × 1 vector of factor loadings,

and eit is the idiosyncratic error of the response variable Xit . The factors, factor loadings and idiosyncratic errors are not observed. Without loss of generality, we can assume that E( Xit ) = 0. If this is not the case, we can de-mean the data first. 3

0

Note that at the right-hand-side of our model (1), we divide λ0i Ft0 by



r. This is because √ we allow for r to diverge when N, T → ∞. If we do not divide λi Ft0 by r, then the 00

0

variance of the systematic part, λ0i Ft0 , is proportional to r and the variance of idiosyncratic

error eit is finite , the variance of noise part over the variance of information part will go 0

to zero, or equivalently, the goodness-of-fit R2 will converge to one. By dividing λ0i Ft0 by √ 0 r, we have Var (r −1/2 λ0i Ft0 ) = O(1) and we can obtain a reasonable goodness-of-fit that is not too close to one. Let tr( A) denote the trace of a square matrix A. The norm of a matrix A is defined as k Ak = [tr( A0 A)]1/2 . We use M1 to denote a generic positive constant and use N to denote the set of natural number. We make the main assumptions as follows: A SSUMPTION A (Factors and loadings): 1. For all t, r −2 Ek Ft0 k4 < M1 ; p

2. There exists a r × r positive definite matrix Σ F such that k T −1 ∑tT=1 Ft0 Ft0 − Σ F k − →0 0

as T → ∞;

3. max1≤i≤ N r −2 Ekλ0i k4 ≤ M1 < ∞; 4. Let Λ0 be the N × r factor loading matrix with its ith row given by λ0i . Then there 0

p

exists a r × r positive definite matrix D such that k N −1 Λ0 Λ0 − D k − → 0 as N → ∞;

5. Let λ0il and Ftl0 be the l th components (l = 1, ..., r) of λ0i and Ft0 , respectively. Then for all (i, t), E{[r −1/2 ∑rl=1 E(λ0il Ftl0 )]4 } ≤ M1 . T < ∞, A SSUMPTION B (Idiosyncratic Components): As N, T → ∞ and 0 < lim N,T →∞ N

1. For all i and t, E(eit ) = 0, E|eit |8 ≤ M1 ; 2. E( N −1 es0 et ) = E( N −1 ∑iN=1 eis eit ) = γ N (s, t), |γ N (s, s)| ≤ M1 for all s, and that T −1 ∑sT=1 ∑tT=1 |γ N (s, t)| ≤ M1 ; 3. E(eit e jt ) = τij,t with |τij,t | ≤ |τij | for some τij and for all t; furthermore, N −1 ∑iN=1 ∑ N j=1 | τij | ≤ M1 ;

4

T T 4. E(eit e js ) = τij,ts and ( NT )−1 ∑iN=1 ∑ N j=1 ∑t=1 ∑s=1 | τij,ts | ≤ M1 ;

5. for every (t, s), E| N −1/2 ∑iN=1 [eis eit − E(eis eit )]|4 ≤ M1 ; 6. We assume that there exist a T × T matrix L, a N × N matrix R, and a T × N matrix ε such that

e = LεR where L ( T × T ) and R ( N × N ) are arbitrary non-random positive definite ma-

trices, and ε = (ε ti ) is a T × N matrix consisting of independent elements with uniformly bounded 7th moment and E(ε it ) = 0.

A SSUMPTION C 1. Weak Dependence Between Factors and Idiosyncratic Components: 

2 

T N 1

1

E  ∑ √ ∑ Ft0 eit  ≤ M1 ;

N i=1 Tr t=1

2. Weak Dependence Between Factor Loadings and Idiosyncratic Components: 

2 

N N 1

1 0   E eit λi ≤ M1 .



Nr i∑ T t∑ =1 =1

Conditions in Assumption A are modified from Assumptions A-B in Bai and Ng (2002) by taking care of the fact that r → ∞ as N, T → ∞. It is easy to see that assumption A1

holds true if E[( Ftl0 )4 ] = O(1) for all t = 1, ..., T and for all l = 1, ..., r. A2 imposes a restric-

tion on the rate of r. For example with Σ F = T −1 ∑tT=1 E( Ft0 Ft0 ), it can be easily shown 0

T 0 , F0 F0 ) = O ( T ) for all l, m ∈ that A2 holds true if r = o ( T 1/2 ) and ∑tT=1 ∑s,t Cov( Ftl0 Ftm sl sm 0 {1, ..., r }. This is because E[|| T −1 ∑tT=1 Ft0 Ft0 −Σ F ||2 ] = T −2 ∑tT=1 ∑sT=1 ∑rl=1 ∑rm=1 E{[ Ftl0 Ftm 0

T 0 )][ F0 F0 − E ( F0 F 0 )]} = O (r 2 /T ) if for all l, m = 1, ..., r, it holds that T − E( Ftl0 Ftm ∑t=1 ∑s,t sl sm sl sm

0 − E ( F0 F0 )][ F0 F0 − E ( F0 F 0 )]} = O ( T ). Similarly, A3 holds true if E [( λ0 )4 ] = E{[ Ftl0 Ftm tl tm sl sm sl sm il

O(1) for all i = 1, ..., N and for all l = 1, ..., r. A4 is similar to A2, it holds true with D = N −1 ∑iN=1 E(λ0i λ0i ), if r = o ( N 1/2 ) and ∑iN=1 ∑ Tj,i Cov(λ0il λ0jm , λ0jl λ0jm ) = O( N ) for all 0

l, m ∈ {1, ..., r }. A5 requires that λ0il Ftl0 is a weakly dependent process in l because we

allow for r → ∞.

Conditions in Assumption B are basically the same as Assumption C in Bai and Ng

(2002) because the idiosyncratic error eit is unrelated to r whether r is finite or is allowed to diverge to infinity with the sample size. In particular, B5 is similar to A5 in that it 5

assumes that, for all (t, s), eit eis is a weakly dependent process in i. Assumption B6 puts a structure on the idiosyncratic components. This structure allows heteroscedasticity in both the time and cross-section dimensions, and also limited autocorrelation and crosssectional correlation in the components. Finally, assumption C is similar to assumption D in Bai and Ng (2002) except that we √ modified it by dividing the quantity by r as r is allowed to diverge as N and T tend to infinity. They allow for limited time-series and cross-section dependence in idiosyncratic component and also weak dependence between factors (factor loadings) and idiosyncratic errors.

3 Estimating the Common Factors and the Number of Factors Following Bai and Ng (2002), we estimate the common factor in a large panel by the principal components method. For k ∈ {1, . . . , k max }, where k max is allowed to increase at a slower speed than min{ N, T } such that k max = o (min{ N 1/3 , T }). Let λik and Ftk denote

k × 1 vectors of the loadings and factors with the allowance of k factors in the estimation.

The method of principal components minimizes 0 1 1 N T (2) V (k ) = min ( Xit − √ λik Ftk )2 ∑ ∑ k k Λ ,F NT i =1 t=1 k 0 0 over 1 ≤ k ≤ k max , subject to the normalization of either Λk Λk /N = Ik or F k F k /T = Ik , where Λk and F k are the N × k and T × k factor loading and factor matrices, respectively.

Let ev(i) ( A) denote the ith largest eigenvalue of matrix A, and EV(i) ( A) is the eigen-

vector corresponding to the eigenvalue ev(i) ( A) of the matrix A. If one concentrates 0

out Λk and uses the normalization that F k F k /T = Ik . The estimated factor matrix is √ √ √ ˜ k0 = k ( F˜ k0 F˜ k )−1 F˜ k0 X = k F˜ k0 X/T F˜ k = T (EV(1) ( XX )0 , . . . , EV(k) ( XX 0 )). Given F˜ k , Λ is the corresponding matrix of factor loadings. On the other hand, if one concentrates 0

out F k and uses the normalization that Λk Λk /N = Ik , the solution to the above problem √ ¯ k ), where Λ ¯ k = N (EV(1) ( X 0 X ), . . . , EV(k) ( X 0 X )). The normalization is given by ( F¯ k , Λ √ 0 ¯ k /N. that Λk Λk /N = Ik implies F¯ k = kX Λ 0 Define Fbk = F¯ k ( F¯ k F¯ k /T )1/2 , a rescaled estimator of the factors. This rescaled estimator has the asymptotic properties summarized in the following theorem.

Proposition 1. Under the assumptions A - C, for any 1 ≤ k ≤ kmax = o (min{ N 1/3 , T }) there 6

exists a (r × k ) matrix H k with rank = min{k, r } such that  3  

2 1 T k r k3

ˆk k0 0 , .

Ft − H Ft = O p max T t∑ N T =1

(3)

Similar to the results of Bai and Ng (2002), Proposition 1 suggests that the time average of the squared deviations between the estimated factors Fbk and those that lie in the true 0

factor space, H k Ft0 , will vanish as N, T → ∞. However, the convergence rate depends on

not only the panel structure N and T, but also the factor structure r and k.

Given the results of Proposition 1, we can now analyze the problem of determining the number of factors. Let V (k, F k ) = minΛ

1 NT

∑iN=1 ∑tT=1 ( Xit −

1 √

0 λ k F k )2 k i t

be the sum of

squared residuals (divided by NT), where the residuals are from regression models of regressing Xi on the k factors for all i = 1, . . . , N, and Xi = ( Xi1 , Xi2 , . . . , XiT )0 is a T × 1 vector of time-series observations for the ith cross-section unit. The selecting criterion

modified from those suggested by Bai and Ng (2002) has the form PC (k ) = V (k, Fbk ) + kg( N, T ),

(4)

where g( N, T ) is the penalty factor satisfying two conditions: (i) k max · g( N, T ) → 0 as o  n 3 5/2 k√ −1 max max k√ N, T → ∞, (ii) CN,T,k , . g ( N, T ) → ∞ as N, T → ∞, where C = O max p N,T,k max max N T As V (k, Fbk ) is decreasing in k, the criterion above penalizes k with a penalty factor kg( N, T ) to select the estimator kˆ such that asymptotically under and overparameterized models will not be chosen. Theorem 1 formally establishes this result.

Theorem 1. Let 1 ≤ r ≤ k max = o (min{ N 1/17 , T 1/16 }) and kˆ = argmin1≤k≤kmax PC (k ). −1 Suppose that Assumptions A-C hold, and that (i) k max · g( N, T ) → 0, (ii) CN,T,k · g( N, T ) → max

∞ as N, T → ∞. Then

lim Prob[kˆ = r ] = 1.

N,T →∞

(5)

A formal proof of Theorem 1 is provided in the Appendix. Conditions (i) and (ii) together define the type of penalty factor that should vanish at an appropriate rate. They are sufficient conditions for estimation consistency so that they may not always be required for consistent estimating the number of factors.

7

Remark 1. Since we often need to divide some quantities by r, we rule out the case that r = 0. Allowing for r = 0 in our framework will complicate the regularity conditions, notations and proofs. Therefore, we did not consider the case that r = 0 in our paper. The r = 0 case is covered in Bai and Ng’s (2002). Their procedure can be used to select the number of factors even when the true number of factors is 0. We also conducted some simulations which show that both Bai and Ng’s (2002) original method and the modified method proposed in our paper work well when r = 0. Note that the condition imposed in k max is asymmetric in ( N, T ). This result is induced by Proposition 1. The details can be found in the proof of Proposition 1 given in the appendix. As a referee correctly points out, if in the proof of Theorem 1, instead of us

2  n 3 3 o 0

ing the result of Proposition 1 that T −1 ∑tT=1 Fˆtk − H k Ft0 = O p max kNr , kT , one

2 o  n 3 3 0

may use N −1 ∑iN=1 λˆ ik − H˜ k λ0i = O p max kN , kTr ,1 where H˜ k is a r × k matrix with rank( H˜ k ) = min{r, k}. Then the condition that 1 ≤ r ≤ k max = o (min{ N 1/17 , T 1/16 }) in

Theorem 1 will be replaced by 1 ≤ r ≤ k max = o (min{ N 1/16 , T 1/17 }). The result is still asymmetric in N and T, but the roles of N and T are exchanged.

In fact, it is possible to obtain a symmetric result (of k max in N and T) under some stronger regularity conditions, i.e., one can obtain a symmetric condition for k max as 1 ≤

r ≤ k max = o (min{ N 1/16 , T 1/16 }) in Theorem 1 under some stronger assumptions. We

state this result in the following proposition.

Proposition 2. Under the same conditions as in Proposition 1 except that we strength some conditions as follows: (i) λil is non-random with λil ≤ λ¯ < ∞ for all i = 1, ..., N and l = 1, ..., r;

0 ) = 0 for all (ii) E(eit e jt ) = 0 for all t ∈ {1, ..., T } and for all j , i, i, j ∈ {1, ..., N }, E( Ftl0 Ftm

t ∈ {1, ..., T } and for all m , l, l, m ∈ {1, ..., r }; (iii) eit and Fs0 are independent with each other

for all i, t and s. Then

1 T

  

2

1 1

ˆk k0 0 3 ∑ Ft − H Ft = O p k N + T . t =1 T

(6)

The proof of Proposition 2 is given in the appendix. Under Proposition 2, the condition 1 ≤ r ≤ k max = o (min{ N 1/17 , T 1/16 }) in Theorem 1 can be replaced by 1 ≤ r ≤ k max = 1 This

result can be proved similar to the proof of Proposition 1, its proof is available from the authors upon request.

8

o (min{ N 1/16 , T 1/16 }). That is, we obtain a condition on k max that is symmetric in N and T.

Remark 2. The zero correlation assumption on Ftl used in Proposition 2 is quite strong. However, it can be replaced by some weakly dependence assumptions such as ρ-mixing or β-mixing processes with mixing coefficients decay to zero at certain rates. But this will make the presentation (regarding regularity conditions) as well as the proofs of Proposition 2 much longer. Therefore, we will not pursue a proof of Proposition 2 under weak regularity conditions in this paper. Similar to Bai and Ng (2002) we have the following Corollary. Corollary 1. Under the Assumptions of Theorem 1, if one replaces PC (k ) in Theorem 1 by the class of criterion defined by

  IC (k ) = ln V (k, Fbk ) + kg( N, T ),

then the conclusion of Theorem 1 holds true.

Corollary 3.1 states that the class of criterion IC (k ) can also be used to consistently estimate the number of factors in factor models where the number of factors possibly increases with the sample size. Let b σ2 be a consistent estimate of ( NT )−1 ∑iN=1 ∑tT=1 E(eit )2 . Bai and Ng (2002) general-

ize the C p criterion of Mallows (1973) and suggest three PC p criteria as follows:     NT k 2 N+T b PC p1 (k ) = V (k, F ) + k · b σ ln , NT N+T   k 2 N+T b PC p2 (k ) = V (k, F ) + k · b σ ln(min{ N, T }), NT   k 2 ln(min{ N, T }) b PC p3 (k ) = V (k, F ) + k · b σ . (7) min{ N, T } It is easy to check that these criteria satisfy the two conditions for the penalty fac 1/6  tor in Theorem 1 if k max = o ln NNT . The three criteria have different finite+T

sample properties while they are asymptotically equivalent. In applications, Bai and Ng (2002) suggest to replace σˆ 2 with V (k max , Fbkmax ) = ( NT )−1 ∑ N ∑tT=1 eˆ2 , where eˆit = i =1

0

it

k b Xit − λ max Fbtkmax for i = 1, . . . , N and t = 1, . . . , T, the residuals for the linear regression k i of X on Fbkmax . Thus, the number of factors estimated using these three criteria may be 1 √

sensitive to the selection of k max . We will propose a method that avoids the sensitivity of selected kˆ depending on k max . 9

Corollary 3.1 suggests the following three IC p criteria can also be used to select the number of factors: IC p1 (k ) IC p2 (k ) IC p3 (k ) The main advantage

   N+T NT = ln V (k, F ) + k · ln , NT N+T     N + T k = ln V (k, Fb ) + k · ln(min{ N, T }), NT     ln ( min { N, T }) k = ln V (k, Fb ) + k · . (8) min{ N, T } of the three criteria given in (8) is that the scaling factor σˆ 2 is 

bk





automatically removed by the logarithmic transformation. We do not need to estimate σ2

before selecting the number of factors. Therefore, the number of factors estimated using IC p criteria is insensitive to the selection of k max . As the estimated kˆ using PC p criteria may be sensitive to k max , the selection of k max is an important issue in practice. Bai and Ng (2002) suggest to select k max by setting k max = 8[(min{ N, T }/100)1/4 ] where [ A] denotes the integer part of a real number A.

But their theoretical result does not cover this case as this k max increases (without bound) with N and T. Using some ad-hoc rules to select k max may lead to k max < r, which will lead to an underestimation of the number of factors because if k max < r, then we will have kˆ ≤ k max < r. On the other hand, if k max is too large (k max >> r), simulations show that the selected kˆ tends to overestimate r (kˆ > r ). We propose a new procedure to resolve this problem. We propose to let k max take a wide range of values. For each value of k max , we select a kˆ kmax that minimizes the PC p criteria. We then select the value of kˆ that appears most times among the different kˆ k values, i.e., we select the mode of kˆ k (over max

max

a wide range of k max ). We use a specific example to illustrate this selection procedure. We generate a simulated data of N = 200, T = 60 with the true number of factors r = 7. We let k max take values from {1, 2, ..., 40}. For each different 1 ≤ k max ≤ 40, we select a kˆ kmax by minimizing PC p1 criterion. The result is presented in Figure 1. From Figure 1 we observe that when k max < r = 7, we select kˆ = k max < 7 as expected; when 7 ≤ k max ≤ 16, we select kˆ = 7; when k max > 16, the selected kˆ > 7. Moreover, kˆ increases with k max . We also notice that kˆ = 7 is selected ten times (when k max = 7, 8, ..., 16), while all the other values are chosen no more than three times. For example, when 17 ≤ k max ≤ 19, the selected kˆ kmax = 8, i.e., kˆ kmax = 8 is selected three times. According to our selection rule, kˆ = 7 is selected because kˆ = 7 appears most times (10 times). 10

ˆ max curves for different N, T and r values. We see that although kˆ Figure 2 plot k-k increases with k max for most cases, our proposed procedure can select the correct number of factors because kˆ k takes value r more often than taking any other values for all max

cases reported in Figure 2. Hence, our proposed procedure of selecting kˆ is not sensitive to k max provided that one let k max take a wide range of values. Therefore, we suggest letting k max to take values in {1, 2, ..., [6 log(max( N, T ))]} where [ A] denotes the integer

part of a real number A. [6 log(max( N, T ))] is around 41, 45 and 55 when max( N, T )

= 1000, 2000 and 10000. This setting is also consistent with our simulation since we let r = [1.5 log(max( N, T ))] in our simulations in section 4.

4 Simulations In this section we conduct Monte Carlo simulations to investigate how our modified criteria of Bai and Ng (2002) perform when the number of factors is allowed to increase with N or T. For simplicity of the comparison with the simulation results in Bai and Ng (2002), we first fix T and allow N and r to increase. When T is fixed as 60, we let N = 100, 200, 500, 1000, 2000 and r = [1.5 log( N )], where [ A] denotes the integer part of a real number A; for T = 100, we let N = 40, 60, 100, 200, 500, 1000, 2000 and r = [1.5 log( N )]. The simulation results for this case are reported in the upper part of each table for each data generating process (DGP). Next, we check the performance of the criteria when N is fixed and T keeps increasing. When N = 100, we let T = 40, 60,100, 200, 500, 1000, 2000 and r = [1.5 log( T )]; when N = 60, we let T = 100, 200, 500, 1000, 2000 and r = [1.5 log( T )]. The simulation results for this case are reported in the lower part of each table for each DGP. We replicate the suggested estimating procedure 1000 times and the reported results are the averages of kˆ over 1000 replications. The data generating processes (DGP) have the following form: 1 r Xit = √ ∑ λij Ftj + eit , r j =1 where λij ∼ i.i.d. N (0, 1), Ftj ∼ i.i.d. N (0, 2).

We consider three DGPs here. In the base case, we set the DGP as eit ∼ i.i.d.N (0, 1). This

base DGP is denoted as DGP1. The simulation results for this case are reported in Table 1. We see that all information criterion give precise estimates of the number of factors.

11

Table 1: Estimated Number of Factors: DGP1 N T r PC p1 PC p2 PC p3 IC p1 IC p2 IC p3 100 60 6 6 6 6 6 6 6 200 60 7 7 7 7 7 7 7 500 60 9 9 9 9 9 9 9 1000 60 10 10 10 10 10 10 10 2000 60 11 11 11 11 11 11 11 40 100 5 5 5 5 5 5 5 60 100 6 6 6 6 6 6 6 100 100 6 6 6 6 6 6 6 200 100 7 7 7 7 7 7 7 500 100 9 9 9 9 9 9 9 1000 100 10 10 10 10 10 10 10 2000 100 11 11 11 11 11 11 11 100 40 5 5 5 5 5 5 5 100 60 6 6 6 6 6 6 6 100 100 6 6 6 6 6 6 6 100 200 7 7 7 7 7 7 7 100 500 9 9 9 9 9 9 9 100 1000 10 10 10 10 10 10 10 100 2000 11 11 11 11 11 11 11 60 100 6 6 6 6 6 6 6 60 200 7 7 7 7 7 7 7 60 500 9 9 9 9 9 9 9 60 1000 10 10 10 10 10 10 10 60 2000 11 11 11 11 11 11 11 DGP1: Xit = √1 r ∑rj=1 λij Ftj + eit ; r = [c ∗ ln ( N )] for the upper part of the table, and r = [c ∗ ln ( T )] for the lower part, where c=1.5, and [ A] denotes the integer part of a real number A. For the heterogeneity case of DGP2, we set the idiosyncratic shocks to be heterogeneous. We let eit = uit + δt eit where uit ∼ i.i.d.N (0, 1), eit ∼ i.i.d.N (0, 1), and δt = 0

for even t, δt = 1 for odd t. Thus the variance of the idiosyncratic shocks is 1 when t is odd and 2 when t is even. We denote this DGP as DGP2. The estimated values of kˆ

are reported in Table 2 where the boldfaced numbers indicate incorrect selection of the number of factors. Similar to the homogeneous cases, PC p1 , PC p2 , and PC p3 perform well under all kinds of combinations of N and T. The other three criteria ICP1 and IC p2 , and IC p3 also perform well in general, although occasionally they may select kˆ that is slightly smaller than the true number of factors r when sample size is small. 12

Table 2: Estimated number of Factors: Heterogeneity N T r PC p1 PC p2 PC p3 IC p1 IC p2 IC p3 100 60 6 6 6 6 5 5 6 200 60 7 7 7 7 7 7 7 500 60 9 9 9 9 8 8 8 1000 60 10 10 10 10 10 10 10 2000 60 11 11 11 11 11 11 11 40 100 5 5 5 5 5 5 5 60 100 6 6 6 6 6 6 6 100 100 6 6 6 6 6 6 6 200 100 7 7 7 7 7 7 7 500 100 9 9 9 9 9 9 9 1000 100 10 10 10 10 10 10 10 2000 100 11 11 11 11 11 11 11 100 40 5 5 5 5 5 5 5 100 60 6 6 6 6 5 5 6 100 100 6 6 6 6 6 6 6 100 200 7 7 7 7 7 7 7 100 500 9 9 9 9 9 9 9 100 1000 10 10 10 10 10 10 10 100 2000 11 11 11 11 11 11 11 60 100 6 6 6 6 6 6 6 60 200 7 7 7 7 7 7 7 60 500 9 9 9 9 9 9 9 60 1000 10 10 10 10 10 10 10 60 2000 11 11 11 11 11 11 11 DGP2: Xit = √1 r ∑rj=1 λij Ftj + eit ; eit = uit + δt eit , where δt = 0 for t even, and δt = 1 for t odd; r = [c ln ( N )] for the upper part of the table, and r = [c ln ( T )] for the lower part, where [ A] denotes taking the integer part of a real number.

13

Table 3: Estimated number of Factors: Autocorrelation N T r PC p1 PC p2 PC p3 IC p1 IC p2 IC p3 100 60 6 6 6 6 6 6 6 200 60 7 7 7 7 7 7 7 500 60 9 9 9 9 9 9 9 1000 60 10 10 10 10 10 10 10 2000 60 11 11 11 11 11 11 11 40 100 5 5 5 5 5 5 5 60 100 6 6 6 6 6 6 6 100 100 6 6 6 6 6 6 6 200 100 7 7 7 7 7 7 7 500 100 9 9 9 9 9 9 9 1000 100 10 10 10 10 10 10 10 2000 100 11 11 11 11 11 11 11 100 40 5 5 5 5 5 5 5 100 60 6 6 6 6 6 6 6 100 100 6 6 6 6 6 6 6 100 200 7 7 7 7 7 7 7 100 500 9 9 9 9 9 9 9 100 1000 10 10 10 10 10 10 10 100 2000 11 11 11 11 11 11 11 60 100 6 6 6 6 6 6 6 60 200 7 7 7 7 6 6 7 60 500 9 9 9 9 9 9 9 60 1000 10 10 10 10 10 10 10 60 2000 11 11 11 11 11 11 11 DGP3: Xit = √1 r ∑rj=1 λij Ftj + eit ; eit = ρeit−1 + vit ; ρ = 0.5; r = [c ln ( N )] for the upper part of the table, and r = [c ln ( T )] for the lower part, where [ A] denotes taking the integer part of a real number.

14

For the last case, denoted as DGP3, we allow the idiosyncratic to be autocorrelated. We set eit = ρeit−1 + vit , where ρ = 0.5 and vit ∼ i.i.d.N (0, 1). The estimation results are reported in Table 3. The results for this case are almost the same as those of the base

case except for ( N, T ) = (60, 200) with r = 7, IC p1 and IC p2 select r = 6. All other four information criteria perform quite well in accurately estimating the number of factors for all ( N, T ) combinations for DGP3. Summarizing the results for all the DGPs we observe that PC p1 , PC p2 , and PC p3 have the best overall performance. IC p 1, IC p 2, and IC p3 perform well when the sample size is large (min{ N, T }>100).

5

Concluding Remarks

In this paper, we consider the problem of determining the number of factors in large factor models where the number of factors is allowed to increase, but with a slower rate, as N or T increases. We extend the analysis of Bai and Ng (2002) to the case that number of factors can increase with the sample size and prove the consistency of a modified Bai and Ng’s (2002) procedure in determining the number of factors. We also propose a (‘mode’ based) new procedure so that our selected number of factors is not sensitive to the choice of k max . Monte Carlo simulation results suggest that the criteria PC p1 , PC p2 and PC p3 all have the overall best performance. Other criteria such as IC p1 , IC p2 and PC p3 can also be used to accurately estimate the number of factors when the data dimensions are relatively large, say min{ N, T } ≥ 100. One possible future research topic is to find alternative criteria

that can improve the finite-sample performance of Bai and Ng’s (2002) procedure and our modified procedure such that the new criteria can accurately determine the number of factors even in small or medium size samples.

15

Appendix A: Proofs Proof of Proposition 1 We will first prove a lemma (Lemma 1) below which will be used in proving Proposition 1. L EMMA 1 Under Assumptions A-C, we have for some positive constant 0 < M2 < ∞, and for all N and T, T T 2 ∑ s=1 ∑t=1 γ N (s,t) ≤ M2 ; 2  T T N 1 1 (ii) E T2 ∑s=1 ∑t=1 N ∑i=1 Xit Xis ≤ M2 ;

(i)

1 T

Proof. : (i) See the proof of Lemma 1 (i) in Bai and Ng (2002). 0

(ii) If suffices to prove that for all (i, t) that E( Xit4 ) ≤ M. Now E( Xit4 ) ≤ 8r −2 E[(λ0i Ft0 )4 ] +

8E(eit4 ) ≤ 16M1 by assumption A5 and B1.



Proof. of Proposition 3.1:

√ k k 0 ˜k ˜ k and Λ ˜k = ˜ k0 ˜ k X Λ N T X F . From the normalization F F /T 0 ( Tk)−1 ∑tT=1 k F˜tk k2 = 1. Following Bai and Ng (2002), using H k √

Recall that Fˆ k = Ik , we also have

= =

( F˜ k F0 /T )(Λ0 Λ0 /N ), we have 0 k T k T k T k T Fˆtk − H k Ft0 = ∑ F˜sk γ N (s, t) + ∑ F˜sk ζ st + ∑ F˜sk ηst + ∑ F˜sk ξ st , T s =1 T s =1 T s =1 T s =1 √ √ 0 0 0 0 where ζ st = es0 et /N − γ N (s, t), ηst = Fs0 Λ0 et /( N r ), and ξ st = Ft0 Λ0 es /( N r ) = ηts . 0 Because ( x + y + z + u)2 ≤ 4( x2 + y2 + z2 + u2 ), k Fˆtk − H k Ft0 k2 ≤ 4( at + bt + ct + dt ),



2

2

2

k2 T k2 T k2 T k k k ˜ ˜ ˜ where at = T2 ∑s=1 Fs γ N (s, t) , bt = T2 ∑s=1 Fs ζ st , ct = T2 ∑s=1 Fs ηst and dt =

2 0 k2 T k ξ . It follows that (1/T ) T k F ˜ F



∑t=1 ˆtk − H k Ft0 k2 ≤ (4/T ) ∑tT=1 ( at + bt + ct + dt ). st 2 s =1 s T     By Cauchy’s inequality, we have k∑sT=1 F˜sk γ N (s, t)k2 ≤ ∑sT=1 k F˜sk k2 · ∑sT=1 γ N (s, t)2 . 0

0

Thus,

! ! T T kk2 1 T ˜k 2 1 2 ∑ at ≤ T Tk ∑ k Fs k · T ∑ ∑ γN (s, t) t =1 s =1 t =1 s =1  3 k = Op T 0 by Lemma 1(i) and the fact that ( Tk)−1 ∑tT=1 k F˜tk k2 = 1 (this follows from F˜ k F˜ k /T = Ik ). 1 T

T

16

For bt , we have that 1 T

T

∑ bt

t =1

k2 = T3

=

k2 T3

2

T

k ˜ ∑

∑ Fs ζ st

t =1 s =1 T

T

T

T

∑ ∑ ∑ F˜sk F˜uk ζ st ζ ut 0

t =1 s =1 u =1

!1/2  !2 1/2 T T T 1  ∑ ∑ ( F˜s F˜uk )2 ∑ ζ st ζ ut  2 ∑ ∑ T s =1 u =1 s =1 u =1 t =1  ! !2 1/2 T T T T 3 1 1 k k F˜sk k2  2 ∑ ∑ ∑ ζ st ζ ut  ≤ ∑ T Tk s=1 T s =1 u =1 t =1  !2 1/2 T T T 1 = k3  4 ∑ ∑ ∑ ζ st ζ ut  T s =1 u =1 t =1  3 k , = Op N   2 1/2 T T T 1 where the last equality follows from T4 ∑s=1 ∑u=1 ∑t=1 ζ st ζ ut = O p ( N −1 ) as k2 ≤ T

1 T2

T

T

k0

shown in the proof of Theorem 1 of Bai and Ng (2002).

From E( T −1 ∑tT=1 ζ st ζ ut )2 = E( T −2 ∑tT=1 ∑vT=1 ζ st ζ ut ζ sv ζ uv ) ≤ maxs,t E|ζ st |4 and 4 1 N 1 1 E|ζ st |4 = 2 E √ ∑ (eit eis − E(eit eis )) ≤ 2 M1 N N i =1 N by Assumption B5, we have r  3 1 T2 1 T k 3 bt ≤ O p ( k ) . = Op ∑ 2 T t =1 T N N For ct , we have

2

T k2

k ˜ ct =

∑ Fs ηst 2

T s =1

2

T √ 0 0 k2

k 0 0 ˜ = F F Λ e /N r

t s s

T 2 s∑ =1 ! ! r T k2 0 0 √ 2 k T ˜ k 2 ≤ ke Λ / r k k Fs k ∑ k Fs0 k2 Tk s∑ Tr N2 t =1 s =1 k2 0 0 √ 2 ke Λ / r k O p (kr ) N2 t 1 1 because Tk ∑sT=1 k F˜sk k2 = 1 and Tr ∑sT=1 k Fs0 k2 = O p (1).

=

17

It follows that 1 T

T

∑ ct

t =1

k2 1 = O p (kr ) NT  3  k r = Op N

0 0 2

eΛ ∑

√t Nr

T

t =1

0 0 2

eΛ ∑tT=1 √t Nr = O p (1) by assumption C2.  3  T The term (1/T ) ∑t=1 dt = O p kNr can be proved similarly. Combining the above

because

1 T

results, we have shown that T

T

0 (1/T ) ∑ k Fˆtk − H k Ft0 k2 ≤ (4/T ) ∑ ( at + bt + ct + dt )

t =1

t =1  3 k r

 k3 + Op . = Op N T Alternatively, Proposition 3.1 can be proved by concentrating out Ft . Following the 



similar steps, we can show that N

N

(1/N ) ∑ kλˆ ik − H k λ0i k2 ≤ (4/N ) ∑ ( ai + bi + ci + di ) 0

i =1

= Op



i =1  3 k

N

+ Op



k3 r T



. 

Proof of Proposition 2 Proof. From the proof of Proposition 1 we know that T 1 T ˆk k0 0 2 k F − H F k ≤ ( 4/T ) ∑ ( a t + bt + c t + d t ) t t T t∑ =1 t =1

and that T −1 ∑tT=1 at = O p (k3 /T ) and T −1 ∑tT=1 dt = O p (k3 /N ). Therefore, we only need

to show that T −1 ∑tT=1 ct = O p (k3 /N ) and T −1 ∑tT=1 dt = O p (k3 /N ). Since the proofs are similar. We will only prove for the term related to ct . For ct , we have ct

2

T

√ 0 0

∑ F˜sk Fs0 Λ0 et /N r

s =1

! ! k2 1 1 T ˜ k 2 1 T 00 00 2 ≤ k Fs k k Fs Λ et k N r T s∑ TN s∑ =1 =1 ! T 0 0 1 1 = O p (k3 /N ) k Fs0 Λ0 et k2 r TN s∑ =1 k2 = T2

because T −1 ∑sT=1 k F˜sk k2 = O p (k ).

18

de f

Next, we show that A = ( TN )−1 ∑sT=1 k Fs0 Λ0 et k2 = O p (r ). 0

1 E(| A|) = NT 1 NT

=

1 NT

=

T

0

0

r

N

0

∑ Ek Fs0 Λ0 et k2

s =1 T r

N

0 )Λ0il λ0jm ∑ ∑ ∑ ∑ ∑ E(eit e jt )E( Fsl0 Fsm

s =1 l =1 m =1 i =1 j =1 T

r

N

∑ ∑ ∑ E(eit2 )E(( Fsl0 )2 )(Λ0il )2

s =1 l =1 i =1

= O (r ), 0 ) = 0 because of the zero correlation assumptions that E(eit e jt ) = 0 for j , i and E( Fsl0 Fsm

for m , l. This implies that A = O p (r ). Hence, ct = O p (k3 /N ). This completes the proof of Proposition 2.



From the above proof we can see that the conclusion of Proposition 2 still holds true if the zero correlation assumptions are replaced by some weakly dependent assumptions −1 r 0 c such as N −1 ∑iN=1 ∑ N ∑l =1 ∑rm,l E( Fsl0 Fsm 2,lmlm ) = O (1), j,i E ( eit e jt c1,ijlm ) = O (1) and r

where c1,ijlm and c2,ijlm are some bounded sequences of non-random numbers depending on i, j, l, m.

Proof of Theorem 3.1 0 0 0 L EMMA 2 Let Dk = Fˆ k Fˆ k /T and D0 = H k F0 F0 H k /T. When k ≤ r, we have (i) k Dk−1 k =  n 4 1.5 4 o k√ r k √r , max . O p (k); (ii) k Dk−1 − D0−1 k = O p max max

N

T

Proof. : Following Bai and Ng (2002), we have 0 0 0 Fˆ k Fˆ k H k F0 F0 H k Dk − D0 = − T T T 0 0 0 1 = [ Fˆtk Fˆtk − H k Ft0 Ft0 H k ] ∑ T t =1

=

1 T

+

T

1

T

∑ ( Fˆtk − H k Ft0 )( Fˆtk − H k Ft0 )0 + T ∑ ( Fˆtk − H k Ft0 ) Ft0 H k

t =1 T

1 T

0

0

t =1

∑ H k Ft0 ( Fˆtk − H k Ft0 )0 . 0

0

t =1

19

0

0

Hence, we have 1 k Dk − D0 k ≤ T

T

∑ k Fˆtk − H

t =1



= O p max 





k0

Ft0 k2 + 2 3 

k3 r k , N T

1 T

T

∑ k Fˆtk − H

t =1

+ O p max

k0

(√

Ft0 k2

!1/2

1 T

T

∑ kH

t =1

k0

Ft0 k2

√ )! √  k3 r k3 √ , √ · Op kr2 N T

!1/2

 k2 r1.5 k2 r = O p max √ , √ N T 0 1 by Proposition 3.1 and the fact that T ∑tT=1 k H k Ft0 k2 = O p (kr2 ), which is shown below. From weakly dependent process of Ft0 , it is easy to show that " #   T 0 0 1 1 T 1 k 0 2 k 0 2 k H Ft k − E ∑ k H Ft k = O p √T . T t∑ T =1 t =1

Also, one can easily show that || Dk−1 || = O p (k ). Then from Dk−1 − D0−1 = Dk−1 ( D0 −

Dk ) D0−1 , we have

k Dk−1 − D0−1 k = k Dk−1 ( Do − Dk ) D0−1 k

≤ k Dk−1 k · k D0 − Dk k · k D0−1 k −1 2 k Dk k

k D0−1 k · k D0 − Dk k · = k k k    2 1.5 k r k2 r 2 = k · O p (1) · O p max √ , √ N T   4 1.5 4  k max r k max r √ , √ = O p max . N T



L EMMA 3 For 1 ≤ k ≤ r, and the H k defined in Proposition 3.1, we have   5 3.5 5 3  k max r k r k 0 k ˆ √ √ V (k, F ) − V (k, F H ) = O p max , max . N T Proof. : For the true factor matrix with r factors and H k defined in Proposition 3.1, let 0 denote the idempotent matrix spanned by null space of F0 H k , with M0FH = I − PFH  0 0  −1 0 0 0 0 PFH0 = F0 H k H k F0 F0 H k H k F0 . Correspondingly, let MFkˆ = IT − Fˆ k ( Fˆ k Fˆ k )−1 Fˆ k =

20

IT − PFkˆ . Then

1 V (k, Fˆ k ) = NT V (k, F0 H k ) =

N

∑ Xi0 MFkˆ Xi ,

i =1

1 NT

N

∑ Xi0 M0FH Xi ,

i =1

1 N 0 0 k 0 k ˆ V (k, F ) − V (k, F H ) = X i ( PFH − PFkˆ ) X i . ∑ NT i=1 0 0 0 k k Following Bai and Ng (2002), let Dk = Fˆ Fˆ /T and D0 = H k F0 F0 H k /T. Then ! −1 ! −1 0 k 0 F 00 F 0 H k 0 0 0 1 ˆ k Fˆ k Fˆ k 1 H k 0 k 0 k PFˆ − PFH = F Fˆ − F H H k F0 T T T T 0 0 1 ˆ k 0 −1 ˆ k [ F Dk F − F0 H k D0−1 H k F0 ] T i 0 0 1 h ˆk = ( F − F0 H k + F0 H k ) Dk−1 ( Fˆ k − F0 H k + F0 H k )0 − F0 H k D0−1 H k F0 T 0 0 1 ˆk = [( F − F0 H k ) Dk−1 ( Fˆ k − F0 H k )0 + ( Fˆ k − F0 H k ) Dk−1 H k F0 T 0 0 + F0 H k Dk−1 ( Fˆ k − F0 H k )0 − F0 H k D0−1 H k F0 ].

=

0 ) X = I + I I + I I I + IV. We consider each term in turn. Thus, N −1 T −1 ∑iN=1 X i0 ( PFkˆ − PFH i

I =

1 NT 2

N

T

T

∑ ∑ ∑ ( Fˆtk − H k Ft0 )0 Dk−1 ( Fˆsk − H k Fs0 )Xit Xis 0

0

i =1 t =1 s =1

!1/2  0 0 1 1 ( Fˆtk − H k Ft0 )0 Dk−1 ( Fˆsk − H k Fs0 ) · 2 ≤ ∑ ∑ 2 T t =1 s =1 T ! 0 1 T ˆk k Ft − H k Ft0 k2 · k Dk−1 k · OP (1) ≤ T t∑ =1    3 k r k3 , · k · O p (1) = O p max N T   4  k r k4 = O p max , N T by Proposition 3.1, Lemma 1(iii) and Lemma 2(i). T

T

21

∑∑

t =1 s =1

!2 1/2 1 Xit Xis  N i∑ =1 N

II =

1 NT 2

N

T

T

∑ ∑ ∑ ( Fˆtk − H k Ft0 )0 Dk−1 H k Fs0 Xit Xis 0

0

i =1 t =1 s =1

!1/2  0 0 1 1 1 · 2 ∑ ∑ ≤ k Fˆtk − H k Ft0 k2 · k H k Fs0 k2 · k Dk−1 k2 ∑ ∑ 2 T t =1 s =1 T t =1 s =1 N !1/2 !1/2 2 T 0 0 1 T ˆk kr ≤ k Ft − H k Ft0 k2 k H k Fs0 k2 · k Dk−1 k · · O p (1) 2 ∑ T t∑ Tkr =1 s =1 ( 1/2  3 1/2 )! k k3 r , · k · k1/2 r · O p (1) = O p max N T   3 1.5 3  k r k r . = O p max √ , √ N T  n 3 1.5 3 o k r . Similarly, one can verify that I I I is also O p max k√r , √ T

T

N

IV =

1 NT 2

N

T

T

0

0

∑ ∑ ∑ Ft0 H k ( Dk−1 − D0−1 ) H k Fs0 Xit Xis

1 ≤ k Dk−1 − D0−1 k N



i =1

T

i =1 t =1 s =1

k Dk−1

!2 1/2 ∑ Xit Xis  N



N

1 T



i =1

T

∑ kH

t =1

k0

Ft0 k · | Xit |

T 2 N 0 1 −1 kr √ k H k Ft0 k D0 k N i=1 T kr t=1 D0−1 k · kr2 · O p (1)  4 4.5 4 4  k r k r

= k Dk−1 −  = O p max





!2

!2

, , √ N T  n 3 2.5 3 2 o where we used k Dk−1 − D0−1 k = O p max k√r , k√r by Lemma 2 (ii).



N

T

Thus, we have

  5 3.5 5 3  k max r k r k 0 k ˆ √ √ V (k, F ) − V (k, F H ) = O p max , max . N T



L EMMA 4 For the matrix H k defined in Proposition 3.1, and for each k with k < r = r N,T → ∞, there exists a positive constant C such that

plim inf inf[V (k, F0 H k ) − V (r, F0 )] ≥ C > 0. N,T →∞ k

22

Proof. : V (k, F0 H k ) − V (r, F0 ) =

1 NT

=

1 NT

=

N

∑ Xi0 M0FH Xi −

i =1 N

1 NT

N

∑ Xi0 M0F Xi

i =1

1

1

1

N

∑ ( √r F0 λ0i + ei )0 M0FH ( √r F0 λ0i + ei ) − NT ∑ ei0 M0F ei

i =1 N

i =1

N

0 0 1 2 √ ∑ e0 M0 F0 λ0i λ0i F0 M0FH F0 λ0i + ∑ NTr i=1 NT r i=1 i FH

+

1 NT

N

0 ) ei ∑ ei0 ( PF0 − PFH

i =1

= A + B + D. 0 ≥ 0, thus I I I ≥ 0. For the first term, Notice that PF0 − PFH

1 N 00 00 0 0 0 λi F MFH F λi NTr i∑ =1 1 N  0 0 0 0 0 0 0 = MFH F λi MFH F λi NTr i∑ =1

A =

≥ C>0

because k < r and M0FH F0 λ0i , 0. Next,

N N 2 2 0 √ ∑ ei0 F0 λ0i − √ ∑ ei0 PFH F0 λ0i . NT r i=1 NT r i=1 Consider the first term 1 1 N N T 0 0 0 00 0 √ √ ei F λi = ∑ eit Ft λi NT r i∑ NT r i∑ =1 =1 t =1 

2 1/2 !1/2

T T N √ 1 1 1

1 0 2 0  √ √ r · ≤ k F k · e λ

∑ ∑ it i

t Tr t∑ N T t=1 Nr i=1 =1  √  r , = Op √ N where the last equality follows from assumption C2. The second term is also o p (1), and

B=

hence B = o p (1).



 n 2 o k r k2max r0.5 L EMMA 5 For any k with r ≤ k ≤ k max , V (k, Fˆ k ) − V (r, Fˆ r ) = O p max max , . N T

23

Proof. :

|V (k, Fˆ k ) − V (r, Fˆ r )| ≤ |V (k, Fˆ k ) − V (r, F0 )| + |V (r, F0 ) − V (r, Fˆ r )| ≤ 2 max |V (k, Fˆ k ) − V (r, F0 )|. r ≤k

Thus, it is sufficient to prove for each k with r ≤ k ≤ k max ,   2 k 1.5  k r r max max k 0 √ , √ V (k, Fˆ ) − V (r, F ) = O p max . N T Let H k be as defined in Proposition 3.1, with full row rank. Let the k × r matrix H k+ be the generalized inverse of H k such that H k H k+ = Ir . From X i = Xi =

1 √

r

F0 H k H k+ λ0i + ei . This implies that

1 √

r

F0 λ0i + ei , we have

1 1 √ Fˆ k H k+ λ0i + ei − √ ( Fˆ k − F0 H k ) H k+ λ0i r r 1 = √ Fˆ k H k+ λ0i + ui , r 1 ˆk 0 k √ where ui = ei − r ( F − F H ) H k+ λ0i . Xi =

Note that

V (k, Fˆ k ) = V (r, F0 ) = V (k, Fˆ k ) =

=

1 NT 1 NT

N

∑ ui0 MFkˆ ui ,

i =1 N

∑ ei0 M0F ei ,

1 NT

i =1 N 

1 NT

∑ ei0 MFkˆ ei −

+



i =1 N

1 ei − √ ( Fˆ k − F0 H k ) H k+ λ0i r

i =1

0

MFkˆ



 1 ˆk 0 k k+ 0 ei − √ ( F − F H ) H λi , r

N 0 0 2 √ ∑ λ0i H k+ ( Fˆ k − F0 H k )0 MFkˆ ei NT r i=1

1 N 00 k + 0 ˆ k λi H ( F − F0 H k )0 MFkˆ ( Fˆ k − F0 H k ) H k+ λ0i NTr i∑ =1

= a + b + c.

Because I − MFkˆ is positive semi-definite, x 0 MFkˆ x ≤ x 0 x. Thus c ≤

≤ = =

1 N 00 k + 0 ˆ k λi H ( F − F0 H k )0 ( Fˆ k − F0 H k ) H k+ λ0i ∑ NTr i=1 ! N 0 1 1 T ˆk k Ft − H k Ft0 k2 · ∑ kλ0i k2 k H k+ k2 T t∑ Nr =1 i =1   3  3 k r k O p max , · O p (kr ) N T   4 2 4  k r k r O p max , N T 24

by Proposition 3.1. For term b, we use the fact that |tr( A)| ≤ r k Ak for any r × r matrix A. Thus !! 2 1 N 0 k+ ˆ k 0 k 0 k √ tr H ( F − F H ) MFˆ b = ei λi N i∑ T r =1





Fˆ k − F0 H k 1 N



0 √ λ ≤ 2 · k H k+ k · e

· √ i i

TrN i∑

T =1 

2 1/2 !1/2

T T 1 1 1

1 k+ 0  k 0 k 2 ˆ √ √ ≤ 2 · kH k · · kF − F H k ∑ ei λi

∑ T t∑ N T t=1 Nr i=1 =1 (√ √ )! 3r k k3 · O p (1) = 2 · (kr )1/2 · O p max √ , √ N T   2  k r k2 r0.5 = O p max √ , √ N T by Proposition 3.1 and assumption C2. Therefore,   2  1 N 0 k k r k2 r0.5 k ˆ V (k, F ) = ei MFˆ ei + O p max √ , √ . NT i∑ N T =1 Thus we have   2 N N 2 r0.5  1 k r k 1 0 k 0 0 k 0 V (k, Fˆ ) − V (r, F ) = . ei PF ei − ∑ ei PFˆ ei + O p max √ N , √T NT i∑ NT =1 i =1 Note that

! −1

F 00 F 0

1 N 0 0 00 1 N 0 0

P e ≤ · e i F i

NT 2 ∑ ei F F ei NT i∑ T

=1 i =1

2

! −1

T

F 00 F 0

1 N

1 0

√ = Ft eit · r

· NT ∑

Tr t∑

T

=1 i =1

1 = r · O p (1) · · r · O p (1)  2 T    2 r k r k2 r0.5 . = Op ≤ O p max √ , √ T N T N 1 0 0 NT ∑i =1 ei PF ei is bounded by the sum of the first k largest eigenvalues of the matrix

A NT =

1 0 NT e e,

where e = (eti ), T × N. Let ρ( A) denote the largest eigenvalue of a matrix

−2 A. Under Assumption B6, as Bai and Ng (2005) shows, ρ( A NT ) = O p (CNT ), where C2NT =

min{ N, T }. Thus, 1 NT

N



i =1

ei0 PF0 ei



= O p max



k k , N T

 25



≤ O p max



k2 r k2 r0.5 √ , √ N T



.

In summary, we have shown that

 2   k max r k2max r0.5 k 0 ˆ V (k, F ) − V (r, F ) = O p max √ , √ . N T



Proof of Theorem 3.1 Proof. : We shall prove that lim N,T →∞ P( PC (k ) < PC (r )) = 0 for all k , r. Since PC (k ) − PC (r ) = V (k, Fˆ k ) − V (r, Fˆ r ) − (r − k ) g( N, T ),

it is sufficient to prove that P[V (k, Fˆ k ) − V (r, Fˆ r ) < (r − k ) g( N, T )] → 0 as N, T, k, r → ∞. Consider k < r. We have the identity:

V (k, Fˆ k ) − V (r, Fˆ r ) = [V (k, Fˆ k ) − V (k, F0 H k )] + [V (k, F0 H k ) − V (r, F0 H r )]

+ [V (r, F0 H r ) − V (r, Fˆ r )].

 n 8.5 8 o k max k . Next, Lemma 3 implies that the first and the third terms are both O p max √max , √ N

T

we consider the second item. Because F0 H r and F0 span the same column space, V (r, F0 H r ) = V (r, F0 ). Thus the second item can be rewritten as V (k, F0 H k ) − V (r, F0 ), which has a

positive limit by Lemma 4. Hence P[ PC (k ) < PC (r )] → 0 if (r − k ) g( N, T ) → 0 as N, T, k, r → ∞.

Next, for k ≥ r,

P[ PC (k ) − PC (r ) < 0] = P[V (r, Fˆ r ) − V (k, Fˆ k ) > (k − r ) g( N, T )].  n 3 o k k2.5 max By Lemma 5, V (r, Fˆ r ) − V (k, Fˆ k ) = O p max √max , √ . According to our setting, N T  o n 3 2.5 (k − r ) g( N, T ) converges to zero at a slower rate than O p max k√max , k√max . Thus, for N

k > r, P[ PC (k) < PC (r )] → 0 as N, T, k, r → ∞.

26

T



Appendix B: Figures Figure 1: Sensitivity of PC p1 Criterion to k max : 200/60 case

45

40

35

30

25

200/60 (7) 20

15

10

5

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

Note:The values of kˆ estimated by PC p1 for N = 200, T = 60 and r = 7 with k max ∈ [1, 40].

27

Figure 2: Sensitivity of PC p1 Criterion to k max

45

40

35

100/60 (6) 200/60 (7)

30 500/60 (9) 1000/60 (10)

25

2000/60 (11) 20

100/100 (6) 200/100 (7)

15 500/100 (9) 1000/100 (10)

10

2000/100 (11) 5

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

Note: Each line represents kˆ estimated by PC p1 for each case of different sample size. The notation in the graph shows the sample size and the true number of factors for each case. For example, 100/60(6) means that N = 100, T = 60 and r = 6.

28

References Ahn, S. and Horenstein, A. (2013). Eigenvalue ratio test for the number of factors. Econometrica, 81:1203–1127. Alessi, L., Barigozzi, M., and Capasso, M. (2010). Improved penalization for determining the number of factors in approximate factor models. Statistics and Probability Letters, 80:1806–1813. Bai, C., Li, Q., and Ouyang, M. (2014). Property taxes and home prices: a tale of two cities. Journal of Econometrics, 180:1–15. Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70:191–221. Ching, H. S., Hsiao, C., and Wan, S. K. (2012). Impact of CEPA on the labor market of Hong Kong. China Economic Review, 23:975–981. Connor, G. and Korajzcyk, R. (1993). A test for the number of factors in an approximate factor model. Journal of Finance, 48:1263–1291. Cragg, J. and Donald, S. (1997). Inferring the rank of a matrix. Journal of Econometrics, 76:223–250. Donald, S. (1997). Inference concerning the number of factors in a multivariate nonparameteric relationship. Econometrica, 65:103–132. Fan, J., Liao, Y., and Mincheva, M. (2011). High-dimensional covariance matrix estimation in approximate factor models. Annals of Statistics, 39:3320–3356. Fan, J., Liao, Y., and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B, 75:603–680. Forni, M. and Gambetti, L. (2010). Macroeconomic shocks and the business cycle: evidence from a structural factor model fiscal. Working Papers. Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2000). Reference cycles: the NBER methodology revisited. 29

Forni, M. and Reichlin, L. (1998). Let’s get real: a factor-analytic approach to disaggregated business cycle dynamics. Review of Economic Studies, 65:453–473. Giannone, D.and Reichlin, L. and Sala, L. (2005). Monetary policy in real time. In Gertler, M. and Rogoff, K., editors, NBER Macroeconomics Annual 2004, pages 161–200. Gregory, A. and Head, A. (1999). Common and country-specific fluctuations in productivity, investment, and the current account. Journal of Monetary Economics, 44:423–452. Hallin, M. and Liska, R. (2007). Determining the number of factors in the general dynamic factor model. Journal of American Statistical Association, 102:603–617. Hsiao, C., Ching, H. S., and Wan, S. K. (2012). A Panel Data Approach for Program Evaluation: Measuring the Benefits of Political and Economic Integration of Hong Kong with Mainland China. Journal of Applied Econometrics, 27:705–740. Jurado, K., Ludvigson, S., and Ng, S. (2015). Measuring uncertainty. American Economic Review, 105:1177–1216. Lewbel, A. (1991). The rank of demand systems: theory and nonparametric estimation. Econometrica, 59:711–730. Li, K. and Bell, D. R. (2017). Estimation of average treatment effects with panel data: theory and implementation. Forthcoming at Journal of Econometrics. Ludvigson, S. and Ng, S. (2007). The empirical risk-return relation: a factor analysis approach. Journal of Financial Economics, 83:171–222. Ludvigson, S. and Ng, S. (2009). Macro factors in bond risk premia. The Review of Financial Studies, 22:5027–5067. Ludvigson, S. and Ng, S. (2010). A factor analysis of bond risk premia. In Uhla, A. and Giles, D. E. A., editors, Handbook of Empirical Economics and Finance, pages 313–372. Mallows, C. L. (1973). Some comments on CP . Technometrics, 15:661–675. Onatski, A. (2009). Testing hypotheses about the number of factors in large factor models. Econometrica, 77:1447–1479.

30

Onatski, A. (2010). Determing the number of factors from empirical distribution of eigenvalues. Review of Economics and Statistics, 92:1004–1016. Ouyang, M. and Peng, Y. (2015). The treatment-effect estimation: A case study of the 2008 economic stimulus package of China. Journal of Econometrics, 188:545–557. Ross, S. (1976). The arbitrage theory of capital asset pricing. Journal of Finance, 13:341–360. Stock, J. H. and Watson, M. (1998). Diffusion indexes. NBER Working Paper. Stock, J. H. and Watson, M. (1999). Forecasting Inflation. Journal of Monetary Economics, 44:293–335.

31