Journal of Econometrics (
)
–
Contents lists available at ScienceDirect
Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom
Risks of large portfolios Jianqing Fan a,b,∗ , Yuan Liao c , Xiaofeng Shi a a
Department of Operations Research and Financial Engineering, Princeton University, United States
b
Bendheim Center for Finance, Princeton University, United States
c
Department of Mathematics, University of Maryland, United States
article
info
Article history: Available online xxxx JEL classification: C58 C38 Keywords: High dimensionality Factor models Principal components Sparse matrix Volatility
abstract The risk of a large portfolio is often estimated by substituting a good estimator of the volatility matrix. However, the accuracy of such a risk estimator is largely unknown. We study factor-based risk estimators under a large amount of assets, and introduce a high-confidence level upper bound (H-CLUB) to assess the estimation. The H-CLUB is constructed using the confidence interval of risk estimators with either known or unknown factors. We derive the limiting distribution of the estimated risks in high dimensionality. We find that when the dimension is large, the factor-based risk estimators have the same asymptotic variance no matter whether the factors are known or not, which is slightly smaller than that of the sample covariance-based estimator. Numerically, H-CLUB outperforms the traditional crude bounds, and provides an insightful risk assessment. In addition, our simulated results quantify the relative error in the risk estimation, which is usually negligible using 3-month daily data. © 2015 Elsevier B.V. All rights reserved.
1. Introduction The potential of a portfolio’s loss is termed as the portfolio risk. There are two types of portfolio risks. The systematic risk is the risk inherent to the entire market, such as risk associated with interest rates, currencies, recession, war and political instability, etc. The systematic risk cannot be diversified away, even with a well-diversified portfolio. In contrast, specific risk (or idiosyncratic risk) refers to the risk that affects a very specific group of securities or even an individual security. For example, it can be the risk of price changes due to the unique circumstances of a specific stock. Unlike systematic risk, specific risk can be reduced through diversification. Estimating and assessing the risk of a large portfolio is an important topic in financial econometrics and risk management. The risk of a given portfolio allocation vector wT is conveniently measured by (w′T 6wT )1/2 , in which 6 is a volatility (covariance) matrix of the assets’ returns. Often multiple portfolio risks are at interests and hence it is essential to estimate the volatility matrix 6. The problem becomes challenging when the portfolio size is large. Suppose we have created a portfolio from two thousand
∗
Correspondence to: Department of Operations Research and Financial Engineering, Sherrerd Hall, Princeton University, Princeton, NJ 08544, United States. Tel.: +1 609 258 3255. E-mail address:
[email protected] (J. Fan). http://dx.doi.org/10.1016/j.jeconom.2015.02.015 0304-4076/© 2015 Elsevier B.V. All rights reserved.
assets and invested in a part of selected assets. The covariance matrix 6 involved then contains over two million unknown parameters. Yet, the sample size based on one year’s daily data is around 252. It is hard to assess the estimation accuracy when the estimation errors from more than two million parameters are aggregated. Hence some regularization method is recommended to estimate and assess risks. We estimate and assess the risks of a given portfolio vector wT based on factor models. Two factor-based methods are compared, previously proposed by Fan et al. (2011, 2013). The first estimator assumes the factors to be known and observable. The second method deals with the case of unknown factors. In both cases, the factor model imposes a conditionally sparse structure, in that the idiosyncratic covariance is a large sparse matrix. This yields to an approximate factor model as in Chamberlain and Rothschild (1983), with a non-diagonal error covariance matrix. We provide a new and practical method to assess the accuracy of risk estimation w′T ( 6 − 6)wT . In the literature (e.g. Fan et al., 2012), this term has been bounded by
ξT = ∥wT ∥21 ∥ 6 − 6∥max where ∥wT ∥1 is the gross exposure of the portfolio, and is bounded when there are no extreme positions in the portfolio. However, this upper bound depends on the unknown 6, hence is not applicable in practice. In addition, numerical studies in this paper demonstrate that this upper bound is too crude: it is often of the same or even
2
J. Fan et al. / Journal of Econometrics (
larger scale than the estimated risk. In contrast, we provide a highconfidence level upper bound (H-CLUB) for w′T ( 6 − 6)wT , which is of much smaller scale and easy to compute in practice. H-CLUB is constructed based on the confidence interval for the true risk. For the risk estimator w′T 6wT and a given ϵ ∈ (0, 1), we find an H-CLUB U (ϵ) such that P (|w′T ( 6 − 6)wT | ≤ U (ϵ)) → 1 − ϵ. In contrast, P (|w′T ( 6 − 6)wT | ≤ ξT ) = 1. Hence H-CLUB is an upper bound for the risk estimation error with high confidence while the traditional bound ξT is of full confidence. For the inferential theory of the risk estimators with diversified portfolios, we prove that the effects of estimating the factor loadings and unknown factors are asymptotically negligible. Interestingly, it is found that when the dimensionality is larger than the sample size, the factor-based risk estimators have the same asymptotic variances no matter whether the factors are known or not. Hence the high dimensionality is in fact a bless for risk estimation instead of a curse from this point of view. In addition, the asymptotic variance of factor-based estimators is slightly smaller than that of the sample covariance-based estimator, but the difference is small. This demonstrates that the benefit of using a factor model is not in terms of a much smaller asymptotic variance, because the systematic risk cannot be diversified. Rather, factor models give a strictly positive definite covariance estimator, which is essential to estimate the optimal portfolio allocation vector, and also interprets the structure of the portfolio risks. Using our simulated results based on the model calibrated from the US equity market data, we are able to quantify the relative error of the estimation error or coefficient of variation, defined as STD(w′T 6wT )/w′T 6wT , where STD(·) denotes the standard error of the estimated risk. Interestingly, this ratio is just a few percent and is approximately independent of the gross exposure ∥wT ∥1 but sensitive to the length of the time series. On the other hand, we also quantify the relation between the crude bound and the practical H-CLUB. We find that ξT is many times larger than U (ϵ), and the ratio ξT / U (ϵ) increases as the gross exposure increases. A sampling technique that picks a random portfolio with a given gross exposure level is introduced, which can be useful for portfolio optimization and understanding the overall risks within a given level of gross exposure. The interest on large portfolios surges recently. Pesaran and Zaffaroni (2008) examined the asymptotic behavior of the portfolio weights. Brodie et al. (2009) and Fan et al. (2012) addressed the problem of portfolio selection using a regularization penalty. Gomez and Gallon (2011) numerically compared several methods of covariance matrix estimation for portfolio management. In particular, the optimal portfolio selection involves inverting an estimated 6, which is a challenging problem under a large number of assets. Gagliardini et al. (2010) considered a random coefficient model for an unbalanced panel, and focused on the observable factors, while we also study the inferential theory of the unobservable factor case. The recent works by Fan et al. (2011, 2013) are only concerned about covariance estimations and no inferential theories were studied. The literature is also found in Jacquier and Polson (2010), Antoine (2011), Chang and Tsay (2010), DeMiguel et al. (2009a,b), Ledoit and Wolf (2003), El Karoui (2010), Lai et al. (2011), Bannouh et al. (2012), Gandy and Veraart (2012), Bianchi and Carvalho (2011), among others. The rest of the paper is organized as follows. Section 2 introduces risk estimators based on factor models under both known and unknown factors. Section 3 constructs the H-CLUB for each risk estimator based on the confidence interval for risks. Section 4 derives the limiting distributions of the risk estimators and compares their asymptotic variances. Section 5 presents simulation results.
)
–
An empirical study is considered in Section 6. Finally, Section 7 concludes. All the proofs are given in the Appendix. N Throughout the paper, ∥wT ∥1 = i=1 |wi | is used to denote the gross exposure of a given portfolio allocation vector. For a square matrix A, λmin (A) and λmax (A) represent its minimum and maximum eigenvalues; ∥A∥1 = maxi j |Aij |. Let ∥A∥max and ∥A∥ denote its element-wise sup-norm and operator norm, given by 1/2 ∥A∥max = maxi,j |Aij | and ∥A∥ = λmax (A′ A) respectively. 2. Estimation of portfolio’s risks Let {Rt }Tt=1 be a strictly stationary time series of an N × 1 vector of observed excess returns and 6 = cov(Rt ), often known as the volatility matrix. The portfolio risk of a given allocation 6, vector wT is given by w′T 6wT . With a covariance estimator a straightforward estimator of the portfolio risk is
w′T 6wT . But
how good such a substitution estimator is and how to assess its estimation accuracy when the dimension N is large relative to T are the questions addressed here. The problem of estimating the risk of a given portfolio is challenging due to the high dimensionality of 6. In most cases N can be much larger than T . We assume 6 to be time-invariant within a short period, which holds approximately for locally stationary time series. Recently, Chang and Tsay (2010) proposed a Cholesky decomposition approach to estimating the large covariance matrix, and used simulation to assess its performance. A natural alternative approach is through the factor model (e.g. Stock and Watson, 2002; Bai, 2003), because the assets’ returns are usually driven by a few market factors. Estimating 6 is possible when both the factor and the idiosyncratic components can be estimated well. We thus consider three estimators of w′T 6wT for a given wT , based on three different estimators 6: sample covariance estimator, and factor model estimators with either observed or unobserved factors. 2.1. Sample-covariance-based estimator The first estimator 6 = S is the conventional sample covariance matrix based on {Rt }Tt=1 . Because we are mainly concerned about the variance, for simplicity and exposition, let us assume that the T returns have mean zero and S = T −1 t =1 Rt R′t . The asymptotic impact of using S on the risk management has been studied by Fan et al. (2008, 2012) when N is much larger than T . The sample covariance estimator does not require any structural assumption on the assets’ returns. It was shown by the aforementioned authors that for a given portfolio wT with a bounded gross exposure (that is, ∥wT ∥1 is bounded),
wT (S − 6)wT ≤ ∥wT ∥ ∥S − 6∥max = Op ′
2 1
log N T
.
However, when N > T , it is well known that S is singular, and therefore may result in an estimated risk close to zero for certain portfolios. 2.2. Estimating risks based on factor models We assume the true data generating process (DGP) of Rt to be an ‘‘approximate factor model’’ (Chamberlain and Rothschild, 1983): Rt = Bft + ut ,
t ≤ T,
(2.1)
where B is an N × K matrix of factor loadings; ft is a K × 1 vector of common factors, and ut is an N × 1 vector of idiosyncratic error components. In contrast to N and T , here K is assumed to be fixed. The common factors may or may not be observable. For
J. Fan et al. / Journal of Econometrics (
example, Fama and French (1992) identified three known factors that have successfully described the US stock market. On the other hand, in an empirical study, Bai and Ng (2002) determined two unobservable factors for stocks traded on the New York Stock Exchange during 1994–1998. Let cov(ft ) and 6u = cov(ut ) denote the covariance matrices of ft and ut , K × K and N × N respectively. Suppose ft and ut are uncorrelated. The factor model then implies the following decomposition of 6:
6 = B cov(ft )B + 6u . ′
(2.2)
We shall assume that 6u is a sparse matrix in the sense that many off-diagonal elements of the covariance are either zero or nearly so. The rationale of the sparsity here is, after the common factors are taken out, the remaining idiosyncratic components should be mostly weakly correlated. The decomposition (2.2) also implies that the first K eigenvalues of 6 grow at the rate O(N ), due to those of B cov(ft )B′ , while the remaining eigenvalues are bounded away from both zero and infinity. See, e.g., Onatski (2010) and Ahn and Horenstein (2013). 2.2.1. Known-factor-based estimator We first consider the case of observable factors, and construct an estimator of 6 based on thresholding the covariance matrix of idiosyncratic errors. Suppose B is the least squares estimator of B. The residual sample covariance matrix of ut is then given by T ¯ )( ¯ )′ = (Su,ij )N ×N , ( ut − u ut − u ut = Rt − Bft ,
Su = T −1
t =1
¯= u
T 1
T t =1
N λk ξk2,i , ij = k=K+1 Ω N λk ξk,i ξk,j , sij
and
|sij (z ) − z | ≤ τijf .
(2.3)
The thresholding parameter is taken to be, for some positive constant C > 0,
τ = C Su,ii Su,jj
log N T
. f
A simple example is the hard-thresholding sij (z ) = zI (|z | ≥ τij ),
√
namely, setting all correlation coefficients smaller than C log N /T f to zero. The soft-thresholding rule is given by sij (z ) = (z − τij )+ . See Antoniadis and Fan (2001), Rothman et al. (2009) and Cai and Liu (2011) for detailed discussions of various thresholding functions. Let
u,ij = Σ
Su,ii , sij (Su,ij ),
i=j i ̸= j.
(ft ) denote the sample covariance of the common factors. Let cov Define the estimated covariance matrix as (ft ) 6f = B cov B′ + 6u ,
u,ij )N ×N . 6u = (Σ
i=j
i ̸= j
k= K +1
where sij (·) is the same adaptive thresholding function as before, based on an entry-dependent threshold τijP :
jj ii Ω τ =C Ω P ij
log N
1
+√
T
N
.
Here C is a user-specified constant to maintain the finite sample positive definiteness. Even when N /T → ∞, there is C ∗ > 0 such that for any C > C ∗ , both 6f and 6P , K are strictly positive definite for a given finite sample. Previous simulated and empirical studies suggested that C = 0.5 is a good choice when sij is the soft thresholding (see Fan et al., 2013; Fryzlewicz, 2012). The number of factors can be estimated by the information criterion as in Bai and Ng (2002):
+
f
ij )N ×N , = (Ω
j =1
ut .
when |z | ≤ τij ,
3
′ ξ j ξj + , λj
0≤k≤M
f
K
6P , K =
K = arg min
function and for some thresholding parameter τij > 0,
f ij
–
The POET works as follows: let λ1 ≥ · · · ≥ λN be the ordered eigenvalues of the sample covariance S, whose corresponding eigenvectors are denoted by { ξ j }Nj=1 . We then estimate 6 by
Let sij (·) : R → R be an entry-dependent adaptive thresholding
sij (z ) = 0
)
1 N
k(N + T ) NT
tr
N
′
λj ξ j ξj
j=k+1
log
NT N +T
,
(2.5)
where M is a prescribed upper bound. The accuracy of the risk estimation w′T ( 6P , K − 6)wT is robust to over-estimating K , as the variance introduced by unnecessary factors are usually small. This is verified in our simulated studies. Based on the factor model, our proposed risk estimator is either
w′T 6f wT or
w′T 6P , K wT for a given portfolio allocation
vector wT , depending on whether ft is observable. This paper focuses on the limiting distributions of these risk estimators and their assessment for a given diversified wT . We will see that under high dimensionality, the factor-based estimators have the same asymptotic variance, and is smaller than that of the sample covariance-based estimator. 3. Assessment of the risk estimation This section proposes a new method to assess the estimated risks for a given portfolio allocation vector wT . We will assume ∥wT ∥1 ≤ c for some c ≥ 1, where ∥wT ∥1 is the gross exposure of the portfolio. This prevents extreme positions. For simplicity, we shall also assume the number of factors to be fixed, which is easy to be relaxed to allow for slowly-growing K .
(2.4)
2.2.2. Covariance matrix estimator with unknown factors When the common factors are unobservable, we estimate 6 by ‘‘principal orthogonal complements thresholding’’ (POET), recently proposed by Fan et al. (2013). Because K , the number of factors, might also be unknown, this estimator uses a data-driven number of factors K.
3.1. Data-dependent portfolio vector One of the simplest examples of wT is the equally weighted (1/N) portfolios. DeMiguel et al. (2009b) empirically showed how such simple strategies can outperform more sophisticated strategies. There are also many methods proposed to choose datadependent portfolios when the number of assets becomes large. Often one can estimate wT by solving constrained data-dependent
4
J. Fan et al. / Journal of Econometrics (
optimization problems, or directly estimate the unknown parameters when the ideal portfolio wT depends on some unknown quantities (e.g., Brandt et al. (2009), El Karoui (2010), Yen (2013), Lai et al. (2011)). While this paper focuses on assessing the risk of a given T , which portfolio vector wT , we allow to use an estimated vector w consistently estimates wT in the L1 -norm:
∥ wT − wT ∥1 = op (1).
(3.1)
The L1 -convergence is the right notion for consistency in the current context, because the portfolio vector itself has finite gross exposure ∥wT ∥1 . This paper will study the statistical inference ′T 6w T , which is the true risk of such types of dataabout w dependent portfolios and strategies. Example 3.1. The global minimum variance portfolio is the solution to the problem: gmv
wT
= arg min(w′ 6w), w
such that w′ e = 1
gmv wT
−1 6 e = , −1 6 e e′
−1
6
−1 6f = −1 6P , K,
addition, our simulation results have shown that the upper bound toy example.
Example 3.2. Consider three stocks with annualized returns that jointly follow a multivariate Gaussian distribution N3 (0, 6) where 6 = 0.04 × I3 . The equal-weight portfolio wT = (1/3, 1/3, 1/3)′ is constructed. Our task is to estimate the portfolio risk using the sample covariance matrix S based on the simulated 21-day (one month) returns. The true value of portfolio variance is w′T 6wT = 0.0133, which corresponds to a true risk, by definition, (w′T 6wT )1/2 = 11.55% per annum. Based on a simulated data set, the estimated portfolio variance w′T SwT = 0.0131, equivalent to a perceived risk (w′T SwT )1/2 = 11.43% per annum. In addition, for this realization, we also get ξT = ∥wT ∥21 ∥6 − S∥max = 0.0248. Based on this upper bound, a simple calculation shows that w′T 6wT ∈ [0, 0.0379]. In other words, the true risk (w′T 6wT )1/2 lies in [0, 19.46%], an interval that is too wide to be meaningful.
−1
gmv gmv ∥ wT − wT ∥1 ≤
1
The toy example is typical in the sense that ξT is already too crude for small portfolios. In statistical inference, often people use bounds of high confidence levels instead, e.g., quantities that bound ∆ with a high probability. This paper pursues such a highconfidence-level upper bound (H-CLUB) based on the confidence interval.
We propose a confidence upper bound for ∆ = | w′T ( 6 − 6) wT | to assess the estimation error of the portfolio risks. More specifically, for each proposed matrix estimator 6 and any given ϵ > 0, we find a quantity U (ϵ) such that for all large N and T , Therefore, U (ϵ) is an asymptotic (1 − ϵ)100% confidence upper bound for ∆, which is obtained based on the limiting distribution ′T T . In addition, it is data-driven (up to user-specified tuning of w 6w parameters), hence can be easily calculated in practice and used to construct confidence intervals for the true risks.
−1
= 6f or 6P , K, −1 e′ 6 e
P (| w′T ( 6 − 6) wT | ≤ ξT ) = 1.
P (| w′T ( 6 − 6) wT | ≤ U (ϵ)) ≥ 1 − ϵ.
−1 −1 ∥ 6P , ∥1 = op (1). K −6 −1
is,
3.3. High-confidence-level upper bound
unknown factors. gmv
This implies, for 6
Note that the inequality (3.2) holds for every sampling sequence
{Rt }Tt=1 . Hence ξT is in fact an upper bound of full confidence, that
known factors;
T , note that under the To investigate the consistency of w condition that the factors are pervasive (Assumption 4.3(ii) below), the factor-based inverse covariance estimators of Fan et al. (2011, 2013) are L1 -consistent, that is, −1 ∥ 6f − 6−1 ∥1 = op (1),
–
ξT is actually too crude to be useful. Let us consider the following
gmv
where e = (1, . . . , 1), yielding wT = 6−1 e/(e′ 6−1 e). Although this portfolio does not belong to the efficient frontier, Jagannathan and Ma (2003) showed that its performance is comparable with 1 those of other tangency portfolios. Suppose ∥6− u ∥1 = O(1) and e′ 6−1 e ≥ CN for some C > 0 and all large N (in the factor model, 1 −1 a sufficient condition is ∥e′ 6− ∥1 = O(1) u B∥ = o(N )). Then ∥6 gmv gmv −1 −1 and ∥wT ∥1 ≤ (CN ) ∥6 ∥1 N = O(1), so wT has a finite gross exposure. gmv When N > T , we can estimate wT using a factor model. This yields a data-dependent portfolio:
)
−1 ∥( 6 − 6−1 )e∥1 −1
+
|e′ ( 6 − 6−1 e)| −1 ∥6 e∥1 . −1 (e′ 6 e)(e′ 6−1 e)
4. Limiting distributions of risk estimators −1
Note that there is C > 0 so that e′ 6−1 e ≥ CN and e′ 6 e ≥ CN with probability approaching one. The first term on the right-hand−1
side is bounded by Op (N −1 )∥ 6
− 6−1 ∥1 N = op (1); the second − 1 term is bounded by Op (N −2 ∥ 6 − 6−1 ∥N )∥6−1 ∥1 N = op (1). v gm Consequently, w is L -consistent. 1 T
4.1. Regularity conditions We will treat N as an increasing function of T . Hence N grows via a fixed trajectory, e.g., N = NT = T α for some α > 0, and can be faster than T , namely, α > 1. T , We shall show that for a well behaved diversified w
√ 3.2. Measuring risks using full confidence bound One of the main problems to be addressed is to assess the risk error ∆ = | w′T ( 6 − 6) wT | when N > T . A commonly used upper bound for ∆ is based on the following inequality:
∆ ≤ ∥ wT ∥21 ∥ 6 − 6∥max ≡ ξT ,
(3.2)
which is asymptotically tight for risk assessment in the sense that ξT stochastically converges to zero. However, for the purpose of statistical inference, ξT is infeasible as it depends on the true 6. In
T 1 ′T ( Tw 6 − 6) wT = √ ZT ,t + op (1), T t =1
where in the sample covariance based risk estimator, ZT ,t = (w′T Rt )2 − E (w′T Rt )2 ; in the factor based risk estimator, ZT ,t = (w′T Bft )2 − E (w′T Bft )2 . Hence the limiting distribution of the T T estimated risk depends on that of t =1 ZT ,t . Note that {ZT ,t }t =1 is a triangular array of weakly dependent random variables, and a triangular array central limit theorem for weakly dependent time series data (e.g., Peligrad, 1996) can be applied. For this purpose, some technical assumptions are in order.
J. Fan et al. / Journal of Econometrics (
4.1.1. Data generating process Assumption 4.1. (i) Assume that (2.1) holds. In addition, {Rt , ut , ft }Tt=1 is strictly stationary, {ut }Tt=1 and {ft }Tt=1 are independent, and Euit = Efjt = 0 for all i, j. (ii) There exist r1 , r2 ∈ (0, 2] and b1 , b2 > 0, such that for any s > 0, P (|uit | > s) ≤ exp(−(s/b1 ) ), r1
P (|fjt | > s) ≤ exp(−(s/b2 ) ). r2
(iii) There is C > 0 such that C −1 < λmin (6u ) ≤ λmax (6u ) < C , ∥B∥max < C and λmin (cov(ft )) > C −1 . For simplicity, we assume the true DGP to be a factor model for all the three risk estimators. In fact, when estimating 6 using the sample covariance only, assuming the factor structure on Rt is not necessary, as long as the exponential-tail condition is assumed directly on Rt . The exponential-tail condition (ii) enables us to apply the large deviation theory to control the uniform convergence of T T 1 maxi≤N ∥ T1 t =1 uit ft ∥ and maxi,j≤N | T t =1 Rit Rjt −ERit Rjt |, which are needed for the consistent estimation of high-dimensional covariance matrices. On the other hand, when financial noises or factors have heavier tails, the analysis will be much more technically involved for non-i.i.d. data. Research along that line will be left for the future. Note that in the factor model being considered: Rt = Bft + ut , the exponential-tail condition also ensures that both the error term and the factors have bounded finite moments, which also implies that w′ 6w indeed exists. Moreover, it is standard in the literature to assume that the eigenvalues of covariance matrices for ft and ut are bounded away from both zero and infinity. The factor model is assumed to be conditionally sparse as follows: Assumption 4.2. There is q ∈ [0, 1) such that sN ≡ max i≤N
N
|Σu,ij |q = o(min{(T / log N )(1−q)/2 , N (1−q)/2 }). (4.1)
j =1
When q = 0 we define sN = maxi≤N j=1 I (Σu,ij ̸= 0) as the maximum number of non-vanishing elements in each row. Assumption 4.2, though slightly stronger than those in Chamberlain and Rothschild (1983), is meaningful in practice. For example, when the idiosyncratic components represent firms’ individual shocks, they are either uncorrelated or weakly correlated among the firms across different industries, because the industry specific components are not pervasive for the whole economy (Connor and Korajczyk, 1993). Technically, this assumption is easy to satisfy by many sparse covariances as long as T / log N → ∞. For instance, when 6u is a diagonal matrix, sN = 1; when 6u is a block-diagonal matrix with uniformly bounded block sizes, sN is the maximum block size; when 6u is a banded matrix, sN is the width of the bands. In all these cases q = 0, and sN is bounded, while the right hand side of (4.1) diverges fast. Recently, Gagliardini et al. (2010) discussed a detailed example of block dependence and its relation to the sparsity assumption and the approximate factor structure. The following assumption is standard for high-dimensional factor models (e.g., Bai, 2003; Bai and Ng, 2002; Stock and Watson, 2002). Under these assumptions, the unknown factors and loadings can be consistently estimated.
N
Assumption 4.3. (i) There is M > 0 such that, N E [N −1/2 (u′s ut − Eu′s ut )]4 < M and E ∥N −1/2 i=1 bi uit ∥4 < M. (ii) As N → ∞, the eigenvalues of B′ B/N are bounded away from both zero and infinity.
)
–
5
4.1.2. Weakly dependent data The autoregressive function of ZT ,t plays a central role in the ′T ( asymptotic variance of ∆ = w 6 − 6) wT . For the sample covariance based risk estimator, the relevant object is ZT ,t = (w′T Rt )2 − E (w′T Rt )2 . Hence we define the autoregressive function
γT (h) = cov((w′T Rt )2 , (w′T Rt +h )2 ),
h∈Z
which depends on T through dim(wT ) = N = NT . For factor-based risk estimator, the relevant autoregressive function is defined for ZT ,t = (w′T Bft )2 − E (w′T Bft )2 :
γf (h) = cov((w′T Bft )2 , (w′T Bft +h )2 ) h ∈ Z. Accordingly, the confidence interval of w′T 6wT depends on
σT2 = γT (0) + 2
∞
γT (h),
σf2 = γf (0) + 2
h=1
∞
γf (h).
(4.2)
h=1
We make the following assumption on the strength of the serial dependence. Assumption 4.4. As T , N → ∞, T T 1 2 2 (i) T1 h=1 hγT (h) = o(σT ), T h=1 hγf (h) = o(σf ), (ii) When the factors are unknown, σf2 N /T → ∞.
Condition (i) is standard for stationary weakly-dependent time series data. Let ρ(h) denote the autocorrelation function of ZT ,t , corresponding to either γT (h) or γf (h). Then by the ∞dominated convergence theorem, a sufficient condition is that h=1 |ρ(h)| < ∞ and 1 + 2 h≥1 ρ(h) is bounded away from zero, which is satisfied by many standard time series models such as ARMA(1,1). Condition (ii) is needed only when the common factors are not observable, so that a large dimension N helps to estimate the unknown factors accurately. This condition requires at least N /T → ∞ in order for the effect of estimating the common factors ′T ( to be negligible in the asymptotic expansion of w 6 − 6) wT . See more explanations on this effect in Remark 4.4. Next, we state the α -mixing condition. This condition enables us to apply the central limit theorem for the triangular array 0 weakly dependent data. Let F−∞ and FT∞ denote the σ -algebras generated by {(Rt , ft , ut ) : −∞ < t ≤ 0} and {(Rt , ft , ut ) : T ≤ t < ∞} respectively. Define the mixing coefficient
α(T ) =
|P (A)P (B) − P (AB)|.
sup 0 A∈F−∞ ,B∈FT∞
Assumption 4.5. There exist r3 > 0, M1 > 0 and M2 ∈ (0, 1) such that: for all T ∈ Z+ ,
α(T ) ≤ exp(−M1 T r3 ),
and
α(T ) ≤ M2 min{γT (0), γf (0)}.
The stated strong-mixing condition is also standard in the literature (e.g., Merlevède et al. (2011)). ∞Note that it also follows from the α -mixing condition that h=1 |γT (h)| = O(1) and ∞ |γ ( h )| = O ( 1 ) (see Lemma B.6 in the Appendix) and hence f h =1 Assumption 4.4 holds easily as noted before. Moreover, we also show in Lemma B.5 that γT (0) = O(1) and γf (0) = O(1). Hence by the definition of (4.2),
σf2 ≤ γf (0) + 2
∞
|γf (h)| = O(1).
h =1
Similarly, σT2 = O(1).
6
J. Fan et al. / Journal of Econometrics (
4.1.3. Diversified portfolio We are particularly interested in the risks of diversified portfolios. These portfolios diversify away the idiosyncratic risk w′T 6u wT , and hence the risks are mainly contributed by the factor risks w′T B cov(ft )B′ wT . A diversified portfolio vector wT should not be dominated by a few of the cross-sectional units, and should satisfy the following technical condition: 1 1/2−q/2 Assumption 4.6. ∥wT ∥1 = O(1), w′T 6u wT = o(σf s− N N −1 −q / 2 ′ 4 −(1−q)/2 −1/2 T ), and wT 6u wT = o(σf + σf sN T (log N ) ).
Assuming ∥wT ∥1 = O(1) ensures that the portfolio has a bounded exposure, which implies w′T 6wT = O(1) and w′T 6wT = Op (1) because ∥ 6∥max = Op (1). This is formally proved by Lemma A.1 in the Appendix. The required upper bounds on w′T 6u wT is a technical condition for our asymptotic analysis. Since the eigenvalues of 6u are bounded away from both zero and infinity, this condition can also be understood as requiring the same upper bounds on ∥wT ∥22 . While the required upper bounds seem complicated and technical, the intuition is clear: ∥wT ∥2 should decay as N → ∞, so that wT is diversified. Conditions of a similar spirit can be found, for instance, in Chudik et al. (2011, Assumption 2.2). Moreover, recall that q and sN ≥ 1 are defined in Assumption 4.2. In the usual case of q = 0 and sN = O(1), the required upper bounds in the above assumption are simplified to σf N 1/2 T −1/2 and σf4 + σf (log N )−1/2 respectively. The above condition is also related to the ‘‘risk ratio’’: w′T 6u wT w′T B cov(ft )B′ wT
,
the ratio of the idiosyncratic risk relative to the factor risk. Assumption 4.6 then requires the idiosyncratic risk to be dominated by the factor risk. To illustrate the intuition, consider the following example. Example 4.1. Consider a one-factor model on the asset returns with var(ft2 ) > 0. Suppose {ft }Tt=1 are independent across t, and thus σf2 = γf (0) = (w′T B)4 var(ft2 ). It is straightforward to show that Assumption 4.6 is satisfied as long as: w′T 6u wT ′
wT
BB′ w
= o(min{1/ log N , N /T }).
(4.3)
T
This condition is often satisfied by a diversified portfolio wT . For example, the equal-weight allocation wT = (1/N , . . . , 1/N ) gives N w′T 6u wT = O(N −1 ). For CN = |N −1 i=1 bi | = ∥w′T B∥, (4.3) holds as long as (log N )/N 2 = o(CN4 ) and T /N 3 = o(CN4 ). This is true since CN is often bounded away from zero and T /N 3 → 0.
)
–
4.2. Sample covariance based risk estimator Let us start with the risk estimator based on the sample covariance matrix S. The estimation error has an asymptotic expansion T ′ ′ 2 2 ′T (S−6) w wT = T1 t =1 ZT ,t + R, where ZT ,t = (wT Rt ) − E (wT Rt ) . T , The remaining term R arises from the use of estimated portfolio w which is asymptotically negligible under Assumption 4.7. To con′T 6w T , let us estimate struct the confidence interval for the risk w the autocovariance function γT (h) by
γ ( h) = T −1
T −h ′T Sw T )(( ′T Sw T ). (( w′T Rt )2 − w w′T Rt +h )2 − w t =1
T )2 . We employ In particular, γ (0) = T −1 t =1 ( w′T Rt )4 − ( w′T Sw the Newey–West estimator ∞with Bartlett kernel to estimate the variance σT2 = γT (0) + 2 h=1 γT (h): T
σ = γ (0) + 2 2
L
1−
h=1
h
L
γ (h),
(4.4)
where L = L(T ) → ∞ is a truncation parameter (see Newey and West, 1987). This estimator is always nonnegative in finite sample. We are now ready to define the H-CLUB US (ϵ) under the confidence level (1 − ϵ)100%, which is data-driven once a userspecified L is determined. Let zϵ/2 denote the upper ϵ/2 quantile of the standard normal distribution. Let
√ US (ϵ) = zϵ/2 σ/ T.
Lemma 4.1. Under Assumptions 4.1, 4.4 and 4.5, suppose L3 = 4 2 o(T σT ), LσT → ∞, and h>L γ (h) = o(σT2 ), then
| σ − σ | = op (σ ) and US (ϵ) = o 2
2 T
2 T
log N T
.
Remark 4.1. The conditions L3 = o(T σT4 ), LσT2 → ∞, and h >L γ (h) = o(σT2 ) are called to control the aggregated errors of estimating γT (h) and the ∞estimation bias when we use a truncated sum to approximate h=1 γT (h). These estimation errors often arise in estimating the covariances with serially correlated data (see Newey and West, 1987; Andrews, 1991), and the required conditions are satisfied with a slowly growing L, e.g., L3 = o(T ) when σT2 is bounded away from zero.
The following theorem gives the limiting distribution of the estimated risk. It also demonstrates that US (ϵ) is a valid H-CLUB for | w′T (S − 6) wT |.
T Finally, the following condition controls the error of using w to estimate wT .
Theorem 4.1. Under the assumptions of Lemma 4.1, as T → ∞ and N = NT → ∞,
Assumption 4.7. For r1 , r2 defined in Assumption 4.1,
∥ wT − wT ∥1 = op (min{σ , σ } 2 T
2 f
−1 ∥ wT − wT ∥1 = Op (∥ 6 − 6−1 ∥1 ) = Op sN
log N T
−1/2 ′T (S − 6) Tw wT →d N (0, 1),
and for any ϵ > 0,
This condition is only slightly stronger than the L1 -consistency ∥ wT − wT ∥1 = op (1), because often σT2 and σf2 are bounded away from zero, and log(NT ) is growing slowly. As an example, for the global minimum variance portfolio in Example 3.1, suppose σT2 > 0 and σf2 > 0,
T var (w′T Rt )2 t =1
× min{(log NT )−4/r1 , (log T )−4/r2 }).
(1−q)/2
.
Hence Assumption 4.7 is satisfied so long as sN does not grow too fast (in fact, for many examples of sparse matrices, sN is often bounded).
P | w′T (S − 6) wT | ≤ US (ϵ) → 1 − ϵ.
By the delta-method, we have the following corollary for the risk estimation. Define Rˆ ( wT ) =
′T Sw T , w
R( wT ) =
′T 6w T . w
Corollary 4.1. Under the assumptions of Lemma 4.1, for any ϵ > 0, as T , N → ∞,
P
′T Sw T → 1 − ϵ. |Rˆ ( wT ) − R( wT )| ≤ US (ϵ)/ 4w
J. Fan et al. / Journal of Econometrics (
Let us now approach the problem in the factor model. We assume Rt = Bft + ut , where in this section, {ft }Tt=1 are observed common factors. We show that the dominating term in the asymp′T ( totic expansion of w 6f − 6) wT is T 1
(w′T Bft )2 − E (w′T Bft )2 .
T ] ′T (ft ) [( w′T Bft +h )2 − w B cov B′ w
t =1
T ], ′T (ft ) × [( w′T Bft )2 − w B cov B′ w where B is the least squares estimator of B. For some L = L(T ) → ∞, apply the Newey–West estimator to estimate σf2 = γf (0) +
∞
2
h =1
γf (h):
σf2 = γf (0) + 2
L l =1
1−
h
L
γf (h),
(4.5)
which is always nonnegative. Define
√ Uf (ϵ) = zϵ/2 σf / T .
Let β = 3(r1−1 + r2−1 + r3−1 ), where r1 , r2 , r3 are defined as in Assumptions 4.1 and 4.5. 2β+2
Lemma = o(T ), and the truncation sat) √4.2. Suppose (log N 2 2 isfies L (L + log N )/T + h>L γf (h) = o(σf ), and Lσf → ∞. Under Assumptions 4.1, 4.2 and 4.4–4.6,
| σf2 − σf2 | = op (σf2 ),
and Uf (ϵ) = o
√
log N T
.
Remark 4.2. The condition L (L + log N )/T + h>L γf (h) = o(σf2 ) and Lσf2 → ∞ ensure that the effect of estimating the limiting distribution is asymptotically negligible. The first term √ L (L + log N )/T represents the error of estimating γf (0) +
L ∞l=1 (1−h/L)γf (h), while h>L γf (h) arises from approximating h=1 γf (h) by a truncated sum. These conditions are easily satisfied with a slowly-growing L, for instance, when L3 = o(T ) and σf2 is bounded away from zero. 2
The following theorem shows that Uf (ϵ) is a valid H-CLUB for the risk estimation error. Technically, the estimation error for the factor loadings is asymptotically negligible even under high dimensionality. Theorem 4.2. Suppose that the common factors are observable, and that the thresholded 6f (2.4) is used as the covariance estimator. Under the assumptions of Lemma 4.2,
T var (w′T Bft )2
−1/2 ′T ( Tw 6f − 6) wT →d N (0, 1),
t =1
and for any ϵ > 0, P | w′T ( 6f − 6) wT | ≤ Uf (ϵ) → 1 − ϵ.
T to estimate R( wT 6f w wT ) = ′
=
T , then applying the wT 6w
′
delta-method yields P
T → 1 − ϵ. ′T |Rˆ f ( wT ) − R( wT )| ≤ Uf (ϵ)/ 4w 6f w
(ft ) = T −1 Tt=1 ft f′t , autocovariance function of (w′T Bft )2 . For cov define T −h
7
T is a valid H-CLUB for |Rˆ f ( ′T Hence Uf (ϵ)/ 4w wT ) − R( wT )|. 6f w
Hence the estimation error of the risk only comes from the systematic error brought by the common factors, and the risk component ′T ( w 6u − 6u ) wT introduced by the idiosyncratic error can be diversified away by a selected portfolio allocation vector. To construct H-CLUB, we need to first estimate γf (h), the
γf (h) = T −1
–
Remark 4.3. Similar to Corollary 4.1, if we use Rˆ f ( wT )
4.3. Factor-based risk estimator
T t =1
)
It is also interesting to compare Uf (ϵ) with US (ϵ) and see if knowing the factor structure results in a reduced upper bound. This is equivalent to comparing the asymptotic variances of the estimated risks between a pure nonparametric estimator (sample covariance) and an estimator based on factor models. We will see in the following subsection that the factor-based risk estimator indeed gives a slightly smaller asymptotic variance. Recently Gagliardini et al. (2010) considered a similar covariance thresholding problem for the risk premium in a random coefficient panel model, where the factors are assumed to be observable. They studied the inferential theory for estimating the risk premium. While we are based on similar frameworks of the conditionally sparse factor model, we focus on the inferential theory of the estimated risks for a given portfolio vector, and the impacts on risks from estimating large covariance matrices. In the following subsection, we also study the unobservable factor case. 4.4. Risk estimation with unknown factors When the market assets’ returns are driven by a few unknown factors, one needs to handle the difficulty of not knowing the common factors in estimating the risk covariance matrix. In this case, the covariance estimator of Fan et al. (2013) is defined by
6P , K =
K
′ λj ξ j ξj +
(4.6)
j =1
as described in Section 2.2. The risk estimator for a given portfolio T is then w ′T T , which incorporates the case of unknown w 6P , Kw number of factors K . As we shall demonstrate below, using the consistent estimator of K does not affect the asymptotic behavior of the covariance estimator. In addition to the systematic T risk t =1 ZT ,t as before, there are three other components in the asymptotic expansion of the risk: effects of estimating the unknown loadings, factors, as well as the idiosyncratic risk. All these three components are asymptotically negligible. Under the conditional sparsity condition, Fan et al. (2011, 2013) showed that, when the common factors are observable,
−1 ∥ 6f − 6−1 ∥ = Op sN
log N
1/2−q/2
T
.
(4.7)
When the common factors are unobservable,
−1 −1 ∥ = Op sN ∥ 6P , K −6
log N T
+
1 N
1/2−q/2 (4.8)
where q and sN are defined in Assumption 4.2. The term 1/N in (4.8) is the price for not knowing ft . When T = o(N log N ), the above convergence rates are the same. This also explains the condition (iii) in Assumption 4.4, which requires N /T → ∞ in the case when ft is unobservable. Intuitively, as the dimensionality increases, more information about the common factors is collected, and eventually the common factors can be treated as though they are known. Therefore, the effect of estimating the unknown factors ′T T and w ′T T on the estimated risk is negligible, and w 6P , 6f w Kw
8
J. Fan et al. / Journal of Econometrics (
have the same limiting distribution. This is often true for the asset returns’ time series data. The number of assets can be in thousands while the sample size on the monthly returns over ten years is slightly larger than a hundred. To define an H-CLUB for a factor model with unknown factors, we first apply the principal components method (e.g., Stock and Watson, 2002) to estimate σf2 . Let F = ( f1 , . . . , fT ) be a
√ K × T matrix such that the rows of F/ T are the eigenvectors corresponding to the K largest eigenvalues of the T × T matrix R′ R, where R = (R1 , . . . , RT ). Let B = R F′ /T . Define γP (h) = T −1
T −h T ] ′T [( w′T B ft +h )2 − w B B′ w t =1
T ]. ′T × [( w′T B ft )2 − w B B′ w For some L = L(T ) → ∞, let
σ = γP (0) + 2 2 P
L
1−
h
h =1
L
γP (h),
UP (ϵ) = zϵ/2 σP2 /T .
Lemma 4.3. Suppose conditions in Lemma 4.2 are satisfied and L = √ o( N σf2 ). Under Assumptions 4.1–4.6,
| σ − σ | = op (σ ), 2 f
2 f
UP (ϵ) = o
log N
T
.
Remark 4.4. Compared √ to the conditions in Lemma 4.2, the extra requirement L = o( N σf2 ) controls the effect of estimating the unknown factors. So the rates of convergence of estimating 6u and σf2 are the same when N (log N )/T → ∞. Intuitively, as the number of unknown factors is O(T ), we require more cross sectional units (N) to accurately estimate them. This is also seen in the results in Bai (2003) and Fan et al. (2013). The following theorem shows that UP (ϵ) is an H-CLUB for ′T ( ′T T and w ′T T have w 6P , wT . Interestingly, w 6P , 6f w K − 6) Kw the same asymptotic limiting distribution. The price paid for not knowing the factors is asymptotically negligible. Theorem 4.3. Suppose the common factors are unobservable, and
6P , K (4.6) is used as the covariance estimator. Under the assumptions of Lemma 4.3,
T var (w′T Bft )2
−1/2 ′T ( Tw 6P , wT →d N (0, 1), K − 6)
t =1
and for any ϵ > 0, P | w′T ( 6P , wT | ≤ UP (ϵ) → 1 − ϵ. K − 6)
Remark 4.5. Similarly, if we define Rˆ P ( wT ) =
′T T , then w 6P , Kw
T is a valid H-CLUB for |Rˆ P ( ′T UP (ϵ)/ 4w 6P , wT ) − R( wT )|. Kw Knowing the factor-structure of the return Rt improves the estimation efficiency relative to the sample covariance estimator. This is demonstrated by the following theorem. Theorem 4.4. Under the assumptions of Theorem 4.3,
T T ′ 2 ′ 2 (wT Bft ) . var (wT Rt ) > var t =1
–
Table 1 Mean and covariance used to generate bi .
µB
6B
0.9833 −0.1233 0.0839
−0.0178
0.0921
−0.0178
0.0436
0.0862 −0.0211
0.0436
−0.0211 0.7624
In fact, the difference of the above two variances is small when wT is diversified enough, and this fact is further verified by our simulation results (see Tables 3 and 5 in Section 5). The reason is that the systematic risk cannot be diversified, and dominates the idiosyncratic risk. On the other hand, factor models give a strictly positive definite covariance estimator, whereas the sample covariance may produce a risk estimator being zero for certain portfolio allocation vectors. The positive definiteness is particularly important to estimate the optimal portfolio allocation vector. Furthermore, factor models interpret the structure of portfolio’s risks. It is clearly seen in both Theorems 4.2 and 4.3 that the idiosyncratic risks are diversified away by the portfolio allocation.
(4.9)
The following lemma characterizes the estimation error of σP2 .
2 P
)
t =1
5. Monte Carlo examples In this section, we examine the finite-sample performance of both the full confidence upper bound ξT defined in (3.2) and H-CLUB, based on three covariance estimators 6, using portfolios wT with different gross exposure constraints. Excess returns of the ith stock of a portfolio over the risk-free interest rate is assumed to follow the Fama–French three-factor model [Fama and French (1992)]: Rit = λi1 f1t + λi2 f2t + λi3 f3t + uit . The first factor is the excess return of the whole equity market, while the second and third factors are SMB (‘‘small minus big’’ cap) and HML (‘‘high minus low’’ book/price) respectively. Using US equity market data, we calibrate a sub-model to generate the loadings bi = (λi1 , λi2 , λi3 )′ , the idiosyncratic noises ut and the factors ft = (f1t , f2t , f3t )′ . 5.1. Calibration To calibrate parameters in the model, we use the data on daily returns of S&P 500’s top 100 constituents ranked by market capitalization (on June 29th 2012), the data on 3-month Treasury bill rates, and daily return data of the Fama–French factors. They are obtained from COMPUSTAT database, the data library of Kenneth French’s website, and CRSP database respectively. The ˜ t , ˜ft ) are analyzed for the period from July 1st, excess returns (R 2008 to June 29th 2012, approximately 1000 trading days. ˜ of R˜ t = B˜ft + ut , and (1) Calculate the least square estimator B compute the sample mean vector µB and sample covariance ma˜ These parameters are reported trix 6B of all the row vectors of B. in Table 1. The factor loadings {bi }Ni=1 of the simulated models are then generated from a trivariate Gaussian distribution N3 (µB , 6B ). (2) Assume that the factors follow the stationary vector autoregressive VAR(1) model ft = µ + 8ft −1 + εt for some 3 × 3 matrix 8, where εt follows i.i.d N3 (0, 6ϵ ). The model parameters 8, µ and 6ϵ are calibrated using the daily excess returns of the Fama–French factors ˜ft . The covariance matrix cov(ft ) is then obtained by solving the linear equation cov(ft ) = 8 cov(ft )8′ +6ϵ . Results are summarized in Table 2. (3) The error covariance matrix is sparse in our setting. For each fixed N, it is created by 6u = D60 D, where D = diag(σ1 , . . . , σp ). To be more specific, σ1 , . . . , σp are generated independently from a Gamma distribution G(α, β), in which α and β are selected to
J. Fan et al. / Journal of Econometrics ( Table 2 Parameters used to generate ft .
µ
Φ
0.0260 0.0211 −0.0043
−0.1006 −0.0191 0.0116
−0.0365 0.0186 0.0272
3.2351 0.1783 0.7783
0.1783 0.5069 0.0102
0.7783 0.0102 0.6586
Table 3 Averages and standard deviations of RE1 over 500 replications. c=1
c = 1.2
c = 1.4
c = 1.6
c = 1.8
c=2
RE1 S
4.2552 (1.4595)
6.0757 (2.0517)
8.3303 (2.8590)
10.9470 (3.6818)
14.0553 (5.0410)
17.2724 (6.4417)
RE1
4.2462 (1.4838)
6.0469 (2.0638)
8.3545 (2.9640)
10.9765 (3.7347)
14.0610 (5.1142)
17.3298 (6.5919)
4.1813 (1.4888)
5.9698 (2.0866)
8.2153 (2.9936)
10.8263 (3.7560)
13.8465 (5.1758)
17.0570 (6.6062)
6f RE1
6P , K
match the sample mean and sample standard deviation of the 100 ˜ t = R˜ t −B˜ ˜ft (recall that each u˜ t is standard deviations of the errors u 100 dimensional; see also Fan et al., 2008). An additional restriction is imposed on σi that only values in between the minimum ˜ t are accepted. We and maximum of the standard deviation of u then generate the off-diagonal entries of the correlation matrix 60 independently from a Gaussian distribution, with mean and standard deviation equal to those of the sample correlations of the estimated residuals. Moreover, absolute values of the offdiagonal entries are set to no greater than 0.95. Finally the hardthresholding is applied to make 60 sparse, where the threshold is set to be the smallest constant that makes 60 positive definite. 5.2. Representative portfolios We examine the performance of H-CLUB based on wT with a couple of different gross exposures. For a given exposure c and given number of assets N, we randomly generate portfolios wT N N that satisfy i=1 wi = 1 and i=1 |wi | = c. This task, which generates uniformly from the above set in RN , is of independent interest for portfolio optimization and research. We propose the following method. Let w + be the total long position and w − be the total short position: w + = (c + 1)/2 and w − = (c − 1)/2, where c = ∥wT ∥1 . For c = 1, there are no-short positions. For c > 1, there are both long and short positions (positive and negative components of wT ). The identities (or indices) of long and short positions are hard to identify, but the following sampling scheme is a reasonable approximation: The positive positions are determined by a Bernoulli trial (N times) with probability of success w + /(w + + w − ) = (c + 1)/(2c ). Once the identities are determined, we can normalize them and the problem reduces to the case with c = 1. For the case with c = 1, N the uniform distribution on the set {wi : i=1 wi = 1, wi ≥ 0} can be generated from a normalized exponential distribution:
wi = ζi /
N
ζi ,
ζi ∼i.i.d. standard exponential.
i=1
Combining the above two steps, we can generate a randomly selected portfolio wT of size N with a gross exposure c as follows. 1. Generate a positive integer k, the number of stocks with positive +1 ). weights in wT , from a binomial distribution Bin(N , c2c 2. Generate independently {ζi }ki=1 from the standard exponential distribution and set each wi+ = (c + 1)ζi /(2 j=1 ζj ), for i = 1, . . . , k. 3. Analogously compute for i = 1, . . . , N − k, wi− = (1 −
k
c )ζi /(2 j=1 ζj ), where {ζj }Nj=−1k are obtained independently from the standard exponential distribution.
N −k
–
9
4. Take the portfolio weights wT as a random permutation of the numbers {wi+ }ki=1 and {−wi− }iN=−1k .
cov(ft ) 0.2803 −0.0944 −0.0272
)
5.3. Choosing the time lag L The time lag can be chosen using the plug-in method, previously suggested by Newey and West (1994) and Andrews (1991). The plug-in method chooses L by minimizing the estimated mean squared error of the variance estimator σˆ 2 (L) of σT2 in (4.2). As suggested by Newey and West (1994), we find L∗ by L∗ = 1.447T 1/3 (ˆs1 /ˆs0 )2/3 where sˆ1 = 2
n
hγˆ (h),
h=1
sˆ0 = γˆ (0) + 2
n
γˆ (h),
h=1
n = 4(T /100)2/9 . We recommend initially setting L = L∗ , and then exercise some judgment about sensitivity of results to the choice of n and L. Some evidence from the literature suggests that the final result is less sensitive to n than to L (Silverman, 1986). In the simulation and empirical studies below, L is chosen by this method. 5.4. Simulation 5.4.1. Data generation For each given gross exposure c, number of assets N and length of time series T , we generate 50 different models and 200 testing portfolios for each model, so a total of 10,000 portfolios are actually used. To be more specific, in any simulation with fixed c, N and T , we repeat the following steps for 50 times: 1. Generate {bi }Ni=1 independently from N3 (µB , 6B ). Set B = (b1 , . . . , bp )′ . 2. Generate {ut }Tt=1 independently from Np (0, 6u ). 3. Generate {ft }Tt=1 from a VAR(1) model ft = µ + 8ft −1 + εt with parameters specified in the calibration part. 4. Calculate Rt = Bft + ut for t = 1, . . . , T . T 5. Calculate the sample covariance matrix S = T −1 t =1 (yt − y¯ )(yt − y¯ )′ ; obtain the factor-based covariance estimators by soft-thresholding the sample correlation matrices. 6. Generate 200 wT according to the method described in Section 5.2. 6 = S, 6f 7. (i) Compute the true risk R(wT ) = w′T 6wT ; (ii) For ′ and 6P , − 6)wT |, ξT = ∥wT ∥21 ∥ 6− K , calculate ∆ = |wT (6 √ 6∥max and U (0.05) = 1.96 σ / T ; (iii) Calculate empirical coverage probabilities based on the 200 generated portfolios wT , i.e., the proportions of times that ∆ < U (0.05). Note that time lags L are determined by the plug-in method described in Section 5.3.
5.4.2. Output We first produce the graph of risk domain by plotting averages of R(wT ) as a function of c and N, followed by plots of ∆, ξT and U (ϵ) against N for sample-based, factor-based and POETbased estimators. Average absolute distance between the empirical coverage probability and the nominal level (95%) are also plotted against N, for all three kinds of estimators. Finally, fix the dimensionality N and increase the number of model generation to 500, we look at two ratio quantities, namely
ξT ∥wT ∥21 ∥ 6 − 6∥max = and U (ϵ) 1.96 v ar(w′T 6wT ) v ar(w′T 6wT ) RE2 = . 2w′T 6wT
RE1 =
10
J. Fan et al. / Journal of Econometrics (
)
–
Table 4 Averages and standard deviations of RE1 over 500 replications using POET-based estimator.
ϵ = 0.05
ϵ = 0.01
ϵ = 0.005
ϵ = 0.001
RE1 c=1
4.1813 (1.4888)
3.2413 (1.1541)
2.4030 (0.8556)
2.1498 (0.7654)
RE1 c = 1.6
10.8263 (2.9936)
8.3925 (2.3206)
6.2220 (1.7205)
5.5662 (1.5391)
Table 5 Averages and standard deviations of RE2 over 500 replications, with T = 200. c=1
c = 1.2
c = 1.4
c = 1.6
c = 1.8
c=2
RE2 S
4.8968% (0.8440%)
4.8187% (0.8211%)
4.9171% (0.8694%)
4.8137% (0.8113%)
4.8808% (0.8586%)
4.8217% (0.8085%)
RE2
4.8888% (0.8428%)
4.8139% (0.8152%)
4.8959% (0.8740%)
4.7921% (0.8043%)
4.8610% (0.8570%)
4.7995% (0.8021%)
4.8918% (0.8443%)
4.8158% (0.8177%)
4.9015% (0.8746%)
4.7967% (0.8039%)
4.8636% (0.8583%)
4.8032% (0.8015%)
6f RE2
6P , K
Fig. 1. Averages of annualized risks R(wT ) over 10,000 portfolios.
Means and standard deviations of RE1 and RE2 under a selection of c and T are reported, for all three kinds of estimators. Respective plots and tables are presented and analyzed in Section 5.5. 5.5. Results In Fig. 1, we gradually increase N from 20 to 600 in increments of 20. Averages of R(wT ) over 10,000 portfolios are plotted against N. We produce the curves associated with four choices of c, all with T = 300. For the sake of comparison, we overlay four curves together on the same plot. The following observations can be made from Fig. 1. (1) The average risk ranges from less than 30% to around 50% per annum. (2) The average risk is higher for larger exposure c. This is consistent with the fact that portfolios with greater gross exposure are more volatile, and hence incur higher risks. (3) Given a gross exposure c, as the portfolio size N increases, the average risk decreases. The rate of decline is very fast until N is around 150. This is consistent with the theory that as N increases, the portfolio becomes more diversified and the idiosyncratic risk is reduced through diversification. Let N grow from 100 to 600 in grid of 20. In Fig. 2, their respective averages of ∆ = | w′T ( 6 − 6) wT |, ξT and U (ϵ) over 10,000 portfolios are plotted using estimators 6 = S, 6f and 6P , K. In particular, c = 1.6 results in 130% long positions and 30% short positions (130/30 strategy). The 130/30 structure is popular in long-short funds. In each subfigure, the dashed curve corresponds to ∆, the solid curve corresponds to U (ϵ), ϵ = 0.05, and the twodash curve corresponds to ξT . Based on these plots, we can observe the following features: (1) Solid curves lie entirely above dashed curves, ensuring U (ϵ) as a valid H-CLUB. This feature maybe difficult to observe on Fig. 2(c) and (d) with the presence of two-dash curves ( ξT ). To make the comparison clearer, we also produce plots without the presence of ξT , as shown in Fig. 3 (a zoomed-in version of Fig. 2). (2) The full confidence upper bound ξT is indeed a very crude bound and is much larger than U (ϵ). The larger c is, the larger the difference is. This feature is further demonstrated in Table 3.
(3) H-CLUB slightly increases with larger N, but its degree of increase is much smaller than the crude bound ξT . In order to further justify the validity of U (ϵ) as an HCLUB, we investigate the empirical coverage probabilities for all three estimators. We employ same settings as in Figs. 2 and 3, plot average absolute distances between the empirical coverage probability and the nominal level 95% against dimensionality N in Fig. 4. The average absolute distances vary from 1% to 5%, which is in an acceptable range. This is in line with our expectation which ensures U (ϵ) as a valid H-CLUB of ∆. Means and standard deviations (in parentheses) of RE1 for all three kinds of estimators are summarized in Table 3. Here we fix N = 600 and T = 300 with 500 replications. The ratio RE1 quantifies the relation between the full confidence bound and the H-CLUB. Numerical results justify our observations in Fig. 2 in the sense that ξT is in general many times greater than U (ϵ). Moreover, RE1 increases dramatically as the exposure c increases. Under the same setting, we also look at values of RE1 under multiple choices of ϵ . For illustrating purposes, only POET-based estimator with two gross exposures are considered. Means and standard deviations are summarized in Table 4. As ϵ decreases, it is not difficult to identify simultaneous decline trends of RE1 . This is due to the fact that U (ϵ) grows as the confidence level increases. Averages and standard deviations of relative error (see e.g. Corollary 4.1) RE2 =
v ar(w′T 6wT )/(2w′T 6wT )
with two choices of T are summarized in Tables 5 and 6, respec1
tively. RE2 measures the accuracy of the perceived risk R(wT ) 2 1 2
with respect to the true risk R(wT ) . Indeed by delta’s method, 1
1
RE2 ≈ ASD( R(wT ) 2 )/R(wT ) 2 , where ‘‘ASD’’ stands for asymptotic standard deviation. From both tables, it is not difficult to observe that standard deviations are small when compared to their corresponding means. The results also show that the relative error are negligible, at around 3%–5%, ensuring the estimate of R(wT ) a high level of accuracy. More interestingly, we realize that this ratio is approximately independent of the gross exposure c but sensitive to the length of the time series. RE2 steadily decreases as T grows. We also observe from Tables 3–6 that the asymptotic variances (reflected by U (ϵ)) of the estimators based on known and unknown factors are almost the same, and slightly smaller than that of the sample covariance estimator.
J. Fan et al. / Journal of Econometrics (
(a) c = 1.
(b) c = 1.6.
(c) c = 2.
(d) c = 3.
)
–
11
Fig. 2. Averages of ∆ (dashed curve), U (ϵ) with ϵ = 0.05 (solid curve) and ξT (two-dash curve) over 10,000 portfolios for c = 1, 1.6, 2 and 3, based on three estimated covariance estimators. Table 6 Averages and standard deviations of RE2 over 500 replications, with T = 400. c=1
c = 1.2
c = 1.4
c = 1.6
c = 1.8
c=2
RE2 S
3.4783% (0.4438%)
3.4684% (0.4215%)
3.4831% (0.4611%)
3.4807% (0.4691%)
3.4979% (0.4392%)
3.4793% (0.4479%)
RE2
3.4708% (0.4425%)
3.4607% (0.4183%)
3.4742% (0.4626%)
3.4659% (0.4649%)
3.4772% (0.4370%)
3.4553% (0.4438%)
3.4745% (0.4431%)
3.4634% (0.4170%)
3.4775% (0.4617%)
3.4708% (0.4665%)
3.4811% (0.4380%)
3.4586% (0.4431%)
6f RE2
6P , K
In addition, in order to examine the sensitivity to overestimating K in the POET-based estimator, we compute RE2 with multiple choices of K (the true K equals 3). Averages and standard deviations of RE2 are summarized in Tables 7 and 8. In both tables, the average RE2 for K = 3 . . . 7 are very close to each other, and hence the POET-based risk estimator is robust to over-estimating K . 6. Empirical studies We assess the performance of H-CLUB in a portfolio allocation. We use the daily excess returns of 100 industrial portfolios formed
12
J. Fan et al. / Journal of Econometrics (
(a) c = 2.
)
–
(b) c = 3. Fig. 3. Averages of ∆ (dashed curve), U (ϵ) with ϵ = 0.05 (solid curve) over 10,000 portfolios. This is a zoom version of Fig. 2.
Table 7 Averages and standard deviations of RE2 over 500 replications, with T = 200.
Table 8 Averages and standard deviations of RE2 over 500 replications, with T = 400.
c=1
c = 1.2
RE2
4.8918% (0.8443%)
4.8158% (0.8177%)
6P , K (K = 3 )
RE2
4.8918% (0.8439%)
4.8154% (0.8178%)
6P , K (K = 4 )
RE2
4.8921% (0.8439%)
4.8155% (0.8178%)
6P , K (K = 5 )
RE2
4.8925% (0.8436%)
4.8144% (0.8185%)
6P , K (K = 6 )
RE2
4.8922% (0.8434%)
4.8145% (0.8192%)
6P , K (K = 7 )
6P , K (K = 3) 6P , K (K = 4) 6P , K (K = 5) 6P , K (K = 6) 6P , K (K = 7)
on the size and book to market ratio from the website of Kenneth French. The study period is from July 1st 2008 to June 29th 2012, which spans a total of 1000 trading days. At the end of each month the covariance matrix is estimated by three estimators, the sample covariance, the factor-based estimator, and the POET estimator, using daily returns of the preceding 12 months (T = 252). In particular, we employ the Fama–French three-factor model to construct the factor-based estimator. As in Section 5, we adopt plug-in method to estimate the time lags L, refer to Section 5.3 for more details. Two types of strategies are tested, namely the equally weighted portfolio, and the minimum variance portfolio. The optimal portfolios are constructed under two exposure constraints (c = 1 and c = 1.6). The equally weighted portfolio is given by T = (1/N , . . . , 1/N ) . The data-dependent minimum variance w portfolio is given by
T = w
argmin w′T 1=1,∥wT ∥1 =c
w′T 6wT .
Portfolios are held for one month and rebalanced at the beginning of the next month. Because the ‘‘true’’ volatility matrix 6 is unknown, to illustrate our method on the real data, at any time point, we look one month ahead and define the true 6 to be the sample covariance of daily
c=1
c = 1.2
RE2
3.4745% (0.4431%)
3.4634% (0.4170%)
RE2
3.4750% (0.4430%)
3.4635% (0.4170%)
RE2
3.4750% (0.4431%)
3.4639% (0.4171%)
RE2
3.4752% (0.4433%)
3.4636% (0.4172%)
RE2
3.4750% (0.4434%)
3.4635% (0.4175%)
excess returns over the future month. For example, at day 10, the corresponding 6 and R( wT ) are defined to be
6=
31 1
21 t =11
Rt R′t ,
′T 6w T R( wT ) = w
1/2
,
T is the portfolio defined above. On the other hand, the where w ‘‘sample covariance matrix’’ S in Table 9 below is the sample covariance of daily excess returns over the past twelve months. This is aggregated over the entire testing period. We choose ϵ = 0.01, i.e. U (ϵ) is an empirical 99% upper bound. For each covariance matrix estimator and strategy, we study five quantities, whose respective averages over thewhole study period are summarized ′T T is the H-CLUB for the true in Table 9. Recall that U (ϵ)/ 4w 6w
risk error | R( wT ) − R( wT )|, see, for example, Corollary 4.1. All risks are annualized. By comparing the first two columns in Table 9, we observe that U (ϵ) is uniformly greater that ∆, regardless of the strategies and the covariance matrix estimators. Moreover, the true risk errors are very large, around 40%–50% of the true risk. There are two possible causes: The sample covariance based on one month data (holding period) can be different from the true covariance matrix. It incurs
J. Fan et al. / Journal of Econometrics (
(a) c = 1.
(b) c = 1.6.
(c) c = 2.
(d) c = 3.
)
–
13
Fig. 4. Average absolute distances between empirical coverage probabilities and the nominal level (95%).
substantial sampling variability. The second cause can be the nonstationarity of the financial returns. However, as shown in the two rightmost columns, results are still satisfactory in the sense that the U-CLUB’s are uniformly larger but quite close (<1% per annum) to the true risk error. 7. Conclusions In this paper we address the estimation and assessment for the risk of a large portfolio. The risk is estimated by a substitution of a
good estimator of the volatility matrix. We study factor-based risk estimators, based on the approximate factor model with known factors and unknown factors. We derive the limiting distribution of the estimated risks under high dimensionality. Given that the existing upper bound for the risk estimation error is too crude and not applicable in practice, we introduce a new method, H-CLUB, to assess the accuracy of the risk estimation based on the confidence intervals. Our numerical results demonstrate that the proposed upper bounds significantly outperform the traditional crude bounds, and provide insightful assessment of the estimation of the true portfolio risks.
14
J. Fan et al. / Journal of Econometrics (
)
–
Table 9 True risk errors and estimated risk errors based on the 100 Fama–French Industrial Portfolios. Average of ∆(×10−4 )
Strategy
Sample-based covariance estimator Equal weighted 2.356 Min variance (c = 1) 1.006 Min variance (c = 1.6) 0.497 Factor-based covariance estimator Equal weighted 2.352 Min variance (c = 1) 0.999 Min variance (c = 1.6) 0.475 POET-based covariance estimator Equal weighted 2.353 Min variance (c = 1) 1.005 Min variance (c = 1.6) 0.490
Average of U (0.01)(×10−4 )
Average of true risk
True risk error
H-CLUB
2.757 1.233 0.621
20.81% 14.38% 11.58%
11.18% 7.00% 4.69%
11.37% 7.44% 5.17%
2.768 1.226 0.594
20.81% 14.45% 11.79%
11.16% 6.95% 4.52%
11.39% 7.41% 4.98%
2.768 1.231 0.626
20.81% 14.38% 11.59%
11.17% 6.99% 4.61%
11.38% 7.43% 5.22%
T ))1/2 . True risk is R( T )1/2 − ( T )1/2 | and H-CLUB is ′T T Here ∆ = | w′T (6 − 6) wT | and U (0.01) = 2.58( var( w′T 6w wT ). The True Risk Error is |( w′T 6w w′T 6w U (0.01)/ 4w 6w respectively.
The empirical study suggests that the financial excess returns may not be globally stationary. Our method also allows for locally stationary time series, as well as slowly time-varying covariance matrices through the localization in time (time-domain smoothing). In fact, the empirical study analyzes a kind of time-varying model, in which we estimate covariance matrices based on the data in rolling windows. Acknowledgments The research was partially supported by DMS-1206464 and NIH R01GM100474-01, NIH R01-GM072611. Appendix A. Proofs for the sample covariance In this section, ZT ,t = w′T Rt R′t wT − Ew′T Rt R′t wT , and γT (h) = EZT ,t ZT ,t +h . In particular, γT (0) = var(ZT ,t ). We first prove that for 6 = S, 6f , 6P , K , the risk estimator
w′T 6wT = Op (1).
′T T = Op (1). Lemma A.1. w′T 6wT = Op (1) and w 6w Proof. It is true that for 6 = S, 6f , or 6P , K , we have ∥6 − 6∥max =
op (1) (see Theorem 3.2 of Fan et al., 2013). Hence ∥ 6∥max ≤ ∥6∥max + ∥ 6 − 6∥max = Op (1). In addition, since ∥wT ∥1 = O(1), we have w′T 6wT ≤ ∥wT ∥21 ∥ 6∥max = Op (1). Finally, because ∥ wT − wT ∥1 = op (1), ∥ wT ∥1 = Op (1). The result then also follows from the inequality
′T T ≤ ∥ w 6w wT ∥21 ∥ 6∥max = Op (1). A.1. Proof of Lemma 4.1
Proof. Let s = b1 (2 log NT )1/r1 . Then P ( max |uit | > s) ≤ NT exp(−(s/b1 )r1 ) i≤N ,t ≤T
≤ exp(log NT − (s/b1 )r1 ) → 0. Hence maxi≤N ,t ≤T |uit | = Op ((log NT )1/r1 ). Similarly, maxt ≤T ∥ft ∥
= Op ((log T )1/r2 ).
max |Rit | ≤ max ∥bi ∥ max ∥ft ∥ + max |uit | t
i
1/r2
= Op ((log T )
Proof. Note that for any N × N matrix A = (aij ), |w′T AwT | ≤ ∥A∥max ∥wT ∥21 . Thus
|(w′T SwT )2 − (w′T 6wT )2 | ≤ ∥S + 6∥max ∥wT ∥21 |w′T (S − 6)wT | T = Op (|w′T (S − 6)wT |) = Op T −1 Z T ,t . t =1 T The Chebyshev inequality implies |T −1 t =1 ZT ,t | = Op (T −1/2 σT2 ). (ii) Let Xt ,h = (w′T Rt )2 (w′T Rt +h )2 . By the Chebyshev inequality, for any s > 0, T 1 P max Xt ,h − EXt ,h > s h≤L T t =1 T L max var Xt ,h T 1 h≤L t =1 ≤ L max P Xt ,h − EXt ,h > s ≤ . T t =1 h≤L T 2 s2 T Note that maxh≤L var( t =1 Xt ,h ) = O(T ) since maxh≤L var(Xt ,h ) = T O(1) and maxh≤L t =1 cov(X1,h , Xt +1,h ) = O(1). Therefore, for √ arbitrarily small ϵ > 0, by choosing s > LM /(ϵ T ), P (maxh≤L | T1 T T EXt ,h | > s) < ϵ , which implies maxh≤L | T1 t = 1 X t ,h − √ t = 1 X t ,h − EXt ,h | = Op ( L/T ). In addition, P (max |Xt ,h − EXt ,h | > (L max var(Xt ,h )/ϵ)1/2 )
Lemma A.2. maxt ≤T ∥ft ∥ = Op ((log T )1/r2 ), maxi≤N ,t ≤T |uit | = Op ((log NT )1/r1 ), maxi≤N ,t ≤T |Rit | = Op ((log T )1/r2 + (log NT )1/r1 ).
i,t
Lemma A.3. (i) |(w′T SwT )2 − (w′T 6wT )2 | = Op (T −1/2 σT ). T −h ′ ′ −1 2 2 (ii) maxh≤ − E (w′T Rt )2 (w′T L |T t =1 (wT Rt ) (wT Rt +h ) √ Rt +h )2 | = Op ( L/T ). (iii) maxh≤L |w′T SwT − T −1 Tt =−1h (w′T Rt )2 | = Op (L2 w′T 6wT /T ). (iv) maxh≤L |w′T SwT − T −1 Tt =−1h (w′T Rt +h )2 | = Op (L2 w′T 6 wT /T ).
i≤N ,t ≤T
+ (log NT )1/r1 ).
h≤L
h≤L
L max var(Xt ,h )
≤
h≤L
L max var(Xt ,h )/ϵ
= ϵ.
h ≤L
√
Thus maxh≤L |Xt ,h − EXt ,h | = Op ( L), which implies
T −h 1 max Xt ,h − EXt ,h ≤ Op ( L/T ) + max |Xt ,h − EXt ,h |L/T h≤L T h ≤L t =1 = Op L/T . T (iii) The left hand side is maxh≤L T −1 t =T −h+1 (w′T Rt )2 = maxT −L+1≤t ≤T (w′T Rt )2 L/T . For any s > 0, P (maxT −L+1≤t ≤T (w′T Rt )2
J. Fan et al. / Journal of Econometrics (
> s) ≤ LP ((w′T Rt )2 > s) ≤ Lw′T 6wT /s, which then implies maxT −L+1≤t ≤T (w′T Rt )2 = Op (Lw′T 6wT ). The desired result then follows. (iv) A similar argument as above shows maxT +1≤t ≤T +L (w′T Rt )2
≤ = Op (LwT 6wT ). Hence maxh≤L T t =T −h+1 (wT Rt +h ) maxT +1≤t ≤T +L (w′T Rt )2 L/T = Op (L2 w′T 6wT /T ). This implies that the desired quantity is bounded by a + Op (L2 w′T 6wT /T ) where ′
−1
T
′
2
T 1 2 ′ 2 a = max [(wT Rt ) − (wT Rt +h ) ] h≤L T t =1 L L 1 1 ′ 2 ′ 2 (wT Rt ) + (wT RT +t ) . ≤ T t =1 T t =1 L ′ ′ 2 2 2 ′ Note that | T1 t =1 (wT Rt ) | ≤ max1≤t ≤L (wT Rt ) L/T = Op (L wT 6 L ′ 2 wT /T ). Similarly we have | T1 = Op (L2 w′T t =1 (wT RT +t ) | 6wT /T ). √ Lemma A.4. maxh≤L | γ (h) − γT (h)| = Op ( L/T + ∥ wT − wT ∥1 4/r2 4/r1 ((log T ) + (log NT ) )). Proof. The triangular inequality implies maxh≤L | γ (h) − γT (h)| ≤ a , where i i =1
8
T −h −1 ′ a1 = max T (wT Rt )2 (w′T Rt +h )2 − E (w′T Rt )2 (w′T Rt +h )2 , h≤L t =1 a2 = |(w′T SwT )2 − (w′T 6wT )2 |, a3
a4
a5
a6
a7
T )2 |. a8 = |(w′T SwT )2 − ( w′T Sw
15
Proof of Lemma 4.1. By the triangular inequality, | σ 2 − σT2 | ≤
3
bi , where
i =1
b1 = | γ (0) − γT (0)|,
b2 = 2
L (1 − h/L)| γ (h) − γT (h)|, h =1
b3 = 2
γT (h),
b4 =
h >L
L 2
L h=1
hγT (h)
√
b2 ≤ 2L maxh≤L | γ (h) − γT (h)| = wT ∥1 ((log T )4/r2 + (log NT )4/r1 )). Then
Op (L L/T + L∥ wT −
L 1 | σ 2 − σT2 | = Op L3/2 T −1/2 + hγ ( h) + γT (h) L h=1
+ L∥ wT − wT ∥1 ((log T )
h >L 4/r2
4/r1
+ (log NT )
) .
By the lemma’s conditions and Assumption 4.7, the first, third and the fourth terms are all stochastically dominated by σT2 . As for the second term
L
1 L
hγ (h), note that
∞
h=1 Then by dominated convergence theorem, limT 1L
∞
h =1
limT hγT (h)
h =1
1
σT2 L
|γT (h)|/σT2 < ∞. L h h=1 σ 2 γT (h) = T
I (h ≤ L) = 0 given that LσT2 → ∞.
√
σ 2 = Op (σT2 ) The second part US (ϵ) = o( log N /T ) is due to 2 2 2 2 as |σT − σ | = op (σT ) and σT = O(1) = o(log N ), as N → ∞.
Lemma A.5. (i) EZT2,1 = O(1) and maxl≤T |γT (l)| = O(1).
(ii) For any K ∈ [m, T ], var( K h=1 (1 − h/K )γT (h) = O(K ).
K
t =1
ZT ,t ) = K γT (0) + 2K
Proof. (i) It suffices to show E (w′T Rt )4 = O(1). In fact by maxi≤N ER4it = O(1), E (w′T Rt )4 =
N
ijkl=1
wi wj wk wl ERit Rjt Rkt Rlt ≤ maxi≤N
∥wT ∥ = O(1). The second part follows immediately. (ii) It is well known that for a stationary process with zero mean, K K var(K −1 t =1 ZT ,t ) = K −1 γT (0) + 2K −1 h=1 (1 − h/K )γT (h), which implies the result. ER4it
4 1
Lemma A.6. Under the assumptions of Theorem 4.1,
T var (w′T Rt )2
−1/2 T w′T (S − 6)wT →d N (0, 1).
(A.1)
t =1
We have, w′T SwT ≤ |w′T (S − 6)wT | + w′T 6wT = Op (w′T 6wT + T −1/2 σT2 ). It then follows from Lemma A.3 and σT2 = O(1), L3 = √ O(T ), w′T 6wT = O(1) that ai = Op ( L/T ) for i = 1...4. In addition,
T ∥1 (∥ a5 ≤ ∥wT + w wT ∥21 + ∥wT ∥21 )∥ wT − wT ∥1 ( max |Rit |)4 i ≤ N ,t ≤ T
= Op (∥ wT − wT ∥1 ((log T )
–
A.2. Proof of Theorem 4.1
T −h = w′T SwT max w′T SwT − T −1 (w′T Rt )2 , h ≤L t =1 T −h ′ ′ −1 ′ 2 = wT SwT max wT SwT − T (wT Rt +h ) , h≤L t =1 T −h 1 ′ 2 ′ 2 ′ 2 ′ 2 (wT Rt ) (wT Rt +h ) − ( wT Rt ) ( wT Rt +h ) = max h ≤L T t =1 T −h 1 ′T Sw T (w′T Rt )2 w′T SwT − ( = max w′T Rt )2 w h≤L T t =1 T − h 1 ′ 2 ′ ′ 2 ′ T Sw T , (wT Rt +h ) wT SwT − ( wT Rt +h ) w = max h≤L T t =1
4/r2
)
+ (log NT )
4/r1
)),
it
var( t =1 ZT ,t ) and B2T = var( t =1 ZT ,t ) = O(T ). By Davydov’s inequality (Proposition 2.5 of Fan and Yao, 2003 with p = 1/2 and q = 1/4), there are constants M , M1 , M2 > 0 such that for any integer h ≥ 0,
K
3 1
+ ∥w∥21 ∥ wT ∥ + ∥ wT ∥21 ∥wT ∥1 ) = Op (∥ wT − wT ∥1 ((log T )2/r2 + (log NT )2/r1 )). Term a7 is bounded in the same way as a6 . Finally,
T ∥1 a8 ≤ (∥wT ∥21 + ∥ wT ∥21 )(∥wT ∥1 + ∥ wT ∥1 )∥S∥2max ∥wT − w
where the last equality follows from the α -mixing condition and that E (w′T Rt )4 = O(1). By the assumption that αR (T ) = o(γT (0)), the correlation ρ(T ) = | Corr(ZT ,t , ZT ,t +T )| ≤ |γT (T )|/γT (0) ≤ exp(−MT r3 /4). To apply Theorem 2.1 of Peligrad (1996), we also need to check
T ∥1 ), = Op (∥wT − w
√ which implies maxh≤L | γ (h) − γT (h)| = Op ( L/T ).
T
|γT (h)| ≤ 8α(|h|)1/4 (E (w′T Rt )2 )1/2 (E (w′T Rt )4 )1/4 = M2 exp(−M |h|r3 /4)
T ∥1 max |Rit | ∥S∥max (∥wT ∥ + ∥ a6 ≤ ∥wT − w wT ∥31 2
Proof. The√proof is based on Theorem 2.1 of Peligrad (1996). T We have T w′T (S − 6)wT = T −1/2 t =1 ZT ,t . Define B2T ,K =
lim sup T
T 1
B2T t =1
EZT2,t < ∞
(A.2)
16
J. Fan et al. / Journal of Econometrics (
and the Lindeberg condition: ∀ϵ > 0,
ϵ BT ) → 0.
1 B2T
T
t =1
EZT2,t I (|ZT ,t | >
/γT (0) = [1 + 2 Th=1 ρ(h)](1 + o(1)). Hence T (A.2) holds because 1 + 2 h=1 ρT (h) is bounded away from zero. To verify the Lindeberg condition, note that there are c1 , c2 > 0, for all large T , c1 < T1 B2T /γT (0) < c2 . Then for any ϵ > 0, for some c > 0 and all large T , Note that
T 1
B2T t =1
1 2 B T T
EZT2,t I (|ZT ,t | > ϵ BT ) ≤
c
γ (0)
EZT2,t I (ZT2,t > T γ (0)ϵ 2 c1 ).
EZ 2
Because γ (T0,)t < ∞, hence by the dominated convergence theorem, lim T
c
γ (0)
EZT2,t I (ZT2,t > T γ (0)ϵ 2 c1 ) 1
= cE lim
γ (0)
T
ZT2,t I (ZT2,t > T γ (0)ϵ 2 c1 ) = 0.
Proofs of Theorem 4.1 and Corollary 4.1. Note that T σT2 /B2T → 1, we have T
[ w′T (S − 6) wT − w′T (S − 6)wT ]
T
log N
σT
T
Hence by (A.1),
T
σT2
T BT
op (σT (log N )−4/r1 ) = op (1).
[ w′T (S − 6) wT ] → N (0, 1). This also implies (A.3)
′T (S − 6) which also implies w wT = Op (T −1/2
σT2 ). Moreover,
since |σ − σ | = op (σ ), 2
2 T
1 1 − √ T | w′T (S − 6) wT | σ2 σ2
√
T −h ′ ′ −1 ′ 2 (ft )B wT − T × max wT B cov (wT Bft +h ) , h ≤L t =1 T −h −1 ′ 2 ′ 2 ′ 2 ′ 2 a5 = max T ( wT Bft +h ) ( wT Bft ) − (wT Bft +h ) (wT Bft ) , h≤L t =1 T − h T ) (ft ) a6 = maxT −1 ( w′T Bft +h )2 ( w′T B cov B′ w h≤L t =1 (ft ) − (w′T Bft +h )2 (w′T B cov B′ wT ) T −h T ) (ft ) ( w′T Bft )2 ( w′T B cov B′ w h≤L t =1 (ft ) − (w′T Bft )2 (w′T B cov B′ wT ) T −h −1 ′ T )2 − (w′T (ft ) (ft ) a8 = max T ( wT B cov B′ w B cov B′ wT )2 h≤L t =1
a7 = maxT −1
(w′T Bft +h )2 (w′T Bft )2 − E (w′T Bft )2 (w′T T −h Bft +h ) |, and a12 = maxh≤L |T −1 t =1 (w′T Bft +h )2 (w′T Bft )2 − ′ ′ 2 2 (wT Bft +h ) (wT Bft ) |. T Given the assumption that maxh≤L t =1 cov[(w′T Bf1 )2 (w′T B f1+h )2 , (w′T Bf1+t )2 (w′T Bf1+t +h )2 ] = O(1), the same argument √ of the proof of Lemma A.3(ii) implies a11 = Op ( L/T ). On the √ other hand, by (B.14) of Fan et al. (2011), ∥ B√ − B∥max = Op ( log N /T ), which implies ∥w′T ( B − B)∥ = Op ( log N /T ). It √ is then easy to show that a12 = Op ( log N /T ). It follows that √ a1 = Op ( (L + log N )/T ). By the triangular inequality, a2 = √ Op ( log N /T ). By the same argument of the proof of Lemma A.3, we have a3 = Op (L2 /T ) = a4 . Moreover, because maxi≤N ∥ bi ∥ = Op (1), maxi≤N ,t ≤T | b′i ft | = Op ((log T )1/r2 ), T −h t =1
a5 ≤ ∥ wT + w∥1 ∥ wT − wT ∥1 max | b′i ft |4 (∥wT ∥21 + ∥ wT ∥21 ) i≤N ,t ≤T
T
| σ 2 − σT2 | = T | wT (S − 6) wT | 2 = op (1). σT (1 + op (1)) ′T (S − 6) It then follows from (A.3) that T / σ 2w wT →d N (0, 1), √
(ft ) B cov B′ wT a4 = w′T
2
′T (S − 6) w wT →d N (0, 1)
2 T
T −h (ft ) (ft ) a3 = w′T B cov B′ wT maxw′T B cov B′ wT − T −1 (w′T Bft )2 , h≤L t =1
a11 = maxh≤L |T −1
∥S − 6∥max ∥ wT − w∥1 (∥ w∥1 + ∥w∥1 ) BT √
≤ Op
(ft ) a2 = |(w′T B cov B′ wT )2 − (w′T B cov(ft )B′ wT )2 |
a1 is bounded by a11 + a12 , where
T
≤
–
Hence the conditions of Theorem 2.1 of Peligrad (1996) are T 1 d satisfied, which implies B− T t =1 ZT ,t → N (0, 1), equivalent to (A.1).
BT
)
′
= Op ((log T )
4/r2
∥ wT − wT ∥1 )
(ft ) a6 ≤ ∥ wT − wT ∥1 max | b′i ft |2 ∥ B cov B′ ∥max i≤N ,t ≤T
× (∥wT ∥ ∥ wT ∥1 + ∥wT ∥31 + ∥ wT ∥21 ∥ wT + wT ∥1 ) 2 1
= Op ((log T )2/r2 ∥ wT − wT ∥1 ).
which gives the H-CLUB. Corollary 4.1 follows straightforward from applying the delta-method.
Term a7 is bounded as the same as a6 . Finally,
Appendix B. Proofs for the factor-based estimation
(ft ) a8 ≤ ∥ wT − wT ∥1 ∥ B cov B′ ∥2max (∥ wT ∥21 + ∥wT ∥21 ) × (∥ wT ∥1 + ∥wT ∥1 ) = Op (∥ wT − wT ∥1 ||).
B.1. Proof of Lemma 4.2 Lemma B.1. maxh≤L | γf (h) − γf (h)| (log T )4/r2 ∥ wT − wT ∥1 ).
√ = Op ( (L + log N )/T +
Proof. The triangular inequality implies maxh≤L | γf (h) − γf (h)| ≤
4
i=1
ai , where
T −h −1 ′ 2 ′ 2 ′ 2 ′ 2 (wT Bft +h ) (wT Bft ) − E (wT Bft ) (wT Bft +h ) , a1 = max T h≤L t =1
Proof of Lemma 4.2. We have | σf2 − σf2 | ≤ | γf (0) − γf (0)|, b2 = 2
L (1 − h/L)| γf (h) − γf (h)|,
L 2
L h=1
i =1
b3 = 2
bi , where b1 =
h >L
h=1
b4 =
3
hγf (h).
γf (h),
J. Fan et al. / Journal of Econometrics (
By the lemma’s condition, b3 = op (σf2 ) By Lemma B.1, h≤L
= Op (L (L + log N )/T + L(log T )4/r2 ∥ wT − wT ∥1 ) = op (σf2 ). ∞ 2 As for b4 , note that < ∞. By dominated h=1 |γf (h)|/σf convergence theorem,
T
L 1 h
γ (h) = 2 f
L h =1 σ f
∞
lim hγf (h) T
h=1
1
σf2 L
I (h ≤ L) = 0
given that Lσf2 → ∞. The second statement is due to σf2 = op (log N ).
Write R = (R1 , . . . , RT ) be N × T ; F = (f1 , . . . , fT ) be r × T , and (ft ) = FF′ /T . We have cov B = RF′ (FF′ )−1 . Define CT = B − B and (ft )−cov(ft ). Then we have the following decomposition: DT = cov 4 w′T ( 6f − 6)wT = i=1 di , where
(ft )B′ wT ; d2 = 2w′T CT cov
(ft )C′T wT , d3 = w′T CT cov
Lemma B.4. (i) |d3 | = Op (T −1 (w′T 6u wT )1/2 (E |w′T ut |4 )1/4 ). (ii) |d4 | = Op (sN (log N /T )1/2−q/2 w′T 6u wT ). Proof. (i) Because ∥(FF′ )−1 ∥ = Op (T −1 ), |d3 | = T −1 w′T EF′ (FF′ )−1 FE′ wT = Op (T −2 ∥FE′ wT ∥2 ). It then follows from Lemma B.2. 1 (ii) It follows from |d4 | ≤ ∥ 6u − 6u ∥∥wT ∥2 ≤ λ− min (6u )∥6u − 6u ∥w′T 6u wT and Lemma B.3. Lemma B.5. (i) E (w′T Bft )4 = O(1) and E (w′T Rt )4 = O(1); (ii) γT (0) = O(1) and γf (0) = O(1).
max Efjt4 ≤ max j ≤K
= tr[E (FE (E′ wT w′T E|F)F′ )] = tr[E (FE (E′ wT w′T E)F′ )]. Note that E (E′ wT w′T E) = (E [u′t wT w′T us ])t ≤t ,s≤T = (cov(w′T ut , w′T us ))t ≤t ,s≤T . By Davydov’s inequality, (see, e.g., Proposition 2.5 of Fan and Yao, 2003, with p = 1/2 and q = 1/4), 1/2 |cov(w′T ut , w′T us )| ≤ 8αf (|t − s|)1/4 (w′T 6u wT ) (E |w′T ut |4 )1/4 , ∞ where α(·) denotes the α -mixing coefficient. By t =1 αf (t )1/4 < ∞, we have cov(w′T ut , w′T us )E (fkt fks )
k=1 t =1 s=1
αf (|t − s|)1/4
t =1 s=1
= O(T (w′T 6u wT )1/2 (E |w′T ut |4 )1/4 ),
E (wT Bft )4 ≤ ∥wT B∥4 E ∥ft ∥4 ≤ ∥wT ∥41 max |bij |4 O(1) = O(1). i,j
E (w′T Rt )4 = E
≤
2 T
2 T
=
∥ 6u − 6u ∥ = Op sN
log N T
wT ,i wT ,j wT ,k wT ,l ERit Rjt Rkt Rlt
i,j,k,l≤N
≤ max |ERit Rjt Rkt Rlt |∥wT ∥41 ≤ max ER4jt ∥wT ∥41 = O(1). i,j,k,l
j ≤N
(ii) The result follows from (i) and by observing that γT (0) = var((w′T Rt )2 ) ≤ E (w′T Rt )4 and that γf (0) = var((w′T Bft )2 ) ≤ E (w′T Bft )4 . Lemma B.6.
∞
h=1
|γf (h)| < ∞, and
K
j =1
(
.
h=1
|γT (h)| < ∞.
Proof. By Davydov’s inequality (Proposition 2.5 of Fan and Yao, 2003, with p = 1/2 and q = 1/4), there are constants M1 , M2 > 0 such that for any integer h,
|γf (|h|)| ≤ 8αf (|h|)1/4 (E (w′T Bft )2 )1/2 (E (w′T Bft )4 )1/4 = M2 exp(−M |h|r3 /4) where the last equality follows from the α -mixing condition as well as the fact that E (w′T Bft )4 = O(1) dueto ∥wT ∥1 = O(1) ∞ (Lemma B.5). The first result the follows from h=1 exp(−Chr3 ) < ∞ ∞ for any C , r3 > 0. The proof of h=1 |γT (h)| < ∞ follows from the same arguments.
T /σf2 d1 →d N (0, 1).
T
N
i =1
wi bij )2 ≤
1/2−q/2
∞
through dim(wT ) = NT . Hence d1 = T −1 t =1 ZT ,t . Note that ∥w′T B∥2 ≤ K ∥B∥2max ∥wT ∥21 = O(1). Hence EZT2,1 = O(1).
Lemma B.3. For the factor-based thresholded error covariance matrix,
wT ,j Rjt
We define B2f ,K = var( t =1 ZT ,t ) and B2f = var( t =1 ZT ,t ) = O(T ). By the assumption that αf (T ) = o(γf (0)), the correlation | Corr(ZT ,t , ZT ,t +T )| ≤ |γf (T )|/γf (0) = o(1). Moreover, the Lindeberg condition can be verified by the same lines as in the proof of Lemma A.6. Hence the conditions of Theorem 2.1 of Peligrad (1996) are satisfied, which implies
K
Now write B = (bij )i≤N ,j≤K , then ∥w′T B∥2 =
maxi,j |bij |K ∥wT ∥ = O(1).
4
Proof. Let ZT ,t = w′T B(ft f′t − Eft f′t )B′ wT , which depends on T
|w′T BFE′ wT |
∥w′T B∥∥FE′ wT ∥.
2 1
N j =1
Lemma B.7.
which then implies (i). For part (ii), we have
(ft )(FF′ )−1 FE′ wT | = |d2 | = 2|w′T B cov
exp(−(s/b2 )r2 )ds < ∞.
x
E ∥FE′ wT ∥2 = E [tr(w′T EF′ FE′ wT )] = tr[E (FE′ wT w′T EF′ )]
T T
In addition,
Proof. We have,
= O(1)(w′T 6u wT )1/2 (E |w′T ut |4 )1/4
P (|fjt |4 > x)dx ≤
We have, because the dimension of ft is fixed,
Lemma B.2. (i) ∥FE′ wT ∥ = Op (T 1/2 (w′T 6u wT )1/4 (E |w′T ut |4 )1/8 ). (ii) |d2 | = Op (T −1/2 (w′T 6u wT )1/4 (E |w′T ut |4 )1/8 ).
r T T
∞
j ≤K
d4 = w′T ( 6u − 6u )wT .
We now study each of the above four terms separately. Let E = (u1 , . . . , uT ) be N × T . Then CT = EF′ (FF′ )−1 .
E ∥FE′ wT ∥2 =
17
Proof. (i) For all s > 0, P (|fjt | > s) ≤ exp(−(s/b2 )r2 ) implies
B.2. Proof of Theorem 4.2
d1 = w′T BDT B′ wT ;
–
Proof. By Lemma 3.1 in Fan et al. (2011), we have, maxi≤N T −1 T uit − uit )2 = Op (log N /T ). The result then follows from t =1 (ˆ Theorem A.1 in Fan et al. (2013).
b1 + b2 ≤ 2L max | γf (h) − γf (h)|
lim
)
1 B− f
T t =1
ZT ,t →d N (0, 1).
T
(B.1)
18
J. Fan et al. / Journal of Econometrics (
Now B2f = T γf (0) + 2T
h=1 γf (h) − 2T h=1 hγf (h)/T . Because T ∞ h γ ( h )/ T = o (γ ( 0 )+ 2 γ ( h )) , we have T −1/2 (σf2 )−1/2 f f h=1 f hT=1 d t =1 ZT ,t → N (0, 1).
T
T
T
wT ( 6f − 6)wT →d N (0, 1). 2 ′
(B.2)
σf
–
√
K × T matrix such that the rows of F/ T are the eigenvectors corresponding to the r largest eigenvalues of the T × T matrix R′ R. Let B = R F′ /T . Define a K × K matrix H=
Lemma B.8.
)
1 −1 ′ ′ FF B B. V
T
According to Stock and Watson (2002), B and ft can be treated as estimators of BH−1 and Hft respectively. C.1. Proof of Lemma 4.3
Proof. In fact,
T
wT ( 6f − 6)wT = ′
σf2
By Lemma B.7, it suffices to show that
T
d + σf2 1
T
σf2
(d2 + d3 + d4 ).
T /σ (d2 +d3 +d4 ) = op (1). 2 f
T /σf2 |d2 | = Op ((w′T 6u wT )1/4 (E |w′T ut |4 )1/8 )/ σf2
= Op ((w′T 6u wT )1/4 / σf2 ) = op (1) since E |w′T ut |4
O(1). Lemma B.4 implies
=
T /σf2 |d3 |
=
Op ((w′T 6u wT )1/2 (T σf2 )−1/2 ) = op (1) since w′T 6u wT = o(σf4 ) = O(1). It also follows from Lemma B.4 that 2 −1/2 f
(wT 6u wT sN (σ ) ′
desired result.
(log N )
1/2−q/2
T
q/2
T
T /σf2 |d4 | = Op
) = op (1). This implies the
ft − Hft = (V/N )−1
= Op ( logTNT ). Note that T σf2 /B2f → 1, we have
T
≤
√ ≤ Op
T
σf
log NT
T 1
op (σf (log NT )−4/r1 ) = op (1).
The first statement var
T t =1
(w′T Bft )2
−1/2
(B.3)
′T ( Tw 6f − 6)
T →d N (0, 1) then follows from (B.2) and (B.3). They also yield w
T
σf2
′T ( w 6f − 6) wT →d N (0, 1),
and wT ( 6f − 6) wT = Op (T
′
−1/2
(B.4)
σ ). Moreover, since |σ − σ |= 2 f
2 f
2 f
op (σ ), 2 f
√
T | w′T ( 6f − 6) wT ||σf
−1
√
T | wT ( 6f − 6) wT |
− σf | −1
which gives the H-CLUB.
T t =1
+
T s=1
T 1 fs ηst + fs ξst
(C.1)
T s=1
T 1
T s=1
T 1
T t =1
2 fˆis ζst
+
T 1
T t =1
2
T 1
fˆis ξst
T s =1
T 1
T s =1
1
= Op
N
2 fˆis ηst
.
Moreover, by Lemma C.9 of Fan et al. (2013), maxi≤r T1 t =1 (ft − 2 Hft )i = Op (1/T + 1/N ). Applying the inequality (a + b)2 ≤ 2a2 + 2b2 gives, T
T 1
T t =1
≤
| σf2 − σf2 |
= op (1). σf2 (1 + op (1)) ′T ( It then follows from (B.4) that T / σf2 w 6f − 6) wT →d N (0, 1), =
′
T s=1
N
T
T s=1
T 1 fs E (u′s ut )/N + fs ζst
where ζst = u′s ut /N − E (u′s ut )/N, ηst = f′s i=1 bi uit /N, and ξst = f′t pi=1 bi uis /N. It follows from Lemma C.7 in Fan et al. (2013) that
∥ 6f − 6∥max ∥ wT − w∥1 (∥ w∥1 + ∥w∥1 )
Bf
T 1
T 1
+
[ w′T ( 6f − 6) wT − w′T ( 6f − 6)wT ]
Bf
=
Proof. (i) By Lemma B.16 in an earlier version of Fan et al. (2013),1 B∥max ≤ ∥ B − BH−1 ∥max + ∥B∥max = Op (1). Thus ∥w′T B∥2 ≤ ∥ r ∥ B∥2max ∥wT ∥21 = Op (1). On the other hand, ∥w′T ( B − BH−1 )∥2 ≤ r ∥ B − B∥2max ∥wT ∥21 = Op (1/N + log N /T ). (ii) By (A.1) in Bai (2003), the following identity holds:
Proof of Theorem 4.2. By Theorem 3.2 of Fan et al. (2011), ∥ 6f −
6∥max
Op (1), and ∥w′T ( B − BH−1 )∥
T (ii) ∥ F − HF∥2 /T = T −1 t =1 ∥ ft − Hft ∥2 = Op (N −1 + T −2 ). ′ 2 (iii) ∥wT E∥ = Op (T ). f′t − Hft (Hft )′ ]∥ = Op (N −1/2 + T −1 ). (iv) ∥T −1 Tt=1 [ ft
By Lemma B.2,
Lemma C.1. (i) ∥w′T B∥ = Op (N −1/2 + (log N /T )1/2 )
≤
T 1
T s=1
T 1
T t =1 T 2
T 2
T t =1
2
T 1
T s =1
T t =1
+
2 fˆis E (u′s ut )/N
[|( fs − Hfs )i | + |(Hfs )i |]|E (u′s ut )|/N
T 1
T s =1
2 |( fs − Hfs )i ||E (us ut )|/N
T 1
T s =1
′
2 |(Hfs )i ||E (us ut )|/N ′
.
Appendix C. Proofs for the POET-based estimation Let V denote the K × K diagonal matrix of the first K largest eigenvalues of S in decreasing order. Let F = ( f1 , . . . , fT ) be a
1 Downloadable from html.
http://terpconnect.umd.edu/∼yuanliao/factor2/factor2.
J. Fan et al. / Journal of Econometrics (
By the Cauchy–Schwarz inequality and that maxt ≤T /N |2 = O(1), T 2
T t =1
T 1
T s=1
≤ max i ≤r
T
s=1
|E (u′s ut )
2
T s =1
T s=1
T
1 t =1 T
T t =1
=
(
T
NT
|(Hfs )i ||E (u′s ut )|/N )2 ≤ Op (T −1 ) t =1 ( T1 T 1 T ′ ′ 2 −1 s=1 ∥fs ∥|E (us ut )| s=1 ∥fs ∥|E (us ut )|/N ) . We have T t =1 ( T 2 −2 /N ) = Op (T ) since 2 T T 1 1 E ∥fs ∥|E (u′s ut )|/N 2 T
Also,
T
T
T
s=1
T s=1
T T 1
T 2 s=1 l=1
E ∥fs ∥∥fl ∥
≤ max E ∥fs ∥2 max s≤T
s≤T
′
′
N
N
T s=1
|Eu′s ut |/N
T
(log T )2/r2 ∥ wT − wT ∥1 ).
−2
L+log N T
+
√1 N
+
ai , where
a2 = |(w′T B B′ wT )2 − (w′T BB′ wT )2 |,
a8
T −h ′ ′ −1 ′ 2 = wT B B wT max wT B B wT − T (wT B ft ) , h ≤L t =1 T −h ′ ′ ′ ′ −1 ′ 2 = wT B B wT max wT B B wT − T (wT B ft +h ) h ≤L t =1 T − h −1 ′ 2 ′ 2 ′ 2 ′ 2 = max T ( wT B ft +h ) ( wT B ft ) − (wT B ft +h ) (wT B ft ) , h≤L t =1 T −h T ) = maxT −1 ( w′T B ft +h )2 ( w′T B B′ w h ≤L t =1 − (w′T B ft +h )2 (w′T B B′ wT ) T −h −1 ′ 2 ′ ′ T ) − (w′T = max T ( wT B ft ) ( wT BB w B ft )2 (w′T B B′ wT ) h≤L t =1 T −h T )2 − (w′T = max T −1 ( w′T B B′ w B B′ wT )2 . h≤L t =1 ′ ′
Here a1 is bounded by a11 + a12 , where a11 = maxh≤L |T −1 ′ ′ ′ ′ 2 2 2 2 t =1 (wT Bft +h ) (wT Bft ) − E (wT Bft ) (wT Bft +h ) |, and a12 =
T −h
maxh≤L |T −1
T −h t =1
= Op ∥w′T ( B − BH−1 )∥ + ∥w′T BH−1 ∥
T −1
T
1/2 . ∥ ft − Hft ∥2
It follows from Lemma C.1 that a12 = Op ( log N /T + N −1/2 ), √ thus a1 = Op ( (L + log N )/T + N −1/2 ). On the other hand, for g1 , . . . , g5 defined in (C.2), a2 = Op (|wT (B B − BB )wT |) = Op ′
′
′
5
|gi |
i=1
T −h −1 ′ 2 ′ 2 ′ 2 ′ 2 a1 = maxT (wT B ft +h ) (wT B ft ) − E (wT Bft ) (wT Bft +h ) , h≤L t =1
a7
t =1
t =1
Proof. The triangular inequality implies maxh≤L | γP (h) − γf (h)| ≤
a6
1/2
√
−1
a5
+
T
t =1
T −h T −1 (w′T B ft − w′T Bft )2
= O(T −2 ).
Lemma C.2. maxh≤L | γP (h) − γf (h)| = Op (
a4
×
Hft 2i −1 −2
Cauchy–Schwarz inequality and part (ii).
a3
T
h≤L
2
) = Op (N + T ), and + T ). T N (iii) E ∥w′T E∥2 = E t =1 ( i=1 wi uit )2 = T maxi,j |Euit ujt |∥wT ∥21 = O(T ). Thus ∥w′T E∥2 = Op (T ). Finally, (iv) follows from the
i=1
T −h −1 ′ ′ ′ ′ = Op max T (wT B ft +h )(wT B ft ) − (wT Bft +h )(wT Bft ) h≤L t =1 1/2 T −h = Op max T −1 (w′ B ft +h − w′ Bft +h )2
This implies maxi≤r T t =1 (ft − T −1 2 thus T t =1 ∥ft − Hft ∥ = Op (N
8
As in the proof of Lemma B.1, a11 the Cauchy–Schwarz inequality that
|Eus ut | |Eul ut |
T 1
−1
19
√ = Op ( L/T ). It follows from
T 1 1 ′ 2 + . (|Eus ut |/N ) = Op 2
1 ( fs − Hfs )2i
–
a12
|( fs − Hfs )i ||E (u′s ut )|/N
T 2
)
(w′T B ft +h )2 (w′T B ft )2 − (w′T Bft +h )2 (w′T Bft )2 |.
σf2 /T = op L/T . T ′ 1 ′ ′ a3 = Op (1) max wT B ft ft B wT h≤L T t =T −h +1 T 1 = Op ∥ ft ∥2 . = Op
T t =T −L+1
T 2 2 t =T −L+1 ∥ft − Hft ∥ + T T 2 2 t =T −L+1 ∥Hft ∥ . On one hand, E T t =T −L+1 ∥ft ∥ = O(L/T ). On the other hand, by Theorem 3.3 in Fan et al. (2013), maxt ∥ ft − T 2 ∥ f − Hf ∥ = O ( L / T ) . Thus Hft ∥ = op (1), hence T2 t t p t =T −L+1 a3 = Op (L/T ). Similarly, we have a4 = Op (L/T ). Moreover, by Corollary 3.1 of Fan et al. (2013), maxi,t | b′i ft | ≤ ′ 1/r2 ′ op (1)+ maxi,t |b ft | = Op ((log T ) ), and ∥ B B ∥max = Op (1). Thus
We have
T
1 T
T
t =T −L+1 2
∥ ft ∥2 ≤
2 T
i
it follows from the same lines of proofs as those in Lemma B.1 that a5 + · · · + a8 = Op ((log T )2/r2 ∥ wT − wT ∥1 ). Proof of Lemma 4.3. We have | σP2 − σf2 | ≤ | γP (0) − γf (0)|,
b2 = 2
L (1 − h/L)| γP (h) − γf (h)|,
L 2
L h=1
i=1
b3 = 2
bi , where b1 =
γf (h),
h >L
h=1
b4 =
3
hγf (h).
By Lemma C.2, b1 + b2 ≤ 2L maxh≤L | γP (h) − γf (h)|, which is
√
√
Op (L (L + log N )/T + L(log T )4/r2 ∥ wT − wT ∥1 + L/ N ) = op (σf2 ). Hence the proof is the same as that of Lemma 4.2.
20
J. Fan et al. / Journal of Econometrics (
Therefore, var
If we write CT = B − BH−1 and D T = T −1 then w′T ( 6P , K − 6)wT =
g5 = w′T BH−1
i=1
T
′
t =1 ft ft
− cov(ft ),
gi , where
T t =1
′ [ ft f′t − Hft (Hft )′ ]H−1 B′ wT .
t =1
(w′T Bft )2
′T ( Tw 6P , wT →d K − 6)
′T ( w 6P , wT →d N (0, 1). K − 6)
(C.2)
(C.4)
T | w′T ( 6P , wT ||σf−1 − σP−1 | K − 6)
| σP2 − σf2 |
√
T /σf2 g1 →d N (0, 1). We first show that
from Lemma B.7 that
−1/2
Moreover, since |σf2 − σP2 | = op (σf2 ),
Recall the definition of d1 in Appendix B.2, g1 = d1 . Thus it follows
T
σf2 √
T 1
T
N (0, 1), and
′
T
T
5
B′ wT , g2 = wT CT g4 = w′T ( − 6u )wT ,
g1 = wT B DT B′ wT , C′ wT , g3 = w′ BH−1
–
C.2. Proof of Theorem 4.3
′
)
T /σf2 gi are asymptotically negligible for i = 2, . . . , 5. These
= op (1). σf2 (1 + op (1)) ′T ( σP2 w 6P , wT →d N (0, 1), It then follows from (C.4) that T / K −6) =
T | w′T ( 6P , wT | K − 6)
which validates the H-CLUB.
results are given in the following lemma. Lemma C.3. (i) |g2 | = Op (T −1/2 (w′T 6u wT )1/4 + N −1/2 + T −1 ), (ii) |g3 | = Op (T −1/2 (w′T 6u wT )1/4 + N −1/2 + T −1 ). (iii) |g5 | = Op (N −1/2 + T −1 ). Proof. Using the facts that R = BF + E, B = R F′ /T and F F′ /T = IK , we have
B − BH−1 = BH−1 (HF − F) F′ /T + E( F − HF)′ /T + EF′ H′ /T .
Lemma C.1.
Lemma C.4. Suppose ft and ut are independent, and Eft = Eut = 0, then
T T T ′ 2 ′ 2 ′ 2 var (wT Rt ) = var (wT Bft ) + var (wT ut ) t =1
−1
Proof of Theorem 4.3. By Lemma C.3 and the assumption that T σf2 → ∞, σf2 N /T → ∞ and w′T 6u wT = o(σf4 ), we have
The theorem follows from the following lemma.
t =1
Thus g2 = g21 + g22 + g23 , where g21 = wT BH (HF − F) F′ /T B′ wT , ′ ′ ′ ′ ′ ′ ′ g22 = wT E(F − HF) /T B wT , and g23 = wT EF H /T B wT . It was shown by Fan et al. (2013) that ∥H∥ = Op (1) = ∥H−1 ∥. Thus by Lemma C.1, |g21 | ≤ Op (1)∥HF − F∥∥ F∥/T = Op ( 1/N + 1/T 2 ). √ In addition, |g22 | ≤ Op ( T )∥ F − HF∥/T = Op ( 1/N + 1/T 2 ). Finally, by Lemma B.2 , |g23 | ≤ ∥w′T EF′ ∥Op (1/T ) = Op (T −1/2 (w′T 6u wT )1/4 ). The proof for the convergence rate of |g3 | is the same, so is omitted. Finally, the rate of convergence for |g5 | follows from ′
C.3. Proof of Theorem 4.4
T /σf2 gi = op (1) for i = 2, 3, 5. In addition, by Theorem 3.1 of Fan
+ var 2
∥ − 6u ∥ = Op sN
log N T
which then implies that
+
1
1/2−q/2
N
t =1
T = var (w′T Bft )2 + (w′T ut )2 + 2w′T Bft w′T ut t =1
T T ′ 2 ′ 2 (wT Bft ) + var = var (wT ut ) t =1
, + 2 cov
d T /σf2 w′T ( 6P , K − 6)wT → N
t =1
∥ 6P , wT − w∥1 (∥ w∥1 + ∥w∥1 ) K − 6∥max ∥ √
≤ Op
+ Op
log N
σf √
T
T
σf
√
N
op (σf (log NT )
−4/r1
2
T
′
′
wT Bft wT ut
.
t =1
t =1
s,t
cov[(wT Bfs )2 , w′T Bft w′T ut ] ′
E (w′T Bfs )2 w′T Bft w′T ut =
s ,t
E (w′T Bfs )2 w′T Bft Ew′T ut = 0.
′
Finally, as Eft = 0 implies EwT Bft = 0, we have
)
T T ′ 2 ′ ′ cov (wT ut ) , wT Bft wT ut
t =1
op (σf (log NT )−4/r1 ) = op (1).
′
2
s≤T ,t ≤T
=
T
(wT Bft ) + (wT ut ) , 2 ′
T T ′ 2 ′ ′ cov (wT Bft ) , wT Bft wT ut =
Bf
T
It suffices to show the covariance term is zero. In fact, since Ew′T Bft w′T ut = 0,
[ wT ( 6P , wT − w′T ( 6P , K − 6) K − 6)wT ] ≤
w′T Bft w′T ut
t =1
′
T
t =1
unknown factors,
Bf
t =1
T
+ var 2
(0, 1). By Theorem 3.2 of Fan et al. (2013), ∥ 6P , K − 6∥max = log N 1 √ Op ( + N ). Due to N /T → ∞ is assumed in the case of T T
T /σf2 g4 = op (1). It then follows from
.
T ′ 2 var (wT Rt )
Assumption 4.6 and Lemma B.7,
w′T Bft w′T ut
Proof. Since Rt = Bft + ut , and ft and ut are independent,
t =1
et al. (2013),
t =1
T
(C.3)
=
s≤T ,t ≤T
t =1
cov[(w′T us )2 , w′T Bft w′T ut ]
J. Fan et al. / Journal of Econometrics (
=
s,t
=
s,t
E (w′T us )2 w′T Bft w′T ut E (w′T Bft )E (w′T us )2 w′T ut = 0.
References Ahn, S., Horenstein, A., 2013. Eigenvalue ratio test for the number of factors. Econometrica 81, 1203–1227. Andrews, D., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–858. Antoine, B., 2011. Portfolio Selection with estimation risk: a test-based approach. J. Financ. Econom. 10, 164–197. Antoniadis, A., Fan, J., 2001. Regularized wavelet approximations. J. Amer. Statist. Assoc. 96, 939–967. Bai, J., 2003. Inferential theory for factor models of large dimensions. Econometrica 71, 135–171. Bai, J., Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica 70, 191–221. Bannouh, K., Martens, M., Oomen, R., Dijk, D., 2012, Realized mixed-frequency factor models for vast dimensional covariance estimation. Manuscript. Bianchi, D., Carvalho, C., 2011, Risk assessment in large portfolios: why imposing the wrong constraints hurts. Manuscript. Brandt, M., Santa-Clara, P., Valkanov, R., 2009. Covariance regularization by parametric portfolio policie: exploiting characteristics in the cross-section of equity returns. Rev. Financ. Stud. 22, 3411–3447. Brodie, J., Daubechies, I., Mol, C., Giannone, D., Loris, I., 2009. Sparse and stable Markowitz portfolios. PNAS 106, 12267–12272. Cai, T., Liu, W., 2011. Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 106, 672–684. Chamberlain, G., Rothschild, M., 1983. Arbitrage, factor structure and mean–variance analyssi in large asset markets. Econometrica 51, 1305–1324. Chang, C., Tsay, R., 2010. Estimation of covariance matrix via the sparse Cholesky factor with lasso. J. Statist. Plann. Inference 140, 3858–3873. Chudik, A., Pesaran, H., Tosetti, E., 2011. Weak and strong cross section dependence and estimation of large panels. Econom. J. 14, C45–C90. Connor, G., Korajczyk, R., 1993. A Test for the number of factors in an approximate factor model. J. Finance 48, 1263–1291. DeMiguel, V., Garlappi, L., Nogales, F., Uppal, R., 2009a. A generalized approach to portfolio optimization: improving performance by constraining portfolio norms. Manage. Sci. 55, 798–812. DeMiguel, V., Garlappi, L., Nogales, F., Uppal, R., 2009b. Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? Rev. Financ. Stud. 22, 1915–1953. El Karoui, N., 2010. High-dimensionality effects in the Markowitz problem and other quadratic programs with linear constraints: Risk underestimation. Ann. Statist. 38, 3487–3566.
)
–
21
Fama, E., French, K., 1992. The cross-section of expected stock returns. J. Finance 47, 427–465. Fan, J., Fan, Y., Lv, J., 2008. High dimensional covariance matrix estimation using a factor model. J. Econometrics 147, 186–197. Fan, J., Liao, Y., Mincheva, M., 2011. High dimensional covariance matrix estimation in approximate factor models. Ann. Statist 39, 3320–3356. Fan, J., Liao, Y., Mincheva, M., 2013. Large covariance estimation by thresholding principal orthogonal complements (with discussion). J. R. Stat. Soc. Ser. B 75, 603–680. Fan, J., Yao, Q., 2003. Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag, New York, p. 576. Fan, J., Zhang, J., Yu, K., 2012. Asset Allocation and Risk Assessment with Gross Exposure Constraints for Vast Portfolios. J. Amer. Statist. Assoc. 107, 592–606. Fryzlewicz, P., 2012. High-dimensional volatility matrix estimation via wavelets and thresholding. Biometrika 100, 921–938. Gagliardini, P., Ossola, E., Scaillet, O., 2010. Time-varying risk premium in large cross-sectional equidity datasets. Swiss Financ. Inst. Occasional Paper Ser. 11–40. Gandy, A., Veraart, L., 2012. The effect of estimation in high-dimensional portfolios. Math. Finance 23, 531–559. Gomez, K., Gallon, S., 2011. Comparison among high dimensional covariance matrix estimation methods. Revista Columb. Estad. 34, 567–588. Jacquier, E., Polson, N., 2010, Simulation-based-estimation in portfolio selection. Manuscript. Jagannathan, R., Ma, T., 2003. Risk reduction in large portfolios: why imposing the wrong constraints helps. J. Finance 54, 1651–1683. Lai, T., Xing, H., Chen, Z., 2011. Mean–variance portfolio optimization when means and covariances are unknown. Ann. Appl. Statist. 5, 798–823. Ledoit, O., Wolf, M., 2003. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J. Empirical Financ. 10, 603–621. Merlevède, F., Peligrad, M., Rio, E., 2011. A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probab. Theory Related Fields 151, 435–474. Newey, W., West, K., 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708. Newey, W., West, K., 1994. Automatic lag selection in covariance matrix estimation. Rev. Econom. Stud. 61, 631–654. Onatski, A., 2010. Determining the number of factors from empirical distribution of eigenvalues. Rev. Econ. Statist. 92, 1004–1016. Peligrad, M., 1996. On the asymptotic normality of sequences of weak dependent random variables. J. Theoret. Probab. 9, 703–715. Pesaran, M.H., Zaffaroni, P., 2008, Optimal asset allocation with factor models for large portfolios, Cesifo working paper 2326. Rothman, A., Levina, E., Zhu, J., 2009. Generalized thresholding of large covariance matrices. J. Amer. Statist. Assoc. 104, 177–186. Silverman, B., 1986. Density Estimation for Statistics and Data Analysis. Chapman and Hall, London. Stock, J., Watson, M., 2002. Forecasting using principal components from a large number of predictors. J. Amer. Statist. Assoc. 97, 1167–1179. Yen, Y., 2013, Sparse weighted norm minimum variance portfolio. Manuscript.