Finance Research Letters 5 (2008) 236–244
Finance Research Letters www.elsevier.com/locate/frl
Estimation error in the average correlation of security returns and shrinkage estimation of covariance and correlation matrices

Clarence C.Y. Kwan
DeGroote School of Business, McMaster University, Hamilton, Ontario L8S 4M4, Canada
Article info
Article history: Received 8 May 2008; accepted 4 September 2008; available online 10 September 2008
JEL classification: C13, C19, G10, G11
Keywords: Portfolio choice; Covariance and correlation matrices; Shrinkage estimation

Abstract
The correlation matrix of security returns is an important input component for mean–variance portfolio analysis. This study uses the average of sample correlations to estimate the correlation matrix and derives an expression of its estimation error in terms of sampling variance. This study then considers the impact of such estimation error on shrinkage estimation, where a weighted average is sought between the sample covariance matrix and an average correlation target, and between the sample correlation matrix and the target. An illustrative example using monthly returns of the current Dow Jones stocks is provided. © 2008 Elsevier Inc. All rights reserved.
1. Introduction

Since the pioneering work by Markowitz (1952), mean–variance optimization has remained an important quantitative tool to guide investment decisions. Implementing the mean–variance approach on a given set of risky securities requires estimates of their expected returns, variances of returns, and pairwise correlations of returns. If security analysts' insights are required to provide or revise the individual estimates, generating the correlation data for a portfolio of reasonable size is a nearly impossible task, considering that the number of pairwise correlations is on the order of thousands (a portfolio of 100 securities already involves 4950 distinct pairs). As Elton et al. (1978) remark, "while analysts may be capable of providing estimates of returns and variances, the development of estimates of correlation coefficients from anything other than models utilizing historical data is highly unlikely" (p. 1375). Elton et al. (2006) offer a similar assessment regarding the estimation of correlations.
E-mail address: [email protected].
doi:10.1016/j.frl.2008.09.001
C.C.Y. Kwan / Finance Research Letters 5 (2008) 236–244
Conditional heteroskedasticity being an important feature of financial time series, multivariate GARCH models are natural choices for estimating covariances and correlations of security returns. The estimated covariance matrix must be positive definite; this ensures its invertibility and a unique allocation of investment funds for each efficient portfolio. As reported in the survey articles by Bauwens et al. (2006) and Silvennoinen and Teräsvirta (2008), multivariate GARCH estimation involves systems of equations, with the number of parameters increasing quadratically with the number of variables considered. Therefore, it is crucial for a model to provide realistic but parsimonious specifications that also ensure positive definiteness of the covariance matrix. However, the positive definiteness requirement imposes constraints on some parameters, thus reducing the flexibility of the models involved. As the true covariances and correlations are unobservable and their proxies are model dependent, it is not immediately clear which model is most suitable for their estimation.¹

Ledoit et al. (2003) offer a novel approach to generate time-varying covariance matrices without imposing unrealistic restrictions on the model structure. Their approach reduces the computational burden by estimating pairwise covariances as the first step of the estimation procedure. In their application, each conditional covariance matrix of weekly returns for seven international market indices is estimated from time series of at least 600 weekly return observations. This number is about an order of magnitude greater than what is usually needed for unconditional covariance matrix estimation for the same number of securities. Therefore, to estimate a high-dimensional correlation matrix for practical portfolio analysis, unconditional estimation still appears to be a viable alternative.
To allow also for financial analysts' input for individual variances of returns, a practical question is whether there is an unconditional estimation method for the correlation matrix that dominates others in the same category in terms of forecasting accuracy. As Elton and Gruber (1973) and Elton et al. (1978) observe, the average of sample correlations outperforms various models in terms of average absolute error. Such models include cases where security returns are driven by some market or group returns and where forecasts of individual pairwise correlations are based on their historical values. Chan et al. (1999) provide further support for the average correlation: among various unconditional estimation methods, the sample average has the lowest average absolute error in forecasting correlations. Elton et al. (2006) refine the estimation of correlations based on different ways to utilize their historical averages.

Averaging the sample correlations, however, discards any meaningful differences among the observed correlations. A statistical method called shrinkage estimation offers a trade-off: it pulls the observed correlations, including both outliers and meaningful cases, toward some moderate value. The average of sample correlations can be used to provide such a value. Ledoit and Wolf (2004) implement the shrinkage approach by optimizing the weighted average of the sample covariance matrix and a target matrix, which is a structured covariance matrix where the correlation of returns between any two different securities is the sample average correlation. Minimization of a quadratic loss function leads to an explicit expression for the optimal weight on the target, called the shrinkage intensity. For analytical convenience, the Ledoit and Wolf study relies on some asymptotic statistical properties; the time series of security returns are assumed to be long enough to render estimation error in the average correlation negligible.²
¹ For example, Silvennoinen and Teräsvirta observe that, for a bivariate case of daily returns on stock and bond index futures over a 164-month sample period, the implied conditional correlations vary across different models. For security analysts and researchers who rely on commercial software to estimate multivariate GARCH models, there is an additional concern. Brooks et al. (2003) report that, for a bivariate GARCH model estimated with 3580 daily observations of spot and futures returns of a market index, the parameter estimates vary across different software packages, with each showing a different characterization of the same data.
² Besides portfolio investment settings, the Ledoit and Wolf shrinkage method has also appeared in biometric settings, where the estimation of large-scale correlation and covariance matrices is often based on small numbers of available observations. For example, in a study to infer correlations from genomic data, Schäfer and Strimmer (2005) present the shrinkage approach in various formulations, one of which is the Ledoit and Wolf case. Although asymptotic statistical properties do not apply to samples with small numbers of observations, the Ledoit and Wolf assumption of the absence of estimation error in the average correlation is still retained there for analytical convenience.
Estimation error in the correlation of two variables can be captured by its sampling variance. However, as the correlation, written in terms of the variances and the covariance of the two variables, is in an inconvenient analytical form for statistical purposes, it is difficult to derive an expression for the sampling variance of their correlation without resorting to approximations. Approximate expressions, based on a first-order Taylor expansion of the sample correlation, are available in the statistics literature (see, for example, Scheinberg, 1966 and Stuart and Ord, 1987, chapter 10). In particular, as will soon be clear, the expression in Scheinberg (1966, Eq. (2)), which uses the sampling variances and covariances of the two sample variances and the sample covariance to compute the sampling variance of the correlation, is suitable for extension to the case of the average correlation.

In portfolio investment settings, while it is useful to account for estimation error in the average correlation in order to improve the quality of the covariance matrix, it is also useful to apply the shrinkage approach to the correlation matrix directly. A practical reason is that, if the individual expected returns and variances of returns used as inputs for portfolio analysis are based, in whole or in part, on the insights of the security analysts involved, the correlation matrix is the remaining input whose estimation requires models utilizing historical data. In that case, although the correlation matrix can still be deduced from the shrinkage results for the covariance matrix, estimating the correlation matrix directly is more straightforward.

The rest of this paper is organized as follows. Section 2 derives an expression for the sampling variance of the average correlation.
By recognizing the presence of estimation error in the average correlation for a finite sample, Section 3 refines the Ledoit and Wolf expression of the optimal shrinkage intensity for the covariance matrix. Section 4 considers shrinkage estimation of the correlation matrix instead, where the target is a structured correlation matrix based on the average correlation. The analytical results are illustrated in Section 5 with monthly return data of the 30 U.S. stocks currently included in the Dow Jones Industrial Average. Finally, Section 6 provides some concluding remarks.

2. Estimation error in the average correlation
Given a set of $n$ securities with random returns $R_1, R_2, \ldots, R_n$, let $s_{ii} = \widehat{\mathrm{Var}}(R_i)$ and $s_{jj} = \widehat{\mathrm{Var}}(R_j)$ be the sample variances of securities $i$ and $j$, respectively, $s_{ij} = s_{ji} = \widehat{\mathrm{Cov}}(R_i, R_j)$ be their sample covariance, and $r_{ij} = r_{ji} = s_{ij}/\sqrt{s_{ii}s_{jj}}$ be their sample correlation, for $i, j = 1, 2, \ldots, n$. The average of all pairwise sample correlations of returns between different securities can be written as

$$\bar r = \frac{1}{n(n-1)} \sum_i \sum_{j \neq i} \frac{s_{ij}}{\sqrt{s_{ii} s_{jj}}}. \tag{1}$$

Here, $\sum_i$ and $\sum_{j \neq i}$ stand for $\sum_{i=1}^{n}$ and $\sum_{j=1,\, j \neq i}^{n}$, respectively, for notational simplicity. The sampling variance of $\bar r$ is

$$\widehat{\mathrm{Var}}(\bar r) = \frac{1}{n^2 (n-1)^2}\, \widehat{\mathrm{Var}}\left( \sum_i \sum_{j \neq i} \frac{s_{ij}}{\sqrt{s_{ii} s_{jj}}} \right). \tag{2}$$
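As a minimal sketch of Eq. (1), assuming the returns are held in a NumPy array with one row per period and one column per security, the average correlation can be computed from the unbiased sample covariance matrix. The function name `average_correlation` and the simulated data are hypothetical illustrations, not part of the paper.

```python
import numpy as np


def average_correlation(R: np.ndarray) -> float:
    """Eq. (1): mean of the n(n - 1) off-diagonal sample correlations.

    R is a T x n array of returns (rows = periods, columns = securities).
    """
    S = np.cov(R, rowvar=False)             # unbiased sample covariances s_ij
    d = np.sqrt(np.diag(S))                 # sample standard deviations
    corr = S / np.outer(d, d)               # r_ij = s_ij / sqrt(s_ii s_jj)
    n = corr.shape[0]
    off_diag = ~np.eye(n, dtype=bool)       # ordered pairs with j != i
    return float(corr[off_diag].sum() / (n * (n - 1)))


# Usage with simulated (not real) returns: 72 months, 5 securities.
rng = np.random.default_rng(0)
R = rng.standard_normal((72, 5))
r_bar = average_correlation(R)
```

Because $r_{ij} = r_{ji}$, averaging over the $n(n-1)$ ordered pairs gives the same value as averaging over the $n(n-1)/2$ distinct pairs in the lower triangle.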
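In order to derive an analytically tractable expression of $\widehat{\mathrm{Var}}(\bar r)$, we approximate $s_{ij}/\sqrt{s_{ii}s_{jj}}$ as a linear expression. The approach involved, commonly called the delta method, requires a first-order Taylor expansion of $s_{ij}/\sqrt{s_{ii}s_{jj}}$ around $s_{ij} = s^*_{ij}$, $s_{ii} = s^*_{ii}$, and $s_{jj} = s^*_{jj}$, the corresponding point estimates that the sample provides. It follows from the truncated Taylor series

$$\frac{s_{ij}}{\sqrt{s_{ii} s_{jj}}} = \frac{s^*_{ij}}{\sqrt{s^*_{ii} s^*_{jj}}} + \frac{1}{\sqrt{s^*_{ii} s^*_{jj}}}\,(s_{ij} - s^*_{ij}) - \frac{s^*_{ij}}{2 s^*_{ii} \sqrt{s^*_{ii} s^*_{jj}}}\,(s_{ii} - s^*_{ii}) - \frac{s^*_{ij}}{2 s^*_{jj} \sqrt{s^*_{ii} s^*_{jj}}}\,(s_{jj} - s^*_{jj}) \tag{3}$$

that

$$\widehat{\mathrm{Var}}(r_{ij}) = \frac{1}{s^*_{ii} s^*_{jj}}\, \widehat{\mathrm{Var}}(s_{ij}) + \frac{s^{*2}_{ij}}{4 s^{*3}_{ii} s^*_{jj}}\, \widehat{\mathrm{Var}}(s_{ii}) + \frac{s^{*2}_{ij}}{4 s^*_{ii} s^{*3}_{jj}}\, \widehat{\mathrm{Var}}(s_{jj}) - \frac{s^*_{ij}}{s^{*2}_{ii} s^*_{jj}}\, \widehat{\mathrm{Cov}}(s_{ij}, s_{ii}) - \frac{s^*_{ij}}{s^*_{ii} s^{*2}_{jj}}\, \widehat{\mathrm{Cov}}(s_{ij}, s_{jj}) + \frac{s^{*2}_{ij}}{2 s^{*2}_{ii} s^{*2}_{jj}}\, \widehat{\mathrm{Cov}}(s_{ii}, s_{jj}). \tag{4}$$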
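The delta method behind Eqs. (3) and (4) amounts to a quadratic form $g^\top V g$, where $g$ collects the three coefficients of Eq. (3) and $V$ is the sampling covariance matrix of $(s_{ij}, s_{ii}, s_{jj})$. The sketch below, with hypothetical function names and made-up input values, checks that this quadratic form reproduces the six terms of Eq. (4).

```python
import numpy as np


def var_corr_gradient(s_ij, s_ii, s_jj, V):
    """Delta-method variance of r_ij written as g' V g.

    V is the 3 x 3 sampling covariance matrix of (s_ij, s_ii, s_jj), in that
    order; s_ij, s_ii, s_jj are the point estimates s*_ij, s*_ii, s*_jj.
    """
    root = np.sqrt(s_ii * s_jj)
    g = np.array([1.0 / root,                     # coefficient of (s_ij - s*_ij)
                  -s_ij / (2.0 * s_ii * root),    # coefficient of (s_ii - s*_ii)
                  -s_ij / (2.0 * s_jj * root)])   # coefficient of (s_jj - s*_jj)
    return float(g @ V @ g)


def var_corr_eq4(s_ij, s_ii, s_jj, V):
    """The same quantity, written term by term as in Eq. (4)."""
    return float(V[0, 0] / (s_ii * s_jj)
                 + s_ij**2 * V[1, 1] / (4 * s_ii**3 * s_jj)
                 + s_ij**2 * V[2, 2] / (4 * s_ii * s_jj**3)
                 - s_ij * V[0, 1] / (s_ii**2 * s_jj)
                 - s_ij * V[0, 2] / (s_ii * s_jj**2)
                 + s_ij**2 * V[1, 2] / (2 * s_ii**2 * s_jj**2))


# Made-up point estimates and a positive definite V, for illustration only.
A = np.array([[0.10, 0.02, 0.01],
              [0.00, 0.15, 0.03],
              [0.00, 0.00, 0.12]])
V = A @ A.T
v_grad = var_corr_gradient(0.3, 1.2, 0.8, V)
v_eq4 = var_corr_eq4(0.3, 1.2, 0.8, V)
```

Writing the variance as $g^\top V g$ is convenient in code, since the same pattern extends to sums of many linearized correlations, as in Eq. (5).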
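Similarly, applying the expansion in Eq. (3) to every term in Eq. (2) and using the symmetry $s_{ij} = s_{ji}$, we obtain

$$\widehat{\mathrm{Var}}(\bar r) = \frac{1}{n^2 (n-1)^2} \sum_i \sum_{j \neq i} \sum_k \sum_{l \neq k} \frac{1}{\sqrt{s^*_{ii} s^*_{jj} s^*_{kk} s^*_{ll}}} \left[ \widehat{\mathrm{Cov}}(s_{ij}, s_{kl}) + \frac{s^*_{ij} s^*_{kl}}{s^*_{ii} s^*_{kk}}\, \widehat{\mathrm{Cov}}(s_{ii}, s_{kk}) - \frac{2 s^*_{ij}}{s^*_{ii}}\, \widehat{\mathrm{Cov}}(s_{ii}, s_{kl}) \right], \tag{5}$$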
where $\sum_k$ and $\sum_{l \neq k}$ represent $\sum_{k=1}^{n}$ and $\sum_{l=1,\, l \neq k}^{n}$, respectively.³

Drawing on Schäfer and Strimmer (2005) for the current setting, we estimate the various covariances in the following manner. For the $T$ observations $R_{i1}, R_{i2}, \ldots, R_{iT}$ of the random variable $R_i$, the sample mean is $\bar R_i = \frac{1}{T} \sum_{t=1}^{T} R_{it}$. Let $w_{ijt} = (R_{it} - \bar R_i)(R_{jt} - \bar R_j)$, for $i, j = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$. Let also $\bar w_{ij} = \frac{1}{T} \sum_{t=1}^{T} w_{ijt}$. The unbiased sample covariance of $R_i$ and $R_j$, including cases where $i = j$ and $i \neq j$, is

$$s_{ij} = \widehat{\mathrm{Cov}}(R_i, R_j) = \frac{T}{T-1}\, \bar w_{ij}. \tag{6}$$

The sampling variance of $s_{ij}$ is

$$\widehat{\mathrm{Var}}(s_{ij}) = \frac{T^2}{(T-1)^2}\, \widehat{\mathrm{Var}}(\bar w_{ij}). \tag{7}$$

Letting $w_{ij}$ be a random variable, whose sample mean based on the $T$ observations $w_{ij1}, w_{ij2}, \ldots, w_{ijT}$ is $\bar w_{ij}$, we have

$$\widehat{\mathrm{Var}}(\bar w_{ij}) = \frac{1}{T}\, \widehat{\mathrm{Var}}(w_{ij}) = \frac{1}{T(T-1)} \sum_t (w_{ijt} - \bar w_{ij})^2 \tag{8}$$

and

$$\widehat{\mathrm{Var}}(s_{ij}) = \frac{T}{(T-1)^3} \sum_t (w_{ijt} - \bar w_{ij})^2, \tag{9}$$

where $\sum_t$ represents $\sum_{t=1}^{T}$. More generally, we have

$$\widehat{\mathrm{Cov}}(s_{ij}, s_{kl}) = \frac{T}{(T-1)^3} \sum_t (w_{ijt} - \bar w_{ij})(w_{klt} - \bar w_{kl}), \quad \text{for } i, j, k, l = 1, 2, \ldots, n. \tag{10}$$
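Eqs. (5) through (10) translate directly into array operations. The sketch below is one possible NumPy implementation, assuming a complete $T \times n$ return array with no missing values; the function names are hypothetical. It stores every $\widehat{\mathrm{Cov}}(s_{ij}, s_{kl})$ in an $n^4$ array, which is manageable for $n = 30$ (810,000 entries) but not for thousands of securities. As a built-in check, with $n = 2$ the average correlation is just $r_{12}$, so Eq. (5) must reduce to Eq. (4).

```python
import numpy as np


def sampling_moments(R):
    """Eq. (6) for s_ij and Eq. (10) for C[i, j, k, l] = Cov^(s_ij, s_kl).

    R is a T x n array of returns (rows = periods, columns = securities).
    """
    T = R.shape[0]
    D = R - R.mean(axis=0)                           # R_it minus the sample mean
    W = np.einsum("ti,tj->tij", D, D)                # w_ijt
    W_bar = W.mean(axis=0)                           # w-bar_ij
    S = T / (T - 1) * W_bar                          # Eq. (6)
    E = W - W_bar                                    # w_ijt - w-bar_ij
    C = T / (T - 1) ** 3 * np.einsum("tij,tkl->ijkl", E, E)  # Eq. (10)
    return S, C


def var_average_correlation(R):
    """Eq. (5): delta-method sampling variance of the average correlation."""
    S, C = sampling_moments(R)
    n = S.shape[0]
    d = np.diag(S)
    idx = np.arange(n)
    off = ~np.eye(n, dtype=bool)                     # ordered pairs with j != i
    M = np.where(off, 1.0 / np.sqrt(np.outer(d, d)), 0.0)  # 1 / sqrt(s*_ii s*_jj)
    P = M * S / d[:, None]                           # M_ij * s*_ij / s*_ii
    C_iikk = C[idx, idx][:, idx, idx]                # [i, k] -> Cov^(s_ii, s_kk)
    C_iikl = C[idx, idx]                             # [i, k, l] -> Cov^(s_ii, s_kl)
    total = (np.einsum("ij,kl,ijkl->", M, M, C)
             + np.einsum("ij,kl,ik->", P, P, C_iikk)
             - 2.0 * np.einsum("ij,kl,ikl->", P, M, C_iikl))
    return float(total / (n * (n - 1)) ** 2)


# Usage with simulated, correlated returns (not the data of Section 5).
rng = np.random.default_rng(1)
mix = np.array([[1.0, 0.5, 0.3, 0.2],
                [0.0, 1.0, 0.4, 0.1],
                [0.0, 0.0, 1.0, 0.3],
                [0.0, 0.0, 0.0, 1.0]])
R = rng.standard_normal((72, 4)) @ mix
S, C = sampling_moments(R)
se_r_bar = np.sqrt(var_average_correlation(R))

# Check: for n = 2, Eq. (5) should coincide with Eq. (4) for r_12.
S2, C2 = sampling_moments(R[:, :2])
a, b, c = S2[0, 0], S2[1, 1], S2[0, 1]
g = np.array([1.0, -c / (2 * a), -c / (2 * b)]) / np.sqrt(a * b)   # Eq. (3)
keys = [(0, 1), (0, 0), (1, 1)]                      # order: s_12, s_11, s_22
V2 = np.array([[C2[p + q] for q in keys] for p in keys])
var_eq4 = float(g @ V2 @ g)
var_eq5 = var_average_correlation(R[:, :2])
```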
The individual sampling variances and covariances that Eqs. (9) and (10) provide, once substituted into Eq. (5), allow $\widehat{\mathrm{Var}}(\bar r)$ to be computed directly.

3. Shrinkage estimation of the covariance matrix

Let $\Sigma$ be an $n \times n$ population covariance matrix of security returns, with each $(i,j)$-element being $\sigma_{ij}$ (implicitly for $i, j = 1, 2, \ldots, n$). Let $S$, also an $n \times n$ matrix, be an unbiased estimator of $\Sigma$. Each $(i,j)$-element of $S$ is $s_{ij}$, the sample covariance between securities $i$ and $j$, as defined previously. Further, let $F$, an $n \times n$ matrix, be another estimator of $\Sigma$. Each $(i,j)$-element of $F$ is labeled as $f_{ij}$, where $f_{ii} = s_{ii}$ and $f_{ij} = \bar r \sqrt{s_{ii} s_{jj}}$, for $i \neq j$, with $\bar r$ being the sample average correlation, also as defined previously. In order to improve the estimation of $\Sigma$, we follow Ledoit and Wolf to shrink $S$ toward the target $F$. With $[\lambda F + (1 - \lambda) S]$ being a weighted average of $F$ and $S$, the weighting factor $\lambda$ is the shrinkage intensity. The optimal $\lambda$ is determined by minimizing a quadratic loss function,

$$L(\lambda) = E\left[ \sum_i \sum_j \big( \lambda f_{ij} + (1 - \lambda) s_{ij} - \sigma_{ij} \big)^2 \right], \tag{11}$$

where $E(\cdot)$ represents the expected value of the variable $(\cdot)$ in question.
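The objects entering Eq. (11) can be sketched as follows, assuming a sample covariance matrix is already available as a NumPy array. The helper names and the 3 × 3 matrix are hypothetical; `shrink` also keeps $\lambda$ within $[0, 1]$, the range the shrinkage intensity is meant to occupy.

```python
import numpy as np


def constant_correlation_target(S):
    """Target F: f_ii = s_ii and f_ij = r_bar * sqrt(s_ii s_jj) for i != j."""
    n = S.shape[0]
    sd = np.sqrt(np.diag(S))
    corr = S / np.outer(sd, sd)
    off = ~np.eye(n, dtype=bool)
    r_bar = corr[off].mean()                   # Eq. (1)
    F = r_bar * np.outer(sd, sd)
    np.fill_diagonal(F, np.diag(S))            # f_ii = s_ii
    return F, float(r_bar)


def shrink(S, F, lam):
    """Weighted average lam * F + (1 - lam) * S, with lam kept in [0, 1]."""
    lam = min(max(float(lam), 0.0), 1.0)
    return lam * F + (1.0 - lam) * S


# Made-up 3 x 3 sample covariance matrix of monthly returns.
S = np.array([[0.0400, 0.0100, 0.0060],
              [0.0100, 0.0900, 0.0120],
              [0.0060, 0.0120, 0.0225]])
F, r_bar = constant_correlation_target(S)
shrunk = shrink(S, F, 0.5)
```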
³ Eq. (4), when written as an expression of $(1/r^{*2}_{ij})\, \widehat{\mathrm{Var}}(r_{ij})$, where $r^*_{ij} = s^*_{ij}/\sqrt{s^*_{ii} s^*_{jj}}$, is equivalent to Eq. (2) in Scheinberg (1966).
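Noting that, for a random variable $x$, $E(x^2) = \mathrm{Var}(x) + [E(x)]^2$, we have

$$L(\lambda) = \sum_i \sum_j \Big\{ \lambda^2 \mathrm{Var}(f_{ij}) + (1 - \lambda)^2 \mathrm{Var}(s_{ij}) + 2\lambda(1 - \lambda)\, \mathrm{Cov}(f_{ij}, s_{ij}) + \big[ \lambda E(f_{ij} - s_{ij}) + E(s_{ij}) - \sigma_{ij} \big]^2 \Big\}. \tag{12}$$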
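Minimizing $L(\lambda)$ yields

$$\frac{d[L(\lambda)]}{d\lambda} = 0 = \sum_i \sum_j \Big\{ 2\lambda \mathrm{Var}(f_{ij}) - 2(1 - \lambda) \mathrm{Var}(s_{ij}) + 2(1 - 2\lambda)\, \mathrm{Cov}(f_{ij}, s_{ij}) + 2 E(f_{ij} - s_{ij}) \big[ \lambda E(f_{ij} - s_{ij}) + E(s_{ij}) - \sigma_{ij} \big] \Big\}, \tag{13}$$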
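which leads to

$$\lambda = \frac{\sum_i \sum_j \big\{ \mathrm{Var}(s_{ij}) - \mathrm{Cov}(f_{ij}, s_{ij}) - [E(s_{ij}) - \sigma_{ij}][E(f_{ij} - s_{ij})] \big\}}{\sum_i \sum_j \big\{ \mathrm{Var}(f_{ij}) + \mathrm{Var}(s_{ij}) - 2\, \mathrm{Cov}(f_{ij}, s_{ij}) + [E(f_{ij} - s_{ij})]^2 \big\}}. \tag{14}$$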
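As $s_{ij}$ is an unbiased estimator of $\sigma_{ij}$, we have $E(s_{ij}) = \sigma_{ij}$, for $i, j = 1, 2, \ldots, n$. Thus, with the denominator written more compactly by using $\mathrm{Var}(f_{ij} - s_{ij}) + [E(f_{ij} - s_{ij})]^2 = E[(f_{ij} - s_{ij})^2]$, Eq. (14) reduces to

$$\lambda = \frac{\sum_i \sum_j [\mathrm{Var}(s_{ij}) - \mathrm{Cov}(f_{ij}, s_{ij})]}{\sum_i \sum_j E[(f_{ij} - s_{ij})^2]}. \tag{15}$$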
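The step from Eq. (12) to Eq. (15) can be checked numerically. The sketch below uses made-up moments for three $(i,j)$ elements (none of these numbers come from the paper), imposes $E(s_{ij}) = \sigma_{ij}$, and compares the closed form of Eq. (15) with a brute-force grid minimization of Eq. (12).

```python
import numpy as np

# Made-up moments for three (i, j) elements; E(s_ij) = sigma_ij is imposed.
var_f = np.array([0.010, 0.020, 0.015])   # Var(f_ij)
var_s = np.array([0.030, 0.050, 0.040])   # Var(s_ij)
cov_fs = np.array([0.005, 0.010, 0.004])  # Cov(f_ij, s_ij)
mean_gap = np.array([0.10, -0.20, 0.15])  # E(f_ij - s_ij)


def expected_loss(lam):
    """Eq. (12) with E(s_ij) - sigma_ij = 0."""
    return float(np.sum(lam**2 * var_f + (1 - lam)**2 * var_s
                        + 2 * lam * (1 - lam) * cov_fs + (lam * mean_gap)**2))


# Eq. (15): the denominator is E[(f - s)^2] = Var(f) + Var(s) - 2 Cov(f, s) + [E(f - s)]^2.
lam_closed = np.sum(var_s - cov_fs) / np.sum(var_f + var_s - 2 * cov_fs + mean_gap**2)

grid = np.linspace(0.0, 1.0, 20001)
lam_grid = grid[np.argmin([expected_loss(x) for x in grid])]
```

Because Eq. (12) is a convex quadratic in $\lambda$ whenever the denominator of Eq. (15) is positive, the grid minimizer and the closed form agree up to the grid spacing.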
By substituting the individual variances, covariances, and expected values with the corresponding sample estimates, and noting that $f_{ii} = s_{ii}$ and $\mathrm{Cov}(f_{ii}, s_{ii}) = \mathrm{Var}(s_{ii})$, for $i = 1, 2, \ldots, n$, we can write the estimated value of the shrinkage intensity as

$$\hat\lambda = \frac{\sum_i \sum_{j \neq i} [\widehat{\mathrm{Var}}(s_{ij}) - \widehat{\mathrm{Cov}}(f_{ij}, s_{ij})]}{\sum_i \sum_{j \neq i} (f^*_{ij} - s^*_{ij})^2}. \tag{16}$$

Here, with $\bar r^*$ being the point estimate of $\bar r$, as computed from Eq. (1) for $s_{ij} = s^*_{ij}$, $s_{ii} = s^*_{ii}$, and $s_{jj} = s^*_{jj}$, each $f^*_{ij} = \bar r^* \sqrt{s^*_{ii} s^*_{jj}}$ is the point estimate of $f_{ij}$, for $i \neq j$.
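Finding $\hat\lambda$ requires an explicit expression for the $\sum_i \sum_{j \neq i} \widehat{\mathrm{Cov}}(f_{ij}, s_{ij})$ term in Eq. (16). To this end, we start with a first-order Taylor expansion of $f_{ij} = \bar r \sqrt{s_{ii} s_{jj}}$, for $i \neq j$, around $\bar r = \bar r^*$, $s_{ii} = s^*_{ii}$, and $s_{jj} = s^*_{jj}$; that is,

$$f_{ij} = \bar r^* \sqrt{s^*_{ii} s^*_{jj}} + \sqrt{s^*_{ii} s^*_{jj}}\,(\bar r - \bar r^*) + \frac{\bar r^*}{2} \sqrt{\frac{s^*_{jj}}{s^*_{ii}}}\,(s_{ii} - s^*_{ii}) + \frac{\bar r^*}{2} \sqrt{\frac{s^*_{ii}}{s^*_{jj}}}\,(s_{jj} - s^*_{jj}). \tag{17}$$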
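The expansion here differs from the Ledoit and Wolf case by the additional term $\sqrt{s^*_{ii} s^*_{jj}}\,(\bar r - \bar r^*)$ in order to capture the randomness of $\bar r$ for a finite sample. It follows from $s_{ij} = s_{ji}$ that

$$\sum_i \sum_{j \neq i} \widehat{\mathrm{Cov}}(f_{ij}, s_{ij}) = \sum_i \sum_{j \neq i} \sqrt{s^*_{ii} s^*_{jj}}\, \widehat{\mathrm{Cov}}(\bar r, s_{ij}) + \bar r^* \sum_i \sum_{j \neq i} \sqrt{\frac{s^*_{jj}}{s^*_{ii}}}\, \widehat{\mathrm{Cov}}(s_{ii}, s_{ij}), \tag{18}$$

where

$$\sum_i \sum_{j \neq i} \sqrt{s^*_{ii} s^*_{jj}}\, \widehat{\mathrm{Cov}}(\bar r, s_{ij}) = \frac{1}{n(n-1)} \sum_i \sum_{j \neq i} \sum_k \sum_{l \neq k} \sqrt{s^*_{ii} s^*_{jj}}\, \widehat{\mathrm{Cov}}\!\left( \frac{s_{kl}}{\sqrt{s_{kk} s_{ll}}},\, s_{ij} \right). \tag{19}$$

In view of Eq. (3), we have, more explicitly,

$$\sum_i \sum_{j \neq i} \widehat{\mathrm{Cov}}(f_{ij}, s_{ij}) = \frac{1}{n(n-1)} \sum_i \sum_{j \neq i} \sum_k \sum_{l \neq k} \sqrt{\frac{s^*_{ii} s^*_{jj}}{s^*_{kk} s^*_{ll}}} \left[ \widehat{\mathrm{Cov}}(s_{ij}, s_{kl}) - \frac{s^*_{kl}}{s^*_{kk}}\, \widehat{\mathrm{Cov}}(s_{ij}, s_{kk}) \right] + \bar r^* \sum_i \sum_{j \neq i} \sqrt{\frac{s^*_{jj}}{s^*_{ii}}}\, \widehat{\mathrm{Cov}}(s_{ii}, s_{ij}). \tag{20}$$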
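Under the simplifying assumption in the Ledoit and Wolf study that $\bar r$ is estimated without error, which implies $\widehat{\mathrm{Cov}}(\bar r, s_{ij}) = 0$, Eq. (16), now with $\hat\lambda$ denoted by $\hat\lambda_0$, reduces to

$$\hat\lambda_0 = \frac{\sum_i \sum_{j \neq i} \Big[ \widehat{\mathrm{Var}}(s_{ij}) - \bar r^* \sqrt{s^*_{jj}/s^*_{ii}}\; \widehat{\mathrm{Cov}}(s_{ii}, s_{ij}) \Big]}{\sum_i \sum_{j \neq i} (f^*_{ij} - s^*_{ij})^2}. \tag{21}$$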
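Eqs. (16), (20), and (21) can be combined into one routine. The sketch below is a NumPy illustration under the same assumptions as before (complete return array, hypothetical function names, simulated data); it returns both raw intensities so that the effect of estimation error in $\bar r$ is visible, and clips them to $[0, 1]$ only afterwards.

```python
import numpy as np


def sampling_moments(R):
    """Eq. (6) for s_ij and Eq. (10) for C[i, j, k, l] = Cov^(s_ij, s_kl)."""
    T = R.shape[0]
    D = R - R.mean(axis=0)                            # deviations from sample means
    W = np.einsum("ti,tj->tij", D, D)                 # w_ijt
    W_bar = W.mean(axis=0)                            # w-bar_ij
    E = W - W_bar
    S = T / (T - 1) * W_bar                           # Eq. (6)
    C = T / (T - 1) ** 3 * np.einsum("tij,tkl->ijkl", E, E)  # Eq. (10)
    return S, C


def clip01(x):
    """Set an intensity outside [0, 1] to the nearest boundary value."""
    return min(max(float(x), 0.0), 1.0)


def covariance_shrinkage(R):
    """Return raw lambda-hat (Eqs. 16 and 20) and raw lambda-hat_0 (Eq. 21)."""
    S, C = sampling_moments(R)
    n = S.shape[0]
    d = np.diag(S)
    idx = np.arange(n)
    off = ~np.eye(n, dtype=bool)                      # ordered pairs with j != i
    root = np.sqrt(np.outer(d, d))                    # sqrt(s*_ii s*_jj)
    r_bar = (S / root)[off].mean()                    # Eq. (1)
    F = r_bar * root                                  # f*_ij for j != i

    var_s = np.diag(C.reshape(n * n, n * n)).reshape(n, n)  # Var^(s_ij)
    # Last term of Eq. (20), which is all that Eq. (21) keeps.
    C_iiij = C[idx, idx, idx]                         # [i, j] -> Cov^(s_ii, s_ij)
    ratio = np.sqrt(d[None, :] / d[:, None])          # sqrt(s*_jj / s*_ii)
    fixed_r_part = r_bar * np.sum((ratio * C_iiij)[off])
    # Quadruple sum of Eq. (20): the contribution of estimation error in r_bar.
    Q = np.where(off, root, 0.0)                      # sqrt(s*_ii s*_jj), j != i
    M = np.where(off, 1.0 / root, 0.0)                # 1 / sqrt(s*_kk s*_ll), l != k
    P = M * S / d[:, None]                            # M_kl * s*_kl / s*_kk
    C_ijkk = C[:, :, idx, idx]                        # [i, j, k] -> Cov^(s_ij, s_kk)
    error_part = (np.einsum("ij,kl,ijkl->", Q, M, C)
                  - np.einsum("ij,kl,ijk->", Q, P, C_ijkk)) / (n * (n - 1))

    total_var = np.sum(var_s[off])
    denom = np.sum(((F - S) ** 2)[off])
    lam_raw = (total_var - fixed_r_part - error_part) / denom    # Eq. (16)
    lam0_raw = (total_var - fixed_r_part) / denom                # Eq. (21)
    return float(lam_raw), float(lam0_raw)


# Usage with simulated one-factor returns (not the Dow Jones data of Section 5).
rng = np.random.default_rng(2)
R = 0.5 * rng.standard_normal((72, 1)) + rng.standard_normal((72, 6))
lam_raw, lam0_raw = covariance_shrinkage(R)
lam_hat, lam0_hat = clip01(lam_raw), clip01(lam0_raw)
```

The difference `lam0_raw - lam_raw` equals the quadruple-sum term divided by the denominator of Eq. (16); its sign is not fixed in advance, which is why the comparison of $\hat\lambda_0$ and $\hat\lambda$ is left to empirical data.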
Eqs. (16) and (21), which differ only in whether estimation error in $\bar r$ is recognized, allow its impact on the shrinkage intensity to be examined directly. As indicated in the Ledoit and Wolf study, the shrinkage intensity is intended to lie between zero and one. Thus, if either $\hat\lambda$ or $\hat\lambda_0$ falls outside this range, it ought to be set at the corresponding boundary value. A zero shrinkage intensity indicates that there is no need to shrink the sample covariance matrix; a shrinkage intensity of one, in contrast, indicates that the covariance matrix is better characterized by a common correlation.

As each of the $n^2(n-1)^2$ bracketed terms in the quadruple summation on the right-hand side of Eq. (20) can be of either sign, it is difficult to establish analytically whether $\hat\lambda_0$ overstates or understates $\hat\lambda$. Intuitively, the presence of estimation error in the average correlation obscures the true location of the shrinkage target for potentially improving the quality of the sample covariance matrix. With a less precise shrinkage target, less reliance on shrinkage or, equivalently, a lower shrinkage intensity seems to be a natural outcome. Thus, intuitively, $\hat\lambda_0$ is expected to overstate $\hat\lambda$. However, whether this conjecture is true has to be assessed empirically.

4. Shrinkage estimation of the correlation matrix

To shrink the sample correlation matrix requires $\Sigma$, $S$, and $F$ to be defined as correlation matrices instead. Each diagonal element of a correlation matrix is always one. Each off-diagonal $(i,j)$-element in the case of $\Sigma$ is $\rho_{ij}$, the population correlation between securities $i$ and $j$ (implicitly for $i, j = 1, 2, \ldots, n$ and $i \neq j$). In the case of $S$, it is $r_{ij}$; in the case of $F$, it is $\bar r$. With $[\gamma F + (1 - \gamma) S]$ being a weighted average of $F$ and $S$, the quadratic loss function of the shrinkage intensity $\gamma$ is

$$L(\gamma) = E\left[ \sum_i \sum_{j \neq i} \big( \gamma \bar r + (1 - \gamma) r_{ij} - \rho_{ij} \big)^2 \right]. \tag{22}$$
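Minimizing $L(\gamma)$ leads to

$$\gamma = \frac{\sum_i \sum_{j \neq i} \big\{ [\mathrm{Var}(r_{ij}) - \mathrm{Cov}(\bar r, r_{ij})] - [E(r_{ij}) - \rho_{ij}][E(\bar r - r_{ij})] \big\}}{\sum_i \sum_{j \neq i} E[(\bar r - r_{ij})^2]}. \tag{23}$$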
Unlike Eq. (15), Eq. (23) has a term that captures the bias in the sample correlation r i j as an estimator of the population correlation ρi j . The bias, which is downward for ρi j > 0, diminishes as the number of observations for the estimation increases. As Zimmerman et al. (2003) indicate, a simple formula in Olkin and Pratt (1958) is able to eliminate nearly all of the bias in the sample correlation of normally distributed variables. The Olkin and Pratt formula is
$$\hat\rho_{ij} = r_{ij} \left[ 1 + \frac{1 - r_{ij}^2}{2(T - 3)} \right], \tag{24}$$

where $\hat\rho_{ij}$ is a nearly unbiased estimator of $\rho_{ij}$.
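Eq. (24) is a one-line adjustment. The sketch below (hypothetical function name, illustrative values) also shows how the difference between a sample correlation and its adjusted value serves as the estimate of $E(r_{ij}) - \rho_{ij}$ that enters Eq. (25).

```python
def olkin_pratt(r: float, T: int) -> float:
    """Eq. (24): nearly unbiased correlation estimate from T observations."""
    if T <= 3:
        raise ValueError("Eq. (24) needs T > 3 observations")
    return r * (1.0 + (1.0 - r * r) / (2.0 * (T - 3)))


# With T = 72 monthly returns, a sample correlation of 0.3 is nudged away from zero.
adjusted = olkin_pratt(0.3, 72)
bias_estimate = 0.3 - adjusted     # estimate of E(r_ij) - rho_ij, negative for r_ij > 0
```

The adjustment vanishes at $r_{ij} = 0$ and $|r_{ij}| = 1$ and shrinks as $T$ grows, consistent with the downward bias for positive correlations described above.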
By substituting the individual variances, covariances, and expected values with the corresponding sample estimates and using Eq. (24) to estimate $[E(r_{ij}) - \rho_{ij}]$ for a sample with $T$ observations, we have

$$\hat\gamma = \frac{\sum_i \sum_{j \neq i} \left\{ [\widehat{\mathrm{Var}}(r_{ij}) - \widehat{\mathrm{Cov}}(\bar r, r_{ij})] + \left[ \frac{r^*_{ij}(1 - r^{*2}_{ij})}{2(T - 3)} \right] (\bar r^* - r^*_{ij}) \right\}}{\sum_i \sum_{j \neq i} (\bar r^* - r^*_{ij})^2}. \tag{25}$$
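Here, each $r^*_{ij} = s^*_{ij}/\sqrt{s^*_{ii} s^*_{jj}}$ is the point estimate of $r_{ij}$, each $\widehat{\mathrm{Var}}(r_{ij})$ is given by Eq. (4), and, because $\sum_i \sum_{j \neq i} r_{ij} = n(n-1)\bar r$, the sum

$$\sum_i \sum_{j \neq i} \widehat{\mathrm{Cov}}(\bar r, r_{ij}) = n(n-1)\, \widehat{\mathrm{Var}}(\bar r) \tag{26}$$

can be computed by using Eq. (5). If the bias in $r_{ij}$ is ignored, Eq. (25), with $\hat\gamma$ labeled as $\hat\gamma_b$, becomes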
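$$\hat\gamma_b = \frac{\sum_i \sum_{j \neq i} [\widehat{\mathrm{Var}}(r_{ij}) - \widehat{\mathrm{Cov}}(\bar r, r_{ij})]}{\sum_i \sum_{j \neq i} (\bar r^* - r^*_{ij})^2}. \tag{27}$$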
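A NumPy sketch of Eqs. (25) and (27) follows, assuming a complete return array; the function names are hypothetical and the returns are simulated. $\widehat{\mathrm{Var}}(r_{ij})$ comes from Eq. (4) in its $g^\top V g$ form, and the covariance sum comes from Eqs. (5) and (26). Both intensities are returned unclipped and clipped to $[0, 1]$ afterwards, as in the covariance case.

```python
import numpy as np


def sampling_moments(R):
    """Eq. (6) for s_ij and Eq. (10) for C[i, j, k, l] = Cov^(s_ij, s_kl)."""
    T = R.shape[0]
    D = R - R.mean(axis=0)
    W = np.einsum("ti,tj->tij", D, D)                 # w_ijt
    W_bar = W.mean(axis=0)
    E = W - W_bar
    return T / (T - 1) * W_bar, T / (T - 1) ** 3 * np.einsum("tij,tkl->ijkl", E, E)


def correlation_shrinkage(R):
    """Return raw gamma-hat (Eq. 25), raw gamma-hat_b (Eq. 27), and the r*_ij."""
    T = R.shape[0]
    S, C = sampling_moments(R)
    n = S.shape[0]
    d = np.diag(S)
    idx = np.arange(n)
    off = ~np.eye(n, dtype=bool)
    root = np.sqrt(np.outer(d, d))
    corr = S / root                                   # r*_ij
    r_bar = corr[off].mean()                          # Eq. (1)

    # Eq. (4) for every ordered pair, written as g' V g with g from Eq. (3).
    var_r = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            g = np.array([1.0, -S[i, j] / (2 * d[i]), -S[i, j] / (2 * d[j])]) / root[i, j]
            keys = [(i, j), (i, i), (j, j)]
            V = np.array([[C[p + q] for q in keys] for p in keys])
            var_r[i, j] = g @ V @ g

    # Eq. (5), then Eq. (26): the sum of Cov^(r_bar, r_ij) is n(n - 1) Var^(r_bar).
    M = np.where(off, 1.0 / root, 0.0)
    P = M * S / d[:, None]
    var_r_bar = (np.einsum("ij,kl,ijkl->", M, M, C)
                 + np.einsum("ij,kl,ik->", P, P, C[idx, idx][:, idx, idx])
                 - 2.0 * np.einsum("ij,kl,ikl->", P, M, C[idx, idx])) / (n * (n - 1)) ** 2
    sum_cov = n * (n - 1) * var_r_bar

    r = corr[off]
    denom = np.sum((r_bar - r) ** 2)
    numer_b = np.sum(var_r[off]) - sum_cov                              # Eq. (27)
    bias_term = np.sum(r * (1 - r**2) / (2 * (T - 3)) * (r_bar - r))     # Eq. (25)
    return float((numer_b + bias_term) / denom), float(numer_b / denom), corr


# Usage with simulated returns sharing one weak common factor.
rng = np.random.default_rng(3)
R = rng.standard_normal((72, 5)) + 0.4 * rng.standard_normal((72, 1))
gamma_raw, gamma_b_raw, corr = correlation_shrinkage(R)
gamma_hat = min(max(gamma_raw, 0.0), 1.0)
gamma_b_hat = min(max(gamma_b_raw, 0.0), 1.0)
```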
Just like shrinkage estimation of the covariance matrix, the shrinkage intensity that $\hat\gamma$ or $\hat\gamma_b$ represents is intended to be in the range of zero to one. If either $\hat\gamma$ or $\hat\gamma_b$ falls outside this range, it is necessary to set it at the corresponding boundary value.

A sufficient condition under which $\hat\gamma_b$ overstates $\hat\gamma$ can be established. According to Eqs. (25) and (27), the sign of $(\hat\gamma_b - \hat\gamma)$ is the same as that of $\sum_i \sum_{j \neq i} r^*_{ij}(1 - r^{*2}_{ij})(r^*_{ij} - \bar r^*)$. This sum is proportional to a weighted sum of the $n(n-1)/2$ terms of $(r^*_{ij} - \bar r^*)$, for $i = 2, 3, \ldots, n$ and $j = 1, 2, \ldots, i - 1$, with the weighting factor for each term being $r^*_{ij}(1 - r^{*2}_{ij})$. Although the average of the $n(n-1)/2$ terms of $(r^*_{ij} - \bar r^*)$ is zero, their weighted sum can be of either sign. If we sort the $n(n-1)/2$ terms of $r^*_{ij}$ in ascending order, $r^*_{ij}(1 - r^{*2}_{ij})$ increases with the sorted $r^*_{ij}$ from $r^*_{ij} = -\sqrt{3}/3$ to $\sqrt{3}/3$ ($\approx 0.5774$) and decreases afterwards. If all observed values of $r^*_{ij}$ are no greater than $\sqrt{3}/3$, the weighted sum of the $n(n-1)/2$ terms of $(r^*_{ij} - \bar r^*)$ must be positive, resulting in a positive $(\hat\gamma_b - \hat\gamma)$.⁴ The presence of any $r^*_{ij} > \sqrt{3}/3$, however, will lower this weighted sum. If there are predominantly many cases of $r^*_{ij} > \sqrt{3}/3$, the weighted sum can be negative. Thus, for a set of securities with generally low to moderate pairwise correlations of returns, the result $\hat\gamma < \hat\gamma_b$ is expected.

In contrast, comparing the relative magnitudes of $\hat\lambda$ and $\hat\gamma$ analytically is difficult, as their derivations are based on two different formulations. From an analytical perspective, it is unclear whether the sample covariance matrix or the sample correlation matrix based on a common set of return data tends to be of better quality and thus tends to rely less on shrinkage. Intuitively, however, while the sample correlation of returns of two stocks captures only how well their returns move together in a sample period, their sample covariance also captures how volatile their returns are in the same period. For a finite sample period, as there is estimation error in each sample variance of returns, it seems reasonable to expect, in general, each off-diagonal element of the sample covariance matrix to have higher estimation error, as a proportion of its estimated value, than the corresponding element of the sample correlation matrix. If so, with the point estimates of correlations tending to be more precise than those of the corresponding covariances, we would expect less reliance of the sample correlation matrix on shrinkage to improve its quality; that is, we would expect $\hat\gamma$ to be lower than $\hat\lambda$. In the absence of a proof, the truth or falsity of this conjecture can only be assessed empirically.
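The sign argument above can be illustrated with a few lines of plain Python. The helper names are hypothetical and the two correlation lists are made up: one with all values below $\sqrt{3}/3$, one with all values above it.

```python
import math


def weight(r):
    """Weighting factor r(1 - r^2) from the bias term of Eq. (25)."""
    return r * (1.0 - r * r)


def weighted_deviation_sum(corrs):
    """Sum of r(1 - r^2)(r - r_bar) over the distinct pairwise correlations.

    Its sign matches the sign of (gamma_b - gamma) discussed in the text.
    """
    r_bar = sum(corrs) / len(corrs)
    return sum(weight(r) * (r - r_bar) for r in corrs)


peak = math.sqrt(3.0) / 3.0                 # where d/dr [r(1 - r^2)] = 1 - 3r^2 = 0
low = [0.10, 0.25, 0.30, 0.45, -0.05]       # all below sqrt(3)/3: the sum is positive
high = [0.60, 0.70, 0.80, 0.90, 0.95]       # all above sqrt(3)/3: here the sum is negative
```

With `low`, the weights increase with $r^*_{ij}$, so above-average correlations receive larger weights and the weighted sum is positive; with `high`, the ordering of the weights is reversed and the sum turns negative.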
⁴ If any of the $n(n-1)/2$ sorted terms of $r^*_{ij}$ is negative, the corresponding weighting factor $r^*_{ij}(1 - r^{*2}_{ij})$, which is negative, when multiplied by the term $(r^*_{ij} - \bar r^*)$ (which is also negative for a positive $\bar r^*$), will result in a positive product, thus contributing to a positive $(\hat\gamma_b - \hat\gamma)$.
5. An illustrative example

The illustrative example here is based on six years of monthly returns of the 30 U.S. stocks currently included in the Dow Jones Industrial Average, from January 2002 to December 2007, as collected from the Center for Research in Security Prices via Wharton Research Data Services.⁵ In this example, there are 435 off-diagonal elements in the lower triangle of the sample correlation matrix, labeled as $r_{ij}$, for $i = 2, 3, \ldots, 30$ and $j = 1, 2, \ldots, i - 1$. Their average is $\bar r^* = 0.2992$. The average of the 435 standard errors of $r_{ij}$, with each case being $\mathrm{SE}(r_{ij}) = \sqrt{\widehat{\mathrm{Var}}(r_{ij})}$, is 0.1151. As expected, the standard error of the average of the sample correlations, $\mathrm{SE}(\bar r) = \sqrt{\widehat{\mathrm{Var}}(\bar r)} = 0.0428$, is much lower. However, with the coefficient of variation $\mathrm{CV}(\bar r) = \mathrm{SE}(\bar r)/\bar r^* = 0.1431$, the error still accounts for more than 14% of the estimated value of the average correlation.

In the case of shrinkage for the sample covariance matrix, the shrinkage intensity that ignores estimation error in $\bar r$ is $\hat\lambda_0 = 0.7534$, whereas the more precise shrinkage intensity that recognizes it is $\hat\lambda = 0.6250$. With $\hat\lambda_0 - \hat\lambda = 0.1284$, ignoring estimation error in $\bar r$ would place an extra 12.84 percentage points of weight on the shrinkage target. The results here are consistent with the intuitive idea that, as estimation error in $\bar r$ obscures the true location of the shrinkage target for potentially improving the quality of the sample covariance matrix, ignoring its presence (by treating $\bar r$ as a precise measure) would exaggerate the attractiveness of the shrinkage target and thus would overstate the shrinkage intensity.

In the case of shrinkage for the correlation matrix instead, the shrinkage intensity that ignores the bias in each estimated pairwise correlation is $\hat\gamma_b = 0.4523$; the corresponding shrinkage intensity that recognizes the bias is $\hat\gamma = 0.4476$, which is marginally lower. Among the 435 off-diagonal elements in the lower triangle of the sample correlation matrix, with each element labeled as $r^*_{ij}$, there are only 20 cases where $r^*_{ij} > \sqrt{3}/3$. Thus, as explained in the preceding section, for a set of securities with generally low to moderate pairwise correlations of returns, such as the example here, the result $\hat\gamma < \hat\gamma_b$ is as expected.

As this example shows, with $\hat\gamma < \hat\lambda$, the sample correlation matrix relies less on shrinkage to improve its quality than the corresponding sample covariance matrix does. To relate this finding to the quality of the two matrices, the absolute values of the coefficients of variation of the corresponding off-diagonal elements of the two matrices are compared. In the case of the sample correlation matrix, the coefficient of variation for each $(i,j)$-element is $\mathrm{CV}(r_{ij}) = \mathrm{SE}(r_{ij})/r^*_{ij}$, where $\mathrm{SE}(r_{ij}) = \sqrt{\widehat{\mathrm{Var}}(r_{ij})}$; in the case of the sample covariance matrix, it is $\mathrm{CV}(s_{ij}) = \mathrm{SE}(s_{ij})/s^*_{ij}$, where $\mathrm{SE}(s_{ij}) = \sqrt{\widehat{\mathrm{Var}}(s_{ij})}$, instead.
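The coefficient-of-variation comparison just defined can be sketched with NumPy as follows. The function names and the simulated one-factor returns are hypothetical, so the resulting count is only an analogue of the count reported for the Dow Jones data. A paired test such as `scipy.stats.wilcoxon(cv_s, cv_r)` could then be applied to the two arrays; SciPy is left out here so that the sketch depends on NumPy only.

```python
import numpy as np


def sampling_moments(R):
    """Eq. (6) for s_ij and Eq. (10) for C[i, j, k, l] = Cov^(s_ij, s_kl)."""
    T = R.shape[0]
    D = R - R.mean(axis=0)
    W = np.einsum("ti,tj->tij", D, D)                 # w_ijt
    W_bar = W.mean(axis=0)
    E = W - W_bar
    return T / (T - 1) * W_bar, T / (T - 1) ** 3 * np.einsum("tij,tkl->ijkl", E, E)


def abs_cv_pairs(R):
    """|CV(s_ij)| and |CV(r_ij)| for the n(n - 1)/2 lower-triangle pairs."""
    S, C = sampling_moments(R)
    d = np.diag(S)
    cv_s, cv_r = [], []
    for i in range(1, S.shape[0]):
        for j in range(i):
            root = np.sqrt(d[i] * d[j])
            keys = [(i, j), (i, i), (j, j)]
            V = np.array([[C[p + q] for q in keys] for p in keys])
            g = np.array([1.0, -S[i, j] / (2 * d[i]), -S[i, j] / (2 * d[j])]) / root
            cv_s.append(abs(np.sqrt(V[0, 0]) / S[i, j]))             # SE from Eq. (9)
            cv_r.append(abs(np.sqrt(g @ V @ g) / (S[i, j] / root)))  # SE from Eq. (4)
    return np.array(cv_s), np.array(cv_r)


# Usage with simulated one-factor returns (illustrative only).
rng = np.random.default_rng(4)
R = 0.6 * rng.standard_normal((72, 1)) + rng.standard_normal((72, 6))
cv_s, cv_r = abs_cv_pairs(R)
n_cov_smaller = int(np.sum(cv_s < cv_r))   # pairs where |CV(s_ij)| < |CV(r_ij)|
```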
The use of absolute values here is to accommodate cases of negative correlations (and thus negative covariances) in the sample. Among the 435 matched pairs of off-diagonal elements in the lower triangle of the two matrices, there are only 43 cases where $|\mathrm{CV}(s_{ij})| < |\mathrm{CV}(r_{ij})|$. The corresponding Wilcoxon signed-rank test score for paired observations is 87,920, with the median of the 435 cases of $|\mathrm{CV}(s_{ij})|$ being higher than that of $|\mathrm{CV}(r_{ij})|$. As the critical value of the score for a two-tailed test at the 1% significance level is only 13,517, the Wilcoxon test results support the idea that, of the two matrices, the sample correlation matrix is of better quality for the set of monthly return data here.

⁵ As this example is for illustrative purposes, its results are not intended to be empirical evidence in support of the shrinkage approach. Other lengths of the sample period, ranging from 36 to 252 months, have also been considered. The details, including a comparison of corresponding results from different subperiods, are available from the author upon request.

6. Concluding remarks

For practical portfolio analysis, although financial analysts may be capable of providing more reliable expected returns and variances of returns for the securities considered than estimates based on historical return data, reliance on such data for estimating the correlation matrix is inevitable. While recognizing the importance of conditional heteroskedasticity in financial time series, this study has considered a simple approach based on unconditional estimation. The approach, which utilizes the sample average correlation of security returns, is particularly suitable for high-dimensional cases, for which multivariate GARCH methods are difficult to implement. With estimation error in the sample average correlation accounted for, this study has refined the shrinkage approach (with an average correlation target) for estimating the covariance matrix. This study has also considered shrinkage estimation of the correlation matrix, thus allowing the variance data for portfolio analysis to be generated separately if so desired. As shown in an illustrative example based on the current Dow Jones stocks, errors in the estimated pairwise correlations and thus in the average correlation have considerable effects on the shrinkage results. Because it does not rely on asymptotic statistical properties of the return series involved, the shrinkage approach here can accommodate cases where not all securities considered have long series of historical return data.

Whether shrinkage estimation toward an average correlation target represents a significant improvement in the estimation of the correlation matrix is an empirical issue to examine. This study, which focuses on improving the analytical side of shrinkage estimation based on the average correlation, would allow a fairer comparison between the shrinkage approach and other competing approaches in terms of forecasting errors. As the comparison typically involves return data from many subperiods of a long sample period, with relatively short series of returns for each subperiod, the shrinkage approach here, which does not rely on asymptotic statistical properties of the data, is particularly suitable for this purpose.

Acknowledgments

Financial support for this study was provided by the Social Sciences and Humanities Research Council of Canada.
The author wishes to thank an anonymous reviewer for helpful comments and suggestions.

References

Bauwens, L., Laurent, S., Rombouts, J.V.K., 2006. Multivariate GARCH models: A survey. Journal of Applied Econometrics 21, 79–109.
Brooks, C., Burke, S.P., Persand, G., 2003. Multivariate GARCH models: Software choice and estimation issues. Journal of Applied Econometrics 18, 725–734.
Chan, L.K.C., Karceski, J., Lakonishok, J., 1999. On portfolio optimization: Forecasting covariances and choosing the risk model. Review of Financial Studies 12, 937–974.
Elton, E.J., Gruber, M.J., 1973. Estimating the dependence structure of share prices—Implications for portfolio selection. Journal of Finance 28, 1203–1232.
Elton, E.J., Gruber, M.J., Urich, T.J., 1978. Are betas best? Journal of Finance 33, 1375–1384.
Elton, E.J., Gruber, M.J., Spitzer, J., 2006. Improved estimates of correlation coefficients and their impact on optimum portfolios. European Financial Management 12, 303–318.
Ledoit, O., Wolf, M., 2004. Honey, I shrunk the sample covariance matrix. Journal of Portfolio Management 30 (4), 110–119.
Ledoit, O., Santa-Clara, P., Wolf, M., 2003. Flexible multivariate GARCH modeling with an application to international stock markets. Review of Economics and Statistics 85, 735–747.
Markowitz, H., 1952. Portfolio selection. Journal of Finance 7, 77–91.
Olkin, I., Pratt, J.W., 1958. Unbiased estimation of certain correlation coefficients. Annals of Mathematical Statistics 29, 201–211.
Schäfer, J., Strimmer, K., 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology 4 (1), Article 32.
Scheinberg, E., 1966. The sampling variance of the correlation coefficients estimated in genetic experiments. Biometrics 22, 187–191.
Silvennoinen, A., Teräsvirta, T., 2008. Multivariate GARCH models. In: Andersen, T.G., Davis, R.A., Kreiss, J.-P., Mikosch, T. (Eds.), Handbook of Financial Time Series. Springer, New York. In press.
Stuart, A., Ord, J.K., 1987. Kendall's Advanced Theory of Statistics, Distribution Theory, vol. 1, fifth ed. Charles Griffin & Co, London.
Zimmerman, D.W., Zumbo, B.D., Williams, R.H., 2003. Bias in estimation and hypothesis testing of correlation. Psicológica 24, 133–158.