Higher-order asymptotic theory of shrinkage estimation for general statistical models

Hiroshi Shiraishi^{1,∗}, Masanobu Taniguchi^2, Takashi Yamashita^3

1 Department of Mathematics, Keio University
2 Research Institute for Science & Engineering, Waseda University
3 Government Pension Investment Fund
∗ Corresponding author
Abstract

In this study, we develop a higher-order asymptotic theory of shrinkage estimation for general statistical models, including dependent processes, multivariate models, and regression models (i.e., non-i.i.d. models). We introduce a shrinkage estimator of the maximum likelihood estimator (MLE) and compare it with the standard MLE by using the third-order mean squared error. A sufficient condition for the shrinkage estimator to improve on the MLE is given in a general setting. Our model is described as a curved statistical model p(·; θ(u)), where θ is a parameter of the larger model and u is a parameter of interest with dim u < dim θ. This setting is especially suitable for estimating portfolio coefficients u based on the mean and variance parameters θ. We finally provide the results of our numerical study and discuss an interesting feature of the shrinkage estimator.

Keywords: Curved statistical model, Dependent data, Higher-order asymptotic theory, Maximum likelihood estimation, Portfolio estimation, Regression model, Shrinkage estimator, Stationary process
1. Introduction

Since the seminal works by Stein and James [8, 18] (termed the JS works hereafter), shrinkage analysis has been extended to the estimation of the linear regression model X = Uβ + ε. Based on the least squares (LS) estimator β̂_LS of β, JS-type estimators have been proposed. When ε ∼ N_n(0, σ²I_n), Arnold [2] proposed the estimator

β̂_S = \left( 1 − \frac{c\hat{σ}^2}{\|U\hat{β}_{LS}\|^2} \right) β̂_LS,

where c is a positive constant and σ̂² = ∥X − Uβ̂_LS∥²/(n − p) with p = dim β is an estimator of σ², and showed that β̂_S improves on β̂_LS with a suitable choice of c if p ≥ 3. Although shrinkage estimation has developed in many directions, one stream has become known as empirical Bayes estimation. For instance, by adopting a mixed linear regression model, Lahiri and Rao [12] evaluated the mean squared error (MSE) of an empirical linear Bayes estimator of small area means and discussed MSE estimation. They showed this MSE estimator to be robust with respect to non-normality.

We often use statistical methods designed for independent samples when the observations may in fact be dependent (e.g., in financial engineering and econometrics). In such cases, it is important to investigate the influence of the statistical method to be used. For a vector-valued Gaussian stationary process with mean vector µ, Taniguchi and Hirukawa [20] studied the MSE of the sample mean µ̂ and the JS estimator µ̂_JS of µ, and provided a set of sufficient conditions, in terms of the spectral density matrix, for µ̂_JS to improve on µ̂. For a time series regression model, Senda and Taniguchi [16] evaluated the MSE of the JS estimator of the regression coefficients, compared it with that of the LS estimator, and provided sufficient conditions for the JS estimator to improve on the LS estimator. For the autocovariance functions of a Gaussian stationary process, Taniguchi et al. [22] introduced a modified autocovariance estimator and compared its MSE with that of the usual sample autocovariance estimator.

From the perspective of statistical asymptotic theory, a third-order asymptotic theory is required in order to compare shrinkage estimators with typical estimators. For an i.i.d. curved exponential family with parameter u, Amari [1] developed the third-order asymptotic theory of estimation and testing for u. For a more general setting that includes not only i.i.d. observations but also regression, multivariate, and time series models, Taniguchi and Watanabe [23] developed the third-order asymptotic theory of inference for the family of curved probability densities
M = {p_n(x_n; θ(u)) : dim u < dim θ} embedded in S = {p_n(x_n; θ) : θ ∈ Θ}. This setting is suitable for estimating portfolio coefficients u based on the mean and variance parameters θ of the financial return process. Taniguchi and Kakizawa [21] provided a systematic approach to the third-order asymptotic theory for estimating and testing a variety of time series models. In the nonparametric context, Velasco and Robinson [24] established the third-order Edgeworth expansion for the distribution of smoothed nonparametric spectral estimates and studentized sample means.

In this study, we deal with a general class of statistical models, including multivariate dependent and non-identically distributed models. Concretely, let X_n = (X_1, ..., X_n) be a collection of m-dimensional random vectors that form a stochastic process. Denote by p_n(x_n; θ) the probability density of X_n, where x_n ∈ R^{mn} and θ = (θ^1, ..., θ^p)^⊤ ∈ Θ ⊂ R^p. We are interested in the family of curved probability densities M = {p_n(x_n; θ(u)) : u = (u^1, ..., u^q), q < p} embedded in S. As we discuss later, this setting is suitable for estimating portfolio coefficients. When the data X_n come from M, we want to estimate u. Taniguchi and Watanabe [23] showed that the bias-corrected maximum likelihood estimator (MLE) û_ML is third-order asymptotically efficient. However, if we evaluate the MSE of û_ML without bias correction, there is no uniformly optimal estimator in the third-order sense. In Section 2, we introduce the shrinkage estimator û_S = {I_q − Δ_n(û_ML)}û_ML, where Δ_n is a shrinkage function. Then, we evaluate the third-order difference between the MSEs of û_ML and û_S, which determines when û_S improves on û_ML. In Section 3, we provide concrete examples, namely estimation of the mean, regression coefficients, and autocovariances of stationary processes, and show when û_S improves on û_ML. Recently, high-dimensional statistical analysis has prevailed; we note that if the dimension q becomes large, û_S improves on û_ML.

In Section 4, we discuss the estimation problem of portfolio coefficients u based on the mean and variance parameters. For the portfolio choice problem, shrinkage estimation has been considered in [4, 9–11, 13, 14], among others. Jorion [11] proposed a plug-in portfolio weight estimator constructed with a shrinkage mean, viz.

ζ̂_S = δ ζ_0 e + (1 − δ) ζ̂,

where δ is a shrinkage factor satisfying 0 < δ < 1, ζ̂ is the typical sample mean, e is a constant vector with all elements equal to 1, and ζ_0 is a common constant value. Furthermore, it was shown in [4, 13, 14] that shrinkage estimation can be applied not only to means but also to covariance matrices. These authors proposed a shrinkage covariance matrix estimator that uses the typical sample covariance matrix Σ̂ and a shrinkage target S (or its estimator Ŝ), viz.

Σ̂_S = δ S + (1 − δ) Σ̂.

In this study, we propose a new class of shrinkage estimators, namely shrinkage estimators of the optimal portfolio weight. Traditional mean-variance portfolio weights are described as a function of the mean ζ and covariance matrix Σ, and our proposed shrinkage estimator can likewise be regarded as a function of the mean and covariance matrix. Suppose that a vector of portfolio weights u is defined as a function of the mean and variance parameters θ (i.e., u ≡ u(θ)). We introduce the MLE û_ML = u(θ̂_ML) and a shrinkage estimator û_S = {I_q − Δ_n(û_ML)}û_ML. We then provide the conditions for û_S to improve on û_ML.
In Section 5, we analyze daily stock return data and compare the performance of our proposed portfolio strategy with that of existing methods. The proofs of all results are provided in the Appendix.

Throughout this paper, ∥·∥ and ∥·∥_F denote the Euclidean norm and Frobenius norm, respectively, and we use the “vech” operator to transform a symmetric matrix into a vector by stacking the elements on and below the main diagonal. For matrices A and B, A ⊗ B denotes the Kronecker product of A and B, whose (j_1, j_2)th block is a_{j_1 j_2} B. For a, b ∈ R, δ(a, b) denotes Kronecker’s delta. We also assume Einstein’s summation convention, i.e., summation is automatically taken, without the symbol Σ, over those indices that appear twice in one term, once as a subscript and once as a superscript.

2. Higher-order asymptotic theory for shrinkage estimators
In this study, we deal with a general class of statistical models that includes not only i.i.d. observations, but also regression, multivariate, and time series models. Concretely, let X_n = (X_1, ..., X_n) be a collection of m-dimensional random vectors X_i that form a vector-valued stochastic process. We denote by p_n(x_n; θ) the probability density function of X_n with respect to a carrier measure, where x_n ∈ R^{mn} and θ = (θ^1, ..., θ^p)^⊤ ∈ Θ ⊂ R^p. We are now interested in the family of curved probability densities M = {p_n(x_n; θ(u)) : u = (u^1, ..., u^q), q < p} embedded in S = {p_n(x_n; θ) : θ ∈ Θ}; in other words, the embedding map θ = θ(u) is smooth and its Jacobian matrix

{ ∂θ^i/∂u^a : i ∈ {1, ..., p}, a ∈ {1, ..., q} }

has full rank (= q) on M. The setting includes that of [1] for the curved exponential family as a special case. We propose shrinkage estimators of the unknown parameter u when X_n comes from M. Because we apply this shrinkage method in our mean-variance portfolio estimation, the parametrization θ(u) is natural; θ corresponds to the mean and variance parameters, and u corresponds to the portfolio coefficients. In the ambient large class S, we estimate θ by using the MLE θ̂. If we consider the estimation of u on M in terms of θ̂, we must then solve the equation θ̂ = θ(û) with respect to û, but this cannot be done because dim θ = p > dim u = q. Hence, we introduce new extra coordinates v = (v^1, ..., v^{p−q}), such that w = (w^1, ..., w^p) = (u, v) becomes a coordinate system in S. Then, the equation

θ̂ = θ(û, v̂)    (1)
can be uniquely solved with respect to û and v̂, for which we assume θ(u, 0) = θ(u). By fixing u, we locally define the ancillary space A(u) = {(u, v) : (u, v) ∈ S} so that the family {A(u)} defines a local foliation of S. The determination of the estimator û of u is in one-to-one correspondence with the introduction of the local foliation {A(u)}, which is called the ancillary family associated with the estimator û. In what follows, we use α, β, γ, ... for the indices of the w coordinates; a, b, c, ... for the indices of the u coordinates; and κ, µ, λ, ... for the indices of the v coordinates. Let ℓ_n(θ) = ln{p_n(X_n; θ)} and ℓ̃_n(w) = ln{p_n(X_n; θ(w))}, and abbreviate ∂/∂w^α to ∂_α. Henceforth, it is assumed that ℓ_n(θ) and ℓ̃_n(w) are differentiable with respect to θ and w, respectively, up to the necessary order. Next, define

Z_α = ∂_α ℓ̃_n(w)/c_n,   Z_{αβ} = [∂_α ∂_β ℓ̃_n(w) − E_w{∂_α ∂_β ℓ̃_n(w)}]/c_n,   Z_{αβγ} = [∂_α ∂_β ∂_γ ℓ̃_n(w) − E_w{∂_α ∂_β ∂_γ ℓ̃_n(w)}]/c_n,

where α, β, γ ∈ {1, ..., p}. In regular statistical models, the consistency order c_n is often √n. We impose the following assumptions, which are reasonable even in the dependent case; see pp. 45–48 in [19].

Assumption 1. The asymptotic moments (cumulants) of Z_α, Z_{αβ}, and Z_{αβγ} are evaluated as follows:

E(Z_α Z_β) = g_{αβ} + Δ_{αβ}/c_n^2 + o(1/c_n^2),   E(Z_α Z_{βγ}) = J_{αβγ} + o(1/c_n^2),
E(Z_α Z_{βγδ}) = L_{αβγδ} + o(1/c_n^2),   E(Z_α Z_β Z_γ) = K_{αβγ}/c_n + o(1/c_n^2),
E(Z_α Z_β Z_{γδ}) = N_{αβγδ}/c_n + o(1/c_n^2),   cum(Z_{αβ}, Z_{γδ}) = M_{αβγδ} + o(1/c_n^2),
cum(Z_α, Z_β, Z_γ, Z_δ) = H_{αβγδ}/c_n^2 + o(1/c_n^2).
Assumption 2. The estimator û = (û^a) defined by relation (1) has the stochastic expansion

ŵ^a ≡ c_n(û^a − u^a) = g^{aβ} Z_β + Q^a/c_n + o_p(1/c_n),    (2)

where a ∈ {1, ..., q}, g^{γδ} is the (γ, δ)th element of the inverse of the matrix {g_{γδ} : γ, δ ∈ {1, ..., p}}, and Q^a = O_p(1).

Write the second-order bias of ŵ^a as µ^a, i.e., for all a ∈ {1, ..., q},

E(ŵ^a) = µ^a/c_n + o(1/c_n^2),    (3)

where µ^a ≡ µ^a(u) = −g^{aβ} g^{γδ} (K_{βγδ} + J_{βγδ})/2. Henceforth, we assume the validity of the Edgeworth expansion for ŵ^a.
Assumption 3. The standardized estimator (ŵ^a) has a valid third-order Edgeworth expansion.

Let c_{γδψ} = K_{γδψ} + J_{γδψ} and

c^{ab} = g^{aa′} ∂_{a′} µ^b + g^{bb′} ∂_{b′} µ^a + g^{aα} Δ_{αβ} g^{βb} + cov(Q^a, Q^b) ≡ g^{aa′} ∂_{a′} µ^b + g^{bb′} ∂_{b′} µ^a + c̄^{ab},
say. Then, we have the following result.

Theorem 1. Suppose that Assumptions 1–3 hold and that g_{aκ} = 0 for a ∈ {1, ..., q} and κ ∈ {q + 1, ..., p}. Then,

E(ŵ^a ŵ^b) = g^{ab} + (c̄^{ab} + g^{aa′} ∂_{a′} µ^b + g^{bb′} ∂_{b′} µ^a + µ^a µ^b)/c_n^2 + o(1/c_n^2) ≡ V^{ab} + o(1/c_n^2),    (4)

say, where a, b, a′, b′ ∈ {1, ..., q}.
In what follows, we write the MLE of u as û_ML = (û_ML^a). Generally, this has the second-order bias µ_ML^a; see Eq. (3). For each a ∈ {1, ..., q}, we modify û_ML^a as

\hat{\hat{u}}_{ML}^a = û_ML^a − µ_ML^a(û_ML)/c_n^2,

so that \hat{\hat{u}}_{ML} becomes a third-order asymptotically unbiased estimator. In the class U of third-order asymptotically unbiased estimators, it is shown that \hat{\hat{u}}_{ML} minimizes the matrix {V^{ab}} in the sense of the matrix order A ≥ B, defined as A − B ≥ 0 (non-negative definite); see pp. 219–230 of [21]. Hence, the MLE is third-order optimal in the class U. However, it is natural to discuss the third-order optimality of estimators without higher-order asymptotic unbiasedness. In view of this, we propose the following shrinkage estimator û_S = (û_S^a):

û_S ≡ {I_q − F(û_ML)/c_n^2} û_ML,    (5)

where I_q is the q × q identity matrix and F(·) is a q × q matrix-valued continuously differentiable function. Typically, most conventional estimators satisfy the following assumption.
Assumption 4. There exists a positive constant c such that c_n ∥û_ML − u∥ ≤ c ln(c_n) a.s.

From (5), it follows that û_S^a = û_ML^a − c_n^{−2} F(u)^a_{a′} u^{a′} + o(1/c_n^2), where F(u)^a_{a′} is the (a, a′)th element of F(u). Hence, the second-order bias of û_S^a is

µ_S^a = µ_ML^a − F(u)^a_{a′} u^{a′} ≡ µ_ML^a − s^a,    (6)

say. Write the asymptotic covariance matrices of û_ML and û_S as V_ML = {V_ML^{ab}} and V_S = {V_S^{ab}}, respectively. Summarizing the above, we have the following theorem.
Theorem 2. Under Assumptions 1–4, one has:

(i) c_n^2 {tr(V_ML) − tr(V_S)} = \sum_{a=1}^{q} {2g^{aa′} ∂_{a′} s^a + 2µ_ML^a s^a − (s^a)^2} + o(1) ≡ W(u) + o(1).

(ii) If W(u) > 0, the shrinkage estimator û_S asymptotically improves û_ML to the third order.

Theorem 2 is general and can be applied to many statistical models, typical examples of which we provide in the next section.

3. Examples

Because the results of Theorems 1 and 2 provide a general and unified view of the asymptotic theory of shrinkage estimators, we can grasp the asymptotics of various shrinkage estimators as special cases of ours.

Example 1 (JS estimator). Let X_1, ..., X_n be a sequence of i.i.d. random vectors distributed as N_q(u, I_q), where I_q is the q × q identity matrix. The sample mean X̄_n = (X_1 + ··· + X_n)/n is the MLE û_ML. For q ≥ 3, Stein [18] showed that X̄_n is not admissible with respect to the MSE. James and Stein [8] proposed a shrinkage estimator defined by

û_S = \left( 1 − \frac{q − 2}{n ∥X̄_n∥^2} \right) X̄_n,    (7)

which improves on X̄_n with respect to the MSE. We can understand this estimator as a special case of (5), with c_n = √n, µ_ML^a = 0, s^a = (q − 2)u^a / \sum_{a′=1}^{q} (u^{a′})^2, and g^{aa′} = δ(a, a′) (Kronecker’s delta). Then, it is easy to see that W(u) in Theorem 2 becomes

W(u) = \frac{(q − 2)^2}{\sum_{a=1}^{q} (u^a)^2} ≥ 0,

which implies that the JS shrinkage estimator û_S improves on X̄_n up to the “third order.”
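As a quick sanity check of Example 1, the following Monte Carlo sketch in Python (ours, not part of the original paper; the dimension, sample size, and true mean are arbitrary illustrative choices) compares the empirical MSEs of X̄_n and the JS estimator (7).

import numpy as np

rng = np.random.default_rng(0)
q, n, n_rep = 10, 100, 2000        # dimension, sample size, replications (arbitrary)
u = np.full(q, 0.5)                # true mean vector (illustrative choice)

mse_ml = mse_js = 0.0
for _ in range(n_rep):
    x_bar = rng.normal(u, 1.0, size=(n, q)).mean(axis=0)   # MLE of u
    u_js = (1.0 - (q - 2) / (n * x_bar @ x_bar)) * x_bar   # JS estimator, Eq. (7)
    mse_ml += (x_bar - u) @ (x_bar - u)
    mse_js += (u_js - u) @ (u_js - u)

print("MSE(MLE):", mse_ml / n_rep)
print("MSE(JS): ", mse_js / n_rep)   # smaller than the MLE line for q >= 3

The gap between the two printed values grows with q, in line with Remark 1 below.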
Remark 1. If q becomes large, the improvement of û_S increases accordingly. Hence, û_S is suitable for high-dimensional cases.

Example 2. Suppose that {X_t : t ∈ Z} is a q-dimensional Gaussian stationary process with mean vector E(X_t) = u and spectral density matrix f(λ). Theorem 2 of [20] evaluated the difference between the MSEs of X̄_n and the shrinkage estimator û_S defined in (7). We can understand this result as a special case of (5), with c_n = √n, µ_ML^a = 0, {g^{aa′}} = 2π f(0), and s^a = (q − 2)u^a / \sum_{a′=1}^{q} (u^{a′})^2. That is, W(u) in Theorem 2 is

W(u) = \frac{2(q − 2)}{\sum_{a=1}^{q} (u^a)^2} \left[ tr{2π f(0)} − \frac{2 u^⊤ {2π f(0)} u}{\sum_{a=1}^{q} (u^a)^2} − \frac{q − 2}{2} \right].    (8)
Evidently, (8) is equivalent to the expression in [20], p. 739. Let ν_1 ≤ ··· ≤ ν_q be the eigenvalues of 2π f(0). Then, W(u) > 0 if ν_1 + ··· + ν_{q−1} − ν_q > (q − 2)/2, which implies that if all the ν_a values are large and close to ν_q, then û_S greatly improves on X̄_n.

Example 3. Let X_1, ..., X_n be generated by

X_t = B^⊤ z_t + ε_t,    (9)
where X_t = (X_{1,t}, ..., X_{m,t})^⊤, B = {b_{jk}} is an ℓ × m matrix of unknown parameters, and ε_t = (ε_{1,t}, ..., ε_{m,t})^⊤ is a Gaussian stationary process with E(ε_t) = 0 and spectral density matrix f(λ). Here, z_t = (z_{1,t}, ..., z_{ℓ,t})^⊤ is an ℓ × 1 vector of regressors satisfying assumptions (A1)–(A4) below. Let d_j^2(n) = \sum_{t=1}^{n} z_{j,t}^2 for each j ∈ {1, ..., ℓ}.

(A1) d_j(n) = T_j × c_n for j ∈ {1, ..., ℓ}, and c_n → ∞ as n → ∞, where the T_j’s are positive constants.

(A2) For all j ∈ {1, ..., ℓ}, z_{j,n}/c_n → 0 as n → ∞.

(A3) For all j, k ∈ {1, ..., ℓ}, the following limit exists:

ρ_{jk}(h) ≡ \lim_{n→∞} \frac{1}{d_j(n) d_k(n)} \sum_{t=1}^{n} z_{j,t} z_{k,t+h}.

Let R(h) ≡ {ρ_{jk}(h)}.

(A4) R(0) is non-singular.
From (A1)–(A4), there exists a matrix function M(λ) whose increments are Hermitian non-negative, such that

R(h) = \int_{−π}^{π} e^{ihλ} dM(λ).
Next, we make an additional assumption.

(A5) (ε_t) has an absolutely continuous spectrum, which is piecewise continuous with no discontinuities at the jumps of M(λ).

We then rewrite (9) as X = Uβ + ε, where X_{j,t} is in row (j − 1) × m + t of X, ε_{j,t} is in row (j − 1) × m + t of ε, β has b_{jk} in row (k − 1) × ℓ + j, and U ≡ I_m ⊗ Z. Here, Z has z_{j,t} in row t and column j. If we are interested in estimating β based on X and Z, the most fundamental candidate is the LS estimator β̂_LS = (U^⊤ U)^{−1} U^⊤ X. When the ε_t’s are i.i.d., β̂_LS is not admissible if q ≡ ℓm ≥ 3. Stein [18] proposed an alternative estimator, viz.

β̂_JS = \left\{ 1 − \frac{c}{∥(I_m ⊗ D_n)(β̂_LS − b)∥^2} \right\} (β̂_LS − b) + b,    (10)

where c is a positive constant, D_n = c_n diag(T_1, ..., T_ℓ) = c_n T (say), and b is a preassigned q-dimensional vector. Senda and Taniguchi [16] evaluated the third-order difference between the MSEs of β̂_LS and β̂_JS when (ε_t) is a Gaussian process with spectral density matrix f(λ). In what follows, we impose the following assumption.

(A6) The regression spectrum M(λ) increases at no more than ℓ values of λ ∈ [0, π], and the sum of the ranks of the increases of M(λ) is ℓ.

Under (A1)–(A6), it was shown in [6], p. 216, and [5] that β̂_LS is asymptotically efficient with asymptotic variance matrix

C = {I_m ⊗ R(0)}^{−1} \int_{−π}^{π} 2π f(λ) ⊗ M^⊤(dλ) {I_m ⊗ R(0)}^{−1}.
Letting û_S = β̂_JS defined by (10), we can understand the result presented in [16] as a special case of (5), with {g^{aa′}} = C and F(u) = c I_q / ∥(I_m ⊗ T)(u − b)∥^2; that is, W(u) in Theorem 2 is

W(u) = \frac{2c}{∥(I_m ⊗ T)(u − b)∥^2} \left[ tr(C) − \frac{2(u − b)^⊤ (I_m ⊗ T) C (I_m ⊗ T)(u − b)}{∥(I_m ⊗ T)(u − b)∥^2} − \frac{c}{2} \right],    (11)

which is equivalent to the formula given in Theorem 2 of [16]. To evaluate (11) concretely, let m = 1 and, for each j ∈ {1, ..., ℓ}, set z_{j,t} = cos(λ_j t), where 0 ≤ λ_1 < ··· < λ_ℓ ≤ π (a time series regression model with a cyclic trend). Then, we observe that d_j^2(n) = O(n) and R(h) = diag{cos(λ_1 h), ..., cos(λ_ℓ h)}. Hence, M(λ) has jump 1/2 at λ = ±λ_j for all j ∈ {1, ..., ℓ}, and

C = diag{2π f(λ_1), ..., 2π f(λ_ℓ)}.

Let f(λ_{(j)}) be the jth smallest value of f(λ_1), ..., f(λ_ℓ). Then, the bracketed term on the right-hand side of (11) is bounded below by

2π \left\{ \sum_{j=1}^{ℓ−1} f(λ_{(j)}) − f(λ_{(ℓ)}) \right\} − \frac{c}{2},

which implies that if all the values of f(λ_j) are large and close to f(λ_{(ℓ)}), the shrinkage estimator û_S greatly improves on the LS estimator.
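For intuition, the following sketch (ours, not from the paper) evaluates this lower bound numerically for an assumed AR(1) error spectrum; the frequencies, the AR coefficient, and the shrinkage constant c are illustrative choices.

import numpy as np

def ar1_spectrum(lam, phi=0.5):
    # Spectral density of an AR(1) process with unit innovation variance
    # (an assumed error model, used only for illustration).
    return 1.0 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * lam)) ** 2)

ell, c = 6, 1.0                       # number of cyclic trends, JS constant (arbitrary)
lams = np.linspace(0.2, 3.0, ell)     # frequencies 0 <= lam_1 < ... < lam_ell <= pi
fv = np.sort(ar1_spectrum(lams))      # f(lam_(1)) <= ... <= f(lam_(ell))
bound = 2 * np.pi * (fv[:-1].sum() - fv[-1]) - c / 2
print("lower bound for the bracketed term in (11):", bound)  # > 0 => JS improves LS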
Remark 2. If the spectral density f(λ) is bounded and bounded away from zero, then as ℓ becomes large, the improvement of û_S (= β̂_JS) increases accordingly.

Example 4. Let {X_t : t ∈ Z} be a Gaussian stationary process with mean 0 and autocovariance function γ(s) satisfying

\sum_{s=−∞}^{∞} |s| |γ(s)| < ∞.

Then, the process has the spectral density

f(λ) = \frac{1}{2π} \sum_{s=−∞}^{∞} γ(s) e^{−isλ}.

Write

Γ = \begin{pmatrix} γ(0) & γ(1) & \cdots & γ(p−1) \\ γ(1) & γ(0) & \ddots & \vdots \\ \vdots & \ddots & \ddots & γ(1) \\ γ(p−1) & \cdots & γ(1) & γ(0) \end{pmatrix}.
Suppose that an observed stretch {X_1, ..., X_n} of {X_t : t ∈ Z} is available. As an estimator of Γ, we can use

Γ̂_0 ≡ \frac{1}{n − p + 1} \sum_{t=p}^{n} X_t X_t^⊤ ≡ \frac{1}{n − p + 1} S_n,

say, where X_t = (X_t, ..., X_{t−p+1})^⊤. Here, we measure the goodness of the estimator by using the loss function L(Γ̂_0, Γ) ≡ tr(Γ̂_0 Γ^{−1} − I_p)^2, where I_p is the identity matrix, and the risk R(Γ̂_0, Γ) ≡ E{L(Γ̂_0, Γ)}. As another estimator, [22] introduced

Γ̂ = \frac{1}{n − p + 1} \left\{ S_n + \frac{b}{n \, tr(S_n^{−1} C)} C \right\},
where b is a constant and C is a given p × p positive definite matrix. The authors then evaluated the third-order difference between the risks of Γ̂_0 and Γ̂ in the form

\lim_{n→∞} n^2 {R(Γ̂_0, Γ) − R(Γ̂, Γ)}    (12)

and provided a condition under which Γ̂ improves on Γ̂_0. Because our risk in Theorem 2 differs from the above R(·, ·), and because we discuss the problem of shrinkage estimation for covariance structures more generally in Section 4, we evaluate the result (12) in the case of p = 1. To see this, we let, by [7],

g^{aa′} = 4π \int_{−π}^{π} f^2(λ) dλ

and s^a = −bΓ for a, a′ = 1. Then, W(u) in Theorem 2 is essentially equivalent to Eq. (8) in [22].

4. Shrinkage estimation for the mean-variance portfolio
Suppose the existence of a finite number of assets indexed by a ∈ {1, ..., q}. Let X_t = (X_{1t}, ..., X_{qt})^⊤ denote the random returns on the q assets at time t, with E(X_t) = ζ = (ζ_1, ..., ζ_q)^⊤ and var(X_t) = Σ = (σ_{aa′})_{a,a′ ∈ {1,...,q}} (non-singular). Writing X_n = (X_1, ..., X_n) for the collection of observations of the random returns from t = 1 to t = n and θ = (θ^1, ..., θ^p)^⊤ ∈ Θ ⊂ R^p, we assume that the probability density function p_n(x_n; θ) is identified by the parameter θ = (ζ^⊤, vech(Σ)^⊤)^⊤, i.e., p = q + q(q + 1)/2 > q. Let u = (u^1, ..., u^q)^⊤ be the vector of portfolio weights. Mean-variance optimal portfolio weights have been proposed based on various criteria. A typical one is as follows (see, e.g., [17]):

u ≡ u(θ) = \frac{1}{2b} \left( Σ^{−1} ζ − \frac{e^⊤ Σ^{−1} ζ}{e^⊤ Σ^{−1} e} Σ^{−1} e \right) + \frac{Σ^{−1} e}{e^⊤ Σ^{−1} e},    (13)

where b > 0 is a constant depending on the investor and e = (1, ..., 1)^⊤ ∈ R^q. In this section, we consider the following two estimators of the optimal portfolio weights:

û_ML ≡ u(θ̂_ML) = \frac{1}{2b} \left( Σ̂^{−1} ζ̂ − \frac{e^⊤ Σ̂^{−1} ζ̂}{e^⊤ Σ̂^{−1} e} Σ̂^{−1} e \right) + \frac{Σ̂^{−1} e}{e^⊤ Σ̂^{−1} e},    (14)

û_S = {I_q − F(û_ML)/n} û_ML,    (15)

where ζ̂ = \sum_{t=1}^{n} X_t/n and Σ̂ = \sum_{t=1}^{n} (X_t − ζ̂)(X_t − ζ̂)^⊤/n. Here, û_ML is the MLE of u when X_1, ..., X_n are i.i.d. N_q(ζ, Σ), and û_S is the shrinkage estimator defined in (5).
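A minimal Python sketch of (13)–(15) (ours, for illustration; the toy data, the risk-aversion constant b, and the shrinkage constant d are arbitrary, and F(u) = dI_q/∥u∥² is the JS-type choice used later in this section):

import numpy as np

def u_mv(zeta, Sigma, b):
    # Mean-variance weights of Eq. (13).
    Si = np.linalg.inv(Sigma)
    e = np.ones(len(zeta))
    gmv = Si @ e / (e @ Si @ e)                 # global minimum-variance component
    return (Si @ zeta - (e @ Si @ zeta) * gmv) / (2 * b) + gmv

def u_shrunk(u_ml, n, d):
    # Shrinkage weights of Eq. (15) with F(u) = d I_q / ||u||^2.
    return (1.0 - d / (n * (u_ml @ u_ml))) * u_ml

rng = np.random.default_rng(1)
n, q, b, d = 50, 5, 10.0, 1.0                   # placeholder sizes and constants
X = rng.normal(0.001, 0.02, size=(n, q))        # toy return data
zeta_hat = X.mean(axis=0)
Sigma_hat = np.cov(X, rowvar=False, bias=True)  # divides by n, as in the text
u_ml = u_mv(zeta_hat, Sigma_hat, b)             # Eq. (14)
print(u_ml, u_shrunk(u_ml, n, d))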
Remark 3. Our portfolio weights û_S do not usually satisfy the condition that the weights sum to one. For instance, when F(u) ≡ dI_q/∥u∥^2, it is easy to see that

e^⊤ û_S = \left( 1 − \frac{d}{n ∥û_ML∥^2} \right) e^⊤ û_ML = 1 − \frac{d}{n ∥û_ML∥^2} < 1.

Nevertheless, this poses no problem if we consider only a part of the total investment amount (i.e., the fraction 1 − û_S^⊤ e is not invested, or is invested in a risk-free asset).

In this section, we evaluate

W(u) = \sum_{a=1}^{q} {2g^{aa′} ∂_{a′} s^a + 2µ_ML^a s^a − (s^a)^2}

and seek conditions under which W(u) > 0. In view of Theorem 2, W(u) corresponds to the main term of

n {tr(V_ML) − tr(V_S)} ≈ n \left[ E{∥√n (û_ML − u)∥^2} − E{∥√n (û_S − u)∥^2} \right],
which implies that the shrinkage estimator û_S asymptotically improves û_ML up to the third order if this condition is satisfied. First, we evaluate g^{aa′} and µ_ML^a when X_1, ..., X_n are i.i.d. N_q(ζ, Σ). Given X_n = (X_1, ..., X_n), we write the log-likelihood ℓ_n(θ) as

ℓ_n(θ) = −\frac{qn}{2} \ln(2π) − \frac{n}{2} \ln{det(Σ)} − \frac{1}{2} \sum_{t=1}^{n} (X_t − ζ)^⊤ Σ^{−1} (X_t − ζ).    (16)

Suppose that θ̂_ML = (θ̂^1, ..., θ̂^p)^⊤ is the MLE of θ, satisfying ∂_i ℓ_n(θ̂_ML) = 0 for all i ∈ {1, ..., p}. Define

Y_i = ∂_i ℓ_n(θ)/√n,   Y_{ij} = [∂_i ∂_j ℓ_n(θ) − E_θ{∂_i ∂_j ℓ_n(θ)}]/√n,   Y_{ijk} = [∂_i ∂_j ∂_k ℓ_n(θ) − E_θ{∂_i ∂_j ∂_k ℓ_n(θ)}]/√n,

and suppose that g*_{ij}, J*_{ijk}, and K*_{ijk} are given by

E(Y_i Y_j) = g*_{ij} + δ_{ij}/n + o(1/n),   E(Y_i Y_{jk}) = J*_{ijk} + o(1/n),   E(Y_i Y_j Y_k) = K*_{ijk}/√n + o(1/n),    (17)

and that (g*)^{ij} is the (i, j)th element of the inverse of the matrix {g*_{ij}}.

Proposition 1. The ith element of θ̂_ML has the stochastic expansion

θ̂^i = θ^i + (g*)^{ij} Y_j/√n + (g*)^{ij} (g*)^{km} Y_{jk} Y_m/n + R*_{jkm} (g*)^{ij} (g*)^{kk′} (g*)^{mm′} Y_{k′} Y_{m′}/(2n) + o_p(1/n),

where R*_{jkm} = −K*_{jkm} − J*_{jkm} − J*_{kmj} − J*_{mjk}.
Writing B̄_i^a = ∂_i u^a = ∂u^a/∂θ^i and

C_{a′bb′} = \lim_{n→∞} \frac{1}{n} E(∂_{a′} ℓ_n ∂_b ℓ_n ∂_{b′} ℓ_n + ∂_{a′} ℓ_n ∂_b ∂_{b′} ℓ_n) = K*_{a′bb′} + J*_{a′bb′}

for a, b, a′, b′ ∈ {1, ..., q}, we have the following proposition.

Proposition 2.

(i) µ_ML^a (of (3)) = −(g*)^{aa′} B̄_i^b B̄_j^{b′} C_{a′bb′} (g*)^{ij}/2.
(ii) One has

W(u) = \sum_{a=1}^{q} {2g^{aa′} ∂_{a′} s^a + 2µ_ML^a s^a − (s^a)^2} = \sum_{a=1}^{q} \left[ 2 B̄_j^a B̄_i^{a′} (g*)^{ij} ∂_{a′} s^a + 2 \left( −\frac{1}{2} (g*)^{aa′} B̄_i^b B̄_j^{b′} C_{a′bb′} (g*)^{ij} \right) s^a − (s^a)^2 \right].    (18)
Consider a JS-type estimator (see [8]) with F(u) ≡ dI_q/∥u∥^2, where d is a constant value. Then, s^a = F(u)^a_{a′} u^{a′} from (6), and ∂_{a′} s^a can be written as

∂_{a′} s^a = \frac{∂}{∂u^{a′}} \left( \frac{d u^a}{∥u∥^2} \right) = d \left\{ \frac{δ(a, a′)}{∥u∥^2} − \frac{2 u^a u^{a′}}{∥u∥^4} \right\},

which implies, from (18), that

W(u) = \sum_{a=1}^{q} \left[ 2d B̄_j^a B̄_i^{a′} (g*)^{ij} \left\{ \frac{δ(a, a′)}{∥u∥^2} − \frac{2 u^a u^{a′}}{∥u∥^4} \right\} − (g*)^{aa′} B̄_i^b B̄_j^{b′} C_{a′bb′} (g*)^{ij} \frac{d u^a}{∥u∥^2} − d^2 \frac{(u^a)^2}{∥u∥^4} \right]
= \frac{d}{∥u∥^2} \sum_{a=1}^{q} \left\{ 2 B̄_j^a B̄_i^a (g*)^{ij} − 4 B̄_j^a B̄_i^{a′} (g*)^{ij} \frac{u^a u^{a′}}{∥u∥^2} − (g*)^{aa′} B̄_i^b B̄_j^{b′} C_{a′bb′} (g*)^{ij} u^a − d \frac{(u^a)^2}{∥u∥^2} \right\}.    (19)
Therefore, by denoting

B̄ = \begin{pmatrix} B̄_1^1 & \cdots & B̄_p^1 \\ \vdots & \ddots & \vdots \\ B̄_1^q & \cdots & B̄_p^q \end{pmatrix},   G^{−1} = \begin{pmatrix} (g*)^{11} & \cdots & (g*)^{1p} \\ \vdots & \ddots & \vdots \\ (g*)^{p1} & \cdots & (g*)^{pp} \end{pmatrix},   C(u) = (g*)^{aa′} B̄_i^b B̄_j^{b′} C_{a′bb′} (g*)^{ij} u^a,

we have the following result.

Theorem 3. (i) One has

W(u) = \frac{2d}{∥u∥^2} \left\{ tr(B̄ G^{−1} B̄^⊤) − 2 u^⊤ B̄ G^{−1} B̄^⊤ u/∥u∥^2 − C(u)/2 − d/2 \right\}.    (20)

(ii) Let ν_1 < ··· < ν_q be the eigenvalues of B̄ G^{−1} B̄^⊤. Then, the shrinkage estimator û_S asymptotically improves û_ML up to the third order if

\sum_{i ≠ q} ν_i − ν_q − C(u)/2 − d/2 > 0.
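Given numerical values of B̄ and G (which depend on the model and must be supplied), the sufficient condition in Theorem 3 (ii) is easy to check. The sketch below is ours and uses randomly generated placeholder inputs purely to illustrate the computation.

import numpy as np

def theorem3_margin(B_bar, G, Cu, d):
    # Left-hand side of the condition in Theorem 3 (ii):
    # sum_{i != q} nu_i - nu_q - C(u)/2 - d/2,
    # where nu_1 <= ... <= nu_q are the eigenvalues of B G^{-1} B^T.
    A = B_bar @ np.linalg.solve(G, B_bar.T)      # B G^{-1} B^T (q x q, symmetric)
    nu = np.sort(np.linalg.eigvalsh(A))
    return nu[:-1].sum() - nu[-1] - Cu / 2 - d / 2

rng = np.random.default_rng(2)
q, p = 4, 14                                     # p = q + q(q+1)/2 for q = 4
B_bar = rng.normal(size=(q, p))                  # placeholder Jacobian du/dtheta
M = rng.normal(size=(p, p))
G = M @ M.T + p * np.eye(p)                      # a positive definite placeholder G
print(theorem3_margin(B_bar, G, Cu=0.1, d=0.5) > 0)  # True => u_S improves u_ML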
5. Real data analysis

In this section, we compare different portfolios based on the q = 20 daily stock returns listed in Table 1. The stock data (closing prices) are observed from August 3, 2015 to February 28, 2017 (384 observations in total), yielding 383 observations of the return data. With respect to the portfolio choice strategies, we considered the following seven options:

1. Naive Portfolio (NP): The naive strategy has the portfolio weight û_i^{(NP)} = 1/q in each of the q risky assets; see, e.g., [3].

2. Sample-based Mean-Variance Portfolio (ML): In the mean-variance model of [15], the investor optimizes the trade-off between the mean and variance of portfolio returns. The sample-based approach corresponds to the classic “plug-in” approach; that is, the mean vector and covariance matrix of the asset returns in (13) are replaced by their sample counterparts ζ̂ and Σ̂, i.e., (14).

3. Bayes–Stein Shrinkage Portfolio (BS): The Bayes–Stein portfolio proposed by [11] applies the shrinkage estimation concept pioneered in [8, 18], replacing ζ̂ with ζ̂_S in (14). Here, ζ̂_S is defined by ζ̂_S = δ̂ ζ̂_0 e + (1 − δ̂) ζ̂, where

δ̂ = \min\left\{ \frac{q − 2}{n (ζ̂ − ζ̂_0 e)^⊤ Σ̂^{−1} (ζ̂ − ζ̂_0 e)}, 1 \right\}

and ζ̂_0 = ζ̂^⊤ Σ̂^{−1} e / (e^⊤ Σ̂^{−1} e).

4. Covariance Shrinkage Estimated Portfolio (CS): Another shrinkage estimation strategy, developed by [14], replaces the covariance matrix estimator Σ̂ with Σ̂_S in (14). Here, Σ̂_S is defined by Σ̂_S = δ̃ S + (1 − δ̃) Σ̂, where

δ̃ = \min\left[ \frac{\sum_{i=1}^{n} ∥X_i X_i^⊤ − Σ̂∥_F^2}{n^2 \{tr(Σ̂^2) − tr^2(Σ̂)/q\}}, 1 \right]

and S = tr(Σ̂) I_q/q. (A sketch of the BS and CS estimators is given after Table 1.)

5. Our Proposed Shrinkage Portfolio (PP1, PP2, PP3): We propose three types of portfolio weights, as defined by (15), where the shrinkage functions F_1, F_2, F_3 are given by

F_1(u)u = d u/∥u∥^2,   F_2(u)u = d u^{(NP)}/∥u∥^2,   F_3(u)u = d{u^{(NP)} − u^⊤ u^{(NP)} u/∥u∥^2}/∥u∥^2,

where d is a given constant value and u^{(NP)} is the NP weight vector.

Table 1: List of datasets considered

#  | Stock Code | Issue Name                            | Market Division  | Category of Industry
1  | 1689       | ETFS Natural Gas                      | ETF              |
2  | 2303       | Dawn Corporation                      | JASDAQ Standard  | Information & Communication
3  | 2315       | Caica Inc.                            | JASDAQ Standard  | Information & Communication
4  | 3264       | Ascot Corp.                           | JASDAQ Standard  | Real Estate
5  | 3300       | Ambition Corporation                  | Mothers          | Real Estate
6  | 3624       | Axel Mark Inc.                        | Mothers          | Information & Communication
7  | 3660       | istyle Inc.                           | First Section    | Information & Communication
8  | 3664       | mobcast Inc.                          | Mothers          | Information & Communication
9  | 3666       | Tecnos Japan Inc.                     | First Section    | Information & Communication
10 | 3902       | Medical Data Vision Co., Ltd.         | First Section    | Information & Communication
11 | 3906       | Albert Inc.                           | Mothers          | Information & Communication
12 | 3912       | Mobile Factory Inc.                   | Mothers          | Information & Communication
13 | 6033       | Extreme Co., Ltd.                     | Mothers          | Services
14 | 6034       | MRT Inc.                              | Mothers          | Services
15 | 6063       | Emergency Assistance Japan Co., Ltd.  | JASDAQ Standard  | Services
16 | 6094       | FreakOut Holdings Inc.                | Mothers          | Services
17 | 6656       | inspec Inc.                           | Mothers          | Electric Appliances
18 | 6836       | Plat’home Co., Ltd.                   | Second Section   | Electric Appliances
19 | 7575       | Japan Lifeline Co., Ltd.              | First Section    | Wholesale Trade
20 | 8704       | Traders Holdings Co., Ltd.            | JASDAQ Standard  | Securities & Commodity Futures
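The BS and CS estimators above translate directly into code. The following is a minimal sketch of the two shrinkage estimators (ours, not from the paper; it assumes returns are stacked in an n × q array and follows the formulas exactly as printed above):

import numpy as np

def bayes_stein_mean(X):
    # Bayes-Stein shrunk mean used by the BS strategy.
    n, q = X.shape
    zeta = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False, bias=True)   # MLE-type covariance (divides by n)
    Si = np.linalg.inv(Sigma)
    e = np.ones(q)
    zeta0 = (zeta @ Si @ e) / (e @ Si @ e)
    diff = zeta - zeta0 * e
    delta = min((q - 2) / (n * (diff @ Si @ diff)), 1.0)
    return delta * zeta0 * e + (1 - delta) * zeta

def cs_covariance(X):
    # Covariance shrunk toward tr(Sigma) I_q / q, as used by the CS strategy.
    n, q = X.shape
    Sigma = np.cov(X, rowvar=False, bias=True)
    S = np.trace(Sigma) * np.eye(q) / q
    num = sum(np.linalg.norm(np.outer(x, x) - Sigma, "fro") ** 2 for x in X)
    den = n ** 2 * (np.trace(Sigma @ Sigma) - np.trace(Sigma) ** 2 / q)
    delta = min(num / den, 1.0)
    return delta * S + (1 - delta) * Sigma

Either estimate is then plugged into (14) in place of ζ̂ or Σ̂.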
Our analysis relies on a “rolling sample” approach. Specifically, given the T (≡ 383) daily asset returns, we choose an estimation window of length n (≡ 50) days. At each time t, starting from t = n + 1, we use the data from the previous n days to estimate the portfolio weights and then use these weights to compute the return from time t to t + 1. This process continues, adding the return for the next period to the dataset and dropping the earliest return, until the end of the dataset is reached. The outcome of this rolling-window approach is a series of T − n (≡ 333) daily out-of-sample returns generated by each of the portfolio strategies.

Given the time series of daily out-of-sample returns generated by each strategy, we computed five quantities. First, we computed the out-of-sample mean (Mean) and the out-of-sample standard deviation (STD) of strategy k (µ̂_k and σ̂_k), defined as the sample mean and sample standard deviation of the out-of-sample returns, respectively. Then, we measured the out-of-sample Sharpe ratio (SR) and certainty-equivalent return (CEQ) of strategy k, defined as

\widehat{SR}_k = µ̂_k/σ̂_k,   \widehat{CEQ}_k = µ̂_k − σ̂_k^2/(2b),

where b is the risk aversion given in (14). Generally, a strategy is more effective as the Mean rises and the STD falls. However, when comparing two strategies, both the Mean and the STD of one are often greater than those of the other; in such a case, we consider the strategy with the greater SR or CEQ to be more effective. Next, we computed the portfolio turnover (TO) to obtain a sense of the amount of trading required to implement each portfolio strategy. TO is defined as the average sum of the absolute values of the trades across the q available assets:

\widehat{TO}_k = \frac{1}{T − n} \sum_{t=1}^{T−n} \sum_{j=1}^{q} |û_{j,t+1}^{(k)} − û_{j,t+}^{(k)}|,

where û_{j,t+1}^{(k)} is the portfolio weight in asset j at time t + 1, û_{j,t+}^{(k)} = û_{j,t}^{(k)} exp(X_{j,t}) is the portfolio weight before rebalancing at time t + 1, and X_{j,t} is the return on asset j at time t. Because the TO is related to the transaction cost of rebalancing portfolio weights, a smaller value is better in practice. Finally, we computed the mean of the sum of the portfolio weights (SW) of strategy k, defined as

\widehat{SW}_k = \frac{1}{T − n} \sum_{t=1}^{T−n} \sum_{j=1}^{q} û_{j,t}^{(k)}.
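A minimal sketch (ours) collecting these performance measures; it assumes X is a (T − n) × q array of out-of-sample asset returns and W the matching array of portfolio weights applied at each date:

import numpy as np

def performance(X, W, b):
    # Mean, STD, SR, CEQ, TO, and SW as defined above (a sketch).
    port = (W * X).sum(axis=1)            # daily out-of-sample portfolio returns
    mean, std = port.mean(), port.std(ddof=1)
    sr = mean / std
    ceq = mean - std ** 2 / (2 * b)
    w_plus = W[:-1] * np.exp(X[:-1])      # weights just before each rebalancing
    to = np.abs(W[1:] - w_plus).sum(axis=1).mean()
    sw = W.sum(axis=1).mean()
    return {"Mean": mean, "STD": std, "SR": sr, "CEQ": ceq, "TO": to, "SW": sw}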
The SW is an index calculated only for our proposed strategies; its value for the other strategies is always 1. As noted in Remark 3, there is no practical problem if we remember that we invest only a part (1 − SW) of the total investment amount. Furthermore, if the fraction 1 − SW is invested in the risk-free asset, then in many cases the Mean is greater while the STD is unchanged, so we can expect the SR and CEQ to improve.

Table 2 lists the Mean, STD, SR, CEQ, TO, and SW for each strategy. In the case of PP1, the Mean tends to decrease as d increases. However, in the case of PP3, which modifies the shrinkage function F, the Mean may increase, suggesting that the direction of the shrinkage function should differ from that of û_ML; in particular, in the case of PP3, the direction is orthogonal to û_ML. On the contrary, the STD tends to decrease as d increases, because the sum of the portfolio weights is less than one, meaning that part of the total investment amount is not invested. When the Mean increases and the STD decreases, the SR and CEQ performances improve. In addition, although not considered here, by investing the part 1 − SW of the total investment amount in a risk-free asset in the proposed methods, the Mean will increase whereas the STD will be unchanged; as a result, performance will improve (except when the interest rate is negative, as is currently the case in Japan). If we compare the existing methods, we see that the CS exhibits the highest performance. However, the results also show that if F and d are chosen appropriately, our proposed portfolio can generate better performance than the CS. Although the shrinkage functions F and constants d are specified here in several cases, we leave the topic of how to choose the optimal F and d to future work. In the case of PP1, the TO tends to decrease as d increases, which implies that the trading cost falls. On the contrary, in the case of PP3, the TO tends to increase as d increases.

Appendix

Proof of Theorem 1. It follows from (4.1.2) on p. 169 of [21] that

Pr(ŵ^1 ≤ x^1, ..., ŵ^q ≤ x^q) = \int_{−∞}^{x^1} \cdots \int_{−∞}^{x^q} N(x; F) \Big\{ 1 + c_n^{−1} µ^a H_a(x) + \frac{1}{6c_n} c^{abc} H_{abc}(x) + \frac{1}{2c_n^2} (c^{ab} + µ^a µ^b) H_{ab}(x) + \frac{1}{24c_n^2} c^{abcd} H_{abcd}(x) + \frac{1}{6c_n^2} µ^a c^{bcd} H_{abcd}(x) + \frac{1}{72c_n^2} c^{abc} c^{a′b′c′} H_{abca′b′c′}(x) \Big\} dx + o(1/c_n^2),    (21)

where x = (x^1, ..., x^q)^⊤,

N(x; F) = (2π)^{−q/2} |F|^{−1/2} \exp(−x^⊤ F^{−1} x/2),   H_{j_1,...,j_s}(x) = \frac{(−1)^s}{N(x; F)} \frac{∂^s N(x; F)}{∂x^{j_1} \cdots ∂x^{j_s}},

and the indices a, b, c, a′, b′, c′ run from 1 to q. Here, F = (g_{ab} − g_{aκ} g_{bµ} g^{κµ})^{−1}, where the indices κ and µ run from q + 1 to p, and c^{bcd} and c^{abcd} are written in terms of the quantities in Assumption 1. The integration of ŵ^a ŵ^b with respect to the Edgeworth expansion (21) yields (4).

Proof of Theorem 2. The substitution of (6) and µ_ML into (4) yields V_S and V_ML. Then, the difference between tr(V_S) and tr(V_ML) leads to the assertion of Theorem 2.

Proof of Proposition 1. Let K_1 = {1, ..., q} and K_2 = {q + 1, ..., p}. For each i ∈ K_2, the suffix of σ_{i_1 i_2} is determined by (i_1, i_2) ≡ (i_1(i), i_2(i)) ∈ {1, ..., q} × {1, ..., q}, specified by the following equation:
(θ^{q+1}, θ^{q+2}, θ^{q+3}, ..., θ^p)^⊤ = vech(Σ) = (σ_{11}, σ_{21}, σ_{22}, σ_{31}, σ_{32}, σ_{33}, ..., σ_{qq})^⊤.

For example, (i_1(q + 1), i_2(q + 1)) = (1, 1), (i_1(q + 2), i_2(q + 2)) = (2, 1), (i_1(q + 3), i_2(q + 3)) = (2, 2), and (i_1(p), i_2(p)) = (q, q).
Table 2: The out-of-sample mean (Mean), out-of-sample standard deviation (STD), out-of-sample Sharpe ratio (SR), certainty-equivalent return (CEQ), portfolio turnover (TO), and sum of the portfolio weights (SW) of our proposed shrinkage portfolios (PP1, PP2, PP3), the naive portfolio (NP), the sample-based mean-variance portfolio (ML), the Bayes–Stein shrinkage portfolio (BS), and the covariance shrinkage estimated portfolio (CS), for asset size q = 20, sample size n = 50, and T = 383.

b = 10
Strategy     | Mean   | STD    | SR     | CEQ    | TO     | SW
PP1 (d = 1)  | 0.0031 | 0.0323 | 0.0973 | 0.0031 | 0.3217 | 0.9229
PP1 (d = 3)  | 0.0025 | 0.0278 | 0.0893 | 0.0024 | 0.2802 | 0.7687
PP1 (d = 5)  | 0.0018 | 0.0236 | 0.0773 | 0.0018 | 0.2435 | 0.6145
PP2 (d = 1)  | 0.0034 | 0.0337 | 0.1018 | 0.0034 | 0.3436 | 0.9929
PP2 (d = 3)  | 0.0034 | 0.0323 | 0.1037 | 0.0033 | 0.3434 | 0.7687
PP2 (d = 5)  | 0.0033 | 0.0315 | 0.1042 | 0.0032 | 0.3444 | 0.6145
PP3 (d = 1)  | 0.0035 | 0.0342 | 0.1029 | 0.0035 | 0.3481 | 0.9405
PP3 (d = 3)  | 0.0036 | 0.0337 | 0.1073 | 0.0036 | 0.3569 | 0.8215
PP3 (d = 5)  | 0.0037 | 0.0335 | 0.1106 | 0.0037 | 0.3663 | 0.7025
NP           | 0.0012 | 0.0259 | 0.0470 | 0.0012 | 0.0108 | 1.0000
ML           | 0.0035 | 0.0346 | 0.1003 | 0.0034 | 0.3440 | 1.0000
BS           | 0.0035 | 0.0346 | 0.1024 | 0.0035 | 0.3440 | 1.0000
CS           | 0.0033 | 0.0309 | 0.1078 | 0.0033 | 0.2479 | 1.0000

b = 30
Strategy     | Mean   | STD    | SR     | CEQ    | TO     | SW
PP1 (d = 1)  | 0.0033 | 0.0323 | 0.1012 | 0.0033 | 0.3219 | 0.9230
PP1 (d = 3)  | 0.0026 | 0.0279 | 0.0931 | 0.0026 | 0.2803 | 0.7691
PP1 (d = 5)  | 0.0019 | 0.0237 | 0.0811 | 0.0019 | 0.2437 | 0.6152
PP2 (d = 1)  | 0.0036 | 0.0337 | 0.1057 | 0.0035 | 0.3437 | 0.9230
PP2 (d = 3)  | 0.0035 | 0.0324 | 0.1077 | 0.0035 | 0.3435 | 0.7691
PP2 (d = 5)  | 0.0034 | 0.0315 | 0.1081 | 0.0034 | 0.3445 | 0.6152
PP3 (d = 1)  | 0.0037 | 0.0342 | 0.1068 | 0.0036 | 0.3482 | 0.9406
PP3 (d = 3)  | 0.0037 | 0.0337 | 0.1113 | 0.0037 | 0.3569 | 0.8217
PP3 (d = 5)  | 0.0038 | 0.0335 | 0.1146 | 0.0038 | 0.3663 | 0.7028
NP           | 0.0012 | 0.0259 | 0.0470 | 0.0012 | 0.0108 | 1.0000
ML           | 0.0036 | 0.0346 | 0.1042 | 0.0036 | 0.3441 | 1.0000
BS           | 0.0036 | 0.0346 | 0.1049 | 0.0036 | 0.3442 | 1.0000
CS           | 0.0034 | 0.0309 | 0.1117 | 0.0034 | 0.2478 | 1.0000

b = 100
Strategy     | Mean   | STD    | SR     | CEQ    | TO     | SW
PP1 (d = 1)  | 0.0033 | 0.0323 | 0.1025 | 0.0033 | 0.3221 | 0.9231
PP1 (d = 3)  | 0.0026 | 0.0279 | 0.0944 | 0.0026 | 0.2806 | 0.7694
PP1 (d = 5)  | 0.0020 | 0.0237 | 0.0825 | 0.0020 | 0.2440 | 0.6156
PP2 (d = 1)  | 0.0036 | 0.0338 | 0.1070 | 0.0036 | 0.3438 | 0.9231
PP2 (d = 3)  | 0.0035 | 0.0324 | 0.1090 | 0.0035 | 0.3437 | 0.7694
PP2 (d = 5)  | 0.0035 | 0.0316 | 0.1095 | 0.0034 | 0.3446 | 0.6156
PP3 (d = 1)  | 0.0037 | 0.0342 | 0.1081 | 0.0037 | 0.3484 | 0.9406
PP3 (d = 3)  | 0.0038 | 0.0337 | 0.1126 | 0.0038 | 0.3570 | 0.8218
PP3 (d = 5)  | 0.0039 | 0.0336 | 0.1159 | 0.0039 | 0.3664 | 0.7031
NP           | 0.0012 | 0.0259 | 0.0470 | 0.0012 | 0.0108 | 1.0000
ML           | 0.0037 | 0.0346 | 0.1055 | 0.0036 | 0.3443 | 1.0000
BS           | 0.0037 | 0.0346 | 0.1057 | 0.0037 | 0.3444 | 1.0000
CS           | 0.0035 | 0.0309 | 0.1130 | 0.0035 | 0.2479 | 1.0000
We now introduce the following notation: let

ζ̂ = \sum_{t=1}^{n} X_t/n,   Σ̃ = \sum_{t=1}^{n} (X_t − ζ)(X_t − ζ)^⊤/n,   e_i = (0, ..., 0, 1, 0, ..., 0)^⊤ (1 in the ith position),

E_{i_1 i_2} = \frac{∂Σ}{∂σ_{i_1 i_2}} = \begin{cases} e_{i_1} e_{i_2}^⊤ + e_{i_2} e_{i_1}^⊤ & \text{if } i_1 ≠ i_2, \\ e_{i_1} e_{i_2}^⊤ & \text{if } i_1 = i_2, \end{cases}

and

Σ^{(i)} = Σ^{−1} E_{i_1 i_2} Σ^{−1},
Σ^{(i,j)} = Σ^{−1} E_{i_1 i_2} Σ^{−1} E_{j_1 j_2} Σ^{−1} + Σ^{−1} E_{j_1 j_2} Σ^{−1} E_{i_1 i_2} Σ^{−1},
Σ^{(i,j,k)} = Σ^{−1} E_{i_1 i_2} Σ^{−1} E_{j_1 j_2} Σ^{−1} E_{k_1 k_2} Σ^{−1} + Σ^{−1} E_{j_1 j_2} Σ^{−1} E_{k_1 k_2} Σ^{−1} E_{i_1 i_2} Σ^{−1} + Σ^{−1} E_{k_1 k_2} Σ^{−1} E_{i_1 i_2} Σ^{−1} E_{j_1 j_2} Σ^{−1} + Σ^{−1} E_{k_1 k_2} Σ^{−1} E_{j_1 j_2} Σ^{−1} E_{i_1 i_2} Σ^{−1} + Σ^{−1} E_{j_1 j_2} Σ^{−1} E_{i_1 i_2} Σ^{−1} E_{k_1 k_2} Σ^{−1} + Σ^{−1} E_{i_1 i_2} Σ^{−1} E_{k_1 k_2} Σ^{−1} E_{j_1 j_2} Σ^{−1}.

Then, for a fixed i ∈ K_2, ∂_i ℓ_n(θ) (where the log-likelihood ℓ_n(θ) is defined in (16)) can be written as

∂_i ℓ_n(θ) = \frac{∂ℓ_n(θ)}{∂σ_{i_1 i_2}} = −\frac{n}{2} tr(Σ^{−1} E_{i_1 i_2}) + \frac{1}{2} \sum_{t=1}^{n} (X_t − ζ)^⊤ Σ^{−1} E_{i_1 i_2} Σ^{−1} (X_t − ζ)
= −\frac{n}{2} tr(Σ^{−1} E_{i_1 i_2}) + \frac{1}{2} tr\Big\{ \sum_{t=1}^{n} (X_t − ζ)(X_t − ζ)^⊤ Σ^{−1} E_{i_1 i_2} Σ^{−1} \Big\}
= \frac{n}{2} tr(−Σ^{−1} E_{i_1 i_2} + Σ^{−1} E_{i_1 i_2} Σ^{−1} Σ̃) = \frac{n}{2} tr{Σ^{−1} E_{i_1 i_2} Σ^{−1} (Σ̃ − Σ)} = \frac{n}{2} tr{Σ^{(i)} (Σ̃ − Σ)}.

In the same manner, the first to third derivatives of ℓ_n(θ) are as follows:

∂_i ℓ_n(θ) = \begin{cases} n e_i^⊤ Σ^{−1} (ζ̂ − ζ) & \text{if } i ∈ K_1, \\ (n/2) tr{Σ^{(i)} (Σ̃ − Σ)} & \text{if } i ∈ K_2, \end{cases}    (22)

∂_i ∂_j ℓ_n(θ) = \begin{cases} −n e_i^⊤ Σ^{−1} e_j & \text{if } (i, j) ∈ K_1 × K_1, \\ −n e_i^⊤ Σ^{(j)} (ζ̂ − ζ) & \text{if } (i, j) ∈ K_1 × K_2, \\ −(n/2) tr{Σ^{(i,j)} (Σ̃ − Σ) + Σ^{(i)} E_{j_1 j_2}} & \text{if } (i, j) ∈ K_2 × K_2, \end{cases}    (23)

∂_i ∂_j ∂_k ℓ_n(θ) = \begin{cases} 0 & \text{if } (i, j, k) ∈ K_1 × K_1 × K_1, \\ n e_i^⊤ Σ^{(k)} e_j & \text{if } (i, j, k) ∈ K_1 × K_1 × K_2, \\ n e_i^⊤ Σ^{(j,k)} (ζ̂ − ζ) & \text{if } (i, j, k) ∈ K_1 × K_2 × K_2, \\ (n/2) tr{Σ^{(i,j,k)} (Σ̃ − Σ) + Σ^{(j,k)} E_{i_1 i_2} + Σ^{(i,j)} E_{k_1 k_2}} & \text{if } (i, j, k) ∈ K_2 × K_2 × K_2. \end{cases}    (24)

Then, from (22)–(24), we have the following:

Y_i = \begin{cases} √n \, e_i^⊤ Σ^{−1} (ζ̂ − ζ) & \text{if } i ∈ K_1, \\ (√n/2) tr{Σ^{(i)} (Σ̃ − Σ)} & \text{if } i ∈ K_2, \end{cases}

Y_{ij} = \begin{cases} 0 & \text{if } (i, j) ∈ K_1 × K_1, \\ −√n \, e_i^⊤ Σ^{(j)} (ζ̂ − ζ) & \text{if } (i, j) ∈ K_1 × K_2, \\ −(√n/2) tr{Σ^{(i,j)} (Σ̃ − Σ)} & \text{if } (i, j) ∈ K_2 × K_2, \end{cases}

Y_{ijk} = \begin{cases} 0 & \text{if } (i, j, k) ∈ K_1 × K_1 × K_1, \\ 0 & \text{if } (i, j, k) ∈ K_1 × K_1 × K_2, \\ √n \, e_i^⊤ Σ^{(j,k)} (ζ̂ − ζ) & \text{if } (i, j, k) ∈ K_1 × K_2 × K_2, \\ (√n/2) tr{Σ^{(i,j,k)} (Σ̃ − Σ)} & \text{if } (i, j, k) ∈ K_2 × K_2 × K_2. \end{cases}

Next, we compute the expectations E(Y_i Y_j), E(Y_i Y_{jk}), and E(Y_i Y_j Y_k). For (i, j) ∈ K_2 × K_2,
E(Y_i Y_j) = n E[tr{(Σ̃ − Σ) Σ^{(i)}} tr{(Σ̃ − Σ) Σ^{(j)}}]/4 = n \, vec(Σ^{(i)})^⊤ E{vec(Σ̃ − Σ) vec(Σ̃ − Σ)^⊤} vec(Σ^{(j)})/4.

Here, let C_{a_1 a_2 b_1 b_2} denote E{(Σ̃ − Σ)_{a_1 a_2} (Σ̃ − Σ)_{b_2 b_1}}. Then, it follows that C_{a_1 a_2 b_1 b_2} = (σ_{a_1 b_2} σ_{a_2 b_1} + σ_{a_1 b_1} σ_{a_2 b_2})/n. Let C denote the q^2 × q^2 matrix with C_{q(a_1−1)+a_2, q(b_1−1)+b_2} = n C_{a_1 a_2 b_1 b_2} = σ_{a_1 b_1} σ_{a_2 b_2} + σ_{a_1 b_2} σ_{a_2 b_1}. If i_1(i) ≠ i_2(i) and j_1(j) ≠ j_2(j), we have

E(Y_i Y_j) = vec(Σ^{(i)})^⊤ C vec(Σ^{(j)})/4
= \frac{1}{4} \sum_{a_1,a_2,b_1,b_2=1}^{q} (σ^{a_1 i_1} σ^{i_2 a_2} + σ^{a_1 i_2} σ^{i_1 a_2})(σ_{a_1 b_1} σ_{a_2 b_2} + σ_{a_1 b_2} σ_{a_2 b_1})(σ^{b_1 j_1} σ^{j_2 b_2} + σ^{b_1 j_2} σ^{j_1 b_2})
= \frac{1}{4} \sum_{b_1,b_2=1}^{q} {δ(i_1, b_1) δ(i_2, b_2) + δ(i_1, b_2) δ(i_2, b_1) + δ(i_2, b_1) δ(i_1, b_2) + δ(i_2, b_2) δ(i_1, b_1)} (σ^{b_1 j_1} σ^{j_2 b_2} + σ^{b_1 j_2} σ^{j_1 b_2})
= \frac{1}{4} (σ^{i_1 j_1} σ^{j_2 i_2} + σ^{i_2 j_1} σ^{j_2 i_1} + σ^{i_2 j_1} σ^{j_2 i_1} + σ^{i_1 j_1} σ^{j_2 i_2} + σ^{i_1 j_2} σ^{j_1 i_2} + σ^{i_2 j_2} σ^{j_1 i_1} + σ^{i_2 j_2} σ^{j_1 i_1} + σ^{i_1 j_2} σ^{j_1 i_2})
= σ^{j_1 i_1} σ^{j_2 i_2} + σ^{j_2 i_1} σ^{j_1 i_2},

where σ^{ij} is the (i, j)th element of Σ^{−1}. In the same manner, we have the following:

E(Y_i Y_j) = \begin{cases} σ^{ij} & \text{if } (i, j) ∈ K_1 × K_1, \\ 0 & \text{if } (i, j) ∈ K_1 × K_2, \\ (Σ^{(i)})_j/2 & \text{if } (i, j) ∈ K_2 × K_2, \end{cases}

E(Y_i Y_{jk}) = \begin{cases} 0 & \text{if } (i, j, k) ∈ K_1 × K_1 × K_1, \\ −(Σ^{(k)})_{ij} & \text{if } (i, j, k) ∈ K_1 × K_1 × K_2, \\ 0 & \text{if } (i, j, k) ∈ K_1 × K_2 × K_2, \\ 0 & \text{if } (i, j, k) ∈ K_2 × K_1 × K_1, \\ 0 & \text{if } (i, j, k) ∈ K_2 × K_1 × K_2, \\ −(Σ^{(i,j)})_k/2 & \text{if } (i, j, k) ∈ K_2 × K_2 × K_2, \end{cases}

E(Y_i Y_j Y_k) = \begin{cases} 0 & \text{if } (i, j, k) ∈ K_1 × K_1 × K_1, \\ (Σ^{(k)})_{ij}/√n & \text{if } (i, j, k) ∈ K_1 × K_1 × K_2, \\ 0 & \text{if } (i, j, k) ∈ K_1 × K_2 × K_2, \\ (Σ^{(i,j)})_k/(2√n) & \text{if } (i, j, k) ∈ K_2 × K_2 × K_2, \end{cases}
where (H)_i = (H)_{i_1 i_2} + (H)_{i_2 i_1} if i_1(i) ≠ i_2(i) and (H)_i = (H)_{i_1 i_2} if i_1(i) = i_2(i), for a q × q matrix H and i ∈ K_2. Write the above as

E(Y_i Y_j) = g*_{ij} + δ_{ij}/n + o(1/n),   E(Y_i Y_{jk}) = J*_{ijk} + o(1/n),   E(Y_i Y_j Y_k) = K*_{ijk}/√n + o(1/n).

We obtain the stochastic expansion of θ̂^i in the same way as discussed in Chapter 4 of [21]. A Taylor expansion of the likelihood equation ∂_i ℓ_n(θ̂_ML) = 0 gives

0 = ∂_i ℓ_n(θ̂_ML) = ∂_i ℓ_n(θ) + (θ̂^j − θ^j) ∂_i ∂_j ℓ_n(θ) + (θ̂^j − θ^j)(θ̂^k − θ^k) ∂_i ∂_j ∂_k ℓ_n(θ*)/2,    (25)

where ∥θ* − θ∥ ≤ ∥θ̂_ML − θ∥. Writing U^i = √n (θ̂^i − θ^i), (25)/√n becomes

0 = Y_i + U^j Y_{ij}/√n + U^j E{∂_i ∂_j ℓ_n(θ)}/n + U^j U^k Y_{ijk}/(2n) + U^j U^k E{∂_i ∂_j ∂_k ℓ_n(θ)}/(2n^{3/2}) + o_p(1/n).

Since E{∂_i ∂_j ℓ_n(θ)}/n = −g*_{ij} and E{∂_i ∂_j ∂_k ℓ_n(θ)}/n = −K*_{ijk} − J*_{ijk} − J*_{jki} − J*_{kij} = R*_{ijk}, it follows that

g*_{ij} {√n (θ̂^j − θ^j)} = Y_i + U^j Y_{ij}/√n + U^j U^k Y_{ijk}/(2n) + U^j U^k R*_{ijk}/(2√n) + o_p(1/n).

Moreover, if the inverse matrix of (g*_{ij}) exists, we can write

θ̂^i − θ^i = (g*)^{ij} Y_j/√n + (g*)^{ij} U^k Y_{jk}/n + (g*)^{ij} U^k U^m R*_{jkm}/(2n) + o_p(1/n)
= (g*)^{ij} Y_j/√n + (g*)^{ij} (g*)^{km} Y_{jk} Y_m/n + R*_{jkm} (g*)^{ij} (g*)^{kk′} (g*)^{mm′} Y_{k′} Y_{m′}/(2n) + o_p(1/n),

where (g*)^{ij} is the (i, j)th element of the inverse matrix of (g*_{ij}) and U^i = (g*)^{ij} Y_j + o_p(1).

Proof of Proposition 2. To prove Statement (i), write

µ_ML^a = E(Q^a) = E{ B̄_i^a (g*)^{ij} (g*)^{km} Y_m (Y_{jk} − J*_{ℓjk} (g*)^{ℓℓ′} Y_{ℓ′}) − (g*)^{aa′} B̄_i^b B̄_j^{b′} C_{a′bb′} (g*)^{ii′} (g*)^{jj′} Y_{i′} Y_{j′}/2 }
= B̄_i^a (g*)^{ij} (g*)^{km} { J*_{mjk} − J*_{ℓjk} (g*)^{ℓℓ′} g*_{mℓ′} } − (g*)^{aa′} B̄_i^b B̄_j^{b′} C_{a′bb′} (g*)^{ii′} (g*)^{jj′} g*_{i′j′}/2
= B̄_i^a (g*)^{ij} (g*)^{km} (J*_{mjk} − J*_{mjk}) − (g*)^{aa′} B̄_i^b B̄_j^{b′} C_{a′bb′} (g*)^{ij}/2
= −(g*)^{aa′} B̄_i^b B̄_j^{b′} C_{a′bb′} (g*)^{ij}/2.

Statement (ii) follows from Statement (i).

Proof of Theorem 3. (i) From (19), it is easy to obtain (20). (ii) We find

W(u) ≥ \frac{2d}{∥u∥^2} \left( \sum_{i=1}^{q} ν_i − 2ν_q − \frac{C(u)}{2} − \frac{d}{2} \right) = \frac{2d}{∥u∥^2} \left( \sum_{i ≠ q} ν_i − ν_q − \frac{C(u)}{2} − \frac{d}{2} \right) > 0.

This completes the argument.
Acknowledgments

The authors would like to thank the Editor and anonymous referees for their constructive comments. The first and second authors were supported by JSPS Grant-in-Aid 16K00036 (H. Shiraishi, Keio University) and JSPS Grant-in-Aid 15H02061 (M. Taniguchi, Waseda University).

References

[1] S. Amari, Differential Geometrical Methods in Statistics, Springer, New York, 1985.
[2] S.F. Arnold, The Theory of Linear Models and Multivariate Analysis, Wiley, New York, 1981.
[3] V. DeMiguel, L. Garlappi, R. Uppal, Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy?, Rev. Financial Studies 22 (2007) 1915–1953.
[4] P. Frost, J. Savarino, An empirical Bayes approach to efficient portfolio selection, J. Financial Quantitative Anal. 21 (1986) 293–305.
[5] U. Grenander, M. Rosenblatt, Statistical Analysis of Stationary Time Series, Wiley, New York, 1957.
[6] E.J. Hannan, Multiple Time Series, Wiley, New York, 1970.
[7] Y. Hosoya, M. Taniguchi, A central limit theorem for stationary processes and the parameter estimation of linear processes, Ann. Statist. 10 (1982) 132–153 [Correction: 21 (1993) 1115–1117].
[8] W. James, C. Stein, Estimation with quadratic loss, in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, CA, 1 (1961) 361–379.
[9] J. Jobson, B. Korkie, Putting Markowitz theory to work, J. Portfolio Manag. 7 (1981) 70–74.
[10] J. Jobson, B. Korkie, V. Ratti, Improved estimation for Markowitz portfolios using James–Stein type estimation, Proc. Amer. Statist. Assoc., Business and Economics Statistics Section 41 (1979) 279–284.
[11] P. Jorion, Bayes–Stein estimation for portfolio analysis, J. Financial Quantitative Anal. 21 (1986) 279–292.
[12] P. Lahiri, J.N.K. Rao, Robust estimation of mean squared error of small area estimators, J. Amer. Statist. Assoc. 90 (1995) 758–766.
[13] O. Ledoit, M. Wolf, Improved estimation of the covariance matrix of stock returns with an application to portfolio selection, J. Empirical Finance 10 (2003) 603–621.
[14] O. Ledoit, M. Wolf, Honey, I shrunk the sample covariance matrix, J. Portfolio Manag. 30 (2004) 110–119.
[15] H. Markowitz, Portfolio selection, J. Finance 7 (1952) 77–91.
[16] M. Senda, M. Taniguchi, James–Stein estimators for time series regression models, J. Multivariate Anal. 97 (2006) 1984–1996.
[17] H. Shiraishi, M. Taniguchi, Statistical estimation of optimal portfolios for non-Gaussian dependent returns of assets, J. Forecasting 27 (2008) 193–215.
[18] C. Stein, Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, in: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1 (1956) 197–206.
[19] M. Taniguchi, Higher Order Asymptotic Theory for Time Series Analysis, Springer, Heidelberg, 1991.
[20] M. Taniguchi, J. Hirukawa, The Stein–James estimator for short- and long-memory Gaussian processes, Biometrika 92 (2005) 737–746.
[21] M. Taniguchi, Y. Kakizawa, Asymptotic Theory of Statistical Inference for Time Series, Springer, New York, 2000.
[22] M. Taniguchi, H. Shiraishi, H. Ogata, Improved estimation for the autocovariances of a Gaussian stationary process, Statistics 41 (2007) 269–277.
[23] M. Taniguchi, Y. Watanabe, Statistical analysis of curved probability densities, J. Multivariate Anal. 48 (1994) 228–248.
[24] C. Velasco, P.M. Robinson, Edgeworth expansions for spectral density estimates and studentized sample mean, Econom. Theory 17 (2001) 497–539.