Journal of Econometrics 167 (2012) 458–472
Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom
Semiparametric inference in a GARCH-in-mean model
Bent Jesper Christensen (a, b), Christian M. Dahl (c, b, ∗), Emma M. Iglesias (d)
(a) Department of Economics and Business, Aarhus University, Building 1322, DK-8000 Aarhus C, Denmark
(b) CREATES, Aarhus University, Building 1322, DK-8000 Aarhus C, Denmark
(c) Department of Business and Economics, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark
(d) Department of Applied Economics II, Faculty of Economics and Business, University of A Coruña, Campus de Elviña, A Coruña, 15071, Spain
Article info
Article history: Available online 15 October 2011
JEL classification: C13, C14, C22, G12
Keywords: Efficiency bound; GARCH-M model; Profile likelihood; Risk-return relation; Semiparametric inference
Abstract
A new semiparametric estimator for an empirical asset pricing model with a general nonparametric risk-return tradeoff and GARCH-type underlying volatility is introduced. Based on the profile likelihood approach, it does not rely on any initial parametric estimator of the conditional mean function, and, under stated conditions, it is consistent, asymptotically normal, and efficient, i.e., it achieves the semiparametric lower bound. A sampling experiment provides finite sample comparisons with the parametric approach and with the iterative semiparametric approach with parametric initial estimate of Conrad and Mammen (2008). An application to daily stock market returns suggests that the risk-return relation is indeed nonlinear. © 2011 Elsevier B.V. All rights reserved.
∗ Corresponding author at: Department of Business and Economics, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark. E-mail addresses: [email protected] (B.J. Christensen), [email protected] (C.M. Dahl), [email protected] (E.M. Iglesias).
0304-4076/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2011.09.028

1. Introduction

The relation between risk and return is of central importance in asset pricing, hedging, derivative pricing, and risk management. The intertemporal capital asset pricing model (ICAPM) of Merton (1973) predicts a positive and linear relation between the expectation and the variance of returns. Essentially, investors must be compensated for bearing additional risk. Perhaps surprisingly, both the significance and even the sign of the linear relation between expected return and variance of return have proved elusive in empirical work. In the present paper, we explore the possibility that the mixed empirical evidence may be due to misspecification of the functional form of the risk-return relation. We allow for a general nonparametric risk-return tradeoff, and model the conditional variances as a GARCH-type process. Besides the possibly nonlinear GARCH-in-mean effect, our specification accommodates exogenous regressors that are typically used as conditioning variables entering linearly in the mean equation, such as the dividend yield. We introduce a new semiparametric estimation procedure
for the resulting model that does not rely on an initial parametric (linear) estimate of the risk-return relation, which would necessarily be inconsistent if the true relation is indeed nonlinear; this feature is the key to establishing the asymptotic properties of our estimator. Using the profile likelihood approach, we prove that our semiparametric estimator is consistent, asymptotically normal, and achieves the semiparametric efficiency bound.

The literature on the risk-return tradeoff is massive. Motivated by the ICAPM, the original ARCH-M model proposed by Engle et al. (1987) introduces conditional variance into the conditional mean return equation in a linear fashion. Empirical studies of the risk-return tradeoff applying GARCH-type models to stock returns have obtained mixed results on both the sign and the significance of the in-mean effect; see, e.g., Bollerslev et al. (1988), Chou (1988), Nelson (1991), Campbell and Hentschel (1992), Chou et al. (1992), Backus and Gregory (1993), Glosten et al. (1993), and Harrison and Zhang (1999). Poterba and Summers (1986) show that the stock market level is determined by the risk-return tradeoff in conjunction with the degree of serial correlation in volatility. Indeed, recent work in asset pricing focusing on volatility innovations examines cross-sectional risk premia induced by covariance between volatility changes and stock returns and finds negative premia, e.g., Ang et al. (2006). The idea is that since innovations in volatility are higher during recessions, stocks that co-vary with volatility pay off in bad states, and so should require smaller risk premia. Christensen et al. (2010) consider aggregate time series data on returns and
innovations in GARCH-volatilities and confirm the negative premia from the cross-sectional studies. On the other hand, a positive risk-return tradeoff has been indicated by Brandt and Kang (2004) using a latent VAR methodology, by Ghysels et al. (2005) using weighted rolling sample windows in the variance measurements, and by Christensen and Nielsen (2007), who consider innovations in realized as well as option-implied volatility. For a survey of related studies, see Lettau and Ludvigson (2010).

One possible source of the mixed results is misspecification of the way in which conditional variance enters the conditional mean return equation. Indeed, in his early empirical study, Merton (1980) regressed returns not only on sample variances, but also on realized sample standard deviations (volatilities) of returns over subintervals, to determine which relation was more stable. The coefficient in the first regression would be interpreted as the Arrow–Pratt rate of relative risk aversion of the representative investor in the ICAPM, and that in the alternative (square-root) version of the regression as the slope of the capital market line, or Sharpe ratio, in the static CAPM of Sharpe (1964) and Lintner (1965). Considerations and debates of these kinds have led to interest in specifying more flexible models of the risk-return relation. Flexibility, however, comes at the cost of more complicated statistical properties. Linton and Perron (2003) use a mean equation given by
$$y_t = \mu(\sigma_t^2) + \varepsilon_t\sigma_t, \qquad (1)$$
where $y_t$ is the daily return, $\sigma_t^2$ is the conditional return variance given information available up to time $t-1$, $\varepsilon_t \sim \text{i.i.d.}(0,1)$, and $\mu(\cdot)$ is a smooth mean function determining the functional form of the risk-return relation. The specification is estimated semiparametrically, using an EGARCH process for the conditional variance, but no asymptotic theory is provided. As the authors state, writing the smooth mean function given the other parameters φ as $\mu_\phi(\cdot)$, "…unfortunately, in our model, we cannot define the corresponding profile quantity $\hat\mu_\phi(\sigma_t^2)$ so easily, since $\sigma_t^2$ depends, in addition to the parameters, on lagged ε's, which in turn depend on lagged µ's. Therefore, we need to know the entire function µ(·) (or at least its values at the T sample points) to construct $\hat\mu_\phi(\sigma_t^2)$." Conrad and Mammen (2008) propose an algorithm and a specification test for GARCH-M effects of the type (1), using QML to get starting values, but they actually require a consistent estimator for the starting values (e.g., in their Assumption 5), and the QML estimator they use is necessarily inconsistent if µ(·) is indeed nonlinear. Further, their main tool is empirical process theory, which involves high-level assumptions such as $E[\exp(\rho|\varepsilon_t|)] < \infty$ for $\rho > 0$. Hodgson and Vorkink (2003) estimate the density function of a multivariate GARCH-M model in a semiparametric fashion, but also do not provide a formal asymptotic theory. Sun and Stengos (2006) propose yet another type of semiparametric GARCH-M model, but there the nonparametric portion is the density of the innovations, whereas the conditional means and standard deviations are proportional (we reject this case empirically, using a general µ(·)). In this paper, we consider an alternative approach based on (1) and establish the asymptotic theory necessary for inference.
The two main differences between our approach and that of Conrad and Mammen (2008) are that (i) we do not use QML or any other inconsistent estimator as starting value, and (ii) instead of empirical process theory we use a profile likelihood approach. In Section 3.4 we also extend our model to allow for exogenous covariates in the conditional mean equation. We could have added covariates in the conditional variance equation, following Campbell (1993), who found a negative risk-return relation in a parametric model in this case. However, Han and Park (2008) and Iglesias (2009) show that allowing for covariates in the conditional variance equation leads to difficulties, and we leave the extension for future research. The model of the present
paper is an extension of the double autoregressive model of Ling (2004) to include a general risk-return relation, and it is amenable to asymptotic analysis based on the profile likelihood methodology, along the lines of Severini and Wong (1992). Ling (2004) provides empirical support favoring the double autoregressive model over the traditional autoregressive-ARCH model in a number of financial return series. Dahl and Iglesias (2011) provide evidence of further cases where volatility is driven by functionals of data, as in the double autoregressive case. In the present paper, we establish the empirical relevance of introducing a risk premium in the double autoregressive model. Our estimation procedure is easy to apply and readily allows calculation of consistent standard errors. In contrast to alternatives such as adaptive estimation, the profile likelihood approach does not require the matrix of expected second order derivatives with respect to the parameters of interest and the nuisance parameters to be block diagonal. Indeed, block diagonality is violated in our model, as we show. The profile approach is based on the principle that a semiparametric problem is at least "as hard" as any parametric subproblem. Therefore, the Fisher information for estimating the parameter of interest in a semiparametric problem is not greater than the Fisher information for estimating that parameter in any parametric subproblem. Hence, we may look at the "least favorable subproblem" and obtain a lower bound on the asymptotic variance of the parameter of interest in the original semiparametric problem. In our case, the parameters of the underlying GARCH process play the role of the parameters of interest, and in this way their estimators become robust to possible nonlinearity of unspecified form in the conditional mean function.
Our asymptotic theory utilizes the classic Cramér type conditions for consistency and asymptotic normality, i.e., a central limit theorem for the score, convergence of the Hessian, and uniformly bounded third order derivatives (see, e.g., Lehmann (1999) and Jensen and Rahbek (2004a,b)). This third order approach works through local identification, rather than assuming identification at the outset, and hence we do not require an initial consistent estimate of the conditional mean function. We provide sufficient conditions characterizing which µ-functions are permitted under this approach. Building up the analysis in steps, we first establish the asymptotic theory for the case of a known "curve" defining the relevant subproblem; e.g., $\mu(\sigma_t^2)$ may be given by $\lambda(\phi)\sigma_t^2$ or $\lambda(\phi)\sigma_t$, for a known function $\lambda(\phi)$ giving the relative risk aversion or the Sharpe ratio, respectively, corresponding to given values of the GARCH parameters φ. This part of the paper provides the first asymptotic theory for parametric GARCH-M models. Based on this, we then go on to the general semiparametric case of unknown curves and provide the required consistent estimator of a least favorable curve (or subproblem).

The paper is laid out as follows. Section 2 describes our general strategy, based on the profile likelihood approach and the estimation of a least favorable curve. The presentation is heuristic, intended to provide intuition, and leaves technical details to later sections. Section 3 presents our new model and semiparametric estimator. We state conditions under which our estimator is consistent, asymptotically normal, and attains the semiparametric lower bound, without relying on an initial consistent estimate. Section 4 describes our semiparametric estimation algorithm in detail. In a sampling experiment we explore finite sample accuracy, comparing with the parametric approach and with the iterative semiparametric approach with parametric initial estimate proposed by Conrad and Mammen (2008).
Finally, an empirical application to daily stock market returns is offered. Section 5 concludes. Appendix A collects the proofs of lemmas and theorems. Appendix B indicates where to obtain supplementary material related to this article.
2. The profile approach

It is instructive to compare with the profile likelihood approach applied to the simpler parametric case, with log-likelihood function for T observations given by $L_T(\phi,\mu) = L(\phi,\mu)$, say. Suppose this may be profiled (concentrated, or partially maximized) with respect to µ to yield the profile log-likelihood $\hat L(\phi) = L(\phi,\hat\mu_\phi)$, where $\hat\mu_\phi$ is the estimator of µ for given φ. Then $\hat L$ behaves like a log-likelihood function itself; e.g., the $\hat\phi$ maximizing $\hat L$ is the MLE, and the negative Hessian per observation $\hat H/T = -(\partial^2\hat L/\partial\phi\,\partial\phi')/T$ consistently estimates the Fisher information on φ, say $i_\phi$, so $\hat H^{-1}$ provides consistent standard errors and $(\hat H/T)^{-1}$ reaches the Cramér–Rao lower bound. Standard errors for $\hat\mu = \hat\mu_{\hat\phi}$, on the other hand, require calculating (and inverting) the full negative Hessian H of L with respect to the vector (φ, µ). What is going on is that, since the Cramér–Rao lower bound for (φ, µ) is the inverse of the usual Fisher information and is estimated by $(H/T)^{-1}$, as is well known, the lower bound for the subparameter φ must be the upper left block of this. Writing the full information matrix as
$$i = \begin{pmatrix} i_{\phi\phi} & i_{\phi\mu} \\ i_{\mu\phi} & i_{\mu\mu} \end{pmatrix}, \qquad (2)$$
the lower bound for φ is given by $\left(i_{\phi\phi} - i_{\phi\mu}\,i_{\mu\mu}^{-1}\,i_{\mu\phi}\right)^{-1}$, by the rules for partitioned inverses. Equivalently, the marginal information on φ is
$$i_\phi = i_{\phi\phi} - i_{\phi\mu}\,i_{\mu\mu}^{-1}\,i_{\mu\phi}. \qquad (3)$$
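A quick numerical check of (3) can be sketched in Python with NumPy (our choice; the paper itself contains no code). The information matrix below is a hypothetical positive definite example with two φ-components and one µ-component, not an estimate from the paper. The sketch confirms that inverting the marginal information $i_\phi$ reproduces the upper left block of $i^{-1}$, and that the information lost by not knowing µ, $i_{\phi\mu}i_{\mu\mu}^{-1}i_{\mu\phi}$, is positive semidefinite.

```python
import numpy as np

def marginal_information(i_full: np.ndarray, k: int) -> np.ndarray:
    """Marginal (profile) information on the first k parameters, eq. (3):
    i_phi = i_phiphi - i_phimu i_mumu^{-1} i_muphi."""
    i_pp = i_full[:k, :k]
    i_pm = i_full[:k, k:]
    i_mp = i_full[k:, :k]
    i_mm = i_full[k:, k:]
    # solve() avoids forming i_mumu^{-1} explicitly
    return i_pp - i_pm @ np.linalg.solve(i_mm, i_mp)

# Hypothetical information matrix: rows/cols 0-1 are phi, row/col 2 is mu.
i_full = np.array([[4.0, 1.0, 1.5],
                   [1.0, 3.0, 0.5],
                   [1.5, 0.5, 2.0]])
k = 2
i_phi = marginal_information(i_full, k)

# Cramér-Rao bound for phi: upper-left k x k block of the full inverse.
upper_left_of_inverse = np.linalg.inv(i_full)[:k, :k]
print(np.allclose(np.linalg.inv(i_phi), upper_left_of_inverse))  # True

# Information lost by profiling out mu (should be positive semidefinite).
loss = i_full[:k, :k] - i_phi
```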
The marginal information (3) is readily reinterpreted as the profile information. Thus, $\hat\mu_\phi$ is determined by the first order condition for maximizing $L(\phi,\mu)$ with respect to µ for given φ, i.e., $L_\mu(\phi,\hat\mu_\phi) = 0$, using subscripts on L to indicate differentiation. By the implicit function theorem, $L_{\mu\phi} + L_{\mu\mu}\,\partial\hat\mu_\phi/\partial\phi = 0$, or, on taking expectation, $\partial\hat\mu_\phi/\partial\phi = -i_{\mu\mu}^{-1}\,i_{\mu\phi}$. This shows the sensitivity of the parameter being profiled out with respect to the parameter of interest. Inserting in (3) above allows recasting the information on φ as the profile information, defined as
$$i_\phi = E_{\phi,\mu}\left[\left(L_\phi + L_\mu\frac{\partial\hat\mu_\phi}{\partial\phi}\right)\left(L_\phi + L_\mu\frac{\partial\hat\mu_\phi}{\partial\phi}\right)'\right]. \qquad (4)$$
This is verified by completing the square. For example, in the case where both subparameters are scalar, it reads
$$E\left[L_\phi^2 + 2L_\phi L_\mu\frac{\partial\hat\mu_\phi}{\partial\phi} + L_\mu^2\left(\frac{\partial\hat\mu_\phi}{\partial\phi}\right)^2\right] = i_{\phi\phi} - 2\,i_{\phi\mu}^2\,i_{\mu\mu}^{-1} + i_{\mu\mu}\,i_{\phi\mu}^2\,i_{\mu\mu}^{-2} = i_{\phi\phi} - i_{\phi\mu}^2\,i_{\mu\mu}^{-1}.$$

The semiparametric case with nonparametric µ is similar. Here, it is important to distinguish between the nonparametric function $\sigma \mapsto \mu(\sigma)$, a possibly nonlinear risk premium specification, and a curve $\phi \mapsto \mu_\phi(\cdot)$, i.e., a family of such functions, describing a curve in µ-space. The difference from the finite-dimensional case is that we can no longer concentrate out µ exactly. For any curve in µ-space that passes through the true parameter point, i.e., the curve $\phi \mapsto \mu_\phi$ is correctly specified in that $\mu_{\phi_0}$ is the true risk premium function $\mu_0$ when $\phi_0$ is the true value of φ, we have the profile information (4) (without the circumflex) on φ along the curve, namely, the expectation of the square of the derivative of $L(\phi,\mu_\phi)$, where $L_\mu$ is a Fréchet derivative. The question is which curve to consider. The information on φ cannot be more than that along the least favorable curve, yielding the least information across all curves. We now use that (3) may be rewritten as $E\left[(L_\phi - L_\mu\beta_{\phi|\mu})^2\right]$, where $\beta_{\phi|\mu} = i_{\mu\mu}^{-1}\,i_{\mu\phi}$ denotes the regression coefficient of $L_\phi$ on $L_\mu$, again seen by completing the square and taking expectation. This is the expected residual sum of squares after regression of $L_\phi$ on $L_\mu$. Comparing with (4), the curve $\phi \mapsto \mu_\phi$ should be chosen such that the tangent vector $\partial\mu_\phi/\partial\phi$ is minus the regression coefficient of $L_\phi$ on $L_\mu$, thus leaving the minimum information on φ across all possible curves. Equivalently, the residual should be orthogonal to $L_\mu$.

Although the curve itself remains unknown, the profile approach delivers an asymptotic equivalence result. Thus, following Severini and Wong (1992), when using a consistent estimator $\phi \mapsto \hat\mu_\phi(\cdot)$ of a least favorable curve $\phi \mapsto \mu_\phi(\cdot)$, maximizing the resulting generalized profile log-likelihood $\hat L(\phi) = L(\phi,\hat\mu_\phi)$ is asymptotically equivalent to maximizing $L(\phi,\mu_\phi)$ itself. The resulting estimator $\hat\phi$ asymptotically reaches the semiparametric efficiency bound $i_\phi^{-1}$, and consistent standard errors are again calculated off the negative inverse profile Hessian. It only remains to determine a consistent estimator of a least favorable curve. The idea here is simple: ideally, we would have estimated $\mu_\phi(\cdot)$ by maximizing $L_T(\phi,\mu)$ with respect to µ, for each given φ, producing $\hat\mu_\phi$ and in turn the profile log-likelihood function $L_T(\phi,\hat\mu_\phi)$. Maximizing this with respect to φ would yield an asymptotically efficient estimator (indeed, the semiparametric MLE). This is exactly maximization along the curve $\phi \mapsto \hat\mu_\phi(\cdot)$, i.e., a particular parametric subproblem. Unfortunately, it is not in general feasible to maximize $L_T$ with respect to µ for each φ. Instead, we consider an asymptotic version obtained by estimating, for each φ and at each $\sigma^2$, the value of $\mu_\phi(\sigma^2)$ by maximizing $E_0[L_1(\phi,\mu)\mid\sigma^2]$ with respect to the scalar µ. Of course, the true expectation operator $E_0(\cdot)$ is unknown, but a feasible estimator $\hat\mu_\phi(\sigma^2)$ may be constructed by maximizing a kernel smoothed sample analogue of the conditional expectation with respect to µ, given φ and $\sigma^2$. This is the approach we pursue.

3. Asymptotic theory: the semiparametric lower bound

We extend the model of Ling (2004) to include a general risk-return relation. Writing $y_t$ for the daily returns, the model we specify is
$$y_t = \mu(\sigma_t^2) + \varepsilon_t\sigma_t, \qquad (5a)$$
$$\sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2, \qquad (5b)$$
where $E[(\varepsilon_t\sigma_t)^2 \mid I_{t-1}] = \sigma_t^2$ is the conditional variance of return, $I_{t-1}$ denotes the sigma field generated by the information available up to time $t-1$, and $\varepsilon_t \sim \text{i.i.d.}(0,1)$ (see Assumption A1 below). We write $\phi = (\omega,\gamma,\beta)' \in \Phi$ for the volatility parameters and $\phi_0 = (\omega_0,\gamma_0,\beta_0)'$ for the true values, and $\mu(\cdot)$ is a smooth conditional mean function with true value $\mu_0(\cdot)$. In contrast to Linton and Perron (2003) and Conrad and Mammen (2008), we follow Ling (2004) and use the squared lagged return $y_{t-1}^2$ in (5b), rather than the squared lagged innovation $\varepsilon_{t-1}^2$ as in the traditional (Bollerslev, 1986) GARCH(1,1) equation $\sigma_t^2 = \omega + \gamma\varepsilon_{t-1}^2\sigma_{t-1}^2 + \beta\sigma_{t-1}^2$. Consider a data set of the form $\{y_t\}_{t=1}^T$. The conditional (quasi) log-likelihood corresponding to (5a)–(5b) (constructed using normality; the notion of true parameter values makes sense in that conditional means and variances are correctly specified) is given by
$$L_T(\phi,\mu(\cdot)) = \ln l_T(\phi,\mu(\cdot)) = -\frac{1}{2}\sum_{t=1}^T \ln\sigma_t^2 - \frac{1}{2}\sum_{t=1}^T \frac{\left(y_t - \mu(\sigma_t^2)\right)^2}{\sigma_t^2}, \qquad (6)$$
with $l_T$ denoting the conditional likelihood, and $\sigma_t^2$ given by (5b). As φ varies, $L_T(\phi,\mu(\cdot))$ varies because, by (5b), $\sigma_t^2$ depends on $\phi = (\omega,\gamma,\beta)'$ for each t.
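To make (5a), (5b) and (6) concrete, here is a minimal simulation sketch. It assumes Gaussian innovations, start-up values $y_0 = 0$ and $\sigma_0^2 = \omega/(1-\gamma-\beta)$, and a hypothetical risk premium $\mu(\sigma^2) = 0.05\,\sigma$; none of these choices come from the paper. The function `garch_filter` (a hypothetical name) recomputes $\sigma_t^2$ from the data for a given φ, illustrating that the conditional variances are known once φ is fixed, and `quasi_loglik` evaluates (6) up to the additive constant $-\tfrac{T}{2}\ln 2\pi$.

```python
import numpy as np

def garch_filter(y, phi, y0=0.0, sigma2_0=None):
    """sigma_t^2 = omega + gamma*y_{t-1}^2 + beta*sigma_{t-1}^2, eq. (5b)."""
    omega, gamma, beta = phi
    if sigma2_0 is None:
        sigma2_0 = omega / (1.0 - gamma - beta)  # start-up value: an assumption
    sigma2 = np.empty(len(y))
    y_prev, s2_prev = y0, sigma2_0
    for t in range(len(y)):
        sigma2[t] = omega + gamma * y_prev**2 + beta * s2_prev
        y_prev, s2_prev = y[t], sigma2[t]
    return sigma2

def simulate(T, phi, mu, rng, y0=0.0, sigma2_0=None):
    """Draw y_1..y_T from (5a)-(5b) with Gaussian eps_t (assumed)."""
    omega, gamma, beta = phi
    if sigma2_0 is None:
        sigma2_0 = omega / (1.0 - gamma - beta)
    eps = rng.standard_normal(T)
    y, sigma2 = np.empty(T), np.empty(T)
    y_prev, s2_prev = y0, sigma2_0
    for t in range(T):
        sigma2[t] = omega + gamma * y_prev**2 + beta * s2_prev
        y[t] = mu(sigma2[t]) + eps[t] * np.sqrt(sigma2[t])
        y_prev, s2_prev = y[t], sigma2[t]
    return y, sigma2

def quasi_loglik(y, sigma2, mu):
    """Gaussian quasi-log-likelihood (6), omitting -T/2*ln(2*pi)."""
    resid = y - mu(sigma2)
    return -0.5 * np.sum(np.log(sigma2)) - 0.5 * np.sum(resid**2 / sigma2)

# Hypothetical Sharpe-ratio-type risk premium mu(sigma^2) = 0.05*sigma.
mu = lambda s2: 0.05 * np.sqrt(s2)
rng = np.random.default_rng(0)
phi0 = (0.05, 0.10, 0.85)
y, sigma2 = simulate(2000, phi0, mu, rng)
print(quasi_loglik(y, garch_filter(y, phi0), mu))
```

Because `garch_filter` and `simulate` use the same recursion and start-up values, filtering the simulated returns at the true φ reproduces the simulated variances exactly.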
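The semiparametric lower bound on the asymptotic variances of estimators of φ is $i_\phi^{-1}$. Here, the marginal Fisher or profile information on φ is given by (4). Specifically, we have
$$\frac{\partial L_T(\phi,\mu(\cdot))}{\partial\mu(\cdot)}\,\frac{\partial\mu_\phi}{\partial\phi} = \sum_{t=1}^T \frac{y_t - \mu(\sigma_t^2)}{\sigma_t^2}\cdot\frac{d\mu(\sigma_t^2)}{d\sigma_t^2}\cdot\frac{d\sigma_t^2}{d\phi}. \qquad (7)$$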
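Evaluating (7) requires $d\sigma_t^2/d\phi$, which under (5b) follows from differentiating the recursion: $\partial\sigma_t^2/\partial\phi = (1,\; y_{t-1}^2,\; \sigma_{t-1}^2)' + \beta\,\partial\sigma_{t-1}^2/\partial\phi$. The sketch below assumes fixed start-up values and a user-supplied curve `mu` with derivative `dmu` (hypothetical names); the data in the usage lines are placeholders, not returns from the paper.

```python
import numpy as np

def sigma2_and_grad(y, phi, y0=0.0, sigma2_0=1.0):
    """sigma_t^2 from (5b) and its gradient w.r.t. phi = (omega, gamma, beta).

    Differentiating (5b) gives
        d s2_t = (1, y_{t-1}^2, s2_{t-1}) + beta * d s2_{t-1},
    with start-up values y0 and sigma2_0 held fixed (an assumption).
    """
    omega, gamma, beta = phi
    T = len(y)
    sigma2 = np.empty(T)
    grad = np.empty((T, 3))
    y_prev, s2_prev, g_prev = y0, sigma2_0, np.zeros(3)
    for t in range(T):
        sigma2[t] = omega + gamma * y_prev**2 + beta * s2_prev
        grad[t] = np.array([1.0, y_prev**2, s2_prev]) + beta * g_prev
        y_prev, s2_prev, g_prev = y[t], sigma2[t], grad[t]
    return sigma2, grad

def eq7_term(y, phi, mu, dmu):
    """Right-hand side of (7) for a given curve mu with derivative dmu."""
    sigma2, grad = sigma2_and_grad(y, phi)
    weights = (y - mu(sigma2)) / sigma2 * dmu(sigma2)
    return weights @ grad  # one entry per component of phi

# Usage with placeholder data and a hypothetical linear curve mu = 0.1*sigma^2.
rng = np.random.default_rng(1)
y = 0.5 * rng.standard_normal(200)
phi = np.array([0.05, 0.10, 0.80])
sigma2, grad = sigma2_and_grad(y, phi)
score_part = eq7_term(y, phi, lambda s: 0.1 * s, lambda s: 0.1 * np.ones_like(s))
```

The analytic gradient can be checked against central finite differences of `sigma2_and_grad` in each coordinate of φ.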
For clarity, we provide our results in four stages. First, we consider the case of a known correctly specified curve $\mu_\phi(\cdot)$, and provide the asymptotic theory of the QMLE. Second, we consider estimating an unknown least favorable curve $\mu_\phi(\cdot)$ nonparametrically. Third, we provide the asymptotic theory for the semiparametric estimator obtained by nesting nonparametric estimation of $\mu_\phi(\cdot)$ in the QML estimation. The fourth stage allows for exogenous variables in the mean equation.

3.1. Asymptotic theory for given curve

We start by assuming that in (5a)–(5b) we have a known correctly specified curve; e.g., $\mu(\sigma_t^2)$ may be given by $\lambda(\phi)\sigma_t^2$ or $\lambda(\phi)\sigma_t$, for a known function $\lambda(\cdot)$ satisfying $\lambda(\phi_0) = \lambda_0$, the true λ, at $\phi_0$, the true value of φ. Another example is $\mu(\sigma_t^2) =$
2 λ(φ) σt for known λ(·), or µ (·) could simply be a known
conditional mean function. We also need Assumption A, provided in Appendix A. Assumption A1 is a collection of conditions related to the disturbance and the conditional variance, which are quite standard in the literature on QML estimation of GARCH processes (see, e.g., Berkes and Horváth (2004), Jensen and Rahbek (2004a,b), Straumann and Mikosch (2006), and Dahl and Iglesias (2011)). Assumption A2 requires $\mu_\phi(\cdot)$ to take values in the interior of the parameter space. Assumption A3 is a typical assumption needed in the GARCH context, as in Jensen and Rahbek (2004a,b), to ensure local identification, and it characterizes the risk premium specifications allowed in our setting in terms of first, second and third order derivatives.

Theorem A1. Let $\mu_\phi(\cdot)$ denote a known correctly specified curve. Define $\hat\phi \equiv \hat\phi_T$ to be any element of $\mathrm{int}(\Phi)$ satisfying
$$L_T\big(\hat\phi, \mu_{\hat\phi}(\cdot)\big) = \sup_{\phi\in\Phi} L_T\big(\phi, \mu_\phi(\cdot)\big).$$
Then, under Assumption A, $\hat\phi \xrightarrow{p} \phi_0$ as $T\to\infty$, and $\mu_{\hat\phi}(\cdot)$ is a consistent estimator of $\mu_0(\cdot)$.

Theorem A1 shows consistency. Asymptotic normality of $\hat\phi$ is established next.

Theorem B1. Suppose that Assumption A holds. Then
(a) $\sqrt{T}\,(\hat\phi - \phi_0) \xrightarrow{d} N(0, i_\phi^{-1})$,
where $i_\phi$ is given in Lemmas 2–3 in Appendix A. Finally, let
$$\hat i_\phi = -T^{-1}\,\frac{\partial^2 L_T\big(\phi,\mu_\phi(\cdot)\big)}{\partial\phi\,\partial\phi'}\bigg|_{\phi=\hat\phi};$$
then (b) $\hat i_\phi \xrightarrow{p} i_\phi$ as $T\to\infty$.

The proofs of Theorems A1 and B1 proceed by the classic Cramér type conditions for consistency and asymptotic normality (central limit theorem for the score, convergence of the Hessian, and uniformly bounded third order derivatives; see Lehmann (1999)), established, e.g., as in Jensen and Rahbek (2004a,b); see Appendix A.

3.2. Nonparametric estimation of least favorable curve

We now relax the assumption of a known curve, and estimate the curve nonparametrically. We add Assumption B below. Following Severini and Wong (1992, Lemma 8) and Su and Jin (2010), we work with $\mathcal{T}_\phi(y_t)$, a scalar function of $y_t$ depending on φ such that $\mu_\phi(\sigma_t^2) = E[\mathcal{T}_\phi(y_t) \mid I_{t-1}]$. Initially we will have $\mathcal{T}_\phi(y_t) = y_t$; this is generalized to $\mathcal{T}_\phi(y_t) = y_t - x_t'\alpha$ in Section 3.4 below. We denote by $f_{\phi j}(y_t \mid I_{t-1})$, for $j = 0, 1, 2$, the conditional density of
$$\mathcal{T}_\phi^{(j)}(y_t) = \frac{\partial^j\,\mathcal{T}_\phi(y_t)}{\partial\phi^j}, \qquad (8)$$
and $f(\sigma_t^2)$ denotes the marginal density of $\sigma_t^2$.

Assumption B.
B1. $E\big[\sup_\phi |\mathcal{T}_\phi^{(j)}(y_t)|\big] < \infty$, $j = 0, 1, 2, 3$, for any t.
B2. For some even integer $v \ge 10$, $\sup_\phi E\big|\mathcal{T}_\phi^{(j)}(y_t)\big|^v < \infty$, $j = 0, 1, 2$, for any t.
B3. $\sup_\phi \sup_{y_t, y_{t-1}} f_{\phi j}^{(r)}(y_t \mid I_{t-1}) < \infty$, $j = 0, 1, 2$; $r = 0, 1, 2, 3, 4$, for any t. $f_{\phi j}(y_t \mid I_{t-1})$ belongs conditionally to the exponential family.
B4. $\sup_{\sigma_t^2} f^{(r)}(\sigma_t^2) < \infty$, $r = 0, 1, 2, 3, 4$, for any t.
B5. $0 < \inf_{\sigma_t^2\in S} f(\sigma_t^2) \le \sup_{\sigma_t^2\in S} f(\sigma_t^2) < \infty$ for any t.
B6. $\sigma_t^2$ and $y_t$ are geometrically strongly mixing strictly stationary processes. The distribution of $\sigma_t^2$ is absolutely continuous with continuous density $f(\sigma_t^2)$.
B7. The kernel function $K(\cdot)$ is any everywhere positive density with an absolutely integrable characteristic function, $\int K(u)\,du = 1$, $\int uK(u)\,du = 0$, $\int u^2K(u)\,du < \infty$, and $\sup_u |K^{(r)}(u)| < \infty$, $r = 0, \dots, 4$. K is also Lipschitzian.
B8. $\mu_\phi(\cdot) = \varphi_\phi(\cdot)/g_\phi(\cdot)$, where $\varphi_\phi(\cdot)f(\cdot)$ and $g_\phi(\cdot)f(\cdot)$ are twice differentiable with continuous and uniformly bounded second derivatives. $\varphi$ and g belong to $C_{2,1}(b)$ for some b, i.e., they belong to the space of twice continuously differentiable real valued functions F, defined on $\mathbb{R}$, such that $\|F\|_\infty \le b$ and $\|F^{(2)}\|_\infty \le b$, where $F^{(2)}$ is any partial derivative of order 2 for F and $\|F\|_\infty = \inf\{a : \Pr(F > a) = 0\}$, and such that $E\big[\exp\big(a|\mathcal{T}_\phi(y_t)|^\tau\big)\big] < \infty$ for some $a > 0$ and $\tau > 0$. We define S to be a compact set such that $\inf_{\sigma^2\in S} g_\phi(\sigma^2) > 0$ and, for each integer k,
$$\delta_T = \frac{T^{2/5}}{\mathrm{Log}_k T\,\ln T},$$
and the bandwidth satisfies
$$h_T \simeq \left(\frac{\ln T}{T}\right)^{1/5}.$$

Assumptions B1–B5 follow Severini and Wong (1992), except that higher moments are needed in B2. We also need restrictions on the volatility process, such as B8, following Bosq (1998), in order to extend the results of Severini and Wong (1992) beyond the i.i.d. case. Lemma 1 provides the result on nonparametric estimation of a least favorable curve. The proof uses quantities from the assumptions, such as $\delta_T$ and S; see Appendix A.
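The kernel smoother that Lemma 1 uses, together with the bandwidth rate $h_T \simeq (\ln T/T)^{1/5}$ from Assumption B, can be sketched as follows. We assume a Gaussian kernel, which is one density satisfying B7, and a hypothetical `scale` constant for $h_T$ (the paper recommends cross validation for the bandwidth in Section 4.1); the simulated inputs are placeholders. Relative to a plain Nadaraya–Watson smoother, each observation is additionally weighted by $\sigma_j^{-2}$.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density: everywhere positive, integrable characteristic
    # function, zero first moment, finite second moment (cf. B7).
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def bandwidth(T, scale=1.0):
    # h_T proportional to (ln T / T)^{1/5}; 'scale' is a hypothetical tuning
    # constant (e.g. the spread of sigma_t^2), not fixed by the paper.
    return scale * (np.log(T) / T) ** 0.2

def mu_hat(grid, sigma2, y, h):
    """Lemma 1 estimator at each point s of grid:
    sum_j sigma_j^{-2} y_j K((s - sigma_j^2)/h) / sum_j sigma_j^{-2} K((s - sigma_j^2)/h)."""
    u = (np.asarray(grid)[:, None] - sigma2[None, :]) / h   # (n_grid, T)
    w = gaussian_kernel(u) / sigma2[None, :]                 # sigma_j^{-2} K(u)
    return (w @ y) / w.sum(axis=1)

# Placeholder inputs: given phi, the sigma_t^2 would come from recursion (5b).
rng = np.random.default_rng(2)
sigma2 = rng.uniform(0.5, 2.0, size=500)
y = 0.1 * sigma2 + np.sqrt(sigma2) * rng.standard_normal(500)
h = bandwidth(len(y), scale=np.std(sigma2))
est = mu_hat(np.array([0.8, 1.0, 1.5]), sigma2, y, h)
```

Since the weights sum to one at every grid point, a constant series is reproduced exactly, which is a convenient sanity check.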
Lemma 1. Suppose that Assumptions A and B hold. Define $\hat\mu_\phi(\cdot)$ by
$$\hat\mu_\phi(\sigma^2) = \frac{\sum_{j=1}^T \sigma_j^{-2}\,y_j\,K\big((\sigma^2-\sigma_j^2)/h_T\big)}{\sum_{j=1}^T \sigma_j^{-2}\,K\big((\sigma^2-\sigma_j^2)/h_T\big)}.$$
Then $\hat\mu_\phi(\cdot)$ is a consistent estimator of a least favorable curve.

Note that $\hat\mu_\phi(\cdot)$ depends on $\sigma_1^2, \dots, \sigma_T^2$, and hence on φ, by (5b). In particular, given φ, the conditional variances $\sigma_1^2, \dots, \sigma_T^2$ are known in our case, unlike in Linton and Perron (2003) and Conrad and Mammen (2008). The estimator does not take the simple Nadaraya–Watson form, but in addition controls for conditional heteroskedasticity.

3.3. Asymptotic theory: the semiparametric case

We now consider the case where, in (5a)–(5b), we know neither the parameter φ nor the curve $\mu_\phi(\cdot)$. We combine the results of the previous subsections on estimation of φ given $\mu_\phi(\cdot)$ and on estimation of $\mu(\cdot)$ given φ. We need an additional Assumption C. Assumptions B1–B5 can be relaxed if we are interested only in consistency and asymptotic normality, and not in reaching the semiparametric efficiency bound. Assumption C1 is a standard condition corresponding to the global identifiability condition of Severini and Wong (1992).

Assumption C.
C1. (a) For fixed but arbitrary $\phi_1 \in \mathrm{int}(\Phi)$ and $\mu_{1\phi}(\cdot)$ with $\mu_{1\phi}(\sigma_t^2) \in \mathrm{int}(M)$ for all $\sigma_t^2$, define
$$\varphi\big(\phi, \mu_\phi(\cdot)\big) = \int l_T\big(y, \phi, \mu_\phi(\cdot)\big)\, f\big(y; \phi_1, \mu_{1\phi}(\sigma_t^2)\big)\, dy$$
for all $\phi \in \mathrm{int}(\Phi)$ and all $\mu_\phi(\cdot)$ with $\mu_\phi(\sigma_t^2) \in \mathrm{int}(M)$. If $\phi \ne \phi_1$, then $\varphi\big(\phi, \mu_\phi(\cdot)\big) < \varphi\big(\phi_1, \mu_{1\phi}(\cdot)\big)$.
(b) The marginal Fisher information on φ is strictly positive, $i_\phi\big(\phi, \mu_\phi(\cdot)\big) > 0$, for all $\phi \in \mathrm{int}(\Phi)$ and all $\mu_\phi(\cdot)$ with $\mu_\phi(\sigma_t^2) \in \mathrm{int}(M)$.

We now provide the asymptotic theory combining the estimators $\hat\phi$ and $\hat\mu_\phi(\cdot)$.

Theorem A2. Let $\hat\mu_\phi(\cdot)$ be a consistent estimator of a least favorable curve $\mu_\phi(\cdot)$, e.g., the estimator from Lemma 1. Let $\hat\phi \equiv \hat\phi_T$ be any element of $\mathrm{int}(\Phi)$ satisfying
$$L_T\big(\hat\phi, \hat\mu_{\hat\phi}(\cdot)\big) = \sup_{\phi\in\Phi} L_T\big(\phi, \hat\mu_\phi(\cdot)\big).$$
Then, under Assumptions A–C, $\hat\phi \xrightarrow{p} \phi_0$ as $T\to\infty$, and $\hat\mu_{\hat\phi}(\cdot)$ is a consistent estimator of $\mu_0(\cdot)$.

Theorem A2 shows consistency of the estimator of φ obtained by maximizing the generalized profile quasi-log-likelihood, and provides for consistent estimation of the risk-return relation.

Theorem B2. Suppose that Assumptions A–C hold. Then
(a) $\sqrt{T}\,(\hat\phi - \phi_0) \xrightarrow{d} N(0, i_\phi^{-1})$,
where $i_\phi$ is given in Lemmas 2–3 in Appendix A. Finally, let
$$\hat i_\phi = -T^{-1}\,\frac{\partial^2 L_T\big(\phi,\hat\mu_\phi(\cdot)\big)}{\partial\phi\,\partial\phi'}\bigg|_{\phi=\hat\phi},$$
where $\hat\mu_\phi(\cdot)$ is given in Lemma 1; then (b) $\hat i_\phi \xrightarrow{p} i_\phi$ as $T\to\infty$.

The proofs of Theorems A2 and B2 proceed as those for Theorems A1 and B1, but with $\mu_\phi(\cdot)$ replaced by $\hat\mu_\phi(\cdot)$, based on the asymptotic equivalence result of Severini and Wong (1992) stated in Corollary 1 in Appendix A. Theorem B2 shows that the semiparametric estimator is asymptotically normal and achieves the semiparametric lower bound. In addition, part (b) provides for consistent estimation of standard errors using the second derivative matrix of the generalized profile quasi-log-likelihood.

3.4. Exogenous conditioning variables in the mean equation

Here, we generalize the semiparametric GARCH-M model to accommodate exogenous regressors, since additional conditioning variables, e.g., the dividend yield, are typically entered linearly into the mean equation. We extend the theory from (5a) and (5b) to the model given by
$$y_t = x_t'\alpha + \mu(\sigma_t^2) + \varepsilon_t\sigma_t, \qquad (9a)$$
$$\sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2, \qquad (9b)$$
where $x_t$ is a $p \times 1$ vector of pre-determined regressors that are correctly specified in the mean equation and α is a conformable vector of parameters. We avoid identification issues by not allowing lagged dependent variables or an intercept in $x_t$. In this case $\phi = (\alpha', \omega, \gamma, \beta)'$. We extend Theorems A2 and B2 to allow for exogenous variables, and to this end strengthen our assumptions. The relevant Assumption D is given in Appendix A.

Theorem A3. Let $\hat\mu_\phi(\sigma_t^2)$ be the estimator from D2 in Appendix A. Then $\hat\mu_\phi(\sigma_t^2)$ is a consistent estimator of a least favorable curve $\mu_\phi(\cdot)$ with $\mu_\phi(\sigma_t^2) \in \mathrm{int}(M)$ for all $\sigma_t^2$. Let $\hat\phi \equiv \hat\phi_T = (\hat\alpha_T', \hat\omega_T, \hat\gamma_T, \hat\beta_T)'$ be any element of $\mathrm{int}(\Phi)$ satisfying
$$L_T\big(\hat\phi, \hat\mu_{\hat\phi}(\cdot)\big) = \sup_{\phi\in\Phi} L_T\big(\phi, \hat\mu_\phi(\cdot)\big).$$
Then, under Assumptions A–D, $\hat\phi \xrightarrow{p} \phi_0$ as $T\to\infty$, and $\hat\mu_{\hat\phi}(\cdot)$ is a consistent estimator of $\mu_0(\cdot)$.

By Theorem A3, the maximizer of the generalized profile quasi-log-likelihood remains consistent in the presence of exogenous regressors.

Theorem B3. Suppose that Assumptions A–D hold in (9a)–(9b). Then
(a) $\sqrt{T}\,(\hat\phi - \phi_0) \xrightarrow{d} N(0, i_\phi^{-1})$,
where $i_\phi$ is given in Lemmas 2–3 in Appendix A. Finally, let
$$\hat i_\phi = -T^{-1}\,\frac{\partial^2 L_T\big(\phi,\hat\mu_\phi(\cdot)\big)}{\partial\phi\,\partial\phi'}\bigg|_{\phi=\hat\phi},$$
where $\hat\mu_\phi(\cdot)$ is an estimator of $\mu_\phi(\cdot)$ as given in D2 in Appendix A; then (b) $\hat i_\phi \xrightarrow{p} i_\phi$ as $T\to\infty$.

The proofs of Theorems A3 and B3 proceed as those for Theorems A2 and B2, except that we need to control the extra derivatives with respect to the α vector; see Appendix A. Theorem B3 shows that asymptotic normality carries over to the case with exogenous regressors. Again, the semiparametric efficiency bound $i_\phi^{-1}$ is achieved.
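Part (b) of Theorems B1–B3 means that standard errors can be read off the negative Hessian of the (generalized) profile quasi-log-likelihood, with $\widehat{\mathrm{Var}}(\hat\phi) = (T\,\hat i_\phi)^{-1}$. A minimal finite-difference sketch follows, assuming the user supplies a callable `profile_loglik` (a hypothetical name) that evaluates $L_T(\phi, \hat\mu_\phi(\cdot))$, re-estimating $\hat\mu_\phi$ at each φ. The toy quadratic in the usage lines only checks the arithmetic, since its Hessian is known exactly.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    k = x.size
    H = np.empty((k, k))
    E = np.eye(k) * h  # row i is the step h in coordinate i
    for i in range(k):
        for j in range(k):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4.0 * h * h)
    return 0.5 * (H + H.T)  # symmetrize away rounding noise

def hessian_standard_errors(profile_loglik, phi_hat, T, h=1e-4):
    """i_hat = -T^{-1} * Hessian at phi_hat (part (b) of Theorems B1-B3);
    standard errors are sqrt of diag((T * i_hat)^{-1})."""
    i_hat = -numerical_hessian(profile_loglik, phi_hat, h) / T
    cov = np.linalg.inv(T * i_hat)
    return np.sqrt(np.diag(cov)), i_hat

# Toy stand-in for L_T(phi, mu_hat_phi): a concave quadratic with Hessian -A.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
m = np.array([0.1, 0.9])
toy_loglik = lambda p: -0.5 * (p - m) @ A @ (p - m)
se, i_hat = hessian_standard_errors(toy_loglik, m, T=1)
```

Central differences are used for simplicity; with a profile likelihood that is itself computed numerically, the step `h` may need tuning.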
4. Illustrations We begin this section by describing in detail our semiparametric estimation algorithm. Secondly, we carry out a simulation study on the accuracy of the new approach and provide comparisons to the parametric approach as well as the iterative approach with parametric initial estimate proposed by Conrad and Mammen (2008). Finally, we present an empirical illustration, applying our method to the daily S&P 500 stock index return series. 4.1. Estimation of conditional mean and variance Write the model from (9a) and (9b) as yt − xt α = µ(σ ) + et , et = εt σt ,
(10a) (10b)
σt2 = ω + γ y2t −1 + βσt2−1 ,
(10c)
zt = yt − xt α.
(10d)
′
2 t
′
Consider first $\sigma_t^2$ as given. The kernel-based estimator of $\mu(\cdot)$ given $\alpha$ is then

$$\hat\mu(\sigma^2) = \sum_j w_j(\sigma^2) z_j = \sum_j w_j(\sigma^2) y_j - \sum_j w_j(\sigma^2) x_j'\alpha, \tag{11}$$

where

$$w_j(\sigma^2) = \frac{\sigma_j^{-2} K\big((\sigma^2 - \sigma_j^2)/h_T\big)}{\sum_{s=1}^T \sigma_s^{-2} K\big((\sigma^2 - \sigma_s^2)/h_T\big)}. \tag{12}$$

Plugging $\hat\mu$ into (10a) yields

$$y_t = x_t'\alpha + \hat\mu(\sigma_t^2) + e_t, \tag{13}$$

which may conveniently be written as

$$y_t - \sum_j w_j(\sigma_t^2) y_j = \Big(x_t - \sum_j w_j(\sigma_t^2) x_j\Big)'\alpha + e_t. \tag{14}$$

From (14), an estimator of $\alpha$ is easily obtained by QMLE for GARCH(1,1), or simply by weighted least squares (WLS) with weights $\sigma_t^{-2}$. Since we do not observe $\sigma_t^2$, we need to estimate $\phi = (\alpha', \omega, \gamma, \beta)' = (\alpha', \theta')'$. The derivations above suggest the following iterative estimation procedure.

Step 1: Provide a set of initial parameters $\theta^{(i)}$ ($i = 0$) and compute $\sigma_t^{2(i)} = \sigma_t^2(\theta^{(i)})$ for $t = 1, 2, \ldots, T$ by iterating on Eq. (10c).

Step 2: Based on the sequence $\{\sigma_t^{2(i)}\}_{t=1}^T$, compute first $\alpha^{(i)}$ based on (14), and second $\mu_t^{(i)} = \hat\mu(\sigma_t^{2(i)})$ for $t = 1, 2, \ldots, T$ from (11) and (12).

Step 3: Update $\theta^{(i)}$ and $\{\sigma_t^{2(i)}\}_{t=1}^T$ (i.e., find $\theta^{(i+1)} = (\omega, \gamma, \beta)'$ and consequently $\{\sigma_t^{2(i+1)}\}_{t=1}^T$) by performing QML on the GARCH(1,1) model

$$\tilde y_t^{(i)} = \sigma_t\epsilon_t, \qquad \sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2,$$

where $\tilde y_t^{(i)} \equiv y_t - x_t'\alpha^{(i)} - \mu_t^{(i)}$.

Step 4: Repeat Steps 2 and 3 for a finite fixed number of iterations or until convergence.

In the simulations we use a fixed number of iterations. The estimation algorithm is very stable, and convergence is relatively fast. For a convergence criterion, one could use the suggestions of Linton and Perron (2003, p. 357) or Conrad and Mammen (2008, p. 19). For the choice of the bandwidth parameter, we recommend cross validation, following Linton and Perron (2003): the quasi-log-likelihood function is maximized at each point on a grid, and the leave-one-out maximizer is chosen.

In order to compute nonparametric confidence bands, we suggest the following wild bootstrap algorithm, which is similar to the one described in Linton and Perron (2003).

Step 1: Based on the estimated $\hat\phi = (\hat\alpha', \hat\omega, \hat\gamma, \hat\beta)'$, $\{\hat\sigma_t^2\}$, $\{\hat\mu(\hat\sigma_t^2)\}$, and $\hat\varepsilon_t = \big(y_t - x_t'\hat\alpha - \hat\mu(\hat\sigma_t^2)\big)/\hat\sigma_t$, calculate $\hat\varepsilon_t^c = \hat\varepsilon_t - T^{-1}\sum_{t=1}^T \hat\varepsilon_t$.

Step 2: Let $u_t$ be a discrete variable taking the values $-1$ and $1$ with equal probability. Draw a pseudo-random sample $(u_1, u_2, \ldots, u_T)$ and construct the sequence $\varepsilon_t^* = \hat\varepsilon_t^c u_t$ for $t = 1, 2, \ldots, T$.

Step 3: Given initial starting values for $y_0$ and $\sigma_0^2$, define recursively

$$\sigma_t^{*2} = \hat\omega + \hat\gamma y_{t-1}^{*2} + \hat\beta\sigma_{t-1}^{*2}, \qquad y_t^* = x_t'\hat\alpha + \hat\mu\big(\sigma_t^{*2}; \{\hat\sigma_s^2\}_{s=1}^T, h\big) + \sigma_t^*\varepsilon_t^*,$$

for $t = 1, 2, \ldots, T$.

Step 4: Given the bootstrapped sequence $\{y_t^*\}_{t=1}^T$, calculate $\hat\phi^*$ and $\{\hat\mu_t^*\}_{t=1}^T$ by the proposed semiparametric estimation algorithm.

Step 5: Repeat Steps 2 through 4 $m$ times. The pointwise $(1-p)\cdot 100\%$ confidence band around $\hat\mu(\hat\sigma_t^2)$ is then constructed from the $p/2$ and $1 - p/2$ quantiles of the empirical distribution of the $m$ bootstrapped estimates $\hat\mu_t^*$ of $\mu(\hat\sigma_t^2)$. As a byproduct, standard errors of $\hat\phi$ are estimated from the sample standard deviation of the $m$ bootstrapped estimates $\hat\phi^*$.

In some situations, kernel-based regressions can be severely biased. In such cases, we propose bias-corrected kernel regression, as in Racine (2001). The approach is simple and turns out to be effective in the empirical section below in reducing both curvature bias and boundary bias. It is particularly convenient here, since it can be implemented using the wild bootstrap algorithm above. The bias-corrected estimator of $\mu(\hat\sigma_t^2)$, denoted $\hat\mu^c(\hat\sigma_t^2)$, is computed as

$$\hat\mu^c(\hat\sigma_t^2) = \hat\mu(\hat\sigma_t^2) - \widehat{\mathrm{bias}}\big(\hat\mu(\hat\sigma_t^2)\big), \tag{15}$$

where

$$\widehat{\mathrm{bias}}\big(\hat\mu(\hat\sigma_t^2)\big) = \frac{1}{m}\sum_{j=1}^m \hat\mu^{*j}(\hat\sigma_t^2) - \hat\mu(\hat\sigma_t^2), \tag{16}$$

and $\hat\mu^{*j}(\hat\sigma_t^2)$, $j = 1, 2, \ldots, m$, are the bootstrapped estimates of $\mu(\hat\sigma_t^2)$ obtained under Step 5.¹ As shown by Racine (2001), the bias-corrected (pointwise) $(1-p)\cdot 100\%$ confidence band is then

$$\Big[\hat\mu^c(\hat\sigma_t^2) - \hat\mu^*(\hat\sigma_t^2)_{p/2},\ \hat\mu^c(\hat\sigma_t^2) + \hat\mu^*(\hat\sigma_t^2)_{1-p/2}\Big], \tag{17}$$

where $\hat\mu^*(\hat\sigma_t^2)_p$ denotes the $p$th quantile in the empirical bootstrap distribution of $\hat\mu^*(\hat\sigma_t^2)$.
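Step 2 of the iterative procedure can be sketched in a few lines of NumPy: the weights (12), the kernel estimator (11), and weighted least squares on (14). This is a minimal illustration under stated assumptions rather than the authors' implementation: the Gaussian kernel, the function names, and treating the current variance path as known are choices made here, and in practice the bandwidth `h` would be chosen by cross validation. Because the weights in (12) sum to one, a constant column in $x_t$ is removed exactly in (14) (it is absorbed by $\mu(\cdot)$), so $x_t$ should not contain an intercept.

```python
import numpy as np


def kernel_weights(s2_eval, s2, h):
    """Weights w_j(sigma^2) of (12), one row per evaluation point.

    Each kernel term is multiplied by sigma_j^{-2} before normalizing.
    A Gaussian kernel K is assumed here.
    """
    u = (np.atleast_1d(s2_eval)[:, None] - s2[None, :]) / h
    k = np.exp(-0.5 * u ** 2) / s2[None, :]
    return k / k.sum(axis=1, keepdims=True)


def estimate_alpha(y, x, s2, h):
    """WLS estimate of alpha from (14), with weights sigma_t^{-2}."""
    W = kernel_weights(s2, s2, h)      # row t holds w_j(sigma_t^2), j = 1..T
    y_dev = y - W @ y                  # left-hand side of (14)
    x_dev = x - W @ x                  # regressors of (14)
    sw = 1.0 / np.sqrt(s2)             # square roots of the WLS weights
    alpha, *_ = np.linalg.lstsq(x_dev * sw[:, None], y_dev * sw, rcond=None)
    return alpha


def mu_hat(s2_eval, y, x, alpha, s2, h):
    """Kernel estimator (11): weighted average of z_j = y_j - x_j'alpha."""
    z = y - x @ alpha
    return kernel_weights(s2_eval, s2, h) @ z
```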
1 In particular, we employ the iterated bias-corrected kernel estimator. For computational details, see Racine (2001), or our R code, available free of charge from the authors.
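The resampling and bias-correction arithmetic of the wild bootstrap can be sketched as follows (a minimal illustration, not the authors' R code). Steps 1 and 2 are implemented directly; re-estimation on each bootstrap sample (Steps 3 and 4) is left to the caller, who is assumed to stack the $m$ bootstrap curves, evaluated at the same points, into an array `mu_boot` of shape (m, n). `bias_correct` then applies (15)–(16) and returns the plain percentile band of Step 5; it does not implement the band (17). All function names are hypothetical.

```python
import numpy as np


def centered_std_residuals(y, xalpha, mu_vals, s2):
    """Step 1: standardized residuals, re-centred to have mean zero."""
    eps = (y - xalpha - mu_vals) / np.sqrt(s2)
    return eps - eps.mean()


def rademacher_errors(eps_c, rng):
    """Step 2: eps*_t = eps^c_t * u_t, with u_t = -1 or 1 with probability 1/2 each."""
    u = rng.choice(np.array([-1.0, 1.0]), size=eps_c.shape)
    return eps_c * u


def bias_correct(mu_hat_vals, mu_boot, p=0.10):
    """Bias estimate (16), corrected curve (15), and a pointwise
    (1-p)*100% percentile band from the bootstrap replicates (Step 5)."""
    bias = mu_boot.mean(axis=0) - mu_hat_vals
    mu_c = mu_hat_vals - bias
    lo, hi = np.quantile(mu_boot, [p / 2, 1 - p / 2], axis=0)
    return mu_c, bias, lo, hi
```

Drawing fresh signs $u_t$ while keeping $|\hat\varepsilon_t^c|$ fixed is what makes the scheme "wild": each bootstrap error inherits the magnitude of the corresponding fitted residual.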
Table 1
Monte Carlo simulation results on the precision of the parametric GARCH-M estimator under alternative data generating processes (described in main text). The number of Monte Carlo replications equals 999. The sample size is 1000. Reported are the medians of the estimated parameters over their simulated (marginal) distributions, together with the associated 5% (Q5) and 95% (Q95) quantiles.

| DGP | ω Q5 | ω | ω Q95 | γ Q5 | γ | γ Q95 | β Q5 | β | β Q95 | µ Q5 | µ | µ Q95 | λ Q5 | λ | λ Q95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N1 | 0.005 | 0.011 | 0.021 | 0.060 | 0.100 | 0.142 | 0.764 | 0.845 | 0.902 | −0.061 | 0.001 | 0.065 | −0.298 | 0.042 | 0.387 |
| N2 | 0.006 | 0.011 | 0.021 | 0.061 | 0.100 | 0.138 | 0.750 | 0.833 | 0.894 | −0.064 | 0.001 | 0.056 | 0.162 | 0.500 | 0.907 |
| N3 | 0.006 | 0.011 | 0.019 | 0.063 | 0.095 | 0.130 | 0.738 | 0.815 | 0.873 | −0.065 | 0.006 | 0.060 | 0.575 | 0.945 | 1.487 |
| A1 | 0.009 | 0.027 | 0.057 | 0.084 | 0.116 | 0.149 | 0.420 | 0.576 | 0.713 | 0.472 | 0.640 | 0.821 | −1.230 | −0.420 | 0.330 |
| A2 | 0.005 | 0.011 | 0.020 | 0.063 | 0.102 | 0.142 | 0.760 | 0.835 | 0.898 | −0.031 | 0.054 | 0.160 | −0.664 | 0.083 | 0.524 |
| A3 | 0.006 | 0.010 | 0.018 | 0.059 | 0.088 | 0.120 | 0.771 | 0.835 | 0.885 | −0.100 | −0.021 | 0.056 | 0.766 | 1.146 | 1.742 |
Table 2
Monte Carlo simulation results on the precision of the efficient semiparametric estimator (CDI) under alternative data generating processes (described in main text). For further details, see Table 1.

| DGP | ω Q5 | ω | ω Q95 | γ Q5 | γ | γ Q95 | β Q5 | β | β Q95 |
|---|---|---|---|---|---|---|---|---|---|
| N1 | 0.005 | 0.010 | 0.023 | 0.058 | 0.099 | 0.133 | 0.772 | 0.849 | 0.906 |
| N2 | 0.005 | 0.010 | 0.020 | 0.064 | 0.104 | 0.143 | 0.757 | 0.832 | 0.890 |
| N3 | 0.006 | 0.011 | 0.020 | 0.068 | 0.104 | 0.151 | 0.699 | 0.810 | 0.878 |
| A1 | 0.008 | 0.021 | 0.038 | 0.083 | 0.119 | 0.146 | 0.498 | 0.607 | 0.708 |
| A2 | 0.005 | 0.011 | 0.022 | 0.065 | 0.105 | 0.149 | 0.733 | 0.831 | 0.896 |
| A3 | 0.008 | 0.014 | 0.024 | 0.056 | 0.096 | 0.140 | 0.722 | 0.808 | 0.874 |
4.2. Monte Carlo simulation experiment

For comparison purposes, we work with the data generating processes suggested by Conrad and Mammen (2008), given as

N1: $\mu(\sigma_t^2) = 0.05\sigma_t^2$,
N2: $\mu(\sigma_t^2) = 0.5\sigma_t^2$,
N3: $\mu(\sigma_t^2) = \sigma_t^2$,
A1: $\mu(\sigma_t^2) = \sigma_t^2 + 0.5\sin(10\sigma_t^2)$,
A2: $\mu(\sigma_t^2) = 0.5\sigma_t^2 + 0.1\sin(0.5 + 20\sigma_t^2)$,
A3: $\mu(\sigma_t^2) = \sigma_t^2 + 0.12\sin(3 + 30\sigma_t^2)$,

with $\theta^{(CM)} = (\omega, \gamma, \beta^{(CM)})' = (0.01, 0.1, 0.85)'$, i.e.,

$$\sigma_t^{2(CM)} = 0.01 + 0.1\,e_{t-1}^2 + 0.85\,\sigma_{t-1}^{2(CM)}, \qquad e_{t-1} = \sigma_{t-1}^{(CM)}\varepsilon_{t-1}.$$

Since our volatility process (10c) is defined differently from that in Conrad and Mammen (2008), we cannot use the exact same value for θ and obtain processes with the same properties as theirs. In order to obtain comparable processes, we fix ω = 0.01 and γ = 0.1, and set β such that

$$E(\sigma_t^2) = E(\sigma_t^{2(CM)}), \tag{18a}$$
$$E\big(\mu(\sigma_t^2)^2\big) = E\big(\mu(\sigma_t^{2(CM)})^2\big). \tag{18b}$$

This yields the relationship

$$\beta = \beta^{(CM)} - (1 - \gamma - \beta^{(CM)})\,\omega^{-1}\gamma\,E\big(\mu(\sigma_t^{2(CM)})^2\big) = \beta^{(CM)} - \gamma\,\frac{E\big(\mu(\sigma_t^{2(CM)})^2\big)}{E\big(\sigma_t^{2(CM)}\big)}, \tag{19}$$

where $E(\mu(\sigma_t^{2(CM)})^2)$ is obtained by simulation. Relation (19) follows because, with no exogenous regressors and under stationarity, $E(y_{t-1}^2) = E(\mu(\sigma_t^2)^2) + E(\sigma_t^2)$, so taking expectations in (10c) gives $(1 - \gamma - \beta)E(\sigma_t^2) = \omega + \gamma E(\mu(\sigma_t^2)^2)$, whereas $(1 - \gamma - \beta^{(CM)})E(\sigma_t^{2(CM)}) = \omega$; imposing (18a)–(18b) and subtracting yields (19). By this procedure, our GARCH-M specifications are based on the following values of β: β(N1) = 0.85, β(N2) = 0.84, β(N3) = 0.82, β(A1) = 0.68, β(A2) = 0.84, β(A3) = 0.82. The residual $\epsilon_t$ is drawn from a standard normal distribution.

Table 1 presents the medians as well as the 5% and 95% quantiles of the estimated parameters in 999 replications based on the parametric GARCH(1,1)-M approach with $\mu(\sigma_t^2) = \mu + \lambda\sigma_t^2$. As expected, the median parameter estimates under the data generating processes N1–N3 are very close to the true parameter values. As noted by Conrad and Mammen (2008), the in-mean parameter λ is relatively well estimated when the model is correctly specified (N1–N3).
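The calibration (19) can be checked approximately by simulation. The sketch below simulates the Conrad–Mammen variance process with normal innovations, replaces $E(\mu(\sigma_t^{2(CM)})^2)$ and $E(\sigma_t^{2(CM)})$ by sample averages, and evaluates both forms of (19) for the six mean functions. The sample length, burn-in, seed and starting value are assumptions, so the printed values should agree with the reported β values only up to simulation error and rounding.

```python
import numpy as np

# Mean functions of the six designs N1-N3 and A1-A3.
MEAN_FUNCTIONS = {
    "N1": lambda v: 0.05 * v,
    "N2": lambda v: 0.5 * v,
    "N3": lambda v: v,
    "A1": lambda v: v + 0.5 * np.sin(10 * v),
    "A2": lambda v: 0.5 * v + 0.1 * np.sin(0.5 + 20 * v),
    "A3": lambda v: v + 0.12 * np.sin(3 + 30 * v),
}


def simulate_cm_variance(omega=0.01, gamma=0.1, beta_cm=0.85, T=200_000, burn=1_000, seed=0):
    """sigma_t^2 = omega + gamma*e_{t-1}^2 + beta_cm*sigma_{t-1}^2 with e_t = sigma_t*eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T + burn)
    s2 = np.empty(T + burn)
    s2[0] = omega / (1.0 - gamma - beta_cm)  # unconditional variance as start value
    for t in range(1, T + burn):
        e_prev = np.sqrt(s2[t - 1]) * eps[t - 1]
        s2[t] = omega + gamma * e_prev ** 2 + beta_cm * s2[t - 1]
    return s2[burn:]


def calibrate_beta(mu, s2_cm, omega=0.01, gamma=0.1, beta_cm=0.85):
    """Both expressions in (19), with expectations replaced by sample means."""
    m2 = np.mean(mu(s2_cm) ** 2)
    beta_a = beta_cm - (1.0 - gamma - beta_cm) / omega * gamma * m2
    beta_b = beta_cm - gamma * m2 / np.mean(s2_cm)
    return beta_a, beta_b


s2_cm = simulate_cm_variance()
for name, mu in MEAN_FUNCTIONS.items():
    print(name, [round(float(b), 3) for b in calibrate_beta(mu, s2_cm)])
```

The two expressions coincide in population; in a finite simulation they differ only through the gap between the sample mean of $\sigma_t^{2(CM)}$ and $\omega/(1 - \gamma - \beta^{(CM)})$.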
Fig. 1. Parametric and semiparametric estimates of $\mu(\sigma_t^2)$ for model N1: $\mu(\sigma_t^2) = 0.05\sigma_t^2$. The parametric model assumes that data are generated according to $y_t = \mu + \lambda\sigma_t^2 + \varepsilon_t\sigma_t$, $\sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2$, $\varepsilon_t \sim$ n.i.d.(0, 1). The number of Monte Carlo replications equals 999 and the sample size is 1000. The first (upper) panel shows the estimates of $\mu(\sigma_t^2)$ (y-axis) for alternative values of $\sigma_t^2$ (x-axis): medians, plus the 5% and 95% quantiles of the semiparametric estimator over the simulated distributions. The second (lower) panel shows the histogram of the estimated $\sigma_t^2$.
Fig. 2. Parametric and semiparametric estimates of $\mu(\sigma_t^2)$ for model N2: $\mu(\sigma_t^2) = 0.5\sigma_t^2$. The parametric model assumes that data are generated according to $y_t = \mu + \lambda\sigma_t^2 + \varepsilon_t\sigma_t$, $\sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2$, $\varepsilon_t \sim$ n.i.d.(0, 1). For further details, see Fig. 1.

Fig. 3. Parametric and semiparametric estimates of $\mu(\sigma_t^2)$ for model N3: $\mu(\sigma_t^2) = \sigma_t^2$. The parametric model assumes that data are generated according to $y_t = \mu + \lambda\sigma_t^2 + \varepsilon_t\sigma_t$, $\sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2$, $\varepsilon_t \sim$ n.i.d.(0, 1). For further details, see Fig. 1.

Fig. 4. Parametric and semiparametric estimates of $\mu(\sigma_t^2)$ for model A1: $\mu(\sigma_t^2) = \sigma_t^2 + 0.5\sin(10\sigma_t^2)$. The parametric model assumes that data are generated according to $y_t = \mu + \lambda\sigma_t^2 + \varepsilon_t\sigma_t$, $\sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2$, $\varepsilon_t \sim$ n.i.d.(0, 1). For further details, see Fig. 1.

Fig. 5. Parametric and semiparametric estimates of $\mu(\sigma_t^2)$ for model A2: $\mu(\sigma_t^2) = 0.5\sigma_t^2 + 0.1\sin(0.5 + 20\sigma_t^2)$. The parametric model assumes that data are generated according to $y_t = \mu + \lambda\sigma_t^2 + \varepsilon_t\sigma_t$, $\sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2$, $\varepsilon_t \sim$ n.i.d.(0, 1). For further details, see Fig. 1.

Fig. 6. Parametric and semiparametric estimates of $\mu(\sigma_t^2)$ for model A3: $\mu(\sigma_t^2) = \sigma_t^2 + 0.12\sin(3 + 30\sigma_t^2)$. The parametric model assumes that data are generated according to $y_t = \mu + \lambda\sigma_t^2 + \varepsilon_t\sigma_t$, $\sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2$, $\varepsilon_t \sim$ n.i.d.(0, 1). For further details, see Fig. 1.

Table 2 presents the corresponding results based on our semiparametric GARCH(1,1)-M estimator. From the table it is clear that the semiparametric estimator yields very precise estimates of the conditional variance function parameters $\theta = (\omega, \gamma, \beta)'$ under N1–N3. The parametric and semiparametric estimates of ω, γ, and β are practically identical. Furthermore, the ranges between the 5% and 95% quantiles are approximately the same, indicating that the precision of the semiparametric estimates is comparable to that of the parametric maximum likelihood estimates. The first panel in Figs. 1–3 shows the true mean functions and the pointwise medians of the parametric and semiparametric estimates, along with the pointwise 5% and 95% quantiles of the semiparametric estimates. Under the models N1, N2, and N3, both estimation procedures seem to do equally well in recovering the true structure of the conditional mean function. The second panel in Figs. 1–3 shows a histogram of the estimated conditional variance and should be interpreted as a measure of how densely or sparsely $\hat\sigma^2$ is distributed. As expected, the confidence band for the semiparametric estimator is relatively narrow where $\hat\sigma^2$ is dense, and wider in regions where $\hat\sigma^2$ is sparsely distributed.

Table 1 also presents the mean and variance parameter estimates from the parametric GARCH(1,1)-M approach with $\mu(\sigma_t^2) = \mu + \lambda\sigma_t^2$ applied to the data generating processes A1, A2, and A3, i.e., when the model is misspecified. While the estimates of the variance function parameters are perhaps surprisingly accurate, the mean function parameters are clearly inconsistently estimated. In particular, using the parametric GARCH(1,1)-M model to approximate A1 would lead to a significantly negative estimate of λ, thus falsely indicating a negative risk-return relation. In Table 2, similar results are presented based on the semiparametric approach. Again, the estimated variance parameters are close to their true values and quite accurately estimated. The first panels in Figs. 4–6 reveal that the semiparametric estimate of the mean function again performs very well in uncovering the true mean function. The parametric estimate, which is restricted to be linear, fails to do so. The second panel in Figs. 4–6 shows how important the density of the $\hat\sigma^2$ observations is for the precision of the semiparametric estimator of the risk-return relation.

Finally, we use the Conrad and Mammen (2008) semiparametric GARCH(1,1)-M approach adapted to our model (10a)–(10d). This estimation approach is essentially identical to ours, except that it is based on an initial estimate of $\sigma_t^2$ obtained from a parametric GARCH(1,1)-M specification, and that the kernel estimator of $\mu(\cdot)$ (given α) is of the simple Nadaraya–Watson type, which, unlike (12), does not weight observations by $\sigma_j^{-2}$, i.e.,

$$\hat\mu(\sigma^2) = \frac{\sum_{j=1}^T K\big((\sigma^2 - \sigma_j^2)/h_T\big)\,z_j}{\sum_{s=1}^T K\big((\sigma^2 - \sigma_s^2)/h_T\big)}. \tag{20}$$
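Since (20) differs from (11)–(12) only by the factor $\sigma_j^{-2}$, a single function with a switch illustrates both estimators. This is a minimal sketch: the Gaussian kernel, the function name, the `inverse_variance` flag and the two-point toy data are assumptions chosen so that the effect of the inverse-variance weighting can be verified by hand.

```python
import numpy as np


def kernel_mean(s2_eval, s2, z, h, inverse_variance=False):
    """Kernel regression of z on sigma^2 at the points s2_eval.

    inverse_variance=False gives the Nadaraya-Watson estimator (20);
    inverse_variance=True multiplies each kernel term by sigma_j^{-2} as in (12).
    """
    u = (np.atleast_1d(s2_eval)[:, None] - s2[None, :]) / h
    k = np.exp(-0.5 * u ** 2)
    if inverse_variance:
        k = k / s2[None, :]
    return (k @ z) / k.sum(axis=1)


s2 = np.array([1.0, 2.0])
z = np.array([0.0, 1.0])
# Both points are equally far from 1.5, so (20) weights them equally: result 0.5.
print(kernel_mean(1.5, s2, z, h=1.0))
# Under (12) the point with sigma^2 = 1 gets twice the weight: result 1/3.
print(kernel_mean(1.5, s2, z, h=1.0, inverse_variance=True))
```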
Table 3
Monte Carlo simulation results on the precision of the semiparametric estimator of Conrad and Mammen (2008) adapted to the alternative data generating processes (described in main text). For further details, see Table 1.

| DGP | ω Q5 | ω | ω Q95 | γ Q5 | γ | γ Q95 | β Q5 | β | β Q95 |
|---|---|---|---|---|---|---|---|---|---|
| N1 | 0.005 | 0.011 | 0.023 | 0.058 | 0.099 | 0.132 | 0.770 | 0.849 | 0.906 |
| N2 | 0.006 | 0.011 | 0.020 | 0.064 | 0.104 | 0.137 | 0.755 | 0.832 | 0.890 |
| N3 | 0.006 | 0.012 | 0.021 | 0.066 | 0.100 | 0.146 | 0.702 | 0.810 | 0.877 |
| A1 | 0.008 | 0.021 | 0.037 | 0.082 | 0.118 | 0.145 | 0.505 | 0.611 | 0.710 |
| A2 | 0.005 | 0.011 | 0.022 | 0.063 | 0.105 | 0.149 | 0.732 | 0.831 | 0.896 |
| A3 | 0.009 | 0.017 | 0.028 | 0.052 | 0.089 | 0.135 | 0.703 | 0.802 | 0.878 |

The results are presented in Table 3 and show that if the Conrad and Mammen (2008) approach is iterated until convergence, it provides almost exactly the same results as our approach. This feature is perhaps a bit surprising in finite samples. The result is a consequence of the fact that the parametric variance function estimates are quite robust to mean function misspecification, as illustrated in Table 1. However, the extent to which this result generalizes to other in-mean functional forms is unknown.

4.3. Empirical application

We apply the efficient semiparametric GARCH(1,1)-M estimator to the daily returns on the Standard & Poor's (S&P) 500 stock market index from January 2, 1990 to December 31, 2007, a total of T = 4537 daily returns. The continuously compounded cum-dividend returns are defined as $y_t = \log(p_t + d_t) - \log(p_{t-1})$, where $p_t$ is the closing index level and $d_t$ the dividend paid to the index on day t, obtained from CRSP. In addition to the stock returns, we also collect daily data for the same period on a set of four explanatory variables, denoted $x_t$, that have been used in the literature as state variables for conditional mean returns, namely the dividend yield, term spread, default spread, and momentum.

The dividend yield or dividend-price ratio has been used for modeling conditional mean returns by Campbell and Shiller (1988a,b), Fama and French (1988, 1989), and many others. We compute it as the sum $\sum_{j>0} d_{t-j}$ of dividends to the index over the 12-month period up to and including date t − 1, divided by $p_{t-1}$, so that it is known at the beginning of the time interval over which the return $y_t$ is realized. The term spread or yield curve slope has been used for conditioning mean returns by Campbell (1987) and Fama and French (1988, 1989), among others. It was also used by Campbell (1987) to condition return variances, and in this capacity it may, through the risk-return tradeoff, be a candidate regressor in the conditional mean equation too, as an alternative to the GARCH variance $\sigma_t^2$ that enters through µ(·). The term spread is computed as the difference between the yields on the 10-year Treasury bond and the 12-month T-bill from CRSP. The credit or default spread has been used for conditional mean returns by Fama and French (1988, 1989) and others, and for conditional variances by Schwert (1989) and others. It is calculated as the difference between Moody's seasoned Baa-rated and Aaa-rated corporate bond yields from the homepage of the Federal Reserve Bank of St. Louis. Finally, trend or momentum factors have been used by Keim and Stambaugh (1986), Carhart (1997), and others. We define momentum as $\log p_{t-1} - \log\bar p$, where $\bar p$ is the average index level across the 12-month period ending on date t − 1.

For benchmarking purposes, we first estimate the parametric models, then turn to the new semiparametric estimator. Table 4 shows estimation results for the GARCH(1,1) model without variance-in-mean effects but with $x_t$ entering linearly in the conditional mean specification. Model 1 in the first column includes all explanatory variables in $x_t$.

Table 4
Estimation results based on the parametric GARCH model: $y_t = \mu + x_t'\alpha + \sigma_t\varepsilon_t$, $\sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2$, $\varepsilon_t \sim$ n.i.d.(0, 1). Sample: Jan. 2, 1990–Dec. 31, 2007. For a definition of $y_t$ = S&P 500 and $x_t$ = (Term spread, Default spread, Dividend yield, Momentum), see the body text. Numbers in parentheses are standard errors based on the wild bootstrap.

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| ω | 0.006*** (0.002) | 0.007*** (0.002) | 0.005*** (0.002) | 0.000 (0.000) | 0.001 (0.001) | 0.003* (0.002) |
| γ | 0.055*** (0.009) | 0.054*** (0.008) | 0.056*** (0.008) | 0.061*** (0.009) | 0.078*** (0.010) | 0.068*** (0.008) |
| β | 0.939*** (0.010) | 0.937*** (0.009) | 0.940*** (0.009) | 0.944*** (0.007) | 0.931*** (0.008) | 0.930*** (0.008) |
| µ | −0.277*** (0.067) | 0.041*** (0.012) | 0.072*** (0.018) | −0.015 (0.054) | 0.023 (0.039) | 0.040*** (0.016) |
| Term spread | 0.007 (0.012) | | −0.009 (0.010) | | | |
| Default spread | 0.162** (0.063) | | | 0.090 (0.069) | | |
| Dividend yield | 3.229* (1.894) | | | | 0.970 (1.705) | |
| Momentum | 2.048*** (0.241) | | | | | 0.004 (0.201) |

* Significant at 10%. ** Significant at 5%. *** Significant at 1%.
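A sketch of how the return and the two price-based state variables defined at the start of this subsection could be computed from daily index levels $p_t$ and dividends $d_t$. Approximating the 12-month period by a fixed window of trading days (252 by default) is an assumption, as are the function name and the array layout; the term and default spreads are simple yield differences and are omitted.

```python
import numpy as np


def returns_and_covariates(p, d, window=252):
    """Return y_t, dividend yield and momentum for t = 1, ..., T-1.

    y_t      = log(p_t + d_t) - log(p_{t-1})
    divyield = (sum of d over the `window` days up to and including t-1) / p_{t-1}
    momentum = log p_{t-1} - log(mean of p over the `window` days ending at t-1)
    Entries without a full window of history are NaN.
    """
    p = np.asarray(p, dtype=float)
    d = np.asarray(d, dtype=float)
    n = len(p)
    y = np.log(p[1:] + d[1:]) - np.log(p[:-1])
    divyield = np.full(n, np.nan)
    momentum = np.full(n, np.nan)
    for t in range(window, n):
        dividends = d[t - window:t]  # d_{t-window}, ..., d_{t-1}
        levels = p[t - window:t]     # p_{t-window}, ..., p_{t-1}
        divyield[t] = dividends.sum() / p[t - 1]
        momentum[t] = np.log(p[t - 1]) - np.log(levels.mean())
    return y, divyield[1:], momentum[1:]
```

Both covariates use only information dated t − 1 or earlier, matching the requirement that $x_t$ be known at the start of the interval over which $y_t$ is realized.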
The estimates of the parameters $\theta = (\omega, \gamma, \beta)'$ in the variance equation are strongly significant, with β large (at 0.94) and the sum of γ and β close to unity, which is standard. Dividend yield, default spread, and momentum all enter the conditional mean return with significantly positive coefficients, whereas the term spread is insignificant. Models 2 through 6 in the remaining columns are for the cases where each regressor enters alone, or none is used. The GARCH parameters in the variance equation are relatively robust to these reductions. None of the regressors is significant if entered alone, consistent with the notion that the model is misspecified if any of the regressors (except possibly the term spread) is left out.

Results from estimation of the parametric GARCH(1,1)-M model, obtained by adding the conditional variance linearly to the mean equation, appear in Table 5, laid out as Table 4. The variance equation parameters are largely unaltered by changing the mean specification. The risk-return tradeoff parameter λ, the new parameter compared to Table 4, is significantly positive throughout the table, except in Model 3 (only the term spread included from $x_t$), which is not the preferred model in the table. In Model 1 including all regressors (first column), dividend yield and momentum enter significantly as in Table 4, but the default spread drops out when $\sigma_t^2$ is added to the mean specification. Again, Model 1 is preferred within the table, and the model with M-effect (Table 5) is preferred over the pure GARCH model (Table 4). The point estimate of λ in Model 1, at 0.13, would suggest a moderate degree of relative risk aversion in the representative investor. Table 6 shows the results when $\sigma_t$ replaces $\sigma_t^2$ in the mean equation, and the findings are similar: the risk-return tradeoff is positive and significant, the variance parameters θ are robust to changes in the mean specification, and the preferred model retains, in addition, dividend yield and momentum in the mean equation, whereas any regressor from $x_t$ is insignificant if entered individually.
Standard information criteria (not reported) do not yield much guidance regarding whether it is the conditional variance or its square root (conditional volatility) that enters the conditional mean return.
Table 5
Estimation results based on the parametric GARCH-M($\sigma_t^2$) model: $y_t = \mu + \lambda\sigma_t^2 + x_t'\alpha + \sigma_t\varepsilon_t$, $\sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2$, $\varepsilon_t \sim$ n.i.d.(0, 1). For further details, see Table 4.

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| ω | 0.007*** (0.002) | 0.007*** (0.002) | 0.006*** (0.002) | 0.004*** (0.002) | 0.006*** (0.002) | 0.008*** (0.002) |
| γ | 0.056*** (0.009) | 0.055*** (0.008) | 0.056*** (0.008) | 0.053*** (0.007) | 0.060*** (0.008) | 0.058*** (0.008) |
| β | 0.937*** (0.010) | 0.936*** (0.009) | 0.938*** (0.009) | 0.943*** (0.008) | 0.935*** (0.008) | 0.935*** (0.008) |
| µ | −0.387*** (0.069) | 0.011 (0.018) | 0.048** (0.023) | −0.041 (0.054) | 0.047 (0.051) | 0.031 (0.026) |
| λ | 0.125*** (0.028) | 0.048** (0.022) | 0.038 (0.024) | 0.054** (0.025) | 0.086*** (0.029) | 0.090*** (0.028) |
| Term spread | 0.009 (0.012) | | −0.009 (0.010) | | | |
| Default spread | 0.085 (0.066) | | | 0.085 (0.075) | | |
| Dividend yield | 6.613*** (2.065) | | | | −1.850 (1.748) | |
| Momentum | 2.378*** (0.241) | | | | | −0.133 (0.215) |

* Significant at 10%; ** Significant at 5%; *** Significant at 1%.

Table 6
Estimation results based on the parametric GARCH-M($\sigma_t$) model: $y_t = \mu + \lambda\sqrt{\sigma_t^2} + x_t'\alpha + \sigma_t\varepsilon_t$, $\sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2$, $\varepsilon_t \sim$ n.i.d.(0, 1). For further details, see Table 4.

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| ω | 0.007*** (0.002) | 0.007*** (0.002) | 0.006*** (0.002) | 0.004** (0.002) | 0.006*** (0.002) | 0.008*** (0.002) |
| γ | 0.055*** (0.009) | 0.055*** (0.008) | 0.057*** (0.008) | 0.053*** (0.007) | 0.060*** (0.008) | 0.057*** (0.007) |
| β | 0.937*** (0.010) | 0.936*** (0.009) | 0.938*** (0.009) | 0.944*** (0.008) | 0.935*** (0.008) | 0.936*** (0.008) |
| µ | −0.442*** (0.078) | −0.024 (0.035) | 0.023 (0.040) | −0.078 (0.054) | −0.027 (0.074) | −0.058 (0.047) |
| λ | 0.198*** (0.055) | 0.086** (0.043) | 0.065 (0.047) | 0.097* (0.051) | 0.167*** (0.060) | 0.191*** (0.055) |
| Term spread | 0.008 (0.012) | | −0.010 (0.010) | | | |
| Default spread | 0.089 (0.067) | | | 0.083 (0.076) | | |
| Dividend yield | 6.030*** (2.063) | | | | −1.749 (1.780) | |
| Momentum | 2.268*** (0.240) | | | | | −0.114 (0.213) |

* Significant at 10%. ** Significant at 5%. *** Significant at 1%.
Our new semiparametric estimator is applied next, and the results are shown in Table 7. The model with all explanatory variables included (first column) is clearly preferred over that without $x_t$ in the mean equation, thus verifying the empirical relevance of our generalization of the semiparametric model to include exogenous linear regressors. In the preferred specification (Model 1), momentum enters positively, as in the parametric models and in the semiparametric model when entered alone (Model 6), whereas the default spread now regains significance (it was lost in the parametric case when adding the M-effect, moving from Table 4 to Table 5), and dividend yield gets a negative coefficient. The results make sense: a higher credit or default spread indicates increased risk, so required return is up; similarly, momentum is expected to enter positively, whereas expected returns should be down if more is paid out in the form of dividends, consistent with dividends being valued positively by investors. Thus, the semiparametric approach makes a difference for inferences, and the resulting estimates appear plausible. Fig. 7 shows the estimated in-mean risk-return tradeoff function µ(·) from Model 1, Table 7, along with 5% and 95% quantile curves (the argument $\sigma^2$ is on the horizontal axis). From the figure, the estimated effect is initially decreasing, then increasing and convex.
The bias-corrected kernel estimator is also depicted. As expected, the estimated bias appears near the boundaries and where the curvature of the estimated risk-return tradeoff is greatest. However, although important, the bias seems to be modest in our application. Also shown are the parametric fits (Model 1 from Tables 5 and 6) with either variance or volatility in mean, producing linear or even concave curves. This is strong evidence against the parametric specifications based on CAPM and ICAPM. The results using our new semiparametric estimator show that the large variance events are the ones that matter for conditional mean returns. We also show the estimated relation using the approach proposed by Conrad and Mammen (2008). Perhaps surprisingly, this differs significantly from our efficient semiparametric estimator. For comparison, Table 8 shows the semiparametric point estimates based on the Conrad and Mammen (2008) approach, where their method has been adapted to our model specification. The reported standard errors are based on the wild bootstrap as in Tables 4 through 7, again for comparison purposes. The point estimates in Tables 7 and 8 are similar in terms of their signs and magnitudes, although, as expected, the estimated standard errors based on the efficient semiparametric estimator presented in Table 7 are smaller.
Table 7
Estimation results based on the efficient semiparametric estimator (CDI) for the model: $y_t = \mu(\sigma_t^2) + x_t'\alpha + \sigma_t\varepsilon_t$, $\sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2$, $\varepsilon_t \sim$ n.i.d.(0, 1). For further details, see Table 4.

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| ω | 0.006 (0.017) | 0.006** (0.003) | 0.006*** (0.002) | 0.006** (0.002) | 0.006*** (0.002) | 0.006*** (0.003) |
| γ | 0.054*** (0.008) | 0.055*** (0.007) | 0.055*** (0.008) | 0.055*** (0.007) | 0.055*** (0.007) | 0.054*** (0.007) |
| β | 0.939*** (0.008) | 0.940*** (0.008) | 0.940*** (0.008) | 0.940*** (0.008) | 0.940*** (0.008) | 0.939*** (0.008) |
| Term spread | 0.018 (0.020) | | −0.003 (0.006) | | | |
| Default spread | 0.390*** (0.148) | | | 0.002 (0.008) | | |
| Dividend yield | −1.995*** (0.468) | | | | 0.081 (0.467) | |
| Momentum | 2.565*** (0.397) | | | | | 1.322*** (0.165) |
* Significant at 10%; ** Significant at 5%; *** Significant at 1%.

Table 8
Estimation results based on the semiparametric estimator proposed by Conrad and Mammen, adapted to the model: $y_t = \mu(\sigma_t^2) + x_t'\alpha + \sigma_t\varepsilon_t$, $\sigma_t^2 = \omega + \gamma(\sigma_{t-1}^2\varepsilon_{t-1}^2) + \beta\sigma_{t-1}^2$, $\varepsilon_t \sim$ n.i.d.(0, 1). For further details, see Table 4.

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| ω | 0.003 (0.017) | 0.003** (0.001) | 0.003** (0.001) | 0.003*** (0.001) | 0.003*** (0.001) | 0.003*** (0.001) |
| γ | 0.041*** (0.006) | 0.056*** (0.006) | 0.046*** (0.017) | 0.046*** (0.007) | 0.046*** (0.007) | 0.045*** (0.007) |
| β | 0.956*** (0.008) | 0.938*** (0.008) | 0.952*** (0.007) | 0.952*** (0.008) | 0.952*** (0.007) | 0.954*** (0.007) |
| Term spread | 0.024 (0.038) | | −0.005 (0.007) | | | |
| Default spread | 0.322* (0.227) | | | −0.002 (0.005) | | |
| Dividend yield | −2.053* (1.149) | | | | −0.109 (0.437) | |
| Momentum | 2.722*** (0.504) | | | | | 1.845*** (0.256) |

* Significant at 10%. ** Significant at 5%. *** Significant at 1%.
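Finally, we consider a few goodness-of-fit measures that can be computed for all the estimated models:

$$\mathrm{bias(mean)} = \frac{1}{T}\sum_{t=1}^T \big(y_t - x_t'\hat\alpha - \hat\mu_t\big),$$
$$\mathrm{MSE(mean)} = \frac{1}{T}\sum_{t=1}^T \big(y_t - x_t'\hat\alpha - \hat\mu_t\big)^2,$$
$$\mathrm{bias(variance)} = \frac{1}{T}\sum_{t=1}^T \Big[\big(y_t - x_t'\hat\alpha - \hat\mu_t\big)^2 - \hat\sigma_t^2\Big],$$
$$\mathrm{MSE(variance)} = \frac{1}{T}\sum_{t=1}^T \Big[\big(y_t - x_t'\hat\alpha - \hat\mu_t\big)^2 - \hat\sigma_t^2\Big]^2.$$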
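The four measures translate directly into array operations. A minimal sketch with hypothetical argument names, in which `xalpha` holds $x_t'\hat\alpha$, `mu` the fitted $\hat\mu_t$ and `s2` the fitted $\hat\sigma_t^2$:

```python
import numpy as np


def goodness_of_fit(y, xalpha, mu, s2):
    """bias/MSE of the mean residual and of the squared residual minus sigma_t^2."""
    r = y - xalpha - mu  # mean-equation residual
    v = r ** 2 - s2      # squared residual minus fitted variance
    return {
        "bias_mean": float(r.mean()),
        "mse_mean": float(np.mean(r ** 2)),
        "bias_variance": float(v.mean()),
        "mse_variance": float(np.mean(v ** 2)),
    }


print(goodness_of_fit(np.array([1.0, 2.0]), np.zeros(2), np.array([0.0, 1.0]), np.array([1.0, 2.0])))
# {'bias_mean': 1.0, 'mse_mean': 1.0, 'bias_variance': -0.5, 'mse_variance': 0.5}
```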
The results on the goodness-of-fit measures are reported in Table 9. The most noticeable findings are for the variance-related measures. The Conrad and Mammen (2008) estimator exhibits the poorest performance. The other four estimators perform almost equally well in terms of the bias(variance) measure, but our efficient semiparametric estimators (with and without bias correction) outperform the parametric estimators in terms of the MSE(variance) measure. All in all, the empirical application verifies the usefulness of our new semiparametric estimator. The parametric alternatives are misspecified, the risk-return relation is positively sloped at high risk levels and convex in shape, and certain economic conditioning variables are relevant in the mean specification. To analyze this issue further, it would be interesting to compare the semiparametric estimator of the risk-return relation with the nonparametric estimator based on "realized" variance/volatility.

Fig. 7. Parametric and semiparametric estimates of $\mu(\sigma_t^2)$ based on the model for the S&P 500 with covariates (defined in main text) over the period Jan. 2, 1990–Dec. 31, 2007. All models assume that data are generated according to $y_t = \mu(\sigma_t^2) + x_t'\alpha + \varepsilon_t\sigma_t$, $\sigma_t^2 = \omega + \gamma y_{t-1}^2 + \beta\sigma_{t-1}^2$, $\varepsilon_t \sim$ n.i.d.(0, 1). In addition, GARCH-M($\sigma^2$) assumes $\mu(\sigma_t^2) = \mu + \lambda\sigma_t^2$, whereas GARCH-M($\sigma$) assumes $\mu(\sigma_t^2) = \mu + \lambda\sqrt{\sigma_t^2}$. CDI denotes the efficient semiparametric estimator. The number of Monte Carlo replications equals 999. The figure shows the estimates of $\mu(\sigma_t^2)$ (y-axis) for alternative values of $\sigma_t^2$ (x-axis): medians for all estimators, plus the 5% (CI low) and 95% (CI high) quantiles for the CDI estimator over the simulated distributions.
Table 9
Alternative goodness of fit measures. The models are estimated based on the sample Jan. 2, 1990–Dec. 31, 2007, with $y_t$ = S&P 500 and the covariates $x_t$ = (Term spread, Default spread, Dividend yield, Momentum). The measures are defined as bias(mean) $= \frac{1}{T}\sum_{t=1}^T (y_t - x_t'\hat\alpha - \hat\mu_t)$, MSE(mean) $= \frac{1}{T}\sum_{t=1}^T (y_t - x_t'\hat\alpha - \hat\mu_t)^2$, bias(variance) $= \frac{1}{T}\sum_{t=1}^T [(y_t - x_t'\hat\alpha - \hat\mu_t)^2 - \hat\sigma_t^2]$, and MSE(variance) $= \frac{1}{T}\sum_{t=1}^T [(y_t - x_t'\hat\alpha - \hat\mu_t)^2 - \hat\sigma_t^2]^2$.

Rows: Bias (mean), MSE (mean), Bias (variance), MSE (variance).
Semiparam. GARCH-M (Bias corrected)
Conrad–Mammen
GARCH-M (σ 2 )
GARCH-M (σ )
0.006 0.964 −0.005 4.515
−0.020
0.000 0.991 0.104 5.085
−0.034
−0.031
0.965 0.006 4.629
−0.002
0.963 4.477
,
.
Semiparam. GARCH-M
−0.006
2
0.968 4.765
This could serve as a general robustness check of the model specification. Our empirical findings also tentatively suggest that it is indeed relevant to consider the GARCH specification where the term $y_{t-1}^2$ enters the conditional variance (it is statistically significant when the models are fitted) instead of $e_{t-1}^2$, as traditionally assumed. Furthermore, even if the GARCH model uses $e_{t-1}^2$, our semiparametric estimator may serve as an auxiliary model, as it is much more stable numerically and is fully developed in terms of the asymptotics. This facilitates indirect inference procedures for GARCH models. Formal work on these ideas is in progress.
5. Conclusions

We have proposed a new semiparametric estimator for an empirical asset pricing model with general nonparametric risk-return tradeoff and a GARCH process for the underlying volatility. The estimator does not rely on any initial parametric estimator of the conditional mean function, and this feature facilitates the derivation of asymptotic theory under possible nonlinearity of unspecified form in the risk-return tradeoff. Using the profile likelihood approach, we have shown that our estimator is, under stated conditions, consistent and asymptotically normal, and that it achieves the semiparametric lower bound, including in our generalized semiparametric model where we also allow for the presence of exogenous variables in the mean equation. In Theorems A1 and B1, we provide the first order asymptotic theory for a parametric GARCH-M-type model. To the best of our knowledge, this is a separate contribution. A common problem with most GARCH-type models is that their validity hinges on correct specification of the conditional mean function. Without consistent estimation of the conditional mean, the estimation of conditional variance could be very misleading (see, for example, Escanciano (2008)). One important feature of our estimator is that it is flexible and does not depend on correct specification of the conditional mean. Here, we use a kernel-based smoother. In practice, this may be biased near the boundaries or in regions where data are sparse. A local linear smoother would probably be more robust, and also feasible to implement, but would not fit equally well within our theoretical setup, and the asymptotics would be based on different principles. Although relevant, this extension is outside the scope of the present paper. The sampling experiment shows that our estimator is well-behaved in a number of benchmark scenarios, with performance similar to that of the iterative approach with parametric initial estimate of Conrad and Mammen (2008).

In the empirical application to daily stock market returns, the estimators differ significantly, and the evidence indicates superior performance of our efficient estimator in terms of bias and mean squared error. Finally, on the substantive side, the empirical results suggest that the linear relation between expected return and variance or volatility from the literature is misspecified. The true relation appears to be initially decreasing, then sharply increasing in volatile states, and this could explain the disagreement in the literature on the sign and significance of the relation.

Acknowledgments

We are very grateful to two referees, Valentina Corradi, Enno Mammen, Joon Y. Park, and participants at the 2008 SETA Conference in Seoul, at the 2008 Econometric Society European Meetings in Milan, and at a seminar at Oxford University for useful comments, to Christian Bach for research assistance, and to the Center for Research in Econometric Analysis of TimE Series (CREATES), funded by the Danish National Research Foundation, and the MSU Intramural Research Grants Program for research support.

Appendix A

As indicated below, parts of the proofs are collected in a Technical Appendix available from the authors upon request.
Assumption A.
A1. $\{\sigma_t\}$ is a strictly stationary process satisfying $\varepsilon_t \sim$ i.i.d.(0, 1) with $E(\varepsilon_t^4) - 1 = \zeta < \infty$, and $E|\varepsilon_t|^{2r} < \infty$ for some $r > 2$. Furthermore, $\{\sigma_t\}$ is a sequence of strong mixing random variables with mixing numbers $\alpha_m$, $m = 1, 2, \ldots$, that satisfy $\alpha_m \le C m^{-(4r-2)/(2r-2)-\delta}$ for positive C and δ, as $m \to \infty$. Moreover, ω, γ and β are strictly positive, $\sigma_0^2$ is a finite constant, $\phi \in \Phi$ and $\phi_0 \in \mathrm{int}(\Phi)$. Finally, $E\ln(\gamma\varepsilon_t^2 + \beta) < 0$, and $\sigma_0^2(\phi)$ is a drawing from the stationary distribution.
A2. M denotes a compact subset of the real line such that $\mu_\phi(\sigma_t^2) \in \mathrm{int}(M)$ for all $\sigma_t^2$.
A3. $\mu_\phi(\cdot)$ is three times differentiable with
2 y − µ σ 2 ∂µφ σt φ t t ∂ω E < ∞, σt2 2 ∂µφ σt yt − µφ σ 2 t ∂γ E < ∞, σt2 ∂µφ σt2 yt − µφ σ 2 t ∂β E < ∞. 2 σt We also assume that there are constants C1 , C2 , C3 , C4 , C5 and C6 so that
2 ∂µφ σt2
T 1
∂ω
T t =1
σt2
2 ∂µφ σt2
T 1
∂γ
T t =1
σ
2 t
p
|φ=φ0 → C1 < ∞,
p
|φ=φ0 → C2 < ∞,
2 ∂µφ σt2
T 1
∂β
T t =1
σt2
p
|φ=φ0 → C3 < ∞,
470
B.J. Christensen et al. / Journal of Econometrics 167 (2012) 458–472
T 1
∂µφ σt2 ∂µφ σt2 ∂β
Second order derivatives. The second order derivatives (in the Technical Appendix) are needed in Lemma 3 below on the convergence of the Hessian.
p
Lemma 3. Under Assumption A, and with the second order derivatives evaluated at the true parameters,
∂µφ σt2 ∂µφ σt2 ∂ω
∂γ
|φ=φ0 → C5 < ∞,
σt2
T t =1 T 1
p
|φ=φ0 → C4 < ∞,
σt2
T t =1 T 1
∂γ
∂µφ σt2 ∂µφ σt2 ∂ω
∂β
σ
T t =1
2 t
p
|φ=φ0 → C6 < ∞.
Also, there exists a neighborhood N (φ0 ) given by (21) in Definition 1 below for which (a)
(b)
(c)
(d)
(e)
(f)
supφ∈N (φ0 ) T1 supφ∈N (φ0 ) T1 supφ∈N (φ0 ) T1 supφ∈N (φ0 ) T1 supφ∈N (φ0 ) T1 supφ∈N (φ0 ) T1
T
t =1
T
t =1
T
t =1
T
t =1
T
t =1
T
t =1
( ) ∂µφ (σt2 ) 1 T ∂k ≤ T t =1 w11t , σt2 ∂µφ (σt2 ) ∂µφ (σt2 ) ∂σt2 T ∂i ∂j ∂k ≤ T1 t =1 w12t , σt4 ∂µφ (σt2 ) ∂ 2 σt2 T ∂k ∂ i∂ j ≤ T1 t =1 w13t , σt4 2 2 ∂µφ (σt ) ∂σt ∂σt2 T ∂k ∂i ∂j ≤ T1 t =1 w14t , σt6 ∂ 2 µφ (σt2 ) ∂σt2 T ∂ i∂ j ∂k ≤ T1 t =1 w15t , σt4 ∂ 3 µφ (σt2 ) 1 T ∂ i∂ j∂ k ≤ T t =1 w16t , σt2 ∂ 2 µφ σt2 ∂ i∂ j
where w11t , . . . , w16t are stationary and have finite moments, for any i, j, k = ω, γ , β, E (wlt ) = Ml < ∞, ∀l = 11, . . . , 16. Finally, 1 T
T
t =1
a.s.
wlt −→ Ml , ∀l = 11, . . . , 16.
Proof of Lemma 1. The proof of Lemma 1 is in the Technical Appendix. Proof of Theorem A1. The proof follows from Lemmas 2–4 given in the proof of Theorem B1 below. Lemmas 2–4 are proved in the Technical Appendix. Proof of Theorem B1. The proof of part (a) follows from Lemmas 2–4 below. First order derivatives. The first order derivatives (in the Technical Appendix) are needed in Lemma 2 below on the CLT for the score function (see Brown (1971)). It is possible to prove the asymptotic negligibility of the initial value σ02 , following Lumsdaine (1996, Lemma 6). We then have the following CLT for the score.
(a)
1 T
(b)
1 T
(c)
1 T
(d)
1 T
(e)
1 T
(f)
1 T
as T −→ ∞. Third order derivatives. The third order derivatives (in the Technical Appendix) are needed in Lemma 4 below on their uniform boundedness. As in Jensen and Rahbek (2004b), we consider suitable parameter bounds. Definition 1. We consider bounds on each parameter in φ0 ,
ωL < ω0 < ωU ,
ζ s1t → N 0, + C1 , √ 4ω02 T t =1 T 1 ζ d s2t → N 0, + C2 , √ 4γ02 T t =1 T 1 ζ (1 + µ1 ) µ2 d s3t → N 0, + C3 , √ 4β02 (1 − µ1 ) (1 − µ2 ) T t =1 i with µi = E β0 / γ0 εt2 + β0 , i = 1, 2 as T −→ ∞. T 1
d
βL < β0 < βU ,
γL < γ0 < γU ,
and define the neighborhood N (φ0 ) around φ0 as N (φ0 ) = {φ : ωL ≤ ω ≤ ωU , βL ≤ β ≤ βU , γL < γ < γU } .
(21)
Lemma 4. Under Assumption A, there exists a neighborhood N (φ0 ) of the type given by (21) in Definition 1 for which
∂ (a) supφ∈N (φ0 ) ∂ω 3 LT φ, µφ (·) ≤
(b) (c) (d) (e) (f) (g) (h) (i) (j)
Lemma 2. Under Assumption A, and with the first order derivatives evaluated at the true parameters,
p ∂2 − ∂ω −→ 2ω1 2 + C1 > 0, 2 LT φ, µφ (·) |φ=φ0 0 2 p − ∂γ∂ 2 LT φ, µφ (·) |φ=φ0 −→ 2γ1 2 + C2 > 0, 0 p +µ1 )µ2 ∂2 + C3 > 0, − ∂β −→ 2β 2 ((11−µ 2 LT φ, µφ (·) |φ=φ0 1 )(1−µ2 ) 0 p µ1 ∂2 − LT φ, µφ (·) |φ=φ0 −→ 2γ β (1−µ ) + C4 , 0 0 1 p ∂γ ∂β 2 ∂ − ∂ω∂γ LT φ, µφ (·) |φ=φ0 −→ 2ω1 γ + C5 , 0 0 p ∂2 − ∂w∂β LT φ, µφ (·) |φ=φ0 −→ 2ω0 β0µ(11−µ1 ) + C6 ,
3
t =1 w1t , 3 1 T ∂ supφ∈N (φ0 ) ∂β 3 LT φ, µφ (·) ≤ T t =1 w2t , 3 1 T 1 ∂ supφ∈N (φ0 ) T ∂γ 3 LT φ, µφ (·) ≤ T t =1 w3t , 1 T 1 ∂3 supφ∈N (φ0 ) T ∂ω2 ∂β LT φ, µφ (·) ≤ T t =1 w4t , 1 T 1 ∂3 supφ∈N (φ0 ) T ∂ω2 ∂γ LT φ, µφ (·) ≤ T t =1 w5t , 1 T 1 ∂3 supφ∈N (φ0 ) T ∂β 2 ∂γ LT φ, µφ (·) ≤ T t =1 w6t , 1 T 1 ∂3 supφ∈N (φ0 ) T ∂ω∂β 2 LT φ, µφ (·) ≤ T t =1 w7t , 3 ∂ 1 T supφ∈N (φ0 ) T1 ∂ω∂γ 2 LT φ, µφ (·) ≤ T t =1 w8t , 3 ∂ 1 T supφ∈N (φ0 ) T1 ∂β∂γ 2 LT φ, µφ (·) ≤ T t =1 w9t , 3 T ∂ supφ∈N (φ0 ) T1 ∂ω∂β∂γ LT φ, µφ (·) ≤ T1 t =1 w10t ,
1 T
T
where w1t , . . . , w9t and w10t are stationary and have finite moments, E (wit ) = Mi < ∞, ∀i = 1, . . . , 10. Furthermore 1 T
T
t =1
a.s.
wit −→ Mi , ∀i = 1, . . . , 10.
Part (b) of Theorem B1 follows directly from Theorem A1.
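The constants in Lemmas 2 and 3 can be made concrete with a short Python sketch. It uses Monte Carlo to evaluate $\zeta$, $\mu_1$, $\mu_2$ and the moment condition $\mathrm{E}\ln(\gamma_0\varepsilon_t^2+\beta_0)<0$ from Assumption A1. It then computes the parts of the score-variance and Hessian limits that do not involve $C_1,\ldots,C_6$, since those terms depend on $\mu_\phi$ and are set aside here. The following are illustrative assumptions, not values from the paper: Gaussian $\varepsilon_t$, the parameter values $\omega_0=0.05$, $\gamma_0=0.1$, $\beta_0=0.85$, and the function names.

```python
import numpy as np


def innovation_moments(gamma0, beta0, n_draws=400_000, seed=0):
    """Monte Carlo moments entering Assumption A1 and Lemmas 2-3.

    Assumes standard normal innovations eps_t (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    eps2 = rng.standard_normal(n_draws) ** 2
    ratio = beta0 / (gamma0 * eps2 + beta0)    # beta0 / (gamma0 eps^2 + beta0)
    return {
        "zeta": np.mean(eps2**2) - 1.0,        # E(eps^4) - 1
        "mu1": np.mean(ratio),                 # mu_1
        "mu2": np.mean(ratio**2),              # mu_2
        "log_moment": np.mean(np.log(gamma0 * eps2 + beta0)),  # A1: must be < 0
    }


def volatility_limits(omega0, gamma0, beta0, mom):
    """Score variances (Lemma 2) and Hessian limits (Lemma 3) with the
    mean-function terms C_1, ..., C_6 set to zero; order (omega, gamma, beta)."""
    zeta, mu1, mu2 = mom["zeta"], mom["mu1"], mom["mu2"]
    b_term = (1 + mu1) * mu2 / (beta0**2 * (1 - mu1) * (1 - mu2))
    score_var = zeta / 4 * np.array([1 / omega0**2, 1 / gamma0**2, b_term])
    h_wg = 1 / (2 * omega0 * gamma0)
    h_wb = mu1 / (2 * omega0 * beta0 * (1 - mu1))
    h_gb = mu1 / (2 * gamma0 * beta0 * (1 - mu1))
    hessian = np.array([
        [1 / (2 * omega0**2), h_wg, h_wb],
        [h_wg, 1 / (2 * gamma0**2), h_gb],
        [h_wb, h_gb, b_term / 2],
    ])
    return score_var, hessian


mom = innovation_moments(gamma0=0.1, beta0=0.85)
score_var, hessian = volatility_limits(0.05, 0.1, 0.85, mom)
print(mom)
print(score_var)
print(hessian)
```

The diagonal score variances equal $\zeta/2$ times the corresponding Hessian limits. Under Gaussian innovations $\zeta=2$, so the two coincide. With fat-tailed $\varepsilon_t$, $\zeta>2$ and the score variances exceed the Hessian limits. The terms $C_1$, $C_2$ and $C_3$ enter both lemmas identically.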
Proofs of Theorems A2 and B2. These follow from the asymptotic equivalence result of Severini and Wong (1992, p. 1775), which is stated in the following corollary.

Corollary 1. Under Assumptions A–C, maximizing the generalized profile quasi-log-likelihood function $L_T(\phi,\hat\mu_\phi(\cdot))$, that is, $L_T$ evaluated at the kernel estimate $\hat\mu_\phi(\cdot)$, is asymptotically equivalent to maximizing $L_T(\phi,\mu_\phi(\cdot))$.
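Corollary 1 suggests a computational recipe. For each candidate $\phi$, compute $\sigma_t^2(\phi)$, replace $\mu_\phi$ by its kernel estimate, and evaluate the quasi-log-likelihood. The Python sketch below covers only the evaluation step, and it rests on several assumptions of ours. First, it takes $L_T$ to be the Gaussian quasi-log-likelihood up to additive constants. Second, it uses the inverse-variance-weighted kernel form of Assumption D2, with optional regressors $x_t$ and a Gaussian kernel. Third, it takes the variance path $\sigma_t^2(\phi)$ as given, because the variance recursion is not restated in this appendix. The function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, an illustrative choice of kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def mu_hat(s2_eval, s2, y, h, x=None, alpha=None):
    """Kernel estimate of mu_phi at s2_eval with weights sigma_j^{-2} K(.),
    applied to y_j - x_j' alpha (regressors are optional)."""
    resid = y if x is None else y - x @ alpha
    k = gaussian_kernel((s2_eval[:, None] - s2[None, :]) / h) / s2[None, :]
    return (k @ resid) / k.sum(axis=1)


def profile_quasi_loglik(s2, y, h, x=None, alpha=None):
    """Gaussian quasi-log-likelihood (constants dropped) with mu_phi replaced
    by its kernel estimate; s2 holds sigma_t^2(phi) for one candidate phi."""
    m = mu_hat(s2, s2, y, h, x, alpha)
    u = y - m if x is None else y - m - x @ alpha
    return -0.5 * np.sum(np.log(s2) + u**2 / s2)


# Toy check: constant variance path and y_t - x_t' alpha constant, so the
# kernel estimate fits exactly and only the log-variance term remains.
T = 50
s2 = np.full(T, 0.5)
x = np.linspace(-1.0, 1.0, T)[:, None]
alpha = np.array([2.0])
y = 0.3 + x @ alpha
print(profile_quasi_loglik(s2, y, h=0.2, x=x, alpha=alpha))  # -0.5*T*log(0.5), about 17.33
```

In a full implementation, this function would be evaluated on the variance path implied by each candidate $\phi$ and maximized numerically. The kernel estimate is recomputed for every $\phi$, which is what distinguishes the profile approach from plugging in a fixed first-stage fit.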
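Assumption D.

D1. $\mu_\phi(\cdot)$ denotes a smooth mean function that is three times differentiable with

$$
\mathrm{E}\left|\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\theta}\,\frac{y_t-\mu_\phi(\sigma_t^2)-x_t'\alpha}{\sigma_t^2}\right|<\infty,\qquad \theta=\omega,\gamma,\beta,\alpha.
$$

We also assume that there are constants $C_1,C_2,\ldots,C_{10}$ such that

$$
\begin{aligned}
\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^2}\left(\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\omega}\right)^{2}\bigg|_{\phi=\phi_0} &\xrightarrow{p} C_1<\infty,\\
\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^2}\left(\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\gamma}\right)^{2}\bigg|_{\phi=\phi_0} &\xrightarrow{p} C_2<\infty,\\
\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^2}\left(\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\beta}\right)^{2}\bigg|_{\phi=\phi_0} &\xrightarrow{p} C_3<\infty,\\
\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^2}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\gamma}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\beta}\bigg|_{\phi=\phi_0} &\xrightarrow{p} C_4<\infty,\\
\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^2}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\omega}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\gamma}\bigg|_{\phi=\phi_0} &\xrightarrow{p} C_5<\infty,\\
\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^2}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\omega}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\beta}\bigg|_{\phi=\phi_0} &\xrightarrow{p} C_6<\infty,\\
\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^2}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\alpha}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\alpha'}\bigg|_{\phi=\phi_0} &\xrightarrow{p} C_7<\infty,\\
\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^2}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\alpha}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\omega}\bigg|_{\phi=\phi_0} &\xrightarrow{p} C_8<\infty,\\
\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^2}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\alpha}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\gamma}\bigg|_{\phi=\phi_0} &\xrightarrow{p} C_9<\infty,\\
\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^2}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\alpha}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial\beta}\bigg|_{\phi=\phi_0} &\xrightarrow{p} C_{10}<\infty.
\end{aligned}
$$

Also, there exists a neighborhood $N(\phi_0)$, given by (21) in Definition 1, for which

(a) $\sup_{\phi\in N(\phi_0)}\left|\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^2}\frac{\partial^2\mu_\phi(\sigma_t^2)}{\partial i\,\partial j}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial k}\right|\le\frac{1}{T}\sum_{t=1}^{T}w_{11t}$,

(b) $\sup_{\phi\in N(\phi_0)}\left|\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^4}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial i}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial j}\frac{\partial\sigma_t^2}{\partial k}\right|\le\frac{1}{T}\sum_{t=1}^{T}w_{12t}$,

(c) $\sup_{\phi\in N(\phi_0)}\left|\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^4}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial k}\frac{\partial^2\sigma_t^2}{\partial i\,\partial j}\right|\le\frac{1}{T}\sum_{t=1}^{T}w_{13t}$,

(d) $\sup_{\phi\in N(\phi_0)}\left|\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^6}\frac{\partial\mu_\phi(\sigma_t^2)}{\partial k}\frac{\partial\sigma_t^2}{\partial i}\frac{\partial\sigma_t^2}{\partial j}\right|\le\frac{1}{T}\sum_{t=1}^{T}w_{14t}$,

(e) $\sup_{\phi\in N(\phi_0)}\left|\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^4}\frac{\partial^2\mu_\phi(\sigma_t^2)}{\partial i\,\partial j}\frac{\partial\sigma_t^2}{\partial k}\right|\le\frac{1}{T}\sum_{t=1}^{T}w_{15t}$,

(f) $\sup_{\phi\in N(\phi_0)}\left|\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_t^2}\frac{\partial^3\mu_\phi(\sigma_t^2)}{\partial i\,\partial j\,\partial k}\right|\le\frac{1}{T}\sum_{t=1}^{T}w_{16t}$,

where $w_{11t},\ldots,w_{16t}$ are stationary and have finite moments for any $i,j,k=\omega,\gamma,\beta,\alpha'$, with $\mathrm{E}(w_{lt})=M_l<\infty$ for all $l=11,\ldots,16$. Finally, $\frac{1}{T}\sum_{t=1}^{T}w_{lt}\xrightarrow{a.s.}M_l$ for all $l=11,\ldots,16$.

D2. Define $\hat\mu_\phi(\cdot)$ by

$$
\hat\mu_\phi(\sigma^2)=\frac{\sum_{j}\sigma_j^{-2}\,(y_j-x_j'\alpha)\,K\!\left((\sigma^2-\sigma_j^2)/h_T\right)}{\sum_{j}\sigma_j^{-2}\,K\!\left((\sigma^2-\sigma_j^2)/h_T\right)}.
$$

Proofs of Theorems A3 and B3. These follow from the proofs of Theorems A2 and B2, the change in $T_\phi(y_t)$, and Assumption D.

Appendix B. Supplementary data

Supplementary material related to this article can be found online at doi:10.1016/j.jeconom.2011.09.028.

References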
Ang, A., Hodrick, R.J., Xing, Y., Zhang, X., 2006. The cross-section of volatility and expected returns. Journal of Finance 61, 259–299.
Backus, D.K., Gregory, A.W., 1993. Theoretical relations between risk premiums and conditional variances. Journal of Business and Economic Statistics 11, 177–185.
Berkes, I., Horváth, L., 2004. The efficiency of the estimators of the parameters in GARCH processes. The Annals of Statistics 32 (2), 633–655.
Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327.
Bollerslev, T., Engle, R.F., Wooldridge, J.M., 1988. A capital asset pricing model with time-varying covariances. Journal of Political Economy 96, 116–131.
Bosq, D., 1998. Nonparametric statistics for stochastic processes. In: Lecture Notes in Statistics, second ed. Springer-Verlag.
Brandt, M.W., Kang, Q., 2004. On the relationship between the conditional mean and volatility of stock returns: a latent VAR approach. Journal of Financial Economics 72, 217–257.
Brown, B.M., 1971. Martingale central limit theorems. Annals of Mathematical Statistics 42, 59–66.
Campbell, J.Y., 1987. Stock returns and the term structure. Journal of Financial Economics 18, 373–399.
Campbell, J.Y., 1993. Intertemporal asset pricing without consumption data. American Economic Review 83, 487–512.
Campbell, J.Y., Hentschel, L., 1992. No news is good news: an asymmetric model of changing volatility in stock returns. Journal of Financial Economics 31, 281–318.
Campbell, J.Y., Shiller, R.J., 1988a. The dividend price ratio and expectations of future dividends and discount factors. Review of Financial Studies 1, 195–228.
Campbell, J.Y., Shiller, R.J., 1988b. Stock prices, earnings, and expected dividends. Journal of Finance 43, 661–676.
Carhart, M., 1997. On persistence in mutual fund performance. Journal of Finance 52, 57–82.
Chou, R.Y., 1988. Volatility persistence and stock valuations: some empirical evidence using GARCH. Journal of Applied Econometrics 3, 279–294.
Chou, R.Y., Engle, R.F., Kane, A., 1992. Measuring risk aversion from excess returns on a stock index. Journal of Econometrics 52, 201–224.
Christensen, B.J., Nielsen, M.Ø., 2007. The effect of long memory in volatility on stock market fluctuations. Review of Economics and Statistics 89 (4), 684–700.
Christensen, B.J., Nielsen, M.Ø., Zhu, J., 2010. Long memory in stock market volatility and the volatility-in-mean effect: the FIEGARCH-M model. Journal of Empirical Finance 17, 460–470.
Conrad, C., Mammen, E., 2008. Nonparametric regression on latent covariates with an application to semiparametric GARCH-in-mean models. Working paper, University of Mannheim.
Dahl, C.M., Iglesias, E.M., 2011. Modelling the risk-return trade off when volatility may be non-stationary. Journal of Time Series Econometrics 3 (1), article 10, 1–30.
Engle, R.F., Lilien, D.M., Robins, R.P., 1987. Estimating time-varying risk premia in the term structure: the ARCH-M model. Econometrica 55, 391–407.
Escanciano, J.C., 2008. Joint and marginal specification tests for conditional mean and variance models. Journal of Econometrics 143 (1), 74–87.
Fama, E.F., French, K.R., 1988. Dividend yields and expected stock returns. Journal of Financial Economics 22, 3–25.
Fama, E.F., French, K.R., 1989. Business conditions and expected returns on stock and bonds. Journal of Financial Economics 25, 23–49.
Ghysels, E., Santa-Clara, P., Valkanov, R., 2005. There is a risk-return tradeoff after all. Journal of Financial Economics 76, 509–548.
Glosten, L.R., Jagannathan, R., Runkle, D.E., 1993. On the relation between the expected value and the volatility of the nominal excess return on stocks. Journal of Finance 48, 1779–1801.
Han, H., Park, J.Y., 2008. Time series properties of ARCH processes with persistent covariates. Journal of Econometrics 146 (2), 275–292.
Harrison, P., Zhang, H.H., 1999. An investigation of the risk and return relation at long horizons. Review of Economics and Statistics 81, 399–408.
Hodgson, D.J., Vorkink, K.P., 2003. Efficient estimation of conditional asset pricing models. Journal of Business and Economic Statistics 21 (2), 269–283.
Iglesias, E.M., 2009. Finite sample theory of QMLEs in ARCH models with an exogenous variable in the conditional variance equation. Studies in Nonlinear Dynamics and Econometrics 2, article 6, 1–28.
Jensen, S.T., Rahbek, A., 2004a. Asymptotic normality of the QML estimator of ARCH in the nonstationary case. Econometrica 72 (2), 641–646.
Jensen, S.T., Rahbek, A., 2004b. Asymptotic inference for nonstationary GARCH. Econometric Theory 20 (6), 1203–1226.
Keim, D.B., Stambaugh, R.F., 1986. Predicting returns in the stock and bond markets. Journal of Financial Economics 17, 357–390.
Lehmann, E.L., 1999. Elements of Large Sample Theory. Springer Verlag, New York.
Lettau, M., Ludvigson, S.C., 2010. Measuring and modeling variation in the risk-return tradeoff. In: Ait-Sahalia, Y., Hansen, L.P. (Eds.), Handbook of Financial Econometrics. North-Holland, Amsterdam (Chapter 11).
Ling, S., 2004. Estimation and testing stationarity for double-autoregressive models. Journal of the Royal Statistical Society Series B 66 (1), 63–78.
Lintner, J., 1965. The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Review of Economics and Statistics 47, 13–37.
Linton, O., Perron, B., 2003. The shape of the risk premium: evidence from a semiparametric GARCH model. Journal of Business and Economic Statistics 21 (3), 354–367.
Lumsdaine, R.L., 1996. Asymptotic properties of the maximum likelihood estimator in GARCH (1, 1) and IGARCH (1, 1) models. Econometrica 64 (3), 575–596.
Merton, R., 1973. An intertemporal capital asset pricing model. Econometrica 41, 867–887.
Merton, R., 1980. On estimating the expected return on the market. Journal of Financial Economics 8, 323–361.
Nelson, D.B., 1991. Conditional heteroskedasticity in asset returns: a new approach. Econometrica 59, 347–370.
Poterba, J.M., Summers, L.H., 1986. The persistence of volatility and stock market fluctuations. American Economic Review 76, 1142–1151.
Racine, J.S., 2001. Bias-corrected kernel regression. Journal of Quantitative Economics 17 (1), 25–42.
Schwert, G.W., 1989. Why does stock market volatility change over time? Journal of Finance 44, 1115–1153.
Severini, T.A., Wong, W.H., 1992. Profile likelihood and conditionally parametric models. The Annals of Statistics 20 (4), 1768–1802.
Sharpe, W.F., 1964. Capital asset prices: a theory of market equilibrium under conditions of risk. Journal of Finance 19, 425–442.
Straumann, D., Mikosch, T., 2006. Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: a stochastic recurrence equations approach. The Annals of Statistics 34 (5), 2449–2495.
Su, L., Jin, S., 2010. Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. Journal of Econometrics 157 (1), 18–33.
Sun, Y., Stengos, T., 2006. Semiparametric efficient adaptive estimation of asymmetric GARCH models. Journal of Econometrics 133, 373–386.