Journal of Financial Economics 105 (2012) 131–152
Payout yield, risk, and mispricing: A Bayesian analysis ☆
Jay Shanken (a,*), Ane Tamayo (b,1)
(a) Goizueta Business School, Emory University, 1300 Clifton Road, Atlanta, GA 30322, United States
(b) London School of Economics, Houghton Street, London WC2A 2AE, UK
Article info
Article history: Received 1 April 2011; received in revised form 18 May 2011; accepted 23 May 2011; available online 7 December 2011
JEL classification: G11; G12; C11
Keywords: Predictability; Time-varying risk; Mispricing; Bayesian
Abstract
We develop a simple parametric model in which hypotheses about predictability, mispricing, and the risk-return tradeoff can be evaluated simultaneously, while allowing for time variation in both risk and expected return. Most of the return predictability based on aggregate payout yield is unrelated to market risk. We consider a range of Bayesian prior beliefs about the risk-return tradeoff and the extent to which predictability is driven by mispricing. The impact of these beliefs on an investor’s certainty-equivalent return when choosing between a market index and riskless T-bills is economically significant, in both ex ante and out-of-sample analyses. © 2011 Elsevier B.V. All rights reserved.
1. Introduction
Evidence on the predictability of aggregate stock returns goes back at least to Fama and Schwert (1977), who consider the forecasting power of the short-term interest rate. While bond yield spreads and various financial ratios have also been studied, perhaps no variable has been examined more than the dividend yield, with most studies
☆ We thank Greg Bauer, Bruno Gerard, Chris Jones, Bill Schwert, Alan Scowcroft, Rob Stambaugh, Lance Young, seminar participants at the American Finance Association Meetings, the Inquire Europe/UK Dublin conference, Emory University, INSEAD, the Norwegian School of Management, the Universities of Alberta, Georgia, Pennsylvania, Rochester, South Carolina, and Southern California and, especially, Doron Avramov (the referee) and Jon Lewellen for helpful comments and discussions. We are grateful for the financial support provided by Inquire Europe and Inquire UK.
* Corresponding author. Tel.: +1 404 727 4772.
E-mail addresses: [email protected] (J. Shanken), [email protected] (A. Tamayo).
1 Tel.: +44 2078494689.
0304-405X/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.jfineco.2011.12.002
documenting a positive relation between yield and subsequent return (e.g., Rozeff, 1984; Keim and Stambaugh, 1986; Fama and French, 1988, 1989; Campbell and Shiller, 1988). More recent data seem to weaken the case for predictability (e.g., Welch and Goyal, 2008). However, as Stambaugh (1999) shows, weak frequentist evidence for predictability can be surprisingly strong from a Bayesian perspective with noninformative prior beliefs. While inferences about predictability are of considerable interest, the literature has gone further, linking Bayesian beliefs about predictability with the asset allocation decision of an individual investor. In their pioneering work, Kandel and Stambaugh (1996), henceforth KS, show that predictive regression results that would typically be dismissed as ‘‘statistically insignificant’’ can have important implications for asset allocation between a riskless asset and a stock market index. The stock position becomes more aggressive as the current dividend yield increases, reflecting the higher expected excess return. The analysis in KS is conducted under the assumption that market risk is constant over time. However, as they
recognize, if market risk increases with yield, as some earlier evidence suggests (e.g., Attanasio, 1991; Whitelaw, 1994), the optimal allocation to stock could be less aggressive, or even decrease with yield. KS have inspired many interesting follow-up studies, but the assumption of constant aggregate market risk has invariably been maintained in the literature.2 Thus, a Bayesian analysis of return predictability with changing risk seems long overdue, particularly given the fact that persistent time variation in risk is one feature of the data about which there is no doubt. In this paper, we provide such an analysis, allowing both risk and expected return to vary with yield, and also with a variance predictor based on the prior month’s squared daily market returns (French, Schwert, and Stambaugh, 1987).3,4 Boudoukh et al. (2007) argue that payout yield is a more appropriate predictor than dividend yield, given firms’ propensity to substitute share repurchases for cash dividends since the 1980s, and they obtain stronger predictability results with this measure. Therefore, we use total payout yield in our empirical analysis. Economic theory provides some guidance in thinking about the association between our predictors and expected returns. Classical equilibrium models (e.g., Merton, 1980) with fully rational investors imply that the market risk premium is proportional to the conditional variance of market return, a logical starting point for analyzing the risk-return relation. In this case, time variation in expected returns will be tracked by financial ratios like payout yield insofar as an increase in risk simultaneously lowers the current price and raises the yield through a discount rate effect (e.g., Fama and French, 1988). 
2 See, for example, Tamayo (2002), Avramov (2002, 2004), Avramov and Wermers (2006), and Wachter and Warusawitharana (2009). Multiperiod issues are addressed by Stambaugh (1999), Barberis (2000), and Brandt et al. (2005).
3 Pastor and Stambaugh (2001) allow for changes in volatility across regimes, rather than as a function of observable predictive variables. They also work with priors that incorporate a belief about the proportional relation between the market risk premium and variance, but do not explore mispricing or asset allocation issues.
4 Wang (2004) models conditional variance as an ARCH(1) process, while Johannes, Polson, and Stroud (2002) model expected return and variance as mean-reverting processes, but do not condition on observable predictive variables. Neither paper includes variance in the expected return relation and so issues related to mispricing versus risk are not considered.
Alternatively, advocates of behavioral finance (e.g., DeBondt and Thaler, 1985; Shiller, 2000) argue that yield-related predictability could result from irrational swings in investor sentiment, with underpricing (overpricing) raising (lowering) the yield and subsequent returns. Thus, an optimal asset allocation for a rational investor will ultimately depend on both the economic source of the return predictability and the nature of the risk-return relation. With this in mind, we posit a statistical model that permits a parsimonious decomposition of return predictability, with one component reflecting the impact of yield on total market risk and the associated risk premium, and the other capturing the incremental impact of yield, controlling for market risk. We think of this incremental component as due mainly to mispricing. However, it is a
‘‘catch-all’’ that could also relate to intertemporal hedging demands that are correlated with yield (e.g., Merton, 1973) or to model misspecification due to nonlinearity of expected return in risk, or the exclusion of relevant predictors.5 From the perspective of the single-period investor that we consider, however, the mispricing– hedging distinction is irrelevant. In the asset pricing literature, it seems clear that researchers tend to view empirical evidence about returns through different lenses, depending on their classical or behavioral orientation. Indeed, this orientation, which can be modeled in terms of a Bayesian prior belief, would appear to have an important effect on the interpretation of empirical results. This is due, in part, to the considerable volatility of stock returns, which leaves substantial uncertainty about the precise nature of predictive return relations, even after a careful analysis of fairly long time series. In a Bayesian setting, this allows for the possibility that differences in priors will be influential in forming beliefs and making portfolio decisions. Thus, it is interesting to ask: how, in light of the apparent differences in their priors, would the posterior beliefs and investment decisions of Eugene Fama and Richard Thaler differ after observing the same evidence on stock return predictability? And, perhaps more importantly, how much do these beliefs matter? Or, in quantitative terms: if each individual were forced to hold the other’s optimal portfolio, how much would their expected utilities decline? Our paper makes some significant strides toward answering these fundamental questions. Of course, there is no unique prior that can be considered the ‘‘correct’’ one for an investor, behavioral or classical. Therefore, we have tried to formulate a range of priors that plausibly reflects much of the diversity of viewpoints that exist on return predictability and the risk-return tradeoff. 
5 Guo and Whitelaw (2006) conclude that hedging demands are more important than the variance effect on expected return in a framework that complements ours by imposing the Merton (1973) model, implicitly ruling out mispricing.
In addition to a diffuse (noninformative) prior and a prior that rules out mispricing entirely, we include two mispricing priors. The first is motivated by the premise that investors overreact to new information (DeBondt and Thaler, 1985; Lakonishok, Shleifer, and Vishny, 1994), while the second allows for mispricing, but is more agnostic as to the form it might take. With regard to the risk-return relation, our baseline prior assumes that a positive link between market variance and the risk premium is likely, while still allowing for the possibility of a fairly weak relation. We also explore scenarios in which the risk premium is strictly proportional to variance or, at the other extreme, the two are unrelated. Posterior beliefs are derived by combining these priors with the data using Bayes' law. As in previous work, the data suggest substantial predictability, with a change in the risk premium of about 3.25% (annualized) for a one-percentage-point change in payout yield. We find little association between yield and the conditional market variance, however. This, together with the weak empirical relation between market risk and the risk premium (as in
much of the literature), leads to the conclusion that yield-related predictability is dominated by the "mispricing" effect. This conclusion could be viewed as already implicit in earlier work, after piecing together different strands of the empirical asset pricing literature. However, the interaction between these empirical regularities and prior beliefs of the sort described above has not previously been explored.6 Next, we examine the asset allocation between a market index and a riskless asset in a single-period setting, for an investor with power utility. We explore scenarios in which our predictors (payout yield and prior month variance) are 1.5 standard deviations above or below their sample means. With a diffuse prior, we find that ignoring yield-related predictability can reduce the certainty equivalent excess return (CER) by about 140 basis points (bps) per annum. This effect is mainly due to mispricing given that the conditional variance barely changes with payout yield. However, ignoring persistence in ex ante risk, as captured by the lagged variance predictor, leads to a CER reduction of 120 bps per annum. Thus, persistence in risk is an important consideration in asset allocation, but changing risk plays a minimal role in the relation between yield and asset allocation. These declines in CER from ignoring predictability in the market return distribution may not seem dramatic at first glance. However, they are economically significant when we consider the fact that the CER computed at mean values of the predictors is itself just 2.6% per annum in this context. Thus, a reduction of 120 bps wipes out nearly half of this baseline CER. Turning to the impact of different priors, we find that forcing an investor who firmly believes there is no mispricing to hold the optimal portfolio of an investor with our overreaction prior can reduce CER by 110 bps per annum.
However, comparing that overreaction prior to our more agnostic behavioral prior yields a CER reduction of only 20 bps. Finally, a classical investor who believes that the market risk premium is proportional to the market variance would experience a CER decline of 110 bps if forced to hold either the portfolio of an investor with our baseline prior (which views a positive relation as likely) or the portfolio of an investor who believes there is no risk-return tradeoff at all. Thus, prior beliefs about mispricing and the nature of the risk-return tradeoff both matter for asset allocation. While the Bayesian asset allocation analysis of KS and our extension adopt a conditional ex ante perspective, it is also interesting to evaluate the ex post or out-of-sample performance of investment strategies generated by our different priors. Most past studies have only reported point estimates, despite the fact that the underlying return series are quite volatile. In contrast, we introduce
6 Papers that have tried to connect predictability in risk to time variation in expected returns include Merton (1980), Campbell (1987), French, Schwert, and Stambaugh (1987), Glosten, Jagannathan, and Runkle (1993), and Whitelaw (1994). Surprisingly, some studies find a negative relation, if any, between risk and expected return. Recent evidence by Ghysels, Santa-Clara, and Valkanov (2005), however, points to a positive relation (see also Scruggs, 1998; Harvey, 2001).
a simple approach for estimating standard errors of the CERs and their differences across strategies, permitting statistical tests of significance and providing insights into the power of out-of-sample tests. We find that portfolio allocations based on informative priors about predictability display significantly better ex post performance than allocations based on diffuse priors. Moreover, informative priors that allow for some mispricing do better than dogmatic priors that do not (the gap in performance is greater if the no-mispricing prior also assumes that the market risk premium varies proportionally with market variance). However, our two rather different behavioral priors, one that anticipates a positive relation between yield and expected return (perhaps related to investor overreaction) and the other centered at zero, deliver similar results ex post. Thus, a certain amount of openness to mispricing seems to be beneficial when it comes to predictability, but the particulars turn out not to matter very much in our setting. In sum, our main contribution is twofold: (i) the development of a simple parametric model in which hypotheses about predictability, mispricing, and the risk-return tradeoff can be evaluated simultaneously, while allowing for time variation in both risk and expected returns; and (ii) quantifying the extent to which different priors can impact a Bayesian portfolio analysis (ex ante and ex post), with priors specified so as to accommodate much of the diversity of beliefs among researchers and investors about the sources of return predictability and the nature of the market risk-return relation. The rest of the paper is organized as follows. Section 2 describes the specification of our model of risk and return. Section 3 provides an overview of the data. Section 4 discusses the choice of prior distributions.
Empirical evidence on our risk/return parameters is analyzed in Section 5 and the impact of different prior beliefs on predictive return moments and asset allocation is explored in Section 6. Out-of-sample analyses are presented in Section 7. Section 8 concludes and considers directions for future research.
2. The model of risk and return
We assume that expected returns can vary directly with payout yield and indirectly through a risk effect. We start with the return equation. Following Merton (1980) and French, Schwert, and Stambaugh (1987), we allow the market risk premium to vary directly with the ex ante variance. In addition, we let the expected excess return depend separately on payout yield:

$$ r_{t+1} = \kappa_0 + \kappa_1 \sigma_{\varepsilon t}^2 + \beta\, py_t + \varepsilon_{t+1}, \qquad (1) $$

where $r_{t+1}$ is the difference between the continuously compounded returns realized at $t+1$ on the stock index and a one-month riskless T-bill, $py_t$ is the payout yield at time $t$, $\varepsilon_{t+1}$ is the unexpected return, and $\sigma_{\varepsilon t}$ is its standard deviation conditional on information at time $t$. Henceforth, "return" always refers to the continuously compounded excess return. The coefficient $\kappa_1$ measures the extent to which changes in risk affect expected return,
while $\beta$ reflects any additional yield-related variation in expected return that is not captured by the risk measure.7 Turning to the risk equation, we model the conditional ex ante standard deviation $\sigma_{\varepsilon t}$ as a nonlinear function of past volatility and payout yield:

$$ \sigma_{\varepsilon t} \equiv \mathrm{stdev}(\varepsilon_{t+1} \mid x_t) = c \exp[\lambda_1 \log(sd_t) + \lambda_2\, py_t], \qquad (2) $$
where $x_t \equiv [\log(sd_t), py_t]'$, and $sd_t$ is a lagged measure of realized within-month daily volatility similar to that used by French, Schwert, and Stambaugh (1987). This exponential specification ensures that the standard deviation is always positive and amounts to assuming that $\log(\sigma_{\varepsilon t})$ is linear in $py_t$ and $\log(sd_t)$. Inclusion of the lagged ex post volatility estimate, $sd$, is intended to capture the well-known short-term persistence in volatility observed in the estimation of ARCH models. Payout yield is more highly autocorrelated than $sd$ and may track a slowly changing component of risk akin to a GARCH term, as well as leverage effects impounded in market price. Since payout yield and variance are always positive, it is hard to develop intuition about the intercept $\kappa_0$ in (1), or $c$ in (2). This will be important later when specifying prior beliefs and interpreting estimates. Working with the independent variables measured as deviations from some fixed reference points facilitates the interpretation of these parameters. In our empirical analysis, $py$ will be the deviation of yield from its sample mean and $\log(sd)$ the deviation from the log of the average standard deviation estimate.8 However, we have verified that our main results are not sensitive to this choice or to the use of reference points based on pre-sample data. Henceforth, we use the same variable names, $py$ and $\log(sd)$, to refer to the transformed deviations, with $x_t$ in deviation form as well. The parameter $c$ in Eq. (2) can now be interpreted as the "long-run" standard deviation of return or, more specifically, the value of $\sigma_{\varepsilon t}$ conditional on $x_t = 0$. Time variation in risk depends on the parameters $\lambda \equiv (\lambda_1, \lambda_2)$. If $\lambda = 0$, the error term is homoskedastic. 
Note that inclusion of $py$ in (2) implies that positive and negative returns of the same magnitude can have asymmetric effects on conditional volatility. For example, suppose $c$ = 5% and $\lambda_2$ = 10. If both predictors are initially at their means, then $\sigma_\varepsilon$ is 5% as well.
Now consider a 10% return that lowers yield from, say, 4% to 4%/1.1 = 3.64%, a deviation of −36 basis points (bps). Assuming $sd$ is unchanged, $\sigma_\varepsilon$ declines by 17.9 bps in (2). Similarly, with a −10% return, the yield deviation is 40 bps and $\sigma_\varepsilon$ increases by 20.4 bps, a change that is 14% larger in magnitude. By (1), the "long-run" expected market return, $E(r_{t+1} \mid x_t = 0) \equiv \alpha$, can be expressed as $\kappa_0 + \kappa$, where $\kappa \equiv \kappa_1 c^2$. For ease of interpretation, we focus on $\kappa$, which is measured in units of expected return, rather than $\kappa_1$. Eq. (1) can then
7 Previous work by Attanasio (1991) considers the relation between several predictive variables and expected return after controlling for the effect on return volatility. Also, see Scruggs (1998). 8 For sd, we use the log of the average standard deviation estimate, rather than the average of the log(sd), because the resulting ‘‘long-run’’ standard deviation is closer to the unconditional estimate.
be rewritten as

$$ r_{t+1} = (\alpha - \kappa) + \kappa\,(\sigma_{\varepsilon t}^2 / c^2) + \beta\, py_t + \varepsilon_{t+1}. \qquad (3) $$

If expected return is proportional to variance, then, in the absence of an incremental yield effect, $\kappa = \alpha$. If there is no relation between expected return and risk, $\kappa = 0$. Harvey (1989) rejects the hypothesis that the market price of risk and the Sharpe ratio are constant. We let $\gamma$ denote the long-run Sharpe ratio $\alpha / c$. Using (1) and (2), in the absence of incremental yield effects ($\beta = 0$), the market's Sharpe ratio is $\kappa_0 / \sigma_{\varepsilon t} + \kappa_1 \sigma_{\varepsilon t}$. If the proportionality condition holds ($\kappa_0 = 0$) and $\alpha = \kappa > 0$, then $\kappa_1$ is also positive and the ratio increases with the conditional standard deviation. In general, the relation need not be monotonic and it can be decreasing in $\sigma_{\varepsilon t}$ if $\kappa_0 > 0$ and $\kappa_1 < 0$. There will be additional variation in the Sharpe ratio that is unrelated to (total) risk, however, if $\beta$ is nonzero. For example, a positive $\beta$ implies that expected return increases with yield, beyond any effect of changing risk, tending to increase the Sharpe ratio as well.
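The numerical example given for Eq. (2) and the Sharpe-ratio expression above can be checked with a short script. This is only an illustrative sketch: the function names `cond_std` and `sharpe_ratio` are hypothetical, the value of `lam1` is arbitrary (it drops out because the log(sd) deviation is zero), and the +40 bps yield deviation after a −10% return is taken from the text as given rather than recomputed.

```python
import math

def cond_std(c, lam1, lam2, log_sd_dev, py_dev):
    """Eq. (2): c * exp(lam1*log(sd_t) + lam2*py_t), with both predictors
    expressed as deviations from their reference points."""
    return c * math.exp(lam1 * log_sd_dev + lam2 * py_dev)

def sharpe_ratio(k0, k1, sigma):
    """Sharpe ratio implied by Eq. (1) when beta = 0: k0/sigma + k1*sigma."""
    return k0 / sigma + k1 * sigma

c, lam2 = 0.05, 10.0   # values used in the text's example
lam1 = 0.5             # arbitrary here: the log(sd) deviation is held at zero

# +10% return: yield falls from 4% to 4%/1.1, a deviation of about -36 bps.
py_down = 0.04 / 1.1 - 0.04
drop_bps = (c - cond_std(c, lam1, lam2, 0.0, py_down)) * 1e4

# -10% return: the text uses a yield deviation of +40 bps.
py_up = 0.0040
rise_bps = (cond_std(c, lam1, lam2, 0.0, py_up) - c) * 1e4

print(f"decline in sigma:  {drop_bps:.1f} bps")        # 17.9 bps
print(f"increase in sigma: {rise_bps:.1f} bps")        # 20.4 bps
print(f"ratio of magnitudes: {rise_bps / drop_bps:.2f}")  # 1.14
```

Calling `sharpe_ratio` with made-up coefficients shows the two cases discussed in the text: with `k0 = 0` and `k1 > 0` the ratio rises with `sigma`, whereas with `k0 > 0` and `k1 < 0` it falls.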
2.1. Bayesian estimation of the model
In the standard Bayesian regression framework, the regressors are distributed independently of the disturbances, with a distribution that does not depend on the regression parameters. These conditions hold, in particular, if the regressors are nonstochastic. As KS note, however, independence is violated in a predictive regression on yield because the return surprise at time t impacts all values of dividend yield from time t forward. Correlation between the return surprise and future volatility estimates might be expected as well due to leverage or other effects. Hence, one cannot simply work with the joint density of returns conditional on the time series $x_t$. KS accommodate stochastic regressors by modeling return and yield as elements of a vector autoregression (VAR) with homoskedastic errors, conditioning on the initial sample value of yield, and imposing a noninformative prior on the VAR parameters. Results from Zellner (1971) are then applied. The posterior distribution is identical to that for the nonstochastic regressor case, apart from a degrees of freedom adjustment. An alternative approach developed in this paper permits fairly general informative prior beliefs for the parameters in the return equation, as well as conditional heteroskedasticity in returns and yields. We also introduce a method for dealing with issues that arise in conditioning on lagged estimates of volatility. Since analytic results are not, to our knowledge, available for a model with these features, simulation methods are employed. In particular, we make use of a simulation technique known as "importance sampling" (see Bauwens, Lubrano, and Richard, 1999, pp. 76–83; Geweke, 1989), an alternative to the Gibbs sampling method that is often used in Bayesian applications. Importance sampling can be used to approximate the expectation of any function of the model parameters under fairly general priors. Moreover, since it entails i.i.d.
sampling, the standard central limit theorem can be invoked to obtain direct measures of precision for the simulation estimates. A general overview of the methodology is given in Appendix A and additional details are given in Appendices B and C.

Table 1
Descriptive statistics: excess return, payout ratio, and risk.
This table presents the descriptive statistics for the continuously compounded excess return on the NYSE value-weighted index and the payout yield for CRSP and Compustat firms. The payout yield, PY, is computed by dividing the cash flow-based payout yield in Boudoukh et al. (2007) by one plus the value-weighted return on the market. The monthly standard deviation, SD, is the sample standard deviation of daily returns multiplied by the square root of 22 trading days. All numbers except the Sharpe ratio are percentages.

             Monthly excess return                PY                    SD
Period       Mean   Std. dev.   Sharpe ratio      Mean   Std. dev.      Mean   Std. dev.
1927–2004    0.50   5.41        0.09              4.32   1.35           4.46   3.02
1927–1939    0.17   9.43        0.02              4.97   1.58           7.85   4.79
1940–2004    0.57   4.17        0.14              4.18   1.27           3.79   1.91

3. Descriptive statistics for the data
Before describing the prior beliefs used in our study and conducting the Bayesian analysis, we look at some descriptive statistics over the period 1927–2004 and various subperiods. This will give us an initial feel for the data before exploring the impact of prior beliefs in a richer context. Our data consist of the value-weighted NYSE return, the NYSE payout yield, and the within-month sample standard deviation of daily returns.9 Since the payout yield is an annual measure, to obtain a monthly measure we do the following. For a given month, we take the most recent calendar year-end payout yield and update it by dividing by one plus the value-weighted market return from that year-end to the beginning of the given month. This ensures that the yield is always known ex ante. For example, our monthly payout yield at the beginning of February 1940 is the yield at the end of 1939 divided by one plus the January 1940 return. The standard deviation is estimated as in French, Schwert, and Stambaugh (1987) and is scaled by the square root of 22 (trading days) so as to be a measure of risk over a monthly horizon. The statistics for the NYSE value-weighted excess return, payout yield (py) and scaled standard deviation (sd) are provided in Table 1. As has often been noted, the period before 1940 was extremely volatile and perhaps could be viewed as a different "regime." Schwert (1990), in studying aggregate stock returns from 1802 to 1987, emphasizes that, apart from this unusual period, the average volatility of stock returns has been remarkably stable. For example, monthly volatility in consecutive 20-year periods from 1841 to 1920 ranged from 4.14% to 4.79%. Our empirical analysis focuses on the relatively stationary period since 1940.
During this period, however, the ex post risk in October 1987, at 30% per month, stands out as very extreme by historical standards. Therefore, when using sd as a predictive variable, we replace this observation by 20%, a sort of winsorization that implicitly allows for more mean-reversion in volatility than our simple model would otherwise imply for that month.
9 We are grateful to Roni Michaely for providing the payout data.
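The construction of the two predictors described in this section can be sketched in a few lines. The sketch rests on stated assumptions: the function names are hypothetical, simple monthly returns are compounded to obtain the return since year-end, and the volatility measure follows the Table 1 description (sample standard deviation of daily returns times the square root of 22) without any further adjustment. The input numbers in the usage lines are invented for illustration and are not data from the paper.

```python
import math
import statistics

def monthly_payout_yield(year_end_yield, monthly_returns_since_year_end):
    """Divide the most recent year-end yield by one plus the cumulative
    market return from that year-end to the start of the current month."""
    growth = 1.0
    for r in monthly_returns_since_year_end:
        growth *= 1.0 + r          # assumption: compound simple monthly returns
    return year_end_yield / growth

def monthly_sd(daily_returns, trading_days=22, cap=None):
    """Sample std. dev. of daily returns scaled to a monthly horizon,
    optionally capped (the text replaces the October 1987 value with 20%)."""
    sd = statistics.stdev(daily_returns) * math.sqrt(trading_days)
    return sd if cap is None else min(sd, cap)

# Hypothetical inputs: a 4.5% year-end yield and a -2% January return
# give the yield known at the beginning of February.
feb_yield = monthly_payout_yield(0.045, [-0.02])

# Hypothetical, extremely volatile daily returns, capped at 20% per month.
capped_sd = monthly_sd([0.10, -0.10, 0.10, -0.10], cap=0.20)
```

Because only returns realized before the start of the month enter the update, the resulting yield is known ex ante, as the text requires.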
4. Prior beliefs about the risk/return parameters
The aim of this study is to explore how the empirical evidence would affect beliefs about predictability starting with different priors. Thus, understanding the mapping between prior and posterior beliefs is a central theme of our analysis. Our continuous priors, described below and summarized in Appendix D, embrace the possibility that mispricing effects and (total) risk effects are present simultaneously, while our dogmatic $\beta = 0$ prior presumes that there is no incremental yield effect. For each parameter, we also consider diffuse priors. These are flat priors (identically equal to one) for all parameters except $c$. For $c$, we follow the usual approach of taking the prior for $\ln(c)$ to be flat. For simplicity, we assume the priors for $(c, \gamma, \lambda_1, \lambda_2, \beta, \kappa)$ are independent. Since $\alpha = \gamma c$, its prior will be induced by the priors for $\gamma$ and $c$.10 A detailed description of the intuitive economic motivation behind the choice of parameter values for our priors is given below. The values selected provide a range of beliefs that can serve as reasonable points of departure for thinking about these issues. In principle, these prior beliefs represent the perceptions a researcher would have had before observing the data. Is it then appropriate to appeal to theories developed after the start of our sample in 1940? We believe it is, provided that the theories are motivated by fundamental economic logic, rather than specific empirical patterns observed in the data. Furthermore, our priors can be viewed as beliefs that might have been held by some investors at the beginning of the sample, insofar as the idea of a risk-return tradeoff and notions of mispricing were already recognized at that time. That awareness would, of course, have been on a more intuitive level, before the formal theoretical underpinnings were provided.
10 Given the specification of (1) in terms of deviations from the mean yield, our priors are conditioned on this level of yield. Allowing certain sample characteristics to enter into the prior is often referred to as an "empirical Bayes" approach.
4.1. Priors for the long-run levels of risk and the Sharpe ratio, $c$ and $\gamma$
The prior distribution for the "long-run" standard deviation $c$ is taken to be lognormal to ensure positivity. Based in part on the Schwert (1990) data, we let $\mu_c$ = 6%,
$\sigma_c$ = 1%. Thus, beliefs about risk are quite informative, in keeping with the historical homogeneity discussed earlier. As of 1940, our investor is characterized as expecting somewhat greater economic stability than was experienced in the 1930s, but with considerably more uncertainty than that of the pre-1927 period. The prior for the Sharpe ratio $\gamma$ is taken to be normal, independent of $c$. Therefore, the conditional prior mean for long-run expected return, given $c$, is simply $E(\gamma)c$. The use of early stock return data to form a prior for the Sharpe ratio is complicated by the fact that Treasury bills were not issued over most of the period studied by Schwert (1990). For simplicity, we decompose the average unobserved riskfree rate over the period 1841–1940 as the sum of the average real rate and expected inflation. We treat the real rate as a constant that can be backed out from later Treasury bill data and assume the average inflation surprise is zero over long periods. Given these assumptions, we obtain an average market excess return of 0.65% per month and a sample Sharpe ratio of 0.12 for the 1841–1940 period. In the spirit of Jorion and Goetzmann (1999), who argue that ex post performance in the U.S. market has likely exceeded ex ante expectations due to "survivor biases," we specify a slightly lower prior mean $\mu_\gamma$ equal to 0.111. Since $\mu_\gamma$ is positive, the conditional mean of $\alpha$ increases with $c$, e.g., it equals 50 bps when $c$ = 4.5% and 67 bps when $c$ = 6% (the prior expected value). Although the variance of returns can be estimated fairly precisely with relatively little data, this is not true of the mean. Therefore, we need to allow for relatively greater uncertainty about the Sharpe ratio, as compared to $c$. We let $\sigma_\gamma$ = 0.05, low enough to reflect a strong belief that $\gamma$ and the corresponding risk premium are above zero. 
Before turning to the other model parameters, we comment on the long-run impact of mispricing.
While irrational mispricing can induce variation in expected returns, if the mispricing reverts to a mean of zero, it should not have a direct effect on the long-run level of expected return. Uncertainty about future mispricing can affect risk, however, and thus have an indirect impact on expected return. For example, De Long, Shleifer, Summers, and Waldmann (1990) analyze a model in which the random perceptions of irrational investors induce additional return variability and an associated price discount (higher expected return), as both rational and irrational investors demand compensation for the additional "sentiment risk." This would be reflected in the parameters $\gamma$ and $c$ of our model. Thus, risk-related variation in expected return need not be unambiguously rational, as it can arise from the interplay between rational and irrational investors.
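The figures quoted in Section 4.1 for the Sharpe-ratio prior can be reproduced directly. The snippet below is a small sketch using only the prior mean and standard deviation of the Sharpe ratio stated in the text; the helper name `cond_mean_alpha_bps` is hypothetical, and the probability computed at the end is an illustration of what "a strong belief" that the Sharpe ratio is positive amounts to under a normal prior, not a number reported in the paper.

```python
from statistics import NormalDist

mu_gamma, sigma_gamma = 0.111, 0.05   # prior mean and std. dev. of the Sharpe ratio

def cond_mean_alpha_bps(c):
    """Conditional prior mean of the long-run expected return,
    E(alpha | c) = E(gamma) * c, in basis points per month."""
    return mu_gamma * c * 1e4

print(round(cond_mean_alpha_bps(0.045)))  # 50
print(round(cond_mean_alpha_bps(0.06)))   # 67

# Prior probability that the Sharpe ratio is positive under N(0.111, 0.05^2).
prob_gamma_positive = 1.0 - NormalDist(mu_gamma, sigma_gamma).cdf(0.0)
print(f"{prob_gamma_positive:.3f}")
```

The prior mean lies a little more than two standard deviations above zero, so nearly all of the prior mass is on positive values, which matches the verbal description in the text.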
4.2. Priors for the time-varying risk parameters, λ

We want our prior for λ1 in (2) to reflect a belief that risk will display positive persistence from one month to the next, though with some mean-reversion. Given the great precision with which this coefficient can be estimated, the exact prior specification is not important. We take it to be normal with μ_λ1 = σ_λ1 = 0.5.

It is reasonable to expect the direct relation between yield and risk to be positive through a leverage effect (Black, 1976; Christie, 1982); i.e., leverage and stock risk both increase when price declines, with the decline also increasing the yield if payouts change slowly over time. However, λ2 measures the partial effect of py on risk controlling for sd, so the anticipated relation is less clear. Insofar as the estimate sd_t captures the information about ex ante risk at time t with error, however, we might expect the partial effect of py to still be positive. In order to get some feel for the economic significance of different values of λ2, we note that, as an approximation when λx is small, σ_ε ≈ ψ(1 + λx). Therefore, a value of λ2 equal to 10, say, implies (roughly) that a 100 bps change in yield will be associated with a proportionate change in σ_ε of about 10% [10% of μ_ψ = (0.10)(0.06) = 60 bps] when the log(sd) deviation is zero. We take the prior for λ2 to be normal with μ_λ2 = 10 and σ_λ2 = 10. Thus, a positive relation is expected under the prior, but some mass, about 16%, is placed on negative values, reflecting our uncertainty about the parameter. We will see that the data tend to dominate beliefs about λ2 with this level of prior uncertainty.

4.3. Priors for the time-varying expected return parameters, β and κ

The slope coefficient on yield in (3), β, reflects expected return variation that is related to yield but is unrelated to (total) risk. A common contrarian view is that an overpriced (underpriced) market, in which anticipated dividend growth is unrealistically high (low), can be reflected in a low (high) dividend yield. This situation might arise, for example, if investors tend to extrapolate recent economic strength too far into the future (e.g., Lakonishok, Shleifer, and Vishny, 1994), a sort of overreaction scenario.
If this irrational sentiment is mean-reverting, the implied “correction” can lead to a positive relation between yield and actual (as opposed to perceived) expected return, implying a positive value for β. The influential paper by DeBondt and Thaler (1985) viewed overreaction as the fundamental behavioral hypothesis emerging from the work of Kahneman and Tversky (1979). To capture this perspective, we consider an “overreaction prior” (OP) that is normally distributed with μ_β = 0.20, implying that a 100 bps change in yield is associated with a substantial change of 20 bps in monthly expected return, or 240 bps annualized. In this case, a 200 bps drop in yield below the mean, a fairly low value in historical terms, reduces the annualized prior expected return from 8% (67 bps per month) to 3.24% (67 − 200 × 0.20 = 27 bps per month). We let σ_β = 0.10, indicating a high degree of confidence that β is positive. Note that at relatively high values of β entertained by this prior, expected returns will sometimes be negative, an implausible conclusion from the risk-based perspective.

Even if overreaction is a fundamental behavioral bias, the leap from experimental psychology to implications for financial market equilibrium entails additional uncertainty. Moreover, an underreaction scenario might also be viewed as plausible a priori, which could result in a negative value of β. In this view, the literature’s emphasis on overreaction is driven more by empirical observations than compelling theoretical arguments. To accommodate this perspective, we also consider a prior that is “neutral” (NP) about the direction of the mispricing effect, but reflects considerable uncertainty about that effect: μ_β = 0.0 and σ_β = 0.10. Later in the paper, our OP and NP results will be compared to those for a dogmatic “no-mispricing prior” that assumes β is zero.

Table 2
Bayesian estimates. This table reports Bayesian estimates for the regression

$$r_{t+1} = (\alpha - \kappa) + \kappa\,(\sigma_{\varepsilon t}^{2}/\psi^{2}) + \beta\, py_t + \varepsilon_{t+1}$$

for the 1940–2004 sample period. r_{t+1} is the continuously compounded monthly excess return on the NYSE value-weighted stock index and py_t is the payout yield. The unexpected return, ε_{t+1}, is normally distributed with mean zero and conditional standard deviation σ_εt = ψ exp[λ1 log(sd_t) + λ2 py_t], where sd_t is the monthly standard deviation estimate based on daily data within month t. Both py and log(sd) are measured as deviations from long-run values. β measures the impact of payout yield on expected excess return, net of its impact through σ_εt; the total impact of payout yield on expected return is given by β_tot; γ is the long-run [py and log(sd) deviations are 0] Sharpe ratio, and ψ is the long-run standard deviation of return; γψ = α is the long-run expected excess return. Each cell reports the mean of the distribution, its standard deviation (in parentheses), and the probability that the parameter is larger than zero [in brackets, Prob(* > 0)].

| Prior, distribution | α (%) | β_tot | ψ (%) | λ1 | λ2 | γ | β | κ (%) |
|---|---|---|---|---|---|---|---|---|
| Diffuse, posterior | 0.57 (0.17) [1.00] | 0.27 (0.11) [0.99] | 4.50 (0.03) [1.00] | 0.62 (0.01) [1.00] | 0.03 (0.42) [0.52] | 0.127 (0.038) [1.00] | 0.26 (0.11) [0.99] | 0.06 (0.34) [0.57] |
| Neutral (NP), prior | 0.67 (0.32) [0.99] | 0.13 (0.27) [0.70] | 6.00 (1.00) [1.00] | 0.50 (0.50) [0.84] | 10.0 (10.0) [0.84] | 0.111 (0.050) [0.99] | 0.00 (0.10) [0.50] | 0.67 (0.81) [0.83] |
| Neutral (NP), posterior | 0.55 (0.13) [1.00] | 0.13 (0.07) [0.96] | 4.50 (0.03) [1.00] | 0.62 (0.01) [1.00] | 0.06 (0.42) [0.56] | 0.122 (0.029) [1.00] | 0.12 (0.07) [0.95] | 0.18 (0.26) [0.75] |
| Overreaction (OP), prior | 0.67 (0.32) [0.99] | 0.33 (0.27) [0.96] | 6.00 (1.00) [1.00] | 0.50 (0.50) [0.84] | 10.0 (10.0) [0.84] | 0.111 (0.050) [0.99] | 0.20 (0.10) [0.98] | 0.67 (0.81) [0.83] |
| Overreaction (OP), posterior | 0.55 (0.13) [1.00] | 0.23 (0.07) [0.99] | 4.50 (0.03) [1.00] | 0.62 (0.01) [1.00] | 0.05 (0.42) [0.54] | 0.122 (0.029) [1.00] | 0.23 (0.07) [0.99] | 0.15 (0.26) [0.71] |

Finally, to specify the prior for κ, which reflects the market risk/return tradeoff, we turn to financial theory as a reference point. Merton (1980, p. 329) emphasizes that if state variables other than aggregate wealth have a relatively small effect on consumption, or if the variance of changes in wealth is much larger than the variance of changes in the state variables, then the market risk premium can be reasonably approximated as proportional to market variance. If the representative investor has a utility function with constant relative risk aversion, then the constant of proportionality will equal the relative risk aversion coefficient. This relation motivates the early empirical studies by Merton (1980) and French, Schwert, and Stambaugh (1987). While Merton (1980, p. 328) clearly recognizes the possibility that an increase in market risk will not be
accompanied by an increase in the risk premium, he refers to a positive relation as a “generally reasonable assumption.” Similarly, Abel (1988) identifies an “allowable, though implausible” equilibrium context in which the market risk premium is a decreasing function of (dividend) risk. We take these comments as motivation for a prior that imposes a positive relation with fairly high probability. Recall that in our parameterization, the proportionality condition corresponds to α = κ, where α = γψ is the long-run expected return. We specify a prior for κ, conditional on α, that is normal with mean and standard deviation both equal to α. Thus, our “best guess” is that κ will equal α and, more generally, the odds that κ will be positive are about five to one, given a positive value of α (the prior probability that α is positive is 0.99 in Table 2 for OP and NP). We also entertain dogmatic priors that assume either κ = α or κ = 0.

5. Empirical evidence: Bayesian posterior distributions

5.1. Posterior distributions based on diffuse priors

Posterior beliefs about the model parameters over the 1940–2004 period are presented in Table 2. Results based on diffuse priors are reported in the first panel. Posterior means (the Bayesian “estimates”) are given first, with posterior standard deviations (similar to conventional standard errors) in parentheses. We often refer to the moments based on diffuse priors as “data-based” estimates or standard deviations. [11] The numbers in brackets are posterior probabilities that the given parameter exceeds zero.

The long-run expected return parameter, α, is almost certainly positive with a posterior mean of 57 bps per month. The estimate of the long-run standard deviation, ψ, is 4.5% and, as expected, it is estimated with great precision. The implied Sharpe ratio, γ, is significantly positive as well. The estimated coefficient on py, β, is about two and one-half standard deviations above zero. Consequently, the relation between yield and expected return, controlling for risk, is positive with (posterior) probability nearly equal to one. The estimated effect is economically substantial; a one-percentage-point increase in yield implies an increase in expected return of 26 bps per month or more than 300 bps per year, holding risk constant.

The estimated relation between risk and expected return is also positive, but the posterior probability that κ is greater than zero is only 57%. The 0.06 estimate of κ is substantially below the 0.57 estimate of α, indicating that expected return varies far less with risk than would be the case under the proportionality condition (α = κ). In fact, the posterior probability (not shown) for the hypothesis κ < α is 0.97. [12]

Turning to the parameters describing time variation in risk, we see the anticipated strong relation between the lagged risk measure sd_t and ex ante market risk, as measured by λ1. The incremental effect of yield, measured by λ2, however, is close to zero. As a result of the weak relation between yield and risk and the relatively low value of κ for this period, there is not much difference between β_tot and β. [13] Thus, essentially all of the predictability in yield is attributed to mispricing in this case.

To summarize, the data provide strong evidence of persistence in volatility and expected return variation that is positively related to payout yield. The latter cannot be accounted for by the relation between yield and risk, however. The link between expected return and risk is weak, with no support for the Merton proportionality condition.
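The total yield effect described in footnote 13 is a finite difference of the conditional mean in the Table 2 regression. The sketch below illustrates that calculation under stated assumptions: all quantities are in decimal monthly units, the function and argument names are hypothetical, and `aux_slope` stands in for the slope of the auxiliary regression of log(sd) on payout yield, which the text does not report. Plugging in posterior means only approximates the reported figure, since the posterior mean of a nonlinear function is not the function of the posterior means.

```python
import math


def expected_excess_return(py_dev, log_sd_dev, alpha, kappa, beta, lam1, lam2):
    """Conditional mean of the monthly excess return in the Table 2 regression.

    The variance ratio (conditional variance over long-run variance) equals
    exp(2 * (lam1 * log_sd_dev + lam2 * py_dev)), so the long-run standard
    deviation cancels and is not needed as an input.
    """
    variance_ratio = math.exp(2.0 * (lam1 * log_sd_dev + lam2 * py_dev))
    return (alpha - kappa) + kappa * variance_ratio + beta * py_dev


def beta_total(alpha, kappa, beta, lam1, lam2, aux_slope, step=0.01):
    """Total yield effect: change in expected return as the yield deviation moves
    from -step to +step, divided by 2 * step (0.02 for a one-point step).

    aux_slope is a hypothetical input: the slope of log(sd) on payout yield.
    """
    high = expected_excess_return(+step, aux_slope * step, alpha, kappa, beta, lam1, lam2)
    low = expected_excess_return(-step, -aux_slope * step, alpha, kappa, beta, lam1, lam2)
    return (high - low) / (2.0 * step)


# Diffuse-prior posterior means from Table 2, converted to decimals where needed.
print(beta_total(alpha=0.0057, kappa=0.0006, beta=0.26, lam1=0.62, lam2=0.03, aux_slope=0.0))
```

With a small risk-return coefficient, the total effect stays close to the partial effect for any moderate value of `aux_slope`, which is the pattern the text describes.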
[11] While it does not appear to be the case here, in some contexts diffuse priors can exert a strong influence, quite apart from the data. See Kandel, McCulloch, and Stambaugh (1995) for an interesting example.

[12] This and other similar probabilities are derived by letting p(θ) in Appendix A be the appropriate indicator function.

[13] Recall that β reflects the partial effect of yield on expected return, holding risk constant. Given the nonlinearity in Eqs. (2) and (3), we measure the total impact of yield, β_tot, as the change in expected return when py varies from one percentage point below to one point above its mean, divided by 0.02, with the values of the risk inputs determined by the auxiliary regression of log(sd) on py. This is analogous to the usual simple regression coefficient in a linear predictive model.

5.2. Posterior distributions with informative prior beliefs

The posterior results based on informative priors are reported in the second and third panels of Table 2. Informative priors, as specified here, have minimal effects on the variance parameters since the data-based estimates are far more precise than the priors, particularly for ψ and λ1. Recall that our prior for the risk-return tradeoff parameter is intended to be somewhat informative, in that a fairly low probability is assigned to negative values of κ. Although the estimation precision for κ is much greater than the prior precision, the prior does have a nontrivial influence, raising the posterior mean from 0.06 (diffuse case) to 0.18 for the neutral prior (NP) and 0.15 for the overreaction prior (OP).

Informative prior beliefs have a much greater effect on beliefs about mispricing since the data-based standard deviation for β (0.11) turns out to be similar to the prior standard deviations (0.10). In a standard regression context, posterior precision (reciprocal of variance) is the sum of the prior and sample precisions, while the posterior mean is a precision-weighted average of the prior mean and the regression estimate. This need not be true for all parameters in our nonlinear specification, but it holds approximately for β. [14] For example, the NP posterior mean β of 0.12 is about halfway between the prior mean of zero and the data-based estimate 0.26. The associated posterior probability that β is positive is 0.95, as compared to the prior probability of 0.50. OP results are much closer to the diffuse prior results since the data-based estimate is not far from the OP prior mean.

We have also examined posteriors (not shown) under the dogmatic prior assumption that the proportionality condition, κ = α, holds with probability one. This has little effect on posterior beliefs about β. Likewise if, instead, we assume that κ = 0.

To summarize, the informative priors explored here have substantive effects on beliefs about the yield–return relation and the risk-return tradeoff, but little impact on beliefs about the conditional variance relation.
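The precision-weighting rule described above can be illustrated with the numbers quoted for the yield coefficient: a data-based estimate of 0.26 with standard deviation 0.11, combined with prior means of 0.00 (NP) and 0.20 (OP), each with standard deviation 0.10. This is a minimal sketch of the textbook normal-normal update, not the estimation procedure used in the paper, and the function name is hypothetical.

```python
import math


def normal_update(prior_mean, prior_sd, data_mean, data_sd):
    """Combine a normal prior with a normal data-based estimate.

    Posterior precision is the sum of the two precisions; the posterior mean
    is the precision-weighted average of the prior mean and the estimate.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / data_sd ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


np_mean, np_sd = normal_update(0.00, 0.10, 0.26, 0.11)   # neutral prior
op_mean, op_sd = normal_update(0.20, 0.10, 0.26, 0.11)   # overreaction prior
print(f"NP: {np_mean:.2f} ({np_sd:.2f})")   # NP: 0.12 (0.07)
print(f"OP: {op_mean:.2f} ({op_sd:.2f})")   # OP: 0.23 (0.07)
```

The rounded results agree with the NP and OP posterior entries for the yield coefficient in Table 2, consistent with the statement that the linear-regression rule holds approximately for this parameter.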
In the next section, we examine the impact of these changes in beliefs on asset allocation.

[14] Recall that α is the product of γ and ψ. The informative posterior means for α are below the prior means because of the large prior mean for ψ relative to the data. They are a bit lower than the diffuse posterior mean because the posterior mean for γ is lower than the diffuse mean. The similar behavior of κ may relate to its definition as the product of κ1 and ψ².

6. Asset allocation analysis

In general, given probability beliefs about model parameters and specific levels of the predictive variables, a “predictive” or perceived return distribution can be computed. An optimal portfolio and corresponding certainty equivalent return (CER) can then be determined based on an assumed utility function. Recall that KS consider a simple regression of returns on dividend yield. To assess the “economic significance” of their slope parameter estimate when the “current” value of yield deviates from the mean, they determine the portfolio that would be optimal if yield were at its mean, i.e., ignoring the predictability evidence. Of course, this portfolio will not be optimal with respect to the actual predictive distribution and so utility with respect to that distribution will be lowered accordingly. The CER loss incurred using the “suboptimal portfolio” in this manner is the metric KS use to capture the notion of economic significance.

In our model, expected return can vary with risk and yield, and risk can vary with yield and past volatility. Given all the parameters and predictive inputs, as well as the range of priors, the number of permutations is large and evaluating significance becomes far more complicated. To make the analysis more manageable we focus on what seem to us the most interesting cases. We formally describe the optimization framework in the next section and then present the evidence.

6.1. The optimization framework

As in KS, we consider an individual investor with a single-period investment horizon and an isoelastic utility function,

$$U(W) = \frac{W^{1-A}}{1-A}, \tag{4}$$

where A is the coefficient of relative risk aversion. At the end of month T, the investor chooses a weight ω* on the market index, so as to maximize the expected utility of end-of-period wealth,

$$W_{T+1} = W_T\left[\omega \exp(r_{T+1} + i_{T+1}) + (1-\omega)\exp(i_{T+1})\right] = W_T \exp(i_{T+1})\left[\omega \exp(r_{T+1}) + (1-\omega)\right], \tag{5}$$

where 0 ≤ ω ≤ 1, r_{T+1} is the continuously compounded excess stock return and i_{T+1} is the continuously compounded riskless interest rate for month T+1. In general, the certainty equivalent return premium (CER) is the excess return that, if known for sure, would provide the same utility as the optimal portfolio. A is taken to be 5.0 in our illustrations, similar to values used in previous studies: low enough to generate substantial allocations to stock, but high enough to avoid too many corner solutions. The riskless rate is 40 bps per month throughout. It follows from (4) and (5) that ω* does not depend on the level of the riskless rate, though the optimal level of utility is affected.

The investor maximizes expected utility with respect to his predictive probability distribution, which conditions on past empirical evidence (yields and returns, the vector D_T, as of the end of month T) and prior beliefs. As in KS, our investor is not assumed to be a representative investor for the economy. In fact, since we entertain the possibility of behavioral biases and associated mispricing effects in equilibrium, our fully rational Bayesian investor cannot, in general, be representative. The predictive distribution can be viewed as a mixture of distributions, each conditioned on a set of parameter values, and averaged according to the probability distribution of the parameters:

$$p(r_{T+1} \mid D_T) = \int f(r_{T+1} \mid \theta_1)\, p(\theta_1 \mid D_T)\, d\theta_1, \tag{6}$$

where θ1 = (γ, ψ, λ, β, κ) and p(θ1 | D_T) is the posterior density derived from a prior density and the data. We sometimes refer to this posterior-based distribution simply as “the predictive distribution.” Also of interest is the prior-based analog of (6), the prior-predictive distribution:

$$p(r_{T+1}) = \int f(r_{T+1} \mid \theta_1)\, p(\theta_1)\, d\theta_1, \tag{7}$$

where p(θ1) is a proper (informative) prior density for θ1. KS observe, and we confirm for our model, that the following approximation to the optimal weight works quite well:

$$\omega^{*} \approx \frac{\mu_T}{A\,\sigma_T^{2}} + \frac{1}{2A}, \tag{8}$$

where μ_T is the predictive mean and σ_T² the predictive variance. We use this to simplify the computations. In particular, if there is no incremental yield effect (β = 0) and the Merton condition (κ = α) holds, then based on (3), the ratio of expected return to variance remains constant as yield changes. Insofar as this holds approximately for the predictive mean and variance as well, ω* will also be constant. When there is a β effect, we see from (8) that the impact on ω* will be greater when risk is relatively low. Finally, it is clear from (8) that whether a given value of β reflects mispricing or hedging effects does not matter from the perspective of our single-period investor.

6.2. Predictive moment and portfolio weight comparisons at different levels of yield
In this section, we look at predictive moments and implied portfolio weights based on the 1940–2004 data. The first column of Table 3 lists the scenarios examined. For each prior, we consider values of payout yield at the mean and 1.5 sample standard deviations above or below the mean (the log(sd) deviation is fixed at zero). The next three columns give the predictive return moments for three situations: first under the posterior, then under the prior (if informative), and finally under the posterior except that the mispricing effect is ignored, i.e., β is taken to be zero in computing the expected returns (more on this below). The “weight” columns then report allocations for each of these three situations. Finally, the “CER” columns report the CER under the optimal posterior-based portfolio and the loss that an investor would face if forced to hold a suboptimal portfolio, either by ignoring the data (Prior pred.), or by ignoring the mispricing effect (β = 0).

Table 3
Predictive analysis at various levels of yield. This table reports predictive return moments for the market index, the optimal allocations to stock (weight), and certainty equivalent (excess) returns (CER) or losses (in basis points per month) associated with investment in optimal or suboptimal portfolios, respectively. These are given for each prior and three levels of payout yield, shown as sample standard deviations from the mean. The log(sd) deviation is set to zero. The investor has power utility with relative risk aversion of 5. Sample period: 1940–2004. The posterior pred. and prior pred. columns report the return moments and market weights obtained under the posterior and prior predictive densities. The β = 0 columns assume that the mispricing effect of payout yield is ignored in computing expected returns, resulting in “suboptimal” portfolio allocations. The associated loss is the difference between the optimal CER computed under the correct posterior-predictive distribution and the suboptimal CER. The prior weight and CER columns ignore the data entirely; the resulting loss measures the incremental significance of the data, given the informative prior.

| Prior | Yield | Mean (Stdev) (%): Posterior pred. | Mean (Stdev) (%): Prior pred. | Mean (Stdev) (%): β = 0 | Weight (%): Posterior pred. | Weight (%): Prior pred. | Weight (%): β = 0 | CER (bps): Optimal posterior | CER loss (bps): Prior pred. | CER loss (bps): β = 0 |
|---|---|---|---|---|---|---|---|---|---|---|
| Diffuse | −1.5 | 0.07 (4.51) | – | 0.57 (4.50) | 17 | – | 66 | 1 | – | 12 |
| Diffuse | 0 | 0.57 (4.50) | – | 0.57 (4.50) | 66 | – | 66 | 22 | – | 0 |
| Diffuse | 1.5 | 1.07 (4.51) | – | 0.57 (4.51) | 100 | – | 66 | 67 | – | 11 |
| Neutral (NP) | −1.5 | 0.32 (4.50) | 0.49 (5.23) | 0.55 (4.50) | 42 | 46 | 64 | 9 | 0 | 3 |
| Neutral (NP) | 0 | 0.55 (4.50) | 0.66 (6.08) | 0.55 (4.50) | 64 | 46 | 64 | 21 | 2 | 0 |
| Neutral (NP) | 1.5 | 0.78 (4.51) | 1.05 (7.68) | 0.55 (4.51) | 87 | 46 | 64 | 38 | 8 | 2 |
| Overreaction (OP) | −1.5 | 0.12 (4.50) | 0.11 (5.23) | 0.55 (4.50) | 22 | 18 | 64 | 2 | 0 | 9 |
| Overreaction (OP) | 0 | 0.55 (4.50) | 0.66 (6.08) | 0.55 (4.50) | 64 | 46 | 64 | 21 | 2 | 0 |
| Overreaction (OP) | 1.5 | 0.98 (4.51) | 1.43 (7.68) | 0.55 (4.51) | 100 | 58 | 64 | 57 | 11 | 8 |

Let us start with the diffuse prior. When yield is at its mean, the predictive expected return is just the posterior mean of α, 57 bps per month (see Table 2). When yield is high, the mean increases to 107 bps, and at the low value of yield it is just 7 bps. These changes are driven by the substantial estimate of the (incremental) yield effect on expected return (measured by β). In contrast, predictive risk remains fairly constant as yield varies. As a result, the optimal portfolio weights rise sharply, from 17% to 100%, as yield increases, while the corresponding optimal CERs increase from 1 to 67 bps per month.

The posterior predictive expected returns under OP are similar to those for the diffuse prior, though a bit less dispersed, while the NP expected returns vary least with yield. This makes sense given that the data-based estimate of β (0.26) is close to the prior mean under OP (0.20), whereas the prior mean under NP is zero. Posterior-predictive risk is dominated by the estimate of long-run standard deviation, ψ, and the fact that risk barely changes with yield. Prior-predictive risk reflects the higher prior for ψ and the positive prior mean for λ2. Notice that the prior-predictive risk of 6.08% exceeds the 6% prior mean for ψ. This occurs because the 6.08% reflects not only the prior belief about the true variability of returns, but also the investor’s subjective uncertainty about the various parameter values. This additional uncertainty, often referred to as “estimation risk,” is reduced after learning from the data. Indeed, posterior-predictive risk in Table 3 is nearly equal to the posterior mean for ψ, about 4.5%.

Turning to the portfolio weights, we observe considerable differences between the posterior-based optimal portfolios and the prior-based allocations. An important influence here is the lower than expected risk after the 1930s (4.50% posterior mean for ψ versus 6% prior mean). The higher posterior allocations to stock when yield is at its mean (64% versus 46%) reflect this lower risk and the fact that expected return does not decline sufficiently (due to the low κ estimate) to offset the risk effect (see (8)). The largest differences between posterior and prior weights occur when yield is high. Also note that the NP prior-predictive weights are constant at 46% as yield varies. This is consistent with the discussion below Eq. (8), given that the most likely scenarios under the NP prior are no yield effect (β = 0) and expected return proportionality in variance (κ = α).

6.3. Certainty equivalent (excess) return analysis at different levels of yield

The observed shifts in predictive moments in Table 3 are economically large. In the last three columns of the table, we evaluate these differences from several angles using the certainty equivalent return metric. We emphasize, however, that this is not the only characterization of “significance” that is of interest from a research perspective. For example, important shifts in predictive risk and expected return could be completely offsetting with respect to asset allocation, but of great interest nonetheless.

First, we assess the economic significance of the yield effect on expected return, controlling for risk. The analysis is similar to that in KS, but modified to accommodate the possibility of yield-related risk effects. Thus, we evaluate the extent to which utility would be reduced if our investor were forced to hold a portfolio based on a predictive distribution that ignores the posterior belief about β. This “suboptimal” scenario is implemented by setting β equal to zero in the third component of expected return in Eq. (3).
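The weight approximation in Eq. (8) and the resulting CER losses can be illustrated numerically with the predictive moments reported in Table 3. The sketch below rests on assumptions that are not in the paper: the CER is approximated by a second-order (lognormal-style) expression whose maximizer is exactly the Eq. (8) weight, whereas the paper evaluates expected utility under the full predictive distribution. Function names are hypothetical.

```python
RISK_AVERSION = 5.0  # relative risk aversion used in the paper's illustrations


def optimal_weight(mean, stdev, risk_aversion=RISK_AVERSION):
    """Eq. (8) approximation, clipped to the admissible range [0, 1]."""
    weight = mean / (risk_aversion * stdev ** 2) + 1.0 / (2.0 * risk_aversion)
    return min(1.0, max(0.0, weight))


def approx_cer(weight, mean, stdev, risk_aversion=RISK_AVERSION):
    """Approximate CER (decimal, per month) of holding `weight` in the market.

    Assumption of this sketch: CER = w * (mean + var / 2) - (A / 2) * w^2 * var,
    the objective whose unconstrained maximizer is the Eq. (8) weight.
    """
    var = stdev ** 2
    return weight * (mean + 0.5 * var) - 0.5 * risk_aversion * weight ** 2 * var


# Diffuse prior, high-yield scenario from Table 3 (moments in decimals per month).
true_mean, true_sd = 0.0107, 0.0451       # posterior-predictive moments
ignored_mean, ignored_sd = 0.0057, 0.0451  # moments when the yield effect is ignored

w_opt = optimal_weight(true_mean, true_sd)
w_sub = optimal_weight(ignored_mean, ignored_sd)
loss = approx_cer(w_opt, true_mean, true_sd) - approx_cer(w_sub, true_mean, true_sd)
print(f"optimal weight {w_opt:.0%}, suboptimal weight {w_sub:.0%}, CER loss {1e4 * loss:.0f} bps")
```

For this scenario the sketch gives a fully invested optimal portfolio, a suboptimal weight of about 66%, and a loss of roughly 11 bps per month, close to the corresponding Table 3 entries. Agreement in other scenarios should not be taken for granted, since the approximation ignores higher moments of the predictive distribution.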
Since we focus on mispricing here, the log(sd) deviation is set to zero throughout. The CER losses are evaluated relative to the posterior-based predictive beliefs. The predictive moments and the suboptimal weights derived in this context are given in the corresponding β = 0 columns of Table 3. The weights do not vary with yield since the yield effect on expected return is ignored and the effect of yield on risk is small. Consider, for example, the high-yield case with the diffuse prior. The loss of 11 bps shown in the last column indicates that CER drops from the optimal 67 bps to 56 bps (a 16.4% drop) if we only invest 66% of our portfolio in the market, rather than being fully invested as required under the posterior-predictive distribution. The loss is higher in the low-yield scenario, almost 150 bps annualized. The CER losses are a little smaller for OP investors and smaller still in the NP case. The NP investors have a much lower perception of β, so ignoring the mispricing effect naturally has less impact on expected returns and optimal investment in this case.

Whether 150 bps per annum is an economically “significant” number is a somewhat subjective issue. In making this judgment, it is important to place it in context. Cross-sectional asset pricing studies sometimes report striking results related to well-known return anomalies. However, these anomalies appear to be driven, at least in part, by security mispricing (or perhaps unidentified risk) that is more pronounced in smaller firms. Such pricing effects (positive or negative) will tend to wash out in a broadly diversified market index. Thus, return predictability at the firm level need not translate directly into predictability in the aggregate.
For example, although they do not report CERs, Avramov and Chordia (2006) find that predictable time variation in individual-stock “alphas” can be profitably exploited by a mean-variance investor, whereas the gains from market timing are much smaller. [15] Informally, one could think about a fund manager who is able to deliver 150 bps per year above the market return, year after year. Such an individual would be quite highly regarded, we suspect, so an equivalent loss would presumably be taken seriously as well. More formally, we see in Table 3 that a baseline CER, one computed under diffuse priors and ignoring all predictability, is 22 bps per month or a guaranteed premium over the riskless rate of 264 bps per annum. This corresponds to a portfolio with two-thirds invested in the market and one-third in riskless T-bills. By comparison, a 150 bps CER loss (57% of the baseline) seems to us economically important, but readers can decide for themselves.

[15] Pastor and Stambaugh (2000) obtain large CER differences, e.g., 8% per annum, when considering investors with different beliefs about the correct model for expected returns. These results are due to cross-sectional return patterns related to size and book-to-market (time-series predictability is not considered). However, even in this context, the CER differences decline to about 2% or less when realistic margin constraints are imposed.

We have just explored one aspect of the economic significance of posterior beliefs. When the prior is diffuse, any inferred significance is naturally attributed to the impact of the data. This was the context in KS and much of the Bayesian finance literature. However, when the prior is informative, posterior beliefs are influenced by the specific choice of prior as well as by the data. In this context, it is still of interest to ask whether the data have played an important role in altering one’s prior beliefs. We evaluate this notion of the incremental significance of the data by computing the CER loss associated with forcing an investor to hold the portfolio that would be optimal under the prior-predictive distribution, rather than the posterior-predictive distribution. This exploration of predictive distributions based on prior as well as posterior beliefs is a novel feature of our analysis, one that will likely prove useful in other Bayesian finance applications with informative priors.

The results are reported in the CER/Loss/Prior pred. column of Table 3. The losses are greatest in the high-yield scenario, consistent with the observations about portfolio weights made earlier. The resulting loss if the OP investor were to ignore the data and invest 58% in the market, rather than be fully invested, is 11 bps or about 130 bps per annum. The loss is 8 bps (about 100 bps per annum) for the NP investor. By the standard described above, we conclude that the empirical evidence on the yield–return relation can play an economically important role in investment, given the priors considered here (the evidence on risk persistence is considered later).

Having examined the role of payout yield in asset allocation, holding the prior fixed, in Table 4 we turn to the economic effect of varying the prior. In addition to the priors considered in Table 3, we include a no-mispricing (β = 0) prior in our comparisons. We report the losses at each level of yield, when an investor with a given prior (under “optimal”) is forced to hold a suboptimal portfolio determined by the predictive distribution for a different prior.
The optimal CERs are given in the second column and the losses in the next three columns of each panel. The biggest loss of 12 bps (about 140 bps per annum) occurs when the diffuse investor is forced to hold the nomispricing (b ¼0) investor’s portfolio. This makes sense since differences in posterior beliefs about the yield effect are greatest for these investors (posterior mean b equals 0.26 under the diffuse prior). Comparisons between informative priors result in losses that can be as large as 9 bps for the combination of OP and the no-mispricing prior. Among the priors that allow for mispricing effects (OP, NP, and diffuse) the largest loss is much smaller, however, at 3 bps. To summarize, having a prior that allows for incremental yield (mispricing) effects on expected return can have an economically important influence on asset allocation. We also find that investment results are not dominated by the priors considered, in the sense that the shifts in beliefs after learning from the data are also economically important. 6.4. Optimal weights and CER comparisons at different levels of sd Next, we examine the economic significance of predictability related to the lagged risk measure, sd. This is less straightforward than our analysis of b since movements in risk related to sd will, to some degree, be
Table 4
Influence of different prior beliefs at various levels of yield. This table reports optimal certainty equivalent (excess) returns (CER) and losses (in basis points per month) when the investor is forced to hold a ‘‘suboptimal’’ portfolio based on a prior that is different from the actual one. The CERs are given for each prior and three levels of yield, shown as sample standard deviations from the mean. The log(sd) deviation is set to zero. The optimal portfolio is computed using the posterior distribution based on the actual prior. The investor has power utility with relative risk aversion of 5. Sample period: 1940–2004.

Panel A: Optimal portfolio under diffuse prior
Yield    Optimal CER          CER loss
         Diffuse              b = 0    Neutral (NP)    Overreaction (OP)
-1.5      1                   11       3               0
 0       22                    0       0               0
 1.5     67                   12       3               0

Panel B: Optimal portfolio under no mispricing, b = 0
Yield    Optimal CER          CER loss
         b = 0                Diffuse  Neutral (NP)    Overreaction (OP)
-1.5     21                   11       3               9
 0       21                    0       0               0
 1.5     21                    7       3               7

Panel C: Optimal portfolio under neutral prior (NP)
Yield    Optimal CER          CER loss
         Neutral (NP)         b = 0    Diffuse         Overreaction (OP)
-1.5      9                    3       3               2
 0       21                    0       0               0
 1.5     38                    2       1               1

Panel D: Optimal portfolio under overreaction prior (OP)
Yield    Optimal CER          CER loss
         Overreaction (OP)    b = 0    Diffuse         Neutral (NP)
-1.5      2                    9       0               2
 0       21                    0       0               0
 1.5     57                    8       0               1
accompanied by changes in expected return. As noted earlier, these effects on asset allocation would be offsetting under the proportionality condition. We study the partial effects of expected return and risk variation, as well as their joint effect. Since beliefs about mispricing are of secondary interest in this context, we report diffuse prior results only and set the payout yield deviation to zero throughout. To focus on the significance of the risk effect, holding expected return constant, we vary the level of sd and force our investor to hold a suboptimal portfolio based on the correct expected returns (given the model and sd), but with risk at the long-run level, c. The associated CER loss is then a measure of the economic significance of volatility persistence, as reflected in l1. The results are reported in the first panel of Table 5. The optimal and suboptimal allocations are quite different at both low and high levels of sd. The largest CER loss of 15 bps (1.8% per annum) occurs when sd is high and the increased risk is ignored, resulting in excessive market exposure.
In contrast, there is no loss associated with ignoring expected return movements related to sd (middle panel). Here, the suboptimal portfolio is based on the correct risk measure, but expected return is at the long-run level a. By (3), this is equivalent to letting k = 0. The weak effect is due to the relatively low estimate of k (mean = 0.06) in Table 2, substantially below the value implied by the proportionality condition (mean a = 0.57). Ignoring variation in both risk and expected return (bottom panel) naturally leaves the stock weight unchanged as sd varies. The resulting losses in the last panel are 10 bps and 7 bps in the high and low sd states, respectively. These numbers can be viewed as the value of conditioning on sd in the respective states.
To summarize, persistence related to the predictive variable sd results in substantial variation in ex ante risk and, holding expected return constant, the effect on asset allocation is quite significant (see also Fleming, Kirby, and Ostdiek, 2003). Additional perspective on k is provided in the next subsection.
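A small Python sketch may help fix ideas about why the risk and expected-return effects of sd offset under proportionality (k = a) but not when k is small. It uses the paper's return model, with conditional mean (a - k) + k(s^2/c^2) + b py and conditional standard deviation s = c exp[l1 log(sd) + l2 py], together with a lognormal approximation to the power-utility weight. That approximation, the function names, and the values of c, l1, and the log(sd) deviations are assumptions made here for illustration; the paper's weights come from the full predictive distribution. The values a = 0.57 and k = 0.06 echo the posterior means quoted above and are read here as percent per month, which is also an assumption.

```python
import math


def conditional_moments(log_sd_dev, py_dev, a, k, b, c, l1, l2):
    """Conditional mean and standard deviation of the excess log return."""
    s = c * math.exp(l1 * log_sd_dev + l2 * py_dev)
    mean = (a - k) + k * (s ** 2 / c ** 2) + b * py_dev
    return mean, s


def approx_weight(mean, s, risk_aversion=5.0):
    """Lognormal approximation to the power-utility weight, clipped to [0, 1]."""
    raw = (mean + 0.5 * s ** 2) / (risk_aversion * s ** 2)
    return min(1.0, max(0.0, raw))


A_BAR = 0.0057                 # long-run expected excess return (assumed monthly, decimal)
K_LOW = 0.0006                 # small risk-return coefficient (assumed monthly, decimal)
C_BAR, L1 = 0.042, 0.3         # hypothetical long-run sd and persistence parameter
LEVELS = (-0.5, 0.0, 0.5)      # hypothetical log(sd) deviations; yield deviation is zero

weights_proportional = [
    approx_weight(*conditional_moments(x, 0.0, a=A_BAR, k=A_BAR, b=0.0, c=C_BAR, l1=L1, l2=0.0))
    for x in LEVELS
]
weights_low_k = [
    approx_weight(*conditional_moments(x, 0.0, a=A_BAR, k=K_LOW, b=0.0, c=C_BAR, l1=L1, l2=0.0))
    for x in LEVELS
]
print("k = a  :", [round(w, 3) for w in weights_proportional])
print("k small:", [round(w, 3) for w in weights_low_k])
```

With k = a the expected return scales with the variance, so the approximate weight does not move with sd; with a small k the weight falls as sd rises, in line with the direction of the optimal weights in Table 5.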
Table 5
Predictive analysis at various levels of sd. This table reports predictive return moments for the market index, the optimal allocations to stock (weight), and certainty equivalent (excess) returns (CER) or losses (in basis points per month) associated with investment in optimal or suboptimal portfolios, respectively. These are given for three levels of the predictive variable log(sd), measured as sample standard deviations from the mean. The yield deviation is set to zero. Priors are diffuse. The investor has power utility with relative risk aversion of 5. Sample period: 1940–2004. The ‘‘suboptimal’’ portfolio allocations are computed under a different assumption in each panel. In the first scenario, the impact of sd on risk is ignored; in the second, the impact of sd on expected return through the changing ex ante variance is ignored; in the third, both of these effects are ignored. The associated loss is the difference between the optimal CER computed under the correct posterior-predictive distribution and the suboptimal CER.

Scenario                         Log(sd)   Weight (%)               CER (bps)
                                           Optimal    Suboptimal    Optimal    Loss
Ignore impact on risk            -1.5      100        63            36          8
                                  0         66        66            22          0
                                  1.5       38        73            17         15
Ignore impact on E(r)            -1.5      100        100           36          0
                                  0         66        66            22          0
                                  1.5       38        35            17          0
Ignore impact on risk and E(r)   -1.5      100        66            36          7
                                  0         66        66            22          0
                                  1.5       38        66            17         10
6.5. Alternative informative prior beliefs about the risk/return tradeoff
Table 6 compares results for three alternative informative prior beliefs about k. First is the N(a,a) conditional prior used earlier. This is centered on the proportionality condition, k = a, with the probability of a positive relation between risk and expected return equal to 0.84. The corresponding investor is ‘‘open-minded’’ about the risk-return relation. The second prior assumes that k = a with probability one. The last assumes that k = 0 with certainty, i.e., there is no relation between market risk and expected return, consistent with theoretical observations that the risk-return tradeoff could be positive or negative. It also serves as a useful reference point, given that other Bayesian studies of predictability typically ignore the risk-return connection.
The first column of the table shows the different priors for k. Diffuse priors are used for parameters other than k and the yield deviation is set to zero while we vary the level of sd. The first prior in each pair serves as the belief under which the optimal portfolio is computed and the second prior determines the ‘‘suboptimal’’ portfolio. The biggest loss of 9 bps occurs when our ‘‘open-minded’’ investor is forced to hold the portfolio of one who dogmatically believes that k = a. Losses are also found for investors with dogmatic beliefs. For example, the CER loss is 7 bps when an investor who believes that k = a is forced to hold the portfolio of an investor who is certain there is no connection between risk and return, k = 0. Finally, since the data tend to confirm the k = 0 prior, the respective portfolios starting with this prior and the
Table 6
Influence of different prior beliefs about the risk-return tradeoff. This table examines the influence of prior beliefs about the risk-return parameter, k, in our model $r_{t+1} = (a - k) + k\,(s_{et}^2/c^2) + b\,py_t + e_{t+1}$, where $r_{t+1}$ is the continuously compounded excess return on the NYSE value-weighted stock index and $py_t$ is the payout yield. The unexpected return, $e_{t+1}$, is normally distributed with mean zero and conditional standard deviation $s_{et} = c \exp[l_1 \log(sd_t) + l_2\,py_t]$, where $sd_t$ is the monthly standard deviation estimate based on daily data within month t. Both py and log(sd) are measured as deviations from long-run values. The first prior in each pair serves as the belief under which the optimal portfolio is computed while the second prior in each pair determines the ‘‘suboptimal’’ portfolio for the CER analysis (k = a corresponds to Merton proportionality, while k = 0 indicates no risk-return tradeoff). Comparisons are made for three levels of the predictive variable log(sd), measured as sample standard deviations from the mean. The yield deviation is set to zero. Priors on parameters other than k are diffuse. Sample period: 1940–2004.

Comparison             Log(sd)   Weight (%)               CER (bps)
                                 Optimal    Suboptimal    Optimal    Loss
k ~ N(a,a), k = a      -1.5      100        60            36         9
                        0         66        60            22         0
                        1.5       38        60            17         6
k ~ N(a,a), k = 0      -1.5      100        100           36         0
                        0         66        65            22         0
                        1.5       38        34            17         1
k = a, k = 0           -1.5       60        100            8         4
                        0         60        65            18         0
                        1.5       60        34            41         7
N(a,a) prior are not very different and the corresponding CER loss is minimal.
7. Out-of-sample analysis
In this final section, we explore out-of-sample performance for various portfolio strategies over the 1961–2004 period. This analysis is motivated in part by studies such as Bossaerts and Hillion (1999) and Goyal and Welch (2003) which cast doubts on out-of-sample return predictability. Our aim is to investigate two questions: (i) are there significant differences in the ex post performance of our portfolio strategies based on different prior beliefs, and, in particular, (ii) do portfolio strategies that allow for yield-related mispricing outperform strategies that rule this out? We emphasize, however, that finding that a prior does not lead to superior ex post performance says little about whether the prior was ‘‘sensible’’ ex ante.
We start our analysis in 1961 and end it in 2004. Thus, the first optimal portfolio is computed after observing 21 years of data (1940–1960). For each month, we compute the posterior distribution of the parameters conditional on the data up to that month. The predictive distribution of returns and the implied portfolio weights are then updated using the values of py and sd at the beginning of that month. These variables are measured as deviations from the means over the 1927–1939 period (i.e., prior to the sample used in the analysis). However, we have verified that our conclusions do not change much
[Figure 1 not reproduced. Panel A: posterior mean of b. Panel B: posterior mean of l2. Panel C: posterior mean of k (%). Each panel plots the Diffuse, NP, and OP series against year, 1962–2004.]
Fig. 1. Time series of posterior means for selected parameters: b, l2, and k. Starting in 1961, for each month, we compute the posterior distribution of the parameters in the regression $r_{t+1} = (a - k) + k\,(s_{et}^2/c^2) + b\,py_t + e_{t+1}$, where $r_{t+1}$ is the continuously compounded excess return on the NYSE value-weighted stock index and $py_t$ is the payout yield. The unexpected return, $e_{t+1}$, is normally distributed with mean zero and conditional standard deviation $s_{et} = c \exp[l_1 \log(sd_t) + l_2\,py_t]$, where $sd_t$ is the monthly standard deviation estimate based on daily data within month t. The posterior distributions of the parameters are computed using monthly data starting in 1940 up to the month of interest. The figures plot the time series of the posterior means of b, l2, and k over the 1961–2004 period.
if the reference points are set differently (e.g., using the means up to the portfolio formation month). Before looking at the portfolio evidence, we discuss the time-series patterns of the posterior means. The means of the main parameters of interest are plotted in Fig. 1. As shown in Panel A of Fig. 1, the posterior means for the mispricing parameter b are always positive. Interestingly, for most of the period the posterior mean of b under the diffuse prior is larger than the posterior mean under OP. In other words, the data point towards larger mispricing effects than expected by the OP investor. The posterior mean under the diffuse prior reached its highest value at the end of the 1980s, stayed fairly constant in the early 1990s, and declined in the later years of that decade. In Panel B, we plot the posterior means of l2, the parameter that relates payout yield to market variance. As discussed earlier, the data completely dominate the priors for this parameter and, as a result, the posterior means under the three priors are very similar. They are positive for most of the period, but rather small in magnitude. Finally, Panel C of Fig. 1 plots the posterior means of k, the parameter that reflects the risk-return tradeoff. The mean under the diffuse prior is quite large and positive early in the period, but declines later to close to zero. We caution, however, that this parameter is not estimated very precisely, especially in the early years.
7.1. Market weights for the out-of-sample strategies
We now turn to the market weights based on the posterior-predictive distributions. Panel B of Fig. 2 plots the weights for the diffuse, NP, and OP investors. Comparing Panels B and C reveals that much of the variation in market weights over time and across priors reflects similar variation in the conditional Sharpe ratios (although the level of market volatility will also affect the portfolio decision). Naturally, given the larger posterior means for b (Panel A of Fig. 1), for the most part, the diffuse and OP weights move more closely with payout yield (plotted in Panel A of Fig. 2) than the NP weights. The diffuse and OP weights also fluctuate more over time as a result of this yield-based market timing (see Panel A of Table 7).
After 1975, and especially after 1985, the diffuse and OP weights are, as we expected, quite close. But interestingly, before 1975 the diffuse investor weights are closer to the NP weights than to the OP weights. Understanding this result is a useful exercise in tracing the impact of priors on final beliefs. In the pre-1975 period, the diffuse posterior means for the long-run Sharpe ratio g (not shown) were quite high (between 0.2 and 0.25) and so was a, the long-run expected excess market return (recall that a = gc). In contrast, the posterior means for g under NP and OP were lower since they are shrunk toward the prior mean, 0.11. These observations would seem to suggest higher diffuse weights and lower weights for NP and OP. However, we have to consider the market timing effect as well. This had a negative impact on the weights for the diffuse prior and OP since payout yield was low over the 1961–1975 period (Panel A of Fig. 2). As a result, the timing effect roughly offsets the positive g Sharpe
ratio effect for the diffuse prior, thereby explaining the similar diffuse and NP weights, but the lower OP weights.
For comparison, we also consider three other investors. The first investor is certain that b is zero (no mispricing), but shares the beliefs of NP and OP with regard to the other parameters. Next is an investor who also believes that b is zero, but in addition, is certain that the market risk premium is proportional to the variance of return. Finally, we analyze an investor who completely ignores any predictability. This investor believes that market returns are independent and identically distributed over time, with the usual diffuse priors for the i.i.d. model. Here, the predictive distribution for a given month is Student t with mean and variance (adjusted for degrees of freedom) equal to the corresponding sample moments of the market return up to that point. We see, in Panel A of Table 7, that all three investors allocate more to stock, on average, than the investors who allow for mispricing.
7.2. Out-of-sample performance
Out-of-sample performance for the portfolios of the diffuse, NP, OP, no-mispricing, and i.i.d. investors is reported in Panel B of Table 7. First, we provide the means and standard deviations of the monthly portfolio returns for each strategy. The columns that follow show the out-of-sample Sharpe ratios and mean CERs, together with standard errors for each. Other studies have also looked at out-of-sample Sharpe ratios and CERs for Bayesian portfolio strategies (e.g., Avramov, 2004; Wachter and Warusawitharana, 2009), but most of these studies do not report measures of precision. An exception is Wachter and Warusawitharana, who evaluate the statistical significance of their out-of-sample results using a computationally intensive Monte Carlo procedure that imposes a null hypothesis of no predictability. Instead, we introduce a simple procedure for obtaining standard errors that is broadly applicable.
Moreover, in comparing strategies, we need not impose any assumptions about the predictability of returns. Our approach uses the fact that the CER is a transformation of expected utility, where the latter is estimated as a sample mean of the monthly realized utilities over the period. Asymptotic standard errors for the CERs and their differences are then obtained using the Delta method (see, for example, Cochrane, 2005).16 Standard errors for the Sharpe ratios are obtained as in Opdyke (2007).
Overall, the results in Panel B indicate that diffuse priors, whether for our predictability model or the i.i.d. model, result in the worst out-of-sample performance. This is particularly true in terms of the CER metric. NP and OP deliver the best performance, while the two no-mispricing priors fall in between. These findings are
16 Without loss of generality, initial wealth is taken to be one. As transformations of returns, the monthly utilities have autocorrelations that are close to zero, but more substantial autocorrelation could easily be corrected for. Since the time-series mean can be viewed as the estimate from a regression on a constant, heteroskedasticity is accommodated insofar as White (1980) standard errors reduce to the usual ones in this case.
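The Delta-method standard errors described in Section 7.2 can be sketched in a few lines of Python. This is an illustrative reading of the verbal description, not the authors' code: the function names are hypothetical, wealth is taken as one plus a simple excess return (initial wealth of one, riskless rate ignored), serial correlation in the utilities is assumed away, and the two simulated strategies are fixed-weight positions in a made-up market series.

```python
import numpy as np

RISK_AVERSION = 5.0  # relative risk aversion, as in the paper's tables


def realized_utilities(excess_returns, risk_aversion=RISK_AVERSION):
    """Monthly realized power utilities with initial wealth equal to one."""
    wealth = 1.0 + np.asarray(excess_returns, dtype=float)
    return wealth ** (1.0 - risk_aversion) / (1.0 - risk_aversion)


def cer_and_gradient(mean_utility, risk_aversion=RISK_AVERSION):
    """CER as a function g of mean utility, and the derivative g' used by the Delta method."""
    base = (1.0 - risk_aversion) * mean_utility  # positive when risk_aversion > 1
    cer = base ** (1.0 / (1.0 - risk_aversion)) - 1.0
    gradient = base ** (risk_aversion / (1.0 - risk_aversion))
    return cer, gradient


def cer_with_se(excess_returns):
    """CER estimate and its asymptotic standard error."""
    u = realized_utilities(excess_returns)
    cer, grad = cer_and_gradient(u.mean())
    se = abs(grad) * u.std(ddof=1) / np.sqrt(u.size)
    return cer, float(se)


def cer_difference_with_se(returns_i, returns_ii):
    """CER(ii) - CER(i) and its standard error, using the covariance of the utilities."""
    u_i, u_ii = realized_utilities(returns_i), realized_utilities(returns_ii)
    cer_i, g_i = cer_and_gradient(u_i.mean())
    cer_ii, g_ii = cer_and_gradient(u_ii.mean())
    cov = np.cov(np.vstack([u_i, u_ii]))   # 2 x 2 sample covariance matrix
    grad = np.array([-g_i, g_ii])
    var_diff = grad @ cov @ grad / u_i.size
    return cer_ii - cer_i, float(np.sqrt(var_diff))


rng = np.random.default_rng(1)
market = rng.normal(0.005, 0.04, size=528)   # hypothetical monthly excess returns
strategy_i, strategy_ii = 0.7 * market, 0.6 * market

cer_i, se_i = cer_with_se(strategy_i)
cer_ii, se_ii = cer_with_se(strategy_ii)
diff, se_diff = cer_difference_with_se(strategy_i, strategy_ii)
print(f"CER(i) = {cer_i:.5f} (SE {se_i:.5f}), CER(ii) = {cer_ii:.5f} (SE {se_ii:.5f})")
print(f"CER(ii) - CER(i) = {diff:.5f} (SE {se_diff:.5f})")
```

Because the two utility series are strongly positively correlated, the standard error of the difference is far smaller than either individual standard error, which is the pattern noted in the discussion of Panel C of Table 7.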
[Figure 2 not reproduced. Panel A: payout ratio. Panel B: 12-month rolling average of portfolio weights (Diffuse, NP, OP). Panel C: Sharpe ratios (Diffuse, NP, OP). The horizontal axis in each panel is the year, 1962–2004.]
Fig. 2. Time series of payout ratio, portfolio weights, and Sharpe ratios. Panel A plots the aggregate payout yield for Center for Research in Security Prices (CRSP) and Compustat firms over the 1961–2004 period (horizontal dashed line shows the mean yield). Payout yield, PY, is computed each month by dividing the annual payout yield in Boudoukh et al. (2007) at the beginning of that year by one plus the value-weighted return on the market up to the beginning of the given month. Panel B plots the 12-month rolling average portfolio weights for the diffuse, NP, and OP investors. The portfolio weights are obtained under the posterior predictive densities, and the investor is assumed to have a power utility function, with relative risk aversion parameter A = 5. Panel C plots the time series of the conditional Sharpe ratios for each prior belief over the 1961–2004 period. Each month, we compute the ratio of the conditional mean and standard deviation using the posterior means of the parameters and current values of the predictive variables.
Table 7
Out-of-sample analysis. This table reports the out-of-sample performance of allocations calculated under different prior beliefs (the priors are summarized in Appendix D). For each prior, we do the following. At the beginning of each month starting in 1961, the predictive distribution of returns is computed based on Eq. (3), using all past monthly data from 1940 onward. The corresponding optimal allocation between a riskfree asset and the market is then identified and the resulting portfolio weights are used to calculate an ‘‘out-of-sample’’ return for that month. The mean and standard deviation of the series of weights on the market are given in Panel A for each prior. The mean and standard deviation of the out-of-sample returns are reported in Panel B, along with the corresponding sample Sharpe ratio and its standard error. An annualized certainty equivalent (excess) return (CER) and its standard error are also shown. The CER is based on a probability distribution that equally weights the out-of-sample returns for the given prior. Panel C compares the CERs under different investment strategies. Sample period: 1940–2004. * indicates an estimate more than 2 standard errors from zero.

Panel A: Portfolio weights
Prior                  Market weight (%)
                       Mean     Stdev
Diffuse                67.1     28.2
Neutral (NP)           70.8     17.7
Overreaction (OP)      56.4     23.7
i.i.d.                 75.7     15.2
b = 0                  77.0     15.2
b = 0 and Merton       74.8      7.2

Panel B: Out-of-sample performance
                       Monthly portfolio return (%)   Sharpe ratio         Annualized excess CER (%)
                       Mean     Stdev                 Estimate   SE        Estimate   SE
100% Fixed weight      0.95     4.23                  0.116*     0.045     0.26       0.70
Diffuse                0.74     3.08                  0.091*     0.044     0.52       0.48
Neutral (NP)           0.79     2.91                  0.112*     0.044     1.34*      0.45
Overreaction (OP)      0.73     2.59                  0.105*     0.043     1.28*      0.39
i.i.d.                 0.79     3.20                  0.102*     0.045     0.74       0.51
b = 0                  0.80     3.07                  0.111*     0.045     1.20*      0.49
b = 0 and Merton       0.80     3.16                  0.108*     0.045     1.01       0.51

Panel C: CER comparison
Strategy (i)           Strategy (ii)          CER (ii) - CER (i)
                                              Estimate   SE
100% Fixed weight      Diffuse                 0.26      0.34
                       Neutral (NP)            1.08*     0.30
                       Overreaction (OP)       1.02*     0.38
                       i.i.d.                  0.48*     0.23
                       b = 0                   0.94*     0.25
                       b = 0 and Merton        0.75*     0.21
Diffuse                Neutral (NP)            0.82*     0.12
                       Overreaction (OP)       0.76*     0.12
                       i.i.d.                  0.22      0.20
                       b = 0                   0.68*     0.17
                       b = 0 and Merton        0.49*     0.19
Neutral (NP)           Overreaction (OP)      -0.06      0.13
                       i.i.d.                 -0.60*     0.13
                       b = 0                  -0.14      0.08
                       b = 0 and Merton       -0.33*     0.12
Overreaction (OP)      i.i.d.                 -0.54*     0.24
                       b = 0                  -0.08      0.20
                       b = 0 and Merton       -0.27      0.21
i.i.d.                 b = 0                   0.46*     0.08
                       b = 0 and Merton        0.27*     0.06
b = 0                  b = 0 and Merton       -0.19*     0.08
consistent with previous evidence (e.g., Avramov, 2004; Handa and Tiwari, 2006; Wachter and Warusawitharana, 2009) that non-dogmatic informative priors can improve ex post portfolio performance.
In Panel C of Table 7, we compare the annualized CERs across the various portfolios. Note that the differences in CERs are estimated much more precisely than the CERs themselves, due to the positive correlations among the portfolio returns. For example, with a standard error of 0.51, the i.i.d. CER in Panel B is not reliably different from zero, yet the standard error of the CER difference for i.i.d. and NP is just 0.13. We find that both diffuse priors are significantly outperformed by all the other Bayesian strategies, with differences as high as 82 bps. NP dominates both no-mispricing priors at the 10% level, with significance at the 1% level when the Merton proportionality condition is imposed along with b = 0. The small difference between NP and OP of just 6 bps is not reliably different from zero, however.
A brief comment on power. Recall that our baseline CER, which ignores predictability, is 2.6% per annum. Thus, the annualized NP/no-mispricing CER difference of 168 bps per annum, while statistically significant only at the 10% level, is quite significant economically. Given the substantial variability of stock returns, the annualized standard error of about 100 bps, despite being one of the smallest in Panel C, is still too large to permit a strong out-of-sample inference in this case.
In sum, portfolio allocations based on informative priors about predictability display better ex post performance than allocations based on diffuse priors and the informative priors do better when they are not dogmatic. However, we cannot distinguish statistically between the out-of-sample performance of our two rather different behavioral prior beliefs.
8. Summary and conclusions
This is, to our knowledge, the first study that analyzes the conditional distribution of aggregate market returns in a Bayesian setting in which expected return may depend on the time-varying level of risk as well as a predictive variable. Our priors accommodate both uncertainty about the risk-return relation and the degree to which return can be predicted by payout yield, after accounting for the level of risk. The simplest interpretation of this incremental yield effect is in terms of behavioral mispricing, although, as Lewellen and Shanken (2002) demonstrate, a kind of rational ‘‘mispricing’’ related to the process of learning over time can also induce (in-sample) predictability. Some portion could be due to model misspecification as well, perhaps related to intertemporal hedging demands if these vary with the level of yield.
We find that prior beliefs about mispricing matter ex ante, in the sense that an investor with a prior that accommodates mispricing experiences a substantial drop in utility if forced to hold the portfolio of an investor who presumes there is no mispricing (and vice versa). Beliefs about the nature of the market risk-return tradeoff similarly impact ex ante utility, as does incorporating persistence in ex ante risk. Different beliefs about the manner in which
mispricing influences the relation between yield and expected return matter far less, presumably because the gap between those initial beliefs is reduced through the process of learning from the data. From an ex post perspective, openness to mispricing enhances out-of-sample performance, as long as the data are not given too much weight (diffuse priors). However, consistent with the ex ante result, the particulars of those informative prior beliefs have little impact on ex post performance.
We conclude with a few comments about potential extensions of this work. Including additional predictors of risk and expected return would certainly be of interest. In particular, accommodating the mixed data sampling (MIDAS) risk framework of Ghysels, Santa-Clara, and Valkanov (2005) or the latent VAR approach of Brandt and Kang (2004) might be desirable. Although the predictive ability of payout yield is almost entirely attributed to mispricing, our intuitive approach to partitioning predictability evidence may reveal a greater role for risk with alternative predictors. Treating market variance as the relevant measure of risk was a natural starting point here, given the lack of attention to changing risk in this literature. However, future research will hopefully integrate the key features of our study (time-varying risk and expected return, along with priors that encompass behavioral and classical views) with a multiperiod portfolio perspective that accommodates hedging risks (e.g., Merton, 1973; Brandt et al., 2005).
Appendix A. Overview of the Bayesian framework
A.1. Equations for the stochastic regressors, yield, and standard deviation
A distinguishing feature of our approach to the stochastic regressor problem is that we work with the following representation of the likelihood function or conditional joint density of returns and predictive variables:
$$f(r_{t+1}, x_{t+1} \mid x_t) = f(r_{t+1} \mid x_t)\, f(x_{t+1} \mid r_{t+1}, x_t), \qquad \text{(A.1)}$$
where $f(r_{t+1} \mid x_t)$ is the conditional density for the return model in (1) and (2). Consider the conditional distribution of the predictive variables. Given the rapid reaction of prices and slow adjustment of subsequent payouts to new information, percentage changes in yield and price will tend to be similar, but of the opposite sign. Thus, changes in log(yield) should be strongly negatively related to contemporaneous returns, with a coefficient near minus one. The density $f(py_{t+1} \mid r_{t+1}, x_t)$ is characterized by the following regression equation:
$$\log(\mathrm{yield}_{t+1}) = j + r\,\log(\mathrm{yield}_t) + f\,r_{t+1} + w_{t+1}. \qquad \text{(A.2)}$$
Here j, r (the coefficient on lagged log yield), and f (the coefficient on the return $r_{t+1}$) are regression parameters and $w_{t+1}$ is the disturbance. The yield variables are not deviations from means here. This relation is roughly a mechanical identity and, hence, it is not likely to contain much information, if any, about the parameters of primary interest in (2) and (3).17 To simplify the analysis, we formally assume that the prior for the parameters in (A.2) is independent of the prior for the risk-return parameters (c, g, l, b, k). In addition, the disturbance in (3) is assumed to be independent of the negligible disturbance in (A.2).
17 We estimated the model in (A.2) and obtained estimates of r and of f close to one and minus one, respectively. Moreover, the adjusted R2 is 0.999, which supports the view that the relation is roughly a mechanical identity.
A.2. The ex post volatility equation
We now turn to the conditional density $f(sd_{t+1} \mid r_{t+1}, x_t)$. We assume that $w_{t+1}$ in (A.2) is independent of $sd_{t+1}$, given $r_{t+1}$ and $x_t$. Thus, the density $f(x_{t+1} \mid r_{t+1}, x_t)$ in (A.1) factors into the conditional densities for $\log(\mathrm{yield}_{t+1})$ and $sd_{t+1}$. Together with the assumption of prior independence for the parameters in (A.2), this implies that the payout yield equation can be ignored in deriving the posterior distribution for the primary parameters (c, g, l, b, k). It is tempting to treat sd in a similar manner in order to further simplify the analysis. However, this would ignore the fact that the density of this sample statistic for month t+1 necessarily depends on the ex ante level of risk at the beginning of the month. Thus, sd contains important information about the parameters of the conditional risk relation in (2).
In general, the conditional density for $sd_{t+1}$ could be quite complicated. To obtain a tractable solution to the stochastic regressor problem in this context, the within-month changes in conditional moments will be ignored. Specifically, we model the daily returns within month t+1 as independent and identically normally distributed conditional on $x_t$, the state of the world at the beginning of month t+1. Although a strong assumption, the distribution is permitted to vary from month to month and thus, the unconditional distribution is a mixture of normals. For simplicity, assume there are 22 trading days in each month. Recalling that returns are continuously compounded, the sum of daily returns is just the monthly return $r_{t+1}$. Given our i.i.d. assumption conditional on $x_t$, the mean and variance of the daily (log) returns in month t+1 equal the corresponding quantities for monthly returns, in (1) and (2), divided by 22. Moreover, the normality assumption implies that the sum and sample variance of daily returns are (conditionally) independent.
Therefore, $f(sd_{t+1} \mid r_{t+1}, x_t)$ reduces to $f(sd_{t+1} \mid x_t)$.18 It is convenient now to work with the transformed variable, $v_{t+1} \equiv 22\,sd_{t+1}^2$, and apply standard results under (conditional) normality. Thus, $f(v_{t+1} \mid x_t)$ is the density for a variable that is distributed as $s_{et}^2/22$ (the daily variance) times a chi-square variate with 21 degrees of freedom. This provides the final element needed to derive the posterior distributions via the likelihood function in (A.1). Conditioning on $v_t$ is, of course, equivalent to conditioning on $sd_t$, and so we redefine $x_t$ to be $(py_t, v_t)'$ in the remaining appendices. Note that the density for $v_{t+1}$ depends on the parameters c and l through $s_{et}^2$, which changes over time with $x_t$.
18 The various ARCH/GARCH models with normally distributed errors estimated in French, Schwert, and Stambaugh (1987) and elsewhere rule out contemporaneous correlation between returns and both unexpected volatility and changes in conditional volatility. Our framework does allow for the latter effect, but not the former. This is one direction in which the Bayesian model might fruitfully be extended. Hentschel (1995) analyzes a broad class of symmetric and asymmetric GARCH models.
Stepping back from the technical details, the substantive effect of incorporating the ex post volatility equation is to provide more precise estimates of the parameters in the variance relation (2). The increased precision is achieved by exploiting the within-month variation in returns rather than simply relying on squared monthly residual returns to identify the conditional variance relation.
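The density of the within-month sum of squared deviations, v, described in this appendix can be written out directly. The Python sketch below is a hedged illustration under the appendix's stated assumptions (22 i.i.d. normal daily returns per month, so that v is the daily variance times a chi-square variate with 21 degrees of freedom); the function names and the numerical inputs are hypothetical, and the simulation only checks the mean of v implied by that distribution.

```python
import math
import numpy as np

TRADING_DAYS = 22  # assumed number of trading days per month, as in the text


def log_density_v(v, s_monthly):
    """Log density of v, the sum of squared deviations of daily returns from their mean.

    v is distributed as (s_monthly**2 / 22) times a chi-square variate with 21
    degrees of freedom, where s_monthly is the conditional monthly standard deviation.
    """
    dof = TRADING_DAYS - 1
    daily_var = s_monthly ** 2 / TRADING_DAYS
    z = np.asarray(v, dtype=float) / daily_var
    log_chi2 = ((dof / 2 - 1) * np.log(z) - z / 2
                - (dof / 2) * math.log(2.0) - math.lgamma(dof / 2))
    return log_chi2 - math.log(daily_var)  # Jacobian of the change of variable z -> v


def simulate_v(s_monthly, mean_monthly, n_months, rng):
    """Simulate v from i.i.d. normal daily returns within each month."""
    daily = rng.normal(mean_monthly / TRADING_DAYS,
                       s_monthly / math.sqrt(TRADING_DAYS),
                       size=(n_months, TRADING_DAYS))
    deviations = daily - daily.mean(axis=1, keepdims=True)
    return (deviations ** 2).sum(axis=1)


rng = np.random.default_rng(2)
s_monthly = 0.04  # hypothetical conditional monthly standard deviation
simulated = simulate_v(s_monthly, mean_monthly=0.005, n_months=50_000, rng=rng)
implied_mean = (TRADING_DAYS - 1) * s_monthly ** 2 / TRADING_DAYS
print(f"simulated mean of v: {simulated.mean():.6f}, implied mean: {implied_mean:.6f}")
```

In a likelihood of the form (A.1), the term for month t+1 would add this log density, evaluated at the model's conditional standard deviation for that month, to the log density of the monthly return.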
A.3. The simulation methodology The complexity of our regression specification is such that simple analytical formulas for posterior moments are not readily available. The conditional posteriors are complicated as well by the fact that the variance parameters also enter into the conditional mean of the return equation. We use a simulation technique known as ‘‘importance sampling’’ to estimate the model parameters. In general, importance sampling can be used to approximate the expectation of any function of the model parameters. Consider, for example, the computation of the posterior mean for y, the parameter in some model. By standard Bayesian analysis, the posterior is proportional to the product of the prior and the likelihood function. Hence, the posterior mean is the expectation under the prior of yf(D9y), divided by p(D). Here, f is the likelihood function or density for the data D and p(D) is the probability of the data under the prior, i.e., the expectation under the prior of the likelihood function evaluated at the given data. The expectation in the numerator can be obtained, via simulation, by repeatedly drawing values of y from the prior density and averaging the products, yf(D9y). If the data and the prior ‘‘disagree,’’ however, i.e., if the likelihood function and the prior density are concentrated in different regions of the parameter space, the products will be extremely volatile and often close to zero. This leads to slow convergence and other computational problems. The idea behind importance sampling is ingenious in its simplicity. Random draws are made, not from the prior, but from an importance density, i(y). Almost any density can serve as i(y), but the idea is to pick a density that might roughly approximate the unknown posterior distribution. If one then computes the weighting function, w(y)f(D9y)p(y)/ i(y), at each iteration, the average of yw(y) converges to the required expectation of yf(D9y) under the prior. 
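This amusing conclusion is derived as follows:

$$
\int \theta\, w(\theta)\, i(\theta)\, d\theta = \int \left[\theta f(D \mid \theta)\, p(\theta)/i(\theta)\right] i(\theta)\, d\theta = \int \theta f(D \mid \theta)\, p(\theta)\, d\theta. \tag{A.3}
$$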
The advantage of drawing from $i(\theta)$, rather than the prior, is that, insofar as $i(\theta)$ does succeed in approximating the posterior, the weights $w(\theta)$ will tend to be far more stable, leading to much faster convergence. The improved stability is a consequence of the fact that the numerator of the weighting function is proportional to the posterior density. By an argument similar to that in (A.3), the average of the weights $w(\theta)$ converges to $p(D)$ as the number of random draws approaches infinity. The posterior mean for $\theta$ can then be obtained by taking the ratio of the two averages, i.e., the average of $\theta w(\theta)$ divided by the average of $w(\theta)$.
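The ratio estimator just described can be illustrated with a minimal sketch. It assumes a toy conjugate model that is not taken from the paper: observations distributed N($\theta$, 1), a normal prior with mean zero, and a normal importance density. The function names and all numerical values are hypothetical. The conjugate setup is chosen only because the exact posterior mean is then available in closed form, so the simulated ratio can be checked against it.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def importance_posterior_mean(data, prior, importance, n_draws=20000, seed=0):
    """Estimate E[theta | D] as (average of theta * w) / (average of w).

    `prior` and `importance` are (mean, sd) pairs of normal densities, and
    each observation is treated as N(theta, 1). All of this is illustrative.
    """
    rng = random.Random(seed)
    draws, log_w = [], []
    for _ in range(n_draws):
        theta = rng.gauss(*importance)  # draw from i(theta), not from the prior
        log_lik = sum(log_normal_pdf(x, theta, 1.0) for x in data)
        # log w(theta) = log f(D | theta) + log p(theta) - log i(theta)
        log_w.append(log_lik + log_normal_pdf(theta, *prior)
                     - log_normal_pdf(theta, *importance))
        draws.append(theta)
    shift = max(log_w)  # common scale factor; it cancels in the ratio
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(t * wi for t, wi in zip(draws, w)) / sum(w)


data = [1.8, 2.4, 2.1, 1.6, 2.6]
prior = (0.0, 2.0)
# With a zero-mean normal prior and unit observation variance, the exact
# posterior mean is sum(data) / (1 / prior_sd^2 + n).
exact = sum(data) / (1.0 / prior[1] ** 2 + len(data))
estimate = importance_posterior_mean(data, prior, importance=(2.0, 0.6))
```

Working with log weights and subtracting the maximum before exponentiating is a numerical convenience rather than part of the method: any constant rescaling of the weights drops out when the two averages are divided.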
A variety of methods have been proposed to specify importance densities. We have had success with the following approach. Initially, the prior is taken as $i(\theta)$ and "rough" estimates of the posterior moments are obtained through simulation. A second-round importance density is then specified using the rough posterior mean and standard deviation in place of the prior moments. After several rounds, each with a modest number of draws of $\theta$, variability in the importance weights is reduced substantially. At that point, one more importance sampling procedure is run with a large number of draws. Various diagnostics (see Geweke, 1989) are computed to help detect problems with convergence, which are rarely encountered in this application.

While we have discussed the computation of the posterior mean, the same ideas apply if the expectation of a more complicated function $p(\theta)$ is desired. To compute expected utility, returns are randomly drawn from their conditional distribution, given the "current" values of the predictive variables and the draw of $\theta$, and $U(R(\theta))$ then plays the role of $p(\theta)$. Additional details are given in the remaining two appendices.

Appendix B. Obtaining the posterior moments

Let $D \equiv (D_1, D_2, \ldots, D_T)$, where $D_t \equiv (r_t, x_t)$ is the data for the problem. Recall that $x_t \equiv (py_t, v_t)'$, where $py_t$ is the payout yield and $v_t$ is the within-month sum of squared deviations from the sample mean of daily returns for month $t$. The joint density (conditioned on $x_0$) for the data is $f(D) = f(D_1)\, f(D_2 \mid D_1) \cdots f(D_T \mid D_1, D_2, \ldots, D_{T-1})$, where $f(D_{t+1} \mid D_1, D_2, \ldots, D_t) = f(r_{t+1} \mid D_1, D_2, \ldots, D_t)\, f(x_{t+1} \mid r_{t+1}, D_1, D_2, \ldots, D_t)$ for $t = 1, \ldots, T-1$. We assume that $x_t$ captures the state of the world at time $t$ in the sense that $f(r_{t+1} \mid D_1, D_2, \ldots, D_t) = f(r_{t+1} \mid x_t)$ and $f(x_{t+1} \mid r_{t+1}, D_1, D_2, \ldots, D_t) = f(x_{t+1} \mid r_{t+1}, x_t)$.
This representation of the joint density of returns and yields includes the restricted VAR of Stambaugh (1999) with homoskedastic errors as a special case, with $v_t$ omitted and yield following an AR(1) process. We assume that the (negligible) residual, $w_{t+1}$, in the yield Eq. (A.2) is independent (conditional on $x_t$) of the daily returns within month $t+1$ and hence, independent of their sum, $\varepsilon_{t+1}$, and $v_{t+1}$ as well. In this case,

$$
f(x_{t+1} \mid r_{t+1}, x_t) = f(py_{t+1} \mid r_{t+1}, x_t)\, f(v_{t+1} \mid x_t), \tag{B.1}
$$
where the density for $py_{t+1}$ is based on (A.2) and the density for $v_{t+1}$ is that of $\sigma_{\varepsilon t}^2/22$ times a chi-square variate with 21 degrees of freedom, as discussed in Appendix A. Let $\theta$ be the parameter vector for the joint distribution of $(r_{t+1}, py_{t+1}, v_{t+1})$ and partition $\theta$ as $(\theta_1, \theta_2)$, where $\theta_1 \equiv (\gamma, \beta, c, \lambda, \kappa)$ is the vector of return-risk parameters in (2) and (3) and $\theta_2 \equiv (\varphi, \rho, \phi, \sigma_w)$ contains the additional parameters in (A.2). Recall that $\alpha \equiv \gamma c$. Note that the density for $v_{t+1}$ does not depend on $\theta_2$. Given the discussion above, the joint density of the data or, equivalently, the likelihood function can be written as

$$
f(D \mid \theta) = h(D, \theta_1)\, g(D, \theta_2)\, k(D, \theta_1). \tag{B.2}
$$
Here, $h(D, \theta_1)$ is the product of the conditional densities for $r_{t+1}$, $g(D, \theta_2)$ the product of the densities for $py_{t+1}$, and $k(D, \theta_1)$ the product for $v_{t+1}$. Note that $h$ is identical to what the conditional density of $r$, given $x$, would be if $x$ were nonstochastic. Since $g$ does not depend on $\theta_1$, it will act as a constant of proportionality in determining the posterior density for $\theta_1$ and can be ignored. Likewise, the prior for $\theta_2$ can be ignored, given our assumption that it is independent of $p(\theta_1)$, the prior for $\theta_1$. Therefore, the posterior moments for $\theta_1$ can be obtained using the importance sampling approach outlined in Appendix A with $h(D, \theta_1)\, k(D, \theta_1)$ as the likelihood function and the prior $p(\theta_1)$.

Appendix C. Obtaining the predictive moments and optimal weights

In the case of expected predictive utility (conditional on values of $py$ and $sd$), $E[U(R)]$, we would ideally want to compute $p(\theta_1) \equiv E[U(R) \mid \theta_1]$. Since this is not readily obtainable, for each draw of $\theta_1$, we randomly draw $R(\theta_1)$ from the normal density $f(R \mid \theta_1)$ and compute $U(R(\theta_1))$. The utilities generated in this manner are i.i.d. draws. The expectation of $U(R(\theta_1))$, conditional on $\theta_1$, is $E[U(R) \mid \theta_1]$ and the unconditional expectation is, by the law of iterated expectations, $E[U(R)]$. Thus, $U(R(\theta_1))$ plays the role of $p(\theta_1)$ in this context.

We also modify the procedure just described to incorporate antithetic sampling (see Bauwens, Lubrano, and Richard, 1999, pp. 75–76), which increases computational efficiency. Given $\theta_1$ and the fixed values of $py$ and $sd$, the mean and variance of the conditional normal distribution of returns are known. The unexpected return, $\varepsilon$, is drawn from a normal distribution with mean zero and the given variance, and is then added to the expected return. Then, an antithetic return is obtained by repeating the computation with $-\varepsilon$ in place of $\varepsilon$. Utility is computed for each pair of returns and the average serves as $U(R(\theta_1))$. The expectation of $U(R(\theta_1))$ is unchanged using this modification, but variability is reduced, improving the computations.
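The variance reduction from antithetic pairs can be illustrated with a short sketch. Several simplifications here are assumptions and do not come from the paper: the utility function is taken to be power (CRRA) utility with a hypothetical risk aversion of 5, the investor is fully invested in the index, and the conditional mean and standard deviation are fixed hypothetical numbers standing in for the values implied by a single parameter draw.

```python
import math
import random
import statistics


def crra_utility(gross_return, risk_aversion=5.0):
    """Power utility of terminal wealth; an assumed choice for illustration."""
    return gross_return ** (1.0 - risk_aversion) / (1.0 - risk_aversion)


def antithetic_utility(mean, sd, rng, utility=crra_utility):
    """Average utility over a return draw and its antithetic counterpart."""
    eps = rng.gauss(0.0, sd)                  # unexpected (log) return
    u_plus = utility(math.exp(mean + eps))    # original draw
    u_minus = utility(math.exp(mean - eps))   # antithetic draw uses -eps
    return 0.5 * (u_plus + u_minus)


def plain_utility(mean, sd, rng, utility=crra_utility):
    """Utility of a single ordinary draw, kept for comparison."""
    return utility(math.exp(mean + rng.gauss(0.0, sd)))


rng = random.Random(1)
mean, sd = 0.005, 0.045                       # hypothetical monthly values
plain = [plain_utility(mean, sd, rng) for _ in range(20000)]
paired = [antithetic_utility(mean, sd, rng) for _ in range(20000)]
var_plain = statistics.pvariance(plain)
var_paired = statistics.pvariance(paired)
```

Because utility is monotone in the unexpected return, the two members of each pair are negatively correlated, so `var_paired` should come out well below `var_plain` while the two sample means estimate the same expectation.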
To compute the optimal weight, the returns from 100,000 iterations are saved and an optimization routine is used, again incorporating the antithetic perspective. We then sample returns from the original 100,000 with replacement and compute an approximate weight based on Eq. (8). This is done 100 times, generating a series of 100 i.i.d. estimates of the original optimum. We confirm that the difference between the approximation in Eq. (8) and the exact optimal weight is consistently less than 0.01 for a range of cases. Similar bootstrap routines should prove useful in estimating optimal weights and evaluating precision in more complicated situations for which simple approximations are not available.

Appendix D. Summary of prior beliefs

The table summarizes the continuous prior beliefs for the parameters in the regression model:

$$
r_{t+1} = (\alpha - \kappa) + \kappa\,(\sigma_{\varepsilon t}^2 / c^2) + \beta\, py_t + \varepsilon_{t+1}.
$$

$r_{t+1}$ is the continuously compounded excess return on the NYSE value-weighted stock index and $py_t$ is the payout yield. The unexpected return, $\varepsilon_{t+1}$, is normally distributed with mean zero and conditional standard deviation
$\sigma_{\varepsilon t} = c \exp[\lambda_1 \log(sd_t) + \lambda_2\, py_t]$, where $sd_t$ is the monthly standard deviation estimate based on daily data within month $t$. Both $py$ and $\log(sd)$ are measured as deviations from long-run values. Since $\alpha = \gamma c$, its prior is conditional on $c$, the long-run level of risk. We assume that the priors of $c$ and $\gamma$ are independent.

Parameter     Prior distribution
$c$           $c \sim \mathrm{Log}N(0.06, 0.01)$
$\gamma$      $\gamma \sim N(0.111, 0.05)$
$\alpha$      $\alpha \mid c \sim N(c\,E[\gamma],\ c^2\,\mathrm{var}(\gamma))$
$\lambda_1$   $\lambda_1 \sim N(0.5, 0.5)$
$\lambda_2$   $\lambda_2 \sim N(10, 10)$
$\beta$       Overreaction prior (OP): $\beta \sim N(0.2, 0.1)$
              Neutral prior (NP): $\beta \sim N(0, 0.1)$
              Dogmatic no-mispricing prior: $\beta = 0$
$\kappa$      $\kappa \mid \alpha \sim N(\alpha, \alpha)$
              Dogmatic Merton proportionality: $\kappa = \alpha$
              Dogmatic no risk-return tradeoff: $\kappa = 0$
i.i.d.        Independent and identically distributed returns with diffuse priors
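As a rough illustration of how the regression model in this appendix maps parameters into conditional moments, the sketch below evaluates the volatility and mean equations for one month. It assumes the intercept is read as the long-run expected return minus the risk-return coefficient, so that the expected return equals the long-run value when risk is at its long-run level. The function name, the spelled-out argument names, and every numerical value are hypothetical and are not estimates or prior means from the paper.

```python
import math


def conditional_moments(alpha, beta, kappa, c, lam1, lam2, log_sd_dev, py_dev):
    """Return (expected excess return, conditional sd) for one month.

    log_sd_dev and py_dev are log(sd_t) and py_t measured as deviations
    from their long-run values, as stated in the appendix.
    """
    # Volatility equation: sigma = c * exp(lam1 * log(sd) + lam2 * py)
    sigma = c * math.exp(lam1 * log_sd_dev + lam2 * py_dev)
    # Mean equation: (alpha - kappa) + kappa * sigma^2 / c^2 + beta * py
    mean = (alpha - kappa) + kappa * (sigma ** 2 / c ** 2) + beta * py_dev
    return mean, sigma


# At the long-run values both deviations are zero, so sigma equals c and
# the expected return collapses to alpha whatever the value of kappa.
mean_lr, sigma_lr = conditional_moments(
    alpha=0.006, beta=0.2, kappa=0.004, c=0.05, lam1=0.5, lam2=10.0,
    log_sd_dev=0.0, py_dev=0.0)
```

Writing the intercept in this form separates the long-run level of expected return from its sensitivity to risk, which is why the prior on the risk-return coefficient in the table can be centered on the long-run expected return.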
References

Abel, A., 1988. Stock prices under the time-varying dividend risk: an exact solution in an infinite-horizon general equilibrium model. Journal of Monetary Economics 22, 375–393.
Attanasio, O., 1991. Risk, time-varying second moments and market efficiency. Review of Economic Studies 58, 479–494.
Avramov, D., 2002. Stock return predictability and model uncertainty. Journal of Financial Economics 64, 423–458.
Avramov, D., 2004. Stock return predictability and asset pricing models. Review of Financial Studies 17, 699–738.
Avramov, D., Chordia, T., 2006. Predicting stock returns. Journal of Financial Economics 82, 387–415.
Avramov, D., Wermers, R., 2006. Investing in mutual funds when returns are predictable. Journal of Financial Economics 81, 339–377.
Barberis, N., 2000. Investing for the long run when returns are predictable. Journal of Finance 55, 225–264.
Bauwens, L., Lubrano, M., Richard, J., 1999. Bayesian Inference in Dynamic Econometric Models. Oxford University Press, Oxford.
Black, F., 1976. Studies of stock price volatility changes. In: Proceedings of the 1976 Meetings of the Business and Economics Statistics Section, American Statistical Association, pp. 177–181.
Bossaerts, P., Hillion, P., 1999. Implementing statistical criteria to select return forecasting models: what do we learn? Review of Financial Studies 12, 405–428.
Boudoukh, J., Michaely, R., Richardson, M., Roberts, M., 2007. On the importance of measuring payout yield: implications for empirical asset pricing. Journal of Finance 62, 877–915.
Brandt, M., Kang, Q., 2004. On the relationship between the conditional mean and volatility of stock returns: a latent VAR approach. Journal of Financial Economics 72, 217–257.
Brandt, M., Goyal, A., Santa-Clara, P., Stroud, J., 2005. A simulation approach to dynamic portfolio choice with an application to learning about return predictability. Review of Financial Studies 18, 831–873.
Campbell, J., 1987. Stock returns and the term structure. Journal of Financial Economics 18, 373–399.
Campbell, J., Shiller, R., 1988. The dividend price ratio and expectations of future dividends and discount factors. Review of Financial Studies 1, 195–228.
Christie, A., 1982. The stochastic behavior of common stock variances: value, leverage and interest rate effects. Journal of Financial Economics 10, 407–432.
Cochrane, J., 2005. Asset Pricing, revised edition. Princeton University Press, New Jersey.
DeBondt, W., Thaler, R., 1985. Does the stock market overreact? Journal of Finance 40, 793–805.
De Long, J., Shleifer, A., Summers, L., Waldmann, R., 1990. Noise trader risk in financial markets. Journal of Political Economy 98, 703–738.
Fama, E., Schwert, W., 1977. Asset returns and inflation. Journal of Financial Economics 5, 115–146.
Fama, E., French, K., 1988. Dividend yields and expected stock returns. Journal of Financial Economics 22, 3–25.
Fama, E., French, K., 1989. Business conditions and expected returns on stocks and bonds. Journal of Financial Economics 25, 23–49.
Fleming, J., Kirby, C., Ostdiek, B., 2003. The economic value of volatility timing using "realized" volatility. Journal of Financial Economics 67, 473–509.
French, K., Schwert, W., Stambaugh, R., 1987. Expected stock returns and volatility. Journal of Financial Economics 19, 3–29.
Geweke, J., 1989. Bayesian inference in econometric models using Monte Carlo integration. Econometrica 57, 1317–1339.
Ghysels, E., Santa-Clara, P., Valkanov, R., 2005. There is a risk-return tradeoff after all. Journal of Financial Economics 76, 509–548.
Glosten, L., Jagannathan, R., Runkle, D., 1993. On the relation between the expected value and volatility of the nominal excess return on stocks. Journal of Finance 48, 1779–1801.
Goyal, A., Welch, I., 2003. Predicting the equity premium with dividend ratios. Management Science 49, 639–654.
Guo, H., Whitelaw, R., 2006. Uncovering the risk-return relation in the stock market. Journal of Finance 61, 1433–1463.
Handa, P., Tiwari, A., 2006. Does stock market predictability imply improved asset allocation and performance? Evidence from the U.S. stock market (1954–2002). Journal of Business 79, 2423–2468.
Harvey, C., 1989. Time-varying conditional covariances in tests of asset-pricing models. Journal of Financial Economics 24, 289–317.
Harvey, C., 2001. The specification of conditional expectations. Journal of Empirical Finance 8, 573–637.
Hentschel, L., 1995. All in the family: nesting symmetric and asymmetric GARCH models. Journal of Financial Economics 39, 71–104.
Johannes, M., Polson, N., Stroud, J., 2002. Sequential optimal portfolio performance: market and volatility timing. Unpublished working paper. Columbia University, University of Chicago, and University of Pennsylvania.
Jorion, P., Goetzmann, W., 1999. Global stock markets in the twentieth century. Journal of Finance 54, 953–980.
Kahneman, D., Tversky, A., 1979. Intuitive prediction: biases and corrective procedures. Management Science 12, 313–327.
Kandel, S., McCulloch, R., Stambaugh, R., 1995. Bayesian inference and portfolio efficiency. Review of Financial Studies 8, 1–53.
Kandel, S., Stambaugh, R., 1996. On the predictability of stock returns: an asset-allocation perspective. Journal of Finance 51, 385–424.
Keim, D., Stambaugh, R., 1986. Predicting returns in the stock and bond markets. Journal of Financial Economics 17, 357–390.
Lakonishok, J., Shleifer, A., Vishny, R., 1994. Contrarian investment, extrapolation, and risk. Journal of Finance 49, 1541–1578.
Lewellen, J., Shanken, J., 2002. Learning, asset-pricing tests, and market efficiency. Journal of Finance 57, 1113–1145.
Merton, R., 1973. An intertemporal capital asset pricing model. Econometrica 41, 867–887.
Merton, R., 1980. On estimating the expected return on the market: an exploratory investigation. Journal of Financial Economics 8, 323–361.
Opdyke, J., 2007. Comparing Sharpe ratios: so where are the p-values? Journal of Asset Management 8, 308–336.
Pastor, L., Stambaugh, R., 2000. Comparing asset pricing models: an investment perspective. Journal of Financial Economics 56, 335–381.
Pastor, L., Stambaugh, R., 2001. The equity premium and structural breaks. Journal of Finance 56, 1207–1239.
Rozeff, M., 1984. Dividend yields are equity risk premiums. Journal of Portfolio Management 11, 68–75.
Schwert, W., 1990. Indexes of U.S. stock prices from 1802–1987. Journal of Business 63, 399–442.
Scruggs, J., 1998. Resolving the puzzling intertemporal relation between the market risk premium and conditional market variance: a two-factor approach. Journal of Finance 53, 575–603.
Shiller, R., 2000. Irrational Exuberance. Princeton University Press, Princeton, NJ.
Stambaugh, R., 1999. Predictive regressions. Journal of Financial Economics 54, 375–421.
Tamayo, A., 2002. Stock return predictability, conditional asset pricing models and portfolio selection. Unpublished working paper. London School of Economics.
Wang, L., 2004. Investing for the short run when return volatility is predictable. Unpublished working paper. Singapore Management University.
Wachter, J., Warusawitharana, M., 2009. Predictable returns and asset allocation: should a skeptical investor time the market? Journal of Econometrics 148, 162–178.
Welch, I., Goyal, A., 2008. A comprehensive look at the empirical evidence of equity premium prediction. Review of Financial Studies 21, 1455–1508.
White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838.
Whitelaw, R., 1994. Time variations and covariations in the expectation and volatility of stock market returns. Journal of Finance 49, 515–541.
Zellner, A., 1971. An Introduction to Bayesian Inference in Econometrics. John Wiley and Sons, New York.