Balanced predictive regressions


Journal of Empirical Finance 54 (2019) 118–142


Journal of Empirical Finance journal homepage: www.elsevier.com/locate/jempfin

Balanced predictive regressions✩

Yu Ren a, Yundong Tu b, Yanping Yi c,∗

a Wenlan School of Business, Zhongnan University of Economics and Law, Wuhan, China
b Guanghua School of Management and Center for Statistical Science, Peking University, Beijing, China
c School of Economics and Academy of Financial Research, Zhejiang University, Hangzhou, China

ARTICLE INFO

JEL classification: G1, G12, C12, C22, C32

Keywords: Nearly integrated process; Predictive regression; Predictability; Stock return

ABSTRACT

In a predictive regression, a less persistent return series is regressed on the first lag of some highly persistent predictors. Therefore, predictability could often be missed due to the persistence imbalance. The aim of this paper is to balance the predictive regression by augmenting it with an additional lag of the predictors. This second lag generally reduces the persistence level on the right-hand side of the equation to achieve balance. We then propose a simple test procedure for univariate and multivariate predictive regressions, based on least squares estimation. Empirically, we reexamine the popular predictors in the literature and find quite different results.

1. Introduction

Following Dow (1920), there has been a wealth of literature exploring the predictability of stock returns. The conventional test of predictability is analyzed by the predictive regression model (Stambaugh, 1999; Campbell and Yogo, 2006),

$$y_t = \mu_y + \beta x_{t-1} + u_t, \qquad x_t = \mu_x + v_t, \qquad v_t = \alpha v_{t-1} + e_t, \tag{1}$$

where stock returns, $y_t$, are regressed on the first lagged predictor, $x_{t-1}$, which could be the dividend–price ratio (DP), the earnings–price ratio (EP), or others (see, e.g., Campbell (1987, 1991), Campbell and Shiller (1988), Fama (1991), Fama and French (1988, 1989), Hodrick (1992), Keim and Stambaugh (1986), and the references therein). These predictors typically manifest a high, yet unknown, degree of persistence (see, e.g., Campbell and Yogo (2006), Elliott and Stock (1994), Mankiw and Shapiro (1986) and Stambaugh (1999)), the largest autoregressive roots of which are very close to unity and vary with data frequencies. One could conclude that there is strong evidence for the predictability of $y_t$ by $x_{t-1}$ if a statistical test suggests that $\beta \neq 0$.

Two main pitfalls have not attracted sufficient attention in the predictive regression literature. First, the predictability of $y_t$ by a highly persistent $x_{t-1}$ ($|\alpha|$ close to one) in model (1) would imply that $y_t$ should have inherited a sufficient amount of persistence. However, $y_t$, such as asset returns, only displays minor autocorrelation. This imbalance in the persistence of the predictive regression (1) suggests that $\beta$, if it is not zero, should be rather small in magnitude and probably statistically insignificant. As a result, predictability is rarely detected in empirical results of analyses using predictive regression (see, e.g., Campbell and Yogo (2006)). Second, almost all test statistics in the literature involve an unknown nuisance parameter, which makes the test procedure very complicated in practice. These two pitfalls motivate us to reconsider what the appropriate model to uncover the predictability of stock returns is.

✩ The authors thank Peter Phillips, Jun Yu and participants of SETA 2014 at Academia Sinica (Taiwan), Conference on Recent Developments in Financial Econometrics and Applications at Deakin University (December 2014 in Melbourne, Australia), 2015 Guanghua Time Series Forum at Xi’an, and SETA 2016 at University of Waikato (New Zealand) for many valuable suggestions that helped improve the paper. Ren’s research is supported by the National Natural Science Foundation of China (71771192, 71301135, 71203189, 71131008); Tu thanks support from the National Natural Science Foundation of China (71301004, 71472007, 71532001, 71671002) and that from the Center for Statistical Science at Peking University; Yanping Yi acknowledges financial support from the National Natural Science Foundation of China (71973122).

∗ Correspondence to: School of Economics and Academy of Financial Research, Zhejiang University, Hangzhou, 310058, China. E-mail address: [email protected] (Y. Yi). URL: http://yanpingyi.weebly.com (Y. Yi).
https://doi.org/10.1016/j.jempfin.2019.09.001
Received 17 January 2019; Received in revised form 26 August 2019; Accepted 5 September 2019; Available online 12 September 2019
0927-5398/© 2019 Elsevier B.V. All rights reserved.

In this paper, we provide a simple solution that simultaneously addresses both issues. When a single highly persistent predictor is considered, we add a second lag of the predictor, $x_{t-2}$, to the predictive regression. Balance is achieved through the interaction between the two highly persistent lags. If we regress, for example, the excess return on both the first and the second lags of the predictor, then this will lead to an automatic (approximate) difference between the two lags that essentially functions as the predictor. Unlike the predictive regression, the resulting model will enable us to uncover evidence of predictability, if there is any, due to the persistence balance achieved through this differenced predictor. When the regressor is highly persistent, it is very likely that the approximately differenced predictor contains valuable information for forecasting stock returns (Lee et al., 2015). When the regressor is stationary, our proposed model nests the predictive regression as a special case and enables the discovery of predictability. The same idea can be applied to the multivariate predictive regression, when the predictors are not cointegrated.¹

The advocacy of balanced predictive regressions is our first contribution. Our second contribution is to propose simple test procedures for both univariate and multivariate predictive regressions, based on least squares estimation.
The test statistics can be constructed using any popular statistical software. Their asymptotic distributions are either normal or 𝜒 2 and therefore free of nuisance parameters. In contrast, most of the existing test procedures for predictive regressions involve non-standard and non-pivotal statistical inferences (see, e.g., Campbell and Yogo (2006), Chen et al. (2013), Jansson and Moreira (2006), and the references therein). Moreover, our tests are asymptotically valid under fairly general conditions on the predictors, allowing their largest autoregressive root to be unity, local to unity, moderately deviated from unity, or strictly less than unity. Furthermore, they are robust to the unknown persistence level of the predictors, which cannot be consistently estimated. These distinctions are crucial to practical model building and stock return predictability assessment. Our paper is closely related to a large body of literature that provides rigorous analysis on how to conduct valid inference in conventional predictive regressions. Elliott and Stock (1994) introduce an asymptotic framework to model the behavior of the persistent predictor, the autoregressive root of which is allowed to vary within a local (1∕𝑇 ) neighborhood of unity. Under this localto-unity asymptotics (see, e.g., Cavanagh (1985), Chan (1988), Chan and Wei (1987), Nabeya and Tanaka (1990), Phillips (1987) and the references therein for reviews of theoretical results), the finite sample distributions of test statistics can be approximated accurately when the predictor series is highly persistent. However, the test procedures are not asymptotically pivotal and are dependent on a nuisance parameter for the persistence level 𝑐 (see, e.g., Cavanagh et al. (1995), Campbell and Yogo (2006), Phillips (2013, 2014), Jansson and Moreira (2006) and Chen et al. (2013)). 
Since these existing procedures all share non-standard and non-pivotal inference that results from the inestimable persistence level 𝑐, they require intensive simulations to obtain the critical values. Predictive regressions with multiple predictors have been explored in the literature, but their theoretical underpinnings are considerably underdeveloped compared to the univariate predictor case. Related empirical studies include the work of Ang and Bekaert (2007), Baker et al. (2003), Baker and Stein (2004), Baker and Wurgler (2004), Fama and French (1988), Guo and Savickas (2006), Keim and Madhavan (2000), Keim and Stambaugh (1986), Lettau and Ludvigson (2001), and Pontiff and Schall (1998), among many others. Related theoretical investigations include Amihud and Hurvich (2004), Amihud et al. (2009) and Kostakis et al. (2015). The notion of balanced predictive regression was pioneered in the cointegration models analyzed by Phillips and Magdalinos (2009). They develop an extended IV procedure that enjoys attractive asymptotic features such as a standard chi-square test with no precise knowledge of the persistence level of the regressors. In recent contributions, Kostakis et al. (2015) and Phillips (2013) apply this IV procedure to the predictive regression model and show that the standard statistical properties are inherited. Compared to the IV procedure, our proposed method has the advantage of simple implementation via least squares estimation and avoids the construction of instruments and the estimation of complicated terms involving variance matrices (see, e.g., Phillips (2013)). Furthermore, the critical values for our testing procedures are tabulated in the standard normal and 𝜒 2 distributions, which is obviously appealing to financial practitioners. As an illustration of our method, we collect monthly and quarterly S&P 500 return data from 1927 to 2015. 
Following Welch and Goyal (2008), we use ten highly persistent variables and combinations of them as potential predictors. For monthly data, we find that there is always predictability. Many predictors and their combinations can predict stock returns. For quarterly data, the predictability still exists, but the signal becomes weak in the recent periods. Predictability almost disappears in the post-1952 quarterly data. Moreover, we compare our results with the IVX test in Kostakis et al. (2015). We find that there are important differences with respect to which predictors are significant and at which level.

The remainder of this paper is organized as follows. Section 2 presents our proposed model and the theoretical results for inference. In Section 3, we evaluate the finite sample performances of the proposed methods through simulation studies. Section 4 revisits the empirical analysis of asset return predictability. Section 5 concludes the paper. All the mathematical proofs are collected in Appendix A. Appendix B includes all the figures. In this paper, we use ‘‘$\overset{p}{\longrightarrow}$’’ and ‘‘$\overset{d}{\longrightarrow}$’’ to denote convergence in probability and convergence in distribution, respectively. For any positive integer $m$, $I_m$ denotes the $m \times m$ identity matrix.

1 When the multiple predictors are cointegrated, the balance of the regression equation is achieved without adding the second lag of the predictors. See Remark 1 for detailed analysis. However, in this case, adding the second lag does not harm the regression. By doing so, we only lose efficiency.

2. Balanced predictive regressions with cointegrated predictors

2.1. Preliminaries

We address the predictability of stock returns from the perspective that the predictive regression model should be balanced if one seeks to uncover predictability. To this end, we propose a simple balanced model by adding an additional lag of the predictors to the regression equation. Balance is achieved through the ‘‘cointegration’’ between the two highly persistent lags. An easy-to-implement test procedure is proposed that uses least squares estimation and the standard critical values. In this paper, we focus on the case where the predictors are highly persistent. We first introduce the notions of high persistence and cointegration (in a more general sense).

Definition 1 (Highly Persistent Predictors). The scalar predictor series $\{x_t\}$ is highly persistent if its largest autoregressive root $\alpha = 1 - c/T^{\gamma}$, where $c \geq 0$ and $\gamma \in (0, 1]$.

Definition 1 covers the unit root case ($c = 0$), the local-to-unity asymptotic framework ($\gamma = 1$)² and the moderate deviations from a unit root asymptotic framework ($\gamma \in (0, 1)$) (see, e.g., Phillips and Magdalinos (2007)). Therefore, Definition 1 bridges the very different convergence rates of the highly persistent case and the unit root case.

Definition 2 (Cointegration for Highly Persistent Time Series). The $k$-dimensional time series $\{X_t\}$ is said to be cointegrated if each of the series is highly persistent, while $\exists\, a \neq 0$, $a \in \mathbb{R}^k$, such that $a' X_t$ is stationary.

In this paper, a scalar time series $\{x_t\}$ is stationary if its largest autoregressive root $\alpha < 1$ is fixed. Definition 2 extends the classical definition of cointegration (see, e.g., Hamilton (1994) and Johansen (1995)) to highly persistent time series. If $X_t = (x_t, x_{t-1})'$, where $x_t$ is a highly persistent scalar predictor, then with $a = (1, -1)'$, $x_t$ is cointegrated with its own lag.
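As a small numerical illustration of Definitions 1 and 2, the sketch below computes $\alpha = 1 - c/T^{\gamma}$ and compares the sample autocorrelation of a simulated highly persistent series with that of its first difference $x_t - x_{t-1}$. The helper names (`ar_root`, `simulate_ar1`, `lag1_autocorr`), the seed, and the parameter values are illustrative assumptions, not part of the paper.

```python
import numpy as np


def ar_root(c, gamma, T):
    """Largest autoregressive root alpha = 1 - c / T**gamma from Definition 1."""
    if c < 0 or not (0 < gamma <= 1):
        raise ValueError("Definition 1 requires c >= 0 and gamma in (0, 1]")
    return 1.0 - c / T**gamma


def simulate_ar1(alpha, T, rng):
    """Simulate x_t = alpha * x_{t-1} + e_t with standard normal e_t and x_0 = 0."""
    e = rng.standard_normal(T)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = alpha * x[t - 1] + e[t]
    return x


def lag1_autocorr(series):
    """Sample first-order autocorrelation of a demeaned series."""
    s = series - series.mean()
    return float(s[1:] @ s[:-1] / (s @ s))


T = 1000
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
alpha = ar_root(c=5.0, gamma=1.0, T=T)  # local-to-unity case (gamma = 1)
x = simulate_ar1(alpha, T, rng)

# The level is strongly autocorrelated, while the combination a'X_t with
# a = (1, -1)', i.e. x_t - x_{t-1}, behaves almost like white noise.
level_ac = lag1_autocorr(x)
diff_ac = lag1_autocorr(np.diff(x))
print(f"alpha={alpha:.3f}  level autocorr={level_ac:.3f}  diff autocorr={diff_ac:.3f}")
```

The contrast between the two autocorrelations is the persistence imbalance that the second lag is meant to remove.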

2.2. The general setting

We next provide a general framework of the balanced predictive regressions with ‘‘cointegrated’’ predictors in order to derive the asymptotic results and propose the testing procedures. In applied work, we often find that popularly used predictors are not cointegrated. Denote the lagged non-cointegrated predictors as $Z_{t-1} = (Z_{1,t-1}, \ldots, Z_{m,t-1})'$. If we regress stock returns $y_t$ on $Z_{t-1}$, then the regression is not balanced. We propose a general balanced predictive regression by adding one more lag:

$$
\begin{aligned}
y_t &= \mu_y + \beta_1' Z_{t-1} + \beta_2' Z_{t-2} + u_t, \\
Z_t &= \mu_z + v_t, \qquad v_t = \left(I_m - \frac{C}{T^{\gamma}}\right) v_{t-1} + e_t, \qquad
\begin{pmatrix} u_t \\ e_t \end{pmatrix} \overset{\text{i.i.d.}}{\sim} \left(0_{m+1}, \begin{bmatrix} \sigma_u^2 & \rho' \\ \rho & \Omega_{ee} \end{bmatrix}\right),
\end{aligned} \tag{2}
$$

where $C$ is an $m \times m$ matrix, $\gamma \in (0, 1]$, and $\Omega_{ee}$ is of full rank $m$. In regression equation (2), $Z_{t-1}$ is cointegrated with its own lag, $Z_{t-2}$. Since stock returns are stationary and feature moderate autocorrelation, model (2) suggests the (approximate) difference of $Z_{t-1}$ as the predictor. Predictability is then tested by examining whether $\beta_1 \neq 0$ or $\beta_2 \neq 0$ statistically.

Predictive regressions with multiple predictors ($m > 1$) have been explored in the literature, but the theoretical underpinnings are considerably underdeveloped compared to the univariate predictor case ($m = 1$). We study the multivariate regression in this section, including the univariate predictor as a special case. A detailed description of the test procedures is provided following Theorem 1.

Denote $\hat\beta_1 = [\hat\beta_1^{(1)}, \ldots, \hat\beta_1^{(m)}]'$ and $\hat\beta_2 = [\hat\beta_2^{(1)}, \ldots, \hat\beta_2^{(m)}]'$ as the OLS estimators of $\beta_1 = [\beta_1^{(1)}, \ldots, \beta_1^{(m)}]'$ and $\beta_2 = [\beta_2^{(1)}, \ldots, \beta_2^{(m)}]'$ from our balanced equation (2). We define $X_{t-1} = [Z_{t-1}', Z_{t-2}']'$, and denote $X_{t-1,\bar\mu} = X_{t-1} - \frac{1}{T-2}\sum_{s=3}^{T} X_{s-1}$ as the sample mean-corrected series of $X_{t-1}$, for $t = 3, \ldots, T$. Theorem 1 summarizes their asymptotic properties.

Theorem 1. Assume that $\{(y_t, Z_t) : 1 \leq t \leq T\}$ are generated by model (2). $Z_t$ is highly persistent but not cointegrated. The $m \times m$ matrix $\Omega_{ee}$ defined in Eq. (2) is nonsingular. Then, as $T \to \infty$:

$$\sqrt{T}\,(\hat\beta_1 - \beta_1) \overset{d}{\longrightarrow} \psi, \qquad \sqrt{T}\,(\hat\beta_2 - \beta_2) \overset{d}{\longrightarrow} -\psi,$$

where

$$\psi_T \equiv \left(\sum_{t=2}^{T} \Delta Z_{t-1,\bar\mu}\, \Delta Z_{t-1,\bar\mu}'\right)^{-1} \sqrt{T} \sum_{t=2}^{T} \Delta Z_{t-1,\bar\mu}\, u_t \overset{d}{\longrightarrow} \psi \sim \mathcal{N}\left(0,\; \sigma_u^2 \left(Var\left(\Delta Z_{t-1}\right)\right)^{-1}\right).$$

And as $T \to \infty$:

$$\hat\sigma_u^2 \left[\frac{\sum_{t=2}^{T} X_{t-1,\bar\mu} X_{t-1,\bar\mu}'}{T}\right]^{-1} \overset{p}{\longrightarrow}
\begin{bmatrix}
\sigma_u^2 \left(Var\left(\Delta Z_{t-1}\right)\right)^{-1} & -\sigma_u^2 \left(Var\left(\Delta Z_{t-1}\right)\right)^{-1} \\
-\sigma_u^2 \left(Var\left(\Delta Z_{t-1}\right)\right)^{-1} & \sigma_u^2 \left(Var\left(\Delta Z_{t-1}\right)\right)^{-1}
\end{bmatrix},$$

where $\hat\sigma_u^2$ is any consistent estimate of $\sigma_u^2$.

2 Under the local-to-unity asymptotic framework 𝛼 = 1 − 𝑐∕𝑇 , the established limiting result also holds for 𝑐 taking negative values (i.e., the predictor series is mildly explosive in finite samples of data).
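The degeneracy in Theorem 1 (the limits of the two scaled estimators are $\psi$ and $-\psi$) can be checked informally by simulation. The sketch below is a minimal Monte Carlo for the univariate case $m = 1$, under assumptions that are ours rather than the paper's: zero intercepts, $\rho = 0$, $\sigma_u = \sigma_e = 1$, $c = 5$, $\gamma = 1$, and arbitrary values for $\beta_1$, $\beta_2$, the seed, and the number of replications.

```python
import numpy as np


def simulate_model(T, beta1, beta2, c, gamma, rng):
    """One draw from model (2) with m = 1, zero intercepts, and rho = 0 (assumed)."""
    alpha = 1.0 - c / T**gamma
    e = rng.standard_normal(T)
    u = rng.standard_normal(T)
    z = np.zeros(T)
    for t in range(1, T):
        z[t] = alpha * z[t - 1] + e[t]
    y = np.zeros(T)
    y[2:] = beta1 * z[1:-1] + beta2 * z[:-2] + u[2:]
    return y, z


def balanced_ols(y, z):
    """OLS of y_t on a constant, z_{t-1}, and z_{t-2}; returns the two slopes."""
    X = np.column_stack([np.ones(len(z) - 2), z[1:-1], z[:-2]])
    coef, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
    return coef[1], coef[2]


rng = np.random.default_rng(42)
T, beta1, beta2, reps = 500, 0.2, -0.2, 300
draws = np.array([
    balanced_ols(*simulate_model(T, beta1, beta2, c=5.0, gamma=1.0, rng=rng))
    for _ in range(reps)
])
scaled = np.sqrt(T) * (draws - np.array([beta1, beta2]))

# Theorem 1 suggests the two scaled errors are nearly perfectly negatively
# correlated, each with standard deviation close to sigma_u / sigma_e = 1 here.
corr = np.corrcoef(scaled[:, 0], scaled[:, 1])[0, 1]
sd1 = scaled[:, 0].std()
print(f"corr = {corr:.3f}, sd of sqrt(T)(b1_hat - b1) = {sd1:.3f}")
```

In finite samples the correlation is close to, but not exactly, minus one; the theorem describes the limit only.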


Testing procedure:

(i) Single test: We reject the null hypothesis of no predictability of $y_t$ by the $j$th predictor $Z_{j,t-1}$ ($H_0 : \beta_1^{(j)} = \beta_2^{(j)} = 0$), if $\max\left\{ \left|t_{\hat\beta_1^{(j)}}\right|, \left|t_{\hat\beta_2^{(j)}}\right| \right\} > Z_{1-\eta/2}$, where $\eta$ is the significance level, and $Z_{1-\eta/2}$ is the corresponding quantile of the standard normal distribution. We shall discuss more choices of the test statistics in Section 2.3.1.

(ii) Joint test: Denote the $m \times 2m$ matrices $R_1 = [I_m \;\; 0_{m \times m}]$ and $R_2 = [0_{m \times m} \;\; I_m]$. Define the associated Wald statistics:

$$Wald_1 = \left(\hat\sigma_u^2\right)^{-1} \hat\beta_1' \left( R_1 \left[\sum_{t=2}^{T} X_{t-1,\bar\mu} X_{t-1,\bar\mu}'\right]^{-1} R_1' \right)^{-1} \hat\beta_1 \overset{d}{\longrightarrow} \tilde\chi, \tag{3}$$

$$Wald_2 = \left(\hat\sigma_u^2\right)^{-1} \hat\beta_2' \left( R_2 \left[\sum_{t=2}^{T} X_{t-1,\bar\mu} X_{t-1,\bar\mu}'\right]^{-1} R_2' \right)^{-1} \hat\beta_2 \overset{d}{\longrightarrow} \tilde\chi, \tag{4}$$

where $\tilde\chi \sim \chi^2(m)$. The asymptotic distributions in Eqs. (3) and (4) are obtained using Theorem 1. We reject the null hypothesis of no joint predictability ($H_0 : \beta_1 = \beta_2 = 0$) if either $Wald_1 > \chi_{1-\eta}$ or $Wald_2 > \chi_{1-\eta}$, where $\eta$ is the significance level, and $\chi_{1-\eta}$ is the corresponding quantile of $\chi^2(m)$. Essentially, we are using $Wald_{\max} = \max\{Wald_1, Wald_2\}$ as the test statistic. In addition, the asymptotically degenerate distributions in Eqs. (3) and (4) also suggest that a weighted average (the sum of positive weights equals 1) of $Wald_1$ and $Wald_2$ can be used for testing joint predictability. The decision rules for those tests are the same; however, the local powers are different. In finite samples, the test based on $Wald_{\max}$ is better in terms of power.

Remark 1 (Return Prediction with Cay). Our balanced model can also be used to test for predictability when the multivariate predictors are cointegrated. We consider the example adopted from Lettau and Ludvigson (2001). They find that (log) aggregate consumption ($c_t$), asset holdings ($a_t$), and labor income ($y_t$) share a common long-term trend ($cay_t$), and this trend is a strong univariate predictor of future stock returns ($r_t$), despite the fact that the individual growth rates of consumption, labor income, and wealth bear little relationship to future stock returns. To fix the idea, we consider the following multiple-predictor regression:

$$r_t = \alpha + \beta_c c_{t-1} + \beta_a a_{t-1} + \beta_y y_{t-1} + u_t. \tag{5}$$

This regression is different from what is considered in Lettau and Ludvigson (2001). They use the (estimated) $cay_t$ as a single predictor, while model (5) directly includes the three predictors in the regression without aggregation. For ease of illustration, we exclude the other predictors used by Lettau and Ludvigson (2001). We note that model (5) is balanced due to the cointegration among the three predictors. Testing the null hypothesis of nonpredictability $H_0 : \beta_c = \beta_a = \beta_y = 0$ against the alternative $H_1 : \beta_c \neq 0$ or $\beta_a \neq 0$ or $\beta_y \neq 0$ in model (5) is equivalent to testing the corresponding hypothesis concerning the coefficient of $cay_{t-1}$ in the regression of $r_t$ on $cay_{t-1}$. We will show that the joint asymptotic distribution of the least squares estimator of $\beta_c$, $\beta_a$, and $\beta_y$ is degenerately normal. The associated $t$ statistic is asymptotically standard normal.³

Denote $X_{t-1} = [c_{t-1}, a_{t-1}, y_{t-1}]'$ and $\beta = [\beta_c, \beta_a, \beta_y]'$. Provided that there is one cointegration relationship among $c_t$, $a_t$, and $y_t$, $\beta$ is proportional to the cointegration vector $\vec{a}$ (defined in Assumption 1 in Appendix A), i.e., $\beta = b_0 \vec{a}$ and $b_0$ is a scalar. Denote $\hat\beta = [\hat\beta_c, \hat\beta_a, \hat\beta_y]'$ as the OLS estimator of $\beta = [\beta_c, \beta_a, \beta_y]'$ from the regression equation (5). We have, as $T \to \infty$,

$$\sqrt{T}\left(\hat\beta - \beta\right) \overset{d}{\longrightarrow} \vec{a}\,\zeta, \tag{6}$$

where $\zeta \sim \mathcal{N}\left(0, \sigma_u^2 \Sigma^{-1}\right)$ is defined in Eq. (13), $\sigma_u^2 = Var(u_t)$, and $\Sigma$ (a positive scalar in the ‘‘cay’’ case) is defined in Assumption 2(ii) in Appendix A. This limiting distribution directly follows from Lemma 1 in Appendix A. Eq. (6) further implies that

$$\left|t_{\hat\beta_c}\right| = \left|t_{\hat\beta_a}\right| = \left|t_{\hat\beta_y}\right|, \quad \text{almost surely as } T \to \infty,$$

under $H_0 : \beta = [\beta_c, \beta_a, \beta_y]' = 0$.

We reject the null hypothesis of no predictability if $\max\left\{ \left|t_{\hat\beta_c}\right|, \left|t_{\hat\beta_a}\right|, \left|t_{\hat\beta_y}\right| \right\} > Z_{1-\eta/2}$, where $t_{\hat\beta_c}$, $t_{\hat\beta_a}$, $t_{\hat\beta_y}$ are the usual $t$ statistics computed from any statistical software, $\eta$ is the significance level, and $Z_{1-\eta/2}$ is the corresponding quantile of the standard normal distribution. Eq. (6) indicates that this testing procedure has an asymptotically correct size.
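Returning to the joint test for non-cointegrated predictors, the statistics in Eqs. (3) and (4) reduce to a few matrix operations. The following is a minimal sketch, not the authors' code: the function name `wald_statistics`, the simulated two-predictor design, and the residual-based estimate of $\sigma_u^2$ (one admissible consistent estimate) are assumptions made for illustration. The constant 5.991 is the 0.95 quantile of $\chi^2(2)$.

```python
import numpy as np


def wald_statistics(y, Z):
    """Wald_1, Wald_2, and Wald_max in the spirit of Eqs. (3) and (4)."""
    y = np.asarray(y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    m = Z.shape[1]
    X = np.hstack([Z[1:-1], Z[:-2]])      # each row holds [Z_{t-1}', Z_{t-2}']
    Xc = X - X.mean(axis=0)               # sample mean-corrected regressors
    yc = y[2:] - y[2:].mean()
    S_inv = np.linalg.inv(Xc.T @ Xc)
    beta = S_inv @ Xc.T @ yc              # stacked [beta1_hat, beta2_hat]
    resid = yc - Xc @ beta
    sigma2 = resid @ resid / (len(yc) - 2 * m - 1)
    b1, b2 = beta[:m], beta[m:]
    # R1 S^{-1} R1' and R2 S^{-1} R2' are the diagonal m x m blocks of S^{-1}.
    wald1 = float(b1 @ np.linalg.solve(S_inv[:m, :m], b1) / sigma2)
    wald2 = float(b2 @ np.linalg.solve(S_inv[m:, m:], b2) / sigma2)
    return wald1, wald2, max(wald1, wald2)


# Illustrative data (assumed design): two independent near-unit-root predictors.
rng = np.random.default_rng(7)
T, m = 500, 2
alpha = 1.0 - 5.0 / T
e = rng.standard_normal((T, m))
u = rng.standard_normal(T)
Z = np.zeros((T, m))
for t in range(1, T):
    Z[t] = alpha * Z[t - 1] + e[t]
beta1 = np.array([0.3, 0.0])
beta2 = -beta1                            # only the change in Z_1 predicts y
y = np.zeros(T)
y[2:] = Z[1:-1] @ beta1 + Z[:-2] @ beta2 + u[2:]

CHI2_95_DF2 = 5.991                       # 0.95 quantile of chi-square(2)
wald1, wald2, wald_max = wald_statistics(y, Z)
print(f"Wald1={wald1:.2f}  Wald2={wald2:.2f}  reject H0: {wald_max > CHI2_95_DF2}")
```

Because the two statistics are asymptotically equal under the theorem, taking the maximum does not change the critical value; it only affects finite-sample and local power.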

2.3. Linkage to inference in predictive regressions

Since the literature of predictability mostly focuses on the univariate case, we specialize Theorem 1 to the univariate predictor and discuss the connection between our balanced regression and the conventional literature.

3 Attention has to be paid to the predictive regression of $r_t$ on the univariate predictor $c_{t-1}$, $a_{t-1}$, or $y_{t-1}$, one by one. This is the univariate predictive regression that has been analyzed intensively in the literature. The inference with the predictive regression is nonstandard and involves simulations to obtain the critical values. See Chen et al. (2013) and references therein for detailed discussions. Prior findings indicate that tests based on these univariate predictive regressions often lead to the rejection of predictability due to imbalance in the persistence level.

Suppose that the observed data, $\{(y_t, x_t) : 1 \leq t \leq T\}$, are generated by

$$
\begin{aligned}
y_t &= \mu_y + \beta_1 x_{t-1} + \beta_2 x_{t-2} + u_t, \\
x_t &= \mu_x + v_t, \qquad v_t = \left(1 - \frac{c}{T^{\gamma}}\right) v_{t-1} + e_t, \qquad
\left(u_t, e_t\right)' \overset{\text{i.i.d.}}{\sim} \left(0, \begin{pmatrix} \sigma_u^2 & \rho\sigma_u\sigma_e \\ \rho\sigma_u\sigma_e & \sigma_e^2 \end{pmatrix}\right),
\end{aligned} \tag{7}
$$

where $c \geq 0$ and $\gamma \in (0, 1]$. When considering a single predictor, Eq. (7) provides a simple balanced model to uncover predictability, if there is any. Eq. (7) differs from the conventional predictive regression Eq. (1) in that a second lag, $x_{t-2}$, is included as a regressor. To understand the mechanism behind this, consider the case, for example, in which the predictor $x_t$ has a unit root (i.e., $\alpha = 1$). It is unlikely that Eq. (1) would suggest any predictability, since a unit root process is not expected to provide any level effect on the return series, which is often characterized as a white noise process. However, it is possible that the difference of the unit root process, $e_{t-1}$, may contain valuable information in forecasting the return series. This transient effect⁴ can be uncovered in our proposed Eq. (7) while it is missed in the predictive regression Eq. (1). Our proposed regression model becomes balanced since $e_{t-1}$, the change in $x_{t-1}$, will function as the essential predictor of $y_t$. When $x_t$ has an autoregressive root close to unity, our balanced regression will automatically use the (approximate) difference of $x_{t-1}$ as the predictor. Predictability is uncovered using our balanced predictive regression if the data suggest either $\beta_1 \neq 0$ or $\beta_2 \neq 0$.

Denote $\hat\beta_s$ as the OLS estimator of $\beta_s$, $s = 1, 2$, from our balanced model (7). As a special case of Theorem 1, Theorem 2(i) rewrites the result for the univariate model.

Theorem 2. Assume that $\{(y_t, x_t) : 1 \leq t \leq T\}$ are generated by model (7).

(i) If $x_t$ is highly persistent, then as $T \to \infty$:

$$\sqrt{T}\,(\hat\beta_1 - \beta_1) \overset{d}{\longrightarrow} \frac{\sigma_u}{\sigma_e}\,\mathcal{Z}, \qquad \sqrt{T}\,(\hat\beta_2 - \beta_2) \overset{d}{\longrightarrow} -\frac{\sigma_u}{\sigma_e}\,\mathcal{Z},$$

where $\mathcal{Z} \sim \mathcal{N}(0, 1)$ is defined in Eq. (18).

(ii) If $x_t$ is stationary ($\alpha < 1$ is fixed), then as $T \to \infty$:

$$\sqrt{T}\,(\hat\beta_1 - \beta_1) \overset{d}{\longrightarrow} \frac{\sigma_u}{\sigma_e}\,\mathcal{Z}, \qquad \sqrt{T}\,(\hat\beta_2 - \beta_2) \overset{d}{\longrightarrow} \frac{\sigma_u}{\sigma_e}\left(-\alpha\,\mathcal{Z} + \sqrt{1 - \alpha^2}\,\tilde{\mathcal{Z}}\right),$$

where $\mathcal{Z}$ and $\tilde{\mathcal{Z}}$ are independent standard normal random variables.

For highly persistent predictors, Theorem 2(i) states that the joint distribution of $(\hat\beta_1, \hat\beta_2)$ is asymptotically degenerate (normal) and converges at the standard rate, $\sqrt{T}$. However, this asymptotically degenerate distribution does not depend on the nuisance parameter, $c$, unlike those obtained in the conventional predictive regression (see, e.g., Campbell and Yogo (2006), Jansson and Moreira (2006) and Chen et al. (2013) and the references therein for detailed reviews). Therefore, the subsequent test procedures avoid using simulations to obtain the critical values and are largely simplified. Theorem 2(ii) states the asymptotic property for the stationary predictor case. The limiting distributions are standard but different from those derived in (i). Therefore, practical applications of these results would rely on a judgment regarding whether the predictor is stationary or highly persistent.

Remark 2. If $y_t$ (e.g., stock returns) is stationary, and the predictor $x_t$ is highly persistent, then empirical results (reported in Section 4) often imply $\beta_1 \approx -\beta_2$, suggesting the (approximate) difference of $x_{t-1}$ as the predictor. However, Theorem 2(i) states that the asymptotic properties of $(\hat\beta_1, \hat\beta_2)$ are free of the true value $(\beta_1, \beta_2)$. Therefore, our testing procedure based on Theorem 2 remains valid when $\beta_1 \neq -\beta_2$.

2.3.1. Testing procedure

According to Theorem 2, we separate the tests for highly persistent predictors from those for stationary predictors.

(i) Highly persistent predictor: We reject the null hypothesis of non-predictability ($H_0 : \beta_1 = \beta_2 = 0$) if either $\left|t_{\hat\beta_1}\right| > Z_{1-\eta/2}$ or $\left|t_{\hat\beta_2}\right| > Z_{1-\eta/2}$, where $t_{\hat\beta_s}$ ($s = 1, 2$) is the usual $t$ statistic of $\hat\beta_s$ computed using any statistical software, $\eta$ is the significance level, and $Z_{1-\eta/2}$ is the corresponding quantile of the standard normal distribution. In addition, the asymptotically degenerate distribution in Theorem 2(i) suggests that under $H_0 : \beta_1 = \beta_2 = 0$, $\left|\sqrt{T}\hat\beta_1\right| = \left|\sqrt{T}\hat\beta_2\right|$ almost surely as $T \to \infty$. Therefore,

$$\left|t_{\hat\beta_1}\right| = \left|t_{\hat\beta_2}\right|, \quad \text{almost surely as } T \to \infty. \tag{8}$$

Thus more asymptotically valid test procedures can be constructed. The testing procedure we propose above is based on the test statistic $t_{\max} = \max\left\{ \left|t_{\hat\beta_1}\right|, \left|t_{\hat\beta_2}\right| \right\}$, which can detect either direction of deviation from $H_0$, $\beta_1 \neq 0$ or $\beta_2 \neq 0$. We could also employ a weighted average (the sum of positive weights equals 1) of $\left|t_{\hat\beta_1}\right|$ and $\left|t_{\hat\beta_2}\right|$.⁵ The decision rules for those tests are the same, due to the asymptotically degenerate distribution in Theorem 2(i) or Eq. (8). However, their local powers differ. In finite samples, the test based on $t_{\max}$ is better in terms of power.

(ii) Stationary predictor ($|\alpha| < 1$ is fixed): The asymptotic joint distribution in Theorem 2(ii) is standard. Therefore, conventional $F$ or Wald statistics can be used to test predictability.

Although we provide different test strategies for highly persistent predictors and stationary predictors, these procedures are much simpler than those involved in conventional predictive regression. In the following subsection, we elaborate on the connection and the distinction between our proposed inference procedure and that based on conventional predictive regression, when the predictor $x_t$ is highly persistent.

4 We thank Peter C.B. Phillips for his insightful discussion of the level effect and the transient effect on predictability.

5 The testing procedure based on $t_{\min} = \min\left\{ \left|t_{\hat\beta_1}\right|, \left|t_{\hat\beta_2}\right| \right\}$ has the asymptotically correct size but fails to detect the alternatives $\beta_1 \neq 0, \beta_2 = 0$ and $\beta_1 = 0, \beta_2 \neq 0$.
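The $t_{\max}$ procedure for a highly persistent predictor can be sketched in a few lines of NumPy. This is an illustrative implementation under our own assumptions, not the authors' code: the helper names (`ols_tstats`, `t_max_test`, `conventional_t`), the simulated design with $\beta_1 = -\beta_2 = 0.3$, and the seed are hypothetical. The default critical value 1.96 is the 0.975 quantile of the standard normal distribution, i.e. $\eta = 5\%$.

```python
import numpy as np


def ols_tstats(y, X):
    """Conventional OLS t statistics for every column of X."""
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return coef / np.sqrt(sigma2 * np.diag(XtX_inv))


def t_max_test(y, x, crit=1.96):
    """Balanced test: regress y_t on a constant, x_{t-1}, x_{t-2}; compare t_max to crit."""
    X = np.column_stack([np.ones(len(x) - 2), x[1:-1], x[:-2]])
    _, t1, t2 = ols_tstats(y[2:], X)
    t_max = max(abs(t1), abs(t2))
    return {"t1": float(t1), "t2": float(t2),
            "t_max": float(t_max), "reject": bool(t_max > crit)}


def conventional_t(y, x):
    """t statistic on x_{t-1} in the one-lag predictive regression (1), same sample."""
    X = np.column_stack([np.ones(len(x) - 2), x[1:-1]])
    return float(ols_tstats(y[2:], X)[1])


# Simulated example (assumed design): only the change in x predicts y.
rng = np.random.default_rng(123)
T = 250
alpha = 1.0 - 5.0 / T
e, u = rng.standard_normal(T), rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = alpha * x[t - 1] + e[t]
y = np.zeros(T)
y[2:] = 0.3 * x[1:-1] - 0.3 * x[:-2] + u[2:]

result = t_max_test(y, x)
print(result)
print("one-lag t statistic:", conventional_t(y, x))
```

Printing the one-lag $t$ statistic next to $t_{\max}$ gives a rough sense of how much of the signal the conventional regression can pick up in a single sample; a proper comparison would require repeated simulation.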

2.3.2. Linkages between balanced regressions and predictive regressions

Assume that data are generated from model (7), which nests the conventional predictive regression (1) as a special case, and that the predictor is highly persistent. If one analyzes the predictability of stock returns using the predictive regression model (1), Proposition 1 shows how the estimators of the parameters of interest behave asymptotically.

Proposition 1. $\{(y_t, x_t) : 1 \leq t \leq T\}$ are generated by model (7). In addition, $(u_t, e_t)$ is bivariate normally distributed. If $x_t$ is highly persistent, then $y_t$ can be expressed as

$$y_t = \mu_y + \left(\beta_1 + \beta_2\right) x_{t-1} + u_t - \beta_2 e_{t-1} + O\!\left(\frac{1}{T^{\gamma}}\right) x_{t-1} + O_p\!\left(\frac{1}{T^{\gamma}}\right). \tag{9}$$

(i) Under the local-to-unity asymptotic framework, $\alpha = 1 - \frac{c}{T}$, as $T \to \infty$,

$$T\left(\widehat{\beta_1 + \beta_2} - \beta_1 - \beta_2\right) \overset{d}{\longrightarrow} \rho\,\frac{\sigma_u}{\sigma_e}\, \frac{\int_0^1 J_c(r)\,dW(r) - \int_0^1 J_c(r)\,dr\; W(1)}{\int_0^1 [J_c(r)]^2\,dr - \left[\int_0^1 J_c(r)\,dr\right]^2} + \left(1 - \rho^2\right)^{1/2} \frac{\sigma_u}{\sigma_e}\, \frac{\mathcal{Z}}{\left[\int_0^1 [J_c(r)]^2\,dr - \left[\int_0^1 J_c(r)\,dr\right]^2\right]^{1/2}},$$

where $\mathcal{Z}$ is a standard normal variable independent of $\{x_t\}$, and $W(\cdot)$, $J_c(\cdot)$ are defined in Appendix A.

(ii) Under the moderate deviations from a unit root asymptotic framework, $\alpha = 1 - \frac{c}{T^{\gamma}}$, $\gamma \in (0, 1)$, as $T \to \infty$,

$$T^{\frac{1+\gamma}{2}}\left(\widehat{\beta_1 + \beta_2} - \beta_1 - \beta_2\right) \overset{d}{\longrightarrow} \frac{\sigma_u}{\sigma_e}\sqrt{2c}\;\mathcal{Z},$$

where $\mathcal{Z} \sim \mathcal{N}(0, 1)$ is defined in Appendix A.
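The representation in Eq. (9) is stated without intermediate steps. The lines below are a reader's sketch of the algebra, using only the definitions in model (7); the exact form of the remainder terms is our reconstruction and is not taken from the paper's appendix. It assumes $T$ is large enough that $\alpha \neq 0$.

```latex
\begin{aligned}
v_{t-2} &= \frac{v_{t-1} - e_{t-1}}{\alpha}
  \;\Longrightarrow\;
  x_{t-2} = \mu_x + \frac{x_{t-1} - \mu_x - e_{t-1}}{\alpha}, \\
\frac{1}{\alpha} &= 1 + \frac{c}{\alpha T^{\gamma}}
  \quad \text{since } \alpha = 1 - \frac{c}{T^{\gamma}}, \\
\beta_2 x_{t-2} &= \beta_2 x_{t-1} - \beta_2 e_{t-1}
  + \frac{c\beta_2}{\alpha T^{\gamma}} \left( x_{t-1} - \mu_x - e_{t-1} \right), \\
y_t &= \mu_y + (\beta_1 + \beta_2)\, x_{t-1} + u_t - \beta_2 e_{t-1}
  + \underbrace{\frac{c\beta_2}{\alpha T^{\gamma}}\, x_{t-1}}_{O(T^{-\gamma})\, x_{t-1}}
  - \underbrace{\frac{c\beta_2}{\alpha T^{\gamma}} \left( \mu_x + e_{t-1} \right)}_{O_p(T^{-\gamma})}.
\end{aligned}
```

The last line matches Eq. (9) term by term, and the coefficient $c\beta_2/(\alpha T^{\gamma})$ is the same quantity that appears in the rewriting of Eq. (7) used in the discussion of Proposition 2.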

Eq. (9) highlights the important connection between our proposed balanced predictive regression and conventional predictive regression. In predictive regression, one essentially tests for the existence of the long-run or level effect by using $x_{t-1}$, i.e., $H_0 : \beta = 0$ in model (1). From Eq. (9), this is equivalent to testing $H_0 : \beta_1 + \beta_2 = 0$ in model (7). However, even if no evidence is found to support the existence of the level effect, i.e., $H_0 : \beta_1 + \beta_2 = 0$ cannot be rejected, this does not necessarily imply the non-existence of the transient effect (both $\beta_1 = 0$ and $\beta_2 = 0$). It is possible that $\beta_1 \approx -\beta_2 \neq 0$, meaning that the (approximate) first difference of $x$ helps predict $y$. Therefore, this connection vividly demonstrates the distinction, namely, that test procedures based on model (1) could miss predictability that might otherwise be uncovered by our balanced predictive regression model.

According to our experience with real data, in many cases, $\hat\beta_1 \approx -\hat\beta_2 \neq 0$ (see, e.g., Tables 8 and 9), and the predictor series display a high, yet unknown, degree of persistence. For this very important scenario, we show in Proposition 2 why predictability could often be missed in model (1).

Proposition 2. Suppose that $\{(y_t, x_t) : 1 \leq t \leq T\}$ are generated by model (7). If $x_t$ is highly persistent, $\beta_1 \neq 0$, and $\beta_1 + \beta_2 = \dfrac{b}{T^{\frac{1+\gamma}{2}}}$, any predictability test based on model (1) will make an incorrect conclusion of non-predictability with a high probability, even when the sample size, $T$, goes to infinity.

It is straightforward to rewrite Eq. (7) as

$$y_t = \mu_y + \left(\frac{b}{T^{\frac{1+\gamma}{2}}} + \frac{c\beta_2}{\alpha}\,\frac{1}{T^{\gamma}}\right) x_{t-1} + u_t - \beta_2 e_{t-1} + O_p\!\left(\frac{1}{T^{\gamma}}\right).$$

If we estimate the coefficient of $x_{t-1}$, using either an ordinary least squares estimator or a weighted least squares estimator (Chen et al., 2013), then

$$\hat\beta = \frac{b}{T^{\frac{1+\gamma}{2}}} + \frac{c\beta_2}{\alpha}\,\frac{1}{T^{\gamma}} + O_p\!\left(\frac{1}{T^{\frac{1+\gamma}{2}}}\right).$$

The interpretation is that predictability tests based on model (1) will lose power in the local neighborhood of $\beta_1 + \beta_2 = 0$, even when $T \to \infty$. This is demonstrated numerically in our Figs. 3–4. Furthermore, Tables 2 and 3 in Chen et al. (2013) and Figure 3 in Campbell and Yogo (2006) also illustrate Proposition 2.

3. Monte Carlo studies

In this section, we conduct Monte Carlo studies to evaluate the empirical size and power of the Wald test and $t$ test proposed in Sections 2.2 and 2.3 based on the balanced predictive regression (henceforth, B). We compare their performance with that of the predictability test suggested by Kostakis et al. (2015), which is based on the IVX procedure (henceforth, IVX) and has been shown to be more powerful than other existing competitors. For all the designs considered, we set up the balanced predictive regression model as in the previous section and conduct the tests of the null hypothesis that there is no predictability as stated earlier. For the IVX test, we adopt the regression model considered by Kostakis et al. (2015) and formulate the tests accordingly.

3.1. Single predictor

Our first data generating process (DGP) is the classical predictive regression model.

$$\textbf{DGP 1}: \quad \begin{cases} y_t = \beta x_{t-1} + u_t, \\ x_t = \alpha x_{t-1} + e_t, \end{cases} \qquad \alpha = 1 - \frac{c}{T}, \qquad \left(u_t, e_t\right)' \overset{\text{i.i.d.}}{\sim} \mathcal{N}\left(0, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\right),$$

for $t = 1, \ldots, T$. Rejection frequencies of the $t$ test and the IVX test using 10,000 replications are computed for values of $c \in \{0, 5, 10, 20, 50\}$, $T \in \{50, 250\}$, and $\rho \in \{-0.95, -0.5, 0\}$. In Table 1, we report the estimated sizes for $\beta = 0$. For a small sample size, $T = 50$, both tests suffer from size distortions. As sample size increases to 250, the estimated sizes approach the nominal level of 5%. To evaluate the power of the tests, we consider the following sequence of local alternatives:

$$\beta = \frac{b}{T}\sqrt{1 - \rho^2} \quad \text{for } b \in \{0, 4, 8, 16, 32, 64, 100\}, \tag{10}$$

Fig. 1. Power curves for IVX and Balanced 𝑡 Tests, for DGP 1, 𝑇 = 50.

with 𝑏 = 0 corresponding to the size of each test. We plot in Figs. 1–2 the rejection frequency (power) of both tests against the values of 𝑏 for 𝑇 = 50 and 250, respectively. For 𝑇 = 50, the power of the balanced test is quite close to that of the IVX test for 𝑐 close to 0. As 𝑐 increases, the power of the IVX test decreases and becomes lower than that of the balanced test. Interestingly, when 𝑐 = 50, i.e., 𝑥𝑡 is white noise, the IVX test does not have any power. However, for larger 𝑇, the IVX test always has higher power than the new test. We conclude that when 𝑥𝑡 is close to the unit root (𝑐 close to 0), the new test is competitive with the IVX test for a moderate sample size (𝑇 = 250).

Fig. 2. Power curves for IVX and Balanced 𝑡 Tests, for DGP 1, 𝑇 = 250.

The second DGP is the newly proposed model in this paper.

\[
\textbf{DGP 2}:\quad
\begin{cases}
y_t = \beta_1 x_{t-1} + \beta_2 x_{t-2} + u_t, \\
x_t = \alpha x_{t-1} + e_t,
\end{cases}
\qquad
\alpha = 1 - \frac{c}{T},
\qquad
(u_t, e_t)' \overset{\text{i.i.d.}}{\sim} N\!\left(0, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\right),
\]

for 𝑡 = 1, … , 𝑇. We set 𝛽1 = −𝛽2 − 𝛽 = 0.15, where 𝛽 is defined as in Eq. (10). Such a choice for the values of 𝛽1 and 𝛽2 is in line with the motivation of this paper that an approximate difference of the persistent predictors might contain valuable information for predicting the stock return, i.e., a balanced regression might be preferred. Other parameter values are set the same as those in DGP 1. We use this design to check whether the IVX-based test has competitive power with respect to the newly proposed test. We report the estimated rejection frequencies for both tests for 𝑏 = 0 in Table 2. Under 𝑏 = 0, the predictors have only a transient effect on the stock return, as 𝛽1 = −𝛽2 = 0.15. That is, the differenced predictor rather than the one-lag predictor exhibits predictability. We find from Table 2 that the IVX test has power quite close to the nominal level. However, our balanced test enjoys power quite close to 1 for 𝜌 ≠ 0. For 𝜌 = 0, our test still has power increasing to 1 as the sample size increases to 1000 (results for 𝑇 = 1000 are not reported to save space, but are available upon request). To check whether the IVX test would have power in this setting, we plot the rejection frequencies against the values of 𝑏 in Figs. 3–4. The power of the IVX test begins to increase as 𝑏 increases. Nevertheless, our new test is generally more powerful than the IVX test for all the parameter specifications.
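The design of DGP 2 can be explored with a small simulation. The following is only a rough sketch under stated assumptions: it uses a textbook homoskedastic OLS Wald statistic with chi-square critical values as a stand-in for the tests of Sections 2.2 and 2.3 (which are not reproduced here), it starts the predictor at its first innovation, and it reads "𝛽1 = −𝛽2 − 𝛽 = 0.15" as 𝛽1 = 0.15 and 𝛽2 = −0.15 − 𝛽. The function names are hypothetical, and the output is not expected to match Tables 1 and 2.

```python
import numpy as np

CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991}  # 5% chi-square critical values (1 and 2 d.o.f.)

def local_beta(b, T, rho):
    """Local alternative of Eq. (10): beta = (b / T) * sqrt(1 - rho^2)."""
    return b / T * np.sqrt(1.0 - rho ** 2)

def simulate_dgp2(T, c, rho, beta1, beta2, rng):
    """One draw from DGP 2 (DGP 1 is the special case beta2 = 0)."""
    alpha = 1.0 - c / T
    cov = np.array([[1.0, rho], [rho, 1.0]])
    shocks = rng.multivariate_normal(np.zeros(2), cov, size=T + 2)
    u, e = shocks[:, 0], shocks[:, 1]
    x = np.zeros(T + 2)
    x[0] = e[0]                        # assumed start value (not stated in the paper)
    for t in range(1, T + 2):
        x[t] = alpha * x[t - 1] + e[t]
    lag1, lag2 = x[1:-1], x[:-2]       # x_{t-1} and x_{t-2} for the T usable dates
    y = beta1 * lag1 + beta2 * lag2 + u[2:]
    return y, lag1, lag2

def ols_wald(y, regressors):
    """Homoskedastic OLS Wald statistic for H0: all slope coefficients are zero."""
    Z = np.column_stack([np.ones(len(y))] + list(regressors))
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    s2 = resid @ resid / (len(y) - Z.shape[1])
    cov = s2 * np.linalg.inv(Z.T @ Z)
    slopes = coef[1:]
    return float(slopes @ np.linalg.solve(cov[1:, 1:], slopes))

def rejection_rates(T, c, rho, b, reps=300, seed=0):
    """Rejection frequencies of the one-lag and the two-lag (balanced) regressions."""
    rng = np.random.default_rng(seed)
    beta = local_beta(b, T, rho)
    beta1, beta2 = 0.15, -0.15 - beta  # assumed reading of the parameter choice
    rej_one = rej_bal = 0
    for _ in range(reps):
        y, lag1, lag2 = simulate_dgp2(T, c, rho, beta1, beta2, rng)
        rej_one += ols_wald(y, [lag1]) > CHI2_CRIT_5PCT[1]
        rej_bal += ols_wald(y, [lag1, lag2]) > CHI2_CRIT_5PCT[2]
    return rej_one / reps, rej_bal / reps

print(rejection_rates(T=250, c=5, rho=0.0, b=0))
```

With 𝑏 = 0 the one-lag regression sees almost no signal, because the effect of 𝑥𝑡−1 is nearly cancelled by that of 𝑥𝑡−2, whereas the two-lag regression picks up the transient component. The number of replications is kept small here for speed; the paper uses 10,000.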

3.2. Multiple predictors

3.2.1. Cointegrated predictors

We next consider cases with multiple predictors. For ease of presentation, we focus on the bivariate case with cointegrated predictors, where the data are generated according to the following design.

\[
\textbf{DGP 3}:\quad
\begin{cases}
y_t = \beta\,(X_{1,t-1} - a X_{2,t-1}) + u_t, \\
X_t = A X_{t-1} + e_t, \\
(u_t, e_t')' \overset{\text{i.i.d.}}{\sim} N(0_{3\times 1}, \Omega),
\end{cases}
\qquad
A = I_2 - C/T + \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} [1 \;\; -a],
\]

Table 1
Rejection frequency for IVX and balanced tests for 𝐻0: non-predictability (𝛽1 = 𝛽2 = 0). DGP 1 with 𝛼 = 1 − 𝑐∕𝑇, 𝛽 = 0, nominal size = 5%, 10,000 replications.

                        𝜌 = −0.95         𝜌 = −0.50         𝜌 = 0
𝑇      𝑐     𝛼          IVX     B         IVX     B         IVX     B
50     0     1.000      0.099   0.157     0.081   0.098     0.072   0.089
50     5     0.900      0.059   0.103     0.069   0.099     0.063   0.095
50     10    0.800      0.050   0.121     0.060   0.093     0.059   0.108
50     20    0.600      0.064   0.119     0.058   0.112     0.061   0.122
50     50    0.000      0.046   0.102     0.051   0.123     0.053   0.116
250    0     1.000      0.072   0.077     0.065   0.058     0.049   0.060
250    5     0.980      0.054   0.067     0.058   0.071     0.045   0.069
250    10    0.960      0.048   0.057     0.047   0.077     0.052   0.077
250    20    0.920      0.047   0.073     0.048   0.081     0.057   0.082
250    50    0.800      0.055   0.099     0.049   0.073     0.055   0.075

This table summarizes the rejection frequencies for IVX and balanced tests, testing the null hypothesis 𝐻0: non-predictability versus the alternative 𝐻1: predictability. The data are simulated by DGP 1. 𝑇 is the sample size. 𝑐, 𝜌 and 𝛼 are parameters in the DGP. ‘IVX’ and ‘B’ denote the IVX test and our test, respectively.
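When reading the size entries, it helps to keep the simulation noise in mind. With 10,000 independent replications, a rejection frequency is a binomial proportion, so its Monte Carlo standard error at the nominal 5% level is about 0.0022. The helper names below are hypothetical and the 1.96 band is a conventional choice rather than one made in the paper.

```python
import math

def mc_standard_error(p, reps):
    """Binomial standard error of a rejection frequency based on `reps` draws."""
    return math.sqrt(p * (1.0 - p) / reps)

def outside_band(freq, nominal=0.05, reps=10_000, z=1.96):
    """True if `freq` differs from `nominal` by more than z standard errors."""
    return abs(freq - nominal) > z * mc_standard_error(nominal, reps)

se = mc_standard_error(0.05, 10_000)
print(round(se, 4))          # 0.0022
print(outside_band(0.099))   # True  (IVX, T = 50, c = 0, rho = -0.95)
print(outside_band(0.048))   # False (IVX, T = 250, c = 10, rho = -0.95)
```

Under this rough yardstick, entries between about 0.046 and 0.054 are statistically indistinguishable from the nominal size.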

Table 2
Rejection frequency for IVX and balanced tests for 𝐻0: non-predictability (𝛽1 = 𝛽2 = 0). DGP 2 with 𝛼 = 1 − 𝑐∕𝑇, 𝛽1 = −𝛽2 = 0.15, nominal size = 5%, 10,000 replications.

                        𝜌 = −0.95         𝜌 = −0.50         𝜌 = 0
𝑇      𝑐     𝛼          IVX     B         IVX     B         IVX     B
50     0     1.000      0.084   1.000     0.064   0.808     0.050   0.237
50     5     0.900      0.060   1.000     0.064   0.808     0.070   0.267
50     10    0.800      0.052   1.000     0.073   0.803     0.083   0.284
50     20    0.600      0.084   1.000     0.088   0.794     0.094   0.302
50     50    0.000      0.201   1.000     0.205   0.822     0.179   0.322
250    0     1.000      0.065   1.000     0.064   1.000     0.058   0.677
250    5     0.980      0.049   1.000     0.048   1.000     0.048   0.673
250    10    0.960      0.051   1.000     0.066   1.000     0.047   0.724
250    20    0.920      0.070   1.000     0.072   1.000     0.084   0.716
250    50    0.800      0.135   1.000     0.119   0.999     0.108   0.769

This table summarizes the rejection frequencies for IVX and balanced tests, testing the null hypothesis 𝐻0: non-predictability versus the alternative 𝐻1: predictability. The data are simulated by DGP 2. 𝑇 is the sample size. 𝑐, 𝜌 and 𝛼 are parameters in the DGP. ‘IVX’ and ‘B’ denote the IVX test and our test, respectively.

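where [1 −𝑎] is the cointegration vector. We consider three sets of covariance matrices for 𝛺:

\[
\Omega_1 = \begin{bmatrix} 1 & -0.95 & -0.98 \\ -0.95 & 1 & 0.95 \\ -0.98 & 0.95 & 1 \end{bmatrix},
\quad
\Omega_2 = \begin{bmatrix} 1 & -0.98 & -0.95 \\ -0.98 & 1 & 0.98 \\ -0.95 & 0.98 & 1 \end{bmatrix},
\quad
\Omega_3 = \begin{bmatrix} 1 & -0.90 & -0.95 \\ -0.90 & 1 & 0.90 \\ -0.95 & 0.90 & 1 \end{bmatrix}.
\]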
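The structure of DGP 3 can be checked numerically. The sketch below, with hypothetical helper names, builds 𝐴 for the parameter values used in this design (𝑐1 = −0.5, 𝑐2 = 1, 𝑎 = 0.5, 𝐶 = diag{𝑐, 𝑐}), confirms that each 𝛺 is positive definite, and draws correlated shocks through a Cholesky factor. It is an illustration only and makes no claim about how the authors generated their data.

```python
import numpy as np

OMEGAS = {
    "Omega1": np.array([[1.0, -0.95, -0.98], [-0.95, 1.0, 0.95], [-0.98, 0.95, 1.0]]),
    "Omega2": np.array([[1.0, -0.98, -0.95], [-0.98, 1.0, 0.98], [-0.95, 0.98, 1.0]]),
    "Omega3": np.array([[1.0, -0.90, -0.95], [-0.90, 1.0, 0.90], [-0.95, 0.90, 1.0]]),
}

def dgp3_matrix(c, T, c1=-0.5, c2=1.0, a=0.5):
    """A = I_2 - C/T + (c1, c2)' [1, -a] with C = diag{c, c}."""
    return np.eye(2) - (c / T) * np.eye(2) + np.outer([c1, c2], [1.0, -a])

def draw_shocks(omega, n, rng):
    """Draw n rows of (u_t, e_1t, e_2t) ~ N(0, omega) via a Cholesky factor."""
    chol = np.linalg.cholesky(omega)   # raises LinAlgError if omega is not positive definite
    return rng.standard_normal((n, 3)) @ chol.T

for name, omega in OMEGAS.items():
    print(name, "smallest eigenvalue:", np.linalg.eigvalsh(omega).min())

A = dgp3_matrix(c=5, T=50)
print(np.sort(np.linalg.eigvals(A).real))  # the two roots are -c/T and 1 - c/T
print(np.array([1.0, -0.5]) @ A)           # equals (-c/T) * [1, -a]
```

The last line shows why [1 −𝑎] acts as a cointegration vector here: premultiplying 𝐴 by it returns the same vector scaled by −𝑐∕𝑇, so the combination 𝑋1,𝑡 − 𝑎𝑋2,𝑡 has an autoregressive root near zero while the other root stays near one.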

We set 𝐶 = diag{𝑐, 𝑐} and let 𝑐 ∈ {0, 5, 10, 20, 50}, 𝑐1 = −0.5, 𝑐2 = 1, and 𝑎 = 0.5. The value of 𝛽 is set as in DGP 1. We first investigate the size of both tests by estimating the rejection frequencies with 𝛽 = 0 (i.e., 𝑏 = 0). The results are collected in Table 3. The IVX test is seriously oversized for all parameter specifications. The new test is oversized in small samples but tends to have proper size as the sample size increases. We next study power by evaluating the rejection frequencies under the local alternatives in Eq. (10). The power plots are presented in Figs. 5–6. The IVX test is found to have no power (if adjusted for proper size) in such a setup.⁶ In contrast, the balanced test is powerful in detecting predictability. Furthermore, the power generally increases as the local parameter 𝑏 increases.

3.2.2. Non-cointegrated predictors

We next consider a case with non-cointegrated multiple predictors. The data are generated according to the following design.

\[
\textbf{DGP 4}:\quad
\begin{cases}
y_t = \beta_1 X_{1,t-1} + \beta_2 X_{1,t-2} + \gamma_1 X_{2,t-1} + \gamma_2 X_{2,t-2} + u_t, \\
X_t = A X_{t-1} + e_t, \quad X_t = (X_{1,t}, X_{2,t})', \\
(u_t, e_t')' \overset{\text{i.i.d.}}{\sim} N(0_{3\times 1}, \Omega),
\end{cases}
\qquad
A = I_2 - C/T.
\]

6 We emphasize that the IVX test is designed for DGP 1 but not for DGP 3. The IVX procedure should be adapted to this setup to obtain a fair comparison. Nevertheless, the point here is that if the predictors only have a transient effect, as in DGP 3, then the predictive regression based IVX test would lose power.

Fig. 3. Power curves for IVX and Balanced 𝑡 Tests, for DGP 2, 𝑇 = 50.

The parameter values for 𝛽1 and 𝛽2 are set the same as those for DGP 2, and those of the other parameters are set the same as for DGP 3. For a convenient presentation of the results, we set 𝛾1 = 𝛽1 and 𝛾2 = 𝛽2. This design is the multivariate version of DGP 2. We use it to check whether the IVX-based test has competitive power with respect to the newly proposed test in the case of non-cointegrated multiple predictors. We report the estimated rejection frequencies for both tests for 𝑏 = 0 in Table 4. Under 𝑏 = 0, the predictors have only a transient effect on the stock return, as 𝛽1 = −𝛽2 = 0.15. That is, the differenced predictors rather than the one-lag predictors exhibit predictability. We find from Table 4 that the IVX test has power quite close to the nominal level, especially for small values of 𝑐. However, our balanced Wald test enjoys power 1 for all the specifications of 𝛺. To check whether the IVX test would have power in this setting, we plot the rejection frequencies against the values of 𝑏 in Figs. 7–8. The power of the IVX test begins to increase as 𝑏 increases. Nevertheless, our new test is generally more powerful than the IVX test for all the parameter specifications. Furthermore, the power of the IVX test does not seem to increase as 𝑐 increases. In particular, when 𝑇 = 50 and 𝑐 = 50, i.e., when the regressors are white noise, the IVX test has power far from 1 for all values of 𝑏, as indicated in Fig. 7, Panels (e), (j) and (o). The Wald test continues to have power 1 and seems rather robust to the choice of parameter values.
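For the bivariate design, the joint null involves four slope coefficients. The sketch below simulates DGP 4 with 𝛺 = 𝛺1 and applies a textbook homoskedastic OLS Wald test with a chi-square(4) critical value. This statistic is an assumed stand-in, not the Wald test of Section 2.3, and the start value of the predictors and all function names are likewise assumptions made for illustration.

```python
import numpy as np

CHI2_4_CRIT_5PCT = 9.488  # 5% critical value of chi-square with 4 d.o.f.
OMEGA1 = np.array([[1.0, -0.95, -0.98], [-0.95, 1.0, 0.95], [-0.98, 0.95, 1.0]])

def simulate_dgp4(T, c, omega, beta1, beta2, gamma1, gamma2, rng):
    """One draw from DGP 4 with A = I_2 - C/T and C = diag{c, c}."""
    A = (1.0 - c / T) * np.eye(2)
    shocks = rng.standard_normal((T + 2, 3)) @ np.linalg.cholesky(omega).T
    u, e = shocks[:, 0], shocks[:, 1:]
    X = np.zeros((T + 2, 2))
    X[0] = e[0]                                   # assumed start value
    for t in range(1, T + 2):
        X[t] = A @ X[t - 1] + e[t]
    lag1, lag2 = X[1:-1], X[:-2]                  # X_{t-1} and X_{t-2}
    y = lag1 @ np.array([beta1, gamma1]) + lag2 @ np.array([beta2, gamma2]) + u[2:]
    return y, lag1, lag2

def joint_wald(y, lag1, lag2):
    """OLS Wald statistic for H0: all four slopes are zero (intercept included)."""
    Z = np.column_stack([np.ones(len(y)), lag1, lag2])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    s2 = resid @ resid / (len(y) - Z.shape[1])
    cov = s2 * np.linalg.inv(Z.T @ Z)
    return float(coef[1:] @ np.linalg.solve(cov[1:, 1:], coef[1:]))

def rejection_rate(T, c, omega, reps=200, seed=0):
    """Share of replications in which the joint null is rejected at 5%."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        y, lag1, lag2 = simulate_dgp4(T, c, omega, 0.15, -0.15, 0.15, -0.15, rng)
        hits += joint_wald(y, lag1, lag2) > CHI2_4_CRIT_5PCT
    return hits / reps

print(rejection_rate(T=250, c=5, omega=OMEGA1))
```

Because the stand-in statistic differs from the paper's, the printed frequency should be read qualitatively and is not expected to reproduce Table 4.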

Fig. 4. Power curves for IVX and Balanced 𝑡 Tests, for DGP 2, 𝑇 = 250.

4. Empirical results

4.1. Data

We collect monthly and quarterly U.S. data from 1927 to 2015. We use S&P 500 value-weighted log returns as a proxy for the market returns. Following Welch and Goyal (2008),⁷ we use the following ten highly persistent variables as potential predictors: T-bill rate (tbl), long-term yield (lty), term spread (tms), default yield spread (dfy), dividend–price ratio (d/p), dividend yield (d/y), earnings–price ratio (e/p), dividend payout ratio (d/e), book-to-market value ratio (b/m), and net equity expansion (ntis). We analyze the predictability of the excess market returns by these variables. In addition to this full sample, following Kostakis et al. (2015), we also analyze the subsample from 1952 to 2015 at monthly and quarterly frequencies.

7 The data are available at Goyal’s web site: http://www.hec.unil.ch/agoyal/.

For each predictor, we first run the first-order autoregressive regression: 𝑥𝑡+1 = 𝜇𝑥 + 𝛼𝑥𝑡 + 𝑒𝑡. The estimation results are reported in Tables 5 and 6 for monthly and quarterly data, respectively. The second column is the intercept estimate, 𝜇̂𝑥, followed by the corresponding 95% confidence interval in the third column. Most predictors have a significant 𝜇̂𝑥, regardless of which sample and which frequency of the data is analyzed. Therefore, it is reasonable to assume that the predictor is an AR(1) process with a drift. Moreover, the estimates of the autoregression coefficient are very close to 1, indicating the high persistence

Table 3
Rejection frequency for IVX and balanced tests for 𝐻0: non-predictability (𝛽1 = 𝛽2 = 0). DGP 3 with 𝑎 = 0.5, 𝛽 = 0, nominal size = 5%, 10,000 replications.

                          𝛺 = 𝛺1            𝛺 = 𝛺2            𝛺 = 𝛺3
𝑇      𝑐     1 − 𝑐∕𝑇     IVX     B         IVX     B         IVX     B
50     0     1.000       0.165   0.109     0.156   0.111     0.180   0.111
50     5     0.900       0.111   0.123     0.124   0.093     0.142   0.112
50     10    0.800       0.137   0.117     0.117   0.107     0.157   0.119
50     20    0.600       0.141   0.108     0.150   0.077     0.140   0.116
50     50    0.000       0.111   0.084     0.109   0.064     0.088   0.093
250    0     1.000       0.197   0.062     0.196   0.081     0.199   0.072
250    5     0.980       0.178   0.058     0.209   0.078     0.190   0.064
250    10    0.960       0.178   0.073     0.196   0.078     0.193   0.091
250    20    0.920       0.203   0.061     0.201   0.056     0.202   0.076
250    50    0.800       0.244   0.093     0.231   0.073     0.234   0.097

This table summarizes the rejection frequencies for IVX and balanced tests, testing the null hypothesis 𝐻0: non-predictability versus the alternative 𝐻1: predictability. The data are simulated by DGP 3. 𝑇 is the sample size. 𝑐 and 𝛺 are parameters in the DGP. ‘IVX’ and ‘B’ denote the IVX test and our test, respectively.

Table 4
Rejection frequency for IVX and balanced Wald tests for 𝐻0: non-predictability (𝛽1 = 𝛽2 = 0). DGP 4 with 𝛽1 = −𝛽2 = 0.15, nominal size = 5%, 10,000 replications.

                          𝛺 = 𝛺1            𝛺 = 𝛺2            𝛺 = 𝛺3
𝑇      𝑐     1 − 𝑐∕𝑇     IVX     B         IVX     B         IVX     B
50     0     1.000       0.109   1.000     0.092   1.000     0.091   1.000
50     5     0.900       0.050   1.000     0.048   1.000     0.052   1.000
50     10    0.800       0.073   1.000     0.070   1.000     0.069   1.000
50     20    0.600       0.133   1.000     0.117   1.000     0.132   1.000
50     50    0.000       0.376   1.000     0.404   1.000     0.381   1.000
250    0     1.000       0.069   1.000     0.096   1.000     0.078   1.000
250    5     0.980       0.052   1.000     0.058   1.000     0.059   1.000
250    10    0.960       0.059   1.000     0.060   1.000     0.071   1.000
250    20    0.920       0.132   1.000     0.130   1.000     0.123   1.000
250    50    0.800       0.351   1.000     0.332   1.000     0.356   1.000

This table summarizes the rejection frequencies for IVX and balanced tests, testing the null hypothesis 𝐻0: non-predictability versus the alternative 𝐻1: predictability. The data are simulated by DGP 4. 𝑇 is the sample size. 𝑐 and 𝛺 are parameters in the DGP. ‘IVX’ and ‘B’ denote the IVX test and our test, respectively.

level of the predictors. For several predictors (e.g., lty, d/y, d/p and tbl in the full monthly sample), the 95% confidence intervals of 𝛼̂ (constructed under the hypothesis that 𝛼 < 1) contain 1, so we cannot infer whether these predictors are stationary or unit root processes. Furthermore, we conduct the ADF, PP and KPSS tests on the predictors. Table 7 reports the test statistics for every predictor at the monthly and quarterly frequencies; *, ** and *** represent significance at the 10%, 5% and 1% levels, respectively. The KPSS test rejects the null hypothesis of stationarity for all the predictors at the 5% level (and at the 1% level in every case except dfy at the quarterly frequency), which suggests that none of the predictors is clearly stationary.

Fig. 5. Power curves for IVX and Balanced 𝑡 tests, for DGP 3, 𝑇 = 50.

4.2. Predictability tests

We apply our proposed test to the data based on the balanced predictive regression. We consider both univariate and multivariate predictors. For the purpose of comparison, the test results for the IVX procedure of Kostakis et al. (2015) are also reported.⁸

8 The direct comparison of our method with the IVX may not be fair since these two methods emphasize different perspectives of the predictability test. The IVX method captures the level effect, while our method emphasizes the transient effect.

4.2.1. Univariate predictor

We first test for predictability in the monthly data. Table 8 summarizes the results. In Panel A, where the full sample is analyzed, we can see that, except for tms, the two estimates of the balanced regression, 𝛽̂1 and 𝛽̂2, are opposite in sign and 𝛽̂1 + 𝛽̂2 ≈ 0. This is consistent with our theory that the approximate difference of the predictor may contain valuable predictive information for stock returns. 𝛽̂1 and 𝛽̂2 together capture the relationship between the two lags of the predictor and the current stock return. For most predictors, for example d/p, 𝛽̂1 is negative and 𝛽̂2 is positive. The positive 𝛽̂2 reveals that a shock to the two-period lagged predictor would lead to an increase in the current stock return through 𝑥𝑡−2. This shock could persist into the next period due to the persistence of the predictor. However, when 𝛽̂1 is negative, the overall effect of this shock on the stock return is partly offset. If 𝛽̂1 + 𝛽̂2 ≈ 0, the level effect is negligible. Therefore, as indicated by the data, what matters is the transient effect of the predictor. The 𝑝 values of our test statistics suggest that we should reject the null hypothesis of no predictability at the 5% significance level for the predictors d/p, b/m and dfy. We can claim that these three predictors can predict stock returns. If the IVX test is applied, we can still reject the null for d/p and b/m, whereas dfy has a 𝑝 value slightly higher than 5%. The IVX test also suggests that d/y, e/p and ntis show some predictive ability, which is not indicated by our balanced regression test. Hence, the two tests differ considerably with respect to which predictors are significant.

In Panel B, we consider the subsample from January 1952 to December 2015. For this subsample, we can reject the null for d/e, lty, tbl and ntis at the 5% level, which means that stock returns display predictability in the post-1952 data. However, the IVX results imply that no predictor can predict stock returns in the subsample. This strikingly different finding highlights the importance of our balanced regression test.

We subsequently estimate the univariate regressions using quarterly data. Table 9 summarizes the test results. In Panel A, the statistics of our balanced regression suggest that we can reject the null for e/p, b/m, dfy and ntis at the 5% significance level. Among these predictors, however, the IVX test does not reveal predictive ability for e/p and dfy. Instead, the IVX test shows that d/y can predict stock returns. In Panel B, there are interesting findings for the subsample from 1952Q1 to 2015Q4. The predictability of stock returns becomes much weaker. Both our test statistics and the IVX test statistics have 𝑝 values larger than 5% for all the predictors, which means that stock returns cannot be predicted by any univariate predictor in this subsample.
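The level/transient reading of the two coefficients used in the discussion of Table 8 follows from a simple reparametrization of the balanced regression. As a worked illustration (the identity is elementary algebra, and the numerical values are the full-sample monthly estimates for d/p reported in Table 8):

```latex
\hat{\beta}_1 x_{t-1} + \hat{\beta}_2 x_{t-2}
  = (\hat{\beta}_1 + \hat{\beta}_2)\, x_{t-1} - \hat{\beta}_2\, (x_{t-1} - x_{t-2}),
\qquad
\text{d/p: } \hat{\beta}_1 = -0.0681,\ \hat{\beta}_2 = 0.0762
\;\Rightarrow\;
0.0081\, x_{t-1} - 0.0762\, (x_{t-1} - x_{t-2}).
```

The coefficient on the level 𝑥𝑡−1 is therefore roughly one tenth the size of the coefficient on the first difference, which is the sense in which the level effect is small relative to the transient effect for this predictor.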

Fig. 6. Power curves for IVX and Balanced 𝑡 tests, for DGP 3, 𝑇 = 250.

4.2.2. Multivariate predictors

As emphasized in Kostakis et al. (2015), some predictors cannot capture the variation of stock returns alone and should be used jointly with other predictors. It is therefore necessary to explore the predictability of stock returns through multivariate regressions. In particular, we use the following five combinations: (1) d/p and tbl (Ang and Bekaert, 2007); (2) d/p, tbl, dfy, and tms (Ferson and Schadt, 1996); (3) d/p and b/m (Kothari and Shanken, 1997); (4) d/p and d/e (Lamont, 1998); and (5) e/p, b/m and tms (Campbell and Vuolteenaho, 2004).

Table 10 reports the results for the monthly data. Panel A is our balanced regression on the full sample with the joint Wald test. The 𝑝 values suggest that we should reject the null hypothesis of no predictability for all combinations at the 5% significance level. This finding is not surprising because these five combinations contain either d/p or b/m, which already exhibited predictive ability in the univariate test. However, in Panel B, where the IVX test is reported, the null can be rejected only for the combination of d/p and b/m and for the combination of e/p, b/m and tms. When the subsample is analyzed, as shown in Panel C and Panel D, our test rejects the null for the combination of d/p and tbl and for the combination of d/p, tbl, dfy and tms, while the IVX test cannot reject the null for any combination.

Table 11 summarizes the multivariate regression results for the quarterly data. For the full sample, our test rejects the null for the combination of d/p, tbl, dfy and tms; the combination of d/p and b/m; and the combination of e/p, b/m and tms. This is consistent with our univariate test, where dfy and b/m are found to have predictive ability. The IVX test only rejects the null for the combination of d/p and b/m. For the subsample, the IVX test rejects the null for the combination of d/p, tbl, dfy and tms at the 5% significance level, while our test cannot reject the null for any combination. The predictability becomes weaker in the recent quarterly data.

Fig. 7. Power curves for IVX and Balanced Wald Tests, for DGP 4, 𝑇 = 50.

4.3. Out-of-sample forecasting

As stated in Welch and Goyal (2008), out-of-sample performance is important for model diagnostics. In this section, we analyze the out-of-sample forecasting performance of our model. The out-of-sample period is the last 15 years (or 30 years). After we obtain the one-period-ahead forecast of the equity premium, we roll and/or expand the estimation window and repeat the forecast recursively. We report the mean squared error (MSE) to evaluate the performance of every forecasting model. Table 12 summarizes the results for the univariate predictors. The first column is the predictor, followed by the MSE obtained by two forecasting methods at two frequencies. The performances of the predictors are quite similar, no matter which data frequency is employed. In addition, the MSE values imply that these predictors display very poor out-of-sample forecasting ability, consistent with the finding in Welch and Goyal (2008). A similar conclusion can be drawn for the multivariate model according to the results in Table 13.

4.4. Long-term predictability

In addition to one-period-ahead predictability, long-term predictability also draws much attention in empirical studies. Following the literature, long-term predictability can be analyzed as

\[
y_{t+h} + y_{t+h-1} + \cdots + y_{t+1} = \mu_y + \beta x_t + u_{t+1},
\]

where 𝑥𝑡 = 𝜇𝑥 + 𝑣𝑡, 𝑣𝑡 = 𝛼𝑣𝑡−1 + 𝑒𝑡, and ℎ is the prediction horizon.
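The left-hand side of the long-horizon regression is a rolling sum of future returns. A minimal sketch of how the regression pairs could be assembled is given below; the function name is hypothetical, and the code assumes that the return and predictor series share the same time index.

```python
import numpy as np

def long_horizon_pairs(y, x, h):
    """Pair x_t with y_{t+1} + ... + y_{t+h}; y and x share the same time index."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    n = len(y)
    if not 1 <= h < n:
        raise ValueError("h must satisfy 1 <= h < len(y)")
    csum = np.concatenate(([0.0], np.cumsum(y)))  # csum[k] = y_0 + ... + y_{k-1}
    target = csum[h + 1:] - csum[1:n - h + 1]     # sums of y_{t+1..t+h}, t = 0..n-h-1
    return target, x[:n - h]

target, predictor = long_horizon_pairs([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], h=2)
print(target)     # [5. 7. 9.]
print(predictor)  # [10. 20. 30.]
```

Because consecutive targets overlap in ℎ − 1 returns, the regression errors are serially correlated by construction, which is one reason long-horizon inference is usually treated with care.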

Fig. 8. Power curves for IVX and Balanced Wald Tests, for DGP 4, 𝑇 = 250.

We apply our method to run the univariate and the multivariate balanced predictive regressions for monthly and quarterly data. For monthly data, the prediction horizons are 12, 36 and 60; for quarterly data, the prediction horizons are 4, 12 and 20. The results are summarized in Table 14, which reports the 𝑝 values of the test of the null hypothesis that there is no predictability. Panel A is based on the univariate predictor model. For monthly data, 8 out of 10 predictors have the smallest 𝑝 values when the prediction horizon is 60. This number increases to 9 when quarterly data are analyzed and the prediction horizon is 20. Panel B summarizes the 𝑝 values of the predictability tests for the multivariate-predictor model. Except for the Kothari and Shanken (1997) combination with quarterly data, all the models have the smallest 𝑝 values when the prediction horizon is the longest. This implies that the predictors display stronger predictability at longer horizons, which is consistent with the finding in Lettau and Ludvigson (2001).

5. Conclusion

This paper proposes a balanced predictive regression model to eliminate the imbalance in the persistence level of the conventional predictive regression model. In our model, we include the second lags in addition to the first lags of the predictors. By doing so, when the predictors are highly persistent, the differences of these two lags can serve as predictors. Balance is achieved through the “cointegration” between the two highly persistent lags. Moreover, even if the predictors are stationary, our model remains effective.

Table 5
Estimation results of autoregressive regressions: Monthly.

Variable                        𝜇̂𝑥        95% CI                  𝛼̂        95% CI
Panel A: January 1927–December 2015
Dividend payout ratio (d/e)     −0.0056   [−0.0113, 0.0000]       0.9913   [0.9833, 0.9992]
Long-term yield (lty)           0.0002    [−0.0001, 0.0005]       0.9965   [0.9913, 1.0017]
Dividend yield (d/y)            −0.0248   [−0.0495, −0.0001]      0.9929   [0.9856, 1.0002]
Dividend–price ratio (d/p)      −0.0251   [−0.0499, −0.0004]      0.9928   [0.9855, 1.0001]
T-bill rate (tbl)               0.0002    [−0.0001, 0.0005]       0.9934   [0.9863, 1.0006]
Earnings–price ratio (e/p)      −0.0362   [−0.0631, −0.0094]      0.9870   [0.9773, 0.9967]
Book-to-market ratio (b/m)      0.0081    [0.0016, 0.0145]        0.9858   [0.9756, 0.9960]
Default yield spread (dfy)      0.0003    [0.0001, 0.0005]        0.9752   [0.9618, 0.9885]
Net equity expansion (ntis)     0.0003    [−0.0001, 0.0007]       0.9805   [0.9686, 0.9925]
Term spread (tms)               0.0007    [0.0003, 0.0010]        0.9608   [0.9442, 0.9773]
Panel B: January 1952–December 2015
Dividend payout ratio (d/e)     −0.0101   [−0.0192, −0.0010]      0.9865   [0.9749, 0.9980]
Long-term yield (lty)           0.0003    [−0.0002, 0.0008]       0.9948   [0.9874, 1.0021]
Dividend yield (d/y)            −0.0465   [−0.0756, −0.0174]      0.9874   [0.9792, 0.9956]
Dividend–price ratio (d/p)      −0.0262   [−0.0526, 0.0003]       0.9930   [0.9855, 1.0004]
T-bill rate (tbl)               0.0004    [−0.0002, 0.0009]       0.9914   [0.9818, 1.0011]
Earnings–price ratio (e/p)      −0.0319   [−0.0610, −0.0028]      0.9890   [0.9787, 0.9993]
Book-to-market ratio (b/m)      0.0027    [−0.0019, 0.0072]       0.9938   [0.9860, 1.0017]
Default yield spread (dfy)      0.0003    [0.0001, 0.0005]        0.9708   [0.9536, 0.9879]
Net equity expansion (ntis)     0.0002    [−0.0001, 0.0005]       0.9805   [0.9660, 0.9950]
Term spread (tms)               0.0007    [0.0003, 0.0012]        0.9573   [0.9368, 0.9778]

This table presents the estimation results of autoregressive regressions for the following list of financial and economic variables: dividend payout ratio (d/e), long-term yield (lty), dividend yield (d/y), dividend–price ratio (d/p), T-bill rate (tbl), earnings–price ratio (e/p), book-to-market value ratio (b/m), default yield spread (dfy), net equity expansion (ntis) and term spread (tms). The estimated intercepts, 𝜇̂𝑥, and the autoregressive coefficients, 𝛼̂, are reported with the corresponding 95% confidence intervals. The full sample is from January 1927 to December 2015.

Table 6
Estimation results of autoregressive regressions: Quarterly.

Variable                        𝜇̂𝑥        95% CI                  𝛼̂        95% CI
Panel A: 1927Q1–2015Q4
Dividend payout ratio (d/e)     −0.0452   [−0.0729, −0.0175]      0.9292   [0.8905, 0.9679]
Long-term yield (lty)           0.0006    [−0.0004, 0.0017]       0.9872   [0.9700, 1.0044]
Dividend yield (d/y)            −0.0934   [−0.1757, −0.0112]      0.9728   [0.9484, 0.9971]
Dividend–price ratio (d/p)      −0.0964   [−0.1807, −0.0121]      0.9720   [0.9472, 0.9969]
T-bill rate (tbl)               0.0012    [−0.0002, 0.0025]       0.9641   [0.9357, 0.9925]
Earnings–price ratio (e/p)      −0.1782   [−0.2810, −0.0753]      0.9354   [0.8981, 0.9727]
Book-to-market ratio (b/m)      0.0337    [0.0109, 0.0566]        0.9411   [0.9054, 0.9769]
Default yield spread (dfy)      0.0012    [0.0006, 0.0018]        0.8980   [0.8519, 0.9441]
Net equity expansion (ntis)     0.0010    [−0.0001, 0.0022]       0.9318   [0.8933, 0.9703]
Term spread (tms)               0.0026    [0.0014, 0.0037]        0.8526   [0.7983, 0.9069]
Panel B: 1952Q1–2015Q4
Dividend payout ratio (d/e)     −0.0777   [−0.1214, −0.0340]      0.8946   [0.8395, 0.9497]
Long-term yield (lty)           0.0012    [−0.0005, 0.0028]       0.9811   [0.9569, 1.0053]
Dividend yield (d/y)            −0.1445   [−0.2345, −0.0545]      0.9605   [0.9351, 0.9860]
Dividend–price ratio (d/p)      −0.0911   [−0.1791, −0.0032]      0.9753   [0.9505, 1.0001]
T-bill rate (tbl)               0.0021    [−0.0000, 0.0042]       0.9523   [0.9137, 0.9909]
Earnings–price ratio (e/p)      −0.1682   [−0.2856, −0.0508]      0.9410   [0.8996, 0.9825]
Book-to-market ratio (b/m)      0.0088    [−0.0054, 0.0231]       0.9800   [0.9555, 1.0045]
Default yield spread (dfy)      0.0012    [0.0006, 0.0019]        0.8766   [0.8169, 0.9364]
Net equity expansion (ntis)     0.0007    [−0.0003, 0.0018]       0.9326   [0.8861, 0.9791]
Term spread (tms)               0.0028    [0.0013, 0.0043]        0.8399   [0.7728, 0.9071]

This table presents the estimation results of autoregressive regressions for the following list of financial and economic variables: dividend payout ratio (d/e), long-term yield (lty), dividend yield (d/y), dividend–price ratio (d/p), T-bill rate (tbl), earnings–price ratio (e/p), book-to-market value ratio (b/m), default yield spread (dfy), net equity expansion (ntis) and term spread (tms). The estimated intercepts, 𝜇̂𝑥, and the autoregressive coefficients, 𝛼̂, are reported with the corresponding 95% confidence intervals. The full sample is from 1927Q1 to 2015Q4.
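The autoregressions behind Tables 5 and 6 can be sketched as follows. This is a minimal illustration that fits 𝑥𝑡+1 = 𝜇𝑥 + 𝛼𝑥𝑡 + 𝑒𝑡 by OLS and reports conventional normal-approximation intervals; the function name is hypothetical, and the paper's own interval construction may differ, especially since conventional intervals are known to be unreliable when 𝛼 is very close to one.

```python
import numpy as np

def ar1_with_ci(x, z=1.96):
    """OLS fit of x_{t+1} = mu + alpha * x_t + e_t with conventional intervals.

    Returns {"mu": (estimate, lower, upper), "alpha": (estimate, lower, upper)}.
    """
    x = np.asarray(x, dtype=float)
    Z = np.column_stack([np.ones(len(x) - 1), x[:-1]])  # intercept and lagged level
    target = x[1:]
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ coef
    s2 = resid @ resid / (len(target) - 2)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(Z.T @ Z)))
    return {name: (est, est - z * s, est + z * s)
            for name, est, s in zip(("mu", "alpha"), coef, se)}

# Simulated example (not the paper's data): AR(1) with mu = 1 and alpha = 0.5.
rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = 1.0 + 0.5 * x[t - 1] + rng.standard_normal()
print(ar1_with_ci(x))
```

As a rough plausibility check, the asymptotic approximation se(𝛼̂) ≈ √((1 − 𝛼̂²)∕𝑛) with 𝛼̂ = 0.9913 and about 1068 monthly observations gives a 95% half-width of roughly 0.008, in line with the interval reported for d/e in Panel A of Table 5.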

We derive the asymptotic properties of the estimators in our model and propose easy-to-implement procedures to test for predictability. We apply our model to prediction analysis and find that there is always predictability for monthly data. Many predictors and their combinations can predict stock returns. For quarterly data, the predictability still exists, while the signal becomes weak in recent periods. Predictability almost disappears in the post-1952 quarterly data.

Table 7
Statistics of the unit-root tests.

                                Monthly                                 Quarterly
Variable                        ADF         PP          KPSS            ADF         PP          KPSS
Dividend payout ratio (d/e)     −2.161      −4.359***   4.34***         −3.602**    −4.537**    2.19***
Long-term yield (lty)           −1.315      −1.277      5.12***         −1.468      −1.351      2.29***
Dividend yield (d/y)            −1.921      −2.066      8.25***         −2.197      −2.134      3.78***
Dividend–price ratio (d/p)      −1.946      −2.113      8.24***         −2.212      −2.25       3.78***
T-bill rate (tbl)               −1.804      −2.135      3.41***         −2.488      −2.265      1.57***
Earnings–price ratio (e/p)      −2.624*     −3.649***   2.98***         −3.409**    −3.706***   1.47***
Book-to-market ratio (b/m)      −2.729*     −2.752*     4.42***         −3.239**    −19.026**   2.05***
Default yield spread (dfy)      −3.653***   −3.715***   1.49***         −4.351***   −4.109***   0.734**
Net equity expansion (ntis)     −3.205**    −3.871***   2.55***         −3.485***   −4.298***   1.32***
Term spread (tms)               −4.646***   −4.816***   1.61***         −5.336***   −5.309***   0.841***

This table presents the statistics of the unit-root tests for the following list of financial and economic variables: dividend payout ratio (d/e), long-term yield (lty), dividend yield (d/y), dividend–price ratio (d/p), T-bill rate (tbl), earnings–price ratio (e/p), book-to-market value ratio (b/m), default yield spread (dfy), net equity expansion (ntis) and term spread (tms). ‘ADF’ is the Augmented Dickey–Fuller unit-root test, ‘PP’ is the Phillips–Perron unit-root test, and ‘KPSS’ is the Kwiatkowski–Phillips–Schmidt–Shin test.
*Significance at a 10% level. **Significance at a 5% level. ***Significance at a 1% level.

There are a few directions for further research. First, we have only considered linear predictive regression models. Nonlinearity could be introduced to check whether it is helpful in detecting predictability when the linear model fails. Second, our discussion about predictability focuses on testing whether the predictive regression coefficient is zero. The alternative approach to examining the predictability is to check whether the predictor helps to improve the accuracy of the forecast, e.g., the mean squared forecast error, when it is a local to unit root process. These issues are left for future research. Appendix A. Proofs A.1. The general theoretical setting and the main lemma We provide a formal account of the balanced predictive regressions with ‘‘cointegrated’’ predictors{in order to derive our asymp{ } ( )′ } totic results in a unified theoretical framework. Consider the predictability of stock returns 𝑦𝑡 by 𝑋𝑡−1 = 𝑋1,𝑡−1 , … , 𝑋𝑘,𝑡−1 , which follows a vector autoregressive VAR(1) model: ( ) ( [ 2 ]) 𝑦𝑡 = 𝜇𝑦 + 𝛽 ′ 𝑋𝑡−1 + 𝑢𝑡 , 𝑢𝑡 i.i.d. 𝜎 𝜌′ ∼ 0𝑘+1 , 𝑢 . (11) 𝑋𝑡 = 𝜇𝑥 + 𝑣𝑡 , 𝑣𝑡 = 𝐴𝑣𝑡−1 + 𝑒𝑡 , 𝑒𝑡 𝜌 𝛺𝑒𝑒 The vector time series 𝑋𝑡−1 contains all the one-period lagged predictors of interest, and 𝛺𝑒𝑒 does not need to be of full rank. We will look into this issue further in Assumption 2. Assumption 1. Each series of {𝑋𝑡 } is highly persistent. There exists 0 < 𝑟 < 𝑘 cointegration relationship in 𝑋𝑡 , i.e. ∃ (𝑘 × 𝑟) matrix 𝑎, ⃗ which is of full column rank, such that the (𝑟 × 1) vector time series 𝑎⃗′ 𝑋𝑡 is stationary. The matrix 𝑎⃗ collects 𝑟 linearly independent cointegrating vectors, and 𝑎⃗′ 𝑋𝑡 represents the 𝑟 long-run equilibrium relationships of the 𝑘 time series. Since {𝑋𝑡 } is generally chosen according to economic or financial theory, without loss of generalization, we assume each row of 𝑎⃗ is nonzero. We normalize 𝑎, ⃗ such that 𝑎⃗′ 𝑎⃗ = 𝐼𝑟 . Remark 3. Since we consider highly persistent {𝑋𝑡 }, the autoregressive matrix 𝐴 in Eq. 
(11) often depends on 𝑇 (e.g., Eqs. (14) and (15)). Denote 𝐴𝑙𝑖𝑚 = lim𝑇 →∞ 𝐴. Although the persistence level of {𝑋𝑡 } is determined by 𝐴 (or the value of 𝛾 ∈ (0, 1]), the cointegration structure (e.g. the number of highly persistent common trends) is decided by 𝐴𝑙𝑖𝑚 only, therefore Assumption 1 essentially imposes restrictions on 𝐴𝑙𝑖𝑚 . The Granger representation theorem implies that, the matrix 𝐴𝑙𝑖𝑚 − 𝐼𝑘 can be decomposed [ ] into 𝐴𝑙𝑖𝑚 − 𝐼𝑘 = 𝜋 𝑎⃗′ , where 𝜋 is a (𝑘 × 𝑟) matrix. In model (11), Assumption 1 requires : (i) 𝑑𝑒𝑡 𝐼𝑘 − 𝜆𝐴𝑙𝑖𝑚 = 0 has roots on or outside the unit circle; (ii) the matrix 𝐴𝑙𝑖𝑚 − 𝐼𝑘 = 𝜋 𝑎⃗′ has rank 𝑟; (iii) the matrix 𝜋⊥′ 𝑎⃗⊥ is nonsingular, where 𝑎⃗⊥ and 𝜋⊥ are 𝑘 × (𝑘 − 𝑟) matrices satisfying 𝑎⃗′ 𝑎⃗⊥ = 0 and 𝜋 ′ 𝜋⊥ = 0. Then both 𝑎⃗′⊥ 𝑋𝑡 and 𝜋⊥′ 𝑋𝑡 can represent the 𝑘 − 𝑟 highly persistent common trends (see, e.g., Theorem 4.2 and Corollary 4.4 of Johansen (1995) for details), having the same persistence level as 𝑋𝑡 . ( ) ( ) Assumption 2. (i) 𝑉 𝑎𝑟 𝜋⊥′ 𝑒𝑡 is of full rank 𝑘 − 𝑟; (ii) 𝛴 = 𝑉 𝑎𝑟 𝑎⃗′ 𝑋𝑡 is of full rank 𝑟.

Assumption 2 is quite different from the standard assumption in the cointegration literature, which requires 𝛺𝑒𝑒 = Var(𝑒𝑡) to have full rank 𝑘. 𝛺𝑒𝑒 being of full rank is a sufficient condition for Assumption 2, but not a necessary one. Assumption 2 allows us to include the self-cointegrated regressors of Sections 2.2 and 2.3 in this general setting.


Table 8
Estimation results of balanced regressions: Monthly.

Panel A: January 1927–December 2015

| Predictor | 𝛽̂1 | 𝛽̂2 | 𝑝-value (balanced) | 𝑅2 | 𝐴̂𝐼𝑉𝑋 | IVX-Wald | 𝑝-value (IVX) |
|---|---|---|---|---|---|---|---|
| Dividend payout ratio (d/e) | −0.0591 (−1.5409) | 0.0603 (1.5713) | 0.1161 | 0.0157 | 0.0012 | 0.0544 | 0.8156 |
| Long-term yield (lty) | −1.3282 (−1.9229) | 1.2542 (1.8159) | 0.0545 | 0.0181 | −0.0744 | 1.3672 | 0.2423 |
| Dividend yield (d/y) | 0.0364 (1.2028) | −0.0279 (−0.9234) | 0.2290 | 0.0193 | 0.0095 | 4.9409 | 0.0262 |
| Dividend–price ratio (d/p) | −0.0681 (−2.2657) | 0.0762 (2.5339) | 0.0113 | 0.0232 | 0.0100 | 4.7778 | 0.0288 |
| T-bill rate (tbl) | −0.6350 (−1.3915) | 0.5419 (1.1875) | 0.1641 | 0.0177 | −0.0891 | 2.5394 | 0.1110 |
| Earnings–price ratio (e/p) | −0.0194 (−0.7807) | 0.0285 (1.1463) | 0.2517 | 0.0189 | 0.0098 | 5.2140 | 0.0224 |
| Book-to-market ratio (b/m) | −0.1439 (−3.9095) | 0.1643 (4.4643) | 0.0000 | 0.0387 | 0.0223 | 11.5455 | 0.0007 |
| Default yield spread (dfy) | −1.7824 (−1.6529) | 2.2444 (2.0813) | 0.0374 | 0.0200 | 0.4716 | 3.6141 | 0.0573 |
| Net equity expansion (ntis) | −0.4056 (−1.2089) | 0.2622 (0.7815) | 0.2267 | 0.0186 | −0.1655 | 5.1923 | 0.0227 |
| Term spread (tms) | 0.1234 (0.2671) | 0.0640 (0.1386) | 0.7894 | 0.0154 | 0.1970 | 2.0536 | 0.1518 |

Panel B: January 1952–December 2015

| Predictor | 𝛽̂1 | 𝛽̂2 | 𝑝-value (balanced) | 𝑅2 | 𝐴̂𝐼𝑉𝑋 | IVX-Wald | 𝑝-value (IVX) |
|---|---|---|---|---|---|---|---|
| Dividend payout ratio (d/e) | −0.0605 (−1.9499) | 0.0655 (2.1090) | 0.0349 | 0.0252 | 0.0058 | 1.2329 | 0.2668 |
| Long-term yield (lty) | −1.5375 (−2.8716) | 1.4675 (2.7409) | 0.0041 | 0.0307 | −0.0694 | 1.2405 | 0.2654 |
| Dividend yield (d/y) | 0.0153 (0.4773) | −0.0086 (−0.2677) | 0.6332 | 0.0229 | 0.0099 | 1.1127 | 0.2915 |
| Dividend–price ratio (d/p) | −0.0405 (−1.1388) | 0.0469 (1.3214) | 0.1864 | 0.0245 | 0.0083 | 1.9361 | 0.1641 |
| T-bill rate (tbl) | −1.1482 (−3.1780) | 1.0503 (2.9071) | 0.0015 | 0.0355 | −0.0892 | 2.8588 | 0.0909 |
| Earnings–price ratio (e/p) | 0.0205 (0.8235) | −0.0169 (−0.6790) | 0.4102 | 0.0207 | 0.0030 | 0.5990 | 0.4389 |
| Book-to-market ratio (b/m) | −0.0800 (−1.4526) | 0.0842 (1.5281) | 0.1265 | 0.0221 | 0.0043 | 0.3919 | 0.5313 |
| Default yield spread (dfy) | 0.6723 (0.4788) | −0.4180 (−0.2977) | 0.6321 | 0.0196 | 0.2153 | 0.3396 | 0.5601 |
| Net equity expansion (ntis) | −0.8668 (−2.1476) | 0.8407 (2.0828) | 0.0317 | 0.0246 | −0.0103 | 0.0133 | 0.9081 |
| Term spread (tms) | 0.5270 (1.4344) | −0.3182 (−0.8663) | 0.1514 | 0.0252 | 0.1960 | 2.9787 | 0.0844 |

This table presents the estimation and the test results of the balanced regression and the IVX for the following list of financial and economic variables: dividend payout ratio (d/e), long-term yield (lty), dividend yield (d/y), dividend–price ratio (d/p), T-bill rate (tbl), earnings price ratio (e/p), book-to-market value ratio (b/m), default yield spread (dfy), net equity expansion (ntis) and term spread (tms). 𝛽̂1 and 𝛽̂2 are the estimators of the balanced regression. The corresponding 𝑡-statistics are reported in the parentheses. 𝐴̂𝐼𝑉 𝑋 is the IVX estimator. ‘IVX-Wald’ is the statistic for the IVX predictive test.

We now present the main lemma in this paper, and it can be used to prove the theorems in Sections 2.2 and 2.3. Denote 𝛽̂ = (𝛽̂1, …, 𝛽̂𝑘)′ as the OLS estimator of 𝛽 = (𝛽1, …, 𝛽𝑘)′ from Eq. (11). Lemma 1 summarizes the asymptotic properties of 𝛽̂.

Lemma 1. Suppose {(𝑦𝑡, 𝑋𝑡) ∶ 1 ≤ 𝑡 ≤ 𝑇} are generated from Eq. (11). Let Assumptions 1 and 2 hold. Then, as 𝑇 → ∞,

\[
\sqrt{T}\left(\hat\beta - \beta\right) \overset{d}{\longrightarrow} \vec{a}\,\zeta,
\]

where 𝑎⃗ is defined in Assumption 1, 𝜁 ∼ 𝒩(0, 𝜎𝑢²𝛴⁻¹) is defined in Eq. (13), 𝜎𝑢² is defined in Eq. (11), and 𝛴 is defined in Assumption 2(ii).

Here 𝑎⃗ is a (𝑘 × 𝑟) nonrandom matrix, and 𝜁 is a (𝑟 × 1) random vector; therefore the least squares estimator 𝛽̂ is asymptotically degenerate (normal) and converges at the standard rate, √𝑇.

Remark 4. If 𝑦𝑡 (e.g. stock returns) is stationary, and the predictors 𝑋𝑡 are highly persistent, then the data imply 𝛽 = 𝑎⃗ × 𝑏0, where 𝑏0 is a (𝑟 × 1) vector. In other words, model (11) and Assumption 1 suggest that the 𝑟 long-run equilibrium relationships 𝑎⃗′𝑋𝑡−1


Table 9
Estimation results of balanced regressions: Quarterly.

Panel A: 1927Q1–2015Q4

| Predictor | 𝛽̂1 | 𝛽̂2 | 𝑝-value (balanced) | 𝑅2 | 𝐴̂𝐼𝑉𝑋 | IVX-Wald | 𝑝-value (IVX) |
|---|---|---|---|---|---|---|---|
| Dividend payout ratio (d/e) | −0.0349 (−0.7338) | 0.0503 (1.0584) | 0.2899 | 0.0363 | 0.0187 | 0.9120 | 0.3396 |
| Long-term yield (lty) | −1.6513 (−1.2628) | 1.4479 (1.1073) | 0.2067 | 0.0384 | −0.1905 | 0.6836 | 0.4084 |
| Dividend yield (d/y) | 0.0461 (0.8190) | −0.0193 (−0.3438) | 0.4128 | 0.0441 | 0.0302 | 4.1345 | 0.0420 |
| Dividend–price ratio (d/p) | 0.1024 (1.8930) | −0.0729 (−1.3480) | 0.0584 | 0.0530 | 0.0313 | 4.1807 | 0.0409 |
| T-bill rate (tbl) | −0.7305 (−1.0283) | 0.4685 (0.6594) | 0.3038 | 0.0390 | −0.2444 | 1.4027 | 0.2363 |
| Earnings–price ratio (e/p) | 0.0885 (2.2594) | −0.0625 (−1.5947) | 0.0239 | 0.0510 | 0.0218 | 1.9480 | 0.1628 |
| Book-to-market ratio (b/m) | 0.2575 (4.0481) | −0.1847 (−2.9034) | 0.0001 | 0.0911 | 0.0611 | 6.3493 | 0.0117 |
| Default yield spread (dfy) | 6.5691 (3.5343) | −4.9227 (−2.6485) | 0.0004 | 0.0676 | 0.9574 | 1.0220 | 0.3120 |
| Net equity expansion (ntis) | −1.4034 (−2.1762) | 0.8084 (1.2536) | 0.0295 | 0.0564 | −0.6506 | 5.6041 | 0.0179 |
| Term spread (tms) | 0.5010 (0.5826) | 0.0648 (0.0754) | 0.5602 | 0.0362 | 0.6511 | 1.3387 | 0.2473 |

Panel B: 1952Q1–2015Q4

| Predictor | 𝛽̂1 | 𝛽̂2 | 𝑝-value (balanced) | 𝑅2 | 𝐴̂𝐼𝑉𝑋 | IVX-Wald | 𝑝-value (IVX) |
|---|---|---|---|---|---|---|---|
| Dividend payout ratio (d/e) | −0.0202 (−0.5698) | 0.0415 (1.1723) | 0.2411 | 0.0572 | 0.0274 | 2.1905 | 0.1389 |
| Long-term yield (lty) | −1.6765 (−1.8045) | 1.5231 (1.6394) | 0.0712 | 0.0618 | −0.1512 | 0.5304 | 0.4665 |
| Dividend yield (d/y) | 0.0777 (1.3539) | −0.0540 (−0.9412) | 0.1758 | 0.0656 | 0.0374 | 1.8137 | 0.1781 |
| Dividend–price ratio (d/p) | −0.0758 (−1.2536) | 0.0984 (1.6274) | 0.1036 | 0.0687 | 0.0332 | 2.7065 | 0.0999 |
| T-bill rate (tbl) | −0.7695 (−1.5016) | 0.5118 (0.9986) | 0.1332 | 0.0633 | −0.2419 | 1.8028 | 0.1794 |
| Earnings–price ratio (e/p) | 0.0060 (0.1748) | 0.0042 (0.1225) | 0.8612 | 0.0509 | 0.0103 | 0.6177 | 0.4319 |
| Book-to-market ratio (b/m) | −0.1691 (−1.6814) | 0.1902 (1.8912) | 0.0586 | 0.0641 | 0.0270 | 1.2657 | 0.2606 |
| Default yield spread (dfy) | −1.6729 (−0.7452) | 3.0162 (1.3435) | 0.1791 | 0.0573 | 1.7824 | 1.6758 | 0.1955 |
| Net equity expansion (ntis) | −1.0018 (−1.3779) | 0.9176 (1.2621) | 0.1682 | 0.0552 | 0.0051 | 0.0003 | 0.9870 |
| Term spread (tms) | 0.5727 (0.9163) | 0.1029 (0.1646) | 0.3595 | 0.0616 | 0.7173 | 2.8284 | 0.0926 |

This table presents the estimation and the test results of the balanced regression and the IVX for the following list of financial and economic variables: dividend payout ratio (d/e), long-term yield (lty), dividend yield (d/y), dividend–price ratio (d/p), T-bill rate (tbl), earnings price ratio (e/p), book-to-market value ratio (b/m), default yield spread (dfy), net equity expansion (ntis) and term spread (tms). 𝛽̂1 and 𝛽̂2 are the estimators of the balanced regression. The corresponding 𝑡-statistics are reported in the parentheses. 𝐴̂𝐼𝑉 𝑋 is the IVX estimator. ‘IVX-Wald’ is the statistic of the IVX predictive test.

could function as the potential predictors. However, the asymptotic properties of the least squares estimator 𝛽̂ are free of the true value 𝛽.

Remark 5. Let 𝜎̂𝑢² be any consistent estimate of 𝜎𝑢²; then

\[
\left[\sum_{t=2}^{T} X_{t-1,\bar\mu} X_{t-1,\bar\mu}'\right]^{-1} \hat\sigma_u^2 = \frac{\vec{a}\,\Sigma^{-1}\vec{a}'}{T}\,\sigma_u^2 + O_p\!\left(T^{-1-\gamma}\right), \tag{12}
\]

where 𝑋𝑡−1,𝜇̄ is the sample mean-corrected series of 𝑋𝑡−1. Eq. (12) can be shown in a similar manner as the proof of Lemma 1. Therefore, $\left[\sum_{t=2}^{T} X_{t-1,\bar\mu} X_{t-1,\bar\mu}'\right]^{-1} \hat\sigma_u^2$ provides a consistent estimator of the asymptotic variance of 𝛽̂. Consequently, under the null 𝐻0 ∶ 𝛽 = 0, the usual 𝑡 statistic computed from any statistical software is asymptotically standard normal.

For any time series 𝜉𝑡, 𝑡 = 3, …, 𝑇, we denote the corresponding sample mean-corrected series and its lag series as

\[
\xi_{t,\bar\mu} = \xi_t - \bar\xi^{(1)}, \qquad \xi_{t-1,\bar\mu} = \xi_{t-1} - \bar\xi^{(0)}, \qquad \xi_{t-2,\bar\mu} = \xi_{t-2} - \bar\xi^{(-1)},
\]

where

\[
\bar\xi^{(1)} = \frac{1}{T-2}\sum_{s=3}^{T} \xi_s, \qquad \bar\xi^{(0)} = \frac{1}{T-2}\sum_{s=3}^{T} \xi_{s-1}, \qquad \bar\xi^{(-1)} = \frac{1}{T-2}\sum_{s=3}^{T} \xi_{s-2}.
\]
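The practical content of Remark 5 can be illustrated with a small simulation. The sketch below is not the authors' code: the function names `balanced_regression` and `simulate_null`, the parameter values (`T`, `c`, `gamma`, `rho`) and the Gaussian innovations are all assumptions chosen for illustration. It regresses 𝑦𝑡 on a constant, 𝑥𝑡−1 and 𝑥𝑡−2 by least squares and reports the usual OLS 𝑡-statistics for data generated under the null of no predictability.

```python
import numpy as np


def balanced_regression(y, x):
    """OLS of y_t on a constant, x_{t-1} and x_{t-2}.

    Returns the coefficient vector and the usual OLS t-statistics.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    y_dep = y[2:]                                   # y_t for t = 3, ..., T
    X = np.column_stack([np.ones(y_dep.size), x[1:-1], x[:-2]])
    coef, *_ = np.linalg.lstsq(X, y_dep, rcond=None)
    resid = y_dep - X @ coef
    sigma2_hat = resid @ resid / (y_dep.size - X.shape[1])
    cov = sigma2_hat * np.linalg.inv(X.T @ X)       # classical OLS covariance
    t_stats = coef / np.sqrt(np.diag(cov))
    return coef, t_stats


def simulate_null(T=1000, c=5.0, gamma=1.0, rho=-0.9, seed=0):
    """Unpredictable returns (beta = 0) and a persistent AR(1) predictor.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    alpha = 1.0 - c / T**gamma                      # autoregressive root close to one
    e = rng.standard_normal(T)
    u = rho * e + np.sqrt(1.0 - rho**2) * rng.standard_normal(T)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = alpha * x[t - 1] + e[t]
    return u, x                                     # y_t = u_t under the null


y, x = simulate_null()
coef, t_stats = balanced_regression(y, x)
print("coefficients (const, lag 1, lag 2):", coef)
print("t-statistics:", t_stats)
```

Repeating the simulation over many seeds and comparing the empirical distribution of the slope 𝑡-statistics with the standard normal would be one informal way to check the claim in Remark 5.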


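Table 10
Estimation results of multivariate regressions: Monthly.

Panel A: Balanced regressions, January 1927–December 2015

| Model | d/p | e/p | b/m | d/e | tbl | dfy | tms | 𝑊𝑇 | 𝑝-value |
|---|---|---|---|---|---|---|---|---|---|
| Ang and Bekaert (2007) | −0.06 | – | – | – | −0.55 | – | – | 7.17 | 0.028 |
| Ferson and Schadt (1996) | −0.05 | – | – | – | −1.50 | −1.79 | −1.08 | 11.39 | 0.023 |
| Kothari and Shanken (1997) | 0.09 | – | −0.23 | – | – | – | – | 23.31 | 0.000 |
| Lamont (1998) | −0.06 | – | – | −0.06 | – | – | – | 8.78 | 0.012 |
| Campbell and Vuolteenaho (2004) | – | 0.08 | −0.22 | – | – | – | 0.20 | 25.27 | 0.000 |

Panel B: IVX, January 1927–December 2015

| Model | d/p | e/p | b/m | d/e | tbl | dfy | tms | 𝑊𝑇 | 𝑝-value |
|---|---|---|---|---|---|---|---|---|---|
| Ang and Bekaert (2007) | 0.001 | – | – | – | −0.10 | – | – | 4.11 | 0.130 |
| Ferson and Schadt (1996) | 0.01 | – | – | – | −0.09 | 0.24 | 0.03 | 8.15 | 0.086 |
| Kothari and Shanken (1997) | −0.005 | – | 0.03 | – | – | – | – | 12.44 | 0.002 |
| Lamont (1998) | 0.01 | – | – | −0.006 | – | – | – | 4.08 | 0.130 |
| Campbell and Vuolteenaho (2004) | – | 0.003 | 0.02 | – | – | – | 0.21 | 13.93 | 0.003 |

Panel C: Balanced regressions, January 1952–December 2015

| Model | d/p | e/p | b/m | d/e | tbl | dfy | tms | 𝑊𝑇 | 𝑝-value |
|---|---|---|---|---|---|---|---|---|---|
| Ang and Bekaert (2007) | −0.02 | – | – | – | −1.13 | – | – | 10.57 | 0.005 |
| Ferson and Schadt (1996) | −0.007 | – | – | – | −1.96 | −0.50 | −1.01 | 13.57 | 0.009 |
| Kothari and Shanken (1997) | 0.002 | – | −0.08 | – | – | – | – | 1.89 | 0.388 |
| Lamont (1998) | −0.03 | – | – | −0.06 | – | – | – | 5.69 | 0.058 |
| Campbell and Vuolteenaho (2004) | – | −0.05 | −0.13 | – | – | – | 0.54 | 7.11 | 0.069 |

Panel D: IVX, January 1952–December 2015

| Model | d/p | e/p | b/m | d/e | tbl | dfy | tms | 𝑊𝑇 | 𝑝-value |
|---|---|---|---|---|---|---|---|---|---|
| Ang and Bekaert (2007) | 0.02 | – | – | – | −0.23 | – | – | 3.07 | 0.220 |
| Ferson and Schadt (1996) | 0.01 | – | – | – | −0.21 | 0.25 | 0.04 | 8.34 | 0.080 |
| Kothari and Shanken (1997) | 0.03 | – | −0.03 | – | – | – | – | 1.77 | 0.410 |
| Lamont (1998) | 0.008 | – | – | 0.003 | – | – | – | 1.77 | 0.410 |
| Campbell and Vuolteenaho (2004) | – | 0.004 | 0.003 | – | – | – | 0.24 | 4.44 | 0.220 |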

This table presents the estimation and test results of the balanced regression and the IVX regression for five combinations of the following financial and economic variables: dividend payout ratio (d/e), long-term yield (lty), dividend yield (d/y), dividend–price ratio (d/p), T-bill rate (tbl), earnings–price ratio (e/p), book-to-market ratio (b/m), default yield spread (dfy), net equity expansion (ntis) and term spread (tms). For the balanced regression, we only report the estimates of the first lags.

Proof of Lemma 1. Let 𝑎⃗⊥ be the 𝑘 × (𝑘 − 𝑟) matrix satisfying 𝑎⃗′𝑎⃗⊥ = 0 and 𝑎⃗′⊥𝑎⃗⊥ = 𝐼𝑘−𝑟. Define the transformation matrix 𝑄 = [𝑎⃗, 𝑎⃗⊥]′,⁹ and note that 𝑄𝑄′ = 𝐼𝑘. Using the matrix 𝑄, we can get

\[
W_t = \begin{bmatrix} W_{1,t} \\ W_{2,t} \end{bmatrix} = \begin{bmatrix} \vec{a}' X_t \\ \vec{a}_\perp' X_t \end{bmatrix} = Q X_t .
\]

Using arguments similar to those in Theorem 4.2 and Corollary 4.4 of Johansen (1995), we are able to show that 𝑊2,𝑡 = 𝑎⃗′⊥𝑋𝑡 represents the 𝑘 − 𝑟 highly persistent common trends. The elements of 𝑊2,𝑡 have either unit roots, or roots local to unity, or roots with moderate deviations from unity, depending on the persistence level of 𝑋𝑡. To derive the asymptotic property of the estimators, we define the following terms to simplify the presentation:

\[
A_T \equiv \sum_{t=2}^{T} W_{2,t-1,\bar\mu} W_{2,t-1,\bar\mu}' - \sum_{t=2}^{T} W_{2,t-1,\bar\mu} W_{1,t-1,\bar\mu}' \left(\sum_{t=2}^{T} W_{1,t-1,\bar\mu} W_{1,t-1,\bar\mu}'\right)^{-1} \sum_{t=2}^{T} W_{1,t-1,\bar\mu} W_{2,t-1,\bar\mu}',
\]

where the first term, which is of order 𝑂𝑝(𝑇^{1+𝛾}), dominates the second term, which is 𝑂𝑝(𝑇^{𝛾});

\[
\zeta_T \equiv \left(\sum_{t=2}^{T} W_{1,t-1,\bar\mu} W_{1,t-1,\bar\mu}'\right)^{-1} \sqrt{T} \sum_{t=2}^{T} W_{1,t-1,\bar\mu} u_t \overset{d}{\longrightarrow} \zeta \sim \mathcal{N}\left(0, \sigma_u^2 \Sigma^{-1}\right), \tag{13}
\]

where 𝛴 ≡ Var(𝑎⃗′𝑋𝑡−1) = Var(𝑊1,𝑡−1), and 𝜎𝑢² defined in Eq. (11) is the variance of 𝑢𝑡. By regressing 𝑦𝑡 on a constant and 𝑋𝑡−1, we get $\hat\beta = \left[\sum_{t=2}^{T} X_{t-1,\bar\mu} X_{t-1,\bar\mu}'\right]^{-1} \sum_{t=2}^{T} X_{t-1,\bar\mu} \left[X_{t-1,\bar\mu}'\beta + u_t\right]$. Then:

\[
\hat\beta - \beta = \left(Q' \sum_{t=2}^{T} W_{t-1,\bar\mu} W_{t-1,\bar\mu}'\, Q\right)^{-1} Q^{-1} \sum_{t=2}^{T} W_{t-1,\bar\mu} u_t
\]
\[
= [\vec{a}, \vec{a}_\perp] \times
\begin{bmatrix}
\sum_{t=2}^{T} W_{1,t-1,\bar\mu} W_{1,t-1,\bar\mu}' & \sum_{t=2}^{T} W_{1,t-1,\bar\mu} W_{2,t-1,\bar\mu}' \\
\sum_{t=2}^{T} W_{2,t-1,\bar\mu} W_{1,t-1,\bar\mu}' & \sum_{t=2}^{T} W_{2,t-1,\bar\mu} W_{2,t-1,\bar\mu}'
\end{bmatrix}^{-1}
\begin{bmatrix}
\sum_{t=2}^{T} W_{1,t-1,\bar\mu} u_t \\
\sum_{t=2}^{T} W_{2,t-1,\bar\mu} u_t
\end{bmatrix}
\]

⁹ One could also choose 𝑄 = [𝑎⃗, 𝜋⊥]′ and $Q^{-1} = \left[\vec{a} - \vec{a}_\perp\left(\pi_\perp'\vec{a}_\perp\right)^{-1}\pi_\perp'\vec{a},\ \vec{a}_\perp\left(\pi_\perp'\vec{a}_\perp\right)^{-1}\right]$; then the proof of Lemma 1 follows similarly.


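Table 11
Estimation results of multivariate regressions: Quarterly.

Panel A: Balanced regressions, 1927Q1–2015Q4

| Model | d/p | e/p | b/m | d/e | tbl | dfy | tms | 𝑊𝑇 | 𝑝-value |
|---|---|---|---|---|---|---|---|---|---|
| Ang and Bekaert (2007) | 0.10 | – | – | – | −0.76 | – | – | 4.71 | 0.095 |
| Ferson and Schadt (1996) | 0.03 | – | – | – | −0.62 | 5.77 | −0.20 | 12.41 | 0.015 |
| Kothari and Shanken (1997) | −0.16 | – | 0.42 | – | – | – | – | 20.13 | 0.000 |
| Lamont (1998) | 0.11 | – | – | −0.07 | – | – | – | 5.30 | 0.070 |
| Campbell and Vuolteenaho (2004) | – | 0.003 | 0.26 | – | – | – | 0.47 | 17.56 | 0.000 |

Panel B: IVX, 1927Q1–2015Q4

| Model | d/p | e/p | b/m | d/e | tbl | dfy | tms | 𝑊𝑇 | 𝑝-value |
|---|---|---|---|---|---|---|---|---|---|
| Ang and Bekaert (2007) | 0.03 | – | – | – | −0.28 | – | – | 1.56 | 0.460 |
| Ferson and Schadt (1996) | 0.03 | – | – | – | −0.21 | −0.06 | 0.32 | 5.32 | 0.260 |
| Kothari and Shanken (1997) | 0.007 | – | 0.05 | – | – | – | – | 6.35 | 0.040 |
| Lamont (1998) | 0.03 | – | – | 0.0003 | – | – | – | 3.58 | 0.170 |
| Campbell and Vuolteenaho (2004) | – | −0.001 | 0.07 | – | – | – | 0.56 | 7.22 | 0.065 |

Panel C: Balanced regressions, 1952Q1–2015Q4

| Model | d/p | e/p | b/m | d/e | tbl | dfy | tms | 𝑊𝑇 | 𝑝-value |
|---|---|---|---|---|---|---|---|---|---|
| Ang and Bekaert (2007) | −0.05 | – | – | – | −0.81 | – | – | 3.55 | 0.169 |
| Ferson and Schadt (1996) | −0.018 | – | – | – | −2.28 | −3.30 | −1.81 | 6.42 | 0.170 |
| Kothari and Shanken (1997) | 0.01 | – | −0.18 | – | – | – | – | 3.22 | 0.200 |
| Lamont (1998) | −0.07 | – | – | −0.02 | – | – | – | 3.29 | 0.193 |
| Campbell and Vuolteenaho (2004) | – | 0.02 | −0.16 | – | – | – | 0.61 | 3.14 | 0.371 |

Panel D: IVX, 1952Q1–2015Q4

| Model | d/p | e/p | b/m | d/e | tbl | dfy | tms | 𝑊𝑇 | 𝑝-value |
|---|---|---|---|---|---|---|---|---|---|
| Ang and Bekaert (2007) | 0.06 | – | – | – | −0.78 | – | – | 1.69 | 0.430 |
| Ferson and Schadt (1996) | 0.05 | – | – | – | −0.73 | 2.33 | 0.02 | 10.22 | 0.030 |
| Kothari and Shanken (1997) | 0.08 | – | −0.08 | – | – | – | – | 1.35 | 0.510 |
| Lamont (1998) | 0.03 | – | – | 0.02 | – | – | – | 3.15 | 0.210 |
| Campbell and Vuolteenaho (2004) | – | −0.0005 | 0.04 | – | – | – | 0.85 | 5.26 | 0.150 |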

This table presents the estimation and test results of the balanced regression and the IVX regression for five combinations of the following financial and economic variables: dividend payout ratio (d/e), long-term yield (lty), dividend yield (d/y), dividend–price ratio (d/p), T-bill rate (tbl), earnings–price ratio (e/p), book-to-market ratio (b/m), default yield spread (dfy), net equity expansion (ntis) and term spread (tms). For the balanced regression, we only report the estimates of the first lags.

Table 12
MSE of out-of-sample forecasting.

Panel A: Forecasting period: 2000–2015

| Predictor | Monthly, rolling window | Monthly, expanding window | Quarterly, rolling window | Quarterly, expanding window |
|---|---|---|---|---|
| Dividend payout ratio (d/e) | 0.0019 | 0.0019 | 0.0072 | 0.0072 |
| Long-term yield (lty) | 0.0019 | 0.0019 | 0.0070 | 0.0069 |
| Dividend yield (d/y) | 0.0019 | 0.0019 | 0.0069 | 0.0069 |
| Dividend–price ratio (d/p) | 0.0019 | 0.0019 | 0.0069 | 0.0069 |
| T-bill rate (tbl) | 0.0019 | 0.0019 | 0.0071 | 0.0071 |
| Earnings–price ratio (e/p) | 0.0019 | 0.0019 | 0.0071 | 0.0071 |
| Book-to-market ratio (b/m) | 0.0019 | 0.0019 | 0.0069 | 0.0069 |
| Default yield spread (dfy) | 0.0019 | 0.0019 | 0.0071 | 0.0071 |
| Net equity expansion (ntis) | 0.0019 | 0.0019 | 0.0074 | 0.0073 |
| Term spread (tms) | 0.0019 | 0.0019 | 0.0073 | 0.0073 |

Panel B: Forecasting period: 1985–2015

| Predictor | Monthly, rolling window | Monthly, expanding window | Quarterly, rolling window | Quarterly, expanding window |
|---|---|---|---|---|
| Dividend payout ratio (d/e) | 0.0019 | 0.0019 | 0.0068 | 0.0068 |
| Long-term yield (lty) | 0.0019 | 0.0019 | 0.0066 | 0.0065 |
| Dividend yield (d/y) | 0.0019 | 0.0019 | 0.0068 | 0.0068 |
| Dividend–price ratio (d/p) | 0.0019 | 0.0019 | 0.0067 | 0.0067 |
| T-bill rate (tbl) | 0.0019 | 0.0019 | 0.0066 | 0.0065 |
| Earnings–price ratio (e/p) | 0.0019 | 0.0019 | 0.0066 | 0.0066 |
| Book-to-market ratio (b/m) | 0.0019 | 0.0019 | 0.0066 | 0.0065 |
| Default yield spread (dfy) | 0.0019 | 0.0019 | 0.0067 | 0.0066 |
| Net equity expansion (ntis) | 0.0020 | 0.0020 | 0.0069 | 0.0068 |
| Term spread (tms) | 0.0020 | 0.0019 | 0.0068 | 0.0068 |

This table presents the MSE of the out-of-sample forecasting for the following list of financial and economic variables: dividend payout ratio (d/e), long-term yield (lty), dividend yield (d/y), dividend–price ratio (d/p), T-bill rate (tbl), earnings price ratio (e/p), book-to-market value ratio (b/m), default yield spread (dfy), net equity expansion (ntis) and term spread (tms).
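The difference between the rolling-window and expanding-window schemes reported in the table can be sketched in a few lines. This is a hedged illustration only: the function name `oos_mse`, the window length, the first forecast date and the simulated data are assumptions, and the exact forecasting design used for the table is not restated here. The forecasting equation is taken to be the balanced regression of 𝑦𝑡 on a constant, 𝑥𝑡−1 and 𝑥𝑡−2.

```python
import numpy as np


def oos_mse(y, x, first_forecast, window=None):
    """Mean squared error of one-step-ahead forecasts of y_t.

    Each forecast uses OLS of y_s on (1, x_{s-1}, x_{s-2}) for s < t.
    window=None gives an expanding estimation sample; an integer gives a
    rolling sample of (at most) that many observations.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    errors = []
    for t in range(first_forecast, len(y)):
        start = 2 if window is None else max(2, t - window)
        idx = np.arange(start, t)                   # only data observed before t
        X = np.column_stack([np.ones(idx.size), x[idx - 1], x[idx - 2]])
        coef, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
        forecast = coef @ np.array([1.0, x[t - 1], x[t - 2]])
        errors.append(y[t] - forecast)
    return float(np.mean(np.square(errors)))


# Illustrative data: a persistent predictor and unpredictable returns.
rng = np.random.default_rng(0)
x = 0.1 * np.cumsum(rng.standard_normal(400))
y = 0.04 * rng.standard_normal(400)
print("expanding:", oos_mse(y, x, first_forecast=200))
print("rolling:  ", oos_mse(y, x, first_forecast=200, window=120))
```

The only difference between the two schemes is the first index of the estimation sample, which is why the two columns of MSEs tend to be close when the coefficients are stable over time.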

Table 13
MSE of out-of-sample forecasting.

Panel A: Forecasting period: 2000–2015

| Model | Monthly, rolling window | Monthly, expanding window | Quarterly, rolling window | Quarterly, expanding window |
|---|---|---|---|---|
| Ang and Bekaert (2007) | 0.0018 | 0.0018 | 0.0067 | 0.0067 |
| Ferson and Schadt (1996) | 0.0019 | 0.0019 | 0.0076 | 0.0077 |
| Kothari and Shanken (1997) | 0.0019 | 0.0019 | 0.0070 | 0.0070 |
| Lamont (1998) | 0.0019 | 0.0019 | 0.0075 | 0.0071 |
| Campbell and Vuolteenaho (2004) | 0.0019 | 0.0019 | 0.0073 | 0.0070 |

Panel B: Forecasting period: 1985–2015

| Model | Monthly, rolling window | Monthly, expanding window | Quarterly, rolling window | Quarterly, expanding window |
|---|---|---|---|---|
| Ang and Bekaert (2007) | 0.0020 | 0.0020 | 0.0067 | 0.0071 |
| Ferson and Schadt (1996) | 0.0020 | 0.0020 | 0.0071 | 0.0070 |
| Kothari and Shanken (1997) | 0.0020 | 0.0020 | 0.0070 | 0.0074 |
| Lamont (1998) | 0.0020 | 0.0020 | 0.0070 | 0.0075 |
| Campbell and Vuolteenaho (2004) | 0.0020 | 0.0020 | 0.0073 | 0.0070 |

This table presents the MSE of the out-of-sample forecasting for the multivariate-predictor models in Ang and Bekaert (2007), Ferson and Schadt (1996), Kothari and Shanken (1997), Lamont (1998), and Campbell and Vuolteenaho (2004).

Table 14
𝑝-values of long-term predictability tests.

Panel A: Univariate predictor

| Predictor | Monthly, 12 | Monthly, 36 | Monthly, 60 | Quarterly, 4 | Quarterly, 12 | Quarterly, 20 |
|---|---|---|---|---|---|---|
| Dividend payout ratio (d/e) | 0.0306 | 0.2812 | 0.0614 | 0.1772 | 0.3058 | 0.0804 |
| Long-term yield (lty) | 0.0093 | 0.1399 | 0.1014 | 0.1103 | 0.2565 | 0.1716 |
| Dividend yield (d/y) | 0.3390 | 0.0751 | 0.0039 | 0.4434 | 0.0502 | 0.0010 |
| Dividend–price ratio (d/p) | 0.0416 | 0.1541 | 0.0147 | 0.2420 | 0.0074 | 0.0003 |
| T-bill rate (tbl) | 0.1465 | 0.2011 | 0.0143 | 0.2191 | 0.1787 | 0.0359 |
| Earnings–price ratio (e/p) | 0.5456 | 0.1007 | 0.0167 | 0.0557 | 0.0578 | 0.0172 |
| Book-to-market ratio (b/m) | 0.2607 | 0.0060 | 0.0001 | 0.0000 | 0.0000 | 0.0000 |
| Default yield spread (dfy) | 0.5249 | 0.2136 | 0.0334 | 0.0153 | 0.0103 | 0.0007 |
| Net equity expansion (ntis) | 0.0974 | 0.0000 | 0.0020 | 0.0138 | 0.0000 | 0.0006 |
| Term spread (tms) | 0.2187 | 0.0671 | 0.0071 | 0.3326 | 0.0286 | 0.0097 |

Panel B: Multivariate predictors

| Model | Monthly, 12 | Monthly, 36 | Monthly, 60 | Quarterly, 4 | Quarterly, 12 | Quarterly, 20 |
|---|---|---|---|---|---|---|
| Ang and Bekaert (2007) | 0.1211 | 0.1434 | 0.0010 | 0.2300 | 0.0086 | 0.0000 |
| Ferson and Schadt (1996) | 0.0329 | 0.0145 | 0.0000 | 0.1220 | 0.0003 | 0.0000 |
| Kothari and Shanken (1997) | 0.0095 | 0.0465 | 0.0067 | 0.0000 | 0.0004 | 0.0002 |
| Lamont (1998) | 0.0072 | 0.1255 | 0.0041 | 0.1916 | 0.0232 | 0.0004 |
| Campbell and Vuolteenaho (2004) | 0.3527 | 0.0005 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |

This table presents the 𝑝-values of the long-term predictability tests for the following list of financial and economic variables: dividend payout ratio (d/e), long-term yield (lty), dividend yield (d/y), dividend–price ratio (d/p), Tbill rate (tbl), earnings price ratio (e/p), book-to-market value ratio (b/m), default yield spread (dfy), net equity expansion (ntis) and term spread (tms). The multivariate-predictor models are Ang and Bekaert (2007), Ferson and Schadt (1996), Kothari and Shanken (1997), Lamont (1998), and Campbell and Vuolteenaho (2004).

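\[
= [\vec{a}, \vec{a}_\perp] \times
\begin{bmatrix}
\dfrac{\zeta_T}{\sqrt{T}} - \left(\sum_{t=2}^{T} W_{1,t-1,\bar\mu} W_{1,t-1,\bar\mu}'\right)^{-1} \sum_{t=2}^{T} W_{1,t-1,\bar\mu} W_{2,t-1,\bar\mu}'\, A_T^{-1} \sum_{t=2}^{T} W_{2,t-1,\bar\mu} u_t + O_p\left(T^{-3/2}\right) \\
A_T^{-1} \sum_{t=2}^{T} W_{2,t-1,\bar\mu} u_t + O_p\left(T^{-1-\gamma/2}\right)
\end{bmatrix}
\]
\[
= \vec{a}\left(\frac{\zeta_T}{\sqrt{T}} + O_p\left(T^{-1}\right)\right) + \vec{a}_\perp\left(A_T^{-1} \sum_{t=2}^{T} W_{2,t-1,\bar\mu} u_t + O_p\left(T^{-1-\gamma/2}\right)\right),
\]

where

\[
\left(\sum_{t=2}^{T} W_{1,t-1,\bar\mu} W_{1,t-1,\bar\mu}'\right)^{-1} \sum_{t=2}^{T} W_{1,t-1,\bar\mu} W_{2,t-1,\bar\mu}'\, A_T^{-1} \sum_{t=2}^{T} W_{2,t-1,\bar\mu} u_t = O_p\left(T^{-1}\right)
\quad \text{and} \quad
A_T^{-1} \sum_{t=2}^{T} W_{2,t-1,\bar\mu} u_t = O_p\left(T^{-(1+\gamma)/2}\right).
\]

Therefore, $\hat\beta - \beta = \vec{a}\,\zeta_T/\sqrt{T} + O_p\left(T^{-(1+\gamma)/2}\right)$. As 𝑇 → ∞, $\sqrt{T}\left(\hat\beta - \beta\right) = \vec{a}\,\zeta_T + O_p\left(T^{-\gamma/2}\right) \overset{d}{\longrightarrow} \vec{a}\,\zeta$. ■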


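A.2. Proofs for theoretical results in Section 2.2

If we define 𝑋𝑡−1 = [𝑍′𝑡−1, 𝑍′𝑡−2]′, model (2) is included in the general setting (11). In particular, 𝑘 = 2𝑚, 𝑟 = 𝑚, $\vec{a} = \left[\frac{I_m}{\sqrt{2}}, -\frac{I_m}{\sqrt{2}}\right]'$, and

\[
A = \begin{bmatrix} I_m - \dfrac{C}{T^{\gamma}} & 0_{m\times m} \\ I_m & 0_{m\times m} \end{bmatrix}, \qquad
\Omega_{ee}^{(2m)} = \begin{bmatrix} \Omega_{ee}^{(m)} & 0_{m\times m} \\ 0_{m\times m} & 0_{m\times m} \end{bmatrix}, \qquad
A_{lim} - I_{2m} = \begin{bmatrix} 0_{m\times m} & 0_{m\times m} \\ I_m & -I_m \end{bmatrix} = \begin{bmatrix} 0_{m\times m} \\ \sqrt{2}\, I_m \end{bmatrix} \left[\frac{I_m}{\sqrt{2}}, -\frac{I_m}{\sqrt{2}}\right], \tag{14}
\]

where $\Omega_{ee}^{(2m)}$ denotes the 𝑘 × 𝑘 matrix 𝛺𝑒𝑒 in model (11), $\Omega_{ee}^{(m)}$ denotes the 𝑚 × 𝑚 nonsingular matrix 𝛺𝑒𝑒 in model (2), and 0𝑚×𝑚 denotes the 𝑚 × 𝑚 zero matrix. Denote 𝛽̂ = [𝛽̂1′, 𝛽̂2′]′ as the OLS estimator of 𝛽 = [𝛽1′, 𝛽2′]′ from our balanced equation (2).

Proof of Theorem 1. Assumptions 1 and 2 are satisfied. Theorem 1 follows from Lemma 1 and Remark 5. ■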



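A.3. Proofs for theoretical results in Section 2.3

If we define 𝑋𝑡−1 = [𝑥𝑡−1, 𝑥𝑡−2]′, model (7) is included in the general setting (11), i.e. 𝑥𝑡−1 is cointegrated with its own lag. In particular, 𝑘 = 2, 𝑟 = 1, $\vec{a} = \left[\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right]'$, and

\[
A = \begin{bmatrix} 1 - \dfrac{c}{T^{\gamma}} & 0 \\ 1 & 0 \end{bmatrix}, \qquad
\Omega_{ee} = \begin{bmatrix} \sigma_e^2 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
A_{lim} - I_2 = \begin{bmatrix} 0 \\ \sqrt{2} \end{bmatrix} \left[\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right]. \tag{15}
\]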
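The algebra in Eq. (15) is easy to verify numerically. The sketch below is only an illustration: the values of `c`, `gamma` and `T`, the unit innovation variance and the variable names are assumptions. It checks the decomposition 𝐴𝑙𝑖𝑚 − 𝐼2 = 𝜋𝑎⃗′ and then simulates a persistent 𝑥𝑡 to show that the combination 𝑎⃗′𝑋𝑡 = (𝑥𝑡 − 𝑥𝑡−1)/√2 is far less variable than 𝑥𝑡 itself.

```python
import numpy as np

c, gamma, T = 5.0, 1.0, 2000                       # illustrative values only
alpha = 1.0 - c / T**gamma

A = np.array([[alpha, 0.0],
              [1.0,   0.0]])                       # VAR(1) matrix for X_t = (x_t, x_{t-1})'
A_lim = np.array([[1.0, 0.0],
                  [1.0, 0.0]])                     # limit of A as T grows
a = np.array([[1.0], [-1.0]]) / np.sqrt(2.0)       # normalized cointegrating vector
pi = np.array([[0.0], [np.sqrt(2.0)]])             # loading matrix in A_lim - I = pi a'

decomposition_ok = np.allclose(A_lim - np.eye(2), pi @ a.T)

# Simulate x_t = alpha * x_{t-1} + e_t and stack X_t = (x_t, x_{t-1})'.
rng = np.random.default_rng(0)
e = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = alpha * x[t - 1] + e[t]
X = np.column_stack([x[1:], x[:-1]])

w1 = (X @ a).ravel()                               # a'X_t = (x_t - x_{t-1}) / sqrt(2)
var_x, var_w1 = np.var(x), np.var(w1)

print("decomposition holds:", decomposition_ok)
print("sample variance of x_t:  ", var_x)
print("sample variance of a'X_t:", var_w1)
```

In this design the persistence sits entirely in the common trend, while the difference of the two lags behaves like a stationary series, which is the sense in which 𝑥𝑡−1 is cointegrated with its own lag.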

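We first derive a few expressions, which are useful in the proofs of Theorem 2 and Proposition 1. Using straightforward calculation, we show

\[
\hat\beta_1 - \beta_1 = \left[\sum_{t=3}^{T} x_{t-2,\bar\mu}^2 \sum_{t=3}^{T} e_{t-1,\bar\mu}^2 - \left(\sum_{t=3}^{T} x_{t-2,\bar\mu} e_{t-1,\bar\mu}\right)^2\right]^{-1}
\times \left[\sum_{t=3}^{T} x_{t-2,\bar\mu}^2 \sum_{t=3}^{T} e_{t-1,\bar\mu} u_{t,\bar\mu} - \sum_{t=3}^{T} x_{t-2,\bar\mu} e_{t-1,\bar\mu} \sum_{t=3}^{T} x_{t-2,\bar\mu} u_{t,\bar\mu}\right], \tag{16}
\]

\[
\hat\beta_2 - \beta_2 = -\alpha\left(\hat\beta_1 - \beta_1\right) + \left[\sum_{t=3}^{T} x_{t-2,\bar\mu}^2 \sum_{t=3}^{T} e_{t-1,\bar\mu}^2 - \left(\sum_{t=3}^{T} x_{t-2,\bar\mu} e_{t-1,\bar\mu}\right)^2\right]^{-1}
\times \left[-\sum_{t=3}^{T} x_{t-2,\bar\mu} e_{t-1,\bar\mu} \sum_{t=3}^{T} e_{t-1,\bar\mu} u_{t,\bar\mu} + \sum_{t=3}^{T} e_{t-1,\bar\mu}^2 \sum_{t=3}^{T} x_{t-2,\bar\mu} u_{t,\bar\mu}\right]. \tag{17}
\]

Using the CLT for stationary time series, we obtain

\[
\sqrt{T}\, \frac{\sum_{t=3}^{T} e_{t-1,\bar\mu} u_{t,\bar\mu}}{\sum_{t=3}^{T} e_{t-1,\bar\mu}^2} \overset{d}{\longrightarrow} \frac{\sigma_u}{\sigma_e}\, \mathcal{Z}_1, \qquad \mathcal{Z}_1 \sim \mathcal{N}(0, 1). \tag{18}
\]

Proof of Theorem 2. Assumptions 1 and 2 are satisfied in Part (i), therefore (i) follows from Lemma 1.

Part (ii): since {𝑥𝑡} is stationary, we have $\sum_{t=3}^{T} x_{t-2,\bar\mu}^2 = O_p(T)$, $\sum_{t=3}^{T} x_{t-2,\bar\mu} e_{t-1,\bar\mu} = O_p(\sqrt{T})$, and $\sum_{t=3}^{T} x_{t-2,\bar\mu} u_{t,\bar\mu} = O_p(\sqrt{T})$. Using Eqs. (16), (17) and the CLT for stationary time series, we obtain

\[
\sqrt{T}\left(\hat\beta_1 - \beta_1\right) = \sqrt{T}\, \frac{\sum_{t=3}^{T} e_{t-1,\bar\mu} u_{t,\bar\mu}}{\sum_{t=3}^{T} e_{t-1,\bar\mu}^2} + O_p\left(\frac{1}{\sqrt{T}}\right) \overset{d}{\longrightarrow} \frac{\sigma_u}{\sigma_e}\, \mathcal{Z}_1,
\]

and

\[
\sqrt{T}\left(\hat\beta_2 - \beta_2\right) = -\alpha \sqrt{T}\, \frac{\sum_{t=3}^{T} e_{t-1,\bar\mu} u_{t,\bar\mu}}{\sum_{t=3}^{T} e_{t-1,\bar\mu}^2} + \sqrt{T}\, \frac{\sum_{t=3}^{T} x_{t-2,\bar\mu} u_{t,\bar\mu}}{\sum_{t=3}^{T} x_{t-2,\bar\mu}^2} + O_p\left(\frac{1}{\sqrt{T}}\right) \overset{d}{\longrightarrow} \frac{\sigma_u}{\sigma_e}\left(-\alpha\, \mathcal{Z}_1 + \sqrt{1 - \alpha^2}\, \mathcal{Z}_2\right).
\]

𝒵₁ and 𝒵₂ are independent standard normal random variables, since $\sum_{t=3}^{T} e_{t-1,\bar\mu} u_{t,\bar\mu}/\sqrt{T}$ is asymptotically independent of $\sum_{t=3}^{T} x_{t-2,\bar\mu} u_{t,\bar\mu}/\sqrt{T}$. ■

Proof of Proposition 1. From Eq. (7), it is straightforward to get

\[
y_t = \mu_y - \frac{c\beta_2}{\alpha} \frac{1}{T^{\gamma}}\, \mu_x + \left(\beta_1 + \beta_2 + \frac{c\beta_2}{\alpha} \frac{1}{T^{\gamma}}\right) x_{t-1} + u_t - \beta_2 e_{t-1} - \frac{c\beta_2}{\alpha} \frac{1}{T^{\gamma}}\, e_{t-1}.
\]

We can decompose 𝑢𝑡 into:

\[
u_t = \rho\, \frac{\sigma_u}{\sigma_e}\, e_t + \sqrt{1 - \rho^2}\, \sigma_u \epsilon_t, \tag{19}
\]

where {𝜖𝑡} is an i.i.d. standard normal series that is independent of {𝑒𝑡}.

(i) Partial sums of {𝑒𝑡} satisfy the functional law $T^{-1/2} \sum_{t=1}^{[T\cdot]} e_t \overset{d}{\longrightarrow} \sigma_e W(\cdot)$, where 𝑊(⋅) is a standard Brownian motion. Denote $J_c(r) = \int_0^r e^{-(r-s)c}\, dW(s)$, for −∞ < 𝑐 < ∞. Using Lemma 1 of Phillips (1987) and Eqs. (17), (19), we get

\[
T\left(\hat\beta_1 + \hat\beta_2 - \beta_1 - \beta_2\right) = T\, \frac{\sum_{t=3}^{T} x_{t-2,\bar\mu} u_{t,\bar\mu}}{\sum_{t=3}^{T} x_{t-2,\bar\mu}^2} + O_p\left(T^{-1/2}\right)
\overset{d}{\longrightarrow} \rho\, \frac{\sigma_u}{\sigma_e}\, \frac{\int_0^1 J_c(r)\, dW(r) - \int_0^1 J_c(r)\, dr \times W(1)}{\int_0^1 [J_c(r)]^2\, dr - \left[\int_0^1 J_c(r)\, dr\right]^2}
+ \left(1 - \rho^2\right)^{1/2} \frac{\sigma_u}{\sigma_e}\, \frac{\mathcal{Z}}{\left[\int_0^1 [J_c(r)]^2\, dr - \left[\int_0^1 J_c(r)\, dr\right]^2\right]^{1/2}},
\]

where 𝒵 is a standard normal variable that is independent of {𝑒𝑡}.

(ii) Using Theorem 3.2 of Phillips and Magdalinos (2007) and Eq. (17), we get

\[
T^{\frac{1+\gamma}{2}}\left(\hat\beta_1 + \hat\beta_2 - \beta_1 - \beta_2\right) = T^{\frac{1+\gamma}{2}}\, \frac{\sum_{t=3}^{T} x_{t-2,\bar\mu} u_{t,\bar\mu}}{\sum_{t=3}^{T} x_{t-2,\bar\mu}^2} + O_p\left(\frac{1}{T^{\gamma/2}}\right) \overset{d}{\longrightarrow} \frac{\sigma_u}{\sigma_e} \sqrt{2c}\; \mathcal{Z},
\]

where 𝒵 ∼ 𝒩(0, 1). ■

Proof of Proposition 2. The proof is straightforward, thus it is omitted. ■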



Appendix B. Figures

See Figs. 1–8.

References

Amihud, Y., Hurvich, C.M., 2004. Predictive regressions, a reduced-bias estimation method. J. Financ. Quant. Anal. 39, 813–841.
Amihud, Y., Hurvich, C.M., Wang, Y., 2009. Multiple-predictor regressions, hypothesis testing. Rev. Financ. Stud. 22, 413–434.
Ang, A., Bekaert, G., 2007. Stock return predictability, is it there? Rev. Financ. Stud. 20, 651–707.
Baker, M.P., Greenwood, R., Wurgler, J., 2003. The maturity of debt issues and predictable variation in bond returns. J. Financ. Econ. 70, 261–291.
Baker, M.P., Stein, J.C., 2004. Market liquidity as a sentiment indicator. J. Financial Mark. 7, 271–299.
Baker, M.P., Wurgler, J., 2004. A catering theory of dividends. J. Finance 59, 1125–1165.
Campbell, J.Y., 1987. Stock returns and the term structure. J. Financ. Econ. 18, 373–399.
Campbell, J.Y., 1991. A variance decomposition for stock returns. Econom. J. 101, 157–179.
Campbell, J.Y., Shiller, R.J., 1988. Stock prices, earnings, and expected dividends. J. Finance 43, 661–676.
Campbell, J.Y., Vuolteenaho, T., 2004. Bad beta, good beta. Amer. Econ. Rev. 94, 1249–1275.
Campbell, J.Y., Yogo, M., 2006. Efficient tests of stock return predictability. J. Financ. Econ. 81, 27–60.
Cavanagh, C.L., 1985. Roots Local to Unity. Manuscript. Harvard University.
Cavanagh, C.L., Elliott, G., Stock, J.H., 1995. Inference in models with nearly integrated regressors. Econometric Theory 11, 1131–1147.
Chan, N.H., 1988. On the parameter inference for nearly nonstationary time series. J. Amer. Statist. Assoc. 83, 857–862.
Chan, N.H., Wei, C.Z., 1987. Asymptotic inference for nearly nonstationary AR(1) processes. Ann. Statist. 15, 1050–1063.
Chen, W., Deo, R., Yi, Y., 2013. Uniform inference in predictive regression models. J. Bus. Econom. Statist. 31, 525–533.
Dow, C.H., 1920. Scientific stock speculation. In: The Magazine of Wall Street. New York.
Elliott, G., Stock, J.H., 1994. Inference in time series regression when the order of integration of a regressor is unknown. Econometric Theory 10, 672–700.
Fama, E.F., 1991. Efficient capital markets, II. J. Finance 46, 1575–1617.
Fama, E.F., French, K., 1988. Dividend yields and expected stock returns. J. Financ. Econ. 22, 3–24.
Fama, E.F., French, K., 1989. Business conditions and expected returns on stocks and bonds. J. Financ. Econ. 25, 23–49.
Ferson, W.E., Schadt, R.W., 1996. Measuring fund strategy and performance in changing economic conditions. J. Finance 51, 425–461.
Guo, H., Savickas, R., 2006. Idiosyncratic volatility, stock market volatility, and expected stock returns. J. Bus. Econom. Statist. 24, 43–56.
Hamilton, J.D., 1994. Time Series Analysis. Princeton University Press.
Hodrick, R.J., 1992. Dividend yields and expected stock returns, alternative procedures for inference and measurement. Rev. Financ. Stud. 5, 357–386.
Jansson, M., Moreira, M., 2006. Optimal inference in regression models with nearly integrated regressors. Econometrica 74, 681–714.
Johansen, S., 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press.
Keim, D.B., Madhavan, A., 2000. The relation between stock market movements and NYSE seat prices. J. Finance 55, 2817–2840.
Keim, D.B., Stambaugh, R.F., 1986. Predicting returns in the stock and bond markets. J. Financ. Econ. 17, 357–390.
Kostakis, A., Magdalinos, T., Stamatogiannis, M.P., 2015. Robust econometric inference for stock return predictability. Rev. Financ. Stud. 28, 1506–1553.
Kothari, S., Shanken, J., 1997. Book-to-market, dividend yield, and expected market returns, a time-series analysis. J. Financ. Econ. 44, 169–203.
Lamont, O., 1998. Earnings and expected returns. J. Finance 53, 1563–1587.
Lee, T.-H., Tu, Y., Ullah, A., 2015. Forecasting equity premium, global historical average versus local historical average and constraints. J. Bus. Econom. Statist. 33, 393–402.
Lettau, M., Ludvigson, S., 2001. Consumption, aggregate wealth, and expected stock returns. J. Finance 56, 815–849.
Mankiw, N.G., Shapiro, M.D., 1986. Do we reject too often? Small sample properties of tests of rational expectations models. Econom. Lett. 20, 139–145.
Nabeya, S., Tanaka, K., 1990. A general approach to the limiting distribution for estimators in time series regression with nonstable autoregressive errors. Econometrica 58, 145–163.
Phillips, P.C.B., 1987. Towards a unified asymptotic theory for autoregression. Biometrika 74, 535–547.
Phillips, P.C.B., 2013. Robust econometric inference with mixed integrated and mildly explosive regressors. J. Econometrics 177, 250–264.
Phillips, P.C.B., 2014. On confidence intervals for autoregressive roots and predictive regression. Econometrica 82, 1177–1195.
Phillips, P.C.B., Magdalinos, T., 2007. Limit theory for moderate deviations from a unit root. J. Econometrics 136, 115–130.
Phillips, P.C.B., Magdalinos, T., 2009. Econometric inference in the vicinity of unity. CoFie Working Paper (7), Singapore Management University.
Pontiff, J., Schall, L.D., 1998. Book-to-market ratios as predictors of market returns. J. Financ. Econ. 49, 141–160.
Stambaugh, R.F., 1999. Predictive regressions. J. Financ. Econ. 54, 375–421.
Welch, I., Goyal, A., 2008. A comprehensive look at the empirical performance of equity premium prediction. Rev. Financ. Stud. 21, 1455–1508.