Testing for predictability in panels of any time series dimension


International Journal of Forecasting 32 (2016) 1162–1177. Journal homepage: www.elsevier.com/locate/ijforecast



Joakim Westerlund a,b,∗, Paresh Narayan b

a Lund University, Sweden
b Centre for Financial Econometrics, Deakin University, Australia

Article info

Keywords: Panel data; Predictive regression; Stock return predictability; China

Abstract

The few panel data tests for the predictability of returns that exist are based on the prerequisite that both the number of time series observations, T, and the number of cross-section units, N, are large. As a result, it is impossible to apply these tests to stock markets, where lengthy time series of data are scarce. In response to this, the current paper develops a new test for predictability in panels where N is large and T ≥ 2 can be either small or large, or indeed anything in between. This consideration represents an advancement relative to the usual large-N and large-T requirement. The new test is also very general, especially when it comes to allowable predictors, and is easy to implement. As an illustration, we consider the Chinese stock market, for which data are available for only 17 years, but where the number of firms is relatively large, 160.

© 2016 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

∗ Correspondence to: Department of Economics, Lund University, Box 7082, 220 07 Lund, Sweden. Tel.: +46 46 222 8997; fax: +46 46 222 4613. E-mail address: [email protected] (J. Westerlund).

http://dx.doi.org/10.1016/j.ijforecast.2016.02.009
0169-2070/© 2016 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

1. Introduction

Predicting returns is one of the traditional branches of research in financial economics. This body of literature can be divided into three strands. The first strand examines whether past returns predict current returns (see Conrad & Kaul, 1988; Lo & MacKinlay, 1988). The second strand focuses on the predictive ability of macroeconomic variables, such as interest rates, exchange rates, money supply, and economic activity (see Chen, 2009; Hamburger & Kochin, 1972; Patelis, 1997; Thorbecke, 1997). The third strand (see Lewellen, 2004; Nelson & Kim, 1993; Pontiff & Schall, 1998) examines whether financial ratios, such as dividend yields and the book-to-market ratio, predict returns. A common feature of these three strands is that they all focus on the US, for which historical time series data are readily available. For emerging markets such as China, however, data availability is a critical issue, and any data that may be available usually cover only short time periods, which is often not sufficient for the fitting of time series predictive regressions.

To remedy this, various tests for predictability in panels have been proposed (see Hjalmarsson, 2008, 2010; Kauppi, 2001; Westerlund & Narayan, 2015a). The problem is that both the time series dimension, T, and the cross-sectional dimension, N, have to be large in order for these tests to work properly. Moreover, T has to be substantially larger than N, a condition that is rarely met in practice.

This paper can be seen as a response to this problem. The aim is to devise a test for predictability that is applicable regardless of whether T is large or small, and does not require T to be much larger than N. The test should also be easy to implement, and general enough to accommodate typical data features such as heteroskedasticity, endogeneity, and predictor serial and cross-sectional correlation.

The proposed test has several distinctive characteristics. First, the test is valid not only when T ≥ 2 and N → ∞, but also when N, T → ∞ jointly, regardless of the rate of expansion of the two indices. This represents a substantial improvement over the usual asymptotic framework, in which N, T → ∞ with N/T → 0. In practice, this means that the new test is applicable regardless of the value of T, as long as N is large enough for accurate inference. Second, while existing tests make very specific assumptions regarding the properties of the predictor, the new test leaves the predictor almost completely unrestricted. However, while the test is generally sized correctly, in order for the test to have good power properties, we do require there to be at least some cross-section units for which the predictor exhibits unit root-like behaviour. This is a mild restriction, given that most predictors have been found to be very persistent (see for example Campbell & Yogo, 2006; Elliott & Stock, 1994; Lanne, 2002, for discussions and some confirmatory empirical results). Third, the test is extremely simple to implement, requiring only minimal corrections to account for the generality of the data generating process (DGP). Fourth, the test has excellent small-sample properties. In fact, the test is reasonably sized even in panels as small as N = 20 and T = 2. This is a great advantage, especially when working with annual data, as is very common in the return predictability literature; see for example Campbell and Shiller (1988), Cochrane (2008), Fama and French (1988) and Goyal and Welch (2003), who only use annual data. Almost all other influential studies, such as those of Campbell and Thompson (2008), Campbell and Yogo (2006), Rapach, Strauss, and Zhou (2010) and Welch and Goyal (2008), have used annual data along with monthly or quarterly data. Interestingly, whenever predictability is found, the evidence is typically strongest at the annual frequency (see Campbell & Yogo, 2006; Cochrane, 2008).
Moreover, the time span does not need to be very large, which, with annual data, amounts to a relatively small T (see for example Ferreira & Santa-Clara, 2011). In the empirical part of the paper, the new test is used to test for predictability in the Chinese stock market. There is generally a lack of consistent time series data on financial ratios and macroeconomic indicators for developing and emerging markets; thus, T is too small for reliable inference using time series methods. In the case of China, there is annual data on the dividend yield, the earnings-price ratio, and the book-to-market ratio, but only for the period 1994–2010. Macroeconomic data for the same period are also available for a large number of variables. Therefore, our goal is to test whether Chinese stock returns can be predicted using financial ratios and/or macroeconomic indicators, making for a quite exhaustive analysis of the issue. The results are strong, in that the evidence of predictability is extremely weak. We organize the balance of the paper as follows. In the next section, we present the predictive model of interest, which is then used in Section 3 as a basis for developing the new test. The small-sample performance of the test is investigated in Section 4, and Section 5 discusses and reports the results from an application to the Chinese stock market. Section 6 concludes. Proofs of important results are provided in Appendix A, while Appendix B contains an analysis of the power of the test.


2. The model

Consider the panel data variables $Y_{i,t}$ and $X_{i,t}$, which are observable for $t = 1, \ldots, T$ time series and $i = 1, \ldots, N$ cross-section units. The DGP of $Y_{i,t}$ is given by

$$Y_{i,t} = \gamma_i + \theta_t + \beta X_{i,t-1} + \epsilon_{i,t}. \tag{1}$$
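To make the timing in Eq. (1) concrete, the following is a minimal simulation sketch. The function name `simulate_panel`, the AR(1) law for the predictor, the Gaussian shocks and all parameter values are illustrative assumptions, not part of the model above, which leaves $X_{i,t}$ essentially unrestricted.

```python
import numpy as np

def simulate_panel(N, T, beta, rho=0.9, sigma_ue=-0.5, seed=0):
    """Draw Y (N x T) and the lagged predictor X_{t-1} (N x T) from Eq. (1).

    Assumptions made only for illustration: X is AR(1) with root rho,
    and (u, eps) are jointly normal with covariance sigma_ue.
    """
    rng = np.random.default_rng(seed)
    gamma = rng.normal(size=N)                 # unit-specific intercepts gamma_i
    theta = rng.normal(size=T)                 # time-specific effects theta_t
    cov = np.array([[1.0, sigma_ue], [sigma_ue, 1.0]])
    shocks = rng.multivariate_normal(np.zeros(2), cov, size=(N, T))
    u, eps = shocks[..., 0], shocks[..., 1]    # predictor and return shocks

    X = np.zeros((N, T + 1))                   # columns hold X_0, X_1, ..., X_T
    for t in range(1, T + 1):
        X[:, t] = rho * X[:, t - 1] + u[:, t - 1]

    X_lag = X[:, :-1]                          # X_{i,t-1} for t = 1, ..., T
    Y = gamma[:, None] + theta[None, :] + beta * X_lag + eps
    return Y, X_lag

Y0, X_lag = simulate_panel(N=50, T=6, beta=0.0)   # null: no predictability
Y1, _ = simulate_panel(N=50, T=6, beta=1.0)       # same draws, beta = 1
```

Because both calls share a seed, `Y1 - Y0` equals `X_lag`, which isolates the contribution of the lagged predictor.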

This DGP is a panel extension of the prototypical predictive regression model that has been used widely in the time series literature, where $X_{i,t}$ is a variable that is believed to be able to predict $Y_{i,t}$. As in previous studies, it is reasonable to assume that $\epsilon_{i,t}$ is correlated with $X_{i,t}$. For example, if $X_{i,t}$ is the dividend-price ratio, then an increase in the stock price will lower the dividend-price ratio and raise returns. Moreover, as is usual when working with macroeconomic and financial data, $X_{i,t}$ is likely to be endogenous, heteroskedastic, and both serially and cross-sectionally correlated. In the current paper, we are very general in this regard. In fact, most of our results only require Assumption 1, which restricts $\epsilon_{i,t}$ but not $X_{i,t}$. This means that $X_{i,t}$ is generally treated as a "black box".

Assumption 1. $\epsilon_{i,t}$ is independent across $i$, with $E(\epsilon_{i,t} \mid F_{i,t-1}) = 0$, $E(\epsilon_{i,t}^2 \mid F_{i,t-1}) = \sigma_{\epsilon,t}^2 > 0$, $E(\sigma_{\epsilon,t}^2) = \sigma_\epsilon^2 > 0$ and $E(\epsilon_{i,t}^4) = \kappa_\epsilon < \infty$, where $F_{i,t}$ is the sigma-field generated by $\{\epsilon_{i,s}\}_{s=1}^{t}$.

Assumption 1 deserves some discussion. The most notable feature of Assumption 1 is that there are no conditions on $X_{i,t}$. This means that the endogeneity, heteroskedasticity, and serial and cross-sectional correlation properties of $X_{i,t}$ are not restricted in any way. As far as we are aware, this is the first study to consider such a general DGP. In fact, the only other study that comes close is that by Kostakis, Magdalinos, and Stamatogiannis (2015), who propose a test for predictability in time series with unknown persistence, provided that the order of integration is at most one. The current approach is not limited to unit root predictors, but, in principle, can accommodate processes of an arbitrary integration order, or indeed any nonlinear transformation of such processes.

As has been mentioned, Assumption 1 is enough for most of our results, but additional restrictions may be required depending on how the test is implemented. Specifically, as we will explain in detail in Section 3.2, unless the null hypothesis is imposed in the estimation, the need for consistent variance estimation requires some structure to be imposed on $X_{i,t}$. However, the assumptions are not very restrictive.

Assumption 1 allows the conditional variance of $\epsilon_{i,t}$ to be time-varying, which is consistent with the fact that returns are often found to exhibit generalized autoregressive conditional heteroskedasticity (GARCH) effects. However, the unconditional variance is assumed to be constant. While the assumed constancy of $\sigma_\epsilon^2$ in $t$ is a restriction, the constancy in $i$ is not, provided that $N^{-1}\sum_{i=1}^{N} E(\epsilon_{i,t}^2)$ has a limit such as $\sigma_\epsilon^2$. Thus, cross-sectional homoskedasticity is not a restriction.

The condition that $E(\epsilon_{i,t} \mid F_{i,t-1}) = 0$ implies that $\epsilon_{i,t}$ is serially uncorrelated, which is a standard requirement in the literature (see for example Breitung & Demetrescu, 2015; Campbell & Yogo, 2006; Hjalmarsson, 2010; Kauppi, 2001; Kostakis et al., 2015; Lanne, 2002; Lewellen, 2004; Stambaugh, 1999; Westerlund & Narayan, 2015a,b). Interestingly, the way in which the new test statistic is constructed, by first taking differences and then summing over $T$, means that the asymptotic null distribution depends only on $\epsilon_{i,1}$ and $\epsilon_{i,T}$, which become asymptotically uncorrelated as $T \to \infty$. Hence, serial uncorrelatedness is not a restriction if $T$ is large enough, provided of course that the serial correlation is of the weak type. The main reason for retaining the serial uncorrelatedness assumption is that we would like to entertain the possibility that $T$ may be fixed.

While serial uncorrelatedness can thus be relaxed when $T$ is large, independence across $i$ is crucial. Thus, conditioning on $X_{i,t}$, any cross-sectional dependence in $Y_{i,t}$ must be captured by $\theta_t$, a time-specific fixed effect. Though simple, this parametrization is relevant economically, as the dependence induced by the variation in the return on the market is the same for all stocks under the capital asset pricing model (CAPM), provided that they are homogeneous in firm characteristics, such as stocks that belong to a particular sector.¹ Should $T$ be large, the time-specific effects specification considered in Eq. (1) can be treated as a special case of a more general common factor component, which can be estimated by applying the principal components method to $Y_{i,t}$ (see for example Westerlund & Larsson, 2012; Westerlund, 2015, for similar approaches in the unit root testing context).

As with many of the other assumptions, the requirement that $\beta$ be the same for all $i$ is also not restrictive. This is particularly obvious under the null hypothesis of no predictability, as $X_{i,t-1}$ is then not present in Eq. (1) anyway. The power implications of the homogeneous slope assumption are discussed in Appendix B.

3. The test procedure

In Section 3.1, we study the asymptotic properties of the proposed test under the null hypothesis of no predictability, which can be formulated as $H_0: \beta = 0$. The estimation of the various variances that appear in Section 3.1 is then discussed in Section 3.2. Following Lanne (2002), for ease of exposition, the local power analysis is relegated to Appendix B.

3.1. The test and its asymptotic null distribution

The test statistic that we consider is surprisingly simple, and is based on the sample cross-sectional variance of $Y_{i,t}$,

$$v_t = \frac{1}{N}\sum_{i=1}^{N}(Y_{i,t} - \bar{Y}_t)^2, \tag{2}$$

where $\bar{Y}_t = N^{-1}\sum_{i=1}^{N} Y_{i,t}$. The cross-sectional variance has a long history in the analysis of returns. For example, Guo and Savickas (2010) find that stocks with high variances have lower CAPM-adjusted expected returns, suggesting that $v_t$ is informative regarding the return predictability. In this paper, we exploit this possibility more formally.

The idea is as follows. While subtracting $\bar{Y}_t$ from $Y_{i,t}$ takes care of $\theta_t$, it does not remove $\gamma_i$. Thus, the test statistic that we consider is based on $\Delta v_t$, not $v_t$. Under $H_0$, $\Delta v_t$ only depends on $\epsilon_{i,t}$, and so the asymptotic null distribution of $\sqrt{N}\Delta v_t$ can be obtained under Assumption 1 only. On the other hand, if $H_0$ is false, then there is also a dependence on $X_{i,t}$, and therefore the asymptotic distribution of $\sqrt{N}\Delta v_t$ will be different from that which applies under $H_0$. In this sense, $v_t$ is informative regarding the predictability.

The discussion in the previous paragraph suggests that the larger the difference between the properties of $X_{i,t}$ and $\epsilon_{i,t}$, the easier it will be for the test to detect deviations from the null. Thus, since the DGP of $X_{i,t}$ is virtually unrestricted, there are many directions in which power can be evaluated. In this paper, we focus on a particularly well-known feature of predictive regressions, namely that $X_{i,t}$ can be quite persistent. In the dividend-price ratio example, the persistency of prices means that $X_{i,t}$ is expected to be highly persistent. Hence, in this context, the test of predictability may be viewed loosely as a test of persistence in $Y_{i,t}$ (see Lanne, 2002, for a similar idea in the time series context). Note in particular that if $X_{i,t}$ contains a (near) unit root and $\beta \neq 0$, $Y_{i,t}$ contains a linear trend, which means that the trend in $v_t$ is quadratic. Thus, deviations from the null are going to be very easy to detect in this case.

Of course, there are many ways in which $H_0$ can be tested. By far the most common approach is to use a conventional significance test, such as a Wald or t-test (see for example Breitung & Demetrescu, 2015; Campbell & Yogo, 2006; Hjalmarsson, 2010; Kauppi, 2001; Kostakis et al., 2015; Lewellen, 2004; Stambaugh, 1999; Westerlund & Narayan, 2015a,b). However, one problem with such tests is that they are based on regressions that involve $X_{i,t}$, which is known to lead to nonstandard asymptotic distributions if it is endogenous and/or highly persistent. The cross-sectional variance does not involve $X_{i,t}$, and is therefore expected to be governed by relatively simple asymptotics. Theorem 1 confirms this.

Theorem 1. Suppose that $H_0$ is true. Then, under Assumption 1, as $N \to \infty$ with $2 \le T < \infty$, or as $N, T \to \infty$ jointly, for each $2 \le t \le T$,

$$\sqrt{N}\Delta v_t \to_d \sigma_v Z_t,$$

where $\to_d$ signifies convergence in distribution, $Z_t \sim N(0, 1)$ and

$$\sigma_v^2 = 8\sigma_\epsilon^2\sigma_\gamma^2 + 2(\kappa_\epsilon - \sigma_\epsilon^4), \qquad \sigma_\gamma^2 = \lim_{N \to \infty}\frac{1}{N}\sum_{i=1}^{N}(\gamma_i - \bar{\gamma})^2.$$

¹ In our application, we use subsamples to account for the fact that the characteristics of the market beta may vary, which is tantamount to allowing for multiple time-specific fixed effects.
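The variance formula in Theorem 1 lends itself to a quick numerical check. The sketch below simulates Eq. (1) under the null with i.i.d. standard normal errors (so that $\sigma_\epsilon^2 = 1$ and $\kappa_\epsilon = 3$) and compares the Monte Carlo variance of $\sqrt{N}\Delta v_t$ with $\sigma_v^2$. The helper names, the sample sizes and the choice of Gaussian draws are assumptions made for illustration only.

```python
import numpy as np

def cross_sectional_variance(y):
    """v_t of Eq. (2) for a panel y of shape (N, T); returns a length-T array."""
    return ((y - y.mean(axis=0)) ** 2).mean(axis=0)

def sigma_v2(sigma_eps2, kappa_eps, sigma_gamma2):
    """Asymptotic variance of sqrt(N) * Delta v_t given in Theorem 1."""
    return 8.0 * sigma_eps2 * sigma_gamma2 + 2.0 * (kappa_eps - sigma_eps2 ** 2)

rng = np.random.default_rng(0)
N, T, reps = 400, 2, 4000
gamma = rng.normal(1.0, np.sqrt(3.0), size=N)    # unit effects, held fixed across replications

draws = np.empty(reps)
for r in range(reps):
    theta = rng.normal(size=T)                   # time effects (removed by demeaning)
    eps = rng.normal(size=(N, T))                # N(0, 1) errors
    y = gamma[:, None] + theta[None, :] + eps    # Eq. (1) with beta = 0
    v = cross_sectional_variance(y)
    draws[r] = np.sqrt(N) * (v[1] - v[0])        # sqrt(N) * Delta v_2

# Use the realized cross-sectional variance of gamma in place of its limit.
theory = sigma_v2(1.0, 3.0, gamma.var())
print(f"simulated variance: {draws.var():.2f}, Theorem 1: {theory:.2f}")
```

The two numbers should be close but not identical, since both the number of replications and $N$ are finite.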

The first thing to note about Theorem 1 is that the asymptotic null distribution of $\sqrt{N}\Delta v_t$ is standard and does not depend on $X_{i,t}$. This is in contrast to the previous literature, where the asymptotic results depend critically on what is assumed regarding $X_{i,t}$; if $X_{i,t}$ is stationary, then normality is usually possible (see Lewellen, 2004), whereas if $X_{i,t}$ is (near) unit root non-stationary, then the results involve functions of Brownian motion (see Cavanagh, Elliott, & Stock, 1995; Elliott & Stock, 1994). The fact that the asymptotic null distribution does not depend on the correlation between $\epsilon_{i,t}$ and $X_{i,t}$ is also new. Specifically, the novelty is that the invariance with respect to the correlation does not require any correction, which is of course a great operational advantage.

Unlike existing results, Theorem 1 does not require $T \to \infty$. In fact, as long as $T \ge 2$, such that $\Delta v_t$ can be computed at least once, $T$ does not even have to be large. This is a major advantage, because $T$ is always finite in practice. The asymptotic approximation offered by Theorem 1 is therefore expected to be relatively good. However, the theorem is not constrained to fixed-$T$ panels, but also applies when $N, T \to \infty$ jointly, without any restriction on the relative expansion rate of $N$ and $T$.



Remark 1. The variance of $\sqrt{N}\Delta v_t$ increases with the heterogeneity of $\gamma_i$. The minimal variance is obtained when $\sigma_\gamma^2 = 0$, which will be the case if $\gamma_1 = \cdots = \gamma_N$. This is as expected, because $\Delta v_t$ does not involve any estimation of unit-specific intercepts, and is therefore unlikely to be invariant in this regard.

In Section 3.2, we describe how to estimate $\sigma_v^2$. For now, let us assume the existence of a consistent estimator $\hat{\sigma}_v^2$ of $\sigma_v^2$. In view of Theorem 1, a natural proposal for a test statistic of $H_0$ is the following normalized variance ratio:

$$V_t = \frac{\sqrt{N}\Delta v_t}{\hat{\sigma}_v}. \tag{3}$$

Because $\hat{\sigma}_v^2$ is consistent, according to Theorem 1,

$$V_t \to_d Z_t. \tag{4}$$

However, the conditions under which this convergence takes place need not be the same as in Theorem 1, as it also depends on the conditions under which $\hat{\sigma}_v^2$ is consistent. This is discussed in Section 3.2.

Point-wise tests such as $V_t$ are useful when $T$ is very small. However, as $T$ increases, so does the number of tests that can possibly be computed. A simple approach would be to compute one test for each time period, and to reject the null of no predictability if at least one of the tests rejects at significance level $\alpha$. However, this means ignoring the multiplicity of the testing problem, which is likely to result in too many rejections. Therefore, we propose to circumvent this problem by combining the point-wise test statistics. Specifically, we propose the following combination statistic based on summing across the individual point-wise test statistics:

$$V = \sum_{t=2}^{T} V_t. \tag{5}$$
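One way to see why summing is convenient is to note that the sum of first differences telescopes. The short derivation below is a sketch that uses only Eqs. (3) and (5), and assumes that a single $\hat{\sigma}_v$ is used for all $t$, as the notation in Eq. (3) suggests.

```latex
V = \sum_{t=2}^{T} V_t
  = \frac{\sqrt{N}}{\hat{\sigma}_v} \sum_{t=2}^{T} (v_t - v_{t-1})
  = \frac{\sqrt{N}\,(v_T - v_1)}{\hat{\sigma}_v}.
```

Under this reading, $V$ compares only the last and the first cross-sectional variances, which is in line with the earlier observation that the asymptotic null distribution depends only on $\epsilon_{i,1}$ and $\epsilon_{i,T}$.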

The asymptotic distribution of $V$ can be inferred from the following corollary to Theorem 1.

Corollary 1. Under the conditions of Theorem 1, as $N \to \infty$ with $2 \le T < \infty$, or as $N, T \to \infty$ jointly,

$$\sum_{t=2}^{T}\sqrt{N}\Delta v_t \to_d \sigma_v Z,$$

where $Z \sim N(0, 1)$.

Corollary 1 shows that the asymptotic distribution of the sum is identical to that of the point-wise statistics, which of course is very convenient. It also shows that we only need $N \to \infty$ in order for this result to hold, although large-$T$ panels can also be accommodated. Corollary 1 plays the same role for $V$ as Theorem 1 does for $V_t$. Therefore, in an analogy to the above discussion for $V_t$, under conditions to be given in Section 3.2,

$$V \to_d Z. \tag{6}$$
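The pieces above can be put together in a few lines. The following sketch computes $V_t$, $V$ and a right-tail normal p-value, using the moment estimators of Section 3.2 that impose $H_0$. Several details are assumptions made for illustration: the function names, normalizing by the number of differences actually available ($T - 1$ per unit), truncating the estimate of $\sigma_\gamma^2$ at zero, and the use of the right tail (in line with the right-tail critical value used for the point-wise tests).

```python
import math
import numpy as np

def restricted_moments(y):
    """Moment estimators computed under H0 (beta = 0); y has shape (N, T)."""
    N, T = y.shape
    dev = y - y.mean(axis=0)                   # Y_it - Ybar_t
    ddev = np.diff(dev, axis=1)                # Delta Y_it - Delta Ybar_t
    n_diff = N * (T - 1)                       # number of available differences (assumption)
    sig_e2 = (ddev ** 2).sum() / (2 * n_diff)
    kap_e = (ddev ** 4).sum() / (2 * n_diff) - 3 * sig_e2 ** 2
    sig_g2 = max((dev ** 2).mean() - sig_e2, 0.0)   # truncated at zero (assumption)
    sig_v2 = 8 * sig_e2 * sig_g2 + 2 * (kap_e - sig_e2 ** 2)
    return sig_e2, kap_e, sig_g2, sig_v2

def predictability_test(y):
    """Return the point-wise V_t, their sum V, and a right-tail N(0,1) p-value."""
    N, _ = y.shape
    sig_v2 = restricted_moments(y)[3]
    if sig_v2 <= 0:
        raise ValueError("estimated sigma_v^2 is not positive; panel too small?")
    v = ((y - y.mean(axis=0)) ** 2).mean(axis=0)     # v_t of Eq. (2)
    V_t = math.sqrt(N) * np.diff(v) / math.sqrt(sig_v2)
    V = float(V_t.sum())
    p_right = 0.5 * math.erfc(V / math.sqrt(2))      # 1 - Phi(V)
    return V_t, V, p_right

# Example on data simulated under the null (illustrative values).
rng = np.random.default_rng(1)
N, T = 2000, 5
gamma = rng.normal(1.0, math.sqrt(3.0), size=(N, 1))
y = gamma + rng.normal(size=(1, T)) + rng.normal(size=(N, T))
V_t, V, p = predictability_test(y)
print(f"V = {V:.3f}, right-tail p-value = {p:.3f}")
```

In very small panels the estimate of $\sigma_v^2$ can turn out non-positive, which is why the sketch raises an error instead of returning a statistic.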

Our choice of combination statistic is motivated mainly by the fact that the use of the sum leads to a very simple and convenient asymptotic distribution. One could also consider order statistics (or indeed any other summary measure). However, the asymptotic theory of such statistics is notoriously difficult. Another possibility that is not only simple, but also quite informative, is to look at the empirical rejection frequency of the point-wise tests, as defined by

$$\hat{\alpha} = \frac{1}{T-1}\sum_{t=2}^{T} 1(V_t > c_\alpha), \tag{7}$$

where $1(A)$ is the indicator function for event $A$ and $c_\alpha$ is the appropriate right-tail $\alpha$-level critical value from $N(0, 1)$. If $H_0$ holds, we would expect $\hat{\alpha}$ to be close to $\alpha$, whereas if $H_0$ is false, then $\hat{\alpha}$ should be larger than $\alpha$. Specifically, provided that $H_0$ is true and that the conditions required for $\hat{\sigma}_v^2$ to be consistent are met,

$$\hat{\alpha} \to_p \alpha \tag{8}$$

as $N, T \to \infty$ jointly, where $\to_p$ signifies convergence in probability.

3.2. Variance estimation

As was mentioned in Section 3.1, while Theorem 1 holds under quite general conditions, the requirement that $\hat{\sigma}_v^2$ should be consistent may imply restrictions. One exception is when $\epsilon_{i,t}$ is known to be $N(0, 1)$ and $\gamma_1 = \cdots = \gamma_N$, such that $\sigma_\gamma^2 = 0$. In this case, $\sigma_v^2 = 4$ is known, and therefore $V_t$ and $V$ are asymptotically $N(0, 1)$ under the conditions of Theorem 1. In general, however, $\hat{\sigma}_v^2$ is only equal to $\sigma_v^2$ up to some remainder term that will be negligible under certain conditions. These conditions depend on whether or not $H_0$ is imposed in the estimation.

Suppose first that the estimation is done under $H_0$ (see for example Lanne, 2002). In this case,

$$Y_{i,t} - \bar{Y}_t = (\gamma_i - \bar{\gamma}) + (\epsilon_{i,t} - \bar{\epsilon}_t), \tag{9}$$

and therefore $\Delta Y_{i,t} - \Delta\bar{Y}_t = \Delta\epsilon_{i,t} - \Delta\bar{\epsilon}_t$. In the proof of Theorem 1, we show that $E[(\Delta\epsilon_{i,t} - \Delta\bar{\epsilon}_t)^2] = 2\sigma_\epsilon^2$. Therefore,

$$\hat{\sigma}_\epsilon^2 = \frac{1}{2NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(\Delta Y_{i,t} - \Delta\bar{Y}_t)^2 = \sigma_\epsilon^2 + O_p((NT)^{-1/2}) \to_p \sigma_\epsilon^2, \tag{10}$$

as $NT \to \infty$. Note in particular that the requirement that $NT \to \infty$ is satisfied regardless of whether $N \to \infty$ or $T \to \infty$, or both. This means that the estimation of $\sigma_\epsilon^2$ does not imply any restrictions beyond those given in Theorem 1. For $\kappa_\epsilon$, we use the fact that $E[(\Delta\epsilon_{i,t} - \Delta\bar{\epsilon}_t)^4] \to 2\kappa_\epsilon + 6\sigma_\epsilon^4$ as $N \to \infty$, leading to the following implicit estimator:

$$\hat{\kappa}_\epsilon = \frac{1}{2NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(\Delta Y_{i,t} - \Delta\bar{Y}_t)^4 - 3\hat{\sigma}_\epsilon^4. \tag{11}$$

For this estimator to be consistent, we require $N \to \infty$, which, in view of Theorem 1, is not a restriction. Similarly, in view of Eq. (9), it is not difficult to see that

$$\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(Y_{i,t} - \bar{Y}_t)^2 = \frac{1}{N}\sum_{i=1}^{N}(\gamma_i - \bar{\gamma})^2 + \sigma_\epsilon^2 + O_p((NT)^{-1/2}), \tag{12}$$

and therefore

$$\hat{\sigma}_\gamma^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(Y_{i,t} - \bar{Y}_t)^2 - \hat{\sigma}_\epsilon^2 = \frac{1}{N}\sum_{i=1}^{N}(\gamma_i - \bar{\gamma})^2 + O_p((NT)^{-1/2}) \to_p \sigma_\gamma^2, \tag{13}$$

which again requires $N \to \infty$, but not necessarily $T \to \infty$. Hence, if the estimation is carried out under the null, $V_t$ and $V$ are asymptotically $N(0, 1)$, provided only that the conditions of Theorem 1 are satisfied.

The above estimators all make use of Eq. (9), which only holds under $H_0$. This means that consistency is assured only under the null. This is the "cost" of the simplicity of the restricted model; in return, no additional restrictions are needed. If the estimation is not done under $H_0$, then $\hat{\sigma}_\epsilon^2$, $\hat{\kappa}_\epsilon$ and $\hat{\sigma}_\gamma^2$ must be adapted to account for the fact that $\beta$ need not be zero. This amounts to replacing $Y_{i,t} - \bar{Y}_t$ with $Y_{i,t} - \bar{Y}_t - \hat{\beta}(X_{i,t-1} - \bar{X}_{t-1})$ in the formulas given, where $\hat{\beta}$ is the pooled least squares estimator of $\beta$ in Eq. (1). The main difference here relative to the case where the estimation was carried out under $H_0$ is that $\hat{\beta}$ has to be consistent, which requires us to impose more structure on $X_{i,t}$. Assumption 2 is enough.

Assumption 2. $D_{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} X_{i,t-1}^{*}\epsilon_{i,t}^{*} = O_p(1)$ and $D_{NT}^2\sum_{i=1}^{N}\sum_{t=2}^{T}(X_{i,t-1}^{*})^2 \to_d \sigma_x^2$ as $N, T \to \infty$ jointly, where $\sigma_x^2 > 0$ almost surely, $D_{NT}$ is a normalizing constant that depends only on $N$ and $T$, and $\epsilon_{i,t}^{*} = \epsilon_{i,t} - \bar{\epsilon}_t - \bar{\epsilon}_i + \bar{\epsilon}$, with an analogous definition of $X_{i,t-1}^{*}$.

While Assumption 2 is obviously more demanding than Assumption 1, it is not very restrictive, and imposes only minimal regularity conditions. Interestingly, while the regression that we are estimating here does involve $X_{i,t-1}$, the required assumptions are still less restrictive than if we had used a conventional significance test. The reason for this is that the only purpose of the regression here is the consistent estimation of $\sigma_v^2$, which is less demanding than when the regression is to be used for inference as well. Specifically, while we do require that $D_{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} X_{i,t-1}^{*}\epsilon_{i,t}^{*}$ converges in distribution, the distribution does not have to be known. The condition placed on $D_{NT}^2\sum_{i=1}^{N}\sum_{t=2}^{T}(X_{i,t-1}^{*})^2$ is enough to ensure that it is asymptotically invertible. Suppose for example that $X_{i,t} = \eta_i + U_{i,t}$, where $U_{i,t} = \rho U_{i,t-1} + u_{i,t}$, $|\rho| < 1$, $E(u_{i,t}) = 0$ and $E(u_{i,t}^2) = \sigma_u^2$ (see for example Kostakis et al., 2015; Lewellen, 2004; Stambaugh, 1999). In this case, $D_{NT} = (NT)^{-1/2}$ and $D_{NT}^2\sum_{i=1}^{N}\sum_{t=2}^{T}(X_{i,t-1}^{*})^2 \to_p \sigma_u^2/(1 - \rho^2) > 0$ as $N, T \to \infty$. Also, $D_{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} X_{i,t-1}^{*}\epsilon_{i,t}^{*}$ converges to a normal variate even if $\epsilon_{i,t}$ is conditionally heteroskedastic, which is usually ruled out on the grounds that the variance of the resulting estimator of $\beta$ becomes inestimable (see for example Kostakis et al., 2015).

As the discussion in the previous paragraph suggests, $V_t$ and $V$ are asymptotically $N(0, 1)$ even if $\beta$ is estimated. However, the assumptions of Theorem 1 are not enough to ensure this; we also require Assumption 2 to be satisfied.

Denote the least squares estimators of $\epsilon_{i,t}$ and $\gamma_i$ in Eq. (1) by $\hat{\epsilon}_{i,t}$ and $\hat{\gamma}_i$, respectively. An alternative to the indirect estimation approach described above that may be more intuitive is to simply replace $\sigma_\epsilon^2$, $\kappa_\epsilon$ and $\sigma_\gamma^2$ with the corresponding sample moments of $\hat{\epsilon}_{i,t}$ and $\hat{\gamma}_i$. This is the approach used in Sections 4 and 5.

4. Monte Carlo simulation

A small-scale Monte Carlo simulation study was conducted to assess the performance of $V$ in small samples. The DGP used for this purpose is a restricted version of that considered in the local power analysis of Appendix B. It is given by Eq. (1), where $\theta_t \sim N(1, 1)$, $\gamma_i \sim N(1, \sigma_\gamma^2)$, $X_{i,t} = 1 + U_{i,t}$, $U_{i,t} = \rho_i U_{i,t-1} + u_{i,t}$, and $U_{i,0} = 0$. Also, $w_{i,t} = (u_{i,t}, \epsilon_{i,t})' \sim N(0, \Sigma_w)$ with

$$\Sigma_w = \begin{pmatrix} 1 & \sigma_{u\epsilon} \\ \sigma_{u\epsilon} & 1 \end{pmatrix},$$

where $\sigma_{u\epsilon}$ is used to control the extent of the endogeneity. The slope is set equal to $\beta = cN^{-1/4}$, where $c = 0$ if $H_0$ is true and $c \neq 0$ otherwise. As for the persistency of $X_{i,t}$, $\rho_i = 1$ for $i = 1, \ldots, n$ and $\rho_i = \rho < 1$ for $i = n + 1, \ldots, N$. Let us denote the fraction of unit root predictors by $\delta = n/N$. We show in Appendix B that the power of $V$ increases with the persistency of $X_{i,t}$, as measured by $\delta$ and $\rho$. Here, we focus on the cases where $\delta = 1$ and $\delta = 0.1$, with $\rho = 0$; the latter might be considered as a "worst case scenario", with the majority of units having no persistence at all.

The size and power results based on 5000 replications are reported in Table 1. As expected, we see that $V$ generally has the correct size even when $T = 2$, which is uncommon even for tests that are supposed to work well when $T$ is finite (see


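Table 1. Simulated size and power of V at the 5% level. Columns are indexed by c and T; c = 0 gives size, c = 1 and c = 2 give power.

| N | c=0, T=2 | c=0, T=4 | c=0, T=8 | c=0, T=20 | c=1, T=2 | c=1, T=4 | c=1, T=8 | c=1, T=20 | c=2, T=2 | c=2, T=4 | c=2, T=8 | c=2, T=20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| δ = 1, σuϵ = σγ² = 0 | | | | | | | | | | | | |
| 20 | 3.8 | 1.3 | 2.4 | 3.5 | 44.9 | 40.1 | 55.1 | 71.6 | 51.6 | 54.9 | 65.3 | 78.6 |
| 50 | 3.5 | 1.8 | 3.3 | 4.3 | 41.5 | 42.3 | 58.5 | 81.3 | 51.0 | 59.2 | 73.1 | 90.1 |
| 100 | 3.8 | 2.6 | 3.6 | 5.0 | 40.9 | 44.4 | 61.0 | 87.8 | 52.3 | 62.8 | 79.8 | 96.2 |
| 200 | 4.2 | 2.8 | 4.5 | 4.8 | 38.0 | 45.6 | 66.7 | 93.7 | 51.8 | 67.5 | 87.5 | 99.3 |
| δ = 1, σuϵ = 0, σγ² = 3 | | | | | | | | | | | | |
| 20 | 1.0 | 4.8 | 4.2 | 5.8 | 24.4 | 35.1 | 44.6 | 64.0 | 41.1 | 52.2 | 62.4 | 76.6 |
| 50 | 0.2 | 5.4 | 5.3 | 5.3 | 16.9 | 33.8 | 44.2 | 70.3 | 37.0 | 54.0 | 68.7 | 87.6 |
| 100 | 0.4 | 6.4 | 5.9 | 5.0 | 13.8 | 31.0 | 45.1 | 75.3 | 34.6 | 55.4 | 73.3 | 94.3 |
| 200 | 0.5 | 7.2 | 6.2 | 5.5 | 8.6 | 31.2 | 45.7 | 82.3 | 28.7 | 58.5 | 81.6 | 98.6 |
| δ = 1, σuϵ = −0.5, σγ² = 0 | | | | | | | | | | | | |
| 20 | 3.5 | 0.5 | 0.8 | 1.6 | 44.5 | 25.3 | 43.5 | 67.5 | 50.9 | 45.7 | 60.8 | 76.9 |
| 50 | 3.3 | 0.1 | 0.1 | 1.1 | 42.4 | 18.8 | 41.0 | 74.7 | 50.3 | 45.2 | 66.6 | 88.1 |
| 100 | 3.5 | 0.0 | 0.0 | 1.4 | 40.4 | 14.0 | 40.3 | 82.2 | 51.7 | 43.3 | 70.5 | 94.7 |
| 200 | 3.9 | 0.0 | 0.0 | 1.2 | 36.7 | 9.9 | 37.7 | 87.5 | 49.5 | 41.1 | 78.0 | 98.7 |
| δ = 0.1, σuϵ = σγ² = 0 | | | | | | | | | | | | |
| 20 | 3.3 | 3.5 | 4.3 | 4.1 | 25.2 | 24.8 | 29.0 | 37.0 | 37.9 | 39.0 | 45.0 | 51.8 |
| 50 | 3.3 | 4.1 | 4.8 | 4.5 | 21.7 | 21.8 | 26.8 | 36.5 | 35.4 | 38.8 | 46.7 | 55.4 |
| 100 | 3.8 | 4.8 | 4.8 | 5.0 | 18.1 | 20.0 | 23.7 | 36.8 | 33.4 | 37.9 | 44.6 | 62.0 |
| 200 | 4.1 | 5.1 | 5.2 | 4.9 | 15.6 | 19.2 | 22.7 | 35.4 | 30.6 | 38.4 | 47.5 | 67.1 |
| δ = 0.1, σuϵ = 0, σγ² = 3 | | | | | | | | | | | | |
| 20 | 0.9 | 8.0 | 5.1 | 5.9 | 6.2 | 17.5 | 17.0 | 22.5 | 16.9 | 31.6 | 34.8 | 42.5 |
| 50 | 0.2 | 7.4 | 6.1 | 5.3 | 2.5 | 13.4 | 14.0 | 19.4 | 12.4 | 27.3 | 32.4 | 44.1 |
| 100 | 0.3 | 8.0 | 6.3 | 5.0 | 1.7 | 13.3 | 12.7 | 17.1 | 7.0 | 25.6 | 30.7 | 46.3 |
| 200 | 0.5 | 8.1 | 6.2 | 5.5 | 1.3 | 12.3 | 11.8 | 15.6 | 5.4 | 23.5 | 27.3 | 47.6 |
| δ = 0.1, σuϵ = −0.5, σγ² = 0 | | | | | | | | | | | | |
| 20 | 3.2 | 3.0 | 4.2 | 4.7 | 25.2 | 21.0 | 25.7 | 33.6 | 37.1 | 36.4 | 43.3 | 50.2 |
| 50 | 3.3 | 2.8 | 4.2 | 4.0 | 20.2 | 16.8 | 22.6 | 31.9 | 34.1 | 34.4 | 43.4 | 53.8 |
| 100 | 3.4 | 3.0 | 3.9 | 5.0 | 16.8 | 14.9 | 20.2 | 32.1 | 31.6 | 32.3 | 42.2 | 58.2 |
| 200 | 3.8 | 3.3 | 4.2 | 4.9 | 14.3 | 12.8 | 18.8 | 29.7 | 29.2 | 32.5 | 41.1 | 62.6 |

Notes: δ, σuϵ and σγ² refer to the fraction of unit roots, the covariance between ϵi,t and ui,t, and the variance of γi, respectively. The autoregressive root in the stationary predictors is set to zero. c is such that β = cN^(−1/4).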
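The simulation design behind Table 1 can be sketched in a few lines. The code below is an illustration only, not a replication: it uses the moment estimators that impose $H_0$ (with normalization by the number of available differences and truncation of the estimate of $\sigma_\gamma^2$ at zero) instead of the least squares residual moments used for the table, fewer replications, and hypothetical function names. Its rejection rates should therefore not be expected to match the tabulated values.

```python
import math
import numpy as np

def simulate(N, T, c, delta, rng, sigma_ue=0.0, sigma_g2=0.0):
    """One panel of Y from the Section 4 design; stationary units have rho = 0."""
    beta = c * N ** (-0.25)
    n_unit_root = int(round(delta * N))
    rho = np.where(np.arange(N) < n_unit_root, 1.0, 0.0)
    cov = np.array([[1.0, sigma_ue], [sigma_ue, 1.0]])
    w = rng.multivariate_normal(np.zeros(2), cov, size=(N, T))
    u, eps = w[..., 0], w[..., 1]
    gamma = rng.normal(1.0, math.sqrt(sigma_g2), size=N)
    theta = rng.normal(1.0, 1.0, size=T)
    U_prev = np.zeros(N)                          # U_{i,0} = 0
    y = np.empty((N, T))
    for t in range(T):
        x_lag = 1.0 + U_prev                      # X_{i,t-1}
        y[:, t] = gamma + theta[t] + beta * x_lag + eps[:, t]
        U_prev = rho * U_prev + u[:, t]           # update to U_{i,t}
    return y

def v_stat(y):
    """Combination statistic V with variance estimated under H0 (a simplification)."""
    N, _ = y.shape
    dev = y - y.mean(axis=0)
    ddev = np.diff(dev, axis=1)
    s2 = (ddev ** 2).mean() / 2
    kap = (ddev ** 4).mean() / 2 - 3 * s2 ** 2
    g2 = max((dev ** 2).mean() - s2, 0.0)
    sig_v = math.sqrt(8 * s2 * g2 + 2 * (kap - s2 ** 2))
    v = (dev ** 2).mean(axis=0)
    return math.sqrt(N) * (v[-1] - v[0]) / sig_v  # telescoped sum of the V_t

def rejection_rate(N, T, c, delta, reps=1000, seed=0):
    rng = np.random.default_rng(seed)
    crit = 1.6449                                 # right-tail 5% critical value of N(0, 1)
    return float(np.mean([v_stat(simulate(N, T, c, delta, rng)) > crit
                          for _ in range(reps)]))

size = rejection_rate(N=100, T=8, c=0.0, delta=1.0)
power = rejection_rate(N=100, T=8, c=2.0, delta=1.0)
print(f"size = {size:.3f}, power = {power:.3f}")
```

The sketch is run at N = 100 and T = 8 because, in much smaller panels, the restricted estimate of $\sigma_v^2$ can be non-positive, in which case the square root fails.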

Hadri & Larsson, 2005, for an example taken from the panel unit root literature). However, there is one notable exception, namely when σuϵ = −0.5, in which case V is undersized. This behavior is consistent with the presence of the so-called "Stambaugh bias" (see Hjalmarsson, 2008, for a derivation in the panel case). That is, if σuϵ ≠ 0, the allowance for unit-specific intercepts renders the estimator of β biased in small T, leading to an undersized test. The size distortions for σuϵ > 0 (not reported) go in the same direction. Hence, though distorted, at least the distortions are in the "right" direction, leading to a conservative test. The distortions also diminish as T increases. The effect of increasing N is less pronounced, due in part to the fact that the size accuracy is quite good already when N = 20.

The power decreases slightly with increases in N, but this is mainly among the smaller values of N. Indeed, power is quite flat in N for N ≥ 100, which is just as expected, because there should be no dependence on N asymptotically. This is explained in Appendix B. The fact that power increases with T and c, measuring the distance to the null, is similarly expected.

The power drops quite significantly as δ decreases from 1 to 0.1. This is as expected, as deviations from the null are easier to detect if Xi,t is persistent. However, the drop is not that large, given the smallness of the fraction of unit root units, and the fact that the stationary units do not exhibit any serial correlation. In fact, unreported results for some less extreme cases suggest that the power is actually quite robust to changes in δ. For instance, the power is almost as high for δ = 0.5 as for δ = 1. To take one specific example, if T = 2, c = 1 and σuϵ = σγ² = 0, the power with δ = 1 is 44.9%, while the power with δ = 0.5 is 41.2% (not reported).

5. Empirical results

5.1. Data

The data considered for this application are quite rich in terms of predictors, and include both financial ratios (see Lewellen, 2004, and the references provided therein) and macroeconomic variables (see for example Chen, 2009; Patelis, 1997; Thorbecke, 1997). While considering this body of literature, it is worth highlighting that there is a good mix of predictability studies based on both time series (see Welch & Goyal, 2008; Westerlund & Narayan, 2012, and the references therein) and cross-sectional (see Polk, Thompson, & Vuolteenaho, 2006) data.

We use annual data on A-shares listed on the Shanghai and Shenzhen stock exchanges. Panel data on firm returns, the book-to-market ratio, the dividend yield, and the price-earnings ratio are downloaded from the China Stock Market and Accounting Research (CSMAR) database. While data on firm returns are available for a much longer time period, data on financial ratios are only available


between 1994 and 2010. Thus, the sample period is 1994–2010, giving T = 17. We also have time series data on 11 macroeconomic indicators, namely gross fixed capital formation, GDP growth rate, portfolio equity net inflows, foreign direct investment, the deposit interest rate, domestic credit provided by the banking sector, the interest rate spread, the lending interest rate, money and quasi money growth, the bilateral (vis-a-vis the US dollar) exchange rate, and inflation. These data, which are also available only for the period 1994–2010, are downloaded from the World Development Indicators published by the World Bank. The panel data predictors are obviously ‘‘richest’’ in terms of their variation, and we therefore pay extra attention to these. The number of firms available is largest for the book-to-market ratio (and also for returns). In fact, we have data on no fewer than N = 160 firms for this variable, from five sectors: public utilities, properties, industrials, conglomerates, and commerce. The numbers of firms from each of these sectors are 11, 28, 76, 26 and 18, respectively. We also consider dividing the sample according to firms’ market capitalizations. Specifically, we consider four equally sized groups (with 40 firms in each), denoted SIZE1–SIZE4 henceforth, with SIZE1 containing the largest firms. All-in-all, then, when testing for predictability using the book-to-market ratio as the predictor, we have no fewer than 10 panels (including the full panel). Data on the price-earnings ratio and the dividend yield are available for only 60 and 44 firms, respectively. Therefore, in this instance, we only consider the full panel because of the smallness of the cross-section. Obviously, this means that a large number of the firms available for returns will have to be ignored when performing the test. 
Similarly, in the case of the macroeconomic indicators, while the test can be run on the returns of all 160 firms, the predictors now take the same value for each firm, which again entails a loss of variation/information. Thus, the most ‘‘information intensive’’ test is obtained when using the book-to-market ratio as a predictor.

5.2. Preliminary results

We begin with a simple analysis of the persistency of the predictors. As has been pointed out, the size of the new test is correct regardless of the persistency of the predictor, which means that there is really no need for such an analysis for size considerations. However, such is not the case for power, which depends on the persistency, and therefore it is interesting to consider this in spite of the invariance under the null hypothesis. The results are summarized in Table 2. We report the estimated first-order autoregressive coefficient, its standard error and a test for the presence of a unit root. For the time series variables, we use the usual augmented Dickey–Fuller (ADF) test, whereas for the panel variables we use the average ADF test of Im, Pesaran, and Shin (2003, IPS). Both tests are based on the use of the Schwarz Bayesian information criterion (BIC) for selecting the appropriate lag augmentation to correct for possible serial correlation. In order to account for some degree of cross-section dependence too, the IPS test is applied to data that have been demeaned with respect to a common time effect. Of course, being based on large-T asymptotic theory, these tests are not ideally suited to the data at hand. However, since this is just a preliminary, we continue with these tests, while pointing out that the results should be interpreted with caution, as there remains a possibility that they might be small-T biased.

The main conclusion that can be drawn from Table 2 is that, while the time series variables seem largely unit root non-stationary, the evidence for the panel variables is against the unit root hypothesis. However, we know from Section 4 (see also Appendix B) that it is not the homogenously-estimated autoregressive root that matters, but the fraction of units with a unit root. In response to this, we applied the ADF test in a unit-by-unit fashion. The results (which are not reported here) for the price-earnings ratio, the dividend yield and the book-to-market ratio suggest that the unit root null fails to be rejected in 55%, 52% and 63% of cases, respectively, which (as we explain in Section 3) suggests that the test has high power. The ranges of estimated autoregressive roots for the same predictors are [−0.43, 1.12], [−0.24, 0.95] and [−0.39, 0.67], which indicates that the degree of persistence varies considerably across units.

We also examined the persistence of the residuals of the estimated predictive regressions. This was done by fitting autoregressive models and using the BIC to determine the appropriate number of lags. The selected lag lengths are all zero, which suggests that there are no major violations of the assumption of no error serial correlation. In Table 3, we also report some results on the correlations between the estimated residuals in the predictive and predictor equations, which can be seen as a measure of endogeneity. The first thing to note is that most of the correlations are significant at the 1% level.
We also see that the correlations for most of the panel predictors are quite high (in absolute value), whereas those for the time series predictors are much lower. The correlations are highest for the book-to-market ratio for the commerce sector (−0.76) and lowest for the exchange rate (0). The fact that the endogeneity is quite strong when using the panel predictors indicates that there is a possibility that the test results might be small-sample biased (see Section 4). In order to get a feeling for the extent of this bias, we computed the cross-sectional mean and variance of the estimated intercept for each regression (not reported here). The means are zero (down to the seventh decimal place), and the variances all lie between 0.008 and 0.035, suggesting that, if there is a bias, the effect should be very small. Indeed, the null hypothesis that the individual intercepts are all equal to zero could not be rejected in any of the regressions.

5.3. Evidence of predictability

The fact that the restriction in Section 5.2 that γ1 = · · · = γN = 0 could not be rejected suggests that the predictability test could be implemented with σ̂γ² set to zero. However, for the sake of completeness, we also report results based on allowing γi to vary across i.
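The residual correlation that Section 5.2 uses as a measure of endogeneity can be sketched as follows. This is a minimal illustration on simulated data, not the authors' code: the function names `within` and `residual_correlation`, the AR(1) form of the predictor equation, and the pooled least-squares fit with unit-specific intercepts are all assumptions made here for concreteness.

```python
import numpy as np

def within(a):
    """Remove unit-specific means (rows are units, columns are periods)."""
    return a - a.mean(axis=1, keepdims=True)

def residual_correlation(Y, X):
    """Correlate residuals of Y_t on X_{t-1} with residuals of X_t on X_{t-1}.

    Y, X: arrays of shape (N, T). Slopes are pooled OLS estimates after
    removing unit-specific intercepts (an illustrative choice).
    """
    x_lag = within(X[:, :-1])
    y = within(Y[:, 1:])
    x = within(X[:, 1:])
    denom = (x_lag ** 2).sum()
    e_hat = y - (x_lag * y).sum() / denom * x_lag   # predictive-regression residuals
    u_hat = x - (x_lag * x).sum() / denom * x_lag   # predictor-equation residuals
    return np.corrcoef(e_hat.ravel(), u_hat.ravel())[0, 1]

# Hypothetical panel with the same dimensions as the book-to-market sample.
rng = np.random.default_rng(0)
N, T, rho, corr = 160, 17, 0.6, -0.5
u = rng.standard_normal((N, T))
eps = corr * u + np.sqrt(1 - corr ** 2) * rng.standard_normal((N, T))
X = np.zeros((N, T))
for t in range(1, T):
    X[:, t] = rho * X[:, t - 1] + u[:, t]
Y = eps                                              # no predictability (beta = 0)
r = residual_correlation(Y, X)
print(f"estimated residual correlation: {r:.2f}")
```

In this simulated design the innovations have a true correlation of −0.5, so the estimate should land close to that value; with real data the inputs would be the return and predictor panels.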


Table 2
Evidence of persistency.

Predictor                    AR slope   SE     Unit root   p-value
Price-earnings               0.07       0.03   −23.99      0.00
Dividend yield               0.16       0.04   −10.89      0.00
Book-to-market               0.65       0.02   −7.96       0.00
Domestic credit              0.83       0.12   −1.37       0.57
Deposit interest rate        0.74       0.08   −3.04       0.05
Equity                       0.67       0.22   −1.48       0.52
Exchange rate                1.09       0.09   −3.65       0.02
Foreign direct investment    0.50       0.19   −2.65       0.11
Capital formation            1.03       0.13   0.24        0.97
Inflation                    0.55       0.09   −2.69       0.10
Spread                       0.55       0.08   −2.95       0.07
Lending interest rate        0.77       0.12   −1.99       0.29
Money                        0.52       0.19   −2.57       0.12
GDP growth                   0.57       0.19   −2.28       0.19

Notes: ‘‘AR slope’’ and ‘‘SE’’ refer to the estimated first-order autoregressive slope and its standard error, respectively. ‘‘Unit root’’ refers to the value of the relevant unit root test statistic (IPS and ADF for the panel and time series predictors, respectively).

Table 3
Evidence of endogeneity.

Predictor                    Corr     p-value
Price-earnings               0.28     0.00
Dividend yield               −0.26    0.00
Book-to-market               −0.52    0.00
Public utilities             −0.62    0.00
Properties                   −0.26    0.00
Industry                     −0.64    0.00
Conglomerates                −0.72    0.00
Commerce                     −0.76    0.00
SIZE1                        −0.41    0.00
SIZE2                        −0.69    0.00
SIZE3                        −0.46    0.00
SIZE4                        −0.65    0.00
Domestic credit              0.10     0.00
Deposit interest rate        0.14     0.00
Equity                       0.11     0.00
Exchange rate                0.00     0.89
Foreign direct investment    −0.08    0.00
Capital formation            −0.03    0.16
Inflation                    −0.09    0.00
Spread                       0.09     0.00
Lending interest rate        0.20     0.00
Money                        0.13     0.00
GDP growth                   0.31     0.00

Notes: ‘‘Corr’’ refers to the estimated correlation between the residuals in the estimated predictive and predictor regressions.

We begin by considering the results of the panel predictors, which are reported in Table 4. The full-panel results do not provide any evidence in favor of predictability. In fact, all of the p-values are larger than 0.4, suggesting that the evidence of predictability is very weak indeed. This conclusion is confirmed by the estimated rejection frequencies, which lie between 2.5% (for the book-to-market ratio) and 9.1% (for the dividend yield). We also see that the inclusion of unit-specific intercepts has no effect on the results, which is in agreement with our finding that the variance of the estimated intercepts is very close to zero. One possible explanation for these results could be that the full panels are not homogenous enough for pooling, thus causing deceptive inference. Of course, since such a heterogeneity bias would tend to make the test biased towards the alternative of predictability (rather than the opposite), this is unlikely to be the reason for our inability to reject. However, given some of the recent evidence suggesting that firms belonging to different sectors tend to behave differently (see Beltratti, 2005; Narayan & Sharma, 2011; Pennings & Garcia, 2004), we also consider applying our test to our five sectoral subpanels. Furthermore, research has shown that, when looking at variables such as earnings, financial constraints and risk management, liquidity, and incentives for managers, the behaviors of small firms differ from those of large firms (see for example Banz, 1981; Keim, 1983; Reinganum, 1981), and

Table 4
Evidence of predictability.

                     No intercepts          Intercepts
Predictor            p-value   Rej freq     p-value   Rej freq
Price-earnings       0.561     5.0          0.567     5.0
Dividend yield       0.496     9.1          0.496     9.1
Book-to-market       0.448     2.5          0.440     2.5
Public utilities     0.708     18.2         0.732     18.2
Properties           0.553     14.3         0.558     14.3
Industry             0.454     3.9          0.444     3.9
Conglomerates        0.415     7.7          0.407     7.7
Commerce             0.529     5.6          0.534     11.1
SIZE1                0.500     2.5          0.500     2.5
SIZE2                0.349     7.5          0.332     10.0
SIZE3                0.500     7.5          0.500     10.0
SIZE4                0.454     7.5          0.448     7.5

Notes: The p-value is for the predictability test. ‘‘Rej freq’’ refers to the estimated rejection frequency (in percent), and ‘‘Intercepts’’ and ‘‘No intercepts’’ refer to the two cases with and without unit-specific intercepts, respectively.

we therefore also apply our tests to our four size-based subpanels. The subpanel results do not provide any additional evidence of predictability over those for the full panel. However, that being said, we do notice some variation in the estimated rejection frequencies. At one end of the scale, we have the sectoral subpanels, where there seems to be a great deal of variation in the results, with the rejection frequency ranging from 3.9% (industry) to 18.2%


(public utilities); while at the other end of the scale, we have the size subpanels, for which there is very little variation in the results. This means that, while it is true in general that the evidence of predictability is weak, the evidence that is found is not homogeneous across sectors. This result would seem to go against existing claims that the weak evidence of cross-sectional return predictability in the Chinese market is driven by the homogeneity of predictors. For example, Chen, Kim, Yao, and Yu (2010, p. 405) claim that: ‘‘One possible explanation of weak return predictability in the Chinese market is that Chinese firms are more homogeneous in the firm-specific characteristics examined’’. The results for the macroeconomic predictors are not reported here, but we describe them briefly. As with the panel predictors, there is no evidence against the no-predictability null, and this is the case regardless of the choice of predictor. There are also no differences in the results when looking at the different sector/firm size groupings. Overall, the conclusion that can be drawn from the bulk of the evidence reported in this section is that there is little or no evidence of predictability in the Chinese stock market. While this does not necessarily imply that returns are truly unpredictable, we would argue that it does. The main alternative interpretation would have to be that, though returns are predictable, our test is not powerful enough to detect it. For example, one may argue that T is not large enough, and that this causes the test to have a low power (see Campbell & Thompson, 2008). However, since the local power of the new test is also driven by N, this is unlikely to be the reason. It is true that some of our subpanels are relatively small, but, in view of the Monte Carlo evidence suggesting that the test has reasonable performance even for N = 20, this is not very likely. 
There is also the issue of whether the persistence of the predictors is high enough, which seems a valid concern in the case of the dividend yield and the price-earnings ratio. However, there seems to be ample persistence for the other predictors. We also estimated β. Since β = cN^(−1/4) under the local alternative of Appendix B, the values considered for this parameter in the simulations for the most relevant case when N = 20 are between 0.47 (c = 1) and 0.95 (c = 2). The estimates are all smaller than 0.95, and some are less than 0.47. This confirms that the deviations from the null are large enough. Having demonstrated that our results can in fact be taken as evidence against predictability, we conclude with a brief comparison of our findings with those reported in the existing literature, which are based almost exclusively on US time series data. For example, using annual US data for the period 1926–2002, Campbell and Yogo (2006) find that the earnings-price and dividend-price ratios both predict returns, a finding that is confirmed by Cochrane (2008), who finds that the dividend-price ratio predicts US returns in the period 1926–2004. Goyal and Welch (2003) find that the dividend-price ratio and dividend yield both predict returns over the period 1946–1970, but not in the periods 1926–1945 and 1971–2002. Kothari and Shanken (1997) also use annual US data, but for the period 1963–1991. Their findings suggest that, while the book-to-market ratio does not predict returns, the dividend yield does (at least to some extent).
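The mapping between the drift parameter c and the slope β can be checked with a few lines of arithmetic. The sketch below assumes only the local-to-zero rule β = cN^(−1/4) from Assumption B.3 in Appendix B; the function name `local_slope` and the drift values chosen are illustrative.

```python
def local_slope(c, N):
    """Slope implied by drift c under beta = c * N**(-1/4)."""
    return c * N ** (-0.25)

# Slopes implied by a few drift values when N = 20.
for c in (1, 2, 5):
    print(f"c = {c}: beta = {local_slope(c, 20):.2f}")
# c = 1: beta = 0.47
# c = 2: beta = 0.95
# c = 5: beta = 2.36
```

Because the slope shrinks only at rate N^(−1/4), the same drift implies a smaller but still sizeable β in larger panels.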

Thus, the evidence based on the US experience is rather mixed and not very convincing. The results also do not seem very robust to the choice of sample period and data frequency, with annual data generally being more supportive of predictability than monthly or quarterly data. We use annual data, but, in contrast to the evidence based on the US, the evidence for China is almost nonexistent. For the Chinese experience, there is only one previous study that has used financial ratios as predictors, namely that of Chen et al. (2010). Using data from 1994 to 2007, they find that, out of 18 predictors, only the book-to-market ratio, net operating assets, R&D spending, asset growth and illiquidity predict the cross-sectional variation in returns. The explanation that they offer is twofold: (i) there is too little variation in the predictors, and (ii) stock prices are uninformative. Our results are quite suggestive of (ii); however, our finding that the rejection frequency varies significantly from one sector to another does not support the claim that the characteristics of predictors are relatively more homogenous.

6. Conclusion

Considering the well-known power problem of time series tests in the conventional fixed-T environment, it is quite surprising to find that there are only a handful of tests that allow one to test for predictability in panels. Ironically, while having the advantage of being able to exploit the information contained in the cross-sectional dimension, these tests do not work unless T is large, both in an absolute sense and relative to N; and if these requirements are satisfied, one could just as well use a conventional time series test applied to each unit. In the current paper, we take this as our starting point for developing a new test that is valid for any T ≥ 2, provided that N is large enough, which represents an improvement over the conventional large-N, large-T assumption.
However, the test is not limited to fixed-T panels, but is also valid when both N and T are large. Interestingly, this generality with respect to the sample size is not the only advantage of the new test. Another advantage is that the predictor is basically unrestricted, in sharp contrast to existing tests, which are very restrictive in this regard. In spite of this, and in contrast to what one would expect, the test is very simple to implement, and has good small-sample properties. In the empirical part of the paper, the new test is used to test for predictability in the Chinese stock market, for which data are available for only 17 years, but where the number of firms is relatively large, 160.

Acknowledgments

The authors would like to thank Rob Hyndman (Handling Editor), an Associate Editor, and two anonymous referees for valuable comments and suggestions. Westerlund would also like to thank the Knut and Alice Wallenberg Foundation and the Jan Wallander and Tom Hedelius Foundation (P2014-0112:1) for financial support.


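Appendix A. Proofs

In this appendix, we provide the proofs of Theorem 1 and Corollary 1. We begin by introducing some notation. Note that

\[
Y_{i,t} - \bar{Y}_t = (\gamma_i - \bar{\gamma}) + \beta (X_{i,t-1} - \bar{X}_{t-1}) + (\epsilon_{i,t} - \bar{\epsilon}_t), \tag{A.1}
\]

which suggests that

\[
\begin{aligned}
v_t &= \frac{1}{N} \sum_{i=1}^N (Y_{i,t} - \bar{Y}_t)^2 \\
&= \frac{1}{N} \sum_{i=1}^N [(\gamma_i - \bar{\gamma}) + \beta (X_{i,t-1} - \bar{X}_{t-1}) + (\epsilon_{i,t} - \bar{\epsilon}_t)]^2 \\
&= A_t + B_t + C_t + D_t + E_t + F_t,
\end{aligned} \tag{A.2}
\]

where

\[
\begin{aligned}
A_t &= \frac{1}{N} \sum_{i=1}^N (\gamma_i - \bar{\gamma})^2, &
B_t &= \frac{\beta^2}{N} \sum_{i=1}^N (X_{i,t-1} - \bar{X}_{t-1})^2, \\
C_t &= \frac{1}{N} \sum_{i=1}^N (\epsilon_{i,t} - \bar{\epsilon}_t)^2, &
D_t &= \frac{2\beta}{N} \sum_{i=1}^N (\gamma_i - \bar{\gamma})(X_{i,t-1} - \bar{X}_{t-1}), \\
E_t &= \frac{2}{N} \sum_{i=1}^N (\gamma_i - \bar{\gamma})(\epsilon_{i,t} - \bar{\epsilon}_t), &
F_t &= \frac{2\beta}{N} \sum_{i=1}^N (X_{i,t-1} - \bar{X}_{t-1})(\epsilon_{i,t} - \bar{\epsilon}_t).
\end{aligned}
\]

Hence, since $\Delta A_t = 0$, we obtain

\[
\Delta v_t = \Delta B_t + \Delta C_t + \Delta D_t + \Delta E_t + \Delta F_t. \tag{A.3}
\]

All of the results in this paper make use of this expression for $\Delta v_t$.

Proof of Theorem 1. Since $\beta = 0$ in this case, we have $B_t = D_t = F_t = 0$, which in turn implies that Eq. (A.3) simplifies to

\[
\Delta v_t = \Delta C_t + \Delta E_t. \tag{A.4}
\]

Consider $\Delta C_t$. Note that $\epsilon_{i,t}$ is independent across $i$, with four finite moments. Hence, $E(\bar{\epsilon}_t^2) = N^{-2} \sum_{i=1}^N E(\epsilon_{i,t}^2) = N^{-1} \sigma_\epsilon^2 = O(N^{-1})$, which in turn implies $|\bar{\epsilon}_t| = O_p(N^{-1/2})$. Note that this result holds uniformly over $t$. Therefore,

\[
C_t = \frac{1}{N} \sum_{i=1}^N (\epsilon_{i,t} - \bar{\epsilon}_t)^2 = \frac{1}{N} \sum_{i=1}^N \epsilon_{i,t}^2 - \bar{\epsilon}_t^2 = \frac{1}{N} \sum_{i=1}^N \epsilon_{i,t}^2 + O_p(N^{-1}).
\]

Consider $(\epsilon_{i,t}^2 - \epsilon_{i,t-1}^2)$, which clearly has a mean of zero and variance $E[(\epsilon_{i,t}^2 - \epsilon_{i,t-1}^2)^2] = 2(\kappa_\epsilon - \sigma_\epsilon^4)$. It is also independent and identically distributed (i.i.d.) across $i$. Therefore, because fourth moments are finite, by a central limit theorem (CLT) for i.i.d. variables, we obtain

\[
\sqrt{N} \Delta C_t = \frac{1}{\sqrt{N}} \sum_{i=1}^N (\epsilon_{i,t}^2 - \epsilon_{i,t-1}^2) + O_p(N^{-1/2}) \to_d \sqrt{2(\kappa_\epsilon - \sigma_\epsilon^4)}\, Z_{1t} \tag{A.5}
\]

as $N \to \infty$, where $Z_{1t} \sim N(0, 1)$. For $\Delta E_t$, we use the fact that $E[(\Delta \epsilon_{i,t})^2] = 2\sigma_\epsilon^2$ to obtain

\[
\begin{aligned}
E\left[\left(\Delta \epsilon_{i,t} - \frac{1}{N} \sum_{j=1}^N \Delta \epsilon_{j,t}\right)^2\right]
&= E\left[(\Delta \epsilon_{i,t})^2 - \frac{2}{N} \sum_{j=1}^N \Delta \epsilon_{i,t} \Delta \epsilon_{j,t} + \frac{1}{N^2} \sum_{j=1}^N \sum_{k=1}^N \Delta \epsilon_{j,t} \Delta \epsilon_{k,t}\right] \\
&= E[(\Delta \epsilon_{i,t})^2] - \frac{1}{N} E[(\Delta \epsilon_{i,t})^2] = 2\sigma_\epsilon^2 + O(N^{-1}),
\end{aligned}
\]

and so, by the same CLT argument as before,

\[
\sqrt{N} \Delta E_t = \frac{2}{\sqrt{N}} \sum_{i=1}^N (\gamma_i - \bar{\gamma}) \left(\Delta \epsilon_{i,t} - \frac{1}{N} \sum_{j=1}^N \Delta \epsilon_{j,t}\right) \to_d \sqrt{8 \sigma_\epsilon^2 \sigma_\gamma^2}\, Z_{2t} \tag{A.6}
\]

as $N \to \infty$, where $Z_{2t} \sim N(0, 1)$. Moreover,

\[
\begin{aligned}
N E(\Delta C_t \Delta E_t)
&= \frac{2}{N} \sum_{i=1}^N \sum_{j=1}^N (\gamma_j - \bar{\gamma}) E\left[(\epsilon_{i,t}^2 - \epsilon_{i,t-1}^2) \left(\Delta \epsilon_{j,t} - \frac{1}{N} \sum_{k=1}^N \Delta \epsilon_{k,t}\right)\right] + O(N^{-1/2}) \\
&= \frac{2}{N} \sum_{i=1}^N (\gamma_i - \bar{\gamma}) E\left[(\epsilon_{i,t}^2 - \epsilon_{i,t-1}^2) \left(\Delta \epsilon_{i,t} - \frac{1}{N} \sum_{k=1}^N \Delta \epsilon_{k,t}\right)\right] + O(N^{-1/2}) \\
&= \frac{2}{N} \sum_{i=1}^N (\gamma_i - \bar{\gamma}) E[(\epsilon_{i,t}^2 - \epsilon_{i,t-1}^2) \Delta \epsilon_{i,t}] + O(N^{-1/2}),
\end{aligned}
\]

where the last equality uses the fact that

\[
\frac{1}{N^2} \sum_{k=1}^N \sum_{i=1}^N (\gamma_i - \bar{\gamma}) E[(\epsilon_{i,t}^2 - \epsilon_{i,t-1}^2) \Delta \epsilon_{k,t}] = \frac{1}{N^2} \sum_{i=1}^N (\gamma_i - \bar{\gamma}) E[(\epsilon_{i,t}^2 - \epsilon_{i,t-1}^2) \Delta \epsilon_{i,t}] = O(N^{-1}).
\]

Thus, since $E[(\epsilon_{i,t}^2 - \epsilon_{i,t-1}^2) \Delta \epsilon_{i,t}] = E(\epsilon_{i,t}^3 - \epsilon_{i,t}^2 \epsilon_{i,t-1} - \epsilon_{i,t} \epsilon_{i,t-1}^2 + \epsilon_{i,t-1}^3) = 2\gamma_\epsilon$, where $\gamma_\epsilon = E(\epsilon_{i,t}^3)$, we obtain

\[
N E(\Delta C_t \Delta E_t) = \frac{4 \gamma_\epsilon}{N} \sum_{i=1}^N (\gamma_i - \bar{\gamma}) + O(N^{-1/2}) = O(N^{-1/2}), \tag{A.7}
\]

suggesting that $\sqrt{N} \Delta C_t$ and $\sqrt{N} \Delta E_t$ are asymptotically uncorrelated, and thus asymptotically independent by the normality of $Z_{1t}$ and $Z_{2t}$. Hence, letting $\sigma_v^2 = 8 \sigma_\epsilon^2 \sigma_\gamma^2 + 2(\kappa_\epsilon - \sigma_\epsilon^4)$ and defining $Z_t \sim N(0, 1)$ through $\sigma_v Z_t = \sqrt{2(\kappa_\epsilon - \sigma_\epsilon^4)}\, Z_{1t} + \sqrt{8 \sigma_\epsilon^2 \sigma_\gamma^2}\, Z_{2t}$, we obtain

\[
\sqrt{N} \Delta v_t = \sqrt{N} (\Delta C_t + \Delta E_t) \to_d \sigma_v Z_t \tag{A.8}
\]

as $N \to \infty$. Note that this result holds for any $t \geq 2$. This completes the proof of the theorem. ∎

Proof of Corollary 1. Using the fact that $|\bar{\epsilon}_t| = O_p(N^{-1/2})$,

\[
\begin{aligned}
\sum_{t=2}^T \Delta C_t
&= \frac{1}{N} \sum_{i=1}^N \sum_{t=2}^T (\epsilon_{i,t}^2 - \epsilon_{i,t-1}^2) - \sum_{t=2}^T (\bar{\epsilon}_t^2 - \bar{\epsilon}_{t-1}^2) \\
&= \frac{1}{N} \sum_{i=1}^N (\epsilon_{i,T}^2 - \epsilon_{i,1}^2) - (\bar{\epsilon}_T^2 - \bar{\epsilon}_1^2) \\
&= \frac{1}{N} \sum_{i=1}^N (\epsilon_{i,T}^2 - \epsilon_{i,1}^2) + O_p(N^{-1}).
\end{aligned} \tag{A.9}
\]

However, $E[(\epsilon_{i,T}^2 - \epsilon_{i,1}^2)^2] = 2(\kappa_\epsilon - \sigma_\epsilon^4)$, which means that we can use the same CLT argument as in the proof of Theorem 1 to show that

\[
\sum_{t=2}^T \sqrt{N} \Delta C_t = \frac{1}{\sqrt{N}} \sum_{i=1}^N (\epsilon_{i,T}^2 - \epsilon_{i,1}^2) + O_p(N^{-1/2}) \to_d \sqrt{2(\kappa_\epsilon - \sigma_\epsilon^4)}\, Z_1 \tag{A.10}
\]

as $N \to \infty$ or as $N, T \to \infty$, where $Z_1 \sim N(0, 1)$. For $\sum_{t=2}^T \Delta E_t$, note how

\[
\begin{aligned}
\sum_{t=2}^T \sqrt{N} \Delta E_t
&= \frac{2}{\sqrt{N}} \sum_{i=1}^N (\gamma_i - \bar{\gamma}) \sum_{t=2}^T \left(\Delta \epsilon_{i,t} - \frac{1}{N} \sum_{j=1}^N \Delta \epsilon_{j,t}\right) \\
&= \frac{2}{\sqrt{N}} \sum_{i=1}^N (\gamma_i - \bar{\gamma}) \left[(\epsilon_{i,T} - \epsilon_{i,1}) - \frac{1}{N} \sum_{j=1}^N (\epsilon_{j,T} - \epsilon_{j,1})\right].
\end{aligned} \tag{A.11}
\]

Since $E[(\epsilon_{i,T} - \epsilon_{i,1})^2] = 2\sigma_\epsilon^2$, we can use the same steps as in the proof of Theorem 1 to show that

\[
E\left[\left((\epsilon_{i,T} - \epsilon_{i,1}) - \frac{1}{N} \sum_{j=1}^N (\epsilon_{j,T} - \epsilon_{j,1})\right)^2\right] = 2\sigma_\epsilon^2 + O(N^{-1}), \tag{A.12}
\]

leading to the following asymptotic distribution for $\sum_{t=2}^T \sqrt{N} \Delta E_t$:

\[
\sum_{t=2}^T \sqrt{N} \Delta E_t = \frac{2}{\sqrt{N}} \sum_{i=1}^N (\gamma_i - \bar{\gamma}) \left[(\epsilon_{i,T} - \epsilon_{i,1}) - \frac{1}{N} \sum_{j=1}^N (\epsilon_{j,T} - \epsilon_{j,1})\right] \to_d \sqrt{8 \sigma_\epsilon^2 \sigma_\gamma^2}\, Z_2
\]

as $N \to \infty$ or as $N, T \to \infty$, where $Z_2 \sim N(0, 1)$. In analogy to the proof of Theorem 1, it is possible to show that $\sum_{t=2}^T \sqrt{N} \Delta C_t$ and $\sum_{t=2}^T \sqrt{N} \Delta E_t$ are asymptotically uncorrelated. Hence, letting $\sigma_v^2 = 8 \sigma_\epsilon^2 \sigma_\gamma^2 + 2(\kappa_\epsilon - \sigma_\epsilon^4)$ and defining $Z \sim N(0, 1)$ through $\sigma_v Z = \sqrt{2(\kappa_\epsilon - \sigma_\epsilon^4)}\, Z_1 + \sqrt{8 \sigma_\epsilon^2 \sigma_\gamma^2}\, Z_2$, we obtain

\[
\sum_{t=2}^T \sqrt{N} \Delta v_t = \sum_{t=2}^T \sqrt{N} (\Delta C_t + \Delta E_t) \to_d \sigma_v Z, \tag{A.13}
\]

which again holds regardless of whether $N \to \infty$ or $N, T \to \infty$. ∎

Appendix B. Local power analysis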

In this appendix, we study the local power of $V_t$ and $V$. The DGP used for this purpose is given by a restricted version of that considered in the main text. In particular, while $Y_{i,t}$ is still generated by Eq. (1), the nonparametric treatment of $X_{i,t}$ is too general for us to draw any interesting conclusions regarding the local power. The specific DGP considered is given by the following set of equations:

\[
X_{i,t} = \eta_i + U_{i,t}, \tag{B.1}
\]

\[
U_{i,t} = \rho_i U_{i,t-1} + u_{i,t}, \tag{B.2}
\]

where $U_{i,0} = 0$. While this DGP is obviously more restrictive than that considered in the main text, it is actually standard in the literature (see for example Breitung & Demetrescu, 2015; Campbell & Yogo, 2006; Hjalmarsson, 2010; Kauppi, 2001; Kostakis et al., 2015; Lanne, 2002; Lewellen, 2004; Stambaugh, 1999; Westerlund & Narayan, 2015a,b). However, it should be pointed out that this is just an example of one type of DGP that could be considered. Indeed, as we point out in Section 2, there are basically no restrictions as to the types of predictors that can be accommodated. Thus, the local power will have to be worked out on a case-by-case basis. Here, we focus on a relatively simple DGP. The reason for this is twofold. First, as has been mentioned, Eqs. (B.1) and (B.2) represent a standard consideration in the literature. Second, the simpler the DGP, the more transparent the results will be. Assumption B.1 replaces Assumption 1. Here and throughout this appendix, $w_{i,t} = (u_{i,t}, \epsilon_{i,t})'$, and $C_{i,t}$ is the sigma-field generated by $\{w_{i,s}\}_{s=1}^t$.

Assumption B.1. $w_{i,t}$ is independent across $i$, with $E(w_{i,t} | C_{i,t-1}) = 0$, $E(\epsilon_{i,t}^4) = \kappa_\epsilon < \infty$, $E(u_{i,t}^4) = \kappa_u < \infty$, $E(w_{i,t} w_{i,t}' | C_{i,t-1}) = \Sigma_{w,t}$, and $E(\Sigma_{w,t}) = \Sigma_w$ is positive definite.
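The DGP in Eqs. (B.1) and (B.2) is easy to simulate, which gives a quick numerical check of the limiting variance $\sigma_v^2 = 8\sigma_\epsilon^2\sigma_\gamma^2 + 2(\kappa_\epsilon - \sigma_\epsilon^4)$ from Appendix A. The following is a minimal sketch under assumptions that are not taken from the paper: returns follow $Y_{i,t} = \gamma_i + \beta X_{i,t-1} + \epsilon_{i,t}$ (the form implied by Eq. (A.1)), all shocks are independent standard normal (so $\kappa_\epsilon = 3$ and $\sigma_\epsilon = 1$), $\rho_i$ is common across units, and the function name `sqrt_n_delta_v` is hypothetical. The slope is set through a drift `c` as $\beta = cN^{-1/4}$, so `c = 0` gives the no-predictability null.

```python
import numpy as np

def sqrt_n_delta_v(N, c, rho, sigma_gamma, reps, t=3, seed=0):
    """Draw `reps` realizations of sqrt(N) * (v_t - v_{t-1}).

    Assumed setup (illustrative): Y_{i,t} = gamma_i + beta * X_{i,t-1} + eps_{i,t},
    X_{i,t} = eta_i + U_{i,t}, U_{i,t} = rho * U_{i,t-1} + u_{i,t}, U_{i,0} = 0,
    with independent standard normal eta, u and eps, and beta = c * N**(-1/4).
    """
    rng = np.random.default_rng(seed)
    beta = c * N ** (-0.25)
    gamma = sigma_gamma * rng.standard_normal((reps, N))
    eta = rng.standard_normal((reps, N))

    U_lag2 = np.zeros((reps, N))                     # U_{i,0} = 0
    for _ in range(t - 2):                           # advance to U_{i,t-2}
        U_lag2 = rho * U_lag2 + rng.standard_normal((reps, N))
    U_lag1 = rho * U_lag2 + rng.standard_normal((reps, N))   # U_{i,t-1}

    def cross_section_variance(U_lagged):
        eps = rng.standard_normal((reps, N))
        Y = gamma + beta * (eta + U_lagged) + eps
        return Y.var(axis=1)                         # 1/N normalization, as in (A.2)

    v_t = cross_section_variance(U_lag1)
    v_tm1 = cross_section_variance(U_lag2)
    return np.sqrt(N) * (v_t - v_tm1)

sigma_gamma = 0.5
sigma_v2 = 8 * sigma_gamma ** 2 + 2 * (3 - 1)        # Gaussian eps: kappa = 3, sigma_eps = 1
null_draws = sqrt_n_delta_v(N=1000, c=0.0, rho=1.0, sigma_gamma=sigma_gamma, reps=2000)
print(f"simulated variance: {null_draws.var():.2f}, theoretical sigma_v^2: {sigma_v2:.2f}")
```

Setting `c` to a nonzero value shifts the simulated mean away from zero, by roughly $c^2\sigma_u^2$ when $\rho_i = 1$ for all units, which is the drift that drives local power in this appendix.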


Note that the off-diagonal elements of $\Sigma_w$ are not restricted in any way. Thus, the predictor can be endogenous (or rather, weakly exogenous), which is again consistent with the existing literature. Another well-known feature of predictive regressions is that $X_{i,t}$ can be quite persistent. In the dividend-price ratio example, $X_{i,t}$ is expected to be highly persistent because of the persistence of prices. Let us therefore denote by $n$ the number of units with $\rho_i = 1$, such that $X_{i,t}$ is unit root non-stationary.² The assumption that we are going to make is that the fraction of units with a unit root is nonnegligible.

² Strictly speaking, it is not necessary to assume that $\rho_i$ is exactly one. It is enough if $\rho_i \to 1$ as $N, T \to \infty$, which means that $\rho_i$ can be ‘‘local-to-unity’’, a situation that has received much interest in the literature (see for example Cavanagh et al., 1995; Elliott & Stock, 1994; Lanne, 2002; Westerlund & Narayan, 2015b).

Assumption B.2. $\delta = n/N > 0$.

Assumption B.2 is not very restrictive, especially when considering the usual macroeconomic and financial ratio predictors, which are known to be very persistent. For later use, it will be convenient to assume that the units for which $\rho_i = 1$ are ordered first. Hence, in what follows, $X_{1,t}, \ldots, X_{n,t}$ are unit root processes. The null hypothesis of interest is again given by $H_0: \beta = 0$. A common way to formulate the alternative hypothesis is to assume that $\beta \neq 0$ is non-local, in the sense that the degree of predictability is not allowed to depend on the sample size. However, such a specification only tells us whether the test is consistent, and, if so, at what rate. Therefore, in order to enable us to evaluate the power analytically, this paper considers an alternative where $\beta$ is ‘‘local-to-zero’’ as $N \to \infty$.

Assumption B.3. $\beta = cN^{-1/4}$, where $c$ is a drift parameter such that $|c| < \infty$.

The local alternative hypothesis considered in this appendix is given by $H_1: c \neq 0$. The above ‘‘local-to-zero’’ specification of $\beta$ is similar to the near-unit root assumption that is typically employed in the panel unit root literature (see Westerlund & Breitung, 2013, and the references provided therein).³ As has been mentioned, one reason for considering such a local specification is that it allows us to take an analytical approach to power. However, while this is very appealing theoretically, the main reason is empirical. Indeed, whenever predictability is found, the evidence is usually weak, suggesting that the deviations from the no-predictability null are not large.

³ One difference between Assumption B.3 and the typical near-unit root specification in panels is the rate of shrinking under the local alternative. In particular, while the rate here is given by $N^{-1/4}$, in the panel unit root case it is typically given by $N^{-1/2} T^{-1}$ (see Westerlund & Breitung, 2013). The reasons for this difference are twofold. First, unlike the bulk of the panel unit root literature, this paper does not require $T \to \infty$. Second, unlike most unit root test statistics, which are constructed as t-ratios, the test statistic considered here is a variance ratio.

Remark B.1. As was mentioned in Section 2, the homogenous slope restriction with $\beta$ equal for all $i$ is not a restriction under the no-predictability null. However, it does represent a restriction under the alternative, though the conclusions are qualitatively the same. In order to appreciate this, let us denote the heterogeneous slope for unit $i$ by $\beta_i$. Suppose that $\beta_i = c_i N^{-1/4}$, where $c_i$ is a random variable such that $E(c_i^k) = \mu_k < \infty$ for $k = 1, \ldots, 4$. Also, $c_i$ is independent of all other random elements of the DGP. In this case, the results reported in this appendix have $c^k$ replaced with $\mu_k$.

Theorem B.1. Under Assumptions 1, 2 and B.3, as $N \to \infty$ with $2 \leq T < \infty$, or as $N, T \to \infty$ jointly with $\sqrt{T}/N^{1/4} \to 0$, for each $2 \leq t \leq T$,

\[
\sqrt{N} \Delta v_t - c^2 \sigma_u^2 \bar{\rho}_t \to_d \sigma_v Z_t,
\]

where $Z_t \sim N(0, 1)$ and

\[
\bar{\rho}_t = \frac{1}{N} \sum_{i=1}^N \rho_i^{2(t-2)}.
\]

Proof. Consider Eq. (A.3). Since $\sqrt{N} (\Delta C_t + \Delta E_t)$ is the same as in the proof of Theorem 1, we only need to consider $\Delta B_t$, $\Delta D_t$ and $\Delta F_t$. We begin with $\Delta B_t$. By using the fact that $X_{i,t} = \eta_i + U_{i,t}$, we obtain

\[
\begin{aligned}
B_t &= \frac{\beta^2}{N} \sum_{i=1}^N (X_{i,t-1} - \bar{X}_{t-1})^2 = \frac{\beta^2}{N} \sum_{i=1}^N X_{i,t-1}^2 - \beta^2 \bar{X}_{t-1}^2 \\
&= \frac{\beta^2}{N} \sum_{i=1}^N (\eta_i + U_{i,t-1})^2 - \beta^2 \left(\frac{1}{N} \sum_{i=1}^N (\eta_i + U_{i,t-1})\right)^2 \\
&= \frac{\beta^2}{N} \sum_{i=1}^N \eta_i^2 + \frac{2\beta^2}{N} \sum_{i=1}^N \eta_i U_{i,t-1} + \frac{\beta^2}{N} \sum_{i=1}^N U_{i,t-1}^2 \\
&\quad - \frac{\beta^2}{N^2} \sum_{i=1}^N \sum_{j=1}^N \eta_i \eta_j - \frac{2\beta^2}{N^2} \sum_{i=1}^N \sum_{j=1}^N \eta_i U_{j,t-1} - \frac{\beta^2}{N^2} \sum_{i=1}^N \sum_{j=1}^N U_{i,t-1} U_{j,t-1} \\
&= \frac{\beta^2}{N} \sum_{i=1}^N \left(\eta_i^2 - \frac{1}{N} \sum_{j=1}^N \eta_i \eta_j\right) + \frac{2\beta^2}{N} \sum_{i=1}^N \left(\eta_i - \frac{1}{N} \sum_{j=1}^N \eta_j\right) U_{i,t-1} \\
&\quad + \frac{\beta^2}{N} \sum_{i=1}^N U_{i,t-1}^2 - \frac{\beta^2}{N^2} \sum_{i=1}^N \sum_{j=1}^N U_{i,t-1} U_{j,t-1},
\end{aligned} \tag{B.3}
\]

and therefore

\[
\begin{aligned}
\Delta B_t &= \frac{2\beta^2}{N} \sum_{i=1}^N \left(\eta_i - \frac{1}{N} \sum_{j=1}^N \eta_j\right) \Delta U_{i,t-1} + \frac{\beta^2}{N} \sum_{i=1}^N (U_{i,t-1}^2 - U_{i,t-2}^2) \\
&\quad - \frac{\beta^2}{N^2} \sum_{i=1}^N \sum_{j=1}^N (U_{i,t-1} U_{j,t-1} - U_{i,t-2} U_{j,t-2}).
\end{aligned} \tag{B.4}
\]


Consider the last term on the right-hand side. From $U_{i,t} = \sum_{s=1}^t \rho_i^{t-s} u_{i,s}$,

\[
\begin{aligned}
E(U_{i,t} U_{i,k}) &= \sum_{s=1}^t \sum_{j=1}^k \rho_i^{t+k-s-j} E(u_{i,s} u_{i,j}) = \sum_{s=1}^k \sum_{j=1}^k \rho_i^{t+k-s-j} E(u_{i,s} u_{i,j}) \\
&= \sum_{j=1}^k \rho_i^{t+k-2j} E(u_{i,j}^2) = \sigma_u^2 \sum_{j=1}^k \rho_i^{t+k-2j}
\end{aligned} \tag{B.5}
\]

for $t \geq k$. Hence, letting $t = \lfloor rT \rfloor$ and $k = \lfloor uT \rfloor$, then $E(U_{i,t} U_{i,k}) = O(T)$ under the ‘‘worst case’’ scenario when $\rho_i = 1$. Making use of this, we can show that

\[
E\left[\left(\frac{1}{N} \sum_{i=1}^N U_{i,t}\right)^2\right] = \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N E(U_{i,t} U_{j,t}) = \frac{1}{N^2} \sum_{i=1}^N E(U_{i,t}^2) = O(TN^{-1}).
\]

Clearly, if the variance is $O(TN^{-1})$, then $|N^{-1} \sum_{i=1}^N U_{i,t}|$ must be $O_p(\sqrt{T} N^{-1/2})$. This implies

\[
\left|\beta \frac{1}{N} \sum_{i=1}^N U_{i,t-1}\right| \leq \frac{|c|}{N^{1/4}} \left|\frac{1}{N} \sum_{i=1}^N U_{i,t-1}\right| = O_p(\sqrt{T} N^{-3/4}),
\]

and so

\[
\left|\frac{\beta^2}{N^2} \sum_{i=1}^N \sum_{j=1}^N (U_{i,t-1} U_{j,t-1} - U_{i,t-2} U_{j,t-2})\right| = O_p(TN^{-3/2}). \tag{B.6}
\]

Let $Z_{i,t} = c^2 (U_{i,t-1}^2 - U_{i,t-2}^2)$ and $\bar{\rho}_t = N^{-1} \sum_{i=1}^N \rho_i^{2(t-2)}$. Since

\[
E(Z_{i,t}) = c^2 E(U_{i,t-1}^2 - U_{i,t-2}^2) = c^2 \sigma_u^2 \left(\sum_{j=1}^{t-1} \rho_i^{2(t-1-j)} - \sum_{j=1}^{t-2} \rho_i^{2(t-2-j)}\right) = c^2 \sigma_u^2 \rho_i^{2(t-2)}, \tag{B.7}
\]

we have

\[
\frac{1}{\sqrt{N}} \sum_{i=1}^N E[\beta^2 (U_{i,t-1}^2 - U_{i,t-2}^2)] = \frac{1}{N} \sum_{i=1}^N E(Z_{i,t}) = c^2 \sigma_u^2 \bar{\rho}_t.
\]

Since $u_{i,t}$ has four moments, $(U_{i,t-1}^2 - U_{i,t-2}^2)$ has two. Therefore, if $T$ is fixed, by a law of large numbers for independently (but not identically) distributed variables,

\[
\frac{1}{N} \sum_{i=1}^N [Z_{i,t} - c^2 \sigma_u^2 \rho_i^{2(t-2)}] \to_p 0 \tag{B.8}
\]

as $N \to \infty$. Establishing convergence is more difficult if both $N$ and $T$ are large. Suppose that $\rho_i^{2(t-2)} \to \rho_i(r) < \infty$ as $T \to \infty$ for $t = \lfloor rT \rfloor$ and $r \in [0, 1]$. Then, by Lemma 6 of Phillips and Moon (1999), it suffices to show that $\limsup_{N,T \to \infty} P(|N^{-1} \sum_{i=1}^N [Z_{i,t} - c^2 \sigma_u^2 \rho_i^{2(t-2)}]| > \varepsilon) = 0$ for all $\varepsilon > 0$. By using first Chebyshev's inequality, then the independence of $[Z_{i,t} - c^2 \sigma_u^2 \rho_i^{2(t-2)}]$ over $i$,

\[
\begin{aligned}
P\left(\left|\frac{1}{N} \sum_{i=1}^N [Z_{i,t} - c^2 \sigma_u^2 \rho_i^{2(t-2)}]\right| > \varepsilon\right)
&\leq \frac{1}{\varepsilon^2} E\left[\left(\frac{1}{N} \sum_{i=1}^N [Z_{i,t} - c^2 \sigma_u^2 \rho_i^{2(t-2)}]\right)^2\right] \\
&= \frac{1}{\varepsilon^2 N^2} \sum_{i=1}^N E([Z_{i,t} - c^2 \sigma_u^2 \rho_i^{2(t-2)}]^2) \\
&= \frac{1}{\varepsilon^2 N^2} \sum_{i=1}^N [E(Z_{i,t}^2) - c^4 \sigma_u^4 \rho_i^{4(t-2)}].
\end{aligned} \tag{B.9}
\]

Consider $E(U_{i,t}^2 U_{i,s}^2)$ for $t \geq s$. A direct calculation reveals that

\[
\begin{aligned}
E(U_{i,t}^2 U_{i,s}^2) &= \sum_{k=1}^t \sum_{j=1}^t \sum_{m=1}^s \sum_{n=1}^s \rho_i^{2(t+s)-k-j-m-n} E(u_{i,k} u_{i,j} u_{i,m} u_{i,n}) \\
&= \sum_{k=1}^t \sum_{m=1}^s \rho_i^{2(t+s-k-m)} E(u_{i,k}^2) E(u_{i,m}^2) + 2 \sum_{m=1}^s \sum_{n=1}^s \rho_i^{2(t+s-m-n)} E(u_{i,m}^2) E(u_{i,n}^2) \\
&\quad + \sum_{n=1}^s \rho_i^{2(t+s)-4n} E(u_{i,n}^4) \\
&= \sigma_u^4 \sum_{m=1}^s \rho_i^{2(t+s-m)} \left(\sum_{k=1}^t \rho_i^{-2k} + 2 \sum_{n=1}^s \rho_i^{-2n} + \kappa_x \rho_i^{-2m}\right),
\end{aligned} \tag{B.10}
\]

from which it follows that

\[
\begin{aligned}
E(Z_{i,t}^2) &= c^4 E(U_{i,t-1}^4 - 2 U_{i,t-1}^2 U_{i,t-2}^2 + U_{i,t-2}^4) \\
&= c^4 \sigma_u^4 \sum_{m=1}^{t-1} \rho_i^{2(2(t-1)-m)} \left(3 \sum_{n=1}^{t-1} \rho_i^{-2n} + \kappa_x \rho_i^{-2m}\right) \\
&\quad - 2 c^4 \sigma_u^4 \sum_{m=1}^{t-2} \rho_i^{2(2(t-1)-1-m)} \left(\sum_{k=1}^{t-1} \rho_i^{-2k} + 2 \sum_{n=1}^{t-2} \rho_i^{-2n} + \kappa_x \rho_i^{-2m}\right) \\
&\quad + c^4 \sigma_u^4 \sum_{m=1}^{t-2} \rho_i^{2(2(t-1)-2-m)} \left(3 \sum_{n=1}^{t-2} \rho_i^{-2n} + \kappa_x \rho_i^{-2m}\right)
\end{aligned}
\]

= c 4 σu4 (ρi − 1)2

t −2 

ρi2(2(t −1)−m)

=

m=1

 ×

3

t −2 

 ρi

−2n



ρi2(t −2−m)

N

m=1

+ c 4 σu4 (3 + κx ),

3/4

∆Dt =

(B.11)



([E (

δ2 N 2

)−

N 2β 

N 1/4 i=1

=√

N i =1

   N 1     2 ( t − 2 ) P  [Z − c 2 σu2 ρi ] > δ  N i=1 i,t  Zi2,t

4(t −2) c 4 u4 i

σ ρ

E [(N 3/4 ∆Dt )2 ] =

])

i=1

(B.12)

N 1 

[Zi,t − c 2 σu2 ρi2(t −2) ] →p 0

N 

=

(ηi − η)E (∆Ui,t −1 ) = 0.

E [(∆Ui,t )2 ] = E (Ui2,t − 2Ui,t Ui,t −1 + Ui2,t −1 )

 =σ

1 + (ρi − 1)

2

t −1 

ρ

2(t −1−j) i

 ,

(B.14)

j=1

which is O(1), regardless of whether |ρi | < 1 or ρi = 1. It follows that

β4

N 

  N     2  (ηi − η)∆Ui,t −1  = Op (1). β  i=1  It follows that

N i=1

N i=1

(γi − γ )2 E

N 4c 2 

N

(γi − γ )2 E [(∆Ui,t −1 )2 ]

i=1

(B.17)

√ Finally, consider ∆Ft . By using |Xi,t | = Op ( T ) and |ϵ t | = Op (N −1/2 ), we can show the following:     N N β    |c |  1     Xi,t −1 ϵ t  ≤ 1/4  Xi,t −1  |ϵ t |   N i =1   N  N i=1 √ −3/4 = Op ( T N ),

=

N β2 

N 4c 

where we have used the fact that ∆Xi,t = ∆Ui,t . The first term on the right-hand side is O(1), and therefore so is E [(N 3/4 ∆Dt )2 ]. It follows that

(ηi − η)2 E [(∆Ui,t −1 )2 ] = O(1).

N ∆Bt − c 2 σu2 ρ t = √

∆ X j ,t − 1 ,

+ O(N −1/2 ),

Therefore, since the variance is bounded,



N j=1



N i=1

Ft =

N c4 

N i=1

N 1 

∆Xj,t −1

from which it follows that

E [(ηi − η)2 (∆Ui,t −1 )2 ]

i=1

=

(γi − γ ) ∆Xi,t −1 −

√ | N ∆Dt | = Op (N −1/4 ).

Regarding the variance of this quantity, note that

2 u



N j =1



+ O(N −1/2 ) N 4  = c 2 σu2 (γi − γ )2 E [(∆Ui,t −1 )2 ]

(B.13)

i=1

i =1

(γi − γ ) ∆Xi,t −1 −

N 1 

N j=1

as N , T → ∞ with T /N → 0. Finally, for the first term on the right-hand side of ∆Bt , since E (∆Ui,t −1 ) = 0, E [(ηi − η)∆Ui,t −1 ] =



 2  N 1  ×  ∆Xi,t −1 − ∆ X j ,t − 1 

which is o(1), provided that T /N = o(1). Hence, by Lemma 6 of Phillips and Moon (1999),

N 

(B.16)

which has mean zero with variance

= O(TN −1 ),

N i=1

[Zi,t − c 2 σu2 ρi2(t −2) ]

N 2c 

which is O(T ) if ρi = 1. Therefore,

N 1 1 

N i =1

as N → ∞ or as N , T → ∞, provided that T /N → 0. Next, consider ∆Dt . We have

n =1

+ 2c 4 σu4 (3ρi2 − 1)

N 1 

+ Op (TN −3/2 ) = o(1)

+ κx ρi

−2m

t −2

1175

(Ui2,t −1 − Ui2,t −2 )

− c 2 σu2 ρ t + Op (TN −3/2 )

(B.15)

=

N 2β 

N

(Xi,t −1 − X t −1 )(ϵi,t − ϵ t )

i=1

N 2β 

N

N 2β 

N

Xi,t −1 (ϵi,t − ϵ t )

i=1



Xi,t −1 ϵi,t + Op ( T N −3/4 ),

(B.18)

i=1

where the second equality holds by taking deviation from means. Hence, since $X_{i,t-1}\epsilon_{i,t} - X_{i,t-2}\epsilon_{i,t-1} = \Delta X_{i,t-1}\epsilon_{i,t} + X_{i,t-2}\Delta\epsilon_{i,t}$,

\[
|\sqrt{N}\Delta F_t| \le \frac{|c|}{N^{1/4}}\left|\frac{2}{\sqrt{N}}\sum_{i=1}^N (\Delta X_{i,t-1}\epsilon_{i,t} + X_{i,t-2}\Delta\epsilon_{i,t})\right| + O_p(\sqrt{T}N^{-1/4}) = O_p(\sqrt{T}N^{-1/4}),
\tag{B.19}
\]

where the last equality uses the fact that $N^{-1/2}\sum_{i=1}^N \Delta X_{i,t-1}\epsilon_{i,t}$ and $N^{-1/2}\sum_{i=1}^N X_{i,t-2}\Delta\epsilon_{i,t}$ are $O_p(1)$ and $O_p(\sqrt{T})$, respectively. In order to appreciate this last result, note that, by cross-section independence (of $\epsilon_{i,t}$ and $X_{i,t-2}$),

\[
\begin{aligned}
E\left[\left(\frac{c}{\sqrt{N}}\sum_{i=1}^N X_{i,t-2}\Delta\epsilon_{i,t}\right)^2\right] &= \frac{c^2}{N}\sum_{i=1}^N\sum_{j=1}^N E(X_{i,t-2}X_{j,t-2}\Delta\epsilon_{i,t}\Delta\epsilon_{j,t}) \\
&= \frac{c^2}{N}\sum_{i=1}^N E[X_{i,t-2}^2(\Delta\epsilon_{i,t})^2] \\
&= \frac{c^2}{N}\sum_{i=1}^N E(X_{i,t-2}^2)E[(\Delta\epsilon_{i,t})^2] = O(T).
\end{aligned}
\]

When taken together with Theorem 1, which gives us the asymptotic distribution of $\sqrt{N}(\Delta C_t + \Delta E_t)$, the above results for $\Delta B_t$, $\Delta D_t$ and $\Delta F_t$ imply

\[
\sqrt{N}\Delta v_t - c^2\sigma_u^2\bar\rho_t = \sqrt{N}(\Delta C_t + \Delta E_t) + O_p(\sqrt{T}N^{-1/4}) \to_d \sigma_v Z_t
\tag{B.20}
\]

as $N \to \infty$ or as $N, T \to \infty$ with $\sqrt{T}/N^{1/4} \to 0$, where $Z_t \sim N(0, 1)$. This completes the proof of the theorem. □

Suppose that $\sigma_v$ is known. Theorem B.1 implies that

\[
V_t - \frac{c^2\sigma_u^2\bar\rho_t}{\sigma_v} \to_d Z_t
\tag{B.21}
\]

as $N \to \infty$ or as $N, T \to \infty$ jointly with $\sqrt{T}/N^{1/4} \to 0$. According to Theorem 1, the asymptotic null distribution of $V_t$ is given by $N(0, 1)$. This means that the power must be driven by $c^2\sigma_u^2\bar\rho_t/\sigma_v$, which comprises four parameters, $\bar\rho_t$, $c$, $\sigma_u^2$ and $\sigma_v^2$. The last two parameters are obviously positive. For the first, note that $\bar\rho_t > 0$ as long as $t$ is fixed. If $t \to \infty$, then

\[
\begin{aligned}
\bar\rho_t &= \frac{n}{N}\frac{1}{n}\sum_{i=1}^n \rho_i^{2(t-2)} + \frac{N-n}{N}\frac{1}{N-n}\sum_{i=n+1}^N \rho_i^{2(t-2)} \\
&= \delta\,\frac{1}{n}\sum_{i=1}^n \rho_i^{2(t-2)} + o(1).
\end{aligned}
\tag{B.22}
\]

Hence, since $\delta > 0$ and $\rho_i^{2(t-2)} = 1$ whenever $\rho_i = 1$, we have that $\bar\rho_t \to \delta > 0$ as $t \to \infty$. Thus, $\bar\rho_t$, $\sigma_u^2$ and $\sigma_v^2$ are all positive, and the extent of the power is determined by $c$. If $c = 0$, then we are back under the null, whereas if $c \ne 0$, the mean of the test statistic will be positive, and therefore the power will be larger than the size. The level of power is determined by the relative magnitudes of $c$, $\bar\rho_t$, $\sigma_u^2$ and $\sigma_v^2$. Note in particular that the power increases with both the level and the variability of $\rho_i$, as measured by $\bar\rho_t$. This is as expected, since, as was pointed out in Section 2, the more the properties of $X_{i,t}$ deviate from those of $\epsilon_{i,t}$, the easier it will be for the test to detect violations of the null.

Theorem B.1 gives the asymptotic local power function for $V_t$. However, by the same steps as were used in establishing Corollary 1, Theorem B.1 can be extended to provide the local power function for $V$ as well. Specifically,

\[
V - \frac{c^2\sigma_u^2}{\sigma_v}\sum_{t=2}^T \bar\rho_t \to_d Z
\tag{B.23}
\]

as $N \to \infty$ or as $N, T \to \infty$ jointly with $\sqrt{T}/N^{1/4} \to 0$. This discussion of the behaviour of $\bar\rho_t$ implies that, provided that $\delta > 0$, such that $\bar\rho_t > 0$, we have $\sum_{t=2}^T \bar\rho_t = O_p(T)$. Hence, if the alternative is true, so that $c \ne 0$,

\[
V = O_p(T).
\tag{B.24}
\]

This means that the power goes to one as $T \to \infty$, illustrating the potential to increase the power by allowing both $N$ and $T$ to go to infinity. In fact, it is not difficult to see that the appropriate local alternative to consider in the large-$T$ case is given by $\beta = cN^{-1/4}T^{-1/2}$, with the resulting drift in the local power function for $V$ being given by

\[
\frac{c^2\sigma_u^2}{\sigma_v}\frac{1}{T}\sum_{t=2}^T \bar\rho_t = \frac{c^2\sigma_u^2\delta}{\sigma_v}\frac{1}{nT}\sum_{i=1}^n\sum_{t=2}^T \rho_i^{2(t-2)} + o(1) \to \frac{c^2\sigma_u^2\delta}{\sigma_v},
\tag{B.25}
\]

where the last result is the same as for $V_t$.

Remark B.2. The rate of shrinkage of the local alternative measures the radius of the neighborhood around unity for which the power is non-negligible. Thus, a test that has power within $N^{-1/4}$ neighborhoods is more powerful than one whose local power is defined within $N^{-1/6}$ neighborhoods, say. While there are no other fixed-$T$ panel predictability tests, there are various tests that can be used when $T$ is large. Westerlund and Narayan (2015a) provide the only other study that we know of that has considered local power in the current panel predictability context. According to their results, the Lagrange multiplier (LM) test that they consider has local power within $N^{-1/2}T^{-1}$ neighborhoods. Thus, this test is more powerful than $V$. However, the LM test is also much more restrictive, requiring, among other things, $X_{1,t}, \ldots, X_{N,t}$ to be pure unit root processes that are independent of each other, which is too restrictive to hold in any empirically relevant scenario. This means that, while a rejection by this test could be due to predictability, it could also be due to size distortions.
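The comparison of neighborhoods in Remark B.2 can be made concrete with a small numerical sketch. The Python snippet below evaluates the three shrinkage rates mentioned in the text ($N^{-1/4}$, $N^{-1/4}T^{-1/2}$ and $N^{-1/2}T^{-1}$) at the panel dimensions quoted in the abstract ($N = 160$, $T = 17$). The function names are hypothetical and chosen only for illustration; the snippet says nothing about the constants multiplying these rates.

```python
def rate_fixed_t(n_units: int) -> float:
    """Rate N^(-1/4): local alternative for V_t when T is fixed."""
    return n_units ** -0.25


def rate_large_t(n_units: int, n_periods: int) -> float:
    """Rate N^(-1/4) * T^(-1/2): local alternative for V when T is large."""
    return n_units ** -0.25 * n_periods ** -0.5


def rate_lm(n_units: int, n_periods: int) -> float:
    """Rate N^(-1/2) * T^(-1): neighborhood reported for the LM test."""
    return n_units ** -0.5 / n_periods


if __name__ == "__main__":
    # Dimensions of the empirical illustration (assumed from the abstract).
    N, T = 160, 17
    print("fixed-T rate for V_t:", rate_fixed_t(N))
    print("large-T rate for V:  ", rate_large_t(N, T))
    print("LM test rate:        ", rate_lm(N, T))
```

A smaller rate means a tighter neighborhood in which the test still has non-negligible power, so the ordering of the three printed numbers mirrors the ranking discussed in the remark.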



Remark B.3. The requirement that $\sqrt{T}/N^{1/4} \to 0$ if $N, T \to \infty$ jointly is much stronger than in Theorem 1. However, this condition is only necessary for deriving the stated expression for the drift in the asymptotic distribution, $c^2\sigma_u^2\bar\rho_t$. Thus, while $\sqrt{T}/N^{1/4} \to 0$ can be relaxed under the local alternative as well, the resulting drift term would not be as informative as that in Theorem B.1.
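As a rough numerical illustration of the drift term, the following Python sketch evaluates $\bar\rho_t = N^{-1}\sum_{i=1}^N \rho_i^{2(t-2)}$ and the implied drift $c^2\sigma_u^2\bar\rho_t/\sigma_v$ for a purely hypothetical panel in which the first $n$ units have $\rho_i = 1$ and the remaining units share a common stationary value. The function names, the value 0.9 and the choice $n = 40$ are assumptions made for the example only, not quantities taken from the paper.

```python
def rho_bar(rho, t):
    """Cross-sectional average of rho_i^(2(t-2)); requires t >= 2."""
    if t < 2:
        raise ValueError("t must be at least 2")
    return sum(r ** (2 * (t - 2)) for r in rho) / len(rho)


def drift(c, sigma_u2, sigma_v, rho, t):
    """Drift c^2 * sigma_u^2 * rho_bar_t / sigma_v of the local power function."""
    return c ** 2 * sigma_u2 * rho_bar(rho, t) / sigma_v


if __name__ == "__main__":
    # Hypothetical panel: n unit-root units, N - n stationary units.
    N, n = 160, 40
    rho = [1.0] * n + [0.9] * (N - n)
    for t in (2, 5, 17, 100):
        print(t, rho_bar(rho, t), drift(1.0, 1.0, 1.0, rho, t))
```

In this setup $\bar\rho_t$ starts at one for $t = 2$ and decays towards the share of unit-root units, $n/N$, as $t$ grows, which is the behaviour described for $\bar\rho_t \to \delta$.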

References

Banz, R. W. (1981). The relationship between return and market value of common stocks. Journal of Financial Economics, 9, 3–18.
Beltratti, A. (2005). Capital market equilibrium with externalities, production and heterogeneous agents. Journal of Banking & Finance, 29, 3061–3073.
Breitung, J., & Demetrescu, M. (2015). Instrumental variable and variable addition based inference in predictive regressions. Journal of Econometrics, 187, 358–375.
Campbell, J. Y., & Shiller, R. J. (1988). Stock prices, earnings, and expected dividends. Journal of Finance, 43, 661–676.
Campbell, J. Y., & Thompson, S. B. (2008). Predicting excess stock returns out of sample: Can anything beat the historical average? Review of Financial Studies, 21, 1509–1531.
Campbell, J. Y., & Yogo, M. (2006). Efficient tests of stock return predictability. Journal of Financial Economics, 81, 27–60.
Cavanagh, C., Elliott, G., & Stock, J. (1995). Inference in models with nearly integrated regressors. Econometric Theory, 11, 1131–1147.
Chen, S.-S. (2009). Predicting the bear stock market: macroeconomic variables as leading indicators. Journal of Banking & Finance, 33, 211–223.
Chen, X., Kim, K., Yao, T., & Yu, T. A. (2010). On the predictability of Chinese stock returns. Pacific-Basin Finance Journal, 18, 403–425.
Cochrane, J. H. (2008). The dog that did not bark: A defense of return predictability. Review of Financial Studies, 21, 1533–1575.
Conrad, J., & Kaul, G. (1988). Time-variation in expected returns. Journal of Business, 61, 409–425.
Elliott, G., & Stock, J. (1994). Inference in time series regression when the order of integration of a regressor is unknown. Econometric Theory, 10, 672–700.
Fama, E. F., & French, K. R. (1988). Permanent and transitory components of stock prices. Journal of Political Economy, 96, 246–273.
Ferreira, M. A., & Santa-Clara, P. (2011). Forecasting stock market returns: The sum of the parts is more than the whole. Journal of Financial Economics, 100, 514–537.
Goyal, A., & Welch, I. (2003). Predicting the equity premium with dividend ratios. Management Science, 49, 639–654.
Guo, H., & Savickas, R. (2010). Relation between time-series and cross-sectional effects of idiosyncratic variance on stock returns. Journal of Banking & Finance, 34, 1637–1649.
Hadri, K., & Larsson, R. (2005). Testing for stationarity in heterogeneous panel data where the time dimension is finite. Econometrics Journal, 8, 55–69.
Hamburner, M., & Kochin, L. (1972). Money and stock prices: The channels of influences. Journal of Finance, 27, 231–249.
Hjalmarsson, E. (2008). The Stambaugh bias in panel predictive regressions. Finance Research Letters, 5, 47–58.
Hjalmarsson, E. (2010). Predicting global stock returns. Journal of Financial and Quantitative Analysis, 45, 49–80.
Im, K.-S., Pesaran, H., & Shin, Y. (2003). Testing for unit roots in heterogeneous panels. Journal of Econometrics, 115, 53–74.
Kauppi, H. (2001). Panel data limit theory and asymptotic analysis of a panel regression with near integrated regressors. In B. H. Baltagi, T. B. Fomby, & R. C. Hill (Eds.), Advances in econometrics: Vol. 15. Nonstationary panels, panel cointegration, and dynamic panels (pp. 239–274). Emerald Group Publishing Limited.
Keim, D. B. (1983). Size-related anomalies and stock return seasonality: Further empirical evidence. Journal of Financial Economics, 12, 13–32.
Kostakis, A., Magdalinos, T., & Stamatogiannis, M. P. (2015). Robust econometric inference for stock return predictability. Review of Financial Studies, 28, 1506–1553.
Kothari, S. P., & Shanken, J. (1997). Book-to-market, dividend yield, and expected market returns: A time series analysis. Journal of Financial Economics, 44, 169–203.
Lanne, M. (2002). Testing the predictability of stock returns. The Review of Economics and Statistics, 84, 407–415.
Lewellen, J. (2004). Predicting returns with financial ratios. Journal of Financial Economics, 74, 209–235.
Lo, A. W., & MacKinlay, A. C. (1988). Stock market prices do not follow random walks: evidence from a simple specification test. Review of Financial Studies, 1, 41–66.
Narayan, P. K., & Sharma, S. S. (2011). New evidence on oil price and firm returns. Journal of Banking & Finance, 35, 3253–3262.
Nelson, C. R., & Kim, M. J. (1993). Predictable stock returns: the role of small sample bias. Journal of Finance, 48, 641–661.
Patelis, A. D. (1997). Stock return predictability and the role of monetary policy. Journal of Finance, 52, 1951–1972.
Pennings, J. M. E., & Garcia, P. (2004). Hedging behaviour in small and medium-sized enterprises: The role of unobserved heterogeneity. Journal of Banking & Finance, 28, 951–978.
Phillips, P. C. B., & Moon, H. R. (1999). Linear regression limit theory of nonstationary panel data. Econometrica, 67, 1057–1111.
Polk, C., Thompson, S., & Vuolteenaho, T. (2006). Cross-sectional forecasts of the equity premium. Journal of Financial Economics, 81, 101–141.
Pontiff, J., & Schall, L. D. (1998). Book-to-market ratios as predictors of market returns. Journal of Financial Economics, 49, 141–160.
Rapach, D. E., Strauss, J. K., & Zhou, G. (2010). Out-of-sample equity premium prediction: Combination forecasts and links to the real economy. Review of Financial Studies, 23, 821–862.
Reinganum, M. R. (1981). Misspecification of capital asset pricing, empirical anomalies based on earnings yields and market values. Journal of Financial Economics, 9, 19–46.
Stambaugh, R. F. (1999). Predictive regressions. Journal of Financial Economics, 54, 375–421.
Thorbecke, W. (1997). On stock market returns and monetary policy. Journal of Finance, 52, 635–654.
Welch, I., & Goyal, A. (2008). A comprehensive look at the empirical performance of equity premium prediction. Review of Financial Studies, 21, 1455–1508.
Westerlund, J. (2015). The effect of recursive detrending on panel unit root tests. Journal of Econometrics, 185, 452–467.
Westerlund, J., & Breitung, J. (2013). Lessons from a decade of IPS and LLC. Econometric Reviews, 32, 547–591.
Westerlund, J., & Larsson, R. (2012). Testing for unit roots in a panel random coefficient model. Journal of Econometrics, 167, 254–273.
Westerlund, J., & Narayan, P. K. (2012). Does the choice of estimator matter when forecasting returns? Journal of Banking & Finance, 36, 2632–2640.
Westerlund, J., & Narayan, P. K. (2015a). A random coefficient approach to the predictability of stock returns in panels. Journal of Financial Econometrics, 13, 605–664.
Westerlund, J., & Narayan, P. K. (2015b). Testing for predictability in conditionally heteroskedastic stock returns. Journal of Financial Econometrics, 13, 342–375.

Joakim Westerlund is Professor of Economics at Lund University (80%) and Professor of Financial Econometrics at Deakin University (20%). Paresh Narayan is Alfred Deakin Professor and Professor of Finance at Deakin University.