Pitfalls in market timing test


Economics Letters 103 (2009) 123–126

Journal homepage: www.elsevier.com/locate/ecolet

Chia-Shang Chu ⁎, Liping Lu, Zhentao Shi

China Center for Economic Research, Peking University, Beijing 100871, China

Article info

Article history: Received 23 July 2007; received in revised form 24 January 2009; accepted 27 January 2009; available online 20 February 2009.

Abstract

Henriksson and Merton's market timing test suffers nontrivial size distortion when the observations are serially dependent sequences. Potential danger of finding spurious timing ability can be avoided by implementing a Markov regression that includes the lagged dependent variables as additional explanatory variables.

© 2009 Elsevier B.V. All rights reserved.

Keywords: Market timing test; Markov regression; Spurious regression

JEL classification: C12; C22

1. Introduction

Making accurate event forecasts is important in many financial and economic applications. The ability to forecast an event, such as whether stock returns will exceed the yield on Treasury bills in the next period, is crucial to a fund manager's strategic asset allocation. In designing an early warning system, a binary warning signal against, for example, a currency crisis (Kaminsky et al., 1998) is usually obtained by truncating some real-valued leading indicators, and the reliability of such a warning signal obviously depends on its event forecast ability.

While the methods for assessing the accuracy of real-valued forecasts are well developed in econometrics (Diebold and Mariano, 1995; Granger and Newbold, 1986; West, 1996), they do not seem readily applicable to event forecasts. Although a crude measure such as the percentage of correct predictions can easily be calculated by matching the event forecasts with the realized event time series, it is informal and admits no further statistical analysis.

The “market timing” test developed by Henriksson and Merton (1981) is seminal (HM test, henceforth). The null hypothesis is that the event forecasts have no timing ability for the event time series of interest. Since both the event forecasts and the event time series of interest are binary sequences, the HM test is equivalent to a test for the independence of two binary time series. The HM test has become a pillar of the literature on event forecast accuracy. Subsequently proposed measures of event forecast accuracy, such as the Kuiper score of Granger and Pesaran (2000) and the adjusted noise-to-signal ratio of Kaminsky et al. (1998), can be shown to be closely related to the HM measure. Specifically, the Kuiper score is the sample counterpart of the HM measure.

The Henriksson and Merton (1981) market timing test is an exact small sample test with a hypergeometric null distribution under the independently and identically distributed (i.i.d., henceforth) assumption. Pesaran and Timmermann (1992) derive the asymptotic normal distribution of their market timing test. Granger and Pesaran (2000) further elaborate Pesaran and Timmermann's (1992) test to connect it with the Kuiper score. However, the statistical properties of the adjusted noise-to-signal ratio are not discussed by Kaminsky et al. (1998).

Empirical applications of the HM test are abundant; cf. Henriksson (1984), Cumby and Modest (1987), McIntosh and Dorfman (1992), Gencay (1998), Cramer (1999), Buchanan et al. (2001), Greer (2003), and Romacho and Cortez (2006), to name a few. However, conducting the HM test may not be as straightforward as it seems, because the event time series and the event forecasts may well be serially dependent, which clearly violates the i.i.d. assumption required by the HM test. Bollen and Busse (2001) and Greer (2003) express similar concern. Some methodological extensions of the HM test developed in Jiang (2003) and Marquering and Verbeek (2004) also require the i.i.d. assumption. These authors are very aware of the problem that may arise from the violation of the i.i.d. condition. Jiang (2003) admits that his test is vulnerable to serial correlation. Marquering and Verbeek (2004) conduct simulations and show that their test suffers from size distortion when serial correlation is present.

The purpose of this article is to investigate the performance of the HM test when it is applied to test for the independence of two individually autocorrelated binary time series. We show that there is a potential danger of finding spurious timing ability. In other words, a researcher may find himself rejecting the null hypothesis of no timing ability too often. As a result, an event forecast model that has no timing ability may be mistakenly valued. This is parallel to the spurious regression documented in Granger and Newbold (1986).

The rest of this paper is organized as follows. In Section 2 we begin with an i.i.d. bivariate Bernoulli model and discuss various measures of event forecast accuracy. Section 3 contains empirical examples of spurious timing ability and simulations of the HM test. There we investigate the size distortion and the resulting pitfalls due to autocorrelated event forecasts and event time series, and propose an empirical strategy to account for serial dependence. Section 4 concludes.

⁎ Corresponding author. Tel.: +86 10 6275 8935; fax: +86 10 6275 1474. E-mail address: [email protected] (C.-S. Chu).

0165-1765/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.econlet.2009.01.034

2. Market timing tests and measures of event forecast accuracy: a synthesis

Let $\{X_t \in \{0, 1\},\ t = 1, \ldots, T\}$ be a sequence of event forecasts, and $\{Y_t \in \{0, 1\},\ t = 1, \ldots, T\}$ be the event time series of interest. For example, if one forecasts that the market will be up at time $t$, then $X_t = 1$, and if the market is indeed up at time $t$, then $Y_t = 1$. Let the joint probability function of $X_t$ and $Y_t$ be

$$\Pr(X_t = x_t, Y_t = y_t) = P_{11}^{x_t y_t}\, P_{10}^{(1 - y_t) x_t}\, P_{01}^{y_t (1 - x_t)}\, P_{00}^{(1 - y_t)(1 - x_t)}, \tag{1}$$

where $x_t, y_t = 0, 1$ and $P_{00} + P_{01} + P_{10} + P_{11} = 1$. It is easy to compute the marginal distributions of $X_t$ and $Y_t$, which both obey univariate Bernoulli distributions, i.e.

$$\Pr(X_t = x_t) = P_{1\cdot}^{x_t} (1 - P_{1\cdot})^{1 - x_t}, \quad \text{where } P_{1\cdot} = \Pr(X_t = 1) = P_{10} + P_{11},$$

$$\Pr(Y_t = y_t) = P_{\cdot 1}^{y_t} (1 - P_{\cdot 1})^{1 - y_t}, \quad \text{where } P_{\cdot 1} = \Pr(Y_t = 1) = P_{01} + P_{11}.$$

The HM test is based on the sum of the two conditional probabilities in Eq. (2), also known as the HM measure. Henriksson and Merton (1981) show that a necessary and sufficient condition for an event forecast to have no value (or no timing ability) is that the HM measure equals unity:

$$H_0: \Pr(X_t = 0 \mid Y_t = 0) + \Pr(X_t = 1 \mid Y_t = 1) = 1. \tag{2}$$

Routine computation shows that

$$\Pr(X_t = 0 \mid Y_t = 0) + \Pr(X_t = 1 \mid Y_t = 1) - 1 = \frac{P_{00}}{1 - P_{\cdot 1}} + \frac{P_{11}}{P_{\cdot 1}} - 1 = \frac{P_{11} P_{00} - P_{10} P_{01}}{(1 - P_{\cdot 1}) P_{\cdot 1}}. \tag{3}$$
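The identity in Eq. (3) can be checked numerically. The following Python sketch is illustrative only: the function names and the two sets of joint probabilities are hypothetical and are not taken from the paper. The first index of each probability refers to the forecast $X_t$ and the second to the event $Y_t$, as in Eq. (1).

```python
def hm_measure(p00, p01, p10, p11):
    """Pr(X=0 | Y=0) + Pr(X=1 | Y=1); in p_ij, i is the forecast X and j the event Y."""
    p_y1 = p01 + p11  # P_{.1} = Pr(Y = 1)
    return p00 / (1.0 - p_y1) + p11 / p_y1


def eq3_rhs(p00, p01, p10, p11):
    """Right-hand side of Eq. (3)."""
    p_y1 = p01 + p11
    return (p11 * p00 - p10 * p01) / ((1.0 - p_y1) * p_y1)


def covariance(p00, p01, p10, p11):
    """cov(X, Y) = Pr(X=1, Y=1) - Pr(X=1) * Pr(Y=1)."""
    return p11 - (p10 + p11) * (p01 + p11)


# Hypothetical dependent case: the forecast carries some information.
dependent = dict(p00=0.35, p01=0.15, p10=0.20, p11=0.30)
# Hypothetical independent case: Pr(X=1) = 0.4, Pr(Y=1) = 0.7, joint = product.
independent = dict(p00=0.18, p01=0.42, p10=0.12, p11=0.28)

for name, p in (("dependent", dependent), ("independent", independent)):
    # HM measure minus one, right-hand side of Eq. (3), and the covariance
    print(name, hm_measure(**p) - 1.0, eq3_rhs(**p), covariance(**p))
```

In the independent case the HM measure equals one and the covariance is zero up to floating-point rounding, while in the dependent case both sides of Eq. (3) agree and are nonzero.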

In other words, Eq. (3) shows that the null hypothesis of the HM test is equivalent to $\operatorname{cov}(X_t, Y_t) = 0$, since $\operatorname{cov}(X_t, Y_t) = P_{11} - P_{1\cdot} P_{\cdot 1} = P_{11} P_{00} - P_{10} P_{01}$. Conditional expectations can also be computed straightforwardly:

$$E(Y_t \mid X_t = 1) = \frac{P_{11}}{P_{1\cdot}}, \qquad E(Y_t \mid X_t = 0) = \frac{P_{01}}{1 - P_{1\cdot}},$$

or equivalently

$$E(Y_t \mid X_t) = X_t \frac{P_{11}}{P_{1\cdot}} + (1 - X_t) \frac{P_{01}}{1 - P_{1\cdot}} = \frac{P_{01}}{1 - P_{1\cdot}} + \left( \frac{P_{11}}{P_{1\cdot}} - \frac{P_{01}}{1 - P_{1\cdot}} \right) X_t. \tag{4}$$

In view of Eq. (4), the regression coefficient of $Y_t$ on $X_t$ is

$$\frac{P_{11}}{P_{1\cdot}} - \frac{P_{01}}{1 - P_{1\cdot}} = \frac{P_{11} P_{00} - P_{10} P_{01}}{(1 - P_{1\cdot}) P_{1\cdot}}.$$

It is useful to cast the HM test into the regression in Eq. (4). In a large sample, it is computationally easier to carry out the market timing test via the regression t test, since the normal approximation to the t statistic is valid. Moreover, one way to characterize spurious regression is through the inflated t statistic in regression (4).

Arrange the observations of $X_t$ and $Y_t$ in the following two-by-two contingency table, in which $X_t$ is the forecast and $Y_t$ is the realization.

| | $Y_t = 0$ | $Y_t = 1$ |
|---|---|---|
| $X_t = 0$ | $n_{00}$ | $n_{01}$ |
| $X_t = 1$ | $n_{10}$ | $n_{11}$ |

Granger and Pesaran (2000) measure event forecast accuracy with the Kuiper score, defined as

$$KS = \frac{n_{00}}{n_{00} + n_{10}} - \frac{n_{01}}{n_{01} + n_{11}}. \tag{5}$$

The event forecast is said to have no skill if $KS = 0$. Rewrite Eq. (5) as

$$KS = \frac{n_{00}}{n_{00} + n_{10}} - \frac{n_{01}}{n_{01} + n_{11}} = \frac{n_{00}}{n_{00} + n_{10}} - \left( 1 - \frac{n_{11}}{n_{01} + n_{11}} \right) = \frac{n_{00}}{n_{00} + n_{10}} + \frac{n_{11}}{n_{01} + n_{11}} - 1. \tag{6}$$

It is clear from Eq. (6) that $KS = 0$ is exactly the sample counterpart of Eq. (2). Thus, if the null hypothesis of the HM test is true, the Kuiper score converges to zero in probability. Though Henriksson and Merton derive an exact small sample test for Eq. (3) based on a hypergeometric null distribution, it is possible to derive the asymptotic null distribution of KS from the limiting theory of MLE, using the product of Eq. (1) as the likelihood function.

Kaminsky et al. (1998) pioneer the “leading indicator” approach to predicting currency crises. To measure the event forecast accuracy of a leading indicator, they suggest using the “adjusted noise-to-signal ratio”, defined as

$$N/S = \frac{n_{10} / (n_{00} + n_{10})}{n_{11} / (n_{01} + n_{11})}, \tag{7}$$

provided $n_{11} \neq 0$. The numerator in Eq. (7), $n_{10} / (n_{00} + n_{10})$, is the proportion of false alarms given no crisis, and the denominator is the proportion of successful forecasts given that a crisis has occurred. Intuitively, one prefers a crisis forecast with a low N/S. While Eq. (7) can be easily calculated, Kaminsky et al. (1998) do not provide a formal statistical test to determine whether a particular value of N/S is evidence of prediction ability. Simply comparing the N/S ratios of two leading indicators when both have no forecasting ability does not seem meaningful. Evidently from Eqs. (6) and (7),

$$N/S = 1 - KS \Big/ \frac{n_{11}}{n_{01} + n_{11}}.$$

It follows that $N/S = 1$ if and only if $KS = 0$. In other words, the N/S measure is as informative as the Kuiper score.

3. Market timing tests for autocorrelated binary time series

Consider the following empirical example. Let $r_t$ be the first difference of China's money supply M1, which includes 348 observations dated from January 1978 to December 2006. The event time series of interest, representing the ups and downs of the money supply, is defined as

$$Y_t = \begin{cases} 1, & \text{if } r_t \geq 0, \\ 0, & \text{if } r_t < 0. \end{cases} \tag{8}$$
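Given a realized event series built as in Eq. (8) and any binary forecast series, the contingency counts and the accuracy measures of Section 2 can be computed directly. The Python sketch below is illustrative only: the return values, the forecast series and the function names are hypothetical. The N/S ratio is computed here as the false-alarm share $n_{10}/(n_{00}+n_{10})$ divided by the hit share $n_{11}/(n_{01}+n_{11})$, which is the form that satisfies $N/S = 1 - KS / [n_{11}/(n_{01}+n_{11})]$.

```python
def contingency(x, y):
    """Counts n[i][j] = #{t : x_t = i, y_t = j}; rows are forecasts, columns realizations."""
    n = [[0, 0], [0, 0]]
    for xt, yt in zip(x, y):
        n[xt][yt] += 1
    return n


def kuiper_score(n):
    """Kuiper score of Eq. (5)."""
    return n[0][0] / (n[0][0] + n[1][0]) - n[0][1] / (n[0][1] + n[1][1])


def noise_to_signal(n):
    """False-alarm share given no event, divided by hit share given an event."""
    return (n[1][0] / (n[0][0] + n[1][0])) / (n[1][1] / (n[0][1] + n[1][1]))


# Hypothetical first differences r_t and the event series Y_t of Eq. (8).
r = [0.5, -0.2, 0.1, 0.3, -0.4, 0.2, -0.1, 0.6]
y = [1 if rt >= 0 else 0 for rt in r]
# Hypothetical binary forecasts X_t (not produced by any model in the paper).
x = [1, 0, 0, 1, 0, 1, 1, 1]

n = contingency(x, y)
ks = kuiper_score(n)
ns = noise_to_signal(n)
print(n, ks, ns)
```

For these made-up series the counts are $n_{00} = 2$, $n_{01} = 1$, $n_{10} = 1$ and $n_{11} = 4$, so the hit share is $4/5$ and the relation between N/S and KS can be verified by hand.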

In order to forecast the realization $Y_t$, we use a constant-mean-return model, that is, $r_t = c_t + \varepsilon_t$. We construct a 12-observation estimation window, so the one-step-ahead predictor is

$$E(r_t \mid r_{t-12}, \ldots, r_{t-1}) = \hat{c}_t = \frac{1}{12} \sum_{i = t-12}^{t-1} r_i, \qquad t \in \{13, 14, \ldots, 348\}.$$

We denote the event forecast $X_t$ as

$$X_t = \begin{cases} 1, & \text{if } \hat{c}_t \geq 0, \\ 0, & \text{if } \hat{c}_t < 0. \end{cases} \tag{9}$$

Both binary time series exhibit serial correlation: the first order sample autocorrelations are $\rho_Y(1) = 0.223$ and $\rho_X(1) = 0.666$. We estimate the regression of $Y_t$ on a constant and $X_t$ to examine whether $X_t$ has timing ability for $Y_t$:

$$Y_t = \underset{(0.108)}{0.563} + \underset{(0.11)}{0.199}\, X_t + e_t. \tag{10}$$

The p-value of the t statistic is 0.0723. Under the HM test, the p-value is 0.046. Both tests indicate that the event forecast $X_t$ has timing ability for the event time series $Y_t$ at the ten percent significance level. Note that the Durbin–Watson statistic in regression (10) is 1.52, which indicates strong sample autocorrelation in the residuals.

For another example, let $r_t$ denote the first difference of the monthly price of the US five-year Treasury note from January 1962 to June 2007. $Y_t$ and $X_t$ are defined as in Eqs. (8) and (9), respectively. The first order sample autocorrelations are $\rho_Y(1) = 0.136$ and $\rho_X(1) = 0.801$. Again we regress $Y_t$ on a constant and $X_t$:

$$Y_t = \underset{(0.03)}{0.473} + \underset{(0.043)}{0.073}\, X_t + e_t. \tag{11}$$

The p-values of the t statistic and the HM test are 0.090 and 0.074, respectively. Both reject the null hypothesis at the ten percent level.

We conduct simulations to investigate how serial correlation distorts the market timing tests. We generate $X_t$ and $Y_t$ independently from two-state first order Markov chains, so that by construction the correlation of $X_t$ and $Y_t$ is zero; in other words, $X_t$ has no timing ability in predicting $Y_t$. Markov chains exhibit serial dependence. In our simulations we use the first order autocorrelation to characterize various degrees of serial dependence in $X_t$ and $Y_t$. The empirical sizes of the HM test and the regression t test, calculated from 5000 replications, are reported in Table 1. When there is no serial dependence in $X_t$ and $Y_t$, as in the first row, there is no sign of inflated size in either test. The HM test appears a little conservative. This is because the null distribution of the HM test is a discrete hypergeometric distribution: an integer-valued critical value corresponding to exactly five percent tail probability often does not exist when the sample size is small.

Table 1
Simulated probabilities of type I error when individual time series are autocorrelated.

| DGP | HM test, T = 50 | HM test, T = 100 | HM test, T = 200 | Regression t test, T = 50 | Regression t test, T = 100 | Regression t test, T = 200 |
|---|---|---|---|---|---|---|
| $\rho_Y(1) = 0.0$; $\rho_X(1) = 0.0$ | 0.030 | 0.034 | 0.038 | 0.055 | 0.052 | 0.050 |
| $\rho_Y(1) = 0.3$; $\rho_X(1) = 0.3$ | 0.038 | 0.046 | 0.049 | 0.072 | 0.070 | 0.070 |
| $\rho_Y(1) = 0.5$; $\rho_X(1) = 0.5$ | 0.075 | 0.086 | 0.106 | 0.123 | 0.129 | 0.133 |
| $\rho_Y(1) = 0.7$; $\rho_X(1) = 0.7$ | 0.182 | 0.211 | 0.227 | 0.245 | 0.252 | 0.261 |
| $\rho_Y(1) = 0.0$; $\rho_X(1) = 0.6$ | 0.028 | 0.034 | 0.036 | 0.054 | 0.054 | 0.052 |
| $\rho_Y(1) = 0.6$; $\rho_X(1) = 0.0$ | 0.027 | 0.033 | 0.038 | 0.054 | 0.056 | 0.056 |

The nominal size is 5%. Simulation results for the 10% nominal size are available on request. The number of replications is 5000. $\rho_X(1)$ and $\rho_Y(1)$ denote the first order autocorrelations of X and Y, respectively.

The second to fourth rows correspond to cases of slight, moderate, and high serial dependence, respectively, with individual autocorrelation coefficients equal to 0.3, 0.5, and 0.7. As the individual autocorrelations grow stronger, the size distortion becomes more evident. In particular, when both first order autocorrelation coefficients reach 0.7, the empirical size of the t test more than quadruples the 5% nominal size. We also observe nontrivial size distortion in the case of moderate serial dependence. We do not consider autocorrelations higher than 0.7. The serial dependence in binary time series in the market timing context is usually not too strong, particularly when the binary time series is obtained by truncating the value of some real-valued process. The correlation attenuation from such “clipping” is well documented in the statistics literature (Kedem, 1981).

The last two rows of Table 1 suggest that size distortion arises only when both binary time series $X_t$ and $Y_t$ are serially dependent. If either $X_t$ or $Y_t$ is serially independent, there is little sign of size distortion. This is similar to Granger and Newbold's (1986, p. 207) argument on spurious regression. We also generate independent non-Markovian binary time series $X_t$ and $Y_t$, both via the discrete-ARMA(1,1) model of Jacobs and Lewis (1978) and MacDonald and Zucchini (1997). As shown in Table 2, non-Markovian processes appear to worsen the problem of spurious timing ability.

Table 2
Simulated probabilities of type I error when individual time series are autocorrelated and non-Markovian.

| DGP | HM test, T = 50 | HM test, T = 100 | HM test, T = 200 | Regression t test, T = 50 | Regression t test, T = 100 | Regression t test, T = 200 |
|---|---|---|---|---|---|---|
| $\rho_Y(1) = 0.3$; $\rho_X(1) = 0.3$ | 0.038 | 0.050 | 0.057 | 0.071 | 0.075 | 0.075 |
| $\rho_Y(1) = 0.5$; $\rho_X(1) = 0.5$ | 0.107 | 0.119 | 0.137 | 0.160 | 0.160 | 0.168 |
| $\rho_Y(1) = 0.7$; $\rho_X(1) = 0.7$ | 0.231 | 0.284 | 0.310 | 0.301 | 0.337 | 0.343 |

To solve the problem of inflated size, we suggest estimating the following regression:

$$E(Y_t \mid X_t, Y_{t-1}) = \gamma_0 + \gamma_1 X_t + \theta Y_{t-1}, \tag{12}$$

which is a special case of the Markov regression models in Cox (1970) and Zeger and Qaqish (1988). This is not to be interpreted as a modeling strategy for $Y_t$, but rather as an artificial regression for implementing the market timing test. The intuition is quite simple: the lagged dependent variable should remove the serial correlation in the binary regression. The advantage of this remedy is that we do not need to propose a specific DGP for $(X_t, Y_t)$.

It is possible to derive a Wald test for timing ability if $(X_t, Y_t)$ follows a four-state Markov model. In this approach, one has to commit to a particular specification of the DGP of $(X_t, Y_t)$ and employ limit theory for maximum likelihood estimation. Details are available on request. Alternatively, one can model the marginal distributions of the series parametrically or non-parametrically, and base a market timing test on the copula (Chen and Fan, 2006). This approach is currently under the authors' investigation.

To examine the performance of the Markov regression in Eq. (12), we re-estimate Eq. (10) with an additional lagged dependent variable:

$$Y_t = \underset{(0.011)}{0.472} + \underset{(0.011)}{0.106}\, X_t + \underset{(0.052)}{0.240}\, Y_{t-1} + e_t. \tag{13}$$

Eq. (11) is re-estimated as well:

$$Y_t = \underset{(0.036)}{0.426} + \underset{(0.044)}{0.043}\, X_t + \underset{(0.044)}{0.126}\, Y_{t-1} + e_t. \tag{14}$$
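A small Monte Carlo sketch can illustrate the size comparison between the simple regression in Eq. (4) and the Markov regression in Eq. (12). The Python code below is illustrative only and rests on assumptions that are not taken from the paper: $X_t$ and $Y_t$ are independent symmetric two-state Markov chains with stationary probability 0.5, so that a staying probability of $(1 + \rho)/2$ gives lag-one autocorrelation $\rho$; the t statistic uses classical OLS standard errors with the normal critical value 1.96; and the function names `markov_chain`, `invert`, `t_stat` and `empirical_sizes` are hypothetical.

```python
import math
import random


def markov_chain(T, rho, rng):
    """Symmetric two-state chain with Pr(stay) = (1 + rho) / 2 (lag-1 autocorrelation rho)."""
    stay = (1.0 + rho) / 2.0
    state = 1 if rng.random() < 0.5 else 0  # start from the stationary distribution
    path = [state]
    for _ in range(T - 1):
        if rng.random() >= stay:
            state = 1 - state
        path.append(state)
    return path


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-10:
            raise ValueError("singular design matrix")
        m[c], m[p] = m[p], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def t_stat(y, cols, j):
    """Classical OLS t statistic for the coefficient on regressor cols[j]."""
    n, k = len(y), len(cols)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    inv = invert(xtx)
    beta = [sum(inv[i][l] * xty[l] for l in range(k)) for i in range(k)]
    rss = sum((y[t] - sum(beta[i] * cols[i][t] for i in range(k))) ** 2
              for t in range(n))
    return beta[j] / math.sqrt(rss / (n - k) * inv[j][j])


def empirical_sizes(T, rho_x, rho_y, reps, seed=0, crit=1.96):
    """Rejection rates of the no-timing null: (simple regression, Markov regression)."""
    rng = random.Random(seed)
    rej_simple = rej_markov = done = 0
    while done < reps:
        x = markov_chain(T, rho_x, rng)
        y = markov_chain(T, rho_y, rng)
        const = [1.0] * (T - 1)
        try:
            t_simple = t_stat(y[1:], [const, x[1:]], 1)
            t_markov = t_stat(y[1:], [const, x[1:], y[:-1]], 1)
        except (ValueError, ZeroDivisionError):
            continue  # degenerate draw (e.g. a constant series); redraw
        rej_simple += abs(t_simple) > crit
        rej_markov += abs(t_markov) > crit
        done += 1
    return rej_simple / reps, rej_markov / reps


size_simple, size_markov = empirical_sizes(T=100, rho_x=0.7, rho_y=0.7, reps=1000)
print(f"simple regression: {size_simple:.3f}, Markov regression: {size_markov:.3f}")
```

With both autocorrelations set to 0.7, the rejection rate of the simple regression should be well above the 5% nominal level, while that of the Markov regression should stay close to it. The exact figures depend on the seed and on the assumed DGP, so they need not match the paper's tables.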

The p-values of the t statistics of $X_t$ in Eqs. (13) and (14) are 0.338 and 0.329, respectively, so both fail to reject the null hypothesis of no timing ability. Simulations similar to those in Tables 1 and 2 for the t test based on the Markov regression are reported in Table 3. As seen from the first row of Table 3, when there is no serial dependence in the individual binary time series, the Markov regression with an additional lagged dependent variable is innocuous. When both $X_t$ and $Y_t$ are individually autocorrelated, the Markov regression does a good job of correcting the inflated sizes, compared with the results presented in Tables 1 and 2.

Table 3
Simulated probabilities of type I error of the t test based on the Markov regression.

| DGP | Markovian, T = 50 | Markovian, T = 100 | Markovian, T = 200 | Non-Markovian, T = 50 | Non-Markovian, T = 100 | Non-Markovian, T = 200 |
|---|---|---|---|---|---|---|
| $\rho_Y(1) = 0.0$; $\rho_X(1) = 0.0$ | 0.046 | 0.046 | 0.048 | | | |
| $\rho_Y(1) = 0.3$; $\rho_X(1) = 0.3$ | 0.050 | 0.051 | 0.054 | 0.058 | 0.051 | 0.054 |
| $\rho_Y(1) = 0.5$; $\rho_X(1) = 0.5$ | 0.057 | 0.052 | 0.050 | 0.071 | 0.059 | 0.060 |
| $\rho_Y(1) = 0.7$; $\rho_X(1) = 0.7$ | 0.071 | 0.059 | 0.059 | 0.088 | 0.082 | 0.068 |
| $\rho_Y(1) = 0.0$; $\rho_X(1) = 0.6$ | 0.051 | 0.052 | 0.042 | | | |
| $\rho_Y(1) = 0.6$; $\rho_X(1) = 0.0$ | 0.043 | 0.049 | 0.053 | | | |

For a discrete-ARMA(1,1) DGP, it is impossible to consider the case of $\rho_Y(1) = 0.0$ or $\rho_X(1) = 0.0$.

4. Concluding remarks

The null hypothesis of no timing ability states that the event forecast is independent of the event time series. The HM test and its equivalent t test based on a simple binary regression suffer nontrivial size distortion when the event forecasts and the event time series are autocorrelated. The stronger the autocorrelation, the larger the size inflation. Thus event forecasts that have no forecast skill may appear to have timing ability because of the distorted test size. Such spurious timing ability is parallel to the spurious regression documented in Granger and Newbold (1974), who show that the regression t test can be misleading when a regression of two independent AR(1) processes is estimated. In practice, it is important first to examine whether serial correlation is present in the event forecasts and the event time series. If so, the standard market timing test can be quite misleading, and we recommend using the t test based on the Markov regression.

References

Bollen, N.P.B., Busse, J.A., 2001. On the timing ability of mutual fund managers. The Journal of Finance 56 (3), 1075–1094.

Buchanan, W., Hodges, P., Theis, J., 2001. Which way the natural gas price: an attempt to predict the direction of natural gas spot price movements using trader positions. Energy Economics 23, 279–293.

Chen, X., Fan, Y., 2006. Estimation of copula-based semiparametric time series models. Journal of Econometrics 130, 307–335.

Cox, D.R., 1970. The Analysis of Binary Data. Chapman and Hall, London.

Cramer, J.S., 1999. Predictive performance of the binary logit model in unbalanced samples. The Statistician 48 (1), 85–94.

Cumby, M., Modest, D., 1987. Testing for market timing ability: a framework for forecast evaluation. Journal of Financial Economics 19, 169–189.

Diebold, F., Mariano, R., 1995. Comparing predictive accuracy. Journal of Business and Economic Statistics 13, 253–265.

Gencay, R., 1998. Optimization of technical trading strategies and the profitability in security markets. Economics Letters 59, 249–254.

Granger, C., Newbold, P., 1974. Spurious regressions in econometrics. Journal of Econometrics 2, 111–120.

Granger, C., Newbold, P., 1986. Forecasting Economic Time Series, 2nd edn. Academic Press.

Granger, C.W.J., Pesaran, M.H., 2000. Economic and statistical measures of forecast accuracy. Journal of Forecasting 19, 537–560.

Greer, M., 2003. Directional accuracy tests of long-term interest rate forecasts. International Journal of Forecasting 19 (19), 291–298.

Henriksson, R.D., 1984. Market timing and mutual fund performance: an empirical investigation. The Journal of Business 57 (1), 73–96.

Henriksson, R.D., Merton, R.C., 1981. On market timing and investment performance. II. Statistical procedures for evaluating forecasting skills. The Journal of Business 54 (4), 513–533.

Jacobs, P., Lewis, P., 1978. Discrete time series generated by mixtures. I: Correlational and runs properties. Journal of the Royal Statistical Society 40 (1), 94–105.

Jiang, W., 2003. A nonparametric test of market timing. Journal of Empirical Finance 10, 399–425.

Kaminsky, G., Lizondo, S., Reinhart, C., 1998. Leading indicators of currency crisis. IMF Staff Papers 45 (1).

Kedem, B., 1981. Binary Time Series. Marcel Dekker, New York.

MacDonald, I.L., Zucchini, W., 1997. Hidden Markov and Other Models for Discrete-Valued Time Series. Chapman and Hall.

Marquering, W., Verbeek, M., 2004. A multivariate nonparametric test for return and volatility timing. Finance Research Letters 1, 250–260.

McIntosh, C.S., Dorfman, J.H., 1992. Qualitative forecast evaluation: a comparison of two performance measures. American Journal of Agricultural Economics 74 (1), 209–214.

Pesaran, M.H., Timmermann, A., 1992. A simple nonparametric test of predictive performance. Journal of Business and Economic Statistics 10 (4), 461–465.

Romacho, J.C., Cortez, M.C., 2006. Timing and selectivity in Portuguese mutual fund performance. Research in International Business and Finance 20, 348–368.

West, K., 1996. Asymptotic inference about predictive ability. Econometrica 64, 1067–1084.

Zeger, S.L., Qaqish, B., 1988. Markov regression models for time series: a quasi-likelihood approach. Biometrics 44 (4), 1019–1031.