Irrational fads, short-term memory emulation, and asset predictability


Review of Financial Economics 22 (2013) 213–219

Review of Financial Economics journal homepage: www.elsevier.com/locate/rfe

Irrational fads, short-term memory emulation, and asset predictability☆

Stelios D. Bekiros ⁎,1,2 — European University Institute, Florence, Italy; Athens University of Economics and Business, Athens, Greece; Rimini Centre for Economic Analysis (RCEA)

Article info

Available online 6 June 2013

JEL classification: G10; G14; C53; C58

Keywords: Machine learning; Neural networks; Volatility trading; Stock predictability

Abstract — Opponents of the efficient markets hypothesis argue that predictability reflects the psychological factors and "fads" of irrational investors in a speculative market. Indeed, conventional time series analysis often fails to give an accurate forecast for financial processes due to inherent noise patterns, fat tails, and nonlinear components. A recent stream of literature on behavioral finance has revealed that boundedly rational agents using simple rules of thumb for their decisions under uncertainty provide a more realistic description of human behavior than perfect rationality with optimal decision rules. Consequently, the application of technical analysis in trading could produce high returns. Machine learning techniques have been employed in economic systems to model nonlinearities and simulate human behavior. In this study, we expand the literature that evaluates return sign forecasting ability by introducing a recurrent neural network approach that combines heuristic learning and short-term memory emulation, thus mimicking the decision-making process of boundedly rational agents. We investigate the relative direction-of-change predictability of the neural network structure implied by the Lee–White–Granger test and compare it to other well-established models for the DJIA index. Moreover, we examine the relationship between stock return volatility and returns. Overall, the proposed model presents high profitability, in particular during "bear" market periods. © 2013 Elsevier Inc. All rights reserved.

☆ This research is supported by the Marie Curie Fellowship (FP7-PEOPLE-2011-CIG, No. 303854) under the 7th European Community Framework Programme. The usual disclaimers apply.
⁎ Department of Economics, Via della Piazzuola 43, I-50133 Florence, Italy. Tel.: +39 055 4685916; fax: +39 055 4685902. E-mail addresses: [email protected], [email protected].
1 Department of Accounting and Finance, 76 Patission str, GR104 34, Athens, Greece. Tel./fax: +30 210 8203453.
2 RCEA, Via Patara 3, 47900 Rimini, Italy. Tel.: +39 0541 434142; fax: +39 0541 55431.
1058-3300/$ – see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.rfe.2013.05.005

1. Introduction

In view of empirical studies showing that stock prices can be predicted with a fair degree of reliability, advocates of the efficient markets hypothesis (e.g., Fama & French, 1995) claim that such results are based on time-varying-equilibrium expected returns generated by rational pricing in an efficient market, which compensates for the level of risk undertaken. On the contrary, opponents (e.g., La Porta, Lakonishok, Shleifer, & Vishny, 1997; Shiller, 2002) argue that predictability reflects the psychological factors and fashions or "fads" of irrational investors in a speculative market. This irrational behavior has been emphasized by Shleifer and Summers (1990) and Black (1986) in their exposition of noise traders, who act on the basis of imperfect information and consequently cause prices to deviate from their equilibrium values. Arbitrageurs dilute a minor part of these shifts in prices, yet the major component of deviation is tradable. Moreover, Black claimed

that noise traders play a useful role in promoting market liquidity. Overall, there are two major types of agents in heterogeneous markets: "fundamentalists," who base their expectations upon dividends, earnings, growth or even macroeconomic factors, and "chartists" (noise traders and technical analysts), who instead base their trading strategies upon historical patterns and heuristics and try to extrapolate trends in future asset prices. Conventional time series analysis, based on stationary processes, does not always perform satisfactorily on economic and financial time series (Harvey, 1989). Economic data are not generally described by simple linear structural models, white noise or random walks. The most commonly used techniques for financial forecasting are regression methods and autoregressive integrated moving average (ARIMA) models (Box & Jenkins, 1970). These models have been used extensively in the past, but they often fail to give an accurate forecast for some series due to inherent noise patterns, fat tails, and nonlinear components. The major challenge for "chartists" is the development of new models, or the modification of existing methods, that would enhance forecasting ability, particularly for time series with dynamic, time-variant patterns. Simon (1957) argued that boundedly rational agents using simple rules of thumb for their decisions under uncertainty provide a more accurate and realistic description of human behavior than perfect rationality with optimal decision rules. The key arguments of the behavioral agent-based models reported by Hommes (2001, 2006) are closely related to the Kahneman–Tversky analysis in psychology, according to which individual behavior under uncertainty can be described by simple heuristics and biases.


The present study focuses on boundedly rational agents. Specifically, the predictive return sign ability of trading rules that rely on a simple switching strategy is investigated: positive predicted returns are executed as long positions and negative returns as short positions. A similar strategy has been employed, with considerable success, by a number of other researchers (Fernández-Rodriguez, Gonzalez-Martel, & Sosvilla-Rivero, 2000; Gençay, 1998). In general terms, they find that the returns from the switching strategy are higher than those from the passive one for annual returns, even when transaction costs are high. They also find that asset return predictability is increased during volatile periods. The buy and sell signals are produced from technical trading strategies that incorporate various linear or nonlinear econometric models. Currently, technical analysis is widely used among market practitioners as an effective technique to earn significant profits from financial trading (Kaufman, 1998; Kirkpatrick & Dahlquist, 2007; Murphy, 1999). Recent work based on a behavioral finance perspective has revealed that the application of technical analysis in trading can consistently produce high returns, although it is still considered a "pseudoscience" compared to conventional fundamental analysis (Camillo, 2008; Irwin & Park, 2007; Lo, Mamaysky, & Wang, 2000; Plummer & Ridley, 2003). This is due to the inherent subjectivity and a general lack of established guidelines to systematically determine the amount of relevant historical information as well as the optimal parameters of the technical rules employed. Machine learning techniques have been employed in economic systems to model nonlinearities and, especially, to simulate human behavior. In general, machine learning refers to systems capable of autonomous acquisition and integration of knowledge.
It is a very active area of research in artificial intelligence concerned with developing learning algorithms for real-world complex systems. In many applications, financial markets for instance, the modeling difficulty is inherent because the state space is enormous, the behavior of agents is complex and conflicting, and the system environment cannot be described by a linear analytical model as in classic model-based control. Hence, it is essential for the agents to have the ability to learn and adapt. Extensive research in the area of nonlinear modeling has shown that neural networks enhance financial forecasting, mainly because they perform advanced mathematical and statistical processes such as nonlinear interpolation and function approximation. Neural networks are parallel computational models comprising input and output vectors as well as processing units (neurons) interconnected by adaptive connection strengths (weights), trained to store the "knowledge" of the network. Adya and Collopy (1998) demonstrated the advanced predictive ability of neural networks for time series forecasting. White (1989) and Kuan and White (1994) suggested that the relationship between neural networks and conventional statistical approaches for time series forecasting is complementary. Additionally, the function approximation properties of neural networks have been thoroughly investigated by many authors. The results in Cybenko (1989), Funahashi (1989), Hornik (1991), Hornik, Stinchcombe, and White (1989), and Hecht-Nielsen (1989) demonstrated that feedforward networks with sufficiently many hidden units and properly adjusted parameters can approximate any function to any desired degree of accuracy. Poddig (1993) applied a feedforward neural network to predict exchange rates and compared the results to regression analysis.
Other examples using neural networks in stock and currency markets include Gençay (1998), Green and Pearson (1994), Rawani (1993), Weigend (1991), Yao, Poh, and Jasic (1996), and Zhang (1994). The main objective of this study is the development of heuristic learning algorithms that decode the decision-making process of boundedly rational agents, i.e., technical analysts in the dynamic and noisy environment of financial markets. In this direction, an approach combining learning and short-term memory emulation is followed. In the next section, we discuss the empirical investigation of the relation between volatility and stock returns. Next, Section 3 describes how a new recurrent neural network model for “heuristic

trading” is constructed. In Section 4, a preliminary descriptive statistical analysis is conducted, and the Lee–White–Granger (LWG) statistic (1993) is applied to daily returns of the DJIA in order to rationalize the use of neural networks as the optimal approximating model of the true data-generating nonlinear process. Finally, the empirical results are presented in Section 5, including comparative analysis with respect to other forecasting models, while Section 6 provides concluding remarks.

2. Volatility spillovers and second-moment effects

The market timing ability of simple trading rules could be enhanced if conditional second-moment effects were incorporated in a nonlinear heuristic system. In this paper, we investigate this possibility and attempt to advance the existing literature. The empirical investigation of the relation between stock return volatility and stock returns has a long tradition in the finance literature. According to the "time-varying risk premium theory," return shocks are caused by changes in conditional volatility (Bekaert & Wu, 2000). When bad news arrives in the market, the current volatility increases, and this causes upward revisions of the conditional volatility. This increased conditional volatility has to be compensated by a higher expected return, leading to an immediate decline in the current value of the market. An asymmetric nature of the volatility response to return shocks emerges from this theory: while bad news generates an increase in conditional volatility, the net impact of good news is not clear. Another explanation for the asymmetric reaction of the conditional volatility may be offered through "leverage effects" (Christie, 1982). A negative (positive) return increases (reduces) financial leverage, which makes the stock riskier (less risky) and increases (reduces) volatility. However, the causality here is different: the return shocks lead to changes in conditional volatility, whereas the time-varying premium theory contends the opposite. An alternative rationalization for the relation between conditional volatility revisions and stock returns may be offered by invoking trigger strategies in the equity markets (Krugman, 1987).
Institutional participants in equity markets react whenever the maximum expected loss of their portfolios, as measured for example by the value at risk (VaR), reaches a predetermined level, and therefore share price dynamics are driven, in part, by revisions in the measured conditional volatility.3 Each time the conditional volatility rises, a number of those portfolios will deviate from their pre-determined VaR levels, hitting their risk limits, and this will generate a re-allocation of assets toward safer ones. When portfolio insurers leave the market, the stock prices must fall in order to give the other investors an incentive to hold a larger quantity of stock. If a rational expectations world is further assumed, then investors take into account the effects of portfolio insurance schemes and no drop in stock prices is observed. Furthermore, many researchers study the contemporaneous relationship between 1-day stock index returns and the associated changes in the level of implied volatility indices (Giot, 2005; White, 1989). The response seems to be asymmetric, and the explanation offered could be that option traders react to negative returns by bidding up the implied volatility. In this study, the trading implications of conditional volatility are examined within a broader framework as concerns the nonlinear functional form of the forecast generating mechanism as well as the presence of past returns that might have forecasting power. Christoffersen and Diebold (2006) show that volatility dependence produces sign dependence, and therefore forecasting ability, as long as expected returns are nonzero. The intuition behind this relationship is that volatility changes will alter the probability of observing negative or positive returns. More specifically, the higher the volatility, the higher the

3 VaR depends on a multiple of the estimated conditional volatility under the assumption of normally distributed returns.
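The Christoffersen–Diebold mechanism discussed above can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, Gaussian daily returns with a small positive mean; the μ and σ values are arbitrary and are not taken from the paper:

```python
from statistics import NormalDist

# With r_t ~ N(mu, sigma^2) and mu > 0, the probability of a negative
# return is Phi(-mu/sigma), which rises as volatility rises: volatility
# dependence induces sign dependence (Christoffersen & Diebold, 2006).
mu = 0.0005  # assumed positive expected daily return (illustrative)
for sigma in (0.005, 0.010, 0.020):
    p_neg = NormalDist().cdf(-mu / sigma)
    print(f"sigma = {sigma:.3f} -> P(r < 0) = {p_neg:.3f}")
```

As σ grows, P(r < 0) approaches 1/2 from below, so the sign of the return becomes harder to forecast in calm markets with near-zero drift and easier when drift is large relative to volatility.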


probability of a negative return, as long as the expected returns are positive. Moreover, they show that this result is entirely consistent with the existence of no conditional mean dependence, or the absence of conditionally Gaussian distributions. In the present paper, we investigate the empirical relationship between volatility and stock returns, and we examine whether asset return predictability is increased during volatile periods. The buy and sell signals are produced from technical trading strategies that incorporate various linear or nonlinear econometric models.

3. Artificial learning with "memory"

Neural networks are equivalent to nonparametric regression techniques where the underlying nonlinear function is not prescribed, ex ante, explicitly. In financial applications, a class of ANN models, the single-layer feedforward networks (FNNs), is often used. In an FNN, information, suitably weighted, is passed from the input layer to a further layer of hidden neurons. This hidden information is also assigned a weight and finally reaches the output layer that represents the forecast. Hornik et al. (1989) have shown that single-layer FNNs can approximate any member of a class of functions with error decreasing at rates as fast as q^(−1/2) and that the dimension of the input space does not affect the rate of approximation. Despite the importance of selecting the optimum number of hidden neurons, there is no explicit formula for that matter. The geometric pyramid rule proposed by Masters (1993) considers √(n·m) hidden neurons for a three-layer network with n inputs and m outputs. Katz (1992) indicates that an optimal number of hidden neurons can be found between one-half and three times the number of inputs, whereas Ersoy (1990) proposes doubling the number of neurons until the network's RMSE performance deteriorates. The output of a neural network is produced via the application of a transfer function. Its functionality is to modulate the output space as well as to prevent outputs from reaching very large values, which can "block" training. Levich and Thomas (1993) and Kao and Ma (1992) found that hyperbolic sigmoid and tan-sigmoid transfer functions are appropriate for financial markets data because they are nonlinear and continuously differentiable, which are desirable properties for network learning. Learning typically occurs through training, where the training algorithm iteratively adjusts the connection weights. Common practice is to divide the sample into three distinct sets called the training, validation and testing (out-of-sample) sets: the training set is the largest and is used by the neural network to learn the patterns presented in the data; the validation set is used to evaluate the generalization ability in order to avoid overfitting; and the testing set should consist of the most recent observations, which are processed for testing predictability. The validation error decreases until the network begins to overfit the data, after which the error begins to rise. The weights are calculated at the minimum value of the validation error. Specifically, if Xt = (x1,t,…,xp,t) is the input vector of a feedforward network with q hidden units, the output is given by

yt = S(β0 + ∑_{i=1}^{q} βi G(αi0 + ∑_{j=1}^{p} αij xj,t)) = f(xt; z)    (1)

where i = 1,…,q and j = 1,…,p. Consider z = (β0,…,βq, α11,…, αij,…, αqp)′ as the weight vector and S and G as the hyperbolic tangent sigmoid transfer functions. In a stock market context, let pt, t = 1,2,…,T, be the daily stock index price. The daily returns are then calculated as xt = log(pt) − log(pt−1). The solution of the network considers estimation of the unknown vector z with a sample of data values. A recursive estimation methodology, called backpropagation, is used to estimate the weight vector, as follows:

zt+1 = zt + η ∇f(xt, zt) · [yt − f(xt, zt)]    (2)
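The recursive update of Eq. (2) can be sketched as follows. This is a toy configuration, not the paper's setup: tanh is used for both S and G, the gradient is approximated numerically to keep the sketch short, and the sizes q = 3, p = 2, the learning rate, and the target series are illustrative assumptions:

```python
import math
import random

random.seed(0)

def fnn(x, z, q):
    """Single-hidden-layer feedforward net in the spirit of Eq. (1):
    y = S(beta_0 + sum_i beta_i * G(alpha_i0 + sum_j alpha_ij * x_j)),
    with S = G = tanh. Weight layout in z: beta_0..beta_q, then per
    hidden unit i the bias alpha_i0 followed by alpha_i1..alpha_ip."""
    p = len(x)
    s = z[0]
    for i in range(q):
        base = (q + 1) + i * (p + 1)
        a = z[base] + sum(z[base + 1 + j] * x[j] for j in range(p))
        s += z[i + 1] * math.tanh(a)
    return math.tanh(s)

def backprop_step(x, y, z, q, eta=0.05, h=1e-6):
    """One update of Eq. (2): z <- z + eta * grad_f(x, z) * (y - f(x, z)).
    The gradient is approximated by forward differences (illustrative)."""
    f0 = fnn(x, z, q)
    err = y - f0
    new_z = []
    for k in range(len(z)):
        zh = list(z)
        zh[k] += h
        grad_k = (fnn(x, zh, q) - f0) / h
        new_z.append(z[k] + eta * grad_k * err)
    return new_z

# Toy experiment: fit a smooth deterministic target with one online pass.
q, p = 3, 2
z = [random.uniform(-0.5, 0.5) for _ in range((q + 1) + q * (p + 1))]
data = [([math.sin(t / 3.0), math.cos(t / 5.0)], 0.5 * math.sin(t / 3.0))
        for t in range(200)]
mse = lambda w: sum((y - fnn(x, w, q)) ** 2 for x, y in data) / len(data)
before = mse(z)
for x, y in data:
    z = backprop_step(x, y, z, q)
print(f"MSE before: {before:.4f}, after one pass: {mse(z):.4f}")
```

In practice the gradient would be computed analytically by the chain rule rather than numerically; the sketch only shows how the error-weighted gradient step of Eq. (2) is applied observation by observation.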


where ∇f(xt, z) is the gradient vector with respect to z, and η is the learning rate (i.e., it controls the size of the change of the weight vector on the t-th iteration). The z vector update is achieved via the minimization of the mean square error function. An alternative approach is Bayesian updating (Foresee & Hagan, 1997), where the weights and biases of the network are assumed to be random variables with specified distributions. A major disadvantage of this method is that it generally takes longer to converge than backpropagation. In a dynamic context, however, it is natural to include lagged dependent variables as explanatory variables in the FNN, since the correct number of lags needed in the hidden layer is unknown. This problem is addressed in the relevant literature by constructing recurrent networks, i.e., networks with feedbacks from the hidden neurons to the input layer with delay. Recurrent neural networks exhibit characteristics simulating short-term memory. In this study, the recurrent neural networks of Elman (1990) have been utilized. In the Elman networks with a single hidden layer, the lagged outputs of the hidden neurons are fed back into the hidden neurons themselves. The output of the network is now given by

yt = G(β0 + ∑_{i=1}^{q} βi · gi,t) + εt    (3)

where gi,t = G(αi0 + ∑_{j=1}^{p} αij xj,t + ∑_{h=1}^{q} δih gh,t−1), with G the hyperbolic tangent sigmoid transfer function and z = (β0,…,βq, α11,…, αij,…, αqp, δ11,…, δih,…, δqq)′ the weight vector. The inputs correspond to the returns over the previous n periods, following Gençay (1998a,b) and Fernández-Rodriguez et al. (2000). As concerns the transfer functions G and S, we use the sigmoid function, which normalizes the values of each neuron to lie in the interval (−1, +1). The network learns the underlying nonlinear structure by using the error backpropagation method. Once the neural network has been trained, it is applied over a different data set covering the validation period. The purpose here is to evaluate the generalization ability of the trained network in order to avoid overfitting.

4. Preliminary analysis

The performance of the models is examined using logarithmic returns of the Dow Jones Industrial Average. The sample spans 02/08/1971 to 02/05/2002 (8087 observations). This sample contains diverse regimes and several "extreme" events, including the Asian crisis and the rise and fall of the tech-market bubble, which makes the analysis particularly interesting for technical traders in trend forecasting. Furthermore, it provides an empirical benchmark also applicable to other turbulent periods, such as the financial crisis of 2007–2010, which led to a global recession and was caused by the credit insolvency of investment institutions and high oil prices. The Dow Jones Industrial Average (DJIA) was founded on May 26, 1896, and is now owned by Dow Jones Indexes, which is majority owned by the CME Group. It is an index that shows how 30 large publicly owned companies based in the United States have traded during a standard trading session in the stock market, and it is the second oldest U.S. market index after the Dow Jones Transportation Average, which was also created by Dow. The average is price weighted, and to compensate for the effects of stock splits and other adjustments, it is currently a scaled average.
The value of the Dow is not the actual average of the prices of its component stocks, but the sum of the component prices divided by a divisor, which changes whenever one of the component stocks has a stock split or stock dividend (Fig. 1). The daily returns are calculated as rt = ln(Pt) − ln(Pt − 1), where Pt denotes the daily index price. The predictive performance of the models is examined in the period PTotal: 04/08/1998–02/05/2002

Fig. 1. Daily closing prices and historical volatility (DJIA). Total period: 02/08/1971–02/05/2002; out-of-sample testing period: 4/2/1998–2/5/2002. The historical volatility is estimated based on a rolling 21-day standard deviation (02/08/1971–02/05/2002). (Chart axes: volatility, left; index level, right.)
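The log returns and the rolling historical volatility of Fig. 1 can be computed as follows (a plain-Python sketch; the 21-day window follows the figure caption, and the price path is illustrative, not DJIA data):

```python
import math

def log_returns(prices):
    """r_t = ln(P_t) - ln(P_{t-1}), as defined in the text."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

def rolling_vol(returns, window=21):
    """Rolling sample standard deviation over the last `window` returns,
    as used for the historical volatility in Fig. 1."""
    vols = []
    for t in range(window, len(returns) + 1):
        w = returns[t - window:t]
        m = sum(w) / window
        vols.append(math.sqrt(sum((r - m) ** 2 for r in w) / (window - 1)))
    return vols

# illustrative smooth price path (not DJIA data)
prices = [1000 * math.exp(0.0002 * t + 0.01 * math.sin(t / 7.0))
          for t in range(60)]
r = log_returns(prices)
print(f"{len(r)} returns, {len(rolling_vol(r))} rolling-volatility points")
```

Each volatility point is computed only from past-and-current returns, so the series can be used in a forecasting setup without look-ahead bias.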

(1000 observations) with the use of a 1-day rolling window. To enhance robustness of the results, the out-of-sample back-testing period is segmented into two subperiods, namely PI: 04/08/1998–01/14/2000 and PII: 01/17/2000–02/05/2002, corresponding to a bull and a bear market period, respectively. In this study, the bull and bear states are formalized following the definition of Lunde and Timmermann (2004), according to which the market switches from a bull (bear) to a bear (bull) period if prices have declined (increased) by a certain percentage since their previous peak within that bull (bear) state. The switching point in our testing sample corresponds to a DJIA peak level of 11722.98 on 01/14/2000. As opposed to the descriptions of Fabozzi and Francis (1977) and Chen (1982), which did not reflect long-run dependencies and trends in price levels, this definition does not rule out sequences of negative (positive) price movements in stock prices during a bull (bear) period as long as their cumulative value does not exceed a certain threshold. Finally, the training and validation periods account for the rest of the sample, with the validation period covering almost 30% of the entire data set.

The descriptive statistics are presented in Table 1. The Jarque–Bera statistic for all periods implies nonnormality, while the kurtosis indicates the presence of fat tails. The differences between the two periods PI and PII are evident. A significant increase in variance and fat-tailedness can be observed in PII (bear period); additionally, PII witnessed more negative spikes, as can be inferred from the skewness.

Table 1. Descriptive statistics (DJIA returns).

Statistic | PTotal | PI     | PII
Mean      | 0.000  | 0.000  | 0.000
SD        | 0.012  | 0.011  | 0.013
Skewness  | −0.492 | −0.497 | −0.469
Kurtosis  | 6.260  | 5.908  | 6.355
JB test   | 483.4* | 182.2* | 271.7*
Q(12)     | 17.46  | 10.11  | 17.98

Note: The index returns are defined as rt = ln(Pt) − ln(Pt−1), where Pt is the closing level on day t. *Significance at the 5% level. The periods are PI: 04/08/1998–01/14/2000, PII: 01/17/2000–02/05/2002, and PTotal: 04/08/1998–02/05/2002.

Next, the Lee–White–Granger (LWG) statistic (1993) is applied to daily returns of the DJIA in order to specifically justify the use of neural networks. The null hypothesis tests linearity in the mean against nonlinearity in general as expressed by the neural network, and the relevant statistic Q is given by

Q = (T^(−1/2) ∑_{t=1}^{T} Φt êt)′ V̂T^(−1) (T^(−1/2) ∑_{t=1}^{T} Φt êt)    (4)

where êt = yt − x̃′t Φ⁎t are the residuals of the linear model, x̃ = (1, x1,…,xt)′ the input vector, and Φt = (S(x̃′t G1),…, S(x̃′t Gq)), with S being the logistic function S(λ) = (1 + e^(−λ))^(−1), λ ∈ ℝ, and V̂T = Var(T^(−1/2) ∑_{t=1}^{T} Φt êt). The linear least squares approximation is et = yt − x̃′t Φ⁎t, and Φ⁎t denotes the parameter vector of the optimal linear least squares approximation. The hidden unit activation vector G = (G1,…,Gq) is selected independently of the sequence {xt} for given q ∈ ℕ. The statistic Q is asymptotically distributed as χ²q under the null H0: E(Φt et) = 0 vs Ha: E(Φt et) ≠ 0, and Bonferroni bounds provide an upper limit on the p-value. If p1,…,pk denote the ascending-ordered p-values corresponding to k draws from W, then the standard Bonferroni method implies rejection of the linear null at significance level a if p1 ≤ a/k, while in the limit the standard Bonferroni p-value is given by a = kp1. Hochberg (1988) suggested a modification to the Bonferroni method, which provides higher power; the modified (improved) Bonferroni limit is given by a = min_{j=1,…,k} (k − j + 1)pj, so that H0 is rejected if there exists a j such that pj ≤ a/(k − j + 1), j = 1,…,k.

Table 2 reports p-values of the test applied to daily returns of the DJIA index for the out-of-sample period (04/08/1998–02/05/2002) as well as for the total sampling period (02/08/1971–02/05/2002). The neural network starts with ten hidden units, m = 1, 2, 3, 4 lagged inputs, and the parameter of the logistic activation function is sampled from a uniform distribution on [−2, 2]. Next, the 2nd and 3rd principal components of the activation function S(x′t, λ1)…S(x′t, λ10) are computed, and the χ² test statistic is calculated as n·R² from a regression of the residuals from an AR(p) model on xt and the principal components. The procedure is repeated 10 times, and the p-values for the LWG test are the standard and improved Bonferroni bounds.

Table 2. Neural network nonlinearity (LWG) test: p-values by lagged input (m).

Period                                     | m = 1 | m = 2 | m = 3 | m = 4
Out-of-sample period, Standard Bonferroni  | 0.031 | 0.299 | 0.001 | 0.286
Out-of-sample period, Improved Bonferroni  | 0.031 | 0.300 | 0.001 | 0.289
Total sampling period, Standard Bonferroni | 0.000 | 0.000 | 0.000 | 0.000
Total sampling period, Improved Bonferroni | 0.001 | 0.000 | 0.000 | 0.001

Note: p-values of the ANN test of Lee, White, and Granger (1993) applied to daily returns of the Dow Jones index. The sample runs over the out-of-sample period PTotal: 04/08/1998–02/05/2002 as well as the total period: 02/08/1971–02/05/2002.

The results demonstrate that the null of linearity is strongly rejected and that the nonlinearity detected is "neural network nonlinearity in the mean." Thus, the neural network could be considered as an optimal approximating model of the true data-generating nonlinear process.
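The mechanics of the LWG procedure described above can be sketched as follows. This is a simplified single-draw illustration, not the paper's full 10-repetition Bonferroni procedure; the AR-lag generation, sample size, and seed are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def lwg_statistic(r, p=2, q=10, n_pc=2):
    """Neural-network test for neglected nonlinearity in the mean
    (Lee-White-Granger style): regress AR(p) residuals on principal
    components of random logistic activations of the lagged inputs;
    n * R^2 is asymptotically chi-square(n_pc) under linearity."""
    # lag matrix: column k holds r_{t-1-k}, aligned with y = r_t
    X = np.column_stack([r[p - k - 1:len(r) - k - 1] for k in range(p)])
    y = r[p:]
    Xc = np.column_stack([np.ones(len(y)), X])
    resid = y - Xc @ np.linalg.lstsq(Xc, y, rcond=None)[0]
    # random hidden-unit directions, drawn from U[-2, 2] as in the paper
    G = rng.uniform(-2.0, 2.0, size=(Xc.shape[1], q))
    act = 1.0 / (1.0 + np.exp(-(Xc @ G)))          # logistic activations
    act_c = act - act.mean(axis=0)
    _, _, Vt = np.linalg.svd(act_c, full_matrices=False)
    pcs = act_c @ Vt[1:1 + n_pc].T                 # 2nd and 3rd components
    Z = np.column_stack([Xc, pcs])
    e2 = resid - Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    r2 = 1.0 - (e2 @ e2) / (resid @ resid)
    return len(y) * r2   # compare with chi-square(n_pc) critical values

# linear AR(1) data: the statistic should typically be small
r_lin = np.zeros(500)
for t in range(1, 500):
    r_lin[t] = 0.3 * r_lin[t - 1] + rng.standard_normal()
print(f"LWG-type statistic on linear data: {lwg_statistic(r_lin):.2f}")
```

A full implementation would repeat the draw of G ten times and combine the resulting p-values with the standard and improved Bonferroni bounds, as the paper does.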

5. Profitability of alternative trading strategies

The results relating to the statistical predictability and profitability of the estimated recurrent neural network (RNN) on the daily returns of the DJIA appear in Table 3. The input space of the RNN comprises two lagged returns in the initial topology, and is also "enriched" with one lag of the conditional volatility daily changes, i.e., the daily revisions of the estimated conditional volatility Δh1/2 over the past p periods. This could also be considered as an investigation of the empirical relation – if there potentially exists a nonlinear one – between conditional volatility and expected stock returns. The neural network architecture includes one hidden layer with ten neurons and one output layer with a single neuron. Conditional volatility estimates have been obtained from a rolling 20-day standard deviation of returns (denoted as RNN-MA(20)), an exponentially weighted moving average (EWMA) with a decay factor equal to 0.94 (RNN-RM(0.94)), and a GARCH(1,1) model (RNN-GARCH(1,1)). The exponentially weighted moving average corresponds to the approach adopted by RiskMetrics (J. P. Morgan), and for that reason it is denoted here as RM(0.94). In particular, the standard GARCH(1,1) model is given by

σ²t = α0 + α1 (rt−1 − μt)² + β σ²t−1    (5)

where μt = (1/T) ∑_{i=1}^{T} rt−i. As a special case, the specification adopted by J. P. Morgan is given as follows:

σ²t = λ σ²t−1 + (1 − λ)(rt−1 − μt)²    (6)

where λ = 0.94 is the optimal decay factor for daily data. The process for the selection of the lags involved, in a first step, the calculation of the Ljung–Box statistics for the first 10 lags of all return series in order to get a first indication. Significant autocorrelations up to the second lag of the return series were identified. Additionally, the Schwarz information criterion (SIC), estimated for the first eight lags, provided its minimum value at the second lag. Moreover, a sensitivity analysis based on the RMSE, conducted stepwise on the RNN model and involving three lags of the conditional volatility changes and eight lags of returns, revealed that the selected setup provided the best results. All other topologies were found to be qualitatively worse than, or roughly equal to, those finally presented in Table 3 (i.e., the RNN topologies with three return lags and one lag of the volatility changes provided roughly equal results, albeit with computational overhead due to the added input variable). This empirical result also follows Katz (1992) and Ersoy (1990). Finally, the predictability of the RNN topologies with added volatility was further compared to a linear autoregressive stochastic model AR(2) with two lagged returns:

rt = a + ∑_{i=1}^{p} βi rt−i + εt    (7)

This could also be considered as a misspecification test against the results of the LWG test, which strongly detected a "neural network nonlinearity in the mean," as well as a further control for second-moment effects, i.e., nonlinearity with respect to the variance. The neural network model is considered the optimal approximation of the true data-generating process for the mean but not for the variance, so the addition of the volatility measure to the state space could enhance or deteriorate the out-of-sample performance. It is therefore interesting to investigate the result of including the volatility changes in the RNN architecture against the model proposed by the LWG test as well as against a simple linear benchmark. At the end of each trading period, all the RNN topologies are re-estimated over a rolling sample equal in length to the training set. If the value of the output signal is greater than zero, it is interpreted as a "buy" signal for the next trading day, while a value less than zero is a "sell" signal. Then, the total return of the strategy, when transaction costs are not considered, is estimated as

TR = ∑_{t=1}^{To} st · yt    (8)

where To indicates the out-of-sample horizon, yt is the realized return, and st is the recommended position, which takes the value −1 for a short and +1 for a long position (e.g., Gençay, 1998b; Jasic & Wood, 2004). Then, the total return of each strategy is estimated with the inclusion of transaction costs, which are set at 0.05% for each one-way trade, following Hsu and Kuan (2005) and Fama and Blume (1966). In order to evaluate the forecast accuracy of the models, the percentage of correct predictions, or correctly predicted signs, was calculated as follows:

Sign rate = C / To    (9)

where C is the number of correct predictions. Two other comparative profitability measures were also considered: the ideal profit (IP) and

Table 3
Out-of-sample testing for the DJIA stock index.

Model           | Period | Total return    | B&H return | Sign rate | PT test | HM test  | MSFE  | Sharpe ratio | Ideal profit
RNN             | PI     | 50.3% (38.9%)   | 27.6%      | 54.6%     | 2.001** | 2.779*** | 0.013 | 0.094        | 0.128
RNN             | PII    | 28.4% (14.7%)   | −18.4%     | 51.6%     | 0.734   | 1.374*   | 0.015 | 0.041        | 0.056
RNN–MA(20)      | PI     | 17.8% (5.1%)    | 27.6%      | 53.1%     | 1.377*  | 1.841**  | 0.015 | 0.033        | 0.045
RNN–MA(20)      | PII    | −19.1% (−32.1%) | −18.4%     | 48.2%     | −0.821  | −0.679   | 0.018 | −0.027       | −0.037
RNN–RM(0.94)    | PI     | −4.02% (−16.5%) | 27.6%      | 48.2%     | −0.754  | −1.067   | 0.020 | −0.007       | −0.010
RNN–RM(0.94)    | PII    | −13.7% (−27.4%) | −18.4%     | 49.3%     | −0.302  | 0.061    | 0.021 | −0.020       | −0.027
RNN–GARCH(1,1)  | PI     | 19.4% (7.6%)    | 27.6%      | 52.1%     | 0.946   | 1.282*   | 0.014 | 0.036        | 0.049
RNN–GARCH(1,1)  | PII    | −27.6% (−39.8%) | −18.4%     | 47.5%     | −1.166  | −1.307*  | 0.022 | −0.040       | −0.054
AR(LS)          | PI     | 34.2% (22.1%)   | 27.6%      | 50.7%     | 0.098   | 0.135    | 0.013 | 0.064        | 0.087
AR(LS)          | PII    | 32.9% (18.3%)   | −18.4%     | 50.2%     | 0.151   | 0.467    | 0.013 | 0.048        | 0.065

Note: RNN = recurrent neural network. Methods for forecasting volatility: MA(20) = moving average with a 20-day window; RM(0.94) = RiskMetrics' exponentially weighted MA rule (decay factor = 0.94); GARCH(1,1) = generalized conditional heteroscedastic volatility model. AR(LS) = autoregressive/linear stochastic model. PT test = the Pesaran and Timmermann (1992) test; HM test = the Henriksson and Merton (1981) test; both tests are asymptotically distributed as N(0,1). The sign rate measures the proportion of correctly predicted signs. The Sharpe ratio is defined as the ratio of the mean return of the strategy to its standard deviation (annualized by multiplying by the square root of 250). The ideal profit is the ratio of the returns of the trading strategy to the returns of a perfect predictor. Total returns after transaction costs (0.05% average fixed cost for each one-way trade) are shown in parentheses. *, **, and *** indicate significance at the one-sided 10%, 5%, and 1% levels, respectively. The periods are PI: 04/08/1998–01/14/2000 (bull), PII: 01/17/2000–02/05/2002 (bear), and PTotal: 04/08/1998–02/05/2002.


the Sharpe ratio (SR). The IP compares the system return against that of a perfect forecaster and is calculated by

IP = ( ∑_{t=1}^{T_o} s_t · r_t ) / ( ∑_{t=1}^{T_o} |r_t| )    (10)

where the value IP = 0 is considered the benchmark for evaluating the performance of a trading strategy. When the position indicator s_t is correct for all observations in the sample, IP = 1, whereas if all forecasted positions are wrong, IP = −1. The SR is the ratio of the mean return of the trading strategy to its standard deviation; the higher the SR, the higher the return and the lower the volatility:

SR = μ_{T_o} / σ_{T_o}.    (11)
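The four performance measures of Eqs. (8)–(11) can be computed jointly. The sketch below is ours, not the paper's code; in particular, the convention of charging one one-way cost per position change (plus the initial entry) is a simplifying assumption:

```python
import numpy as np

def evaluate_strategy(s, y, cost=0.0005, trading_days=250):
    """Profitability measures: total return (TR), sign rate, ideal
    profit (IP), and annualized Sharpe ratio (SR). `s` holds the
    recommended positions (+1 long, -1 short), `y` the realized returns."""
    s, y = np.asarray(s, float), np.asarray(y, float)
    r = s * y                                          # per-period strategy returns
    tr = r.sum()                                       # Eq. (8): TR = sum s_t * y_t
    sign_rate = np.mean(np.sign(s) == np.sign(y))      # Eq. (9): C / T_o
    ip = r.sum() / np.abs(y).sum()                     # Eq. (10): 1 = perfect, -1 = always wrong
    sr = np.sqrt(trading_days) * r.mean() / r.std()    # Eq. (11), annualized
    n_trades = 1 + np.count_nonzero(s[1:] != s[:-1])   # entry plus position flips (assumption)
    tr_net = tr - cost * n_trades                      # after one-way transaction costs
    return {"TR": tr, "TR_net": tr_net, "sign_rate": sign_rate, "IP": ip, "SR": sr}
```

For a perfect sign forecaster this yields IP = 1 and a sign rate of 1; how a full position reversal is costed (one versus two one-way trades) is a modeling choice.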

As statistical measures of predictability, the mean squared forecast error (MSFE) is used, and the Henriksson–Merton (HM) and Pesaran–Timmermann (PT) tests were applied to the realized and forecasted returns. Henriksson and Merton (1981) showed that their test statistic follows a hypergeometric distribution under the null hypothesis of no market-timing ability and is asymptotically distributed as N(0,1). Pesaran and Timmermann (1992) provided a variant of the test that is also asymptotically normally distributed. Based on the results shown in Table 3, the RNN topology indicated by the LWG test (i.e., without the volatility changes) generates the best performance. The total return of the trading strategy is 50.3% for the bull period and 28.4% for the bear period, whereas a buy-and-hold (B&H) policy earns 27.6% and −18.4% (a loss), respectively. When transaction costs are included, the trading strategy based on the RNN model again earns significantly more than the B&H strategy, namely 38.9% and 14.7% for the bull and bear markets, respectively. Moreover, the proportion of correctly predicted signs is 54.6% and 51.6% for the bull and bear markets, while the HM and PT test statistics are strongly significant, especially for the bull period. In addition, the chosen strategy performs better during both periods according to the Sharpe ratio (SR) and ideal profit (IP) indices, and its MSFE is the lowest among all topologies. The overall return is superior to the B&H policy during all market subperiods, even when transaction costs are considered. This result is in line with those of Fernández-Rodriguez, Sosvilla-Rivero, and Garca-Artiles (1999) and Fernández-Rodriguez et al. (2000) on the Nikkei and Madrid stock market general indices, and with Pesaran and Timmermann (1995), who identified a relation between periods of high volatility and periods of higher-than-normal predictability.
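The PT statistic admits a compact implementation. The following sketch follows the standard form of the Pesaran and Timmermann (1992) statistic; function and variable names are ours:

```python
import numpy as np

def pesaran_timmermann(actual, forecast):
    """Pesaran-Timmermann (1992) nonparametric test of directional accuracy.
    Under the null of no predictability the statistic is asymptotically N(0,1)."""
    a = np.asarray(actual) > 0                # realized sign indicators
    f = np.asarray(forecast) > 0              # forecasted sign indicators
    n = a.size
    p_hat = np.mean(a == f)                   # observed proportion of correct signs
    py, px = a.mean(), f.mean()
    p_star = py * px + (1 - py) * (1 - px)    # expected hit rate under independence
    v_hat = p_star * (1 - p_star) / n
    v_star = ((2 * py - 1) ** 2 * px * (1 - px)
              + (2 * px - 1) ** 2 * py * (1 - py)
              + 4 * py * px * (1 - py) * (1 - px) / n) / n
    return (p_hat - p_star) / np.sqrt(v_hat - v_star)
```

A statistic above 1.645 (2.326) rejects the null of no directional predictability at the one-sided 5% (1%) level.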
Next, the trading strategies that include the conditional volatility measure are evaluated. The first conclusion that emerges is that profitability decreases (slightly or significantly) in both periods when the MA(20), RM(0.94), or GARCH(1,1) daily volatility changes are incorporated into the input state space of the RNN. These strategies do not seem capable of achieving profits in excess of the "no volatility" or the B&H strategies. Interestingly enough, the sign rate and the HM and PT tests also worsen when the conditional volatility measures are included in the RNN topology. Furthermore, the other performance indices, namely the SR, IP, and MSFE, deteriorate relative to the "no volatility" case, in particular for the bear market period. The only exceptions might be the RNN–MA(20) and RNN–GARCH(1,1) strategies in the bull period, albeit both produce worse results than the B&H strategy in terms of return and sign rate. The comparison of the three specifications for volatility estimation against the simple RNN model shows that the RNN without the volatility proxy, as indicated by the LWG test, produces sign forecasts that are not only no worse than those obtained from more complicated econometric models of conditional volatility, but significantly better in terms of predictability and profitability.
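The three conditional-volatility proxies compared above can be sketched as follows. Treating MA(20) as a rolling sample standard deviation, the variance initializations, and the illustrative GARCH coefficients are our assumptions (in practice the GARCH(1,1) parameters are estimated by maximum likelihood):

```python
import numpy as np

def ma_vol(r, window=20):
    """MA(window) proxy: rolling sample standard deviation of returns."""
    r = np.asarray(r, float)
    return np.array([r[t - window:t].std() for t in range(window, r.size + 1)])

def riskmetrics_var(r, lam=0.94):
    """RiskMetrics EWMA variance with decay factor lambda = 0.94:
    s2_t = lam * s2_{t-1} + (1 - lam) * (r_{t-1} - mu)^2."""
    r = np.asarray(r, float) - np.mean(r)    # demean once, as a simplification
    s2 = np.empty(r.size)
    s2[0] = r.var()                          # initialize at the sample variance
    for t in range(1, r.size):
        s2[t] = lam * s2[t - 1] + (1 - lam) * r[t - 1] ** 2
    return s2

def garch_var(r, a0=1e-6, a1=0.06, b=0.93):
    """GARCH(1,1) variance recursion for given (illustrative) parameters:
    s2_t = a0 + a1 * (r_{t-1} - mu)^2 + b * s2_{t-1}."""
    r = np.asarray(r, float) - np.mean(r)
    s2 = np.empty(r.size)
    s2[0] = r.var()
    for t in range(1, r.size):
        s2[t] = a0 + a1 * r[t - 1] ** 2 + b * s2[t - 1]
    return s2
```

The daily *changes* of any of these series can then be appended to the lagged returns in the RNN input state space.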

Finally, a comparative investigation is attempted with respect to the linear autoregressive model, AR(LS). In this case, the total return of the linear model is substantially better than that of the RNN with volatility included, even when transaction costs are considered. The other statistical measures also show better forecasting performance than the volatility-based topologies, with the exception of the RNN–MA(20) and RNN–GARCH(1,1) strategies in the bull period, which appear to "beat" the simple linear model. However, the HM and PT tests reveal that the AR(LS) approach produces results that are insignificant at the 10%, 5%, and 1% levels. When compared to the plain RNN model, the linear approach underperforms on all measures except perhaps total return, which nonetheless deteriorates considerably once transaction costs are included. This evidence can be explained by the LWG test results. Specifically, the LWG test strongly detected a "neural network nonlinearity in the mean." It therefore seems plausible that the RNN topology shows a comparative superiority over the topologies "enriched" with the conditional volatility changes, as the neural network model is considered the true data-generating process for the mean but not for the variance. Interestingly, the addition of the volatility measure to the state space deteriorated the out-of-sample performance relative to the model proposed by the LWG test, in most cases below the performance of a simple linear benchmark. However, this may well be a characteristic of the DJIA time series examined in this work; hence, a generic result cannot be inferred for all financial series. The results in this work are also in accordance with the conclusions reached by Christoffersen and Diebold (2006).
The profitability of the trading rules, as well as the sign rate indicator, improves substantially over the bear market period relative to a B&H strategy, as that paper implies for higher-volatility periods such as the post-bubble interval in our test. Moreover, for bear markets this improvement may not be reflected strongly in the HM and PT tests, since these tests have no power to detect sign dependence in the presence of nonzero expected returns (Christoffersen & Diebold, 2006).
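To make the short-term memory mechanism concrete, a minimal Elman (1990)-style recurrent unit is sketched below: the hidden activations are copied into context units and fed back at the next step. The topology, weight initialization, and sign thresholding are illustrative assumptions, not the paper's estimated architecture:

```python
import numpy as np

class ElmanCell:
    """Elman-style recurrent layer: context units store the previous
    hidden activations, emulating short-term memory."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # feedback weights
        self.w_out = rng.normal(0.0, 0.1, n_hidden)
        self.context = np.zeros(n_hidden)                        # short-term memory

    def step(self, x):
        h = np.tanh(self.W_in @ x + self.W_ctx @ self.context)
        self.context = h                  # remember this state for the next step
        return float(self.w_out @ h)      # raw output signal

# An output > 0 is read as a "buy" signal for the next day, < 0 as "sell"
net = ElmanCell(n_in=2, n_hidden=4)
signal = net.step(np.array([0.012, -0.005]))   # e.g., two lagged returns as inputs
position = 1 if signal > 0 else -1
```

Because the context vector carries information from past inputs, the same input produces a different output at different points of the sequence, which is precisely the memory effect a feedforward network lacks.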

6. Conclusions

In this study, we expanded the literature that evaluates return sign forecasting ability based on neural networks. The main objective was the development of a heuristic learning algorithm that mimics the decision-making process of boundedly rational agents in the dynamic and noisy environment of stock markets. An approach combining learning and short-term memory emulation was followed, in particular a recurrent neural network approach that leads to superior predictions of the direction of change of the market. The Lee–White–Granger (LWG) statistic was applied to daily returns of the DJIA in order to justify the use of neural networks. Indeed, the results demonstrated that the null of linearity was strongly rejected and that the nonlinearity detected was a "neural network nonlinearity in the mean." Thus, the neural network could be considered the best model of the true nonlinear data-generating process. As benchmarks, a linear autoregressive model and a buy-and-hold (B&H) strategy were used. We first attempted to replicate previous evidence in the literature according to which the forecasting ability of simple rules outperforms B&H profits, especially under bear market conditions. Moreover, we examined the relationship between stock return volatility and stock returns. In this direction, and beyond the existing practice of feeding neural networks with return lags, moving averages, etc., the incorporation of volatility changes in addition to endogenous return lags was investigated. The purpose of the paper was to investigate concretely the relative direction-of-change predictability of the volatility-based trading models against the neural network structure implied by the Lee–White–Granger test, as well as against other well-established models. Hence, the trading rule revisions included


the conditional volatility of the DJIA index produced from alternative estimation techniques. The volatility incorporation did not generate a substantial improvement in total returns or in profitability per unit of risk over bull and bear market periods. The results indicated that the recurrent neural network topology proposed in this study was correctly trained to predict the direction of the market on the following day. It seems plausible that the RNN topology shows a comparative superiority over the topologies "enriched" with the conditional volatility changes, as the neural network model is considered by the Lee–White–Granger test to be the true data-generating process for the mean but not for the variance. Interestingly, in most cases the addition of volatility to the state space deteriorated predictability and profitability relative to the proposed model, even below the performance of a simple linear benchmark. However, this may well be an idiosyncratic characteristic of the DJIA time series examined in this work; hence, a generic result cannot be inferred for all financial series. An interesting subject for future research is the investigation of the forecasting ability of the proposed recurrent neural network architecture, as well as of other topologies, for many stock indices during the financial crisis of 2007–2010 and the Eurozone debt crisis. Perhaps the enhanced predictability of the proposed heuristic methodology could also imply that stock markets might not be efficiently priced, especially during crisis periods.

References

Adya, M., & Collopy, F. (1998). How effective are neural networks at forecasting and prediction? A review and evaluation. Journal of Forecasting, 17, 481–495.
Bekaert, G., & Wu, G. (2000). Asymmetric volatility and risk in equity markets. The Review of Financial Studies, 13(1), 1–42.
Black, F. (1986). Noise. Journal of Finance, 41, 529–543.
Box, G., & Jenkins, G. (1970). Time Series Analysis: Forecasting and Control. San Francisco, CA: Holden-Day.
Camillo, L. (2008). A combined signal approach to technical analysis on the S&P 500. Journal of Business & Economics Research, 6, 41–51.
Chen, S. N. (1982). An examination of risk return relationship in bull and bear markets using time varying betas. Journal of Financial and Quantitative Analysis, 17(2), 265–286.
Christie, A. A. (1982). The stochastic behavior of common stock variances—value, leverage and interest rate effects. Journal of Financial Economics, 10, 407–432.
Christoffersen, P. F., & Diebold, F. X. (2006). Financial asset returns, direction-of-change forecasting, and volatility dynamics. Management Science, 52, 1273–1287.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2, 303–314.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
Ersoy, O. (1990). Tutorial at Hawaii International Conference on Systems Sciences.
Fabozzi, F. J., & Francis, J. C. (1977). Stability tests for alphas and betas over bull and bear market conditions. Journal of Finance, 32(4), 1093–1099.
Fama, E. F., & Blume, M. E. (1966). Filter rules and stock-market trading. Journal of Business, 39, 226–241.
Fama, E. F., & French, K. R. (1995). Size and book-to-market factors in earnings and returns. Journal of Finance, 50, 131–155.
Fernández-Rodriguez, F., Gonzalez-Martel, C., & Sosvilla-Rivero, S. (2000). On the profitability of technical trading rules based on artificial neural networks: evidence from the Madrid stock market. Economics Letters, 69, 89–94.
Fernández-Rodriguez, F., Sosvilla-Rivero, S., & Garca-Artiles, M. D. (1999). Dancing with bulls and bears: nearest-neighbour forecasts for the Nikkei index. Japan and the World Economy, 11, 395–413.
Foresee, F. D., & Hagan, M. T. (1997). Gauss–Newton approximation to Bayesian learning. Proceedings of IEEE International Conference on Neural Networks, 3, 1930–1935.
Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2, 183–192.
Gençay, R. (1998a). The predictability of security returns with simple technical trading rules. Journal of Empirical Finance, 5, 347–359.
Gençay, R. (1998b). Optimization of technical strategies and the profitability in security markets. Economics Letters, 59, 249–254.
Giot, P. (2005). Relationships between implied volatility indexes and stock index returns. Journal of Portfolio Management, 92–100.
Green, H., & Pearson, M. (1994). Neural nets for foreign exchange trading. In Trading on the Edge: Neural, Genetic, and Fuzzy Systems for Chaotic Financial Markets. New York: Wiley.
Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.


Hecht-Nielsen, R. (1989). Theory of the backpropagation neural network. Proceedings of the International Joint Conference on Neural Networks, Washington, DC, I (pp. 593–605). New York: IEEE Press.
Henriksson, R. D., & Merton, R. C. (1981). On market timing and investment performance. II. Statistical procedures for evaluating forecasting skills. Journal of Business, 54, 513–533.
Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–802.
Hommes, C. H. (2001). Financial markets as complex adaptive evolutionary systems. Quantitative Finance, 1, 149–167.
Hommes, C. H. (2006). Heterogeneous agent models in economics and finance. In L. Tesfatsion, & K. L. Judd (Eds.), Handbook of Computational Economics, Volume 2: Agent-Based Computational Economics (pp. 1109–1186). Elsevier Science B.V.
Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4, 251–257.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359–366.
Hsu, P.-H., & Kuan, C.-M. (2005). Reexamining the profitability of technical analysis with data snooping checks. Journal of Financial Econometrics, 3, 606–628.
Irwin, S. H., & Park, C. H. (2007). What do we know about the profitability of technical analysis? Journal of Economic Surveys, 21, 786–826.
Jasic, T., & Wood, D. (2004). The profitability of daily stock market indices trades based on neural network predictions: case study for the S&P 500, the DAX, the TOPIX, and the FTSE in the period 1965–1999. Applied Financial Economics, 14, 285–297.
Kao, G. W., & Ma, C. K. (1992). Memories, heteroscedasticity and price limits in currency futures markets. Journal of Futures Markets, 12, 672–692.
Katz, J. O. (1992). Developing neural network forecasters for trading. Technical Analysis of Stocks and Commodities, 58–70.
Kaufman, P. J. (1998). Trading Systems and Methods (3rd ed.). New York: John Wiley & Sons.
Kirkpatrick, C. D., & Dahlquist, J. R. (2007). Technical Analysis: The Complete Resource for Financial Market Technicians. Upper Saddle River, NJ: Financial Times Press.
Krugman, P. (1987). Trigger Strategies and Price Dynamics in Equity and Foreign Exchange Markets. NBER Working Paper No. 2459.
Kuan, C.-M., & White, H. (1994). Artificial neural networks: an econometric perspective. Econometric Reviews, 13, 1–91.
La Porta, R., Lakonishok, J., Shleifer, A., & Vishny, R. (1997). Good news for value stocks: further evidence on market efficiency. Journal of Finance, 52, 859–874.
Lee, T.-H., White, H., & Granger, C. W. J. (1993). Testing for neglected nonlinearity in time series models. Journal of Econometrics, 56, 269–290.
Levich, R. M., & Thomas, L. R. (1993). The significance of technical trading rule profits in the foreign exchange market: a bootstrap approach. In Strategic Currency Investing: Trading and Hedging in the Foreign Exchange Market (pp. 336–365). Chicago: Probus.
Lo, A. W., Mamaysky, H., & Wang, J. (2000). Foundations of technical analysis: computational algorithms, statistical inference, and empirical implementation. Journal of Finance, 55, 1705–1765.
Lunde, A., & Timmermann, A. (2004). Duration dependence in stock prices: an analysis of bull and bear markets. Journal of Business and Economic Statistics, 22(3), 253–273.
Masters, T. (1993). Advanced Algorithms for Neural Networks. John Wiley.
Murphy, J. J. (1999). Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications. Paramus, NJ: Prentice Hall Press.
Pesaran, M. H., & Timmermann, A. (1992). A simple non-parametric test of predictive performance. Journal of Business and Economic Statistics, 10, 461–465.
Pesaran, M. H., & Timmermann, A. (1995). Predictability of stock returns: robustness and economic significance. Journal of Finance, 50, 1201–1228.
Plummer, T., & Ridley, A. (2003). Forecasting Financial Markets: The Psychology of Successful Investing. London: Kogan Page.
Poddig, A. (1993). Short term forecasting of the USD/DM exchange rate. Proceedings of the First International Workshop on Neural Networks in Capital Markets, London.
Rawani, A. (1993). Forecasting and trading strategy for the foreign exchange market. Information Decision Technology, 1, 19.
Shiller, R. J. (2002). From Efficient Market Theory to Behavioral Finance. Cowles Foundation Discussion Paper No. 1385.
Shleifer, A., & Summers, L. H. (1990). The noise trader approach to finance. Journal of Economic Perspectives, 4(2), 19–33.
Simon, H. A. (1957). Models of Man. New York, NY: Wiley.
Weigend, A. S. (1991). Generalization by weight-elimination applied to currency exchange rate prediction. Proceedings of the IEEE International Joint Conference on Neural Networks, Singapore.
White, H. (1989). Learning in artificial neural networks: a statistical perspective. Neural Computation, 1, 425–464.
Yao, J. T., Poh, H.-L., & Jasic, T. (1996). Foreign exchange rates forecasting with neural networks. Proceedings of the International Conference on Neural Information Processing, Hong Kong (pp. 754–759).
Zhang, X. R. (1994). Non-linear predictive models for intra-day foreign exchange trading. International Journal of Intelligent Systems in Accounting, Finance and Management, 3(4), 293–302.