North American Journal of Economics and Finance 17 (2006) 139–153
On the informational efficiency of S&P500 implied volatility

Ralf Becker a, Adam E. Clements b,∗, Scott I. White b

a Economic Studies, School of Social Sciences, University of Manchester, UK
b School of Economics and Finance, Queensland University of Technology, GPO Box 2434, Brisbane Q 4001, Australia

Received 23 November 2004; received in revised form 19 October 2005; accepted 31 October 2005
Available online 1 December 2005
Abstract

Implied volatility is often considered to represent a market's prediction of future volatility. If such a market were to generate efficient volatility forecasts, implied volatility should reflect all relevant conditioning information. The purpose of this paper is to determine whether a publicly available and commonly used implied volatility index, the VIX index (as published by the Chicago Board of Options Exchange), is in fact efficient with respect to a wide set of conditioning information. Results indicate that the VIX index is not efficient with respect to all elements in the information set that may be used to form volatility forecasts.
© 2005 Elsevier Inc. All rights reserved.

JEL classification: C12; C22; G00; G14

Keywords: Implied volatility; Information; Realized volatility; VIX index
1. Introduction

To determine option values, market participants require an estimate of the future volatility of the underlying asset. Conditional on the option-pricing model, implied volatility derived from option prices should reflect all relevant conditioning information and represent the market's best prediction of the underlying asset's future volatility.1 The purpose of this paper is to determine whether or not a publicly available implied volatility index is in fact efficient with respect to commonly available conditioning information.
∗ Corresponding author. Tel.: +61 7 3864 2525; fax: +61 7 3864 1500. E-mail address: [email protected] (A.E. Clements).
1 See, amongst others, Jorion (1995) and Poon and Granger (2003).
Canina and Figlewski (1993) and Lamoureux and Lastrapes (1993) have considered the forecast efficiency of implied volatility. In the former, implied volatility estimates were derived from S&P100 index options, whereas the latter utilized options on selected companies. Both studies conclude that, in addition to implied volatility, historical information is useful with regard to volatility forecasts.

This paper is closely related to both Fleming (1998) and Jiang and Tian (2003). Fleming (1998) examines whether S&P100 implied volatility reflects all available historical conditioning information. While Fleming (1998) finds that implied volatility produces biased forecasts of future actual volatility (as defined by average squared returns), forecast errors are orthogonal to historical information. The results reported in Fleming (1998) indicate that S&P100 implied volatility subsumes a wider set of historical information than the implied volatility measures utilized by Canina and Figlewski (1993) and Lamoureux and Lastrapes (1993). Jiang and Tian (2003) consider the informational efficiency of a model-free estimate of S&P500 implied volatility. They find that this measure of implied volatility is an unbiased and efficient forecast of future volatility, subsuming both Black-Scholes implied volatility estimates and lagged volatility. Christensen and Prabhala (1998) come to the same conclusion with different implied volatility measures; however, they utilize a rather small information set to establish their efficiency result.

This study considers the forecast efficiency of S&P500 implied volatility by incorporating recent developments in volatility measurement and encompassing tests to update the results of Fleming (1998). This study also provides a supplementary analysis of forecast efficiency to Jiang and Tian (2003), as a much wider set of conditioning information is utilized. The role of a possible volatility risk premium, as discussed in Chernov (2002), will also be considered. The measure of S&P500 implied volatility used here is the VIX index, published by the Chicago Board of Options Exchange (CBOE), a widely used measure of index volatility over a 22 trading-day horizon and a model-free estimate similar to that used by Jiang and Tian (2003). In contrast, Fleming (1998) computes estimates of S&P100 implied volatility with differing horizons and thus does not consider such a publicly available measure of implied volatility with a constant forecast horizon. Furthermore, while Fleming (1998) captures actual volatility with only squared daily returns, this research also utilizes the realized volatility (RV) estimate of Andersen, Bollerslev, Diebold, and Labys (2001, 2003).

This paper finds evidence against the hypothesis that the VIX is an efficient volatility forecast. Such evidence is strongest when information is sampled daily. In the literature, monthly sampling has frequently been employed to avoid certain econometric difficulties arising from overlapping forecasts. The evidence against the hypothesis of forecast efficiency is much weaker under this sampling scheme, which helps explain why some previous studies conclude that implied volatility forecasts are efficient.

The remainder of the paper is structured as follows. In Section 2, a general framework within which the forecast efficiency of any implied volatility estimate may be examined is discussed.
Section 3 presents the results from applying this methodology to the particular problem of assessing the forecast efficiency of VIX volatility forecasts. Conclusions are offered in Section 4.

2. Methodology

2.1. Preliminaries

At the center of our empirical investigation lies the relationship between the volatility forecast derived from the VIX index, made at time $t$, $f_t^{iv}$, and the eventual volatility outcome, $\hat{\sigma}^2_{t+1\to t+22}$,
where the subscript refers to the period for which the volatility forecast is made and the hat indicates that the volatility realization needs to be estimated. Forecast properties are often investigated in the framework of the following regression relationship (Clements & Hendry, 1998)2:

$$\hat{\sigma}^2_{t+1\to t+22} = \alpha + \beta f_t^{iv} + \varepsilon_t. \qquad (1)$$
Commonly, the restriction $\alpha = 0$ and $\beta = 1$ is taken to represent the hypothesis that $f_t^{iv}$ is an unbiased forecast. The regression residual $\varepsilon_t$ is interpreted as the forecast error of $f_t^{iv}$, taking into account any potential biases, should the above hypothesis be rejected. The question whether a forecast is efficient can be tackled in an extended regression framework.3 Consider a set of additional variables, $z_t$, available at time $t$. The forecast $f_t^{iv}$ is said to be efficient if the forecast error is uncorrelated with any information contained in $z_t$. Forecast efficiency is then equivalent to $\delta = 0$ in

$$\hat{\sigma}^2_{t+1\to t+22} = \alpha + \beta f_t^{iv} + \delta z_t + \varepsilon_t. \qquad (2)$$
Alternatively, a forecast error estimate from Eq. (1) can be evaluated for correlation with $z_t$. As the volatility forecast used is that from the VIX index, it needs to be acknowledged that the VIX forecast is fundamentally different from alternative volatility forecasts, such as GARCH or stochastic volatility forecasts. The VIX forecast is derived from options written on the S&P500. Chernov (2002) demonstrated that $f_t^{iv}$ not only reflects the market's view of future volatility, but also a volatility risk premium, which may or may not be constant. Chernov argues that allowance for a potentially time-varying volatility risk premium can be made by augmenting Eq. (1) to

$$\hat{\sigma}^2_{t+1\to t+22} = \alpha + \beta f_t^{iv} + \gamma \hat{\sigma}^2_t + \varepsilon_t, \qquad (3)$$
as the time-varying risk premium is correlated with the current value of volatility, $\sigma^2_t$. The latter, of course, is latent and needs to be estimated by $\hat{\sigma}^2_t$. After estimation of $\theta = (\alpha, \beta, \gamma)'$, the resultant residual

$$e^{iv}_t = \hat{\varepsilon}_t = \hat{\sigma}^2_{t+1\to t+22} - \alpha - \beta f_t^{iv} - \gamma \hat{\sigma}^2_t \qquad (4)$$
is interpreted as the forecast error when using the implied volatility index as a volatility forecast, after taking into account the presence of a volatility risk premium. Issues of estimating $\hat{\sigma}^2_{t+1\to t+22}$ and $\hat{\sigma}^2_t$ will be discussed below. Two strategies will be employed to test whether the VIX volatility forecast is efficient, in the sense that its forecast errors are uncorrelated with information available at the time the forecast was produced. First, forecast-encompassing tests will be employed; here, the information in $z_t$ is restricted to consist of alternative volatility forecasts. Second, a potentially more general procedure, nested in a GMM estimation of Eq. (4), will allow tests for correlation between $e^{iv}_t$ and any instrument set $z_t$. Seen in this light, it is apparent that forecast encompassing is a necessary but not a sufficient condition for forecast efficiency.
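To fix ideas, the following minimal sketch estimates regression (3) by OLS, forms the forecast error of Eq. (4), and then checks whether that error is related to candidate conditioning information in the spirit of Eq. (2). It is an illustration only, not the authors' code: the simulated series, the variable names and the use of statsmodels with a 21-lag HAC (Newey–West) correction for the overlapping forecasts are assumptions made for the example.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 2460  # number of daily forecasts, as in the paper's sample

# Placeholder series standing in for the data described in the text:
# vix_fcst   ~ f_t^iv, the VIX-implied variance forecast made at time t
# rv_current ~ sigma_hat^2_t, an estimate of current volatility
# rv_future  ~ sigma_hat^2_{t+1->t+22}, average realized volatility over the next 22 days
vix_fcst = np.abs(rng.normal(1.0, 0.3, T))
rv_current = np.abs(rng.normal(1.0, 0.3, T))
rv_future = 0.1 + 0.5 * vix_fcst + 0.1 * rv_current + rng.normal(0, 0.2, T)

# Eq. (3): sigma^2_{t+1->t+22} = alpha + beta*f_t^iv + gamma*sigma^2_t + eps_t
X = sm.add_constant(np.column_stack([vix_fcst, rv_current]))
# HAC standard errors with 21 lags allow for the 21-day forecast overlap.
fit = sm.OLS(rv_future, X).fit(cov_type="HAC", cov_kwds={"maxlags": 21})
print(fit.params)          # (alpha, beta, gamma)

# Eq. (4): the implied volatility forecast error after allowing for a risk premium
e_iv = rv_future - X @ fit.params

# Efficiency in the sense of Eq. (2): regress the forecast error on candidate
# conditioning information z_t and test whether its coefficient is zero.
alt_fcst = 0.5 * vix_fcst + rng.normal(0, 0.1, T)   # toy alternative forecast in z_t
z = sm.add_constant(alt_fcst)
eff = sm.OLS(e_iv, z).fit(cov_type="HAC", cov_kwds={"maxlags": 21})
print(eff.f_test(np.eye(2)[1:]))                    # test delta = 0
```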
2 The first use of these regressions is often attributed to Mincer and Zarnowitz (1969) and hence these types of regressions are sometimes called Mincer–Zarnowitz regressions.
3 See Jiang and Tian (2003) for an example of this strategy.
2.2. Forecast encompassing

Forecast-encompassing tests will be applied to test the hypothesis that implied volatility forecasts encompass a range of alternative volatility forecasts. The methodology follows that outlined by Harvey and Newbold (2000) for multiple forecast encompassing. A number of studies have tested whether implied volatilities encompass one particular alternative forecast (usually a GARCH forecast) for currencies (Pong, Shackleton, Taylor, & Xu, 2004) and for a range of different financial markets (Szakmary, Ors, Kim, & Davidson, 2003). Jiang and Tian (2003) provide evidence that implied volatility forecasts derived from S&P500 option implied volatilities encompass GARCH and Black-Scholes implied volatility forecasts. This study extends the work of Jiang and Tian in three ways: their implied volatility forecasts are not derived from the VIX, but from an alternative measure of implied volatility; they assume the volatility risk premium to be time invariant; and their information set used to test efficiency is restricted to GARCH volatility forecasts and Black-Scholes implied volatilities. To the best of our knowledge, this is the first study performing forecast-encompassing tests for implied volatilities. In particular, it is the first study formally evaluating the forecasts derived from the VIX, comparing these to a comprehensive set of alternative forecasts.

Forecast-encompassing tests investigate whether the forecast error of a particular forecast can be explained by the forecast errors of alternative forecasts (Clements & Hendry, 1998). If no alternative forecast error can explain that model's forecast error, then that model's forecast is judged to be efficient, or to encompass all other forecasts. Harvey and Newbold (2000) spell out the methodology to test the null hypothesis that a particular forecast encompasses all other forecasts when overlapping forecasts are evaluated. This methodology is useful in the present situation, as daily forecasts of average 22-day-ahead volatility are evaluated.

Here it is investigated whether the volatility forecast based on the VIX encompasses a range of alternative volatility forecasts. These alternative volatility forecasts are derived from econometric volatility models frequently applied in the financial econometrics literature. The models are the GARCH(1,1) (gar), an asymmetric GARCH-type GJR model (gjr), a stochastic volatility (sv) model, two time-series models of RV (arma and arfima), as well as the exponentially weighted moving average of squared returns (ewma) as applied in RiskMetrics. Some of these models are also extended by the inclusion of realized volatility as an additional explanatory variable (garrv, gjrrv, gjrrvg and svrv).4 These models are utilized to produce forecasts of average 22-day-ahead volatility. Forecasts are based on parameters estimated from a rolling 1000-observation window.

While the VIX forecast error, $e^{iv}_t$, is defined as in Eq. (4), the forecast error of the $i$th alternative forecast model is calculated according to $\hat{\sigma}^2_{t+1\to t+22} - f^i_t$, where $\hat{\sigma}^2_{t+1\to t+22}$ is an estimate of the average observed volatility from day $t+1$ to $t+22$ and $f^i_t$ is the forecast of this quantity from the $i$th model at time $t$. In order to test whether the volatility forecasts embedded in the VIX, $f_t^{iv}$, encompass rival forecasts, it is necessary to estimate the following regression5

$$e^{iv}_t = \sum_{i \in F} \lambda_i \tilde{e}^i_t + \varepsilon_t, \qquad (5)$$
4 See Becker, Clements, and White (2004) for exact specifications of these models.
5 As in Harvey and Newbold (2000), the forecast errors utilized in regression (5) are de-meaned.
with $e^{iv}_t$ being the forecast error at period $t$ of the implied volatility forecast and $\tilde{e}^i_t = e^{iv}_t - e^i_t$, where $i \in F = \{gar, garrv, gjr, gjrrv, gjrrvg, sv, svrv, arma, arfima, ewma\}$. The null hypothesis of the implied volatility forecast encompassing all other volatility forecasts is equivalent to $\lambda_i = 0$ for all $i \in F$. Two statistical tests for this hypothesis have been proposed (Harvey & Newbold, 2000), namely an F-test along with a modified Diebold–Mariano (DM) test. In both instances, allowance must be made for the 21-day overlap, given that 22-day-ahead forecasts are dealt with. The empirical properties of the F- and DM-tests have been investigated by Harvey and Newbold (2000), who conclude that size distortions are likely to occur when forecast errors are non-normally distributed, for long forecast horizons and moderate sample sizes.6 While the forecasting horizon (22 periods) used here is rather long, the sample length is also relatively large. In order to draw robust conclusions, asymptotic and bootstrap p-values will be reported for the F- and DM-tests. Furthermore, an alternative testing strategy is introduced in the following section.
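The mechanics of the encompassing regression (5) can be illustrated as follows. The sketch below regresses the (de-meaned) VIX forecast error on the forecast-error differences and jointly tests $\lambda_i = 0$ with overlap-robust standard errors. It is only an approximation to the procedure used in the paper: the Harvey–Newbold F2- and modified DM-statistics involve their own small-sample corrections, and the simulated forecast errors and the simple Wald/F-test shown here are assumptions made for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 2460

# Toy stand-ins for the forecast errors described in the text:
# e_iv ~ VIX forecast error from Eq. (4); columns of e_alt ~ errors of rival models.
e_iv = rng.normal(0, 1, T)
e_alt = pd.DataFrame(
    {name: 0.6 * e_iv + rng.normal(0, 1, T)
     for name in ["gar", "gjr", "sv", "arma", "arfima", "ewma"]}
)

# e~_i = e_iv - e_i, de-meaned as in Harvey and Newbold (2000).
e_tilde = -e_alt.sub(e_iv, axis=0)
e_tilde = e_tilde - e_tilde.mean()
y = e_iv - e_iv.mean()

# Regression (5): e_iv on the differences e~_i, no intercept (errors are de-meaned);
# HAC covariance with 21 lags accounts for the overlapping 22-day forecasts.
fit = sm.OLS(y, e_tilde).fit(cov_type="HAC", cov_kwds={"maxlags": 21})

# Joint test of lambda_i = 0 for all i: the null of forecast encompassing.
print(fit.f_test(np.eye(e_tilde.shape[1])))
print(fit.rsquared)
```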
2.3. Orthogonality of implied volatility forecast errors

This section introduces the testing strategy proposed in Fleming (1998). It evaluates whether forecast errors are orthogonal to a set of available information, $z_t$, and hence can be used to test the efficiency of a forecast. To consider the forecast efficiency of an implied volatility forecast, $f_t^{iv}$, reconsider Eq. (3):

$$\hat{\sigma}^2_{t+1\to t+22} = \alpha + \beta f_t^{iv} + \gamma \hat{\sigma}^2_t + \varepsilon_t.$$

If the sequence of zero-mean forecast errors $\{\hat{\varepsilon}_t\}$ is unrelated to any other conditioning information, implied volatility subsumes that information and is therefore an efficient forecast of future volatility. A direct way of testing the orthogonality of $\{\hat{\varepsilon}_t\}$ is proposed by Fleming (1998), who employs the Generalized Method of Moments (GMM) framework. Parameter estimates of Eq. (3) are obtained by minimizing $V = g(\alpha, \beta, \gamma)'\, H\, g(\alpha, \beta, \gamma)$, where

$$g(\alpha, \beta, \gamma) = \frac{1}{T}\sum_{t=1}^{T}\left(\hat{\sigma}^2_{t+1\to t+22} - \alpha - \beta f_t^{iv} - \gamma \hat{\sigma}^2_t\right) z_t \qquad (6)$$
and $z_t$ is a set of instruments which includes those variables to which the forecast errors are hypothesized to be orthogonal. The weighting matrix $H$ is chosen to be the variance–covariance matrix of the moment conditions in $g(\alpha, \beta, \gamma)$, where allowance is made for residual correlation.7 The traditional test of overidentifying restrictions in the GMM framework is then a test of whether the VIX subsumes all historical conditioning information. Here it is argued that the model-based volatility forecasts serve as a filter of all relevant information for future volatility. They are, therefore, prime candidates for inclusion in the instrument set $z_t$ as an economical way to summarize all available information. The argument is that econometricians have, over time, developed volatility models which are as successful as possible in modelling volatility. Deficiencies of early models were, where possible, eliminated by including additional information, such as asymmetric return information, into the models.
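A schematic version of this GMM procedure is sketched below: linear GMM estimation of Eq. (3) with a long-run (truncated-kernel, Hansen–Hodrick style) covariance matrix of the moment conditions, followed by the J-test of the overidentifying restrictions. The simulated data, the iterated weighting-matrix update and the specific kernel choice are assumptions made for the illustration, not the exact implementation used in the paper.

```python
import numpy as np
from scipy import stats

def hac_truncated(m, lags=21):
    """Long-run covariance of the moment series m (T x q), truncated kernel, equal weights."""
    T = m.shape[0]
    m = m - m.mean(axis=0)
    S = m.T @ m / T
    for l in range(1, lags + 1):
        gamma = m[l:].T @ m[:-l] / T
        S += gamma + gamma.T
    return S

def gmm_jtest(y, X, Z, lags=21):
    """Linear GMM of y = X*theta with instruments Z; returns theta, J-statistic and its p-value."""
    T = len(y)
    W = np.linalg.inv(Z.T @ Z / T)                  # first-step weighting matrix
    for _ in range(2):                              # update with the HAC weighting matrix
        A = X.T @ Z @ W @ Z.T @ X
        b = X.T @ Z @ W @ Z.T @ y
        theta = np.linalg.solve(A, b)
        m = (y - X @ theta)[:, None] * Z            # moment conditions (T x q)
        W = np.linalg.inv(hac_truncated(m, lags))
    gbar = m.mean(axis=0)
    J = T * gbar @ W @ gbar                         # test of overidentifying restrictions
    dof = Z.shape[1] - X.shape[1]
    return theta, J, stats.chi2.sf(J, dof)

# Toy data: y ~ sigma^2_{t+1->t+22}, X = [1, f_iv, sigma^2_t], Z adds rival forecasts.
rng = np.random.default_rng(2)
T = 2460
f_iv, sig2 = np.abs(rng.normal(1, 0.3, T)), np.abs(rng.normal(1, 0.3, T))
y = 0.1 + 0.5 * f_iv + 0.1 * sig2 + rng.normal(0, 0.2, T)
X = np.column_stack([np.ones(T), f_iv, sig2])
rival = 0.5 * f_iv[:, None] + rng.normal(0, 0.3, (T, 4))  # stand-ins for model-based forecasts
Z = np.column_stack([X, rival])

theta, J, pval = gmm_jtest(y, X, Z)
print(theta, J, pval)
```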
6 Harvey and Newbold propose two versions of their F-test for forecast encompassing; the test applied here is their F2-test, which tends to be conservative.
7 See Hansen and Hodrick (1980).
If the instrument set is restricted to contain volatility forecasts from alternative volatility models and the parameter estimates obtained from (6) are similar to those from Eq. (4), then this test is equivalent to testing whether the $f_t^{iv}$ forecast encompasses the alternative volatility forecasts. The GMM test is clearly more general, as it allows non-volatility forecast information to be included in $z_t$. If, however, the parameter vector $\hat{\theta}_{GMM} = (\hat{\alpha}, \hat{\beta}, \hat{\gamma})'$ is different from that in (4), then $(\hat{\sigma}^2_{t+1\to t+22} - \alpha - \beta f_t^{iv} - \gamma \hat{\sigma}^2_t)$ loses its interpretation as a forecast error and the test for overidentifying restrictions cannot be thought of as a test for forecast efficiency.

A number of instrument sets will be considered.8 The sets $z_t^{(1)} = \{1, f_t^{iv}, \hat{\sigma}^2_t\}$ and $z_t^{(2)} = \{1, f_{t-1}^{iv}, \hat{\sigma}^2_{t-1}\}$ merely reflect the information used on the RHS of Eq. (3). The latter will enable us to test whether $f_t^{iv}$ is efficient with respect to its own history and past volatility information. Available measures of absolute returns and a measure of asymmetric returns are included in $z_t^{(4)} = \{|r_t|, r_t^-\}$, variables considered by Fleming (1998). The instrument sets $z_t^{(5)} = \{f_t^{gar}, f_t^{gjr}, f_t^{sv}, f_t^{ewma}\}$, $z_t^{(6)} = \{f_t^{arma}, f_t^{arfima}\}$ and $z_t^{(7)} = \{f_t^{garrv}, f_t^{gjrrv}, f_t^{gjrrvg}, f_t^{svrv}\}$ contain volatility forecasts generated from the econometric models introduced in Section 2.2. These sets are split into forecasts from basic econometric models ($z_t^{(5)}$), time-series models of RV ($z_t^{(6)}$) and econometric models augmented by RV ($z_t^{(7)}$), respectively. The final instrument set, $z_t^{(8)} = \{pc_t^{(1)}, pc_t^{(2)}, pc_t^{(3)}, pc_t^{(4)}\}$, contains the first four principal components of a principal component analysis of the 10 alternative forecasts captured in instrument sets $z_t^{(5)}$, $z_t^{(6)}$ and $z_t^{(7)}$.9 This is considered to economize on the number of instruments, without sacrificing potentially important information.

A number of econometric issues arise when applying the above tests. Foremost is the issue of overlapping observations. While both the forecast-encompassing tests and the GMM orthogonality tests utilize adjusted variance–covariance matrices, the empirical properties of these corrections are largely unknown and are likely to depend on very data-specific issues, such as the marginal distribution of the data. Jiang and Tian address this issue by sampling the required information only once monthly and hence avoiding any forecast overlap. This, of course, is at the cost of a significant loss of information. In order to evaluate the impact of this sampling scheme, all tests are replicated with monthly observations only.10

Another issue concerns the measurement of volatility, which is inherently latent. This problem arises in the estimation of $\hat{\sigma}^2_{t+1\to t+22}$ and $\hat{\sigma}^2_t$. In the following section a number of alternative estimates for both these variables are discussed, and the tests will be applied using these alternative volatility estimates. There is a tradition of modelling volatility in a log–linear framework.11 One advantage of doing so is that positive volatility estimates are automatically guaranteed. Recently, Andersen et al. (2001) established that log volatility is, even unconditionally, close to normally distributed. Jiang and Tian (2003) call on these arguments and propose to establish the efficiency of $\log(f_t^{iv})$ as a forecast for $\log(\hat{\sigma}^2_{t+1\to t+22})$. In the context of this paper, the VIX volatility forecast efficiency will be tested in both the linear and the log–linear framework.
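The construction of the final instrument set can be illustrated with a short principal-component extraction over a panel of model-based forecasts. The forecast matrix below is simulated and the use of scikit-learn is an assumption made for the example; any standard PCA routine would serve.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
T = 2460

# Toy panel standing in for the 10 model-based volatility forecasts in z(5), z(6), z(7):
# the columns share a common component, mimicking highly correlated forecasts.
common = np.abs(rng.normal(1.0, 0.3, T))
forecasts = common[:, None] + rng.normal(0, 0.05, (T, 10))

# First four principal components, to be used as the instrument set z(8).
pca = PCA(n_components=4)
pc = pca.fit_transform(forecasts)

print(pc.shape)                              # (T, 4)
print(pca.explained_variance_ratio_.sum())   # share of variation captured by the 4 components
```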
8 All instrument sets only contain information up to and including time $t$. No set includes information from within the forward-looking period $t+1 \to t+22$.
9 It transpires that the first four principal components in $z_t^{(8)}$ explain 97.5% of the variation in the 10 volatility forecasts for the given sample.
10 The sampling scheme of Jiang and Tian is replicated. We sample the Wednesday which follows the third Friday of the month. Should that day be a holiday, the Thursday or Tuesday is considered as an alternative sampling date.
11 One of the first examples is the EGARCH model of Nelson (1991).
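The monthly sampling scheme in footnote 10 amounts to a simple date-selection rule. The sketch below is a hypothetical implementation: for each month it picks the Wednesday following the third Friday, falling back to the Thursday or Tuesday when that Wednesday is not a trading day; the trading calendar used here is itself an assumption.

```python
import pandas as pd

def monthly_sampling_dates(trading_days: pd.DatetimeIndex) -> list:
    """For each month, return the Wednesday after the third Friday (Thu/Tue as fallbacks)."""
    selected = []
    days = pd.Series(index=trading_days, dtype=float)
    for _, month in days.groupby([trading_days.year, trading_days.month]):
        fridays = [d for d in month.index if d.weekday() == 4]
        if len(fridays) < 3:
            continue
        third_friday = fridays[2]
        # Wednesday after the third Friday is 5 days later; Thursday (+6) and Tuesday (+4) are fallbacks.
        for offset in (5, 6, 4):
            candidate = third_friday + pd.Timedelta(days=offset)
            if candidate in trading_days:
                selected.append(candidate)
                break
    return selected

# Example with a simple weekday calendar standing in for actual S&P500 trading days.
calendar = pd.bdate_range("1994-01-01", "2003-10-17")
print(monthly_sampling_dates(calendar)[:3])
```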
3. Empirical analysis

3.1. Data

It is the aim of this paper to test the forecast efficiency of implied volatility forecasts derived from options written on the S&P500 share index. The sample considered in this study is daily S&P500 index data, from 2 January 1990 to 17 October 2003 (3481 daily observations). The implied volatility measure utilized here is that provided by the Chicago Board of Options Exchange, the VIX.12 The VIX is an implied volatility index derived from a number of put and call options on the S&P500 index, which generally have strike prices close to the current index value and maturities close to the target of 22 trading days.13 It is derived without reference to a restrictive option-pricing model.14 After allowing for a potential volatility risk premium, the VIX is constructed to be a general measure of the market's estimate of average S&P500 volatility over the subsequent 22 trading days (Blair, Poon, & Taylor, 2001; Christensen & Prabhala, 1998).15 As highlighted by Jiang and Tian (2003), the advantages of such a model-free approach to computing implied volatility are two-fold.16 Relative to a model-based estimate such as Black-Scholes, a model-free estimate incorporates more information from a range of observed option prices. In the current context, utilizing model-free estimates avoids a joint test of both the option-pricing model and market efficiency.

Before embarking on the empirical application of the forecast-encompassing and orthogonality tests, three more data issues have to be addressed. In turn, the choice of estimates for $\hat{\sigma}^2_{t+1\to t+22}$ and $\hat{\sigma}^2_t$, as well as some more details about the alternative volatility forecasts, will be discussed. Two estimates of $\hat{\sigma}^2_{t+1\to t+22}$ will be used in this paper. In order to be able to compare results to those from Fleming (1998), average squared daily returns, $\bar{r}^2_{t+1\to t+22}$, will be used as an estimate of observed volatility. The second measure utilized is average realized volatility (RV) during the days $t+1$ to $t+22$, $RV_{t+1\to t+22}$. Daily estimates of RV, $RV_t$, which have been shown to be less noisy than $r^2_t$, are constructed from intra-day S&P500 index data.17 Individual observations of $RV_t$ are combined to create $RV_{t+1\to t+22}$. In dealing with practical issues such as intra-day seasonality and sampling frequency when constructing daily $RV_t$, the signature plot methodology of Andersen, Bollerslev, Diebold, and Labys (1999) is followed. Given this approach, daily $RV_t$ estimates are constructed using 30 min S&P500 index returns. As for the choice of estimates of current volatility, $\hat{\sigma}^2_t$, required to allow for a time-varying volatility risk premium, two potential measures are chosen: $RV_t$, along with the range estimator used by Chernov (2002).18
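To make these measures concrete, the sketch below builds a daily realized volatility series from (simulated) 30-min returns, aggregates it to the 22-day average used as the forecast target, and converts a VIX quote into a daily variance following the scaling described in footnote 13. The shape of the intraday return panel is an assumption made for the example.

```python
import numpy as np
import pandas as pd

# Assumed layout: one row per trading day, 13 columns of 30-min S&P500 log returns.
rng = np.random.default_rng(4)
n_days, n_intraday = 2500, 13
intraday_returns = pd.DataFrame(rng.normal(0, 0.004, (n_days, n_intraday)))

# Daily realized volatility: sum of squared 30-min returns (Andersen et al., 2001, 2003).
rv_daily = (intraday_returns ** 2).sum(axis=1)

# Forecast target: average realized volatility over days t+1 to t+22.
rv_future = rv_daily.rolling(22).mean().shift(-22)

# VIX quote -> daily variance: the VIX is 100 x the annualized return standard deviation,
# so the daily variance is (VIX / (100 * sqrt(252)))^2 (see footnote 13 and CBOE, 2003).
vix_quote = 20.0
daily_variance = (vix_quote / (100 * np.sqrt(252))) ** 2
print(daily_variance)   # approx. 1.59e-4, i.e. about a 1.26% daily standard deviation
```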
12 The VIX index used here is the most recent version of the index, introduced on September 22, 2003. VIX data for this study were downloaded from the CBOE website.
13 The daily volatility implied by the VIX can be calculated when recognizing that the VIX quote is equivalent to 100 times the annualized return standard deviation. Hence, $(\mathrm{VIX}/(100\sqrt{252}))^2$ represents the daily volatility measure (see CBOE, 2003).
14 For technical details relating to the construction of the VIX index, see CBOE (2003).
15 Quoting from the CBOE White paper (2003) on the VIX, "VIX [. . .] provide[s] a minute-by-minute snapshot of expected stock market volatility over the next 30 calendar days."
16 They utilize a different approach to that embodied in the calculation of the VIX.
17 Intraday S&P500 index data were purchased from Tick Data, Inc. See Andersen et al. (2001, 2003) for a discussion of RV.
18 The range estimator used by Chernov calculates the squared difference between the maximum and minimum intra-day return. Parkinson (1980) proposed a range-based proxy for volatility, defined as the difference between the intra-day maximum and minimum prices. Using this estimator does not qualitatively change the results to be presented below. (Results are available upon request from the authors.)
In order to generate model-based volatility forecasts which capture the information available at time $t$ efficiently, the volatility models were re-estimated for every $t$ using data from $t-999$ to $t$. The resulting parameter values were then used to generate 22-day-ahead volatility forecasts. The first forecast period covers the trading period from 13 December 1993 to 12 January 1994. For subsequent forecasts, the model parameters were re-estimated using a sliding estimation window of 1000 observations. The last forecast period covers 18 September 2003 to 17 October 2003, leaving 2460 forecasts.

3.2. Forecast encompassing of VIX volatility forecasts

The encompassing tests based on the estimation of Eq. (5) were implemented for $\hat{\sigma}^2_{t+1\to t+22}$ being estimated by $RV_{t+1\to t+22}$ and $\bar{r}^2_{t+1\to t+22}$, and in the linear and log–linear form. The implied volatility forecast at time $t$, $f_t^{iv}$, is derived from the value of the VIX index observed at time $t$. Table 1 reports the results of the forecast-encompassing tests. Besides the different estimates for $\hat{\sigma}^2_{t+1\to t+22}$, the implied volatility forecast error further depends on the parameter values in Eq. (3) and the estimate of $\hat{\sigma}^2_t$. The following four definitions of the VIX forecast error represent different parameter restrictions and estimates of $\hat{\sigma}^2_t$. For $e^{iv,1}_t$, the parameters are restricted to $\alpha = 0$, $\beta = 1$, $\gamma = 0$; consequently this definition assumes the unbiasedness of $f_t^{iv}$, an assumption which is empirically rejected. The forecast error $e^{iv,2}_t$ is defined to allow for a constant volatility risk premium, whereas the inclusion of an estimate for the current volatility level in $e^{iv,3}_t$ and $e^{iv,4}_t$ allows for a potentially time-varying risk premium.

$$e^{iv,1}_t = \hat{\sigma}^2_{t+1\to t+22} - f_t^{iv}$$
$$e^{iv,2}_t = \hat{\sigma}^2_{t+1\to t+22} - (\alpha + \beta f_t^{iv})$$
$$e^{iv,3}_t = \hat{\sigma}^2_{t+1\to t+22} - (\alpha + \beta f_t^{iv} + \gamma \hat{\sigma}^2_t)$$
$$e^{iv,4}_t = \hat{\sigma}^2_{t+1\to t+22} - (\alpha + \beta f_t^{iv} + \gamma\, \mathrm{Range}_t)$$
Where parameters are not restricted, they are estimated by means of OLS.19 Table 1 reports the results of the forecast-encompassing tests along with the $R^2$ measures from the test regressions (5). The table further displays sample skewness and kurtosis of the relevant VIX volatility forecast errors. As the empirical properties of these tests for long forecast overlaps involving daily observations are not well documented,20 Table 1 also reports bootstrap p-values in parentheses. The bootstrap replications were produced by means of the block bootstrap, imposing the null hypothesis of forecast encompassing on (5). The block length was varied between 20 and 40, but only the results for a block length of 25 are shown. No qualitative differences arise from using other block lengths.

19 Results of these regressions are not shown here as they are only ancillary to our research question. In general, the hypothesis of α = 0 can be rejected. β = 1 can be rejected in the linear specifications, whereas the log–linear specification often yields estimates for β close to 1. The hypothesis γ = 0 is usually rejected, although the parameter estimate is, in general, positive, contrary to the results in Chernov (2002). These results are available upon request.
20 For forecasts without overlap, these tests tend to be rather conservative (see Table 1 in Harvey & Newbold, 2000).
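A minimal sketch of such a block bootstrap is given below. It resamples blocks of length 25 and imposes the encompassing null by drawing the dependent variable and the regressors in independently chosen blocks; this is one simple way of imposing $\lambda_i = 0$ and is an assumption of the illustration, not necessarily the exact scheme used for Table 1.

```python
import numpy as np
import statsmodels.api as sm

def block_resample(data, block_len, rng):
    """Resample rows of a 2-D array in contiguous blocks of length block_len."""
    T = data.shape[0]
    starts = rng.integers(0, T - block_len + 1, size=int(np.ceil(T / block_len)))
    idx = np.concatenate([np.arange(s, s + block_len) for s in starts])[:T]
    return data[idx]

def encompassing_fstat(y, X):
    fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 21})
    return float(np.squeeze(fit.f_test(np.eye(X.shape[1])).fvalue))

rng = np.random.default_rng(5)
T, k, block_len, n_boot = 2460, 6, 25, 199

# Toy data: y ~ de-meaned VIX forecast errors, X ~ de-meaned differences e~_i.
y = rng.normal(0, 1, T)
X = 0.3 * y[:, None] + rng.normal(0, 1, (T, k))

stat = encompassing_fstat(y, X)
boot = np.empty(n_boot)
for b in range(n_boot):
    # Independent block resampling of y and X breaks their link, imposing lambda = 0.
    y_b = block_resample(y[:, None], block_len, rng).ravel()
    X_b = block_resample(X, block_len, rng)
    boot[b] = encompassing_fstat(y_b, X_b)

print((boot >= stat).mean())   # bootstrap p-value
```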
Table 1
Forecast-encompassing test using average realized volatility and average squared returns as the forecast target and the linear or log–linear specification

(a) Forecast target $\hat{\sigma}^2_{t+1\to t+22} = RV_{t+1\to t+22}$

Var        e^{iv,1}          e^{iv,2}          e^{iv,3}          e^{iv,4}
Daily
S          −0.882            2.176             2.138             2.027
K          7.111             11.440            11.515            12.037
R²         0.548             0.138             0.112             0.113
F          0.0000 (0.000)    0.3002 (0.120)    0.3993 (0.223)    0.2278 (0.165)
DM         0.0000 (0.000)    0.1309 (0.243)    0.2233 (0.360)    0.0757 (0.155)
Daily log
S          0.396             0.355             0.340             0.145
K          3.209             3.232             3.229             3.184
R²         0.192             0.186             0.147             0.155
F          0.0158 (0.000)    0.0162 (0.000)    0.0165 (0.000)    0.0127 (0.000)
DM         0.0012 (0.005)    0.0013 (0.000)    0.0014 (0.005)    0.0008 (0.005)
Monthly
S          −0.094            2.456             2.444             2.376
K          5.208             11.758            10.946            11.123
R²         0.536             0.294             0.246             0.214
F          0.0000            0.1638            0.2370            0.3404
DM         0.0000            0.1495            0.2260            0.3354
Monthly log
S          0.323             0.287             0.186             0.097
K          3.238             3.272             3.296             3.120
R²         0.256             0.246             0.214             0.176
F          0.0046            0.0058            0.0090            0.0038
DM         0.0017            0.0023            0.0042            0.0013

(b) Forecast target $\hat{\sigma}^2_{t+1\to t+22} = \bar{r}^2_{t+1\to t+22}$

Var        e^{iv,1}          e^{iv,2}          e^{iv,3}          e^{iv,4}
Daily
S          0.198             2.078             2.051             2.036
K          8.149             10.709            10.943            11.459
R²         0.312             0.153             0.122             0.135
F          0.6521 (0.090)    0.1604 (0.353)    0.2177 (0.265)    0.1005 (0.618)
DM         0.5231 (0.683)    0.0365 (0.068)    0.0692 (0.180)    0.0129 (0.070)
Daily log
S          −0.235            −0.186            −0.203            −0.254
K          3.247             2.993             3.025             3.023
R²         0.132             0.103             0.069             0.062
F          0.0193 (0.000)    0.0416 (0.000)    0.0729 (0.000)    0.0654 (0.000)
DM         0.0020 (0.005)    0.0087 (0.000)    0.0233 (0.005)    0.0194 (0.005)
Monthly
S          1.099             2.531             2.618             2.534
K          8.803             12.371            12.437            12.631
R²         0.406             0.354             0.301             0.290
F          0.0028            0.2085            0.2267            0.4158
DM         0.0008            0.1961            0.2152            0.4151
Monthly log
S          −0.285            −0.210            −0.200            −0.295
K          3.559             3.194             3.269             3.091
R²         0.165             0.092             0.078             0.054
F          0.0128            0.0656            0.0925            0.3809
DM         0.0067            0.0515            0.0774            0.3782

Notes: Columns use different definitions for the VIX volatility forecast error. S (K) = forecast error sample skewness (kurtosis). R² is the coefficient of determination from Eq. (5). F and DM report asymptotic test p-values (bootstrap p-values in parentheses where applicable).
While Table 1 does not report estimates for the $\lambda_i$ in (5), it transpires that, irrespective of the specification, the gar, gjrrvg, svrv and arma models generate information significantly related to the VIX volatility forecast error. In general, it can be said that using $e^{iv,1}_t$ as a definition of the forecast error triggers the strongest rejections of the forecast-encompassing null hypothesis. Given that the assumption of unbiasedness and the absence of a risk premium is not supported by the data, this result is not surprising. In general, the evidence for forecast encompassing gets stronger as one moves from forecast error definition $e^{iv,2}_t$ to $e^{iv,3}_t$ and further to $e^{iv,4}_t$, implying that a time-varying risk premium is important and that a range-based estimator of current volatility may be less noisy than a realized volatility estimate (Chernov, 2002). This is also reflected in a declining $R^2$ in regression (5) as we change the VIX forecast error definition from $e^{iv,1}_t$ to $e^{iv,4}_t$.

The second issue arises due to the difference in the estimate for $\hat{\sigma}^2_{t+1\to t+22}$. It is well known (Andersen et al., 1999) that $RV_{t+1\to t+22}$ provides a less noisy estimate of the actual volatility than $\bar{r}^2_{t+1\to t+22}$. It therefore appears reasonable that any evidence against forecast encompassing could be revealed more easily with less noise obscuring the data, and it is plausible that, in general, the forecast-encompassing regressions with the former as the estimate for volatility provide stronger evidence against the null hypothesis.

The most apparent differences arise due to the type of specification, linear or log–linear, and due to the sampling frequency. In short, the results imply a stronger rejection of the forecast-encompassing hypothesis for the log–linear specification and for daily data. Changing to either the linear specification and/or monthly sampling weakens the evidence against forecast encompassing.21 The extreme cases involve strong rejection of forecast encompassing for daily observations in the log–linear specification and no evidence against the null hypothesis for monthly data in the linear specification. Nevertheless, it should be noted that even with monthly data the log–linear specification indicates a clear (marginal) rejection of forecast encompassing when $\hat{\sigma}^2_{t+1\to t+22}$ is estimated by means of $RV_{t+1\to t+22}$ ($\bar{r}^2_{t+1\to t+22}$).

It is the view taken here that the differences due to different sampling intervals are reflective of the loss of information as one reduces the sampling frequency to once a month. Jiang and Tian (2003) justify their choice of monthly sampling with the difficulties arising from inference with overlapping data. The statistical techniques used here were designed to deal with these issues, and the bootstrap p-values generally support the results provided by using asymptotic distributions.

It is more difficult to interpret the differences arising from the various specifications. The differences in the orthogonality tests, which are presented in the next section, are not quite as stark as the differences here. This, combined with the fact that the orthogonality tests should be less sensitive to the distributional features of the data, leads to the view that the differences here might be driven by the distributional features of the data. The VIX volatility forecast errors display significant skewness and excess kurtosis, whereas the log forecast errors are nearly normal.
In the limited simulation evidence provided in Harvey and Newbold (2000), deviations from normality lead to increasing conservativeness of the forecast-encompassing tests, which is congruent with the present finding that stronger rejections are found for the data that are closer to being normally distributed.
21 When sampling at a monthly frequency, 117 observations remain after using the first 1000 days for the initial model-based forecasts.
3.3. Orthogonality of VIX forecast errors

This section presents the results of testing the orthogonality of implied volatility forecast errors with information available at the time at which the forecast was formed. The parameters in Eq. (3) are estimated using the moment conditions in Eq. (6) and the resulting test of overidentifying moment conditions is then utilized to evaluate the efficiency of the VIX volatility forecast. As for the forecast-encompassing tests, a number of variations of Eq. (3) were utilized: $\hat{\sigma}^2_{t+1\to t+22}$ is estimated either via $RV_{t+1\to t+22}$ or $\bar{r}^2_{t+1\to t+22}$, and the current level of volatility $\hat{\sigma}^2_t$ is restricted to be irrelevant ($\gamma = 0$) or approximated by either the current level of realized volatility or by a range-based estimator. This corresponds to allowing for the VIX forecast error specifications $e^{iv,2}_t$ to $e^{iv,4}_t$. Lastly, the estimations are performed on either the linear or log–linear specification and on the daily and monthly observations.

The different instrument sets utilized were discussed in Section 2.3. While a number of different compositions of the instrument vector $z_t$ were considered, space constraints made it necessary to present a selection of results only.22 All instrument sets which are used for the results presented here include the variables appearing in Eq. (3), $z_t^{(1)} = \{1, f_t^{iv}, \hat{\sigma}^2_t\}$, where it should be noted that $\hat{\sigma}^2_t$ is only included when $\gamma$ is allowed to be non-zero ($e^{iv,3}_t$ and $e^{iv,4}_t$). This has the effect that the parameter estimates remain such that the residuals are orthogonal to these variables and hence retain the interpretation of a forecast error.23 Additional instruments are added to investigate whether past volatility information ($z_t^{(2)}$), current return information ($z_t^{(4)}$), model-based volatility forecasts ($z_t^{(5)}$, $z_t^{(6)}$, $z_t^{(7)}$) or their first four principal components ($z_t^{(8)}$) are orthogonal to the VIX forecast errors. Table 2 displays the parameter estimates of $\theta = (\alpha, \beta, \gamma)'$ when $z_t = z_t^{(1)}$ and the p-values of the test for overidentifying moment restrictions (J-test) for the remaining instrument sets. As argued before, the parameter estimates only vary slightly when $z_t^{(1)}$ is included in the instrument set.24 The results in Table 2 refer to the daily sampling scheme with consecutive forecasts overlapping 21 periods. The weighting matrix $H$, the parameter standard errors and all inferences are generated with the appropriate Hansen and Hodrick (1980) correction with 21 lags.25

The parameter estimates for $\theta = (\alpha, \beta, \gamma)'$ in Eq. (3) estimated from daily data are qualitatively similar to those estimated from monthly data (see Table 3). For the linear specification, we find largely insignificant $\hat{\alpha}$, estimates of $\hat{\beta}$ which are significantly smaller than 1 and usually take values around 0.5, and significantly positive $\hat{\gamma}$. The log–linear specification yields significantly negative constants, but estimates of $\hat{\beta}$ which are uniformly close to 1. As for the linear specification, $\hat{\gamma}$ is estimated to be significantly larger than 0. The phenomenon that $\hat{\beta}$ tends to be much closer to 1 in the log–linear specification has previously been observed by Jiang and Tian (2003) in their estimation for monthly data. They, however, do not find significant constants and also do not include current estimates of volatility $\hat{\sigma}^2_t$. Differences may also be due to the fact that they use a different implied volatility index. Chernov (2002) conjectures that the parameter $\gamma$ ought to be negative. He finds positive estimates for $\hat{\gamma}$ in estimations equivalent to those presented here and negative estimates when using instruments for his estimate of $f_t^{iv}$ and $\hat{\sigma}^2_t$ to allow for measurement errors.26
22 A full set of results is available from the authors.
23 Of course, orthogonality to $z_t^{(1)}$ is not guaranteed, but is maintained in all the estimations below.
24 Estimations without $z_t^{(1)}$ being an element of the instrument set were undertaken and parameter estimates vary significantly.
25 The empirical properties of these standard errors are an empirical matter and generally deteriorate with longer forecast horizons (Hodrick, 1992; Richardson & Smith, 1991). Sample sizes used here are significantly larger than those used in these Monte-Carlo studies, such that it is conjectured here that the test is appropriately sized.
Table 2
GMM estimates using the Hansen–Hodrick weighting matrix with 21 lags

(a) Linear specification
                  e^{iv,2}          e^{iv,3}          e^{iv,4}
σ̂²_{t+1→t+22} = RV_{t+1→t+22} − daily observations
α                 0.090 (0.121)     0.118 (0.116)     0.077 (0.107)
β                 0.494 (0.050)     0.437 (0.051)     0.442 (0.045)
γ                 –                 0.080 (0.019)     0.108 (0.021)
z_t (J-test)
1,2               0.7946 (1)        0.0004 (2)        0.0000 (2)
1,4               0.0001 (2)        0.0036 (2)        0.0002 (2)
1,5,6,7           0.0001 (10)       0.0002 (10)       0.0001 (10)
1,8               0.0020 (4)        0.0059 (4)        0.0035 (4)
8                 0.0091 (3)        0.0235 (2)        0.0035 (2)
σ̂²_{t+1→t+22} = r̄²_{t+1→t+22} − daily observations
α                 0.058 (0.165)     0.099 (0.158)     0.042 (0.151)
β                 0.636 (0.069)     0.551 (0.070)     0.576 (0.063)
γ                 –                 0.119 (0.027)     0.126 (0.030)
z_t (J-test)
1,2               0.5314 (1)        0.0000 (2)        0.0003 (2)
1,4               0.0038 (2)        0.0018 (2)        0.0245 (2)
1,5,6,7           0.0000 (10)       0.0000 (10)       0.0000 (10)
1,8               0.0006 (4)        0.0008 (4)        0.0022 (4)
8                 0.0006 (3)        0.2544 (2)        0.0358 (2)

(b) Log–linear specification
                  e^{iv,2}          e^{iv,3}          e^{iv,4}
σ̂²_{t+1→t+22} = RV_{t+1→t+22} − daily observations
α                 −0.973 (0.050)    −0.845 (0.054)    −0.836 (0.051)
β                 1.051 (0.061)     0.903 (0.066)     0.954 (0.058)
γ                 –                 0.118 (0.024)     0.065 (0.012)
z_t (J-test)
1,2               0.0037 (1)        0.0000 (2)        0.0000 (2)
1,4               0.0010 (2)        0.3017 (2)        0.0057 (2)
1,5,6,7           0.0000 (10)       0.0000 (10)       0.0000 (10)
1,8               0.0001 (4)        0.0002 (4)        0.0001 (4)
8                 0.1835 (3)        0.3608 (2)        0.0664 (2)
σ̂²_{t+1→t+22} = r̄²_{t+1→t+22} − daily observations
α                 −2.048 (0.077)    −1.954 (0.088)    −1.902 (0.082)
β                 1.277 (0.094)     1.168 (0.106)     1.173 (0.092)
γ                 –                 0.088 (0.038)     0.069 (0.019)
z_t (J-test)
1,2               0.0029 (1)        0.0002 (2)        0.0000 (2)
1,4               0.6289 (2)        0.8598 (2)        0.7367 (2)
1,5,6,7           0.0015 (10)       0.0012 (10)       0.0039 (10)
1,8               0.0917 (4)        0.1098 (4)        0.1218 (4)
8                 0.8871 (3)        0.7289 (2)        0.8523 (2)

Notes: Hansen–Hodrick standard errors with 21 lags in parentheses. Tests for overidentifying restrictions (J-tests) are also based on this correction. p-values (degrees of freedom in parentheses) are reported for varying instrument sets.
The overidentifying moment restriction tests paint a clear picture when applied to daily observations and using $RV_{t+1\to t+22}$ as an estimate for $\hat{\sigma}^2_{t+1\to t+22}$. The null hypothesis of efficiency is rejected for all instrument sets which include $z_t^{(1)} = \{1, f_t^{iv}, \hat{\sigma}^2_t\}$ and model-based forecasts.27 It is interesting to note that the test does not reject the null hypothesis when $z_t^{(1)}$ is not included and $z_t = z_t^{(8)}$. In this case, the parameter estimates change significantly and the forecast errors are not necessarily orthogonal to $f_t^{iv}$. It also appears as if lagged values of elements in $z_t^{(1)}$ and some asymmetric return information contain information relevant for explaining VIX forecast errors. When using $\bar{r}^2_{t+1\to t+22}$ rather than $RV_{t+1\to t+22}$, the results remain mainly unchanged, although the null hypothesis is at most marginally rejected when using the principal components of the model-based forecasts.

26 Estimations similar to those in Chernov were performed, but the resulting estimates for $\hat{\gamma}$ remained uniformly positive. Instrumental variable estimations to allow for measurement errors in the estimator for the current volatility and/or the VIX were performed (see also Christensen & Prabhala, 1998). Reasons for the different results could be due to different sample periods and the fact that Chernov utilizes the S&P100 index. Jiang and Tian also obtain positive estimates of $\hat{\gamma}$ when using an estimate of lagged average volatility over a period which corresponds to their forecast horizon.
27 Inspection of the individual moment conditions invariably confirms that the rejection of the null hypothesis arises while orthogonality with the elements in $z_t^{(1)}$ is maintained.
Table 3
GMM estimates of Eq. (3) using data sampled once monthly

(a) Linear specification
                  e^{iv,2}          e^{iv,3}          e^{iv,4}
σ̂²_{t+1→t+22} = RV_{t+1→t+22} − monthly observations
α                 0.041 (0.081)     0.127 (0.087)     0.029 (0.082)
β                 0.533 (0.055)     0.315 (0.088)     0.456 (0.057)
γ                 –                 0.337 (0.100)     0.161 (0.056)
z_t (J-test)
1,2               0.1527 (1)        0.3018 (2)        0.1118 (2)
1,4               0.1861 (2)        0.3212 (2)        0.4018 (2)
1,5,6,7           0.6290 (10)       0.6185 (10)       0.8601 (10)
1,8               0.3739 (4)        0.5963 (4)        0.4071 (4)
8                 0.2485 (3)        0.5811 (2)        0.1610 (2)
σ̂²_{t+1→t+22} = r̄²_{t+1→t+22} − monthly observations
α                 −0.081 (0.143)    0.045 (0.133)     −0.099 (0.129)
β                 0.717 (0.095)     0.395 (0.114)     0.611 (0.089)
γ                 –                 0.496 (0.108)     0.222 (0.060)
z_t (J-test)
1,2               0.0928 (1)        0.3650 (2)        0.1215 (2)
1,4               0.2636 (2)        0.5681 (2)        0.4719 (2)
1,5,6,7           0.3282 (10)       0.4487 (10)       0.5608 (10)
1,8               0.2614 (4)        0.5357 (4)        0.2124 (4)
8                 0.1539 (3)        0.9555 (2)        0.0804 (2)

(b) Log–linear specification
                  e^{iv,2}          e^{iv,3}          e^{iv,4}
σ̂²_{t+1→t+22} = RV_{t+1→t+22} − monthly observations
α                 −0.949 (0.044)    −0.782 (0.082)    −0.757 (0.056)
β                 1.070 (0.060)     0.867 (0.093)     0.925 (0.056)
γ                 –                 0.152 (0.052)     0.099 (0.021)
z_t (J-test)
1,2               0.2402 (1)        0.8679 (2)        0.1057 (2)
1,4               0.1441 (2)        0.2757 (2)        0.4101 (2)
1,5,6,7           0.1393 (10)       0.4871 (10)       0.3659 (10)
1,8               0.0825 (4)        0.1637 (4)        0.1794 (4)
8                 0.3498 (3)        0.6393 (2)        0.5949 (2)
σ̂²_{t+1→t+22} = r̄²_{t+1→t+22} − monthly observations
α                 −2.065 (0.093)    −1.925 (0.136)    −1.782 (0.107)
β                 1.329 (0.120)     1.160 (0.166)     1.117 (0.124)
γ                 –                 0.127 (0.077)     0.146 (0.032)
z_t (J-test)
1,2               0.6301 (1)        0.9121 (2)        0.3998 (2)
1,4               0.2989 (2)        0.4135 (2)        0.5666 (2)
1,5,6,7           0.3756 (10)       0.5564 (10)       0.7425 (10)
1,8               0.4740 (4)        0.6976 (4)        0.9334 (4)
8                 0.8384 (3)        0.6752 (2)        0.9811 (2)

Notes: Standard errors in parentheses. Tests for overidentifying restrictions (J-tests): p-values (degrees of freedom in parentheses) are reported for varying instrument sets.
Results change drastically when the data are constrained to monthly sampling, as shown in Table 3. While the parameter estimates for Eq. (3) remain mainly unchanged, there is basically no evidence for inefficiency in the VIX volatility forecasts, irrespective of the choice of VIX forecast error definition and the choice of estimate for the current value of volatility $\hat{\sigma}^2_t$. None of the instrument sets trigger a rejection of the null hypothesis that VIX volatility forecasts are efficient. While still insignificant at conventional significance levels, the smallest p-values arise for the log–linear specification when estimating realized volatility by $RV_{t+1\to t+22}$. This corresponds to the results of the forecast-encompassing test, where this was the only specification for which the null hypothesis of encompassing was rejected.

Two potential explanations for the discrepancy in results between monthly and daily data can be offered. First, the rejections of the efficiency hypothesis in daily data may be mainly due to size distortions in the tests applied. While the given set-up, especially the issue of overlapping observations, has the potential to cause such an effect, some care has been taken to adapt the inference procedures to this data feature. The second potential explanation is that the restriction to a significantly smaller set of observations, 117 rather than 2460 daily observations, substantially weakens the evidence against efficiency.

It is worthwhile to compare these results with results in the previous literature and point out reasons for differences. Christensen and Prabhala (1998) do conclude, from using monthly, non-overlapping data on the S&P100, that implied volatilities derived from one at-the-money option provide an efficient forecast for future realized volatility. Efficiency in their paper, however, was
evaluated with respect to a rather limited information set, namely lagged realized volatility.28 The latter point also explains differences in comparison to results in Jiang and Tian (2003), who also conclude, using monthly data, that an implied volatility estimate is efficient. They do examine both linear and log–linear specifications and conclude that the efficiency of implied volatility forecasts is best examined in a log–linear framework.

28 It should be noted that these authors do not endeavor to test for efficiency with a large information set in mind.

Fleming (1998) also concludes that his implied volatility estimate is efficient with respect to a wider range of non-price information such as volume and interest rate data. None of the instruments considered in Fleming caused the test for overidentifying restrictions to reject the null hypothesis of orthogonality. Clearly, the result arrived at here, namely that VIX volatility forecasts are not orthogonal to other available information, is qualitatively different. Two main reasons are put forward to explain this discrepancy. In Fleming, implied volatilities are derived from individual options rather than from an index like the VIX. It is further argued here that the most relevant available information for future volatility consists of the volatility forecasts available at time $t$. For this reason, the instrument sets discussed above include such volatility forecasts, which serve as a filter to distill available information into one forecast.

4. Concluding remarks

It is widely believed that implied volatility is a market's expectation of future volatility. This study has considered the forecast efficiency of the VIX index, a publicly available S&P500 implied volatility index published by the CBOE. These tests have been based on the testing framework proposed by Fleming (1998) and on forecast-encompassing tests. It is shown, as in previous research, that there is a significant positive correlation between the VIX index and future volatility. In contrast to much of the previous literature, it has been demonstrated that the VIX is not an efficient volatility forecast in the sense that other available information can improve upon the VIX as a volatility forecast. In previous work, it has been shown that implied volatility often dominates other, model-based forecasts of volatility. The results presented here do not contradict such a finding; they merely suggest that an improved forecasting model could potentially be found.

References

Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (1999). (Understanding, optimizing, using and forecasting) realized volatility and correlation (Working paper). University of Pennsylvania.
Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2001). The distribution of exchange rate volatility. Journal of the American Statistical Association, 96, 42–55.
Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2003). Modeling and forecasting realized volatility. Econometrica, 71, 579–625.
Becker, R., Clements, A. E., & White, S. I. (2004). Forward looking information in S&P500 options. Unpublished mimeo, Queensland University of Technology.
Blair, B. J., Poon, S.-H., & Taylor, S. J. (2001). Forecasting S&P100 volatility: The incremental information content of implied volatilities and high-frequency index returns. Journal of Econometrics, 105, 5–26.
Canina, L., & Figlewski, S. (1993). The informational content of implied volatility. The Review of Financial Studies, 6, 659–681.
Chernov, M. (2002). On the role of volatility risk premia in implied volatilities based forecasting regressions. Unpublished manuscript.
Chicago Board of Options Exchange. (2003). VIX, CBOE Volatility Index.
Christensen, B. J., & Prabhala, N. R. (1998). The relation between implied and realized volatility. Journal of Financial Economics, 50, 125–150.
Clements, M. P., & Hendry, D. F. (1998). Forecasting economic time series. Cambridge: Cambridge University Press.
Fleming, J. (1998). The quality of market volatility forecasts implied by S&P100 index option prices. Journal of Empirical Finance, 5, 317–345.
Hansen, L. P., & Hodrick, R. J. (1980). Forward exchange rates as optimal predictors of future spot rates: An econometric analysis. Journal of Political Economy, 88, 839–853.
Harvey, D., & Newbold, P. (2000). Tests for multiple forecast encompassing. Journal of Applied Econometrics, 15, 471–482.
Hodrick, R. J. (1992). Dividend yields and expected stock returns: Alternative procedures for inference and measurement. The Review of Financial Studies, 5, 358–386.
Jiang, G. J., & Tian, Y. S. (2003). Model-free implied volatility and its information content. Unpublished manuscript.
Jorion, P. (1995). Predicting volatility in the foreign exchange market. Journal of Finance, 50, 507–528.
Lamoureux, C. G., & Lastrapes, W. D. (1993). Forecasting stock-return variance: Toward an understanding of stochastic implied volatilities. The Review of Financial Studies, 6, 293–326.
Mincer, J., & Zarnowitz, V. (1969). The evaluation of economic forecasts. In J. Mincer (Ed.), Economic forecasts and expectations. New York: National Bureau of Economic Research.
Nelson, D. B. (1991). Conditional heteroscedasticity in asset returns: A new approach. Econometrica, 59, 347–370.
Parkinson, M. (1980). The extreme value method for estimating the variance of the rate of return. Journal of Business, 53, 61–65.
Pong, S., Shackleton, M. B., Taylor, S. J., & Xu, X. (2004). Forecasting currency volatility: A comparison of implied volatilities and AR(FI)MA models. Journal of Banking & Finance, 28, 2541–2563.
Poon, S.-H., & Granger, C. W. J. (2003). Forecasting volatility in financial markets: A review. Journal of Economic Literature, 41, 478–539.
Richardson, M., & Smith, T. (1991). Tests of financial models in the presence of overlapping observations. The Review of Financial Studies, 4, 227–254.
Szakmary, A., Ors, E., Kim, J. K., & Davidson, W. N., III. (2003). The predictive power of implied volatility: Evidence from 35 futures markets. Journal of Banking & Finance, 27, 2151–2175.