Energy Economics 86 (2020) 104683
Forecasting the real prices of crude oil using robust regression models with regularization constraints

Xianfeng Hao, Yuyang Zhao, Yudong Wang*

School of Economics and Management, Nanjing University of Science and Technology, 200 Xiaolingwei Street, Nanjing 210094, China

* Corresponding author. E-mail address: [email protected] (Y. Wang).
Article history: Received 1 April 2019; Received in revised form 1 January 2020; Accepted 7 January 2020; Available online 11 January 2020.

Keywords: Real oil prices; Machine learning; Predictive regressions; Out-of-sample forecasting

Abstract: In this paper, we forecast the real price of crude oil via a robust loss function (Huber) with regularization constraints, including LASSO, Ridge, and Elastic Net. These modifications are designed to avoid overfitting and improve out-of-sample predictive performance. The efficient implementation of penalized regression under the Huber loss is supported by the accelerated proximal gradient algorithm. Our results indicate that equal-weight mean combinations based on the robust parameter design and parameterization penalties can outperform the benchmark no-change model at all horizons (up to two years). We also find that combinations of forecasts from robust penalized models can significantly outperform those based on OLS at horizons longer than three months. These models have consistently and significantly higher directional accuracy than the no-change model, with success ratios of up to 63.9%.
1. Introduction

Crude oil is an extremely important commodity in the global economy. Oil price shocks have significant effects on the real economy, which has been well documented in the literature and is of major concern to central banks and governments (Baumeister and Kilian, 2016; Kilian, 2014, 2009; Kilian and Park, 2009; Kilian and Zhou, 2018). Thus, accurate forecasts of the real price of oil are central to the development of economic policies (Baumeister et al., 2015; Baumeister and Kilian, 2015, 2014a). The main contribution of our study is that we propose a robust predictive model and variable selection method, which together can generate more accurate oil price forecasts than those derived from the no-change benchmark.

Many studies focus on detecting oil price forecastability. Coppola (2008) and Murat and Tokat (2009) use the vector error correction (VEC) model for oil spot and futures prices. Alquist et al. (2013) and various papers by Kilian and coauthors use a wide range of methods, from models based on futures prices to models based on other commodity prices, the exchange rates of commodity exporters, and oil market fundamentals (e.g., Baumeister and Kilian, 2012, 2014a, 2014b, 2015). Baumeister et al. (2018) use product-spread models to forecast oil prices. Baumeister et al. (2015) employ mixed-data sampling (MIDAS) models. More reliable forecastability is derived
from forecast combinations that generate more robust, or even more accurate, forecasts by reducing forecast variability (Baumeister et al., 2015; Baumeister and Kilian, 2015; Drachal, 2016; Funk, 2018; Naser, 2016; Wang et al., 2017). Other studies use machine learning methods to produce forecasts, which can capture nonlinear patterns hidden in the crude oil price series. Common machine learning methods include artificial neural networks (ANN) (Jammazi and Aloui, 2012; Yu et al., 2014, 2008), support vector machines (SVM) (Xie et al., 2006; Yu et al., 2014), and gene expression programming (GEP) (Mostafa and El-Masry, 2016).

The parameters of popular predictive regressions are obtained via ordinary least squares (OLS). The OLS algorithm produces parameter estimates by minimizing the mean squared error (MSE) objective function. However, the MSE amplifies the role of large fluctuations, which may lead to high forecast variability. Robust regression models are therefore required to produce more stable forecasts than OLS in the presence of extreme observations, an issue that has long attracted interest in the statistics literature (Box, 1953; Huber, 1964; Tukey, 1960). We are thus motivated to minimize a robust loss function, the Huber loss, instead of the MSE when estimating the parameters in-sample to improve the stability of the out-of-sample performance. The Huber loss was first defined by Huber (1964) and has been widely applied in robust statistics, M-estimation, and additive modeling (Friedman, 2001; He and Shao, 1996; Huber, 2011, 1973; Yi and Huang, 2017). The Huber loss function combines the best properties of squared loss
and absolute loss, as it is strongly convex when close to the minimum and less steep for extreme values.

The selection of variables is also crucial in forecasting the real price of oil, and the numerous fundamental variables that can be used include global oil production, real economic activity, oil inventories, and financial market variables. However, selecting the appropriate variables can be difficult for forecasters. Several studies have investigated which predictors are the most useful for forecasting. For example, Wang et al. (2015) improve the forecastability of real oil prices by applying both economic and statistical restrictions to the parameters of predictive regressions. Similarly, Yi et al. (2018) develop several constraint approaches that contain predictor-related, parameter-related, and combined constraints. Miao et al. (2017) use penalization-based variable selection methods for linear parametric models, including the least absolute shrinkage and selection operator (LASSO).

To solve the problem of variable selection, we impose regularization constraints on the objective function by adding a penalty term. We consider three regularization constraints: Tibshirani's (1996) LASSO, Ridge, developed by Hoerl and Kennard (1970a, 1970b), and the Elastic Net of Zou and Hastie (2005). Due to the mechanism of the penalty function, the penalized regression simultaneously induces coefficient shrinkage and variable selection. These two properties allow for the combination of many potentially highly correlated crude oil forecasts, as the parameter estimates are stable, overfitting is reduced, and the variables are automatically selected. As Gu et al. (2018) find, the accelerated proximal gradient algorithm supports the efficient implementation of penalized regression under the Huber loss function.

We focus on recursive forecasts of the real refiner acquisition cost of imported oil (RAC) and the real price of West Texas Intermediate (WTI) crude oil. Here, we only deal with single-equation forecasting models based on robust penalized regressions. Single-predictor models and their combinations are applied to predict real oil prices. In univariate models, the parameter penalty can effectively choose the relevant variables by retaining the fundamental variable or the intercept. We find that the robust penalized models can significantly outperform both multivariate linear regression (MLR) and the combination of single-predictor models. We report the forecasting performance of the models under two loss functions: the mean squared error (i.e., the OLS method) and the Huber loss.

We use two common criteria to evaluate forecasting accuracy. First, the mean squared prediction error (MSPE) ratio computes the ratio of the given model's MSPE relative to the benchmark model's MSPE. Following the standard in the literature, we choose as the benchmark the no-change model, under which the best prediction of the future oil price is the current price. Intuitively, an MSPE ratio lower than 1 implies that the given model's forecasts are on average more accurate than the benchmark forecasts. Second, we use the success ratio, which indicates how often the model of interest correctly predicts the sign of oil price changes. A success ratio higher than 0.5 suggests that the directional accuracy of the given model is superior to that of the no-change benchmark. The empirical results show that the predictions of the penalized regression methods with regularization constraints improve on those of traditional methods.
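As a concrete illustration of the two evaluation criteria, the following minimal Python sketch computes both from arrays of realized log price changes and model forecasts (the function and variable names are ours, not from the paper; the no-change benchmark forecasts a zero change in log prices):

```python
import numpy as np

def mspe_ratio(actual, forecast, benchmark):
    """Ratio of the model's MSPE to the benchmark's MSPE; < 1 favors the model."""
    mspe_model = np.mean((actual - forecast) ** 2)
    mspe_bench = np.mean((actual - benchmark) ** 2)
    return mspe_model / mspe_bench

def success_ratio(actual, forecast):
    """Share of periods in which the forecast has the same sign as the realized change."""
    return np.mean(np.sign(actual) == np.sign(forecast))

# The no-change model predicts a zero log price change, so its errors are the
# realized changes themselves:
# ratio = mspe_ratio(r, r_hat, benchmark=np.zeros_like(r))
```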
The equal-weighted mean combination, based on the robust parameter design and parameterization penalties, can outperform the benchmark no-change model at all horizons (up to 2 years), and the largest MSPE reduction is as high as 7%. We also find that the forecast combinations of the robust penalized models can significantly outperform the combinations of models with only regularization constraints at horizons longer than three months. These models consistently reveal significant forecastability in terms of directional accuracy, with success ratios of up to 63.9%.

The remainder of the paper is organized as follows. The methodology of the robust estimation and variable selection based on parameterization penalties is described in Section 2. The data and predictor
variables are discussed in Section 3. Section 4 presents the main forecasting results. Section 5 concludes the paper.

2. Methodology

In this section, we first present the predictive regressions for real oil prices and then describe our methodology for estimating the model parameters under a robust loss function and parameterization penalties. Finally, we discuss the hyperparameter tuning method.

2.1. Predictive regressions

We use predictive regressions to produce forecasts of oil price changes. The general specification of the model is

$r_{t+h} = x_t \theta_t + \varepsilon_{t+h},$  (1)
where $\varepsilon_t$ may be serially correlated; $r_{t+h} = y_{t+h} - y_t$ represents the difference in log prices over the horizon $h$, and $y_t$ is the log level of real oil prices in month $t$; and $x_t = [1, Z_t]$ is a vector of the explanatory variables used to predict $y_{t+h}$. We use two types of predictive regression: single-predictor models, in which $Z_t$ is one explanatory variable, and multivariate linear regression (MLR), in which all potential variables are incorporated.

The out-of-sample performance of a univariate predictive regression may change significantly over time, as suggested in the literature (Baumeister and Kilian, 2012). Thus, we combine the forecasts from different models. For simplicity, the basic equal-weighted mean combination (EWMC) is used. The forecasts from this combination method are given by

$\hat{r}_{t,comb} = \frac{1}{N} \sum_{i=1}^{N} \hat{r}_{i,t},$  (2)
where $\hat{r}_{i,t}$ is the forecast of oil price changes generated by model $i$. Although the weighting scheme of the EWMC is simple, there is no guarantee that a more sophisticated combination will improve on it (Claeskens et al., 2016; Stock and Watson, 2004). An alternative is to construct the combination forecast as a weighted average of the individual forecasts based on recursive inverse MSPE ratios, which is a natural approach to measuring the real-time forecast accuracy of competing models and is commonly used in econometric studies (Diebold and Pauly, 1987; Stock and Watson, 2004). In line with Baumeister and Kilian (2015), we assign equal weight to each model at the beginning of the evaluation period and then calculate the weights inversely from the historical performance of each individual forecast. The smaller the MSPE of a model at date $t$, the larger the weight the model receives in forming the combination forecast,

$\hat{r}_{t,comb} = \sum_{i=1}^{N} \omega_{i,t} \hat{r}_{i,t}, \qquad \omega_{i,t} = \frac{m_{i,t}^{-1}}{\sum_{i=1}^{N} m_{i,t}^{-1}},$  (3)

where $m_{i,t}$ is the recursive MSPE of model $i$ in period $t$. Inverse MSPE weights thus depend on the recent forecast accuracy of each model.
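A minimal numpy sketch of the two combination schemes in Eqs. (2) and (3) may help fix ideas (array shapes and names are our own convention, not from the paper):

```python
import numpy as np

def equal_weight_combination(forecasts):
    """EWMC of Eq. (2): simple average over the N individual forecasts for date t."""
    return np.mean(forecasts)

def inverse_mspe_combination(forecasts, past_errors):
    """Inverse-MSPE combination of Eq. (3).

    forecasts   : shape (N,), individual model forecasts for date t
    past_errors : shape (N, T), historical forecast errors of each model up to t
    """
    mspe = np.mean(past_errors ** 2, axis=1)        # recursive MSPE m_{i,t}
    weights = (1.0 / mspe) / np.sum(1.0 / mspe)     # omega_{i,t} of Eq. (3)
    return np.sum(weights * forecasts)
```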
2.2. Robust objective function (Huber)

We estimate the parameters by minimizing the Huber loss function, which is an extension of the mean squared error (MSE), the objective function of OLS. Before describing the Huber loss, we give the well-known MSE loss:

$L(\theta) = \frac{1}{T} \sum_{t=1}^{T} (r_{t+h} - x_t \theta_t)^2.$  (4)
MSE is an efficient and simple linear regression objective function, but it is sensitive to outliers that can cause high levels of instability in
parameter estimates when extreme events frequently occur in oil markets. The MSE loss is therefore extended to the more robust Huber loss function, defined as

$L_H(\theta) = \frac{1}{T} \sum_{t=1}^{T} H(r_{t+h} - x_t \theta_t; \delta),$  (5)

where

$H(x; \delta) = \begin{cases} x^2, & \text{if } |x| \le \delta, \\ 2\delta |x| - \delta^2, & \text{if } |x| > \delta. \end{cases}$

The Huber loss is less sensitive than the MSE to outliers in the data: it is quadratic for smaller values of $x$ and linear for larger values. The balance between the squared and absolute losses is controlled by the tuning hyperparameter $\delta$; the MSE is a special case of the Huber loss with $\delta = \infty$. The hyperparameter can be optimized adaptively from the data using the method developed by Fan et al. (2017).
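For reference, a direct Python transcription of the loss in Eq. (5) (a sketch under our own naming, not the authors' code):

```python
import numpy as np

def huber_loss(residuals, delta):
    """Huber loss of Eq. (5): quadratic inside the band |x| <= delta, linear outside."""
    x = np.asarray(residuals, dtype=float)
    quadratic = x ** 2
    linear = 2.0 * delta * np.abs(x) - delta ** 2
    return np.mean(np.where(np.abs(x) <= delta, quadratic, linear))
```

Setting delta very large recovers the MSE, while a small delta caps the influence of extreme residuals.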
2.3. Parameterization penalties

The simple predictive regression is unsuccessful if the model contains extra "irrelevant" predictors, and regularization constraints are then required to simplify the model and prevent overfitting. The parameterization penalty is the most common method of enforcing parameter parsimony and is incorporated into the objective function to improve the out-of-sample performance. The new objective function is

$L(\theta; \cdot) = \underbrace{L(\theta)}_{\text{Loss function}} + \underbrace{\phi(\theta; \cdot)}_{\text{Penalty}}.$  (6)

We consider three commonly used penalties for the penalty term $\phi(\theta; \cdot)$: the least absolute shrinkage and selection operator (LASSO) of Tibshirani (1996), Ridge, developed by Hoerl and Kennard (1970a, 1970b), and the Elastic Net of Zou and Hastie (2005). LASSO selects a subset of the potential predictors and shrinks the weights of the others to zero; it is a multifunctional penalty for improving the prediction accuracy and interpretability of regression models and takes the form

$\phi(\theta; \lambda) = \lambda \sum_{j=1}^{P} |\theta_j|,$  (7)

where the hyperparameter $\lambda$ represents the degree of the parameterization penalty and can be optimized adaptively. Unlike LASSO, Ridge attempts to compress irrelevant parameters as close to zero as possible rather than exactly to zero. It is an effective tool for shrinking large regression coefficients. The penalty function of Ridge is

$\phi(\theta; \lambda) = \frac{1}{2} \lambda \sum_{j=1}^{P} \theta_j^2.$  (8)

For highly correlated variables, LASSO typically selects one at random, whereas Ridge shrinks them toward each other (Zou and Hastie, 2005). The Elastic Net (ENet) penalty, however, linearly combines the LASSO and Ridge methods by including the Ridge penalty when performing LASSO-type shrinkage. The Elastic Net form is

$\phi(\theta; \lambda, \rho) = \lambda (1 - \rho) \sum_{j=1}^{P} |\theta_j| + \frac{1}{2} \lambda \rho \sum_{j=1}^{P} \theta_j^2.$  (9)

In addition to the degree of penalty $\lambda$, another non-negative hyperparameter, $\rho$, balances the Ridge and LASSO terms. Note that LASSO ($\rho = 0$) and Ridge ($\rho = 1$) are special cases of the Elastic Net. For the intermediate value $\rho = 0.5$, the Elastic Net encourages simple models through both shrinkage and selection. As with $\lambda$, we adaptively optimize $\rho$ using the validation sample. The accelerated proximal gradient algorithm supports the convergence of penalized regression for both the MSE and Huber losses; further details are given in the Appendix.

2.4. Sample splitting and tuning via validation

The performance of the parameter penalty constraint depends critically on the hyperparameters. In line with common approaches in the literature, we determine the hyperparameters on an additional subset of the sample: the in-sample data set is separated into two independent subgroups, with the last two years of data used as the "validation" subsample and the earlier data as the "training" subsample. We first estimate the parameters on the training subsample with given initial hyperparameters. We then calculate the forecast errors in the validation subsample based on the parameters from the training sample and iteratively search for the hyperparameters that optimize the validation objective. The parameters can then be re-estimated from the training data subject to the updated hyperparameter values. The aim of the validation step is to simulate an out-of-sample evaluation of the model: hyperparameter tuning is in effect a search for a degree of model complexity that is likely to produce reliable out-of-sample performance (Gu et al., 2018). The evaluation sample, serving as the true out-of-sample data, is then used to assess the forecasting performance. It should be noted that this approach may make the evaluation period quite short compared with related studies. A minimal sketch of the tuning procedure is given below.
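The following sketch illustrates the training/validation split described above. It uses a closed-form Ridge estimator as a simple stand-in for the penalized Huber regressions that the paper estimates via the accelerated proximal gradient algorithm (see the Appendix); the function names, the grid, and the estimator choice are our own illustration, not the authors' code:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form Ridge estimator, a stand-in for the penalized (Huber) regressions."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def tune_on_validation(X, y, lambdas, fit=ridge_fit, n_val=24):
    """Hold out the last n_val months (two years) as the validation subsample,
    estimate on the training subsample for each candidate lambda, and keep the
    lambda with the lowest validation MSPE."""
    X_tr, y_tr = X[:-n_val], y[:-n_val]
    X_va, y_va = X[-n_val:], y[-n_val:]
    best_lam = min(lambdas,
                   key=lambda lam: np.mean((y_va - X_va @ fit(X_tr, y_tr, lam)) ** 2))
    # Re-estimate on the training data with the tuned hyperparameter value.
    return fit(X_tr, y_tr, best_lam), best_lam
```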
3. Data

3.1. Dependent variables

Our goal is to forecast the real oil price, which is naturally taken as the dependent variable. We follow Baumeister and Kilian (2012) and use two common proxies of the crude oil price. The first proxy is the refiner's acquisition cost for imported crude oil (RAC), and the second is the spot price of West Texas Intermediate crude oil (WTI). Kilian (2009) suggests that the RAC is a good proxy for oil price fluctuations in the global oil market. The WTI price, however, is a prevailing benchmark in world crude oil pricing and is the underlying price of futures contracts traded on the NYMEX; this measure is therefore often used in oil-related derivative analyses.

Our sample covers the period from January 1986 through May 2018. We use a recursive (i.e., expanding) estimation window to produce forecasts starting from January 1992. Monthly nominal RAC and WTI prices are collected from the U.S. Energy Information Administration (EIA). The nominal prices are deflated by the U.S. Consumer Price Index (CPI) to obtain real oil prices. The CPI data are provided by the Federal Reserve Bank of St. Louis. In the predictive regressions, we use the first-order differences of the log real prices.

3.2. Explanatory variables

We apply seven commonly used variables to predict real crude oil prices. These reflect the fundamentals of the crude oil market from different perspectives and are briefly described below.

Oil futures prices (WF). Kilian and his coauthors forecast the real prices of RAC and WTI based on the spread between the nominal spot price and the price of oil futures contracts (e.g., Alquist and Kilian, 2010; Baumeister and Kilian, 2012, 2015). The specific forecast is

$\hat{R}^{oil}_{t+h|t} = R^{oil}_t \left( 1 + f^h_t - s_t - E_t(\pi^h_{t+h}) \right),$

where $R^{oil}_t$ represents the current level of the real price of oil; $f^h_t$ represents the log of the current WTI oil futures price for maturity $h$; $s_t$ represents
the log of the WTI spot price; and $E_t(\pi^h_{t+h})$ represents the expected inflation rate over the next $h$ periods, which is proxied by recursive averages of past U.S. CPI inflation data starting from 1986. (The forecasting results of the futures-based model are reported in the online appendix. We find that the oil futures model significantly outperforms the no-change benchmark for horizons from 1 to 4 months, but its MSPE ratios are higher than the values of the regression models reported in Table 1; the regression model therefore has better out-of-sample performance than the futures-based model.)

The monthly WTI oil futures price data are available up to a horizon of 18 months, which is the maximum horizon for which the construction of continuous monthly time series is feasible (Baumeister and Kilian, 2015). The maximum of our evaluation horizons is 24 months. Baumeister and Kilian (2015) handle this problem by assigning zero weight to the futures-based forecast in the forecast combinations for horizons beyond 18 months. In our case, the combination methods show significant forecastability at long horizons, whereas the forecastability is not significant at medium horizons. If futures-based model forecasts received zero weight at long horizons, we could not infer whether the long-horizon forecastability comes from our constraints or from the exclusion of futures-based forecasts. Furthermore, one of the main purposes of this paper is to show that robust regressions with regularization constraints can improve forecasting performance, and using a futures-based model instead of a regression model would dilute this contribution. For these reasons, we retain the predictive regression model that uses the futures price as the predictor. We use the real price of NYMEX Contract 1 as the predictive variable for the oil spot price. The futures prices collected from the EIA website are deflated by the CPI to construct real futures prices.
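A one-line implementation of the futures-spread forecast above (a sketch; the names are ours):

```python
def futures_based_forecast(real_price, fut_log, spot_log, exp_infl):
    """R_hat_{t+h|t} = R_t * (1 + f_t^h - s_t - E_t[pi_{t+h}^h])."""
    return real_price * (1.0 + fut_log - spot_log - exp_infl)
```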
3.2.1. Changes in global oil production (CGOP)
Oil supply is a main determinant of oil prices and can therefore significantly affect their fluctuation. A small change in supply may cause a large fluctuation in the market price of oil. We use global oil production as the proxy of supply and, to ensure stationarity, we use the percent changes in production.

3.2.2. Real global economic activity (REA)
We use the REA index introduced by Kilian (2009). Kilian and Zhou (2018) give a detailed discussion of this index and use it as a proxy of global economic activity when modeling commodity prices.

3.2.3. Percent changes in petroleum consumption (PPC)
The PPC index also reflects the demand for crude oil, which can be a driver of the long-term equilibrium price. We use the percent changes in U.S. petroleum consumption, which are available on the EIA website.

3.2.4. Changes in oil inventory (COI)
Inventories have a buffer function against shocks to the real crude oil price. If the increase in crude inventories is greater than expected, weaker demand for crude oil is implied. The predictive ability of inventories for crude oil prices has often been considered in the literature (Baumeister and Kilian, 2012; Wang et al., 2017). We use the monthly global oil inventory measure proposed by Kilian and Murphy (2014), which is standard in the literature and can be extended back to 1973. Following Baumeister and Kilian (2015), we use the changes in global crude oil inventories rather than the inventory level.

3.2.5. Percent changes of oil imports (PCOI)
For oil-importing countries, key information about shifts in expectations about the real price of oil is reflected in changes in crude oil imports. Due to the lack of oil trading data for some major oil-
importing countries such as China, we calculate the percent changes in U.S. oil imports provided by the EIA.

3.2.6. Non-energy commodity index (NECI)
The link between oil and non-energy commodity prices has been investigated in many empirical studies (Barsky and Kilian, 2001; Kilian and Zhou, 2018). We collect monthly NECI data from the website of the World Bank and eliminate the influence of inflation by transforming the nominal into the real NECI using the CPI.

In summary, we use a total of seven variables to predict two kinds of oil prices, the real RAC and the real WTI price. To save space, and following the suggestion of a referee, we mainly show the forecasting results for the real RAC and report the WTI oil forecasting results in the online appendix.

4. Forecasting results

In this section, we first evaluate the forecasting performance of individual models and then assess the results of equal-weighted mean combinations. We also compare the performance of the robust regression models with competing methods such as MLR and subset regressions.

4.1. Forecasting performance of individual models

We choose the no-change forecast model as the benchmark in our evaluation of the forecasting performance, which is standard in the literature (Baumeister et al., 2014; Baumeister and Kilian, 2015, 2014b). This model uses the simple idea that the best forecast of the future real price of crude oil is the current real price of crude oil. We keep the size of the validation sample fixed (the last two years) when determining the hyperparameters. Our evaluation period starts in January 1992. The forecasting horizons are allowed to vary from 1 to 24 months.

Table 1 reports the forecasting performance of the individual models for the real RAC, evaluated by the MSPE ratio. The Clark and West (2007) method is used to test the significance of forecastability. (Strictly, the Clark and West (2007) test does not test out-of-sample forecastability but in-sample predictability, as noted by Kilian (2015); the p-values should therefore be interpreted with caution.) We first assess the empirical results of the OLS models, shown in Column 2 of Table 1. The forecasting results indicate levels of instability in the individual models that produce major fluctuations in the MSPE ratios as the horizon changes. The futures price performs significantly better than the benchmark model at the horizons of one and three months, whereas this forecastability disappears at longer horizons. A similar pattern occurs for the NECI models, which outperform the no-change forecast only up to horizons of six months. Our results are consistent with other studies that document the unstable performance of individual models in forecasting oil prices. In support of this, when using the simple OLS models without any constraints, only the models based on variables such as oil futures, the COI, and the NECI significantly outperform the no-change forecast at the short horizons. For horizons longer than six months, none of the OLS models can beat the benchmark model.

We then address the results for the Huber loss in Column 6 of Table 1. The results show that the regressions using the Huber loss have lower MSPE ratios than the OLS regressions, although in most cases they cannot beat the no-change benchmark. This improvement in forecasting is most prominent at the two-year horizon. We further introduce parameterization penalties in the predictive regressions to explore the predictive ability of the shrinkage and variable selection methods.
As only one fundamental variable and the intercept are incorporated in the individual models, the penalty methods either keep one of them (LASSO) or shrink them simultaneously (Ridge), depending on the relevance of the given variable to oil prices. The forecasting results show that there are potential advantages to
Table 1
Forecasting performance of individual models for real RAC (no-change benchmark).

Explanatory   Original version                               Huber losses
variables     OLS        LASSO      Ridge      ENet        Huber      LASSO+H    Ridge+H    ENet+H

Panel A: forecasting horizon of 1 month
WF            0.8148***  0.8468***  0.8452***  0.8468***   0.8133***  0.8269***  0.8229***  0.8179***
CGOP          1.0819     1.0200     1.0198     1.0200      1.0827     1.0773     1.0536     1.0774
REA           1.0772     0.9943***  0.9959**   0.9943***   1.0743     0.9567**   0.9558**   0.9547**
COI           0.9823*    0.9863     0.9897     0.9863      0.9825*    0.9802*    0.9775**   0.9803*
PCOI          1.0070     1.0056     1.0084     1.0056      1.0086     1.0069     1.0087     1.0069
PPC           1.0714     1.0054     1.0077     1.0054      1.0721     1.0673     1.0496     1.0673
NECI          0.9094***  0.9090**   0.9100**   0.9090**    0.9093***  0.9020**   0.9009**   0.8994**

Panel B: forecasting horizon of 3 months
WF            0.9461***  0.9433***  0.9417***  0.9433***   0.9475***  0.9438***  0.944**    0.9438***
CGOP          1.0647     1.0464     1.0456     1.0464      1.0734     1.0496     1.0474     1.0475
REA           1.0662     1.0054     1.0053     1.0054      1.0212     0.9810*    0.9797*    0.9808*
COI           1.0137     1.0142     1.0192     1.0142      1.0166     1.0137     1.0111     1.0139
PCOI          1.0130     1.0148     1.0152     1.0148      1.0141     1.0134     1.0134     1.0117
PPC           1.0645     1.0439     1.0459     1.0439      1.0724     1.0456     1.0468     1.0486
NECI          0.8910***  0.8885***  0.9046***  0.8885***   0.8937***  0.8879***  0.8884***  0.8904***

Panel C: forecasting horizon of 6 months
WF            1.0200     1.0090     1.0080     1.0090      1.0252     1.0114     1.0097     1.0103
CGOP          1.0418     1.0155     1.0130     1.0155      1.0406     1.0081     1.0080     1.0077
REA           1.0349     1.0084     1.0169     1.0084      1.0452     1.0138     1.0150     1.0136
COI           1.0170     1.0041     1.0039     1.0040      1.0238     1.0063     1.0065     1.0063
PCOI          1.0260     1.0164     1.0139     1.0164      1.0390     1.0114     1.0129     1.0103
PPC           1.0418     1.0155     1.0025     1.0156      1.0441     1.0092     1.0098     1.0088
NECI          0.9554***  0.9638**   0.9667**   0.9638**    0.9657**   0.9669**   0.9652**   0.9665**

Panel D: forecasting horizon of 12 months
WF            1.0450     1.0209     1.0139     1.0229      1.0453     0.9934     0.9938     0.9936
CGOP          1.0719     1.0306     1.0236     1.0269      1.0756     1.0059     1.006      1.0069
REA           1.0481     1.0205     1.0154     1.0156      1.0468     1.0020     1.0029     1.0028
COI           1.0445     1.0171     1.0129     1.0111      1.0429     0.9873*    0.9878     0.9874*
PCOI          1.0411     1.0154     1.0121     1.0116      1.0416     0.9876*    0.9882     0.9872*
PPC           1.0631     1.0266     1.0180     1.0182      1.0667     0.9953     0.9968     0.9973
NECI          1.0536     1.0044     1.0063     1.0087      1.0514     1.0027     1.0021     1.0005

Panel E: forecasting horizon of 18 months
WF            1.0528     0.9639***  0.9645***  0.9643***   1.0541     0.9525***  0.9523***  0.9530***
CGOP          1.0680     1.0016     0.9714***  0.9918**    1.0683     0.9559***  0.9554***  0.9544***
REA           1.0632     0.9748***  0.9684***  0.9741***   1.0661     0.9543***  0.9550***  0.9556***
COI           1.0534     0.9677***  0.9671***  0.9662***   1.0556     0.9524***  0.9532***  0.9524***
PCOI          1.0525     0.9685***  0.9684***  0.9684***   1.0485     0.9535***  0.9540***  0.9536***
PPC           1.0675     1.0001     0.9726***  0.9907**    1.0703     0.9566***  0.9569***  0.9553***
NECI          1.0448     0.9697***  0.9680***  0.9700***   1.0433     0.9511***  0.9510***  0.9504***

Panel F: forecasting horizon of 24 months
WF            1.0667     0.9521***  0.9538***  0.9523***   1.0547     0.9359***  0.9371***  0.9362***
CGOP          1.0729     0.9599***  0.9574***  0.9652***   1.0516     0.9355***  0.9375***  0.9358***
REA           1.0730     0.9541***  0.9553***  0.9533***   1.0435     0.9369***  0.9358***  0.9368***
COI           1.0643     0.9518***  0.9522***  0.9502***   1.0523     0.9333***  0.9350***  0.9334***
PCOI          1.0631     0.9509***  0.9525***  0.9506***   1.0528     0.9333***  0.9344***  0.9333***
PPC           1.0734     0.9598***  0.9574***  0.9612***   1.0535     0.9381***  0.9384***  0.9375***
NECI          1.0539     0.9524***  0.9506***  0.9513***   1.0389     0.9321***  0.9318***  0.9314***

Notes: This table reports the MSPE ratios of the individual models based on two objective functions: mean squared error (MSE) and Huber. The penalty terms of LASSO, Ridge, and Elastic Net (ENet) are introduced into the objective function, respectively; "+H" indicates the use of the Huber loss instead of the MSE. An MSPE ratio lower than 1 implies that the corresponding model on average produces more accurate forecasts than the no-change forecast. The significance of predictability is tested via the Clark and West (2007) statistic; *, **, and *** denote rejection of the null hypothesis at the 10%, 5%, and 1% significance levels, respectively.
combining the Huber loss and penalty terms. The LASSO+H, Ridge+H, and ENet+H models show significant forecastability when evaluated by the MSPE ratio at the long horizons of 18 and 24 months. In comparison, the performance of the OLS models changes markedly with the predictors and forecasting horizons. The original LASSO, Elastic Net, and Ridge models generate less accurate forecasts than their counterparts based on the robust loss function.

We also forecast the real price of WTI oil. In summary, we find qualitatively consistent evidence that the forecasting performance of individual models varies with the forecasting horizon. No individual model can outperform the benchmark model at all horizons.
Imposing the Huber loss can effectively improve the forecasting accuracy, particularly for some fundamental variables at longer horizons. In addition, the shrinkage and variable selection mechanism of the parameterization penalties provides more reliable forecasting gains under the Huber loss function, although in some cases the modified individual models do not beat the no-change benchmark. To save space, we do not report the WTI oil forecasting results here but show them in the online appendix.

To further understand the forecast improvement gained from imposing the Huber loss, we compute the MSPE ratios of the predictive models with the Huber loss when using those without the Huber loss as the
benchmarks. The evaluation results are reported in Table 2. We find that at the short horizons of 1, 3, and 6 months, whether the Huber loss produces more accurate forecasts depends heavily on the predictor. At the longer horizons, LASSO+H, Ridge+H, and ENet+H consistently outperform the LASSO, Ridge, and Elastic Net methods, respectively. Significant gains in forecasting accuracy from accounting for the Huber loss in parameter estimation are evident in most cases, and the superior performance of the models with the Huber loss is generally more prominent at the longer horizons.

The parameters of the OLS models are estimated using all past observations, while for the Huber loss models the whole in-sample set is divided into "validation" and "training" subsamples: the more recent validation subsample is used to determine the hyperparameters, and the remaining training subsample is used for parameter estimation. Thus, the OLS models utilize a larger estimation sample; this inequality is alleviated at the longer horizons. When using the Huber loss models, the parameter estimates in the forecasting procedure fluctuate more than with the OLS models because of the shorter estimation sample. These volatile parameter estimates are likely to cause higher forecast error variance, so the forecasting performance deteriorates. Overall, most of the forecasting models that combine the Huber loss function and regularization constraints reduce the MSPE ratio substantially more than those solely using the Huber loss or the regularization constraints. The robust combination of these two methods can provide more accurate forecasts than the method based on the OLS models.
4.2. Performance of forecast combinations

Our forecasting results show that none of the individual models demonstrate significant forecastability at all horizons. For example, the performance of variables such as the futures price and inventory changes varies considerably across horizons. This is consistent with the main findings in the literature (Baumeister et al., 2014; Baumeister and Kilian, 2015; Wang et al., 2017). Similarly, the out-of-sample performance of individual models is also expected to change significantly over time. To reduce model uncertainty, we consider the weighted average of individual model forecasts (i.e., forecast combination). An equal-weighted mean combination (EWMC) of all potential models is used, which is the standard approach in the literature. Although this combination strategy is simple, recent empirical studies show that it is difficult to find a sophisticated combination that significantly outperforms the EWMC (Baumeister et al., 2014; Baumeister and Kilian, 2015; Claeskens et al., 2016; Stock and Watson, 2004). The results are reported in Table 3.

Table 3 provides several meaningful findings. The forecasting performance of the EWMC for the models without the Huber loss is mixed, as it depends on the forecasting horizon and the predicted variable. In comparison, the EWMC outcomes for the Huber loss models are more encouraging. The combination of simple OLS models with the Huber loss significantly outperforms the no-change benchmark at the short horizons (1 and 3 months). Applying the regularization constraints can further improve forecastability. We find that the combination of the LASSO+H models beats the no-change benchmark at all forecasting horizons, significantly so at the short and long horizons. Consistent evidence is found using the combinations for the Ridge+H models and for the ENet+H models.

To assess the sensitivity of the combination forecasts to the weighting scheme, we use the inverse MSPE combination suggested by Baumeister and Kilian (2015). This uses weights that depend inversely on the recent MSPE, so a model with a lower recent MSPE receives a higher weight when forming the combination forecasts. Overall, the results are consistent with those reported in Table 3: the MSPE ratios of the inverse MSPE combination and the equal-weighted combination are similar. We thus obtain more accurate forecasts of oil price changes using both the Huber loss and regularization constraints than by
Table 2
Forecasting performance of individual models relative to the models without Huber loss.

Explanatory variables   Huber      LASSO+H    Ridge+H    ENet+H

Panel A: forecasting horizon of 1 month
WF      0.9982     0.9765***  0.9736***  0.9659***
CGOP    1.0007     1.0563     1.0332     1.0563
REA     0.9973*    0.9622*    0.9597*    0.9602*
COI     1.0002     0.9938*    0.9877***  0.9939*
PCOI    1.0015     1.0013     1.0003     1.0013
PPC     1.0007     1.0616     1.0416     1.0616
NECI    0.9998     0.9922     0.9901     0.9894

Panel B: forecasting horizon of 3 months
WF      1.0015     1.0005     1.0024     1.0005
CGOP    1.0082     1.0031     1.0018     1.0010
REA     0.9578***  0.9758*    0.9745*    0.9755*
COI     1.0028     0.9995     0.9920**   0.9997
PCOI    1.0011     0.9987     0.9982     0.9969*
PPC     1.0074     1.0016     1.0009     1.0045
NECI    1.0031     0.9993     0.9821***  1.0021

Panel C: forecasting horizon of 6 months
WF      1.0051     1.0024     1.0018     1.0013
CGOP    0.9989     0.9927***  0.9951**   0.9923***
REA     1.0099     1.0053     0.9981     1.0051
COI     1.0066     1.0023     1.0026     1.0023
PCOI    1.0126     0.9951*    0.9991     0.9940*
PPC     1.0022     0.9938***  1.0073     0.9933***
NECI    1.0107     1.0032     0.9985     1.0028

Panel D: forecasting horizon of 12 months
WF      1.0003     0.9730**   0.9802**   0.9714***
CGOP    1.0034     0.9761***  0.9829**   0.9805**
REA     0.9988*    0.9819***  0.9877**   0.9874**
COI     0.9984     0.9707***  0.9752***  0.9765**
PCOI    1.0005     0.9726***  0.9764***  0.9759***
PPC     1.0034     0.9695***  0.9791**   0.9795**
NECI    0.9980     0.9983     0.9959*    0.9919***

Panel E: forecasting horizon of 18 months
WF      1.0013     0.9882***  0.9873***  0.9884***
CGOP    1.0002     0.9544***  0.9836***  0.9623***
REA     1.0027     0.9790***  0.9862***  0.9810***
COI     1.0021     0.9842***  0.9856***  0.9857***
PCOI    0.9962***  0.9845***  0.9851***  0.9848***
PPC     1.0026     0.9565***  0.9838***  0.9643***
NECI    0.9986     0.9808***  0.9824***  0.9797***

Panel F: forecasting horizon of 24 months
WF      0.9888***  0.9830***  0.9826***  0.9831***
CGOP    0.9801***  0.9746***  0.9791***  0.9695***
REA     0.9725***  0.9820***  0.9795***  0.9827***
COI     0.9887***  0.9805***  0.9819***  0.9823***
PCOI    0.9903***  0.9815***  0.9810***  0.9818***
PPC     0.9814***  0.9774***  0.9801***  0.9754***
NECI    0.9858***  0.9788***  0.9802***  0.9791***

Notes: This table reports the MSPE ratios of the individual models based on the Huber objective function. The penalty terms of LASSO, Ridge, and Elastic Net (ENet) are introduced into the objective function, respectively; "+H" indicates the use of the Huber loss instead of the MSE. An MSPE ratio lower than 1 implies that the corresponding model on average produces more accurate forecasts than the benchmark, i.e., the corresponding model without the Huber objective function. The significance of predictability is tested via the Clark and West (2007) statistic; *, **, and *** denote rejection of the null hypothesis at the 10%, 5%, and 1% significance levels, respectively.
solely using either one. The results are shown in the online appendix to save space.

Table 4 gives the evaluation results of the equal-weighted combination for the Huber loss models when the benchmark is the combination of the corresponding models without the Huber loss. We find that the implementation of the Huber loss can also significantly improve the out-of-sample performance of the equal-weighted combination based on penalized regressions at horizons longer than 6 months. The significant improvement of the equal-weighted combination
Table 3
Forecasting results of the equal-weighted mean combination relative to the no-change forecast.

              Original version                              Huber losses
Horizon (h)   OLS        LASSO      Ridge      ENet        Huber      LASSO+H    Ridge+H    ENet+H
1             0.9483***  0.9466***  0.9497***  0.9466***   0.9486***  0.9488***  0.9436***  0.9477***
3             0.9790*    0.9750*    0.9796*    0.9750*     0.9809*    0.9694**   0.9704**   0.9704**
6             1.0105     0.9979     0.9987     0.9979      1.0168     0.9976     0.9977     0.9971
12            1.0459     1.0161     1.0121     1.0136      1.0460     0.9936     0.9941     0.9938
18            1.0541     0.9760***  0.9681***  0.9732***   1.0544     0.9529***  0.9532***  0.9528***
24            1.0628     0.9523***  0.9525***  0.9529***   1.0469     0.9332***  0.9340***  0.9332***

Notes: This table reports the MSPE ratios of the equal-weighted mean combination for the predictive regressions based on two objective functions: mean squared error (MSE) and Huber. The penalty terms of LASSO, Ridge, and Elastic Net (ENet) are introduced into the objective function, respectively; "+H" indicates the use of the Huber loss instead of the MSE loss. An MSPE ratio lower than 1 implies that the corresponding model on average produces more accurate forecasts than the no-change benchmark. The significance of predictability is tested via the Clark and West (2007) statistic; *, **, and *** denote rejection of the null hypothesis at the 10%, 5%, and 1% significance levels, respectively.
based on regression models without regularization constraints is only found at the horizon of 24 months.
4.3. Alternative approaches

In this subsection, we consider two alternative approaches for dealing with multivariate information: multiple linear regression (MLR), which uses all potential predictors, and the subset regressions proposed by Elliott et al. (2013).

Table 5 shows the forecasting performance of the MLR models. We find that the forecasting performance of the MLR models with parameterization penalties is better than that of the EWMC at the short horizons (up to three months), where they significantly outperform the benchmark model. At horizons longer than three months, the performance of MLR becomes worse than that of the EWMC. MLR with the Huber loss and regularization constraints results in significant forecastability. Similarly, the implementation of the Huber loss further improves the forecasting accuracy of the penalized MLR models, in that LASSO+H, Ridge+H, and ENet+H consistently generate forecasts with lower MSPE ratios than the LASSO, Ridge, and Elastic Net methods, respectively.
The poor performance of MLR at long horizons is to be expected, as it contains many parameters that must be estimated, leading to volatile forecasts of oil price changes. Such methods imply that the predictor variables incorporated in the predictive regressions are relevant to the predicted variable at all times. This assumption is unrealistic, because the major determinants of oil prices change over time. The inclusion of irrelevant predictors causes very volatile forecasts, deteriorating the out-of-sample performance. This is consistently found in the financial return forecasting literature (e.g., Rapach et al., 2010).

Shrinkage is involved in the subset regression approach. Theoretically, forecasters have a total of $2^k - 1$ predictive models obtained by choosing which of the $k$ potential variables to incorporate in the predictive regression. Elliott et al. (2013) find that a combination of predictive regressions with a few predictors works better than a combination of univariate regressions or a combination of all potential regressions. Thus, we consider the equal-weighted combination of the forecasts from the two-variate, three-variate, first two k-variate, and first three k-variate models. The results for the models with the OLS and Huber losses are given in Table 6.

Table 6 shows that there is a significant improvement in accuracy of the subset regressions over the equal-weighted combination of single-predictor models. Similar conclusions emerge using the MSE or Huber
Table 4
Forecasting results of the equal-weighted mean combination for the models with the Huber objective function (real RAC).

Horizon (h)   Huber      LASSO+H    Ridge+H    ENet+H
1             1.0003     1.0023     0.9936     1.0012
3             1.0020     0.9942     0.9906**   0.9953
6             1.0062     0.9996     0.9990     0.9992
12            1.0002     0.9778***  0.9822***  0.9805***
18            1.0002     0.9764***  0.9847***  0.979***
24            0.9850***  0.9799***  0.9806***  0.9794***

Notes: This table reports the MSPE ratios of the equal-weighted mean combination for the predictive regressions with the Huber objective function. The penalty terms of LASSO, Ridge, and Elastic Net (ENet) are introduced into the objective function, respectively; "+H" indicates the use of the Huber loss instead of the mean squared error (MSE). An MSPE ratio lower than 1 implies that the corresponding model on average produces more accurate forecasts than the benchmark, i.e., the combination of the corresponding models without the Huber loss. The significance of predictability is tested via the Clark and West (2007) statistic; *, **, and *** denote rejection of the null hypothesis at the 10%, 5%, and 1% significance levels, respectively.
Table 5
Forecasting performance of multivariate linear regression relative to the no-change forecast.

              Original version                              Huber losses
Horizon (h)   OLS        LASSO      Ridge      ENet        Huber      LASSO+H    Ridge+H    ENet+H
1             0.8430***  0.8884***  0.9051***  0.8884***   0.8163***  0.9100**   0.8826***  0.9100**
3             0.9368**   0.9208*    0.9099**   0.9208*     0.9071**   0.8919**   0.8915**   0.8919**
6             0.9884*    0.9720*    0.9691*    0.9720*     0.9689***  0.9832     0.9754*    0.9832
12            1.1119     1.0641     1.0414     1.0561      1.1661     1.0300     1.0263     1.0348
18            1.0686     1.0037     0.9822**   1.0043      1.0846     0.9838***  0.9578***  0.9785***
24            1.0825     0.9915***  0.9578***  0.9705***   1.0451     0.9788***  0.9462***  0.9546***

Notes: This table reports the MSPE ratios of multivariate linear regressions based on two objective functions: mean squared error (MSE) and Huber. The penalty terms of LASSO, Ridge, and Elastic Net (ENet) are introduced into the objective function, respectively; "+H" indicates the use of the Huber loss instead of the MSE. An MSPE ratio lower than 1 implies that the corresponding model on average produces more accurate forecasts than the no-change forecast. The significance of predictability is tested via the Clark and West (2007) statistic; *, **, and *** denote rejection of the null hypothesis at the 10%, 5%, and 1% significance levels, respectively.
loss functions. At the horizon of 1 month, the combination of the three-variate models has the greatest MSPE reduction, 12% relative to the no-change benchmark in forecasting the real RAC. The forecasting performance of the subset regressions with the Huber loss and those with OLS is similar at the horizons of 1, 3, and 6 months. The superiority of the Huber loss models emerges at the longest horizon of 24 months. This is confirmed by the finding that the subset regressions with OLS have MSPE ratios higher than one, whereas the same approach with the Huber loss results in MSPE ratios significantly lower than one, thus demonstrating significant forecastability.
4.4. Comparison with existing studies

In this subsection, we compare the forecasting gains of the equal-weighted forecast combination based on our models of interest with the gains reported in the literature. Specifically, we choose the equal-weighted forecast combination based on 5 individual models in Baumeister et al. (2014) and the same approach based on 6 individual models in Baumeister and Kilian (2015). The 5 models in Baumeister et al. (2014) include a VAR model of the global oil market, a forecast
Table 6
Forecasting performance of subset regression models relative to the no-change forecast.

No. of predictors   h = 1      h = 3      h = 6      h = 12     h = 18     h = 24
OLS
single              0.9483***  0.9790*    1.0105     1.0459     1.0541     1.0628
two                 0.9091***  0.9589**   1.0041     1.0549     1.0572     1.0641
three               0.8824***  0.9464**   0.9996     1.0656     1.0601     1.0666
first two           0.9176***  0.9633**   1.0055     1.0524     1.0563     1.0636
first three         0.8958***  0.9528**   1.0019     1.0593     1.0582     1.0649
Huber
single              0.9486***  0.9809*    1.0168     1.046      1.0544     1.0469
two                 0.9847     0.9731*    0.9979     1.0489     1.0393     0.9875***
three               0.8775***  0.9076**   0.9923     1.0565     1.0403     0.9922**
first two           0.9733*    0.9745*    1.0019     1.0477     1.0424     1.0008
first three         0.9151***  0.9341**   0.9952     1.0522     1.0409     0.9955**

Notes: This table reports the MSPE ratios of subset regressions based on two objective functions: mean squared error (MSE, i.e., OLS) and Huber. An MSPE ratio lower than 1 implies that the corresponding model on average produces more accurate forecasts than the no-change forecast. The significance of predictability is tested via the Clark and West (2007) statistic; *, **, and *** denote rejection of the null hypothesis at the 10%, 5%, and 1% significance levels, respectively.
model based on non-oil industrial commodity prices, a forecast model based on oil futures prices, a forecast model based on the spread of product prices relative to the price of crude oil, and an oil inventory model. Baumeister and Kilian (2015) add the no-change model to the pool for forecast combination. For comparability, we select the same evaluation period, 1992.01 to 2012.09. Table 7 reports the corresponding results for the real price of RAC. At the longer horizons of 18 and 24 months, the LASSO+H, Ridge+H, and ENet+H approaches have lower MSPE ratios than the values reported in Baumeister et al. (2014) and Baumeister and Kilian (2015) (see Table 1 of Baumeister et al. (2014) and Table 1a of Baumeister and Kilian (2015)), suggesting greater forecasting gains. Notably, at the horizon of 24 months, the MSPEs of our approaches are 13% lower than the MSPE of the no-change model in forecasting the real RAC, in comparison with the 8% reduction reported in the two existing studies. However, at the shorter horizons, our robust regressions with regularization constraints perform worse than the methods developed by Baumeister et al. (2014) and Baumeister and Kilian (2015). That is, the superiority of our approach is concentrated at the longer horizons. (These comparison results should be viewed with caution, since the two earlier studies use real-time data, whereas we use fully revised data, which may make it easier to achieve forecasting success.)
4.5. Directional accuracy

Tables 8 and 9 provide the results of the directional accuracy tests for the mean combination and MLR models, respectively. Under the null hypothesis of no directional accuracy, the success probability of a model in forecasting the direction of real oil price changes should not be significantly different from 0.5, i.e., the probability of heads or tails when tossing a coin. Thus, if the success ratio of a strategy is higher than 0.5, we can conclude that it is more accurate than the no-change forecast. The null of no directional accuracy is tested using the method of Pesaran and Timmermann (2009). The approaches combining the Huber loss and regularization constraints are better at predicting the direction of change than the no-change benchmark at all horizons. However, the directional accuracy of the OLS-based approaches relies heavily on the forecasting horizon. The biggest success ratio for forecasting the real RAC is achieved by the EWMC based on the ENet+H method at the horizon of 6 months, and is up to 0.637.
Table 7
Forecasting results of the equal-weighted mean combination relative to the no-change forecast (1992.01–2012.09).

              Original version                          Huber losses
Horizon (h)   OLS       LASSO     Ridge     ENet        Huber     LASSO+H   Ridge+H   ENet+H
1             0.949***  0.948**   0.951**   0.948**     0.949***  0.954**   0.947***  0.953**
3             0.971*    0.974     0.977     0.974       0.972*    0.967*    0.967*    0.968*
6             0.996     0.998     0.999     0.998       1.001     0.997     0.996     0.996
12            1.004     1.018     1.011     1.014       1.004     0.992     0.992     0.992
18            0.961**   0.946***  0.933***  0.941***    0.962**   0.915***  0.915***  0.915***
24            0.927***  0.894***  0.893***  0.896***    0.907***  0.869***  0.869***  0.869***

Notes: This table reports the MSPE ratios of the equal-weighted mean combination for the predictive regressions based on two objective functions: mean squared error (MSE) and Huber. The penalty terms of LASSO, Ridge, and Elastic Net (ENet) are introduced into the objective function, respectively; "+H" indicates the use of the Huber loss instead of the MSE loss. An MSPE ratio lower than 1 implies that the corresponding model on average produces more accurate forecasts than the no-change benchmark. The significance of predictability is tested via the Clark and West (2007) statistic; *, **, and *** denote rejection of the null hypothesis at the 10%, 5%, and 1% significance levels, respectively.
Table 8
Forecasting performance of the equal-weighted mean combination evaluated by the success ratio.

              Original version                        Huber losses
Horizon (h)   OLS      LASSO     Ridge   ENet         Huber     LASSO+H   Ridge+H   ENet+H
1             0.543*   0.555**   0.517   0.558**      0.555**   0.606***  0.577***  0.606***
3             0.558**  0.571***  0.543*  0.558**      0.562**   0.615***  0.587***  0.621***
6             0.549**  0.539*    0.521   0.539*       0.565**   0.603***  0.596***  0.637***
12            0.524    0.556**   0.518   0.576***     0.527     0.585***  0.547**   0.608***
18            0.538*   0.498     0.498   0.489        0.521     0.597***  0.548**   0.607***
24            0.538*   0.505     0.458   0.538*       0.535     0.612***  0.569***  0.629***

Notes: This table reports the success ratio of the equal-weighted mean combination for the predictive regressions based on two objective functions: mean squared error (MSE) and Huber. The penalty terms of LASSO, Ridge, and Elastic Net (ENet) are introduced into the objective function, respectively; "+H" indicates the use of the Huber loss instead of the MSE loss. A success ratio higher than 0.5, the probability of tossing a coin, indicates that the given method can beat the no-change forecast. The significance of predictability is tested via the Pesaran and Timmermann (2009) method; *, **, and *** denote rejection of the null hypothesis at the 10%, 5%, and 1% significance levels, respectively.
Table 9
Forecasting performance of multivariate linear regression evaluated by the success ratio.

              Original version                        Huber losses
Horizon (h)   OLS      LASSO     Ridge   ENet         Huber     LASSO+H   Ridge+H   ENet+H
1             0.558**  0.558**   0.521   0.558**      0.552**   0.596***  0.568***  0.599***
3             0.562**  0.571***  0.543*  0.562**      0.562**   0.618***  0.590***  0.618***
6             0.555**  0.539*    0.514   0.539*       0.571***  0.580***  0.593***  0.599***
12            0.531    0.556**   0.524   0.576***     0.531     0.566***  0.553**   0.572***
18            0.538*   0.498     0.498   0.492        0.521     0.548**   0.548**   0.548**
24            0.538*   0.505     0.458   0.542*       0.535     0.565**   0.569***  0.562**

Notes: This table reports the success ratio of multivariate linear regressions based on two objective functions: mean squared error (MSE) and Huber. The penalty terms of LASSO, Ridge, and Elastic Net (ENet) are introduced into the objective function, respectively; "+H" indicates the use of the Huber loss instead of the MSE loss. A success ratio higher than 0.5, the probability of tossing a coin, indicates that the given method can beat the no-change forecast. The significance of predictability is tested via the Pesaran and Timmermann (2009) method; *, **, and *** denote rejection of the null hypothesis at the 10%, 5%, and 1% significance levels, respectively.
Fig. 1. Recursive MSPE ratios.
4.6. Changes in forecastability over time

In this subsection, we focus on time variation in forecast accuracy and on whether the forecasting gains are driven by particularly unusual periods. MSPE ratios are uninformative when based on too short an evaluation period, so we discard the first five years of the period. To identify the specific improvement in accuracy, we calculate the recursive MSPE ratios of the EWMC based on the OLS models, the regressions with regularization constraints (ENet), the robust regression model (Huber), and the robust regression with regularization constraints (ENet+Huber). The results for the RAC at each horizon over the evaluation period since January 1997 are plotted in Fig. 1.

As Fig. 1 shows, the EWMC models with parameterization penalties are generally robust to time evolution at all horizons. At the horizons of 1 and 3 months, all four strategies perform significantly better than the no-change forecast model after mid-2008. At the horizon of 12 months, only ENet+Huber is able to outperform the benchmark model, after its recursive MSPE ratio experienced a sharp decline during 2011. At horizons longer than 12 months, the penalized regression models (ENet and ENet+Huber) generate more accurate forecasts than the no-change model as time evolves, and implementing the Huber loss further reduces the MSPE ratio. The other two strategies cannot consistently beat the no-change benchmark, particularly at the 18- and 24-month horizons.
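The recursive MSPE ratio paths in Fig. 1 can be reproduced from the sequences of forecast errors; a minimal sketch (our own naming, with the first five years discarded as described above):

```python
import numpy as np

def recursive_mspe_ratio(model_errors, bench_errors, burn_in=60):
    """Expanding-window MSPE ratio of a model relative to the benchmark,
    dropping the first burn_in months (five years of monthly data)."""
    num = np.cumsum(np.asarray(model_errors) ** 2)
    den = np.cumsum(np.asarray(bench_errors) ** 2)
    return (num / den)[burn_in:]
```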
5. Conclusions

Accurately forecasting real oil prices is extremely important for the world economy, particularly in the transportation and manufacturing sectors, and is thus of major interest to academics and central banks. In this study, we focus on a robust loss (Huber) to reduce forecast variability. In addition, regularization constraints in the form of parameter penalties are introduced into the forecasting models to develop a robust variable selection method, which avoids overfitting and improves out-of-sample forecasting performance. Our empirical results show that none of the individual models can beat the no-change benchmark in forecasting real oil prices at all horizons. Thus, we assess the predictive power of the equal-weight mean combination (EWMC) and multiple linear regression (MLR). Our results indicate that an EWMC based on robust parameter design and parameterization penalties can outperform the benchmark no-change model at all horizons (up to two years) and is robust to time variation. We also find that combinations of robust penalized models can significantly outperform their non-robust penalized counterparts at horizons longer than three months. The success ratios further show that our models predict the direction of change better than the no-change benchmark.

We conclude by pointing out some limitations of this work. It has become standard in the literature to forecast the real price of oil subject to constraints on the real-time availability of the data (Baumeister and Kilian, 2012, 2014a, 2014b, 2015). In this paper, we use fully revised data that are only available ex post. We recognize that this may make it easier to achieve forecasting success.

CRediT authorship contribution statement

Xianfeng Hao: Formal analysis, Software, Writing - original draft. Yuyang Zhao: Formal analysis, Software, Writing - original draft. Yudong Wang: Formal analysis, Software, Writing - original draft.
Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 71501095 and 71722015).

Appendix A. The accelerated proximal algorithm

We present the accelerated proximal gradient algorithm (APG) (see, e.g., Parikh, 2014; Polson et al., 2015; Gu et al., 2018), which allows for efficient implementation of the Elastic Net, LASSO, and Ridge regressions for both OLS and Huber losses.

Algorithm: Accelerated Proximal Gradient Method

Initialization: $\theta_0 = 0$, $m = 0$, step size $\gamma$.
While $\theta_m$ has not converged, do:
$$\theta \leftarrow \theta_m - \gamma \nabla L(\theta)\big|_{\theta = \theta_m},$$
$$\tilde{\theta} \leftarrow \operatorname{prox}_{\gamma\phi}(\theta),$$
$$\theta_{m+1} \leftarrow \tilde{\theta} + \frac{m}{m+3}\left(\tilde{\theta} - \theta_m\right),$$
$$m \leftarrow m + 1.$$
Result: the final parameter estimate is $\theta_m$.

The proximal operator corresponding to each penalty is
$$\operatorname{prox}_{\gamma\phi}(\theta) = \begin{cases} \dfrac{\theta}{1 + \lambda\gamma}, & \text{Ridge}, \\ S(\theta, \lambda\gamma), & \text{LASSO}, \\ \dfrac{1}{1 + \lambda\gamma\rho}\, S\big(\theta, (1-\rho)\lambda\gamma\big), & \text{Elastic Net}, \end{cases}$$
where $S(x, \mu)$ is a vector-valued function whose $i$-th component is defined by
$$\big(S(x, \mu)\big)_i = \begin{cases} x_i - \mu, & \text{if } x_i > 0 \text{ and } \mu < |x_i|, \\ x_i + \mu, & \text{if } x_i < 0 \text{ and } \mu < |x_i|, \\ 0, & \text{if } \mu \ge |x_i|. \end{cases}$$
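The following is a minimal Python sketch of this algorithm for the Huber loss combined with the Elastic Net proximal operator above; setting ρ = 0 recovers LASSO and ρ = 1 recovers Ridge, since S(θ, 0) = θ. The fixed step size γ, the convergence tolerance, and the default hyperparameter values are illustrative choices, not the paper's settings.

```python
import numpy as np

def soft_threshold(x, mu):
    # Componentwise S(x, mu) as defined in Appendix A.
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def huber_gradient(theta, X, y, delta):
    # Gradient of the mean Huber loss: psi(r) = r for |r| <= delta and
    # delta * sign(r) otherwise, with residuals r = y - X @ theta.
    psi = np.clip(y - X @ theta, -delta, delta)
    return -X.T @ psi / len(y)

def prox_penalty(theta, lam, rho, gamma):
    # Elastic Net proximal operator from Appendix A; rho = 0 gives the
    # LASSO case and rho = 1 the Ridge case.
    return soft_threshold(theta, (1.0 - rho) * lam * gamma) / (1.0 + lam * gamma * rho)

def apg(X, y, delta=1.5, lam=0.1, rho=0.5, gamma=0.01, max_iter=5000, tol=1e-8):
    # Accelerated proximal gradient method of Appendix A with a fixed
    # (illustrative) step size gamma.
    theta_m = np.zeros(X.shape[1])
    for m in range(max_iter):
        grad_step = theta_m - gamma * huber_gradient(theta_m, X, y, delta)
        theta_tilde = prox_penalty(grad_step, lam, rho, gamma)
        # Momentum extrapolation with coefficient m / (m + 3).
        theta_next = theta_tilde + m / (m + 3) * (theta_tilde - theta_m)
        if np.linalg.norm(theta_next - theta_m) < tol:
            return theta_next
        theta_m = theta_next
    return theta_m
```

On standardized predictors, `apg(X, y)` returns the penalized Huber estimate; in practice a Lipschitz-based or line-searched step size would typically replace the fixed γ.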
Appendix B. The choice of optimal hyperparameters

It is almost impossible to determine the optimal hyperparameters for out-of-sample forecasting in real time. Instead, we use the hyperparameter values that yield the lowest forecasting error in the "validation" subsample. The hyperparameter ρ in the regularization constraints balances the Ridge and LASSO terms, and we set ρ = 0 for LASSO, ρ = 1 for Ridge, and ρ = 0.5 for the Elastic Net. Theoretically, the other regularization hyperparameter, λ, can be any nonnegative number. Similarly, the hyperparameter δ used in the Huber loss is a positive number. The range of values over which we search can significantly affect the success of hyperparameter optimization. It is impractical to specify a range large enough to cover every possible value of a hyperparameter; limiting the search to a smaller, well-chosen range instead increases the probability of success. OLS is a special case of the Huber loss with δ = ∞, and based on the data we collected, the Huber loss is practically equivalent to OLS once δ is around 1.5. Thus, we search over 2000 values of δ on a linear scale in the range [10⁻⁷, 2]. Unlike the hyperparameter in the Huber loss, λ represents the degree of the parameterization penalty, which may vary over several orders of magnitude. Considering this, we search over 2000 candidate values spaced uniformly on a logarithmic scale in the range [0, 10⁶ − 1], rather than on a linear scale. A sketch of this selection procedure follows.
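Below is a minimal sketch of the grid construction and validation-based selection described above. The grids reproduce the stated ranges (δ linear on [10⁻⁷, 2]; λ = 10^u − 1 log-spaced so that λ ∈ [0, 10⁶ − 1]). Here `fit_predict` is a hypothetical routine standing in for model estimation and forecasting, and the coarse subsampling of the grids is our own device to keep the joint search tractable.

```python
import numpy as np
from itertools import product

# Candidate grids following Appendix B: 2000 values for each hyperparameter.
delta_grid = np.linspace(1e-7, 2.0, 2000)        # linear scale for delta
lambda_grid = np.logspace(0.0, 6.0, 2000) - 1.0  # log scale over [0, 1e6 - 1]

def select_hyperparameters(fit_predict, valid_y, candidates):
    # Return the (delta, lambda) pair with the lowest MSPE on the
    # validation subsample. `fit_predict(delta, lam)` is a placeholder
    # that fits on the training subsample and forecasts the validation one.
    best, best_err = None, np.inf
    for delta, lam in candidates:
        err = np.mean((valid_y - fit_predict(delta, lam)) ** 2)
        if err < best_err:
            best, best_err = (delta, lam), err
    return best

# A full 2000 x 2000 joint search is costly; a coarse sub-grid
# (every 100th value of each grid) keeps the sketch tractable.
coarse_candidates = product(delta_grid[::100], lambda_grid[::100])
```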
Appendix C. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.eneco.2020.104683.

References

Alquist, R., Kilian, L., 2010. What do we learn from the price of crude oil futures? J. Appl. Econ. 25 (4), 539–573.
Alquist, R., Kilian, L., Vigfusson, R.J., 2013. Forecasting the price of oil. In: Handbook of Economic Forecasting, pp. 427–507.
Barsky, R.B., Kilian, L., 2001. Do we really know that oil caused the great stagflation? A monetary alternative. NBER Macroecon. Annu. 16, 117–183.
Baumeister, C., Guérin, P., Kilian, L., 2015. Do high-frequency financial data help forecast oil prices? The MIDAS touch at work. Int. J. Forecast. 31, 238–252.
Baumeister, C., Kilian, L., 2012. Real-time forecasts of the real price of oil. J. Bus. Econ. Stat. 30, 326–336.
Baumeister, C., Kilian, L., 2014a. What central bankers need to know about forecasting oil prices. Int. Econ. Rev. 55, 869–889.
Baumeister, C., Kilian, L., 2014b. Real-time analysis of oil price risks using forecast scenarios. IMF Econ. Rev. 62, 119–145.
Baumeister, C., Kilian, L., 2015. Forecasting the real price of oil in a changing world: a forecast combination approach. J. Bus. Econ. Stat. 33, 338–351.
Baumeister, C., Kilian, L., 2016. Forty years of oil price fluctuations: why the price of oil may still surprise us. J. Econ. Perspect. 30, 139–160.
Baumeister, C., Kilian, L., Lee, T.K., 2014. Are there gains from pooling real-time oil price forecasts? Energy Econ. 46, S33–S43.
Baumeister, C., Kilian, L., Zhou, X., 2018. Are product spreads useful for forecasting oil prices? An empirical evaluation of the Verleger hypothesis. Macroecon. Dyn. 22, 562–580.
Box, G.E.P., 1953. Non-normality and tests on variances. Biometrika 40, 318–335.
Claeskens, G., Magnus, J.R., Vasnev, A.L., Wang, W., 2016. The forecast combination puzzle: a simple theoretical explanation. Int. J. Forecast. 32, 754–762.
Clark, T.E., West, K.D., 2007. Approximately normal tests for equal predictive accuracy in nested models. J. Econom. 138, 291–311.
Coppola, A., 2008. Forecasting oil price movements: exploiting the information in the futures market. J. Futur. Mark. 28, 34–55.
Diebold, F.X., Pauly, P., 1987. Structural change and the combination of forecasts. J. Forecast. 6, 21–40.
Drachal, K., 2016. Forecasting spot oil price in a dynamic model averaging framework — have the determinants changed over time? Energy Econ. 60, 35–46.
Elliott, G., Gargano, A., Timmermann, A., 2013. Complete subset regressions. J. Econom. 177 (2), 357–373.
Fan, J., Li, Q., Wang, Y., 2017. Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. J. R. Stat. Soc. Ser. B Stat. Methodol. 79, 247–265.
Friedman, J.H., 2001. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232.
Funk, C., 2018. Forecasting the real price of oil - time-variation and forecast combination. Energy Econ. 76, 288–302.
Gu, S., Kelly, B.T., Xiu, D., 2018. Empirical asset pricing via machine learning. SSRN Electron. J.
He, X., Shao, Q.M., 1996. A general Bahadur representation of M-estimators and its application to linear regression with nonstochastic designs. Ann. Stat. 24, 2608–2630.
Hoerl, A.E., Kennard, R.W., 1970a. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67.
Hoerl, A.E., Kennard, R.W., 1970b. Ridge regression: applications to nonorthogonal problems. Technometrics 12, 69–82.
Huber, P.J., 1964. Robust estimation of a location parameter. Ann. Math. Stat. 35, 73–101.
Huber, P.J., 1973. Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Stat. 1, 799–821.
Huber, P.J., 2011. Robust statistics. In: International Encyclopedia of Statistical Science, pp. 1248–1251.
Jammazi, R., Aloui, C., 2012. Crude oil price forecasting: experimental evidence from wavelet decomposition and neural network modeling. Energy Econ. 34, 828–841.
Kilian, L., 2009. Not all oil price shocks are alike: disentangling demand and supply shocks in the crude oil market. Am. Econ. Rev. 99, 1053–1069.
Kilian, L., 2014. Oil price shocks: causes and consequences. Annu. Rev. Resour. Econ. 6, 133–154.
Kilian, L., 2015. Comment. J. Bus. Econ. Stat. 33, 13–17.
Kilian, L., Murphy, D.P., 2014. The role of inventories and speculative trading in the global market for crude oil. J. Appl. Econ. 29 (3), 454–478.
Kilian, L., Park, C., 2009. The impact of oil price shocks on the U.S. stock market. Int. Econ. Rev. 50, 1267–1287.
Kilian, L., Zhou, X., 2018. The propagation of regional shocks in housing markets: evidence from oil price shocks in Canada. SSRN Electron. J.
Miao, H., Ramchander, S., Wang, T., Yang, D., 2017. Influential factors in crude oil price forecasting. Energy Econ. 68, 77–88.
Mostafa, M.M., El-Masry, A.A., 2016. Oil price forecasting using gene expression programming and artificial neural networks. Econ. Model. 54, 40–53.
Murat, A., Tokat, E., 2009. Forecasting oil price movements with crack spread futures. Energy Econ. 31, 85–90.
Naser, H., 2016. Estimating and forecasting the real prices of crude oil: a data rich model using a dynamic model averaging (DMA) approach. Energy Econ. 56, 75–87.
Parikh, N., 2014. Proximal algorithms. Found. Trends Optim. 1, 127–239.
Pesaran, M.H., Timmermann, A., 2009. Testing dependence among serially correlated multicategory variables. J. Am. Stat. Assoc. 104, 325–337.
Polson, N.G., Scott, J.G., Willard, B.T., 2015. Proximal algorithms in statistics and machine learning. Stat. Sci. 30, 559–581.
Rapach, D.E., Strauss, J.K., Zhou, G., 2010. Out-of-sample equity premium prediction: combination forecasts and links to the real economy. Rev. Financ. Stud. 23, 821–862.
Stock, J.H., Watson, M.W., 2004. Combination forecasts of output growth in a seven-country data set. J. Forecast. 23, 405–430.
Tibshirani, R., 1996. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 58, 267–288.
Tukey, J.W., 1960. A survey of sampling from contaminated distributions. In: Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, pp. 448–485.
Wang, Y., Liu, L., Diao, X., Wu, C., 2015. Forecasting the real prices of crude oil under economic and statistical constraints. Energy Econ. 51, 599–608.
Wang, Y., Liu, L., Wu, C., 2017. Forecasting the real prices of crude oil using forecast combinations over time-varying parameter models. Energy Econ. 66, 337–348.
Xie, W., Yu, L., Xu, S., Wang, S., 2006. A new method for crude oil price forecasting based on support vector machines. In: Lecture Notes in Computer Science, pp. 444–451.
Yi, C., Huang, J., 2017. Semismooth Newton coordinate descent algorithm for elastic-net penalized Huber loss regression and quantile regression. J. Comput. Graph. Stat. 26, 547–557.
Yi, Y., Ma, F., Zhang, Y., Huang, D., 2018. Forecasting the prices of crude oil using the predictor, economic and combined constraints. Econ. Model. 75, 237–245.
Yu, L., Wang, S., Lai, K.K., 2008. Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm. Energy Econ. 30, 2623–2635.
Yu, L., Zhao, Y., Tang, L., 2014. A compressed sensing based AI learning paradigm for crude oil price forecasting. Energy Econ. 46, 236–245.
Zou, H., Hastie, T., 2005. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67, 301–320.