Accepted Manuscript An adaptive forecasting approach for copper price volatility through hybrid and non-hybrid models Diego García, Werner Kristjanpoller
PII: S1568-4946(18)30564-7
DOI: https://doi.org/10.1016/j.asoc.2018.10.007
Reference: ASOC 5125
To appear in: Applied Soft Computing Journal
Received date: 29 March 2018
Revised date: 26 July 2018
Accepted date: 2 October 2018

Please cite this article as: D. García, W. Kristjanpoller, An adaptive forecasting approach for copper price volatility through hybrid and non-hybrid models, Applied Soft Computing Journal (2018), https://doi.org/10.1016/j.asoc.2018.10.007

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
An adaptive forecasting approach for copper price volatility through hybrid and non-hybrid models
Diego García Departamento de Industrias, Universidad Técnica Federico Santa María, Av. España 1680, Valparaíso, Chile Email:
[email protected]
Werner Kristjanpoller * Departamento de Industrias, Universidad Técnica Federico Santa María, Av. España 1680, Valparaíso, Chile Email:
[email protected] Phone: +56 32 2654571 Fax: +56 32 2654815
* Corresponding author
Highlights
Adaptive and non-adaptive models were applied to forecast copper price volatility.
Adaptive capacity helped to identify the best explanatory variables, window sizes, and model configuration parameters.
Adaptive capacity in models improved volatility forecast accuracy.
The Adaptive-GARCH-Fuzzy Inference System model yielded the best results.
ABSTRACT

This article studies monthly volatility forecasting for the copper market, which is of practical interest for various participants such as producers, consumers, governments, and investors. Using data from 1990 to 2016, we propose a framework composed of a set of time series models such as Auto-Regressive Integrated Moving Average (ARIMA) and Generalized Auto-Regressive Conditional Heteroskedasticity (GARCH), non-parametric models from soft computing, e.g. Artificial Neural Networks (ANN) and Fuzzy Inference Systems (FIS), and hybrid specifications of both. A Genetic Algorithm gives these models the capacity to adapt simultaneously in their exogenous variables, their configuration parameters, and their window size, in pursuit of the best possible forecasts. Recognized drivers of this specific market are also considered. We examine out-of-sample performance based on the Heteroskedasticity-adjusted Mean Squared Error (HMSE), and we test model superiority using the Model Confidence Set (MCS). The results show that making forecasts using an adaptive technique is crucial to obtaining robust and improved performance. The Adaptive-GARCH-FIS specification yielded the best forecasting power.
Keywords: Copper price volatility, Adaptive models, Genetic Algorithm, GARCH model, Fuzzy Inference System, Copper price returns.
1. INTRODUCTION

The variability or instability of commodity prices, better known as price volatility, is a widely studied yet challenging topic, especially when forecasts are made for practical purposes. In this sense, a monthly timeframe is very important because of the number of participants who benefit. Producers obtain both realistic monthly budgeting and optimal extraction policies, while consumers receive a stable price plan. For their part, governments obtain a better estimate of income from ad valorem royalty taxes. Finally, investors and speculators benefit, for example, in the valuation of derivatives.

To address this issue, soft computing techniques provide useful tools for forecasting in noisy environments such as stock markets, capturing their non-linear behavior (Atsalakis & Valavanis, 2009a). Additionally, over the last few years, adaptive systems [1] for forecasting problems have attracted a great deal of attention, with a focus on neural networks (Ghiassi et al., 2005) and evolving fuzzy rule-based models (Angelov & Filev, 2004). These non-parametric techniques provide useful insights into financial time series and are especially helpful in characterizing the highly chaotic behavior these series present. Recent literature includes studies that combine different types of machine learning algorithms and evolutionary techniques to improve forecasts in time series analysis, with promising results. Moreover, the use of soft computing techniques as a direct improvement of econometric approaches (sometimes called hybrid or embedded models) has become more commonplace. The main idea behind this is that different types of models can correctly identify and capture different characteristics of the analyzed time series.

[1] An adaptive system, as its name indicates, is one that adapts itself to changes in the dynamics of the problem. Adaptation can also involve parameter and/or structural updates that are inherent to models (Luna & Ballini, 2011).

In addition, as suggested by the economic theory of mineral markets, many factors can affect the nature of market fundamentals (and hence price behavior), and these can vary greatly from one mineral commodity to another (Tilton & Guzmán, 2016). Thus, it is difficult to consider commodities as a single class of assets, as other studies have done. Specifically, the copper market's behavior has been widely studied from an economic perspective, considering the influence of exogenous factors on price behavior [2]. This, coupled with the availability of reliable forecasting techniques such as those mentioned above, has led us to develop this forecasting study, in view of both the practical benefits and the challenges involved in analyzing the complex dynamics of adjustment processes in commodity markets. As Cashin & McDermott (2002) indicate, commodity price behavior contains features such as rapid, unexpected and often large movements.

[2] Moreover, Marra (2015) notes that exogenous factors can have a significant and non-linear impact on volatility.

This study places a set of recognized models in an adaptive environment, with the aim of producing the best possible monthly forecasts of copper price volatility. The models are Auto-Regressive Integrated Moving Average (ARIMA) models, Generalized Auto-Regressive Conditional Heteroskedasticity (GARCH) models, Artificial Neural Networks (ANN), and Fuzzy Inference Systems (FIS). In particular, we focus both on identifying the best type of model with its respective specification, and on highlighting the importance of the simultaneous adaptation capacity of three input elements which are commonly assumed to be fixed in the models, namely each model's hyperparameters, window sizes, and exogenous variables. This feature, provided by a Genetic Algorithm (GA), is based on the fact that there is no general agreement about which input configuration yields the best forecasting performance, so there is potential for improvement in the models' information processing, to find patterns that govern the system dynamics, at least in this market and time frequency.

This study is novel for two reasons. First, to our knowledge, there are no studies that specifically address the adaptation of exogenous variables, whose time series may contain information relevant to the forecasts. Moreover, compared with other works in the adaptive context, simultaneously adapting more than one element in the models involved has not been addressed before (possibly because of the growth in size of the combinatorial problems involved). Second, among papers studying the monthly price volatility of mineral commodities, there are none that assess volatility forecasting in this particular timeframe, as we do. This is possibly due to the limited data available to train the models on periods following structural breaks in these markets, with the objective of producing accurate forecasts.

The present work's contribution can be summarized as follows. On the one hand, it highlights an aspect which is important to consider in forecasting, namely the potential increase in model prediction power due to the simultaneous adaptation of different input types, particularly exogenous variables. On the other hand, it constitutes the first attempt to predict monthly volatility in the copper market, despite the large number of potential beneficiaries (unlike shorter timeframes such as intraday or daily, which are limited to investors and speculators). As such, taking the risk associated with returns forecasting accuracy into
account means future volatility estimates can become an important additional source of information, due to their common use as a measure of risk levels, helping the aforementioned participants to make more informed decisions under uncertainty.

The paper is organized as follows: section 2 presents a literature review. Section 3 presents the data used, the methodology and the proposed model. Section 4 shows the results and their discussion. Finally, section 5 presents the conclusions.

2. LITERATURE REVIEW

With respect to forecasting commodity volatility, regardless of the commodity's specific category, the methodology used, or the type of market, be it spot or futures, the current literature is extensive (see for example: Trück & Liang, 2012; Arouri et al., 2012; Kang & Yoon, 2013; Bentes, 2015a; Triantafyllou et al., 2015; Basher & Sadorsky, 2016; Degiannakis & Potamia, 2017; Bollerslev et al., 2017). Nonetheless, all of the above are framed within an analysis based on intraday, daily, or weekly frequencies.

Regarding techniques and methodologies used for forecasting, parametric time series models have produced important results on their own, as with Dooley & Lenihan (2005) or Amini et al. (2015). They have also been combined with soft computing tools, which belong to the category of non-parametric models of artificial intelligence; the latter have also been used individually. Within the context of commodities or stock markets, certain studies bear highlighting, for example Hamid & Iqbal (2004) and Sánchez et al. (2015), who make use of ANN; Tay & Cao (2002) and Huang & Tsai (2009), in the case of Support Vector Machines (SVM); and Sheta (2006), Chang & Liu (2008), Atsalakis & Valavanis (2009b), and Boyacioglu & Avci (2010), who employ FIS or Adaptive Neuro Fuzzy Inference Systems (ANFIS).

Regarding hybrid configurations between parametric and non-parametric models, Donaldson & Kamstra (1997) applied a neural network model in combination with a GARCH model to capture the effects of volatility on stock returns, and Fiordaliso (1998) employed FIS to combine a set of individual forecasts. Babu & Reddy (2014) used a hybrid ARIMA-ANN model, whereas Guresen et al. (2011) used hybrid GARCH-ANN models, obtaining results that do not exhibit significant improvements for the hybrid model in comparison to traditional models. Hajizadeh et al. (2012) compared two hybrid neural network models, with the aim of improving the predictive power of GARCH family models, using explanatory variables. Kristjanpoller & Minutolo (2016) and Kristjanpoller & Hernández (2017) combined ANN and GARCH models to predict volatility in commodity spot markets at a daily frequency. Finally, it is important to mention the works by Panapakidis & Dagoumas (2017), who combined wavelet transformation, ANFIS, genetic algorithms (GAs), and ANN for the case of natural gas; and by Chang et al. (2011), who used the ANFIS-AR hybrid model for volatility forecasting.

In the context of adaptive systems for forecasting, genetic algorithms are considered an efficient additional tool for solving problems. Goonatilake et al. (1995) proposed the use of fuzzy systems to make trading decisions, with a GA to find the optimal configuration of fuzzy sets. Wagner et al. (2007) included an adaptive sliding-window approach that chooses the best window size for time series models to reach the best prediction of current data, applying it to estimate U.S. Gross Domestic Product. Hung (2011a; 2011b) combined fuzzy systems and a GARCH model to adaptively forecast stock market volatility, estimating parameters for FIS and GARCH models based on a GA and Particle Swarm Optimization (PSO), respectively. Pradeepkumar & Ravi (2017) employed PSO to estimate weights and biases for a Quantile Regression Neural Network to forecast eight financial time series; similarly, Wei (2013) employed a GA to refine the weights of the rules in an ANFIS model to forecast the Taiwan stock index. Finally, Tan et al. (2017) employed the Fruit Fly Optimization Algorithm (FOA) to adaptively adjust the fuzzy inference rules of the ANFIS model in stock market volatility forecasting, while Singh & Mohanty (2017) used a GA along with fuzzy tuning to train the free parameters of their adaptive generalized neuron model in electricity price forecasting.

Despite the extensive literature mentioned above, there are only a few studies on monthly volatility in mineral commodity spot markets, notwithstanding the practical importance of this topic (in terms of modeling only, see for example: Davidson et al., 1998; Brunetti & Gilbert, 1995; Tully & Lucey, 2007; Labys et al., 2000). Moreover, as mentioned in the previous section, to our knowledge none of them focuses on monthly volatility forecasts in spot markets.

3. DATA AND METHODOLOGY

3.1 Drivers behind copper price behavior, data sources and processing

The focus is on the factors that determine prices and price changes in the short term. Table 1 summarizes the drivers behind this market, reporting the main studies that address their influence on copper price and an explanation related to each of them.

Table 1. Drivers behind copper price behavior.
Driver: Metal exchange inventories
Representative studies: Brunetti & Gilbert (1995)
Related explanation: Inverse relationship between copper prices and inventories in the metal exchanges.

Driver: Investors and speculators
Representative studies: Radetzki & Wårell (2017), Tilton & Guzmán (2016), Brunetti et al. (2016), Humphreys (2015), Tilton et al. (2011), Jaramillo & Selaive (2006)
Related explanation: Changes in investor demand in futures markets determine spot prices if the market is in contango [3]. Upward or downward movements in futures prices cause corresponding movements in the spot price.

Driver: Demand curve variations
Representative studies: Radetzki & Wårell (2017), Tilton & Guzmán (2016)
Related explanation: Short-term volatility is mainly due to variations in demand (due to current conditions of the business cycle or economic activity), since supply does not have the ability to adjust in this period.

Driver: Real exchange rate
Representative studies: De Gregorio et al. (2005), Ridler & Yandle (1972), Dornbusch et al. (1985), Gilbert (1989), Reinhart (1991)
Related explanation: A real depreciation of the US dollar increases copper prices, and vice versa. In a more indirect way, the known relationship between copper prices and costs (which involve exchange rate influence), as exhibited in Tilton (2014).
Data cover the period [4] from Feb 1990 to Nov 2016. First, the nominal spot prices of copper are obtained as quoted daily on the London Metal Exchange (LME). With regard to the exogenous variables, all are considered observable. We first take the daily nominal three-month futures prices, in order to represent speculative activity and the role of investors [5]. Regarding inventories, the total sum of daily inventories is taken from the LME, COMEX, and Shanghai exchanges, which are the main exchanges in copper inventories (Lagos, 2013). The three previous time series were obtained directly from Comisión Chilena del Cobre (COCHILCO). To reflect economic activity, we use China's Industrial Production Index [6], retrieved from the Ministry of Economic Affairs of Taiwan, which reports monthly. Regarding the real exchange rate, we use the multilateral exchange rate Major Currency Index, which includes the seven main commercial partners of the United States with respect to the U.S. dollar. This is obtained from the U.S. Federal Reserve. All prices were deflated by the monthly Producer Price Index based on Nov 2016 = 100, also obtained from the previous source.

[3] Contango refers to the market state in which the futures price is above the spot price. The opposite of contango is backwardation, i.e., when the futures price is below the spot price.

[4] The arguments for starting at this date include the following: as mentioned by Radetzki & Wårell (2017), China's industrialization thrust evolved in all seriousness only in the 90s, at a remarkable pace, similar to Japan's between 1950 and 1975; the intensive use of commodities in hedge funds began around mid-1990 (Humphreys, 2010); the choice avoids periods where structural changes occurred, such as the prolonged economic recession which followed the 1970s oil shocks and the collapse of the Soviet Union, which suppressed global demand for minerals (Humphreys, 2015); and lastly, databases for earlier periods are difficult to obtain.

[5] The reason for using three-month futures prices lies in Marshall & Silva (2002), who established that the forecasting capacity of futures contracts decreases as contract maturity increases. Additionally, a dummy variable is used to model the market state. Given that we seek to use this information for forecasting purposes, as a proxy for the situation of contango (or backwardation) in period t we use the situation of contango (or backwardation) in the preceding period, t-1.

[6] As shown in the results of Labys et al. (1999) based on monthly data, fluctuations in the industrial production index influence metal prices, including copper, in a reasonably strong manner compared to other macroeconomic factors. China was selected for its industrial impulse, having established clear marks in the consumption of various intensively-used commodities in the industrialization phase. Thus, in the 15-year period between 1990 and 2005, the demand for copper increased between six and eight times; China was responsible for 95% of the growth in global demand from 2000-2005 (Radetzki & Wårell, 2017).

The data processing is summarized as follows:

1. Monthly averages are calculated from the daily series in levels, obtaining monthly series, which are then transformed into the first difference of their natural logarithm, that is, their monthly percentage variation or return,

$$ r_t = \ln(\bar{P}_t) - \ln(\bar{P}_{t-1}), \quad \text{and} \quad \bar{P}_t = \frac{1}{N}\sum_{i=1}^{N} P_{i,t}, $$

where $P_{i,t}$ corresponds to the price on day $i$ of month $t$, composed of $N$ days.

2. The daily series in levels are taken again, and these are transformed into the first difference of their natural logarithm, that is, their daily percentage variation or return,

$$ r_{i,t} = \ln(P_{i,t}) - \ln(P_{i-1,t}), $$

similarly to the previous case.

3. The daily return series obtained previously are considered, and volatility is calculated monthly (with the exception of the industrial production index, as there are no reports at a frequency higher than monthly). The measurement of volatility to be used is realized variance,

$$ RV_t = \frac{1}{N}\sum_{i=1}^{N}\left(r_{i,t} - \bar{r}_t\right)^2, $$

where $\bar{r}_t$ is the monthly average of daily returns, $\bar{r}_t = \frac{1}{N}\sum_{i=1}^{N} r_{i,t}$. The RV is used as a proxy for the population conditional variance $\sigma_t^2$, as Fuertes et al. (2009) indicate.

4. Finally, RV is calculated on a monthly basis by multiplying by the number of days $N$ corresponding to month $t$, resulting in:

$$ RV_t^{m} = N \cdot RV_t = \sum_{i=1}^{N}\left(r_{i,t} - \bar{r}_t\right)^2. $$
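The four processing steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the daily variance is normalized by the number of returns in the month (the normalization is an assumption of this sketch), and the first daily return of a month would in practice need the last price of the previous month.

```python
import math


def log_returns(prices):
    """First difference of the natural logarithm of a price series."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


def monthly_log_return(prev_month_prices, month_prices):
    """Monthly return computed from the monthly averages of daily prices."""
    prev_avg = sum(prev_month_prices) / len(prev_month_prices)
    curr_avg = sum(month_prices) / len(month_prices)
    return math.log(curr_avg) - math.log(prev_avg)


def realized_variance(daily_returns):
    """Return (daily RV, monthly RV) for the daily returns of one month.

    Assumption of this sketch: squared deviations from the monthly mean
    return are averaged over n, then scaled by n to get the monthly figure.
    """
    n = len(daily_returns)
    mean_r = sum(daily_returns) / n
    rv_daily = sum((r - mean_r) ** 2 for r in daily_returns) / n
    return rv_daily, n * rv_daily


# Toy month with four daily returns (illustrative numbers, not copper data)
rv_d, rv_m = realized_variance([0.01, -0.02, 0.03, 0.00])
```

With this normalization the monthly figure is simply the sum of squared deviations of the daily returns from their monthly mean.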
Fig. 1 plots copper price returns and the corresponding realized variance. Periods of high volatility are visible, especially in 1996 and 2008. Table 2 exhibits descriptive statistics for the percentage variation of all the observed variables. The inventories and China's Industrial Production Index series are the most volatile, while the Major Currency Index series is the least volatile. Moreover, the latter presents the lowest mean percentage variation. The Jarque-Bera normality test indicates that the series are not normally distributed, with the exception of the Major Currency Index series. The Augmented Dickey-Fuller (ADF) test is significant at 1% for all series, which leads to the conclusion that they are stationary. Finally, the ARCH LM heteroskedasticity test was applied to identify the presence of ARCH effects in the residuals. The value of the statistic (for 12 lags) is 26.75, and the critical value is 21.03, which confirms the existence of ARCH effects in the monthly copper returns.
Fig. 1. Time series plot for price returns and the price return realized variance. [Plot: copper return series (RCU, left axis "Return") and realized variance series (VCU, right axis "Volatility"), 1990 to 2017.]
Table 2. Descriptive statistics.

Statistic             Copper Spot   Copper Futures   Inventories   China's IPI   Major Currency Index
Mean (%)              0.00107       0.00111          0.00632       0.00365       0.00004
Standard deviation    0.0600        0.0574           0.0867        0.0855        0.0167
Minimum (%)           -0.294        -0.298           -0.455        -0.233        -0.048
Maximum (%)           0.221         0.216            0.529         0.268         0.065
Normality             139.288       204.130          2299.281      68.152        3.801
ADF                   -12.136       -12.081          -13.305       -4.872        -12.425
The normality test used is the Jarque-Bera test, whose statistic follows a $\chi^2$ distribution with two degrees of freedom, and whose null hypothesis is that the errors are normally distributed. The critical value at 10% for this test is 4.6.
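For reference, the Jarque-Bera statistic can be computed directly from the sample skewness and kurtosis. The sketch below uses the textbook definition with population (1/n) moments; the function name and the toy sample are illustrative and are not taken from the paper's data.

```python
def jarque_bera(sample):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4).

    S and K are the sample skewness and kurtosis computed from
    central moments with a 1/n normalization.
    """
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Normality is rejected at the 10% level when the statistic exceeds 4.6
CRITICAL_10_PERCENT = 4.6
statistic = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
reject_normality = statistic > CRITICAL_10_PERCENT
```

Read against Table 2, only the Major Currency Index statistic (3.801) falls below the 4.6 threshold, which matches the exception noted in the text.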
The evolution of returns and variance for each explanatory variable is detailed in Appendix B. Note that there are only five volatility clusters for the inventories during the analyzed period, whereas in the futures prices and the Major Currency Index there is a
base volatility and three significant periods of high volatility, towards the end of 1995, in 2006 and during the subprime crisis. In the case of China’s IPI, the returns follow a cyclical pattern. Thus, in first instance, there are nine different time series available: five for returns and four for volatilities, associated with copper spot prices, copper future prices, inventories, China’s Industrial Production Index, and real exchange rates, reaching a total of
observations per series. Finally, we must consider the complexity of the
dynamic relationships between these factors, as well as how they determine changes in prices. Although these dynamics are not the objective of the study, they may influence the forecasts (either positively or negatively). For the purposes of this study, the variables listed herein will be considered as the major determinants of copper price and its changes.

3.2 Parametric models and context of the problem

Given the set of explanatory variables, we would like to know which of them, alone or collectively, produce the best monthly volatility forecasts. Thus, the models consider the lags of all the time series, from lag 1 back to a maximum of lag 24, in order to capture behavior patterns in certain periods [7]. It is possible that only certain months within these lags have greater influence on the current observed variability, and therefore contain the information with the greatest predictive capacity, which represents seasonal effects.

[7] At least in terms of cyclicity, Labys et al. (2000) found periodicity of less than 12 and 24 months, related to speculative influences and short-term macroeconomic cycles in the U.S. economy, respectively.
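The idea of offering every lag from 1 to 24 as a candidate input, and then keeping only the informative ones, can be illustrated with a small helper. This is a minimal sketch: the function names and the 0/1 mask are hypothetical illustrations and do not reproduce the authors' implementation.

```python
def lagged_design(series, max_lag=24):
    """Build rows [x_{t-1}, ..., x_{t-max_lag}] and the matching targets x_t."""
    rows, targets = [], []
    for t in range(max_lag, len(series)):
        rows.append([series[t - k] for k in range(1, max_lag + 1)])
        targets.append(series[t])
    return rows, targets


def select_lags(rows, mask):
    """Keep only the lag columns whose mask entry is 1."""
    return [[x for x, keep in zip(row, mask) if keep] for row in rows]


# Toy series 0..29; keep only lags 1 and 12 (hypothetical choice)
rows, targets = lagged_design(list(range(30)), max_lag=24)
mask = [1 if k in (1, 12) else 0 for k in range(1, 25)]
reduced = select_lags(rows, mask)
```

Each mask corresponds to one candidate subset of lags; a model is then estimated only on the retained columns.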
Among the parametric models proposed, which will use these time series as inputs, we first consider the ARIMA model with exogenous variables (ARIMAX), given that it allows the addition of one or more exogenous variables. This is an extension of the ARIMA models, which, besides being widely used as benchmarks, are particularly appropriate for predicting variables observed on a short-term basis, i.e., quarterly or monthly (Labys, 2016).

Second, we consider Auto-Regressive Conditional Heteroskedasticity (ARCH) models and their generalization (GARCH), introduced by Engle (1982) and Bollerslev (1986), respectively, which according to Labys (2016) have become the most widely used auto-regressive heteroskedastic models. We use the GARCH extension similarly to the ARIMA model; in this case both the mean and variance equations can be modeled as a function of past values of the dependent variable and of exogenous variables. In mathematical terms, a GARCH (p, q) model with explanatory variables is of the form:

$$ r_t = c + \sum_{i} \phi_i \, r_{t-i} + \sum_{j} \theta_j \, \varepsilon_{t-j} + \sum_{x}\sum_{k} \delta_{x,k} \, x_{t-k} + \varepsilon_t \tag{1a} $$

$$ \varepsilon_t = \sigma_t z_t, \qquad z_t \sim \text{i.i.d.}(0, 1) \tag{1b} $$

$$ \sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \, \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \, \sigma_{t-j}^2 + \sum_{y}\sum_{h} \gamma_{y,h} \, y_{t-h} \tag{1c} $$

where $x_{t-k}$ denotes the lag $k$ of the exogenous variable $x$ in the mean equation, $\varepsilon_{t-j}$ the moving average term of order $j$, and $y_{t-h}$ denotes the lag $h$ of the exogenous variable $y$ in the variance equation. For simplicity, hereafter we will refer to GARCH models as those including explanatory variables as well as those that do not.
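To make the role of the explanatory variables in the variance equation concrete, the following sketch iterates the conditional variance of the simplest case, p = q = 1, with a single exogenous regressor entering at lag 1. The parameter names, the lag choice, and the numbers are assumptions made for illustration only; they are not the specification estimated in the paper.

```python
def garch_x_variance(eps, exog, omega, alpha, beta, gamma, sigma2_0):
    """Conditional variance path of a GARCH(1,1) with one exogenous lag.

    sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]
                + gamma * exog[t-1]
    eps are the innovations of the mean equation; sigma2_0 is the
    starting variance (an input of this sketch, not an estimate).
    """
    sigma2 = [sigma2_0]
    for t in range(1, len(eps)):
        sigma2.append(
            omega
            + alpha * eps[t - 1] ** 2
            + beta * sigma2[t - 1]
            + gamma * exog[t - 1]
        )
    return sigma2


# Illustrative numbers only
path = garch_x_variance(
    eps=[0.1, -0.2, 0.0],
    exog=[0.5, 0.0, 0.0],
    omega=0.01, alpha=0.1, beta=0.8, gamma=0.02,
    sigma2_0=0.05,
)
```

The recursion shows why the variance equation depends on the mean equation: the innovations `eps` are the residuals of the mean equation, so any change in the inputs of the mean equation changes the variance path as well.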
At this point, given this set of time series which serve as inputs for any model, and depending on the type of model (ARIMAX or GARCH), various combinations may arise, one of which will deliver the best forecasts. It should be noted that, for the specific case of the GARCH model, the possible combinations include both the monthly return and the variance time series, since the performance of the variance equation depends on the innovation term in the mean equation, as can be deduced from Eq. (1a) and Eq. (1c). Therefore, for this type of model in particular, the search space is considerably larger than for ARIMAX. For example, for a GARCH model with 198 input variables [8], the number of combinations is approximately […] (see Appendix A for more details), while in the case of ARIMAX with the same number of lags, this number drops to […]. The combinations for the non-parametric models can be quantified in the same manner as for ARIMAX.

[8] For the mean equation: 24 lags of copper returns, 24 of moving averages, 24 of Major Currency Index, 24 of Inventories, 24 of Chinese Industrial Production Index, 1 of copper futures returns, 1 of dummy variable, and 1 of a constant. This gives a total of 123 variables for this equation. For the variance equation: 24 lags of copper returns variance, 24 of Major Currency Index, 24 of Inventories, 1 of variance in copper futures returns, 1 of dummy variable, and 1 of a constant, resulting in a total of 75 variables for this equation. (In the case of futures returns and variances, we considered only 1 lag: since the market can change from contango to backwardation after many consecutive months, it would not be consistent to use information from many months ago for the present situation.) Thus, the total number of variables to specify a GARCH model, in whatever form, is 198.

3.3 Non-parametric models: Artificial Neural Networks and Fuzzy Inference Systems

Fuzzy rule-based systems provide an approach with learning capability, and consist mainly of a set of first-order Sugeno-type fuzzy rules (Takagi & Sugeno, 1985). The premise of Fuzzy Inference Systems, in the context of forecasting, is that previous values from the time series can be used to estimate future values. What makes FIS an efficient tool, in comparison with other methods, is that it deals with the vagueness of predictors and
their non-linear relationships with dependent variables (Shahab, 2014). In methodological terms, these systems approximate non-linear functions uniformly with arbitrary precision, interpreting the values in the input vector and, based on certain sets of rules (of the if-then type), assigning values to an output vector. As Luna & Ballini (2011) indicate, a fuzzy rule-based system can also be seen as a set of local models, each representing a local mapping between independent and dependent variables. The activation level of each local model is determined by the degree of membership of the current input pattern to each sub-region. In more detail, every input pattern has a degree of membership to each region of the input partition space, and this is calculated through Gaussian membership functions. To specify the FIS structure, three types of input are mainly required: a vector of input variables, a vector of output variables (which are related to the input variables), and a certain radius.

As for Artificial Neural Networks, they are a powerful non-parametric tool used for signal filtering, pattern recognition, and interpolation, among many other applications. They can tolerate data with errors and identify non-linear associations between model parameters. In addition, one major advantage over econometric methods is that it is not necessary to specify the model's functional form, which means we do not need to make an assumption about the functional relationship between variables; however, it is necessary to incorporate the appropriate variables for a good estimate.

Traditional ANN modeling uses the feed-forward backpropagation approach first introduced by Rumelhart et al. (1986a) and Rumelhart et al. (1986b). The backpropagation
algorithm is a supervised learning algorithm that seeks to minimize the quadratic error by steepest gradient descent. It is based on the "backpropagation" of errors. In more detail, this type of ANN takes input values one record at a time and, given a set of weights and an activation function, computes the associated output values. In the forecasting context, just one output variable is present, which corresponds to the forecasted value. This value is then compared with the target output to compute the error for this record. The gradient descent approach and the set of user-defined parameters (such as the learning rate) are used to propagate this error back through the network, adjusting the weights so as to minimize the total accumulated error. The process stops, and the network is considered trained, when the error stabilizes at an acceptable level (Ghiassi et al., 2005). Therefore, to estimate the neural network model, it is necessary to define the input variables and the characteristic parameters of the network.

3.4 Genetic Algorithms (GAs)

GAs have been used by many researchers as a tool for search and optimization, and remain the most popular approach in the literature (Aguilar et al., 2015), being widely used in financial applications among other evolutionary algorithms. GAs are mainly used for solving optimization problems in which the central complexity is the large number of possible combinations that can provide a solution. In these cases, an important goal is often to quickly find a satisfactory, albeit sub-optimal, solution. Here, convergence to the global optimum is something we can only hope for, but which is often impossible to guarantee given the size of the search space and the time
available to the optimization algorithm, distancing it from practical purposes. As Lankhorst (1996) indicated, finding the optimal point(s) of a large search space is often an NP-hard problem, and requires an exponentially increasing number of search space points to be investigated as the problem is scaled up. In this sense, evolutionary algorithms tend to find good solutions to many problems fairly quickly. As seen in the literature review, the use of GAs has been widely used in terms of forecasts under an adaptive context. As Panapakidis & Dagoumas (2017) indicate, a crucial factor in the overall success of a forecasting task is proper selection of the number and types of input. In this sense, the GA will form a complementary part of the forecasting models involved. 3.5 Application of the GA in forecasting and its specification The reason we used GA as an optimization tool is that, as Aguilar et al. (2015) indicate, it has remained the most popular approach in the literature among other evolutionary algorithms for this type of application. We will use it to determine: the variables (alone or collectively) and lags that should be considered as inputs in the models to be used; the number of observations to estimate/train the models (window size); and the parameter values for model configuration. The above implies, therefore, finding the optimal specification for the model. To measure forecast performance, we will use HMSE, Theil-U, Q-Like, and Mean Squared Error (MSE) loss functions. The first two are more informative (Fuertes et al., 2009), whereas Q-Like and MSE are recognized as robust loss functions (Patton, 2011). Nonetheless, in the context of the optimization problem which will be solved by GA, we 17
have decided to base selection of the best model on HMSE, given that, as Fuertes et al. indicate, it is adjusted by heteroskedasticity:
where
corresponds to predicted variance for the month t, and n to the number of periods
to forecast. Thus, the fitness function will correspond to this loss function. In this particular regard, the methodology is similar to that used in Hung (2011a). The time horizon chosen to forecast, which is relevant in the calculation of Eq. (2), is 5 years, which means forecasting a total of 60 monthly observations from December 2011 through November 2016. Thus, n = 60.
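As a rough illustration of the loss functions named above, the following sketch computes HMSE, MSE and Q-Like for a vector of realized and forecast variances. The exact normalisations used in this study are not recoverable from the text here, so the forms below are assumptions: HMSE is taken as the mean squared deviation of the ratio realized/forecast from one, and Q-Like as the mean of log(forecast) + realized/forecast. The function names are illustrative only.

```python
import numpy as np


def hmse(realized, forecast):
    """Heteroskedasticity-adjusted MSE (assumed form): mean((realized/forecast - 1)^2)."""
    realized = np.asarray(realized, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean((realized / forecast - 1.0) ** 2))


def mse(realized, forecast):
    """Plain mean squared error between realized and forecast variances."""
    realized = np.asarray(realized, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean((realized - forecast) ** 2))


def qlike(realized, forecast):
    """Q-Like loss (assumed form): mean(log(forecast) + realized/forecast)."""
    realized = np.asarray(realized, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.log(forecast) + realized / forecast))
```

Because HMSE works on ratios, a given proportional miss is penalized equally in calm and in turbulent months, which is the sense in which the loss is adjusted for heteroskedasticity.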
Individual encoding in the GA is specified using strings, whose chromosomes are binary in parametric models (whether for the exogenous input variables, the model parameters, or the window size). Thus, for example, a value of 1 in one chromosome of a given slot of an individual means “use this specific time series” among the set of input variables, whereas a value of 0 means “omit this time series”. Therefore, each slot is associated with a single variable and lag, and each individual represents a unique specification for a certain model type (either ARIMAX or GARCH). In the case of non-parametric models, the binary coding is used only for the explanatory input variables (as before). The model parameters and the window size are used directly, for reasons of software efficiency. In other words, window sizes are integer, non-binary variables, as are the number of neurons and hidden layers in the case of models involving ANN. For those involving FIS, the value of the radius is a continuous variable⁹. In this way, the GA provides the models with adaptive capacity in explanatory variables, configuration parameters, and window sizes. Allowing both the model parameters and the window size to vary increases the number of possible combinations, regardless of the model type. This leads to even larger search spaces than those mentioned previously. To reduce these, we take a few considerations¹⁰ into account. To illustrate the effect of the above, this allows us to reduce the search space from more than order of
combinations to others in the
combinations in the GARCH model.
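The binary encoding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Slot` type, the `decode` helper and the variable names (`variance`, `oil_return`) are hypothetical, and only the rule "1 = use this variable/lag, 0 = omit it" comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One GA slot: a single explanatory variable at a single lag."""
    variable: str
    lag: int


def decode(chromosome, slots):
    """Return the slots switched on (bit == 1) by a binary chromosome."""
    if len(chromosome) != len(slots):
        raise ValueError("chromosome and slot list must have the same length")
    return [slot for bit, slot in zip(chromosome, slots) if bit == 1]


# Hypothetical slot layout: two variables, three lags each (six slots in total).
slots = [Slot(name, lag) for name in ("variance", "oil_return") for lag in (1, 2, 3)]

# This individual uses variance at lags 1 and 3, and oil_return at lag 3.
selected = decode([1, 0, 1, 0, 0, 1], slots)
```

Note that the selected lags need not be consecutive, which is what lets the GA pick specifications such as lags 1 and 3 without lag 2.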
3.6 Adaptive operation of the models

Fig. 2 graphically represents the adaptive process of all models used in this study. As can be seen, it is necessary to identify the complexity of the optimization problem, and thus decide whether or not to use Terasvirta’s test to simplify it. In relatively large problems (in this study, we arbitrarily consider those of the mixed integer type, and/or those with individuals with a number of slots equal to or greater than 80), we decided to use large population sizes, of the order of 900 individuals per generation. Otherwise, population sizes of 400 individuals per generation are used (a value based on various tests), referred to as the default value in the displayed schema. In each generation, each individual (or model specification) performs exactly 60 out-of-sample, one-step-ahead forecasts¹¹ for a fixed window size. This implies that each model does not change its specification or window size between different forecast periods; however, two models belonging to the same population could use different window sizes. The forecasts are then used to compute the fitness of each individual, i.e., the HMSE loss function of Eq. (2). The models involving ANN use a set of training and test¹² observations, whose sum is equal to the window size, for the period prior to each forecast; in the case of FIS models, a Sugeno structure is created and trained at every step of the rolling window process, and this structure is then evaluated using the forecast for the following period. In the transition from one generation to another, the genetic operators of selection, mutation, and crossover¹³ are applied. This iterative process continues until the average cumulative change in the fitness function over a specified number of generations is less than the desired tolerance; with this stop criterion, all individuals in the final population are, in a way, similar.

⁹ Thus, in the context of the optimization problem to be solved by the GA, integer problems will be solved in the parametric models, ANN, or combinations of these; meanwhile, in those considering only FIS models or parametric-FIS hybrids, mixed integer problems will be solved. This distinction is relevant to the size of the search space that the GA will confront (finite and infinite, respectively), which could lead to local convergence.

¹⁰ First, reducing the number of lags from 24 to 12, for parametric models only; second, and also only in parametric models, considering only 13 possible window sizes, based mainly on trial and error: 96, 102, 108, 114, 120, 126, 132, 138, 144, 150, 156, 162 and 168 months; third, discerning which variables are better suited to linear modeling than to non-linear modeling. For this last point, we opt to use Terasvirta’s test (Terasvirta et al., 1993), which assesses whether a group of time series are non-linearly related. This method is based on neural networks and their theory: when the null hypothesis is rejected, the model is said to suffer from neglected non-linearity, meaning that a non-linear model may provide better forecasts than those obtained with the linear model (Arratia, 2014, p. 137). Finally, given that local convergence may be encouraged if there is little diversity in the initial population, we opt to use large population sizes to guarantee diversity, as Lankhorst (1996) suggests.

¹¹ The manner of creating them follows a straightforward process, also known as “rolling window forecasts”. Given the full sample, we remove the M observations at the end of the sample that correspond to the forecast horizon considered (here, M = 60). The model is then estimated or trained on a window of size T (i.e., until T), and the variance value is forecast for period T + 1. Then, the observed value of the dependent variable is added to our sample, and the model is re-estimated. The next observation is then forecast, and so on, until all M observations are covered. These updates allow us to capture the variation dynamics over time that are inherent to the volatility behavior of this type of market.

¹² These are 70% and 30%, respectively, according to prior calibrations performed.

¹³ A value of 10% was used for selection; additionally, 90% for crossover and, hence, 10% for mutation. This is because, besides producing better results in experimental tests, crossover is the main search driver in GAs (Lankhorst, 1996). Mutation mainly serves to maintain a certain level of genetic diversity in the population, preventing premature convergence to sub-optimal values.
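The rolling window procedure described in the footnote above can be sketched as follows. This is a minimal sketch under stated assumptions: `fit_and_forecast` stands in for whichever model (ARIMAX, GARCH, ANN, FIS) is being evaluated, and the `window_mean` model used in the usage lines is a deliberately trivial placeholder, not one of the models in this study.

```python
import numpy as np


def rolling_window_forecasts(series, window, horizon, fit_and_forecast):
    """One-step-ahead forecasts for the last `horizon` points of `series`.

    Each forecast is produced from the `window` observations immediately
    preceding the target period, so the window size stays fixed while the
    window itself rolls forward one period at a time.
    """
    series = np.asarray(series, dtype=float)
    if window + horizon > len(series):
        raise ValueError("not enough observations for this window and horizon")
    start = len(series) - horizon  # index of the first out-of-sample period
    forecasts = np.empty(horizon)
    for i in range(horizon):
        t = start + i
        train = series[t - window:t]  # the `window` observations before period t
        forecasts[i] = fit_and_forecast(train)
    return forecasts


def window_mean(train):
    """Placeholder model (assumption): forecast the next value as the window mean."""
    return float(train.mean())


y = np.arange(10.0)  # toy series 0, 1, ..., 9
f = rolling_window_forecasts(y, window=3, horizon=4, fit_and_forecast=window_mean)
```

With `horizon=60` and a monthly variance series, the returned vector would correspond to the 60 forecasts that enter the fitness computation for one individual.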
[Flow chart. Boxes, in order: Start; account for the maximum number of slots and the type of optimization problem; is the problem relatively large? (branches: reduce the number of variables (slots) using Terasvirta’s test; select a population size more suitable than the default; select the default population size); encode individuals; generate initial population; make forecasts with each individual; compute loss function for each individual; end criterion satisfied? (if no: apply genetic operators and obtain a new generation of individuals; if yes: register the final population of individuals, identify the best-fitness individual with the corresponding fitness value, and decode individuals); End.]
Fig. 2. Flow chart. Operation of an adaptive model.
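The loop in Fig. 2 can be sketched as a small GA over binary chromosomes. This is an illustrative sketch, not the authors' implementation: the elite/crossover/mutation split is one possible reading of the 10% / 90% / 10% settings reported for the genetic operators, the stopping rule only approximates "average change in the best fitness over a number of generations below a tolerance", and `toy_loss` is a hypothetical stand-in for the HMSE of a decoded model specification.

```python
import random


def run_ga(fitness, n_slots, pop_size=40, elite_frac=0.10, crossover_frac=0.90,
           tol=1e-6, stall=10, max_gen=200, seed=0):
    """Minimize `fitness` over binary strings of length `n_slots` (n_slots >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_slots)] for _ in range(pop_size)]
    best_history = []
    for _ in range(max_gen):
        pop.sort(key=fitness)  # lower loss is better
        best_history.append(fitness(pop[0]))
        # Stop when the best loss changed by less than `tol` per generation,
        # on average, over the last `stall` generations.
        if len(best_history) > stall:
            avg_change = abs(best_history[-stall - 1] - best_history[-1]) / stall
            if avg_change < tol:
                break
        n_elite = max(1, int(elite_frac * pop_size))
        n_cross = int(crossover_frac * (pop_size - n_elite))
        parents = pop[:max(2, pop_size // 2)]  # fitter half acts as the mating pool
        next_pop = [ind[:] for ind in pop[:n_elite]]  # elites survive unchanged
        for _ in range(n_cross):  # single-point crossover
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_slots)
            next_pop.append(a[:cut] + b[cut:])
        while len(next_pop) < pop_size:  # remaining children by bit-flip mutation
            child = rng.choice(parents)[:]
            child[rng.randrange(n_slots)] ^= 1
            next_pop.append(child)
        pop = next_pop
    pop.sort(key=fitness)
    return pop[0], fitness(pop[0])


# Hypothetical stand-in for "HMSE of the model decoded from this chromosome".
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1]


def toy_loss(bits):
    return sum(b != t for b, t in zip(bits, TARGET))


best, loss = run_ga(toy_loss, n_slots=len(TARGET))
```

In the setting of this study, evaluating `fitness` for one individual would mean decoding the chromosome, running the 60 rolling window forecasts and computing the loss, which is why the population size and the number of generations dominate the computational cost.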
Finally, once the algorithm has stopped, the individuals of the final population are obtained and recorded, including both their chromosome compositions and fitness values. The formation and forecasting process of an adaptive hybrid model is illustrated in Fig. 3, and is built in two steps. Firstly, after selecting the type of parametric model and obtaining the final population of individuals adaptively, model specifications are ranked according to their HMSE value, from low to high. From this total population, the 12 best¹⁴ specifications are identified and selected, and the value of the optimal window size T is calculated, to then forecast the last C observations (Step 1). In other words, C is the total number of available observations (both in-sample and out-of-sample) minus the optimum number of observations that these individuals needed for estimation and forecasting. Therefore, the optimum window size resulting from the first step determines the dimensions of the data entry in the second.

¹⁴ This number was determined in an arbitrary manner, based on the aim that the fitness values should not differ considerably from the best obtained.
[Figure 3 diagram. Labels: Step 1; Step 2; time series of explanatory variables; best 12 specifications from adaptive parametric model; C×V; C×12; adaptive hybrid model; final forecasts; Z (60×1); time series of forecasts.]
Fig. 3. Formation and forecasting process of an adaptive hybrid model.

Next, the data entry in Step 2 consists of: 12 time series, each of which includes C forecasts performed by the best specifications of the parametric model in question, contained in the C×12 matrix; and the set of explanatory variables, consisting of variances lagged up to a maximum of 24 periods (this number is used because the problem size is reduced in this step), with the inherent parameters/window size of the non-parametric model configuration, represented within the C×V matrix. The first group aims to increase the probability that the non-parametric model recognizes a forecasting pattern, considering forecasts made by similar parametric models, and detecting that such forecasts must be surpassed. We seek to achieve this by complementing forecasts from the best model with a combination of similar models, in addition to explanatory variable data (second group). With the same purpose, this number of forecasts, i.e. C, was considered to form the hybrid model, since each of these observations is generated with the same sample size, corresponding to T. For reasons of scope only, all hybrid models are composed of parametric models in Step 1 and of non-parametric models in Step 2. Additionally, note that the model from Step 2 operates in the same way as the adaptive model explained in Fig. 2.

The non-adaptive time series models, artificial intelligence models, and hybrid models presented do not involve an optimization process to determine parameter configuration or window sizes, which instead were determined by selectively analyzing different configurations, with more than 5,000 different models (Appendix C). We added the GARCH (1,1) model, which was proposed due to its wide use for volatility estimation, as Hull (2015) indicates. Table 3 presents the set of proposed models, either non-adaptive or adaptive, with the aim of making comparisons between them. ANN and FIS models were selected due to their wide use in the literature and their capacity to recognize non-linear patterns, either to address the forecasting task themselves or to potentially improve the forecasts made by parametric models. The hybrid models are made up, in Step 1, of the non-hybrid models presented. For example, the hybrid A-ARIMAX-FIS model is formed by using, in its first part, the A-ARIMAX time series model. For the hybrid models, the column associated with variable configuration parameters refers to those of the non-parametric model used in Step 2. Additionally, since it is very common to consider few exogenous variable lags in model specifications, we used two lags in all non-adaptive models, as in the studies by Gargano & Timmermann (2014) and Kristjanpoller & Hernández (2017) (who use regressors of up to 2 and 1 lagged periods in a consecutive manner, respectively).
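The data entry of Step 2 described above (a C×12 block of Step 1 forecasts next to a C×V block of explanatory variables, with the GA switching individual columns on or off) can be sketched as follows. The helper name `build_step2_inputs`, the toy dimensions and the random data are assumptions for illustration only.

```python
import numpy as np


def build_step2_inputs(step1_forecasts, explanatory, mask):
    """Stack the C x 12 forecast block and the C x V explanatory block,
    then keep only the columns switched on by the binary mask."""
    X = np.hstack([step1_forecasts, explanatory])
    mask = np.asarray(mask, dtype=bool)
    if mask.size != X.shape[1]:
        raise ValueError("mask needs one bit per candidate input column")
    return X[:, mask]


rng = np.random.default_rng(0)
forecasts = rng.random((5, 12))   # toy C = 5 forecasts from 12 specifications
explanatory = rng.random((5, 3))  # toy V = 3 lagged explanatory variables

# Keep 8 of the 12 forecast series and 2 of the 3 explanatory columns.
mask = [1] * 8 + [0] * 4 + [1, 0, 1]
X = build_step2_inputs(forecasts, explanatory, mask)
```

The resulting matrix is what the non-parametric model (ANN or FIS) would be trained on in Step 2, one row per forecast period.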
Finally, to test the performance statistically, the Model Confidence Set (MCS) was applied (Hansen et al., 2003; 2011). It is constructed from an evaluation criterion and different models, distinguishing the best according to forecasting accuracy. Table 3. Proposed hybrid and non-hybrid adaptive models.
| Group | Model type (a) | Number of slots at GA | Variable configuration parameters [LB, UB] (b) | Window size [LB, UB] | Terasvirta’s test | Population size | Problem type |
|---|---|---|---|---|---|---|---|
| Non-hybrid | 11. A-ARIMAX | 64 | None | [96, 168] | No | 400 | Integer |
| Non-hybrid | 12. A-GARCH | 99 | p, q orders: [1, 4] (f) | [96, 168] | Yes | 900 | Integer |
| Non-hybrid | 13. A-GARCH (c) | 58 | p, q orders: [1, 4] | [96, 168] | No | 400 | Integer |
| Non-hybrid | 14. A-ANN | 77 | N: [3, 78], L: [2, 30] | [48, 124] | No | 400 | Integer |
| Non-hybrid | 15. A-FIS | 76 | Radius: ]0, 0.3] (e) | [40, 60] (e) | No | 900 | Mixed Integer |
| Hybrid | 16. A-GARCH-ANN | 88 | N: [3, 78], L: [2, 30] | [48, 124] | Yes (d) | 900 | Integer |
| Hybrid | 17. A-GARCH-FIS | 87 | Radius: ]0, 0.3] | [40, 60] | Yes (d) | 900 | Mixed Integer |
| Hybrid | 18. A-ARIMAX-FIS | 76 | Radius: ]0, 0.3] | [40, 106] | No | 900 | Mixed Integer |

a. “A” means “Adaptive”.
b. LB denotes Lower Bound. UB denotes Upper Bound. N denotes Neurons, L denotes Layers.
c. This is a self-explained A-GARCH, i.e., it uses only variables related to dependent variable lags (returns and variances).
d. The tests were only applied at Step 1 in these cases, because these are hybrid models.
e. In previous experiments, all models which involve FIS yield better results in these ranges of radius and window size.
f. In previous experiments, all models which involve GARCH yield better results in this range of p and q orders, mainly due to convergence issues.

4. RESULTS AND DISCUSSION

The best models in fit and forecast were the time series, artificial intelligence and hybrid models described in Appendix C. Table 4 presents the best configuration for each model. The best time series model for forecasting was GARCH without explanatory variables (HMSE = 0.3515). The artificial intelligence models produced less accurate forecasts than the best time series model; however, when they were combined into hybrid models, the HMSE values for the ANN-GARCH and ANN-ARIMA models were lower, indicating that fusing the models provides greater accuracy than each one separately.

Table 4. Results from the proposed hybrid and non-hybrid adaptive models.
| Group | Model | HMSE | THEIL-U | Q-LIKE | MSE | Optimal window size | Optimal configuration parameters (b) |
|---|---|---|---|---|---|---|---|
| Time series | 1. ARIMAX | 4.8312 | 0.3958 | -3.3690 | 1.52E-05 | 168 | AR: 2, MA: 2 |
| Time series | 2. GARCH | 0.3515 | 0.3684 | -4.9951 | 4.69E-06 | 144 | p=2, q=2 |
| Time series | 3. GARCH (a) | 4.0014 | 0.4365 | -3.9903 | 1.35E-05 | 150 | p=2, q=2 |
| Time series | 4. GARCH (1,1) | 0.3763 | 0.3669 | -4.9540 | 4.77E-06 | 168 | p=1, q=1 |
| Artificial Intelligence | 5. ANN | 117.4303 | 0.8582 | 15.2423 | 1.43E-03 | 48 | L: 2, N: 28 |
| Artificial Intelligence | 6. FIS | 1.9753 | 0.3862 | -4.5416 | 9.38E-06 | 52 | R: 0.25 |
| Hybrid | 7. ANN-ARIMA | 0.2770 | 0.5211 | -5.3339 | 6.89E-06 | 96 | L: 6, N: 53 |
| Hybrid | 8. ANN-GARCH | 0.2852 | 0.5557 | -5.4384 | 7.54E-06 | 96 | L: 2, N: 13 |
| Hybrid | 9. FIS-ARIMA | 0.3077 | 0.5928 | -5.4978 | 7.88E-06 | 60 | R: 0.25 |
| Hybrid | 10. FIS-GARCH | 0.9285 | 0.3359 | -4.7231 | 6.02E-06 | 52 | R: 0.25 |

The four columns HMSE, THEIL-U, Q-LIKE and MSE are the loss functions.
a. This is a self-explained A-GARCH, i.e., it uses only variables related to dependent variable lags (returns and variances).
b. N denotes Neurons. L denotes Layers. R denotes Radius.
Table 5 summarizes the performance of the proposed adaptive models, distinguishing the optimal configuration parameters for each case, as well as the ideal window size for forecasting in each one. We can clearly see that the proposed adaptive models are much more accurate than the traditional models in Table 4. For example, the best HMSE among the non-hybrid adaptive models is 41.2% lower than that of the best non-adaptive model, ANN-ARIMA. The model with the best monthly copper price volatility forecasts was a hybrid: the A-GARCH-FIS model. Its HMSE of 0.1304 comprises a 20.2% improvement in performance compared to A-GARCH, the best non-hybrid adaptive model. Thus, combining GARCH and FIS models provides a framework which extracts more useful information than each technique separately, and reflects improvement due to the inclusion of artificial intelligence. In fact, the HMSE for the best adaptive model is 52.9% lower than that of the best non-adaptive model.

Table 5. Results from the proposed hybrid and non-hybrid adaptive models.

| Group | Model type (a) | HMSE | THEIL-U | Q-LIKE | MSE | Optimal window size | Optimal configuration parameters (c) |
|---|---|---|---|---|---|---|---|
| Non-hybrid | 11. A-ARIMAX | 0.2730 | 0.3215 | -4.9661 | 4.53E-06 | 156 | - |
| Non-hybrid | 12. A-GARCH | 0.1652 | 0.4011 | -5.2806 | 5.09E-06 | 162 | p=0, q=2 |
| Non-hybrid | 13. A-GARCH (b) | 0.1634 | 0.4131 | -5.2466 | 5.30E-06 | 168 | p=0, q=2 |
| Non-hybrid | 14. A-ANN | 0.2633 | 0.5465 | -5.4155 | 7.23E-06 | 118 | N: 34, L: 22 |
| Non-hybrid | 15. A-FIS | 0.1731 | 0.3288 | -5.0464 | 4.65E-06 | 47 | R: 0.2820 |
| Hybrid | 16. A-GARCH-ANN | 0.2631 | 0.5470 | -5.4158 | 7.24E-06 | 118 | N: 34, L: 12 |
| Hybrid | 17. A-GARCH-FIS | 0.1304 | 0.2796 | -5.0863 | 3.34E-06 | 45 | R: 0.276 |
| Hybrid | 18. A-ARIMAX-FIS | 0.1648 | 0.2881 | -5.0687 | 3.51E-06 | 47 | R: 0.2825 |

The four columns HMSE, THEIL-U, Q-LIKE and MSE are the loss functions.
a. “A” means “Adaptive”.
b. This is a self-explained A-GARCH, i.e., it uses only variables related to dependent variable lags (returns and variances).
c. N denotes Neurons. L denotes Layers. R denotes Radius.
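The percentage improvements quoted around Table 5 can be checked with a short calculation. The relative-improvement formula below is the usual definition and is an assumption about how the percentages were obtained; the HMSE values are those reported in Tables 4 and 5 (A-GARCH-FIS: 0.1304; self-explained A-GARCH: 0.1634; ANN-ARIMA: 0.2770).

```latex
\[
\Delta = 1 - \frac{\mathrm{HMSE}_{\text{new}}}{\mathrm{HMSE}_{\text{ref}}}
\]
\[
\Delta_{\text{A-GARCH-FIS vs. A-GARCH}} = 1 - \frac{0.1304}{0.1634} \approx 0.202 \quad (20.2\%)
\]
\[
\Delta_{\text{A-GARCH-FIS vs. ANN-ARIMA}} = 1 - \frac{0.1304}{0.2770} \approx 0.529 \quad (52.9\%)
\]
```

Under this definition, the gap between the best adaptive and the best non-adaptive model is about 52.9%.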
Figure 4 plots the real value of VCU and the forecasts of the two best models, A-GARCH and A-GARCH-FIS. In general, we can see that A-GARCH-FIS has a better fit than A-GARCH. Also, in the months with low volatility, the forecasts are much more accurate than in the months with high volatility. There is one month in particular, January 2015, in which the models were unable to make a good prediction. We believe this is due to two principal factors whose data are absent from these models: the drop in oil prices, which generated an increase in perceived risk for mineral commodities; and China’s economic growth slowing more than the market predicted.
[Figure 4 plot. Y-axis: volatility (0 to 0.015); x-axis: months (Dec/11 to Nov/16); series: Real VCU, A-GARCH, A-GARCH-FIS.]
Fig. 4. Real VCU and the forecasts by A-GARCH and A-GARCH-FIS models.

To analyze result robustness, we applied the Model Confidence Set. We can see from Table 6 that A-GARCH-FIS is statistically superior to all the non-adaptive and non-hybrid adaptive models. In fact, the model is statistically different from the rest of the models, since the MCS p-value is 1.

Table 6. Model Confidence Set Results.
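| Group | Model | Model Confidence Set p-value |
|---|---|---|
| Time series | 1. ARIMAX | 0.0000 |
| Time series | 2. GARCH | 0.0670 |
| Time series | 3. GARCH (b) | 0.0020 |
| Time series | 4. GARCH (1,1) | 0.0670 |
| Artificial Intelligence | 5. ANN | 0.0000 |
| Artificial Intelligence | 6. FIS | 0.0440 |
| Hybrid | 7. ANN-ARIMA | 0.0160 |
| Hybrid | 8. ANN-GARCH | 0.0040 |
| Hybrid | 9. FIS-ARIMA | 0.0440 |
| Hybrid | 10. FIS-GARCH | 0.0040 |
| Non-Hybrid (adaptive) | 11. A-ARIMAX | 0.1420 |
| Non-Hybrid (adaptive) | 12. A-GARCH | 0.0670 |
| Non-Hybrid (adaptive) | 13. A-GARCH (b) | 0.0670 |
| Non-Hybrid (adaptive) | 14. A-ANN | 0.0040 |
| Non-Hybrid (adaptive) | 15. A-FIS | 0.2670 |
| Hybrid (adaptive) | 16. A-GARCH-ANN | 0.0070 |
| Hybrid (adaptive) | 17. A-GARCH-FIS | 1.0000 |
| Hybrid (adaptive) | 18. A-ARIMAX-FIS | 0.6870 |

Bold: the superior models for 70 according to Hansen et al. (2011).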
A second complementary analysis of robustness is to divide the forecast period into two sub-periods to check whether the best model still performs best in each of the subperiods. In this case, the forecast period is 60 months, therefore two 30-month sub-periods are generated. Table 7 summarizes the results of the best adaptive models for each subperiod. We can see that A-GARCH-FIS is the best model for either sub-period, demonstrating its superiority independent of the forecast period analyzed. Table 7. Results of adaptive models for sub-periods. HMSE
| Group | Model type (a) | HMSE, all period | HMSE, sub-period 1 | HMSE, sub-period 2 |
|---|---|---|---|---|
| Non-Hybrid | 11. A-ARIMAX | 0.2730 | 0.3098 | 0.2362 |
| Non-Hybrid | 12. A-GARCH | 0.1652 | 0.1786 | 0.1518 |
| Non-Hybrid | 13. A-GARCH (b) | 0.1634 | 0.1608 | 0.1660 |
| Non-Hybrid | 14. A-ANN | 0.2633 | 0.2175 | 0.3091 |
| Non-Hybrid | 15. A-FIS | 0.1731 | 0.1697 | 0.1765 |
| Hybrid | 16. A-GARCH-ANN | 0.2631 | 0.2190 | 0.3072 |
| Hybrid | 17. A-GARCH-FIS | 0.1304 | 0.1235 | 0.1373 |
| Hybrid | 18. A-ARIMAX-FIS | 0.1648 | 0.1773 | 0.1522 |

a. “A” means “Adaptive”.
b. This is a self-explained A-GARCH, i.e., it uses only variables related to dependent variable lags (returns and variances).
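The sub-period check reported in Table 7 amounts to splitting the forecast vector into equal consecutive blocks and recomputing the loss on each block. The sketch below assumes the ratio form of HMSE, mean((realized/forecast - 1)^2), which is one common definition and may differ from the exact form used in this study; the function names and toy numbers are illustrative only.

```python
import numpy as np


def hmse(realized, forecast):
    """Assumed HMSE form: mean squared deviation of realized/forecast from one."""
    realized = np.asarray(realized, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean((realized / forecast - 1.0) ** 2))


def subperiod_hmse(realized, forecast, n_sub=2):
    """HMSE on each of `n_sub` consecutive sub-periods of the forecast sample."""
    r_parts = np.array_split(np.asarray(realized, dtype=float), n_sub)
    f_parts = np.array_split(np.asarray(forecast, dtype=float), n_sub)
    return [hmse(r, f) for r, f in zip(r_parts, f_parts)]


# Toy example: perfect forecasts in the first half, 100% misses in the second.
losses = subperiod_hmse([1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0])
```

With 60 monthly forecasts and `n_sub=2`, each block would contain the 30 months of one sub-period.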
Regarding the importance of adaptive capacity, we can make several observations. Firstly, and for parametric models only, the value added by this element is reflected in the superior performance of less efficient, adaptive models (such as A-ARIMAX) in comparison to more efficient, non-adaptive models (such as GARCH (1,1), which addresses volatility more efficiently due to the presence of heteroskedasticity). In addition, when we consider adaptive capacity, it is interesting to observe that parametric-only adaptive models (A-ARIMAX) managed to outperform even competitive hybrid adaptive models (A-GARCH-ANN). In those cases, adaptive capacity is more powerful than the nature of the model. Secondly, in the A-GARCH-FIS model, not only were the optimal window size and p and q values interesting, but the clustering radius R = 0.276 is also worthy of note. This is a very specific value¹⁵ which is only obtainable if the model is estimated adaptively, making this feature crucial in the configuration parameter context. Thirdly, the explanatory variables which served as inputs do effectively possess predictive information, at least with hybrid models. Table 9 in the Appendix shows which variables, specifically, are optimally selected by the best model, A-GARCH-FIS. We can affirm that this specific selection of variables (not consecutive in lags) produces the most accurate forecasts, which is possibly due to the acquired adaptation feature. In step one, a total of 38 explanatory variables were selected of the 78 considered in the input. In step two, 45 variables were selected from a total of 85. For the latter, it should also be noted that 8 of the 12 time series of forecasts from the best models were selected, leading us to infer that the specifications behind these models contain predictive power that the first did not contemplate. As such, rather than adding explanatory variables to improve model forecast accuracy, it would be wise to perform an intelligent, selective treatment of these variables for model estimation. Thus, adaptive capacity depends on the explanatory variables, as well as on window sizes and configuration parameters. Fourthly, we affirm that this simultaneous adaptation, with those specific input types, increased forecast performance. In other words, if just one or two of the input types had been considered adaptively, performance would have been worse. Accordingly, if the window size had not been adaptive, we might not have been able to affirm that all FIS models require a smaller window size (between 45 and 50 observations) than the rest of the models for optimal specification. In this context, we believe that this same element explains why including ANN in the A-GARCH-ANN model does not yield significant improvements in performance with respect to the non-hybrid model, since there are insufficient data to train, validate, and forecast as we would like. This does not occur, for example, in the studies by Kristjanpoller & Hernandez (2017) or Hamid & Iqbal (2004), who used daily data¹⁶. Additionally, please note that adaptive models include a generalization of non-adaptive models, since the latter are particular specifications within the search space of the former. It is clear that those specifications do not produce the best forecasts; if they did, they would have been identified through the optimization process. The best adaptive model was 52.9% more accurate than the best non-adaptive model.

¹⁵ The exact value is R = 0.276092892071427.
¹⁶ By way of comparison, for example, Hamid & Iqbal (2004) had 2,531 observations to make forecasts through ANN.

On the other hand, the models involving FIS proved to be statistically superior to the rest of the models, according to MCS for a 90% confidence level. In this category, the greatest improvement occurred with A-ARIMAX-FIS, at around 40%, with an MCS p-value of 0.684, separating it from the rest of the models in terms of performance, although it was unable to surpass the A-GARCH-FIS model. As a complementary robustness test, the forecast period was divided into two sub-periods of 30 months each, to analyze the superiority of the model predictions. Finally, the efficiency of the GA in reaching satisfactory solutions was remarkable. In the winning model, the number of generations needed to achieve the optimum specification was 126 (comprising both steps), where each was made up of 900 individuals, estimating around 113,000 specifications in total. Although this number may seem high, it is much lower than the total number of specifications within the search space, which is more than
and, therefore, greatly reduces the problem in practical
terms.
5. CONCLUSIONS

The main objective of this paper was to perform monthly forecasts of copper price volatility, something which, despite important practical consequences for various participants of this specific market, has not been assessed previously, even in other mineral commodity spot markets. Although there are numerous studies addressing this objective for other time frequencies, the limited number of market participants limits the practical consequences. To fulfil the above objective, we proposed a set of models, consisting of parametric time series models, other non-parametric models used in the area of soft computing, and hybrid combinations of these models. All were estimated in both an adaptive and a non-adaptive context, to determine which variables and parameter specifications lead to the best forecast performance within a model, highlighting the inclusion or exclusion of adaptive capacity. The main conclusions from this study can be summarized as follows:
The A-GARCH-FIS model produced the best forecasting performance. It reflects an improvement of 20.2% with respect to its first step A-GARCH model, extracting more useful information than each technique separately.
Including adaptive capacity regarding input elements such as explanatory variables, window sizes, and inherent configuration parameters within the models proved crucial for improvement and forecasting robustness. Moreover, the adaptive models outperformed non-adaptive models considerably (HMSE improvement of 52.9%).
On the other hand, it can be affirmed that the models involving FIS proved to be statistically superior to the rest according to MCS, for a 90% confidence level.
The Fuzzy Inference Systems proved to be a great alternative in addressing forecasting problems adaptively in cases where the available database includes low numbers of observations. This study had that limitation, which was reflected in the poor performance of models that included ANN.
The use of artificial intelligence was decisive in achieving great results. Firstly, it improved forecasts made by other models, leading to increases of 20% and up to 40% in predictive power for the case of Adaptive Fuzzy Inference Systems models, surpassing other widely used models; and on the other hand, it formed a complementary tool to solve complex problems, such as the use of Terasvirta's test to detect non-linear relationships, and the use of GA to solve large problems.
The inclusion of artificial intelligence in order to make hybrid models does not always lead to better results. This can occur either under the adaptive or the non-adaptive environment.
Considering economic foundations reflected through adequate variable proxies has also been a crucial source. More importantly than including the variables, it is necessary to carefully sort them when it comes to model estimation, due to the way these models process information. However, it remains unclear why certain explanatory variables (and why some lags specifically) lead to better performances, or which of them are the most significant in this sense, due to the black box characteristic that the soft computing tools tend to possess.
Regarding future work, the adaptive capacity of the configuration parameters for certain soft computing models could be deepened, with the same goal of achieving the best forecasts possible. For example, optimally determining the initial weights in the ANN model; similarly, expanding the specification of the membership function beyond Gaussian default type in the FIS model. Additionally, in the same context of adaptive capacity, including other model specifications from the GARCH family, as well as expanding the possible assumptions for the conditional distribution of innovations such as t-student or generalized error. For such alternatives, the researcher should be careful with significantly increasing the search space size. Since this last issue reduces the efficiency of the evolutionary algorithm in the context of adaptive forecasting, another future study could focus on improving the performance of this algorithm either by testing other ones (like PSO) or improving GA.
REFERENCES Aguilar, R., Valenzuela, M. & Rodríguez, J. (2015). Genetic algorithms and Darwinian approaches in financial applications: A survey. Expert Systems with Applications, 42, 7684 – 7697. Amini, M. H., Karabasoglu, O., Ilic, D., D. Ilic, Boroojenu, G. & Iyengar, S.S. (2015). ARIMA-based demand forecasting method considering probabilistic model of electric vehicles’ parking lots. Proceedings of IEEE Power and Energy Society General Meeting, 1 – 5 Angelov, P. P., & Filev, D. P. (2004). An approach to online identification of Takagi– Sugeno fuzzy models. IEEE Transactions on Systems, Man and Cybernetics—part B, 34(1), 484 – 498. Arouri, M., Lahiani, A., Lévy, A. & Nguyen, D. (2012). Forecasting the conditional volatility of oil spot and futures prices with structural breaks and long memory models. Energy Economics, 34, 283 – 293. Arratia, A. (2014). Computational Finance. Atlantis Studies in Computational Finance and Financial Engineering, vol. 1. Paris: Atlantis Press. Atsalakis, G. & Valavanis, K. (2009a). Surveying stock market forecasting techniques – Part II: Soft computing methods. Expert Systems with Applications, 36, 5932 – 5941. Atsalakis, G. & Valavanis, H. (2009b). Forecasting stock market short-term trends using a neuro-fuzzy based methodology. Expert Systems with Applications, 36, 10696 – 10707. Babu, C. N. & Reddy, B. E. (2014). A moving-average filter based hybrid ARIMA-ANN model for forecasting time series data. Applied Soft Computing, 23, 27 – 38. Basher, S. & Sadorsky, P. (2016). Hedging emerging market stock prices with oil, gold, VIX, and bonds: A comparison between DCC, ADCC and GO-GARCH. Energy Economics, 54, 235 – 247. Bentes, S. (2015). Forecasting volatility in gold returns under the GARCH, IGARCH and FIGARCH frameworks: New evidence. Physica A, 438, 355 – 364. Bollerslev, T. (1986). Generalized autoregressive heteroskedasticity. Journal of Econometrics, 52, 307 – 327.
35
Bollerslev, T., Hood, B., Huss, J. & Pedersen, L. (2017). Risk Everywhere: Modeling and Managing Volatility. Boyacioglu, M. & Avci, D. (2010). An Adaptive Network-Based Fuzzy Inference System (ANFIS) for the prediction of stock market return: The case of the Istanbul Stock Exchange. Expert Systems with Applications, 37, 7908 – 7912. Brunetti, C. & Gilbert, C. (1995). Metals price volatility, 1972-95. Resources Policy, 21, 237 – 254. Brunetti, C., Büyüksahin, B. & Harrys, J. (2016). Speculators, prices and market volatility. Journal of Financial and Quantitative Analysis, 51, 1545 – 1574. Cashin, P. & McDermott, C. https://doi.org/10.2307/3872481
IMF
Econ
Rev
(2002)
49:
175.
Chang, P. & Fan, C. (2008). A hybrid system integrating a wavelet and TSK fuzzy rules for stock price forecasting. IEEE Transactions on Systems, Man and Cybernetics – Part C: Applications and Reviews, 38, No. 6. Chang, P. & Liu, C. (2008). A TSK type fuzzy rule based system for stock price prediction. Expert Systems with Applications, 34, 135 – 144. Chang, J. R., Wei, L. Y. & Cheng, C.H. (2011). A hybrid ANFIS model based on AR and volatility for TAIEX forecasting. Applied Soft Computing, 11, 1388 – 1395. Chiu, S. (1994). A cluster estimation method with extension to fuzzy model identification. In Proceedings of the IEEE international conference on fuzzy systems, vol. 2, 1240–1245. Comisión Chilena del Cobre. https://www.cochilco.cl . Accessed 20 November 2017. Davidson, R., Labys, W. & Lesourd, J. (1998). Wavelet analysis of commodity price behavior. Computational Economics, 11, 103 – 128. Degiannakis, S. & Potamia, A. (2017). Multiple-days-ahead value-at-risk and expected shortfall forecasting for stock indices, commodities and exchange rates: Inter-day versus intra-day data. International Review of Financial Analysis, 49, 176 – 190. De Gregorio, J., González, H. & Jaque, F. (2005). Fluctuaciones del dólar, precio del cobre y términos de intercambio. Central Bank of Chile Working Papers, No 310. Dooley, G. & Lenihan, H. (2005). An assessment of time series methods in metal price forecasting. Resources Policy, 30, 208 – 217.
Donaldson, R. & Kamstra, M. (1997). An artificial neural network-GARCH model for international stock return volatility. Journal of Empirical Finance, 4, 17–46.
Dornbusch, R. (1985). Policy and performance links between LDC debtors and industrial nations. Brookings Papers on Economic Activity, 2, 303–368.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica: Journal of the Econometric Society, 50, 987–1007.
Fiordaliso, A. (1998). A nonlinear forecasts combination method based on Takagi–Sugeno fuzzy systems. International Journal of Forecasting, 14, 367–379.
Fuertes, A., Izzeldin, M. & Kalotychou, E. (2009). On forecasting daily stock volatility: The role of intraday information and market conditions. International Journal of Forecasting, 25, 259–281.
Gargano, A. & Timmermann, A. (2014). Forecasting commodity price indexes using macroeconomic and financial predictors. International Journal of Forecasting, 30, 825–843.
Gilbert, C. (1989). The impact of exchange rates and developing country debt on commodity prices. The Economic Journal, 99, 773–784.
Ghiassi, M., Saidane, H. & Zimbra, D. (2005). A dynamic artificial neural network model for forecasting time series events. International Journal of Forecasting, 21, 341–362.
Goonatilake, S., Campbell, J. & Ahmad, N. (1995). Genetic-fuzzy systems for financial decision making. In Advances in fuzzy logic, Neural networks and genetic algorithms. Berlin: Springer, 202–223.
Guresen, E., Kayakutlu, G. & Daim, T. (2011). Using artificial neural network models in stock market index prediction. Expert Systems with Applications, 38, 10389–10397.
Hamid, S. & Iqbal, Z. (2004). Using neural networks for forecasting volatility of S&P 500 Index future prices. Journal of Business Research, 57, 1116–1125.
Hansen, P., Lunde, A. & Nason, J. (2003). Choosing the best volatility models: The model confidence set approach. Oxford Bulletin of Economics and Statistics, 65, 839–861.
Hansen, P., Lunde, A. & Nason, J. (2011). The model confidence set. Econometrica, 79, 453–497.
Hajizadeh, E., Seifi, A., Fazel, M. & Turksen, I. (2012). A hybrid modeling approach for forecasting the volatility of S&P 500 index return. Expert Systems with Applications, 39, 431–436.
Huang, C. & Tsai, C. (2009). A hybrid SOFM-SVR with a filter-based feature selection for stock market forecasting. Expert Systems with Applications, 36, 1529–1539.
Hull, J. (2015). Options, futures and other derivatives, 9th edn. New Jersey: Pearson Education.
Humphreys, D. (2011). Pricing and Trading in Metals and Minerals. In: P. Darling (Ed). SME mining engineering handbook, 3rd edn. United States: Society for Mining, Metallurgy and Exploration, Inc.
Humphreys, D. (2015). The remaking of the mining industry. New York: Palgrave Macmillan.
Hung, J. (2011a). Applying a combined fuzzy systems and GARCH model to adaptively forecast stock market volatility. Applied Soft Computing, 11, 3938–3945.
Hung, J. (2011b). Adaptive Fuzzy-GARCH model applied to forecasting the volatility of stock markets using particle swarm optimization. Information Sciences, 181, 4673–4683.
Jaramillo, P. & Selaive, J. (2006). Actividad especulativa y precio del cobre. Central Bank of Chile Working Papers, No. 384.
Kang, S. & Yoon, S. (2013). Modeling and forecasting the volatility of petroleum futures prices. Energy Economics, 36, 354–362.
Kristjanpoller, W. & Minutolo, M. (2016). Forecasting volatility of oil price using an Artificial Neural Network-GARCH model. Expert Systems with Applications, 65, 233–241.
Kristjanpoller, W. & Hernandez, E. (2017). Volatility of main metals forecasted by a hybrid ANN-GARCH model with regressors. Expert Systems with Applications, 84, 290–300.
Labys, W. (1999). Modeling energy and mineral markets. New York: Springer.
Labys, W., Kouassi, E. & Terraza, M. (2000). Short-term cycles in primary commodity prices. The Developing Economies, 38, 330–342.
Labys, W. (2016). Modeling and forecasting primary commodity prices. New York: Routledge.
Lagos, G. (2013). El incierto precio del cobre. http://www.gustavolagos.cl/publicaciones.html. Accessed on 14 June 2018.
Lankhorst, M. M. (1996). Genetic algorithms in data analysis. Groningen: s.n.
Luna, I. & Ballini, R. (2011). Top-down strategies based on adaptive fuzzy rule-based systems for daily time series forecasting. International Journal of Forecasting, 27, 708–724.
Marshall, I. & Silva, E. (2002). Determinación del precio del cobre: Un modelo basado en los fundamentos del mercado. In: Meller, P. (Ed). Dilemas y debates en torno al cobre. Santiago: Dolmen Ediciones.
Marra, S. (2015). Predicting volatility. Investment Research. Lazard Asset Management LLC, https://www.lazardassetmanagement.com/us/en_us/researchinsights/investment-research/22430-Predicting-Volatility. Accessed on 27 November 2017.
Ministry of Economic Affairs of Taiwan. Industrial Statistics. http://dmz9.moea.gov.tw/gmweb/investigate/InvestigateDB.aspx?lang=E. Accessed 27 November 2017.
Panapakidis, I. & Dagoumas, A. (2017). Day-ahead natural gas demand forecasting based on the combination of wavelet transform and ANFIS/genetic algorithm/neural network model. Energy, 118, 231–245.
Patton, A. (2011). Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics, 160, 246–256.
Pindyck, R. & Rubinfeld, D. (2017). Microeconomics, 8th edn. London: Pearson Education.
Pradeepkumar, D. & Ravi, V. (2017). Forecasting financial time series volatility using Particle Swarm Optimization trained Quantile Regression Neural Network. Applied Soft Computing, 58, 35–52.
Radetzki, M. & Wårell, L. (2017). A handbook of primary commodities in the global economy, second expanded edition. Cambridge University Press, UK.
Reinhart, C. (1991). IMF Econ Rev, 38: 506. doi.org/10.2307/3867156
Ridler, D. & Yandle, C. (1972). A Simplified Method for Analyzing the Effects of Exchange Rate Changes on Exports of a Primary Commodity. Staff Papers (International Monetary Fund), 19(3), 559–578. doi:10.2307/3866417
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. (1986a). Learning internal representation by back-propagating errors. In: D. E. Rumelhart & J. L. McClelland (Eds). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. Boston: MIT Press.
Rumelhart, D., Hinton, G. & Williams, R. (1986b). Learning representations by backpropagating errors. Nature, 323, 533–536.
Sánchez, F., de Cos, F., Suárez, A., Krzemien, A. & Riesgo, P. (2015). Forecasting the COMEX copper spot price by means of neural networks and ARIMA models. Resources Policy, 45, 37–43.
Schmitt, L. (2001). Theory of genetic algorithms. Theoretical Computer Science, 251, 1–61.
Shahab, A. (2014). Data-driven modeling: Using MATLAB in Water Resources and Environmental Engineering. New York: Springer.
Sheta, A. (2006). Software effort estimation and stock market prediction using Takagi-Sugeno fuzzy models. IEEE International Conference on Fuzzy Systems.
Singh, N. & Mohanty, S. (2017). Short term price forecasting using adaptive generalized neuron model. In: S. Bhatia et al. (Eds). Advances in Computer and Computational Sciences. Advances in Intelligent Systems and Computing, vol. 553. Singapore: Springer.
Takagi, T. & Sugeno, M. (1985). Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man and Cybernetics, SMC-15, 116–132.
Tan, L., Wang, S. & Wang, K. (2017). A new adaptive network-based fuzzy inference system with adaptive adjustment rules for stock market volatility forecasting. Information Processing Letters, 127, 32–36.
Tay, F. & Cao, L. (2002). Modified support vector machines in financial time series forecasting. Neurocomputing, 48, 847–861.
Teraesvirta, T., Lin, C. & Granger, C. (1993). Power of the neural network linearity test. Journal of Time Series Analysis, 14, 209–220.
Tilton, J., Humphreys, D. & Radetzki, M. (2011). Investor demand and spot commodity prices. Resources Policy, 36, 187–195.
Tilton, J. (2014). Cyclical and secular determinants of productivity in the copper, aluminum, iron ore, and coal industries. Mineral Economics, 27, 1–19.
Tilton, J. & Guzmán, J. (2016). Mineral Economics and Policy. New York: Resources for the Future Press.
Triantafyllou, A., Dotsis, G. & Sarris, A. (2015). Volatility forecasting and time-varying variance risk premiums in grains commodity markets. Journal of Agricultural Economics, 66, 329–357.
Truck, S. & Liang, K. (2012). Modelling and forecasting volatility in the gold market. International Journal of Banking and Finance, vol. 9.
Tully, E. & Lucey, B. (2007). A power GARCH examination of the gold market. Research in International Business and Finance, 21, 316–325.
U.S. Federal Reserve Bank of St. Louis. Economic Data. https://fred.stlouisfed.org/series. Accessed on 27 November 2017.
Wagner, N., Michalewicz, Z., Khouja, M. & McGregor, R. (2007). Time Series Forecasting for Dynamic Environments: The DyFor Genetic Program Model. IEEE Transactions on Evolutionary Computation, 11, No. 4.
Wei, L. (2013). A GA-weighted ANFIS model based on multiple stock market volatility causality for TAIEX forecasting. Applied Soft Computing, 13, 911–920.
APPENDIX A. Number of different specifications for GARCH model with explanatory variables

Let $V$ be the set of possible variables that can be used as inputs in a GARCH model with exogenous variables, in both its mean and variance equations. We can denote by $V_a \subseteq V$ the set of variables that serve as input only for Eq. (1a), and by $V_c \subseteq V$ the set of input variables for Eq. (1c). Then,

$$V = V_a \cup V_c.$$

As explained in Section 3.2, the number of elements (variables) in each of these subsets, i.e., the cardinality of each, is given by $|V_a| = n_a$ and $|V_c| = n_c$. In turn, the number of combinations without repetition of exactly $i$ elements, with $0 \le i \le n$, where $n$ is the total number of elements in the set, is given by $\binom{n}{i} = \frac{n!}{i!\,(n-i)!}$. Then, if we consider all possible combinations for every $i$ up to $n$ in the sets $V_a$ and $V_c$, which we will call $C_a$ and $C_c$, respectively, we obtain:

$$C_a = \sum_{i=0}^{n_a} \binom{n_a}{i}, \qquad C_c = \sum_{i=0}^{n_c} \binom{n_c}{i},$$

which are the numbers of ways in which Eq. (1a) and Eq. (1c), respectively, can be specified independently of each other. Combining the specifications of both equations, and considering only exogenous variables, each model can take $C_a \cdot C_c$ different specifications. Finally, given that the GARCH model can adjust its orders $p$ and $q$ only up to a maximum of 4 (GARCH(1,1), GARCH(3,1), …, GARCH(4,4)), which creates 24 different forms, the total number of possible forms is

$$C_a \cdot C_c \cdot (24) = 9.64 \times 10^{60}.$$
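The counting argument of Appendix A can be checked numerically. The sketch below is only an illustration under stated assumptions: it counts the specifications of each equation as the sum of binomial coefficients over all subset sizes (including the empty subset), takes the factor of 24 order combinations as stated in the text, and uses hypothetical cardinalities of 100 and 98 candidate variables. These two values do not come from the paper; they were chosen only because, under this counting, any pair summing to 198 reproduces the reported order of magnitude. The function names are likewise hypothetical.

```python
from math import comb


def n_specifications(n: int) -> int:
    """Count the subsets of n candidate variables as sum_i C(n, i), i = 0..n."""
    return sum(comb(n, i) for i in range(n + 1))


def total_forms(n_mean: int, n_var: int, order_forms: int = 24) -> int:
    """Combine mean-equation and variance-equation choices with the (p, q) forms."""
    return n_specifications(n_mean) * n_specifications(n_var) * order_forms


# Hypothetical cardinalities (assumption, not taken from the paper).
total = total_forms(100, 98)
print(f"{total:.2e}")  # 9.64e+60
```

Because the sum of binomial coefficients over all subset sizes equals 2^n, the total depends only on the sum of the two cardinalities, which is why the particular split between the mean and variance sets does not matter in this sketch.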
APPENDIX B. Time series of return and variance of the exogenous variables.
[Figure: time series of returns (left axis, "Return") and volatility (right axis, "Volatility") of the exogenous variables, 1990–2017. Panels: RINV and VINV; RMCI and VMCI; RFUT and VFUT; RIPICH.]
APPENDIX C.

Table 8 Configuration of Time series, Artificial Intelligence and Hybrid models.

Type                     No.  Model        Variable configuration parameters [LB, UB]^b   Window size [LB, UB]
Time series              1    ARIMAX       None                                            [96, 168]
Time series              2    GARCH        p, q: [1, 4]                                    [96, 168]
Time series              3    GARCH^a      p, q: [1, 4]                                    [96, 168]
Time series              4    GARCH(1,1)   None                                            [96, 168]
Artificial Intelligence  5    ANN          N: [3, 78], L: [2, 30]                          [48, 124]
Artificial Intelligence  6    FIS          Radius: ]0, 0.3]^e                              [40, 60]
Hybrid                   7    ANN-ARIMA^d  N: [3, 78], L: [2, 30]                          [48, 124]
Hybrid                   8    ANN-GARCH^e  N: [3, 78], L: [2, 30]                          [48, 124]
Hybrid                   9    FIS-ARIMA^d  Radius: ]0, 0.3]^e                              [40, 60]
Hybrid                   10   FIS-GARCH^e  Radius: ]0, 0.3]^e                              [40, 60]

a. This is a self-explained A-GARCH, i.e., it uses only variables related to lags of the dependent variable (returns and variances).
b. LB denotes Lower Bound. UB denotes Upper Bound.
c. N denotes Neurons, L denotes Layers.
d. In previous experiments, all models which involve FIS yield better results in these ranges of radius and window size.
e. The best forecasting of the ARIMA models is used as input in the ANN.
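The bounds in Table 8 can be read as a search space from which candidate configurations are drawn. The following sketch shows one possible way to encode those bounds and check a candidate against them. It is an illustration only: the names `SEARCH_SPACE`, `OPEN_LOWER` and `is_valid` are hypothetical and not part of the paper, only a subset of the ten models is listed, and the radius is treated as the half-open interval ]0, 0.3] following the table's notation.

```python
# Hypothetical encoding of the Table 8 bounds; all names are illustrative.
ANN_BOUNDS = {"window": (48, 124), "neurons": (3, 78), "layers": (2, 30)}
FIS_BOUNDS = {"window": (40, 60), "radius": (0.0, 0.3)}

SEARCH_SPACE = {
    "ARIMAX": {"window": (96, 168)},
    "GARCH": {"window": (96, 168), "p": (1, 4), "q": (1, 4)},
    "GARCH(1,1)": {"window": (96, 168)},
    "ANN": ANN_BOUNDS,
    "FIS": FIS_BOUNDS,
    "ANN-GARCH": ANN_BOUNDS,  # hybrids reuse the ANN or FIS bounds
    "FIS-GARCH": FIS_BOUNDS,
}

# Parameters whose lower bound is excluded, e.g. radius in ]0, 0.3].
OPEN_LOWER = {"radius"}


def is_valid(model: str, config: dict) -> bool:
    """Return True if config names exactly the model's parameters, all within bounds."""
    bounds = SEARCH_SPACE[model]
    if set(config) != set(bounds):
        return False
    for name, value in config.items():
        low, high = bounds[name]
        above_low = value > low if name in OPEN_LOWER else value >= low
        if not (above_low and value <= high):
            return False
    return True


print(is_valid("GARCH", {"window": 120, "p": 1, "q": 4}))  # True
print(is_valid("FIS", {"window": 50, "radius": 0.0}))      # False
```

Keeping the bounds in a plain dictionary makes it straightforward for an optimizer to sample or mutate candidates and reject those that fall outside the table's ranges.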
APPENDIX D. Optimal explanatory variables resulting from the best model
Some variables are absent from Step 1 either because they were excluded on the basis of Terasvirta's test or because the adaptive model showed that they add no value for forecasting purposes. In Step 2, variables are absent only for the second reason because, as explained in the methodology, Terasvirta's test is not used in this step for the non-parametric model.
Table 9 Optimal explanatory variables resulting from the best model

STEP 1
Return variables
  1  AR^a: 1, 8, 10, 12
  2  MA^b: 1, 2, 8, 9, 10, 12
  3  Lags RMCI: 4, 7, 8, 10, 12
  4  Lags RINV: 2, 4, 5, 8
  5  Lags RIPICH: 2, 3, 5, 7, 9, 12
Variance variables
  1  Constant
  2  Lags VCU: 4, 5
  3  Lags VMCI: 4, 5, 6, 8, 10, 11, 12
  4  Lags VINV: 5, 9

STEP 2
Model forecasts
  1  Y^c:
Variance variables
  1  Lags VCU: 1, 2, 4, 6, 8, 10, 11, 13, 15, 16, 18, 19, 20
  2  Lags VMCI: 2, 3, 5, 6, 7, 9, 11, 14, 16, 18, 19, 21, 23
  3  Lags VINV: 3, 5, 7, 8, 11, 12, 15, 16, 17, 19
  4  Lags VFUT: 1

a. Autoregressive terms.
b. Moving Average terms.
c. Y refers to the forecasts provided by the best models from Step 1. The value of the subindex represents the ranking order.
RMCI: Major Currency Index returns, RINV: Inventory returns, RIPICH: China's Industrial Production Index returns, VCU: Copper spot return variance, VMCI: Major Currency Index return variance, VINV: Inventories return variance, VFUT: Copper Futures return variance.
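To illustrate how a lag selection such as "Lags VCU: 4, 5" translates into model inputs, the sketch below builds the lagged regressors for each time step. It is a minimal example under stated assumptions: the function name `lagged_rows` and the toy series are hypothetical, and only the lag sets are copied from the Step 1 variance variables in Table 9.

```python
def lagged_rows(series_by_name: dict, lags_by_name: dict) -> list:
    """Return (t, features) pairs where features holds series[t - lag] per selected lag."""
    max_lag = max(lag for lags in lags_by_name.values() for lag in lags)
    n_obs = min(len(series_by_name[name]) for name in lags_by_name)
    rows = []
    # The first usable time index is max_lag, so that every lagged value exists.
    for t in range(max_lag, n_obs):
        features = {
            f"{name}_lag{lag}": series_by_name[name][t - lag]
            for name, lags in lags_by_name.items()
            for lag in lags
        }
        rows.append((t, features))
    return rows


# Lag sets taken from Table 9, Step 1, variance variables.
STEP1_VARIANCE_LAGS = {
    "VCU": [4, 5],
    "VMCI": [4, 5, 6, 8, 10, 11, 12],
    "VINV": [5, 9],
}

# Toy data (assumption): each series is simply 0.0, 1.0, ..., 19.0.
toy = {name: [float(i) for i in range(20)] for name in STEP1_VARIANCE_LAGS}
rows = lagged_rows(toy, STEP1_VARIANCE_LAGS)
print(len(rows), rows[0][1]["VCU_lag4"])  # 8 8.0
```

With a maximum lag of 12 and 20 observations, the first usable time index is 12, which leaves 8 rows of 11 lagged features each.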
Nomenclature
ADF Augmented Dickey-Fuller Test
ANFIS Adaptive Neuro Fuzzy Inference Systems
ANN Artificial Neural Networks
ARCH AutoRegressive Conditional Heteroskedasticity
ARIMA AutoRegressive Integrated Moving Average
ARIMAX ARIMA with exogenous variables
COCHILCO Comisión Chilena del Cobre
COMEX Commodity Exchange Inc.
FIS Fuzzy Inference Systems
GA Genetic Algorithm
GARCH Generalized AutoRegressive Conditional Heteroskedasticity
HMSE Heteroskedasticity-adjusted Mean Squared Error
LME London Metal Exchange
MCS Model Confidence Set
MSE Mean Squared Error
RCU Copper spot price returns
RFUT Copper futures returns
RINV Inventory returns
RIPICH China's Industrial Production Index returns
RMCI Major Currency Index returns
VCU Copper spot return variance
VFUT Copper Futures return variance
VINV Inventories return variance
VMCI Major Currency Index return variance