Hybridization of intelligent techniques and ARIMA models for time series prediction

Fuzzy Sets and Systems 159 (2008) 821 – 845 www.elsevier.com/locate/fss

O. Valenzuela^a, I. Rojas^b,*, F. Rojas^b, H. Pomares^b, L.J. Herrera^b, A. Guillen^b, L. Marquez^a, M. Pasadas^a
a Department of Applied Mathematics, University of Granada, Spain
b Department of Computer Architecture and Computer Technology, University of Granada, Spain

Received 17 February 2006; received in revised form 31 October 2007; accepted 4 November 2007 Available online 17 November 2007

Abstract

Traditionally, the autoregressive moving average (ARMA) model has been one of the most widely used linear models in time series prediction. Recent research activities in forecasting with artificial neural networks (ANNs) suggest that ANNs can be a promising alternative to the traditional ARMA structure. These linear models and ANNs are often compared with mixed conclusions in terms of the superiority in forecasting performance. In this paper we propose a hybridization of intelligent techniques such as ANNs, fuzzy systems and evolutionary algorithms, so that the final hybrid ARIMA–ANN model could outperform the prediction accuracy of those models when used separately. More specifically, we propose the use of fuzzy rules to elicit the order of the ARMA or ARIMA model, without the intervention of a human expert, and the use of a hybrid ARIMA–ANN model that combines the advantages of the easy-to-use and relatively easy-to-tune ARIMA models, and the computational power of ANNs.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Time series; ARIMA models; Fuzzy systems; Neural networks; Hybrid system

1. Introduction

When modelling linear and stationary time series, the researcher frequently chooses the class of autoregressive moving average (ARMA) models because of its high performance and robustness [6]. The ARMA(p, q) model for a time series can be expressed as

$$x_t = \sum_{i=1}^{p} \phi_i x_{t-i} - \sum_{j=1}^{q} \theta_j e_{t-j} + e_t, \qquad (1)$$

where $e_t$ is a normal white noise process with zero mean and variance $\sigma^2$, and $t = 1, \ldots, T$, with $T$ being the number of data points in the time series. It is assumed that the autoregressive and moving average parameters satisfy the conditions for stationarity and invertibility, respectively. Since the moving average component of Eq. (1) is assumed to be invertible,

* Corresponding author. E-mail address: [email protected] (I. Rojas). doi:10.1016/j.fss.2007.11.003

Eq. (1) can be rewritten as an infinite-order autoregression of the form

$$x_t = \sum_{i=1}^{\infty} \pi_i x_{t-i} + e_t. \qquad (2)$$

In order to implement this equation, the autoregression is truncated to a finite order R, with R < T; ignoring the error induced by the truncation gives

$$x_t = \sum_{i=1}^{R} \pi_i x_{t-i} + e_t. \qquad (3)$$

Eq. (3) can be estimated by ordinary least squares (OLS) to obtain estimates $\hat{\pi}_i$ of $\pi_i$ and $\hat{e}_t$ of $e_t$, where

$$\hat{e}_t = x_t - \sum_{i=1}^{R} \hat{\pi}_i x_{t-i}. \qquad (4)$$

These estimates $\hat{e}_t$ of the innovation series $e_t$ can be substituted into Eq. (1), and therefore the time series can be modelled as

$$x_t = \sum_{i=1}^{p} \phi_i x_{t-i} - \sum_{j=1}^{q} \theta_j \hat{e}_{t-j} + \varepsilon_t, \qquad (5)$$

where $\varepsilon_t = e_t - \sum_{j=1}^{q} \theta_j (e_{t-j} - \hat{e}_{t-j})$.

The main advantage of ARMA forecasting is that it only requires data of the time series in question. First, this feature is advantageous if we are forecasting a large number of time series. Second, it avoids a problem that sometimes occurs with multivariate models: for example, consider a model including wages, prices and money. It is possible that a consistent money series is only available for a shorter period of time than the other two series, restricting the time period over which the model can be estimated. Third, with multivariate models, timeliness of data can be a problem. If one constructs a large structural model containing variables which are only published with a long lag, such as wage data, then forecasts using this model are conditional forecasts based on forecasts of the unavailable observations, adding an additional source of forecast uncertainty.

Different order selection methods based on validity criteria, on information-theoretic approaches such as Akaike's information criterion (AIC), or on other model selection techniques like minimum description length (MDL) have been proposed in [24,29,35,55]. The selection of a particular ARMA model, however, is neither easy nor without implications for the goal of the analysis. In fact, in [20] it is noted that for time-varying model identification the optimal model structure is not easy to determine, so that it is difficult to apply the AIC or MDL criteria, which often leads to failure in selecting a suitable model order. Therefore, different approaches based on intelligent paradigms, such as neural networks, genetic algorithms or fuzzy systems, have been proposed to improve the accuracy of order selection of ARIMA models [63,74,25,20,39,42,41,31,13,8,14]. It is important to note that, nowadays, the most widely used commercial tools for time series forecasting (Statgraphics, SPSS, etc.)
require the intervention of a human expert for the definition of the ARMA model or the specification of the types of model to be fitted to the time series (X-12-ARIMA). Some other problems which arise when forecasting with ARMA models are:

• The result of the Box–Jenkins procedure depends greatly on the competence and experience of the expert researchers and is affected by a strong path-dependence [66]. Unfortunately, one frequently observes that different models have similar estimated correlation patterns, and the choice among competing models can be quite arbitrary. Some concern derives also from comparative studies in which experts, asked to identify a number of time series, frequently reach different conclusions [6,10,56].
• A further objection to the Box–Jenkins procedure for model selection is related to the time required to develop the identification model, which can sometimes be excessively high.
• It is not embedded within any underlying theoretical model or structural relationships. Furthermore, it is not possible to run policy simulations with ARMA models, unlike with structural models.
• ARMA models are linear models but, in the real world, time series are rarely pure linear combinations.
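The two-stage procedure of Eqs. (3)-(5) (fit a long autoregression by OLS, recover the residual proxies for the innovations, then regress on lagged values and lagged residuals) can be sketched as follows. This is a minimal numpy illustration in the spirit of the Hannan-Rissanen method; the simulated series, the truncation order R and the chosen orders are our own, not taken from the paper:

```python
import numpy as np

def two_stage_arma(x, p, q, R=10):
    """Two-stage ARMA(p, q) estimation in the spirit of Eqs. (3)-(5):
    fit a long AR(R) by OLS (Eq. (3)), recover residual proxies for the
    innovations (Eq. (4)), then regress x_t on its own lags and the lagged
    residuals (Eq. (5)).  The MA part is estimated in the regression
    convention x_t = sum(phi_i x_{t-i}) + sum(b_j e_{t-j}) + err,
    so b_j = -theta_j in the sign convention of Eq. (1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Stage 1: AR(R) by ordinary least squares, Eq. (3).
    X = np.column_stack([x[R - i:n - i] for i in range(1, R + 1)])
    pi_hat, *_ = np.linalg.lstsq(X, x[R:], rcond=None)
    # Residual proxies for the innovations, Eq. (4) (first R set to zero).
    e_hat = np.concatenate([np.zeros(R), x[R:] - X @ pi_hat])
    # Stage 2: regress x_t on x_{t-1..t-p} and e_hat_{t-1..t-q}, Eq. (5).
    start = R + max(p, q)
    Z = np.column_stack(
        [x[start - i:n - i] for i in range(1, p + 1)]
        + [e_hat[start - j:n - j] for j in range(1, q + 1)]
    )
    coef, *_ = np.linalg.lstsq(Z, x[start:], rcond=None)
    return coef[:p], coef[p:]

# Simulated ARMA(1,1): x_t = 0.6 x_{t-1} + e_t + 0.3 e_{t-1}
rng = np.random.default_rng(0)
e = rng.standard_normal(2000)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.6 * x[t - 1] + e[t] + 0.3 * e[t - 1]
ar, ma = two_stage_arma(x, p=1, q=1)
```

With 2000 observations the recovered coefficients should land close to the simulated ones (about 0.6 for the AR term and 0.3 for the MA term in the regression convention).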


The existence of these disadvantages, together with the advance of artificial intelligence techniques, has stimulated the development of new alternative solutions for time series forecasting. Two of the forecasting techniques that allow for the detection and modelling of nonlinear data are rule induction [46,27,21,44] and neural networks [57,47,51]. Rule induction identifies patterns in the data and expresses them as rules. Expert systems are an example of the use of this technique. On the other hand, neural networks represent an important class of nonlinear prediction models that has generated considerable interest in the forecasting community in the past decade. The main advantage of this modelling paradigm lies in the universal approximation property, i.e., neural networks can approximate any continuous function defined on a compact set with arbitrary precision. The model parameters are iteratively adjusted and optimized through network learning of historical patterns. One of the best known neural network models is the multi-layer feed-forward neural network, called the multi-layer perceptron (MLP), consisting of an input layer, one or several hidden layers and an output layer.

Nevertheless, it is worth mentioning that ARMA models are still the benchmark against which the effectiveness of newly proposed methodologies is tested [63,65,15,1,25]. In fact, it is important to note the comparative study of back-propagation networks versus ARMA models on selected time series data from the well-known M-competition [36] performed by Tang et al. [57]. It was concluded that neural networks not only could provide better long-term forecasting but also did a better job than ARMA models when given short series of input data. Also, the same authors, in Tang and Fishwick [58], conducted a more comprehensive study of the back-propagation neural network's ability to model time series.
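The MLP forecasting setup described above (lagged values of the series fed into a feed-forward network that predicts the next value) can be sketched as follows. This is a minimal illustration using scikit-learn; the synthetic series, lag depth and network size are our own choices, not taken from the paper:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def lag_matrix(series, lags):
    """Build (X, y) where each row of X holds `lags` consecutive values
    and y holds the value that immediately follows them."""
    X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
    return X, series[lags:]

# Synthetic noisy sine wave as the series to forecast.
rng = np.random.default_rng(1)
t = np.arange(400)
series = np.sin(0.1 * t) + 0.05 * rng.standard_normal(400)

X, y = lag_matrix(series, lags=8)
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
model.fit(X[:300], y[:300])    # train on the first part of the series
pred = model.predict(X[300:])  # one-step-ahead forecasts on the rest
rmse = float(np.sqrt(np.mean((pred - y[300:]) ** 2)))
```

The same lag-embedding step is what turns a univariate forecasting problem into the supervised regression problem that any MLP library expects.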
In [58], 14 time series data sets from the well-known M-competition plus two additional airline and sales data sets were used, with the conclusion that ANNs perform better than Box–Jenkins models for more irregular series and for multiple-period-ahead forecasting. In [22], 104 time series data sets from the same M-competition were used to compare the performance of ANNs with that of six other traditional statistical models, including the Box–Jenkins ARMA model. They found that ANNs performed considerably better than the six traditional methods for monthly and quarterly time series, while the Box–Jenkins model was comparable to neural networks for annual time series. When linearity or nonlinearity was considered, they concluded that the neural-network model was significantly better than the reference average. However, it was unclear how the neural-network model behaved compared with the Box–Jenkins model for these linear time series. This again motivated us to carry out the proposed study. Although the application of neural network modelling to Box–Jenkins models encompasses two sub-areas, namely model identification (where a statistical feature extractor performs pattern classification) and forecasting (where the network acts as a function approximator), the scope of this paper covers both areas, using a fuzzy expert system for automatic model identification.

The organization of the rest of the paper is the following: the next section explains in detail the fuzzy expert system, which is optimized by an evolutionary algorithm, to obtain in an autonomous way the structure of the ARIMA time series. Section 3 presents the hybrid methodology that combines the linear time series model obtained in Section 2 and an ANN model, represented by an MLP, for time series forecasting. Section 4 presents the simulation results and Section 5 the main conclusions of the presented methodology.

2.
Fuzzy expert system: the structure of the ARIMA model

In Section 1, we have reported several justifications for an automatic ARMA model building procedure. They can be summarized by saying that using the Box–Jenkins method on a large scale requires both expertise and time. Aiming to make the method available to people without that expertise, several software vendors have implemented automated time series forecasting methods, e.g. Mandrake [3], and the list in Tashman [59]. However, these approaches were not very successful and nowadays most popular statistical packages (SPSS, Statgraphics, Matlab, etc.) require the intervention of the user. There are also other software packages that perform automatic model selection with minor intervention by an expert user (such as the STSA toolbox [71] or the X-12-ARIMA software [72]). For example, in the X-12-ARIMA software, the user has to specify which types of models are to be fitted to the time series, in a procedure in which these different models are sequentially selected and evaluated. To determine whether or not a model fits the data well, the Portmanteau test of fit developed by Box and Pierce [7], with the variance correction for small samples introduced by Ljung and Box [34], is used, and in a final file all the different models selected by the expert user are evaluated. The null hypothesis of randomness of the residuals is tested at a 5% level of significance and the estimated parameters are checked to avoid over-differencing. Additionally, in order to effectively and easily identify the order of an ARIMA model, some pattern identification methods have been proposed, including the R and S array method [68], the Corner method [5],


the ESACF method [61], the SCAN method [62] and the MINIC method [19]. Although these identification methods seem to provide a better way of determining the order of the ARIMA model, there are some problems which require consideration [10]. In fact, most of them require the interpretation of a human expert for a clear identification of the ARIMA model, they cannot be used for seasonal time series models, and they provide only locally optimal solutions. In [20] it is noted that, in order to improve the accuracy of the order selection of an ARMA model, it is possible to effectively use knowledge learned from experience. Because this knowledge is usually vaguely expressed, fuzzy systems provide the capability to express subjective and vague concepts in an inference engine [45,52,64]. The goal of this section is to present an automated way to obtain the structure of an ARIMA model, using an expert system working with fuzzy logic.

On many occasions, a series is not exclusively fitted by one model; rather, various models may fit the series equally well [57]. If we follow the norms of Box and Jenkins, the model chosen is nearly always the simplest one, i.e. the one involving fewest terms [2]. The method is carried out in the following way: the time series of a model must fulfil a set of rules. Given such a rule set, and a time series to be analysed, we measure how many rules are fulfilled by the series. Based on the degree of fulfilment of these rules, we can elicit which model is suitable for that time series. Now, we have to be careful with one important issue: the number of rules fulfilled for a certain ARMA model should not be the only measure taken into account. Let us suppose that the AR(1) model is associated with two fuzzy rules, and that the ARMA(1,1) model is associated with six. It may occur that, for a real AR(1) model, three out of the six ARMA(1,1) rules are satisfied.
In this case, the method could mistakenly associate the series with an ARMA(1,1) model instead of the true AR(1) model. Therefore, the methodology proposed in this section will try to keep the number of rules as similar as possible across models, and will also assign a weight to every rule. Thus, the degree of fulfilment of a model will depend on the sum of the weights of the rules that a time series satisfies. Finally, another difficult task we will be dealing with in this section is to propose an algorithm capable of assigning weights to each fuzzy rule so that all the time series analysed can be correctly identified. This, as we will see later, will be accomplished with the aid of an evolutionary algorithm.
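The weighted-rule selection scheme just described can be sketched as follows. The rule identifiers, weights and fulfilment degrees below are invented for illustration only; in the paper the weights are learned by the evolutionary algorithm:

```python
# Hypothetical rule base: each candidate model has fuzzy rules with signed
# weights in [-1, 1].  A positive weight supports the model when the rule is
# fulfilled; a negative weight argues against it.  The model with the largest
# weighted sum of fulfilment degrees (each in [0, 1]) is selected.

def select_model(rule_base, fulfilment):
    """rule_base: {model: [(rule_id, weight), ...]};
    fulfilment: {rule_id: degree of fulfilment in [0, 1]}."""
    scores = {
        model: sum(w * fulfilment.get(rid, 0.0) for rid, w in rules)
        for model, rules in rule_base.items()
    }
    return max(scores, key=scores.get), scores

rule_base = {
    "AR(1)":     [("r11", 0.9), ("r13", 0.6), ("r31", -0.8)],
    "ARMA(1,1)": [("r29", 0.9), ("r30", 0.7), ("r35", -0.8)],
}
fulfilment = {"r11": 1.0, "r13": 0.8, "r29": 0.4}  # hypothetical degrees
best, scores = select_model(rule_base, fulfilment)
```

Keeping the number of rules per model balanced, as the text recommends, prevents a model with many rules from winning simply by accumulating partial fulfilments.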

2.1. Pre-processing the time series

It is important to note that, in order to obtain the linear model of a time series, the first step is to determine whether a natural log transformation of the series is necessary. This step is accomplished using the likelihood ratio, taking into account that the likelihood is greater for the transformed series than for the raw series when

$$\log\left(\frac{1}{n}\sum_i (Y_i - \bar{Y})^2\right) > \log\left(\frac{1}{n}\sum_i \left(\log(Y_i) - \frac{1}{n}\sum_i \log(Y_i)\right)^2\right) + \frac{2}{n}\sum_i \log(Y_i). \qquad (6)$$

If the inequality is fulfilled, it is necessary to take the log transform. On the other hand, the univariate Box–Jenkins ARMA method applies only to stationary time series. In time series analysis, a stationary series has a constant mean, variance and autocorrelation through time, i.e., serial or seasonal dependencies have been removed (the formal definition of stationarity is more complicated, but the definition given here is adequate for the present purposes). Serial dependency can be removed by differencing the series, that is to say, converting each element of the series into its difference from the previous one. There are two major reasons for such transformations: first, the hidden nature of serial dependencies in the series can thus be identified, and second, it is necessary for the Box–Jenkins methodology. Therefore, for an ARIMA(p, d, q) model, it is necessary to obtain the (p, d, q) parameters, where p is the number of autoregressive terms, d is the number of non-seasonal differences and q is the number of lagged forecast errors in the prediction equation. The first step is to identify the order of differencing needed to make the series stationary (in conjunction with a previous variance-stabilizing transformation such as taking logs or deflating, as presented in Eq. (6)). Most economic (and also many other) time series do not satisfy the stationarity conditions stated earlier, for which ARMA models have been derived. Such time series are called non-stationary and should be re-expressed so that they become stationary with respect to the variance and the mean, selecting the appropriate value of the parameter d.
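The two pre-processing decisions above (whether to take logs, following the inequality of Eq. (6), and how many times to difference) can be sketched as follows. This is a minimal numpy illustration; the example series are our own:

```python
import numpy as np

def prefer_log_transform(y):
    """Decide whether to log-transform, following Eq. (6): the (Gaussian)
    likelihood is greater for the transformed series when
    log(var(y)) > log(var(log y)) + (2/n) * sum(log y).  Requires y > 0."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    ly = np.log(y)
    lhs = np.log(np.mean((y - y.mean()) ** 2))
    rhs = np.log(np.mean((ly - ly.mean()) ** 2)) + (2.0 / n) * ly.sum()
    return lhs > rhs

def difference(y, d=1):
    """Apply d-th order non-seasonal differencing (replace each element
    by its difference from the previous one, d times)."""
    y = np.asarray(y, dtype=float)
    for _ in range(d):
        y = np.diff(y)
    return y

t = np.arange(1, 201, dtype=float)
exploding = np.exp(0.05 * t)  # multiplicative growth: the log transform helps
linear = t                    # linear trend: differencing alone suffices
```

For the exponentially growing series the inequality holds (take logs first, then difference); for the linear trend it does not, and a single difference already yields a constant-mean series.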


So, before the learning program starts to test the rules fulfilled by each series, a number of simple tasks must be performed:

1. Based upon the information given by the coefficients of the ACF and PACF, we can define the following steps to determine the stationarity of the time series:
   (a) If the time series has many high positive autocorrelation coefficients, then it probably needs a higher order of differencing.
   (b) Starting from an original time series with positive autocorrelations, if after differencing the first autocorrelation coefficient is close to zero (statistically not significant) or negative, then the series does not need a higher order of differencing. If the first autocorrelation coefficient is near −0.5 or more negative, then the series is over-differenced.
   (c) The optimum order of differencing is frequently the one for which the standard deviation of the series is smallest.
2. Obtain the estimated ACF and PACF coefficients and the error criterion for the already stationary time series.
3. Check that the series is not white noise. If 95% of the samples fall within the error criterion, the series is white noise, the model is classified as (0, d, 0) (where d is the order of differencing) and no further analysis is undertaken. This check might not be of any use for the learning program, as what the program receives is a simulated series and not white noise.
4. Exponential fit of the first terms of the ACF and of the PACF. The program attempts to determine whether the shape of the ACF and of the PACF is similar to any of the theoretical shapes of the various models described above. To do this, an exponential fit is carried out on the first terms of the ACF and of the PACF, fitting them to an exp(−αx) curve. By these means, values of α are obtained for the ACF and the PACF, and these values will be used, together with those of the correlation coefficient, to help identify the model.
5. Determine the spikes in the coefficients of autocorrelation and of partial autocorrelation. If the series is (for example) of the AR(1) type, it will present a spike in the PACF; this is what is determined at this stage of the procedure. As the program is unaware of the type of series presented, it checks the number of ACF and PACF coefficients that are 70% above the error criterion.
6. Estimate φ and θ. When this step is reached, we still do not know the series type, but it is possible to estimate φ and θ for various known types of series. In this way, if (for example) the series is MA(2), the coefficient estimated for the φ1 case will not fulfil the corresponding mathematical rules and this model will be rejected. As a special case, we might consider that of estimating the θ1 and θ2 coefficients for MA(2) models. Remember that these coefficients are estimated by applying the Yule–Walker equations, which in this case are nonlinear second-order equations; they are solved by means of the nonlinear least-squares method.
7. Test changes of sign in the ACF and the PACF. This is performed in order to determine which model best fits the series.
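The ACF/PACF computations on which the steps above rely can be sketched as follows. This is a minimal numpy illustration (the simulated AR(1) series and the 1.96/sqrt(n) band used as a stand-in for the error criterion are our own choices):

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation coefficients for lags 1..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = float(np.dot(x, x))
    return np.array([np.dot(x[:len(x) - k], x[k:]) / var
                     for k in range(1, nlags + 1)])

def pacf(x, nlags):
    """Partial autocorrelations for lags 1..nlags via the Durbin-Levinson
    recursion on the sample autocorrelations."""
    r = np.concatenate(([1.0], acf(x, nlags)))
    phi = np.zeros((nlags + 1, nlags + 1))
    out = np.zeros(nlags)
    for k in range(1, nlags + 1):
        num = r[k] - np.dot(phi[k - 1, 1:k], r[1:k][::-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], r[1:k])
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, 1:k][::-1]
        out[k - 1] = phi[k, k]
    return out

def count_spikes(coeffs, n):
    """Count coefficients outside the +-1.96/sqrt(n) white-noise band."""
    return int(np.sum(np.abs(coeffs) > 1.96 / np.sqrt(n)))

# Simulated AR(1) with phi = 0.7: expect one dominant PACF spike at lag 1.
rng = np.random.default_rng(2)
x = np.zeros(1500)
for t in range(1, 1500):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()
```

For this series the ACF decays smoothly while the PACF shows a single spike near 0.7 at lag 1, which is exactly the signature the rules of the next subsection look for.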

2.2. Creation of the rules for the expert system

After the pre-processing stage, in this subsection we present the rules created for our expert system in order to determine the correct ARMA model, together with their associated weights. As mentioned before, in this paper we propose a weighted fuzzy rule-based system in which each rule does not make the same contribution to the final output of the system; therefore, the weight associated with each rule can be interpreted as a measure of significance or importance. The rules created can have either positive or negative weights; a negative weight is associated with rules of the form: "if these conditions are fulfilled, then the model is not of this type". Conversely, positive-weighted rules support the hypothesis that the time series currently being analysed is of a specific type. The range of the weights, in absolute value, is the interval [0,1], and a weight can be interpreted as a degree of support for a rule [40]. Taking into account the effect of a weight on the final contribution of a rule to the output of the fuzzy expert system, the rule weight can be understood as a degree of influence: rules with small weights should contribute marginally to the final output, while rules with large weights (close to one) have great significance for the final conclusion. In [40] it is shown how rule weights influence the interpretability of a fuzzy system, and how this influence can be equivalently attained by modifying the membership functions (in the antecedent or consequent) of a fuzzy system. Nevertheless, modifying the membership functions can dramatically affect the linguistic interpretability of the fuzzy


system. In [60], a genetic algorithm is proposed to tune a weighted fuzzy rule-based system. In this approach, the parameters defining the membership functions, including the position and shape of the fuzzy rule set, and the weights of the rules are evolved jointly using a genetic algorithm, obtaining successful results in classification problems. In [11], a new method to estimate null values in relational database systems using genetic algorithms is proposed, where the attributes appearing in the antecedent parts of the generated fuzzy rules have different weights. A similar approach, but using a neural network to generate weighted fuzzy rules, is presented in [17]. In this method, through analysis of the weights of the neural network, it is possible to obtain a matrix of importance indices that indicates the influential power of each input node in determining the output of each output node; weighted fuzzy rules are then extracted from the neural network. However, in the kind of fuzzy rule weighting proposed in [17], the number of parameters to be optimized is very high (one weight not per rule, but per membership function in the antecedent of each rule) and the linguistic interpretation of the final fuzzy system is not preserved.

The expert system that is created consists of a total of 38 fuzzy rules (Table 1) using the Mamdani inference model. There exist many possibilities for selecting the set of basic operations in the fuzzy inference process. Thus, there are several alternatives for the defuzzifier, for defining the fuzzy implication operators, for selecting the conjunction and disjunction operators, for the set of membership functions used in the definition of the linguistic variables, etc. [48,49]. In this paper, the fuzzy inference method uses the minimum as T-norm and the centroid method as the defuzzification strategy, and triangular membership functions were used [50].
For example, analysing Table 1, the first two rules are defined for simple or pure AR/MA models. In fact, the ACF and PACF provide some assistance in selecting pure AR or pure MA models. However, rule 3 is dedicated to mixed models, which are much harder to identify:

RULE 3. If both the PACF and the ACF fall with the same abruptness, the model is ARMA(P, Q), where P and Q are the number of PACF and ACF coefficients above the error criterion.

This rule is suggested by the shapes of the AR and MA models. In pure AR models, the ACF falls smoothly, while the PACF falls abruptly. In pure MA models, the PACF falls smoothly, while the ACF falls abruptly. The number of PACF (ACF) coefficients above the error criterion will be the order of the pure AR model (pure MA model). To determine this, the program uses the previously obtained α values of the exponential fit. The exponential presenting the most abrupt change is the one with the highest absolute α value, such that if the α calculated for the PACF is greater than that for the ACF, the model will be AR (and, conversely, if the α calculated for the ACF is greater than that for the PACF, the model will be MA). To determine the order, we examine the number of spikes in the PACF and ACF. As the series are not ideal, but have a component of white noise, we only take into consideration the PACF and ACF coefficients that are 70% above the error criterion.

2.3. Optimization of the fuzzy logic rules

In this paper we use a genetic algorithm to optimize only the weight associated with each fuzzy rule of the expert system. The behaviour of a genetic algorithm can be briefly described as follows: evaluate a population and generate a new one iteratively, with each successive population referred to as a generation. Given the current generation at iteration t, G(t), the GA generates a new generation, G(t + 1), based on the previous generation, applying a set of genetic operations.
Given an optimization problem, simple GAs encode the parameters concerned into finite bit strings, and then run iteratively using genetic operators in a random way, guided by the evolution of the fitness function, to perform the optimization task. Note that the necessity of binary encoding of the parameters is a controversial issue [16,23,69]. The problem with binary coding lies in the fact that a long string always occupies space in the computer memory even though only a few bits are actually involved in the crossover and mutation operations. This is particularly the case when many parameters need to be adjusted in the same problem and a higher precision is required for the final result [23]. It has been shown that real-encoded GAs converge faster, their results are more consistent and higher precision can be achieved, especially for larger problems where the binary representation becomes prohibitively long and the search time increases as well. This paper uses a real number representation [16], which in principle seems more favourable for optimizing the rule weights of the fuzzy system, since they are defined in a continuous domain [54,33]. Therefore, the parameters of each weighted fuzzy rule are encoded in a chromosome as a real vector. Typical crossover and mutation operators have been used for the manipulation of the current population in every iteration of the GA. The crossover operator is "simple one-point crossover". The mutation operator is "non-uniform mutation" [38]. This operator presents the advantage, when compared to the classical uniform mutation operator,


Table 1
Rules used by the expert system for the ARMA model structure

1. If the PACF falls more abruptly than the ACF, then the model is AR(P), where P is the number of PACF coefficients above the error criterion
2. If the ACF falls more abruptly than the PACF, then the model is MA(Q), where Q is the number of ACF coefficients above the error criterion
3. If both the PACF and the ACF fall with the same abruptness, the model is ARMA(P, Q), where P and Q are the number of PACF and ACF coefficients above the error criterion
4. If the ACF is discontinued suddenly, the process is MA(Q), where Q is the number of ACF coefficients above the error criterion
5. If both the PACF and the ACF fall with exactly the same abruptness, the model is ARMA(1,1)
6. If the number of coefficients above the error criterion in the ACF is less than in the PACF, the model is MA(Q), where Q is the number of ACF coefficients above the error criterion
7. If the number of coefficients above the error criterion in the PACF is less than in the ACF, the model is AR(P), where P is the number of PACF coefficients above the error criterion
8. If the number of coefficients above the error criterion in the ACF is equal to the number in the PACF, the series is ARMA(P, Q), where P and Q are the number of PACF and ACF coefficients above the error criterion, respectively
9. If the ACF sign changes, the series is AR(3)
10. If the correlation coefficient of the ACF is greater than that of the PACF, the model is AR(1)
11. If the estimated φ1 coefficient for the AR(1) model fulfils the stationarity conditions, the series is AR(1)
12. If the series is AR(1), the first estimated autocorrelation coefficient will be approximately equal to the first estimated partial autocorrelation coefficient
13. If the estimated φ1 coefficient is positive and the PACF peak is positive, the series is AR(1)
14. If the estimated φ1 coefficient is negative and the PACF peak is negative, the series is AR(1)
15. If the α from the PACF exponential fit is zero, the series is AR(1)
16. If the estimated θ1 coefficient for the model fulfils the invertibility requirements, the series is MA(1)
17. If only the first autocorrelation coefficient is not null, the series is MA(1)
18. If the estimated θ1 coefficient is negative, the ACF peak is positive and the PACF changes sign starting from the positive sign, the process is MA(1)
19. If the estimated θ1 coefficient is positive, the ACF peak is negative and the PACF does not change sign starting from the negative sign, the process is MA(1)
20. If the α from the ACF exponential fit is zero, the series is MA(1)
21. If the estimated φ1 and φ2 coefficients fulfil the stationarity conditions, then the series is AR(2)
22. If the series is AR(2), the first two autocorrelation coefficients must fall within the area enclosed by the following curves: |r_{k1}| < 1, |r_{k2}| < 1, r_{k1}^2 < (r_{k2} + 1)/2
23. If the ACF changes sign, the process is AR(2)
24. If there are only two PACF coefficients with absolute value above the error criterion, the process is AR(2)
25. If the estimated θ1 and θ2 coefficients fulfil the invertibility conditions, then the process is MA(2)
26. If there are only two ACF coefficients with absolute value above the error criterion, the process is MA(2)
27. If the model is MA(2), r_{k1} and r_{k2} must fall within the area enclosed by: r_{k1} + r_{k2} = −0.5, r_{k2} − r_{k1} = −0.5, r_{k1}^2 = 4 r_{k2}(1 − 2 r_{k2})
28. If the PACF changes sign, the process is MA(2)
29. If the estimated φ1 and θ1 coefficients fulfil the stationarity and invertibility conditions, then the series is ARMA(1,1)
30. If rules 11 and 16 are fulfilled, the model is ARMA(1,1)
31. If rule 11 is not fulfilled, the series is not AR(1)
32. If rule 16 is not fulfilled, the series is not MA(1)
33. If rule 21 is not fulfilled, the series is not AR(2)
34. If rule 25 is not fulfilled, the series is not MA(2)
35. If rule 29 is not fulfilled, the series is not ARMA(1,1)
36. If the probability of the series being AR(P) is high and the probability of it being MA(Q) is small, then the series is AR(P)
37. If the probability of the series being MA(Q) is high and the probability of it being AR(P) is small, then the series is MA(Q)
38. If the probability of the series being AR(P) is high and the probability of it being MA(Q) is high, then the series is ARMA(P, Q)
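The region checks of rules 22 and 27 in Table 1 can be sketched as follows. The bounding curves are taken from the table, but the inequality directions (which side of each curve is admissible) are our own reading and should be treated as an assumption:

```python
def ar2_region(r1, r2):
    """Rule 22 check: first two sample autocorrelations compatible with a
    stationary AR(2).  Bounding curves as read from Table 1:
    |r1| < 1, |r2| < 1 and r1^2 < (r2 + 1) / 2."""
    return abs(r1) < 1 and abs(r2) < 1 and r1 ** 2 < 0.5 * (r2 + 1)

def ma2_region(r1, r2):
    """Rule 27 check: region bounded by the curves r1 + r2 = -0.5,
    r2 - r1 = -0.5 and r1^2 = 4 r2 (1 - 2 r2).  The inequality directions
    are our reading of the admissible MA(2) autocorrelations; they are not
    stated explicitly in the table."""
    return r1 + r2 > -0.5 and r2 - r1 > -0.5 and r1 ** 2 <= 4 * r2 * (1 - 2 * r2)

# Theoretical autocorrelations of an AR(2) with phi1 = 0.5, phi2 = 0.3:
# rho1 = phi1 / (1 - phi2) ~ 0.714, rho2 = phi1 * rho1 + phi2 ~ 0.657.
# Theoretical autocorrelations of an MA(2) with theta1 = theta2 = 0.5:
# rho1 = 0.5, rho2 = 1/3.
```

Sample autocorrelations falling outside the corresponding region argue against the candidate model, which is exactly how the negatively weighted rules 33 and 34 are used.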

of performing less significant changes to the genes of the chromosome as the number of generations grows. This property makes the exploration–exploitation trade-off more favourable to exploration in the early stages of the algorithm, while exploitation gains importance as the solution given by the GA approaches the optimum. An important building block in the behaviour of the genetic algorithm is the definition of the fitness function. The fitness function f(x) evaluates each chromosome (possible solution) in the population and tests whether the end condition is satisfied. The fitness function is responsible for guiding the search towards the global optimum. For this purpose, it must

828

O. Valenzuela et al. / Fuzzy Sets and Systems 159 (2008) 821 – 845

be designed so that two different individuals can be compared, distinguishing the better solutions from the rest. In our approach, the fitness function is based on the sum of the errors in the estimation of the ARIMA structure over the different time series used. Mathematically, the fitness of a chromosome P in generation t is obtained as

fitness(P^t) = 1 / ( Σ_{i=1}^{N} ‖ARIMA_i(p_R, d_R, q_R) − ARIMA_i(p_E, d_E, q_E)‖ + ε )
             = 1 / ( Σ_{i=1}^{N} √[(p_R^i − p_E^i)^2 + (d_R^i − d_E^i)^2 + (q_R^i − q_E^i)^2] + ε ),    (7)

where ε is a constant required to avoid an infinite result at zero distance, ARIMA_i(p_R, d_R, q_R) is the real ARIMA model for time series i, ARIMA_i(p_E, d_E, q_E) is the estimated model obtained with the proposed fuzzy expert system, and ‖·‖ is the Euclidean distance between both models. If various models are possible for a given series, we test whether the one assigned the highest weight is at the lowest distance from the real result. If so, there is no penalization; otherwise, the average of the distances of the possible models from reality is stored. Therefore, the fitness function accumulates the errors in the estimation of the structure of the ARIMA model for each of the time series in the training set. Finally, different criteria can be used to stop the genetic algorithm. The simplest and most popular criterion is the number of generations. Different analyses have been presented in the literature to determine a threshold for the maximum number of generations [9]; however, all of these studies are based on binary-encoded representations, and all of these criteria fix a maximum number of generations that is usually very large. In this paper, we have used a stop criterion based on the evolution of the fitness function, which allows the evolution to continue only while the slope of the regression line fitted to the last v0 generations is greater than a certain value s_min.
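Eq. (7) can be transcribed almost directly. In the sketch below, the value of the constant ε (here `EPSILON`) is our own choice, and the encoding of each model as a (p, d, q) tuple is illustrative:

```python
import math

EPSILON = 1e-6  # the constant that avoids division by zero; value is our choice

def fitness(real_models, estimated_models):
    """Eq. (7): inverse of the summed Euclidean distances between the real and
    estimated ARIMA(p, d, q) orders over the N training series."""
    total = sum(
        math.dist(r, e)  # sqrt((p_R-p_E)^2 + (d_R-d_E)^2 + (q_R-q_E)^2)
        for r, e in zip(real_models, estimated_models)
    )
    return 1.0 / (total + EPSILON)
```

A perfect identification drives the summed distance to zero and the fitness to its maximum 1/ε.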
The slope of the regression line fitted to the fitness of the best solution over the last v0 generations can be computed as

slope(v0) = [ Σ_{k=t−v0+1}^{t} (k − t)·fitness_best(P^k) − (1/v0)·( Σ_{k=t−v0+1}^{t} (k − t) )·( Σ_{k=t−v0+1}^{t} fitness_best(P^k) ) ] / [ Σ_{k=t−v0+1}^{t} (k − t)^2 − (1/v0)·( Σ_{k=t−v0+1}^{t} (k − t) )^2 ],    (8)

where fitness_best(P^k) is the best fitness of the population in generation k. Finally, it is important to note that, in order to obtain good generalization results, a large quantity of real series is used as benchmarks for time series prediction (http://www.secondmoment.org/time_series.php). The results obtained in the simulations, except in the case of the mixed series, were highly promising; therefore, this ARIMA(p, d, q) structure will be used in conjunction with the neural network.

3. The hybrid ARIMA–ANN approach

In this section a time series decomposition is proposed in which ARIMA models and ANNs are combined in order to obtain a robust and efficient methodology for time series forecasting. Several time series decomposition methods have been presented in the literature (simple moving averages, centred or weighted moving averages, local regression smoothing, additive or multiplicative decomposition, etc.) [4,12,18,37], which are mainly based on the following equation:

Y_t = Pattern_t + Error_t = f(trend-cycle, seasonality, error).    (9)
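Eq. (9) can be illustrated with a naive additive decomposition. This is a sketch only: a centred moving average stands in for the trend-cycle estimate and seasonal means for the periodic component; it is not the X-12-ARIMA procedure, and all names are ours.

```python
import numpy as np

def additive_decompose(y, period):
    """Naive additive decomposition y = trend + seasonal + error, with a centred
    moving average for the trend-cycle and seasonal means for the periodic part.
    (Even periods would classically use a 2 x period average; ignored here.)"""
    y = np.asarray(y, dtype=float)
    kernel = np.ones(period) / period
    trend = np.convolve(y, kernel, mode="same")   # centred moving average
    detrended = y - trend
    # Average each phase of the cycle to estimate the seasonal pattern.
    pattern = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = np.tile(pattern, len(y) // period + 1)[: len(y)]
    error = y - trend - seasonal                  # the residual component
    return trend, seasonal, error
```

By construction the three components add back to the original series exactly; the quality of the split, not the reconstruction, is what distinguishes real decomposition procedures.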

In this approach, the first steps consist in removing the trend-cycle and then isolating the seasonal component. The error or residual cannot be predicted, but it can be identified. Although this decomposition approach has several theoretical weaknesses from a mathematical point of view, practitioners have largely disregarded them and have used the methodology with significant success. It is important to note that the time series decomposition methods based on Eq. (9) use a linear mathematical model, the X-12-ARIMA software developed by Findley et al. in 1997 [18] being a very well-known tool. As we stated in Section 1, many works in the literature have tried to compare the performance of ANNs against linear models for a wide range of time series. The conclusion is that we cannot determine a priori which method will be best suited to our series, since in some cases ANNs show a clear superiority but in other cases, they


[Fig. 1 (block diagram): a statistical feature extractor feeds a genetic algorithm (initial population, evolution, genetic operators) that optimizes the fuzzy weighted rules (fuzzification, fuzzy inference process, weight optimization); together these blocks form the intelligent identification system that performs the automatic ARIMA identification of the time series. The identified ARIMA model provides a first linear approximation; its residual is passed to a multilayer perceptron, and the two outputs are added to give the final forecast.]

Fig. 1. Proposed hybrid method that combines both ARMA and ANN models.

prove to be unnecessary, since they add no further accuracy to the prediction capability of the model with respect to that obtained by the linear system. This is the key reason that motivated us to propose our hybrid model, which is a linear mixture of an ARIMA process and an ANN model. The ARIMA model will therefore be responsible for approximating the linear model inherent to the underlying process generating the series, and the ANN will be in charge of the more difficult nonlinear part. According to this scheme, we postulate that our time series can be decomposed into a linear and a nonlinear component:

Y_t = L_t + NL_t,    (10)

where L_t and NL_t are the linear and the nonlinear components, respectively. Hence, the key idea is to apply first an ARIMA model using the steps proposed in Section 2, and then pass the residuals of that model (if significant) to the neural network. Fig. 1 shows a schematic representation of this approach.


Now, some careful considerations should be made with respect to the model given by Eq. (10). In the case of artificially producing a model given exactly by that equation, fitting an ARIMA model to that time series will not identify the correct linear part, because the rest (nonlinear part + error) constitutes a systematic deviation instead of a simple error component. Let us remember that the linear parameters obtained will be those that minimize the sum of squared errors. Therefore, the ARIMA model will find a linear component that, although not necessarily the actual linear component of the problem, is the one that best fits the model. Nevertheless, a first approximation by means of an ARIMA model allows us to remove a linear component that could otherwise hinder the prediction accuracy of a nonlinear model applied directly. It is very important to point out that, although ANNs are universal approximators, no exact method exists to optimize the network for a given model. Several references in the literature show that the larger the number of the ANN's parameters, the more complex its training, the bigger the computational effort required and the greater the risk of becoming trapped in bad local optima. Thus, one of the ideas presented in this paper is that of "alleviating" the network of the task of approximating those parts of the model which can be obtained by exact mathematical models, so that all the ANN's parameters can be concentrated on approximating those I/O relationships that could not be modelled otherwise. The use of this initial phase to capture the linearities of the system allows the size and complexity of the neural system to be smaller than if the time series were modelled directly, with no pre-processing; this results in a decreased overfitting risk and a better quality of the solutions attained.
To design the ANN, a fully connected MLP with two hidden layers and one output layer has been used. This class of network consists of an input layer, a number of hidden layers and an output layer. For an MLP with one hidden layer, the output of the system given some selected past observations Y_{t−d1}, Y_{t−d2}, ..., Y_{t−dk}, at lags d1, d2, ..., dk, and H neurons in the hidden layer, is obtained by

Ŷ_t = φ_o( Σ_{h=1}^{H} w_ho · φ_h( Σ_{i=1}^{k} w_ih · x_{t−di} ) ),    (11)

where w_ih and w_ho are the weights of the synapses between the inputs and the hidden neurons and between the hidden neurons and the output, respectively. The two functions φ_h and φ_o are the activation functions of the hidden and the output layer, respectively. The logistic function, expressed as

φ_h(x) = 1 / (1 + e^{−x}),    (12)

is often used in the hidden layer, together with the identity function for the output layer (φ_o).

4. Simulation results

4.1. Analysis of the automatic model selection expert system

The first step was to carry out a wide battery of examples, with the purpose of analysing the robustness of the proposed expert system in the identification of the ARMA model developed in Section 2. Note that once the expert system has been optimized by means of the evolutionary algorithm, the identification of a time series is obtained almost immediately, since only the inference process of the expert system is needed. A total of 2175 experiments were selected. Table 2 shows the simulation results of the proposed expert system for the 2175 different time series, using AR, MA and ARMA models similar to those presented in [26]. In [26], a simple MLP with two layers and conventional nonlinear transfer functions, using the backpropagation algorithm to optimize the connections of the neural network, produced favourable results when compared with MLBP networks and the Box–Jenkins modelling approach for time series forecasting, using ARMA(2,2) as the most complex model. In our case we have extended the models to ARMA(4,3). From the global set of 2175 time series used for simulation, a total of 1618 were classified correctly; the percentage of success is therefore 74% and the error index 26%. Fig. 2 shows graphically the distribution of the classification error among the different ARMA models.


Table 2
Simulation results of the expert system proposed for 2175 different time series (entry = number of series; rows: predicted model, columns: real model)

Predicted \ Real  AR1  AR2  AR3  AR4  AR5  MA1  MA2  MA3  MA4  MA5  ARMA11  ARMA21  ARMA22  ARMA32  ARMA42  ARMA43
AR1                93    1    0    0    0    5    9    8    8    5       3       4       6       0       0       0
AR2                 1  118    2    0    0    4   12   12    7    4       2       2      12       8       1       9
AR3                 1    3  155    1    1    0    4    3    2    2       0       5       8       7      11       2
AR4                 1    0    2  163    2    0    1    0    2    2       1       4       3       7       9       9
AR5                 1    0    1    2   68    0    1    0    1    2       0       0       1       5       5       9
MA1                 1    2    2    0    0   85    1    7    8    3       2       1       0       3       0       0
MA2                 0    0    2    2    1    3   84   12    7    3       7       1       1       1       0       2
MA3                 1    0    1    1    1    1    7  109    9    3       1       0       0       2       3       3
MA4                 1    0    0    0    0    0    0    2  112    5       0       0       0       1       1       5
MA5                 0    0    1    0    0    0    0    1    4   38       1       0       0       0       1       1
ARMA11              0    0    1    0    0    2    0    7    4    1      82       6       5       1       0       0
ARMA21              0    1    2    0    0    0    1    9    3    2       1      92       7       5       4       3
ARMA22              0    0    1    0    0    0    5    3    3    3       0       4      78       8       5       8
ARMA32              0    0    3    1    0    0    0    2    2    0       0       3       2     123      12       9
ARMA42              0    0    1    3    1    0    0    0    2    2       0       2       1       3     115      12
ARMA43              0    0    1    2    1    0    0    0    1    0       0       1       1       1       8     103

[Fig. 2 (pie chart) distributes the expert system's classification errors as follows: AR1 1%, AR2 1%, AR3 4%, AR4 2%, AR5 1%, MA1 3%, MA2 7%, MA3 12%, MA4 11%, MA5 7%, ARMA11 3%, ARMA21 6%, ARMA22 8%, ARMA32 9%, ARMA42 11%, ARMA43 14%.]

Fig. 2. Distribution of classification error for the expert system proposed among the different ARMA models.

We have compared these results with the X-12-ARIMA software, obtaining two sets of results, since its automatic model selection procedure can use two different criteria (first and best). The percentage of success is 41% (error index 59%) for the X-12-ARIMA software using the criterion method = first, and 15% (error index 85%) using the criterion method = best. Tables 3 and 4 present the distribution


Table 3
Simulation results of the X-12-ARIMA software for 2175 different time series (automdl, method = first; rows: predicted model, columns: real model)

Predicted \ Real  AR1  AR2  AR3  AR4  AR5  MA1  MA2  MA3  MA4  MA5  ARMA11  ARMA21  ARMA22  ARMA32  ARMA42  ARMA43
AR1                83    0    0    0    0   24    1   15   34   37       0       5       1       0       0       0
AR2                 1  102    2    0    0    2   57   37   13    7       0       5      33       9       3       9
AR3                 1    3  141    0    0    0    4   56   11    2       0      13      15      81      27       2
AR4                 1    2    2  159    0    0    1    0   26    3       0       1      20       7      92       5
AR5                 1    2    0    0   66    0    1    0    1    5       0       0       2      14       4      27
MA1                 1    0    5    0    0   59    0    2   10    0       0       0       0       1       0       0
MA2                 2    0    5    0    1    3   46   22    8    1      30       2       0       4       0       0
MA3                 1    0    0    0    0    0    0   12    4    2       1       0       0       7       0       0
MA4                 1    0    1    0    0    0    0    0   48    3       0       0       0       4       1      11
MA5                 0    0    1    0    2    0    0    0    0    1       1       0       0       0       1       1
ARMA11              0    0    1    0    0    1    2   20    0    1      62       3       9       0       0       0
ARMA21              2    1    6    0    0    0    1    3    1    2       1      81      22      16       9      11
ARMA22              0    0    0    1    1    0    0    1    2    6       0       0       4       1       3      24
ARMA32              0    1    0    1    0    1    0    0    2    0       0       0       2       4       1      31
ARMA42              0    2    1    0    0    1    0    0    2    2       0       0       1       0      22      37
ARMA43              0    0    0    0    0    5    3    0    3    0       0       2       1       0       3       5

Table 4
Simulation results of the X-12-ARIMA software for 2175 different time series (automdl, method = best; rows: predicted model, columns: real model)

Predicted \ Real  AR1  AR2  AR3  AR4  AR5  MA1  MA2  MA3  MA4  MA5  ARMA11  ARMA21  ARMA22  ARMA32  ARMA42  ARMA43
AR1                11    0    0    0    0    4    1    1    2    0       0       0       0       0       0       0
AR2                 1   11    0    0    0    5    9    0    2    0       0       0       4       2       1       1
AR3                 0    6   26    0    0    2   11   17    4    2       0       3      11      14       6       0
AR4                 4    7   15   31    0    9    7    5   12    1       6       4      17      25      38       0
AR5                 7   17    5   16   58   11   13   12   12   19      11      11      15      25      12      19
MA1                 2    0    0    0    0    3    0    2    2    0       0       0       0       1       0       0
MA2                 2    0    0    0    0    2    9    9    5    0       4       0       0       0       0       0
MA3                 4    0    0    0    0    1    3   11    6    2       3       0       0       1       0       0
MA4                 0    0    5    1    0    0    1    1   13    2      10       0       0       5       0       0
MA5                 6    0   10    5    1    2    9    5   17    6       6       2       0       8       0       0
ARMA11              1    0    0    0    0    0    5    3    0    0      13       2       7       0       0       0
ARMA21              5    2   10    0    0    0    2   10    5    1       6      17      11       5       8       3
ARMA22              1    4   17    8    0    3    2   15   13    6      11      15       9      10      11       7
ARMA32              6   11    9   16    1    2    4    4   17   12       5      10       6      13      12      21
ARMA42             13   28   15   36    0   13   10   22   19    8       5      17      11      10      35      43
ARMA43             24   27   44   43   10   41   32   52   31   16       8      26      21      35      39      73

of the error among the different ARMA models for both methods. Figs. 3 and 4 show graphically the error distribution over the ARMA models used in both simulations. Other methods using soft-computing techniques have also been presented in the literature to identify the order of ARMA time series effectively and easily [20,39]. For example, in [13] an evolutionary algorithm searches for the best ARMA model using only 8 time series; this makes a comparison with the proposed expert system difficult, also because no error index for ARMA model selection is reported in that paper. Similar conclusions can be drawn from [41], where 4 time series are used for comparison. Also using evolutionary algorithms, a new approach to parameter estimation was presented in [53]. Although the practical work carried out in [53] was still at an early stage, as pointed out by the authors, in order to analyse the behaviour of their algorithm, synthetic ARMA(3,3) series were used for


[Fig. 3 (pie chart): AR1 1%, AR2 2%, AR3 3%, AR4 1%, AR5 1%, MA1 3%, MA2 6%, MA3 14%, MA4 10%, MA5 6%, ARMA11 3%, ARMA21 3%, ARMA22 9%, ARMA32 13%, ARMA42 12%, ARMA43 13%.]

Fig. 3. Distribution of classification error for X-12-ARIMA software (automdl, method = first) among the different ARMA models.

[Fig. 4 (pie chart): AR1 5%, AR2 6%, AR3 8%, AR4 8%, AR5 1%, MA1 5%, MA2 6%, MA3 8%, MA4 9%, MA5 4%, ARMA11 5%, ARMA21 6%, ARMA22 6%, ARMA32 9%, ARMA42 8%, ARMA43 6%.]

Fig. 4. Distribution of classification error for X-12-ARIMA software (automdl, method = best) among the different ARMA models.

their study; as an average index over all tested time series, they concluded that about 20% of the models were correctly identified, also taking into account the computation cost of each simulation. It is important to note that not only is the error index of the expert system presented in this paper relevant: the computation time and computational cost needed to select a specific ARMA model are also very small compared with soft-computing techniques based on evolutionary algorithms or neural networks.
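The success percentages quoted for Tables 2–4 are simply the diagonal mass of the corresponding confusion matrices; as a sketch (the function name is ours):

```python
import numpy as np

def success_rate(confusion):
    """Fraction of series whose predicted model equals the real model,
    i.e. trace of the confusion matrix divided by its total mass."""
    confusion = np.asarray(confusion)
    return np.trace(confusion) / confusion.sum()
```

For Table 2, the diagonal sums to 1618 of 2175 series, giving the reported 74% success rate.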



Fig. 5. The characterization of the Eriel time series: (a) histogram and (b) the phase diagram.

4.2. Analysis of the complete hybrid system

Once the forecasting methodology has been established, we can proceed with our intensive experimental studies. In this section, we provide several numerical examples to evaluate the advantages and the effectiveness of the proposed approach. We have used the root mean squared error (RMSE) and the correlation coefficient as performance measures. The expression for the RMSE is

RMSE = √[ (1/n) Σ_{t=1}^{n} (y_t − ŷ_t)^2 ].    (13)
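Both performance measures are standard; a direct transcription (function names are ours):

```python
import numpy as np

def rmse(y, y_hat):
    """Eq. (13): root mean squared error between desired and predicted values."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def correlation(y, y_hat):
    """Pearson correlation coefficient between desired and predicted values."""
    return np.corrcoef(y, y_hat)[0, 1]
```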

A. Monthly Lake Erie levels: A very interesting problem is the prediction of data that are not generated from a mathematical expression and thus governed by a particular determinism. For this purpose, we used a natural phenomenon, the monthly Lake Erie levels from 1921 to 1970. The first 400 data pairs of the series were used as training data, while the remaining 200 were used to validate the identified model. Fig. 5 shows a characterization of this time series (its histogram and its phase diagram). Fig. 6 shows the predicted and desired values (dashed and continuous lines, respectively) using only ANNs, and Fig. 7 shows the forecast obtained with the hybrid method. The error indices (RMSE and correlation coefficient) for both simulations are, respectively, [0.1045, 0.937] and [0.0820, 0.963]. Finally, Fig. 8 shows the result of correlating one prediction step ahead.
B. Sunspots: This is a real series of yearly sunspot data from 1770 to 1869. It has also been analysed by Box and Jenkins, and is identified as having an ARMA(2,0) or ARMA(2,1) model [43]. Fig. 9 shows this time series. The RMSE error index is [16.71, 9.32, 7.25] for the ARMA(2,1) model, using only ANNs, and using the hybrid method, respectively.
C. Stocks of evaporated and sweetened condensed milk: This series is taken from Makridakis et al. [37] and has a total of 120 data points. The expert system proposed in this paper selects an ARMA(3,2) for this time series. Fig. 10 shows this time series. The RMSE error index is [12.06, 7.54, 5.86] for the ARMA(3,2) model, using only ANNs, and using the hybrid method, respectively.
D. Australian monthly electricity production data from January 1980 to August 1995: This series is also taken from [37] and has a total of 476 data points. The expert system proposed in this paper selects an ARMA(3,2) for this time series. Fig. 11 shows this time series. The RMSE error index is [411.271, 232.23, 193.21] for the ARMA(3,2) model, using only ANNs, and using the hybrid method, respectively.
E. Number of users logged on to an Internet server each minute over 100 min: This series is also taken from [37] and has a total of 100 data points. The expert system proposed in this paper selects an ARMA(4,3) for this time series. Fig. 12 shows this time series. The RMSE error index is [3.01, 2.41, 1.95] for the ARMA(4,3) model, using only ANNs, and using the hybrid method, respectively.



Fig. 6. Eriel time series (prediction step=1): (a) result of the original and predicted Eriel time series using only ANN and (b) prediction error.

F. Pollution 1966–1975: This series is also taken from [37]; it has a total of 130 data points and represents the monthly shipments of pollution equipment from January 1966 through December 1975 (in thousands of French francs). The expert system proposed in this paper selects an ARMA(4,3) for this time series. Fig. 13 shows this time series. The RMSE error index is [371.5, 213.8, 197.4] for the ARMA(4,3) model, using only ANNs, and using the hybrid method, respectively.
G. Mackey–Glass chaotic time series: The methodology proposed in this paper has also been tested with the time series generated by the Mackey–Glass time-delay differential equation (a very well-known benchmark problem [67]):

dx(t)/dt = a·x(t − τ) / (1 + x^10(t − τ)) − b·x(t).    (14)
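A sketch of this data generation (fixed-step RK4 with the delayed term held constant within each step, a common simplification; the function name and defaults are ours, matching τ = 17, a = 0.2, b = 0.1, x(0) = 1.2 used below):

```python
import numpy as np

def mackey_glass(n, tau=17.0, a=0.2, b=0.1, x0=1.2, h=1.0):
    """Generate n samples of Eq. (14) with a fixed-step RK4 integrator.
    The delayed term x(t - tau) is frozen within each step (a common
    simplification), and x(t) = 0 is assumed for t < 0, x(0) = x0."""
    lag = int(round(tau / h))
    x = np.zeros(n + lag)     # prefix of zeros holds the t < 0 history
    x[lag] = x0
    for t in range(lag, n + lag - 1):
        xd = x[t - lag]                      # delayed value x(t - tau)
        drive = a * xd / (1.0 + xd ** 10)    # nonlinear production term
        f = lambda v: drive - b * v
        k1 = f(x[t])
        k2 = f(x[t] + 0.5 * h * k1)
        k3 = f(x[t] + 0.5 * h * k2)
        k4 = f(x[t] + h * k3)
        x[t + 1] = x[t] + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x[lag:]
```

With these parameters the series stays positive and bounded while never settling into a fixed period, which is what makes it a useful benchmark.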



Fig. 7. Eriel time series (prediction step=1): (a) result of the original and predicted Eriel time series using proposed methodology (which are indistinguishable) and (b) prediction error.

Following the previous studies [30,32], the parameters were fixed to a = 0.2 and b = 0.1. One thousand data points were generated with the initial condition x(0) = 1.2, thus obtaining a chaotic time series with no clearly defined period; it neither converges nor diverges, and is very sensitive to the initial conditions. The time series values at integer points were obtained by applying the fourth-order Runge–Kutta method to find the numerical solution of the above equation. The values τ = 17 and x(t) = 0 for t < 0 were assumed. This data set can be found in the file mgdata.dat belonging to the Fuzzy Logic Toolbox of Matlab. Fig. 14 shows a section of the 1000 sample points used in our study. The first 500 data pairs of the series were used as training data, while the remaining 500 were used to validate the identified model. We compare again the behaviour of


Fig. 8. Correlation for the Eriel time series using the proposed methodology.

Fig. 9. Sunspot time series.

the forecasting result using the proposed hybrid methodology and only ANNs. The error indices, the root mean square error and the correlation coefficient, for this simulation were 0.0122 and 0.998 for the model using only ANNs and 0.0027 and 1 for the proposed algorithm. Fig. 14a shows the forecasting using the hybrid methodology.


Fig. 10. Stocks of evaporated and sweetened condensed milk time series.

Fig. 11. Australian monthly electricity production data from January 1980 to August 1995.


Fig. 12. Number of users logged on to an Internet server each minute over 100 min.

Fig. 13. Pollution 1966–1975.



Fig. 14. Mackey–Glass time series (prediction step=6): (a) result of the original and predicted Mackey–Glass time series using proposed methodology, (b) prediction error and (c) correlation for the time series using the proposed hybrid methodology.


Table 5
Comparison results of the prediction error of different methods for prediction step equal to 6 (500 training data)

Method                                             Prediction error (RMSE)
Auto-regressive model                              0.19
Cascade correlation NN                             0.06
Back-propagation NN                                0.02
Sixth-order polynomial                             0.04
Linear predictive method                           0.55
Kim and Kim (genetic algorithm and fuzzy system)
  5 MFs                                            0.049206
  7 MFs                                            0.042275
  9 MFs                                            0.037873
Jang et al. (ANFIS and fuzzy system)               0.007
Wang et al.
  Product T-norm                                   0.0907
  Min T-norm                                       0.0904
Classical neural network using RBF                 0.0114
Our approach                                       0.0027

Table 5 compares the prediction accuracy of different computational paradigms presented in the literature for this benchmark problem (including our proposal), covering various fuzzy system structures, neural systems and genetic algorithms. The data are taken from [28,30,32].
H. Lorenz attractor time series: The Lorenz equations were discovered by Ed Lorenz in 1963 as a very simplified model of convection rolls in the upper atmosphere. Later these same equations appeared in studies of lasers, batteries, and a simple chaotic waterwheel that can be easily built. They have been used as a benchmark problem in time series prediction. For certain settings, the trajectories obtained when solving these equations neither converge to a fixed point or a fixed cycle, nor diverge to infinity. A long trajectory of the Lorenz attractor (1500 points) was generated using a differential equation solver (Runge–Kutta method) with a step size of 0.05 to create a univariate time series. The data were split into two parts: 1127 points were used for training and the remaining 361 for assessing the generalization capability of the model. Fig. 15a and b shows a characterization of this time series (its histogram and its phase diagram). Fig. 15c shows the prediction of the Lorenz attractor using the proposed methodology. As the curves are practically identical, the difference can only be seen on a finer scale (Fig. 15d). The error indices, i.e. the root mean square error and the correlation coefficient, for this simulation were 0.176 and 0.998 for the model using only ANNs, and 0.0876 and 0.9996 for the proposed algorithm. It is important to note other approaches that have appeared in the literature; for example, Iokibe et al. [27] obtain an RMSE of 0.244 and Jang et al. [28] an RMSE of 0.143, using fuzzy and neuro-fuzzy systems. Fig. 16 shows the result of correlating one prediction step ahead. Finally, Fig. 17 shows the prediction results for the Lorenz attractor when the prediction step is modified.

5. Conclusion

In this work we have proposed a hybrid ARIMA–ANN model for time series prediction. This model synergistically combines the advantages of the easy-to-use and relatively easy-to-tune ARIMA models with the computational power of ANNs. In order to obtain the order of differencing of the integrative part of the ARIMA model, and the orders of the underlying AR and MA parts, we have proposed the use of an expert fuzzy logic-based system whose rules are weighted automatically, without the intervention of a human expert. To solve this last problem, we have designed an evolutionary algorithm whose fitness function, stop condition and crossover and mutation operators have been specifically chosen. The simulations performed show that the synergy of the different paradigms and techniques used produces excellent results in the construction of intelligent systems for automatic time series forecasting.


Fig. 15. Lorenz time series: (a) histogram, (b) phase diagram, (c) original and predicted Lorenz time series using the proposed methodology (which are indistinguishable) and (d) prediction error.



Fig. 16. Correlation for the Lorenz time series using the proposed methodology.


Fig. 17. Prediction performance for the Lorenz attractor time series (training and test RMSE as the prediction step varies from 1 to 10).

Acknowledgements This work has been partially supported by the Spanish Projects TIN2004-01419 and TIN2007-60587. We also want to thank the anonymous reviewers for their helpful and accurate comments.


References

[1] T. Al-Saba, I. El-Amin, Artificial neural networks as applied to long-term demand forecasting, Artificial Intelligence Eng. 13 (1999) 189–197.
[2] A.-S. Adnan, A. Ahmad, Fitting ARMA models to linear non-Gaussian processes using higher order statistics, Signal Process. 82 (2002) 1789–1793.
[3] R. Azencott, Modelisation ARIMA automatique: le systeme MANDRAKE, Cahiers du Centre d'Etudes de Recherche Operationnelle 32 (1990) 229.
[4] M.A. Baxter, A guide to seasonal adjustment of monthly data with X-11, third ed., Central Statistical Office, UK, 1994.
[5] J.M. Beguin, C. Gourieroux, A. Monfort, Identification of a mixed autoregressive-moving average process: the Corner method, in: O.D. Anderson (Ed.), Time Series, North-Holland, Amsterdam, 1980.
[6] G.E.P. Box, G.M. Jenkins, G.C. Reinsel, Time Series Analysis: Forecasting and Control, Prentice-Hall, Englewood Cliffs, NJ, 1994.
[7] G.E.P. Box, D.A. Pierce, Distribution of residual autocorrelations in autoregressive integrated moving average time series models, J. Amer. Statist. Assoc. 65 (1970) 1509–1526.
[8] O. Castillo, P. Melin, Hybrid intelligent systems for time series prediction using neural networks, fuzzy logic, and fractal theory, IEEE Trans. Neural Networks 13 (6) (2002).
[9] P.A. Castillo-Valdivieso, J.J. Merelo, A. Prieto, I. Rojas, G. Romero, Statistical analysis of the parameters of a neuro-genetic algorithm, IEEE Trans. Neural Networks 13 (6) (2002) 1374–1394.
[10] W.-S. Chan, A comparison of some of pattern identification methods for order determination of mixed ARMA models, Statist. Probab. Lett. 42 (1999) 69–79.
[11] S.-M. Chen, C.-M. Huang, Generating weighted fuzzy rules from relational database systems for estimating null values using genetic algorithms, IEEE Trans. Fuzzy Systems 11 (4) (2003).
[12] W.S. Cleveland, S. Devlin, E. Grosse, Regression by local fitting: methods, properties and computational algorithms, J. Econometrics 37 (1988) 87–114.
[13] P. Cortez, M. Rocha, Evolving time series forecasting ARMA models, J. Heuristics 10 (4) (2004) 415–429.
[14] J.W. Denton, How good are neural networks for causal forecasting?, J. Business Forecasting 14 (1995) 17–20.
[15] M.M. Elkateb, K. Solaiman, Y. Al-Turki, A comparative study of medium-weather-dependent load forecasting using enhanced artificial-fuzzy neural network and statistical techniques, Neurocomputing 23 (1998) 3–13.
[16] L.J. Eshelman, J.D. Shaffer, Real-coded genetic algorithms and interval-schemata, in: Foundations of Genetic Algorithms 2, Morgan Kaufmann, Los Altos, CA, 1993, pp. 187–202.
[17] T.-G. Fan, S.-T. Wang, J.-M. Chen, Generating weighted fuzzy production rules using neural networks, in: Proc. Fifth Internat. Conf. on Machine Learning and Cybernetics, Dalian, 13–16 August 2006.
[18] D.F. Findley, B.C. Monsell, W.R. Bell, M.C. Otto, B.C. Chen, New capabilities and methods of the X-12-ARIMA seasonal adjustment program, J. Business Econom. Statist. (1997).
[19] E.J. Hannan, J. Rissanen, Recursive estimation of mixed autoregressive-moving average order, Biometrika 69 (1) (1982) 81–94.
[20] M. Haseyama, H. Kitajima, An ARMA order selection method with fuzzy reasoning, Signal Process. 81 (2001) 1331–1335.
[21] L.J. Herrera, H. Pomares, I. Rojas, O. Valenzuela, M. Awad, MultiGrid-based fuzzy systems for function approximation, in: Advances in Artificial Intelligence, Lecture Notes in Computer Science, Vol. 2972, 2004, pp. 252–261.
[22] T. Hill, M. O'Connor, W. Remus, Neural network models for time series forecasts, Management Sci. 42 (7) (1996) 1082–1092.
[23] Y. Huang, C. Haung, Real-valued genetic algorithms for fuzzy grey prediction system, Fuzzy Sets and Systems 87 (1997) 265–276.
[24] C.M. Hurvich, C.-L. Tsai, Regression and time series model selection in small samples, Biometrika 76 (2) (1989) 297–307.
[25] H.B. Hwang, Insights into neural-network forecasting time series corresponding to ARMA(p, q) structures, Omega 29 (2001) 273–289.
[26] H.B. Hwang, H.T. Ang, A simple neural network for ARMA(p, q) time series, Omega 29 (2001) 319–333.
[27] T. Iokibe, Y. Fujimoto, M. Kanke, S. Suzuki, Short-term prediction of chaotic time series by local fuzzy reconstruction method, J. Intelligent Fuzzy Systems 5 (1997) 3–21.
[28] J.S.R. Jang, C.T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice-Hall, Englewood Cliffs, NJ, 1997, ISBN 0-13-261066-3.
[29] R.H. Jones, Fitting autoregressions, J. Amer. Statist. Assoc. 70 (351) (1975) 590–592.
[30] D. Kim, C. Kim, Forecasting time series with genetic fuzzy predictor ensemble, IEEE Trans. Fuzzy Systems 5 (4) (1997) 523–535.
[31] T. Koutroumanidis, L. Iliadis, G.K. Sylaios, Time-series modeling of fishery landings using ARIMA models and fuzzy expected intervals software, Environmental Modelling Software 21 (12) (2006) 1711–1721.
[32] S.-H. Lee, I. Kim, Time series analysis using fuzzy learning, in: Proc. Internat. Conf. on Neural Information Processing, Seoul, Korea, Vol. 6, October 1994, pp. 1577–1582.
[33] J. Liska, S.S. Melsheimer, Complete design of fuzzy logic systems using genetic algorithm, in: Proc. of FUZZ-IEEE'94, 1994, pp. 1377–1382.
[34] G.M. Ljung, G.E.P. Box, On a measure of lack of fit in time series models, Biometrika 65 (1978) 297–307.
[35] L. Ljung, System Identification: Theory for the User, Prentice-Hall, Englewood Cliffs, NJ, 1987.
[36] S. Makridakis, A. Anderson, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen, R. Winker, The Forecasting Accuracy of Major Time Series Methods, Wiley, New York, 1984.
[37] S. Makridakis, S. Wheelwright, R. Hyndman, Forecasting: Methods and Applications, third ed., Wiley, New York, 1998. http://www-personal.buseco.monash.edu.au/~hyndman/forecasting/gotodata.htm.
[38] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer, Berlin, 1996.
[39] T. Minerva, I. Poli, Building ARMA models with genetic algorithms, in: Lecture Notes in Computer Science, Vol. 2037, 2001, pp. 335–342.
[40] D. Nauck, R. Kruse, How the learning of rule weights affects the interpretability of fuzzy systems, in: Proc. 7th IEEE Internat. Conf. on Fuzzy Systems, Anchorage, May 4–9, 1998, pp. 1235–1240.
[41] C.-S. Ong, J.-J. Huang, G.-H. Tzeng, Model identification of ARIMA family using genetic algorithms, Appl. Math. Comput. 164 (3) (2005) 885–912.
[42] P.-F. Pai, C.-S. Lin, A hybrid ARIMA and support vector machines model in stock price forecasting, Omega 33 (6) (2005) 497–505.
[43] Y.R. Park, T.J. Murray, C. Chen, Predicting sun spots using a layered perceptron neural network, IEEE Trans. Neural Networks 7 (2) (1996) 501–505.
[44] H. Pomares, I. Rojas, J. Gonzalez, A. Prieto, Structure identification in complete rule-based fuzzy systems, IEEE Trans. Fuzzy Systems 10 (3) (2002) 349–359.
[45] H. Pomares, I. Rojas, J. Ortega, J. Gonzalez, A. Prieto, A systematic approach to a self-generating fuzzy rule-table for function approximation, IEEE Trans. Systems Man Cybernet. B 30 (3) (2000) 431–447.
[46] I. Rojas, H. Pomares, J. Ortega, A. Prieto, Self-organized fuzzy system generation from training examples, IEEE Trans. Fuzzy Systems 8 (1) (2000) 23–36.
[47] I. Rojas, H. Pomares, J.L. Bernier, J. Ortega, B. Pino, F.J. Pelayo, A. Prieto, Time series analysis using normalized PG-RBF network with regression weights, Neurocomputing 42 (1–4) (2002) 267–285.
[48] I. Rojas, H. Pomares, J. Gonzalez, et al., Analysis of the functional block involved in the design of radial basis function networks, Neural Process. Lett. 12 (1) (2000) 1–17.
[49] I. Rojas, J. Ortega, F.J. Pelayo, A. Prieto, Statistical analysis of the main parameters in the fuzzy inference process, Fuzzy Sets and Systems 102 (2) (1999).
[50] I. Rojas, O. Valenzuela, M. Anguita, A. Prieto, Analysis of the operators involved in the definition of the implication functions and in the fuzzy inference process, Internat. J. Approx. Reason. 19 (3–4) (1998) 367–389.
[51] I. Rojas, F. Rojas, H. Pomares, L.J. Herrera, J. Gonzalez, O. Valenzuela, The synergy between classical and soft-computing techniques for time series prediction, in: Advances in Artificial Intelligence, Lecture Notes in Computer Science, Vol. 2972, 2004, pp. 30–39.
[52] I. Rojas, H. Pomares, J. Ortega, A. Prieto, Self-organized fuzzy system generation from training examples, IEEE Trans. Fuzzy Systems 8 (1) (2000) 23–36.
[53] S. Rolf, J. Sprave, W. Urfer, Model identification and parameter estimation of ARMA models by means of evolutionary algorithms, in: Proc. of IEEE/IAFE 1997 Computational Intelligence for Financial Engineering (CIFEr), 1997, pp. 237–243.
[54] T.L. Seng, M.B. Khalid, R. Yusof, Tuning of a neuro-fuzzy controller by genetic algorithm, IEEE Trans. Systems Man Cybernet. B 29 (2) (1999) 226–235.
[55] R. Shibata, Selection of the order of an autoregressive model by Akaike's information criterion, Biometrika 63 (1) (1976) 117–126.
[56] R.C. Souza, A.C. Neto, A bootstrap simulation study in ARMA(p, q) structures, J. Forecasting 15 (1996) 343–353.
[57] Z. Tang, C. Almeida, P.A. Fishwick, Time series forecasting using neural networks vs. Box–Jenkins methodology, Simulation 57 (5) (1991) 303–310.
[58] Z. Tang, P.A. Fishwick, Back-propagation neural nets as models for time series forecasting, ORSA J. Comput. 5 (4) (1993) 374–385.
[59] L.J. Tashman, Out-of-sample tests of forecasting accuracy: a tutorial and review, Internat. J. Forecasting 16 (2000) 437–450.
[60] M. Teng, F. Xiong, R. Wang, Z. Wu, Using genetic algorithm for weighted fuzzy rule-based system, Intelligent Control Automat. (2004) 4292–4295.
[61] R.S. Tsay, G.C. Tiao, Consistent estimates of autoregressive parameters and extended sample autocorrelation function for stationary and nonstationary ARMA models, J. Amer. Statist. Assoc. 79 (1) (1984) 84–96.
[62] R.S. Tsay, G.C. Tiao, Use of canonical analysis in time series model identification, Biometrika 72 (2) (1985) 299–315.
[63] F.M. Tseng, H. Yu, G. Tzeng, Combining neural network model with seasonal time series ARMA model, Technological Forecasting Social Change 69 (2002) 71–87.
[64] L.X. Wang, J.M. Mendel, Generating fuzzy rules by learning from examples, IEEE Trans. Systems Man Cybernet. 22 (6) (1992) 1414–1427.
[65] D.K. Wedding II, K.J. Cios, Time series forecasting by combining RBF networks, certainty factors, and the Box–Jenkins model, Neurocomputing 10 (1996) 149–168.
[66] A.S. Weigend, N.A. Gershenfeld, Time Series Prediction, Addison-Wesley, Reading, MA, 1994.
[67] B.A. Whitehead, T.D. Choate, Cooperative-competitive genetic evolution of radial basis function centers and widths for time series prediction, IEEE Trans. Neural Networks 7 (4) (1996) 869–880.
[68] W.A. Woodward, H.L. Gray, On the relationship between the S array and the Box–Jenkins method of ARMA model identification, J. Amer. Statist. Assoc. 76 (3) (1981) 579–587.
[69] A. Wright, Genetic algorithms for real parameter optimization, in: Foundations of Genetic Algorithms, Morgan Kaufmann, Los Altos, CA, 1991, pp. 205–218.
[70] http://www-personal.buseco.monash.edu.au/∼hyndman/TSDL/physics.html/.
[71] http://www.omatrix.com/stsa.html/ (STSA toolbox).
[72] http://www.census.gov/srd/www/x12a/ (X-12-ARIMA software).
[73] J. Yao, H.L. Poh, Forecasting the KLSE index using neural networks, in: Internat. Conf. on Neural Networks, Vol. 2, 1995, pp. 1012–1017.
[74] G.P. Zhang, Time series forecasting using a hybrid ARMA and neural network model, Neurocomputing 50 (2003) 159–175.