Neurocomputing 71 (2008) 519–537 www.elsevier.com/locate/neucom
Soft-computing techniques and ARMA model for time series prediction

I. Rojas a, O. Valenzuela b, F. Rojas a, A. Guillen a, L.J. Herrera a, H. Pomares a, L. Marquez b, M. Pasadas b

a Department of Computer Architecture and Computer Technology, University of Granada, Spain
b Department of Applied Mathematics, University of Granada, Spain

Available online 12 October 2007
Abstract

The challenge of predicting future values of a time series covers a variety of disciplines. The fundamental problem of selecting the order and identifying the time-varying parameters of an autoregressive moving average (ARMA) model concerns many important fields of interest such as linear prediction, system identification and spectral analysis. Recent research activities in forecasting with artificial neural networks (ANNs) suggest that ANNs can be a promising alternative to the traditional ARMA structure. These linear models and ANNs are often compared, with mixed conclusions in terms of superiority in forecasting performance. This study was designed: (a) to investigate a hybrid methodology that combines ANN and ARMA models; (b) to resolve one of the most important problems in time series analysis using the ARMA structure and the Box–Jenkins methodology: the identification of the model. In this paper, we present a new procedure to predict time series using paradigms such as fuzzy systems, neural networks and evolutionary algorithms. Our goal is to obtain an expert system based on paradigms of artificial intelligence, so that the linear model can be identified automatically, without the need of human expert participation. The obtained linear model is then combined with an ANN, making up a hybrid system that can improve the forecasting results.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Soft computing; Time series; ARMA model; Neural networks
1. Introduction

For more than half a century, the Box–Jenkins methodology using autoregressive moving average (ARMA) linear models has dominated many areas of time series forecasting. In 1970, Box and Jenkins [4] made ARMA models popular by proposing a model-building methodology involving an iterative three-stage process of model selection, parameter estimation and model checking. Recent explanations of the process (e.g., Makridakis et al. [24]) often add a preliminary stage of data preparation and a final stage of model application (or forecasting). (1) Data preparation involves transformations and differencing. Transformations of the data (such as square roots or logarithms) can help stabilize the variance in a series where the variation changes with the level.
This often happens with business and economic data. The data are then differenced until no obvious patterns, such as trend or seasonality, remain. ''Differencing'' means taking the difference between consecutive observations, or between observations a year apart. The differenced data are often easier to model than the original data. (2) Model selection in the Box–Jenkins framework uses various graphs based on the transformed and differenced data to try to identify potential ARIMA processes which might provide a good fit to the data. Later developments have led to other model selection tools, such as Akaike's Information Criterion. (3) Parameter estimation means finding the values of the model coefficients that provide the best fit to the data; sophisticated computational algorithms are designed to do this. (4) Model checking involves testing the assumptions of the model to identify any areas where the model is inadequate. If the model is found to be inadequate, it is necessary to go back to Step 2 and try to identify a better model.
(5) Forecasting is what the whole procedure is designed to accomplish: once the model has been selected, estimated and checked, it is usually a straightforward task to compute forecasts, and of course a computer does this.
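As a brief illustration of stages (1)–(2), the sketch below (Python with numpy; the helper name and the random-walk example are ours, not the paper's) stabilizes the variance with an optional log transform and then differences the series:

```python
import numpy as np

def prepare(y, log_transform=False, d=1):
    """Return the series after an optional log transform and d-fold differencing."""
    z = np.log(y) if log_transform else np.asarray(y, dtype=float)
    for _ in range(d):
        z = np.diff(z)  # z_t = y_t - y_{t-1}
    return z

# Example: a random walk becomes approximately white noise after one difference.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=500))
print(prepare(walk, d=1).std())
```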
The problem of estimating the order and the parameters of a model such as ARMA is an active area of research [7,35,41]. When modelling linear and stationary time series, one frequently chooses the class of ARMA models because of its high performance and robustness. The selection of a particular ARMA model, however, is neither easy nor without implications for the goal of the analysis [5]. The justification for automatic ARMA modelling is the following: (a) the method for building an ARMA model is somewhat complex and requires a deep knowledge of the method; (b) consequently, building an ARMA model is often a difficult task for the user, requiring training in statistical analysis, a good knowledge of the field of application, and the availability of an easy-to-use but versatile specialized computer program; (c) the number of series to be analysed is often large. It is important to note that, nowadays, the most widely used commercial tools for time-series forecasting (Statgraphics, SPSS, etc.) require the intervention of a human expert for the definition of the ARMA model.

The drawbacks of the linear methods, as well as the development of artificial intelligence, have led to the development of alternative solutions using nonlinear modelling. Neural networks are essentially a nonlinear modelling approach that provides a fairly accurate universal approximation to any function; they can therefore be trained to predict the future values of a dependent variable. While the parameters of other nonlinear models need to be postulated in advance, neural networks are appealing because no a priori postulation of the model is necessary for the system or process under consideration. It is important to note that in the literature of time series forecasting with artificial neural networks (ANN), the ARMA model is used as a benchmark to test the effectiveness of the proposed methodology [16,42]. The emergence of various neural network topologies and efficient learning algorithms has led to a wide range of successful applications in forecasting [2,8,10,14,21,26,30–32,45]. The results reported in many of these studies and applications are frequently benchmarked against those produced by the Box–Jenkins ARIMA modelling approach [12,19,37,39,43,45]. Although performance is of primary concern, the choice of the Box–Jenkins model may be partly due to its sound theoretical basis and the numerous research publications available. For example, [39] conducted a comparative study of backpropagation networks versus ARMA models on selected time-series data from the well-known M-competition [23]. It was concluded that neural networks not only could provide better long-term forecasting but also could do a better job than ARMA models when given short series of input data. Although the application of neural network modelling to Box–Jenkins models encompasses two sub-areas, namely model identification (a statistical feature extractor performing pattern classification) and forecasting (function approximation), the scope of this paper is focused on forecasting.

In this paper, however, the hybridization (instead of the competition) of linear and nonlinear models is proposed in two different steps, taking into account that, in general, time series are rarely purely linear or nonlinear. The proposed system combines both ARMA and neural network models, making a hybrid system that can outperform both models when used separately. First, the ARMA model tries to obtain the linear relation. This step is performed autonomously (without the intervention of a human expert) using a fuzzy expert system. After that, the neural network forecasts the residual, i.e., the error between the linear output obtained by the ARMA system and the real output.

2. A brief review of ARMA models

Box and Jenkins described a general linear stochastic model as one whose output is driven by white noise ε_t, i.e., a weighted sum of current and historical ε_t [4,5]. Mathematically, it can be expressed as follows:

\tilde{Y}_t = \mu + \varepsilon_t + \psi_1 \varepsilon_{t-1} + \psi_2 \varepsilon_{t-2} + \cdots,   (1)

where μ is the mean of the stationary process, and the ψ_i are coefficients which satisfy

\sum_{i=0}^{\infty} \psi_i^2 < \infty,   (2)

and ε_t is an uncorrelated random variable with mean zero and constant variance σ_ε². However, Eq. (1) is not very practical because it involves an infinite number of terms. It is thus convenient to express Eq. (1) in terms of a finite number of autoregressive (AR) and/or moving average (MA) components. If Y_t is defined as \tilde{Y}_t - \mu, the deviation of the process from some origin or from its mean, an AR(p) process can be generally expressed as follows:

Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t.   (3)

Likewise, an MA(q) process can be expressed as follows:

Y_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}.   (4)

Thus, a general expression for a mixed ARMA(p,q) process can be defined as

Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q},   (5)

where the \phi_i and \theta_i are coefficients that satisfy stationarity and invertibility conditions, respectively.
Table 1. Examples of real time series that can be correctly characterized and predicted using an ARMA model (taken from [24]).

Real data series — Degree of differencing — Possible structure of differenced series
Chemical process concentration readings: every 2 h — 0 — ARMA(1,1)
IBM common stock closing prices: daily (17 May 1961–2 November 1962) — 1 — MA(1)
Monthly demand of repair parts for heavy equipment, Iowa (1972–1979) — Nonseasonal: 1, Seasonal: 1 — MA(1) MA(1)_{12}
Monthly shipment of pollution equipment (an initial logarithmic transformation is required to obtain a homogeneous variance) — Nonseasonal: 1 — AR(2,0) AR(1)_{12}
Australian monthly electricity production — Nonseasonal: 1, Seasonal: 1 — AR(2,0) AR(2)_{12}
International airline passengers: monthly totals (Jan 1949–Dec 1960) — Nonseasonal: 1 — MA(1) MA(1)_{12}
Wolfer sunspot number: yearly — 0 — AR(2)
Using the backward shift operator B, we can rewrite Eq. (5) as

\Phi(B) Y_t = \Theta(B) \varepsilon_t,
\Phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p,
\Theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q.   (6)

Suppose Y_t is the output of a stationary ARMA(p,q) process; then the process can be generally described by Eq. (6) and we can use the building blocks of the Box–Jenkins time series modelling approach. Although this approach has often been criticized as unnecessarily complicated and difficult to use, it is indeed a powerful, structural and formal approach to the solution of many forecasting problems [24,45]. Table 1 lists the possible ARMA structures of some real data series after being properly differenced, with or without seasonal components. Seasonal models, however, are beyond the scope of this paper.

Two statistical functions are used for the graphical identification of an ARMA model under the Box–Jenkins methodology: the autocorrelation function (ACF) and the partial autocorrelation function (PACF). For a weakly stationary stochastic process, the first and second moments exist and do not depend on time:

E(Y_1) = E(Y_2) = \cdots = E(Y_t) = \mu,
V(Y_1) = V(Y_2) = \cdots = V(Y_t) = \sigma^2,
\mathrm{Cov}(Y_t, Y_{t-k}) = \mathrm{Cov}(Y_{t+l}, Y_{t-k+l}) = \gamma_k.   (7)

The conditions in the above equations state that the covariances are functions only of the lag k, and not of time; these are usually called autocovariances. From (7) we can easily derive that the autocorrelations, denoted ρ_k, also depend only on the lag:

\rho_k = \frac{\mathrm{Cov}(Y_t, Y_{t+k})}{\sqrt{\mathrm{Var}(Y_t)}\,\sqrt{\mathrm{Var}(Y_{t+k})}} = \frac{E[(Y_t-\mu)(Y_{t+k}-\mu)]}{\sigma^2} = \frac{\gamma_k}{\gamma_0}.   (8)

The autocorrelations considered as a function of k are referred to as the autocorrelation function (ACF), or sometimes the correlogram. Note that since

\gamma_k = \mathrm{Cov}(Y_t, Y_{t-k}) = \mathrm{Cov}(Y_{t-k}, Y_t) = \mathrm{Cov}(Y_t, Y_{t+k}) = \gamma_{-k},   (9)

it follows that γ_k = γ_{-k}, and only the positive half of the ACF is usually given. In practice, we have a finite time series with N observations, and therefore we can only obtain the estimated autocorrelations. If r_k denotes the estimated autocorrelation coefficient, the formula to obtain these parameters is

r_k = \frac{\sum_{t=1}^{n-k} (Y_t - \mu)(Y_{t+k} - \mu)}{\sum_{t=1}^{n} (Y_t - \mu)^2}.   (10)

The partial ACF can be obtained from

w_t = (Y_t - \mu) = \Phi_{1k} w_{t-1} + \Phi_{2k} w_{t-2} + \cdots + \Phi_{kk} w_{t-k} + \varepsilon_t,   (11)

where the last coefficient Φ_kk at each order k is the partial autocorrelation at lag k.
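The estimated ACF of Eq. (10) and the PACF coefficients Φ_kk of Eq. (11) can be computed directly from the data. The following sketch (Python with numpy) is one standard way of doing so, solving Eq. (11) order by order with the Durbin–Levinson recursion; it is an illustration, not the paper's own code:

```python
import numpy as np

def acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag following Eq. (10)."""
    y = np.asarray(y, dtype=float)
    m = y.mean()
    c0 = np.sum((y - m) ** 2)
    return np.array([np.sum((y[:-k] - m) * (y[k:] - m)) / c0
                     for k in range(1, max_lag + 1)])

def pacf(y, max_lag):
    """Partial autocorrelations Phi_kk via the Durbin-Levinson recursion,
    i.e. the last coefficient of the AR(k) fit of Eq. (11) for each k."""
    r = acf(y, max_lag)
    phi = np.zeros((max_lag + 1, max_lag + 1))
    out = np.empty(max_lag)
    phi[1, 1] = out[0] = r[0]
    for k in range(2, max_lag + 1):
        num = r[k - 1] - np.sum(phi[k - 1, 1:k] * r[k - 2::-1][:k - 1])
        den = 1.0 - np.sum(phi[k - 1, 1:k] * r[:k - 1])
        phi[k, k] = num / den
        for j in range(1, k):  # update the lower-order coefficients
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        out[k - 1] = phi[k, k]
    return out
```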
3. Automatic selection of the structure of the ARMA model

On many occasions, a series is not exclusively fitted by one model; rather, various models may fit the series equally well [16]. If we follow the norms of Box and Jenkins, the model chosen is nearly always the simplest one, i.e. the one involving the fewest terms [5]. The methodology presented in this paper was carried out in the following way: the time series of a given model must fulfil a set of rules. Given such a rule set and a time series to be analysed, the series is assigned to one type or another according to the number of rules it fulfils. A large number of rules can be defined for the simplest AR(1) and MA(1) models, but when the models become more complicated, the number of clearly usable rules is much lower. This can give rise to problems. For example, let us assume that a single rule is defined for an AR(2) model and that five are defined for an AR(1) model. It is then possible that an AR(2) time series may fulfil 2 of the 5 rules defined for an AR(1) model, as in most cases these rules are not unequivocal. Of course, the given AR(2) series also fulfils the rule corresponding to its own model. Thus, we will eventually have an AR(2) series that fulfils one rule corresponding to the AR(2) model and
two that correspond to the AR(1) model. Therefore, it will be identified by the program as AR(1) and an erroneous result will be obtained. To overcome this kind of problem, we propose the following. Firstly, a weight is assigned to each rule, so that what is finally measured is not the number of rules fulfilled by the series, but rather the sum of the weights of these rules; this procedure provides more accurate results. Secondly, the number of rules defined for each model is kept approximately the same, so that there is not such a great difference as in the above example. In summary, what we have to achieve is a program that, on the basis of a given set of rules, is capable of assigning weights to each one so that the series analysed can be correctly identified. For this reason genetic algorithms are used to assign weights to the various rules, this being the novel aspect of the current work. In fact, the genetic algorithm can optimize the weights of all the rules within the fuzzy expert system so that the number of misclassifications of the structure of the ARMA model is minimized. As observed in Section 1, using the Box–Jenkins methodology, trial-and-error methods have to be applied in order to obtain the structure of the ARMA model [4,16,34,40]. Fundamentally, a visual examination is made of the estimated ACF and the estimated PACF, from which relevant conclusions are drawn [5,42]. This technique, naturally, requires a great deal of skill and long practice. On the basis of this visual examination, the various possible models are identified. An estimation is made of the Φ and θ coefficients, and a decision is taken, using mathematical tests, regarding which of the estimated models best fits the series. If two models fit the series equally well, the simpler one is chosen. In this paper, however, an automatic process for selecting the structure of the ARMA model is proposed, without the intervention of a human expert. Two main advantages are obtained with the proposed fuzzy expert system: (a) there is no need of a human expert to decide which are the possible models for the time series; (b) with an automatic process, a large number of time series can be processed.

3.1. Initial rules used

Before the learning program starts to test the rules fulfilled by each series, a number of simple tasks must be performed: (1) Obtain the estimated ACF and PACF coefficients and the error criterion. For this purpose, the formulas of the ACF and PACF are used. (2) Check that the series is not white noise. If 95% of the samples fall within the error criterion, the series is white noise, the model is classified as (0,d,0) (where d is the degree of differencing) and no further analysis is undertaken. This check might not be of any use for the learning
program, as what the program receives is a simulated series and not white noise. (3) Exponential fit of the first terms of the ACF and of the PACF. The program attempts to determine whether the shape of the ACF and of the PACF is similar to that of any of the theoretical shapes of the various models described above. To do this, an exponential fit is carried out on the first terms of the ACF and of the PACF, fitting them to an exp(bx) curve. By these means, values of b are obtained for the ACF and the PACF, and these values are used, together with those of the correlation coefficient, to help identify the model. (4) Determine the spikes in the coefficients of autocorrelation and of partial autocorrelation. In fact, if the series is (for example) of the AR(1) type, it will present a spike in the PACF; this is what is determined at this stage of the procedure. As the program is unaware of the type of series presented, it checks the number of ACF and PACF coefficients that are 70% above the error criterion. (5) Estimate Φ and θ. When this step is reached, we still do not know the series type, but it is possible to estimate Φ and θ for various known types of series, so we estimate coefficients for the following models: AR(1), estimating the value of Φ1; AR(2), estimating the values of Φ1 and Φ2; MA(1), estimating the value of θ1; MA(2), estimating the values of θ1 and θ2; and ARMA(1,1), estimating the values of Φ1 and θ1. For example, an AR(2) model has the form

Y_t = \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \varepsilon_t.   (12)

To ensure stationarity, the following conditions must be fulfilled:

\Phi_1 + \Phi_2 < 1, \quad \Phi_2 - \Phi_1 < 1, \quad -1 < \Phi_2 < 1.   (13)

Several methods exist to estimate the autoregressive parameters, such as least squares, Yule–Walker and Burg's method. It can be shown that for large data samples these estimation techniques lead to approximately the same parameter estimates. Mainly for historical reasons, most people use either the Yule–Walker or the least-squares method [9,33,34,36,40]. In order to obtain the values Φ1 and Φ2, it is necessary to solve

\Phi_1 = \frac{r_1 (1 - r_2)}{1 - r_1^2}, \qquad \Phi_2 = \frac{r_2 - r_1^2}{1 - r_1^2}.   (14)
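A minimal sketch of this moment estimator for the AR(2) case, assuming the sample autocorrelations r1 and r2 have already been computed, together with the stationarity check of Eq. (13):

```python
def ar2_from_acf(r1, r2):
    """Moment estimates of an AR(2) model from r1, r2 following Eq. (14)."""
    phi1 = r1 * (1.0 - r2) / (1.0 - r1 ** 2)
    phi2 = (r2 - r1 ** 2) / (1.0 - r1 ** 2)
    # Stationarity conditions of Eq. (13).
    stationary = (phi1 + phi2 < 1) and (phi2 - phi1 < 1) and (-1 < phi2 < 1)
    return phi1, phi2, stationary
```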
In general, for an autoregressive process of order p, AR(p), the following system of equations must be solved:

\begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_{p-1} \\ r_p \end{pmatrix}
=
\begin{pmatrix}
1 & r_1 & r_2 & \cdots & r_{p-2} & r_{p-1} \\
r_1 & 1 & r_1 & \cdots & r_{p-3} & r_{p-2} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
r_{p-2} & r_{p-3} & r_{p-4} & \cdots & 1 & r_1 \\
r_{p-1} & r_{p-2} & r_{p-3} & \cdots & r_1 & 1
\end{pmatrix}
\begin{pmatrix} \Phi_1 \\ \Phi_2 \\ \vdots \\ \Phi_{p-1} \\ \Phi_p \end{pmatrix}   (15)
or, succinctly, RΦ = r. Note that this is a well-posed system (with a square coefficient matrix R), i.e., with the same number of constraints (equations, rows of R) as unknowns (the elements Φ_j of the unknown vector Φ). Further, R is full rank and symmetric, so that invertibility is guaranteed and \hat{\Phi} = R^{-1} r.
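For general p, the system of Eq. (15) can be solved directly. The sketch below (Python with numpy) builds the symmetric Toeplitz matrix R from the sample autocorrelations and returns Φ̂ = R⁻¹r; it is an illustration under the assumption that r holds r_1, …, r_p:

```python
import numpy as np

def yule_walker(r):
    """Solve R * Phi = r (Eq. (15)) for an AR(p) model.
    r: array of sample autocorrelations r_1..r_p."""
    r = np.asarray(r, dtype=float)
    p = len(r)
    col = np.concatenate(([1.0], r[:p - 1]))          # first column of R
    R = np.array([[col[abs(i - j)] for j in range(p)]  # symmetric Toeplitz
                  for i in range(p)])
    return np.linalg.solve(R, r)                       # Phi_hat = R^-1 r
```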
Analysing now an MA process, the simplest one is an MA(1) of the form

Y_t = \varepsilon_t - \theta_1 \varepsilon_{t-1};   (16)

in order to obtain the parameter of the time series, the equation to be solved is

\theta_1^2 + \frac{\theta_1}{r_1} + 1 = 0,   (17)

keeping the single solution that also fulfils the invertibility condition |θ1| < 1. For an MA(2), the equations are more difficult to solve:

r_1 = \frac{-\theta_1 (1 - \theta_2)}{1 + \theta_1^2 + \theta_2^2}, \qquad r_2 = \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2}, \qquad r_k = 0, \; k \ge 3.   (18)
In this way, if (for example) the series is MA(2), the coefficient estimated for the Φ1 case will not fulfil the corresponding mathematical rules and that model will be rejected. (6) Test changes of sign in the ACF and the PACF. This is performed in order to determine which model fits best.
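As a concrete instance of step (5) for the MA(1) case, the following sketch solves the quadratic of Eq. (17) and keeps only the root satisfying the invertibility condition |θ1| < 1; the helper name is ours and it returns None when no invertible root exists:

```python
import numpy as np

def ma1_from_acf(r1):
    """Moment estimate of theta_1 for an MA(1) from Eq. (17):
    theta^2 + theta / r1 + 1 = 0, keeping the invertible root."""
    roots = np.roots([1.0, 1.0 / r1, 1.0])
    real = roots[np.isreal(roots)].real
    invertible = [t for t in real if abs(t) < 1.0]
    return invertible[0] if invertible else None  # None: model rejected
```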
3.2. Rules of the expert system

Once all the above tasks have been accomplished, we can begin to apply the identification rules. Two types of rule are used, the first of which is somewhat subjective: the aim is to determine whether the shapes of the ACF and of the PACF are similar to any of the theoretical shapes of the above-described models. The second rule type is purely mathematical, and tests whether the estimated Φ and θ coefficients fulfil all the requirements. Finally, we have a set of negative rules: if these are fulfilled, then the series does not correspond to the model, and the weights of these rules are subtracted rather than summed, as is done for the other rules. The expert system that is created consists of a total of 43 fuzzy rules (Table 4). The expert system uses the Mamdani inference model, with the minimum operator as T-norm, the centre of area as defuzzifier, and triangular membership functions; this kind of inference process is very common in the bibliography [28,29]. We describe some of these rules, together with the justification for their inclusion.

Rule 1. If the PACF falls more abruptly than the ACF, then the model is AR(p), where p is the number of PACF coefficients immediately above the error criterion. This rule is suggested by the shape of the AR models: in such a model, the ACF falls smoothly, while the PACF falls abruptly. The number of PACF coefficients above the error criterion will be the order of the AR model. To determine this, the program uses the previously obtained b values of the exponential fit. The exponential presenting the most abrupt change is the one with the highest absolute b value, such that if the b calculated for the PACF is greater than that for the ACF, the model will be AR. To determine the order, we examine the number of spikes in the PACF. As the series are not ideal, but have a component of white noise, we only take into consideration the PACF coefficients that are 70% above the error criterion.

Rule 2. If the ACF falls more abruptly than the PACF, then the model is MA(q), where q is the number of ACF coefficients above the error criterion. This rule is the inverse of the previous one, and the explanation is analogous. As before, the series is not an ideal one, and so we only take into consideration the ACF coefficients that are 70% above the error criterion.

Rule 21. If the estimated Φ1 and Φ2 coefficients fulfil the stationarity rules, then the series corresponds to the AR(2) model. The stationarity conditions for this type of series are

\Phi_1 + \Phi_2 < 1, \quad \Phi_2 - \Phi_1 < 1, \quad -1 < \Phi_2 < 1.

The coefficients must fulfil the three requirements simultaneously; otherwise, the series will not be valid. As stated above, the system thus created is made up of 43 rules, each of which is assigned a value or relative weight. The last eight rules attempt to identify mixed models when both AR and MA models are plausible.
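The scoring step that combines the fired rules can be sketched as follows; the rule encoding and the helper are illustrative, not the paper's implementation. Each fired rule adds its weight to the score of the model it points at, and negative (restrictive) rules subtract theirs:

```python
def identify(fired_rules, weights):
    """fired_rules: list of (rule_id, model, is_negative) fulfilled by
    the series; weights: dict rule_id -> weight learned by the GA."""
    scores = {}
    for rule_id, model, is_negative in fired_rules:
        w = weights[rule_id]
        scores[model] = scores.get(model, 0.0) + (-w if is_negative else w)
    # The model with the largest weighted score is the identified structure.
    return max(scores, key=scores.get)

# e.g. identify([(1, "AR(1)", False), (21, "AR(2)", False)], weights)
```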
3.3. Using an evolutionary algorithm to optimize the parameters of the fuzzy system

After having tested which rules are fulfilled by a series, the following step is to identify the model, taking into account the weights assigned. This is done by adding up the weights of the rules fulfilled by the series, taking into account the model to which they refer. In other words, if 5 rules indicate that the series is of the type AR(1), while 3 say it is of type AR(2), we must sum the weights of the 5 rules, on the one hand, and those of the 3 rules, on the other. The larger of these sums then corresponds to the model that is sought. The above procedure must be carried out bearing in mind that some rules present negative weights: some of the rules presented in Table 4 are restrictive, that is, if they are fulfilled then the series is not of a given type, and the weights corresponding to these restrictive rules must be subtracted. A genetic algorithm is used to determine the weight assigned to each rule. The weights range from 0 to 1. The most complex task is the creation of the fitness function, which must perform the following tasks: (1) Using the learning program, evaluate various series of known types, obtaining one or more results for each. (2) Calculate the distances of the possible models obtained for each series from the real model, and store a distance for each series. If various models are possible for a given series, test whether the one assigned the highest weight is at the lowest distance from the real result. If so, there is no penalization; otherwise, what is stored is the average of the distances of the possible models from reality. This is one way of penalizing a surplus of possibilities. (3) Sum the distances thus obtained. The fitness function seeks to minimize this distance. As a maximization algorithm must be applied, the function to be maximized is then
F = \frac{1}{\sum_i d_i + \epsilon},   (19)
where ε is a constant required to avoid an infinite result when all distances are zero, and d_i is the distance from the model obtained for series i to the real model. The distance used is the Euclidean distance; i.e., given two vectors v1 = (x, y, z) and v2 = (a, b, c), the distance between them is
d(v_1, v_2) = \sqrt{(x-a)^2 + (y-b)^2 + (z-c)^2}.   (20)
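A sketch of this fitness function, under the assumption that a model is coded as a vector such as (p, d, q) and that the hypothetical helpers classify(series, weights) and true_model(series) return, respectively, the identified and the real structures:

```python
import numpy as np

EPS = 1e-6  # the constant epsilon of Eq. (19)

def fitness(weights, training_series):
    """Fitness of one candidate weight vector, following Eqs. (19)-(20).
    classify() and true_model() are hypothetical helpers, not defined here."""
    total_dist = 0.0
    for s in training_series:
        guess = np.array(classify(s, weights))   # e.g. (2, 0, 0) for AR(2)
        real = np.array(true_model(s))
        total_dist += np.sqrt(np.sum((guess - real) ** 2))  # Eq. (20)
    return 1.0 / (total_dist + EPS)                          # Eq. (19)
```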
We analysed a large quantity of real series used as benchmarks in time series prediction (http://www.secondmoment.org/time_series.php). Mainly, we used real time series from the Library of Time Series Data, which is
a library of time series data including many major US series as well as series from Japan, Canada and Europe, and from the Macroeconomic Time Series Data, a collection of annual macroeconomic series for the US, Canada, Japan and many European countries. However, because some of the real time series are not previously classified as an ARMA(p,q) model, we have also simulated time series generated from a wide range of coefficient values, chosen from various sub-regions of the parameter space that satisfy the stationarity and invertibility conditions. It is important to note that each of these sub-regions represents a special class of structures with similar autocorrelation and partial autocorrelation functions. Representative sets of coefficient values selected from each of these sub-regions ensure extensive coverage of the permissible parameter space. For each of the simulated series, 1000 data points are generated. Table 2 presents some of the models used, which have also been selected as benchmarks in time series research. For the simplest models (AR(1), MA(1), AR(2), MA(2), ARMA(1,1), ARMA(2,1), ARMA(2,2)), all the coefficients used are presented; however, because there are many possible combinations for complex ARMA(p,q) processes, just one example each of ARMA(3,2) and ARMA(4,2) is presented. Without loss of generality, all the simulated series were generated with μ = 0 and σ² = 1. We have to point out that the ratio of σ² to μ significantly affects the magnitude of the error indices typically used in time series forecasting, RMSE and MAPE, as can be seen from their expressions:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2}, \qquad \mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{Y_t - \hat{Y}_t}{Y_t} \right|.   (21)

The results obtained for the simulations, except in the case of the mixed series, were highly promising: classification errors of 1.2% for AR(p) models and of 12.5% for MA(q) models, over a total of 1500 time series. Therefore, this ARMA(p,q) structure will be used in conjunction with the neural network.

4. Using simultaneously ARMA and neural network models

Although most of the studies comparing ARMA and ANN models reviewed in Section 1 indicate the comparability or superiority of neural network models to the ARMA structure for particular time series, it is questionable whether ANNs can consistently outperform ARMA models in all situations. Therefore, neither is a universal model suitable for all circumstances. The approximation of ARMA models to complex nonlinear problems may not be adequate; on the other hand, using neural networks to model linear problems has yielded mixed results.
Table 2. Different structures and coefficients used for the simulated time series: MA(1), AR(1), AR(2), MA(2), ARMA(1,1), ARMA(1,2), ARMA(2,1), ARMA(2,2), ARMA(3,2) and ARMA(4,2), each with several coefficient sets drawn from the stationary and invertible sub-regions of the parameter space.
The motivation of this work is that real-world time series are rarely purely linear or nonlinear; they often contain both linear and nonlinear terms, and the human expert does not know a priori which model is more suitable. However, if we combine both models into a hybrid system, the forecasting results can be improved. First, the ARMA model tries to obtain the linear relation. This step is performed autonomously (without the intervention of a human expert) using the fuzzy expert system proposed in the previous section. After that, the neural network forecasts the residual, i.e., the error
between the linear output obtained by the ARMA system and the real output. Most time series can be decomposed into a linear and a nonlinear component in the following way [48]:

Y(t) = A(t) + F(f(t)),   (22)

where A(t) is the linear component and F(f(t)) represents the nonlinear component. The linear component can be captured using the ARMA system developed in Section 3. However, the forecasting error (the residual) arises because the nonlinear component cannot be
represented by the ARMA model. Therefore, the difference between the time series forecast produced by the ARMA model and the real time series will be the input to the neural network. It is important to note that, in order to obtain the linear model, the first step is to determine whether a natural log transformation of the series is necessary. This step is accomplished using the likelihood ratio: the likelihood is greater for the transformed series than for the raw series when

\log\!\left(\frac{1}{n}\sum_i \left(\log(Y_i) - \overline{\log(Y)}\right)^2\right) < \log\!\left(\frac{1}{n}\sum_i (Y_i - \bar{Y})^2\right) - \frac{2}{n}\sum_i \log(Y_i).   (23)
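A sketch of this test, assuming the reconstruction of Eq. (23) given above (the log variance of the logged series, corrected by the Jacobian term, compared against the log variance of the raw series):

```python
import numpy as np

def needs_log_transform(y):
    """Return True when Eq. (23) favours the log-transformed series."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    ly = np.log(y)
    # Log variance of the logged series plus the Jacobian term (2/n) * sum(log Y_i).
    lhs = np.log(np.mean((ly - ly.mean()) ** 2)) + (2.0 / n) * np.sum(ly)
    rhs = np.log(np.mean((y - y.mean()) ** 2))
    return lhs < rhs
```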
Fig. 1. Different time series forecasting procedures: (A) automatic forecasting with ANN; (B) traditional ARMA modelling procedure using the Box–Jenkins methodology; (C) proposed hybrid method that combines both ARMA and ANN models.
If the inequality is fulfilled, it is necessary to take the log transform. Therefore, the proposed hybrid methodology consists of two steps: in the first step, an ARMA model is used to analyse the linear component of the time series, and in the second step a neural network model is developed to model the residuals from the ARMA model. Fig. 1 shows a block diagram of the different time series modelling procedures. To design the ANN, a fully connected multilayer perceptron (MLP) with two hidden layers and one output layer has been used. To configure the structure of the MLP, we need to specify the number of
hidden layers, the number of hidden neurons in each layer, and the type of activation function for each hidden neuron. Another important task in ANN modelling of a time series is the selection of the number of lagged observations, i.e., the dimension of the input vector. This is perhaps the most important parameter to be estimated in an ANN model because it plays a major role in determining the (nonlinear) autocorrelation structure of the time series. In fact, when modelling nonlinear complex systems using neural networks, the number of inputs and the number of hidden nodes have to be restricted to avoid over-fitting [27].
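The two-stage procedure can be sketched as follows; arma_forecast and mlp are stand-ins (the latter could be, e.g., sklearn's MLPRegressor), not the exact models used in the paper:

```python
import numpy as np

def lagged_matrix(e, lags):
    """Rows are (e_{t-1}, ..., e_{t-lags}); the target is e_t."""
    X = np.column_stack([e[lags - k:len(e) - k] for k in range(1, lags + 1)])
    return X, e[lags:]

def hybrid_forecast(y, arma_forecast, mlp, lags):
    """Step 1: linear (ARMA) prediction; step 2: an MLP models the residual.
    arma_forecast is assumed to return one-step predictions aligned with y."""
    linear = arma_forecast(y)
    residual = y - linear              # nonlinear component left to model
    X, target = lagged_matrix(residual, lags)
    mlp.fit(X, target)                 # e.g. sklearn.neural_network.MLPRegressor
    return linear[lags:] + mlp.predict(X)
```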
Fig. 2. Lorenz time series (prediction step = 1): (a) original and predicted Lorenz time series using only ANN; (b) prediction error.
The selection of inputs is an important pre-processing stage, and a commonly used method is to try different combinations of the system input and output variables with time lags and choose the one giving the minimum prediction error [1,3]. This method becomes increasingly difficult when applied to multivariable large-scale dynamic systems. In cases where there is little prior knowledge of the system that could be used to restrict the input dimension, one has to resort to automatic methods for the selection of relevant input variables. In nonlinear modelling, there are no fully general criteria for making such choices, even though many papers have been published on how to tackle the problem; the approaches can be categorized into two groups, i.e., model-based methods and model-free methods [1,3,6,11,13,15,25,47]. The model-based approach is widely used in input selection, and it typically involves the selection of the neural structure (number of neurons, hidden layers, nonlinear function, etc.); the computational complexity of these methods tends to be extremely high. Model-free methods are based on performing statistical tests or estimating properties of the functions between subsets of input variables and the desired output(s) using only the available data. For many model-free methods, however, in order to estimate the interaction of system variables, some independence assumptions are made on the input variables. These techniques are, moreover, often based on an assumption of linearity of the problem at hand, while the use of a nonlinear modelling tool, such as neural networks, is mainly motivated by the presence of nonlinearities. To solve this problem, we have used the inputs proposed for the linear model in order to identify and forecast the residual. This is, of course, a tentative structure for the neural network.
Fig. 3. Lorenz time series (prediction step = 1): (a) original and predicted Lorenz time series using the proposed methodology (which are indistinguishable); (b) prediction error.
5. Simulation results
Once the forecasting methodology has been established, one can proceed with intensive experimental studies. In this section, we provide numerical examples to evaluate the advantages and the effectiveness of the proposed approach; these include the Lorenz attractor and the Mackey–Glass series.
5.1. Lorenz attractor time series

The Lorenz attractor time series was generated by solving the Lorenz equations

\frac{dx_1(t)}{dt} = \sigma (x_2(t) - x_1(t)), \qquad \frac{dx_2(t)}{dt} = \rho x_1(t) - x_2(t) - x_1(t) x_3(t), \qquad \frac{dx_3(t)}{dt} = -b x_3(t) + x_1(t) x_2(t),   (24)

where the parameters are set at the standard values σ = 10, ρ = 28 and b = 8/3. Solutions to this system of three differential equations exhibit the sensitive dependence on initial conditions that is characteristic of chaotic dynamics. In realistic situations, knowledge of the true state of a system can be obtained only with finite precision; in such cases, sensitivity to initial conditions rules out long-term prediction. On the other hand, short-term prediction is possible to the extent that the current position can be estimated and the dynamics can be approximated. A long trajectory of the Lorenz attractor (1500 points) was generated using a differential equation solver (Runge–Kutta method) with a step size of 0.05 to create a univariate time series x1(t). The data were split into two parts: 1127 points were used for training and the remaining 361 for assessing the generalization capability of the network. We used the root mean squared error (RMSE) and the correlation coefficient as performance measures. Fig. 2 shows the Lorenz time series forecasting, with predicted and desired values as dashed and continuous lines, respectively, using only the ANN. Fig. 3 shows the prediction of the Lorenz attractor using the proposed methodology (an ARMA(3,1) model and an MLP with two hidden layers, sigmoidal activation functions and four input variables). As the two curves are practically identical, the difference can only be seen on a finer scale (Fig. 3b). The error indices, the root mean square error and the correlation coefficient, for this simulation were 0.176 and 0.998 for the model using only the ANN, and 0.0876 and 0.9996 for the proposed algorithm.
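A sketch of the data generation, integrating Eq. (24) with a fourth-order Runge–Kutta scheme at step size 0.05 and keeping x1(t); the initial condition is an assumption:

```python
import numpy as np

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations, Eq. (24)."""
    x1, x2, x3 = state
    return np.array([sigma * (x2 - x1),
                     rho * x1 - x2 - x1 * x3,
                     -beta * x3 + x1 * x2])

def lorenz_series(n=1500, h=0.05, state=(1.0, 1.0, 1.0)):
    """Fourth-order Runge-Kutta integration; returns the x1(t) series."""
    state = np.asarray(state, dtype=float)
    out = np.empty(n)
    for i in range(n):
        k1 = lorenz(state)
        k2 = lorenz(state + 0.5 * h * k1)
        k3 = lorenz(state + 0.5 * h * k2)
        k4 = lorenz(state + h * k3)
        state = state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out[i] = state[0]
    return out
```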
Fig. 4. Results of predicting the Lorenz attractor: RMSE as a function of the prediction step, for training and test data.
Fig. 5. Results of predicting the Lorenz attractor: correlation coefficient as a function of the prediction step.
It is important to note that other approaches have appeared in the bibliography: for example, Iokibe et al. [17] obtain an RMSE of 0.244 and Jang et al. [18] an RMSE of 0.143, using fuzzy and neuro-fuzzy systems. Figs. 4 and 5 show the results of predicting the Lorenz attractor time series as the prediction step changes (RMSE and correlation coefficient). As expected, the greater the prediction step, the lower the correlation coefficient and the greater the RMSE.

5.2. Mackey–Glass chaotic time series
Fig. 6. Mackey–Glass time series (prediction step = 6): (a) original and predicted Mackey–Glass time series using only ANN; (b) prediction error.
The chaotic Mackey–Glass differential delay equation is recognized as a benchmark problem that has been used and reported by a number of researchers for comparing the learning and generalization ability of different neural architectures [46], fuzzy systems and genetic algorithms [20,44]. The Mackey–Glass chaotic time series is generated from the following delay differential equation:

\frac{dx(t)}{dt} = \frac{a x(t - \tau)}{1 + x^{10}(t - \tau)} - b x(t).   (25)

When τ > 17, the equation shows chaotic behaviour, and higher values of τ yield higher-dimensional chaos. To make the comparisons with earlier work fair, we chose the parameters n = 4 and P = 6. One thousand data points were generated with an initial condition x(0) = 1.2 and τ = 17, based on the fourth-order Runge–Kutta method. Figs. 6 and 7 show a section of the 1000 sample points used in our study. The first 500 data pairs of the series were used as training data, while the remaining 500 were used to validate the identified model. Fig. 6 shows the predicted and desired values using only the ANN for both training and checking data. Fig. 7 shows the forecasting using the hybrid methodology (an ARMA(5,1) model and an MLP with two hidden layers, sigmoidal activation functions and six input variables).
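A sketch of the series generation; the parameters a = 0.2 and b = 0.1 are the values commonly used for this benchmark and are an assumption here (the paper does not list them), and a simple fixed-step Euler scheme stands in for the fourth-order Runge–Kutta integration used in the paper:

```python
import numpy as np

def mackey_glass(n=1000, tau=17, a=0.2, b=0.1, h=1.0, x0=1.2):
    """Integrate Eq. (25) with the delayed value taken from stored history."""
    hist = int(tau / h)
    x = np.full(n + hist, x0)            # constant history before t = 0
    for t in range(hist, n + hist - 1):
        delayed = x[t - hist]            # x(t - tau)
        dx = a * delayed / (1.0 + delayed ** 10) - b * x[t]
        x[t + 1] = x[t] + h * dx         # Euler step (RK4 in the paper)
    return x[hist:]
```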
Fig. 7. Mackey–Glass time series (prediction step = 6): (a) original and predicted Mackey–Glass time series using the proposed methodology; (b) prediction error.
The error indices, the root mean square error and the correlation coefficient, for this simulation were 0.0122 and 0.998 for the model using only the ANN, and 0.0027 and 1 for the proposed algorithm. Table 3 compares the prediction accuracy of different computational paradigms presented in the bibliography for this benchmark problem (including our proposal), covering various fuzzy system structures, neural systems and genetic algorithms. The data are taken from [18,20,22].

Table 3. Comparison of the prediction error (RMSE) of different methods for a prediction step equal to 6 (500 training data).

Method — Prediction error (RMSE)
Autoregressive model — 0.19
Cascade correlation NN — 0.06
Back-propagation NN — 0.02
Sixth-order polynomial — 0.04
Linear predictive method — 0.55
Kim and Kim (genetic algorithm and fuzzy system), 5 MFs — 0.049206
Kim and Kim, 7 MFs — 0.042275
Kim and Kim, 9 MFs — 0.037873
J.S.R. Jang et al. (ANFIS and fuzzy system) — 0.007
Wang et al., product T-norm — 0.0907
Wang et al., min T-norm — 0.0904
Classical neural network using RBF — 0.0114
Our approach — 0.0025
5.3. Daily brightness of a variable star

The previous subsections dealt with time series data generated from mathematical expressions and characterized by chaotic behaviour. However, a very interesting problem is the prediction of data that are not generated from a mathematical expression, and thus not governed by a known determinism, but instead come from time series based on real data. For this purpose, we used a natural phenomenon: the brightness of a variable star at midnight on 600 successive nights.
The first 300 data pairs of the series were used as training data, while the remaining 300 were used to validate the identified model. Fig. 8 shows the predicted and desired values using only the ANN for both training and checking data; the error indices, the root mean square error and the correlation coefficient, for this simulation were 1.438 and 0.945. Fig. 9 shows the forecasting using the hybrid methodology, where the error indices are 0.868 and 0.986 for the root mean square error and the correlation coefficient, respectively (an ARMA(5,0) model and an MLP with two hidden layers, sigmoidal activation functions and five input variables). Figs. 10 and 11 show the results of predicting the daily brightness of the variable star as the prediction step changes (RMSE and correlation coefficient). As expected, the greater the prediction step, the lower the correlation coefficient and the greater the RMSE.
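The sweep behind Figs. 10 and 11 (and, analogously, Figs. 4 and 5) can be sketched as follows; forecast(y_train, y_test, step) is a hypothetical wrapper around the hybrid model:

```python
import numpy as np

def horizon_sweep(y_train, y_test, steps=range(1, 11)):
    """Evaluate RMSE and correlation for each prediction horizon."""
    results = {}
    for s in steps:
        pred = forecast(y_train, y_test, step=s)   # hypothetical helper
        target = y_test[s:]
        pred = pred[:len(target)]
        err = target - pred
        rmse = np.sqrt(np.mean(err ** 2))
        corr = np.corrcoef(target, pred)[0, 1]
        results[s] = (rmse, corr)
    return results
```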
Table 4. Rules used by the expert system for the ARMA model structure.

1. If the PACF falls more abruptly than the ACF, then the model is AR(P), P being the number of PACF coefficients above the error criterion.
2. If the ACF falls more abruptly than the PACF, then the model is MA(Q), Q being the number of ACF coefficients above the error criterion.
3. If both PACF and ACF fall with the same abruptness, the model is ARMA(P,Q), P and Q being the number of PACF and ACF coefficients above the error criterion.
4. If the ACF cuts off suddenly, the process is MA(Q), where Q is the number of ACF coefficients above the error criterion.
5. If both PACF and ACF fall with exactly the same abruptness, the model is ARMA(1,1).
6. If the number of coefficients above the error criterion in the ACF is less than in the PACF, the model is MA(Q), where Q is the number of ACF coefficients above the error criterion.
7. If the number of coefficients above the error criterion in the PACF is less than in the ACF, the model is AR(P), where P is the number of PACF coefficients above the error criterion.
8. If the number of coefficients above the error criterion in the ACF is equal to the number above the same criterion in the PACF, the series is ARMA(P,Q), P and Q being the number of PACF and ACF coefficients above the error criterion, respectively.
9. If the ACF sign changes, the series is AR(3).
10. If the correlation coefficient of the ACF is greater than that of the PACF, the model is AR(1).
11. If the estimated Φ1 coefficient for the AR(1) model fulfils the stationarity conditions, the series is AR(1).
12. If the series is AR(1), the first estimated autocorrelation coefficient will be approximately equal to the first estimated partial autocorrelation coefficient.
13. If the estimated Φ1 coefficient is positive and the PACF peak is positive, the series is AR(1).
14. If the estimated Φ1 coefficient is negative and the PACF peak is negative, the series is AR(1).
15. If the b from the PACF exponential fit is zero, the series is AR(1).
16. If the estimated θ1 coefficient for the model fulfils the invertibility requirements, the series is MA(1).
17. If only the first autocorrelation coefficient is non-null, the series is MA(1).
18. If the estimated θ1 coefficient is negative, the ACF peak is positive and the PACF changes sign starting from a positive sign, the process is MA(1).
19. If the estimated θ1 coefficient is positive, the ACF peak is negative and the PACF does not change sign starting from a negative sign, the process is MA(1).
20. If the b from the ACF exponential fit is zero, the series is MA(1).
21. If the estimated Φ1 and Φ2 coefficients fulfil the stationarity conditions, then the series is AR(2).
22. If the series is AR(2), the first two autocorrelation coefficients must fall within the area enclosed by the curves |r_1| < 1, |r_2| < 1, r_1^2 < (r_2 + 1)/2.
23. If the ACF changes sign, the process is AR(2).
24. If there are only two PACF coefficients with absolute value above the error criterion, the process is AR(2).
25. If the estimated θ1 and θ2 coefficients fulfil the invertibility conditions, then the process is MA(2).
26. If there are only two ACF coefficients with absolute value above the error criterion, the process is MA(2).
27. If the model is MA(2), r_1 and r_2 must fall within the area enclosed by r_1 + r_2 = -0.5, r_2 - r_1 = -0.5, r_1^2 = 4 r_2 (1 - 2 r_2).
28. If the PACF changes sign, the process is MA(2).
29. If the estimated Φ1 and θ1 coefficients fulfil the stationarity and invertibility conditions, then the series is ARMA(1,1).
30. If rules 11 and 16 are fulfilled, the model is ARMA(1,1).
31. If rule 11 is not fulfilled, the series is not AR(1).
32. If rule 16 is not fulfilled, the series is not MA(1).
33. If rule 21 is not fulfilled, the series is not AR(2).
34. If rule 25 is not fulfilled, the series is not MA(2).
35. If rule 29 is not fulfilled, the series is not ARMA(1,1).
Table 4 (continued)

36. If the probability of the series being an AR(P) is medium and the probability of being an MA(Q) is small, then the series is AR(P).
37. If the probability of the series being an AR(P) is high and the probability of being an MA(Q) is small, then the series is AR(P).
38. If the probability of the series being an AR(P) is small and the probability of being an MA(Q) is medium, then the series is MA(Q).
39. If the probability of the series being an AR(P) is medium and the probability of being an MA(Q) is medium, then the series is ARMA(P,Q).
40. If the probability of the series being an AR(P) is high and the probability of being an MA(Q) is medium, then the series is ARMA(P,Q).
41. If the probability of the series being an AR(P) is small and the probability of being an MA(Q) is high, then the series is MA(Q).
42. If the probability of the series being an AR(P) is medium and the probability of being an MA(Q) is high, then the series is ARMA(P,Q).
43. If the probability of the series being an AR(P) is high and the probability of being an MA(Q) is high, then the series is ARMA(P,Q).
Fig. 8. Daily brightness of a variable star time series (prediction step = 5): (a) original and predicted time series using only ANN; (b) prediction error.
Fig. 9. Daily brightness of a variable star time series (prediction step = 5): (a) original and predicted time series using the proposed methodology; (b) prediction error.
6. Conclusion

Even after more than 20 years, the Box–Jenkins approach for time series forecasting using linear ARMA models is still frequently discussed in the literature and has been used as a benchmark to evaluate new modelling approaches. This is especially the case in the neural computing community. The results reported in many of these studies and applications are frequently benchmarked against those
produced by linear ARMA models. The drawbacks of the linear methods, as well as the development of artificial intelligence, have led to the development of alternative solutions using nonlinear modelling. Two of the forecasting techniques that allow for the detection and modelling of nonlinear data are rule induction and neural networks. In this paper, a hybrid methodology is proposed for time series forecasting that combines linear ARMA models and nonlinear models such as artificial neural networks.
It is important to note that obtaining the structure of the ARMA model is a problem in itself; in order to develop an automatic system that does not require the direct intervention of a human expert in the Box–Jenkins methodology, a fuzzy expert system was designed to acquire the structure of the ARMA model. The justification for automatic ARMA modelling is the following: (a) the method for building an ARMA model is somewhat complex and requires a deep knowledge of the method; (b) consequently, building an ARMA model is often a difficult task for the user, requiring training in statistical analysis, a good knowledge of the field of application, and the availability of an easy-to-use but versatile specialized computer program; (c) the number of series to be analysed is often large. It is worth mentioning that the results of the automatic system presented, using real data sets and synthetic time series generated by differential equations, indicate that the proposed hybrid methodology improves the error indices achieved by either of the models used separately.
Fig. 10. Results of predicting the daily brightness of the variable star time series: RMSE as a function of the prediction step, for training and checking data.

Fig. 11. Results of predicting the daily brightness of the variable star time series: correlation coefficient as a function of the prediction step.
Acknowledgement
This work has been partially supported by the Spanish projects TIN2004-01419 and TIN2007-60587.
References
[1] U. Anders, O. Korn, Model selection in neural networks, Neural Networks 12 (1999) 309–323.
[2] I. Beliaev, R. Kozma, Time series prediction using chaotic neural networks on the CATS benchmark, Neurocomputing, in press (available online 22 February 2007).
[3] B.V. Bonnlander, A.S. Weigend, Selecting input variables using mutual information and nonparametric density estimation, in: Proc. 1994 Int. Symp. on Artificial Neural Networks (ISANN'94), Tainan, Taiwan, 1994, pp. 42–50.
[4] G.E.P. Box, G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, 1970.
[5] G.E.P. Box, G.M. Jenkins, G.C. Reinsel, Time Series Analysis: Forecasting and Control, Prentice-Hall, Englewood Cliffs, NJ, 1994.
[6] G. Castellano, A.M. Fanelli, M. Pelillo, An iterative pruning algorithm for feedforward neural networks, IEEE Trans. Neural Networks 8 (1997) 519–531.
[7] W.-S. Chan, A comparison of some pattern identification methods for order determination of mixed ARMA models, Stat. Probab. Lett. 42 (1999) 69–79.
[8] Y. Chen, B. Yang, J. Dong, Time-series prediction using a local linear wavelet neural network, Neurocomputing 69 (4–6) (2006) 449–465.
[9] B. Choi, A recursive algorithm for solving the spatial Yule–Walker equations of causal spatial AR models, Stat. Probab. Lett. 33 (3) (1997) 241–251.
[10] P. Doganis, A. Alexandridis, P. Patrinos, H. Sarimveis, Time series sales forecasting for short shelf-life food products based on artificial neural networks and evolutionary computing, J. Food Eng. 75 (2) (2006) 196–204.
[11] C.L. Giles, C.W. Omlin, Pruning recurrent neural networks for improved generalization performance, IEEE Trans. Neural Networks 5 (1994) 848–851.
[12] J.V. Hansen, R.D. Nelson, Neural networks and traditional time series methods: a synergistic combination in state economic forecasts, IEEE Trans. Neural Networks 8 (1997) 863–873.
[13] X. He, H. Asada, A new method for identifying orders of input/output models for nonlinear dynamic systems, in: Proc. Am. Control Conf., San Francisco, CA, 1993, pp. 2520–2523.
[14] L.J. Herrera, H. Pomares, I. Rojas, A. Guillen, O. Valenzuela, A. Prieto, TaSe model for long-term time series forecasting, Lecture Notes Comput. Sci. 3512 (2005) 1027–1034.
[15] G.-B. Huang, P. Saratchandran, N. Sundararajan, A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation, IEEE Trans. Neural Networks 16 (2005) 57–67.
[16] H.B. Hwang, Insights into neural-network forecasting of time series corresponding to ARMA(p,q) structures, Omega 29 (2001) 273–289.
[17] T. Iokibe, Y. Fujimoto, M. Kanke, S. Suzuki, Short-term prediction of chaotic time series by local fuzzy reconstruction method, J. Intell. Fuzzy Syst. 5 (1997) 3–21.
[18] J.S.R. Jang, C.T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice-Hall, Englewood Cliffs, NJ, 1997.
[19] W.C. Jhee, J.K. Lee, Performance of neural networks in managerial forecasting, Intell. Syst. Account. Financ. Manage. 2 (1993) 55–71.
[20] D. Kim, C. Kim, Forecasting time series with genetic fuzzy predictor ensemble, IEEE Trans. Fuzzy Syst. 5 (4) (1997) 523–535.
[21] A. Lapedes, R. Farber, Nonlinear signal processing using neural networks: prediction and system modelling, Report No. LA-UR-87-2662, Los Alamos National Laboratory, Los Alamos, NM, 1987.
[22] S.-H. Lee, I. Kim, Time series analysis using fuzzy learning, in: Proc. Int. Conf. on Neural Information Processing, Seoul, Korea, vol. 6, 1994, pp. 1577–1582.
[23] S. Makridakis, A. Anderson, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen, R. Winkler, The Forecasting Accuracy of Major Time Series Methods, Wiley, New York, 1984.
[24] S. Makridakis, S.C. Wheelwright, R.J. Hyndman, Forecasting: Methods and Applications, third ed., Wiley, New York, 1998.
[25] K.Z. Mao, S.A. Billings, Variable selection in nonlinear systems modelling, Mech. Syst. Signal Process. 13 (2) (1999) 240–255.
[26] A.L.I. Oliveira, S.R.L. Meira, Detecting novelties in time series through neural networks forecasting with robust confidence intervals, Neurocomputing 70 (1–3) (2006) 79–92.
[27] J.C. Principe, N.R. Euliano, W.C. Lefebvre, Neural and Adaptive Systems: Fundamentals Through Simulations, Wiley, New York, 1999.
[28] I. Rojas, O. Valenzuela, M. Anguita, A. Prieto, Analysis of the operators involved in the definition of the implication functions and in the fuzzy inference process, Int. J. Approx. Reason. 19 (3–4) (1998) 367–389.
[29] I. Rojas, J. Ortega, F.J. Pelayo, A. Prieto, Statistical analysis of the main parameters in the fuzzy inference process, Fuzzy Sets Syst. 102 (2) (1999).
[30] I. Rojas, H. Pomares, J. Gonzalez, et al., Analysis of the functional block involved in the design of radial basis function networks, Neural Process. Lett. 12 (1) (2000) 1–17.
[31] I. Rojas, H. Pomares, J.L. Bernier, J. Ortega, B. Pino, F.J. Pelayo, A. Prieto, Time series analysis using normalized PG-RBF network with regression weights, Neurocomputing 42 (1–4) (2002) 267–285.
[32] I. Rojas, F. Rojas, H. Pomares, O. Valenzuela, et al., The synergy between classical and soft-computing techniques for time series prediction, Lecture Notes Comput. Sci. 2972 (2004) 30–39.
[33] P. Shaman, R. Stine, The bias of autoregressive coefficient estimators, J. Am. Stat. Assoc. 83 (403) (1988) 842–848.
[34] I. Silva, M.E. Silva, Asymptotic distribution of the Yule–Walker estimator for INAR(p) processes, Stat. Probab. Lett. 76 (15) (2006) 1655–1663.
[35] R.C. Souza, A.C. Neto, A bootstrap simulation study in ARMA(p,q) structures, J. Forecasting 15 (1996) 343–353.
[36] P. Stoica, B. Friedlander, T. Söderström, A high-order Yule–Walker method for estimation of the AR parameters of an ARMA model, Syst. Control Lett. 11 (2) (1988) 99–105.
[37] Z. Tang, P.A. Fishwick, Back-propagation neural nets as models for time-series forecasting, ORSA J. Comput. 5 (4) (1993) 374–385.
[39] Z. Tang, C. de Almeida, P. Fishwick, Time series forecasting using neural networks vs. Box–Jenkins methodology, Simulation 57 (5) (1991) 303–310.
[40] M. Tismenetsky, Some properties of solutions of Yule–Walker type equations, Linear Algebra Appl. 173 (1992) 1–17.
[41] R. Tsay, G.C. Tiao, Consistent estimates of autoregressive parameters and extended sample autocorrelation function for stationary and nonstationary ARMA models, J. Am. Stat. Assoc. 79 (385) (1984) 84–96.
[42] F.M. Tseng, H. Yu, G. Tzeng, Combining neural network model with seasonal time series ARIMA model, Technol. Forecast. Soc. Change 69 (2002) 71–87.
[43] J.H. Wang, J.Y. Leu, Stock market trend prediction using ARIMA-based neural networks, in: Proceedings of the 1996 IEEE International Conference on Neural Networks, vol. 4, Washington, DC, USA, 3–6 June 1996, pp. 2160–2165.
[44] L.X. Wang, J.M. Mendel, Generating fuzzy rules by learning from examples, IEEE Trans. Syst. Man Cybern. 22 (6) (1992) 1414–1427.
[45] D.K. Wedding II, K.J. Cios, Time series forecasting by combining RBF networks, certainty factors, and the Box–Jenkins model, Neurocomputing 10 (3) (1996) 149–168.
[46] B.A. Whitehead, T.D. Choate, Cooperative-competitive genetic evolution of radial basis function centers and widths for time series prediction, IEEE Trans. Neural Networks 7 (4) (1996) 869–880.
[47] D.L. Yu, J.B. Gomm, D. Williams, Neural model input selection from a MIMO chemical process, Eng. Appl. Artif. Intell. 13 (2000) 12–15.
[48] G.P. Zhang, Time series forecasting using a hybrid ARIMA and neural network model, Neurocomputing 50 (2003) 159–175.

Ignacio Rojas received the B.Sc. degree in electronic physics in 1992 and the Ph.D. degree in Intelligent Systems with honors in 1996, both from the University of Granada, Spain. He has been a visiting professor at the Department of Electrical Engineering of the University of Dortmund, Germany (1993–1995, 2001), a visiting researcher with the BISC Group of Prof. L. Zadeh at the University of California, Berkeley (1998), and a visiting professor at the Department of Computer Science of the University of Genova, Italy (2003). He is currently an Associate Professor with the Department of Computer Architecture and Computer Technology, University of Granada. He is a member of the IEEE Computational Intelligence Society and Secretary of the IEEE Spanish Regional Interest Group of the Neural Networks Council. His research interests include hybrid systems, hardware–software implementation, the combination of intelligent systems for adaptive control, self-organizing neuro-fuzzy systems, neural networks, time series forecasting, data mining and architectures for complex optimization problems.

Olga Valenzuela received the M.Sc. degree in mathematics in 1995 and the Ph.D. degree in 2003, both from the University of Granada, Spain. She has been with the Department of Statistics of the University of Jaén, Spain, and an invited researcher at the Department of Computer and Information Science of the University of Genova, Italy. Her current research interests include optimization theory and applications, neural networks, time series forecasting using linear and non-linear methods, and evolutionary computation.

Fernando Rojas was born in 1978. He received the M.Sc. degree in Computer Engineering in 1995 and the Ph.D. degree in 2003, both from the University of Granada, Spain. He is currently an associate lecturer with the Department of Computer Architecture and Technology at the University of Granada. His current research interests are in the fields of function approximation, signal processing and time series prediction.
Alberto Guillén was born in 1979. He received the M.A.Sc. degree in Computer Science in 2002 and the M.Sc. degree in 2004 from the University of Granada, Spain. He is currently an associate lecturer with the Department of Computer Science at the University of Jaén. His current research interests are in the fields of clustering, function approximation and parallel genetic algorithms.
Luis Javier Herrera was born in 1978. He received the M.Sc. degree in Computer Engineering in 1995 from the University of Granada, Spain. He is currently an associate lecturer with the Department of Computer Architecture and Technology at the University of Granada. His current research interests are in the fields of function approximation, self-organizing fuzzy interpretable models and time series prediction.

Hector Pomares received the M.Sc. degree in Electronics Engineering in 1995, the M.Sc. degree in Physics and Electronics in 1997, and the Ph.D. degree in Intelligent Systems with honors in 2000, all from the University of Granada. At present, he is an associate professor at the Department of Computer Architecture and Computer Technology. His current research interests are in the fields of fuzzy approximation, neural networks and genetic algorithms, and radial basis function design, applied to time series prediction, bioinformatics and intelligent control.
Maria Luisa Marquez was born in 1952. She received the M.Sc. degree in mathematics and the Ph.D. degree, both from the University of Granada, Spain. She also holds an M.Sc. degree in Statistics and Operational Research from the University of Granada. She is currently a full professor with the Department of Applied Mathematics at the University of Granada. Her current research interests are in the fields of electoral systems, proportional representation, electoral formulas, intelligent systems and time series prediction. She has also been a member of the scientific committee of the VII International Congress APEGA and a member of the commission for the adaptation of the Architecture curriculum at the University of Granada. She has published at both the national and international level, in venues such as Revista de las Cortes Generales, Epsilon, Annals of Operations Research, Lecture Notes in Pure and Applied Mathematics, UNO and EGA. She is currently a member of the commission for the creation of the academic research institute "City and Patrimony" of the University of Granada. Moreover, she participates in R&D projects such as PS910121, PFECCS94-001, FQM191, TIN2004-01419 and FQM1969.
Miguel Pasadas received the M.Sc. degree in mathematics and the Ph.D. degree, both from the University of Granada, Spain. He is currently a professor with the Department of Applied Mathematics at the University of Granada. His current research interests are in the fields of approximation by variational methods and its applications to the approximation of curves and surfaces, interpolation and smoothing splines, finite elements, and time series.