Hybridization of the probabilistic neural networks with feed-forward neural networks for forecasting

Engineering Applications of Artificial Intelligence 25 (2012) 1277–1288

Mehdi Khashei*, Mehdi Bijari
Department of Industrial Engineering, Isfahan University of Technology, Isfahan, Iran

Article history: Received 9 May 2011; received in revised form 3 October 2011; accepted 23 January 2012; available online 7 February 2012.

Abstract

Feed-forward neural networks (FFNNs) are among the most important neural networks and can be applied to a wide range of forecasting problems with a high degree of accuracy. Several large-scale forecasting competitions involving a large number of commonly used time series forecasting models conclude that combining forecasts from more than one model often leads to improved performance, especially when the models in the ensemble are quite different. In the literature, several hybrid models have been proposed by combining different time series models. In this paper, in contrast to the traditional hybrid models, a novel hybridization of feed-forward neural networks (FFNNs) with probabilistic neural networks (PNNs) is proposed in order to yield more accurate results than traditional feed-forward neural networks. In the proposed model, the estimated values of the FFNN model are modified based on the distinguished trend of its residuals and an optimum step length, which are obtained from a probabilistic neural network and a mathematical programming model, respectively. Empirical results with three well-known real data sets indicate that the proposed model can be an effective way to construct a more accurate hybrid model than FFNN models. Therefore, it can be applied as an appropriate alternative for forecasting tasks, especially when higher forecasting accuracy is needed.

Keywords: Feed-forward neural networks (FFNNs); Probabilistic neural networks (PNNs); Time series forecasting; Hybrid models

1. Introduction

Feed-forward neural networks (FFNNs) are flexible computing frameworks and universal approximators that can be applied to a wide range of forecasting problems with a high degree of accuracy. Several distinguishing features of feed-forward neural networks make them valuable and attractive for forecasting tasks. First, feed-forward neural networks are data-driven, self-adaptive methods that require few a priori assumptions about the models for the problems under study. Second, feed-forward neural networks can generalize. Third, feed-forward neural networks are universal functional approximators that can approximate a large class of functions with a high degree of accuracy, and finally, feed-forward neural networks are nonlinear. Given these advantages, it is not surprising that this methodology has attracted overwhelming attention in many areas of prediction, especially financial market prediction (Khashei, 2005).

* Corresponding author. Tel.: +98 311 3912550-1; fax: +98 311 3915526. E-mail address: [email protected] (M. Khashei).
doi:10.1016/j.engappai.2012.01.019

Improving forecasting accuracy, especially in time series forecasting, is an important yet often difficult task facing forecasters. Both theoretical and empirical findings have indicated that integration of different models can be an effective way of improving upon their predictive performance, especially when the models in the ensemble are quite different. The basic idea of model combination in forecasting is to use the unique feature of each model to capture different patterns in the data. In addition, a single model may not be sufficient to identify all the characteristics of the time series, or may not identify the true data generating process. In combined models, the aim is to reduce the risk of using an inappropriate model by combining several models, thereby reducing the risk of failure and obtaining results that are more accurate (Khashei and Bijari, 2011). Much effort has been devoted to developing and improving hybrid models of artificial neural networks (ANNs), and especially feed-forward neural networks, for time series forecasting since the early work of Reid (1968) and Bates and Granger (1969). In pioneering work on combined forecasts, Bates and Granger showed that a linear combination of forecasts would give a smaller error variance than any of the individual methods. Since then, studies on this topic have expanded dramatically. In recent years, more hybrid forecasting models have been proposed using feed-forward neural networks and applied in many areas with good prediction performance. Yu et al. (2005) proposed a novel nonlinear ensemble forecasting model integrating generalized linear auto-regression (GLAR) with back-propagation neural networks (BPNNs) in order to obtain accurate predictions in the foreign exchange market.


Faruk (2010) proposed a hybrid approach for water quality time series prediction that consists of an ARIMA methodology and a feed-forward, back-propagation network structure with an optimized conjugate training algorithm. Khashei et al. (2009) proposed a new hybrid model using feed-forward neural networks (FFNNs) and fuzzy logic in order to overcome the data and linear limitations of autoregressive integrated moving average models. Cadenas and Rivera (2010) developed a hybrid model consisting of feed-forward neural network (FFNN) and autoregressive integrated moving average (ARIMA) models for wind speed forecasting. Tseng et al. (2002) proposed a hybrid model called SARIMABP that combines the back-propagation neural network and the seasonal autoregressive integrated moving average (SARIMA) model in order to predict seasonal time series data. Amin-Naseri and Soroush (2008) presented a hybrid model of feed-forward neural networks for daily electrical peak load forecasting using self-organizing maps (SOMs). Aladag et al. (2009) proposed a new hybrid approach for time series forecasting that combines Elman recurrent neural networks (ERNN) and autoregressive integrated moving average (ARIMA) models. Khashei et al. (2008) proposed a hybrid model called FANN in order to use the advantages and overcome the limitations of fuzzy regression and back-propagation neural networks for time series forecasting. Lin and Wu (2009), in similar work, proposed a hybrid neural network model to forecast typhoon rainfall using self-organizing maps and feed-forward neural networks. Shafie-khah et al. (2011) proposed a novel hybrid model to forecast the day-ahead electricity price, based on the wavelet transform, autoregressive integrated moving average (ARIMA) models, and radial basis function neural networks (RBFN). Khashei and Bijari (2010) proposed a novel hybrid model of feed-forward neural networks, based on the basic concepts of autoregressive integrated moving average models and called ANN(p,d,q), in order to overcome the linear limitation of traditional feed-forward neural networks and yield more accurate results. Cheng et al. (2010) developed an evolutionary fuzzy hybrid neural network (EFHNN) to enhance project cash flow management, which incorporates four artificial intelligence approaches, namely the neural network (NN), the high order neural network (HONN), fuzzy logic (FL), and the genetic algorithm (GA). Tsaih et al. (1998) proposed a hybrid artificial intelligence model combining feed-forward neural networks and the rule-based systems technique, highlighting the advantages and overcoming the limitations of both techniques, in order to accurately predict the direction of daily price changes in S&P 500 stock index futures. Leigh et al. (2002) introduced a method for combining template matching, from pattern recognition, with the feed-forward neural network, from artificial intelligence, to forecast stock market activity. Wu et al. (2010) proposed a new version of support vector machines (SVM), named g-SVM, in order to handle white noise in the input data by integrating a Gaussian loss function with v-SVM.

In this paper, in contrast to the traditional hybrid models, a classifier method—the probabilistic neural network (PNN)—is applied to construct a new hybrid model of the feed-forward neural networks in order to yield more accurate results. In our proposed model, a classifier analyzes the residuals of the FFNN model in order to distinguish their trend.
In the next stage, a mathematical programming model calculates the optimum step length using the results obtained from the first stage. Then, the estimated values of the FFNN model are modified according to the optimum step length and the distinguished trend.

In the proposed model, the chance of capturing different patterns in the data can be efficiently increased by using the unique advantages of the probabilistic neural networks in detecting specific patterns and break points, which may not be completely modeled by the feed-forward neural networks. Moreover, by combining FFNN and PNN models, complex nonlinear autocorrelation structures can be modeled more completely, and hence the achieved results will be improved.

The rest of the paper is organized as follows. In the next section, the basic concepts of the feed-forward neural networks (FFNNs) are briefly reviewed. In Section 3, the probabilistic neural networks (PNNs), which are selected as the classifier model, are briefly reviewed. In Section 4, the formulation of the proposed model is introduced. In Section 5, the proposed model is applied to forecasting three well-known real data sets—the Wolf's sunspot data, the Canadian lynx data, and the British pound/US dollar exchange rate data—and its performance is compared with those of other forecasting models in order to show the appropriateness and effectiveness of the proposed model. Section 6 contains the concluding remarks.

2. The feed-forward neural networks (FFNNs)

Artificial neural networks (ANNs) can, based on the connection pattern (architecture), be generally categorized into two classes: feed-forward neural networks (FFNNs) and recurrent neural networks (RNNs). Feed-forward neural networks (FFNNs) are among the most important and widely used forms of neural networks for time series modeling and forecasting. One significant advantage of the feed-forward neural networks over other classes of nonlinear models is that they are universal approximators that can approximate a large class of functions with a high degree of accuracy. Their power comes from the parallel processing of the information from the data. No prior assumption of the model form is required in the model building process; instead, the network model is largely determined by the characteristics of the data. The model is characterized by a network of three layers of simple processing units connected by acyclic links (see Fig. 1). The relationship between the output $y_t$ and the inputs $y_{t-1},\ldots,y_{t-P}$ has the following mathematical representation:

$$y_t = w_0 + \sum_{j=1}^{Q} w_j\, g\!\left(w_{0j} + \sum_{i=1}^{P} w_{i,j}\, y_{t-i}\right) + e_t, \qquad (1)$$

where $w_{i,j}$ $(i=0,1,2,\ldots,P;\ j=1,2,\ldots,Q)$ and $w_j$ $(j=0,1,2,\ldots,Q)$ are model parameters, often called connection weights; $P$ is the number of input nodes; $Q$ is the number of hidden nodes; $e_t$ is the residual of the model at time $t$; and $g$ is the transfer function. The logistic function is often used as the hidden layer transfer function, that is,

$$\mathrm{Sig}(x) = \frac{1}{1 + \exp(-x)}. \qquad (2)$$

Fig. 1. Feed-forward neural network structure, N(p-q-1).
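The mapping of Eqs. (1) and (2) is compact enough to sketch directly. The following is a minimal illustration in Python/NumPy (not the MATLAB implementation used later in the paper); the weights are random placeholders rather than trained values:

```python
import numpy as np

def sigmoid(x):
    # Eq. (2): logistic transfer function of the hidden layer
    return 1.0 / (1.0 + np.exp(-x))

def ffnn_forecast(y_lags, w0, w, W_hidden, b_hidden):
    """One-step-ahead forecast of Eq. (1).

    y_lags   : the P most recent observations (y_{t-1}, ..., y_{t-P})
    w0       : output bias
    w        : (Q,) hidden-to-output weights w_j
    W_hidden : (Q, P) input-to-hidden weights w_{i,j}
    b_hidden : (Q,) hidden biases w_{0,j}
    """
    hidden = sigmoid(W_hidden @ y_lags + b_hidden)  # Q hidden activations
    return w0 + w @ hidden                          # single linear output node

# Toy usage with random placeholder weights (P = 4 inputs, Q = 4 hidden nodes)
rng = np.random.default_rng(0)
P, Q = 4, 4
print(ffnn_forecast(rng.normal(size=P), 0.1, rng.normal(size=Q),
                    rng.normal(size=(Q, P)), rng.normal(size=Q)))
```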


Hence, the feed-forward neural network model of Eq. (1) in fact performs a nonlinear functional mapping from the past observations to the future value $y_t$, i.e.,

$$y_t = f(y_{t-1},\ldots,y_{t-P}, W) + e_t, \qquad (3)$$

where $W$ is a vector of all parameters and $f(\cdot)$ is a function determined by the network structure and connection weights. Thus, a feed-forward neural network is equivalent to a nonlinear autoregressive model. Expression (1) implies one output node in the output layer, which is typically used for one-step-ahead forecasting. The simple network given by (1) is surprisingly powerful in that it is able to approximate arbitrary functions as the number of hidden nodes $Q$ becomes sufficiently large (Zhang et al., 1998). In practice, a simple network structure with a small number of hidden nodes often works well in out-of-sample forecasting. This may be due to the over-fitting effect typically found in the feed-forward neural network modeling process. Over-fitting occurs when the feed-forward neural network has too many free parameters, which allow the network to fit the training data well but typically lead to poor generalization. In addition, it has been experimentally shown that the generalization ability begins to deteriorate when the network has been trained more than necessary, that is, when it begins to fit the noise of the training data. These are the main reasons why researchers believe that designing the architecture of a feed-forward neural network (determining $P$, $Q$, the number of epochs, etc.) is a problematic task. The choice of $Q$ is data dependent, and there is no systematic rule for deciding this parameter. In addition to choosing an appropriate number of hidden nodes, another important task in feed-forward neural network modeling of a time series is the selection of the number of lagged observations, $P$, the dimension of the input vector. This is perhaps the most important parameter to be estimated in a feed-forward neural network because it plays a major role in determining the (nonlinear) autocorrelation structure of the time series. However, there is no theory that can be used to guide the selection of $P$. Hence, experiments are often conducted to select appropriate values of $P$, $Q$, and the number of training epochs.

There exist a number of different approaches, such as the pruning algorithm, the polynomial time algorithm, the canonical decomposition technique, and the network information criterion, for finding the optimal architecture of a feed-forward neural network (Khashei, 2005). These approaches can be generally categorized as follows: (i) empirical or statistical methods that are used to study the effect of a feed-forward neural network's parameters and to choose appropriate values for them based on the model performance (Ma and Khorasani, 2003); the most systematic and general of these methods utilizes the principles of Taguchi's design of experiments (Ross, 1996). (ii) Hybrid methods such as fuzzy inference (Leski and Czogala, 1999), where the feed-forward neural network can be interpreted as an adaptive fuzzy system or can operate on fuzzy instead of real numbers. (iii) Constructive and/or pruning algorithms that, respectively, add and/or remove neurons from an initial architecture using a previously specified criterion to indicate how feed-forward neural network performance is affected by the changes (Jiang and Wah, 2003). The basic rules are that neurons are added when training is slow or when the mean squared error is larger than a specified value, and that neurons are removed when a change in a neuron's value does not correspond to a change in the network's response, or when the weight values associated with this neuron remain constant for a large number of training epochs (Marin et al., 2007). (iv) Evolutionary strategies that search over topology space by varying the number of hidden layers and hidden neurons through application of genetic operators (Lee and Kang, 2007) and evaluation of the different architectures according to an objective function (Benardos and Vosniakos, 2007).

Although many different approaches exist for finding the optimal architecture of a feed-forward neural network, these methods are usually quite complex in nature and are difficult to implement (Zhang et al., 1998). Furthermore, none of these methods can guarantee the optimal solution for all real forecasting problems. To date, there is no structured, simple, clear-cut method for the determination of these parameters, and tedious experiments and trial-and-error procedures are therefore often used. In this procedure, numerous networks with varying numbers of input and hidden units ($P$, $Q$) are tested, the generalization error of each is estimated, and the network with the lowest generalization error is selected (Hosseini et al., 2006).

Once a network structure ($P$, $Q$) is specified, the network is ready for training, a process of parameter estimation. The parameters are estimated such that the cost function of the neural network is minimized, where the cost function is an overall accuracy criterion such as the following mean squared error:

$$E = \frac{1}{N}\sum_{t=1}^{N} e_t^2 = \frac{1}{N}\sum_{t=1}^{N}\left(y_t - \left(w_0 + \sum_{j=1}^{Q} w_j\, g\!\left(w_{0j} + \sum_{i=1}^{P} w_{i,j}\, y_{t-i}\right)\right)\right)^2, \qquad (4)$$

where $N$ is the number of error terms. This minimization is done with some efficient nonlinear optimization algorithm other than the basic back-propagation training algorithm (Rumelhart and McClelland, 1986), in which the parameters of the neural network, $w_{i,j}$, are changed by an amount $\Delta w_{i,j}$ according to the following formula:

$$\Delta w_{i,j} = -\eta\, \frac{\partial E}{\partial w_{i,j}}, \qquad (5)$$

where the parameter $\eta$ is the learning rate and $\partial E/\partial w_{i,j}$ is the partial derivative of the function $E$ with respect to the weight $w_{i,j}$. This derivative is commonly computed in two passes. In the forward pass, an input vector from the training set is applied to the input units of the network and is propagated through the network, layer by layer, producing the final output. During the backward pass, the output of the network is compared with the desired output, and the resulting error is then propagated backward through the network, adjusting the weights accordingly. To speed up the learning process, while avoiding the instability of the algorithm, Rumelhart and McClelland (1986) introduced a momentum term $\delta$ in Eq. (5), thus obtaining the following learning rule:

$$\Delta w_{i,j}(t+1) = -\eta\, \frac{\partial E}{\partial w_{i,j}} + \delta\, \Delta w_{i,j}(t). \qquad (6)$$

The momentum term may also be helpful in preventing the learning process from being trapped in poor local minima, and is usually chosen in the interval [0, 1]. Finally, the estimated model is evaluated using a separate hold-out sample that is not exposed to the training process.
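As an illustration of Eqs. (5) and (6) only (the paper does not publish its training code), a single momentum-based weight update can be sketched as follows, where grad_E stands for an already-computed gradient dE/dw from the backward pass:

```python
import numpy as np

def momentum_update(weights, grad_E, prev_delta, eta=0.1, delta=0.9):
    """Eq. (6): gradient step with a momentum term.

    weights    : current weight array w_{i,j}
    grad_E     : gradient dE/dw_{i,j} from the backward pass
    prev_delta : previous update Delta w_{i,j}(t)
    eta        : learning rate (eta in Eq. (5))
    delta      : momentum coefficient, usually chosen in [0, 1]
    """
    new_delta = -eta * grad_E + delta * prev_delta  # Delta w_{i,j}(t+1)
    return weights + new_delta, new_delta

# One illustrative step on a placeholder gradient
w = np.zeros(3)
g = np.array([0.5, -0.2, 0.1])
w, d = momentum_update(w, g, prev_delta=np.zeros(3))
```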

3. Probabilistic neural networks (PNNs)

The probabilistic neural network (PNN) is a Bayes–Parzen classifier (Masters, 1995) that is often an excellent pattern classifier in practice. The foundations of the approach have been well known since the 1960s; however, the method did not see widespread use until recently, because of the lack of sufficient computation power (Hajmeer and Basheer, 2003). Specht (1990) introduced the probabilistic neural network, demonstrating how the Bayes–Parzen classifier could be broken up into a large number of simple processes implemented in a multilayer neural network, each of which can be run independently in parallel.


Because the probabilistic neural network is primarily based on Bayes–Parzen classification, it is of interest to briefly discuss both Bayes' theorem for conditional probability and Parzen's method for estimating the probability density function of random variables. In order to understand Bayes' theorem, consider a sample $x = [x_1, x_2, \ldots, x_n]$ taken from a collection of samples belonging to a number of distinct populations $(1, 2, \ldots, k, \ldots, K)$. Assuming that the (prior) probability that a sample belongs to the $k$th population (class) is $h_k$, that the cost associated with misclassifying that sample is $l_k$, and that the true probability density functions of all populations, $f_1(x), f_2(x), \ldots, f_k(x), \ldots, f_K(x)$, are known, Bayes' theorem classifies an unknown sample into the $i$th population (Tsai, 2006) if

$$h_i\, l_i\, f_i(x) > h_j\, l_j\, f_j(x) \qquad \forall j \ne i,\quad j = 1,2,\ldots,K. \qquad (7)$$

The density function $f_k(x)$ corresponds to the concentration of class-$k$ examples around the unknown example. As seen from Eq. (7), Bayes' theorem favors a class that has a high density around the unknown sample, or whose cost of misclassification or prior probability is high. The biggest problem with the Bayes' classification approach lies in the fact that the probability density function $f_k(x)$ is not usually known. In nearly all standard statistical classification algorithms, some knowledge regarding the underlying distribution of the population of all random variables used in classification should be known or reasonably assumed. Most often, a normal (Gaussian) distribution is assumed; however, the assumption of normality cannot always be safely justified (Hajmeer and Basheer, 2002). When the distribution is not known (which is often the case) and the true distribution deviates considerably from the assumed one, the traditional statistical methods normally run into major classification problems, resulting in a high misclassification rate. There is thus a need to derive an estimate of $f_k(x)$ from the training set, rather than just assume a normal distribution. The resulting distribution will be a multivariate probability density function (PDF) that combines all the explanatory random variables. In order to derive such a distribution estimator from a set of training examples, Parzen's method (Parzen, 1962) is usually used. The univariate PDF estimator was proposed by Parzen (1962) and was then extended to the multivariate case by Cacoullos (1966). The multivariate PDF estimator, $g(x)$, may be expressed as

$$g(x_1, x_2, \ldots, x_n) = \frac{1}{N\, \sigma_1 \sigma_2 \cdots \sigma_n} \sum_{i=1}^{N} W\!\left(\frac{x_1 - x_{1,i}}{\sigma_1}, \frac{x_2 - x_{2,i}}{\sigma_2}, \ldots, \frac{x_n - x_{n,i}}{\sigma_n}\right), \qquad (8)$$

where $\sigma_1, \sigma_2, \ldots, \sigma_n$ are the smoothing parameters representing the standard deviation (also called window or kernel width) around the mean of the $n$ random variables $x_1, x_2, \ldots, x_n$; $W$ is a weighting function to be selected with specific characteristics (Masters, 1995; Specht, 1990); and $N$ is the total number of training examples. If all smoothing parameters are assumed equal (i.e., $\sigma_1 = \sigma_2 = \cdots = \sigma_n = \sigma$) and a bell-shaped Gaussian function is used for $W$, a reduced form of Eq. (8) is as follows (Ge et al., 2008):

$$g(x) = \frac{1}{(2\pi)^{n/2}\, \sigma^n} \cdot \frac{1}{N} \sum_{i=1}^{N} \exp\!\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right), \qquad (9)$$

where $x$ is the vector of random (explanatory) variables and $x_i$ is the $i$th training vector. Eq. (9) represents the average of the multivariate distributions, where each distribution is centered at one distinct training example. It is worth mentioning that the assumption of a Gaussian weighting function does not imply that the overall PDF will be Gaussian (normal); other weighting functions, such as the reciprocal function $w(r) = 1/(1+r^2)$, may also be used (Masters, 1995). As the sample size $N$ increases, the Parzen PDF estimator asymptotically approaches the true underlying density function.

Fig. 2. A simple probabilistic neural network.

Regarding the network's operation based on the aforementioned mathematics, consider the simple network architecture in Fig. 2, with $n$ input nodes in the input layer, two population classes (classes 1 and 2), $N_1$ training examples belonging to class 1, and $N_2$ examples in class 2. The pattern layer is designed to contain one neuron for each available training case, and the neurons are split into the two classes. The summation layer contains one neuron for each class. The output layer contains one neuron that performs a trivial threshold discrimination; it simply retains the maximum of the two summation neurons (Chen et al., 2003). The probabilistic neural network executes a training case by first presenting it to all pattern layer neurons. Each neuron in the pattern layer computes a distance measure between the presented input vector and the training example represented by that pattern neuron. The probabilistic neural network then subjects this distance measure to the Parzen window (weighting function, $W$) and yields the activation of each neuron in the pattern layer. Subsequently, the activation from each class is fed to the corresponding summation layer neuron, which adds together all the results in a particular class. The activation of each summation neuron is computed by applying the remaining part of the Parzen estimator equation (e.g., the constant multiplier in Eq. (9)) to obtain the estimated probability density function value of the population of a particular class (Hajmeer and Basheer, 2003). If the misclassification costs and prior probabilities are equal between the two classes, and the classes are mutually exclusive (i.e., no case can be classified into more than one class) and exhaustive (i.e., the training set covers all classes fairly), the activation of the summation neurons will be equal to the posterior probability of each class. The results from the two summation neurons are then compared, and the largest is fed forward to the output neuron to yield the computed class and the probability that this example belongs to that class.
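Under the assumptions just stated (equal priors and misclassification costs, a common smoothing parameter σ), the operation described above reduces to evaluating Eq. (9) once per class and applying the decision rule of Eq. (7). A minimal sketch with toy placeholder data, not from the paper:

```python
import numpy as np

def parzen_density(x, examples, sigma):
    # Eq. (9): average of Gaussian kernels centred at the training examples
    n = examples.shape[1]
    sq_dist = np.sum((examples - x) ** 2, axis=1)
    const = 1.0 / ((2 * np.pi) ** (n / 2) * sigma ** n)
    return const * np.mean(np.exp(-sq_dist / (2 * sigma ** 2)))

def pnn_classify(x, class_examples, sigma):
    # Eq. (7) with equal priors/costs: pick the class with the largest density
    densities = [parzen_density(x, ex, sigma) for ex in class_examples]
    return int(np.argmax(densities)), densities

# Toy two-class example (placeholder data)
rng = np.random.default_rng(1)
class1 = rng.normal(0.0, 1.0, size=(20, 2))
class2 = rng.normal(3.0, 1.0, size=(20, 2))
label, dens = pnn_classify(np.array([2.5, 2.8]), [class1, class2], sigma=0.8)
```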


The most important parameters that need to be determined to obtain an optimal probabilistic neural network are the smoothing parameters $(\sigma_1, \sigma_2, \ldots, \sigma_n)$ of the random variables (Xue et al., 2005). A straightforward procedure involves selecting an arbitrary set of $\sigma$'s, training the network, and testing it on a test (validation) set of examples. This procedure is repeated for other $\sigma$'s, and the set of $\sigma$'s that produces the lowest misclassification rate (percentage of examples that were misclassified) is chosen. A better and more efficient procedure for searching for the optimal smoothing parameters of the random variables and classes is proposed by Masters (1995). This procedure prevents any bias in the network toward the correctly classified examples, and will thus be followed in this study. Other details on the mathematics, as well as advanced variations of probabilistic neural networks, are given in Specht (1990) and Masters (1995).
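The straightforward validation search described above can be sketched as a simple loop; the faster procedure of Masters (1995) that the study actually follows is not reproduced here. The classify argument is any classifier of the kind sketched in the previous block:

```python
import numpy as np

def select_sigma(candidates, val_x, val_y, classify):
    """Validation search for the PNN smoothing parameter.

    candidates : iterable of trial sigma values
    val_x      : validation inputs (iterable of feature vectors)
    val_y      : true class labels of the validation set
    classify   : function (x, sigma) -> predicted label, e.g. a PNN
    """
    best_sigma, best_err = None, np.inf
    for sigma in candidates:
        preds = np.array([classify(x, sigma) for x in val_x])
        err = np.mean(preds != np.asarray(val_y))  # misclassification rate
        if err < best_err:
            best_sigma, best_err = sigma, err
    return best_sigma, best_err
```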

4. Formulation of the proposed model

Despite the numerous time series models available, the accuracy of time series forecasting is fundamental to many decision processes, and hence research into ways of improving the effectiveness of forecasting models has never stopped. Many researchers in time series forecasting have argued that predictive performance improves in combined models (Khashei, 2005). In the literature, different combination techniques have been proposed in order to overcome the deficiencies of single models and yield more accurate hybrid models. In this paper, in contrast to the traditional hybrid techniques, which combine different time series models together, a time series model—the feed-forward neural network (FFNN)—is combined with a classifier—the probabilistic neural network (PNN). The aim of our proposed model is to use the unique advantages of the probabilistic neural networks as classifier models in order to classify and determine the existing trend in the residuals of the feed-forward neural networks. The procedure of the proposed model can be summarized in five stages (see Fig. 3).

In the first stage, the under-study time series $\{y_t\}$ is initially modeled by a feed-forward neural network, as follows:

$$y_t = \mathrm{Fit}_{\mathrm{FFNN}}(t) + e_t = \hat{y}_t + e_t, \qquad (10)$$

where $\mathrm{Fit}_{\mathrm{FFNN}}(t)$ (or $\hat{y}_t$) and $e_t$ are the estimated value and the residual of the feed-forward neural network at time period $t$, respectively.

Fig. 3. Procedure in the proposed model (Stage 1: modeling the FFNN; Stage 2: classifying the residuals of the FFNN using the DLE value; Stage 3: designing and training a PNN; Stage 4: calculating the optimum step length (OSL); Stage 5: calculating the fitted values of the hybrid model).

In the second stage, according to the results obtained in the first stage—the estimated values and residuals of the feed-forward neural network model—and the desired level of error (DLE), the residuals of the feed-forward neural network are classified into three categories as follows. The desired level of error is a non-negative value that determines the sensitivity of the proposed model to the residuals of the feed-forward neural network. The DLE value is often the ideal level of accuracy for the under-study problem, chosen by the forecaster or decision maker.

(i) The residuals that are greater than the desired level of error ($\{e_i \mid e_i > \mathrm{DLE}\}$) are placed in category one, with assigned number "trend = +1".
(ii) The residuals that are less than the negative of the desired level of error ($\{e_i \mid e_i < -\mathrm{DLE}\}$) are placed in category two, with assigned number "trend = -1".
(iii) The residuals that are less than or equal to the desired level of error and greater than or equal to its negative ($\{e_i \mid -\mathrm{DLE} \le e_i \le \mathrm{DLE}\}$, i.e., $|e_i| \le \mathrm{DLE}$) are placed in category three, with assigned number "trend = 0".

In the third stage, a classifier model is applied in order to distinguish the existing trend in the residuals. In this paper, the probabilistic neural network (PNN) is used as the classifier. Technically, a probabilistic neural network is able to deduce the class/group of a given input vector after the training process is completed. There are a number of appealing features that justify our adoption of this type of neural network in this study. First, training of probabilistic neural networks is rapid, enabling us to develop a frequently updated training scheme: the network is essentially re-trained each time the data set is updated, so the most current information can be reflected in estimation. Second, the logic of the probabilistic neural network is able to attenuate the effects of outliers and questionable data points, and thereby reduces the extra effort of scrutinizing the training data. Third, and most important, probabilistic neural networks are conceptually built on the Bayesian method of classification, which, given enough data, is capable of classifying a sample with the maximum probability of success. The probabilistic neural network is designed and trained by considering the assigned numbers of each category (trend) as output values and a subset of effective variables as input values. The effective variables for the target value of the mentioned probabilistic neural network at time period $t$ are as follows:

(i) Lags 1 to $p$ of the under-study time series at time period $t$ ($y_{t-1}, y_{t-2}, \ldots, y_{t-p}$).
(ii) Lags 1 to $q$ of the feed-forward neural network residuals at time period $t$ ($e_{t-1}, e_{t-2}, \ldots, e_{t-q}$).
(iii) The estimated value of the feed-forward neural network at time period $t$ ($\hat{y}_t$).
(iv) Lags 1 to $r$ of the estimated values of the feed-forward neural network at time $t$ ($\hat{y}_{t-1}, \hat{y}_{t-2}, \ldots, \hat{y}_{t-r}$),

where $p$, $q$, $r$ are integers.

In the fourth stage, according to the results obtained in the previous stages—the target values obtained from the designed probabilistic neural network ($\{-1, 0, +1\}$) and the estimated values of the feed-forward neural network—the optimum step length (OSL) is calculated using a mathematical programming model as follows:

$$\text{Minimize} \quad \sum_{t=1}^{n} d_t \quad \text{or} \quad \sum_{t=1}^{n} d_t^2$$

$$\text{subject to} \quad
\begin{cases}
d_t \ge y_t - \hat{y}_t - \mathrm{tar}(t)\cdot\mathrm{OSL} & \text{for } t = 1,2,\ldots,n \\
d_t \ge \hat{y}_t - y_t + \mathrm{tar}(t)\cdot\mathrm{OSL} & \text{for } t = 1,2,\ldots,n \\
x_t\, d_t \le x_t\, y_t - x_t\, \hat{y}_t - x_t\, \mathrm{tar}(t)\cdot\mathrm{OSL} & \text{for } t = 1,2,\ldots,n \\
(1-x_t)\, d_t \le (1-x_t)\, \hat{y}_t - (1-x_t)\, y_t + (1-x_t)\, \mathrm{tar}(t)\cdot\mathrm{OSL} & \text{for } t = 1,2,\ldots,n \\
\mathrm{OSL},\, d_t \ge 0, \quad x_t \in \{0,1\}, & t = 1,2,\ldots,n
\end{cases} \qquad (11)$$

where $\mathrm{tar}(t)$ is the target value obtained from the probabilistic neural network at time period $t$ and $n$ is the training sample size.
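Since the binary variables $x_t$ only force $d_t = |y_t - \hat{y}_t - \mathrm{tar}(t)\cdot\mathrm{OSL}|$, the first form of Eq. (11) amounts to minimizing $\sum_t |y_t - \hat{y}_t - \mathrm{tar}(t)\cdot\mathrm{OSL}|$ over the single scalar OSL ≥ 0. A minimal sketch of stages 2 and 4 under that reading (a plain grid search stands in for the authors' mathematical-programming solver; all numbers are placeholders):

```python
import numpy as np

def classify_residuals(residuals, dle):
    # Stage 2: trend = +1 if e > DLE, -1 if e < -DLE, 0 if |e| <= DLE
    return np.where(residuals > dle, 1, np.where(residuals < -dle, -1, 0))

def optimum_step_length(y, y_hat, tar, grid):
    # First form of Eq. (11): minimize sum_t |y_t - y^_t - tar(t)*OSL|
    e = y - y_hat
    costs = [np.abs(e - tar * osl).sum() for osl in grid]
    return grid[int(np.argmin(costs))]

# Toy usage with placeholder numbers
y = np.array([10.0, 12.0, 9.0, 11.0])
y_hat = np.array([9.0, 12.4, 10.2, 10.9])
dle = 0.05 * np.mean(np.abs(y - y_hat))  # DLE = 5% of MAE_FFNN (cf. Section 5.2)
tar = classify_residuals(y - y_hat, dle) # here the true trends stand in for PNN outputs
osl = optimum_step_length(y, y_hat, tar, grid=np.linspace(0.0, 2.0, 201))
```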


In the fifth stage, according to the results obtained in the previous stages—the estimated values of the feed-forward neural network, the target values of the probabilistic neural network, and the optimum step length (OSL)—the fitted values of the proposed model are calculated as follows:

$$\mathrm{Fit}_p(t) = \mathrm{Fit}_{\mathrm{FFNN}}(t) + \mathrm{tar}(t)\cdot\mathrm{OSL}, \qquad (12)$$

where $\mathrm{Fit}_p(t)$ and $\mathrm{Fit}_{\mathrm{FFNN}}(t)$ are the fitted values of the proposed model and the FFNN model at time period $t$, respectively. According to Eq. (12), the performance of the proposed model in terms of mean absolute error (MAE) and mean squared error (MSE) is calculated, respectively, as follows:

$$\mathrm{MAE}_p = \mathrm{MAE}_{\mathrm{FFNN}} + \left[-\frac{1}{n}\sum_{t=1}^{n} \bigl(D(\mathrm{tar}(t), \mathrm{trend}(t)) \cdot |\mathrm{tar}(t)| \cdot \mathrm{OSL}\bigr)\right], \qquad (13)$$

$$\mathrm{MSE}_p = \mathrm{MSE}_{\mathrm{FFNN}} + \left[-\frac{1}{n}\sum_{t=1}^{n} \bigl(D(\mathrm{tar}(t), \mathrm{trend}(t)) \cdot |\mathrm{tar}(t)| \cdot \mathrm{OSL}\bigr)^2\right], \qquad (14)$$

where $\mathrm{MAE}_p$, $\mathrm{MAE}_{\mathrm{FFNN}}$, $\mathrm{MSE}_p$, and $\mathrm{MSE}_{\mathrm{FFNN}}$ are the mean absolute errors and mean squared errors of the proposed model and the feed-forward neural network model, respectively, $\mathrm{trend}(t)$ is the actual class of the residual at time period $t$, and $D(x,y)$ is a function defined as follows:

$$D(x,y) = \begin{cases} +1 & \text{if } x = y \\ -1 & \text{if } x \ne y \end{cases} \qquad (15)$$
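A minimal sketch of the final combination step, Eq. (12), together with the MAE and MSE measures used for evaluation in Section 5.3; all arrays are illustrative placeholders:

```python
import numpy as np

def hybrid_fit(fit_ffnn, tar, osl):
    # Eq. (12): shift each FFNN estimate by OSL in the predicted trend direction
    return fit_ffnn + tar * osl

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))   # MAE, cf. Eq. (16)

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)    # MSE, cf. Eq. (17)

# Toy check that correct trend classifications improve both measures
y = np.array([10.0, 12.0, 9.0])
fit_ffnn = np.array([9.0, 12.5, 9.1])
tar = np.array([1, -1, 0])              # here: the true residual trends
fit_p = hybrid_fit(fit_ffnn, tar, osl=0.3)
print(mae(y, fit_ffnn), mae(y, fit_p))  # 0.5333 -> 0.3333 on these toy numbers
```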

5. Application of the proposed model to time series forecasting

In this section, the proposed model is applied to time series forecasting using three well-known real data sets in order to demonstrate the appropriateness and effectiveness of the proposed model, and its performance is compared with those of other forecasting models.

5.1. Data sets

In this section, three well-known real data sets—the Wolf's sunspot data, the Canadian lynx data, and the British pound/US dollar exchange rate data—are used in order to demonstrate the appropriateness and effectiveness of the proposed model. These time series come from different areas and have different statistical characteristics. They have been widely studied in the statistical as well as the neural network literature (Khashei and Bijari, 2011). Both linear and nonlinear models have been applied to these data sets, although more or less nonlinearity has been found in each of these series.

5.1.1. The Wolf's sunspot data set

The sunspot series is a record of the annual activity of spots visible on the face of the sun and of the number of groups into which they cluster. The sunspot data considered in this investigation contain the annual number of sunspots from 1700 to 1987, giving a total of 288 observations. The study of sunspot activity has practical importance to geophysicists, environmental scientists, and climatologists. The series is regarded as nonlinear and non-Gaussian and is often used to evaluate the effectiveness of nonlinear models. The plot of this time series, shown in Fig. 4, also suggests a cyclical pattern with a mean cycle of about 11 years (Khashei and Bijari, 2011). The sunspot data have been extensively studied with a vast variety of linear and nonlinear time series models, including autoregressive integrated moving average models and artificial neural networks. To assess the forecasting performance of the proposed model, the sunspot data set is divided into two samples, for training and testing. The training data set, 221 observations (1700–1920), is used exclusively to formulate the model, and the test sample, the last 67 observations (1921–1987), is then used to evaluate the performance of the established model with two forecast horizons of 35 and 67 periods.

5.1.2. The Canadian lynx series data set

The lynx series considered in this investigation contains the number of lynx trapped per year in the Mackenzie River district of Northern Canada. The data set, plotted in Fig. 5, shows a periodicity of approximately 10 years. It has 114 observations, corresponding to the period 1821–1934, and has been extensively analyzed in the time series literature with a focus on nonlinear modeling (Khashei and Bijari, 2011); see Wong and Li (2000) for a survey. Following other studies, the logarithms (to the base 10) of the data are used in the analysis. As in the previous section, the lynx data set is divided into two samples, for training and testing. The training data set, 100 observations (1821–1920), is used exclusively to formulate the model, and the test sample, the last 14 observations (1921–1934), is then used to evaluate the performance of the established model.

5.1.3. The exchange rate (British pound/US dollar) data set

The last data set considered in this investigation is the exchange rate between the British pound and the US dollar. Predicting exchange rates is an important yet difficult task in international finance. Various linear and nonlinear theoretical models have been developed, but few are more successful in out-of-sample forecasting than a simple random walk model. Recent applications of neural networks in this area have yielded mixed results. The data used in this paper contain the weekly observations from 1980 to 1993, giving 731 data points in the time series. The time series plot is given in Fig. 6, which shows numerous changing turning points in the series. Following Meese and Rogoff (1983) and Zhang (2003), we use the natural logarithmic transformed data in the modeling and forecasting analysis. As in the previous sections, the exchange rate data set is divided into two samples. The training data set, 679 observations (1980–1992), is used exclusively to formulate the model, and the test sample, the last 52 observations (1993), is then used to evaluate the performance of the established model with three time horizons of 1, 6, and 12 months.

Fig. 4. Sunspot series (1700–1987).




Fig. 5. Canadian lynx data series (1821–1934).


Fig. 6. Weekly British pound/US dollar exchange rate series (1980–1993).

5.2. Results

Fig. 7. Architecture of the best fitted FFNN for sunspot data, N(4-4-1).

In this section, the procedure of the proposed model is illustrated by forecasting the three aforementioned data sets. In this paper, all FFNN and PNN modeling is implemented via the MATLAB7 package software. All DLE values in the construction procedure of the proposed model are set equal to 5% × MAE_FFNN. In addition, all OSL values are calculated using the first form of Eq. (11). The MAE (mean absolute error) and MSE (mean squared error) are employed as performance indicators to measure the forecasting performance of the proposed model.

5.2.1. The Wolf's sunspot data forecasts

Stage 1: In order to obtain the optimum FFNN architecture, different network architectures are evaluated to compare their performance. The best fitted network, and therefore the architecture which presented the best forecasting accuracy with the test data, is composed of four input, four hidden, and one output neurons (in abbreviated form, N(4-4-1)); it has also been used by other researchers such as Zhang (2003), De Groot and Wurtz (1991), and Cottrell et al. (1995). The architecture of the designed network is shown in Fig. 7.

Stage 2: Based on the DLE value, which is set to 5% × MAE_FFNN = 0.677218, and the residual values of the FFNN model obtained from the first stage, the target values are calculated.

Stage 3: A probabilistic neural network is designed and trained using the target values obtained from the second stage and the effective variables.

Fig. 8. The final designed architecture of PNN for sunspot data.

The final architecture of the probabilistic neural network designed for the Wolf's sunspot data consists of seven input and two output neurons, and is shown in Fig. 8. The list of input and output variables of the designed PNN is summarized in Table 1.

Stage 4: The optimum step length (OSL) is calculated using the mathematical programming model of Eq. (11), according to the target and estimated values obtained from the probabilistic neural network and the FFNN model, respectively. The calculated value of OSL for the sunspot data set is 2.85, which is used in the next stage.


Table 1. List of input and output variables of the designed probabilistic neural network for the sunspot data set.

Input(s):
  Variable 1: First lag of time series
  Variable 2: Second lag of time series
  Variable 3: Third lag of time series
  Variable 4: Estimated value
  Variable 5: First lag of residuals
  Variable 6: Second lag of residuals
  Variable 7: Third lag of residuals

Output(s):
  Variable: Target


Fig. 9. Results obtained from the hybrid FFNN/PNN model for sunspot data set.


Fig. 10. FFNN model prediction of sunspot data (test sample).

Fig. 11. Hybrid FFNN/PNN model prediction of sunspot data (test sample).

Fig. 12. Architecture of the best fitted FFNN for Canadian lynx data, N(7-5-1).

Stage 5: Finally, the fitted values of the proposed model are calculated from the results of the previous stages using Eq. (12). The estimated values of the hybrid FFNN/PNN model for the Wolf's sunspot data set are plotted in Fig. 9. The estimated values of the feed-forward neural network and the proposed hybrid model for the test data are also plotted in Figs. 10 and 11, respectively.

5.2.2. The Canadian lynx series data forecasts

Stage 1: Similar to the previous section, different architectures of the feed-forward neural networks are evaluated in order to obtain the optimum architecture. The best fitted network, and therefore the architecture which presented the best forecasting accuracy with the test data, is composed of seven input, five hidden, and one output neurons (N(7-5-1)); it has also been used by other researchers (Zhang, 2003). The architecture of the designed network is shown in Fig. 12.

Stage 2: Based on the DLE value, which is set to 5% × MAE_FFNN = 0.005605, and the residual values of the FFNN model obtained from the first stage, the target values are calculated.

Fig. 13. The final designed architecture of PNN for Canadian lynx data.

Stage 3: A probabilistic neural network is designed and trained using the target values obtained from the second stage and the effective variables. The final architecture of the probabilistic neural network designed for the Canadian lynx series data consists of seven input and two output neurons, and is shown in Fig. 13. The list of input and output variables of the designed PNN is summarized in Table 2.

Stage 4: The optimum step length (OSL) is calculated using the mathematical programming model of Eq. (11), according to the target and estimated values obtained from the probabilistic neural network and the FFNN model, respectively.


Table 2. List of input and output variables of the designed probabilistic neural network for the Canadian lynx series data set.

Input(s):
  Variable 1: First lag of time series
  Variable 2: Second lag of time series
  Variable 3: Third lag of time series
  Variable 4: Fourth lag of time series
  Variable 5: Estimated value
  Variable 6: First lag of residuals
  Variable 7: Second lag of residuals

Output(s):
  Variable: Target


Fig. 14. Results obtained from the hybrid FFNN/PNN model for Canadian lynx data set.

Fig. 17. Architecture of the best fitted FFNN for exchange rate data, N(7-6-1).

Fig. 15. FFNN model prediction of Canadian lynx data (test sample).


Fig. 16. Hybrid FFNN/PNN model prediction of Canadian lynx data (test sample).

The calculated value of OSL for the Canadian lynx data set is 0.059, which is used in the next stage.

Stage 5: Finally, the fitted values of the proposed model are calculated from the results of the previous stages using Eq. (12). The estimated values of the hybrid FFNN/PNN model for the Canadian lynx series data are plotted in Fig. 14. The estimated values of the feed-forward neural network and the proposed hybrid model for the test data are also plotted in Figs. 15 and 16, respectively.

5.2.3. The exchange rate (British pound/US dollar) data forecasts

Stage 1: Similar to the previous sections, in order to obtain the optimum FFNN architecture, different network architectures are evaluated to compare their performance. The best fitted network, and therefore the architecture which presented the best forecasting accuracy with the test data, is composed of seven input, six hidden, and one output neurons (in abbreviated form, N(7-6-1)); it has also been used by other researchers (Zhang, 2003). The architecture of the designed network is shown in Fig. 17.

Fig. 18. The final designed architecture of PNN for exchange rate data.

Stage 2: Based on the DLE value, which is set to 5% × MAE_FFNN = 0.000263, and the residual values of the FFNN model obtained from the first stage, the target values are calculated.

Stage 3: A probabilistic neural network is designed and trained using the target values obtained from the second stage and the effective variables. The final architecture of the probabilistic neural network designed for the British pound/US dollar exchange rate data consists of nine input and two output neurons, and is shown in Fig. 18. The list of input and output variables of the designed PNN is summarized in Table 3.

Stage 4: The optimum step length (OSL) is calculated using the mathematical programming model of Eq. (11), according to the target and estimated values obtained from the probabilistic neural network and the FFNN model, respectively. The calculated value of OSL for the exchange rate data set is 0.0030511, which is used in the next stage.

Stage 5: Finally, the fitted values of the proposed model are calculated from the results of the previous stages using Eq. (12). The estimated values of the hybrid FFNN/PNN model for the British pound/US dollar exchange rate data set are plotted in Fig. 19. The estimated values of the feed-forward neural network and the proposed hybrid model for the test data are also plotted in Figs. 20 and 21, respectively.


Table 3. List of input and output variables of the designed probabilistic neural network for the exchange rate data set.

Input(s):
  Variable 1: First lag of time series
  Variable 2: Second lag of time series
  Variable 3: Third lag of time series
  Variable 4: Fourth lag of time series
  Variable 5: Estimated value
  Variable 6: First lag of residuals
  Variable 7: Second lag of residuals
  Variable 8: Third lag of residuals
  Variable 9: Fourth lag of residuals

Output(s):
  Variable: Target

Fig. 19. Results obtained from the hybrid FFNN/PNN model for exchange rate data set.


Fig. 20. FFNN model prediction of exchange rate data (test sample).


Fig. 21. Hybrid FFNN/PNN model prediction of exchange rate data (test sample).

5.3. Comparison with other forecasting models

In this section, the predictive capability of the proposed feed-forward-neural-network-based model (FFNN/PNN) is compared with those of the feed-forward neural network, the autoregressive integrated moving average model, and Zhang's hybrid (FFNN/ARIMA) model (Zhang, 2003), using the three aforementioned well-known real data sets. Following previous works in the literature, the MAE (mean absolute error) and MSE (mean squared error) are employed as performance indicators in order to measure the forecasting performance of the proposed model in comparison with the other forecasting models. These performance indicators are computed, respectively, from the following equations:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|, \qquad (16)$$

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \qquad (17)$$

where $y_i$ and $\hat{y}_i$ are the actual and estimated values, respectively.

5.3.1. The Wolf's sunspot data

In the Wolf's sunspot forecasting case, two forecast horizons of 35 and 67 periods are used in order to assess the forecasting performance of the models. The forecasting results of the hybrid FFNN/PNN model are given in Table 4. In addition, the percentage improvement of the proposed model in comparison with the other models for the sunspot data is summarized in Table 5. The obtained results indicate that our hybrid model not only yields significantly more accurate results than the FFNN model, but also outperforms the autoregressive integrated moving average (ARIMA) model and Zhang's hybrid model across the two time horizons and with both error measures. For example, in terms of MSE, the percentage improvements of the hybrid (FFNN/PNN) model over the FFNN model for the 35- and 67-period forecasts are 37.17% and 22.3%, respectively.

5.3.2. The Canadian lynx data set

In the Canadian lynx series forecasting case, one forecast horizon is used in order to assess the forecasting performance of the models. In a similar fashion, the forecasting results of the hybrid FFNN/PNN model for the last 14 years are given in Table 6. In addition, the percentage improvement of the proposed model in comparison with the other models for the lynx data is summarized in Table 7. The numerical results show that our hybrid model gives significantly better forecasts than the feed-forward neural network and the other forecasting models. For example, in terms of MAE and MSE, our hybrid (FFNN/PNN) model shows a 28.97% and a 27.33% decrease over the FFNN model, respectively.

5.3.3. The exchange rate (British pound/US dollar) forecasts

In the exchange rate series forecasting case, three time horizons of 1, 6, and 12 months are used in order to assess the forecasting performance of the models. As in the previous sections, the forecasting results of the hybrid FFNN/PNN model are given in Table 8. In addition, the percentage improvement of the proposed model in comparison with the other models for the exchange rate data is summarized in Table 9. The results of the exchange rate forecasting, as in the previous cases, indicate that our hybrid model significantly outperforms the FFNN, ARIMA, and Zhang's hybrid (FFNN/ARIMA) models with both error measures for all time horizons: short-term forecasting (1 month) and longer horizons (6 and 12 months).


Table 4. Comparison of the performance of the hybrid FFNN/PNN model with those of other forecasting models (sunspot data set).

                                                      35 points ahead         67 points ahead
Model                                                 MAE      MSE            MAE        MSE
Feed-forward neural network (FFNN)                    10.243   205.302        13.544365  351.19366
Autoregressive integrated moving average (ARIMA)      11.319   216.965        13.033739  306.08217
Zhang's hybrid (FFNN/ARIMA) model                     10.831   186.827        12.780186  280.15956
Our hybrid model (FFNN/PNN)                            8.265   128.987        11.462682  272.888719

Table 5. Percentage improvement of the hybrid FFNN/PNN model in comparison with other forecasting models (sunspot data set).

                                                      35 points ahead         67 points ahead
Model                                                 MAE (%)  MSE (%)        MAE (%)  MSE (%)
Feed-forward neural network (FFNN)                    19.31    37.17          15.37    22.30
Autoregressive integrated moving average (ARIMA)      26.98    40.55          12.05    10.84
Zhang's hybrid (FFNN/ARIMA) model                     23.69    30.96          10.31     2.60

Table 6. Comparison of the performance of the hybrid FFNN/PNN model with those of other forecasting models (lynx data set).

Model                                                 MAE        MSE
Feed-forward neural network (FFNN)                    0.112109   0.020466
Autoregressive integrated moving average (ARIMA)      0.112255   0.020486
Zhang's hybrid (FFNN/ARIMA) model                     0.103972   0.017233
Our hybrid model (FFNN/PNN)                           0.079628   0.014872

Table 7. Percentage improvement of the hybrid FFNN/PNN model in comparison with those of other forecasting models (lynx data set).

Model                                                 MAE (%)  MSE (%)
Feed-forward neural network (FFNN)                    28.97    27.33
Autoregressive integrated moving average (ARIMA)      29.07    27.40
Zhang's hybrid (FFNN/ARIMA) model                     23.41    13.70

Table 8. Comparison of the performance of the hybrid FFNN/PNN model with those of other forecasting models (exchange rate data set). Note: all MSE values should be multiplied by 10^-5.

                                                      1 month                 6 months                12 months
Model                                                 MAE       MSE           MAE        MSE          MAE        MSE
Feed-forward neural network (FFNN)                    0.004218  2.76375       0.0059458  5.71096      0.0052513  4.52657
Autoregressive integrated moving average (ARIMA)      0.005016  3.68493       0.0060447  5.65747      0.0053579  4.52977
Zhang's hybrid (FFNN/ARIMA) model                     0.004146  2.67259       0.0058823  5.65507      0.0051212  4.35907
Our hybrid model (FFNN/PNN)                           0.003252  1.46598       0.0045772  3.23553      0.0042713  2.77122

Table 9. Percentage improvement of the hybrid FFNN/PNN model in comparison with those of other forecasting models (exchange rate data set).

                                                      1 month                 6 months                12 months
Model                                                 MAE (%)  MSE (%)        MAE (%)  MSE (%)        MAE (%)  MSE (%)
Feed-forward neural network (FFNN)                    22.90    46.96          23.02    43.35          18.66    38.78
Autoregressive integrated moving average (ARIMA)      35.17    60.22          24.28    42.81          20.28    38.82
Zhang's hybrid (FFNN/ARIMA) model                     21.56    45.15          22.19    42.79          16.60    36.43


6. Conclusions

Improving forecasting accuracy, especially in time series forecasting, is an important yet often difficult task facing forecasters. Both theoretical and empirical findings have indicated that integration of different models can be an effective way of improving upon their predictive performance, especially when the models in the ensemble are quite different. In the literature, several hybrid techniques have been proposed by combining different time series models together in order to yield results that are more accurate. In this paper, a methodology is proposed in order to construct a new class of hybrid models of feed-forward neural networks (FFNNs), using a probabilistic neural network as classifier. In the proposed method, the estimated values of the feed-forward neural network model are modified using the unique advantages of the PNNs in classifying the existing trend in the residuals of the FFNN model, together with an optimum step length. Empirical results with three well-known real data sets indicate that the proposed method can be an effective way to construct a more accurate hybrid model than feed-forward neural networks. Therefore, it can be used as an appropriate alternative model for forecasting tasks, especially when higher forecasting accuracy is needed.

Acknowledgments

The authors wish to express their gratitude to the referees and Dr. Gholam Ali Raissi Ardali, Assistant Professor of Industrial Engineering, Isfahan University of Technology, for their insightful and constructive comments, which helped to improve the paper greatly.

References

Aladag, C., Egrioglu, E., Kadilar, C., 2009. Forecasting nonlinear time series with a hybrid methodology. Appl. Math. Lett. 22, 1467–1470.
Amin-Naseri, M.R., Soroush, A.R., 2008. Combined use of unsupervised and supervised learning for daily peak load forecasting. Energy Convers. Manage. 49, 1302–1308.
Bates, J.M., Granger, W.J., 1969. The combination of forecasts. Oper. Res. 20, 451–468.
Benardos, P.G., Vosniakos, G.C., 2007. Optimizing feed-forward artificial neural network architecture. Eng. Appl. Artif. Intell. 20, 365–382.
Cacoullos, T., 1966. Estimation of multivariate density. Ann. Inst. Stat. Math. 18 (2), 179–189.
Cadenas, E., Rivera, W., 2010. Wind speed forecasting in three different regions of Mexico, using a hybrid ARIMA–ANN model. Renewable Energy 35, 2732–2738.
Chen, A., Leung, M., Daouk, H., 2003. Application of neural networks to an emerging financial market: forecasting and trading the Taiwan Stock Index. Comput. Oper. Res. 30, 901–923.
Cheng, M., Tsai, H., Sudjono, E., 2010. Evolutionary fuzzy hybrid neural network for project cash flow control. Eng. Appl. Artif. Intell. 23, 604–613.
Cottrell, M., Girard, B., Girard, Y., Mangeas, M., Muller, C., 1995. Neural modeling for time series: a statistical stepwise method for weight elimination. IEEE Trans. Neural Networks 6 (6), 1355–1364.
De Groot, C., Wurtz, D., 1991. Analysis of univariate time series with connectionist nets: a case study of two classical examples. Neurocomputing 3, 177–192.
Faruk, D., 2010. A hybrid neural network and ARIMA model for water quality time series prediction. Eng. Appl. Artif. Intell. 23, 586–594.
Ge, S.S., Yang, Y., Lee, T.H., 2008. Hand gesture recognition and tracking based on distributed locally linear embedding. Image Vision Comput. 26, 1607–1620.
Hajmeer, M., Basheer, I., 2002. A probabilistic neural network approach for modeling and classification of bacterial growth/no-growth data. J. Microbiol. Methods 51, 217–226.
Hajmeer, M., Basheer, I., 2003. Comparison of logistic regression and neural network-based classifiers for bacterial growth. Food Microbiol. 20, 43–55.
Hosseini, H., Luo, D., Reynolds, K.J., 2006. The comparison of different feed-forward neural network architectures for ECG signal diagnosis. Med. Eng. Phys. 28, 372–378.
Jiang, X., Wah, A.H.K.S., 2003. Constructing and training feed-forward neural networks for pattern classification. Pattern Recognit. 36, 853–867.
Khashei, M., 2005. Forecasting the Isfahan Steel Company production price in Tehran Metals Exchange using Artificial Neural Networks (ANNs). Master of Science Thesis, Isfahan University of Technology.
Khashei, M., Hejazi, S.R., Bijari, M., 2008. A new hybrid artificial neural networks and fuzzy regression model for time series forecasting. Fuzzy Sets Syst. 159, 769–786.
Khashei, M., Bijari, M., Raissi, G.H.A., 2009. Improvement of auto-regressive integrated moving average models using fuzzy logic and artificial neural networks (ANNs). Neurocomputing 72, 956–967.
Khashei, M., Bijari, M., 2010. An artificial neural network (p, d, q) model for time series forecasting. Expert Syst. Appl. 37, 479–489.
Khashei, M., Bijari, M., 2011. A novel hybridization of artificial neural networks and ARIMA models for time series forecasting. Appl. Soft Comput. 11, 2664–2675.
Khashei, M., Bijari, M., 2011. A new hybrid methodology for nonlinear time series forecasting. Model. Simul. Eng. 2011, doi:10.1155/2011/379121.
Lee, J., Kang, S., 2007. GA based meta-modeling of BPN architecture for constrained approximate optimization. Int. J. Solids Struct. 44, 5980–5993.
Leigh, W., Paz, M., Purvis, R., 2002. An analysis of a hybrid neural network and pattern recognition technique for predicting short-term increases in the NYSE composite index. Omega 30, 69–76.
Leski, J., Czogala, E., 1999. A new artificial neural network based fuzzy inference system with moving consequents in if-then rules and selected applications. Fuzzy Sets Syst. 108, 289–297.
Lin, G., Wu, M., 2009. A hybrid neural network model for typhoon-rainfall forecasting. J. Hydrol. 375, 450–458.
Ma, L., Khorasani, K., 2003. A new strategy for adaptively constructing multilayer feed-forward neural networks. Neurocomputing 51, 361–385.
Marin, D., Varo, A., Guerrero, J.E., 2007. Non-linear regression methods in NIRS quantitative analysis. Talanta 72, 28–42.
Masters, T., 1995. Advanced Algorithms for Neural Networks. Wiley, New York.
Meese, R.A., Rogoff, K., 1983. Empirical exchange rate models of the seventies: do they fit out of sample? J. Int. Econ. 14, 3–24.
Parzen, E., 1962. On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076.
Reid, M.J., 1968. Combining three estimates of gross domestic product. Economica 35, 431–444.
Ross, J.P., 1996. Taguchi Techniques for Quality Engineering. McGraw-Hill, New York.
Rumelhart, D., McClelland, J., 1986. Parallel Distributed Processing. MIT Press, Cambridge, MA.
Shafie-khah, M., Parsa Moghaddam, M., Sheikh-El-Eslami, M.K., 2011. Price forecasting of day-ahead electricity markets using a hybrid forecast method. Energy Convers. Manage. 52, 2165–2169.
Specht, D., 1990. Probabilistic neural networks. Neural Networks 3, 109–118.
Tsai, C., 2006. On detecting nonlinear patterns in discriminant problems. Inf. Sci. 176, 772–798.
Tsaih, R., Hsu, Y., Lai, C., 1998. Forecasting S&P 500 stock index futures with a hybrid AI system. Decis. Support Syst. 23, 161–174.
Tseng, F.M., Yu, H.C., Tzeng, G.H., 2002. Combining neural network model with seasonal time series ARIMA model. Technol. Forecast. Soc. Change 69, 71–87.
Wong, C.S., Li, W.K., 2000. On a mixture autoregressive model. J. R. Stat. Soc. Ser. B 62 (1), 91–115.
Wu, Q., Wu, S., Liu, J., 2010. Hybrid model based on SVM with Gaussian loss function and adaptive Gaussian PSO. Eng. Appl. Artif. Intell. 23, 487–494.
Xue, C.X., Zhang, X.Y., Liu, M.C., Hu, Z.D., Fan, B.T., 2005. Study of probabilistic neural networks to classify the active compounds in medicinal plants. J. Pharm. Biomed. Anal. 38, 497–507.
Yu, L., Wang, S., Lai, K.K., 2005. A novel nonlinear ensemble forecasting model incorporating GLAR and ANN for foreign exchange rates. Comput. Oper. Res. 32, 2523–2541.
Zhang, G., Patuwo, B.E., Hu, M.Y., 1998. Forecasting with artificial neural networks: the state of the art. Int. J. Forecasting 14, 35–62.
Zhang, G.P., 2003. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50, 159–175.