Deep belief network-based AR model for nonlinear time series forecasting


Applied Soft Computing Journal 77 (2019) 605–621


Wenquan Xu a,b, Hui Peng a,∗, Xiaoyong Zeng a, Feng Zhou c, Xiaoying Tian a, Xiaoyan Peng d

a School of Automation, Central South University, Changsha, Hunan 410083, China
b School of Physics and Electrical Engineering, Anqing Normal University, Anqing, Anhui 246001, China
c College of Electronic Information and Electrical Engineering, Changsha University, Changsha, Hunan 410003, China
d College of Mechanical and Vehicle Engineering, Hunan University, Changsha, Hunan 410082, China

∗ Corresponding author. E-mail address: [email protected] (H. Peng).

Highlights

• The DBN-AR model combines the advantages of the DBN and the SD-AR model.
• The DBN-AR model is driven by the state signal.
• The initial target values of the DBNs are determined by a pseudo-inverse matrix.
• The whole DBN-AR model is fine tuned using a BP algorithm.
• The results show that the DBN-AR model is superior to some existing models.

Article info

Article history:
Received 22 January 2018
Received in revised form 5 November 2018
Accepted 6 February 2019
Available online 7 February 2019

Keywords:
Deep belief network
State-dependent AR model
Time series forecasting
System identification

Abstract

For a class of nonlinear time series whose dynamic behavior smoothly changes with the system state, a state-dependent auto-regressive (SD-AR) model is used to characterize the nonlinear time series. A set of deep belief networks (DBNs) is used to build the state-dependent functional coefficients of the SD-AR model; the proposed model is called the DBN-AR model, and it combines the advantage of the DBN in function approximation with the merit of the SD-AR model in nonlinear dynamics description. The DBN-AR model is driven by the state signal changing with time. Based on the least squares solution with minimum norm and the pseudo-inverse matrix approach, the initial target values of the DBNs are determined in the pre-training stage. In the fine-tuning stage, all parameters of the DBN-AR model are finally tuned by a back propagation (BP) algorithm designed specifically for the DBN-AR model. Through experiments and comparative studies on the sunspot data, the electricity load demand data sets from the Australian Energy Market Operator (AEMO), the weekly British Pound/US dollar (GBP/USD) exchange rate data and the daily electricity generation data of the Three Gorges dam right bank power station, it is shown that the DBN-AR model is superior in prediction accuracy to some existing models and methods.

© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Time series forecasting has become a very important research field because of its applications in many domains [1]. For example, forecasting daily electricity generation is helpful to the power department, and forecasting traffic conditions helps people arrange their travel reasonably [2]. However, time series data have different characteristics, and it is difficult to use the same model or method to predict different time series. In the past few decades, many time series models have been developed; an introduction to time series forecasting models can be found in [3]. In general, a model may be categorized as a parametric model or a non-parametric model. From the point of view of modeling methods, the most frequently used time series prediction models may be divided into three types: statistical models, artificial intelligence (AI) models and hybrid models [4,5].

In the first type, the autoregressive (AR) model, the autoregressive moving average (ARMA) model, the random walk (RW) and the autoregressive integrated moving average (ARIMA) model are the most widely used statistical models, in which the signal's future values are modeled by a linear function of its past values. In the literature, the ARIMA model has been applied to different time series, such as paddy production [6], short-term city electricity load [7] and daily wind speed [8]. The ARMA model has been used to forecast tourism demand [9], exchange rates [10] and electricity prices [11], and the experimental results showed that the model has a certain prediction accuracy. The AR-type models above can model the linear parts of a time series well, but they are not suitable for nonlinear time series modeling [12].


In the second type, a large number of nonlinear artificial intelligence (AI) models have been used in different forecasting fields, such as artificial neural networks (ANNs), support vector machines (SVMs), radial basis function (RBF) networks, and deep learning (DL) networks. ANNs have been a popular topic in modern data analysis [13–16], and many studies have used AI models to handle time series data for modeling and forecasting. Feedforward neural networks (FFNNs) are the most popular neural network paradigm in time series prediction [10]. For example, Palani et al. forecasted water quality using an ANN [17]; Mandal et al. proposed a novel approach to forecasting electricity prices for PJM using an ANN and the similar-days method [18]; and Zhou et al. [19] proposed an SVM for short-term wind speed forecasting. The RBF neural network offers an alternative to traditional models because of its simple structure and strong learning capability [13]. ANN models have been used to model some classical and real-life time series, and the results showed that the ANN model has superior performance compared to some existing methods [20–22]. However, overfitting and the tendency of the learning process to stop at local optima are two problems that ANNs need to avoid [23,24]. To address these two problems, Hinton et al. [25] proposed the deep belief network (DBN) model built from multiple restricted Boltzmann machines (RBMs). The DBN is now widely used in time series forecasting [12,23,26–28], and the results have shown the DBN's superiority over linear AR-type models and the conventional back propagation neural network (BPNN). For example, Kuremoto et al. [23] proposed a DBN model for time series forecasting, and the DBN's superiority over the conventional multilayer perceptron (MLP) model and the ARIMA model was verified on the CATS benchmark data. The long short-term memory (LSTM) neural network is a more recent architecture, which has good prediction performance and can overcome the problem of back-propagated error decay through its memory blocks [29].

However, a traditional single model may not accurately represent the complex relations existing in nonlinear and nonstationary time series [4,30–35]. Thus, in recent years, researchers have studied hybrid models that combine neural networks with other models. A hybrid model aims to further improve forecasting performance by strategically combining multiple algorithms, so this kind of model may achieve higher prediction accuracy [36]. For example, Ref. [37] showed that empirical mode decomposition (EMD) based hybrid methods normally outperform the corresponding single-structure models for time series forecasting; nine benchmark methods were compared to verify the effectiveness of the EMD-DBN method, namely Persistence [37], Ensemble DBN (EDBN) [38], support vector regression (SVR) [39], ANN [40], DBN [25], random forest (RF) [41], EMD-SVR [42], EMD-ANN [43] and EMD-RF [37]. Schimbinschi et al. [44] proposed a topology-regularized universal vector autoregression (TRU-VAR) model for traffic forecasting, and the results showed that the proposed method scales well and can be trained efficiently with low generalization error. Li et al. [45] combined a traditional ANN with wavelet networks to establish a hybrid model for short-term power load forecasting, and numerical testing showed that the proposed method obtains better forecasting results in comparison with other standard and state-of-the-art methods.
In [46], multiscale deep feature learning with a hybrid model was used for daily reservoir inflow forecasting. In [12], the DBN, the ARIMA model and the particle swarm optimization (PSO) method were combined for red tide time series forecasting. Although a hybrid model integrating several single models may improve the forecasting ability to some extent [47–50], a single model often cannot thoroughly deal with the nonlinearity and nonstationarity of a time series. The state-dependent AR (SD-AR) model [51] can characterize nonlinear and non-stationary time series. On the basis of the SD-AR modeling framework, Vesin [52] proposed the RBF-AR model, which uses a set of RBF networks to approximate the functional coefficients of the SD-AR model.

Shi et al. [53] studied the estimation of the RBF-AR model, which was further developed by Peng et al. [54] into the RBF-ARX model. Gan et al. [13] presented a modeling approach to nonlinear time series that uses a set of locally linear RBF networks (LLRBF) to approximate the functional coefficients of the SD-AR model. The RBF-ARX model has been applied to various model-based predictive control problems [55]. In general, as seen in [13,54,56–59], the structure of a radial basis function (RBF) network consists of only three layers: an input layer, a hidden layer and an output layer. By adding more hidden layers, the RBF network becomes a deep network, so that the input signals flow consecutively through more hidden layers from the input to the output layer [60–62]. However, this type of deep network may become unstable as the network goes deeper, and its parameter estimation process easily falls into local minima [60–62], which has a marked impact on the accuracy of the results. As mentioned above, the DBN model may alleviate the problem of falling into local minima [25]. Therefore, a type of SD-AR model, called the DBN-AR model, is proposed in this paper, which uses a set of DBNs to approximate the functional coefficients of the SD-AR model.

The motivation for studying the DBN-AR model in this paper is to improve the RBF-AR modeling method [13,54,56] for nonlinear system modeling. The RBF-AR model is a state-dependent AR model [51] that can characterize nonlinear time series. However, the RBF-AR model's coefficients are composed of RBF networks with one hidden layer, which are not deep belief networks. Because the capability of an RBF network with one hidden layer to represent nonlinear behavior is usually weaker than that of a deep belief network, we study the DBN-AR modeling problem in this paper by using DBNs to replace the single-hidden-layer RBF networks in the RBF-AR model, in order to obtain better nonlinear time series modeling performance.

The DBN-AR model has the advantages of the DBN in function approximation and of the SD-AR model in nonlinear dynamics description. Benefiting from the deeper layers of the DBN and the SD-AR modeling framework, the proposed DBN-AR model may achieve a desirable modeling performance. The DBN-AR model is driven by system state signals to reflect the dynamic characteristics of the time series. To estimate the DBN-AR model, a pseudo-inverse matrix is designed to determine the initial target values of the DBNs in the DBN-AR model at the pre-training stage, and at the fine-tuning stage, all parameters of the DBN-AR model are finally tuned by the back propagation (BP) algorithm designed for fine-tuning the DBN-AR model. The DBN-AR model includes a DBN as one of its components, so it may be regarded as a more general nonlinear model compared with a single DBN, and the structure of each DBN in the model may be much simpler than that of a single DBN model, because the DBN-AR model partially disperses the complexity of the model into the AR part. In addition, the DBN-AR model is trained in a greedy manner, i.e. by layer-wise pre-training, which permits training deeper networks and alleviates trapping in local minima [25]. On the other hand, a linear AR model with constant or time-varying coefficients is a parametric model, and a DBN is a non-parametric model. The DBN-AR model is basically a non-parametric model; however, if one regards it as a time-varying AR model with state-dependent DBN-type time-varying coefficients, then the DBN-AR model may also be regarded as a parametric model with time-varying coefficients. The DBN-AR model is thus a hybrid of non-parametric and parametric approaches.

In this paper, the proposed DBN-AR model is applied to characterize the sunspot time series [63], the electricity load demand data sets from the Australian Energy Market Operator (AEMO) [64], the weekly British Pound/US dollar (GBP/USD) exchange rate data [10] and the daily electricity generation data from the Three Gorges dam right bank power station. The modeling results show that the DBN-AR model exhibits much better prediction accuracy compared with some existing modeling methods.


The remainder of this paper is organized as follows. Section 2 introduces the DBN-AR model. In Section 3, the estimation method for the DBN-AR model is presented. In Section 4, some performance measures are given to evaluate the validity of the estimated model. Results of the experimental investigation of the proposed DBN-AR model and its comparison with other modeling methods are described in Section 5. Future work is discussed and the paper is concluded in Section 6. Finally, the fine-tuning procedure of the DBN-AR model is presented in the Appendix.

2. DBN-AR model

2.1. State-dependent AR (SD-AR) model

Without loss of generality, we consider one-step-ahead prediction for nonlinear time series. For a given nonlinear time series {y(t) ∈ ℜ, t = 1, 2, …, N}, the purpose of time series modeling is to build a function f with satisfactory prediction accuracy, of the following form:

$$y(t) = f\left(y(t-1),\, y(t-2),\, \ldots,\, y(t-p)\right) + \varepsilon(t) \tag{1}$$

where f(·) is the unknown nonlinear map, p is the order of the model, and ε(t) is Gaussian white noise. Many types of functions have been applied to approximate the unknown nonlinear mapping f(·) in model (1) [13]. The following state-dependent AR (SD-AR) model proposed by Priestley [51] is adopted in this paper:

$$y(t) = \phi_0\left(\mathbf{W}(t-1)\right) + \sum_{i=1}^{p} \phi_i\left(\mathbf{W}(t-1)\right) y(t-i) + \varepsilon(t) \tag{2}$$

where W(t − 1) = (y(t − 1), y(t − 2), …, y(t − p))ᵀ is the state vector at time t − 1, and {φⱼ(W(t − 1)), j = 0, 1, …, p} are the state-dependent functional coefficients of the model. From SD-AR model (2), a local linearization of system (1) can easily be obtained by fixing the state-dependent coefficients of model (2). SD-AR model (2) can represent nonlinear and non-stationary systems, so it provides a very useful framework for general nonlinear time series modeling; the most challenging problem, however, is how to select the functional form of its coefficients. In this study, we use the deep belief network (DBN) to approximate the nonlinear functions φⱼ(W(t − 1)) in SD-AR model (2), and the resulting model is called the DBN-AR model.

2.2. Structure of DBN-AR model

Finding suitable state-dependent coefficients of model (2) is a problem of function approximation from the multidimensional input space W(t − 1) to a one-dimensional space φ. In this paper, the deep belief network (DBN) with multiple restricted Boltzmann machines (RBMs) is applied to approximate each state-dependent coefficient of model (2); the structure of the DBN is shown in Fig. 1, where h is the hidden vector and v is the visible vector. The DBN was proposed by Hinton in 2006 [25] to simplify the inference problem of the logistic belief network, and it is one of the deep learning models. A DBN is constructed by stacking restricted Boltzmann machines (RBMs) [25]. To train a DBN, in the first stage the DBN is pre-trained in a greedy layer-wise unsupervised fashion starting from the lowest RBM, with the output of each RBM used as the input of the following RBM, and so on. In the second stage, all the DBN parameters are optimized by the back propagation (BP) algorithm. Choosing appropriate hyperparameters, such as the numbers of hidden layers and neurons, gives the DBN good approximation ability for a nonlinear function. The DBN has been successfully used in the prediction of transportation, power load, wind speed and stock markets, and it has achieved a certain prediction accuracy [10,26,45,65]. In this paper, a DBN is used to approximate each functional coefficient in model (2), and the input–output relationship of the DBN shown in Fig. 1 is described as follows:

$$\begin{cases}
\phi(\mathbf{W}(t-1)) = \varphi\!\left(w_{1}^{(N_r)} h^{(N_r-1)}(t) + b_{1}^{(N_r)}\right) \\
h^{(\ell)}(t) = \left(h_{1}^{(\ell)}(t),\, h_{2}^{(\ell)}(t),\, \ldots,\, h_{Q_\ell}^{(\ell)}(t)\right)^{\mathrm{T}}, \quad \ell \in \{1, 2, \ldots, N_r-1\} \\
h_{n_\ell}^{(\ell)}(t) = \varphi\!\left(w_{n_\ell}^{(\ell)} h^{(\ell-1)}(t) + b_{n_\ell}^{(\ell)}\right), \quad n_\ell \in \{1, 2, \ldots, Q_\ell\} \\
w_{n_\ell}^{(\ell)} = \left(w_{n_\ell,1}^{(\ell)},\, w_{n_\ell,2}^{(\ell)},\, \ldots,\, w_{n_\ell,Q_{\ell-1}}^{(\ell)}\right), \quad Q_0 = p \\
h^{(0)}(t) = \mathbf{W}(t-1) = \left(y(t-1),\, y(t-2),\, \ldots,\, y(t-p)\right)^{\mathrm{T}}
\end{cases} \tag{3}$$

where w_{n_ℓ}^{(ℓ)} denotes the weight matrix between layer ℓ and layer ℓ − 1, (b₁^{(ℓ)}, b₂^{(ℓ)}, …, b_{Q_ℓ}^{(ℓ)}) are the biases of layer ℓ, Q_ℓ is the number of nodes in layer ℓ, N_r is the total number of layers, h^{(ℓ)}(t) is the output of layer ℓ, φ(W(t − 1)) is the output of the DBN, ϕ(x) = 1/(1 + e^{−x}) is the sigmoid activation function, and W(t − 1) is the input state vector.

By using a set of DBNs to approximate the functional coefficients of model (2), the DBN-AR model is obtained. The block diagram of the DBN-AR model is depicted in Fig. 2; the relationship between the input and the output of the DBN-AR model is represented as follows:

$$\begin{cases}
y(t) = \phi_0(\mathbf{W}(t-1)) + \sum_{i=1}^{p} \phi_i(\mathbf{W}(t-1))\, y(t-i) + \varepsilon(t) \\
\phi_j(\mathbf{W}(t-1)) = \varphi\!\left(u_{1,j}^{(N_r^{(j)})}(t)\right) = \varphi\!\left(w_{1,j}^{(N_r^{(j)})} h_j^{(N_r^{(j)}-1)}(t) + b_{1,j}^{(N_r^{(j)})}\right), \quad j \in \{0, 1, 2, \ldots, p\} \\
h_j^{(\ell_j)}(t) = \left(h_{1,j}^{(\ell_j)}(t),\, h_{2,j}^{(\ell_j)}(t),\, \ldots,\, h_{Q_{\ell_j}^{(j)},j}^{(\ell_j)}(t)\right)^{\mathrm{T}}, \quad \ell_j \in \{1, 2, \ldots, N_r^{(j)}-1\} \\
h_{n_{\ell_j}^{(j)},j}^{(\ell_j)}(t) = \varphi\!\left(u_{n_{\ell_j}^{(j)},j}^{(\ell_j)}(t)\right) = \varphi\!\left(w_{n_{\ell_j}^{(j)},j}^{(\ell_j)} h_j^{(\ell_j-1)}(t) + b_{n_{\ell_j}^{(j)},j}^{(\ell_j)}\right), \quad n_{\ell_j}^{(j)} \in \{1, 2, \ldots, Q_{\ell_j}^{(j)}\} \\
w_{n_{\ell_j}^{(j)},j}^{(\ell_j)} = \left(w_{n_{\ell_j}^{(j)},1,j}^{(\ell_j)},\, w_{n_{\ell_j}^{(j)},2,j}^{(\ell_j)},\, \ldots,\, w_{n_{\ell_j}^{(j)},Q_{\ell_j-1}^{(j)},j}^{(\ell_j)}\right), \quad Q_0^{(j)} = p \\
h_j^{(0)}(t) = \mathbf{W}(t-1) = \left(y(t-1),\, y(t-2),\, \ldots,\, y(t-p)\right)^{\mathrm{T}}
\end{cases} \tag{4}$$

where w_{n,j}^{(ℓⱼ)} represents the weight matrix between layer ℓⱼ and layer ℓⱼ − 1 in the jth DBN module, Q_{ℓⱼ}^{(j)} is the number of nodes in layer ℓⱼ of the jth DBN module, h_j^{(ℓⱼ)}(t) is the output value of the ℓⱼth hidden layer in the jth DBN module, (b_{1,j}^{(ℓⱼ)}, b_{2,j}^{(ℓⱼ)}, …) are the biases of the ℓⱼth hidden layer in the jth DBN module, and φⱼ(W(t − 1)) is the output of the jth DBN module, i.e. the jth state-dependent coefficient of DBN-AR model (4).
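To make the structure of model (4) concrete, the following sketch computes one one-step-ahead prediction: each functional coefficient φⱼ is a small stack of sigmoid layers evaluated on the state vector W(t − 1), and the coefficient outputs are combined through the AR regression. This is an illustrative NumPy rendering of Eq. (4), not the authors' code; all function and variable names are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dbn_coefficient(state, weights, biases):
    """phi_j(W(t-1)) in Eq. (4): a stack of sigmoid layers whose final
    layer has a single unit, evaluated on the state vector."""
    h = state
    for W, b in zip(weights, biases):
        h = sigmoid(W @ h + b)
    return h.item()  # scalar state-dependent coefficient

def dbn_ar_predict(y_lags, modules):
    """One-step-ahead prediction of Eq. (4); y_lags = (y(t-1), ..., y(t-p)),
    modules[j] = (weights, biases) of the jth DBN module."""
    state = np.asarray(y_lags, dtype=float)  # W(t-1)
    phi = [dbn_coefficient(state, W, b) for W, b in modules]
    # phi_0 plus the state-dependent AR combination of the lags
    return phi[0] + sum(phi[i] * state[i - 1] for i in range(1, len(modules)))
```

Freezing the list `phi` for a fixed state vector yields exactly the locally linearized AR model discussed in the text.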

In fact, DBN-AR model (4) with DBN-type coefficients has an autoregressive structure, which is similar to a linear AR model structure at each operating point obtained by fixing W(t − 1). From DBN-AR model (4), it is obvious that the model includes a DBN module φ₀, so the DBN-AR model is a more general nonlinear model description compared with a single DBN. DBN-AR model (4) distributes the complexity of the problem over the regression parts of the AR model, so the prediction accuracy of the DBN-AR model may be higher than that of a single DBN or a linear AR model. This will be verified by the experiments given in Section 5. Estimation of the DBN-AR model structure and its parameters is presented in the following section.

Fig. 1. Structure of the DBN.

Fig. 2. Structure of the DBN-AR model.

3. Identification of DBN-AR model

3.1. Determining structure of DBN-AR model

Identification of DBN-AR model (4) mainly includes the selection of the order and, once the model structure is determined, the estimation of all the parameters. The Akaike Information Criterion (AIC) is used as the selection criterion for the DBN-AR model order and the architecture of the DBNs; the AIC has also been used in the estimation of the RBF-AR model [13,54]. The AIC is defined as follows:

$$\mathrm{AIC} = N \log \delta_e^2 + 2(s+1), \quad N \gg p \tag{5}$$

where δe² is the modeling residual variance under the chosen orders and DBN structure, p is the order of the model, s is the total number of parameters to be estimated, and N is the length of the observation data. In this paper, the final model order and the numbers of layers and hidden nodes of the DBNs are determined according to the minimum of the AIC value. The DBN-AR model learning is likewise divided into two processes, namely pre-training and local weight adjustment. First, a pseudo-inverse matrix is built to obtain the target values of each DBN module in DBN-AR model (4); then, each DBN module is trained by combining the unsupervised pre-training method with the supervised optimization method proposed by Hinton [25]. Finally, the entire set of DBN-AR model parameters is fine tuned by the back propagation (BP) algorithm designed for fine-tuning the DBN-AR model. Given the model order, the parameter optimization process of the DBN-AR model is presented in the following subsections.
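As a sketch of how criterion (5) can drive the structure search, the fragment below scores candidate orders by AIC; `fit_dbn_ar` is a hypothetical helper, assumed to return the training residuals and the parameter count s for each candidate. The same pattern applies to searching over the layer and node counts of the DBNs.

```python
import numpy as np

def aic(residuals, s):
    """Eq. (5): AIC = N * log(residual variance) + 2 * (s + 1)."""
    N = len(residuals)
    return N * np.log(np.var(residuals)) + 2 * (s + 1)

def select_order(y, candidate_orders, fit_dbn_ar):
    """Pick the order with minimum AIC; fit_dbn_ar(y, p) is assumed to
    return (training residuals, total parameter count s) for order p."""
    scores = {p: aic(*fit_dbn_ar(y, p)) for p in candidate_orders}
    return min(scores, key=scores.get)
```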


Fig. 3. Flow-chart of DBN-AR modeling process.

3.2. DBN-AR model parameters pre-training

Before pre-training DBN-AR model (4), we first normalize the original data to [0, 1], and then assign the reference output series φⱼ(W(t − 1)), j ∈ {0, 1, …, p}, t ∈ {p, p + 1, …, N − 1} of the DBNs in the model based on the least squares solution with minimum norm and the pseudo-inverse matrix. To this end, for the normalized data {y(i), y(i + 1), …, y(p + i)}, according to DBN-AR model (4) we calculate the reference outputs {φ₀, φ₁, …, φₚ} of the DBNs at sample instant p + i − 1 by solving, in the minimum-norm least squares sense, the following equation:

$$y(p+i) = M_{p+i-1}\, \Phi_{p+i-1}, \quad i \in \{1, 2, \ldots, N-p\} \tag{6}$$

where

$$\begin{cases}
M_{p+i-1} = \left(1,\, y(p+i-1),\, y(p+i-2),\, \ldots,\, y(i)\right) \\
\Phi_{p+i-1} = \left(\phi_0(\mathbf{W}(p+i-1)),\, \phi_1(\mathbf{W}(p+i-1)),\, \phi_2(\mathbf{W}(p+i-1)),\, \ldots,\, \phi_p(\mathbf{W}(p+i-1))\right)^{\mathrm{T}}
\end{cases}$$

Φ_{p+i−1} is used as the vector of reference outputs of the DBNs at time p + i − 1, and M_{p+i−1} contains the corresponding regression terms of model (4) at time p + i − 1. The least squares solution with minimum norm of Eq. (6), i.e. the target values of the DBNs, can be obtained by

$$\Phi_{p+i-1} = M^{+}_{p+i-1}\, y(p+i) \tag{7}$$

where M⁺_{p+i−1} is the pseudo-inverse matrix of M_{p+i−1}.
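Since each M_{p+i−1} is a single row, its pseudo-inverse has a closed form and Eq. (7) is one line of linear algebra per sample. A minimal sketch of the target-value generation (ours, not the paper's code):

```python
import numpy as np

def reference_outputs(y, p):
    """Minimum-norm targets of Eqs. (6)-(7). For each sample instant, the row
    M = (1, y(t-1), ..., y(t-p)) gives Phi = pinv(M) @ y(t), the smallest-norm
    solution of the underdetermined equation y(t) = M @ Phi."""
    targets = []
    for t in range(p, len(y)):
        M = np.concatenate(([1.0], y[t - 1::-1][:p]))[None, :]  # 1 x (p+1) row
        phi = (np.linalg.pinv(M) @ np.array([y[t]])).ravel()    # (p+1,) vector
        targets.append(phi)
    return np.array(targets)  # row k: (phi_0, ..., phi_p) used as DBN targets
```

Because Eq. (6) has p + 1 unknowns but only one equation per instant, the pseudo-inverse picks the smallest-norm Φ, which keeps the initial coefficient targets well scaled for the subsequent DBN training.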

Next, we train the DBNs in model (4) using the learning algorithm proposed by Hinton [25] and the target values of the DBNs given in (7). The prediction output of DBN-AR model (4) in its pre-training stage is then obtained as follows:

$$\hat{y}(t) = \hat{\phi}_0(\mathbf{W}(t-1)) + \sum_{i=1}^{p} \hat{\phi}_i(\mathbf{W}(t-1))\, y(t-i), \quad t = p+1, p+2, \ldots, N \tag{8}$$

where ŷ(t) is the predicted output, and φ̂ⱼ(W(t − 1)) (j = 0, 1, …, p) are the output values of the DBN modules after the training of the DBNs is finished.
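The layer-wise learning cited from Hinton [25] trains each RBM in a module by contrastive divergence. The following sketch shows one CD-1 update for a single RBM with sigmoid units; it is an illustrative reduction of that procedure, not the exact routine used by the authors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_h, b_v, lr=0.01, rng=np.random.default_rng(0)):
    """One contrastive-divergence (CD-1) step for an RBM: positive phase,
    one Gibbs reconstruction, negative phase, then the weight/bias update."""
    p_h0 = sigmoid(W @ v0 + b_h)                  # positive-phase hidden probabilities
    h0 = (rng.random(p_h0.shape) < p_h0) * 1.0    # sampled hidden states
    v1 = sigmoid(W.T @ h0 + b_v)                  # mean-field reconstruction
    p_h1 = sigmoid(W @ v1 + b_h)                  # negative phase
    W += lr * (np.outer(p_h0, v0) - np.outer(p_h1, v1))
    b_h += lr * (p_h0 - p_h1)
    b_v += lr * (v0 - v1)
    return W, b_h, b_v
```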

3.3. Fine tuning of DBN-AR model

After the pre-training of DBN-AR model (4), which is presented in Section 3.2, we obtain the modeling error of the pre-trained model as follows:

$$\xi(t) = y(t) - \hat{y}(t), \quad t = p+1, p+2, \ldots, N \tag{9}$$

Next, using ξ(t) we fine tune the parameters of DBN-AR model (4) by the BP algorithm given in the Appendix until the mean square deviation $\frac{1}{N-p}\sum_{t=p+1}^{N}\xi^2(t)$ reaches a minimum. The objective function in the fine-tuning stage is designed as follows:

$$E(t) = \frac{1}{2}\xi^2(t) = \frac{1}{2}\left(y(t) - \hat{y}(t)\right)^2 = \frac{1}{2}\left(y(t) - \hat{\phi}_0(\mathbf{W}(t-1)) - \sum_{i=1}^{p}\hat{\phi}_i(\mathbf{W}(t-1))\, y(t-i)\right)^2, \quad t = p+1, p+2, \ldots, N \tag{10}$$

where y(t) and ŷ(t) are the actual output and predicted output of DBN-AR model (4), respectively. For all the training data {y(t), t = 1, 2, …, N}, we use Eq. (A.20) to fine-tune the parameters of DBN-AR model (4) (see Appendix), and then recalculate the predictive output of the DBN-AR model using the final updated model parameters as follows:

$$\tilde{y}(t) = \tilde{\phi}_0(\mathbf{W}(t-1)) + \sum_{i=1}^{p} \tilde{\phi}_i(\mathbf{W}(t-1))\, y(t-i) \tag{11}$$

If the mean square error, $\mathrm{MSE} = \frac{1}{N-p}\sum_{t=p+1}^{N}\left(y(t)-\tilde{y}(t)\right)^2$, is small enough, we stop the parameter fine-tuning process; otherwise, we continue to fine tune the parameters using Eq. (A.20), letting the parameters obtained in the previous update be the initial parameters, until the requirement is met. Finally, ỹ(t) is denormalized to obtain the final prediction of the output signal. The procedure of the parameter estimation of DBN-AR model (4) is summarized in Algorithm 1, and the flow chart of the DBN-AR modeling process is depicted in Fig. 3.
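Schematically, the fine-tuning stage amounts to the loop below, where `bp_update` stands for one pass of the gradient rules of the Appendix (Eq. (A.20) is assumed rather than reproduced here) and `model.predict` is a hypothetical one-step predictor in the sense of Eq. (11); the stop test mirrors the MSE condition above.

```python
import numpy as np

def fine_tune(y, p, model, bp_update, max_iter=100, tol=1e-8):
    """Fine-tuning loop of Section 3.3: alternate BP parameter updates with
    re-evaluation of the one-step predictions until the MSE settles."""
    last_mse = np.inf
    for _ in range(max_iter):
        preds = np.array([model.predict(y[t - 1::-1][:p]) for t in range(p, len(y))])
        xi = y[p:] - preds                  # modeling error, Eq. (9)
        mse = np.mean(xi ** 2)
        if last_mse - mse < tol:            # MSE small/stable: stop, cf. Section 3.3
            break
        last_mse = mse
        bp_update(model, y, xi)             # one pass of the Appendix BP rules
    return model
```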

4. Forecast error measures

To evaluate the modeling results of different models reasonably, under the same conditions, the modeling results can be analyzed by several evaluation criteria. In this paper, the mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), sum of squared errors (SSE) and normalized mean squared error (NMSE) are used as the evaluation criteria to analyze the modeling performance of different models or methods; they are given as follows:

$$\mathrm{MSE} = \frac{1}{N-p}\sum_{t=p+1}^{N}\left(y(t)-\hat{y}(t)\right)^2 \tag{12}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N-p}\sum_{t=p+1}^{N}\left(y(t)-\hat{y}(t)\right)^2} \tag{13}$$

$$\mathrm{MAE} = \frac{1}{N-p}\sum_{t=p+1}^{N}\left|y(t)-\hat{y}(t)\right| \tag{14}$$

$$\mathrm{MAPE} = \frac{1}{N-p}\sum_{t=p+1}^{N}\left|\frac{y(t)-\hat{y}(t)}{y(t)}\right| \times 100\% \tag{15}$$

$$\mathrm{SSE} = \sum_{t=p+1}^{N}\left(y(t)-\hat{y}(t)\right)^2 \tag{16}$$

$$\mathrm{NMSE} = \frac{\sum_{t=p+1}^{N}\left(y(t)-\hat{y}(t)\right)^2}{\sum_{t=p+1}^{N}\left(y(t)-\bar{y}\right)^2} \tag{17}$$

where y(t) is the actual value, ŷ(t) is the predicted value, ȳ is the average value of the observed data, p is the order of the model, and N is the length of the data. For all of these indexes, a lower value indicates higher prediction accuracy. MSE, RMSE, MAE, SSE and NMSE measure absolute error, and MAPE measures relative error. These six indexes are used as references for evaluating a prediction model.
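The six criteria (12)–(17) translate directly into code; a minimal NumPy version (ours) over the evaluated samples t = p + 1, …, N:

```python
import numpy as np

def error_measures(y_true, y_pred):
    """Eqs. (12)-(17): y_true and y_pred are the actual and predicted
    values on the evaluated samples t = p+1, ..., N."""
    e = y_true - y_pred
    mse = np.mean(e ** 2)
    return {
        "MSE":  mse,
        "RMSE": np.sqrt(mse),
        "MAE":  np.mean(np.abs(e)),
        "MAPE": np.mean(np.abs(e / y_true)) * 100.0,   # relative error, in percent
        "SSE":  np.sum(e ** 2),
        "NMSE": np.sum(e ** 2) / np.sum((y_true - np.mean(y_true)) ** 2),
    }
```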

5. Case studies

To verify the validity of the proposed DBN-AR model, we provide a comprehensive experimental evaluation of the model. Modeling problems of the daily electricity generation of the right bank power station of the Three Gorges dam, the sunspot data [63], the electricity load data from the Australian Energy Market Operator (AEMO) [64] and the weekly British Pound/US dollar (GBP/USD) exchange rate data [10] are studied in this section. The modeling results are computed on a PC with an Intel i7-3770 CPU, 3.4 GHz and 8 GB RAM, using Matlab 2010b.

5.1. Prediction of daily electricity generation of the Three Gorges dam

In this subsection, the proposed DBN-AR model is used to characterize the daily electricity generation data of the Three Gorges dam right bank hydropower station. The normalized data, sampled from 1/1/2008 to 31/12/2009, are depicted in Fig. 4. There are 731 data points, of which the first 670 are used to train the DBN-AR model and the last 61 are used to test it. In order to determine the order and the structure of the DBN-AR model, we set the numbers of iterations in the training of each DBN module and in the fine-tuning of the DBN-AR model to 500 and 100, respectively. By comparing the AIC values of the estimated DBN-AR model under different orders, the model order is finally chosen as p = 4. The following DBN-AR model is used to compute the one-step-ahead prediction output of this nonlinear time series:

$$\begin{cases}
y(t) = \phi_0(\mathbf{W}(t-1)) + \sum_{i=1}^{4} \phi_i(\mathbf{W}(t-1))\, y(t-i) + \varepsilon(t) \\
\phi_j(\mathbf{W}(t-1)) = \varphi\!\left(u_{1,j}^{(N_r^{(j)})}(t)\right) = \varphi\!\left(w_{1,j}^{(N_r^{(j)})} h_j^{(N_r^{(j)}-1)}(t) + b_{1,j}^{(N_r^{(j)})}\right), \quad j \in \{0, 1, 2, 3, 4\} \\
h_j^{(\ell_j)}(t) = \left(h_{1,j}^{(\ell_j)}(t),\, \ldots,\, h_{Q_{\ell_j}^{(j)},j}^{(\ell_j)}(t)\right)^{\mathrm{T}}, \quad \ell_j \in \{1, \ldots, N_r^{(j)}-1\} \\
h_{n_{\ell_j}^{(j)},j}^{(\ell_j)}(t) = \varphi\!\left(u_{n_{\ell_j}^{(j)},j}^{(\ell_j)}(t)\right) = \varphi\!\left(w_{n_{\ell_j}^{(j)},j}^{(\ell_j)} h_j^{(\ell_j-1)}(t) + b_{n_{\ell_j}^{(j)},j}^{(\ell_j)}\right), \quad n_{\ell_j}^{(j)} \in \{1, \ldots, Q_{\ell_j}^{(j)}\} \\
w_{n_{\ell_j}^{(j)},j}^{(\ell_j)} = \left(w_{n_{\ell_j}^{(j)},1,j}^{(\ell_j)},\, \ldots,\, w_{n_{\ell_j}^{(j)},Q_{\ell_j-1}^{(j)},j}^{(\ell_j)}\right), \quad Q_0^{(j)} = 4 \\
h_j^{(0)}(t) = \mathbf{W}(t-1) = \left(y(t-1),\, y(t-2),\, y(t-3),\, y(t-4)\right)^{\mathrm{T}}
\end{cases} \tag{18}$$

Fig. 4. Normalized daily electricity generation data of the right bank power station of the Three Gorges dam.

Table 1. Structure parameters of DBN modules in model (18).

| j | Parameters of the jth DBN module |
|---|---|
| 0 | N_r^{(0)} = 2; Q_0^{(0)} = 4; Q_1^{(0)} = 7; Q_2^{(0)} = 1 |
| 1 | N_r^{(1)} = 2; Q_0^{(1)} = 4; Q_1^{(1)} = 9; Q_2^{(1)} = 1 |
| 2 | N_r^{(2)} = 2; Q_0^{(2)} = 4; Q_1^{(2)} = 5; Q_2^{(2)} = 1 |
| 3 | N_r^{(3)} = 3; Q_0^{(3)} = 4; Q_1^{(3)} = 15; Q_2^{(3)} = 35; Q_3^{(3)} = 1 |
| 4 | N_r^{(4)} = 2; Q_0^{(4)} = 4; Q_1^{(4)} = 9; Q_2^{(4)} = 1 |

The structure parameters of each DBN module in model (18) are shown in Table 1. In the pre-training stage of the DBN-AR model, the target values of each DBN module in (18) are determined according to Eq. (7), and each DBN module is then tuned by unsupervised training followed by supervised training. The target values and predicted values of the estimated DBN modules in the pre-training stage of model (18) for the testing data are depicted in Fig. 5, which shows that the predicted values fit the target values well in the pre-training stage of model (18).

Fig. 5. Comparison of target values and predicted values of the estimated DBN modules for testing data.

After finishing the training of all the DBN modules, we fine tune DBN-AR model (18) using Eq. (9) and Eqs. (A.1)–(A.20) (see Appendix). In the iterative optimization of the model parameters in the fine-tuning stage, the MSE variation for the training data is given in Fig. 6, which shows that the MSE tends to become stable as the number of iterations increases. On the other hand, all the connection weights and biases of model (18) also converge in the iterative optimization process, as seen in Fig. 7, which depicts the variation of one connection weight between the input layer and the first hidden layer in the 0th DBN module during the iterative optimization for the training data. This illustrates that the fine-tuning method proposed in this paper is feasible.

Fig. 6. MSE in the iterative optimization process for the training data.

Fig. 7. Variation of a connection weight in the first RBM.

Comparison results of the actual values and the predicted values of the estimated DBN-AR model (18) for the testing data are given in Fig. 8, which shows that the proposed DBN-AR model possesses good prediction accuracy for this time series. To illustrate the superiority of DBN-AR modeling compared with other modeling methods, Table 2 gives, for different models (a BP network, the RBF-AR model, a single DBN and a linear AR model), the comparison of forecasting accuracy for the testing data together with the computing time and memory used for the training data. Table 2 shows that the prediction results of the DBN-AR model are better than those of the other four models. This is because the DBN-AR model not only has the superiority of the DBN in function approximation, but also has the nonlinear behavior description ability of the SD-AR model. In general, because of the complexity of training the DBN modules, the training time and the memory used by the DBN-AR model may be larger than those of other models when the DBN modules have many hidden layers, as seen in Table 2.

Table 2. Comparison of forecasting accuracy for the testing data, and of computing time and memory used for the training data, for different models.

| Method | p | Time | Memory | NMSE | MSE | MAE | MAPE | SSE |
|---|---|---|---|---|---|---|---|---|
| BP | 4 | 47 min | 135.8 MB | 6.76 × 10−1 | 4.18 × 10−4 | 7.62 × 10−1 | 2.86 × 10−2 | 2.38 × 10−2 |
| RBF-AR | 4 | 1.78 min | 228.5 MB | 6.74 × 10−1 | 4.19 × 10−4 | 7.30 × 10−1 | 2.85 × 10−2 | 2.34 × 10−2 |
| DBN | 4 | 12.4 min | 450.1 MB | 6.78 × 10−1 | 4.21 × 10−4 | 6.82 × 10−1 | 2.75 × 10−2 | 2.40 × 10−2 |
| AR | 4 | 0.031 s | 161.6 MB | 6.81 × 10−1 | 4.24 × 10−4 | 7.87 × 10−1 | 2.75 × 10−2 | 2.41 × 10−2 |
| DBN-AR | 4 | 14.1 min | 2.3 G | 6.56 × 10−1 | 3.94 × 10−4 | 6.64 × 10−1 | 2.69 × 10−2 | 2.25 × 10−2 |

Table 3. Average value and standard deviation of modeling error measures after the repetition experiment.

| Method | NMSE | MSE | MAE | MAPE | SSE |
|---|---|---|---|---|---|
| AR | 5.91 × 10−1 ± 1.08 × 10−1 | 2.17 × 10−4 ± 1.48 × 10−4 | 4.08 × 10−1 ± 2.56 × 10−1 | 2.01 × 10−2 ± 4.81 × 10−3 | 1.02 × 10−2 ± 9.25 × 10−3 |
| RBF-AR | 5.75 × 10−1 ± 1.04 × 10−1 | 2.01 × 10−4 ± 1.91 × 10−4 | 3.84 × 10−1 ± 3.27 × 10−1 | 1.95 × 10−2 ± 7.21 × 10−3 | 5.75 × 10−2 ± 1.17 × 10−2 |
| DBN | 1.01 ± 2.81 × 10−1 | 4.68 × 10−4 ± 3.51 × 10−4 | 5.59 × 10−1 ± 2.49 × 10−1 | 3.86 × 10−2 ± 2.44 × 10−2 | 1.64 × 10−2 ± 1.01 × 10−2 |
| BP | 8.83 × 10−1 ± 3.83 × 10−1 | 5.51 × 10−4 ± 5.19 × 10−4 | 7.69 × 10−1 ± 4.69 × 10−1 | 3.57 × 10−2 ± 1.21 × 10−3 | 2.67 × 10−2 ± 1.52 × 10−3 |
| DBN-AR | 5.15 × 10−1 ± 1.14 × 10−1 | 1.81 × 10−4 ± 1.47 × 10−4 | 3.24 × 10−1 ± 2.26 × 10−1 | 1.56 × 10−2 ± 6.28 × 10−3 | 8.21 × 10−3 ± 8.89 × 10−3 |

Fig. 8. Comparison results of real data and its prediction for the testing data.

Fig. 9. Bar chart of the average value and standard deviation of the NMSE and MAE indexes of each model after the repetition experiment for the ten datasets.

Fig. 10. Normalized sunspot time series.

Besides, every functional coefficient of a DBN-AR model is a DBN, so training the model parameters may take more time and more memory than for other models. However, the training of the DBN-AR model parameters is performed offline, so in most cases the practical use of the DBN-AR model is not affected by the long offline training time. The memory usage shown in Table 2 is that of the Matlab program for training the model, taken from the Matlab Workspace during the training process. As noted above, all results in this paper were computed on a PC with an Intel i7-3770 CPU, 3.4 GHz and 8 GB RAM.

When comparing against modeling results reported in the literature for an actual time series, and to show the forecasting performance of a modeling method, one usually selects the model order and parameters that give the best prediction result under the same experimental conditions (using the same training data and testing data). This is a common approach in modeling and is often used in the literature; we give such comparison study results in Sections 5.2–5.4. On the other hand, to show the statistical characteristics of the modeling results, we also give a repetition experiment result. To this end, a random point in time is chosen from the original 731 data points shown in Fig. 4; the preceding data form the set used for training the model, while the following data set is used for testing, as done in [66]. In this way, 10 new data sets are obtained, and the repetition experiment modeling results given in Table 3 and Fig. 9 are then obtained using the AR, BP, DBN, RBF-AR and DBN-AR models, respectively. In Table 3, each entry gives the average value and the corresponding standard deviation of the modeling error measure of each model. It can be seen from Table 3 and Fig. 9 that the proposed DBN-AR modeling method gives the smallest modeling error.

5.2. Prediction of sunspot time series

The prediction of sunspot activity is a very important and challenging research topic [47]. Sunspot time series reflect solar activity, which has an impact on the earth, weather changes, and space test missions; the prediction of sunspots therefore has high research value. However, due to the complexity of the system itself and the lack of proper mathematical models, it is difficult to study sunspots. The monthly smoothed sunspot time series data are obtained from the World Data Center for the Sunspot Index (SIDC) [63]. For fair comparison, this paper selects the same sunspot time series as used in [31–35,47–50], which is sampled from November 1834 to June 2001 and has 2000 data points. The data are normalized to [0, 1] and depicted in Fig. 10. As also done in [31–35,47–50], the first 1000 points of the data are used as training data, the remaining 1000 points are used as testing data, and the order of the DBN-AR model is five. That is, W(t − 1) = (y(t − 1), y(t − 2), …, y(t − 5))ᵀ is used as the state of the DBN-AR model, and the structure of the DBN-AR model is as follows:

$$\begin{cases}
y(t) = \phi_0(\mathbf{W}(t-1)) + \sum_{i=1}^{5} \phi_i(\mathbf{W}(t-1))\, y(t-i) + \varepsilon(t) \\
\phi_j(\mathbf{W}(t-1)) = \varphi\!\left(u_{1,j}^{(N_r^{(j)})}(t)\right) = \varphi\!\left(w_{1,j}^{(N_r^{(j)})} h_j^{(N_r^{(j)}-1)}(t) + b_{1,j}^{(N_r^{(j)})}\right), \quad j \in \{0, 1, \ldots, 5\} \\
h_j^{(\ell_j)}(t) = \left(h_{1,j}^{(\ell_j)}(t),\, \ldots,\, h_{Q_{\ell_j}^{(j)},j}^{(\ell_j)}(t)\right)^{\mathrm{T}}, \quad \ell_j \in \{1, \ldots, N_r^{(j)}-1\} \\
h_{n_{\ell_j}^{(j)},j}^{(\ell_j)}(t) = \varphi\!\left(u_{n_{\ell_j}^{(j)},j}^{(\ell_j)}(t)\right) = \varphi\!\left(w_{n_{\ell_j}^{(j)},j}^{(\ell_j)} h_j^{(\ell_j-1)}(t) + b_{n_{\ell_j}^{(j)},j}^{(\ell_j)}\right), \quad n_{\ell_j}^{(j)} \in \{1, \ldots, Q_{\ell_j}^{(j)}\} \\
w_{n_{\ell_j}^{(j)},j}^{(\ell_j)} = \left(w_{n_{\ell_j}^{(j)},1,j}^{(\ell_j)},\, \ldots,\, w_{n_{\ell_j}^{(j)},Q_{\ell_j-1}^{(j)},j}^{(\ell_j)}\right), \quad Q_0^{(j)} = 5 \\
h_j^{(0)}(t) = \mathbf{W}(t-1) = \left(y(t-1),\, y(t-2),\, \ldots,\, y(t-5)\right)^{\mathrm{T}}
\end{cases} \tag{19}$$

In the pre-training stage of model (19), the target value of each DBN module in (19) is computed using Eq. (7), and each DBN module is tuned by unsupervised training followed by supervised training. The structure parameters of each DBN module in DBN-AR model (19) are shown in Table 4, and the number of iterations in the training of each DBN module is 1.0 × 10⁵.

Table 4. Structure parameters of DBN-AR model (19).

| j | Parameters of the jth DBN module |
|---|---|
| 0 | N_r^{(0)} = 3; Q_0^{(0)} = 5; Q_1^{(0)} = 9; Q_2^{(0)} = 25; Q_3^{(0)} = 1 |
| 1 | N_r^{(1)} = 3; Q_0^{(1)} = 5; Q_1^{(1)} = 9; Q_2^{(1)} = 15; Q_3^{(1)} = 1 |
| 2 | N_r^{(2)} = 3; Q_0^{(2)} = 5; Q_1^{(2)} = 25; Q_2^{(2)} = 55; Q_3^{(2)} = 1 |
| 3 | N_r^{(3)} = 4; Q_0^{(3)} = 5; Q_1^{(3)} = 15; Q_2^{(3)} = 35; Q_3^{(3)} = 55; Q_4^{(3)} = 1 |
| 4 | N_r^{(4)} = 3; Q_0^{(4)} = 5; Q_1^{(4)} = 15; Q_2^{(4)} = 55; Q_3^{(4)} = 1 |
| 5 | N_r^{(5)} = 3; Q_0^{(5)} = 5; Q_1^{(5)} = 15; Q_2^{(5)} = 25; Q_3^{(5)} = 1 |

The comparison between the target data and the predicted values of the DBN modules after the pre-training of DBN-AR model (19) for the testing data is given in Fig. 11, which shows that the predicted values of the estimated DBN modules in model (19) fit their target values well in the pre-training stage.

Fig. 11. Comparison between target data and predicted values of DBN modules after the pre-training of DBN-AR model for testing data.

After finishing the training of all the DBN modules in model (19), we fine tune DBN-AR model (19) using Eq. (9) and Eqs. (A.1)–(A.20) (see Appendix); the number of iterations in the fine-tuning stage is 1.0 × 10⁴. Comparison results of the actual values and the predicted values of the estimated DBN-AR model (19) for the testing data are given in Fig. 12, which shows that the estimated DBN-AR model achieves good prediction accuracy, especially near the peaks and valleys of the time series, whereas the prediction result given in [47] is poor near the peaks and valleys. Fig. 13 shows the prediction error and its histogram for the estimated DBN-AR model (19) on the testing data; the error histogram shows an obvious Gaussian distribution, which verifies the success of the DBN-AR modeling for this time series. For comparison, Table 5 gives the prediction results of DBN-AR model (19) and of several models from the literature for the testing data; the performance indexes used are MSE, RMSE and NMSE. From Table 5, it can be seen that the prediction accuracy of the DBN-AR model is much better than that of the other models.

Table 5. Comparison of prediction results of different models for the sunspot testing data.

| Prediction method | MSE | RMSE | NMSE |
|---|---|---|---|
| WP-MLP [49] | Not provided | Not provided | 1.25 × 10−1 |
| McNish–Lincoln [34] | Not provided | Not provided | 8 × 10−2 |
| Sello-nonlinear method [32] | Not provided | Not provided | 3.4 × 10−1 |
| Waldmeier [32] | Not provided | Not provided | 5.6 × 10−1 |
| Denkmayr [35] | Not provided | Not provided | 1.85 |
| RBF-OLS [48] | Not provided | Not provided | 4.6 × 10−2 |
| LLNF-LoLiMot [48] | Not provided | Not provided | 3.2 × 10−2 |
| ERNN [31] | Not provided | 1.29 × 10−2 | 2.8 × 10−3 |
| MLP [33] | Not provided | Not provided | 9.79 × 10−2 |
| Hybrid Elman-NARX with residual [47] | 1.41 × 10−4 | 1.19 × 10−2 | 5.9 × 10−4 |
| SL-CCRNN [50] | Not provided | 1.66 × 10−2 | 1.47 × 10−3 |
| DBN-AR | 3.31 × 10−5 | 5.70 × 10−3 | 5.39 × 10−4 |

Fig. 12. Comparison between original data and predicted values of DBN-AR model (19) for testing data.

Fig. 13. Predictive error and its histogram for testing data.

5.3. Prediction of electric load time series from AEMO

In this subsection, the performance of the proposed DBN-AR model on the electricity load time series from the Australian Energy Market Operator (AEMO) [64] is evaluated by comparison with eleven benchmark modeling methods: Persistence [37], ANN [40], DBN [25], SVR [39], Long Short-Term Memory (LSTM) [29], Ensemble DBN (EDBN) [38], Random Forest (RF) [41], Empirical Mode Decomposition (EMD) based SVR (EMD-SVR) [42], EMD based ANN (EMD-ANN) [43], EMD based RF (EMD-RF) [37] and EMD based DBN (EMD-DBN) [37].

For fair comparison, the data sets of year 2013 from Tasmania (TAS), Victoria (VIC) and South Australia (SA) are chosen to train and test the proposed DBN-AR model. For TAS, the January, April, July and October data are used, to reflect the different seasons. For VIC, the January data are used; for SA, the January and October data are used. In each experiment, the first three weeks of data are used to train the model, and the remaining one week of data is used to test it [37]. The electricity load demand data from AEMO are sampled every half hour, which means there are 48 data points per day [36]; therefore, there are 1008 data points for training and 336 data points for testing [37]. In this paper, for one-day-ahead load demand forecasting of y(t), the input data are composed of the data points from y(t − 48) to y(t − 96), and y(t) is the output of the DBN-AR model, the same setting as in [37]. The structure parameters of the DBN-AR model are set as p = 48, N_r^{(j)} = 2, Q_0^{(j)} = 48, Q_1^{(j)} = 100, Q_2^{(j)} = 1 (j = 0, 1, …, p). The numbers of iterations in the training of each DBN module and in the fine-tuning of the DBN-AR model are set to 4000 and 5000, respectively. The prediction results of one-day-ahead load forecasting on the testing data are given in Table 6 for the estimated DBN-AR model and the eleven benchmark methods. Table 6 shows that the proposed DBN-AR model gives better prediction results than the other methods in most cases, which verifies the advantage of the DBN-AR model for this time series prediction task.
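The input construction just described, predicting y(t) from the half-hourly samples y(t − 48) through y(t − 96), amounts to the following windowing; the sketch and its names are illustrative, not the authors' preprocessing code. Note that, taken literally, the inclusive range y(t − 48), …, y(t − 96) gives 49 lags, while the paper sets Q_0^{(j)} = 48, so the endpoint convention here is an assumption.

```python
import numpy as np

def day_ahead_dataset(load):
    """Pair each target y(t) with the lagged inputs y(t-48), ..., y(t-96),
    i.e. the half-hourly samples from one to two days before t."""
    X, y = [], []
    for t in range(96, len(load)):
        X.append([load[t - k] for k in range(48, 97)])  # lags 48..96
        y.append(load[t])
    return np.array(X), np.array(y)
```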

Table 6. Results of one-day-ahead load forecasting for the testing data of the AEMO time series (each cell: RMSE / MAPE).

| Method | TAS Jan | TAS Apr | TAS Jul | TAS Oct | VIC Jan | SA Jan | SA Oct |
|---|---|---|---|---|---|---|---|
| Persistence [37] | 89.82 / 7.24% | 157.73 / 10.22% | 120.47 / 8.11% | 109.46 / 7.48% | 990.74 / 9.48% | 433.57 / 14.32% | 240.53 / 11.54% |
| SVR [39] | 60.97 / 4.81% | 111.89 / 7.48% | 90.99 / 5.89% | 79.45 / 5.55% | 587.98 / 7.16% | 337.10 / 13.34% | 210.72 / 8.94% |
| ANN [40] | 69.92 / 5.42% | 94.40 / 6.30% | 89.17 / 6.28% | 72.86 / 5.24% | 811.43 / 9.32% | 411.66 / 13.72% | 233.48 / 10.03% |
| DBN [25] | 63.96 / 4.98% | 93.81 / 6.12% | 87.30 / 6.04% | 75.73 / 5.15% | 915.21 / 8.79% | 401.25 / 13.62% | 204.16 / 9.33% |
| RF [41] | 65.90 / 4.77% | 92.64 / 6.10% | 90.48 / 6.17% | 69.80 / 4.63% | 739.65 / 8.77% | 349.87 / 13.41% | 218.30 / 9.11% |
| EDBN [38] | 60.68 / 4.82% | 109.78 / 7.28% | 85.19 / 6.04% | 80.81 / 5.05% | 762.16 / 9.14% | 363.49 / 14.43% | 203.53 / 9.32% |
| EMD-SVR [42] | 61.73 / 4.49% | 104.59 / 6.87% | 92.54 / 6.09% | 82.85 / 5.60% | 806.29 / 9.48% | 280.70 / 11.13% | 203.38 / 8.39% |
| EMD-ANN [43] | 63.38 / 4.87% | 87.41 / 5.92% | 82.92 / 5.50% | 80.85 / 5.63% | 781.17 / 9.07% | 397.66 / 13.80% | 199.77 / 8.54% |
| EMD-RF [37] | 58.51 / 4.67% | 86.61 / 5.80% | 81.34 / 5.54% | 73.86 / 4.88% | 783.58 / 9.32% | 288.85 / 13.03% | 209.70 / 8.22% |
| EMD-DBN [37] | 56.10 / 4.05% | 85.13 / 5.80% | 73.91 / 4.93% | 68.26 / 4.75% | 762.57 / 8.86% | 238.09 / 10.46% | 192.74 / 8.11% |
| LSTM [29] | 118.02 / 8.44% | 190.61 / 10.92% | 126.02 / 6.62% | 128.89 / 7.57% | 805.81 / 7.40% | 309.36 / 8.88% | 306.06 / 14.6% |
| DBN-AR | 54.81 / 3.66% | 84.42 / 4.82% | 73.70 / 3.59% | 63.37 / 3.39% | 698.17 / 6.37% | 224.12 / 10.32% | 159.69 / 7.80% |

5.4. Prediction of foreign currency exchange rate time series

To extend the testbeds to public datasets from other domains, the DBN-AR model is also used to forecast a financial time series. In this subsection we use the weekly British Pound/US dollar (GBP/USD) exchange rate data from the beginning of 1976 to the end of 1993, a total of 937 observations [10]. For fair comparison with the results in [10], we keep 885 observations as training data, and the remaining 52 observations are used as testing data, as done in [10]. The exchange rate time series is downloaded from the database retrieval system of the ''Pacific Exchange Rate Service'' (http://fx.sauder.ubc.ca/data.html) [10]. We use the raw GBP/USD exchange rate data, and the input order of the DBN-AR model is chosen as six, the same as in Ref. [10]. Table 7 gives the modeling results of the DBN-AR model and of several models from the literature [10] for the testing data; the performance indexes used are RMSE and MAPE. It can be seen from Table 7 that the prediction accuracy of the DBN-AR model is better than that of the other models. Therefore, DBN-AR modeling may be a promising method for exchange rate time series forecasting.

Table 7. Prediction results for the weekly GBP/USD exchange rate data.

| Model | RMSE | MAPE |
|---|---|---|
| RW [10] | 1.44 × 10−2 | 1.60 |
| ARMA [10] | 1.38 × 10−2 | 1.76 |
| DBN(1) [10] | 8.21 × 10−3 | 0.98 |
| DBN(2) [10] | 7.69 × 10−3 | 0.90 |
| FFNN [10] | 9.41 × 10−3 | 1.11 |
| DBN-AR | 7.30 × 10−3 | 0.81 |

5.5. Results and discussion

The forecasting results for the time series given in Tables 2, 3, 5, 6 and 7 and Figs. 5–8 and 11–13 demonstrate that the proposed DBN-AR model has very strong forecasting ability, from the point of view both of prediction accuracy and of the comparison results with the other models.

Thus, the proposed DBN-AR model is suitable for forecasting some complex time series. This is mainly because the DBN-AR model possesses both the advantage of the SD-AR model structure in representing nonlinear dynamic behavior and the merit of the DBN in approximating nonlinear functions.

The proposed DBN-AR model may be used to characterize a class of nonlinear systems that can be represented by the general SD-AR model (2) [51]. In this paper, a set of DBNs is used to approximate the state-dependent coefficients of SD-AR model (2) to build DBN-AR model (4), because a DBN may approximate any nonlinear function. The DBN is a generative neural network model with many hidden layers, together with a greedy layer-wise learning algorithm based on the training of a sequence of RBMs [10,25,67]. In DBN-AR model (4), the signal W(t − 1) on which the time-varying model coefficients depend may be the output signal, the input signal, or any other measured signal in the system under consideration. At any operating point, a locally linearized AR model may easily be obtained from DBN-AR model (4) by fixing the state vector W(t − 1) at time t. The DBN-AR model includes a DBN as one of its components, so it may be regarded as a more general nonlinear model compared with a single DBN, and the structure of each DBN in the DBN-AR model may be much simpler than that of a single DBN model, because the DBN-AR model partially disperses the complexity of the model into the AR part. The DBN-AR model has a quasi-linear AR model structure, and each of its functional coefficients is made up of a DBN module. This property is very useful for analyzing the local behavior of a nonlinear system and for applying a linear model-based control method to a nonlinear system, which cannot be done when a single DBN model or another nonlinear model, such as the Hammerstein model, is used to represent the nonlinear system. DBN-AR model (4) may also be conveniently applied in real-time control, because it avoids the need for on-line parameter estimation. However, the shortcoming of the proposed DBN-AR modeling approach is that the training time of the model parameters is quite long, as seen in Table 2. This is because of the complexity of training the DBN modules: the training time of the DBN-AR model may be longer than that of other models when the DBN modules have many hidden layers. Nevertheless, the training of the DBN-AR model parameters is performed offline, so in most cases the practical use of the DBN-AR model is not affected by the long offline training time.

6. Conclusions

Based on the deep belief network and the state-dependent AR model structure, the DBN-AR model was proposed in this paper to predict nonlinear time series. The DBN-AR model combines the advantage of the state-dependent AR model in nonlinear dynamics description with the strong approximation capability of the deep belief network for complex functions. For estimating the DBN-AR model, we proposed an approach to generating the target values of the DBN modules of the model, which are used in the pre-training stage. The target values are determined by the least squares solution with minimum norm and the pseudo-inverse matrix designed for estimation of the DBN modules; this allows each DBN module in the DBN-AR model to be trained using the greedy algorithm. All of the parameters of the DBN-AR model are finally fine tuned by the back propagation algorithm given in the Appendix, which is specially designed for fine-tuning the DBN-AR model. The formula derivation for fine-tuning the DBN-AR model given in this paper differs from that of the well-known single DBN model, because the DBN-AR model is essentially an AR-type model: it has a quasi-linear AR model structure and each of its functional coefficients is made up of a DBN module. In order to describe the fine-tuning process of the DBN-AR model clearly, the detailed fine-tuning steps are given. Through modeling four sets of complex time series and comparison studies, it was demonstrated that the proposed DBN-AR model has better prediction accuracy than some other models. Future research will study the structure optimization of the DBN-AR model to further improve its prediction accuracy, and will consider extending the DBN-AR model to a DBN-ARX model for modeling nonlinear input/output systems.

Acknowledgments

The authors would like to thank the editors and the anonymous referees for their valuable comments and suggestions, which substantially improved the original manuscript. This work was supported by the National Natural Science Foundation of China (61773402, 51575167, 61540037).

W. Xu, H. Peng, X. Zeng et al. / Applied Soft Computing Journal 77 (2019) 605–621

617 (j)

(

supported by the National Natural Science Foundation of China (61773402, 51575167, 61540037).

∂ E (t ) (

∂w

Appendix. Fine tuning procedure of DBN-AR model

(j)

∂ E (t ) ∂ξ (t ) ∂ yˆ (t ) = ∂ξ (t ) ∂ yˆ (t ) ∂ φˆ j (t )

)

Nr −1

(j) n (j) Nr

(

Using formula (9), the objective function in the fine tuning stage is designed as follows

∂h ×

)2 1( ξ 2 (t) = y(t) − yˆ (t) 2( 2 1

E(t) =

1

=

y(t) − φˆ 0 (W(t − 1)) −

2

∂u

)2

p ∑

φˆ i (W(t − 1)) y(t − i)

,

(

×w

∂ E (t ) ∂ξ (t ) ∂ yˆ (t ) = ∂ξ (t ) ∂ yˆ (t ) ∂ φˆ j (t )

(j)

Nr

∂w

(j) 1 ,n (j)

(

(j) Nr

,j

(

(j) Nr

(

(j)

= ϕ

(t ) h

N r −1

(j) n (j)

,j

Nr −1

Nr −1

= ϕ

,j

(t )

Nr

Nr

∂ E (t ) (

∂w

(j)

Nr

(t )

Nr

Nr −1

(j)

)

(j) n (j)

,j

(

)

(t ) h

Nr −1

N r −1

,j

(

(j) Nr −1

(j) n (j)

(j) Nr

∂ b1 ,j

)

(j)

Nr

= ϕ ′ u1,j (

(j)

Nr

(t )

(A.4)

∂w



(j) Nr −1

(j) Nr

∂ u1,j

∂ φˆ j (t ) (

(j) Nr

∂ u1 , j (

(j)

Nr

(j) Nr

)

)

(

Nr −1

,n (j) ,j N r −2

Nr −1

(j)

)

(j ) n (j)

,j

(

∂b

Nr −1

N r −1

=

(j) Nr

(t )

(j ) Nr −1

)

)

(j) n (j)

(

,j

(j) Nr

(t ) h

(j) n (j)

,j

,j

ϕ



(j) Nr

(

u1,j

)

(t )

) (t )

)

(j) n (j)

(t )

,j

)

(

(t ) w

(j) Nr

)

(

(j) 1,n (j)

,j

)

(

)

Nr −1

(j) Nr −2

(j) Nr −2

(

N r −2

(j) 1,n (j)

Nr −1

(j ) Nr

)

δ ,j 1,j

(t )

)

,j

(t )

(

(j) Nr

)

(

Nr −1

)

(

(t ) h

,j

(j) Nr −2

(j) n (j)

Nr −2

∂ E (t ) ∂ξ (t ) ∂ yˆ (t ) ∂ξ (t ) ∂ yˆ (t ) ∂ φˆ j (t ) (j)

Nr

)

∂ u1,j

(t )

(j) Nr −1

(

)

(j) n (j)

,j

N r −1

∂h

,j

(j) Nr

)

δ1,j

(t)

(A.7)

)

(

(j) Nr

(j) Nr −1

)

(j) n (j)

,j

(t ) ∂ u

(j) Nr −1

(j) n (j)

,j

(

(j) Nr

(

×w

(j) Nr

(j) 1,n (j)

Nr −1

( = ϕ



(

u

(j)

the gradient for parameter updating can be computed as follows.

= δ

(j) n (j)

Nr −1

ϕ

)

(j) Nr −1

(j) n (j)

,j

(j) Nr −1

(

in the (Nr − 1)th layer,

(

)

,j



u

(j) Nr −1

)

(j ) n (j)

,j

N r −1

)

(j) Nr

(

(t ) w

(j ) n (j)

,j

(t )

(j) Nr −1

(

)

(j) n (j)

,j

N r −1

) (t )

) (t )

(

)

(j) 1,n (j)

)

(t ) ∂ b

)

(

(j) Nr −1

(

N r −1

)

N r −1

(t )

(t ) ∂ u

(

= −ξ (t ) a (t − j) ϕ ′ u1,j

c (t − j)

(A.8)

)

∂ u1 , j

(

(

(A.5)

(t )

,j

∂ φˆ j (t )

N r −1

)

))

Nr −1

(j)

,n (j) ,j Nr −2 ) )

Similarly, we have

∂ E (t )

∂h

(t ) ∂ b1,j )

}

(

(j) 1,n (j)

,j

(j ) n (j)

(j)

(j) n (j)

(t )

(t )

{ ∈ 1, 2, . . . , Q (j()j)

(

(t ) w

,j

(t ) w

(j) Nr −1

Nr −1

(j)

(t )

,j

Nr −1 ( ) (j) Nr −1

)

)

)

N r −1

)

(t ) (j )

(j) n (j )

(

)

)

= δ1,j

For neuron n

u

(j) Nr −1

Nr −1

∂ E (t ) (

(A.3)

= −ξ (t ) a (t − j) ϕ ′ u1,j (

(

(

(

(

,j

(t ) = ϕ



×

∂ E (t ) ∂ξ (t ) ∂ yˆ (t ) = ∂ξ (t ) ∂ yˆ (t ) ∂ φˆ j (t )

(t )

then formula (A.6) becomes

(

(

(

)

N r −1

Similarly, we have

∂ E (t )

,j

Nr −1

Let

c (t − j)

= δ1,j

)

(j) 1,n (j)

(j)

(j ) n (j)

(j) n (j)

(A.6)

which is the local gradient with respect to neuron of the last layer in the jth DBN module. Thus, formula (A.2) can be rewritten by (

(j) Nr −2

)

Nr −1

(j) Nr −1

(j) n (j )

u1 , j

u

)

(j) n (j)

(t ) ∂ h

(t )

(j)

)

Nr −1

(j ) Nr −1

N r −2

))

(t ) = ϕ ′ u1,j

δ1,j

(j) n (j )

(

u

×h

(A.2)

and ϕ (u) is the derivative of ϕ (u) with respect to u. Let (j)



(



(



)

Nr −1

c (t − j) = −ξ (t ) a (t − j) , j = 0, 1, 2, . . . , p

(

ϕ

(j) Nr −1

× c (t − j) h

δ

)

,j

(

= ϕ

a (t − j) = y (t − j) , j = 1, 2, . . . , p; a (t ) = 1

(j )

u

(

where

(

(



(

Nr −1

)

(j) Nr −1

)

( ′

Nr

)

Nr −1

)

N r −1

(t )

(j) Nr

(j) n (j)

(j)

(

(t ) c (t − j) h

u1,j

(

(j) Nr

(

(j) n (j )

Nr −2

(t ) ∂w (j) 1,n (j) ,j Nr −1 ) ( )

)

= −ξ (t ) a (t − j) ϕ ′ u1,j ( ( ) ) Nr

)

∂ u1 , j

N r −1



Nr

(t ) ∂w

,j

)

∂ u1,j

∂ φˆ j (t )

(j) n (j)

(j) 1,n (j)

where y(t) and yˆ (t) are the actual output and predicted output of DBN-AR model, respectively. Using the gradient descent method, all the parameters of DBNAR model are fine tuned. For the neuron in the last output layer (j) that is the Nr th layer, the gradient for parameter updating can be obtained through formula (10). ∂ E (t))

,j

= −ξ (t ) a (t − j) ϕ

(A.1)

(

(t ) ∂ u

(

t = p + 1, p + 2, . . . , N

(j)

(

Nr −1 ( ) (j) Nr −1

Nr −1

i=1

(

(j) Nr −1

(j)

∂ u1,j

)

(j) n (j)

∂ u1,j

(

(j)

,n (j) ,j −1 N r −2

Nr

∂ φˆ j (t )

)

Nr −1

,j

ϕ



(j) Nr

(

u1,j

)

) (t ) c (t − j)

)

,j

(t ) (A.9)

618

W. Xu, H. Peng, X. Zeng et al. / Applied Soft Computing Journal 77 (2019) 605–621



$$\begin{aligned}
\frac{\partial E(t)}{\partial w^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},n^{(j)}_{N_r-3},j}}
&=\frac{\partial E(t)}{\partial\xi(t)}\,\frac{\partial\xi(t)}{\partial\hat{y}(t)}\,\frac{\partial\hat{y}(t)}{\partial\hat{\phi}_{j}(t)}
\sum_{v=1}^{Q^{(j)}_{N_r-1}}\frac{\partial\hat{\phi}_{j}(t)}{\partial h^{(N_r^{(j)}-1)}_{v,j}(t)}\,
\frac{\partial h^{(N_r^{(j)}-1)}_{v,j}(t)}{\partial h^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)}\,
\frac{\partial h^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)}{\partial u^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)}\,
\frac{\partial u^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)}{\partial w^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},n^{(j)}_{N_r-3},j}}\\
&=-\xi(t)\,a(t-j)\left[\sum_{v=1}^{Q^{(j)}_{N_r-1}}\varphi'\!\left(u^{(N_r^{(j)})}_{1,j}(t)\right)w^{(N_r^{(j)})}_{1,v,j}\,
\varphi'\!\left(u^{(N_r^{(j)}-1)}_{v,j}(t)\right)w^{(N_r^{(j)}-1)}_{v,n^{(j)}_{N_r-2},j}\right]
\varphi'\!\left(u^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)\right)h^{(N_r^{(j)}-3)}_{n^{(j)}_{N_r-3},j}(t)\\
&=\varphi'\!\left(u^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)\right)\left[\sum_{v=1}^{Q^{(j)}_{N_r-1}}w^{(N_r^{(j)}-1)}_{v,n^{(j)}_{N_r-2},j}\,\delta^{(N_r^{(j)}-1)}_{v,j}(t)\right]h^{(N_r^{(j)}-3)}_{n^{(j)}_{N_r-3},j}(t)
\end{aligned}\tag{A.10}$$

Box I.

For neuron $n^{(j)}_{N_r-2}\in\{1,2,\ldots,Q^{(j)}_{N_r-2}\}$ in the $(N_r^{(j)}-2)$th layer, the gradient for parameter updating can be computed as in Box I. Let

$$\delta^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)=\varphi'\!\left(u^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)\right)\sum_{v=1}^{Q^{(j)}_{N_r-1}}w^{(N_r^{(j)}-1)}_{v,n^{(j)}_{N_r-2},j}\,\delta^{(N_r^{(j)}-1)}_{v,j}(t) \tag{A.11}$$

then formula (A.10) becomes

$$\frac{\partial E(t)}{\partial w^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},n^{(j)}_{N_r-3},j}}=\delta^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)\,h^{(N_r^{(j)}-3)}_{n^{(j)}_{N_r-3},j}(t) \tag{A.12}$$

Similarly, for the gradient with respect to the bias we have Eq. (A.13), given in Box II.

$$\begin{aligned}
\frac{\partial E(t)}{\partial b^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}}
&=\frac{\partial E(t)}{\partial\xi(t)}\,\frac{\partial\xi(t)}{\partial\hat{y}(t)}\,\frac{\partial\hat{y}(t)}{\partial\hat{\phi}_{j}(t)}
\sum_{v=1}^{Q^{(j)}_{N_r-1}}\frac{\partial\hat{\phi}_{j}(t)}{\partial h^{(N_r^{(j)}-1)}_{v,j}(t)}\,
\frac{\partial h^{(N_r^{(j)}-1)}_{v,j}(t)}{\partial h^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)}\,
\frac{\partial h^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)}{\partial u^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)}\,
\frac{\partial u^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)}{\partial b^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}}\\
&=-\xi(t)\,a(t-j)\left[\sum_{v=1}^{Q^{(j)}_{N_r-1}}\varphi'\!\left(u^{(N_r^{(j)})}_{1,j}(t)\right)w^{(N_r^{(j)})}_{1,v,j}\,
\varphi'\!\left(u^{(N_r^{(j)}-1)}_{v,j}(t)\right)w^{(N_r^{(j)}-1)}_{v,n^{(j)}_{N_r-2},j}\right]
\varphi'\!\left(u^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)\right)\\
&=\varphi'\!\left(u^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)\right)\sum_{v=1}^{Q^{(j)}_{N_r-1}}w^{(N_r^{(j)}-1)}_{v,n^{(j)}_{N_r-2},j}\,\delta^{(N_r^{(j)}-1)}_{v,j}(t)
=\delta^{(N_r^{(j)}-2)}_{n^{(j)}_{N_r-2},j}(t)
\end{aligned}\tag{A.13}$$

Box II.

Therefore, according to the derivation process above, the local gradient of each neuron of layer $\ell_j$ in the $j$th DBN module can be computed recursively as

$$\delta^{(\ell_j)}_{n^{(j)}_{\ell_j},j}(t)=\varphi'\!\left(u^{(\ell_j)}_{n^{(j)}_{\ell_j},j}(t)\right)\sum_{v=1}^{Q^{(j)}_{\ell_j+1}}w^{(\ell_j+1)}_{v,n^{(j)}_{\ell_j},j}\,\delta^{(\ell_j+1)}_{v,j}(t),\qquad \ell_j\in\{1,2,\ldots,N_r^{(j)}-2\} \tag{A.14}$$

and the gradients with respect to the connection weight and the bias used for parameter updating can then be given by

$$\frac{\partial E(t)}{\partial w^{(\ell_j)}_{n^{(j)}_{\ell_j},n^{(j)}_{\ell_j-1},j}}=\delta^{(\ell_j)}_{n^{(j)}_{\ell_j},j}(t)\,h^{(\ell_j-1)}_{n^{(j)}_{\ell_j-1},j}(t) \tag{A.15}$$

$$\frac{\partial E(t)}{\partial b^{(\ell_j)}_{n^{(j)}_{\ell_j},j}}=\delta^{(\ell_j)}_{n^{(j)}_{\ell_j},j}(t) \tag{A.16}$$
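Taken together, (A.6), (A.11) and (A.14)–(A.16) describe a standard backward pass through each DBN module. A minimal sketch under the same illustrative assumptions as above (logistic activations; dictionaries `u`, `h`, `W` keyed by layer index, with `h[0]` the module input) is:

```python
import numpy as np

def dsigmoid(u):
    # derivative of the logistic activation assumed in this sketch
    s = 1.0 / (1.0 + np.exp(-u))
    return s * (1.0 - s)

def module_backward(c_tj, u, h, W):
    """Local gradients and parameter gradients of one DBN module.

    c_tj : c(t-j) = -xi(t) a(t-j)
    u, h : per-layer net inputs / outputs stored in the forward pass (h[0] = input)
    W    : W[l] maps h[l-1] to u[l]; the top layer has a single output neuron
    """
    L = max(W)                                      # index of the output layer
    delta = {L: dsigmoid(u[L]) * c_tj}              # Eq. (A.6)
    for l in range(L - 1, 0, -1):                   # Eqs. (A.11) and (A.14)
        delta[l] = dsigmoid(u[l]) * (W[l + 1].T @ delta[l + 1])
    grad_W = {l: np.outer(delta[l], h[l - 1]) for l in delta}   # Eq. (A.15)
    grad_b = {l: delta[l].copy() for l in delta}                # Eq. (A.16)
    return grad_W, grad_b
```

Because each module output $\hat{\phi}_j(t)$ enters $\hat{y}(t)$ linearly through $a(t-j)$, the modules can be back-propagated independently; all coupling between them is carried by the common scalar factor $c(t-j)=-\xi(t)\,a(t-j)$.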

After obtaining all the gradients from (A.3)–(A.5), (A.7)–(A.9) and (A.11)–(A.16), the parameter updating values are

$$\Delta w^{(L)}_{n^{(j)}_{L},n^{(j)}_{L-1},j}=-\eta\,\frac{\partial E(t)}{\partial w^{(L)}_{n^{(j)}_{L},n^{(j)}_{L-1},j}}=-\eta\,\delta^{(L)}_{n^{(j)}_{L},j}(t)\,h^{(L-1)}_{n^{(j)}_{L-1},j}(t) \tag{A.17}$$


$$\Delta b^{(L)}_{n^{(j)}_{L},j}=-\eta\,\frac{\partial E(t)}{\partial b^{(L)}_{n^{(j)}_{L},j}}=-\eta\,\delta^{(L)}_{n^{(j)}_{L},j}(t) \tag{A.18}$$

where $L\in\{1,2,\ldots,N_r^{(j)}-1,N_r^{(j)}\}$ and $\eta>0$ is the pre-determined learning rate, and the parameters are updated by

$$\begin{cases}
w^{(L)}_{n^{(j)}_{L},n^{(j)}_{L-1},j}\Leftarrow w^{(L)}_{n^{(j)}_{L},n^{(j)}_{L-1},j}+\Delta w^{(L)}_{n^{(j)}_{L},n^{(j)}_{L-1},j}\\[4pt]
b^{(L)}_{n^{(j)}_{L},j}\Leftarrow b^{(L)}_{n^{(j)}_{L},j}+\Delta b^{(L)}_{n^{(j)}_{L},j}
\end{cases}\tag{A.19}$$

where the initial values of $w^{(L)}_{n^{(j)}_{L},n^{(j)}_{L-1},j}$ and $b^{(L)}_{n^{(j)}_{L},j}$ are those calculated in the pre-training stage of the DBN-AR model estimation. Furthermore, to avoid parameter oscillation in the fine-tuning process and a slowdown of the convergence rate, a momentum term is added to the final parameter updating rule as follows:

$$\begin{cases}
w^{(L)}_{n^{(j)}_{L},n^{(j)}_{L-1},j}(k)\Leftarrow w^{(L)}_{n^{(j)}_{L},n^{(j)}_{L-1},j}(k-1)+\Delta w^{(L)}_{n^{(j)}_{L},n^{(j)}_{L-1},j}(k)+\alpha\left(w^{(L)}_{n^{(j)}_{L},n^{(j)}_{L-1},j}(k-1)-w^{(L)}_{n^{(j)}_{L},n^{(j)}_{L-1},j}(k-2)\right)\\[4pt]
b^{(L)}_{n^{(j)}_{L},j}(k)\Leftarrow b^{(L)}_{n^{(j)}_{L},j}(k-1)+\Delta b^{(L)}_{n^{(j)}_{L},j}(k)+\alpha\left(b^{(L)}_{n^{(j)}_{L},j}(k-1)-b^{(L)}_{n^{(j)}_{L},j}(k-2)\right)
\end{cases}\tag{A.20}$$

where $k$ is the iteration index of the parameter updating and $\alpha\in[0,1)$ is the pre-determined momentum factor.
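Read as pseudocode, the update rules (A.17)–(A.20) amount to gradient descent with a momentum term applied to every weight and bias of every module. The sketch below is a literal, illustrative transcription; the values of `eta` and `alpha` are placeholders, not the settings used in the paper:

```python
def momentum_step(theta_k1, theta_k2, grad, eta=0.05, alpha=0.8):
    """One fine-tuning update for a single parameter (or parameter array).

    theta_k1 : parameter value at iteration k-1
    theta_k2 : parameter value at iteration k-2
    grad     : dE(t)/dtheta evaluated at iteration k-1
    """
    delta = -eta * grad                                          # Eqs. (A.17)-(A.18)
    theta_k = theta_k1 + delta + alpha * (theta_k1 - theta_k2)   # Eq. (A.20)
    return theta_k
```

Setting `alpha = 0` recovers the plain rule (A.19); the difference term `theta_k1 - theta_k2` reuses the previous step direction, which damps the oscillations mentioned above.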

References

[1] C.N. Babu, B.E. Reddy, A moving-average filter based hybrid ARIMA-ANN model for forecasting time series data, Appl. Soft Comput. 23 (10) (2014) 27–38.
[2] L. Moreira-Matias, J. Gama, J. Mendes-Moreira, Concept neurons - handling drift issues for real-time industrial data mining, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2016, pp. 96–111.
[3] F. Martinez-Alvarez, A. Troncoso, G. Asencio-Cortes, J.C. Riquelme, A survey on data mining techniques applied to electricity-related time series forecasting, Energies 8 (2015) 13162–13193.
[4] D. Wang, H. Guo, H. Luo, O. Grunder, Y. Lin, Multi-step-ahead electricity price forecasting using a hybrid model based on two-layer decomposition technique and BP neural network optimized by firefly algorithm, Appl. Energy 190 (2017) 390–407.
[5] M. Lei, S. Luan, C. Jiang, H. Liu, Z. Yan, A review on the forecasting of wind speed and generated power, Renewable Sustainable Energy Rev. 13 (4) (2009) 915–920.
[6] D.P. Singh, A. Shrivastava, D.S. Dhakre, G. Sharma, Application of ARIMA model for forecasting paddy production in Bastar division of Chhattisgarh, Environ. Ecol. 5 (1) (2015) 82–87.
[7] H. Cui, X. Peng, Short-term city electric load forecasting with considering temperature effects: An improved ARIMAX model, Math. Probl. Eng. 2015 (1) (2015) 1–10.
[8] O.B. Shukur, M.H. Lee, Daily wind speed forecasting through hybrid KF-ANN model based on ARIMA, Renew. Energy 76 (2015) 637–647.
[9] F.L. Chu, Forecasting tourism demand with ARMA-based methods, Tourism Manage. 30 (5) (2009) 740–751.
[10] F. Shen, J. Chao, J. Zhao, Forecasting exchange rate using deep belief networks and conjugate gradient method, Neurocomputing 167 (C) (2015) 243–253.
[11] Z. Yang, C. Li, L. Lian, J. Yan, Electricity price forecasting by a hybrid model, combining wavelet transform, ARMA and kernel-based extreme learning machine methods, Appl. Energy 190 (2017) 291–305.
[12] M. Qin, Z. Li, Z. Du, Red tide time series forecasting by combining ARIMA and deep belief network, Knowl.-Based Syst. 125 (2017) 39–52.
[13] M. Gan, H. Peng, X. Peng, X. Chen, G. Inoussa, A locally linear RBF network-based state-dependent AR model for nonlinear time series modeling, Inform. Sci. 180 (22) (2010) 4370–4383.

[14] C. Hamzaçebi, Improving artificial neural networks' performance in seasonal time series forecasting, Inform. Sci. 178 (23) (2008) 4550–4559.
[15] G.P. Zhang, A neural network ensemble method with jittered training data for time series forecasting, Inform. Sci. 177 (23) (2007) 5329–5346.
[16] J. Schmidhuber, Deep learning in neural networks: An overview, Neural Netw. 61 (2014) 85–117.
[17] S. Palani, S.Y. Liong, P. Tkalich, An ANN application for water quality forecasting, Mar. Pollut. Bull. 56 (9) (2008) 1586–1597.
[18] P. Mandal, T. Senjyu, N. Urasaki, T. Funabashi, A.K. Srivastava, A novel approach to forecast electricity price for PJM using neural network and similar days method, IEEE Trans. Power Syst. 22 (4) (2007) 2058–2065.
[19] J. Zhou, S. Jing, L. Gong, Fine tuning support vector machines for short-term wind speed forecasting, Energy Convers. Manage. 52 (4) (2011) 1990–1998.
[20] R. Adhikari, A neural network based linear ensemble framework for time series forecasting, Neurocomputing 157 (C) (2015) 231–242.
[21] E. Egrioglu, U. Yolcu, C.H. Aladag, E. Bas, Recurrent multiplicative neuron model artificial neural network for non-linear time series forecasting, Neural Process. Lett. 41 (2) (2015) 249–258.
[22] L. Wang, Y. Zeng, T. Chen, Back propagation neural network with adaptive differential evolution algorithm for time series forecasting, Expert Syst. Appl. 42 (2) (2015) 855–863.
[23] T. Kuremoto, S. Kimura, K. Kobayashi, M. Obayashi, Time series forecasting using a deep belief network with restricted Boltzmann machines, Neurocomputing 137 (15) (2014) 47–56.
[24] W.C. Hong, Chaotic particle swarm optimization algorithm in a support vector regression electric load forecasting model, Energy Convers. Manage. 50 (1) (2009) 105–117.
[25] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.
[26] C.Y. Zhang, C.L.P. Chen, M. Gan, L. Chen, Predictive deep Boltzmann machine for multiperiod wind speed forecasting, IEEE Trans. Sustainable Energy 6 (4) (2017) 1416–1425.
[27] J.F. Torres, A.M. Fernández, A. Troncoso, F. Martínez-Álvarez, Deep learning-based approach for time series forecasting with application to electricity load, in: International Work-Conference on the Interplay Between Natural and Artificial Computation, 2017, pp. 203–212.
[28] H.B. Huang, R.X. Li, M.L. Yang, C.L. Teik, P.D. Wei, Evaluation of vehicle interior sound quality using a continuous restricted Boltzmann machine-based DBN, Mech. Syst. Signal Process. 84 (2017) 245–267.
[29] X. Ma, Z. Tao, Y. Wang, H. Yu, Y. Wang, Long short-term memory neural network for traffic speed prediction using remote microwave sensor data, Transp. Res. C 54 (2015) 187–197.
[30] G.P. Zhang, Time series forecasting using a hybrid ARIMA and neural network model, Neurocomputing 50 (1) (2003) 159–175.
[31] Q.L. Ma, Q.L. Zheng, H. Peng, T.W. Zhong, L.Q. Xu, Chaotic time series prediction based on evolving recurrent neural networks, in: International Conference on Machine Learning and Cybernetics, Vol. 58, 2007, pp. 3496–3500.
[32] S. Sello, Solar cycle forecasting: A nonlinear dynamics approach, Astron. Astrophys. 377 (1) (2001) 312–320.
[33] T. Koskela, M. Lehtokangas, J. Saarinen, K. Kaski, Time series prediction with multilayer perceptron, FIR and Elman neural networks, in: Proceedings of the World Congress on Neural Networks, 1996, pp. 491–496.
[34] A.G. McNish, J.V. Lincoln, Prediction of sunspot numbers, EOS Trans. Am. Geophys. Union 30 (5) (1949) 673–685.
[35] K. Denkmayr, P. Cugnon, About sunspot number medium-term predictions, in: G. Heckman, et al. (Eds.), Solar-Terrestrial Prediction Workshop V, Hiraiso Solar Terrestrial Research Center, 1997, p. 103.
[36] X. Qiu, P.N. Suganthan, G.A.J. Amaratunga, Ensemble incremental learning Random Vector Functional Link network for short-term electric load forecasting, Knowl.-Based Syst. 145 (2018) 182–196.
[37] X. Qiu, Y. Ren, P.N. Suganthan, G.A.J. Amaratunga, Empirical mode decomposition based ensemble deep learning for load demand time series forecasting, Appl. Soft Comput. 54 (C) (2017) 246–255.
[38] X. Qiu, L. Zhang, Y. Ren, P.N. Suganthan, G. Amaratunga, Ensemble deep learning for regression and time series forecasting, in: Computational Intelligence in Ensemble Learning, 2015, pp. 1–6.
[39] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (3) (1995) 273–297.
[40] S. Haykin, Neural Networks: A Comprehensive Foundation, International Edition, Prentice Hall, 1999.
[41] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[42] L. Ye, P. Liu, Combined model based on EMD-SVM for short-term wind power prediction, in: Proc. Chinese Society for Electrical Engineering (CSEE), Vol. 31, 2011, pp. 102–108.
[43] H. Liu, C. Chen, H. Tian, Y. Li, A hybrid model for wind speed prediction using empirical mode decomposition and artificial neural networks, Renew. Energy 48 (2012) 545–556.
[44] F. Schimbinschi, L. Moreira-Matias, V.X. Nguyen, J. Bailey, Topology-regularized universal vector autoregression for traffic forecasting in large urban areas, Expert Syst. Appl. 82 (2017) 301–316.

[45] S. Li, P. Wang, L. Goel, A novel wavelet-based ensemble method for short-term load forecasting with hybrid neural networks and feature selection, IEEE Trans. Power Syst. 31 (3) (2016) 1788–1798.
[46] Y. Bai, Z. Chen, J. Xie, C. Li, Daily reservoir inflow forecasting using multiscale deep feature learning with hybrid models, J. Hydrol. 532 (2016) 193–206.
[47] M. Ardalani-Farsa, S. Zolfaghari, Chaotic time series prediction with residual analysis method using hybrid Elman–NARX neural networks, Neurocomputing 73 (13) (2010) 2540–2553.
[48] A. Gholipour, B.N. Araabi, C. Lucas, Predicting chaotic time series using neural and neurofuzzy models: A comparative study, Neural Process. Lett. 24 (3) (2006) 217–239.
[49] K.K. Teo, L. Wang, Z. Lin, Wavelet packet multi-layer perceptron for chaotic time series prediction: Effects of weight initialization, in: Proc. Intelligent Systems Design and Applications, 2001, pp. 310–317.
[50] R. Chandra, M. Zhang, Cooperative coevolution of Elman recurrent neural networks for chaotic time series prediction, Neurocomputing 86 (12) (2012) 116–123.
[51] M.B. Priestley, State-dependent models: A general approach to non-linear time series analysis, J. Time Ser. Anal. 1 (1) (1980) 47–71.
[52] J. Vesin, An amplitude-dependent autoregressive signal model based on a radial basis functions expansion, in: Proceedings of International Conference on Acoustics, Speech, Signal Processing, Vol. 3, 1993, pp. 129–132.
[53] Z. Shi, Y. Tamura, T. Ozaki, Nonlinear time series modelling with the radial basis function-based state-dependent autoregressive model, Internat. J. Systems Sci. 30 (7) (1999) 717–727.
[54] H. Peng, T. Ozaki, V. Haggan-Ozaki, Y. Toyoda, A parameter optimization method for radial basis function type models, IEEE Trans. Neural Netw. 14 (2) (2003) 432–438.
[55] F. Zhou, H. Peng, X. Zeng, X. Tian, RBF-ARX model-based two-stage scheduling RPC for dynamic systems with bounded disturbance, Neural Comput. Appl. 4 (2018) 1–16.


[56] M. Gan, C.L. Philip, L. Chen, C.Y. Zhang, Exploiting the interpretability and forecasting ability of the RBF-AR model for nonlinear time series, Internat. J. Systems Sci. 47 (8) (2016) 1868–1876.
[57] S. Evt, Y.C. Shin, Radial basis function neural network for approximation and estimation of nonlinear stochastic dynamic systems, IEEE Trans. Neural Netw. 5 (4) (1994) 594.
[58] I.R.H. Jackson, Convergence properties of radial basis functions, Constr. Approx. 4 (1) (1988) 243–264.
[59] M.J.D. Powell, Radial basis functions for multivariable interpolation: A review, in: IMA Conference on Algorithms for the Approximation of Functions and Data, RMCS, 1985.
[60] M. Han, J. Xi, Efficient clustering of radial basis perceptron neural network for pattern recognition, Pattern Recognit. 37 (10) (2004) 2059–2067.
[61] H. Bouzgou, N. Benoudjit, Multiple architecture system for wind speed prediction, Appl. Energy 88 (7) (2011) 2463–2471.
[62] A.C. Damianou, N.D. Lawrence, Deep Gaussian processes, Comput. Sci. (2012) 207–215.
[63] SIDC (World Data Center for the Sunspot Index), http://sidc.oma.be/index.php3.
[64] AEMO, Australian Energy Market Operator 2013, 2013. http://www.aemo.com.au/.
[65] Y. Lv, Y. Duan, W. Kang, Z. Li, F.Y. Wang, Traffic flow prediction with big data: A deep learning approach, IEEE Trans. Intell. Transp. Syst. 16 (2) (2015) 865–873.
[66] V. Cerqueira, L. Torgo, F. Pinto, C. Soares, Arbitrated ensemble for time series forecasting, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2017, pp. 478–494.
[67] A. Dedinec, S. Filiposka, A. Dedinec, L. Kocarev, Deep belief network based electricity load forecasting: An analysis of Macedonian case, Energy 115 (2016) 1688–1700.