Applied Soft Computing 12 (2012) 174–181
Stability analysis of RBF network-based state-dependent autoregressive model for nonlinear time series

Min Gan a,b,∗, Hui Peng a
a School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, China
b School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China
Article info

Article history: Received 21 June 2010; received in revised form 29 March 2011; accepted 31 August 2011; available online 7 September 2011.

Keywords: Radial basis function network; State-dependent autoregressive model; Stability analysis; Limit cycle; Canadian lynx data; Time series prediction
Abstract

Varying-coefficient models have attracted great attention in nonlinear time series analysis in recent years. In this paper, we consider a semi-parametric functional-coefficient autoregressive model, called the radial basis function network-based state-dependent autoregressive (RBF-AR) model. The stability conditions of the RBF-AR model and the conditions for the existence of its limit cycles are discussed. An efficient structured parameter estimation method and the modified multi-fold cross-validation criterion are applied to identify the RBF-AR model. An application of the RBF-AR model to the famous Canadian lynx data is presented. The forecasting capability of the RBF-AR model is compared with those of other competing time series models, which shows that the RBF-AR model is as good as or better than the other models for post-sample forecasts.

© 2011 Elsevier B.V. All rights reserved.
1. Introduction

Varying-coefficient (or functional-coefficient) models, whose parameters may vary with the value of some variables, offer a very flexible structure for modeling nonlinear time series. In recent years, this kind of model has been widely studied and has found many important applications. For example, Cai et al. [1] applied the local linear regression technique to the estimation of functional-coefficient regression models for nonlinear time series. Chen and Liu [2] studied nonparametric estimation and hypothesis testing procedures for functional-coefficient autoregressive (FAR) models. Huang and Shen [3] proposed a global smoothing method based on polynomial splines for the estimation of functional-coefficient regression models. Harvill and Ray [4] extended the functional-coefficient autoregressive model to the multivariate nonlinear time series framework. Akesson and Toivonen [5] studied state-dependent parameter representations of stochastic nonlinear sampled-data systems. Zhang [6] studied proportional functional-coefficient linear regression models. Cai et al. [7] studied functional-coefficient regression models with nonstationary time series data. Cao et al. [8] proposed penalized spline estimation for functional-coefficient regression models under dependence.
Although varying-coefficient models already have a vast literature, some researchers believe that research in this area is just at the beginning [6]. The aforementioned models may be traced back to the state-dependent models of Priestley [9]:

x_t = μ(X_{t−1}) + Σ_{i=1}^{k} φ_i(X_{t−1}) x_{t−i} + ε_t + Σ_{j=1}^{l} ψ_j(X_{t−1}) ε_{t−j}    (1)
where {x_t, t = 1, 2, ...} is the time series, k and l are positive integers, X_{t−1} = (ε_{t−l}, ..., ε_{t−1}, x_{t−k}, ..., x_{t−1})^T denotes the "state vector", {ε_t} is a sequence of i.i.d. random variables, ε_t is independent of {x_{t−i}, i > 0}, and μ(·), {φ_i(·)}, {ψ_j(·)} are measurable functions from R^{k+l} to R. Many familiar time series models are special cases of the state-dependent model (1); we name a few below. Taking μ(·), {φ_i(·)} and {ψ_j(·)} all as constants, we obtain the linear ARMA model:

x_t = μ + Σ_{i=1}^{k} φ_i x_{t−i} + ε_t + Σ_{j=1}^{l} ψ_j ε_{t−j}    (2)
Taking μ(·) and {φ_i(·)} as constants, and setting ψ_j(X_{t−1}) = b_j + Σ_{i=1}^{p} c_{ji} x_{t−i}, we obtain the bilinear model [10]:

x_t = Σ_{i=1}^{k} φ_i x_{t−i} + ε_t + Σ_{j=1}^{l} b_j ε_{t−j} + Σ_{i=1}^{p} Σ_{j=1}^{m} c_{ij} x_{t−i} ε_{t−j}    (3)

∗ Corresponding author at: School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China. Tel.: +86 15215698383. E-mail address: [email protected] (M. Gan).
1568-4946/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.asoc.2011.08.055
The threshold autoregressive (TAR) model introduced by Tong [11], one of the simplest but most widely used nonlinear time series models, can also be regarded as a special case of the state-dependent model (1):

x_t = Σ_{i=1}^{k} {φ_0^{(i)} + φ_1^{(i)} x_{t−1} + ··· + φ_p^{(i)} x_{t−p} + ε_t} I(x_{t−d} ∈ Ω_i)    (4)
where I(·) is the usual indicator function, and {Ω_i} form a non-overlapping partition of the real line.

Ozaki [12] and Haggan and Ozaki [13] proposed the exponential autoregressive (ExpAR) model, which exhibits certain well-known features of nonlinear vibration theory, such as amplitude-dependent frequency, jump phenomena, and limit cycle behavior:

x_t = (φ_1 + π_1 exp(−γ x_{t−1}²)) x_{t−1} + ··· + (φ_p + π_p exp(−γ x_{t−1}²)) x_{t−p} + ε_t    (5)

where {φ_1, ..., φ_p}, {π_1, ..., π_p} and γ > 0 are constants. Chen and Tsay [14] proposed the functional-coefficient autoregressive (FAR) model:

x_t = φ_1(X*_{t−1}) x_{t−1} + ··· + φ_p(X*_{t−1}) x_{t−p} + ε_t    (6)
where X*_{t−1} = (x_{t−i_1}, ..., x_{t−i_k})^T is the threshold vector (or state vector) with i_1, ..., i_k as the threshold lags, and the φ_i(X*_{t−1}) are measurable functions from R^k to R. In practice, k is usually a small number (e.g., k = 1 in [1,14]), because models with large k are often not practically useful due to the "curse of dimensionality" [1]. Obviously, the FAR model (6) is a special case of the state-dependent model (1), without any white noise terms ε_{t−i} in the threshold vector and with only the simple AR structure, which makes the empirical estimation much easier than for the general state-dependent model. The main difficulty in using the FAR model (6) is specifying the functional coefficients φ_i(X*_{t−1}). An efficient estimation approach is to use nonparametric regression techniques [1–3,8]. An alternative to the nonparametric estimation method, if we treat the functional specification as a problem of function approximation from a multi-dimensional input space X*_{t−1} to a one-dimensional scalar space, is neural network approximation. Neural networks are very popular tools for time series modeling and forecasting [15–22]. Because of the "universal approximation" capability of radial basis function (RBF) networks, Vesin [23] and Shi et al. [24] used a set of RBF networks to approximate the functional coefficients of the state-dependent autoregressive (or FAR) model. The derived model, called the RBF network-based autoregressive (RBF-AR) model, takes the form
x_t = φ_0(X*_{t−1}) + Σ_{i=1}^{p} φ_i(X*_{t−1}) x_{t−i} + ε_t
φ_0(X*_{t−1}) = c_{0,0} + Σ_{k=1}^{m} c_{0,k} exp{−λ_k ||X*_{t−1} − Z_k||²}
φ_i(X*_{t−1}) = c_{i,0} + Σ_{k=1}^{m} c_{i,k} exp{−λ_k ||X*_{t−1} − Z_k||²}    (7)
X*_{t−1} = (x_{t−1}, ..., x_{t−d})^T
Z_k = (z_{k,1}, ..., z_{k,d})^T

where p is the model order, m is the number of hidden nodes of the RBF networks, and d ≤ p is the dimension of the state vector X*_{t−1}; φ_0(X*_{t−1}) and φ_i(X*_{t−1}) are the state-dependent functional coefficients, each composed of an RBF network; Z_k (k = 1, 2, ..., m) are the centers of the RBF networks; λ_k > 0 (k = 1, 2, ..., m) are real scaling parameters; c_{0,k} (k = 0, 1, ..., m) and c_{i,k} (i = 1, 2, ..., p; k = 0, 1, ..., m) are real constants; ||·|| denotes
the vector 2-norm. The RBF-AR model (7) treats the nonlinear process by splitting the state space into a large number of small segments and regarding the process as "locally linear" within each segment. It allows the coefficients to change gradually, rather than abruptly as in the TAR model, which may be appealing in many applications. From Eq. (7), we can see that the RBF-AR model can be regarded as a generalized version of the classic ExpAR model (5). The model turns out, however, to have a much stronger capability in prediction and simulation than an ExpAR model [25]. If we take X*_{t−1} = (x_{t−1}, ∇x_{t−1}, ..., ∇^{d−1} x_{t−1})^T in model (7), where ∇ denotes the difference operator, it can easily be seen that the instantaneous dynamics of the RBF-AR model depend not only on the present amplitude of the series but also on its velocity ∇x_t and/or acceleration ∇²x_t. Therefore, it can produce, for example, asymmetric nonlinear wave patterns in time series, since the model dynamics may differ when the series is increasing or decreasing [25]. On the other hand, the RBF-AR model shares the same flexibility in characterizing complex dynamics as the RBF network, since it contains the RBF network as a component. The overwhelming advantage of this class of pseudo-linear AR models over conventional nonlinear models may be seen more clearly in the application of the models to modern predictive control problems. Peng et al. [26] extended the RBF-AR model to the case where there are several exogenous variables to the system (the RBF-ARX model), and designed RBF-ARX model-based predictive control (MPC) strategies for the nitrogen oxide (NOx) decomposition process in thermal power plants [27–31]. A major feature of the RBF-AR(X) model approach is that, in contrast to most other time-varying linear models, its parameters may be estimated off-line.

In this paper, we study some probabilistic properties, the identification procedure, and an application of the RBF-AR model. In Section 2, the stability conditions and the conditions for the existence of limit cycles of the RBF-AR model are discussed. Identification of the RBF-AR model includes the choice of the orders, estimation of all the parameters, and selection of an appropriate state vector. Parameter optimization of the RBF-AR model is essentially a nonlinear optimization problem, which is a very difficult task. However, the parameters of the RBF-AR model can be classified into linear and nonlinear sets, and the number of linear parameters is usually much larger than the number of nonlinear parameters. Peng et al.
[26] proposed a very efficient estimation algorithm, the so-called structured nonlinear parameter optimization method (SNPOM), for this kind of nonlinear model. The orders of the RBF-AR model are determined by the modified multi-fold cross-validation criterion proposed by Cai et al. [1]. We also consider the selection of an appropriate state vector for RBF-AR models, which has not been considered in previous research. The identification of the RBF-AR model based on the SNPOM and the modified multi-fold cross-validation criterion is presented in Section 3. To compare the forecasting capabilities of the RBF-AR model with some other competing time series models, the RBF-AR model is applied to the famous Canadian lynx data, which is also used in Cai et al. [1], Zhang [32], Kajitani et al. [33], and Aladag et al. [34]. It is shown that the RBF-AR model is as good as or better than other models for the post-sample forecasts.
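To make the structure of the RBF-AR model (7) concrete, the following minimal Python sketch evaluates the state-dependent coefficients and one noise-free model step. All parameter values here are illustrative placeholders, not values from the paper.

```python
import numpy as np

def rbf_ar_step(x_hist, c0, C, lam, Z):
    """One noise-free step of the RBF-AR model (7).

    x_hist : (x_{t-1}, ..., x_{t-p});  c0 : (m+1,) weights of phi_0;
    C : (p, m+1) weights of phi_1..phi_p;  lam : (m,) scales;  Z : (m, d) centers.
    """
    x_hist = np.asarray(x_hist, dtype=float)
    d = Z.shape[1]
    state = x_hist[:d]                                   # X*_{t-1} = (x_{t-1}, ..., x_{t-d})
    g = np.exp(-lam * np.sum((state - Z) ** 2, axis=1))  # RBF hidden-node outputs
    phi0 = c0[0] + c0[1:] @ g                            # state-dependent intercept
    phi = C[:, 0] + C[:, 1:] @ g                         # state-dependent AR coefficients
    return phi0 + phi @ x_hist

# Illustrative RBF-AR(2,1,2) parameters (hypothetical, for demonstration only)
c0 = np.array([0.1, 0.4])
C = np.array([[0.6, 0.2], [-0.3, 0.1]])
lam = np.array([1.0])
Z = np.array([[0.5, 0.5]])

x = [0.2, 0.3]
for _ in range(50):                                      # short noise-free simulation
    x.append(rbf_ar_step([x[-1], x[-2]], c0, C, lam, Z))
```

Far from the centers the exponentials vanish, so the model reduces to the linear AR part with coefficients c_{i,0}, which is exactly the mechanism exploited in the stability analysis of Section 2.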
2. Stability analysis and limit cycle of the RBF-AR model

In this section, we give the stability conditions and the conditions for the existence of limit cycles of the RBF-AR model. It is not easy to check whether a time series generated by a nonlinear model is strictly stationary. For a nonlinear time series model described by a stochastic difference equation, we usually first represent the time series as a vector-valued Markov chain, and then derive the stationarity of the model by proving that the corresponding Markov chain is ergodic.
Next, we establish a Markov chain in the p-dimensional Euclidean space for the model (7). Define

X_t = (x_t, ..., x_{t−p+1})^T,  ε_t = (ε_t, 0, ..., 0)^T

Because X*_{t−1} in (7) is a subvector of X_{t−1}, we can write φ(X*_{t−1}) as φ(X_{t−1}). Set

a(X) = (φ_0(X), 0, ..., 0)^T

and

A(X) =
⎛ φ_1(X)  φ_2(X)  ···  φ_{p−1}(X)  φ_p(X) ⎞
⎜   1       0     ···      0         0    ⎟
⎜   0       1     ···      0         0    ⎟
⎜   ⋮       ⋮      ⋱       ⋮         ⋮    ⎟
⎝   0       0     ···      1         0    ⎠

Then, we can rewrite the RBF-AR model (7) as

X_t = a(X_{t−1}) + A(X_{t−1}) X_{t−1} + ε_t    (8)

We list several lemmas below which are useful for checking ergodicity. To facilitate the formulation, we rewrite model (7) in the general form

x_t = f(x_{t−1}, ..., x_{t−p}) + ε_t    (9)

Then, we obtain the following Markov chain:

X_t = f(X_{t−1}) + ε_t    (10)

where f(x) = (f(x), x_1, ..., x_{p−1})^T for x = (x_1, ..., x_p)^T.

Lemma 2.1 ([35]). Suppose that f(·) in model (9) is a measurable function and is bounded over bounded sets, and that ε_t has a common almost everywhere positive density with Eε_t = 0. Model (10), and thus model (9), is geometrically ergodic if it satisfies the condition

lim_{||x||→∞} (f(x) − α^T x)/||x|| = 0    (11)

where α = (α_1, ..., α_p)^T satisfies

z^p − α_1 z^{p−1} − ··· − α_p ≠ 0  for all |z| ≥ 1    (12)

Lemma 2.2 ([36]). Suppose f(·) in model (9) is bounded over bounded sets, and the density function of ε_t is positive everywhere. Then {X_t} satisfying (10) is aperiodic and μ-irreducible, with μ being the Lebesgue measure.

Lemma 2.3 ([37]). Let {X_t} be a μ-irreducible Markov chain on a normed topological space. If the transition probability Pr(x, ·) is strongly continuous, then a sufficient condition for geometric ergodicity is that there exist a compact set K and a constant 0 < ρ < 1 such that

E(||X_{t+1}|| | X_t = x) < ∞ for x ∈ K,  and  E(||X_{t+1}|| | X_t = x) < ρ||x|| for x ∉ K    (13)

Lemma 2.4 ([38]). Let {X_t} be an aperiodic Markov chain. If there exists a fixed positive integer h such that {X_{ht}} is geometrically ergodic, then the original series {X_t} is geometrically ergodic.

By Lemmas 2.1–2.4, we obtain the following theorems.

Theorem 2.1. Assume that the density function of ε_t in the RBF-AR(p,m,d) model (7) is positive everywhere on the real line R, and that d = p. If all roots of the characteristic equation

z^p − c_{1,0} z^{p−1} − ··· − c_{p,0} = 0

are inside the unit circle, then the RBF-AR process in (7) is geometrically ergodic.

Proof. Set α = (c_{1,0}, ..., c_{p,0})^T. Then

lim_{||x||→∞} (f(x) − α^T x)/||x||
= lim_{||x||→∞} (φ_0(X*_{t−1}) + Σ_{i=1}^{p} φ_i(X*_{t−1}) x_{t−i} − α^T x)/||x||
= lim_{||x||→∞} (c_{0,0} + Σ_{k=1}^{m} c_{0,k} exp{−λ_k ||X*_{t−1} − Z_k||²} + Σ_{i=1}^{p} Σ_{k=1}^{m} c_{i,k} exp{−λ_k ||X*_{t−1} − Z_k||²} x_{t−i})/||x||

Notice that because λ_k > 0,

lim_{||x||→∞} (c_{0,0} + Σ_{k=1}^{m} c_{0,k} exp{−λ_k ||X*_{t−1} − Z_k||²})/||x|| = 0,
lim_{||x||→∞} (Σ_{k=1}^{m} c_{i,k} exp{−λ_k ||X*_{t−1} − Z_k||²} x_{t−i})/||x|| = 0,

so we have

lim_{||x||→∞} |f(x) − α^T x|/||x|| = 0.

Moreover, α = (c_{1,0}, ..., c_{p,0})^T satisfies condition (12), and f(·) is obviously bounded over bounded sets. By Lemma 2.1, the RBF-AR model is geometrically ergodic. □

When d < p in the RBF-AR(p,m,d) model (7), condition (11) cannot be satisfied. We give Theorem 2.2 below for the case d < p. Obviously, the functional coefficients φ_i(X*_{t−1}), i = 1, ..., p, in model (7) are bounded:

|φ_i(X*_{t−1})| ≤ |c_{i,0}| + |c_{i,1}| + ··· + |c_{i,m}|

Let c̄_i = |c_{i,0}| + |c_{i,1}| + ··· + |c_{i,m}|, i = 1, ..., p.

Theorem 2.2. Assume that the density function of ε_t in the RBF-AR(p,m,d) model (7) is positive everywhere on the real line R, and that d < p. If all the roots of the characteristic equation

z^p − c̄_1 z^{p−1} − ··· − c̄_p = 0    (14)

are inside the unit circle, then the RBF-AR process in (7) is geometrically ergodic.
Proof. For model (8), by Lemma 2.2, X_t is a μ-irreducible and aperiodic Markov chain. Let

C =
⎛ c̄_1  c̄_2  ···  c̄_{p−1}  c̄_p ⎞
⎜  1    0   ···     0       0  ⎟
⎜  0    1   ···     0       0  ⎟
⎜  ⋮    ⋮    ⋱      ⋮       ⋮  ⎟
⎝  0    0   ···     1       0  ⎠

and let I_p be the identity matrix of order p. Then the characteristic polynomial of the matrix C is

|zI_p − C| = z^p − c̄_1 z^{p−1} − ··· − c̄_p.

Hence, all the roots of the characteristic equation (14) are eigenvalues of C. Let z_max be the eigenvalue of C of maximum modulus. Because |z_max| < 1 and lim_{n→∞} ||C^n||^{1/n} = |z_max|, there exist a positive constant 0 < δ < 1 and an integer h such that ||C^h|| < δ. Using the iterative formula (8), we have

E(||X_{t+h}|| | X_t = x) = E( || ∏_{i=0}^{h−1} A(X_{t+i}) X_t + Σ_{i=1}^{h} [∏_{j=i}^{h−1} A(X_{t+j})] a(X_{t+i−1}) + Σ_{i=1}^{h} [∏_{j=i}^{h−1} A(X_{t+j})] ε_{t+i} ||  |  X_t = x )    (15)

For any vector b = (b_1, ..., b_p)^T, we have [14]

||A(X) b|| ≤ ||C b̃||    (16)

where b̃ = (|b_1|, ..., |b_p|)^T. Repeatedly applying (16) to (15), we obtain

E(||X_{t+h}|| | X_t = x) ≤ ||C^h x̃|| + E( Σ_{i=1}^{h} ||C^{h−i} a(X_{t+i−1})||  |  X_t = x ) + E( Σ_{i=1}^{h} ||C^{h−i} ε̃_{t+i}|| )
≤ δ||x|| + E( Σ_{i=1}^{h} ||C^{h−i} a(X_{t+i−1})||  |  X_t = x ) + E( Σ_{i=1}^{h} ||C^{h−i} ε̃_{t+i}|| )

According to the definition of the RBF-AR model (7), the second term above is bounded and independent of x. Suppose the modeling error of the RBF-AR model is bounded; then the third term above is also bounded and independent of x. Thus, we can find a sufficiently large M > 0 such that, when ||x|| > M,

δ||x|| + E( Σ_{i=1}^{h} ||C^{h−i} a(X_{t+i−1})||  |  X_t = x ) + E( Σ_{i=1}^{h} ||C^{h−i} ε̃_{t+i}|| ) < ρ||x||

where ρ ∈ (δ, 1). Hence, the compact set K = {x : ||x|| ≤ M} satisfies: when x ∉ K, E(||X_{t+h}|| | X_t = x) < ρ||x||. By Lemma 2.3, {X_{ht}} is geometrically ergodic, and then the original series {X_t} is geometrically ergodic by Lemma 2.4. □

Nonlinear time series models may exhibit nonlinear phenomena such as amplitude-dependent frequency and limit cycles. These phenomena occur in nonlinear vibration theory [12], which is one of the main fields that have stimulated the development of nonlinear differential equation theory. In this paper, we only discuss the limit cycle of a very simple RBF-AR model of the following form:

x_t = (c_{1,0} + c_{1,1} exp{−λ||X_{t−1} − Z||²}) x_{t−1} + (c_{2,0} + c_{2,1} exp{−λ||X_{t−1} − Z||²}) x_{t−2} + ε_t    (17)
where X_{t−1} = (x_{t−1}, x_{t−2})^T and λ > 0. Define |X_{t−1}| = (|x_{t−1}|, |x_{t−2}|)^T. A limit cycle might be expected in the discrete-time model (17) when the characteristic roots lie inside the unit circle for large |X_{t−1}| and move outside the unit circle for small |X_{t−1}|. Therefore, if we ignore the white noise input, the necessary conditions for the RBF-AR model (17) to exhibit limit cycle behavior are:

(i) The roots of z² − c_{1,0} z − c_{2,0} = 0 lie inside the unit circle.
(ii) The roots of z² − (c_{1,0} + c_{1,1} exp{−λ||Z||²}) z − (c_{2,0} + c_{2,1} exp{−λ||Z||²}) = 0 do not all lie inside the unit circle.
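These two root conditions can be checked numerically. The sketch below uses the coefficients of the example model (20) given at the end of this section (c_{1,0} = 1.83, c_{1,1} = 0.52, c_{2,0} = −0.84, c_{2,1} = −0.55, λ = 0.5, Z = (0.87, 1)^T); the code layout itself is only an illustration, not from the paper.

```python
import numpy as np

# Coefficients of the example model (20) from this section
c10, c11, c20, c21 = 1.83, 0.52, -0.84, -0.55
lam, Z = 0.5, np.array([0.87, 1.0])

# (i) for large |X_{t-1}| the exponential vanishes:
#     roots of z^2 - c10*z - c20 = 0 must lie inside the unit circle
roots_far = np.roots([1.0, -c10, -c20])
inside = np.all(np.abs(roots_far) < 1)        # holds for these coefficients

# (ii) for a state near the origin the exponent becomes -lam*||Z||^2:
#     the perturbed roots must not all lie inside the unit circle
g = np.exp(-lam * Z @ Z)
roots_near = np.roots([1.0, -(c10 + c11 * g), -(c20 + c21 * g)])
outside = np.any(np.abs(roots_near) >= 1)     # holds: the roots move out
```

The interplay of the two checks captures the mechanism described above: contraction far from the center, expansion near it.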
On the other hand, if the model has no limit cycle, the process may converge to a singular point x which, if it exists, satisfies the equation

(1 − c_{1,0} − c_{2,0}) x = (c_{1,1} + c_{2,1}) exp{−λ||X − Z||²} x    (18)

Obviously, x = 0 satisfies this equation, but it is an unstable singular point if condition (ii) is satisfied. Then, if c_{1,1} + c_{2,1} ≠ 0, from Eq. (18) we obtain

log[(1 − c_{1,0} − c_{2,0})/(c_{1,1} + c_{2,1})] = −λ||X − Z||²    (19)

A sufficient condition for Eq. (19) to have no real solution, i.e. for the possible existence of a limit cycle, is:

(iii) (1 − c_{1,0} − c_{2,0})/(c_{1,1} + c_{2,1}) > 1 or (1 − c_{1,0} − c_{2,0})/(c_{1,1} + c_{2,1}) < 0.

A particular model which satisfies conditions (i)–(iii) is

x_t = (1.83 + 0.52 exp{−0.5||X_{t−1} − Z||²}) x_{t−1} − (0.84 + 0.55 exp{−0.5||X_{t−1} − Z||²}) x_{t−2} + ε_t    (20)
where Z = (0.87, 1)^T. When we use nonzero initial values to simulate model (20) without white noise input, the limit cycle behavior does occur, as may be seen in Fig. 1.

3. Identification of the RBF-AR model

As mentioned in Section 1, the identification of the RBF-AR model includes the choice of the orders, estimation of all the parameters, and selection of an appropriate state vector. The difficulty of parameter estimation for neural network models is well known even for a single-layer network, let alone the somewhat complex RBF-AR model. Applying a classical gradient-based optimization method to estimate all the parameters simultaneously may result in a very slow convergence rate. Fortunately, the parameters of the RBF-AR model can be divided into linear and nonlinear sets, and the number of linear parameters is usually much larger than the number of nonlinear parameters. An efficient iterative method for this kind of model is the structured nonlinear parameter optimization method (SNPOM) [26]. The search of this method centers on the nonlinear parameter optimization, but at each iteration of the optimization process a search in the nonlinear (or linear) subspace is executed on the basis of the estimated values just obtained in the linear (or nonlinear) subspace. The optimization in the nonlinear subspace uses a method similar to the Levenberg–Marquardt method (LMM) [39], and the optimization in the linear subspace uses the least-squares method (LSM). The SNPOM drastically accelerates the computational convergence and reduces the ill-conditioning of the optimization problem.

For the model order selection, many information criteria may be used [40], e.g., the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and cross-validation. Model selection based on an in-sample information criterion may not be a reliable guide to out-of-sample performance [41].
For neural network models, some researchers [42,43] argue that model selection should use out-of-sample tests to check how well the model generalizes beyond the estimation set for a reasonable number of observations. One simple approach is to partition the initial data set into several subsets and to compute an average score over the different partitions. In this paper, we adopt a simple and quick method, the modified multi-fold cross-validation criterion proposed by Cai et al. [1]. Additionally, we also consider choosing an appropriate state vector (or threshold vector) for RBF-AR models based on the cross-validation criterion.
Fig. 1. Limit cycle simulated by the model (20).
3.1. Parameter estimation

Here the parameters in the RBF-AR model (7) are classified into two sets: a nonlinear parameter set θ_N = (λ_1, λ_2, ..., λ_m, Z_1^T, Z_2^T, ..., Z_m^T)^T and a linear parameter set θ_L = (c_{0,0}, ..., c_{0,m}, c_{1,0}, ..., c_{p,0}, c_{1,1}, ..., c_{p,1}, ..., c_{1,m}, ..., c_{p,m})^T. Rewrite the model (7) in the following form:

x_t = φ(θ_N, X*_{t−1})^T θ_L + ε_t    (21)

where φ is the regression vector, which is obtained by factorizing Eq. (7) into the linear regression form with respect to θ_L. The objective function is taken to be

V(θ_L, θ_N) = (1/2) ||F(θ_L, θ_N)||²_2    (22)

F(θ_L, θ_N) = ( x̂_{τ+1|τ} − x_{τ+1}, x̂_{τ+2|τ+1} − x_{τ+2}, ..., x̂_{M|M−1} − x_M )^T    (23)

where x̂_{t+1|t} is the one-step-ahead prediction of the output based on model (21), {x_i | i = τ+1, τ+2, ..., M} is the measured data set, τ is the largest time lag in model (21), and M is the number of data observations. The parameter optimization problem is then to compute

(θ̂_L, θ̂_N) = arg min_{θ_L, θ_N} V(θ_L, θ_N)    (24)

The iteration step is denoted by k (k = 0, 1, 2, ..., k_max). For the nonlinear parameter vector θ_N, the updating formula is

θ_N^{k+1} = θ_N^k + β^k d^k    (25)

where d^k is the search direction and β^k is a scalar step-length parameter representing the distance to the minimum. In order to increase the robustness of the search process, which is based on the LMM, d^k in Eq. (25) is obtained from the solution of the following set of linear equations:

( J(θ_N^k)^T J(θ_N^k) + μI ) d^k = −J(θ_N^k)^T F(θ_L^k, θ_N^k)    (26)

where J is the Jacobian of F with respect to θ_N and the scalar μ controls both the magnitude and direction of d^k. The size of μ is determined at each iteration using the method of the function lsqnonlin in the MATLAB Optimization Toolbox.

The optimization calculation for the search for θ_N^{k+1} at each iteration using Eq. (25) is followed by the immediate update of the linear weights θ_L^{k+1} using the LSM as follows:

θ_L^{k+1} = ( Σ_{t=τ+1}^{M} φ_{t,k+1} φ_{t,k+1}^T )^{−1} Σ_{t=τ+1}^{M} φ_{t,k+1} x_t    (27)

where φ_{t,k+1} = φ(θ_N^{k+1}, X*_{t−1}). Note that in the above optimization process, the search in the nonlinear (or linear) subspace is executed on the basis of the estimated values just obtained in the linear (or nonlinear) subspace. Therefore, at each iteration the linear weights are updated many times during the process of looking for the search direction to update the nonlinear parameters.
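The linear half of this alternating scheme is ordinary least squares once θ_N is fixed. The Python sketch below builds the regression vector φ of Eq. (21) for an RBF-AR model and solves Eq. (27); it illustrates only the linear step (the LM step on θ_N is omitted), and all function names are ours, not the paper's.

```python
import numpy as np

def regressor(x, t, lam, Z, p):
    """phi(theta_N, X*_{t-1}) of model (21) at time t (0-based array x)."""
    d = Z.shape[1]
    state = x[t - d:t][::-1]                              # (x_{t-1}, ..., x_{t-d})
    g = np.exp(-lam * np.sum((state - Z) ** 2, axis=1))   # hidden-node outputs
    basis = np.concatenate(([1.0], g))                    # (1, g_1, ..., g_m)
    lags = x[t - p:t][::-1]                               # (x_{t-1}, ..., x_{t-p})
    # one block for phi_0, then one block per AR lag; all linear in theta_L
    return np.concatenate([basis] + [xi * basis for xi in lags])

def linear_step(x, lam, Z, p):
    """Least-squares update of theta_L for fixed theta_N, as in Eq. (27)."""
    tau = max(p, Z.shape[1])                              # largest time lag
    Phi = np.array([regressor(x, t, lam, Z, p) for t in range(tau, len(x))])
    theta_L, *_ = np.linalg.lstsq(Phi, x[tau:], rcond=None)
    return theta_L, Phi @ theta_L - x[tau:]               # weights, residuals F
```

In the full SNPOM this least-squares update alternates with a Levenberg–Marquardt step on the nonlinear parameters θ_N, with the residual vector F of Eq. (23) shared between the two subspaces.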
3.2. Selection of the orders and state vector

The order (p, m, d) selection criterion we use is the modified multi-fold cross-validation criterion, which is attentive to the structure of time series data [1]. Let l and Q be two given positive integers such that M > lQ. The basic idea of this method is to use Q sub-series of lengths M − ql (q = 1, ..., Q) to estimate the unknown model parameters and to compute the one-step forecasting errors of the next section of the time series of length l based on the estimated models. The appropriate order is then chosen by minimizing

AMS = Σ_{q=1}^{Q} AMS_q    (28)

where, for q = 1, ..., Q,

AMS_q = (1/l) Σ_{i=M−ql+1}^{M−ql+l} [ x_i − ( φ_0^{(q)}(X*_{i−1}) + Σ_{j=1}^{p} φ_j^{(q)}(X*_{i−1}) x_{i−j} ) ]²

and φ_0^{(q)}(·), {φ_j^{(q)}(·)} are computed from the sample {x_i, 1 ≤ i ≤ M − ql}. In practical implementations, Cai et al. [1] suggest using l = [0.1M] and Q = 4.

In previous research, no work has considered the selection of the state vector (or threshold vector) in RBF-AR(X) models. However, choosing an appropriate state vector is very important in applying state-dependent models. Knowledge of the physical background of the data may be very helpful; when no prior information is available, a data-driven method is desirable. Now we redefine

X*_{t−1} = (x_{t−i_1}, ..., x_{t−i_d})^T    (29)

where i_1, ..., i_d are integers. We call i_1, ..., i_d the threshold lags, and assume max(i_1, ..., i_d) ≤ p. For a given candidate set of threshold lags i_1, ..., i_d, we compute the AMS defined in (28). Then, we may obtain the "best" state vector by minimizing the AMS defined in (28) over different threshold lags.
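The criterion (28) can be written generically over any estimator. In the sketch below, `fit` and `predict` are placeholder names (not from the paper) standing for an estimation routine, such as the SNPOM-estimated RBF-AR of Section 3.1, and its one-step forecaster.

```python
import numpy as np

def ams(x, fit, predict, l, Q):
    """Modified multi-fold cross-validation score AMS of Eq. (28)."""
    M = len(x)
    assert M > l * Q
    score = 0.0
    for q in range(1, Q + 1):
        model = fit(x[:M - q * l])                 # estimate on the first M - q*l points
        errs = [x[i] - predict(model, x[:i])       # one-step errors on the next l points
                for i in range(M - q * l, M - q * l + l)]
        score += np.mean(np.square(errs))          # AMS_q
    return score
```

Cai et al. [1] suggest l = [0.1M] and Q = 4; the order (p, m, d) and the threshold lags minimizing this score are then selected.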
4. Application

The famous Canadian lynx data is one of several data sets that have attracted considerable and special attention. Ecologists believe that these data reflect, to some extent, the relative magnitude of the lynx population, and the data have therefore been constantly used to examine concepts such as "balance of nature" and predator–prey interaction. The lynx series contains the number of lynx trapped per year in the Mackenzie River District of North-West Canada from 1821 to 1934. The data are plotted in Fig. 2. To compare the prediction performance of various models, we estimate the RBF-AR model using the first 100 data points only, and leave out the remaining 14 data points to check the prediction performance. Following the previous literature, the logarithms (to the base 10) of the data are used.

Fig. 2. Canadian lynx time series.

We apply the identification procedure described in Section 3 to select the optimum RBF-AR model among the class of models with 1 ≤ d ≤ p and 2 ≤ p ≤ 3. For the number of hidden nodes m, in this paper we experimentally set it to either 1 or 2; had the best model been obtained with two hidden nodes, we would then also have tried three hidden nodes. We let Q = 4 and l = 10 in the AMS defined in (28). Table 1 records the experimental results of RBF-AR models with different values of p, m, d and threshold lags. From Table 1, we can see that the prediction errors of the RBF-AR models with one hidden node are all smaller than those of the RBF-AR models with two hidden nodes. This indicates that "overfitting" occurs when more hidden nodes are used. Among these models, the RBF-AR model with p = 2, m = 1, d = 2 performs best.

Table 1. Results of RBF-AR models with different values of p, m, d or threshold lags for the logged lynx data with base 10.

p  m  d  Threshold lags           AMS
2  1  1  i1 = 1                   0.2281
2  2  1  i1 = 1                   0.2459
2  1  1  i1 = 2                   0.2260
2  2  1  i1 = 2                   0.2284
2  1  2  i1 = 1, i2 = 2           0.2165
2  2  2  i1 = 1, i2 = 2           0.2479
3  1  1  i1 = 1                   0.2692
3  2  1  i1 = 1                   0.3072
3  1  1  i1 = 2                   0.2385
3  2  1  i1 = 2                   0.2437
3  1  1  i1 = 3                   0.2405
3  2  1  i1 = 3                   0.2604
3  1  2  i1 = 1, i2 = 2           0.2433
3  2  2  i1 = 1, i2 = 2           0.3369
3  1  2  i1 = 1, i2 = 3           0.2672
3  2  2  i1 = 1, i2 = 3           0.3151
3  1  2  i1 = 2, i2 = 3           0.2433
3  2  2  i1 = 2, i2 = 3           0.2939
3  1  3  i1 = 1, i2 = 2, i3 = 3   0.2475
3  2  3  i1 = 1, i2 = 2, i3 = 3   0.2863

The fitted RBF-AR(2,1,2) model is

x_t = 0.5716 + 2.5271 exp{−1.8735||X_{t−1} − Z||²} + (1.1818 + 0.3743 exp{−1.8735||X_{t−1} − Z||²}) x_{t−1} − (0.3469 + 1.2079 exp{−1.8735||X_{t−1} − Z||²}) x_{t−2} + ε_t    (30)

where Z = (3.1129, 3.5709)^T. It can be seen that the two roots of the equation

z² − 1.1818z + 0.3469 = 0    (31)

lie inside the unit circle. By Theorem 2.1, the process given by the fitted RBF-AR model (30) is stationary. The limit cycle behavior of the estimated model, with the white noise suppressed, is examined by using the first two observed values of the original lynx series as initial values.

Fig. 3. Limit cycle for the RBF-AR model (30) without white noise.
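The root check for Eq. (31) and a noise-free simulation of the fitted model (30) are easy to reproduce. In the sketch below the initial values are illustrative nonzero values of our own choosing; the paper itself starts from the first two observed log-lynx values.

```python
import numpy as np

# Theorem 2.1 check: roots of Eq. (31), z^2 - 1.1818 z + 0.3469 = 0
roots = np.roots([1.0, -1.1818, 0.3469])
assert np.all(np.abs(roots) < 1)        # both roots inside the unit circle

# Noise-free simulation of the fitted RBF-AR(2,1,2) model (30)
Z = np.array([3.1129, 3.5709])
x = [3.0, 3.4]                          # illustrative nonzero initial values
for _ in range(500):
    e = np.exp(-1.8735 * ((x[-1] - Z[0]) ** 2 + (x[-2] - Z[1]) ** 2))
    x.append(0.5716 + 2.5271 * e
             + (1.1818 + 0.3743 * e) * x[-1]
             - (0.3469 + 1.2079 * e) * x[-2])
# The trajectory remains bounded; with the observed initial values
# the paper reports the limit cycle shown in Fig. 3.
```

The far-field dynamics are governed by the stable linear part (31), while the exponential term dominates near the center Z, the same mechanism as in conditions (i)–(ii) of Section 2.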
The result of the simulation is shown in Fig. 3. As can be seen, the limit cycle does in fact occur. Fig. 4 shows the autocorrelation of the residuals after fitting the RBF-AR(2,1,2) model to the Canadian lynx data; the result suggests that the estimated model satisfies the correlation validity test. Table 2 lists the mean squared error (MSE) values of the RBF-AR model and some other neural network or hybrid models for the last 14 post-sample data points. From Table 2, it can be seen that the RBF-AR model obtains a smaller prediction error than the other models. The comparison between the actual values and the values predicted by model (30) for the post-sample data is given in Fig. 5. It can be concluded that the RBF-AR model performs well.

It is necessary to test whether the variance of the prediction errors obtained by the RBF-AR model is significantly smaller than those of the other methods. Suppose the variance of the prediction error of one method is σ1² and that of the other is σ2². The null hypothesis is

H0: σ1² ≤ σ2²

and the alternative hypothesis is

H1: σ1² > σ2²
Fig. 4. Autocorrelation of the residuals after fitting an RBF-AR(2,1,2) model to the Canadian lynx data.

Table 2. Comparison results among various models for the Canadian lynx data.

Authors                Model         MSE
Zhang [32]             ANN           0.0205
Zhang [32]             ARIMA-ANN     0.0172
Kajitani et al. [33]   FFNN          0.0130
Aladag et al. [34]     SARIMA-ERNN   0.009
This paper             RBF-AR        0.0073

Table 3. F-values of ANN [32], ARIMA-ANN [32], FFNN [33], and SARIMA-ERNN [34] vs. the RBF-AR model.

Methods                  F-value
ANN vs. RBF-AR           3.417
ARIMA-ANN vs. RBF-AR     2.867
FFNN vs. RBF-AR          2.167
SARIMA-ERNN vs. RBF-AR   1.5

Table 4. The absolute post-sample prediction errors of different functional-coefficient models for the Canadian lynx data (AAPE: average absolute prediction error).

Year   Value   RBF-AR(2,1,2) model   FAR model   TAR(2) model
1923   3.054   0.201                 0.157       0.187
1924   3.386   0.017                 0.012       0.035
1925   3.553   0.017                 0.021       0.014
1926   3.468   0.023                 0.008       0.022
1927   3.187   0.118                 0.085       0.059
1928   2.723   0.043                 0.055       0.075
1929   2.686   0.174                 0.135       0.273
1930   2.821   0.025                 0.016       0.026
1931   3.000   0.040                 0.017       0.030
1932   3.201   0.020                 0.007       0.060
1933   3.424   0.056                 0.089       0.076
1934   3.531   0.060                 0.053       0.072
AAPE           0.066                 0.055       0.073
The test statistic is F = S1²/S2², where

S = √[ (1/(N−1)) Σ_{i=1}^{N} (ε_i − ε̄)² ]    (32)

ε_i = x_i − x̂_i    (33)

ε̄ = (1/N) Σ_{i=1}^{N} ε_i    (34)
and x_i, x̂_i are the observed and predicted data, respectively, and N is the number of predicted data points.
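Eqs. (32)–(34) amount to a ratio of unbiased sample variances of the two prediction-error series. A small sketch, with synthetic error series rather than the paper's:

```python
import numpy as np

def f_statistic(err1, err2):
    """F = S1^2 / S2^2 with S^2 the unbiased sample variance, Eqs. (32)-(34)."""
    return np.var(err1, ddof=1) / np.var(err2, ddof=1)

# Synthetic example: two 14-point post-sample error series (illustrative only)
rng = np.random.default_rng(0)
e1 = rng.normal(scale=0.15, size=14)    # a competitor's prediction errors
e2 = rng.normal(scale=0.08, size=14)    # RBF-AR prediction errors
F = f_statistic(e1, e2)
# Reject H0 at the 10% level if F exceeds the critical value F_0.1(13, 13) = 2.08
```

With 14 post-sample points, the statistic has 13 and 13 degrees of freedom, which is why the critical value F_0.1(13, 13) is used below.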
Table 3 shows the F-values of ANN [32], ARIMA-ANN [32], FFNN [33], and SARIMA-ERNN [34] vs. the RBF-AR model. The critical value F_α(13, 13) equals 2.08 at α = 0.1. From Table 3, it can be seen that the F-values of ANN vs. RBF-AR, ARIMA-ANN vs. RBF-AR, and FFNN vs. RBF-AR are all greater than the critical value, so we reject the corresponding null hypotheses; this indicates that the results of ANN [32], ARIMA-ANN [32], and FFNN [33] are significantly worse than that of the RBF-AR model. The F-value of SARIMA-ERNN vs. RBF-AR is smaller than the critical value, so we accept the null hypothesis; this suggests that the SARIMA-ERNN [34] and RBF-AR models achieve similar prediction error variances.

Table 4 reports the one-step-ahead absolute prediction errors (APE) of the last 12 post-sample points for different varying-coefficient models. The results of the FAR(2,2) and TAR(2) models are taken from [1]. From the table, we can see that the RBF-AR model obtains a smaller average absolute prediction error (AAPE) than the TAR model. The results obtained by the FAR model [1] for the Canadian lynx data are known to be very competitive; the RBF-AR model achieves an AAPE very similar to that of the FAR model.
Fig. 5. Actual and prediction values for the logged Canadian lynx data with base 10.

5. Conclusion
The RBF-AR model belongs to a class of varying-coefficient models. It is a hybrid structure that combines the advantage of the state-dependent AR model in describing nonlinear dynamics with the merit of RBF networks in function approximation. In this paper we discussed the stability conditions of the RBF-AR model and some conditions for the existence of limit cycles. Since the RBF-AR model is somewhat complex, a good parameter optimization method adapted to the model is crucial; the SNPOM is an excellent parameter optimization method for this type of model. We also considered the selection of the orders and the appropriate state vector based on the modified cross-validation criterion. The forecasting performance of the RBF-AR model was examined by applying it to the Canadian lynx data. Comparison results show that the RBF-AR modeling approach achieves results similar to or better than those of other competing time series models.
Acknowledgments

This work was supported by the International Science & Technology Cooperation Program of China under Grant 2011DFA10440, the Key International Cooperation Programs of Hunan Provincial Science & Technology Department, China under Grant 2009WK2009, the National Natural Science Foundation of China under Grant 70921001, and the Fundamental Research Funds for the Central Universities under Grants 2010HGBZ0597 and 2011HGQC0995.

References

[1] Z. Cai, J. Fan, Q. Yao, Functional-coefficient regression models for nonlinear time series, Journal of the American Statistical Association 95 (451) (2000) 941–956.
[2] R. Chen, L.-M. Liu, Functional coefficient autoregressive models: estimation and tests of hypotheses, Journal of Time Series Analysis 22 (2) (2001) 151–173.
[3] J.Z. Huang, H. Shen, Functional coefficient regression models for nonlinear time series: a polynomial spline approach, Scandinavian Journal of Statistics 31 (2004) 515–534.
[4] J.L. Harvill, B.K. Ray, Functional coefficient autoregressive models for vector time series, Computational Statistics & Data Analysis 50 (2006) 3547–3566.
[5] B.M. Akesson, H.T. Toivonen, State-dependent parameter modelling and identification of stochastic non-linear sampled-data systems, Journal of Process Control 16 (2006) 877–886.
[6] R. Zhang, Proportional functional coefficient time series models, Journal of Statistical Planning and Inference 139 (2009) 749–763.
[7] Z. Cai, Q. Li, J.Y. Park, Functional-coefficient models for nonstationary time series data, Journal of Econometrics 148 (2009) 101–113.
[8] Y. Cao, H. Lin, T.Z. Wu, Y. Yu, Penalized spline estimation for functional coefficient regression models, Computational Statistics and Data Analysis 54 (2010) 891–905.
[9] M.B. Priestley, State-dependent models: a general approach to non-linear time series analysis, Journal of Time Series Analysis 1 (1) (1980) 57–71.
[10] C.W.J. Granger, A.P. Andersen, An Introduction to Bilinear Time Series Models, Vandenhoeck & Ruprecht, Göttingen, 1978.
[11] H. Tong, Threshold Models in Nonlinear Time Series Analysis, Springer-Verlag, New York, 1983.
[12] T. Ozaki, Non-linear time series models for non-linear random vibrations, Journal of Applied Probability 17 (1980) 84–93.
[13] V. Haggan, T. Ozaki, Modeling nonlinear random vibrations using an amplitude-dependent autoregressive time series model, Biometrika 68 (1981) 189–196.
[14] R. Chen, R.S. Tsay, Functional-coefficient autoregressive models, Journal of the American Statistical Association 88 (421) (1993) 298–308.
[15] A. Jain, A.M. Kumar, Hybrid neural network models for hydrologic time series forecasting, Applied Soft Computing 7 (2) (2007) 585–592.
[16] C. Lee, Y. Chiang, C. Shih, C. Tsai, Noisy time series prediction using M-estimator based robust radial basis function neural networks with growing and pruning techniques, Expert Systems with Applications 36 (3) (2009) 4717–4724.
[17] F.A. Guerra, L. dos S. Coelho, Multi-step ahead nonlinear identification of Lorenz's chaotic system using radial basis neural network with learning by clustering and particle swarm optimization, Chaos, Solitons & Fractals 35 (5) (2008) 967–979.
[18] L. dos S. Coelho, R.Z. Freire, G.H. dos Santos, N. Mendes, Identification of temperature and moisture content fields using a combined neural network and clustering method approach, International Communications in Heat and Mass Transfer 36 (4) (2009) 304–313.
[19] C. Lee, C. Ko, Time series prediction using RBF neural networks with a nonlinear time-varying evolution PSO algorithm, Neurocomputing 73 (1–3) (2009) 449–460.
[20] M. Gan, H. Peng, A hybrid approach for parameter optimization of RBF-AR model, in: Proceedings of the 49th IEEE Conference on Decision and Control (CDC 2010), Atlanta, GA, USA, December 2010, pp. 4423–4428.
[21] M. Gan, H. Peng, X. Peng, et al., A locally linear RBF network-based state-dependent AR model for nonlinear time series modeling, Information Sciences 180 (22) (2010) 4370–4383.
[22] P.P. Balestrassi, E. Popova, A.P. Paiva, J.W. Marangon Lima, Design of experiments on neural network's training for nonlinear time series forecasting, Neurocomputing 72 (4–6) (2009) 1160–1178.
[23] J. Vesin, An amplitude-dependent autoregressive signal model based on a radial basis function expansion, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 3 (1993) I129–I132.
[24] Z. Shi, Y. Tamura, T. Ozaki, Nonlinear time series modeling with the radial basis function-based state-dependent autoregressive model, International Journal of Systems Science 30 (1999) 717–727.
[25] T. Ozaki, J.C. Jimenez, H. Peng, V.H. Ozaki, The innovation approach to the identification of nonlinear causal models in time series analysis, in: D.R. Brillinger, E.A. Robinson, F.P. Schoenberg (Eds.), Time Series Analysis and Applications to Geophysical Systems, Springer, New York, 2004, pp. 195–226.
[26] H. Peng, T. Ozaki, V. Haggan-Ozaki, Y. Toyoda, A parameter optimization method for radial basis function type models, IEEE Transactions on Neural Networks 14 (2003) 432–438.
[27] H. Peng, T. Ozaki, Y. Toyoda, H. Shioya, et al., RBF-ARX model-based nonlinear system modeling and predictive control with application to a NOx decomposition process, Control Engineering Practice 12 (2004) 191–203.
[28] H. Peng, Z.J. Yang, W. Gui, M. Wu, et al., Nonlinear system modeling and robust predictive control based on RBF-ARX model, Engineering Applications of Artificial Intelligence 20 (2007) 1–9.
[29] H. Peng, K. Nakano, H. Shioya, Nonlinear predictive control using neural nets-based local linearization ARX model—stability and industrial application, IEEE Transactions on Control Systems Technology 15 (1) (2007) 130–143.
[30] H. Peng, J. Wu, G. Inoussa, Q. Deng, K. Nakano, Nonlinear system modeling and predictive control using the RBF nets-based quasi-linear ARX model, Control Engineering Practice 17 (2009) 59–66.
[31] V. Haggan-Ozaki, T. Ozaki, Y. Toyoda, An Akaike state-space controller for RBF-ARX models, IEEE Transactions on Control Systems Technology 17 (1) (2009) 191–198.
[32] G.P. Zhang, Time series forecasting using a hybrid ARIMA and neural network model, Neurocomputing 50 (2003) 159–175.
[33] Y. Kajitani, K.W. Hipel, A.I. McLeod, Forecasting nonlinear time series with feed-forward neural networks: a case study of Canadian lynx data, Journal of Forecasting 24 (2005) 105–117.
[34] C.H. Aladag, E. Egrioglu, C. Kadilar, Forecasting nonlinear time series with a hybrid methodology, Applied Mathematics Letters 22 (2009) 1467–1470.
[35] H.Z. An, F.C. Huang, The geometrical ergodicity of nonlinear autoregressive models, Statistica Sinica 6 (1996) 943–956.
[36] K.S. Chan, H. Tong, On the use of the deterministic Lyapunov function for the ergodicity of stochastic difference equations, Advances in Applied Probability 17 (1985) 666–678.
[37] R.L. Tweedie, Criteria for classifying general Markov chains, Advances in Applied Probability 8 (1976) 737–771.
[38] D. Tjøstheim, Non-linear time series and Markov chains, Advances in Applied Probability 22 (1990) 587–611.
[39] D. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, Journal of the Society for Industrial and Applied Mathematics 11 (1963) 431–441.
[40] S. Konishi, G. Kitagawa, Information Criteria and Statistical Modeling, Springer, Berlin, 2008.
[41] N.R. Swanson, H. White, A model selection approach to assessing the information in the term structure using linear models and artificial neural networks, Journal of Business and Economic Statistics 13 (3) (1995) 265–275.
[42] M.J.L. Orr, Introduction to radial basis function networks, 1996. Available: http://www.anc.ed.ac.uk/~mjo/rbf.html.
[43] P.D. McNelis, Neural Networks in Finance: Gaining Predictive Edge in the Market, Elsevier Academic Press, Burlington, MA, 2005.