Applied Soft Computing 12 (2012) 174–181
Stability analysis of RBF network-based state-dependent autoregressive model for nonlinear time series

Min Gan a,b,∗, Hui Peng a
a School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, China
b School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China
Article info

Article history: Received 21 June 2010; received in revised form 29 March 2011; accepted 31 August 2011; available online 7 September 2011.

Keywords: Radial basis function network; State-dependent autoregressive model; Stability analysis; Limit cycle; Canadian lynx data; Time series prediction
Abstract

Varying-coefficient models have attracted great attention in nonlinear time series analysis in recent years. In this paper, we consider a semi-parametric functional-coefficient autoregressive model, called the radial basis function network-based state-dependent autoregressive (RBF-AR) model. The stability conditions of the RBF-AR model and the conditions for the existence of its limit cycles are discussed. An efficient structured parameter estimation method and the modified multi-fold cross-validation criterion are applied to identify the RBF-AR model. An application of the RBF-AR model to the famous Canadian lynx data is presented. The forecasting capability of the RBF-AR model is compared with those of other competing time series models, which shows that the RBF-AR model is as good as or better than the other models for post-sample forecasts.

© 2011 Elsevier B.V. All rights reserved.
1. Introduction

Varying-coefficient (or functional-coefficient) models, whose parameters may vary with the value of some variables, offer a very flexible structure for modeling nonlinear time series. In recent years, this kind of model has been widely studied and has found many important applications. For example, Cai et al. [1] applied the local linear regression technique to the estimation of functional-coefficient regression models for nonlinear time series. Chen and Liu [2] studied nonparametric estimation and hypothesis testing procedures for functional-coefficient autoregressive (FAR) models. Huang and Shen [3] proposed a global smoothing method based on polynomial splines for the estimation of functional-coefficient regression models. Harvill and Ray [4] extended the functional-coefficient autoregressive model to the multivariate nonlinear time series framework. Akesson and Toivonen [5] studied state-dependent parameter representations of stochastic nonlinear sampled-data systems. Zhang [6] studied proportional functional-coefficient linear regression models. Cai et al. [7] studied functional-coefficient regression models with nonstationary time series data. Cao et al. [8] proposed penalized spline estimation for functional-coefficient regression models under dependence.
Although varying-coefficient models already have a vast literature, some researchers believe that research in this area is just at the beginning [6]. The aforementioned models may be traced back to the state-dependent models of Priestley [9]:

x_t = μ(X_{t−1}) + Σ_{i=1}^{k} φ_i(X_{t−1}) x_{t−i} + ε_t + Σ_{j=1}^{l} ψ_j(X_{t−1}) ε_{t−j}    (1)
where {x_t, t = 1, 2, ...} is the time series, k and l are positive integers, X_{t−1} = (ε_{t−l}, ..., ε_{t−1}, x_{t−k}, ..., x_{t−1})^T denotes the "state vector", {ε_t} is a sequence of i.i.d. random variables, ε_t is independent of {x_{t−i}, i > 0}, and μ(·), {φ_i(·)}, {ψ_j(·)} are measurable functions from R^{k+l} to R. Many familiar time series models are special cases of the state-dependent model (1); we name a few below. Taking μ(·), {φ_i(·)} and {ψ_j(·)} all as constants, we obtain the linear ARMA model:

x_t = μ + Σ_{i=1}^{k} φ_i x_{t−i} + ε_t + Σ_{j=1}^{l} ψ_j ε_{t−j}    (2)
Taking μ(·) and {φ_i(·)} as constants, and setting ψ_j(X_{t−1}) = b_j + Σ_{i=1}^{p} c_{ji} x_{t−i}, we obtain the bilinear model [10]:

x_t = Σ_{i=1}^{k} φ_i x_{t−i} + ε_t + Σ_{j=1}^{l} b_j ε_{t−j} + Σ_{i=1}^{p} Σ_{j=1}^{m} c_{ij} x_{t−i} ε_{t−j}    (3)

∗ Corresponding author at: School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China. Tel.: +86 15215698383. E-mail address: [email protected] (M. Gan).
1568-4946/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.asoc.2011.08.055
The threshold autoregressive (TAR) model introduced by Tong [11], one of the simplest but most widely used nonlinear time series models, can also be regarded as a special case of the state-dependent model (1):

x_t = Σ_{i=1}^{k} {φ_0^{(i)} + φ_1^{(i)} x_{t−1} + ··· + φ_p^{(i)} x_{t−p} + ε_t} I(x_{t−d} ∈ Ω_i)    (4)
where I(·) is the usual indicator function, and {Ω_i} form a non-overlapping partition of the real line.

Ozaki [12] and Haggan and Ozaki [13] proposed the exponential autoregressive (ExpAR) model, which exhibits certain well-known features of nonlinear vibration theory, such as amplitude-dependent frequency, jump phenomena, and limit cycle behavior:

x_t = (φ_1 + π_1 exp(−γ x_{t−1}²)) x_{t−1} + ··· + (φ_p + π_p exp(−γ x_{t−1}²)) x_{t−p} + ε_t    (5)

where {φ_1, ..., φ_p}, {π_1, ..., π_p} and γ > 0 are constants. Chen and Tsay [14] proposed the functional-coefficient autoregressive (FAR) model:

x_t = φ_1(X*_{t−1}) x_{t−1} + ··· + φ_p(X*_{t−1}) x_{t−p} + ε_t    (6)
where X*_{t−1} = (x_{t−i_1}, ..., x_{t−i_k})^T is the threshold vector (or state vector) with i_1, ..., i_k as the threshold lags, and the φ_i(X*_{t−1}) are measurable functions from R^k to R. In practice, k is usually a small number (e.g., k = 1 in [1,14]), because models with large k are often not practically useful due to the "curse of dimensionality" [1]. Obviously, the FAR model (6) is a special case of the state-dependent model (1), without any white noise terms ε_{t−i} in the threshold vector and with only the simple AR structure, which makes the empirical estimation much easier than for the general state-dependent model. The main difficulty in using the FAR model (6) is specifying the functional coefficients φ_i(X*_{t−1}). An efficient estimation approach is to use nonparametric regression techniques [1–3,8]. An alternative to the nonparametric estimation method, if we treat the functional specification as a problem of function approximation from a multi-dimensional input space X*_{t−1} to a one-dimensional scalar space, is neural network approximation. Neural networks are very popular tools for time series modeling and forecasting [15–22]. Because of the "universal approximation" capability of radial basis function (RBF) networks, Vesin [23] and Shi et al. [24] used a set of RBF networks to approximate the functional coefficients of the state-dependent autoregressive (or FAR) model. The derived model, called the RBF network-based autoregressive (RBF-AR) model, takes the form
x_t = φ_0(X*_{t−1}) + Σ_{i=1}^{p} φ_i(X*_{t−1}) x_{t−i} + ε_t
φ_0(X*_{t−1}) = c_{0,0} + Σ_{k=1}^{m} c_{0,k} exp{−λ_k ||X*_{t−1} − Z_k||²}
φ_i(X*_{t−1}) = c_{i,0} + Σ_{k=1}^{m} c_{i,k} exp{−λ_k ||X*_{t−1} − Z_k||²}    (7)
X*_{t−1} = (x_{t−1}, ..., x_{t−d})^T
Z_k = (z_{k,1}, ..., z_{k,d})^T

where p is the model order, m is the number of hidden nodes of the RBF networks, and d ≤ p is the dimension of the state vector X*_{t−1}; φ_0(X*_{t−1}) and φ_i(X*_{t−1}) are the state-dependent functional coefficients, each composed of an RBF network; Z_k (k = 1, 2, ..., m) are the centers of the RBF networks; λ_k > 0 (k = 1, 2, ..., m) are real scaling parameters; c_{0,k} (k = 0, 1, ..., m) and c_{i,k} (i = 1, 2, ..., p; k = 0, 1, ..., m) are real constants; ||·|| denotes
the vector 2-norm. The RBF-AR model (7) treats the nonlinear process by splitting the state space into a large number of small segments and regarding the process as "locally linear" within each segment. It allows the coefficients to change gradually, rather than abruptly as in the TAR model, which may be appealing in many applications. From Eq. (7), we can see that the RBF-AR model can be regarded as a generalized version of the classic ExpAR model (5). The model turns out, however, to have a much stronger capability in prediction and simulation than an ExpAR model [25]. If we take X*_{t−1} = (x_{t−1}, ∇x_{t−1}, ..., ∇^{d−1} x_{t−1})^T in model (7), where ∇ denotes the difference operator, it can easily be seen that the instantaneous dynamics of the RBF-AR model depend not only on the present amplitude of the series but also on its velocity ∇x_t and/or acceleration ∇²x_t. Therefore, it can produce, for example, asymmetric nonlinear wave patterns in time series, since the model dynamics may differ when the series is increasing or decreasing [25]. On the other hand, the RBF-AR model shares the same flexibility in characterizing complex dynamics as the RBF network, since it contains the RBF network as a component. The overwhelming advantage of this class of pseudo-linear AR models over conventional nonlinear models may be seen more clearly in the application of the models to modern predictive control problems. Peng et al. [26] extended the RBF-AR model to the case where there are several exogenous variables to the system (the RBF-ARX model), and designed RBF-ARX model-based predictive control (MPC) strategies for the nitrogen oxide (NOx) decomposition process in thermal power plants [27–31]. A major feature of the RBF-AR(X) model approach is that, in contrast to most other time-varying linear models, its parameters may be estimated off-line.

In this paper, we study some probabilistic properties, the identification procedure, and an application of the RBF-AR model. In Section 2, the stability conditions and the conditions for the existence of limit cycles of the RBF-AR model are discussed. Identification of the RBF-AR model includes the choice of the orders, estimation of all the parameters, and selection of an appropriate state vector. Parameter optimization of the RBF-AR model is essentially a nonlinear optimization problem, which is a very difficult task. However, the parameters of the RBF-AR model can be classified into linear and nonlinear sets, and the number of linear parameters is usually much larger than the number of nonlinear parameters. Peng et al.
[26] proposed a very efficient estimation algorithm, the so-called structured nonlinear parameter optimization method (SNPOM), for this kind of nonlinear model. The orders of the RBF-AR model are determined by the modified multi-fold cross-validation criterion proposed by Cai et al. [1]. We also consider the selection of an appropriate state vector for RBF-AR models, which has not been considered in previous research. The identification of the RBF-AR model based on the SNPOM and the modified multi-fold cross-validation criterion is presented in Section 3. To compare the forecasting capabilities of the RBF-AR model with some other competing time series models, the RBF-AR model is applied to the famous Canadian lynx data, which is also used in Cai et al. [1], Zhang [32], Kajitani et al. [33], and Aladag et al. [34]. It is shown that the RBF-AR model is as good as or better than other models for the post-sample forecasts.
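To make the structure of the RBF-AR model (7) concrete, the following minimal Python sketch evaluates the state-dependent coefficients and one noise-free model step. All parameter values here are illustrative placeholders, not values from the paper.

```python
import numpy as np

def rbf_ar_step(x_hist, c0, C, lam, Z):
    """One noise-free step of the RBF-AR model (7).

    x_hist : (x_{t-1}, ..., x_{t-p});  c0 : (m+1,) weights of phi_0;
    C : (p, m+1) weights of phi_1..phi_p;  lam : (m,) scales;  Z : (m, d) centers.
    """
    x_hist = np.asarray(x_hist, dtype=float)
    d = Z.shape[1]
    state = x_hist[:d]                                   # X*_{t-1} = (x_{t-1}, ..., x_{t-d})
    g = np.exp(-lam * np.sum((state - Z) ** 2, axis=1))  # RBF hidden-node outputs
    phi0 = c0[0] + c0[1:] @ g                            # state-dependent intercept
    phi = C[:, 0] + C[:, 1:] @ g                         # state-dependent AR coefficients
    return phi0 + phi @ x_hist

# Illustrative RBF-AR(2,1,2) parameters (hypothetical, for demonstration only)
c0 = np.array([0.1, 0.4])
C = np.array([[0.6, 0.2], [-0.3, 0.1]])
lam = np.array([1.0])
Z = np.array([[0.5, 0.5]])

x = [0.2, 0.3]
for _ in range(50):                                      # short noise-free simulation
    x.append(rbf_ar_step([x[-1], x[-2]], c0, C, lam, Z))
```

Far from the centers the exponentials vanish, so the model reduces to the linear AR part with coefficients c_{i,0}, which is exactly the mechanism exploited in the stability analysis of Section 2.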
2. Stability analysis and limit cycle of the RBF-AR model

In this section, we give the stability conditions and the conditions for the existence of limit cycles of the RBF-AR model. It is not easy to check whether a time series generated by a nonlinear model is strictly stationary. For a nonlinear time series model described by a stochastic difference equation, we usually first represent the time series as a vector-valued Markov chain, and then derive the stationarity of the model by proving that the corresponding Markov chain is ergodic.
Next, we establish a Markov chain in the p-dimensional Euclidean space for the model (7). Define

X_t = (x_t, ..., x_{t−p+1})^T,  ε_t = (ε_t, 0, ..., 0)^T

Because X*_{t−1} in (7) is a subvector of X_{t−1}, we can write φ(X*_{t−1}) as φ(X_{t−1}). Set

a(X) = (φ_0(X), 0, ..., 0)^T

and

A(X) =
⎛ φ_1(X)  φ_2(X)  ···  φ_{p−1}(X)  φ_p(X) ⎞
⎜   1       0     ···      0         0    ⎟
⎜   0       1     ···      0         0    ⎟
⎜   ⋮       ⋮      ⋱       ⋮         ⋮    ⎟
⎝   0       0     ···      1         0    ⎠

Then, we can rewrite the RBF-AR model (7) as

X_t = a(X_{t−1}) + A(X_{t−1}) X_{t−1} + ε_t    (8)

We list several lemmas below which are useful for checking ergodicity. To facilitate the formulation, we rewrite model (7) in the general form

x_t = f(x_{t−1}, ..., x_{t−p}) + ε_t    (9)

Then, we obtain the following Markov chain:

X_t = f(X_{t−1}) + ε_t    (10)

where f(x) = (f(x), x_1, ..., x_{p−1})^T for x = (x_1, ..., x_p)^T.

Lemma 2.1 ([35]). Suppose that f(·) in model (9) is a measurable function and is bounded over bounded sets, and that ε_t has a common almost everywhere positive density with Eε_t = 0. Model (10), and thus model (9), is geometrically ergodic if it satisfies the condition

lim_{||x||→∞} (f(x) − α^T x)/||x|| = 0    (11)

where α = (α_1, ..., α_p)^T satisfies

z^p − α_1 z^{p−1} − ··· − α_p ≠ 0  for all |z| ≥ 1    (12)

Lemma 2.2 ([36]). Suppose f(·) in model (9) is bounded over bounded sets, and the density function of ε_t is positive everywhere. Then {X_t} satisfying (10) is aperiodic and μ-irreducible, with μ being the Lebesgue measure.

Lemma 2.3 ([37]). Let {X_t} be a μ-irreducible Markov chain on a normed topological space. If the transition probability Pr(x, ·) is strongly continuous, then a sufficient condition for geometric ergodicity is that there exist a compact set K and a constant 0 < ρ < 1 such that

E(||X_{t+1}|| | X_t = x) < ∞ for x ∈ K,  and  E(||X_{t+1}|| | X_t = x) < ρ||x|| for x ∉ K    (13)

Lemma 2.4 ([38]). Let {X_t} be an aperiodic Markov chain. If there exists a fixed positive integer h such that {X_{ht}} is geometrically ergodic, then the original series {X_t} is geometrically ergodic.

By Lemmas 2.1–2.4, we obtain the following theorems.

Theorem 2.1. Assume that the density function of ε_t in the RBF-AR(p,m,d) model (7) is positive everywhere on the real line R, and that d = p. If all roots of the characteristic equation

z^p − c_{1,0} z^{p−1} − ··· − c_{p,0} = 0

are inside the unit circle, then the RBF-AR process in (7) is geometrically ergodic.

Proof. Set α = (c_{1,0}, ..., c_{p,0})^T. Then

lim_{||x||→∞} (f(x) − α^T x)/||x||
= lim_{||x||→∞} (φ_0(X*_{t−1}) + Σ_{i=1}^{p} φ_i(X*_{t−1}) x_{t−i} − α^T x)/||x||
= lim_{||x||→∞} (c_{0,0} + Σ_{k=1}^{m} c_{0,k} exp{−λ_k ||X*_{t−1} − Z_k||²} + Σ_{i=1}^{p} Σ_{k=1}^{m} c_{i,k} exp{−λ_k ||X*_{t−1} − Z_k||²} x_{t−i})/||x||

Notice that because λ_k > 0,

lim_{||x||→∞} (c_{0,0} + Σ_{k=1}^{m} c_{0,k} exp{−λ_k ||X*_{t−1} − Z_k||²})/||x|| = 0,
lim_{||x||→∞} (Σ_{k=1}^{m} c_{i,k} exp{−λ_k ||X*_{t−1} − Z_k||²} x_{t−i})/||x|| = 0,

so we have

lim_{||x||→∞} |f(x) − α^T x|/||x|| = 0.

Moreover, α = (c_{1,0}, ..., c_{p,0})^T satisfies condition (12), and f(·) is obviously bounded over bounded sets. By Lemma 2.1, the RBF-AR model is geometrically ergodic. □

When d < p in the RBF-AR(p,m,d) model (7), condition (11) cannot be satisfied. We give Theorem 2.2 below for the case d < p. Obviously, the functional coefficients φ_i(X*_{t−1}), i = 1, ..., p, in model (7) are bounded:

|φ_i(X*_{t−1})| ≤ |c_{i,0}| + |c_{i,1}| + ··· + |c_{i,m}|

Let c̄_i = |c_{i,0}| + |c_{i,1}| + ··· + |c_{i,m}|, i = 1, ..., p.

Theorem 2.2. Assume that the density function of ε_t in the RBF-AR(p,m,d) model (7) is positive everywhere on the real line R, and that d < p. If all the roots of the characteristic equation

z^p − c̄_1 z^{p−1} − ··· − c̄_p = 0    (14)

are inside the unit circle, then the RBF-AR process in (7) is geometrically ergodic.
Proof. For model (8), by Lemma 2.2, X_t is a μ-irreducible and aperiodic Markov chain. Let

C =
⎛ c̄_1  c̄_2  ···  c̄_{p−1}  c̄_p ⎞
⎜  1    0   ···     0       0  ⎟
⎜  0    1   ···     0       0  ⎟
⎜  ⋮    ⋮    ⋱      ⋮       ⋮  ⎟
⎝  0    0   ···     1       0  ⎠

and let I_p be the identity matrix of order p. Then the characteristic polynomial of the matrix C is

|zI_p − C| = z^p − c̄_1 z^{p−1} − ··· − c̄_p.

Hence, all the roots of the characteristic equation (14) are eigenvalues of C. Let z_max be the eigenvalue of C of maximum modulus. Because |z_max| < 1 and lim_{n→∞} ||C^n||^{1/n} = |z_max|, there exist a positive constant 0 < δ < 1 and an integer h such that ||C^h|| < δ. Using the iterative formula (8), we have

E(||X_{t+h}|| | X_t = x) = E( || ∏_{i=0}^{h−1} A(X_{t+i}) X_t + Σ_{i=1}^{h} [∏_{j=i}^{h−1} A(X_{t+j})] a(X_{t+i−1}) + Σ_{i=1}^{h} [∏_{j=i}^{h−1} A(X_{t+j})] ε_{t+i} ||  |  X_t = x )    (15)

For any vector b = (b_1, ..., b_p)^T, we have [14]

||A(X) b|| ≤ ||C b̃||    (16)

where b̃ = (|b_1|, ..., |b_p|)^T. Repeatedly applying (16) to (15), we obtain

E(||X_{t+h}|| | X_t = x) ≤ ||C^h x̃|| + E( Σ_{i=1}^{h} ||C^{h−i} a(X_{t+i−1})||  |  X_t = x ) + E( Σ_{i=1}^{h} ||C^{h−i} ε̃_{t+i}|| )
≤ δ||x|| + E( Σ_{i=1}^{h} ||C^{h−i} a(X_{t+i−1})||  |  X_t = x ) + E( Σ_{i=1}^{h} ||C^{h−i} ε̃_{t+i}|| )

According to the definition of the RBF-AR model (7), the second term above is bounded and independent of x. Suppose the modeling error of the RBF-AR model is bounded; then the third term above is also bounded and independent of x. Thus, we can find a sufficiently large M > 0 such that, when ||x|| > M,

δ||x|| + E( Σ_{i=1}^{h} ||C^{h−i} a(X_{t+i−1})||  |  X_t = x ) + E( Σ_{i=1}^{h} ||C^{h−i} ε̃_{t+i}|| ) < ρ||x||

where ρ ∈ (δ, 1). Hence, the compact set K = {x : ||x|| ≤ M} satisfies: when x ∉ K, E(||X_{t+h}|| | X_t = x) < ρ||x||. By Lemma 2.3, {X_{ht}} is geometrically ergodic, and then the original series {X_t} is geometrically ergodic by Lemma 2.4. □

Nonlinear time series models may exhibit nonlinear phenomena such as amplitude-dependent frequency and limit cycles. These phenomena occur in nonlinear vibration theory [12], which is one of the main fields that have stimulated the development of nonlinear differential equation theory. In this paper, we only discuss the limit cycle of a very simple RBF-AR model of the following form:

x_t = (c_{1,0} + c_{1,1} exp{−λ||X_{t−1} − Z||²}) x_{t−1} + (c_{2,0} + c_{2,1} exp{−λ||X_{t−1} − Z||²}) x_{t−2} + ε_t    (17)
where X_{t−1} = (x_{t−1}, x_{t−2})^T and λ > 0. Define |X_{t−1}| = (|x_{t−1}|, |x_{t−2}|)^T. A limit cycle might be expected in the discrete-time model (17) when the characteristic roots lie inside the unit circle for large |X_{t−1}| and move outside the unit circle for small |X_{t−1}|. Therefore, if we ignore the white noise input, the necessary conditions for the RBF-AR model (17) to exhibit limit cycle behavior are:

(i) The roots of z² − c_{1,0} z − c_{2,0} = 0 lie inside the unit circle.
(ii) The roots of z² − (c_{1,0} + c_{1,1} exp{−λ||Z||²}) z − (c_{2,0} + c_{2,1} exp{−λ||Z||²}) = 0 do not all lie inside the unit circle.
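These two root conditions can be checked numerically. The sketch below uses the coefficients of the example model (20) given at the end of this section (c_{1,0} = 1.83, c_{1,1} = 0.52, c_{2,0} = −0.84, c_{2,1} = −0.55, λ = 0.5, Z = (0.87, 1)^T); the code layout itself is only an illustration, not from the paper.

```python
import numpy as np

# Coefficients of the example model (20) from this section
c10, c11, c20, c21 = 1.83, 0.52, -0.84, -0.55
lam, Z = 0.5, np.array([0.87, 1.0])

# (i) for large |X_{t-1}| the exponential vanishes:
#     roots of z^2 - c10*z - c20 = 0 must lie inside the unit circle
roots_far = np.roots([1.0, -c10, -c20])
inside = np.all(np.abs(roots_far) < 1)        # holds for these coefficients

# (ii) for a state near the origin the exponent becomes -lam*||Z||^2:
#     the perturbed roots must not all lie inside the unit circle
g = np.exp(-lam * Z @ Z)
roots_near = np.roots([1.0, -(c10 + c11 * g), -(c20 + c21 * g)])
outside = np.any(np.abs(roots_near) >= 1)     # holds: the roots move out
```

The interplay of the two checks captures the mechanism described above: contraction far from the center, expansion near it.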
On the other hand, if the model has no limit cycle, the process may converge to a singular point x which, if it exists, satisfies the equation

(1 − c_{1,0} − c_{2,0}) x = (c_{1,1} + c_{2,1}) exp{−λ||X − Z||²} x    (18)

Obviously, x = 0 satisfies this equation, but it is an unstable singular point if condition (ii) is satisfied. Then, if c_{1,1} + c_{2,1} ≠ 0, from Eq. (18) we obtain

log[(1 − c_{1,0} − c_{2,0})/(c_{1,1} + c_{2,1})] = −λ||X − Z||²    (19)

A sufficient condition for Eq. (19) to have no real solution, i.e. for the possible existence of a limit cycle, is:

(iii) (1 − c_{1,0} − c_{2,0})/(c_{1,1} + c_{2,1}) > 1 or (1 − c_{1,0} − c_{2,0})/(c_{1,1} + c_{2,1}) < 0.

A particular model which satisfies conditions (i)–(iii) is

x_t = (1.83 + 0.52 exp{−0.5||X_{t−1} − Z||²}) x_{t−1} − (0.84 + 0.55 exp{−0.5||X_{t−1} − Z||²}) x_{t−2} + ε_t    (20)
where Z = (0.87, 1)^T. When we use nonzero initial values to simulate model (20) without white noise input, the limit cycle behavior does occur, as may be seen in Fig. 1.

3. Identification of the RBF-AR model

As mentioned in Section 1, the identification of the RBF-AR model includes the choice of the orders, estimation of all the parameters, and selection of an appropriate state vector. The difficulty of parameter estimation for neural network models is well known even for a single-layer network, let alone the somewhat complex RBF-AR model. Applying a classical gradient-based optimization method to estimate all the parameters simultaneously may result in a very slow convergence rate. Fortunately, the parameters of the RBF-AR model can be divided into linear and nonlinear sets, and the number of linear parameters is usually much larger than the number of nonlinear parameters. An efficient iterative method for this kind of model is the structured nonlinear parameter optimization method (SNPOM) [26]. The search of this method centers on the nonlinear parameter optimization, but at each iteration of the optimization process a search in the nonlinear (or linear) subspace is executed on the basis of the estimated values just obtained in the linear (or nonlinear) subspace. The optimization in the nonlinear subspace uses a method similar to the Levenberg–Marquardt method (LMM) [39], and the optimization in the linear subspace uses the least-squares method (LSM). The SNPOM drastically accelerates the computational convergence and reduces the ill-conditioning of the optimization problem.

For the model order selection, many information criteria may be used [40], e.g., the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and cross-validation. Model selection based on an in-sample information criterion may not be a reliable guide to out-of-sample performance [41].
For neural network models, some researchers [42,43] argue that model selection should use out-of-sample tests to check how well the model generalizes beyond the estimation set for a reasonable number of observations. One simple approach is to partition the initial data set into several subsets and to compute an average score over the different partitions. In this paper, we adopt a simple and quick method, the modified multi-fold cross-validation criterion proposed by Cai et al. [1]. Additionally, we also consider choosing an appropriate state vector (or threshold vector) for RBF-AR models based on the cross-validation criterion.
Fig. 1. Limit cycle simulated by the model (20).
3.1. Parameter estimation

Here the parameters in the RBF-AR model (7) are classified into two sets: a nonlinear parameter set θ_N = (λ_1, λ_2, ..., λ_m, Z_1^T, Z_2^T, ..., Z_m^T)^T and a linear parameter set θ_L = (c_{0,0}, ..., c_{0,m}, c_{1,0}, ..., c_{p,0}, c_{1,1}, ..., c_{p,1}, ..., c_{1,m}, ..., c_{p,m})^T. Rewrite the model (7) in the following form:

x_t = φ(θ_N, X*_{t−1})^T θ_L + ε_t    (21)

where φ is the regression vector, which is obtained by factorizing Eq. (7) into the linear regression form with respect to θ_L. The objective function is taken to be

V(θ_L, θ_N) = (1/2) ||F(θ_L, θ_N)||²_2    (22)

F(θ_L, θ_N) = ( x̂_{τ+1|τ} − x_{τ+1}, x̂_{τ+2|τ+1} − x_{τ+2}, ..., x̂_{M|M−1} − x_M )^T    (23)

where x̂_{t+1|t} is the one-step-ahead prediction of the output based on model (21), {x_i | i = τ+1, τ+2, ..., M} is the measured data set, τ is the largest time lag in model (21), and M is the number of data observations. The parameter optimization problem is then to compute

(θ̂_L, θ̂_N) = arg min_{θ_L, θ_N} V(θ_L, θ_N)    (24)

The iteration step is denoted by k (k = 0, 1, 2, ..., k_max). For the nonlinear parameter vector θ_N, the updating formula is

θ_N^{k+1} = θ_N^k + β^k d^k    (25)

where d^k is the search direction and β^k is a scalar step-length parameter representing the distance to the minimum. In order to increase the robustness of the search process, which is based on the LMM, d^k in Eq. (25) is obtained from the solution of the following set of linear equations:

( J(θ_N^k)^T J(θ_N^k) + μI ) d^k = −J(θ_N^k)^T F(θ_L^k, θ_N^k)    (26)

where J is the Jacobian of F with respect to θ_N and the scalar μ controls both the magnitude and direction of d^k. The size of μ is determined at each iteration using the method of the function lsqnonlin in the MATLAB Optimization Toolbox.

The optimization calculation for the search for θ_N^{k+1} at each iteration using Eq. (25) is followed by the immediate update of the linear weights θ_L^{k+1} using the LSM as follows:

θ_L^{k+1} = ( Σ_{t=τ+1}^{M} φ_{t,k+1} φ_{t,k+1}^T )^{−1} Σ_{t=τ+1}^{M} φ_{t,k+1} x_t    (27)

where φ_{t,k+1} = φ(θ_N^{k+1}, X*_{t−1}). Note that in the above optimization process, the search in the nonlinear (or linear) subspace is executed on the basis of the estimated values just obtained in the linear (or nonlinear) subspace. Therefore, at each iteration the linear weights are updated many times during the process of looking for the search direction to update the nonlinear parameters.
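The linear half of this alternating scheme is ordinary least squares once θ_N is fixed. The Python sketch below builds the regression vector φ of Eq. (21) for an RBF-AR model and solves Eq. (27); it illustrates only the linear step (the LM step on θ_N is omitted), and all function names are ours, not the paper's.

```python
import numpy as np

def regressor(x, t, lam, Z, p):
    """phi(theta_N, X*_{t-1}) of model (21) at time t (0-based array x)."""
    d = Z.shape[1]
    state = x[t - d:t][::-1]                              # (x_{t-1}, ..., x_{t-d})
    g = np.exp(-lam * np.sum((state - Z) ** 2, axis=1))   # hidden-node outputs
    basis = np.concatenate(([1.0], g))                    # (1, g_1, ..., g_m)
    lags = x[t - p:t][::-1]                               # (x_{t-1}, ..., x_{t-p})
    # one block for phi_0, then one block per AR lag; all linear in theta_L
    return np.concatenate([basis] + [xi * basis for xi in lags])

def linear_step(x, lam, Z, p):
    """Least-squares update of theta_L for fixed theta_N, as in Eq. (27)."""
    tau = max(p, Z.shape[1])                              # largest time lag
    Phi = np.array([regressor(x, t, lam, Z, p) for t in range(tau, len(x))])
    theta_L, *_ = np.linalg.lstsq(Phi, x[tau:], rcond=None)
    return theta_L, Phi @ theta_L - x[tau:]               # weights, residuals F
```

In the full SNPOM this least-squares update alternates with a Levenberg–Marquardt step on the nonlinear parameters θ_N, with the residual vector F of Eq. (23) shared between the two subspaces.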
3.2. Selection of the orders and state vector

The order (p, m, d) selection criterion we use is the modified multi-fold cross-validation criterion, which is attentive to the structure of time series data [1]. Let l and Q be two given positive integers such that M > lQ. The basic idea of this method is to use Q sub-series of lengths M − ql (q = 1, ..., Q) to estimate the unknown model parameters and to compute the one-step forecasting errors of the next section of the time series of length l based on the estimated models. The appropriate order is then chosen by minimizing

AMS = Σ_{q=1}^{Q} AMS_q    (28)

where, for q = 1, ..., Q,

AMS_q = (1/l) Σ_{i=M−ql+1}^{M−ql+l} [ x_i − ( φ_0^{(q)}(X*_{i−1}) + Σ_{j=1}^{p} φ_j^{(q)}(X*_{i−1}) x_{i−j} ) ]²

and φ_0^{(q)}(·), {φ_j^{(q)}(·)} are computed from the sample {x_i, 1 ≤ i ≤ M − ql}. In practical implementations, Cai et al. [1] suggest using l = [0.1M] and Q = 4.

In previous research, no work has considered the selection of the state vector (or threshold vector) in RBF-AR(X) models. However, choosing an appropriate state vector is very important in applying state-dependent models. Knowledge of the physical background of the data may be very helpful; when no prior information is available, a data-driven method is desirable. Now we redefine

X*_{t−1} = (x_{t−i_1}, ..., x_{t−i_d})^T    (29)

where i_1, ..., i_d are integers. We call i_1, ..., i_d the threshold lags, and assume max(i_1, ..., i_d) ≤ p. For a given candidate set of threshold lags i_1, ..., i_d, we compute the AMS defined in (28). Then, we may obtain the "best" state vector by minimizing the AMS defined in (28) over different threshold lags.
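The criterion (28) can be written generically over any estimator. In the sketch below, `fit` and `predict` are placeholder names (not from the paper) standing for an estimation routine, such as the SNPOM-estimated RBF-AR of Section 3.1, and its one-step forecaster.

```python
import numpy as np

def ams(x, fit, predict, l, Q):
    """Modified multi-fold cross-validation score AMS of Eq. (28)."""
    M = len(x)
    assert M > l * Q
    score = 0.0
    for q in range(1, Q + 1):
        model = fit(x[:M - q * l])                 # estimate on the first M - q*l points
        errs = [x[i] - predict(model, x[:i])       # one-step errors on the next l points
                for i in range(M - q * l, M - q * l + l)]
        score += np.mean(np.square(errs))          # AMS_q
    return score
```

Cai et al. [1] suggest l = [0.1M] and Q = 4; the order (p, m, d) and the threshold lags minimizing this score are then selected.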
4. Application

The famous Canadian lynx data is one of several data sets that have attracted considerable and special attention. Ecologists believe that these data reflect, to some extent, the relative magnitude of the lynx population, and the data have therefore been constantly used to examine concepts such as "balance of nature" and predator–prey interaction. The lynx series contains the number of lynx trapped per year in the Mackenzie River District of North-West Canada from 1821 to 1934. The data are plotted in Fig. 2. To compare the prediction performance of various models, we estimate the RBF-AR model using the first 100 data points only, and leave out the remaining 14 data points to check the prediction performance. Following the previous literature, the logarithms (to the base 10) of the data are used.

Fig. 2. Canadian lynx time series.

We apply the identification procedure described in Section 3 to select the optimum RBF-AR model among the class of models with 1 ≤ d ≤ p and 2 ≤ p ≤ 3. For the number of hidden nodes m, in this paper we experimentally set it to either 1 or 2; had the best model been obtained with two hidden nodes, we would then also have tried three hidden nodes. We let Q = 4 and l = 10 in the AMS defined in (28). Table 1 records the experimental results of RBF-AR models with different values of p, m, d and threshold lags. From Table 1, we can see that the prediction errors of the RBF-AR models with one hidden node are all smaller than those of the RBF-AR models with two hidden nodes. This indicates that "overfitting" occurs when more hidden nodes are used. Among these models, the RBF-AR model with p = 2, m = 1, d = 2 performs best.

Table 1. Results of RBF-AR models with different values of p, m, d or threshold lags for the logged lynx data with base 10.

p  m  d  Threshold lags           AMS
2  1  1  i1 = 1                   0.2281
2  2  1  i1 = 1                   0.2459
2  1  1  i1 = 2                   0.2260
2  2  1  i1 = 2                   0.2284
2  1  2  i1 = 1, i2 = 2           0.2165
2  2  2  i1 = 1, i2 = 2           0.2479
3  1  1  i1 = 1                   0.2692
3  2  1  i1 = 1                   0.3072
3  1  1  i1 = 2                   0.2385
3  2  1  i1 = 2                   0.2437
3  1  1  i1 = 3                   0.2405
3  2  1  i1 = 3                   0.2604
3  1  2  i1 = 1, i2 = 2           0.2433
3  2  2  i1 = 1, i2 = 2           0.3369
3  1  2  i1 = 1, i2 = 3           0.2672
3  2  2  i1 = 1, i2 = 3           0.3151
3  1  2  i1 = 2, i2 = 3           0.2433
3  2  2  i1 = 2, i2 = 3           0.2939
3  1  3  i1 = 1, i2 = 2, i3 = 3   0.2475
3  2  3  i1 = 1, i2 = 2, i3 = 3   0.2863

The fitted RBF-AR(2,1,2) model is

x_t = 0.5716 + 2.5271 exp{−1.8735||X_{t−1} − Z||²} + (1.1818 + 0.3743 exp{−1.8735||X_{t−1} − Z||²}) x_{t−1} − (0.3469 + 1.2079 exp{−1.8735||X_{t−1} − Z||²}) x_{t−2} + ε_t    (30)

where Z = (3.1129, 3.5709)^T. It can be seen that the two roots of the equation

z² − 1.1818z + 0.3469 = 0    (31)

lie inside the unit circle. By Theorem 2.1, the process given by the fitted RBF-AR model (30) is stationary. The limit cycle behavior of the estimated model, with the white noise suppressed, is examined by using the first two observed values of the original lynx series as initial values.

Fig. 3. Limit cycle for the RBF-AR model (30) without white noise.
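The root check for Eq. (31) and a noise-free simulation of the fitted model (30) are easy to reproduce. In the sketch below the initial values are illustrative nonzero values of our own choosing; the paper itself starts from the first two observed log-lynx values.

```python
import numpy as np

# Theorem 2.1 check: roots of Eq. (31), z^2 - 1.1818 z + 0.3469 = 0
roots = np.roots([1.0, -1.1818, 0.3469])
assert np.all(np.abs(roots) < 1)        # both roots inside the unit circle

# Noise-free simulation of the fitted RBF-AR(2,1,2) model (30)
Z = np.array([3.1129, 3.5709])
x = [3.0, 3.4]                          # illustrative nonzero initial values
for _ in range(500):
    e = np.exp(-1.8735 * ((x[-1] - Z[0]) ** 2 + (x[-2] - Z[1]) ** 2))
    x.append(0.5716 + 2.5271 * e
             + (1.1818 + 0.3743 * e) * x[-1]
             - (0.3469 + 1.2079 * e) * x[-2])
# The trajectory remains bounded; with the observed initial values
# the paper reports the limit cycle shown in Fig. 3.
```

The far-field dynamics are governed by the stable linear part (31), while the exponential term dominates near the center Z, the same mechanism as in conditions (i)–(ii) of Section 2.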
The result of the simulation is shown in Fig. 3. As can be seen, the limit cycle does in fact occur. Fig. 4 shows the autocorrelation of the residuals after fitting the RBF-AR(2,1,2) model to the Canadian lynx data; the result suggests that the estimated model satisfies the correlation validity test. Table 2 lists the mean squared error (MSE) values of the RBF-AR model and some other neural network or hybrid models for the last 14 post-sample data points. From Table 2, it can be seen that the RBF-AR model obtains a smaller prediction error than the other models. The comparison between the actual values and the values predicted by model (30) for the post-sample data is given in Fig. 5. It can be concluded that the RBF-AR model performs well.

It is necessary to test whether the variance of the prediction errors obtained by the RBF-AR model is significantly smaller than those of the other methods. Suppose the variance of the prediction error of one method is σ1² and that of the other is σ2². The null hypothesis is

H0: σ1² ≤ σ2²

and the alternative hypothesis is

H1: σ1² > σ2²
Fig. 4. Autocorrelation of the residuals after fitting an RBF-AR(2,1,2) model to the Canadian lynx data.

Table 2. Comparison results among various models for the Canadian lynx data.

Authors                Model         MSE
Zhang [32]             ANN           0.0205
Zhang [32]             ARIMA-ANN     0.0172
Kajitani et al. [33]   FFNN          0.0130
Aladag et al. [34]     SARIMA-ERNN   0.009
This paper             RBF-AR        0.0073

Table 3. F-values of ANN [32], ARIMA-ANN [32], FFNN [33], and SARIMA-ERNN [34] vs. the RBF-AR model.

Methods                  F-value
ANN vs. RBF-AR           3.417
ARIMA-ANN vs. RBF-AR     2.867
FFNN vs. RBF-AR          2.167
SARIMA-ERNN vs. RBF-AR   1.5

Table 4. The absolute post-sample prediction errors of different functional-coefficient models for the Canadian lynx data (AAPE: average absolute prediction error).

Year   Value   RBF-AR(2,1,2) model   FAR model   TAR(2) model
1923   3.054   0.201                 0.157       0.187
1924   3.386   0.017                 0.012       0.035
1925   3.553   0.017                 0.021       0.014
1926   3.468   0.023                 0.008       0.022
1927   3.187   0.118                 0.085       0.059
1928   2.723   0.043                 0.055       0.075
1929   2.686   0.174                 0.135       0.273
1930   2.821   0.025                 0.016       0.026
1931   3.000   0.040                 0.017       0.030
1932   3.201   0.020                 0.007       0.060
1933   3.424   0.056                 0.089       0.076
1934   3.531   0.060                 0.053       0.072
AAPE           0.066                 0.055       0.073
The test statistic is F = S1²/S2², where

S = √[ (1/(N−1)) Σ_{i=1}^{N} (ε_i − ε̄)² ]    (32)

ε_i = x_i − x̂_i    (33)

ε̄ = (1/N) Σ_{i=1}^{N} ε_i    (34)
and x_i, x̂_i are the observed and predicted data, respectively, and N is the number of predicted data points.
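Eqs. (32)–(34) amount to a ratio of unbiased sample variances of the two prediction-error series. A small sketch, with synthetic error series rather than the paper's:

```python
import numpy as np

def f_statistic(err1, err2):
    """F = S1^2 / S2^2 with S^2 the unbiased sample variance, Eqs. (32)-(34)."""
    return np.var(err1, ddof=1) / np.var(err2, ddof=1)

# Synthetic example: two 14-point post-sample error series (illustrative only)
rng = np.random.default_rng(0)
e1 = rng.normal(scale=0.15, size=14)    # a competitor's prediction errors
e2 = rng.normal(scale=0.08, size=14)    # RBF-AR prediction errors
F = f_statistic(e1, e2)
# Reject H0 at the 10% level if F exceeds the critical value F_0.1(13, 13) = 2.08
```

With 14 post-sample points, the statistic has 13 and 13 degrees of freedom, which is why the critical value F_0.1(13, 13) is used below.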
Table 3 shows the F-values of ANN [32], ARIMA-ANN [32], FFNN [33], and SARIMA-ERNN [34] vs. the RBF-AR model. The critical value F_α(13, 13) equals 2.08 at α = 0.1. From Table 3, it can be seen that the F-values of ANN vs. RBF-AR, ARIMA-ANN vs. RBF-AR, and FFNN vs. RBF-AR are all greater than the critical value, so we reject the corresponding null hypotheses; this indicates that the results of ANN [32], ARIMA-ANN [32], and FFNN [33] are significantly worse than that of the RBF-AR model. The F-value of SARIMA-ERNN vs. RBF-AR is smaller than the critical value, so we accept the null hypothesis; this suggests that the SARIMA-ERNN [34] and RBF-AR models achieve similar prediction error variances.

Table 4 reports the one-step-ahead absolute prediction errors (APE) of the last 12 post-sample points for different varying-coefficient models. The results of the FAR(2,2) and TAR(2) models are taken from [1]. From the table, we can see that the RBF-AR model obtains a smaller average absolute prediction error (AAPE) than the TAR model. The results obtained by the FAR model [1] for the Canadian lynx data are known to be very competitive; the RBF-AR model achieves an AAPE very similar to that of the FAR model.
Fig. 5. Actual and prediction values for the logged Canadian lynx data with base 10.

5. Conclusion
The RBF-AR model belongs to a class of varying-coefficient models. It is a hybrid structure that combines the advantage of the state-dependent AR model in describing nonlinear dynamics with the merit of RBF networks in function approximation. In this paper we discussed the stability conditions of the RBF-AR model and some conditions for the existence of limit cycles. Since the RBF-AR model is somewhat complex, a good parameter optimization method adapted to the model is crucial; the SNPOM is an excellent parameter optimization method for this type of model. We also considered the selection of the orders and the appropriate state vector based on the modified cross-validation criterion. The forecasting performance of the RBF-AR model was examined by applying it to the Canadian lynx data. Comparison results show that the RBF-AR modeling approach achieves results similar to or better than those of other competing time series models.
Acknowledgments

This work was supported by the International Science & Technology Cooperation Program of China under Grant 2011DFA10440, the Key International Cooperation Programs of Hunan Provincial Science & Technology Department, China under Grant 2009WK2009, the National Natural Science Foundation of China under Grant 70921001, and the Fundamental Research Funds for the Central Universities under Grants 2010HGBZ0597 and 2011HGQC0995.

References

[1] Z. Cai, J. Fan, Q. Yao, Functional-coefficient regression models for nonlinear time series, Journal of the American Statistical Association 95 (451) (2000) 941–956.
[2] R. Chen, L.-M. Liu, Functional coefficient autoregressive models: estimation and tests of hypotheses, Journal of Time Series Analysis 22 (2) (2001) 151–173.
[3] J.Z. Huang, H. Shen, Functional coefficient regression models for nonlinear time series: a polynomial spline approach, Scandinavian Journal of Statistics 31 (2004) 515–534.
[4] J.L. Harvill, B.K. Ray, Functional coefficient autoregressive models for vector time series, Computational Statistics & Data Analysis 50 (2006) 3547–3566.
[5] B.M. Akesson, H.T. Toivonen, State-dependent parameter modelling and identification of stochastic non-linear sampled-data systems, Journal of Process Control 16 (2006) 877–886.
[6] R. Zhang, Proportional functional coefficient time series models, Journal of Statistical Planning and Inference 139 (2009) 749–763.
[7] Z. Cai, Q. Li, J.Y. Park, Functional-coefficient models for nonstationary time series data, Journal of Econometrics 148 (2009) 101–113.
[8] Y. Cao, H. Lin, T.Z. Wu, Y. Yu, Penalized spline estimation for functional coefficient regression models, Computational Statistics and Data Analysis 54 (2010) 891–905.
[9] M.B. Priestley, State-dependent models: a general approach to non-linear time series analysis, Journal of Time Series Analysis 1 (1) (1980) 57–71.
[10] C.W.J. Granger, A.P. Andersen, An Introduction to Bilinear Time Series Models, Vandenhoeck & Ruprecht, Göttingen, 1978.
[11] H. Tong, Threshold Models in Nonlinear Time Series Analysis, Springer-Verlag, New York, 1983.
[12] T. Ozaki, Non-linear time series models for non-linear random vibrations, Journal of Applied Probability 17 (1980) 84–93.
[13] V. Haggan, T. Ozaki, Modeling nonlinear random vibrations using an amplitude-dependent autoregressive time series model, Biometrika 68 (1981) 189–196.
[14] R. Chen, R.S. Tsay, Functional-coefficient autoregressive models, Journal of the American Statistical Association 88 (421) (1993) 298–308.
[15] A. Jain, A.M. Kumar, Hybrid neural network models for hydrologic time series forecasting, Applied Soft Computing 7 (2) (2007) 585–592.
[16] C. Lee, Y. Chiang, C. Shih, C. Tsai, Noisy time series prediction using M-estimator based robust radial basis function neural networks with growing and pruning techniques, Expert Systems with Applications 36 (3) (2009) 4717–4724.
[17] F.A. Guerra, L. dos S. Coelho, Multi-step ahead nonlinear identification of Lorenz's chaotic system using radial basis neural network with learning by clustering and particle swarm optimization, Chaos, Solitons & Fractals 35 (5) (2008) 967–979.
[18] L. dos S. Coelho, R.Z. Freire, G.H. dos Santos, N. Mendes, Identification of temperature and moisture content fields using a combined neural network and clustering method approach, International Communications in Heat and Mass Transfer 36 (4) (2009) 304–313.
[19] C. Lee, C. Ko, Time series prediction using RBF neural networks with a nonlinear time-varying evolution PSO algorithm, Neurocomputing 73 (1–3) (2009) 449–460.
[20] M. Gan, H. Peng, A hybrid approach for parameter optimization of RBF-AR model, in: Proceedings of the 49th IEEE Conference on Decision and Control (CDC 2010), Atlanta, GA, USA, December 2010, pp. 4423–4428.
[21] M. Gan, H. Peng, X. Peng, et al., A locally linear RBF network-based state-dependent AR model for nonlinear time series modeling, Information Sciences 180 (22) (2010) 4370–4383.
[22] P.P. Balestrassi, E. Popova, A.P. Paiva, J.W. Marangon Lima, Design of experiments on neural network's training for nonlinear time series forecasting, Neurocomputing 72 (4–6) (2009) 1160–1178.
[23] J. Vesin, An amplitude-dependent autoregressive signal model based on a radial basis function expansion, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 3 (1993) I129–I132.
[24] Z. Shi, Y. Tamura, T. Ozaki, Nonlinear time series modeling with the radial basis function-based state-dependent autoregressive model, International Journal of Systems Science 30 (1999) 717–727.
[25] T. Ozaki, J.C. Jimenez, H. Peng, V.H. Ozaki, The innovation approach to the identification of nonlinear causal models in time series analysis, in: D.R. Brillinger, E.A. Robinson, F.P. Schoenberg (Eds.), Time Series Analysis and Applications to Geophysical Systems, Springer, New York, 2004, pp. 195–226.
[26] H. Peng, T. Ozaki, V. Haggan-Ozaki, Y. Toyoda, A parameter optimization method for radial basis function type models, IEEE Transactions on Neural Networks 14 (2003) 432–438.
[27] H. Peng, T. Ozaki, Y. Toyoda, H. Shioya, et al., RBF-ARX model-based nonlinear system modeling and predictive control with application to a NOx decomposition process, Control Engineering Practice 12 (2004) 191–203.
[28] H. Peng, Z.J. Yang, W. Gui, M. Wu, et al., Nonlinear system modeling and robust predictive control based on RBF-ARX model, Engineering Applications of Artificial Intelligence 20 (2007) 1–9.
[29] H. Peng, K. Nakano, H. Shioya, Nonlinear predictive control using neural nets-based local linearization ARX model—stability and industrial application, IEEE Transactions on Control Systems Technology 15 (1) (2007) 130–143.
[30] H. Peng, J. Wu, G. Inoussa, Q. Deng, K. Nakano, Nonlinear system modeling and predictive control using the RBF nets-based quasi-linear ARX model, Control Engineering Practice 17 (2009) 59–66.
[31] V. Haggan-Ozaki, T. Ozaki, Y. Toyoda, An Akaike state-space controller for RBF-ARX models, IEEE Transactions on Control Systems Technology 17 (1) (2009) 191–198.
[32] G.P. Zhang, Time series forecasting using a hybrid ARIMA and neural network model, Neurocomputing 50 (2003) 159–175.
[33] Y. Kajitani, K.W. Hipel, A.I. McLeod, Forecasting nonlinear time series with feed-forward neural networks: a case study of Canadian lynx data, Journal of Forecasting 24 (2005) 105–117.
[34] C.H. Aladag, E. Egrioglu, C. Kadilar, Forecasting nonlinear time series with a hybrid methodology, Applied Mathematics Letters 22 (2009) 1467–1470.
[35] H.Z. An, F.C. Huang, The geometrical ergodicity of nonlinear autoregressive models, Statistica Sinica 6 (1996) 943–956.
[36] K.S. Chan, H. Tong, On the use of the deterministic Lyapunov function for the ergodicity of stochastic difference equations, Advances in Applied Probability 17 (1985) 666–678.
[37] R.L. Tweedie, Criteria for classifying general Markov chains, Advances in Applied Probability 8 (1976) 737–771.
[38] D. Tjøstheim, Non-linear time series and Markov chains, Advances in Applied Probability 22 (1990) 587–611.
[39] D. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, Journal of the Society for Industrial and Applied Mathematics 11 (1963) 431–441.
[40] S. Konishi, G. Kitagawa, Information Criteria and Statistical Modeling, Springer, Berlin, 2008.
[41] N.R. Swanson, H. White, A model selection approach to assessing the information in the term structure using linear models and artificial neural networks, Journal of Business and Economic Statistics 13 (3) (1995) 265–275.
[42] M.J.L. Orr, Introduction to radial basis function networks, 1996. Available: http://www.anc.ed.ac.uk/~mjo/rbf.html.
[43] P.D. McNelis, Neural Networks in Finance: Gaining Predictive Edge in the Market, Elsevier Academic Press, Burlington, MA, 2005.