Physica D 112 (1998) 344-360
Regularized local linear prediction of chaotic time series

D. Kugiumtzis*, O.C. Lingjærde, N. Christophersen

Department of Informatics, University of Oslo, PO Box 1080, Blindern, N-0316 Oslo, Norway

* Corresponding author.

Received 5 March 1997; received in revised form 23 June 1997; accepted 23 June 1997
Communicated by A.M. Albano
Abstract
Local linear prediction, based on the ordinary least squares (OLS) approach, is one of several methods that have been applied to prediction of chaotic time series. Apart from potential linearization errors, a drawback of this approach is the high variance of the predictions under certain conditions. Here, a different set of so-called linear regularization techniques, originally derived to solve ill-posed regression problems, are compared to OLS for chaotic time series corrupted by additive measurement noise. These methods reduce the variance compared to OLS, but introduce more bias. A main tool of analysis is the singular value decomposition (SVD), and a key to successful regularization is to damp the higher order SVD components. Several of the methods achieve improved prediction compared to OLS for synthetic noise-corrupted data from well-known chaotic systems. Similar results were found for real-world data from the R-R intervals of ECG signals. Good results are also obtained for real sunspot data, compared to published predictions using nonlinear techniques.

Keywords: Time series; Chaos; Local linear prediction; Regularization
1. Introduction
Nonlinear prediction of chaotic time series has been a popular subject over the last decade [1]. The term "chaotic time series" is applied here in a rather broad sense, covering observations with complex behavior derived from underlying nonlinear phenomena of a presumed deterministic nature. In this field, a variety of more or less complex prediction techniques have been used, including neural networks, projection pursuit regression and radial basis functions [2,3]. However, many methods give more or less the same quality of fit, as shown in the comparative study in [1]. Applying Occam's razor in this situation, local linear methods represent simple and attractive alternatives, which have not yet been explored to their full potential in time series prediction. The ordinary least squares (OLS) solution was applied to chaotic time series several years ago [4], but linear regression and other areas provide a rich set of so-called regularization techniques, developed to tackle ill-posed problems where the regression matrix is poorly conditioned. In prediction with noisy data, the OLS solution can have large variance, and the regularization methods, designed to be more robust against noise, may provide better results. Well-known regularization techniques include principal components regression, partial least squares, ridge regression [5], and truncated total least squares [6]. To our knowledge, a thorough assessment of regularization methods applied to local linear prediction has not previously been attempted. A simple form
of regularization was applied by Sauer [7], but in a rather ad hoc manner. In nonlinear noise reduction, some techniques include procedures similar to principal components regression, but the regularization property is not emphasized [8]. The linear regression models are presented and compared in Section 2 using the singular value decomposition (SVD). These techniques are studied as part of other ongoing work [9]. Quantitative results for the prediction errors are presently only available under simplifying assumptions, neglecting linearization errors and not taking effects of measurement noise fully into account. Simulation studies are presented in Section 3 using synthetic data from several well-known chaotic systems, and in Section 4 using the R-R intervals of human electrocardiogram (ECG) data. Real sunspot data are considered in Section 5, where the performance of the local linear techniques is compared to previously published results using nonlinear methods.
2. Local linear prediction and regularization

2.1. The local linear model
Suppose a scalar time series $x_i = x(i\tau_s)$, $i = 1, \ldots, N$, is given, where $\tau_s$ is the sampling time (for maps we have $\tau_s = 1$). Using the method of delays (MOD), the data are represented in $m$-dimensional space by the vectors
$$\mathbf{x}_i = [x_{i-(m-1)\tau}, x_{i-(m-2)\tau}, \ldots, x_i]^{\mathrm{T}}, \qquad i = 1 + (m-1)\tau, \ldots, N, \tag{1}$$
where $m$ is the embedding dimension, $\tau$ is the delay time in units of $\tau_s$ and the superscript T stands for the transpose [10,11]. Given a subset $\mathcal{T}$ (the training set) of the vectors in Eq. (1) and a target point $\tilde{\mathbf{x}}_t$ not included in $\mathcal{T}$, a local linear prediction of $x_{t+T}$ can be achieved as follows. Let $\tilde{\mathbf{x}}_{t(1)}, \ldots, \tilde{\mathbf{x}}_{t(k)}$ be the $k$ nearest neighbors in $\mathcal{T}$ to $\tilde{\mathbf{x}}_t$, ordered with respect to increasing distance from the target point. The local linear model is based on centered versions of the matrices $\tilde{X} = (\tilde{\mathbf{x}}_{t(1)}, \ldots, \tilde{\mathbf{x}}_{t(k)})^{\mathrm{T}} \in \mathbb{R}^{k \times m}$ and $\tilde{\mathbf{y}} = (x_{t(1)+T}, \ldots, x_{t(k)+T})^{\mathrm{T}} \in \mathbb{R}^{k}$. Let $\bar{\mathbf{x}}$ be the column vector of averages of the $m$ columns of $\tilde{X}$, $\bar{y}$ the average of the components of $\tilde{\mathbf{y}}$, and
$$X = \tilde{X} - \mathbf{1}\bar{\mathbf{x}}^{\mathrm{T}}, \qquad \mathbf{x}_t = \tilde{\mathbf{x}}_t - \bar{\mathbf{x}}, \qquad \mathbf{y} = \tilde{\mathbf{y}} - \bar{y}\mathbf{1}, \qquad y_t = x_{t+T} - \bar{y},$$
where $\mathbf{1}$ is a $k \times 1$ vector of ones, $\mathbf{x}_t$ is the centered target point and $y_t$ is the centered response. Scaling the columns to equal variance is also frequently done. However, in our case the columns in $X$ are already on the same scale, since they originate from the same time series. The following model is assumed:
$$\mathbf{y} = X\mathbf{b} + \mathbf{e}, \tag{2}$$
where $\mathbf{b}$ is a vector of $m$ unknowns and $\mathbf{e}$ is a random error with expectation $E\{\mathbf{e}\} = 0$ and covariance matrix $\operatorname{Var}\{\mathbf{e}\} = \sigma^2 I$, where $I$ is the $k \times k$ identity matrix. Usually one has $k > m$ in linear regression. It is convenient to retain this constraint in the simulations, although our theoretical treatment is valid also for $k < m$. If the observed time series is corrupted mainly by white measurement noise and the relationship between nearby points on the reconstructed attractor is well approximated by a linear map, there should be some merit in making predictions based on the model in Eq. (2). By definition of $X$ and $\mathbf{y}$, we should expect elements of the former to be corrupted by noise at the same level as elements of the latter. Thus the implicit assumption in Eq. (2) is that $\mathbf{y}$ is linearly related to the corrupted $X$ rather than the true $X^{\mathrm{true}}$. A more realistic model would be
$$\mathbf{y} = X^{\mathrm{true}}\mathbf{b} + \tilde{\mathbf{e}}, \qquad X = X^{\mathrm{true}} + \Xi. \tag{3}$$
Here $E\{\tilde{\mathbf{e}}\} = 0$, $\operatorname{Var}\{\tilde{\mathbf{e}}\} = \tilde{\sigma}^2 I$, and the columns of the $k \times m$ matrix $\Xi$ are uncorrelated and have the same distribution as $\tilde{\mathbf{e}}$. It is easily seen, however, that the model in Eq. (3) is of the form of Eq. (2) with $\sigma^2 = \tilde{\sigma}^2(1 + \|\mathbf{b}\|_2^2)$, where $\|\cdot\|_2$ is the $l_2$-norm. In this case, $X$ and $\mathbf{e}$ become correlated, which complicates a detailed theoretical analysis of the linear estimators [12]. To simplify further analysis, we use the model in Eq. (2) with the assumption that $X$ is fixed. However, it is our experience from simulations that moderate noise corruption of $X$ does not have any serious effect on the estimates.
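To make the construction above concrete, a minimal Python/numpy sketch of the local regression data is given below. This is our own illustration, not the authors' code; the function names (`delay_vectors`, `local_regression_data`) and the use of Euclidean distances for the neighbor search are assumptions.

```python
import numpy as np

def delay_vectors(x, m, tau):
    """Delay-coordinate vectors of Eq. (1); row i ends at sample i + (m-1)*tau."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[j * tau : j * tau + n] for j in range(m)])

def local_regression_data(V, x, t, T, k, m, tau):
    """Centered neighbor matrix X, response y and target x_t of Eq. (2),
    built from the k nearest neighbors of the target vector V[t]."""
    # candidates: vectors whose T-step future exists, excluding the target itself
    idx = np.array([i for i in range(len(V) - T) if i != t])
    d = np.linalg.norm(V[idx] - V[t], axis=1)      # distances to the target
    nb = idx[np.argsort(d)[:k]]                    # k nearest neighbors
    Xtil = V[nb]                                   # k x m matrix (uncentered)
    ytil = x[nb + (m - 1) * tau + T]               # their T-step-ahead values
    xbar, ybar = Xtil.mean(axis=0), ytil.mean()
    return Xtil - xbar, ytil - ybar, V[t] - xbar, ybar
```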
Assume in the following that the SVD of $X \in \mathbb{R}^{k \times m}$ is defined as
$$X = U \Sigma V^{\mathrm{T}}, \qquad U^{\mathrm{T}}U = I, \qquad V^{\mathrm{T}}V = I, \qquad \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r),$$
where $r \le \min(k, m)$ is the rank of $X$ and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ are the non-zero singular values of $X$. The columns of $U$ and $V$ span the $r$-dimensional range spaces $R(X) \subset \mathbb{R}^k$ and $R(X^{\mathrm{T}}) \subset \mathbb{R}^m$, respectively. The subspace $R(X)$ is often referred to as the signal space and its orthogonal complement, the null space $N(X^{\mathrm{T}})$, as the noise space [13].

2.2. Prediction estimators

Suppose $X \in \mathbb{R}^{k \times m}$, $\mathbf{y} \in \mathbb{R}^k$ and $\mathbf{x}_t \in \mathbb{R}^m$ are given and an estimate for $y_t$ is wanted. The prediction estimator is defined as $\hat{y}_t = \mathbf{x}_t^{\mathrm{T}} \hat{\mathbf{b}}$, where $\hat{\mathbf{b}}$ is an estimate for $\mathbf{b}$. We consider estimators for $\mathbf{b}$ of the following form [14]:
$$\hat{\mathbf{b}} = V \Sigma^{-1} \Lambda U^{\mathrm{T}} \mathbf{y} = \sum_{i=1}^{r} \frac{\lambda_i}{\sigma_i} (\mathbf{u}_i^{\mathrm{T}} \mathbf{y}) \mathbf{v}_i, \tag{4}$$
where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_r)$. This encompasses most of the well-known regularization estimators. Estimators of this form differ only in their choice of the diagonal elements of $\Lambda$, called filter factors. Typically $0 \le \lambda_i \le 1$, and each $\lambda_i$ then determines the extent of filtering or shrinking in the corresponding singular direction $\mathbf{u}_i$. Following the tradition in functional analysis, the inner products $\mathbf{u}_i^{\mathrm{T}} \mathbf{y}$ are denoted the Fourier coefficients for $\mathbf{y}$.

The centered target vector $\mathbf{x}_t$ may itself be corrupted by noise, i.e.
$$\mathbf{x}_t = \mathbf{x}_t^{\mathrm{true}} + \mathbf{e}_t, \qquad E\{\mathbf{e}_t\} = 0 \ \text{and} \ \operatorname{Var}\{\mathbf{e}_t\} = \sigma^2 I, \tag{5}$$
where $\mathbf{x}_t^{\mathrm{true}}$ represents the true, but unknown, centered target vector (here the identity matrix $I$ is $m \times m$). Assuming that the model in Eq. (2) holds for the true centered target vector, the true hypothetical centered response is $y_t^{\mathrm{true}} = (\mathbf{x}_t^{\mathrm{true}})^{\mathrm{T}} \mathbf{b}$. Ignoring random fluctuations in $X$ (i.e. assuming that $X$ is fixed), the mean-square error in predicting $y_t^{\mathrm{true}}$ with $\hat{y}_t$ is
$$\mathrm{MSE}(\hat{y}_t) = E\{y_t^{\mathrm{true}} - \hat{y}_t\}^2 = (E\{\hat{y}_t\} - y_t^{\mathrm{true}})^2 + E\{\hat{y}_t - E\{\hat{y}_t\}\}^2, \tag{6}$$
where $E$ denotes expectation with respect to errors in both $\mathbf{y}$ and the target vector in Eq. (5). The second identity in Eq. (6) decomposes the MSE into a squared bias term and a variance term. For estimators $\hat{\mathbf{b}}$ of the form in Eq. (4) where $\Lambda$ is independent of $\mathbf{y}$, we have
$$\mathrm{bias}(\hat{y}_t) = E\{\hat{y}_t\} - y_t^{\mathrm{true}} = \sum_{i=1}^{r} (\lambda_i - 1)(\mathbf{v}_i^{\mathrm{T}} \mathbf{x}_t^{\mathrm{true}})(\mathbf{v}_i^{\mathrm{T}} \mathbf{b}) \tag{7}$$
and
$$\operatorname{Var}(\hat{y}_t) = E\{\hat{y}_t - E\{\hat{y}_t\}\}^2 = \sigma^2 \sum_{i=1}^{r} \frac{\lambda_i^2}{\sigma_i^2} \left[ (\mathbf{v}_i^{\mathrm{T}} \mathbf{x}_t^{\mathrm{true}})^2 + \sigma_i^2 (\mathbf{v}_i^{\mathrm{T}} \mathbf{b})^2 + \sigma^2 \right]. \tag{8}$$
An immediate observation from Eq. (8) is that reduction of the filter factors $\lambda_i$ decreases the variance of the predictor, at the possible cost of introducing extra bias in Eq. (7). The optimal trade-off between bias and variance consists of finding filter factors such that the mean-square error in Eq. (6) is minimized. The bias and variance terms in Eqs. (7) and (8) do not take into account random fluctuations in $X$. Typically $\hat{y}_t$ will depend on $X$ in a complicated manner, making derivations of bias and variance expressions considerably more involved. With multinormal errors in $X$, an exact formula for the MSE of the OLS estimator and asymptotic formulas for most of the other predictors considered below are given e.g. in [15]. In our application, linearization errors will also contribute to the prediction uncertainties, but these cannot be quantified in practice.
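The estimator family of Eq. (4), together with the variance term of Eq. (8), can be sketched compactly in numpy. The code below is a minimal illustration under the fixed-$X$ assumption; the rank tolerance and function names are ours, and the PLS and TTLS filter factors (which have no simple closed form here) are omitted.

```python
import numpy as np

def filtered_prediction(X, y, x_t, lam):
    """y_hat = x_t^T b_hat with b_hat from Eq. (4) for given filter factors lam."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > 1e-12 * s[0]))            # numerical rank
    U, s, Vt, lam = U[:, :r], s[:r], Vt[:r], lam[:r]
    b = Vt.T @ (lam / s * (U.T @ y))             # Eq. (4)
    return x_t @ b, b

# Filter factors of the estimators discussed below:
def lam_ols(s):     return np.ones_like(s)                  # OLS: all lambda_i = 1
def lam_pcr(s, q):  return (np.arange(len(s)) < q) * 1.0    # PCR: keep first q
def lam_rr(s, mu):  return s**2 / (s**2 + mu)               # ridge regression

def prediction_variance(X, x_t, b, lam, sigma2):
    """Variance term of Eq. (8), with the noisy x_t standing in for x_t^true."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > 1e-12 * s[0]))
    s, Vt, lam = s[:r], Vt[:r], lam[:r]
    return sigma2 * np.sum((lam**2 / s**2) *
                           ((Vt @ x_t)**2 + s**2 * (Vt @ b)**2 + sigma2))
```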
2.3. Ordinary least squares regression

The ordinary least squares (OLS) regression estimate $\hat{\mathbf{b}}_{\mathrm{OLS}}$ is the solution with least $l_2$-norm of the quadratic minimization problem
$$\min_{\mathbf{b}} \|\mathbf{y} - X\mathbf{b}\|_2$$
and is easily shown to be
$$\hat{\mathbf{b}}_{\mathrm{OLS}} = X^{+}\mathbf{y} = V \Sigma^{-1} U^{\mathrm{T}} \mathbf{y} = \sum_{i=1}^{r} \frac{1}{\sigma_i} (\mathbf{u}_i^{\mathrm{T}} \mathbf{y}) \mathbf{v}_i, \tag{9}$$
where $X^{+}$ is the Moore-Penrose generalized inverse of $X$. Note that only the projection of $\mathbf{y}$ into $R(X)$ (spanned by the columns of $U$) contributes to $\hat{\mathbf{b}}_{\mathrm{OLS}}$. The filter factors for the OLS estimator are simply $\lambda_i = 1$, $i = 1, \ldots, r$ (compare Eq. (9) and Eq. (4)). Without errors in $X$, it follows immediately from Eq. (7) that the OLS estimator is unbiased. Including such errors, OLS is no longer unbiased and yields estimates where the degree of shrinkage generally increases with the noise level [12].

2.4. Regularization techniques

From Eq. (8) we observe that the prediction variance can be large due to several factors, one being small singular values of the design matrix $X$. This motivates consideration of regularized estimators that trade bias for variance. Regularization can be achieved by downweighting some or all of the singular directions by choosing filter factors less than 1 in Eq. (4), as is seen from the variance expression in Eq. (8).

One class of estimators (the so-called subspace methods) makes use of a projection $P$ of $\mathbf{y}$ into a subspace of $R(X)$. The corresponding estimator for $\mathbf{b}$ is then defined as $\hat{\mathbf{b}} = X^{+} P \mathbf{y}$. One such method is principal components regression (PCR), which uses for $P$ a projection $P_{\mathrm{PCR}}$ into the subspace spanned by the first $q < r$ singular directions, i.e. the first $q$ columns in $U$. For this method, $\lambda_i = 1$, $i = 1, \ldots, q$, while $\lambda_i = 0$, $i = q+1, \ldots, r$. The underlying qualitative assumption behind this method is that the projections of $\mathbf{y}$ onto the last $r - q$ columns of $U$ are below the noise level and therefore give little or no information about the true $\mathbf{y}$ and $\mathbf{b}$. In other words, $X$ and $\mathbf{y}$ are assumed compatible, in the sense that $\mathbf{y}$ should have its major component in the subspace spanned by the first and (relative to $X$) most representative columns of $U$.

Partial least squares regression (PLS) [16,17] uses instead a projection into the subspace spanned by the Krylov vectors
$$XX^{\mathrm{T}}\mathbf{y}, \ (XX^{\mathrm{T}})^2 \mathbf{y}, \ \ldots, \ (XX^{\mathrm{T}})^q \mathbf{y}, \tag{10}$$
where the vectors in Eq. (10) are assumed to be linearly independent. Letting $K$ denote the $k \times q$ matrix with the vectors in Eq. (10) as columns, the PLS estimator for $\mathbf{b}$ can be written as
$$\hat{\mathbf{b}}_{\mathrm{PLS}} = X^{+} K (K^{\mathrm{T}} K)^{-1} K^{\mathrm{T}} \mathbf{y}.$$
Qualitatively, the PLS estimator shrinks the OLS estimate (in the sense that $\|\hat{\mathbf{b}}_{\mathrm{PLS}}\|_2 < \|\hat{\mathbf{b}}_{\mathrm{OLS}}\|_2$) by taking into account not only the size of the singular values, but also the size of the Fourier coefficients [18]. The expressions for the PLS filter factors are rather involved and can be found in [9].

Another estimator is the ridge regression (RR) estimator [19], which is defined as the solution of the penalized minimization problem
$$\min_{\mathbf{b}} \|\mathbf{y} - X\mathbf{b}\|_2^2 + \mu \|\mathbf{b}\|_2^2, \tag{11}$$
where $\mu$ is an additional parameter which could be estimated from the data. The solution of Eq. (11) is easily shown to be
$$\hat{\mathbf{b}}_{\mathrm{RR}} = (X^{\mathrm{T}} X + \mu I)^{-1} X^{\mathrm{T}} \mathbf{y}. \tag{12}$$
The inverse in Eq. (12) always exists provided $\mu > 0$. The filter factors for the RR estimator are given by $\lambda_i = \sigma_i^2 / (\sigma_i^2 + \mu)$, showing that the ridge regression estimator shrinks the OLS estimator in every direction when $\mu > 0$.

Truncated total least squares (TTLS) [20] is designed to improve the estimates for $\mathbf{b}$ when $X$ is also contaminated by errors and is therefore a natural candidate for prediction of noisy time series. Note that $\hat{\mathbf{b}}_{\mathrm{OLS}}$ solves $X\mathbf{b} = \hat{\mathbf{y}}$, where $\hat{\mathbf{y}}$ is the smallest possible perturbation of $\mathbf{y}$ (in the sense that $\|\mathbf{y} - \hat{\mathbf{y}}\|_2$ is minimum) such that $\hat{\mathbf{y}} \in R(X)$. Perturbing $\mathbf{y}$ may be seen as an attempt to correct for the noise present in the observation vector, since $\hat{\mathbf{y}}$ is the projection of $\mathbf{y}$ into the signal space $R(X)$ [6].
When $X$ too is contaminated by noise, a more natural approach might be to determine $\mathbf{b}$ from
$$\hat{X} \mathbf{b} = \hat{\mathbf{y}}, \tag{13}$$
where $[\hat{X} \ \hat{\mathbf{y}}]$ is the smallest perturbation of $[X \ \mathbf{y}]$ such that Eq. (13) becomes solvable, i.e. $[\hat{X} \ \hat{\mathbf{y}}]$ is a solution of
$$\text{minimize} \ \|[X \ \mathbf{y}] - [\hat{X} \ \hat{\mathbf{y}}]\|_{\mathrm{F}} \quad \text{such that} \ \hat{\mathbf{y}} \in R(\hat{X}), \tag{14}$$
where $\|\cdot\|_{\mathrm{F}}$ is the Frobenius norm. Any $\mathbf{b}$ satisfying $\hat{X}\mathbf{b} = \hat{\mathbf{y}}$ is called a total least squares (TLS) solution. The TTLS estimator is found by solving Eq. (14) under the constraint that $\operatorname{rank}(\hat{X}) = q$, for a given $q \le r$. The particular choice $q = r$ yields the TLS estimator. Let $[X \ \mathbf{y}]$ have the SVD
$$[X \ \mathbf{y}] = \tilde{U} \tilde{\Sigma} \tilde{V}^{\mathrm{T}}, \qquad \tilde{\Sigma} = \operatorname{diag}(\tilde{\sigma}_1, \ldots, \tilde{\sigma}_{r+1}), \tag{15}$$
where the singular values $\tilde{\sigma}_i$ are ordered decreasingly ($\tilde{\sigma}_{r+1} > 0$ when $\mathbf{y} \notin R(X)$). Suppose $\tilde{V}$ in Eq. (15) is partitioned as
$$\tilde{V} = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix},$$
where $V_{11} \in \mathbb{R}^{m \times q}$, $V_{12} \in \mathbb{R}^{m \times (r-q+1)}$, $V_{21} \in \mathbb{R}^{1 \times q}$, and $V_{22} \in \mathbb{R}^{1 \times (r-q+1)}$. Then the TTLS solution exists provided $V_{22} \ne 0$, and is given by [6]
$$\hat{\mathbf{b}}_{\mathrm{TTLS}} = -V_{12} V_{22}^{+}.$$
The filter factors $\lambda_i$, $i = 1, \ldots, r$, for TTLS satisfy [14]
$$1 \le \lambda_i \le 1 + \frac{\tilde{\sigma}_{q+1}^2}{\sigma_i^2} + O\!\left(\frac{\tilde{\sigma}_{q+1}^4}{\sigma_i^4}\right) \ \text{for} \ i \le q, \qquad 0 \le \lambda_i \le \frac{\tilde{\sigma}_{q+1}^2}{\|V_{22}\|_2^2 \, \sigma_i^2}\left(1 + O\!\left(\frac{\tilde{\sigma}_{q+1}^2}{\sigma_i^2}\right)\right) \ \text{for} \ q+1 \le i \le r, \tag{16}$$
provided that $\|V_{22}\|_2$ is not too small. Filter factors above 1 will tend to correct for the shrinkage implicit in the OLS estimate when $X$ is noise-corrupted.
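A minimal numpy sketch of the TTLS solution, following the partitioning of Eq. (15), might look as follows. This is our illustration, not the authors' implementation, and the tolerance used to detect a vanishing $V_{22}$ is an assumption.

```python
import numpy as np

def ttls_solution(X, y, q):
    """Truncated TLS estimate b_TTLS = -V12 V22^+ from the SVD of [X y]."""
    m = X.shape[1]
    _, _, Vt = np.linalg.svd(np.column_stack([X, y]), full_matrices=False)
    V = Vt.T                         # columns of V are right singular vectors
    V12 = V[:m, q:]                  # top-right block, m x (r - q + 1)
    V22 = V[m:, q:]                  # bottom-right block, 1 x (r - q + 1)
    if np.linalg.norm(V22) < 1e-12:  # TTLS solution requires V22 != 0
        raise ValueError("TTLS solution does not exist (V22 is zero)")
    return (-V12 @ np.linalg.pinv(V22)).ravel()
```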
2.5. Selection of regularization parameters

In selecting the parameter $q$ in PCR, PLS or TTLS, or the parameter $\mu$ in RR, we aim at the best possible trade-off between bias and variance. Several model selection criteria exist for linear estimators [21]. One of the most popular is cross-validation (CV), but we found that in the implementation on chaotic time series CV often overestimated the regularization parameters. A simpler approach is to use the same value for the regularization parameter in all local predictions for a given time series. For chaotic time series, a reasonable choice for a fixed $q$ is the topological dimension of the underlying attractor, if this can somehow be estimated, e.g. from a dimension estimation method [22]. When the complexity of the underlying system varies locally, it may be more appropriate to let the regularization parameter vary with the target point. If there is a clear gap in the singular value spectrum (this is seldom the case in applications of the type considered in this paper), $q$ can be chosen as the natural cut-off level in the spectrum. For PCR and TTLS, $q$ is frequently determined by the singular spectrum alone, either by finding a threshold value that represents the noise variance, or by requiring that the included singular values account for at least a specified proportion of the total data variation in $X$. It was found that the former method gave $q$ close to $m$ and the latter gave values of $q$ increasing with $m$.

A different approach would be to use RR, and estimate $\mu$ directly from the data involved in each local least squares problem. In an automatic determination of the amount of shrinkage, the modeling and measurement errors (residuals) should be taken into account. The residual variance can be estimated from the projection of $\mathbf{y}$ into the noise space $N(X^{\mathrm{T}})$. A basis for $N(X^{\mathrm{T}})$ is conveniently given by the last $k - m$ column vectors of $\bar{U} \in \mathbb{R}^{k \times k}$ provided by the expanded SVD of $X$ ($X = \bar{U} \bar{\Sigma} V^{\mathrm{T}}$, $\bar{\Sigma} \in \mathbb{R}^{k \times r}$) [23].
Fig. 1. (a) The singular values $\sigma_i$, the magnitude of the Fourier coefficients $|\mathbf{u}_i^{\mathrm{T}}\mathbf{y}|$, and the level of the residual standard error $\hat{\sigma}$ are shown for a local prediction problem using data from the Ikeda map [29] ($k = 15$, $m = 6$). (b) The filter factors for the methods discussed in the text (see legend) are shown for the same problem as in (a). For RR, $\mu = \hat{\sigma}^2$ and for PCR, PLS and TTLS, $q = 3$.
The estimate for the residual variance is
$$\hat{\sigma}^2 = \frac{1}{k - m} \sum_{i=m+1}^{k} (\bar{\mathbf{u}}_i^{\mathrm{T}} \mathbf{y})^2. \tag{17}$$
The larger the residuals are, the larger the prediction variance becomes, and hence the stronger the regularization should be. To adjust the regularization in this way, we propose to use filter factors determined by a sigmoid type of function regulated by $\hat{\sigma}^2$, e.g.
$$\lambda_i = f_\beta(z_i) = \frac{1}{1 + \exp(-\beta z_i)}, \qquad z_i = \ln(\sigma_i^2 / \hat{\sigma}^2), \tag{18}$$
where the parameter $\beta$ determines the slope of the sigmoid, i.e. the length of the intermediate phase. For $\sigma_i^2 \gg \hat{\sigma}^2$, $\lambda_i \approx 1$, while $\sigma_i^2 \ll \hat{\sigma}^2$ gives $\lambda_i \approx 0$, as desired. Note that for $\beta = 1$, one obtains the RR solution with $\mu = \hat{\sigma}^2$. The advantage of this method for our purposes is that the regularization is determined automatically from the data in a less subjective manner (from Eqs. (17) and (18)). This approach will be used with $\beta = 1$ when applying RR.

The filter factors for the regularization techniques we study are shown for an example from a noisy chaotic time series in Fig. 1. Here, the residual standard error $\hat{\sigma}$ is small compared to the singular values $\sigma_i$ and the magnitude of the Fourier coefficients $|\mathbf{u}_i^{\mathrm{T}} \mathbf{y}|$
(see Fig. 1(a)). For both PLS and TTLS, some of the filter factors $\lambda_i$ are well above one; for TTLS, $\lambda_q$ in particular may be very large, in which case the solution becomes unstable (see Eq. (16)). (For PLS and TTLS the filter factors were computed with the Matlab routines lsqr and fil_fac in the Regularization Tools package [14].)
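The automatic scheme of Eqs. (17) and (18) is easy to state in code. The sketch below assumes $k > m$ so that the noise space is non-empty; the function name is ours.

```python
import numpy as np

def sigmoid_filter_factors(X, y, beta=1.0):
    """Filter factors of Eq. (18), driven by the residual variance of Eq. (17)."""
    k, m = X.shape                              # requires k > m
    Ubar, s, _ = np.linalg.svd(X)               # expanded SVD: Ubar is k x k
    sigma2_hat = np.sum((Ubar[:, m:].T @ y)**2) / (k - m)   # Eq. (17)
    z = np.log(s**2 / sigma2_hat)
    return 1.0 / (1.0 + np.exp(-beta * z))      # beta = 1 recovers RR with mu = sigma2_hat
```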
3. Local linear regression applied to chaotic data

For noise-free data embedded in a state space of sufficiently large dimension $m$, e.g. $m \ge 2\lceil d \rceil + 1$, the attractor often has locally little variance in some directions, in which case the data matrix $X$ becomes ill-conditioned. (Here $d$ is the fractal dimension of the chaotic attractor and $\lceil d \rceil$ is the smallest integer larger than $d$.) For example, when $X$ is derived from a time series generated by the chaotic Hénon map [24] ($d \approx 1.21$) with $k = 15$ and $m = 5$, the condition number $\sigma_1/\sigma_5$ is typically of the order of $10^4$. The reason for this is that locally the chaotic attractor is mainly confined to some subspace of $\mathbb{R}^m$. However, small variations outside this subspace may still contain valuable information.
Fig. 2. Prediction for different $m$ with OLS and PCR ($q = 2$) for 2000 samples generated from the Hénon map. The test set comprises the last 500 points. The one-step prediction error is measured with the NRMSE (see text) for increasing embedding dimension $m$. The four curves correspond to OLS and PCR with $k = 8$ and $k = 15$, as shown in the legend. In (a) the data are noise-free and in (b) normal white noise is added to the noise-free data; the standard deviation of the noise is 5% of the standard deviation of the data.
As long as the condition number is not extreme, so that numerical problems are encountered, the OLS solution could still be the best. The situation becomes different when observation noise is introduced. Although this will tend to make $X$ better conditioned [25] (the condition number is of the order 1 when 5% white noise is added to the Hénon data), the prediction capability of OLS deteriorates.
3.1. Illustrating regularization

To illustrate the potential of regularization, Fig. 2 shows results for both noise-free and noisy data from the Hénon map. One-step prediction with OLS and PCR is applied for neighborhoods of different sizes ($k = 8$ and $k = 15$) over a range of embedding dimensions ($m = 2, \ldots, 7$). The value $q = 2$ of the regularization parameter corresponds to rather strong regularization and matches the topological dimension of the Hénon attractor. The $T$-step prediction error is measured with the normalized root mean square error
$$\mathrm{NRMSE} = \sqrt{ \frac{ \dfrac{1}{N - T - n} \sum_{t=n+1}^{N-T} (x_{t+T} - \hat{x}_{t+T})^2 }{ \dfrac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 } },$$
where $\bar{x}$ is the sample mean of the data and $x_{t+T}$, $t = n+1, \ldots, N-T$, are the response values of the test set ($T = 1$ in Fig. 2). A value of NRMSE near 1 means that the prediction is no better than the mean value prediction. Note the difference in scale of the NRMSE in Figs. 2(a) and (b). For noise-free data, OLS predicts better than PCR, and equally well for the two values of $k$, indicating that the linear approximation for the Hénon data is good over a range of sizes of the local regions. For noisy data, the OLS solutions are more unstable and deteriorate as $m$ increases. For $m$ close to $k$, the error can be very large. For example, with $k = 8$ and $m = 6$, the NRMSE is 0.5, while for $m = 7 = k - 1$ the error is 20. The NRMSE for OLS shown in Fig. 2(b) is less for $k = 15$ than for $k = 8$, since in the former case $m$ is small compared to $k$. It has been observed that for larger values of $m$, close to $k = 15$, the NRMSE increases rapidly for OLS. The PCR solutions are consistently better and more stable over the range of $k$ and $m$ values. Note that the best predictions are obtained for $m \approx 4$-$5$, which is consistent with the criterion $m > 2d + 1$ [26]. When $d$, or at least $\lceil d \rceil$, is known, one could imagine that PCR with $q = \lceil d \rceil$ would be reasonable. However, as we will see below, this is not always correct, so the selection of $q$ is basically an open problem.
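As a check on the definition, the NRMSE can be computed along the following lines (a sketch under our indexing conventions, with `x_pred[j]` the prediction of `x[n + T + j]` in 0-based indexing):

```python
import numpy as np

def nrmse(x, x_pred, n, T):
    """Normalized root mean square error of T-step predictions on the test set."""
    x = np.asarray(x, dtype=float)
    resid = x[n + T:] - np.asarray(x_pred, dtype=float)  # x_{t+T} - xhat_{t+T}
    return np.sqrt(np.mean(resid**2) / np.var(x))        # np.var uses the 1/N convention
```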
Fig. 3. Prediction with different regularization techniques for data generated from the Lorenz system and corrupted with 5% noise. Here, $N = 2000$, $n = 1500$ and $k = 15$. In (a) the NRMSE as a function of $m$ is shown for one-step prediction. In (b) the NRMSE as a function of $T$ is shown for $m = 4$ and in (c) for $m = 7$. OLS, PCR, PLS, RR and TTLS are shown according to the legend. For PCR, PLS and TTLS $q = 2$ and for RR $\mu = \hat{\sigma}^2$.
3.2. Prediction with different regularization methods
Here a more systematic comparison of the regularization methods will be performed, the issue being how to select appropriate techniques with suitable parameter values. First, one-step and direct multi-step prediction are considered. Direct and iterative prediction are then compared, and an estimate of the prediction variance in Eq. (8) is applied to a time series.

3.2.1. One-step and direct multi-step prediction
We consider first the typical situation observed with most of the processes studied (e.g. Hénon, Lorenz [27] and Mackey-Glass with delay times 30 and 100 units [28]). In Fig. 3, the results are shown for data generated from the Lorenz system and corrupted with 5% noise. The regularization parameters for PCR, PLS and TTLS are all set to $q = 2$, because the Lorenz attractor is well approximated locally by a two-dimensional plane. For RR, we use $\mu = \hat{\sigma}^2$ (see Eq. (17)).
Fig. 4. Prediction with different regularization techniques for data generated from the Ikeda map and corrupted with 5% noise. Here, $N = 2000$, $n = 1500$ and $k = 15$. In (a) the NRMSE as a function of $m$ is shown for one-step prediction. In (b) the NRMSE as a function of the prediction time $T$ is shown for $m = 4$ and in (c) for $m = 7$. OLS, PCR, PLS, RR and TTLS are shown according to the legend. For PCR, PLS and TTLS $q = 3$ and for RR $\mu = \hat{\sigma}^2$.
For one-step prediction (Fig. 3(a)), all regularization methods are slightly better than OLS. For multi-step prediction (Figs. 3(b) and (c)), TTLS is inappropriate, while the other regularization methods perform similarly and always better than OLS. Moreover, their advantage over OLS becomes more pronounced with larger values of $m$.

The Ikeda map [29] illustrates a more complicated situation. The topological dimension $\lceil d \rceil$ of this attractor is 2, but 4 has been suggested as the minimum $m$ for successful state space reconstruction [30]. This implies that the choice $q = \lceil d \rceil$ may not be suitable here, which is also confirmed by our simulations. Results are shown in Fig. 4. A milder regularization with $q = 3$ is used for PCR, PLS and TTLS, while for RR $\mu = \hat{\sigma}^2$ as before. Considering one-step prediction in Fig. 4(a), TTLS performs rather poorly for all embedding dimensions, while the other regularization methods behave similarly to OLS.
Fig. 5. Iterative multi-step prediction with different regularization techniques for data generated from the Lorenz system and corrupted with 5% noise ($N = 2000$, $n = 1500$ and $k = 15$). In (a) the NRMSE as a function of $T$ is shown for $m = 4$, in (b) for $m = 7$ and in (c) for $m = 10$. OLS, PCR, PLS and RR are shown according to the legend. For PCR and PLS $q = 2$ and for RR $\mu = \hat{\sigma}^2$.
For direct multi-step prediction with $m = 4$, which represents the best choice of $m$, the results are rather similar, as expected, except for TTLS (see Fig. 4(b)). Increasing the embedding dimension to $m = 7$, the regularized solutions are clearly preferable (see Fig. 4(c)). All results in Figs. 3 and 4 for PCR, PLS and TTLS are with fixed regularization parameters. CV has also been tried, and it turns out that the variation of the parameter for a given regularization method is large. No improvements over the results in Figs. 3 and 4 were obtained using CV.
3.2.2. Iterative multi-step prediction

Discrepancies between OLS and the regularization techniques are small for $T = 1$ and grow with $T$ when direct multi-step prediction is employed. When iterative multi-step prediction is used instead, the discrepancies seem to grow more slowly. In Fig. 5, iterative multi-step predictions with different regularization techniques are shown for the Lorenz data for three different values of $m$. The iterative $T$-step prediction involves $T$ successive one-step predictions, where at each step the previously predicted values are used as coordinates for the new target point.
Fig. 6. Uncertainty of the predictions with OLS and PCR ($q = 2$) for data generated from the Lorenz system and corrupted with 5% noise. In (a) one-step predictions are made using $k = 15$ and $m = 7$ on a test set of 30 samples, shown with circles, based on a training set of 1500 samples. The predictions with OLS and PCR are shown with black and gray lines, respectively, and the confidence intervals (see text) are shown with black and gray stippled lines, respectively. In (b) the same is shown for direct three-step predictions.

Comparing the iterative predictions for $m = 4$ and $m = 7$ in Figs. 5(a) and (b) with the corresponding direct predictions in Figs. 3(b) and (c), respectively, there is less improvement with regularization using the iterative scheme. However, as $m$ increases further, the differences become more pronounced (see Fig. 5(c)). Here, PCR and PLS clearly perform better than RR. Note that with larger $m$, the predictability of the regularized models farther into the future is enhanced, whereas for OLS the opposite is true (see for example Figs. 5(b) and (c)). This is observed in general with data from continuous systems. In local prediction with OLS, the iterative scheme has been reported to be superior to the direct scheme [31,32]. However, our study using different systems shows that this is not always the case (e.g. compare the results in Figs. 3(b) and 5(a) for $T = 10$).
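For reference, the iterative scheme can be sketched as a loop of one-step predictions, reusing the helper functions from the earlier sketches (`delay_vectors`, `local_regression_data`, `filtered_prediction`); all names and the choice of filter-factor function `lam_fn` are our own illustration, not the authors' code.

```python
import numpy as np

def iterative_prediction(x_train, m, tau, k, T, lam_fn):
    """Iterative T-step prediction: T chained one-step predictions, each feeding
    its predicted value back into the delay vector of the next target point."""
    x = np.asarray(x_train, dtype=float)
    for _ in range(T):
        V = delay_vectors(x, m, tau)
        t = len(V) - 1                           # last available delay vector
        X, y, xt, ybar = local_regression_data(V, x, t, 1, k, m, tau)
        s = np.linalg.svd(X, compute_uv=False)
        y_hat, _ = filtered_prediction(X, y, xt, lam_fn(s))
        x = np.append(x, y_hat + ybar)           # feed the prediction back in
    return x[len(x_train):]                      # the T predicted values
```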
3.2.3. Prediction confidence bands

The theoretical expression for the prediction variance was given in Eq. (8). This expression may not be very reliable, because the uncertainty in the predictor matrix $X$ and the linearization errors are not considered, but at least it can illustrate differences in the prediction variance of the models. Estimated prediction variance is shown in Fig. 6 for the Lorenz data using OLS and PCR ($q = 2$). In the computation of $\operatorname{Var}\{\hat{x}_{t+T}\}$ from Eq. (8) (where $\operatorname{Var}\{\hat{x}_{t+T}\} = \operatorname{Var}\{\hat{y}_t + \bar{y}\} = \operatorname{Var}\{\hat{y}_t\}$), $\mathbf{x}_t^{\mathrm{true}}$ is replaced by the noisy $\mathbf{x}_t$, $\mathbf{b}$ by the OLS and PCR estimates, and $\sigma$ by 5% of the standard deviation of the data. The prediction estimates $\hat{x}_{t+T}$ for $T = 1$ and $T = 3$ are shown together with the confidence interval defined as $\hat{x}_{t+T} \pm 2\sqrt{\operatorname{Var}\{\hat{x}_{t+T}\}}$. Regularization decreases the variance significantly. However, the variances of the estimates both for OLS and PCR grow large with the prediction time $T$, especially for regions on the attractor where the predictions are bad. Note that the prediction variance may be estimated a priori, i.e. before the prediction is made, in order to anticipate the quality of the forthcoming prediction.

4. Prediction of the R-R intervals of ECG data

Here, we demonstrate the improved performance of the regularized methods over OLS for the R-R intervals of human ECG data. The R-R intervals measure the length of the cardiac cycles from the interval between two successive ventricular beats [33].
Fig. 7. R-R intervals obtained from the time between successive beats of an ECG signal. The length of this time series is $N = 1938$.
In some papers, it has been suggested that heart rhythms may show features of deterministic chaos [34], while in others no evidence for chaos was found [35]. We refrain from discussing this topic here and confine ourselves to the performance of the local linear methods on this type of data. The data set used here, shown in Fig. 7, is obtained from an ECG signal from an arrhythmia database [36] (selected at random from an inpatient) and constitutes a relatively long and stationary time series that seems rather irregular and unpredictable. The last 500 samples are used for testing. The NRMSE of the iterative predictions with OLS, PCR, PLS and RR is shown in Fig. 8. The parameter setup and the illustration of the results are as in Fig. 4. The three regularization techniques outperform OLS for one-step prediction when $m$ increases (see Fig. 8(a)). Multi-step predictions with moderately large $m$ are not possible with OLS, while, on the contrary, the predictions with the regularization techniques improve slightly as $m$ gets larger (compare Figs. 8(b) and (c)). Other simulations (not shown here) showed that PCR and PLS predictions with $q = 1$ and $q = 2$ were slightly worse than for $q = 3$ (the value used in Fig. 8), but still significantly better than OLS. Simulations with larger $k$ ($k = 30$) gave less dramatic differences between OLS and the regularization techniques. In all cases, PCR gave the best results (see, e.g., Fig. 8).
In conclusion, we report that predictions with a global linear autoregressive (AR) model were almost as good as PCR and significantly better than OLS for larger $m$ ($m > 4$). In the prediction of normal heart rhythms in [37], it was found using $m = 7$ that multi-step predictions with AR were slightly worse than those of the local simplex method [38], which agrees with our findings using the regularization techniques. We can therefore conclude that OLS is the least appropriate local linear model for predicting heart rhythms, and that regularization of OLS is required for this type of data. We found qualitative results similar to those shown in Fig. 8 for other real-world data, such as the southern oscillation index (SOI) record [39] and the blowfly data [40].
5. Prediction of sunspot data

As a preliminary application of these methods to the prediction of real data, we chose the annual mean sunspot data (Wolf numbers), a benchmark data set in time series prediction. This time series extends from 1700 to the present. The nature of the underlying process is still under investigation [41-43] and many different prediction models have been applied to these data. Threshold autoregressive (TAR) models with $m = 12$, corresponding to a cycle of 11-12 years, were found to give good predictions [21,44]. Later studies with other nonlinear models, such as neural networks (NN) [45], multivariate adaptive regression splines (MARS) [46], and local linear prediction with OLS [47,48], reported equally good or better results. We adopt here the prediction error measure used in most of the previous studies, referred to as average relative variance (ARV), which is the square of the NRMSE. For comparison purposes, we use the prediction setup in [45]. There, as in many previous studies, the series was split in two intervals, 1700-1920 for training and 1921-1955 for testing. Iterative multi-step prediction was employed. We chose $k = 30$ in order to be consistent with [49], where this was shown to be the optimal $k$ for OLS. This is a rather large value (comprising 15% of the training set), but our results were not very sensitive to the choice of $k$.
Fig. 8. Prediction with OLS, PCR, PLS and RR of the time series of the R-R intervals shown in Fig. 7. For PCR and PLS $q = 3$ and for RR $\mu = \hat{\sigma}^2$. The last 500 samples are used for testing and $k = 15$. In (a) the NRMSE as a function of $m$ is shown for one-step prediction. In (b) the NRMSE as a function of the prediction time $T$ is shown for $m = 4$ and in (c) for $m = 7$. OLS, PCR, PLS and RR are shown according to the legend.
Most regularization techniques gave an ARV for $T = 1$ around 0.1, with best results for small $m$. For example, PCR with $q = 3$ and $m = 4$ gave ARV = 0.078, to be compared with ARV = 0.097 for TAR and ARV = 0.086 for the NN in [45]. For one-step prediction, a small $m$ (from 3 to 5) gives the best predictions (regardless of the method applied), while for multi-step prediction a larger $m$ (up to 7-8) gives the best results; for $m > 8$ both one- and multi-step predictions become worse. Simulations with different methods and various training and test sets showed that PCR with $m = 7$ and $q = 3$ gave generally good multi-step predictions. The results for OLS ($m = 7$) and PCR ($q = 3$, $m = 7$), as well as for TAR and the weight-eliminated NN with $m = 12$ used in [45], are shown in Fig. 9(a).²
² The ARV values for the NN model were copied directly from Fig. 5 in [45].
Fig. 9. (a) ARV for iterative multi-step prediction with different models. The test set is the period 1921-1955. The first prediction year of the test set increases with T, e.g. for T = 1 the first prediction starts in 1921 and for T = 23 in 1944. Thus the number of predictions in the test set decreases with T. The models used in each plot are shown in the legend. (b) The genuine out-of-sample iterative prediction of the sunspot numbers up to the year 2017 based on data up to 1995 using the PCR model with q = 3 and m = 7. The bars show two times the standard deviation of the point estimate for each step.
For OLS, the choice $m = 7$ was the best for the chosen training and test set. The PCR model was not optimized for this particular setup. As shown in Fig. 9(a), PCR is at least as good as NN (slightly worse than NN for $T < 9$ and slightly better for $T > 9$), certainly better than OLS, and superior to TAR for all but $T = 1$. The consistently good predictions with $q = 3$ support the hypothesis of a low-dimensional system with $\lceil d \rceil = 3$, which is close to the estimated dimension $\lceil d \rceil = 4$ in [50]. This also seems intuitively correct, as the oscillatory behavior of the sunspot series is reminiscent of well-studied systems of dimension $2 < d < 3$, e.g. the 2- and 3-torus and the Lorenz and Rössler systems [51], etc. We used PCR with $q = 3$ and $m = 7$ to predict the two forthcoming sunspot cycles based on annual data up to 1995. The point estimates are shown in Fig. 9(b) with bars representing the estimated confidence band for the one-step prediction (conditioned on the last predicted values), $\hat{x}_{t+1} \pm 2\sqrt{\operatorname{Var}(\hat{x}_{t+1})}$ (see Eq. (8)), where $\sigma$ was estimated here with $\hat{\sigma}$. Note that the two cycles are reproduced and that the predictions of the ascent period have large variance, as discussed by other authors [43].
6. Discussion

Regularization certainly improves the prediction of noisy data compared to OLS, with the notable exception of TTLS. Although TTLS is designed to obtain improved estimates for $\mathbf{b}$ when $X$ is error-corrupted, it turned out to be unsuitable for local linear prediction. This is not surprising considering the fact that the first $q$ filter factors are always larger than 1, i.e. TTLS does the opposite of regularization in these directions. Differences among the other techniques are small and not systematic. The discrepancies may be explained by the local curvature of the attractor around the target point and by the location of the neighbor points and the way these are contaminated by noise. For example, varying local curvature may require different dimensions of the local state space, and then PCR with fixed $q$ would not give the best results. We have suggested an automatic scheme (see Eqs. (17) and (18)) that changes the amount of regularization depending on the singular values and the residuals of the local least squares problem. This approach consistently gives fairly good predictions and does not require any parameter determination. On the other hand, for most systems we tested, PCR
and PLS with $q = \lceil d \rceil$ gave the best predictions, with only marginal differences. However, in a real application, if $q$ is not indicated by a dimension estimation method, different values of $q$ must be tried when predicting with PCR and PLS. Cross-validation does not seem to improve the procedures for choosing $q$ here. Each target point poses a separate problem, where a different parameter may be the most appropriate.

When $m$ is close to $k$, OLS deteriorates but the regularized methods do not seem to be affected. This is an important advantage of regularization, because $m \approx k$ may sometimes be desired, when $k$ has to be small (e.g. due to few available data) or when $m$ has to be large (e.g. due to a large dimension of the underlying attractor). Actually, the condition $m > k$ is allowed, and it suffices that $q < \min(k, m)$ to assure numerically stable regularized solutions. This means that even with restricted $k$ values (e.g. with short time series), multi-step predictions that may require large $m$ can still be obtained.

For data generated by continuous systems, it seems that a state space reconstruction with large $m$ enables better multi-step predictions. For example, we found that for the noisy Lorenz data, reconstruction with a time window length $\tau_w = (m - 1)\tau$ close to the mean orbital period $\tau_p$ (corresponding to $m = 10$ in Fig. 5(c)) gave better predictions for large $T$ than with a smaller $\tau_w$. The choice $\tau_w \approx \tau_p$ has previously been found appropriate for reconstruction in the estimation of the correlation dimension [52]. However, for the sunspot data a smaller $\tau_w$ ($\tau_w = 6$ years, corresponding to $m = 7$) compared to $\tau_p = 11$ years was found to be more appropriate, but this may be due to the limited number of cycles observed. The parameter $\tau$ is usually overlooked in work on prediction, probably because the studied time series are sampled coarsely. When the sampling is dense, $\tau$ should be large enough to avoid excessively large values of $m$ compared to $k$.

This implementation of regularization in local linear prediction gave promising results and motivation for further investigations, such as the search for optimal neighborhoods [53] or weighting of the neighbors [54]. It would be interesting to examine the long-term predictability with regularized methods, i.e. their ability to reproduce the original dynamics. Moreover,
regularized methods may be implemented in other areas where local linear models are used, such as estimation of Lyapunov exponents, noise reduction, and control. For such purposes, it has been noted that models taking into account the errors-in-variables problem are the most appropriate [55,56]. However, our analysis showed that the TTLS model, which is especially designed to solve the errors-in-variables problem, is the worst prediction model. Further analysis is needed to clarify this point.
7. Conclusion

Several local linear methods designed for ill-posed regression problems have been tested in the prediction of noisy chaotic time series. The regularization techniques have been described within a uniform framework using the SVD. All the regularization methods attempt to reduce the variance of the OLS solution while keeping the bias small. For most methods, this trade-off between bias and variance is achieved by damping the higher order SVD components. Simulations with well-known chaotic systems showed that regularization is beneficial when the data are corrupted with noise. In such cases, all methods except TTLS yield improved predictions compared to OLS. Best results require careful selection of the regularization parameter at each target point. However, PCR and PLS with a properly chosen fixed parameter $q$ often turned out to be the methods of choice. For RR, we suggest estimating the shrinkage parameter $\mu$ from the variance of the residuals, which is certainly different for each local least squares problem. In this way, the regularization effect varies with the target point. The regularized techniques were applied to real-world data, such as the R-R intervals of human ECG data and the sunspot data, and gave improved predictions over OLS. In particular, for the sunspot data they gave good predictions compared to nonlinear techniques believed to be the most powerful for these data. The results with synthetic and real data showed that there is no universally best regularization method, and testing and comparison are necessary for each new application.
Acknowledgements

The first author thanks the Norwegian Research Council (NFR) for partial support of this work. The authors thank A. Tsonis for providing the SOI data.
References

[1] A.S. Weigend and N.A. Gershenfeld, Time Series Prediction: Forecasting the Future and Understanding the Past (Addison-Wesley, Reading, MA, 1994).
[2] A.A. Tsonis, Chaos: From Theory to Applications (Plenum Press, New York, 1992).
[3] B. Lillekjendlie, D. Kugiumtzis and N. Christophersen, Chaotic time series Part II: System identification and prediction, Modeling, Identification and Control 15 (4) (1994) 225-243.
[4] J.D. Farmer and J.J. Sidorowich, Predicting chaotic time series, Phys. Rev. Lett. 59 (1987) 845-848.
[5] P.J. Brown, ed., Measurement, Regression, and Calibration (Oxford University Press, Oxford, 1993).
[6] G.H. Golub and C.F. Van Loan, An analysis of the total least squares problem, SIAM J. Numer. Anal. 17 (6) (1980) 883-893.
[7] T. Sauer, Time series prediction by using delay coordinate embedding, in: Time Series Prediction: Forecasting the Future and Understanding the Past, eds. A.S. Weigend and N.A. Gershenfeld (Addison-Wesley, Reading, MA, 1994) pp. 175-193.
[8] E.J. Kostelich and T. Schreiber, Noise reduction in chaotic time-series data: A survey of common methods, Phys. Rev. E 48 (3) (1993) 1752-1763.
[9] O.C. Lingjærde and N. Christophersen, Shrinkage properties of partial least squares, manuscript (1997).
[10] N.H. Packard, J.P. Crutchfield, J.D. Farmer and R.S. Shaw, Geometry from a time series, Phys. Rev. Lett. 45 (1980) 712.
[11] F. Takens, Detecting strange attractors in turbulence, in: Dynamical Systems and Turbulence (Warwick, 1980), eds. D.A. Rand and L.-S. Young, Lecture Notes in Mathematics, Vol. 898 (Springer, Berlin, 1981) pp. 366-381.
[12] W.A. Fuller, ed., Measurement Error Models (Wiley, New York, 1987).
[13] L.L. Scharf, Statistical Signal Processing: Detection, Estimation and Time Series Analysis (Addison-Wesley, Reading, MA, 1991).
[14] P.C. Hansen, Rank-deficient and discrete ill-posed problems, Ph.D. Thesis, Technical University of Denmark (1996).
[15] I.S. Helland, Relevance, prediction and interpretation for linear methods with many predictors, Research Report 0806-3842, Department of Mathematics, University of Oslo (1996).
[16] S. Wold, A theoretical foundation of extra thermodynamic relationships (linear free energy relationships), Chimica Scripta 5 (1974) 97-106.
[17] I.S. Helland, On the structure of partial least squares regression, Comm. Stat. Simulation Comput. 17 (2) (1988) 581-607.
[18] C. Goutis, Partial least squares algorithm yields shrinkage estimators, Ann. Stat. 24 (2) (1996) 816-824.
[19] A.E. Hoerl and R.W. Kennard, Ridge regression: Biased estimation for nonorthogonal problems, Technometrics 12 (1) (1970) 55-109.
[20] R.D. Fierro and J.R. Bunch, Collinearity and total least squares, SIAM J. Matrix Anal. Appl. 15 (4) (1994) 1167-1181.
[21] M.B. Priestley, Non-linear and Non-stationary Time Series Analysis (Academic Press, New York, 1988).
[22] J. Theiler, Estimating fractal dimension, J. Opt. Soc. Amer. A 7 (6) (1990) 1055-1071.
[23] G.H. Golub and C.F. Van Loan, eds., Matrix Computations, 2nd Ed. (Johns Hopkins University Press, Baltimore, MD, 1989).
[24] M. Hénon, A two-dimensional map with a strange attractor, Comm. Math. Phys. 50 (1976) 69-77.
[25] G.W. Stewart, Perturbation theory for the singular value decomposition, in: SVD and Signal Processing II: Algorithms, Analysis and Applications, ed. R.J. Vaccaro (Elsevier, Amsterdam, 1991) pp. 99-109.
[26] T. Sauer, J.A. Yorke and M. Casdagli, Embedology, J. Stat. Phys. 65 (1991) 579-616.
[27] E.N. Lorenz, Deterministic nonperiodic flow, J. Atmos. Sci. 20 (1963) 130.
[28] M. Mackey and L. Glass, Oscillation and chaos in physiological control systems, Science 197 (1977) 287.
[29] K. Ikeda, Multiple-valued stationary state and its instability of the transmitted light by a ring cavity system, Opt. Comm. 30 (1979) 257.
[30] R. Brown, P. Bryant and H.D.I. Abarbanel, Computing the Lyapunov spectrum of a dynamical system from an observed time series, Phys. Rev. A 43 (6) (1991) 2787-2806.
[31] J.D. Farmer and J.J. Sidorowich, Exploiting chaos to predict the future and reduce noise, in: Evolution, Learning and Cognition, ed. Y.C. Lee (World Scientific, Singapore, 1988) pp. 277-330.
[32] M. Casdagli, Nonlinear prediction of chaotic time series, Physica D 35 (1989) 335-356.
[33] A.L. Goldberger, ed., Clinical Electrocardiography (Mosby, St. Louis, 1977).
[34] J.E. Skinner, A.L. Goldberger, G. Mayer-Kress and R.E. Ideker, Chaos in the heart: Implications of clinical cardiology, Bio/Technology 8 (1990) 1018.
[35] J.K. Kanters, N.H. Holstein-Rathlou and E. Agner, Lack of evidence for low-dimensional chaos in heart rate variability, J. Cardiovascular Electrophysiology 5 (1994) 591.
[36] MIT-BIH Arrhythmia Database, BMEC TR010 (Revised), Biomedical Engineering Centre (MIT, Cambridge, 1988).
[37] J.H. Lefebvre, D.A. Goodings, M.V. Kamath and E.L. Fallen, Predictability of normal heart rhythms and deterministic chaos, Chaos 3 (2) (1993) 267-276.
[38] G. Sugihara and R.M. May, Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series, Nature 344 (1990) 734-741.
[39] A.A. Tsonis and J.B. Elsner, Global temperature as a regulator of climate predictability, Physica D (1997) 1-7.
[40] D.R. Brillinger, J. Guckenheimer, P. Guttorp and G. Oster, Empirical modelling of population time series data: The case of age and density dependent vital rates, Lecture Notes in the Life Sci. 13 (1980) 65-90.
[41] C.P. Price, D. Prichard and E.A. Hogenson, Do the sunspot numbers form a "chaotic" set?, J. Geophys. Res. 97 (A12) (1992) 19113-19120.
[42] M. Carbonell, R. Oliver and J.L. Ballester, A search for chaotic behaviour in solar activity, Astronom. Astrophys. 290 (1994) 983-994.
[43] M.D. Mundt, W.B. Maguire II and R.R.P. Chase, Chaos in the sunspot cycle: Analysis and prediction, J. Geophys. Res. 96 (A2) (1991) 1705-1716.
[44] H. Tong, Non-linear Time Series: A Dynamical System Approach (Oxford University Press, New York, 1990).
[45] A.S. Weigend, B.A. Huberman and D.E. Rumelhart, Predicting the future: A connectionist approach, Int. J. Neural Syst. 1 (3) (1990) 193-209.
[46] P.A.W. Lewis and J.G. Stevens, Nonlinear modeling of time series using multivariate adaptive regression splines (MARS), J. Amer. Stat. Assoc. 86 (416) (1991) 864-877.
[47] M. Casdagli, D. Des Jardins, S. Eubank, J.D. Farmer, J. Gibson and J. Theiler, Nonlinear modeling of chaotic time series: Theory and applications, in: Applied Chaos, eds. J.H. Kim and J. Stringer, Ch. 15 (Wiley, New York, 1992) pp. 335-380.
[48] H.D. Navone and H.A. Ceccatto, Forecasting chaos from small data sets: A comparison of different nonlinear algorithms, J. Phys. A 28 (12) (1995) 3381-3388.
[49] M. Casdagli, Chaos and deterministic versus stochastic nonlinear modeling, J. Roy. Stat. Soc. Ser. B 54 (1992) 303-328.
[50] B. Cheng and H. Tong, On consistent nonparametric order determination and chaos, J. Roy. Stat. Soc. Ser. B 54 (1992) 427-449.
[51] O.E. Rössler, An equation for continuous chaos, Phys. Lett. A 57 (1976) 397-398.
[52] D. Kugiumtzis, State space reconstruction parameters in the analysis of chaotic time series - the role of the time window length, Physica D 95 (1996) 13-28.
[53] L.A. Smith, Local optimal prediction: Exploiting strangeness and the variation of sensitivity to initial condition, Philos. Trans. Roy. Soc. A 348 (1994) 477-495.
[54] G. Sugihara, Nonlinear forecasting for the classification of natural time series, Philos. Trans. Roy. Soc. A 348 (1994) 477-495.
[55] E.J. Kostelich, Problems in estimating dynamics from the data, Physica D 58 (1992) 138-152.
[56] L. Jaeger and H. Kantz, Unbiased reconstruction of the dynamics underlying a noisy chaotic time series, Chaos 6 (1996) 440.