Nonlinear dynamical system identification with dynamic noise and observational noise


Physica D 223 (2006) 54–68 www.elsevier.com/locate/physd

Tomomichi Nakamura*, Michael Small
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Received 27 September 2004; received in revised form 9 June 2006; accepted 16 August 2006. Available online 18 September 2006. Communicated by A. Stuart

Abstract

In this paper we consider the problem of whether a nonlinear system has dynamic noise, and then estimate the level of dynamic noise to add to any model we build. The method we propose relies on a nonlinear model and an improved least squares method recently proposed under the assumption that observational noise is not large. We do not need any a priori knowledge of the systems to be considered, and we can apply the method to both maps and flows. We demonstrate with applications to artificial and experimental data. The results indicate that the proposed method can detect the presence or absence of dynamic noise from a scalar time series and, in some cases, give a reliable level of dynamic noise to add to the model built.

Keywords: Description length; Dynamic noise; Least squares method; Nonlinear time series modelling

1. Introduction

In the real world, systems are not always isolated from their surroundings. That is, the observed dynamics may have two components: a system (the deterministic part) and an input from the surroundings (high-dimensional unexplained dynamics, which we call dynamic noise). Also, nonlinear systems abound in nature, and phenomena can often be understood as due to nonlinear dynamics [22,31,17]. To gain a profound understanding of a nonlinear system and to identify it, it is very important to know whether the system has dynamic noise. If we could know this, it would be a good opportunity to consider what the dynamic noise is and why it is necessary for the system. In this paper, we consider the problem of detecting whether a nonlinear system (deterministic part) has dynamic noise (stochastic part).

For investigating nonlinearity (nonlinear determinism) in time series, the method of surrogate data has been proposed [1,2] and is often applied to real-world time series [7,8].

* Corresponding author.

E-mail addresses: [email protected] (T. Nakamura), [email protected] (M. Small).
doi:10.1016/j.physd.2006.08.013

We call such techniques linear surrogate methods, because they are based on a linear process and address a linear null hypothesis [3–6]. With this method, we can consider whether the time series is generated by a linear or a nonlinear system. Although this discrimination is very useful for understanding the system, it is still not enough, because the result obtained using the method of surrogate data cannot determine whether the nonlinear system has dynamic noise. The reason is that nonlinear systems exhibit nonlinearity irrespective of whether they have dynamic noise.

Another approach for tackling nonlinear time series is nonlinear modelling [10,8,11,23,24]. The aim of building a model is, given a finite time series contaminated by observational noise, to find an approximation of the true dynamics. Time series modelling is very useful for predicting time series, for helping to understand behaviour, and for gaining insight into the dynamics. A particularly convenient and general class of nonlinear models is the pseudo-linear models, which are linear combinations of nonlinear functions [10,11]. Hence, the models we build in this paper are all pseudo-linear models. The modelling procedure we use has three components: a suitable hierarchical class of nonlinear models; methods to select optimal models of a prescribed order from the model


class; and criteria which determine when a model is sufficiently accurate and neither over-fitting nor under-fitting the time series [10,11].

When building models for real-world time series, we usually expect that the system has dynamic noise, because in the real world systems are not always isolated from their surroundings. The free-run behaviour of a model driven with some amount of dynamic noise often shows qualitative behaviour very similar to the training data; were it not for dynamic noise in the model, the behaviour would often be periodic [10]. Hence, we expect that the system has dynamic noise. However, we do not know the appropriate level of dynamic noise to add to the model. As is easily expected, either too much or too little dynamic noise is likely to lead to inappropriate free-run behaviour. Also, the model we can build is just an approximation of the true dynamics, and the data available are not perfect; that is, the length and numerical accuracy of the time series available to us are limited, and the data are contaminated by observational noise. Furthermore, we cannot know whether the system has dynamic noise only by observing the time series. Hence, we do not know whether our expectation is correct.

There are some works concerning dynamic noise; see, for example, [14,15]. However, the object of these studies is a map, and one must first know the perfect model. In our approach, we do not need any a priori knowledge of the systems to be considered, and we can apply the method to both maps and flows. However, we assume that observational noise is not large and that dynamic noise is significant for the system.1 The key feature of the method is its combination of pseudo-linear models and the recently proposed noisy least squares (NLS) method [26].

In our approach to detecting whether a system has dynamic noise, a modelling technique is applied. For building pseudo-linear models, we perform parameter estimation and model selection.
For parameter estimation the least squares (LS) method is applied, and for model selection an information criterion is applied. Hence, in the next section, we first review the LS method and description length. Next, using polynomial models including the correct model, we show differences in the models selected as the best model when the system has dynamic noise. As a more practical example, where there is no correct model, we also build models using radial basis functions for two artificial data sets, one from a system with dynamic noise and one without. Finally, we build models for two real-world time series, a laser time series and heart rate variability data. We find that the idea proposed in this paper can detect the presence or absence of dynamic noise in systems and provide a good estimate of the dynamic noise level to produce good simulations.

2. The least squares method and information criteria for modelling

For building pseudo-linear models, we perform parameter estimation and model selection [10,11]. The least squares (LS)

1 In a computer simulation, round-off errors should play the role of very small dynamic noise. However, it is usually considered that this influence can be ignored or is not significant for the system.


method is a vital technique for building models of a nonlinear dynamical system given a finite time series of observations contaminated by observational noise [40]. These models typically have a large number of parameters, and these must be tuned via parameter estimation. A commonly used method to estimate parameters is least squares. After parameter estimation, we can calculate the fitting error (prediction error) and appropriate information criteria for model selection. That is, these operations are strongly interconnected with modelling. Hence, the LS method is very important [26,27]. In Section 2.1 we describe how the LS method is applied in practice, its drawbacks, and how it should be applied. In Section 2.2 we describe the purpose of using information criteria and the description length, which we adopt as an information criterion.

2.1. The least squares method

We now consider the problem of estimating the parameters λ ∈ R^k of a model x_{t+1} = f(x_t, λ), x_t ∈ R^d, of a nonlinear deterministic dynamical system given only a finite time series of observations s_t contaminated by observational noise ε_t (that is, s_t = x_t + ε_t), where ε_t is Gaussian with zero mean and fixed variance, and the data comprise a set of n scalar measurements. The commonly used LS method solves the optimization problem
$$\min_{\lambda} \sum_{t=1}^{n-1} \left\| s_{t+1} - f(s_t, \lambda) \right\|^2, \qquad (1)$$

where only the noisy observations s_t are used for the fitting. This is a maximum likelihood method, which makes the assumption that the noise is Gaussian and independent. Even when these assumptions hold and the noise is white, it is known that LS is biased in the presence of observational noise: white observational noise at the output becomes coloured regression noise in the regression equation, which violates the assumptions of LS [28,40]. The parameter estimates would be much less biased if we could solve the optimization problem
$$\min_{\lambda} \sum_{t=1}^{n-1} \left\| s_{t+1} - f(x_t, \lambda) \right\|^2, \qquad (2)$$

where x_t is the true state (noise-free data) at time t. But of course we cannot know x_t, so in Eq. (1) s_t is used as a proxy for the noise-free data x_t. This is clearly not a good thing to do, because s_t is corrupted by noise [28,40]. When using Eq. (1) for nonlinear models, the fitting error (in other words, prediction error) is usually not the same as the observational noise and is not normally distributed, even if the observational noise is Gaussian. However, when using Eq. (2), the fitting error is almost identical to the observational noise and normally distributed [27]. It should be noted that the normal distribution is very important, because we usually use the assumption that the fitting error is Gaussian when using the LS method and when deriving the information criteria formulae. For more details see [36,20,10]. We consider that the usage


Fig. 1. Information criterion as a function of model size.

of Eq. (1) is “inappropriate” and that of Eq. (2) is “appropriate” as the LS method.

For finding the best model among many, information criteria are usually employed. Hence, we briefly review the basic concept of information criteria and Rissanen's description length as modified by Judd and Mees (DL_γ) [10], which we adopt here. Then, we show a significant example: when using Eq. (1), applying DL_γ cannot find the correct model as the best model, even with a perfect model class and even if the noise level is low.

2.2. Information criteria

For building pseudo-linear models, the model size2 is important. The time series available to us are imperfect and insufficient, because they are usually contaminated by observational noise, and their numerical accuracy is also limited. Using these unfavourable data, we have to build models, and we hope the models can reflect the underlying system with the influence of noise removed as much as possible. Hence, the model should be fitted to the data neither too closely nor too poorly; these failures are called over-fitting and under-fitting, respectively. To find the model that best balances model error against model size, so as to prevent over-fitting or under-fitting of the data, information criteria are often applied. Fig. 1 shows a rough sketch of the relationship between model size and the information criterion for a generic criterion: the minimum of the criterion is considered to correspond to the best (optimal) model size, and the smaller the value, the better the model [12,27].

Another important reason for using information criteria is to avoid an unnecessary increase in the model size, which occurs when the model built models a nested, self-iterated, form of the original system. We refer to this kind of model as degenerate [13,26]. Although such a model is essentially identical to the original model, its size is larger, and the model size increases indefinitely.
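The bias of Eq. (1) relative to Eq. (2) discussed in Section 2.1 is easy to check numerically. The following is a minimal sketch using the logistic map as an illustrative one-parameter system (the map, noise level, and seed are choices for this sketch, not taken from the paper): regressing on noisy states, as in Eq. (1), attenuates the parameter estimate, while the (in practice unavailable) clean regressors of Eq. (2) recover the true value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative system: the logistic map x_{t+1} = a * x_t * (1 - x_t),
# observed with additive Gaussian noise, s_t = x_t + eps_t.
a_true, n = 3.8, 5000
x = np.empty(n)
x[0] = 0.3
for t in range(n - 1):
    x[t + 1] = a_true * x[t] * (1.0 - x[t])
s = x + 0.05 * rng.standard_normal(n)

# Eq. (1): both target and regressor are noisy observations.
phi_noisy = s[:-1] * (1.0 - s[:-1])
a_eq1 = (phi_noisy @ s[1:]) / (phi_noisy @ phi_noisy)

# Eq. (2): the noise-free states as regressors (unknown in practice).
phi_clean = x[:-1] * (1.0 - x[:-1])
a_eq2 = (phi_clean @ s[1:]) / (phi_clean @ phi_clean)

print(a_eq1, a_eq2)  # the Eq. (2) estimate lies much closer to 3.8
```

The regressor noise in Eq. (1) is correlated with itself inside the squared norm, which is what produces the bias; noise in the target alone, as in Eq. (2), only inflates variance.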
Hence, it is important to remove the nesting effect and determine the smallest model size which can substantially model the system; otherwise such models are clearly over-fitting [13,26].

Some information criteria have already been proposed for these purposes. The best known is the Akaike Information Criterion (AIC) [20], but it is known to fail to provide statistically consistent estimates [21]. Rissanen's description length modified by Judd and Mees (DL_γ) has proven to be effective in modelling nonlinear dynamics [8–11], and it has fewer approximations than other information criteria [10]. Hence, DL_γ is possibly more reliable [21,10]. It is this criterion we use for determining the best (optimal) model. We note that other information criteria (the Schwarz Information Criterion (SIC) [37] and Normalized Maximum Likelihood (NML) [38]) give either the same, or very similar, results [27].

The minimum description length (MDL) principle states that the best model is the one that attains the greatest compression. Under the assumption that the fitting error (prediction error) is normally distributed, Judd and Mees showed that the description length DL_γ can be approximated by
$$\mathrm{DL}_{\gamma}(k) = \left(\frac{n}{2} - 1\right) \ln \frac{e^{\mathrm{T}} e}{n} + (k + 1)\left(\frac{1}{2} + \ln \gamma\right) - \sum_{i=1}^{k} \ln \delta_i, \qquad (3)$$

2 For pseudo-linear models, the model size refers to the number of basis functions, which is the same as the number of parameters of the model, because the only parameters used to fit the data are the linear weights.

where k is the model size, γ is related to the scale of the data (see below), e is the vector of prediction errors, and the variables δ_i can be interpreted as the relative precision to which the parameters λ are specified. See [10,11] for more details. A more precise interpretation of γ is that it is the exponent in a floating point representation of the parameters scaled relative to some fixed amount, and it is supposed that 1 ≤ γ ≤ 32. There are good arguments that γ should be 1, but in practice larger values seem to be necessary to prevent over-fitting [12,25]. We use γ = 32 (that is, DL_32) throughout this paper. We note that we do not use DL_γ as a strict criterion for model selection; rather, we use it as a method for screening alternative formulations in order to produce a set of candidate models for further consideration.

3. Degeneracy and model selection

To investigate whether the correct model is selected as the best model, we use one simple, well known polynomial model, the Henon map [16], as a nonlinear AR model, and we consider fitting it with multivariate polynomial models. This model is the true and unique correct model by our definition. We note that by the “best model” we mean the model that has the smallest value of DL_32. We expect that DL_32 becomes smallest when the model built is the correct model.

It should be noted that selecting the optimal subset of basis functions is typically an NP-hard problem [10]. Although we usually apply a selection method for selecting basis functions, the models obtained are affected by the selection method, and there is no guarantee of finding the optimal subset. That is, there is a possibility that the model obtained using a given selection method is not the truly best model. Hence, to obtain the truly


best model we calculate all possible combinations (in other words, an exhaustive search). However, it should be noted that such a procedure is only practical for fairly small dictionaries.

The Henon map [16] is given by
$$\begin{cases} x_t = A_0 + A_2 x_{t-2} + A_4 x_{t-1}^2 + \eta_t, \\ s_t = x_t + \epsilon_t, \end{cases} \qquad (4)$$
where A_0 = 1.0, A_2 = 0.3 and A_4 = −1.4, η_t is Gaussian dynamic noise and ε_t is Gaussian observational noise. We use s_t as observational data and will use time delay linear polynomial models. Choosing lag = 3 and degree = 3 gives 20 candidate basis functions3 in the dictionary. One data set is contaminated by 40 dB observational noise while the system has no dynamic noise; another is contaminated by 60 dB observational noise while the system has Gaussian dynamic noise with standard deviation 0.006. We also investigate larger observational noise, 40 dB, where the system has the same level of dynamic noise as above. The number of data points used in all cases is 1000 and 10 000.

Table 1 shows the model size of the best model selected. When the system does not have dynamic noise and the level of observational noise is 40 dB (case(a) in Table 1), DL_32 (the description length with γ = 32) is not smallest at model size 3 (the correct model size), although the correct model is selected at that size. That is, the correct model is not the best model. The best model is, in fact, a very good but degenerate approximation of the correct model; we can find such good but degenerate approximations at sizes 6, 8 and 11.4 It has been shown elsewhere that such models can be reduced to essentially the correct model [13,26]. When we investigate data contaminated by lower observational noise, 60 dB and 80 dB, the same phenomenon appears.

When the system has dynamic noise and the level of observational noise is 60 dB (case(b) in Table 1), DL_32 is smallest when the model size is 3 (the correct model size).
That is, the best model is the correct model. When the system has dynamic noise and the level of observational noise is 40 dB (case(c) in Table 1), although the correct model is selected at the correct model size 3, the best model for both data lengths is not the correct model; the best models are degenerate. This is the same phenomenon as when the system does not have dynamic noise. In this example, the standard deviation of the Gaussian dynamic noise is 0.006, and the standard deviations of the 40 dB and 60 dB Gaussian observational noise are about 0.00721 and 0.00072 respectively. These results imply that, broadly speaking, when the system has dynamic noise and the dynamic noise is larger than the observational noise, the model selected as the best model will be the correct model. Otherwise, when the system has

3 The basis functions corresponding to the parameters are: A_0: constant, A_1: x_{t−1}, A_2: x_{t−2}, A_3: x_{t−3}, A_4: x_{t−1}^2, A_5: x_{t−2}^2, A_6: x_{t−3}^2, A_7: x_{t−1}^3, A_8: x_{t−2}^3, A_9: x_{t−3}^3, A_10: x_{t−1}x_{t−2}, A_11: x_{t−1}x_{t−3}, A_12: x_{t−2}x_{t−3}, A_13: x_{t−1}^2 x_{t−2}, A_14: x_{t−1}^2 x_{t−3}, A_15: x_{t−2}^2 x_{t−1}, A_16: x_{t−2}^2 x_{t−3}, A_17: x_{t−3}^2 x_{t−1}, A_18: x_{t−3}^2 x_{t−2} and A_19: x_{t−1}x_{t−2}x_{t−3}.

4 We have confirmed that the results using other information criteria such as SIC [37] and NML [38] are essentially the same.
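The data generation and DL_32 evaluation of this section can be sketched as follows. This is a minimal illustration under stated simplifications: the relative precisions δ_i in Eq. (3) are set to 1 (dropping the last term), and only a hand-picked five-function dictionary is used instead of the full twenty. Reading "d dB observational noise" as a noise standard deviation of 10^(−d/20) times the signal standard deviation reproduces the values quoted in the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Generate Eq. (4): x_t = 1.0 + 0.3*x_{t-2} - 1.4*x_{t-1}**2 + eta_t.
def henon(n, eta_std=0.0, burn=100):
    x = np.zeros(n + burn)
    for t in range(2, n + burn):
        x[t] = 1.0 + 0.3 * x[t - 2] - 1.4 * x[t - 1] ** 2 \
               + eta_std * rng.standard_normal()
    return x[burn:]

x = henon(10000, eta_std=0.006)

# Noise std for "60 dB": signal std * 10**(-60/20); for the Henon map this
# gives roughly 0.00072, matching the value quoted in the text.
sigma60 = np.std(x) * 10.0 ** (-60.0 / 20.0)
s = x + sigma60 * rng.standard_normal(x.size)  # observational data

# DL_32 of Eq. (3) with all delta_i set to 1 (illustrative simplification).
def dl32(e, k, n, gamma=32.0):
    return (n / 2.0 - 1.0) * np.log(e @ e / n) \
           + (k + 1) * (0.5 + np.log(gamma))

# Nested candidate models: the correct terms (constant, x_{t-2}, x_{t-1}^2)
# plus spurious lagged terms.
y = s[3:]
cols = {"const": np.ones_like(y), "x1": s[2:-1], "x2": s[1:-2],
        "x3": s[:-3], "x1sq": s[2:-1] ** 2}
subsets = [("const", "x2", "x1sq"),              # the correct model, k = 3
           ("const", "x2", "x1sq", "x1"),        # k = 4
           ("const", "x2", "x1sq", "x1", "x3")]  # k = 5
dls = []
for names in subsets:
    X = np.column_stack([cols[c] for c in names])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ coef
    dls.append(dl32(e, len(names), y.size))
print(dls)  # with dynamic noise dominating, k = 3 is expected to be smallest
```

Since the dynamic noise (std 0.006) dominates the observational noise here, this corresponds to case(b): the spurious terms buy almost no reduction in fitting error, so the parameter penalty of Eq. (3) favours the correct size.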

Table 1
The size of the best model for the Henon map

Data number    Case(a)    Case(b)    Case(c)
1 000          8          3          6
10 000         8          3          8

Case(a): the system does not have dynamic noise and the data is contaminated by 40 dB observational noise. Case(b): the system has dynamic noise and the data is contaminated by 60 dB observational noise. Case(c): the system has dynamic noise and the data is contaminated by 40 dB observational noise.

dynamic noise, the best model will tend to over-fit and will not be the correct model.

As mentioned before, one of the important reasons for using information criteria is to avoid unnecessary increases in model size, in which a model is built that reflects a nested, that is, self-iterated, form of the original system. However, the result shows that we cannot avoid an unnecessary increase in the model size and cannot select the correct model as the best model, even when the noise level is low and the truly best model is obtained by calculating all possible combinations. Also, the size of the model selected as the best model is much larger than that of the correct model. This indicates that the best model is the result of some kind of “over-fitting”. For more details concerning degenerate models see [13,26].

To overcome this problem and to select the correct model as the best model, Nakamura and Small have proposed the noisy least squares (NLS) method, an idea for using the LS method more appropriately in practical situations [26]. By this method we can also take advantage of information criteria more effectively and generally avoid over-fitting. In the next section, we review the NLS method.

4. The noisy least squares method: using the least squares method more appropriately

Nakamura and Small have proposed an idea for using the LS method more appropriately (like Eq. (2)) without using the true state in practical situations [26]. As is well known, when the noise level is low, the attractors reconstructed using the noisy data are very similar to those using the noise-free data, and the parameters estimated are almost the same as the correct values. These facts indicate that the noisy data can be regarded as a good proxy for the true state when the noise level is low. From this, we expect that we can achieve a proxy of Eq. (2) using only noisy data when the noise in s_t is low compared to that of s_{t+1} in Eq. (1).
Hence, we propose the addition of larger Gaussian noise to the s_{t+1} term in Eq. (1). Let the additional Gaussian noise be ε′_t and s′_{t+1} = s_{t+1} + ε′_{t+1}. Then we obtain the new optimization problem
$$\min_{\lambda} \sum_{t=1}^{n-1} \left\| s'_{t+1} - f(s_t, \lambda) \right\|^2. \qquad (5)$$

In Eq. (2), the s_{t+1} term has more noise than x_t. Hence, when the level of the additional noise is large enough relative to the noise in the original noisy data s_t, we expect that Eq. (5) can be a good approximation to Eq. (2). We refer to this method as the “noisy least squares” (NLS) method [26].
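A minimal numerical sketch of Eq. (5), again using the illustrative logistic map rather than a system from the paper: because the deliberately added target noise is independent of the regressors, the parameter estimate is essentially unchanged, while the fitting error becomes dominated by white output noise, as in Eq. (2).

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data: logistic map with observational noise.
a_true, n = 3.8, 20000
x = np.empty(n)
x[0] = 0.3
for t in range(n - 1):
    x[t + 1] = a_true * x[t] * (1.0 - x[t])
s = x + 0.02 * rng.standard_normal(n)

phi = s[:-1] * (1.0 - s[:-1])

# Ordinary LS on the noisy data, Eq. (1).
a_ls = (phi @ s[1:]) / (phi @ phi)

# NLS, Eq. (5): add Gaussian noise, ten times larger than the observational
# noise, to the *target* values only.
s_prime = s[1:] + 0.2 * rng.standard_normal(n - 1)
a_nls = (phi @ s_prime) / (phi @ phi)

print(a_ls, a_nls)  # nearly identical estimates
```

The point is that adding noise at the output costs only estimator variance, not bias; what it buys, as described above, is a fitting-error distribution on which the description length of Eq. (3) behaves as it would for Eq. (2).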


Fig. 2. (Colour online) A plot of the selected basis functions against additional noise levels, where the number of data points used is 1000 ((a), (b) and (c)) and 10 000 ((d), (e) and (f)), and the number in the parentheses is the number of parameters selected (the model size) in the model selected as the best model. (a) and (d) without dynamic noise and 40 dB observational noise, (b) and (e) with dynamic noise and 60 dB observational noise, and (c) and (f) with dynamic noise and 40 dB observational noise.

4.1. Application of the NLS method to the case of degenerate models

Consider the Henon system (used in Section 3) again. We apply the NLS method to system identification, and again calculate all possible combinations to obtain the truly best model. We increase the additional noise level in 10 dB steps from 80 to 0 dB. Fig. 2 shows the basis functions in the models selected as the best model at each additional noise level (the y-axis index corresponds to that of the parameters given in Section 3).

Fig. 2(a) and (d) show the results when the system does not have dynamic noise and the data is contaminated by 40 dB observational noise. The figures indicate that the selected basis functions do not change for a while; however, as the noise level becomes larger, the basis functions selected become identical to the basis functions in the correct model. The three basis functions are all correct basis functions; that is, the model is the correct model.

Fig. 2(b) and (e) show the results when the system has dynamic noise and the data is contaminated by 60 dB observational noise. The figures indicate that when the number of data points is 10 000, only the correct basis functions are always selected. That is, the correct model is always selected. When the number of data points is 1000, the selected basis

functions do not change until the noise level is 0 dB. From 80 to 10 dB, the three basis functions selected are all correct; that is, the model is the correct model. However, when the noise level is 0 dB, although the size of the best model is 3, two of the three basis functions selected are correct and the other is not.

Fig. 2(c) and (f) show the results when the system has dynamic noise and the data is contaminated by 40 dB observational noise. The figures indicate that the behaviour is very similar to that seen in Fig. 2(a) and (d).

The results (Fig. 2(a), (b), (d) and (e)) indicate that there is significantly different behaviour when the system has dynamic noise. When the system has dynamic noise, the best model selected by applying the NLS method at each noise level is the same as the best model selected using only the original data. However, when the system does not have dynamic noise, the best model selected by applying the NLS method does change with the noise level, and the basis functions selected become identical to the correct basis functions as the noise level becomes larger. It should be noted, though, that when the observational noise is 40 dB and the system has dynamic noise (that is, the observational noise is larger than the dynamic noise), the behaviour is very similar to when the system does not have dynamic noise; see Fig. 2(c) and (f). Hence, it indicates that it is important for


our idea to detect the presence or absence of dynamic noise that the level of observational noise be smaller than that of dynamic noise. That is, we want to use data which is not contaminated by much observational noise. However, this requirement is not unusual, because we always want to use clean data for modelling.

4.2. Some observations

Here, we consider the reason why the correct model is selected as the best model without applying the NLS method when the Henon system, Eq. (4), has dynamic noise. The systems we consider have dynamics described by
$$\begin{cases} x_t = f(x_{t-1}, \lambda) + \eta_t, \\ s_t = g(x_t) + \epsilon_t, \end{cases} \qquad (6)$$
where x_t ∈ R^d is the state of the system at time t, f is the dynamics (a nonlinear function), λ ∈ R^k are parameters of the system, s_t is our measurement of the system via an observation function g, and η_t and ε_t are independent identically distributed (i.i.d.) random variates representing the dynamic noise and observational noise respectively.

We decompose Eq. (6) further, when η_t is not zero. The formula x_t = f(x_{t−1}, λ) + η_t can then be rewritten as
$$x'_t = f(x_{t-1}, \lambda), \qquad x_t = x'_t + \eta_t. \qquad (7)$$

The operation in Eq. (7) is essentially the same as the NLS method, Eq. (5). Here, we take the observation function g in Eq. (6) to be the identity (that is, g(x_t) = x_t) for simplicity. The formula s_t = g(x_t) + ε_t can then be written using Eq. (7) as
$$s_t = x_t + \epsilon_t = x'_t + \eta_t + \epsilon_t. \qquad (8)$$

When the observational noise ε_t is smaller than the dynamic noise η_t, broadly speaking, the features of the observational noise will be buried in those of the dynamic noise. Hence, the correct model would be selected as the best model, as happens when applying the NLS method. However, when the observational noise is larger than the dynamic noise, the features of the dynamic noise will be buried in those of the observational noise, and the correct model would not be selected as the best model.

The results in Section 3 show that when the system does not have dynamic noise, the best model size is larger than the correct model size. When the system has dynamic noise and the dynamic noise is larger than the observational noise, the best model is the correct model. When the system has dynamic noise and the observational noise is larger than the dynamic noise, the best model is not the correct model. That is, the correct model is selected only when the dynamic noise is larger than the observational noise.

Section 4.1 shows that when the system does not have dynamic noise and the NLS method is applied, as the additional noise level increases the model size changes immediately, after which the same model is selected for a while, and that model is the correct model. Also, when the system has dynamic noise and the NLS method is applied, as the noise level increases, the


model size does not change; the same model continues to be selected, and this model is the correct model. These are clearly different behaviours. The description in Section 4.1 thus provides a method to exploit the phenomenon observed in Section 3. Hence, although the NLS method was originally proposed to avoid over-fitting [26], we find that we can also detect whether a nonlinear system has dynamic noise by applying this method.

In the previous examples, the results were very clear, because correct basis functions for the system were in the dictionary and we could build the correct model. However, when using real-world time series, we usually do not know the system or the model. Hence, in such cases, we expect that when the NLS method is applied and the additional noise level is very large (for example, larger than 20 dB), the behaviour will differ from that seen in the examples: the model size will become smaller,5 unlike the results seen in Fig. 2. When a system does not have dynamic noise, the model tends to be over-fitted, and hence the model size changes at a relatively small additional noise level. On the other hand, when a system has dynamic noise, the model is not over-fitted, and hence the model size changes at a relatively large additional noise level. The rough sketch we expect is shown in Fig. 3. Using this difference in behaviour, we consider that we can discriminate whether the system has dynamic noise. We will investigate how this idea works in more practical cases in the next section.

5. Example and application

In the previous sections, the results were very clear. However, the examples presented so far are somewhat unrealistic, for the following reasons. Firstly, correct basis functions for the system are in the dictionary; that is, there is an answer (the correct model). When using real-world time series, we usually do not know the system or the model.
Secondly, when building models, we usually cannot calculate all possible combination sets (an exhaustive search) to obtain the truly best model; we usually apply a method for selecting basis functions from a dictionary [10,23,24]. Finally, the basis functions used are all polynomial. It is not recommended to build models using only polynomials if one wants to build from strong model classes,6 that is, model classes that provide better approximations for a given number of basis functions. Generally, the polynomial model class is not considered strong and probably should not be used by itself [12].

To investigate how our idea works for detecting whether a system has dynamic noise in more practical cases, we build pseudo-linear models using radial basis functions [10]. We use Gaussian functions f_i(x) = exp(−‖x − c_i‖² / r_i²) for arbitrary centres c_i and radii r_i. Many candidate basis functions are

5 Typically the size of a model built using very noisy data is relatively small.

6 A characteristic feature of strong models is that the fitting process is inherently nonlinear. To obtain a strong model it is necessary, although not sufficient, to have some nonlinear parameter dependence [39]. Hence, radial basis models are strong models, because the centres (and scale radii where appropriate) should also be considered as parameters, and it is these parameters that provide the essential nonlinear dependence of strong models [10].


Fig. 3. The rough sketch of a plot of the size of the best model selected against additional noise levels. (a) A system does not have dynamic noise, and (b) a system has dynamic noise.

generated by random assignment of the centres c_i and radii r_i using the time series, in the form of a dictionary. It is almost always advantageous to also have constant and linear functions available. For more details on this procedure see [10,12]. Also, we use a selection method, the up-and-down method using marginal error [23,24]. As stated above, although selecting the optimal subset of basis functions from a dictionary is typically an NP-hard problem and selection algorithms cannot usually guarantee finding the optimal subset, the bottom-up method and its improvement, the up-and-down method, have proven effective in modelling nonlinear dynamics [9,8,23,24]. In particular, the up-and-down method obtains better models in most cases than the bottom-up method [23,24]. Hence, we employ the up-and-down method.

In the earlier examples, we could always obtain clear results for any noise level, even when the noise level was relatively large. This was possible because the correct basis functions were in the dictionary and all possible combination sets were always calculated. In practice, as we have to select basis functions from a dictionary using a selection method, the basis functions selected come under the influence of features of the training data. A general explanation is that the basis functions selected will be adequate for extracting the features of the training data when the training data is very clean. When this is not the case, such basis functions will not be selected and basis functions which reflect noise will be added instead. In other words, when data are noisy, basis function selection is influenced by noise features. That is, noisy data is not appropriate as training data for building models, and we should avoid the influence of noise in the training data on basis function selection as much as possible. We want to use clean data in practice.
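The dictionary construction described above (constant and linear terms plus Gaussian radial basis functions with randomly assigned centres and radii) can be sketched as follows. The sampling rules for centres and radii here are illustrative choices, not the exact procedure of [10,12].

```python
import numpy as np

rng = np.random.default_rng(3)

def make_dictionary(S, n_rbf=50):
    """Given an embedded data matrix S (rows are delay vectors), return a
    design matrix whose columns are candidate basis functions: a constant,
    the linear coordinates, and Gaussian radial basis functions
    f_i(x) = exp(-||x - c_i||**2 / r_i**2), with centres c_i drawn from the
    data and radii r_i spread over a range of scales (illustrative rules)."""
    n, d = S.shape
    cols = [np.ones(n)] + [S[:, j] for j in range(d)]
    centres = S[rng.integers(0, n, n_rbf)]
    scale = np.std(S)
    radii = scale * 10.0 ** rng.uniform(-1.0, 1.0, n_rbf)
    for c, r in zip(centres, radii):
        cols.append(np.exp(-np.sum((S - c) ** 2, axis=1) / r ** 2))
    return np.column_stack(cols)

# Example: delay-embed a scalar series with lag-1 vectors of dimension 3.
s = np.sin(0.1 * np.arange(500)) + 0.01 * rng.standard_normal(500)
S = np.column_stack([s[2:-1], s[1:-2], s[:-3]])
D = make_dictionary(S)
print(D.shape)  # (497, 54): constant + 3 linear + 50 RBF columns
```

A selection method such as the up-and-down scheme then chooses a subset of these columns; the linear weights of the chosen subset are fitted by (N)LS as before.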
Hence, to save time and trouble, we will use the following idea to apply the NLS method.7

7 In the previous examples (the case of degenerate models), to obtain the best model at each model size and to show that the correct model is the global best model under each condition, we calculated all possible combination sets. When we apply the idea mentioned here to the previous examples, we still find the correct model as the best model, although some models selected at each model size are not the best.

(1) We build models using the LS method, a selection method and training data as usual.
(2) Keeping the models obtained in step 1 (that is, the same basis functions selected) at each model size, DL_{32} of these models is calculated again using the fitting error obtained by applying the NLS method.
(3) We find the best model at each additional noise level.

We first apply the idea to two artificial data sets, where we know the answer to whether the system has dynamic noise. One system is the differential equations of Chua's circuit [29] and the other is the model of the annual sunspot numbers using radial basis functions [30]. These time series are well known as nonlinear time series. To investigate the quality and performance of the models obtained by applying the idea, we examine long-term free-run data generated by the models, because one needs to get the dynamics right to obtain good long-term free-run data. We then confirm that our expectation described in Section 4.2 is correct. Based on the results, we apply the idea to real-world time series: a laser time series [18] and heart rate data obtained from a healthy adult male.

5.1. Example

In this section, we investigate how our idea works and note some differences from the results presented previously. We expect that any time series is contaminated by observational noise to some extent. Hence, we contaminate the data with 60 dB Gaussian noise and use these as observational data in this section.

5.1.1. The differential equations of Chua's circuit

This model is an electronic circuit proposed by Chua et al. [29]. The circuit dynamics are described by

\[
\begin{aligned}
C_1 \frac{dv_{c1}}{dt} &= G(v_{c2} - v_{c1}) - g(v_{c1}),\\
C_2 \frac{dv_{c2}}{dt} &= G(v_{c1} - v_{c2}) + i_L,\\
L \frac{di_L}{dt} &= -v_{c2},
\end{aligned}
\qquad (9)
\]

T. Nakamura, M. Small / Physica D 223 (2006) 54–68

Fig. 4. The mode of the best model size obtained at different additional noise levels for the models of the Chua’s circuit equations.

where $v_{c1}$, $v_{c2}$ and $i_L$ denote the voltage across $C_1$, the voltage across $C_2$ and the current through $L$, respectively, and $g(v)$ is the piecewise-linear function

\[
g(v) = m_0 v + \frac{1}{2}(m_1 - m_0)\,|v + B_p| + \frac{1}{2}(m_0 - m_1)\,|v - B_p|. \qquad (10)
\]

The parameters of the equations are $1/C_1 = 9.0$, $1/C_2 = 1.0$, $1/L = 7.0$, $G = 0.7$, $m_0 = -0.5$, $m_1 = -0.8$ and $B_p = 1.0$. When calculating this data we use the fourth-order Runge–Kutta method with sampling interval 0.1, and we use the $v_{c1}$ component of the model. We then contaminate it with 60 dB Gaussian noise and use it as observational data. We note that the system does not have dynamic noise.

For building a model using radial basis functions, 5000 data points are used as the training data and the up-and-down method using marginal error is applied [23,24]. The data is embedded using the uniform embedding (t − 1, t − 5, t − 9) in 3 dimensions [19] with the aim of predicting a value at time t. The size of the best model is 67.

We apply the NLS method and find the best model again at each noise level. We add Gaussian noise starting from 80 dB in 5 dB steps and use 5 different noise realizations. When the NLS method is applied, a model of smaller size is selected when the additional noise level is larger than that included in the original data (see Section 4.1). Hence, 80 dB is clearly too small as additional noise. However, in practice, we usually do not know the level of the observational noise, and it is not unreasonable to expect it to be less than 80 dB. Hence, on the assumption that we apply the NLS method to real-world time series where we do not know the observational noise level, we add Gaussian noise from 80 dB for the NLS method. Fig. 4 shows the mode of the model size when the model has the smallest value of DL_{32} (description length with γ = 32) at each noise level. Although the difference between the first plateau at model size 67 and the second plateau at model sizes 64 and 63 is not so large, this behaviour is very similar to that seen in Fig. 3(a). This result indicates that the difference between the first and second plateaus may not always be large in practice.
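As a rough sketch of how such observational data can be generated, the following integrates Eqs. (9)–(10) with fourth-order Runge–Kutta and contaminates the $v_{c1}$ component with 60 dB Gaussian noise; the initial condition and the use of the sampling interval as the integration step are assumptions, not stated in the text:

```python
import numpy as np

# Integrate the Chua circuit equations (9)-(10) with the parameters given
# in the text, then add 60 dB Gaussian observational noise to v_c1.
inv_C1, inv_C2, inv_L = 9.0, 1.0, 7.0
G, m0, m1, Bp = 0.7, -0.5, -0.8, 1.0

def g(v):
    # piecewise-linear function of Eq. (10)
    return m0 * v + 0.5 * (m1 - m0) * abs(v + Bp) + 0.5 * (m0 - m1) * abs(v - Bp)

def f(s):
    v1, v2, iL = s
    return np.array([inv_C1 * (G * (v2 - v1) - g(v1)),
                     inv_C2 * (G * (v1 - v2) + iL),
                     inv_L * (-v2)])

def rk4_step(s, dt=0.1):
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

state = np.array([0.1, 0.0, 0.0])            # assumed initial condition
v1 = np.empty(5000)
for i in range(5000):
    state = rk4_step(state)
    v1[i] = state[0]

sigma = v1.std() * 10.0 ** (-60.0 / 20.0)    # 60 dB observational noise
observed = v1 + np.random.default_rng(1).normal(0.0, sigma, size=v1.size)
```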
Also, the models obtained seem to be very sensitive to the additional noise level when there is no correct model. In Section 4.1, the


model and model size changed when the additional noise level in the NLS method was larger than that included in the original training data. However, the result obtained here is different. When the additional noise level is 60 dB, which is the same noise level as that included in the original training data, the model size is 64, which is smaller than that obtained using the original training data. When the noise levels are smaller than 60 dB, that is, 65 dB, 70 dB, 75 dB and 80 dB, the model of size 67, which is the same model size as that obtained using the original training data, is selected as the best model.

We investigate the models of size 63 and 67. Fig. 5 shows the reconstructed attractors of the original training data and those of the ubiquitous behaviours of the free-run data of these models. Fig. 5 shows that although the model of size 63 has fewer basis functions, its free-run behaviour is similar to that of the original data, as is that of the model of size 67. This result indicates that the model of size 63 has enough basis functions and hence the model of size 67 might be over-fitted.

Here, it should be noted that when a system does not have dynamic noise, the behaviour of free-run data generated by the system is strongly influenced by the initial condition. To avoid this influence, we apply the following idea. We use all points in the training data as initial conditions, iterate 1000 times and take the last points. If the free-run data goes to fixed points or periodic orbits, the reconstructed attractor using the last points collected would clearly show fixed points or periodic orbits. Fig. 6 shows the reconstructed attractors of the data obtained in this way. For comparison, the reconstructed attractors of the original training data are also shown in Fig. 6(a). Fig. 6(b) and (c) are very similar, although the model of size 63 has fewer basis functions.8 It should be noted that although the original training data is contaminated by 60 dB Gaussian observational noise, we still see that the points in Fig. 6(a) clearly have no noise influence, while all the others do, that is, the attractor in each of the other cases is a little fuzzy. We found that the model of size 63 shows almost the same performance as the model of size 67. That is, the model of size 67 is probably over-fitted. Hence, we conclude that the system considered does not have dynamic noise. This is in agreement with the original model of the differential equations of Chua's circuit.

5.1.2. The model of the annual sunspot numbers

We use the model of the annual sunspot numbers built using the behaviour criterion by [30], which has excellent free-run dynamics. To build the model, the actual annual sunspot numbers over the period 1700–2000 are used. The raw annual sunspot numbers $y_t$ are first transformed using the nonlinear function9 $x_t = 2\sqrt{y_t + 1} - 1$ [31]. The purpose

8 This discussion refers to the colour graphic. Fig. 6(b) shows that most of the blue, pale blue, green and yellow points lie around the centres on either side, and most of the orange and red points lie on the belt. Fig. 6(c) is almost the same as Fig. 6(b).
9 For all models in this section, we follow a suggestion of Tong and apply the transform $x_t = 2\sqrt{y_t + 1} - 1$. When presenting results, we undo this transform, that is, $y'_t = \left(\frac{x'_t + 1}{2}\right)^2 - 1$, where $y'_t$ is the simulated sunspot number.
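The transform of footnote 9 and its inverse are straightforward to implement; a minimal sketch:

```python
import numpy as np

# The variance-stabilizing transform of footnote 9 and its inverse, used to
# undo the transform when presenting simulated sunspot numbers.
def transform(y):
    return 2.0 * np.sqrt(y + 1.0) - 1.0

def inverse_transform(x):
    return ((x + 1.0) / 2.0) ** 2 - 1.0

y = np.array([0.0, 11.0, 154.0])             # sample raw sunspot counts
x = transform(y)
print(np.allclose(inverse_transform(x), y))  # True: the round trip recovers y_t
```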



Fig. 5. The reconstructed attractors of time series of the Chua’s circuit equations and a free-run of the models obtained. In each panel, 5000 data points are plotted. (a) Training data, (b) model size 67 and (c) model size 63.

Fig. 6. (Colour online) The reconstructed attractors using the training data and free-run time series of models obtained. We use all points in the training data as the initial condition, iterate 1000 times and take the last points. In each panel, 5000 data points are plotted. (a) Training data, (b) model size 67, and (c) model size 63.

Fig. 7. Time series of (a) the actual annual sunspot numbers and (b) a part of the training data.

of this transform is to weaken the nonlinearity and also to provide a fair comparison to previously published results. The data is embedded using the non-uniform embedding $X_{t-1} = (x_{t-1}, x_{t-2}, x_{t-4}, x_{t-8}, x_{t-10})$ in 5 dimensions [11] for predicting $x_t$, and the model is built using radial basis functions [30]. The model is

\[
x_{t+1} = f(X_t) + \eta_t, \qquad (11)
\]

where f is the model and $\eta_t$ is Gaussian dynamic noise. We use the free-run time series of the model. Fig. 7 shows the original annual sunspot numbers and the free-run time series of the model obtained by [30], contaminated by 60 dB Gaussian noise, which we use as observational data and as training data for building models. The free-run data of the model is very similar to the original time series of the annual sunspot numbers. We note that the system has dynamic noise

and if it were not for the dynamic noise, the free-run data of the model would be merely periodic.

For building a model using radial basis functions, 3000 data points are used as training data and the up-and-down method using marginal error is applied [23,24]. The data is embedded using the same strategy as the original model, that is, the non-uniform embedding (t − 1, t − 2, t − 4, t − 8, t − 10) in 5 dimensions [19] with the aim of predicting a value at time t. The size of the model obtained as the best model is 16. We apply the NLS method and find the best model again at each noise level. We add Gaussian noise starting from 80 dB in 5 dB steps and use 5 different noise realizations. Fig. 8 shows the mode of the best model size obtained. From 80 dB to 20 dB, the model size is 16, which is the same size as the model obtained using the original training data. From 10 dB to 0 dB, the model


Fig. 8. The mode of the best model size obtained at different additional noise levels for the models of free-run annual sunspot numbers.

size decreases. This behaviour is very similar to that seen in Fig. 3(b). Hence, we conclude that this system has dynamic noise. This is in agreement with the original model of the annual sunspot numbers.

Here, we consider whether we can obtain a good estimate of the dynamic noise level to add to the model built. When we generate free-run data using a model built with radial basis functions, we often add dynamic noise to the model when we expect that the model (system) has dynamic noise. However, we usually do not know the appropriate level of dynamic noise to add to the model. Either too much or too little dynamic noise is likely to lead to wrong free-run behaviour [32]. As a rough standard for the level of dynamic noise, Judd and Mees suggest that 60% of the root-mean-square fitting error (RMS error) of the model is a nominal value which allows some of the residual error to be due to dynamic noise and not observational noise [11]. Hence, we usually expect the dynamic noise level to be between about 60% and 70% of the RMS error. As one such rough standard, we use 65% of the RMS error as the dynamic noise level. However, we do not claim that this level is generic. As a matter of fact, different dynamic noise levels have been suggested: Judd and Mees have used 75% of the RMS error [10], Small and Judd use 10%, 20%, 30% and 60% of the RMS error [8], and Walker and Mees use 5% of the standard deviation of "the original data" (not the RMS error) [33]. Hence, it is important to obtain a good estimate of the dynamic noise level (even if the estimate is rough), because no good guideline is currently available. From Fig. 8 we consider that the model size changes when the additional noise becomes more significant than the dynamic noise included in the model. Hence, we expect the appropriate dynamic noise level to add to the model to be around the noise level at which the model size changes.
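The two rough standards just discussed can be computed directly; a sketch with stand-in data (the signal and the model residuals here are illustrative, not the sunspot model's actual values):

```python
import numpy as np

# Two rough standards for the dynamic noise level: the standard deviation
# of additional noise at a given SNR (in dB), and 65% of the RMS fitting
# error of the model.
def noise_sigma_from_snr(signal, snr_db):
    """Standard deviation of Gaussian noise at a given SNR in dB."""
    return np.std(signal) * 10.0 ** (-snr_db / 20.0)

def dynamic_noise_from_rms(residuals, fraction=0.65):
    """A fraction (here 65%) of the RMS fitting error of the model."""
    return fraction * np.sqrt(np.mean(np.asarray(residuals) ** 2))

rng = np.random.default_rng(2)
signal = 5.7 * rng.standard_normal(3000)       # stand-in training series
residuals = 1.54 * rng.standard_normal(3000)   # stand-in fitting errors
print(noise_sigma_from_snr(signal, 15.0))      # close to 1.0 for this stand-in
print(dynamic_noise_from_rms(residuals))       # close to 1.0 for this stand-in
```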
In this example, the model size changes when the noise level is 15 dB, and the standard deviation of the 15 dB additional noise is 1.014344. We investigate the free-run behaviour using diverse levels of dynamic noise. In Fig. 9, we show the original training data and the ubiquitous behaviour of free-run data generated by the model of size 16 using several levels of dynamic noise, for comparison of the influence of different dynamic noise levels.
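Free-run data with dynamic noise, as used for Fig. 9, can be generated by iterating a fitted model and adding Gaussian noise at each step. A sketch with a hypothetical stand-in model (the actual model uses radial basis functions; only the embedding lags follow the sunspot model):

```python
import numpy as np

# Sketch of generating free-run data by iterating a fitted model f and
# adding Gaussian dynamic noise eta_t at every step.
def free_run(model, seed_values, lags, n_steps, noise_sigma, rng):
    x = list(seed_values)                         # history covering max(lags)
    for _ in range(n_steps):
        X = np.array([x[-lag] for lag in lags])   # current embedded state
        x.append(model(X) + rng.normal(0.0, noise_sigma))
    return np.array(x[len(seed_values):])

def toy_model(X):                                 # hypothetical stand-in for f
    return 0.9 * X[0] - 0.2 * X[1]

lags = (1, 2, 4, 8, 10)                           # non-uniform embedding lags
rng = np.random.default_rng(3)
run = free_run(toy_model, seed_values=[0.0] * 10, lags=lags,
               n_steps=1000, noise_sigma=0.65, rng=rng)
print(run.shape)                                  # (1000,)
```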


The dynamic noise added is Gaussian and the standard deviation10 is 1.522003 (the value used in the original annual sunspot numbers model), 1.1, 0.998863 (65% of the RMS error) and 0.6. Fig. 9(c) shows that the free-run behaviour is stable over long periods; however, as shown in Fig. 9(d), the behaviour is slightly missing in some areas. This would mean that the dynamic noise level is too large. Fig. 9(e) shows that the free-run behaviour is stable over long periods, Fig. 9(f) shows more "sunspot-like" oscillations, and both panels are very similar to the original training data seen in Fig. 9(a) and (b). Fig. 9(g) shows that the free-run behaviour is stable over long periods; however, Fig. 9(h) shows that the behaviour is periodic rather than sunspot-like oscillations. Fig. 9(i) and (j) show that the behaviour is more periodic than that seen in Fig. 9(g) and (h), and the amplitude is globally smaller than that of the training data. In the last two cases, this would mean that the dynamic noise level is too small. From these results we expect that a dynamic noise level of 1.1 would be appropriate for the model built, where 1.1 is not much different from 1.014344, the 15 dB observational noise level. Also, 65% of the RMS error (0.998863) is not much different from 1.1. Hence, we consider that the noise level at which the model size changes, and 65% of the RMS error, are good rough standards for the dynamic noise level.

5.2. Application

In the previous examples, we built models using radial basis functions, where the correct basis functions are not available. From the results, we find that when a system does not have dynamic noise the model tends to be over-fitted, and hence the model size changes immediately as the noise level increases when we apply the NLS method. In other words, the model size changes at a relatively small additional noise level.
When a system has dynamic noise, the model is not over-fitted and hence the model size does not change immediately as the noise level increases. In other words, the model size changes at a relatively large additional noise level. The results are essentially the same as those for the Hénon map, and we can therefore discriminate the systems correctly. We now apply the method to two real-world data sets.

5.2.1. Application to a laser time series

We apply our idea to building models of a laser time series. The laser is a neodymium-doped yttrium aluminum garnet laser. The time series is considered to be chaotic because it has at least one positive Lyapunov exponent associated with its evolution, and a broad and continuous Fourier power spectrum [18]. Fig. 10(a) shows the time series used as training data for building models. For building a model using radial basis functions, 2000 data points are used as the training data and the up-and-down method using marginal error is applied [23,24]. The data is

10 We cannot meaningfully define "SNR" for dynamic noise. Hence, we use the percentage of the signal standard deviation to show how much dynamic noise is added.



Fig. 9. The training data and free-run data of models obtained. Panels (a) and (b): training data, and (c) and (d): dynamic noise level is 1.522003 (this is the same value used in the original annual sunspot numbers model), (e) and (f): dynamic noise level is 1.1, (g) and (h): dynamic noise level is 0.998863 (this is 65% of the RMS error), and (i) and (j): dynamic noise level is 0.6, where the size of the model used is 16 and panels (b), (d), (f), (h) and (j) are enlargements of (a), (c), (e), (g) and (i), respectively.

embedded using the uniform embedding (t − 1, t − 6, t − 11, t − 16, t − 21) in 5 dimensions [19] with the aim of predicting a value at time t. The size of the best model selected is 30. We apply the NLS method and find the best model again at each noise level. We add Gaussian noise starting from 80 dB in 5 dB steps and use 5 different noise realizations. Fig. 10(b) shows the mode of the best model size obtained. From 80 dB to 35 dB, the model size is 30, which is the same size as the model obtained using the original training data, and below 35 dB the model size keeps decreasing. This behaviour is very similar to that seen in Fig. 3(b). Hence, we conclude that the laser system has dynamic noise. This conclusion is in agreement with the

explanation of the laser system published by [18]. According to [18], there is noise associated with the intrinsic quantum fluctuations, and the laser is pumped by a diode laser whose input would have fluctuations even without any other effects; these factors act as dynamic noise.11

Here, based on the result of Section 5.1.2, we investigate the level of dynamic noise to add to the model. The model size changes when the noise level is 30 dB, where the standard deviation of the 30 dB observational noise added in the

11 Although we use the term "dynamic noise", Abarbanel characterizes the noise as high-dimensional unexplained dynamics [18].



NLS method is 1.165343. Hence, we expect the appropriate dynamic noise level to add to the model to be around 1.165343. We investigate the free-run behaviour using diverse levels of dynamic noise. In Fig. 11, we show data in a certain interval after the training data and the ubiquitous behaviour of free-run data generated by the model of size 30, where the dynamic noise added is Gaussian and the standard deviation is 1.1, 0.999956 (65% of the RMS error) and 0.6. Fig. 11(b) shows that the behaviour is qualitatively similar to Fig. 11(a) except for intermittent bursting. This would mean that the dynamic noise level is too large. Fig. 11(c) shows that the behaviour is stable over long periods, and it is very similar to the data seen in Fig. 11(a). This would mean that the dynamic noise level is appropriate. Fig. 11(d) shows that although the behaviour is stable over long periods, it is very poor. This would mean that the dynamic noise level is too small.

Fig. 10. The laser time series used as training data and the mode of the best model size obtained at different additional noise levels for the models of the laser data.

5.2.2. Application to heart rate variability data

In one normal, resting subject the heart rate was monitored for about 20 min using an electrocardiogram (ECG). The instantaneous heart rate is expressed as the interval between consecutive heart beats. This is easily estimated from the ECG as the time between the peaks of consecutive R-waves, the so-called R–R intervals. A total of 1500 R–R intervals were measured from the subject. Fig. 12 shows the time series: Fig. 12(a) is the training data for building models and Fig. 12(b) is a part of Fig. 12(a). As shown in the figure, the behaviour fluctuates greatly, although the subject was at rest. We assume that the data is stationary because it was measured while the subject was physically at rest with eyes closed and was asked not to concentrate on anything. Also, we expect the heart rate data used here to be nonlinear. As a preliminary test, we applied linear surrogate methods, FT, AAFT and IAAFT surrogate data [1,2], to the heart rate data, where the embedding dimension is 5, the time-lag is 2, the number of data points used is 1024, and the slope of the correlation sum [17] (which indicates the local correlation dimension) is used as the discriminating statistic. The result shows that there is a clear difference between the

Fig. 11. A certain interval after the training data and free-run data of the model of size 30. (a): data in a certain interval after the training data, (b): free-run data when dynamic noise level is 1.1, (c): free-run data when dynamic noise level is 0.999956 (this is 65% of the RMS error), and (d): free-run data when dynamic noise level is 0.6.



Fig. 12. Time series of (a) the heart rate data used as training data and (b) a part of the training data.

original data and the surrogates. Hence, we conclude that the heart rate data we use is unlikely to be linear and could be nonlinear.

For building a model using radial basis functions, 1500 data points are used as the training data and the up-and-down method using marginal error is applied [23,24]. The data is embedded using the uniform embedding (t − 1, t − 3, t − 5, t − 7) in 4 dimensions [19] with the aim of predicting a value at time t. The size of the best model selected is 10. We apply the NLS method and find the best model again at each noise level. We add Gaussian noise starting from 80 dB in 5 dB steps and use 5 different noise realizations. Fig. 13 shows the mode of the best model size obtained. From 80 dB to 15 dB the model size does not change, and from 10 dB the model size changes. This behaviour is very similar to that seen in Fig. 3(b). Hence, this result implies that the heart rate system has dynamic noise.12 That is, the heart rate variability seen is not due to the action of only low-dimensional deterministic components of the system.

This is in agreement with the physiology of the cardiovascular system. The heart rate is determined by the activity in the autonomic nerves which supply the sinus node. The activity in these fibres is not only determined by simple feedback from the baroreceptors (pressure sensors) in the cardiovascular system, but is also influenced by inputs from many other systems, including hormonal systems and higher centres such as the cerebral cortex. The normal heart beat is initiated by a group of pacemaker cells, the sinus node, located in the right atrium. In isolation, the sinus node appears to be relatively stable, producing heart beats at a regular rate. This is evident in heart transplant patients, where the nervous innervation of the heart is missing. In these patients the instantaneous heart rate shows much less variation than in normal subjects [34].
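The FT surrogate used in the preliminary test above is the simplest of the three linear surrogate methods; a minimal sketch (phase randomization preserving the amplitude spectrum, and hence the linear autocorrelation, while destroying any nonlinear structure):

```python
import numpy as np

# A minimal FT (phase-randomized) surrogate of a real-valued time series.
def ft_surrogate(x, rng):
    spec = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spec.size)
    surr_spec = np.abs(spec) * np.exp(1j * phases)
    surr_spec[0] = spec[0]                  # keep the mean (DC term) exactly
    if len(x) % 2 == 0:
        surr_spec[-1] = spec[-1]            # keep the Nyquist term real
    return np.fft.irfft(surr_spec, n=len(x))

rng = np.random.default_rng(4)
x = rng.standard_normal(1024)               # stand-in for 1024 R-R intervals
s = ft_surrogate(x, rng)
# the amplitude spectrum is preserved exactly:
print(np.allclose(np.abs(np.fft.rfft(s)), np.abs(np.fft.rfft(x))))   # True
```

A discriminating statistic (such as the slope of the correlation sum) computed on the data and on an ensemble of such surrogates then tests the linear-noise null hypothesis; AAFT and IAAFT additionally constrain the amplitude distribution.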
Also, when patients have severe cardiac disease, the behaviour of the heart rate variability is sometimes very periodic [35]. This implies that an important source of the normal variability is the external forcing of the sinus node by the nervous system. Hence, the normal heart rate variability could be an example of a system where the fluctuations arise as a consequence of a nonlinear deterministic system (the sinus node) being forced by a high-dimensional input (the activity in the nerves innervating the sinus node). Here, based on the result of Section 5.1.2, we investigate the level of dynamic noise to add to the model built. The

12 It may be better to call the dynamic noise "high-dimensional unexplained dynamics input to control the heart" in this case.

Fig. 13. The mode of the best model size obtained at different additional noise levels for the models of the heart rate data.

model size changes when the noise level is 10 dB, where the standard deviation of the 10 dB observational noise added in the NLS method is 0.015866. Hence, we expect the appropriate dynamic noise level to add to the model to be around 0.015866. We investigate the free-run behaviour using diverse levels of dynamic noise. In Fig. 14, we show the ubiquitous behaviour of free-run data generated by the model of size 10, where the dynamic noise added is Gaussian and the standard deviation is 0.013, 0.011009 (65% of the RMS error) and 0.009. Fig. 14(a) shows that the values of the free-run data are completely different from those of the actual heart rate data used as training data. The data drop from the heart rate range to an unphysiological range, that is, they become negative. This would mean that the dynamic noise level is too large. Fig. 14(c) shows that the behaviour is stable over long periods, and Fig. 14(c) and (d) are very similar to the actual heart rate data seen in Fig. 12. This would mean that the dynamic noise level is appropriate. Fig. 14(e) shows that although the behaviour fluctuates and is stable over long periods, the amplitude is globally smaller than that of the training data, and Fig. 14(f) shows that the behaviour is slightly poor. This would mean that the dynamic noise level is too small.

6. Summary and conclusion

This paper has described a method for detecting whether a system has dynamic noise, under the assumption that the observational noise is not large. Also, we found that a noise level between about 60% and 70% of the RMS error, and the noise level at which the model size changes when applying the NLS method, can serve as good rough standards for the dynamic noise level to add to the models.



Fig. 14. Free-run data of the model of size 10. (a) and (b): dynamic noise level is 0.013, (c) and (d): dynamic noise level is 0.011009 (this is 65% of the RMS error), and (e) and (f): dynamic noise level is 0.009, where (b), (d) and (f) are enlargements of (a), (c) and (e), respectively.

Finally, we note that for our approach to succeed, it is important that the observational noise be relatively small. As the results in Section 4.1 show, when the level of observational noise is substantially larger than that of the dynamic noise in the system, the behaviour of the result obtained by applying our idea is almost the same as when the system does not have dynamic noise.

Acknowledgments

We would like to acknowledge Professor H.D.I. Abarbanel (The University of California) for providing us with the neodymium-doped yttrium aluminum garnet laser data. This work was supported by a direct allocation grant from Hong Kong Polytechnic University (A-PG58).

References

[1] J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, J.D. Farmer, Testing for nonlinearity in time series: the method of surrogate data, Physica D 58 (1992) 77–94.
[2] T. Schreiber, A. Schmitz, Improved surrogate data for nonlinearity tests, Phys. Rev. Lett. 77 (1996) 635–638.
[3] T. Schreiber, A. Schmitz, Discrimination power of measures for nonlinearity in a time series, Phys. Rev. E 55 (1997) 5443–5447.

[4] T. Schreiber, A. Schmitz, Surrogate time series, Physica D 142 (2000) 346–382.
[5] J. Bhattacharya, Detection of weak chaos in infant respiration, IEEE Trans. Syst. Man Cybern. 31 (4) (2001) 637–642.
[6] K.T. Dolan, M. Spano, Surrogate for nonlinear time series analysis, Phys. Rev. E 64 (4) (2001) 046128.
[7] J. Theiler, P.E. Rapp, Re-examination of the evidence for low-dimensional, nonlinear structure in the human electroencephalogram, Electroencephalogr. Clin. Neurophysiol. 98 (1996) 213–222.
[8] M. Small, K. Judd, Detecting nonlinearity in experimental data, Internat. J. Bifur. Chaos 8 (6) (1998) 1231–1244.
[9] M. Small, K. Judd, Comparisons of new nonlinear modeling techniques with applications to infant respiration, Physica D 117 (1998) 283–298.
[10] K. Judd, A.I. Mees, On selecting models for nonlinear time series, Physica D 82 (1995) 426–444.
[11] K. Judd, A.I. Mees, Embedding as a modeling problem, Physica D 120 (1998) 273–286.
[12] K. Judd, Building optimal models of time series, in: G. Gouesbet, S. Meunier-Guttin-Cluzel, O. Ménard (Eds.), Chaos and its Reconstruction, Nova Science Pub Inc, New York, 2003, pp. 179–214.
[13] K. Judd, T. Nakamura, Degeneracy of time series models: The best model is not always the correct model, Chaos 16 (3) (2006) 033105.
[14] K.H. Chon, J.K. Kanters, R.J. Cohen, N.-H. Holstein-Rathlou, Detection of chaotic determinism in time series from randomly forced maps, Physica D 99 (1997) 471–486.
[15] J.P.M. Heald, J. Stark, Estimation of noise levels for models of chaotic dynamical systems, Phys. Rev. Lett. 84 (2000) 2366–2369.
[16] M. Hénon, A two-dimensional map with a strange attractor, Commun. Math. Phys. 50 (1976) 69–77.



[17] H. Kantz, T. Schreiber, Nonlinear Time Series Analysis, Cambridge Nonlinear Science Series, vol. 7, Cambridge University Press, Cambridge, 1997.
[18] H.D.I. Abarbanel, Analysis of Observed Chaotic Data, Springer-Verlag, New York, 1996.
[19] F. Takens, Detecting strange attractors in turbulence, Lect. Notes Math. 898 (1981) 366–381.
[20] H. Akaike, A new look at the statistical model identification, IEEE Trans. Automat. Control 19 (1974) 716–723.
[21] J. Rissanen, Stochastic Complexity in Statistical Inquiry, World Scientific, Singapore, 1989.
[22] A.I. Mees, Nonlinear Dynamics and Statistics, Birkhäuser, Boston, 2000.
[23] T. Nakamura, K. Judd, A.I. Mees, Refinements to model selection for nonlinear time series, Internat. J. Bifur. Chaos 13 (5) (2003) 1263–1274.
[24] T. Nakamura, D. Kilminster, K. Judd, A.I. Mees, A comparative study of model selection methods for nonlinear time series, Internat. J. Bifur. Chaos 14 (3) (2004) 1129–1146.
[25] T. Nakamura, Modelling nonlinear time series using selection methods and information criteria, Ph.D. Thesis, School of Mathematics and Statistics, The University of Western Australia, 2004.
[26] T. Nakamura, M. Small, Modelling nonlinear time series using improved least squares method, Internat. J. Bifur. Chaos 16 (2) (2006) 445–464.
[27] T. Nakamura, M. Small, A comparative study of information criteria for model selection, Internat. J. Bifur. Chaos 16 (8) (2006) (in press).
[28] T. Nakamura, Y. Hirata, K. Judd, D. Kilminster, M. Small, Improved parameter estimation from noisy time series for nonlinear dynamical systems, Internat. J. Bifur. Chaos 17 (3) (2007) (in press).

[29] T. Matsumoto, L.O. Chua, M. Komuro, The double scroll, IEEE Trans. Circuits Syst. 32 (1985) 797–818.
[30] D. Kilminster, Modelling dynamical systems via behaviour criterion, Ph.D. Thesis, School of Mathematics and Statistics, The University of Western Australia, 2003.
[31] H. Tong, Non-linear Time Series: A Dynamical Systems Approach, Oxford University Press, Oxford, New York, 1990 (Chapter 7.3), pp. 419–429.
[32] M. Small, C.K. Tse, Applying the method of surrogate data to cyclic time series, Physica D 164 (2002) 187–201.
[33] D. Walker, A.I. Mees, Reconstructing nonlinear dynamics by extended Kalman filtering, Internat. J. Bifur. Chaos 8 (1998) 557–569.
[34] K.E. Sands, M.L. Appel, L.S. Lilly, F.J. Schoen, G.H. Mudge Jr., R.J. Cohen, Power spectrum analysis of heart rate variability in human cardiac transplant recipients, Circulation 79 (1) (1989) 76–82.
[35] A.L. Goldberger, Fractal mechanisms in the electrophysiology of the heart, IEEE Eng. Med. Biol. Mag. 11 (2) (1992) 47–52.
[36] L. Ljung, System Identification: Theory for the User, Prentice Hall Information and System Sciences Series, Prentice Hall, 1999.
[37] G. Schwarz, Estimating the dimension of a model, Ann. Statist. 6 (1978) 461–464.
[38] J. Rissanen, MDL denoising, IEEE Trans. Inform. Theory 46 (7) (2000) 2537–2543.
[39] A.R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inform. Theory 39 (1993) 930–945.
[40] P. McSharry, L.A. Smith, Better nonlinear models from noisy data: Attractors with maximum likelihood, Phys. Rev. Lett. 83 (1999) 4285–4288.