Statistical Methodology 4 (2007) 461–480 www.elsevier.com/locate/stamet
Estimating the Lyapunov exponent from chaotic time series with dynamic noise

Koji Yonemoto a, Takashi Yanagawa b

a Department of Medicine and Clinical Science, Graduate School of Medical Sciences, Kyushu University, Health C&C Center, Kubara 1822-1, Hisayama Town, Kasuya-gun, Fukuoka 811-2501, Japan
b Biostatistics Center, Kurume University, 67 Asahi-machi, Kurume 830-0011, Japan
Received 26 October 2005; received in revised form 23 July 2006; accepted 13 February 2007
Abstract

In this paper, we propose an estimator of the Lyapunov exponent of the skeleton for chaotic time series with dynamic noise, and we prove the consistency of the estimator under some assumptions.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Embedding dimension; Delay time; Nadaraya–Watson kernel type estimator; φ-mixing; Lyapunov exponent
1. Introduction

Methods of analyzing experimental and observational data for evidence of chaos have been applied to data in physics, geology, astronomy, neurobiology, ecology and economics. Among these methods, the Lyapunov exponent often plays a key role in detecting chaos. Conventional methods for estimating the Lyapunov exponent are reliable if the data are abundant, the measurement error is near 0, and the data really come from a deterministic system. However, with limited data or a system subject to non-negligible stochastic perturbations, it is well known that the estimates may be incorrect or ambiguous.

We consider the following non-linear autoregressive system with additive noise,

$X_t = F(X_{t-\tau}, X_{t-2\tau}, \ldots, X_{t-d\tau}) + \varepsilon_t,$  (1)
where $F$ is an unknown non-linear function, $d$ and $\tau$ are unknown positive integers called the
embedding dimension and delay time, respectively, and $\{\varepsilon_t\}$ is a sequence of random noise which is called dynamic noise.

Several methods have been developed to overcome this difficulty. Kostelich and Yorke [9] approximated $F$ by polynomials and separated the signal from the noise, and Pikovsky [13], Landa and Rozenblum [10], Cawley and Hsu [3], and Sauer [14] filtered out the noise by using linear filters. Yao and Tong [17] explored alternative measures for detecting chaos in observational data. We take the same approach as Pikovsky [13], Landa and Rozenblum [10], Cawley and Hsu [3], and Sauer [14], but use kernel-type estimators for filtering out the noise.

Model (1) may be represented as

$\mathbf{X}_t^{(d,\tau)} = \mathbf{F}(\mathbf{X}_{t-\tau}^{(d,\tau)}) + \mathbf{e}_t,$

where $\mathbf{X}_t^{(d,\tau)} = {}^t(X_t, X_{t-\tau}, \ldots, X_{t-(d-1)\tau})$, $\mathbf{F}(\mathbf{x}) = {}^t(F(\mathbf{x}), x_1, x_2, \ldots, x_{d-1})$ for $\mathbf{x} = {}^t(x_1, x_2, \ldots, x_d)$, and $\mathbf{e}_t = {}^t(\varepsilon_t, 0, \ldots, 0)$. McCaffrey et al. [11] proposed a consistent estimator of the stochastic Lyapunov exponent for $\{X_t\}$, defined by

$\lambda = \lim_{t \to \infty} \frac{1}{t} \log \|J_{t-1} \cdot J_{t-2} \cdots J_0\|,$  (2)

where $J_t = D\mathbf{F}(\mathbf{X}_t)$, and claimed that any system with a positive stochastic Lyapunov exponent is chaotic, whether or not it is purely deterministic. Whang and Linton [16] showed asymptotic normality of the estimator. However, Dennis et al. [6] showed that the stochastic Lyapunov exponent (2) can be positive when the Lyapunov exponent of the underlying deterministic model is negative, and vice versa, and claimed that "positive stochastic Lyapunov exponents should not be viewed as a hallmark of chaos". We go along with them. We define the skeleton of $\{X_t\}$ as

$\mathbf{Y}_t^{(d,1)} = \mathbf{F}(\mathbf{Y}_{t-1}^{(d,1)}),$

where $\mathbf{Y}_t^{(d,1)} = {}^t(Y_t, Y_{t-1}, \ldots, Y_{t-d+1})$. Our goal is to estimate the Lyapunov exponent of $\{Y_t\}$ by observing $\{X_t\}$. Our method consists of three steps. First, estimate the embedding dimension and delay time. Second, estimate the skeleton by the Nadaraya–Watson kernel-type estimator using the embedding dimension and delay time estimated in the first step. Last, generate data from the estimated skeleton and estimate the Lyapunov exponent by using the generated data and the partial derivatives of the estimated skeleton. The consistency of the proposed estimator is proved.

The present paper is organized as follows. We propose a method for estimating the Lyapunov exponent in Section 2. In Section 3, the consistency of the estimator is proved. In Section 4, the behavior of the procedure is evaluated numerically.

2. Estimator of the Lyapunov exponent

2.1. Basic definitions

We consider $\{X_t\}_{t=1,2,\ldots,N}$ generated from

$X_t = F(X_{t-\tau_0}, X_{t-2\tau_0}, \ldots, X_{t-d_0\tau_0}) + \varepsilon_t,$  (3)
where $d_0$ and $\tau_0$ are unknown positive integers and $F : \mathbb{R}^{d_0} \to \mathbb{R}$ is an unknown non-linear function such that $\{Y_t\}$ is ergodic, where $Y_t = F(Y_{t-1}, Y_{t-2}, \ldots, Y_{t-d_0})$. We assume that $\{X_t\}$ is a discrete-time strictly stationary time series with $E X_t^2 < \infty$ and that for any positive integer $t$,

$E[\varepsilon_t \mid \mathcal{A}_1^{t-1}(X)] = 0$  almost surely,  (4)
and

$E[\varepsilon_t^2 \mid \mathcal{A}_1^{t-1}(X)] = \sigma^2 \quad (\sigma > 0),$  almost surely,

where $\mathcal{A}_s^t(X)$ denotes the sigma algebra generated by $(X_s, \ldots, X_t)$. It follows that $F(X_{t-\tau_0}, \ldots, X_{t-d_0\tau_0}) = E[X_t \mid X_{t-\tau_0}, \ldots, X_{t-d_0\tau_0}]$.

The embedding dimension and delay time are defined as follows.

Definition 2.1. The time series $\{X_t\}$ is said to have the embedding dimension $d_0$ and the delay time $\tau_0$ if and only if there exist non-negative integers $d_0 < \infty$ and $\tau_0 < \infty$ such that

$E[X_t \mid X_{t-\tau}, X_{t-2\tau}, \ldots, X_{t-d\tau}] \neq E[X_t \mid X_{t-\tau_0}, X_{t-2\tau_0}, \ldots, X_{t-d_0\tau_0}]$  a.e.  (5)

for any $d < d_0$ and any $\tau > 0$, and

$E[X_t \mid X_{t-\tau}, X_{t-2\tau}, \ldots, X_{t-d\tau}] = E[X_t \mid X_{t-\tau_0}, X_{t-2\tau_0}, \ldots, X_{t-d_0\tau_0}]$  a.e.  (6)

for any $(d, \tau) \in B(d_0, \tau_0)$, where $B(d_0, \tau_0) = \{(d, \tau) \mid \{\tau_0, 2\tau_0, \ldots, d_0\tau_0\} \subset \{\tau, 2\tau, \ldots, d\tau\}\}$. For example, if $d_0 = 2$ and $\tau_0 = 2$, then both $(d, \tau) = (2, 2)$ and $(d, \tau) = (4, 1)$ belong to $B(2, 2)$, since $\{2, 4\} \subset \{2, 4\}$ and $\{2, 4\} \subset \{1, 2, 3, 4\}$.

Model (3) may be represented as

$\mathbf{X}_t^{(d_0,\tau_0)} = \mathbf{F}(\mathbf{X}_{t-\tau_0}^{(d_0,\tau_0)}) + \mathbf{e}_t,$  (7)

where $\mathbf{F}(\mathbf{x}) = {}^t(F(\mathbf{x}), x_1, \ldots, x_{d_0-1})$ for $\mathbf{x} = {}^t(x_1, x_2, \ldots, x_{d_0}) \in \mathbb{R}^{d_0}$, $\mathbf{X}_t^{(d_0,\tau_0)} = {}^t(X_t, X_{t-\tau_0}, \ldots, X_{t-(d_0-1)\tau_0})$, and $\mathbf{e}_t = {}^t(\varepsilon_t, 0, \ldots, 0)$.
The skeleton of model (7) is defined as follows.

Definition 2.2 (Skeleton). We refer to the following model as the skeleton of model (7):

$\mathbf{Y}_t^{(d_0,1)} = \mathbf{F}(\mathbf{Y}_{t-1}^{(d_0,1)}),$  (8)

where $\mathbf{Y}_t^{(d_0,1)} = {}^t(Y_t, Y_{t-1}, \ldots, Y_{t-d_0+1})$.

Since $\{\mathbf{Y}_t^{(d_0,1)}\}$ is ergodic by assumption, the Lyapunov exponent of the skeleton (8) is defined as follows.

Definition 2.3 (Lyapunov Exponent). The Lyapunov exponent $\lambda$ of the skeleton (8) is defined as

$\lambda = \lim_{M \to \infty} \frac{1}{M} \log \bigl\| D\mathbf{F}(\mathbf{Y}_M^{(d_0,1)}) D\mathbf{F}(\mathbf{Y}_{M-1}^{(d_0,1)}) \cdots D\mathbf{F}(\mathbf{Y}_1^{(d_0,1)}) \bigr\|,$

where $D\mathbf{F}(\mathbf{Y}_t^{(d_0,1)})$ is the matrix of partial derivatives of the map $\mathbf{F} : \mathbb{R}^{d_0} \to \mathbb{R}^{d_0}$ evaluated at $\mathbf{Y}_t^{(d_0,1)}$, and

$\|T\| = \sup_{\|\mathbf{x}\|=1} \|T\mathbf{x}\|, \qquad \|\mathbf{x}\| = \Bigl(\sum_{i=1}^{d_0} x_i^2\Bigr)^{1/2}$

for a $d_0 \times d_0$ matrix $T$ and $\mathbf{x} = {}^t(x_1, x_2, \ldots, x_{d_0}) \in \mathbb{R}^{d_0}$.
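To make Definition 2.3 concrete, the following sketch computes the Lyapunov exponent of a known skeleton by iterating the map and accumulating the log-norms of the running Jacobian product, renormalizing at each step so that the product never overflows. Python with numpy is our own illustrative choice, as are the map (the noise-free cosine map of Section 4, whose exponent is about 0.49), the initial vector and the orbit lengths; numpy's matrix 2-norm (the largest singular value) coincides with the operator norm $\|T\|$ defined above.

    import numpy as np

    def lyapunov_exponent(jac, orbit):
        # (1/M) log || DF(Y_M) DF(Y_{M-1}) ... DF(Y_1) ||.  Renormalizing the
        # running product and accumulating log s_k leaves the value unchanged:
        # log||J_M ... J_1|| = sum_k log s_k + log||P||, with ||P|| = 1 after
        # each normalization.
        P = np.eye(len(orbit[0]))
        log_norm = 0.0
        for y in orbit:
            P = jac(y) @ P
            s = np.linalg.norm(P, 2)       # matrix 2-norm = operator norm
            log_norm += np.log(s)
            P /= s
        return log_norm / len(orbit)

    # Skeleton of the cosine map of Section 4 (d0 = 2, delay 1):
    # F(y1, y2) = (cos(2.8 y1) + 0.3 y2, y1).
    def F(y):
        return np.array([np.cos(2.8 * y[0]) + 0.3 * y[1], y[0]])

    def jac(y):
        return np.array([[-2.8 * np.sin(2.8 * y[0]), 0.3],
                         [1.0, 0.0]])

    y = np.array([0.5, 0.5])
    for _ in range(1000):                  # discard a transient
        y = F(y)
    orbit = []
    for _ in range(100000):
        y = F(y)
        orbit.append(y)
    print(lyapunov_exponent(jac, orbit))   # roughly 0.49 (cf. Section 4.1)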
2.2. The procedure for estimating the Lyapunov exponent

Suppose that $X_1, X_2, \ldots, X_N$ are observed. We propose a procedure for estimating the Lyapunov exponent of the skeleton from $\{X_t\}_{t=1,2,\ldots,N}$. An outline of the procedure is as follows.

(1) Estimate the embedding dimension $d_0$ and delay time $\tau_0$ from $\{X_t\}_{t=1,2,\ldots,N}$ by the procedure proposed by Yonemoto and Yanagawa [18]. Denote the estimated embedding dimension and delay time by $\hat{d}_0$ and $\hat{\tau}_0$, respectively.
(2) Estimate the skeleton from $\{X_t\}_{t=1,2,\ldots,N}$ by the Nadaraya–Watson kernel-type estimator using $\hat{d}_0$ and $\hat{\tau}_0$, and generate $\{\hat{Y}_{N,t}\}_{t=1,2,\ldots,M}$ from the estimated skeleton by giving an appropriate initial vector.
(3) Estimate the Lyapunov exponent by using the generated data and the partial derivatives of the estimated skeleton.

The details of each step are given in the following subsections.

2.3. Estimating the embedding dimension and delay time

Let $\{X_1, \ldots, X_N\}$ be a set of observed data. For positive integers $d$, $\tau$ and $L \geq d\tau$, put

$CV(d, \tau) = \frac{1}{N - L + 1} \sum_{t=L}^{N} \bigl(X_t - \hat{F}_{\setminus t}^{(d,\tau)}(X_{t-\tau}, \ldots, X_{t-d\tau})\bigr)^2,$

where $\hat{F}_{\setminus t}^{(d,\tau)}$ denotes the Nadaraya–Watson kernel-type estimated regression function [12,15] with the $t$-th point deleted, that is,

$\hat{F}_{\setminus t}^{(d,\tau)}(\mathbf{z}) = \frac{1}{N - L} \sum_{s=L,\, s\neq t}^{N} X_s K_{d,h}\bigl(\mathbf{z} - (X_{s-\tau}, \ldots, X_{s-d\tau})\bigr)\,\bigl(\hat{f}_{\setminus t}^{(d,\tau)}(\mathbf{z})\bigr)^{-1}$

for $\mathbf{z} = (z_1, z_2, \ldots, z_d)$, where the summation over $s$ omits $t$ in each case, and

$\hat{f}_{\setminus t}^{(d,\tau)}(\mathbf{z}) = \frac{1}{N - L} \sum_{s=L,\, s\neq t}^{N} K_{d,h}\bigl(\mathbf{z} - (X_{s-\tau}, \ldots, X_{s-d\tau})\bigr),$

and

$K_{d,h}(\mathbf{z}) = \frac{1}{h_{d,N}^{d}} \prod_{i=1}^{d} K\Bigl(\frac{z_i}{h_{d,N}}\Bigr),$  (9)

where $K : \mathbb{R} \to \mathbb{R}$ is taken as the density function of a standard normal distribution in this paper, that is, $K$ is a Gaussian kernel. Fueda and Yanagawa [7] proposed the following procedure for estimating the embedding dimension and delay time and showed the consistency of their estimator.

(F-Y1) Give sufficiently large integers $D$ ($\geq d_0$) and $T$ ($\geq \tau_0$), and set $L = DT$. Set $h_{d,N} = h_{d,N}(c) = c \times N^{-1/(2d+1)}$. For each $d \in \{1, 2, \ldots, D\}$ and $\tau \in \{1, 2, \ldots, T\}$, compute $CV(d, \tau)$.
(F-Y2) For each $\tau \in \{1, 2, \ldots, T\}$, minimize $CV(d, \tau)$ with respect to $d \in \{1, 2, \ldots, D\}$, and denote the minimizer as $\hat{d}_0(\tau)$.
(F-Y3) Find $\hat{d}_0 = \min_\tau \hat{d}_0(\tau)$ and $\hat{\tau}_0 = \operatorname{argmin}_\tau \hat{d}_0(\tau)$.
However, the procedure often fails in practice. Yonemoto and Yanagawa [18] investigated the cause of this and modified the procedure to obtain good estimates from finite data. The modified procedure is as follows.

(1) Give sufficiently large integers $D$ ($\geq d_0$) and $T$ ($\geq \tau_0$), and set $L = DT$. Set $h_{d,N} = h_{d,N}(c) = c \times N^{-1/(2d+1)}$.
(2) For each $d \in \{1, 2, \ldots, D\}$ and $\tau \in \{1, 2, \ldots, T\}$, minimize $CV(d, \tau)|_{h_{d,N}=h_{d,N}(c)}$ with respect to $c \in C$ by the method illustrated below, and put

$\widehat{CV}(d, \tau) = \min_{c \in C} CV(d, \tau)|_{h_{d,N}=h_{d,N}(c)}.$

(3) Then select the $d \in \{1, 2, \ldots, D\}$ and $\tau \in \{1, 2, \ldots, T\}$ that attain the 'minimum' value of $\{\widehat{CV}(d, \tau) : d \in \{1, 2, \ldots, D\},\, \tau \in \{1, 2, \ldots, T\}\}$ as the estimators $\hat{d}_0$ and $\hat{\tau}_0$ of the embedding dimension and delay time, based on the procedure given below.

The minimization of $CV(d, \tau)|_{h_{d,N}=h_{d,N}(c)}$ with respect to $c \in C$ proceeds as follows (a code sketch is given at the end of this subsection).

(a) Give a large real number $c_{\max}$, and for each $d \in \{1, 2, \ldots, D\}$ and $\tau \in \{1, 2, \ldots, T\}$, compute $CV(d, \tau)|_{h_{d,N}=h_{d,N}(c)}$ and $CV(d, \tau)|_{h_{d,N}=h_{d,N}(c+0.1)}$ starting from $c = 0.1$ and compare these values; if $c < c_{\max}$ and $CV(d, \tau)|_{h_{d,N}=h_{d,N}(c)} \geq CV(d, \tau)|_{h_{d,N}=h_{d,N}(c+0.1)}$, then put $c = c + 0.1$ and repeat the computation; otherwise stop and set $c_1 = c$.
(b) In the neighborhood of $c_1$, find $c_2 = \operatorname{argmin}_c CV(d, \tau)|_{h_{d,N}=h_{d,N}(c)}$ over the class $C_1 = \{c_1 - 0.09, c_1 - 0.08, \ldots, c_1 + 0.09\}$.
(c) Furthermore, in the neighborhood of $c_2$, compute $\widehat{CV}(d, \tau) = \min\{CV(d, \tau)|_{h_{d,N}=h_{d,N}(c)} : c \in C_2\}$, where $C_2 = \{c_2 - 0.009, c_2 - 0.008, \ldots, c_2 + 0.009\}$.

The procedure for selecting $\hat{d}_0$ and $\hat{\tau}_0$ is as follows.

(d) Put $CV^*(\tau) = \min\{\widehat{CV}(d, \tau) : d \in \{1, 2, \ldots, D\}\}$ for each $\tau \in \{1, 2, \ldots, T\}$, and for given $\varepsilon > 0$ find

$\hat{d}(\tau) = \min\{d : |\widehat{CV}(d, \tau) - CV^*(\tau)| < \varepsilon\}.$

(e) Next put $CV^* = \min\{\widehat{CV}(\hat{d}(\tau), \tau) \mid \tau \in \{1, 2, \ldots, T\}\}$, and then find

$\hat{d}_0 = \min\{\hat{d}(\tau) : |\widehat{CV}(\hat{d}(\tau), \tau) - CV^*| < \varepsilon\}$

and

$\hat{\tau}_0 = \operatorname{argmin}\{\hat{d}(\tau) : |\widehat{CV}(\hat{d}(\tau), \tau) - CV^*| < \varepsilon\}.$

The smallest $\hat{\tau}_0$ is employed when $\hat{\tau}_0$ is not unique.

The selection of $\varepsilon$ is important in the above procedure. We have no answer for its optimal selection. Yonemoto and Yanagawa [18] examined the performance of the procedure when $\varepsilon$ is $CV^*/10$ and $CV^*/20$ by simulation and recommended using $\varepsilon = CV^*/20$.
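The following is a minimal sketch of the leave-one-out criterion $CV(d, \tau)$ and of the coarse-to-fine search in steps (a)–(c), assuming the Gaussian product kernel (9); the normalizing constants of the kernel cancel in the Nadaraya–Watson ratio and are omitted. The function names, the default $c_{\max}$ and the brute-force $O(N^2)$ loop are our own illustrative choices.

    import numpy as np

    def cv_score(x, d, tau, c, L):
        # CV(d, tau) with the t-th point deleted in turn and bandwidth
        # h = c * N**(-1/(2d+1)); x holds X_1, ..., X_N in a 0-based array.
        N = len(x)
        assert L > d * tau                 # every lag X_{t - d*tau} must exist
        h = c * N ** (-1.0 / (2 * d + 1))
        idx = np.arange(L - 1, N)          # 0-based positions of t = L, ..., N
        Z = np.column_stack([x[idx - j * tau] for j in range(1, d + 1)])
        y = x[idx]
        err = 0.0
        for i in range(len(idx)):
            w = np.exp(-0.5 * np.sum(((Z - Z[i]) / h) ** 2, axis=1))
            w[i] = 0.0                     # delete the current point
            err += (y[i] - np.dot(w, y) / w.sum()) ** 2
        return err / len(idx)

    def cv_hat(x, d, tau, L, c_max=2.0):
        # steps (a)-(c): walk in steps of 0.1 until CV stops decreasing,
        # then refine on the grids C1 (step 0.01) and C2 (step 0.001)
        c = 0.1
        while c < c_max and cv_score(x, d, tau, c, L) >= cv_score(x, d, tau, c + 0.1, L):
            c += 0.1
        c2 = min((c + 0.01 * k for k in range(-9, 10)),
                 key=lambda u: cv_score(x, d, tau, u, L))
        c3 = min((c2 + 0.001 * k for k in range(-9, 10)),
                 key=lambda u: cv_score(x, d, tau, u, L))
        return cv_score(x, d, tau, c3, L)

Computing cv_hat(x, d, tau, L) for every $d \in \{1, \ldots, D\}$ and $\tau \in \{1, \ldots, T\}$ with $L = DT$ then yields the values $\widehat{CV}(d, \tau)$ used in steps (d) and (e).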
Fig. 1. Plots of $(X_t, X_{t+1})$ $(t = 1, 2, \ldots, 1000)$.
2.4. Estimating the skeleton

Suppose that $X_1, X_2, \ldots, X_N$ are observed. Then we estimate $F(\mathbf{x})$ by

$\hat{F}_N(\mathbf{x}) = \dfrac{\sum_{i=(\hat{d}_0-1)\hat{\tau}_0+1}^{N-\hat{\tau}_0} K_{\hat{d}_0,h}\bigl(\mathbf{x} - \mathbf{X}_i^{(\hat{d}_0,\hat{\tau}_0)}\bigr)\, X_{i+\hat{\tau}_0}}{\sum_{i=(\hat{d}_0-1)\hat{\tau}_0+1}^{N-\hat{\tau}_0} K_{\hat{d}_0,h}\bigl(\mathbf{x} - \mathbf{X}_i^{(\hat{d}_0,\hat{\tau}_0)}\bigr)}.$

For $\mathbf{x} = {}^t(x_1, x_2, \ldots, x_{\hat{d}_0})$, put $\hat{\mathbf{F}}_N(\mathbf{x}) = {}^t(\hat{F}_N(\mathbf{x}), x_1, \ldots, x_{\hat{d}_0-1})$.

Given an appropriate initial vector $\hat{\mathbf{Y}}_{N,0}^{(\hat{d}_0,1)}$, we generate $\{\hat{Y}_{N,t}\}_{t=1,2,\ldots,M}$ by

$\hat{\mathbf{Y}}_{N,t}^{(\hat{d}_0,1)} = \hat{\mathbf{F}}_N(\hat{\mathbf{Y}}_{N,t-1}^{(\hat{d}_0,1)}), \quad t = 1, 2, \ldots, M,$  (10)

where $\hat{\mathbf{Y}}_{N,t}^{(\hat{d}_0,1)} = {}^t(\hat{Y}_{N,t}, \hat{Y}_{N,t-1}, \ldots, \hat{Y}_{N,t-\hat{d}_0+1})$. Note that $K_{\hat{d}_0,h}(\mathbf{x})$ is defined in Section 2.3 with $d = \hat{d}_0$.

Example 1. Fig. 1(a) shows the plots of $(X_t, X_{t+1})$ where $\{X_t\}$ is generated from $X_t = \cos(2.8X_{t-1}) + 0.3X_{t-2}$, and Fig. 1(b) shows the plots of $(X_t, X_{t+1})$ for $\{X_t\}$ generated from $X_t = \cos(2.8X_{t-1}) + 0.3X_{t-2} + \varepsilon_t$ with $\varepsilon_t \sim N(0, 0.02^2)$. Fig. 2 shows the plots of $(\hat{Y}_{N,t}, \hat{Y}_{N,t+1})$ when $N = 4000$. The figure shows that the procedure reproduced the skeleton fairly well.

2.5. Estimating the Lyapunov exponent

We propose the following estimator of the Lyapunov exponent:

$\hat{\lambda}_{N,M} = \frac{1}{M} \log \bigl\| D\hat{\mathbf{F}}_N(\hat{\mathbf{Y}}_{N,M}^{(\hat{d}_0,1)}) D\hat{\mathbf{F}}_N(\hat{\mathbf{Y}}_{N,M-1}^{(\hat{d}_0,1)}) \cdots D\hat{\mathbf{F}}_N(\hat{\mathbf{Y}}_{N,1}^{(\hat{d}_0,1)}) \bigr\|,$

where $D\hat{\mathbf{F}}_N$ is the matrix of partial derivatives of $\hat{\mathbf{F}}_N$.
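Steps (2) and (3) of the procedure can be sketched as follows, again assuming the Gaussian kernel. The gradient of $\hat{F}_N$ is obtained by differentiating the ratio of kernel sums, and the log-norm of the Jacobian product is accumulated with the same renormalization as in the sketch following Definition 2.3; the function names, the bandwidth constant and the initial vector in the usage comment are our own illustrative choices.

    import numpy as np

    def fit_skeleton(x, d, tau, c):
        # Nadaraya-Watson estimate F_hat of F and its gradient, built from
        # design vectors X_i = (X_i, X_{i-tau}, ..., X_{i-(d-1)tau}) and
        # responses X_{i+tau} (0-based indexing; kernel constants cancel).
        N = len(x)
        h = c * N ** (-1.0 / (2 * d + 1))
        idx = np.arange((d - 1) * tau, N - tau)
        Z = np.column_stack([x[idx - j * tau] for j in range(d)])
        y = x[idx + tau]

        def weights(z):
            return np.exp(-0.5 * np.sum(((Z - z) / h) ** 2, axis=1))

        def F_hat(z):
            w = weights(z)
            return np.dot(w, y) / w.sum()

        def F_grad(z):
            # gradient of S1/S0, where S0 = sum_i w_i, S1 = sum_i w_i y_i
            # and dw_i/dz = w_i (Z_i - z) / h^2 for the Gaussian kernel
            w = weights(z)
            D = (Z - z) / h ** 2
            S0, S1 = w.sum(), np.dot(w, y)
            dS0, dS1 = D.T @ w, D.T @ (w * y)
            return (dS1 * S0 - S1 * dS0) / S0 ** 2

        return F_hat, F_grad

    def lyapunov_estimate(F_hat, F_grad, y0, M):
        # generate Y_hat_t by (10) and accumulate the log-norms of the
        # Jacobian product of bold F_hat(x) = (F_hat(x), x_1, ..., x_{d-1})
        d = len(y0)
        y = np.asarray(y0, dtype=float)
        P, log_norm = np.eye(d), 0.0
        for _ in range(M):
            y = np.concatenate(([F_hat(y)], y[:-1]))   # next delay vector
            J = np.zeros((d, d))
            J[0, :] = F_grad(y)                        # first row: grad F_hat
            J[1:, :-1] = np.eye(d - 1)                 # remaining rows: shift
            P = J @ P
            s = np.linalg.norm(P, 2)
            log_norm += np.log(s)
            P /= s
        return log_norm / M

    # Hypothetical usage with estimated d0_hat = 2, tau0_hat = 1 and data x:
    # F_hat, F_grad = fit_skeleton(x, d=2, tau=1, c=0.1)
    # lam_hat = lyapunov_estimate(F_hat, F_grad, y0=(x[1], x[0]), M=1000)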
Fig. 2. Plots of $(\hat{Y}_{4000,t}, \hat{Y}_{4000,t+1})$ $(t = 1, 2, \ldots, 1000)$.
3. Consistency of the proposed estimator

In this section, we prove the consistency of the estimator. The consistency of the estimators of the embedding dimension and delay time is proved in [7]. Hence in this section we suppose that the embedding dimension and delay time are known.

Theorem 3.1. Under the assumptions given in the following subsection, it follows that, for any $\varepsilon > 0$,

$\lim_{M\to\infty} \lim_{N\to\infty} P(|\hat{\lambda}_{N,M} - \lambda| > \varepsilon) = 0.$
3.1. Assumptions

We make the following assumptions for Theorem 3.1.

Assumption 3.1. There exists a compact absorbing set $G \subset \mathbb{R}^{d_0}$ for $\{\mathbf{X}_t^{(d_0,\tau_0)}\}$, i.e., for all $t \geq (d_0 - 1)\tau_0 + 1$,

$\mathbf{X}_t^{(d_0,\tau_0)} \in G$ a.s. whenever $\mathbf{X}_{(d_0-1)\tau_0+1}^{(d_0,\tau_0)}, \mathbf{X}_{(d_0-1)\tau_0+2}^{(d_0,\tau_0)}, \ldots, \mathbf{X}_{d_0\tau_0}^{(d_0,\tau_0)} \in G,$

where $\mathbf{X}_t^{(d_0,\tau_0)} = {}^t(X_t, X_{t-\tau_0}, \ldots, X_{t-(d_0-1)\tau_0})$.
Assumption 3.2. $F$ is of class $C^1$ on $G$.

Assumption 3.3. $\mathbf{X}_{(d_0-1)\tau_0+1}^{(d_0,\tau_0)}, \mathbf{X}_{(d_0-1)\tau_0+2}^{(d_0,\tau_0)}, \ldots, \mathbf{X}_{d_0\tau_0}^{(d_0,\tau_0)} \in G$ a.s.

Assumption 3.4. There exists $\gamma > 0$ such that

$P(\mathbf{X}_t^{(d_0,\tau_0)} \in B) \geq \gamma \mu(B)$

for any $B \in \mathcal{B}_G$ and $t \in \{(d_0-1)\tau_0+1, (d_0-1)\tau_0+2, \ldots, N\}$, where $\mu$ is the Lebesgue measure and $\mathcal{B}_G$ is the Borel sigma algebra on $G$.

Assumption 3.5. There exists $\Gamma < \infty$ such that

$P(\mathbf{X}_t^{(d_0,\tau_0)} \in B) \leq \Gamma \mu(B)$

for any $B \in \mathcal{B}_G$ and $t \in \{(d_0-1)\tau_0+1, (d_0-1)\tau_0+2, \ldots, N\}$.
Assumption 3.6. $\{X_t\}$ is $\phi$-mixing, that is, for any positive integers $k$, $n$ and $N$ such that $k \geq (d_0-1)\tau_0+1$, $n \geq 1$ and $k + n \leq N - \tau_0$, there exists $\phi_n \downarrow 0$ such that

$|P(A \cap B) - P(A)P(B)| \leq \phi_n P(A)$

for any $A \in \mathcal{F}_{(d_0-1)\tau_0+1}^{k}$ and $B \in \mathcal{F}_{k+n}^{N-\tau_0}$, where $\mathcal{F}_s^t$ is the $\sigma$-algebra generated by $(\mathbf{X}_s^{(d_0,\tau_0)}, X_{s+\tau_0}), (\mathbf{X}_{s+1}^{(d_0,\tau_0)}, X_{s+1+\tau_0}), \ldots, (\mathbf{X}_t^{(d_0,\tau_0)}, X_{t+\tau_0})$.
Assumption 3.7. For some monotone increasing sequence $\{m_N \in \mathbb{Z}\}$ such that $1 \leq m_N \leq N - d_0\tau_0 - 1$ for any integer $N > d_0\tau_0 + 1$, the bandwidth $h_{d_0,N}$ given in (9) and $\phi_n$ satisfy the following conditions: there exists $A > 0$ such that

$N \phi_{m_N} \leq A$ for any integer $N > d_0\tau_0 + 1$,

and furthermore

$\dfrac{N h_{d_0,N}^{d_0}}{m_N \log N} \to \infty$ as $N \to \infty$.
Remark 3.1. Assumptions 3.4–3.7 are mathematically involved, but they are needed for the consistency of the kernel estimator, which will be shown numerically to work well below.

For $N \in \mathbb{N}$ such that $N \geq d_0\tau_0 + 1$, $t \in \{1, 2, \ldots, M\}$ and $i \in \{1, 2, \ldots, d_0\}$, let

$Z_{N,t,i} = (X_1, X_2, \ldots, X_N, \hat{Y}_{N,t}, \hat{Y}_{N,t-1}, \ldots, \hat{Y}_{N,t-i+2}, \hat{Y}_{N,t-i}, \ldots, \hat{Y}_{N,t-d_0+1}),$

let $f(y \mid Z_{N,t,i})$ be a conditional probability density function of $\hat{Y}_{N,t-i+1}$ given $Z_{N,t,i}$, and let $G_{N,t,i}$ be the probability measure of $Z_{N,t,i}$. For integers $N \geq d_0\tau_0 + 1$, $t \in \{1, 2, \ldots, M\}$ and $i \in \{1, 2, \ldots, d_0\}$, put

$W_{N,t,i}(x) = (\hat{Y}_{N,t}, \hat{Y}_{N,t-1}, \ldots, \hat{Y}_{N,t-i+2}, x, \hat{Y}_{N,t-i}, \ldots, \hat{Y}_{N,t-d_0+1}).$

We use the following lemmas in the proof of Theorem 3.5.

Lemma 3.1. For given $Z_{N,t,i}$, if

$\Bigl\{x \,\Big|\, \tfrac{\partial}{\partial x_i}(F(\mathbf{x}) - \hat{F}_N(\mathbf{x}))\big|_{\mathbf{x}=W_{N,t,i}(x)} \cdot I_{\{W_{N,t,i}(x)\in G\}} > 0\Bigr\} \neq \emptyset,$

then under Assumption 3.2 there exist an integer $m^{(+)}(Z_{N,t,i})$ and a finite collection $\{g_k^{(+)}(Z_{N,t,i})\}_{k=1,2,\ldots,m^{(+)}(Z_{N,t,i})}$ such that each $g_k^{(+)}(Z_{N,t,i})$ is a non-empty, connected and relatively open set in $\{x \mid I_{\{W_{N,t,i}(x)\in G\}} = 1\}$,

$\Bigl\{x \,\Big|\, \tfrac{\partial}{\partial x_i}(F(\mathbf{x}) - \hat{F}_N(\mathbf{x}))\big|_{\mathbf{x}=W_{N,t,i}(x)} \cdot I_{\{W_{N,t,i}(x)\in G\}} > 0\Bigr\} = \bigcup_{k=1}^{m^{(+)}(Z_{N,t,i})} g_k^{(+)}(Z_{N,t,i}),$

and $g_j^{(+)}(Z_{N,t,i}) \cap g_k^{(+)}(Z_{N,t,i}) = \emptyset$ for $j \neq k$.
Proof. By Assumption 3.2 and the definition of $\hat{F}_N$, $\frac{\partial}{\partial x_i}(F(\mathbf{x}) - \hat{F}_N(\mathbf{x}))$ is continuous on $G$. Hence $\{x \mid \frac{\partial}{\partial x_i}(F(\mathbf{x}) - \hat{F}_N(\mathbf{x}))|_{\mathbf{x}=W_{N,t,i}(x)} \cdot I_{\{W_{N,t,i}(x)\in G\}} > 0\}$ is a relatively open set in $\{x \mid I_{\{W_{N,t,i}(x)\in G\}} = 1\}$. Since $G$ is a compact set, $\{x \mid I_{\{W_{N,t,i}(x)\in G\}} = 1\}$ is a compact set, too. Let $g_k^{(+)}(Z_{N,t,i})$ ($k = 1, 2, \ldots, m^{(+)}(Z_{N,t,i})$) be non-empty, connected and relatively open sets in $\{x \mid I_{\{W_{N,t,i}(x)\in G\}} = 1\}$ satisfying

$\Bigl\{x \,\Big|\, \tfrac{\partial}{\partial x_i}(F(\mathbf{x}) - \hat{F}_N(\mathbf{x}))\big|_{\mathbf{x}=W_{N,t,i}(x)} \cdot I_{\{W_{N,t,i}(x)\in G\}} > 0\Bigr\} = \bigcup_{k=1}^{m^{(+)}(Z_{N,t,i})} g_k^{(+)}(Z_{N,t,i})$

with $g_j^{(+)}(Z_{N,t,i}) \cap g_k^{(+)}(Z_{N,t,i}) = \emptyset$ for $j \neq k$. If $m^{(+)}(Z_{N,t,i}) = \infty$, then this contradicts the compactness of $\{x \mid I_{\{W_{N,t,i}(x)\in G\}} = 1\}$. Hence the assertion follows. □

Lemma 3.2. For given $Z_{N,t,i}$, if

$\Bigl\{x \,\Big|\, \tfrac{\partial}{\partial x_i}(F(\mathbf{x}) - \hat{F}_N(\mathbf{x}))\big|_{\mathbf{x}=W_{N,t,i}(x)} \cdot I_{\{W_{N,t,i}(x)\in G\}} < 0\Bigr\} \neq \emptyset,$

then under Assumption 3.2 there exist an integer $m^{(-)}(Z_{N,t,i})$ and a finite collection $\{g_k^{(-)}(Z_{N,t,i})\}_{k=1,2,\ldots,m^{(-)}(Z_{N,t,i})}$ such that each $g_k^{(-)}(Z_{N,t,i})$ is a non-empty, connected and relatively open set in $\{x \mid I_{\{W_{N,t,i}(x)\in G\}} = 1\}$,

$\Bigl\{x \,\Big|\, \tfrac{\partial}{\partial x_i}(F(\mathbf{x}) - \hat{F}_N(\mathbf{x}))\big|_{\mathbf{x}=W_{N,t,i}(x)} \cdot I_{\{W_{N,t,i}(x)\in G\}} < 0\Bigr\} = \bigcup_{k=1}^{m^{(-)}(Z_{N,t,i})} g_k^{(-)}(Z_{N,t,i}),$

and $g_j^{(-)}(Z_{N,t,i}) \cap g_k^{(-)}(Z_{N,t,i}) = \emptyset$ for $j \neq k$.

Proof. The proof is similar to that of Lemma 3.1. □
If the condition of the lemmas is violated, we put $m^{(+)}(Z_{N,t,i}) = 0$ and $m^{(-)}(Z_{N,t,i}) = 0$; more precisely,

$m^{(+)}(Z_{N,t,i}) = 0$ if $\Bigl\{x \,\Big|\, \tfrac{\partial}{\partial x_i}(F(\mathbf{x}) - \hat{F}_N(\mathbf{x}))\big|_{\mathbf{x}=W_{N,t,i}(x)} \cdot I_{\{W_{N,t,i}(x)\in G\}} > 0\Bigr\} = \emptyset,$

and

$m^{(-)}(Z_{N,t,i}) = 0$ if $\Bigl\{x \,\Big|\, \tfrac{\partial}{\partial x_i}(F(\mathbf{x}) - \hat{F}_N(\mathbf{x}))\big|_{\mathbf{x}=W_{N,t,i}(x)} \cdot I_{\{W_{N,t,i}(x)\in G\}} < 0\Bigr\} = \emptyset.$

We assume the following.

Assumption 3.8. For any $t \in \{1, 2, \ldots, M\}$ and $i \in \{1, 2, \ldots, d_0\}$, there exists $J_{t,i} < \infty$ such that for any integer $N \geq d_0\tau_0 + 1$ and any $y \in \mathbb{R}$, except for at most a countable number of $y \in \mathbb{R}$,

$f(y \mid Z_{N,t,i}) < J_{t,i}$  a.e. $G_{N,t,i}$.

For the notation of Lemmas 3.1 and 3.2, we assume the following assumption.
Assumption 3.9. For any $t \in \{1, 2, \ldots, M\}$ and $i \in \{1, 2, \ldots, d_0\}$, there exist $m_{t,i}^{(+)} < \infty$ and $m_{t,i}^{(-)} < \infty$ such that

$m^{(+)}(Z_{N,t,i}) \leq m_{t,i}^{(+)}$  a.e. $G_{N,t,i}$

and

$m^{(-)}(Z_{N,t,i}) \leq m_{t,i}^{(-)}$  a.e. $G_{N,t,i}$

for any $N \in \mathbb{N}$ such that $N \geq d_0\tau_0 + 1$.

Remark 3.2. Assumptions 3.8 and 3.9 are used for the consistency of the Lyapunov exponent.

3.2. Theorems

Theorem 3.2 (Collomb [5]). Under Assumptions 3.1–3.7, it follows that for any $\varepsilon > 0$,

$P\Bigl(\sup_{\mathbf{x}\in G} |F(\mathbf{x}) - \hat{F}_N(\mathbf{x})| > \varepsilon\Bigr) \to 0$ as $N \to \infty$.

Select an initial vector $\mathbf{Y}_0^{(d_0,1)}$ randomly from $G$ with uniform probability, and set $\hat{\mathbf{Y}}_{N,0}^{(d_0,1)} = \mathbf{Y}_0^{(d_0,1)}$. Then the following theorems hold.
Theorem 3.3. Under Assumptions 3.1–3.7, it follows that, for any $t \in \{1, 2, \ldots, M\}$,

$\hat{\mathbf{Y}}_{N,t}^{(d_0,1)} \to \mathbf{Y}_t^{(d_0,1)}$ in probability as $N \to \infty$.

Proof. We prove the theorem by mathematical induction. From Theorem 3.2, for any $\varepsilon > 0$,

$P(|Y_1 - \hat{Y}_{N,1}| > \varepsilon) = P(|F(\mathbf{Y}_0^{(d_0,1)}) - \hat{F}_N(\hat{\mathbf{Y}}_{N,0}^{(d_0,1)})| > \varepsilon) = P(|F(\mathbf{Y}_0^{(d_0,1)}) - \hat{F}_N(\mathbf{Y}_0^{(d_0,1)})| > \varepsilon) \leq P\Bigl(\sup_{\mathbf{x}\in G} |F(\mathbf{x}) - \hat{F}_N(\mathbf{x})| > \varepsilon\Bigr) \to 0$

as $N \to \infty$. Hence $P(\|\mathbf{Y}_1^{(d_0,1)} - \hat{\mathbf{Y}}_{N,1}^{(d_0,1)}\| > \varepsilon) \leq P\bigl(\sum_{i=0}^{d_0-1} |Y_{1-i} - \hat{Y}_{N,1-i}| > \varepsilon\bigr) \to 0$ as $N \to \infty$, that is, $\hat{\mathbf{Y}}_{N,1}^{(d_0,1)} \to \mathbf{Y}_1^{(d_0,1)}$ in probability as $N \to \infty$.

We now assume that $\hat{\mathbf{Y}}_{N,n-1}^{(d_0,1)} \to \mathbf{Y}_{n-1}^{(d_0,1)}$ in probability as $N \to \infty$. From Assumption 3.1 and $\mathbf{Y}_0^{(d_0,1)} \in G$, we have $\mathbf{Y}_{n-1}^{(d_0,1)} \in G^{\circ}$, and hence $P(\hat{\mathbf{Y}}_{N,n-1}^{(d_0,1)} \notin G) \to 0$ as $N \to \infty$. Since $F$ is continuous on the compact set $G$, $F(\hat{\mathbf{Y}}_{N,n-1}^{(d_0,1)}) \to F(\mathbf{Y}_{n-1}^{(d_0,1)})$ in probability as $N \to \infty$. Therefore, for any $\varepsilon > 0$,

$P(|Y_n - \hat{Y}_{N,n}| > \varepsilon) = P(|F(\mathbf{Y}_{n-1}^{(d_0,1)}) - \hat{F}_N(\hat{\mathbf{Y}}_{N,n-1}^{(d_0,1)})| > \varepsilon)$
$\leq P\Bigl(|F(\mathbf{Y}_{n-1}^{(d_0,1)}) - F(\hat{\mathbf{Y}}_{N,n-1}^{(d_0,1)})| > \frac{\varepsilon}{2}\Bigr) + P\Bigl(|F(\hat{\mathbf{Y}}_{N,n-1}^{(d_0,1)}) - \hat{F}_N(\hat{\mathbf{Y}}_{N,n-1}^{(d_0,1)})| > \frac{\varepsilon}{2},\ \hat{\mathbf{Y}}_{N,n-1}^{(d_0,1)} \in G\Bigr) + P(\hat{\mathbf{Y}}_{N,n-1}^{(d_0,1)} \notin G)$
$\leq P\Bigl(|F(\mathbf{Y}_{n-1}^{(d_0,1)}) - F(\hat{\mathbf{Y}}_{N,n-1}^{(d_0,1)})| > \frac{\varepsilon}{2}\Bigr) + P\Bigl(\sup_{\mathbf{x}\in G} |F(\mathbf{x}) - \hat{F}_N(\mathbf{x})| > \frac{\varepsilon}{2}\Bigr) + P(\hat{\mathbf{Y}}_{N,n-1}^{(d_0,1)} \notin G)$
$\to 0$ as $N \to \infty$.

Hence $\hat{\mathbf{Y}}_{N,n}^{(d_0,1)} \to \mathbf{Y}_n^{(d_0,1)}$ in probability as $N \to \infty$, and the assertion follows by induction. □
Theorem 3.4. Under Assumptions 3.1–3.7, it follows that

$E\Bigl[\sup_{\mathbf{x}\in G} |F(\mathbf{x}) - \hat{F}_N(\mathbf{x})|\Bigr] \to 0$ as $N \to \infty$.

Proof. By Assumptions 3.1 and 3.3, there exists $K_1 < \infty$ such that $|X_t| < K_1$ a.s. for any $t \in \{1, 2, \ldots, N\}$. Since the Gaussian kernel is used, it follows that for any integer $N \geq d_0\tau_0 + 1$,

$\sup_{\mathbf{x}\in G} |\hat{F}_N(\mathbf{x})| < K_1$  a.s.

Since $F$ is continuous on the compact set $G$ by Assumption 3.2, there exists $K_2 < \infty$ such that

$\sup_{\mathbf{x}\in G} |F(\mathbf{x})| < K_2.$

Hence for any integer $N \geq d_0\tau_0 + 1$,

$E\Bigl[\sup_{\mathbf{x}\in G} |F(\mathbf{x}) - \hat{F}_N(\mathbf{x})|\Bigr] \leq K_1 + K_2 < \infty.$

From this inequality, for any integer $N \geq d_0\tau_0 + 1$ and any $\varepsilon > 0$, there exists $c(\varepsilon)$ such that $\varepsilon < c(\varepsilon)$ and

$E\Bigl[\sup_{\mathbf{x}\in G} |F(\mathbf{x}) - \hat{F}_N(\mathbf{x})| \cdot I_{\{\sup_{\mathbf{x}\in G}|F(\mathbf{x}) - \hat{F}_N(\mathbf{x})| > c(\varepsilon)\}}\Bigr] < \varepsilon.$

Therefore

$E\Bigl[\sup_{\mathbf{x}\in G}|F(\mathbf{x}) - \hat{F}_N(\mathbf{x})|\Bigr] = E\Bigl[\sup_{\mathbf{x}\in G}|F(\mathbf{x}) - \hat{F}_N(\mathbf{x})| \cdot I_{\{\sup_{\mathbf{x}\in G}|F(\mathbf{x}) - \hat{F}_N(\mathbf{x})| > c(\varepsilon)\}}\Bigr]$
$\quad + E\Bigl[\sup_{\mathbf{x}\in G}|F(\mathbf{x}) - \hat{F}_N(\mathbf{x})| \cdot I_{\{\sup_{\mathbf{x}\in G}|F(\mathbf{x}) - \hat{F}_N(\mathbf{x})| \leq \varepsilon\}}\Bigr]$
$\quad + E\Bigl[\sup_{\mathbf{x}\in G}|F(\mathbf{x}) - \hat{F}_N(\mathbf{x})| \cdot I_{\{\varepsilon < \sup_{\mathbf{x}\in G}|F(\mathbf{x}) - \hat{F}_N(\mathbf{x})| \leq c(\varepsilon)\}}\Bigr]$
$\leq \varepsilon + \varepsilon + c(\varepsilon) \cdot P\Bigl(\sup_{\mathbf{x}\in G}|F(\mathbf{x}) - \hat{F}_N(\mathbf{x})| > \varepsilon\Bigr) \to 2\varepsilon$ as $N \to \infty$,

by Theorem 3.2. Since $\varepsilon > 0$ is arbitrary, $E[\sup_{\mathbf{x}\in G}|F(\mathbf{x}) - \hat{F}_N(\mathbf{x})|] \to 0$ as $N \to \infty$. □
Theorem 3.5. Under Assumptions 3.1–3.9, it follows that

$\lambda_M - \hat{\lambda}_{N,M} = o_p(1)$ as $N \to \infty$,

where

$\lambda_M = \frac{1}{M} \log \bigl\| D\mathbf{F}(\mathbf{Y}_M^{(d_0,1)}) D\mathbf{F}(\mathbf{Y}_{M-1}^{(d_0,1)}) \cdots D\mathbf{F}(\mathbf{Y}_1^{(d_0,1)}) \bigr\|.$

Proof. We have

$|\lambda_M - \hat{\lambda}_{N,M}| \leq \frac{1}{M}\bigl|\log\|D\mathbf{F}(\mathbf{Y}_M^{(d_0,1)}) \cdots D\mathbf{F}(\mathbf{Y}_1^{(d_0,1)})\| - \log\|D\mathbf{F}(\hat{\mathbf{Y}}_{N,M}^{(d_0,1)}) \cdots D\mathbf{F}(\hat{\mathbf{Y}}_{N,1}^{(d_0,1)})\|\bigr|$
$\quad + \frac{1}{M}\bigl|\log\|D\mathbf{F}(\hat{\mathbf{Y}}_{N,M}^{(d_0,1)}) \cdots D\mathbf{F}(\hat{\mathbf{Y}}_{N,1}^{(d_0,1)})\| - \log\|D\hat{\mathbf{F}}_N(\hat{\mathbf{Y}}_{N,M}^{(d_0,1)}) \cdots D\hat{\mathbf{F}}_N(\hat{\mathbf{Y}}_{N,1}^{(d_0,1)})\|\bigr|.$

The first term on the right-hand side converges to 0 in probability as $N \to \infty$, because $D\mathbf{F}$ is continuous on $G$ by Assumption 3.2 and $\hat{\mathbf{Y}}_{N,t}^{(d_0,1)}$ converges to $\mathbf{Y}_t^{(d_0,1)}$ in probability as $N \to \infty$ for any $t \in \{1, 2, \ldots, M\}$ by Theorem 3.3.

We show that the second term converges to 0 in probability as $N \to \infty$. It suffices to show that for any $t \in \{1, 2, \ldots, M\}$,

$\|D\mathbf{F}(\hat{\mathbf{Y}}_{N,t}^{(d_0,1)}) - D\hat{\mathbf{F}}_N(\hat{\mathbf{Y}}_{N,t}^{(d_0,1)})\| \to 0$

in probability as $N \to \infty$. For $\mathbf{x} = {}^t(x_1, x_2, \ldots, x_{d_0})$, the $(1, i)$ elements of $D\mathbf{F}(\hat{\mathbf{Y}}_{N,t}^{(d_0,1)})$ and $D\hat{\mathbf{F}}_N(\hat{\mathbf{Y}}_{N,t}^{(d_0,1)})$ are given respectively by

$\left.\dfrac{\partial F(\mathbf{x})}{\partial x_i}\right|_{\mathbf{x}=\hat{\mathbf{Y}}_{N,t}^{(d_0,1)}}$ and $\left.\dfrac{\partial \hat{F}_N(\mathbf{x})}{\partial x_i}\right|_{\mathbf{x}=\hat{\mathbf{Y}}_{N,t}^{(d_0,1)}},$

and all the remaining rows of the two matrices coincide. Hence it suffices to show that for any $t \in \{1, 2, \ldots, M\}$ and $i \in \{1, 2, \ldots, d_0\}$, the difference

$\Delta_{N,t,i} = \left.\dfrac{\partial F(\mathbf{x})}{\partial x_i}\right|_{\mathbf{x}=\hat{\mathbf{Y}}_{N,t}^{(d_0,1)}} - \left.\dfrac{\partial \hat{F}_N(\mathbf{x})}{\partial x_i}\right|_{\mathbf{x}=\hat{\mathbf{Y}}_{N,t}^{(d_0,1)}}$

converges to 0 in probability as $N \to \infty$. For any $t$, $i$ and $\varepsilon > 0$,

$P(|\Delta_{N,t,i}| > \varepsilon) \leq P\bigl(|\Delta_{N,t,i}| > \varepsilon,\ \hat{\mathbf{Y}}_{N,t}^{(d_0,1)} \in G\bigr) + P\bigl(\hat{\mathbf{Y}}_{N,t}^{(d_0,1)} \notin G\bigr).$

By Theorem 3.3 and Assumption 3.3, it follows that

$P(\hat{\mathbf{Y}}_{N,t}^{(d_0,1)} \notin G) \to 0$ as $N \to \infty$.

We next consider the first term on the right-hand side of this inequality. By Theorem 3.4, Lemmas 3.1 and 3.2, and Assumptions 3.8 and 3.9, for any $N \geq d_0\tau_0 + 1$, $t \in \{1, 2, \ldots, M\}$ and $i \in \{1, 2, \ldots, d_0\}$, we have

$E\bigl[|\Delta_{N,t,i}| \cdot I_{\{\hat{\mathbf{Y}}_{N,t}^{(d_0,1)} \in G\}}\bigr] = E\bigl[E\bigl[|\Delta_{N,t,i}| \cdot I_{\{\hat{\mathbf{Y}}_{N,t}^{(d_0,1)} \in G\}} \bigm| Z_{N,t,i}\bigr]\bigr]$
$= E\Bigl[\int \Bigl|\tfrac{\partial}{\partial x_i}(F(\mathbf{x}) - \hat{F}_N(\mathbf{x}))\big|_{\mathbf{x}=W_{N,t,i}(x)}\Bigr| \cdot I_{\{W_{N,t,i}(x)\in G\}}\, f(x \mid Z_{N,t,i})\,\mathrm{d}x\Bigr]$
$\leq J_{t,i} \cdot E\Bigl[\int \Bigl|\tfrac{\partial}{\partial x_i}(F(\mathbf{x}) - \hat{F}_N(\mathbf{x}))\big|_{\mathbf{x}=W_{N,t,i}(x)}\Bigr| \cdot I_{\{W_{N,t,i}(x)\in G\}}\,\mathrm{d}x\Bigr]$
$= J_{t,i} \cdot E\Biggl[\sum_{k=1}^{m^{(+)}(Z_{N,t,i})} \int_{g_k^{(+)}(Z_{N,t,i})} \tfrac{\partial}{\partial x_i}(F(\mathbf{x}) - \hat{F}_N(\mathbf{x}))\big|_{\mathbf{x}=W_{N,t,i}(x)}\,\mathrm{d}x - \sum_{k=1}^{m^{(-)}(Z_{N,t,i})} \int_{g_k^{(-)}(Z_{N,t,i})} \tfrac{\partial}{\partial x_i}(F(\mathbf{x}) - \hat{F}_N(\mathbf{x}))\big|_{\mathbf{x}=W_{N,t,i}(x)}\,\mathrm{d}x\Biggr]$
$= J_{t,i} \cdot E\Biggl[\sum_{k=1}^{m^{(+)}(Z_{N,t,i})} \Bigl(\sup_{x\in g_k^{(+)}(Z_{N,t,i})} \{F(W_{N,t,i}(x)) - \hat{F}_N(W_{N,t,i}(x))\} - \inf_{x\in g_k^{(+)}(Z_{N,t,i})} \{F(W_{N,t,i}(x)) - \hat{F}_N(W_{N,t,i}(x))\}\Bigr)$
$\qquad - \sum_{k=1}^{m^{(-)}(Z_{N,t,i})} \Bigl(\inf_{x\in g_k^{(-)}(Z_{N,t,i})} \{F(W_{N,t,i}(x)) - \hat{F}_N(W_{N,t,i}(x))\} - \sup_{x\in g_k^{(-)}(Z_{N,t,i})} \{F(W_{N,t,i}(x)) - \hat{F}_N(W_{N,t,i}(x))\}\Bigr)\Biggr]$
$\leq 2J_{t,i} \cdot E\Bigl[\bigl(m^{(+)}(Z_{N,t,i}) + m^{(-)}(Z_{N,t,i})\bigr) \cdot \sup_{\mathbf{x}\in G} |F(\mathbf{x}) - \hat{F}_N(\mathbf{x})|\Bigr]$
$\leq 2J_{t,i} \cdot \bigl(m_{t,i}^{(+)} + m_{t,i}^{(-)}\bigr) \cdot E\Bigl[\sup_{\mathbf{x}\in G} |F(\mathbf{x}) - \hat{F}_N(\mathbf{x})|\Bigr] \to 0$ as $N \to \infty$.

Hence for any $t \in \{1, 2, \ldots, M\}$ and $i \in \{1, 2, \ldots, d_0\}$,

$E\bigl[|\Delta_{N,t,i}| \cdot I_{\{\hat{\mathbf{Y}}_{N,t}^{(d_0,1)} \in G\}}\bigr] \to 0$ as $N \to \infty$,

and thus, by Markov's inequality, for any $t \in \{1, 2, \ldots, M\}$, $i \in \{1, 2, \ldots, d_0\}$ and $\varepsilon > 0$,

$P\bigl(|\Delta_{N,t,i}| > \varepsilon,\ \hat{\mathbf{Y}}_{N,t}^{(d_0,1)} \in G\bigr) \to 0$ as $N \to \infty$.

Therefore $|\lambda_M - \hat{\lambda}_{N,M}| = o_p(1)$ as $N \to \infty$. □
Proof of Theorem 3.1. For any $\varepsilon > 0$,

$P(|\lambda - \hat{\lambda}_{N,M}| > \varepsilon) \leq P(|\lambda - \lambda_M| + |\lambda_M - \hat{\lambda}_{N,M}| > \varepsilon) \leq P\Bigl(|\lambda - \lambda_M| > \frac{\varepsilon}{2}\Bigr) + P\Bigl(|\lambda_M - \hat{\lambda}_{N,M}| > \frac{\varepsilon}{2}\Bigr).$

Hence from the definition of the Lyapunov exponent and Theorem 3.5, for any $\varepsilon > 0$,

$\lim_{M\to\infty} \lim_{N\to\infty} P(|\lambda - \hat{\lambda}_{N,M}| > \varepsilon) = 0.$  □

4. Numerical evaluation

In this section, we evaluate numerically the behavior of the proposed procedure using a cosine map and a Henon map. We generated data as follows.

(1) Data of size $N \in \{1000, 2000, 3000, 4000, 5000, 10\,000\}$ were considered.
(2) 11 000 points were generated by giving an appropriate initial vector to the model, the first 1000 points were discarded, and the remaining points were used as the data of size 10 000.
(3) For the data of size 10 000, the first $N$ points were used as the data of size $N$.

In this section, we estimate the embedding dimension and delay time by the method described in Section 2.3 with $\varepsilon = CV^*/20$, and generate 20 data sets of size $M = 1000$ from the estimated skeleton for each $c \in \{0.05, 0.1, 0.15, 0.2\}$ by using different initial vectors $\hat{\mathbf{y}}_0^{(\hat{d},\hat{\tau})}$, which were randomly selected from the data $\{\mathbf{x}_{(\hat{d}-1)\hat{\tau}+1}^{(\hat{d},\hat{\tau})}, \mathbf{x}_{(\hat{d}-1)\hat{\tau}+2}^{(\hat{d},\hat{\tau})}, \ldots, \mathbf{x}_N^{(\hat{d},\hat{\tau})}\}$. Note that $h = c \times N^{-1/(2\hat{d}+1)}$.
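As a sketch of this data-generating recipe, the following code produces a sample of size $N$ from the stochastic cosine map of Section 4.1.1 below; the seed and the uniform initial vector on $[0, 1]^2$ are our own arbitrary choices.

    import numpy as np

    def generate_data(N, delta, seed=0):
        # X_t = cos(2.8 X_{t-1}) + 0.3 X_{t-2} + eps_t, eps_t ~ U(-delta, delta);
        # 11 000 points are generated, the first 1000 are discarded, and the
        # first N of the remaining 10 000 are returned.
        rng = np.random.default_rng(seed)
        x = list(rng.uniform(0.0, 1.0, size=2))   # initial vector in [0, 1]^2
        while len(x) < 11000:
            x.append(np.cos(2.8 * x[-1]) + 0.3 * x[-2]
                     + rng.uniform(-delta, delta))
        return np.asarray(x[1000:1000 + N])

    data = generate_data(N=4000, delta=0.02)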
Fig. 3. Plots of $(x_t, x_{t+1})$: (a) $\delta = 0.01$; (b) $\delta = 0.02$; (c) $\delta = 0.03$.
4.1. Cosine map

4.1.1. Cosine map with uniform noise

We consider the following stochastic cosine map,

$X_t = \cos(2.8X_{t-1}) + 0.3X_{t-2} + \varepsilon_t,$  (11)

where $\varepsilon_t$ is a uniform random variable on the range $[-\delta, \delta]$. For each $\delta \in \{0.01, 0.02, 0.03\}$, we selected randomly an initial vector $(x_1, x_2)$ from $[0, 1] \times [0, 1]$. Fig. 3 shows plots of $(x_t, x_{t+1})$ $(t = 1, 2, \ldots, 10\,000)$ for each $\delta$. For all combinations of $\delta$ and $N$, the true values of $d_0$ and $\tau_0$ were estimated, that is, $\hat{d}_0 = 2$ and $\hat{\tau}_0 = 1$. The Lyapunov exponents were then estimated from the 20 regenerated data sets of size $M = 1000$ for each $\delta$, $c$ and $N$. Table 1 shows the mean values of the 20 estimates. The true Lyapunov exponent of the cosine map without noise is about 0.49. We often obtained bad estimates for some $c$ when $\delta = 0.03$ and $N \leq 5000$. However, for each $c$ and $\delta$, the estimates come close to the true value as $N$ becomes large.

4.1.2. Cosine map with Gaussian noise

Next, we consider a cosine map with Gaussian noise, i.e.,

$X_t = \cos(2.8X_{t-1}) + 0.3X_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2).$  (12)

For each $\sigma \in \{0.01, 0.02\}$, we selected randomly an initial vector $(x_1, x_2)$ from $[0, 1] \times [0, 1]$. Fig. 4 shows plots of $(x_t, x_{t+1})$ $(t = 1, 2, \ldots, 10\,000)$ for each $\sigma$. For all combinations of $\sigma$ and $N$, the true embedding dimension and delay time were estimated, that is, $\hat{d}_0 = 2$ and $\hat{\tau}_0 = 1$. We estimated the Lyapunov exponents from the 20 regenerated data sets for each $\sigma$, $c$ and $N$. Table 2 shows the mean values of the 20 estimates for each $\sigma$, $c$ and $N$. For each $c$ and $\sigma$, the estimates come close to the true value as $N$ becomes large, except for the combination of $c = 0.05$ and $\sigma = 0.01$.
Table 1
Estimated Lyapunov exponent from a cosine map with uniform noise

(a) $\delta = 0.01$
            N = 1000   N = 2000   N = 3000   N = 4000   N = 5000   N = 10 000
c = 0.05    -0.049      0.420      0.445      0.435      0.450      0.461
c = 0.10     0.410      0.452      0.437      0.453      0.472      0.482
c = 0.15     0.388      0.421      0.462      0.471      0.480      0.488
c = 0.20     0.420      0.403      0.427      0.429      0.438      0.466

(b) $\delta = 0.02$
c = 0.05     0.345      0.372      0.322      0.434      0.407      0.426
c = 0.10     0.382      0.416      0.397      0.475      0.428      0.454
c = 0.15    -0.038      0.402      0.015      0.169      0.404      0.428
c = 0.20     0.200      0.342      0.393      0.407      0.366      0.393

(c) $\delta = 0.03$
c = 0.05     0.353      0.471      0.473      0.472      0.456      0.470
c = 0.10     0.465      0.452      0.466      0.469      0.171      0.479
c = 0.15     0.438      0.466      0.002     -0.341      0.441      0.469
c = 0.20     0.392      0.442      0.380      0.404      0.174      0.449
Fig. 4. Plots of $(x_t, x_{t+1})$: (a) $\sigma = 0.01$; (b) $\sigma = 0.02$.
4.1.3. Cosine map without noise

Next, we consider a deterministic cosine map, i.e.,

$X_t = \cos(2.8X_{t-1}) + 0.3X_{t-2}.$  (13)

We selected randomly an initial vector $(x_1, x_2)$ from $[0, 1] \times [0, 1]$. Fig. 5 shows plots of $(x_t, x_{t+1})$ $(t = 1, 2, \ldots, 10\,000)$. For all $N$, the true embedding dimension and delay time were estimated, that is, $\hat{d}_0 = 2$ and $\hat{\tau}_0 = 1$. Next, we estimated the Lyapunov exponent from the 20 regenerated data sets for each $c$ and $N$. Table 3 summarizes the mean values of the 20 estimates for each $c$ and $N$. For each $c$, the estimates come close to the true value as $N$ becomes large.

4.2. Henon map

4.2.1. Henon map with uniform noise

Next, we consider a Henon map with uniform noise, i.e.,

$X_t = 1 - 1.4X_{t-1}^2 + 0.3X_{t-2} + \varepsilon_t,$  (14)
Table 2
Estimated Lyapunov exponent for a cosine map with Gaussian noise

(a) $\sigma = 0.01$
            N = 1000   N = 2000   N = 3000   N = 4000   N = 5000   N = 10 000
c = 0.05     0.343      0.392      0.393      0.419      0.433      0.209
c = 0.10     0.437      0.439      0.425      0.465      0.469      0.462
c = 0.15    -0.146      0.420      0.454      0.458      0.451      0.467
c = 0.20     0.456      0.445      0.446      0.443      0.441      0.429

(b) $\sigma = 0.02$
c = 0.05     0.337      0.414      0.450      0.381      0.478      0.482
c = 0.10     0.423      0.444      0.450      0.442      0.432      0.471
c = 0.15     0.351      0.416      0.420      0.416      0.430      0.460
c = 0.20     0.414      0.417      0.400      0.426      0.402      0.406
Table 3
Estimated Lyapunov exponent for a cosine map without noise

            N = 1000   N = 2000   N = 3000   N = 4000   N = 5000   N = 10 000
c = 0.05     0.287      0.414      0.433      0.447      0.453      0.464
c = 0.10     0.425      0.455      0.469      0.479      0.475      0.465
c = 0.15     0.418      0.454      0.455      0.463      0.471      0.493
c = 0.20     0.383      0.404      0.442      0.464      0.453      0.480
Fig. 5. Plots of $(x_t, x_{t+1})$.
where $\varepsilon_t$ is a uniform random variable on the range $[-0.012, 0.012]$. This range was selected by McCaffrey et al. [11] to prevent the data from diverging to infinity. An initial vector $(x_1, x_2)$ was selected randomly from the trapping region introduced by Henon [8]. Fig. 6 shows plots of $(x_t, x_{t+1})$ $(t = 1, 2, \ldots, 10\,000)$. For all $N$, the true embedding dimension and delay time were estimated, that is, $\hat{d}_0 = 2$ and $\hat{\tau}_0 = 1$. Next, we estimated the Lyapunov exponent from the 20 regenerated data sets for each $c$ and $N$. Table 4 shows the mean values of the estimates. The true Lyapunov exponent of the deterministic Henon map is 0.418. For each $c$, the estimates come close to the true value as $N$ becomes large.

4.2.2. Henon map without noise

Last, we consider a deterministic Henon map, i.e.,

$X_t = 1 - 1.4X_{t-1}^2 + 0.3X_{t-2}.$  (15)
Fig. 6. Plots of $(x_t, x_{t+1})$.
Fig. 7. Plots of $(x_t, x_{t+1})$.

Table 4
Estimated Lyapunov exponent for a Henon map with uniform noise

            N = 1000   N = 2000   N = 3000   N = 4000   N = 5000   N = 10 000
c = 0.05     0.142      0.253      0.282      0.351      0.346      0.370
c = 0.10     0.069      0.331      0.351      0.387      0.387      0.396
c = 0.15     0.284      0.362      0.368      0.370      0.385      0.391
c = 0.20     0.326      0.331      0.261      0.364      0.370      0.371
Table 5
Estimated Lyapunov exponent for a Henon map without noise

            N = 1000   N = 2000   N = 3000   N = 4000   N = 5000   N = 10 000
c = 0.05     0.246      0.044      0.276      0.311      0.337      0.352
c = 0.10     0.360      0.374      0.375      0.365      0.392      0.383
c = 0.15     0.369      0.395      0.379      0.375      0.380      0.385
c = 0.20     0.400      0.410      0.394      0.368      0.390      0.384
An initial vector $(x_1, x_2)$ was randomly selected from the trapping region. Fig. 7 shows plots of $(x_t, x_{t+1})$ $(t = 1, 2, \ldots, 10\,000)$. For all $N$, the true embedding dimension and delay time were estimated, that is, $\hat{d}_0 = 2$ and $\hat{\tau}_0 = 1$. Next, we estimated the Lyapunov exponent from the 20 regenerated data sets for each $c$ and $N$. Table 5 shows the mean values of the estimates. For each $c$, the estimates come close to the true value as $N$ becomes large.

5. Discussion

Assumption 3.1 may seem strange. However, data generated from a chaotic system typically remain scattered within a compact region, so we expect this assumption to be satisfied in practice.
In Section 4, we applied the method to some artificial data. Assumptions 3.1 and 3.2 are satisfied if the data are generated from model (11), (13), (14) or (15). Although we selected an initial vector randomly from a specified interval, we expect that Assumption 3.3 is satisfied for all data used in Section 4 because the first 1000 points were discarded. Data generated from model (11), (13), (14) or (15) satisfy Assumptions 3.4 and 3.5. Data generated from model (11) are geometrically ergodic by Theorem 3.1 of An and Huang [1]. Moreover, the absorbing set $G$ is bounded, so if the support of the noise density is sufficiently small then the data are uniformly ergodic, i.e., $\phi$-mixing, by Theorem 1 of Chan and Tong [4]. Hence the data generated from model (11) satisfy Assumption 3.6. By Theorem 4.2 of Bradley [2], $\phi(n)$ converges to 0 exponentially fast for model (11), so model (11) satisfies Assumption 3.7 under the bandwidth $h = c \times N^{-1/(2d+1)}$. Assumption 3.8 is satisfied for all models, and data generated from model (11), (13), (14) or (15) satisfy Assumption 3.9.

In summary, if data are generated from model (11) then all assumptions are satisfied. For data generated from model (14), we cannot check whether Assumptions 3.6 and 3.7 hold. If data are generated from model (13) or (15) then Assumptions 3.6 and 3.7 are not satisfied, and if data are generated from model (12) then Assumptions 3.1, 3.2, 3.4, 3.5 and 3.9 are not satisfied. Nevertheless, our method worked well even when the dynamic noise has a Gaussian distribution, that is, when the data follow model (12). We consider that even if the dynamic noise has a Gaussian distribution, our method may work well when the data scatter around the attractor of the skeleton.

We investigated the numerical behavior of our procedure for specified bandwidths. Selecting an appropriate bandwidth is important for our procedure, and we consider that with an appropriate bandwidth the procedure may work even better. A method for selecting an appropriate bandwidth $h$ for our estimator remains to be developed; we would like to undertake this in a follow-up paper.

References

[1] H.Z. An, F.C. Huang, The geometric ergodicity of nonlinear autoregressive models, Stat. Sinica 6 (1996) 943–956.
[2] R. Bradley, Basic properties of strong mixing conditions, in: E. Eberlein, M.S. Taqqu (Eds.), Dependence in Probability and Statistics: A Survey of Recent Results, Birkhäuser, Boston, 1986, pp. 165–192.
[3] R. Cawley, G. Hsu, Local-geometric-projection method for noise reduction in chaotic maps and flows, Phys. Rev. A 46 (1992) 3057–3082.
[4] K.S. Chan, H. Tong, A note on noisy chaos, J. R. Stat. Soc. Ser. B 56 (2) (1994) 301–311.
[5] G. Collomb, Propriétés de convergence presque complète du prédicteur à noyau, Z. Wahrscheinlichkeitstheorie verw. Gebiete 66 (1984) 441–460.
[6] B. Dennis, R.A. Desharnais, J.M. Cushing, S.M. Henson, R.F. Costantino, Can noise induce chaos? Oikos 102 (2003) 329–339.
[7] K. Fueda, T. Yanagawa, Estimating the embedding dimension and delay time from chaotic time series with dynamic noise, J. Japan Statist. Soc. 31 (1) (2001) 27–38.
[8] M. Henon, A two-dimensional mapping with a strange attractor, Commun. Math. Phys. 50 (1976) 69–77.
[9] E.J. Kostelich, J.A. Yorke, Noise reduction: finding the simplest dynamical system consistent with the data, Physica D 41 (1990) 183–196.
[10] P.S. Landa, M.G. Rozenblum, A comparison of methods for constructing a phase space and determining the dimension of an attractor from experimental data, Sov. Phys. Tech. Phys. 34 (1989) 1229–1232.
[11] D.F. McCaffrey, S. Ellner, A.R. Gallant, D. Nychka, Estimating the Lyapunov exponent of a chaotic system with nonparametric regression, J. Amer. Statist. Assoc. 87 (1992) 682–695.
[12] E.A. Nadaraya, On estimating regression, Theory Probab. Appl. 9 (1964) 141–142.
[13] A.S. Pikovsky, Discrete-time dynamic noise filtering, Radio Eng. Electron. Phys. 9 (1986) 81.
[14] T. Sauer, A noise reduction method for signals from nonlinear systems, Physica D 58 (1992) 193–201.
[15] G.S. Watson, Smooth regression analysis, Sankhyā Ser. A 26 (1964) 359–372.
[16] Y. Whang, O. Linton, The asymptotic distribution of nonparametric estimates of the Lyapunov exponent for stochastic time series, J. Econometrics 91 (1999) 1–42.
[17] Q. Yao, H. Tong, Quantifying the influence of initial values in nonlinear prediction, J. R. Stat. Soc. Ser. B 56 (1994) 701–725.
[18] K. Yonemoto, T. Yanagawa, Estimating the embedding dimension and delay time of chaotic time series by an autoregressive model, Bull. Inform. Cybernet. 33 (1–2) (2001) 53–62.