Journal of Statistical Planning and Inference 136 (2006) 3319 – 3338 www.elsevier.com/locate/jspi
Nonparametric resampling for stationary Markov processes: The local grid bootstrap approach

Valérie Monbet (a,∗), Pierre-François Marteau (b)

(a) UBS/SABRES, Campus Tohannic, F-56000 Vannes, France
(b) UBS/VALORIA, Campus Tohannic, F-56000 Vannes, France
Received 25 April 2003; accepted 16 November 2004 Available online 20 April 2005
Abstract

A new resampling technique, referred to as the "local grid bootstrap" (LGB), based on nonparametric local bootstrap and applicable to a wide range of stationary general space Markov processes, is proposed. This nonparametric technique resamples local neighborhoods defined around the true samples of the observed multivariate time series. The asymptotic behavior of this resampling procedure is studied in detail. Applications to linear and nonlinear (in particular chaotic) simulated time series are presented, and compared to the approach of Paparoditis and Politis [2002. J. Statist. Plan. Inf. 108, 301–328], referred to as the "local bootstrap" (LB) and developed in earlier similar work. The method proves efficient and robust even when the length of the observed time series is reasonably small.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Nonlinear time series; Markov chains; Resampling; Smoothed bootstrap; Nonparametric estimation
1. Introduction

Simulation is a powerful tool for investigating the behavior of an estimation procedure when the sampled population can be completely specified. In such settings, one can obtain estimates of the standard errors of parameter estimators even when standard error formulas have not been (or cannot be) determined analytically. Furthermore, one can determine confidence intervals for unknown parameter values via simulation. When

∗ Corresponding author. Tel.: +33 2 9701 7225; fax: +33 2 9701 7200.
E-mail addresses:
[email protected] (V. Monbet),
[email protected] (P.-F. Marteau). 0378-3758/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2004.11.014
one is unable to completely specify the population under study because only a sample is known, bootstrap methods still provide ways to obtain standard error estimates and/or confidence intervals. The bootstrap is a general method (more precisely, a collection of methods) which makes use of the information contained in a single sample from the population of interest, in conjunction with simulation results, to provide information about the distribution of a statistic. The parametric bootstrap requires partially specifying the population under study by assuming a particular family of probability distributions; the parameters of the chosen family are then estimated. The nonparametric bootstrap does not require any explicit assumption about the population's distribution but uses the single sample to provide an approximation of the population's distribution. The idea behind the bootstrap is due to Efron (1979), who proposed an extension of the jackknife to the case of i.i.d. variables. Recently, several different approaches for bootstrapping stationary (or almost stationary) observations have been proposed in the literature; a nonexhaustive list includes the residual bootstrap (cf. Freedman, 1984; Efron and Tibshirani, 1993), the block bootstrap (cf. Künsch, 1989; Liu and Singh, 1992; Politis and White, 2004), the blocks-of-blocks bootstrap (cf. Politis and Romano, 1992), the stationary bootstrap (cf. Politis and Romano, 1994), the tapered bootstrap (cf. Paparoditis and Politis, 2001a, b) and the frequency domain bootstrap (cf. Franke and Härdle, 1992). Shao and Tu (1995) give an overview of these methods. Bühlmann (2002) compares the block bootstrap, the AR-sieve bootstrap and the local bootstrap for time series. Härdle et al. (2003) discuss the accuracy of bootstrap methods for time series and describe some important unsolved problems. Horowitz (2003) demonstrates some results for higher order statistics in the context of Markov process resampling.
This paper introduces a new resampling procedure called the "local grid bootstrap" (LGB), applicable to strictly stationary, general state space, discrete time stochastic processes that follow a finite order Markovian structure with continuous joint probability density function. The LGB procedure proposed in this paper can be considered as a smoothed version of the local bootstrap algorithm of Paparoditis and Politis (2002). In practice, smoothing in bootstrap methods improves the estimation of some statistics, such as the probability of being in a given small set, when the sample size is small. LGB finds its main roots in the paper by Paparoditis and Politis (2002), which proposes a nonparametric local bootstrap procedure for stationary stochastic processes that follow a general autoregressive structure, and in the paper by Monbet and Marteau (2001), which presents a similar procedure for cyclostationary time series in the context of the analysis of sea-state processes. Both papers propose a local bootstrap for Markov processes based on a nonparametric estimation of the transition probability density function. In their paper, Paparoditis and Politis (2002) demonstrate the asymptotic properties of the nonparametric local bootstrap procedure, and applications of the procedure in nonlinear time series analysis are considered and theoretically justified. In the following, we use the notation LB to refer to the bootstrap procedure of Paparoditis and Politis (2002). Note that the local bootstrap has been proposed and developed in the scope of various applications: Shi (1991) focused on i.i.d. regression, Falk and Reiss (1989) addressed the bootstrapping of conditional curves, Lall and Sharma (1996) addressed the modelling of hydrological time series, while Chan et al. (1997) tackled ecological series.
Our initial concern is the statistical characterization of the behavior of complex dynamical systems that are subject to long-term multivariate random inputs. Two possible approaches exist to handle such problems: a deterministic (or analytical) approach that allows short term (and often precise) prediction, and a statistical modelling approach, for which a solution consists in synthesizing long-term input time series to excite a (simplified) physical model in order to test a large number of possible scenarios. Bootstrap methods are in this context of great importance since they can provide long-term sequences from a limited set of observed real data. For example, the characterization of the erosion of a coastline depends on the long-term effect of random waves. In order to provide erosion statistics, long-term sea state time series are required that could be used as plausible inputs to the erosion models under experimentation. As a matter of fact, long-term (at a millennium scale) sea state observations are not available. Bootstrapping is thus necessary to build a statistical methodology for erosion characterization (cf. Monbet and Marteau, 2001; Ailliot et al., 2003). We address, in this paper, the resampling of processes defined as follows: X = {X_t, t = 1, 2, ...} denotes a d-dimensional stochastic process on a probability space (Ω, F, P) that follows a general Markovian structure, where t ∈ N and X_t ∈ R^d. The dimension d may be greater than 1. We assume that an integer p > 0 exists such that, for all t ∈ N, the state of the process X at time t depends only on the p previous states, i.e., for every t ∈ N and for every Borel set A ⊂ R^d,

P(X_{t+1} ∈ A | X_j, j ≤ t) = P(X_{t+1} ∈ A | X_j, t − p + 1 ≤ j ≤ t).

This class of Markov processes includes a large number of nonlinear models used in time series analysis: linear and nonlinear autoregressive processes, chaotic processes such as the Lorenz oscillator, sea state process parameters, etc.
This paper proposes a bootstrap scheme to construct a time series X̂_1, ..., X̂_k of any length k sampled from an observed series X_1, ..., X_T of fixed length T. The proposed scheme samples observed and unobserved states and restores the statistical properties of the reference time series under the assumption of continuity of the mapping x → F_x(·) = F(· | X_t = x), where F is the one step transition cumulative distribution function of X. As stated, the generated data X̂_i are not necessarily observed in the sequence X = (X_i)_{i∈{1,...,T}}. Nevertheless we show that the transition probabilities that govern the processes X̂ and X are asymptotically equivalent, where X̂ = {X̂_1, ..., X̂_n}. We also demonstrate on various examples that for fixed T we obtain a good approximation of some statistical properties of X. In the first part of the paper we give the general assumptions on the basis of which we develop the LGB scheme. In the second part the asymptotic properties and behavior of the LGB procedure are studied. Finally, in the third part, we compare our results with those obtained by Paparoditis and Politis (2002) for linear and nonlinear autoregressive processes and we propose an example on a chaotic multivariate Markovian process. Proofs of the theorems are given in the Appendix.
2. The local grid bootstrap procedure

2.1. Notations and assumptions

Let {X_t, t ≥ 1} be a d-dimensional stochastic process in (R^d, B^d) where B^d is the Borel algebra over R^d.

(A1) We suppose that there exists a positive integer p < ∞ such that the stochastic process {Y_t, t > p} with Y_t = {X_t, X_{t−1}, ..., X_{t−p+1}} forms an aperiodic, strictly stationary and geometrically ergodic Markov chain on (R^{dp}, B^{dp}) with transition probability function P(y, A), where A ⊂ R^d. We assume that this Markov chain admits a stationary distribution with continuous density f_Y with respect to Lebesgue measure. We also suppose that the transition distribution function F_y(x) = P(X_{t+1} ≤ x | Y_t = y) admits a continuous probability density function f(·|y).

Let us define, for any time t, the transition distribution F_y (Eq. (1)) and the stationary distribution F (Eq. (2)) of the observed (or reference) time series (Y_t):

F_y(x) = P(X_{t+1} < x | Y_t = y),   x ∈ R^d, y ∈ R^{dp},   (1)

F(y) = P(Y_t < y),   y ∈ R^{dp}.   (2)
We need to introduce some assumptions on the distribution functions F_y and F to ensure the convergence of their estimates.

(A2) F and F_y are absolutely continuous with respect to the Lebesgue measure on R^{dp} and R^d, respectively, and have bounded densities.

(A3) The mapping y ∈ S → F_y(·) is Prohorov continuous, where S is a compact set of R^{dp}.

(A4) For every time t, the stationary probability density function f_Y of F is positive on a compact set S ⊂ R^{dp}. S is defined for a given reference process such that Y_t ∈ S a.s. for all time t.

We remark that the compactness of the support of f_Y is a technical assumption which is nonrestrictive for the applications. For instance, every physical system with finite energy has states that take their values in a compact subset, most biological parameters are bounded, etc.

Prohorov continuity is defined on the space of probability measures equipped with the Prohorov metric. If M denotes the space of all probability measures on (Ω, F), the weak convergence topology on M is metrizable by the Prohorov distance d_pr(·, ·). The distance d_pr(G1, G2) between two measures G1 and G2 is given by

d_pr(G1, G2) = inf{ε > 0 | G1(A) ≤ G2(A^ε) + ε, ∀A ∈ F},   (3)

where A^ε = {y | d(A, y) < ε} and d is a distance in R^d.

In the sequel, the probability density function of the stationary distribution and the transition probability will be approximated by appropriate kernel estimates. Let K_d be a probability density function on R^d, K_dp a probability density function on R^{dp}, and {h_T, T = 0, 1, 2, ...} a sequence of positive numbers such that h_T → 0 as T → ∞.
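For illustration, the product-kernel construction used below (condition (K3): K_i(x) = ∏_j k(x_j) built from a univariate kernel k) might be sketched as follows with a Gaussian k; the function names are ours, not the paper's:

```python
import numpy as np

def k1(u):
    """Univariate Gaussian kernel k: continuous, symmetric, bounded,
    with finite second moment, i.e. it satisfies (K1)-(K2)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def product_kernel(x):
    """Product kernel K_i(x) = prod_{j=1}^{i} k(x_j) on R^i, cf. (K3)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return float(np.prod(k1(x)))
```

The same `product_kernel` serves as K_d (on R^d) and as K_dp (on R^{dp}); only the length of the input vector changes.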
We suppose that the density kernels K_d and K_dp satisfy the following conditions:

(K1) K_i(·) is continuous, symmetric and bounded, i.e. sup{K_i(x) : x ∈ R^i} < ∞, for all i ∈ {d, dp}.
(K2) ∫_{R^i} ||x||^2 K_i(x) dx < ∞ for all i ∈ {d, dp}.
(K3) k : R → R is a kernel density satisfying conditions (K1)–(K2), with i = 1, such that K_i(x) = ∏_{j=1}^{i} k(x_j) for all i ∈ {d, dp} and for x = (x_1, ..., x_i) ∈ R^i.

2.2. The LGB resampling scheme

The LGB resampling algorithm generates a series X̂_1, X̂_2, ..., X̂_N where the length N may be chosen independently from the length T of the observed sequence X = (X_i)_{i∈{1,...,T}}. Let us denote Ŷ_t = {X̂_t, X̂_{t−1}, ..., X̂_{t−p+1}} the state of the generated sequence at time t. The generation of X̂_k is obtained by assigning probabilities to a finite subset of convenient states and sampling this subset according to the discrete probability mass. The idea behind the generation of unobserved states is that, if the underlying transition distribution is regular enough (for instance continuous), it is possible through the kernel estimate to assign a probability to unobserved states around the observations {X_k} and to synthesize a new time series that includes observed and unobserved points. The set of "unobserved states" is defined by a discretization of the image, through the Markovian transition operator, of a neighborhood of the current point Ŷ_t. The resampling scheme may be described as follows:

Initialization step: Select an initial state Ŷ_p, the bandwidth parameter h_T, the width ε_T of the neighborhood of a given state, the minimum number n_min of observations in the neighborhood, and the grid parameters: the discretization step δ_g and the edge length Δ_g = N_g · δ_g. Δ_g may depend on T.

Step t:
• Let us suppose that the state Ŷ_t is already sampled. The neighborhood V_{Ŷ_t} of Ŷ_t is defined by the set of observed Y_l ∈ {Y | d(Y, Ŷ_t) ≤ ε_T/2}.
At this step, if |{Y | d(Y, Ŷ_t) ≤ ε_T/2}| = 1, the parameter ε_T is increased so that the set contains at least n_min points. We note I[V_{Ŷ_t}] the set of time indices such that for all l ∈ I[V_{Ŷ_t}], Y_l ∈ V_{Ŷ_t}. Furthermore, we define the image (V_{Ŷ_t})^+ of V_{Ŷ_t} by (V_{Ŷ_t})^+ = {X_{l+1}, l ∈ I[V_{Ŷ_t}]} ⊂ R^d.
• Now, a grid G_t = {g_1, ..., g_{T_g^t}} is built by discretizing a cube of R^d with grid step δ_g and edge length Δ_g. The cube is centered on the barycentre of (V_{Ŷ_t})^+ and the edge length Δ_g is defined such that the cube includes at least all the elements of (V_{Ŷ_t})^+. We note G_T^{(t)} = (V_{Ŷ_t})^+ ∪ G_t.
• Let J be a discrete random variable taking its values in I[G_T^{(t)}] = {k ∈ N, X_k ∈ G_T^{(t)}}, with probability mass function given by

P(J = k) = W_k = p(Ŷ_t, X_{k+1}) / Σ_{j ∈ I[G_T^{(t)}]} p(Ŷ_t, X_{j+1}),   ∀k ∈ I[G_T^{(t)}],   (4)

p(Ŷ_t, X_{k+1}) = p_k = Σ_{i ∈ I[V_{Ŷ_t}]} K_d((X_{k+1} − X_{i+1})/h_T) K_dp((Ŷ_t − Y_i)/h_T).   (5)
The sampled state at time t + 1 is such that X̂_{t+1} = X_J.

The above procedure assigns sampling probabilities to a set of points that includes the successors of the observed neighbors of the last sampled state and the points of a local grid positioned around these successors. This discrete probability mass may be considered as transition probabilities between Ŷ_t and X_J. It depends both on the density of the original sequence around X_J and on the density of the observed states around Ŷ_t. As the kernels K_d and K_dp are continuous on R^d and R^{dp}, respectively, we are able to assign probabilities to unobserved points of the grid and consequently to sample unobserved states.

2.3. Properties of the bootstrap

According to the above sampling scheme, for any fixed length T of the observed series, the generated series {X̂_t, t = 1, 2, ...} evolves following a dependence structure to be characterized. Theorems 1 and 2 describe the asymptotic validity of the LGB procedure.

Theorem 1. For given T and X_1, ..., X_T, for every kernel satisfying (K1) and every fixed kernel bandwidth h_T, there exists with probability one t_0 ∈ N such that the generated series Ŷ_T = {Ŷ_t; t > t_0} is a positive recurrent, irreducible and aperiodic Markov chain.

The probabilistic properties of the bootstrap Markov chain Ŷ depend on the chosen kernel and on its bandwidth parameter h_T. Indeed, if h_T is sufficiently close to 0, the generated time series will be an exact reproduction of the reference series. On the contrary, if h_T is too large, the bootstrap procedure will be unable to restore the statistical properties of the reference Markov chain. The grid parameters have no influence on the estimation of the transition probabilities itself. We will see later that the grid parameters directly impact the reproduction ratios of subsequences of the reference time series inside the sampled one.

Theorem 2. Under assumptions (K1)–(K3) and (A1)–(A3), if h_T → 0 and T h_T^{dp} → ∞ as T → ∞, and ε_T = c h_T with c a positive constant, then for almost all x ∈ S and y ∈ S we have, for the one step transition and the stationary distributions:
(1) F̂_y(x) → F_y(x) weakly in the Prohorov sense as T → ∞,
(2) F̂(y) → F(y) weakly in the Prohorov sense as T → ∞,
where S is a compact subset of R^d such that f_X(x) > 0 for all x ∈ S and S = S × ··· × S ⊂ R^{dp}. f_X denotes the stationary probability density function of the random variable X_t.

The first part of Theorem 2 ensures the weak convergence of the transition distribution function of the simulated series to the transition distribution of the observed time series when the observation time becomes long enough. This result is important because a Markov process may be entirely specified by its transition distribution function once the stationary regime is reached. The second part of the theorem gives the weak convergence of the stationary distribution function of the LGB Markov chain to the stationary distribution function of the reference
time series. Furthermore, the weak convergence of the empirical distribution F̂ to F allows one, through the delta method, to extend the convergence property to statistics of the form φ(F̂) where φ is a Hadamard-differentiable function (see van der Vaart, 1998).
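To make the resampling step of Section 2.2 concrete, here is a minimal illustrative sketch for the simplest case d = 1, p = 1, with Gaussian product kernels. All names are ours; several practical details of the paper (the adaptive ε_T/n_min rule, the data-driven choice of Δ_g) are deliberately simplified:

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_kernel(u):
    """Product Gaussian kernel evaluated at the vector u."""
    u = np.atleast_1d(u)
    return float(np.prod(np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)))

def lgb_step(X, y_hat, h, eps, delta_g, Ng=4):
    """One LGB resampling step for a univariate series X (d = 1, p = 1).

    Candidate next states are the successors of the observed neighbors of
    y_hat plus a local grid around their barycentre; one candidate is drawn
    with the kernel weights of Eqs. (4)-(5).
    """
    X = np.asarray(X, dtype=float)
    # neighborhood V of y_hat among the observed states Y_l = X_l
    idx = np.where(np.abs(X[:-1] - y_hat) <= eps / 2.0)[0]
    if idx.size == 0:  # crude stand-in for the paper's n_min rule
        idx = np.array([int(np.argmin(np.abs(X[:-1] - y_hat)))])
    successors = X[idx + 1]                   # (V)^+: images of the neighbors
    # local grid of unobserved states, centered on the barycentre of (V)^+
    grid = successors.mean() + delta_g * (np.arange(Ng + 1) - Ng // 2)
    candidates = np.concatenate([successors, grid])
    # kernel weights, Eqs. (4)-(5)
    w = np.array([
        sum(gauss_kernel((x - X[i + 1]) / h) * gauss_kernel((y_hat - X[i]) / h)
            for i in idx)
        for x in candidates
    ])
    if w.sum() <= 0.0:                        # numerical safeguard
        w = np.ones_like(w)
    w /= w.sum()
    return float(rng.choice(candidates, p=w))
```

Iterating `lgb_step`, feeding each sampled value back in as the new current state, produces a bootstrap trajectory of any desired length N, including values never observed in the reference series.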
3. Applications

3.1. The choice of the resampling parameters

For the LGB procedure, the transition probabilities are estimated nonparametrically. One of the difficulties of kernel-based estimation lies in the choice of the bandwidth parameter h_T that is used to estimate the probability density functions. A first solution consists in assuming that the reference process may be approximated by a Gaussian process and then choosing the best bandwidth parameter for that Gaussian process. Another solution is to test several values of the bandwidth parameter and to compare the sampled time series; one can, for instance, compare the transition and the stationary distributions of the reference and the sampled time series. A third alternative consists in choosing h_T such that the local neighborhood contains at least a minimum number ν of observed points, as follows: select an initial bandwidth value; if the neighborhood contains fewer than ν points, increase the bandwidth until it contains more than ν observed points. For this last solution, for a fixed length T of the observed time series, h_T depends on the density f_Y. Nevertheless, as T tends to infinity, h_T needs to vanish and T h_T^{dp} to tend to infinity. The number ν can be estimated experimentally, for instance by cross-validation on a subsequence of the reference time series.

The bandwidth parameter h_T may also be linked with ε_T, which defines the width of the local neighborhood of the current state. A particular case occurs when |{Y | d(Y, Ŷ_t) ≤ ε_T/2}| = 1; then the neighborhood parameter ε_T is increased until the number of neighbours is large enough, say greater than a given number n_min. In fact, such a case occurs only for rare events (in general for extreme values), so that n_min is not a very sensitive parameter and it can be chosen low with respect to T.

The grid parameters, the edge length Δ_g and the step δ_g, are also sensitive. They determine the number N_g of nonobserved states included in the neighborhood of the current point for the sampling: N_g = (Δ_g/δ_g)^d. When the ratio Δ_g/δ_g tends to zero, the LGB algorithm tends to a standard LB procedure, as no unobserved state is added to the observations. If Δ_g is large compared to h_T, nonobserved states will be far from the center of the grid and will have a very low probability of being sampled. Ideally, Δ_g has to be chosen so that each nonobserved state has a sufficiently large probability of being reached. Now, we can observe that, for fixed Δ_g, the smaller δ_g, the larger the number of accessible states. Unfortunately, the complexity of the algorithm increases rapidly as δ_g decreases, so that the grid step should not be too small.

3.2. Autoregressive models

In this section simulation examples are proposed in order to evaluate the properties of the LGB procedure and to compare it with the LB algorithm developed by
Table 1
Estimated "exact" and bootstrap estimates of the standard deviation σ_{r,T} of the sample lag-reversibility coefficient √(T − r) P̂_r, r = 1, 2

T = 100, √(T − 1) P̂_1:

Model   σ_{r,T}   E(σ*_{r,T})   E(σ̂_{r,T})   STD(σ*_{r,T})   STD(σ̂_{r,T})   MSE(σ*_{r,T})   MSE(σ̂_{r,T})
AR      0.291     0.292         0.300         0.020           0.020           4.01e−4         4.81e−4
ARC     0.271     0.299         0.303         0.017           0.057           11.0e−4         4.3e−4
NLAR    0.306     0.283         0.296         0.021           0.024           9.70e−4         6.76e−4

T = 100, √(T − 2) P̂_2:

AR      0.281     0.287         0.290         0.019           0.019           3.97e−4         4.42e−4
ARC     0.258     0.289         0.293         0.021           0.065           14.0e−4         54.0e−4
NLAR    0.274     0.275         0.284         0.020           0.017           4.01e−4         3.89e−4

T = 200, √(T − 1) P̂_1:

AR      0.291     0.294         0.299         0.017           0.019           2.98e−4         4.25e−4
ARC     0.271     0.293         0.296         0.018           0.021           0.80e−4         0.11e−4
NLAR    0.306     0.287         0.294         0.019           0.027           7.22e−4         8.13e−4

T = 200, √(T − 2) P̂_2:

AR      0.276     0.281         0.284         0.017           0.021           3.14e−4         5.05e−4
ARC     0.251     0.284         0.284         0.018           0.021           14.0e−4         15.0e−4
NLAR    0.269     0.271         0.281         0.017           0.019           2.93e−4         5.05e−4

Here, σ_{r,T} denotes the estimated "exact" standard deviation, E(σ*_{r,T}) the mean and STD(σ*_{r,T}) the standard error for LB estimates, E(σ̂_{r,T}) the mean and STD(σ̂_{r,T}) the standard error for LGB estimates; MSE are mean square errors.
Paparoditis and Politis (2002). Firstly, the statistics used for validation in the Paparoditis and Politis (2002) paper are calculated, i.e., the first and second lag-reversibility coefficients and the first order autocorrelation coefficient. Secondly, the comparison of the LB and LGB algorithms is extended by plotting estimates of instantaneous probability density functions and some persistence statistics. Data series of size T = 100 and 200 have been generated for the following Markov linear and nonlinear models:

AR:   X_t = 0.8 X_{t−1} − 0.6 X_{t−2} + ε_t,
ARC:  X_t = 0.8 X_{t−1} − 0.6 X_{t−2} + u_t,
NLAR: X_t = 0.8 log(1 + 3 X²_{t−1}) − 0.6 log(1 + 3 X²_{t−3}) + ε_t,

where the errors {ε_t} and {u_t} are i.i.d. The distribution of the ε_t is Gaussian (0, 1), while that of u_t is a mixture of 90% Gaussian (−1, 1) and 10% Gaussian (1, 9).

Several parameters are estimated. For each of them, the standard deviation is calculated. Let us denote σ_{r,T} the "exact" standard deviation of the considered parameter, σ*_{r,T} the
Table 2
Estimated "exact" and bootstrap estimates of the standard deviation σ_T of the sample first order auto-correlation coefficient

T = 100:

Model   σ_T     E(σ*_T)   E(σ̂_T)   STD(σ*_T)   STD(σ̂_T)   MSE(σ*_T)   MSE(σ̂_T)
AR      0.045   0.055     0.047     0.009       0.011       1.81e−4     1.25e−4
ARC     0.043   0.058     0.068     0.020       0.018       6.25e−4     9.49e−4
NLAR    0.084   0.086     0.084     0.016       0.013       2.60e−4     1.69e−4

T = 200:

AR      0.031   0.036     0.046     0.005       0.007       0.50e−4     2.74e−4
ARC     0.031   0.035     0.045     0.008       0.010       0.80e−4     2.96e−4
NLAR    0.059   0.058     0.059     0.009       0.009       0.82e−4     0.81e−4

Here, σ_T denotes the estimated "exact" standard deviation, E(σ*_T) the mean and STD(σ*_T) the standard error for LB estimates, E(σ̂_T) the mean and STD(σ̂_T) the standard error for LGB estimates; MSE are mean square errors.
Fig. 1. Probability density function of the ARC model. Solid line corresponds to reference, dashed line to LB sampled series and dash-dotted line to LGB sampled series.
standard deviation obtained on the sampled series using the LB procedure, and σ̂_{r,T} the standard deviation obtained on the sampled series using the LGB algorithm. The LGB algorithm is compared to the LB procedure of Paparoditis and Politis (2002) on the estimation of three parameters: the sample lag-reversibility coefficients √(T − r) P̂_r, r = 1, 2, and the first order auto-correlation coefficient r̂_1. For a time series (X_t)_t, the parameter P_r is the probability that X_t − X_{t−r} > 0. For a given sample
Fig. 2. Upper graphics plot the bivariate distribution of (X_t, X_{t+1}) for the ARC model; the lower graphic plots the autocorrelation functions for the same model. Solid line corresponds to reference, dashed line to LB sampled series and dash-dotted line to LGB sampled series.
(X_t)_{t=1,...,T}, P_r is straightforwardly estimated by P̂_r, defined as follows:

P̂_r = (1/(T − r)) Σ_{t=r+1}^{T} I(X_t − X_{t−r}),

where I(x) = 1 if x > 0 and I(x) = 0 otherwise. The tests are performed on the basis of 100 trials and 250 bootstrap replicates. It may be verified experimentally that increasing the number of trials does not change the results significantly. The bandwidth parameter h_T is calculated using the method described by Paparoditis and Politis (2002): the studied process is approximated by a linear autoregressive model and h_T is obtained by minimization of a quadratic risk. The grid parameters are chosen experimentally as follows: the edge length Δ_g is computed for every time t such that the cubic neighborhood includes all the observed X_k ∈ (V_{Ŷ_t})^+, and the discretization step is deduced by δ_g = Δ_g/N_g with fixed N_g = 10. Results given in Table 1 for the lag-reversibility coefficients and in Table 2 for the first autocorrelation coefficient show that the LGB procedure has about the same properties as LB. Indeed we observe that most of the computed means and
Fig. 3. Cumulative distributions of the length of the sojourn over levels 2 and 3. Solid line corresponds to reference, dashed line to LB sampled series and dash-dotted line to LGB sampled series.
standard deviations in the tables are of the same order for LGB and LB. But we remark that LGB tends to overestimate some standard deviations STD(σ̂_{r,T}), especially for the ARC model. As unobserved states are sampled, a small sampling noise may be added to the time series, which induces an increase of the standard deviation of the estimators. Furthermore, the ARC process behaves like an AR process with switching, so that it may be difficult to learn from short time series. A transformation of the data, such as a log transformation, could help to obtain better bootstrap estimates. However, the figures described below show that, although the standard deviations of the LGB estimates are slightly larger than those of the LB estimates, the bias of the LGB estimates seems to be small. Figs. 1 and 4 show estimates of the stationary probability density functions of the reference signal and of the simulated series obtained by the LB and LGB procedures for the ARC and NLAR models. For these figures, 400 series of length T = 100 are computed and the mean is plotted. Fig. 1 (resp. Fig. 4) compares the kernel estimates of the stationary probability density functions for the ARC model (resp. the NLAR model). We remark that in both cases LGB better approximates the reference series. Figs. 2 and 5 evaluate the time dependence structure of the series. The upper part of these figures presents the kernel estimate of the bivariate probability density function of the couple (X_t, X_{t+1}), and the lower part plots the mean of the autocorrelation functions of the
Fig. 4. Probability density function of the NLAR model. Solid line corresponds to reference, dashed line to LB sampled series and dash-dotted line to LGB sampled series.
400 series. For the AR model such a figure is not meaningful because the process is Gaussian and the first order correlation coefficient suffices to describe the dependence structure. We remark that both resampling procedures generate about the same results. LGB seems to better restore the bivariate distribution of (X_t, X_{t+1}) for the NLAR model. Figs. 3 and 6 show the cumulative distribution of the length of sojourns (or persistence) over some given levels: level 2 for the left graphics and level 3 for the right graphics. This statistic is linked both with the higher order time dependence structure and with high values of the data; it describes the behaviour of the time series over a fairly high level. Here we notice that the LGB technique proposed in this paper better restores the persistence distribution (Figs. 3 and 6). The results presented here for the linear and nonlinear one-dimensional autoregressive models (AR, ARC and NLAR) lead us to conclude that the LGB procedure presents globally the same performance as the LB algorithm. Nevertheless, in the LGB procedure, the estimation of the transition probability is more constrained since it combines a kernel estimate around the current state Y_t with a kernel estimate on the image of a neighborhood of Y_t through the Markovian transition operator. This can explain why the LGB procedure seems to better capture some statistics that describe the time dependence structure, as for instance the persistence statistics (Fig. 6).
Fig. 5. Upper graphics plot the bivariate distribution of (X_t, X_{t+1}) for the NLAR model; the lower graphic gives the autocorrelation functions for the same model. Solid line corresponds to reference, dashed line to LB sampled series and dash-dotted line to LGB sampled series.
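As an illustration of the simulation setup of Section 3.2, the three test models and the lag-reversibility estimator P̂_r can be sketched as follows. This is our own minimal sketch: function names are ours, we read "Gaussian (μ, v)" as mean μ and variance v, and we add a burn-in that the paper does not describe:

```python
import numpy as np

def simulate(model, T, rng):
    """Simulate T values of the AR, ARC or NLAR model of Section 3.2."""
    n = T + 200                          # 200 burn-in samples, discarded
    x = np.zeros(n)
    for t in range(3, n):
        if model == "AR":
            x[t] = 0.8 * x[t - 1] - 0.6 * x[t - 2] + rng.standard_normal()
        elif model == "ARC":
            # mixture noise: 90% N(-1, 1), 10% N(1, 9); std dev of the
            # second component is sqrt(9) = 3
            if rng.random() < 0.9:
                u = rng.normal(-1.0, 1.0)
            else:
                u = rng.normal(1.0, 3.0)
            x[t] = 0.8 * x[t - 1] - 0.6 * x[t - 2] + u
        else:  # NLAR
            x[t] = (0.8 * np.log(1 + 3 * x[t - 1] ** 2)
                    - 0.6 * np.log(1 + 3 * x[t - 3] ** 2)
                    + rng.standard_normal())
    return x[200:]

def lag_reversibility(x, r):
    """P_hat_r = (1/(T-r)) * sum_{t=r+1}^{T} I(x_t - x_{t-r} > 0)."""
    x = np.asarray(x)
    return float(np.mean(x[r:] - x[:-r] > 0))

rng = np.random.default_rng(0)
x = simulate("AR", 200, rng)
p1 = lag_reversibility(x, 1)
```

For a time-reversible Gaussian AR model, P_1 is 0.5 by symmetry, so P̂_1 computed on a simulated series should fall close to that value.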
3.3. Lorenz attractor

LGB also permits the resampling of multidimensional processes. The Lorenz dynamical system is chosen as an example. This oscillator model was first used by Lorenz in meteorology to model turbulence consisting of parallel convection rolls that appear in a horizontal layer of fluid heated from below. The simplified equations are given by

ẋ = N_Pr (y − x),
ẏ = −xz + rx − y,   (6)
ż = xy − bz,

where N_Pr is the Prandtl number and the parameters r and b depend on the Rayleigh number. This oscillator exhibits a chaotic behavior characterized by an attractor with two lobes, as shown in Fig. 7. The chaotic series has been computed with N_Pr = 16, b = 4, r = 45.92, integrating Eq. (6) by a fourth-order Runge–Kutta method with time step Δt = 0.02. The dimension of the attractor is 2.06.
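The integration just described (fourth-order Runge–Kutta on Eq. (6) with the stated parameters) might be sketched as follows; variable and function names are ours:

```python
import numpy as np

# Lorenz system parameters used in the paper
N_PR, B, R = 16.0, 4.0, 45.92
DT = 0.02  # integration time step

def lorenz(state):
    """Right-hand side of Eq. (6)."""
    x, y, z = state
    return np.array([N_PR * (y - x),
                     -x * z + R * x - y,
                     x * y - B * z])

def rk4_step(state, dt=DT):
    """One fourth-order Runge-Kutta step."""
    k1 = lorenz(state)
    k2 = lorenz(state + 0.5 * dt * k1)
    k3 = lorenz(state + 0.5 * dt * k2)
    k4 = lorenz(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def trajectory(state0, n_steps):
    """Integrate and return the (n_steps + 1, 3) trajectory."""
    traj = [np.asarray(state0, dtype=float)]
    for _ in range(n_steps):
        traj.append(rk4_step(traj[-1]))
    return np.array(traj)

traj = trajectory([1.0, 1.0, 1.0], 2000)
```

The resulting three-dimensional trajectory, sampled at the step Δt, provides the reference series to which the LGB procedure is applied below.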
Fig. 6. The figures plot the cumulative distribution of the length of the sojourns over levels 2 (left) and 3 (right). The solid line corresponds to the reference, the dashed line to the LB sampled series and the dash-dotted line to the LGB sampled series.
The LGB resampling procedure has been applied to the Lorentz oscillator trajectories to obtain the result given in Fig. 8. The parameters of the resampling algorithm (h_T, δ_g) are initialized experimentally; the bandwidth h_T is then increased so that the local neighborhood contains at least 5 points. We observe that the 3-dimensional structure of the attractor is globally restored, as is the shape of the 1-dimensional time series. However, the LGB induces a small resampling noise, and this prevents the capture of fine details of the fractal attractor.
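The mechanics of this kind of resampling can be illustrated by a simplified, order-1, univariate local bootstrap in the spirit of Paparoditis and Politis (2002): at each step, a neighbor of the current bootstrap state is drawn from the observed series and its observed successor becomes the next state. This sketch uses a uniform kernel and omits the local grid refinement specific to LGB; all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_bootstrap(x, n_out, h):
    """Order-1 local bootstrap: resample successors of observed states
    lying within bandwidth h of the current bootstrap state."""
    x = np.asarray(x, dtype=float)
    y = np.empty(n_out)
    y[0] = rng.choice(x[:-1])                   # arbitrary initial state
    for t in range(1, n_out):
        # Observed states close to the current bootstrap state.
        idx = np.where(np.abs(x[:-1] - y[t - 1]) <= h)[0]
        if idx.size == 0:                       # fall back to nearest neighbor
            idx = np.array([np.argmin(np.abs(x[:-1] - y[t - 1]))])
        y[t] = x[rng.choice(idx) + 1]           # draw one observed successor
    return y
```

LGB additionally lays a local grid over the image of the neighborhood through the transition operator, so that states not present in the observed sample can be generated.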
4. Concluding remarks

In this paper we have presented a nonparametric resampling procedure, referred to as LGB, that can be used to model and synthesize multivariate Markov processes of general order p. This procedure extends the one proposed by Paparoditis and Politis (2002) in two ways: firstly, it copes with multivariate processes and, secondly, unobserved states can be generated by exploiting the smoothness properties of the transition kernel. The latter feature is obtained by
Fig. 7. Reference trajectories of the Lorentz attractor. On the left: 3D trajectory. On the right: 1D time series for the same signal.
means of a grid used to locally discretize the image, through the transition operator, of the neighborhood of the current state. Theoretical results show that the proposed resampling procedure is asymptotically correct. In practice, LGB reproduces quite well the results obtained by Paparoditis and Politis (2002) on linear and nonlinear autoregressive processes, although the expected lower rate of convergence of our approach may require somewhat more observed samples. We show that the parameters associated with the local grid, namely the length of the grid edge and the size of the discretization step, provide direct control over the number and the length of the subsequences reproduced in the sampled time series. Even for large discretization steps, LGB significantly reduces the reproduction of such subsequences compared to LB. Furthermore, some statistics seem to be better captured by the LGB than by the LB; in particular, this seems to be the case for the invariant densities and the probabilities of sojourn within a compact subspace of the state space for the autoregressive examples that have been tested. Finally, the LGB is applicable to the sampling of more complex multivariate processes. Tests performed on the chaotic Lorentz oscillator show that the shape of the attractor is well captured, although some residual sampling noise prevents the extraction of fine details of the fractal structure of the attractor.
Fig. 8. Sampled trajectories of the Lorentz attractor. On the left: 3D trajectory. On the right: 1D time series for the same signal.
Appendix. Proofs

Proof of Theorem 1. For $X_1, X_2, \ldots, X_T$ given, let $\mathcal{B}(\tilde E_T)$ be the $\sigma$-field of all finite subsets of $\tilde E_T = (\cup_{k=1}^T G^{(k)}_{Y_T})^p$, where $(\cup_{k=1}^T G^{(k)}_{Y_T})^p$ denotes the cartesian product of $p$ copies of $\cup_{k=1}^T G^{(k)}_{Y_T}$. Let $\Omega_T = \prod_{k=1}^\infty \tilde E_{k,T}$, where $\tilde E_{k,T}$ is a copy of $\tilde E_T$ equipped with a copy of the $\sigma$-field $\mathcal{B}(\tilde E_T)$, and denote by $\mathcal{A}_T$ the product $\sigma$-field on $\Omega_T$. Consider now the Markov chain $\hat Y = \{\hat Y_t,\ t \geq 1\}$ on the path space $(\Omega_T, \mathcal{A}_T, \hat P_{\mu_0})$ defined as follows: for any $A \in \mathcal{A}_T$ and for $\mu_0$ an appropriate initial distribution, $\hat P_{\mu_0}(\hat Y \in A)$ describes the probability of the event $\hat Y \in A$ when $\mathcal{L}(\hat Y_1) = \mu_0$. Apart from the initial distribution $\mu_0$, $\hat P_{\mu_0}$ is determined by the transition probability kernel $P(x,z) = P(\hat Y_{t+1} = z \mid \hat Y_t = x)$, which is obtained from Eqs. (4) and (5). Indeed, for all $x = (x_p, \ldots, x_1) \in \tilde E_T$, there is a time $k$ such that $P(x, G^{(k)}_{Y_T}) > 0$ and

$$P(x,z) = \begin{cases} \dfrac{p(x, z')}{\sum_{j \in I[G^{(k)}_{Y_T}]} p(x, z_j)} & \text{if } z' \in G^{(k)}_{Y_T}, \\[2mm] 0 & \text{otherwise,} \end{cases} \qquad (7)$$

where $z = (z', x_p, \ldots, x_2)$ and $p(\cdot)$ is defined in Eq. (5). Let us now define $E_T = (\cup_{k=p}^T G^{(k)}_{Y_T})^p \subset \tilde E_T$. For every kernel bandwidth $h > 0$ and every $x \in \tilde E_T$, we have
by definition

$$\sum_{z \in E_T} P(x,z) = \sum_{\{z' \,\mid\, z' \in G^{(k)}_{Y_T}\}} P(x,z) = 1.$$
Hence, $E_T$ is an absorbing communicating class. Furthermore, there exists $t_0 \in \mathbb{N}$ such that $P(\hat Y_{t_0} \in E_T \mid \hat Y_1 = x) = 1$ for all $x \in \tilde E_T \setminus E_T$ by Theorem 2.3 of Doob (1953). By Proposition 4.1.2 of Meyn and Tweedie (1993) it then follows that there exists an irreducible Markov chain, denoted by $\hat Y_T$, whose state space is restricted to $E_T$, with transition matrix $P_{E_T}$, i.e., the restriction of the matrix $P = (P(x,z))_{x,z \in \tilde E_T}$ to the class $E_T$. The positive recurrence of the chain is a simple consequence of $\sharp E_T < \infty$, where $\sharp A$ denotes the cardinality of $A$. The aperiodicity of the chain follows from the properties of the reference Markov chain $X$ and the fact that the kernels $K_d$ and $K_{dp}$ are positive everywhere for every bandwidth $h > 0$ by (K1).

Proof of Theorem 2. Assertion 1 is a consequence of the following lemma, which does most of the work for the Prohorov consistency of $\hat F_y$. Let us denote by $F_0$ the mapping $y \mapsto F_y(\cdot)$.

Lemma 1. Let $\phi$ be a bounded measurable function that is continuous on a set of $F_y$-probability 1. Then under conditions (i)–(iii) below
$$\int \phi(x)\,d\hat F_y(x) \to \int \phi(x)\,dF_y(x) \quad \text{in pr.}$$
(i) $F_0$ is Prohorov continuous at $y$; (ii) $W_y \to y$ in pr.; (iii) $n_y \to \infty$ in pr., where $n_y = (\sum_k W_k^2)^{-1}$ by definition.

We have to verify hypotheses (i)–(iii) of Lemma 1 in order to deduce Theorem 2. Hypothesis (i) is given by Assumption (A2). Hypothesis (ii) is trivial given the definition of the weights $W_k$, the properties (K1)–(K3) of the density kernels $K_d$ and $K_{dp}$, and $h_T \to 0$ as $T \to \infty$. Hypothesis (iii) is obtained if $\sum_k W_k^2$ tends to 0 as $n$ tends to infinity. Given the definition of the kernel $K_d$ (respectively $K_{dp}$), it is not restrictive to suppose that each term $p_j$ of the sums in the weight $W_k$ (Eqs. (4) and (5)) is bounded by two positive constants, $0 < C_1 \leq p_j \leq C_2$ for all $j$, on a compact neighborhood centered on $(X_{k+1}, \hat Y_t)$ and of volume
$\min(T h^{d(p+1)}, \mathrm{support}(K_d) \times \mathrm{support}(K_{dp}))$. Then,

$$\sum_k W_k^2 = \sum_{k \in I[G^{(t)}_{Y_T}]} \left( \frac{p_k}{\sum_{j \in I[G^{(t)}_{Y_T}]} p_j} \right)^2 \leq \frac{C_2^2\, \sharp I[G^{(t)}_{Y_T}]\, (\sharp I[\hat V^{(t)}_Y])^2}{C_1^2\, (\sharp I[G^{(t)}_{Y_T}])^2\, (\sharp I[\hat V^{(t)}_Y])^2} = \frac{C_2^2}{C_1^2\, \sharp I[G^{(t)}_{Y_T}]},$$
where $\sharp A$ denotes the cardinality of the set $A$. Now, by construction, the set $I[G^{(t)}_{Y_T}]$ contains the images $\{Y_i\}$ of the points in the local neighborhood, of diameter $\rho_T = c h_T$, of the last generated state $\hat Y_t$, together with the points of the grid defined on the image of this neighborhood with discretization step equal to $\delta_g$. Then, two positive constants $C$ and $C'$ exist such that

$$\sharp I[G^{(t)}_{Y_T}] \sim C f(\hat Y_t)\, T \rho_T^{dp} + C' \frac{\rho_T^d}{\delta_g^d}. \qquad (8)$$
The constant $C'$ depends on the regularity of the Markov chain operator $H: Y_t \mapsto X_{t+1}$. It implies that $n_y = (\sum_k W_k^2)^{-1} \to \infty$ when $T \to \infty$. It may be observed in Eq. (8) that the first term suffices to ensure the weak convergence, but the second term permits a faster convergence; the second term corresponds to the number of grid points added before sampling.

Assertion 2 is derived by means of the same arguments as those of the proof of Theorem 3.3 in Paparoditis and Politis (2002). The idea is to extract a subsequence of the sampled bootstrap. If $\hat F = \hat F_n$ is the stationary distribution of the bootstrap sequence $\hat Y$ for a given length $n$, by Helly's selection theorem (Billingsley, 1995), there exists a subsequence $\{\hat F_k\}_k$ such that for all continuity points $y \in \mathbb{R}^{dp}$ of $G$,

$$\lim_{k \to \infty} \hat F_k(y) = G(y),$$
where $G$ is a right-continuous, nondecreasing function from $\mathbb{R}^{dp}$ into $[0,1]$. Since the transition probability density function of the reference Markov process is strictly positive on the compact $S$, the sequence $\{\hat F_n\}_n$ is tight and $G$ is also a distribution function. Let $g: \mathbb{R}^d \times \mathbb{R}^{dp} \to \mathbb{R}$ be any bounded and continuous function; we get
$$\int g(x,y)\,d\hat F(x,y) = \int\!\!\int g(x,y)\,\hat F_y(dx)\,d\hat F(y) = \int\!\!\int g(x,y)\,F_y(dx)\,d\hat F(y) + \int\!\!\int g(x,y)\,(\hat F_y(dx) - F_y(dx))\,d\hat F(y) \to \int\!\!\int g(x,y)\,F_y(dx)\,dG(y)$$
as $k \to \infty$, where the last convergence follows from the first assertion of the theorem, the continuity of $y \mapsto \int g(x,y)\,F_y(dx)$ and the weak convergence of $\hat F_k$ to $G$. Thus $G$ is the stationary distribution of $Y$, from which it follows that $G = F$ by the uniqueness of $F$. Since the above is true for any subsequence $\{\hat F_k\}_k$, we conclude by a corollary in Billingsley (1995, p. 381).

Proof of Lemma 1. We follow Owen's scheme of proof in Owen (1987), where the same result is demonstrated for independent variables. Let $\bar\phi = \int \phi(x)\,dF_y(x)$ and $\bar\phi_i = \int \phi(x)\,dF_{y_i}(x)$. Let $B = \sup_x |\phi(x)|$ and $\varepsilon > 0$. Then

$$\int \phi(x)\,d\hat F_y(x) - \int \phi(x)\,dF_y(x) = \sum_{i=1}^T W_i(\phi(\hat X_i) - \bar\phi) - \bar\phi\Bigl(1 - \sum_{i=1}^n W_i\Bigr). \qquad (9)$$
The second term in (9) converges to 0 weakly under (ii). By the continuous mapping theorem (Billingsley, 1968) there is an open set $O \subset S$ such that $y_i \in O$ implies $|\bar\phi_i - \bar\phi| < \varepsilon$. The first term of (9) may now be written

$$\sum_i W_i(\phi(X_i) - \bar\phi) = \sum_{y_i \in O} W_i(\phi(X_i) - \bar\phi) + \sum_{y_i \notin O} W_i(\phi(X_i) - \bar\phi). \qquad (10)$$
The second term in (10) converges weakly to 0 under (i) and (ii) because $|\phi(X_i) - \bar\phi| \leq 2B$. Let $|W| = \sum_i |W_i|$. Conditionally on the $Y$s, the first term in (10) has expectation bounded in absolute value by $\varepsilon |W|$ and variance bounded by $8B^2/n_y$. Indeed, conditionally on the sample $X$,

$$\mathrm{Var}\Bigl( \sum_{y_i \in O} W_i(\phi(X_i) - \bar\phi) \,\Big|\, X \Bigr) \leq \mathbb{E}\Bigl[ \Bigl( \sum_{y_i \in O} W_i(\phi(X_i) - \bar\phi) \Bigr)^2 \Big|\, X \Bigr] \leq \sum_{y_i \in O} W_i^2\, 4B^2 + \sum_{y_i \in O} \sum_{y_j \in O,\, j \neq i} W_i W_j\, \mathbb{E}[(\phi(X_i) - \bar\phi)(\phi(X_j) - \bar\phi)] \leq \frac{8B^2}{n_y},$$

applying $(W_i - W_j)^2 = W_i^2 + W_j^2 - 2 W_i W_j > 0$.
If $|W| < 2$ and $n_y > 8B^2/\varepsilon^3$, then by the Chebyshev inequality the conditional probability that

$$\Bigl| \sum_{y_i \in O} W_i(\phi(X_i) - \bar\phi) \Bigr| > 3\varepsilon \qquad (11)$$

is less than $\varepsilon$. It follows that the unconditional probability satisfies

$$P\Bigl( \Bigl| \sum_{y_i \in O} W_i(\phi(X_i) - \bar\phi) \Bigr| > 3\varepsilon \Bigr) < P(n_y \leq 8B^2/\varepsilon^3) + P(|W| \geq 2) + \varepsilon \to \varepsilon$$

by (ii) and (iii). This establishes the result of Lemma 1.
References

Ailliot, P., Prevosto, M., Soukissian, T., Diamanti, C., Theodoulides, A., Politis, C., 2003. Simulation of sea state parameters process to study the profitability of a maritime line. Proceedings of the ISOPE Conference 2003.
Billingsley, P., 1968. Convergence of Probability Measures. Wiley, NY.
Billingsley, P., 1995. Probability and Measure. Wiley, NY.
Bühlmann, P., 2002. Bootstraps for time series. Statist. Sci. 17, 52–72.
Chan, K.S., Tong, H., Stenseth, N.C., 1997. Analyzing abundance data from periodically fluctuating rodent populations by threshold models: a nearest block bootstrap approach. Technical Report No. 258, Department of Statistics and Actuarial Science, University of Iowa.
Doob, J.L., 1953. Stochastic Processes. Wiley, NY.
Efron, B., 1979. Bootstrap methods: another look at the jackknife. Ann. Statist. 7, 1–26.
Efron, B., Tibshirani, R., 1993. An Introduction to the Bootstrap. Chapman & Hall, NY.
Falk, M., Reiss, R.D., 1989. Bootstrapping conditional curves. In: Jöckel, K.H., Rothe, G., Sendler, W. (Eds.), Bootstrapping and Related Techniques. Lecture Notes in Economics and Mathematical Systems, vol. 376. Springer, NY.
Franke, J., Härdle, W., 1992. On bootstrapping kernel spectral estimates. Ann. Statist. 20, 120–145.
Freedman, D.A., 1984. On bootstrapping two-stage least squares estimates in stationary linear models. Ann. Statist. 12, 827–842.
Härdle, W., Horowitz, J., Kreiss, J.P., 2003. Bootstrap methods for time series. Internat. Statist. Rev. 71, 435–459.
Horowitz, J.L., 2003. Bootstrap methods for Markov processes. Econometrica 71, 1049–1082.
Künsch, H.R., 1989. The jackknife and the bootstrap for general stationary observations. Ann. Statist. 17, 1217–1241.
Lall, U., Sharma, A., 1996. A nearest neighbor bootstrap for resampling hydrologic time series. Water Resour. Res. 32, 679–693.
Meyn, S.P., Tweedie, R.L., 1993. Markov Chains and Stochastic Stability. Springer, London.
Monbet, V., Marteau, P.F., 2001. Continuous space discrete time Markov models for multivariate sea state parameter processes. Proceedings of the ISOPE Conference 2001.
Owen, A.B., 1987. Nonparametric conditional estimation. Ph.D. Dissertation, Stanford University.
Paparoditis, E., Politis, D.N., 2001a. Tapered block bootstrap. Biometrika 88 (4), 1105–1119.
Paparoditis, E., Politis, D.N., 2001b. A Markovian local resampling scheme for nonparametric estimators in time series analysis. Econometric Theory 17 (3), 540–566.
Paparoditis, E., Politis, D.N., 2002. The local bootstrap for Markov processes. J. Statist. Plan. Inf. 108, 301–328.
Politis, D.N., Romano, J.P., 1994. The stationary bootstrap. J. Amer. Statist. Assoc. 89, 1303–1313.
Politis, D.N., White, H., 2004. Automatic block-length selection for the dependent bootstrap. Econometric Rev. 23 (1), 53–70.
Shao, J., Tu, D., 1995. The Jackknife and Bootstrap. Springer, NY.
Shi, S.G., 1991. Local bootstrap. Ann. Inst. Statist. Math. 43, 667–676.
van der Vaart, A.W., 1998. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.