Mathematical and Computer Modelling 34 (2001) 1139-1144
www.elsevier.com/locate/mcm

A Note on Filtering for Long Memory Processes

A. THAVANESWARAN
Department of Statistics, The University of Manitoba, Winnipeg, Manitoba, Canada R3T 2N2

C. C. HEYDE
Columbia University and Australian National University, Canberra, ACT 0200, Australia

Abstract—This paper illustrates the use of quasi-likelihood methods of inference for a class of possibly long-memory processes such as H-sssi (self-similar stationary increments) processes and long-range dependent sequences. In particular, they can be used in a general derivation without assuming normality of the process; this extends the result of Gripenberg and Norros [1]. Recursive filtering for models with linear intensity is also discussed in some detail. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords—Filtering, Long-memory, Estimating function, Prediction

1. INTRODUCTION

The problem of LRD (long-range dependence) in statistical applications was known to scientists and applied statisticians long before suitable models were formulated. Parsimonious models with such behavior are stationary processes with nonsummable correlations. Many classical limit theorems do not hold for these processes, and rates of convergence are slower than under independence or weak dependence. The phenomenon is now well known in hydrology under the name "Hurst effect". LRD cannot be modeled by any of the standard models such as ARMA processes, since for these the correlations are summable and decay exponentially rather than hyperbolically. For data where the sample correlations indicate a decay of the order $|k|^{-\alpha}$ $(0 < \alpha < 1)$, one would have to choose an ARMA process of very high order. Such dependence can be modeled in a parsimonious way by stationary processes with covariances $\gamma_k = \mathrm{Cov}(X_t, X_{t+k})$ satisfying
$$\gamma_k \sim L_1(k)\,|k|^{2H-2}, \qquad |k| \to \infty, \quad H \in (1/2, 1) \quad (\alpha = 2 - 2H),$$
where $L_1(\cdot)$ is slowly varying; for $H \in (1/2, 1)$, the correlations are not summable. Statistical inference for long-memory processes is still in its early development (e.g., [2,3]). Even for the simple problems of location estimation and regression, considerable mathematical difficulties arise. Note that to calculate the best linear unbiased estimator (BLUE) of the location parameter, one would have to know or estimate all covariances. Therefore, for practical purposes, the sample mean is to be preferred [3-5].


Gripenberg and Norros [1] have used the fractional Brownian motion to model long-range dependence and studied the prediction problem in the continuous time case. Because fractional Brownian motion is a specific one-parameter Gaussian process, its covariances are determined at all lags. This lack of flexibility can be a problem if one wants to use fractional Brownian motion to model a real-life phenomenon. In Section 2, filtering for long-memory random sequences is treated. In Section 3, some results of Samorodnitsky and Taqqu [6] on generating long-range dependence (LRD) via self-similar processes are given and are applied to continuous time LRD processes.

2. FILTERING FOR LONG-MEMORY PROCESSES (DISCRETE TIME)

In this section, we consider the problem of filtering in LRD stochastic processes, also known as processes in random environments, state space models, doubly stochastic processes, or hidden processes driven by a random long-memory process. These models offer a fair amount of flexibility in understanding and modeling time series data and are receiving considerable attention in the literature. Our approach to the problem of filtering is semiparametric and is based on the theory of estimating functions.

Consider a possibly LRD process having the form $X_t = \theta_t + \varepsilon_t$, $t = 1, \ldots, n$, where $\{\varepsilon_t\}$ is a possibly LRD sequence with zero mean and known covariance structure. Estimation of $\theta_t$, $t = 1, 2, \ldots, n$, from $n$ observations on $\{X_t\}$ without making any restrictions on the parameter sequence is an ill-posed problem. Let us assume that $\theta_t$ follows a random walk $\theta_t = \theta_{t-1} + a_t$, where $\{a_t\}$ is a white noise series having mean zero and variance $\sigma_a^2$. The following theorem on optimal estimation of $\theta_t$, obtained by identifying the sources of variation, gives the filtering formula for $\theta_t$. Let $\mathcal{F}^X_{t-1}$ be the $\sigma$-field generated by $X_1, \ldots, X_{t-1}$.

THEOREM 2.1. The optimal estimate of $\theta_t$ based on $X_1, \ldots, X_t$ is given by
$$\hat\theta_{t|t} = \hat\theta_{t|t-1} + \frac{P_{t|t-1}}{r_t}\left(X_t - \hat X_{t|t-1}\right),$$
where $P_{t|t-1} = E[(\theta_t - \hat\theta_{t|t-1})^2 \mid \mathcal{F}^X_{t-1}]$, $\hat\theta_{t|t-1}$ is an estimate of $\theta_t$ based on $X_1, \ldots, X_{t-1}$, $\hat X_{t|t-1}$ is the predictor of $X_t$ based on $X_1, \ldots, X_{t-1}$, and $r_t$, its mean square error, is
$$r_t = E\left[\left(X_t - \hat X_{t|t-1}\right)^2\right].$$
Moreover, $P_{t|t-1} = P_{t-1|t-1} + \sigma_a^2$ and
$$\frac{1}{P_{t|t}} = \frac{1}{P_{t|t-1}} + \frac{1}{r_t}.$$

PROOF. Follows by combining the optimal estimating functions as in [7,8]. The elementary estimating function for fixed $\theta_t$ based on the $t$-th observation is $h_{1t} = (X_t - \hat X_{t|t-1})/r_t$, and the corresponding information associated with $h_{1t}$ is $1/r_t$. It is of interest to note (cf. [9, p. 472]) that under the Gaussian assumption, the corresponding elementary estimating function for the parameter of interest, say $\theta$, is
$$\frac{X_t - \hat X_{t|t-1}}{r_t}\,\frac{\partial \hat X_{t|t-1}}{\partial \theta}.$$
Now $h_{2t}$, the corresponding estimating function for $\theta_t$ based on prior information, is
$$h_{2t} = \frac{\theta_t - \hat\theta_{t|t-1}}{P_{t|t-1}}.$$
Combining the estimating functions $h_{1t}$, $h_{2t}$, the theorem follows.

NOTES.
1. Formulas to compute the linear predictors for $X_t$ can be found in, for example, [9].
2. The superiority of the estimating function approach when the variance of $\theta_t$ is random over the case with nonrandom variance is clear. This was also noted in [8].
3. The theorem is true regardless of LRD or SRD. However, the following example shows that the recursive computation of the so-called innovation variance $r_t$ involves all the covariances, as the recursive estimate does not have the Markov property.

EXAMPLE 2.2. Suppose $\{\varepsilon_t\}$ has mean zero and covariance $r(t,s)$.

Then the optimal estimate of $\theta_t$ at time $t$ is given by
$$\hat\theta_{t|t} = \hat\theta_{t|t-1} + \frac{P_{t|t-1}}{r_t}\left(X_t - \hat X_{t|t-1}\right),$$
with
$$\frac{1}{P_{t|t}} = \frac{1}{P_{t|t-1}} + \frac{1}{r_t} \quad \text{and} \quad P_{t|t-1} = P_{t-1|t-1} + \sigma_a^2.$$
When $r(t,s)$ is known, $\hat X_{t|t-1}$ could also be calculated recursively using
$$\hat X_{t|t-1} = \sum_{s=1}^{t-1} \frac{r(t,s)}{r(s,s)}\, X_s \quad \text{and} \quad r_t = r(t,t) - \sum_{s=1}^{t-1} \frac{r^2(t,s)}{r(s,s)}.$$

Moreover, when $\varepsilon_t$ is a short-memory process, the above recursive algorithm corresponds to the Kalman filtering algorithm for the state space model
$$\theta_t = \theta_{t-1} + a_t, \qquad X_t = \theta_t + \varepsilon_t,$$
with independent errors. It is possible to investigate the asymptotics of, e.g., $P_{t|t}$ under various assumptions. For example, if $r_t = c$ for all $t$, then $P_{t|t} \to \text{constant}$ as $t \to \infty$, etc.
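To make the recursion concrete, the following minimal sketch (ours, not from the paper) transcribes the formulas of Theorem 2.1 and Example 2.2 into Python. The fractional-Gaussian-noise covariance, the parameter values, and all variable names are our assumptions for illustration.

```python
import numpy as np

# Sketch (ours) of the recursive filter of Theorem 2.1 / Example 2.2.
# Assumed setup: LRD noise eps with fractional-Gaussian-noise covariance
# r(t,s) = 0.5*(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}), k = t - s (unit
# variance; see the NOTE after Lemma 3.1), and sigma_a^2 = 0.05.
rng = np.random.default_rng(0)
n, H, sig2_a = 200, 0.7, 0.05

def fgn_cov(k, H=H):
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * ((k + 1)**(2*H) - 2*k**(2*H) + np.abs(k - 1)**(2*H))

# Simulate X_t = theta_t + eps_t with a random-walk signal theta_t.
R = fgn_cov(np.subtract.outer(np.arange(n), np.arange(n)))
eps = np.linalg.cholesky(R + 1e-10 * np.eye(n)) @ rng.standard_normal(n)
theta = np.cumsum(np.sqrt(sig2_a) * rng.standard_normal(n))
X = theta + eps

theta_hat, P = np.zeros(n), np.zeros(n)
th_prev, P_prev = 0.0, 1.0
for t in range(n):
    # Example 2.2: predictor of X_t and innovation variance r_t; both use
    # r(t,s) at ALL past lags, so the recursion is not Markovian.
    w = R[t, :t] / np.diag(R)[:t] if t else np.empty(0)
    X_pred = w @ X[:t] if t else 0.0
    r_t = R[t, t] - (w @ R[t, :t] if t else 0.0)
    P_pred = P_prev + sig2_a                   # P_{t|t-1} = P_{t-1|t-1} + sigma_a^2
    theta_hat[t] = th_prev + (P_pred / r_t) * (X[t] - X_pred)
    P[t] = 1.0 / (1.0 / P_pred + 1.0 / r_t)    # 1/P_{t|t} = 1/P_{t|t-1} + 1/r_t
    th_prev, P_prev = theta_hat[t], P[t]

print("filtering MSE:", float(np.mean((theta_hat - theta) ** 2)))
```

The weights $r(t,s)/r(s,s)$ and the sum defining $r_t$ are direct transcriptions of the Example 2.2 formulas; for short-memory $\varepsilon_t$, the same loop reduces to the standard Kalman recursion.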

3. CONTINUOUS TIME CASE

In this section, we first give a lemma regarding the construction of LRD via self-similar processes. Then some properties of integrals with respect to long-memory processes are listed and applied to non-Gaussian LRD processes to obtain predictors.

A continuous time process $\{Z(t), -\infty < t < \infty\}$ is self-similar with exponent $H > 0$ if, for all $a > 0$, $\{Z(at), -\infty < t < \infty\}$ and $\{a^H Z(t), -\infty < t < \infty\}$ have the same finite-dimensional distributions. That is,
$$P\{Z(at_1) \le z_1, \ldots, Z(at_n) \le z_n\} = P\left\{a^H Z(t_1) \le z_1, \ldots, a^H Z(t_n) \le z_n\right\},$$
for all $a > 0$, $n = 1, 2, \ldots$, and $t_1, t_2, \ldots, t_n$. A process $\{Z(t), -\infty < t < \infty\}$ is H-sssi if it is self-similar with exponent $H > 0$ and if, in addition, it possesses stationary increments, that is, if the finite-dimensional distributions of the process $\{Z(t+s) - Z(s), -\infty < t < \infty\}$ do not depend on $s$. For example, Brownian motion is a 1/2-sssi process.
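As a quick illustration of the definition (ours, not from the paper; the grid, sample size, and unit-variance normalization $C = 1$ are assumptions), one can sample fractional Brownian motion, the Gaussian H-sssi process, from the covariance of Proposition 3.2 below and verify the self-similar scaling $\mathrm{Var}\,Z(t) = t^{2H}\,\mathrm{Var}\,Z(1)$.

```python
import numpy as np

# Sketch (ours): sample fBm paths via Cholesky of the H-sssi covariance
# r(t,s) = 0.5*(|t|^{2H} + |s|^{2H} - |t-s|^{2H})  (Proposition 3.2, C = 1),
# then check Var Z(t) = t^{2H} by Monte Carlo -- the variance scaling
# implied by self-similarity with exponent H.
rng = np.random.default_rng(1)
H, n, n_paths = 0.75, 128, 2000
t = np.arange(1, n + 1) / n                       # grid on (0, 1]

r = 0.5 * (t[:, None]**(2*H) + t[None, :]**(2*H)
           - np.abs(t[:, None] - t[None, :])**(2*H))
L = np.linalg.cholesky(r + 1e-10 * np.eye(n))
Z = rng.standard_normal((n_paths, n)) @ L.T       # rows are fBm paths

print("max |Var Z(t) - t^{2H}| over the grid:",
      float(np.max(np.abs(Z.var(axis=0) - t**(2*H)))))
```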

LEMMA 3.1. The following statements are true.

(i) $H < 1$;
(ii) $X_t = Z(t+1) - Z(t)$, the one-step increments of $Z(t)$ (H-sssi), are stationary, and $\gamma_k \sim E X_1^2\, H(2H-1)\, k^{2H-2}$ as $k \to \infty$; i.e., $\{X_t\}$ is an LRD sequence.

PROOF. Follows from Mandelbrot and Van Ness [10].

NOTE. The autocovariances of the stationary sequence $\{X_t\}$ satisfy the following:
$$\gamma_k = E X_t X_{t+k} = \frac{E X_1^2}{2}\left\{|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\right\}$$
for $k \ge 0$ and hence satisfy $\gamma_k \sim E X_1^2\, H(2H-1)\, k^{2H-2}$ as $k \to \infty$.
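A brief check (ours; the value of $H$ is arbitrary and $E X_1^2$ is normalized to one) confirms this asymptotic form numerically:

```python
import numpy as np

# Sketch (ours): exact fGn autocovariance from the NOTE above versus its
# asymptote gamma_k ~ H(2H-1) k^{2H-2}; the ratio should tend to 1, and
# sum(gamma_k) diverges for H in (1/2, 1), i.e., {X_t} is LRD.
H = 0.75
k = np.arange(1.0, 10001.0)
gamma = 0.5 * ((k + 1)**(2*H) - 2*k**(2*H) + (k - 1)**(2*H))
asym = H * (2*H - 1) * k**(2*H - 2)
print("gamma_k / asymptote at k = 10, 100, 10000:",
      gamma[9] / asym[9], gamma[99] / asym[99], gamma[-1] / asym[-1])
```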

NOTE.
(i) $\gamma_k = 0$ for $H = 1/2$ and is negative if $H < 1/2$.
(ii) The Gaussian assumption was not made to obtain the covariance.

PROPOSITION 3.2. Let $\{Z_t\}$ be H-sssi for $-\infty < t < \infty$, with the autocovariance function
$$r(t,s) = \frac{C}{2}\left\{|t|^{2H} + |s|^{2H} - |t-s|^{2H}\right\},$$
where $C = E Z^2(1)$. Then the following statements are true.

(i) $\mathrm{Cov}(Z_s, dZ_t) = \dfrac{\partial}{\partial t} r(s,t)\, dt$.
(ii) $\mathrm{Cov}(dZ_s, dZ_t) = \dfrac{\partial^2}{\partial s\,\partial t} r(s,t)\, ds\, dt$.

PROOF.
$$\begin{aligned}
\mathrm{Cov}(Z_s, dZ_t) &= E[Z_s\, dZ_t] = \lim_{h \to 0} E\left[Z_s\, \frac{Z_{t+h} - Z_t}{h}\right] dt \\
&= \lim_{h \to 0} \frac{r(s, t+h) - r(s,t)}{h}\, dt \\
&= \frac{C}{2} \lim_{h \to 0} \frac{1}{h}\left[|t+h|^{2H} - |t|^{2H} - \left(|t+h-s|^{2H} - |t-s|^{2H}\right)\right] dt \\
&= \frac{C}{2}\left[2H|t|^{2H-1} - 2H|t-s|^{2H-1}\right] dt,
\end{aligned}$$
i.e.,
$$\mathrm{Cov}(Z_s, dZ_t) = CH\left[|t|^{2H-1} - |t-s|^{2H-1}\right] dt = \frac{\partial}{\partial t} r(s,t)\, dt.$$
Similarly,
$$\mathrm{Cov}(dZ_s, dZ_t) = \frac{\partial^2}{\partial s\,\partial t} r(s,t)\, ds\, dt = CH(2H-1)\,|s-t|^{2H-2}\, ds\, dt.$$

NOTE. We have extended and simplified the method of Gripenberg and Norros [1] by (i) allowing $\{Z_t\}$ to be H-sssi, and (ii) avoiding the use of the complicated integral representation with respect to Brownian motion, which will not, in general, be available for H-sssi processes.

PROPOSITION 3.3. For $f, g \in L^2(\mathbb{R})$, we have
$$E\left(\int f(s)\, dZ_s \int g(t)\, dZ_t\right) = CH(2H-1)\iint f(s)\, g(t)\,|s-t|^{2H-2}\, ds\, dt,$$
where $\{Z_t\}$ is H-sssi (and not necessarily Gaussian).

PROOF. Follows by using Proposition 3.2.


3.1. Optimal Prediction Equation

The following theorem gives the optimal form of the predictor $\hat Z_{a,T}$ for each $a > 0$, based on the observations up to time $0$ (i.e., based on $Z_t$, $-T \le t \le 0$). Let $\mathcal{G} = \{G_T : G_T = Z_a - \int_{-T}^{0} g_T(a,t)\, dZ_t\}$ be a class of unbiased estimating functions $G_T$.

THEOREM 3.4. The optimal predictor in the class $\mathcal{G}$ can be represented as an integral
$$\hat Z_{a,T} = \int_{-T}^{0} g^*_{\mathcal{G}}(a,t)\, dZ_t,$$
where $g^*_{\mathcal{G}}(a,\cdot)$ is a solution of the integral equation
$$(2H-1)\int_0^T g^*_{\mathcal{G}}(a,-t)\,|t-s|^{2H-2}\, dt = (a+s)^{2H-1} - s^{2H-1}, \qquad s \in (0,T),$$
and the variance of $\hat Z_{a,T}$ is given by
$$\mathrm{Var}\left[\hat Z_{a,T}\right] = CH(2H-1)\int_0^T\!\!\int_0^T g^*_{\mathcal{G}}(a,-s)\, g^*_{\mathcal{G}}(a,-t)\,|s-t|^{2H-2}\, ds\, dt = CH\int_0^T g^*_{\mathcal{G}}(a,-t)\left((a+t)^{2H-1} - t^{2H-1}\right) dt.$$

PROOF. The proof is very similar to the one given in [11]. The optimal $G^*_T$ in the class of estimating functions $\mathcal{G}$ is given by
$$G^*_T = Z_a - \int_{-T}^{0} g^*_{\mathcal{G}}(a,t)\, dZ_t,$$
where $g^*_{\mathcal{G}}(a,t)$ satisfies the integral equation
$$E\left[G_T G^*_T\right] = E\left[(G^*_T)^2\right], \qquad \forall\, G_T \in \mathcal{G},$$
or equivalently, by Propositions 3.2 and 3.3,
$$(2H-1)\int_0^T g^*_{\mathcal{G}}(a,-t)\,|t-s|^{2H-2}\, dt = (a+s)^{2H-1} - s^{2H-1}, \qquad s \in (0,T),$$

and the variance formula follows from the properties of the integrals given in Proposition 3.3.

NOTE. The predictor depends on $g^*_{\mathcal{G}}(a,-t)$, which satisfies an integral equation whose solution is complicated. In practice, one may use the fact that $X_t = Z(t+1) - Z(t)$ is stationary and use the corresponding prediction formula for $X_t$ from Section 2 to come up with a predictor for $Z_a$.
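Alternatively, the integral equation can be attacked numerically. The sketch below (ours, not from the paper; the midpoint quadrature, the exact diagonal-cell integral, and all parameter values are assumptions) discretizes the equation of Theorem 3.4 into a linear system for $g^*_{\mathcal{G}}(a,-t)$ on a grid and evaluates the second form of the variance (taking $C = 1$).

```python
import numpy as np

# Sketch (ours): midpoint discretization of the Theorem 3.4 equation
#   (2H-1) int_0^T g(a,-t) |t-s|^{2H-2} dt = (a+s)^{2H-1} - s^{2H-1}.
# The kernel is singular but integrable for H in (1/2, 1); the diagonal
# entry uses the exact integral of the kernel over its own grid cell.
H, T, a, m = 0.75, 1.0, 0.5, 400
dt = T / m
t = (np.arange(m) + 0.5) * dt                     # midpoints of [0, T]

D = np.abs(t[:, None] - t[None, :]) + np.eye(m)   # dummy 1s on the diagonal
K = (2*H - 1) * D**(2*H - 2) * dt
np.fill_diagonal(K, 2.0 * (dt / 2.0)**(2*H - 1))  # (2H-1)*2*int_0^{dt/2} u^{2H-2} du
rhs = (a + t)**(2*H - 1) - t**(2*H - 1)
g = np.linalg.solve(K, rhs)                       # g[i] approximates g*(a, -t_i)

# Variance of the optimal predictor, second form of Theorem 3.4 (C = 1).
var_hat = H * np.sum(g * rhs) * dt
print("approximate Var[Z_hat_{a,T}]:", float(var_hat))
```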

3.2. Filtering Theorem

In Section 3.1, we looked at the prediction of an LRD process based on observed values over a period of time. Here, we consider the state space form of the processes, with the observed process an LRD process and the parameter process a diffusion process. In analogy with the discrete time filtering problem, we obtain the filtered estimate of the parameter process $\theta_t$.


THEOREM 3.5. A direct analogue of the model in Theorem 2.1 might be
$$d\theta_t = a(t,X)\,\theta_t\, dt + b(t,X)\, dW_t, \qquad X_t = \theta_t + \varepsilon_t,$$
where $\varepsilon_t$ is a continuous time self-similar process with exponent $H > 0$, $W_t$ is a Wiener process, and $a(t,X)$, $b(t,X)$ are measurable functions of $X_t$. Then the optimal filtering equations are given by
$$d\hat\theta_t = a(t,X)\,\hat\theta_t\, dt + \frac{\gamma_t}{\mathrm{Var}\,\hat X_t}\left(dX_t - d\hat X_t\right)$$
and
$$\frac{d\gamma_t}{dt} = 2a(t,X)\,\gamma_t + b^2(t,X) - \frac{\gamma_t^2}{\mathrm{Var}\,\hat X_t},$$
where $\hat X_t$ and $\mathrm{Var}\,\hat X_t$ are the predictor and its variance, respectively, as in Theorem 3.4.

PROOF. The proof follows by optimal combination of estimating functions and observing that $\mathrm{MSE}[\hat\theta] = (\mathrm{bias})^2 + \mathrm{Var}(\hat\theta)$ for any estimate $\hat\theta$ and
$$\frac{d}{dt}\, E\left[\left(\theta_t - \hat\theta_t\right)^2 \,\middle|\, \mathcal{F}^X_t\right] = 2a(t,X)\,\gamma_t + b^2(t,X).$$

NOTE. As noted in Example 2.2, unlike for Markov processes or short memory processes, the algorithm is not recursive.
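For completeness, a rough sketch (ours, not from the paper) of how the filtering equations of Theorem 3.5 could be advanced on a time grid by an Euler scheme is given below. The constant coefficients $a_0$, $b_0$ and the placeholder inputs for $d\hat X_t$ and $\mathrm{Var}\,\hat X_t$ are assumptions; as the NOTE above stresses, a faithful implementation must recompute the predictor from all past data at each step rather than recursively.

```python
import numpy as np

# Sketch (ours): Euler steps for Theorem 3.5 with a(t,X) = a0, b(t,X) = b0.
# The increment dX_hat and variance v = Var(X_hat) should come from the
# Theorem 3.4 predictor; they are supplied as inputs to keep this short.
def filter_step(theta_hat, gamma, dX, dX_hat, v, a0, b0, dt):
    """One Euler step of d(theta_hat) and d(gamma)/dt from Theorem 3.5."""
    theta_hat += a0 * theta_hat * dt + (gamma / v) * (dX - dX_hat)
    gamma += (2.0 * a0 * gamma + b0**2 - gamma**2 / v) * dt
    return theta_hat, gamma

# Toy run with placeholder innovations; gamma_t settles near the root of
# 2*a0*gamma + b0^2 - gamma^2/v = 0, the continuous-time Riccati balance.
rng = np.random.default_rng(2)
a0, b0, dt, v = -0.5, 0.3, 0.01, 1.0
theta_hat, gamma = 0.0, 1.0
for _ in range(2000):
    dX = np.sqrt(dt) * rng.standard_normal()   # placeholder observation increment
    theta_hat, gamma = filter_step(theta_hat, gamma, dX, 0.0, v, a0, b0, dt)
print("limiting gamma_t:", gamma)
```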

REFERENCES
1. G. Gripenberg and I. Norros, On the prediction of fractional Brownian motion, J. Appl. Prob. 33, 400-410, (1996).
2. J. Beran, Statistical methods for data with long-range dependence, Statistical Science 7 (4), 404-427, (1992).
3. J. Beran, Statistics for Long-Memory Processes, Chapman & Hall, New York, (1994).
4. C.C. Heyde, Asymptotic efficiency results for the method of moments with application to estimation for queueing processes, In Queueing Theory and its Applications, (Edited by O.J. Boxma and R. Syski), North-Holland, Amsterdam, (1988).
5. C.C. Heyde, Some results on inference for stationary processes and queueing systems, In Queueing and Related Models, (Edited by U.N. Bhat and I.V. Basawa), Oxford Science Publications, Oxford, (1992).
6. G. Samorodnitsky and M.S. Taqqu, Linear models with long-range dependence and with finite or infinite variance, In New Directions in Time Series Analysis, Part II, pp. 325-340, Springer-Verlag, (1992).
7. C.C. Heyde, On combining quasi-likelihood estimating functions, Stochastic Processes and their Applications 25, 281-287, (1987).
8. V.P. Godambe, Linear Bayes and optimal estimation, Technical Report Series #11, University of Waterloo, (1994).
9. P.J. Brockwell and R.A. Davis, Time Series: Theory and Methods, Second Edition, Springer-Verlag, New York, (1991).
10. B.B. Mandelbrot and J.W. Van Ness, Fractional Brownian motions, fractional noises and applications, SIAM Review 10, 422-437, (1968).
11. A. Thavaneswaran and M.E. Thompson, A criterion for filtering in semimartingale models, Stochastic Processes and their Applications 28, 259-265, (1988).