Mathematical and Computer Modelling 34 (2001) 1139-1144
www.elsevier.com/locate/mcm
A Note on Filtering for Long Memory Processes

A. THAVANESWARAN
Department of Statistics, The University of Manitoba
Winnipeg, Manitoba, Canada R3T 2N2
C. C. HEYDE
Columbia University and Australian National University
Canberra, ACT 0200, Australia

Abstract—This paper illustrates the use of quasilikelihood methods of inference for a class of possibly long-memory processes such as H-sssi (self-similar stationary increments) processes and long-range dependent sequences. In particular, these methods can be used in a general derivation without assuming normality of the process; this extends the result of Gripenberg and Norros [1]. Recursive filtering for models with linear intensity is also discussed in some detail. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords—Filtering, Long-memory, Estimating function, Prediction
1. INTRODUCTION

The problem of LRD (long-range dependence) in statistical applications was known to scientists and applied statisticians long before suitable models were formulated. Parsimonious models with such behavior are stationary processes with nonsummable correlations. Many classical limit theorems do not hold for these processes, and rates of convergence are slower than under independence or weak dependence. Well known now is the phenomenon in hydrology, under the name "Hurst effect". LRD cannot be modeled by any of the standard models such as ARMA processes, since for these the correlations are summable and decay exponentially rather than hyperbolically. For data where the sample correlations indicate a decay of the order |k|^{-\alpha} (0 < \alpha < 1), one would have to choose an ARMA process of very high order. Such dependence can be modeled in a parsimonious way by stationary processes with covariances \gamma_k = \mathrm{Cov}(X_t, X_{t+k}) satisfying

\gamma_k \sim L_1(k) \, |k|^{2H-2}, \qquad |k| \to \infty, \quad H \in (1/2, 1) \quad (\alpha = 2 - 2H),

where L_1(·) is a slowly varying function; for H ∈ (1/2, 1), the correlations are not summable. Statistical inference for long-memory processes is still in its early development (e.g., [2,3]). Even for the simple problems of location estimation and regression, considerable mathematical difficulties arise. Note that to calculate the best linear unbiased estimator (BLUE) of the location parameter, one would have to know or estimate all covariances. Therefore, for practical purposes the sample mean is to be preferred [3-5].
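To make the contrast between hyperbolic and exponential correlation decay concrete, the following small numerical sketch (our own illustration, not part of the original paper) compares the autocovariances of a long-memory sequence with those of an AR(1) process; the covariance formula used for the LRD sequence is the one given in the NOTE following Lemma 3.1 below, and the AR(1) coefficient 0.9 is an arbitrary choice.

    # Hyperbolic vs. exponential autocovariance decay (pure Python).
    H = 0.9    # long-memory exponent, H in (1/2, 1)
    phi = 0.9  # AR(1) coefficient (arbitrary, for comparison)

    def gamma_lrd(k, H):
        # exact autocovariance of unit-variance fractional Gaussian noise
        return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + abs(k - 1) ** (2 * H))

    for k in (1, 10, 100, 1000):
        print(k, gamma_lrd(k, H), phi ** k)
    # The hyperbolic tail (of order k^{2H-2}) dominates phi^k by many orders
    # of magnitude for large k, and the sum over k diverges when H > 1/2.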
Gripenberg and Norros [1] have used the fractional Brownian motion to model long-range dependence and studied the prediction problem in the continuous time case. Because fractional Brownian motion is a specific one-parameter Gaussian process, its covariances are determined at all lags. This lack of flexibility can be a problem if one wants to use fractional Brownian motion to model a real-life long-range dependence (LRD) phenomenon. In Section 2, filtering for long-memory random sequences is treated. In Section 3, some results of Samorodnitsky and Taqqu [6] on generating LRD via self-similar processes are given and are applied to continuous time processes.
2. FILTERING FOR LONG-MEMORY PROCESSES (DISCRETE TIME)

In this section, we consider the problem of filtering in LRD stochastic state space models, also known as processes in random environments, doubly stochastic random processes, processes driven by a random process, or hidden long-memory processes. These models offer a fair amount of flexibility in understanding and modeling time series data and are receiving considerable attention in the literature. Our approach to the problem of filtering is semiparametric and is based on the theory of estimating functions.

Consider a possibly LRD process having the form X_t = θ_t + ε_t, t = 1, ..., n, where {ε_t} is a possibly LRD sequence with zero mean and known covariance structure. Estimation of θ_t, t = 1, 2, ..., n, from n observations on {X_t}, without making any restrictions on the parameter sequence, is an ill-posed problem. Let us assume that θ_t follows a random walk θ_t = θ_{t-1} + a_t, where {a_t} is a white noise series having mean zero and variance σ_a^2. The following theorem on optimal estimation of θ_t, obtained by identifying the sources of variation, gives the filtering formula for θ_t. Let F_{t-1}^X be the σ-field generated by X_1, ..., X_{t-1}.

THEOREM 2.1.
The optimal estimate of θ_t based on X_1, ..., X_t is given by

\hat\theta_{t|t} = \hat\theta_{t|t-1} + \frac{P_{t|t-1}}{r_t} \left( X_t - \hat X_{t|t-1} \right),

where P_{t|t-1} = E[(θ_t − θ̂_{t|t-1})^2 | F_{t-1}^X], θ̂_{t|t-1} is an estimate of θ_t based on X_1, ..., X_{t-1}, X̂_{t|t-1} is the predictor of X_t based on X_1, ..., X_{t-1}, and r_t, its mean square error, is

r_t = E \left[ \left( X_t - \hat X_{t|t-1} \right)^2 \right].

Moreover, P_{t|t-1} = P_{t-1|t-1} + σ_a^2 and

\frac{1}{P_{t|t}} = \frac{1}{P_{t|t-1}} + \frac{1}{r_t}.
PROOF. Follows by combining the optimal estimating functions as in [7,8]. The elementary estimating function for fixed θ_t based on the t-th observation is h_{1t} = (X_t − X̂_{t|t-1})/r_t, and the information associated with h_{1t} is 1/r_t. It is of interest to note (cf. [9, p. 472]) that under the Gaussian assumption, the corresponding elementary estimating function for the parameter of interest, say θ, is

\frac{X_t - \hat X_{t|t-1}}{r_t}.

Now h_{2t}, the corresponding estimating function for θ_t based on prior information, is

h_{2t} = \frac{\theta_t - \hat\theta_{t|t-1}}{P_{t|t-1}}.

Combining the estimating functions h_{1t}, h_{2t}, the theorem follows.
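As a minimal sketch of the recursions in Theorem 2.1 (the function and argument names are our own; the one-step predictor of X_t and its mean square error r_t are assumed to be computed separately, e.g., as in Example 2.2 below), a single filtering update might look as follows.

    def filter_step(theta_prev, P_prev, x_t, x_pred, r_t, sigma_a2):
        # theta_prev, P_prev : filtered estimate of theta_{t-1} and its MSE
        # x_t                : new observation X_t
        # x_pred, r_t        : one-step predictor of X_t and its MSE
        # sigma_a2           : variance of the random-walk noise a_t
        P_pred = P_prev + sigma_a2            # P_{t|t-1} = P_{t-1|t-1} + sigma_a^2
        theta = theta_prev + P_pred * (x_t - x_pred) / r_t
        P = 1.0 / (1.0 / P_pred + 1.0 / r_t)  # 1/P_{t|t} = 1/P_{t|t-1} + 1/r_t
        return theta, P

(Here θ̂_{t|t-1} = θ̂_{t-1|t-1}, because θ_t follows a random walk.)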
NOTES.
1. Formulas to compute the linear predictors for X_t can be found in, for example, [9].
2. The superiority of the estimating function approach when the variance of θ_t is random, over the case with nonrandom variance, is clear. This was also noted in [8].
3. The theorem is true regardless of LRD or SRD. However, the recursive computation of the so-called innovation variance r_t involves all the covariances, as the following example shows, and the recursive estimate does not have the Markov property.

EXAMPLE 2.2. Suppose {ε_t} has mean zero and covariance r(t,s). Then the optimal estimate of θ_t at time t is given by

\hat\theta_{t|t} = \hat\theta_{t-1|t-1} + \frac{P_{t|t-1}}{r_t} \left( X_t - \hat X_{t|t-1} \right),

with

\frac{1}{P_{t|t}} = \frac{1}{P_{t|t-1}} + \frac{1}{r_t}
\qquad \text{and} \qquad
P_{t|t-1} = P_{t-1|t-1} + \sigma_a^2.

When r(t,s) is known, X̂_{t|t-1} could also be calculated recursively using

\hat X_{t|t-1} = \sum_{s=1}^{t-1} \frac{r(t,s)}{r(s,s)} X_s
\qquad \text{and} \qquad
r_t = r(t,t) - \sum_{s=1}^{t-1} \frac{r^2(t,s)}{r(s,s)}.
Moreover, when ε_t is a short-memory process, the above recursive algorithm corresponds to the Kalman filtering algorithm for the state space model

\theta_t = \theta_{t-1} + a_t,
X_t = \theta_t + \epsilon_t,

with independent errors. It is possible to investigate the asymptotics of, e.g., P_{t|t} under various assumptions. For example, if r_t = c for all t, then P_{t|t} → constant as t → ∞, etc.
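The sketch below (our own construction) implements the Example 2.2 recursions literally, taking r(t,s) to be the fractional-Gaussian-noise autocovariance γ_{|t−s|} of Section 3 with unit variance; the initial values θ_0 = 0 and P_0 = 1 are arbitrary choices. Note that the simple expression for r_t above uses the raw observations X_s rather than innovations, so for H close to 1 it can become nonpositive; in that case the exact innovations algorithm of [9] should be used instead.

    import numpy as np

    def gamma_fgn(k, H):
        # autocovariance of unit-variance fractional Gaussian noise
        k = abs(k)
        return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + abs(k - 1) ** (2 * H))

    def lrd_filter(x, H, sigma_a2, theta0=0.0, P0=1.0):
        # Example 2.2 recursions with r(t, s) = gamma_{|t-s|}.  Unlike the
        # Kalman filter, the predictor of X_t uses the whole past, so the
        # algorithm is not Markovian (see Note 3 above).
        theta, P = theta0, P0
        out = np.empty(len(x))
        for t in range(len(x)):
            x_pred = sum(gamma_fgn(t - s, H) * x[s] for s in range(t)) / gamma_fgn(0, H)
            r_t = gamma_fgn(0, H) - sum(gamma_fgn(t - s, H) ** 2 for s in range(t)) / gamma_fgn(0, H)
            P_pred = P + sigma_a2
            theta = theta + P_pred * (x[t] - x_pred) / r_t
            P = 1.0 / (1.0 / P_pred + 1.0 / r_t)
            out[t] = theta
        return out

    # e.g., estimates = lrd_filter(np.random.randn(200), H=0.7, sigma_a2=0.01)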
3. CONTINUOUS TIME CASE
In this section, we first give a lemma regarding the construction of LRD via self-similar processes. Then some properties of integrals with respect to long-memory processes are listed and applied to non-Gaussian LRD processes to obtain predictors.

A continuous time process {Z(t), −∞ < t < ∞} is self-similar with exponent H > 0 if, for all a > 0, {Z(at), −∞ < t < ∞} and {a^H Z(t), −∞ < t < ∞} have the same finite-dimensional distributions. That is,

P\{Z(at_1) \le z_1, \ldots, Z(at_n) \le z_n\} = P\{a^H Z(t_1) \le z_1, \ldots, a^H Z(t_n) \le z_n\},

for all a > 0, n = 1, 2, ..., and t_1, t_2, ..., t_n. A process {Z(t), −∞ < t < ∞} is H-sssi if it is self-similar with exponent H > 0 and if, in addition, it possesses stationary increments; that is, if the finite-dimensional distributions of the process {Z(t+s) − Z(s), −∞ < t < ∞} do not depend on s. For example, Brownian motion is a 1/2-sssi process.
LEMMA 3.1. The following statements are true.

(i) H < 1;
(ii) X_t = Z(t+1) − Z(t), the one-step increments of the (H-sssi) process Z(t), are stationary, and γ_k ∼ E X_1^2 H(2H−1) k^{2H−2} as k → ∞; i.e., {X_t} is an LRD sequence.

PROOF. Follows from Mandelbrot and Van Ness [10].

NOTE. The autocovariances of the stationary sequence {X_t} satisfy the following:

\gamma_k = E[X_t X_{t+k}] = \frac{E X_1^2}{2} \left\{ |k+1|^{2H} - 2k^{2H} + |k-1|^{2H} \right\}

for k ≥ 0 and, hence, satisfy γ_k ∼ E X_1^2 H(2H−1) k^{2H−2} as k → ∞.
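A quick numerical check of the asymptotic relation in the preceding note (our own sketch; E X_1^2 is normalized to 1):

    H = 0.75  # any H in (1/2, 1)

    def gamma_k(k, H):
        # exact autocovariance from the NOTE above (E X_1^2 = 1)
        return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + (k - 1) ** (2 * H))

    for k in (10, 100, 1000, 10000):
        print(k, gamma_k(k, H) / (H * (2 * H - 1) * k ** (2 * H - 2)))
    # the printed ratios tend to 1, confirming gamma_k ~ H(2H-1) k^{2H-2}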
NOTE.
(i) γ_k = 0 for H = 1/2 and is negative if H < 1/2.
(ii) The Gaussian assumption was not made to obtain the covariance.

PROPOSITION 3.2. Let {Z_t} be H-sssi for −∞ < t < ∞, with the autocovariance function

r(t,s) = \frac{C}{2} \left\{ |t|^{2H} + |s|^{2H} - |t-s|^{2H} \right\},

where C = E Z^2(1). Then the following statements are true.

(i) \mathrm{Cov}(Z_s, dZ_t) = \frac{\partial}{\partial t} r(s,t) \, dt.
(ii) \mathrm{Cov}(dZ_s, dZ_t) = \frac{\partial^2}{\partial s \, \partial t} r(s,t) \, ds \, dt.

PROOF.

\mathrm{Cov}(Z_s, dZ_t) = E[Z_s \, dZ_t]
= \lim_{h \to 0} E \left[ Z_s \, \frac{Z_{t+h} - Z_t}{h} \right] dt
= \lim_{h \to 0} \frac{r(s, t+h) - r(s, t)}{h} \, dt
= \frac{C}{2} \lim_{h \to 0} \frac{1}{h} \left[ |t+h|^{2H} - |t+h-s|^{2H} - |t|^{2H} + |t-s|^{2H} \right] dt
= \frac{C}{2} \left[ 2H |t|^{2H-1} - 2H |t-s|^{2H-1} \right] dt,

i.e.,

\mathrm{Cov}(Z_s, dZ_t) = CH \left[ |t|^{2H-1} - |t-s|^{2H-1} \right] dt = \frac{\partial}{\partial t} r(s,t) \, dt.
Similarly,

\mathrm{Cov}(dZ_s, dZ_t) = \frac{\partial^2}{\partial s \, \partial t} r(s,t) \, ds \, dt = CH(2H-1) |s-t|^{2H-2} \, ds \, dt.
NOTE. We have extended and simplified the method of Gripenberg and Norros [1] by (i) allowing {Z_t} to be H-sssi, and (ii) avoiding the use of the complicated integral representation with respect to Brownian motion, which will not, in general, be available for H-sssi processes.

PROPOSITION 3.3. For f, g ∈ L^2(ℝ), we have

E \left( \int f(s) \, dZ_s \int g(t) \, dZ_t \right) = CH(2H-1) \int \! \int f(s) \, g(t) \, |s-t|^{2H-2} \, ds \, dt,

where {Z_t} is H-sssi (and not necessarily Gaussian).

PROOF. Follows by using Proposition 3.2.
3.1. Optimal Prediction Equation

The following theorem gives the optimal form of the predictor Ẑ_{a,T} of Z_a for each a > 0, based on the observations over an interval of length T ending at time 0 (i.e., based on Z_t, −T ≤ t ≤ 0). Let

G = \left\{ G_T : G_T = Z_a - \int_{-T}^{0} g_T(a,t) \, dZ_t \right\}

be a class of unbiased estimating functions G_T.

THEOREM 3.4. The optimal predictor in the class G can be represented as an integral

\hat Z_{a,T} = \int_{-T}^{0} g_G^*(a,t) \, dZ_t,

where g_G^*(a,t) is a solution of the integral equation

(2H-1) \int_0^T g_G^*(a,-t) \, |t-s|^{2H-2} \, dt = (a+s)^{2H-1} - s^{2H-1}, \qquad s \in (0,T),

and the variance of Ẑ_{a,T} is given by

\mathrm{Var} \left[ \hat Z_{a,T} \right]
= CH(2H-1) \int_0^T \! \int_0^T g_G^*(a,-s) \, g_G^*(a,-t) \, |s-t|^{2H-2} \, ds \, dt
= CH \int_0^T g_G^*(a,-t) \left( (a+t)^{2H-1} - t^{2H-1} \right) dt.
PROOF. The proof is very similar to the one given in [11]. The optimal G_T^* in the class of estimating functions G is given by

G_T^* = Z_a - \int_{-T}^{0} g_G^*(a,t) \, dZ_t,

where g_G^*(a,t) satisfies

E \left[ G_T G_T^* \right] = E \left[ (G_T^*)^2 \right], \qquad \forall \, G_T \in G,

or equivalently, by Propositions 3.2 and 3.3,

(2H-1) \int_0^T g_G^*(a,-t) \, |t-s|^{2H-2} \, dt = (a+s)^{2H-1} - s^{2H-1}, \qquad s \in (0,T),
and the variance formula follows from the properties of the integrals given in Proposition 3.3.

NOTE. The predictor depends on g_G^*(a,−t), which satisfies an integral equation whose solution is complicated. In practice, one may use the fact that X_t = Z(t+1) − Z(t) is stationary and use the corresponding prediction formula for X_t from Section 2 to arrive at a predictor for Z_a.
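Although the integral equation of Theorem 3.4 rarely admits a tractable closed-form solution, it can be solved numerically. The sketch below (our construction, not part of the paper) discretizes (0, T) with midpoint collocation and handles the weakly singular kernel |t − s|^{2H−2} by integrating it exactly over the diagonal cells.

    import numpy as np

    def prediction_weights(a, T, H, n=400):
        # Collocation solution of
        #   (2H-1) * int_0^T g(a,-t) |t-s|^(2H-2) dt = (a+s)^(2H-1) - s^(2H-1)
        # on a uniform midpoint grid over (0, T).
        h = T / n
        s = (np.arange(n) + 0.5) * h                  # collocation points
        d = np.abs(s[:, None] - s[None, :])
        np.fill_diagonal(d, 1.0)                      # placeholder, fixed below
        K = (2 * H - 1) * d ** (2 * H - 2) * h
        # weakly singular diagonal: (2H-1) * int_{-h/2}^{h/2} |u|^(2H-2) du
        np.fill_diagonal(K, 2.0 * (h / 2.0) ** (2 * H - 1))
        rhs = (a + s) ** (2 * H - 1) - s ** (2 * H - 1)
        return s, np.linalg.solve(K, rhs)             # grid and g(a,-s) values

    # e.g., s, g = prediction_weights(a=1.0, T=10.0, H=0.8)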
3.2. Filtering Theorem

In Section 3.1, we looked at the prediction of an LRD process based on its observed values over a period of time. Here, we consider the state space form of the processes, with the observed process an LRD process and the parameter process a diffusion process. In analogy with the discrete time filtering problem, we obtain the filtered estimate of the parameter process θ_t.
THEOREM 3.5. A direct analogue of the model in Theorem 2.1 might be

d\theta_t = a(t,X) \, \theta_t \, dt + b(t,X) \, dW_t,
X_t = \theta_t + \epsilon_t,

where ε_t is a continuous time self-similar process with exponent H > 0, W_t is a Wiener process, and a(t,X), b(t,X) are measurable functions of X_t. Then the optimal filtering equations are given by

d\hat\theta_t = a(t,X) \, \hat\theta_t \, dt + \frac{\gamma_t}{\mathrm{Var}\, \hat X_t} \left( dX_t - d\hat X_t \right)

and

\frac{d\gamma_t}{dt} = 2 a(t,X) \, \gamma_t + b^2(t,X) - \frac{\gamma_t^2}{\mathrm{Var}\, \hat X_t},

where X̂_t and Var(X̂_t) are the predictor of X_t and its variance, respectively.
PROOF. The proof follows by optimal combination of estimating functions as in Theorem 2.1, observing that MSE[θ̂] = (bias)^2 + Var(θ̂) for any estimate θ̂, and that

\frac{d}{dt} E \left[ \left( \theta_t - \hat\theta_t \right)^2 \Big| \mathcal{F}_t^X \right] = 2 a(t,X) \, \gamma_t + b^2(t,X).

NOTE. As noted in Example 2.2, unlike the algorithms for Markov processes or short memory processes, the algorithm here is not recursive.
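For completeness, here is a possible Euler-type discretization of the filtering equations of Theorem 3.5, under our own simplifying assumptions: constant coefficients a and b, and user-supplied arrays for the observed process, its predictor, and the predictor variance Var(X̂_t).

    import numpy as np

    def filter_sde(x, x_pred, v, a, b, dt, theta0=0.0, gamma0=1.0):
        # Euler discretization of the Theorem 3.5 filtering equations.
        # x, x_pred : arrays with the observed process X_t and its predictor
        # v         : array of predictor variances Var(X_hat_t), assumed > 0
        # a, b      : constant drift/diffusion coefficients (a simplification)
        theta, gamma = theta0, gamma0
        out = np.empty(len(x))
        out[0] = theta0
        for t in range(1, len(x)):
            dX, dXhat = x[t] - x[t - 1], x_pred[t] - x_pred[t - 1]
            theta += a * theta * dt + (gamma / v[t]) * (dX - dXhat)
            gamma += (2 * a * gamma + b ** 2 - gamma ** 2 / v[t]) * dt
            out[t] = theta
        return out

As in the discrete case, the predictor X̂_t and its variance must be computed from the whole observed path, which is why the overall algorithm is not recursive.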
REFERENCES
1. G. Gripenberg and I. Norros, On the prediction of fractional Brownian motion, J. Appl. Prob. 33, 400-410, (1996).
2. J. Beran, Statistical methods for data with long-range dependence, Statistical Science 7 (4), 404-427, (1992).
3. J. Beran, Statistics for Long-Memory Processes, Chapman & Hall, New York, (1994).
4. C.C. Heyde, Asymptotic efficiency results for the method of moments with application to estimation for queueing processes, In Queueing Theory and its Applications, (Edited by O.J. Boxma and R. Syski), North-Holland, Amsterdam, (1988).
5. C.C. Heyde, Some results on inference for stationary processes and queueing systems, In Queueing and Related Models, (Edited by U.N. Bhat and I.V. Basawa), Oxford Science Publications, Oxford, (1992).
6. G. Samorodnitsky and M.S. Taqqu, Linear models with long-range dependence and with finite or infinite variance, In New Directions in Time Series Analysis, Part II, pp. 325-340, Springer-Verlag, (1992).
7. C.C. Heyde, On combining quasi-likelihood estimating functions, Stochastic Processes and their Applications 25, 281-287, (1987).
8. V.P. Godambe, Linear Bayes and optimal estimation, Technical Report Series #11, University of Waterloo, (1994).
9. P.J. Brockwell and R.A. Davis, Time Series: Theory and Methods, Second Edition, Springer-Verlag, New York, (1991).
10. B.B. Mandelbrot and J.W. Van Ness, Fractional Brownian motions, fractional noises and applications, SIAM Review 10, 422-459, (1968).
11. A. Thavaneswaran and M.E. Thompson, A criterion for filtering in semimartingale models, Stochastic Processes and their Applications 28, 259-265, (1988).