Systems & Control Letters 15 (1990) 417-423 North-Holland
On the martingale approximation of the estimation error of ARMA parameters

László Gerencsér *
Computer Vision and Robotics Laboratory, McGill Research Centre for Intelligent Machines, McGill University, Montréal, Quebec, Canada H3A 2A7

Received 1 July 1990
Revised 30 September 1990

Abstract: The aim of this paper is to prove a theorem which is instrumental in verifying Rissanen's tail condition for the estimation error of the parameters of a Gaussian ARMA process. We get an improved error bound for the martingale approximation of the estimation error for a wide class of ARMA processes.

Keywords: ARMA process; prediction error estimation; strong approximation.
1. Introduction
This paper has been motivated by a problem formulated in [17] (cf. also [18]) as follows. Let $\hat\theta_N$ be the maximum-likelihood estimator of the parameter vector $\theta^*$ of a Gaussian ARMA($p,q$) process based on $N$ samples, where $\theta^*$ is assumed to belong to an appropriate compact domain $D^* \subset \mathbb{R}^{p+q}$. It was conjectured that for any $c > 0$ the following inequality holds:
$$\sum_{N=1}^{\infty} \sup_{\theta^* \in D^*} P\bigl(N^{1/2}\,|\hat\theta_N - \theta^*| > c \log N\bigr) < \infty. \tag{TC}$$
Under the hypothesis that the conjecture is true, the Rissanen–Shannon inequality is applicable to stationary Gaussian ARMA processes, and a lower bound for the mean cumulated prediction error was obtained, which reflects the cost of parameter uncertainty (cf. [10]).
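Although the argument of this paper is entirely analytic, the content of (TC) is easy to probe numerically. The following sketch (plain Python with NumPy, not part of the original paper) estimates the tail probabilities in (TC) by Monte Carlo for the simplest special case, a Gaussian AR(1) process in the notation of (1.1) below, with the illustrative parameter $a^* = -0.6$; the least-squares estimator stands in for the maximum-likelihood estimator, and a small $c$ is used so that the decay is visible at moderate $N$.

```python
import numpy as np

rng = np.random.default_rng(0)
c, reps = 0.3, 2000
for N in (50, 200, 800):
    hits = 0
    for _ in range(reps):
        e = rng.standard_normal(N + 1)
        y = np.zeros(N + 1)
        for n in range(1, N + 1):
            y[n] = 0.6 * y[n - 1] + e[n]      # AR(1): theta* = a* = -0.6 in the notation of (1.1)
        # Least-squares (prediction-error) estimate of a*, close to ML in the Gaussian case.
        theta_hat = -np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])
        hits += np.sqrt(N) * abs(theta_hat - (-0.6)) > c * np.log(N)
    print(N, hits / reps)   # tail probabilities; (TC) asserts their summability over N
```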
Rissanen's conjecture was proved in [5]. The aim of this paper is to present a significantly simplified proof of a result on the martingale approximation of the parameter estimation error, which is instrumental in the proof of Rissanen's conjecture. The proof is a careful reexamination of a standard technique (linearization around the estimator $\hat\theta_N$) combined with recently published inequalities for a class of mixing processes [6]. Thus we shall get a very sharp bound on the error term of a standard martingale approximation of $\hat\theta_N - \theta^*$, and many asymptotic properties of $\hat\theta_N - \theta^*$ can be derived from those of martingales. Thus, for example, central limit theorems (CLT's) and laws of the iterated logarithm (LIL's) are easily obtained.

Prior to the result of this paper very little has been known about the fine asymptotics of $\hat\theta_N - \theta^*$. In [23] Taniguchi presents an Edgeworth expansion of $\hat\theta_N - \theta^*$, but his result is not applicable to settle Rissanen's conjecture. However, the result of the present paper combined with the result given in [9] does give a positive answer to Rissanen's conjecture.

The results of the paper are easily extended to multivariable finite-dimensional linear stochastic systems (cf. [7]), and to continuous-time systems driven by a diffusion term (cf. [8]). However, these extensions are not always obvious, since an important uniqueness theorem of Åström and Söderström has no general multivariable analogue. Some partial results have been given in [20] and [24]. Uniqueness is essential in the first part of the proof of Lemma 2.3.

Now we specify the notations and technical conditions for the present paper. Let $(y_n)$, $n = 0, \pm 1, \pm 2, \ldots$, be a second-order stationary ARMA($p,q$) process satisfying the following difference equation:
$$y_n + a_1^* y_{n-1} + \cdots + a_p^* y_{n-p} = e_n + c_1^* e_{n-1} + \cdots + c_q^* e_{n-q}. \tag{1.1}$$

* On leave from the Computer and Automation Institute of the Hungarian Academy of Sciences, Budapest.
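To fix ideas, the recursion (1.1) is easily simulated. The following sketch (plain Python with NumPy, an illustration rather than part of the original paper) generates a trajectory of a Gaussian ARMA(2,1) process with zero initial values; the parameter values are illustrative and chosen to satisfy Condition 1.1 below.

```python
import numpy as np

def simulate_arma(a, c, N, sigma=1.0, rng=None):
    """Simulate (1.1): y_n + a_1 y_{n-1} + ... + a_p y_{n-p}
                        = e_n + c_1 e_{n-1} + ... + c_q e_{n-q}
    with Gaussian white noise (e_n) and zero initial values."""
    rng = rng if rng is not None else np.random.default_rng(0)
    p, q = len(a), len(c)
    e = sigma * rng.standard_normal(N)
    y = np.zeros(N)
    for n in range(N):
        ar = sum(a[i] * y[n - 1 - i] for i in range(p) if n - 1 - i >= 0)
        ma = sum(c[j] * e[n - 1 - j] for j in range(q) if n - 1 - j >= 0)
        y[n] = -ar + e[n] + ma
    return y, e

# Illustrative ARMA(2,1) parameters: A has roots 0.3 and 0.2, C has root -0.4,
# all strictly inside the unit circle, so Condition 1.1 below holds.
y, e = simulate_arma(a=[-0.5, 0.06], c=[0.4], N=1000)
```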
Let $A^*$, $C^*$ be polynomials of the backward shift operator. Then (1.1) is sometimes written in the shorthand notation $A^* y = C^* e$. Define
$$A^*(z^{-1}) = \sum_{i=0}^{p} a_i^* z^{-i}, \qquad C^*(z^{-1}) = \sum_{i=0}^{q} c_i^* z^{-i}.$$

Condition 1.1. $A^*(z^{-1})$ and $C^*(z^{-1})$ have all their roots strictly inside the unit circle, i.e. $A^*(z^{-1})$ and $C^*(z^{-1})$ are asymptotically stable. Moreover, we assume that they are relatively prime and $a_0^* = c_0^* = 1$.

Condition 1.2. $(e_n)$ is a discrete-time, second-order stationary L-mixing martingale-difference process with respect to a pair of families of $\sigma$-algebras $(\mathcal{F}_n, \mathcal{F}_n^+)$, $n = 0, \pm 1, \pm 2, \ldots$, such that
$$E(e_n^2 \mid \mathcal{F}_{n-1}) = \sigma^{*2} = \text{const.} \quad \text{a.s.}$$

The concept of L-mixing, together with the conditions imposed on $\mathcal{F}_n$, $\mathcal{F}_n^+$, is described in the Appendix. A detailed exposition is given in [6].

Let $G \subset \mathbb{R}^{p+q}$ denote the set of $\theta$'s such that the corresponding polynomials $A(z^{-1})$ and $C(z^{-1})$ are stable. $G$ is an open set. Let $D^*$ and $D$ be compact domains such that $\theta^* \in D^* \subset \operatorname{int} D$ and $D \subset G$. Here $\operatorname{int} D$ denotes the interior of $D$.

To estimate the unknown parameters $a_i^*$, $c_j^*$, $i = 1, \ldots, p$, $j = 1, \ldots, q$, and the unknown variance $\sigma^{*2}$ we use the prediction-error method, which works as follows. Let us take an arbitrary $\theta \in D$ and define an estimated prediction error process $(e_n)$, $n \geq 0$, by the equation $e = (A/C)y$ with initial values $e_n = y_n = 0$ for $n \leq 0$. Let the coefficients of $A(z^{-1})$ and $C(z^{-1})$ be denoted by $a_i$ and $c_j$, respectively, and set
$$\theta = (a_1, \ldots, a_p, c_1, \ldots, c_q)^{\mathrm T}.$$
To stress the dependence of $(e_n)$ on $\theta$ and $\theta^*$ we shall write $e_n = e_n(\theta, \theta^*)$. Then the cost function associated with the prediction-error method is given by
$$V_N(\theta, \theta^*) = \tfrac{1}{2} \sum_{n=1}^{N} e_n^2(\theta, \theta^*),$$
and the estimate $\hat\theta_N$ of $\theta^*$ is defined as the solution of the equation
$$\frac{\partial}{\partial\theta} V_N(\theta, \theta^*) = V_{\theta N}(\theta, \theta^*) = 0. \tag{1.2}$$
(Here differentiation is taken both in the almost sure and in the M-sense. For the definition of the latter cf. the Appendix.) More exactly, $\hat\theta_N$ is a random vector such that $\hat\theta_N \in D$ for all $\omega$, and if the equation (1.2) has a unique solution in $D$, then $\hat\theta_N$ is equal to this solution. By measurable selection such a random variable does exist. It is easy to see that $e_n(\theta, \theta^*)$ is a smooth function of $\theta$ for all $\omega$, and hence (1.2) can be written as
$$\sum_{n=1}^{N} e_{\theta n}(\theta, \theta^*)\, e_n(\theta, \theta^*) = 0. \tag{1.3}$$

Let us introduce the asymptotic cost function defined by
$$W(\theta, \theta^*) = \lim_{n \to \infty} \tfrac{1}{2}\, E\, e_n^2(\theta, \theta^*).$$
The function $W(\theta, \theta^*)$ is smooth in the interior of $D$ and we have
$$W_\theta(\theta^*, \theta^*) = 0 \quad \text{and} \quad R^* = W_{\theta\theta}(\theta^*, \theta^*) > 0,$$
i.e. $R^*$ is positive definite. It is well known that $N^{1/2}(\hat\theta_N - \theta^*)$ has the asymptotic distribution $N(0, \sigma^{*2}(R^*)^{-1})$. Various forms of the CLT are given e.g. in [1,3,4,11,12,15,16,21]. However, the rate of convergence to the normal law has not been investigated except in [23], where an asymptotic expansion of the empirical distribution is given.

2. The martingale approximation of $\hat\theta_N - \theta^*$

Theorem 2.1. Under Conditions 1.1 and 1.2 we have
$$\hat\theta_N - \theta^* = -(R^*)^{-1} \frac{1}{N} \sum_{n=1}^{N} e_{\theta n}(\theta^*, \theta^*)\, e_n + r_N, \tag{2.1}$$
where $r_N = O_M(N^{-1})$, i.e. we have for all $1 \leq q < \infty$,
$$\sup_N\, N\, E^{1/q} |r_N|^q < \infty. \tag{2.2}$$
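Before turning to the proof, the prediction-error computation behind (1.2)–(1.3) can be spelled out explicitly. The sketch below (continuing the simulation sketch given after (1.1); again an illustration, not from the paper) computes $e_n(\theta, \theta^*) = (A/C)y$ with zero initial values and the cost $V_N$, and then minimizes $V_N$ numerically; the call to `scipy.optimize.minimize` is only a stand-in for the exact root of (1.2), and the unconstrained search is assumed to stay inside the stability domain $G$.

```python
import numpy as np
from scipy.optimize import minimize

def pred_errors(theta, y, p, q):
    # e = (A/C) y with zero initial values, i.e.
    # e_n = y_n + sum_i a_i y_{n-i} - sum_j c_j e_{n-j}.
    a, c = theta[:p], theta[p:]
    e = np.zeros(len(y))
    for n in range(len(y)):
        e[n] = y[n] \
            + sum(a[i] * y[n - 1 - i] for i in range(p) if n - 1 - i >= 0) \
            - sum(c[j] * e[n - 1 - j] for j in range(q) if n - 1 - j >= 0)
    return e

def V_N(theta, y, p, q):
    # Cost function V_N(theta, theta*) = 1/2 sum_n e_n^2(theta, theta*).
    return 0.5 * np.sum(pred_errors(theta, y, p, q) ** 2)

# y generated by simulate_arma above; the true parameter is theta* = (-0.5, 0.06, 0.4).
y, _ = simulate_arma(a=[-0.5, 0.06], c=[0.4], N=2000)
res = minimize(V_N, x0=np.zeros(3), args=(y, 2, 1), method="Nelder-Mead")
print(res.x)   # approaches theta* as N grows
```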
It is easy to see from the proof that (2.2) holds uniformly in $\theta^* \in D^*$. The power of the above theorem lies in the fact that the analysis of the estimation error is reduced to the analysis of a martingale. Since the error term is controlled in a convenient way, many statements for martingales (such as the CLT or the LIL) carry over to the process $\hat\theta_N - \theta^*$. Also, the verification of the tail condition (TC) for $\hat\theta_N - \theta^*$ is trivially reduced to the verification of a similar tail condition for the dominant term in (2.1). The latter problem can be solved in a straightforward way in the Gaussian case using the theorem below (cf. [5,9]):

Theorem 2.2. Let $(e_n)$ be a Gaussian white noise and let
$$\mathcal{F}_n = \sigma\{e_i : i \leq n\}, \qquad \mathcal{F}_n^+ = \sigma\{e_i : i > n\}.$$
Let $(f_n)$ be an $\mathbb{R}^p$-valued L-mixing process with respect to $(\mathcal{F}_n, \mathcal{F}_n^+)$, such that $E f_n f_n^{\mathrm T} = R^*$ a.s. for all $n$. Then for all $\varepsilon > 0$,
$$\sum_{n=1}^{N} f_n e_n = \xi_N + O_M(N^{2/5+\varepsilon}),$$
where $\xi_N \sim N(0, N R^*)$.

We start the proof of Theorem 2.1 with a lemma.

Lemma 2.3. For any $d > 0$ the equation (1.2) has a unique solution in $D$ such that it is also in the sphere $\{|\theta - \theta^*| < d\}$ with probability at least $1 - O(N^{-s})$ for any $s > 0$, where the constant in the error term $O(N^{-s}) = C N^{-s}$ depends only on $d$ and $s$.

Proof. We show first that the probability of having a solution outside the sphere $\{\theta : |\theta - \theta^*| < d\}$ is less than $O(N^{-s})$ with any $s > 0$. Indeed, the equation $W_\theta(\theta, \theta^*) = 0$ has a single solution $\theta = \theta^*$ in $D$ (cf. [2]), thus for any $d > 0$ we have
$$d' \triangleq \inf\bigl\{ |W_\theta(\theta, \theta^*)| : \theta \in D,\ \theta^* \in D^*,\ |\theta - \theta^*| \geq d \bigr\} > 0,$$
since $W_\theta(\theta, \theta^*)$ is continuous in $(\theta, \theta^*)$ and $D \times D^*$ is compact. Therefore, if a solution of (1.2) exists outside the sphere $|\theta - \theta^*| \geq d$, then we have for
$$\delta V_{\theta N} = \sup_{\theta \in D,\ \theta^* \in D^*} \Bigl| \frac{1}{N} V_{\theta N}(\theta, \theta^*) - W_\theta(\theta, \theta^*) \Bigr|$$
the inequality $\delta V_{\theta N} \geq d'$. But the process
$$u_n(\theta, \theta^*) = e_{\theta n}(\theta, \theta^*)\, e_n(\theta, \theta^*) - W_\theta(\theta, \theta^*)$$
is an L-mixing process uniformly in $(\theta, \theta^*)$, and the same holds for the process $(u_{\theta n}(\theta, \theta^*))$. Moreover, we have
$$c_n = E u_n(\theta, \theta^*) = O(\alpha^n)$$
with some $\alpha$ such that $0 < \alpha < 1$. Indeed, if the initial values $e_n(\theta, \theta^*)$, $e_{\theta n}(\theta, \theta^*)$ had stationary distribution, then we would have $c_n = 0$. On the other hand, the effects of the nonstationary initial values $e_0(\theta, \theta^*) = 0$ and $e_{\theta 0}(\theta, \theta^*) = 0$ decay exponentially. Hence by Theorem 3.3 we have $\delta V_{\theta N} = O_M(N^{-1/2})$, therefore
$$P(\delta V_{\theta N} > d') = O(N^{-s})$$
with any $s$ by Markov's inequality, and thus the statement at the beginning of the proof follows.

Let us now consider the random variable
$$\delta V_{\theta\theta N} = \sup_{\theta \in D,\ \theta^* \in D^*} \Bigl| \frac{1}{N} V_{\theta\theta N}(\theta, \theta^*) - W_{\theta\theta}(\theta, \theta^*) \Bigr|.$$
By the same argument as above we have
$$P(\delta V_{\theta\theta N} > d'') = O(N^{-s})$$
for any $d'' > 0$, and hence for the event
$$A_N = \bigl\{ \delta V_{\theta N} < d',\ \delta V_{\theta\theta N} < d'' \bigr\}$$
we have $P(A_N) \geq 1 - O(N^{-s})$ with any $s > 0$. But on $A_N$ the equation (1.2) has a unique solution whenever $d'$ and $d''$ are sufficiently small. Indeed, the equation $W_\theta(\theta, \theta^*) = 0$ has a unique solution $\theta = \theta^*$ in $D$ by [2], and hence the existence of a unique solution of (1.2) can easily be derived from the implicit function theorem (cf. Lemma 3.4). Thus the lemma has been proved.

Let us now consider equation (1.2) and write it as
$$0 = V_{\theta N}(\hat\theta_N, \theta^*) = V_{\theta N}(\theta^*, \theta^*) + \bar V_{\theta\theta N}(\hat\theta_N - \theta^*), \tag{2.3}$$
where
$$\bar V_{\theta\theta N} = \int_0^1 V_{\theta\theta N}\bigl((1-\lambda)\theta^* + \lambda\hat\theta_N,\ \theta^*\bigr)\, \mathrm d\lambda.$$
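Theorem 2.2 above is a strong Gaussian approximation; a quick Monte Carlo check can at least confirm the CLT-level behaviour it implies (it says nothing about the $O_M(N^{2/5+\varepsilon})$ rate itself). In the sketch below (illustrative only, not from the paper) the scalar choice $f_n = e_{n-1}$ is an $\mathcal{F}_n$-adapted L-mixing process with $E f_n^2 = R^* = 1$, so $N^{-1/2}\sum f_n e_n$ should be close to $N(0, 1)$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps = 4000, 2000
samples = []
for _ in range(reps):
    e = rng.standard_normal(N + 1)
    f = e[:-1]                     # f_n = e_{n-1}: past-measurable, E f_n^2 = R* = 1
    samples.append(f @ e[1:] / np.sqrt(N))
print(np.mean(samples), np.var(samples))   # approximately 0 and R* = 1
```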
Lemma 2.4. We have $\hat\theta_N - \theta^* = O_M(N^{-1/2})$.

Proof. First note that $V_{\theta N}(\theta^*, \theta^*) = O_M(N^{1/2})$ by Burkholder's inequality for martingales (cf. e.g. Theorem 3.3.6 in [22]), since $e_n(\theta^*, \theta^*) = e_n + O_M(a^n)$ with some $|a| < 1$. Let us now investigate $\bar V_{\theta\theta N}$. Define
$$\bar W_{\theta\theta N} = \int_0^1 W_{\theta\theta}\bigl((1-\lambda)\theta^* + \lambda\hat\theta_N,\ \theta^*\bigr)\, \mathrm d\lambda. \tag{2.4}$$
Obviously $\bar W_{\theta\theta N} \geq cI$ with some positive $c$ on $A_N$ if $d$ is sufficiently small. Indeed, since $W$ is smooth, we have for $0 \leq \lambda \leq 1$,
$$\bigl\| W_{\theta\theta}\bigl(\theta^* + \lambda(\hat\theta_N - \theta^*),\ \theta^*\bigr) - W_{\theta\theta}(\theta^*, \theta^*) \bigr\| \leq C\, |\hat\theta_N - \theta^*|, \tag{2.5}$$
where $C$ is a constant depending on the system parameters and $\sigma^{*2} = E e_n^2$. Hence if $d$ is sufficiently small, then the positive definiteness of $W_{\theta\theta}(\theta^*, \theta^*)$ and (2.4) imply that $\bar W_{\theta\theta N} \geq cI$ with some positive $c$. Since on $A_N$ we have
$$\Bigl\| \frac{1}{N} \bar V_{\theta\theta N} - \bar W_{\theta\theta N} \Bigr\| < d'',$$
it follows that if $d''$ is sufficiently small, then
$$\lambda_{\min}\Bigl( \frac{1}{N} \bar V_{\theta\theta N} \Bigr) \geq c/2 > 0 \tag{2.6}$$
on $A_N$, where in general $\lambda_{\min}(B)$ denotes the minimal eigenvalue of the matrix $B$. Hence $\| \bar V_{\theta\theta N}^{-1} \| \leq C N^{-1}$ on $A_N$ with some nonrandom constant $C$, and we get from (2.3),
$$\chi_{A_N} (\hat\theta_N - \theta^*) = O_M(N^{-1/2}). \tag{2.7}$$
Combining this inequality with the previous inequality $P(A_N^{\mathrm c}) = O(N^{-s})$, where $A_N^{\mathrm c}$ denotes the complement of $A_N$, and using the fact that $|\hat\theta_N - \theta^*|$ is bounded, we get for any $s > 0$,
$$\chi_{A_N^{\mathrm c}} (\hat\theta_N - \theta^*) = O_M(N^{-s}). \tag{2.8}$$
Adding this equality to (2.7) we get the lemma.

Now we can complete the proof of Theorem 2.1 as follows. Using the result of the lemma, we can improve the inequality (2.5) by writing $O_M(N^{-1/2})$ on the right-hand side. Thus we get after integration with respect to $\lambda$ that
$$\bigl\| \bar W_{\theta\theta N} - W_{\theta\theta}(\theta^*, \theta^*) \bigr\| = O_M(N^{-1/2}). \tag{2.9}$$
On the other hand, the inequality $\delta V_{\theta\theta N} = O_M(N^{-1/2})$ implies that
$$\frac{1}{N} \bar V_{\theta\theta N} - \bar W_{\theta\theta N} = O_M(N^{-1/2}). \tag{2.10}$$
Hence we finally get
$$\frac{1}{N} \bar V_{\theta\theta N} - W_{\theta\theta}(\theta^*, \theta^*) = O_M(N^{-1/2}). \tag{2.11}$$
Let us now focus on the event $A_N$, where we have the inequality (2.6). A simple rearrangement shows that (2.6) and (2.11) imply
$$\chi_{A_N} \Bigl( \bar V_{\theta\theta N}^{-1} - \frac{1}{N} W_{\theta\theta}^{-1}(\theta^*, \theta^*) \Bigr) = O_M(N^{-3/2}). \tag{2.12}$$
Now we can get our final estimate for $\hat\theta_N - \theta^*$ by substituting (2.12) into (2.3) to obtain
$$\begin{aligned}
\chi_{A_N} (\hat\theta_N - \theta^*) &= -\chi_{A_N} \bar V_{\theta\theta N}^{-1} V_{\theta N}(\theta^*, \theta^*) \\
&= -\chi_{A_N} \Bigl( \frac{1}{N} W_{\theta\theta}^{-1}(\theta^*, \theta^*) + O_M(N^{-3/2}) \Bigr) V_{\theta N}(\theta^*, \theta^*) \\
&= -W_{\theta\theta}^{-1}(\theta^*, \theta^*)\, \frac{1}{N}\, V_{\theta N}(\theta^*, \theta^*) + O_M(N^{-1}) \\
&= -(R^*)^{-1} \frac{1}{N} \sum_{n=1}^{N} e_{\theta n}(\theta^*, \theta^*)\, e_n + O_M(N^{-1}). \tag{2.13}
\end{aligned}$$
The last but one equality is obtained by taking into account that $1 - \chi_{A_N} = O_M(N^{-s})$ with any $s > 0$ and that the expression in the first term multiplied by $\chi_{A_N}$ is $O_M(N^{-1/2})$ (hence also $O_M(1)$). Finally, adding the equality (2.8) to (2.13), we get the proposition of the theorem.
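To see Theorem 2.1 at work numerically, one can compare $\hat\theta_N - \theta^*$ with the dominant martingale term in a case where everything is explicit. For an AR(1) model $y_n + a^* y_{n-1} = e_n$ (no MA part) we have $e_n(\theta) = y_n + a y_{n-1}$, hence $e_{\theta n}(\theta^*, \theta^*) = y_{n-1}$, $R^* = E y_{n-1}^2 = \sigma^2/(1 - a^{*2})$, and the root of (1.2) is $\hat a_N = -\sum y_{n-1} y_n / \sum y_{n-1}^2$. The sketch below (an illustration under these simplifying assumptions, not the paper's construction) checks that the residual $r_N$ in (2.1) decays roughly like $N^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(2)
a_star, sigma = -0.6, 1.0                  # y_n + a* y_{n-1} = e_n
R_star = sigma ** 2 / (1 - a_star ** 2)    # R* = E y_{n-1}^2

for N in (100, 1000, 10000):
    resid = []
    for _ in range(200):
        e = sigma * rng.standard_normal(N + 1)
        y = np.zeros(N + 1)
        for n in range(1, N + 1):
            y[n] = -a_star * y[n - 1] + e[n]
        a_hat = -np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])   # exact root of (1.2)
        mart = -(1.0 / R_star) * np.dot(y[:-1], e[1:]) / N        # dominant term of (2.1)
        resid.append(a_hat - a_star - mart)
    print(N, np.sqrt(np.mean(np.square(resid))))   # RMS of r_N: decays roughly like 1/N
```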
3. Appendix: Some previous results on L-mixing processes

We summarize a few results published in [6] and used in this paper. The set of real numbers will be denoted by $\mathbb{R}$, the $p$-dimensional Euclidean space by $\mathbb{R}^p$. Let $D \subset \mathbb{R}^p$ be a compact domain and let the stochastic process $(x_n(\theta))$ be defined on $\mathbb{N} \times D$, where $\mathbb{N}$ denotes the set of natural numbers.
Definition 3.1. We say that $(x_n(\theta))$ is M-bounded if for all $1 \leq q < \infty$,
$$M_q(x) = \sup_{n \geq 0,\ \theta \in D} E^{1/q} |x_n(\theta)|^q < \infty.$$

We say that a sequence of random variables $x_n$ tends to a random variable $x$ in the M-sense if for all $q \geq 1$ we have
$$\lim_{n \to \infty} E^{1/q} |x_n - x|^q = 0.$$
Similarly we can define differentiation in the M-sense.

A basic notion in our discussion is a kind of mixing, which appeared in a different form in [14], where it was called 'exponential stability'. See also [15,19]. Let $(\mathcal{F}_n)$, $n \geq 0$, be a family of monotone increasing $\sigma$-algebras, and $(\mathcal{F}_n^+)$, $n \geq 0$, a monotone decreasing family of $\sigma$-algebras. We assume that for all $n \geq 0$, $\mathcal{F}_n$ and $\mathcal{F}_n^+$ are independent. For $n < 0$ we set $\mathcal{F}_n^+ = \mathcal{F}_0^+$. A typical example is provided by the $\sigma$-algebras
$$\mathcal{F}_n = \sigma\{e_i : i \leq n\}, \qquad \mathcal{F}_n^+ = \sigma\{e_i : i > n\},$$
where $(e_i)$ is an i.i.d. sequence of random variables.

Definition 3.2. A stochastic process $(x_n(\theta))$, $n \geq 0$, is L-mixing with respect to $(\mathcal{F}_n, \mathcal{F}_n^+)$ uniformly in $\theta$ if it is $\mathcal{F}_n$-progressively measurable, M-bounded, and, with $\tau$ being a positive integer and
$$\gamma_q(\tau, x) = \gamma_q(\tau) = \sup_{n \geq \tau,\ \theta \in D} E^{1/q} \bigl| x_n(\theta) - E\bigl(x_n(\theta) \mid \mathcal{F}_{n-\tau}^+\bigr) \bigr|^q,$$
we have for any $1 \leq q < \infty$,
$$\Gamma_q = \Gamma_q(x) = \sum_{\tau=1}^{\infty} \gamma_q(\tau) < \infty.$$

Example. Discrete-time stationary Gaussian ARMA processes are L-mixing. (This can be seen using a state-space representation.)

Theorem 3.1 (cf. Theorem 1.1 in [6]). Let $(u_n)$, $n \geq 0$, be an L-mixing process with $E u_n = 0$ for all $n$ and let $(f_n)$ be a deterministic sequence. Then we have for all $1 \leq m < \infty$,
$$E^{1/2m} \Bigl| \sum_{n=1}^{N} f_n u_n \Bigr|^{2m} \leq C_m \Bigl( \sum_{n=1}^{N} f_n^2 \Bigr)^{1/2} M_{2m}^{1/2}(u)\, \Gamma_{2m}^{1/2}(u),$$
where $C_m = 2(2m-1)^{1/2}$.
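Definition 3.2 and the Example above can be made concrete in the simplest case. For a stationary AR(1) process $x_n = a x_{n-1} + e_n$ with i.i.d. Gaussian $(e_n)$ and the $\sigma$-algebras above, $x_n = \sum_{k \geq 0} a^k e_{n-k}$ gives $E(x_n \mid \mathcal{F}_{n-\tau}^+) = \sum_{k=0}^{\tau-1} a^k e_{n-k}$, hence $\gamma_2(\tau) = |a|^\tau \sigma / (1 - a^2)^{1/2}$, which is summable, so $(x_n)$ is L-mixing. The sketch below (an illustration, not from the paper) confirms this closed form by Monte Carlo, using a truncated moving-average representation.

```python
import numpy as np

a, sigma = 0.7, 1.0
rng = np.random.default_rng(3)
K = 200                                 # truncation of x_n = sum_{k>=0} a^k e_{n-k}
weights = a ** np.arange(K)
e = sigma * rng.standard_normal((20000, K))   # column k plays the role of e_{n-k}
x = e @ weights
for tau in (1, 2, 4, 8):
    cond = e[:, :tau] @ weights[:tau]         # E(x_n | F_{n-tau}^+): keep e_{n-k}, k < tau
    mc = np.sqrt(np.mean((x - cond) ** 2))    # Monte Carlo estimate of gamma_2(tau)
    exact = abs(a) ** tau * sigma / np.sqrt(1 - a ** 2)
    print(tau, round(mc, 4), round(exact, 4))
```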
Define
$$\Delta x / \Delta^\alpha \theta = |x_n(\theta + h) - x_n(\theta)| / |h|^\alpha \quad \text{for } n \geq 0,\ \theta,\, \theta + h \in D,$$
with $0 < \alpha \leq 1$.

Definition 3.3. The stochastic process $x_n(\theta)$ is M-Hölder-continuous in $\theta$ with exponent $\alpha$ if the process $\Delta x / \Delta^\alpha \theta$ is M-bounded, i.e. if for all $1 \leq q < \infty$ we have
$$M_q(\Delta x / \Delta^\alpha \theta) = \sup_{n \geq 0,\ \theta,\, \theta + h \in D} E^{1/q} |x_n(\theta + h) - x_n(\theta)|^q / |h|^\alpha < \infty.$$

Example. If $(x_n(\theta))$ is absolutely continuous with respect to $\theta$ a.s. and the gradient process $(x_{\theta n}(\theta))$ is M-bounded, then $(x_n(\theta))$ is M-Hölder-continuous with $\alpha = 1$; in other words, $(x_n(\theta))$ is M-Lipschitz-continuous.

Let us consider the case when $(x_n(\theta))$ is a stochastic process which is measurable, separable, M-bounded and M-Hölder-continuous in $\theta$ with exponent $\alpha$ for $\theta \in D$. By Kolmogorov's theorem (cf. e.g. Theorem 19, Appendix I of [13]), the realizations of $(x_n(\theta))$ are continuous in $\theta$ with probability 1, and hence we can define for almost all $\omega$,
$$x_n^* = \max_{\theta \in D_0} |x_n(\theta)|,$$
where $D_0 \subset \operatorname{int} D$ is a compact domain. As the realizations of $x_n(\theta)$ are continuous, $x_n^*$ is measurable with respect to $\mathcal{F}$, that is, $x_n^*$ is a random variable. We shall estimate its moments.

Theorem 3.2 (cf. Theorem 3.4 in [6]). Assume that $(x_n(\theta))$ is a stochastic process which is measurable, separable, M-bounded and M-Hölder-continuous in $\theta$ with exponent $\alpha$ for $\theta \in D$. Let $x_n^*$ be the random variable defined above. Then we have for all positive integers $q$ and $s > p/\alpha$,
$$M_q(x^*) \leq C\bigl( M_{qs}(x) + M_{qs}(\Delta x / \Delta^\alpha \theta) \bigr),$$
where $C$ depends only on $p$, $q$, $s$, $\alpha$ and $D_0$, $D$.

Combining Theorems 3.1 and 3.2 we get the following theorem when $f_n = 1$ and $\alpha = 1$.

Theorem 3.3. Let $u_n(\theta)$ be an L-mixing process uniformly in $\theta \in D$ such that $E u_n(\theta) = 0$ for all $n \geq 0$, $\theta \in D$, and assume that $\Delta u / \Delta \theta$ is also L-mixing, uniformly in $\theta,\, \theta + h \in D$. Then
$$\sup_{\theta \in D_0} \Bigl| \frac{1}{N} \sum_{n=1}^{N} u_n(\theta) \Bigr| = O_M(N^{-1/2}).$$

Lemma 3.4. Let $D_0$ and $D$ be as above. Let $W_\theta(\theta)$, $\delta W_\theta(\theta)$, $\theta \in D \subset \mathbb{R}^p$, be $\mathbb{R}^p$-valued continuously differentiable functions, let for some $\theta^* \in D_0$, $W_\theta(\theta^*) = 0$, and let $W_{\theta\theta}(\theta^*)$ be nonsingular. Then for any $d > 0$ there exist positive numbers $d'$, $d''$ such that
$$|\delta W_\theta(\theta)| < d' \quad \text{and} \quad \|\delta W_{\theta\theta}(\theta)\| < d''$$
for all $\theta \in D_0$ implies that the equation $W_\theta(\theta) + \delta W_\theta(\theta) = 0$ has exactly one solution in a neighbourhood of radius $d$ of $\theta^*$. The proof is obtained by the application of the implicit function theorem to the equation $W_\theta(\theta) + a\, \delta W_\theta(\theta) = 0$ with $0 \leq a \leq 1$.

Acknowledgements

This research was supported in part by the Natural Sciences and Engineering Research Council under Grant 01329, and by the Hungarian Academy of Sciences under the research project "The Mathematics of Control Theory", while working in the Computer and Automation Institute in Budapest. The author wishes to thank Jimmy Baikovicius, Karim Nassiri-Toussi and Zsuzsanna Vágó for their careful reading of the manuscript, and Mindle Levitt and Solomon Seifu for their considerable amount of work in the preparation of this document.

References

[1] T.W. Anderson, The Statistical Analysis of Time Series (Wiley, New York, 1971).
[2] K.J. Åström and T. Söderström, Uniqueness of the maximum-likelihood estimates of the parameters of an ARMA model, IEEE Trans. Automat. Control 19 (1974) 769-773.
[3] P.E. Caines, Linear Stochastic Systems (Wiley, New York, 1988).
[4] W. Dunsmuir and E.J. Hannan, Vector linear time series models, Adv. Appl. Probab. 8 (1976) 339-364.
[5] L. Gerencsér, On the normal approximation of the maximum-likelihood estimator of ARMA parameters, Report WP 49, Computer and Automation Institute of the Hungarian Academy of Sciences, Budapest (1985). Revised as: Verification of Rissanen's tail condition for the parameter estimator of a Gaussian ARMA process, Report TR-CIM-89-6, McGill Research Center for Intelligent Machines (1989).
[6] L. Gerencsér, On a class of mixing processes, Stochastics 26 (1989) 165-191.
[7] L. Gerencsér, Some new results in the theory of recursive identification, Proc. of the 28th IEEE CDC, Vol. 1 (1989) 242-248.
[8] L. Gerencsér, Strong approximation theorems for estimator processes in continuous time, in: I. Berkes, E. Csáki and P. Révész, Eds., Limit Theorems in Probability and Statistics, Colloquia Mathematica Societatis János Bolyai (North-Holland, Amsterdam, 1990, to appear).
[9] L. Gerencsér, Strong approximation theorems for stochastic integrals, Submitted for publication (1990).
[10] L. Gerencsér and J. Rissanen, A prediction bound for Gaussian ARMA processes, Proc. of the 25th CDC, Athens, Vol. 3 (1986) 1487-1490.
[11] E.J. Hannan and M. Deistler, The Statistical Theory of Linear Systems (Wiley, New York, 1988).
[12] E.J. Hannan and L. Kavalieris, Multivariate linear time series models, Adv. Appl. Probab. 16 (1984) 492-561.
[13] I.A. Ibragimov and R.Z. Khasminskii, Statistical Estimation: Asymptotic Theory (Springer-Verlag, Berlin-New York, 1981).
[14] L. Ljung, On consistency and identifiability, Mathematical Programming Study 5 (1976) 169-190.
[15] L. Ljung and P.E. Caines, Asymptotic normality of prediction error estimation for approximate system models, Stochastics 3 (1979) 29-46.
[16] L. Ljung, System Identification: Theory for the User (Prentice-Hall, Englewood Cliffs, NJ, 1987).
[17] J. Rissanen, Stochastic complexity and predictive modeling, Ann. Statist. 14 (1986) 1080-1100.
[18] J. Rissanen, Stochastic Complexity in Statistical Inquiry (World Scientific, Singapore, 1989).
[19] J. Rissanen and P.E. Caines, The strong consistency of maximum likelihood estimators for ARMA processes, Ann. Statist. 7 (1979) 297-315.
[20] T. Söderström and P. Stoica, Uniqueness of prediction error estimates of multivariable moving average models, Automatica 18 (1982) 617-620.
[21] T. Söderström and P. Stoica, System Identification (Prentice-Hall, Hemel Hempstead, 1989).
[22] W.F. Stout, Almost Sure Convergence (Academic Press, New York, 1974).
[23] M. Taniguchi, Validity of Edgeworth expansions of minimum contrast estimates for Gaussian ARMA processes, J. Multivariate Anal. 18 (1986) 1-31.
[24] Zs. Vágó and L. Gerencsér, Uniqueness of the maximum-likelihood estimates of the Kalman-gain matrix of a state space model, Proc. of the IFAC/IFORS Conference on Dynamic Modelling of National Economies, Budapest (1985).