Systems & Control Letters 12 (1989) 71-76 North-Holland
On the $L^\infty$ consistency of $L^2$ estimators *

P.E. CAINES
Department of Electrical Engineering, McGill University, 3480 University St., Montréal, P.Q., H3A 2A7 Canada, and the Canadian Institute for Advanced Research

M. BAYKAL-GURSOY **
Department of Systems Engineering, University of Pennsylvania, PA 19104, U.S.A.

Received 3 February 1988
Revised 22 August 1988
Abstract: It is shown that, subject to a regularity condition on the weighting function $F$, $L^2(F)$ least-squares estimators of an analytic function $\alpha_\infty$ converge in the $L^\infty$ norm. Using this fact, a simple argument shows that for an analytic function the number of zeros in the closed unit disk is equal to that of all its least-squares approximations of sufficiently high order. Furthermore, subject to certain hypotheses, statistical least-squares estimators $\hat\alpha_p^N$ are $L^\infty$ consistent w.p. 1 if the order $p$ tends to infinity sufficiently slowly as a function of $N$.

Keywords: $L^2$ approximation; $L^\infty$ approximation; least squares parameter estimation; long AR models.

* Work partially supported by the Natural Sciences and Engineering Research Council of Canada Grant 01329 and by the Department of Electrical and Computer Engineering of the University of Newcastle, Newcastle, NSW, Australia.
** Presently at Dept. of Industrial and Systems Engineering, Rutgers University, P.O. Box 909, Piscataway, NJ 08854, U.S.A.
0167-6911/89/\$3.50 © 1989, Elsevier Science Publishers B.V. (North-Holland)

1. Introduction

Let $\alpha_\infty(z) = \sum_{k=0}^{\infty} \alpha_k z^k$, $z \in \mathbb{C}$, be a complex-valued function which is analytic in an open neighborhood of the closed unit disk $D$ and assume that $\alpha_\infty(z)$ has real coefficients $\{\alpha_k,\ k = 0, 1, \ldots\}$. Let $\hat\alpha_p(e^{i\theta})$ denote the optimal $L^2(F)$ least-squares approximation of $\alpha_\infty(e^{i\theta})$ by $p$-th order polynomials with respect to the measure $F(d\theta)$:

$$\hat\alpha_p(e^{i\theta}) \triangleq \arg\min_{(\alpha_{p,0}, \ldots, \alpha_{p,p}) \in \mathbb{R}^{p+1}} \frac{1}{2\pi} \int_0^{2\pi} |\alpha_p(e^{i\theta}) - \alpha_\infty(e^{i\theta})|^2 \, F(d\theta). \tag{1}$$

We shall show that the minimum $\hat\alpha_p(e^{i\theta})$ exists and is $L^\infty$ convergent as $p \to \infty$, where $L^\infty$ denotes the space of functions on the unit circle with the essential supremum norm. This theorem may be applied to the approximation of an infinitely long AR process by a finite AR($p$) predictor model. First note that for any $p \in \mathbb{Z}_+$,

$$\hat y(\alpha_p) = (1 - \alpha_p(z))\, y$$

is the least-squares predictor corresponding to the system described by the formal power series equation

$$\alpha_p(z)\, y(z) = w(z),$$

where $\alpha_p(0) = 1$, $\alpha_p(z)$ has no zeros in the closed unit disk and $w$ is an orthogonal wide-sense stationary process. Next, let $\alpha_\infty^{-1}(z)$, $z \in \mathbb{C}$, satisfy the same hypothesis as $\alpha_\infty$ and assume that $y$ is generated by the system described by the formal power series equation

$$\alpha_\infty(z)\, y(z) = w(z);$$

it follows by Theorem 1.2, Chapter 2 of [2] that $y$ is a wide-sense stationary process. For the process $y$, the minimum prediction error for predictors of order $p$ is achieved at $\hat\alpha_p$, where

$$\hat\alpha_p(e^{i\theta}) = \arg\min_{\{\alpha_{p,1}, \ldots, \alpha_{p,p}\} \in \mathbb{R}^{p}} E\,(y_0 - \hat y_0(\alpha_p))^2$$

$$= \arg\min_{\{\alpha_{p,1}, \ldots, \alpha_{p,p}\} \in \mathbb{R}^{p}} \frac{1}{2\pi} \int_0^{2\pi} |1 - (1 - \alpha_p(e^{i\theta}))|^2 \, \frac{1}{|\alpha_\infty(e^{i\theta})|^2} \, d\theta$$

$$= \arg\min_{\{\alpha_{p,1}, \ldots, \alpha_{p,p}\} \in \mathbb{R}^{p}} \frac{1}{2\pi} \int_0^{2\pi} \left| \frac{\alpha_p(e^{i\theta}) - \alpha_\infty(e^{i\theta})}{\alpha_\infty(e^{i\theta})} \right|^2 d\theta$$
and this is precisely the form of equation (1). As is well known, the solution $\hat\alpha_p$ can be found from the Yule-Walker equations, i.e.,

$$R_j = -\sum_{k=1}^{p} \hat\alpha_{p,k} R_{j-k}, \qquad j = 1, \ldots, p,$$

where $R_j \triangleq E\, y_j y_0$, $j = \ldots, -1, 0, 1, \ldots$.

In Theorem 2 below we give a statistical application of the $L^\infty$ approximation result contained in Theorem 1. Theorem 2 is similar to Theorem 1 of [1] but presents a w.p. 1 convergence property of the estimates of $\alpha_\infty(z)$. Furthermore, by virtue of Theorem 1, the proof of Theorem 2 employs the exact $p$-th order approximating polynomial $\hat\alpha_p(z)$ as an intermediate quantity, and this enables us to give a transparent proof of the result in question. Our results are related to the work of Ljung and Yuan, see e.g. [5], in that the central problem is that of the convergence of finite approximations to possibly irrational transfer functions. The results here differ, however, from those in the reference cited, which essentially treats a regression problem where the observed input process generates via $\alpha_\infty(z)$ an output which is subject to observation in noise.

2. Main results

Theorem 1. Assume that $F$ satisfies the regularity condition

$$\gamma \, d\theta \le F(d\theta) \le \delta \, d\theta, \qquad 0 < \gamma < \delta < \infty;$$

then the indicated minimum exists and as $p \to \infty$,

(i) $\hat\alpha_p \xrightarrow{\,L^2(F)\,} \alpha_\infty$, and

(ii) $\hat\alpha_p \xrightarrow{\,L^\infty\,} \alpha_\infty$.

Remark. If $\alpha_\infty$ does not vanish on $T$, these statements are true for $dF(\theta) := |\alpha_\infty(e^{i\theta})|^{-2} \, d\theta$, $\theta \in [0, 2\pi]$.

Proof. The minima specifying $\hat\alpha_p$ always exist since they are characterized by orthogonal projection in the Hilbert space $L^2(F)$. This space is complete because $L^p$ spaces with respect to positive measures on arbitrary measurable spaces are complete for $1 \le p \le \infty$; hence, in particular, those $L^p$ spaces whose measures are given by distribution functions are complete.

The proof of the theorem then goes as follows:

(i) Let $\bar\alpha_p$ denote the $L^2(\mu)$ least-squares estimate of $\alpha_\infty$; then comparing this estimate in the $L^2(F)$ norm with $\hat\alpha_p$ and using the regularity condition yields

$$\|\hat\alpha_p - \alpha_\infty\|_{L^2(F)} \le \|\bar\alpha_p - \alpha_\infty\|_{L^2(F)} \le \delta^{1/2} \|\bar\alpha_p - \alpha_\infty\|_{L^2(\mu)}.$$

But the completeness of the exponentials in $L^2(\mu)$ gives

$$\|\bar\alpha_p - \alpha_\infty\|_{L^2(\mu)} \to 0$$

and hence $\|\hat\alpha_p - \alpha_\infty\|_{L^2(F)} \to 0$, as $p \to \infty$.

(ii) For a trigonometric polynomial $\alpha_p$ of degree $p$ it is true that

$$\|\alpha_p\|_{L^\infty} \le \sum_{k=0}^{p} |\alpha_{p,k}| \le (p+1)^{1/2} \Big( \sum_{k=0}^{p} |\alpha_{p,k}|^2 \Big)^{1/2} = (p+1)^{1/2} \|\alpha_p\|_{L^2(\mu)}. \tag{2}$$

In addition, the Cauchy estimate

$$|\alpha_k| \le M \lambda^k, \qquad k \in \mathbb{Z}_+,$$

holds for a function $\alpha_\infty(z)$ analytic and bounded by $M$ in an open connected set containing the closed disk of radius $(1 + \varepsilon) \triangleq \lambda^{-1}$, where $\varepsilon > 0$. Now if $\alpha_p$ denotes the truncation $\sum_{k=0}^{p} \alpha_k e^{ik\theta}$ of $\alpha_\infty = \sum_{k=0}^{\infty} \alpha_k e^{ik\theta}$, $\theta \in [0, 2\pi]$, then it follows that the bound

$$\|\alpha_p - \alpha_\infty\|_{L^\infty} = \Big\| \sum_{k=p+1}^{\infty} \alpha_k e^{ik\theta} \Big\|_{L^\infty} \le \frac{M}{1 - \lambda} \lambda^{p+1} \tag{3}$$

holds. The theorem follows from the sequence of inequalities:

$$\|\hat\alpha_p - \alpha_\infty\|_{L^\infty} \le \|\hat\alpha_p - \alpha_p\|_{L^\infty} + \|\alpha_p - \alpha_\infty\|_{L^\infty}$$

(by the triangle inequality)

$$\le (p+1)^{1/2} \|\hat\alpha_p - \alpha_p\|_{L^2(\mu)} + \frac{M}{1 - \lambda} \lambda^{p+1}$$
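(by equations (2) and (3))

$$\le (p+1)^{1/2} \gamma^{-1/2} \|\hat\alpha_p - \alpha_\infty\|_{L^2(F)} + (p+1)^{1/2} \gamma^{-1/2} \|\alpha_\infty - \alpha_p\|_{L^2(F)} + \frac{M}{1 - \lambda} \lambda^{p+1} \tag{4}$$

(by the regularity condition).

Because $\hat\alpha_p$ is the $L^2(F)$ least-squares estimate of $\alpha_\infty$ the inequality

$$\text{RHS of equation (4)} \le \frac{2 (p+1)^{1/2}}{\gamma^{1/2}} \|\alpha_p - \alpha_\infty\|_{L^2(F)} + \frac{M}{1 - \lambda} \lambda^{p+1} \tag{5}$$

holds. Applying the regularity condition and noting that $\|f\|_{L^2(\mu)} \le \|f\|_{L^\infty}$, we obtain

$$\text{RHS of equation (5)} \le \Big( 2 \big( \tfrac{\delta}{\gamma} \big)^{1/2} (p+1)^{1/2} + 1 \Big) \frac{M}{1 - \lambda} \lambda^{p+1}.$$

Hence, as $p \to \infty$, $\|\hat\alpha_p - \alpha_\infty\|_{L^\infty} \to 0$, as required. □

Let $N_\gamma$ denote the number of zeros of an analytic function $\gamma$ in $D$.

Corollary 1. Assume $\alpha_\infty$ does not vanish on $T$. Then, subject to the conditions of the theorem, any $L^2(F)$ least-squares estimate $\hat\alpha_p$ of $\alpha_\infty$ satisfies

$$N_{\hat\alpha_p} = N_{\alpha_\infty}$$

for sufficiently large $p$.

Proof. If $\alpha_\infty$ does not vanish on $T$ there exists $\varepsilon > 0$ such that

$$\varepsilon \le |\alpha_\infty(z)|, \qquad \forall z \in T.$$

But by the theorem there exists a value of $p \in \mathbb{Z}_+$ such that $\|\hat\alpha_p - \alpha_\infty\|_{L^\infty} < \varepsilon$. Hence

$$|\hat\alpha_p(z) - \alpha_\infty(z)| < |\alpha_\infty(z)|, \qquad \forall z \in T,$$

and so by Rouché's Theorem (see Rudin [6, p. 218]) $N_{\hat\alpha_p} = N_{\alpha_\infty}$. □

Remark. It follows from Corollary 1 that if $\alpha_\infty$ has no zeros in $D$, then $\hat\alpha_p$ has no zeros there for large enough $p$. This gives a simple proof of the stability of the AR($p$) system $\hat\alpha_p$ estimated by the least-squares method given infinite data. The stability of least-squares estimates of AR($p$) predictors is in fact known to be true for all $p \in \mathbb{Z}_+$ because the associated reflection coefficients are all strictly less than 1 (see e.g. [3]).

Corollary 2. Let $(\hat\alpha_{p,0}^N, \ldots, \hat\alpha_{p,p}^N) \in \mathbb{R}^{p+1}$, $N \in \mathbb{Z}_+$, be a stochastic process defined on a probability space $(\Omega, \mathcal{B}, P)$ for each $p \in \mathbb{Z}_+$, and let us assume that either

(*) $\|\hat\alpha_p^N - \hat\alpha_p\|_{L^\infty} \to 0$ P-a.s. as $N \to \infty$ for each $p \in \mathbb{Z}_+$,

or

(**) $\|\hat\alpha_p^N - \hat\alpha_p\|_{L^2(F)} \to 0$ P-a.s. as $N \to \infty$ for each $p \in \mathbb{Z}_+$.

Then, subject to the conditions of the theorem, there exists a sequence $N(p) \to \infty$, as $p \to \infty$, such that

(i) (*) or (**) implies $\|\hat\alpha_p^{N(p)} - \alpha_\infty\|_{L^2(F)} \to 0$ P-a.s. as $N \to \infty$,

(ii) (**) implies $\|\hat\alpha_p^{N(p)} - \alpha_\infty\|_{L^\infty} \to 0$ P-a.s. as $N \to \infty$.

Proof. (i) follows from the triangle inequality

$$\|\hat\alpha_p^N - \alpha_\infty\|_{L^2(F)} \le \|\hat\alpha_p^N - \hat\alpha_p\|_{L^2(F)} + \|\hat\alpha_p - \alpha_\infty\|_{L^2(F)}, \qquad \forall N \in \mathbb{Z}_+, \ p \in \mathbb{Z}_+,$$

together with part (i) of the theorem and the hypothesis (**). In case (*) is postulated, the result is obtained using the inequalities

$$\|\hat\alpha_p^N - \hat\alpha_p\|_{L^2(F)} \le \delta^{1/2} \|\hat\alpha_p^N - \hat\alpha_p\|_{L^2(\mu)} \le \delta^{1/2} \|\hat\alpha_p^N - \hat\alpha_p\|_{L^\infty},$$

which depend upon the regularity property and the property that $\|f\|_{L^2(\mu)} \le \|f\|_{L^\infty}$.

(ii) follows from the triangle inequality

$$\|\hat\alpha_p^N - \alpha_\infty\|_{L^\infty} \le \|\hat\alpha_p^N - \hat\alpha_p\|_{L^\infty} + \|\hat\alpha_p - \alpha_\infty\|_{L^\infty}, \qquad \forall N \in \mathbb{Z}_+, \ p \in \mathbb{Z}_+, \tag{6}$$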
together with part (ii) of the theorem and the postulate (*). □

Remark. Notice that in general (**) does not imply (*). Indeed the theorem shows it is a special feature of optimal $L^2(F)$ approximations that they permit one to deduce $L^\infty$ convergence. However, under the conditions of the theorems in Sections 1 and 2, Chapter 6 of [2], for any fixed $p \in \mathbb{Z}_+$,

$$\hat\alpha_p^N \to \hat\alpha_p \quad \text{a.s. as } N \to \infty$$

and so

$$\|\hat\alpha_p^N(e^{i\theta}) - \hat\alpha_p(e^{i\theta})\|_{L^\infty} \le \sum_{j=1}^{p} |\hat\alpha_{p,j}^N - \hat\alpha_{p,j}| \to 0 \quad \text{a.s. as } N \to \infty.$$

Thus, condition (**) implies that condition (*) of Corollary 2 holds.

Theorem 2. Let $\alpha_\infty(z) = \sum_{k=0}^{\infty} \alpha_k z^k$, $z \in \mathbb{C}$, $(\alpha_k \in \mathbb{R}^1;\ k \ge 0)$, be a complex-valued function which together with its inverse is analytic in an open neighborhood of the closed unit disk $D$. Further assume that the coefficients $(\alpha_k;\ k \ge 0)$ of $\alpha_\infty(z)$ are real. Let the strictly stationary process $y$ be generated by the system described by the formal power series equation

$$\alpha_\infty(z)\, y(z) = e(z)$$

where $e$ is an independent identically distributed zero-mean stochastic process with $E e_0^2 = \sigma^2$ and $E e_0^4 < \infty$. Then the least-squares estimate $\hat\alpha_p^{N(p)}(z)$ of $\hat\alpha_p(z)$, $p = 1, \ldots$, based upon the data block $(y_1, \ldots, y_{N(p)})$, $p = 1, 2, \ldots$, satisfies

$$\|\hat\alpha_p^{N(p)}(z) - \alpha_\infty(z)\|_{L^\infty} \to 0 \quad \text{a.s.}$$

as $p \to \infty$, when $N(p) > p^{5+\eta}$, $\eta > 0$.

Proof. From equation (6) and the left-most inequality in (2) it is clear that it is sufficient to show that, for some function $N(p)$ of $p$,

$$\|\hat\alpha_p^{N(p)} - \hat\alpha_p\| \to 0 \quad \text{a.s.}$$

as $p \to \infty$, where $\hat\alpha_p^{N(p)}$ is the least-squares estimate of $\hat\alpha_p$ based upon data up to the instant $N(p)$, and where $\|\cdot\|$ denotes Euclidean norm. Let us write the least-squares estimate $\hat\alpha_p^{N(p)}$ as

$$\hat\alpha_p^{N(p)} = -[R_p^{N(p)}]^{-1} r_p^{N(p)}$$

where

$$R_p^{N(p)} \triangleq \frac{1}{N(p) - p} \sum_{k=p}^{N(p)-1} \psi_k^p \psi_k^{pT}, \qquad r_p^{N(p)} \triangleq \frac{1}{N(p) - p} \sum_{k=p}^{N(p)-1} \psi_k^p \, y_{k+1},$$

with $\psi_k^p \triangleq (y_k, y_{k-1}, \ldots, y_{k-p+1})^T$. In this notation, $\hat\alpha_p$ is given by

$$\hat\alpha_p = -[R_p]^{-1} r_p$$

where $R_p \triangleq E\, \psi_0^p \psi_0^{pT}$ and

$$r_p \triangleq (r(1), r(2), \ldots, r(p))^T \triangleq E\, \psi_0^p \, y_1, \qquad r_0 \triangleq E\, y_0^2,$$

with the time subscript 0 just taken for definiteness. To examine $\|\hat\alpha_p^{N(p)} - \hat\alpha_p\|$ we consider the inequality

$$\|\hat\alpha_p^{N(p)} - \hat\alpha_p\| \le \big\| \big( [R_p^{N(p)}]^{-1} - [R_p]^{-1} \big) r_p \big\| + \big\| [R_p^{N(p)}]^{-1} \big( r_p^{N(p)} - r_p \big) \big\|.$$

Now $[R_p]^{-1}$ is uniformly bounded in $p$ because $R_p$ is uniformly bounded away from zero, i.e. there exists $\varepsilon > 0$ such that

$$\varepsilon I_p < R_p$$

for all $p = 1, 2, \ldots$. This is true because we may take $\lambda_p \in \mathbb{R}^p$, $\|\lambda_p\| = 1$, and set $\lambda_p(e^{i\theta}) = \sum_{j=1}^{p} \lambda_{p,j} e^{i(j-1)\theta}$; then

$$\lambda_p^T R_p \lambda_p = \frac{\sigma^2}{2\pi} \int_0^{2\pi} \frac{|\lambda_p(e^{i\theta})|^2}{|\alpha_\infty(e^{i\theta})|^2} \, d\theta \ge \sigma^2 \min_{\theta \in [0, 2\pi]} |\alpha_\infty(e^{i\theta})|^{-2} > 0,$$

with the minimum independent of $p$.
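The sample-moment form of the least-squares estimate written above, $\hat\alpha_p^{N(p)} = -[R_p^{N(p)}]^{-1} r_p^{N(p)}$, can be sketched numerically as follows. This is only an illustration under stated assumptions: the function names (`simulate_ar`, `ls_ar_estimate`, `sup_norm_error`), the Gaussian noise with $\sigma^2 = 1$, the burn-in length and the frequency grid are all hypothetical choices and are not taken from the paper.

```python
import numpy as np


def simulate_ar(alpha, n, rng, burn_in=500):
    """Simulate alpha(z) y = e, i.e. y_t = e_t - sum_{k>=1} alpha[k] * y_{t-k}.

    alpha[0] is assumed to equal 1; e is i.i.d. standard normal (an assumption).
    """
    order = len(alpha) - 1
    total = n + burn_in
    e = rng.standard_normal(total)
    y = np.zeros(total)
    for t in range(total):
        past = sum(alpha[k] * y[t - k] for k in range(1, order + 1) if t - k >= 0)
        y[t] = e[t] - past
    return y[burn_in:]  # discard the transient so y is close to stationary


def ls_ar_estimate(y, p):
    """Return ((1, a_1, ..., a_p), R) with a = -R^{-1} r from sample moments."""
    n = len(y)
    # 0-based times k = p-1 .. n-2, so psi_k = (y_k, ..., y_{k-p+1}) and y_{k+1} exist
    ks = np.arange(p - 1, n - 1)
    psi = np.stack([y[ks - j] for j in range(p)], axis=1)
    R = psi.T @ psi / len(ks)          # sample version of E[psi psi^T]
    r = psi.T @ y[ks + 1] / len(ks)    # sample version of E[psi y_{k+1}]
    a = -np.linalg.solve(R, r)
    return np.concatenate(([1.0], a)), R


def sup_norm_error(a, alpha, grid=2048):
    """Approximate the sup over the unit circle of |a(z) - alpha(z)| on a grid."""
    z = np.exp(1j * np.linspace(0.0, 2.0 * np.pi, grid, endpoint=False))
    diff = (np.polynomial.polynomial.polyval(z, a)
            - np.polynomial.polynomial.polyval(z, alpha))
    return float(np.max(np.abs(diff)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_true = np.array([1.0, -0.5])
    y = simulate_ar(alpha_true, 20000, rng)
    a_hat, R = ls_ar_estimate(y, 3)
    print("estimate:", a_hat)
    print("grid sup-norm error:", sup_norm_error(a_hat, alpha_true))
    print("smallest eigenvalue of sample R:", np.linalg.eigvalsh(R).min())
```

The smallest eigenvalue of the sample matrix gives a rough empirical check on the lower bound $\varepsilon I_p < R_p$; the grid maximum is only an approximation to the $L^\infty$ norm.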
Further, when $\|\cdot\|_o$ denotes the operator norm, we have
< [R~(P']-I
o
2
I
1 E N(p)-i
[ RNp(P)]-I - [ Rp] -I o --
Hannan [4] has shown that
1
Ilrpll < lim IIr~ll p--*oo
X ((r(S))2'+
<~
"~ k 4
because $\alpha_\infty(e^{i\theta})$, $\theta \in [0, 2\pi]$, is bounded away from zero. Hence, to establish the convergence of $\|\hat\alpha_p^{N(p)} - \hat\alpha_p\|$ to zero as $p \to \infty$, it is sufficient to prove that

$$\|r_p^{N(p)} - r_p\| \to 0 \quad \text{a.s.} \tag{7}$$
a
-o 0
~ bjbj+sbj+ibj+,+i )
N(p)- 1
~_, YkYk-i -- r(i) k=i
k=p YkYk-i--r(i)
U(p)
-
i
for some finite positive M, since
s=O (9)
[~ 0
(r(S))2 + k4 ]
e=
1 r2~
- i0-1-2
j°
)
dO<
because $\alpha_\infty(e^{i\theta})$, $\theta \in [0, 2\pi]$, is bounded away from zero. We now have the following sequence of inequalities where $0 \le i \le p - 1$:
p(SN(p~ _>v) < ±E(SN(~,) ,/2
a.s.
o~
M
< -
as $p \to \infty$. Let us denote the covariance estimation error term in brackets in (9) by $e_i^{N(p)}$. Then to prove (9) holds it is sufficient to prove that
)2
1
,
a.s.
p--1 E
r(~-i)r (s+s)
<- N ( p ) - i
which holds for any real $n \times n$ matrix, we see that to establish (7) and (8) it is sufficient to show that
pZo
2
(by Chebychev's inequality)
j=o
as $p \to \infty$. To establish this we shall show that for all $\gamma > 0$, $\sum_{p=0}^{\infty} P(\delta_{N(p)} \ge \gamma) < \infty$;
) - i
(8)
as $p \to \infty$. Invoking the standard bound
i,j=l
N(p)
j=O
E N(p)-i
$$\|A\|_o \le \|A\| \triangleq \Big( \sum_{i,j=1}^{n} a_{ij}^2 \Big)^{1/2}$$
1
where $k_4$ is the fourth cumulant of $e$ given in the present case by $E e_0^4 - 3\sigma^4$ and $b(e^{i\theta}) = \alpha_\infty^{-1}(e^{i\theta})$, $\theta \in [0, 2\pi]$. As in [1], we may use two applications of the Cauchy-Schwarz inequality to obtain
and

$$\|R_p^{N(p)} - R_p\|_o \to 0 \quad \text{a.s.}$$
IS[
Y'~
N ( p ) - i s= -(N(p)-i)+l
But
,aoo(ei°)l-4dO
)
N(p)-i-l(
[[RN(P)--RpII IIR;6[I o " --P a _
< \ 2~r j0
~ YkYk-i-- r(° k=i
(10)
<_-~E
E leiN(p,
j=0
(by the definition of 8N(p))
then by the Borel-Cantelli lemma it follows that $\delta_{N(p)} \ge \gamma$ finitely often w.p. 1.
v2 i=0 E E(~'")2 ]
with $L < \infty$ independent of $p$. Finally, by the hypothesis that $p^{5+\eta} < N(p)$ with $\eta > 0$, we have
= p_2 i~=i [ N ( p ) - i
E N(p)-p
3, 2
1
X
N(p)-I
E
N(p)-i
~ YkYk-, k=i /
N(~-p-1
3p3(N(p)-i) ~ - k N(( -f) = p i=0
r(~)
$$\sum_{p=1}^{\infty} P(\delta_{N(p)} \ge \gamma) \le L \sum_{p=1}^{\infty} p^{-(1+\eta)} < \infty$$

and so we conclude that $\delta_{N(p)} \to 0$ a.s. as $p \to \infty$ and the theorem is proven. □
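The summability step that closes the proof can be checked with a few lines of arithmetic. The sketch below assumes the probability bound has the form $L p^4 / N(p)$ stated in the proof, takes the hypothetical values $L = 1$ and $\eta = 0.5$, and picks $N(p)$ as the smallest integer strictly greater than $p^{5+\eta}$; the function names are illustrative only.

```python
import math


def data_length(p, eta):
    """Smallest integer N(p) strictly greater than p**(5 + eta)."""
    return math.floor(p ** (5.0 + eta)) + 1


def tail_bound_terms(max_p, eta, L=1.0):
    """Terms L * p**4 / N(p) for p = 1..max_p (L is a hypothetical constant)."""
    return [L * p ** 4 / data_length(p, eta) for p in range(1, max_p + 1)]


eta = 0.5
terms = tail_bound_terms(2000, eta)

# Each term is dominated by p**-(1 + eta) because N(p) > p**(5 + eta).
dominated = all(
    t <= p ** -(1.0 + eta) * (1.0 + 1e-12)
    for p, t in enumerate(terms, start=1)
)

# Integral test: sum_{p >= 1} p**-(1 + eta) <= 1 + 1/eta, so partial sums stay bounded.
partial_sum = sum(terms)
print(dominated, partial_sum, 1.0 + 1.0 / eta)
```

A bounded partial sum for every truncation level is what the Borel-Cantelli argument needs; slower growth of $N(p)$ would require a sharper bound than $L p^4 / N(p)$.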
p r(i) M N(p ) - i
-
Acknowledgement
3p 3 1 + Y2 (N(p)-p)
3p4M
--
t
1 N(p)-p
3p 3 (_p)2 +-y2 (N(p)-p)
YkYk-,
k=i
E 2 E
The authors wish to thank Dr. P. Koosis for useful suggestions concerning part (ii) of Theorem 1.
k=i
(r(i)) 2
2 /=o
2
References
+ 3p 3
22
P - ] tr~ 4xl/2/E
- P E [~"Yk } YZ N 2 ( P ) k=0 by P < N ( p ) 2
4
)1/2
~ Yk-i
and the Cauchy-Schwarz inequality)
+
3p 3 p222 oo y2 NZ(p)i~=o(r(i))2X-"
Lp4
<- N(p--~
(by $E y_0^4 < \infty$)
[1] K.N. Berk, Consistent autoregressive spectral estimates, Ann. Statist. 2 (1974) 489-501.
[2] P.E. Caines, Linear Stochastic Systems (John Wiley & Sons, New York, 1988).
[3] G.C. Goodwin and K.S. Sin, Adaptive Filtering Prediction and Control (Prentice-Hall, Englewood Cliffs, NJ, 1984).
[4] E.J. Hannan, Time Series Analysis (Methuen, London, 1960).
[5] L. Ljung and C.D. Yuan, Asymptotic properties of black-box identification of transfer functions, IEEE Trans. Automat. Control 30 (6) (1985) 514-530.
[6] W. Rudin, Real and Complex Analysis (McGraw-Hill, New York, 1966).