Journal of Statistical Planning and Inference 27 (1991) 105-123. North-Holland.
Efficient estimation of the stationary distribution for exponentially ergodic Markov chains

Spiridon Penev *

Institute of Applied Mathematics & Informatics, 1000 Sofia, Bulgaria

Received 4 April 1989; revised manuscript received 21 September 1989. Recommended by J. Pfanzagl.
Abstract: In a classical paper by Dvoretsky, Kiefer and Wolfowitz the asymptotic minimaxity of the empirical distribution function in case of i.i.d. observations X_1, X_2, ..., X_n has been shown. If X_1, X_2, ..., X_n, ... is only a stationary sequence we still could use the empirical distribution function as an estimator of the (continuous) stationary distribution function F of the random variable X, but the question of its asymptotic efficiency arises in this case. Under some additional assumptions (stationary homogeneous exponentially ergodic Markov sequence) we show that the empirical distribution function is an efficient estimator in a local asymptotic minimax sense. Using the bounded subconvex loss function Eg(sup_t √n |F_n(t) - F(t)|) with g bounded, increasing, the local asymptotic minimax bound equals Eg(sup_t |Y(t)|), where Y(t) is a certain Gaussian process.

AMS Subject Classification: Primary 62G20, 62M05; secondary 62G05, 62G30.

Key words and phrases: Local asymptotic minimaxity; empirical distribution function; Markov sequences; stability.
1. Introduction

The problem of asymptotic minimaxity of the empirical distribution function (EDF) has attracted the attention of many statisticians. In a pioneering paper of Dvoretsky, Kiefer and Wolfowitz (1956) it was shown that in case of i.i.d. observations the EDF is asymptotically minimax among the collection of all continuous distributions. As Millar (1979) notes, "This paper has stood for over 20 years as one of the pivotal achievements of nonparametric decision theory". One direction for generalizing this result was to show the asymptotic minimax character of the EDF in the i.i.d. case among smaller classes of DF's (such as the class of the concave distributions, the distributions having a decreasing density with respect to Lebesgue measure, the IFR-distributions and so on).

* Research partially supported by the Ministry of Culture, Science and Education in Bulgaria; Contract 1035.

0378-3758/91/$03.50 © 1991 Elsevier Science Publishers B.V. (North-Holland)
Kiefer and Wolfowitz (1976) proved the asymptotic minimaxity of the EDF in the class of all concave distributions. In the papers Millar (1979, 1983), using the modern technique of convergence of experiments and the general formulation of the asymptotic minimax theorem of Le Cam (1972), the asymptotic minimaxity of the EDF among each of the above mentioned (and also other) classes was shown. On the other hand it was natural to try to generalize the results of Dvoretsky, Kiefer and Wolfowitz in another direction, namely to avoid the i.i.d. assumption. Indeed this makes the problem harder, but there exists a result of Billingsley (1968) in the literature, showing the existence of a limit distribution for the EDF of a stationary φ-mixing sequence of observations. This bolstered our feeling that it could be done similarly for weakly dependent observations. Also there was the book of Roussas (1972), showing the possibility to prove local asymptotic minimax optimality of estimators and tests in parametric situations also in case of observations arising from stationary ergodic Markov sequences. As far as we know, not much has been done in applying this in nonparametric situations. Our contribution here is, using the theory of convergence of experiments, to show that the (piecewise linear and continuous version of the) EDF for a special class of stationary ergodic Markov sequences possesses a local asymptotic minimax (LAM) optimality property. We do not strive for the utmost generality in the assumptions because this would make the proofs more involved. Also the discussion will be heuristic in some parts.

Let us start with a concise outline of the probabilistic setting we deal with. We consider a homogeneous Markov chain X = (X_n)_{n≥0} taking values in (E, ℬ). Here E = [0,1] and ℬ is its Borel σ-field. The chain has a regular transition probability kernel P(x,A), x ∈ [0,1], A ∈ ℬ, and (to begin with) arbitrary initial distribution ℒ(X_0). We assume that the following condition holds:

Condition (A). Existence of a bounded density, i.e. a bounded function p(y|x) on the unit square such that

P(x,A) = ∫_A p(y|x) dy

for all A ∈ ℬ, all x ∈ [0,1], and, moreover, inf_x p(y|x) ≥ δ > 0 for all y in a set S with positive Lebesgue measure, λ(S) > 0.
This condition has important consequences:
(i) Doeblin's condition holds.
(ii) There is a uniquely defined invariant probability measure π for P(·,·) and moreover exponential convergence holds, i.e. there exist q ∈ (0,1), a > 0 such that

sup_x sup_B |P^n(x,B) - π(B)| ≤ a q^n  for all n.
(Loève (1960, Chapter VII, 27.3), Doob (1956, p. 197)). Here P^n(·,·) denotes the n-step transition probability kernel.
(iii) If ℒ(X_0) := π then the sequence X = (X_n)_{n≥0} is stationary and φ-mixing with φ(n) = aq^n. Here we use the definition of a φ-mixing sequence given in Billingsley (1968, Chapter 20, 20.2).
(iv) π ≪ λ_E. This follows easily from the equality π = πP and the fact that p(·,·) is bounded.

Additionally to Condition (A) we assume:

Condition (B). λ_E ≪ π (together with (iv) this means π ~ λ_E, where λ_E denotes Lebesgue measure on E).

After these probabilistic preliminaries let us introduce the statistical model we consider. Suppose we have weakly dependent observations X_0, X_1, X_2, ..., X_n from a stationary Markov chain X with (unknown) transition probability kernel P(·,·) and initial law ℒ(X_0) = π satisfying Conditions (A) and (B). The problem is to estimate the stationary distribution function F. Of course we still could use the EDF, as we would certainly do (relying on the result of Dvoretsky, Kiefer and Wolfowitz) if the observations were i.i.d. But now the question of the asymptotic efficiency of this estimator arises. We shall see (Theorem 5.1 and Corollary 5.1) that the (piecewise linear and continuous version of the) EDF preserves its optimality in a local asymptotic minimax sense.
2. Perturbations and stability
Now we would like to discuss the difficulties which arise when we try to describe LAM lower bounds in non-i.i.d. situations. The complexity here is of a qualitative nature. Let us explain it in a few words. In order to describe LAM lower bounds one has to consider perturbations of a given probability structure in a neighborhood of this structure. Now in the i.i.d. case describing such neighborhoods is an easy job because once one has perturbed the density for one observation, one has already perturbed the whole (product-density) structure. In the case of dependency there are many more possibilities for perturbation. But they also cannot be too numerous, because one has to preserve the main properties of the structure (e.g. stationarity, ergodicity) after the perturbation. That means that the structure has to possess some kind of stability. To describe this property we first have to define the perturbations of the chain in a suitable form. Let H be the set of all measurable functions h(x,y) on the unit square with E h^2(X_0,X_1) < ∞, E(h(X_0,X_1) | X_0) = 0 almost surely. H is a Hilbert space with respect to the scalar product

(h_1, h_2) = E h_1(X_0,X_1) h_2(X_0,X_1).
Let p(·) denote the density of π with respect to λ_E. We denote the corresponding norm by

‖h‖_H^2 = ∫_0^1 ∫_0^1 h^2(x,y) p(y|x) p(x) dy dx.

Let H_0 be the subset of all bounded (sup-norm) h ∈ H. Then H_0 is dense in H, which follows for example from Strasser (1985, Lemma 75.5). For h ∈ H_0 and sufficiently large n define the (perturbed) transition kernel P_{h/√n} by:

P_{h/√n}(x,A) = ∫_A p(y|x)(1 + h(x,y)/√n) dy = ∫_A p_{h/√n}(y|x) dy.
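A quick numerical sanity check of this construction (the base kernel and the function g below are assumed toy choices, not objects from the paper): starting from any bounded g and centering it conditionally, h(x,y) = g(x,y) - E(g(x,X_1) | X_0 = x), produces an element of H_0, and the perturbed p_{h/√n} is then a genuine transition density for n large enough.

```python
import numpy as np

# Toy base kernel satisfying Condition (A) (assumed example).
eps = 0.5
p = lambda x, y: 1.0 + eps * (2 * x - 1) * (2 * y - 1)
g = lambda x, y: np.sin(3 * x) * y          # any bounded function

m = 400                                      # midpoint quadrature grid on [0,1]
y = (np.arange(m) + 0.5) / m

def h(x, yy):
    """Conditionally centered perturbation: E(h(x, X_1) | X_0 = x) = 0."""
    mean = np.mean(g(x, y) * p(x, y))        # E(g(x, X_1) | X_0 = x)
    return g(x, yy) - mean

n = 100
x0 = 0.3
pert = p(x0, y) * (1.0 + h(x0, y) / np.sqrt(n))   # p_{h/sqrt(n)}(.|x0)
mass = np.mean(pert)                         # should integrate to 1
print(mass, pert.min())
```

The centering makes the added term integrate to zero against p(·|x), so the total mass stays 1, and boundedness of h makes the density positive once n exceeds (sup|h| / inf p)^2.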
Now we shall see that under small perturbations of the kernel P(·,·) of the form prescribed the chain X remains geometrically ergodic with invariant probability π_{h/√n} ~ π.

It is obvious that Condition (A) remains valid under small perturbations using kernels P_{h/√n} if h ∈ H_0 and n is large enough, for there will exist a positive constant δ_1 ≤ δ such that

inf_x p_{h/√n}(y|x) ≥ δ_1 > 0  if  inf_x p(y|x) ≥ δ > 0.

For n large enough we get transition probability kernels P_{h/√n}(·,·) satisfying Condition (A) for all h ∈ H_0. Hence (cf. (ii) after Condition (A)), a unique invariant probability π_{h/√n}(·) for P_{h/√n}(·,·) exists with π_{h/√n} ≪ λ_E for all h ∈ H_0. The following lemma is true:

Lemma 2.1. For n large enough, under the Conditions (A) and (B) it holds π_{h/√n} ~ λ_E for all h ∈ H_0.

This lemma shows that also Condition (B) remains valid under the small perturbations we consider.

Corollary 2.1. For n large enough, π_{h/√n} ~ π for all h ∈ H_0.
Our next step is to see that not only do Conditions (A) and (B) remain valid for the small perturbations described, but also a kind of stability property is true. To describe it let us denote by m the set of finite signed measures on [0,1] endowed with the variation norm ‖·‖ (which makes it a Banach space). The kernel P(·,·) defines a linear mapping m → m by μP(·) = ∫ μ(dx) P(x,·) for μ ∈ m. The norm ‖·‖ defines in a natural way a norm in the space of linear bounded operators B: m → m by

‖B‖ = sup{ ‖μB‖ : ‖μ‖ ≤ 1 }.

Let us fix some arbitrary d > 0. Denote

K_d = { h ∈ H : sup_{x,y} |h(x,y)| ≤ d }.
The stability property means that for all h ∈ K_d and all n ≥ n_0(d) a constant C(P) exists such that

‖π - π_{h/√n}‖ ≤ C(P) ‖P_{h/√n} - P‖.   (2.1)

In a more general framework and for general norms such stability requirements are studied in Kartashov (1981) and Kartashov (1984), who considers the so-called strongly stable Markov chains. For the variation norm we consider, it was shown in Neveu (1964, Chapter V.3.2) (cf. also Kartashov (1981)) that strong stability, and in particular (2.1), is ensured by Doeblin's condition. Hence (cf. (i) after Condition (A)), (2.1) holds in our case.

For a given h ∈ H_0 and n large enough π_{h/√n} ≪ π. Write the density (dπ_{h/√n}/dπ)(x) in the form 1 + h̄_n(x). If sup_{x,y} |h(x,y)| ≤ C_1 and sup_{x,y} p(y|x) ≤ C_2, C = C_1·C_2, then:

sup_{‖μ‖≤1} ‖μ(P_{h/√n} - P)‖ ≤ C/√n.

Hence:

‖P_{h/√n} - P‖ ≤ C/√n.   (2.2)

Finally, using (2.1) we get:

√n ‖π - π_{h/√n}‖ = √n ∫_0^1 |h̄_n(x)| p(x) dx ≤ √n C(P) ‖P_{h/√n} - P‖ ≤ C(P)·C.   (2.3)
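The stability bound (2.1) can be illustrated on a finite discretization (the matrices, the choice of h, and the tolerance constants below are illustrative assumptions, not objects from the paper): the variation distance between the stationary laws is controlled by the variation-norm distance of the kernels, and both are O(1/√n).

```python
import numpy as np

# Finite-state sketch of the stability bound (2.1): discretize a kernel to an
# N x N stochastic matrix, perturb it by h/sqrt(n), and compare the variation
# distance of the stationary laws with ||P_{h/sqrt(n)} - P||.
N = 200
x = (np.arange(N) + 0.5) / N
eps = 0.5
P = 1.0 + eps * np.outer(2 * x - 1, 2 * x - 1)
P /= P.sum(axis=1, keepdims=True)          # row-stochastic base kernel

def stationary(M):
    """Stationary row vector by iterating the kernel (Doeblin => fast mixing)."""
    pi = np.full(M.shape[0], 1.0 / M.shape[0])
    for _ in range(500):
        pi = pi @ M
    return pi

h = np.outer(np.sin(3 * x), x)             # arbitrary bounded h(x,y)
h -= (P * h).sum(axis=1, keepdims=True)    # center: sum_y P(x,y) h(x,y) = 0

pi0 = stationary(P)
for n in (100, 400, 1600):
    Pn = P * (1.0 + h / np.sqrt(n))        # perturbed kernel P_{h/sqrt(n)}
    assert np.allclose(Pn.sum(axis=1), 1.0)
    diff_kernel = np.abs(Pn - P).sum(axis=1).max()   # variation-norm distance
    diff_pi = np.abs(stationary(Pn) - pi0).sum()
    print(n, diff_pi, diff_kernel)
```

For this well-mixing toy chain the stability constant C(P) is small, so the stationary laws move no further than the kernels do, in line with (2.3).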
Write P^{(n)}_{π,h} for the law of X_0, X_1, ..., X_n under P_{h/√n}(·,·) and ℒ(X_0) = π_{h/√n}. Denote by p_{h̄_n}(·) the density with respect to λ_E of the measure π_{h/√n}.

Lemma 2.2. Under conditions (A) and (B) it holds

log (dP^{(n)}_{π,h} / dP^{(n)}_{π,0})(X_0, X_1, ..., X_n) = Δ_{n,h} - ½ ‖h‖_H^2 + o_{P^{(n)}_{π,0}}(1),

where Δ_{n,h} ⇒ N(0, ‖h‖_H^2) under P^{(n)}_{π,0}.
3. The construction of the mapping τ_1
Now we want to introduce the main steps in finding the LAM bound for the estimators of the stationary distribution of the chain. We are going to follow Millar (1983, Chapter VIII). Fix some h ∈ H_0. Write ν_n for π_{h/√n} and Q_n(·,·) for P_{h/√n}(·,·). Let us consider

F_{h/√n}(u) = ∫_0^u p_{h̄_n}(x) dx.

It holds:

F_{h/√n}(u) = F_0(u) + ∫_0^u h̄_n(x) p(x) dx.

Here, of course, F_0(u) = F(u) = ∫_0^u p(x) dx = π([0,u]). We have:

√n (F_{h/√n}(u) - F(u)) = √n ∫_0^u h̄_n(x) p(x) dx = √n (ν_n - π)[0,u].   (3.1)
Crucial in the sequel is the following representation given in Kartashov (1981) and Kartashov (1984):

ν_n = π (I - (Q_n - P)R)^{-1}   (valid for large n).

Here R = (I - P + Π)^{-1} = I + Σ_{k=1}^∞ (P^k - Π) and Π is the stationary projector of the transition kernel P, i.e. Π(x, dy) = π(dy) on [0,1]. By I we denote the identity mapping I: m → m, and QP denotes the composition QP(x,A) = ∫ Q(x,dy) P(y,A). The operator R is bounded because of the strong stability property (Kartashov (1981, Theorem 1)). For large n we can represent ν_n as a convergent sum:

ν_n = π [ I + (Q_n - P)R + ((Q_n - P)R)^2 + ··· ]
    = π (I + (Q_n - P)R) + o(‖Q_n - P‖) = π (I + (Q_n - P)R) + o(1/√n).

For the last equality (2.2) has also been used. Hence

√n (ν_n - π) = √n π (Q_n - P) R + o(1) = √n π (Q_n - P) [ I + Σ_{k=1}^∞ (P^k - Π) ] + o(1).

In view of the obvious equality (Q_n - P)Π = 0 we have:

√n (F_{h/√n}(u) - F(u)) = √n π (Q_n - P) [ I + Σ_{k=1}^∞ (P^k - Π) ] [0,u) + o(1).   (3.2)

Let us denote by p^{(k)}(y|x) the k-step transition density. Then (3.2) may be written in the form

√n (F_{h/√n}(u) - F(u)) = ∫_0^u ∫_0^1 h(x,y) p(y|x) p(x) dx dy
 + Σ_{k=1}^∞ ∫_0^u ∫_0^1 ∫_0^1 h(x,y) p(y|x) p(x) p^{(k)}(z|y) dx dy dz + o(1).

This gives rise to the following definition of the mapping τ_1: H → B (B being the Banach space of continuous functions x on [0,1] with x(0) = x(1) = 0, endowed with the supremum norm):

τ_1 h(u) = ∫_0^u ∫_0^1 h(x,y) p(y|x) p(x) dx dy
 + Σ_{k=1}^∞ ∫_0^u ∫_0^1 ∫_0^1 h(x,y) p(y|x) p(x) p^{(k)}(z|y) dx dy dz.   (3.3)
This mapping could be used for the construction of an abstract Wiener space (Millar (1983)). But at this point we have to overcome some additional difficulties. The problem is that the mapping τ_1: H → B lacks the desirable one-to-one property (many kernel densities p_h(y|x) with essentially different functions h will yield the same stationary density). In order to make the mapping one-to-one, we decompose the space H into a direct sum of ker τ_1 and its orthogonal complement H_1: H = ker τ_1 ⊕ H_1. Now if h_1, h_2 ∈ H and h_1 = h_{1,ker} + h_{1,ker⊥}, h_2 = h_{2,ker} + h_{2,ker⊥} are their corresponding decompositions, then τ_1 h_1 = τ_1 h_2 iff h_1 - h_2 ∈ ker τ_1, and this means h_{1,ker⊥} = h_{2,ker⊥} almost surely. Hence if we consider the rather narrower parametrization, using only the subspace H_1 instead of the space H, then the mapping τ: H_1 → B (τ being the restriction of τ_1 to the space H_1) will be one-to-one.
4. The dual mapping τ*: B* → H_1
The closure of τH_1 in sup-norm gives the space B. The dual space B* coincides with the set of finite signed measures on [0,1]. Denote by ⟨·,·⟩_B the duality relation between the elements of B* and B. For a finite signed measure m on [0,1] and for arbitrary h ∈ H_1 we can write

⟨m, τh⟩_B = (τ*m, h) = ∫_0^1 ∫_0^1 τ*m(s,t) h(s,t) p(t|s) p(s) dt ds.   (4.1)

Now remember that the functions h ∈ H satisfy the property E(h(X_0,X_1) | X_0) = 0 almost surely. Hence for any functions c(s), c_k(s), k ≥ 1, not depending on t we can write:

⟨m, τh⟩_B = ∫_0^1 m(du) ∫_0^1 ∫_0^1 [ I_{[0,u)}(t) - c(s) ] h(s,t) p(t|s) p(s) dt ds
 + Σ_{k=1}^∞ ∫_0^1 m(du) ∫_0^1 ∫_0^1 [ ∫_0^1 I_{[0,u)}(r) p^{(k)}(r|t) dr - c_k(s) ] h(s,t) p(t|s) p(s) dt ds
 = ∫_0^1 ∫_0^1 { m[t,1] - c̄(s) + Σ_{k=1}^∞ ∫_0^1 (m[r,1] - c̄_k(s)) p^{(k)}(r|t) dr } h(s,t) p(t|s) p(s) dt ds.   (4.2)

We have denoted by c̄(s), c̄_k(s), k ≥ 1, the results of the integration with respect to m of c(s), c_k(s). The functions c̄(s) and c̄_k(s) should be chosen so that ∫ τ*m(s,t) p(t|s) dt = 0 for all s, i.e. τ*m(s,t) ∈ H. This will be true if

c̄(s) = ∫ F(b|s) m(db)  and  c̄_k(s) = ∫ F^{(k+1)}(b|s) m(db),  k = 1, 2, ...,

where

F^{(k)}(t|s) = ∫_0^t p^{(k)}(b|s) db

(here we have used Fubini's theorem and the integration-by-parts formula). Comparing (4.1) and (4.2) we get:

τ*m(s,t) = m[t,1] - ∫_0^1 F(b|s) m(db) + Σ_{k=1}^∞ ∫_0^1 (F^{(k)}(r|t) - F^{(k+1)}(r|s)) m(dr)

(again we have used Fubini and integration by parts). Hence

τ*m(s,t) = ∫_0^1 [ I_{[0,u)}(t) - F(u|s) + Σ_{k=1}^∞ (F^{(k)}(u|t) - F^{(k+1)}(u|s)) ] m(du).
Now we have to prove that not only τ*m(s,t) ∈ H, but even τ*m(s,t) ∈ H_1. At first we note that if q_{n,h_i}(y|x) = (1 + h_i(x,y)/√n) p(y|x), i = 1,2, then τ_1 h_1 - τ_1 h_2 = 0 means in view of (3.2) that √n π(Q_{n,h_1} - Q_{n,h_2})R = 0. Because of the one-to-one property of the mapping I - P + Π = R^{-1}: m → m (Kartashov (1981)) it follows then that πQ_{n,h_1} = πQ_{n,h_2}. Hence if h ∈ ker τ_1, then essentially ∫_0^1 h(s,t) p(t|s) p(s) ds = 0 for all t ∈ [0,1] and ∫_0^1 h(s,t) p(t|s) dt = 0 for all s ∈ [0,1] hold. In view of these equalities one can easily see that the equality

∫_0^1 ∫_0^1 τ*m(s,t) h(s,t) p(t|s) p(s) ds dt = 0

holds for every h ∈ ker τ_1, which means τ*m ∈ H_1.

Proposition 4.1. It holds

‖τ*m‖_H^2 = ∫_0^1 ∫_0^1 E{Y(u)Y(v)} m(du) m(dv),

where Y(t), t ∈ [0,1], is the 'Billingsley process' (Billingsley (1968, Theorem 22.1)), i.e. the Gaussian stochastic process with a.s. continuous paths, E Y(u) = 0, P(Y(0) = Y(1) = 0) = 1,

E{Y(u)Y(v)} = F(min(u,v)) - F(u)F(v)
 + Σ_{k=1}^∞ [ ∫_0^1 I_{[0,u)}(t) F^{(k)}(v|t) F(dt) - F(u)F(v) ]
 + Σ_{k=1}^∞ [ ∫_0^1 I_{[0,v)}(t) F^{(k)}(u|t) F(dt) - F(u)F(v) ].   (4.3)

Remark 4.1. Formula (4.3) is just another version of the formula 22.12 for the covariance function in Billingsley (1968).

5. The local asymptotic minimax bound
Assume the chain satisfies the Conditions (A) and (B). The expansion of log(dP^{(n)}_{π,h}/dP^{(n)}_{π,0}) in Lemma 2.2 and the first lemma of Le Cam (1972) show that the measures P^{(n)}_{π,h} and P^{(n)}_{π,0} are contiguous. Denote

Δ_{n,h} = (1/√n) Σ_{i=0}^{n-1} h(X_i, X_{i+1}).

The Cramér-Wold device, combined with Theorem 20.1 of Billingsley (1968), shows that the vector (Δ_{n,h_1}, Δ_{n,h_2}, ..., Δ_{n,h_k}) converges to a multivariate normal with mean vector zero and covariance matrix Σ = (σ_{i,j})_{i,j=1,2,...,k}, σ_{i,j} = (h_i, h_j).

If μ̃ is the canonical normal cylinder measure on H_1, then its characteristic function is φ(h) = exp(-½ ‖h‖_H^2) for all h ∈ H_1* = H_1. The crucial fact is that the image R̃ of this cylinder measure by the mapping τ has characteristic function (Millar (1983, Chapter V.1, (1.7))):

exp{ -½ ‖τ*m‖_H^2 } = exp{ -½ ∫_0^1 ∫_0^1 E[Y(u)Y(v)] m(du) m(dv) },

i.e. R̃ (on C[0,1]) is the law of the process Y of Proposition 4.1. The process Y(t), t ∈ [0,1], possesses continuous trajectories a.s. and R̃ is a σ-additive measure on the space B. Denote by K1_d (d > 0) the set K1_d = {h ∈ H_1 : sup_{x,y} |h(x,y)| < d}. Lemma 2.2 then implies, for every d > 0, the convergence of the experiments {P^{(n)}_{π,h} : h ∈ K1_d} to the limit experiment ℰ, the Gaussian shift for the abstract Wiener space (τ, H_1, B) (see also Millar (1983, Chapters II.2.3, V.2)). We have proved also that √n(F_{h/√n}(u) - F(u)) = τh(u) + o(1). Hence

√n(Y - F_{h/√n}) = √n(Y - F_0) + √n(F_0 - F_{h/√n}) = Y' - τh + o(1).
Here Y’ = fi(Y -F,) will be considered as an estimator of the ‘local parameter’ rh if Y is an estimator of the ‘global parameter’ Fe. Let g be a bounded increasing function defined on [O,oo) and I(x) = g(sup, Ix(t)]), where x is a real continuous function on [0,11. If F is the continuous distribution function of X0 then the loss when estimating
F by the function
x will be defined to be equal to l(fi(x-F)). Then the same arguments as in Millar (1983, Theorem 1.10.(a)) or in Strasser (1985, Chapter 83) lead to the following theorem: 5.1. Denote by b any Markov kernel in the decision space. Then under the Conditions (A) and (B) it holds
Theorem
lim lim inf inf “+L= b d-m
sup hcK,d
1(~(Y_Fh,J;;))b(x,dY)~~~(dx)2E
~(Y,,(P,).
13
Here Y,,(,, denotes the ‘Billingsley process’ with F= F,, Fck)(u 1t) = Pk(t, [0, v)), FO(t) = n [0, t). Note that in Millar’s theorem the inf is taken over the so-called generalized procedures (which are a little bit more than the Markov kernels). But taking inf only over the Markov kernels we preserve, of course, the sign of the inequality. Corollary 5.1. Let now Kd= {h EH ) SUP~,~ 1h(x, y) I< d}. (A) and (B) it holds lim lim inf inf d’03 n-03 b
sup hcKd
This is of course true, because has to be taken.
6. The asymptotic
efficiency
Then under Conditions
I(~(Y-F,,~))b(x,dY)P~,;~(dx)lEz(Y,,(,,).
we have enlarged
the set over which the supremum
of the EDF
Now we want to show that the lower bound in Corollary 5.1 can actually be attained and that the efficient estimator attaining it is the (piecewise linear and continuous version of the) EDF (see e.g. Billingsley (1968, Chapter 11.13)). In fact we would like to have the ‘standard’ EDF
but there is a problem because it does not belong to the class of decision functions we consider. Note however that asymptotically it does not matter if we take the ‘standard’ EDF or its continuous version. Abusing notation we shall denote both
115
S. Penev / Efficient estimation of the stationary distribution
of them in the same way. Alternatively one could try to extend Millar’s results for the case of D-spaces instead of separable Banach spaces but we do not make such an effort here. We have to show
lim_{d→∞} lim_{n→∞} sup_{h∈K_d} ∫ l(√n(F̂_n - F_{h/√n})) dP^{(n)}_{π,h} = E l(Y_{F_0(P)}).   (6.1)

Our loss function l is bounded. The discussion in Millar (1984) shows then that in order to show (6.1) it suffices to show that for every fixed d > 0,

ℒ( √n(F̂_n - F_{h_n/√n}) | P^{(n)}_{π,h_n} ) ⇒ ℒ( Y_{F_0(P)} )

for an arbitrary sequence h_1, h_2, ..., h_n, ... in K_d. So we have to prove a uniform (in shrinking neighbourhoods) variant of Theorem 22.1 of Billingsley (1968). The whole proof is tedious. We skip the details and illustrate only some steps in the proof. To make a start, we introduce the following notations: E_{h/√n}(·) will denote the expected value under P^{(n)}_{π,h};

ϱ_k(u) = E_0{ (I_{[0,u)}(X_0) - F_0(u))(I_{[0,u)}(X_k) - F_0(u)) },  k ≥ 0,

ϱ_{k,h/√n}(u) = E_{h/√n}{ (I_{[0,u)}(X_0) - F_{h/√n}(u))(I_{[0,u)}(X_k) - F_{h/√n}(u)) },

σ²(u) = ϱ_0(u) + 2 Σ_{k=1}^∞ ϱ_k(u).
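The variance σ²(u) just introduced is the long-run variance of √n(F̂_n(u) - F(u)). On an assumed toy finite chain (all matrices and tolerances below are illustrative, not from the paper) one can compute the series ϱ_0(u) + 2Σ ϱ_k(u) exactly from the transition matrix and compare it with a Monte Carlo estimate:

```python
import numpy as np

# sigma^2(u) = rho_0(u) + 2 * sum_{k>=1} rho_k(u), with
# rho_k(u) = Cov(I(X_0 < u), I(X_k < u)) under stationarity, versus
# the simulated variance of sqrt(n) * (F_n(u) - F(u)).
rng = np.random.default_rng(2)
N = 100
x = (np.arange(N) + 0.5) / N
P = 1.0 + 0.5 * np.outer(2 * x - 1, 2 * x - 1)
P /= P.sum(axis=1, keepdims=True)
pi = np.full(N, 1.0 / N)                   # stationary law (uniform)

u0 = 0.5
ind = (x < u0).astype(float)               # I(X < u0) on the state space
F0 = pi @ ind

sigma2 = pi @ (ind - F0) ** 2              # rho_0(u0)
v = ind.copy()
for k in range(1, 100):                    # geometric decay => short sum suffices
    v = P @ v                              # v(x) = P(X_k < u0 | X_0 = x)
    sigma2 += 2.0 * (pi @ ((ind - F0) * (v - F0)))

n, R = 500, 800
C = np.cumsum(P, axis=1)
state = rng.integers(0, N, size=R)         # uniform start = stationary start
counts = np.zeros(R)
for _ in range(n):
    counts += ind[state]
    uu = rng.uniform(size=R)
    state = np.minimum((C[state] < uu[:, None]).sum(axis=1), N - 1)
mc_var = n * np.var(counts / n - F0)
print(sigma2, mc_var)
```

For this chain the positive autocovariances inflate the variance above the i.i.d. value F(u)(1 - F(u)), which is exactly the effect Lemma 6.2 has to control uniformly over the perturbations.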
We have already seen that the φ-mixing property, the uniqueness of the stationary distribution and the exponential speed of convergence remain valid for small perturbations of the transition kernel. Now we want to show some uniformity of this validity when h := h_n/√n, h_n ∈ K_d, d > 0 fixed. First of all, we show the following lemma:

Lemma 6.1. Under the Conditions (A) and (B):

sup_{h_n∈K_d} sup_x sup_A | P^n_{h_n/√n}(x,A) - π_{h_n/√n}(A) | ≤ q^n

for sufficiently large n (here P^n_{h_n/√n} denotes the n-step transition probability corresponding to the kernel P_{h_n/√n}(x,A) = ∫_A p_{h_n/√n}(y|x) dy and π_{h_n/√n} is the stationary probability distribution corresponding to the same kernel; q ∈ (0,1)).

Corollary 6.1. For large n it holds:

sup_{h_n∈K_d} Σ_{i=1}^n i² φ_{h_n/√n}(i) ≤ C < ∞,

with C independent of n, where φ_{h_n/√n}(i) = sup_x sup_A | P^i_{h_n/√n}(x,A) - π_{h_n/√n}(A) |.
Lemma 6.2. As n → ∞,

E_{h_n/√n}{ √n(F̂_n(u) - F_{h_n/√n}(u)) }² → σ²(u)

uniformly in u and in h_n ∈ K_d.

Analogous tedious calculations as in the proof of Lemma 6.2 show that corresponding uniform (in shrinking neighbourhoods) variants of Lemma 4 (Chapter 20), Theorem 20.1 and Lemma 1 (Chapter 22) in Billingsley (1968) hold. This shows that the convergence of the finite dimensional distributions in Theorem 22.1 (Billingsley (1968)) is uniform. Now it remains to show that for large n, for all ε > 0, η > 0 and δ ∈ (0,1) and for all h_n ∈ K_d the inequality

P^{(n)}_{π,h_n}( w(Y_{n,h_n/√n}, δ) ≥ ε ) ≤ η   (6.2)

holds, where

Y_{n,h_n/√n} = √n(F̂_n - F_{h_n/√n}),  w(x,δ) = sup_{|t-s|≤δ} |x(t) - x(s)|.

Inequality (6.2) means some kind of 'uniform tightness' for all h_n ∈ K_d. It can be proved in an analogous way as in the proof of Theorem 22.1, using in the corresponding places the uniform variants of Lemma 4 (Chapter 20) and Lemma 1 (Chapter 22).

Remark 6.1. The condition of boundedness of the loss function l can easily be weakened. The most trivial way to do this is to replace l by min(a, l) and then to let a → ∞. Also other loss functions like l_{n,F}(x) = g(n ∫ (x(t) - F(t))² F(dt)) with g bounded, increasing and uniformly continuous could alternatively be used.

Remark 6.2. We consider in this paper the state space E = [0,1]. In fact this is not a severe restriction. It is not difficult to see that Theorem 22.1 of Billingsley can be reformulated for the case that the state space is R¹ by conveniently defining the function g_t(ω) there. Correspondingly our optimality result can be reformulated to cover this case.
7. Appendix

This appendix contains the proofs of the main statements in the paper.
Proof of Lemma 2.1. For large n the kernels p_{h/√n}(·,·) satisfy Condition (A) if h ∈ H_0 and hence (cf. (iv)) π_{h/√n} ≪ λ_E for n large enough. In view of Condition (B) it suffices to show that π ≪ π_{h/√n} for n large enough. We shall see even more. Instead of 'shrinking' functions h/√n → 0 let us consider 'fixed' functions h, but in a suitably small neighbourhood (sup-norm) of 0. Of course, if h ∈ H_0 then h/√n will belong to any fixed neighbourhood of 0 for n large enough. Denote K_δ = {h ∈ H : sup_{x,y} |h(x,y)| < δ}, δ > 0. We shall see that there exists δ ∈ (0,1) such that if h ∈ K_δ then π ≪ π_h. At first note that under Condition (A) there exist r ∈ (0,1) and δ' > 0 such that

sup_{h∈K_{δ'}} sup_x sup_A | P^n_h(x,A) - π_h(A) | ≤ r^n

for n large enough (cf. Loève (1960, p. 369) or study carefully the proof of Lemma 6.1 below). Now take δ ∈ (0, min(δ', 1-r)). Assume there exists A ∈ ℬ such that π(A) > 0 but π_h(A) = 0 for some h ∈ K_δ. Then, since π_h(A) = 0,

r^n ≥ P^n_h(x,A) ≥ (1 - sup_{x,y} |h(x,y)|)^n P^n(x,A) ≥ (1 - δ)^n π(A)/2

for all n large enough (the last inequality because P^n(x,A) → π(A)). Hence (1-δ)^n / r^n ≤ 2/π(A). But (1-δ)/r > 1 and we get a contradiction if n is large enough. □

Proof of Lemma 2.2. It holds:

(dP^{(n)}_{π,h} / dP^{(n)}_{π,0})(x_0, x_1, ..., x_n) = (p_{h̄_n}(x_0)/p(x_0)) ∏_{i=0}^{n-1} (1 + h(x_i, x_{i+1})/√n)

for a realization x_0, x_1, ..., x_n of the random variables X_0(ω), X_1(ω), ..., X_n(ω). Here p_{h̄_n}(·) denotes the density with respect to λ_E of the invariant measure π_{h/√n}. Let us denote η_i = h(X_i, X_{i+1}); then η_0, η_1, ..., η_n, ... is φ-mixing with φ(n) = 2a·q^{n-1}, a > 0, q ∈ (0,1). This is a consequence of the exponential convergence (cf. (ii) and (iii) after Condition (A), or Ibragimov and Linnik (1965, Chapter XIX)). Then it holds Σ_n φ^{1/2}(n) < ∞ (Theorem 20.1 of Billingsley (1968)). But using the definition of η_k we easily have E_0 η_k = 0 for k = 1, 2, .... Hence under P^{(n)}_{π,0},

Δ_{n,h} = n^{-1/2} Σ_{i=0}^{n-1} η_i ⇒ N(0, ‖h‖_H^2).

Because of the stationarity of the η_i we have easily (1/n) Σ_{i=0}^{n-1} η_i² → ‖h‖_H^2 a.s., and an expansion of the logarithm of the product above then yields the assertion. □
Proof of Proposition 4.1. Using Fubini's theorem, we get the following expression for ‖τ*m‖_H^2:

‖τ*m‖_H^2 = ∫_0^1 ∫_0^1 G(u,v) m(du) m(dv),   (7.1)

where

G(u,v) = ∫_0^1 ∫_0^1 [ I_{[0,u)}(t) - F(u|s) + Σ_{k=1}^∞ (F^{(k)}(u|t) - F^{(k+1)}(u|s)) ]
 · [ I_{[0,v)}(t) - F(v|s) + Σ_{k=1}^∞ (F^{(k)}(v|t) - F^{(k+1)}(v|s)) ] F(dt|s) F(ds).

Let us fix some natural number N and denote by A_N the analogous expression with both series truncated at k = N. Expanding the product, we use:

∫_0^1 ∫_0^1 (I_{[0,u)}(t) - F(u|s))(I_{[0,v)}(t) - F(v|s)) F(dt|s) F(ds)
 = F(min(u,v)) - ∫_0^1 F(v|s) F(u|s) F(ds),   (7.2)

∫_0^1 ∫_0^1 (I_{[0,u)}(t) - F(u|s)) Σ_{k=1}^N (F^{(k)}(v|t) - F^{(k+1)}(v|s)) F(dt|s) F(ds)
 = Σ_{k=1}^N [ ∫_0^1 I_{[0,u)}(t) F^{(k)}(v|t) F(dt) - ∫_0^1 F(u|s) F^{(k+1)}(v|s) F(ds) ]   (7.3)

(and the expression analogous to (7.3) with 'exchanged roles' of u and v). Using the equality

∫_0^1 F^{(k)}(u|t) p(t|s) dt = F^{(k+1)}(u|s),

we get:

∫_0^1 ∫_0^1 (F^{(k)}(v|t) - F^{(k+1)}(v|s)) F(dt|s) F(ds) = 0,   (7.4)

∫_0^1 ∫_0^1 (F^{(k)}(u|t) - F^{(k+1)}(u|s)) F(dt|s) F(ds) = 0.   (7.5)

It is easy to check the following equality:

∫_0^1 ∫_0^1 (F^{(k)}(u|t) - F^{(k+1)}(u|s))(F^{(l)}(v|t) - F^{(l+1)}(v|s)) F(dt|s) F(ds)
 = ∫_0^1 F^{(k)}(u|t) F^{(l)}(v|t) F(dt) - ∫_0^1 F^{(k+1)}(u|s) F^{(l+1)}(v|s) F(ds),

so that the double sum over k, l = 1, ..., N telescopes. Using (7.1)-(7.5) and the last equality, we deduce:

A_N = F(min(u,v)) - F(u)F(v)
 + Σ_{k=1}^N [ ∫_0^1 I_{[0,u)}(t) F^{(k)}(v|t) F(dt) - F(u)F(v) ]
 + Σ_{k=1}^N [ ∫_0^1 I_{[0,v)}(t) F^{(k)}(u|t) F(dt) - F(u)F(v) ]
 + B_N,

where B_N collects the remainder terms containing F^{(N+1)}. Because of the uniform and exponentially fast convergence F^{(k)}(u|s) → F(u), B_N tends to zero uniformly in u and v as N → ∞; for instance

| N F(u)F(v) - ∫_0^1 F^{(N+1)}(u|s) Σ_{k=1}^N F^{(k+1)}(v|s) F(ds) |
 ≤ ∫_0^1 |F(u) - F^{(N+1)}(u|s)| Σ_{k=1}^N F^{(k+1)}(v|s) F(ds) ≤ N a_0 q^{N+1} → 0

uniformly in u and v, with some a_0 > 0, q ∈ (0,1). Hence

lim_{N→∞} A_N = F(min(u,v)) - F(u)F(v)
 + Σ_{k=1}^∞ [ ∫_0^1 I_{[0,u)}(t) F^{(k)}(v|t) F(dt) - F(u)F(v) ]
 + Σ_{k=1}^∞ [ ∫_0^1 I_{[0,v)}(t) F^{(k)}(u|t) F(dt) - F(u)F(v) ]
 = E{Y(u)Y(v)}.

Therefore

‖τ*m‖_H^2 = ∫_0^1 ∫_0^1 E{Y(u)Y(v)} m(du) m(dv). □

Proof of Lemma 6.1. Denote by

Δ_{h_n/√n} = sup_{x_1,x_2} sup_A { P_{h_n/√n}(x_1,A) - P_{h_n/√n}(x_2,A) }.

Then

Δ_{h_n/√n} = sup_{x_1,x_2} sup_A ∫_A { p(y|x_1)(1 + n^{-1/2} h_n(x_1,y)) - p(y|x_2)(1 + n^{-1/2} h_n(x_2,y)) } dy
 ≤ sup_{x_1,x_2} sup_A ∫_A (p(y|x_1) - p(y|x_2)) dy + 2C_2 d/√n

(we used that p(y|x) is bounded and h_n ∈ K_d). But Condition (A) ensures that

sup_{x_1,x_2} sup_A ∫_A (p(y|x_1) - p(y|x_2)) dy ≤ 1 - δ

(consult for this inequality Loève (1960, Chapter VII.27.3)). Hence there exists (for large n) a constant q < 1 such that the inequality Δ_{h_n/√n} ≤ q holds independently of h_n ∈ K_d. Now using Loève (1960, Chapter VII.27.3.B) we have the assertion. □

Proof of Lemma 6.2. Obviously

E_{h_n/√n}{ √n(F̂_n(u) - F_{h_n/√n}(u)) }² = ϱ_{0,h_n/√n}(u) + 2 Σ_{k=1}^{n-1} (1 - k/n) ϱ_{k,h_n/√n}(u).

We want to evaluate from above the difference

| σ²(u) - E_{h_n/√n}{ √n(F̂_n(u) - F_{h_n/√n}(u)) }² |
 ≤ | ϱ_0(u) - ϱ_{0,h_n/√n}(u) | + 2 Σ_{k=n}^∞ | ϱ_k(u) |
 + 2 | Σ_{k=1}^{n-1} ϱ_k(u) - Σ_{k=1}^{n-1} ϱ_{k,h_n/√n}(u) | + (2/n) Σ_{k=1}^{n-1} k | ϱ_{k,h_n/√n}(u) |.   (7.6)

Now we use Corollary 7.1 of Ibragimov and Linnik (1965, Chapter XIX) and inequality (20.35) of Billingsley (1968) to verify that the series Σ_k ϱ_k and Σ_k ϱ_{k,h_n/√n} are absolutely convergent uniformly in h_n ∈ K_d, u ∈ [0,1]. Hence

Σ_{k=n}^∞ | ϱ_k(u) | → 0 as n → ∞,   (7.7)

(1/n) Σ_{k=1}^{n-1} k | ϱ_{k,h_n/√n}(u) | → 0 as n → ∞,   (7.8)

and the convergence is uniform in h_n ∈ K_d, u ∈ [0,1]. In view of (7.6)-(7.8), in order to complete the proof we have only to show the uniform convergence of Σ_{k=1}^{n-1} (ϱ_k(u) - ϱ_{k,h_n/√n}(u)) to zero. It is easy to see that

ϱ_{k,h_n/√n}(u) - ϱ_k(u)
 = ∫_0^1 I_{[0,u)}(s) [ F^{(k)}_{h_n/√n}(u|s) - F_{h_n/√n}(u) ] [ p̃_{h_n}(s) - p(s) ] ds
 + ∫_0^1 I_{[0,u)}(s) [ F^{(k)}_{h_n/√n}(u|s) - F_{h_n/√n}(u) - F^{(k)}(u|s) + F(u) ] F(ds).

Here p̃_{h_n} denotes the density of the stationary distribution corresponding to the kernel

P_{h_n/√n}(x,A) = ∫_A p(y|x)(1 + h_n(x,y)/√n) dy.

As in (2.3) we have

| ∫_0^1 I_{[0,u)}(s) [ F^{(k)}_{h_n/√n}(u|s) - F_{h_n/√n}(u) ] [ p̃_{h_n}(s) - p(s) ] ds | ≤ C_1 · γ^k/√n

with some constants C_1 > 0 and γ ∈ (0,1), independent of h_n ∈ K_d. Apparently then (1/√n) Σ_{k=1}^{n-1} γ^k → 0 as n → ∞. Now consider

∫_0^1 I_{[0,u)}(s) [ F^{(k)}_{h_n/√n}(u|s) - F_{h_n/√n}(u) - F^{(k)}(u|s) + F(u) ] F(ds).

On the one hand there exist constants C_2 > 0, β ∈ (0,1) such that

| ∫_0^1 I_{[0,u)}(s) [ F^{(k)}_{h_n/√n}(u|s) - F_{h_n/√n}(u) ] F(ds) | ≤ C_2 · β^k,

| ∫_0^1 I_{[0,u)}(s) [ F^{(k)}(u|s) - F(u) ] F(ds) | ≤ C_2 · β^k,

because of the uniform exponential convergence; hence the integral above is bounded by 2C_2 β^k. On the other hand there exists a constant C_3 > 0 such that the same integral is bounded by C_3/√n (for this last inequality we have used the comment in Theorem 6 of Kartashov (1984), which states that for minor norms ‖Q - P‖ also the inequality sup_k ‖Q^k - P^k‖ ≤ C ‖Q - P‖ with some positive constant C is valid). Hence both the inequalities hold for large n and k. For a given n we choose for example m(n) = [n^{1/3}] (where [·] denotes the integer part). Then

Σ_{k=1}^{n-1} | ϱ_k(u) - ϱ_{k,h_n/√n}(u) | ≤ (C_1/√n) Σ_{k=1}^{n-1} γ^k + m(n) C_3/√n + 2C_2 Σ_{k=m(n)+1}^∞ β^k.

The expression on the right side of the last inequality can be made arbitrarily small for large n. This completes the proof. □
Acknowledgement

The author is indebted to the referee for critical reading of an earlier version, leading to substantial improvement of both the style and presentation of this article.
References

Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

Doob, J.L. (1956). Stochastic Processes. Wiley, New York.

Dvoretsky, A., J. Kiefer and J. Wolfowitz (1956). Asymptotic minimax character of the sample distribution function and the classical multinomial estimator. Ann. Math. Statist. 27, 642-669.

Ibragimov, I.A. and Yu.V. Linnik (1965). Independent and Stationary Sequences (in Russian). Nauka, Moscow.

Kartashov, N.V. (1981). Strongly stable Markov chains. In: V.M. Zolotarev and V.V. Kalishnikov, Eds., Stability Problems for Stochastic Models, Proceedings of Seminar. The Institute for Systems Studies, Moscow, 54-59.

Kartashov, N.V. (1984). Criteria for uniform ergodicity and strong stability of Markov chains with general phase state. Theory Probab. Math. Statist. 30, 65-81.

Kiefer, J. and J. Wolfowitz (1976). Asymptotically minimax estimation of concave and convex distribution functions. Z. Wahrsch. Verw. Gebiete 34, 73-85.

Le Cam, L.M. (1972). Limits of experiments. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley-Los Angeles, 245-261.

Loève, M. (1960). Probability Theory, 2nd ed. D. van Nostrand, Princeton, NJ.

Millar, P.W. (1979). Asymptotic minimax theorems for the sample distribution function. Z. Wahrsch. Verw. Gebiete 48, 233-252.

Millar, P.W. (1983). The minimax principle in asymptotic statistical theory. In: École d'Été de Probabilités de Saint-Flour XI-1981. Lecture Notes in Mathematics, Vol. 976. Springer-Verlag, Berlin, 76-265.

Millar, P.W. (1984). A general approach to the optimality of minimum distance estimators. Trans. Amer. Math. Soc. 286 (1), 377-418.

Neveu, J. (1964). Bases Mathématiques du Calcul des Probabilités. Masson, Paris.

Roussas, G. (1972). Contiguity of Probability Measures. Cambridge University Press, Cambridge.

Strasser, H. (1985). Mathematical Theory of Statistics. W. de Gruyter, Berlin-New York.